Re: [PATCH] configure: actually disable 'git_update' mode with --disable-git-update
Assuming you're just using git to conveniently apply local downstream patches, you don't need the git repo to exist once you get to the build stage. IOW, just delete the .git dir after applying the patches and before running a build.

...then what do you do if the build fails and you want to edit/update the patch before retrying? "Blow away your .git tree every time you build and reconstitute it somehow later" doesn't seem like a very friendly thing to require...

+1. This option is disconnected from sustaining-engineering reality IMHO: tons of interactive rebases, adding and dropping patches, reorderings (so that earlier patches allow the new ones, or even existing ones, to become clean cherry-picks) in between the patch sets being worked on, bisections before continuing all this, etc.
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
I just pushed/uploaded an SRU for Bionic from:

https://code.launchpad.net/~rafaeldtinoco/ubuntu/+source/qemu/+git/qemu/+merge/387269

Waiting for SRU on it.

** Changed in: qemu (Ubuntu Bionic)
     Assignee: Rafael David Tinoco (rafaeldtinoco) => (unassigned)

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

Status in kunpeng920: Triaged
Status in kunpeng920 ubuntu-18.04 series: Triaged
Status in kunpeng920 ubuntu-18.04-hwe series: Triaged
Status in kunpeng920 ubuntu-19.10 series: Fix Released
Status in kunpeng920 ubuntu-20.04 series: Fix Released
Status in kunpeng920 upstream-kernel series: Invalid
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Bionic: In Progress
Status in qemu source package in Eoan: Fix Released
Status in qemu source package in Focal: Fix Released

Bug description:

SRU TEAM REVIEWER: This has already been SRUed for Focal, Eoan and Bionic. Unfortunately the Bionic SRU did not work and we had to revert the change. Since then we had another update, and now I'm retrying the SRU.

After discussing this extensively with @paelzer (and @dannf as a reviewer), Christian and I agreed that we should scope this SRU to Aarch64 only, AND I was much, much more conservative about what is being changed in the AIO qemu code.

The new code has been tested against the initial Test Case and against the new one from the Bionic regression. More information (about tests and discussion) can be found in the MR at ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-bionic-refix

BIONIC REGRESSION BUG:

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

[Impact]

 * QEMU locking primitives might face a race condition in QEMU Async I/O bottom-half scheduling. This leads to a deadlock that makes either QEMU or one of its tools hang indefinitely.

[Test Case]

INITIAL

 * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Hangs indefinitely in approximately 30% of the runs on Aarch64.

[Regression Potential]

 * This is a change to a core part of QEMU: the AIO scheduling. It works like a "kernel" scheduler: whereas the kernel schedules OS tasks, the QEMU AIO code is responsible for scheduling QEMU coroutines or event-listener callbacks.

 * There was a long discussion upstream about primitives and Aarch64. After quite some time Paolo released this patch, and it solves the issue. Tested platforms were amd64 and aarch64, based on his commit log.

 * Christian suggests that this fix stay a little longer in -proposed to make sure it won't cause any regressions.

 * dannf suggests we also check for performance regressions, e.g. how long it takes to convert a cloud image on high-core-count systems.

BIONIC REGRESSED ISSUE

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419

[Other Info]

 * Original description below:

Command:

qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Hangs indefinitely in approximately 30% of the runs.

Workaround:

qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Run "qemu-img convert" with "a single coroutine" to avoid this issue.

(gdb) thread 1
...
(gdb) bt
#0 0xbf1ad81c in __GI_ppoll
#1 0xaabcf73c in ppoll
#2 qemu_poll_ns
#3 0xaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...
(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0xaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
#3 0xaabed05c in call_rcu_thread
#4 0xaabd34c8 in qemu_thread_start
#5 0xbf25c880 in start_thread
#6 0xbf1b6b9c in thread_start ()
(gdb) thread 3
...
(gdb) bt
#0 0xbf11aa20 in __GI___sigtimedwait
#1 0xbf2671b4 in __sigwait
#2 0xaabd1ddc in sigwait_compat
#3 0xaabd34c8 in qemu_thread_start
#4 0xbf25c880 in start_thread
#5 0xbf1b6b9c in thread_start

(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xbec5ad90 (LWP 72839)]
[New Thread 0xbe459d90 (LWP 72840)]
[New Thread 0xbdb57d90 (LWP 72841)]
[New Thread 0xacac9d90 (LWP 72859)]
[New Thread 0xa7ffed90 (LWP 72860)]
[New Thread 0xa77fdd90 (LWP 72861)]
[New Thread 0xa6ffcd90 (LWP 72862)]
[New Thread 0xa67fbd90 (LWP 72863)]
[New Thread 0xa5ffad90 (LWP 72864)]
[Thread 0xa5ffad90 (LWP 72864) exited]
[Thread 0xa6ffcd90 (LWP 72862) exited]
[Thread 0xa77fdd90 (LWP 72861) exited]
[Thread 0xbdb57d90 (LWP 72841) exited]
[Thread 0xa67fbd90 (LWP 72863) exited]
[Thread 0xacac9
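The [Impact] and [Regression Potential] notes describe a race in QEMU's AIO bottom-half scheduling that shows up on Aarch64 and is tied to the locking primitives. A common shape for this kind of bug is a store-buffering (Dekker-style) handshake: the poller announces "I am about to block" and then checks for pending work, while the notifier queues work and then checks whether anyone needs waking. On a weakly ordered CPU, unless there is a full barrier between each side's store and its subsequent load, both sides can read the stale value, leaving the work queued with nobody woken. The following is a minimal C11 sketch of that pattern, assuming the race has this shape. It is not QEMU's code or Paolo's patch, and the names `notify_me`, `bh_scheduled`, `poller`, `notifier` and `count_lost_wakeups` are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flags: "the poller is about to block" and "work was queued". */
static atomic_int notify_me;
static atomic_int bh_scheduled;

/* What each side observed about the other during one round. */
static int poller_saw_work;
static int notifier_saw_sleeper;

/* Poller: announce the intent to block, then look for pending work. */
static void *poller(void *arg)
{
    (void)arg;
    atomic_store_explicit(&notify_me, 1, memory_order_relaxed);
    /* Full barrier: orders the store above before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    poller_saw_work = atomic_load_explicit(&bh_scheduled, memory_order_relaxed);
    return NULL;
}

/* Notifier: queue work, then check whether the poller needs waking. */
static void *notifier(void *arg)
{
    (void)arg;
    atomic_store_explicit(&bh_scheduled, 1, memory_order_relaxed);
    /* The matching full barrier on the notifier side. */
    atomic_thread_fence(memory_order_seq_cst);
    notifier_saw_sleeper = atomic_load_explicit(&notify_me, memory_order_relaxed);
    return NULL;
}

/*
 * Run the handshake `rounds` times and count the rounds in which the poller
 * missed the work AND the notifier missed the sleeper (a lost wakeup).
 * With both seq_cst fences in place, C11 forbids that outcome.
 * Returns -1 if a thread cannot be created.
 */
static int count_lost_wakeups(int rounds)
{
    int lost = 0;

    assert(rounds >= 0);
    for (int i = 0; i < rounds; i++) {
        pthread_t a, b;

        atomic_store(&notify_me, 0);
        atomic_store(&bh_scheduled, 0);
        if (pthread_create(&a, NULL, poller, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, notifier, NULL) != 0) {
            pthread_join(a, NULL);
            return -1;
        }
        /* pthread_join() makes the plain ints written by the threads visible. */
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (!poller_saw_work && !notifier_saw_sleeper)
            lost++;
    }
    return lost;
}
```

If the two `atomic_thread_fence()` calls are removed, the "both sides saw 0" outcome becomes legal under C11 and can be observed on real hardware with store buffers, so the count is no longer guaranteed to be zero.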
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Description changed:

+
+ SRU TEAM REVIEWER: This has already been SRUed for Focal, Eoan and Bionic. Unfortunately the Bionic SRU did not work and we had to reverse the change. Since then we had another update and now I'm retrying the SRU.
+
+ After discussing with @paelzer (and @dannf as a reviewer) extensively,
+ Christian and I agreed that we should scope this SRU as Aarch64 only AND
+ I was much, much more conservative in question of what is being changed
+ in the AIO qemu code.
+
+ New code has been tested against the initial Test Case and the new one,
+ regressed for Bionic. More information (about tests and discussion) can
+ be found in the MR at ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256
+ -bionic-refix
+
+ BIONIC REGRESSION BUG:
+
+ https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419
+
  [Impact]
  [...]
  [Test Case]
+
+ INITIAL
  [...]
  [Regression Potential]
  [...]
+
+ BIONIC REGRESSED ISSUE
+
+ https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1885419
  [Other Info]
  [...]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]
  """

  All the tasks left are blocked in a system call, so no task left to call qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in the beginning, sometimes at the end).

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0 0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, timeout=, timeout@entry=0x0, sigmask=0xc123b950) at
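The analysis above says thread #2 is parked in qemu_event_wait() (via qemu_futex_wait()), and a parked waiter only returns once some other thread sets the event. The following is a minimal set/reset/wait event built on a pthread mutex and condition variable rather than a futex. The names `event_t`, `event_init`, `event_set`, `event_reset`, `event_wait`, `event_is_set` and `setter_thread` are hypothetical and only mimic the general shape of a QemuEvent-style API; this is not QEMU's implementation. It shows the invariant the hang breaks: `event_wait()` blocks until someone calls `event_set()`, so if every thread that could do so has exited or is itself blocked, the waiter never returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event object: a flag protected by a mutex plus a condvar. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool set;
} event_t;

static void event_init(event_t *ev)
{
    assert(ev != NULL);
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->set = false;
}

/* Mark the event as set and wake every waiter. */
static void event_set(event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->set = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Clear the event so that later waiters block again. */
static void event_reset(event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->set = false;
    pthread_mutex_unlock(&ev->lock);
}

/* Block until the event is set; the loop handles spurious wakeups. */
static void event_wait(event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->set)
        pthread_cond_wait(&ev->cond, &ev->lock);
    pthread_mutex_unlock(&ev->lock);
}

static bool event_is_set(event_t *ev)
{
    bool v;

    pthread_mutex_lock(&ev->lock);
    v = ev->set;
    pthread_mutex_unlock(&ev->lock);
    return v;
}

/* Helper thread body: the "someone" who eventually wakes the waiter. */
static void *setter_thread(void *arg)
{
    event_set((event_t *)arg);
    return NULL;
}
```

If no thread ever runs `setter_thread()` (or anything else that calls `event_set()`), `event_wait()` sleeps forever. That matches the reported state, where the remaining threads are all blocked in system calls.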
[Bug 1886811] Re: systemd complains Failed to enqueue loopback interface start request: Operation not supported
qemu (1:5.0-5ubuntu3) groovy; urgency=medium has the merge with this fix:

- linux-user-add-netlink-RTM_SETLINK-command.patch (Closes: #964289)

** Changed in: qemu (Ubuntu)
       Status: New => Fix Released

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1886811

Title: systemd complains Failed to enqueue loopback interface start request: Operation not supported

Status in QEMU: Fix Committed
Status in qemu package in Ubuntu: Fix Released
Status in qemu package in Debian: Fix Released

Bug description:

This symptom seems similar to https://bugs.launchpad.net/qemu/+bug/1823790

Host Linux: Debian 11 Bullseye (testing) on x86-64 architecture
qemu version: latest git, commit hash eb2c66b10efd2b914b56b20ae90655914310c925, compiled with "./configure --static --disable-system"

Downstream bug report at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=964289
Bug report (closed) to systemd: https://github.com/systemd/systemd/issues/16359

systemd in armhf and armel (both little-endian 32-bit) containers fails to start with:

Failed to enqueue loopback interface start request: Operation not supported

How to reproduce on Debian (and probably Ubuntu):

mmdebstrap --components="main contrib non-free" --architectures=armhf --variant=important bullseye /var/lib/machines/armhf-bullseye
systemd-nspawn -D /var/lib/machines/armhf-bullseye -b

When the "armhf" architecture is replaced with "mips" (32-bit big-endian) or "ppc64" (64-bit big-endian), the container starts up fine. The same symptom is also observed with the "powerpc" (32-bit big-endian) architecture.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1886811/+subscriptions
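The fix carried in linux-user-add-netlink-RTM_SETLINK-command.patch concerns how qemu-user translates rtnetlink messages from the guest. As a rough illustration of what such a message looks like, the following builds (but does not send) an RTM_SETLINK request asking the kernel to set IFF_UP on an interface by index, using the standard `<linux/netlink.h>`/`<linux/rtnetlink.h>` structures. It is an assumption that this matches what systemd sends to bring up the loopback interface (systemd uses its own netlink library and may attach extra attributes), and `struct setlink_req` and `build_setlink_up` are hypothetical names. The "Operation not supported" error in the report is consistent with qemu-user rejecting a message type it did not yet translate.

```c
#define _DEFAULT_SOURCE /* exposes IFF_UP from <net/if.h> under strict C modes */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header followed by the link header. */
struct setlink_req {
    struct nlmsghdr nh;
    struct ifinfomsg ifi;
};

/* Fill `req` with an RTM_SETLINK request that sets IFF_UP on `ifindex`. */
static void build_setlink_up(struct setlink_req *req, int ifindex, unsigned int seq)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nh.nlmsg_type = RTM_SETLINK;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    req->nh.nlmsg_seq = seq;
    req->ifi.ifi_family = AF_UNSPEC;
    req->ifi.ifi_index = ifindex;
    req->ifi.ifi_flags = IFF_UP;  /* desired value of the flag bits... */
    req->ifi.ifi_change = IFF_UP; /* ...and which bits should change */
}
```

A guest program would then write this buffer to an AF_NETLINK/NETLINK_ROUTE socket. Under qemu-user, that write goes through the emulator's netlink translation instead of reaching the host kernel unmodified.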
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Status from old attempts to solve issues of the same nature:

Older (2018) merge request from @raharper:

https://github.com/koverstreet/bcache-tools/pull/1

addressing the fact that kernel uevents would not always emit CACHED_UUID parameters, causing udev to delete (whenever that happens) the /dev/bcache/{by-uuid,by-label} symlinks.

This last MR pointed to previous related bugs:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890446
https://bugs.launchpad.net/curtin/+bug/1728742

and to an upstream kernel patch:

https://lore.kernel.org/patchwork/patch/921298/

(related to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729145), which wasn't accepted upstream.

Even though it wasn't accepted upstream, an SRU was attempted (LP: #1729145):

https://lists.ubuntu.com/archives/kernel-team/2017-December/088680.html
https://lists.ubuntu.com/archives/kernel-team/2017-December/088679.html

Both were NACKed. Attempted again:

https://lists.ubuntu.com/archives/kernel-team/2017-December/088682.html
https://lists.ubuntu.com/archives/kernel-team/2017-December/088683.html

NACKed again. Then a v2 was sent:

https://lists.ubuntu.com/archives/kernel-team/2017-December/088751.html
https://lists.ubuntu.com/archives/kernel-team/2017-December/088750.html
https://lists.ubuntu.com/archives/kernel-team/2017-December/088749.html

and it was acked in January 2018 by Colin:

https://lists.ubuntu.com/archives/kernel-team/2018-January/089492.html

but not upstreamed.

BIONIC contains the fix:

commit ed9333e1b583
Author: Ryan Harper
Date: Mon Dec 11 12:12:01 2017

    UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent

    BugLink: http://bugs.launchpad.net/bugs/1729145

    - decouple emitting a cached_dev CHANGE uevent which includes dev.uuid
      and dev.label from bch_cached_dev_run() which only happens when a
      bcacheX device is bound to the actual backing block device (bcache0 -> vdb)

    - update bch_cached_dev_run() to invoke bch_cached_dev_emit_change() as
      needed; no functional code path changes here

    - Modify register_bcache to detect a re-registering of a bcache
      cached_dev, and in that case call bcache_cached_dev_emit_change() to

    Signed-off-by: Ryan Harper
    Signed-off-by: Joseph Salisbury
    Acked-by: Colin Ian King
    Acked-by: Stefan Bader
    Signed-off-by: Khalid Elmously
    [ saf: fix incorrect indentation ]
    Signed-off-by: Seth Forshee

FOCAL contains the fix:

commit 67553dcd7905
Author: Ryan Harper
Date: Mon Dec 11 12:12:01 2017

    UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent

GROOVY contains the fix:

commit 67553dcd7905
Author: Ryan Harper
Date: Mon Dec 11 12:12:01 2017

    UBUNTU: SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent

So neither the kernel patch nor @raharper's bcache-tools patch (bcache-export-cached) was accepted upstream.

New upstream summary from @raharper:

https://github.com/systemd/systemd/pull/16317#issuecomment-655647313

in the upstream merge request made by @rbalint.

** Bug watch added: Debian Bug tracker #890446
   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=890446
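The older merge request above deals with kernel uevents that do not always carry CACHED_UUID, which leads udev to drop the /dev/bcache/{by-uuid,by-label} symlinks. Kernel uevents received over netlink consist of an "ACTION@DEVPATH" header followed by NUL-separated KEY=VALUE strings. The following scans such a buffer for one key. The function name `uevent_get` and the sample buffers in the test are hypothetical. The sketch only shows that a CHANGE event lacking CACHED_UUID gives a consumer keyed on that variable nothing to work with; it is not udev's or bcache-tools' code.

```c
#define _POSIX_C_SOURCE 200809L /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of `key` in a uevent buffer of `len` bytes
 * laid out as "ACTION@DEVPATH\0KEY=VALUE\0KEY=VALUE\0...", or NULL if the
 * key is absent. Entries that are not NUL-terminated within `len` stop the scan.
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
    size_t klen, off;

    assert(buf != NULL && key != NULL);
    klen = strlen(key);
    /* Skip the "ACTION@DEVPATH" header, i.e. the first NUL-terminated string. */
    off = strnlen(buf, len) + 1;
    while (off < len) {
        const char *entry = buf + off;
        size_t elen = strnlen(entry, len - off);

        if (elen == len - off)
            break; /* unterminated entry: do not read past the buffer */
        if (elen > klen && entry[klen] == '=' && strncmp(entry, key, klen) == 0)
            return entry + klen + 1;
        off += elen + 1;
    }
    return NULL;
}
```

A rule or tool that derives the by-uuid link from CACHED_UUID gets NULL here for an event that omits the key, which is the situation the bcache-tools merge request and the SAUCE kernel patch were trying to avoid.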
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
I've hidden the last post as it was posted in the wrong bug.
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Thanks @dannf! I spoke to Christian, and he and I agreed to confine this change to ARM builds only (as an SRU for Bionic). Preparing it...
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Work being done for the Bionic SRU:

BUG: https://bugs.launchpad.net/qemu/+bug/1805256 (fix for the Bionic regression demonstrated at LP: #1885419)
PPA: https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1805256-bionic
MERGE: https://tinyurl.com/y8sucs6x

The merge proposal is currently undergoing review, tests and discussion.
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Started working on this again... ** Changed in: qemu (Ubuntu Bionic) Status: Triaged => In Progress -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1805256 Title: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images Status in kunpeng920: Triaged Status in kunpeng920 ubuntu-18.04 series: Triaged Status in kunpeng920 ubuntu-18.04-hwe series: Triaged Status in kunpeng920 ubuntu-19.10 series: Fix Released Status in kunpeng920 ubuntu-20.04 series: Fix Released Status in kunpeng920 upstream-kernel series: Invalid Status in QEMU: Fix Released Status in qemu package in Ubuntu: Fix Released Status in qemu source package in Bionic: In Progress Status in qemu source package in Eoan: Fix Released Status in qemu source package in Focal: Fix Released Bug description: [Impact] * QEMU locking primitives might face a race condition in QEMU Async I/O bottom halves scheduling. This leads to a dead lock making either QEMU or one of its tools to hang indefinitely. [Test Case] * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2 Hangs indefinitely approximately 30% of the runs in Aarch64. [Regression Potential] * This is a change to a core part of QEMU: The AIO scheduling. It works like a "kernel" scheduler, whereas kernel schedules OS tasks, the QEMU AIO code is responsible to schedule QEMU coroutines or event listeners callbacks. * There was a long discussion upstream about primitives and Aarch64. After quite sometime Paolo released this patch and it solves the issue. Tested platforms were: amd64 and aarch64 based on his commit log. * Christian suggests that this fix stay little longer in -proposed to make sure it won't cause any regressions. * dannf suggests we also check for performance regressions; e.g. how long it takes to convert a cloud image on high-core systems. 
[Other Info]

 * Original description below:

Command:

qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Hangs indefinitely in approximately 30% of the runs.

Workaround:

qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Run "qemu-img convert" with "a single coroutine" to avoid this issue.

(gdb) thread 1
...
(gdb) bt
#0 0xbf1ad81c in __GI_ppoll
#1 0xaabcf73c in ppoll
#2 qemu_poll_ns
#3 0xaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...

(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0xaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
#3 0xaabed05c in call_rcu_thread
#4 0xaabd34c8 in qemu_thread_start
#5 0xbf25c880 in start_thread
#6 0xbf1b6b9c in thread_start ()

(gdb) thread 3
...
(gdb) bt
#0 0xbf11aa20 in __GI___sigtimedwait
#1 0xbf2671b4 in __sigwait
#2 0xaabd1ddc in sigwait_compat
#3 0xaabd34c8 in qemu_thread_start
#4 0xbf25c880 in start_thread
#5 0xbf1b6b9c in thread_start

(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2
[New Thread 0xbec5ad90 (LWP 72839)]
[New Thread 0xbe459d90 (LWP 72840)]
[New Thread 0xbdb57d90 (LWP 72841)]
[New Thread 0xacac9d90 (LWP 72859)]
[New Thread 0xa7ffed90 (LWP 72860)]
[New Thread 0xa77fdd90 (LWP 72861)]
[New Thread 0xa6ffcd90 (LWP 72862)]
[New Thread 0xa67fbd90 (LWP 72863)]
[New Thread 0xa5ffad90 (LWP 72864)]
[Thread 0xa5ffad90 (LWP 72864) exited]
[Thread 0xa6ffcd90 (LWP 72862) exited]
[Thread 0xa77fdd90 (LWP 72861) exited]
[Thread 0xbdb57d90 (LWP 72841) exited]
[Thread 0xa67fbd90 (LWP 72863) exited]
[Thread 0xacac9d90 (LWP 72859) exited]
[Thread 0xa7ffed90 (LWP 72860) exited]
"""

All the remaining tasks are blocked in a system call, so no task is left to call qemu_futex_wake() to unblock thread #2 (in futex()), which in turn would unblock thread #1 (doing poll() on a pipe shared with thread #2).

Those 7 threads exit before the disk conversion is complete (sometimes at the beginning, sometimes at the end).
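The analysis above describes a lost wake-up: thread #2 sleeps in qemu_event_wait() and no thread is left to set the event. As a rough illustration only, here is a minimal one-shot event in C11 with pthreads. It assumes a mutex/condvar pair in place of QEMU's futex, and every name in it (`sketch_event`, `sketch_event_demo` and friends) is hypothetical, not QEMU's QemuEvent API. The waiter can only return after some other thread calls the set function, which is exactly the call that never happens in the hung process.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-shot event (not QEMU's QemuEvent): a waiter blocks
 * until another thread sets it. A mutex/condvar stands in for a futex. */
typedef struct {
    atomic_bool is_set;      /* fast-path flag, may be read without the lock */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} sketch_event;

void sketch_event_init(sketch_event *ev)
{
    atomic_init(&ev->is_set, false);
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
}

void sketch_event_destroy(sketch_event *ev)
{
    pthread_cond_destroy(&ev->cond);
    pthread_mutex_destroy(&ev->lock);
}

/* The "waker" role: in the hung qemu-img, nobody is left to do this. */
void sketch_event_set(sketch_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    atomic_store(&ev->is_set, true);
    pthread_cond_broadcast(&ev->cond);   /* wake every sleeper */
    pthread_mutex_unlock(&ev->lock);
}

void sketch_event_reset(sketch_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    atomic_store(&ev->is_set, false);
    pthread_mutex_unlock(&ev->lock);
}

void sketch_event_wait(sketch_event *ev)
{
    if (atomic_load(&ev->is_set)) {
        return;                          /* fast path: already set */
    }
    pthread_mutex_lock(&ev->lock);
    /* Re-check under the lock so a concurrent set() cannot slip in
     * between the check and the sleep (the classic lost wake-up). */
    while (!atomic_load(&ev->is_set)) {
        pthread_cond_wait(&ev->cond, &ev->lock);
    }
    assert(atomic_load(&ev->is_set));    /* set/reset both hold the lock */
    pthread_mutex_unlock(&ev->lock);
}

static void *setter_thread(void *arg)
{
    sketch_event_set(arg);
    return NULL;
}

/* Returns true once the calling thread has been woken by a second thread. */
bool sketch_event_demo(void)
{
    sketch_event ev;
    pthread_t t;
    bool woken;

    sketch_event_init(&ev);
    if (pthread_create(&t, NULL, setter_thread, &ev) != 0) {
        sketch_event_destroy(&ev);
        return false;
    }
    sketch_event_wait(&ev);              /* sleeps until setter_thread runs */
    woken = atomic_load(&ev.is_set);
    pthread_join(t, NULL);
    sketch_event_destroy(&ev);
    return woken;
}
```

In the demo a second thread always exists to call the set function, so the wait returns; in the reported hang, the threads that could have played that role have already exited.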
On the HiSilicon D06 system (a 96-core NUMA arm64 box), qemu-img frequently hangs (~50% of the time) with this command:

qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.

Once hung, attaching gdb gives the following backtrace:

(gdb) bt
#0 0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, timeout=, timeout@entry=0x0, sigmask=0xc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0,
[Bug 1878134] Re: Assertion failures in ati_reg_read_offs/ati_reg_write_offs
Hello Alexander, I believe your fuzz test result was meant for the upstream project, so I moved it. o/

** Also affects: qemu
     Importance: Undecided
         Status: New

** No longer affects: qemu-kvm (Ubuntu)

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1878134

Title:
  Assertion failures in ati_reg_read_offs/ati_reg_write_offs

Status in QEMU: New

Bug description:

Hello,

While fuzzing, I found inputs that trigger assertion failures in ati_reg_read_offs/ati_reg_write_offs:

uint32_t extract32(uint32_t, int, int): Assertion `start >= 0 && length > 0 && length <= 32 - start' failed
#3 0x76866092 in __GI___assert_fail (assertion=0x56e760c0 "start >= 0 && length > 0 && length <= 32 - start", file=0x56e76120 "/home/alxndr/Development/qemu/include/qemu/bitops.h", line=0x12c, function=0x56e76180 <__PRETTY_FUNCTION__.extract32> "uint32_t extract32(uint32_t, int, int)") at assert.c:101
#4 0x5653d8a7 in ati_mm_read (opaque=, addr=0x1a, size=) at /home/alxndr/Development/qemu/include/qemu/log-for-trace.h:29
#5 0x5653c825 in ati_mm_read (opaque=, addr=0x4, size=) at /home/alxndr/Development/qemu/hw/display/ati.c:289
#6 0x5601446e in memory_region_read_accessor (mr=0x6314dc20, addr=, value=, size=, shift=, mask=, attrs=...) at /home/alxndr/Development/qemu/memory.c:434
#7 0x56001a70 in access_with_adjusted_size (addr=, value=, size=, access_size_min=, access_size_max=, access_fn=, mr=0x6314dc20, attrs=...) at /home/alxndr/Development/qemu/memory.c:544
#8 0x56001a70 in memory_region_dispatch_read1 (mr=0x6314dc20, addr=0x4, pval=, size=0x4, attrs=...) at /home/alxndr/Development/qemu/memory.c:1396

I can reproduce it in QEMU 5.0 built using:

cat << EOF | ~/Development/qemu/build/i386-softmmu/qemu-system-i386 -M pc-q35-5.0 -device ati-vga -nographic -qtest stdio -monitor none -serial none
outl 0xcf8 0x80001018
outl 0xcfc 0xe200
outl 0xcf8 0x8000101c
outl 0xcf8 0x80001004
outw 0xcfc 0x7
outl 0xcf8 0x8000fa20
write 0xe204 0x1 0x1a
readq 0xe200
EOF

Similarly for ati_reg_write_offs:

cat << EOF | ~/Development/qemu/build/i386-softmmu/qemu-system-i386 -M pc-q35-5.0 -device ati-vga -nographic -qtest stdio -monitor none -serial none
outl 0xcf8 0x80001018
outl 0xcfc 0xe200
outl 0xcf8 0x8000101c
outl 0xcf8 0x80001004
outw 0xcfc 0x7
outl 0xcf8 0x8000fa20
write 0xe200 0x8 0x6a006a00
EOF

I also attached the traces to this Launchpad report, in case the formatting is broken:

qemu-system-i386 -M pc-q35-5.0 -device ati-vga -nographic -qtest stdio -monitor none -serial none < attachment

Please let me know if I can provide any further info.
-Alex

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1878134/+subscriptions
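The assertion message spells out extract32()'s contract: the bit range [start, start + length) must fit inside 32 bits. A minimal sketch, assuming a sub-register access is described by a byte offset plus an access size that get turned into a bit range (the report does not show how ati.c derives them). `extract32()` below only mirrors the contract quoted in the assertion; `reg_access_in_bounds()` and `reg_read_offs_checked()` are hypothetical helpers, not the upstream fix, showing the kind of range check that keeps a fuzzed offset/size pair away from the assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the contract in the assertion message: return `length` bits of
 * `value` starting at bit `start`. */
static inline uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Hypothetical pre-check: does a `size`-byte access at byte offset `offs`
 * stay inside a 32-bit register? Written so it cannot overflow. */
bool reg_access_in_bounds(int offs, unsigned size)
{
    return offs >= 0 && size >= 1 && size <= 4 && (unsigned)offs <= 4 - size;
}

/* Hypothetical guarded read: out-of-range accesses return 0 instead of
 * tripping the assertion (a design choice for the sketch only). */
uint32_t reg_read_offs_checked(uint32_t reg, int offs, unsigned size)
{
    if (!reg_access_in_bounds(offs, size)) {
        return 0;
    }
    /* offs + size <= 4 implies size * 8 <= 32 - offs * 8 */
    return extract32(reg, offs * 8, (int)size * 8);
}
```

For example, an 8-byte access at offset 0 of a 32-bit register is rejected by the pre-check, whereas passing it straight through would ask extract32() for 64 bits and fail the assertion.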
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
FYI, from now on all the "merge" work will be done in the merge requests linked to this bug (at the top). @paelzer will be verifying them.

--
Status in kunpeng920: Triaged
Status in kunpeng920 ubuntu-18.04 series: Triaged
Status in kunpeng920 ubuntu-18.04-hwe series: Triaged
Status in kunpeng920 ubuntu-19.10 series: Triaged
Status in kunpeng920 ubuntu-20.04 series: Triaged
Status in kunpeng920 upstream-kernel series: Fix Committed
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: In Progress
Status in qemu source package in Bionic: In Progress
Status in qemu source package in Disco: In Progress
Status in qemu source package in Eoan: In Progress
Status in qemu source package in Focal: In Progress
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Description changed:

+ [Impact]
+
+  * QEMU locking primitives might face a race condition in QEMU Async I/O
+ bottom halves scheduling. This leads to a dead lock making either QEMU
+ or one of its tools to hang indefinitely.
+
+ [Test Case]
+
+  * qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
+
+ Hangs indefinitely approximately 30% of the runs in Aarch64.
+
+ [Regression Potential]
+
+  * This is a change to a core part of QEMU: The AIO scheduling. It works
+ like a "kernel" scheduler, whereas kernel schedules OS tasks, the QEMU
+ AIO code is responsible to schedule QEMU coroutines or event listeners
+ callbacks.
+
+  * There was a long discussion upstream about primitives and Aarch64.
+ After quite sometime Paolo released this patch and it solves the issue.
+ Tested platforms were: amd64 and aarch64 based on his commit log.
+
+  * Christian suggests that this fix stay little longer in -proposed to
+ make sure it won't cause any regressions.
+
+ [Other Info]
+
+  * Original Description bellow:
+
+
  Command: qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
  Hangs indefinitely approximately 30% of the runs.
  Workaround: qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
  Run "qemu-img convert" with "a single coroutine" to avoid this issue.
  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...
  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()
  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start
  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2
  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]
  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]
  """
  All the tasks left are blocked in a system call, so no task left to call qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock thread #1 (doing poll() in a pipe with thread #2).
  Those 7 threads exit before disk conversion is complete (sometimes in the beginning, sometimes at the end).
- [ Original Description ]
-
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img frequently hangs (~50% of the time) with this command:
  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2
  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This qcow2->qcow2 conversion happens to be something uvtool does every time it fetches images.
  Once hung, attaching gdb gives the following backtrace:
  (gdb) bt
  #0 0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760, timeout=, timeout@entry=0x0, sigmask=0xc123b950) at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1 0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2 qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3 0xbbefbf80 in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:233
  #4 main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5 0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
  #6 img_convert (argc=, argv=) at qemu-img.c:2456
  #7 0xbbe2333c in main (argc=7, argv=) at qemu-img.c:4975
  Reproduced w/ latest QEMU git (@ 53744e0a182)

--
Status in kunpeng920: Triaged
Status in kunpeng920 ubuntu-18.04 series: New
Status in kunpeng920 ubuntu-18.04-hwe
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Changed in: qemu (Ubuntu)
     Assignee: Rafael David Tinoco (rafaeldtinoco) => (unassigned)

** Changed in: qemu
     Status: In Progress => Fix Released

** Changed in: qemu (Ubuntu Focal)
     Status: Incomplete => In Progress

** Changed in: qemu (Ubuntu Eoan)
     Status: Incomplete => In Progress

** Changed in: qemu (Ubuntu Disco)
     Status: Incomplete => In Progress

** Changed in: qemu (Ubuntu Bionic)
     Status: Incomplete => In Progress

** Changed in: qemu (Ubuntu)
     Status: Incomplete => In Progress

--
Status in kunpeng920: Triaged
Status in kunpeng920 ubuntu-18.04 series: New
Status in kunpeng920 ubuntu-18.04-hwe series: New
Status in kunpeng920 ubuntu-19.10 series: New
Status in kunpeng920 ubuntu-20.04 series: New
Status in kunpeng920 upstream-kernel series: Fix Committed
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: In Progress
Status in qemu source package in Bionic: In Progress
Status in qemu source package in Disco: In Progress
Status in qemu source package in Eoan: In Progress
Status in qemu source package in Focal: In Progress
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Hello Ike,

Please let me know whether you want me to go after the needed SRUs for this fix or whether you will. I'll wait for the final feedback from the tests with your PPA.

Cheers!

--
Status in kunpeng920: Triaged
Status in kunpeng920 ubuntu-18.04 series: New
Status in kunpeng920 ubuntu-18.04-hwe series: New
Status in kunpeng920 ubuntu-19.10 series: New
Status in kunpeng920 ubuntu-20.04 series: New
Status in kunpeng920 upstream-kernel series: Fix Committed
Status in QEMU: In Progress
Status in qemu package in Ubuntu: Incomplete
Status in qemu source package in Bionic: Incomplete
Status in qemu source package in Disco: Incomplete
Status in qemu source package in Eoan: Incomplete
Status in qemu source package in Focal: Incomplete

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Changed in: qemu (Ubuntu Eoan)
     Assignee: Rafael David Tinoco (rafaeldtinoco) => (unassigned)

** Changed in: qemu
     Assignee: Rafael David Tinoco (rafaeldtinoco) => (unassigned)

--
Status in kunpeng920: Incomplete
Status in QEMU: In Progress
Status in qemu package in Ubuntu: Incomplete
Status in qemu source package in Bionic: Incomplete
Status in qemu source package in Disco: Incomplete
Status in qemu source package in Eoan: Incomplete
Status in qemu source package in Focal: Incomplete
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Hello Fred,

Based on Dann's feedback on testing, I'm failing to see where your patch fixes the "root" cause (even though it does mitigate the issue by changing the AIO notification mechanism).

I think the root cause is best described in these 2 emails from the thread:

https://lore.kernel.org/qemu-devel/20191009080220.GA2905@hc/

and

https://lore.kernel.org/qemu-devel/966c119d-aa76-2149-108f-867aebd77...@redhat.com/

So, by adding ctx->notify_for_convert, it is very likely that you worked around the issue by doing what Jan already said: moving the two variables (ctx->list_lock and, in the old case, ctx->notify_me; in your case, ctx->notify_for_convert) so that they no longer share a cache line, which makes the issue "disappear" (as we would eventually do in a workaround patch).

What about the aarch64 issue with both ctx->list_lock and ctx->notify_for_convert being synchronized by the primitives QEMU uses, and being in the same cache line?

Any "workaround" here would try to dodge the shared-cache-line situation but, for upstream, I suppose Paolo wants something else regarding aarch64 and ATOMIC_SEQ_CST, as described in this part of the discussion:

https://lore.kernel.org/qemu-devel/96c26e21-5996-0c63-ce8b-99a1b5473...@redhat.com/

Unless I'm missing something, am I?

Thank you!

--
Status in kunpeng920: Confirmed
Status in QEMU: In Progress
Status in qemu package in Ubuntu: Confirmed
Status in qemu source package in Bionic: Confirmed
Status in qemu source package in Disco: Confirmed
Status in qemu source package in Eoan: In Progress
Status in qemu source package in Focal: Confirmed
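Jan's cache-line observation can be illustrated with a small layout check. This sketch assumes a 64-byte cache line (common, but not guaranteed; some arm64 cores use 128 bytes) and uses hypothetical structs rather than QEMU's AioContext. As the comment above argues, padding only separates the two fields; it does not fix the memory-ordering problem itself.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed cache-line size for the sketch. */
#define SKETCH_CACHE_LINE 64

/* Hypothetical context: both hot fields share one cache line. */
struct ctx_shared {
    atomic_int list_lock;
    atomic_int notify_me;
};

/* Hypothetical context: each hot field starts its own cache line. */
struct ctx_padded {
    alignas(SKETCH_CACHE_LINE) atomic_int list_lock;
    alignas(SKETCH_CACHE_LINE) atomic_int notify_me;
};

static_assert(offsetof(struct ctx_padded, notify_me) -
              offsetof(struct ctx_padded, list_lock) >= SKETCH_CACHE_LINE,
              "padded fields must not share a cache line");

/* True if two byte offsets fall on the same (assumed) cache line, for a
 * struct that itself starts on a line boundary. */
bool same_cache_line(size_t off_a, size_t off_b)
{
    return off_a / SKETCH_CACHE_LINE == off_b / SKETCH_CACHE_LINE;
}
```

Note that the alignment only holds for objects the compiler lays out (static or automatic storage) or that are allocated with something like aligned_alloc(); plain malloc() is not required to return 64-byte-aligned memory.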
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
On 02/10/19 16:58, Torvald Riegel wrote:
> This example looks like Dekker synchronization (if I get the intent right).

It is the same pattern. However, one of the two synchronized variables is a counter rather than just a flag.

> Two possible implementations of this are either (1) with all memory
> accesses having seq-cst MO, or (2) with relaxed-MO accesses and seq-cst
> fences in between the store and load on both ends. It's possible to mix
> both, but that gets trickier I think. I'd prefer the one with just
> fences, just because it's easiest, conceptually.

Got it. I'd also prefer the one with just fences, because we only really control one side of the synchronization primitive (ctx_notify_me in my litmus test) and I don't like the idea of forcing seq-cst MO on the other side (bh_scheduled).

The performance issue that I mentioned is that x86 doesn't have relaxed fetch and add, so you'd have a redundant fence like this:

    lock xaddl $2, mem1
    mfence
    ...
    movl mem1, %r8

(Gory QEMU details however allow us to use relaxed load and store here, because there's only one writer.)

> It works if you use (1) or (2) consistently. cppmem and the Batty et al.
> tech report should give you the gory details.
>
>> 1) understand why ATOMIC_SEQ_CST is not enough in this case. QEMU code
>> seems to be making the same assumptions as Linux about the memory model,
>> and this is wrong because QEMU uses C11 atomics if available.
>> Fortunately, this kind of synchronization in QEMU is relatively rare and
>> only this particular bit seems affected. If there is a fix which stays
>> within the C11 memory model, and does not pessimize code on x86, we can
>> use it[1] and document the pitfall.
>
> Using the fences between the store/load pairs in Dekker-like
> synchronization should do that, right? It's also relatively easy to deal
> with.
>
>> 2) if there's no way to fix the bug, qemu/atomic.h needs to switch to
>> __sync_fetch_and_add and friends. And again, in this case the
>> difference between the C11 and Linux/QEMU memory models must be documented.
>
> I'm surely not aware of all the constraints here, but I'd be surprised if the
> C11 memory model isn't good enough for portable synchronization code (with
> the exception of the consume MO minefield, perhaps).

This helps a lot already; I'll work on a documentation and code patch. Thanks very much.

Paolo

>> int main() {
>>   atomic_int ctx_notify_me = 0;
>>   atomic_int bh_scheduled = 0;
>>   {{{ {
>>         bh_scheduled.store(1, mo_release);
>>         atomic_thread_fence(mo_seq_cst);
>>         // must be zero since the bug report shows no notification
>>         ctx_notify_me.load(mo_relaxed).readsvalue(0);
>>       }
>>   ||| {
>>         ctx_notify_me.store(2, mo_seq_cst);
>>         r2=bh_scheduled.load(mo_relaxed);
>>       }
>>   }}};
>>   return 0;
>> }

** Changed in: qemu (Ubuntu Disco)
   Importance: Undecided => Medium

** Changed in: qemu (Ubuntu Bionic)
   Importance: Undecided => Medium

** Changed in: qemu (Ubuntu Ff-series)
   Importance: Undecided => Medium
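Option (2) from the exchange above (relaxed accesses plus a seq-cst fence between the store and the load on both sides), combined with Paolo's remark that QEMU's single writer allows a relaxed load and store instead of a read-modify-write, might look like the following C11 sketch. The globals `notify_me` and `bh_scheduled` and both function names are hypothetical stand-ins for `ctx->notify_me` and `bh->scheduled`; this is an illustration of the pattern, not QEMU code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ctx->notify_me and bh->scheduled. */
static atomic_uint notify_me;
static atomic_int bh_scheduled;

/* Poller side (the role of aio_poll): announce "about to block", then look
 * for pending work.  Assumes this is the only thread that writes notify_me,
 * so a relaxed load + relaxed store replaces the read-modify-write; the
 * seq_cst fence orders that store before the load of bh_scheduled. */
static bool poller_announce_and_check(void)
{
    unsigned v = atomic_load_explicit(&notify_me, memory_order_relaxed);
    atomic_store_explicit(&notify_me, v + 2, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&bh_scheduled, memory_order_relaxed) != 0;
}

/* Notifier side (the role of aio_notify): publish the work, fence, then
 * decide whether the poller has to be woken up.  Relaxed is enough for the
 * litmus property; real code publishing data would also need release. */
static bool notifier_schedule_and_check(void)
{
    atomic_store_explicit(&bh_scheduled, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&notify_me, memory_order_relaxed) != 0;
}
```

Run concurrently on two threads, the fences on both sides forbid the outcome where both functions return false: either the poller sees the scheduled work, or the notifier sees that it must kick the poller.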
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
On Wed, 2019-10-02 at 15:20 +0200, Paolo Bonzini wrote:
> On 02/10/19 13:05, Jan Glauber wrote:
>> The arm64 code generated for the
>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
>> memory barriers. It is just plain ldaxr/stlxr.
>>
>> From my understanding this is not sufficient for SMP sync.
>>
>>>> If I read this comment correctly:
>>>>
>>>>     void aio_notify(AioContext *ctx)
>>>>     {
>>>>         /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
>>>>          * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>>>          */
>>>>         smp_mb();
>>>>         if (ctx->notify_me) {
>>>>
>>>> it points out that the smp_mb() should be paired. But as
>>>> I said the used atomics don't generate any barriers at all.
>>>
>>> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub
>>> are defined as sequentially consistent:
>>>
>>> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n,
>>> __ATOMIC_SEQ_CST))
>>> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n,
>>> __ATOMIC_SEQ_CST))
>>
>> Compiler bug sounds kind of unlikely...
>
> Indeed the assembly produced by the compiler matches for example the
> mappings at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. A
> small testcase is as follows:
>
>     int ctx_notify_me;
>     int bh_scheduled;
>
>     int x()
>     {
>         int one = 1;
>         int ret;
>         __atomic_store(&bh_scheduled, &one, __ATOMIC_RELEASE);     // x1
>         __atomic_thread_fence(__ATOMIC_SEQ_CST);                   // x2
>         __atomic_load(&ctx_notify_me, &ret, __ATOMIC_RELAXED);     // x3
>         return ret;
>     }
>
>     int y()
>     {
>         int ret;
>         __atomic_fetch_add(&ctx_notify_me, 2, __ATOMIC_SEQ_CST);   // y1
>         __atomic_load(&bh_scheduled, &ret, __ATOMIC_RELAXED);      // y2
>         return ret;
>     }
>
> Here y (which is aio_poll) wants to order the write to ctx->notify_me
> before reads of bh->scheduled. However, the processor can speculate the
> load of bh->scheduled between the load-acquire and store-release of
> ctx->notify_me. So you can have something like:
>
>     thread 0 (y)                          thread 1 (x)
>     ----------------------------------    ------------------------------------
>     y1: load-acq ctx->notify_me
>     y2: load-rlx bh->scheduled
>                                           x1: store-rel bh->scheduled <-- 1
>                                           x2: memory barrier
>                                           x3: load-rlx ctx->notify_me
>     y1: store-rel ctx->notify_me <-- 2
>
> Being very puzzled, I tried to put this into cppmem:
>
>     int main() {
>       atomic_int ctx_notify_me = 0;
>       atomic_int bh_scheduled = 0;
>       {{{ {
>             bh_scheduled.store(1, mo_release);
>             atomic_thread_fence(mo_seq_cst);
>             // must be zero since the bug report shows no notification
>             ctx_notify_me.load(mo_relaxed).readsvalue(0);
>           }
>       ||| {
>             ctx_notify_me.store(2, mo_seq_cst);
>             r2=bh_scheduled.load(mo_relaxed);
>           }
>       }}};
>       return 0;
>     }
>
> and much to my surprise, the tool said r2 *can* be 0. Same if I put a
> CAS like
>
>         cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2,
>                             mo_seq_cst, mo_seq_cst);
>
> which resembles the code in the test case a bit more.

This example looks like Dekker synchronization (if I get the intent right).

Two possible implementations of this are either (1) with all memory
accesses having seq-cst MO, or (2) with relaxed-MO accesses and seq-cst
fences in between the store and load on both ends. It's possible to mix
both, but that gets trickier I think. I'd prefer the one with just
fences, just because it's easiest, conceptually.

> I then found a discussion about using the C11 memory model in Linux
> (https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html) which contains the
> following statement, which is a bit disheartening even though it is
> about a different test:
>
>    My first gut feeling was that the assertion should never fire, but
>    that was wrong because (as I seem to usually forget) the seq-cst
>    total order is just a constraint but doesn't itself contribute
>    to synchronizes-with -- but this is different for seq-cst fences.

It works if you use (1) or (2) consistently. cppmem and the Batty et al.
tech report should give you the gory details. My comment is just about
seq-cst working differently on memory accesses vs. fences (in the way
it's specified in the memory model).

> and later in the thread:
>
>    Use of C11 atomics to implement Linux kernel atomic operations
>    requires knowledge of the underlying architecture and the compiler's
>    implementation, as was noted earlier in this thread.
>
> Indeed if I add an atomic_thread_fence I get only one valid execution,
> where r2 must be 1. This is similar to GCC's bug
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697, and we can fix it in
> QEMU by using
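For comparison, option (1) in Torvald's reply keeps every access to both variables seq-cst, with no fences at all. A minimal sketch under hypothetical names (prefixed `sc_`); relative to the cppmem litmus test quoted above, the store to bh_scheduled and the load of ctx_notify_me become seq-cst instead of release and relaxed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical globals standing in for ctx->notify_me / bh->scheduled. */
static atomic_int sc_notify_me;
static atomic_int sc_bh_scheduled;

/* Option (1): the plain atomic_* functions default to
 * memory_order_seq_cst, so all four accesses join the total order S. */
static bool sc_poller(void)
{
    atomic_fetch_add(&sc_notify_me, 2);           /* seq_cst RMW   */
    return atomic_load(&sc_bh_scheduled) != 0;    /* seq_cst load  */
}

static bool sc_notifier(void)
{
    atomic_store(&sc_bh_scheduled, 1);            /* seq_cst store */
    return atomic_load(&sc_notify_me) != 0;       /* seq_cst load  */
}
```

Because every access participates in the single total order, at least one of the two loads must observe the other thread's write. The cost, as Paolo points out in his reply, is that this also forces seq-cst onto the bh_scheduled side, which QEMU does not fully control.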
[Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
Documenting this here as bug# was dropped from the mail thread:

On 02/10/19 13:05, Jan Glauber wrote:
> The arm64 code generated for the
> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> memory barriers. It is just plain ldaxr/stlxr.
>
> From my understanding this is not sufficient for SMP sync.
>
>>> If I read this comment correctly:
>>>
>>>     void aio_notify(AioContext *ctx)
>>>     {
>>>         /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
>>>          * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
>>>          */
>>>         smp_mb();
>>>         if (ctx->notify_me) {
>>>
>>> it points out that the smp_mb() should be paired. But as
>>> I said the used atomics don't generate any barriers at all.
>>
>> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub
>> are defined as sequentially consistent:
>>
>> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n,
>> __ATOMIC_SEQ_CST))
>> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n,
>> __ATOMIC_SEQ_CST))
>
> Compiler bug sounds kind of unlikely...

Indeed the assembly produced by the compiler matches for example the mappings at https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html. A small testcase is as follows:

    int ctx_notify_me;
    int bh_scheduled;

    int x()
    {
        int one = 1;
        int ret;
        __atomic_store(&bh_scheduled, &one, __ATOMIC_RELEASE);     // x1
        __atomic_thread_fence(__ATOMIC_SEQ_CST);                   // x2
        __atomic_load(&ctx_notify_me, &ret, __ATOMIC_RELAXED);     // x3
        return ret;
    }

    int y()
    {
        int ret;
        __atomic_fetch_add(&ctx_notify_me, 2, __ATOMIC_SEQ_CST);   // y1
        __atomic_load(&bh_scheduled, &ret, __ATOMIC_RELAXED);      // y2
        return ret;
    }

Here y (which is aio_poll) wants to order the write to ctx->notify_me before reads of bh->scheduled. However, the processor can speculate the load of bh->scheduled between the load-acquire and store-release of ctx->notify_me. So you can have something like:

    thread 0 (y)                          thread 1 (x)
    ----------------------------------    ------------------------------------
    y1: load-acq ctx->notify_me
    y2: load-rlx bh->scheduled
                                          x1: store-rel bh->scheduled <-- 1
                                          x2: memory barrier
                                          x3: load-rlx ctx->notify_me
    y1: store-rel ctx->notify_me <-- 2

Being very puzzled, I tried to put this into cppmem:

    int main() {
      atomic_int ctx_notify_me = 0;
      atomic_int bh_scheduled = 0;
      {{{ {
            bh_scheduled.store(1, mo_release);
            atomic_thread_fence(mo_seq_cst);
            // must be zero since the bug report shows no notification
            ctx_notify_me.load(mo_relaxed).readsvalue(0);
          }
      ||| {
            ctx_notify_me.store(2, mo_seq_cst);
            r2=bh_scheduled.load(mo_relaxed);
          }
      }}};
      return 0;
    }

and much to my surprise, the tool said r2 *can* be 0. Same if I put a CAS like

        cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2,
                            mo_seq_cst, mo_seq_cst);

which resembles the code in the test case a bit more.

I then found a discussion about using the C11 memory model in Linux (https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html) which contains the following statement, which is a bit disheartening even though it is about a different test:

   My first gut feeling was that the assertion should never fire, but
   that was wrong because (as I seem to usually forget) the seq-cst
   total order is just a constraint but doesn't itself contribute
   to synchronizes-with -- but this is different for seq-cst fences.

and later in the thread:

   Use of C11 atomics to implement Linux kernel atomic operations
   requires knowledge of the underlying architecture and the compiler's
   implementation, as was noted earlier in this thread.

Indeed if I add an atomic_thread_fence I get only one valid execution, where r2 must be 1. This is similar to GCC's bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697, and we can fix it in QEMU by using __sync_fetch_and_add; in fact cppmem also shows one valid execution if the store is replaced with something like GCC's assembly for __sync_fetch_and_add (or Linux's assembly for atomic_add_return):

        cas_strong_explicit(ctx_notify_me.readsvalue(0), 0, 2,
                            mo_release, mo_release);
        atomic_thread_fence(mo_seq_cst);

So we should:

1) understand why ATOMIC_SEQ_CST is not enough in this case. QEMU code seems to be making the same assumptions as Linux about the memory model, and this is wrong because QEMU uses C11 atomics if available. Fortunately, this kind of synchronization in QEMU is relatively rare and only this particular bit seems affected. If there is a fix which stays within the C11 memory model, and does not pessimize code on x86, we can use it[1] and document the pitfall.

2) if there's no way
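Paolo's fallback, switching to `__sync_fetch_and_add` and friends, relies on GCC documenting (most of) the legacy `__sync` builtins as full barriers, which is roughly what the CAS-plus-seq-cst-fence cppmem model above encodes. A sketch of the poller/notifier pair written only with GCC/Clang builtins; the names are hypothetical and this illustrates the idea rather than QEMU's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Plain ints: both the __sync and __atomic builtins operate on ordinary
 * objects.  Hypothetical stand-ins for ctx->notify_me / bh->scheduled. */
static int legacy_notify_me;
static int legacy_bh_scheduled;

static bool legacy_poller(void)
{
    /* GCC documents __sync_fetch_and_add as (in most cases) a full
     * barrier: memory accesses are not moved across it in either
     * direction, so the load below cannot be hoisted above the add. */
    __sync_fetch_and_add(&legacy_notify_me, 2);
    return __atomic_load_n(&legacy_bh_scheduled, __ATOMIC_RELAXED) != 0;
}

static bool legacy_notifier(void)
{
    __atomic_store_n(&legacy_bh_scheduled, 1, __ATOMIC_RELEASE);
    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* the smp_mb() in aio_notify */
    return __atomic_load_n(&legacy_notify_me, __ATOMIC_RELAXED) != 0;
}
```

The trade-off discussed in the thread is that this steps outside the C11 model into GCC-specific semantics, which is why it was only the fallback if no C11-only fix existed.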
Re: Thoughts on VM fence infrastructure
>>> There are times when the main loop can get blocked even though the CPU
>>> threads can be running and can in some configurations perform IO
>>> even without the main loop (I think!).
>> Ah, that's a very good point. Indeed, you can perform IO in those
>> cases, especially when using vhost devices.
>>
>>> By setting a timer in the kernel that sends a signal to qemu, the kernel
>>> will send that signal however broken qemu is.
>> Got you now. That's probably better. Do you reckon a signal is
>> preferable over SIGEV_THREAD?
> Not sure; probably the safest is getting the kernel to SIGKILL it - but
> that's a complete nightmare to debug - your process just goes *pop*
> with no apparent reason why.
> I've not used SIGEV_THREAD - it looks promising though.

Sorry to "enter" the discussion, but in "real" HW it's not by accident that a watchdog device's timeout generates an NMI to the CPUs, causing the kernel to handle the interrupt and panic (or take another action set by a specific watchdog driver that re-implements the default one).

Can't you simply "inject" an NMI into all guest vCPUs BEFORE you take any action in QEMU itself? Just like the virtual watchdog device would do from inside the guest (/dev/watchdog), but capable of being updated from outside, as in your case (if I understood correctly).

You would possibly need a dedicated loop for this "watchdog device" (AIO threads?) so that it doesn't compete with the existing coroutines/BH tasks and their jitter, given your "realtime watchdog needs".

Regarding the remaining in-flight I/Os for the guest's devices in question (vhost/vhost-user etc.), it would be just like a real host where the "bus" received commands, but the sender died right after...
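The kernel-timer idea quoted above can be sketched with POSIX per-process timers: the process keeps re-arming a one-shot timer while it is healthy, and if it stops doing so the kernel delivers the chosen signal regardless of the state of QEMU's main loop. The `watchdog_*` helpers are hypothetical; `SIGEV_SIGNAL` could be swapped for `SIGEV_THREAD` (the alternative raised in the thread), and which signal to send is the debuggability trade-off discussed above. On glibc older than 2.34 the timer functions live in librt, so link with `-lrt` there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical kernel-backed watchdog state. */
typedef struct {
    timer_t id;
    time_t timeout_secs;
} watchdog_t;

/* Create a one-shot POSIX timer that delivers `signo` to the process when
 * it expires.  Returns 0 on success, -1 with errno set on failure. */
static int watchdog_init(watchdog_t *wd, int signo, time_t timeout_secs)
{
    struct sigevent sev;

    assert(timeout_secs > 0);
    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_SIGNAL;   /* SIGEV_THREAD is the other option */
    sev.sigev_signo = signo;
    wd->timeout_secs = timeout_secs;
    return timer_create(CLOCK_MONOTONIC, &sev, &wd->id);
}

/* "Pet" the watchdog: push the expiry timeout_secs into the future.
 * If nothing calls this in time, the kernel sends the signal anyway. */
static int watchdog_pet(watchdog_t *wd)
{
    struct itimerspec its;

    memset(&its, 0, sizeof(its));      /* it_interval == 0: one-shot */
    its.it_value.tv_sec = wd->timeout_secs;
    return timer_settime(wd->id, 0, &its, NULL);
}

/* Disarm and release the timer. */
static int watchdog_destroy(watchdog_t *wd)
{
    return timer_delete(wd->id);
}
```

Something in QEMU that is known to make progress would call `watchdog_pet()` periodically; choosing which thread does so determines what kind of "stuck" the watchdog detects.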
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Also affects: qemu (Ubuntu Ff-series)
   Importance: Undecided
       Status: New

** Also affects: qemu (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: qemu (Ubuntu Eoan)
   Importance: Medium
     Assignee: Rafael David Tinoco (rafaeldtinoco)
       Status: In Progress

** Also affects: qemu (Ubuntu Disco)
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Bionic:
  New
Status in qemu source package in Disco:
  New
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in FF-Series:
  New

Bug description:

  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP 72859) exited]
  [Thread 0xa7ffed90 (LWP 72860) exited]

  """

  All the tasks left are blocked in a system call, so no task left to call
  qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
  thread #1 (doing poll() in a pipe with thread #2).

  Those 7 threads exit before disk conversion is complete (sometimes in
  the beginning, sometimes at the end).

  [ Original Description ]

  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
  #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760,
      timeout=, timeout@entry=0x0, sigmask=0xc123b950)
      at ../sysdeps/unix/sysv/linux/ppoll.c:39
  #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
      __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
  #2  qemu_poll_ns (fds=, nfds=,
      timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
      at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
  #6  img_convert (argc=, argv=) at qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1805256/+subscriptions
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
> Zhengui's theory that notify_me doesn't work properly on ARM is more
> promising, but he couldn't provide a clear explanation of why he thought
> notify_me is involved. In particular, I would have expected notify_me to
> be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
>
>
>     glib_pollfds_fill
>       g_main_context_prepare
>         aio_ctx_prepare
>           atomic_or(&ctx->notify_me, 1)
>     qemu_poll_ns
>     glib_pollfds_poll
>       g_main_context_check
>         aio_ctx_check
>           atomic_and(&ctx->notify_me, ~1)
>       g_main_context_dispatch
>         aio_ctx_dispatch
>           /* do something for event */
>             qemu_poll_ns
>

Paolo,

I tried confining execution to a single NUMA domain (cpu & mem) and still faced the issue. Then I added a mutex, "ctx->notify_me_lcktest", to the context to protect "ctx->notify_me", as shown below, and it seems to have either fixed or mitigated the issue.

Before, I was able to cause the hang once every 3 or 4 runs. I have already run qemu-img convert more than 30 times now and couldn't reproduce it again.

Next step is to play with the barriers and check why the existing ones aren't enough for ordering access to ctx->notify_me ... or should I try/do something else, in your opinion?

This arch/machine (Huawei D06):

$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):           2
NUMA node(s):        4
Vendor ID:           0x48
Model:               0
Stepping:            0x0
CPU max MHz:         2000.
CPU min MHz:         200.
BogoMIPS:            200.00
L1d cache:           64K
L1i cache:           64K
L2 cache:            512K
L3 cache:            32768K
NUMA node0 CPU(s):   0-23
NUMA node1 CPU(s):   24-47
NUMA node2 CPU(s):   48-71
NUMA node3 CPU(s):   72-95
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                     cpuid asimdrdm dcpop

diff --git a/include/block/aio.h b/include/block/aio.h
index 0ca25dfec6..0724086d91 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -84,6 +84,7 @@ struct AioContext {
      * dispatch phase, hence a simple counter is enough for them.
      */
     uint32_t notify_me;
+    QemuMutex notify_me_lcktest;
 
     /* A lock to protect between QEMUBH and AioHandler adders and deleter,
      * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 51c41ed3c9..031d6e2997 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -529,7 +529,9 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns, int64_t *timeout)
     bool progress;
     int64_t start_time, elapsed_time;
 
+    qemu_mutex_lock(&ctx->notify_me_lcktest);
     assert(ctx->notify_me);
+    qemu_mutex_unlock(&ctx->notify_me_lcktest);
     assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
 
     trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
@@ -601,8 +603,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
      * so disable the optimization now.
      */
     if (blocking) {
+        qemu_mutex_lock(&ctx->notify_me_lcktest);
         assert(in_aio_context_home_thread(ctx));
         atomic_add(&ctx->notify_me, 2);
+        qemu_mutex_unlock(&ctx->notify_me_lcktest);
     }
 
     qemu_lockcnt_inc(&ctx->list_lock);
@@ -647,8 +651,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
     }
 
     if (blocking) {
+        qemu_mutex_lock(&ctx->notify_me_lcktest);
         atomic_sub(&ctx->notify_me, 2);
         aio_notify_accept(ctx);
+        qemu_mutex_unlock(&ctx->notify_me_lcktest);
     }
 
     /* Adjust polling time */
diff --git a/util/async.c b/util/async.c
index c10642a385..140e1e86f5 100644
--- a/util/async.c
+++ b/util/async.c
@@ -221,7 +221,9 @@ aio_ctx_prepare(GSource *source, gint *timeout)
 {
     AioContext *ctx = (AioContext *) source;
 
+    qemu_mutex_lock(&ctx->notify_me_lcktest);
     atomic_or(&ctx->notify_me, 1);
+    qemu_mutex_unlock(&ctx->notify_me_lcktest);
 
     /* We assume there is no timeout already supplied */
     *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
@@ -239,8 +241,10 @@ aio_ctx_check(GSource *source)
     AioContext *ctx = (AioContext *) source;
     QEMUBH *bh;
 
+    qemu_mutex_lock(&ctx->notify_me_lcktest);
     atomic_and(&ctx->notify_me, ~1);
     aio_notify_accept(ctx);
+    qemu_mutex_unlock(&ctx->notify_me_lcktest);
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
         if (bh->scheduled) {
@@ -346,11 +350,13 @@ void aio_notify(AioContext *ctx)
     /* Write e.g. bh->scheduled before reading ctx->notify_me. Pairs
      * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
      */
-    smp_mb();
+    //smp_mb();
+    qemu_mutex_lock(&ctx->notify_me_lcktest);
     if (ctx->notify_me) {
         event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
     }
+    qemu_mutex_unlock(&ctx->notify_me_lcktest);
 }
 
 void aio_notify_accept(AioContext *ctx)
@@ -424,6 +430,8 @@ AioContext *aio_context_new(Error
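Why a mutex around notify_me hides the hang can be reasoned about on a reduced model: if the poller's update of notify_me and the notifier's read of it both happen under the same lock, whichever critical section runs second is guaranteed to observe the first one's effects, because an unlock synchronizes with the next lock of the same mutex. A minimal pthreads sketch with hypothetical names (not the patch above); like the experiment, it trades the lock-free fast path for a lock, so it is a diagnostic aid rather than a fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t notify_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned mx_notify_me;        /* only accessed with notify_lock held */
static atomic_int mx_bh_scheduled;   /* written by the notifier outside the lock */

/* Poller: announce under the lock, then check for work after unlocking.
 * If the notifier's critical section ran first, its store to
 * mx_bh_scheduled happens-before this load, so the work is seen. */
static bool mx_poller(void)
{
    pthread_mutex_lock(&notify_lock);
    mx_notify_me += 2;
    pthread_mutex_unlock(&notify_lock);
    return atomic_load_explicit(&mx_bh_scheduled, memory_order_relaxed) != 0;
}

/* Notifier: publish work, then read notify_me under the lock.  If the
 * poller's critical section ran first, the read sees its "+= 2". */
static bool mx_notifier(void)
{
    bool must_wake;

    atomic_store_explicit(&mx_bh_scheduled, 1, memory_order_relaxed);
    pthread_mutex_lock(&notify_lock);
    must_wake = mx_notify_me != 0;
    pthread_mutex_unlock(&notify_lock);
    return must_wake;
}
```

Either order of the two critical sections rules out the "nobody notices" outcome, which is consistent with the hang disappearing in the experiment without telling us which barrier was missing.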
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
> Note that the RCU thread is expected to sit most of the time doing
> nothing, so I don't think this matters.

Agreed.

> Zhengui's theory that notify_me doesn't work properly on ARM is more
> promising, but he couldn't provide a clear explanation of why he thought
> notify_me is involved. In particular, I would have expected notify_me to
> be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
>
>
>     glib_pollfds_fill
>       g_main_context_prepare
>         aio_ctx_prepare
>           atomic_or(&ctx->notify_me, 1)
>     qemu_poll_ns
>     glib_pollfds_poll
>       g_main_context_check
>         aio_ctx_check
>           atomic_and(&ctx->notify_me, ~1)
>       g_main_context_dispatch
>         aio_ctx_dispatch
>           /* do something for event */
>             qemu_poll_ns
>

Yep, will focus there.

>
> Can you place somewhere your util/async.o object file for me to look at it?

Sure!

https://send.firefox.com/download/45c26bbe1075eea1/#ZD_e_96imPG2QuDqaX-jhg

Note: this async.o has value as int, EV_BUSY as 3, aborts if any errno in qemu_futex() and uses &ev->value as 1st argument to wake/wait (as in https://pastebin.ubuntu.com/p/xk8D6H6kgM/).

>
> You could change it to 3, but it has to have all the bits in EV_FREE
> (see atomic_or(&ev->value, EV_FREE) in qemu_event_reset).
>
> You could also change it to -1u, but I don't see a particular need to do so.
>

Yep, it was a dead end on my side.

>> - Should qemu_event_set() check return code from
>> qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY
>> waiter was ever woken up ? Maybe even loop until at least 1 is awaken ?
>
> Why would it need to do so?
>

No need, just realized after I saw no tasks waking that thread up. Like you said, ctx->notify_me seems more promising, will give it a try.
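Paolo's constraint on the EV_BUSY value can be checked with a few lines of C11. Assuming the event states as discussed in this thread (EV_SET = 0, EV_FREE = 1, EV_BUSY = -1 stored in an unsigned field), the sketch shows that -1 becomes UINT_MAX (the 4294967295 gdb printed, with a 32-bit unsigned) and that OR-ing in EV_FREE leaves BUSY untouched, which is the property any replacement value such as 3 must keep, presumably so that the OR in the reset path is harmless if the value has meanwhile moved to BUSY. The helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Event states as described in the thread (assumed, not copied from QEMU). */
#define EV_SET   0
#define EV_FREE  1
#define EV_BUSY -1

/* The reset step ORs in EV_FREE: SET (0) becomes FREE (1), while FREE and
 * BUSY are unchanged because both already contain every bit of EV_FREE. */
static void event_or_free(atomic_uint *value)
{
    atomic_fetch_or(value, (unsigned)EV_FREE);
}

/* Converting -1 to unsigned yields UINT_MAX (modulo 2^N conversion). */
static unsigned busy_as_unsigned(void)
{
    return (unsigned)EV_BUSY;
}
```

By the same check, the value 2 suggested earlier in the thread would not qualify: `2 | EV_FREE` is 3, so the OR would silently turn BUSY into a different state.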
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Description changed:

+ Command:
+
+ qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
+
+ Hangs indefinitely approximately 30% of the runs.
+
+
+
+ Workaround:
+
+ qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
+
+ Run "qemu-img convert" with "a single coroutine" to avoid this issue.
+
+
+
+ (gdb) thread 1
+ ...
+ (gdb) bt
+ #0 0xbf1ad81c in __GI_ppoll
+ #1 0xaabcf73c in ppoll
+ #2 qemu_poll_ns
+ #3 0xaabd0764 in os_host_main_loop_wait
+ #4 main_loop_wait
+ ...
+
+ (gdb) thread 2
+ ...
+ (gdb) bt
+ #0 syscall ()
+ #1 0xaabd41cc in qemu_futex_wait
+ #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
+ #3 0xaabed05c in call_rcu_thread
+ #4 0xaabd34c8 in qemu_thread_start
+ #5 0xbf25c880 in start_thread
+ #6 0xbf1b6b9c in thread_start ()
+
+ (gdb) thread 3
+ ...
+ (gdb) bt
+ #0 0xbf11aa20 in __GI___sigtimedwait
+ #1 0xbf2671b4 in __sigwait
+ #2 0xaabd1ddc in sigwait_compat
+ #3 0xaabd34c8 in qemu_thread_start
+ #4 0xbf25c880 in start_thread
+ #5 0xbf1b6b9c in thread_start
+
+
+
+ (gdb) run
+ Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
+ ./disk01.ext4.qcow2 ./output.qcow2
+
+ [New Thread 0xbec5ad90 (LWP 72839)]
+ [New Thread 0xbe459d90 (LWP 72840)]
+ [New Thread 0xbdb57d90 (LWP 72841)]
+ [New Thread 0xacac9d90 (LWP 72859)]
+ [New Thread 0xa7ffed90 (LWP 72860)]
+ [New Thread 0xa77fdd90 (LWP 72861)]
+ [New Thread 0xa6ffcd90 (LWP 72862)]
+ [New Thread 0xa67fbd90 (LWP 72863)]
+ [New Thread 0xa5ffad90 (LWP 72864)]
+
+ [Thread 0xa5ffad90 (LWP 72864) exited]
+ [Thread 0xa6ffcd90 (LWP 72862) exited]
+ [Thread 0xa77fdd90 (LWP 72861) exited]
+ [Thread 0xbdb57d90 (LWP 72841) exited]
+ [Thread 0xa67fbd90 (LWP 72863) exited]
+ [Thread 0xacac9d90 (LWP 72859) exited]
+ [Thread 0xa7ffed90 (LWP 72860) exited]
+
+
+ """
+
+ All the tasks left are blocked in a system call, so no task left to call
+ qemu_futex_wake() to unblock thread #2 (in futex()), which would unblock
+ thread #1 (doing poll() in a pipe with thread #2).
+
+ Those 7 threads exit before disk conversion is complete (sometimes in
+ the beginning, sometimes at the end).
+
+
+
+ [ Original Description ]
+
  On the HiSilicon D06 system - a 96 core NUMA arm64 box - qemu-img
  frequently hangs (~50% of the time) with this command:

  qemu-img convert -f qcow2 -O qcow2 /tmp/cloudimg /tmp/cloudimg2

  Where "cloudimg" is a standard qcow2 Ubuntu cloud image. This
  qcow2->qcow2 conversion happens to be something uvtool does every time
  it fetches images.

  Once hung, attaching gdb gives the following backtrace:

  (gdb) bt
- #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760,
-     timeout=, timeout@entry=0x0, sigmask=0xc123b950)
-     at ../sysdeps/unix/sysv/linux/ppoll.c:39
- #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
-     __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
- #2  qemu_poll_ns (fds=, nfds=,
-     timeout=timeout@entry=-1) at util/qemu-timer.c:322
+ #0  0xae4f8154 in __GI_ppoll (fds=0xe8a67dc0, nfds=187650274213760,
+     timeout=, timeout@entry=0x0, sigmask=0xc123b950)
+     at ../sysdeps/unix/sysv/linux/ppoll.c:39
+ #1  0xbbefaf00 in ppoll (__ss=0x0, __timeout=0x0, __nfds=,
+     __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
+ #2  qemu_poll_ns (fds=, nfds=,
+     timeout=timeout@entry=-1) at util/qemu-timer.c:322
  #3  0xbbefbf80 in os_host_main_loop_wait (timeout=-1)
-     at util/main-loop.c:233
+     at util/main-loop.c:233
  #4  main_loop_wait (nonblocking=) at util/main-loop.c:497
  #5  0xbbe2aa30 in convert_do_copy (s=0xc123bb58) at qemu-img.c:1980
  #6  img_convert (argc=, argv=) at qemu-img.c:2456
  #7  0xbbe2333c in main (argc=7, argv=) at qemu-img.c:4975

  Reproduced w/ latest QEMU git (@ 53744e0a182)

** Description changed:

  Command:

- qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2
+ qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start
Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
Quick update...

> value UINT_MAX (4294967295) seems WRONG for qemu_futex_wait():
>
> - EV_BUSY, being -1 and passed as the unsigned argument of
> qemu_futex_wait(void *, unsigned), is converted (two's complement) into
> UINT_MAX, which is not what is expected (unless I missed something).
>
> *** If that is the case, I'm unsure whether you, Paolo, prefer declaring
> *(QemuEvent)->value as an integer or whether changing EV_BUSY to "2" would
> be okay here ***
>
> BUG: description:
> https://bugs.launchpad.net/qemu/+bug/1805256/comments/15

I realized this might be intentional, but, still, I tried:

https://pastebin.ubuntu.com/p/6rkkY6fJdm/

looking for anything that could have misbehaved on arm64 (especially concerned about casting and type conversions between the functions).

> QUESTION:
>
> - Should qemu_event_set() check return code from
> qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY
> waiter was ever woken up ? Maybe even loop until at least 1 is awaken ?

And I also tried:

-    qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
+    while(qemu_futex(pval, FUTEX_WAKE, val, NULL, NULL, 0) == 0)
+        continue;

and it made little difference (it just took way more time for me to reproduce the issue):

"""
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
./disk01.ext4.qcow2 ./output.qcow2

[New Thread 0xbec5ad90 (LWP 72839)]
[New Thread 0xbe459d90 (LWP 72840)]
[New Thread 0xbdb57d90 (LWP 72841)]
[New Thread 0xacac9d90 (LWP 72859)]
[New Thread 0xa7ffed90 (LWP 72860)]
[New Thread 0xa77fdd90 (LWP 72861)]
[New Thread 0xa6ffcd90 (LWP 72862)]
[New Thread 0xa67fbd90 (LWP 72863)]
[New Thread 0xa5ffad90 (LWP 72864)]

[Thread 0xa5ffad90 (LWP 72864) exited]
[Thread 0xa6ffcd90 (LWP 72862) exited]
[Thread 0xa77fdd90 (LWP 72861) exited]
[Thread 0xbdb57d90 (LWP 72841) exited]
[Thread 0xa67fbd90 (LWP 72863) exited]
[Thread 0xacac9d90 (LWP 72859) exited]
[Thread 0xa7ffed90 (LWP 72860) exited]
"""

All the tasks left are blocked in a system call, so there is no task left to call qemu_futex_wake() to unblock thread #2 (in futex()), which would in turn unblock thread #1 (doing poll() in a pipe with thread #2).

Those 7 threads exit before the disk conversion is complete (sometimes at the beginning, sometimes at the end).

I'll try to check why those tasks exited.

Any thoughts?

Tks
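On the question of looping on the wake-up: a FUTEX_WAKE that finds no sleeper is not by itself a lost wake-up, because FUTEX_WAIT only goes to sleep if the futex word still holds the value the waiter passed in. The sketch below (Linux only, raw futex(2) through syscall(2), hypothetical wrapper names) shows both halves: FUTEX_WAKE reports how many waiters it woke, and FUTEX_WAIT with a stale expected value returns immediately with EAGAIN instead of blocking.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Wake up to `n` threads sleeping on `uaddr`.  Returns how many were
 * woken (0 when nobody is waiting), or -1 with errno set. */
static long futex_wake(unsigned *uaddr, int n)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, n, NULL, NULL, 0);
}

/* Sleep only while *uaddr still equals `expected`.  If the word already
 * changed, the kernel returns -1 with errno == EAGAIN right away. */
static long futex_wait(unsigned *uaddr, unsigned expected)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
}
```

So a setter that changes the word before issuing the wake cannot strand a waiter that passes the value it last observed; the waiter either sees EAGAIN or is already queued when the wake arrives. This is presumably why Paolo questioned the need for the loop, and why it made no real difference here.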
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
In comment #14, please disregard the second half of the issue, related to:

   0xaabd4100 <+16>: cbz   w1, 0xaabd4108
   0xaabd4104 <+20>: ret
   0xaabd4108 <+24>: ldaxr w1, [x0]
   0xaabd410c <+28>: orr   w1, w1, #0x1
=> 0xaabd4110 <+32>: stlxr w2, w1, [x0]
   0xaabd4114 <+36>: cbnz  w2, 0xaabd4108

Duh! This is just the regular load-exclusive/or/store-exclusive retry loop for atomic_or() inside qemu_event_reset().
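The disassembly in this correction is the usual load-exclusive, modify, store-exclusive retry loop. The same shape can be written portably with a weak compare-and-exchange which, like stlxr, is allowed to fail and is then retried with the freshly observed value. A minimal sketch; `fetch_or_loop` is a hypothetical name, and on AArch64 without LSE atomics compilers typically lower `atomic_fetch_or` to such a loop themselves.

```c
#include <assert.h>
#include <stdatomic.h>

/* fetch-or as an explicit retry loop, mirroring ldaxr / orr / stlxr / cbnz:
 * read the old value, compute old | bits, try to store it, and retry if
 * another CPU changed the word in between (or the weak CAS failed
 * spuriously).  Returns the value seen before the successful update. */
static unsigned fetch_or_loop(atomic_uint *p, unsigned bits)
{
    unsigned old = atomic_load_explicit(p, memory_order_relaxed);

    /* On failure, `old` is refreshed with the current value. */
    while (!atomic_compare_exchange_weak_explicit(p, &old, old | bits,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed)) {
        /* retry, like the cbnz back to the ldaxr */
    }
    return old;
}
```

The `cbz w1` / `ret` prefix in the listing corresponds to checking the value first and only doing the OR when needed, which is separate from the loop itself.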
[Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
Paolo,

While debugging hangs in ARM64 while doing a simple:

qemu-img convert -f qcow2 -O qcow2 file.qcow2 output.qcow2

I might have found 2 issues which I'd like you to review, if possible.

ISSUE #1

I've caught the following stack trace after a hang in qemu-img convert:

(gdb) bt
#0 syscall ()
#1 0xaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
#3 0xaabed05c in call_rcu_thread
#4 0xaabd34c8 in qemu_thread_start
#5 0xbf25c880 in start_thread
#6 0xbf1b6b9c in thread_start ()

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The value UINT_MAX (4294967295) seems WRONG for qemu_futex_wait():

- EV_BUSY, being -1 and passed as the unsigned argument of qemu_futex_wait(void *, unsigned), is converted (two's complement) into UINT_MAX, when that's not what is expected (unless I missed something).

*** If that is the case, I'm unsure whether you, Paolo, would prefer declaring QemuEvent->value as a signed integer, or whether changing EV_BUSY to "2" would be okay here ***

BUG description: https://bugs.launchpad.net/qemu/+bug/1805256/comments/15

ISSUE #2

I found this when debugging lockups while in futex() on a specific ARM64 server - https://bugs.launchpad.net/qemu/+bug/1805256 - which I'm still investigating.

After fixing the issue above, I'm still getting stuck in:

qemu_event_wait() -> qemu_futex_wait()

*** As if qemu_event_set() had run before qemu_futex_wait() ever started running ***

The other threads are waiting in poll() on a PIPE fed by this stuck thread (thread #1), and in sigwait() (thread #3):

(gdb) thread 1
...
(gdb) bt
#0 0xbf1ad81c in __GI_ppoll
#1 0xaabcf73c in ppoll
#2 qemu_poll_ns
#3 0xaabd0764 in os_host_main_loop_wait
#4 main_loop_wait
...

(gdb) thread 2
...
(gdb) bt
#0 syscall ()
#1 0xaabd41cc in qemu_futex_wait
#2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
#3 0xaabed05c in call_rcu_thread
#4 0xaabd34c8 in qemu_thread_start
#5 0xbf25c880 in start_thread
#6 0xbf1b6b9c in thread_start ()

(gdb) thread 3
...
(gdb) bt
#0 0xbf11aa20 in __GI___sigtimedwait
#1 0xbf2671b4 in __sigwait
#2 0xaabd1ddc in sigwait_compat
#3 0xaabd34c8 in qemu_thread_start
#4 0xbf25c880 in start_thread
#5 0xbf1b6b9c in thread_start

QUESTION:

- Should qemu_event_set() check the return code from qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY waiter was ever woken up? Maybe even loop until at least 1 is woken?

Thanks in advance,
Rafael D. Tinoco
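On the QUESTION above, a minimal sketch of the set/wait handshake may help. It is modelled on (not copied from) the qemu_event_set()/qemu_event_wait() pair discussed in this thread, and assumes Linux futex(2) plus C11 atomics with their default sequentially consistent ordering instead of QEMU's own atomic macros. All `sketch_*` names are hypothetical. It shows two documented futex(2) properties that bear on the question: FUTEX_WAKE returns the number of waiters it woke (0 is a normal result when nobody is asleep yet), and FUTEX_WAIT refuses to sleep when the futex word no longer holds the expected value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Encodings as in the thread; EV_BUSY spelled as (unsigned)-1 on purpose. */
#define EV_SET  0u
#define EV_FREE 1u
#define EV_BUSY ((unsigned)-1)

/* Thin wrapper around the raw futex(2) syscall (glibc has no wrapper).
 * Assumes _Atomic unsigned has the same 32-bit layout as unsigned. */
static long sketch_futex(_Atomic unsigned *addr, int op, unsigned val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Only a waiter that announced itself by storing EV_BUSY needs a wake.
 * The return value (number woken) is ignored: 0 just means the waiter
 * has not entered the kernel yet, and its FUTEX_WAIT will then fail. */
static void sketch_event_set(_Atomic unsigned *ev)
{
    if (atomic_exchange(ev, EV_SET) == EV_BUSY) {
        sketch_futex(ev, FUTEX_WAKE, INT_MAX);
    }
}

static void sketch_event_wait(_Atomic unsigned *ev)
{
    unsigned v = atomic_load(ev);
    if (v == EV_SET) {
        return;
    }
    if (v == EV_FREE) {
        unsigned expected = EV_FREE;
        /* FREE -> BUSY; if set() got there first, we are done. */
        if (!atomic_compare_exchange_strong(ev, &expected, EV_BUSY) &&
            expected == EV_SET) {
            return;
        }
    }
    /* The kernel re-checks *ev == EV_BUSY atomically before sleeping, so
     * a set() that slips in before this call makes it return EAGAIN
     * instead of blocking; EINTR and spurious wakeups are re-checked. */
    while (atomic_load(ev) == EV_BUSY) {
        sketch_futex(ev, FUTEX_WAIT, EV_BUSY);
    }
}
```

Under these semantics a wake that happens before the waiter reaches the kernel is not lost, so looping on the FUTEX_WAKE return value should not be needed for correctness; a hang like the one above would more plausibly come from the EV_BUSY store or the set() itself being missed.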
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
QEMU BUG: #1

Alright, one of the issues is (according to comment #14):

"""
Meaning that code is waiting for a futex inside kernel.

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The QemuEvent "rcu_call_ready_event->value" is set to UINT_MAX and I don't know why yet.

rcu_call_ready_event->value is only touched by:

qemu_event_init()  -> bool init ? EV_SET : EV_FREE
qemu_event_reset() -> atomic_or(&ev->value, EV_FREE)
qemu_event_set()   -> atomic_xchg(&ev->value, EV_SET)
qemu_event_wait()  -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)
"""

Now I know why rcu_call_ready_event->value is set to UINT_MAX. That is because of the following declaration:

struct QemuEvent {
#ifndef __linux__
    pthread_mutex_t lock;
    pthread_cond_t cond;
#endif
    unsigned value;
    bool initialized;
};

#define EV_SET   0
#define EV_FREE  1
#define EV_BUSY -1

"value" is declared as unsigned, but EV_BUSY sets it to -1, and, according to two's complement representation (https://en.wikipedia.org/wiki/Two%27s_complement), it will be UINT_MAX (4294967295).

So this is the "first bug" found AND it is definitely funny that this hasn't been seen in other architectures at all... I can reproduce it at will.

With that said, it seems that there is still another issue causing hangs (less frequently):

(gdb) thread 2
[Switching to thread 2 (Thread 0xbec5ad90 (LWP 17459))]
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
38      ../sysdeps/unix/sysv/linux/aarch64/syscall.S: No such file or directory.
(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0xaabd41cc in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./util/qemu-thread-posix.c:438
#2  qemu_event_wait (ev=ev@entry=0xaac86ce8 ) at ./util/qemu-thread-posix.c:442
#3  0xaabed05c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261
#4  0xaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#5  0xbf25c880 in start_thread (arg=0xf5bf) at pthread_create.c:486
#6  0xbf1b6b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 seems to be stuck in the futex() kernel syscall (as if the FUTEX_WAKE never happened and/or wasn't atomic for this arch/binary). Need to investigate this also.
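To make the encoding concrete, here is a small sketch that replays the four operations listed above on a plain unsigned value, ignoring concurrency and the futex calls. The `after_*` helpers are hypothetical. Under the quoted definitions, the 4294967295 printed by gdb is exactly what a waiter leaves behind after its EV_FREE -> EV_BUSY compare-and-swap, and comparisons such as `value == EV_BUSY` convert -1 the same way, so they still match.

```c
#include <assert.h>
#include <stdbool.h>

/* Encodings from the QemuEvent declaration quoted above. EV_BUSY is the
 * int -1; stored in an `unsigned` it becomes 2^32 - 1 (assuming a 32-bit
 * unsigned int, as on aarch64 Linux). */
#define EV_SET   0
#define EV_FREE  1
#define EV_BUSY -1

/* Hypothetical pure helpers: the value each operation leaves in
 * ev->value when it runs alone. */
static unsigned after_init(bool init)   { return init ? EV_SET : EV_FREE; }
static unsigned after_reset(unsigned v) { return v == EV_SET ? (v | EV_FREE) : v; }
static unsigned after_set(unsigned v)   { (void)v; return EV_SET; }

/* atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY): only FREE -> BUSY. */
static unsigned after_wait_cmpxchg(unsigned v)
{
    return v == EV_FREE ? (unsigned)EV_BUSY : v;
}
```

The same conversion applies to the `unsigned` parameter of qemu_futex_wait(), so the value the kernel compares against is the same 0xffffffff bit pattern that was stored.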
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images
** Summary changed:

- qemu-img hangs on high core count ARM system
+ qemu-img hangs on rcu_call_ready_event logic in Aarch64 when converting images

** Changed in: qemu
       Status: Confirmed => In Progress

** Changed in: qemu
     Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Alright, here is what is happening:

Whenever the program is stuck, the thread #2 backtrace is this:

(gdb) bt
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0xaabd41b0 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./util/qemu-thread-posix.c:438
#2  qemu_event_wait (ev=ev@entry=0xaac87ce8 ) at ./util/qemu-thread-posix.c:442
#3  0xaabee03c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261
#4  0xaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#5  0xbf26a880 in start_thread (arg=0xf5bf) at pthread_create.c:486
#6  0xbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Meaning that the code is waiting for a futex inside the kernel.

(gdb) print rcu_call_ready_event
$4 = {value = 4294967295, initialized = true}

The QemuEvent "rcu_call_ready_event->value" is set to UINT_MAX and I don't know why yet.

rcu_call_ready_event->value is only touched by:

qemu_event_init()  -> bool init ? EV_SET : EV_FREE
qemu_event_reset() -> atomic_or(&ev->value, EV_FREE)
qemu_event_set()   -> atomic_xchg(&ev->value, EV_SET)
qemu_event_wait()  -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)

And there should be no 0xffffffff value for "ev->value".

qemu_event_init() is the one initializing the global:

static QemuEvent rcu_call_ready_event;

and it is called by "rcu_init_complete()", which is called by "rcu_init()":

static void __attribute__((__constructor__)) rcu_init(void)

a constructor function.

So, "fixing" this issue by:

(gdb) print rcu_call_ready_event
$8 = {value = 4294967295, initialized = true}

(gdb) watch rcu_call_ready_event
Hardware watchpoint 1: rcu_call_ready_event

(gdb) set rcu_call_ready_event.initialized = 1
(gdb) set rcu_call_ready_event.value = 0

and noting that I added a watchpoint to the rcu_call_ready_event global:

Thread 1 "qemu-img" received signal SIGINT, Interrupt.
(gdb) thread 2
[Switching to thread 2 (Thread 0xbec61d90 (LWP 33625))]
(gdb) bt
#0  0xaabd4110 in qemu_event_reset (ev=ev@entry=0xaac87ce8 )
#1  0xaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:255
#2  0xaabd34c8 in qemu_thread_start (args=<optimized out>) at ./util/qemu-thread-posix.c:498
#3  0xbf26a880 in start_thread (arg=0xf5bf) at pthread_create.c:486
#4  0xbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

(gdb) print rcu_call_ready_event
$9 = {value = 0, initialized = true}

You can see I advanced in the qemu_event_{reset,set,wait} logic.

(gdb) disassemble /m 0xaabd4110
Dump of assembler code for function qemu_event_reset:
408     in ./util/qemu-thread-posix.c
409     in ./util/qemu-thread-posix.c
410     in ./util/qemu-thread-posix.c
411     in ./util/qemu-thread-posix.c
   0xaabd40f0 <+0>:     ldrb    w1, [x0, #4]
   0xaabd40f4 <+4>:     cbz     w1, 0xaabd411c
   0xaabd411c <+44>:    stp     x29, x30, [sp, #-16]!
   0xaabd4120 <+48>:    adrp    x3, 0xaac2
   0xaabd4124 <+52>:    add     x3, x3, #0x908
   0xaabd4128 <+56>:    mov     x29, sp
   0xaabd412c <+60>:    adrp    x1, 0xaac2
   0xaabd4130 <+64>:    adrp    x0, 0xaac2
   0xaabd4134 <+68>:    add     x3, x3, #0x290
   0xaabd4138 <+72>:    add     x1, x1, #0xc00
   0xaabd413c <+76>:    add     x0, x0, #0xd40
   0xaabd4140 <+80>:    mov     w2, #0x19b                      // #411
   0xaabd4144 <+84>:    bl      0xaaaff190 <__assert_fail@plt>

412     in ./util/qemu-thread-posix.c
   0xaabd40f8 <+8>:     ldr     w1, [x0]

413     in ./util/qemu-thread-posix.c
   0xaabd40fc <+12>:    dmb     ishld

414     in ./util/qemu-thread-posix.c
   0xaabd4100 <+16>:    cbz     w1, 0xaabd4108
   0xaabd4104 <+20>:    ret
   0xaabd4108 <+24>:    ldaxr   w1, [x0]
   0xaabd410c <+28>:    orr     w1, w1, #0x1
=> 0xaabd4110 <+32>:    stlxr   w2, w1, [x0]
   0xaabd4114 <+36>:    cbnz    w2, 0xaabd4108
   0xaabd4118 <+40>:    ret

And I'm currently inside the LDAXR/STLXR logic. To make sure my program counter is advancing, I added a breakpoint at 0xaabd4108: the CBNZ instruction keeps branching back to the LDAXR instruction until the LDAXR<->STLXR pair succeeds (inside qemu_event_reset()).

(gdb) break *(0xaabd4108)
Breakpoint 2 at 0xaabd4108: file ./util/qemu-thread-posix.c, line 414.

which is basically this:

if (value == EV_SET) {               // EV_SET == 0
    atomic_or(&ev->value, EV_FREE);  // EV_FREE == 1
}

and we can see this logic being called one time after another:

(gdb) c
Thread 2 "qemu-img" hit Breakpoint 3, 0xaabd4108 in qemu_event_reset (
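The qemu_event_reset() listing can be lined up with the source roughly as follows: `ldrb w1, [x0, #4]` plus the `__assert_fail` path checks `initialized` (line 411), `ldr w1, [x0]` reads `value`, `dmb ishld` acts as an acquire barrier, `cbz`/`ret` is the `value == EV_SET` test, and the exclusive loop is the OR. This mapping is my reading of the disassembly, not something gdb states. A minimal C11 sketch of that shape follows; `SketchEvent` and `sketch_event_reset` are hypothetical stand-ins, and C11 atomics replace QEMU's own atomic macros.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define EV_SET  0u
#define EV_FREE 1u

/* Hypothetical stand-in for the Linux QemuEvent layout quoted in this
 * thread: `value` at offset 0, `initialized` right after it (offset 4),
 * which is why the listing loads a byte from [x0, #4]. */
typedef struct {
    _Atomic unsigned value;
    bool initialized;
} SketchEvent;

static void sketch_event_reset(SketchEvent *ev)
{
    assert(ev->initialized);                   /* ldrb + __assert_fail */
    unsigned value = atomic_load_explicit(&ev->value,
                                          memory_order_relaxed); /* ldr */
    atomic_thread_fence(memory_order_acquire); /* dmb ishld */
    if (value == EV_SET) {                     /* cbz / ret */
        /* Only EV_SET is turned into EV_FREE; any other value is left
         * alone, as in the quoted `if (value == EV_SET)` snippet. */
        atomic_fetch_or(&ev->value, EV_FREE);  /* ldaxr/orr/stlxr/cbnz */
    }
}
```

Seen this way, repeatedly hitting the breakpoint at 0xaabd4108 means the OR is being re-entered, either because STLXR keeps failing or because qemu_event_reset() itself is called again each time call_rcu_thread() loops.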
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Alright, I'm still investigating this but wanted to share some findings... I haven't got a kernel dump yet after the task is frozen; I have analyzed only the userland part of it (although I have checked whether code was running inside the kernel with perf cycles:u/cycles:k at some point).

The big picture is this: whenever qemu-img hangs, we basically have 3 hung tasks with these stacks:

THREAD #1

__GI_ppoll (../sysdeps/unix/sysv/linux/ppoll.c:39)
ppoll (/usr/include/aarch64-linux-gnu/bits/poll2.h:77)
qemu_poll_ns (./util/qemu-timer.c:322)
os_host_main_loop_wait (./util/main-loop.c:233)
main_loop_wait (./util/main-loop.c:497)
convert_do_copy (./qemu-img.c:1981)
img_convert (./qemu-img.c:2457)
main (./qemu-img.c:4976)

got stack traces:

./33293/stack                           ./33293/stack
[<0>] __switch_to+0xc0/0x218            [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0           [<0>] do_sys_poll+0x508/0x5c0
[<0>] get_signal+0x5a4/0x730            [<0>] __arm64_sys_ppoll+0xc0/0x118
[<0>] do_notify_resume+0x158/0x358      [<0>] el0_svc_common+0xa0/0x168
[<0>] work_pending+0x8/0x10             [<0>] el0_svc_handler+0x38/0x78
                                        [<0>] el0_svc+0x8/0xc

root@d06-1:~$ perf record -F -e cycles:u -p 33293 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.871 MB perf.data (48730 samples) ]

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object       Symbol
# ........  ........  ..................  ......................
#
    37.82%  qemu-img  libc-2.29.so        [.] 0x000df710
    21.81%  qemu-img  [unknown]           [k] 0x10099504
    14.23%  qemu-img  [unknown]           [k] 0x10085dc0
     9.13%  qemu-img  [unknown]           [k] 0x1008fff8
     6.47%  qemu-img  libc-2.29.so        [.] 0x000df708
     5.69%  qemu-img  qemu-img            [.] qemu_event_reset
     2.57%  qemu-img  libc-2.29.so        [.] 0x000df678
     0.63%  qemu-img  libc-2.29.so        [.] 0x000df700
     0.49%  qemu-img  libc-2.29.so        [.] __sigtimedwait
     0.42%  qemu-img  libpthread-2.29.so  [.] __libc_sigwait

THREAD #3

__GI___sigtimedwait (../sysdeps/unix/sysv/linux/sigtimedwait.c:29)
__sigwait (linux/sigwait.c:28)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)

./33303/stack                           ./33303/stack
[<0>] __switch_to+0xc0/0x218            [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0           [<0>] do_sigtimedwait.isra.9+0x194/0x288
[<0>] get_signal+0x5a4/0x730            [<0>] __arm64_sys_rt_sigtimedwait+0xac/0x110
[<0>] do_notify_resume+0x158/0x358      [<0>] el0_svc_common+0xa0/0x168
[<0>] work_pending+0x8/0x10             [<0>] el0_svc_handler+0x38/0x78
                                        [<0>] el0_svc+0x8/0xc

root@d06-1:~$ perf record -F -e cycles:u -p 33303 -- sleep 10
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.905 MB perf.data (49647 samples) ]

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object       Symbol
# ........  ........  ..................  ......................
#
    45.37%  qemu-img  libc-2.29.so        [.] 0x000df710
    23.52%  qemu-img  [unknown]           [k] 0x10099504
     9.08%  qemu-img  [unknown]           [k] 0x1008fff8
     8.89%  qemu-img  [unknown]           [k] 0x10085dc0
     5.56%  qemu-img  libc-2.29.so        [.] 0x000df708
     3.66%  qemu-img  libc-2.29.so        [.] 0x000df678
     1.01%  qemu-img  libc-2.29.so        [.] __sigtimedwait
     0.80%  qemu-img  libc-2.29.so        [.] 0x000df700
     0.64%  qemu-img  qemu-img            [.] qemu_event_reset
     0.55%  qemu-img  libc-2.29.so        [.] 0x000df718
     0.52%  qemu-img  libpthread-2.29.so  [.] __libc_sigwait

THREAD #2

syscall (linux/aarch64/syscall.S:38)
qemu_futex_wait (./util/qemu-thread-posix.c:438)
qemu_event_wait (./util/qemu-thread-posix.c:442)
call_rcu_thread (./util/rcu.c:261)
qemu_thread_start (./util/qemu-thread-posix.c:498)
start_thread (pthread_create.c:486)
thread_start (linux/aarch64/clone.S:78)

./33302/stack                           ./33302/stack
[<0>] __switch_to+0xc0/0x218            [<0>] __switch_to+0xc0/0x218
[<0>] ptrace_stop+0x148/0x2b0           [<0>] ptrace_stop+0x148/0x2b0
[<0>] get_signal+0x5a4/0x730            [<0>] get_signal+0x5a4/0x730
[<0>] do_notify_resume+0x1c4/0x358      [<0>] do_notify_resume+0x1c4/0x358
[<0>] work_pending+0x8/0x10             [<0>] work_pending+0x8/0x10

root@d06-1:~$ perf report --stdio
# Overhead  Command   Shared Object       Symbol
#
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Alright, with a d06 aarch64 machine I was able to reproduce it after 8 attempts. I'll debug it today and provide feedback on my findings.

(gdb) bt full
#0  0xb0b2181c in __GI_ppoll (fds=0xce5ab770, nfds=4, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
        _x3tmp = 0
        _x0tmp = 187650583213936
        _x0 = 187650583213936
        _x3 = 0
        _x4tmp = 8
        _x1tmp = 4
        _x1 = 4
        _x4 = 8
        _x2tmp = <optimized out>
        _x2 = 0
        _x8 = 73
        _sys_result = <optimized out>
        _sys_result = <optimized out>
        sc_cancel_oldtype = 0
        sc_ret = <optimized out>
        tval = {tv_sec = 0, tv_nsec = 187650583137792}
#1  0xcd2a773c in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
No locals.
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at ./util/qemu-timer.c:322
No locals.
#3  0xcd2a8764 in os_host_main_loop_wait (timeout=-1) at ./util/main-loop.c:233
        context = 0xce599d90
        ret = <optimized out>
        context = <optimized out>
        ret = <optimized out>
#4  main_loop_wait (nonblocking=<optimized out>) at ./util/main-loop.c:497
        ret = <optimized out>
        timeout = 4294967295
        timeout_ns = <optimized out>
#5  0xcd1df454 in convert_do_copy (s=0xf9b2b1d8) at ./qemu-img.c:1981
        ret = <optimized out>
        i = <optimized out>
        n = <optimized out>
        sector_num = <optimized out>
        ret = <optimized out>
        i = <optimized out>
        n = <optimized out>
        sector_num = <optimized out>
#6  img_convert (argc=<optimized out>, argv=<optimized out>) at ./qemu-img.c:2457
        c = <optimized out>
        bs_i = <optimized out>
        flags = 16898
        src_flags = 0
        fmt = 0xf9b2bad1 "qcow2"
        out_fmt = <optimized out>
        cache = 0xcd2cb1c8 "unsafe"
        src_cache = 0xcd2ca9c0 "writeback"
        out_baseimg = <optimized out>
        out_filename = <optimized out>
        out_baseimg_param = <optimized out>
        snapshot_name = 0x0
        drv = <optimized out>
        proto_drv = <optimized out>
        bdi = {cluster_size = 65536, vm_state_offset = 32212254720, is_dirty = false, unallocated_blocks_are_zero = true, needs_compressed_writes = false}
        out_bs = <optimized out>
        opts = 0xce5ab390
        sn_opts = 0x0
        create_opts = 0xce5ab0c0
        open_opts = <optimized out>
        options = 0x0
        local_err = 0x0
        writethrough = false
        src_writethrough = false
        quiet = <optimized out>
        image_opts = false
        skip_create = false
        progress = <optimized out>
        tgt_image_opts = false
        ret = <optimized out>
        force_share = false
        explict_min_sparse = false
        s = {src = 0xce577240, src_sectors = 0xce577300, src_num = 1, total_sectors = 62914560, allocated_sectors = 9572096, allocated_done = 6541440, sector_num = 8863744, wr_offs = 8859776, status = BLK_DATA, sector_next_status = 8863744, target = 0xce5bd2a0, has_zero_init = true, compressed = false, unallocated_blocks_are_zero = true, target_has_backing = false, target_backing_sectors = -1, wr_in_order = true, copy_range = false, min_sparse = 8, alignment = 8, cluster_sectors = 128, buf_sectors = 4096, num_coroutines = 8, running_coroutines = 8, co = {0xce5ceda0, 0xce5cef50, 0xce5cf100, 0xce5cf2b0, 0xce5cf460, 0xce5cf610, 0xce5cf7c0, 0xce5cf970, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, wait_sector_num = {-1, 8859904, 8860928, 8863360, 8861952, 8862976, 8862592, 8861440, 0, 0, 0, 0, 0, 0, 0, 0}, lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0}, ret = -115}
        __PRETTY_FUNCTION__ = "img_convert"
#7  0xcd1d8400 in main (argc=7, argv=<optimized out>) at ./qemu-img.c:4976
        cmd = 0xcd34ad78
        cmdname = <optimized out>
        local_error = 0x0
        trace_file = 0x0
        c = <optimized out>
        long_options = {{name = 0xcd2cbbb0 "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0xcd2cbc78 "version", has_arg = 0, flag = 0x0, val = 86}, {name = 0xcd2cbc80 "trace", has_arg = 1, flag = 0x0, val = 84}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}}
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Alright, I couldn't reproduce this yet. I'm running the same test case on a 24-core box while causing lots of context switches and CPU migrations in parallel (trying to exhaust the logic). I'll let this run for some time to check.

Unfortunately this might be related to QEMU AIO BH locking/primitives and cache coherency in the HW in question (specs from: https://en.wikichip.org/wiki/hisilicon/kunpeng/hi1616):

l1$ size   8 MiB
l1d$ size  4 MiB
l1i$ size  4 MiB
l2$ size  32 MiB
l3$ size  64 MiB

for example when having 2 threads in different NUMA domains, or some other situation.

I can't simulate the same since I have a SoC with:

Cortex-A53 MPCore 24 cores,
L1 I/D = 32KB/32KB
L2 = 256KB
L3 = 4MB

and I'm not even close to the L1/L2/L3 cache numbers of the D06 =o).

Just got a note that I'll be able to reproduce this on the real HW; I will get back soon with real gdb debugging.
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Oh, nm on the virtual environment test: I just remembered we don't have 2nd-level (nested) KVM for aarch64 yet (at least on ARMv8 implementing the virtualization extensions). I'll try to reproduce in the real environment only.
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
Hello Liz,

I'll try to reproduce this issue in a Cortex-A53 aarch64 real environment (w/ 24 HW threads) AND in a virtual environment w/ lots of vCPUs... but, if it's a missing barrier - or the lack of atomicity and/or ordering in a primitive - then I'm afraid context switches between vCPUs might not behave the same as on real CPUs (IPIs are sent and handled differently, and the host kernel delays IPI delivery because of its own callbacks, before scheduling, etc...), and I might need a qemu dump from your environment.

Would that be feasible? Can you reproduce this nowadays? This bug has aged a little, so I'm not sure! Could you provide me the dump produced with the latest package available for your Ubuntu version? This way I have the debug symbols to work with.

Meanwhile, I'll be trying to reproduce on my side.
[Qemu-devel] [Bug 1805256] Re: qemu-img hangs on high core count ARM system
** Changed in: qemu (Ubuntu)
       Status: Confirmed => In Progress

** Changed in: qemu (Ubuntu)
     Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

** Changed in: qemu (Ubuntu)
   Importance: Undecided => Medium
[Qemu-devel] [Bug 1834113] Re: QEMU touchpad input erratic after wakeup from sleep
Avi,

Something I realized we were missing as feedback here - or maybe I missed checking previous comments - is how your mouse is being set up for the guest. Is it PS/2 emulated (the default), or is it given as a USB device (when the qemu cmd line has "-usb -device usb-tablet")? Also, are you using the SPICE protocol (perhaps with the USB redirection option)?

Are you able to tell which xserver-xorg-input-XX module is being used inside the guest? You will probably find that information in the Xorg log files (check whether you're using xf86-input-wacom, xserver-xorg-input-evdev or some other).

Another thing that comes to my mind as well: are you using power-saving features? I'm specifically concerned about the I2C bus. Using "powertop", you are able to change the "Runtime PM for I2C Adapter" option under the Tunables tab (turning the power management off). I would like to know if you are able to reproduce the issue without having power management enabled for I2C. You can try disabling only I2C first and then, as a second attempt, disabling all PM options.

From your host:

Device #1

[2.834320] input: WCOM488E:00 056A:488E Mouse as /devices/pci:00/:00:15.0/i2c_designware.0/i2c-1/i2c-WCOM488E:00/0018:056A:488E.0001/input/input12
[3.064686] input: Wacom HID 488E Finger as /devices/pci:00/:00:15.0/i2c_designware.0/i2c-1/i2c-WCOM488E:00/0018:056A:488E.0001/input/input17

Device #2

[2.834860] input: SYNA2393:00 06CB:7A13 Mouse as /devices/pci:00/:00:15.1/i2c_designware.1/i2c-6/i2c-SYNA2393:00/0018:06CB:7A13.0002/input/input13
[2.834929] input: SYNA2393:00 06CB:7A13 Touchpad as /devices/pci:00/:00:15.1/i2c_designware.1/i2c-6/i2c-SYNA2393:00/0018:06CB:7A13.0002/input/input14

Could you describe your input devices? How many mice, trackpads, pens, etc. do you have connected to the host?

Thanks! And sorry for so many questions =).

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1834113

Title:
  QEMU touchpad input erratic after wakeup from sleep

Status in QEMU:
  Incomplete
Status in libvirt package in Ubuntu:
  Incomplete
Status in qemu package in Ubuntu:
  Incomplete

Bug description:
  Using Ubuntu host and guest. Normally the touchpad works great. Within the last few days, suddenly, apparently after a wake from sleep, the touchpad will behave erratically. For example, it will take two clicks to select something, and when moving the cursor it will act as though it is dragging even with the button not clicked. A reboot fixes the issue temporarily.

  ProblemType: Bug
  DistroRelease: Ubuntu 19.04
  Package: qemu 1:3.1+dfsg-2ubuntu3.1
  Uname: Linux 5.1.14-050114-generic x86_64
  ApportVersion: 2.20.10-0ubuntu27
  Architecture: amd64
  CurrentDesktop: ubuntu:GNOME
  Date: Mon Jun 24 20:55:44 2019
  Dependencies:
  EcryptfsInUse: Yes
  InstallationDate: Installed on 2019-02-20 (124 days ago)
  InstallationMedia: Ubuntu 18.04 "Bionic" - Build amd64 LIVE Binary 20180608-09:38
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 002: ID 8087:0025 Intel Corp.
   Bus 001 Device 003: ID 0c45:671d Microdia
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: Dell Inc. Precision 5530
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.1.14-050114-generic root=UUID=18e8777c-1764-41e4-a19f-62476055de23 ro mem_sleep_default=deep mem_sleep_default=deep acpi_rev_override=1 scsi_mod.use_blk_mq=1 nouveau.modeset=0 nouveau.runpm=0 nouveau.blacklist=1 acpi_backlight=none acpi_osi=Linux acpi_osi=!
  SourcePackage: qemu
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 04/26/2019
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: 1.10.1
  dmi.board.name: 0FP2W2
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A00
  dmi.chassis.type: 10
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: dmi:bvnDellInc.:bvr1.10.1:bd04/26/2019:svnDellInc.:pnPrecision5530:pvr:rvnDellInc.:rn0FP2W2:rvrA00:cvnDellInc.:ct10:cvr:
  dmi.product.family: Precision
  dmi.product.name: Precision 5530
  dmi.product.sku: 087D
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1834113/+subscriptions
[Qemu-devel] [Bug 1830821] Re: Expose ARCH_CAP_MDS_NO in guest
*** This bug is a duplicate of bug 1828495 ***
https://bugs.launchpad.net/bugs/1828495

The commit referenced in https://bugs.launchpad.net/intel/+bug/1828495/comments/42 addresses exactly this bug.

--
https://bugs.launchpad.net/bugs/1830821
Title: Expose ARCH_CAP_MDS_NO in guest
Status in intel: New
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Confirmed
Status in qemu source package in Bionic: Confirmed
Status in qemu source package in Cosmic: Confirmed
Status in qemu source package in Disco: Confirmed

Bug description:
  Description: MDS_NO is bit 5 of ARCH_CAPABILITIES. Expose this bit to the guest.
  Target QEMU: 4.1

To manage notifications about this bug go to: https://bugs.launchpad.net/intel/+bug/1830821/+subscriptions
[Qemu-devel] [Bug 1830821] Re: Expose ARCH_CAP_MDS_NO in guest
*** This bug is a duplicate of bug 1828495 ***
https://bugs.launchpad.net/bugs/1828495

I'm marking this bug as a duplicate of LP: #1828495, since both ask for mitigation pass-through to i386 KVM guests and I'm preparing a fix for both at the same time.

** This bug has been marked a duplicate of bug 1828495
   [KVM][CLX] CPUID_7_0_EDX_ARCH_CAPABILITIES is not enabled in VM.

--
https://bugs.launchpad.net/bugs/1830821
Title: Expose ARCH_CAP_MDS_NO in guest
Status in intel: New
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Confirmed
Status in qemu source package in Bionic: Confirmed
Status in qemu source package in Cosmic: Confirmed
Status in qemu source package in Disco: Confirmed
[Qemu-devel] [Bug 1830821] Re: Expose ARCH_CAP_MDS_NO in guest
** Changed in: qemu (Ubuntu Disco)
       Status: Fix Released => Confirmed

** Changed in: qemu (Ubuntu Disco)
   Importance: Undecided => Wishlist

** Changed in: qemu (Ubuntu)
       Status: Fix Released => Confirmed

** Changed in: qemu (Ubuntu)
   Importance: Undecided => Wishlist

** Changed in: qemu (Ubuntu)
     Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)

--
https://bugs.launchpad.net/bugs/1830821
Title: Expose ARCH_CAP_MDS_NO in guest
Status in intel: New
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Confirmed
Status in qemu source package in Bionic: Confirmed
Status in qemu source package in Cosmic: Confirmed
Status in qemu source package in Disco: Confirmed
[Qemu-devel] [Bug 1830821] Re: Expose ARCH_CAP_MDS_NO in guest
If this effort goes ahead, it would be done together with: https://bugs.launchpad.net/intel/+bug/1828495

Please read comments https://bugs.launchpad.net/intel/+bug/1828495/comments/8 and https://bugs.launchpad.net/intel/+bug/1828495/comments/10

--
https://bugs.launchpad.net/bugs/1830821
Title: Expose ARCH_CAP_MDS_NO in guest
Status in intel: New
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Bionic: Confirmed
Status in qemu source package in Cosmic: Confirmed
Status in qemu source package in Disco: Fix Released
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
Thanks Christian! Will do!!

--
https://bugs.launchpad.net/bugs/1626972
Title: QEMU memfd_create fallback mechanism change for security drivers
Status in Ubuntu Cloud Archive: Invalid
Status in Ubuntu Cloud Archive mitaka series: Fix Committed
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Xenial: Fix Committed
Status in qemu source package in Yakkety: Fix Released
Status in qemu source package in Zesty: Fix Released

Bug description:

[Impact]

 * Live migration with the updated QEMU (from UCA) doesn't work with 3.13 kernels.
 * QEMU wrongly checks whether it can create /tmp/memfd-XXX files.
 * Apparmor will block access to /tmp/ and QEMU will fail to migrate.

[Test Case]

 * Install 2 Ubuntu Trusty (3.13) hosts + UCA Mitaka + apparmor rules.
 * Try to live-migrate from one to the other.
 * Apparmor will block creation of /tmp/memfd-XXX files.

[Regression Potential]

Pros:
 * Exhaustively tested this.
 * Worked with upstream on this fix.
 * I'm implementing a new vhost log mechanism for upstream.
 * One-line change to a blocker that is already broken.

Cons:
 * Could break live migration in other circumstances.

[Other Info]

 * Christian Ehrhardt has been following this.

ORIGINAL DESCRIPTION:

When libvirt uses apparmor and creates apparmor profiles for every virtual machine on the compute nodes, Mitaka QEMU (2.5, and upstream as well) uses a fallback mechanism for creating shared memory for live migrations. On 3.13 kernels, which don't have the memfd_create() system call, this fallback mechanism tries to create files in the /tmp/ directory and fails, so live migration doesn't work.

Trusty with kernel 3.13 + Mitaka with QEMU 2.5 + apparmor capability = can't live migrate.

From QEMU 2.5, the logic is:

    void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals,
                           int *fd)
    {
        if (memfd_create)...      ### only works with HWE kernels
        else                      ### 3.13 kernels, gets blocked by apparmor
            tmpdir = g_get_tmp_dir
            ...
            mfd = mkstemp(fname)
    }

And you can see the errors.

From the host trying to send the virtual machine:

2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

From the host trying to receive the virtual machine:

Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12565 comm="apparmor_parser"
Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12613 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 audit(1471289780.407:76): apparmor="DENIED" operation="mknod" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979871] type=1400 audit(1471289780.411:77): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 audit(1471289780.411:78): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0

When leaving libvirt without apparmor capabilities (thus not confining virtual machines on compute nodes), live migration works as expected, so, clearly, apparmor is stepping into
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
As far as I'm concerned, we have had enough testing already: upstream development/tests, Zesty and Yakkety. Christian, could you please move Xenial for me? I have some end users waiting for this. Thank you very much.

--
https://bugs.launchpad.net/bugs/1626972
Title: QEMU memfd_create fallback mechanism change for security drivers
Status in Ubuntu Cloud Archive: Invalid
Status in Ubuntu Cloud Archive mitaka series: Fix Committed
Status in QEMU: Fix Released
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Xenial: Fix Committed
Status in qemu source package in Yakkety: Fix Released
Status in qemu source package in Zesty: Fix Released
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
Yakkety verification (with the 3.13 kernel from Trusty, since a kernel older than 3.17, i.e. one without memfd_create(), is needed to reproduce the bug). This also verifies that the Ubuntu Cloud Archive repositories will be fine with these new packages (from Xenial / Yakkety).

## CURRENT

inaddy@(ykvm01):~$ apt-cache policy qemu-kvm
qemu-kvm:
  Installed: 1:2.6.1+dfsg-0ubuntu5.1
  Candidate: 1:2.6.1+dfsg-0ubuntu5.1

ykvm01 (sender):

Jan 11 11:34:35 ykvm01 kernel: type=1400 audit(1484141675.962:53): apparmor="DENIED" operation="mknod" profile="libvirt-7cdcb6c0-f85e-4639-912b-c785bd5992d9" name="/tmp/memfd-bF8new" pid=1934 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=111 ouid=111

inaddy@(ykvm01):~$ sudo virsh migrate --live guest qemu+ssh://ykvm02/system
error: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

ykvm02 (receiver):

Jan 11 11:39:31 ykvm02 kernel: type=1400 audit(1484141971.526:53): apparmor="DENIED" operation="mknod" profile="libvirt-7cdcb6c0-f85e-4639-912b-c785bd5992d9" name="/tmp/memfd-JZ6L9T" pid=2177 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=111 ouid=111

Note: the check was being done in the wrong place AND in the wrong situation, as I showed in this bug.

## PROPOSED

inaddy@(ykvm01):~$ apt-cache policy qemu-kvm
qemu-kvm:
  Installed: 1:2.6.1+dfsg-0ubuntu5.2
  Candidate: 1:2.6.1+dfsg-0ubuntu5.2

ykvm01 (sender):
ykvm02 (receiver):

inaddy@(ykvm02):~$ virsh list
 Id Name State
 1 guest running

It's all good.

verification-yakkety-done

** Tags removed: verification-needed
** Tags added: verification-done

--
https://bugs.launchpad.net/bugs/1626972
Title: QEMU memfd_create fallback mechanism change for security drivers
Status in Ubuntu Cloud Archive: Invalid
Status in Ubuntu Cloud Archive mitaka series: Fix Committed
Status in QEMU: Fix Committed
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Xenial: Fix Committed
Status in qemu source package in Yakkety: Fix Committed
Status in qemu source package in Zesty: Fix Released
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
Xenial verification (with the 3.13 kernel from Trusty, since a kernel older than 3.17, i.e. one without memfd_create(), is needed to reproduce the bug). This also verifies that the Ubuntu Cloud Archive repositories will be fine with these new packages (from Xenial / Yakkety).

## CURRENT

inaddy@(xkvm01):~$ apt-cache policy qemu-kvm
qemu-kvm:
  Installed: 1:2.5+dfsg-5ubuntu10.6
  Candidate: 1:2.5+dfsg-5ubuntu10.6

xkvm01 (sender):

Jan 11 01:07:54 xkvm01 kernel: type=1400 audit(1484104074.014:13): apparmor="DENIED" operation="mknod" profile="libvirt-7cdcb6c0-f85e-4639-912b-c785bd5992d9" name="/tmp/memfd-Jh5UhR" pid=2535 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=112 ouid=112

$ sudo virsh migrate --live guest qemu+ssh://xkvm02/system
error: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

xkvm02 (receiver):

Jan 11 01:08:23 xkvm02 kernel: type=1400 audit(1484104103.888:53): apparmor="DENIED" operation="mknod" profile="libvirt-7cdcb6c0-f85e-4639-912b-c785bd5992d9" name="/tmp/memfd-fc9rij" pid=2000 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=112 ouid=112

Note: the check was being done in the wrong place AND in the wrong situation, as I showed in this bug.

## PROPOSED

inaddy@(xkvm01):~$ apt-cache policy qemu-kvm
qemu-kvm:
  Installed: 1:2.5+dfsg-5ubuntu10.7
  Candidate: 1:2.5+dfsg-5ubuntu10.7

xkvm01 (sender):
xkvm02 (receiver):

inaddy@(xkvm02):~$ virsh list
 Id Name State
 1 guest running

It's all good.

verification-xenial-done

--
https://bugs.launchpad.net/bugs/1626972
Title: QEMU memfd_create fallback mechanism change for security drivers
Status in Ubuntu Cloud Archive: Invalid
Status in Ubuntu Cloud Archive mitaka series: Fix Committed
Status in QEMU: Fix Committed
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Xenial: Fix Committed
Status in qemu source package in Yakkety: Fix Committed
Status in qemu source package in Zesty: Fix Released
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
@jamespage, @cpaelzer: I'll verify this fix in a couple of days so it can be released. Thank you!

Rafael

--
https://bugs.launchpad.net/bugs/1626972
Title: QEMU memfd_create fallback mechanism change for security drivers
Status in Ubuntu Cloud Archive: Invalid
Status in Ubuntu Cloud Archive mitaka series: Fix Committed
Status in QEMU: Fix Committed
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Xenial: Fix Committed
Status in qemu source package in Yakkety: Fix Committed
Status in qemu source package in Zesty: Fix Released
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
Hello Antonio (@arcimboldo),

The fix only makes sense for newer QEMU versions (>= Xenial, like the one in the Mitaka Ubuntu Cloud Archive).

Note: the "migration check" is done in the vhost initialization functions, when devices are virtually attached to the virtual machine. If you are using kernel 3.13 with apparmor enabled, then all running instances have the "migration blocker" ON (because of this buggy migration check) and won't be able to live-migrate. Unfortunately there is an in-memory linked list telling QEMU that it has a blocker (with the reason). This blocker was added during instance startup and is only checked when the instance is live-migrated. Check this: http://pastebin.ubuntu.com/23517175/

If you started the instance on a host not running apparmor (or without the libvirt profile loaded, for example), the creation of /tmp/memfd-XXX files is not blocked during instance initialization. That won't set the "blocker flag" inside the running program and, if/when needed, live migration will be able to proceed.

This means that, after installing the new package, if you're using apparmor, yes, you will have to RESTART running instances that were affected by this bug in order to live-migrate them. Sorry for the bad news! Even if you remove the apparmor rules, the migration blocker is already set, and hacking the process's virtual memory to clear it would jeopardize the contents of that memory (potentially catastrophic, especially for a virtual machine).

--
https://bugs.launchpad.net/bugs/1626972
Title: QEMU memfd_create fallback mechanism change for security drivers
Status in Ubuntu Cloud Archive: Invalid
Status in Ubuntu Cloud Archive mitaka series: Fix Committed
Status in QEMU: Fix Committed
Status in qemu package in Ubuntu: Fix Released
Status in qemu source package in Xenial: Fix Committed
Status in qemu source package in Yakkety: Fix Committed
Status in qemu source package in Zesty: Fix Released
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
** Patch added: "zesty_qemu_2.6.1+dfsg-0ubuntu7.debdiff"
   https://bugs.launchpad.net/qemu/+bug/1626972/+attachment/4781485/+files/zesty_qemu_2.6.1+dfsg-0ubuntu7.debdiff
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
** Description changed:

- And, when libvirt starts using apparmor, and creating apparmor profiles
- for every virtual machine created in the compute nodes, mitaka qemu (2.5
- - and upstream also) uses a fallback mechanism for creating shared
- memory for live-migrations. This fall back mechanism, on kernels 3.13
- - that don't have memfd_create() system-call, try to create files on /tmp/
+ [Impact]
+
+  * Updated QEMU (from UCA) live migration doesn't work with 3.13 kernels.
+  * QEMU code checks if it can create /tmp/memfd-XXX files wrongly.
+  * Apparmor will block access to /tmp/ and QEMU will fail migrating.
+
+ [Test Case]
+
+  * Install 2 Ubuntu Trusty (3.13) + UCA Mitaka + apparmor rules.
+  * Try to live-migration from one to another.
+  * Apparmor will block creation of /tmp/memfd-XXX files.
+
+ [Regression Potential]
+
+ Pros:
+  * Exhaustively tested this.
+  * Worked with upstream on this fix.
+  * I'm implementing new vhost log mechanism for upstream.
+  * One line change to a blocker that is already broken.
+
+ Cons:
+  * To break live migration in other circumstances.
+
+ [Other Info]
+
+  * Christian Ehrhardt has been following this.
+
+ ORIGINAL DESCRIPTION:
+
+ When libvirt starts using apparmor, and creating apparmor profiles for
+ every virtual machine created in the compute nodes, mitaka qemu (2.5 -
+ and upstream also) uses a fallback mechanism for creating shared memory
+ for live-migrations. This fall back mechanism, on kernels 3.13 - that
+ don't have memfd_create() system-call, try to create files on /tmp/
  directory and fails.. causing live-migration not to work.

  Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability = can't live migrate.

  From qemu 2.5, logic is on :

  void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals, int *fd)
  {
- if (memfd_create)... ### only works with HWE kernels
+ if (memfd_create)... ### only works with HWE kernels

- else ### 3.13 kernels, gets blocked by apparmor
-tmpdir = g_get_tmp_dir
-...
-mfd = mkstemp(fname)
+ else ### 3.13 kernels, gets blocked by apparmor
+ tmpdir = g_get_tmp_dir
+ ...
+ mfd = mkstemp(fname)
  }

  And you can see the errors:

  From the host trying to send the virtual machine:

  2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
  2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

  From the host trying to receive the virtual machine:

  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12565 comm="apparmor_parser"
  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 audit(1471289780.407:76): apparmor="DENIED" operation="mknod" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979871] type=1400 audit(1471289780.411:77): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 audit(1471289780.411:78): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0

  When leaving libvirt without apparmor capabilities (thus not confining virtual machines on compute nodes, the live migration works as expected, so, clearly, apparmor is stepping into the live
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
Right now Zesty is behind Yakkety because of a Security Update. Not sure you need me to attach a debdiff for Zesty as well. Let me know.
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
Took some more time here because of LP: #1621269.

** Patch added: "yakkety_qemu_2.6.1+dfsg-0ubuntu5.2.debdiff"
   https://bugs.launchpad.net/qemu/+bug/1626972/+attachment/4781464/+files/yakkety_qemu_2.6.1+dfsg-0ubuntu5.2.debdiff

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1626972

Title:
  QEMU memfd_create fallback mechanism change for security drivers

Status in Ubuntu Cloud Archive:
  In Progress
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Xenial:
  In Progress
Status in qemu source package in Yakkety:
  In Progress
Status in qemu source package in Zesty:
  In Progress

Bug description:
  And, when libvirt starts using apparmor, and creating apparmor profiles for every virtual machine created in the compute nodes, mitaka qemu (2.5, and upstream also) uses a fallback mechanism for creating shared memory for live migrations. On 3.13 kernels, which don't have the memfd_create() system call, this fallback mechanism tries to create files in the /tmp/ directory and fails, causing live migration not to work.

  Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability = can't live migrate.

  From qemu 2.5, the logic is in:

  void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals, int *fd)
  {
      if (memfd_create)... ### only works with HWE kernels

      else ### 3.13 kernels, gets blocked by apparmor
          tmpdir = g_get_tmp_dir
          ...
          mfd = mkstemp(fname)
  }

  And you can see the errors:

  From the host trying to send the virtual machine:

  2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
  2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

  From the host trying to receive the virtual machine:

  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12565 comm="apparmor_parser"
  Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser"
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 audit(1471289780.407:76): apparmor="DENIED" operation="mknod" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979871] type=1400 audit(1471289780.411:77): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
  Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 audit(1471289780.411:78): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0

  When leaving libvirt without apparmor capabilities (thus not confining virtual machines on compute nodes), the live migration works as expected, so clearly apparmor is stepping into the live migration. I'm sure that virtual machines have to be confined and that this isn't the desired behaviour...

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1626972/+subscriptions
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
Thanks Christian,

Then I'll finish this SRU first. Will work in the vhost mmap log file right after.
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
** Patch added: "xenial_qemu_2.5+dfsg-5ubuntu10.7.debdiff"
   https://bugs.launchpad.net/qemu/+bug/1626972/+attachment/4781425/+files/xenial_qemu_2.5+dfsg-5ubuntu10.7.debdiff
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
** Changed in: cloud-archive Status: New => In Progress ** Changed in: cloud-archive Assignee: (unassigned) => Rafael David Tinoco (inaddy)
-- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1626972
Title: QEMU memfd_create fallback mechanism change for security drivers
Status in Ubuntu Cloud Archive: In Progress
Status in QEMU: In Progress
Status in qemu package in Ubuntu: In Progress
Status in qemu source package in Xenial: In Progress
Status in qemu source package in Yakkety: In Progress
Status in qemu source package in Zesty: In Progress
Bug description:
When libvirt starts using AppArmor and creates an AppArmor profile for every virtual machine on the compute nodes, Mitaka's QEMU (2.5, and upstream as well) uses a fallback mechanism to create the shared memory needed for live migration. On 3.13 kernels, which lack the memfd_create() system call, this fallback tries to create files in the /tmp/ directory and fails, so live migration does not work.
Trusty with kernel 3.13 + Mitaka with QEMU 2.5 + AppArmor enabled = no live migration.
From QEMU 2.5, the logic is:
    void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals, int *fd)
    {
        if (memfd_create)...          ### only works with HWE kernels
        else                          ### 3.13 kernels, gets blocked by AppArmor
            tmpdir = g_get_tmp_dir ...
            mfd = mkstemp(fname)
    }
And you can see the errors.
From the host trying to send the virtual machine:
2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory
From the host trying to receive the virtual machine:
Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12565 comm="apparmor_parser"
Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12613 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 audit(1471289780.407:76): apparmor="DENIED" operation="mknod" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979871] type=1400 audit(1471289780.411:77): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 audit(1471289780.411:78): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
When libvirt runs without AppArmor (so the virtual machines on the compute nodes are not confined), live migration works as expected, so AppArmor is clearly interfering with live migration. Virtual machines should be confined, so this is not the desired behaviour.
To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-archive/+bug/1626972/+subscriptions
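For readers less familiar with the failing path, here is a minimal sketch of the allocation pattern the bug description summarises. It is not QEMU's qemu_memfd_alloc(); the function name alloc_shared_fd is hypothetical. It tries memfd_create() through the raw syscall and, when that is unavailable (as on 3.13 kernels), falls back to an unlinked file created in $TMPDIR or /tmp, mirroring what g_get_tmp_dir() returns. That mkstemp() step is the operation AppArmor denies in the log above.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not QEMU's code: try memfd_create() first and, if the
 * kernel lacks it (e.g. 3.13), fall back to an unlinked file created in
 * $TMPDIR or /tmp. Under a per-VM AppArmor profile that fallback is the step
 * that gets denied (the mknod of /tmp/memfd-... in the audit log).
 */
int alloc_shared_fd(const char *name, size_t size)
{
    int fd = -1;

#ifdef SYS_memfd_create
    /* Raw syscall, so this builds even where libc has no wrapper. */
    fd = (int)syscall(SYS_memfd_create, name, 0);
#else
    (void)name;
#endif

    if (fd < 0) {
        const char *tmpdir = getenv("TMPDIR");
        char fname[4096];

        if (tmpdir == NULL || *tmpdir == '\0') {
            tmpdir = "/tmp";
        }
        snprintf(fname, sizeof(fname), "%s/memfd-XXXXXX", tmpdir);
        fd = mkstemp(fname);  /* the file creation AppArmor refuses */
        if (fd < 0) {
            return -1;
        }
        unlink(fname);        /* only the descriptor is needed */
    }

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On a kernel with memfd_create() the /tmp branch is never reached, which is why the report notes that HWE kernels are unaffected.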
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
For Ubuntu Xenial (Mitaka), Yakkety (Newton) and Zesty: commit 0d34fbabc1 fixes the issue for the kernel vhost-net backend. Kernel vhost-net does not use a shared log, so the memfd check is skipped and the AppArmor profiles no longer block live migration. Customers using vhost-user might still hit migration problems, but they are likely a small minority.
commit 0d34fbabc13891da41582b0823867dc5733fffef
Author: Rafael David Tinoco <rafael.tin...@canonical.com>
Date: Mon Oct 24 15:35:03 2016 +
    vhost: migration blocker only if shared log is used
    Commit 31190ed7 added a migration blocker in vhost_dev_init() to check if memfd would succeed. It is better if this blocker first checks if vhost backend requires shared log. This will avoid a situation where a blocker is added inappropriately (e.g. shared log allocation fails when vhost backend doesn't support it).
    Signed-off-by: Rafael David Tinoco <rafael.tin...@canonical.com>
    Reviewed-by: Marc-André Lureau <marcandre.lur...@redhat.com>
    Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
    Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 131f164..25bf67f 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1122,7 +1122,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     if (!(hdev->features & (0x1ULL << VHOST_F_LOG_ALL))) {
         error_setg(&hdev->migration_blocker,
                    "Migration disabled: vhost lacks VHOST_F_LOG_ALL feature.");
-    } else if (!qemu_memfd_check()) {
+    } else if (vhost_dev_log_is_shared(hdev) && !qemu_memfd_check()) {
         error_setg(&hdev->migration_blocker,
                    "Migration disabled: failed to allocate shared memory");
     }
I am still finishing the "final" upstream fix, but it might not be suitable for an SRU, since it adds features to QEMU (and likely to libvirt) so that the vhost log file can be passed in as an already-opened file descriptor.
This will require changes in libvirt and nova-compute, but it will finally allow the security driver to apply rules to the vhost log file for shared logs (used mostly by vhost-user backends).
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
** Changed in: qemu (Ubuntu Xenial) Status: New => In Progress ** Changed in: qemu (Ubuntu Yakkety) Status: New => In Progress ** Changed in: qemu (Ubuntu Xenial) Assignee: (unassigned) => Rafael David Tinoco (inaddy) ** Changed in: qemu (Ubuntu Yakkety) Assignee: (unassigned) => Rafael David Tinoco (inaddy)
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
** Also affects: qemu (Ubuntu) Importance: Undecided Status: New ** Changed in: qemu (Ubuntu) Status: New => In Progress ** Changed in: qemu (Ubuntu) Assignee: (unassigned) => Rafael David Tinoco (inaddy)
Re: [Qemu-devel] [PATCH] vhost: secure vhost shared log files using argv paremeter
Hello,
> On Tue, Nov 8, 2016 at 4:49 PM Rafael David Tinoco
> <rafael.tin...@canonical.com> wrote:
> Hello Michael, André,
>
> Could you do a quick review before a final submission?
>
> http://paste.ubuntu.com/23446279/
> ...
> (André)
> Could it be only a filename? This would simplify testing.
> (Michael)
> When vhostlog is not specified, can we just use memfd as we did?
>
> Michael said:
> https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg08197.html
> I think that the best approach is to allow passing in the fd, not the file
> path. If not passed, use memfd.
Missed this one.
> I do agree :)
Sounds good. I see that the new approach is to let the managing library create the files and pass only the file descriptors; this way the security rules apply to the library itself rather than to the QEMU processes.
> Do we really need to give a path? (pass fd with -add-fd/qmp add-fd)
I guess not. So, for shared logs:
- vhostlogfd has to be provided.
- if vhostlogfd is not provided, use memfd (we don't want writes in /tmp; should I remove the fallback mechanism from the memfd logic?).
- if memfd fails, the log can't be shared/created and there is a migration blocker.
André, Michael, I'll work on that and send the patches soon. Meanwhile, could you push "vhost: migration blocker only if shared log is used" so I can backport it to Debian?
Thank you,
-Rafael Tinoco
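The three-step policy proposed above can be sketched as follows. This is a hypothetical illustration: choose_vhost_log_fd, LogChoice, memfd_alloc_fn and memfd_unavailable are invented names, not QEMU API, and the memfd allocator is passed in as a function pointer only so that the "kernel without memfd" case can be exercised. Note that, as the thread suggests, there is no /tmp fallback in this sketch.

```c
#include <stddef.h>

/* Hypothetical allocator type: returns a new fd, or a negative value. */
typedef int (*memfd_alloc_fn)(size_t size);

typedef struct {
    int fd;              /* descriptor to use for the shared log, or -1 */
    const char *blocker; /* non-NULL: reason migration is blocked */
} LogChoice;

/* Stub standing in for a kernel without memfd_create(), e.g. 3.13. */
int memfd_unavailable(size_t size)
{
    (void)size;
    return -1;
}

/*
 * 1. Use the fd handed over by the management layer (e.g. libvirt), whose
 *    file the security driver has already labelled.
 * 2. Otherwise try memfd.
 * 3. If that fails too, the log cannot be shared: block migration.
 */
LogChoice choose_vhost_log_fd(int vhostlogfd, size_t size,
                              memfd_alloc_fn try_memfd)
{
    LogChoice c = { -1, NULL };

    if (vhostlogfd >= 0) {
        c.fd = vhostlogfd;
        return c;
    }
    c.fd = try_memfd(size);
    if (c.fd < 0) {
        c.fd = -1;
        c.blocker = "Migration disabled: failed to allocate shared memory";
    }
    return c;
}
```

Keeping the decision separate from the allocation makes the "fd provided" path independent of kernel support for memfd_create().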
Re: [Qemu-devel] [PATCH] vhost: secure vhost shared log files using argv paremeter
Hello Michael, André,
Could you do a quick review before a final submission? http://paste.ubuntu.com/23446279/
- I split the commits into 1) bugfix, 2) new util with test, 3) vhostlog.
The unit test passes fds between two processes and asserts the contents of the mmap buffer coming from the "vhostlog" util (mmap-file).
Your final comments on "vhostlog" were:
>> Argv examples:
>>
>> -netdev tap,id=net0,vhost=on
>> -netdev tap,id=net0,vhost=on,vhostlog=/tmp/guest.log
>> -netdev tap,id=net0,vhost=on,vhostlog=/tmp
(André)
> Could it be only a filename? This would simplify testing.
(Michael)
> When vhostlog is not specified, can we just use memfd as we did?
I'm going to change this to:
1 - if vhostlog is not provided, the shared log can't be used; use memfd.
2 - for shared logs, vhostlog has to be provided as a "file"?
Should I also keep allowing vhostlog to be a directory? (I know we unlink the file, so it might not be needed, BUT a static file might have a race condition between different instances, and providing a directory in which random files are created might be a better approach.)
Is there anything else?
Thank you
Rafael Tinoco
On Mon, Oct 31, 2016 at 8:30 PM, Michael S. Tsirkin <m...@redhat.com> wrote: > On Mon, Oct 31, 2016 at 08:35:33AM -0200, Rafael David Tinoco wrote: >> On Sun, Oct 30, 2016 at 5:26 PM, Michael S. Tsirkin <m...@redhat.com> wrote: >> > >> > On Sat, Oct 22, 2016 at 07:00:41AM +, Rafael David Tinoco wrote: >> > > Commit 31190ed7 added a migration blocker in vhost_dev_init() to >> > > check if memfd would succeed. It is better if this blocker first >> > > checks if vhost backend requires shared log. This will avoid a >> > > situation where a blocker is added inappropriately (e.g. shared >> > > log allocation fails when vhost backend doesn't support it). >> > >> > Sounds like a bugfix but I'm not sure. Can this part be split >> > out in a patch by itself? >> >> Already sent some days ago (and pointed by Marc today).
>> >> > > Commit: 35f9b6e added a fallback mechanism for systems not supporting >> > > memfd_create syscall (started being supported since 3.17). >> > > >> > > Backporting memfd_create might not be accepted for distros relying >> > > on older kernels. Nowadays there is no way for security driver >> > > to discover memfd filename to be created: /memfd-XX. >> > > >> > > Also, because vhost log file descriptors can be passed to other >> > > processes, after discussion, we thought it is best to back mmap by >> > > using files that can be placed into a specific directory: this commit >> > > creates "vhostlog" argv parameter for such purpose. This will allow >> > > security drivers to operate on those files appropriately. >> > > >> > > Argv examples: >> > > >> > > -netdev tap,id=net0,vhost=on >> > > -netdev tap,id=net0,vhost=on,vhostlog=/tmp/guest.log >> > > -netdev tap,id=net0,vhost=on,vhostlog=/tmp >> > > >> > > For vhost backends supporting shared logs, if vhostlog is non-existent, >> > > or a directory, random files are going to be created in the specified >> > > directory (or, for non-existent, in tmpdir). If vhostlog is specified, >> > > the filepath is always used when allocating vhost log files. >> > >> > When vhostlog is not specified, can we just use memfd as we did? >> > >> >> This was my approach on a "pastebin" example before this patch (in the >> discussion thread we had). Problem goes back to when vhost log file >> descriptor is shared with some vhost-user implementation - like the >> interface allows to - and the security driver labelling issue. IMO, >> yes, we could let vhostlog to specify a log file, and, if not >> specified, assume memfd is ok to be used. >> >> Please let me know if you - and Marc - want me to keep using memfd. >> I'll create the mmap-file tests and files in a different commit, like >> Marc has asked for, and will propose the patch again by the end of >> this week. 
> > I think that the best approach is to allow passing in the fd, > not the file path. If not passed, use memfd. > > -- > MST
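Michael's suggestion, and the unit test Rafael describes, rely on handing an open descriptor from one process to another instead of a path. A minimal sketch of that mechanism uses SCM_RIGHTS over an AF_UNIX socket. The helper names send_fd, recv_fd and shared_log_roundtrip are hypothetical, and for brevity both ends live in one process connected by socketpair(); a second process would receive the descriptor the same way. It checks that a byte written through one mapping is visible through a mapping of the received fd.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a connected AF_UNIX socket (SCM_RIGHTS). */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; the kernel installs a new fd in this process. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctrl;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd = -1;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}

/* Returns 1 if a write through the sender's mapping is visible through a
 * mapping of the received descriptor, 0 otherwise. */
int shared_log_roundtrip(void)
{
    const size_t len = 4096;
    char path[] = "/tmp/vhostlog-XXXXXX";
    int sv[2], fd, rfd = -1, ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return 0;
    }
    fd = mkstemp(path);
    if (fd >= 0) {
        unlink(path);
        if (ftruncate(fd, (off_t)len) == 0 && send_fd(sv[0], fd) == 0 &&
            (rfd = recv_fd(sv[1])) >= 0) {
            unsigned char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            unsigned char *b = mmap(NULL, len, PROT_READ, MAP_SHARED, rfd, 0);
            if (a != MAP_FAILED && b != MAP_FAILED) {
                a[10] = 0xAB;          /* e.g. mark one page dirty in the log */
                ok = (b[10] == 0xAB);  /* same pages, so the receiver sees it */
            }
            if (a != MAP_FAILED) munmap(a, len);
            if (b != MAP_FAILED) munmap(b, len);
            close(rfd);
        }
        close(fd);
    }
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Passing the fd in this way means the process that creates the file (for example libvirt, or QEMU's -add-fd mechanism mentioned later in the thread) decides where it lives and how it is labelled; the receiver never needs to open a path.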
Re: [Qemu-devel] [PATCH] vhost: secure vhost shared log files using argv paremeter
On Sun, Oct 30, 2016 at 5:26 PM, Michael S. Tsirkin <m...@redhat.com> wrote: > > On Sat, Oct 22, 2016 at 07:00:41AM +0000, Rafael David Tinoco wrote: > > Commit 31190ed7 added a migration blocker in vhost_dev_init() to > > check if memfd would succeed. It is better if this blocker first > > checks if vhost backend requires shared log. This will avoid a > > situation where a blocker is added inappropriately (e.g. shared > > log allocation fails when vhost backend doesn't support it). > > Sounds like a bugfix but I'm not sure. Can this part be split > out in a patch by itself? Already sent some days ago (and pointed by Marc today). > > Commit: 35f9b6e added a fallback mechanism for systems not supporting > > memfd_create syscall (started being supported since 3.17). > > > > Backporting memfd_create might not be accepted for distros relying > > on older kernels. Nowadays there is no way for security driver > > to discover memfd filename to be created: /memfd-XX. > > > > Also, because vhost log file descriptors can be passed to other > > processes, after discussion, we thought it is best to back mmap by > > using files that can be placed into a specific directory: this commit > > creates "vhostlog" argv parameter for such purpose. This will allow > > security drivers to operate on those files appropriately. > > > > Argv examples: > > > > -netdev tap,id=net0,vhost=on > > -netdev tap,id=net0,vhost=on,vhostlog=/tmp/guest.log > > -netdev tap,id=net0,vhost=on,vhostlog=/tmp > > > > For vhost backends supporting shared logs, if vhostlog is non-existent, > > or a directory, random files are going to be created in the specified > > directory (or, for non-existent, in tmpdir). If vhostlog is specified, > > the filepath is always used when allocating vhost log files. > > When vhostlog is not specified, can we just use memfd as we did? > This was my approach on a "pastebin" example before this patch (in the discussion thread we had). 
The problem arises when the vhost log file descriptor is shared with a vhost-user implementation (as the interface allows), which brings back the security driver labelling issue. IMO, yes, we could let vhostlog specify a log file and, if it is not specified, assume memfd can be used. Please let me know if you (and Marc) want me to keep using memfd. I'll create the mmap-file tests and files in a separate commit, as Marc asked, and will propose the patch again by the end of this week.
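The "vhostlog" semantics quoted above can be read in more than one way. The following hypothetical helper (open_vhostlog is not part of the patch) sketches one reading. It assumes that "non-existent" means the option was not given at all. In that case, or when vhostlog names a directory, a random, unlinked file is created in $TMPDIR/tmp or in that directory. Any other value is used as the log file path itself.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, one reading of the proposed "vhostlog" semantics:
 *   vhostlog == NULL   -> random, unlinked file in $TMPDIR or /tmp
 *   vhostlog is a dir  -> random, unlinked file inside that directory
 *   anything else      -> use vhostlog itself as the log file path
 * Returns an open fd, or -1 on error.
 */
int open_vhostlog(const char *vhostlog)
{
    const char *dir = NULL;
    struct stat st;

    if (vhostlog == NULL) {
        dir = getenv("TMPDIR");
        if (dir == NULL || *dir == '\0') {
            dir = "/tmp";
        }
    } else if (stat(vhostlog, &st) == 0 && S_ISDIR(st.st_mode)) {
        dir = vhostlog;
    }

    if (dir != NULL) {
        char tmpl[4096];
        int fd;

        snprintf(tmpl, sizeof(tmpl), "%s/vhostlog-XXXXXX", dir);
        fd = mkstemp(tmpl);
        if (fd >= 0) {
            unlink(tmpl);  /* random name: no clash between guests */
        }
        return fd;
    }

    /* Fixed path: easy to label in a security profile, but two QEMU
     * instances given the same path would race on it. */
    return open(vhostlog, O_RDWR | O_CREAT, 0600);
}
```

The directory form trades a predictable file name for freedom from the cross-instance race mentioned in the thread. A security profile would then need a rule for the whole directory rather than for a single file.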
[Qemu-devel] [PATCH] vhost: migration blocker only if shared log is used
Commit 31190ed7 added a migration blocker in vhost_dev_init() to check if memfd would succeed. It is better if this blocker first checks if vhost backend requires shared log. This will avoid a situation where a blocker is added inappropriately (e.g. shared log allocation fails when vhost backend doesn't support it).
Signed-off-by: Rafael David Tinoco <rafael.tin...@canonical.com>
Reviewed-by: Marc-André Lureau <marcandre.lur...@redhat.com>
---
 hw/virtio/vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index bd051ab..742d0aa 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1122,7 +1122,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     if (!(hdev->features & (0x1ULL << VHOST_F_LOG_ALL))) {
         error_setg(&hdev->migration_blocker,
                    "Migration disabled: vhost lacks VHOST_F_LOG_ALL feature.");
-    } else if (!qemu_memfd_check()) {
+    } else if (vhost_dev_log_is_shared(hdev) && !qemu_memfd_check()) {
         error_setg(&hdev->migration_blocker,
                    "Migration disabled: failed to allocate shared memory");
     }
--
2.9.3
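The effect of the one-line change can be summarised as a pure decision function. This is an illustrative sketch, not QEMU code: vhost_migration_blocker and its boolean parameters are hypothetical stand-ins for the feature test, vhost_dev_log_is_shared(hdev) and qemu_memfd_check() in the diff. Because of short-circuit evaluation, the memfd probe is only consulted when the backend actually shares its log.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the check after this patch; the booleans stand in
 * for the VHOST_F_LOG_ALL feature test, vhost_dev_log_is_shared(hdev) and
 * qemu_memfd_check(). Returns the blocker message, or NULL for no blocker.
 */
const char *vhost_migration_blocker(bool has_log_all, bool log_is_shared,
                                    bool memfd_ok)
{
    if (!has_log_all) {
        return "Migration disabled: vhost lacks VHOST_F_LOG_ALL feature.";
    }
    if (log_is_shared && !memfd_ok) {
        return "Migration disabled: failed to allocate shared memory";
    }
    return NULL;
}
```

Per the Launchpad comment in this thread, kernel vhost-net does not share its log, so on a 3.13 host without memfd it now gets no blocker. A vhost-user backend that shares its log is still blocked in the same situation.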
[Qemu-devel] [Bug 1626972] Fwd: [PATCH] vhost: secure vhost shared log files using argv paremeter
> Begin forwarded message: > > From: Rafael David Tinoco <rafael.tin...@canonical.com> > Subject: Re: [Qemu-devel] [PATCH] vhost: secure vhost shared log files using > argv paremeter > Date: October 22, 2016 at 19:52:31 GMT-2 > To: Marc-André Lureau <marcandre.lur...@gmail.com> > Cc: Rafael David Tinoco <rafael.tin...@canonical.com>, qemu-devel > <qemu-devel@nongnu.org> > > Hello, > >> On Oct 22, 2016, at 05:18, Marc-André Lureau <marcandre.lur...@gmail.com> >> wrote: >> >> Hi >> >> On Sat, Oct 22, 2016 at 10:01 AM Rafael David Tinoco >> <rafael.tin...@canonical.com> wrote: >> Commit 31190ed7 added a migration blocker in vhost_dev_init() to >> check if memfd would succeed. It is better if this blocker first >> checks if vhost backend requires shared log. This will avoid a >> situation where a blocker is added inappropriately (e.g. shared >> log allocation fails when vhost backend doesn't support it). >> >> Could you make this a seperate patch? > > Just did, in another e-mail, cc'ing you. > >> Argv examples: >> >>-netdev tap,id=net0,vhost=on >>-netdev tap,id=net0,vhost=on,vhostlog=/tmp/guest.log >>-netdev tap,id=net0,vhost=on,vhostlog=/tmp >> >> Could it be only a filename? This would simplify testing. > > It could. Should I keep the /tmp/ logic if no vhostlog arg is present > ? Or you think it should fail if no arg is given ? I'm afraid of backward > compatibility when back-porting this to older qemu versions on stable > releases (like my case: I'll backport this to ~3 different versions). > >> For vhost backends supporting shared logs, if vhostlog is non-existent, >> or a directory, random files are going to be created in the specified >> directory (or, for non-existent, in tmpdir). If vhostlog is specified, >> the filepath is always used when allocating vhost log files. >> >> >> Regarding testing, you add utility code mmap-file, could you make this a >> seperate commit, with unit tests? >> > > Sure, I'll work on it. > >> thanks > > Thank u! 
> > -Rafael Tinoco
[Qemu-devel] [Bug 1626972] Fwd: [PATCH] vhost: secure vhost shared log files using argv paremeter
> Begin forwarded message:
>
> From: Marc-André Lureau <marcandre.lur...@gmail.com>
> Subject: Re: [Qemu-devel] [PATCH] vhost: secure vhost shared log files using argv paremeter
> Date: October 22, 2016 at 05:18:02 GMT-2
> To: Rafael David Tinoco <rafael.tin...@canonical.com>
> Cc: QEMU <qemu-devel@nongnu.org>
>
> Hi
>
> On Sat, Oct 22, 2016 at 10:01 AM Rafael David Tinoco <rafael.tin...@canonical.com> wrote:
> Commit 31190ed7 added a migration blocker in vhost_dev_init() to
> check if memfd would succeed. It is better if this blocker first
> checks if vhost backend requires shared log. This will avoid a
> situation where a blocker is added inappropriately (e.g. shared
> log allocation fails when vhost backend doesn't support it).
>
> Could you make this a separate patch?
>
> Commit: 35f9b6e added a fallback mechanism for systems not supporting
> memfd_create syscall (started being supported since 3.17).
>
> Backporting memfd_create might not be accepted for distros relying
> on older kernels. Nowadays there is no way for security driver
> to discover memfd filename to be created: /memfd-XX.
>
> Also, because vhost log file descriptors can be passed to other
> processes, after discussion, we thought it is best to back mmap by
> using files that can be placed into a specific directory: this commit
> creates "vhostlog" argv parameter for such purpose. This will allow
> security drivers to operate on those files appropriately.
>
> Argv examples:
>
> -netdev tap,id=net0,vhost=on
> -netdev tap,id=net0,vhost=on,vhostlog=/tmp/guest.log
> -netdev tap,id=net0,vhost=on,vhostlog=/tmp
>
> Could it be only a filename? This would simplify testing.
>
>
> For vhost backends supporting shared logs, if vhostlog is non-existent,
> or a directory, random files are going to be created in the specified
> directory (or, for non-existent, in tmpdir). If vhostlog is specified,
> the filepath is always used when allocating vhost log files.
>
>
> Regarding testing, you add utility code mmap-file, could you make this a
> separate commit, with unit tests?
>
> thanks
>
> Signed-off-by: Rafael David Tinoco <rafael.tin...@canonical.com>
> ---
>  hw/net/vhost_net.c        |   4 +-
>  hw/scsi/vhost-scsi.c      |   2 +-
>  hw/virtio/vhost-vsock.c   |   2 +-
>  hw/virtio/vhost.c         |  41 +++--
>  include/hw/virtio/vhost.h |   4 +-
>  include/net/vhost_net.h   |   1 +
>  include/qemu/mmap-file.h  |  10 +++
>  net/tap.c                 |   6 ++
>  qapi-schema.json          |   3 +
>  qemu-options.hx           |   3 +-
>  util/Makefile.objs        |   1 +
>  util/mmap-file.c          | 153 ++
>  12 files changed, 207 insertions(+), 23 deletions(-)
>  create mode 100644 include/qemu/mmap-file.h
>  create mode 100644 util/mmap-file.c
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index f2d49ad..d650c92 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -171,8 +171,8 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options)
>          net->dev.vq_index = net->nc->queue_index * net->dev.nvqs;
>      }
>
> -    r = vhost_dev_init(&net->dev, options->opaque,
> -                       options->backend_type, options->busyloop_timeout);
> +    r = vhost_dev_init(&net->dev, options->opaque, options->backend_type,
> +                       options->busyloop_timeout, options->vhostlog);
>      if (r < 0) {
>          goto fail;
>      }
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index 5b26946..5dc3d30 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -248,7 +248,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
>      s->dev.backend_features = 0;
>
>      ret = vhost_dev_init(&s->dev, (void *)(uintptr_t)vhostfd,
> -                         VHOST_BACKEND_TYPE_KERNEL, 0);
> +                         VHOST_BACKEND_TYPE_KERNEL, 0, NULL);
>      if (ret < 0) {
>          error_setg(errp, "vhost-scsi: vhost initialization failed: %s",
>                     strerror(-ret));
> diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
> index b481562..6cf6081 100644
> --- a/hw/virtio/vhost-vsock.c
> +++ b/hw/virtio/vhost-vsock.c
> @@ -342,7 +342,7 @@ static void vhost_vsock_device_realize(DeviceState *dev, Error **errp)
>      vsock->vhost_dev.nvqs = ARRAY_SIZE(vsock->vhost_vqs);
>      vsock->vhost_dev.vqs = vsock->vhost_vqs;
>      ret =
Re: [Qemu-devel] [PATCH] vhost: secure vhost shared log files using argv paremeter
Hello, > On Oct 22, 2016, at 05:18, Marc-André Lureau <marcandre.lur...@gmail.com> > wrote: > > Hi > > On Sat, Oct 22, 2016 at 10:01 AM Rafael David Tinoco > <rafael.tin...@canonical.com> wrote: > Commit 31190ed7 added a migration blocker in vhost_dev_init() to > check if memfd would succeed. It is better if this blocker first > checks if vhost backend requires shared log. This will avoid a > situation where a blocker is added inappropriately (e.g. shared > log allocation fails when vhost backend doesn't support it). > > Could you make this a separate patch? Just did, in another e-mail, cc'ing you. > Argv examples: > > -netdev tap,id=net0,vhost=on > -netdev tap,id=net0,vhost=on,vhostlog=/tmp/guest.log > -netdev tap,id=net0,vhost=on,vhostlog=/tmp > > Could it be only a filename? This would simplify testing. It could. Should I keep the /tmp/ logic if no vhostlog arg is present? Or do you think it should fail if no arg is given? I'm worried about backward compatibility when backporting this to older QEMU versions in stable releases (like my case: I'll backport this to ~3 different versions). > For vhost backends supporting shared logs, if vhostlog is non-existent, > or a directory, random files are going to be created in the specified > directory (or, for non-existent, in tmpdir). If vhostlog is specified, > the filepath is always used when allocating vhost log files. > > > Regarding testing, you add utility code mmap-file, could you make this a > separate commit, with unit tests? > Sure, I'll work on it. > thanks Thank you! -Rafael Tinoco
[Qemu-devel] [PATCH] vhost: secure vhost shared log files using argv paremeter
Commit 31190ed7 added a migration blocker in vhost_dev_init() to check whether memfd would succeed. It is better if this blocker first checks whether the vhost backend requires a shared log. This avoids a situation where a blocker is added inappropriately (e.g. shared log allocation fails when the vhost backend doesn't support it). --- hw/virtio/vhost.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index bd051ab..742d0aa 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -1122,7 +1122,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, if (!(hdev->features & (0x1ULL << VHOST_F_LOG_ALL))) { error_setg(&hdev->migration_blocker, "Migration disabled: vhost lacks VHOST_F_LOG_ALL feature."); -} else if (!qemu_memfd_check()) { +} else if (vhost_dev_log_is_shared(hdev) && !qemu_memfd_check()) { error_setg(&hdev->migration_blocker, "Migration disabled: failed to allocate shared memory"); } -- 2.9.3
[Qemu-devel] [PATCH] vhost: secure vhost shared log files using argv paremeter
Commit 31190ed7 added a migration blocker in vhost_dev_init() to check whether memfd would succeed. It is better if this blocker first checks whether the vhost backend requires a shared log. This avoids a situation where a blocker is added inappropriately (e.g. shared log allocation fails when the vhost backend doesn't support it). Commit 35f9b6e added a fallback mechanism for systems not supporting the memfd_create syscall (available since Linux 3.17). Backporting memfd_create might not be accepted by distros relying on older kernels, and currently there is no way for a security driver to discover the memfd filename to be created: /memfd-XX. Also, because vhost log file descriptors can be passed to other processes, after discussion we thought it best to back the mmap with files that can be placed in a specific directory: this commit adds a "vhostlog" argv parameter for that purpose. This allows security drivers to operate on those files appropriately. Argv examples: -netdev tap,id=net0,vhost=on -netdev tap,id=net0,vhost=on,vhostlog=/tmp/guest.log -netdev tap,id=net0,vhost=on,vhostlog=/tmp For vhost backends supporting shared logs, if vhostlog is absent or names a directory, random files are created in that directory (or, when vhostlog is absent, in tmpdir). If vhostlog names a file, that file path is always used when allocating vhost log files.
Signed-off-by: Rafael David Tinoco <rafael.tin...@canonical.com> --- hw/net/vhost_net.c | 4 +- hw/scsi/vhost-scsi.c | 2 +- hw/virtio/vhost-vsock.c | 2 +- hw/virtio/vhost.c | 41 +++-- include/hw/virtio/vhost.h | 4 +- include/net/vhost_net.h | 1 + include/qemu/mmap-file.h | 10 +++ net/tap.c | 6 ++ qapi-schema.json | 3 + qemu-options.hx | 3 +- util/Makefile.objs | 1 + util/mmap-file.c | 153 ++ 12 files changed, 207 insertions(+), 23 deletions(-) create mode 100644 include/qemu/mmap-file.h create mode 100644 util/mmap-file.c diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index f2d49ad..d650c92 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -171,8 +171,8 @@ struct vhost_net *vhost_net_init(VhostNetOptions *options) net->dev.vq_index = net->nc->queue_index * net->dev.nvqs; } -r = vhost_dev_init(&net->dev, options->opaque, - options->backend_type, options->busyloop_timeout); +r = vhost_dev_init(&net->dev, options->opaque, options->backend_type, + options->busyloop_timeout, options->vhostlog); if (r < 0) { goto fail; } diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c index 5b26946..5dc3d30 100644 --- a/hw/scsi/vhost-scsi.c +++ b/hw/scsi/vhost-scsi.c @@ -248,7 +248,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp) s->dev.backend_features = 0; ret = vhost_dev_init(&s->dev, (void *)(uintptr_t)vhostfd, - VHOST_BACKEND_TYPE_KERNEL, 0); + VHOST_BACKEND_TYPE_KERNEL, 0, NULL); if (ret < 0) { error_setg(errp, "vhost-scsi: vhost initialization failed: %s", strerror(-ret)); diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c index b481562..6cf6081 100644 --- a/hw/virtio/vhost-vsock.c +++ b/hw/virtio/vhost-vsock.c @@ -342,7 +342,7 @@ static void vhost_vsock_device_realize(DeviceState *dev, Error **errp) vsock->vhost_dev.nvqs = ARRAY_SIZE(vsock->vhost_vqs); vsock->vhost_dev.vqs = vsock->vhost_vqs; ret = vhost_dev_init(&vsock->vhost_dev, (void *)(uintptr_t)vhostfd, - VHOST_BACKEND_TYPE_KERNEL, 0); + VHOST_BACKEND_TYPE_KERNEL, 0, NULL); if (ret < 0) { 
error_setg_errno(errp, -ret, "vhost-vsock: vhost_dev_init failed"); goto err_virtio; diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index bd051ab..d874ebb 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -20,7 +20,7 @@ #include "qemu/atomic.h" #include "qemu/range.h" #include "qemu/error-report.h" -#include "qemu/memfd.h" +#include "qemu/mmap-file.h" #include <linux/vhost.h> #include "exec/address-spaces.h" #include "hw/virtio/virtio-bus.h" @@ -326,7 +326,7 @@ static uint64_t vhost_get_log_size(struct vhost_dev *dev) return log_size; } -static struct vhost_log *vhost_log_alloc(uint64_t size, bool share) +static struct vhost_log *vhost_log_alloc(char *path, uint64_t size, bool share) { struct vhost_log *log; uint64_t logsize = size * sizeof(*(log->log)); @@ -334,9 +334,7 @@ static struct vhost_log *vhost_log_alloc(uint64_t size, bool share) log = g_new0(struct vhost_log, 1); if (share) { -log->log = qemu_memfd_alloc("vhos
Re: [Qemu-devel] [Bug 1626972] Re: [PATCH] util: secure memfd_create fallback mechanism
gard printfs and minor problems) Note: I'm basically removing the fallback mechanism from memfd, creating a generic qemu_mmap_XXX implementation, adding a vhostlog parameter to the tap command line AND changing the decision on what to use: if vhostlog is present on the command line, qemu_mmap_XXX on vhostlog is used. If it is a directory, a random file is created inside it. If it is a file, the file is used. If no vhostlog is given (the default until libvirt is changed), it first tries memfd (available on all newer kernels) and, if that is not possible, falls back to the qemu_mmap mechanism, creating random files in the "tmp" directory. PS: Remember that this is because of SELinux/AppArmor labelling of tmp files (and because file descriptors can be passed to other processes, as we discussed before). If that is okay, I'll provide a patch ASAP. Let me know if you prefer something else. Thank you, Rafael > On Oct 04, 2016, at 12:29, Rafael David Tinoco <rafael.tin...@canonical.com> > wrote: > > >> On Oct 04, 2016, at 10:50, Marc-André Lureau <marcandre.lur...@gmail.com> >> wrote: >> >> What about having a single config parameter as a place to put all vhost logs >> for all drives for a single instance ? Remove the memfd implementation with >> all the memfd shared_memory option ? Replace it with a >> open+unlink+ftruncate+mmap approach only. >> >> >> I fail to see your point, memfd is superior to open+unlink and has other >> advantages with sealing etc. > > I was just summarising needs based on previous statement from Daniel: > >> This makes me wonder about the memfd_create() code path too - we'll >> again not want that external process to be granted access to arbitrary >> FDs of QEMU's and I'm not sure of a way to get the memfd FD to have >> a specific label. So I think it is possible that when using libvirt >> we'll want the ability to tell QEMU to *always* use an explicit file >> in a path libvirt specifies, and never use memfd even if available. >> >> Regards, >> Daniel
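The vhostlog selection described above (a directory, a file, or no option at all) can be sketched as a small classification step that runs before any allocation. This is a minimal C sketch under stated assumptions: VhostLogMode and vhostlog_mode() are hypothetical names, not QEMU code, and treating a path that does not exist yet as "use this file" is an assumption, since the mail only covers the directory, file and absent cases.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical outcomes of the proposed vhostlog decision. */
typedef enum {
    VHOSTLOG_RANDOM_IN_DIR,  /* vhostlog is a directory: random file inside it */
    VHOSTLOG_USE_FILE,       /* vhostlog names a file: use that path as-is */
    VHOSTLOG_MEMFD_THEN_TMP, /* no vhostlog: try memfd, then a random tmp file */
} VhostLogMode;

/* Sketch only: classifies the vhostlog= value; it does not create anything. */
static VhostLogMode vhostlog_mode(const char *vhostlog)
{
    struct stat st;

    /* Assumption: an empty vhostlog= value is rejected by option parsing. */
    assert(vhostlog == NULL || vhostlog[0] != '\0');

    if (vhostlog == NULL) {
        return VHOSTLOG_MEMFD_THEN_TMP;
    }
    if (stat(vhostlog, &st) == 0 && S_ISDIR(st.st_mode)) {
        return VHOSTLOG_RANDOM_IN_DIR;
    }
    /* Assumption: an existing regular file or a not-yet-created path is
     * treated as the log file itself. */
    return VHOSTLOG_USE_FILE;
}
```

Keeping the classification separate from the allocation makes it easy to preserve the no-option default (memfd first), which matters for the backward-compatibility concern raised about backports.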
Re: [Qemu-devel] [Bug 1626972] Re: [PATCH] util: secure memfd_create fallback mechanism
The correct (and draft) one: http://pastebin.ubuntu.com/23357210/ I'm passing the vhostlog parameter as "hdev->log_filename" so it can be accessed from the net_init_tap()-> functions AND from the vhost_dev_start()-> functions. This way I don't have to change function prototypes anymore. > On Oct 21, 2016, at 01:03, Rafael David Tinoco <rafael.tin...@canonical.com> > wrote: > > Also, if possible, I would like comments about a draft: > > https://pastebin.canonical.com/168579/ > (please disregard printfs and minor problems)
Re: [Qemu-devel] [Bug 1626972] Re: [PATCH] util: secure memfd_create fallback mechanism
> On Oct 04, 2016, at 10:50, Marc-André Lureau wrote: > > What about having a single config parameter as a place to put all vhost logs > for all drives for a single instance ? Remove the memfd implementation with > all the memfd shared_memory option ? Replace it with a > open+unlink+ftruncate+mmap approach only. > > > I fail to see your point, memfd is superior to open+unlink and has other > advantages with sealing etc. I was just summarising needs based on previous statement from Daniel: > This makes me wonder about the memfd_create() code path too - we'll > again not want that external process to be granted access to arbitrary > FDs of QEMU's and I'm not sure of a way to get the memfd FD to have > a specific label. So I think it is possible that when using libvirt > we'll want the ability to tell QEMU to *always* use an explicit file > in a path libvirt specifies, and never use memfd even if available. > > Regards, > Daniel
Re: [Qemu-devel] [Bug 1626972] Re: [PATCH] util: secure memfd_create fallback mechanism
> On Oct 04, 2016, at 10:10, Marc-André Lureau wrote: > > > How will this path be used? Is it going to be global to qemu for various > > use (kinda like $TMP), or per-device, or for memfd fallback only? Should > > the path pre-exist? (I suppose, if not, qemu should clean it up when > > leaving) > > I'd expect it to be an option set against the vhost user backend, since > that's the thing using this. > > If other things have similar usage needs wrt memfd in future, they would > also need similar path config option. I was going for that approach. I could have something similar to: -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:5c:10:f2,bus=pci.0,addr=0x3,vhostpath=/var/lib/// > The log may be shared if there are several vhost-user (stored in > vhost_log_shm global), so I think it makes more sense to have a global config > path for it, or you may end up duplicating that information per vhost backend > and having files in either of the specified paths. But yes, vhost_log_shm does make that approach tricky, since the same log file can be shared by multiple vhost backends. Besides, tools like OpenStack would end up putting all the vhost log files in the same place anyway. A global config path that must be specified, or else the vhost log isn't created (as happens today when allocation fails), seems to be the right approach.
Re: [Qemu-devel] [Bug 1626972] Re: [PATCH] util: secure memfd_create fallback mechanism
True. What about having a single config parameter as a place to put all vhost logs for all drives of a single instance? Remove the memfd implementation along with the memfd shared_memory option? Replace it with an open+unlink+ftruncate+mmap approach only? This way every device would get its own log file, vhost-user backends would be able to get their file descriptors (and, of course, the security drivers could do their jobs). >> On Oct 04, 2016, at 10:25, Daniel P. Berrange wrote: >> >> Hmm, is there a reason why it is shared? That seems to make an assumption >> that all vhost-user backends would be managed by the same external process. >> While that may be the common case today, it doesn't feel like a reasonable >> assumption to make long term. IOW it feels wiser to have it set per-NIC >> unless I'm missing something important that means it must be shared ? >
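The open+unlink+ftruncate+mmap approach proposed above can be sketched in plain POSIX C. This is a minimal sketch, not QEMU's mmap-file code: mmap_unlinked_file() and the "vhostlog-XXXXXX" template are hypothetical, and mkstemp() stands in for the open step so the name stays unpredictable. Because the name is removed right away, the mapping and the descriptor stay valid until the last descriptor is closed, which is also what lets such a descriptor be handed to another process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps `size` bytes of shared memory backed by an
 * unlinked file in `dir`. Returns the mapping and stores the descriptor in
 * *fd_out, or returns NULL on failure. */
static void *mmap_unlinked_file(const char *dir, size_t size, int *fd_out)
{
    char path[4096];
    void *ptr;
    int fd;
    int n;

    assert(dir != NULL && fd_out != NULL && size > 0);

    n = snprintf(path, sizeof(path), "%s/vhostlog-XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        return NULL; /* path template does not fit */
    }
    /* mkstemp() replaces XXXXXX with a unique suffix and creates the file. */
    fd = mkstemp(path);
    if (fd < 0) {
        return NULL;
    }
    /* Remove the name right away: the inode survives while any descriptor
     * (including a copy passed to another process) remains open. */
    unlink(path);

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return ptr;
}
```

Compared with memfd_create(), this gives up sealing (the advantage Marc-André points out), in exchange for a file that lives in a directory a security policy can label.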
Re: [Qemu-devel] [Bug 1626972] Re: [PATCH] util: secure memfd_create fallback mechanism
Let me work on it. I'll get back soon. Thanks, Daniel. > On Oct 04, 2016, at 05:36, Daniel P. Berrange <berra...@redhat.com> wrote: > > On Mon, Oct 03, 2016 at 04:15:55PM -0300, Rafael David Tinoco wrote: >> Yes, definitely. Check this: > > [snip] > > So in that case, I think we must add the ability to specify an explicit path > that apps can use *regardless* of whether memfd support exists or not. > >>> On Oct 03, 2016, at 15:46, Rafael David Tinoco >>> <rafael.tin...@canonical.com> wrote: >>> >>>> So you're saying that the file descriptor here is actually getting >>>> passed to a different process for it to use ?
Re: [Qemu-devel] [Bug 1626972] Re: [PATCH] util: secure memfd_create fallback mechanism
Yes, definitely. Check this: /** * @qemu_chr_fe_set_msgfds: * * For backends capable of fd passing, set an array of fds to be passed with * the next send operation. * A subsequent call to this function before calling a write function will * result in overwriting the fd array with the new value without being send. * Upon writing the message the fd array is freed. * * Returns: -1 if fd passing isn't supported. */ int qemu_chr_fe_set_msgfds(CharDriverState *s, int *fds, int num); So, at least for vhost_dev_log_resize, this "interface" is being implemented: vhost_user_set_log_base -> VhostUserMsg = VHOST_USER_SET_LOG_BASE vhost_user_write(with the VHOST_USER_SET_LOG_BASE message): - configures the file descriptors(... , fds, fd_num) qemu_chr_fe_set_msgfds - writes them to the char driver qemu_chr_fe_write_all > On Oct 03, 2016, at 15:46, Rafael David Tinoco <rafael.tin...@canonical.com> > wrote: > >> So you're saying that the file descriptor here is actually getting >> passed to a different process for it to use ?
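The descriptor passing that qemu_chr_fe_set_msgfds() enables for vhost-user rests, on Linux, on UNIX-domain sockets carrying descriptors as SCM_RIGHTS ancillary data. The sketch below illustrates that generic mechanism with plain sendmsg()/recvmsg(); send_fd() and recv_fd() are hypothetical helpers, not QEMU's char-driver or QIOChannel code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends one payload byte plus `fd` as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives a descriptor sent by send_fd(); returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS) {
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}
```

The receiver gets a new descriptor number referring to the same open file, so once the vhost log fd is sent, the backend process holds the log file itself, which is why the thread focuses on how that file is labelled.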
Re: [Qemu-devel] [Bug 1626972] Re: [PATCH] util: secure memfd_create fallback mechanism
Hello Daniel, > On Oct 03, 2016, at 14:55, Daniel P. Berrange wrote: > >> Well, it unlinks the file but the references are still there while the >> descriptor isn't closed by this process, or by the one that receives the >> descriptor (that is why the unlink happens so early). >> >> If you check vhost_dev_log_resize(), it gets a *possibly* new vhost log >> (if a new size is given) and informs the vhost dev driver about the new >> log base (vhost_ops->vhost_set_log_base). >> >> For vhost_user, this means that the file descriptors for vhost logs are >> likely going to be passed to the vhost backend (fds[] in >> vhost_user_set_log_base). This is just one example, not sure about >> others. >> >> Probably the best approach here, like what Marc-André said, is to create >> some sort of TMPDIR, set by libvirt perhaps? > > So you're saying that the file descriptor here is actually getting > passed to a different process for it to use? > > If so that means we definitely do not want this in TMPDIR. If we > create a generic file in TMPDIR, then it's going to have a generic > security label. That means that the other process we're giving the > FD to is going to have to be granted permission to access this FD > and we certainly don't want to grant permission for it to access > any of QEMU's other FDs. So for the SELinux integration, we'll > need this FD to be in a specific directory, so that we can set up > policy such that the file created gets given a specific SELinux > label. We can then grant the other process access to only that > particular file, and not anything else of QEMU's. > > This makes me wonder about the memfd_create() code path too - we'll > again not want that external process to be granted access to arbitrary > FDs of QEMU's and I'm not sure of a way to get the memfd FD to have > a specific label.
So I think it is possible that when using libvirt > we'll want the ability to tell QEMU to *always* use an explicit file > in a path libvirt specifies, and never use memfd even if available. Check this execution path: (vhost_vsock_device_realize) vhost_dev_init vhost_commit |- vhost_get_log_size |... |- vhost_dev_log_resize (vhost_dev_log_resize): vhost_log_get -> here if the size is bigger, a new log is created dev->vhost_ops->vhost_set_log_base() -> kernel or user vhost driver vhost_log_put() So, * In the kernel-mode case, this is just: vhost in kernel mode = vhost_kernel_set_log_base return vhost_kernel_call(dev, VHOST_SET_LOG_BASE, &base); which makes an ioctl on the dev->opaque file descriptor to set a new vhost log base. * But in the user-mode case: vhost in user mode = vhost_user_set_log_base which gets the log file descriptor (log->fd) and gives it to vhost_user_write. vhost_user_write will do a qemu_chr_fe_set_msgfds, passing the log file descriptors to the backend vhost driver (CharDriverState). If I'm reading this right, if the backend driver is: static int tcp_set_msgfds(CharDriverState *chr, int *fds, int num) it would check for: !qio_channel_has_feature(s->ioc, QIO_CHANNEL_FEATURE_FD_PASS)) { and configure s->write_msgfds. This would be sent in: static int tcp_chr_write(CharDriverState *chr, const uint8_t *buf, int len) with "io_channel_send_full" + "qio_channel_writev_full" + "io_writev" from QIOChannelClass. https://www.berrange.com/posts/2016/08/16/ This, from your blog, probably confirms this behaviour: "The migration code supports a number of different protocols besides just “tcp:“. In particular it allows an “fd:” protocol to tell QEMU to use a passed-in file descriptor, and an “exec:” protocol to tell QEMU to launch an external command to tunnel the connection.
It is desirable to be able to use TLS with these protocols too, but when using TLS the client QEMU needs to know the hostname of the target QEMU in order to correctly validate the x509 certificate it receives. Thus, a second “tls-hostname” parameter was added to allow QEMU to be informed of the hostname to use for x509 certificate validation when using a non-tcp migration protocol. This can be set on the source QEMU prior to starting the migration using the “migrate_set_str_parameter” monitor command" =) -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1626972 Title: QEMU memfd_create fallback mechanism change for security drivers Status in QEMU: In Progress Bug description: When libvirt starts using AppArmor and creating AppArmor profiles for every virtual machine created on the compute nodes, Mitaka's QEMU (2.5, and upstream too) uses a fallback mechanism to create shared memory for live migrations. On 3.13 kernels, which don't have the memfd_create() system call, this fallback mechanism tries to create files in the /tmp/ directory and fails, causing live migration not to work. Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability
Re: [Qemu-devel] [PATCH] util: secure memfd_create fallback mechanism
Hello Marc, > On Sep 27, 2016, at 08:13, Marc-André Lureau <mlur...@redhat.com> wrote: > >>> On Tue, Sep 27, 2016 at 03:06:21AM +0000, Rafael David Tinoco wrote: >>> We should not have QEMU creating unpredictable filenames in the >>> first place - any filenames should be determined by libvirt >>> explicitly. >> >> Note that the filename, per se, is not as important as other files, >> since qemu won't provide it for being accessed by external programs, and, >> deletes the file, while keeping the descriptor, right after its creation >> (due to its nature, that is probably why it was created in /tmp). >> >> Having libvirt to define a filename that would not be used for recent >> kernels (> 3.17) and would exist for a fraction of second doesn't seem >> right to me. >> > > There are other parts of qemu that rely on creating temporary files, and this > seems to lack a bit of uniformity. Would it make sense to define a place > where qemu could create those? Or setting TMPDIR should help too. Could > libvirt set a per-vm TMPDIR with appropriate security rules? That's the best move I can see. The only problem is that if we do that, we would have to create a fallback mechanism for when TMPDIR is not set. Would it go back to /tmp? In my particular case (for 1 vhost log file): -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:5c:10:f2,bus=pci.0,addr=0x3 I could have something similar to: -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:5c:10:f2,bus=pci.0,addr=0x3,vhostpath=/var/lib/// and put mkstemp() files (one per vhost device) in there. Even so, what should we do when "vhostpath" is not given? I'm worried that right now security drivers are either blocking the live migration entirely or allowing all instances to read /tmp/memfd-.
Don't you think we could push the first patch while we come up with a better approach for the tmp (and default tmp) files and directories? The patch is no worse than what was already committed. Thanks, Rafael
Re: [Qemu-devel] [Bug 1626972] Re: [PATCH] util: secure memfd_create fallback mechanism
Sorry, I was only able to come back to this today. > On Sep 27, 2016, at 09:18, Daniel Berrange <1626...@bugs.launchpad.net> wrote: > >> There are numerous people relying on older kernels in openstack >> deployments - sometimes with specific drivers (ovswitch, dpdk, >> infiniband) holding kernel upgrades - but still in need of upgrading >> userland (e.g. newer releases). Having a fallback mechanism seems >> appropriate for those cases. > > I'm not against some kind of fallback - just about the way it > silently creates files in /tmp. > That is why memfd_create is used here, I suppose: to allow anonymous pages to have a descriptor and to be sealed. When falling back from this mechanism, I don't see any way other than creating a temporary file. Of course one way would be something like: http://paste.ubuntu.com/23270379/ But this is pretty much the same, just solving the "where to place the temporary file" problem (non-configurable for this usage). >> >> Note that the filename, per se, is not as important as other files, >> since qemu won't provide it for being accessed by external programs, and, >> deletes the file, while keeping the descriptor, right after its creation >> (due to its nature, that is probably why it was created in /tmp). > > If it isn't shared with other processes, and is deleted immediately, > why does the file need to be on disk at all? Well, it unlinks the file but the references are still there while the descriptor isn't closed by this process, or by the one that receives the descriptor (that is why the unlink happens so early). If you check vhost_dev_log_resize(), it gets a *possibly* new vhost log (if a new size is given) and informs the vhost dev driver about the new log base (vhost_ops->vhost_set_log_base). For vhost_user, this means that the file descriptors for vhost logs are likely going to be passed to the vhost backend (fds[] in vhost_user_set_log_base). This is just one example, not sure about others.
Probably the best approach here, like what Marc-André said, is to create some sort of TMPDIR, set by libvirt perhaps? > > Regards, > Daniel
Re: [Qemu-devel] [PATCH] util: secure memfd_create fallback mechanism
Hello! > On Sep 27, 2016, at 08:13, Marc-André Lureau wrote: > >> Note that the filename, per se, is not as important as other files, >> since qemu won't provide it for being accessed by external programs, and, >> deletes the file, while keeping the descriptor, right after its creation >> (due to its nature, that is probably why it was created in /tmp). >> >> Having libvirt to define a filename that would not be used for recent >> kernels (> 3.17) and would exist for a fraction of second doesn't seem >> right to me. >> > > There are other parts of qemu that rely on creating temporary files, and this > seems to lack a bit of uniformity. Would it make sense to define a place > where qemu could create those? Or setting TMPDIR should help too. Could > libvirt set a per-vm TMPDIR with appropriate security rules? You have a point. With a per-VM TMPDIR we won't have to care about filenames for the security driver in the future, while still securing them on a per-instance basis. I'll come back to you! Thank you!
Re: [Qemu-devel] [PATCH] util: secure memfd_create fallback mechanism
> On Sep 27, 2016, at 05:36, Daniel P. Berrange <berra...@redhat.com> wrote: > > On Tue, Sep 27, 2016 at 03:06:21AM +0000, Rafael David Tinoco wrote: >> Commit: 35f9b6ef3acc9d0546c395a566b04e63ca84e302 added a fallback >> mechanism for systems not supporting memfd_create syscall (started >> being supported since 3.17). > > This is really dubious code in general and IMHO should just > be reverted. There are numerous people relying on older kernels in OpenStack deployments, sometimes with specific drivers (Open vSwitch, DPDK, InfiniBand) holding back kernel upgrades, but still in need of upgrading userland (e.g. newer releases). Having a fallback mechanism seems appropriate for those cases. > > We have a golden rule that any time QEMU needs to be able to > create a file on disk, then the path should be explicitly > provided as a command line argument so that mgmt apps can > control the location used. > >> Backporting memfd_create might not be accepted for distros relying >> on older kernels. Nowadays there is no way for security driver >> to discover memfd filename to be created: /memfd-XX. >> >> It is more appropriate to include UUID and/or VM names in the >> temporary filename, allowing security driver rules to be applied >> while maintaining the required unpredictability with mkstemp. > > We should not have QEMU creating unpredictable filenames in the > first place - any filenames should be determined by libvirt > explicitly. Note that this filename, per se, is not as important as other files, since QEMU doesn't expose it to external programs and deletes the file, while keeping the descriptor, right after creating it (that is probably why it was created in /tmp). Having libvirt define a filename that would not be used with recent kernels (3.17 and later) and would exist for a fraction of a second doesn't seem right to me.
> >> This change will allow libvirt to know exact memfd file to be created >> for vhost log AND to create appropriate security rules to allow access >> per instance (instead of an open rule like /memfd-*). > > Even with this change it is bad - we don't want driver backends > creating arbitrary files in the shared /tmp directory. On the other hand, if we are creating a tmp file, like I said, I see a benefit in keeping the unpredictability (mkstemp) while providing predictable parts that allow the security driver to apply rules on a per-instance basis (/tmp/memfd-UUID*, /tmp/memfd-VMname*). Looking forward to a decision so I can backport the correct behaviour (with or without a memfd file). Thank you! Best Regards, Rafael
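Rafael's proposal of keeping mkstemp() unpredictability while adding a predictable per-instance part could look like the following sketch. create_tagged_tmpfile() and the exact "memfd-<tag>-XXXXXX" template are assumptions modelled on the /tmp/memfd-UUID* names in the AppArmor messages; rejecting tags that contain '/' is an added safety check, not something proposed in the thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: creates <dir>/memfd-<tag>-XXXXXX, where tag is a VM
 * UUID or name. Returns the open descriptor (path left in path_out), or -1. */
static int create_tagged_tmpfile(const char *dir, const char *tag,
                                 char *path_out, size_t path_len)
{
    int n;

    assert(dir != NULL && tag != NULL && path_out != NULL && path_len > 0);

    /* Added safety check: a tag containing '/' could escape dir. */
    if (strchr(tag, '/') != NULL) {
        return -1;
    }
    n = snprintf(path_out, path_len, "%s/memfd-%s-XXXXXX", dir, tag);
    if (n < 0 || (size_t)n >= path_len) {
        return -1; /* template does not fit in path_out */
    }
    /* The memfd-<tag>- prefix is predictable (matchable by a per-instance
     * rule such as /tmp/memfd-<uuid>-*), while mkstemp() still randomises
     * the six-character suffix. */
    return mkstemp(path_out);
}
```

Daniel's objection in this thread still applies to this shape: even with a tagged name, QEMU chooses a shared directory instead of using a path provided by the management application.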
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
I'll follow up to see whether the patch is accepted upstream: https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg06191.html https://www.mail-archive.com/qemu-devel@nongnu.org/msg400892.html -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1626972 Title: QEMU memfd_create fallback mechanism change for security drivers Status in QEMU: In Progress Bug description: When libvirt starts using AppArmor and creating AppArmor profiles for every virtual machine created on the compute nodes, Mitaka's QEMU (2.5, and upstream too) uses a fallback mechanism to create shared memory for live migrations. On 3.13 kernels, which don't have the memfd_create() system call, this fallback mechanism tries to create files in the /tmp/ directory and fails, causing live migration not to work. Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability = can't live migrate. From QEMU 2.5, the logic is in: void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals, int *fd) { if (memfd_create)... ### only works with HWE kernels else ### 3.13 kernels, gets blocked by apparmor tmpdir = g_get_tmp_dir ...
mfd = mkstemp(fname) } And you can see the errors: From the host trying to send the virtual machine: 2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted 2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory From the host trying to receive the virtual machine: Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12565 comm="apparmor_parser" Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser" Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12613 comm="apparmor_parser" Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser" Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 audit(1471289780.407:76): apparmor="DENIED" operation="mknod" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107 Aug 15 16:36:20 tkcompute01 kernel: [ 
1194.979871] type=1400 audit(1471289780.411:77): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0 Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 audit(1471289780.411:78): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0 When libvirt runs without AppArmor (thus not confining virtual machines on the compute nodes), live migration works as expected, so AppArmor is clearly interfering with live migration. Virtual machines have to be confined, so I'm sure this isn't the desired behaviour... To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1626972/+subscriptions
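For reference, the qemu_memfd_alloc() control flow summarised in the bug description (memfd_create() when the kernel has it, otherwise mkstemp() under the temporary directory) can be sketched as follows. This is not QEMU's code: alloc_shareable_fd() is a hypothetical name, memfd_create is invoked through syscall(2) so the sketch does not depend on a libc wrapper, flags such as close-on-exec and sealing are left out, and the caller passes tmpdir explicitly instead of calling g_get_tmp_dir().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper mirroring the control flow described in the bug:
 * returns a descriptor for `size` bytes of shareable memory, or -1. */
static int alloc_shareable_fd(const char *name, const char *tmpdir, size_t size)
{
    int fd = -1;

    assert(name != NULL && tmpdir != NULL);

#ifdef SYS_memfd_create
    /* Linux 3.17+: anonymous memory with a descriptor, no file on disk.
     * On older kernels this fails with ENOSYS and we fall through. */
    fd = (int)syscall(SYS_memfd_create, name, 0);
#endif
    if (fd < 0) {
        /* Fallback: the path AppArmor denied on the 3.13 compute nodes. */
        char path[4096];
        int n = snprintf(path, sizeof(path), "%s/%s-XXXXXX", tmpdir, name);

        if (n < 0 || (size_t)n >= sizeof(path)) {
            return -1;
        }
        fd = mkstemp(path);
        if (fd < 0) {
            return -1;
        }
        unlink(path); /* keep only the descriptor */
    }
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The mknod denial on /tmp/memfd-tNpKSj in the log corresponds to the mkstemp() step of the fallback branch.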
[Qemu-devel] [PATCH] util: secure memfd_create fallback mechanism
Commit 35f9b6ef3acc9d0546c395a566b04e63ca84e302 added a fallback mechanism for systems that do not support the memfd_create syscall (available since Linux 3.17). Backporting memfd_create might not be accepted by distros relying on older kernels.

Currently there is no way for a security driver to discover the name of the memfd file to be created: /memfd-XXXXXX. It is more appropriate to include the VM UUID and/or name in the temporary filename, allowing security driver rules to be applied while keeping the unpredictability that mkstemp provides.

This change will allow libvirt to know the exact memfd file to be created for the vhost log AND to create appropriate security rules allowing access per instance (instead of an open rule like /memfd-*).

Example of AppArmor deny messages with this change:

Per VM UUID (preferred, generated automatically by libvirt):

kernel: [26632.154856] type=1400 audit(1474945148.633:78): apparmor="DENIED" operation="mknod" profile="libvirt-0b96011f-0dc0-44a3-92c3-196de2efab6d" name="/tmp/memfd-0b96011f-0dc0-44a3-92c3-196de2efab6d-qeHrBV" pid=75161 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107

Per VM name (if no UUID is specified):

kernel: [26447.505653] type=1400 audit(1474944963.985:72): apparmor="DENIED" operation="mknod" profile="libvirt----- " name="/tmp/memfd-instance-teste-osYpHh" pid=74648 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107

Signed-off-by: Rafael David Tinoco <rafael.tin...@canonical.com>
---
 util/memfd.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/util/memfd.c b/util/memfd.c
index 4571d1a..4b715ac 100644
--- a/util/memfd.c
+++ b/util/memfd.c
@@ -30,6 +30,9 @@
 #include

 #include "qemu/memfd.h"
+#include "qmp-commands.h"
+#include "qemu-common.h"
+#include "sysemu/sysemu.h"

 #ifdef CONFIG_MEMFD
 #include
@@ -94,11 +97,32 @@ void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals,
             return NULL;
         }
     } else {
+        int ret = 0;
         const char *tmpdir = g_get_tmp_dir();
+        UuidInfo *uinfo;
+        NameInfo *ninfo;
         gchar *fname;

-        fname = g_strdup_printf("%s/memfd-XXXXXX", tmpdir);
+        uinfo = qmp_query_uuid(NULL);
+
+        ret = strcmp(uinfo->UUID, UUID_NONE);
+        if (ret == 0) {
+            ninfo = qmp_query_name(NULL);
+            if (ninfo->has_name) {
+                fname = g_strdup_printf("%s/memfd-%s-XXXXXX", tmpdir,
+                                        ninfo->name);
+            } else {
+                fname = g_strdup_printf("%s/memfd-XXXXXX", tmpdir);
+            }
+            qapi_free_NameInfo(ninfo);
+        } else {
+            fname = g_strdup_printf("%s/memfd-%s-XXXXXX", tmpdir,
+                                    uinfo->UUID);
+        }
+
         mfd = mkstemp(fname);
+
+        qapi_free_UuidInfo(uinfo);
         unlink(fname);
         g_free(fname);

-- 
2.9.3
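The patch uses a fixed naming precedence: the VM UUID if one was set, otherwise the VM name, otherwise the old anonymous template. That rule can be isolated from QEMU's QMP plumbing as a small standalone C function. This is a sketch, not the patch itself. memfd_template() and NIL_UUID are hypothetical names. The UUID and name are passed in as plain strings rather than fetched with qmp_query_uuid()/qmp_query_name(). NIL_UUID assumes the all-zero UUID string that QEMU reports when no UUID was given; the patch compares against UUID_NONE.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: the all-zero UUID string reported when no UUID was set. */
#define NIL_UUID "00000000-0000-0000-0000-000000000000"

/*
 * Hypothetical helper mirroring the patch's precedence when building the
 * mkstemp() template: UUID if set, else VM name, else the plain template.
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
static int memfd_template(char *buf, size_t len, const char *tmpdir,
                          const char *uuid, const char *vm_name)
{
    int n;

    assert(buf != NULL && len > 0 && tmpdir != NULL);
    if (uuid != NULL && strcmp(uuid, NIL_UUID) != 0) {
        /* Per-instance name a security driver can predict from the UUID. */
        n = snprintf(buf, len, "%s/memfd-%s-XXXXXX", tmpdir, uuid);
    } else if (vm_name != NULL && *vm_name != '\0' &&
               strchr(vm_name, '/') == NULL) {
        /* Sketch-only choice: skip names that would add a path component. */
        n = snprintf(buf, len, "%s/memfd-%s-XXXXXX", tmpdir, vm_name);
    } else {
        /* No identity available: same template as before the patch. */
        n = snprintf(buf, len, "%s/memfd-XXXXXX", tmpdir);
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting buffer would then be handed to mkstemp() and unlink()ed, as in the diff. The random six-character suffix is kept, so the name stays unpredictable, while a rule scoped to something like /tmp/memfd-<uuid>-* can replace a blanket /tmp/memfd-* rule. Rejecting names that contain '/' is a choice made only in this sketch; the patch inserts the name unchanged.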
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
Fixed it according to checkpatch.pl as stated in http://wiki.qemu.org/Contribute/SubmitAPatch. http://paste.ubuntu.com/23220104/ Will submit to mailing list after testing everything.
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
I came up with this patch for QEMU: http://paste.ubuntu.com/23217056/ I'm finishing the libvirt patch so that I can propose the QEMU change upstream already sure that libvirt will benefit from it. Right after that I'll propose the libvirt patch upstream (changing the virt-aa-helper logic). And later: Improved it a little bit: http://paste.ubuntu.com/23217333/ And fixed it: http://paste.ubuntu.com/23219599/ (probably the version to be proposed upstream) ** Changed in: qemu Status: New => In Progress ** Changed in: qemu Assignee: (unassigned) => Rafael David Tinoco (inaddy)
[Qemu-devel] [Bug 1626972] Re: QEMU memfd_create fallback mechanism change for security drivers
Related: https://bugs.launchpad.net/nova/+bug/1613423
[Qemu-devel] [Bug 1626972] [NEW] QEMU memfd_create fallback mechanism change for security drivers
Public bug reported:

When libvirt uses AppArmor and creates an AppArmor profile for every virtual machine on the compute nodes, there is a problem: Mitaka's QEMU (2.5, and upstream as well) uses a fallback mechanism to create the shared memory needed for live migration. On 3.13 kernels, which don't have the memfd_create() system call, this fallback tries to create files in the /tmp/ directory and fails, so live migration does not work.

Trusty with kernel 3.13 + Mitaka with QEMU 2.5 + AppArmor confinement = can't live migrate.

In QEMU 2.5, the logic is in:

void *qemu_memfd_alloc(const char *name, size_t size, unsigned int seals, int *fd)
{
    if (memfd_create)... ### only works with HWE kernels
    else ### 3.13 kernels, gets blocked by apparmor
        tmpdir = g_get_tmp_dir
        ...
        mfd = mkstemp(fname)
}

And you can see the errors.

From the host trying to send the virtual machine:

2016-08-15 16:36:26.160 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Migration operation has aborted
2016-08-15 16:36:26.248 1974 ERROR nova.virt.libvirt.driver [req-0cac612b-8d53-4610-b773-d07ad6bacb91 691a581cfa7046278380ce82b1c38ddd 133ebc3585c041aebaead8c062cd6511 - - -] [instance: 2afa1131-bc8c-43d2-9c4a-962c1bf7723e] Live Migration failure: internal error: unable to execute QEMU command 'migrate': Migration disabled: failed to allocate shared memory

From the host trying to receive the virtual machine:

Aug 15 16:36:19 tkcompute01 kernel: [ 1194.356794] type=1400 audit(1471289779.791:72): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12565 comm="apparmor_parser"
Aug 15 16:36:19 tkcompute01 kernel: [ 1194.357048] type=1400 audit(1471289779.791:73): apparmor="STATUS" operation="profile_load" profile="unconfined" name="qemu_bridge_helper" pid=12565 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.877027] type=1400 audit(1471289780.311:74): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" pid=12613 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.904407] type=1400 audit(1471289780.343:75): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="qemu_bridge_helper" pid=12613 comm="apparmor_parser"
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.973064] type=1400 audit(1471289780.407:76): apparmor="DENIED" operation="mknod" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/memfd-tNpKSj" pid=12625 comm="qemu-system-x86" requested_mask="c" denied_mask="c" fsuid=107 ouid=107
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979871] type=1400 audit(1471289780.411:77): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0
Aug 15 16:36:20 tkcompute01 kernel: [ 1194.979881] type=1400 audit(1471289780.411:78): apparmor="DENIED" operation="open" profile="libvirt-2afa1131-bc8c-43d2-9c4a-962c1bf7723e" name="/var/tmp/" pid=12625 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=107 ouid=0

When libvirt runs without AppArmor (so virtual machines on the compute nodes are not confined), live migration works as expected, so AppArmor is clearly interfering with live migration. I'm sure virtual machines have to be confined and that this isn't the desired behaviour.

** Affects: qemu
     Importance: Undecided
     Assignee: Rafael David Tinoco (inaddy)
         Status: In Progress

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1626972 Title: QEMU memfd_create fallback mechanism change for security drivers Status in QEMU: In Progress Bug description: And, when libvirt starts using apparmor, and creating apparmor profiles for every virtual machine created in the compute nodes, mitaka qemu (2.5 - and upstream also) uses a fallback mechanism for creating shared memory for live-migrations. This fall back mechanism, on kernels 3.13 - that don't have memfd_create() system-call, try to create files on /tmp/ directory and fails.. causing live-migration not to work. Trusty with kernel 3.13 + Mitaka with qemu 2.5 + apparmor capability = can't live migrate.