Re: [Qemu-devel] [PATCH v2 00/14] fp-test + hardfloat
Hi, This series failed build test on s390x host. Please find the details below. Type: series Message-id: 1522128840-498-1-git-send-email-c...@braap.org Subject: [Qemu-devel] [PATCH v2 00/14] fp-test + hardfloat === TEST SCRIPT BEGIN === #!/bin/bash # Testing script will be invoked under the git checkout with # HEAD pointing to a commit that has the patches applied on top of "base" # branch set -e echo "=== ENV ===" env echo "=== PACKAGES ===" rpm -qa echo "=== TEST BEGIN ===" CC=$HOME/bin/cc INSTALL=$PWD/install BUILD=$PWD/build echo -n "Using CC: " realpath $CC mkdir -p $BUILD $INSTALL SRC=$PWD cd $BUILD $SRC/configure --cc=$CC --prefix=$INSTALL make -j4 # XXX: we need reliable clean up # make check -j4 V=1 make install === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' acf121ab02 hardfloat: support float32_to_float64 717c53e9e1 hardfloat: support float32/64 comparison 96c01923f0 hardfloat: support float32/64 square root f2df632dcd hardfloat: support float32/64 fused multiply-add 9ed1261941 hardfloat: support float32/64 division 030055e5dd hardfloat: support float32/64 multiplication a6ceaae3fd hardfloat: support float32/64 addition and subtraction 85ac349d03 fpu: introduce hardfloat 4661c2c7a8 target/tricore: use float32_is_denormal bb6b4e9ee9 softfloat: add float{32, 64}_is_{de, }normal 6b0d547870 fp-test: add muladd variants c56f6dc7d1 softfloat: fix {min, max}nummag for same-abs-value inputs bfa4c9f682 tests: add fp-test, a floating point test suite 0787c157b4 tests: add fp-bench, a collection of simple floating-point microbenchmarks === OUTPUT BEGIN === === ENV === LANG=en_US.UTF-8 XDG_SESSION_ID=120232 USER=fam PWD=/var/tmp/patchew-tester-tmp-t8mztrft/src HOME=/home/fam SHELL=/bin/sh SHLVL=2 PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug LOGNAME=fam DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus XDG_RUNTIME_DIR=/run/user/1012 PATH=/usr/bin:/bin _=/usr/bin/env === PACKAGES === gpg-pubkey-873529b8-54e386ff glibc-debuginfo-common-2.24-10.fc25.s390x fedora-release-26-1.noarch dejavu-sans-mono-fonts-2.35-4.fc26.noarch xemacs-filesystem-21.5.34-22.20170124hgf412e9f093d4.fc26.noarch bash-4.4.12-7.fc26.s390x freetype-2.7.1-9.fc26.s390x libSM-1.2.2-5.fc26.s390x libmpc-1.0.2-6.fc26.s390x libaio-0.3.110-7.fc26.s390x libverto-0.2.6-7.fc26.s390x perl-Scalar-List-Utils-1.48-1.fc26.s390x iptables-libs-1.6.1-2.fc26.s390x p11-kit-trust-0.23.9-2.fc26.s390x tcl-8.6.6-2.fc26.s390x libxshmfence-1.2-4.fc26.s390x expect-5.45-23.fc26.s390x perl-Thread-Queue-3.12-1.fc26.noarch perl-encoding-2.19-6.fc26.s390x keyutils-1.5.10-1.fc26.s390x gmp-devel-6.1.2-4.fc26.s390x enchant-1.6.0-16.fc26.s390x net-snmp-libs-5.7.3-17.fc26.s390x python-gobject-base-3.24.1-1.fc26.s390x python3-enchant-1.6.10-1.fc26.noarch python-lockfile-0.11.0-6.fc26.noarch python2-pyparsing-2.1.10-3.fc26.noarch python2-lxml-4.1.1-1.fc26.s390x librados2-10.2.7-2.fc26.s390x trousers-lib-0.3.13-7.fc26.s390x libpaper-1.1.24-14.fc26.s390x libdatrie-0.2.9-4.fc26.s390x libsoup-2.58.2-1.fc26.s390x passwd-0.79-9.fc26.s390x bind99-libs-9.9.10-3.P3.fc26.s390x python3-rpm-4.13.0.2-1.fc26.s390x systemd-233-7.fc26.s390x virglrenderer-0.6.0-1.20170210git76b3da97b.fc26.s390x s390utils-ziomon-1.36.1-3.fc26.s390x s390utils-osasnmpd-1.36.1-3.fc26.s390x libXrandr-1.5.1-2.fc26.s390x libglvnd-glx-1.0.0-1.fc26.s390x texlive-ifxetex-svn19685.0.5-33.fc26.2.noarch texlive-psnfss-svn33946.9.2a-33.fc26.2.noarch texlive-dvipdfmx-def-svn40328-33.fc26.2.noarch texlive-natbib-svn20668.8.31b-33.fc26.2.noarch texlive-xdvi-bin-svn40750-33.20160520.fc26.2.s390x texlive-cm-svn32865.0-33.fc26.2.noarch texlive-beton-svn15878.0-33.fc26.2.noarch texlive-fpl-svn15878.1.002-33.fc26.2.noarch texlive-mflogo-svn38628-33.fc26.2.noarch texlive-texlive-docindex-svn41430-33.fc26.2.noarch texlive-luaotfload-bin-svn34647.0-33.20160520.fc26.2.noarch texlive-koma-script-svn41508-33.fc26.2.noarch texlive-pst-tree-svn24142.1.12-33.fc26.2.noarch texlive-breqn-svn38099.0.98d-33.fc26.2.noarch texlive-xetex-svn41438-33.fc26.2.noarch gstreamer1-plugins-bad-free-1.12.3-1.fc26.s390x xorg-x11-font-utils-7.5-33.fc26.s390x ghostscript-fonts-5.50-36.fc26.noarch libXext-devel-1.3.3-5.fc26.s390x libusbx-devel-1.0.21-2.fc26.s390x libglvnd-devel-1.0.0-1.fc26.s390x emacs-25.3-3.fc26.s390x alsa-lib-devel-1.1.4.1-1.fc26.s390x kbd-2.0.4-2.fc26.s390x dconf-0.26.0-2.fc26.s390x ccache-3.3.4-1.fc26.s390x mc-4.8.19-5.fc26.s390x doxygen-1.8.13-9.fc26.s390x dpkg-1.18.24-1.fc26.s390x libtdb-1.3.13-1.fc26.s390x python2-pynacl-1.1.1-1.fc26.s390x nss-sysinit-3.34.0-1.0.fc26.s390x kernel-4.13.16-202.fc26.s390x perl-Filter-1.58-1.fc26.s390x python2-pip-9.0.1-11.fc26.noarch dnf-2.7.5-2.fc26.noarch sssd-common-1.16.0-4.fc26.s390x python2-sssdconfig-1.16.0-4.fc26.noarch bind-license-9.11.2-1.P1.fc26.noarch libtasn1-4.13-1.fc26.s390x glusterfs-fuse-3.10.10-1.fc26.s390x cpp-7.3.1-2.fc26.s390x pkgconf-1.3.12-2.fc26.s390x python2-fedora-0.10.0-1.fc26.noarch cmake
Re: [Qemu-devel] [PATCH 1/1] block/file-posix.c: fix not reopened lock file descriptor
On Thu, 03/22 19:08, Dion Bosschieter wrote: > Yeah I have a use case, before a last sync on a storage migration we suspend a > VM -> send the last diffs -> mount the new storage server and after that we > change a symlink -> call reopen -> check if all file descriptors are changed > before resuming the VM. What is the point of changing the symlink and checking if FDs are changed? Fam
Re: [Qemu-devel] [PATCH 1/1] block/file-posix.c: fix not reopened lock file descriptor
On Thu, 03/22 18:39, Kevin Wolf wrote: > [ Cc: qemu-block ] > > Am 22.03.2018 um 18:20 hat Dion Bosschieter geschrieben: > > In commit 244a5668106297378391b768e7288eb157616f64 another > > file descriptor to BDRVRawState is added. When we try to issue the > > reopen command only s->fd is reopened; lock_fd could still hold an old > > file descriptor "possibly" pointing to another file. > > > > - change raw_reopen_prepare so it checks use_lock from BDRVRawState and > > tries to reopen lock_fd accordingly > > - change raw_reopen_commit so it closes the old lock_fd on use_lock > > > > Signed-off-by: Dion Bosschieter > > bdrv_reopen() is not meant for opening a different file, it is meant to > change the flags and options of the same file. Do you have a use case > where you would actually need to switch to a different file? > > As far as I know, lock_fd was specifically introduced _because_ it stays > the same across reopen, so we don't need a racy release/reacquire pair. > Fam (CCed) should know more. I think (I remember we have discussed a bit before) techinically we can do reopen just well without a separate s->lock_fd. After all all locks we acquire are shared lock, so it is possible to handle the lock between old fd and the new one back and forth freely. Fam
[Qemu-devel] [PULL] qemu-doc: Rework the network options chapter to make "-net" less prominent
From: Thomas Huth "-net" is clearly a legacy option. Yet we still use it in almost all examples in the qemu documentation, and many other spots in the network chapter. We should make it less prominent that users are not lured into using it so often anymore. So instead of starting the network chapter with "-net nic" and documenting "-net " below "-netdev " everywhere, all the "-net" related documentation is now moved to the end of the chapter. The new "-nic" option is moved to the beginning of the chapter instead, with a new example that should demonstrate how "-nic" can be used to shortcut "-device" with "-netdev". The examples in this chapter are changed to use the "-device" and "-netdev" options or "-nic" instead of "-net nic -net ". While we're at it, also remove a legacy remark about very old Linux distributions. Also remove the "[...]" from the examples in this chapter since we are not using this ellipsis in any other examples in our docu- mentation. Signed-off-by: Thomas Huth Signed-off-by: Jason Wang --- qemu-options.hx | 189 1 file changed, 94 insertions(+), 95 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 74158e7..3ece30d 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2048,41 +2048,40 @@ DEF("net", HAS_ARG, QEMU_OPTION_net, "old way to initialize a host network interface\n" "(use the -netdev option if possible instead)\n", QEMU_ARCH_ALL) STEXI -@item -net nic[,vlan=@var{n}][,netdev=@var{nd}][,macaddr=@var{mac}][,model=@var{type}] [,name=@var{name}][,addr=@var{addr}][,vectors=@var{v}] -@findex -net -Configure or create an on-board (or machine default) Network Interface Card -(NIC) and connect it either to VLAN @var{n} (@var{n} = 0 is the default), or -to the netdev @var{nd}. The NIC is an e1000 by default on the PC -target. Optionally, the MAC address can be changed to @var{mac}, the -device address set to @var{addr} (PCI cards only), -and a @var{name} can be assigned for use in monitor commands. -Optionally, for PCI cards, you can specify the number @var{v} of MSI-X vectors -that the card should have; this option currently only affects virtio cards; set -@var{v} = 0 to disable MSI-X. If no @option{-net} option is specified, a single -NIC is created. QEMU can emulate several different models of network card. -Valid values for @var{type} are -@code{virtio}, @code{i82551}, @code{i82557b}, @code{i82559er}, -@code{ne2k_pci}, @code{ne2k_isa}, @code{pcnet}, @code{rtl8139}, -@code{e1000}, @code{smc91c111}, @code{lance} and @code{mcf_fec}. -Not all devices are supported on all targets. Use @code{-net nic,model=help} -for a list of available devices for your target. +@item -nic [tap|bridge|user|l2tpv3|vde|netmap|vhost-user|socket][,...][,mac=macaddr][,model=mn] +@findex -nic +This option is a shortcut for configuring both the on-board (default) guest +NIC hardware and the host network backend in one go. The host backend options +are the same as with the corresponding @option{-netdev} options below. +The guest NIC model can be set with @option{model=@var{modelname}}. +Use @option{model=help} to list the available device types. +The hardware MAC address can be set with @option{mac=@var{macaddr}}. + +The following two example do exactly the same, to show how @option{-nic} can +be used to shorten the command line length (note that the e1000 is the default +on i386, so the @option{model=e1000} parameter could even be omitted here, too): +@example +qemu-system-i386 -netdev user,id=n1,ipv6=off -device e1000,netdev=n1,mac=52:54:98:76:54:32 +qemu-system-i386 -nic user,ipv6=off,model=e1000,mac=52:54:98:76:54:32 +@end example + +@item -nic none +Indicate that no network devices should be configured. It is used to override +the default configuration (default NIC with ``user'' host network backend) +which is activated if no other networking options are provided. @item -netdev user,id=@var{id}[,@var{option}][,@var{option}][,...] @findex -netdev -@item -net user[,@var{option}][,@var{option}][,...] -Use the user mode network stack which requires no administrator +Configure user mode host network backend which requires no administrator privilege to run. Valid options are: @table @option -@item vlan=@var{n} -Connect user mode stack to VLAN @var{n} (@var{n} = 0 is the default). - @item id=@var{id} -@itemx name=@var{name} Assign symbolic name for use in monitor commands. -@option{ipv4} and @option{ipv6} specify that either IPv4 or IPv6 must -be enabled. If neither is specified both protocols are enabled. +@item ipv4=on|off and ipv6=on|off +Specify that either IPv4 or IPv6 must be enabled. If neither is specified +both protocols are enabled. @item net=@var{addr}[/@var{mask}] Set IP network address the guest will see. Optionally specify the netmask, @@ -2134,7 +2133,7 @@ can not be resolved. Example: @example -qemu -net user,dnssearch=mgmt.example.org,dnssearch=exam
[Qemu-devel] [PULL] Net patches
The following changes since commit 47d3b60858d90ac8a0cc3a72af7f95c96781125a: Merge remote-tracking branch 'remotes/riscv/tags/riscv-qemu-2.12-important-fixes' into staging (2018-03-28 22:13:38 +0100) are available in the git repository at: https://github.com/jasowang/qemu.git tags/net-pull-request for you to fetch changes up to a0350d780e24d2f6a0cf8fc3191badb0329a: qemu-doc: Rework the network options chapter to make "-net" less prominent (2018-03-30 11:44:11 +0800) Thomas Huth (1): qemu-doc: Rework the network options chapter to make "-net" less prominent qemu-options.hx | 189 1 file changed, 94 insertions(+), 95 deletions(-)
Re: [Qemu-devel] [PATCH 2/2] contrib/vhost-user-blk: enable protocol feature for vhost-user-blk
On Fri, Mar 30, 2018 at 10:46:17AM +0800, Changpeng Liu wrote: > This patch reports the protocol feature that is only advertised by > QEMU if the device implements the config ops. > > Signed-off-by: Changpeng Liu OK but pls just send a single patch next time. People often split out just the headers or some utility functions with little value, but this does not help review at all. No need to repost just for this. > --- > contrib/vhost-user-blk/vhost-user-blk.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/contrib/vhost-user-blk/vhost-user-blk.c > b/contrib/vhost-user-blk/vhost-user-blk.c > index 67dac81..f6e0170 100644 > --- a/contrib/vhost-user-blk/vhost-user-blk.c > +++ b/contrib/vhost-user-blk/vhost-user-blk.c > @@ -311,6 +311,12 @@ vub_get_features(VuDev *dev) > 1ull << VHOST_USER_F_PROTOCOL_FEATURES; > } > > +static uint64_t > +vub_get_protocol_features(VuDev *dev) > +{ > +return 1ull << VHOST_USER_PROTOCOL_F_CONFIG; > +} > + > static int > vub_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len) > { > @@ -372,6 +378,7 @@ vub_set_config(VuDev *vu_dev, const uint8_t *data, > > static const VuDevIface vub_iface = { > .get_features = vub_get_features, > +.get_protocol_features = vub_get_protocol_features, > .queue_set_started = vub_queue_set_started, > .get_config = vub_get_config, > .set_config = vub_set_config, > -- > 1.9.3
Re: [Qemu-devel] [PATCH v2 04/10] migration: detect compression and decompression errors
On 03/29/2018 12:25 PM, Peter Xu wrote: On Thu, Mar 29, 2018 at 11:51:03AM +0800, Xiao Guangrong wrote: On 03/28/2018 05:59 PM, Peter Xu wrote: On Tue, Mar 27, 2018 at 05:10:37PM +0800, guangrong.x...@gmail.com wrote: [...] -static int compress_threads_load_setup(void) +static int compress_threads_load_setup(QEMUFile *f) { int i, thread_count; @@ -2665,6 +2685,7 @@ static int compress_threads_load_setup(void) } decomp_param[i].stream.opaque = &decomp_param[i]; +decomp_param[i].file = f; On the source side the error will be set via: qemu_file_set_error(migrate_get_current()->to_dst_file, blen); Maybe we can do similar things using migrate_incoming_get_current() to avoid caching the QEMUFile multiple times? I have considered it, however, it can not work as the @file used by ram loader is not the file got from migrate_incoming_get_current() under some cases. For example, in colo_process_incoming_thread(), the file passed to qemu_loadvm_state() is a internal buffer and it is not easy to switch it to incoming file. I see. How about cache it in a global variable? We have these already: thread_count = migrate_decompress_threads(); decompress_threads = g_new0(QemuThread, thread_count); decomp_param = g_new0(DecompressParam, thread_count); ... IMHO we can add a new one too, at least we don't cache it multiple times (after all decomp_param[i]s are global variables too). Nice, that's good to me. Will add your Reviewed-by on this patch as well if you do not mind. :)
Re: [Qemu-devel] [PATCH 1/2] contrib/libvhost-user: add the protocol feature used for SET/GET message
The patch serials depend on Maxime's patch "vhost-user: back SET/GET_CONFIG requests with a protocol feature", which adding the protocol feature bit for SET/GET messages. > -Original Message- > From: Liu, Changpeng > Sent: Friday, March 30, 2018 10:46 AM > To: Liu, Changpeng ; qemu-devel@nongnu.org > Cc: m...@redhat.com; marcandre.lur...@redhat.com; > maxime.coque...@redhat.com > Subject: [PATCH 1/2] contrib/libvhost-user: add the protocol feature used for > SET/GET message > > Signed-off-by: Changpeng Liu > --- > contrib/libvhost-user/libvhost-user.h | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/contrib/libvhost-user/libvhost-user.h > b/contrib/libvhost-user/libvhost- > user.h > index 79f7a53..b27075e 100644 > --- a/contrib/libvhost-user/libvhost-user.h > +++ b/contrib/libvhost-user/libvhost-user.h > @@ -50,6 +50,7 @@ enum VhostUserProtocolFeature { > VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6, > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > +VHOST_USER_PROTOCOL_F_CONFIG = 9, > > VHOST_USER_PROTOCOL_F_MAX > }; > -- > 1.9.3
[Qemu-devel] [PATCH 2/2] contrib/vhost-user-blk: enable protocol feature for vhost-user-blk
This patch reports the protocol feature that is only advertised by QEMU if the device implements the config ops. Signed-off-by: Changpeng Liu --- contrib/vhost-user-blk/vhost-user-blk.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c index 67dac81..f6e0170 100644 --- a/contrib/vhost-user-blk/vhost-user-blk.c +++ b/contrib/vhost-user-blk/vhost-user-blk.c @@ -311,6 +311,12 @@ vub_get_features(VuDev *dev) 1ull << VHOST_USER_F_PROTOCOL_FEATURES; } +static uint64_t +vub_get_protocol_features(VuDev *dev) +{ +return 1ull << VHOST_USER_PROTOCOL_F_CONFIG; +} + static int vub_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len) { @@ -372,6 +378,7 @@ vub_set_config(VuDev *vu_dev, const uint8_t *data, static const VuDevIface vub_iface = { .get_features = vub_get_features, +.get_protocol_features = vub_get_protocol_features, .queue_set_started = vub_queue_set_started, .get_config = vub_get_config, .set_config = vub_set_config, -- 1.9.3
[Qemu-devel] [PATCH 1/2] contrib/libvhost-user: add the protocol feature used for SET/GET message
Signed-off-by: Changpeng Liu --- contrib/libvhost-user/libvhost-user.h | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h index 79f7a53..b27075e 100644 --- a/contrib/libvhost-user/libvhost-user.h +++ b/contrib/libvhost-user/libvhost-user.h @@ -50,6 +50,7 @@ enum VhostUserProtocolFeature { VHOST_USER_PROTOCOL_F_CROSS_ENDIAN = 6, VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, +VHOST_USER_PROTOCOL_F_CONFIG = 9, VHOST_USER_PROTOCOL_F_MAX }; -- 1.9.3
Re: [Qemu-devel] [PATCH v2 0/6] e1000 migration changes for 2.12
On 2018年03月29日 00:36, Dr. David Alan Gilbert (git) wrote: From: "Dr. David Alan Gilbert" Hi, This set of patches change the e1000 migration code to make it easier to keep with compatibility with older versions in backwards migration. I think the first 3 patches are fairly uncontrovercial and I would like them for 2.12; it would be nice to have the lot since changing them after we've shipped is much more difficult. v2 Ed and Paolo answered my question that I asked in the cover letter; and I think I've followed the advice - although my testing has been very light. The new patches do two things: a) When we receive a stream without the subsection we duplicate the received pops state into both props and tso_props. b) When we send without the subsection we decide which set to send in the main part of the state based on which state was last changed. Dave Dr. David Alan Gilbert (6): e1000: Convert v3 fields to subsection e1000: Dupe offload data on reading old stream e1000: wire new subsection to property e1000: Migrate props via a temporary structure e1000: Choose which set of props to migrate e1000: Old machine types, turn new subsection off hw/net/e1000.c | 103 include/hw/compat.h | 4 ++ 2 files changed, 84 insertions(+), 23 deletions(-) Reviewed-by: Jason Wang
Re: [Qemu-devel] [PATCH v2 5/6] e1000: Choose which set of props to migrate
On 2018年03月29日 16:44, Dr. David Alan Gilbert wrote: * Jason Wang (jasow...@redhat.com) wrote: On 2018年03月29日 16:08, Dr. David Alan Gilbert wrote: * Jason Wang (jasow...@redhat.com) wrote: On 2018年03月29日 00:36, Dr. David Alan Gilbert (git) wrote: From: "Dr. David Alan Gilbert" When we're using the subsection we migrate both the 'props' and 'tso_props' data; when we're not using the subsection (to migrate to 2.11 or old machine types) we've got to choose what to migrate in the main structure. If we're using the subsection migrate 'props' in the main structure. If we're not using the subsection then migrate the last one that changed, which gives behaviour similar to the old behaviour. But only after migration. Why not simply switch back to the old behavior if migrate_tso_props if false? Because: 1) We know it's a broken behaviour so it's better not to unfix it 2) The fix doesn't change guest visible behaviour other than actually sending the right packets; so there's no reason to make the fix itself dependent on the machine type. 3) Gating the fix itself on the flag is actually more complex and would need checking the flag in lots of places that are already pretty complex, rather than what this does which is just check it in one place at migration. It looks to me it was just something like: struct e1000x_txd_props *props = tp->cptse && chkflag(TSO) ? &tp->tso_props : &tp->props; is that the only thing that would change? Looks like and we can avoid using tricks like patch 4. Btw, did this patch work when: migrate A to B migrate B to C But there's no tx during B? Hmm, good question; I only tried it keeping the stream alive during migration. Lets see what happens. For this code to be used we have to be running with an old machine type/property. That means A->B will have either come from 2.11 or a 2.12 with this same patch. But given patch 2 that duplicates on loading; that means A->B should end up with B having the same data in both sets of props, and thus it doesn't matter which set this patch picks. Yes it is. I think I would agree with your idea now consider it was slightly better than using the old behavior. Thanks Dave Thanks Dave Thanks -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [PULL v1 3/4] tpm: CRB: Enforce locality is requested before processing buffer
Section 5.5.3.2.2 of the CRB specs states that use of the TPM through the localty control method must first be requested, otherwise the command will be dropped. Signed-off-by: Stefan Berger Reviewed-by: Marc-André Lureau --- hw/tpm/tpm_crb.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c index ee6c87e..a92dd50 100644 --- a/hw/tpm/tpm_crb.c +++ b/hw/tpm/tpm_crb.c @@ -76,6 +76,8 @@ enum crb_cancel { CRB_CANCEL_INVOKE = BIT(0), }; +#define TPM_CRB_NO_LOCALITY 0xff + static uint64_t tpm_crb_mmio_read(void *opaque, hwaddr addr, unsigned size) { @@ -95,10 +97,19 @@ static uint64_t tpm_crb_mmio_read(void *opaque, hwaddr addr, return val; } +static uint8_t tpm_crb_get_active_locty(CRBState *s) +{ +if (!ARRAY_FIELD_EX32(s->regs, CRB_LOC_STATE, locAssigned)) { +return TPM_CRB_NO_LOCALITY; +} +return ARRAY_FIELD_EX32(s->regs, CRB_LOC_STATE, activeLocality); +} + static void tpm_crb_mmio_write(void *opaque, hwaddr addr, uint64_t val, unsigned size) { CRBState *s = CRB(opaque); +uint8_t locty = addr >> 12; trace_tpm_crb_mmio_write(addr, size, val); @@ -123,7 +134,8 @@ static void tpm_crb_mmio_write(void *opaque, hwaddr addr, break; case A_CRB_CTRL_START: if (val == CRB_START_INVOKE && -!(s->regs[R_CRB_CTRL_START] & CRB_START_INVOKE)) { +!(s->regs[R_CRB_CTRL_START] & CRB_START_INVOKE) && +tpm_crb_get_active_locty(s) == locty) { void *mem = memory_region_get_ram_ptr(&s->cmdmem); s->regs[R_CRB_CTRL_START] |= CRB_START_INVOKE; -- 2.5.5
[Qemu-devel] [PULL v1 4/4] tests: Tests more flags of the CRB interface
Test and modify more flags of the CRB interface. Signed-off-by: Stefan Berger Reviewed-by: Marc-André Lureau --- tests/tpm-crb-test.c | 74 ++-- 1 file changed, 72 insertions(+), 2 deletions(-) diff --git a/tests/tpm-crb-test.c b/tests/tpm-crb-test.c index e1513cb..d8f9569 100644 --- a/tests/tpm-crb-test.c +++ b/tests/tpm-crb-test.c @@ -28,6 +28,10 @@ static void tpm_crb_test(const void *data) uint64_t caddr = readq(TPM_CRB_ADDR_BASE + A_CRB_CTRL_CMD_LADDR); uint32_t rsize = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_RSP_SIZE); uint64_t raddr = readq(TPM_CRB_ADDR_BASE + A_CRB_CTRL_RSP_ADDR); +uint8_t locstate = readb(TPM_CRB_ADDR_BASE + A_CRB_LOC_STATE); +uint32_t locctrl = readl(TPM_CRB_ADDR_BASE + A_CRB_LOC_CTRL); +uint32_t locsts = readl(TPM_CRB_ADDR_BASE + A_CRB_LOC_STS); +uint32_t sts = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_STS); g_assert_cmpint(FIELD_EX32(intfid, CRB_INTF_ID, InterfaceType), ==, 1); g_assert_cmpint(FIELD_EX32(intfid, CRB_INTF_ID, InterfaceVersion), ==, 1); @@ -45,9 +49,47 @@ static void tpm_crb_test(const void *data) g_assert_cmpint(caddr, >, TPM_CRB_ADDR_BASE); g_assert_cmpint(raddr, >, TPM_CRB_ADDR_BASE); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, tpmEstablished), ==, 1); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, locAssigned), ==, 0); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, activeLocality), ==, 0); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, reserved), ==, 0); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, tpmRegValidSts), ==, 1); + +g_assert_cmpint(locctrl, ==, 0); + +g_assert_cmpint(FIELD_EX32(locsts, CRB_LOC_STS, Granted), ==, 0); +g_assert_cmpint(FIELD_EX32(locsts, CRB_LOC_STS, beenSeized), ==, 0); + +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmIdle), ==, 1); +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmSts), ==, 0); + +/* request access to locality 0 */ +writeb(TPM_CRB_ADDR_BASE + A_CRB_LOC_CTRL, 1); + +/* granted bit must be set now */ +locsts = readl(TPM_CRB_ADDR_BASE + A_CRB_LOC_STS); +g_assert_cmpint(FIELD_EX32(locsts, CRB_LOC_STS, Granted), ==, 1); +g_assert_cmpint(FIELD_EX32(locsts, CRB_LOC_STS, beenSeized), ==, 0); + +/* we must have an assigned locality */ +locstate = readb(TPM_CRB_ADDR_BASE + A_CRB_LOC_STATE); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, tpmEstablished), ==, 1); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, locAssigned), ==, 1); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, activeLocality), ==, 0); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, reserved), ==, 0); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, tpmRegValidSts), ==, 1); + +/* set into ready state */ +writel(TPM_CRB_ADDR_BASE + A_CRB_CTRL_REQ, 1); + +/* TPM must not be in the idle state */ +sts = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_STS); +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmIdle), ==, 0); +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmSts), ==, 0); + memwrite(caddr, TPM_CMD, sizeof(TPM_CMD)); -uint32_t sts, start = 1; +uint32_t start = 1; uint64_t end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND; writel(TPM_CRB_ADDR_BASE + A_CRB_CTRL_START, start); do { @@ -58,12 +100,40 @@ static void tpm_crb_test(const void *data) } while (g_get_monotonic_time() < end_time); start = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_START); g_assert_cmpint(start & 1, ==, 0); + +/* TPM must still not be in the idle state */ sts = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_STS); -g_assert_cmpint(sts & 1, ==, 0); +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmIdle), ==, 0); +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmSts), ==, 0); struct tpm_hdr tpm_msg; memread(raddr, &tpm_msg, sizeof(tpm_msg)); g_assert_cmpmem(&tpm_msg, sizeof(tpm_msg), s->tpm_msg, sizeof(*s->tpm_msg)); + +/* set TPM into idle state */ +writel(TPM_CRB_ADDR_BASE + A_CRB_CTRL_REQ, 2); + +/* idle state must be indicated now */ +sts = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_STS); +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmIdle), ==, 1); +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmSts), ==, 0); + +/* relinquish locality */ +writel(TPM_CRB_ADDR_BASE + A_CRB_LOC_CTRL, 2); + +/* Granted flag must be cleared */ +sts = readl(TPM_CRB_ADDR_BASE + A_CRB_LOC_STS); +g_assert_cmpint(FIELD_EX32(sts, CRB_LOC_STS, Granted), ==, 0); +g_assert_cmpint(FIELD_EX32(sts, CRB_LOC_STS, beenSeized), ==, 0); + +/* no locality may be assigned */ +locstate = readb(TPM_CRB_ADDR_BASE + A_CRB_LOC_STATE); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, tpmEstablished), ==, 1); +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, locAssigned), ==, 0); +g_assert_cmpint(FIELD_EX32(locs
[Qemu-devel] [PULL v1 0/4] Merge tpm 2018/03/29 v1
The following patches fix the handling of some more flags of the TPM CRB interface and extend the existing TPM CRB test program with tests of more flags. Stefan The following changes since commit 043289bef4d9c0d277c45695c676a6cc9fca48a0: Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180328' into staging (2018-03-28 13:30:10 +0100) are available in the git repository at: git://github.com/stefanberger/qemu-tpm.git tags/pull-tpm-2018-03-29-1 for you to fetch changes up to 4d0d1c077e5622da95fd0f6a8e7efb72e0f222b4: tests: Tests more flags of the CRB interface (2018-03-29 17:41:03 -0400) Stefan Berger (4): tpm: CRB: set the Idle flag by default tpm: CRB: Reset Granted flag when relinquishing locality tpm: CRB: Enforce locality is requested before processing buffer tests: Tests more flags of the CRB interface hw/tpm/tpm_crb.c | 18 +- tests/tpm-crb-test.c | 74 -- 2 files changed, 89 insertions(+), 3 deletions(-) -- 2.5.5
[Qemu-devel] [PULL v1 2/4] tpm: CRB: Reset Granted flag when relinquishing locality
Reset the Granted flag when relinquishing a locality. Signed-off-by: Stefan Berger Reviewed-by: Marc-André Lureau --- hw/tpm/tpm_crb.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c index e728b55..ee6c87e 100644 --- a/hw/tpm/tpm_crb.c +++ b/hw/tpm/tpm_crb.c @@ -145,6 +145,8 @@ static void tpm_crb_mmio_write(void *opaque, hwaddr addr, case CRB_LOC_CTRL_RELINQUISH: ARRAY_FIELD_DP32(s->regs, CRB_LOC_STATE, locAssigned, 0); +ARRAY_FIELD_DP32(s->regs, CRB_LOC_STS, + Granted, 0); break; case CRB_LOC_CTRL_REQUEST_ACCESS: ARRAY_FIELD_DP32(s->regs, CRB_LOC_STS, -- 2.5.5
[Qemu-devel] [PULL v1 1/4] tpm: CRB: set the Idle flag by default
Signed-off-by: Stefan Berger Reviewed-by: Marc-André Lureau --- hw/tpm/tpm_crb.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c index ef8b80e..e728b55 100644 --- a/hw/tpm/tpm_crb.c +++ b/hw/tpm/tpm_crb.c @@ -220,6 +220,8 @@ static void tpm_crb_reset(void *dev) ARRAY_FIELD_DP32(s->regs, CRB_LOC_STATE, tpmRegValidSts, 1); +ARRAY_FIELD_DP32(s->regs, CRB_CTRL_STS, + tpmIdle, 1); ARRAY_FIELD_DP32(s->regs, CRB_INTF_ID, InterfaceType, CRB_INTF_TYPE_CRB_ACTIVE); ARRAY_FIELD_DP32(s->regs, CRB_INTF_ID, -- 2.5.5
[Qemu-devel] [PATCH for-2.12] nbd/client: Correctly handle bad server REP_META_CONTEXT
It's never a good idea to blindly read for size bytes as returned by the server without first validating that the size is within bounds; a malicious or buggy server could cause us to hang or get out of sync from reading further messages. It may be smarter to try and teach the client to cope with unexpected context ids by silently ignoring them instead of hanging up on the server, but for now, if the server doesn't reply with exactly the one context we expect, it's easier to just give up - however, if we give up for any reason other than an I/O failure, we might as well try to politely tell the server we are quitting rather than continuing. Fix some typos in the process. Signed-off-by: Eric Blake --- nbd/client.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/nbd/client.c b/nbd/client.c index 9b9b7f0ea29..4ee1d9a4a2c 100644 --- a/nbd/client.c +++ b/nbd/client.c @@ -599,8 +599,8 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc, * Set one meta context. Simple means that reply must contain zero (not * negotiated) or one (negotiated) contexts. More contexts would be considered * as a protocol error. It's also implied that meta-data query equals queried - * context name, so, if server replies with something different then @context, - * it considered as error too. + * context name, so, if server replies with something different than @context, + * it is considered an error too. * return 1 for successful negotiation, context_id is set *0 if operation is unsupported, *-1 with errp set for any other error @@ -651,6 +651,14 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc, char *name; size_t len; +if (reply.length != sizeof(received_id) + context_len) { +error_setg(errp, "Failed to negotiate meta context '%s', server " + "answered with unexpected length %u", context, + reply.length); +nbd_send_opt_abort(ioc); +return -1; +} + if (nbd_read(ioc, &received_id, sizeof(received_id), errp) < 0) { return -1; } @@ -668,6 +676,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc, "answered with different context '%s'", context, name); g_free(name); +nbd_send_opt_abort(ioc); return -1; } g_free(name); @@ -690,6 +699,12 @@ static int nbd_negotiate_simple_meta_context(QIOChannel *ioc, if (reply.type != NBD_REP_ACK) { error_setg(errp, "Unexpected reply type %" PRIx32 " expected %x", reply.type, NBD_REP_ACK); +nbd_send_opt_abort(ioc); +return -1; +} +if (reply.length) { +error_setg(errp, "Unexpected length to ACK response"); +nbd_send_opt_abort(ioc); return -1; } -- 2.14.3
[Qemu-devel] [RFC 18/18] Collection of validator.py test cases
These test case specifications demonstrate what kind of tests can be specified using validator.py. Most of them can replace existing unit tests written in C, and device-crash-test.yaml can replace most of the functionality of scripts/device-crash-test. Signed-off-by: Eduardo Habkost --- tests/validator/cpu.yaml | 5 + tests/validator/device-add.yaml| 17 + tests/validator/device-crash-test.yaml | 7 +++ tests/validator/device-help.yaml | 3 +++ tests/validator/device-introspect.yaml | 23 +++ tests/validator/device-list.yaml | 10 ++ tests/validator/just-help.yaml | 3 +++ 7 files changed, 68 insertions(+) create mode 100644 tests/validator/cpu.yaml create mode 100644 tests/validator/device-add.yaml create mode 100644 tests/validator/device-crash-test.yaml create mode 100644 tests/validator/device-help.yaml create mode 100644 tests/validator/device-introspect.yaml create mode 100644 tests/validator/device-list.yaml create mode 100644 tests/validator/just-help.yaml diff --git a/tests/validator/cpu.yaml b/tests/validator/cpu.yaml new file mode 100644 index 00..48032995ff --- /dev/null +++ b/tests/validator/cpu.yaml @@ -0,0 +1,5 @@ +# Simply check if QEMU will run without errors using a CPU model: +command-line: '$QEMU -S -cpu $CPU' +expected-failures: +# "host" doesn't always work +- CPU: 'host' diff --git a/tests/validator/device-add.yaml b/tests/validator/device-add.yaml new file mode 100644 index 00..b4c0daf70c --- /dev/null +++ b/tests/validator/device-add.yaml @@ -0,0 +1,17 @@ +# This test specification will try device_add with all device types +command-line: '$QEMU -S -machine $MACHINE,accel=$ACCEL' +monitor-commands: +- qmp: + - execute: 'device_add' +arguments: + driver: '$DEVICE' + id: 'test-device-for-$DEVICE' + - execute: 'device_del' +arguments: + id: 'test-device-for-$DEVICE' +defaults: + MACHINE: 'none' + ACCEL: 'kvm:tcg' +expected-failures: + - MACHINE: 'xenpv' + - MACHINE: 'xenfv' diff --git a/tests/validator/device-crash-test.yaml b/tests/validator/device-crash-test.yaml new file mode 100644 index 00..edaab0ee8f --- /dev/null +++ b/tests/validator/device-crash-test.yaml @@ -0,0 +1,7 @@ +# this is a very simple test case generator that will run QEMU +# using all combinations of -machine and -device. +# This replaces scripts/device-crash-test. +# TODO: use a $MACHINE_OPT variable to make -machine optional +command-line: '$QEMU -S -machine $MACHINE,accel=$ACCEL -device $DEVICE' + +#TODO: represent the whitelist from device-crash-test script somehow diff --git a/tests/validator/device-help.yaml b/tests/validator/device-help.yaml new file mode 100644 index 00..95aa5c9f5b --- /dev/null +++ b/tests/validator/device-help.yaml @@ -0,0 +1,3 @@ +# Just check if -device ...,help works: +command-line: '$QEMU -device $DEVICE,help' +qmp: false diff --git a/tests/validator/device-introspect.yaml b/tests/validator/device-introspect.yaml new file mode 100644 index 00..f5b10aff9a --- /dev/null +++ b/tests/validator/device-introspect.yaml @@ -0,0 +1,23 @@ +# This test specification is equivalent to "device/introspect/concrete" in +# tests/device-introspect-test.c +command-line: '$QEMU -S -machine $MACHINE,accel=$ACCEL' +monitor-commands: +- qmp: + - execute: 'device-list-properties' +arguments: + typename: '$DEVICE' +- hmp: 'device_add $DEVICE,help' +- hmp: 'info qom-tree' +defaults: + MACHINE: 'none' + ACCEL: 'kvm:tcg' +expected-failures: + - MACHINE: 'xenpv' + - MACHINE: 'xenfv' + +#TODO: the test runner could support something like: +# extra-cases: +# # like "device/introspect/none": +# - DEVICE: 'nonexistent' +# # like "device/introspect/abstract": +# - DEVICE: 'device' diff --git a/tests/validator/device-list.yaml b/tests/validator/device-list.yaml new file mode 100644 index 00..ac0004c05d --- /dev/null +++ b/tests/validator/device-list.yaml @@ -0,0 +1,10 @@ +# this test specification is equivalent to the +# "device/introspect/list" test case in device-introspect-test.c +command-line: '$QEMU -nodefaults -machine none' +monitor-commands: +- qmp: + - execute: qom-list-types +arguments: + implements: 'device' + abstract: true +- hmp: 'device_add help' diff --git a/tests/validator/just-help.yaml b/tests/validator/just-help.yaml new file mode 100644 index 00..84ec8d7090 --- /dev/null +++ b/tests/validator/just-help.yaml @@ -0,0 +1,3 @@ +# just run $QEMU -help and ensure it won't crash +command-line: '$QEMU -help' +qmp: false -- 2.14.3
[Qemu-devel] [RFC 15/18] qemu.py: qmp_obj() method
A wrapper for the QEMUMonitorProtocol.cmd_obj() method. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/scripts/qemu.py b/scripts/qemu.py index 00b44ea0f4..d25fe030bb 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -311,6 +311,9 @@ class QEMUMachine(object): return self._qmp.cmd(cmd, args=qmp_args) +def qmp_obj(self, *args, **kwargs): +return self._qmp.cmd_obj(*args, **kwargs) + def command(self, cmd, conv_keys=True, **args): ''' Invoke a QMP command. -- 2.14.3
[Qemu-devel] [RFC 13/18] qemu.py: 'force' parameter on shutdown()
This will allow test code to shut down the VM without trying to run a 'quit' QMP command. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index aaba04b3c1..045dca5d02 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -279,13 +279,24 @@ class QEMUMachine(object): self._popen.wait() self._post_shutdown() -def shutdown(self): -'''Terminate the VM and clean up''' -if self.is_running(): +def shutdown(self, force=None): +'''Terminate the VM and clean up + +@param force: If True, terminate QEMU process, if False always try + clean shutdown, if None terminate QEMU process if + clean shutdown fails. +''' +if not force and self.is_running(): try: self._qmp.cmd('quit') except: -self._popen.kill() +if force is None: +force = True +else: +raise + +if force and self.is_running(): +self._popen.kill() self.wait() -- 2.14.3
[Qemu-devel] [RFC 10/18] qemu.py: Set _launched = False on _post_shutdown
This will allow a sequence like: vm.launch() vm.cmd('quit') vm.wait() # triggers post-shutdown code vm.launch() to work. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index e82540a235..226d2c4d48 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -225,6 +225,8 @@ class QEMUMachine(object): shutil.rmtree(self._temp_dir) self._temp_dir = None +self._launched = False + def launch(self): """ Launch the VM and make sure we cleanup and expose the @@ -286,8 +288,6 @@ class QEMUMachine(object): command = '' LOG.warn(msg, exitcode, command) -self._launched = False - def qmp(self, cmd, conv_keys=True, **args): '''Invoke a QMP command and return the response dict''' qmp_args = dict() -- 2.14.3
[Qemu-devel] [RFC 16/18] qemu.py: is_launched() method
Useful if some code gets a QEMUMachine object in unknown state, and wants to launch it only if necessary. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/scripts/qemu.py b/scripts/qemu.py index d25fe030bb..d32b923a15 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -236,6 +236,9 @@ class QEMUMachine(object): self._launched = False +def is_launched(self): +return self._launched + def launch(self): """ Launch the VM and make sure we cleanup and expose the -- 2.14.3
[Qemu-devel] [RFC 11/18] qemu.py: Log crashes inside _post_shutdown()
This will allow us to log QEMU crashes even if the test code uses .wait() instead of .shutdown(). Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index 226d2c4d48..e19e4b34d0 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -225,6 +225,15 @@ class QEMUMachine(object): shutil.rmtree(self._temp_dir) self._temp_dir = None +exitcode = self.exitcode() +if exitcode is not None and exitcode < 0: +msg = 'qemu received signal %i: %s' +if self._qemu_full_args: +command = ' '.join(self._qemu_full_args) +else: +command = '' +LOG.warn(msg, exitcode, command) + self._launched = False def launch(self): @@ -279,15 +288,6 @@ class QEMUMachine(object): self.wait() -exitcode = self.exitcode() -if exitcode is not None and exitcode < 0: -msg = 'qemu received signal %i: %s' -if self._qemu_full_args: -command = ' '.join(self._qemu_full_args) -else: -command = '' -LOG.warn(msg, exitcode, command) - def qmp(self, cmd, conv_keys=True, **args): '''Invoke a QMP command and return the response dict''' qmp_args = dict() -- 2.14.3
[Qemu-devel] [RFC 08/18] qemu.py: Close _qmp inside _post_shutdown()
This way all shutdown-related cleanup is kept in a single place. While at it, set _qmp to None after closing the socket, to avoid trying to reuse it by mistake later. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index 4a71542c6b..1241e7f10b 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -199,6 +199,10 @@ class QEMUMachine(object): self._qmp.accept() def _post_shutdown(self): +if self._qmp is not None: +self._qmp.close() +self._qmp = None + self._load_io_log() if self._qemu_log_file is not None: self._qemu_log_file.close() @@ -250,7 +254,6 @@ class QEMUMachine(object): def wait(self): '''Wait for the VM to power off''' self._popen.wait() -self._qmp.close() self._post_shutdown() def shutdown(self): -- 2.14.3
[Qemu-devel] [RFC 17/18] validator.py script
See cover letter for a description of the new test system. TODO: code clean up TODO: write description. Signed-off-by: Eduardo Habkost --- scripts/validator.py | 724 +++ 1 file changed, 724 insertions(+) create mode 100755 scripts/validator.py diff --git a/scripts/validator.py b/scripts/validator.py new file mode 100755 index 00..4312571feb --- /dev/null +++ b/scripts/validator.py @@ -0,0 +1,724 @@ +#!/usr/bin/env python +# +# Copyright (c) 2018 Red Hat Inc +# +# Author: +# Eduardo Habkost +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +""" +QEMU validator script += + +This script will get test YAML test case specifications or Python +modules as input, and generate/run test cases based on them. + +USAGE +- + +validator.py ... -V VAR1=value1 VAR1=value2 VAR2=value3 + +specification-file is a YAML file containing the test specification. + +Example:: + +# this test specification is equivalent to the +# "device/introspect/list" test case in device-introspect-test.c +command-line: '$QEMU -nodefaults -machine none' +monitor-commands: +- qmp: + - execute: qom-list-types +arguments: + implements: 'device' + abstract: true +- hmp: 'device_add help' + + +VARIABLE EXPANSION +-- + +The test runner will try to run the test cases with all possible values +for variables appearing in the test specification. + +Some built-in variables are automatically expanded: + +* `$MACHINE` - Expands to a machine-type name supported by $QEMU +* `$ACCEL` - Expands to an accelerator name supported by $QEMU +* `$DEVICE` - Expands to a (user-creatable) device type name supported by $QEMU +* `$CPU` - Expands to a CPU model name supported by $QEMU + +Note that the $QEMU variable must be specified in th + +TEST SPECIFICATION FIELDS +- + +command-line + + +List or string, containing the QEMU command-line to be run. + +Default: '$QEMU' + + +monitor-commands + + +Mapping or list-of-mappings containing monitor commands to run. The key on each +item can be ``hmp`` or ``qmp``. The value on each entry can be a string, +mapping, or list. + +Default: None. + +TODO: not implemented yet. + + +qmp +~~~ + +Boolean. If true (the default), a QMP monitor is configured on the command-line +automatically. + +If true, the test runner will issue a ``quit`` command automatically when +testing is finished. If false, the test runner will wait until QEMU exits by +itself. + +Example:: + +# just run $QEMU -help and ensure it won't crash +command-line: ['$QEMU', '-help'] +qmp: false + + +TODO: whitelist +TODO: validate output against reference output +TODO: configure defaults for variables +TODO: compatibility with Avocado multiplexer? +""" + +import sys +import os +import string +import argparse +import pprint +import yaml +import logging +import shlex +import pipes +import re +import itertools +import traceback +import socket +from collections import OrderedDict + +sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'scripts')) +from qemu import QEMUMachine +from qmp.qmp import QMPError + +logger = logging.getLogger('qemu.tests.validator') +dbg = logger.debug + +# Python 2.7 compatibility: +shquote = getattr(shlex, 'quote', pipes.quote) + +class InvalidSpecification(Exception): +pass + +def qom_type_names(vm, **kwargs): +"""Run qom-list-types QMP command, return type names""" +types = vm.command('qom-list-types', **kwargs) +return [t['name'] for t in types] + + +def info_qdm(vm): +"""Parse 'info qdm' output""" +args = {'command-line': 'info qdm'} +devhelp = vm.command('human-monitor-command', **args) +for l in devhelp.split('\n'): +l = l.strip() +if l == '' or l.endswith(':'): +continue +d = {'name': re.search(r'name "([^"]+)"', l).group(1), + 'no-user': (re.search(', no-user', l) is not None)} +yield d + + +class QemuBinaryInfo(object): +"""Information for a specific QEMU binary""" +def __init__(self, binary): +"""Don't instantiate this directly, use get_binary_info()""" +self.binary = binary + +args = ['-S', '-machine', 'none,accel=kvm:tcg'] +
[Qemu-devel] [RFC 14/18] qemu.py: Don't try to quit cleanly on exceptions
Exceptions when starting a VM probably mean QEMU exited and the monitor won't work. It's better to simply terminate the process instead of trying to communicate using QMP. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index 045dca5d02..00b44ea0f4 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -251,7 +251,7 @@ class QEMUMachine(object): self._launch() self._launched = True except: -self.shutdown() +self.shutdown(force=True) LOG.debug('Error launching VM') if self._qemu_full_args: -- 2.14.3
[Qemu-devel] [RFC 03/18] qmp.py: Cleanly handle unexpectedly closed socket
QEMUMonitorProtocol.cmd() returns None if the socket was closed, so callers must handle this case explicltly. Signed-off-by: Eduardo Habkost --- scripts/qmp/qmp.py | 6 ++ 1 file changed, 6 insertions(+) diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py index e9cb6b2683..91b537ea04 100644 --- a/scripts/qmp/qmp.py +++ b/scripts/qmp/qmp.py @@ -73,6 +73,8 @@ class QEMUMonitorProtocol(object): raise QMPConnectError # Greeting seems ok, negotiate capabilities resp = self.cmd('qmp_capabilities') +if resp is None: +raise QMPConnectError("QMP connection unexpectedly closed") if "return" in resp: return greeting raise QMPCapabilitiesError @@ -182,6 +184,8 @@ class QEMUMonitorProtocol(object): @param name: command name (string) @param args: command arguments (dict) @param cmd_id: command id (dict, list, string or int) +@return QMP response as a Python dict or None if the connection has +been closed """ qmp_cmd = {'execute': name} if args: @@ -195,6 +199,8 @@ class QEMUMonitorProtocol(object): Build and send a QMP command to the monitor, report errors if any """ ret = self.cmd(cmd, kwds) +if ret is None: +raise QMPConnectError("QMP connection unexpectedly closed") if "error" in ret: raise Exception(ret['error']['desc']) return ret['return'] -- 2.14.3
[Qemu-devel] [RFC 05/18] qemu.py: Split _base_args()
Split it into _monitor_args() and _display_args(), so we can make monitor args optional later. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index c051c4525a..d9f85bb153 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -173,15 +173,20 @@ class QEMUMachine(object): else: return os.path.join(self._temp_dir, self._name + "-monitor.sock") -def _base_args(self): +def _monitor_args(self): addr = self._vm_monitor() if isinstance(addr, tuple): moncdev = "socket,id=mon,host=%s,port=%s" % (addr[0], addr[1]) else: moncdev = 'socket,id=mon,path=%s' % (addr) return ['-chardev', moncdev, -'-mon', 'chardev=mon,mode=control', -'-display', 'none', '-vga', 'none'] +'-mon', 'chardev=mon,mode=control'] + +def _display_args(self): +return ['-display', 'none', '-vga', 'none'] + +def _base_args(self): +return self._monitor_args() + self._display_args() def _pre_launch(self): self._temp_dir = tempfile.mkdtemp(dir=self._test_dir) -- 2.14.3
[Qemu-devel] [RFC 12/18] qemu.py: Only wait for process if it's still running
TODO: explain why Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index e19e4b34d0..aaba04b3c1 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -275,7 +275,8 @@ class QEMUMachine(object): def wait(self): '''Wait for the VM to power off''' -self._popen.wait() +if self.is_running(): +self._popen.wait() self._post_shutdown() def shutdown(self): -- 2.14.3
[Qemu-devel] [RFC 07/18] qemu.py: Use wait() logic inside shutdown()
This way we will have a single method where we wait for the QEMU process to finish. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index 84bb3da613..4a71542c6b 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -258,12 +258,10 @@ class QEMUMachine(object): if self.is_running(): try: self._qmp.cmd('quit') -self._qmp.close() except: self._popen.kill() -self._popen.wait() -self._post_shutdown() +self.wait() exitcode = self.exitcode() if exitcode is not None and exitcode < 0: -- 2.14.3
[Qemu-devel] [RFC 04/18] qemu.py: Make _vm_monitor a method
Use a function instead of a data field. Less state to keep track of, less chance of bugs. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index 08a3e9af5a..c051c4525a 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -76,7 +76,6 @@ class QEMUMachine(object): name = "qemu-%d" % os.getpid() self._name = name self._monitor_address = monitor_address -self._vm_monitor = None self._qemu_log_path = None self._qemu_log_file = None self._popen = None @@ -168,29 +167,28 @@ class QEMUMachine(object): with open(self._qemu_log_path, "r") as iolog: self._iolog = iolog.read() +def _vm_monitor(self): +if self._monitor_address is not None: +return self._monitor_address +else: +return os.path.join(self._temp_dir, self._name + "-monitor.sock") + def _base_args(self): -if isinstance(self._monitor_address, tuple): -moncdev = "socket,id=mon,host=%s,port=%s" % ( -self._monitor_address[0], -self._monitor_address[1]) +addr = self._vm_monitor() +if isinstance(addr, tuple): +moncdev = "socket,id=mon,host=%s,port=%s" % (addr[0], addr[1]) else: -moncdev = 'socket,id=mon,path=%s' % self._vm_monitor +moncdev = 'socket,id=mon,path=%s' % (addr) return ['-chardev', moncdev, '-mon', 'chardev=mon,mode=control', '-display', 'none', '-vga', 'none'] def _pre_launch(self): self._temp_dir = tempfile.mkdtemp(dir=self._test_dir) -if self._monitor_address is not None: -self._vm_monitor = self._monitor_address -else: -self._vm_monitor = os.path.join(self._temp_dir, -self._name + "-monitor.sock") self._qemu_log_path = os.path.join(self._temp_dir, self._name + ".log") self._qemu_log_file = open(self._qemu_log_path, 'wb') -self._qmp = qmp.qmp.QEMUMonitorProtocol(self._vm_monitor, -server=True) +self._qmp = qmp.qmp.QEMUMonitorProtocol(self._vm_monitor(), server=True) def _post_launch(self): self._qmp.accept() -- 2.14.3
[Qemu-devel] [RFC 02/18] qmp.py: Fix error handling for Python 3
socket.error doesn't behave like a tuple in Python 3, but we can use error.args on both Python 2.7 and 3. Signed-off-by: Eduardo Habkost --- scripts/qmp/qmp.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py index 078ce65f3b..e9cb6b2683 100644 --- a/scripts/qmp/qmp.py +++ b/scripts/qmp/qmp.py @@ -108,7 +108,7 @@ class QEMUMonitorProtocol(object): try: self.__json_read() except socket.error as err: -if err[0] == errno.EAGAIN: +if err.args[0] == errno.EAGAIN: # No data available pass self.__sock.setblocking(1) @@ -168,9 +168,9 @@ class QEMUMonitorProtocol(object): try: self.__sock.sendall(json.dumps(qmp_cmd).encode('utf-8')) except socket.error as err: -if err[0] == errno.EPIPE: +if err.args[0] == errno.EPIPE: return -raise socket.error(err) +raise resp = self.__json_read() self.logger.debug("<<< %s", resp) return resp -- 2.14.3
[Qemu-devel] [RFC 09/18] qemu.py: Make monitor optional
This will allow us to use the QEMUMachine class on test cases that don't use QMP at all. Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index 1241e7f10b..e82540a235 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -54,7 +54,7 @@ class QEMUMachine(object): ''' def __init__(self, binary, args=None, wrapper=None, name=None, - test_dir="/var/tmp", monitor_address=None, + test_dir="/var/tmp", monitor_address=None, qmp=True, socket_scm_helper=None): ''' Initialize a QEMUMachine @@ -65,6 +65,7 @@ class QEMUMachine(object): @param name: prefix for socket and log file names (default: qemu-PID) @param test_dir: where to create socket and log file @param monitor_address: address for QMP monitor +@param qmp: if False, disable QMP monitor @param socket_scm_helper: helper program, required for send_fd_scm()" @note: Qemu process is not started until launch() is used. ''' @@ -90,6 +91,7 @@ class QEMUMachine(object): self._test_dir = test_dir self._temp_dir = None self._launched = False +self._qmp_enabled = qmp # just in case logging wasn't configured by the main script: logging.basicConfig() @@ -168,13 +170,18 @@ class QEMUMachine(object): self._iolog = iolog.read() def _vm_monitor(self): -if self._monitor_address is not None: +if not self._qmp_enabled: +return None +elif self._monitor_address is not None: return self._monitor_address else: return os.path.join(self._temp_dir, self._name + "-monitor.sock") def _monitor_args(self): addr = self._vm_monitor() +if addr is None: +return [] + if isinstance(addr, tuple): moncdev = "socket,id=mon,host=%s,port=%s" % (addr[0], addr[1]) else: @@ -193,10 +200,14 @@ class QEMUMachine(object): self._qemu_log_path = os.path.join(self._temp_dir, self._name + ".log") self._qemu_log_file = open(self._qemu_log_path, 'wb') -self._qmp = qmp.qmp.QEMUMonitorProtocol(self._vm_monitor(), server=True) +self._qmp = \ +qmp.qmp.QEMUMonitorProtocol(self._vm_monitor(), server=True) \ +if self._qmp_enabled \ +else None def _post_launch(self): -self._qmp.accept() +if self._qmp: +self._qmp.accept() def _post_shutdown(self): if self._qmp is not None: -- 2.14.3
[Qemu-devel] [RFC 06/18] qemu.py: Move _load_io_log() call to _post_shutdown()
All callers of _post_shutdown() call _load_io_log(), so it's easier to simply call it inside _post_shutdown(). Signed-off-by: Eduardo Habkost --- scripts/qemu.py | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/scripts/qemu.py b/scripts/qemu.py index d9f85bb153..84bb3da613 100644 --- a/scripts/qemu.py +++ b/scripts/qemu.py @@ -199,6 +199,7 @@ class QEMUMachine(object): self._qmp.accept() def _post_shutdown(self): +self._load_io_log() if self._qemu_log_file is not None: self._qemu_log_file.close() self._qemu_log_file = None @@ -250,7 +251,6 @@ class QEMUMachine(object): '''Wait for the VM to power off''' self._popen.wait() self._qmp.close() -self._load_io_log() self._post_shutdown() def shutdown(self): @@ -263,7 +263,6 @@ class QEMUMachine(object): self._popen.kill() self._popen.wait() -self._load_io_log() self._post_shutdown() exitcode = self.exitcode() -- 2.14.3
[Qemu-devel] [RFC 00/18] QEMU validator: A method to specify QEMU crash-test cases
Rationale - Today we have a few test cases in the QEMU tree that just run QEMU with a few command-line options, run some QMP or HMP commands, and check if QEMU didn't crash. Some examples: * scripts/device-crash-test * The test case suggested by Thomas at: Subject: [RFC PATCH] tests: Add a device_add/del HMP test * Some test cases in tests/device-introspect-test.c * tests/cpu-plug-test.c * tests/display-vga-test.c * tests/drive_del-test.c * tests/machine-none-test.c * tests/test-hmp.c * Some device test skeletons: ac97-test.c, e1000-test.c, eepro100-test.c, es1370-test.c, i82801b11-test.c, intel-hda-test.c, ioh3420-test.c, ipoctal232-test.c, ne2000-test.c, nvme-test.c, pcnet-test.c, spapr-phb-test.c, test-netfilter.c, tpci200-test.c, usb-hcd-ohci-test.c, usb-hcd-xhci-test.c, virtio-balloon-test.c, virtio-console-test.c, virtio-serial-test.c, vmxnet3-test.c. All of those test cases require lots of boilerplate code just to run some combinations of QEMU command-line options and monitor commands. In addition to that, writing and reviewing test code like the examples above is more complex than it could be. Some past examples from qemu-devel: * Subject: [RFC 5/6] device-crash-test: Basic device_add support * Subject: [RFC 6/6] device-crash-test: Multi-device device_add test * Subject: [RFC PATCH] tests/device-introspect: Test devices with all machines, not only with "none" The Proposal This series introduces the scripts/validator.py test runner, and a set of example test specifications in tests/validator. Each test specification is a YAML file containing the QEMU command-line and monitor commands to run, and can contain variables like $MACHINE, $DEVICE, $CPU, that can be expanded automatically by the test runner. This way, scripts/device-crash-test can be replaced by a one-line test specification: command-line: '$QEMU -S -machine $MACHINE,accel=$ACCEL -device $DEVICE' The device_add/device_del test case written by Thomas can be specified as: command-line: '$QEMU -S -machine $MACHINE,accel=$ACCEL' monitor-commands: - qmp: - execute: 'device_add' arguments: driver: '$DEVICE' id: 'test-device-for-$DEVICE' - execute: 'device_del' arguments: id: 'test-device-for-$DEVICE' and the test runner will take care of running test cases with with all valid machine-types, device-types, and accelerators. For other examples, see the last patch in this series ("Collection of validator.py test cases"). Most of the examples replace existing C or Python test cases with simple declarative test specifications. I expect this to be very useful when somebody is fixing a QEMU crash and just want to add a very simple reproducer to ensure the crash will never be introduced again. (I actually suggest we _require_ people to include a tests/validator test specifications every time they fix a crash that can be reproduced using just QMP commands). Features Current features: * Automatic variable expansion, as described above * A simple method to specify defaults (e.g. to test only "-machine none" or only "accel=kvm:tcg" unless running in --full mode) * A simple method to represent expected failures (it can't replace the device-crash-test whitelist mechanism, yet) Caveats --- The test runner code still needs some clean up. I'm not happy with the current variable enumeration/expansion code, and plan to make it cleaner. How to test --- You can run all example test cases on /usr/bin/qemu-system-x86_64 with: $ ./scripts/validator.py tests/validator/* -V QEMU=/usr/bin/qemu-system-x86_64 However, the device-crash-test.yaml test specification still takes too long to run and reports too many false positives. So I suggest doing this instead: $ ./scripts/validator.py $(ls tests/validator/* | grep -v device-crash-test) \ -V QEMU=/usr/bin/qemu-system-x86_64 On my system, this runs ~700 test cases in ~15 seconds. What's out of scope? The following are _not_ goals of this system: * Replacing the work done by the Avocado team to support Avocado-based tests. * Making the YAML test specification a fully-featured imperative language. Anything that requires extra logic still needs to be written in C or Python. Future Plans Short term: * Choose a better name for the test runner * Support a list of known-failures similar to the whitelist in device-crash-test. * Replace device-crash-test completely * "make check" rules Long term: * stdout and QMP command output validation. I plan to use this to write machine-type/guest-ABI compatibility test cases that will detect guest ABI breakage as soon as possible. * See if the Avocado multiplexer API can be used to implement variable expansion. * See if we can make some of the features (e.g. automatic variable expansion) available to Python test cases * Fuzzing QMP commands and QEMU command-
[Qemu-devel] [RFC 01/18] qmp.py: Make it safe to call close() any time
This will allow us to simplify the error handling and shutdown logic in qemu.py. Signed-off-by: Eduardo Habkost --- scripts/qmp/qmp.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py index 5c8cf6a056..078ce65f3b 100644 --- a/scripts/qmp/qmp.py +++ b/scripts/qmp/qmp.py @@ -244,7 +244,9 @@ class QEMUMonitorProtocol(object): def close(self): self.__sock.close() -self.__sockfile.close() +if self.__sockfile is not None: +self.__sockfile.close() +self.__sockfile = None def settimeout(self, timeout): self.__sock.settimeout(timeout) -- 2.14.3
Re: [Qemu-devel] [PATCH v2 0/6] postcopy block time calculation + ppc32 build fix
* Alexey Perevalov (a.pereva...@samsung.com) wrote: > V1-V2 > accidentally appeared __nocheck after rebase > this patch set also rebased after latest pull request > > This patch set includes patches which were reverted by commit > ee86981bd, due to build problem on 32 powerpc/arm architecture. > Also it includes patch to fix build > ([PATCH v4] migration: change blocktime type to uint32_t), but that > patch was merged into: > migration: add postcopy blocktime ctx into MigrationIncomingState > migration: calculate vCPU blocktime on dst side > migration: add postcopy total blocktime into query-migrate OK, lets get this in when 2.13 opens. Thanks! Dave > > based on > commit c6740fc88ecd8f5cf3cf3185ee112c3eea41caa2 > "hw/rdma: Implementation of PVRDMA device" > > Alexey Perevalov (6): > migration: introduce postcopy-blocktime capability > migration: add postcopy blocktime ctx into MigrationIncomingState > migration: calculate vCPU blocktime on dst side > migration: postcopy_blocktime documentation > migration: add blocktime calculation into migration-test > migration: add postcopy total blocktime into query-migrate > > docs/devel/migration.rst | 14 +++ > hmp.c| 15 +++ > migration/migration.c| 51 - > migration/migration.h| 13 +++ > migration/postcopy-ram.c | 268 > ++- > migration/trace-events | 6 +- > qapi/migration.json | 17 ++- > tests/migration-test.c | 16 +++ > 8 files changed, 392 insertions(+), 8 deletions(-) > > -- > 2.7.4 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 0/8] nvdimm: guarantee persistence of QEMU writes to persistent memory
* Haozhong Zhang (haozhong.zh...@intel.com) wrote: > Post-copy with NVDIMM currently fails with message "Postcopy on shared > RAM (...) is not yet supported". Is it enough? What does it say now that postcopy-shared support is in? Dave -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 5/8] migration/ram: ensure write persistence on loading zero pages to PMEM
* Haozhong Zhang (haozhong.zh...@intel.com) wrote: > When loading a zero page, check whether it will be loaded to > persistent memory If yes, load it by libpmem function > pmem_memset_nodrain(). Combined with a call to pmem_drain() at the > end of RAM loading, we can guarantee all those zero pages are > persistently loaded. > > Depending on the host HW/SW configurations, pmem_drain() can be > "sfence". Therefore, we do not call pmem_drain() after each > pmem_memset_nodrain(), or use pmem_memset_persist() (equally > pmem_memset_nodrain() + pmem_drain()), in order to avoid unnecessary > overhead. > > Signed-off-by: Haozhong Zhang I'm still thinking this is way too invasive; especially the next patch that touches qemu_file. One thing that would help a little, but not really enough, would be to define a : struct MemOps { void (*copy)(like a memcpy); void (*set)(like a memset); } then you could have: struct MemOps normalops = {memcpy, memset}; struct MemOps pmem_nodrain_ops = { pmem_memcopy_nodrain, pmem_memset_nodrain }; then things like ram_handle_compressed would be: void ram_handle_compressed(void *host, uint8_t ch, uint64_t size, const struct MemOps *mem) { if (ch != 0 || !is_zero_range(host, size)) { mem->set(host, ch,size); } } which means the change is pretty tiny to each function. > diff --git a/migration/rdma.c b/migration/rdma.c > index da474fc19f..573bcd2cb0 100644 > --- a/migration/rdma.c > +++ b/migration/rdma.c > @@ -3229,7 +3229,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, > void *opaque) > host_addr = block->local_host_addr + > (comp->offset - block->offset); > > -ram_handle_compressed(host_addr, comp->value, comp->length); > +ram_handle_compressed(host_addr, comp->value, comp->length, > false); Is that right? Is RDMA not allowed to work on PMEM? (and anyway this call is a normal clear rather than an actual RDMA op). Dave > break; > > case RDMA_CONTROL_REGISTER_FINISHED: > diff --git a/stubs/pmem.c b/stubs/pmem.c > index 03d990e571..a65b3bfc6b 100644 > --- a/stubs/pmem.c > +++ b/stubs/pmem.c > @@ -17,3 +17,12 @@ void *pmem_memcpy_persist(void *pmemdest, const void *src, > size_t len) > { > return memcpy(pmemdest, src, len); > } > + > +void *pmem_memset_nodrain(void *pmemdest, int c, size_t len) > +{ > +return memset(pmemdest, c, len); > +} > + > +void pmem_drain(void) > +{ > +} > -- > 2.14.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v2 0/3] Add new CD-ROM related qtests
On 03/16/2018 01:39 AM, Thomas Huth wrote: > With one of my clean-up patches (see commit 1454509726719e0933c800), I > recently accidentially broke the "-cdrom" parameter (more precisely > "-drive if=scsi") on a couple of boards, since there was no error > detected during the "make check" regression testing. This is clearly an > indication that we are lacking tests in this area. > So this small patch series now introduces some tests for CD-ROM drives: > The first two patches introduce the possibility to check that booting > from CD-ROM drives still works fine for x86 and s390x, and the third > patch adds a test that certain machines can at least still be started > with the "-cdrom" parameter (i.e. that test would have catched the > mistake that I did with my SCSI cleanup patch). > > v2: > - Use g_spawn_sync() instead of execlp() to run genisoimage > - The "-cdrom" parameter test is now run on all architectures (with >machine "none" for the machines that are not explicitly checked) > - Some rewordings and improved comments here and there > > Thomas Huth (3): > tests/boot-sector: Add magic bytes to s390x boot code header > tests/cdrom-test: Test booting from CD-ROM ISO image file > tests/cdrom-test: Test that -cdrom parameter is working > > tests/Makefile.include | 2 + > tests/boot-sector.c| 9 +- > tests/cdrom-test.c | 222 > + > 3 files changed, 230 insertions(+), 3 deletions(-) > create mode 100644 tests/cdrom-test.c > New file, but no edit to MAINTAINERS.
Re: [Qemu-devel] linux-user: missing synchronization in CPU object realization
On Thu, Mar 29, 2018 at 7:15 AM, Max Filippov wrote: > I'm running xtensa linux-user application that rapidly creates > a lot of threads and I observe the following assertion failure: > > (process:26953): GLib-CRITICAL **: g_hash_table_iter_next: assertion > 'ri->version == ri->hash_table->version' failed > ** > ERROR:qemu/qom/object.c:1663:object_get_canonical_path_component: code > should not be reached [...] > This happens because multiple threads concurrently call > object_property_set_bool to realize cpu object in cpu_create, > which results in the following call chain: > device_set_realized -> object_property_add_child -> > object_property_add -> g_hash_table_insert > inserting newly created CPUs as children to /unattached without > any locking. Looking at it some more I wonder why do we build this object tree n linux-user in the first place? It doesn't seem to be used anywhere. -- Thanks. -- Max
Re: [Qemu-devel] linux-user: missing synchronization in CPU object realization
Hi, I think multi-threading and locking questions are more for Alex Bennée. Alex, any comment? Thanks, Laurent Le 29/03/2018 à 16:15, Max Filippov a écrit : > Hello, > > I'm running xtensa linux-user application that rapidly creates > a lot of threads and I observe the following assertion failure: > > (process:26953): GLib-CRITICAL **: g_hash_table_iter_next: assertion > 'ri->version == ri->hash_table->version' failed > ** > ERROR:qemu/qom/object.c:1663:object_get_canonical_path_component: code > should not be reached > > The backtrace: > > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 > #1 0x7667c42a in __GI_abort () at abort.c:89 > #2 0x779345a5 in g_assertion_message > (domain=domain@entry=0x0, file=file@entry=0x557f8ef8 > "/home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c", > line=line@entry=1662, > func=func@entry=0x557f96c0 <__func__.17879> > "object_get_canonical_path_component", > message=message@entry=0x7fffa80808a0 "code should not be reached") at > ././glib/gtestutils.c:2432 > #3 0x7793463a in g_assertion_message_expr (domain=0x0, > file=0x557f8ef8 > "/home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c", line=1662, > func=0x557f96c0 <__func__.17879> > "object_get_canonical_path_component", expr=) at > ././glib/gtestutils.c:2455 > #4 0x556c6cec in object_get_canonical_path_component > (obj=0x7fffa80777d0) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1662 > #5 0x556c6d1e in object_get_canonical_path > (obj=0x7fffa80777d0) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1672 > #6 0x556bc693 in device_set_realized (obj=0x7fffa80777d0, > value=true, errp=0x77f020f8) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/hw/core/qdev.c:874 > #7 0x556c75c4 in property_set_bool (obj=0x7fffa80777d0, > v=0x7fffa8080750, name=0x557f889c "realized", > opaque=0x7fffa8077660, errp=0x77f020f8) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1923 > #8 0x556c58ea in object_property_set (obj=0x7fffa80777d0, > v=0x7fffa8080750, name=0x557f889c "realized", errp=0x77f020f8) > at /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1122 > #9 0x556c8901 in object_property_set_qobject > (obj=0x7fffa80777d0, value=0x7fffa80806a0, name=0x557f889c > "realized", errp=0x77f020f8) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/qom-qobject.c:27 > #10 0x556c5b62 in object_property_set_bool > (obj=0x7fffa80777d0, value=true, name=0x557f889c "realized", > errp=0x77f020f8) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1188 > #11 0x556c166f in cpu_create (typename=0x57cb0d20 > "dc233c-xtensa-cpu") at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/cpu.c:64 > #12 0x5564ec0e in cpu_copy (env=0x5924d420) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/main.c:4121 > #13 0x5565d4e1 in do_fork (env=0x5924d420, flags=4001536, > newsp=829419520, parent_tidptr=829420792, newtls=829421856, > child_tidptr=829420792) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/syscall.c:6350 > #14 0x556664c7 in do_syscall (cpu_env=0x5924d420, num=116, > arg1=4001536, arg2=829419520, arg3=829420792, arg4=829421856, > arg5=829420792, arg6=829420688, arg7=0, arg8=0) > at /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/syscall.c:10003 > #15 0x5564e85b in cpu_loop (env=0x5924d420) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/main.c:3999 > #16 0x5565d3d9 in clone_func (arg=0x7fffbcd0) at > /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/syscall.c:6313 > #17 0x769ee494 in start_thread (arg=0x77f04700) at > pthread_create.c:333 > #18 0x76730acf in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:97 > > This happens because multiple threads concurrently call > object_property_set_bool to realize cpu object in cpu_create, > which results in the following call chain: > device_set_realized -> object_property_add_child -> > object_property_add -> g_hash_table_insert > inserting newly created CPUs as children to /unattached without > any locking. > > I've put locking around the > object_property_set_bool(OBJECT(cpu), true, "realized", &err); > call in the cpu_create and that fixed this issue. Any suggestion > regarding how to fix this properly? >
[Qemu-devel] [PULL 0/1] RISC-V: Critical fixes for QEMU 2.12
The following changes since commit 47d3b60858d90ac8a0cc3a72af7f95c96781125a: Merge remote-tracking branch 'remotes/riscv/tags/riscv-qemu-2.12-important-fixes' into staging (2018-03-28 22:13:38 +0100) are available in the git repository at: https://github.com/riscv/riscv-qemu.git tags/riscv-qemu-2.12-critical-fixes for you to fetch changes up to b02403363f1056421d120c8e974fdf9c76a84f95: RISC-V: Workaround for critical mstatus.FS bug (2018-03-29 10:22:26 -0700) RISC-V: Critical fixes for QEMU 2.12 This series includes changes that are considered release critical, such as floating point register file corruption under SMP Linux due to incorrect handling of mstatus.FS. This workaround will be replaced with a more comprehensive fix for mstatus.FS handling in QEMU 2.13. Michael Clark (1): RISC-V: Workaround for critical mstatus.FS bug target/riscv/op_helper.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-)
[Qemu-devel] [PULL 1/1] RISC-V: Workaround for critical mstatus.FS bug
This change is a workaround for a bug where mstatus.FS is not correctly reporting dirty after operations that modify floating point registers. This a critical bug or RISC-V in QEMU as it results in floating point register file corruption when running SMP Linux due to task migration and possibly uniprocessor Linux if more than one process is using the FPU. This workaround will return dirty if mstatus.FS is switched from off to initial or clean. According to the specification it is legal for an implementation to return only off, or dirty. Cc: Palmer Dabbelt Cc: Sagar Karandikar Cc: Bastian Koppelmann Cc: Peter Maydell Cc: Alex Bennée Cc: Richard Henderson Cc: Philippe Mathieu-Daudé Tested-by: Richard W.M. Jones Signed-off-by: Michael Clark Reviewed-by: Richard Henderson --- target/riscv/op_helper.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c index e34715d..7c6068b 100644 --- a/target/riscv/op_helper.c +++ b/target/riscv/op_helper.c @@ -144,8 +144,21 @@ void csr_write_helper(CPURISCVState *env, target_ulong val_to_write, } mstatus = (mstatus & ~mask) | (val_to_write & mask); -int dirty = (mstatus & MSTATUS_FS) == MSTATUS_FS; -dirty |= (mstatus & MSTATUS_XS) == MSTATUS_XS; + +/* Note: this is a workaround for an issue where mstatus.FS + does not report dirty after floating point operations + that modify floating point state. This workaround is + technically compliant with the RISC-V Privileged + specification as it is legal to return only off, or dirty. + at the expense of extra floating point save/restore. */ + +/* FP is always dirty or off */ +if (mstatus & MSTATUS_FS) { +mstatus |= MSTATUS_FS; +} + +int dirty = ((mstatus & MSTATUS_FS) == MSTATUS_FS) | +((mstatus & MSTATUS_XS) == MSTATUS_XS); mstatus = set_field(mstatus, MSTATUS_SD, dirty); env->mstatus = mstatus; break; -- 2.7.0
Re: [Qemu-devel] [PATCH v4 2/2] tpm: extend TPM TIS with state migration support
Hi On Thu, Mar 29, 2018 at 1:56 AM, Stefan Berger wrote: > On 03/28/2018 11:41 AM, Marc-André Lureau wrote: >> >> Hi >> >> On Thu, Mar 1, 2018 at 8:59 PM, Stefan Berger >> wrote: >>> >>> + >>> +static const VMStateDescription vmstate_locty = { >>> +.name = "loc", >>> +.version_id = 1, >>> +.minimum_version_id = 0, >>> +.minimum_version_id_old = 0, >> >> I don't understand the problem there is leaving all the version fields >> to 0, just like CRB. >> >>> +.fields = (VMStateField[]) { >>> +VMSTATE_UINT32(state, TPMLocality), >>> +VMSTATE_UINT32(inte, TPMLocality), >>> +VMSTATE_UINT32(ints, TPMLocality), >>> +VMSTATE_UINT8(access, TPMLocality), >>> +VMSTATE_UINT32(sts, TPMLocality), >>> +VMSTATE_UINT32(iface_id, TPMLocality), >>> +VMSTATE_END_OF_LIST(), >>> +} >>> +}; >>> + >>> static const VMStateDescription vmstate_tpm_tis = { >>> .name = "tpm", >>> -.unmigratable = 1, >>> +.version_id = 1, >>> +.minimum_version_id = 0, >>> +.minimum_version_id_old = 0, >> >> same >> >> If you remove the version fields: Reviewed-by: Marc-André Lureau >> > > > This is the error I got when setting .version_id = 0 (on both) and doing a > localhost migration > > qemu-system-x86_64: Missing section footer for tpm-tis > qemu-system-x86_64: load of migration failed: Invalid argument It's clearly not the most friendly error message, but I debugged it, you just have to specify the right version for VMSTATE_STRUCT_ARRAY: 0. VMSTATE_STRUCT_ARRAY(loc, TPMState, TPM_TIS_NUM_LOCALITIES, 0, vmstate_locty, TPMLocality), Then it all works with version 0 (or default value) thanks > > It must have something to do with the nesting invoked by > VMSTATE_STRUCT_ARRAY(loc,...) below. > > > > >> >> >> >>> +.pre_save = tpm_tis_pre_save, >>> +.fields = (VMStateField[]) { >>> +VMSTATE_BUFFER(buffer, TPMState), >>> +VMSTATE_UINT16(rw_offset, TPMState), >>> +VMSTATE_UINT8(active_locty, TPMState), >>> +VMSTATE_UINT8(aborting_locty, TPMState), >>> +VMSTATE_UINT8(next_locty, TPMState), >>> + >>> +VMSTATE_STRUCT_ARRAY(loc, TPMState, TPM_TIS_NUM_LOCALITIES, 1, >>> + vmstate_locty, TPMLocality), >>> + >>> +VMSTATE_END_OF_LIST() >>> +} >>> }; >>> >>> static Property tpm_tis_properties[] = { >>> -- >>> 2.5.5 >>> > > -- Marc-André Lureau
Re: [Qemu-devel] [PATCH v4 3/9] cli: add -preconfig option
On Thu, Mar 29, 2018 at 03:01:12PM +0200, Igor Mammedov wrote: > On Wed, 28 Mar 2018 16:17:32 -0300 > Eduardo Habkost wrote: > > > On Tue, Mar 27, 2018 at 05:05:41PM +0200, Igor Mammedov wrote: > > > On Fri, 23 Mar 2018 18:25:08 -0300 > > > Eduardo Habkost wrote: > > > > > > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote: > > > [...] > > > > > diff --git a/vl.c b/vl.c > > > > > index 3ef04ce..69b1997 100644 > > > > > --- a/vl.c > > > > > +++ b/vl.c > > > > > @@ -593,7 +593,7 @@ static int default_driver_check(void *opaque, > > > > > QemuOpts *opts, Error **errp) > > > > > /***/ > > > > > /* QEMU state */ > > > > > > > > > > -static RunState current_run_state = RUN_STATE_PRELAUNCH; > > > > > +static RunState current_run_state = RUN_STATE_PRECONFIG; > > > > > > > > > > /* We use RUN_STATE__MAX but any invalid value will do */ > > > > > static RunState vmstop_requested = RUN_STATE__MAX; > > > > > @@ -606,6 +606,9 @@ typedef struct { > > > > > > > > > > static const RunStateTransition runstate_transitions_def[] = { > > > > > /* from -> to */ > > > > > +{ RUN_STATE_PRECONFIG, RUN_STATE_PRELAUNCH }, > > > > > +{ RUN_STATE_PRECONFIG, RUN_STATE_INMIGRATE }, > > > > > > > > Don't this mean -preconfig and -incoming could work together? > > > theoretically yes, but its not the reason why this transition is here. > > > It's mimicking existing approach where initial state > > >{ RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE }, > > > were allowed to move to the next possible (including RUN_STATE_INMIGRATE) > > > > > > > I still don't get it. Where this definition of "next possible" > > comes from? If -incoming and -preconfig don't work together, why > > is PRECONFIG -> INMIGRATE migration considered possible? > I'd think it's the same (replacement) hack which we use now >RUN_STATE_PRELAUNCH -> RUN_STATE_INMIGRATE > to allow following code to succeed: > > case QEMU_OPTION_incoming: > if (!incoming) { > runstate_set(RUN_STATE_INMIGRATE); > } > incoming = optarg; > > I'd get rid of it and move state switching to the actual place > where migration starts if it were just that simple, but from > a quick look around it did look rather risky. > That's why I abandoned an idea of changing it within this series. Yeah, I now see that the initial state is PRECONFIG. > > > > > > { RUN_STATE_DEBUG, RUN_STATE_RUNNING }, > > > > > { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE }, > > > > > { RUN_STATE_DEBUG, RUN_STATE_PRELAUNCH }, > > > > > @@ -1629,6 +1632,7 @@ static pid_t shutdown_pid; > > > > > static int powerdown_requested; > > > > > static int debug_requested; > > > > > static int suspend_requested; > > > > > +static bool preconfig_exit_requested = true; > > > > > static WakeupReason wakeup_reason; > > > > > static NotifierList powerdown_notifiers = > > > > > NOTIFIER_LIST_INITIALIZER(powerdown_notifiers); > > > > > @@ -1713,6 +1717,11 @@ static int qemu_debug_requested(void) > > > > > return r; > > > > > } > > > > > > > > > > +void qemu_exit_preconfig_request(void) > > > > > +{ > > > > > +preconfig_exit_requested = true; > > > > > +} > > > > > + > > > > > /* > > > > > * Reset the VM. Issue an event unless @reason is > > > > > SHUTDOWN_CAUSE_NONE. > > > > > */ > > > > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void) > > > > > RunState r; > > > > > ShutdownCause request; > > > > > > > > > > +if (preconfig_exit_requested) { > > > > > +if (runstate_check(RUN_STATE_PRECONFIG)) { > > > > > > > > Is it possible to have preconfig_exit_request set outside of > > > > RUN_STATE_PRECONFIG? When and why? > > > preconfig_exit_requested is initialized with TRUE and > > > in combo with '-inmigrate' we need this runstate check. > > > > I think this now makes sense to me. It still looks confusing, > > but I don't have a better suggestion right now. > > > > Except... > > > > Why exactly do you need to use main_loop() and > > main_loop_should_exit() for the preconfig loop? What about a > > separate preconfig_loop() and preconfig_loop_should_exit() > > function? > that would duplicate main_loop() for practically no benefit at all, > hence I'm reusing existing main_loop()/main_loop_should_exit() > just by adding relevant exit condition. It also easier to read > when state transitions are kept close to each other. I wouldn't say that main_loop_should_exit() is easy to read, but I understand that this is the existing style, so no objection. > > > > > it's the same as it was with > > > { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE }, > > > which I probably should remove (I need to check it though) > > > > > > > > +runstate
Re: [Qemu-devel] [PATCH] tcg: fix 16-byte vector operations detection
Hi, I think it would be good to have this fix (or something similar) in -rc2. Without this we can't build QEMU 2.12 PPC64LE for RHEL7 [1] Any comment? Thanks, Laurent [1] we have bugzilla for RHEL7 GCC, https://bugzilla.redhat.com/show_bug.cgi?id=1546728 but as configure provides a way to workaround the problem we should also fix configure. On 28/03/2018 15:31, Laurent Vivier wrote: > configure tries to detect if the compiler > supports 16-byte vector operations. > > As stated in the comment of the detection > program, there is a problem with the system > compiler on GCC on Centos 7. > > This program doesn't actually detect the problem > with GCC on RHEL7 on PPC64LE (Red Hat 4.8.5-28). > > This patch updates the test to look more like > it is in QEMU helpers, and now detects the problem. > > The error reported is: > > CC ppc64-softmmu/accel/tcg/tcg-runtime-gvec.o > ..//accel/tcg/tcg-runtime-gvec.c: In function ‘helper_gvec_shl8i’: > ../accel/tcg/tcg-runtime-gvec.c:558:26: internal compiler error: in > emit_move_insn, at expr.c:3495 >*(vec8 *)(d + i) = *(vec8 *)(a + i) << shift; > ^ > Fixes: db43267 "tcg: Add generic vector expanders" > Signed-off-by: Laurent Vivier > --- > configure | 8 > 1 file changed, 8 insertions(+) > > diff --git a/configure b/configure > index 4d0e92c96c..a2301dd0dc 100755 > --- a/configure > +++ b/configure > @@ -5054,6 +5054,14 @@ static S2 c2; > static S4 c4; > static S8 c8; > static int i; > +void helper(void *d, void *a, int shift, int i); > +void helper(void *d, void *a, int shift, int i) > +{ > + *(U1 *)(d + i) = *(U1 *)(a + i) << shift; > + *(U2 *)(d + i) = *(U2 *)(a + i) << shift; > + *(U4 *)(d + i) = *(U4 *)(a + i) << shift; > + *(U8 *)(d + i) = *(U8 *)(a + i) << shift; > +} > int main(void) > { >a1 += b1; a2 += b2; a4 += b4; a8 += b8; >
Re: [Qemu-devel] [PATCH v4 2/9] numa: split out NumaOptions parsing into parse_NumaOptions()
On Thu, Mar 29, 2018 at 03:05:09PM +0200, Igor Mammedov wrote: > On Wed, 28 Mar 2018 15:54:28 -0300 > Eduardo Habkost wrote: > > > On Tue, Mar 27, 2018 at 03:08:27PM +0200, Igor Mammedov wrote: > > > On Fri, 23 Mar 2018 17:42:18 -0300 > > > Eduardo Habkost wrote: > > > > > > > On Mon, Mar 12, 2018 at 02:11:08PM +0100, Igor Mammedov wrote: > > > > > it will allow to reuse parse_NumaOptions() for parsing > > > > > configuration commands received via QMP interface > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > --- > > > > > include/sysemu/numa.h | 1 + > > > > > numa.c| 48 > > > > > +--- > > > > > 2 files changed, 30 insertions(+), 19 deletions(-) > > > > > > > > > > diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h > > > > > index 21713b7..7a0ae75 100644 > > > > > --- a/include/sysemu/numa.h > > > > > +++ b/include/sysemu/numa.h > > > > > @@ -22,6 +22,7 @@ struct NumaNodeMem { > > > > > }; > > > > > > > > > > extern NodeInfo numa_info[MAX_NODES]; > > > > > +int parse_numa(void *opaque, QemuOpts *opts, Error **errp); > > > > > void parse_numa_opts(MachineState *ms); > > > > > void numa_complete_configuration(MachineState *ms); > > > > > void query_numa_node_mem(NumaNodeMem node_mem[]); > > > > > diff --git a/numa.c b/numa.c > > > > > index 126c649..2b1d292 100644 > > > > > --- a/numa.c > > > > > +++ b/numa.c > > > > > @@ -169,28 +169,11 @@ static void parse_numa_distance(NumaDistOptions > > > > > *dist, Error **errp) > > > > > have_numa_distance = true; > > > > > } > > > > > > > > > > -static int parse_numa(void *opaque, QemuOpts *opts, Error **errp) > > > > > +static > > > > > +void parse_NumaOptions(MachineState *ms, NumaOptions *object, Error > > > > > **errp) > > > > > > > > I wonder if we should rename the parse_numa_{node,distance}() > > > > functions to configure_numa_{node,distance}(), and this one > > > > configure_numa(). These functions don't parse anything, anymore. > > > I'd preffer to do it in another patch on top of this series > > > (added it my TODO list) > > > > I agree with renaming parse_numa*() later, but the new function > > you are creating can have a more reasonable name as it doesn't > > parse anything. > > > how about: s/parse_NumaOptions/set_NumaOptions/ No strong preference. "register", "configure", "set" would all be good enough to me. I would avoid the weird underline_and_CamelCase naming style, though (this style seems to be used only by generated QAPI code). -- Eduardo
Re: [Qemu-devel] [PATCH v4 3/9] cli: add -preconfig option
On Thu, Mar 29, 2018 at 01:43:03PM +0200, Igor Mammedov wrote: > On Wed, 28 Mar 2018 16:21:48 -0300 > Eduardo Habkost wrote: > > > On Wed, Mar 28, 2018 at 01:48:35PM +0200, Igor Mammedov wrote: > > > On Tue, 27 Mar 2018 17:05:41 +0200 > > > Igor Mammedov wrote: > > > > > > > On Fri, 23 Mar 2018 18:25:08 -0300 > > > > Eduardo Habkost wrote: > > > > > > > > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote: > > > > [...] > > > [...] > > > > > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void) > > > > > > RunState r; > > > > > > ShutdownCause request; > > > > > > > > > > > > +if (preconfig_exit_requested) { > > > > > > +if (runstate_check(RUN_STATE_PRECONFIG)) { > > > > > > > > > > Is it possible to have preconfig_exit_request set outside of > > > > > RUN_STATE_PRECONFIG? When and why? > > > > preconfig_exit_requested is initialized with TRUE and > > > > in combo with '-inmigrate' we need this runstate check. > > > > it's the same as it was with > > > > { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE }, > > > > which I probably should remove (I need to check it though) > > > [...] > > > > > > > > > @@ -4594,6 +4623,10 @@ int main(int argc, char **argv, char **envp) > > > > > > } > > > > > > parse_numa_opts(current_machine); > > > > > > > > > > > > +/* do monitor/qmp handling at preconfig state if requested */ > > > > > > +main_loop(); > > > > > > > > > > Wouldn't it be simpler to do "if (!preconfig) { main_loop(); }" > > > > > instead of entering main_loop() just to exit immediately? > > > > The thought didn't cross my mind, it might work and more readable > > > > as one doesn't have to jump into main_loop() to find out that > > > > it would exit immediately. > > > > I'll try to it on respin. > > > Well doing as suggested end ups more messy: > > > > > > @@static bool main_loop_should_exit(void) > > > ... > > > if (preconfig_exit_requested) { > > > runstate_set(RUN_STATE_PRELAUNCH); > > > > > > return true; > > > } > > > > > > @@main > > > /* do monitor/qmp handling at preconfig state if requested */ > > > if (!preconfig_exit_requested) { > > > main_loop(); > > > } else if (runstate_check(RUN_STATE_PRECONFIG)) { > > > runstate_set(RUN_STATE_PRELAUNCH); > > > } > > > > This doesn't make sense to me. Why would we enter > > RUN_STATE_PRECONFIG state if -preconfig is not used at all? > because of RUN_STATE_PRECONFIG becomes new initial state of > our state machine where we start of (used to be RUN_STATE_PRELAUNCH) Oh, I missed that part. > Lets call it variant 1: > > with this we have 2 possible transitions: > RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH (machine_init) > > and > > RUN_STATE_PRECONFIG -> RUN_STATE_INMIGRATE >ugly but it was the same with RUN_STATE_PRELAUNCH initial transition That explains a lot, thanks. > > Another variant 2, in case we switch to RUN_STATE_PRECONFIG only on -preconfig > transitions would be > RUN_STATE_PRELAUNCH -> RUN_STATE_PRECONFIG > (allow switch from initial to -preconfig) > > RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH > > while the last is valid transition, the 1st one isn't really > valid because of (beside of switching from initial state) it > allows bouncing back to RUN_STATE_PRECONFIG later. > > If we consider only state machine transitions, I think it's > cleaner to start with variant 1 with the same > -inmigrate hack we already have (which potentially could > be fixed later), than allowing arbitrary bouncing to > RUN_STATE_PRECONFIG at later stage. > > With this approach all processing before machine_init() > would run at RUN_STATE_PRECONFIG and then we would switch > to RUN_STATE_PRELAUNCH. Even though it is far reaching > goal but at least that's where we should be moving to > have sane initialization flow in vl.c Thanks, now variant 1 makes more sense to me. But I really miss here are very clear and explicit descriptions of what each state really mean, and what are the differences between them. It looks like the existing description for `prelaunch` isn't accurate: # @prelaunch: QEMU was started with -S and guest has not started This is false, as QEMU can be in `prelaunch` state even if -S is not used. Also, this is the description you proposed for `preconfig`: # @preconfig: QEMU is paused before board specific init callback is executed. # The state is reachable only if -preconfig CLI option is used. # (Since 2.12) This seems wrong as well: the `prelaunch` state is reachable even if `-preconfig` isn't used in the command-line (because it is the initial state). > > > > preconfig_exit_requested = false; > > > ... > > > > > > I'd prefer original v4 approach, where only main_loop_should_exit() > > > has to deal with state transitions and book-keeping. > > > > If the abov
Re: [Qemu-devel] [PATCH v3 3/4] qobject: replace qobject_incref/QINCREF qobject_decref/QDECREF
On 03/29/2018 10:48 AM, Marc-André Lureau wrote: Now that we can safely call QOBJECT() on QObject * and children types, we can have a single macro to ref/unref the object. Change the incref/decref prefix name for the more common ref/unref. Note that before the patch, "QDECREF(obj)" was problematic because it would expand to "obj ? obj : ...", which could evaluate "obj" multiple times. Signed-off-by: Marc-André Lureau --- +++ b/include/qapi/qmp/qobject.h @@ -48,14 +48,6 @@ struct QObject { __x ? container_of(&(__x)->base, QObject, base) : NULL; \ }) -/* High-level interface for qobject_incref() */ -#define QINCREF(obj) \ -qobject_incref(QOBJECT(obj)) - -/* High-level interface for qobject_decref() */ -#define QDECREF(obj) \ -qobject_decref(obj ? QOBJECT(obj) : NULL) - Interesting choice to move the macros... /* Required for qobject_to() */ #define QTYPE_CAST_TO_QNull QTYPE_QNULL #define QTYPE_CAST_TO_QNum QTYPE_QNUM @@ -78,10 +70,7 @@ static inline void qobject_init(QObject *obj, QType type) obj->base.type = type; } -/** - * qobject_incref(): Increment QObject's reference count - */ -static inline void qobject_incref(QObject *obj) +static inline void qobject_ref(QObject *obj) { if (obj) { obj->base.refcnt++; @@ -102,11 +91,7 @@ bool qobject_is_equal(const QObject *x, const QObject *y); */ void qobject_destroy(QObject *obj); -/** - * qobject_decref(): Decrement QObject's reference count, deallocate - * when it reaches zero - */ -static inline void qobject_decref(QObject *obj) +static inline void qobject_unref(QObject *obj) { assert(!obj || obj->base.refcnt); if (obj && --obj->base.refcnt == 0) { @@ -114,6 +99,19 @@ static inline void qobject_decref(QObject *obj) } } +/** + * qobject_ref(): Increment QObject's reference count + */ +#define qobject_ref(obj)\ +qobject_ref(QOBJECT(obj)) ...below the functions of the same name. C preprocessor rules guarantee that you don't get infinite expansion, although I did a double-take the first time I read through the patch (especially since your v2 used the name qobject_ref_impl() for the function, distinct from the macro name). Worth a comment? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-devel] [PATCH v3 2/4] qobject: introduce QObjectCommon
On 03/29/2018 10:48 AM, Marc-André Lureau wrote: By moving the common fields to a QObjectCommon, QObject can be a type which also has a 'base' QObjectCommon field. This allows to write a generic QOBJECT() macro that will work with any QObject type, including QObject itself. The container_of() macro ensures that the object to cast has a QObjectCommon base field, give me some type safety guarantees. However, for it to work properly, all QObject types must have 'base' at offset 0 (which is ensured by static checking from previous patch) Commit message should mention the rationale you gave in v2 of NOT using a typedef for QObjectCommon (which was intentional so as to minimize the chance that the type gets abused). We could even go so far as to name it QObject_, with trailing underscore, rather than QObjectCommon, to make it obvious it is not for normal use. Signed-off-by: Marc-André Lureau --- -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-devel] [PATCH 16/16] tcg: remove tb_lock
Emilio G. Cota writes: > Use mmap_lock in user-mode to protect TCG state and the page > descriptors. > In !user-mode, each vCPU has its own TCG state, so no locks > needed. Per-page locks are used to protect the page descriptors. > > Per-TB locks are used in both modes to protect TB jumps. > > Some notes: > > - tb_lock is removed from notdirty_mem_write by passing a > locked page_collection to tb_invalidate_phys_page_fast. > > - tcg_tb_lookup/remove/insert/etc have their own internal lock(s), > so there is no need to further serialize access to them. > > - do_tb_flush is run in a safe async context, meaning no other > vCPU threads are running. Therefore acquiring mmap_lock there > is just to please tools such as thread sanitizer. > > - Not visible in the diff, but tb_invalidate_phys_page already > has an assert_memory_lock. > > - cpu_io_recompile is !user-only, so no mmap_lock there. > > - Added mmap_unlock()'s before all siglongjmp's that could > be called in user-mode while mmap_lock is held. > + Added an assert for !have_mmap_lock() after returning from > the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic. > > Performance numbers before/after: > > Host: AMD Opteron(tm) Processor 6376 > > ubuntu 17.04 ppc64 bootup+shutdown time > > 700 +-+--++--++---+*--+-+ > |++ ++ + *B| > | before ***B***** *| > |tb lock removal ###D### ***| > 600 +-+ *** +-+ > | ** #| > |*B* #D| > | *** * ## | > 500 +-+*** ### +-+ > | * *** ### | > |*B* # ## | > | ** * #D#| > 400 +-+ **## +-+ > | ** ### | > |** ##| > | ** # ## | > 300 +-+ * B* #D# +-+ > |B ***### | > |* ** | > | * *** ### | > 200 +-+ B *B #D# +-+ > | #B* * ## # | > | #*## | > |+ D##D# ++ ++| > 100 +-+--++--++---++--+-+ >18 16 Guest CPUs 48 64 > png: https://imgur.com/HwmBHXe > > debian jessie aarch64 bootup+shutdown time > > 90 +-+--+-+-++++--+-+ > |+ + ++++| > | before ***B***B| > 80 +tb lock removal ###D### **D +-+ > | **###| > | **## | > 70 +-+ ** # +-+ > | ** ## | > | ** #| > 60 +-+ *B ## +-+ > | ** ## | > |*** #D | > 50 +-+ *** ## +-+ > | * ** ### | > | **B* ###| > 40 +-+ # ## +-+ > | #D# | > | ***B** ###| > 30 +-+B***B** +-+ > |B * * # ### | > | B ###D# | > 20 +-+ D ##D## +-+ > | D#| > |+ + ++++| > 10 +-+--+-+-++++--+-+ > 1 8 16
Re: [Qemu-devel] [PATCH v3 4/4] qobject: modify qobject_ref() to assert on NULL and return obj
On 03/29/2018 10:48 AM, Marc-André Lureau wrote: Following a discussion on the mailing list: while it may be convenient to accept NULL value in qobject_unref() (for similar reasons as free() accepts NULL), it is a probably a bad idea to accept NULL argument in qobject_ref(). Furthermore, it is convenient and more clear to call qobject_ref() at the time when the reference is associated with a variable, or argument. For this reason, make qobject_ref() return the same pointer as given. Signed-off-by: Marc-André Lureau --- @@ -101,13 +101,18 @@ static inline void qobject_unref(QObject *obj) /** * qobject_ref(): Increment QObject's reference count + * @obj: a #QObject or children type instance (must not be NULL) s/children/child/ + * + * Returns: the same @obj. The type of @obj will be propagated to the + * return type. */ #define qobject_ref(obj)\ -qobject_ref(QOBJECT(obj)) +(typeof(obj)) qobject_ref(QOBJECT(obj)) You're missing outer (). There are cases where '(cast) (expr)' and '((cast)(expr))' can behave differently when combined with surrounding syntax; for example, -> has higher precedence than cast. Consider: qobject_ref(my_qdict)->size; As you wrote it, you would attempt to dereference the 'size' member of void* (compile error), prior to casting that result to QDict*; with the parenthesis, you get the intended dereference of the size member of the QDict pointer. +++ b/block.c @@ -5134,8 +5134,8 @@ static bool append_open_options(QDict *d, BlockDriverState *bs) continue; } -qobject_ref(qdict_entry_value(entry)); -qdict_put_obj(d, qdict_entry_key(entry), qdict_entry_value(entry)); +qdict_put_obj(d, qdict_entry_key(entry), + qobject_ref(qdict_entry_value(entry))); However, I like this simplification that your patch enables. How did you find candidate spots? Manually, or via Coccinelle? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-devel] [PATCH qemu] vfio: Print address space address when cannot map MMIO for DMA
On Thu, 29 Mar 2018 12:55:51 +1100 Alexey Kardashevskiy wrote: > On 29/3/18 8:03 am, Auger Eric wrote: > > Hi Alexey, Alex, > > On 22/03/18 09:18, Alexey Kardashevskiy wrote: > >> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added > >> an error message if a passed memory section address or size is not aligned > >> to the minimal IOMMU page size. However although it checks > >> offset_within_address_space for the alignment, offset_within_region is > >> printed instead which makes it harder to find out what device caused > >> the message so this replaces offset_within_region with > >> offset_within_address_space. > >> > >> While we are here, this replaces '..' with 'size=' (as the second number > >> is a size, not an end offset) and adds a memory region name. > >> > >> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions" > >> Signed-off-by: Alexey Kardashevskiy > > The patch indeed fixes the reported format issues. > > > > However I have some other concerns with the info that is reported to the > > end-user. See below. > > > > Assigning an e1000e device with a 64kB host, here are the traces I get: > > > > Region XXX is not aligned to 0x1 and cannot be mapped for DMA > > > > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 > > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0 > > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 > > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0 > > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 > > "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8 > > "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0 > > "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8 > > > > It took me some time to understand what happens but here is now my > > understanding: > > > > 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr = > > bar_addr in vfio_sub_page_bar_update_mapping() I see the following values: > > > > UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a -> UNMAPPED -> 0x100a -> > > UNMAPPED -> 0x100e > > > > vfio_sub_page_bar_update_mapping() create mrs with base bar at > > 0x100a and 0x100e successively, hence the > > vfio_listener_region_add on 0x100a. Indeed, 0x0-0x50 msix-table mmio > > region induces some memory section at 0x100a0050 and 0x100e50 successively. > > > > However this is confusing for the end-user who only has access to the > > final mapping (0x100e) through lspi [1]. > > > The trace shows that at least at some point the BAR actually was > 0x100a, I find this info rather useful than confusing as it might > expose a bug of some sort, for example. > > The user also has access to the MR name which is the host PCI address + BAR > index, how is that confusing? Seriously? You're expecting an awfully lot from users to be able to interpret this in a meaningful way and not just freak out and file bugs. > > 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension > > of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping > >> 3) Also it happens that I have a virtio-scsi-pci device that is put just > > after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has > > e1000e gets aligned to 64k but this one avoids the alignment for some reason? > > > > its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to > > 64kB (with prio 0), we have those MMIO regions which result in new > > memory sections, which cause vfio_listener_region_add calls. This > > typically explains why we get a warning on 0x100e4808 (0xb7f8). By the > > way I don't get why we don't have a trace for "0004:01:00.0 BAR 3 > > mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between > > virtio-scsi-pci msic-table and pba. > > > "info mtree -f" might give a hint how MRs got resolved, could it end up > being emulated (==skipped by the vfio listener)? > > > > So at the end of the day, my fear is all those info become really > > frightening and confusing for the end-user and even not relevant > > (0x100a stuff). So I would rather simply remove the trace in 2.12 > > until we find a place where we could generate a clear hint for the > > end-user, suggesting to relocate the msix bar. > > > > Thoughts? > > Please post complete "lspci -v" output for both pci devices and "info mtree > -f" (in addition to "info mtree", not instead). > > In general, the error_report() could be removed as we did not have any > indication of not mapping before so we do not have to start now, I am just > missing the point here - the message exposes potentially not-working P2P > which is useful for people who care about that and do not know if actually > might work. Rather than silencing it, I'd convert it into the trace point. A trace point would be an ok option too. P2P is something I don't want to lose, but it has never worked in this particular case and I don't think it's worth alarming users with new warnings where nothing has changed and
Re: [Qemu-devel] [PATCH 15/16] translate-all: remove tb_lock mention from cpu_restore_state_from_tb
Emilio G. Cota writes: > tb_lock was needed when the function did retranslation. However, > since fca8a500d519 ("tcg: Save insn data and use it in > cpu_restore_state_from_tb") we don't do retranslation. > > Get rid of the comment. I think we need to modify the comment in cpu_restore_state as well: Either way we need return early as we can't resolve it here. > > Signed-off-by: Emilio G. Cota > --- > accel/tcg/translate-all.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c > index 9ab6477..ee49d03 100644 > --- a/accel/tcg/translate-all.c > +++ b/accel/tcg/translate-all.c > @@ -357,9 +357,7 @@ static int encode_search(TranslationBlock *tb, uint8_t > *block) > return p - block; > } > > -/* The cpu state corresponding to 'searched_pc' is restored. > - * Called with tb_lock held. > - */ > +/* The cpu state corresponding to 'searched_pc' is restored */ > static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb, > uintptr_t searched_pc) > { -- Alex Bennée
Re: [Qemu-devel] [PATCH for-2.12 v5] iotests: Test abnormally large size in compressed cluster descriptor
On 2018-03-29 14:07, Alberto Garcia wrote: > L2 entries for compressed clusters have a field that indicates the > number of sectors used to store the data in the image. > > That's however not the size of the compressed data itself, just the > number of sectors where that data is located. The actual data size is > usually not a multiple of the sector size, and therefore cannot be > represented with this field. > > The way it works is that QEMU reads all the specified sectors and > starts decompressing the data until there's enough to recover the > original uncompressed cluster. If there are any bytes left that > haven't been decompressed they are simply ignored. > > One consequence of this is that even if the size field is larger than > it needs to be QEMU can handle it just fine: it will read more data > from disk but it will ignore the extra bytes. > > This test creates an image with two compressed clusters that use 5 > sectors (2.5 KB) each, increases the size field to the maximum (8192 > sectors, or 4 MB) and verifies that the data can be read without > problems. > > This test is important because while the decompressed data takes > exactly one cluster, the maximum value allowed in the compressed size > field is twice the cluster size. So although QEMU won't produce images > with such large values we need to make sure that it can handle them. > > Another effect of increasing the size field is that it can make > it include data from the following host cluster(s). In this case > 'qemu-img check' will detect that the refcounts are not correct, and > we'll need to rebuild them. > > Additionally, this patch also tests that decreasing the size corrupts > the image since the original data can no longer be recovered. In this > case QEMU returns an error when trying to read the compressed data, > but 'qemu-img check' doesn't see anything wrong if the refcounts are > consistent. > > One possible task for the future is to make 'qemu-img check' verify > the sizes of the compressed clusters, by trying to decompress the data > and checking that the size stored in the L2 entry is correct. > > Signed-off-by: Alberto Garcia > --- > v5: Use 'write -c' instead of 'write' followed by 'convert' [Max] > Add TODO comment explaining that the size of compressed clusters > should also be corrected when it's too large in order to avoid > referencing other unrelated clusters. > > v4: Resend for 2.12 > > v3: Add TODO comment, as suggested by Eric. > > Corrupt the length of the second compressed cluster as well so the > uncompressed data would span three host clusters. > > v2: We now have two scenarios where we make QEMU read data from the > next host cluster and from beyond the end of the image. This > version also runs qemu-img check on the corrupted image. > > If the size field is too small, reading fails but qemu-img check > succeeds. > > If the size field is too large, reading succeeds but qemu-img > check fails (this can be repaired, though). > --- > tests/qemu-iotests/122 | 47 > ++ > tests/qemu-iotests/122.out | 33 > 2 files changed, 80 insertions(+) Now without any convert I have no idea what this test case is doing in 122, but, oh well. Thanks, applied to my block branch: https://github.com/XanClic/qemu/commits/block Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH qemu] vfio: Print address space address when cannot map MMIO for DMA
On Thu, 29 Mar 2018 16:42:12 +0200 Auger Eric wrote: > Hi Alex, > > On 29/03/18 00:13, Alex Williamson wrote: > > On Wed, 28 Mar 2018 23:03:23 +0200 > > Auger Eric wrote: > > > >> Hi Alexey, Alex, > >> On 22/03/18 09:18, Alexey Kardashevskiy wrote: > >>> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added > >>> an error message if a passed memory section address or size is not aligned > >>> to the minimal IOMMU page size. However although it checks > >>> offset_within_address_space for the alignment, offset_within_region is > >>> printed instead which makes it harder to find out what device caused > >>> the message so this replaces offset_within_region with > >>> offset_within_address_space. > >>> > >>> While we are here, this replaces '..' with 'size=' (as the second number > >>> is a size, not an end offset) and adds a memory region name. > >>> > >>> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions" > >>> Signed-off-by: Alexey Kardashevskiy > >> The patch indeed fixes the reported format issues. > >> > >> However I have some other concerns with the info that is reported to the > >> end-user. See below. > >> > >> Assigning an e1000e device with a 64kB host, here are the traces I get: > >> > >> Region XXX is not aligned to 0x1 and cannot be mapped for DMA > >> > >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 > >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0 > >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 > >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0 > >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 > >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8 > >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0 > >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8 > >> > >> It took me some time to understand what happens but here is now my > >> understanding: > >> > >> 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr = > >> bar_addr in vfio_sub_page_bar_update_mapping() I see the following values: > >> > >> UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a -> UNMAPPED -> 0x100a -> > >> UNMAPPED -> 0x100e > >> > >> vfio_sub_page_bar_update_mapping() create mrs with base bar at > >> 0x100a and 0x100e successively, hence the > >> vfio_listener_region_add on 0x100a. Indeed, 0x0-0x50 msix-table mmio > >> region induces some memory section at 0x100a0050 and 0x100e50 successively. > >> > >> However this is confusing for the end-user who only has access to the > >> final mapping (0x100e) through lspi [1]. > >> > >> 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension > >> of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping > >> > >> 3) Also it happens that I have a virtio-scsi-pci device that is put just > >> after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has > >> its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to > >> 64kB (with prio 0), we have those MMIO regions which result in new > >> memory sections, which cause vfio_listener_region_add calls. This > >> typically explains why we get a warning on 0x100e4808 (0xb7f8). By the > >> way I don't get why we don't have a trace for "0004:01:00.0 BAR 3 > >> mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between > >> virtio-scsi-pci msic-table and pba. > >> > >> So at the end of the day, my fear is all those info become really > >> frightening and confusing for the end-user and even not relevant > >> (0x100a stuff). So I would rather simply remove the trace in 2.12 > >> until we find a place where we could generate a clear hint for the > >> end-user, suggesting to relocate the msix bar. > >> > >> Thoughts? > > > > Yep, I think that's probably the right approach. Everything works as > > it should and nothing has worse access in 2.12 than it did in 2.11, > > there's only the opportunity to make things better with msi-x > > relocation and I don't think we want to error on the side of reporting > > too many errors that users cannot understand in an attempt to advise > > them of an unsupported option that might be better. Let's remove the > > error report for 2.12 and think about how we could make a concise > > suggestion, once, while initializing the device. Who's posting the > > patch? Thanks, > > > > Alex > > > > PS - Why isn't the firmware/kernel on aarch64 making an attempt to > > align PCI resources on page boundaries? > > so according to Laszlo's input I understand virtio-scsi-pci region 1 is > 4K and the spec only mandates a natural alignment. Right, as I noted in the other thread, it's not a spec requirement, more of a best practices feature. > Does > > pci=realloc,resource_alignment=pci:: change it? (I'm > > not sure if PCI_ANY_ID works for that option) > no, it does not. Not entirely unexpected, and since you're dealing with a virtio-scsi device it could certainly be your boot device, so firmware leaving it
[Qemu-devel] [PATCH v3 4/4] qobject: modify qobject_ref() to assert on NULL and return obj
Following a discussion on the mailing list: while it may be convenient to accept NULL value in qobject_unref() (for similar reasons as free() accepts NULL), it is a probably a bad idea to accept NULL argument in qobject_ref(). Furthermore, it is convenient and more clear to call qobject_ref() at the time when the reference is associated with a variable, or argument. For this reason, make qobject_ref() return the same pointer as given. Signed-off-by: Marc-André Lureau --- include/qapi/qmp/qobject.h| 15 ++- block.c | 8 block/blkdebug.c | 7 +++ block/blkverify.c | 8 block/null.c | 3 +-- block/nvme.c | 3 +-- block/quorum.c| 4 ++-- monitor.c | 19 +++ qapi/qobject-input-visitor.c | 6 ++ qapi/qobject-output-visitor.c | 7 +++ qobject/qdict.c | 33 +++-- 11 files changed, 48 insertions(+), 65 deletions(-) diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h index 4183d81f00..bd3a618a73 100644 --- a/include/qapi/qmp/qobject.h +++ b/include/qapi/qmp/qobject.h @@ -70,11 +70,11 @@ static inline void qobject_init(QObject *obj, QType type) obj->base.type = type; } -static inline void qobject_ref(QObject *obj) +static inline void *qobject_ref(QObject *obj) { -if (obj) { -obj->base.refcnt++; -} +assert(obj); +obj->base.refcnt++; +return obj; } /** @@ -101,13 +101,18 @@ static inline void qobject_unref(QObject *obj) /** * qobject_ref(): Increment QObject's reference count + * @obj: a #QObject or children type instance (must not be NULL) + * + * Returns: the same @obj. The type of @obj will be propagated to the + * return type. */ #define qobject_ref(obj)\ -qobject_ref(QOBJECT(obj)) +(typeof(obj)) qobject_ref(QOBJECT(obj)) /** * qobject_unref(): Decrement QObject's reference count, deallocate * when it reaches zero + * @obj: a #QObject or children type instance (can be NULL) */ #define qobject_unref(obj) \ qobject_unref(QOBJECT(obj)) diff --git a/block.c b/block.c index 55a79845be..676e57f562 100644 --- a/block.c +++ b/block.c @@ -5134,8 +5134,8 @@ static bool append_open_options(QDict *d, BlockDriverState *bs) continue; } -qobject_ref(qdict_entry_value(entry)); -qdict_put_obj(d, qdict_entry_key(entry), qdict_entry_value(entry)); +qdict_put_obj(d, qdict_entry_key(entry), + qobject_ref(qdict_entry_value(entry))); found_any = true; } @@ -5207,8 +5207,8 @@ void bdrv_refresh_filename(BlockDriverState *bs) * suffices without querying the (exact_)filename of this BDS. */ if (bs->file->bs->full_open_options) { qdict_put_str(opts, "driver", drv->format_name); -qobject_ref(bs->file->bs->full_open_options); -qdict_put(opts, "file", bs->file->bs->full_open_options); +qdict_put(opts, "file", + qobject_ref(bs->file->bs->full_open_options)); bs->full_open_options = opts; } else { diff --git a/block/blkdebug.c b/block/blkdebug.c index 689703d386..053372c22e 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -845,13 +845,12 @@ static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options) opts = qdict_new(); qdict_put_str(opts, "driver", "blkdebug"); -qobject_ref(bs->file->bs->full_open_options); -qdict_put(opts, "image", bs->file->bs->full_open_options); +qdict_put(opts, "image", qobject_ref(bs->file->bs->full_open_options)); for (e = qdict_first(options); e; e = qdict_next(options, e)) { if (strcmp(qdict_entry_key(e), "x-image")) { -qobject_ref(qdict_entry_value(e)); -qdict_put_obj(opts, qdict_entry_key(e), qdict_entry_value(e)); +qdict_put_obj(opts, qdict_entry_key(e), + qobject_ref(qdict_entry_value(e))); } } diff --git a/block/blkverify.c b/block/blkverify.c index 3cffcb1ca6..754cc9e857 100644 --- a/block/blkverify.c +++ b/block/blkverify.c @@ -291,10 +291,10 @@ static void blkverify_refresh_filename(BlockDriverState *bs, QDict *options) QDict *opts = qdict_new(); qdict_put_str(opts, "driver", "blkverify"); -qobject_ref(bs->file->bs->full_open_options); -qdict_put(opts, "raw", bs->file->bs->full_open_options); -qobject_ref(s->test_file->bs->full_open_options); -qdict_put(opts, "test", s->test_file->bs->full_open_options); +qdict_put(opts, "raw", + qobject_ref(bs->file->bs->full_open_options)); +qdict_put(opts, "test", + qobject_ref(s->test_file->bs->full_open_options)); bs->full_open_options = opts; } diff --git a/block
[Qemu-devel] [PATCH v3 0/4] RFC: simplify qobject refcount
Hi, This series aims to get rid of the distinction between QObject, that must use qobject_incref/qobject_decref and its various derived types that have to use QINCREF/QDECREF. Instead, replace it with qobject_ref/qobject_unref for all types. v3: after v2 review with Eric and Paolo - fix clang ubsan warning when a null pointer is given to QOBJECT. - add a patch to make qobject_ref() assert on null pointer, and return the same pointer, simplifying some code. v2: - use the QObjectCommon base approach suggested by Paolo and Eric - remove need for QEMU_GENERIC Marc-André Lureau (4): qobject: ensure base is at offset 0 qobject: introduce QObjectCommon qobject: replace qobject_incref/QINCREF qobject_decref/QDECREF qobject: modify qobject_ref() to assert on NULL and return obj scripts/qapi/events.py | 2 +- include/qapi/qmp/qbool.h| 2 +- include/qapi/qmp/qdict.h| 2 +- include/qapi/qmp/qlist.h| 2 +- include/qapi/qmp/qnull.h| 4 +- include/qapi/qmp/qnum.h | 2 +- include/qapi/qmp/qobject.h | 75 +++- include/qapi/qmp/qstring.h | 2 +- block.c | 82 ++--- block/blkdebug.c| 7 +- block/blkverify.c | 8 +-- block/crypto.c | 4 +- block/gluster.c | 4 +- block/iscsi.c | 2 +- block/nbd.c | 4 +- block/nfs.c | 4 +- block/null.c| 3 +- block/nvme.c| 3 +- block/parallels.c | 4 +- block/qapi.c| 2 +- block/qcow.c| 8 +-- block/qcow2.c | 8 +-- block/qed.c | 4 +- block/quorum.c | 4 +- block/rbd.c | 14 ++-- block/sheepdog.c| 12 ++-- block/snapshot.c| 4 +- block/ssh.c | 4 +- block/vdi.c | 2 +- block/vhdx.c| 4 +- block/vpc.c | 4 +- block/vvfat.c | 2 +- block/vxhs.c| 2 +- blockdev.c | 16 ++--- hw/i386/acpi-build.c| 12 ++-- hw/ppc/spapr_drc.c | 2 +- hw/usb/xen-usb.c| 4 +- migration/migration.c | 4 +- migration/qjson.c | 2 +- monitor.c | 57 +++ qapi/qapi-dealloc-visitor.c | 4 +- qapi/qmp-dispatch.c | 6 +- qapi/qobject-input-visitor.c| 10 ++- qapi/qobject-output-visitor.c | 11 ++- qemu-img.c | 18 ++--- qemu-io.c | 6 +- qga/main.c | 12 ++-- qmp.c | 4 +- qobject/json-parser.c | 10 +-- qobject/qdict.c | 49 + qobject/qjson.c | 2 +- qobject/qlist.c | 4 +- qobject/qobject.c | 21 -- qom/object.c| 16 ++--- qom/object_interfaces.c | 2 +- target/ppc/translate_init.c | 2 +- target/s390x/cpu_models.c | 2 +- tests/ahci-test.c | 6 +- tests/check-qdict.c | 106 ++-- tests/check-qjson.c | 84 +++--- tests/check-qlist.c | 8 +-- tests/check-qlit.c | 10 +-- tests/check-qnull.c | 10 +-- tests/check-qnum.c | 28 tests/check-qobject.c | 2 +- tests/check-qstring.c | 10 +-- tests/cpu-plug-test.c | 4 +- tests/device-introspect-test.c | 24 +++ tests/drive_del-test.c | 4 +- tests/libqos/libqos.c | 8 +-- tests/libqos/pci-pc.c | 2 +- tests/libqtest.c| 24 +++ tests/machine-none-test.c | 2 +- tests/migration-test.c | 24 +++ tests/numa-test.c | 16 ++--- tests/pvpanic-test.c| 2 +- tests/q35-test.c| 2 +- tests/qmp-test.c| 38 +- tests/qom-test.c| 8 +-- tests/tco-test.c| 12 ++-- tests/test-char.c | 2 +- tests/test-keyval.c | 82 ++--- tests/test-netfilter.c | 26 +++ tests/test-qemu-opts.c | 14 ++-- tests/test-qga.c| 76 ++-- tests/test-qmp-cmds.c | 24 +++ tests/test-qmp-event.c
[Qemu-devel] [PATCH v3 2/4] qobject: introduce QObjectCommon
By moving the common fields to a QObjectCommon, QObject can be a type which also has a 'base' QObjectCommon field. This allows to write a generic QOBJECT() macro that will work with any QObject type, including QObject itself. The container_of() macro ensures that the object to cast has a QObjectCommon base field, give me some type safety guarantees. However, for it to work properly, all QObject types must have 'base' at offset 0 (which is ensured by static checking from previous patch) Signed-off-by: Marc-André Lureau --- include/qapi/qmp/qbool.h | 2 +- include/qapi/qmp/qdict.h | 2 +- include/qapi/qmp/qlist.h | 2 +- include/qapi/qmp/qnull.h | 2 +- include/qapi/qmp/qnum.h| 2 +- include/qapi/qmp/qobject.h | 29 ++--- include/qapi/qmp/qstring.h | 2 +- qobject/qobject.c | 12 ++-- tests/check-qdict.c| 6 +++--- 9 files changed, 33 insertions(+), 26 deletions(-) diff --git a/include/qapi/qmp/qbool.h b/include/qapi/qmp/qbool.h index b9a44a1bfe..38e6287973 100644 --- a/include/qapi/qmp/qbool.h +++ b/include/qapi/qmp/qbool.h @@ -17,7 +17,7 @@ #include "qapi/qmp/qobject.h" struct QBool { -QObject base; +struct QObjectCommon base; bool value; }; diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h index 2cc3e906f7..3e54df2291 100644 --- a/include/qapi/qmp/qdict.h +++ b/include/qapi/qmp/qdict.h @@ -25,7 +25,7 @@ typedef struct QDictEntry { } QDictEntry; struct QDict { -QObject base; +struct QObjectCommon base; size_t size; QLIST_HEAD(,QDictEntry) table[QDICT_BUCKET_MAX]; }; diff --git a/include/qapi/qmp/qlist.h b/include/qapi/qmp/qlist.h index 5c673acb06..5cf26554ae 100644 --- a/include/qapi/qmp/qlist.h +++ b/include/qapi/qmp/qlist.h @@ -22,7 +22,7 @@ typedef struct QListEntry { } QListEntry; struct QList { -QObject base; +struct QObjectCommon base; QTAILQ_HEAD(,QListEntry) head; }; diff --git a/include/qapi/qmp/qnull.h b/include/qapi/qmp/qnull.h index c992ee2ae1..32d4ce98af 100644 --- a/include/qapi/qmp/qnull.h +++ b/include/qapi/qmp/qnull.h @@ -16,7 +16,7 @@ #include "qapi/qmp/qobject.h" struct QNull { -QObject base; +struct QObjectCommon base; }; extern QNull qnull_; diff --git a/include/qapi/qmp/qnum.h b/include/qapi/qmp/qnum.h index 3e47475b2c..3326547f2c 100644 --- a/include/qapi/qmp/qnum.h +++ b/include/qapi/qmp/qnum.h @@ -45,7 +45,7 @@ typedef enum { * convert under the hood. */ struct QNum { -QObject base; +struct QObjectCommon base; QNumKind kind; union { int64_t i64; diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h index 5206ff9ee1..96085b7dfa 100644 --- a/include/qapi/qmp/qobject.h +++ b/include/qapi/qmp/qobject.h @@ -34,13 +34,19 @@ #include "qapi/qapi-builtin-types.h" -struct QObject { +struct QObjectCommon { QType type; size_t refcnt; }; -/* Get the 'base' part of an object */ -#define QOBJECT(obj) (&(obj)->base) +struct QObject { +struct QObjectCommon base; +}; + +#define QOBJECT(x) ({ \ +typeof(x) __x = (x);\ +__x ? container_of(&(__x)->base, QObject, base) : NULL; \ +}) /* High-level interface for qobject_incref() */ #define QINCREF(obj) \ @@ -68,8 +74,8 @@ QEMU_BUILD_BUG_MSG(QTYPE__MAX != 7, static inline void qobject_init(QObject *obj, QType type) { assert(QTYPE_NONE < type && type < QTYPE__MAX); -obj->refcnt = 1; -obj->type = type; +obj->base.refcnt = 1; +obj->base.type = type; } /** @@ -77,8 +83,9 @@ static inline void qobject_init(QObject *obj, QType type) */ static inline void qobject_incref(QObject *obj) { -if (obj) -obj->refcnt++; +if (obj) { +obj->base.refcnt++; +} } /** @@ -101,8 +108,8 @@ void qobject_destroy(QObject *obj); */ static inline void qobject_decref(QObject *obj) { -assert(!obj || obj->refcnt); -if (obj && --obj->refcnt == 0) { +assert(!obj || obj->base.refcnt); +if (obj && --obj->base.refcnt == 0) { qobject_destroy(obj); } } @@ -112,8 +119,8 @@ static inline void qobject_decref(QObject *obj) */ static inline QType qobject_type(const QObject *obj) { -assert(QTYPE_NONE < obj->type && obj->type < QTYPE__MAX); -return obj->type; +assert(QTYPE_NONE < obj->base.type && obj->base.type < QTYPE__MAX); +return obj->base.type; } /** diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h index 30ae260a7f..dc50f6a0c7 100644 --- a/include/qapi/qmp/qstring.h +++ b/include/qapi/qmp/qstring.h @@ -16,7 +16,7 @@ #include "qapi/qmp/qobject.h" struct QString { -QObject base; +struct QObjectCommon base; char *string; size_t length; size_t capacity; diff --git a/qobject/qobject.c b/qobject/qobject.c index 87649c5be5..cf4b7e229e 100644 --- a/qobject/qobject.c +++ b/qobject/qobject.c @@ -37,9 +37,9 @@ static void
[Qemu-devel] [PATCH v3 1/4] qobject: ensure base is at offset 0
All QObject types have the base QObject as first field. This allows to simplify qobject_to() and will allow further simplification in following patch. Signed-off-by: Marc-André Lureau --- include/qapi/qmp/qobject.h | 5 ++--- qobject/qobject.c | 9 + 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h index e022707578..5206ff9ee1 100644 --- a/include/qapi/qmp/qobject.h +++ b/include/qapi/qmp/qobject.h @@ -61,9 +61,8 @@ struct QObject { QEMU_BUILD_BUG_MSG(QTYPE__MAX != 7, "The QTYPE_CAST_TO_* list needs to be extended"); -#define qobject_to(type, obj) ({ \ -QObject *_tmp = qobject_check_type(obj, glue(QTYPE_CAST_TO_, type)); \ -_tmp ? container_of(_tmp, type, base) : (type *)NULL; }) +#define qobject_to(type, obj) \ +((type *)qobject_check_type(obj, glue(QTYPE_CAST_TO_, type))) /* Initialize an object to default values */ static inline void qobject_init(QObject *obj, QType type) diff --git a/qobject/qobject.c b/qobject/qobject.c index 23600aa1c1..87649c5be5 100644 --- a/qobject/qobject.c +++ b/qobject/qobject.c @@ -16,6 +16,15 @@ #include "qapi/qmp/qlist.h" #include "qapi/qmp/qstring.h" +QEMU_BUILD_BUG_MSG( +offsetof(QNull, base) != 0 || +offsetof(QNum, base) != 0 || +offsetof(QString, base) != 0 || +offsetof(QDict, base) != 0 || +offsetof(QList, base) != 0 || +offsetof(QBool, base) != 0, +"base qobject must be at offset 0"); + static void (*qdestroy[QTYPE__MAX])(QObject *) = { [QTYPE_NONE] = NULL, /* No such object exists */ [QTYPE_QNULL] = NULL, /* qnull_ is indestructible */ -- 2.17.0.rc1.36.gcedb63ea2f
Re: [Qemu-devel] [PATCH 14/16] cputlb: remove tb_lock from tlb_flush functions
Emilio G. Cota writes: > The acquisition of tb_lock was added when the async tlb_flush > was introduced in e3b9ca810 ("cputlb: introduce tlb_flush_* async work.") > > tb_lock was there to allow us to do memset() on the tb_jmp_cache's. > However, since f3ced3c5928 ("tcg: consistently access cpu->tb_jmp_cache > atomically") all accesses to tb_jmp_cache are atomic, so tb_lock > is not needed here. Get rid of it. \o/ Reviewed-by: Alex Bennée > > Signed-off-by: Emilio G. Cota > --- > accel/tcg/cputlb.c | 8 > 1 file changed, 8 deletions(-) > > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c > index 0543903..f5c3a09 100644 > --- a/accel/tcg/cputlb.c > +++ b/accel/tcg/cputlb.c > @@ -125,8 +125,6 @@ static void tlb_flush_nocheck(CPUState *cpu) > atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1); > tlb_debug("(count: %zu)\n", tlb_flush_count()); > > -tb_lock(); > - > memset(env->tlb_table, -1, sizeof(env->tlb_table)); > memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table)); > cpu_tb_jmp_cache_clear(cpu); > @@ -135,8 +133,6 @@ static void tlb_flush_nocheck(CPUState *cpu) > env->tlb_flush_addr = -1; > env->tlb_flush_mask = 0; > > -tb_unlock(); > - > atomic_mb_set(&cpu->pending_tlb_flush, 0); > } > > @@ -180,8 +176,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, > run_on_cpu_data data) > > assert_cpu_is_self(cpu); > > -tb_lock(); > - > tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask); > > for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) { > @@ -197,8 +191,6 @@ static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, > run_on_cpu_data data) > cpu_tb_jmp_cache_clear(cpu); > > tlb_debug("done\n"); > - > -tb_unlock(); > } > > void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap) -- Alex Bennée
Re: [Qemu-devel] [PATCH 12/16] translate-all: discard TB when tb_link_page returns an existing matching TB
Emilio G. Cota writes: > Use the recently-gained QHT feature of returning the matching TB if it > already exists. This allows us to get rid of the lookup we perform > right after acquiring tb_lock. > > Suggested-by: Richard Henderson > Signed-off-by: Emilio G. Cota > --- > accel/tcg/cpu-exec.c | 14 ++ > accel/tcg/translate-all.c | 47 > ++- > 2 files changed, 40 insertions(+), 21 deletions(-) > > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c > index 7c83887..8aed38c 100644 > --- a/accel/tcg/cpu-exec.c > +++ b/accel/tcg/cpu-exec.c > @@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu) > if (tb == NULL) { > mmap_lock(); > tb_lock(); > -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask); > -if (likely(tb == NULL)) { > -tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); > -} > +tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); tb_gen_code needs to be renamed to reflect it's semantics. tb_get_or_gen_code? Or maybe tb_get_code with a sub-helper to do the generation. > tb_unlock(); > mmap_unlock(); > } > @@ -396,14 +393,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu, > tb_lock(); > acquired_tb_lock = true; > > -/* There's a chance that our desired tb has been translated while > - * taking the locks so we check again inside the lock. > - */ > -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask); > -if (likely(tb == NULL)) { > -/* if no translated code available, then translate it now */ > -tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask); > -} > +tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask); > > mmap_unlock(); > /* We add the TB in the virtual pc hash table for the fast lookup */ > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c > index 82832ef..dbe6c12 100644 > --- a/accel/tcg/translate-all.c > +++ b/accel/tcg/translate-all.c > @@ -1503,12 +1503,16 @@ static inline void tb_page_add(PageDesc *p, > TranslationBlock *tb, > * (-1) to indicate that only one page contains the TB. > * > * Called with mmap_lock held for user-mode emulation. > + * > + * Returns @tb or an existing TB that matches @tb. That's just confusing to read. So this returns a TB like the @tb we passed in but actually a different one matching the same conditions? > */ > -static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, > - tb_page_addr_t phys_page2) > +static TranslationBlock * > +tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, > + tb_page_addr_t phys_page2) > { > PageDesc *p; > PageDesc *p2 = NULL; > +void *existing_tb; > uint32_t h; > > assert_memory_lock(); > @@ -1516,6 +1520,11 @@ static void tb_link_page(TranslationBlock *tb, > tb_page_addr_t phys_pc, > /* > * Add the TB to the page list. > * To avoid deadlock, acquire first the lock of the lower-addressed page. > + * We keep the locks held until after inserting the TB in the hash table, > + * so that if the insertion fails we know for sure that the TBs are still > + * in the page descriptors. > + * Note that inserting into the hash table first isn't an option, since > + * we can only insert TBs that are fully initialized. > */ > p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1); > if (likely(phys_page2 == -1)) { > @@ -1535,21 +1544,33 @@ static void tb_link_page(TranslationBlock *tb, > tb_page_addr_t phys_pc, > tb_page_add(p2, tb, 1, phys_page2); > } > > +/* add in the hash table */ > +h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK, > + tb->trace_vcpu_dstate); > +existing_tb = qht_insert(&tb_ctx.htable, tb, h); modulo comments about qht_insert API earlier in the series. > + > +/* remove TB from the page(s) if we couldn't insert it */ > +if (unlikely(existing_tb)) { > +tb_page_remove(p, tb); > +invalidate_page_bitmap(p); > +if (p2) { > +tb_page_remove(p2, tb); > +invalidate_page_bitmap(p2); > +} > +tb = existing_tb; > +} > + > if (p2) { > page_unlock(p2); > } > page_unlock(p); > > -/* add in the hash table */ > -h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK, > - tb->trace_vcpu_dstate); > -qht_insert(&tb_ctx.htable, tb, h); > - > #ifdef CONFIG_USER_ONLY > if (DEBUG_TB_CHECK_GATE) { > tb_page_check(); > } > #endif > +return tb; > } > > /* Called with mmap_lock held for user mode emulation. */ > @@ -1558,7 +1579,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, >uint32_t fl
[Qemu-devel] [PATCH] sys_membarrier: fix up include directives
Our rule right now is to use <> for external headers only. util/sys_membarrier.c violates that. Fix it up. Signed-off-by: Bruce Rogers --- util/sys_membarrier.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/util/sys_membarrier.c b/util/sys_membarrier.c index 8dcb53e63e..1362c0c4c5 100644 --- a/util/sys_membarrier.c +++ b/util/sys_membarrier.c @@ -6,9 +6,9 @@ * Author: Paolo Bonzini */ -#include -#include -#include +#include "qemu/osdep.h" +#include "qemu/sys_membarrier.h" +#include "qemu/error-report.h" #ifdef CONFIG_LINUX #include -- 2.16.2
Re: [Qemu-devel] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage
29.03.2018 17:03, Max Reitz wrote: On 2018-03-29 10:08, Vladimir Sementsov-Ogievskiy wrote: 28.03.2018 17:53, Max Reitz wrote: On 2018-03-27 12:11, Vladimir Sementsov-Ogievskiy wrote: [...] isn't it because a lot of cat processes? will check, update loop to i=0; while check -qcow2 169; do ((i++)); echo $i OK; killall -9 cat; done Hmm... I know I tried to kill all of the cats, but for some reason that didn't really help yesterday. Seems to help now, for 2.12.0-rc0 at least (that is, before this series). reproduced with killing... (without these series, just on master) After the whole series, I still get a lot of failures in 169 (mismatching bitmap hash, mostly). And interestingly, if I add an abort(): diff --git a/block/qcow2.c b/block/qcow2.c index 486f3e83b7..9204c1c0ac 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1481,6 +1481,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, } if (bdrv_dirty_bitmap_next(bs, NULL)) { + abort(); /* It's some kind of reopen with already existing dirty bitmaps. There * are no known cases where we need loading bitmaps in such situation, * so it's safer don't load them. Then this fires for a couple of test cases of 169 even without the third patch of this series. I guess bdrv_dirty_bitmap_next() reacts to some bitmaps that migration adds or something? Then this would be the wrong condition, because I guess we still want to load the bitmaps that are in the qcow2 file. I'm not sure whether bdrv_has_readonly_bitmaps() is the correct condition then, either, though. Maybe let's take a step back: We want to load all the bitmaps from the file exactly once, and that is when it is opened the first time. Or that's what I would have thought... Is that even correct? Why do we load the bitmaps when the device is inactive anyway? Shouldn't we load them only once the device is activated? Hmm, not sure. May be, we don't need. But we anyway need to load them, when opening read-only, and we should correspondingly reopen in this case. Yeah, well, yeah, but the current state seems just wrong. Apparently there are cases where a qcow2 node may have bitmaps before we try to load them from the file, so the current condition doesn't work. Furthermore, it seems like the current "state machine" is too complex so we don't know which cases are possible anymore and what to do when. So the first thing we could do is add a field to the BDRVQCow2State that tells us whether the bitmaps have been loaded already or not. If not, we invoke qcow2_load_dirty_bitmaps() and set the value to true. If the value was true already and the BDS is active and R/W now, we call qcow2_reopen_bitmaps_rw_hint(). That should solve one problem. good idea, will do. The other problem of course is the question whether we should call qcow2_load_dirty_bitmaps() at all while the drive is still inactive. You know the migration model better than me, so I'm asking this question to you. We can phrase it differently: Do we need to load the bitmaps before the drive is activated? Now I think that we don't need. At least, we don't have such cases in Virtuozzo (I hope :). Why did I doubt: 1. We have cases, when we want to start vm as inactive, to be able to export it's drive as NBD export, push some data to it and then start the VM (which involves activating) 2. We have cases, when we want to start vm stopped and operate on dirty bitmaps. If just open all images in inactive mode until vm start, it looks like we need bitmaps in inactive mode (for 2.). But it looks like wrong approach anyway. Firstly, I tried to solve (1.) by simply inactivate_all() in case of start vm in paused mode, but it breaks at least (2.), so finally, I solved (1.) by an approach similar with "-incoming defer". So, we have inactive mode in two cases: - incoming migration - push data to vm before start and, in these cases, we don't need to load dirty-bitmaps. Also, inconsistency: now, we remove persistent bitmaps on inactivate. So, it is inconsistent to load the in inactive mode. Ok, I'll try to respin. Max about 169, how often is it reproducible for you? -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH 11/16] translate-all: add page_collection assertions
Emilio G. Cota writes: > The appended adds assertions to make sure we do not longjmp with page > locks held. Some notes: > > - user-mode has nothing to check, since page_locks are !user-mode only. > > - The checks only apply to page collections, since these have relatively > complex callers. > > - Some simple page_lock/unlock callers have been left unchecked -- > namely page_lock_tb, tb_phys_invalidate and tb_link_page. As mentioned in the previous email I think there is a need for assert_page_locked() at least for places currently still using assert_memory_locked(). It could certainly be DEBUG_TCG only case though. Otherwise: Reviewed-by: Alex Bennée > > Signed-off-by: Emilio G. Cota > --- > accel/tcg/cpu-exec.c | 1 + > accel/tcg/translate-all.c | 22 ++ > include/exec/exec-all.h | 8 > 3 files changed, 31 insertions(+) > > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c > index 8c68727..7c83887 100644 > --- a/accel/tcg/cpu-exec.c > +++ b/accel/tcg/cpu-exec.c > @@ -271,6 +271,7 @@ void cpu_exec_step_atomic(CPUState *cpu) > tcg_debug_assert(!have_mmap_lock()); > #endif > tb_lock_reset(); > +assert_page_collection_locked(false); > } > > if (in_exclusive_region) { > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c > index 07527d5..82832ef 100644 > --- a/accel/tcg/translate-all.c > +++ b/accel/tcg/translate-all.c > @@ -605,6 +605,24 @@ void page_collection_unlock(struct page_collection *set) > { } > #else /* !CONFIG_USER_ONLY */ > > +#ifdef CONFIG_DEBUG_TCG > +static __thread bool page_collection_locked; > + > +void assert_page_collection_locked(bool val) > +{ > +tcg_debug_assert(page_collection_locked == val); > +} > + > +static inline void set_page_collection_locked(bool val) > +{ > +page_collection_locked = val; > +} > +#else > +static inline void set_page_collection_locked(bool val) > +{ > +} > +#endif /* !CONFIG_DEBUG_TCG */ > + > static inline void page_lock(PageDesc *pd) > { > qemu_spin_lock(&pd->lock); > @@ -677,6 +695,7 @@ static void do_page_entry_lock(struct page_entry *pe) > page_lock(pe->pd); > g_assert(!pe->locked); > pe->locked = true; > +set_page_collection_locked(true); > } > > static gboolean page_entry_lock(gpointer key, gpointer value, gpointer data) > @@ -769,6 +788,7 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t > end) > set->tree = g_tree_new_full(tb_page_addr_cmp, NULL, NULL, > page_entry_destroy); > set->max = NULL; > +assert_page_collection_locked(false); > > retry: > g_tree_foreach(set->tree, page_entry_lock, NULL); > @@ -787,6 +807,7 @@ page_collection_lock(tb_page_addr_t start, tb_page_addr_t > end) > page_trylock_add(set, tb->page_addr[1]))) { > /* drop all locks, and reacquire in order */ > g_tree_foreach(set->tree, page_entry_unlock, NULL); > +set_page_collection_locked(false); > goto retry; > } > } > @@ -799,6 +820,7 @@ void page_collection_unlock(struct page_collection *set) > /* entries are unlocked and freed via page_entry_destroy */ > g_tree_destroy(set->tree); > g_free(set); > +set_page_collection_locked(false); > } > > #endif /* !CONFIG_USER_ONLY */ > diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h > index aeaa127..7911e69 100644 > --- a/include/exec/exec-all.h > +++ b/include/exec/exec-all.h > @@ -431,6 +431,14 @@ void tb_lock(void); > void tb_unlock(void); > void tb_lock_reset(void); > > +#if !defined(CONFIG_USER_ONLY) && defined(CONFIG_DEBUG_TCG) > +void assert_page_collection_locked(bool val); > +#else > +static inline void assert_page_collection_locked(bool val) > +{ > +} > +#endif > + > #if !defined(CONFIG_USER_ONLY) > > struct MemoryRegion *iotlb_to_region(CPUState *cpu, -- Alex Bennée
Re: [Qemu-devel] [PATCH 10/16] translate-all: use per-page locking in !user-mode
Emilio G. Cota writes: > Groundwork for supporting parallel TCG generation. > > Instead of using a global lock (tb_lock) to protect changes > to pages, use fine-grained, per-page locks in !user-mode. > User-mode stays with mmap_lock. > > Sometimes changes need to happen atomically on more than one > page (e.g. when a TB that spans across two pages is > added/invalidated, or when a range of pages is invalidated). > We therefore introduce struct page_collection, which helps > us keep track of a set of pages that have been locked in > the appropriate locking order (i.e. by ascending page index). > > This commit first introduces the structs and the function helpers, > to then convert the calling code to use per-page locking. Note > that tb_lock is not removed yet. > > While at it, rename tb_alloc_page to tb_page_add, which pairs with > tb_page_remove. > > Signed-off-by: Emilio G. Cota > --- > accel/tcg/translate-all.c | 432 > +- > accel/tcg/translate-all.h | 3 + > include/exec/exec-all.h | 3 +- > 3 files changed, 396 insertions(+), 42 deletions(-) > > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c > index 4cb03f1..07527d5 100644 > --- a/accel/tcg/translate-all.c > +++ b/accel/tcg/translate-all.c > @@ -112,8 +112,55 @@ typedef struct PageDesc { > #else > unsigned long flags; > #endif > +#ifndef CONFIG_USER_ONLY > +QemuSpin lock; > +#endif > } PageDesc; > > +/** > + * struct page_entry - page descriptor entry > + * @pd: pointer to the &struct PageDesc of the page this entry represents > + * @index: page index of the page > + * @locked: whether the page is locked > + * > + * This struct helps us keep track of the locked state of a page, without > + * bloating &struct PageDesc. > + * > + * A page lock protects accesses to all fields of &struct PageDesc. > + * > + * See also: &struct page_collection. > + */ > +struct page_entry { > +PageDesc *pd; > +tb_page_addr_t index; > +bool locked; > +}; > + > +/** > + * struct page_collection - tracks a set of pages (i.e. &struct page_entry's) > + * @tree: Binary search tree (BST) of the pages, with key == page index > + * @max:Pointer to the page in @tree with the highest page index > + * > + * To avoid deadlock we lock pages in ascending order of page index. > + * When operating on a set of pages, we need to keep track of them so that > + * we can lock them in order and also unlock them later. For this we collect > + * pages (i.e. &struct page_entry's) in a binary search @tree. Given that the > + * @tree implementation we use does not provide an O(1) operation to obtain > the > + * highest-ranked element, we use @max to keep track of the inserted page > + * with the highest index. This is valuable because if a page is not in > + * the tree and its index is higher than @max's, then we can lock it > + * without breaking the locking order rule. > + * > + * Note on naming: 'struct page_set' would be shorter, but we already have a > few > + * page_set_*() helpers, so page_collection is used instead to avoid > confusion. > + * > + * See also: page_collection_lock(). > + */ > +struct page_collection { > +GTree *tree; > +struct page_entry *max; > +}; > + > /* list iterators for lists of tagged pointers in TranslationBlock */ > #define TB_FOR_EACH_TAGGED(head, tb, n, field) \ > for (n = (head) & 1,\ > @@ -510,6 +557,15 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, > int alloc) > return NULL; > } > pd = g_new0(PageDesc, V_L2_SIZE); > +#ifndef CONFIG_USER_ONLY > +{ > +int i; > + > +for (i = 0; i < V_L2_SIZE; i++) { > +qemu_spin_init(&pd[i].lock); > +} > +} > +#endif > existing = atomic_cmpxchg(lp, NULL, pd); > if (unlikely(existing)) { > g_free(pd); > @@ -525,6 +581,228 @@ static inline PageDesc *page_find(tb_page_addr_t index) > return page_find_alloc(index, 0); > } > > +/* In user-mode page locks aren't used; mmap_lock is enough */ > +#ifdef CONFIG_USER_ONLY > +static inline void page_lock(PageDesc *pd) > +{ } > + > +static inline void page_unlock(PageDesc *pd) > +{ } > + > +static inline void page_lock_tb(const TranslationBlock *tb) > +{ } > + > +static inline void page_unlock_tb(const TranslationBlock *tb) > +{ } > + > +struct page_collection * > +page_collection_lock(tb_page_addr_t start, tb_page_addr_t end) > +{ > +return NULL; > +} > + > +void page_collection_unlock(struct page_collection *set) > +{ } > +#else /* !CONFIG_USER_ONLY */ > + > +static inline void page_lock(PageDesc *pd) > +{ > +qemu_spin_lock(&pd->lock); > +} > + > +static inline void page_unlock(PageDesc *pd) > +{ > +qemu_spin_unlock(&pd->lock); > +} > + > +/* lock the page(s) of a TB in the correct acquisition order */ > +static inline void page_lock_tb(const Translatio
Re: [Qemu-devel] [PATCH qemu] vfio: Print address space address when cannot map MMIO for DMA
Hi Alex, On 29/03/18 00:13, Alex Williamson wrote: > On Wed, 28 Mar 2018 23:03:23 +0200 > Auger Eric wrote: > >> Hi Alexey, Alex, >> On 22/03/18 09:18, Alexey Kardashevskiy wrote: >>> The 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") added >>> an error message if a passed memory section address or size is not aligned >>> to the minimal IOMMU page size. However although it checks >>> offset_within_address_space for the alignment, offset_within_region is >>> printed instead which makes it harder to find out what device caused >>> the message so this replaces offset_within_region with >>> offset_within_address_space. >>> >>> While we are here, this replaces '..' with 'size=' (as the second number >>> is a size, not an end offset) and adds a memory region name. >>> >>> Fixes: 567b5b309abe "vfio/pci: Relax DMA map errors for MMIO regions" >>> Signed-off-by: Alexey Kardashevskiy >> The patch indeed fixes the reported format issues. >> >> However I have some other concerns with the info that is reported to the >> end-user. See below. >> >> Assigning an e1000e device with a 64kB host, here are the traces I get: >> >> Region XXX is not aligned to 0x1 and cannot be mapped for DMA >> >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0 >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0xffb0 >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a0050 size=0x3fb0 >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100a4808 size=0xb7f8 >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e0050 size=0x3fb0 >> "0004:01:00.0 BAR 3 mmaps[0]" 0x100e4808 size=0xb7f8 >> >> It took me some time to understand what happens but here is now my >> understanding: >> >> 1) When looking at vfio_pci_write_config() pdev->io_regions[bar].addr = >> bar_addr in vfio_sub_page_bar_update_mapping() I see the following values: >> >> UNMAPPED -> 0x0 ->UNMAPPED -> 0x100a -> UNMAPPED -> 0x100a -> >> UNMAPPED -> 0x100e >> >> vfio_sub_page_bar_update_mapping() create mrs with base bar at >> 0x100a and 0x100e successively, hence the >> vfio_listener_region_add on 0x100a. Indeed, 0x0-0x50 msix-table mmio >> region induces some memory section at 0x100a0050 and 0x100e50 successively. >> >> However this is confusing for the end-user who only has access to the >> final mapping (0x100e) through lspi [1]. >> >> 2) The changes in the size (0x3fb0 <-> 0xffb0) relate to the extension >> of the 16kB bar to 64kB in vfio_sub_page_bar_update_mapping >> >> 3) Also it happens that I have a virtio-scsi-pci device that is put just >> after the BAR3 at 0x100a4000 and 0x100e4000 successively. The device has >> its own msi-table and pba mmio regions[2]. As mmaps[0] is extended to >> 64kB (with prio 0), we have those MMIO regions which result in new >> memory sections, which cause vfio_listener_region_add calls. This >> typically explains why we get a warning on 0x100e4808 (0xb7f8). By the >> way I don't get why we don't have a trace for "0004:01:00.0 BAR 3 >> mmaps[0]" 0x100e4040 size=0x7c0, ie. mmaps[0] space between >> virtio-scsi-pci msic-table and pba. >> >> So at the end of the day, my fear is all those info become really >> frightening and confusing for the end-user and even not relevant >> (0x100a stuff). So I would rather simply remove the trace in 2.12 >> until we find a place where we could generate a clear hint for the >> end-user, suggesting to relocate the msix bar. >> >> Thoughts? > > Yep, I think that's probably the right approach. Everything works as > it should and nothing has worse access in 2.12 than it did in 2.11, > there's only the opportunity to make things better with msi-x > relocation and I don't think we want to error on the side of reporting > too many errors that users cannot understand in an attempt to advise > them of an unsupported option that might be better. Let's remove the > error report for 2.12 and think about how we could make a concise > suggestion, once, while initializing the device. Who's posting the > patch? Thanks, > > Alex > > PS - Why isn't the firmware/kernel on aarch64 making an attempt to > align PCI resources on page boundaries? so according to Laszlo's input I understand virtio-scsi-pci region 1 is 4K and the spec only mandates a natural alignment. Does > pci=realloc,resource_alignment=pci:: change it? (I'm > not sure if PCI_ANY_ID works for that option) no, it does not. Thanks Eric >
Re: [Qemu-devel] [PATCH for-2.12 v5] iotests: Test abnormally large size in compressed cluster descriptor
On 03/29/2018 07:07 AM, Alberto Garcia wrote: L2 entries for compressed clusters have a field that indicates the number of sectors used to store the data in the image. That's however not the size of the compressed data itself, just the number of sectors where that data is located. The actual data size is usually not a multiple of the sector size, and therefore cannot be represented with this field. One possible task for the future is to make 'qemu-img check' verify the sizes of the compressed clusters, by trying to decompress the data and checking that the size stored in the L2 entry is correct. Is it still worth trying to add such a check in 2.12? (Probably not - the bug has been there "since the beginning", so it's not a regression and thus not a showstopper if it stays there for another release) Signed-off-by: Alberto Garcia --- v5: Use 'write -c' instead of 'write' followed by 'convert' [Max] Add TODO comment explaining that the size of compressed clusters should also be corrected when it's too large in order to avoid referencing other unrelated clusters. Thanks, those changes look good. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-devel] [PATCH for-2.13] pc-bios/s390-ccw: Increase virtio timeout to 30 seconds
On Thu, 29 Mar 2018 11:37:31 +0200 Thomas Huth wrote: > The current timeout is set to only three seconds - and considering that > vring_wait_reply() or rather get_second() is not doing any rounding, > the real timeout is likely rather 2 seconds in most cases. When the > host is really badly loaded and we run the guest in TCG mode instead > of KVM, it's possible that we hit this timeout by mistake. So let's > increase the timeout to 30 seconds instead to ease this situation (30 > seconds is also the timeout that is used by the Linux SCSI subsystem > for example, so this seems to be a sane value for block IO timeout). > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079 > Signed-off-by: Thomas Huth > --- > pc-bios/s390-ccw/virtio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c > index 817e7f5..cdb66f4 100644 > --- a/pc-bios/s390-ccw/virtio.c > +++ b/pc-bios/s390-ccw/virtio.c > @@ -14,7 +14,7 @@ > #include "virtio-scsi.h" > #include "bswap.h" > > -#define VRING_WAIT_REPLY_TIMEOUT 3 > +#define VRING_WAIT_REPLY_TIMEOUT 30 > > static VRing block[VIRTIO_MAX_VQS]; > static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS] Thanks, merged this and a rebuild into s390-fixes.
Re: [Qemu-devel] [PATCH for-2.12 v4] iotests: Test abnormally large size in compressed cluster descriptor
On 2018-03-29 11:39, Alberto Garcia wrote: > On Wed 28 Mar 2018 07:34:15 PM CEST, Max Reitz wrote: >>> diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122 >>> index 45b359c2ba..5b9593016c 100755 >>> --- a/tests/qemu-iotests/122 >>> +++ b/tests/qemu-iotests/122 >> >> Not sure if 122 is the right file for this... >> >> Or, let me rephrase, it does look to me like it is not the right file. >> But on the other hand, I don't see a more suitable file. > > Yeah I was tempted to create a new one, but in the end I decided to put > it there. We can still create a new one if you feel strongly about it. > >>> echo >>> +echo "=== Corrupted size field in compressed cluster descriptor ===" >>> +echo >>> +# Create an empty image, fill half of it with data and compress it. >>> +# The L2 entries of the two compressed clusters are located at >>> +# 0x80 and 0x88, their original values are 0x400800a0 >>> +# and 0x400800a00802 (5 sectors for compressed data each). >>> +TEST_IMG="$TEST_IMG".1 _make_test_img 8M >>> +$QEMU_IO -c "write -P 0x11 0 4M" "$TEST_IMG".1 2>&1 | _filter_qemu_io | >>> _filter_testdir >>> +$QEMU_IMG convert -c -O qcow2 -o cluster_size=2M "$TEST_IMG".1 "$TEST_IMG" >> >> Why not just use "write -c" and drop the convert? (You'd have to use >> two writes, though, one for each cluster.) > > Yeah, it's also possible, you have to do it in the same qemu-io command > though, else it will allocate two host clusters. But yes, I can change > that. > >>> +# Reduce size of compressed data to 4 sectors: this corrupts the image. >>> +poke_file "$TEST_IMG" $((0x80)) "\x40\x06" >>> +$QEMU_IO -c "read -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | >>> _filter_testdir >>> + >>> +# 'qemu-img check' however doesn't see anything wrong because it >>> +# doesn't try to decompress the data and the refcounts are consistent. >>> +# TODO: update qemu-img so this can be detected >>> +_check_test_img >>> + >>> +# Increase size of compressed data to the maximum (8192 sectors). >>> +# This makes QEMU read more data (8192 sectors instead of 5, host >>> +# addresses [0xa0, 0xdf]), but the decompression algorithm >>> +# stops once we have enough to restore the uncompressed cluster, so >>> +# the rest of the data is ignored. >>> +poke_file "$TEST_IMG" $((0x80)) "\x7f\xfe" >>> +# Do it also for the second compressed cluster (L2 entry at 0x88). >>> +# In this case the compressed data would span 3 host clusters >>> +# (host addresses: [0xa00802, 0xe00801]) >>> +poke_file "$TEST_IMG" $((0x88)) "\x7f\xfe" >>> + >>> +# Here the image is too small so we're asking QEMU to read beyond the >>> +# end of the image. >>> +$QEMU_IO -c "read -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | >>> _filter_testdir >>> +# But if we grow the image we won't be reading beyond its end anymore. >>> +$QEMU_IO -c "write -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | >>> _filter_testdir >>> +$QEMU_IO -c "read -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | >>> _filter_testdir >> >> Both reads result in exactly the same output, though, so it doesn't >> seem like qemu cares. >> >> (This is the reason I'm not merging this patch now, because that looks >> a bit fishy.) > > In this example the size stored on the compressed cluster descriptor is > corrupted and larger than the size of the compressed data on disk. > > In the first read the image is too small so this makes QEMU read not > only past the end of the compressed data, but also past the end of the > qcow2 image itself. > > In the second read the qcow2 image is larger so QEMU reads actual data > from the subsequent clusters. > > It doesn't make much difference. In the first case the buffer is padded > with zeroes and in the second with data from other clusters, but QEMU > only uses the data needed to restore the original uncompressed cluster > and the rest is ignored. But I thought I could still test both cases, > that's why there's two reads. Ah, OK. I would have preferred a comment then so that people know it's just expected to work. >>> +# The refcount data is however wrong because due to the increased size >>> +# of the compressed data it now reaches the following host clusters. >>> +# This can be repaired by qemu-img check. >> >> The OFLAG_COPIED repair looks a bit interesting, but, er, well. >> >> Max >> >> (Since a compressed cluster does not correspond 1:1 to a host cluster, >> you cannot say that a compressed cluster has a refcount -- only host >> clusters are refcounted. So what is it supposed to mean that a >> compressed cluster has a refcount of 1? > > Compressed clusters never have OFLAG_COPIED and we treat it as an error > if they do (see check_refcounts_l2()), Interesting. Wonder why the qcow2 specification doesn't mention that... >but their host clusters still > have a reference count. > > A host cluster with 4 compressed clusters has a reference count of 4. > > A compr
[Qemu-devel] [PATCH v3 0/4] RFC: simplify qobject refcount
Hi, This series aims to get rid of the distinction between QObject, that must use qobject_incref/qobject_decref and its various derived types that have to use QINCREF/QDECREF. Instead, replace it with qobject_ref/qobject_unref for all types. v3: after v2 review with Eric and Paolo - fix clang ubsan warning when a null pointer is given to QOBJECT. - add a patch to make qobject_ref() assert on null pointer, and return the same pointer, simplifying some code. v2: - use the QObjectCommon base approach suggested by Paolo and Eric - remove need for QEMU_GENERIC Marc-André Lureau (4): qobject: ensure base is at offset 0 qobject: introduce QObjectCommon qobject: replace qobject_incref/QINCREF qobject_decref/QDECREF qobject: modify qobject_ref() to assert on NULL and return obj scripts/qapi/events.py | 2 +- include/qapi/qmp/qbool.h| 2 +- include/qapi/qmp/qdict.h| 2 +- include/qapi/qmp/qlist.h| 2 +- include/qapi/qmp/qnull.h| 4 +- include/qapi/qmp/qnum.h | 2 +- include/qapi/qmp/qobject.h | 75 +++- include/qapi/qmp/qstring.h | 2 +- block.c | 82 ++--- block/blkdebug.c| 7 +- block/blkverify.c | 8 +-- block/crypto.c | 4 +- block/gluster.c | 4 +- block/iscsi.c | 2 +- block/nbd.c | 4 +- block/nfs.c | 4 +- block/null.c| 3 +- block/nvme.c| 3 +- block/parallels.c | 4 +- block/qapi.c| 2 +- block/qcow.c| 8 +-- block/qcow2.c | 8 +-- block/qed.c | 4 +- block/quorum.c | 4 +- block/rbd.c | 14 ++-- block/sheepdog.c| 12 ++-- block/snapshot.c| 4 +- block/ssh.c | 4 +- block/vdi.c | 2 +- block/vhdx.c| 4 +- block/vpc.c | 4 +- block/vvfat.c | 2 +- block/vxhs.c| 2 +- blockdev.c | 16 ++--- hw/i386/acpi-build.c| 12 ++-- hw/ppc/spapr_drc.c | 2 +- hw/usb/xen-usb.c| 4 +- migration/migration.c | 4 +- migration/qjson.c | 2 +- monitor.c | 57 +++ qapi/qapi-dealloc-visitor.c | 4 +- qapi/qmp-dispatch.c | 6 +- qapi/qobject-input-visitor.c| 10 ++- qapi/qobject-output-visitor.c | 11 ++- qemu-img.c | 18 ++--- qemu-io.c | 6 +- qga/main.c | 12 ++-- qmp.c | 4 +- qobject/json-parser.c | 10 +-- qobject/qdict.c | 49 + qobject/qjson.c | 2 +- qobject/qlist.c | 4 +- qobject/qobject.c | 21 -- qom/object.c| 16 ++--- qom/object_interfaces.c | 2 +- target/ppc/translate_init.c | 2 +- target/s390x/cpu_models.c | 2 +- tests/ahci-test.c | 6 +- tests/check-qdict.c | 106 ++-- tests/check-qjson.c | 84 +++--- tests/check-qlist.c | 8 +-- tests/check-qlit.c | 10 +-- tests/check-qnull.c | 10 +-- tests/check-qnum.c | 28 tests/check-qobject.c | 2 +- tests/check-qstring.c | 10 +-- tests/cpu-plug-test.c | 4 +- tests/device-introspect-test.c | 24 +++ tests/drive_del-test.c | 4 +- tests/libqos/libqos.c | 8 +-- tests/libqos/pci-pc.c | 2 +- tests/libqtest.c| 24 +++ tests/machine-none-test.c | 2 +- tests/migration-test.c | 24 +++ tests/numa-test.c | 16 ++--- tests/pvpanic-test.c| 2 +- tests/q35-test.c| 2 +- tests/qmp-test.c| 38 +- tests/qom-test.c| 8 +-- tests/tco-test.c| 12 ++-- tests/test-char.c | 2 +- tests/test-keyval.c | 82 ++--- tests/test-netfilter.c | 26 +++ tests/test-qemu-opts.c | 14 ++-- tests/test-qga.c| 76 ++-- tests/test-qmp-cmds.c | 24 +++ tests/test-qmp-event.c
[Qemu-devel] linux-user: missing synchronization in CPU object realization
Hello, I'm running xtensa linux-user application that rapidly creates a lot of threads and I observe the following assertion failure: (process:26953): GLib-CRITICAL **: g_hash_table_iter_next: assertion 'ri->version == ri->hash_table->version' failed ** ERROR:qemu/qom/object.c:1663:object_get_canonical_path_component: code should not be reached The backtrace: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51 #1 0x7667c42a in __GI_abort () at abort.c:89 #2 0x779345a5 in g_assertion_message (domain=domain@entry=0x0, file=file@entry=0x557f8ef8 "/home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c", line=line@entry=1662, func=func@entry=0x557f96c0 <__func__.17879> "object_get_canonical_path_component", message=message@entry=0x7fffa80808a0 "code should not be reached") at ././glib/gtestutils.c:2432 #3 0x7793463a in g_assertion_message_expr (domain=0x0, file=0x557f8ef8 "/home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c", line=1662, func=0x557f96c0 <__func__.17879> "object_get_canonical_path_component", expr=) at ././glib/gtestutils.c:2455 #4 0x556c6cec in object_get_canonical_path_component (obj=0x7fffa80777d0) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1662 #5 0x556c6d1e in object_get_canonical_path (obj=0x7fffa80777d0) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1672 #6 0x556bc693 in device_set_realized (obj=0x7fffa80777d0, value=true, errp=0x77f020f8) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/hw/core/qdev.c:874 #7 0x556c75c4 in property_set_bool (obj=0x7fffa80777d0, v=0x7fffa8080750, name=0x557f889c "realized", opaque=0x7fffa8077660, errp=0x77f020f8) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1923 #8 0x556c58ea in object_property_set (obj=0x7fffa80777d0, v=0x7fffa8080750, name=0x557f889c "realized", errp=0x77f020f8) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1122 #9 0x556c8901 in object_property_set_qobject (obj=0x7fffa80777d0, value=0x7fffa80806a0, name=0x557f889c "realized", errp=0x77f020f8) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/qom-qobject.c:27 #10 0x556c5b62 in object_property_set_bool (obj=0x7fffa80777d0, value=true, name=0x557f889c "realized", errp=0x77f020f8) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/object.c:1188 #11 0x556c166f in cpu_create (typename=0x57cb0d20 "dc233c-xtensa-cpu") at /home/jcmvbkbc/ws/tensilica/qemu/qemu/qom/cpu.c:64 #12 0x5564ec0e in cpu_copy (env=0x5924d420) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/main.c:4121 #13 0x5565d4e1 in do_fork (env=0x5924d420, flags=4001536, newsp=829419520, parent_tidptr=829420792, newtls=829421856, child_tidptr=829420792) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/syscall.c:6350 #14 0x556664c7 in do_syscall (cpu_env=0x5924d420, num=116, arg1=4001536, arg2=829419520, arg3=829420792, arg4=829421856, arg5=829420792, arg6=829420688, arg7=0, arg8=0) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/syscall.c:10003 #15 0x5564e85b in cpu_loop (env=0x5924d420) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/main.c:3999 #16 0x5565d3d9 in clone_func (arg=0x7fffbcd0) at /home/jcmvbkbc/ws/tensilica/qemu/qemu/linux-user/syscall.c:6313 #17 0x769ee494 in start_thread (arg=0x77f04700) at pthread_create.c:333 #18 0x76730acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97 This happens because multiple threads concurrently call object_property_set_bool to realize cpu object in cpu_create, which results in the following call chain: device_set_realized -> object_property_add_child -> object_property_add -> g_hash_table_insert inserting newly created CPUs as children to /unattached without any locking. I've put locking around the object_property_set_bool(OBJECT(cpu), true, "realized", &err); call in the cpu_create and that fixed this issue. Any suggestion regarding how to fix this properly? -- Thanks. -- Max
Re: [Qemu-devel] [PATCH for-2.12 v4] iotests: Test abnormally large size in compressed cluster descriptor
On 2018-03-28 19:58, Eric Blake wrote: > On 03/28/2018 12:34 PM, Max Reitz wrote: [...] >> The OFLAG_COPIED repair looks a bit interesting, but, er, well. >> >> Max >> >> (Since a compressed cluster does not correspond 1:1 to a host cluster, >> you cannot say that a compressed cluster has a refcount -- only host >> clusters are refcounted. So what is it supposed to mean that a >> compressed cluster has a refcount of 1? > > A compressed cluster may affect the refcount of multiple clusters: the > cluster that contains the initial offset, and the cluster that contains > any of the nb_sectors that did not fit in the first cluster. So, if I > have a 4k-cluster image (where each character is a sector), and where > the compressed clusters are nicely sector-aligned: > > |1---|2---|3---| > |AABB||CC--| > > Here, the L2 entry for A, B, and C each list nb_sectors of 5, as it > takes 6 sectors to list the entire image, but nb_sectors does not > include the sector that includes the original offset. The refcount for > cluster 1 is 2 (the full contents of compressed A and the first half of > compressed B); for cluster 2 is 2 (the second half of compressed B and > the first half of compressed C); and for cluster 3 is 1 (the second half > of compressed C). > > But what this patch is dealing with is when nb_sectors is larger than > required. With 4k sectors, qemu will never populate nb_clusters more > than 8 (if the output is not nicely aligned, and 4096 bytes compresses > down to only 4095, we can end up with 1 byte in the first sector, then 7 > complete sectors, then 510 bytes in a final sector, for 8 sectors beyond > the initial offset). But the qcow2 image is still valid even if the L2 > entry claims nb_sectors of 15; if that happens, then a compressed > cluster can now affect the refcount of 3 clusters rather than the usual > 1 or 2. > >> >> I'd argue from a point of usefulness. In theory, you could modify >> compressed clusters in-place, and then you'd need the information >> whether you are allowed to. But that doesn't really depend on whether >> the host clusters touched by the compressed cluster have a refcount of >> 1, instead if depends on how many times the compressed cluster itself is >> referenced. Unfortunately we have no refcounting structure for that. >> >> So all in all OFLAG_COPIED is just useless for compressed clusters.) > > In general, because we intentionally allow multiple compressed clusters > to (try to) share a single host cluster, the refcount for compressed > clusters will usually be larger than 1. And OFLAG_COPIED is only useful > when the refcount is 1, right? OFLAG_COPIED is only useful when it reflects the number of references to the data cluster, because only then can we update it in-place without having to worry about other users. For normal clusters, one data cluster is one host cluster, so we can get the number of references from the refcount structures (which contain the reference counts for host clusters, but *not* for data clusters). But for compressed clusters, the data cluster may be part of a host cluster, or it may even be multiple host clusters. So the information we get from the refcount structures is totally useless here, it doesn't tell us how many references there are to that compressed data cluster. There is only one exception: If all of the host clusters the compressed cluster touches have a refcount of 1, we can be certain that there are no other users, so then we know there is only one reference to the compressed cluster. But as soon as they have a higher refcount... We don't know whether that's because of multiple references to this compressed cluster or because of multiple compressed cluster sharing a single host cluster. In practice it doesn't really matter of course because modifying compressed clusters in-place seems like a whole different can of worms. Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v4 for 2.12 0/3] fix bitmaps migration through shared storage
On 2018-03-29 10:08, Vladimir Sementsov-Ogievskiy wrote: > 28.03.2018 17:53, Max Reitz wrote: >> On 2018-03-27 12:11, Vladimir Sementsov-Ogievskiy wrote: [...] >>> isn't it because a lot of cat processes? will check, update loop to >>> i=0; while check -qcow2 169; do ((i++)); echo $i OK; killall -9 cat; >>> done >> Hmm... I know I tried to kill all of the cats, but for some reason that >> didn't really help yesterday. Seems to help now, for 2.12.0-rc0 at >> least (that is, before this series). > > reproduced with killing... (without these series, just on master) > >> >> After the whole series, I still get a lot of failures in 169 >> (mismatching bitmap hash, mostly). >> >> And interestingly, if I add an abort(): >> >> diff --git a/block/qcow2.c b/block/qcow2.c >> index 486f3e83b7..9204c1c0ac 100644 >> --- a/block/qcow2.c >> +++ b/block/qcow2.c >> @@ -1481,6 +1481,7 @@ static int coroutine_fn >> qcow2_do_open(BlockDriverState *bs, QDict *options, } >> >> if (bdrv_dirty_bitmap_next(bs, NULL)) { >> + abort(); >> /* It's some kind of reopen with already existing dirty >> bitmaps. There >> * are no known cases where we need loading bitmaps in such >> situation, >> * so it's safer don't load them. >> >> Then this fires for a couple of test cases of 169 even without the third >> patch of this series. >> >> I guess bdrv_dirty_bitmap_next() reacts to some bitmaps that migration >> adds or something? Then this would be the wrong condition, because I >> guess we still want to load the bitmaps that are in the qcow2 file. >> >> I'm not sure whether bdrv_has_readonly_bitmaps() is the correct >> condition then, either, though. Maybe let's take a step back: We want >> to load all the bitmaps from the file exactly once, and that is when it >> is opened the first time. Or that's what I would have thought... Is >> that even correct? >> >> Why do we load the bitmaps when the device is inactive anyway? >> Shouldn't we load them only once the device is activated? > > Hmm, not sure. May be, we don't need. But we anyway need to load them, > when opening read-only, and we should correspondingly reopen in this case. Yeah, well, yeah, but the current state seems just wrong. Apparently there are cases where a qcow2 node may have bitmaps before we try to load them from the file, so the current condition doesn't work. Furthermore, it seems like the current "state machine" is too complex so we don't know which cases are possible anymore and what to do when. So the first thing we could do is add a field to the BDRVQCow2State that tells us whether the bitmaps have been loaded already or not. If not, we invoke qcow2_load_dirty_bitmaps() and set the value to true. If the value was true already and the BDS is active and R/W now, we call qcow2_reopen_bitmaps_rw_hint(). That should solve one problem. The other problem of course is the question whether we should call qcow2_load_dirty_bitmaps() at all while the drive is still inactive. You know the migration model better than me, so I'm asking this question to you. We can phrase it differently: Do we need to load the bitmaps before the drive is activated? Max signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PULL 1/2] migration: fix pfd leak
From: Marc-André Lureau Fix leak spotted by ASAN: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x7fe1abb80a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38) #1 0x7fe1aaf1bf75 in g_malloc0 ../glib/gmem.c:124 #2 0x7fe1aaf1c249 in g_malloc0_n ../glib/gmem.c:355 #3 0x55f4841cfaa9 in postcopy_ram_fault_thread /home/elmarco/src/qemu/migration/postcopy-ram.c:596 #4 0x55f48479447b in qemu_thread_start /home/elmarco/src/qemu/util/qemu-thread-posix.c:504 #5 0x7fe1a043550a in start_thread (/lib64/libpthread.so.0+0x750a) Regression introduced with commit 00fa4fc85b00f1a8a810068d158a7a66e88658eb. Signed-off-by: Marc-André Lureau Message-Id: <20180321113644.21899-1-marcandre.lur...@redhat.com> Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Peter Xu Signed-off-by: Dr. David Alan Gilbert --- migration/postcopy-ram.c | 1 + 1 file changed, 1 insertion(+) diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index efd77939af..4a0b33b373 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -754,6 +754,7 @@ static void *postcopy_ram_fault_thread(void *opaque) } } trace_postcopy_ram_fault_thread_exit(); +g_free(pfd); return NULL; } -- 2.14.3
[Qemu-devel] [PULL 2/2] migration: Don't activate block devices if using -S
From: "Dr. David Alan Gilbert" Activating the block devices causes the locks to be taken on the backing file. If we're running with -S and the destination libvirt hasn't started the destination with 'cont', it's expecting the locks are still untaken. Don't activate the block devices if we're not going to autostart the VM; 'cont' already will do that anyway. bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854 Signed-off-by: Dr. David Alan Gilbert Message-Id: <20180328170207.49512-1-dgilb...@redhat.com> Reviewed-by: Eric Blake Signed-off-by: Dr. David Alan Gilbert --- migration/migration.c | 22 +++--- 1 file changed, 15 insertions(+), 7 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 52a5092add..58bd382730 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -306,13 +306,21 @@ static void process_incoming_migration_bh(void *opaque) Error *local_err = NULL; MigrationIncomingState *mis = opaque; -/* Make sure all file formats flush their mutable metadata. - * If we get an error here, just don't restart the VM yet. */ -bdrv_invalidate_cache_all(&local_err); -if (local_err) { -error_report_err(local_err); -local_err = NULL; -autostart = false; +/* Only fire up the block code now if we're going to restart the + * VM, else 'cont' will do it. + * This causes file locking to happen; so we don't want it to happen + * unless we really are starting the VM. + */ +if (autostart && (!global_state_received() || +global_state_get_runstate() == RUN_STATE_RUNNING)) { +/* Make sure all file formats flush their mutable metadata. + * If we get an error here, just don't restart the VM yet. */ +bdrv_invalidate_cache_all(&local_err); +if (local_err) { +error_report_err(local_err); +local_err = NULL; +autostart = false; +} } /* -- 2.14.3
[Qemu-devel] [PULL 0/2] migration queue
From: "Dr. David Alan Gilbert" The following changes since commit 47d3b60858d90ac8a0cc3a72af7f95c96781125a: Merge remote-tracking branch 'remotes/riscv/tags/riscv-qemu-2.12-important-fixes' into staging (2018-03-28 22:13:38 +0100) are available in the Git repository at: git://github.com/dagrh/qemu.git tags/pull-migration-20180329a for you to fetch changes up to 0746a92612276aee69e66dfe6782b0f882d221d5: migration: Don't activate block devices if using -S (2018-03-29 14:53:16 +0100) Migration pull (small fixes) A pair of two small fixes for 2.12. Dr. David Alan Gilbert (1): migration: Don't activate block devices if using -S Marc-André Lureau (1): migration: fix pfd leak migration/migration.c| 22 +++--- migration/postcopy-ram.c | 1 + 2 files changed, 16 insertions(+), 7 deletions(-)
Re: [Qemu-devel] [PATCH for-2.13] pc-bios/s390-ccw: Increase virtio timeout to 30 seconds
On 29.03.2018 15:15, Cornelia Huck wrote: > On Thu, 29 Mar 2018 11:37:31 +0200 > Thomas Huth wrote: > >> The current timeout is set to only three seconds - and considering that >> vring_wait_reply() or rather get_second() is not doing any rounding, >> the real timeout is likely rather 2 seconds in most cases. When the >> host is really badly loaded and we run the guest in TCG mode instead >> of KVM, it's possible that we hit this timeout by mistake. > > Let's tweak this sentence to > > "When the host is really badly loaded, it's possible that we hit this > timeout by mistake; it's even more likely if we run the guest in TCG > mode instead of KVM." > > ? Fine for me. You could even remove that TCG vs. KVM part if you like. Thomas
Re: [Qemu-devel] [PATCH for-2.13] pc-bios/s390-ccw: Increase virtio timeout to 30 seconds
On Thu, 29 Mar 2018 11:37:31 +0200 Thomas Huth wrote: > The current timeout is set to only three seconds - and considering that > vring_wait_reply() or rather get_second() is not doing any rounding, > the real timeout is likely rather 2 seconds in most cases. When the > host is really badly loaded and we run the guest in TCG mode instead > of KVM, it's possible that we hit this timeout by mistake. Let's tweak this sentence to "When the host is really badly loaded, it's possible that we hit this timeout by mistake; it's even more likely if we run the guest in TCG mode instead of KVM." ? > So let's > increase the timeout to 30 seconds instead to ease this situation (30 > seconds is also the timeout that is used by the Linux SCSI subsystem > for example, so this seems to be a sane value for block IO timeout). > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079 > Signed-off-by: Thomas Huth > --- > pc-bios/s390-ccw/virtio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-)
Re: [Qemu-devel] [PATCH v4 2/9] numa: split out NumaOptions parsing into parse_NumaOptions()
On Wed, 28 Mar 2018 15:54:28 -0300 Eduardo Habkost wrote: > On Tue, Mar 27, 2018 at 03:08:27PM +0200, Igor Mammedov wrote: > > On Fri, 23 Mar 2018 17:42:18 -0300 > > Eduardo Habkost wrote: > > > > > On Mon, Mar 12, 2018 at 02:11:08PM +0100, Igor Mammedov wrote: > > > > it will allow to reuse parse_NumaOptions() for parsing > > > > configuration commands received via QMP interface > > > > > > > > Signed-off-by: Igor Mammedov > > > > --- > > > > include/sysemu/numa.h | 1 + > > > > numa.c| 48 > > > > +--- > > > > 2 files changed, 30 insertions(+), 19 deletions(-) > > > > > > > > diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h > > > > index 21713b7..7a0ae75 100644 > > > > --- a/include/sysemu/numa.h > > > > +++ b/include/sysemu/numa.h > > > > @@ -22,6 +22,7 @@ struct NumaNodeMem { > > > > }; > > > > > > > > extern NodeInfo numa_info[MAX_NODES]; > > > > +int parse_numa(void *opaque, QemuOpts *opts, Error **errp); > > > > void parse_numa_opts(MachineState *ms); > > > > void numa_complete_configuration(MachineState *ms); > > > > void query_numa_node_mem(NumaNodeMem node_mem[]); > > > > diff --git a/numa.c b/numa.c > > > > index 126c649..2b1d292 100644 > > > > --- a/numa.c > > > > +++ b/numa.c > > > > @@ -169,28 +169,11 @@ static void parse_numa_distance(NumaDistOptions > > > > *dist, Error **errp) > > > > have_numa_distance = true; > > > > } > > > > > > > > -static int parse_numa(void *opaque, QemuOpts *opts, Error **errp) > > > > +static > > > > +void parse_NumaOptions(MachineState *ms, NumaOptions *object, Error > > > > **errp) > > > > > > I wonder if we should rename the parse_numa_{node,distance}() > > > functions to configure_numa_{node,distance}(), and this one > > > configure_numa(). These functions don't parse anything, anymore. > > I'd preffer to do it in another patch on top of this series > > (added it my TODO list) > > I agree with renaming parse_numa*() later, but the new function > you are creating can have a more reasonable name as it doesn't > parse anything. how about: s/parse_NumaOptions/set_NumaOptions/
Re: [Qemu-devel] [PATCH v4 3/9] cli: add -preconfig option
On Wed, 28 Mar 2018 16:17:32 -0300 Eduardo Habkost wrote: > On Tue, Mar 27, 2018 at 05:05:41PM +0200, Igor Mammedov wrote: > > On Fri, 23 Mar 2018 18:25:08 -0300 > > Eduardo Habkost wrote: > > > > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote: > > [...] > > > > diff --git a/vl.c b/vl.c > > > > index 3ef04ce..69b1997 100644 > > > > --- a/vl.c > > > > +++ b/vl.c > > > > @@ -593,7 +593,7 @@ static int default_driver_check(void *opaque, > > > > QemuOpts *opts, Error **errp) > > > > /***/ > > > > /* QEMU state */ > > > > > > > > -static RunState current_run_state = RUN_STATE_PRELAUNCH; > > > > +static RunState current_run_state = RUN_STATE_PRECONFIG; > > > > > > > > /* We use RUN_STATE__MAX but any invalid value will do */ > > > > static RunState vmstop_requested = RUN_STATE__MAX; > > > > @@ -606,6 +606,9 @@ typedef struct { > > > > > > > > static const RunStateTransition runstate_transitions_def[] = { > > > > /* from -> to */ > > > > +{ RUN_STATE_PRECONFIG, RUN_STATE_PRELAUNCH }, > > > > +{ RUN_STATE_PRECONFIG, RUN_STATE_INMIGRATE }, > > > > > > Don't this mean -preconfig and -incoming could work together? > > theoretically yes, but its not the reason why this transition is here. > > It's mimicking existing approach where initial state > >{ RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE }, > > were allowed to move to the next possible (including RUN_STATE_INMIGRATE) > > I still don't get it. Where this definition of "next possible" > comes from? If -incoming and -preconfig don't work together, why > is PRECONFIG -> INMIGRATE migration considered possible? I'd think it's the same (replacement) hack which we use now RUN_STATE_PRELAUNCH -> RUN_STATE_INMIGRATE to allow following code to succeed: case QEMU_OPTION_incoming: if (!incoming) { runstate_set(RUN_STATE_INMIGRATE); } incoming = optarg; I'd get rid of it and move state switching to the actual place where migration starts if it were just that simple, but from a quick look around it did look rather risky. That's why I abandoned an idea of changing it within this series. > > > > { RUN_STATE_DEBUG, RUN_STATE_RUNNING }, > > > > { RUN_STATE_DEBUG, RUN_STATE_FINISH_MIGRATE }, > > > > { RUN_STATE_DEBUG, RUN_STATE_PRELAUNCH }, > > > > @@ -1629,6 +1632,7 @@ static pid_t shutdown_pid; > > > > static int powerdown_requested; > > > > static int debug_requested; > > > > static int suspend_requested; > > > > +static bool preconfig_exit_requested = true; > > > > static WakeupReason wakeup_reason; > > > > static NotifierList powerdown_notifiers = > > > > NOTIFIER_LIST_INITIALIZER(powerdown_notifiers); > > > > @@ -1713,6 +1717,11 @@ static int qemu_debug_requested(void) > > > > return r; > > > > } > > > > > > > > +void qemu_exit_preconfig_request(void) > > > > +{ > > > > +preconfig_exit_requested = true; > > > > +} > > > > + > > > > /* > > > > * Reset the VM. Issue an event unless @reason is SHUTDOWN_CAUSE_NONE. > > > > */ > > > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void) > > > > RunState r; > > > > ShutdownCause request; > > > > > > > > +if (preconfig_exit_requested) { > > > > +if (runstate_check(RUN_STATE_PRECONFIG)) { > > > > > > Is it possible to have preconfig_exit_request set outside of > > > RUN_STATE_PRECONFIG? When and why? > > preconfig_exit_requested is initialized with TRUE and > > in combo with '-inmigrate' we need this runstate check. > > I think this now makes sense to me. It still looks confusing, > but I don't have a better suggestion right now. > > Except... > > Why exactly do you need to use main_loop() and > main_loop_should_exit() for the preconfig loop? What about a > separate preconfig_loop() and preconfig_loop_should_exit() > function? that would duplicate main_loop() for practically no benefit at all, hence I'm reusing existing main_loop()/main_loop_should_exit() just by adding relevant exit condition. It also easier to read when state transitions are kept close to each other. > > it's the same as it was with > > { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE }, > > which I probably should remove (I need to check it though) > > > > > > +runstate_set(RUN_STATE_PRELAUNCH); > > > > +} > > > > +preconfig_exit_requested = false; > > What happens if we don't set preconfig_exit_requested=false here? nothing should go wrong due to 'if (runstate_check(RUN_STATE_PRECONFIG))' condition. It's the same what qemu_reset_requested()/qemu_shutdown_requested() do with their respective request variables but not wrapped into a separate function as it's the only place it's used. > > > > +return true; > > > > +
[Qemu-devel] [PATCH] hw/i386: Fix IVHD entry length for AMD IOMMU
From: Jan Kiszka Counting from the IVHD ID field to the all-devices entry, we have 28 bytes, not 36. Signed-off-by: Jan Kiszka --- hw/i386/acpi-build.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index deb440f286..a0cda71411 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2561,7 +2561,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker) (1UL << 7), /* PPRSup */ 1); /* IVHD length */ -build_append_int_noprefix(table_data, 0x24, 2); +build_append_int_noprefix(table_data, 28, 2); /* DeviceID */ build_append_int_noprefix(table_data, s->devid, 2); /* Capability offset */ -- 2.13.6
Re: [Qemu-devel] [PATCH 4/4] tests: Tests more flags of the CRB interface
On Wed, Mar 28, 2018 at 10:59 PM, Stefan Berger wrote: > Test and modify more flags of the CRB interface. > > Signed-off-by: Stefan Berger > --- Reviewed-by: Marc-André Lureau > tests/tpm-crb-test.c | 74 > ++-- > 1 file changed, 72 insertions(+), 2 deletions(-) > > diff --git a/tests/tpm-crb-test.c b/tests/tpm-crb-test.c > index e1513cb..d8f9569 100644 > --- a/tests/tpm-crb-test.c > +++ b/tests/tpm-crb-test.c > @@ -28,6 +28,10 @@ static void tpm_crb_test(const void *data) > uint64_t caddr = readq(TPM_CRB_ADDR_BASE + A_CRB_CTRL_CMD_LADDR); > uint32_t rsize = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_RSP_SIZE); > uint64_t raddr = readq(TPM_CRB_ADDR_BASE + A_CRB_CTRL_RSP_ADDR); > +uint8_t locstate = readb(TPM_CRB_ADDR_BASE + A_CRB_LOC_STATE); > +uint32_t locctrl = readl(TPM_CRB_ADDR_BASE + A_CRB_LOC_CTRL); > +uint32_t locsts = readl(TPM_CRB_ADDR_BASE + A_CRB_LOC_STS); > +uint32_t sts = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_STS); > > g_assert_cmpint(FIELD_EX32(intfid, CRB_INTF_ID, InterfaceType), ==, 1); > g_assert_cmpint(FIELD_EX32(intfid, CRB_INTF_ID, InterfaceVersion), ==, > 1); > @@ -45,9 +49,47 @@ static void tpm_crb_test(const void *data) > g_assert_cmpint(caddr, >, TPM_CRB_ADDR_BASE); > g_assert_cmpint(raddr, >, TPM_CRB_ADDR_BASE); > > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, tpmEstablished), ==, > 1); > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, locAssigned), ==, 0); > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, activeLocality), ==, > 0); > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, reserved), ==, 0); > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, tpmRegValidSts), ==, > 1); > + > +g_assert_cmpint(locctrl, ==, 0); > + > +g_assert_cmpint(FIELD_EX32(locsts, CRB_LOC_STS, Granted), ==, 0); > +g_assert_cmpint(FIELD_EX32(locsts, CRB_LOC_STS, beenSeized), ==, 0); > + > +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmIdle), ==, 1); > +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmSts), ==, 0); > + > +/* request access to locality 0 */ > +writeb(TPM_CRB_ADDR_BASE + A_CRB_LOC_CTRL, 1); > + > +/* granted bit must be set now */ > +locsts = readl(TPM_CRB_ADDR_BASE + A_CRB_LOC_STS); > +g_assert_cmpint(FIELD_EX32(locsts, CRB_LOC_STS, Granted), ==, 1); > +g_assert_cmpint(FIELD_EX32(locsts, CRB_LOC_STS, beenSeized), ==, 0); > + > +/* we must have an assigned locality */ > +locstate = readb(TPM_CRB_ADDR_BASE + A_CRB_LOC_STATE); > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, tpmEstablished), ==, > 1); > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, locAssigned), ==, 1); > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, activeLocality), ==, > 0); > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, reserved), ==, 0); > +g_assert_cmpint(FIELD_EX32(locstate, CRB_LOC_STATE, tpmRegValidSts), ==, > 1); > + > +/* set into ready state */ > +writel(TPM_CRB_ADDR_BASE + A_CRB_CTRL_REQ, 1); > + > +/* TPM must not be in the idle state */ > +sts = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_STS); > +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmIdle), ==, 0); > +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmSts), ==, 0); > + > memwrite(caddr, TPM_CMD, sizeof(TPM_CMD)); > > -uint32_t sts, start = 1; > +uint32_t start = 1; > uint64_t end_time = g_get_monotonic_time() + 5 * G_TIME_SPAN_SECOND; > writel(TPM_CRB_ADDR_BASE + A_CRB_CTRL_START, start); > do { > @@ -58,12 +100,40 @@ static void tpm_crb_test(const void *data) > } while (g_get_monotonic_time() < end_time); > start = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_START); > g_assert_cmpint(start & 1, ==, 0); > + > +/* TPM must still not be in the idle state */ > sts = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_STS); > -g_assert_cmpint(sts & 1, ==, 0); > +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmIdle), ==, 0); > +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmSts), ==, 0); > > struct tpm_hdr tpm_msg; > memread(raddr, &tpm_msg, sizeof(tpm_msg)); > g_assert_cmpmem(&tpm_msg, sizeof(tpm_msg), s->tpm_msg, > sizeof(*s->tpm_msg)); > + > +/* set TPM into idle state */ > +writel(TPM_CRB_ADDR_BASE + A_CRB_CTRL_REQ, 2); > + > +/* idle state must be indicated now */ > +sts = readl(TPM_CRB_ADDR_BASE + A_CRB_CTRL_STS); > +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmIdle), ==, 1); > +g_assert_cmpint(FIELD_EX32(sts, CRB_CTRL_STS, tpmSts), ==, 0); > + > +/* relinquish locality */ > +writel(TPM_CRB_ADDR_BASE + A_CRB_LOC_CTRL, 2); > + > +/* Granted flag must be cleared */ > +sts = readl(TPM_CRB_ADDR_BASE + A_CRB_LOC_STS); > +g_assert_cmpint(FIELD_EX32(sts, CRB_LOC_STS, Granted), ==, 0); > +g_assert_cmpint(FIELD_EX32(sts, CRB_LOC_STS, beenSeized), ==, 0); > + > +/* no locality ma
Re: [Qemu-devel] [PATCH v4 5/9] qapi: introduce new cmd option "allowed-in-preconfig"
On Thu, Mar 29, 2018 at 11:53:55AM +0200, Igor Mammedov wrote: > On Wed, 28 Mar 2018 16:30:16 -0300 > Eduardo Habkost wrote: > > > On Wed, Mar 28, 2018 at 02:29:57PM +0200, Igor Mammedov wrote: > > > On Fri, 23 Mar 2018 18:28:37 -0300 > > > Eduardo Habkost wrote: > > > > > > > On Mon, Mar 12, 2018 at 02:11:11PM +0100, Igor Mammedov wrote: > > > > > New option will be used to allow commands, which are prepared/need > > > > > to run run in preconfig state. Other commands that should be able > > > > > to run in preconfig state, should be ammeded to not expect machine > > > > > in initialized state or deal with it. > > > > > > > > > > For compatibility reasons, commands, that don't use new flag > > > > > 'allowed-in-preconfig' explicitly, are not permited to run in > > > > > preconfig state but allowed in all other states like they used > > > > > to be. > > > > > > > > > > Within this patch allow following commands in preconfig state: > > > > >qmp_capabilities > > > > >query-qmp-schema > > > > >query-commands > > > > >query-status > > > > >cont > > > > > to allow qmp connection, basic introspection and moving to the next > > > > > state. > > > > > > > > > > PS: > > > > > set-numa-node and query-hotpluggable-cpus will be enabled later in > > > > > a separate patch. > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > > > I didn't review the code yet, but: > > > > > > > > Shouldn't this be applied before patch 3/9, for bisectability? > > > > Otherwise it will be very easy to crash QEMU after applying patch > > > > 3/9. > > > no, it isn't going to work. > > > This patch depends on RUN_STATE_PRECONFIG that is introduced in 3/9. > > > > > > It could be fine to merge into 3/9 during merge, but then history > > > wise it would be difficult to read it later with 2 big and mostly > > > separate changes within one patch. > > > > Yeah, I don't think squashing would be the right answer. > > > > > > > > Considering -preconfig if off by default it shouldn't affect > > > bisectability in general so I'd keep current patch order. > > > > Well, it would affect bisectability if debugging a crash that > > happens using -preconfig. > > > > The only hunk in this patch that really depends on patch 3/9 > > seems to be: > > > > @@ -92,6 +93,13 @@ static QObject *do_qmp_dispatch(QmpCommandList *cmds, > > QObject *request, > >return NULL; > >} > > > > +if (runstate_check(RUN_STATE_PRECONFIG) && > > +!(cmd->options & QCO_ALLOWED_IN_PRECONFIG)) { > > +error_setg(errp, "The command '%s' isn't permitted in '%s' > > state", > > + cmd->name, RunState_str(RUN_STATE_PRECONFIG)); > > +return NULL; > > +} > > + > >if (!qdict_haskey(dict, "arguments")) { > >args = qdict_new(); > >} else { > > > > > > What about moving it to patch 3/9? > By itself it won't work but with moving following hunk with it > > @@ -21,7 +21,8 @@ typedef void (QmpCommandFunc)(QDict *, QObject **, Error > **); > typedef enum QmpCommandOptions > { > QCO_NO_OPTIONS = 0x0, > -QCO_NO_SUCCESS_RESP = 0x1, > +QCO_NO_SUCCESS_RESP = 1U << 0, > +QCO_ALLOWED_IN_PRECONFIG = 1U << 1, > } QmpCommandOptions; > > it should compile and do proper check in above hunk. > > I'll try to do it and see if it works. Sounds OK to me. > > > Or, an alternative is to move the following hunk from patch 3/9 to this > > patch: > > > > diff --git a/qapi/run-state.json b/qapi/run-state.json > > index 1c9fff3aef..ce846a570e 100644 > > --- a/qapi/run-state.json > > +++ b/qapi/run-state.json > > @@ -49,12 +49,15 @@ > ># @colo: guest is paused to save/restore VM state under colo checkpoint, > >#VM can not get into this state unless colo capability is enabled > >#for migration. (since 2.8) > > +# @preconfig: QEMU is paused before board specific init callback is > > executed. > > +# The state is reachable only if -preconfig CLI option is > > used. > > +# (Since 2.12) > >## > >{ 'enum': 'RunState', > > 'data': [ 'debug', 'inmigrate', 'internal-error', 'io-error', 'paused', > >'postmigrate', 'prelaunch', 'finish-migrate', 'restore-vm', > >'running', 'save-vm', 'shutdown', 'suspended', 'watchdog', > > -'guest-panicked', 'colo' ] } > > +'guest-panicked', 'colo', 'preconfig' ] } > > > >## > ># @StatusInfo: > > > > Which could be an interesting idea, because the QAPI schema > > changes would be all grouped inside a single patch, and then > > followed by the actual implementation of the -preconfig option. > it won't really work as this enum generates RUN_STATE_PRECONFIG, > which is used by 3/9 My suggestion in this case would be applying this patch (including the above run-state.json change) before 3/9. -- Eduardo
Re: [Qemu-devel] [PATCH for-2.12 v2 2/2] i386/hyperv: error out if features requested but unsupported
On Thu, Mar 29, 2018 at 12:47:42PM +0300, Roman Kagan wrote: > On Wed, Mar 28, 2018 at 03:53:31PM -0300, Eduardo Habkost wrote: > > On Wed, Mar 28, 2018 at 06:30:24PM +0300, Roman Kagan wrote: > > > In order to guarantee compatibility on migration, QEMU should have > > > complete control over the features it announces to the guest via CPUID. > > > > > > However, for a number of Hyper-V-related cpu properties, if the > > > corresponding feature is not supported by the underlying KVM, the > > > propery is silently ignored and the feature is not announced to the > > > guest. > > > > > > Refuse to start with an error instead. > > > > > > Signed-off-by: Roman Kagan > > > > Something I didn't consider before: > > > > Will this block migration before it even starts, or will crash > > the VM only after all migration data was sent to the destination? > > > > I didn't test it, but kvm_arch_init_vcpu() seems to be too late > > to block an invalid/unsupport configuration. > > I just did a simple test, force-failing one of the checks and starting > QEMU with -incoming defer. It refused to start with the expected error > message. IOW in the migration case the destination will abort before > the source have started to send any data. > > > Maybe we can simply call hyperv_handle_properties() earlier, > > inside x86_cpu_realizefn()? > > Now it's > > ... > x86_cpu_realizefn > qemu_init_vcpu > qemu_kvm_start_vcpu > qemu_thread_create(qemu_kvm_cpu_thread_fn) > [in vcpu thread] > qemu_kvm_cpu_thread_fn Oh, this is the part that I missed. I thought the vCPU thread wouldn't start until migration was finished. This means the patch is OK as-is. Sorry for the confusion. > kvm_init_vcpu > kvm_arch_init_vcpu > hyperv_handle_properties > [error return] > [error return] > [error return] > exit(1) > > > (I know it's very late for this kind of intrusive change in > > v2.12, but I still think it's a good idea to fix this as soon as > > possible.) > > I agree that the current hyperv flag handling begs for cleanup but I > think this can wait for post-2.12. > > > > --- > > > v1 -> v2: > > > - indicate what flag requested the feature that can't be enabled in the > > >error message > > > - fix a typo in the error message for VP_RUNTIME > > I just noticed that I missed hv-time being silently cleared, too (just > in a slightly different pattern), so I'll have to respin. > > Thanks, > Roman. -- Eduardo
Re: [Qemu-devel] [PATCH 2/3] hw/s390x: fix memory leak in s390_init_ipl_dev()
On Thu, 29 Mar 2018 11:10:06 +0200 Greg Kurz wrote: > The string returned by object_property_get_str() is dynamically allocated. > > Fixes: 3c4e9baacf4d9 > Signed-off-by: Greg Kurz > --- > hw/s390x/s390-virtio-ccw.c |5 - > 1 file changed, 4 insertions(+), 1 deletion(-) Thanks, queued to s390-fixes.
[Qemu-devel] [PATCH for-2.12 v5] iotests: Test abnormally large size in compressed cluster descriptor
L2 entries for compressed clusters have a field that indicates the number of sectors used to store the data in the image. That's however not the size of the compressed data itself, just the number of sectors where that data is located. The actual data size is usually not a multiple of the sector size, and therefore cannot be represented with this field. The way it works is that QEMU reads all the specified sectors and starts decompressing the data until there's enough to recover the original uncompressed cluster. If there are any bytes left that haven't been decompressed they are simply ignored. One consequence of this is that even if the size field is larger than it needs to be QEMU can handle it just fine: it will read more data from disk but it will ignore the extra bytes. This test creates an image with two compressed clusters that use 5 sectors (2.5 KB) each, increases the size field to the maximum (8192 sectors, or 4 MB) and verifies that the data can be read without problems. This test is important because while the decompressed data takes exactly one cluster, the maximum value allowed in the compressed size field is twice the cluster size. So although QEMU won't produce images with such large values we need to make sure that it can handle them. Another effect of increasing the size field is that it can make it include data from the following host cluster(s). In this case 'qemu-img check' will detect that the refcounts are not correct, and we'll need to rebuild them. Additionally, this patch also tests that decreasing the size corrupts the image since the original data can no longer be recovered. In this case QEMU returns an error when trying to read the compressed data, but 'qemu-img check' doesn't see anything wrong if the refcounts are consistent. One possible task for the future is to make 'qemu-img check' verify the sizes of the compressed clusters, by trying to decompress the data and checking that the size stored in the L2 entry is correct. Signed-off-by: Alberto Garcia --- v5: Use 'write -c' instead of 'write' followed by 'convert' [Max] Add TODO comment explaining that the size of compressed clusters should also be corrected when it's too large in order to avoid referencing other unrelated clusters. v4: Resend for 2.12 v3: Add TODO comment, as suggested by Eric. Corrupt the length of the second compressed cluster as well so the uncompressed data would span three host clusters. v2: We now have two scenarios where we make QEMU read data from the next host cluster and from beyond the end of the image. This version also runs qemu-img check on the corrupted image. If the size field is too small, reading fails but qemu-img check succeeds. If the size field is too large, reading succeeds but qemu-img check fails (this can be repaired, though). --- tests/qemu-iotests/122 | 47 ++ tests/qemu-iotests/122.out | 33 2 files changed, 80 insertions(+) diff --git a/tests/qemu-iotests/122 b/tests/qemu-iotests/122 index 45b359c2ba..6cf4fcb866 100755 --- a/tests/qemu-iotests/122 +++ b/tests/qemu-iotests/122 @@ -130,6 +130,53 @@ $QEMU_IO -c "read -P 01024k 1022k" "$TEST_IMG" 2>&1 | _filter_qemu_io | _fil echo +echo "=== Corrupted size field in compressed cluster descriptor ===" +echo +# Create an empty image and fill half of it with compressed data. +# The L2 entries of the two compressed clusters are located at +# 0x80 and 0x88, their original values are 0x400800a0 +# and 0x400800a00802 (5 sectors for compressed data each). +_make_test_img 8M -o cluster_size=2M +$QEMU_IO -c "write -c -P 0x11 0 2M" -c "write -c -P 0x11 2M 2M" "$TEST_IMG" \ + 2>&1 | _filter_qemu_io | _filter_testdir + +# Reduce size of compressed data to 4 sectors: this corrupts the image. +poke_file "$TEST_IMG" $((0x80)) "\x40\x06" +$QEMU_IO -c "read -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filter_testdir + +# 'qemu-img check' however doesn't see anything wrong because it +# doesn't try to decompress the data and the refcounts are consistent. +# TODO: update qemu-img so this can be detected. +_check_test_img + +# Increase size of compressed data to the maximum (8192 sectors). +# This makes QEMU read more data (8192 sectors instead of 5, host +# addresses [0xa0, 0xdf]), but the decompression algorithm +# stops once we have enough to restore the uncompressed cluster, so +# the rest of the data is ignored. +poke_file "$TEST_IMG" $((0x80)) "\x7f\xfe" +# Do it also for the second compressed cluster (L2 entry at 0x88). +# In this case the compressed data would span 3 host clusters +# (host addresses: [0xa00802, 0xe00801]) +poke_file "$TEST_IMG" $((0x88)) "\x7f\xfe" + +# Here the image is too small so we're asking QEMU to read beyond the +# end of the image. +$QEMU_IO -c "read -P 0x11 0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | _filt
Re: [Qemu-devel] [qemu-s390x] [PATCH 3/3] sev/i386: fix memory leak in sev_guest_init()
On 29.03.2018 11:24, Cornelia Huck wrote: > On Thu, 29 Mar 2018 11:10:21 +0200 > Greg Kurz wrote: > >> The string returned by object_property_get_str() is dynamically allocated. >> >> Fixes: d8575c6c0242b >> Signed-off-by: Greg Kurz >> --- >> target/i386/sev.c |4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/target/i386/sev.c b/target/i386/sev.c >> index 019d84cef2c7..c01167143f1c 100644 >> --- a/target/i386/sev.c >> +++ b/target/i386/sev.c >> @@ -748,9 +748,11 @@ sev_guest_init(const char *id) >> if (s->sev_fd < 0) { >> error_report("%s: Failed to open %s '%s'", __func__, >> devname, strerror(errno)); >> -goto err; >> } >> g_free(devname); >> +if (s->sev_fd < 0) { >> +goto err; >> +} >> >> ret = sev_platform_ioctl(s->sev_fd, SEV_PLATFORM_STATUS, &status, >> &fw_error); >> > > I would probably add an extra g_free(devname) right before the goto > err, but this works as well, obviously. Or maybe remove devname from the error_report ("%s: Failed to open sev-device '%s', __func__ strerror(errno)) and move the g_free right before the if-statement? Anyway: Reviewed-by: Thomas Huth
Re: [Qemu-devel] [qemu-s390x] [PATCH 2/3] hw/s390x: fix memory leak in s390_init_ipl_dev()
On 29.03.2018 11:10, Greg Kurz wrote: > The string returned by object_property_get_str() is dynamically allocated. > > Fixes: 3c4e9baacf4d9 > Signed-off-by: Greg Kurz > --- > hw/s390x/s390-virtio-ccw.c |5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c > index 864145a7c6f3..435f7c99e77c 100644 > --- a/hw/s390x/s390-virtio-ccw.c > +++ b/hw/s390x/s390-virtio-ccw.c > @@ -246,6 +246,7 @@ static void s390_init_ipl_dev(const char *kernel_filename, > { > Object *new = object_new(TYPE_S390_IPL); > DeviceState *dev = DEVICE(new); > +char *netboot_fw_prop; > > if (kernel_filename) { > qdev_prop_set_string(dev, "kernel", kernel_filename); > @@ -256,9 +257,11 @@ static void s390_init_ipl_dev(const char > *kernel_filename, > qdev_prop_set_string(dev, "cmdline", kernel_cmdline); > qdev_prop_set_string(dev, "firmware", firmware); > qdev_prop_set_bit(dev, "enforce_bios", enforce_bios); > -if (!strlen(object_property_get_str(new, "netboot_fw", &error_abort))) { > +netboot_fw_prop = object_property_get_str(new, "netboot_fw", > &error_abort); > +if (!strlen(netboot_fw_prop)) { > qdev_prop_set_string(dev, "netboot_fw", netboot_fw); > } > +g_free(netboot_fw_prop); > object_property_add_child(qdev_get_machine(), TYPE_S390_IPL, >new, NULL); > object_unref(new); > > Reviewed-by: Thomas Huth
Re: [Qemu-devel] [qemu-s390x] [PATCH 1/3] exec: fix memory leak in find_max_supported_pagesize()
On 29.03.2018 11:09, Greg Kurz wrote: > The string returned by object_property_get_str() is dynamically allocated. > > Signed-off-by: Greg Kurz > --- > exec.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/exec.c b/exec.c > index c09bd93df31e..02b1efebb7c3 100644 > --- a/exec.c > +++ b/exec.c > @@ -1495,6 +1495,7 @@ static int find_max_supported_pagesize(Object *obj, > void *opaque) > mem_path = object_property_get_str(obj, "mem-path", NULL); > if (mem_path) { > long hpsize = qemu_mempath_getpagesize(mem_path); > +g_free(mem_path); > if (hpsize < *hpsize_min) { > *hpsize_min = hpsize; > } > > Reviewed-by: Thomas Huth
Re: [Qemu-devel] [qemu-s390x] [PATCH 2/3] hw/s390x: fix memory leak in s390_init_ipl_dev()
On 29.03.2018 12:31, Cornelia Huck wrote: > On Thu, 29 Mar 2018 11:39:41 +0200 > Greg Kurz wrote: > >> On Thu, 29 Mar 2018 11:27:21 +0200 >> Igor Mammedov wrote: >> >>> On Thu, 29 Mar 2018 11:10:06 +0200 >>> Greg Kurz wrote: >>> The string returned by object_property_get_str() is dynamically allocated. Fixes: 3c4e9baacf4d9 Signed-off-by: Greg Kurz --- hw/s390x/s390-virtio-ccw.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index 864145a7c6f3..435f7c99e77c 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -246,6 +246,7 @@ static void s390_init_ipl_dev(const char *kernel_filename, { Object *new = object_new(TYPE_S390_IPL); DeviceState *dev = DEVICE(new); +char *netboot_fw_prop; if (kernel_filename) { qdev_prop_set_string(dev, "kernel", kernel_filename); @@ -256,9 +257,11 @@ static void s390_init_ipl_dev(const char *kernel_filename, qdev_prop_set_string(dev, "cmdline", kernel_cmdline); qdev_prop_set_string(dev, "firmware", firmware); qdev_prop_set_bit(dev, "enforce_bios", enforce_bios); -if (!strlen(object_property_get_str(new, "netboot_fw", &error_abort))) { +netboot_fw_prop = object_property_get_str(new, "netboot_fw", &error_abort); +if (!strlen(netboot_fw_prop)) { >>> probably not really issue here but, >>> is strlen really safe in case netboot_fw_prop == NULL? >>> >> >> You're right, object_property_get_str() can theoretically return NULL and >> strlen() would crash... Not sure how this would happen though. Anyway, the >> current code doesn't check if object_property_get_str() returns NULL so >> if this needs to be fixed as well, let's do it in a followup patch. > > I don't think so - if the attribute exists, we'll always get != NULL if > I read the code correctly. Right, the property is always there, so this should never be NULL. Thomas
Re: [Qemu-devel] [PATCH for-2.13] pc-bios/s390-ccw: Increase virtio timeout to 30 seconds
On 03/29/2018 11:37 AM, Thomas Huth wrote: > The current timeout is set to only three seconds - and considering that > vring_wait_reply() or rather get_second() is not doing any rounding, > the real timeout is likely rather 2 seconds in most cases. When the > host is really badly loaded and we run the guest in TCG mode instead > of KVM, it's possible that we hit this timeout by mistake. So let's > increase the timeout to 30 seconds instead to ease this situation (30 > seconds is also the timeout that is used by the Linux SCSI subsystem > for example, so this seems to be a sane value for block IO timeout). > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1549079 > Signed-off-by: Thomas Huth Acked-by: Christian Borntraeger I think this should go into 2.12 with a rebuild > --- > pc-bios/s390-ccw/virtio.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c > index 817e7f5..cdb66f4 100644 > --- a/pc-bios/s390-ccw/virtio.c > +++ b/pc-bios/s390-ccw/virtio.c > @@ -14,7 +14,7 @@ > #include "virtio-scsi.h" > #include "bswap.h" > > -#define VRING_WAIT_REPLY_TIMEOUT 3 > +#define VRING_WAIT_REPLY_TIMEOUT 30 > > static VRing block[VIRTIO_MAX_VQS]; > static char ring_area[VIRTIO_RING_SIZE * VIRTIO_MAX_VQS] >
Re: [Qemu-devel] [PATCH v4 3/9] cli: add -preconfig option
On Wed, 28 Mar 2018 16:21:48 -0300 Eduardo Habkost wrote: > On Wed, Mar 28, 2018 at 01:48:35PM +0200, Igor Mammedov wrote: > > On Tue, 27 Mar 2018 17:05:41 +0200 > > Igor Mammedov wrote: > > > > > On Fri, 23 Mar 2018 18:25:08 -0300 > > > Eduardo Habkost wrote: > > > > > > > On Mon, Mar 12, 2018 at 02:11:09PM +0100, Igor Mammedov wrote: > > > [...] > > [...] > > > > > @@ -1886,6 +1895,13 @@ static bool main_loop_should_exit(void) > > > > > RunState r; > > > > > ShutdownCause request; > > > > > > > > > > +if (preconfig_exit_requested) { > > > > > +if (runstate_check(RUN_STATE_PRECONFIG)) { > > > > > > > > Is it possible to have preconfig_exit_request set outside of > > > > RUN_STATE_PRECONFIG? When and why? > > > preconfig_exit_requested is initialized with TRUE and > > > in combo with '-inmigrate' we need this runstate check. > > > it's the same as it was with > > > { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE }, > > > which I probably should remove (I need to check it though) > > [...] > > > > > > > @@ -4594,6 +4623,10 @@ int main(int argc, char **argv, char **envp) > > > > > } > > > > > parse_numa_opts(current_machine); > > > > > > > > > > +/* do monitor/qmp handling at preconfig state if requested */ > > > > > +main_loop(); > > > > > > > > Wouldn't it be simpler to do "if (!preconfig) { main_loop(); }" > > > > instead of entering main_loop() just to exit immediately? > > > The thought didn't cross my mind, it might work and more readable > > > as one doesn't have to jump into main_loop() to find out that > > > it would exit immediately. > > > I'll try to it on respin. > > Well doing as suggested end ups more messy: > > > > @@static bool main_loop_should_exit(void) > > ... > > if (preconfig_exit_requested) { > > runstate_set(RUN_STATE_PRELAUNCH); > > > > return true; > > } > > > > @@main > > /* do monitor/qmp handling at preconfig state if requested */ > > if (!preconfig_exit_requested) { > > main_loop(); > > } else if (runstate_check(RUN_STATE_PRECONFIG)) { > > runstate_set(RUN_STATE_PRELAUNCH); > > } > > This doesn't make sense to me. Why would we enter > RUN_STATE_PRECONFIG state if -preconfig is not used at all? because of RUN_STATE_PRECONFIG becomes new initial state of our state machine where we start of (used to be RUN_STATE_PRELAUNCH) Lets call it variant 1: with this we have 2 possible transitions: RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH (machine_init) and RUN_STATE_PRECONFIG -> RUN_STATE_INMIGRATE ugly but it was the same with RUN_STATE_PRELAUNCH initial transition Another variant 2, in case we switch to RUN_STATE_PRECONFIG only on -preconfig transitions would be RUN_STATE_PRELAUNCH -> RUN_STATE_PRECONFIG (allow switch from initial to -preconfig) RUN_STATE_PRECONFIG -> RUN_STATE_PRELAUNCH while the last is valid transition, the 1st one isn't really valid because of (beside of switching from initial state) it allows bouncing back to RUN_STATE_PRECONFIG later. If we consider only state machine transitions, I think it's cleaner to start with variant 1 with the same -inmigrate hack we already have (which potentially could be fixed later), than allowing arbitrary bouncing to RUN_STATE_PRECONFIG at later stage. With this approach all processing before machine_init() would run at RUN_STATE_PRECONFIG and then we would switch to RUN_STATE_PRELAUNCH. Even though it is far reaching goal but at least that's where we should be moving to have sane initialization flow in vl.c > > preconfig_exit_requested = false; > > ... > > > > I'd prefer original v4 approach, where only main_loop_should_exit() > > has to deal with state transitions and book-keeping. > > If the above is unavoidable, I agree. But I still don't > understand we have to enter PRECONFIG state if the user didn't > specify -preconfig. >
[Qemu-devel] [PATCH for-2.13] s390x: introduce 2.13 compat machine
Signed-off-by: Cornelia Huck --- Yes, it's that time again :) --- hw/s390x/s390-virtio-ccw.c | 17 - include/hw/compat.h| 3 +++ 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index 864145a7c6..122f316f6a 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -668,6 +668,9 @@ bool css_migration_enabled(void) } \ type_init(ccw_machine_register_##suffix) +#define CCW_COMPAT_2_12 \ +HW_COMPAT_2_12 + #define CCW_COMPAT_2_11 \ HW_COMPAT_2_11 \ {\ @@ -753,14 +756,26 @@ bool css_migration_enabled(void) .value= "0",\ }, +static void ccw_machine_2_13_instance_options(MachineState *machine) +{ +} + +static void ccw_machine_2_13_class_options(MachineClass *mc) +{ +} +DEFINE_CCW_MACHINE(2_13, "2.13", true); + static void ccw_machine_2_12_instance_options(MachineState *machine) { +ccw_machine_2_13_instance_options(machine); } static void ccw_machine_2_12_class_options(MachineClass *mc) { +ccw_machine_2_13_class_options(mc); +SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_12); } -DEFINE_CCW_MACHINE(2_12, "2.12", true); +DEFINE_CCW_MACHINE(2_12, "2.12", false); static void ccw_machine_2_11_instance_options(MachineState *machine) { diff --git a/include/hw/compat.h b/include/hw/compat.h index bc9e3a6627..6e91491b77 100644 --- a/include/hw/compat.h +++ b/include/hw/compat.h @@ -1,6 +1,9 @@ #ifndef HW_COMPAT_H #define HW_COMPAT_H +#define HW_COMPAT_2_12 \ +/* empty */ + #define HW_COMPAT_2_11 \ {\ .driver = "hpet",\ -- 2.14.3
Re: [Qemu-devel] [PATCH for-2.13] s390x: introduce 2.13 compat machine
On 29.03.2018 13:33, Cornelia Huck wrote: > Signed-off-by: Cornelia Huck > --- > Yes, it's that time again :) Can't we do v3.0 next ;-) ? Anyway: Reviewed-by: Thomas Huth
Re: [Qemu-devel] [PATCH for-2.13] pc-bios/s390-ccw: Increase virtio timeout to 30 seconds
On 29.03.2018 12:03, Christian Borntraeger wrote: > > > On 03/29/2018 11:37 AM, Thomas Huth wrote: >> The current timeout is set to only three seconds - and considering that >> vring_wait_reply() or rather get_second() is not doing any rounding, >> the real timeout is likely rather 2 seconds in most cases. When the >> host is really badly loaded and we run the guest in TCG mode instead >> of KVM, it's possible that we hit this timeout by mistake. So let's >> increase the timeout to 30 seconds instead to ease this situation (30 >> seconds is also the timeout that is used by the Linux SCSI subsystem >> for example, so this seems to be a sane value for block IO timeout). > > I have never seen this, but cant this also with KVM (e.g. host paging > and the swap disk is busy for some seconds). Sure, I even assume that Lukáš might have hit this with KVM, and not with TCG. It's just a little bit more unlikely there. > Wouldnt it also qualify for 2.12? Yes, I was just not sure whether we want to rebuild the s390-ccw.img just because of this ... I'd like to leave that decision to you and Cornelia. Thomas
Re: [Qemu-devel] [Qemu-arm] [PATCH v2 2/2] arm_gicv3_kvm: kvm_dist_get/put: skip the registers banked by GICR
* Peter Maydell (peter.mayd...@linaro.org) wrote: > On 23 March 2018 at 12:08, Peter Maydell wrote: > > On 21 March 2018 at 08:00, Shannon Zhao wrote: > >> On 2018/3/20 19:54, Peter Maydell wrote: > >>> Can you still successfully migrate a VM from a QEMU version > >>> without this bugfix to one with the bugfix ? > >>> > >> I've tested this case. I can migrate a VM between these two versions. > > > > Hmm. Looking at the code I can't see how that would work, > > except by accident. Let me see if I understand what's happening > > here: > > > > In the code in master, we have QEMU data structures > > (bitmaps, etc) which have one entry for each of GICV3_MAXIRQ > > irqs. That includes the RAZ/WI unused space for the SPIs/PPIs, so > > for a 1-bit-per-irq bitmap: > > [0x, irq 32, irq 33, ] > > > > When we fill in the values from KVM into these data structures, > > we start after the unused space, because the for_each_dist_irq_reg() > > macro starts with _irq = GIC_INTERNAL. But we forgot to adjust > > the offset value we use for the KVM access, so we start by > > reading the RAZ/WI values from KVM, and the data structure > > contents end up with: > > [0x, 0x, irq 32, irq 33, ... ] > > (and the last irqs wouldn't get transferred). > > > > With this change to the code we will get the offset right and > > the data structure will be filled as > > [0x, irq 32, irq 33, ] > > > > But for migration from the old version, the data structure > > we receive from the migration source will contain the old > > broken layout of > > [0x, 0x, irq 32, irq 33, ... ] > > so if the new code doesn't do anything special to handle > > migration from that old version then it will write zeroes to > > irq 32..63, and then write incorrect values for all the irqs > > after that, won't it? > > > > That suggests to me that we need to have some code in the > > migration post-load routine that identifies that the data > > is coming from an old version with this bug, and shifts > > all the data down in the arrays so that the code to write > > it to the kernel can handle it. > > I was thinking a bit more about how to handle this, and > my best idea was: > > (1) send something in the migration stream that says > "I don't have this bug" (version number change? > vmstate field that's just a "no bug" flag? subsection > with no contents?) > > (2) on the destination, if the source doesn't tell us > it doesn't have this bug, and we are running KVM, then > shift all the data in the arrays down to fix it up > [Strictly what we want to know is if the source is > running KVM, not if the destination is, but I don't > know of a way to find that out, and in practice TCG->KVM > migrations don't work anyway, so it's not a big deal.] > > Juan, David, do you have any suggestions for the best > mechanism for part 1; or is there some clever way to > handle this sort of bug that I've missed? The subsection is probably the best bet; unless that is you can find a bit to misuse in an existing field. Dave > thanks > -- PMM -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
[Qemu-devel] [RFC PATCH 7/8] block-backend: Add blk_co_copy_range
It's a BlockBackend wrapper of bdrv_co_copy_range. Signed-off-by: Fam Zheng --- block/block-backend.c | 8 include/sysemu/block-backend.h | 4 2 files changed, 12 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index 681b240b12..54b0789282 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2217,3 +2217,11 @@ void blk_unregister_buf(BlockBackend *blk, void *host) { bdrv_unregister_buf(blk_bs(blk), host); } + +int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, + BlockBackend *blk_out, int64_t off_out, + int bytes) +{ +return bdrv_co_copy_range(blk_in->root, off_in, blk_out->root, off_out, + bytes); +} diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 92ab624fac..1fde14b461 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -232,4 +232,8 @@ void blk_set_force_allow_inactivate(BlockBackend *blk); void blk_register_buf(BlockBackend *blk, void *host, size_t size); void blk_unregister_buf(BlockBackend *blk, void *host); +int blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, + BlockBackend *blk_out, int64_t off_out, + int bytes); + #endif -- 2.14.3