Re: [Qemu-devel] [PATCH v2] hmp: allow cpu index for "info lapic"
>On Wed, Jul 19, 2017 at 08:17:49PM +0100, Dr. David Alan Gilbert wrote:
>> * Eduardo Habkost (address@hidden) wrote:
>> > On Wed, Jul 19, 2017 at 10:17:36AM -0500, Eric Blake wrote:
>> > > On 07/19/2017 10:07 AM, Daniel P. Berrange wrote:
>> > > >> It doesn't. Perhaps we should add that as a future libvirt-qemu.so
>> > > >> API
>> > > >> addition, although it's probably easier to just use QMP than HMP when
>> > > >> using 'virsh qemu-monitor-command' if HMP doesn't do what you want.
>> > > >
>> > > > Or special case the "cpu 1" command - ie notice that it is being
>> > > > requested and don't execute 'human-monitor-command'. Instead just
>> > > > record the CPU index, and use that for future "human-monitor-command"
>> > > > invocations, so we get full compat with the (dubious) stateful HMP
>> > > > semantics that traditionally existed.
>> > >
>> > > Is 'cpu' (and the followup commands affected by it) the only stateful
>> > > HMP command pairing? Is there a way to specify multiple HMP commands in
>> > > a single human-monitor-command QMP call?
>> > >
>> > > Indeed, tweaking qemu's human-monitor-command call to track the state
>> > > might be cleaner than having libvirt have to tweak API to work around
>> > > this wart of HMP.
>> >
>> > The CPU index was the only state kept by the human monitor, and I
>> > think it's by design that it stopped being considered "monitor
>> > state" to be tracked, and became just an argument to
>> > human-monitor-command.
>> >
>> > It's true that it broke compatibility of
>> > "virsh qemu-monitor-command --hmp 'cpu '",
>> > when we moved to QMP, but this happened years ago, and it looks
>> > like nobody was relying on it. I don't see the point of trying
>> > to emulate the previous stateful interface.
>>
>> IMHO Yi's fix (once reworked) is the right fix - it removes the
>> use of that piece of state, when the optional parameter is used.
>> (OK, so it needs rework not to change that state and to
>> come to some agreement as to what to use instead of cpu index number
>> etc).
>
>Agreed, as it helps us to keep the "virsh qemu-monitor-command"
>interface simpler. But we have 8 commands that use
>mon_get_cpu(), we shouldn't fix only "info lapic".

Thank you all! I will rework this patch as follows:

- extend 'info registers' to print the apic id value instead of the
  current index, like this:
  CPU#1 (socket-id: a, core-id: b, thread-id: c, apic-id: d)
- add an 'apic id' parameter to 'info lapic'

As for the other commands, I want to send separate patches, because in my
opinion not all commands need 'apic-id' as the index.

---
Best wishes
Yi Wang
Re: [Qemu-devel] [FIX PATCH v1] spapr: Fix QEMU abort during memory unplug
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1500523879-23860-1-git-send-email-bhar...@linux.vnet.ibm.com
Subject: [Qemu-devel] [FIX PATCH v1] spapr: Fix QEMU abort during memory unplug
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
3aad3d6 spapr: Fix QEMU abort during memory unplug

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-j1hutab7/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-j1hutab7/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY    RUNNER
    RUN test-quick in qemu:centos6
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
git-1.7.1-8.el6.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ flex bison zlib-devel glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=bee8e1f457b0
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix          /var/tmp/qemu-build/install
BIOS directory          /var/tmp/qemu-build/install/share/qemu
binary directory        /var/tmp/qemu-build/install/bin
library directory       /var/tmp/qemu-build/install/lib
module directory        /var/tmp/qemu-build/install/lib/qemu
libexec directory       /var/tmp/qemu-build/install/libexec
include directory       /var/tmp/qemu-build/install/include
config directory        /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory        /var/tmp/qemu-build/install/share/man
ELF interp prefix       /usr/gnemul/qemu-%M
Source path             /tmp/qemu-test/src
C compiler              cc
Host C compiler         cc
C++ compiler
Objective-C compiler    cc
ARFLAGS                 rv
CFLAGS                  -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
QEMU_CFLAGS             -I/usr/include/pixman-1 -I$(SRC_PATH)/dtc/libfdt -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wendif-labels -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS                 -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g
make                    make
install                 install
python                  python -B
smbd                    /usr/sbin/smbd
module support          no
host CPU                x86_64
host big endian         no
target list             x86_64-softmmu aarch64-softmmu
gprof enabled           no
sparse enabled          no
strip binaries          yes
profiler                no
static build            no
pixman                  system
SDL support             yes (1.2.14)
GTK support             no
GTK GL support          no
VTE support             no
TLS priority            NORMAL
GNUTLS support          no
GNUTLS rnd              no
libgcrypt               no
libgcrypt kdf           no
nettle                  no
nettle kdf              no
libtasn1                no
curses support          no
virgl support           no
curl support            no
mingw32 support         no
Audio drivers           oss
Block whitelist (rw)
Block whitelist (ro)
VirtFS support          no
VNC support             yes
VNC SASL support        no
VNC JPEG support        no
VNC PNG support         no
xen support             no
brlapi support          no
bluez support           no
Documentation           no
PIE                     yes
vde support             no
netmap support          no
Linux AIO support       no
ATTR/XATTR support      yes
Install blobs           yes
KVM support             yes
HAX support             no
TCG support             yes
TCG debug
Re: [Qemu-devel] [PATCH] migration: optimize the downtime
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1500522569-10760-1-git-send-email-jianjay.z...@huawei.com
Subject: [Qemu-devel] [PATCH] migration: optimize the downtime
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
1ae581e migration: optimize the downtime

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-6slpqj5k/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-6slpqj5k/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY    RUNNER
    RUN test-quick in qemu:centos6
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
git-1.7.1-8.el6.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ flex bison zlib-devel glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=50d2403d8481
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix          /var/tmp/qemu-build/install
BIOS directory          /var/tmp/qemu-build/install/share/qemu
binary directory        /var/tmp/qemu-build/install/bin
library directory       /var/tmp/qemu-build/install/lib
module directory        /var/tmp/qemu-build/install/lib/qemu
libexec directory       /var/tmp/qemu-build/install/libexec
include directory       /var/tmp/qemu-build/install/include
config directory        /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory        /var/tmp/qemu-build/install/share/man
ELF interp prefix       /usr/gnemul/qemu-%M
Source path             /tmp/qemu-test/src
C compiler              cc
Host C compiler         cc
C++ compiler
Objective-C compiler    cc
ARFLAGS                 rv
CFLAGS                  -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
QEMU_CFLAGS             -I/usr/include/pixman-1 -I$(SRC_PATH)/dtc/libfdt -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wendif-labels -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS                 -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g
make                    make
install                 install
python                  python -B
smbd                    /usr/sbin/smbd
module support          no
host CPU                x86_64
host big endian         no
target list             x86_64-softmmu aarch64-softmmu
gprof enabled           no
sparse enabled          no
strip binaries          yes
profiler                no
static build            no
pixman                  system
SDL support             yes (1.2.14)
GTK support             no
GTK GL support          no
VTE support             no
TLS priority            NORMAL
GNUTLS support          no
GNUTLS rnd              no
libgcrypt               no
libgcrypt kdf           no
nettle                  no
nettle kdf              no
libtasn1                no
curses support          no
virgl support           no
curl support            no
mingw32 support         no
Audio drivers           oss
Block whitelist (rw)
Block whitelist (ro)
VirtFS support          no
VNC support             yes
VNC SASL support        no
VNC JPEG support        no
VNC PNG support         no
xen support             no
brlapi support          no
bluez support           no
Documentation           no
PIE                     yes
vde support             no
netmap support          no
Linux AIO support       no
ATTR/XATTR support      yes
Install blobs           yes
KVM support             yes
HAX support             no
TCG support             yes
TCG debug enabled       no
TCG interpreter         no
[Qemu-devel] [FIX PATCH v1] spapr: Fix QEMU abort during memory unplug
Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to
sPAPRMachineState) introduced a new way to track pending LMBs of DIMM
device that is marked for removal. Since this commit we can hit the
assert in spapr_pending_dimm_unplugs_add() in the following situation:

- DIMM device removal fails as the guest doesn't allow the removal.
- Subsequent attempt to remove the same DIMM would hit the assert
  as the corresponding sPAPRDIMMState is still part of the
  pending_dimm_unplugs list.

Fix this by removing the assert and conditionally adding the
sPAPRDIMMState to pending_dimm_unplugs list only when it is not
already present.

Fixes: 0cffce56ae3501c5783d779f97993ce478acf856
Signed-off-by: Bharata B Rao
---
Changes in v1:
- Added comment (David Gibson)
- Ensured we free sPAPRDIMMState when corresponding entry already exists
  (Daniel Henrique Barboza)

Daniel had shown another alternative, we can switch over to that if
preferred.

 hw/ppc/spapr.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1cb09e7..c6091e2 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2853,8 +2853,17 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s,
 static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr,
                                            sPAPRDIMMState *dimm_state)
 {
-    g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm));
-    QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
+    /*
+     * If this request is for a DIMM whose removal had failed earlier
+     * (due to guest's refusal to remove the LMBs), we would have this
+     * dimm_state already in the pending_dimm_unplugs list. In that
+     * case don't add again.
+     */
+    if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) {
+        QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
+    } else {
+        g_free(dimm_state);
+    }
 }

 static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,
--
2.7.4
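
The "insert only if absent, otherwise free" pattern described in this patch
can be tried outside QEMU with a small stand-alone sketch. Everything below
is an assumption for illustration: DimmState, Machine and the pending_*
helpers are hypothetical stand-ins (a hand-rolled singly linked list instead
of QTAILQ, an int id instead of the DIMM device pointer), not the spapr code
itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for sPAPRDIMMState. */
typedef struct DimmState {
    int dimm_id;               /* stands in for the DIMM device pointer */
    struct DimmState *next;
} DimmState;

/* Hypothetical stand-in for the machine state holding the pending list. */
typedef struct Machine {
    DimmState *pending;        /* head of the pending-unplug list */
} Machine;

static DimmState *dimm_state_new(int dimm_id)
{
    DimmState *ds = calloc(1, sizeof(*ds));
    assert(ds != NULL);
    ds->dimm_id = dimm_id;
    return ds;
}

static DimmState *pending_find(const Machine *m, int dimm_id)
{
    for (DimmState *ds = m->pending; ds; ds = ds->next) {
        if (ds->dimm_id == dimm_id) {
            return ds;
        }
    }
    return NULL;
}

/*
 * Takes ownership of ds. If an entry for the same DIMM is already pending
 * (an earlier unplug attempt was refused), the new state is freed instead
 * of being linked in a second time.
 */
static void pending_add(Machine *m, DimmState *ds)
{
    if (!pending_find(m, ds->dimm_id)) {
        ds->next = m->pending;
        m->pending = ds;
    } else {
        free(ds);
    }
}

static int pending_count(const Machine *m)
{
    int n = 0;
    for (const DimmState *ds = m->pending; ds; ds = ds->next) {
        n++;
    }
    return n;
}
```

Freeing in the else branch matters because the caller has already handed
over ownership; without it, a repeated unplug request would leak the state.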
Re: [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1500520169-23367-1-git-send-email-c...@braap.org
Subject: [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]         patchew/1500520169-23367-1-git-send-email-c...@braap.org -> patchew/1500520169-23367-1-git-send-email-c...@braap.org
 - [tag update]      patchew/20170719163108.26943-1-apa...@redhat.com -> patchew/20170719163108.26943-1-apa...@redhat.com
Switched to a new branch 'test'
da87a3c tcg: enable multiple TCG contexts in softmmu
043a8fe tcg: introduce regions to split code_gen_buffer
56b9b69 tcg: define TCG_HIGHWATER
e7f8206 translate-all: use qemu_protect_rwx/none helpers
22af337 osdep: introduce qemu_mprotect_rwx/none
e169f67 util: move qemu_real_host_page_size/mask to osdep.h
2cb855d tcg: distribute profiling counters across TCGContext's
bd2ea58 tcg: introduce **tcg_ctxs to keep track of all TCGContext's
317e1af tcg: dynamically allocate optimizer temps
379bf86 gen-icount: fold exitreq_label into TCGContext
7dd8ecd tcg: define tcg_init_ctx and make tcg_ctx a pointer
29b8220 tcg: take .helpers out of TCGContext
f90dd29 tcg: take tb_ctx out of TCGContext
265e0ba tci: move tci_regs to tcg_qemu_tb_exec's stack
556cc0f translate-all: report correct avg host TB size
60c2e41 exec-all: rename tb_free to tb_remove
7e9cbd3 translate-all: use a binary search tree to track TBs in TBContext
05a7053 exec-all: extract tb->tc_* into a separate struct tc_tb
9d12b48 translate-all: define and use DEBUG_TB_CHECK_GATE
e6a6556 translate-all: define and use DEBUG_TB_INVALIDATE_GATE
0db2c48 exec-all: introduce TB_PAGE_ADDR_FMT
ace4460 translate-all: define and use DEBUG_TB_FLUSH_GATE
27836fa cpu-exec: lookup/generate TB outside exclusive region during step_atomic
1e060f9 tcg: check CF_PARALLEL instead of parallel_cpus
bc98732 target/sparc: check CF_PARALLEL instead of parallel_cpus
6ffdc73 target/sh4: check CF_PARALLEL instead of parallel_cpus
ccb61ac target/s390x: check CF_PARALLEL instead of parallel_cpus
12eeba3 target/m68k: check CF_PARALLEL instead of parallel_cpus
c6c4772 target/i386: check CF_PARALLEL instead of parallel_cpus
b6c38be target/hppa: check CF_PARALLEL instead of parallel_cpus
ee35e3a target/arm: check CF_PARALLEL instead of parallel_cpus
ac51961 tcg: convert tb->cflags reads to tb_cflags(tb)
30db086 tcg: define CF_PARALLEL and use it for TB hashing
35d964c exec-all: bring tb->invalid into tb->cflags
cc370d5 tcg: consolidate TB lookups in tb_lookup__cpu_state
7948c58 tcg: remove addr argument from lookup_tb_ptr
263f3e2 tcg/mips: constify tcg_target_callee_save_regs
fc1b9cb tcg/i386: constify tcg_target_callee_save_regs
72848e1 cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find
b2ae03c translate-all: make have_tb_lock static
d2271e9 exec-all: fix typos in TranslationBlock's documentation
bba6cb2 tcg: fix corruption of code_time profiling counter upon tb_flush
50913ad cputlb: bring back tlb_flush_count under !TLB_DEBUG

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-0962jnus/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-0962jnus/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY    RUNNER
    RUN test-quick in qemu:centos6
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
git-1.7.1-8.el6.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ flex bison zlib-devel glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=6a51217a1bb9
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
[Qemu-devel] [PATCH] migration: optimize the downtime
qemu_savevm_state_cleanup() takes about 300ms in my ram migration tests
with a 8U24G vm (20G is really occupied). The main cost comes from the
KVM_SET_USER_MEMORY_REGION ioctl when mem.memory_size = 0 in
kvm_set_user_memory_region(). In kmod, the main cost is
kvm_zap_obsolete_pages(), which traverses the active_mmu_pages list to
zap the unsync sptes.

I think it can be optimized:
(1) source vm will be destroyed if the migration is successfully done,
    so the resources will be cleaned up automatically by the system
(2) delay the cleanup if the migration failed

Signed-off-by: Jay Zhou
---
 migration/migration.c | 16 +---
 qmp.c                 | 10 ++
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index a0db40d..72832be 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1877,6 +1877,15 @@ static void *migration_thread(void *opaque)
         if (qemu_file_get_error(s->to_dst_file)) {
             migrate_set_state(&s->state, current_active_state,
                               MIGRATION_STATUS_FAILED);
+            /*
+             * The resource has been allocated by migration will be reused in
+             * COLO process, so don't release them.
+             */
+            if (!enable_colo) {
+                qemu_mutex_lock_iothread();
+                qemu_savevm_state_cleanup();
+                qemu_mutex_unlock_iothread();
+            }
             trace_migration_thread_file_err();
             break;
         }
@@ -1916,13 +1925,6 @@ static void *migration_thread(void *opaque)
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

     qemu_mutex_lock_iothread();
-    /*
-     * The resource has been allocated by migration will be reused in COLO
-     * process, so don't release them.
-     */
-    if (!enable_colo) {
-        qemu_savevm_state_cleanup();
-    }
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
         s->total_time = end_time - s->total_time;
diff --git a/qmp.c b/qmp.c
index b86201e..0e68eaa 100644
--- a/qmp.c
+++ b/qmp.c
@@ -37,6 +37,8 @@
 #include "qom/object_interfaces.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/acpi/acpi_dev_interface.h"
+#include "migration/migration.h"
+#include "migration/savevm.h"

 NameInfo *qmp_query_name(Error **errp)
 {
@@ -200,6 +202,14 @@ void qmp_cont(Error **errp)
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         autostart = 1;
     } else {
+        /*
+         * Delay the cleanup to reduce the downtime of migration.
+         * The resource has been allocated by migration will be reused
+         * in COLO process, so don't release them.
+         */
+        if (runstate_check(RUN_STATE_POSTMIGRATE) && !migrate_colo_enabled()) {
+            qemu_savevm_state_cleanup();
+        }
         vm_start();
     }
 }
--
1.8.3.1
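
The control flow of the diff above (clean up right away on the error path,
skip the cleanup on the normal path, and run it later from "cont" if the VM
is still in the post-migrate state) can be modelled with a toy state machine.
This is only a sketch under stated assumptions: Vm, migration_thread_done()
and vm_cont() are hypothetical names, and the resources_held guard is an
addition of this example that the patch itself does not have.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { VM_RUNNING, VM_POSTMIGRATE } VmRunState;

/* Hypothetical model of the source VM; not a QEMU structure. */
typedef struct Vm {
    VmRunState run_state;
    bool colo_enabled;
    bool resources_held;   /* migration resources not yet released */
    int cleanup_calls;     /* counts how often the expensive path ran */
} Vm;

/* Stands in for the expensive qemu_savevm_state_cleanup(). */
static void savevm_cleanup(Vm *vm)
{
    vm->resources_held = false;
    vm->cleanup_calls++;
}

/* End of the migration thread: only the error path cleans up here. */
static void migration_thread_done(Vm *vm, bool failed)
{
    assert(vm->resources_held);
    if (failed) {
        if (!vm->colo_enabled) {
            savevm_cleanup(vm);
        }
        return;
    }
    vm->run_state = VM_POSTMIGRATE;   /* cleanup deliberately skipped */
}

/* "cont" on the source: pay the cleanup cost only if the VM is resumed. */
static void vm_cont(Vm *vm)
{
    if (vm->run_state == VM_POSTMIGRATE && !vm->colo_enabled &&
        vm->resources_held) {
        savevm_cleanup(vm);
    }
    vm->run_state = VM_RUNNING;
}
```

If the source VM is simply destroyed after a successful migration, vm_cont()
is never reached and the cleanup cost is not paid during the downtime window.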
[Qemu-devel] [PATCH v3 33/43] tcg: define tcg_init_ctx and make tcg_ctx a pointer
Groundwork for supporting multiple TCG contexts.

The core of this patch is this change to tcg/tcg.h:

> -extern TCGContext tcg_ctx;
> +extern TCGContext tcg_init_ctx;
> +extern TCGContext *tcg_ctx;

Note that for now we set *tcg_ctx to whatever TCGContext is passed
to tcg_context_init -- in this case &tcg_init_ctx.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/gen-icount.h     | 10 ++---
 include/exec/helper-gen.h     | 12 +++---
 tcg/tcg-op.h                  | 80 +--
 tcg/tcg.h                     | 15 +++
 accel/tcg/translate-all.c     | 97 ++-
 bsd-user/main.c               | 2 +-
 linux-user/main.c             | 2 +-
 target/alpha/translate.c      | 2 +-
 target/arm/translate.c        | 2 +-
 target/cris/translate.c       | 2 +-
 target/cris/translate_v10.c   | 2 +-
 target/hppa/translate.c       | 2 +-
 target/i386/translate.c       | 2 +-
 target/lm32/translate.c       | 2 +-
 target/m68k/translate.c       | 2 +-
 target/microblaze/translate.c | 2 +-
 target/mips/translate.c       | 2 +-
 target/moxie/translate.c      | 2 +-
 target/openrisc/translate.c   | 2 +-
 target/ppc/translate.c        | 2 +-
 target/s390x/translate.c      | 2 +-
 target/sh4/translate.c        | 2 +-
 target/sparc/translate.c      | 2 +-
 target/tilegx/translate.c     | 2 +-
 target/tricore/translate.c    | 2 +-
 target/unicore32/translate.c  | 2 +-
 target/xtensa/translate.c     | 2 +-
 tcg/tcg-op.c                  | 58 +-
 tcg/tcg-runtime.c             | 2 +-
 tcg/tcg.c                     | 21 +-
 30 files changed, 171 insertions(+), 168 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 48b566c..c58b0b2 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -19,7 +19,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
         count = tcg_temp_new_i32();
     }

-    tcg_gen_ld_i32(count, tcg_ctx.tcg_env,
+    tcg_gen_ld_i32(count, tcg_ctx->tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, icount_decr.u32));

     if (tb_cflags(tb) & CF_USE_ICOUNT) {
@@ -37,7 +37,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
     tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);

     if (tb_cflags(tb) & CF_USE_ICOUNT) {
-        tcg_gen_st16_i32(count, tcg_ctx.tcg_env,
+        tcg_gen_st16_i32(count, tcg_ctx->tcg_env,
                          -ENV_OFFSET + offsetof(CPUState, icount_decr.u16.low));
     }

@@ -56,13 +56,13 @@ static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
     tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_REQUESTED);

     /* Terminate the linked list. */
-    tcg_ctx.gen_op_buf[tcg_ctx.gen_op_buf[0].prev].next = 0;
+    tcg_ctx->gen_op_buf[tcg_ctx->gen_op_buf[0].prev].next = 0;
 }

 static inline void gen_io_start(void)
 {
     TCGv_i32 tmp = tcg_const_i32(1);
-    tcg_gen_st_i32(tmp, tcg_ctx.tcg_env,
+    tcg_gen_st_i32(tmp, tcg_ctx->tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, can_do_io));
     tcg_temp_free_i32(tmp);
 }
@@ -70,7 +70,7 @@ static inline void gen_io_start(void)
 static inline void gen_io_end(void)
 {
     TCGv_i32 tmp = tcg_const_i32(0);
-    tcg_gen_st_i32(tmp, tcg_ctx.tcg_env,
+    tcg_gen_st_i32(tmp, tcg_ctx->tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, can_do_io));
     tcg_temp_free_i32(tmp);
 }
diff --git a/include/exec/helper-gen.h b/include/exec/helper-gen.h
index 8239ffc..3bcb901 100644
--- a/include/exec/helper-gen.h
+++ b/include/exec/helper-gen.h
@@ -9,7 +9,7 @@

 #define DEF_HELPER_FLAGS_0(name, flags, ret)                            \
 static inline void glue(gen_helper_, name)(dh_retvar_decl0(ret))        \
 {                                                                       \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 0, NULL);       \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 0, NULL);        \
 }

 #define DEF_HELPER_FLAGS_1(name, flags, ret, t1)                        \
@@ -17,7 +17,7 @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1))                                                 \
 {                                                                       \
   TCGArg args[1] = { dh_arg(t1, 1) };                                   \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 1, args);       \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 1, args);        \
 }

 #define DEF_HELPER_FLAGS_2(name, flags, ret, t1, t2)                    \
@@ -25,7 +25,7 @@ static inline void glue(gen_helper_, name)(dh_retvar_decl(ret)          \
     dh_arg_decl(t1, 1), dh_arg_decl(t2, 2))                             \
 {                                                                       \
   TCGArg args[2] = { dh_arg(t1, 1), dh_arg(t2, 2) };
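
The mechanical change in this patch (a global struct becomes a global pointer
that initially refers to a single statically allocated context) can be shown
in miniature. The sketch below is only illustrative: Ctx, init_ctx, cur_ctx,
ctx_init() and emit_op() are hypothetical names standing in for TCGContext,
tcg_init_ctx, tcg_ctx, tcg_context_init() and the many call sites touched in
the diffstat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TCGContext. */
typedef struct Ctx {
    int nb_ops;
} Ctx;

Ctx init_ctx;    /* the one statically allocated context */
Ctx *cur_ctx;    /* what every user dereferences from now on */

/* Initialise a context and make it the current one. */
void ctx_init(Ctx *s)
{
    s->nb_ops = 0;
    cur_ctx = s;
}

/* A typical user: "ctx.field" accesses become "cur_ctx->field". */
static void emit_op(void)
{
    assert(cur_ctx != NULL);
    cur_ctx->nb_ops++;
}
```

Once all users go through the pointer, a later change can point it at a
different context without touching the call sites again.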
Re: [Qemu-devel] [FIX PATCH] spapr: Fix QEMU abort during memory unplug
On Wed, Jul 19, 2017 at 02:24:09PM +0530, Bharata B Rao wrote:
> Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to
> sPAPRMachineState) introduced a new way to track pending LMBs of DIMM
> device that is marked for removal. Since this commit we can hit the
> assert in spapr_pending_dimm_unplugs_add() in the following situation:
>
> - DIMM device removal fails as the guest doesn't allow the removal.
> - Subsequent attempt to remove the same DIMM would hit the assert
>   as the corresponding sPAPRDIMMState is still part of the
>   pending_dimm_unplugs list.
>
> Fix this by removing the assert and conditionally adding the
> sPAPRDIMMState to pending_dimm_unplugs list only when it is not
> already present.
>
> Fixes: 0cffce56ae3501c5783d779f97993ce478acf856
> Signed-off-by: Bharata B Rao

Sounds like a reasonable change based on the rationale above. However,
can you add a comment here explaining the situation in which the entry
already exists.

> ---
>  hw/ppc/spapr.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 1cb09e7..990bb2d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2853,8 +2853,9 @@ static sPAPRDIMMState
> *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s,
>  static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr,
>                                             sPAPRDIMMState *dimm_state)
>  {
> -    g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm));
> -    QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
> +    if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) {
> +        QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
> +    }
>  }
>
>  static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
[Qemu-devel] [PATCH v3 27/43] translate-all: use a binary search tree to track TBs in TBContext
This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.

For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.

The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size are
both !0), which happens during insertions (and removals, but those are
rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. However,
the code does not assume this to always be the case because this
behaviour is not guaranteed in the glib docs. However, we embed
this knowledge in the code as a branch hint for the compiler.

Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.

Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown when booting debian-arm:

 Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel img/arm/aarch32-current-linux-kernel-only.img \
	-append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

- Before:

       8048.598422      task-clock (msec)         #    0.931 CPUs utilized            ( +-  0.28% )
            16,974      context-switches          #    0.002 M/sec                    ( +-  0.12% )
                 0      cpu-migrations            #    0.000 K/sec
            10,125      page-faults               #    0.001 M/sec                    ( +-  1.23% )
    35,144,901,879      cycles                    #    4.367 GHz                      ( +-  0.14% )
                        stalled-cycles-frontend
                        stalled-cycles-backend
    65,758,252,643      instructions              #    1.87  insns per cycle          ( +-  0.33% )
    10,871,298,668      branches                  # 1350.707 M/sec                    ( +-  0.41% )
       192,322,212      branch-misses             #    1.77% of all branches          ( +-  0.32% )

       8.640869419 seconds time elapsed                                          ( +-  0.57% )

- After:

       8146.242027      task-clock (msec)         #    0.923 CPUs utilized            ( +-  1.23% )
            17,016      context-switches          #    0.002 M/sec                    ( +-  0.40% )
                 0      cpu-migrations            #    0.000 K/sec
            18,769      page-faults               #    0.002 M/sec                    ( +-  0.45% )
    35,660,956,120      cycles                    #    4.378 GHz                      ( +-  1.22% )
                        stalled-cycles-frontend
                        stalled-cycles-backend
    65,095,366,607      instructions              #    1.83  insns per cycle          ( +-  1.73% )
    10,803,480,261      branches                  # 1326.192 M/sec                    ( +-  1.95% )
       195,601,289      branch-misses             #    1.81% of all branches          ( +-  0.39% )

       8.828660235 seconds time elapsed                                          ( +-  0.38% )

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/exec-all.h   | 5 ++
 include/exec/tb-context.h | 4 +-
 accel/tcg/translate-all.c | 217 --
 3 files changed, 118 insertions(+), 108 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bc4f41c..eb3eb7b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -343,10 +343,15 @@ static inline void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)

 /*
  * Translation Cache-related fields of a TB.
+ * This struct exists just for convenience; we keep track of TB's in a binary
+ * search tree, and the only fields needed to compare TB's in the tree are
+ * @ptr and @size. @search is brought here for consistency, since it is also
+ * a TC-related field.
  */
 struct tb_tc {
     void *ptr;    /* pointer to the translated code */
     uint8_t *search;  /* pointer to search data */
+    size_t size;
 };

 struct TranslationBlock {
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 25c2afe..1fa8dcc 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
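
The comparison scheme described in the commit message (stored entries carry a
non-zero size and are ordered by start address, while a lookup key has size 0
and matches any entry whose range contains it) can be tried in isolation.
The sketch below swaps the technique's container: it uses a sorted array with
qsort() and bsearch() from the C library instead of a glib tree, and
tc_range, tc_range_cmp() and tc_lookup() are hypothetical names for this
example, not the functions in translate-all.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors only the two fields of struct tb_tc that drive the ordering. */
struct tc_range {
    uintptr_t ptr;   /* start of the translated code */
    size_t size;     /* length of the code; 0 marks a lookup key */
};

/* Compare an address against the half-open range [ptr, ptr + size). */
static int addr_range_cmp(uintptr_t addr, const struct tc_range *r)
{
    if (addr < r->ptr) {
        return -1;
    }
    if (addr >= r->ptr + r->size) {
        return 1;
    }
    return 0;
}

static int tc_range_cmp(const void *ap, const void *bp)
{
    const struct tc_range *a = ap;
    const struct tc_range *b = bp;

    /* Common case: both are stored entries, so order by start address. */
    if (a->size && b->size) {
        if (a->ptr == b->ptr) {
            assert(a->size == b->size);   /* same entry */
            return 0;
        }
        return a->ptr > b->ptr ? 1 : -1;
    }
    /* Lookup: whichever side has size 0 is the key. */
    if (a->size == 0) {
        return addr_range_cmp(a->ptr, b);
    }
    return -addr_range_cmp(b->ptr, a);
}

/* Find the entry whose code range contains pc, or NULL if there is none. */
static const struct tc_range *tc_lookup(const struct tc_range *tbl, size_t n,
                                        uintptr_t pc)
{
    struct tc_range key = { .ptr = pc, .size = 0 };
    return bsearch(&key, tbl, n, sizeof(*tbl), tc_range_cmp);
}
```

The same comparator serves both sorting and searching, which is the point of
the size == 0 convention; the table must be sorted with it before tc_lookup()
is called.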
[Qemu-devel] [PATCH v3 43/43] tcg: enable multiple TCG contexts in softmmu
This enables parallel TCG code generation. However, we do not take advantage of it yet since tb_lock is still held during tb_gen_code. In user-mode we use a single TCG context; see the documentation added to tcg_region_init for the rationale. Note that targets do not need any conversion: targets initialize a TCGContext (e.g. defining TCG globals), and after this initialization has finished, the context is cloned by the vCPU threads, each of them keeping a separate copy. TCG threads claim one entry in tcg_ctxs[] by atomically increasing n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's of that variable; they are there just to play nice with analysis tools such as thread sanitizer. Note that we do not allocate an array of contexts (we allocate an array of pointers instead) because when tcg_context_init is called, we do not know yet how many contexts we'll use since the bool behind qemu_tcg_mttcg_enabled() isn't set yet. Previous patches folded some TCG globals into TCGContext. The non-const globals remaining are only set at init time, i.e. before the TCG threads are spawned. 
Here is a list of these set-at-init-time globals under tcg/:

Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs
Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt, have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions,
        use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
         use_vis3_instructions
Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr

Signed-off-by: Emilio G. Cota
---
 tcg/tcg.h                 |   7 ++-
 accel/tcg/translate-all.c |   2 +-
 cpus.c                    |   2 +
 linux-user/syscall.c      |   1 +
 tcg/tcg.c                 | 141 --
 5 files changed, 143 insertions(+), 10 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3365da8..68cd14e 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -733,7 +733,7 @@ struct TCGContext {
 };

 extern TCGContext tcg_init_ctx;
-extern TCGContext *tcg_ctx;
+extern __thread TCGContext *tcg_ctx;

 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
@@ -755,7 +755,7 @@ static inline bool tcg_op_buf_full(void)

 /* pool based memory allocation */

-/* tb_lock must be held for tcg_malloc_internal. */
+/* user-mode: tb_lock must be held for tcg_malloc_internal. */
 void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);
@@ -766,7 +766,7 @@ void tcg_region_reset_all(void);
 size_t tcg_code_size(void);
 size_t tcg_code_capacity(void);

-/* Called with tb_lock held. */
+/* user-mode: Called with tb_lock held. */
 static inline void *tcg_malloc(int size)
 {
     TCGContext *s = tcg_ctx;
@@ -783,6 +783,7 @@ static inline void *tcg_malloc(int size)
 }

 void tcg_context_init(TCGContext *s);
+void tcg_register_thread(void);
 void tcg_prologue_init(TCGContext *s);
 void tcg_func_start(TCGContext *s);

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 623b9e7..2e810b9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -154,7 +154,7 @@ static void *l1_map[V_L1_MAX_SIZE];

 /* code generation context */
 TCGContext tcg_init_ctx;
-TCGContext *tcg_ctx;
+__thread TCGContext *tcg_ctx;
 TBContext tb_ctx;
 bool parallel_cpus;

diff --git a/cpus.c b/cpus.c
index 6022d40..74ddd49 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1307,6 +1307,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
     CPUState *cpu = arg;

     rcu_register_thread();
+    tcg_register_thread();

     qemu_mutex_lock_iothread();
     qemu_thread_get_self(cpu->thread);
@@ -1454,6 +1455,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     g_assert(!use_icount);

     rcu_register_thread();
+    tcg_register_thread();

     qemu_mutex_lock_iothread();
     qemu_thread_get_self(cpu->thread);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 003943b..bbf7913 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6214,6 +6214,7 @@ static void *clone_func(void *arg)
     TaskState *ts;

     rcu_register_thread();
+    tcg_register_thread();
     env = info->env;
     cpu = ENV_GET_CPU(env);
     thread_cpu = cpu;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 22a949f..a5c01be 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -58,6 +58,7 @@
 #include "elf.h"
 #include "exec/log.h"
+#include "sysemu/sysemu.h"

 /*
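The slot-claiming scheme from the commit message (each thread clones the initialized context and claims one entry in an array of pointers by atomically bumping a counter) can be sketched roughly as below. This is a standalone illustration with C11 atomics, not the code added to tcg/tcg.c; `Ctx`, `MAX_CTXS`, `init_ctx`, `ctxs`, `n_ctxs`, `cur_ctx`, `register_thread` and `count_registered` are all hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_CTXS 8                      /* hypothetical cap on threads */

typedef struct Ctx {
    unsigned int slot;                  /* index claimed in ctxs[] */
    int nb_globals;                     /* stand-in for state set up at init */
} Ctx;

/* Template context, fully initialized before any thread registers. */
static Ctx init_ctx = { .slot = 0, .nb_globals = 3 };

/* Array of pointers: the number of threads need not be known up front. */
static _Atomic(Ctx *) ctxs[MAX_CTXS];
static atomic_uint n_ctxs;
static _Thread_local Ctx *cur_ctx;      /* per-thread current context */

/* Called once per thread: clone the template, then claim a slot. */
static Ctx *register_thread(void)
{
    Ctx *c = malloc(sizeof(*c));
    unsigned int n;

    assert(c != NULL);
    *c = init_ctx;                      /* each thread keeps its own copy */
    n = atomic_fetch_add(&n_ctxs, 1);   /* atomically claim index n */
    assert(n < MAX_CTXS);
    c->slot = n;
    atomic_store(&ctxs[n], c);          /* publish for cross-context readers */
    cur_ctx = c;
    return c;
}

/* Readers use atomic loads so that race detectors stay quiet. */
static unsigned int count_registered(void)
{
    unsigned int i, seen = 0;
    unsigned int n = atomic_load(&n_ctxs);

    for (i = 0; i < n; i++) {
        if (atomic_load(&ctxs[i]) != NULL) {
            seen++;
        }
    }
    return seen;
}
```

A slot may be counted in `n_ctxs` slightly before its pointer is published, which is why the reader in this sketch skips NULL entries.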
[Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps
Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of having a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 2% in a translation-heavy workload such
as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
	-machine type=virt -nographic -smp 1 -m 4096 \
	-netdev user,id=unet,hostfwd=tcp::-:22 \
	-device virtio-net-device,netdev=unet \
	-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
	-device virtio-blk-device,drive=myblock \
	-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
	-name arm,debug-threads=on -smp 1' (10 runs):

Before:
      19489.126318 task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
            23,697 context-switches          #    0.001 M/sec                    ( +-  0.51% )
                 1 CPU-migrations            #    0.000 M/sec
            19,953 page-faults               #    0.001 M/sec                    ( +-  0.40% )
    56,214,402,410 cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
    25,516,669,513 stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
    17,266,165,747 stalled-cycles-backend    #   30.71% backend cycles idle      ( +-  0.59% ) [66.66%]
    79,007,843,327 instructions              #    1.41  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
    13,136,600,416 branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
       274,715,270 branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]

      20.300335944 seconds time elapsed                                          ( +-  0.55% )

After:
      19917.737030 task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
            23,973 context-switches          #    0.001 M/sec                    ( +-  0.37% )
                 1 CPU-migrations            #    0.000 M/sec
            19,824 page-faults               #    0.001 M/sec                    ( +-  0.38% )
    57,380,269,537 cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
    26,462,452,508 stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
    17,970,546,047 stalled-cycles-backend    #   31.32% backend cycles idle      ( +-  0.64% ) [66.67%]
    79,527,238,334 instructions              #    1.39  insns per cycle
                                             #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
    13,272,362,192 branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
       278,357,773 branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]

      20.850558455 seconds time elapsed                                          ( +-  0.55% )

That is, 2.70% slowdown.

The perf difference shrinks a bit when using a high-performance allocator
such as tcmalloc:

Before:
      19372.008814 task-clock                #    0.957 CPUs utilized            ( +-  1.00% )
            23,621 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            13,289 page-faults               #    0.001 M/sec                    ( +-  1.46% )
    55,824,272,818 cycles                    #    2.882 GHz                      ( +-  1.00% ) [83.33%]
    25,284,946,453 stalled-cycles-frontend   #   45.29% frontend cycles idle     ( +-  1.12% ) [83.32%]
    17,100,517,753 stalled-cycles-backend    #   30.63% backend cycles idle      ( +-  0.86% ) [66.69%]
    78,193,046,990 instructions              #    1.40  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.14% ) [83.35%]
    12,986,014,194 branches                  #  670.349 M/sec                    ( +-  1.22% ) [83.34%]
       272,581,789 branch-misses             #    2.10% of all branches          ( +-  0.62% ) [83.33%]

      20.249726404 seconds time elapsed                                          ( +-  0.61% )

After:
      19809.295886 task-clock                #    0.962 CPUs utilized            ( +-  0.99% )
            23,894 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            12,927 page-faults               #    0.001 M/sec                    ( +-  0.78% )
    57,131,686,004 cycles                    #    2.884 GHz                      ( +-  0.97% ) [83.34%]
    25,965,120,001 stalled-cycles-frontend   #   45.45% frontend cycles idle     ( +-  0.71% ) [83.35%]
    17,534,942,176 stalled-cycles-backend    #   30.69% backend
[Qemu-devel] [PATCH v3 30/43] tci: move tci_regs to tcg_qemu_tb_exec's stack
Groundwork for supporting multiple TCG contexts.

Compile-tested for all targets on an x86_64 host.

Suggested-by: Richard Henderson
Acked-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 tcg/tci.c | 552 +++---
 1 file changed, 279 insertions(+), 273 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 4bdc645..f3216c1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -55,93 +55,95 @@ typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
                                     tcg_target_ulong);
 #endif

-static tcg_target_ulong tci_reg[TCG_TARGET_NB_REGS];
-
-static tcg_target_ulong tci_read_reg(TCGReg index)
+static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
 {
-    tci_assert(index < ARRAY_SIZE(tci_reg));
-    return tci_reg[index];
+    tci_assert(index < TCG_TARGET_NB_REGS);
+    return regs[index];
 }

 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-static int8_t tci_read_reg8s(TCGReg index)
+static int8_t tci_read_reg8s(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (int8_t)tci_read_reg(index);
+    return (int8_t)tci_read_reg(regs, index);
 }
 #endif

 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
-static int16_t tci_read_reg16s(TCGReg index)
+static int16_t tci_read_reg16s(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (int16_t)tci_read_reg(index);
+    return (int16_t)tci_read_reg(regs, index);
 }
 #endif

 #if TCG_TARGET_REG_BITS == 64
-static int32_t tci_read_reg32s(TCGReg index)
+static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (int32_t)tci_read_reg(index);
+    return (int32_t)tci_read_reg(regs, index);
 }
 #endif

-static uint8_t tci_read_reg8(TCGReg index)
+static uint8_t tci_read_reg8(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (uint8_t)tci_read_reg(index);
+    return (uint8_t)tci_read_reg(regs, index);
 }

-static uint16_t tci_read_reg16(TCGReg index)
+static uint16_t tci_read_reg16(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (uint16_t)tci_read_reg(index);
+    return (uint16_t)tci_read_reg(regs, index);
 }

-static uint32_t tci_read_reg32(TCGReg index)
+static uint32_t tci_read_reg32(const tcg_target_ulong *regs, TCGReg index)
 {
-    return (uint32_t)tci_read_reg(index);
+    return (uint32_t)tci_read_reg(regs, index);
 }

 #if TCG_TARGET_REG_BITS == 64
-static uint64_t tci_read_reg64(TCGReg index)
+static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
 {
-    return tci_read_reg(index);
+    return tci_read_reg(regs, index);
 }
 #endif

-static void tci_write_reg(TCGReg index, tcg_target_ulong value)
+static void
+tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
 {
-    tci_assert(index < ARRAY_SIZE(tci_reg));
+    tci_assert(index < TCG_TARGET_NB_REGS);
     tci_assert(index != TCG_AREG0);
     tci_assert(index != TCG_REG_CALL_STACK);
-    tci_reg[index] = value;
+    regs[index] = value;
 }

 #if TCG_TARGET_REG_BITS == 64
-static void tci_write_reg32s(TCGReg index, int32_t value)
+static void
+tci_write_reg32s(tcg_target_ulong *regs, TCGReg index, int32_t value)
 {
-    tci_write_reg(index, value);
+    tci_write_reg(regs, index, value);
 }
 #endif

-static void tci_write_reg8(TCGReg index, uint8_t value)
+static void tci_write_reg8(tcg_target_ulong *regs, TCGReg index, uint8_t value)
 {
-    tci_write_reg(index, value);
+    tci_write_reg(regs, index, value);
 }

-static void tci_write_reg32(TCGReg index, uint32_t value)
+static void
+tci_write_reg32(tcg_target_ulong *regs, TCGReg index, uint32_t value)
 {
-    tci_write_reg(index, value);
+    tci_write_reg(regs, index, value);
 }

 #if TCG_TARGET_REG_BITS == 32
-static void tci_write_reg64(uint32_t high_index, uint32_t low_index,
-                            uint64_t value)
+static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
+                            uint32_t low_index, uint64_t value)
 {
-    tci_write_reg(low_index, value);
-    tci_write_reg(high_index, value >> 32);
+    tci_write_reg(regs, low_index, value);
+    tci_write_reg(regs, high_index, value >> 32);
 }
 #elif TCG_TARGET_REG_BITS == 64
-static void tci_write_reg64(TCGReg index, uint64_t value)
+static void
+tci_write_reg64(tcg_target_ulong *regs, TCGReg index, uint64_t value)
 {
-    tci_write_reg(index, value);
+    tci_write_reg(regs, index, value);
 }
 #endif

@@ -188,94 +190,97 @@ static uint64_t tci_read_i64(uint8_t **tb_ptr)
 #endif

 /* Read indexed register (native size) from bytecode. */
-static tcg_target_ulong tci_read_r(uint8_t **tb_ptr)
+static tcg_target_ulong
+tci_read_r(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-    tcg_target_ulong value = tci_read_reg(**tb_ptr);
+    tcg_target_ulong value = tci_read_reg(regs, **tb_ptr);
     *tb_ptr += 1;
     return value;
 }

 /* Read
[Qemu-devel] [PATCH v3 41/43] tcg: define TCG_HIGHWATER
Will come in handy very soon.

Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
 tcg/tcg.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0ddd0dc..cb4ecbd 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,6 +115,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 static void tcg_out_tb_init(TCGContext *s);
 static bool tcg_out_tb_finalize(TCGContext *s);

+#define TCG_HIGHWATER 1024
+
 static TCGContext **tcg_ctxs;
 static unsigned int n_tcg_ctxs;

@@ -435,7 +437,7 @@ void tcg_prologue_init(TCGContext *s)
     /* Compute a high-water mark, at which we voluntarily flush the buffer
        and start over.  The size here is arbitrary, significantly larger
        than we expect the code generation for any one opcode to require.  */
-    s->code_gen_highwater = s->code_gen_buffer + (total_size - 1024);
+    s->code_gen_highwater = s->code_gen_buffer + (total_size - TCG_HIGHWATER);

     tcg_register_jit(s->code_gen_buffer, total_size);

--
2.7.4
[Qemu-devel] [PATCH v3 17/43] target/s390x: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

Signed-off-by: Emilio G. Cota
---
 target/s390x/helper.h     |  4 +++
 target/s390x/mem_helper.c | 80 +--
 target/s390x/translate.c  | 26 ---
 3 files changed, 88 insertions(+), 22 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 4b02907..84a4597 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -34,7 +34,9 @@ DEF_HELPER_3(celgb, i64, env, i64, i32)
 DEF_HELPER_3(cdlgb, i64, env, i64, i32)
 DEF_HELPER_3(cxlgb, i64, env, i64, i32)
 DEF_HELPER_4(cdsg, void, env, i64, i32, i32)
+DEF_HELPER_4(cdsg_parallel, void, env, i64, i32, i32)
 DEF_HELPER_4(csst, i32, env, i32, i64, i64)
+DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
@@ -107,7 +109,9 @@ DEF_HELPER_FLAGS_1(popcnt, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(stfl, TCG_CALL_NO_RWG, void, env)
 DEF_HELPER_2(stfle, i32, env, i64)
 DEF_HELPER_FLAGS_2(lpq, TCG_CALL_NO_WG, i64, env, i64)
+DEF_HELPER_FLAGS_2(lpq_parallel, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_4(stpq, TCG_CALL_NO_WG, void, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(stpq_parallel, TCG_CALL_NO_WG, void, env, i64, i64, i64)
 DEF_HELPER_4(mvcos, i32, env, i64, i64, i64)
 DEF_HELPER_4(cu12, i32, env, i32, i32, i32)
 DEF_HELPER_4(cu14, i32, env, i32, i32, i32)
diff --git a/target/s390x/mem_helper.c b/target/s390x/mem_helper.c
index cdc78aa..74a2157 100644
--- a/target/s390x/mem_helper.c
+++ b/target/s390x/mem_helper.c
@@ -1363,8 +1363,8 @@ uint32_t HELPER(trXX)(CPUS390XState *env, uint32_t r1, uint32_t r2,
     return cc;
 }

-void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
-                  uint32_t r1, uint32_t r3)
+static void do_cdsg(CPUS390XState *env, uint64_t addr,
+                    uint32_t r1, uint32_t r3, bool parallel)
 {
     uintptr_t ra = GETPC();
     Int128 cmpv = int128_make128(env->regs[r1 + 1], env->regs[r1]);
@@ -1372,7 +1372,7 @@ void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
     Int128 oldv;
     bool fail;

-    if (parallel_cpus) {
+    if (parallel) {
 #ifndef CONFIG_ATOMIC128
         cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -1404,7 +1404,20 @@ void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
     env->regs[r1 + 1] = int128_getlo(oldv);
 }

-uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
+void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
+                  uint32_t r1, uint32_t r3)
+{
+    do_cdsg(env, addr, r1, r3, false);
+}
+
+void HELPER(cdsg_parallel)(CPUS390XState *env, uint64_t addr,
+                           uint32_t r1, uint32_t r3)
+{
+    do_cdsg(env, addr, r1, r3, true);
+}
+
+static uint32_t do_csst(CPUS390XState *env, uint32_t r3, uint64_t a1,
+                        uint64_t a2, bool parallel)
 {
 #if !defined(CONFIG_USER_ONLY) || defined(CONFIG_ATOMIC128)
     uint32_t mem_idx = cpu_mmu_index(env, false);
@@ -1440,7 +1453,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
        the complete operation is not.  Therefore we do not need to assert serial
        context in order to implement this.  That said, restart early if we can't
        support either operation that is supposed to be atomic.  */
-    if (parallel_cpus) {
+    if (parallel) {
         int mask = 0;
 #if !defined(CONFIG_ATOMIC64)
         mask = -8;
@@ -1464,7 +1477,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
         uint32_t cv = env->regs[r3];
         uint32_t ov;

-        if (parallel_cpus) {
+        if (parallel) {
 #ifdef CONFIG_USER_ONLY
             uint32_t *haddr = g2h(a1);
             ov = atomic_cmpxchg__nocheck(haddr, cv, nv);
@@ -1487,7 +1500,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
         uint64_t cv = env->regs[r3];
         uint64_t ov;

-        if (parallel_cpus) {
+        if (parallel) {
 #ifdef CONFIG_ATOMIC64
 # ifdef CONFIG_USER_ONLY
             uint64_t *haddr = g2h(a1);
@@ -1497,7 +1510,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
             ov = helper_atomic_cmpxchgq_be_mmu(env, a1, cv, nv, oi, ra);
 # endif
 #else
-            /* Note that we asserted !parallel_cpus above.  */
+            /* Note that we asserted !parallel above.  */
             g_assert_not_reached();
 #endif
         } else {
@@ -1517,13 +1530,13 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
         Int128 cv = int128_make128(env->regs[r3 + 1], env->regs[r3]);
         Int128 ov;

-        if (parallel_cpus) {
+
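The shape of this change (one shared implementation taking a `bool parallel`, two thin helper entry points, and the choice between them made from a per-TB flag rather than from global state at run time) can be sketched in isolation as below. This is a simplified illustration, not the s390x code: `do_cas32`, `helper_cas32`, `helper_cas32_parallel`, `cas_fn` and `select_cas_helper` are hypothetical, and the assumption that the translator picks the helper by testing CF_PARALLEL is inferred from the patch subject, since the translate.c hunk is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CF_PARALLEL 0x10u   /* flag value as defined earlier in the series */

/* Shared implementation: returns the value found at addr before the op. */
static uint32_t do_cas32(_Atomic uint32_t *addr, uint32_t cmpv,
                         uint32_t newv, bool parallel)
{
    uint32_t old;

    if (parallel) {
        /* Other threads may race with us: use a real atomic cmpxchg. */
        old = cmpv;
        atomic_compare_exchange_strong(addr, &old, newv);
    } else {
        /* Serial context: a separate load and store is sufficient. */
        old = atomic_load_explicit(addr, memory_order_relaxed);
        if (old == cmpv) {
            atomic_store_explicit(addr, newv, memory_order_relaxed);
        }
    }
    return old;
}

static uint32_t helper_cas32(_Atomic uint32_t *addr, uint32_t cmpv,
                             uint32_t newv)
{
    return do_cas32(addr, cmpv, newv, false);
}

static uint32_t helper_cas32_parallel(_Atomic uint32_t *addr, uint32_t cmpv,
                                      uint32_t newv)
{
    return do_cas32(addr, cmpv, newv, true);
}

typedef uint32_t (*cas_fn)(_Atomic uint32_t *, uint32_t, uint32_t);

/* Decided once per translated block, from that block's own cflags. */
static cas_fn select_cas_helper(uint32_t tb_cflags)
{
    return (tb_cflags & CF_PARALLEL) ? helper_cas32_parallel : helper_cas32;
}
```

Because the decision is baked into the block at translation time, code generated for a serial context keeps behaving serially even if the global mode changes afterwards, which is the decoupling the commit message refers to.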
[Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer
This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically
among the TCG threads; this however results in poor utilization
if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want
to use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use
a counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.

The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
Note that I'm evaluating this after enabling per-thread TCG (which
is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
    qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
    qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
    qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
    qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
    qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
    qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
    qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
    qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367

That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:
    qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
    qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
    qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
    qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
    qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
    qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
    qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
    qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368

Again, 8 flushes. Note how buffer utilization is not 100%, but it
is close. Smaller region sizes would yield higher utilization,
but we want region allocation to be rare (it acquires a lock), so
we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
    qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
    qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
    qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
    qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
    qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
    qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
    qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
    qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
    qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
    qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
    qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
    qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
    qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
    qemu: flush code_size=3984 nb_tbs=71433 avg_tb_size=372
    qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
    qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
    qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
    qemu: flush code_size=3360 nb_tbs=59514 avg_tb_size=378
    qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
    qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364

That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.

Signed-off-by: Emilio G. Cota
---
 tcg/tcg.h                 |   6 ++
 accel/tcg/translate-all.c |  63 +---
 bsd-user/main.c           |   1 +
 cpus.c                    |  12 +++
 linux-user/main.c         |   1 +
 tcg/tcg.c                 | 183 +-
 6 files changed, 221 insertions(+), 45 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3611141..3365da8 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -760,6 +760,12 @@ void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);

+void tcg_region_init(void);
+void tcg_region_reset_all(void);
+
+size_t tcg_code_size(void);
+size_t tcg_code_capacity(void);
+
 /* Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e930bac..623b9e7 100644
--- a/accel/tcg/translate-all.c
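As a rough illustration of the region idea described in the commit message (a shared buffer carved into equal regions that are handed out on demand under one lock, which also guards the statistics), here is a minimal standalone sketch. It is not the tcg/tcg.c implementation: `RegionPool`, `region_pool_init`, `region_alloc`, `region_pool_claimed` and `region_pool_reset` are hypothetical names, and the statistic shown (bytes in claimed regions) is an assumption that is likely coarser than what tcg_code_size() reports.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct RegionPool {
    pthread_mutex_t lock;   /* serializes allocation and stats */
    uint8_t *start;         /* start of the shared code buffer */
    size_t region_size;     /* bytes per region */
    size_t n_regions;       /* total number of regions */
    size_t next;            /* index of the next unclaimed region */
} RegionPool;

static void region_pool_init(RegionPool *p, uint8_t *buf, size_t size,
                             size_t n_regions)
{
    assert(n_regions > 0);
    pthread_mutex_init(&p->lock, NULL);
    p->start = buf;
    p->region_size = size / n_regions;
    p->n_regions = n_regions;
    p->next = 0;
}

/*
 * Claim the next free region for the calling thread.  Returns false when
 * every region is taken; a caller would then flush and reset the pool.
 */
static bool region_alloc(RegionPool *p, uint8_t **begin, uint8_t **end)
{
    bool ok = false;

    pthread_mutex_lock(&p->lock);
    if (p->next < p->n_regions) {
        *begin = p->start + p->next * p->region_size;
        *end = *begin + p->region_size;
        p->next++;
        ok = true;
    }
    pthread_mutex_unlock(&p->lock);
    return ok;
}

/* Bytes in regions handed out so far, read consistently under the lock. */
static size_t region_pool_claimed(RegionPool *p)
{
    size_t bytes;

    pthread_mutex_lock(&p->lock);
    bytes = p->next * p->region_size;
    pthread_mutex_unlock(&p->lock);
    return bytes;
}

/* After a full flush every region becomes available again. */
static void region_pool_reset(RegionPool *p)
{
    pthread_mutex_lock(&p->lock);
    p->next = 0;
    pthread_mutex_unlock(&p->lock);
}
```

A thread that translates a lot simply comes back for more regions, while an idle thread holds at most one; that is what lets utilization stay close to the single-region case in the numbers above.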
[Qemu-devel] [PATCH v3 37/43] tcg: distribute profiling counters across TCGContext's
This is groundwork for supporting multiple TCG contexts.

To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which
   TCGContext also includes. Note that tcg_table_op_count is brought
   into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

This change also requires updating the accessors to TCGProfile fields to
use atomic_read/set whenever there may be conflicting accesses (as defined
in C11) to them.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 tcg/tcg.h                 |  38 +---
 accel/tcg/translate-all.c |  23 +-
 tcg/tcg.c                 | 110 ++
 3 files changed, 126 insertions(+), 45 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index f83f9b0..3611141 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -641,6 +641,26 @@ QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 14));
 /* Make sure that we don't overflow 64 bits without noticing.  */
 QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);

+typedef struct TCGProfile {
+    int64_t tb_count1;
+    int64_t tb_count;
+    int64_t op_count; /* total insn count */
+    int op_count_max; /* max insn per TB */
+    int64_t temp_count;
+    int temp_count_max;
+    int64_t del_op_count;
+    int64_t code_in_len;
+    int64_t code_out_len;
+    int64_t search_out_len;
+    int64_t interm_time;
+    int64_t code_time;
+    int64_t la_time;
+    int64_t opt_time;
+    int64_t restore_count;
+    int64_t restore_time;
+    int64_t table_op_count[NB_OPS];
+} TCGProfile;
+
 struct TCGContext {
     uint8_t *pool_cur, *pool_end;
     TCGPool *pool_first, *pool_current, *pool_first_large;
@@ -665,23 +685,7 @@ struct TCGContext {
     tcg_insn_unit *code_ptr;

 #ifdef CONFIG_PROFILER
-    /* profiling info */
-    int64_t tb_count1;
-    int64_t tb_count;
-    int64_t op_count; /* total insn count */
-    int op_count_max; /* max insn per TB */
-    int64_t temp_count;
-    int temp_count_max;
-    int64_t del_op_count;
-    int64_t code_in_len;
-    int64_t code_out_len;
-    int64_t search_out_len;
-    int64_t interm_time;
-    int64_t code_time;
-    int64_t la_time;
-    int64_t opt_time;
-    int64_t restore_count;
-    int64_t restore_time;
+    TCGProfile prof;
 #endif

 #ifdef CONFIG_DEBUG_TCG
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e6ee4e3..36b17ac 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -312,6 +312,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     uint8_t *p = tb->tc.search;
     int i, j, num_insns = tb->icount;
 #ifdef CONFIG_PROFILER
+    TCGProfile *prof = &tcg_ctx->prof;
     int64_t ti = profile_getclock();
 #endif

@@ -346,8 +347,9 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     restore_state_to_opc(env, tb, data);

 #ifdef CONFIG_PROFILER
-    tcg_ctx->restore_time += profile_getclock() - ti;
-    tcg_ctx->restore_count++;
+    atomic_set(&prof->restore_time,
+                prof->restore_time + profile_getclock() - ti);
+    atomic_set(&prof->restore_count, prof->restore_count + 1);
 #endif
     return 0;
 }
@@ -1302,6 +1304,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_insn_unit *gen_code_buf;
     int gen_code_size, search_size;
 #ifdef CONFIG_PROFILER
+    TCGProfile *prof = &tcg_ctx->prof;
     int64_t ti;
 #endif
     assert_memory_lock();
@@ -1332,8 +1335,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tcg_ctx->cf_parallel = !!(cflags & CF_PARALLEL);

 #ifdef CONFIG_PROFILER
-    tcg_ctx->tb_count1++; /* includes aborted translations because of
-                       exceptions */
+    /* includes aborted translations because of exceptions */
+    atomic_set(&prof->tb_count1, prof->tb_count1 + 1);
     ti = profile_getclock();
 #endif

@@ -1358,8 +1361,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 #endif

 #ifdef CONFIG_PROFILER
-    tcg_ctx->tb_count++;
-    tcg_ctx->interm_time += profile_getclock() - ti;
+    atomic_set(&prof->tb_count, prof->tb_count + 1);
+    atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
     ti = profile_getclock();
 #endif

@@ -1379,10 +1382,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->tc.size = gen_code_size;

 #ifdef CONFIG_PROFILER
-    tcg_ctx->code_time += profile_getclock() - ti;
-    tcg_ctx->code_in_len += tb->size;
-    tcg_ctx->code_out_len += gen_code_size;
-    tcg_ctx->search_out_len += search_size;
+    atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
+    atomic_set(&prof->code_in_len, prof->code_in_len + tb->size);
+    atomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size);
+    atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
 #endif

 #ifdef DEBUG_DISAS
diff
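The two changes listed in the commit message (per-context counters written only by their owning thread, and totals obtained by iterating over all contexts) can be sketched as follows. This uses C11 atomics in place of QEMU's atomic_read/atomic_set macros; `Profile`, `Ctx`, `ctx_init`, `prof_add` and `total_tb_count` are hypothetical names, and only two of the counters are modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct Profile {
    _Atomic int64_t tb_count;
    _Atomic int64_t code_out_len;
} Profile;

typedef struct Ctx {
    Profile prof;           /* one set of counters per context */
} Ctx;

static void ctx_init(Ctx *c)
{
    atomic_init(&c->prof.tb_count, 0);
    atomic_init(&c->prof.code_out_len, 0);
}

/*
 * Update by the owning thread only.  Since there is a single writer per
 * context, a plain load followed by an atomic store is enough; no
 * read-modify-write instruction is needed.  The store is atomic so that a
 * concurrent reader never observes a torn value.
 */
static void prof_add(_Atomic int64_t *field, int64_t delta)
{
    int64_t v = atomic_load_explicit(field, memory_order_relaxed);

    atomic_store_explicit(field, v + delta, memory_order_relaxed);
}

/* Reporting side: sum one counter across every context. */
static int64_t total_tb_count(Ctx *ctxs, unsigned int n)
{
    int64_t total = 0;
    unsigned int i;

    for (i = 0; i < n; i++) {
        total += atomic_load_explicit(&ctxs[i].prof.tb_count,
                                      memory_order_relaxed);
    }
    return total;
}
```

The total is a snapshot rather than an exact instantaneous value, which is usually acceptable for profiling output and avoids any shared cache line being written by more than one thread.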
[Qemu-devel] [PATCH v3 12/43] tcg: convert tb->cflags reads to tb_cflags(tb)
Convert all existing readers of tb->cflags to tb_cflags, so that we
use atomic_read and therefore avoid undefined behaviour in C11.

Note that the remaining setters/getters of the field are protected
by tb_lock, and therefore do not need conversion.

Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
bar->cflags in the code base), which makes the conversion easily
scriptable:

FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h | \
	cut -f 1 -d':' | sort | uniq)
perl -pi -e 's/([^>])tb->cflags/$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z]*)->tb->cflags/tb_cflags($1->tb)/g' $FILES

Then manually fixed the few errors that checkpatch reported.

Compile-tested for all targets.

Suggested-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/gen-icount.h     |  8 +++
 target/alpha/translate.c      | 12 +-
 target/arm/translate-a64.c    | 13 +-
 target/arm/translate.c        | 10
 target/cris/translate.c       |  6 ++---
 target/hppa/translate.c       |  8 +++
 target/i386/translate.c       | 55 ++-
 target/lm32/translate.c       | 14 +--
 target/m68k/translate.c       |  6 ++---
 target/microblaze/translate.c |  6 ++---
 target/mips/translate.c       | 26 ++--
 target/moxie/translate.c      |  2 +-
 target/nios2/translate.c      |  6 ++---
 target/openrisc/translate.c   |  6 ++---
 target/ppc/translate.c        |  6 ++---
 target/ppc/translate_init.c   | 32 -
 target/s390x/translate.c      |  8 +++
 target/sh4/translate.c        |  6 ++---
 target/sparc/translate.c      |  6 ++---
 target/tilegx/translate.c     |  2 +-
 target/tricore/translate.c    |  2 +-
 target/unicore32/translate.c  |  6 ++---
 target/xtensa/translate.c     | 28 +++---
 23 files changed, 138 insertions(+), 136 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 9b3cb14..48b566c 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -13,7 +13,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
     TCGv_i32 count, imm;

     exitreq_label = gen_new_label();
-    if (tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(tb) & CF_USE_ICOUNT) {
         count = tcg_temp_local_new_i32();
     } else {
         count = tcg_temp_new_i32();
@@ -22,7 +22,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
     tcg_gen_ld_i32(count, tcg_ctx.tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, icount_decr.u32));

-    if (tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(tb) & CF_USE_ICOUNT) {
         imm = tcg_temp_new_i32();
         /* We emit a movi with a dummy immediate argument. Keep the insn index
          * of the movi so that we later (when we know the actual insn count)
@@ -36,7 +36,7 @@ static inline void gen_tb_start(TranslationBlock *tb)

     tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);

-    if (tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(tb) & CF_USE_ICOUNT) {
         tcg_gen_st16_i32(count, tcg_ctx.tcg_env,
                          -ENV_OFFSET + offsetof(CPUState, icount_decr.u16.low));
     }
@@ -46,7 +46,7 @@ static inline void gen_tb_start(TranslationBlock *tb)

 static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
 {
-    if (tb->cflags & CF_USE_ICOUNT) {
+    if (tb_cflags(tb) & CF_USE_ICOUNT) {
         /* Update the num_insn immediate parameter now that we know
          * the actual insn count.  */
         tcg_set_insn_param(icount_start_insn_idx, 1, num_insns);
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 9e98312..f97a8e5 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -484,9 +484,9 @@ static bool in_superpage(DisasContext *ctx, int64_t addr)

 static bool use_exit_tb(DisasContext *ctx)
 {
-    return ((ctx->tb->cflags & CF_LAST_IO)
+    return (tb_cflags(ctx->tb) & CF_LAST_IO)
             || ctx->singlestep_enabled
-            || singlestep);
+            || singlestep;
 }

 static bool use_goto_tb(DisasContext *ctx, uint64_t dest)
@@ -2430,7 +2430,7 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
         case 0xC000:
             /* RPCC */
             va = dest_gpr(ctx, ra);
-            if (ctx->tb->cflags & CF_USE_ICOUNT) {
+            if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
                 gen_io_start();
                 gen_helper_load_pcc(va, cpu_env);
                 gen_io_end();
@@ -2998,7 +2998,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
     TCGV_UNUSED_I64(ctx.lit);

     num_insns = 0;
-    max_insns = tb->cflags & CF_COUNT_MASK;
+    max_insns = tb_cflags(tb) & CF_COUNT_MASK;
     if (max_insns == 0) {
         max_insns = CF_COUNT_MASK;
     }
@@ -3028,7 +3028,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
[Qemu-devel] [PATCH v3 11/43] tcg: define CF_PARALLEL and use it for TB hashing
This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.

Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.

Signed-off-by: Emilio G. Cota
---
 include/exec/exec-all.h   | 20 +++-
 include/exec/tb-hash-xx.h |  9 ++---
 include/exec/tb-hash.h    |  4 ++--
 include/exec/tb-lookup.h  |  6 +++---
 tcg/tcg.h                 |  1 -
 accel/tcg/cpu-exec.c      | 45 +++--
 accel/tcg/translate-all.c | 13 +
 exec.c                    |  2 +-
 tcg/tcg-runtime.c         |  2 +-
 tests/qht-bench.c         |  2 +-
 10 files changed, 65 insertions(+), 39 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 256b9a6..0af0485 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -353,6 +353,9 @@ struct TranslationBlock {
 #define CF_USE_ICOUNT 0x2
 #define CF_IGNORE_ICOUNT 0x4 /* Do not generate icount code */
 #define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock */
+#define CF_PARALLEL 0x10 /* Generate code for a parallel context */
+/* cflags' mask for hashing/comparison */
+#define CF_HASH_MASK (CF_PARALLEL)

     /* Per-vCPU dynamic tracing state used to generate this TB */
     uint32_t trace_vcpu_dstate;
@@ -396,11 +399,26 @@ struct TranslationBlock {
     uintptr_t jmp_list_first;
 };

+extern bool parallel_cpus;
+
+/* Hide the atomic_read to make code a little easier on the eyes */
+static inline uint32_t tb_cflags(const TranslationBlock *tb)
+{
+    return atomic_read(&tb->cflags);
+}
+
+/* current cflags for hashing/comparison */
+static inline uint32_t curr_cflags(void)
+{
+    return parallel_cpus ? CF_PARALLEL : 0;
+}
+
 void tb_free(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
-                                   target_ulong cs_base, uint32_t flags);
+                                   target_ulong cs_base, uint32_t flags,
+                                   uint32_t cf_mask);

 #if defined(USE_DIRECT_JUMP)

diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h
index 6cd3022..747a9a6 100644
--- a/include/exec/tb-hash-xx.h
+++ b/include/exec/tb-hash-xx.h
@@ -48,8 +48,8 @@
  * xxhash32, customized for input variables that are not guaranteed to be
  * contiguous in memory.
  */
-static inline
-uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
+static inline uint32_t
+tb_hash_func7(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f, uint32_t g)
 {
     uint32_t v1 = TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2;
     uint32_t v2 = TB_HASH_XX_SEED + PRIME32_2;
@@ -78,7 +78,7 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
     v4 *= PRIME32_1;

     h32 = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);
-    h32 += 24;
+    h32 += 28;

     h32 += e * PRIME32_3;
     h32 = rol32(h32, 17) * PRIME32_4;
@@ -86,6 +86,9 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
     h32 += f * PRIME32_3;
     h32 = rol32(h32, 17) * PRIME32_4;

+    h32 += g * PRIME32_3;
+    h32 = rol32(h32, 17) * PRIME32_4;
+
     h32 ^= h32 >> 15;
     h32 *= PRIME32_2;
     h32 ^= h32 >> 13;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 17b5ee0..0526c4f 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -59,9 +59,9 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)

 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
-                      uint32_t trace_vcpu_dstate)
+                      uint32_t cf_mask, uint32_t trace_vcpu_dstate)
 {
-    return tb_hash_func6(phys_pc, pc, flags, trace_vcpu_dstate);
+    return tb_hash_func7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
 }

 #endif
diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
index 436b6d5..2961385 100644
--- a/include/exec/tb-lookup.h
+++ b/include/exec/tb-lookup.h
@@ -21,7 +21,7 @@
 /* Might cause an exception, so have a longjmp destination ready */
 static inline TranslationBlock *
 tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
-                     uint32_t *flags)
+                     uint32_t *flags, uint32_t cf_mask)
 {
     CPUArchState *env = (CPUArchState *)cpu->env_ptr;
     TranslationBlock *tb;
@@ -35,10 +35,10 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
                tb->cs_base == *cs_base &&
                tb->flags == *flags &&
                tb->trace_vcpu_dstate == *cpu->trace_dstate &&
-               !(atomic_read(&tb->cflags) & CF_INVALID))) {
+
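A minimal sketch of the idea behind this patch, under stated assumptions: the compile flags that affect the generated code become part of the lookup key, so a TB translated for a parallel context is never returned to a lookup made for a serial one. All demo_ names are hypothetical. The comparison shown is an assumption, since the tb_cmp hunk is not part of the text above; only the 0x10 flag value and the shape of curr_cflags() come from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CF_PARALLEL  0x10u               /* same value as CF_PARALLEL above */
#define DEMO_CF_HASH_MASK (DEMO_CF_PARALLEL)  /* flags that take part in lookups */

/* Hypothetical, heavily reduced stand-in for TranslationBlock. */
struct demo_tb {
    uint64_t pc;
    uint32_t flags;
    uint32_t cflags;
};

/* Mirrors curr_cflags(): derive the lookup mask from the current mode. */
uint32_t demo_curr_cflags(bool parallel)
{
    return parallel ? DEMO_CF_PARALLEL : 0;
}

/* Assumed comparison; cf_mask may only contain hashed flags. */
bool demo_tb_matches(const struct demo_tb *tb, uint64_t pc, uint32_t flags,
                     uint32_t cf_mask)
{
    assert((cf_mask & ~DEMO_CF_HASH_MASK) == 0);
    return tb->pc == pc &&
           tb->flags == flags &&
           (tb->cflags & DEMO_CF_HASH_MASK) == cf_mask;
}
```

Flags outside the mask (for example an icount bit) are ignored by the comparison, which is why the mask is kept separate from the full cflags word.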
[Qemu-devel] [PATCH v3 38/43] util: move qemu_real_host_page_size/mask to osdep.h
These only depend on the host and therefore belong in the common osdep,
not in a target-dependent object.

While at it, query the host during an init constructor, which guarantees
the page size will be well-defined throughout the execution of the program.

Suggested-by: Richard Henderson
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/cpu-all.h |  2 --
 include/qemu/osdep.h   |  6 ++
 exec.c                 |  4
 util/pagesize.c        | 18 ++
 util/Makefile.objs     |  1 +
 5 files changed, 25 insertions(+), 6 deletions(-)
 create mode 100644 util/pagesize.c

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ffe43d5..778031c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -229,8 +229,6 @@ extern int target_page_bits;
 /* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
  * when intptr_t is 32-bit and we are aligning a long long.
  */
-extern uintptr_t qemu_real_host_page_size;
-extern intptr_t qemu_real_host_page_mask;
 extern uintptr_t qemu_host_page_size;
 extern intptr_t qemu_host_page_mask;

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3b74f6f..0cba871 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -481,6 +481,12 @@ char *qemu_get_pid_name(pid_t pid);
  */
 pid_t qemu_fork(Error **errp);

+/* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
+ * when intptr_t is 32-bit and we are aligning a long long.
+ */
+extern uintptr_t qemu_real_host_page_size;
+extern intptr_t qemu_real_host_page_mask;
+
 extern int qemu_icache_linesize;
 extern int qemu_dcache_linesize;

diff --git a/exec.c b/exec.c
index 94b0f3e..6e85535 100644
--- a/exec.c
+++ b/exec.c
@@ -121,8 +121,6 @@ int use_icount;

 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
-uintptr_t qemu_real_host_page_size;
-intptr_t qemu_real_host_page_mask;

 bool set_preferred_target_page_bits(int bits)
 {
@@ -3621,8 +3619,6 @@ void page_size_init(void)
 {
     /* NOTE: we can always suppose that qemu_host_page_size >=
        TARGET_PAGE_SIZE */
-    qemu_real_host_page_size = getpagesize();
-    qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
     if (qemu_host_page_size == 0) {
         qemu_host_page_size = qemu_real_host_page_size;
     }
diff --git a/util/pagesize.c b/util/pagesize.c
new file mode 100644
index 000..998632c
--- /dev/null
+++ b/util/pagesize.c
@@ -0,0 +1,18 @@
+/*
+ * pagesize.c - query the host about its page size
+ *
+ * Copyright (C) 2017, Emilio G. Cota
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+uintptr_t qemu_real_host_page_size;
+intptr_t qemu_real_host_page_mask;
+
+static void __attribute__((constructor)) init_real_host_page_size(void)
+{
+    qemu_real_host_page_size = getpagesize();
+    qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
+}
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 50a55ec..2973b0a 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -40,6 +40,7 @@ util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
 util-obj-y += log.o
+util-obj-y += pagesize.o
 util-obj-y += qdist.o
 util-obj-y += qht.o
 util-obj-y += range.o
--
2.7.4
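For readers unfamiliar with the constructor approach, here is a standalone sketch of the same pattern, assuming GCC or Clang on a POSIX host. The names host_page_size, host_page_mask and page_align_down are hypothetical, and sysconf(_SC_PAGESIZE) is swapped in for the getpagesize() call used by the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical globals; they mirror, but are not, QEMU's variables. */
uintptr_t host_page_size;
intptr_t host_page_mask;

/* Runs before main() on GCC/Clang, so no caller observes a zero page size. */
static void __attribute__((constructor)) init_host_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);

    assert(sz > 0);
    host_page_size = (uintptr_t)sz;
    /* Negating a power of two gives a mask with the low bits clear. */
    host_page_mask = -(intptr_t)host_page_size;
}

/* Round an address down to the start of the page that contains it. */
uintptr_t page_align_down(uintptr_t addr)
{
    return addr & (uintptr_t)host_page_mask;
}
```

Because the mask is a signed intptr_t, it sign-extends when combined with a wider type, which is the point made in the comment moved by the patch.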
[Qemu-devel] [PATCH v3 40/43] translate-all: use qemu_protect_rwx/none helpers
The helpers require the address and size to be page-aligned, so
do that before calling them.

Signed-off-by: Emilio G. Cota
---
 accel/tcg/translate-all.c | 61 ++-
 1 file changed, 13 insertions(+), 48 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 36b17ac..e930bac 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -604,63 +604,24 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
 static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
     __attribute__((aligned(CODE_GEN_ALIGN)));

-# ifdef _WIN32
-static inline void do_protect(void *addr, long size, int prot)
-{
-    DWORD old_protect;
-    VirtualProtect(addr, size, prot, &old_protect);
-}
-
-static inline void map_exec(void *addr, long size)
-{
-    do_protect(addr, size, PAGE_EXECUTE_READWRITE);
-}
-
-static inline void map_none(void *addr, long size)
-{
-    do_protect(addr, size, PAGE_NOACCESS);
-}
-# else
-static inline void do_protect(void *addr, long size, int prot)
-{
-    uintptr_t start, end;
-
-    start = (uintptr_t)addr;
-    start &= qemu_real_host_page_mask;
-
-    end = (uintptr_t)addr + size;
-    end = ROUND_UP(end, qemu_real_host_page_size);
-
-    mprotect((void *)start, end - start, prot);
-}
-
-static inline void map_exec(void *addr, long size)
-{
-    do_protect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
-}
-
-static inline void map_none(void *addr, long size)
-{
-    do_protect(addr, size, PROT_NONE);
-}
-# endif /* WIN32 */
-
 static inline void *alloc_code_gen_buffer(void)
 {
     void *buf = static_code_gen_buffer;
+    void *end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
     size_t full_size, size;

-    /* The size of the buffer, rounded down to end on a page boundary. */
-    full_size = (((uintptr_t)buf + sizeof(static_code_gen_buffer))
-                 & qemu_real_host_page_mask) - (uintptr_t)buf;
+    /* page-align the beginning and end of the buffer */
+    buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
+    end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);

     /* Reserve a guard page. */
+    full_size = end - buf;
     size = full_size - qemu_real_host_page_size;

     /* Honor a command-line option limiting the size of the buffer. */
     if (size > tcg_ctx->code_gen_buffer_size) {
-        size = (((uintptr_t)buf + tcg_ctx->code_gen_buffer_size)
-                & qemu_real_host_page_mask) - (uintptr_t)buf;
+        size = QEMU_ALIGN_DOWN(tcg_ctx->code_gen_buffer_size,
+                               qemu_real_host_page_size);
     }
     tcg_ctx->code_gen_buffer_size = size;

@@ -671,8 +632,12 @@ static inline void *alloc_code_gen_buffer(void)
     }
 #endif

-    map_exec(buf, size);
-    map_none(buf + size, qemu_real_host_page_size);
+    if (qemu_mprotect_rwx(buf, size)) {
+        abort();
+    }
+    if (qemu_mprotect_none(buf + size, qemu_real_host_page_size)) {
+        abort();
+    }
     qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);

     return buf;
--
2.7.4
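The alignment arithmetic in alloc_code_gen_buffer() can be sketched in isolation as follows. This is only an illustration: align_up, align_down and carve_code_buffer are hypothetical helpers, not the QEMU_ALIGN_* macros, and the page size is passed in explicitly so the logic can be checked without touching real page protections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; 'align' must be a power of two. */
uintptr_t align_up(uintptr_t v, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (v + align - 1) & ~(align - 1);
}

uintptr_t align_down(uintptr_t v, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return v & ~(align - 1);
}

/*
 * Page-align both ends of a raw buffer and keep the last aligned page as
 * a guard page. Returns the usable size in bytes and stores the aligned
 * start in *start_out, or returns 0 if the buffer cannot hold one usable
 * page plus the guard page.
 */
size_t carve_code_buffer(void *raw, size_t raw_size, size_t page,
                         void **start_out)
{
    uintptr_t start = align_up((uintptr_t)raw, page);
    uintptr_t end = align_down((uintptr_t)raw + raw_size, page);

    if (end <= start || end - start < 2 * page) {
        return 0;
    }
    *start_out = (void *)start;
    return (size_t)(end - start) - page; /* guard page excluded */
}
```

Aligning the start up and the end down guarantees that both the usable region and the guard page lie entirely inside the original buffer, which is what lets the protection helpers insist on page-aligned arguments.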
[Qemu-devel] [PATCH v3 34/43] gen-icount: fold exitreq_label into TCGContext
Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
 include/exec/gen-icount.h | 7 +++
 tcg/tcg.h                 | 2 ++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index c58b0b2..fe80176 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -6,13 +6,12 @@
 /* Helpers for instruction counting code generation. */

 static int icount_start_insn_idx;
-static TCGLabel *exitreq_label;

 static inline void gen_tb_start(TranslationBlock *tb)
 {
     TCGv_i32 count, imm;

-    exitreq_label = gen_new_label();
+    tcg_ctx->exitreq_label = gen_new_label();
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
         count = tcg_temp_local_new_i32();
     } else {
@@ -34,7 +33,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
         tcg_temp_free_i32(imm);
     }

-    tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);
+    tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);

     if (tb_cflags(tb) & CF_USE_ICOUNT) {
         tcg_gen_st16_i32(count, tcg_ctx->tcg_env,
@@ -52,7 +51,7 @@ static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
         tcg_set_insn_param(icount_start_insn_idx, 1, num_insns);
     }

-    gen_set_label(exitreq_label);
+    gen_set_label(tcg_ctx->exitreq_label);
     tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_REQUESTED);

     /* Terminate the linked list. */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index c88746d..f83f9b0 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -712,6 +712,8 @@ struct TCGContext {
     /* The TCGBackendData structure is private to tcg-target.inc.c. */
     struct TCGBackendData *be;

+    TCGLabel *exitreq_label;
+
     TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
     TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */

--
2.7.4
[Qemu-devel] [PATCH v3 31/43] tcg: take tb_ctx out of TCGContext
Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
 include/exec/tb-context.h |  2 ++
 tcg/tcg.h                 |  2 --
 accel/tcg/cpu-exec.c      |  2 +-
 accel/tcg/translate-all.c | 57 +++
 linux-user/main.c         |  6 ++---
 5 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 1fa8dcc..1d41202 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -41,4 +41,6 @@ struct TBContext {
     int tb_phys_invalidate_count;
 };

+extern TBContext tb_ctx;
+
 #endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 9b6dade..22f7ecd 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -707,8 +707,6 @@ struct TCGContext {
     /* Threshold to flush the translated code buffer. */
     void *code_gen_highwater;

-    TBContext tb_ctx;
-
     /* Track which vCPU triggers events */
     CPUState *cpu; /* *_trans */
     TCGv_env tcg_env; /* *_exec */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 1963bda..f42096a 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -325,7 +325,7 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
     phys_pc = get_page_addr_code(desc.env, pc);
     desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
     h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
-    return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
+    return qht_lookup(&tb_ctx.htable, tb_cmp, &desc, h);
 }

 static inline TranslationBlock *tb_find(CPUState *cpu,
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index d50e2b9..5509407 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -154,6 +154,7 @@ static void *l1_map[V_L1_MAX_SIZE];

 /* code generation context */
 TCGContext tcg_ctx;
+TBContext tb_ctx;
 bool parallel_cpus;

 /* translation block context */
@@ -185,7 +186,7 @@ static void page_table_config_init(void)
 void tb_lock(void)
 {
     assert_tb_unlocked();
-    qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
+    qemu_mutex_lock(&tb_ctx.tb_lock);
     have_tb_lock++;
 }

@@ -193,13 +194,13 @@ void tb_unlock(void)
 {
     assert_tb_locked();
     have_tb_lock--;
-    qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+    qemu_mutex_unlock(&tb_ctx.tb_lock);
 }

 void tb_lock_reset(void)
 {
     if (have_tb_lock) {
-        qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+        qemu_mutex_unlock(&tb_ctx.tb_lock);
         have_tb_lock = 0;
     }
 }
@@ -826,15 +827,15 @@ static inline void code_gen_alloc(size_t tb_size)
         fprintf(stderr, "Could not allocate dynamic translator buffer\n");
         exit(1);
     }
-    tcg_ctx.tb_ctx.tb_tree = g_tree_new(tb_tc_cmp);
-    qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
+    tb_ctx.tb_tree = g_tree_new(tb_tc_cmp);
+    qemu_mutex_init(&tb_ctx.tb_lock);
 }

 static void tb_htable_init(void)
 {
     unsigned int mode = QHT_MODE_AUTO_RESIZE;

-    qht_init(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
+    qht_init(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
 }

 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -878,7 +879,7 @@ void tb_remove(TranslationBlock *tb)
 {
     assert_tb_locked();

-    g_tree_remove(tcg_ctx.tb_ctx.tb_tree, &tb->tc);
+    g_tree_remove(tb_ctx.tb_tree, &tb->tc);
 }

 static inline void invalidate_page_bitmap(PageDesc *p)
@@ -940,15 +941,15 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     /* If it is already been done on request of another CPU,
      * just retry.
      */
-    if (tcg_ctx.tb_ctx.tb_flush_count != tb_flush_count.host_int) {
+    if (tb_ctx.tb_flush_count != tb_flush_count.host_int) {
         goto done;
     }

     if (DEBUG_TB_FLUSH_GATE) {
-        size_t nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
+        size_t nb_tbs = g_tree_nnodes(tb_ctx.tb_tree);
         size_t host_size = 0;

-        g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_host_size_iter, &host_size);
+        g_tree_foreach(tb_ctx.tb_tree, tb_host_size_iter, &host_size);
         printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%zu\n",
                tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer, nb_tbs,
                nb_tbs > 0 ? host_size / nb_tbs : 0);
@@ -963,17 +964,16 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
     }

     /* Increment the refcount first so that destroy acts as a reset */
-    g_tree_ref(tcg_ctx.tb_ctx.tb_tree);
-    g_tree_destroy(tcg_ctx.tb_ctx.tb_tree);
+    g_tree_ref(tb_ctx.tb_tree);
+    g_tree_destroy(tb_ctx.tb_tree);

-    qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
+    qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
     page_flush_tb();

     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
     /* XXX: flush processor icache at this point if cache flush is
[Qemu-devel] [PATCH v3 39/43] osdep: introduce qemu_mprotect_rwx/none
Signed-off-by: Emilio G. Cota
---
 include/qemu/osdep.h |  2 ++
 util/osdep.c         | 41 +
 2 files changed, 43 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 0cba871..2c7d7db 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -348,6 +348,8 @@ void sigaction_invoke(struct sigaction *action,
 #endif

 int qemu_madvise(void *addr, size_t len, int advice);
+int qemu_mprotect_rwx(void *addr, size_t size);
+int qemu_mprotect_none(void *addr, size_t size);

 int qemu_open(const char *name, int flags, ...);
 int qemu_close(int fd);
diff --git a/util/osdep.c b/util/osdep.c
index a2863c8..f72d679 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -81,6 +81,47 @@ int qemu_madvise(void *addr, size_t len, int advice)
 #endif
 }

+static int qemu_mprotect__osdep(void *addr, size_t size, int prot)
+{
+    g_assert(!((uintptr_t)addr & ~qemu_real_host_page_mask));
+    g_assert(!(size & ~qemu_real_host_page_mask));
+
+#ifdef _WIN32
+    DWORD old_protect;
+
+    if (!VirtualProtect(addr, size, prot, &old_protect)) {
+        error_report("%s: VirtualProtect failed with error code %d",
+                     __func__, GetLastError());
+        return -1;
+    }
+    return 0;
+#else
+    if (mprotect(addr, size, prot)) {
+        error_report("%s: mprotect failed: %s", __func__, strerror(errno));
+        return -1;
+    }
+    return 0;
+#endif
+}
+
+int qemu_mprotect_rwx(void *addr, size_t size)
+{
+#ifdef _WIN32
+    return qemu_mprotect__osdep(addr, size, PAGE_EXECUTE_READWRITE);
+#else
+    return qemu_mprotect__osdep(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
+#endif
+}
+
+int qemu_mprotect_none(void *addr, size_t size)
+{
+#ifdef _WIN32
+    return qemu_mprotect__osdep(addr, size, PAGE_NOACCESS);
+#else
+    return qemu_mprotect__osdep(addr, size, PROT_NONE);
+#endif
+}
+
 #ifndef _WIN32
 /*
  * Dups an fd and sets the flags
--
2.7.4
[Qemu-devel] [PATCH v3 23/43] exec-all: introduce TB_PAGE_ADDR_FMT
And fix the following warning when DEBUG_TB_INVALIDATE is enabled
in translate-all.c:

  CC      mipsn32-linux-user/accel/tcg/translate-all.o
/data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’:
/data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=]
         printf("protecting code page: 0x" TARGET_FMT_lx "\n",
                ^
cc1: all warnings being treated as errors
/data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed
make[1]: *** [accel/tcg/translate-all.o] Error 1
Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed
make: *** [subdir-mipsn32-linux-user] Error 2
cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/exec-all.h   | 2 ++
 accel/tcg/translate-all.c | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 0af0485..00f7da8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -31,8 +31,10 @@
    type. */
 #if defined(CONFIG_USER_ONLY)
 typedef abi_ulong tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
 #else
 typedef ram_addr_t tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
 #endif

 /* DisasContext is_jmp field values
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c1cd258..c4c23f9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1194,8 +1194,7 @@ static inline void tb_alloc_page(TranslationBlock *tb,
         mprotect(g2h(page_addr), qemu_host_page_size,
                  (prot & PAGE_BITS) & ~PAGE_WRITE);
 #ifdef DEBUG_TB_INVALIDATE
-        printf("protecting code page: 0x" TARGET_FMT_lx "\n",
-               page_addr);
+        printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
 #endif
     }
 #else
--
2.7.4
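The fix pairs each typedef with a format macro defined next to it, so the printf() format always follows the type. A standalone sketch of that pairing is below; DEMO_NARROW_ADDR, page_addr_t and PAGE_ADDR_FMT are hypothetical stand-ins for CONFIG_USER_ONLY, tb_page_addr_t and TB_PAGE_ADDR_FMT. Unlike the QEMU macros, the macro here is assumed to expand to the conversion specifier only, so the "%" is written at the use site.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical build switch standing in for CONFIG_USER_ONLY. */
#ifdef DEMO_NARROW_ADDR
typedef uint32_t page_addr_t;
#define PAGE_ADDR_FMT PRIx32
#else
typedef uint64_t page_addr_t;
#define PAGE_ADDR_FMT PRIx64
#endif

/* Formats the message with whichever width page_addr_t currently has. */
int format_page_addr(char *buf, size_t len, page_addr_t addr)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "protecting code page: 0x%" PAGE_ADDR_FMT,
                    addr);
}
```

Keeping the macro beside the typedef means that a later change to the type is made in the same place as the change to the format, so the two cannot drift apart unnoticed.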
[Qemu-devel] [PATCH v3 26/43] exec-all: extract tb->tc_* into a separate struct tc_tb
In preparation for adding tc.size to be able to keep track of
TB's using the binary search tree implementation from glib.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/exec-all.h   | 20 ++--
 accel/tcg/cpu-exec.c      |  6 +++---
 accel/tcg/translate-all.c | 20 ++--
 tcg/tcg-runtime.c         |  4 ++--
 tcg/tcg.c                 |  4 ++--
 5 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 00f7da8..bc4f41c 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -341,6 +341,14 @@ static inline void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 #define USE_DIRECT_JUMP
 #endif

+/*
+ * Translation Cache-related fields of a TB.
+ */
+struct tb_tc {
+    void *ptr;    /* pointer to the translated code */
+    uint8_t *search;  /* pointer to search data */
+};
+
 struct TranslationBlock {
     target_ulong pc; /* simulated PC corresponding to this block (EIP + CS base) */
     target_ulong cs_base; /* CS base for this block */
@@ -362,8 +370,8 @@ struct TranslationBlock {
     /* Per-vCPU dynamic tracing state used to generate this TB */
     uint32_t trace_vcpu_dstate;

-    void *tc_ptr;    /* pointer to the translated code */
-    uint8_t *tc_search;  /* pointer to search data */
+    struct tb_tc tc;
+
     /* original tb when cflags has CF_NOCACHE */
     struct TranslationBlock *orig_tb;
     /* first and second physical page containing code. The lower bit
@@ -462,7 +470,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
                                      int n, uintptr_t addr)
 {
     uint16_t offset = tb->jmp_insn_offset[n];
-    tb_set_jmp_target1((uintptr_t)(tb->tc_ptr + offset), addr);
+    tb_set_jmp_target1((uintptr_t)(tb->tc.ptr + offset), addr);
 }

 #else
@@ -489,11 +497,11 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
     qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
                            "Linking TBs %p [" TARGET_FMT_lx
                            "] index %d -> %p [" TARGET_FMT_lx "]\n",
-                           tb->tc_ptr, tb->pc, n,
-                           tb_next->tc_ptr, tb_next->pc);
+                           tb->tc.ptr, tb->pc, n,
+                           tb_next->tc.ptr, tb_next->pc);

     /* patch the native jump address */
-    tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc_ptr);
+    tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc.ptr);

     /* add in TB jmp circular list */
     tb->jmp_list_next[n] = tb_next->jmp_list_first;
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 526cab3..cb1e6d3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -143,11 +143,11 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
     uintptr_t ret;
     TranslationBlock *last_tb;
     int tb_exit;
-    uint8_t *tb_ptr = itb->tc_ptr;
+    uint8_t *tb_ptr = itb->tc.ptr;

     qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
                            "Trace %p [%d: " TARGET_FMT_lx "] %s\n",
-                           itb->tc_ptr, cpu->cpu_index, itb->pc,
+                           itb->tc.ptr, cpu->cpu_index, itb->pc,
                            lookup_symbol(itb->pc));

 #if defined(DEBUG_DISAS)
@@ -179,7 +179,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
         qemu_log_mask_and_addr(CPU_LOG_EXEC, last_tb->pc,
                                "Stopped execution of TB chain before %p ["
                                TARGET_FMT_lx "] %s\n",
-                               last_tb->tc_ptr, last_tb->pc,
+                               last_tb->tc.ptr, last_tb->pc,
                                lookup_symbol(last_tb->pc));
         if (cc->synchronize_from_tb) {
             cc->synchronize_from_tb(cpu, last_tb);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 845585b..0a2eb86 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -260,7 +260,7 @@ static target_long decode_sleb128(uint8_t **pp)
    which comes from the host pc of the end of the code implementing the insn.

    Each line of the table is encoded as sleb128 deltas from the previous
-   line. The seed for the first line is { tb->pc, 0..., tb->tc_ptr }.
+   line. The seed for the first line is { tb->pc, 0..., tb->tc.ptr }.
    That is, the first column is seeded with the guest pc, the last column
    with the host pc, and the middle columns with zeros. */

@@ -270,7 +270,7 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
     uint8_t *p = block;
     int i, j, n;

-    tb->tc_search = block;
+    tb->tc.search = block;

     for (i = 0, n = tb->icount; i < n; ++i) {
         target_ulong prev;
@@ -305,9 +305,9 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
[Qemu-devel] [PATCH v3 09/43] tcg: consolidate TB lookups in tb_lookup__cpu_state
This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

   (A) Lookup succeeds for TB in hash without tb_lock
        (B) Sets the TB's tb->invalid flag
        (B) Removes the TB from tb_htable
        (B) Clears all CPU's tb_jmp_cache
   (A) Store TB into local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache, so that in the above scenario, next time we try to find
that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
up in tb_htable.

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

 Performance counter stats for 'taskset -c 0 qemu-system-arm \
    -machine type=virt -nographic -smp 1 -m 4096 \
    -netdev user,id=unet,hostfwd=tcp::-:22 \
    -device virtio-net-device,netdev=unet \
    -drive file=jessie.qcow2,id=myblock,index=0,if=none \
    -device virtio-blk-device,drive=myblock \
    -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
    -name arm,debug-threads=on -smp 1' (10 runs):

Before:
      18714.917392 task-clock                #    0.952 CPUs utilized            ( +-  0.95% )
            23,142 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            10,558 page-faults               #    0.001 M/sec                    ( +-  0.95% )
    53,957,727,252 cycles                    #    2.883 GHz                      ( +-  0.91% ) [83.33%]
    24,440,599,852 stalled-cycles-frontend   #   45.30% frontend cycles idle     ( +-  1.20% ) [83.33%]
    16,495,714,424 stalled-cycles-backend    #   30.57% backend cycles idle      ( +-  0.95% ) [66.66%]
    76,267,572,582 instructions              #    1.41  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  0.87% ) [83.34%]
    12,692,186,323 branches                  #  678.186 M/sec                    ( +-  0.92% ) [83.35%]
       263,486,879 branch-misses             #    2.08% of all branches          ( +-  0.73% ) [83.34%]

      19.648474449 seconds time elapsed                                          ( +-  0.82% )

After, w/ inline (this patch):
      18471.376627 task-clock                #    0.955 CPUs utilized            ( +-  0.96% )
            23,048 context-switches          #    0.001 M/sec                    ( +-  0.48% )
                 1 CPU-migrations            #    0.000 M/sec
            10,708 page-faults               #    0.001 M/sec                    ( +-  0.81% )
    53,208,990,796 cycles                    #    2.881 GHz                      ( +-  0.98% ) [83.34%]
    23,941,071,673 stalled-cycles-frontend   #   44.99% frontend cycles idle     ( +-  0.95% ) [83.34%]
    16,161,773,848 stalled-cycles-backend    #   30.37% backend cycles idle      ( +-  0.76% ) [66.67%]
    75,786,269,766 instructions              #    1.42  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
    12,573,617,143 branches                  #  680.708 M/sec                    ( +-  1.34% ) [83.33%]
       260,235,550 branch-misses             #    2.07% of all branches          ( +-  0.66% ) [83.33%]

      19.340502161 seconds time elapsed                                          ( +-  0.56% )

After, w/o inline:
      18791.253967 task-clock                #    0.954 CPUs utilized            ( +-  0.78% )
            23,230 context-switches          #    0.001 M/sec                    ( +-  0.42% )
                 1 CPU-migrations            #    0.000 M/sec
            10,563 page-faults               #    0.001 M/sec                    ( +-  1.27% )
    54,168,674,622 cycles                    #    2.883 GHz                      ( +-  0.80% ) [83.34%]
    24,244,712,629 stalled-cycles-frontend   #   44.76% frontend cycles idle     ( +-  1.37% ) [83.33%]
    16,288,648,572 stalled-cycles-backend    #   30.07% backend cycles idle      ( +-  0.95% ) [66.66%]
    77,659,755,503 instructions              #    1.43  insns per cycle
                                             #    0.31  stalled cycles per insn  ( +-  0.97% ) [83.34%]
    12,922,780,045 branches                  #  687.702 M/sec                    ( +-  1.06% ) [83.34%]
       261,962,386 branch-misses             #    2.03% of all branches          ( +-  0.71% ) [83.35%]

      19.700174670 seconds time elapsed                                          ( +-  0.56% )

Reviewed-by: Richard Henderson
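The race fix described in the commit message can be illustrated with a reduced model: the fast path trusts a cached entry only if it has not been marked invalid, and otherwise returns NULL so the caller falls back to the hash table. Everything named demo_ is hypothetical, and C11 atomics stand in for QEMU's own atomic helpers; this is a sketch of the check, not the code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CF_INVALID 0x8u           /* hypothetical "TB is stale" bit */
#define DEMO_CACHE_SIZE 16u            /* must be a power of two */

struct demo_tb {
    uint64_t pc;
    _Atomic uint32_t cflags;
};

/* Per-vCPU direct-mapped cache, a simplified stand-in for tb_jmp_cache. */
struct demo_cpu {
    struct demo_tb *_Atomic jmp_cache[DEMO_CACHE_SIZE];
};

static unsigned int demo_hash(uint64_t pc)
{
    return (unsigned int)(pc & (DEMO_CACHE_SIZE - 1));
}

/* Step "(A) Store TB into local tb_jmp_cache" from the scenario. */
void demo_cache_store(struct demo_cpu *cpu, struct demo_tb *tb)
{
    assert(tb != NULL);
    atomic_store(&cpu->jmp_cache[demo_hash(tb->pc)], tb);
}

/* Step "(B) Sets the TB's invalid flag". */
void demo_invalidate(struct demo_tb *tb)
{
    atomic_fetch_or(&tb->cflags, DEMO_CF_INVALID);
}

/* Fast-path lookup; NULL means "use the slow hash table lookup". */
struct demo_tb *demo_lookup(struct demo_cpu *cpu, uint64_t pc)
{
    struct demo_tb *tb = atomic_load(&cpu->jmp_cache[demo_hash(pc)]);

    if (tb != NULL && tb->pc == pc &&
        !(atomic_load(&tb->cflags) & DEMO_CF_INVALID)) {
        return tb;
    }
    return NULL;
}
```

With the flag check in place, a stale entry that slipped into the cache after the flush is harmless: the next lookup rejects it rather than executing it indefinitely.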
[Qemu-devel] [PATCH v3 25/43] translate-all: define and use DEBUG_TB_CHECK_GATE
This prevents bit rot by ensuring the debug code is compiled when
building a user-mode target.

Unfortunately the helpers are user-mode-only so we cannot fully
get rid of the ifdef checks. Add a comment to explain this.

Suggested-by: Alex Bennée
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 accel/tcg/translate-all.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 962e9b3..845585b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -82,6 +82,12 @@
 #undef DEBUG_TB_CHECK
 #endif

+#ifdef DEBUG_TB_CHECK
+#define DEBUG_TB_CHECK_GATE 1
+#else
+#define DEBUG_TB_CHECK_GATE 0
+#endif
+
 /* Access to the various translations structures need to be serialised via locks
  * for consistency. This is automatic for SoftMMU based system
  * emulation due to its single threaded nature. In user-mode emulation
@@ -950,7 +956,13 @@ void tb_flush(CPUState *cpu)
     }
 }

-#ifdef DEBUG_TB_CHECK
+/*
+ * Formerly ifdef DEBUG_TB_CHECK. These debug functions are user-mode-only,
+ * so in order to prevent bit rot we compile them unconditionally in user-mode,
+ * and let the optimizer get rid of them by wrapping their user-only callers
+ * with if (DEBUG_TB_CHECK_GATE).
+ */
+#ifdef CONFIG_USER_ONLY

 static void do_tb_invalidate_check(struct qht *ht, void *p, uint32_t hash,
                                    void *userp)
@@ -994,7 +1006,7 @@ static void tb_page_check(void)
     qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_page_check, NULL);
 }

-#endif
+#endif /* CONFIG_USER_ONLY */

 static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlock *tb)
 {
@@ -1238,8 +1250,10 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
                      tb->trace_vcpu_dstate);
     qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);

-#ifdef DEBUG_TB_CHECK
-    tb_page_check();
+#ifdef CONFIG_USER_ONLY
+    if (DEBUG_TB_CHECK_GATE) {
+        tb_page_check();
+    }
 #endif
 }

@@ -2209,8 +2223,10 @@ int page_unprotect(target_ulong address, uintptr_t pc)
                 /* and since the content will be modified, we must invalidate
                    the corresponding translated code. */
                 current_tb_invalidated |= tb_invalidate_phys_page(addr, pc);
-#ifdef DEBUG_TB_CHECK
-                tb_invalidate_check(addr);
+#ifdef CONFIG_USER_ONLY
+                if (DEBUG_TB_CHECK_GATE) {
+                    tb_invalidate_check(addr);
+                }
 #endif
             }
             mprotect((void *)g2h(host_start), qemu_host_page_size,
--
2.7.4
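The gate pattern is easy to show outside QEMU. In this sketch, with hypothetical DEMO_ and demo_ names, the debug helper is always compiled and therefore always type-checked, while the call is wrapped in a compile-time constant condition that an optimizing compiler can remove when the gate is 0.

```c
#include <assert.h>

/* Define DEMO_DEBUG_CHECK at build time to enable the checks. */
#ifdef DEMO_DEBUG_CHECK
#define DEMO_DEBUG_CHECK_GATE 1
#else
#define DEMO_DEBUG_CHECK_GATE 0
#endif

/* Counts how often the check ran, purely for demonstration. */
int demo_checks_run;

/* Always compiled, so it cannot silently rot when the gate is off. */
void demo_consistency_check(const int *table, int n)
{
    for (int i = 1; i < n; i++) {
        assert(table[i - 1] <= table[i]);
    }
    demo_checks_run++;
}

/* Caller wraps the check in the gate instead of an #ifdef. */
void demo_insert_done(const int *table, int n)
{
    if (DEMO_DEBUG_CHECK_GATE) {
        demo_consistency_check(table, n);
    }
}
```

Compared with a plain #ifdef around the call, a signature change in the helper now breaks the build immediately, regardless of whether the debug option is enabled.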
[Qemu-devel] [PATCH v3 36/43] tcg: introduce **tcg_ctxs to keep track of all TCGContext's
Groundwork for supporting multiple TCG contexts.

Note that having n_tcg_ctxs is unnecessary. However, it is
convenient to have it, since it will simplify iterating over the
array: we'll have just a for loop instead of having to iterate
over a NULL-terminated array (which would require n+1 elems)
or having to check with ifdef's for usermode/softmmu.

Signed-off-by: Emilio G. Cota
---
 tcg/tcg.c | 4
 1 file changed, 4 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index f907c47..2217314 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,6 +115,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 static void tcg_out_tb_init(TCGContext *s);
 static bool tcg_out_tb_finalize(TCGContext *s);

+static TCGContext **tcg_ctxs;
+static unsigned int n_tcg_ctxs;
 static TCGRegSet tcg_target_available_regs[2];
 static TCGRegSet tcg_target_call_clobber_regs;

@@ -382,6 +384,8 @@ void tcg_context_init(TCGContext *s)
     }

     tcg_ctx = s;
+    tcg_ctxs = &tcg_ctx;
+    n_tcg_ctxs = 1;
 }

 /*
--
2.7.4
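To make the trade-off in the commit message concrete, here is a small sketch of iterating over an array of contexts with an explicit count. The demo_ names and the code_size field are hypothetical; the single-context setup mirrors the assignments in tcg_context_init(), where the array initially points at the one existing context pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for TCGContext. */
struct demo_ctx {
    size_t code_size;
};

struct demo_ctx demo_init_ctx;
struct demo_ctx *demo_cur_ctx = &demo_init_ctx;

/* Array of context pointers plus its length; one context to start with. */
struct demo_ctx **demo_ctxs = &demo_cur_ctx;
unsigned int n_demo_ctxs = 1;

/* A plain counted loop: no NULL terminator and no mode-specific ifdefs. */
size_t demo_total_code_size(void)
{
    size_t total = 0;

    for (unsigned int i = 0; i < n_demo_ctxs; i++) {
        assert(demo_ctxs[i] != NULL);
        total += demo_ctxs[i]->code_size;
    }
    return total;
}
```

The same loop keeps working unchanged once more contexts are registered, which is the convenience the commit message argues for.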
[Qemu-devel] [PATCH v3 28/43] exec-all: rename tb_free to tb_remove
We don't really free anything in this function anymore; we just remove
the TB from the binary search tree.

Suggested-by: Alex Bennée
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/exec-all.h   | 2 +-
 accel/tcg/cpu-exec.c      | 2 +-
 accel/tcg/translate-all.c | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index eb3eb7b..7bc2050 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -428,7 +428,7 @@ static inline uint32_t curr_cflags(void)
     return parallel_cpus ? CF_PARALLEL : 0;
 }

-void tb_free(TranslationBlock *tb);
+void tb_remove(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index cb1e6d3..1963bda 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -218,7 +218,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,

     tb_lock();
     tb_phys_invalidate(tb, -1);
-    tb_free(tb);
+    tb_remove(tb);
     tb_unlock();
 }
 #endif
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index cb71aef..448f13b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -375,7 +375,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
         if (tb->cflags & CF_NOCACHE) {
             /* one-shot translation, invalidate it immediately */
             tb_phys_invalidate(tb, -1);
-            tb_free(tb);
+            tb_remove(tb);
         }
         r = true;
     }
@@ -874,7 +874,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 }

 /* Called with tb_lock held. */
-void tb_free(TranslationBlock *tb)
+void tb_remove(TranslationBlock *tb)
 {
     assert_tb_locked();

@@ -1809,7 +1809,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
              * cpu_exec_nocache() */
             tb_phys_invalidate(tb->orig_tb, -1);
         }
-        tb_free(tb);
+        tb_remove(tb);
     }
     /* FIXME: In theory this could raise an exception. In practice
        we have already translated the block once so it's probably ok. */
--
2.7.4
[Qemu-devel] [PATCH v3 29/43] translate-all: report correct avg host TB size
Since commit 6e3b2bfd6 ("tcg: allocate TB structs before the
corresponding translated code") we are not fully utilizing
code_gen_buffer for translated code, and therefore are incorrectly
reporting the amount of translated code as well as the average
host TB size. Address this by:

- Making the conscious choice of misreporting the total translated code;
  doing otherwise would mislead users into thinking "-tb-size" is not
  honoured.

- Expanding tb_tree_stats to accurately count the bytes of translated code on
  the host, and using this for reporting the average tb host size,
  as well as the expansion ratio.

In the future we might want to consider reporting the accurate numbers for
the total translated code, together with a "bookkeeping/overhead" field to
account for the TB structs.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 accel/tcg/translate-all.c | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 448f13b..d50e2b9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -923,6 +923,15 @@ static void page_flush_tb(void)
     }
 }
 
+static gboolean tb_host_size_iter(gpointer key, gpointer value, gpointer data)
+{
+    const TranslationBlock *tb = value;
+    size_t *size = data;
+
+    *size += tb->tc.size;
+    return false;
+}
+
 /* flush all the translation blocks */
 static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 {
@@ -937,11 +946,12 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 
     if (DEBUG_TB_FLUSH_GATE) {
         size_t nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
+        size_t host_size = 0;
 
-        printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%td\n",
+        g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_host_size_iter, &host_size);
+        printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%zu\n",
                tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer, nb_tbs,
-               nb_tbs > 0 ?
-               (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) / nb_tbs : 0);
+               nb_tbs > 0 ? host_size / nb_tbs : 0);
     }
     if ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
         > tcg_ctx.code_gen_buffer_size) {
@@ -1883,6 +1893,7 @@ static void print_qht_statistics(FILE *f, fprintf_function cpu_fprintf,
 }
 
 struct tb_tree_stats {
+    size_t host_size;
     size_t target_size;
     size_t max_target_size;
     size_t direct_jmp_count;
@@ -1895,6 +1906,7 @@ static gboolean tb_tree_stats_iter(gpointer key, gpointer value, gpointer data)
     const TranslationBlock *tb = value;
     struct tb_tree_stats *tst = data;
 
+    tst->host_size += tb->tc.size;
     tst->target_size += tb->size;
     if (tb->size > tst->max_target_size) {
         tst->max_target_size = tb->size;
@@ -1923,6 +1935,11 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_tree_stats_iter, &tst);
     /* XXX: avoid using doubles ? */
     cpu_fprintf(f, "Translation buffer state:\n");
+    /*
+     * Report total code size including the padding and TB structs;
+     * otherwise users might think "-tb-size" is not honoured.
+     * For avg host size we use the precise numbers from tb_tree_stats though.
+     */
     cpu_fprintf(f, "gen code size       %td/%zd\n",
                 tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
                 tcg_ctx.code_gen_highwater - tcg_ctx.code_gen_buffer);
@@ -1930,12 +1947,9 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     cpu_fprintf(f, "TB avg target size  %zu max=%zu bytes\n",
                 nb_tbs ? tst.target_size / nb_tbs : 0,
                 tst.max_target_size);
-    cpu_fprintf(f, "TB avg host size    %td bytes (expansion ratio: %0.1f)\n",
-                nb_tbs ? (tcg_ctx.code_gen_ptr -
-                          tcg_ctx.code_gen_buffer) / nb_tbs : 0,
-                tst.target_size ? (double) (tcg_ctx.code_gen_ptr -
-                                            tcg_ctx.code_gen_buffer) /
-                                            tst.target_size : 0);
+    cpu_fprintf(f, "TB avg host size    %zu bytes (expansion ratio: %0.1f)\n",
+                nb_tbs ? tst.host_size / nb_tbs : 0,
+                tst.target_size ? (double)tst.host_size / tst.target_size : 0);
     cpu_fprintf(f, "cross page TB count %zu (%zu%%)\n", tst.cross_page,
             nb_tbs ? (tst.cross_page * 100) / nb_tbs : 0);
     cpu_fprintf(f, "direct jump count   %zu (%zu%%) (2 jumps=%zu %zu%%)\n",
-- 
2.7.4
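
[Editorial sketch, not part of the patch.] The arithmetic behind the two
reported figures in patch 29/43 can be illustrated in isolation: sum the
host code size and the guest (target) code size over all TBs, then divide.
The struct and function names below are hypothetical stand-ins and do not
exist in QEMU; only the two divisions and their zero guards mirror the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two per-TB sizes the patch sums up. */
struct tb_sizes {
    size_t host_size;   /* bytes of generated host code (tb->tc.size) */
    size_t target_size; /* bytes of guest code (tb->size) */
};

/* Hypothetical accumulator, playing the role of struct tb_tree_stats. */
struct tb_totals {
    size_t host_size;
    size_t target_size;
    size_t nb_tbs;
};

/* Accumulate one TB, as the tree iterator does for every node. */
static void tb_totals_add(struct tb_totals *t, const struct tb_sizes *tb)
{
    assert(t != NULL && tb != NULL);
    t->host_size += tb->host_size;
    t->target_size += tb->target_size;
    t->nb_tbs++;
}

/* Average host bytes per TB; 0 when there are no TBs. */
static size_t tb_avg_host_size(const struct tb_totals *t)
{
    return t->nb_tbs ? t->host_size / t->nb_tbs : 0;
}

/* Host bytes emitted per guest byte; 0 when nothing was translated. */
static double tb_expansion_ratio(const struct tb_totals *t)
{
    return t->target_size ? (double)t->host_size / t->target_size : 0;
}
```

Because the sums only include translated code, neither figure is inflated
by the TB structs that now share code_gen_buffer with the code.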
[Qemu-devel] [PATCH v3 18/43] target/sh4: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

Signed-off-by: Emilio G. Cota
---
 target/sh4/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 9fcaefd..52fabb3 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -528,7 +528,7 @@ static void _decode_opc(DisasContext * ctx)
     /* Detect the start of a gUSA region. If so, update envflags
        and end the TB. This will allow us to see the end of the
        region (stored in R0) in the next TB. */
-    if (B11_8 == 15 && B7_0s < 0 && parallel_cpus) {
+    if (B11_8 == 15 && B7_0s < 0 && (tb_cflags(ctx->tb) & CF_PARALLEL)) {
         ctx->envflags = deposit32(ctx->envflags, GUSA_SHIFT, 8, B7_0s);
         ctx->bstate = BS_STOP;
     }
-- 
2.7.4
[Qemu-devel] [PATCH v3 21/43] cpu-exec: lookup/generate TB outside exclusive region during step_atomic
Now that all code generation has been converted to check CF_PARALLEL, we can
generate !CF_PARALLEL code without having yet set !parallel_cpus --
and therefore without having to be in the exclusive region during
cpu_exec_step_atomic.

While at it, merge cpu_exec_step into cpu_exec_step_atomic.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 accel/tcg/cpu-exec.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index b71e015..526cab3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -223,30 +223,40 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 }
 #endif
 
-static void cpu_exec_step(CPUState *cpu)
+void cpu_exec_step_atomic(CPUState *cpu)
 {
     CPUClass *cc = CPU_GET_CLASS(cpu);
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
     uint32_t cflags = 1 | CF_IGNORE_ICOUNT;
+    uint32_t cf_mask = cflags & CF_HASH_MASK;
 
     if (sigsetjmp(cpu->jmp_env, 0) == 0) {
-        tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags,
-                                  cflags & CF_HASH_MASK);
+        tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
         if (tb == NULL) {
             mmap_lock();
             tb_lock();
-            tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
+            tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
+            if (likely(tb == NULL)) {
+                tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
+            }
             tb_unlock();
             mmap_unlock();
         }
 
+        start_exclusive();
+
+        /* Since we got here, we know that parallel_cpus must be true. */
+        parallel_cpus = false;
         cc->cpu_exec_enter(cpu);
         /* execute the generated code */
         trace_exec_tb(tb, pc);
         cpu_tb_exec(cpu, tb);
         cc->cpu_exec_exit(cpu);
+        parallel_cpus = true;
+
+        end_exclusive();
     } else {
         /* We may have exited due to another problem here, so we need
          * to reset any tb_locks we may have taken but didn't release.
@@ -260,18 +270,6 @@ static void cpu_exec_step(CPUState *cpu)
     }
 }
 
-void cpu_exec_step_atomic(CPUState *cpu)
-{
-    start_exclusive();
-
-    /* Since we got here, we know that parallel_cpus must be true. */
-    parallel_cpus = false;
-    cpu_exec_step(cpu);
-    parallel_cpus = true;
-
-    end_exclusive();
-}
-
 struct tb_desc {
     target_ulong pc;
     target_ulong cs_base;
-- 
2.7.4
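
[Editorial sketch, not part of the patch.] Once lookup and generation in
patch 21/43 happen outside the exclusive region, another vCPU may generate
the same TB between the lock-free lookup and the acquisition of tb_lock, so
the patch looks the TB up again under the lock before generating it. The
toy cache below shows that shape only. Every name in it is hypothetical,
the "lock" is a plain counter rather than a real mutex, and the code is
single-threaded, so it demonstrates the control flow and not actual
concurrency.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 16

/* Hypothetical, much simplified stand-in for a translation block. */
struct block {
    unsigned long pc;
    int valid;
};

static struct block cache[CACHE_SIZE];
static int lock_depth;       /* stand-in for tb_lock()/tb_unlock() */
static int blocks_generated; /* counts calls to toy_generate() */

static void toy_lock(void)
{
    assert(lock_depth == 0);
    lock_depth = 1;
}

static void toy_unlock(void)
{
    assert(lock_depth == 1);
    lock_depth = 0;
}

/* Lookup that may be called with or without the lock held. */
static struct block *toy_lookup(unsigned long pc)
{
    struct block *b = &cache[pc % CACHE_SIZE];

    return (b->valid && b->pc == pc) ? b : NULL;
}

/* Generation must only happen with the lock held. */
static struct block *toy_generate(unsigned long pc)
{
    struct block *b = &cache[pc % CACHE_SIZE];

    assert(lock_depth == 1);
    b->pc = pc;
    b->valid = 1;
    blocks_generated++;
    return b;
}

static struct block *toy_find(unsigned long pc)
{
    struct block *b = toy_lookup(pc);   /* fast path, no lock taken */

    if (b == NULL) {
        toy_lock();
        b = toy_lookup(pc);             /* re-check: it may exist by now */
        if (b == NULL) {
            b = toy_generate(pc);
        }
        toy_unlock();
    }
    return b;
}
```

Calling toy_find() twice with the same pc generates the block once and
returns the cached entry the second time.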
[Qemu-devel] [PATCH v3 03/43] exec-all: fix typos in TranslationBlock's documentation
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/exec-all.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 87b1b74..69c1b36 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -370,7 +370,7 @@ struct TranslationBlock {
     /* The following data are used to directly call another TB from
      * the code of this one. This can be done either by emitting direct or
      * indirect native jump instructions. These jumps are reset so that the TB
-     * just continue its execution. The TB can be linked to another one by
+     * just continues its execution. The TB can be linked to another one by
      * setting one of the jump targets (or patching the jump instruction). Only
      * two of such jumps are supported.
      */
@@ -381,7 +381,7 @@ struct TranslationBlock {
 #else
     uintptr_t jmp_target_addr[2]; /* target address for indirect jump */
 #endif
-    /* Each TB has an assosiated circular list of TBs jumping to this one.
+    /* Each TB has an associated circular list of TBs jumping to this one.
      * jmp_list_first points to the first TB jumping to this one.
      * jmp_list_next is used to point to the next TB in a list.
      * Since each TB can have two jumps, it can participate in two lists.
-- 
2.7.4
[Qemu-devel] [PATCH v3 20/43] tcg: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

The tb->cflags field is not passed to tcg generation functions. So
we add a bit to TCGContext, storing there whether CF_PARALLEL is set
before translating every TB.

Most architectures have <= 32 registers, which results in a 4-byte hole
in TCGContext. Use this hole for the bit we need, which we store in a bool.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 tcg/tcg.h                 |  1 +
 accel/tcg/translate-all.c |  1 +
 tcg/tcg-op.c              | 10 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 96872f8..9b6dade 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -656,6 +656,7 @@ struct TCGContext {
     uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_addr if !USE_DIRECT_JUMP */
 
     TCGRegSet reserved_regs;
+    bool cf_parallel; /* whether CF_PARALLEL is set in tb->cflags */
     intptr_t current_frame_offset;
     intptr_t frame_start;
     intptr_t frame_end;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 600c0a1..645bc70 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1271,6 +1271,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->flags = flags;
     tb->cflags = cflags;
     tb->trace_vcpu_dstate = *cpu->trace_dstate;
+    tcg_ctx.cf_parallel = !!(cflags & CF_PARALLEL);
 
 #ifdef CONFIG_PROFILER
     tcg_ctx.tb_count1++; /* includes aborted translations because of
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 205d07f..ef420d4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -150,7 +150,7 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
 
 void tcg_gen_mb(TCGBar mb_type)
 {
-    if (parallel_cpus) {
+    if (tcg_ctx.cf_parallel) {
         tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
     }
 }
@@ -2794,7 +2794,7 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
 {
     memop = tcg_canonicalize_memop(memop, 0, 0);
 
-    if (!parallel_cpus) {
+    if (!tcg_ctx.cf_parallel) {
         TCGv_i32 t1 = tcg_temp_new_i32();
         TCGv_i32 t2 = tcg_temp_new_i32();
 
@@ -2838,7 +2838,7 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 {
     memop = tcg_canonicalize_memop(memop, 1, 0);
 
-    if (!parallel_cpus) {
+    if (!tcg_ctx.cf_parallel) {
         TCGv_i64 t1 = tcg_temp_new_i64();
         TCGv_i64 t2 = tcg_temp_new_i64();
 
@@ -3015,7 +3015,7 @@ static void * const table_##NAME[16] = { \
 void tcg_gen_atomic_##NAME##_i32                                        \
     (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
 {                                                                       \
-    if (parallel_cpus) {                                                \
+    if (tcg_ctx.cf_parallel) {                                          \
         do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME);     \
     } else {                                                            \
         do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,            \
@@ -3025,7 +3025,7 @@ void tcg_gen_atomic_##NAME##_i32 \
 void tcg_gen_atomic_##NAME##_i64                                        \
     (TCGv_i64 ret, TCGv addr, TCGv_i64 val, TCGArg idx, TCGMemOp memop) \
 {                                                                       \
-    if (parallel_cpus) {                                                \
+    if (tcg_ctx.cf_parallel) {                                          \
         do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME);     \
     } else {                                                            \
         do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,            \
-- 
2.7.4
[Qemu-devel] [PATCH v3 24/43] translate-all: define and use DEBUG_TB_INVALIDATE_GATE
This gets rid of an ifdef check while ensuring that the debug code
is compiled, which prevents bit rot.

Suggested-by: Alex Bennée
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 accel/tcg/translate-all.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c4c23f9..962e9b3 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -65,6 +65,12 @@
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
+#ifdef DEBUG_TB_INVALIDATE
+#define DEBUG_TB_INVALIDATE_GATE 1
+#else
+#define DEBUG_TB_INVALIDATE_GATE 0
+#endif
+
 #ifdef DEBUG_TB_FLUSH
 #define DEBUG_TB_FLUSH_GATE 1
 #else
@@ -1193,9 +1199,9 @@ static inline void tb_alloc_page(TranslationBlock *tb,
         }
         mprotect(g2h(page_addr), qemu_host_page_size,
                  (prot & PAGE_BITS) & ~PAGE_WRITE);
-#ifdef DEBUG_TB_INVALIDATE
-        printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
-#endif
+        if (DEBUG_TB_INVALIDATE_GATE) {
+            printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
+        }
     }
 #else
     /* if some code is already present, then the pages are already
-- 
2.7.4
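
[Editorial sketch, not part of the patch.] The "gate" idiom used by
patches 22/43 and 24/43 can be reduced to a few lines. Debug code placed
inside `if (GATE)` is parsed and type-checked on every build, so it cannot
silently stop compiling, while a constant-zero gate lets the compiler
discard it. DEBUG_DEMO, DEBUG_DEMO_GATE and protect_page() are hypothetical
names chosen for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* #define DEBUG_DEMO */   /* uncomment to enable the debug output */

#ifdef DEBUG_DEMO
#define DEBUG_DEMO_GATE 1
#else
#define DEBUG_DEMO_GATE 0
#endif

/* Returns the number of debug messages printed (0 or 1). */
static int protect_page(unsigned long page_addr)
{
    assert(page_addr != 0);
    if (DEBUG_DEMO_GATE) {
        /*
         * Unlike code under #ifdef, this statement is compiled on every
         * build; a stale format string or a renamed variable is caught
         * even when the gate is 0.
         */
        printf("protecting code page: 0x%lx\n", page_addr);
        return 1;
    }
    return 0;
}
```

With DEBUG_DEMO left undefined, protect_page() prints nothing and returns
0; defining it turns the message on without touching the function body.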
[Qemu-devel] [PATCH v3 14/43] target/hppa: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 target/hppa/helper.h    |  2 ++
 target/hppa/op_helper.c | 32
 target/hppa/translate.c | 12 ++--
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/target/hppa/helper.h b/target/hppa/helper.h
index 789f07f..0a6b900 100644
--- a/target/hppa/helper.h
+++ b/target/hppa/helper.h
@@ -3,7 +3,9 @@ DEF_HELPER_FLAGS_2(tsv, TCG_CALL_NO_WG, void, env, tl)
 DEF_HELPER_FLAGS_2(tcond, TCG_CALL_NO_WG, void, env, tl)
 
 DEF_HELPER_FLAGS_3(stby_b, TCG_CALL_NO_WG, void, env, tl, tl)
+DEF_HELPER_FLAGS_3(stby_b_parallel, TCG_CALL_NO_WG, void, env, tl, tl)
 DEF_HELPER_FLAGS_3(stby_e, TCG_CALL_NO_WG, void, env, tl, tl)
+DEF_HELPER_FLAGS_3(stby_e_parallel, TCG_CALL_NO_WG, void, env, tl, tl)
 
 DEF_HELPER_FLAGS_1(probe_r, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(probe_w, TCG_CALL_NO_RWG_SE, tl, tl)
diff --git a/target/hppa/op_helper.c b/target/hppa/op_helper.c
index c05c0d5..3104404 100644
--- a/target/hppa/op_helper.c
+++ b/target/hppa/op_helper.c
@@ -76,7 +76,8 @@ static void atomic_store_3(CPUHPPAState *env, target_ulong addr, uint32_t val,
 #endif
 }
 
-void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+static void do_stby_b(CPUHPPAState *env, target_ulong addr, target_ulong val,
+                      bool parallel)
 {
     uintptr_t ra = GETPC();
 
@@ -89,7 +90,7 @@ void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
         break;
     case 1:
         /* The 3 byte store must appear atomic. */
-        if (parallel_cpus) {
+        if (parallel) {
             atomic_store_3(env, addr, val, 0x00ffu, ra);
         } else {
             cpu_stb_data_ra(env, addr, val >> 16, ra);
@@ -102,14 +103,26 @@ void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
     }
 }
 
-void HELPER(stby_e)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+{
+    do_stby_b(env, addr, val, false);
+}
+
+void HELPER(stby_b_parallel)(CPUHPPAState *env, target_ulong addr,
+                             target_ulong val)
+{
+    do_stby_b(env, addr, val, true);
+}
+
+static void do_stby_e(CPUHPPAState *env, target_ulong addr, target_ulong val,
+                      bool parallel)
 {
     uintptr_t ra = GETPC();
 
     switch (addr & 3) {
     case 3:
         /* The 3 byte store must appear atomic. */
-        if (parallel_cpus) {
+        if (parallel) {
             atomic_store_3(env, addr - 3, val, 0xff00u, ra);
         } else {
             cpu_stw_data_ra(env, addr - 3, val >> 16, ra);
@@ -132,6 +145,17 @@ void HELPER(stby_e)(CPUHPPAState *env, target_ulong addr, target_ulong val)
     }
 }
 
+void HELPER(stby_e)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+{
+    do_stby_e(env, addr, val, false);
+}
+
+void HELPER(stby_e_parallel)(CPUHPPAState *env, target_ulong addr,
+                             target_ulong val)
+{
+    do_stby_e(env, addr, val, true);
+}
+
 target_ulong HELPER(probe_r)(target_ulong addr)
 {
     return page_check_range(addr, 1, PAGE_READ);
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 1effe82..66aa11d 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2309,9 +2309,17 @@ static ExitStatus trans_stby(DisasContext *ctx, uint32_t insn,
     val = load_gpr(ctx, rt);
 
     if (a) {
-        gen_helper_stby_e(cpu_env, addr, val);
+        if (tb_cflags(ctx->tb) & CF_PARALLEL) {
+            gen_helper_stby_e_parallel(cpu_env, addr, val);
+        } else {
+            gen_helper_stby_e(cpu_env, addr, val);
+        }
     } else {
-        gen_helper_stby_b(cpu_env, addr, val);
+        if (tb_cflags(ctx->tb) & CF_PARALLEL) {
+            gen_helper_stby_b_parallel(cpu_env, addr, val);
+        } else {
+            gen_helper_stby_b(cpu_env, addr, val);
+        }
     }
 
     if (m) {
-- 
2.7.4
[Qemu-devel] [PATCH v3 13/43] target/arm: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 target/arm/helper-a64.h    |  4
 target/arm/helper-a64.c    | 38 --
 target/arm/op_helper.c     |  7 ---
 target/arm/translate-a64.c | 31 +--
 target/arm/translate.c     |  9 +++--
 5 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 6f9eaba..85d8674 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -43,4 +43,8 @@ DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_le_parallel, TCG_CALL_NO_WG,
+                   i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_be_parallel, TCG_CALL_NO_WG,
+                   i64, env, i64, i64, i64)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index d9df82c..d0e435c 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -430,8 +430,9 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
 }
 
 /* Returns 0 on success; 1 otherwise. */
-uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
-                                     uint64_t new_lo, uint64_t new_hi)
+static uint64_t do_paired_cmpxchg64_le(CPUARMState *env, uint64_t addr,
+                                       uint64_t new_lo, uint64_t new_hi,
+                                       bool parallel)
 {
     uintptr_t ra = GETPC();
     Int128 oldv, cmpv, newv;
@@ -440,7 +441,7 @@ uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
     cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
     newv = int128_make128(new_lo, new_hi);
 
-    if (parallel_cpus) {
+    if (parallel) {
 #ifndef CONFIG_ATOMIC128
         cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -484,8 +485,21 @@ uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
     return !success;
 }
 
-uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
-                                     uint64_t new_lo, uint64_t new_hi)
+uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
+                                     uint64_t new_lo, uint64_t new_hi)
+{
+    return do_paired_cmpxchg64_le(env, addr, new_lo, new_hi, false);
+}
+
+uint64_t HELPER(paired_cmpxchg64_le_parallel)(CPUARMState *env, uint64_t addr,
+                                              uint64_t new_lo, uint64_t new_hi)
+{
+    return do_paired_cmpxchg64_le(env, addr, new_lo, new_hi, true);
+}
+
+static uint64_t do_paired_cmpxchg64_be(CPUARMState *env, uint64_t addr,
+                                       uint64_t new_lo, uint64_t new_hi,
+                                       bool parallel)
 {
     uintptr_t ra = GETPC();
     Int128 oldv, cmpv, newv;
@@ -494,7 +508,7 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
     cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
     newv = int128_make128(new_lo, new_hi);
 
-    if (parallel_cpus) {
+    if (parallel) {
 #ifndef CONFIG_ATOMIC128
         cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -537,3 +551,15 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
 
     return !success;
 }
+
+uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
+                                     uint64_t new_lo, uint64_t new_hi)
+{
+    return do_paired_cmpxchg64_be(env, addr, new_lo, new_hi, false);
+}
+
+uint64_t HELPER(paired_cmpxchg64_be_parallel)(CPUARMState *env, uint64_t addr,
+                                              uint64_t new_lo, uint64_t new_hi)
+{
+    return do_paired_cmpxchg64_be(env, addr, new_lo, new_hi, true);
+}
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index 2a85666..a28f254 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -450,13 +450,6 @@ void HELPER(yield)(CPUARMState *env)
     ARMCPU *cpu = arm_env_get_cpu(env);
     CPUState *cs = CPU(cpu);
 
-    /* When running in MTTCG we don't generate jumps to the yield and
-     * WFE helpers as it won't affect the scheduling of other vCPUs.
-     * If we wanted to more completely model WFE/SEV so we don't busy
-     * spin unnecessarily we would need to do something more involved.
-     */
-    g_assert(!parallel_cpus);
-
     /* This is a non-trappable hint instruction that generally indicates
      * that the guest is currently busy-looping. Yield control back to the
      * top level loop so that
[Qemu-devel] [PATCH v3 22/43] translate-all: define and use DEBUG_TB_FLUSH_GATE
This gets rid of some ifdef checks while ensuring that the debug code
is compiled, which prevents bit rot.

Suggested-by: Alex Bennée
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 accel/tcg/translate-all.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 645bc70..c1cd258 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -65,6 +65,12 @@
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
+#ifdef DEBUG_TB_FLUSH
+#define DEBUG_TB_FLUSH_GATE 1
+#else
+#define DEBUG_TB_FLUSH_GATE 0
+#endif
+
 #if !defined(CONFIG_USER_ONLY)
 /* TB consistency checks only implemented for usermode emulation. */
 #undef DEBUG_TB_CHECK
@@ -899,13 +905,13 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
         goto done;
     }
 
-#if defined(DEBUG_TB_FLUSH)
-    printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
-           (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
-           tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
-           ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)) /
-           tcg_ctx.tb_ctx.nb_tbs : 0);
-#endif
+    if (DEBUG_TB_FLUSH_GATE) {
+        printf("qemu: flush code_size=%td nb_tbs=%d avg_tb_size=%td\n",
+               tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
+               tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
+               (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) /
+               tcg_ctx.tb_ctx.nb_tbs : 0);
+    }
     if ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
         > tcg_ctx.code_gen_buffer_size) {
         cpu_abort(cpu, "Internal error: code buffer overflow\n");
-- 
2.7.4
[Qemu-devel] [PATCH v3 10/43] exec-all: bring tb->invalid into tb->cflags
This gets rid of a hole in struct TranslationBlock.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 include/exec/exec-all.h   | 3 +--
 include/exec/tb-lookup.h  | 2 +-
 accel/tcg/cpu-exec.c      | 4 ++--
 accel/tcg/translate-all.c | 3 +--
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 69c1b36..256b9a6 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -352,12 +352,11 @@ struct TranslationBlock {
 #define CF_NOCACHE 0x1 /* To be freed after execution */
 #define CF_USE_ICOUNT 0x2
 #define CF_IGNORE_ICOUNT 0x4 /* Do not generate icount code */
+#define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock */
 
     /* Per-vCPU dynamic tracing state used to generate this TB */
     uint32_t trace_vcpu_dstate;
 
-    uint16_t invalid;
-
     void *tc_ptr;    /* pointer to the translated code */
     uint8_t *tc_search;  /* pointer to search data */
     /* original tb when cflags has CF_NOCACHE */
diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
index 9d32cb0..436b6d5 100644
--- a/include/exec/tb-lookup.h
+++ b/include/exec/tb-lookup.h
@@ -35,7 +35,7 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
                tb->cs_base == *cs_base &&
                tb->flags == *flags &&
                tb->trace_vcpu_dstate == *cpu->trace_dstate &&
-               !atomic_read(&tb->invalid))) {
+               !(atomic_read(&tb->cflags) & CF_INVALID))) {
         return tb;
     }
     tb = tb_htable_lookup(cpu, *pc, *cs_base, *flags);
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 5d2ee5b..fae8c40 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -294,7 +294,7 @@ static bool tb_cmp(const void *p, const void *d)
         tb->cs_base == desc->cs_base &&
         tb->flags == desc->flags &&
         tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
-        !atomic_read(&tb->invalid)) {
+        !(atomic_read(&tb->cflags) & CF_INVALID)) {
         /* check next page if needed */
         if (tb->page_addr[1] == -1) {
             return true;
@@ -377,7 +377,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
             tb_lock();
             acquired_tb_lock = true;
         }
-        if (!tb->invalid) {
+        if (!(tb->cflags & CF_INVALID)) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a124181..7ef4f19 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1073,7 +1073,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 
     assert_tb_locked();
 
-    atomic_set(&tb->invalid, true);
+    atomic_set(&tb->cflags, tb->cflags | CF_INVALID);
 
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
@@ -1269,7 +1269,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     tb->flags = flags;
     tb->cflags = cflags;
     tb->trace_vcpu_dstate = *cpu->trace_dstate;
-    tb->invalid = false;
 
 #ifdef CONFIG_PROFILER
     tcg_ctx.tb_count1++; /* includes aborted translations because of
-- 
2.7.4
[Qemu-devel] [PATCH v3 02/43] tcg: fix corruption of code_time profiling counter upon tb_flush
Whenever there is an overflow in code_gen_buffer (e.g. we run out
of space in it and have to flush it), the code_time profiling counter
ends up with an invalid value (that is, code_time -= profile_getclock(),
without later on getting += profile_getclock() due to the goto).

Fix it by using the ti variable, so that we only update code_time
when there is no overflow. Note that in case there is an overflow
we fail to account for the elapsed coding time, but this is quite rare
so we can probably live with it.

"info jit" before/after, roughly at the same time during debian-arm bootup:

- before:
Statistics:
TB flush count      1
TB invalidate count 4665
TLB flush count     998
JIT cycles          -615191529184601 (-256329.804 s at 2.4 GHz)
translated TBs      302310 (aborted=0 0.0%)
avg ops/TB          48.4 max=438
deleted ops/TB      8.54
avg temps/TB        32.31 max=38
avg host code/TB    361.5
avg search data/TB  24.5
cycles/op           -42014693.0
cycles/in byte      -121444900.2
cycles/out byte     -5629031.1
cycles/search byte  -83114481.0
gen_interm time     -0.0%
gen_code time       100.0%
optim./code time    -0.0%
liveness/code time  -0.0%
cpu_restore count   6236
avg cycles          110.4

- after:
Statistics:
TB flush count      1
TB invalidate count 4665
TLB flush count     1010
JIT cycles          1996899624 (0.832 s at 2.4 GHz)
translated TBs      297961 (aborted=0 0.0%)
avg ops/TB          48.5 max=438
deleted ops/TB      8.56
avg temps/TB        32.31 max=38
avg host code/TB    361.8
avg search data/TB  24.5
cycles/op           138.2
cycles/in byte      398.4
cycles/out byte     18.5
cycles/search byte  273.1
gen_interm time     14.0%
gen_code time       86.0%
optim./code time    19.4%
liveness/code time  10.3%
cpu_restore count   6372
avg cycles          111.0

Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Emilio G. Cota
---
 accel/tcg/translate-all.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 3ee69e5..63f8538 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1300,7 +1300,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 #ifdef CONFIG_PROFILER
     tcg_ctx.tb_count++;
     tcg_ctx.interm_time += profile_getclock() - ti;
-    tcg_ctx.code_time -= profile_getclock();
+    ti = profile_getclock();
 #endif
 
     /* ??? Overflow could be handled better here. In particular, we
@@ -1318,7 +1318,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
     }
 
 #ifdef CONFIG_PROFILER
-    tcg_ctx.code_time += profile_getclock();
+    tcg_ctx.code_time += profile_getclock() - ti;
     tcg_ctx.code_in_len += tb->size;
     tcg_ctx.code_out_len += gen_code_size;
     tcg_ctx.search_out_len += search_size;
-- 
2.7.4
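
[Editorial sketch, not part of the patch.] The bug fixed by patch 02/43 is
a paired update (`counter -= now` at the start, `counter += now` at the
end) whose second half is skipped on an early exit, leaving the counter
hugely negative. Keeping the start time in a local and adding the
difference only on the normal path avoids that. In the model below the
clock is a fake, manually advanced variable and all names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int64_t fake_clock;   /* hypothetical clock, advanced by hand */
static int64_t code_time;    /* accumulated time, like tcg_ctx.code_time */

static int64_t demo_getclock(void)
{
    return fake_clock;
}

/*
 * Pretend to generate code for `work` ticks. Returns false when the
 * simulated buffer overflow forces an early exit.
 */
static bool demo_gen_code(int64_t work, bool overflow)
{
    int64_t ti = demo_getclock();   /* start time lives in a local */

    assert(work >= 0);
    fake_clock += work;             /* time passes while "generating" */
    if (overflow) {
        /* Early exit: the counter is untouched, the elapsed time is lost. */
        return false;
    }
    code_time += demo_getclock() - ti;
    return true;
}
```

With the old pattern an overflow at clock value 1000 would have left
code_time at -1000; here it stays at its previous value and only the
successful attempts are counted.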
[Qemu-devel] [PATCH v3 16/43] target/m68k: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

Signed-off-by: Emilio G. Cota
---
 target/m68k/helper.h    |  1 +
 target/m68k/op_helper.c | 33 -
 target/m68k/translate.c | 12 ++--
 3 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/target/m68k/helper.h b/target/m68k/helper.h
index 475a1f2..eebe52d 100644
--- a/target/m68k/helper.h
+++ b/target/m68k/helper.h
@@ -11,6 +11,7 @@ DEF_HELPER_2(set_sr, void, env, i32)
 DEF_HELPER_3(movec, void, env, i32, i32)
 DEF_HELPER_4(cas2w, void, env, i32, i32, i32)
 DEF_HELPER_4(cas2l, void, env, i32, i32, i32)
+DEF_HELPER_4(cas2l_parallel, void, env, i32, i32, i32)
 
 #define dh_alias_fp ptr
 #define dh_ctype_fp FPReg *
diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c
index 7b5126c..6308951 100644
--- a/target/m68k/op_helper.c
+++ b/target/m68k/op_helper.c
@@ -361,6 +361,7 @@ void HELPER(divsll)(CPUM68KState *env, int numr, int regr, int32_t den)
     env->dregs[numr] = quot;
 }
 
+/* We're executing in a serial context -- no need to be atomic. */
 void HELPER(cas2w)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
 {
     uint32_t Dc1 = extract32(regs, 9, 3);
@@ -374,17 +375,11 @@ void HELPER(cas2w)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
     int16_t l1, l2;
     uintptr_t ra = GETPC();
 
-    if (parallel_cpus) {
-        /* Tell the main loop we need to serialize this insn. */
-        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
-    } else {
-        /* We're executing in a serial context -- no need to be atomic. */
-        l1 = cpu_lduw_data_ra(env, a1, ra);
-        l2 = cpu_lduw_data_ra(env, a2, ra);
-        if (l1 == c1 && l2 == c2) {
-            cpu_stw_data_ra(env, a1, u1, ra);
-            cpu_stw_data_ra(env, a2, u2, ra);
-        }
+    l1 = cpu_lduw_data_ra(env, a1, ra);
+    l2 = cpu_lduw_data_ra(env, a2, ra);
+    if (l1 == c1 && l2 == c2) {
+        cpu_stw_data_ra(env, a1, u1, ra);
+        cpu_stw_data_ra(env, a2, u2, ra);
     }
 
     if (c1 != l1) {
@@ -399,7 +394,8 @@ void HELPER(cas2w)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
     env->dregs[Dc2] = deposit32(env->dregs[Dc2], 0, 16, l2);
 }
 
-void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
+static void do_cas2l(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2,
+                     bool parallel)
 {
     uint32_t Dc1 = extract32(regs, 9, 3);
     uint32_t Dc2 = extract32(regs, 6, 3);
@@ -416,7 +412,7 @@ void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
     TCGMemOpIdx oi;
 #endif
 
-    if (parallel_cpus) {
+    if (parallel) {
         /* We're executing in a parallel context -- must be atomic. */
 #ifdef CONFIG_ATOMIC64
         uint64_t c, u, l;
@@ -470,6 +466,17 @@ void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
     env->dregs[Dc2] = l2;
 }
 
+void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
+{
+    do_cas2l(env, regs, a1, a2, false);
+}
+
+void HELPER(cas2l_parallel)(CPUM68KState *env, uint32_t regs, uint32_t a1,
+                            uint32_t a2)
+{
+    do_cas2l(env, regs, a1, a2, true);
+}
+
 struct bf_data {
     uint32_t addr;
     uint32_t bofs;
diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 188520b..65044be 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -2308,7 +2308,11 @@ DISAS_INSN(cas2w)
                          (REG(ext1, 6) << 3) |
                          (REG(ext2, 0) << 6) |
                          (REG(ext1, 0) << 9));
-    gen_helper_cas2w(cpu_env, regs, addr1, addr2);
+    if (tb_cflags(s->tb) & CF_PARALLEL) {
+        gen_helper_exit_atomic(cpu_env);
+    } else {
+        gen_helper_cas2w(cpu_env, regs, addr1, addr2);
+    }
     tcg_temp_free(regs);
 
     /* Note that cas2w also assigned to env->cc_op. */
@@ -2354,7 +2358,11 @@ DISAS_INSN(cas2l)
                          (REG(ext1, 6) << 3) |
                          (REG(ext2, 0) << 6) |
                          (REG(ext1, 0) << 9));
-    gen_helper_cas2l(cpu_env, regs, addr1, addr2);
+    if (tb_cflags(s->tb) & CF_PARALLEL) {
+        gen_helper_cas2l_parallel(cpu_env, regs, addr1, addr2);
+    } else {
+        gen_helper_cas2l(cpu_env, regs, addr1, addr2);
+    }
     tcg_temp_free(regs);
 
     /* Note that cas2l also assigned to env->cc_op. */
-- 
2.7.4
[Qemu-devel] [PATCH v3 19/43] target/sparc: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 target/sparc/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 39d8494..768ce68 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -2450,7 +2450,7 @@ static void gen_ldstub_asi(DisasContext *dc, TCGv dst, TCGv addr, int insn)
     default:
         /* ??? In theory, this should be raise DAE_invalid_asi.
            But the SS-20 roms do ldstuba [%l0] #ASI_M_CTL, %o1. */
-        if (parallel_cpus) {
+        if (tb_cflags(dc->tb) & CF_PARALLEL) {
             gen_helper_exit_atomic(cpu_env);
         } else {
             TCGv_i32 r_asi = tcg_const_i32(da.asi);
--
2.7.4
[Qemu-devel] [PATCH v3 01/43] cputlb: bring back tlb_flush_count under !TLB_DEBUG
Commit f0aff0f124 ("cputlb: add assert_cpu_is_self checks") buried
the increment of tlb_flush_count under TLB_DEBUG. This results in
"info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG.

Besides, under MTTCG tlb_flush_count is updated by several threads,
so in order not to lose counts we'd either have to use atomic ops
or distribute the counter, which is more scalable.

This patch does the latter by embedding tlb_flush_count in CPUArchState.
The global count is then easily obtained by iterating over the CPU list.

Note that this change also requires updating the accessors to
tlb_flush_count to use atomic_read/set whenever there may be conflicting
accesses (as defined in C11) to it.

Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
 include/exec/cpu-defs.h   | 1 +
 include/exec/cputlb.h     | 3 +--
 accel/tcg/cputlb.c        | 17 ++--
 accel/tcg/translate-all.c | 2 +-
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index bc8e7f8..e43ff83 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -137,6 +137,7 @@ typedef struct CPUIOTLBEntry {
     CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];  \
     CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];       \
     CPUIOTLBEntry iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE];    \
+    size_t tlb_flush_count;                                \
     target_ulong tlb_flush_addr;                           \
     target_ulong tlb_flush_mask;                           \
     target_ulong vtlb_index;                               \
diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h
index 3f94178..c91db21 100644
--- a/include/exec/cputlb.h
+++ b/include/exec/cputlb.h
@@ -23,7 +23,6 @@
 /* cputlb.c */
 void tlb_protect_code(ram_addr_t ram_addr);
 void tlb_unprotect_code(ram_addr_t ram_addr);
-extern int tlb_flush_count;
-
+size_t tlb_flush_count(void);
 #endif
 #endif
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 85635ae..9377110 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -92,8 +92,18 @@ static void flush_all_helper(CPUState *src, run_on_cpu_func fn,
     }
 }

-/* statistics */
-int tlb_flush_count;
+size_t tlb_flush_count(void)
+{
+    CPUState *cpu;
+    size_t count = 0;
+
+    CPU_FOREACH(cpu) {
+        CPUArchState *env = cpu->env_ptr;
+
+        count += atomic_read(&env->tlb_flush_count);
+    }
+    return count;
+}

 /* This is OK because CPU architectures generally permit an
  * implementation to drop entries from the TLB at any time, so
@@ -112,7 +122,8 @@ static void tlb_flush_nocheck(CPUState *cpu)
     }

     assert_cpu_is_self(cpu);
-    tlb_debug("(count: %d)\n", tlb_flush_count++);
+    atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
+    tlb_debug("(count: %zu)\n", tlb_flush_count());

     tb_lock();

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 090ebad..3ee69e5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1916,7 +1916,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
                 atomic_read(&tcg_ctx.tb_ctx.tb_flush_count));
     cpu_fprintf(f, "TB invalidate count %d\n",
                 tcg_ctx.tb_ctx.tb_phys_invalidate_count);
-    cpu_fprintf(f, "TLB flush count %d\n", tlb_flush_count);
+    cpu_fprintf(f, "TLB flush count %zu\n", tlb_flush_count());
     tcg_dump_info(f, cpu_fprintf);

     tb_unlock();
--
2.7.4
[Qemu-devel] [PATCH v3 15/43] target/i386: check CF_PARALLEL instead of parallel_cpus
Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 target/i386/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index f046ffa..0f38a48 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5263,7 +5263,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 if (!(s->cpuid_ext_features & CPUID_EXT_CX16))
                     goto illegal_op;
                 gen_lea_modrm(env, s, modrm);
-                if ((s->prefix & PREFIX_LOCK) && parallel_cpus) {
+                if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->tb) & CF_PARALLEL)) {
                     gen_helper_cmpxchg16b(cpu_env, cpu_A0);
                 } else {
                     gen_helper_cmpxchg16b_unlocked(cpu_env, cpu_A0);
@@ -5274,7 +5274,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 if (!(s->cpuid_features & CPUID_CX8))
                     goto illegal_op;
                 gen_lea_modrm(env, s, modrm);
-                if ((s->prefix & PREFIX_LOCK) && parallel_cpus) {
+                if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->tb) & CF_PARALLEL)) {
                     gen_helper_cmpxchg8b(cpu_env, cpu_A0);
                 } else {
                     gen_helper_cmpxchg8b_unlocked(cpu_env, cpu_A0);
--
2.7.4
[Qemu-devel] [PATCH v3 08/43] tcg: remove addr argument from lookup_tb_ptr
It is unlikely that we will ever want to call this helper passing an argument other than the current PC. So just remove the argument, and use the pc we already get from cpu_get_tb_cpu_state. This change paves the way to having a common "tb_lookup" function. Reviewed-by: Richard HendersonSigned-off-by: Emilio G. Cota --- tcg/tcg-op.h | 4 ++-- tcg/tcg-runtime.h | 2 +- target/alpha/translate.c | 2 +- target/arm/translate-a64.c | 4 ++-- target/arm/translate.c | 5 + target/hppa/translate.c| 6 +++--- target/i386/translate.c| 17 + target/mips/translate.c| 4 ++-- target/s390x/translate.c | 2 +- target/sh4/translate.c | 4 ++-- tcg/tcg-op.c | 4 ++-- tcg/tcg-runtime.c | 20 ++-- 12 files changed, 32 insertions(+), 42 deletions(-) diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h index 5d3278f..18d01b2 100644 --- a/tcg/tcg-op.h +++ b/tcg/tcg-op.h @@ -797,7 +797,7 @@ static inline void tcg_gen_exit_tb(uintptr_t val) void tcg_gen_goto_tb(unsigned idx); /** - * tcg_gen_lookup_and_goto_ptr() - look up a TB and jump to it if valid + * tcg_gen_lookup_and_goto_ptr() - look up the current TB, jump to it if valid * @addr: Guest address of the target TB * * If the TB is not valid, jump to the epilogue. @@ -805,7 +805,7 @@ void tcg_gen_goto_tb(unsigned idx); * This operation is optional. If the TCG backend does not implement goto_ptr, * this op is equivalent to calling tcg_gen_exit_tb() with 0 as the argument. 
*/ -void tcg_gen_lookup_and_goto_ptr(TCGv addr); +void tcg_gen_lookup_and_goto_ptr(void); #if TARGET_LONG_BITS == 32 #define tcg_temp_new() tcg_temp_new_i32() diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h index c41d38a..1df17d0 100644 --- a/tcg/tcg-runtime.h +++ b/tcg/tcg-runtime.h @@ -24,7 +24,7 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64) DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32) DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64) -DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl) +DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env) DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env) diff --git a/target/alpha/translate.c b/target/alpha/translate.c index 90e6d52..9e98312 100644 --- a/target/alpha/translate.c +++ b/target/alpha/translate.c @@ -3073,7 +3073,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb) /* FALLTHRU */ case EXIT_PC_UPDATED: if (!use_exit_tb()) { -tcg_gen_lookup_and_goto_ptr(cpu_pc); +tcg_gen_lookup_and_goto_ptr(); break; } /* FALLTHRU */ diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 3fa3902..818d7eb 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -379,7 +379,7 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest) } else if (s->singlestep_enabled) { gen_exception_internal(EXCP_DEBUG); } else { -tcg_gen_lookup_and_goto_ptr(cpu_pc); +tcg_gen_lookup_and_goto_ptr(); s->is_jmp = DISAS_TB_JUMP; } } @@ -11366,7 +11366,7 @@ void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb) gen_goto_tb(dc, 1, dc->pc); break; case DISAS_JUMP: -tcg_gen_lookup_and_goto_ptr(cpu_pc); +tcg_gen_lookup_and_goto_ptr(); break; case DISAS_TB_JUMP: case DISAS_EXC: diff --git a/target/arm/translate.c b/target/arm/translate.c index e27736c..964b627 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -4152,10 +4152,7 @@ static inline bool use_goto_tb(DisasContext 
*s, target_ulong dest) static void gen_goto_ptr(void) { -TCGv addr = tcg_temp_new(); -tcg_gen_extu_i32_tl(addr, cpu_R[15]); -tcg_gen_lookup_and_goto_ptr(addr); -tcg_temp_free(addr); +tcg_gen_lookup_and_goto_ptr(); } /* This will end the TB but doesn't guarantee we'll return to diff --git a/target/hppa/translate.c b/target/hppa/translate.c index e10abc5..91053e2 100644 --- a/target/hppa/translate.c +++ b/target/hppa/translate.c @@ -517,7 +517,7 @@ static void gen_goto_tb(DisasContext *ctx, int which, if (ctx->singlestep_enabled) { gen_excp_1(EXCP_DEBUG); } else { -tcg_gen_lookup_and_goto_ptr(cpu_iaoq_f); +tcg_gen_lookup_and_goto_ptr(); } } } @@ -1527,7 +1527,7 @@ static ExitStatus do_ibranch(DisasContext *ctx, TCGv dest, if (link != 0) { tcg_gen_movi_tl(cpu_gr[link], ctx->iaoq_n); } -tcg_gen_lookup_and_goto_ptr(cpu_iaoq_f); +tcg_gen_lookup_and_goto_ptr(); return nullify_end(ctx, NO_EXIT); } else { cond_prep(>null_cond); @@ -3885,7 +3885,7 @@ void
[Qemu-devel] [PATCH v3 07/43] tcg/mips: constify tcg_target_callee_save_regs
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Emilio G. Cota
---
 tcg/mips/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 85756b8..56db228 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2323,7 +2323,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     return NULL;
 }

-static int tcg_target_callee_save_regs[] = {
+static const int tcg_target_callee_save_regs[] = {
     TCG_REG_S0, /* used for the global env (TCG_AREG0) */
     TCG_REG_S1,
     TCG_REG_S2,
--
2.7.4
[Qemu-devel] [PATCH v3 32/43] tcg: take .helpers out of TCGContext
Groundwork for supporting multiple TCG contexts.

The hash table becomes read-only after it is filled in,
so we can save space by keeping just a global pointer to it.

Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
 tcg/tcg.h | 2 --
 tcg/tcg.c | 10 +-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 22f7ecd..53c679f 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -664,8 +664,6 @@ struct TCGContext {

     tcg_insn_unit *code_ptr;

-    GHashTable *helpers;
-
 #ifdef CONFIG_PROFILER
     /* profiling info */
     int64_t tb_count1;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 28c1b94..c0c2d6c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -319,6 +319,7 @@ typedef struct TCGHelperInfo {
 static const TCGHelperInfo all_helpers[] = {
 #include "exec/helper-tcg.h"
 };
+static GHashTable *helper_table;

 static int indirect_reg_alloc_order[ARRAY_SIZE(tcg_target_reg_alloc_order)];
 static void process_op_defs(TCGContext *s);
@@ -329,7 +330,6 @@ void tcg_context_init(TCGContext *s)
     TCGOpDef *def;
     TCGArgConstraint *args_ct;
     int *sorted_args;
-    GHashTable *helper_table;

     memset(s, 0, sizeof(*s));
     s->nb_globals = 0;
@@ -357,7 +357,7 @@ void tcg_context_init(TCGContext *s)

     /* Register helpers. */
     /* Use g_direct_hash/equal for direct pointer comparisons on func. */
-    s->helpers = helper_table = g_hash_table_new(NULL, NULL);
+    helper_table = g_hash_table_new(NULL, NULL);

     for (i = 0; i < ARRAY_SIZE(all_helpers); ++i) {
         g_hash_table_insert(helper_table, (gpointer)all_helpers[i].func,
@@ -761,7 +761,7 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
     unsigned sizemask, flags;
     TCGHelperInfo *info;

-    info = g_hash_table_lookup(s->helpers, (gpointer)func);
+    info = g_hash_table_lookup(helper_table, (gpointer)func);
     flags = info->flags;
     sizemask = info->sizemask;

@@ -990,8 +990,8 @@ static char *tcg_get_arg_str_idx(TCGContext *s, char *buf,
 static inline const char *tcg_find_helper(TCGContext *s, uintptr_t val)
 {
     const char *ret = NULL;
-    if (s->helpers) {
-        TCGHelperInfo *info = g_hash_table_lookup(s->helpers, (gpointer)val);
+    if (helper_table) {
+        TCGHelperInfo *info = g_hash_table_lookup(helper_table, (gpointer)val);
         if (info) {
             ret = info->name;
         }
--
2.7.4
[Qemu-devel] [PATCH v3 04/43] translate-all: make have_tb_lock static
It is only used by this object, and it's not exported to any other.

Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 63f8538..a124181 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -139,7 +139,7 @@
 TCGContext tcg_ctx;
 bool parallel_cpus;
 /* translation block context */
-__thread int have_tb_lock;
+static __thread int have_tb_lock;

 static void page_table_config_init(void)
 {
--
2.7.4
[Qemu-devel] [PATCH v3 06/43] tcg/i386: constify tcg_target_callee_save_regs
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Emilio G. Cota
---
 tcg/i386/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 01e3b4e..06df01a 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2514,7 +2514,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
     return NULL;
 }

-static int tcg_target_callee_save_regs[] = {
+static const int tcg_target_callee_save_regs[] = {
 #if TCG_TARGET_REG_BITS == 64
     TCG_REG_RBP,
     TCG_REG_RBX,
--
2.7.4
[Qemu-devel] [PATCH v3 05/43] cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find
Reusing the have_tb_lock name, which is also defined in translate-all.c,
makes code reviewing unnecessarily harder.

Avoid potential confusion by renaming the local have_tb_lock variable
to something else.

Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
 accel/tcg/cpu-exec.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index d84b01d..c4c289b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -337,7 +337,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
-    bool have_tb_lock = false;
+    bool acquired_tb_lock = false;

     /* we record a subset of the CPU state. It will
        always be the same before a given translated block
@@ -356,7 +356,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
          */
         mmap_lock();
         tb_lock();
-        have_tb_lock = true;
+        acquired_tb_lock = true;

         /* There's a chance that our desired tb has been translated while
          * taking the locks so we check again inside the lock.
@@ -384,15 +384,15 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 #endif
     /* See if we can patch the calling TB. */
     if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        if (!have_tb_lock) {
+        if (!acquired_tb_lock) {
             tb_lock();
-            have_tb_lock = true;
+            acquired_tb_lock = true;
         }
         if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
-    if (have_tb_lock) {
+    if (acquired_tb_lock) {
         tb_unlock();
     }
     return tb;
--
2.7.4
[Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts
v2: https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04749.html

v3 applies on top of the current master (d4e59218a). To ease review/testing,
you can pull this series from:
  https://github.com/cota/qemu/tree/multi-tcg-v3

Note: I cannot even compile-test _WIN32 bits, help appreciated! See patches 39/40.

Changes from v2:

- Rebase on top of current master (therefore dropping the first 2 patches,
  which are already on master)
- Add sh4 bits, touching:
  - Removal of argument to tb_lookup_ptr (merged into otherwise same v2 patch)
  - tb_cflags() inline (new patch in v3 for sh4 and all other arches)
  - CF_PARALLEL instead of parallel_cpus (sh4-only patch in v3)
- Add R-b tags
- Drop the patch removing the tb->invalid check.
- Introduce the patch implementing tb_lookup__cpu_state before the patches
  that fiddle with tb->cflags, so that we have a single place where to do
  that fiddling
- Update commit log of the tb_lookup__cpu_state patch explaining why
  tb->invalid must be checked when obtaining the *tb from tb_jmp_cache
- Improve comment next to CF_INVALID
- CF_PARALLEL:
  - Introduce tb_cflags inline to hide the atomic_read
  - Add an extra patch to convert tb->cflags readers to tb_cflags
  - Rename curr_cf_mask() to curr_cflags()
  - Remove many superfluous if (parallel_cpus) checks; just call curr_cflags()
  - Drop tb_cf_mask(); use CF_HASH_MASK instead
  - m68k: use gen_helper_exit_atomic instead of implementing cas2w_parallel
  - s390x: Richard: I dropped your R-b tag because v3 also includes csst.
  - sh4: add sh4 patch, as mentioned above
  - tcg_ctx.cf_parallel: use a bool instead of a u8
  - Do if (foo && (tb_cflags(tb) & BAR)) instead of (foo && tb_cflags() & BAR)
- Use a size_t for struct tb_tc.size, plugging the 4-byte hole
- Dynamically allocate TCG optimizer globals
- Use directly a bitmap, instead of TCGTempSet for temps_used, which saves
  some space
- Add perf numbers for the change: ~2% slowdown
- **tcg_ctxs: get rid of tcg_ctxs_init
- TCGProfile: s/PROF_ADD_MAX/PROF_MAX/
- real_host_page_size: move to its own file with an init constructor, as
  suggested by Richard (Richard: I kept your R-b tag).
- qemu_mprotect helpers: g_assert on page-aligned address and size
- Adapt callers in translate-all.c to pass page-aligned address and size
- TCG regions:
  - Hide the computation of n_regions from tcg_region_init's callers. The
    function now takes no arguments. Add a comment about
    qemu_tcg_mttcg_enabled().
  - if (!inited) { inited = true; do_init(); } in cpus.c
  - Use assert instead of if (err) tcg_abort();
  - Use QEMU_ALIGN_DOWN instead of &= mask
  - Inline set_guard_pages() into tcg_region_init
  - Merge patch that removes code_gen_buffer's guard page into the
    TCG regions' patch
- TCG __thread:
  - Inline tcg_ctxs_init into tcg_context_init
  - Move the code that determines the number of regions from the previous
    patch to this patch.

To be done after this series:

- Get rid of tb_lock, or at least push it down so that we take advantage of
  multiple TCG contexts in MTTCG. (I'm doing this in my testing, but doing it
  well will require another patch series.)

Improvements that were suggested during this series' development:

- Order tb->[*] comparisons by likelihood of mismatch.
- Get rid of parallel_cpus from cpu_exec_step_atomic -- I'm not sure
  whether just removing it is safe, since we call curr_cflags from several
  places.
- Perhaps parse -accel=tcg command-line arguments before TCG is initialized,
  so that those arguments can be used during TCG initialization.

Thanks, Emilio
[Qemu-devel] [Bug 1703506] Re: SMT not supported by QEMU on AMD Ryzen CPU
Attached Ubuntu 17.04 guest logs. I wasn't able to run x86info as root. Only as regular user. Error shown: readEntry: Operation not permitted error reading 1KB from 0x3fffc00 There are a few bug reports about it but no workarounds. Seems to happen on vm's. So the output is missing a few sections. >Also, can somebody confirm if the same Windows version works as expected on bare metal? Yes, same Windows version on bare metal works as expected. In my case showing 8 cores and 16 threads/logical processors. I'm trying to use 4 cores 8 threads in the VMs. Both Windows and Ubuntu are showing 8 physical cores. ** Attachment added: "ubuntu-guest-smt-ryzen.zip" https://bugs.launchpad.net/qemu/+bug/1703506/+attachment/4917874/+files/ubuntu-guest-smt-ryzen.zip -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1703506 Title: SMT not supported by QEMU on AMD Ryzen CPU Status in QEMU: New Bug description: HyperThreading/SMT is supported by AMD Ryzen CPUs but results in this message when setting the topology to threads=2: qemu-system-x86_64: AMD CPU doesn't support hyperthreading. Please configure -smp options properly. Checking in a Windows 10 guest reveals that SMT is not enabled, and from what I understand, QEMU converts the topology from threads to cores internally on AMD CPUs. This appears to cause performance problems in the guest perhaps because programs are assuming that these threads are actual cores. Software: Linux 4.12, qemu 2.9.0 host with KVM enabled, Windows 10 pro guest To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1703506/+subscriptions
Re: [Qemu-devel] [PATCH v2 3/3] qemu.py: make 'args' public
On Thu, 07/20 10:38, Fam Zheng wrote: > On Wed, 07/19 18:31, Amador Pahim wrote: > > Let's make args public so users can extend it without felling like > > abusing the internal API. > > s/felling/feeling/ ? Apart from that: Reviewed-by: Fam Zheng
[Qemu-devel] Reply: Re: [PATCH] vhost: fix a migration failed because of vhost region merge
Original message
From:
To:
Cc: 彭浩10096742 王业超10154425
Date: 2017-07-19 23:53
Subject: Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost region merge

On Wed, Jul 19, 2017 at 03:24:27PM +0200, Igor Mammedov wrote:
> On Wed, 19 Jul 2017 12:46:13 +0100
> "Dr. David Alan Gilbert" wrote:
>
> > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > On Wed, 19 Jul 2017 23:17:32 +0800
> > > Peng Hao wrote:
> > >
> > > > When a guest that has several hotplugged dimms is migrated, in
> > > > destination host it will fail to resume. Because vhost regions of
> > > > several dimms in source host are merged and in the restore stage
> > > > in destination host it computes whether more than vhost slot limit
> > > > before merging vhost regions of several dimms.
> > > could you provide a bit more detailed description of the problem
> > > including command line+used device_add commands on source and
> > > command line on destination?
> >
> > (ccing in Marc Andre and Maxime)
> >
> > Hmm, I'd like to understand the situation where you get merging between
> > RAMBlocks that complicates some stuff for postcopy.
> and probably inconsistent merging breaks vhost as well
>
> merging might happen if regions are adjacent or overlap
> but for that to happen merged regions must have equal
> distance between their GPA:HVA pairs, so that following
> translation would work:
>
> if gva in regionX[gva_start, len, hva_start]
>    hva = hva_start + gva - gva_start
>
> while GVA of regions is under QEMU control and deterministic
> HVA is not, so in migration case merging might happen on source
> side but not on destination, resulting in different memory maps.
>
> Maybe Michael might know details why migration works in vhost usecase,
> but I don't see vhost sending any vmstate data.

We aren't merging ramblocks at all.
When we are passing blocks A and B to vhost, if we see that

hvaB=hvaA + lenA
gpaB=gpaA + lenA

then we can improve performance a bit by passing a single
chunk to vhost: hvaA,gpaA,lena+lenB

so it does not affect migration normally.

-
I think it is like this:

in source              in destination (restore)
realize device 1       realize device 1
realize device 2       realize dimm0
...                    realize dimm1
realize device n       realize dimmx
                       realize device m
realize dimm0          .
realize dimm1          .
..                     .
realize dimmx          realize device n

In the restore stage, the order in which devices are realized differs from
the order at VM start, because of the added dimms. So at some stage during
restore the vhost regions cannot be merged.

> > > >
> > >
> > > > Signed-off-by: Peng Hao
> > > > Signed-off-by: Wang Yechao
> > > > ---
> > > > hw/mem/pc-dimm.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > > > index ea67b46..bb0fa08 100644
> > > > --- a/hw/mem/pc-dimm.c
> > > > +++ b/hw/mem/pc-dimm.c
> > > > @@ -101,7 +101,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
> > > >          goto out;
> > > >      }
> > > >
> > > > -    if (!vhost_has_free_slot()) {
> > > > +    if (!vhost_has_free_slot() && runstate_is_running()) {
> > > >          error_setg(&local_err, "a used vhost backend has no free"
> > > >                                 " memory slots left");
> > > >          goto out;
> >
> > Even this produces the wrong error message in this case,
> > it also makes me think if the existing code should undo a lot of
> > the object_property_set's that happen.
> >
> > Dave
> >
> > > >
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
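
The merge condition and the address translation discussed in this thread can be illustrated with a small Python sketch. It only models the arithmetic and is not vhost or QEMU code: `Region`, `can_merge`, `merge` and `translate` are hypothetical names, the addresses are made up, it uses "gpa" where Igor's pseudocode says "gva", and it ignores the overlapping-region case.

```python
from collections import namedtuple

# Hypothetical stand-in for one memory region handed to vhost:
# guest-physical start, host-virtual start, and length in bytes.
Region = namedtuple("Region", ["gpa", "hva", "size"])


def can_merge(a, b):
    """True if b directly follows a in both guest and host address space."""
    return b.gpa == a.gpa + a.size and b.hva == a.hva + a.size


def merge(a, b):
    """Return the single chunk covering a and b; only valid if can_merge()."""
    if not can_merge(a, b):
        raise ValueError("regions are not contiguous in both GPA and HVA")
    return Region(a.gpa, a.hva, a.size + b.size)


def translate(region, gpa):
    """Map a guest-physical address inside region to a host-virtual one."""
    if not region.gpa <= gpa < region.gpa + region.size:
        raise ValueError("address outside region")
    return region.hva + (gpa - region.gpa)


# "Source host": both dimms happen to be mapped back to back, so they merge
# into one chunk and use a single slot.
dimm0 = Region(gpa=0x100000000, hva=0x7f0000000000, size=0x40000000)
dimm1 = Region(gpa=0x140000000, hva=0x7f0040000000, size=0x40000000)
merged = merge(dimm0, dimm1)

# "Destination host": same GPA, but dimm1 landed at an unrelated HVA, so the
# regions cannot be merged and two slots are needed.
dimm1_dst = Region(gpa=0x140000000, hva=0x7f8000000000, size=0x40000000)
```

Because the host virtual addresses are not deterministic, `can_merge` can give different answers on the two hosts for the same guest layout, which is the asymmetry Igor describes.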
Re: [Qemu-devel] [PATCH v2 2/3] qemu.py: include debug information on launch error
On Wed, 07/19 18:31, Amador Pahim wrote: > When launching a VM, if an exception happens and the VM is not > initiated, it is useful to see the qemu command line that was executed > and the output of that command. > > Before the patch: > > >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64') > >>> VM.launch() > Traceback (most recent call last): > File "", line 1, in > File "qemu.py", line 137, in launch > self._post_launch() > File "qemu.py", line 121, in _post_launch > self._qmp.accept() > File "qmp/qmp.py", line 145, in accept > self.__sock, _ = self.__sock.accept() > File "/usr/lib64/python2.7/socket.py", line 206, in accept > sock, addr = self._sock.accept() > socket.timeout: timed out > > After the patch: > > >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64') > >>> VM.launch() > Traceback (most recent call last): > File "", line 1, in > File "qemu.py", line 156, in launch > raise RuntimeError(msg) > RuntimeError: Error launching VM. > Original Exception: > Traceback (most recent call last): > File "qemu.py", line 138, in launch > self._post_launch() > File "qemu.py", line 122, in _post_launch > self._qmp.accept() > File "qmp/qmp.py", line 145, in accept > self.__sock, _ = self.__sock.accept() > File "/usr/lib64/python2.7/socket.py", line 206, in accept > sock, addr = self._sock.accept() > timeout: timed out > Command: > /usr/bin/qemu-system-aarch64 -chardev socket,id=mon, > path=/var/tmp/qemu-23958-monitor.sock -mon chardev=mon,mode=control > -display none -vga none > Output: > qemu-system-aarch64: No machine specified, and there is no default > Use -machine help to list supported machines > > Also, if the launch() faces an exception, the 'except' now will use args > to fill the debug information. So this patch assigns 'args' earlier, > assuring it will be available for the 'except'. 
> > Signed-off-by: Amador Pahim> --- > scripts/qemu.py | 18 -- > 1 file changed, 16 insertions(+), 2 deletions(-) > > diff --git a/scripts/qemu.py b/scripts/qemu.py > index f0fade32bd..2707ae7f75 100644 > --- a/scripts/qemu.py > +++ b/scripts/qemu.py > @@ -18,6 +18,7 @@ import os > import sys > import subprocess > import qmp.qmp > +import traceback > > > class QEMUMachine(object): > @@ -129,17 +130,30 @@ class QEMUMachine(object): > '''Launch the VM and establish a QMP connection''' > devnull = open('/dev/null', 'rb') > qemulog = open(self._qemu_log_path, 'wb') > +args = self._wrapper + [self._binary] + self._base_args() + self.args > try: > self._pre_launch() > -args = self._wrapper + [self._binary] + self._base_args() + > self._args > self._popen = subprocess.Popen(args, stdin=devnull, > stdout=qemulog, > stderr=subprocess.STDOUT, > shell=False) > self._post_launch() > except: > +self._load_io_log() > if self.is_running(): > self._popen.kill() > self._popen.wait() > -self._load_io_log() > +else: > +exc_type, exc_value, exc_traceback = sys.exc_info() > +msg = ('Error launching VM.\n' > + 'Original Exception: \n%s' > + 'Command:\n%s\n' > + 'Output:\n%s\n' % > + (''.join(traceback.format_exception(exc_type, > + exc_value, > + exc_traceback)), > +' '.join(args), > +self._iolog)) > +self._post_shutdown() > +raise RuntimeError(msg) > self._post_shutdown() > raise > > -- > 2.13.3 > > Reviewed-by: Fam Zheng
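
The error-reporting pattern in this patch can be tried outside `scripts/qemu.py` with a sketch like the one below. It is an assumption-laden illustration, not the patched code: `launch` and its `post_launch` callback are hypothetical names, it is written for Python 3 (the tracebacks in the patch come from Python 2.7), and it captures the child's output through a pipe rather than through a log file.

```python
import subprocess
import traceback


def launch(args, post_launch):
    """Start args, then run post_launch(popen).

    If post_launch raises, re-raise as RuntimeError carrying the original
    traceback, the command line and whatever the child printed.
    """
    popen = subprocess.Popen(args, stdin=subprocess.DEVNULL,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)
    try:
        post_launch(popen)
    except Exception:
        original = traceback.format_exc()
        if popen.poll() is None:
            # Child is still alive: stop it before collecting its output.
            popen.kill()
        output, _ = popen.communicate()
        raise RuntimeError('Error launching VM.\n'
                           'Original Exception:\n%s'
                           'Command:\n%s\n'
                           'Output:\n%s\n' %
                           (original, ' '.join(args),
                            output.decode(errors='replace')))
    return popen
```

As in the patch, the argument list is built before the `try` block so that it is available when the message is formatted.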
[Qemu-devel] Can I mount encrypt qcow2?
Can I mount an encrypted qcow2 file through qemu-nbd? I tried but it failed, and the man page says nothing about it.
Re: [Qemu-devel] [PATCH v2 1/3] qemu.py: fix is_running()
On Wed, 07/19 18:31, Amador Pahim wrote: > Current implementation is broken. It does not really test if the child > process is running. > > The Popen.returncode will only be set after by a poll(), wait() or > communicate(). If the Popen fails to launch a VM, the Popen.returncode > will not turn to None by itself. > > Instead of using Popen.returncode, let's use Popen.poll(), which > actually checks if child process has terminated. > > Signed-off-by: Amador Pahim> --- > scripts/qemu.py | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/scripts/qemu.py b/scripts/qemu.py > index 880e3e8219..f0fade32bd 100644 > --- a/scripts/qemu.py > +++ b/scripts/qemu.py > @@ -86,7 +86,7 @@ class QEMUMachine(object): > raise > > def is_running(self): > -return self._popen and (self._popen.returncode is None) > +return self._popen and (self._popen.poll() is None) > > def exitcode(self): > if self._popen is None: > -- > 2.13.3 > > Reviewed-by: Fam Zheng
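
The difference between the two checks can be reproduced with a short standalone script. This is a sketch under stated assumptions: it targets Python 3, `is_running_old` and `is_running_new` are hypothetical free functions mirroring the method before and after the patch, and a trivial Python child stands in for a QEMU process that exits right after being started.

```python
import subprocess
import sys
import time


def is_running_old(popen):
    """Pre-patch check: only looks at the cached returncode attribute."""
    return popen is not None and popen.returncode is None


def is_running_new(popen):
    """Patched check: poll() asks the OS and refreshes returncode."""
    return popen is not None and popen.poll() is None


# A child that exits immediately.
child = subprocess.Popen([sys.executable, "-c", "pass"],
                         stdout=subprocess.PIPE)
child.stdout.read()   # returns at EOF, once the child has closed its stdout
child.stdout.close()

# Nothing has called poll(), wait() or communicate() yet, so returncode is
# still None and the old check keeps reporting a running process.
stale = is_running_old(child)

# The new check notices the exit; loop briefly in case the exit status is
# not yet available at the instant stdout reached EOF.
deadline = time.time() + 10
while is_running_new(child) and time.time() < deadline:
    time.sleep(0.01)
fresh = is_running_new(child)

print("old check says running:", stale)
print("new check says running:", fresh)
```

`Popen.returncode` starts out as `None` and only changes when one of `poll()`, `wait()` or `communicate()` observes the exit, which is why the attribute alone cannot tell a live child from a dead one.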
Re: [Qemu-devel] [PATCH v2 3/3] qemu.py: make 'args' public
On Wed, 07/19 18:31, Amador Pahim wrote: > Let's make args public so users can extend it without felling like > abusing the internal API. s/felling/feeling/ ? Fam > > Signed-off-by: Amador Pahim> --- > scripts/qemu.py | 13 +++-- > tests/qemu-iotests/iotests.py | 18 +- > 2 files changed, 16 insertions(+), 15 deletions(-) > > diff --git a/scripts/qemu.py b/scripts/qemu.py > index 2707ae7f75..2c2043f89a 100644 > --- a/scripts/qemu.py > +++ b/scripts/qemu.py > @@ -34,7 +34,7 @@ class QEMUMachine(object): > self._qemu_log_path = os.path.join(test_dir, name + ".log") > self._popen = None > self._binary = binary > -self._args = list(args) # Force copy args in case we modify them > +self.args = list(args) # Force copy args in case we modify them > self._wrapper = wrapper > self._events = [] > self._iolog = None > @@ -44,8 +44,8 @@ class QEMUMachine(object): > # This can be used to add an unused monitor instance. > def add_monitor_telnet(self, ip, port): > args = 'tcp:%s:%d,server,nowait,telnet' % (ip, port) > -self._args.append('-monitor') > -self._args.append(args) > +self.args.append('-monitor') > +self.args.append(args) > > def add_fd(self, fd, fdset, opaque, opts=''): > '''Pass a file descriptor to the VM''' > @@ -55,8 +55,8 @@ class QEMUMachine(object): > if opts: > options.append(opts) > > -self._args.append('-add-fd') > -self._args.append(','.join(options)) > +self.args.append('-add-fd') > +self.args.append(','.join(options)) > return self > > def send_fd_scm(self, fd_file_path): > @@ -168,7 +168,8 @@ class QEMUMachine(object): > > exitcode = self._popen.wait() > if exitcode < 0: > -sys.stderr.write('qemu received signal %i: %s\n' % > (-exitcode, ' '.join(self._args))) > +sys.stderr.write('qemu received signal %i: %s\n' % > + (-exitcode, ' '.join(self.args))) > self._load_io_log() > self._post_shutdown() > > diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py > index abcf3c10e2..6925d8841e 100644 > --- a/tests/qemu-iotests/iotests.py > +++ 
b/tests/qemu-iotests/iotests.py > @@ -150,13 +150,13 @@ class VM(qtest.QEMUQtestMachine): > self._num_drives = 0 > > def add_device(self, opts): > -self._args.append('-device') > -self._args.append(opts) > +self.args.append('-device') > +self.args.append(opts) > return self > > def add_drive_raw(self, opts): > -self._args.append('-drive') > -self._args.append(opts) > +self.args.append('-drive') > +self.args.append(opts) > return self > > def add_drive(self, path, opts='', interface='virtio', format=imgfmt): > @@ -172,17 +172,17 @@ class VM(qtest.QEMUQtestMachine): > if opts: > options.append(opts) > > -self._args.append('-drive') > -self._args.append(','.join(options)) > +self.args.append('-drive') > +self.args.append(','.join(options)) > self._num_drives += 1 > return self > > def add_blockdev(self, opts): > -self._args.append('-blockdev') > +self.args.append('-blockdev') > if isinstance(opts, str): > -self._args.append(opts) > +self.args.append(opts) > else: > -self._args.append(','.join(opts)) > +self.args.append(','.join(opts)) > return self > > def pause_drive(self, drive, event=None): > -- > 2.13.3 > >
[Qemu-devel] Why "trace event does not exist"?
Hi all,

I want to add a new trace event and log it, so I added this line to $QEMU/trace-events:

    io_mem_init(void) ""

After configure, $QEMU/build/trace-events-all also contains it. Then I added this call to $QEMU/exec.c:

    trace_io_mem_init();

Then I ran `make` and `make install`. But when I run

    qemu-system-x86_64 -D /qemu.log -trace events=/qemu-events ...

it warns me:

    qemu-system-x86_64:/qemu-events:1: WARNING: trace event 'io_mem_init' does not exist

and there is no io_mem_init log output. Why, and what should I do to add this log?

Thank you~
Re: [Qemu-devel] [PATCH v6] qga: Add support network interface statistics in guest-network-get-interfaces command
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1500512858-29428-1-git-send-email-lu.zhip...@zte.com.cn
Subject: [Qemu-devel] [PATCH v6] qga: Add support network interface statistics in guest-network-get-interfaces command
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/1500512858-29428-1-git-send-email-lu.zhip...@zte.com.cn -> patchew/1500512858-29428-1-git-send-email-lu.zhip...@zte.com.cn
Switched to a new branch 'test'
b2303df qga: Add support network interface statistics in guest-network-get-interfaces command

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-jfmm8_pa/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d' BUILD centos6 make[1]: Entering directory '/var/tmp/patchew-tester-tmp-jfmm8_pa/src' ARCHIVE qemu.tgz ARCHIVE dtc.tgz COPYRUNNER RUN test-quick in qemu:centos6 Packages installed: SDL-devel-1.2.14-7.el6_7.1.x86_64 bison-2.4.1-5.el6.x86_64 ccache-3.1.6-2.el6.x86_64 epel-release-6-8.noarch flex-2.5.35-9.el6.x86_64 gcc-4.4.7-18.el6.x86_64 git-1.7.1-8.el6.x86_64 glib2-devel-2.28.8-9.el6.x86_64 libfdt-devel-1.4.0-1.el6.x86_64 make-3.81-23.el6.x86_64 package g++ is not installed pixman-devel-0.32.8-1.el6.x86_64 tar-1.23-15.el6_8.x86_64 zlib-devel-1.2.3-29.el6.x86_64 Environment variables: PACKAGES=libfdt-devel ccache tar git make gcc g++ flex bison zlib-devel glib2-devel SDL-devel pixman-devel epel-release HOSTNAME=90eccbe3ed9b TERM=xterm MAKEFLAGS= -j8 HISTSIZE=1000 J=8 USER=root CCACHE_DIR=/var/tmp/ccache EXTRA_CONFIGURE_OPTS= V= SHOW_ENV=1 MAIL=/var/spool/mail/root PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PWD=/ LANG=en_US.UTF-8 TARGET_LIST= HISTCONTROL=ignoredups SHLVL=1 HOME=/root TEST_DIR=/tmp/qemu-test LOGNAME=root LESSOPEN=||/usr/bin/lesspipe.sh %s FEATURES= dtc DEBUG= G_BROKEN_FILENAMES=1 CCACHE_HASHDIR= _=/usr/bin/env Configure options: --enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install No C++ compiler available; disabling C++ specific optional code Install prefix/var/tmp/qemu-build/install BIOS directory/var/tmp/qemu-build/install/share/qemu binary directory /var/tmp/qemu-build/install/bin library directory /var/tmp/qemu-build/install/lib module directory /var/tmp/qemu-build/install/lib/qemu libexec directory /var/tmp/qemu-build/install/libexec include directory /var/tmp/qemu-build/install/include config directory /var/tmp/qemu-build/install/etc local state directory /var/tmp/qemu-build/install/var Manual directory /var/tmp/qemu-build/install/share/man ELF interp prefix 
/usr/gnemul/qemu-%M Source path /tmp/qemu-test/src C compilercc Host C compiler cc C++ compiler Objective-C compiler cc ARFLAGS rv CFLAGS-O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g QEMU_CFLAGS -I/usr/include/pixman-1 -I$(SRC_PATH)/dtc/libfdt -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wendif-labels -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-all LDFLAGS -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g make make install install pythonpython -B smbd /usr/sbin/smbd module supportno host CPU x86_64 host big endian no target list x86_64-softmmu aarch64-softmmu gprof enabled no sparse enabledno strip binariesyes profiler no static build no pixmansystem SDL support yes (1.2.14) GTK support no GTK GL supportno VTE support no TLS priority NORMAL GNUTLS supportno GNUTLS rndno libgcrypt no libgcrypt kdf no nettleno nettle kdfno libtasn1 no curses supportno virgl support no curl support no mingw32 support no Audio drivers oss Block whitelist (rw) Block whitelist (ro) VirtFS supportno VNC support yes VNC SASL support no VNC JPEG support no VNC PNG support no xen support
[Qemu-devel] [PATCH v6] qga: Add support network interface statistics in guest-network-get-interfaces command
We can get the network interface statistics inside a virtual machine with the
guest-network-get-interfaces command. This is very useful for monitoring and
analyzing network traffic.

Signed-off-by: ZhiPeng Lu

v1->v2:
 - correct some spelling mistakes and add the stats data to the
   guest-network-get-interfaces command instead of adding a new command.
v2->v3:
 - optimize function implementation
v3->v4:
 - modify compile error
v4->v5:
 - rename some temporary variables and add str_trim_off function for
   calculating the space num in front of the string in guest_get_network_stats
v5->v6:
 - use g_strchug instead of str_trim_off implemented by myself
---
 qga/commands-posix.c | 72 +++-
 qga/qapi-schema.json | 38 ++-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index d8e4122..b65dd8e 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1639,6 +1639,65 @@ guest_find_interface(GuestNetworkInterfaceList *head,
     return head;
 }
 
+static int guest_get_network_stats(const char *name,
+                                   GuestNetworkInterfaceStat *stats)
+{
+    int name_len;
+    char const *devinfo = "/proc/net/dev";
+    FILE *fp;
+    char *line = NULL, *colon;
+    size_t n;
+    fp = fopen(devinfo, "r");
+    if (!fp) {
+        return -1;
+    }
+    name_len = strlen(name);
+    while (getline(&line, &n, fp) != -1) {
+        long long dummy;
+        long long rx_bytes;
+        long long rx_packets;
+        long long rx_errs;
+        long long rx_dropped;
+        long long tx_bytes;
+        long long tx_packets;
+        long long tx_errs;
+        long long tx_dropped;
+        char *trim_line;
+        trim_line = g_strchug(line);
+        if (trim_line[0] == '\0') {
+            continue;
+        }
+        colon = strchr(trim_line, ':');
+        if (!colon) {
+            continue;
+        }
+        if (colon - name_len == trim_line &&
+            strncmp(trim_line, name, name_len) == 0) {
+            if (sscanf(colon + 1,
+                "%lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld",
+                &rx_bytes, &rx_packets, &rx_errs, &rx_dropped,
+                &dummy, &dummy, &dummy, &dummy,
+                &tx_bytes, &tx_packets, &tx_errs, &tx_dropped,
+                &dummy, &dummy, &dummy, &dummy) != 16) {
+                continue;
+            }
+            stats->rx_bytes = rx_bytes;
+            stats->rx_packets = rx_packets;
+            stats->rx_errs = rx_errs;
+            stats->rx_dropped = rx_dropped;
+            stats->tx_bytes = tx_bytes;
+            stats->tx_packets = tx_packets;
+            stats->tx_errs = tx_errs;
+            stats->tx_dropped = tx_dropped;
+            fclose(fp);
+            return 0;
+        }
+    }
+    fclose(fp);
+    g_debug("/proc/net/dev: Interface not found");
+    return -1;
+}
+
 /*
  * Build information about guest interfaces
  */
@@ -1655,6 +1714,7 @@ GuestNetworkInterfaceList *qmp_guest_network_get_interfaces(Error **errp)
     for (ifa = ifap; ifa; ifa = ifa->ifa_next) {
         GuestNetworkInterfaceList *info;
         GuestIpAddressList **address_list = NULL, *address_item = NULL;
+        GuestNetworkInterfaceStat *interface_stat = NULL;
         char addr4[INET_ADDRSTRLEN];
         char addr6[INET6_ADDRSTRLEN];
         int sock;
@@ -1774,7 +1834,17 @@ GuestNetworkInterfaceList *qmp_guest_network_get_interfaces(Error **errp)
         info->value->has_ip_addresses = true;
-
+        if (!info->value->has_statistics) {
+            interface_stat = g_malloc0(sizeof(*interface_stat));
+            if (guest_get_network_stats(info->value->name,
+                interface_stat) == -1) {
+                info->value->has_statistics = false;
+                g_free(interface_stat);
+            } else {
+                info->value->statistics = interface_stat;
+                info->value->has_statistics = true;
+            }
+        }
     }
 
     freeifaddrs(ifap);
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 03743ab..4ad5c57 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -643,6 +643,38 @@
            'prefix': 'int'} }
 
 ##
+# @GuestNetworkInterfaceStat:
+#
+# @rx-bytes: total bytes received
+#
+# @rx-packets: total packets received
+#
+# @rx-errs: bad packets received
+#
+# @rx-dropped: receiver dropped packets
+#
+# @tx-bytes: total bytes transmitted
+#
+# @tx-packets: total packets transmitted
+#
+# @tx-errs: packet transmit problems
+#
+# @tx-dropped: dropped packets transmitted
+#
+# Since: 2.11
+##
+{ 'struct': 'GuestNetworkInterfaceStat',
+  'data': {'rx-bytes': 'uint64',
+           'rx-packets': 'uint64',
+           'rx-errs': 'uint64',
+           'rx-dropped': 'uint64',
+           'tx-bytes': 'uint64',
+           'tx-packets': 'uint64',
+           'tx-errs': 'uint64',
+           'tx-dropped': 'uint64'
+           } }
+
+##
 #
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices
Hi Marcel,

> You can use multi-function PCIe Root Ports, this will give you 8 ports
> per slot, if you have 16 empty slots (I think we have more) you reach
> 128 root ports. Then you can use multi-function virtio-net-pci devices,
> this will give you 8 functions per port, so you reach the target of
> 1024 devices.
>
> You lose hot-plug granularity since you can hot-plug 8-functions group,
> but maybe is OK, depending on your scenario.

Thanks for the advice. Losing the hotplug granularity is something I think
I can live with. It would mean I would have to track how many ports are
allocated to a VM, and create 8 new ports when 1 is required, caching the
other 7 for when they are needed.

> Even so, you can use one cold-plugged pxb-pcie if you don't have enough
> empty slots on pcie.0, in order to reach the maximum number of PCIe Root
> Ports (256) which is the maximum for a single PCI domain.

I took your advice; see the attached cfg. It works exactly as you indicated.
If you are interested, you can use it from your VM by adding -readconfig to
your qemu command line. I can currently only manage to start a VM with around
50 coldplugged virtio devices before something breaks. I am not sure what yet;
I will try scaling it with hotplugging tomorrow.

> If you need granularity per single device (1000+ hot-pluggable), you could
> enhance the pxb-pcie to support multiple pci domains.

Do you think there would be much work in this?

Thanks,
Ray K

test.cfg.gz
Description: application/gzip
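For what it's worth, the capacity arithmetic and the "create 8, cache 7" bookkeeping discussed in this message can be sketched as below. This is only an illustration: `PortPool`, `allocate` and the constants are hypothetical names, nothing here talks to QEMU, and the 16 free slots are the figure assumed in the thread.

```python
# Hypothetical model of the layout discussed in the thread; not QEMU code.
FUNCTIONS_PER_SLOT = 8      # a multi-function PCI slot exposes functions 0..7
EMPTY_SLOTS_ON_PCIE0 = 16   # assumption taken from the message

# 16 slots x 8 functions = 128 root ports; 128 ports x 8 functions = 1024 NICs.
root_ports = EMPTY_SLOTS_ON_PCIE0 * FUNCTIONS_PER_SLOT
devices = root_ports * FUNCTIONS_PER_SLOT


class PortPool:
    """Hands out single functions, adding a whole 8-function group on demand."""

    def __init__(self):
        self.next_port = 0
        self.spare = []  # cached (port, function) pairs not yet handed out

    def allocate(self):
        if not self.spare:
            # One hot-plug step creates all 8 functions behind a new port;
            # one is returned now and the other 7 stay cached.
            port = self.next_port
            self.next_port += 1
            self.spare = [(port, fn) for fn in range(FUNCTIONS_PER_SLOT)]
        return self.spare.pop(0)
```

A caller would ask the pool for one device at a time; a new group is only created when the cache of spare functions is empty.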
[Qemu-devel] [PULL v2 13/14] tcg/tci: enable bswap16_i64
From: Philippe Mathieu-Daudé

Although correctly implemented, bswap16_i64() never got tested/executed, so the
safety TODO() statement was never removed.

Since it has now been tested, the TODO() can be removed.

While running Alex Bennée's image aarch64-linux-3.15rc2-buildroot.img:

Trace 0x7fa1904b0890 [0: ffc00036cd04]
IN:
0xffc00036cd24: 5ac00694 rev16 w20, w20

OP:
 ffc00036cd24
 ext32u_i64 tmp3,x20
 ext16u_i64 tmp2,tmp3
 bswap16_i64 x20,tmp2
 movi_i64 tmp4,$0x10
 shr_i64 tmp2,tmp3,tmp4
 ext16u_i64 tmp2,tmp2
 bswap16_i64 tmp2,tmp2
 deposit_i64 x20,x20,tmp2,$0x10,$0x10

Linking TBs 0x7fa1904b0890 [ffc00036cd04] index 0 -> 0x7fa1904b0aa0 [ffc00036cd24]
Trace 0x7fa1904b0aa0 [0: ffc00036cd24]
TODO qemu/tci.c:1049: tcg_qemu_tb_exec()
qemu/tci.c:1049: tcg fatal error
Aborted

Signed-off-by: Philippe Mathieu-Daudé
Signed-off-by: Jaroslaw Pelczar
Reviewed-by: Alex Bennée
Reviewed-by: Eric Blake
Reviewed-by: Stefan Weil
Message-Id: <20170718045540.16322-11-f4...@amsat.org>
Signed-off-by: Richard Henderson
---
 tcg/tci.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 4bdc645f2a..f39bfb95c0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -1046,7 +1046,6 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
             break;
 #if TCG_TARGET_HAS_bswap16_i64
         case INDEX_op_bswap16_i64:
-            TODO();
             t0 = *tb_ptr++;
             t1 = tci_read_r16(&tb_ptr);
             tci_write_reg64(t0, bswap16(t1));
-- 
2.13.3
[Qemu-devel] [PULL v2 11/14] target/sparc: optimize gen_op_mulscc() using deposit op
From: Philippe Mathieu-Daudé

Suggested-by: Richard Henderson
Signed-off-by: Philippe Mathieu-Daudé
Message-Id: <20170718045540.16322-9-f4...@amsat.org>
Signed-off-by: Richard Henderson
---
 target/sparc/translate.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 67a83b77cc..d13173275f 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -632,11 +632,8 @@ static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
     // b2 = T0 & 1;
     // env->y = (b2 << 31) | (env->y >> 1);
-    tcg_gen_andi_tl(r_temp, cpu_cc_src, 0x1);
-    tcg_gen_shli_tl(r_temp, r_temp, 31);
     tcg_gen_extract_tl(t0, cpu_y, 1, 31);
-    tcg_gen_or_tl(t0, t0, r_temp);
-    tcg_gen_andi_tl(cpu_y, t0, 0xffffffff);
+    tcg_gen_deposit_tl(cpu_y, t0, cpu_cc_src, 31, 1);

     // b1 = N ^ V;
     gen_mov_reg_N(t0, cpu_psr);
-- 
2.13.3
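As a sanity check of the idea, here is a small Python model of the two sequences. It is only a sketch: `extract` and `deposit` below are plain-integer stand-ins for the TCG ops (not QEMU code), and 32-bit values for `y` and `cc_src` are assumed.

```python
MASK32 = 0xFFFFFFFF


def extract(value, ofs, length):
    """Model of an unsigned bit-field extract: `length` bits starting at `ofs`."""
    return (value >> ofs) & ((1 << length) - 1)


def deposit(base, field, ofs, length):
    """Model of deposit: overwrite `length` bits of `base` at `ofs` with `field`."""
    mask = ((1 << length) - 1) << ofs
    return (base & ~mask) | ((field << ofs) & mask)


def new_y_before(y, cc_src):
    # and 1, shift left 31, or, then mask to 32 bits (the removed sequence)
    r_temp = (cc_src & 0x1) << 31
    t0 = extract(y, 1, 31)
    return (t0 | r_temp) & MASK32


def new_y_after(y, cc_src):
    # a single deposit of bit 0 of cc_src into bit 31 (the added sequence)
    t0 = extract(y, 1, 31)
    return deposit(t0, cc_src, 31, 1)
```

Both functions compute `(b2 << 31) | (y >> 1)` with `b2 = cc_src & 1`, which is what the comment in the hunk describes.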
[Qemu-devel] [PULL v2 14/14] tcg: Pass generic CPUState to gen_intermediate_code()
From: Lluís VilanovaNeeded to implement a target-agnostic gen_intermediate_code() in the future. Reviewed-by: David Gibson Reviewed-by: Richard Henderson Reviewed-by: Alex Benneé Reviewed-by: Emilio G. Cota Signed-off-by: Lluís Vilanova Message-Id: <150002025498.22386.18051908483085660588.st...@frigg.lan> Signed-off-by: Richard Henderson --- include/exec/exec-all.h | 2 +- target/arm/translate.h| 4 ++-- accel/tcg/translate-all.c | 2 +- target/alpha/translate.c | 5 ++--- target/arm/translate-a64.c| 6 +++--- target/arm/translate.c| 6 +++--- target/cris/translate.c | 7 +++ target/hppa/translate.c | 5 ++--- target/i386/translate.c | 5 ++--- target/lm32/translate.c | 4 ++-- target/m68k/translate.c | 5 ++--- target/microblaze/translate.c | 4 ++-- target/mips/translate.c | 5 ++--- target/moxie/translate.c | 4 ++-- target/nios2/translate.c | 5 ++--- target/openrisc/translate.c | 4 ++-- target/ppc/translate.c| 5 ++--- target/s390x/translate.c | 5 ++--- target/sh4/translate.c| 5 ++--- target/sparc/translate.c | 5 ++--- target/tilegx/translate.c | 5 ++--- target/tricore/translate.c| 5 ++--- target/unicore32/translate.c | 5 ++--- target/xtensa/translate.c | 5 ++--- 24 files changed, 49 insertions(+), 64 deletions(-) diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 87b1b74e3b..440fc31b37 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -66,7 +66,7 @@ typedef ram_addr_t tb_page_addr_t; #include "qemu/log.h" -void gen_intermediate_code(CPUArchState *env, struct TranslationBlock *tb); +void gen_intermediate_code(CPUState *cpu, struct TranslationBlock *tb); void restore_state_to_opc(CPUArchState *env, struct TranslationBlock *tb, target_ulong *data); diff --git a/target/arm/translate.h b/target/arm/translate.h index 12fd79ba8e..2fe144baa9 100644 --- a/target/arm/translate.h +++ b/target/arm/translate.h @@ -149,7 +149,7 @@ static void disas_set_insn_syndrome(DisasContext *s, uint32_t syn) #ifdef TARGET_AARCH64 void a64_translate_init(void); 
-void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb); +void gen_intermediate_code_a64(CPUState *cpu, TranslationBlock *tb); void gen_a64_set_pc_im(uint64_t val); void aarch64_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf, int flags); @@ -158,7 +158,7 @@ static inline void a64_translate_init(void) { } -static inline void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb) +static inline void gen_intermediate_code_a64(CPUState *cpu, TranslationBlock *tb) { } diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index 090ebad0a7..37ecafa931 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -1280,7 +1280,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, tcg_func_start(_ctx); tcg_ctx.cpu = ENV_GET_CPU(env); -gen_intermediate_code(env, tb); +gen_intermediate_code(cpu, tb); tcg_ctx.cpu = NULL; trace_translate_block(tb, tb->pc, tb->tc_ptr); diff --git a/target/alpha/translate.c b/target/alpha/translate.c index 744d8bbf12..f465752208 100644 --- a/target/alpha/translate.c +++ b/target/alpha/translate.c @@ -2952,10 +2952,9 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn) return ret; } -void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb) +void gen_intermediate_code(CPUState *cs, struct TranslationBlock *tb) { -AlphaCPU *cpu = alpha_env_get_cpu(env); -CPUState *cs = CPU(cpu); +CPUAlphaState *env = cs->env_ptr; DisasContext ctx, *ctxp = target_ulong pc_start; target_ulong pc_mask; diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c index 5bb0f8ef22..883e9df0c2 100644 --- a/target/arm/translate-a64.c +++ b/target/arm/translate-a64.c @@ -11179,10 +11179,10 @@ static void disas_a64_insn(CPUARMState *env, DisasContext *s) free_tmp_a64(s); } -void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb) +void gen_intermediate_code_a64(CPUState *cs, TranslationBlock *tb) { -CPUState *cs = CPU(cpu); -CPUARMState *env = >env; +CPUARMState *env 
= cs->env_ptr; +ARMCPU *cpu = arm_env_get_cpu(env); DisasContext dc1, *dc = target_ulong pc_start; target_ulong next_page_start; diff --git a/target/arm/translate.c b/target/arm/translate.c index d3003ae0d8..d1a5f56998 100644 --- a/target/arm/translate.c +++ b/target/arm/translate.c @@ -11795,10 +11795,10 @@ static bool insn_crosses_page(CPUARMState *env, DisasContext *s) } /* generate intermediate code for basic block 'tb'. */ -void gen_intermediate_code(CPUARMState *env, TranslationBlock
[Qemu-devel] [PULL v2 06/14] target/arm: Optimize aarch64 rev16
It is much shorter to reverse all 4 half-words in parallel than
extract, reverse, and deposit each in turn.

Suggested-by: Aurelien Jarno
Signed-off-by: Richard Henderson
---
 target/arm/translate-a64.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 3fa39023ca..5bb0f8ef22 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4043,25 +4043,13 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
     TCGv_i64 tcg_rd = cpu_reg(s, rd);
     TCGv_i64 tcg_tmp = tcg_temp_new_i64();
     TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
+    TCGv_i64 mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);

-    tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
-    tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
-
-    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-    tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-    tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-    tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
-
-    if (sf) {
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-        tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
-
-        tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
-        tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-        tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
-    }
+    tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
+    tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
+    tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
+    tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
+    tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);

     tcg_temp_free_i64(tcg_tmp);
 }
-- 
2.13.3
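The parallel trick can be checked against the per-half-word approach with a small Python model. This is a sketch only: the function names are made up for illustration, and for `sf == 0` the input is assumed to already be zero-extended to 32 bits.

```python
def rev16_per_halfword(x, sf):
    """Swap the two bytes of every 16-bit half-word, one half-word at a time."""
    width = 64 if sf else 32
    out = 0
    for shift in range(0, width, 16):
        half = (x >> shift) & 0xFFFF
        swapped = ((half & 0xFF) << 8) | (half >> 8)
        out |= swapped << shift
    return out


def rev16_parallel(x, sf):
    """Same result with one mask: move the high bytes down and the low bytes up."""
    mask = 0x00FF00FF00FF00FF if sf else 0x00FF00FF
    tmp = (x >> 8) & mask      # high byte of each half-word, moved to the low byte
    rd = (x & mask) << 8       # low byte of each half-word, moved to the high byte
    return rd | tmp
```

For example, with `sf == 0` the input `0x11223344` becomes `0x22114433` in both versions.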
[Qemu-devel] [PULL v2 10/14] target/sparc: optimize various functions using extract op
From: Philippe Mathieu-Daudé

Done with the Coccinelle semantic patch
scripts/coccinelle/tcg_gen_extract.cocci.

Reviewed-by: Richard Henderson
Signed-off-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 target/sparc/translate.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index aa6734d54e..67a83b77cc 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -380,29 +380,25 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num,
 static inline void gen_mov_reg_N(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
-    tcg_gen_shri_tl(reg, reg, PSR_NEG_SHIFT);
-    tcg_gen_andi_tl(reg, reg, 0x1);
+    tcg_gen_extract_tl(reg, reg, PSR_NEG_SHIFT, 1);
 }

 static inline void gen_mov_reg_Z(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
-    tcg_gen_shri_tl(reg, reg, PSR_ZERO_SHIFT);
-    tcg_gen_andi_tl(reg, reg, 0x1);
+    tcg_gen_extract_tl(reg, reg, PSR_ZERO_SHIFT, 1);
 }

 static inline void gen_mov_reg_V(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
-    tcg_gen_shri_tl(reg, reg, PSR_OVF_SHIFT);
-    tcg_gen_andi_tl(reg, reg, 0x1);
+    tcg_gen_extract_tl(reg, reg, PSR_OVF_SHIFT, 1);
 }

 static inline void gen_mov_reg_C(TCGv reg, TCGv_i32 src)
 {
     tcg_gen_extu_i32_tl(reg, src);
-    tcg_gen_shri_tl(reg, reg, PSR_CARRY_SHIFT);
-    tcg_gen_andi_tl(reg, reg, 0x1);
+    tcg_gen_extract_tl(reg, reg, PSR_CARRY_SHIFT, 1);
 }

 static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
@@ -638,8 +634,7 @@ static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv src2)
     // env->y = (b2 << 31) | (env->y >> 1);
     tcg_gen_andi_tl(r_temp, cpu_cc_src, 0x1);
     tcg_gen_shli_tl(r_temp, r_temp, 31);
-    tcg_gen_shri_tl(t0, cpu_y, 1);
-    tcg_gen_andi_tl(t0, t0, 0x7fffffff);
+    tcg_gen_extract_tl(t0, cpu_y, 1, 31);
     tcg_gen_or_tl(t0, t0, r_temp);
     tcg_gen_andi_tl(cpu_y, t0, 0xffffffff);

-- 
2.13.3
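The rewrite relies on "shift right, then AND with a low-bit mask" being the same as a bit-field extract. A minimal Python model of that equivalence follows; `extract` and `shift_then_mask` are illustrative stand-ins rather than QEMU functions, and the PSR shift values are the ones I recall from target/sparc/cpu.h, so treat them as an assumption.

```python
# SPARC PSR condition-code bit positions (assumed values, see note above).
PSR_CARRY_SHIFT = 20
PSR_OVF_SHIFT = 21
PSR_ZERO_SHIFT = 22
PSR_NEG_SHIFT = 23


def shift_then_mask(value, shift, mask):
    """The removed two-op pattern: shri followed by andi."""
    return (value >> shift) & mask


def extract(value, ofs, length):
    """The single-op replacement: `length` bits starting at bit `ofs`."""
    return (value >> ofs) & ((1 << length) - 1)
```

With these definitions, `shift_then_mask(psr, PSR_NEG_SHIFT, 0x1)` and `extract(psr, PSR_NEG_SHIFT, 1)` agree for any `psr`, and likewise a shift by 1 with mask `0x7fffffff` matches `extract(y, 1, 31)`.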
[Qemu-devel] [PULL v2 12/14] target/alpha: optimize gen_cvtlq() using deposit op
From: Philippe Mathieu-Daudé

Suggested-by: Richard Henderson
Signed-off-by: Philippe Mathieu-Daudé
Message-Id: <20170718045540.16322-10-f4...@amsat.org>
Signed-off-by: Richard Henderson
---
 target/alpha/translate.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 90e6d5285f..744d8bbf12 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -783,11 +783,9 @@ static void gen_cvtlq(TCGv vc, TCGv vb)
     /* The arithmetic right shift here, plus the sign-extended mask below
        yields a sign-extended result without an explicit ext32s_i64. */
-    tcg_gen_sari_i64(tmp, vb, 32);
-    tcg_gen_shri_i64(vc, vb, 29);
-    tcg_gen_andi_i64(tmp, tmp, (int32_t)0xc0000000);
-    tcg_gen_andi_i64(vc, vc, 0x3fffffff);
-    tcg_gen_or_i64(vc, vc, tmp);
+    tcg_gen_shri_i64(tmp, vb, 29);
+    tcg_gen_sari_i64(vc, vb, 32);
+    tcg_gen_deposit_i64(vc, vc, tmp, 0, 30);

     tcg_temp_free(tmp);
 }
-- 
2.13.3
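A quick way to convince oneself that the three-op sequence matches the old five-op one is a Python model of both. This is a sketch under assumptions: 64-bit registers, the masks taken to be 0xc0000000 (sign-extended to 64 bits) and 0x3fffffff, and helper names (`sari64`, `shri64`, `deposit64`) that are illustrative rather than QEMU APIs.

```python
MASK64 = (1 << 64) - 1


def sari64(x, n):
    """Arithmetic shift right of a 64-bit value held as an unsigned int."""
    if x & (1 << 63):
        x -= 1 << 64
    return (x >> n) & MASK64


def shri64(x, n):
    """Logical shift right of a 64-bit value."""
    return (x & MASK64) >> n


def deposit64(base, field, ofs, length):
    """Overwrite `length` bits of `base` at `ofs` with the low bits of `field`."""
    mask = ((1 << length) - 1) << ofs
    return ((base & ~mask) | ((field << ofs) & mask)) & MASK64


def cvtlq_before(vb):
    tmp = sari64(vb, 32) & 0xFFFFFFFFC0000000  # (int32_t)0xc0000000, sign-extended
    vc = shri64(vb, 29) & 0x3FFFFFFF
    return vc | tmp


def cvtlq_after(vb):
    tmp = shri64(vb, 29)
    vc = sari64(vb, 32)
    return deposit64(vc, tmp, 0, 30)
```

In both versions the low 30 bits come from `vb >> 29` and everything above comes from the arithmetic shift, which is why a single deposit of 30 bits at offset 0 is enough.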
[Qemu-devel] [PULL v2 09/14] target/ppc: optimize various functions using extract op
From: Philippe Mathieu-DaudéDone with the Coccinelle semantic patch scripts/coccinelle/tcg_gen_extract.cocci. Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Richard Henderson Acked-by: David Gibson Message-Id: <20170718045540.16322-6-f4...@amsat.org> Signed-off-by: Richard Henderson --- target/ppc/translate.c | 21 +++-- target/ppc/translate/vsx-impl.inc.c | 24 2 files changed, 15 insertions(+), 30 deletions(-) diff --git a/target/ppc/translate.c b/target/ppc/translate.c index c0cd64d927..de271af52b 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -873,8 +873,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1, } tcg_gen_xor_tl(cpu_ca, t0, t1);/* bits changed w/ carry */ tcg_temp_free(t1); -tcg_gen_shri_tl(cpu_ca, cpu_ca, 32); /* extract bit 32 */ -tcg_gen_andi_tl(cpu_ca, cpu_ca, 1); +tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1); if (is_isa300(ctx)) { tcg_gen_mov_tl(cpu_ca32, cpu_ca); } @@ -1404,8 +1403,7 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1, tcg_temp_free(inv1); tcg_gen_xor_tl(cpu_ca, t0, t1); /* bits changes w/ carry */ tcg_temp_free(t1); -tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);/* extract bit 32 */ -tcg_gen_andi_tl(cpu_ca, cpu_ca, 1); +tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1); if (is_isa300(ctx)) { tcg_gen_mov_tl(cpu_ca32, cpu_ca); } @@ -4336,8 +4334,7 @@ static void gen_mfsrin(DisasContext *ctx) CHK_SV; t0 = tcg_temp_new(); -tcg_gen_shri_tl(t0, cpu_gpr[rB(ctx->opcode)], 28); -tcg_gen_andi_tl(t0, t0, 0xF); +tcg_gen_extract_tl(t0, cpu_gpr[rB(ctx->opcode)], 28, 4); gen_helper_load_sr(cpu_gpr[rD(ctx->opcode)], cpu_env, t0); tcg_temp_free(t0); #endif /* defined(CONFIG_USER_ONLY) */ @@ -4368,8 +4365,7 @@ static void gen_mtsrin(DisasContext *ctx) CHK_SV; t0 = tcg_temp_new(); -tcg_gen_shri_tl(t0, cpu_gpr[rB(ctx->opcode)], 28); -tcg_gen_andi_tl(t0, t0, 0xF); +tcg_gen_extract_tl(t0, cpu_gpr[rB(ctx->opcode)], 28, 4); gen_helper_store_sr(cpu_env, t0, cpu_gpr[rD(ctx->opcode)]); 
tcg_temp_free(t0); #endif /* defined(CONFIG_USER_ONLY) */ @@ -4403,8 +4399,7 @@ static void gen_mfsrin_64b(DisasContext *ctx) CHK_SV; t0 = tcg_temp_new(); -tcg_gen_shri_tl(t0, cpu_gpr[rB(ctx->opcode)], 28); -tcg_gen_andi_tl(t0, t0, 0xF); +tcg_gen_extract_tl(t0, cpu_gpr[rB(ctx->opcode)], 28, 4); gen_helper_load_sr(cpu_gpr[rD(ctx->opcode)], cpu_env, t0); tcg_temp_free(t0); #endif /* defined(CONFIG_USER_ONLY) */ @@ -4435,8 +4430,7 @@ static void gen_mtsrin_64b(DisasContext *ctx) CHK_SV; t0 = tcg_temp_new(); -tcg_gen_shri_tl(t0, cpu_gpr[rB(ctx->opcode)], 28); -tcg_gen_andi_tl(t0, t0, 0xF); +tcg_gen_extract_tl(t0, cpu_gpr[rB(ctx->opcode)], 28, 4); gen_helper_store_sr(cpu_env, t0, cpu_gpr[rS(ctx->opcode)]); tcg_temp_free(t0); #endif /* defined(CONFIG_USER_ONLY) */ @@ -5414,8 +5408,7 @@ static void gen_mfsri(DisasContext *ctx) CHK_SV; t0 = tcg_temp_new(); gen_addr_reg_index(ctx, t0); -tcg_gen_shri_tl(t0, t0, 28); -tcg_gen_andi_tl(t0, t0, 0xF); +tcg_gen_extract_tl(t0, t0, 28, 4); gen_helper_load_sr(cpu_gpr[rd], cpu_env, t0); tcg_temp_free(t0); if (ra != 0 && ra != rd) diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c index 7f12908029..85ed135d44 100644 --- a/target/ppc/translate/vsx-impl.inc.c +++ b/target/ppc/translate/vsx-impl.inc.c @@ -1248,8 +1248,7 @@ static void gen_xsxexpdp(DisasContext *ctx) gen_exception(ctx, POWERPC_EXCP_VSXU); return; } -tcg_gen_shri_i64(rt, cpu_vsrh(xB(ctx->opcode)), 52); -tcg_gen_andi_i64(rt, rt, 0x7FF); +tcg_gen_extract_i64(rt, cpu_vsrh(xB(ctx->opcode)), 52, 11); } static void gen_xsxexpqp(DisasContext *ctx) @@ -1262,8 +1261,7 @@ static void gen_xsxexpqp(DisasContext *ctx) gen_exception(ctx, POWERPC_EXCP_VSXU); return; } -tcg_gen_shri_i64(xth, xbh, 48); -tcg_gen_andi_i64(xth, xth, 0x7FFF); +tcg_gen_extract_i64(xth, xbh, 48, 15); tcg_gen_movi_i64(xtl, 0); } @@ -1323,8 +1321,7 @@ static void gen_xsxsigdp(DisasContext *ctx) zr = tcg_const_i64(0); nan = tcg_const_i64(2047); -tcg_gen_shri_i64(exp, 
cpu_vsrh(xB(ctx->opcode)), 52); -tcg_gen_andi_i64(exp, exp, 0x7FF); +tcg_gen_extract_i64(exp, cpu_vsrh(xB(ctx->opcode)), 52, 11); tcg_gen_movi_i64(t0, 0x0010); tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0); tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0); @@ -1352,8
[Qemu-devel] [PULL v2 07/14] target/arm: optimize aarch32 rev16
From: Aurelien Jarno

Use the same mask to avoid having to load two different constants, as
suggested by Richard Henderson.

Signed-off-by: Aurelien Jarno
Reviewed-by: Philippe Mathieu-Daudé
Reviewed-by: Richard Henderson
Message-Id: <20170516230159.4195-2-aurel...@aurel32.net>
Signed-off-by: Richard Henderson
---
 target/arm/translate.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index e27736ce5b..d3003ae0d8 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -343,11 +343,13 @@ static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
 static void gen_rev16(TCGv_i32 var)
 {
     TCGv_i32 tmp = tcg_temp_new_i32();
+    TCGv_i32 mask = tcg_const_i32(0x00ff00ff);
     tcg_gen_shri_i32(tmp, var, 8);
-    tcg_gen_andi_i32(tmp, tmp, 0x00ff00ff);
+    tcg_gen_and_i32(tmp, tmp, mask);
+    tcg_gen_and_i32(var, var, mask);
     tcg_gen_shli_i32(var, var, 8);
-    tcg_gen_andi_i32(var, var, 0xff00ff00);
     tcg_gen_or_i32(var, var, tmp);
+    tcg_temp_free_i32(mask);
     tcg_temp_free_i32(tmp);
 }
-- 
2.13.3
[Qemu-devel] [PULL v2 08/14] target/m68k: optimize bcd_flags() using extract op
From: Philippe Mathieu-Daudé

Done with the Coccinelle semantic patch
scripts/coccinelle/tcg_gen_extract.cocci.

Signed-off-by: Philippe Mathieu-Daudé
Acked-by: Laurent Vivier
Reviewed-by: Richard Henderson
Message-Id: <20170718045540.16322-5-f4...@amsat.org>
Signed-off-by: Richard Henderson
---
 target/m68k/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 3a519b790d..e709e6cde2 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -1749,8 +1749,7 @@ static void bcd_flags(TCGv val)
     tcg_gen_andi_i32(QREG_CC_C, val, 0x0ff);
     tcg_gen_or_i32(QREG_CC_Z, QREG_CC_Z, QREG_CC_C);

-    tcg_gen_shri_i32(QREG_CC_C, val, 8);
-    tcg_gen_andi_i32(QREG_CC_C, QREG_CC_C, 1);
+    tcg_gen_extract_i32(QREG_CC_C, val, 8, 1);

     tcg_gen_mov_i32(QREG_CC_X, QREG_CC_C);
 }
-- 
2.13.3
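For readers wondering when a shift-and-mask pair may be turned into an extract: the mask has to be a run of low bits (2^n - 1), and n becomes the extract length. The sketch below borrows the `low_bits_count` helper name from the semantic patch's Python initializer; `mersenne` and `extract_length`, and the exact acceptance rule, are my assumptions and not necessarily what the script does.

```python
def low_bits_count(value):
    """Number of consecutive set bits starting at bit 0."""
    count = 0
    while value & (1 << count):
        count += 1
    return count


def mersenne(order):
    """The mask with `order` low bits set, i.e. 2**order - 1."""
    return (1 << order) - 1


def extract_length(mask):
    """Return the extract length for `mask`, or None if it is not 2**n - 1."""
    length = low_bits_count(mask)
    if length == 0 or mersenne(length) != mask:
        return None
    return length
```

So the `andi ..., 1` after a shift by 8 in bcd_flags() maps to an extract at offset 8 of length 1, while a mask such as 0xff00ff00 is rejected because its set bits are not contiguous from bit 0.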
[Qemu-devel] [PULL v2 01/14] tcg/mips: reserve a register for the guest_base.
From: Jiang BiaoReserve a register for the guest_base using ppc code for reference. By doing so, we do not have to recompute it for every memory load. Signed-off-by: Jiang Biao Signed-off-by: Richard Henderson Message-Id: <1499677934-2249-1-git-send-email-jiang.bi...@zte.com.cn> --- tcg/mips/tcg-target.inc.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c index 85756b81d5..1a8169f5fc 100644 --- a/tcg/mips/tcg-target.inc.c +++ b/tcg/mips/tcg-target.inc.c @@ -85,6 +85,10 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = { #define TCG_TMP2 TCG_REG_T8 #define TCG_TMP3 TCG_REG_T7 +#ifndef CONFIG_SOFTMMU +#define TCG_GUEST_BASE_REG TCG_REG_S1 +#endif + /* check if we really need so many registers :P */ static const int tcg_target_reg_alloc_order[] = { /* Call saved registers. */ @@ -1547,8 +1551,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64) } else if (guest_base == (int16_t)guest_base) { tcg_out_opc_imm(s, ALIAS_PADDI, base, addr_regl, guest_base); } else { -tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, guest_base); -tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_TMP0, addr_regl); +tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_GUEST_BASE_REG, addr_regl); } tcg_out_qemu_ld_direct(s, data_regl, data_regh, base, opc, is_64); #endif @@ -1652,8 +1655,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64) } else if (guest_base == (int16_t)guest_base) { tcg_out_opc_imm(s, ALIAS_PADDI, base, addr_regl, guest_base); } else { -tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, guest_base); -tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_TMP0, addr_regl); +tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_GUEST_BASE_REG, addr_regl); } tcg_out_qemu_st_direct(s, data_regl, data_regh, base, opc); #endif @@ -2452,6 +2454,13 @@ static void tcg_target_qemu_prologue(TCGContext *s) TCG_REG_SP, SAVE_OFS + i * REG_SIZE); } +#ifndef CONFIG_SOFTMMU +if (guest_base) { 
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base); +tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG); +} +#endif + /* Call generated code */ tcg_out_opc_reg(s, OPC_JR, 0, tcg_target_call_iarg_regs[1], 0); /* delay slot */ -- 2.13.3
[Qemu-devel] [PULL v2 03/14] tcg: Expand glue macros before stringifying helper names
Signed-off-by: Richard Henderson
---
 include/exec/helper-tcg.h | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/exec/helper-tcg.h b/include/exec/helper-tcg.h
index bb9287727c..b0c5bafa99 100644
--- a/include/exec/helper-tcg.h
+++ b/include/exec/helper-tcg.h
@@ -6,31 +6,35 @@
 #include "exec/helper-head.h"
 
+/* Need one more level of indirection before stringification
+   to get all the macros expanded first.  */
+#define str(s) #s
+
 #define DEF_HELPER_FLAGS_0(NAME, FLAGS, ret) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
     .sizemask = dh_sizemask(ret, 0) },
 
 #define DEF_HELPER_FLAGS_1(NAME, FLAGS, ret, t1) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) },
 
 #define DEF_HELPER_FLAGS_2(NAME, FLAGS, ret, t1, t2) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) },
 
 #define DEF_HELPER_FLAGS_3(NAME, FLAGS, ret, t1, t2, t3) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) },
 
 #define DEF_HELPER_FLAGS_4(NAME, FLAGS, ret, t1, t2, t3, t4) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) | dh_sizemask(t4, 4) },
 
 #define DEF_HELPER_FLAGS_5(NAME, FLAGS, ret, t1, t2, t3, t4, t5) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
     .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
     | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) | dh_sizemask(t4, 4) \
     | dh_sizemask(t5, 5) },
 
@@ -39,6 +43,7 @@
 #include "trace/generated-helpers.h"
 #include "tcg-runtime.h"
 
+#undef str
 #undef DEF_HELPER_FLAGS_0
 #undef DEF_HELPER_FLAGS_1
 #undef DEF_HELPER_FLAGS_2
-- 
2.13.3
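For readers unfamiliar with the preprocessor rule behind this patch, here is a minimal standalone sketch of the effect. It is not QEMU code: SUFFIX, NAME_BEFORE, NAME_AFTER and the atomic_add name are made up for illustration, and glue()/xglue() are only written in the style of QEMU's macros. The point is that '#' applied directly to a macro parameter stringifies the argument as spelled, while passing the parameter through str() lets it expand first.

```c
#include <assert.h>
#include <string.h>

/* Two-level token pasting, written in the style of glue()/xglue(). */
#define xglue(a, b) a##b
#define glue(a, b) xglue(a, b)

/* Hypothetical suffix, standing in for whatever the helper name uses. */
#define SUFFIX _le

/* The same indirection the patch introduces. */
#define str(s) #s

/* Before: '#' sees the argument exactly as written, unexpanded. */
#define NAME_BEFORE(NAME) #NAME
/* After: NAME is fully expanded, then handed to str() for stringifying. */
#define NAME_AFTER(NAME) str(NAME)

static const char *const name_before = NAME_BEFORE(glue(atomic_add, SUFFIX));
static const char *const name_after = NAME_AFTER(glue(atomic_add, SUFFIX));

/* Check both spellings against what the preprocessor rules predict. */
static void check_names(void)
{
    assert(strcmp(name_before, "glue(atomic_add, SUFFIX)") == 0);
    assert(strcmp(name_after, "atomic_add_le") == 0);
}
```

With the old form, a debugging dump would show the unexpanded glue(...) text as the helper name; with the new form it shows the pasted identifier.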
[Qemu-devel] [PULL v2 04/14] coccinelle: ignore ASTs pre-parsed cached C files
From: Philippe Mathieu-Daudé

files generated using coccinelle tool: 'spatch --use-cache'

Reviewed-by: Eric Blake
Signed-off-by: Philippe Mathieu-Daudé
Message-Id: <20170718045540.16322-2-f4...@amsat.org>
Signed-off-by: Richard Henderson
---
 .gitignore | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.gitignore b/.gitignore
index 09c2363acf..cf65316863 100644
--- a/.gitignore
+++ b/.gitignore
@@ -116,6 +116,8 @@ tags
 TAGS
 docker-src.*
 *~
+*.ast_raw
+*.depend_raw
 trace.h
 trace.c
 trace-ust.h
-- 
2.13.3
[Qemu-devel] [PULL v2 05/14] coccinelle: add a script to optimize tcg op using tcg_gen_extract()
From: Philippe Mathieu-DaudéThe following thread was helpful while writing this script: https://github.com/coccinelle/coccinelle/issues/86 Signed-off-by: Philippe Mathieu-Daudé Message-Id: <20170718045540.16322-3-f4...@amsat.org> Signed-off-by: Richard Henderson --- scripts/coccinelle/tcg_gen_extract.cocci | 107 +++ 1 file changed, 107 insertions(+) create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci diff --git a/scripts/coccinelle/tcg_gen_extract.cocci b/scripts/coccinelle/tcg_gen_extract.cocci new file mode 100644 index 00..81e66a35ae --- /dev/null +++ b/scripts/coccinelle/tcg_gen_extract.cocci @@ -0,0 +1,107 @@ +// optimize TCG using extract op +// +// Copyright: (C) 2017 Philippe Mathieu-Daudé. GPLv2+. +// Confidence: High +// Options: --macro-file scripts/cocci-macro-file.h +// +// Nikunj A Dadhania optimization: +// http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html +// Aurelien Jarno optimization: +// http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html +// +// This script can be run either using spatch locally or via a docker image: +// +// $ spatch \ +// --macro-file scripts/cocci-macro-file.h \ +// --sp-file scripts/coccinelle/tcg_gen_extract.cocci \ +// --keep-comments --in-place \ +// --use-gitgrep --dir target +// +// $ docker run --rm -v `pwd`:`pwd` -w `pwd` philmd/coccinelle \ +// --macro-file scripts/cocci-macro-file.h \ +// --sp-file scripts/coccinelle/tcg_gen_extract.cocci \ +// --keep-comments --in-place \ +// --use-gitgrep --dir target + +@initialize:python@ +@@ +import sys +fd = sys.stderr +def debug(msg="", trailer="\n"): +fd.write("[DBG] " + msg + trailer) +def low_bits_count(value): +bits_count = 0 +while (value & (1 << bits_count)): +bits_count += 1 +return bits_count +def Mn(order): # Mersenne number +return (1 << order) - 1 + +@match@ +identifier ret; +metavariable arg; +constant ofs, msk; +position shr_p, and_p; +@@ +( +tcg_gen_shri_i32@shr_p +| +tcg_gen_shri_i64@shr_p +| +tcg_gen_shri_tl@shr_p 
+)(ret, arg, ofs); +... WHEN != ret +( +tcg_gen_andi_i32@and_p +| +tcg_gen_andi_i64@and_p +| +tcg_gen_andi_tl@and_p +)(ret, ret, msk); + +@script:python verify_len depends on match@ +ret_s << match.ret; +msk_s << match.msk; +shr_p << match.shr_p; +extract_len; +@@ +is_optimizable = False +debug("candidate at %s:%s" % (shr_p[0].file, shr_p[0].line)) +try: # only eval integer, no #define like 'SR_M' (cpp did this, else some headers are missing). +msk_v = long(msk_s.strip("UL"), 0) +msk_b = low_bits_count(msk_v) +if msk_b == 0: +debug(" value: 0x%x low_bits: %d" % (msk_v, msk_b)) +else: +debug(" value: 0x%x low_bits: %d [Mersenne number: 0x%x]" % (msk_v, msk_b, Mn(msk_b))) +is_optimizable = Mn(msk_b) == msk_v # check low_bits +coccinelle.extract_len = "%d" % msk_b +debug(" candidate %s optimizable" % ("IS" if is_optimizable else "is NOT")) +except: +debug(" ERROR (check included headers?)") +cocci.include_match(is_optimizable) +debug() + +@replacement depends on verify_len@ +identifier match.ret; +metavariable match.arg; +constant match.ofs, match.msk; +position match.shr_p, match.and_p; +identifier verify_len.extract_len; +@@ +( +-tcg_gen_shri_i32@shr_p(ret, arg, ofs); ++tcg_gen_extract_i32(ret, arg, ofs, extract_len); +... WHEN != ret +-tcg_gen_andi_i32@and_p(ret, ret, msk); +| +-tcg_gen_shri_i64@shr_p(ret, arg, ofs); ++tcg_gen_extract_i64(ret, arg, ofs, extract_len); +... WHEN != ret +-tcg_gen_andi_i64@and_p(ret, ret, msk); +| +-tcg_gen_shri_tl@shr_p(ret, arg, ofs); ++tcg_gen_extract_tl(ret, arg, ofs, extract_len); +... WHEN != ret +-tcg_gen_andi_tl@and_p(ret, ret, msk); +) -- 2.13.3
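The check the script performs can be restated outside Coccinelle. The following is a small sketch in plain C, not part of the patch: low_bits_count() mirrors the Python helper of the same name, while mask_is_low_run(), shr_then_and() and extract_model() are names invented here. It models a 32-bit extract as "take len bits starting at bit ofs", and, unlike the script, it assumes that ofs + len must not exceed 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the contiguous run of set bits starting at bit 0. */
static unsigned low_bits_count(uint64_t value)
{
    unsigned n = 0;
    while (n < 64 && (value & (UINT64_C(1) << n))) {
        n++;
    }
    return n;
}

/* True when msk is exactly (1 << len) - 1 for some len in 1..64;
 * on success the run length is stored in *len. */
static bool mask_is_low_run(uint64_t msk, unsigned *len)
{
    unsigned n = low_bits_count(msk);
    uint64_t mersenne = (n == 64) ? UINT64_MAX : (UINT64_C(1) << n) - 1;

    if (n == 0 || mersenne != msk) {
        return false;
    }
    *len = n;
    return true;
}

/* Model of the shift-then-mask pair that the script matches. */
static uint32_t shr_then_and(uint32_t arg, unsigned ofs, uint32_t msk)
{
    assert(ofs < 32);
    return (arg >> ofs) & msk;
}

/* Model of a 32-bit extract: len bits of arg starting at bit ofs. */
static uint32_t extract_model(uint32_t arg, unsigned ofs, unsigned len)
{
    assert(len > 0 && ofs < 32 && len <= 32 - ofs);
    return (arg >> ofs) & (UINT32_MAX >> (32 - len));
}
```

A mask such as 0xff passes the check with a length of 8, so the pair can be folded into one extract; masks such as 0xf0 or 0x5 are rejected because their set bits are not a single run starting at bit 0.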
[Qemu-devel] [PULL v2 02/14] util/cacheinfo: Add missing include for ppc linux
From: Philippe Mathieu-Daudé

This include was forgotten when splitting cacheinfo.c out of
tcg/ppc/tcg-target.inc.c (see commit b255b2c8).

For a Centos7 host, the include path implicitly pulls in the desired
AT_* defines.  Not so for Debian Jessie.

Signed-off-by: Philippe Mathieu-Daudé
Message-Id: <20170711015524.22936-1-f4...@amsat.org>
Signed-off-by: Richard Henderson
---
 util/cacheinfo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/util/cacheinfo.c b/util/cacheinfo.c
index 6253049533..593940f27b 100644
--- a/util/cacheinfo.c
+++ b/util/cacheinfo.c
@@ -129,6 +129,7 @@ static void arch_cache_info(int *isize, int *dsize)
 }
 
 #elif defined(_ARCH_PPC) && defined(__linux__)
+# include "elf.h"
 
 static void arch_cache_info(int *isize, int *dsize)
 {
-- 
2.13.3
[Qemu-devel] [PULL v2 00/14] tcg-next patch queue
This edition is a real mix:

  * Code gen improvement for mips64 host (Jiang)

  * Build fix for ppc-linux (Philippe)

  * Runtime fix for tci (Philippe)

  * Fix atomic helper names in debugging dumps (rth)

  * Cross-target tcg code gen improvements (Philippe)
    This one had no obvious tree through which it should go,
    so I went ahead and took them all.

  * Cherry-picked the first patch from Lluis' generic translate
    loop, wherein the interface to gen_intermediate_code changes
    trivially.  It's the only patch from that series that touches
    all targets, and I see little point carrying it around further.

V2: Fixed typo in the sparc mulscc deposit patch.


r~


The following changes since commit d4e59218ab80e86015753782fb5378767a51ccd0:

  Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2017-07-18-2' into staging (2017-07-19 20:45:37 +0100)

are available in the git repository at:

  git://github.com/rth7680/qemu.git tags/pull-tcg-20170719

for you to fetch changes up to 9c489ea6bed134fecfd556b439c68bba48fbe102:

  tcg: Pass generic CPUState to gen_intermediate_code() (2017-07-19 14:45:16 -0700)

Queued tcg and tcg code gen related cleanups

Aurelien Jarno (1):
      target/arm: optimize aarch32 rev16

Jiang Biao (1):
      tcg/mips: reserve a register for the guest_base.

Lluís Vilanova (1):
      tcg: Pass generic CPUState to gen_intermediate_code()

Philippe Mathieu-Daudé (9):
      util/cacheinfo: Add missing include for ppc linux
      coccinelle: ignore ASTs pre-parsed cached C files
      coccinelle: add a script to optimize tcg op using tcg_gen_extract()
      target/m68k: optimize bcd_flags() using extract op
      target/ppc: optimize various functions using extract op
      target/sparc: optimize various functions using extract op
      target/sparc: optimize gen_op_mulscc() using deposit op
      target/alpha: optimize gen_cvtlq() using deposit op
      tcg/tci: enable bswap16_i64

Richard Henderson (2):
      tcg: Expand glue macros before stringifying helper names
      target/arm: Optimize aarch64 rev16

 include/exec/exec-all.h                  |   2 +-
 include/exec/helper-tcg.h                |  17 +++--
 target/arm/translate.h                   |   4 +-
 accel/tcg/translate-all.c                |   2 +-
 target/alpha/translate.c                 |  13 ++--
 target/arm/translate-a64.c               |  30 +++--
 target/arm/translate.c                   |  12 ++--
 target/cris/translate.c                  |   7 +-
 target/hppa/translate.c                  |   5 +-
 target/i386/translate.c                  |   5 +-
 target/lm32/translate.c                  |   4 +-
 target/m68k/translate.c                  |   8 +--
 target/microblaze/translate.c            |   4 +-
 target/mips/translate.c                  |   5 +-
 target/moxie/translate.c                 |   4 +-
 target/nios2/translate.c                 |   5 +-
 target/openrisc/translate.c              |   4 +-
 target/ppc/translate.c                   |  26 +++-
 target/ppc/translate/vsx-impl.inc.c      |  24 +++
 target/s390x/translate.c                 |   5 +-
 target/sh4/translate.c                   |   5 +-
 target/sparc/translate.c                 |  25 +++-
 target/tilegx/translate.c                |   5 +-
 target/tricore/translate.c               |   5 +-
 target/unicore32/translate.c             |   5 +-
 target/xtensa/translate.c                |   5 +-
 tcg/mips/tcg-target.inc.c                |  17 +++--
 tcg/tci.c                                |   1 -
 util/cacheinfo.c                         |   1 +
 .gitignore                               |   2 +
 scripts/coccinelle/tcg_gen_extract.cocci | 107 +++
 31 files changed, 218 insertions(+), 146 deletions(-)
 create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci
Re: [Qemu-devel] [PULL 0/8] target/alpha cleanups
On 07/19/2017 06:57 AM, Peter Maydell wrote:
> On 19 July 2017 at 05:45, Richard Henderson wrote:
>> The new title holder for perf top is helper_lookup_tb_ptr.  Those targets
>> that have a complicated cpu_get_tb_cpu_state function are going to regret
>> that.
>
> Yeah, Paolo's pointed out (and had some patches for) ARM's
> rather complicated cpu_get_tb_cpu_state(). My issue with
> his suggested fixes was that they were pretty fragile in
> terms of not having any guarantee that the change always
> produced the right tb cpu state flags answer...

Oh?  I must have missed seeing this one.  A quick patchwork search doesn't
pull it up; do either of you have a link?


r~
Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation
On Wed, 19 Jul 2017 18:45:43 +0800 "Liu, Yi L"wrote: > On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote: > > On Mon, 17 Jul 2017 10:58:41 + > > "Liu, Yi L" wrote: > > > > > Hi Alex, > > > > > > Pls refer to the response inline. > > > > > > > -Original Message- > > > > From: kvm-ow...@vger.kernel.org > > > > [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Alex Williamson > > > > Sent: Saturday, July 15, 2017 2:16 AM > > > > To: Liu, Yi L > > > > Cc: Jean-Philippe Brucker ; > > > > Tian, Kevin ; Liu, Yi L > > > > ; Lan, Tianyu ; > > > > Raj, Ashok ; k...@vger.kernel.org; > > > > jasow...@redhat.com; Will Deacon ; > > > > pet...@redhat.com; qemu-devel@nongnu.org; > > > > io...@lists.linux-foundation.org; Pan, Jacob jun > > > > ; Joerg Roedel > > > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL > > > > for IOMMU TLB invalidate propagation > > > > > > > > On Fri, 14 Jul 2017 08:58:02 + > > > > "Liu, Yi L" wrote: > > > > > > > > > Hi Alex, > > > > > > > > > > Against to the opaque open, I'd like to propose the following > > > > > definition based on the existing comments. Pls note that I've > > > > > merged the pasid table binding and iommu tlb invalidation > > > > > into a single IOCTL and make different flags to indicate the > > > > > iommu operations. Per Kevin's comments, there may be iommu > > > > > invalidation for guest IOVA tlb, so I renamed the IOCTL and > > > > > data structure to be non-svm specific. Pls kindly have a > > > > > review, so that we can make the opaque open closed and move > > > > > forward. Surely, comments and ideas are welcomed. And for the > > > > > scope and flags definition in struct iommu_tlb_invalidate, > > > > > it's also welcomed to > > > > give your ideas on it. > > > > > > > > > > 1. 
Add a VFIO IOCTL for iommu operations from user-space > > > > > > > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24) > > > > > > > > > > Corresponding data structure: > > > > > struct vfio_iommu_operation_info { > > > > > __u32 argsz; > > > > > #define VFIO_IOMMU_BIND_PASIDTBL (1 << 0) /* Bind > > > > > PASID Table */ #define VFIO_IOMMU_BIND_PASID (1 << > > > > > 1) /* Bind PASID from userspace > > > > driver*/ > > > > > #define VFIO_IOMMU_BIND_PGTABLE (1 << 2) /* Bind guest > > > > > mmu page table */ #define VFIO_IOMMU_INVAL_IOTLB (1 << > > > > > 3) /* Invalidate iommu tlb */ __u32 flag; > > > > > __u32 length; // length of the data[] part in > > > > > byte __u8 data[]; // stores the data for iommu op > > > > > indicated by flag field }; > > > > > > > > If we're doing a generic "Ops" ioctl, then we should have an > > > > "op" field which is defined by an enum. It doesn't make sense > > > > to use flags for this, for example can we set multiple flag > > > > bits? If not then it's not a good use for a bit field. I'm > > > > also not sure I understand the value of the "length" field, > > > > can't it always be calculated from argsz? > > > > > > Agreed, enum would be better. "length" field could be calculated > > > from argsz. I used it just to avoid offset calculations. May > > > remove it. > > > > > For iommu tlb invalidation from userspace, the "__u8 data[]" > > > > > stores data which would be parsed by the "struct > > > > > iommu_tlb_invalidate" defined below. > > > > > > > > > > 2. 
Definitions in include/uapi/linux/iommu.h(newly added > > > > > header file) > > > > > > > > > > /* IOMMU model definition for iommu operations from userspace > > > > > */ enum iommu_model { > > > > > INTLE_IOMMU, > > > > > ARM_SMMU, > > > > > AMD_IOMMU, > > > > > SPAPR_IOMMU, > > > > > S390_IOMMU, > > > > > }; > > > > > > > > > > struct iommu_tlb_invalidate { > > > > > __u32 scope; > > > > > /* pasid-selective invalidation described by @pasid */ > > > > > #define IOMMU_INVALIDATE_PASID(1 << 0) > > > > > /* address-selevtive invalidation described by (@vaddr, > > > > > @size) */ #define IOMMU_INVALIDATE_VADDR (1 << 1) > > > > > > > > Again, is a bit field appropriate here, can a user set both > > > > bits? > > > > > > yes, user may set both bits. It would be invalidate address range > > > which is tagged with a PASID value. > > > > > > > > > > > > __u32 flags; > > > > > /* targets non-pasid mappings, @pasid is not valid */ > > > > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0) > > > > > /* indicating that the pIOMMU doesn't need to invalidate > > > > > all intermediate tables cached as part of the PTE for > > > > > vaddr, only the last-level entry (pte). This is a > > > > > hint. */ #define
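As an illustration of the review point raised in this thread (an "op" field selected by an enum instead of a bit field, and the payload length derived from argsz rather than carried separately), here is a rough sketch. Every name in it is hypothetical; none of it is taken from the kernel or from the proposed patch, and the real uapi layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operation codes: exactly one is selected per call,
 * which is why an enum fits better than combinable flag bits. */
enum iommu_op_sketch {
    IOMMU_OP_SKETCH_BIND_PASIDTBL,
    IOMMU_OP_SKETCH_BIND_PASID,
    IOMMU_OP_SKETCH_BIND_PGTABLE,
    IOMMU_OP_SKETCH_INVAL_IOTLB,
};

/* Hypothetical argument block: no separate length field. */
struct iommu_op_info_sketch {
    uint32_t argsz;  /* total size of the block, payload included */
    uint32_t op;     /* one of enum iommu_op_sketch */
    uint8_t data[];  /* op-specific payload */
};

/* Payload length recovered from argsz; 0 if argsz is too small
 * to cover even the fixed header. */
static size_t iommu_op_payload_len(const struct iommu_op_info_sketch *info)
{
    size_t hdr = offsetof(struct iommu_op_info_sketch, data);

    assert(info != NULL);
    return info->argsz < hdr ? 0 : info->argsz - hdr;
}
```

A caller would set argsz to the header size plus the size of whatever op-specific structure it places in data[], and the receiving side would validate the recovered length against what the selected op expects.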
Re: [Qemu-devel] [PATCH v5 0/3] Add litmus tests for MTTCG consistency tests
Hi Pranith, On 12/01/2016 02:28 AM, Pranith Kumar wrote: Hello, The following patch series adds litmus tests to test consistency for MTTCG enabled qemu. These patches apply on top of the clean up tests/tcg folder made by my previous patch series. The tests were generated using the litmus tool. The sources and instructions on how to generate these sources can be found in this repository: https://github.com/pranith/qemu-litmus I tested these on both an x86 and an Aarch64 machine. These tests are currently enabled for the trusty configuration on travis. Thanks, -- Pranith *** BLURB HERE *** Pranith Kumar (3): tests/tcg: Add i386 litmus test tests/tcg: Add aarch64 litmus tests travis: Enable litmus tests .travis.yml |8 + tests/tcg/aarch64/litmus/ARMARM00.c | 501 + tests/tcg/aarch64/litmus/ARMARM01.c | 504 + tests/tcg/aarch64/litmus/ARMARM02.c | 571 ++ tests/tcg/aarch64/litmus/ARMARM03.c | 498 + tests/tcg/aarch64/litmus/ARMARM04+BIS.c | 556 ++ tests/tcg/aarch64/litmus/ARMARM04+TER.c | 538 ++ tests/tcg/aarch64/litmus/ARMARM04.c | 556 ++ tests/tcg/aarch64/litmus/ARMARM05.c | 553 ++ tests/tcg/aarch64/litmus/ARMARM06+AP+AA.c | 581 +++ tests/tcg/aarch64/litmus/ARMARM06+AP+AP.c | 581 +++ tests/tcg/aarch64/litmus/ARMARM06.c | 581 +++ tests/tcg/aarch64/litmus/ARMARM07+SAL.c | 497 + tests/tcg/aarch64/litmus/Makefile | 53 ++ tests/tcg/aarch64/litmus/README.txt | 22 + tests/tcg/aarch64/litmus/affinity.c | 159 tests/tcg/aarch64/litmus/affinity.h | 34 + tests/tcg/aarch64/litmus/comp.sh | 30 + tests/tcg/aarch64/litmus/litmus_rand.c| 64 ++ tests/tcg/aarch64/litmus/litmus_rand.h| 29 + tests/tcg/aarch64/litmus/outs.c | 148 tests/tcg/aarch64/litmus/outs.h | 49 ++ tests/tcg/aarch64/litmus/run.sh | 378 ++ tests/tcg/aarch64/litmus/show.awk |2 + tests/tcg/aarch64/litmus/utils.c | 1148 + tests/tcg/aarch64/litmus/utils.h | 275 +++ tests/tcg/i386/litmus/Makefile| 42 ++ can you add an entry for both folders into MAINTAINERS please? 
tests/tcg/i386/litmus/README.txt | 22 + tests/tcg/i386/litmus/SAL.c | 491 tests/tcg/i386/litmus/affinity.c | 159 tests/tcg/i386/litmus/affinity.h | 34 + tests/tcg/i386/litmus/comp.sh | 10 + tests/tcg/i386/litmus/litmus_rand.c | 64 ++ tests/tcg/i386/litmus/litmus_rand.h | 29 + tests/tcg/i386/litmus/outs.c | 148 tests/tcg/i386/litmus/outs.h | 49 ++ tests/tcg/i386/litmus/run.sh | 55 ++ tests/tcg/i386/litmus/show.awk|2 + tests/tcg/i386/litmus/utils.c | 1148 + tests/tcg/i386/litmus/utils.h | 275 +++ 40 files changed, 11444 insertions(+) create mode 100644 tests/tcg/aarch64/litmus/ARMARM00.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM01.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM02.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM03.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM04+BIS.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM04+TER.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM04.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM05.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM06+AP+AA.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM06+AP+AP.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM06.c create mode 100644 tests/tcg/aarch64/litmus/ARMARM07+SAL.c create mode 100644 tests/tcg/aarch64/litmus/Makefile create mode 100644 tests/tcg/aarch64/litmus/README.txt create mode 100644 tests/tcg/aarch64/litmus/affinity.c create mode 100644 tests/tcg/aarch64/litmus/affinity.h create mode 100644 tests/tcg/aarch64/litmus/comp.sh create mode 100644 tests/tcg/aarch64/litmus/litmus_rand.c create mode 100644 tests/tcg/aarch64/litmus/litmus_rand.h create mode 100644 tests/tcg/aarch64/litmus/outs.c create mode 100644 tests/tcg/aarch64/litmus/outs.h create mode 100755 tests/tcg/aarch64/litmus/run.sh create mode 100644 tests/tcg/aarch64/litmus/show.awk create mode 100644 tests/tcg/aarch64/litmus/utils.c create mode 100644 tests/tcg/aarch64/litmus/utils.h create mode 100644 tests/tcg/i386/litmus/Makefile create mode 100644 
tests/tcg/i386/litmus/README.txt create mode 100644 tests/tcg/i386/litmus/SAL.c create mode 100644 tests/tcg/i386/litmus/affinity.c create mode 100644 tests/tcg/i386/litmus/affinity.h create mode 100644 tests/tcg/i386/litmus/comp.sh create mode 100644
Re: [Qemu-devel] [PULL 00/14] tcg-next patch queue
On 07/19/2017 10:33 AM, Philippe Mathieu-Daudé wrote: On 07/19/2017 04:45 PM, Peter Maydell wrote: The sparc-linux-user test fails: /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc -L ./gnemul/qemu-sparc sparc/ls -l dummyfile Inconsistency detected by ld.so: rtld.c: 858: dl_main: Assertion `_dl_rtld_map.l_prev->l_next == _dl_rtld_map.l_next' failed! Makefile:6: recipe for target 'test' failed A valgrind run produces a lot of noise, but this bit looks suspicious: ==14436== ==14436== Conditional jump or move depends on uninitialised value(s) ==14436==at 0x60003F7C: tcg_out_qemu_st_direct (tcg-target.inc.c:1733) ==14436==by 0x60004295: tcg_out_qemu_st (tcg-target.inc.c:1856) ==14436==by 0x60004F0C: tcg_out_op (tcg-target.inc.c:2140) ==14436==by 0x6000B0FF: tcg_reg_alloc_op (tcg.c:2360) ==14436==by 0x6000BCED: tcg_gen_code (tcg.c:2679) ==14436==by 0x600387B7: tb_gen_code (translate-all.c:1311) ==14436==by 0x6003637B: tb_find (cpu-exec.c:367) ==14436==by 0x60036A7C: cpu_exec (cpu-exec.c:675) ==14436==by 0x60039DA1: cpu_loop (main.c:1088) ==14436==by 0x6003B7AF: main (main.c:4860) ==14436== ==14436== Invalid write of size 4 ==14436==at 0x605114FA: ??? ==14436==by 0x6011ADDF: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc) ==14436==by 0x6253464F: ??? ==14436==by 0x6022852F: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc) ==14436==by 0x6022818C: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc) ==14436==by 0x6022852F: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc) ==14436==by 0x416: ??? ==14436==by 0x60227F1F: ??? 
(in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436== Address 0x59d1c7d0 is not stack'd, malloc'd or (recently) free'd
==14436==

Reverting "target/sparc: optimize gen_op_mulscc() using deposit op"
fixed this, so I think that's probably the culprit.

Thank you for taking time with valgrind, I'll verify sparc/tcg opcode used.

A simple typo, Phil,

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 56ef73c794..3bde47be83 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -633,7 +633,7 @@ static inline void gen_op_mulscc
     // b2 = T0 & 1;
     // env->y = (b2 << 31) | (env->y >> 1);
     tcg_gen_extract_tl(t0, cpu_y, 1, 31);
-    tcg_gen_deposit_tl(cpu_y, cpu_y, cpu_cc_src, 31, 1);
+    tcg_gen_deposit_tl(cpu_y, t0, cpu_cc_src, 31, 1);
 
     // b1 = N ^ V;
     gen_mov_reg_N(t0, cpu_psr);

I'll respin.


r~
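To see why the base operand matters in the one-line fix in this message, here is a small model in plain C. It is only a sketch: extract_model() and deposit_model() are stand-ins written for this note, not the TCG implementation, and it assumes a 32-bit Y register. The deposit keeps the base value except for len bits at ofs, which come from the low bits of the field operand.

```c
#include <assert.h>
#include <stdint.h>

/* len bits of arg starting at bit ofs. */
static uint32_t extract_model(uint32_t arg, unsigned ofs, unsigned len)
{
    assert(len > 0 && ofs < 32 && len <= 32 - ofs);
    return (arg >> ofs) & (UINT32_MAX >> (32 - len));
}

/* base with bits [ofs, ofs + len) replaced by the low len bits of field. */
static uint32_t deposit_model(uint32_t base, uint32_t field,
                              unsigned ofs, unsigned len)
{
    assert(len > 0 && ofs < 32 && len <= 32 - ofs);
    uint32_t mask = (UINT32_MAX >> (32 - len)) << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* What the comment asks for: y = (b2 << 31) | (y >> 1), b2 = src & 1. */
static uint32_t y_expected(uint32_t y, uint32_t cc_src)
{
    return ((cc_src & 1) << 31) | (y >> 1);
}

/* Fixed sequence: deposit into the shifted temporary t0. */
static uint32_t y_fixed(uint32_t y, uint32_t cc_src)
{
    uint32_t t0 = extract_model(y, 1, 31);
    return deposit_model(t0, cc_src, 31, 1);
}

/* Typo sequence: deposit into the unshifted y, so the shift is lost. */
static uint32_t y_typo(uint32_t y, uint32_t cc_src)
{
    return deposit_model(y, cc_src, 31, 1);
}
```

With y = 0x6 and cc_src = 1, the fixed sequence gives 0x80000003 as the comment requires, while the typo sequence gives 0x80000006 because the unshifted register was used as the base.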
Re: [Qemu-devel] [PULL 00/14] tcg-next patch queue
On 07/19/2017 04:45 PM, Peter Maydell wrote: The sparc-linux-user test fails: /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc -L ./gnemul/qemu-sparc sparc/ls -l dummyfile Inconsistency detected by ld.so: rtld.c: 858: dl_main: Assertion `_dl_rtld_map.l_prev->l_next == _dl_rtld_map.l_next' failed! Makefile:6: recipe for target 'test' failed A valgrind run produces a lot of noise, but this bit looks suspicious: ==14436== ==14436== Conditional jump or move depends on uninitialised value(s) ==14436==at 0x60003F7C: tcg_out_qemu_st_direct (tcg-target.inc.c:1733) ==14436==by 0x60004295: tcg_out_qemu_st (tcg-target.inc.c:1856) ==14436==by 0x60004F0C: tcg_out_op (tcg-target.inc.c:2140) ==14436==by 0x6000B0FF: tcg_reg_alloc_op (tcg.c:2360) ==14436==by 0x6000BCED: tcg_gen_code (tcg.c:2679) ==14436==by 0x600387B7: tb_gen_code (translate-all.c:1311) ==14436==by 0x6003637B: tb_find (cpu-exec.c:367) ==14436==by 0x60036A7C: cpu_exec (cpu-exec.c:675) ==14436==by 0x60039DA1: cpu_loop (main.c:1088) ==14436==by 0x6003B7AF: main (main.c:4860) ==14436== ==14436== Invalid write of size 4 ==14436==at 0x605114FA: ??? ==14436==by 0x6011ADDF: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc) ==14436==by 0x6253464F: ??? ==14436==by 0x6022852F: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc) ==14436==by 0x6022818C: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc) ==14436==by 0x6022852F: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc) ==14436==by 0x416: ??? ==14436==by 0x60227F1F: ??? 
(in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc) ==14436== Address 0x59d1c7d0 is not stack'd, malloc'd or (recently) free'd ==14436== Reverting "target/sparc: optimize gen_op_mulscc() using deposit op" fixed this, so I think that's probably the culprit. Thank you for taking time with valgrind, I'll verify sparc/tcg opcode used. Phil.
Re: [Qemu-devel] [PULL v2 00/18] Merge crypto 2017/07/18
On 19 July 2017 at 10:15, Daniel P. Berrange wrote:
> The following changes since commit 6887dc6700ccb7820d8a9d370f421ee361c748e8:
>
>   Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20170718' into staging (2017-07-18 21:13:48 +0100)
>
> are available in the git repository at:
>
>   git://github.com/berrange/qemu tags/pull-qcrypto-2017-07-18-2
>
> for you to fetch changes up to c7a9af4b450c863cd84ad245ebc52a831c661392:
>
>   tests: crypto: add hmac speed benchmark support (2017-07-19 10:11:05 +0100)
>
> Merge qcrypto 2017/07/18 v2

Applied, thanks.

-- PMM
[Qemu-devel] [PATCH 3/4] GRETAP Backend for UDST
From: Anton IvanovGRETAP Backend for Universal Datagram Socket Transport Signed-off-by: Anton Ivanov --- net/Makefile.objs | 2 +- net/clients.h | 4 + net/gre.c | 311 ++ net/net.c | 1 + qapi-schema.json | 41 ++- qemu-options.hx | 60 ++- 6 files changed, 414 insertions(+), 5 deletions(-) create mode 100644 net/gre.c diff --git a/net/Makefile.objs b/net/Makefile.objs index ffdfb96bd0..919bc3d78f 100644 --- a/net/Makefile.objs +++ b/net/Makefile.objs @@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o common-obj-y += socket.o common-obj-y += dump.o common-obj-y += eth.o -common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o +common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o gre.o common-obj-$(CONFIG_POSIX) += vhost-user.o common-obj-$(CONFIG_SLIRP) += slirp.o common-obj-$(CONFIG_VDE) += vde.o diff --git a/net/clients.h b/net/clients.h index 5cae479730..8f8a59aee3 100644 --- a/net/clients.h +++ b/net/clients.h @@ -49,6 +49,10 @@ int net_init_bridge(const Netdev *netdev, const char *name, int net_init_l2tpv3(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp); + +int net_init_gre(const Netdev *netdev, const char *name, +NetClientState *peer, Error **errp); + #ifdef CONFIG_VDE int net_init_vde(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp); diff --git a/net/gre.c b/net/gre.c new file mode 100644 index 00..7734d78102 --- /dev/null +++ b/net/gre.c @@ -0,0 +1,311 @@ +/* + * QEMU System Emulator + * + * Copyright (c) 2015-2017 Cambridge GREys Limited + * Copyright (c) 2003-2008 Fabrice Bellard + * Copyright (c) 2012-2014 Cisco Systems + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the 
Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" +#include +#include +#include "net/net.h" +#include "clients.h" +#include "qemu-common.h" +#include "qemu/error-report.h" +#include "qapi/error.h" +#include "qemu/option.h" +#include "qemu/sockets.h" +#include "qemu/iov.h" +#include "qemu/main-loop.h" +#include "udst.h" + +/* IANA-assigned IP protocol ID for GRE */ + + +#ifndef IPPROTO_GRE +#define IPPROTO_GRE 0x2F +#endif + +#define GRE_MODE_CHECKSUM htons(8 << 12) /* checksum */ +#define GRE_MODE_RESERVED htons(4 << 12) /* unused */ +#define GRE_MODE_KEY htons(2 << 12) /* KEY present */ +#define GRE_MODE_SEQUENCE htons(1 << 12) /* no sequence */ + + +/* GRE TYPE for Ethernet in GRE aka GRETAP */ + +#define GRE_IRB htons(0x6558) + +struct gre_minimal_header { + uint16_t header; + uint16_t arptype; +}; + +typedef struct GRETunnelParams { +/* + * GRE parameters + */ + +uint32_t rx_key; +uint32_t tx_key; +uint32_t sequence; + +/* Flags */ + +bool ipv6; +bool udp; +bool has_sequence; +bool pin_sequence; +bool checksum; +bool key; + +/* Precomputed GRE specific offsets */ + +uint32_t key_offset; +uint32_t sequence_offset; +uint32_t checksum_offset; + +struct gre_minimal_header header_bits; + +} GRETunnelParams; + + + +static void gre_form_header(void *us) +{ +NetUdstState *s = (NetUdstState *) 
us;
+    GRETunnelParams *p = (GRETunnelParams *) s->params;
+
+    uint32_t *sequence;
+
+    *((uint32_t *) s->header_buf) = *((uint32_t *) &p->header_bits);
+
+    if (p->key) {
+        stl_be_p(
+            (uint32_t *) (s->header_buf + p->key_offset),
+            p->tx_key
+        );
+    }
+    if (p->has_sequence) {
+        sequence = (uint32_t *)(s->header_buf + p->sequence_offset);
+        if (p->pin_sequence) {
+            *sequence = 0;
+        } else {
+            stl_be_p(sequence, ++p->sequence);
+        }
+    }
+}
+
+static int gre_verify_header(void *us, uint8_t *buf)
+{
+
+    NetUdstState *s = (NetUdstState
[Qemu-devel] [PATCH 4/4] Raw Backend for UDST
From: Anton IvanovRaw Socket Backend for Universal Datagram Socket Transport Signed-off-by: Anton Ivanov --- net/Makefile.objs | 2 +- net/clients.h | 3 ++ net/net.c | 1 + net/raw.c | 123 ++ qapi-schema.json | 20 - qemu-options.hx | 32 ++ 6 files changed, 178 insertions(+), 3 deletions(-) create mode 100644 net/raw.c diff --git a/net/Makefile.objs b/net/Makefile.objs index 919bc3d78f..457297b5ed 100644 --- a/net/Makefile.objs +++ b/net/Makefile.objs @@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o common-obj-y += socket.o common-obj-y += dump.o common-obj-y += eth.o -common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o gre.o +common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o gre.o raw.o common-obj-$(CONFIG_POSIX) += vhost-user.o common-obj-$(CONFIG_SLIRP) += slirp.o common-obj-$(CONFIG_VDE) += vde.o diff --git a/net/clients.h b/net/clients.h index 8f8a59aee3..98d8ae59b7 100644 --- a/net/clients.h +++ b/net/clients.h @@ -53,6 +53,9 @@ int net_init_l2tpv3(const Netdev *netdev, const char *name, int net_init_gre(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp); +int net_init_raw(const Netdev *netdev, const char *name, +NetClientState *peer, Error **errp); + #ifdef CONFIG_VDE int net_init_vde(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp); diff --git a/net/net.c b/net/net.c index 6163a8a3af..8eb0aa2bee 100644 --- a/net/net.c +++ b/net/net.c @@ -963,6 +963,7 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])( #ifdef CONFIG_UDST [NET_CLIENT_DRIVER_L2TPV3] = net_init_l2tpv3, [NET_CLIENT_DRIVER_GRE] = net_init_gre, +[NET_CLIENT_DRIVER_RAW] = net_init_raw, #endif }; diff --git a/net/raw.c b/net/raw.c new file mode 100644 index 00..8f73248095 --- /dev/null +++ b/net/raw.c @@ -0,0 +1,123 @@ +/* + * QEMU System Emulator + * + * Copyright (c) 2015-2017 Cambridge Greys Limited + * Copyright (c) 2003-2008 Fabrice Bellard + * Copyright (c) 2012-2014 Cisco Systems + * + * Permission is hereby granted, 
free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. 
+ */
+
+#include "qemu/osdep.h"
+#include
+#include
+#include
+#include
+#include "net/net.h"
+#include
+#include
+#include
+#include "clients.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "udst.h"
+
+static int noop(void *us, uint8_t *buf)
+{
+    return 0;
+}
+
+int net_init_raw(const Netdev *netdev,
+                 const char *name,
+                 NetClientState *peer, Error **errp)
+{
+
+    const NetdevRawOptions *raw;
+    NetUdstState *s;
+    NetClientState *nc;
+
+    int fd = -1;
+    int err;
+
+    struct ifreq ifr;
+    struct sockaddr_ll sock;
+
+
+    nc = qemu_new_udst_net_client(name, peer);
+
+    s = DO_UPCAST(NetUdstState, nc, nc);
+
+    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+    if (fd == -1) {
+        err = -errno;
+        error_report("raw_open : raw socket creation failed, errno = %d", -err);
+        goto outerr;
+    }
+
+
+    s->dgram_dst = NULL;
+    s->dst_size = 0;
+
+    assert(netdev->type == NET_CLIENT_DRIVER_RAW);
+    raw = &netdev->u.raw;
+
+    memset(&ifr, 0, sizeof(struct ifreq));
+    strncpy((char *) &ifr.ifr_name, raw->ifname, sizeof(ifr.ifr_name) - 1);
+
+    if (ioctl(fd, SIOCGIFINDEX, (void *) &ifr) < 0) {
+        err = -errno;
+        error_report("SIOCGIFINDEX, failed to get raw interface index for %s",
+                     raw->ifname);
+        goto outerr;
+    }
+
+    sock.sll_family = AF_PACKET;
+    sock.sll_protocol = htons(ETH_P_ALL);
+    sock.sll_ifindex = ifr.ifr_ifindex;
+
+    if (bind(fd, (struct
[Qemu-devel] [PATCH 2/4] Migrate l2tpv3 to UDST Backend
From: Anton Ivanov

Migrate L2TPv3 transport to the Unified Datagram Socket
Transport Backend.

Signed-off-by: Anton Ivanov
---
 net/l2tpv3.c | 537 +--
 1 file changed, 83 insertions(+), 454 deletions(-)

diff --git a/net/l2tpv3.c b/net/l2tpv3.c
index 6745b78990..25b7628244 100644
--- a/net/l2tpv3.c
+++ b/net/l2tpv3.c
@@ -1,6 +1,7 @@
 /*
  * QEMU System Emulator
  *
+ * Copyright (c) 2015-2017 Cambridge Greys Limited
  * Copyright (c) 2003-2008 Fabrice Bellard
  * Copyright (c) 2012-2014 Cisco Systems
  *
@@ -30,23 +31,14 @@
 #include "clients.h"
 #include "qemu-common.h"
 #include "qemu/error-report.h"
+#include "qapi/error.h"
 #include "qemu/option.h"
 #include "qemu/sockets.h"
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
+#include "udst.h"
 
-/* The buffer size needs to be investigated for optimum numbers and
- * optimum means of paging in on different systems. This size is
- * chosen to be sufficient to accommodate one packet with some headers
- */
-
-#define BUFFER_ALIGN sysconf(_SC_PAGESIZE)
-#define BUFFER_SIZE 2048
-#define IOVSIZE 2
-#define MAX_L2TPV3_MSGCNT 64
-#define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
-
 /* Header set to 0x3 signifies a data packet */
 #define L2TPV3_DATA_PACKET 0x3
 
@@ -57,31 +49,7 @@
 #define IPPROTO_L2TP 0x73
 #endif
 
-typedef struct NetL2TPV3State {
-    NetClientState nc;
-    int fd;
-
-    /*
-     * these are used for xmit - that happens packet a time
-     * and for first sign of life packet (easier to parse that once)
-     */
-
-    uint8_t *header_buf;
-    struct iovec *vec;
-
-    /*
-     * these are used for receive - try to "eat" up to 32 packets at a time
-     */
-
-    struct mmsghdr *msgvec;
-
-    /*
-     * peer address
-     */
-
-    struct sockaddr_storage *dgram_dst;
-    uint32_t dst_size;
-
+typedef struct L2TPV3TunnelParams {
     /*
      * L2TPv3 parameters
      */
@@ -90,37 +58,8 @@ typedef struct NetL2TPV3State {
     uint64_t tx_cookie;
     uint32_t rx_session;
     uint32_t tx_session;
-    uint32_t header_size;
     uint32_t counter;
 
-    /*
-    * DOS avoidance in error handling
-    */
-
-    bool header_mismatch;
-
-    /*
-     * Ring buffer handling
-     */
-
-    int queue_head;
-    int queue_tail;
-    int queue_depth;
-
-    /*
-     * Precomputed offsets
-     */
-
-    uint32_t offset;
-    uint32_t cookie_offset;
-    uint32_t counter_offset;
-    uint32_t session_offset;
-
-    /* Poll Control */
-
-    bool read_poll;
-    bool write_poll;
-
     /* Flags */
 
     bool ipv6;
@@ -130,189 +69,62 @@ typedef struct NetL2TPV3State {
     bool cookie;
     bool cookie_is_64;
 
-} NetL2TPV3State;
-
-static void net_l2tpv3_send(void *opaque);
-static void l2tpv3_writable(void *opaque);
-
-static void l2tpv3_update_fd_handler(NetL2TPV3State *s)
-{
-    qemu_set_fd_handler(s->fd,
-                        s->read_poll ? net_l2tpv3_send : NULL,
-                        s->write_poll ? l2tpv3_writable : NULL,
-                        s);
-}
-
-static void l2tpv3_read_poll(NetL2TPV3State *s, bool enable)
-{
-    if (s->read_poll != enable) {
-        s->read_poll = enable;
-        l2tpv3_update_fd_handler(s);
-    }
-}
+    /* Precomputed L2TPV3 specific offsets */
+    uint32_t cookie_offset;
+    uint32_t counter_offset;
+    uint32_t session_offset;
 
-static void l2tpv3_write_poll(NetL2TPV3State *s, bool enable)
-{
-    if (s->write_poll != enable) {
-        s->write_poll = enable;
-        l2tpv3_update_fd_handler(s);
-    }
-}
+} L2TPV3TunnelParams;
 
-static void l2tpv3_writable(void *opaque)
-{
-    NetL2TPV3State *s = opaque;
-    l2tpv3_write_poll(s, false);
-    qemu_flush_queued_packets(&s->nc);
-}
 
-static void l2tpv3_send_completed(NetClientState *nc, ssize_t len)
-{
-    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
-    l2tpv3_read_poll(s, true);
-}
 
-static void l2tpv3_poll(NetClientState *nc, bool enable)
+static void l2tpv3_form_header(void *us)
 {
-    NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
-    l2tpv3_write_poll(s, enable);
-    l2tpv3_read_poll(s, enable);
-}
+    NetUdstState *s = (NetUdstState *) us;
+    L2TPV3TunnelParams *p = (L2TPV3TunnelParams *) s->params;
 
-static void l2tpv3_form_header(NetL2TPV3State *s)
-{
     uint32_t *counter;
 
-    if (s->udp) {
+    if (p->udp) {
         stl_be_p((uint32_t *) s->header_buf, L2TPV3_DATA_PACKET);
     }
     stl_be_p(
-            (uint32_t *) (s->header_buf + s->session_offset),
-            s->tx_session
+            (uint32_t *) (s->header_buf + p->session_offset),
+            p->tx_session
         );
-    if (s->cookie) {
-        if (s->cookie_is_64) {
+    if (p->cookie) {
+        if (p->cookie_is_64) {
             stq_be_p(
-                (uint64_t *)(s->header_buf + s->cookie_offset),
-                s->tx_cookie
+                (uint64_t *)(s->header_buf +
[Qemu-devel] Revised Unified Datagram Socket Transport patchset
Hi Jason, hi list,

A revised patchset follows. I have addressed most of the comments.

TODO: replace memcpy with dup where applicable
TODO: add a force v4 option
TODO: port the UDP portion of the existing socket transport to the new
      infrastructure

Future: add sendmmsg once a "bulk xmit" has been arranged on the QEMU hw
        and/or lower network subsystem layers side.
[Qemu-devel] [PATCH 1/4] Unified Datagram Socket Transports
From: Anton Ivanov

Basic infrastructure to start moving datagram based transports
to a common infrastructure as well as introduce several
additional transports.

Signed-off-by: Anton Ivanov
---
 configure         |  12 +-
 net/Makefile.objs |   2 +-
 net/net.c         |   4 +-
 net/udst.c        | 420 ++
 net/udst.h        | 121 
 qapi-schema.json  |  19 ++-
 qemu-options.hx   |   2 +-
 7 files changed, 569 insertions(+), 11 deletions(-)
 create mode 100644 net/udst.c
 create mode 100644 net/udst.h

diff --git a/configure b/configure
index bad50f5368..00c911c49b 100755
--- a/configure
+++ b/configure
@@ -1862,7 +1862,9 @@ if ! compile_object -Werror ; then
 fi
 
 ##
-# L2TPV3 probe
+# UDST probe
+# identical to L2TPv3 probe used for both
+# during migration of L2TPv3 to udst backend
 
 cat > $TMPC <
@@ -1870,9 +1872,9 @@ cat > $TMPC <> $config_host_mak
 fi
-if test "$l2tpv3" = "yes" ; then
-  echo "CONFIG_L2TPV3=y" >> $config_host_mak
+if test "$udst" = "yes" ; then
+  echo "CONFIG_UDST=y" >> $config_host_mak
 fi
 if test "$cap_ng" = "yes" ; then
   echo "CONFIG_LIBCAP=y" >> $config_host_mak
diff --git a/net/Makefile.objs b/net/Makefile.objs
index 67ba5e26fb..ffdfb96bd0 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
-common-obj-$(CONFIG_L2TPV3) += l2tpv3.o
+common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o
 common-obj-$(CONFIG_POSIX) += vhost-user.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
diff --git a/net/net.c b/net/net.c
index 0e28099554..723a256260 100644
--- a/net/net.c
+++ b/net/net.c
@@ -960,8 +960,8 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #ifdef CONFIG_VHOST_NET_USED
     [NET_CLIENT_DRIVER_VHOST_USER] = net_init_vhost_user,
 #endif
-#ifdef CONFIG_L2TPV3
-    [NET_CLIENT_DRIVER_L2TPV3]= net_init_l2tpv3,
+#ifdef CONFIG_UDST
+    [NET_CLIENT_DRIVER_L2TPV3] = net_init_l2tpv3,
 #endif
 };
 
diff --git a/net/udst.c b/net/udst.c
new file mode 100644
index 00..612c90cb3a
--- /dev/null
+++ b/net/udst.c
@@ -0,0 +1,420 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2015-2017 Cambridge Greys Limited
+ * Copyright (c) 2012-2014 Cisco Systems
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * Udst Datagram Socket Transport Backend
+ * This transport is not intended to be initiated directly by an end-user
+ * It is used as a backend for other transports which use recv/sendmmsg
+ * socket functions for RX/TX.
+ */
+
+#include "qemu/osdep.h"
+#include
+#include
+#include "net/net.h"
+#include "clients.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "udst.h"
+
+static void net_udst_send(void *opaque);
+static void udst_writable(void *opaque);
+
+static void udst_update_fd_handler(NetUdstState *s)
+{
+    qemu_set_fd_handler(s->fd,
+                        s->read_poll ? net_udst_send : NULL,
+                        s->write_poll ? udst_writable : NULL,
+                        s);
+}
+
+static void udst_read_poll(NetUdstState *s, bool enable)
+{
+    if (s->read_poll != enable) {
+        s->read_poll = enable;
+        udst_update_fd_handler(s);
+    }
+}
+
+static void udst_write_poll(NetUdstState *s, bool enable)
+{
+    if (s->write_poll != enable) {
+        s->write_poll = enable;
+        udst_update_fd_handler(s);
+    }
+}
+
+static void udst_writable(void *opaque)
+{
+    NetUdstState *s = opaque;
+    udst_write_poll(s, false);
+    qemu_flush_queued_packets(&s->nc);
+}
+
+static void
Re: [Qemu-devel] [PATCH v2] hmp: allow cpu index for "info lapic"
On Wed, Jul 19, 2017 at 08:17:49PM +0100, Dr. David Alan Gilbert wrote:
> * Eduardo Habkost (ehabk...@redhat.com) wrote:
> > On Wed, Jul 19, 2017 at 10:17:36AM -0500, Eric Blake wrote:
> > > On 07/19/2017 10:07 AM, Daniel P. Berrange wrote:
> > > >> It doesn't. Perhaps we should add that as a future libvirt-qemu.so API
> > > >> addition, although it's probably easier to just use QMP than HMP when
> > > >> using 'virsh qemu-monitor-command' if HMP doesn't do what you want.
> > > >
> > > > Or special case the "cpu 1" command - ie notice that it is being
> > > > requested and don't execute 'human-monitor-command'. Instead just
> > > > record the CPU index, and use that for future "human-monitor-command"
> > > > invocations, so we get full compat with the (dubious) stateful HMP
> > > > semantics that traditionally existed.
> > >
> > > Is 'cpu' (and the followup commands affected by it) the only stateful
> > > HMP command pairing? Is there a way to specify multiple HMP commands in
> > > a single human-monitor-command QMP call?
> > >
> > > Indeed, tweaking qemu's human-monitor-command call to track the state
> > > might be cleaner than having libvirt have to tweak API to work around
> > > this wart of HMP.
> >
> > The CPU index was the only state kept by the human monitor, and I
> > think it's by design that it stopped being considered "monitor
> > state" to be tracked, and became just an argument to
> > human-monitor-command.
> >
> > It's true that it broke compatibility of
> > "virsh qemu-monitor-command --hmp 'cpu '",
> > when we moved to QMP, but this happened years ago, and it looks
> > like nobody was relying on it. I don't see the point of trying
> > to emulate the previous stateful interface.
>
> IMHO Yi's fix (once reworked) is the right fix - it removes the
> use of that piece of state, when the optional parameter is used.
> (OK, so it needs rework not to change that state and to
> come to some agreement as to what to use instead of cpu index number
> etc).

Agreed, as it helps us to keep the "virsh qemu-monitor-command"
interface simpler. But we have 8 commands that use mon_get_cpu(), we
shouldn't fix only "info lapic".

--
Eduardo
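
For reference, the stateless form discussed above (the CPU index passed as an argument rather than kept as monitor state) can be sketched as a QMP request. This is only an illustration: hmp_request() is a hypothetical helper, not part of QEMU or libvirt, and it merely builds the JSON for human-monitor-command with its optional cpu-index argument.

```python
import json


def hmp_request(command_line, cpu_index=None):
    """Build a QMP 'human-monitor-command' request (hypothetical helper).

    The CPU index travels with each request instead of being remembered
    by the monitor between commands.
    """
    arguments = {"command-line": command_line}
    if cpu_index is not None:
        # 'cpu-index' is optional; omit it to use the monitor's default CPU.
        arguments["cpu-index"] = cpu_index
    return json.dumps({"execute": "human-monitor-command",
                       "arguments": arguments})


if __name__ == "__main__":
    print(hmp_request("info lapic", cpu_index=1))
```

The resulting string is what one would pass to 'virsh qemu-monitor-command' without --hmp, so no "cpu N" command has to be issued first.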
Re: [Qemu-devel] [PULL 00/14] tcg-next patch queue
On 19 July 2017 at 05:57, Richard Henderson wrote:
> This edition is a real mix:
>   * Code gen improvement for mips64 host (Jiang)
>   * Build fix for ppc-linux (Philippe)
>   * Runtime fix for tci (Philippe)
>   * Fix atomic helper names in debugging dumps (rth)
>
>   * Cross-target tcg code gen improvements (Philippe)
>     This one had no obvious tree through which it should go,
>     so I went ahead and took them all.
>
>   * Cherry-picked the first patch from Lluis' generic translate loop,
>     wherein the interface to gen_intermediate_code changes trivially.
>     It's the only patch from that series that touches all targets,
>     and I see little point carrying it around further.
>
>
> r~
>
>
> The following changes since commit 6887dc6700ccb7820d8a9d370f421ee361c748e8:
>
>   Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20170718' into staging (2017-07-18 21:13:48 +0100)
>
> are available in the git repository at:
>
>   git://github.com/rth7680/qemu.git tags/pull-tcg-20170718
>
> for you to fetch changes up to 3d48caee9e2c18385be60bb0467fa1f61d325c64:
>
>   tcg: Pass generic CPUState to gen_intermediate_code() (2017-07-18 14:26:13 -1000)
>
>
> Queued tcg and tcg code gen related cleanups
>

The sparc-linux-user test fails:

/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc -L ./gnemul/qemu-sparc sparc/ls -l dummyfile
Inconsistency detected by ld.so: rtld.c: 858: dl_main: Assertion `_dl_rtld_map.l_prev->l_next == _dl_rtld_map.l_next' failed!
Makefile:6: recipe for target 'test' failed

A valgrind run produces a lot of noise, but this bit looks suspicious:

==14436==
==14436== Conditional jump or move depends on uninitialised value(s)
==14436==    at 0x60003F7C: tcg_out_qemu_st_direct (tcg-target.inc.c:1733)
==14436==    by 0x60004295: tcg_out_qemu_st (tcg-target.inc.c:1856)
==14436==    by 0x60004F0C: tcg_out_op (tcg-target.inc.c:2140)
==14436==    by 0x6000B0FF: tcg_reg_alloc_op (tcg.c:2360)
==14436==    by 0x6000BCED: tcg_gen_code (tcg.c:2679)
==14436==    by 0x600387B7: tb_gen_code (translate-all.c:1311)
==14436==    by 0x6003637B: tb_find (cpu-exec.c:367)
==14436==    by 0x60036A7C: cpu_exec (cpu-exec.c:675)
==14436==    by 0x60039DA1: cpu_loop (main.c:1088)
==14436==    by 0x6003B7AF: main (main.c:4860)
==14436==
==14436== Invalid write of size 4
==14436==    at 0x605114FA: ???
==14436==    by 0x6011ADDF: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==    by 0x6253464F: ???
==14436==    by 0x6022852F: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==    by 0x6022818C: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==    by 0x6022852F: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==    by 0x416: ???
==14436==    by 0x60227F1F: ??? (in /home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==  Address 0x59d1c7d0 is not stack'd, malloc'd or (recently) free'd
==14436==

Reverting "target/sparc: optimize gen_op_mulscc() using deposit op"
fixed this, so I think that's probably the culprit.

thanks
-- PMM
Re: [Qemu-devel] Fwd: [RFC PATCH 0/2] Allow RedHat PCI bridges reserve more buses than necessary during init
On 19/07/2017 21:56, Konrad Rzeszutek Wilk wrote: On Wed, Jul 19, 2017 at 09:38:50PM +0300, Alexander Bezzubikov wrote: 2017-07-19 21:18 GMT+03:00 Konrad Rzeszutek Wilk: On Wed, Jul 19, 2017 at 05:14:41PM +, Alexander Bezzubikov wrote: ср, 19 июля 2017 г. в 16:57, Konrad Rzeszutek Wilk < konrad.w...@oracle.com>: On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote: Now PCI bridges (and PCIE root port too) get a bus range number in system init, basing on currently plugged devices. That's why when one wants to hotplug another bridge, it needs his child bus, which the parent is unable to provide. Could you explain how you trigger this? I'm trying to hot plug pcie-pci bridge into pcie root port, and Linux says 'cannot allocate bus number for device bla-bla'. This obviously does not allow me to use the bridge at all. The suggested workaround is to have vendor-specific capability in RedHat generic pcie-root-port that contains number of additional bus to reserve on BIOS PCI init. But wouldn't the proper fix be for the PCI bridge to have the subordinate value be extended to fit more bus ranges? What do you mean? This is what I'm trying to do. Do you suppose to get rid of vendor-specific cap and use original register value instead of it? I would suggest a simple fix - each bridge has a a number of bus devices it can use. You have up to 255 - so you split the number of northbridge numbers by the amount of NUMA nodes (if that is used) - so for example if you have 4 NUMA nodes, each bridge would cover 63 bus numbers. Meaning the root bridge would cover 0->63 bus, 64->128, and so on. That gives you enough space to plug in your plugged in devices (up to 63). And if you need sub-briges then carve out a specific range. Hi Konrad, The problem is that we don't know at the init moment how many subbridges we may need, Is possible the explanation was not clear clear and led to some miscommunication. And the explanation above does not either. 
It just setups at init time an range where you can plug in your new devices in. But in a more uniform way such that you can also utilize this with NUMA and _PXM topology in the future. I fully agree with you and actually QEMU has already implemented the exact idea you are describing here, its called a pxb/pxb-pci device, that can be "bounded" to a specific NUMA node and has a subrange of bus numbers dedicated to it. However this problem is different. In a PCI Express machine you can hotplug PCIe devices only into PCIe Root Ports (or switch downstream ports, but not in current scope). We want to be able to hotplug a PCIe-PCI bridge into a PCIe Root Port so we can then hot-plug legacy PCI devices. Since the PCIe Root Port is a type of PCI bridge, at boot time it only gets the bus sub-range (primary bus,subordinate bus] which is computed by firmware and leaves no bus number that can be used by a hot-plugged pci-bridge. And this obviously does not depend on how we arrange NUMA/proximities. We are also not looking for a fix for a specific guest OS, so reserving some extra bus-numbers it has minimal impact on the system. I do agree the problem may be solved differently, however we can't reach all guest OS vendors and ask them to support an alternative solution in a reasonable time frame. Thanks, Marcel and how deep the whole device tree will be. The key moment - PCI bridge hotplugging needs either rescan all buses on each bridge device addition, or reserve space in advance during BIOS init. can all buses on each bridge device addition, or reserve It is more complex than that - you may need to move devices that are below you. And Linux kernel (nor any other OS) can handle that. (They can during bootup) In this series the second way was chosen. 
Aleksandr Bezzubikov (2): pci: add support for direct usage of bdf for capability lookup pci: enable RedHat pci bridges to reserve more buses src/fw/pciinit.c | 12 ++-- src/hw/pcidevice.c | 24 src/hw/pcidevice.h | 1 + 3 files changed, 35 insertions(+), 2 deletions(-) -- 2.7.4 -- Alexander Bezzubikov -- Alexander Bezzubikov
Re: [Qemu-devel] [PATCH v2] hmp: allow cpu index for "info lapic"
* Eduardo Habkost (ehabk...@redhat.com) wrote: > On Wed, Jul 19, 2017 at 10:17:36AM -0500, Eric Blake wrote: > > On 07/19/2017 10:07 AM, Daniel P. Berrange wrote: > > >> It doesn't. Perhaps we should add that as a future libvirt-qemu.so API > > >> addition, although it's probably easier to just use QMP than HMP when > > >> using 'virsh qemu-monitor-command' if HMP doesn't do what you want. > > > > > > Or special case the "cpu 1" command - ie notice that it is being > > > requested and don't execute 'human-montor-command'. Instead just > > > record the CPU index, and use that for future "human-monitor-command" > > > invokations, so we get full compat with the (dubious) stateful HMP > > > semantics that traditionally existed. > > > > Is 'cpu' (and the followup commands affected by it) the only stateful > > HMP command pairing? Is there a way to specify multiple HMP commands in > > a single human-monitor-command QMP call? > > > > Indeed, tweaking qemu's human-monitor-command call to track the state > > might be cleaner than having libvirt have to tweak API to work around > > this wart of HMP. > > The CPU index was the only state kept by the human monitor, and I > think it's by design that it stopped being considered "monitor > state" to be tracked, and became just an argument to > human-monitor-command. > > It's true that it broke compatibility of > "virsh qemu-monitor-command --hmp 'cpu '", > when we moved to QMP, but this happened years ago, and it looks > like nobody was relying on it. I don't see the point of trying > to emulate the previous stateful interface. IMHO Yi's fix (once reworked) is the right fix - it removes the use of that piece of state, when the optional parameter is used. (OK, so it needs rework not to change that state and to come to some agreement as to what to use instead of cpu index number etc). Dave > -- > Eduardo -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] Fwd: [RFC PATCH 0/2] Allow RedHat PCI bridges reserve more buses than necessary during init
On Wed, Jul 19, 2017 at 09:38:50PM +0300, Alexander Bezzubikov wrote: > 2017-07-19 21:18 GMT+03:00 Konrad Rzeszutek Wilk: > > > On Wed, Jul 19, 2017 at 05:14:41PM +, Alexander Bezzubikov wrote: > > > ср, 19 июля 2017 г. в 16:57, Konrad Rzeszutek Wilk < > > konrad.w...@oracle.com>: > > > > > > > On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote: > > > > > Now PCI bridges (and PCIE root port too) get a bus range number in > > > > system init, > > > > > basing on currently plugged devices. That's why when one wants to > > > > hotplug another bridge, > > > > > it needs his child bus, which the parent is unable to provide. > > > > > > > > Could you explain how you trigger this? > > > > > > > > > I'm trying to hot plug pcie-pci bridge into pcie root port, and Linux > > says > > > 'cannot allocate bus number for device bla-bla'. This obviously does not > > > allow me to use the bridge at all. > > > > > > > > > > > > > > > > The suggested workaround is to have vendor-specific capability in > > RedHat > > > > generic pcie-root-port > > > > > that contains number of additional bus to reserve on BIOS PCI init. > > > > > > > > But wouldn't the proper fix be for the PCI bridge to have the > > subordinate > > > > value be extended to fit more bus ranges? > > > > > > > > > What do you mean? This is what I'm trying to do. Do you suppose to get > > rid > > > of vendor-specific cap and use original register value instead of it? > > > > I would suggest a simple fix - each bridge has a a number of bus devices > > it can use. You have up to 255 - so you split the number of northbridge > > numbers by the amount of NUMA nodes (if that is used) - so for example > > if you have 4 NUMA nodes, each bridge would cover 63 bus numbers. > > > > Meaning the root bridge would cover 0->63 bus, 64->128, and so on. > > That gives you enough space to plug in your plugged in devices > > (up to 63). > > > > And if you need sub-briges then carve out a specific range. 
> > > > The problem is that we don't know at the init moment how many subbridges we > may need, And the explanation above does not either. It just setups at init time an range where you can plug in your new devices in. But in a more uniform way such that you can also utilize this with NUMA and _PXM topology in the future. > and how deep the whole device tree will be. The key moment - PCI bridge > hotplugging > needs either rescan all buses on each bridge device addition, or reserve > space in advance during BIOS init. can all buses on each bridge device addition, or reserve It is more complex than that - you may need to move devices that are below you. And Linux kernel (nor any other OS) can handle that. (They can during bootup) > In this series the second way was chosen. > > > > > > > > > > > > > > > > > > > > > > > Aleksandr Bezzubikov (2): > > > > > pci: add support for direct usage of bdf for capability lookup > > > > > pci: enable RedHat pci bridges to reserve more buses > > > > > > > > > > src/fw/pciinit.c | 12 ++-- > > > > > src/hw/pcidevice.c | 24 > > > > > src/hw/pcidevice.h | 1 + > > > > > 3 files changed, 35 insertions(+), 2 deletions(-) > > > > > > > > > > -- > > > > > 2.7.4 > > > > > > > > > > > > > > > > > -- > > > Alexander Bezzubikov > > > > > > -- > Alexander Bezzubikov
Re: [Qemu-devel] [PATCH v2 1/3] qemu.py: fix is_running()
On Wed, Jul 19, 2017 at 03:34:47PM -0300, Eduardo Habkost wrote:
> On Wed, Jul 19, 2017 at 06:31:06PM +0200, Amador Pahim wrote:
> > Current implementation is broken. It does not really test if the child
> > process is running.
> >
> > Popen.returncode is only set by a call to poll(), wait() or
> > communicate(). If the Popen fails to launch a VM, Popen.returncode
> > will not change from None by itself.
> >
> > Instead of using Popen.returncode, let's use Popen.poll(), which
> > actually checks if child process has terminated.
> >
> > Signed-off-by: Amador Pahim
>
> I vaguely remember I had a version of that code using poll() and
> it broke scripts for some reason. I will try to find out why, so
> we can either fix the script or document the reason why poll()
> isn't a good choice here.

Thanks to git reflog, I found the original "fix" I had in my WIP
tree:

  251fc73 work/device-crash-script@{71}: commit: fixup! qemu.py: Don't set _popen=None on error/shutdown

  diff --git a/scripts/qemu.py b/scripts/qemu.py
  index 4dae811..cbc9e2a 100644
  --- a/scripts/qemu.py
  +++ b/scripts/qemu.py
  @@ -86,7 +86,7 @@ class QEMUMachine(object):
                   raise

       def is_running(self):
  -        return self._popen and (self._popen.poll() is None)
  +        return self._popen and (self._popen.returncode is None)

       def exitcode(self):
           if self._popen:
  @@ -137,6 +137,7 @@ class QEMUMachine(object):
           except:
               if self.is_running():
                   self._popen.kill()
  +                self._popen.wait()
               self._load_io_log()
               self._post_shutdown()
               raise

The original bug was like this: if the QEMU process took a little longer
to actually terminate after self._popen.kill() was called, it triggered
the post-shutdown code inside shutdown() (because is_running() was still
True), causing the following exception:

  Traceback (most recent call last):
    File "./scripts/device-crash-test.py", line 528, in <module>
      sys.exit(main())
    File "./scripts/device-crash-test.py", line 487, in main
      f = checkOneCase(args, t)
    File "./scripts/device-crash-test.py", line 320, in checkOneCase
      vm.shutdown()
    File "/home/ehabkost/rh/proj/virt/qemu/scripts/qemu.py", line 156, in shutdown
      self._load_io_log()
    File "/home/ehabkost/rh/proj/virt/qemu/scripts/qemu.py", line 101, in _load_io_log
      with open(self._qemu_log_path, "r") as fh:
  IOError: [Errno 2] No such file or directory: '/var/tmp/qemu-23568.log'

My fix was incorrect: the actual bug was the missing
self._popen.wait() call after self._popen.kill(), not the
self._popen.poll() call.

Your fix looks good and device-crash-test is not crashing.

Reviewed-by: Eduardo Habkost

> > ---
> >  scripts/qemu.py | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/scripts/qemu.py b/scripts/qemu.py
> > index 880e3e8219..f0fade32bd 100644
> > --- a/scripts/qemu.py
> > +++ b/scripts/qemu.py
> > @@ -86,7 +86,7 @@ class QEMUMachine(object):
> >              raise
> >
> >      def is_running(self):
> > -        return self._popen and (self._popen.returncode is None)
> > +        return self._popen and (self._popen.poll() is None)
> >
> >      def exitcode(self):
> >          if self._popen is None:
> > --
> > 2.13.3
> >
>
> --
> Eduardo

--
Eduardo
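
To illustrate the returncode vs. poll() difference outside of qemu.py, here is a minimal standalone sketch. It assumes nothing about QEMUMachine: is_running() and demo() are hypothetical names, and the child is just a Python interpreter that exits immediately.

```python
import subprocess
import sys


def is_running(popen):
    """True while the child has not terminated (hypothetical helper).

    poll() asks the OS for the child's status; reading returncode alone
    only reports what an earlier poll()/wait()/communicate() recorded.
    """
    return popen is not None and popen.poll() is None


def demo():
    child = subprocess.Popen([sys.executable, "-c", "pass"],
                             stdout=subprocess.PIPE)
    child.stdout.read()          # returns at EOF, once the child closed stdout
    stale = child.returncode     # still None: nothing has reaped the child yet
    child.wait()                 # reap the child; returncode is now set
    child.stdout.close()
    return stale, child.returncode, is_running(child)


if __name__ == "__main__":
    print(demo())
```

The same reasoning applies to the kill() path: kill() only sends the signal, so a wait() afterwards is what makes sure the process is really gone before the post-shutdown code runs.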
Re: [Qemu-devel] [PATCH] ide: check BlockBackend object in ide_cancel_dma_sync
On 07/14/2017 06:00 AM, P J P wrote:
> From: Prasad J Pandit
>
> When cancelling pending DMA requests in ide_cancel_dma_sync,
> the s->blk object could be null, leading to a null dereference.
> Add check to avoid it.
>
> Reported-by: Chensongnian
> Signed-off-by: Prasad J Pandit
> ---
>  hw/ide/core.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 0b48b64..04474b3 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -681,8 +681,10 @@ void ide_cancel_dma_sync(IDEState *s)
>  #ifdef DEBUG_IDE
>          printf("%s: draining all remaining requests", __func__);
>  #endif
> -        blk_drain(s->blk);
> -        assert(s->bus->dma->aiocb == NULL);
> +        if (s->blk) {
> +            blk_drain(s->blk);
> +            assert(s->bus->dma->aiocb == NULL);
> +        }
>      }
>  }
>
>

I guess this occurs through

ide_exec_cmd
cmd_device_reset
ide_cancel_dma_sync

though if s->blk does not exist, we should usually not be able to
address this device with a reset command as such. (core.c:2021) -- but
this is only for secondary devices. I guess we don't guard against
nonexistent primary devices..?

Further, how do we have s->bus->dma->aiocb if there's no blk device?
What DMA request did we accept...?

Can you please submit a stack that illustrates the code path followed so
this fix can be properly verified and tested?

Thanks,
--John
Re: [Qemu-devel] [PATCH v5 10/17] migration: Create ram_multifd_page
* Juan Quintela (quint...@redhat.com) wrote: > The function still don't use multifd, but we have simplified > ram_save_page, xbzrle and RDMA stuff is gone. We have added a new > counter and a new flag for this type of pages. > > Signed-off-by: Juan Quintela> --- > hmp.c | 2 ++ > migration/migration.c | 1 + > migration/ram.c | 90 > ++- > qapi-schema.json | 5 ++- > 4 files changed, 96 insertions(+), 2 deletions(-) > > diff --git a/hmp.c b/hmp.c > index b01605a..eeb308b 100644 > --- a/hmp.c > +++ b/hmp.c > @@ -234,6 +234,8 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict) > monitor_printf(mon, "postcopy request count: %" PRIu64 "\n", > info->ram->postcopy_requests); > } > +monitor_printf(mon, "multifd: %" PRIu64 " pages\n", > + info->ram->multifd); > } > > if (info->has_disk) { > diff --git a/migration/migration.c b/migration/migration.c > index e1c79d5..d9d5415 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -528,6 +528,7 @@ static void populate_ram_info(MigrationInfo *info, > MigrationState *s) > info->ram->dirty_sync_count = ram_counters.dirty_sync_count; > info->ram->postcopy_requests = ram_counters.postcopy_requests; > info->ram->page_size = qemu_target_page_size(); > +info->ram->multifd = ram_counters.multifd; > > if (migrate_use_xbzrle()) { > info->has_xbzrle_cache = true; > diff --git a/migration/ram.c b/migration/ram.c > index b80f511..2bf3fa7 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -68,6 +68,7 @@ > #define RAM_SAVE_FLAG_XBZRLE 0x40 > /* 0x80 is reserved in migration.h start with 0x100 next */ > #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100 > +#define RAM_SAVE_FLAG_MULTIFD_PAGE 0x200 > > static inline bool is_zero_range(uint8_t *p, uint64_t size) > { > @@ -362,12 +363,17 @@ static void compress_threads_save_setup(void) > /* Multiple fd's */ > > struct MultiFDSendParams { > +/* not changed */ > uint8_t id; > QemuThread thread; > QIOChannel *c; > QemuSemaphore sem; > QemuMutex mutex; > +/* protected by param mutex 
*/ > bool quit; Should probably comment to say what address space address is in - this is really a qemu pointer - and that's why we can treat 0 as special? > +uint8_t *address; > +/* protected by multifd mutex */ > +bool done; done needs a comment to explain what it is because it sounds similar to quit; I think 'done' is saying that the thread is idle having done what was asked? > }; > typedef struct MultiFDSendParams MultiFDSendParams; > > @@ -375,6 +381,8 @@ struct { > MultiFDSendParams *params; > /* number of created threads */ > int count; > +QemuMutex mutex; > +QemuSemaphore sem; > } *multifd_send_state; > > static void terminate_multifd_send_threads(void) > @@ -443,6 +451,7 @@ static void *multifd_send_thread(void *opaque) > } else { > qio_channel_write(p->c, string, MULTIFD_UUID_MSG, _abort); > } > +qemu_sem_post(_send_state->sem); > > while (!exit) { > qemu_mutex_lock(>mutex); > @@ -450,6 +459,15 @@ static void *multifd_send_thread(void *opaque) > qemu_mutex_unlock(>mutex); > break; > } > +if (p->address) { > +p->address = 0; > +qemu_mutex_unlock(>mutex); > +qemu_mutex_lock(_send_state->mutex); > +p->done = true; > +qemu_mutex_unlock(_send_state->mutex); > +qemu_sem_post(_send_state->sem); > +continue; > +} > qemu_mutex_unlock(>mutex); > qemu_sem_wait(>sem); > } > @@ -469,6 +487,8 @@ int multifd_save_setup(void) > multifd_send_state = g_malloc0(sizeof(*multifd_send_state)); > multifd_send_state->params = g_new0(MultiFDSendParams, thread_count); > multifd_send_state->count = 0; > +qemu_mutex_init(_send_state->mutex); > +qemu_sem_init(_send_state->sem, 0); > for (i = 0; i < thread_count; i++) { > char thread_name[16]; > MultiFDSendParams *p = _send_state->params[i]; > @@ -477,6 +497,8 @@ int multifd_save_setup(void) > qemu_sem_init(>sem, 0); > p->quit = false; > p->id = i; > +p->done = true; > +p->address = 0; > p->c = socket_send_channel_create(); > if (!p->c) { > error_report("Error creating a send channel"); > @@ -491,6 +513,30 @@ int 
multifd_save_setup(void) > return 0; > } > > +static int multifd_send_page(uint8_t *address) > +{ > +int i; > +MultiFDSendParams *p = NULL; /* make happy gcc */ > + > +qemu_sem_wait(&multifd_send_state->sem); > +qemu_mutex_lock(&multifd_send_state->mutex); > +for (i = 0; i < multifd_send_state->count; i++) { > +p =
Re: [Qemu-devel] [RFC PATCH 0/2] Allow RedHat PCI bridges reserve more buses than necessary during init
On Wed, Jul 19, 2017 at 05:14:41PM +, Alexander Bezzubikov wrote: > Wed, 19 Jul 2017 at 16:57, Konrad Rzeszutek Wilk: > > > On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote: > > > Now PCI bridges (and PCIE root port too) get a bus range number in > > system init, > > > basing on currently plugged devices. That's why when one wants to > > hotplug another bridge, > > > it needs his child bus, which the parent is unable to provide. > > > > Could you explain how you trigger this? > > > I'm trying to hot plug pcie-pci bridge into pcie root port, and Linux says > 'cannot allocate bus number for device bla-bla'. This obviously does not > allow me to use the bridge at all. > > > > > > > > The suggested workaround is to have vendor-specific capability in RedHat > > generic pcie-root-port > > > that contains number of additional bus to reserve on BIOS PCI init. > > > > But wouldn't the proper fix be for the PCI bridge to have the subordinate > > value be extended to fit more bus ranges? > > > What do you mean? This is what I'm trying to do. Are you suggesting getting rid > of the vendor-specific cap and using the original register value instead? I would suggest a simple fix - each bridge has a number of bus numbers it can use. You have up to 255 - so you split the number of northbridge numbers by the number of NUMA nodes (if that is used) - so for example if you have 4 NUMA nodes, each bridge would cover 63 bus numbers. Meaning the root bridge would cover 0->63 bus, 64->128, and so on. That gives you enough space to plug in your devices (up to 63). And if you need sub-bridges then carve out a specific range. 
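To make the split suggested above concrete, here is a minimal sketch in C. It is not SeaBIOS or QEMU code: the struct and function names are hypothetical, and it assumes the full 0..255 bus number space is divided evenly between the root bridges, which gives 64 bus numbers per bridge for 4 NUMA nodes (slightly different from the rough 63 quoted above).

```c
#include <stdio.h>

/* Hypothetical type for illustration: one contiguous bus number range. */
struct bus_range {
    unsigned first;   /* first bus number in the range (inclusive) */
    unsigned last;    /* last bus number in the range (inclusive) */
};

/*
 * Split the 256 PCI bus numbers (0..255) evenly across 'nodes' root
 * bridges and return the range owned by bridge 'idx' (0-based).
 * Assumes nodes >= 1 and idx < nodes. Any remainder is left unused
 * at the top of the bus number space.
 */
struct bus_range bus_range_for_node(unsigned nodes, unsigned idx)
{
    unsigned per_node = 256 / nodes;
    struct bus_range r;

    r.first = idx * per_node;
    r.last = r.first + per_node - 1;
    return r;
}

/* Print the resulting split, one line per root bridge. */
void print_bus_split(unsigned nodes)
{
    unsigned i;

    for (i = 0; i < nodes; i++) {
        struct bus_range r = bus_range_for_node(nodes, i);
        printf("root bridge %u: buses %u-%u\n", i, r.first, r.last);
    }
}
```

Calling print_bus_split(4) lists the four ranges. Sub-bridges would then have to be carved out of the owning root bridge's range, which is where the question of how much to reserve comes back in.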
> > > > > > > > > Aleksandr Bezzubikov (2): > > > pci: add support for direct usage of bdf for capability lookup > > > pci: enable RedHat pci bridges to reserve more buses > > > > > > src/fw/pciinit.c | 12 ++-- > > > src/hw/pcidevice.c | 24 > > > src/hw/pcidevice.h | 1 + > > > 3 files changed, 35 insertions(+), 2 deletions(-) > > > > > > -- > > > 2.7.4 > > > > > > > > > -- > Alexander Bezzubikov
[Qemu-devel] Fwd: [RFC PATCH 0/2] Allow RedHat PCI bridges reserve more buses than necessary during init
2017-07-19 21:18 GMT+03:00 Konrad Rzeszutek Wilk: > On Wed, Jul 19, 2017 at 05:14:41PM +, Alexander Bezzubikov wrote: > > Wed, 19 Jul 2017 at 16:57, Konrad Rzeszutek Wilk < > konrad.w...@oracle.com>: > > > > > On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote: > > > > Now PCI bridges (and PCIE root port too) get a bus range number in > > > system init, > > > > basing on currently plugged devices. That's why when one wants to > > > hotplug another bridge, > > > > it needs his child bus, which the parent is unable to provide. > > > > > > Could you explain how you trigger this? > > > > > > I'm trying to hot plug pcie-pci bridge into pcie root port, and Linux > says > > 'cannot allocate bus number for device bla-bla'. This obviously does not > > allow me to use the bridge at all. > > > > > > > > > > > > The suggested workaround is to have vendor-specific capability in > RedHat > > > generic pcie-root-port > > > > that contains number of additional bus to reserve on BIOS PCI init. > > > > > > But wouldn't the proper fix be for the PCI bridge to have the > subordinate > > > value be extended to fit more bus ranges? > > > > > > What do you mean? This is what I'm trying to do. Are you suggesting getting rid > > of the vendor-specific cap and using the original register value instead? > > I would suggest a simple fix - each bridge has a number of bus numbers > it can use. You have up to 255 - so you split the number of northbridge > numbers by the number of NUMA nodes (if that is used) - so for example > if you have 4 NUMA nodes, each bridge would cover 63 bus numbers. > > Meaning the root bridge would cover 0->63 bus, 64->128, and so on. > That gives you enough space to plug in your devices > (up to 63). > > And if you need sub-bridges then carve out a specific range. > The problem is that at init time we don't know how many sub-bridges we may need, or how deep the whole device tree will be. 
The key point: PCI bridge hotplug requires either rescanning all buses each time a bridge device is added, or reserving bus numbers in advance during BIOS init. This series takes the second approach. > > > > > > > > > > > > > > > Aleksandr Bezzubikov (2): > > > > pci: add support for direct usage of bdf for capability lookup > > > > pci: enable RedHat pci bridges to reserve more buses > > > > > > > > src/fw/pciinit.c | 12 ++-- > > > > src/hw/pcidevice.c | 24 > > > > src/hw/pcidevice.h | 1 + > > > > 3 files changed, 35 insertions(+), 2 deletions(-) > > > > > > > > -- > > > > 2.7.4 > > > > > > > > > > > > > -- > > Alexander Bezzubikov > -- Alexander Bezzubikov
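As a rough illustration of the "reserve in advance" option, the sketch below shows how a bridge's subordinate bus number could be widened at init time. This is an assumption about the general idea only, not the code from this series: the function name and parameters are hypothetical, and 'extra' stands in for whatever value the vendor-specific capability would carry.

```c
/*
 * Hypothetical helper: pick the subordinate bus number for a bridge.
 *
 * secondary     - bus number directly behind the bridge
 * highest_child - highest bus number found while scanning behind it
 *                 (equal to 'secondary' when nothing is plugged in)
 * extra         - number of additional buses to reserve for hotplug
 *
 * The result never drops below what the scan found and is capped at
 * 255, the largest valid PCI bus number.
 */
unsigned subordinate_with_reserve(unsigned secondary,
                                  unsigned highest_child,
                                  unsigned extra)
{
    unsigned wanted = secondary + extra;

    if (wanted > 255) {
        wanted = 255;
    }
    return wanted > highest_child ? wanted : highest_child;
}
```

With extra set to 0 this degenerates to today's behaviour, where the range covers only the devices present at boot and a hot-plugged bridge has no bus number left to take.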
Re: [Qemu-devel] [PATCH v2] hmp: allow cpu index for "info lapic"
On Wed, Jul 19, 2017 at 10:17:36AM -0500, Eric Blake wrote: > On 07/19/2017 10:07 AM, Daniel P. Berrange wrote: > >> It doesn't. Perhaps we should add that as a future libvirt-qemu.so API > >> addition, although it's probably easier to just use QMP than HMP when > >> using 'virsh qemu-monitor-command' if HMP doesn't do what you want. > > > > Or special case the "cpu 1" command - ie notice that it is being > > requested and don't execute 'human-monitor-command'. Instead just > > record the CPU index, and use that for future "human-monitor-command" > > invocations, so we get full compat with the (dubious) stateful HMP > > semantics that traditionally existed. > > Is 'cpu' (and the followup commands affected by it) the only stateful > HMP command pairing? Is there a way to specify multiple HMP commands in > a single human-monitor-command QMP call? > > Indeed, tweaking qemu's human-monitor-command call to track the state > might be cleaner than having libvirt have to tweak API to work around > this wart of HMP. The CPU index was the only state kept by the human monitor, and I think it's by design that it stopped being considered "monitor state" to be tracked, and became just an argument to human-monitor-command. It's true that it broke compatibility of "virsh qemu-monitor-command --hmp 'cpu '", when we moved to QMP, but this happened years ago, and it looks like nobody was relying on it. I don't see the point of trying to emulate the previous stateful interface. -- Eduardo
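For reference, the stateless form mentioned above passes the CPU as the optional "cpu-index" argument of the QMP human-monitor-command. The sketch below only builds the request text; the helper name is hypothetical, and it assumes the HMP command string contains no characters that need JSON escaping.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a QMP human-monitor-command request that
 * runs 'hmp_cmd' against the CPU given by 'cpu_index', instead of
 * relying on a previous stateful "cpu" command.
 *
 * Returns the snprintf() result: the length the request needs, which
 * is >= len if the buffer was too small. No JSON escaping is done.
 */
int build_hmp_request(char *buf, size_t len,
                      const char *hmp_cmd, int cpu_index)
{
    return snprintf(buf, len,
                    "{\"execute\": \"human-monitor-command\", "
                    "\"arguments\": {\"command-line\": \"%s\", "
                    "\"cpu-index\": %d}}",
                    hmp_cmd, cpu_index);
}
```

Each request carries its own CPU index, so nothing has to be remembered between calls on either the libvirt or the qemu side.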
Re: [Qemu-devel] [PULL 0/8] target/alpha cleanups
On 19 July 2017 at 05:45, Richard Henderson wrote: > The new title holder for perf top is helper_lookup_tb_ptr. > Those targets that have a complicated cpu_get_tb_cpu_state > function are going to regret that. > > This cleans up the Alpha version of that function such that it is > just two loads and one mask. Which is one practically-free mask > away from being as minimal as one can get. > > Also, in anticipation of LLuis' generic translation loop, fix all > of the temporary leaks. They all seem to have been on insns that > end the TB, so in practice they weren't harmful, but... > > > r~ > > > The following changes since commit 6887dc6700ccb7820d8a9d370f421ee361c748e8: > > Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20170718' into > staging (2017-07-18 21:13:48 +0100) > > are available in the git repository at: > > git://github.com/rth7680/qemu.git tags/pull-axp-20170718 > > for you to fetch changes up to 8aa5c65fd3d4612d8ab690bef0980d26f30f381d: > > target/alpha: Log temp leaks (2017-07-18 18:42:05 -1000) > > > Queued target/alpha patches > Applied, thanks. -- PMM