Re: [Qemu-devel] [PATCH v2] hmp: allow cpu index for "info lapic"

2017-07-19 Thread wang.yi59
>On Wed, Jul 19, 2017 at 08:17:49PM +0100, Dr. David Alan Gilbert wrote:
>> * Eduardo Habkost (address@hidden) wrote:
>> > On Wed, Jul 19, 2017 at 10:17:36AM -0500, Eric Blake wrote:
>> > > On 07/19/2017 10:07 AM, Daniel P. Berrange wrote:
>> > > >> It doesn't.  Perhaps we should add that as a future libvirt-qemu.so API
>> > > >> addition, although it's probably easier to just use QMP than HMP when
>> > > >> using 'virsh qemu-monitor-command' if HMP doesn't do what you want.
>> > > >
>> > > > Or special case the "cpu 1" command - ie notice that it is being
>> > > > requested and don't execute 'human-monitor-command'. Instead just
>> > > > record the CPU index, and use that for future "human-monitor-command"
>> > > > invocations, so we get full compat with the (dubious) stateful HMP
>> > > > semantics that traditionally existed.
>> > >
>> > > Is 'cpu' (and the followup commands affected by it) the only stateful
>> > > HMP command pairing?  Is there a way to specify multiple HMP commands in
>> > > a single human-monitor-command QMP call?
>> > >
>> > > Indeed, tweaking qemu's human-monitor-command call to track the state
>> > > might be cleaner than having libvirt have to tweak API to work around
>> > > this wart of HMP.
>> >
>> > The CPU index was the only state kept by the human monitor, and I
>> > think it's by design that it stopped being considered "monitor
>> > state" to be tracked, and became just an argument to
>> > human-monitor-command.
>> >
>> > It's true that it broke compatibility of
>> >   "virsh qemu-monitor-command  --hmp 'cpu '",
>> > when we moved to QMP, but this happened years ago, and it looks
>> > like nobody was relying on it.  I don't see the point of trying
>> > to emulate the previous stateful interface.
>>
>> IMHO Yi's fix (once reworked) is the right fix - it removes the
>> use of that piece of state, when the optional parameter is used.
>> (OK, so it needs rework not to change that state and to
>> come to some agreement as to what to use instead of cpu index number
>> etc).
>
>Agreed, as it helps us to keep the "virsh qemu-monitor-command"
>interface simpler.  But we have 8 commands that use
>mon_get_cpu(), we shouldn't fix only "info lapic".

Thank you all!

I will rework this patch in this way:

  - extend 'info registers' with apic id value instead of current, like this:

      CPU#1 (socket-id: a, core-id: b, thread-id: c, apic-id: d)

  - add an 'apic id' parameter to 'info lapic'

As for the other commands, I want to send separate patches, because in my
opinion not all commands need 'apic-id' as the index.

---
Best wishes
Yi Wang
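
To make the point raised in the thread concrete (an optional per-command CPU argument should be used when given, without touching the monitor's remembered default), here is a minimal sketch. The Monitor struct, CPU_INDEX_UNSET and resolve_cpu_index() are hypothetical names for illustration only; they are not QEMU's actual monitor code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the monitor state; not QEMU's real type. */
typedef struct Monitor {
    int default_cpu;   /* only the stateful "cpu N" command would change this */
    int nr_cpus;       /* number of CPUs present in the guest */
} Monitor;

/* Marker meaning "the caller did not pass the optional argument". */
#define CPU_INDEX_UNSET (-1)

/*
 * Pick the CPU a command should act on.  An explicit index wins; otherwise
 * fall back to the monitor default.  The monitor is const here, so the
 * default can never be modified as a side effect.
 * Returns -1 if the resulting index is out of range.
 */
static int resolve_cpu_index(const Monitor *mon, int requested)
{
    assert(mon != NULL);

    int idx = (requested == CPU_INDEX_UNSET) ? mon->default_cpu : requested;

    if (idx < 0 || idx >= mon->nr_cpus) {
        return -1;
    }
    return idx;
}
```

A command handler would call resolve_cpu_index(mon, CPU_INDEX_UNSET) when the user gave no argument and resolve_cpu_index(mon, n) otherwise. Whether the argument ends up being a CPU index or an APIC ID is exactly the open question in the thread; the sketch is agnostic about that.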

Re: [Qemu-devel] [FIX PATCH v1] spapr: Fix QEMU abort during memory unplug

2017-07-19 Thread no-reply
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1500523879-23860-1-git-send-email-bhar...@linux.vnet.ibm.com
Subject: [Qemu-devel] [FIX PATCH v1] spapr: Fix QEMU abort during memory unplug
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
3aad3d6 spapr: Fix QEMU abort during memory unplug

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-j1hutab7/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-j1hutab7/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
git-1.7.1-8.el6.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ flex bison zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=bee8e1f457b0
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /var/tmp/qemu-build/install
BIOS directory    /var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler      
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS       -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels 
-Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no 
GTK GL support    no
VTE support       no 
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no 
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support       no
brlapi support    no
bluez  support    no
Documentation     no
PIE               yes
vde support       no
netmap support    no
Linux AIO support no
ATTR/XATTR support yes
Install blobs     yes
KVM support       yes
HAX support       no
TCG debug 

Re: [Qemu-devel] [PATCH] migration: optimize the downtime

2017-07-19 Thread no-reply
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1500522569-10760-1-git-send-email-jianjay.z...@huawei.com
Subject: [Qemu-devel] [PATCH] migration: optimize the downtime
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
1ae581e migration: optimize the downtime

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-6slpqj5k/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-6slpqj5k/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
git-1.7.1-8.el6.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ flex bison zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=50d2403d8481
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /var/tmp/qemu-build/install
BIOS directory    /var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler      
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS       -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels 
-Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no 
GTK GL support    no
VTE support       no 
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no 
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support       no
brlapi support    no
bluez  support    no
Documentation     no
PIE               yes
vde support       no
netmap support    no
Linux AIO support no
ATTR/XATTR support yes
Install blobs     yes
KVM support       yes
HAX support       no
TCG support       yes
TCG debug enabled no
TCG interpreter   no

[Qemu-devel] [FIX PATCH v1] spapr: Fix QEMU abort during memory unplug

2017-07-19 Thread Bharata B Rao
Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to
sPAPRMachineState) introduced a new way to track pending LMBs of DIMM
device that is marked for removal. Since this commit we can hit the
assert in spapr_pending_dimm_unplugs_add() in the following situation:

- DIMM device removal fails as the guest doesn't allow the removal.
- Subsequent attempt to remove the same DIMM would hit the assert
  as the corresponding sPAPRDIMMState is still part of the
  pending_dimm_unplugs list.

Fix this by removing the assert and conditionally adding the
sPAPRDIMMState to pending_dimm_unplugs list only when it is not
already present.
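
The "add only when not already present, otherwise free the new entry" behaviour described above can be sketched in isolation as follows. This is a simplified illustration, assuming a hand-rolled singly linked list and an integer DIMM id; PendingUnplug, Machine, pending_find(), pending_add() and pending_new() are hypothetical names, not the sPAPR code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-DIMM unplug state. */
typedef struct PendingUnplug {
    int dimm_id;                    /* stands in for the DIMM pointer */
    struct PendingUnplug *next;
} PendingUnplug;

/* Hypothetical stand-in for the machine state holding the pending list. */
typedef struct Machine {
    PendingUnplug *pending;         /* head of the pending-unplug list */
} Machine;

/* Return the entry for dimm_id, or NULL if no unplug is pending for it. */
static PendingUnplug *pending_find(const Machine *m, int dimm_id)
{
    for (PendingUnplug *p = m->pending; p != NULL; p = p->next) {
        if (p->dimm_id == dimm_id) {
            return p;
        }
    }
    return NULL;
}

/*
 * Takes ownership of 'state'.  If an entry for the same DIMM is already
 * pending (for example because an earlier removal attempt failed), the new
 * state is freed instead of being inserted a second time.
 * Returns true if 'state' was inserted, false if it was freed.
 */
static bool pending_add(Machine *m, PendingUnplug *state)
{
    assert(m != NULL && state != NULL);

    if (pending_find(m, state->dimm_id) != NULL) {
        free(state);
        return false;
    }
    state->next = m->pending;
    m->pending = state;
    return true;
}

/* Allocate a zeroed entry for dimm_id; returns NULL on allocation failure. */
static PendingUnplug *pending_new(int dimm_id)
{
    PendingUnplug *p = calloc(1, sizeof(*p));

    if (p != NULL) {
        p->dimm_id = dimm_id;
    }
    return p;
}
```

Freeing in the duplicate branch matters because the caller has already allocated the state before calling the add function; silently skipping the insert would leak it.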

Fixes: 0cffce56ae3501c5783d779f97993ce478acf856
Signed-off-by: Bharata B Rao 
---
Changes in v1:
- Added comment (David Gibson)
- Ensured we free sPAPRDIMMState when corresponding entry already
  exists (Daniel Henrique Barboza)

Daniel had shown another alternative, we can switch over to that
if preferred.

 hw/ppc/spapr.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1cb09e7..c6091e2 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2853,8 +2853,17 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s,
 static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr,
                                            sPAPRDIMMState *dimm_state)
 {
-    g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm));
-    QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
+    /*
+     * If this request is for a DIMM whose removal had failed earlier
+     * (due to guest's refusal to remove the LMBs), we would have this
+     * dimm_state already in the pending_dimm_unplugs list. In that
+     * case don't add again.
+     */
+    if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) {
+        QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
+    } else {
+        g_free(dimm_state);
+    }
 }
 
 static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,
-- 
2.7.4




Re: [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts

2017-07-19 Thread no-reply
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1500520169-23367-1-git-send-email-c...@braap.org
Subject: [Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/1500520169-23367-1-git-send-email-c...@braap.org 
-> patchew/1500520169-23367-1-git-send-email-c...@braap.org
 - [tag update]  patchew/20170719163108.26943-1-apa...@redhat.com -> 
patchew/20170719163108.26943-1-apa...@redhat.com
Switched to a new branch 'test'
da87a3c tcg: enable multiple TCG contexts in softmmu
043a8fe tcg: introduce regions to split code_gen_buffer
56b9b69 tcg: define TCG_HIGHWATER
e7f8206 translate-all: use qemu_protect_rwx/none helpers
22af337 osdep: introduce qemu_mprotect_rwx/none
e169f67 util: move qemu_real_host_page_size/mask to osdep.h
2cb855d tcg: distribute profiling counters across TCGContext's
bd2ea58 tcg: introduce **tcg_ctxs to keep track of all TCGContext's
317e1af tcg: dynamically allocate optimizer temps
379bf86 gen-icount: fold exitreq_label into TCGContext
7dd8ecd tcg: define tcg_init_ctx and make tcg_ctx a pointer
29b8220 tcg: take .helpers out of TCGContext
f90dd29 tcg: take tb_ctx out of TCGContext
265e0ba tci: move tci_regs to tcg_qemu_tb_exec's stack
556cc0f translate-all: report correct avg host TB size
60c2e41 exec-all: rename tb_free to tb_remove
7e9cbd3 translate-all: use a binary search tree to track TBs in TBContext
05a7053 exec-all: extract tb->tc_* into a separate struct tc_tb
9d12b48 translate-all: define and use DEBUG_TB_CHECK_GATE
e6a6556 translate-all: define and use DEBUG_TB_INVALIDATE_GATE
0db2c48 exec-all: introduce TB_PAGE_ADDR_FMT
ace4460 translate-all: define and use DEBUG_TB_FLUSH_GATE
27836fa cpu-exec: lookup/generate TB outside exclusive region during step_atomic
1e060f9 tcg: check CF_PARALLEL instead of parallel_cpus
bc98732 target/sparc: check CF_PARALLEL instead of parallel_cpus
6ffdc73 target/sh4: check CF_PARALLEL instead of parallel_cpus
ccb61ac target/s390x: check CF_PARALLEL instead of parallel_cpus
12eeba3 target/m68k: check CF_PARALLEL instead of parallel_cpus
c6c4772 target/i386: check CF_PARALLEL instead of parallel_cpus
b6c38be target/hppa: check CF_PARALLEL instead of parallel_cpus
ee35e3a target/arm: check CF_PARALLEL instead of parallel_cpus
ac51961 tcg: convert tb->cflags reads to tb_cflags(tb)
30db086 tcg: define CF_PARALLEL and use it for TB hashing
35d964c exec-all: bring tb->invalid into tb->cflags
cc370d5 tcg: consolidate TB lookups in tb_lookup__cpu_state
7948c58 tcg: remove addr argument from lookup_tb_ptr
263f3e2 tcg/mips: constify tcg_target_callee_save_regs
fc1b9cb tcg/i386: constify tcg_target_callee_save_regs
72848e1 cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find
b2ae03c translate-all: make have_tb_lock static
d2271e9 exec-all: fix typos in TranslationBlock's documentation
bba6cb2 tcg: fix corruption of code_time profiling counter upon tb_flush
50913ad cputlb: bring back tlb_flush_count under !TLB_DEBUG

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-0962jnus/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-0962jnus/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
git-1.7.1-8.el6.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ flex bison zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=6a51217a1bb9
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:

[Qemu-devel] [PATCH] migration: optimize the downtime

2017-07-19 Thread Jay Zhou
qemu_savevm_state_cleanup() takes about 300ms in my RAM migration tests
with an 8U24G VM (20G of which is actually in use). The main cost comes
from the KVM_SET_USER_MEMORY_REGION ioctl when mem.memory_size = 0 in
kvm_set_user_memory_region(). In the kernel module, the main cost is
kvm_zap_obsolete_pages(), which traverses the active_mmu_pages list to
zap the unsync sptes.

I think it can be optimized:
(1) the source VM will be destroyed if the migration completes
    successfully, so the resources will be cleaned up automatically by
    the system
(2) delay the cleanup if the migration failed

Signed-off-by: Jay Zhou 
---
 migration/migration.c | 16 +---
 qmp.c | 10 ++
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index a0db40d..72832be 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1877,6 +1877,15 @@ static void *migration_thread(void *opaque)
         if (qemu_file_get_error(s->to_dst_file)) {
             migrate_set_state(&s->state, current_active_state,
                               MIGRATION_STATUS_FAILED);
+            /*
+             * The resource has been allocated by migration will be reused in
+             * COLO process, so don't release them.
+             */
+            if (!enable_colo) {
+                qemu_mutex_lock_iothread();
+                qemu_savevm_state_cleanup();
+                qemu_mutex_unlock_iothread();
+            }
             trace_migration_thread_file_err();
             break;
         }
@@ -1916,13 +1925,6 @@ static void *migration_thread(void *opaque)
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 
     qemu_mutex_lock_iothread();
-    /*
-     * The resource has been allocated by migration will be reused in COLO
-     * process, so don't release them.
-     */
-    if (!enable_colo) {
-        qemu_savevm_state_cleanup();
-    }
     if (s->state == MIGRATION_STATUS_COMPLETED) {
         uint64_t transferred_bytes = qemu_ftell(s->to_dst_file);
         s->total_time = end_time - s->total_time;
diff --git a/qmp.c b/qmp.c
index b86201e..0e68eaa 100644
--- a/qmp.c
+++ b/qmp.c
@@ -37,6 +37,8 @@
 #include "qom/object_interfaces.h"
 #include "hw/mem/pc-dimm.h"
 #include "hw/acpi/acpi_dev_interface.h"
+#include "migration/migration.h"
+#include "migration/savevm.h"
 
 NameInfo *qmp_query_name(Error **errp)
 {
@@ -200,6 +202,14 @@ void qmp_cont(Error **errp)
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         autostart = 1;
     } else {
+        /*
+         * Delay the cleanup to reduce the downtime of migration.
+         * The resource has been allocated by migration will be reused
+         * in COLO process, so don't release them.
+         */
+        if (runstate_check(RUN_STATE_POSTMIGRATE) && !migrate_colo_enabled()) {
+            qemu_savevm_state_cleanup();
+        }
         vm_start();
     }
 }
-- 
1.8.3.1





[Qemu-devel] [PATCH v3 33/43] tcg: define tcg_init_ctx and make tcg_ctx a pointer

2017-07-19 Thread Emilio G. Cota
Groundwork for supporting multiple TCG contexts.

The core of this patch is this change to tcg/tcg.h:

> -extern TCGContext tcg_ctx;
> +extern TCGContext tcg_init_ctx;
> +extern TCGContext *tcg_ctx;

Note that for now we set *tcg_ctx to whatever TCGContext is passed
to tcg_context_init -- in this case &tcg_init_ctx.
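
The shape of the change (a global struct becomes a named initial instance plus a global pointer that the init function sets) can be illustrated with a reduced sketch. Context, init_ctx, cur_ctx, context_init() and alloc_temp() are hypothetical names chosen for the example, not the TCG definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced context type. */
typedef struct Context {
    int nb_temps;
} Context;

/* Before the change there would be a single global: Context ctx; */
static Context init_ctx;   /* the statically allocated initial context */
static Context *cur_ctx;   /* what all users dereference from now on */

/* Initialise 's' and make it the current context. */
static void context_init(Context *s)
{
    assert(s != NULL);
    s->nb_temps = 0;
    cur_ctx = s;
}

/* A typical user: 'ctx.nb_temps' becomes 'cur_ctx->nb_temps'. */
static int alloc_temp(void)
{
    return cur_ctx->nb_temps++;
}
```

With a single context the behaviour is unchanged. The indirection only pays off once the pointer can be made to refer to a different context per thread, which is what later patches in the series are said to build on.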

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/gen-icount.h | 10 ++---
 include/exec/helper-gen.h | 12 +++---
 tcg/tcg-op.h  | 80 +--
 tcg/tcg.h | 15 +++
 accel/tcg/translate-all.c | 97 ++-
 bsd-user/main.c   |  2 +-
 linux-user/main.c |  2 +-
 target/alpha/translate.c  |  2 +-
 target/arm/translate.c|  2 +-
 target/cris/translate.c   |  2 +-
 target/cris/translate_v10.c   |  2 +-
 target/hppa/translate.c   |  2 +-
 target/i386/translate.c   |  2 +-
 target/lm32/translate.c   |  2 +-
 target/m68k/translate.c   |  2 +-
 target/microblaze/translate.c |  2 +-
 target/mips/translate.c   |  2 +-
 target/moxie/translate.c  |  2 +-
 target/openrisc/translate.c   |  2 +-
 target/ppc/translate.c|  2 +-
 target/s390x/translate.c  |  2 +-
 target/sh4/translate.c|  2 +-
 target/sparc/translate.c  |  2 +-
 target/tilegx/translate.c |  2 +-
 target/tricore/translate.c|  2 +-
 target/unicore32/translate.c  |  2 +-
 target/xtensa/translate.c |  2 +-
 tcg/tcg-op.c  | 58 +-
 tcg/tcg-runtime.c |  2 +-
 tcg/tcg.c | 21 +-
 30 files changed, 171 insertions(+), 168 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 48b566c..c58b0b2 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -19,7 +19,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
         count = tcg_temp_new_i32();
     }
 
-    tcg_gen_ld_i32(count, tcg_ctx.tcg_env,
+    tcg_gen_ld_i32(count, tcg_ctx->tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, icount_decr.u32));
 
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
@@ -37,7 +37,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
     tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);
 
     if (tb_cflags(tb) & CF_USE_ICOUNT) {
-        tcg_gen_st16_i32(count, tcg_ctx.tcg_env,
+        tcg_gen_st16_i32(count, tcg_ctx->tcg_env,
                          -ENV_OFFSET + offsetof(CPUState, icount_decr.u16.low));
     }
 
@@ -56,13 +56,13 @@ static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
     tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_REQUESTED);
 
     /* Terminate the linked list.  */
-    tcg_ctx.gen_op_buf[tcg_ctx.gen_op_buf[0].prev].next = 0;
+    tcg_ctx->gen_op_buf[tcg_ctx->gen_op_buf[0].prev].next = 0;
 }
 
 static inline void gen_io_start(void)
 {
     TCGv_i32 tmp = tcg_const_i32(1);
-    tcg_gen_st_i32(tmp, tcg_ctx.tcg_env,
+    tcg_gen_st_i32(tmp, tcg_ctx->tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, can_do_io));
     tcg_temp_free_i32(tmp);
 }
@@ -70,7 +70,7 @@ static inline void gen_io_start(void)
 static inline void gen_io_end(void)
 {
     TCGv_i32 tmp = tcg_const_i32(0);
-    tcg_gen_st_i32(tmp, tcg_ctx.tcg_env,
+    tcg_gen_st_i32(tmp, tcg_ctx->tcg_env,
                    -ENV_OFFSET + offsetof(CPUState, can_do_io));
     tcg_temp_free_i32(tmp);
 }
diff --git a/include/exec/helper-gen.h b/include/exec/helper-gen.h
index 8239ffc..3bcb901 100644
--- a/include/exec/helper-gen.h
+++ b/include/exec/helper-gen.h
@@ -9,7 +9,7 @@
 #define DEF_HELPER_FLAGS_0(name, flags, ret)\
 static inline void glue(gen_helper_, name)(dh_retvar_decl0(ret))\
 {   \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 0, NULL);   \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 0, NULL);\
 }
 
 #define DEF_HELPER_FLAGS_1(name, flags, ret, t1)\
@@ -17,7 +17,7 @@ static inline void glue(gen_helper_, 
name)(dh_retvar_decl(ret)  \
 dh_arg_decl(t1, 1)) \
 {   \
   TCGArg args[1] = { dh_arg(t1, 1) };   \
-  tcg_gen_callN(&tcg_ctx, HELPER(name), dh_retvar(ret), 1, args);   \
+  tcg_gen_callN(tcg_ctx, HELPER(name), dh_retvar(ret), 1, args);\
 }
 
 #define DEF_HELPER_FLAGS_2(name, flags, ret, t1, t2)\
@@ -25,7 +25,7 @@ static inline void glue(gen_helper_, 
name)(dh_retvar_decl(ret)  \
 dh_arg_decl(t1, 1), dh_arg_decl(t2, 2)) \
 {   \
   TCGArg args[2] = { dh_arg(t1, 1), dh_arg(t2, 2) }; 

Re: [Qemu-devel] [FIX PATCH] spapr: Fix QEMU abort during memory unplug

2017-07-19 Thread David Gibson
On Wed, Jul 19, 2017 at 02:24:09PM +0530, Bharata B Rao wrote:
> Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to
> sPAPRMachineState) introduced a new way to track pending LMBs of DIMM
> device that is marked for removal. Since this commit we can hit the
> assert in spapr_pending_dimm_unplugs_add() in the following situation:
> 
> - DIMM device removal fails as the guest doesn't allow the removal.
> - Subsequent attempt to remove the same DIMM would hit the assert
>   as the corresponding sPAPRDIMMState is still part of the
>   pending_dimm_unplugs list.
> 
> Fix this by removing the assert and conditionally adding the
> sPAPRDIMMState to pending_dimm_unplugs list only when it is not
> already present.
> 
> Fixes: 0cffce56ae3501c5783d779f97993ce478acf856
> Signed-off-by: Bharata B Rao 

Sounds like a reasonable change based on the rationale above.
However, can you add a comment here explaining the situation in which
the entry already exists?

> ---
>  hw/ppc/spapr.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 1cb09e7..990bb2d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2853,8 +2853,9 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s,
>  static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr,
>                                             sPAPRDIMMState *dimm_state)
>  {
> -    g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm));
> -    QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
> +    if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) {
> +        QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
> +    }
>  }
>  
>  static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson




[Qemu-devel] [PATCH v3 27/43] translate-all: use a binary search tree to track TBs in TBContext

2017-07-19 Thread Emilio G. Cota
This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.

For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.

The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size
are both !0), which happens during insertions (and removals, but
those are rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. Since the
glib docs do not guarantee this behaviour, the code does not rely on it
for correctness; we only embed this knowledge in the code as a branch
hint for the compiler.
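
A comparison function of the kind described above might look roughly like the following. This is a standalone sketch, not the code from the patch: Region, LIKELY(), addr_cmp_region() and region_cmp() are hypothetical names, addresses are modelled as uintptr_t, and the case where both keys have size 0 is assumed never to occur.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch hint; falls back to a plain expression on other compilers. */
#if defined(__GNUC__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

/* Hypothetical key type: an address range; size == 0 marks a lookup key. */
typedef struct Region {
    uintptr_t start;
    size_t size;
} Region;

/* Compare a single address against the range [start, start + size). */
static int addr_cmp_region(uintptr_t addr, const Region *r)
{
    if (addr < r->start) {
        return -1;
    }
    if (addr >= r->start + r->size) {
        return 1;
    }
    return 0;
}

/* Tree comparator: returns <0, 0 or >0 like strcmp(). */
static int region_cmp(const Region *a, const Region *b)
{
    /* Common case: two real entries (insertion or removal). */
    if (LIKELY(a->size && b->size)) {
        if (a->start > b->start) {
            return 1;
        }
        if (a->start < b->start) {
            return -1;
        }
        /* Same start address: must be the very same entry. */
        assert(a->size == b->size);
        return 0;
    }
    /* Lookup: the key is expected to come first, but either order works. */
    if (LIKELY(a->size == 0)) {
        return addr_cmp_region(a->start, b);
    }
    return -addr_cmp_region(b->start, a);
}
```

A lookup for the entry containing address p would then pass a key of the form { p, 0 } to the tree's search routine. The negation in the last branch keeps the result consistent when the key happens to arrive as the second argument.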

Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.

Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown when booting debian-arm:

Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::-:22 \
-device virtio-net-device,netdev=unet \
-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel img/arm/aarch32-current-linux-kernel-only.img \
-append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

- Before:

       8048.598422      task-clock (msec)         #    0.931 CPUs utilized            ( +-  0.28% )
            16,974      context-switches          #    0.002 M/sec                    ( +-  0.12% )
                 0      cpu-migrations            #    0.000 K/sec
            10,125      page-faults               #    0.001 M/sec                    ( +-  1.23% )
    35,144,901,879      cycles                    #    4.367 GHz                      ( +-  0.14% )
                        stalled-cycles-frontend
                        stalled-cycles-backend
    65,758,252,643      instructions              #    1.87  insns per cycle          ( +-  0.33% )
    10,871,298,668      branches                  # 1350.707 M/sec                    ( +-  0.41% )
       192,322,212      branch-misses             #    1.77% of all branches          ( +-  0.32% )

       8.640869419 seconds time elapsed                                          ( +-  0.57% )

- After:

       8146.242027      task-clock (msec)         #    0.923 CPUs utilized            ( +-  1.23% )
            17,016      context-switches          #    0.002 M/sec                    ( +-  0.40% )
                 0      cpu-migrations            #    0.000 K/sec
            18,769      page-faults               #    0.002 M/sec                    ( +-  0.45% )
    35,660,956,120      cycles                    #    4.378 GHz                      ( +-  1.22% )
                        stalled-cycles-frontend
                        stalled-cycles-backend
    65,095,366,607      instructions              #    1.83  insns per cycle          ( +-  1.73% )
    10,803,480,261      branches                  # 1326.192 M/sec                    ( +-  1.95% )
       195,601,289      branch-misses             #    1.81% of all branches          ( +-  0.39% )

       8.828660235 seconds time elapsed                                          ( +-  0.38% )

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   |   5 ++
 include/exec/tb-context.h |   4 +-
 accel/tcg/translate-all.c | 217 --
 3 files changed, 118 insertions(+), 108 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bc4f41c..eb3eb7b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -343,10 +343,15 @@ static inline void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 
 /*
  * Translation Cache-related fields of a TB.
+ * This struct exists just for convenience; we keep track of TB's in a binary
+ * search tree, and the only fields needed to compare TB's in the tree are
+ * @ptr and @size. @search is brought here for consistency, since it is also
+ * a TC-related field.
  */
 struct tb_tc {
     void *ptr;    /* pointer to the translated code */
     uint8_t *search;  /* pointer to search data */
+    size_t size;
 };
 
 struct TranslationBlock {
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 25c2afe..1fa8dcc 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h

[Qemu-devel] [PATCH v3 43/43] tcg: enable multiple TCG contexts in softmmu

2017-07-19 Thread Emilio G. Cota
This enables parallel TCG code generation. However, we do not take
advantage of it yet since tb_lock is still held during tb_gen_code.

In user-mode we use a single TCG context; see the documentation
added to tcg_region_init for the rationale.

Note that targets do not need any conversion: targets initialize a
TCGContext (e.g. defining TCG globals), and after this initialization
has finished, the context is cloned by the vCPU threads, each of
them keeping a separate copy.

TCG threads claim one entry in tcg_ctxs[] by atomically increasing
n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
of that variable; they are there just to play nice with analysis
tools such as thread sanitizer.
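
The slot-claiming scheme described above can be sketched as follows. This is
only an illustration using C11 atomics, not the code in the patch: Ctx,
MAX_CTXS, ctxs, n_ctxs and claim_slot are hypothetical stand-ins for
TCGContext, tcg_ctxs[] and n_tcg_ctxs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_CTXS 8                  /* hypothetical upper bound for the sketch */

typedef struct Ctx { int id; } Ctx; /* stand-in for TCGContext */

static Ctx *_Atomic ctxs[MAX_CTXS]; /* array of pointers, not of contexts */
static atomic_uint n_ctxs;          /* number of slots claimed so far */

/*
 * Claim the next free slot for the calling thread and publish ctx there.
 * atomic_fetch_add returns the previous value, so concurrent callers
 * each obtain a distinct index.
 */
static unsigned claim_slot(Ctx *ctx)
{
    unsigned n = atomic_fetch_add(&n_ctxs, 1);

    assert(n < MAX_CTXS);
    ctx->id = (int)n;
    atomic_store(&ctxs[n], ctx);
    return n;
}
```

Readers that later walk the array would load n_ctxs atomically as well, which
is the role of the atomic_read's mentioned above.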

Note that we do not allocate an array of contexts (we allocate
an array of pointers instead) because when tcg_context_init
is called, we do not know yet how many contexts we'll use since
the bool behind qemu_tcg_mttcg_enabled() isn't set yet.

Previous patches folded some TCG globals into TCGContext. The non-const
globals remaining are only set at init time, i.e. before the TCG
threads are spawned. Here is a list of these set-at-init-time globals
under tcg/:

Only written by tcg_context_init:
- indirect_reg_alloc_order
- tcg_op_defs

Only written by tcg_target_init (called from tcg_context_init):
- tcg_target_available_regs
- tcg_target_call_clobber_regs
- arm: arm_arch, use_idiv_instructions
- i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
  have_movbe, have_popcnt
- mips: use_movnz_instructions, use_mips32_instructions,
  use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
- ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
- s390: tb_ret_addr, s390_facilities
- sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
  use_vis3_instructions

Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr

Signed-off-by: Emilio G. Cota 
---
 tcg/tcg.h |   7 ++-
 accel/tcg/translate-all.c |   2 +-
 cpus.c|   2 +
 linux-user/syscall.c  |   1 +
 tcg/tcg.c | 141 --
 5 files changed, 143 insertions(+), 10 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3365da8..68cd14e 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -733,7 +733,7 @@ struct TCGContext {
 };
 
 extern TCGContext tcg_init_ctx;
-extern TCGContext *tcg_ctx;
+extern __thread TCGContext *tcg_ctx;
 
 static inline void tcg_set_insn_param(int op_idx, int arg, TCGArg v)
 {
@@ -755,7 +755,7 @@ static inline bool tcg_op_buf_full(void)
 
 /* pool based memory allocation */
 
-/* tb_lock must be held for tcg_malloc_internal. */
+/* user-mode: tb_lock must be held for tcg_malloc_internal. */
 void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);
@@ -766,7 +766,7 @@ void tcg_region_reset_all(void);
 size_t tcg_code_size(void);
 size_t tcg_code_capacity(void);
 
-/* Called with tb_lock held.  */
+/* user-mode: Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
 TCGContext *s = tcg_ctx;
@@ -783,6 +783,7 @@ static inline void *tcg_malloc(int size)
 }
 
 void tcg_context_init(TCGContext *s);
+void tcg_register_thread(void);
 void tcg_prologue_init(TCGContext *s);
 void tcg_func_start(TCGContext *s);
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 623b9e7..2e810b9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -154,7 +154,7 @@ static void *l1_map[V_L1_MAX_SIZE];
 
 /* code generation context */
 TCGContext tcg_init_ctx;
-TCGContext *tcg_ctx;
+__thread TCGContext *tcg_ctx;
 TBContext tb_ctx;
 bool parallel_cpus;
 
diff --git a/cpus.c b/cpus.c
index 6022d40..74ddd49 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1307,6 +1307,7 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
 CPUState *cpu = arg;
 
 rcu_register_thread();
+tcg_register_thread();
 
 qemu_mutex_lock_iothread();
 qemu_thread_get_self(cpu->thread);
@@ -1454,6 +1455,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 g_assert(!use_icount);
 
 rcu_register_thread();
+tcg_register_thread();
 
 qemu_mutex_lock_iothread();
 qemu_thread_get_self(cpu->thread);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 003943b..bbf7913 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6214,6 +6214,7 @@ static void *clone_func(void *arg)
 TaskState *ts;
 
 rcu_register_thread();
+tcg_register_thread();
 env = info->env;
 cpu = ENV_GET_CPU(env);
 thread_cpu = cpu;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 22a949f..a5c01be 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -58,6 +58,7 @@
 
 #include "elf.h"
 #include "exec/log.h"
+#include "sysemu/sysemu.h"
 
 /* 

[Qemu-devel] [PATCH v3 35/43] tcg: dynamically allocate optimizer temps

2017-07-19 Thread Emilio G. Cota
Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of having a bitmap of TCG_MAX_TEMPS via
TCGTempSet.
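
As a rough illustration of allocating a bitmap of exactly the required size
rather than a fixed-size one, here is a minimal sketch. The helper names
(bitmap_alloc, bitmap_set_bit, bitmap_test_bit) are hypothetical and are not
QEMU's bitmap API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Allocate a zeroed bitmap with room for nbits bits (hypothetical helper). */
static unsigned long *bitmap_alloc(size_t nbits)
{
    return calloc(BITS_TO_LONGS(nbits), sizeof(unsigned long));
}

/* Mark one bit, e.g. "this temp has been used". */
static void bitmap_set_bit(unsigned long *map, size_t bit)
{
    map[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/* Query one bit. */
static bool bitmap_test_bit(const unsigned long *map, size_t bit)
{
    return (map[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG)) & 1UL;
}
```

The allocation then scales with the number of temps actually in use, at the
cost of a heap allocation, which is consistent with the slowdown measured
below.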

Performance-wise we lose about 2% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::-:22 \
-device virtio-net-device,netdev=unet \
-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

Before:
    19489.126318  task-clock                #    0.960 CPUs utilized            ( +-  0.96% )
          23,697  context-switches          #    0.001 M/sec                    ( +-  0.51% )
               1  CPU-migrations            #    0.000 M/sec
          19,953  page-faults               #    0.001 M/sec                    ( +-  0.40% )
  56,214,402,410  cycles                    #    2.884 GHz                      ( +-  0.95% ) [83.34%]
  25,516,669,513  stalled-cycles-frontend   #   45.39% frontend cycles idle     ( +-  0.69% ) [83.33%]
  17,266,165,747  stalled-cycles-backend    #   30.71% backend  cycles idle     ( +-  0.59% ) [66.66%]
  79,007,843,327  instructions              #    1.41  insns per cycle
                                            #    0.32  stalled cycles per insn  ( +-  1.19% ) [83.34%]
  13,136,600,416  branches                  #  674.048 M/sec                    ( +-  1.29% ) [83.34%]
     274,715,270  branch-misses             #    2.09% of all branches          ( +-  0.79% ) [83.33%]

    20.300335944 seconds time elapsed                                           ( +-  0.55% )

After:
    19917.737030  task-clock                #    0.955 CPUs utilized            ( +-  0.74% )
          23,973  context-switches          #    0.001 M/sec                    ( +-  0.37% )
               1  CPU-migrations            #    0.000 M/sec
          19,824  page-faults               #    0.001 M/sec                    ( +-  0.38% )
  57,380,269,537  cycles                    #    2.881 GHz                      ( +-  0.70% ) [83.34%]
  26,462,452,508  stalled-cycles-frontend   #   46.12% frontend cycles idle     ( +-  0.65% ) [83.34%]
  17,970,546,047  stalled-cycles-backend    #   31.32% backend  cycles idle     ( +-  0.64% ) [66.67%]
  79,527,238,334  instructions              #    1.39  insns per cycle
                                            #    0.33  stalled cycles per insn  ( +-  0.79% ) [83.33%]
  13,272,362,192  branches                  #  666.359 M/sec                    ( +-  0.83% ) [83.34%]
     278,357,773  branch-misses             #    2.10% of all branches          ( +-  0.65% ) [83.33%]

    20.850558455 seconds time elapsed                                           ( +-  0.55% )

That is, 2.70% slowdown.

The perf difference shrinks a bit when using a high-performance allocator
such as tcmalloc:

Before:
    19372.008814  task-clock                #    0.957 CPUs utilized            ( +-  1.00% )
          23,621  context-switches          #    0.001 M/sec                    ( +-  0.50% )
               1  CPU-migrations            #    0.000 M/sec
          13,289  page-faults               #    0.001 M/sec                    ( +-  1.46% )
  55,824,272,818  cycles                    #    2.882 GHz                      ( +-  1.00% ) [83.33%]
  25,284,946,453  stalled-cycles-frontend   #   45.29% frontend cycles idle     ( +-  1.12% ) [83.32%]
  17,100,517,753  stalled-cycles-backend    #   30.63% backend  cycles idle     ( +-  0.86% ) [66.69%]
  78,193,046,990  instructions              #    1.40  insns per cycle
                                            #    0.32  stalled cycles per insn  ( +-  1.14% ) [83.35%]
  12,986,014,194  branches                  #  670.349 M/sec                    ( +-  1.22% ) [83.34%]
     272,581,789  branch-misses             #    2.10% of all branches          ( +-  0.62% ) [83.33%]

    20.249726404 seconds time elapsed                                           ( +-  0.61% )

After:
    19809.295886  task-clock                #    0.962 CPUs utilized            ( +-  0.99% )
          23,894  context-switches          #    0.001 M/sec                    ( +-  0.50% )
               1  CPU-migrations            #    0.000 M/sec
          12,927  page-faults               #    0.001 M/sec                    ( +-  0.78% )
  57,131,686,004  cycles                    #    2.884 GHz                      ( +-  0.97% ) [83.34%]
  25,965,120,001  stalled-cycles-frontend   #   45.45% frontend cycles idle     ( +-  0.71% ) [83.35%]
  17,534,942,176  stalled-cycles-backend    #   30.69% backend

[Qemu-devel] [PATCH v3 30/43] tci: move tci_regs to tcg_qemu_tb_exec's stack

2017-07-19 Thread Emilio G. Cota
Groundwork for supporting multiple TCG contexts.

Compile-tested for all targets on an x86_64 host.

Suggested-by: Richard Henderson 
Acked-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 tcg/tci.c | 552 +++---
 1 file changed, 279 insertions(+), 273 deletions(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 4bdc645..f3216c1 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -55,93 +55,95 @@ typedef uint64_t (*helper_function)(tcg_target_ulong, tcg_target_ulong,
 tcg_target_ulong);
 #endif
 
-static tcg_target_ulong tci_reg[TCG_TARGET_NB_REGS];
-
-static tcg_target_ulong tci_read_reg(TCGReg index)
+static tcg_target_ulong tci_read_reg(const tcg_target_ulong *regs, TCGReg index)
 {
-tci_assert(index < ARRAY_SIZE(tci_reg));
-return tci_reg[index];
+tci_assert(index < TCG_TARGET_NB_REGS);
+return regs[index];
 }
 
 #if TCG_TARGET_HAS_ext8s_i32 || TCG_TARGET_HAS_ext8s_i64
-static int8_t tci_read_reg8s(TCGReg index)
+static int8_t tci_read_reg8s(const tcg_target_ulong *regs, TCGReg index)
 {
-return (int8_t)tci_read_reg(index);
+return (int8_t)tci_read_reg(regs, index);
 }
 #endif
 
 #if TCG_TARGET_HAS_ext16s_i32 || TCG_TARGET_HAS_ext16s_i64
-static int16_t tci_read_reg16s(TCGReg index)
+static int16_t tci_read_reg16s(const tcg_target_ulong *regs, TCGReg index)
 {
-return (int16_t)tci_read_reg(index);
+return (int16_t)tci_read_reg(regs, index);
 }
 #endif
 
 #if TCG_TARGET_REG_BITS == 64
-static int32_t tci_read_reg32s(TCGReg index)
+static int32_t tci_read_reg32s(const tcg_target_ulong *regs, TCGReg index)
 {
-return (int32_t)tci_read_reg(index);
+return (int32_t)tci_read_reg(regs, index);
 }
 #endif
 
-static uint8_t tci_read_reg8(TCGReg index)
+static uint8_t tci_read_reg8(const tcg_target_ulong *regs, TCGReg index)
 {
-return (uint8_t)tci_read_reg(index);
+return (uint8_t)tci_read_reg(regs, index);
 }
 
-static uint16_t tci_read_reg16(TCGReg index)
+static uint16_t tci_read_reg16(const tcg_target_ulong *regs, TCGReg index)
 {
-return (uint16_t)tci_read_reg(index);
+return (uint16_t)tci_read_reg(regs, index);
 }
 
-static uint32_t tci_read_reg32(TCGReg index)
+static uint32_t tci_read_reg32(const tcg_target_ulong *regs, TCGReg index)
 {
-return (uint32_t)tci_read_reg(index);
+return (uint32_t)tci_read_reg(regs, index);
 }
 
 #if TCG_TARGET_REG_BITS == 64
-static uint64_t tci_read_reg64(TCGReg index)
+static uint64_t tci_read_reg64(const tcg_target_ulong *regs, TCGReg index)
 {
-return tci_read_reg(index);
+return tci_read_reg(regs, index);
 }
 #endif
 
-static void tci_write_reg(TCGReg index, tcg_target_ulong value)
+static void
+tci_write_reg(tcg_target_ulong *regs, TCGReg index, tcg_target_ulong value)
 {
-tci_assert(index < ARRAY_SIZE(tci_reg));
+tci_assert(index < TCG_TARGET_NB_REGS);
 tci_assert(index != TCG_AREG0);
 tci_assert(index != TCG_REG_CALL_STACK);
-tci_reg[index] = value;
+regs[index] = value;
 }
 
 #if TCG_TARGET_REG_BITS == 64
-static void tci_write_reg32s(TCGReg index, int32_t value)
+static void
+tci_write_reg32s(tcg_target_ulong *regs, TCGReg index, int32_t value)
 {
-tci_write_reg(index, value);
+tci_write_reg(regs, index, value);
 }
 #endif
 
-static void tci_write_reg8(TCGReg index, uint8_t value)
+static void tci_write_reg8(tcg_target_ulong *regs, TCGReg index, uint8_t value)
 {
-tci_write_reg(index, value);
+tci_write_reg(regs, index, value);
 }
 
-static void tci_write_reg32(TCGReg index, uint32_t value)
+static void
+tci_write_reg32(tcg_target_ulong *regs, TCGReg index, uint32_t value)
 {
-tci_write_reg(index, value);
+tci_write_reg(regs, index, value);
 }
 
 #if TCG_TARGET_REG_BITS == 32
-static void tci_write_reg64(uint32_t high_index, uint32_t low_index,
-uint64_t value)
+static void tci_write_reg64(tcg_target_ulong *regs, uint32_t high_index,
+uint32_t low_index, uint64_t value)
 {
-tci_write_reg(low_index, value);
-tci_write_reg(high_index, value >> 32);
+tci_write_reg(regs, low_index, value);
+tci_write_reg(regs, high_index, value >> 32);
 }
 #elif TCG_TARGET_REG_BITS == 64
-static void tci_write_reg64(TCGReg index, uint64_t value)
+static void
+tci_write_reg64(tcg_target_ulong *regs, TCGReg index, uint64_t value)
 {
-tci_write_reg(index, value);
+tci_write_reg(regs, index, value);
 }
 #endif
 
@@ -188,94 +190,97 @@ static uint64_t tci_read_i64(uint8_t **tb_ptr)
 #endif
 
 /* Read indexed register (native size) from bytecode. */
-static tcg_target_ulong tci_read_r(uint8_t **tb_ptr)
+static tcg_target_ulong
+tci_read_r(const tcg_target_ulong *regs, uint8_t **tb_ptr)
 {
-tcg_target_ulong value = tci_read_reg(**tb_ptr);
+tcg_target_ulong value = tci_read_reg(regs, **tb_ptr);
 *tb_ptr += 1;
 return value;
 }
 
 /* Read 

[Qemu-devel] [PATCH v3 41/43] tcg: define TCG_HIGHWATER

2017-07-19 Thread Emilio G. Cota
Will come in handy very soon.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 tcg/tcg.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0ddd0dc..cb4ecbd 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,6 +115,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 static void tcg_out_tb_init(TCGContext *s);
 static bool tcg_out_tb_finalize(TCGContext *s);
 
+#define TCG_HIGHWATER 1024
+
 static TCGContext **tcg_ctxs;
 static unsigned int n_tcg_ctxs;
 
@@ -435,7 +437,7 @@ void tcg_prologue_init(TCGContext *s)
 /* Compute a high-water mark, at which we voluntarily flush the buffer
and start over.  The size here is arbitrary, significantly larger
than we expect the code generation for any one opcode to require.  */
-s->code_gen_highwater = s->code_gen_buffer + (total_size - 1024);
+s->code_gen_highwater = s->code_gen_buffer + (total_size - TCG_HIGHWATER);
 
 tcg_register_jit(s->code_gen_buffer, total_size);
 
-- 
2.7.4

[Qemu-devel] [PATCH v3 17/43] target/s390x: check CF_PARALLEL instead of parallel_cpus

2017-07-19 Thread Emilio G. Cota
Thereby decoupling the resulting translated code from the current state
of the system.
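
The pattern used in this patch (a shared body taking a 'parallel' argument,
two thin helper wrappers, and the choice made once at translation time) can
be sketched as below. This is a simplified illustration: do_op, helper_op,
helper_op_parallel and pick_helper are hypothetical names, and the value
given to CF_PARALLEL here is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CF_PARALLEL 0x1u   /* hypothetical bit value for this sketch */

/* Shared body: branches on its argument, never on a global. */
static int do_op(int x, bool parallel)
{
    /* the +100 merely makes the two paths distinguishable */
    return parallel ? x + 100 : x;
}

/* Thin wrappers, mirroring the helper / helper_parallel pairs. */
static int helper_op(int x)          { return do_op(x, false); }
static int helper_op_parallel(int x) { return do_op(x, true); }

typedef int (*helper_fn)(int);

/* Chosen once, at translation time, from the flags of the TB. */
static helper_fn pick_helper(uint32_t cflags)
{
    return (cflags & CF_PARALLEL) ? helper_op_parallel : helper_op;
}
```

Because the choice is baked into the translated code, a later change to the
global state cannot alter how an already-translated block behaves.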

Signed-off-by: Emilio G. Cota 
---
 target/s390x/helper.h |  4 +++
 target/s390x/mem_helper.c | 80 +--
 target/s390x/translate.c  | 26 ---
 3 files changed, 88 insertions(+), 22 deletions(-)

diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 4b02907..84a4597 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -34,7 +34,9 @@ DEF_HELPER_3(celgb, i64, env, i64, i32)
 DEF_HELPER_3(cdlgb, i64, env, i64, i32)
 DEF_HELPER_3(cxlgb, i64, env, i64, i32)
 DEF_HELPER_4(cdsg, void, env, i64, i32, i32)
+DEF_HELPER_4(cdsg_parallel, void, env, i64, i32, i32)
 DEF_HELPER_4(csst, i32, env, i32, i64, i64)
+DEF_HELPER_4(csst_parallel, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_3(aeb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(adb, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_5(axb, TCG_CALL_NO_WG, i64, env, i64, i64, i64, i64)
@@ -107,7 +109,9 @@ DEF_HELPER_FLAGS_1(popcnt, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(stfl, TCG_CALL_NO_RWG, void, env)
 DEF_HELPER_2(stfle, i32, env, i64)
 DEF_HELPER_FLAGS_2(lpq, TCG_CALL_NO_WG, i64, env, i64)
+DEF_HELPER_FLAGS_2(lpq_parallel, TCG_CALL_NO_WG, i64, env, i64)
 DEF_HELPER_FLAGS_4(stpq, TCG_CALL_NO_WG, void, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(stpq_parallel, TCG_CALL_NO_WG, void, env, i64, i64, i64)
 DEF_HELPER_4(mvcos, i32, env, i64, i64, i64)
 DEF_HELPER_4(cu12, i32, env, i32, i32, i32)
 DEF_HELPER_4(cu14, i32, env, i32, i32, i32)
diff --git a/target/s390x/mem_helper.c b/target/s390x/mem_helper.c
index cdc78aa..74a2157 100644
--- a/target/s390x/mem_helper.c
+++ b/target/s390x/mem_helper.c
@@ -1363,8 +1363,8 @@ uint32_t HELPER(trXX)(CPUS390XState *env, uint32_t r1, uint32_t r2,
 return cc;
 }
 
-void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
-  uint32_t r1, uint32_t r3)
+static void do_cdsg(CPUS390XState *env, uint64_t addr,
+uint32_t r1, uint32_t r3, bool parallel)
 {
 uintptr_t ra = GETPC();
 Int128 cmpv = int128_make128(env->regs[r1 + 1], env->regs[r1]);
@@ -1372,7 +1372,7 @@ void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
 Int128 oldv;
 bool fail;
 
-if (parallel_cpus) {
+if (parallel) {
 #ifndef CONFIG_ATOMIC128
 cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -1404,7 +1404,20 @@ void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
 env->regs[r1 + 1] = int128_getlo(oldv);
 }
 
-uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
+void HELPER(cdsg)(CPUS390XState *env, uint64_t addr,
+  uint32_t r1, uint32_t r3)
+{
+do_cdsg(env, addr, r1, r3, false);
+}
+
+void HELPER(cdsg_parallel)(CPUS390XState *env, uint64_t addr,
+   uint32_t r1, uint32_t r3)
+{
+do_cdsg(env, addr, r1, r3, true);
+}
+
+static uint32_t do_csst(CPUS390XState *env, uint32_t r3, uint64_t a1,
+uint64_t a2, bool parallel)
 {
 #if !defined(CONFIG_USER_ONLY) || defined(CONFIG_ATOMIC128)
 uint32_t mem_idx = cpu_mmu_index(env, false);
@@ -1440,7 +1453,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
 the complete operation is not.  Therefore we do not need to assert serial
 context in order to implement this.  That said, restart early if we can't
 support either operation that is supposed to be atomic.  */
-if (parallel_cpus) {
+if (parallel) {
 int mask = 0;
 #if !defined(CONFIG_ATOMIC64)
 mask = -8;
@@ -1464,7 +1477,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
 uint32_t cv = env->regs[r3];
 uint32_t ov;
 
-if (parallel_cpus) {
+if (parallel) {
 #ifdef CONFIG_USER_ONLY
 uint32_t *haddr = g2h(a1);
 ov = atomic_cmpxchg__nocheck(haddr, cv, nv);
@@ -1487,7 +1500,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
 uint64_t cv = env->regs[r3];
 uint64_t ov;
 
-if (parallel_cpus) {
+if (parallel) {
 #ifdef CONFIG_ATOMIC64
 # ifdef CONFIG_USER_ONLY
 uint64_t *haddr = g2h(a1);
@@ -1497,7 +1510,7 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
 ov = helper_atomic_cmpxchgq_be_mmu(env, a1, cv, nv, oi, ra);
 # endif
 #else
-/* Note that we asserted !parallel_cpus above.  */
+/* Note that we asserted !parallel above.  */
 g_assert_not_reached();
 #endif
 } else {
@@ -1517,13 +1530,13 @@ uint32_t HELPER(csst)(CPUS390XState *env, uint32_t r3, uint64_t a1, uint64_t a2)
 Int128 cv = int128_make128(env->regs[r3 + 1], env->regs[r3]);
 Int128 ov;
 
-if (parallel_cpus) {
+  

[Qemu-devel] [PATCH v3 42/43] tcg: introduce regions to split code_gen_buffer

2017-07-19 Thread Emilio G. Cota
This is groundwork for supporting multiple TCG contexts.

The naive solution here is to split code_gen_buffer statically
among the TCG threads; this however results in poor utilization
if translation needs are different across TCG threads.

What we do here is to add an extra layer of indirection, assigning
regions that act just like pages do in virtual memory allocation.
(BTW if you are wondering about the chosen naming, I did not want
to use blocks or pages because those are already heavily used in QEMU).

We use a global lock to serialize allocations as well as statistics
reporting (we now export the size of the used code_gen_buffer with
tcg_code_size()). Note that for the allocator we could just use
a counter and atomic_inc; however, that would complicate the gathering
of tcg_code_size()-like stats. So given that the region operations are
not a fast path, a lock seems the most reasonable choice.
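
A minimal sketch of the lock-protected region allocator argued for above,
under stated assumptions: N_REGIONS, region_size and all function names are
hypothetical, an atomic_flag spinlock stands in for the real mutex, and the
size statistic is simplified to count whole assigned regions rather than
bytes of generated code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define N_REGIONS 4               /* hypothetical; the example below uses 32 */

static atomic_flag region_lock = ATOMIC_FLAG_INIT; /* stand-in for a mutex */
static size_t next_region;        /* index of the next unassigned region */
static size_t region_size = 1024; /* hypothetical bytes per region */

static void lock(void)   { while (atomic_flag_test_and_set(&region_lock)) { } }
static void unlock(void) { atomic_flag_clear(&region_lock); }

/*
 * Hand out the next free region. Returns false once every region has
 * been assigned, at which point the caller would flush and start over.
 */
static bool region_alloc(size_t *idx)
{
    bool ok;

    lock();
    ok = next_region < N_REGIONS;
    if (ok) {
        *idx = next_region++;
    }
    unlock();
    return ok;
}

/* Statistics take the same lock, so they see a consistent snapshot. */
static size_t code_size(void)
{
    size_t size;

    lock();
    size = next_region * region_size;
    unlock();
    return size;
}

/* Return all regions to the pool, as a flush would. */
static void region_reset_all(void)
{
    lock();
    next_region = 0;
    unlock();
}
```

With a plain atomic counter instead of the lock, code_size() could observe
the allocator mid-update; taking the lock on this slow path avoids that.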

The effectiveness of this approach is clear after seeing some numbers.
I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
Note that I'm evaluating this after enabling per-thread TCG (which
is done by a subsequent commit).

* -smp 1, 1 region (entire buffer):
qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367

That is, 8 flushes.

* -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:

qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368

Again, 8 flushes. Note how buffer utilization is not 100%, but it
is close. Smaller region sizes would yield higher utilization,
but we want region allocation to be rare (it acquires a lock), so
we do not want to go too small.

* -smp 8, static partitioning of 8 regions (10 MB per region):
qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
qemu: flush code_size=3984 nb_tbs=71433 avg_tb_size=372
qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
qemu: flush code_size=3360 nb_tbs=59514 avg_tb_size=378
qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364

That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.

Signed-off-by: Emilio G. Cota 
---
 tcg/tcg.h |   6 ++
 accel/tcg/translate-all.c |  63 +---
 bsd-user/main.c   |   1 +
 cpus.c|  12 +++
 linux-user/main.c |   1 +
 tcg/tcg.c | 183 +-
 6 files changed, 221 insertions(+), 45 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 3611141..3365da8 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -760,6 +760,12 @@ void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 TranslationBlock *tcg_tb_alloc(TCGContext *s);
 
+void tcg_region_init(void);
+void tcg_region_reset_all(void);
+
+size_t tcg_code_size(void);
+size_t tcg_code_capacity(void);
+
 /* Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e930bac..623b9e7 100644
--- a/accel/tcg/translate-all.c

[Qemu-devel] [PATCH v3 37/43] tcg: distribute profiling counters across TCGContext's

2017-07-19 Thread Emilio G. Cota
This is groundwork for supporting multiple TCG contexts.

To avoid scalability issues when profiling info is enabled, this patch
makes the profiling info counters distributed via the following changes:

1) Consolidate profile info into its own struct, TCGProfile, which
   TCGContext also includes. Note that tcg_table_op_count is brought
   into TCGProfile after dropping the tcg_ prefix.
2) Iterate over the TCG contexts in the system to obtain the total counts.

This change also requires updating the accessors to TCGProfile fields to
use atomic_read/set whenever there may be conflicting accesses (as defined
in C11) to them.
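
The idea can be sketched with C11 atomics as follows. This is an
illustration only: Prof, profs, N_CTXS and the function names are
hypothetical, and relaxed loads and stores stand in for atomic_read and
atomic_set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define N_CTXS 3   /* hypothetical number of contexts */

typedef struct Prof {
    _Atomic int64_t tb_count;
} Prof;

static Prof profs[N_CTXS];  /* one set of counters per context */

/*
 * Only the owning thread writes its counters, so a relaxed load followed
 * by a relaxed store is enough; no atomic read-modify-write is needed.
 */
static void prof_inc_tb_count(Prof *p)
{
    int64_t v = atomic_load_explicit(&p->tb_count, memory_order_relaxed);

    atomic_store_explicit(&p->tb_count, v + 1, memory_order_relaxed);
}

/* A reader sums over all contexts to obtain the total. */
static int64_t prof_total_tb_count(void)
{
    int64_t total = 0;

    for (int i = 0; i < N_CTXS; i++) {
        total += atomic_load_explicit(&profs[i].tb_count,
                                      memory_order_relaxed);
    }
    return total;
}
```

The total is only approximate while translation is in progress, which seems
acceptable for profiling output.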

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 tcg/tcg.h |  38 +---
 accel/tcg/translate-all.c |  23 +-
 tcg/tcg.c | 110 ++
 3 files changed, 126 insertions(+), 45 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index f83f9b0..3611141 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -641,6 +641,26 @@ QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 14));
 /* Make sure that we don't overflow 64 bits without noticing.  */
 QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
 
+typedef struct TCGProfile {
+int64_t tb_count1;
+int64_t tb_count;
+int64_t op_count; /* total insn count */
+int op_count_max; /* max insn per TB */
+int64_t temp_count;
+int temp_count_max;
+int64_t del_op_count;
+int64_t code_in_len;
+int64_t code_out_len;
+int64_t search_out_len;
+int64_t interm_time;
+int64_t code_time;
+int64_t la_time;
+int64_t opt_time;
+int64_t restore_count;
+int64_t restore_time;
+int64_t table_op_count[NB_OPS];
+} TCGProfile;
+
 struct TCGContext {
 uint8_t *pool_cur, *pool_end;
 TCGPool *pool_first, *pool_current, *pool_first_large;
@@ -665,23 +685,7 @@ struct TCGContext {
 tcg_insn_unit *code_ptr;
 
 #ifdef CONFIG_PROFILER
-/* profiling info */
-int64_t tb_count1;
-int64_t tb_count;
-int64_t op_count; /* total insn count */
-int op_count_max; /* max insn per TB */
-int64_t temp_count;
-int temp_count_max;
-int64_t del_op_count;
-int64_t code_in_len;
-int64_t code_out_len;
-int64_t search_out_len;
-int64_t interm_time;
-int64_t code_time;
-int64_t la_time;
-int64_t opt_time;
-int64_t restore_count;
-int64_t restore_time;
+TCGProfile prof;
 #endif
 
 #ifdef CONFIG_DEBUG_TCG
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index e6ee4e3..36b17ac 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -312,6 +312,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
 uint8_t *p = tb->tc.search;
 int i, j, num_insns = tb->icount;
 #ifdef CONFIG_PROFILER
+TCGProfile *prof = &tcg_ctx->prof;
 int64_t ti = profile_getclock();
 #endif
 
@@ -346,8 +347,9 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
 restore_state_to_opc(env, tb, data);
 
 #ifdef CONFIG_PROFILER
-tcg_ctx->restore_time += profile_getclock() - ti;
-tcg_ctx->restore_count++;
+atomic_set(&prof->restore_time,
+prof->restore_time + profile_getclock() - ti);
+atomic_set(&prof->restore_count, prof->restore_count + 1);
 #endif
 return 0;
 }
@@ -1302,6 +1304,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tcg_insn_unit *gen_code_buf;
 int gen_code_size, search_size;
 #ifdef CONFIG_PROFILER
+TCGProfile *prof = &tcg_ctx->prof;
 int64_t ti;
 #endif
 assert_memory_lock();
@@ -1332,8 +1335,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tcg_ctx->cf_parallel = !!(cflags & CF_PARALLEL);
 
 #ifdef CONFIG_PROFILER
-tcg_ctx->tb_count1++; /* includes aborted translations because of
-   exceptions */
+/* includes aborted translations because of exceptions */
+atomic_set(&prof->tb_count1, prof->tb_count1 + 1);
 ti = profile_getclock();
 #endif
 
@@ -1358,8 +1361,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 #endif
 
 #ifdef CONFIG_PROFILER
-tcg_ctx->tb_count++;
-tcg_ctx->interm_time += profile_getclock() - ti;
+atomic_set(&prof->tb_count, prof->tb_count + 1);
+atomic_set(&prof->interm_time, prof->interm_time + profile_getclock() - ti);
 ti = profile_getclock();
 #endif
 
@@ -1379,10 +1382,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tb->tc.size = gen_code_size;
 
 #ifdef CONFIG_PROFILER
-tcg_ctx->code_time += profile_getclock() - ti;
-tcg_ctx->code_in_len += tb->size;
-tcg_ctx->code_out_len += gen_code_size;
-tcg_ctx->search_out_len += search_size;
+atomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
+atomic_set(&prof->code_in_len, prof->code_in_len + tb->size);
+atomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size);
+atomic_set(&prof->search_out_len, prof->search_out_len + search_size);
 #endif
 
 #ifdef DEBUG_DISAS
diff 

[Qemu-devel] [PATCH v3 12/43] tcg: convert tb->cflags reads to tb_cflags(tb)

2017-07-19 Thread Emilio G. Cota
Convert all existing readers of tb->cflags to tb_cflags, so that we
use atomic_read and therefore avoid undefined behaviour in C11.

Note that the remaining setters/getters of the field are protected
by tb_lock, and therefore do not need conversion.
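
For illustration, an accessor of this kind might look like the sketch below.
It uses C11 atomics rather than QEMU's atomic_read macro; the TB struct, the
name tb_cflags_sketch and the value given to CF_USE_ICOUNT are assumptions
made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CF_USE_ICOUNT 0x20000u   /* hypothetical bit value for this sketch */

typedef struct TB {
    _Atomic uint32_t cflags;     /* stand-in for tb->cflags */
} TB;

/*
 * Read the flags with an atomic (relaxed) load, so that a concurrent
 * write from another thread is not a data race under C11.
 */
static inline uint32_t tb_cflags_sketch(TB *tb)
{
    return atomic_load_explicit(&tb->cflags, memory_order_relaxed);
}
```

A call site then changes from 'tb->cflags & CF_USE_ICOUNT' to
'tb_cflags_sketch(tb) & CF_USE_ICOUNT', which is the shape of the scripted
conversion below.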

Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
bar->cflags in the code base), which makes the conversion easily
scriptable:
FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h | \
cut -f 1 -d':' | sort | uniq)
perl -pi -e 's/([^>])tb->cflags/$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z]*)->tb->cflags/tb_cflags($1->tb)/g' $FILES

Then manually fixed the few errors that checkpatch reported.

Compile-tested for all targets.

Suggested-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/gen-icount.h |  8 +++
 target/alpha/translate.c  | 12 +-
 target/arm/translate-a64.c| 13 +-
 target/arm/translate.c| 10 
 target/cris/translate.c   |  6 ++---
 target/hppa/translate.c   |  8 +++
 target/i386/translate.c   | 55 ++-
 target/lm32/translate.c   | 14 +--
 target/m68k/translate.c   |  6 ++---
 target/microblaze/translate.c |  6 ++---
 target/mips/translate.c   | 26 ++--
 target/moxie/translate.c  |  2 +-
 target/nios2/translate.c  |  6 ++---
 target/openrisc/translate.c   |  6 ++---
 target/ppc/translate.c|  6 ++---
 target/ppc/translate_init.c   | 32 -
 target/s390x/translate.c  |  8 +++
 target/sh4/translate.c|  6 ++---
 target/sparc/translate.c  |  6 ++---
 target/tilegx/translate.c |  2 +-
 target/tricore/translate.c|  2 +-
 target/unicore32/translate.c  |  6 ++---
 target/xtensa/translate.c | 28 +++---
 23 files changed, 138 insertions(+), 136 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 9b3cb14..48b566c 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -13,7 +13,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 TCGv_i32 count, imm;
 
 exitreq_label = gen_new_label();
-if (tb->cflags & CF_USE_ICOUNT) {
+if (tb_cflags(tb) & CF_USE_ICOUNT) {
 count = tcg_temp_local_new_i32();
 } else {
 count = tcg_temp_new_i32();
@@ -22,7 +22,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 tcg_gen_ld_i32(count, tcg_ctx.tcg_env,
-ENV_OFFSET + offsetof(CPUState, icount_decr.u32));
 
-if (tb->cflags & CF_USE_ICOUNT) {
+if (tb_cflags(tb) & CF_USE_ICOUNT) {
 imm = tcg_temp_new_i32();
 /* We emit a movi with a dummy immediate argument. Keep the insn index
  * of the movi so that we later (when we know the actual insn count)
@@ -36,7 +36,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 
 tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);
 
-if (tb->cflags & CF_USE_ICOUNT) {
+if (tb_cflags(tb) & CF_USE_ICOUNT) {
 tcg_gen_st16_i32(count, tcg_ctx.tcg_env,
  -ENV_OFFSET + offsetof(CPUState, icount_decr.u16.low));
 }
@@ -46,7 +46,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 
 static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
 {
-if (tb->cflags & CF_USE_ICOUNT) {
+if (tb_cflags(tb) & CF_USE_ICOUNT) {
 /* Update the num_insn immediate parameter now that we know
  * the actual insn count.  */
 tcg_set_insn_param(icount_start_insn_idx, 1, num_insns);
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 9e98312..f97a8e5 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -484,9 +484,9 @@ static bool in_superpage(DisasContext *ctx, int64_t addr)
 
 static bool use_exit_tb(DisasContext *ctx)
 {
-return ((ctx->tb->cflags & CF_LAST_IO)
+return (tb_cflags(ctx->tb) & CF_LAST_IO)
 || ctx->singlestep_enabled
-|| singlestep);
+|| singlestep;
 }
 
 static bool use_goto_tb(DisasContext *ctx, uint64_t dest)
@@ -2430,7 +2430,7 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
 case 0xC000:
 /* RPCC */
 va = dest_gpr(ctx, ra);
-if (ctx->tb->cflags & CF_USE_ICOUNT) {
+if (tb_cflags(ctx->tb) & CF_USE_ICOUNT) {
 gen_io_start();
 gen_helper_load_pcc(va, cpu_env);
 gen_io_end();
@@ -2998,7 +2998,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
 TCGV_UNUSED_I64(ctx.lit);
 
 num_insns = 0;
-max_insns = tb->cflags & CF_COUNT_MASK;
+max_insns = tb_cflags(tb) & CF_COUNT_MASK;
 if (max_insns == 0) {
 max_insns = CF_COUNT_MASK;
 }
@@ -3028,7 +3028,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
   

[Qemu-devel] [PATCH v3 11/43] tcg: define CF_PARALLEL and use it for TB hashing

2017-07-19 Thread Emilio G. Cota
This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.

Note that the declaration of parallel_cpus is moved to exec-all.h
so that the "curr_cflags" inline can be defined there.
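
To illustrate the idea, here is a minimal sketch (not the patch's code) of
how masking cflags during lookup keeps TBs generated for a parallel context
apart from those generated for a serial one. The flag values are copied from
the diff below; the sketch_* names and the exact comparison are assumptions
made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in the patch. */
#define CF_INVALID   0x8u
#define CF_PARALLEL  0x10u
#define CF_HASH_MASK (CF_PARALLEL)

/* Hypothetical, heavily reduced stand-in for TranslationBlock. */
struct sketch_tb {
    uint64_t pc;
    uint32_t flags;
    uint32_t cflags;
};

/* Stand-in for the parallel_cpus global. */
static bool sketch_parallel_cpus;

/* Mirrors the role of curr_cflags(): the cflags a lookup should ask for. */
static uint32_t sketch_curr_cflags(void)
{
    return sketch_parallel_cpus ? CF_PARALLEL : 0;
}

/* A TB matches only if the hashed subset of its cflags equals cf_mask. */
static bool sketch_tb_matches(const struct sketch_tb *tb, uint64_t pc,
                              uint32_t flags, uint32_t cf_mask)
{
    assert(tb != NULL);
    return tb->pc == pc &&
           tb->flags == flags &&
           (tb->cflags & CF_HASH_MASK) == (cf_mask & CF_HASH_MASK);
}
```

With this shape, a TB translated with CF_PARALLEL set is simply not found by
a lookup made while parallel_cpus is false, so the two kinds of code can
coexist without a flush.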

Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   | 20 +++-
 include/exec/tb-hash-xx.h |  9 ++---
 include/exec/tb-hash.h|  4 ++--
 include/exec/tb-lookup.h  |  6 +++---
 tcg/tcg.h |  1 -
 accel/tcg/cpu-exec.c  | 45 +++--
 accel/tcg/translate-all.c | 13 +
 exec.c|  2 +-
 tcg/tcg-runtime.c |  2 +-
 tests/qht-bench.c |  2 +-
 10 files changed, 65 insertions(+), 39 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 256b9a6..0af0485 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -353,6 +353,9 @@ struct TranslationBlock {
 #define CF_USE_ICOUNT  0x2
 #define CF_IGNORE_ICOUNT 0x4 /* Do not generate icount code */
 #define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock */
+#define CF_PARALLEL0x10 /* Generate code for a parallel context */
+/* cflags' mask for hashing/comparison */
+#define CF_HASH_MASK (CF_PARALLEL)
 
 /* Per-vCPU dynamic tracing state used to generate this TB */
 uint32_t trace_vcpu_dstate;
@@ -396,11 +399,26 @@ struct TranslationBlock {
 uintptr_t jmp_list_first;
 };
 
+extern bool parallel_cpus;
+
+/* Hide the atomic_read to make code a little easier on the eyes */
+static inline uint32_t tb_cflags(const TranslationBlock *tb)
+{
+return atomic_read(&tb->cflags);
+}
+
+/* current cflags for hashing/comparison */
+static inline uint32_t curr_cflags(void)
+{
+return parallel_cpus ? CF_PARALLEL : 0;
+}
+
 void tb_free(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
-   target_ulong cs_base, uint32_t flags);
+   target_ulong cs_base, uint32_t flags,
+   uint32_t cf_mask);
 
 #if defined(USE_DIRECT_JUMP)
 
diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h
index 6cd3022..747a9a6 100644
--- a/include/exec/tb-hash-xx.h
+++ b/include/exec/tb-hash-xx.h
@@ -48,8 +48,8 @@
  * xxhash32, customized for input variables that are not guaranteed to be
  * contiguous in memory.
  */
-static inline
-uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
+static inline uint32_t
+tb_hash_func7(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f, uint32_t g)
 {
 uint32_t v1 = TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2;
 uint32_t v2 = TB_HASH_XX_SEED + PRIME32_2;
@@ -78,7 +78,7 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
 v4 *= PRIME32_1;
 
 h32 = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);
-h32 += 24;
+h32 += 28;
 
 h32 += e * PRIME32_3;
 h32  = rol32(h32, 17) * PRIME32_4;
@@ -86,6 +86,9 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
 h32 += f * PRIME32_3;
 h32  = rol32(h32, 17) * PRIME32_4;
 
+h32 += g * PRIME32_3;
+h32  = rol32(h32, 17) * PRIME32_4;
+
 h32 ^= h32 >> 15;
 h32 *= PRIME32_2;
 h32 ^= h32 >> 13;
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 17b5ee0..0526c4f 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -59,9 +59,9 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
 
 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
-  uint32_t trace_vcpu_dstate)
+  uint32_t cf_mask, uint32_t trace_vcpu_dstate)
 {
-return tb_hash_func6(phys_pc, pc, flags, trace_vcpu_dstate);
+return tb_hash_func7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
 }
 
 #endif
diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
index 436b6d5..2961385 100644
--- a/include/exec/tb-lookup.h
+++ b/include/exec/tb-lookup.h
@@ -21,7 +21,7 @@
 /* Might cause an exception, so have a longjmp destination ready */
 static inline TranslationBlock *
 tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
- uint32_t *flags)
+ uint32_t *flags, uint32_t cf_mask)
 {
 CPUArchState *env = (CPUArchState *)cpu->env_ptr;
 TranslationBlock *tb;
@@ -35,10 +35,10 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
tb->cs_base == *cs_base &&
tb->flags == *flags &&
tb->trace_vcpu_dstate == *cpu->trace_dstate &&
-   !(atomic_read(&tb->cflags) & CF_INVALID))) {
+   

[Qemu-devel] [PATCH v3 38/43] util: move qemu_real_host_page_size/mask to osdep.h

2017-07-19 Thread Emilio G. Cota
These only depend on the host and therefore belong in the common
osdep, not in a target-dependent object.

While at it, query the host during an init constructor, which guarantees
the page size will be well-defined throughout the execution of the program.
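
A minimal sketch of the constructor approach, assuming a GCC/Clang toolchain
(for __attribute__((constructor))) and a POSIX host. The sketch_* names are
hypothetical, and sysconf(_SC_PAGESIZE) is used here as a stand-in for the
getpagesize() call in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

static uintptr_t sketch_page_size;
static intptr_t sketch_page_mask;

/* Runs before main(), so every later reader sees initialized values. */
static void __attribute__((constructor)) sketch_init_page_size(void)
{
    sketch_page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    /* Negating a power of two yields a mask with the low bits cleared. */
    sketch_page_mask = -(intptr_t)sketch_page_size;
}

/* Round an address down to the start of its page. */
static uintptr_t sketch_page_align_down(uintptr_t addr)
{
    assert(sketch_page_size != 0);
    return addr & (uintptr_t)sketch_page_mask;
}
```

Because the values are set in a constructor, callers do not need to order
themselves after an explicit init function such as page_size_init().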

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/cpu-all.h |  2 --
 include/qemu/osdep.h   |  6 ++
 exec.c |  4 
 util/pagesize.c| 18 ++
 util/Makefile.objs |  1 +
 5 files changed, 25 insertions(+), 6 deletions(-)
 create mode 100644 util/pagesize.c

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index ffe43d5..778031c 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -229,8 +229,6 @@ extern int target_page_bits;
 /* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
  * when intptr_t is 32-bit and we are aligning a long long.
  */
-extern uintptr_t qemu_real_host_page_size;
-extern intptr_t qemu_real_host_page_mask;
 extern uintptr_t qemu_host_page_size;
 extern intptr_t qemu_host_page_mask;
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 3b74f6f..0cba871 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -481,6 +481,12 @@ char *qemu_get_pid_name(pid_t pid);
  */
 pid_t qemu_fork(Error **errp);
 
+/* Using intptr_t ensures that qemu_*_page_mask is sign-extended even
+ * when intptr_t is 32-bit and we are aligning a long long.
+ */
+extern uintptr_t qemu_real_host_page_size;
+extern intptr_t qemu_real_host_page_mask;
+
 extern int qemu_icache_linesize;
 extern int qemu_dcache_linesize;
 
diff --git a/exec.c b/exec.c
index 94b0f3e..6e85535 100644
--- a/exec.c
+++ b/exec.c
@@ -121,8 +121,6 @@ int use_icount;
 
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
-uintptr_t qemu_real_host_page_size;
-intptr_t qemu_real_host_page_mask;
 
 bool set_preferred_target_page_bits(int bits)
 {
@@ -3621,8 +3619,6 @@ void page_size_init(void)
 {
 /* NOTE: we can always suppose that qemu_host_page_size >=
TARGET_PAGE_SIZE */
-qemu_real_host_page_size = getpagesize();
-qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
 if (qemu_host_page_size == 0) {
 qemu_host_page_size = qemu_real_host_page_size;
 }
diff --git a/util/pagesize.c b/util/pagesize.c
new file mode 100644
index 000..998632c
--- /dev/null
+++ b/util/pagesize.c
@@ -0,0 +1,18 @@
+/*
+ * pagesize.c - query the host about its page size
+ *
+ * Copyright (C) 2017, Emilio G. Cota 
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+
+uintptr_t qemu_real_host_page_size;
+intptr_t qemu_real_host_page_mask;
+
+static void __attribute__((constructor)) init_real_host_page_size(void)
+{
+qemu_real_host_page_size = getpagesize();
+qemu_real_host_page_mask = -(intptr_t)qemu_real_host_page_size;
+}
diff --git a/util/Makefile.objs b/util/Makefile.objs
index 50a55ec..2973b0a 100644
--- a/util/Makefile.objs
+++ b/util/Makefile.objs
@@ -40,6 +40,7 @@ util-obj-y += buffer.o
 util-obj-y += timed-average.o
 util-obj-y += base64.o
 util-obj-y += log.o
+util-obj-y += pagesize.o
 util-obj-y += qdist.o
 util-obj-y += qht.o
 util-obj-y += range.o
-- 
2.7.4




[Qemu-devel] [PATCH v3 40/43] translate-all: use qemu_protect_rwx/none helpers

2017-07-19 Thread Emilio G. Cota
The helpers require the address and size to be page-aligned, so
do that before calling them.
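
As a rough illustration of the alignment step (a sketch under stated
assumptions, not the code in the diff): given an arbitrary buffer, round its
start up and its end down to page boundaries, then set aside the last page as
a guard. The sketch_* names and the struct are hypothetical; the page size is
passed in explicitly and is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_region {
    uintptr_t start;  /* first usable byte, page-aligned */
    size_t size;      /* usable bytes, a multiple of the page size */
    uintptr_t guard;  /* start of the trailing guard page */
};

static struct sketch_region
sketch_carve(uintptr_t buf, size_t len, size_t page)
{
    struct sketch_region r;
    uintptr_t start, end;

    assert(page != 0 && (page & (page - 1)) == 0);
    start = (buf + page - 1) & ~(uintptr_t)(page - 1);  /* align up */
    end = (buf + len) & ~(uintptr_t)(page - 1);         /* align down */
    /* Need at least one usable page plus the guard page. */
    assert(end >= start + 2 * page);

    r.start = start;
    r.size = (size_t)(end - start) - page;
    r.guard = start + r.size;
    return r;
}
```

In terms of the patch, [start, start + size) would be the range handed to
qemu_mprotect_rwx() and the guard page the one handed to
qemu_mprotect_none(); both satisfy the helpers' alignment assertions.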

Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 61 ++-
 1 file changed, 13 insertions(+), 48 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 36b17ac..e930bac 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -604,63 +604,24 @@ static inline void *split_cross_256mb(void *buf1, size_t size1)
 static uint8_t static_code_gen_buffer[DEFAULT_CODE_GEN_BUFFER_SIZE]
 __attribute__((aligned(CODE_GEN_ALIGN)));
 
-# ifdef _WIN32
-static inline void do_protect(void *addr, long size, int prot)
-{
-DWORD old_protect;
-VirtualProtect(addr, size, prot, &old_protect);
-}
-
-static inline void map_exec(void *addr, long size)
-{
-do_protect(addr, size, PAGE_EXECUTE_READWRITE);
-}
-
-static inline void map_none(void *addr, long size)
-{
-do_protect(addr, size, PAGE_NOACCESS);
-}
-# else
-static inline void do_protect(void *addr, long size, int prot)
-{
-uintptr_t start, end;
-
-start = (uintptr_t)addr;
-start &= qemu_real_host_page_mask;
-
-end = (uintptr_t)addr + size;
-end = ROUND_UP(end, qemu_real_host_page_size);
-
-mprotect((void *)start, end - start, prot);
-}
-
-static inline void map_exec(void *addr, long size)
-{
-do_protect(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
-}
-
-static inline void map_none(void *addr, long size)
-{
-do_protect(addr, size, PROT_NONE);
-}
-# endif /* WIN32 */
-
 static inline void *alloc_code_gen_buffer(void)
 {
 void *buf = static_code_gen_buffer;
+void *end = static_code_gen_buffer + sizeof(static_code_gen_buffer);
 size_t full_size, size;
 
-/* The size of the buffer, rounded down to end on a page boundary.  */
-full_size = (((uintptr_t)buf + sizeof(static_code_gen_buffer))
- & qemu_real_host_page_mask) - (uintptr_t)buf;
+/* page-align the beginning and end of the buffer */
+buf = QEMU_ALIGN_PTR_UP(buf, qemu_real_host_page_size);
+end = QEMU_ALIGN_PTR_DOWN(end, qemu_real_host_page_size);
 
 /* Reserve a guard page.  */
+full_size = end - buf;
 size = full_size - qemu_real_host_page_size;
 
 /* Honor a command-line option limiting the size of the buffer.  */
 if (size > tcg_ctx->code_gen_buffer_size) {
-size = (((uintptr_t)buf + tcg_ctx->code_gen_buffer_size)
-& qemu_real_host_page_mask) - (uintptr_t)buf;
+size = QEMU_ALIGN_DOWN(tcg_ctx->code_gen_buffer_size,
+   qemu_real_host_page_size);
 }
 tcg_ctx->code_gen_buffer_size = size;
 
@@ -671,8 +632,12 @@ static inline void *alloc_code_gen_buffer(void)
 }
 #endif
 
-map_exec(buf, size);
-map_none(buf + size, qemu_real_host_page_size);
+if (qemu_mprotect_rwx(buf, size)) {
+abort();
+}
+if (qemu_mprotect_none(buf + size, qemu_real_host_page_size)) {
+abort();
+}
 qemu_madvise(buf, size, QEMU_MADV_HUGEPAGE);
 
 return buf;
-- 
2.7.4




[Qemu-devel] [PATCH v3 34/43] gen-icount: fold exitreq_label into TCGContext

2017-07-19 Thread Emilio G. Cota
Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/exec/gen-icount.h | 7 +++
 tcg/tcg.h | 2 ++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index c58b0b2..fe80176 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -6,13 +6,12 @@
 /* Helpers for instruction counting code generation.  */
 
 static int icount_start_insn_idx;
-static TCGLabel *exitreq_label;
 
 static inline void gen_tb_start(TranslationBlock *tb)
 {
 TCGv_i32 count, imm;
 
-exitreq_label = gen_new_label();
+tcg_ctx->exitreq_label = gen_new_label();
 if (tb_cflags(tb) & CF_USE_ICOUNT) {
 count = tcg_temp_local_new_i32();
 } else {
@@ -34,7 +33,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 tcg_temp_free_i32(imm);
 }
 
-tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, exitreq_label);
+tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
 
 if (tb_cflags(tb) & CF_USE_ICOUNT) {
 tcg_gen_st16_i32(count, tcg_ctx->tcg_env,
@@ -52,7 +51,7 @@ static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
 tcg_set_insn_param(icount_start_insn_idx, 1, num_insns);
 }
 
-gen_set_label(exitreq_label);
+gen_set_label(tcg_ctx->exitreq_label);
 tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_REQUESTED);
 
 /* Terminate the linked list.  */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index c88746d..f83f9b0 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -712,6 +712,8 @@ struct TCGContext {
 /* The TCGBackendData structure is private to tcg-target.inc.c.  */
 struct TCGBackendData *be;
 
+TCGLabel *exitreq_label;
+
 TCGTempSet free_temps[TCG_TYPE_COUNT * 2];
 TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
 
-- 
2.7.4




[Qemu-devel] [PATCH v3 31/43] tcg: take tb_ctx out of TCGContext

2017-07-19 Thread Emilio G. Cota
Groundwork for supporting multiple TCG contexts.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/exec/tb-context.h |  2 ++
 tcg/tcg.h |  2 --
 accel/tcg/cpu-exec.c  |  2 +-
 accel/tcg/translate-all.c | 57 +++
 linux-user/main.c |  6 ++---
 5 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index 1fa8dcc..1d41202 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -41,4 +41,6 @@ struct TBContext {
 int tb_phys_invalidate_count;
 };
 
+extern TBContext tb_ctx;
+
 #endif
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 9b6dade..22f7ecd 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -707,8 +707,6 @@ struct TCGContext {
 /* Threshold to flush the translated code buffer.  */
 void *code_gen_highwater;
 
-TBContext tb_ctx;
-
 /* Track which vCPU triggers events */
 CPUState *cpu;  /* *_trans */
 TCGv_env tcg_env;   /* *_exec  */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 1963bda..f42096a 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -325,7 +325,7 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
 phys_pc = get_page_addr_code(desc.env, pc);
 desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
 h = tb_hash_func(phys_pc, pc, flags, cf_mask, *cpu->trace_dstate);
-return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
+return qht_lookup(&tb_ctx.htable, tb_cmp, &desc, h);
 }
 
 static inline TranslationBlock *tb_find(CPUState *cpu,
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index d50e2b9..5509407 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -154,6 +154,7 @@ static void *l1_map[V_L1_MAX_SIZE];
 
 /* code generation context */
 TCGContext tcg_ctx;
+TBContext tb_ctx;
 bool parallel_cpus;
 
 /* translation block context */
@@ -185,7 +186,7 @@ static void page_table_config_init(void)
 void tb_lock(void)
 {
 assert_tb_unlocked();
-qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
+qemu_mutex_lock(&tb_ctx.tb_lock);
 have_tb_lock++;
 }
 
@@ -193,13 +194,13 @@ void tb_unlock(void)
 {
 assert_tb_locked();
 have_tb_lock--;
-qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+qemu_mutex_unlock(&tb_ctx.tb_lock);
 }
 
 void tb_lock_reset(void)
 {
 if (have_tb_lock) {
-qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
+qemu_mutex_unlock(&tb_ctx.tb_lock);
 have_tb_lock = 0;
 }
 }
@@ -826,15 +827,15 @@ static inline void code_gen_alloc(size_t tb_size)
 fprintf(stderr, "Could not allocate dynamic translator buffer\n");
 exit(1);
 }
-tcg_ctx.tb_ctx.tb_tree = g_tree_new(tb_tc_cmp);
-qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
+tb_ctx.tb_tree = g_tree_new(tb_tc_cmp);
+qemu_mutex_init(&tb_ctx.tb_lock);
 }
 
 static void tb_htable_init(void)
 {
 unsigned int mode = QHT_MODE_AUTO_RESIZE;
 
-qht_init(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
+qht_init(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
 }
 
 /* Must be called before using the QEMU cpus. 'tb_size' is the size
@@ -878,7 +879,7 @@ void tb_remove(TranslationBlock *tb)
 {
 assert_tb_locked();
 
-g_tree_remove(tcg_ctx.tb_ctx.tb_tree, &tb->tc);
+g_tree_remove(tb_ctx.tb_tree, &tb->tc);
 }
 
 static inline void invalidate_page_bitmap(PageDesc *p)
@@ -940,15 +941,15 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 /* If it is already been done on request of another CPU,
  * just retry.
  */
-if (tcg_ctx.tb_ctx.tb_flush_count != tb_flush_count.host_int) {
+if (tb_ctx.tb_flush_count != tb_flush_count.host_int) {
 goto done;
 }
 
 if (DEBUG_TB_FLUSH_GATE) {
-size_t nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
+size_t nb_tbs = g_tree_nnodes(tb_ctx.tb_tree);
 size_t host_size = 0;
 
-g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_host_size_iter, &host_size);
+g_tree_foreach(tb_ctx.tb_tree, tb_host_size_iter, &host_size);
 printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%zu\n",
tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer, nb_tbs,
nb_tbs > 0 ? host_size / nb_tbs : 0);
@@ -963,17 +964,16 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 }
 
 /* Increment the refcount first so that destroy acts as a reset */
-g_tree_ref(tcg_ctx.tb_ctx.tb_tree);
-g_tree_destroy(tcg_ctx.tb_ctx.tb_tree);
+g_tree_ref(tb_ctx.tb_tree);
+g_tree_destroy(tb_ctx.tb_tree);
 
-qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
+qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
 page_flush_tb();
 
 tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
 /* XXX: flush processor icache at this point if cache flush is
  

[Qemu-devel] [PATCH v3 39/43] osdep: introduce qemu_mprotect_rwx/none

2017-07-19 Thread Emilio G. Cota
Signed-off-by: Emilio G. Cota 
---
 include/qemu/osdep.h |  2 ++
 util/osdep.c | 41 +
 2 files changed, 43 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 0cba871..2c7d7db 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -348,6 +348,8 @@ void sigaction_invoke(struct sigaction *action,
 #endif
 
 int qemu_madvise(void *addr, size_t len, int advice);
+int qemu_mprotect_rwx(void *addr, size_t size);
+int qemu_mprotect_none(void *addr, size_t size);
 
 int qemu_open(const char *name, int flags, ...);
 int qemu_close(int fd);
diff --git a/util/osdep.c b/util/osdep.c
index a2863c8..f72d679 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -81,6 +81,47 @@ int qemu_madvise(void *addr, size_t len, int advice)
 #endif
 }
 
+static int qemu_mprotect__osdep(void *addr, size_t size, int prot)
+{
+g_assert(!((uintptr_t)addr & ~qemu_real_host_page_mask));
+g_assert(!(size & ~qemu_real_host_page_mask));
+
+#ifdef _WIN32
+DWORD old_protect;
+
+if (!VirtualProtect(addr, size, prot, &old_protect)) {
+error_report("%s: VirtualProtect failed with error code %d",
+ __func__, GetLastError());
+return -1;
+}
+return 0;
+#else
+if (mprotect(addr, size, prot)) {
+error_report("%s: mprotect failed: %s", __func__, strerror(errno));
+return -1;
+}
+return 0;
+#endif
+}
+
+int qemu_mprotect_rwx(void *addr, size_t size)
+{
+#ifdef _WIN32
+return qemu_mprotect__osdep(addr, size, PAGE_EXECUTE_READWRITE);
+#else
+return qemu_mprotect__osdep(addr, size, PROT_READ | PROT_WRITE | PROT_EXEC);
+#endif
+}
+
+int qemu_mprotect_none(void *addr, size_t size)
+{
+#ifdef _WIN32
+return qemu_mprotect__osdep(addr, size, PAGE_NOACCESS);
+#else
+return qemu_mprotect__osdep(addr, size, PROT_NONE);
+#endif
+}
+
 #ifndef _WIN32
 /*
  * Dups an fd and sets the flags
-- 
2.7.4




[Qemu-devel] [PATCH v3 23/43] exec-all: introduce TB_PAGE_ADDR_FMT

2017-07-19 Thread Emilio G. Cota
And fix the following warning when DEBUG_TB_INVALIDATE is enabled
in translate-all.c:

  CC  mipsn32-linux-user/accel/tcg/translate-all.o
/data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’:
/data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=]
 printf("protecting code page: 0x" TARGET_FMT_lx "\n",
^
cc1: all warnings being treated as errors
/data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed
make[1]: *** [accel/tcg/translate-all.o] Error 1
Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed
make: *** [subdir-mipsn32-linux-user] Error 2
cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$
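
The general pattern behind the fix is to define a printf format macro right
next to each typedef, so the two cannot drift apart. A minimal sketch of that
pattern follows; the sketch_* names and the choice of uint32_t are
assumptions for illustration, not QEMU's definitions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical address type with its format macro defined alongside it. */
typedef uint32_t sketch_page_addr_t;
#define SKETCH_PAGE_ADDR_FMT "%" PRIx32

/* Formats the debug message into buf and returns snprintf()'s result. */
static int sketch_format_page(char *buf, size_t len, sketch_page_addr_t addr)
{
    assert(buf != NULL);
    return snprintf(buf, len,
                    "protecting code page: 0x" SKETCH_PAGE_ADDR_FMT "\n",
                    addr);
}
```

If the typedef later changes width, only the adjacent macro needs updating,
and -Werror=format keeps every call site honest.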

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   | 2 ++
 accel/tcg/translate-all.c | 3 +--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 0af0485..00f7da8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -31,8 +31,10 @@
type.  */
 #if defined(CONFIG_USER_ONLY)
 typedef abi_ulong tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT TARGET_ABI_FMT_lx
 #else
 typedef ram_addr_t tb_page_addr_t;
+#define TB_PAGE_ADDR_FMT RAM_ADDR_FMT
 #endif
 
 /* DisasContext is_jmp field values
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c1cd258..c4c23f9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1194,8 +1194,7 @@ static inline void tb_alloc_page(TranslationBlock *tb,
 mprotect(g2h(page_addr), qemu_host_page_size,
  (prot & PAGE_BITS) & ~PAGE_WRITE);
 #ifdef DEBUG_TB_INVALIDATE
-printf("protecting code page: 0x" TARGET_FMT_lx "\n",
-   page_addr);
+printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
 #endif
 }
 #else
-- 
2.7.4




[Qemu-devel] [PATCH v3 26/43] exec-all: extract tb->tc_* into a separate struct tc_tb

2017-07-19 Thread Emilio G. Cota
In preparation for adding tc.size, which will let us keep track of
TBs using the binary search tree implementation from glib.

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   | 20 ++--
 accel/tcg/cpu-exec.c  |  6 +++---
 accel/tcg/translate-all.c | 20 ++--
 tcg/tcg-runtime.c |  4 ++--
 tcg/tcg.c |  4 ++--
 5 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 00f7da8..bc4f41c 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -341,6 +341,14 @@ static inline void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 #define USE_DIRECT_JUMP
 #endif
 
+/*
+ * Translation Cache-related fields of a TB.
+ */
+struct tb_tc {
+void *ptr;/* pointer to the translated code */
+uint8_t *search;  /* pointer to search data */
+};
+
 struct TranslationBlock {
 target_ulong pc;   /* simulated PC corresponding to this block (EIP + CS base) */
 target_ulong cs_base; /* CS base for this block */
@@ -362,8 +370,8 @@ struct TranslationBlock {
 /* Per-vCPU dynamic tracing state used to generate this TB */
 uint32_t trace_vcpu_dstate;
 
-void *tc_ptr;/* pointer to the translated code */
-uint8_t *tc_search;  /* pointer to search data */
+struct tb_tc tc;
+
 /* original tb when cflags has CF_NOCACHE */
 struct TranslationBlock *orig_tb;
 /* first and second physical page containing code. The lower bit
@@ -462,7 +470,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
  int n, uintptr_t addr)
 {
 uint16_t offset = tb->jmp_insn_offset[n];
-tb_set_jmp_target1((uintptr_t)(tb->tc_ptr + offset), addr);
+tb_set_jmp_target1((uintptr_t)(tb->tc.ptr + offset), addr);
 }
 
 #else
@@ -489,11 +497,11 @@ static inline void tb_add_jump(TranslationBlock *tb, int n,
 qemu_log_mask_and_addr(CPU_LOG_EXEC, tb->pc,
"Linking TBs %p [" TARGET_FMT_lx
"] index %d -> %p [" TARGET_FMT_lx "]\n",
-   tb->tc_ptr, tb->pc, n,
-   tb_next->tc_ptr, tb_next->pc);
+   tb->tc.ptr, tb->pc, n,
+   tb_next->tc.ptr, tb_next->pc);
 
 /* patch the native jump address */
-tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc_ptr);
+tb_set_jmp_target(tb, n, (uintptr_t)tb_next->tc.ptr);
 
 /* add in TB jmp circular list */
 tb->jmp_list_next[n] = tb_next->jmp_list_first;
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 526cab3..cb1e6d3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -143,11 +143,11 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
 uintptr_t ret;
 TranslationBlock *last_tb;
 int tb_exit;
-uint8_t *tb_ptr = itb->tc_ptr;
+uint8_t *tb_ptr = itb->tc.ptr;
 
 qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
"Trace %p [%d: " TARGET_FMT_lx "] %s\n",
-   itb->tc_ptr, cpu->cpu_index, itb->pc,
+   itb->tc.ptr, cpu->cpu_index, itb->pc,
lookup_symbol(itb->pc));
 
 #if defined(DEBUG_DISAS)
@@ -179,7 +179,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
 qemu_log_mask_and_addr(CPU_LOG_EXEC, last_tb->pc,
"Stopped execution of TB chain before %p ["
TARGET_FMT_lx "] %s\n",
-   last_tb->tc_ptr, last_tb->pc,
+   last_tb->tc.ptr, last_tb->pc,
lookup_symbol(last_tb->pc));
 if (cc->synchronize_from_tb) {
 cc->synchronize_from_tb(cpu, last_tb);
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 845585b..0a2eb86 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -260,7 +260,7 @@ static target_long decode_sleb128(uint8_t **pp)
which comes from the host pc of the end of the code implementing the insn.
 
Each line of the table is encoded as sleb128 deltas from the previous
-   line.  The seed for the first line is { tb->pc, 0..., tb->tc_ptr }.
+   line.  The seed for the first line is { tb->pc, 0..., tb->tc.ptr }.
That is, the first column is seeded with the guest pc, the last column
with the host pc, and the middle columns with zeros.  */
 
@@ -270,7 +270,7 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
 uint8_t *p = block;
 int i, j, n;
 
-tb->tc_search = block;
+tb->tc.search = block;
 
 for (i = 0, n = tb->icount; i < n; ++i) {
 target_ulong prev;
@@ -305,9 +305,9 @@ static int cpu_restore_state_from_tb(CPUState *cpu, 
TranslationBlock *tb,

[Qemu-devel] [PATCH v3 09/43] tcg: consolidate TB lookups in tb_lookup__cpu_state

2017-07-19 Thread Emilio G. Cota
This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

   (A) Lookup succeeds for TB in hash without tb_lock
        (B) Sets the TB's tb->invalid flag
        (B) Removes the TB from tb_htable
        (B) Clears all CPU's tb_jmp_cache
   (A) Store TB into local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache, so that in the above scenario, next time we try to find
that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
up in tb_htable.
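
The check can be sketched as follows. This is a simplified model using C11
atomics, not the code in this patch: the sketch_* names, the cache size and
the hash are hypothetical, and the invalid state is modelled as a bit in
cflags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CF_INVALID 0x8u
#define SKETCH_CACHE_SIZE 16u

struct sketch_tb {
    uint64_t pc;
    _Atomic uint32_t cflags;
};

/* Per-vCPU cache of recently used TBs, indexed by a hash of the pc. */
static struct sketch_tb *_Atomic sketch_jmp_cache[SKETCH_CACHE_SIZE];

static unsigned int sketch_hash(uint64_t pc)
{
    return (unsigned int)(pc % SKETCH_CACHE_SIZE);
}

static void sketch_cache_store(struct sketch_tb *tb)
{
    assert(tb != NULL);
    atomic_store(&sketch_jmp_cache[sketch_hash(tb->pc)], tb);
}

/* What thread (B) does first when it invalidates a TB. */
static void sketch_invalidate(struct sketch_tb *tb)
{
    atomic_fetch_or(&tb->cflags, SKETCH_CF_INVALID);
}

/*
 * Fast-path lookup. Returning NULL means "go to the hash table": a stale
 * entry stored after the cache was cleared is rejected here because its
 * invalid bit is re-checked on every hit.
 */
static struct sketch_tb *sketch_cache_lookup(uint64_t pc)
{
    struct sketch_tb *tb = atomic_load(&sketch_jmp_cache[sketch_hash(pc)]);

    if (tb != NULL && tb->pc == pc &&
        !(atomic_load(&tb->cflags) & SKETCH_CF_INVALID)) {
        return tb;
    }
    return NULL;
}
```

The extra load on the fast path is the price for not needing tb_lock there.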

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

 Performance counter stats for 'taskset -c 0 qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::-:22 \
-device virtio-net-device,netdev=unet \
-drive file=jessie.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

Before:
      18714.917392 task-clock                #    0.952 CPUs utilized            ( +-  0.95% )
            23,142 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            10,558 page-faults               #    0.001 M/sec                    ( +-  0.95% )
    53,957,727,252 cycles                    #    2.883 GHz                      ( +-  0.91% ) [83.33%]
    24,440,599,852 stalled-cycles-frontend   #   45.30% frontend cycles idle     ( +-  1.20% ) [83.33%]
    16,495,714,424 stalled-cycles-backend    #   30.57% backend  cycles idle     ( +-  0.95% ) [66.66%]
    76,267,572,582 instructions              #    1.41  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  0.87% ) [83.34%]
    12,692,186,323 branches                  #  678.186 M/sec                    ( +-  0.92% ) [83.35%]
       263,486,879 branch-misses             #    2.08% of all branches          ( +-  0.73% ) [83.34%]

      19.648474449 seconds time elapsed                                          ( +-  0.82% )

After, w/ inline (this patch):
      18471.376627 task-clock                #    0.955 CPUs utilized            ( +-  0.96% )
            23,048 context-switches          #    0.001 M/sec                    ( +-  0.48% )
                 1 CPU-migrations            #    0.000 M/sec
            10,708 page-faults               #    0.001 M/sec                    ( +-  0.81% )
    53,208,990,796 cycles                    #    2.881 GHz                      ( +-  0.98% ) [83.34%]
    23,941,071,673 stalled-cycles-frontend   #   44.99% frontend cycles idle     ( +-  0.95% ) [83.34%]
    16,161,773,848 stalled-cycles-backend    #   30.37% backend  cycles idle     ( +-  0.76% ) [66.67%]
    75,786,269,766 instructions              #    1.42  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
    12,573,617,143 branches                  #  680.708 M/sec                    ( +-  1.34% ) [83.33%]
       260,235,550 branch-misses             #    2.07% of all branches          ( +-  0.66% ) [83.33%]

      19.340502161 seconds time elapsed                                          ( +-  0.56% )

After, w/o inline:
      18791.253967 task-clock                #    0.954 CPUs utilized            ( +-  0.78% )
            23,230 context-switches          #    0.001 M/sec                    ( +-  0.42% )
                 1 CPU-migrations            #    0.000 M/sec
            10,563 page-faults               #    0.001 M/sec                    ( +-  1.27% )
    54,168,674,622 cycles                    #    2.883 GHz                      ( +-  0.80% ) [83.34%]
    24,244,712,629 stalled-cycles-frontend   #   44.76% frontend cycles idle     ( +-  1.37% ) [83.33%]
    16,288,648,572 stalled-cycles-backend    #   30.07% backend  cycles idle     ( +-  0.95% ) [66.66%]
    77,659,755,503 instructions              #    1.43  insns per cycle
                                             #    0.31  stalled cycles per insn  ( +-  0.97% ) [83.34%]
    12,922,780,045 branches                  #  687.702 M/sec                    ( +-  1.06% ) [83.34%]
       261,962,386 branch-misses             #    2.03% of all branches          ( +-  0.71% ) [83.35%]

      19.700174670 seconds time elapsed                                          ( +-  0.56% )

Reviewed-by: Richard Henderson 

[Qemu-devel] [PATCH v3 25/43] translate-all: define and use DEBUG_TB_CHECK_GATE

2017-07-19 Thread Emilio G. Cota
This prevents bit rot by ensuring the debug code is compiled when
building a user-mode target.

Unfortunately the helpers are user-mode-only so we cannot fully
get rid of the ifdef checks. Add a comment to explain this.
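
For readers unfamiliar with the "gate" idiom, a minimal self-contained sketch
follows. The SKETCH_* and sketch_* names are hypothetical; only the shape of
the macro pair matches what this patch adds.

```c
#include <assert.h>

/* Uncomment to enable the checks at build time. */
/* #define SKETCH_DEBUG_CHECK */

#ifdef SKETCH_DEBUG_CHECK
#define SKETCH_DEBUG_CHECK_GATE 1
#else
#define SKETCH_DEBUG_CHECK_GATE 0
#endif

static int sketch_checks_run;

/* Always compiled, so it is parsed and type-checked on every build. */
static void sketch_check(int value)
{
    assert(value >= 0);
    sketch_checks_run++;
}

static int sketch_do_work(int value)
{
    /* A constant-false condition lets the optimizer drop the call. */
    if (SKETCH_DEBUG_CHECK_GATE) {
        sketch_check(value);
    }
    return value * 2;
}
```

Unlike an #ifdef around the call, a signature change to the debug helper now
breaks the build immediately rather than the next time someone enables the
debug macro.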

Suggested-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 962e9b3..845585b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -82,6 +82,12 @@
 #undef DEBUG_TB_CHECK
 #endif
 
+#ifdef DEBUG_TB_CHECK
+#define DEBUG_TB_CHECK_GATE 1
+#else
+#define DEBUG_TB_CHECK_GATE 0
+#endif
+
 /* Access to the various translations structures need to be serialised via locks
  * for consistency. This is automatic for SoftMMU based system
  * emulation due to its single threaded nature. In user-mode emulation
@@ -950,7 +956,13 @@ void tb_flush(CPUState *cpu)
 }
 }
 
-#ifdef DEBUG_TB_CHECK
+/*
+ * Formerly ifdef DEBUG_TB_CHECK. These debug functions are user-mode-only,
+ * so in order to prevent bit rot we compile them unconditionally in user-mode,
+ * and let the optimizer get rid of them by wrapping their user-only callers
+ * with if (DEBUG_TB_CHECK_GATE).
+ */
+#ifdef CONFIG_USER_ONLY
 
 static void
 do_tb_invalidate_check(struct qht *ht, void *p, uint32_t hash, void *userp)
@@ -994,7 +1006,7 @@ static void tb_page_check(void)
 qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_page_check, NULL);
 }
 
-#endif
+#endif /* CONFIG_USER_ONLY */
 
 static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlock *tb)
 {
@@ -1238,8 +1250,10 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
  tb->trace_vcpu_dstate);
 qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
 
-#ifdef DEBUG_TB_CHECK
-tb_page_check();
+#ifdef CONFIG_USER_ONLY
+if (DEBUG_TB_CHECK_GATE) {
+tb_page_check();
+}
 #endif
 }
 
@@ -2209,8 +2223,10 @@ int page_unprotect(target_ulong address, uintptr_t pc)
 /* and since the content will be modified, we must invalidate
the corresponding translated code. */
 current_tb_invalidated |= tb_invalidate_phys_page(addr, pc);
-#ifdef DEBUG_TB_CHECK
-tb_invalidate_check(addr);
+#ifdef CONFIG_USER_ONLY
+if (DEBUG_TB_CHECK_GATE) {
+tb_invalidate_check(addr);
+}
 #endif
 }
 mprotect((void *)g2h(host_start), qemu_host_page_size,
-- 
2.7.4




[Qemu-devel] [PATCH v3 36/43] tcg: introduce **tcg_ctxs to keep track of all TCGContext's

2017-07-19 Thread Emilio G. Cota
Groundwork for supporting multiple TCG contexts.

Note that having n_tcg_ctxs is unnecessary. However, it is
convenient to have it, since it will simplify iterating over the
array: we'll have just a for loop instead of having to iterate
over a NULL-terminated array (which would require n+1 elems)
or having to check with ifdef's for usermode/softmmu.
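
The trade-off between the two iteration styles can be sketched as follows. This is an illustration only: `struct ctx` and both helper functions are hypothetical stand-ins, not the TCG data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread context. */
struct ctx {
    size_t code_size;
};

/* Counted array: needs exactly n elements and a plain for loop. */
static size_t total_counted(struct ctx *const *ctxs, unsigned int n)
{
    size_t total = 0;

    for (unsigned int i = 0; i < n; i++) {
        total += ctxs[i]->code_size;
    }
    return total;
}

/* NULL-terminated array: no count to carry around, but the array
 * must hold n + 1 elements to make room for the terminator. */
static size_t total_terminated(struct ctx *const *ctxs)
{
    size_t total = 0;

    assert(ctxs != NULL);
    for (; *ctxs != NULL; ctxs++) {
        total += (*ctxs)->code_size;
    }
    return total;
}
```

Both functions compute the same sum; the counted variant is the shape the commit message prefers because the loop bound is explicit.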

Signed-off-by: Emilio G. Cota 
---
 tcg/tcg.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index f907c47..2217314 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,6 +115,8 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 static void tcg_out_tb_init(TCGContext *s);
 static bool tcg_out_tb_finalize(TCGContext *s);
 
+static TCGContext **tcg_ctxs;
+static unsigned int n_tcg_ctxs;
 
 static TCGRegSet tcg_target_available_regs[2];
 static TCGRegSet tcg_target_call_clobber_regs;
@@ -382,6 +384,8 @@ void tcg_context_init(TCGContext *s)
 }
 
 tcg_ctx = s;
+tcg_ctxs = &tcg_ctx;
+n_tcg_ctxs = 1;
 }
 
 /*
-- 
2.7.4




[Qemu-devel] [PATCH v3 28/43] exec-all: rename tb_free to tb_remove

2017-07-19 Thread Emilio G. Cota
We don't really free anything in this function anymore; we just remove
the TB from the binary search tree.

Suggested-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   | 2 +-
 accel/tcg/cpu-exec.c  | 2 +-
 accel/tcg/translate-all.c | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index eb3eb7b..7bc2050 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -428,7 +428,7 @@ static inline uint32_t curr_cflags(void)
 return parallel_cpus ? CF_PARALLEL : 0;
 }
 
-void tb_free(TranslationBlock *tb);
+void tb_remove(TranslationBlock *tb);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index cb1e6d3..1963bda 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -218,7 +218,7 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 
 tb_lock();
 tb_phys_invalidate(tb, -1);
-tb_free(tb);
+tb_remove(tb);
 tb_unlock();
 }
 #endif
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index cb71aef..448f13b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -375,7 +375,7 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
 if (tb->cflags & CF_NOCACHE) {
 /* one-shot translation, invalidate it immediately */
 tb_phys_invalidate(tb, -1);
-tb_free(tb);
+tb_remove(tb);
 }
 r = true;
 }
@@ -874,7 +874,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 }
 
 /* Called with tb_lock held.  */
-void tb_free(TranslationBlock *tb)
+void tb_remove(TranslationBlock *tb)
 {
 assert_tb_locked();
 
@@ -1809,7 +1809,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
  * cpu_exec_nocache() */
 tb_phys_invalidate(tb->orig_tb, -1);
 }
-tb_free(tb);
+tb_remove(tb);
 }
 /* FIXME: In theory this could raise an exception.  In practice
we have already translated the block once so it's probably ok.  */
-- 
2.7.4




[Qemu-devel] [PATCH v3 29/43] translate-all: report correct avg host TB size

2017-07-19 Thread Emilio G. Cota
Since commit 6e3b2bfd6 ("tcg: allocate TB structs before the
corresponding translated code") we are not fully utilizing
code_gen_buffer for translated code, and therefore are
incorrectly reporting the amount of translated code as well as
the average host TB size. Address this by:

- Making the conscious choice of misreporting the total translated code;
  doing otherwise would mislead users into thinking "-tb-size" is not
  honoured.

- Expanding tb_tree_stats to accurately count the bytes of translated code on
  the host, and using this for reporting the average tb host size,
  as well as the expansion ratio.

In the future we might want to consider reporting the accurate numbers for
the total translated code, together with a "bookkeeping/overhead" field to
account for the TB structs.
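
A rough sketch of the accounting described in the second bullet, assuming a simplified TB that carries only a guest size and a host size. `struct tb`, `struct tb_stats` and the three functions are hypothetical names for this example; the real code walks a GTree rather than an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified TB: only the two sizes that matter here. */
struct tb {
    size_t target_size; /* bytes of guest code translated */
    size_t host_size;   /* bytes of host code generated */
};

struct tb_stats {
    size_t nb_tbs;
    size_t target_size;
    size_t host_size;
};

/* Sum the per-TB sizes instead of relying on the buffer's fill level,
 * which also includes TB structs and padding. */
static struct tb_stats collect(const struct tb *tbs, size_t n)
{
    struct tb_stats st = { .nb_tbs = n };

    assert(tbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        st.target_size += tbs[i].target_size;
        st.host_size += tbs[i].host_size;
    }
    return st;
}

/* Average host bytes per TB; 0 when there are no TBs. */
static size_t avg_host_size(const struct tb_stats *st)
{
    return st->nb_tbs ? st->host_size / st->nb_tbs : 0;
}

/* Host bytes emitted per guest byte; 0 when nothing was translated. */
static double expansion_ratio(const struct tb_stats *st)
{
    return st->target_size ? (double)st->host_size / st->target_size : 0;
}
```

Both divisions guard against an empty set of TBs, matching the `nb_tbs ? ... : 0` pattern used in the patch.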

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 448f13b..d50e2b9 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -923,6 +923,15 @@ static void page_flush_tb(void)
 }
 }
 
+static gboolean tb_host_size_iter(gpointer key, gpointer value, gpointer data)
+{
+const TranslationBlock *tb = value;
+size_t *size = data;
+
+*size += tb->tc.size;
+return false;
+}
+
 /* flush all the translation blocks */
 static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 {
@@ -937,11 +946,12 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 
 if (DEBUG_TB_FLUSH_GATE) {
 size_t nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
+size_t host_size = 0;
 
-printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%td\n",
+g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_host_size_iter, &host_size);
+printf("qemu: flush code_size=%td nb_tbs=%zu avg_tb_size=%zu\n",
tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer, nb_tbs,
-   nb_tbs > 0 ?
-   (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) / nb_tbs : 0);
+   nb_tbs > 0 ? host_size / nb_tbs : 0);
 }
 if ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
 > tcg_ctx.code_gen_buffer_size) {
@@ -1883,6 +1893,7 @@ static void print_qht_statistics(FILE *f, fprintf_function cpu_fprintf,
 }
 
 struct tb_tree_stats {
+size_t host_size;
 size_t target_size;
 size_t max_target_size;
 size_t direct_jmp_count;
@@ -1895,6 +1906,7 @@ static gboolean tb_tree_stats_iter(gpointer key, gpointer value, gpointer data)
 const TranslationBlock *tb = value;
 struct tb_tree_stats *tst = data;
 
+tst->host_size += tb->tc.size;
 tst->target_size += tb->size;
 if (tb->size > tst->max_target_size) {
 tst->max_target_size = tb->size;
@@ -1923,6 +1935,11 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 g_tree_foreach(tcg_ctx.tb_ctx.tb_tree, tb_tree_stats_iter, &tst);
 /* XXX: avoid using doubles ? */
 cpu_fprintf(f, "Translation buffer state:\n");
+/*
+ * Report total code size including the padding and TB structs;
+ * otherwise users might think "-tb-size" is not honoured.
+ * For avg host size we use the precise numbers from tb_tree_stats though.
+ */
 cpu_fprintf(f, "gen code size       %td/%zd\n",
 tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
 tcg_ctx.code_gen_highwater - tcg_ctx.code_gen_buffer);
@@ -1930,12 +1947,9 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 cpu_fprintf(f, "TB avg target size  %zu max=%zu bytes\n",
 nb_tbs ? tst.target_size / nb_tbs : 0,
 tst.max_target_size);
-cpu_fprintf(f, "TB avg host size    %td bytes (expansion ratio: %0.1f)\n",
-nb_tbs ? (tcg_ctx.code_gen_ptr -
-  tcg_ctx.code_gen_buffer) / nb_tbs : 0,
-tst.target_size ? (double) (tcg_ctx.code_gen_ptr -
-tcg_ctx.code_gen_buffer) /
-tst.target_size : 0);
+cpu_fprintf(f, "TB avg host size    %zu bytes (expansion ratio: %0.1f)\n",
+nb_tbs ? tst.host_size / nb_tbs : 0,
+tst.target_size ? (double)tst.host_size / tst.target_size : 0);
 cpu_fprintf(f, "cross page TB count %zu (%zu%%)\n", tst.cross_page,
 nb_tbs ? (tst.cross_page * 100) / nb_tbs : 0);
 cpu_fprintf(f, "direct jump count   %zu (%zu%%) (2 jumps=%zu %zu%%)\n",
-- 
2.7.4




[Qemu-devel] [PATCH v3 18/43] target/sh4: check CF_PARALLEL instead of parallel_cpus

2017-07-19 Thread Emilio G. Cota
Thereby decoupling the resulting translated code from the current state
of the system.

Signed-off-by: Emilio G. Cota 
---
 target/sh4/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 9fcaefd..52fabb3 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -528,7 +528,7 @@ static void _decode_opc(DisasContext * ctx)
 /* Detect the start of a gUSA region.  If so, update envflags
and end the TB.  This will allow us to see the end of the
region (stored in R0) in the next TB.  */
-if (B11_8 == 15 && B7_0s < 0 && parallel_cpus) {
+if (B11_8 == 15 && B7_0s < 0 && (tb_cflags(ctx->tb) & CF_PARALLEL)) {
 ctx->envflags = deposit32(ctx->envflags, GUSA_SHIFT, 8, B7_0s);
 ctx->bstate = BS_STOP;
 }
-- 
2.7.4




[Qemu-devel] [PATCH v3 21/43] cpu-exec: lookup/generate TB outside exclusive region during step_atomic

2017-07-19 Thread Emilio G. Cota
Now that all code generation has been converted to check CF_PARALLEL, we can
generate !CF_PARALLEL code without having yet set !parallel_cpus --
and therefore without having to be in the exclusive region during
cpu_exec_step_atomic.

While at it, merge cpu_exec_step into cpu_exec_step_atomic.
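
The lookup in this patch follows a check, lock, re-check shape: try the table without the lock, and only if that misses take the lock, look again, and generate. A minimal sketch of that shape, with hypothetical names throughout (`struct block`, `cache`, `gen_lock`, `find_or_generate()`) and a trivially small table indexed by pc rather than QEMU's hash table:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_SLOTS 16

/* Hypothetical stand-in for a translated block. */
struct block {
    unsigned long pc;
};

static _Atomic(struct block *) cache[CACHE_SLOTS]; /* published blocks */
static struct block storage[CACHE_SLOTS];          /* backing memory */
static pthread_mutex_t gen_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned generated; /* number of blocks actually generated */

/* Lock-free lookup; returns NULL on a miss. */
static struct block *lookup(unsigned long pc)
{
    assert(pc < CACHE_SLOTS); /* sketch only: one slot per pc */
    return atomic_load(&cache[pc]);
}

/* Must be called with gen_lock held. */
static struct block *generate(unsigned long pc)
{
    struct block *b = &storage[pc];

    b->pc = pc;
    generated++;
    atomic_store(&cache[pc], b); /* publish after initialisation */
    return b;
}

static struct block *find_or_generate(unsigned long pc)
{
    struct block *b = lookup(pc); /* fast path, no lock */

    if (b == NULL) {
        pthread_mutex_lock(&gen_lock);
        b = lookup(pc); /* another thread may have won the race */
        if (b == NULL) {
            b = generate(pc);
        }
        pthread_mutex_unlock(&gen_lock);
    }
    return b;
}
```

The second lookup under the lock is what prevents two threads from generating the same block twice.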

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/cpu-exec.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index b71e015..526cab3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -223,30 +223,40 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 }
 #endif
 
-static void cpu_exec_step(CPUState *cpu)
+void cpu_exec_step_atomic(CPUState *cpu)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
 TranslationBlock *tb;
 target_ulong cs_base, pc;
 uint32_t flags;
 uint32_t cflags = 1 | CF_IGNORE_ICOUNT;
+uint32_t cf_mask = cflags & CF_HASH_MASK;
 
 if (sigsetjmp(cpu->jmp_env, 0) == 0) {
-tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags,
-  cflags & CF_HASH_MASK);
+tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
 if (tb == NULL) {
 mmap_lock();
 tb_lock();
-tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
+tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
+if (likely(tb == NULL)) {
+tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
+}
 tb_unlock();
 mmap_unlock();
 }
 
+start_exclusive();
+
+/* Since we got here, we know that parallel_cpus must be true.  */
+parallel_cpus = false;
 cc->cpu_exec_enter(cpu);
 /* execute the generated code */
 trace_exec_tb(tb, pc);
 cpu_tb_exec(cpu, tb);
 cc->cpu_exec_exit(cpu);
+parallel_cpus = true;
+
+end_exclusive();
 } else {
 /* We may have exited due to another problem here, so we need
  * to reset any tb_locks we may have taken but didn't release.
@@ -260,18 +270,6 @@ static void cpu_exec_step(CPUState *cpu)
 }
 }
 
-void cpu_exec_step_atomic(CPUState *cpu)
-{
-start_exclusive();
-
-/* Since we got here, we know that parallel_cpus must be true.  */
-parallel_cpus = false;
-cpu_exec_step(cpu);
-parallel_cpus = true;
-
-end_exclusive();
-}
-
 struct tb_desc {
 target_ulong pc;
 target_ulong cs_base;
-- 
2.7.4




[Qemu-devel] [PATCH v3 03/43] exec-all: fix typos in TranslationBlock's documentation

2017-07-19 Thread Emilio G. Cota
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 87b1b74..69c1b36 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -370,7 +370,7 @@ struct TranslationBlock {
 /* The following data are used to directly call another TB from
  * the code of this one. This can be done either by emitting direct or
  * indirect native jump instructions. These jumps are reset so that the TB
- * just continue its execution. The TB can be linked to another one by
+ * just continues its execution. The TB can be linked to another one by
  * setting one of the jump targets (or patching the jump instruction). Only
  * two of such jumps are supported.
  */
@@ -381,7 +381,7 @@ struct TranslationBlock {
 #else
 uintptr_t jmp_target_addr[2]; /* target address for indirect jump */
 #endif
-/* Each TB has an assosiated circular list of TBs jumping to this one.
+/* Each TB has an associated circular list of TBs jumping to this one.
  * jmp_list_first points to the first TB jumping to this one.
  * jmp_list_next is used to point to the next TB in a list.
  * Since each TB can have two jumps, it can participate in two lists.
-- 
2.7.4




[Qemu-devel] [PATCH v3 20/43] tcg: check CF_PARALLEL instead of parallel_cpus

2017-07-19 Thread Emilio G. Cota
Thereby decoupling the resulting translated code from the current state
of the system.

The tb->cflags field is not passed to tcg generation functions. So
we add a bit to TCGContext, storing there whether CF_PARALLEL is set
before translating every TB.

Most architectures have <= 32 registers, which results in a 4-byte hole
in TCGContext. Use this hole for the bit we need, which we store in a bool.
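
The padding argument can be checked with `offsetof` and `sizeof`. The two structs below are hypothetical reductions of the relevant part of TCGContext, assuming the register set is a 32-bit integer followed by a pointer-sized field; the result depends on the target ABI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 32-bit field followed by a pointer-sized one. */
struct before {
    uint32_t reserved_regs;
    intptr_t current_frame_offset;
};

/* Same layout with a bool placed right after the 32-bit field. */
struct after {
    uint32_t reserved_regs;
    bool cf_parallel;
    intptr_t current_frame_offset;
};

/* The sketch assumes a bool is no larger than the 32-bit field. */
static_assert(sizeof(bool) <= sizeof(uint32_t), "bool larger than expected");

/* Bytes of padding between reserved_regs and the next field. */
static size_t hole_size(void)
{
    return offsetof(struct before, current_frame_offset) - sizeof(uint32_t);
}

/* True when adding the bool did not grow the struct. */
static bool bool_is_free(void)
{
    return sizeof(struct after) == sizeof(struct before);
}
```

On a typical 64-bit ABI `hole_size()` is 4 and the bool costs nothing; on a 32-bit ABI there is no hole and the struct grows.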

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 tcg/tcg.h |  1 +
 accel/tcg/translate-all.c |  1 +
 tcg/tcg-op.c  | 10 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 96872f8..9b6dade 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -656,6 +656,7 @@ struct TCGContext {
 uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_addr if !USE_DIRECT_JUMP */
 
 TCGRegSet reserved_regs;
+bool cf_parallel; /* whether CF_PARALLEL is set in tb->cflags */
 intptr_t current_frame_offset;
 intptr_t frame_start;
 intptr_t frame_end;
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 600c0a1..645bc70 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1271,6 +1271,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tb->flags = flags;
 tb->cflags = cflags;
 tb->trace_vcpu_dstate = *cpu->trace_dstate;
+tcg_ctx.cf_parallel = !!(cflags & CF_PARALLEL);
 
 #ifdef CONFIG_PROFILER
 tcg_ctx.tb_count1++; /* includes aborted translations because of
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 205d07f..ef420d4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -150,7 +150,7 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
 
 void tcg_gen_mb(TCGBar mb_type)
 {
-if (parallel_cpus) {
+if (tcg_ctx.cf_parallel) {
 tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
 }
 }
@@ -2794,7 +2794,7 @@ void tcg_gen_atomic_cmpxchg_i32(TCGv_i32 retv, TCGv addr, TCGv_i32 cmpv,
 {
 memop = tcg_canonicalize_memop(memop, 0, 0);
 
-if (!parallel_cpus) {
+if (!tcg_ctx.cf_parallel) {
 TCGv_i32 t1 = tcg_temp_new_i32();
 TCGv_i32 t2 = tcg_temp_new_i32();
 
@@ -2838,7 +2838,7 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, TCGv_i64 cmpv,
 {
 memop = tcg_canonicalize_memop(memop, 1, 0);
 
-if (!parallel_cpus) {
+if (!tcg_ctx.cf_parallel) {
 TCGv_i64 t1 = tcg_temp_new_i64();
 TCGv_i64 t2 = tcg_temp_new_i64();
 
@@ -3015,7 +3015,7 @@ static void * const table_##NAME[16] = { \
 void tcg_gen_atomic_##NAME##_i32\
 (TCGv_i32 ret, TCGv addr, TCGv_i32 val, TCGArg idx, TCGMemOp memop) \
 {   \
-if (parallel_cpus) {\
+if (tcg_ctx.cf_parallel) {  \
 do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME); \
 } else {\
 do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,\
@@ -3025,7 +3025,7 @@ void tcg_gen_atomic_##NAME##_i32 \
 void tcg_gen_atomic_##NAME##_i64\
 (TCGv_i64 ret, TCGv addr, TCGv_i64 val, TCGArg idx, TCGMemOp memop) \
 {   \
-if (parallel_cpus) {\
+if (tcg_ctx.cf_parallel) {  \
 do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME); \
 } else {\
 do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,\
-- 
2.7.4




[Qemu-devel] [PATCH v3 24/43] translate-all: define and use DEBUG_TB_INVALIDATE_GATE

2017-07-19 Thread Emilio G. Cota
This gets rid of an ifdef check while ensuring that the debug code
is compiled, which prevents bit rot.

Suggested-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index c4c23f9..962e9b3 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -65,6 +65,12 @@
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
+#ifdef DEBUG_TB_INVALIDATE
+#define DEBUG_TB_INVALIDATE_GATE 1
+#else
+#define DEBUG_TB_INVALIDATE_GATE 0
+#endif
+
 #ifdef DEBUG_TB_FLUSH
 #define DEBUG_TB_FLUSH_GATE 1
 #else
@@ -1193,9 +1199,9 @@ static inline void tb_alloc_page(TranslationBlock *tb,
   }
 mprotect(g2h(page_addr), qemu_host_page_size,
  (prot & PAGE_BITS) & ~PAGE_WRITE);
-#ifdef DEBUG_TB_INVALIDATE
-printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
-#endif
+if (DEBUG_TB_INVALIDATE_GATE) {
+printf("protecting code page: 0x" TB_PAGE_ADDR_FMT "\n", page_addr);
+}
 }
 #else
 /* if some code is already present, then the pages are already
-- 
2.7.4




[Qemu-devel] [PATCH v3 14/43] target/hppa: check CF_PARALLEL instead of parallel_cpus

2017-07-19 Thread Emilio G. Cota
Thereby decoupling the resulting translated code from the current state
of the system.
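
The shape of this change is one shared worker that takes a `bool parallel` argument, two thin helper wrappers that pass a constant, and a choice between the wrappers made at translation time from the flags the block was built with. A minimal sketch under those assumptions; `CF_PARALLEL_SKETCH`, `do_store3()` and the helper names are hypothetical, and the "atomic" branch only bumps a counter to stay self-contained:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CF_PARALLEL_SKETCH 0x1u /* hypothetical flag bit */

typedef void (*store_helper)(uint8_t *mem, uint32_t val);

static unsigned atomic_path_taken; /* instrumentation for the sketch only */

/* Shared worker: 'parallel' is a constant at each call site below. */
static void do_store3(uint8_t *mem, uint32_t val, bool parallel)
{
    assert(mem != NULL);
    if (parallel) {
        atomic_path_taken++; /* a real helper would store atomically here */
    }
    mem[0] = (uint8_t)(val >> 16);
    mem[1] = (uint8_t)(val >> 8);
    mem[2] = (uint8_t)val;
}

static void helper_store3(uint8_t *mem, uint32_t val)
{
    do_store3(mem, val, false);
}

static void helper_store3_parallel(uint8_t *mem, uint32_t val)
{
    do_store3(mem, val, true);
}

/* "Translation": pick the helper once, from the block's own flags,
 * rather than consulting a global at execution time. */
static store_helper select_helper(uint32_t cflags)
{
    return (cflags & CF_PARALLEL_SKETCH) ? helper_store3_parallel
                                         : helper_store3;
}
```

Because the choice is baked in when the block is generated, flipping a global afterwards cannot change how existing code behaves.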

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 target/hppa/helper.h|  2 ++
 target/hppa/op_helper.c | 32 
 target/hppa/translate.c | 12 ++--
 3 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/target/hppa/helper.h b/target/hppa/helper.h
index 789f07f..0a6b900 100644
--- a/target/hppa/helper.h
+++ b/target/hppa/helper.h
@@ -3,7 +3,9 @@ DEF_HELPER_FLAGS_2(tsv, TCG_CALL_NO_WG, void, env, tl)
 DEF_HELPER_FLAGS_2(tcond, TCG_CALL_NO_WG, void, env, tl)
 
 DEF_HELPER_FLAGS_3(stby_b, TCG_CALL_NO_WG, void, env, tl, tl)
+DEF_HELPER_FLAGS_3(stby_b_parallel, TCG_CALL_NO_WG, void, env, tl, tl)
 DEF_HELPER_FLAGS_3(stby_e, TCG_CALL_NO_WG, void, env, tl, tl)
+DEF_HELPER_FLAGS_3(stby_e_parallel, TCG_CALL_NO_WG, void, env, tl, tl)
 
 DEF_HELPER_FLAGS_1(probe_r, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_1(probe_w, TCG_CALL_NO_RWG_SE, tl, tl)
diff --git a/target/hppa/op_helper.c b/target/hppa/op_helper.c
index c05c0d5..3104404 100644
--- a/target/hppa/op_helper.c
+++ b/target/hppa/op_helper.c
@@ -76,7 +76,8 @@ static void atomic_store_3(CPUHPPAState *env, target_ulong addr, uint32_t val,
 #endif
 }
 
-void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+static void do_stby_b(CPUHPPAState *env, target_ulong addr, target_ulong val,
+  bool parallel)
 {
 uintptr_t ra = GETPC();
 
@@ -89,7 +90,7 @@ void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
 break;
 case 1:
 /* The 3 byte store must appear atomic.  */
-if (parallel_cpus) {
+if (parallel) {
 atomic_store_3(env, addr, val, 0x00ffffffu, ra);
 } else {
 cpu_stb_data_ra(env, addr, val >> 16, ra);
@@ -102,14 +103,26 @@ void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
 }
 }
 
-void HELPER(stby_e)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+void HELPER(stby_b)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+{
+do_stby_b(env, addr, val, false);
+}
+
+void HELPER(stby_b_parallel)(CPUHPPAState *env, target_ulong addr,
+ target_ulong val)
+{
+do_stby_b(env, addr, val, true);
+}
+
+static void do_stby_e(CPUHPPAState *env, target_ulong addr, target_ulong val,
+  bool parallel)
 {
 uintptr_t ra = GETPC();
 
 switch (addr & 3) {
 case 3:
 /* The 3 byte store must appear atomic.  */
-if (parallel_cpus) {
+if (parallel) {
 atomic_store_3(env, addr - 3, val, 0xffffff00u, ra);
 } else {
 cpu_stw_data_ra(env, addr - 3, val >> 16, ra);
@@ -132,6 +145,17 @@ void HELPER(stby_e)(CPUHPPAState *env, target_ulong addr, target_ulong val)
 }
 }
 
+void HELPER(stby_e)(CPUHPPAState *env, target_ulong addr, target_ulong val)
+{
+do_stby_e(env, addr, val, false);
+}
+
+void HELPER(stby_e_parallel)(CPUHPPAState *env, target_ulong addr,
+ target_ulong val)
+{
+do_stby_e(env, addr, val, true);
+}
+
 target_ulong HELPER(probe_r)(target_ulong addr)
 {
 return page_check_range(addr, 1, PAGE_READ);
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 1effe82..66aa11d 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2309,9 +2309,17 @@ static ExitStatus trans_stby(DisasContext *ctx, uint32_t insn,
 val = load_gpr(ctx, rt);
 
 if (a) {
-gen_helper_stby_e(cpu_env, addr, val);
+if (tb_cflags(ctx->tb) & CF_PARALLEL) {
+gen_helper_stby_e_parallel(cpu_env, addr, val);
+} else {
+gen_helper_stby_e(cpu_env, addr, val);
+}
 } else {
-gen_helper_stby_b(cpu_env, addr, val);
+if (tb_cflags(ctx->tb) & CF_PARALLEL) {
+gen_helper_stby_b_parallel(cpu_env, addr, val);
+} else {
+gen_helper_stby_b(cpu_env, addr, val);
+}
 }
 
 if (m) {
-- 
2.7.4




[Qemu-devel] [PATCH v3 13/43] target/arm: check CF_PARALLEL instead of parallel_cpus

2017-07-19 Thread Emilio G. Cota
Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 target/arm/helper-a64.h|  4 
 target/arm/helper-a64.c| 38 --
 target/arm/op_helper.c |  7 ---
 target/arm/translate-a64.c | 31 +--
 target/arm/translate.c |  9 +++--
 5 files changed, 68 insertions(+), 21 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 6f9eaba..85d8674 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -43,4 +43,8 @@ DEF_HELPER_FLAGS_2(fcvtx_f64_to_f32, TCG_CALL_NO_RWG, f32, f64, env)
 DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_le_parallel, TCG_CALL_NO_WG,
+   i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_4(paired_cmpxchg64_be_parallel, TCG_CALL_NO_WG,
+   i64, env, i64, i64, i64)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index d9df82c..d0e435c 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -430,8 +430,9 @@ uint64_t HELPER(crc32c_64)(uint64_t acc, uint64_t val, uint32_t bytes)
 }
 
 /* Returns 0 on success; 1 otherwise.  */
-uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
- uint64_t new_lo, uint64_t new_hi)
+static uint64_t do_paired_cmpxchg64_le(CPUARMState *env, uint64_t addr,
+   uint64_t new_lo, uint64_t new_hi,
+   bool parallel)
 {
 uintptr_t ra = GETPC();
 Int128 oldv, cmpv, newv;
@@ -440,7 +441,7 @@ uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
 cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
 newv = int128_make128(new_lo, new_hi);
 
-if (parallel_cpus) {
+if (parallel) {
 #ifndef CONFIG_ATOMIC128
 cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -484,8 +485,21 @@ uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
 return !success;
 }
 
-uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
- uint64_t new_lo, uint64_t new_hi)
+uint64_t HELPER(paired_cmpxchg64_le)(CPUARMState *env, uint64_t addr,
+  uint64_t new_lo, uint64_t new_hi)
+{
+return do_paired_cmpxchg64_le(env, addr, new_lo, new_hi, false);
+}
+
+uint64_t HELPER(paired_cmpxchg64_le_parallel)(CPUARMState *env, uint64_t addr,
+  uint64_t new_lo, uint64_t new_hi)
+{
+return do_paired_cmpxchg64_le(env, addr, new_lo, new_hi, true);
+}
+
+static uint64_t do_paired_cmpxchg64_be(CPUARMState *env, uint64_t addr,
+   uint64_t new_lo, uint64_t new_hi,
+   bool parallel)
 {
 uintptr_t ra = GETPC();
 Int128 oldv, cmpv, newv;
@@ -494,7 +508,7 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
 cmpv = int128_make128(env->exclusive_val, env->exclusive_high);
 newv = int128_make128(new_lo, new_hi);
 
-if (parallel_cpus) {
+if (parallel) {
 #ifndef CONFIG_ATOMIC128
 cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
 #else
@@ -537,3 +551,15 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
 
 return !success;
 }
+
+uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
+ uint64_t new_lo, uint64_t new_hi)
+{
+return do_paired_cmpxchg64_be(env, addr, new_lo, new_hi, false);
+}
+
+uint64_t HELPER(paired_cmpxchg64_be_parallel)(CPUARMState *env, uint64_t addr,
+ uint64_t new_lo, uint64_t new_hi)
+{
+return do_paired_cmpxchg64_be(env, addr, new_lo, new_hi, true);
+}
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index 2a85666..a28f254 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -450,13 +450,6 @@ void HELPER(yield)(CPUARMState *env)
 ARMCPU *cpu = arm_env_get_cpu(env);
 CPUState *cs = CPU(cpu);
 
-/* When running in MTTCG we don't generate jumps to the yield and
- * WFE helpers as it won't affect the scheduling of other vCPUs.
- * If we wanted to more completely model WFE/SEV so we don't busy
- * spin unnecessarily we would need to do something more involved.
- */
-g_assert(!parallel_cpus);
-
 /* This is a non-trappable hint instruction that generally indicates
  * that the guest is currently busy-looping. Yield control back to the
  * top level loop so that 

[Qemu-devel] [PATCH v3 22/43] translate-all: define and use DEBUG_TB_FLUSH_GATE

2017-07-19 Thread Emilio G. Cota
This gets rid of some ifdef checks while ensuring that the debug code
is compiled, which prevents bit rot.

Suggested-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 645bc70..c1cd258 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -65,6 +65,12 @@
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
+#ifdef DEBUG_TB_FLUSH
+#define DEBUG_TB_FLUSH_GATE 1
+#else
+#define DEBUG_TB_FLUSH_GATE 0
+#endif
+
 #if !defined(CONFIG_USER_ONLY)
 /* TB consistency checks only implemented for usermode emulation.  */
 #undef DEBUG_TB_CHECK
@@ -899,13 +905,13 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 goto done;
 }
 
-#if defined(DEBUG_TB_FLUSH)
-printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
-   (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
-   tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
-   ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)) /
-   tcg_ctx.tb_ctx.nb_tbs : 0);
-#endif
+if (DEBUG_TB_FLUSH_GATE) {
+printf("qemu: flush code_size=%td nb_tbs=%d avg_tb_size=%td\n",
+   tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer,
+   tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
+   (tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer) /
+   tcg_ctx.tb_ctx.nb_tbs : 0);
+}
 if ((unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer)
 > tcg_ctx.code_gen_buffer_size) {
 cpu_abort(cpu, "Internal error: code buffer overflow\n");
-- 
2.7.4




[Qemu-devel] [PATCH v3 10/43] exec-all: bring tb->invalid into tb->cflags

2017-07-19 Thread Emilio G. Cota
This gets rid of a hole in struct TranslationBlock.
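
Folding a separate validity field into an existing flags word can be sketched as below. The flag values and names (`SK_CF_NOCACHE`, `SK_CF_INVALID`, `struct block`) are hypothetical and chosen for this example only. As in the patch, the writer uses a plain read followed by an atomic store and therefore relies on an external lock to serialise setters; readers use an atomic load and need no lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
#define SK_CF_NOCACHE 0x1u
#define SK_CF_INVALID 0x8u

struct block {
    _Atomic uint32_t cflags; /* the validity bit lives with the other flags */
};

/* Writers are assumed to be serialised by an external lock, so a
 * load followed by a store cannot lose a concurrent update. */
static void block_invalidate(struct block *b)
{
    assert(b != NULL);
    atomic_store(&b->cflags, atomic_load(&b->cflags) | SK_CF_INVALID);
}

/* Readers may run without the lock, hence the atomic load. */
static bool block_is_valid(struct block *b)
{
    assert(b != NULL);
    return !(atomic_load(&b->cflags) & SK_CF_INVALID);
}
```

Setting the bit leaves the other flags untouched, which is why a block no longer needs to reset a separate field when it is created.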

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 include/exec/exec-all.h   | 3 +--
 include/exec/tb-lookup.h  | 2 +-
 accel/tcg/cpu-exec.c  | 4 ++--
 accel/tcg/translate-all.c | 3 +--
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 69c1b36..256b9a6 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -352,12 +352,11 @@ struct TranslationBlock {
 #define CF_NOCACHE 0x10000 /* To be freed after execution */
 #define CF_USE_ICOUNT  0x20000
 #define CF_IGNORE_ICOUNT 0x40000 /* Do not generate icount code */
+#define CF_INVALID 0x80000 /* TB is stale. Setters must acquire tb_lock */
 
 /* Per-vCPU dynamic tracing state used to generate this TB */
 uint32_t trace_vcpu_dstate;
 
-uint16_t invalid;
-
 void *tc_ptr;/* pointer to the translated code */
 uint8_t *tc_search;  /* pointer to search data */
 /* original tb when cflags has CF_NOCACHE */
diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
index 9d32cb0..436b6d5 100644
--- a/include/exec/tb-lookup.h
+++ b/include/exec/tb-lookup.h
@@ -35,7 +35,7 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
tb->cs_base == *cs_base &&
tb->flags == *flags &&
tb->trace_vcpu_dstate == *cpu->trace_dstate &&
-   !atomic_read(&tb->invalid))) {
+   !(atomic_read(&tb->cflags) & CF_INVALID))) {
 return tb;
 }
 tb = tb_htable_lookup(cpu, *pc, *cs_base, *flags);
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 5d2ee5b..fae8c40 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -294,7 +294,7 @@ static bool tb_cmp(const void *p, const void *d)
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
 tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
-!atomic_read(&tb->invalid)) {
+!(atomic_read(&tb->cflags) & CF_INVALID)) {
 /* check next page if needed */
 if (tb->page_addr[1] == -1) {
 return true;
@@ -377,7 +377,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 tb_lock();
 acquired_tb_lock = true;
 }
-if (!tb->invalid) {
+if (!(tb->cflags & CF_INVALID)) {
 tb_add_jump(last_tb, tb_exit, tb);
 }
 }
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a124181..7ef4f19 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1073,7 +1073,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 
 assert_tb_locked();
 
-atomic_set(&tb->invalid, true);
+atomic_set(&tb->cflags, tb->cflags | CF_INVALID);
 
 /* remove the TB from the hash list */
 phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
@@ -1269,7 +1269,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tb->flags = flags;
 tb->cflags = cflags;
 tb->trace_vcpu_dstate = *cpu->trace_dstate;
-tb->invalid = false;
 
 #ifdef CONFIG_PROFILER
 tcg_ctx.tb_count1++; /* includes aborted translations because of
-- 
2.7.4




[Qemu-devel] [PATCH v3 02/43] tcg: fix corruption of code_time profiling counter upon tb_flush

2017-07-19 Thread Emilio G. Cota
Whenever there is an overflow in code_gen_buffer (e.g. we run out
of space in it and have to flush it), the code_time profiling counter
ends up with an invalid value (that is, code_time -= profile_getclock(),
without later on getting += profile_getclock() due to the goto).

Fix it by using the ti variable, so that we only update code_time
when there is no overflow. Note that in case there is an overflow
we fail to account for the elapsed coding time, but this is quite rare
so we can probably live with it.
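
The failure mode and the fix can be reproduced with a fake clock. Everything in this sketch is hypothetical (`fake_clock`, `generate()`, the two `translate_*` functions), and the overflow path is simplified to an early return where the real code jumps back to retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int64_t fake_clock;      /* hypothetical monotonic clock */
static int64_t code_time_buggy; /* accumulates with the -= / += pair */
static int64_t code_time_fixed; /* accumulates via a local start time */

static int64_t profile_clock(void)
{
    return fake_clock;
}

/* Stand-in for code generation: takes 'cost' ticks, may "overflow". */
static bool generate(int64_t cost, bool overflow)
{
    assert(cost >= 0);
    fake_clock += cost;
    return !overflow;
}

static void translate_buggy(int64_t cost, bool overflow)
{
    code_time_buggy -= profile_clock();
    if (!generate(cost, overflow)) {
        return; /* early exit: the matching += never happens */
    }
    code_time_buggy += profile_clock();
}

static void translate_fixed(int64_t cost, bool overflow)
{
    int64_t ti = profile_clock();

    if (!generate(cost, overflow)) {
        return; /* counter untouched; this attempt is not counted */
    }
    code_time_fixed += profile_clock() - ti;
}
```

After a single overflow the buggy counter is off by a whole clock reading, which is how a figure like the negative "JIT cycles" value can arise; the fixed counter only loses the time spent in the failed attempt.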

"info jit" before/after, roughly at the same time during debian-arm bootup:

- before:
Statistics:
TB flush count      1
TB invalidate count 4665
TLB flush count     998
JIT cycles          -615191529184601 (-256329.804 s at 2.4 GHz)
translated TBs      302310 (aborted=0 0.0%)
avg ops/TB          48.4 max=438
deleted ops/TB      8.54
avg temps/TB        32.31 max=38
avg host code/TB    361.5
avg search data/TB  24.5
cycles/op           -42014693.0
cycles/in byte      -121444900.2
cycles/out byte     -5629031.1
cycles/search byte     -83114481.0
  gen_interm time   -0.0%
  gen_code time     100.0%
optim./code time    -0.0%
liveness/code time  -0.0%
cpu_restore count   6236
  avg cycles        110.4

- after:
Statistics:
TB flush count      1
TB invalidate count 4665
TLB flush count     1010
JIT cycles          1996899624 (0.832 s at 2.4 GHz)
translated TBs      297961 (aborted=0 0.0%)
avg ops/TB          48.5 max=438
deleted ops/TB      8.56
avg temps/TB        32.31 max=38
avg host code/TB    361.8
avg search data/TB  24.5
cycles/op           138.2
cycles/in byte      398.4
cycles/out byte     18.5
cycles/search byte  273.1
  gen_interm time   14.0%
  gen_code time     86.0%
optim./code time    19.4%
liveness/code time  10.3%
cpu_restore count   6372
  avg cycles        111.0

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 3ee69e5..63f8538 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1300,7 +1300,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 #ifdef CONFIG_PROFILER
 tcg_ctx.tb_count++;
 tcg_ctx.interm_time += profile_getclock() - ti;
-tcg_ctx.code_time -= profile_getclock();
+ti = profile_getclock();
 #endif
 
 /* ??? Overflow could be handled better here.  In particular, we
@@ -1318,7 +1318,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 }
 
 #ifdef CONFIG_PROFILER
-tcg_ctx.code_time += profile_getclock();
+tcg_ctx.code_time += profile_getclock() - ti;
 tcg_ctx.code_in_len += tb->size;
 tcg_ctx.code_out_len += gen_code_size;
 tcg_ctx.search_out_len += search_size;
-- 
2.7.4




[Qemu-devel] [PATCH v3 16/43] target/m68k: check CF_PARALLEL instead of parallel_cpus

2017-07-19 Thread Emilio G. Cota
Thereby decoupling the resulting translated code from the current state
of the system.
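
As a rough sketch of what "decoupling" means here: the translator picks the
helper from a flag stored in the TB itself, so the generated code no longer
depends on a global that may change after translation. Everything below is a
hypothetical stand-in (the CF_PARALLEL bit value, struct tb, enum cas_helper,
pick_cas2l_helper()), and tb_cflags() is written with C11 <stdatomic.h> rather
than QEMU's atomic_read().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CF_PARALLEL 0x00080000u  /* hypothetical bit value */

/* Hypothetical, stripped-down translation block. */
struct tb {
    _Atomic uint32_t cflags;
};

enum cas_helper { CAS2L_SERIAL, CAS2L_PARALLEL };

/* Accessor that hides the atomic read, in the spirit of tb_cflags(). */
static inline uint32_t tb_cflags(struct tb *tb)
{
    return atomic_load_explicit(&tb->cflags, memory_order_relaxed);
}

/* Translation-time choice based only on the flags frozen into the TB. */
static enum cas_helper pick_cas2l_helper(struct tb *tb)
{
    assert(tb != NULL);
    return (tb_cflags(tb) & CF_PARALLEL) ? CAS2L_PARALLEL : CAS2L_SERIAL;
}
```

Two TBs translated with different cflags therefore get different helpers, no
matter what the rest of the system is doing when they execute.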

Signed-off-by: Emilio G. Cota 
---
 target/m68k/helper.h|  1 +
 target/m68k/op_helper.c | 33 -
 target/m68k/translate.c | 12 ++--
 3 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/target/m68k/helper.h b/target/m68k/helper.h
index 475a1f2..eebe52d 100644
--- a/target/m68k/helper.h
+++ b/target/m68k/helper.h
@@ -11,6 +11,7 @@ DEF_HELPER_2(set_sr, void, env, i32)
 DEF_HELPER_3(movec, void, env, i32, i32)
 DEF_HELPER_4(cas2w, void, env, i32, i32, i32)
 DEF_HELPER_4(cas2l, void, env, i32, i32, i32)
+DEF_HELPER_4(cas2l_parallel, void, env, i32, i32, i32)
 
 #define dh_alias_fp ptr
 #define dh_ctype_fp FPReg *
diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c
index 7b5126c..6308951 100644
--- a/target/m68k/op_helper.c
+++ b/target/m68k/op_helper.c
@@ -361,6 +361,7 @@ void HELPER(divsll)(CPUM68KState *env, int numr, int regr, 
int32_t den)
 env->dregs[numr] = quot;
 }
 
+/* We're executing in a serial context -- no need to be atomic.  */
 void HELPER(cas2w)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
 {
 uint32_t Dc1 = extract32(regs, 9, 3);
@@ -374,17 +375,11 @@ void HELPER(cas2w)(CPUM68KState *env, uint32_t regs, 
uint32_t a1, uint32_t a2)
 int16_t l1, l2;
 uintptr_t ra = GETPC();
 
-if (parallel_cpus) {
-/* Tell the main loop we need to serialize this insn.  */
-cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
-} else {
-/* We're executing in a serial context -- no need to be atomic.  */
-l1 = cpu_lduw_data_ra(env, a1, ra);
-l2 = cpu_lduw_data_ra(env, a2, ra);
-if (l1 == c1 && l2 == c2) {
-cpu_stw_data_ra(env, a1, u1, ra);
-cpu_stw_data_ra(env, a2, u2, ra);
-}
+l1 = cpu_lduw_data_ra(env, a1, ra);
+l2 = cpu_lduw_data_ra(env, a2, ra);
+if (l1 == c1 && l2 == c2) {
+cpu_stw_data_ra(env, a1, u1, ra);
+cpu_stw_data_ra(env, a2, u2, ra);
 }
 
 if (c1 != l1) {
@@ -399,7 +394,8 @@ void HELPER(cas2w)(CPUM68KState *env, uint32_t regs, 
uint32_t a1, uint32_t a2)
 env->dregs[Dc2] = deposit32(env->dregs[Dc2], 0, 16, l2);
 }
 
-void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
+static void do_cas2l(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t 
a2,
+ bool parallel)
 {
 uint32_t Dc1 = extract32(regs, 9, 3);
 uint32_t Dc2 = extract32(regs, 6, 3);
@@ -416,7 +412,7 @@ void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, 
uint32_t a1, uint32_t a2)
 TCGMemOpIdx oi;
 #endif
 
-if (parallel_cpus) {
+if (parallel) {
 /* We're executing in a parallel context -- must be atomic.  */
 #ifdef CONFIG_ATOMIC64
 uint64_t c, u, l;
@@ -470,6 +466,17 @@ void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, 
uint32_t a1, uint32_t a2)
 env->dregs[Dc2] = l2;
 }
 
+void HELPER(cas2l)(CPUM68KState *env, uint32_t regs, uint32_t a1, uint32_t a2)
+{
+do_cas2l(env, regs, a1, a2, false);
+}
+
+void HELPER(cas2l_parallel)(CPUM68KState *env, uint32_t regs, uint32_t a1,
+uint32_t a2)
+{
+do_cas2l(env, regs, a1, a2, true);
+}
+
 struct bf_data {
 uint32_t addr;
 uint32_t bofs;
diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 188520b..65044be 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -2308,7 +2308,11 @@ DISAS_INSN(cas2w)
  (REG(ext1, 6) << 3) |
  (REG(ext2, 0) << 6) |
  (REG(ext1, 0) << 9));
-gen_helper_cas2w(cpu_env, regs, addr1, addr2);
+if (tb_cflags(s->tb) & CF_PARALLEL) {
+gen_helper_exit_atomic(cpu_env);
+} else {
+gen_helper_cas2w(cpu_env, regs, addr1, addr2);
+}
 tcg_temp_free(regs);
 
 /* Note that cas2w also assigned to env->cc_op.  */
@@ -2354,7 +2358,11 @@ DISAS_INSN(cas2l)
  (REG(ext1, 6) << 3) |
  (REG(ext2, 0) << 6) |
  (REG(ext1, 0) << 9));
-gen_helper_cas2l(cpu_env, regs, addr1, addr2);
+if (tb_cflags(s->tb) & CF_PARALLEL) {
+gen_helper_cas2l_parallel(cpu_env, regs, addr1, addr2);
+} else {
+gen_helper_cas2l(cpu_env, regs, addr1, addr2);
+}
 tcg_temp_free(regs);
 
 /* Note that cas2l also assigned to env->cc_op.  */
-- 
2.7.4




[Qemu-devel] [PATCH v3 19/43] target/sparc: check CF_PARALLEL instead of parallel_cpus

2017-07-19 Thread Emilio G. Cota
Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 target/sparc/translate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 39d8494..768ce68 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -2450,7 +2450,7 @@ static void gen_ldstub_asi(DisasContext *dc, TCGv dst, 
TCGv addr, int insn)
 default:
 /* ??? In theory, this should be raise DAE_invalid_asi.
But the SS-20 roms do ldstuba [%l0] #ASI_M_CTL, %o1.  */
-if (parallel_cpus) {
+if (tb_cflags(dc->tb) & CF_PARALLEL) {
 gen_helper_exit_atomic(cpu_env);
 } else {
 TCGv_i32 r_asi = tcg_const_i32(da.asi);
-- 
2.7.4




[Qemu-devel] [PATCH v3 01/43] cputlb: bring back tlb_flush_count under !TLB_DEBUG

2017-07-19 Thread Emilio G. Cota
Commit f0aff0f124 ("cputlb: add assert_cpu_is_self checks") buried
the increment of tlb_flush_count under TLB_DEBUG. This results in
"info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG.

Besides, under MTTCG tlb_flush_count is updated by several threads,
so in order not to lose counts we'd either have to use atomic ops
or distribute the counter, which is more scalable.

This patch does the latter by embedding tlb_flush_count in CPUArchState.
The global count is then easily obtained by iterating over the CPU list.

Note that this change also requires updating the accessors to
tlb_flush_count to use atomic_read/set whenever there may be conflicting
accesses (as defined in C11) to it.
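
A minimal sketch of the distributed-counter idea, using C11 <stdatomic.h>
with relaxed ordering in place of QEMU's atomic_read/atomic_set macros (that
the two are comparable is an assumption of this sketch). NR_CPUS,
struct cpu_env, count_flush() and total_flush_count() are hypothetical names,
and the fixed-size array stands in for the CPU list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS 4  /* hypothetical fixed CPU count */

/* Hypothetical per-CPU state holding its own flush counter. */
struct cpu_env {
    _Atomic size_t tlb_flush_count;
};

static struct cpu_env cpus[NR_CPUS];

/*
 * Called only by the thread that owns CPU idx. With a single writer per
 * counter, a load followed by a store is enough; no atomic
 * read-modify-write is needed.
 */
static void count_flush(int idx)
{
    assert(idx >= 0 && idx < NR_CPUS);
    size_t old = atomic_load_explicit(&cpus[idx].tlb_flush_count,
                                      memory_order_relaxed);
    atomic_store_explicit(&cpus[idx].tlb_flush_count, old + 1,
                          memory_order_relaxed);
}

/* Any thread may read: the global count is the sum over all CPUs. */
static size_t total_flush_count(void)
{
    size_t count = 0;

    for (int i = 0; i < NR_CPUS; i++) {
        count += atomic_load_explicit(&cpus[i].tlb_flush_count,
                                      memory_order_relaxed);
    }
    return count;
}
```

The sum can be slightly stale while other CPUs keep flushing, which is
acceptable for a statistic.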

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 include/exec/cpu-defs.h   |  1 +
 include/exec/cputlb.h |  3 +--
 accel/tcg/cputlb.c| 17 ++---
 accel/tcg/translate-all.c |  2 +-
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index bc8e7f8..e43ff83 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -137,6 +137,7 @@ typedef struct CPUIOTLBEntry {
 CPUTLBEntry tlb_v_table[NB_MMU_MODES][CPU_VTLB_SIZE];   \
 CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];\
 CPUIOTLBEntry iotlb_v[NB_MMU_MODES][CPU_VTLB_SIZE]; \
+size_t tlb_flush_count; \
 target_ulong tlb_flush_addr;\
 target_ulong tlb_flush_mask;\
 target_ulong vtlb_index;\
diff --git a/include/exec/cputlb.h b/include/exec/cputlb.h
index 3f94178..c91db21 100644
--- a/include/exec/cputlb.h
+++ b/include/exec/cputlb.h
@@ -23,7 +23,6 @@
 /* cputlb.c */
 void tlb_protect_code(ram_addr_t ram_addr);
 void tlb_unprotect_code(ram_addr_t ram_addr);
-extern int tlb_flush_count;
-
+size_t tlb_flush_count(void);
 #endif
 #endif
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 85635ae..9377110 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -92,8 +92,18 @@ static void flush_all_helper(CPUState *src, run_on_cpu_func 
fn,
 }
 }
 
-/* statistics */
-int tlb_flush_count;
+size_t tlb_flush_count(void)
+{
+CPUState *cpu;
+size_t count = 0;
+
+CPU_FOREACH(cpu) {
+CPUArchState *env = cpu->env_ptr;
+
+count += atomic_read(&env->tlb_flush_count);
+}
+return count;
+}
 
 /* This is OK because CPU architectures generally permit an
  * implementation to drop entries from the TLB at any time, so
@@ -112,7 +122,8 @@ static void tlb_flush_nocheck(CPUState *cpu)
 }
 
 assert_cpu_is_self(cpu);
-tlb_debug("(count: %d)\n", tlb_flush_count++);
+atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
+tlb_debug("(count: %zu)\n", tlb_flush_count());
 
 tb_lock();
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 090ebad..3ee69e5 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1916,7 +1916,7 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
 atomic_read(&tcg_ctx.tb_ctx.tb_flush_count));
 cpu_fprintf(f, "TB invalidate count %d\n",
 tcg_ctx.tb_ctx.tb_phys_invalidate_count);
-cpu_fprintf(f, "TLB flush count %d\n", tlb_flush_count);
+cpu_fprintf(f, "TLB flush count %zu\n", tlb_flush_count());
 tcg_dump_info(f, cpu_fprintf);
 
 tb_unlock();
-- 
2.7.4




[Qemu-devel] [PATCH v3 15/43] target/i386: check CF_PARALLEL instead of parallel_cpus

2017-07-19 Thread Emilio G. Cota
Thereby decoupling the resulting translated code from the current state
of the system.

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 target/i386/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index f046ffa..0f38a48 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -5263,7 +5263,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 if (!(s->cpuid_ext_features & CPUID_EXT_CX16))
 goto illegal_op;
 gen_lea_modrm(env, s, modrm);
-if ((s->prefix & PREFIX_LOCK) && parallel_cpus) {
+if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->tb) & CF_PARALLEL)) 
{
 gen_helper_cmpxchg16b(cpu_env, cpu_A0);
 } else {
 gen_helper_cmpxchg16b_unlocked(cpu_env, cpu_A0);
@@ -5274,7 +5274,7 @@ static target_ulong disas_insn(CPUX86State *env, 
DisasContext *s,
 if (!(s->cpuid_features & CPUID_CX8))
 goto illegal_op;
 gen_lea_modrm(env, s, modrm);
-if ((s->prefix & PREFIX_LOCK) && parallel_cpus) {
+if ((s->prefix & PREFIX_LOCK) && (tb_cflags(s->tb) & CF_PARALLEL)) 
{
 gen_helper_cmpxchg8b(cpu_env, cpu_A0);
 } else {
 gen_helper_cmpxchg8b_unlocked(cpu_env, cpu_A0);
-- 
2.7.4




[Qemu-devel] [PATCH v3 08/43] tcg: remove addr argument from lookup_tb_ptr

2017-07-19 Thread Emilio G. Cota
It is unlikely that we will ever want to call this helper passing
an argument other than the current PC. So just remove the argument,
and use the pc we already get from cpu_get_tb_cpu_state.

This change paves the way to having a common "tb_lookup" function.
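
The shape of the change can be sketched as follows. This is only an
illustration: struct cpu, struct tb, CACHE_SIZE, get_cpu_state() and
lookup_tb() are hypothetical, with get_cpu_state() standing in for
cpu_get_tb_cpu_state() and a tiny direct-mapped array standing in for the
real lookup structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 16  /* hypothetical; must be a power of two */

struct tb {
    uint64_t pc;
};

struct cpu {
    uint64_t pc;                       /* current guest PC */
    struct tb *jmp_cache[CACHE_SIZE];  /* direct-mapped lookup cache */
};

/* Stand-in for cpu_get_tb_cpu_state(): reports the current PC. */
static void get_cpu_state(const struct cpu *cpu, uint64_t *pc)
{
    *pc = cpu->pc;
}

/* No address argument: the helper asks the CPU state for the PC itself. */
static struct tb *lookup_tb(struct cpu *cpu)
{
    uint64_t pc;

    assert(cpu != NULL);
    get_cpu_state(cpu, &pc);
    struct tb *tb = cpu->jmp_cache[pc & (CACHE_SIZE - 1)];
    return (tb && tb->pc == pc) ? tb : NULL;
}
```

Callers no longer have to materialize the PC in a temporary just to pass it
along.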

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 tcg/tcg-op.h   |  4 ++--
 tcg/tcg-runtime.h  |  2 +-
 target/alpha/translate.c   |  2 +-
 target/arm/translate-a64.c |  4 ++--
 target/arm/translate.c |  5 +
 target/hppa/translate.c|  6 +++---
 target/i386/translate.c| 17 +
 target/mips/translate.c|  4 ++--
 target/s390x/translate.c   |  2 +-
 target/sh4/translate.c |  4 ++--
 tcg/tcg-op.c   |  4 ++--
 tcg/tcg-runtime.c  | 20 ++--
 12 files changed, 32 insertions(+), 42 deletions(-)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 5d3278f..18d01b2 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -797,7 +797,7 @@ static inline void tcg_gen_exit_tb(uintptr_t val)
 void tcg_gen_goto_tb(unsigned idx);
 
 /**
- * tcg_gen_lookup_and_goto_ptr() - look up a TB and jump to it if valid
+ * tcg_gen_lookup_and_goto_ptr() - look up the current TB, jump to it if valid
  * @addr: Guest address of the target TB
  *
  * If the TB is not valid, jump to the epilogue.
@@ -805,7 +805,7 @@ void tcg_gen_goto_tb(unsigned idx);
  * This operation is optional. If the TCG backend does not implement goto_ptr,
  * this op is equivalent to calling tcg_gen_exit_tb() with 0 as the argument.
  */
-void tcg_gen_lookup_and_goto_ptr(TCGv addr);
+void tcg_gen_lookup_and_goto_ptr(void);
 
 #if TARGET_LONG_BITS == 32
 #define tcg_temp_new() tcg_temp_new_i32()
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index c41d38a..1df17d0 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -24,7 +24,7 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 
-DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl)
+DEF_HELPER_FLAGS_1(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env)
 
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 90e6d52..9e98312 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -3073,7 +3073,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct 
TranslationBlock *tb)
 /* FALLTHRU */
 case EXIT_PC_UPDATED:
 if (!use_exit_tb()) {
-tcg_gen_lookup_and_goto_ptr(cpu_pc);
+tcg_gen_lookup_and_goto_ptr();
 break;
 }
 /* FALLTHRU */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 3fa3902..818d7eb 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -379,7 +379,7 @@ static inline void gen_goto_tb(DisasContext *s, int n, 
uint64_t dest)
 } else if (s->singlestep_enabled) {
 gen_exception_internal(EXCP_DEBUG);
 } else {
-tcg_gen_lookup_and_goto_ptr(cpu_pc);
+tcg_gen_lookup_and_goto_ptr();
 s->is_jmp = DISAS_TB_JUMP;
 }
 }
@@ -11366,7 +11366,7 @@ void gen_intermediate_code_a64(ARMCPU *cpu, 
TranslationBlock *tb)
 gen_goto_tb(dc, 1, dc->pc);
 break;
 case DISAS_JUMP:
-tcg_gen_lookup_and_goto_ptr(cpu_pc);
+tcg_gen_lookup_and_goto_ptr();
 break;
 case DISAS_TB_JUMP:
 case DISAS_EXC:
diff --git a/target/arm/translate.c b/target/arm/translate.c
index e27736c..964b627 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4152,10 +4152,7 @@ static inline bool use_goto_tb(DisasContext *s, 
target_ulong dest)
 
 static void gen_goto_ptr(void)
 {
-TCGv addr = tcg_temp_new();
-tcg_gen_extu_i32_tl(addr, cpu_R[15]);
-tcg_gen_lookup_and_goto_ptr(addr);
-tcg_temp_free(addr);
+tcg_gen_lookup_and_goto_ptr();
 }
 
 /* This will end the TB but doesn't guarantee we'll return to
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index e10abc5..91053e2 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -517,7 +517,7 @@ static void gen_goto_tb(DisasContext *ctx, int which,
 if (ctx->singlestep_enabled) {
 gen_excp_1(EXCP_DEBUG);
 } else {
-tcg_gen_lookup_and_goto_ptr(cpu_iaoq_f);
+tcg_gen_lookup_and_goto_ptr();
 }
 }
 }
@@ -1527,7 +1527,7 @@ static ExitStatus do_ibranch(DisasContext *ctx, TCGv dest,
 if (link != 0) {
 tcg_gen_movi_tl(cpu_gr[link], ctx->iaoq_n);
 }
-tcg_gen_lookup_and_goto_ptr(cpu_iaoq_f);
+tcg_gen_lookup_and_goto_ptr();
 return nullify_end(ctx, NO_EXIT);
 } else {
 cond_prep(&ctx->null_cond);
@@ -3885,7 +3885,7 @@ void 

[Qemu-devel] [PATCH v3 07/43] tcg/mips: constify tcg_target_callee_save_regs

2017-07-19 Thread Emilio G. Cota
Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Emilio G. Cota 
---
 tcg/mips/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 85756b8..56db228 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2323,7 +2323,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 return NULL;
 }
 
-static int tcg_target_callee_save_regs[] = {
+static const int tcg_target_callee_save_regs[] = {
 TCG_REG_S0,   /* used for the global env (TCG_AREG0) */
 TCG_REG_S1,
 TCG_REG_S2,
-- 
2.7.4




[Qemu-devel] [PATCH v3 32/43] tcg: take .helpers out of TCGContext

2017-07-19 Thread Emilio G. Cota
Groundwork for supporting multiple TCG contexts.

The hash table becomes read-only after it is filled in,
so we can save space by keeping just a global pointer to it.
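
A rough sketch of the idea, under stated assumptions: the real code uses a
GHashTable, which is swapped here for a plain array with a linear search so
that the example needs no GLib. struct context, context_init() and
find_helper_name() are hypothetical, and the "set once" guard in
context_init() is an assumption of the sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>

typedef struct HelperInfo {
    void (*func)(void);
    const char *name;
} HelperInfo;

static void helper_a(void) {}
static void helper_b(void) {}

static const HelperInfo all_helpers[] = {
    { helper_a, "helper_a" },
    { helper_b, "helper_b" },
};

/* Shared by every context: written once, read-only afterwards. */
static const HelperInfo *helper_table;

/* Hypothetical per-context state; note that it holds no table pointer. */
struct context {
    int nb_globals;
};

static void context_init(struct context *s)
{
    s->nb_globals = 0;
    if (!helper_table) {
        helper_table = all_helpers;  /* "fill in" the table on first init */
    }
}

/* Look up a helper by function pointer, whichever context is asking. */
static const char *find_helper_name(void (*func)(void))
{
    assert(helper_table != NULL);  /* context_init() must have run once */
    for (size_t i = 0; i < sizeof(all_helpers) / sizeof(all_helpers[0]); i++) {
        if (helper_table[i].func == func) {
            return helper_table[i].name;
        }
    }
    return NULL;
}
```

However many contexts are initialized, there is a single table and a single
pointer to it.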

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 tcg/tcg.h |  2 --
 tcg/tcg.c | 10 +-
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 22f7ecd..53c679f 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -664,8 +664,6 @@ struct TCGContext {
 
 tcg_insn_unit *code_ptr;
 
-GHashTable *helpers;
-
 #ifdef CONFIG_PROFILER
 /* profiling info */
 int64_t tb_count1;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 28c1b94..c0c2d6c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -319,6 +319,7 @@ typedef struct TCGHelperInfo {
 static const TCGHelperInfo all_helpers[] = {
 #include "exec/helper-tcg.h"
 };
+static GHashTable *helper_table;
 
 static int indirect_reg_alloc_order[ARRAY_SIZE(tcg_target_reg_alloc_order)];
 static void process_op_defs(TCGContext *s);
@@ -329,7 +330,6 @@ void tcg_context_init(TCGContext *s)
 TCGOpDef *def;
 TCGArgConstraint *args_ct;
 int *sorted_args;
-GHashTable *helper_table;
 
 memset(s, 0, sizeof(*s));
 s->nb_globals = 0;
@@ -357,7 +357,7 @@ void tcg_context_init(TCGContext *s)
 
 /* Register helpers.  */
 /* Use g_direct_hash/equal for direct pointer comparisons on func.  */
-s->helpers = helper_table = g_hash_table_new(NULL, NULL);
+helper_table = g_hash_table_new(NULL, NULL);
 
 for (i = 0; i < ARRAY_SIZE(all_helpers); ++i) {
 g_hash_table_insert(helper_table, (gpointer)all_helpers[i].func,
@@ -761,7 +761,7 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
 unsigned sizemask, flags;
 TCGHelperInfo *info;
 
-info = g_hash_table_lookup(s->helpers, (gpointer)func);
+info = g_hash_table_lookup(helper_table, (gpointer)func);
 flags = info->flags;
 sizemask = info->sizemask;
 
@@ -990,8 +990,8 @@ static char *tcg_get_arg_str_idx(TCGContext *s, char *buf,
 static inline const char *tcg_find_helper(TCGContext *s, uintptr_t val)
 {
 const char *ret = NULL;
-if (s->helpers) {
-TCGHelperInfo *info = g_hash_table_lookup(s->helpers, (gpointer)val);
+if (helper_table) {
+TCGHelperInfo *info = g_hash_table_lookup(helper_table, (gpointer)val);
 if (info) {
 ret = info->name;
 }
-- 
2.7.4




[Qemu-devel] [PATCH v3 04/43] translate-all: make have_tb_lock static

2017-07-19 Thread Emilio G. Cota
It is only used by this object, and it's not exported to any other.

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/translate-all.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 63f8538..a124181 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -139,7 +139,7 @@ TCGContext tcg_ctx;
 bool parallel_cpus;
 
 /* translation block context */
-__thread int have_tb_lock;
+static __thread int have_tb_lock;
 
 static void page_table_config_init(void)
 {
-- 
2.7.4




[Qemu-devel] [PATCH v3 06/43] tcg/i386: constify tcg_target_callee_save_regs

2017-07-19 Thread Emilio G. Cota
Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Emilio G. Cota 
---
 tcg/i386/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 01e3b4e..06df01a 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2514,7 +2514,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 return NULL;
 }
 
-static int tcg_target_callee_save_regs[] = {
+static const int tcg_target_callee_save_regs[] = {
 #if TCG_TARGET_REG_BITS == 64
 TCG_REG_RBP,
 TCG_REG_RBX,
-- 
2.7.4




[Qemu-devel] [PATCH v3 05/43] cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find

2017-07-19 Thread Emilio G. Cota
Reusing the have_tb_lock name, which is also defined in translate-all.c,
makes code reviewing unnecessarily harder.

Avoid potential confusion by renaming the local have_tb_lock variable
to something else.

Reviewed-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/cpu-exec.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index d84b01d..c4c289b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -337,7 +337,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 TranslationBlock *tb;
 target_ulong cs_base, pc;
 uint32_t flags;
-bool have_tb_lock = false;
+bool acquired_tb_lock = false;
 
 /* we record a subset of the CPU state. It will
always be the same before a given translated block
@@ -356,7 +356,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
  */
 mmap_lock();
 tb_lock();
-have_tb_lock = true;
+acquired_tb_lock = true;
 
 /* There's a chance that our desired tb has been translated while
  * taking the locks so we check again inside the lock.
@@ -384,15 +384,15 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 #endif
 /* See if we can patch the calling TB. */
 if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-if (!have_tb_lock) {
+if (!acquired_tb_lock) {
 tb_lock();
-have_tb_lock = true;
+acquired_tb_lock = true;
 }
 if (!tb->invalid) {
 tb_add_jump(last_tb, tb_exit, tb);
 }
 }
-if (have_tb_lock) {
+if (acquired_tb_lock) {
 tb_unlock();
 }
 return tb;
-- 
2.7.4




[Qemu-devel] [PATCH v3 00/43] tcg: support for multiple TCG contexts

2017-07-19 Thread Emilio G. Cota
v2:
  https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04749.html

v3 applies on top of the current master (d4e59218a).

To ease review/testing, you can pull this series from:
  https://github.com/cota/qemu/tree/multi-tcg-v3

Note: I cannot even compile-test _WIN32 bits, help appreciated! See
patches 39/40.

Changes from v2:
- Rebase on top of current master (therefore dropping the first 2 patches,
  which are already on master)
  - Add sh4 bits, touching:
    - Removal of argument to tb_lookup_ptr (merged into otherwise same v2 patch)
    - tb_cflags() inline (new patch in v3 for sh4 and all other arches)
    - CF_PARALLEL instead of parallel_cpus (sh4-only patch in v3)
- Add R-b tags
- Drop the patch removing the tb->invalid check.
- Introduce the patch implementing tb_lookup__cpu_state before the patches
  that fiddle with tb->cflags, so that we have a single place where to
  do that fiddling
  - Update commit log of the tb_lookup__cpu_state patch explaining
    why tb->invalid must be checked when obtaining the *tb from tb_jmp_cache
- Improve comment next to CF_INVALID
- CF_PARALLEL:
  - Introduce tb_cflags inline to hide the atomic_read
    - Add an extra patch to convert tb->cflags readers to tb_cflags
  - Rename curr_cf_mask() to curr_cflags()
    - Remove many superfluous if (parallel_cpus) checks; just call curr_cflags()
  - Drop tb_cf_mask(); use CF_HASH_MASK instead
  - m68k: use gen_helper_exit_atomic instead of implementing cas2w_parallel
  - s390x: Richard: I dropped your R-b tag because v3 also includes csst.
  - sh4: add sh4 patch, as mentioned above
  - tcg_ctx.cf_parallel: use a bool instead of a u8
  - Do if (foo && (tb_cflags(tb) & BAR)) instead of (foo && tb_cflags() & BAR)
- Use a size_t for struct tb_tc.size, plugging the 4-byte hole
- Dynamically allocate TCG optimizer globals
  - Use directly a bitmap, instead of TCGTempSet for temps_used, which
    saves some space
  - Add perf numbers for the change: ~2% slowdown
- **tcg_ctxs: get rid of tcg_ctxs_init
- TCGProfile: s/PROF_ADD_MAX/PROF_MAX/
- real_host_page_size: move to its own file with an init constructor,
  as suggested by Richard (Richard: I kept your R-b tag).
- qemu_mprotect helpers: g_assert on page-aligned address and size
  - Adapt callers in translate-all.c to pass page-aligned address and size
- TCG regions:
  - Hide the computation of n_regions from tcg_region_init's callers. The
    function now takes no arguments. Add a comment about
    qemu_tcg_mttcg_enabled().
  - if (!inited) { inited = true; do_init(); } in cpus.c
  - Use assert instead of if (err) tcg_abort();
  - Use QEMU_ALIGN_DOWN instead of &= mask
  - Inline set_guard_pages() into tcg_region_init
  - Merge patch that removes code_gen_buffer's guard page into the TCG
    regions' patch
- TCG __thread:
  - Inline tcg_ctxs_init into tcg_context_init
  - Move the code that determines the number of regions from the previous
    patch to this patch.
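
Two of the items above (asserting page alignment in the qemu_mprotect helpers,
and rounding down by division instead of "&= mask") can be sketched roughly as
below. align_down() and is_page_aligned() are hypothetical names; the
division-based rounding is my understanding of how QEMU_ALIGN_DOWN behaves,
so treat that as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round n down to a multiple of m. Unlike masking, this also works when
 * m is not a power of two.
 */
static uintptr_t align_down(uintptr_t n, uintptr_t m)
{
    assert(m != 0);
    return n / m * m;
}

/* The kind of precondition an mprotect wrapper can assert on entry. */
static bool is_page_aligned(uintptr_t addr, size_t size, size_t page_size)
{
    assert(page_size != 0);
    return addr % page_size == 0 && size % page_size == 0;
}
```

A caller would first align its address and size, then let the helper assert
that both are multiples of the page size.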

To be done after this series:
- Get rid of tb_lock, or at least push it down so that we take advantage of
  multiple TCG contexts in MTTCG. (I'm doing this in my testing, but doing
  it well will require another patch series.)

Improvements that were suggested during this series' development:
- Order tb->[*] comparisons by likelihood of mismatch.
- Get rid of parallel_cpus from cpu_exec_step_atomic -- I'm not sure
  whether just removing it is safe, since we call curr_cflags from several
  places.
- Perhaps parse -accel=tcg command-line arguments before TCG is initialized,
  so that those arguments can be used during TCG initialization.

Thanks,

Emilio




[Qemu-devel] [Bug 1703506] Re: SMT not supported by QEMU on AMD Ryzen CPU

2017-07-19 Thread Imatimba
Attached Ubuntu 17.04 guest logs.
I wasn't able to run x86info as root, only as a regular user.
Error shown:
readEntry: Operation not permitted
error reading 1KB from 0x3fffc00

There are a few bug reports about this but no workarounds. It seems to happen
on VMs, so the output is missing a few sections.

>Also, can somebody confirm if the same Windows version works as
>expected on bare metal?

Yes, the same Windows version works as expected on bare metal, in my case
showing 8 cores and 16 threads/logical processors.
I'm trying to use 4 cores with 8 threads in the VMs, but both Windows and
Ubuntu show 8 physical cores.

** Attachment added: "ubuntu-guest-smt-ryzen.zip"
   
https://bugs.launchpad.net/qemu/+bug/1703506/+attachment/4917874/+files/ubuntu-guest-smt-ryzen.zip

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1703506

Title:
  SMT not supported by QEMU on AMD Ryzen CPU

Status in QEMU:
  New

Bug description:
  HyperThreading/SMT is supported by AMD Ryzen CPUs but results in this
  message when setting the topology to threads=2:

  qemu-system-x86_64: AMD CPU doesn't support hyperthreading. Please
  configure -smp options properly.

  Checking in a Windows 10 guest reveals that SMT is not enabled, and
  from what I understand, QEMU converts the topology from threads to
  cores internally on AMD CPUs. This appears to cause performance
  problems in the guest perhaps because programs are assuming that these
  threads are actual cores.

  Software: Linux 4.12, qemu 2.9.0 host with KVM enabled, Windows 10 pro
  guest

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1703506/+subscriptions



Re: [Qemu-devel] [PATCH v2 3/3] qemu.py: make 'args' public

2017-07-19 Thread Fam Zheng
On Thu, 07/20 10:38, Fam Zheng wrote:
> On Wed, 07/19 18:31, Amador Pahim wrote:
> > Let's make args public so users can extend it without felling like
> > abusing the internal API.
> 
> s/felling/feeling/ ?

Apart from that:

Reviewed-by: Fam Zheng 



[Qemu-devel] Reply: Re: [PATCH] vhost: fix a migration failed because of vhost region merge

2017-07-19 Thread peng.hao2
Original Message

From: 
To: 
Cc: Peng Hao 10096742, Wang Yechao 10154425
Date: 2017-07-19 23:53
Subject: Re: [Qemu-devel] [PATCH] vhost: fix a migration failed because of vhost
region merge

On Wed, Jul 19, 2017 at 03:24:27PM +0200, Igor Mammedov wrote:
> On Wed, 19 Jul 2017 12:46:13 +0100
> "Dr. David Alan Gilbert"  wrote:
> 
> > * Igor Mammedov (imamm...@redhat.com) wrote:
> > > On Wed, 19 Jul 2017 23:17:32 +0800
> > > Peng Hao  wrote:
> > >   
> > > > When a guest that has several hotplugged dimms is migrated, in
> > > > destination host it will fail to resume. Because vhost regions of
> > > > several dimms in source host are merged and in the restore stage
> > > > in destination host it computes whether more than vhost slot limit
> > > > before merging vhost regions of several dimms.  
> > > could you provide a bit more detailed description of the problem
> > > including command line+used device_add commands on source and
> > > command line on destination?  
> > 
> > (ccing in Marc Andre and Maxime)
> > 
> > Hmm, I'd like to understand the situation where you get merging between
> > RAMBlocks that complicates some stuff for postcopy.
> and probably inconsistent merging breaks vhost as well
> 
> merging might happen if regions are adjacent or overlap
> but for that to happen merged regions must have equal
> distance between their GPA:HVA pairs, so that following
> translation would work:
> 
> if gva in regionX[gva_start, len, hva_start]
>hva = hva_start + gva - gva_start
> 
> while GVA of regions is under QEMU control and deterministic
> HVA is not, so in migration case merging might happen on source
> side but not on destination, resulting in different memory maps.
> 
> Maybe Michael might know details why migration works in vhost usecase,
> but I don't see vhost sending any vmstate data.

We aren't merging ramblocks at all.
When we are passing blocks A and B to vhost, if we see that

hvaB=hvaA + lenA
gpaB=gpaA + lenA

then we can improve performance a bit by passing a single
chunk to vhost: hvaA, gpaA, lenA+lenB

so it does not affect migration normally.
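
The condition above can be written down as a small sketch. struct region,
regions_mergeable() and merge_regions() are hypothetical names for
illustration, not the vhost code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory region: guest physical address, host virtual address. */
struct region {
    uint64_t gpa;
    uint64_t hva;
    uint64_t len;
};

/* B can be folded into A only if it is contiguous in both address spaces. */
static bool regions_mergeable(const struct region *a, const struct region *b)
{
    return b->hva == a->hva + a->len && b->gpa == a->gpa + a->len;
}

/* Build the single chunk (hvaA, gpaA, lenA + lenB). */
static struct region merge_regions(const struct region *a,
                                   const struct region *b)
{
    assert(regions_mergeable(a, b));
    struct region merged = { a->gpa, a->hva, a->len + b->len };
    return merged;
}
```

Two regions that are adjacent in guest physical space but not in host virtual
space fail the check and stay separate, which is why the outcome can depend on
where the host happened to map each block.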

- I think it is like this:

in source            in destination (restore)
realize device 1     realize device 1
realize device 2     realize dimm0
...                  realize dimm1
realize device n     realize dimmx
                     realize device m
realize dimm0        .
realize dimm1        .
...                  .
realize dimmx        realize device n

In the restore stage, the order in which devices are realized differs from the
order when the VM was started, because the dimms were added by hotplug. So at
some stage during the restore the vhost regions may not be mergeable yet.

> 
> > 
> > > > 
> > > > Signed-off-by: Peng Hao 
> > > > Signed-off-by: Wang Yechao 
> > > > ---
> > > >  hw/mem/pc-dimm.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> > > > index ea67b46..bb0fa08 100644
> > > > --- a/hw/mem/pc-dimm.c
> > > > +++ b/hw/mem/pc-dimm.c
> > > > @@ -101,7 +101,7 @@ void pc_dimm_memory_plug(DeviceState *dev, 
> > > > MemoryHotplugState *hpms,
> > > >  goto out;
> > > >  }
> > > >  
> > > > -if (!vhost_has_free_slot()) {
> > > > +if (!vhost_has_free_slot() && runstate_is_running()) {
> > > >  error_setg(&local_err, "a used vhost backend has no free"
> > > > " memory slots left");
> > > >  goto out;
> > 
> > Even this produces the wrong error message in this case,
> > it also makes me think if the existing code should undo a lot of
> > the object_property_set's that happen.
> > 
> > Dave
> > > 
> > >   
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 2/3] qemu.py: include debug information on launch error

2017-07-19 Thread Fam Zheng
On Wed, 07/19 18:31, Amador Pahim wrote:
> When launching a VM, if an exception happens and the VM is not
> initiated, it is useful to see the qemu command line that was executed
> and the output of that command.
> 
> Before the patch:
> 
> >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
> >>> VM.launch()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "qemu.py", line 137, in launch
> self._post_launch()
>   File "qemu.py", line 121, in _post_launch
> self._qmp.accept()
>   File "qmp/qmp.py", line 145, in accept
> self.__sock, _ = self.__sock.accept()
>   File "/usr/lib64/python2.7/socket.py", line 206, in accept
> sock, addr = self._sock.accept()
> socket.timeout: timed out
> 
> After the patch:
> 
> >>> VM = qemu.QEMUMachine('../aarch64-softmmu/qemu-system-aarch64')
> >>> VM.launch()
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "qemu.py", line 156, in launch
> raise RuntimeError(msg)
> RuntimeError: Error launching VM.
> Original Exception:
> Traceback (most recent call last):
>   File "qemu.py", line 138, in launch
> self._post_launch()
>   File "qemu.py", line 122, in _post_launch
> self._qmp.accept()
>   File "qmp/qmp.py", line 145, in accept
> self.__sock, _ = self.__sock.accept()
>   File "/usr/lib64/python2.7/socket.py", line 206, in accept
> sock, addr = self._sock.accept()
> timeout: timed out
> Command:
> /usr/bin/qemu-system-aarch64 -chardev socket,id=mon,
> path=/var/tmp/qemu-23958-monitor.sock -mon chardev=mon,mode=control
> -display none -vga none
> Output:
> qemu-system-aarch64: No machine specified, and there is no default
> Use -machine help to list supported machines
> 
> Also, if the launch() faces an exception, the 'except' now will use args
> to fill the debug information. So this patch assigns 'args' earlier,
> assuring it will be available for the 'except'.
> 
> Signed-off-by: Amador Pahim 
> ---
>  scripts/qemu.py | 18 --
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/scripts/qemu.py b/scripts/qemu.py
> index f0fade32bd..2707ae7f75 100644
> --- a/scripts/qemu.py
> +++ b/scripts/qemu.py
> @@ -18,6 +18,7 @@ import os
>  import sys
>  import subprocess
>  import qmp.qmp
> +import traceback
>  
>  
>  class QEMUMachine(object):
> @@ -129,17 +130,30 @@ class QEMUMachine(object):
>  '''Launch the VM and establish a QMP connection'''
>  devnull = open('/dev/null', 'rb')
>  qemulog = open(self._qemu_log_path, 'wb')
> +args = self._wrapper + [self._binary] + self._base_args() + self.args
>  try:
>  self._pre_launch()
> -args = self._wrapper + [self._binary] + self._base_args() + 
> self._args
>  self._popen = subprocess.Popen(args, stdin=devnull, 
> stdout=qemulog,
> stderr=subprocess.STDOUT, 
> shell=False)
>  self._post_launch()
>  except:
> +self._load_io_log()
>  if self.is_running():
>  self._popen.kill()
>  self._popen.wait()
> -self._load_io_log()
> +else:
> +exc_type, exc_value, exc_traceback = sys.exc_info()
> +msg = ('Error launching VM.\n'
> +   'Original Exception: \n%s'
> +   'Command:\n%s\n'
> +   'Output:\n%s\n' %
> +   (''.join(traceback.format_exception(exc_type,
> +   exc_value,
> +   exc_traceback)),
> +' '.join(args),
> +self._iolog))
> +self._post_shutdown()
> +raise RuntimeError(msg)
>  self._post_shutdown()
>  raise
>  
> -- 
> 2.13.3
> 
> 

Reviewed-by: Fam Zheng 

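For readers who want to try the message format outside of qemu.py, here is a minimal standalone sketch of the same idea. format_launch_error() and launch() are hypothetical helpers written for this note; they are not part of scripts/qemu.py.

```python
import sys
import traceback


def format_launch_error(args, output):
    """Build a debug message from the exception currently being handled."""
    exc_type, exc_value, exc_tb = sys.exc_info()
    return ('Error launching VM.\n'
            'Original Exception:\n%s'
            'Command:\n%s\n'
            'Output:\n%s\n' %
            (''.join(traceback.format_exception(exc_type, exc_value, exc_tb)),
             ' '.join(args),
             output))


def launch(args, start):
    """Call start(args); on failure re-raise with command line and output."""
    try:
        start(args)
    except Exception:
        # 'no output captured' stands in for the log that qemu.py loads
        raise RuntimeError(format_launch_error(args, 'no output captured'))
```

The point of building 'args' before the try block is visible here too: the handler can only print the command line if it was assigned before the failure.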


[Qemu-devel] Can I mount encrypt qcow2?

2017-07-19 Thread 陳培泓
Can I mount an encrypted qcow2 file through qemu-nbd?

I tried but it failed, and there is nothing about this in the man page.


Re: [Qemu-devel] [PATCH v2 1/3] qemu.py: fix is_running()

2017-07-19 Thread Fam Zheng
On Wed, 07/19 18:31, Amador Pahim wrote:
> The current implementation is broken. It does not really test whether the
> child process is running.
> 
> Popen.returncode is only set by a poll(), wait() or communicate() call.
> If Popen fails to launch a VM, Popen.returncode will not change from
> None by itself.
> 
> Instead of using Popen.returncode, let's use Popen.poll(), which
> actually checks whether the child process has terminated.
> 
> Signed-off-by: Amador Pahim 
> ---
>  scripts/qemu.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/qemu.py b/scripts/qemu.py
> index 880e3e8219..f0fade32bd 100644
> --- a/scripts/qemu.py
> +++ b/scripts/qemu.py
> @@ -86,7 +86,7 @@ class QEMUMachine(object):
>  raise
>  
>  def is_running(self):
> -return self._popen and (self._popen.returncode is None)
> +return self._popen and (self._popen.poll() is None)
>  
>  def exitcode(self):
>  if self._popen is None:
> -- 
> 2.13.3
> 
> 

Reviewed-by: Fam Zheng 

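The behaviour the commit message describes is easy to reproduce with plain subprocess. A small sketch, where is_running() and demo() are standalone illustrations rather than the QEMUMachine methods:

```python
import subprocess
import sys


def is_running(popen):
    """True while the child has not terminated; poll() refreshes the status."""
    return popen is not None and popen.poll() is None


def demo():
    # The child exits almost immediately with status 3.
    proc = subprocess.Popen([sys.executable, '-c', 'raise SystemExit(3)'])
    stale = proc.returncode      # still None: nothing has reaped the child
    proc.wait()                  # wait() (like poll()) updates returncode
    return stale, proc.returncode, is_running(proc)
```

Reading the returncode attribute alone never asks the OS about the child, which is why the old check could report a dead VM as running.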


Re: [Qemu-devel] [PATCH v2 3/3] qemu.py: make 'args' public

2017-07-19 Thread Fam Zheng
On Wed, 07/19 18:31, Amador Pahim wrote:
> Let's make args public so users can extend it without felling like
> abusing the internal API.

s/felling/feeling/ ?

Fam

> 
> Signed-off-by: Amador Pahim 
> ---
>  scripts/qemu.py   | 13 +++--
>  tests/qemu-iotests/iotests.py | 18 +-
>  2 files changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/scripts/qemu.py b/scripts/qemu.py
> index 2707ae7f75..2c2043f89a 100644
> --- a/scripts/qemu.py
> +++ b/scripts/qemu.py
> @@ -34,7 +34,7 @@ class QEMUMachine(object):
>  self._qemu_log_path = os.path.join(test_dir, name + ".log")
>  self._popen = None
>  self._binary = binary
> -self._args = list(args) # Force copy args in case we modify them
> +self.args = list(args) # Force copy args in case we modify them
>  self._wrapper = wrapper
>  self._events = []
>  self._iolog = None
> @@ -44,8 +44,8 @@ class QEMUMachine(object):
>  # This can be used to add an unused monitor instance.
>  def add_monitor_telnet(self, ip, port):
>  args = 'tcp:%s:%d,server,nowait,telnet' % (ip, port)
> -self._args.append('-monitor')
> -self._args.append(args)
> +self.args.append('-monitor')
> +self.args.append(args)
>  
>  def add_fd(self, fd, fdset, opaque, opts=''):
>  '''Pass a file descriptor to the VM'''
> @@ -55,8 +55,8 @@ class QEMUMachine(object):
>  if opts:
>  options.append(opts)
>  
> -self._args.append('-add-fd')
> -self._args.append(','.join(options))
> +self.args.append('-add-fd')
> +self.args.append(','.join(options))
>  return self
>  
>  def send_fd_scm(self, fd_file_path):
> @@ -168,7 +168,8 @@ class QEMUMachine(object):
>  
>  exitcode = self._popen.wait()
>  if exitcode < 0:
> -sys.stderr.write('qemu received signal %i: %s\n' % 
> (-exitcode, ' '.join(self._args)))
> +sys.stderr.write('qemu received signal %i: %s\n' %
> + (-exitcode, ' '.join(self.args)))
>  self._load_io_log()
>  self._post_shutdown()
>  
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index abcf3c10e2..6925d8841e 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -150,13 +150,13 @@ class VM(qtest.QEMUQtestMachine):
>  self._num_drives = 0
>  
>  def add_device(self, opts):
> -self._args.append('-device')
> -self._args.append(opts)
> +self.args.append('-device')
> +self.args.append(opts)
>  return self
>  
>  def add_drive_raw(self, opts):
> -self._args.append('-drive')
> -self._args.append(opts)
> +self.args.append('-drive')
> +self.args.append(opts)
>  return self
>  
>  def add_drive(self, path, opts='', interface='virtio', format=imgfmt):
> @@ -172,17 +172,17 @@ class VM(qtest.QEMUQtestMachine):
>  if opts:
>  options.append(opts)
>  
> -self._args.append('-drive')
> -self._args.append(','.join(options))
> +self.args.append('-drive')
> +self.args.append(','.join(options))
>  self._num_drives += 1
>  return self
>  
>  def add_blockdev(self, opts):
> -self._args.append('-blockdev')
> +self.args.append('-blockdev')
>  if isinstance(opts, str):
> -self._args.append(opts)
> +self.args.append(opts)
>  else:
> -self._args.append(','.join(opts))
> +self.args.append(','.join(opts))
>  return self
>  
>  def pause_drive(self, drive, event=None):
> -- 
> 2.13.3
> 
> 



[Qemu-devel] Why "trace event does not exist"?

2017-07-19 Thread Sam
Hi all,

I want to add a new trace event and log it, so I added this to
$QEMU/trace-events:

io_mem_init(void) ""

After configure, $QEMU/build/trace-events-all also contains it.
Then I added this call to $QEMU/exec.c:

trace_io_mem_init();

Then I ran `make` and `make install`.

But when I run
`qemu-system-x86_64 -D /qemu.log -trace events=/qemu-events ...`
it warns me that:

qemu-system-x86_64:/qemu-events:1: WARNING: trace event 'io_mem_init' does not exist

and there is no io_mem_init log output. Why is that, and what should I do to
add this log?
Thank you~


Re: [Qemu-devel] [PATCH v6] qga: Add support network interface statistics in guest-network-get-interfaces command

2017-07-19 Thread no-reply
Hi,

This series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 1500512858-29428-1-git-send-email-lu.zhip...@zte.com.cn
Subject: [Qemu-devel] [PATCH v6] qga: Add support network interface statistics 
in guest-network-get-interfaces command
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-quick@centos6
time make docker-test-build@min-glib
time make docker-test-mingw@fedora
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] 
patchew/1500512858-29428-1-git-send-email-lu.zhip...@zte.com.cn -> 
patchew/1500512858-29428-1-git-send-email-lu.zhip...@zte.com.cn
Switched to a new branch 'test'
b2303df qga: Add support network interface statistics in 
guest-network-get-interfaces command

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-jfmm8_pa/src/dtc'...
Submodule path 'dtc': checked out '558cd81bdd432769b59bff01240c44f82cfb1a9d'
  BUILD   centos6
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-jfmm8_pa/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY    RUNNER
    RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
bison-2.4.1-5.el6.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
flex-2.5.35-9.el6.x86_64
gcc-4.4.7-18.el6.x86_64
git-1.7.1-8.el6.x86_64
glib2-devel-2.28.8-9.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ flex bison zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=90eccbe3ed9b
TERM=xterm
MAKEFLAGS= -j8
HISTSIZE=1000
J=8
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /var/tmp/qemu-build/install
BIOS directory    /var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler      
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS       -I/usr/include/pixman-1   -I$(SRC_PATH)/dtc/libfdt -pthread 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 
-D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels 
-Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no 
GTK GL support    no
VTE support       no 
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no 
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support  

[Qemu-devel] [PATCH v6] qga: Add support network interface statistics in guest-network-get-interfaces command

2017-07-19 Thread ZhiPeng Lu
We can get the network interface statistics inside a virtual machine with the
guest-network-get-interfaces command. This is very useful for monitoring
and analyzing network traffic.

Signed-off-by: ZhiPeng Lu 

v1->v2:
 - correct some spelling mistakes and add the stats data to the
   guest-network-get-interfaces command instead of adding a new command.
v2->v3:
 - optimize function implementation
v3->v4:
 - fix compile error
v4->v5:
 - rename some temporary variables and add str_trim_off function for
   calculating the number of spaces in front of the string in
   guest_get_network_stats
v5->v6:
 - use g_strchug instead of my own str_trim_off implementation
---
 qga/commands-posix.c | 72 +++-
 qga/qapi-schema.json | 38 ++-
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index d8e4122..b65dd8e 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1639,6 +1639,65 @@ guest_find_interface(GuestNetworkInterfaceList *head,
 return head;
 }
 
+static int guest_get_network_stats(const char *name,
+   GuestNetworkInterfaceStat *stats)
+{
+int name_len;
+char const *devinfo = "/proc/net/dev";
+FILE *fp;
+char *line = NULL, *colon;
+size_t n;
+fp = fopen(devinfo, "r");
+if (!fp) {
+return -1;
+}
+name_len = strlen(name);
+while (getline(&line, &n, fp) != -1) {
+long long dummy;
+long long rx_bytes;
+long long rx_packets;
+long long rx_errs;
+long long rx_dropped;
+long long tx_bytes;
+long long tx_packets;
+long long tx_errs;
+long long tx_dropped;
+char *trim_line;
+trim_line = g_strchug(line);
+if (trim_line[0] == '\0') {
+continue;
+}
+colon = strchr(trim_line, ':');
+if (!colon) {
+continue;
+}
+if (colon - name_len  == trim_line &&
+   strncmp(trim_line, name, name_len) == 0) {
+if (sscanf(colon + 1,
+"%lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld %lld",
+  &rx_bytes, &rx_packets, &rx_errs, &rx_dropped,
+  &dummy, &dummy, &dummy, &dummy,
+  &tx_bytes, &tx_packets, &tx_errs, &tx_dropped,
+  &dummy, &dummy, &dummy, &dummy) != 16) {
+continue;
+}
+stats->rx_bytes = rx_bytes;
+stats->rx_packets = rx_packets;
+stats->rx_errs = rx_errs;
+stats->rx_dropped = rx_dropped;
+stats->tx_bytes = tx_bytes;
+stats->tx_packets = tx_packets;
+stats->tx_errs = tx_errs;
+stats->tx_dropped = tx_dropped;
+fclose(fp);
+return 0;
+}
+}
+fclose(fp);
+g_debug("/proc/net/dev: Interface not found");
+return -1;
+}
+
 /*
  * Build information about guest interfaces
  */
@@ -1655,6 +1714,7 @@ GuestNetworkInterfaceList 
*qmp_guest_network_get_interfaces(Error **errp)
 for (ifa = ifap; ifa; ifa = ifa->ifa_next) {
 GuestNetworkInterfaceList *info;
 GuestIpAddressList **address_list = NULL, *address_item = NULL;
+GuestNetworkInterfaceStat  *interface_stat = NULL;
 char addr4[INET_ADDRSTRLEN];
 char addr6[INET6_ADDRSTRLEN];
 int sock;
@@ -1774,7 +1834,17 @@ GuestNetworkInterfaceList 
*qmp_guest_network_get_interfaces(Error **errp)
 
 info->value->has_ip_addresses = true;
 
-
+if (!info->value->has_statistics) {
+interface_stat = g_malloc0(sizeof(*interface_stat));
+if (guest_get_network_stats(info->value->name,
+interface_stat) == -1) {
+info->value->has_statistics = false;
+g_free(interface_stat);
+} else {
+info->value->statistics = interface_stat;
+info->value->has_statistics = true;
+}
+}
 }
 
 freeifaddrs(ifap);
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 03743ab..4ad5c57 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -643,6 +643,38 @@
'prefix': 'int'} }
 
 ##
+# @GuestNetworkInterfaceStat:
+#
+# @rx-bytes: total bytes received
+#
+# @rx-packets: total packets received
+#
+# @rx-errs: bad packets received
+#
+# @rx-dropped: receiver dropped packets
+#
+# @tx-bytes: total bytes transmitted
+#
+# @tx-packets: total packets transmitted
+#
+# @tx-errs: packet transmit problems
+#
+# @tx-dropped: dropped packets transmitted
+#
+# Since: 2.11
+##
+{ 'struct': 'GuestNetworkInterfaceStat',
+  'data': {'rx-bytes': 'uint64',
+'rx-packets': 'uint64',
+'rx-errs': 'uint64',
+'rx-dropped': 'uint64',
+'tx-bytes': 'uint64',
+'tx-packets': 'uint64',
+'tx-errs': 'uint64',
+'tx-dropped': 'uint64'
+   } }
+
+##
 # 

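As a quick way to check the field layout that guest_get_network_stats() above relies on, here is a rough Python equivalent of the per-line parsing. parse_net_dev_line() is a hypothetical helper for illustration; it assumes the usual /proc/net/dev layout of 8 receive counters followed by 8 transmit counters after the colon.

```python
def parse_net_dev_line(line, name):
    """Return the stats dict for interface 'name', or None if no match."""
    dev, sep, rest = line.lstrip().partition(':')
    if not sep or dev != name:
        return None
    fields = rest.split()
    if len(fields) < 16:
        return None
    values = [int(f) for f in fields[:16]]
    # Columns 4-7 and 12-15 are the counters the patch reads into 'dummy'.
    return {
        'rx-bytes': values[0],
        'rx-packets': values[1],
        'rx-errs': values[2],
        'rx-dropped': values[3],
        'tx-bytes': values[8],
        'tx-packets': values[9],
        'tx-errs': values[10],
        'tx-dropped': values[11],
    }
```

Comparing the whole name before the colon (rather than a prefix) keeps eth1 from matching a line for eth10, which is what the colon position check in the C code is for.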
Re: [Qemu-devel] >256 Virtio-net-pci hotplug Devices

2017-07-19 Thread Kinsella, Ray

Hi Marcel,


> You can use multi-function PCIe Root Ports, this will give you 8 ports
> per slot, if you have 16 empty slots (I think we have more) you reach
> 128 root ports.
> Then you can use multi-function virtio-net-pci devices, this will
> give you 8 functions per port, so you reach the target of 1024 devices.
> 
> You lose hot-plug granularity since you can only hot-plug groups of 8
> functions, but maybe that is OK, depending on your scenario.

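The arithmetic in the quoted suggestion, as a tiny sketch (device_budget() is a made-up helper, and 16 empty slots is the figure assumed in the quote):

```python
FUNCTIONS_PER_SLOT = 8  # a PCI(e) slot exposes functions 0-7


def device_budget(empty_slots):
    """Return (root ports, virtio-net-pci functions) for the given slots."""
    root_ports = empty_slots * FUNCTIONS_PER_SLOT
    devices = root_ports * FUNCTIONS_PER_SLOT
    return root_ports, devices
```

With 16 slots this gives 128 root ports and 1024 functions, hot-pluggable in groups of 8.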

Thanks for the advice; losing the hotplug granularity is something I 
think I can live with. It would mean I would have to track how many 
ports are allocated to a VM, and create 8 new ports when 1 is required, 
caching the other 7 for when they are needed.



> Even so, you can use one cold-plugged pxb-pcie if you don't
> have enough empty slots on pcie.0, in order to reach the maximum
> number of PCIe Root Ports (256) which is the maximum for a single
> PCI domain.


I took your advice, see the attached cfg; it works exactly as you 
indicated. If you are interested, you can use it for your VM by adding 
-readconfig to your qemu command line. I can currently only manage to start 
a VM with around 50 coldplugged virtio devices before something breaks. 
I am not sure what yet; I will try scaling it with hotplugging tomorrow.




> If you need granularity per single device (1000+ hot-pluggable),
> you could enhance the pxb-pcie to support multiple pci domains.


Do you think there would be much work in this?

Thanks,

Ray K


test.cfg.gz
Description: application/gzip


[Qemu-devel] [PULL v2 13/14] tcg/tci: enable bswap16_i64

2017-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Although correctly implemented, bswap16_i64() never got tested/executed so the
safety TODO() statement was never removed.

Now that it has been tested, the TODO() can be removed.

The TODO() was hit while running Alex Bennée's image
aarch64-linux-3.15rc2-buildroot.img:

Trace 0x7fa1904b0890 [0: ffc00036cd04]

IN:
0xffc00036cd24:  5ac00694  rev16 w20, w20

OP:
  ffc00036cd24  
 ext32u_i64 tmp3,x20
 ext16u_i64 tmp2,tmp3
 bswap16_i64 x20,tmp2
 movi_i64 tmp4,$0x10
 shr_i64 tmp2,tmp3,tmp4
 ext16u_i64 tmp2,tmp2
 bswap16_i64 tmp2,tmp2
 deposit_i64 x20,x20,tmp2,$0x10,$0x10

Linking TBs 0x7fa1904b0890 [ffc00036cd04] index 0 -> 0x7fa1904b0aa0 
[ffc00036cd24]
Trace 0x7fa1904b0aa0 [0: ffc00036cd24]
TODO qemu/tci.c:1049: tcg_qemu_tb_exec()
qemu/tci.c:1049: tcg fatal error
Aborted

Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Jaroslaw Pelczar 
Reviewed-by: Alex Bennée 
Reviewed-by: Eric Blake 
Reviewed-by: Stefan Weil 
Message-Id: <20170718045540.16322-11-f4...@amsat.org>
Signed-off-by: Richard Henderson 
---
 tcg/tci.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tcg/tci.c b/tcg/tci.c
index 4bdc645f2a..f39bfb95c0 100644
--- a/tcg/tci.c
+++ b/tcg/tci.c
@@ -1046,7 +1046,6 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t 
*tb_ptr)
 break;
 #if TCG_TARGET_HAS_bswap16_i64
 case INDEX_op_bswap16_i64:
-TODO();
 t0 = *tb_ptr++;
 t1 = tci_read_r16(&tb_ptr);
 tci_write_reg64(t0, bswap16(t1));
-- 
2.13.3




[Qemu-devel] [PULL v2 11/14] target/sparc: optimize gen_op_mulscc() using deposit op

2017-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Suggested-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20170718045540.16322-9-f4...@amsat.org>
Signed-off-by: Richard Henderson 
---
 target/sparc/translate.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 67a83b77cc..d13173275f 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -632,11 +632,8 @@ static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv 
src2)
 
 // b2 = T0 & 1;
 // env->y = (b2 << 31) | (env->y >> 1);
-tcg_gen_andi_tl(r_temp, cpu_cc_src, 0x1);
-tcg_gen_shli_tl(r_temp, r_temp, 31);
 tcg_gen_extract_tl(t0, cpu_y, 1, 31);
-tcg_gen_or_tl(t0, t0, r_temp);
-tcg_gen_andi_tl(cpu_y, t0, 0xffffffff);
+tcg_gen_deposit_tl(cpu_y, t0, cpu_cc_src, 31, 1);
 
 // b1 = N ^ V;
 gen_mov_reg_N(t0, cpu_psr);
-- 
2.13.3

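A quick way to convince oneself that the deposit form above computes the same Y value is to model the ops on plain integers. The helpers below are simplified Python models written for this note, not TCG code, and the old sequence assumes the final mask in the removed lines is 0xffffffff.

```python
MASK64 = (1 << 64) - 1


def extract(value, pos, length):
    """Zero-extended bitfield value[pos : pos + length]."""
    return (value >> pos) & ((1 << length) - 1)


def deposit(base, field, pos, length):
    """Replace base[pos : pos + length] with the low bits of field."""
    mask = ((1 << length) - 1) << pos
    return ((base & ~mask) | ((field << pos) & mask)) & MASK64


def mulscc_y_old(y, cc_src):
    # and/shift/or sequence removed by the patch
    r_temp = (cc_src & 0x1) << 31
    t0 = extract(y, 1, 31)
    return (t0 | r_temp) & 0xffffffff


def mulscc_y_new(y, cc_src):
    # env->y = (b2 << 31) | (env->y >> 1), done with a single deposit
    t0 = extract(y, 1, 31)
    return deposit(t0, cc_src, 31, 1)
```

Since the extract already limits t0 to 31 bits, depositing one bit at position 31 cannot produce anything above bit 31, so the trailing 32-bit mask becomes redundant.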



[Qemu-devel] [PULL v2 14/14] tcg: Pass generic CPUState to gen_intermediate_code()

2017-07-19 Thread Richard Henderson
From: Lluís Vilanova 

Needed to implement a target-agnostic gen_intermediate_code()
in the future.

Reviewed-by: David Gibson 
Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Reviewed-by: Emilio G. Cota 
Signed-off-by: Lluís Vilanova 
Message-Id: <150002025498.22386.18051908483085660588.st...@frigg.lan>
Signed-off-by: Richard Henderson 
---
 include/exec/exec-all.h   | 2 +-
 target/arm/translate.h| 4 ++--
 accel/tcg/translate-all.c | 2 +-
 target/alpha/translate.c  | 5 ++---
 target/arm/translate-a64.c| 6 +++---
 target/arm/translate.c| 6 +++---
 target/cris/translate.c   | 7 +++
 target/hppa/translate.c   | 5 ++---
 target/i386/translate.c   | 5 ++---
 target/lm32/translate.c   | 4 ++--
 target/m68k/translate.c   | 5 ++---
 target/microblaze/translate.c | 4 ++--
 target/mips/translate.c   | 5 ++---
 target/moxie/translate.c  | 4 ++--
 target/nios2/translate.c  | 5 ++---
 target/openrisc/translate.c   | 4 ++--
 target/ppc/translate.c| 5 ++---
 target/s390x/translate.c  | 5 ++---
 target/sh4/translate.c| 5 ++---
 target/sparc/translate.c  | 5 ++---
 target/tilegx/translate.c | 5 ++---
 target/tricore/translate.c| 5 ++---
 target/unicore32/translate.c  | 5 ++---
 target/xtensa/translate.c | 5 ++---
 24 files changed, 49 insertions(+), 64 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 87b1b74e3b..440fc31b37 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -66,7 +66,7 @@ typedef ram_addr_t tb_page_addr_t;
 
 #include "qemu/log.h"
 
-void gen_intermediate_code(CPUArchState *env, struct TranslationBlock *tb);
+void gen_intermediate_code(CPUState *cpu, struct TranslationBlock *tb);
 void restore_state_to_opc(CPUArchState *env, struct TranslationBlock *tb,
   target_ulong *data);
 
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 12fd79ba8e..2fe144baa9 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -149,7 +149,7 @@ static void disas_set_insn_syndrome(DisasContext *s, 
uint32_t syn)
 
 #ifdef TARGET_AARCH64
 void a64_translate_init(void);
-void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb);
+void gen_intermediate_code_a64(CPUState *cpu, TranslationBlock *tb);
 void gen_a64_set_pc_im(uint64_t val);
 void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
 fprintf_function cpu_fprintf, int flags);
@@ -158,7 +158,7 @@ static inline void a64_translate_init(void)
 {
 }
 
-static inline void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb)
+static inline void gen_intermediate_code_a64(CPUState *cpu, TranslationBlock 
*tb)
 {
 }
 
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 090ebad0a7..37ecafa931 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1280,7 +1280,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 tcg_func_start(&tcg_ctx);
 
 tcg_ctx.cpu = ENV_GET_CPU(env);
-gen_intermediate_code(env, tb);
+gen_intermediate_code(cpu, tb);
 tcg_ctx.cpu = NULL;
 
 trace_translate_block(tb, tb->pc, tb->tc_ptr);
diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 744d8bbf12..f465752208 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -2952,10 +2952,9 @@ static ExitStatus translate_one(DisasContext *ctx, 
uint32_t insn)
 return ret;
 }
 
-void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
+void gen_intermediate_code(CPUState *cs, struct TranslationBlock *tb)
 {
-AlphaCPU *cpu = alpha_env_get_cpu(env);
-CPUState *cs = CPU(cpu);
+CPUAlphaState *env = cs->env_ptr;
 DisasContext ctx, *ctxp = &ctx;
 target_ulong pc_start;
 target_ulong pc_mask;
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 5bb0f8ef22..883e9df0c2 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -11179,10 +11179,10 @@ static void disas_a64_insn(CPUARMState *env, 
DisasContext *s)
 free_tmp_a64(s);
 }
 
-void gen_intermediate_code_a64(ARMCPU *cpu, TranslationBlock *tb)
+void gen_intermediate_code_a64(CPUState *cs, TranslationBlock *tb)
 {
-CPUState *cs = CPU(cpu);
-CPUARMState *env = &cpu->env;
+CPUARMState *env = cs->env_ptr;
+ARMCPU *cpu = arm_env_get_cpu(env);
 DisasContext dc1, *dc = &dc1;
 target_ulong pc_start;
 target_ulong next_page_start;
diff --git a/target/arm/translate.c b/target/arm/translate.c
index d3003ae0d8..d1a5f56998 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -11795,10 +11795,10 @@ static bool insn_crosses_page(CPUARMState *env, 
DisasContext *s)
 }
 
 /* generate intermediate code for basic block 'tb'.  */
-void gen_intermediate_code(CPUARMState *env, TranslationBlock 

[Qemu-devel] [PULL v2 06/14] target/arm: Optimize aarch64 rev16

2017-07-19 Thread Richard Henderson
It is much shorter to reverse all 4 half-words in parallel
than extract, reverse, and deposit each in turn.

Suggested-by: Aurelien Jarno 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 24 ++--
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 3fa39023ca..5bb0f8ef22 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4043,25 +4043,13 @@ static void handle_rev16(DisasContext *s, unsigned int 
sf,
 TCGv_i64 tcg_rd = cpu_reg(s, rd);
 TCGv_i64 tcg_tmp = tcg_temp_new_i64();
 TCGv_i64 tcg_rn = read_cpu_reg(s, rn, sf);
+TCGv_i64 mask = tcg_const_i64(sf ? 0x00ff00ff00ff00ffull : 0x00ff00ff);
 
-tcg_gen_andi_i64(tcg_tmp, tcg_rn, 0xffff);
-tcg_gen_bswap16_i64(tcg_rd, tcg_tmp);
-
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 16);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 16, 16);
-
-if (sf) {
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 32);
-tcg_gen_andi_i64(tcg_tmp, tcg_tmp, 0xffff);
-tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 32, 16);
-
-tcg_gen_shri_i64(tcg_tmp, tcg_rn, 48);
-tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
-tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_tmp, 48, 16);
-}
+tcg_gen_shri_i64(tcg_tmp, tcg_rn, 8);
+tcg_gen_and_i64(tcg_rd, tcg_rn, mask);
+tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
+tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
+tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
 
 tcg_temp_free_i64(tcg_tmp);
 }
-- 
2.13.3

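The parallel form in the patch above can be checked against the per-halfword form with a small integer model. These are illustrative Python functions, not TCG; for sf=0 the input is assumed to be already zero-extended to 32 bits, as read_cpu_reg() would provide.

```python
def bswap16(v):
    """Swap the two bytes of a 16-bit value."""
    return ((v & 0xff) << 8) | ((v >> 8) & 0xff)


def rev16_per_halfword(x, sf):
    """Old approach: extract, reverse and deposit each half-word in turn."""
    result = 0
    for shift in range(0, 64 if sf else 32, 16):
        result |= bswap16((x >> shift) & 0xffff) << shift
    return result


def rev16_parallel(x, sf):
    """New approach: move all high bytes down and all low bytes up at once."""
    mask = 0x00ff00ff00ff00ff if sf else 0x00ff00ff
    high_to_low = (x >> 8) & mask
    low_to_high = (x & mask) << 8
    return low_to_high | high_to_low
```

Five ops handle every half-word regardless of register width, where the old code needed four ops per additional half-word.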



[Qemu-devel] [PULL v2 10/14] target/sparc: optimize various functions using extract op

2017-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Done with the Coccinelle semantic patch
scripts/coccinelle/tcg_gen_extract.cocci.

Reviewed-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/sparc/translate.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index aa6734d54e..67a83b77cc 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -380,29 +380,25 @@ static inline void gen_goto_tb(DisasContext *s, int 
tb_num,
 static inline void gen_mov_reg_N(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_NEG_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_NEG_SHIFT, 1);
 }
 
 static inline void gen_mov_reg_Z(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_ZERO_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_ZERO_SHIFT, 1);
 }
 
 static inline void gen_mov_reg_V(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_OVF_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_OVF_SHIFT, 1);
 }
 
 static inline void gen_mov_reg_C(TCGv reg, TCGv_i32 src)
 {
 tcg_gen_extu_i32_tl(reg, src);
-tcg_gen_shri_tl(reg, reg, PSR_CARRY_SHIFT);
-tcg_gen_andi_tl(reg, reg, 0x1);
+tcg_gen_extract_tl(reg, reg, PSR_CARRY_SHIFT, 1);
 }
 
 static inline void gen_op_add_cc(TCGv dst, TCGv src1, TCGv src2)
@@ -638,8 +634,7 @@ static inline void gen_op_mulscc(TCGv dst, TCGv src1, TCGv 
src2)
 // env->y = (b2 << 31) | (env->y >> 1);
 tcg_gen_andi_tl(r_temp, cpu_cc_src, 0x1);
 tcg_gen_shli_tl(r_temp, r_temp, 31);
-tcg_gen_shri_tl(t0, cpu_y, 1);
-tcg_gen_andi_tl(t0, t0, 0x7fffffff);
+tcg_gen_extract_tl(t0, cpu_y, 1, 31);
 tcg_gen_or_tl(t0, t0, r_temp);
 tcg_gen_andi_tl(cpu_y, t0, 0xffffffff);
 
-- 
2.13.3




[Qemu-devel] [PULL v2 12/14] target/alpha: optimize gen_cvtlq() using deposit op

2017-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Suggested-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20170718045540.16322-10-f4...@amsat.org>
Signed-off-by: Richard Henderson 
---
 target/alpha/translate.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/target/alpha/translate.c b/target/alpha/translate.c
index 90e6d5285f..744d8bbf12 100644
--- a/target/alpha/translate.c
+++ b/target/alpha/translate.c
@@ -783,11 +783,9 @@ static void gen_cvtlq(TCGv vc, TCGv vb)
 
 /* The arithmetic right shift here, plus the sign-extended mask below
yields a sign-extended result without an explicit ext32s_i64.  */
-tcg_gen_sari_i64(tmp, vb, 32);
-tcg_gen_shri_i64(vc, vb, 29);
-tcg_gen_andi_i64(tmp, tmp, (int32_t)0xc0000000);
-tcg_gen_andi_i64(vc, vc, 0x3fffffff);
-tcg_gen_or_i64(vc, vc, tmp);
+tcg_gen_shri_i64(tmp, vb, 29);
+tcg_gen_sari_i64(vc, vb, 32);
+tcg_gen_deposit_i64(vc, vc, tmp, 0, 30);
 
 tcg_temp_free(tmp);
 }
-- 
2.13.3
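The equivalence of the two gen_cvtlq() sequences above can be checked with an integer model of the 64-bit ops. The functions are Python stand-ins written for this note; the old sequence assumes the two masks are the sign-extended 0xc0000000 and 0x3fffffff, which is what a 30-bit deposit at offset 0 implies.

```python
MASK64 = (1 << 64) - 1


def sari64(v, n):
    """Arithmetic right shift of a 64-bit value."""
    signed = v - (1 << 64) if v >> 63 else v
    return (signed >> n) & MASK64


def shri64(v, n):
    """Logical right shift of a 64-bit value."""
    return (v & MASK64) >> n


def deposit64(base, field, pos, length):
    """Replace base[pos : pos + length] with the low bits of field."""
    mask = ((1 << length) - 1) << pos
    return ((base & ~mask) | ((field << pos) & mask)) & MASK64


def cvtlq_old(vb):
    tmp = sari64(vb, 32)
    vc = shri64(vb, 29)
    tmp &= 0xffffffffc0000000   # (int32_t)0xc0000000, sign-extended
    vc &= 0x3fffffff
    return vc | tmp


def cvtlq_new(vb):
    tmp = shri64(vb, 29)
    vc = sari64(vb, 32)
    return deposit64(vc, tmp, 0, 30)
```

The deposit keeps everything above bit 29 from the arithmetic shift, so the sign extension still comes for free.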




[Qemu-devel] [PULL v2 09/14] target/ppc: optimize various functions using extract op

2017-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Done with the Coccinelle semantic patch
scripts/coccinelle/tcg_gen_extract.cocci.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Acked-by: David Gibson 
Message-Id: <20170718045540.16322-6-f4...@amsat.org>
Signed-off-by: Richard Henderson 
---
 target/ppc/translate.c  | 21 +++--
 target/ppc/translate/vsx-impl.inc.c | 24 
 2 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c0cd64d927..de271af52b 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -873,8 +873,7 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1,
 }
 tcg_gen_xor_tl(cpu_ca, t0, t1);/* bits changed w/ carry */
 tcg_temp_free(t1);
-tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);   /* extract bit 32 */
-tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
 if (is_isa300(ctx)) {
 tcg_gen_mov_tl(cpu_ca32, cpu_ca);
 }
@@ -1404,8 +1403,7 @@ static inline void gen_op_arith_subf(DisasContext *ctx, TCGv ret, TCGv arg1,
 tcg_temp_free(inv1);
 tcg_gen_xor_tl(cpu_ca, t0, t1); /* bits changes w/ carry */
 tcg_temp_free(t1);
-tcg_gen_shri_tl(cpu_ca, cpu_ca, 32);/* extract bit 32 */
-tcg_gen_andi_tl(cpu_ca, cpu_ca, 1);
+tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1);
 if (is_isa300(ctx)) {
 tcg_gen_mov_tl(cpu_ca32, cpu_ca);
 }
@@ -4336,8 +4334,7 @@ static void gen_mfsrin(DisasContext *ctx)
 
 CHK_SV;
 t0 = tcg_temp_new();
-tcg_gen_shri_tl(t0, cpu_gpr[rB(ctx->opcode)], 28);
-tcg_gen_andi_tl(t0, t0, 0xF);
+tcg_gen_extract_tl(t0, cpu_gpr[rB(ctx->opcode)], 28, 4);
 gen_helper_load_sr(cpu_gpr[rD(ctx->opcode)], cpu_env, t0);
 tcg_temp_free(t0);
 #endif /* defined(CONFIG_USER_ONLY) */
@@ -4368,8 +4365,7 @@ static void gen_mtsrin(DisasContext *ctx)
 CHK_SV;
 
 t0 = tcg_temp_new();
-tcg_gen_shri_tl(t0, cpu_gpr[rB(ctx->opcode)], 28);
-tcg_gen_andi_tl(t0, t0, 0xF);
+tcg_gen_extract_tl(t0, cpu_gpr[rB(ctx->opcode)], 28, 4);
 gen_helper_store_sr(cpu_env, t0, cpu_gpr[rD(ctx->opcode)]);
 tcg_temp_free(t0);
 #endif /* defined(CONFIG_USER_ONLY) */
@@ -4403,8 +4399,7 @@ static void gen_mfsrin_64b(DisasContext *ctx)
 
 CHK_SV;
 t0 = tcg_temp_new();
-tcg_gen_shri_tl(t0, cpu_gpr[rB(ctx->opcode)], 28);
-tcg_gen_andi_tl(t0, t0, 0xF);
+tcg_gen_extract_tl(t0, cpu_gpr[rB(ctx->opcode)], 28, 4);
 gen_helper_load_sr(cpu_gpr[rD(ctx->opcode)], cpu_env, t0);
 tcg_temp_free(t0);
 #endif /* defined(CONFIG_USER_ONLY) */
@@ -4435,8 +4430,7 @@ static void gen_mtsrin_64b(DisasContext *ctx)
 
 CHK_SV;
 t0 = tcg_temp_new();
-tcg_gen_shri_tl(t0, cpu_gpr[rB(ctx->opcode)], 28);
-tcg_gen_andi_tl(t0, t0, 0xF);
+tcg_gen_extract_tl(t0, cpu_gpr[rB(ctx->opcode)], 28, 4);
 gen_helper_store_sr(cpu_env, t0, cpu_gpr[rS(ctx->opcode)]);
 tcg_temp_free(t0);
 #endif /* defined(CONFIG_USER_ONLY) */
@@ -5414,8 +5408,7 @@ static void gen_mfsri(DisasContext *ctx)
 CHK_SV;
 t0 = tcg_temp_new();
 gen_addr_reg_index(ctx, t0);
-tcg_gen_shri_tl(t0, t0, 28);
-tcg_gen_andi_tl(t0, t0, 0xF);
+tcg_gen_extract_tl(t0, t0, 28, 4);
 gen_helper_load_sr(cpu_gpr[rd], cpu_env, t0);
 tcg_temp_free(t0);
 if (ra != 0 && ra != rd)
diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 7f12908029..85ed135d44 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1248,8 +1248,7 @@ static void gen_xsxexpdp(DisasContext *ctx)
 gen_exception(ctx, POWERPC_EXCP_VSXU);
 return;
 }
-tcg_gen_shri_i64(rt, cpu_vsrh(xB(ctx->opcode)), 52);
-tcg_gen_andi_i64(rt, rt, 0x7FF);
+tcg_gen_extract_i64(rt, cpu_vsrh(xB(ctx->opcode)), 52, 11);
 }
 
 static void gen_xsxexpqp(DisasContext *ctx)
@@ -1262,8 +1261,7 @@ static void gen_xsxexpqp(DisasContext *ctx)
 gen_exception(ctx, POWERPC_EXCP_VSXU);
 return;
 }
-tcg_gen_shri_i64(xth, xbh, 48);
-tcg_gen_andi_i64(xth, xth, 0x7FFF);
+tcg_gen_extract_i64(xth, xbh, 48, 15);
 tcg_gen_movi_i64(xtl, 0);
 }
 
@@ -1323,8 +1321,7 @@ static void gen_xsxsigdp(DisasContext *ctx)
 zr = tcg_const_i64(0);
 nan = tcg_const_i64(2047);
 
-tcg_gen_shri_i64(exp, cpu_vsrh(xB(ctx->opcode)), 52);
-tcg_gen_andi_i64(exp, exp, 0x7FF);
+tcg_gen_extract_i64(exp, cpu_vsrh(xB(ctx->opcode)), 52, 11);
 tcg_gen_movi_i64(t0, 0x0010);
 tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, zr, zr, t0);
 tcg_gen_movcond_i64(TCG_COND_EQ, t0, exp, nan, zr, t0);
@@ -1352,8 

[Qemu-devel] [PULL v2 07/14] target/arm: optimize aarch32 rev16

2017-07-19 Thread Richard Henderson
From: Aurelien Jarno 

Use the same mask to avoid having to load two different constants, as
suggested by Richard Henderson.

Signed-off-by: Aurelien Jarno 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-Id: <20170516230159.4195-2-aurel...@aurel32.net>
Signed-off-by: Richard Henderson 
---
 target/arm/translate.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index e27736ce5b..d3003ae0d8 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -343,11 +343,13 @@ static void gen_smul_dual(TCGv_i32 a, TCGv_i32 b)
 static void gen_rev16(TCGv_i32 var)
 {
 TCGv_i32 tmp = tcg_temp_new_i32();
+TCGv_i32 mask = tcg_const_i32(0x00ff00ff);
 tcg_gen_shri_i32(tmp, var, 8);
-tcg_gen_andi_i32(tmp, tmp, 0x00ff00ff);
+tcg_gen_and_i32(tmp, tmp, mask);
+tcg_gen_and_i32(var, var, mask);
 tcg_gen_shli_i32(var, var, 8);
-tcg_gen_andi_i32(var, var, 0xff00ff00);
 tcg_gen_or_i32(var, var, tmp);
+tcg_temp_free_i32(mask);
 tcg_temp_free_i32(tmp);
 }
 
-- 
2.13.3
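
As a sanity check on the single-mask idea in this patch, the following plain-C sketch compares the two formulations on ordinary 32-bit integers. It only models the arithmetic; rev16_two_masks and rev16_one_mask are hypothetical names and do not exist in QEMU.

```c
#include <stdint.h>

/* Before: the shifted-right half uses 0x00ff00ff and the
   shifted-left half uses a second constant, 0xff00ff00. */
uint32_t rev16_two_masks(uint32_t var)
{
    uint32_t tmp = (var >> 8) & 0x00ff00ffu;
    var = (var << 8) & 0xff00ff00u;
    return var | tmp;
}

/* After: one constant.  Masking before the left shift keeps the
   same bytes as masking with the shifted constant afterwards. */
uint32_t rev16_one_mask(uint32_t var)
{
    const uint32_t mask = 0x00ff00ffu;
    uint32_t tmp = (var >> 8) & mask;
    var = (var & mask) << 8;
    return var | tmp;
}
```

For example, both functions map 0x11223344 to 0x22114433, i.e. the bytes are swapped within each 16-bit half.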




[Qemu-devel] [PULL v2 08/14] target/m68k: optimize bcd_flags() using extract op

2017-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Done with the Coccinelle semantic patch
scripts/coccinelle/tcg_gen_extract.cocci.

Signed-off-by: Philippe Mathieu-Daudé 
Acked-by: Laurent Vivier 
Reviewed-by: Richard Henderson 
Message-Id: <20170718045540.16322-5-f4...@amsat.org>
Signed-off-by: Richard Henderson 
---
 target/m68k/translate.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 3a519b790d..e709e6cde2 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -1749,8 +1749,7 @@ static void bcd_flags(TCGv val)
 tcg_gen_andi_i32(QREG_CC_C, val, 0x0ff);
 tcg_gen_or_i32(QREG_CC_Z, QREG_CC_Z, QREG_CC_C);
 
-tcg_gen_shri_i32(QREG_CC_C, val, 8);
-tcg_gen_andi_i32(QREG_CC_C, QREG_CC_C, 1);
+tcg_gen_extract_i32(QREG_CC_C, val, 8, 1);
 
 tcg_gen_mov_i32(QREG_CC_X, QREG_CC_C);
 }
-- 
2.13.3
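
The extract conversions in this patch and in the ppc one rely on a simple identity: a right shift by ofs followed by an and with a mask of len low one-bits is the same as extracting len bits at offset ofs. A minimal C model of that identity follows; extract64_model and shift_then_mask are illustrative names only, not the TCG ops themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Model of extract: return bits [ofs, ofs+len) of value,
   right-aligned and zero-extended. */
uint64_t extract64_model(uint64_t value, unsigned ofs, unsigned len)
{
    assert(len > 0 && len <= 64 && ofs <= 64 - len);
    return (value >> ofs) & (~UINT64_C(0) >> (64 - len));
}

/* The two-op pattern that the semantic patch looks for. */
uint64_t shift_then_mask(uint64_t value, unsigned ofs, uint64_t mask)
{
    return (value >> ofs) & mask;
}
```

With this model, the pairs seen in the diffs line up as (28, 0xF) and (28, 4), (52, 0x7FF) and (52, 11), (48, 0x7FFF) and (48, 15), and (8, 1) and (8, 1).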




[Qemu-devel] [PULL v2 01/14] tcg/mips: reserve a register for the guest_base.

2017-07-19 Thread Richard Henderson
From: Jiang Biao 

Reserve a register for the guest_base using ppc code for reference.
By doing so, we do not have to recompute it for every memory load.

Signed-off-by: Jiang Biao 
Signed-off-by: Richard Henderson 
Message-Id: <1499677934-2249-1-git-send-email-jiang.bi...@zte.com.cn>
---
 tcg/mips/tcg-target.inc.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 85756b81d5..1a8169f5fc 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -85,6 +85,10 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #define TCG_TMP2  TCG_REG_T8
 #define TCG_TMP3  TCG_REG_T7
 
+#ifndef CONFIG_SOFTMMU
+#define TCG_GUEST_BASE_REG TCG_REG_S1
+#endif
+
 /* check if we really need so many registers :P */
 static const int tcg_target_reg_alloc_order[] = {
 /* Call saved registers.  */
@@ -1547,8 +1551,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 } else if (guest_base == (int16_t)guest_base) {
 tcg_out_opc_imm(s, ALIAS_PADDI, base, addr_regl, guest_base);
 } else {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, guest_base);
-tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_TMP0, addr_regl);
+tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_GUEST_BASE_REG, addr_regl);
 }
 tcg_out_qemu_ld_direct(s, data_regl, data_regh, base, opc, is_64);
 #endif
@@ -1652,8 +1655,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
 } else if (guest_base == (int16_t)guest_base) {
 tcg_out_opc_imm(s, ALIAS_PADDI, base, addr_regl, guest_base);
 } else {
-tcg_out_movi(s, TCG_TYPE_PTR, TCG_TMP0, guest_base);
-tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_TMP0, addr_regl);
+tcg_out_opc_reg(s, ALIAS_PADD, base, TCG_GUEST_BASE_REG, addr_regl);
 }
 tcg_out_qemu_st_direct(s, data_regl, data_regh, base, opc);
 #endif
@@ -2452,6 +2454,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)
TCG_REG_SP, SAVE_OFS + i * REG_SIZE);
 }
 
+#ifndef CONFIG_SOFTMMU
+if (guest_base) {
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, guest_base);
+tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
+}
+#endif
+
 /* Call generated code */
 tcg_out_opc_reg(s, OPC_JR, 0, tcg_target_call_iarg_regs[1], 0);
 /* delay slot */
-- 
2.13.3




[Qemu-devel] [PULL v2 03/14] tcg: Expand glue macros before stringifying helper names

2017-07-19 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 include/exec/helper-tcg.h | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/include/exec/helper-tcg.h b/include/exec/helper-tcg.h
index bb9287727c..b0c5bafa99 100644
--- a/include/exec/helper-tcg.h
+++ b/include/exec/helper-tcg.h
@@ -6,31 +6,35 @@
 
 #include "exec/helper-head.h"
 
+/* Need one more level of indirection before stringification
+   to get all the macros expanded first.  */
+#define str(s) #s
+
 #define DEF_HELPER_FLAGS_0(NAME, FLAGS, ret) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
 .sizemask = dh_sizemask(ret, 0) },
 
 #define DEF_HELPER_FLAGS_1(NAME, FLAGS, ret, t1) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
 .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) },
 
 #define DEF_HELPER_FLAGS_2(NAME, FLAGS, ret, t1, t2) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
 .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
 | dh_sizemask(t2, 2) },
 
 #define DEF_HELPER_FLAGS_3(NAME, FLAGS, ret, t1, t2, t3) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
 .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
 | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) },
 
 #define DEF_HELPER_FLAGS_4(NAME, FLAGS, ret, t1, t2, t3, t4) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
 .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
 | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) | dh_sizemask(t4, 4) },
 
 #define DEF_HELPER_FLAGS_5(NAME, FLAGS, ret, t1, t2, t3, t4, t5) \
-  { .func = HELPER(NAME), .name = #NAME, .flags = FLAGS, \
+  { .func = HELPER(NAME), .name = str(NAME), .flags = FLAGS, \
 .sizemask = dh_sizemask(ret, 0) | dh_sizemask(t1, 1) \
 | dh_sizemask(t2, 2) | dh_sizemask(t3, 3) | dh_sizemask(t4, 4) \
 | dh_sizemask(t5, 5) },
@@ -39,6 +43,7 @@
 #include "trace/generated-helpers.h"
 #include "tcg-runtime.h"
 
+#undef str
 #undef DEF_HELPER_FLAGS_0
 #undef DEF_HELPER_FLAGS_1
 #undef DEF_HELPER_FLAGS_2
-- 
2.13.3
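
The extra level of indirection added by this patch can be illustrated with a small standalone C sketch. The macros SUFFIX, ATOMIC_NAME, NAME_DIRECT and NAME_VIA_STR are hypothetical stand-ins chosen for the example; only str() and the general glue pattern come from the patch context.

```c
#include <string.h>

#define glue_(a, b) a##b
#define glue(a, b) glue_(a, b)

/* Hypothetical helper-name macros, for illustration only. */
#define SUFFIX _le
#define ATOMIC_NAME(X) glue(X, SUFFIX)

/* One more level of indirection before stringification. */
#define str(s) #s

/* Old style: '#' is applied to the unexpanded argument. */
#define NAME_DIRECT(NAME) #NAME
/* New style: the argument is macro-expanded before str() sees it. */
#define NAME_VIA_STR(NAME) str(NAME)

const char *direct_name(void)
{
    return NAME_DIRECT(ATOMIC_NAME(cmpxchg));
}

const char *expanded_name(void)
{
    return NAME_VIA_STR(ATOMIC_NAME(cmpxchg));
}
```

Here direct_name() yields the literal text "ATOMIC_NAME(cmpxchg)", while expanded_name() yields "cmpxchg_le", which is presumably the kind of difference that showed up in the debugging dumps.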




[Qemu-devel] [PULL v2 04/14] coccinelle: ignore ASTs pre-parsed cached C files

2017-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Ignore the files generated by the Coccinelle tool when run with 'spatch --use-cache'.

Reviewed-by: Eric Blake 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20170718045540.16322-2-f4...@amsat.org>
Signed-off-by: Richard Henderson 
---
 .gitignore | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.gitignore b/.gitignore
index 09c2363acf..cf65316863 100644
--- a/.gitignore
+++ b/.gitignore
@@ -116,6 +116,8 @@ tags
 TAGS
 docker-src.*
 *~
+*.ast_raw
+*.depend_raw
 trace.h
 trace.c
 trace-ust.h
-- 
2.13.3




[Qemu-devel] [PULL v2 05/14] coccinelle: add a script to optimize tcg op using tcg_gen_extract()

2017-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

The following thread was helpful while writing this script:

https://github.com/coccinelle/coccinelle/issues/86

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20170718045540.16322-3-f4...@amsat.org>
Signed-off-by: Richard Henderson 
---
 scripts/coccinelle/tcg_gen_extract.cocci | 107 +++
 1 file changed, 107 insertions(+)
 create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci

diff --git a/scripts/coccinelle/tcg_gen_extract.cocci b/scripts/coccinelle/tcg_gen_extract.cocci
new file mode 100644
index 00..81e66a35ae
--- /dev/null
+++ b/scripts/coccinelle/tcg_gen_extract.cocci
@@ -0,0 +1,107 @@
+// optimize TCG using extract op
+//
+// Copyright: (C) 2017 Philippe Mathieu-Daudé. GPLv2+.
+// Confidence: High
+// Options: --macro-file scripts/cocci-macro-file.h
+//
+// Nikunj A Dadhania optimization:
+// http://lists.nongnu.org/archive/html/qemu-devel/2017-02/msg05211.html
+// Aurelien Jarno optimization:
+// http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg01466.html
+//
+// This script can be run either using spatch locally or via a docker image:
+//
+// $ spatch \
+// --macro-file scripts/cocci-macro-file.h \
+// --sp-file scripts/coccinelle/tcg_gen_extract.cocci \
+// --keep-comments --in-place \
+// --use-gitgrep --dir target
+//
+// $ docker run --rm -v `pwd`:`pwd` -w `pwd` philmd/coccinelle \
+// --macro-file scripts/cocci-macro-file.h \
+// --sp-file scripts/coccinelle/tcg_gen_extract.cocci \
+// --keep-comments --in-place \
+// --use-gitgrep --dir target
+
+@initialize:python@
+@@
+import sys
+fd = sys.stderr
+def debug(msg="", trailer="\n"):
+    fd.write("[DBG] " + msg + trailer)
+def low_bits_count(value):
+    bits_count = 0
+    while (value & (1 << bits_count)):
+        bits_count += 1
+    return bits_count
+def Mn(order): # Mersenne number
+    return (1 << order) - 1
+
+@match@
+identifier ret;
+metavariable arg;
+constant ofs, msk;
+position shr_p, and_p;
+@@
+(
+tcg_gen_shri_i32@shr_p
+|
+tcg_gen_shri_i64@shr_p
+|
+tcg_gen_shri_tl@shr_p
+)(ret, arg, ofs);
+...  WHEN != ret
+(
+tcg_gen_andi_i32@and_p
+|
+tcg_gen_andi_i64@and_p
+|
+tcg_gen_andi_tl@and_p
+)(ret, ret, msk);
+
+@script:python verify_len depends on match@
+ret_s << match.ret;
+msk_s << match.msk;
+shr_p << match.shr_p;
+extract_len;
+@@
+is_optimizable = False
+debug("candidate at %s:%s" % (shr_p[0].file, shr_p[0].line))
+try: # only eval integer, no #define like 'SR_M' (cpp did this, else some headers are missing).
+    msk_v = long(msk_s.strip("UL"), 0)
+    msk_b = low_bits_count(msk_v)
+    if msk_b == 0:
+        debug("  value: 0x%x low_bits: %d" % (msk_v, msk_b))
+    else:
+        debug("  value: 0x%x low_bits: %d [Mersenne number: 0x%x]" % (msk_v, msk_b, Mn(msk_b)))
+        is_optimizable = Mn(msk_b) == msk_v # check low_bits
+        coccinelle.extract_len = "%d" % msk_b
+    debug("  candidate %s optimizable" % ("IS" if is_optimizable else "is NOT"))
+except:
+    debug("  ERROR (check included headers?)")
+cocci.include_match(is_optimizable)
+debug()
+
+@replacement depends on verify_len@
+identifier match.ret;
+metavariable match.arg;
+constant match.ofs, match.msk;
+position match.shr_p, match.and_p;
+identifier verify_len.extract_len;
+@@
+(
+-tcg_gen_shri_i32@shr_p(ret, arg, ofs);
++tcg_gen_extract_i32(ret, arg, ofs, extract_len);
+...  WHEN != ret
+-tcg_gen_andi_i32@and_p(ret, ret, msk);
+|
+-tcg_gen_shri_i64@shr_p(ret, arg, ofs);
++tcg_gen_extract_i64(ret, arg, ofs, extract_len);
+...  WHEN != ret
+-tcg_gen_andi_i64@and_p(ret, ret, msk);
+|
+-tcg_gen_shri_tl@shr_p(ret, arg, ofs);
++tcg_gen_extract_tl(ret, arg, ofs, extract_len);
+...  WHEN != ret
+-tcg_gen_andi_tl@and_p(ret, ret, msk);
+)
-- 
2.13.3
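
The mask check performed by the Python part of the script can be restated in C, which may make it easier to see which masks qualify. This is a sketch that mirrors the script's logic as I read it; extract_len_for_mask is a hypothetical name, and a return value of 0 is used here to mean "not optimizable".

```c
#include <stdint.h>

/* Count the consecutive one-bits starting at bit 0. */
unsigned low_bits_count(uint64_t value)
{
    unsigned n = 0;
    while (n < 64 && (value & (UINT64_C(1) << n))) {
        n++;
    }
    return n;
}

/* 2^order - 1, with order in [0, 64]. */
uint64_t mersenne(unsigned order)
{
    return order >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << order) - 1;
}

/* Return the extract length for a mask, or 0 when the mask is not
   a contiguous run of low one-bits and the rewrite must be skipped. */
unsigned extract_len_for_mask(uint64_t mask)
{
    unsigned n = low_bits_count(mask);
    if (n == 0 || mersenne(n) != mask) {
        return 0;
    }
    return n;
}
```

So 0xF gives 4 and 0x7FF gives 11, whereas masks such as 0xff00 or 0x5 are rejected because bits outside the low run are set.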




[Qemu-devel] [PULL v2 02/14] util/cacheinfo: Add missing include for ppc linux

2017-07-19 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

This include was forgotten when splitting cacheinfo.c out of
tcg/ppc/tcg-target.inc.c (see commit b255b2c8).

For a Centos7 host, the include path


  

  


implicitly pulls in the desired AT_* defines.
Not so for Debian Jessie.

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20170711015524.22936-1-f4...@amsat.org>
Signed-off-by: Richard Henderson 
---
 util/cacheinfo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/util/cacheinfo.c b/util/cacheinfo.c
index 6253049533..593940f27b 100644
--- a/util/cacheinfo.c
+++ b/util/cacheinfo.c
@@ -129,6 +129,7 @@ static void arch_cache_info(int *isize, int *dsize)
 }
 
 #elif defined(_ARCH_PPC) && defined(__linux__)
+# include "elf.h"
 
 static void arch_cache_info(int *isize, int *dsize)
 {
-- 
2.13.3




[Qemu-devel] [PULL v2 00/14] tcg-next patch queue

2017-07-19 Thread Richard Henderson
This edition is a real mix:
  * Code gen improvement for mips64 host (Jiang)
  * Build fix for ppc-linux (Philippe)
  * Runtime fix for tci (Philippe)
  * Fix atomic helper names in debugging dumps (rth)

  * Cross-target tcg code gen improvements (Philippe)
This one had no obvious tree through which it should go,
so I went ahead and took them all.

  * Cherry-picked the first patch from Lluis' generic translate loop,
wherein the interface to gen_intermediate_code changes trivially.
It's the only patch from that series that touches all targets,
and I see little point carrying it around further.

V2: Fixed typo in the sparc mulscc deposit patch.


r~


The following changes since commit d4e59218ab80e86015753782fb5378767a51ccd0:

  Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2017-07-18-2' into staging (2017-07-19 20:45:37 +0100)

are available in the git repository at:

  git://github.com/rth7680/qemu.git tags/pull-tcg-20170719

for you to fetch changes up to 9c489ea6bed134fecfd556b439c68bba48fbe102:

  tcg: Pass generic CPUState to gen_intermediate_code() (2017-07-19 14:45:16 -0700)


Queued tcg and tcg code gen related cleanups


Aurelien Jarno (1):
  target/arm: optimize aarch32 rev16

Jiang Biao (1):
  tcg/mips: reserve a register for the guest_base.

Lluís Vilanova (1):
  tcg: Pass generic CPUState to gen_intermediate_code()

Philippe Mathieu-Daudé (9):
  util/cacheinfo: Add missing include for ppc linux
  coccinelle: ignore ASTs pre-parsed cached C files
  coccinelle: add a script to optimize tcg op using tcg_gen_extract()
  target/m68k: optimize bcd_flags() using extract op
  target/ppc: optimize various functions using extract op
  target/sparc: optimize various functions using extract op
  target/sparc: optimize gen_op_mulscc() using deposit op
  target/alpha: optimize gen_cvtlq() using deposit op
  tcg/tci: enable bswap16_i64

Richard Henderson (2):
  tcg: Expand glue macros before stringifying helper names
  target/arm: Optimize aarch64 rev16

 include/exec/exec-all.h  |   2 +-
 include/exec/helper-tcg.h|  17 +++--
 target/arm/translate.h   |   4 +-
 accel/tcg/translate-all.c|   2 +-
 target/alpha/translate.c |  13 ++--
 target/arm/translate-a64.c   |  30 +++--
 target/arm/translate.c   |  12 ++--
 target/cris/translate.c  |   7 +-
 target/hppa/translate.c  |   5 +-
 target/i386/translate.c  |   5 +-
 target/lm32/translate.c  |   4 +-
 target/m68k/translate.c  |   8 +--
 target/microblaze/translate.c|   4 +-
 target/mips/translate.c  |   5 +-
 target/moxie/translate.c |   4 +-
 target/nios2/translate.c |   5 +-
 target/openrisc/translate.c  |   4 +-
 target/ppc/translate.c   |  26 +++-
 target/ppc/translate/vsx-impl.inc.c  |  24 +++
 target/s390x/translate.c |   5 +-
 target/sh4/translate.c   |   5 +-
 target/sparc/translate.c |  25 +++-
 target/tilegx/translate.c|   5 +-
 target/tricore/translate.c   |   5 +-
 target/unicore32/translate.c |   5 +-
 target/xtensa/translate.c|   5 +-
 tcg/mips/tcg-target.inc.c|  17 +++--
 tcg/tci.c|   1 -
 util/cacheinfo.c |   1 +
 .gitignore   |   2 +
 scripts/coccinelle/tcg_gen_extract.cocci | 107 +++
 31 files changed, 218 insertions(+), 146 deletions(-)
 create mode 100644 scripts/coccinelle/tcg_gen_extract.cocci



Re: [Qemu-devel] [PULL 0/8] target/alpha cleanups

2017-07-19 Thread Richard Henderson

On 07/19/2017 06:57 AM, Peter Maydell wrote:

On 19 July 2017 at 05:45, Richard Henderson  wrote:

The new title holder for perf top is helper_lookup_tb_ptr.
Those targets that have a complicated cpu_get_tb_cpu_state
function are going to regret that.


Yeah, Paolo's pointed out (and had some patches for)
ARM's rather complicated cpu_get_tb_cpu_state(). My
issue with his suggested fixes was that they were
pretty fragile in terms of not having any guarantee
that the change always produced the right tb cpu
state flags answer...


Oh?  I must have missed seeing this one.
A quick patchwork search doesn't pull it up;
do either of you have a link?


r~



Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-19 Thread Jacob Pan
On Wed, 19 Jul 2017 18:45:43 +0800
"Liu, Yi L"  wrote:

> On Mon, Jul 17, 2017 at 04:45:15PM -0600, Alex Williamson wrote:
> > On Mon, 17 Jul 2017 10:58:41 +
> > "Liu, Yi L"  wrote:
> >   
> > > Hi Alex,
> > > 
> > > Pls refer to the response inline.
> > >   
> > > > -Original Message-
> > > > From: kvm-ow...@vger.kernel.org
> > > > [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Alex Williamson
> > > > Sent: Saturday, July 15, 2017 2:16 AM
> > > > To: Liu, Yi L 
> > > > Cc: Jean-Philippe Brucker ;
> > > > Tian, Kevin ; Liu, Yi L
> > > > ; Lan, Tianyu ;
> > > > Raj, Ashok ; k...@vger.kernel.org;
> > > > jasow...@redhat.com; Will Deacon ;
> > > > pet...@redhat.com; qemu-devel@nongnu.org;
> > > > io...@lists.linux-foundation.org; Pan, Jacob jun
> > > > ; Joerg Roedel 
> > > > Subject: Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL
> > > > for IOMMU TLB invalidate propagation
> > > > 
> > > > On Fri, 14 Jul 2017 08:58:02 +
> > > > "Liu, Yi L"  wrote:
> > > > 
> > > > > Hi Alex,
> > > > >
> > > > > Against to the opaque open, I'd like to propose the following
> > > > > definition based on the existing comments. Pls note that I've
> > > > > merged the pasid table binding and iommu tlb invalidation
> > > > > into a single IOCTL and make different flags to indicate the
> > > > > iommu operations. Per Kevin's comments, there may be iommu
> > > > > invalidation for guest IOVA tlb, so I renamed the IOCTL and
> > > > > data structure to be non-svm specific. Pls kindly have a
> > > > > review, so that we can make the opaque open closed and move
> > > > > forward. Surely, comments and ideas are welcomed. And for the
> > > > > scope and flags definition in struct iommu_tlb_invalidate,
> > > > > it's also welcomed to give your ideas on it.
> > > > >
> > > > > 1. Add a VFIO IOCTL for iommu operations from user-space
> > > > >
> > > > > #define VFIO_IOMMU_OP_IOCTL _IO(VFIO_TYPE, VFIO_BASE + 24)
> > > > >
> > > > > Corresponding data structure:
> > > > > struct vfio_iommu_operation_info {
> > > > >   __u32   argsz;
> > > > > #define VFIO_IOMMU_BIND_PASIDTBL  (1 << 0) /* Bind PASID Table */
> > > > > #define VFIO_IOMMU_BIND_PASID     (1 << 1) /* Bind PASID from userspace driver */
> > > > > #define VFIO_IOMMU_BIND_PGTABLE   (1 << 2) /* Bind guest mmu page table */
> > > > > #define VFIO_IOMMU_INVAL_IOTLB    (1 << 3) /* Invalidate iommu tlb */
> > > > >   __u32   flag;
> > > > >   __u32   length; // length of the data[] part in byte
> > > > >   __u8    data[]; // stores the data for iommu op indicated by flag field
> > > > > };
> > > > 
> > > > If we're doing a generic "Ops" ioctl, then we should have an
> > > > "op" field which is defined by an enum.  It doesn't make sense
> > > > to use flags for this, for example can we set multiple flag
> > > > bits?  If not then it's not a good use for a bit field.  I'm
> > > > also not sure I understand the value of the "length" field,
> > > > can't it always be calculated from argsz?
> > > 
> > > Agreed, enum would be better. "length" field could be calculated
> > > from argsz. I used it just to avoid offset calculations. May
> > > remove it. 
> > > > > For iommu tlb invalidation from userspace, the "__u8 data[]"
> > > > > stores data which would be parsed by the "struct
> > > > > iommu_tlb_invalidate" defined below.
> > > > >
> > > > > 2. Definitions in include/uapi/linux/iommu.h(newly added
> > > > > header file)
> > > > >
> > > > > /* IOMMU model definition for iommu operations from userspace */
> > > > > enum iommu_model {
> > > > >   INTEL_IOMMU,
> > > > >   ARM_SMMU,
> > > > >   AMD_IOMMU,
> > > > >   SPAPR_IOMMU,
> > > > >   S390_IOMMU,
> > > > > };
> > > > >
> > > > > struct iommu_tlb_invalidate {
> > > > >   __u32   scope;
> > > > > /* pasid-selective invalidation described by @pasid */
> > > > > #define IOMMU_INVALIDATE_PASID  (1 << 0)
> > > > > /* address-selective invalidation described by (@vaddr, @size) */
> > > > > #define IOMMU_INVALIDATE_VADDR  (1 << 1)
> > > > 
> > > > Again, is a bit field appropriate here, can a user set both
> > > > bits?
> > > 
> > > Yes, the user may set both bits. That would invalidate an address
> > > range which is tagged with a PASID value.
> > >   
> > > > 
> > > > >   __u32   flags;
> > > > > /*  targets non-pasid mappings, @pasid is not valid */
> > > > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0)
> > > > > /* indicating that the pIOMMU doesn't need to invalidate
> > > > >   all intermediate tables cached as part of the PTE for
> > > > >   vaddr, only the last-level entry (pte). This is a hint. */
> > > > > #define 
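
To make the two review points quoted above concrete (a single "op" value instead of flag bits, and deriving the payload length from argsz), here is a hedged C sketch. Everything in it is hypothetical: the enum, the struct and both functions are invented for illustration, use plain stdint types instead of kernel ones, and are not part of any proposed or existing VFIO interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op codes: exactly one is valid per call, which a
   bit field cannot express without extra checks. */
enum iommu_op_sketch {
    IOMMU_OP_SKETCH_BIND_PASIDTBL = 1,
    IOMMU_OP_SKETCH_BIND_PASID,
    IOMMU_OP_SKETCH_BIND_PGTABLE,
    IOMMU_OP_SKETCH_INVAL_IOTLB,
};

/* Hypothetical layout with no separate "length" field. */
struct iommu_op_info_sketch {
    uint32_t argsz;   /* total size, header plus data[] */
    uint32_t op;      /* one enum iommu_op_sketch value */
    uint8_t  data[];  /* op-specific payload */
};

/* Payload length derived from argsz; -1 if argsz is smaller
   than the fixed header. */
int64_t iommu_op_payload_len(uint32_t argsz)
{
    const size_t header = offsetof(struct iommu_op_info_sketch, data);
    if (argsz < header) {
        return -1;
    }
    return (int64_t)(argsz - header);
}

/* Non-zero when op names exactly one known operation. */
int iommu_op_is_valid(uint32_t op)
{
    return op >= IOMMU_OP_SKETCH_BIND_PASIDTBL &&
           op <= IOMMU_OP_SKETCH_INVAL_IOTLB;
}
```

In this sketch the header is 8 bytes, so an argsz of 24 implies a 16-byte payload, and the offset arithmetic lives in one helper instead of being carried in the structure.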

Re: [Qemu-devel] [PATCH v5 0/3] Add litmus tests for MTTCG consistency tests

2017-07-19 Thread Philippe Mathieu-Daudé

Hi Pranith,

On 12/01/2016 02:28 AM, Pranith Kumar wrote:

Hello,

The following patch series adds litmus tests to test consistency for
MTTCG enabled qemu. These patches apply on top of the clean up
tests/tcg folder made by my previous patch series.

The tests were generated using the litmus tool. The sources and
instructions on how to generate these sources can be found in this
repository: https://github.com/pranith/qemu-litmus

I tested these on both an x86 and an Aarch64 machine. These tests are
currently enabled for the trusty configuration on travis.

Thanks,
--
Pranith

*** BLURB HERE ***

Pranith Kumar (3):
   tests/tcg: Add i386 litmus test
   tests/tcg: Add aarch64 litmus tests
   travis: Enable litmus tests

  .travis.yml   |8 +
  tests/tcg/aarch64/litmus/ARMARM00.c   |  501 +
  tests/tcg/aarch64/litmus/ARMARM01.c   |  504 +
  tests/tcg/aarch64/litmus/ARMARM02.c   |  571 ++
  tests/tcg/aarch64/litmus/ARMARM03.c   |  498 +
  tests/tcg/aarch64/litmus/ARMARM04+BIS.c   |  556 ++
  tests/tcg/aarch64/litmus/ARMARM04+TER.c   |  538 ++
  tests/tcg/aarch64/litmus/ARMARM04.c   |  556 ++
  tests/tcg/aarch64/litmus/ARMARM05.c   |  553 ++
  tests/tcg/aarch64/litmus/ARMARM06+AP+AA.c |  581 +++
  tests/tcg/aarch64/litmus/ARMARM06+AP+AP.c |  581 +++
  tests/tcg/aarch64/litmus/ARMARM06.c   |  581 +++
  tests/tcg/aarch64/litmus/ARMARM07+SAL.c   |  497 +
  tests/tcg/aarch64/litmus/Makefile |   53 ++
  tests/tcg/aarch64/litmus/README.txt   |   22 +
  tests/tcg/aarch64/litmus/affinity.c   |  159 
  tests/tcg/aarch64/litmus/affinity.h   |   34 +
  tests/tcg/aarch64/litmus/comp.sh  |   30 +
  tests/tcg/aarch64/litmus/litmus_rand.c|   64 ++
  tests/tcg/aarch64/litmus/litmus_rand.h|   29 +
  tests/tcg/aarch64/litmus/outs.c   |  148 
  tests/tcg/aarch64/litmus/outs.h   |   49 ++
  tests/tcg/aarch64/litmus/run.sh   |  378 ++
  tests/tcg/aarch64/litmus/show.awk |2 +
  tests/tcg/aarch64/litmus/utils.c  | 1148 +
  tests/tcg/aarch64/litmus/utils.h  |  275 +++
  tests/tcg/i386/litmus/Makefile|   42 ++


Can you add entries for both folders to MAINTAINERS, please?


  tests/tcg/i386/litmus/README.txt  |   22 +
  tests/tcg/i386/litmus/SAL.c   |  491 
  tests/tcg/i386/litmus/affinity.c  |  159 
  tests/tcg/i386/litmus/affinity.h  |   34 +
  tests/tcg/i386/litmus/comp.sh |   10 +
  tests/tcg/i386/litmus/litmus_rand.c   |   64 ++
  tests/tcg/i386/litmus/litmus_rand.h   |   29 +
  tests/tcg/i386/litmus/outs.c  |  148 
  tests/tcg/i386/litmus/outs.h  |   49 ++
  tests/tcg/i386/litmus/run.sh  |   55 ++
  tests/tcg/i386/litmus/show.awk|2 +
  tests/tcg/i386/litmus/utils.c | 1148 +
  tests/tcg/i386/litmus/utils.h |  275 +++
  40 files changed, 11444 insertions(+)
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM00.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM01.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM02.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM03.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM04+BIS.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM04+TER.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM04.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM05.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM06+AP+AA.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM06+AP+AP.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM06.c
  create mode 100644 tests/tcg/aarch64/litmus/ARMARM07+SAL.c
  create mode 100644 tests/tcg/aarch64/litmus/Makefile
  create mode 100644 tests/tcg/aarch64/litmus/README.txt
  create mode 100644 tests/tcg/aarch64/litmus/affinity.c
  create mode 100644 tests/tcg/aarch64/litmus/affinity.h
  create mode 100644 tests/tcg/aarch64/litmus/comp.sh
  create mode 100644 tests/tcg/aarch64/litmus/litmus_rand.c
  create mode 100644 tests/tcg/aarch64/litmus/litmus_rand.h
  create mode 100644 tests/tcg/aarch64/litmus/outs.c
  create mode 100644 tests/tcg/aarch64/litmus/outs.h
  create mode 100755 tests/tcg/aarch64/litmus/run.sh
  create mode 100644 tests/tcg/aarch64/litmus/show.awk
  create mode 100644 tests/tcg/aarch64/litmus/utils.c
  create mode 100644 tests/tcg/aarch64/litmus/utils.h
  create mode 100644 tests/tcg/i386/litmus/Makefile
  create mode 100644 tests/tcg/i386/litmus/README.txt
  create mode 100644 tests/tcg/i386/litmus/SAL.c
  create mode 100644 tests/tcg/i386/litmus/affinity.c
  create mode 100644 tests/tcg/i386/litmus/affinity.h
  create mode 100644 tests/tcg/i386/litmus/comp.sh
  create mode 100644 

Re: [Qemu-devel] [PULL 00/14] tcg-next patch queue

2017-07-19 Thread Richard Henderson

On 07/19/2017 10:33 AM, Philippe Mathieu-Daudé wrote:

On 07/19/2017 04:45 PM, Peter Maydell wrote:

The sparc-linux-user test fails:

/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc
-L ./gnemul/qemu-sparc sparc/ls -l dummyfile
Inconsistency detected by ld.so: rtld.c: 858: dl_main: Assertion
`_dl_rtld_map.l_prev->l_next == _dl_rtld_map.l_next' failed!
Makefile:6: recipe for target 'test' failed

A valgrind run produces a lot of noise, but
this bit looks suspicious:

==14436==
==14436== Conditional jump or move depends on uninitialised value(s)
==14436==at 0x60003F7C: tcg_out_qemu_st_direct (tcg-target.inc.c:1733)
==14436==by 0x60004295: tcg_out_qemu_st (tcg-target.inc.c:1856)
==14436==by 0x60004F0C: tcg_out_op (tcg-target.inc.c:2140)
==14436==by 0x6000B0FF: tcg_reg_alloc_op (tcg.c:2360)
==14436==by 0x6000BCED: tcg_gen_code (tcg.c:2679)
==14436==by 0x600387B7: tb_gen_code (translate-all.c:1311)
==14436==by 0x6003637B: tb_find (cpu-exec.c:367)
==14436==by 0x60036A7C: cpu_exec (cpu-exec.c:675)
==14436==by 0x60039DA1: cpu_loop (main.c:1088)
==14436==by 0x6003B7AF: main (main.c:4860)
==14436==
==14436== Invalid write of size 4
==14436==at 0x605114FA: ???
==14436==by 0x6011ADDF: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x6253464F: ???
==14436==by 0x6022852F: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x6022818C: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x6022852F: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x416: ???
==14436==by 0x60227F1F: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==  Address 0x59d1c7d0 is not stack'd, malloc'd or (recently) free'd
==14436==

Reverting "target/sparc: optimize gen_op_mulscc() using deposit op"
fixed this, so I think that's probably the culprit.


Thank you for taking the time with valgrind, I'll verify the sparc/tcg opcode used.


A simple typo, Phil,

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 56ef73c794..3bde47be83 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -633,7 +633,7 @@ static inline void gen_op_mulscc
 // b2 = T0 & 1;
 // env->y = (b2 << 31) | (env->y >> 1);
 tcg_gen_extract_tl(t0, cpu_y, 1, 31);
-tcg_gen_deposit_tl(cpu_y, cpu_y, cpu_cc_src, 31, 1);
+tcg_gen_deposit_tl(cpu_y, t0, cpu_cc_src, 31, 1);

 // b1 = N ^ V;
 gen_mov_reg_N(t0, cpu_psr);


I'll respin.
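
For anyone following along, a plain-C model of why the one-operand change
matters. This is only a sketch: the *_model helpers below are hypothetical
stand-ins for the extract/deposit semantics, not QEMU's TCG API, and a
32-bit target word is assumed. The intended result is the one in the
comment above the hunk, env->y = (b2 << 31) | (env->y >> 1).

```c
#include <stdint.h>

/* Hypothetical model: take `len` bits of `value` starting at bit `ofs`. */
static uint32_t extract32_model(uint32_t value, unsigned ofs, unsigned len)
{
    uint32_t mask = len < 32 ? ((uint32_t)1 << len) - 1 : UINT32_MAX;
    return (value >> ofs) & mask;
}

/* Hypothetical model: overwrite `len` bits of `base` at `ofs` with the
 * low bits of `field`. Assumes ofs + len <= 32. */
static uint32_t deposit32_model(uint32_t base, unsigned ofs, unsigned len,
                                uint32_t field)
{
    uint32_t low = len < 32 ? ((uint32_t)1 << len) - 1 : UINT32_MAX;
    uint32_t mask = low << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* What the comment in the hunk asks for. */
static uint32_t mulscc_y_reference(uint32_t y, uint32_t src)
{
    return ((src & 1) << 31) | (y >> 1);
}

/* Fixed sequence: deposit into t0, which already holds y >> 1. */
static uint32_t mulscc_y_fixed(uint32_t y, uint32_t src)
{
    uint32_t t0 = extract32_model(y, 1, 31);
    return deposit32_model(t0, 31, 1, src);
}

/* Typo'd sequence: deposit into the unshifted y, so the low 31 bits
 * never move. */
static uint32_t mulscc_y_typo(uint32_t y, uint32_t src)
{
    return deposit32_model(y, 31, 1, src);
}
```

With the base operand being t0 the low 31 bits are the shifted Y; with
cpu_y as the base only bit 31 changes.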


r~



Re: [Qemu-devel] [PULL 00/14] tcg-next patch queue

2017-07-19 Thread Philippe Mathieu-Daudé

On 07/19/2017 04:45 PM, Peter Maydell wrote:

The sparc-linux-user test fails:

/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc
-L ./gnemul/qemu-sparc sparc/ls -l dummyfile
Inconsistency detected by ld.so: rtld.c: 858: dl_main: Assertion
`_dl_rtld_map.l_prev->l_next == _dl_rtld_map.l_next' failed!
Makefile:6: recipe for target 'test' failed

A valgrind run produces a lot of noise, but
this bit looks suspicious:

==14436==
==14436== Conditional jump or move depends on uninitialised value(s)
==14436==at 0x60003F7C: tcg_out_qemu_st_direct (tcg-target.inc.c:1733)
==14436==by 0x60004295: tcg_out_qemu_st (tcg-target.inc.c:1856)
==14436==by 0x60004F0C: tcg_out_op (tcg-target.inc.c:2140)
==14436==by 0x6000B0FF: tcg_reg_alloc_op (tcg.c:2360)
==14436==by 0x6000BCED: tcg_gen_code (tcg.c:2679)
==14436==by 0x600387B7: tb_gen_code (translate-all.c:1311)
==14436==by 0x6003637B: tb_find (cpu-exec.c:367)
==14436==by 0x60036A7C: cpu_exec (cpu-exec.c:675)
==14436==by 0x60039DA1: cpu_loop (main.c:1088)
==14436==by 0x6003B7AF: main (main.c:4860)
==14436==
==14436== Invalid write of size 4
==14436==at 0x605114FA: ???
==14436==by 0x6011ADDF: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x6253464F: ???
==14436==by 0x6022852F: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x6022818C: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x6022852F: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x416: ???
==14436==by 0x60227F1F: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==  Address 0x59d1c7d0 is not stack'd, malloc'd or (recently) free'd
==14436==

Reverting "target/sparc: optimize gen_op_mulscc() using deposit op"
fixed this, so I think that's probably the culprit.


Thank you for taking time with valgrind, I'll verify sparc/tcg opcode used.

Phil.



Re: [Qemu-devel] [PULL v2 00/18] Merge crypto 2017/07/18

2017-07-19 Thread Peter Maydell
On 19 July 2017 at 10:15, Daniel P. Berrange  wrote:
> The following changes since commit 6887dc6700ccb7820d8a9d370f421ee361c748e8:
>
>   Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20170718' into 
> staging (2017-07-18 21:13:48 +0100)
>
> are available in the git repository at:
>
>   git://github.com/berrange/qemu tags/pull-qcrypto-2017-07-18-2
>
> for you to fetch changes up to c7a9af4b450c863cd84ad245ebc52a831c661392:
>
>   tests: crypto: add hmac speed benchmark support (2017-07-19 10:11:05 +0100)
>
> 
> Merge qcrypto 2017/07/18 v2

Applied, thanks.

-- PMM



[Qemu-devel] [PATCH 3/4] GRETAP Backend for UDST

2017-07-19 Thread anton . ivanov
From: Anton Ivanov 

GRETAP Backend for Universal Datagram Socket Transport

Signed-off-by: Anton Ivanov 
---
 net/Makefile.objs |   2 +-
 net/clients.h |   4 +
 net/gre.c | 311 ++
 net/net.c |   1 +
 qapi-schema.json  |  41 ++-
 qemu-options.hx   |  60 ++-
 6 files changed, 414 insertions(+), 5 deletions(-)
 create mode 100644 net/gre.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index ffdfb96bd0..919bc3d78f 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
-common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o
+common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o gre.o
 common-obj-$(CONFIG_POSIX) += vhost-user.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
diff --git a/net/clients.h b/net/clients.h
index 5cae479730..8f8a59aee3 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -49,6 +49,10 @@ int net_init_bridge(const Netdev *netdev, const char *name,
 
 int net_init_l2tpv3(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp);
+
+int net_init_gre(const Netdev *netdev, const char *name,
+NetClientState *peer, Error **errp);
+
 #ifdef CONFIG_VDE
 int net_init_vde(const Netdev *netdev, const char *name,
  NetClientState *peer, Error **errp);
diff --git a/net/gre.c b/net/gre.c
new file mode 100644
index 00..7734d78102
--- /dev/null
+++ b/net/gre.c
@@ -0,0 +1,311 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2015-2017 Cambridge GREys Limited
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2012-2014 Cisco Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include "net/net.h"
+#include "clients.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "udst.h"
+
+/* IANA-assigned IP protocol ID for GRE */
+
+
+#ifndef IPPROTO_GRE
+#define IPPROTO_GRE 0x2F
+#endif
+
+#define GRE_MODE_CHECKSUM htons(8 << 12)   /* checksum */
+#define GRE_MODE_RESERVED htons(4 << 12)   /* unused */
+#define GRE_MODE_KEY  htons(2 << 12)   /* KEY present */
+#define GRE_MODE_SEQUENCE htons(1 << 12)   /* no sequence */
+
+
+/* GRE TYPE for Ethernet in GRE aka GRETAP */
+
+#define GRE_IRB htons(0x6558)
+
+struct gre_minimal_header {
+   uint16_t header;
+   uint16_t arptype;
+};
+
+typedef struct GRETunnelParams {
+/*
+ * GRE parameters
+ */
+
+uint32_t rx_key;
+uint32_t tx_key;
+uint32_t sequence;
+
+/* Flags */
+
+bool ipv6;
+bool udp;
+bool has_sequence;
+bool pin_sequence;
+bool checksum;
+bool key;
+
+/* Precomputed GRE specific offsets */
+
+uint32_t key_offset;
+uint32_t sequence_offset;
+uint32_t checksum_offset;
+
+struct gre_minimal_header header_bits;
+
+} GRETunnelParams;
+
+
+
+static void gre_form_header(void *us)
+{
+NetUdstState *s = (NetUdstState *) us;
+GRETunnelParams *p = (GRETunnelParams *) s->params;
+
+uint32_t *sequence;
+
+*((uint32_t *) s->header_buf) = *((uint32_t *) &p->header_bits);
+
+if (p->key) {
+stl_be_p(
+(uint32_t *) (s->header_buf + p->key_offset),
+p->tx_key
+);
+}
+if (p->has_sequence) {
+sequence = (uint32_t *)(s->header_buf + p->sequence_offset);
+if (p->pin_sequence) {
+*sequence = 0;
+} else {
+stl_be_p(sequence, ++p->sequence);
+}
+}
+}
+
+static int gre_verify_header(void *us, uint8_t *buf)
+{
+
+NetUdstState *s = (NetUdstState 

[Qemu-devel] [PATCH 4/4] Raw Backend for UDST

2017-07-19 Thread anton . ivanov
From: Anton Ivanov 

Raw Socket Backend for Universal Datagram Socket Transport

Signed-off-by: Anton Ivanov 
---
 net/Makefile.objs |   2 +-
 net/clients.h |   3 ++
 net/net.c |   1 +
 net/raw.c | 123 ++
 qapi-schema.json  |  20 -
 qemu-options.hx   |  32 ++
 6 files changed, 178 insertions(+), 3 deletions(-)
 create mode 100644 net/raw.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index 919bc3d78f..457297b5ed 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
-common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o gre.o
+common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o gre.o raw.o
 common-obj-$(CONFIG_POSIX) += vhost-user.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
diff --git a/net/clients.h b/net/clients.h
index 8f8a59aee3..98d8ae59b7 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -53,6 +53,9 @@ int net_init_l2tpv3(const Netdev *netdev, const char *name,
 int net_init_gre(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp);
 
+int net_init_raw(const Netdev *netdev, const char *name,
+NetClientState *peer, Error **errp);
+
 #ifdef CONFIG_VDE
 int net_init_vde(const Netdev *netdev, const char *name,
  NetClientState *peer, Error **errp);
diff --git a/net/net.c b/net/net.c
index 6163a8a3af..8eb0aa2bee 100644
--- a/net/net.c
+++ b/net/net.c
@@ -963,6 +963,7 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #ifdef CONFIG_UDST
 [NET_CLIENT_DRIVER_L2TPV3] = net_init_l2tpv3,
 [NET_CLIENT_DRIVER_GRE] = net_init_gre,
+[NET_CLIENT_DRIVER_RAW] = net_init_raw,
 #endif
 };
 
diff --git a/net/raw.c b/net/raw.c
new file mode 100644
index 00..8f73248095
--- /dev/null
+++ b/net/raw.c
@@ -0,0 +1,123 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2015-2017 Cambridge Greys Limited
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2012-2014 Cisco Systems
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include 
+#include 
+#include "net/net.h"
+#include 
+#include 
+#include 
+#include "clients.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "udst.h"
+
+static int noop(void *us, uint8_t *buf)
+{
+return 0;
+}
+
+int net_init_raw(const Netdev *netdev,
+const char *name,
+NetClientState *peer, Error **errp)
+{
+
+const NetdevRawOptions *raw;
+NetUdstState *s;
+NetClientState *nc;
+
+int fd = -1;
+int err;
+
+struct ifreq ifr;
+struct sockaddr_ll sock;
+
+
+nc = qemu_new_udst_net_client(name, peer);
+
+s = DO_UPCAST(NetUdstState, nc, nc);
+
+fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
+if (fd == -1) {
+err = -errno;
+error_report("raw_open : raw socket creation failed, errno = %d", -err);
+goto outerr;
+}
+
+
+s->dgram_dst = NULL;
+s->dst_size = 0;
+
+assert(netdev->type == NET_CLIENT_DRIVER_RAW);
+raw = &netdev->u.raw;
+
+memset(&ifr, 0, sizeof(struct ifreq));
+strncpy((char *) &ifr.ifr_name, raw->ifname, sizeof(ifr.ifr_name) - 1);
+
+if (ioctl(fd, SIOCGIFINDEX, (void *) &ifr) < 0) {
+err = -errno;
+error_report("SIOCGIFINDEX, failed to get raw interface index for %s",
+raw->ifname);
+goto outerr;
+}
+
+sock.sll_family = AF_PACKET;
+sock.sll_protocol = htons(ETH_P_ALL);
+sock.sll_ifindex = ifr.ifr_ifindex;
+
+if (bind(fd, (struct 

[Qemu-devel] [PATCH 2/4] Migrate l2tpv3 to UDST Backend

2017-07-19 Thread anton . ivanov
From: Anton Ivanov 

Migrate L2TPv3 transport to the Unified Datagram Socket
Transport Backend.

Signed-off-by: Anton Ivanov 
---
 net/l2tpv3.c | 537 +--
 1 file changed, 83 insertions(+), 454 deletions(-)

diff --git a/net/l2tpv3.c b/net/l2tpv3.c
index 6745b78990..25b7628244 100644
--- a/net/l2tpv3.c
+++ b/net/l2tpv3.c
@@ -1,6 +1,7 @@
 /*
  * QEMU System Emulator
  *
+ * Copyright (c) 2015-2017 Cambridge Greys Limited
  * Copyright (c) 2003-2008 Fabrice Bellard
  * Copyright (c) 2012-2014 Cisco Systems
  *
@@ -30,23 +31,14 @@
 #include "clients.h"
 #include "qemu-common.h"
 #include "qemu/error-report.h"
+#include "qapi/error.h"
 #include "qemu/option.h"
 #include "qemu/sockets.h"
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
+#include "udst.h"
 
 
-/* The buffer size needs to be investigated for optimum numbers and
- * optimum means of paging in on different systems. This size is
- * chosen to be sufficient to accommodate one packet with some headers
- */
-
-#define BUFFER_ALIGN sysconf(_SC_PAGESIZE)
-#define BUFFER_SIZE 2048
-#define IOVSIZE 2
-#define MAX_L2TPV3_MSGCNT 64
-#define MAX_L2TPV3_IOVCNT (MAX_L2TPV3_MSGCNT * IOVSIZE)
-
 /* Header set to 0x3 signifies a data packet */
 
 #define L2TPV3_DATA_PACKET 0x3
@@ -57,31 +49,7 @@
 #define IPPROTO_L2TP 0x73
 #endif
 
-typedef struct NetL2TPV3State {
-NetClientState nc;
-int fd;
-
-/*
- * these are used for xmit - that happens packet a time
- * and for first sign of life packet (easier to parse that once)
- */
-
-uint8_t *header_buf;
-struct iovec *vec;
-
-/*
- * these are used for receive - try to "eat" up to 32 packets at a time
- */
-
-struct mmsghdr *msgvec;
-
-/*
- * peer address
- */
-
-struct sockaddr_storage *dgram_dst;
-uint32_t dst_size;
-
+typedef struct L2TPV3TunnelParams {
 /*
  * L2TPv3 parameters
  */
@@ -90,37 +58,8 @@ typedef struct NetL2TPV3State {
 uint64_t tx_cookie;
 uint32_t rx_session;
 uint32_t tx_session;
-uint32_t header_size;
 uint32_t counter;
 
-/*
-* DOS avoidance in error handling
-*/
-
-bool header_mismatch;
-
-/*
- * Ring buffer handling
- */
-
-int queue_head;
-int queue_tail;
-int queue_depth;
-
-/*
- * Precomputed offsets
- */
-
-uint32_t offset;
-uint32_t cookie_offset;
-uint32_t counter_offset;
-uint32_t session_offset;
-
-/* Poll Control */
-
-bool read_poll;
-bool write_poll;
-
 /* Flags */
 
 bool ipv6;
@@ -130,189 +69,62 @@ typedef struct NetL2TPV3State {
 bool cookie;
 bool cookie_is_64;
 
-} NetL2TPV3State;
-
-static void net_l2tpv3_send(void *opaque);
-static void l2tpv3_writable(void *opaque);
-
-static void l2tpv3_update_fd_handler(NetL2TPV3State *s)
-{
-qemu_set_fd_handler(s->fd,
-s->read_poll ? net_l2tpv3_send : NULL,
-s->write_poll ? l2tpv3_writable : NULL,
-s);
-}
-
-static void l2tpv3_read_poll(NetL2TPV3State *s, bool enable)
-{
-if (s->read_poll != enable) {
-s->read_poll = enable;
-l2tpv3_update_fd_handler(s);
-}
-}
+/* Precomputed L2TPV3 specific offsets */
+uint32_t cookie_offset;
+uint32_t counter_offset;
+uint32_t session_offset;
 
-static void l2tpv3_write_poll(NetL2TPV3State *s, bool enable)
-{
-if (s->write_poll != enable) {
-s->write_poll = enable;
-l2tpv3_update_fd_handler(s);
-}
-}
+} L2TPV3TunnelParams;
 
-static void l2tpv3_writable(void *opaque)
-{
-NetL2TPV3State *s = opaque;
-l2tpv3_write_poll(s, false);
-qemu_flush_queued_packets(&s->nc);
-}
 
-static void l2tpv3_send_completed(NetClientState *nc, ssize_t len)
-{
-NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
-l2tpv3_read_poll(s, true);
-}
 
-static void l2tpv3_poll(NetClientState *nc, bool enable)
+static void l2tpv3_form_header(void *us)
 {
-NetL2TPV3State *s = DO_UPCAST(NetL2TPV3State, nc, nc);
-l2tpv3_write_poll(s, enable);
-l2tpv3_read_poll(s, enable);
-}
+NetUdstState *s = (NetUdstState *) us;
+L2TPV3TunnelParams *p = (L2TPV3TunnelParams *) s->params;
 
-static void l2tpv3_form_header(NetL2TPV3State *s)
-{
 uint32_t *counter;
 
-if (s->udp) {
+if (p->udp) {
 stl_be_p((uint32_t *) s->header_buf, L2TPV3_DATA_PACKET);
 }
 stl_be_p(
-(uint32_t *) (s->header_buf + s->session_offset),
-s->tx_session
+(uint32_t *) (s->header_buf + p->session_offset),
+p->tx_session
 );
-if (s->cookie) {
-if (s->cookie_is_64) {
+if (p->cookie) {
+if (p->cookie_is_64) {
 stq_be_p(
-(uint64_t *)(s->header_buf + s->cookie_offset),
-s->tx_cookie
+(uint64_t *)(s->header_buf + 

[Qemu-devel] Revised Unified Datagram Socket Transport patchset

2017-07-19 Thread anton . ivanov
Hi Jason, hi list,

A revised patchset follows. I have addressed most comments.

TODO: replace memcpy with dup where applicable
TODO: add force v4 option
TODO: port the UDP portion of the existing socket transport
to the new infrastructure

Future: add sendmmsg once a "bulk xmit" has been arranged
on the QEMU hw and/or lower network subsystem layers side.





[Qemu-devel] [PATCH 1/4] Unified Datagram Socket Transports

2017-07-19 Thread anton . ivanov
From: Anton Ivanov 

Basic infrastructure to start moving datagram based transports
to a common infrastructure as well as introduce several
additional transports.

Signed-off-by: Anton Ivanov 
---
 configure |  12 +-
 net/Makefile.objs |   2 +-
 net/net.c |   4 +-
 net/udst.c| 420 ++
 net/udst.h| 121 
 qapi-schema.json  |  19 ++-
 qemu-options.hx   |   2 +-
 7 files changed, 569 insertions(+), 11 deletions(-)
 create mode 100644 net/udst.c
 create mode 100644 net/udst.h

diff --git a/configure b/configure
index bad50f5368..00c911c49b 100755
--- a/configure
+++ b/configure
@@ -1862,7 +1862,9 @@ if ! compile_object -Werror ; then
 fi
 
 ##
-# L2TPV3 probe
+# UDST probe
+# identical to L2TPv3 probe used for both
+# during migration of L2TPv3 to udst backend
 
 cat > $TMPC <
@@ -1870,9 +1872,9 @@ cat > $TMPC <> $config_host_mak
 fi
-if test "$l2tpv3" = "yes" ; then
-  echo "CONFIG_L2TPV3=y" >> $config_host_mak
+if test "$udst" = "yes" ; then
+  echo "CONFIG_UDST=y" >> $config_host_mak
 fi
 if test "$cap_ng" = "yes" ; then
   echo "CONFIG_LIBCAP=y" >> $config_host_mak
diff --git a/net/Makefile.objs b/net/Makefile.objs
index 67ba5e26fb..ffdfb96bd0 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -2,7 +2,7 @@ common-obj-y = net.o queue.o checksum.o util.o hub.o
 common-obj-y += socket.o
 common-obj-y += dump.o
 common-obj-y += eth.o
-common-obj-$(CONFIG_L2TPV3) += l2tpv3.o
+common-obj-$(CONFIG_UDST) += udst.o l2tpv3.o
 common-obj-$(CONFIG_POSIX) += vhost-user.o
 common-obj-$(CONFIG_SLIRP) += slirp.o
 common-obj-$(CONFIG_VDE) += vde.o
diff --git a/net/net.c b/net/net.c
index 0e28099554..723a256260 100644
--- a/net/net.c
+++ b/net/net.c
@@ -960,8 +960,8 @@ static int (* const net_client_init_fun[NET_CLIENT_DRIVER__MAX])(
 #ifdef CONFIG_VHOST_NET_USED
 [NET_CLIENT_DRIVER_VHOST_USER] = net_init_vhost_user,
 #endif
-#ifdef CONFIG_L2TPV3
-[NET_CLIENT_DRIVER_L2TPV3]= net_init_l2tpv3,
+#ifdef CONFIG_UDST
+[NET_CLIENT_DRIVER_L2TPV3] = net_init_l2tpv3,
 #endif
 };
 
diff --git a/net/udst.c b/net/udst.c
new file mode 100644
index 00..612c90cb3a
--- /dev/null
+++ b/net/udst.c
@@ -0,0 +1,420 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2015-2017 Cambridge Greys Limited
+ * Copyright (c) 2012-2014 Cisco Systems
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+/*
+ * Udst Datagram Socket Transport Backend
+ * This transport is not intended to be initiated directly by an end-user
+ * It is used as a backend for other transports which use recv/sendmmsg
+ * socket functions for RX/TX.
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include "net/net.h"
+#include "clients.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "udst.h"
+
+static void net_udst_send(void *opaque);
+static void udst_writable(void *opaque);
+
+static void udst_update_fd_handler(NetUdstState *s)
+{
+qemu_set_fd_handler(s->fd,
+s->read_poll ? net_udst_send : NULL,
+s->write_poll ? udst_writable : NULL,
+s);
+}
+
+static void udst_read_poll(NetUdstState *s, bool enable)
+{
+if (s->read_poll != enable) {
+s->read_poll = enable;
+udst_update_fd_handler(s);
+}
+}
+
+static void udst_write_poll(NetUdstState *s, bool enable)
+{
+if (s->write_poll != enable) {
+s->write_poll = enable;
+udst_update_fd_handler(s);
+}
+}
+
+static void udst_writable(void *opaque)
+{
+NetUdstState *s = opaque;
+udst_write_poll(s, false);
+qemu_flush_queued_packets(&s->nc);
+}
+
+static void 

Re: [Qemu-devel] [PATCH v2] hmp: allow cpu index for "info lapic"

2017-07-19 Thread Eduardo Habkost
On Wed, Jul 19, 2017 at 08:17:49PM +0100, Dr. David Alan Gilbert wrote:
> * Eduardo Habkost (ehabk...@redhat.com) wrote:
> > On Wed, Jul 19, 2017 at 10:17:36AM -0500, Eric Blake wrote:
> > > On 07/19/2017 10:07 AM, Daniel P. Berrange wrote:
> > > >> It doesn't.  Perhaps we should add that as a future libvirt-qemu.so API
> > > >> addition, although it's probably easier to just use QMP than HMP when
> > > >> using 'virsh qemu-monitor-command' if HMP doesn't do what you want.
> > > > 
> > > > Or special case the "cpu 1" command - ie notice that it is being
> > > > > requested and don't execute 'human-monitor-command'. Instead just
> > > > > record the CPU index, and use that for future "human-monitor-command"
> > > > > invocations, so we get full compat with the (dubious) stateful HMP
> > > > semantics that traditionally existed.
> > > 
> > > Is 'cpu' (and the followup commands affected by it) the only stateful
> > > HMP command pairing?  Is there a way to specify multiple HMP commands in
> > > a single human-monitor-command QMP call?
> > > 
> > > Indeed, tweaking qemu's human-monitor-command call to track the state
> > > might be cleaner than having libvirt have to tweak API to work around
> > > this wart of HMP.
> > 
> > The CPU index was the only state kept by the human monitor, and I
> > think it's by design that it stopped being considered "monitor
> > state" to be tracked, and became just an argument to
> > human-monitor-command.
> > 
> > It's true that it broke compatibility of
> >   "virsh qemu-monitor-command  --hmp 'cpu '",
> > when we moved to QMP, but this happened years ago, and it looks
> > like nobody was relying on it.  I don't see the point of trying
> > to emulate the previous stateful interface.
> 
> IMHO Yi's fix (once reworked) is the right fix - it removes the
> use of that piece of state, when the optional parameter is used.
> (OK, so it needs rework not to change that state and to
> come to some agreement as to what to use instead of cpu index number
> etc).

Agreed, as it helps us to keep the "virsh qemu-monitor-command"
interface simpler.  But we have 8 commands that use
mon_get_cpu(); we shouldn't fix only "info lapic".

-- 
Eduardo



Re: [Qemu-devel] [PULL 00/14] tcg-next patch queue

2017-07-19 Thread Peter Maydell
On 19 July 2017 at 05:57, Richard Henderson  wrote:
> This edition is a real mix:
>   * Code gen improvement for mips64 host (Jiang)
>   * Build fix for ppc-linux (Philippe)
>   * Runtime fix for tci (Philippe)
>   * Fix atomic helper names in debugging dumps (rth)
>
>   * Cross-target tcg code gen improvements (Philippe)
> This one had no obvious tree through which it should go,
> so I went ahead and took them all.
>
>   * Cherry-picked the first patch from Lluis' generic translate loop,
> wherein the interface to gen_intermediate_code changes trivially.
> It's the only patch from that series that touches all targets,
> and I see little point carrying it around further.
>
>
> r~
>
>
> The following changes since commit 6887dc6700ccb7820d8a9d370f421ee361c748e8:
>
>   Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20170718' into 
> staging (2017-07-18 21:13:48 +0100)
>
> are available in the git repository at:
>
>   git://github.com/rth7680/qemu.git tags/pull-tcg-20170718
>
> for you to fetch changes up to 3d48caee9e2c18385be60bb0467fa1f61d325c64:
>
>   tcg: Pass generic CPUState to gen_intermediate_code() (2017-07-18 14:26:13 
> -1000)
>
> 
> Queued tcg and tcg code gen related cleanups
>

The sparc-linux-user test fails:

/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc
-L ./gnemul/qemu-sparc sparc/ls -l dummyfile
Inconsistency detected by ld.so: rtld.c: 858: dl_main: Assertion
`_dl_rtld_map.l_prev->l_next == _dl_rtld_map.l_next' failed!
Makefile:6: recipe for target 'test' failed

A valgrind run produces a lot of noise, but
this bit looks suspicious:

==14436==
==14436== Conditional jump or move depends on uninitialised value(s)
==14436==at 0x60003F7C: tcg_out_qemu_st_direct (tcg-target.inc.c:1733)
==14436==by 0x60004295: tcg_out_qemu_st (tcg-target.inc.c:1856)
==14436==by 0x60004F0C: tcg_out_op (tcg-target.inc.c:2140)
==14436==by 0x6000B0FF: tcg_reg_alloc_op (tcg.c:2360)
==14436==by 0x6000BCED: tcg_gen_code (tcg.c:2679)
==14436==by 0x600387B7: tb_gen_code (translate-all.c:1311)
==14436==by 0x6003637B: tb_find (cpu-exec.c:367)
==14436==by 0x60036A7C: cpu_exec (cpu-exec.c:675)
==14436==by 0x60039DA1: cpu_loop (main.c:1088)
==14436==by 0x6003B7AF: main (main.c:4860)
==14436==
==14436== Invalid write of size 4
==14436==at 0x605114FA: ???
==14436==by 0x6011ADDF: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x6253464F: ???
==14436==by 0x6022852F: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x6022818C: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x6022852F: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==by 0x416: ???
==14436==by 0x60227F1F: ??? (in
/home/petmay01/linaro/qemu-for-merges/build/all-linux-static/sparc-linux-user/qemu-sparc)
==14436==  Address 0x59d1c7d0 is not stack'd, malloc'd or (recently) free'd
==14436==

Reverting "target/sparc: optimize gen_op_mulscc() using deposit op"
fixed this, so I think that's probably the culprit.

thanks
-- PMM



Re: [Qemu-devel] Fwd: [RFC PATCH 0/2] Allow RedHat PCI bridges reserve more buses than necessary during init

2017-07-19 Thread Marcel Apfelbaum

On 19/07/2017 21:56, Konrad Rzeszutek Wilk wrote:

On Wed, Jul 19, 2017 at 09:38:50PM +0300, Alexander Bezzubikov wrote:

2017-07-19 21:18 GMT+03:00 Konrad Rzeszutek Wilk :


On Wed, Jul 19, 2017 at 05:14:41PM +, Alexander Bezzubikov wrote:

Wed, 19 Jul 2017 at 16:57, Konrad Rzeszutek Wilk <konrad.w...@oracle.com>:



On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote:

Now PCI bridges (and PCIe root ports too) get a bus number range at
system init, based on the currently plugged devices. That's why, when one
wants to hotplug another bridge, it needs its own child bus, which the
parent is unable to provide.

Could you explain how you trigger this?



I'm trying to hot plug a pcie-pci bridge into a pcie root port, and Linux
says 'cannot allocate bus number for device bla-bla'. This obviously does
not allow me to use the bridge at all.





The suggested workaround is to have a vendor-specific capability in the
RedHat generic pcie-root-port that contains the number of additional
buses to reserve on BIOS PCI init.


But wouldn't the proper fix be for the PCI bridge to have the subordinate
value be extended to fit more bus ranges?



What do you mean? This is what I'm trying to do. Do you propose getting
rid of the vendor-specific cap and using the original register value
instead?


I would suggest a simple fix - each bridge has a number of bus devices
it can use. You have up to 255 - so you split the number of northbridge
numbers by the amount of NUMA nodes (if that is used) - so for example
if you have 4 NUMA nodes, each bridge would cover 63 bus numbers.

Meaning the root bridge would cover 0->63 bus, 64->128, and so on.
That gives you enough space to plug in your plugged in devices
(up to 63).

And if you need sub-bridges then carve out a specific range.
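
The even split described above can be sketched roughly as below. This is
an illustration only: split_bus_range() and struct bus_range are
hypothetical names, and the sketch assumes a single PCI segment with 256
bus numbers (0..255) and inclusive ranges, so the boundaries come out
slightly different from the rough figures quoted above.

```c
#include <stdint.h>

#define PCI_BUS_COUNT 256u   /* bus numbers 0..255 in one segment (assumed) */

/* Inclusive range of bus numbers handed to one root bridge. */
struct bus_range {
    uint8_t first;
    uint8_t last;
};

/*
 * Give each of `nodes` root bridges an equal, contiguous slice of the bus
 * number space. Returns 0 on success, -1 on invalid arguments. Any
 * remainder (when 256 is not divisible by `nodes`) is left unassigned.
 */
static int split_bus_range(unsigned nodes, unsigned node,
                           struct bus_range *out)
{
    unsigned per_node;

    if (nodes == 0 || nodes > PCI_BUS_COUNT || node >= nodes) {
        return -1;
    }
    per_node = PCI_BUS_COUNT / nodes;
    out->first = (uint8_t)(node * per_node);
    out->last = (uint8_t)(node * per_node + per_node - 1);
    return 0;
}
```

With 4 nodes this yields 0-63, 64-127, 128-191 and 192-255; sub-bridges
would then have to be carved out of whichever slice their parent owns.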





Hi Konrad,


The problem is that we don't know at the init moment how many subbridges we
may need,




It is possible the explanation was not clear and led to
some miscommunication.

   
And the explanation above does not either. It just sets up, at init time,
a range where you can plug in your new devices, but in a more uniform
way, such that you can also utilize this with NUMA and _PXM topology
in the future.



I fully agree with you, and actually QEMU has already implemented the
exact idea you are describing here; it's called a pxb/pxb-pci device,
which can be "bound" to a specific NUMA node and has a subrange of bus
numbers dedicated to it.

However this problem is different. In a PCI Express machine you
can hotplug PCIe devices only into PCIe Root Ports (or switch
downstream ports, but not in current scope).

We want to be able to hotplug a PCIe-PCI bridge into a PCIe Root Port
so we can then hot-plug legacy PCI devices.

Since the PCIe Root Port is a type of PCI bridge, at boot time
it only gets the bus sub-range (primary bus,subordinate bus]
which is computed by firmware and leaves no bus number that
can be used by a hot-plugged pci-bridge. And this obviously
does not depend on how we arrange NUMA/proximities.
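
To make the constraint concrete, here is a rough sketch in plain C. It is
not SeaBIOS or QEMU code: struct bridge_window, assign_window() and
can_hotplug_bridge() are hypothetical names, and the model simply assumes
that firmware sets the subordinate bus to cover the buses found below the
bridge at boot, plus an optional extra reservation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus numbers programmed into a bridge (a PCIe Root Port is one). */
struct bridge_window {
    uint8_t secondary;    /* bus directly behind the bridge */
    uint8_t subordinate;  /* highest bus number reachable through it */
};

/*
 * Firmware-style assignment (assumed model): the window covers the
 * secondary bus, every child bus found at boot, and `extra_reserved`
 * spare bus numbers. The result is clamped to bus 255.
 */
static struct bridge_window assign_window(uint8_t secondary,
                                          unsigned child_buses,
                                          unsigned extra_reserved)
{
    struct bridge_window w;
    unsigned last = secondary + child_buses + extra_reserved;

    w.secondary = secondary;
    w.subordinate = (uint8_t)(last > 255 ? 255 : last);
    return w;
}

/*
 * A hot-plugged bridge needs one unused bus number inside the parent's
 * window to use as its own secondary bus.
 */
static bool can_hotplug_bridge(const struct bridge_window *w,
                               unsigned child_buses_in_use)
{
    unsigned behind = (unsigned)w->subordinate - w->secondary;
    return behind > child_buses_in_use;
}
```

An empty root port gets secondary == subordinate, so nothing is left for
a hot-plugged bridge; reserving even one extra bus number at init changes
that, independently of how the ranges are arranged across NUMA nodes.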

We are also not looking for a fix for a specific guest OS,
so reserving some extra bus numbers has minimal impact
on the system. I do agree the problem may be solved differently,
however we can't reach all guest OS vendors and ask them to
support an alternative solution in a reasonable time frame.

Thanks,
Marcel


and how deep the whole device tree will be. The key point: PCI bridge
hotplugging needs either a rescan of all buses on each bridge device
addition, or space reserved in advance during BIOS init.


It is more complex than that - you may need to move devices that are
below you. And neither the Linux kernel nor any other OS can handle that.
(They can during bootup.)


In this series the second way was chosen.











Aleksandr Bezzubikov (2):
   pci: add support for direct usage of bdf for capability lookup
   pci: enable RedHat pci bridges to reserve more buses

  src/fw/pciinit.c   | 12 ++--
  src/hw/pcidevice.c | 24 
  src/hw/pcidevice.h |  1 +
  3 files changed, 35 insertions(+), 2 deletions(-)

--
2.7.4





--
Alexander Bezzubikov






--
Alexander Bezzubikov





Re: [Qemu-devel] [PATCH v2] hmp: allow cpu index for "info lapic"

2017-07-19 Thread Dr. David Alan Gilbert
* Eduardo Habkost (ehabk...@redhat.com) wrote:
> On Wed, Jul 19, 2017 at 10:17:36AM -0500, Eric Blake wrote:
> > On 07/19/2017 10:07 AM, Daniel P. Berrange wrote:
> > >> It doesn't.  Perhaps we should add that as a future libvirt-qemu.so API
> > >> addition, although it's probably easier to just use QMP than HMP when
> > >> using 'virsh qemu-monitor-command' if HMP doesn't do what you want.
> > > 
> > > Or special case the "cpu 1" command - ie notice that it is being
> > > > requested and don't execute 'human-monitor-command'. Instead just
> > > > record the CPU index, and use that for future "human-monitor-command"
> > > > invocations, so we get full compat with the (dubious) stateful HMP
> > > semantics that traditionally existed.
> > 
> > Is 'cpu' (and the followup commands affected by it) the only stateful
> > HMP command pairing?  Is there a way to specify multiple HMP commands in
> > a single human-monitor-command QMP call?
> > 
> > Indeed, tweaking qemu's human-monitor-command call to track the state
> > might be cleaner than having libvirt have to tweak API to work around
> > this wart of HMP.
> 
> The CPU index was the only state kept by the human monitor, and I
> think it's by design that it stopped being considered "monitor
> state" to be tracked, and became just an argument to
> human-monitor-command.
> 
> It's true that it broke compatibility of
>   "virsh qemu-monitor-command  --hmp 'cpu '",
> when we moved to QMP, but this happened years ago, and it looks
> like nobody was relying on it.  I don't see the point of trying
> to emulate the previous stateful interface.

IMHO Yi's fix (once reworked) is the right fix - it removes the
use of that piece of state, when the optional parameter is used.
(OK, so it needs rework not to change that state and to
come to some agreement as to what to use instead of cpu index number
etc).

Dave

> -- 
> Eduardo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] Fwd: [RFC PATCH 0/2] Allow RedHat PCI bridges reserve more buses than necessary during init

2017-07-19 Thread Konrad Rzeszutek Wilk
On Wed, Jul 19, 2017 at 09:38:50PM +0300, Alexander Bezzubikov wrote:
> 2017-07-19 21:18 GMT+03:00 Konrad Rzeszutek Wilk :
> 
> > On Wed, Jul 19, 2017 at 05:14:41PM +, Alexander Bezzubikov wrote:
> > > Wed, 19 Jul 2017 at 16:57, Konrad Rzeszutek Wilk <
> > konrad.w...@oracle.com>:
> > >
> > > > On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote:
> > > > > Now PCI bridges (and PCIE root port too) get a bus range number in
> > > > system init,
> > > > > basing on currently plugged devices. That's why when one wants to
> > > > hotplug another bridge,
> > > > > it needs his child bus, which the parent is unable to provide.
> > > >
> > > > Could you explain how you trigger this?
> > >
> > >
> > > I'm trying to hot plug pcie-pci bridge into pcie root port, and Linux
> > says
> > > 'cannot allocate bus number for device bla-bla'. This obviously does not
> > > allow me to use the bridge at all.
> > >
> > > >
> > > >
> > > > > The suggested workaround is to have vendor-specific capability in
> > RedHat
> > > > generic pcie-root-port
> > > > > that contains number of additional bus to reserve on BIOS PCI init.
> > > >
> > > > But wouldn't the proper fix be for the PCI bridge to have the
> > subordinate
> > > > value be extended to fit more bus ranges?
> > >
> > >
> > > What do you mean? This is what I'm trying to do. Do you suppose to get
> > rid
> > > of vendor-specific cap and use original register value instead of it?
> >
> > I would suggest a simple fix - each bridge has a number of bus devices
> > it can use. You have up to 255 - so you split the number of northbridge
> > numbers by the amount of NUMA nodes (if that is used) - so for example
> > if you have 4 NUMA nodes, each bridge would cover 63 bus numbers.
> >
> > Meaning the root bridge would cover 0->63 bus, 64->128, and so on.
> > That gives you enough space to plug in your plugged in devices
> > (up to 63).
> >
> > And if you need sub-bridges then carve out a specific range.
> >
> 
> The problem is that we don't know at the init moment how many subbridges we
> may need,


  
And the explanation above does not either. It just sets up at init time
a range where you can plug in your new devices. But in a more uniform
way such that you can also utilize this with NUMA and _PXM topology
in the future.

> and how deep the whole device tree will be. The key moment - PCI bridge
> hotplugging
> needs either rescan all buses on each bridge device addition, or reserve
> space in advance during BIOS init.

It is more complex than that - you may need to move devices that are
below you. And neither the Linux kernel nor any other OS can handle that.
(They can during bootup.)

> In this series the second way was chosen.
> 
> 
> >
> >
> > >
> > > >
> > > > >
> > > > > Aleksandr Bezzubikov (2):
> > > > >   pci: add support for direct usage of bdf for capability lookup
> > > > >   pci: enable RedHat pci bridges to reserve more buses
> > > > >
> > > > >  src/fw/pciinit.c   | 12 ++--
> > > > >  src/hw/pcidevice.c | 24 
> > > > >  src/hw/pcidevice.h |  1 +
> > > > >  3 files changed, 35 insertions(+), 2 deletions(-)
> > > > >
> > > > > --
> > > > > 2.7.4
> > > > >
> > > > >
> > > >
> > > --
> > > Alexander Bezzubikov
> >
> 
> 
> 
> -- 
> Alexander Bezzubikov



Re: [Qemu-devel] [PATCH v2 1/3] qemu.py: fix is_running()

2017-07-19 Thread Eduardo Habkost
On Wed, Jul 19, 2017 at 03:34:47PM -0300, Eduardo Habkost wrote:
> On Wed, Jul 19, 2017 at 06:31:06PM +0200, Amador Pahim wrote:
> > Current implementation is broken. It does not really test if the child
> > process is running.
> > 
> > The Popen.returncode will only be set by a poll(), wait() or
> > communicate() call. If the Popen fails to launch a VM, the Popen.returncode
> > will not change from None by itself.
> > 
> > Instead of using Popen.returncode, let's use Popen.poll(), which
> > actually checks if child process has terminated.
> > 
> > Signed-off-by: Amador Pahim 
> 
> I vaguely remember I had a version of that code using poll() and
> it broke scripts for some reason.  I will try to find out why, so
> we can either fix the script or document the reason why poll()
> isn't a good choice here.

Thanks to git reflog, I found the original "fix" I had in my WIP
tree:

251fc73 work/device-crash-script@{71}: commit: fixup! qemu.py: Don't set _popen=None on error/shutdown
diff --git a/scripts/qemu.py b/scripts/qemu.py
index 4dae811..cbc9e2a 100644
--- a/scripts/qemu.py
+++ b/scripts/qemu.py
@@ -86,7 +86,7 @@ class QEMUMachine(object):
             raise

     def is_running(self):
-        return self._popen and (self._popen.poll() is None)
+        return self._popen and (self._popen.returncode is None)

     def exitcode(self):
         if self._popen:
@@ -137,6 +137,7 @@ class QEMUMachine(object):
         except:
             if self.is_running():
                 self._popen.kill()
+                self._popen.wait()
             self._load_io_log()
             self._post_shutdown()
             raise

The original bug was like this: if the QEMU process took a little
longer to actually terminate after self._popen.kill() was
called, it triggered the post-shutdown code inside shutdown()
(because is_running() was still True), causing the following
exception:

Traceback (most recent call last):
  File "./scripts/device-crash-test.py", line 528, in <module>
    sys.exit(main())
  File "./scripts/device-crash-test.py", line 487, in main
    f = checkOneCase(args, t)
  File "./scripts/device-crash-test.py", line 320, in checkOneCase
    vm.shutdown()
  File "/home/ehabkost/rh/proj/virt/qemu/scripts/qemu.py", line 156, in shutdown
    self._load_io_log()
  File "/home/ehabkost/rh/proj/virt/qemu/scripts/qemu.py", line 101, in _load_io_log
    with open(self._qemu_log_path, "r") as fh:
IOError: [Errno 2] No such file or directory: '/var/tmp/qemu-23568.log'


My fix was incorrect: the actual bug was the missing
self._popen.wait() call after self._popen.kill(), not the
self._popen.poll() call.  Your fix looks good and
device-crash-test is not crashing.
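
For anyone following along, the difference between the two checks can
be reproduced outside qemu.py with a small standalone script. This is
only a sketch, assuming Linux and Python 3 (os.waitid with WNOWAIT is
used purely to make the demonstration deterministic); the variable
names are made up for the example and none of this is code from the
patch.

```python
import os
import subprocess
import sys

# A child that exits immediately.
child = subprocess.Popen([sys.executable, "-c", "pass"])

# Block until the child has exited, but leave it unreaped (WNOWAIT), so the
# Popen object itself has not been told anything yet.
os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)

# returncode is only a cached value: it is still None at this point.
running_by_returncode = child.returncode is None

# poll() actually asks the OS, notices the exit and fills in returncode.
running_by_poll = child.poll() is None

# kill() only sends the signal; wait() is what guarantees the process is
# gone before any post-shutdown cleanup runs.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
sleeper.kill()
sleeper.wait()
running_after_kill = sleeper.poll() is None
```

Here running_by_returncode ends up True even though the child is
already dead, while running_by_poll and running_after_kill are both
False.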

Reviewed-by: Eduardo Habkost 

> 
> > ---
> >  scripts/qemu.py | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/scripts/qemu.py b/scripts/qemu.py
> > index 880e3e8219..f0fade32bd 100644
> > --- a/scripts/qemu.py
> > +++ b/scripts/qemu.py
> > @@ -86,7 +86,7 @@ class QEMUMachine(object):
> >              raise
> >  
> >      def is_running(self):
> > -        return self._popen and (self._popen.returncode is None)
> > +        return self._popen and (self._popen.poll() is None)
> >  
> >      def exitcode(self):
> >          if self._popen is None:
> > -- 
> > 2.13.3
> > 
> 
> -- 
> Eduardo

-- 
Eduardo



Re: [Qemu-devel] [PATCH] ide: check BlockBackend object in ide_cancel_dma_sync

2017-07-19 Thread John Snow


On 07/14/2017 06:00 AM, P J P wrote:
> From: Prasad J Pandit 
> 
> When cancelling pending DMA requests in ide_cancel_dma_sync,
> the s->blk object could be null, leading to a null dereference.
> Add check to avoid it.
> 
> Reported-by: Chensongnian 
> Signed-off-by: Prasad J Pandit 
> ---
>  hw/ide/core.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 0b48b64..04474b3 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -681,8 +681,10 @@ void ide_cancel_dma_sync(IDEState *s)
>  #ifdef DEBUG_IDE
>          printf("%s: draining all remaining requests", __func__);
>  #endif
> -        blk_drain(s->blk);
> -        assert(s->bus->dma->aiocb == NULL);
> +        if (s->blk) {
> +            blk_drain(s->blk);
> +            assert(s->bus->dma->aiocb == NULL);
> +        }
>      }
>  }
>  
> 

I guess this occurs through

ide_exec_cmd
  cmd_device_reset
    ide_cancel_dma_sync

though if s->blk does not exist, we should usually not be able to
address this device with a reset command as such. (core.c:2021) -- but
this is only for secondary devices. I guess we don't guard against
nonexistent primary devices..?

Further, how do we have s->bus->dma->aiocb if there's no blk device?
What DMA request did we accept...?

Can you please submit a stack that illustrates the code path followed so
this fix can be properly verified and tested?

Thanks,
--John



Re: [Qemu-devel] [PATCH v5 10/17] migration: Create ram_multifd_page

2017-07-19 Thread Dr. David Alan Gilbert
* Juan Quintela (quint...@redhat.com) wrote:
> The function still doesn't use multifd, but we have simplified
> ram_save_page, xbzrle and RDMA stuff is gone.  We have added a new
> counter and a new flag for this type of pages.
> 
> Signed-off-by: Juan Quintela 
> ---
>  hmp.c                 |  2 ++
>  migration/migration.c |  1 +
>  migration/ram.c       | 90 ++-
>  qapi-schema.json      |  5 ++-
>  4 files changed, 96 insertions(+), 2 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index b01605a..eeb308b 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -234,6 +234,8 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
>              monitor_printf(mon, "postcopy request count: %" PRIu64 "\n",
>                             info->ram->postcopy_requests);
>          }
> +        monitor_printf(mon, "multifd: %" PRIu64 " pages\n",
> +                       info->ram->multifd);
>      }
>  
>      if (info->has_disk) {
> diff --git a/migration/migration.c b/migration/migration.c
> index e1c79d5..d9d5415 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -528,6 +528,7 @@ static void populate_ram_info(MigrationInfo *info, MigrationState *s)
>      info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
>      info->ram->postcopy_requests = ram_counters.postcopy_requests;
>      info->ram->page_size = qemu_target_page_size();
> +    info->ram->multifd = ram_counters.multifd;
>  
>      if (migrate_use_xbzrle()) {
>          info->has_xbzrle_cache = true;
> diff --git a/migration/ram.c b/migration/ram.c
> index b80f511..2bf3fa7 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -68,6 +68,7 @@
>  #define RAM_SAVE_FLAG_XBZRLE   0x40
>  /* 0x80 is reserved in migration.h start with 0x100 next */
>  #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
> +#define RAM_SAVE_FLAG_MULTIFD_PAGE 0x200
>  
>  static inline bool is_zero_range(uint8_t *p, uint64_t size)
>  {
> @@ -362,12 +363,17 @@ static void compress_threads_save_setup(void)
>  /* Multiple fd's */
>  
>  struct MultiFDSendParams {
> +    /* not changed */
>      uint8_t id;
>      QemuThread thread;
>      QIOChannel *c;
>      QemuSemaphore sem;
>      QemuMutex mutex;
> +    /* protected by param mutex */
>      bool quit;

Should probably comment to say what address space address is in - this
is really a qemu pointer - and that's why we can treat 0 as special?

> +    uint8_t *address;
> +    /* protected by multifd mutex */
> +    bool done;

done needs a comment to explain what it is because
it sounds similar to quit;  I think 'done' is saying that
the thread is idle having done what was asked?

>  };
>  typedef struct MultiFDSendParams MultiFDSendParams;
>  
> @@ -375,6 +381,8 @@ struct {
>      MultiFDSendParams *params;
>      /* number of created threads */
>      int count;
> +    QemuMutex mutex;
> +    QemuSemaphore sem;
>  } *multifd_send_state;
>  
>  static void terminate_multifd_send_threads(void)
> @@ -443,6 +451,7 @@ static void *multifd_send_thread(void *opaque)
>      } else {
>          qio_channel_write(p->c, string, MULTIFD_UUID_MSG, &error_abort);
>      }
> +    qemu_sem_post(&multifd_send_state->sem);
>  
>      while (!exit) {
>          qemu_mutex_lock(&p->mutex);
> @@ -450,6 +459,15 @@ static void *multifd_send_thread(void *opaque)
>              qemu_mutex_unlock(&p->mutex);
>              break;
>          }
> +        if (p->address) {
> +            p->address = 0;
> +            qemu_mutex_unlock(&p->mutex);
> +            qemu_mutex_lock(&multifd_send_state->mutex);
> +            p->done = true;
> +            qemu_mutex_unlock(&multifd_send_state->mutex);
> +            qemu_sem_post(&multifd_send_state->sem);
> +            continue;
> +        }
>          qemu_mutex_unlock(&p->mutex);
>          qemu_sem_wait(&p->sem);
>      }
> @@ -469,6 +487,8 @@ int multifd_save_setup(void)
>      multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
>      multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
>      multifd_send_state->count = 0;
> +    qemu_mutex_init(&multifd_send_state->mutex);
> +    qemu_sem_init(&multifd_send_state->sem, 0);
>      for (i = 0; i < thread_count; i++) {
>          char thread_name[16];
>          MultiFDSendParams *p = &multifd_send_state->params[i];
> @@ -477,6 +497,8 @@ int multifd_save_setup(void)
>          qemu_sem_init(&p->sem, 0);
>          p->quit = false;
>          p->id = i;
> +        p->done = true;
> +        p->address = 0;
>          p->c = socket_send_channel_create();
>          if (!p->c) {
>              error_report("Error creating a send channel");
> @@ -491,6 +513,30 @@ int multifd_save_setup(void)
>      return 0;
>  }
>  
> +static int multifd_send_page(uint8_t *address)
> +{
> +    int i;
> +    MultiFDSendParams *p = NULL; /* make happy gcc */
> +
> +    qemu_sem_wait(&multifd_send_state->sem);
> +    qemu_mutex_lock(&multifd_send_state->mutex);
> +    for (i = 0; i < multifd_send_state->count; i++) {
> +        p = 

Re: [Qemu-devel] [RFC PATCH 0/2] Allow RedHat PCI bridges reserve more buses than necessary during init

2017-07-19 Thread Konrad Rzeszutek Wilk
On Wed, Jul 19, 2017 at 05:14:41PM +, Alexander Bezzubikov wrote:
> Wed, 19 Jul 2017 at 16:57, Konrad Rzeszutek Wilk :
> 
> > On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote:
> > > Now PCI bridges (and PCIE root port too) get a bus range number in
> > system init,
> > > basing on currently plugged devices. That's why when one wants to
> > hotplug another bridge,
> > > it needs his child bus, which the parent is unable to provide.
> >
> > Could you explain how you trigger this?
> 
> 
> I'm trying to hot plug pcie-pci bridge into pcie root port, and Linux says
> 'cannot allocate bus number for device bla-bla'. This obviously does not
> allow me to use the bridge at all.
> 
> >
> >
> > > The suggested workaround is to have vendor-specific capability in RedHat
> > generic pcie-root-port
> > > that contains number of additional bus to reserve on BIOS PCI init.
> >
> > But wouldn't the proper fix be for the PCI bridge to have the subordinate
> > value be extended to fit more bus ranges?
> 
> 
> What do you mean? This is what I'm trying to do. Do you suppose to get rid
> of vendor-specific cap and use original register value instead of it?

I would suggest a simple fix - each bridge has a number of bus devices
it can use. You have up to 255 - so you split the number of northbridge
numbers by the amount of NUMA nodes (if that is used) - so for example
if you have 4 NUMA nodes, each bridge would cover 63 bus numbers.

Meaning the root bridge would cover 0->63 bus, 64->128, and so on.
That gives you enough space to plug in your plugged in devices
(up to 63).

And if you need sub-bridges then carve out a specific range.
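
As a rough illustration of the split suggested above, here is a short
Python sketch. It assumes a single PCI segment with 256 bus numbers
(0 to 255) divided evenly; split_bus_numbers is a hypothetical helper
name, not an existing firmware function. Note that an exact even split
for 4 nodes gives 64 bus numbers per node, so the figures quoted in the
text are approximate.

```python
def split_bus_numbers(numa_nodes, total_buses=256):
    """Return an inclusive (first, last) bus range for each NUMA node.

    Any remainder left over when total_buses is not divisible by
    numa_nodes is simply left unassigned in this sketch.
    """
    per_node = total_buses // numa_nodes
    return [(n * per_node, (n + 1) * per_node - 1) for n in range(numa_nodes)]


ranges = split_bus_numbers(4)
```

With 4 nodes the ranges are 0-63, 64-127, 128-191 and 192-255; any
sub-bridge range would then have to be carved out of one of these.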


> 
> >
> > >
> > > Aleksandr Bezzubikov (2):
> > >   pci: add support for direct usage of bdf for capability lookup
> > >   pci: enable RedHat pci bridges to reserve more buses
> > >
> > >  src/fw/pciinit.c   | 12 ++--
> > >  src/hw/pcidevice.c | 24 
> > >  src/hw/pcidevice.h |  1 +
> > >  3 files changed, 35 insertions(+), 2 deletions(-)
> > >
> > > --
> > > 2.7.4
> > >
> > >
> >
> -- 
> Alexander Bezzubikov



[Qemu-devel] Fwd: [RFC PATCH 0/2] Allow RedHat PCI bridges reserve more buses than necessary during init

2017-07-19 Thread Alexander Bezzubikov
2017-07-19 21:18 GMT+03:00 Konrad Rzeszutek Wilk :

> On Wed, Jul 19, 2017 at 05:14:41PM +, Alexander Bezzubikov wrote:
> > Wed, 19 Jul 2017 at 16:57, Konrad Rzeszutek Wilk <
> konrad.w...@oracle.com>:
> >
> > > On Wed, Jul 19, 2017 at 04:20:12PM +0300, Aleksandr Bezzubikov wrote:
> > > > Now PCI bridges (and PCIE root port too) get a bus range number in
> > > system init,
> > > > basing on currently plugged devices. That's why when one wants to
> > > hotplug another bridge,
> > > > it needs his child bus, which the parent is unable to provide.
> > >
> > > Could you explain how you trigger this?
> >
> >
> > I'm trying to hot plug pcie-pci bridge into pcie root port, and Linux
> says
> > 'cannot allocate bus number for device bla-bla'. This obviously does not
> > allow me to use the bridge at all.
> >
> > >
> > >
> > > > The suggested workaround is to have vendor-specific capability in
> RedHat
> > > generic pcie-root-port
> > > > that contains number of additional bus to reserve on BIOS PCI init.
> > >
> > > But wouldn't the proper fix be for the PCI bridge to have the
> subordinate
> > > value be extended to fit more bus ranges?
> >
> >
> > What do you mean? This is what I'm trying to do. Do you suppose to get
> rid
> > of vendor-specific cap and use original register value instead of it?
>
> I would suggest a simple fix - each bridge has a number of bus devices
> it can use. You have up to 255 - so you split the number of northbridge
> numbers by the amount of NUMA nodes (if that is used) - so for example
> if you have 4 NUMA nodes, each bridge would cover 63 bus numbers.
>
> Meaning the root bridge would cover 0->63 bus, 64->128, and so on.
> That gives you enough space to plug in your plugged in devices
> (up to 63).
>
> And if you need sub-bridges then carve out a specific range.
>

The problem is that we don't know at init time how many sub-bridges we
may need, or how deep the whole device tree will be. The key point: PCI
bridge hotplug requires either rescanning all buses on each bridge
addition, or reserving space in advance during BIOS init.
In this series the second approach was chosen.


>
>
> >
> > >
> > > >
> > > > Aleksandr Bezzubikov (2):
> > > >   pci: add support for direct usage of bdf for capability lookup
> > > >   pci: enable RedHat pci bridges to reserve more buses
> > > >
> > > >  src/fw/pciinit.c   | 12 ++--
> > > >  src/hw/pcidevice.c | 24 
> > > >  src/hw/pcidevice.h |  1 +
> > > >  3 files changed, 35 insertions(+), 2 deletions(-)
> > > >
> > > > --
> > > > 2.7.4
> > > >
> > > >
> > >
> > --
> > Alexander Bezzubikov
>



-- 
Alexander Bezzubikov


Re: [Qemu-devel] [PATCH v2] hmp: allow cpu index for "info lapic"

2017-07-19 Thread Eduardo Habkost
On Wed, Jul 19, 2017 at 10:17:36AM -0500, Eric Blake wrote:
> On 07/19/2017 10:07 AM, Daniel P. Berrange wrote:
> >> It doesn't.  Perhaps we should add that as a future libvirt-qemu.so API
> >> addition, although it's probably easier to just use QMP than HMP when
> >> using 'virsh qemu-monitor-command' if HMP doesn't do what you want.
> > 
> > Or special case the "cpu 1" command - ie notice that it is being
> > requested and don't execute 'human-monitor-command'. Instead just
> > record the CPU index, and use that for future "human-monitor-command"
> > invocations, so we get full compat with the (dubious) stateful HMP
> > semantics that traditionally existed.
> 
> Is 'cpu' (and the followup commands affected by it) the only stateful
> HMP command pairing?  Is there a way to specify multiple HMP commands in
> a single human-monitor-command QMP call?
> 
> Indeed, tweaking qemu's human-monitor-command call to track the state
> might be cleaner than having libvirt have to tweak API to work around
> this wart of HMP.

The CPU index was the only state kept by the human monitor, and I
think it's by design that it stopped being considered "monitor
state" to be tracked, and became just an argument to
human-monitor-command.

It's true that it broke compatibility of
  "virsh qemu-monitor-command  --hmp 'cpu '",
when we moved to QMP, but this happened years ago, and it looks
like nobody was relying on it.  I don't see the point of trying
to emulate the previous stateful interface.
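
To illustrate what "just an argument" means here, the following Python
sketch builds the QMP request for an HMP command with an explicit CPU.
The QMP command human-monitor-command and its command-line and
cpu-index arguments are the existing interface; the helper name
hmp_request and the choice of "info lapic" with CPU 1 are only
assumptions for the example.

```python
import json


def hmp_request(command_line, cpu_index=None):
    """Build a QMP 'human-monitor-command' request as a JSON string.

    cpu_index is passed per call, so no monitor-side CPU state is needed.
    """
    arguments = {"command-line": command_line}
    if cpu_index is not None:
        arguments["cpu-index"] = cpu_index
    request = {"execute": "human-monitor-command", "arguments": arguments}
    return json.dumps(request, sort_keys=True)


request = hmp_request("info lapic", cpu_index=1)
```

Each request carries its own CPU index, so two consecutive calls for
different CPUs do not interfere with each other.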

-- 
Eduardo



Re: [Qemu-devel] [PULL 0/8] target/alpha cleanups

2017-07-19 Thread Peter Maydell
On 19 July 2017 at 05:45, Richard Henderson  wrote:
> The new title holder for perf top is helper_lookup_tb_ptr.
> Those targets that have a complicated cpu_get_tb_cpu_state
> function are going to regret that.
>
> This cleans up the Alpha version of that function such that it is
> just two loads and one mask.  Which is one practically-free mask
> away from being as minimal as one can get.
>
> Also, in anticipation of LLuis' generic translation loop, fix all
> of the temporary leaks.  They all seem to have been on insns that
> end the TB, so in practice they weren't harmful, but...
>
>
> r~
>
>
> The following changes since commit 6887dc6700ccb7820d8a9d370f421ee361c748e8:
>
>   Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20170718' into staging (2017-07-18 21:13:48 +0100)
>
> are available in the git repository at:
>
>   git://github.com/rth7680/qemu.git tags/pull-axp-20170718
>
> for you to fetch changes up to 8aa5c65fd3d4612d8ab690bef0980d26f30f381d:
>
>   target/alpha: Log temp leaks (2017-07-18 18:42:05 -1000)
>
> 
> Queued target/alpha patches
>

Applied, thanks.

-- PMM


