Re: [Qemu-devel] [RFC PATCH 12/13] intel_iommu: do replay when context invalidate

2016-12-28 Thread Liu, Yi L
> Before this one we only invalidate context cache when we receive context
> entry invalidations. However it's possible that the invalidation also
> contains a domain switch (only if cache-mode is enabled for vIOMMU). In
> that case we need to notify all the registered components about the new
> mapping.
> 
> Signed-off-by: Peter Xu 
> ---
>  hw/i386/intel_iommu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 2fcd7af..0220e63 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1188,6 +1188,7 @@ static void 
> vtd_context_device_invalidate(IntelIOMMUState 
> *s,
>  trace_vtd_inv_desc_cc_device(bus_n, (devfn_it >> 3) & 0x1f,
>   devfn_it & 3);
>  vtd_as->context_cache_entry.context_cache_gen = 0;
> +memory_region_iommu_replay_all(&vtd_as->iommu);

Hi Peter,

It looks like every device context invalidation would result in a replay, even
if the device is not an assigned device. Is it necessary to do a replay for a
virtual device?

Regards,
Yi L
>  }
>  }
>  }
> -- 
> 2.7.4



Re: [Qemu-devel] [RFC PATCH 11/13] intel_iommu: provide its own replay() callback

2016-12-28 Thread Liu, Yi L
> The default replay() doesn't work for VT-d, since VT-d has a huge
> default memory region which covers the address range 0-(2^64-1). This
> normally causes a dead loop when the guest starts.
> 
> The solution is simple - we don't walk over all the regions. Instead, we
> jump over the regions when we find that the page directories are empty.
> This greatly reduces the time needed to walk the whole region.
> 
> To achieve this, we provide a page walk helper that invokes the
> corresponding hook function when we find a page we are interested in.
> vtd_page_walk_level() is the core logic for the page walking. Its
> interface is designed to suit further use cases, e.g., to invalidate a
> range of addresses.
> 
> Signed-off-by: Peter Xu 
> ---
>  hw/i386/intel_iommu.c | 212 
> --
>  hw/i386/trace-events  |   8 ++
>  include/exec/memory.h |   2 +
>  3 files changed, 217 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 46b8a2f..2fcd7af 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -620,6 +620,22 @@ static inline uint32_t 
> vtd_get_agaw_from_context_entry(VTDContextEntry *ce)
>  return 30 + (ce->hi & VTD_CONTEXT_ENTRY_AW) * 9;
>  }
>  
> +static inline uint64_t vtd_iova_limit(VTDContextEntry *ce)
> +{
> +uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
> +return 1ULL << MIN(ce_agaw, VTD_MGAW);
> +}
> +
> +/* Return true if IOVA passes range check, otherwise false. */
> +static inline bool vtd_iova_range_check(uint64_t iova, VTDContextEntry *ce)
> +{
> +/*
> + * Check if @iova is above 2^X-1, where X is the minimum of MGAW
> + * in CAP_REG and AW in context-entry.
> + */
> +return !(iova & ~(vtd_iova_limit(ce) - 1));
> +}
> +
>  static const uint64_t vtd_paging_entry_rsvd_field[] = {
>  [0] = ~0ULL,
>  /* For not large page */
> @@ -656,13 +672,9 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, 
> uint64_t 
> iova,
>  uint32_t level = vtd_get_level_from_context_entry(ce);
>  uint32_t offset;
>  uint64_t slpte;
> -uint32_t ce_agaw = vtd_get_agaw_from_context_entry(ce);
>  uint64_t access_right_check = 0;
>  
> -/* Check if @iova is above 2^X-1, where X is the minimum of MGAW
> - * in CAP_REG and AW in context-entry.
> - */
> -if (iova & ~((1ULL << MIN(ce_agaw, VTD_MGAW)) - 1)) {
> +if (!vtd_iova_range_check(iova, ce)) {
>  error_report("IOVA 0x%"PRIx64 " exceeds limits", iova);
>  return -VTD_FR_ADDR_BEYOND_MGAW;
>  }
> @@ -718,6 +730,166 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, 
> uint64_t iova,
>  }
>  }
>  
> +typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> +
> +/**
> + * vtd_page_walk_level - walk over specific level for IOVA range
> + *
> + * @addr: base GPA addr to start the walk
> + * @start: IOVA range start address
> + * @end: IOVA range end address (start <= addr < end)
> + * @hook_fn: hook func to be called when detected page
> + * @private: private data to be passed into hook func
> + * @read: whether parent level has read permission
> + * @write: whether parent level has write permission
> + * @skipped: accumulated skipped ranges
> + * @notify_unmap: whether we should notify invalid entries
> + */
> +static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
> +   uint64_t end, vtd_page_walk_hook hook_fn,
> +   void *private, uint32_t level,
> +   bool read, bool write, uint64_t *skipped,
> +   bool notify_unmap)
> +{
> +bool read_cur, write_cur, entry_valid;
> +uint32_t offset;
> +uint64_t slpte;
> +uint64_t subpage_size, subpage_mask;
> +IOMMUTLBEntry entry;
> +uint64_t iova = start;
> +uint64_t iova_next;
> +uint64_t skipped_local = 0;
> +int ret = 0;
> +
> +trace_vtd_page_walk_level(addr, level, start, end);
> +
> +subpage_size = 1ULL << vtd_slpt_level_shift(level);
> +subpage_mask = vtd_slpt_level_page_mask(level);
> +
> +while (iova < end) {
> +iova_next = (iova & subpage_mask) + subpage_size;
> +
> +offset = vtd_iova_level_offset(iova, level);
> +slpte = vtd_get_slpte(addr, offset);
> +
> +/*
> + * When one of the following cases happens, we assume the whole
> + * range is invalid:
> + *
> + * 1. read block failed
> + * 2. reserved area non-zero
> + * 3. both read & write flags are not set
> + */
> +
> +if (slpte == (uint64_t)-1) {
> +trace_vtd_page_walk_skip_read(iova, iova_next);
> +skipped_local++;
> +goto next;
> +}
> +
> +if (vtd_slpte_nonzero_rsvd(slpte, level)) {
> +trace_vtd_page_walk_skip_reserve(iova, iova_next);
> +skipped_local++;
> +goto next;
> +}
> +
>
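
For readers following the thread, the skip-empty-directories idea from the
commit message can be sketched as a toy walk in Python. This is only an
illustration under stated assumptions, not the patch's code: the page table is
modelled as nested dicts, and PAGE_SHIFT, LEVEL_BITS, level_shift() and
page_walk() are hypothetical names. The 4 KiB leaf page and 9 index bits per
level are assumed here as well.

```python
PAGE_SHIFT = 12   # assumed: 4 KiB leaf pages
LEVEL_BITS = 9    # assumed: 512 entries per table


def level_shift(level):
    """Number of address bits covered by one entry at this level."""
    return PAGE_SHIFT + (level - 1) * LEVEL_BITS


def page_walk(table, start, end, level, hook):
    """Call hook(iova, target) for every mapped page in [start, end).

    An absent entry stands for an empty directory: the whole range it
    covers is skipped in one step. Returns the number of skipped entries.
    """
    size = 1 << level_shift(level)
    mask = ~(size - 1)
    skipped = 0
    iova = start
    while iova < end:
        # Jump to the start of the next entry at this level.
        iova_next = (iova & mask) + size
        index = (iova >> level_shift(level)) & ((1 << LEVEL_BITS) - 1)
        entry = table.get(index)
        if entry is None:
            skipped += 1
        elif level == 1:
            hook(iova & mask, entry)
        else:
            skipped += page_walk(entry, iova, min(iova_next, end),
                                 level - 1, hook)
        iova = iova_next
    return skipped


# One mapped page inside a 2^39-byte range.
found = []
table = {0: {1: {2: 0xABC000}}}
skipped = page_walk(table, 0, 1 << 39, 3,
                    lambda iova, target: found.append((iova, target)))
```

In this toy, the single mapping is found after visiting 512 entries at each of
the three levels, rather than every 4 KiB page in the range.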

[Qemu-devel] Reducing guest cpu usage

2016-12-28 Thread Programmingkid
There is a program that I run inside of QEMU that doesn't use the virtual CPU 
very efficiently. It causes QEMU to use 100% of the guest's CPU time. I was 
wondering if there were a way to reduce the amount of host CPU time that a 
guest CPU can use? This feature would help prevent laptops from heating up when 
running QEMU.


Re: [Qemu-devel] [PATCH 0/4] RFC: A VFIO based block driver for NVMe device

2016-12-28 Thread Tian, Kevin
> From: Fam Zheng
> Sent: Wednesday, December 21, 2016 12:32 AM
> 
> This series adds a new protocol driver that is intended to achieve about 20%
> better performance for latency bound workloads (i.e. synchronous I/O) than
> linux-aio when guest is exclusively accessing a NVMe device, by talking to the
> device directly instead of through kernel file system layers and its NVMe
> driver.
> 

Curious... if the NVMe device is exclusively owned by the guest, why
not pass it through directly to the guest? Is it a tradeoff between
performance (better than linux-aio) and composability (snapshot and
live migration, which are not supported by direct passthrough)?

Thanks
Kevin



[Qemu-devel] [PATCH v2] doc/pcie: correct command line examples

2016-12-28 Thread Cao jin
Nit picking: Multi-function PCI Express Root Ports should mean that
'addr' property is mandatory, and slot is optional because it defaults
to 0, and 'chassis' is mandatory for 2nd & 3rd root port because it
defaults to 0 too.

Bonus: fix a typo(2->3)
Signed-off-by: Cao jin 
Reviewed-by: Marcel Apfelbaum 
---
 docs/pcie.txt | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/pcie.txt b/docs/pcie.txt
index 9fb20aaed9f4..5bada24a15ab 100644
--- a/docs/pcie.txt
+++ b/docs/pcie.txt
@@ -110,18 +110,18 @@ Plug only PCI Express devices into PCI Express Ports.
   -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] 
 \
   -device ,bus=root_port1
 2.2.2 Using multi-function PCI Express Root Ports:
-  -device 
ioh3420,id=root_port1,multifunction=on,chassis=x,slot=y[,bus=pcie.0][,addr=z.0] 
\
-  -device ioh3420,id=root_port2,chassis=x1,slot=y1[,bus=pcie.0][,addr=z.1] 
\
-  -device ioh3420,id=root_port3,chassis=x2,slot=y2[,bus=pcie.0][,addr=z.2] 
\
-2.2.2 Plugging a PCI Express device into a Switch:
+  -device 
ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] 
\
+  -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] 
\
+  -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0] 
\
+2.2.3 Plugging a PCI Express device into a Switch:
   -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
   -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] 
 \
   -device 
xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1]]
 \
   -device ,bus=downstream_port1
 
 Notes:
-  - (slot, chassis) pair is mandatory and must be
- unique for each PCI Express Root Port.
+  - (slot, chassis) pair is mandatory and must be unique for each
+PCI Express Root Port. slot defaults to 0 when not specified.
   - 'addr' parameter can be 0 for all the examples above.
 
 
-- 
2.1.0
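
The uniqueness rule in the Notes above can be checked mechanically. The
following is a minimal sketch, assuming each root port is described by a plain
dict; find_conflicts() and the dict keys are hypothetical names for
illustration and are not part of QEMU. The defaults of 0 for both chassis and
slot follow the commit message.

```python
def find_conflicts(ports):
    """Return ids of root ports whose (chassis, slot) pair is already taken."""
    seen = {}
    conflicts = []
    for port in ports:
        # Both properties default to 0 when not specified.
        key = (port.get("chassis", 0), port.get("slot", 0))
        if key in seen:
            conflicts.append(port["id"])
        else:
            seen[key] = port["id"]
    return conflicts


ports = [
    {"id": "root_port1", "chassis": 1},
    {"id": "root_port2", "chassis": 2},
    {"id": "root_port3"},               # falls back to (0, 0)
    {"id": "root_port4"},               # also (0, 0): conflicts with root_port3
]
conflicts = find_conflicts(ports)
```

This shows why 'chassis' has to be given for the 2nd and 3rd root ports when
slot is left at its default.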






Re: [Qemu-devel] [PATCH] doc/pcie: correct command line examples

2016-12-28 Thread Cao jin


On 12/28/2016 11:21 PM, Andrew Jones wrote:
> On Wed, Dec 28, 2016 at 03:24:30PM +0200, Marcel Apfelbaum wrote:
>> On 12/27/2016 09:40 AM, Cao jin wrote:
>>> Nit picking: Multi-function PCI Express Root Ports should mean that
>>> 'addr' property is mandatory, and slot is optional because it is default
>>> to 0, and 'chassis' is mandatory for 2nd & 3rd root port because it is
>>> default to 0 too.
>>>
>>> Bonus: fix a typo(2->3)
>>> Signed-off-by: Cao jin 
>>> ---
>>>  docs/pcie.txt | 12 ++--
>>>  1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/docs/pcie.txt b/docs/pcie.txt
>>> index 9fb20aaed9f4..54f05eaa71dc 100644
>>> --- a/docs/pcie.txt
>>> +++ b/docs/pcie.txt
>>> @@ -110,18 +110,18 @@ Plug only PCI Express devices into PCI Express Ports.
>>>-device 
>>> ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
>>>-device ,bus=root_port1
>>>  2.2.2 Using multi-function PCI Express Root Ports:
>>> -  -device 
>>> ioh3420,id=root_port1,multifunction=on,chassis=x,slot=y[,bus=pcie.0][,addr=z.0]
>>>  \
>>> -  -device 
>>> ioh3420,id=root_port2,chassis=x1,slot=y1[,bus=pcie.0][,addr=z.1] \
>>> -  -device 
>>> ioh3420,id=root_port3,chassis=x2,slot=y2[,bus=pcie.0][,addr=z.2] \
>>> -2.2.2 Plugging a PCI Express device into a Switch:
>>> +  -device 
>>> ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0]
>>>  \
>>> +  -device 
>>> ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
>>> +  -device 
>>> ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0] \
>>> +2.2.3 Plugging a PCI Express device into a Switch:
>>>-device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] 
>>>  \
>>>-device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] 
>>>  \
>>>-device 
>>> xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1]]
>>>  \
>>>-device ,bus=downstream_port1
>>>
>>>  Notes:
>>> -  - (slot, chassis) pair is mandatory and must be
>>> - unique for each PCI Express Root Port.
>>> +  - (slot, chassis) pair is mandatory and must be unique for each
>>> +PCI Express Root Port. slot is default to 0 when doesn't specify it.
> 
> Please rewrite last sentence as
> 
>  slot defaults to 0 when not specified.

Thanks for pointing it out, v2 is on the way.

-- 
Sincerely,
Cao jin

> 
>>>- 'addr' parameter can be 0 for all the examples above.
>>>
>>>
>>>
>>
>> Reviewed-by: Marcel Apfelbaum 
>>
>> Thanks,
>> Marcel
>>
> 
> Thanks,
> drew
> 
> 
> .
> 






Re: [Qemu-devel] [PULL v2 00/12] M68k for 2.9 patches

2016-12-28 Thread Peter Maydell
On 27 December 2016 at 17:53, Laurent Vivier  wrote:
> The following changes since commit e5fdf663cf01f824f0e29701551a2c29554d80a4:
>
>   Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20161223' into 
> staging (2016-12-27 14:56:47 +)
>
> are available in the git repository at:
>
>   git://github.com/vivier/qemu-m68k.git tags/m68k-for-2.9-pull-request
>
> for you to fetch changes up to 2b5e2170678af36df48ab4b05dff81fe40b41a65:
>
>   target-m68k: free TCG variables that are not (2016-12-27 18:28:40 +0100)
>
> 
> A series of patches queued since the beginning of the freeze period.
> Compared to the m68k-for-2.9 branch, 3 patches implementing bitfield
> ops are missing as they need new TCG functions. They will be pushed
> later.
> v2: remove warning for unused variables.
> 

Applied, thanks.

-- PMM



Re: [Qemu-devel] QEMU Advent Calendar - Final day

2016-12-28 Thread Stefan Hajnoczi
On Dec 24, 2016 9:04 AM, "Thomas Huth"  wrote:

 Hi all,

the last door of the QEMU advent calendar 2016 can now be opened
(http://www.qemu-advent-calendar.org/2016/index.html#day-24), so we'd
now like to say thank you to everybody who has contributed to or
followed the advent calendar! It was fun to come up with all these disk
images and we hope that you've also found some surprises that you enjoyed.


Great job! It's a lot of work but brings joy to many people in the wider
community.

Stefan


[Qemu-devel] [PATCH v3] build: include sys/sysmacros.h for major() and minor()

2016-12-28 Thread Christopher Covington
The definitions of the major() and minor() macros are moving within glibc to
<sys/sysmacros.h>. Include this header when it is available to avoid the
following sorts of build-stopping messages:

qga/commands-posix.c: In function ‘dev_major_minor’:
qga/commands-posix.c:656:13: error: In the GNU C Library, "major" is defined
 by <sys/sysmacros.h>. For historical compatibility, it is
 currently defined by <sys/types.h> as well, but we plan to
 remove this soon. To use "major", include <sys/sysmacros.h>
 directly. If you did not intend to use a system-defined macro
 "major", you should undefine it after including <sys/types.h>. [-Werror]
 *devmajor = major(st.st_rdev);
 ^~

qga/commands-posix.c:657:13: error: In the GNU C Library, "minor" is defined
 by <sys/sysmacros.h>. For historical compatibility, it is
 currently defined by <sys/types.h> as well, but we plan to
 remove this soon. To use "minor", include <sys/sysmacros.h>
 directly. If you did not intend to use a system-defined macro
 "minor", you should undefine it after including <sys/types.h>. [-Werror]
 *devminor = minor(st.st_rdev);
 ^~

The additional include allows the build to complete on Fedora 26 (Rawhide)
with glibc version 2.24.90.

Signed-off-by: Christopher Covington 
---
 configure | 18 ++
 include/sysemu/os-posix.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/configure b/configure
index 218df87d21..58a33c71ad 100755
--- a/configure
+++ b/configure
@@ -4746,6 +4746,20 @@ if test "$modules" = "yes" && test "$LD_REL_FLAGS" = ""; 
then
 fi
 
 ##
+# check for sysmacros.h
+
+have_sysmacros=no
+cat > $TMPC << EOF
+#include <sys/sysmacros.h>
+int main(void) {
+return makedev(0, 0);
+}
+EOF
+if compile_prog "" "" ; then
+have_sysmacros=yes
+fi
+
+##
 # End of CC checks
 # After here, no more $cc or $ld runs
 
@@ -5721,6 +5735,10 @@ if test "$have_af_vsock" = "yes" ; then
   echo "CONFIG_AF_VSOCK=y" >> $config_host_mak
 fi
 
+if test "$have_sysmacros" = "yes" ; then
+  echo "CONFIG_SYSMACROS=y" >> $config_host_mak
+fi
+
 # Hold two types of flag:
 #   CONFIG_THREAD_SETNAME_BYTHREAD  - we've got a way of setting the name on
 # a thread we have a handle to
diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index b0a6c0695b..900bdcb45a 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -34,6 +34,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_SYSMACROS
+#include <sys/sysmacros.h>
+#endif
+
 void os_set_line_buffering(void);
 void os_set_proc_name(const char *s);
 void os_setup_signal_handling(void);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.
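
As background for what major() and minor() do with st_rdev in
qga/commands-posix.c, here is a small sketch using Python's os module, which
exposes the same operations on Unix-like systems. split_dev() is a
hypothetical helper name, and the device numbers 8 and 1 are arbitrary example
values.

```python
import os


def split_dev(dev):
    """Split a raw device number into its (major, minor) parts."""
    return os.major(dev), os.minor(dev)


# Compose a device number and take it apart again.
dev = os.makedev(8, 1)
parts = split_dev(dev)

# The same split applied to the device holding the root filesystem.
root_parts = split_dev(os.stat("/").st_dev)
```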




Re: [Qemu-devel] [PATCH v2] build: include sys/sysmacros.h for major() and minor()

2016-12-28 Thread Christopher Covington
Hi Eric,

On 12/28/2016 11:10 AM, Eric Blake wrote:
> On 12/28/2016 08:53 AM, Christopher Covington wrote:
>> The definition of the major() and minor() macros are moving within glibc to
>> <sys/sysmacros.h>. Include this header to avoid the following sorts of
>> build-stopping messages:
>>
> 
>> The additional include allows the build to complete on Fedora 26 (Rawhide)
>> with glibc version 2.24.90.
>>
>> Signed-off-by: Christopher Covington 
>> ---
>>  include/sysemu/os-posix.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
>> index b0a6c0695b..772d58f7ed 100644
>> --- a/include/sysemu/os-posix.h
>> +++ b/include/sysemu/os-posix.h
>> @@ -28,6 +28,7 @@
>>  
>>  #include 
>>  #include 
>> +#include <sys/sysmacros.h>
> 
> I repeat what I said on v1:
> 
> Works for glibc; but <sys/sysmacros.h> is non-standard and not present
> on some other systems, so this may fail to build elsewhere.

I read your response to v1 but got stuck on this "some other systems"
statement which seems too vague for me to act on. I see the following
operating systems checked in configure:

Cygwin, mingw32, GNU/kFreeBSD, FreeBSD, DragonFly, NetBSD, OpenBSD,
Darwin, SunOS, AIX, Haiku, and Linux.

But I'm really not sure what list of C libraries and corresponding mkdev.h
versus sysmacros.h versus types.h usage this translates to.

> You'll probably need a configure probe.

I'm testing that now and will hopefully send it out as v3 shortly.

> Autoconf also says that some platforms have <sys/mkdev.h> instead of
> <sys/sysmacros.h> (per its AC_HEADER_MAJOR macro).

`git grep mkdev` returns no results for me so I conclude that no currently
supported OS/libc requires it.

In case anyone wants to work around these messages, I'd like to highlight
the --disable-werror option to ./configure. If I had known about it this
morning, I probably would be happily authoring other changes right now.

Thanks,
Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.



[Qemu-devel] [PATCH] target-x86:Add GDB XML register description support

2016-12-28 Thread Abdallah Bouassida

This patch implements XML target description support for the X86 and X86-64
architectures in the GDB stub, as is already done for ARM and PowerPC:
- gdb-xml/32bit-core.xml & gdb-xml/64bit-core.xml: Add the XML target
  description files; these files are taken from the GDB source code.
- configure: Define gdb_xml_files for X86 targets.
- target/i386/cpu.c: Define gdb_core_xml_file and gdb_arch_name to add
  XML awareness for this architecture, and modify gdb_num_core_regs to
  match the number of registers defined in each XML file.

Signed-off-by: Abdallah Bouassida 
---
 configure  |  2 ++
 gdb-xml/32bit-core.xml | 65 
 gdb-xml/64bit-core.xml | 73 
++

 target/i386/cpu.c  | 21 ---
 4 files changed, 157 insertions(+), 4 deletions(-)
 create mode 100644 gdb-xml/32bit-core.xml
 create mode 100644 gdb-xml/64bit-core.xml

diff --git a/configure b/configure
index 218df87..b701d1e 100755
--- a/configure
+++ b/configure
@@ -5890,9 +5890,11 @@ TARGET_ABI_DIR=""

 case "$target_name" in
   i386)
+gdb_xml_files="32bit-core.xml"
   ;;
   x86_64)
 TARGET_BASE_ARCH=i386
+gdb_xml_files="64bit-core.xml"
   ;;
   alpha)
   ;;
diff --git a/gdb-xml/32bit-core.xml b/gdb-xml/32bit-core.xml
new file mode 100644
index 000..7aeeeca
--- /dev/null
+++ b/gdb-xml/32bit-core.xml
@@ -0,0 +1,65 @@
+[XML content of 32bit-core.xml not preserved in this archive copy]
diff --git a/gdb-xml/64bit-core.xml b/gdb-xml/64bit-core.xml
new file mode 100644
index 000..5088d84
--- /dev/null
+++ b/gdb-xml/64bit-core.xml
@@ -0,0 +1,73 @@
+[XML content of 64bit-core.xml not preserved in this archive copy]
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index b0640f1..d712e8b 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2371,6 +2371,15 @@ static void x86_cpu_load_def(X86CPU *cpu, 
X86CPUDefinition *def, Error **errp)


 }

+static gchar *x86_gdb_arch_name(CPUState *cs)
+{
+#ifdef TARGET_X86_64
+return g_strdup("i386:x86-64");
+#else
+return g_strdup("i386");
+#endif
+}
+
 X86CPU *cpu_x86_init(const char *cpu_model)
 {
 return X86_CPU(cpu_generic_init(TYPE_X86_CPU, cpu_model));
@@ -3720,10 +3729,14 @@ static void 
x86_cpu_common_class_init(ObjectClass *oc, void *data)

 cc->write_elf32_qemunote = x86_cpu_write_elf32_qemunote;
 cc->vmsd = &vmstate_x86_cpu;
 #endif
-/* CPU_NB_REGS * 2 = general regs + xmm regs
- * 25 = eip, eflags, 6 seg regs, st[0-7], fctrl,...,fop, mxcsr.
- */
-cc->gdb_num_core_regs = CPU_NB_REGS * 2 + 25;
+cc->gdb_arch_name = x86_gdb_arch_name;
+#ifdef TARGET_X86_64
+cc->gdb_core_xml_file = "64bit-core.xml";
+cc->gdb_num_core_regs = 40;
+#else
+cc->gdb_core_xml_file = "32bit-core.xml";
+cc->gdb_num_core_regs = 32;
+#endif
 #ifndef CONFIG_USER_ONLY
 cc->debug_excp_handler = breakpoint_handler;
 #endif
--
1.9.1
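
The new gdb_num_core_regs values can be cross-checked against the comment the
patch removes. The sketch below is only arithmetic on that comment; the
function names are hypothetical, and taking CPU_NB_REGS as 16 for x86-64 and 8
for i386 is an assumption stated here rather than something shown in the diff.

```python
def old_num_core_regs(cpu_nb_regs):
    """Old formula: general regs + xmm regs + 25 others."""
    return cpu_nb_regs * 2 + 25


def core_xml_num_regs(cpu_nb_regs):
    """Old count minus what the core XML leaves out: xmm regs and mxcsr."""
    groups = {
        "general": cpu_nb_regs,
        "ip": 1,
        "eflags": 1,
        "segment": 6,
        "st0-7": 8,
        "fctrl..fop": 8,
    }
    return sum(groups.values())


regs_64 = core_xml_num_regs(16)   # assumed CPU_NB_REGS for x86-64
regs_32 = core_xml_num_regs(8)    # assumed CPU_NB_REGS for i386
```

Under these assumptions the difference between the old and new counts is
exactly the xmm registers plus mxcsr.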




Re: [Qemu-devel] Can qemu reopen image files?

2016-12-28 Thread Christopher Pereira

Hi Eric,

There is something I don't understand.

We are doing: "virsh save", "qemu-img convert", "qemu-img rebase" and 
"virsh restore".

We only touch the backing chain by doing the rebase while the VM is down.
Is there any chance this procedure can destroy data?
If so, is there any difference between shutting down and just 
saving/restoring the VM?

Maybe save/restore keeps a cache?

Best regards,
Christopher.

On 19-Dec-16 13:24, Christopher Pereira wrote:
> Hi Eric,
>
> Thanks for your great answer.
>
> On 19-Dec-16 12:48, Eric Blake wrote:
>>> Then we do the rebase while the VM is suspended to make sure the image
>>> files are reopened.
>>
>> That part is where you are liable to break things.  Qemu does NOT have a
>> graceful way to reopen the backing chain, so rebasing snap3 to point to
>> snap2' behind qemu's back is asking for problems.  Since qemu may be
>> caching things it has already learned about snap2, you have invalidated
>> that cached data by making snap3 point to snap2', but have no way to
>> force qemu to reread the backing chain to start reading from snap2'.
>
> We are actually doing a save, rebase and restore to reopen the backing
> chain.
>
> We only touch files (rebase) while the VM is down.
> Can you please confirm this is 100% safe?
>
>> Or, if you don't want to merge into "base'", you can use block-stream to
>> merge the other direction, so that "base <- snap1 <- snap2" is converted
>> into "snap2'" - but that depends on patches that were only barely added
>> in qemu 2.8 (intermediate block-commit has existed a lot longer than
>> intermediate block-stream).  But the point remains that you are still
>> using qemu to do the work, and therefore with no external qemu-img
>> process interfering with the chain, you don't need any guest downtime or
>> any risk of breaking qemu operation by invalidating data it may have
>> cached.
>
> Right. Since images are backed up remotely, we don't want to merge
> into base nor touch the backing chain at all (only the active snapshot
> should be modified). This is to keep things simple and avoid
> re-syncing images (remote backups).
>
> Besides, we don't want to merge the whole backing chain, but an
> intermediate point, so it seems that the clean way is to use the
> "intermediate block-stream" feature.
>
> We didn't try it, because when we researched we got the impression
> that the patches were not stable yet or not included in the qemu
> versions shipped with CentOS, so we went with 'qemu-img convert'
> because we needed something known, simple and stable (we are dealing
> with critical information for gov. orgs.).
>
>> If block-commit and block-stream don't have enough power to do what you
>> want, then we should patch them to expose that power, rather than
>> worrying about how to use qemu-img to modify the backing chain behind
>> qemu's back.
>
> "intermediate block-stream" seems to be the right solution for our use
> case.
>
> Does it also allow QCOW2 compression?
> Compression is interesting, especially when files are sync'ed via
> network.







[Qemu-devel] CMSIS SVD based peripheral definitions

2016-12-28 Thread Liviu Ionescu
CMSIS SVD
---------

The latest release of GNU ARM Eclipse QEMU (2.8.0-20161227) introduced a new 
technology for implementing peripherals, based on standard CMSIS SVD 
definitions (http://www.keil.com/pack/doc/CMSIS/SVD/html/index.html).

The SVD files are large XML files produced by the silicon vendors, and 
generally are considered the final hardware reference for the Cortex-M devices, 
so they are expected to provide the most accurate peripheral emulation.

For convenience, the original SVD files are converted to JSON files, which are 
generally easier to parse. 

There is one file for each sub-family; the files currently used by GNU ARM 
Eclipse QEMU are grouped in the devices folders:


https://github.com/gnuarmeclipse/qemu/tree/gnuarmeclipse-dev/gnuarmeclipse/devices


Please note that some devices may include multiple instances of similar 
peripherals, for example multiple timers, with each instance slightly different 
from the others. The SVD files include all these differences, so strictly 
following the content of the SVD files is mandatory; extracting a definition 
common to all instances may seem attractive, but it is not realistic, since it 
may not be accurate for all instances.


Peripheral/register/bitfield
----------------------------

The basic objects used to implement peripherals are the 'registers'; a 
peripheral is actually an array of registers, each with its value.

For convenience, registers can be viewed as collections of bitfields. Bitfield
objects do not have their own values; reading a bitfield reads the parent
register, whose value is masked, shifted and finally returned.

Accessing bitfields is quite straightforward:

const char *enabling_bit_name = "/machine/mcu/stm32/RCC/AHB1ENR/GPIOAEN";
state->enabling_bit = OBJECT(cm_device_by_name(enabling_bit_name));
// ...
if (register_bitfield_is_non_zero(state->enabling_bit)) {
  // ...
}
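
To make the mask-and-shift step concrete, here is a minimal sketch in Python.
It is not the GNU ARM Eclipse QEMU implementation; read_bitfield() and its
parameters are hypothetical names, and placing the enable bit at bit 0 of the
register is an assumption made for the example.

```python
def read_bitfield(register_value, first_bit, width):
    """Read a bitfield out of its parent register value."""
    # Mask the field in place, then shift it down to bit 0.
    mask = ((1 << width) - 1) << first_bit
    return (register_value & mask) >> first_bit


# A one-bit enable flag assumed to sit at bit 0 of the parent register.
ahb1enr = 0x00000001
gpioaen = read_bitfield(ahb1enr, first_bit=0, width=1)
is_enabled = gpioaen != 0
```

Because the bitfield holds no state of its own, writing the register is enough
to change what every one of its bitfields returns.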


Generated code
--------------

In addition to using vendor SVD files, GNU ARM Eclipse QEMU goes one step
further, by generating most of the peripheral code, to the point that new
peripherals can be added simply by adding the generated files to the project.

Examples of the files currently used are in subfolders of the
devices/support folder:


https://github.com/gnuarmeclipse/qemu/tree/gnuarmeclipse-dev/gnuarmeclipse/devices/support


As per the current STM32 implementation, to avoid redundancy, each peripheral 
file includes definitions for all families; adding a new device implies 
generating the code for the new device sub-family, and copy/paste-ing the 
required code in the peripheral implementation.

An example of such a peripheral implementation is the SYSCFG:


https://github.com/gnuarmeclipse/qemu/blob/gnuarmeclipse-dev/hw/cortexm/stm32/syscfg.c



Supported devices
-----------------

GNU ARM Eclipse QEMU 2.8 supports the following boards:

  Maple                 LeafLab Arduino-style STM32 microcontroller board (r5)
  NUCLEO-F103RB         ST Nucleo Development Board for STM32 F1 series
  NUCLEO-F411RE         ST Nucleo Development Board for STM32 F4 series
  NetduinoGo            Netduino GoBus Development Board with STM32F4
  NetduinoPlus2         Netduino Development Board with STM32F4
  OLIMEXINO-STM32       Olimex Maple (Arduino-like) Development Board
  STM32-E407            Olimex Development Board for STM32F407ZGT6
  STM32-H103            Olimex Header Board for STM32F103RBT6
  STM32-P103            Olimex Prototype Board for STM32F103RBT6
  STM32-P107            Olimex Prototype Board for STM32F107VCT6
  STM32F0-Discovery     ST Discovery kit for STM32F051 lines
  STM32F4-Discovery     ST Discovery kit for STM32F407/417 lines
  STM32F429I-Discovery  ST Discovery kit for STM32F429/439 lines

Supported MCUs:
  STM32F051R8
  STM32F103RB
  STM32F107VC
  STM32F405RG
  STM32F407VG
  STM32F407ZG
  STM32F411RE
  STM32F429ZI

Functionally, the boards include animated LEDs and active push buttons (reset 
and user).


Test projects (blinky) for all supported boards are available from

https://github.com/gnuarmeclipse/eclipse-qemu-test-projects


Conclusion
----------

Although implementing peripherals remains a challenge, using the SVD
definitions, plus the tools to generate code, significantly improves and
simplifies the process.

Unfortunately, SVD files are available only for Cortex-M devices, but I think 
that, when not available, creating the JSON files and using the automated code 
generator is still easier than manually implementing the peripherals.


Personally I plan to use this technology to re-define the system Cortex-M 
devices, which, right now, are improperly implemented.


Regards,

Liviu







[Qemu-devel] [PATCH v6 5/7] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events

2016-12-28 Thread Lluís Vilanova
If an event is dynamically disabled, the TCG code that calls the
execution-time tracer is not generated.

This removes the overhead of execution-time tracers for dynamically disabled
events. As a bonus, it also avoids checking the event state when the
execution-time tracer is called from TCG-generated code (since otherwise
TCG would simply not call it).

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool/__init__.py|1 +
 scripts/tracetool/format/h.py|   24 ++--
 scripts/tracetool/format/tcg_h.py|   19 ---
 scripts/tracetool/format/tcg_helper_c.py |3 ++-
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 365446fa53..63168ccdf0 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -264,6 +264,7 @@ class Event(object):
 return self._FMT.findall(self.fmt)
 
 QEMU_TRACE   = "trace_%(name)s"
+QEMU_TRACE_NOCHECK   = "_nocheck__" + QEMU_TRACE
 QEMU_TRACE_TCG   = QEMU_TRACE + "_tcg"
 QEMU_DSTATE  = "_TRACE_%(NAME)s_DSTATE"
 QEMU_EVENT   = "_TRACE_%(NAME)s_EVENT"
diff --git a/scripts/tracetool/format/h.py b/scripts/tracetool/format/h.py
index 3682f4e6a8..a78e50ef35 100644
--- a/scripts/tracetool/format/h.py
+++ b/scripts/tracetool/format/h.py
@@ -49,6 +49,19 @@ def generate(events, backend, group):
 backend.generate_begin(events, group)
 
 for e in events:
+# tracer without checks
+out('',
+'static inline void %(api)s(%(args)s)',
+'{',
+api=e.api(e.QEMU_TRACE_NOCHECK),
+args=e.args)
+
+if "disable" not in e.properties:
+backend.generate(e, group)
+
+out('}')
+
+# tracer wrapper with checks (per-vCPU tracing)
 if "vcpu" in e.properties:
 trace_cpu = next(iter(e.args))[1]
 cond = "trace_event_get_vcpu_state(%(cpu)s,"\
@@ -63,16 +76,15 @@ def generate(events, backend, group):
 'static inline void %(api)s(%(args)s)',
 '{',
 'if (%(cond)s) {',
+'%(api_nocheck)s(%(names)s);',
+'}',
+'}',
 api=e.api(),
+api_nocheck=e.api(e.QEMU_TRACE_NOCHECK),
 args=e.args,
+names=", ".join(e.args.names()),
 cond=cond)
 
-if "disable" not in e.properties:
-backend.generate(e, group)
-
-out('}',
-'}')
-
 backend.generate_end(events, group)
 
 out('#endif /* TRACE_%s_GENERATED_TRACERS_H */' % group.upper())
diff --git a/scripts/tracetool/format/tcg_h.py 
b/scripts/tracetool/format/tcg_h.py
index 5f213f6cba..71b5c09432 100644
--- a/scripts/tracetool/format/tcg_h.py
+++ b/scripts/tracetool/format/tcg_h.py
@@ -41,7 +41,7 @@ def generate(events, backend, group):
 
 for e in events:
 # just keep one of them
-if "tcg-trans" not in e.properties:
+if "tcg-exec" not in e.properties:
 continue
 
 out('static inline void %(name_tcg)s(%(args)s)',
@@ -53,12 +53,25 @@ def generate(events, backend, group):
 args_trans = e.original.event_trans.args
 args_exec = tracetool.vcpu.transform_args(
 "tcg_helper_c", e.original.event_exec, "wrapper")
+if "vcpu" in e.properties:
+trace_cpu = e.args.names()[0]
+cond = "trace_event_get_vcpu_state(%(cpu)s,"\
+   " TRACE_%(id)s)"\
+   % dict(
+   cpu=trace_cpu,
+   id=e.original.event_exec.name.upper())
+else:
+cond = "true"
+
 out('%(name_trans)s(%(argnames_trans)s);',
-'gen_helper_%(name_exec)s(%(argnames_exec)s);',
+'if (%(cond)s) {',
+'gen_helper_%(name_exec)s(%(argnames_exec)s);',
+'}',
 name_trans=e.original.event_trans.api(e.QEMU_TRACE),
 name_exec=e.original.event_exec.api(e.QEMU_TRACE),
 argnames_trans=", ".join(args_trans.names()),
-argnames_exec=", ".join(args_exec.names()))
+argnames_exec=", ".join(args_exec.names()),
+cond=cond)
 
 out('}')
 
diff --git a/scripts/tracetool/format/tcg_helper_c.py 
b/scripts/tracetool/format/tcg_helper_c.py
index cc26e03008..c2a05d756c 100644
--- a/scripts/tracetool/format/tcg_helper_c.py
+++ b/scripts/tracetool/format/tcg_helper_c.py
@@ -66,10 +66,11 @@ def generate(events, backend, group):
 
 out('void %(name_tcg)s(%(args_api)s)',
 '{',
+# NOTE: the check was already performed at TCG-generation time
 '%(name)s(%(args_call)s);',
 '}',
 name_tcg="helper_%s_proxy" % e.api(),
-

[Qemu-devel] [PATCH v6 6/7] trace: [tcg, trivial] Re-align generated code

2016-12-28 Thread Lluís Vilanova
Last patch removed a nesting level in generated code. Re-align all code
generated by backends to be 4-column aligned.

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool/backend/dtrace.py |2 +-
 scripts/tracetool/backend/ftrace.py |   20 ++--
 scripts/tracetool/backend/log.py|   17 +
 scripts/tracetool/backend/simple.py |2 +-
 scripts/tracetool/backend/syslog.py |6 +++---
 scripts/tracetool/backend/ust.py|2 +-
 6 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/scripts/tracetool/backend/dtrace.py 
b/scripts/tracetool/backend/dtrace.py
index 79505c6b1a..b3a8645bf0 100644
--- a/scripts/tracetool/backend/dtrace.py
+++ b/scripts/tracetool/backend/dtrace.py
@@ -41,6 +41,6 @@ def generate_h_begin(events, group):
 
 
 def generate_h(event, group):
-out('QEMU_%(uppername)s(%(argnames)s);',
+out('QEMU_%(uppername)s(%(argnames)s);',
 uppername=event.name.upper(),
 argnames=", ".join(event.args.names()))
diff --git a/scripts/tracetool/backend/ftrace.py 
b/scripts/tracetool/backend/ftrace.py
index db9fe7ad57..dd0eda4441 100644
--- a/scripts/tracetool/backend/ftrace.py
+++ b/scripts/tracetool/backend/ftrace.py
@@ -29,17 +29,17 @@ def generate_h(event, group):
 if len(event.args) > 0:
 argnames = ", " + argnames
 
-out('{',
-'char ftrace_buf[MAX_TRACE_STRLEN];',
-'int unused __attribute__ ((unused));',
-'int trlen;',
-'if (trace_event_get_state(%(event_id)s)) {',
-'trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
-' "%(name)s " %(fmt)s "\\n" 
%(argnames)s);',
-'trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
-'unused = write(trace_marker_fd, ftrace_buf, trlen);',
-'}',
+out('{',
+'char ftrace_buf[MAX_TRACE_STRLEN];',
+'int unused __attribute__ ((unused));',
+'int trlen;',
+'if (trace_event_get_state(%(event_id)s)) {',
+'trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
+' "%(name)s " %(fmt)s "\\n" 
%(argnames)s);',
+'trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
+'unused = write(trace_marker_fd, ftrace_buf, trlen);',
 '}',
+'}',
 name=event.name,
 args=event.args,
 event_id="TRACE_" + event.name.upper(),
diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py
index 4f4a4d38b1..7d2c3abe75 100644
--- a/scripts/tracetool/backend/log.py
+++ b/scripts/tracetool/backend/log.py
@@ -35,14 +35,15 @@ def generate_h(event, group):
 else:
 cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-out('if (%(cond)s) {',
-'struct timeval _now;',
-'gettimeofday(&_now, NULL);',
-'qemu_log_mask(LOG_TRACE, "%%d@%%zd.%%06zd:%(name)s " 
%(fmt)s "\\n",',
-'  getpid(),',
-'  (size_t)_now.tv_sec, (size_t)_now.tv_usec',
-'  %(argnames)s);',
-'}',
+out('if (%(cond)s) {',
+'struct timeval _now;',
+'gettimeofday(&_now, NULL);',
+'qemu_log_mask(LOG_TRACE,',
+'  "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
+'  getpid(),',
+'  (size_t)_now.tv_sec, (size_t)_now.tv_usec',
+'  %(argnames)s);',
+'}',
 cond=cond,
 name=event.name,
 fmt=event.fmt.rstrip("\n"),
diff --git a/scripts/tracetool/backend/simple.py 
b/scripts/tracetool/backend/simple.py
index 85f61028e2..a28460b1e4 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -37,7 +37,7 @@ def generate_h_begin(events, group):
 
 
 def generate_h(event, group):
-out('_simple_%(api)s(%(args)s);',
+out('_simple_%(api)s(%(args)s);',
 api=event.api(),
 args=", ".join(event.args.names()))
 
diff --git a/scripts/tracetool/backend/syslog.py 
b/scripts/tracetool/backend/syslog.py
index b8ff2790c4..1ce627f0fc 100644
--- a/scripts/tracetool/backend/syslog.py
+++ b/scripts/tracetool/backend/syslog.py
@@ -35,9 +35,9 @@ def generate_h(event, group):
 else:
 cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-out('if (%(cond)s) {',
-'syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
-'}',
+out('if (%(cond)s) {',
+'syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
+'}',
 cond=cond,
 name=event.name,
 fmt=event.fmt.rstrip

[Qemu-devel] [PATCH v6 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state

2016-12-28 Thread Lluís Vilanova
Every vCPU now uses a separate set of TBs for each set of dynamic
tracing event state values. Each set of TBs can be used by any number of
vCPUs to maximize TB reuse when vCPUs have the same tracing state.

This feature is later used by tracetool to optimize tracing of guest
code events.

The maximum number of TB sets is defined as 2^E, where E is the number
of events that have the 'vcpu' property (their state is stored in
CPUState->trace_dstate).

For this to work, a change on the dynamic tracing state of a vCPU will
force it to flush its virtual TB cache (which is only indexed by
address), and fall back to the physical TB cache (which now contains the
vCPU's dynamic tracing state as part of the hashing function).
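
A minimal sketch of the two lookup levels described above, written as a
Python model rather than the actual C code (the class, its methods and the
"TB..." strings are made up for illustration). The per-vCPU cache is indexed
by address only, so it is dropped when the tracing state changes; the shared
table includes the state in its key, so vCPUs in the same state reuse each
other's TBs:

```python
class VCPU:
    def __init__(self, shared_tbs):
        self.trace_dstate = 0         # bitmask of enabled 'vcpu' events
        self.jmp_cache = {}           # virtual cache: pc -> TB (address only)
        self.shared_tbs = shared_tbs  # physical cache: (pc, flags, dstate) -> TB

    def set_trace_dstate(self, dstate):
        if dstate != self.trace_dstate:
            self.trace_dstate = dstate
            # Entries were looked up under the old state; drop them all.
            self.jmp_cache.clear()

    def tb_find(self, pc, flags=0):
        tb = self.jmp_cache.get(pc)
        if tb is None:
            key = (pc, flags, self.trace_dstate)
            # setdefault() stands in for "look up, else translate and insert".
            tb = self.shared_tbs.setdefault(key, "TB%r" % (key,))
            self.jmp_cache[pc] = tb
        return tb


shared = {}
cpu0, cpu1 = VCPU(shared), VCPU(shared)

tb_a = cpu0.tb_find(0x4000)   # translated for dstate == 0
cpu0.set_trace_dstate(0b1)    # flushes cpu0's virtual cache
tb_b = cpu0.tb_find(0x4000)   # a different TB, for dstate == 1
cpu1.set_trace_dstate(0b1)
tb_c = cpu1.tb_find(0x4000)   # reuses the TB that cpu0 produced

# With E 'vcpu' events there are at most 2**E distinct dstate values,
# hence at most 2**E sets of TBs.
E = 3
max_tb_sets = 2 ** E
```

In this model only two TBs end up in the shared table for address 0x4000,
one per tracing state that was actually used.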

Signed-off-by: Lluís Vilanova 
---
 cpu-exec.c|   26 +-
 include/exec/exec-all.h   |5 +
 include/exec/tb-hash-xx.h |8 +++-
 include/exec/tb-hash.h|5 +++--
 include/qemu-common.h |3 +++
 tests/qht-bench.c |2 +-
 trace/control-target.c|3 +++
 trace/control.h   |3 +++
 translate-all.c   |   16 ++--
 9 files changed, 60 insertions(+), 11 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 1b7366efb0..a377505b9c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -262,6 +262,7 @@ struct tb_desc {
 CPUArchState *env;
 tb_page_addr_t phys_page1;
 uint32_t flags;
+TRACE_QHT_VCPU_DSTATE_TYPE trace_vcpu_dstate;
 };
 
 static bool tb_cmp(const void *p, const void *d)
@@ -273,6 +274,7 @@ static bool tb_cmp(const void *p, const void *d)
 tb->page_addr[0] == desc->phys_page1 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
+tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
 !atomic_read(&tb->invalid)) {
 /* check next page if needed */
 if (tb->page_addr[1] == -1) {
@@ -294,7 +296,8 @@ static bool tb_cmp(const void *p, const void *d)
 static TranslationBlock *tb_htable_lookup(CPUState *cpu,
   target_ulong pc,
   target_ulong cs_base,
-  uint32_t flags)
+  uint32_t flags,
+  uint32_t trace_vcpu_dstate)
 {
 tb_page_addr_t phys_pc;
 struct tb_desc desc;
@@ -303,10 +306,11 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu,
 desc.env = (CPUArchState *)cpu->env_ptr;
 desc.cs_base = cs_base;
 desc.flags = flags;
+desc.trace_vcpu_dstate = trace_vcpu_dstate;
 desc.pc = pc;
 phys_pc = get_page_addr_code(desc.env, pc);
 desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
-h = tb_hash_func(phys_pc, pc, flags);
+h = tb_hash_func(phys_pc, pc, flags, trace_vcpu_dstate);
 return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
@@ -318,16 +322,24 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 TranslationBlock *tb;
 target_ulong cs_base, pc;
 uint32_t flags;
+unsigned long trace_vcpu_dstate_bitmap;
+TRACE_QHT_VCPU_DSTATE_TYPE trace_vcpu_dstate;
 bool have_tb_lock = false;
 
+bitmap_copy(&trace_vcpu_dstate_bitmap, cpu->trace_dstate,
+trace_get_vcpu_event_count());
+memcpy(&trace_vcpu_dstate, &trace_vcpu_dstate_bitmap,
+   sizeof(trace_vcpu_dstate));
+
 /* we record a subset of the CPU state. It will
always be the same before a given translated block
is executed. */
 cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
 tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
 if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
- tb->flags != flags)) {
-tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+ tb->flags != flags ||
+ tb->trace_vcpu_dstate != trace_vcpu_dstate)) {
+tb = tb_htable_lookup(cpu, pc, cs_base, flags, trace_vcpu_dstate);
 if (!tb) {
 
 /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
@@ -341,7 +353,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 /* There's a chance that our desired tb has been translated while
  * taking the locks so we check again inside the lock.
  */
-tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+tb = tb_htable_lookup(cpu, pc, cs_base, flags, trace_vcpu_dstate);
 if (!tb) {
 /* if no translated code available, then translate it now */
 tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
@@ -465,6 +477,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 if (unlikely(atomic_read(&cpu->trace_dstate_delayed_req))) {
 bitmap_copy(cpu->trace_dstate, cpu->trace_dstate_delayed,
 trace_get_vcpu_event_count());
+tb_flush_jmp_cache_all(cpu

[Qemu-devel] [PATCH v6 2/7] trace: Make trace_get_vcpu_event_count() inlinable

2016-12-28 Thread Lluís Vilanova
Later patches will make use of it.

Signed-off-by: Lluís Vilanova 
---
 trace/control-internal.h |5 +
 trace/control.c  |9 ++---
 trace/control.h  |2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/trace/control-internal.h b/trace/control-internal.h
index a9d395a587..beb98a0d2c 100644
--- a/trace/control-internal.h
+++ b/trace/control-internal.h
@@ -16,6 +16,7 @@
 
 
 extern int trace_events_enabled_count;
+extern uint32_t trace_next_vcpu_id;
 
 
 static inline bool trace_event_is_pattern(const char *str)
@@ -82,6 +83,10 @@ static inline bool 
trace_event_get_vcpu_state_dynamic(CPUState *vcpu,
 return trace_event_get_vcpu_state_dynamic_by_vcpu_id(vcpu, vcpu_id);
 }
 
+static inline uint32_t trace_get_vcpu_event_count(void)
+{
+return trace_next_vcpu_id;
+}
 
 void trace_event_register_group(TraceEvent **events);
 
diff --git a/trace/control.c b/trace/control.c
index 1a7bee6ddc..52d0e343fa 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -36,7 +36,7 @@ typedef struct TraceEventGroup {
 static TraceEventGroup *event_groups;
 static size_t nevent_groups;
 static uint32_t next_id;
-static uint32_t next_vcpu_id;
+uint32_t trace_next_vcpu_id;
 
 QemuOptsList qemu_trace_opts = {
 .name = "trace",
@@ -65,7 +65,7 @@ void trace_event_register_group(TraceEvent **events)
 for (i = 0; events[i] != NULL; i++) {
 events[i]->id = next_id++;
 if (events[i]->vcpu_id != TRACE_VCPU_EVENT_NONE) {
-events[i]->vcpu_id = next_vcpu_id++;
+events[i]->vcpu_id = trace_next_vcpu_id++;
 }
 }
 event_groups = g_renew(TraceEventGroup, event_groups, nevent_groups + 1);
@@ -299,8 +299,3 @@ char *trace_opt_parse(const char *optarg)
 
 return trace_file;
 }
-
-uint32_t trace_get_vcpu_event_count(void)
-{
-return next_vcpu_id;
-}
diff --git a/trace/control.h b/trace/control.h
index ccaeac8552..80d326c4d1 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -237,7 +237,7 @@ char *trace_opt_parse(const char *optarg);
  *
  * Return the number of known vcpu-specific events
  */
-uint32_t trace_get_vcpu_event_count(void);
+static uint32_t trace_get_vcpu_event_count(void);
 
 
 #include "trace/control-internal.h"




[Qemu-devel] [PATCH v6 7/7] trace: [trivial] Statically enable all guest events

2016-12-28 Thread Lluís Vilanova
The optimizations in this series make it feasible to have them
available on all builds.

Signed-off-by: Lluís Vilanova 
---
 trace-events |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/trace-events b/trace-events
index f74e1d3d22..0a0f4d9cd6 100644
--- a/trace-events
+++ b/trace-events
@@ -159,7 +159,7 @@ vcpu guest_cpu_reset(void)
 #
 # Mode: user, softmmu
 # Targets: TCG(all)
-disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", 
"vaddr=0x%016"PRIx64" info=%d"
+vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", 
"vaddr=0x%016"PRIx64" info=%d"
 
 # @num: System call number.
 # @arg*: System call argument value.
@@ -168,7 +168,7 @@ disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) 
"info=%d", "vaddr=0x
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, 
uint64_t arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, 
uint64_t arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" 
arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" 
arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
+vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, uint64_t 
arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, uint64_t 
arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" 
arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" 
arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
 
 # @num: System call number.
 # @ret: System call result value.
@@ -177,4 +177,4 @@ disable vcpu guest_user_syscall(uint64_t num, uint64_t 
arg1, uint64_t arg2, uint
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) 
"num=0x%016"PRIx64" ret=0x%016"PRIx64
+vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) "num=0x%016"PRIx64" 
ret=0x%016"PRIx64




[Qemu-devel] [PATCH v6 3/7] trace: [tcg] Delay changes to dynamic state when translating

2016-12-28 Thread Lluís Vilanova
This keeps consistency across all decisions taken during translation
when the dynamic state of a vCPU is changed in the middle of translating
some guest code.
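
A rough model of the delaying scheme, in Python rather than C (the field
names follow the patch, but the class and its methods are hypothetical, and
the point where the "delayed request" flag gets set is an assumption since
it is not spelled out here):

```python
class VCPU:
    def __init__(self):
        self.trace_dstate = 0          # state seen by translation/execution
        self.trace_dstate_delayed = 0  # always receives every change
        self.must_delay = False        # set while a TB is being found/run
        self.delayed_req = False       # a change arrived while delaying

    def set_event(self, bit, enabled):
        mask = 1 << bit
        if enabled:
            self.trace_dstate_delayed |= mask
        else:
            self.trace_dstate_delayed &= ~mask
        if self.must_delay:
            self.delayed_req = True    # assumption: remember to apply later
        else:
            self.trace_dstate = self.trace_dstate_delayed

    def begin_translation(self):
        self.delayed_req = False
        self.must_delay = True

    def end_translation(self):
        self.must_delay = False
        if self.delayed_req:
            self.trace_dstate = self.trace_dstate_delayed


cpu = VCPU()
cpu.begin_translation()
cpu.set_event(0, True)           # request arrives in the middle of translation
seen_during = cpu.trace_dstate   # still the old state
cpu.end_translation()
seen_after = cpu.trace_dstate    # the change is applied afterwards
```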

Signed-off-by: Lluís Vilanova 
---
 cpu-exec.c |   26 ++
 include/qom/cpu.h  |7 +++
 qom/cpu.c  |4 
 trace/control-target.c |   11 +--
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4188fed3c6..1b7366efb0 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -33,6 +33,7 @@
 #include "hw/i386/apic.h"
 #endif
 #include "sysemu/replay.h"
+#include "trace/control.h"
 
 /* -icount align implementation. */
 
@@ -451,9 +452,21 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 #ifndef CONFIG_USER_ONLY
 } else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+/* delay changes to this vCPU's dstate during translation */
+atomic_set(&cpu->trace_dstate_delayed_req, false);
+atomic_set(&cpu->trace_dstate_must_delay, true);
+
 /* try to cause an exception pending in the log */
 cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
 *ret = -1;
+
+/* apply and disable delayed dstate changes */
+atomic_set(&cpu->trace_dstate_must_delay, false);
+if (unlikely(atomic_read(&cpu->trace_dstate_delayed_req))) {
+bitmap_copy(cpu->trace_dstate, cpu->trace_dstate_delayed,
+trace_get_vcpu_event_count());
+}
+
 return true;
 #endif
 }
@@ -634,8 +647,21 @@ int cpu_exec(CPUState *cpu)
 
 for(;;) {
 cpu_handle_interrupt(cpu, &last_tb);
+
+/* delay changes to this vCPU's dstate during translation */
+atomic_set(&cpu->trace_dstate_delayed_req, false);
+atomic_set(&cpu->trace_dstate_must_delay, true);
+
 tb = tb_find(cpu, last_tb, tb_exit);
 cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
+
+/* apply and disable delayed dstate changes */
+atomic_set(&cpu->trace_dstate_must_delay, false);
+if (unlikely(atomic_read(&cpu->trace_dstate_delayed_req))) {
+bitmap_copy(cpu->trace_dstate, cpu->trace_dstate_delayed,
+trace_get_vcpu_event_count());
+}
+
 /* Try to align the host and virtual clocks
if the guest is in advance */
 align_clocks(&sc, cpu);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 3f79a8e955..58255d06fa 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -295,6 +295,10 @@ struct qemu_work_item;
  * @kvm_fd: vCPU file descriptor for KVM.
  * @work_mutex: Lock to prevent multiple access to queued_work_*.
  * @queued_work_first: First asynchronous work pending.
+ * @trace_dstate_must_delay: Whether a change to trace_dstate must be delayed.
+ * @trace_dstate_delayed_req: Whether a change to trace_dstate was delayed.
+ * @trace_dstate_delayed: Delayed changes to trace_dstate (includes all changes
+ *to @trace_dstate).
  * @trace_dstate: Dynamic tracing state of events for this vCPU (bitmask).
  *
  * State of one CPU core or thread.
@@ -370,6 +374,9 @@ struct CPUState {
  * Dynamically allocated based on bitmap requried to hold up to
  * trace_get_vcpu_event_count() entries.
  */
+bool trace_dstate_must_delay;
+bool trace_dstate_delayed_req;
+unsigned long *trace_dstate_delayed;
 unsigned long *trace_dstate;
 
 /* TODO Move common fields from CPUArchState here. */
diff --git a/qom/cpu.c b/qom/cpu.c
index 03d9190f8c..d56496d28d 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -367,6 +367,9 @@ static void cpu_common_initfn(Object *obj)
 QTAILQ_INIT(&cpu->breakpoints);
 QTAILQ_INIT(&cpu->watchpoints);
 
+cpu->trace_dstate_must_delay = false;
+cpu->trace_dstate_delayed_req = false;
+cpu->trace_dstate_delayed = bitmap_new(trace_get_vcpu_event_count());
 cpu->trace_dstate = bitmap_new(trace_get_vcpu_event_count());
 
 cpu_exec_initfn(cpu);
@@ -375,6 +378,7 @@ static void cpu_common_initfn(Object *obj)
 static void cpu_common_finalize(Object *obj)
 {
 CPUState *cpu = CPU(obj);
+g_free(cpu->trace_dstate_delayed);
 g_free(cpu->trace_dstate);
 }
 
diff --git a/trace/control-target.c b/trace/control-target.c
index 7ebf6e0bcb..aba8db55de 100644
--- a/trace/control-target.c
+++ b/trace/control-target.c
@@ -69,13 +69,20 @@ void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
 if (state_pre != state) {
 if (state) {
 trace_events_enabled_count++;
-set_bit(vcpu_id, vcpu->trace_dstate);
+set_bit(vcpu_id, vcpu->trace_dstate_delayed);
+if (!atomic_read(&vcpu->trace_dstate_must_delay)) {
+set_bit(vcpu_id, vcpu->trace_dstate);
+

[Qemu-devel] [PATCH v6 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache

2016-12-28 Thread Lluís Vilanova
The function is reused in later patches.

Signed-off-by: Lluís Vilanova 
---
 cputlb.c|2 +-
 include/exec/exec-all.h |6 ++
 translate-all.c |   14 +-
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 813279f3bc..9bf9960e1b 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -80,7 +80,7 @@ void tlb_flush(CPUState *cpu, int flush_global)
 
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
-memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+tb_flush_jmp_cache_all(cpu);
 
 env->vtlb_index = 0;
 env->tlb_flush_addr = -1;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a8c13cee66..57cd978578 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -256,6 +256,12 @@ struct TranslationBlock {
 };
 
 void tb_free(TranslationBlock *tb);
+/**
+ * tb_flush_jmp_cache_all:
+ *
+ * Flush the virtual translation block cache.
+ */
+void tb_flush_jmp_cache_all(CPUState *env);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 
diff --git a/translate-all.c b/translate-all.c
index 3dd9214904..29ccb9e546 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -941,11 +941,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data 
tb_flush_count)
 }
 
 CPU_FOREACH(cpu) {
-int i;
-
-for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
-atomic_set(&cpu->tb_jmp_cache[i], NULL);
-}
+tb_flush_jmp_cache_all(cpu);
 }
 
 tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -1741,6 +1737,14 @@ void tb_check_watchpoint(CPUState *cpu)
 }
 }
 
+void tb_flush_jmp_cache_all(CPUState *cpu)
+{
+int i;
+for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
+atomic_set(&cpu->tb_jmp_cache[i], NULL);
+}
+}
+
 #ifndef CONFIG_USER_ONLY
 /* in deterministic execution mode, instructions doing device I/Os
must be at the end of the TB */




[Qemu-devel] [PATCH v6 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches

2016-12-28 Thread Lluís Vilanova
Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory
accesses), making it feasible to statically enable them by default on all QEMU
builds.

Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
(medium size - suns.pl) and the guest_mem_before event:

* vanilla, statically disabled
    real    0m2,259s
    user    0m2,252s
    sys     0m0,004s

* vanilla, statically enabled (overhead: 2.18x)
    real    0m4,921s
    user    0m4,912s
    sys     0m0,008s

* multi-tb, statically disabled (overhead: 0.99x) [within noise range]
    real    0m2,228s
    user    0m2,216s
    sys     0m0,008s

* multi-tb, statically enabled (overhead: 0.99x) [within noise range]
    real    0m2,229s
    user    0m2,224s
    sys     0m0,004s
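
For reference, the overhead factors above are each configuration's
wall-clock ("real") time divided by the vanilla, statically disabled
baseline. A small sketch that reproduces them from the timings listed above
(the dictionary keys are only labels chosen here):

```python
# Wall-clock ("real") times in seconds, copied from the list above.
real_seconds = {
    "vanilla, statically disabled": 2.259,
    "vanilla, statically enabled": 4.921,
    "multi-tb, statically disabled": 2.228,
    "multi-tb, statically enabled": 2.229,
}

baseline = real_seconds["vanilla, statically disabled"]

# Overhead factor relative to the baseline, rounded to two decimals.
overhead = {name: round(t / baseline, 2) for name, t in real_seconds.items()}

for name, factor in overhead.items():
    print("%s: %.2fx" % (name, factor))
```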


Right now, events with the 'tcg' property always generate TCG code to trace that
event at guest code execution time, where the event's dynamic state is checked.

This series adds a performance optimization where TCG code for events with the
'tcg' and 'vcpu' properties is not generated if the event is dynamically
disabled. This optimization raises two issues:

* An event can be dynamically disabled/enabled after the corresponding TCG code
  has been generated (i.e., a new TB with the corresponding code should be
  used).

* Each vCPU can have a different dynamic state for the same event (e.g., tracing
  the memory accesses of only one process pinned to a vCPU).

To handle both issues, this series integrates the dynamic tracing event state
into the TB hashing function, so that vCPUs tracing different events will use
separate TBs. Note that only events with the 'vcpu' property are used for
hashing (as stored in the bitmap of CPUState->trace_dstate).

This makes dynamic event state changes on vCPUs very efficient, since they can
use TBs produced by other vCPUs while on the same event state combination (or
produced by the same vCPU, earlier).
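
As a rough model of the idea (plain Python, not QEMU code; `translate`,
`tb_find`, `tb_cache` and the "ops" strings are hypothetical), the sketch
below keys translated blocks by the vCPU's event state and only emits the
tracing call for events that are enabled in that state:

```python
# Toy model: events with the 'vcpu' property, one bit each in the state mask.
VCPU_EVENTS = ["guest_mem_before", "guest_user_syscall"]

tb_cache = {}      # (pc, dstate) -> list of generated "ops"
translations = 0   # how many times a block actually had to be translated


def translate(pc, dstate):
    """Generate ops for the block at pc; elide tracing for disabled events."""
    ops = []
    for bit, name in enumerate(VCPU_EVENTS):
        if dstate & (1 << bit):
            ops.append("call trace_" + name)
    ops.append("guest code @%#x" % pc)
    return ops


def tb_find(pc, dstate):
    """Look up a block by (pc, dstate), translating it on a miss."""
    global translations
    key = (pc, dstate)
    if key not in tb_cache:
        tb_cache[key] = translate(pc, dstate)
        translations += 1
    return tb_cache[key]


# vCPU 0 traces memory accesses, vCPU 1 traces nothing.
tb_vcpu0 = tb_find(0x1000, 0b01)
tb_vcpu1 = tb_find(0x1000, 0b00)
# A third vCPU in the same state as vCPU 0 reuses its block.
tb_vcpu2 = tb_find(0x1000, 0b01)
```

Here the block used by the non-tracing vCPU contains no tracing call at
all, and only two translations happen for three lookups.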

Discarded alternatives:

* Emitting TCG code to check if an event needs tracing, where we should still
  move the tracing call code to either a cold path (making tracing performance
  worse), or leave it inlined (making non-tracing performance worse).

* Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling
  it on a single vCPU will impact the performance of all other vCPUs that are
  not tracing that event.

Signed-off-by: Lluís Vilanova 
---

Changes in v6
=============

* Check hashing size error with QEMU_BUILD_BUG_ON [Richard Henderson].


Changes in v5
=============

* Move define into "qemu-common.h" to allow compilation of tests.


Changes in v4
=============

* Incorporate trace_dstate into the TB hashing function instead of using
  multiple physical TB caches [suggested by Richard Henderson].


Changes in v3
=============

* Rebase on 0737f32daf.
* Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi].
* Refactor trace_get_vcpu_event_count() to be inlinable.
* Optimize cpu_tb_cache_set_requested() (hottest path).


Changes in v2
=============

* Fix bitmap copy in cpu_tb_cache_set_apply().
* Split generated code re-alignment into a separate patch [Daniel P. Berrange].


Lluís Vilanova (7):
  exec: [tcg] Refactor flush of per-CPU virtual TB cache
  trace: Make trace_get_vcpu_event_count() inlinable
  trace: [tcg] Delay changes to dynamic state when translating
  exec: [tcg] Use different TBs according to the vCPU's dynamic tracing 
state
  trace: [tcg] Do not generate TCG code to trace dynamically-disabled events
  trace: [tcg,trivial] Re-align generated code
  trace: [trivial] Statically enable all guest events


 cpu-exec.c   |   52 +++---
 cputlb.c |2 +
 include/exec/exec-all.h  |   11 ++
 include/exec/tb-hash-xx.h|8 -
 include/exec/tb-hash.h   |5 ++-
 include/qemu-common.h|3 ++
 include/qom/cpu.h|7 
 qom/cpu.c|4 ++
 scripts/tracetool/__init__.py|1 +
 scripts/tracetool/backend/dtrace.py  |2 +
 scripts/tracetool/backend/ftrace.py  |   20 ++--
 scripts/tracetool/backend/log.py |   17 +-
 scripts/tracetool/backend/simple.py  |2 +
 scripts/tracetool/backend/syslog.py  |6 ++-
 scripts/tracetool/backend/ust.py |2 +
 scripts/tracetool/format/h.py|   24 ++
 scripts/tracetool/format/tcg_h.py|   19 +--
 scripts/tracetool/format/tcg_helper_c.py |3 +-
 tests/qht-bench.c|2 +
 trace-events |6 ++-
 trace/control-internal.h |5 +++
 trace/control-target.c   |   14 +++-
 trace/control.c  |9 +
 trace/control.h  |5 ++-
 translate-all.c  |   30 +
 

Re: [Qemu-devel] Looking for a linux-user mode test

2016-12-28 Thread Peter Maydell
On 28 December 2016 at 17:12, Sean Bruno  wrote:
> On 12/28/16 10:05, Peter Maydell wrote:
>> Ideally all of that rework (including the support for properly
>> interrupting syscalls without races) should be ported over to
>> bsd-user at some point.
>
> If you have a moment to point me at the merge commit that pulled in the
> majority of this overhaul, I'll take a moment to review it for
> application to bsd-user.

Merges 430da7a81d356e3, 3e904d6ade7f36, b66e10e4c9ae7,
d6550e9ed2e1a60 (listed here latest first but probably more
helpfully examined the other way round) have the bulk of it,
there are probably some bugfixes that got in via other merges.

thanks
-- PMM



[Qemu-devel] [PATCH] linux-user: always start with parallel_cpus set to true

2016-12-28 Thread Laurent Vivier
We always need real atomics, as we can have shared memory between
processes.

A good test case is the example from futex(2), futex_demo.c:

the use case is

mmap(...);
fork();

Parent and Child:

while(...)
__sync_bool_compare_and_swap(...)
...
futex(...)

In this case we need real atomics in __sync_bool_compare_and_swap(),
but as parallel_cpus is set to 0, we don't have them.

We also revert "b67cb68 linux-user: enable parallel code generation on clone"
as parallel_cpus is unconditionally set now.

Of course, this doesn't fix atomics that are emulated using
cpu_loop_exit_atomic() as we can't stop virtual CPUs from other processes.
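
To illustrate why a guest without any threads can still need real atomics,
here is a small host-side sketch (Python, Unix only; it is not the
futex_demo.c program). It only demonstrates that a shared mapping created
before fork() is visible to both processes; Python has no compare-and-swap
primitive, so the atomic operation itself is not shown:

```python
import mmap
import os
import struct

# Anonymous mapping; on Unix, mmap.mmap(-1, n) is MAP_SHARED by default.
shared = mmap.mmap(-1, 8)
shared[:8] = struct.pack("q", 0)

pid = os.fork()
if pid == 0:
    # Child: a single-threaded process storing to the shared page.
    shared[:8] = struct.pack("q", 42)
    os._exit(0)

os.waitpid(pid, 0)
# Parent: observes the child's store although neither process has threads.
(value,) = struct.unpack("q", shared[:8])
print(value)
```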

Signed-off-by: Laurent Vivier 
---
 linux-user/syscall.c | 8 
 translate-all.c  | 4 
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 7b77503..db697c0 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6164,14 +6164,6 @@ static int do_fork(CPUArchState *env, unsigned int 
flags, abi_ulong newsp,
 sigfillset(&sigmask);
 sigprocmask(SIG_BLOCK, &sigmask, &info.sigmask);
 
-/* If this is our first additional thread, we need to ensure we
- * generate code for parallel execution and flush old translations.
- */
-if (!parallel_cpus) {
-parallel_cpus = true;
-tb_flush(cpu);
-}
-
 ret = pthread_create(&info.thread, &attr, clone_func, &info);
 /* TODO: Free new CPU state if thread creation failed.  */
 
diff --git a/translate-all.c b/translate-all.c
index 3dd9214..0b0bb09 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -142,7 +142,11 @@ static void *l1_map[V_L1_MAX_SIZE];
 
 /* code generation context */
 TCGContext tcg_ctx;
+#ifdef CONFIG_USER_ONLY
+bool parallel_cpus = true;
+#else
 bool parallel_cpus;
+#endif
 
 /* translation block context */
 #ifdef CONFIG_USER_ONLY
-- 
2.7.4




Re: [Qemu-devel] Looking for a linux-user mode test

2016-12-28 Thread Sean Bruno


On 12/28/16 10:05, Peter Maydell wrote:
> On 28 December 2016 at 15:06, Sean Bruno  wrote:
>> After some recent-ish changes to how user mode executes things/stuff,
>> I'm running into issues with the out of tree bsd-user mode code that
>> FreeBSD has been maintaining.  It looks like the host_signal_handler()
>> is never executed or registered correctly in our code.  I'm curious if
>> the linux-user code can handle this bit of configure script from m4.
>>
>> https://people.freebsd.org/~sbruno/stack.c
> 
> Hmm. That code does:
>  * set up a SIGSEGV signal handler to run on its own stack
>  * go into an infinite recursion, expecting to run out of
>stack and trigger a SEGV
> which is a bit of an obscure corner case of signal handling.
> 
> We recently fixed a lot of signal handler related bugs in linux-user
> by doing a significant overhaul of that code. If bsd-user is still
> using the old broken approach it's probably still got lots of bugs
> in it. Alternatively, it's possible we changed some of the core
> code in that process and broke bsd-user by mistake.
> 
> Ideally all of that rework (including the support for properly
> interrupting syscalls without races) should be ported over to
> bsd-user at some point.

If you have a moment to point me at the merge commit that pulled in the
majority of this overhaul, I'll take a moment to review it for
application to bsd-user.

> 
>> If someone has the time/inclination, can this code be compiled for ARMv6
>> and executed in a linux chroot with the -strace argument applied?  I see
>> the following, which after much debugging seems to indicate that the
>> host_signal_handler() code is never executed as this code is requesting
>> that SIGSEGV be masked to its own handler.
> 
> Built for ARMv7 since I don't have an ARMv6 cross compiler
> or system, but it works ok for linux (also, built with -static
> rather than run in a chroot, for convenience):
> 
> e104462:xenial:qemu$ ./build/arm-linux/arm-linux-user/qemu-arm -strace
> ~/linaro/qemu-misc-tests/stack
> 29798 uname(0xf6fff1f0) = 0
> 29798 brk(NULL) = 0x0007f000
> 29798 brk(0x0007fd00) = 0x0007fd00
> 29798 readlink("/proc/self/exe",0xf6ffe328,4096) = 43
> 29798 brk(0x000a0d00) = 0x000a0d00
> 29798 brk(0x000a1000) = 0x000a1000
> 29798 access("/etc/ld.so.nohwcap",F_OK) = -1 errno=2 (No such file or 
> directory)
> 29798 sigaltstack(0xf6fff2e0,(nil)) = 0
> 29798 rt_sigaction(SIGSEGV,0xf6fff1b0,NULL) = 0
> --- SIGSEGV {si_signo=SIGSEGV, si_code=1, si_addr = 0xf67c} ---
> 29798 exit_group(0)
> 
> (the enhancement to linux-user's strace to print the line on signal
> delivery is also a pretty new change.)
> 

Thanks.  This is what I expect to see.

>> https://people.freebsd.org/~sbruno/qemu-bsd-user-arm.txt
>>
>> Prior to 7e6c57e2957c7d868f74bd0d53b5e861b495e1c7 this DTRT for our
>> ARMv6 targets.
> 
> This commit hash doesn't seem to be in QEMU master.
> 

*sigh* ... that was the merge commit to the bsd-user branch I maintain.
Ignore it.

> thanks
> -- PMM
> 





Re: [Qemu-devel] [PATCH v2] build: include sys/sysmacros.h for major() and minor()

2016-12-28 Thread Peter Maydell
On 28 December 2016 at 16:10, Eric Blake  wrote:
> On 12/28/2016 08:53 AM, Christopher Covington wrote:
>> The definition of the major() and minor() macros are moving within glibc to
>> <sys/sysmacros.h>. Include this header to avoid the following sorts of
>> build-stopping messages:
>>
>
>> The additional include allows the build to complete on Fedora 26 (Rawhide)
>> with glibc version 2.24.90.
>>
>> Signed-off-by: Christopher Covington 
>> ---
>>  include/sysemu/os-posix.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
>> index b0a6c0695b..772d58f7ed 100644
>> --- a/include/sysemu/os-posix.h
>> +++ b/include/sysemu/os-posix.h
>> @@ -28,6 +28,7 @@
>>
>>  #include 
>>  #include 
>> +#include <sys/sysmacros.h>
>
> I repeat what I said on v1:
>
> Works for glibc; but <sys/sysmacros.h> is non-standard and not present
> on some other systems, so this may fail to build elsewhere.  You'll
> probably need a configure probe.  Autoconf also says that some platforms
> have <sys/mkdev.h> instead of <sys/sysmacros.h> (per its AC_HEADER_MAJOR
> macro).

Also this seems straightforwardly like a bug in glibc: it shouldn't
be making this kind of breaking change. makedev(3) on my Linux box
says nothing about needing sysmacros.h for these.

thanks
-- PMM



Re: [Qemu-devel] [PATCH for-2.9] numa: make -numa parser dynamically allocate CPUs masks

2016-12-28 Thread Eduardo Habkost
On Fri, Nov 18, 2016 at 12:02:54PM +0100, Igor Mammedov wrote:
> so it won't impose additional limits on the max_cpus values
> supported by different targets.
> 
> It removes the global MAX_CPUMASK_BITS constant and the need to
> bump it up whenever max_cpus is increased for
> a target beyond the MAX_CPUMASK_BITS value.
> 
> Use runtime max_cpus value instead to allocate sufficiently
> sized node_cpu bitmasks in numa parser.
> 
> Signed-off-by: Igor Mammedov 

Reviewed-by: Eduardo Habkost 

As the cpu_index assignment code isn't obviously safe against
setting cpu_index > max_cpus, I would like to squash this into
the patch. Is that OK for you?

diff --git a/numa.c b/numa.c
index 1b6fa78..33f2fd4 100644
--- a/numa.c
+++ b/numa.c
@@ -401,6 +401,7 @@ void numa_post_machine_init(void)
 
 CPU_FOREACH(cpu) {
 for (i = 0; i < nb_numa_nodes; i++) {
+assert(cpu->cpu_index < max_cpus);
 if (test_bit(cpu->cpu_index, numa_info[i].node_cpu)) {
 cpu->numa_node = i;
 }
@@ -559,6 +560,8 @@ int numa_get_node_for_cpu(int idx)
 {
 int i;
 
+assert(idx < max_cpus);
+
 for (i = 0; i < nb_numa_nodes; i++) {
 if (test_bit(idx, numa_info[i].node_cpu)) {
 break;
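
For readers following along, a minimal sketch of the idea behind the asserts:
once the per-node bitmap is sized from a run-time max_cpus, every index has to
be checked against that size before it touches the bitmap. NodeCpus and the
node_cpus_*() helpers are hypothetical stand-ins, not QEMU's bitmap API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a per-node CPU bitmap sized at run time. */
typedef struct NodeCpus {
    unsigned long *bits;
    unsigned int max_cpus;
} NodeCpus;

static NodeCpus node_cpus_new(unsigned int max_cpus)
{
    NodeCpus n;

    assert(max_cpus > 0);
    n.bits = calloc((max_cpus + WORD_BITS - 1) / WORD_BITS,
                    sizeof(unsigned long));
    assert(n.bits != NULL);
    n.max_cpus = max_cpus;
    return n;
}

static void node_cpus_set(NodeCpus *n, unsigned int idx)
{
    assert(idx < n->max_cpus);      /* same guard as in the patch */
    n->bits[idx / WORD_BITS] |= 1UL << (idx % WORD_BITS);
}

static bool node_cpus_test(const NodeCpus *n, unsigned int idx)
{
    assert(idx < n->max_cpus);
    return (n->bits[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1UL;
}

/* Returns the first node whose bitmap contains idx, or nb_nodes if none. */
static int node_for_cpu(const NodeCpus *nodes, int nb_nodes, unsigned int idx)
{
    int i;

    for (i = 0; i < nb_nodes; i++) {
        if (node_cpus_test(&nodes[i], idx)) {
            break;
        }
    }
    return i;
}
```

Without the assert, an out-of-range index would silently read or write past the
end of the allocation instead of failing loudly.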

-- 
Eduardo



Re: [Qemu-devel] Looking for a linux-user mode test

2016-12-28 Thread Peter Maydell
On 28 December 2016 at 15:06, Sean Bruno  wrote:
> After some recent-ish changes to how user mode executes things/stuff,
> I'm running into issues with the out of tree bsd-user mode code that
> FreeBSD has been maintaining.  It looks like the host_signal_handler()
> is never executed or registered correctly in our code.  I'm curious if
> the linux-user code can handle this bit of configure script from m4.
>
> https://people.freebsd.org/~sbruno/stack.c

Hmm. That code does:
 * set up a SIGSEGV signal handler to run on its own stack
 * go into an infinite recursion, expecting to run out of
   stack and trigger a SEGV
which is a bit of an obscure corner case of signal handling.

We recently fixed a lot of signal handler related bugs in linux-user
by doing a significant overhaul of that code. If bsd-user is still
using the old broken approach it's probably still got lots of bugs
in it. Alternatively, it's possible we changed some of the core
code in that process and broke bsd-user by mistake.

Ideally all of that rework (including the support for properly
interrupting syscalls without races) should be ported over to
bsd-user at some point.

> If someone has the time/inclination, can this code be compiled for ARMv6
> and executed in a linux chroot with the -strace argument applied?  I see
> the following, which after much debugging seems to indicate that the
> host_signal_handler() code is never executed as this code is requesting
> that SIGSEGV be masked to its own handler.

Built for ARMv7 since I don't have an ARMv6 cross compiler
or system, but it works ok for linux (also, built with -static
rather than run in a chroot, for convenience):

e104462:xenial:qemu$ ./build/arm-linux/arm-linux-user/qemu-arm -strace ~/linaro/qemu-misc-tests/stack
29798 uname(0xf6fff1f0) = 0
29798 brk(NULL) = 0x0007f000
29798 brk(0x0007fd00) = 0x0007fd00
29798 readlink("/proc/self/exe",0xf6ffe328,4096) = 43
29798 brk(0x000a0d00) = 0x000a0d00
29798 brk(0x000a1000) = 0x000a1000
29798 access("/etc/ld.so.nohwcap",F_OK) = -1 errno=2 (No such file or directory)
29798 sigaltstack(0xf6fff2e0,(nil)) = 0
29798 rt_sigaction(SIGSEGV,0xf6fff1b0,NULL) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=1, si_addr = 0xf67c} ---
29798 exit_group(0)

(the enhancement to linux-user's strace to print the line on signal
delivery is also a pretty new change.)

> https://people.freebsd.org/~sbruno/qemu-bsd-user-arm.txt
>
> Prior to 7e6c57e2957c7d868f74bd0d53b5e861b495e1c7 this DTRT for our
> ARMv6 targets.

This commit hash doesn't seem to be in QEMU master.

thanks
-- PMM



Re: [Qemu-devel] [PATCH] target/i386: Fix bad patch application to translate.c

2016-12-28 Thread Eduardo Habkost
On Sat, Dec 24, 2016 at 08:29:33PM +, Doug Evans wrote:
> In commit c52ab08aee6f7d4717fc6b517174043126bd302f,
> the patch snippet for the "syscall" insn got applied to "iret".
> 
> Signed-off-by: Doug Evans 

The patch was corrupt; I fixed the line wrapping by hand and had to
use git-am --ignore-whitespace to apply it.

I suggest using git-send-email, as e-mail clients often break
patch contents when copying and pasting.

Fixed patch below, for reference:

---
 target/i386/translate.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 59e11fc..7adfff0 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -6435,10 +6435,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
   tcg_const_i32(s->pc - s->cs_base));
 set_cc_op(s, CC_OP_EFLAGS);
 }
-/* TF handling for the syscall insn is different. The TF bit is checked
-   after the syscall insn completes. This allows #DB to not be
-   generated after one has entered CPL0 if TF is set in FMASK.  */
-gen_eob_worker(s, false, true);
+gen_eob(s);
 break;
 case 0xe8: /* call im */
 {
@@ -7119,7 +7116,10 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
 gen_update_cc_op(s);
 gen_jmp_im(pc_start - s->cs_base);
 gen_helper_syscall(cpu_env, tcg_const_i32(s->pc - pc_start));
-gen_eob(s);
+/* TF handling for the syscall insn is different. The TF bit is checked
+   after the syscall insn completes. This allows #DB to not be
+   generated after one has entered CPL0 if TF is set in FMASK.  */
+gen_eob_worker(s, false, true);
 break;
 case 0x107: /* sysret */
 if (!s->pe) {
-- 
2.7.4


> ---
>  target/i386/translate.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/target/i386/translate.c b/target/i386/translate.c
> index 59e11fc..7e9d073 100644
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -6435,10 +6435,7 @@ static target_ulong disas_insn(CPUX86State *env,
> DisasContext *s,
>tcg_const_i32(s->pc - s->cs_base));
>  set_cc_op(s, CC_OP_EFLAGS);
>  }
> -/* TF handling for the syscall insn is different. The TF bit is
> checked
> -   after the syscall insn completes. This allows #DB to not be
> -   generated after one has entered CPL0 if TF is set in FMASK.  */
> -gen_eob_worker(s, false, true);
> +gen_eob(s);
>  break;
>  case 0xe8: /* call im */
>  {
> @@ -7119,7 +7116,10 @@ static target_ulong disas_insn(CPUX86State *env,
> DisasContext *s,
>  gen_update_cc_op(s);
>  gen_jmp_im(pc_start - s->cs_base);
>  gen_helper_syscall(cpu_env, tcg_const_i32(s->pc - pc_start));
> -gen_eob(s);
> +/* TF handling for the syscall insn is different. The TF bit is
> checked
> +   after the syscall insn completes. This allows #DB to not be
> +   generated after one has entered CPL0 if TF is set in FMASK.  */
> +gen_eob_worker(s, false, true);
>  break;
>  case 0x107: /* sysret */
>  if (!s->pe) {
> -- 
> 2.8.0.rc3.226.g39d4020
> 
> 

-- 
Eduardo



[Qemu-devel] [PATCH v5 1/6] Pass generic CPUState to gen_intermediate_code()

2016-12-28 Thread Lluís Vilanova
Needed to implement a target-agnostic gen_intermediate_code() in the
future.

Signed-off-by: Lluís Vilanova 
Reviewed-by: David Gibson 
---
 include/exec/exec-all.h   |2 +-
 target-alpha/translate.c  |   11 +--
 target-arm/translate.c|   24 
 target-cris/translate.c   |   17 -
 target-i386/translate.c   |   13 ++---
 target-lm32/translate.c   |   22 +++---
 target-m68k/translate.c   |   15 +++
 target-microblaze/translate.c |   22 +++---
 target-mips/translate.c   |   15 +++
 target-moxie/translate.c  |   14 +++---
 target-openrisc/translate.c   |   22 +++---
 target-ppc/translate.c|   15 +++
 target-s390x/translate.c  |   13 ++---
 target-sh4/translate.c|   15 +++
 target-sparc/translate.c  |   11 +--
 target-tilegx/translate.c |7 +++
 target-tricore/translate.c|9 -
 target-unicore32/translate.c  |   17 -
 target-xtensa/translate.c |   13 ++---
 translate-all.c   |2 +-
 20 files changed, 133 insertions(+), 146 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a8c13cee66..0e45e1aedc 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -43,7 +43,7 @@ typedef ram_addr_t tb_page_addr_t;
 
 #include "qemu/log.h"
 
-void gen_intermediate_code(CPUArchState *env, struct TranslationBlock *tb);
+void gen_intermediate_code(CPUState *env, struct TranslationBlock *tb);
 void restore_state_to_opc(CPUArchState *env, struct TranslationBlock *tb,
   target_ulong *data);
 
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 114927b751..6759ec28cc 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2873,10 +2873,9 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
 return ret;
 }
 
-void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
+void gen_intermediate_code(CPUState *cpu, struct TranslationBlock *tb)
 {
-AlphaCPU *cpu = alpha_env_get_cpu(env);
-CPUState *cs = CPU(cpu);
+CPUAlphaState *env = cpu->env_ptr;
 DisasContext ctx, *ctxp = &ctx;
 target_ulong pc_start;
 target_ulong pc_mask;
@@ -2891,7 +2890,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
 ctx.pc = pc_start;
 ctx.mem_idx = cpu_mmu_index(env, false);
 ctx.implver = env->implver;
-ctx.singlestep_enabled = cs->singlestep_enabled;
+ctx.singlestep_enabled = cpu->singlestep_enabled;
 
 #ifdef CONFIG_USER_ONLY
 ctx.ir = cpu_std_ir;
@@ -2934,7 +2933,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
 tcg_gen_insn_start(ctx.pc);
 num_insns++;
 
-if (unlikely(cpu_breakpoint_test(cs, ctx.pc, BP_ANY))) {
+if (unlikely(cpu_breakpoint_test(cpu, ctx.pc, BP_ANY))) {
 ret = gen_excp(&ctx, EXCP_DEBUG, 0);
 /* The address covered by the breakpoint must be included in
[tb->pc, tb->pc + tb->size) in order to for it to be
@@ -2996,7 +2995,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
 && qemu_log_in_addr_range(pc_start)) {
 qemu_log_lock();
 qemu_log("IN: %s\n", lookup_symbol(pc_start));
-log_target_disas(cs, pc_start, ctx.pc - pc_start, 1);
+log_target_disas(cpu, pc_start, ctx.pc - pc_start, 1);
 qemu_log("\n");
 qemu_log_unlock();
 }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 0ad9070b45..3aa766901c 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11589,10 +11589,10 @@ static bool insn_crosses_page(CPUARMState *env, DisasContext *s)
 }
 
 /* generate intermediate code for basic block 'tb'.  */
-void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
+void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb)
 {
-ARMCPU *cpu = arm_env_get_cpu(env);
-CPUState *cs = CPU(cpu);
+CPUARMState *env = cpu->env_ptr;
+ARMCPU *arm_cpu = arm_env_get_cpu(env);
 DisasContext dc1, *dc = &dc1;
 target_ulong pc_start;
 target_ulong next_page_start;
@@ -11606,7 +11606,7 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
  * the A32/T32 complexity to do with conditional execution/IT blocks/etc.
  */
 if (ARM_TBFLAG_AARCH64_STATE(tb->flags)) {
-gen_intermediate_code_a64(cpu, tb);
+gen_intermediate_code_a64(arm_cpu, tb);
 return;
 }
 
@@ -11616,7 +11616,7 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
 
 dc->is_jmp = DISAS_NEXT;
 dc->pc = pc_start;
-dc->singlestep_enabled = cs->singlestep_enabled;
+dc->singlestep_enabled = cpu->singlestep_enabled;
 dc->condjmp = 0;
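
The pattern applied to each target above can be summarised with a small
sketch: the entry point takes the generic CPU object and recovers the
target-specific state through env_ptr. The types below are heavily reduced,
hypothetical stand-ins (CPUToyState plays the role of CPUAlphaState,
CPUARMState and so on); they are not QEMU's definitions.

```c
#include <stdbool.h>

/* Reduced stand-in for the generic CPU object. */
typedef struct CPUState {
    void *env_ptr;              /* points at the target-specific state */
    bool singlestep_enabled;
} CPUState;

/* Hypothetical target-specific state. */
typedef struct CPUToyState {
    int implver;
} CPUToyState;

typedef struct ToyDisasContext {
    int implver;
    bool singlestep_enabled;
} ToyDisasContext;

/* Generic signature: callers only need to know about CPUState. */
static void toy_init_disas_context(CPUState *cpu, ToyDisasContext *ctx)
{
    CPUToyState *env = cpu->env_ptr;    /* target code recovers its env */

    ctx->implver = env->implver;
    ctx->singlestep_enabled = cpu->singlestep_enabled;
}
```

Because the signature no longer mentions a per-target type, a single
target-agnostic caller can drive every target's translator.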
 

[Qemu-devel] [PATCH v5 5/6] target: [tcg, i386] Port to generic translation framework

2016-12-28 Thread Lluís Vilanova
Signed-off-by: Lluís Vilanova 
---
 target-i386/translate.c |  304 ++-
 1 file changed, 140 insertions(+), 164 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 61d73e286f..a63627b470 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -69,6 +69,10 @@
 case (2 << 6) | (OP << 3) | 0 ... (2 << 6) | (OP << 3) | 7: \
 case (3 << 6) | (OP << 3) | 0 ... (3 << 6) | (OP << 3) | 7
 
+#include "exec/translate-all_template.h"
+#define DJ_JUMP (DJ_TARGET + 0) /* end of block due to call/jump */
+#define DJ_MISC (DJ_TARGET + 1) /* some other reason */
+
 //#define MACRO_TEST   1
 
 /* global register indexes */
@@ -94,7 +98,10 @@ static TCGv_i64 cpu_tmp1_i64;
 static int x86_64_hregs;
 #endif
 
+
 typedef struct DisasContext {
+DisasContextBase base;
+
 /* current insn context */
 int override; /* -1 if no override */
 int prefix;
@@ -102,8 +109,6 @@ typedef struct DisasContext {
 TCGMemOp dflag;
 target_ulong pc_start;
 target_ulong pc; /* pc = eip + cs_base */
-int is_jmp; /* 1 = means jump (stop translation), 2 means CPU
-   static state change (stop translation) */
 /* current block context */
 target_ulong cs_base; /* base of CS segment */
 int pe; /* protected mode */
@@ -124,12 +129,10 @@ typedef struct DisasContext {
 int cpl;
 int iopl;
 int tf; /* TF cpu flag */
-int singlestep_enabled; /* "hardware" single step enabled */
 int jmp_opt; /* use direct block chaining for direct jumps */
 int repz_opt; /* optimize jumps within repz instructions */
 int mem_index; /* select memory access functions */
 uint64_t flags; /* all execution flags */
-struct TranslationBlock *tb;
 int popl_esp_hack; /* for correct popl with esp base handling */
 int rip_offset; /* only used in x86_64, but left for simplicity */
 int cpuid_features;
@@ -140,6 +143,8 @@ typedef struct DisasContext {
 int cpuid_xsave_features;
 } DisasContext;
 
+#include "translate-all_template.h"
+
 static void gen_eob(DisasContext *s);
 static void gen_jmp(DisasContext *s, target_ulong eip);
 static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
@@ -1112,7 +1117,7 @@ static void gen_bpt_io(DisasContext *s, TCGv_i32 t_port, int ot)
 
 static inline void gen_ins(DisasContext *s, TCGMemOp ot)
 {
-if (s->tb->cflags & CF_USE_ICOUNT) {
+if (s->base.tb->cflags & CF_USE_ICOUNT) {
 gen_io_start();
 }
 gen_string_movl_A0_EDI(s);
@@ -1127,14 +1132,14 @@ static inline void gen_ins(DisasContext *s, TCGMemOp ot)
 gen_op_movl_T0_Dshift(ot);
 gen_op_add_reg_T0(s->aflag, R_EDI);
 gen_bpt_io(s, cpu_tmp2_i32, ot);
-if (s->tb->cflags & CF_USE_ICOUNT) {
+if (s->base.tb->cflags & CF_USE_ICOUNT) {
 gen_io_end();
 }
 }
 
 static inline void gen_outs(DisasContext *s, TCGMemOp ot)
 {
-if (s->tb->cflags & CF_USE_ICOUNT) {
+if (s->base.tb->cflags & CF_USE_ICOUNT) {
 gen_io_start();
 }
 gen_string_movl_A0_ESI(s);
@@ -1147,7 +1152,7 @@ static inline void gen_outs(DisasContext *s, TCGMemOp ot)
 gen_op_movl_T0_Dshift(ot);
 gen_op_add_reg_T0(s->aflag, R_ESI);
 gen_bpt_io(s, cpu_tmp2_i32, ot);
-if (s->tb->cflags & CF_USE_ICOUNT) {
+if (s->base.tb->cflags & CF_USE_ICOUNT) {
 gen_io_end();
 }
 }
@@ -2130,7 +2135,7 @@ static inline int insn_const_size(TCGMemOp ot)
 static inline bool use_goto_tb(DisasContext *s, target_ulong pc)
 {
 #ifndef CONFIG_USER_ONLY
-return (pc & TARGET_PAGE_MASK) == (s->tb->pc & TARGET_PAGE_MASK) ||
+return (pc & TARGET_PAGE_MASK) == (s->base.tb->pc & TARGET_PAGE_MASK) ||
(pc & TARGET_PAGE_MASK) == (s->pc_start & TARGET_PAGE_MASK);
 #else
 return true;
@@ -2145,7 +2150,7 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num, target_ulong eip)
 /* jump to same page: we can use a direct jump */
 tcg_gen_goto_tb(tb_num);
 gen_jmp_im(eip);
-tcg_gen_exit_tb((uintptr_t)s->tb + tb_num);
+tcg_gen_exit_tb((uintptr_t)s->base.tb + tb_num);
 } else {
 /* jump to another page: currently not optimized */
 gen_jmp_im(eip);
@@ -2166,7 +2171,7 @@ static inline void gen_jcc(DisasContext *s, int b,
 
 gen_set_label(l1);
 gen_goto_tb(s, 1, val);
-s->is_jmp = DISAS_TB_JUMP;
+s->base.jmp_type = DJ_JUMP;
 } else {
 l1 = gen_new_label();
 l2 = gen_new_label();
@@ -2237,11 +2242,11 @@ static void gen_movl_seg_T0(DisasContext *s, int seg_reg)
stop as a special handling must be done to disable hardware
interrupts for the next instruction */
 if (seg_reg == R_SS || (s->code32 && seg_reg < R_FS))
-s->is_jmp = DISAS_TB_JUMP;
+s->base.jmp_type = DJ_JUMP;
 } else {
 gen_op_movl_seg_T0_vm(seg_reg);
 if (seg_reg == R_S

[Qemu-devel] [PATCH v5 6/6] target: [tcg, arm] Port to generic translation framework

2016-12-28 Thread Lluís Vilanova
Signed-off-by: Lluís Vilanova 
---
 target-arm/translate-a64.c |  346 ++---
 target-arm/translate.c |  720 ++--
 target-arm/translate.h |   42 ++-
 3 files changed, 555 insertions(+), 553 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 6dc27a6115..cd7a4282cb 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -296,17 +296,17 @@ static void gen_exception(int excp, uint32_t syndrome, uint32_t target_el)
 
 static void gen_exception_internal_insn(DisasContext *s, int offset, int excp)
 {
-gen_a64_set_pc_im(s->pc - offset);
+gen_a64_set_pc_im(s->base.pc_next - offset);
 gen_exception_internal(excp);
-s->is_jmp = DISAS_EXC;
+s->base.jmp_type = DJ_EXC;
 }
 
 static void gen_exception_insn(DisasContext *s, int offset, int excp,
uint32_t syndrome, uint32_t target_el)
 {
-gen_a64_set_pc_im(s->pc - offset);
+gen_a64_set_pc_im(s->base.pc_next - offset);
 gen_exception(excp, syndrome, target_el);
-s->is_jmp = DISAS_EXC;
+s->base.jmp_type = DJ_EXC;
 }
 
 static void gen_ss_advance(DisasContext *s)
@@ -334,7 +334,7 @@ static void gen_step_complete_exception(DisasContext *s)
 gen_ss_advance(s);
 gen_exception(EXCP_UDEF, syn_swstep(s->ss_same_el, 1, s->is_ldex),
   default_exception_el(s));
-s->is_jmp = DISAS_EXC;
+s->base.jmp_type = DJ_EXC;
 }
 
 static inline bool use_goto_tb(DisasContext *s, int n, uint64_t dest)
@@ -342,13 +342,14 @@ static inline bool use_goto_tb(DisasContext *s, int n, uint64_t dest)
 /* No direct tb linking with singlestep (either QEMU's or the ARM
  * debug architecture kind) or deterministic io
  */
-if (s->singlestep_enabled || s->ss_active || (s->tb->cflags & CF_LAST_IO)) {
+if (s->base.singlestep_enabled || s->ss_active ||
+(s->base.tb->cflags & CF_LAST_IO)) {
 return false;
 }
 
 #ifndef CONFIG_USER_ONLY
 /* Only link tbs from inside the same guest page */
-if ((s->tb->pc & TARGET_PAGE_MASK) != (dest & TARGET_PAGE_MASK)) {
+if ((s->base.tb->pc & TARGET_PAGE_MASK) != (dest & TARGET_PAGE_MASK)) {
 return false;
 }
 #endif
@@ -360,21 +361,21 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
 {
 TranslationBlock *tb;
 
-tb = s->tb;
+tb = s->base.tb;
 if (use_goto_tb(s, n, dest)) {
 tcg_gen_goto_tb(n);
 gen_a64_set_pc_im(dest);
 tcg_gen_exit_tb((intptr_t)tb + n);
-s->is_jmp = DISAS_TB_JUMP;
+s->base.jmp_type = DJ_TB_JUMP;
 } else {
 gen_a64_set_pc_im(dest);
 if (s->ss_active) {
 gen_step_complete_exception(s);
-} else if (s->singlestep_enabled) {
+} else if (s->base.singlestep_enabled) {
 gen_exception_internal(EXCP_DEBUG);
 } else {
 tcg_gen_exit_tb(0);
-s->is_jmp = DISAS_TB_JUMP;
+s->base.jmp_type = DJ_TB_JUMP;
 }
 }
 }
@@ -405,11 +406,11 @@ static void unallocated_encoding(DisasContext *s)
 qemu_log_mask(LOG_UNIMP, \
   "%s:%d: unsupported instruction encoding 0x%08x "  \
   "at pc=%016" PRIx64 "\n",  \
-  __FILE__, __LINE__, insn, s->pc - 4);  \
+  __FILE__, __LINE__, insn, s->base.pc_next - 4);\
 unallocated_encoding(s); \
 } while (0);
 
-static void init_tmp_a64_array(DisasContext *s)
+void init_tmp_a64_array(DisasContext *s)
 {
 #ifdef CONFIG_DEBUG_TCG
 int i;
@@ -1223,11 +1224,11 @@ static inline AArch64DecodeFn *lookup_disas_fn(const AArch64DecodeTable *table,
  */
 static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
 {
-uint64_t addr = s->pc + sextract32(insn, 0, 26) * 4 - 4;
+uint64_t addr = s->base.pc_next + sextract32(insn, 0, 26) * 4 - 4;
 
 if (insn & (1U << 31)) {
 /* C5.6.26 BL Branch with link */
-tcg_gen_movi_i64(cpu_reg(s, 30), s->pc);
+tcg_gen_movi_i64(cpu_reg(s, 30), s->base.pc_next);
 }
 
 /* C5.6.20 B Branch / C5.6.26 BL Branch with link */
@@ -1250,7 +1251,7 @@ static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
 sf = extract32(insn, 31, 1);
 op = extract32(insn, 24, 1); /* 0: CBZ; 1: CBNZ */
 rt = extract32(insn, 0, 5);
-addr = s->pc + sextract32(insn, 5, 19) * 4 - 4;
+addr = s->base.pc_next + sextract32(insn, 5, 19) * 4 - 4;
 
 tcg_cmp = read_cpu_reg(s, rt, sf);
 label_match = gen_new_label();
@@ -1258,7 +1259,7 @@ static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
 tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
 tcg_cmp, 0, label_match);
 
-gen_goto_tb(s, 0, s->pc);
+gen_goto_tb(s, 0, s->base.pc_next);
 gen_set_labe

Re: [Qemu-devel] [RFC PATCH v4 0/6] translate: [tcg] Generic translation framework

2016-12-28 Thread Lluís Vilanova
no-reply  writes:

> Hi,
> Your series failed automatic build test. Please find the testing commands and
> their output below. If you have docker installed, you can probably reproduce 
> it
> locally.

Oh, my bad. Forgot to remove some of the "restrict" I added on previous
versions.

Thanks,
  Lluis



[Qemu-devel] [PATCH v5 3/6] target: [tcg] Add generic translation framework

2016-12-28 Thread Lluís Vilanova
Signed-off-by: Lluís Vilanova 
---
 include/exec/gen-icount.h |2 
 include/exec/translate-all_template.h |   73 
 include/qom/cpu.h |   22 
 translate-all_template.h  |  204 +
 4 files changed, 300 insertions(+), 1 deletion(-)
 create mode 100644 include/exec/translate-all_template.h
 create mode 100644 translate-all_template.h

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 050de59b38..c91ac95ed7 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -45,7 +45,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 tcg_temp_free_i32(count);
 }
 
-static void gen_tb_end(TranslationBlock *tb, int num_insns)
+static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
 {
 gen_set_label(exitreq_label);
 tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_REQUESTED);
diff --git a/include/exec/translate-all_template.h b/include/exec/translate-all_template.h
new file mode 100644
index 0000000000..ea507f90c6
--- /dev/null
+++ b/include/exec/translate-all_template.h
@@ -0,0 +1,73 @@
+/*
+ * Generic intermediate code generation.
+ *
+ * Copyright (C) 2016 Lluís Vilanova 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef EXEC__TRANSLATE_ALL_TEMPLATE_H
+#define EXEC__TRANSLATE_ALL_TEMPLATE_H
+
+/*
+ * Include this header from a target-specific file, and add a
+ *
+ * DisasContextBase base;
+ *
+ * member in your target-specific DisasContext.
+ */
+
+
+#include "exec/exec-all.h"
+
+
+/**
+ * BreakpointHitType:
+ * @BH_MISS: No hit
+ * @BH_HIT_INSN: Hit, but continue translating instruction
+ * @BH_HIT_TB: Hit, stop translating TB
+ *
+ * How to react to a breakpoint hit.
+ */
+typedef enum BreakpointHitType {
+BH_MISS,
+BH_HIT_INSN,
+BH_HIT_TB,
+} BreakpointHitType;
+
+/**
+ * DisasJumpType:
+ * @DJ_NEXT: Next instruction in program order
+ * @DJ_TOO_MANY: Too many instructions executed
+ * @DJ_TARGET: Start of target-specific conditions
+ *
+ * What instruction to disassemble next.
+ */
+typedef enum DisasJumpType {
+DJ_NEXT,
+DJ_TOO_MANY,
+DJ_TARGET,
+} DisasJumpType;
+
+/**
+ * DisasContextBase:
+ * @tb: Translation block for this disassembly.
+ * @singlestep_enabled: "Hardware" single stepping enabled.
+ * @pc_first: Address of first guest instruction in this TB.
+ * @pc_next: Address of next guest instruction in this TB (current during
+ *   disassembly).
+ * @num_insns: Number of translated instructions (including current).
+ *
+ * Architecture-agnostic disassembly context.
+ */
+typedef struct DisasContextBase {
+TranslationBlock *tb;
+bool singlestep_enabled;
+target_ulong pc_first;
+target_ulong pc_next;
+DisasJumpType jmp_type;
+unsigned int num_insns;
+} DisasContextBase;
+
+#endif  /* EXEC__TRANSLATE_ALL_TEMPLATE_H */
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 3f79a8e955..64a288b066 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -948,6 +948,28 @@ static inline bool cpu_breakpoint_test(CPUState *cpu, vaddr pc, int mask)
 return false;
 }
 
+/* Get first breakpoint matching a PC */
+static inline CPUBreakpoint *cpu_breakpoint_get(CPUState *cpu, vaddr pc,
+CPUBreakpoint *bp)
+{
+if (likely(bp == NULL)) {
+if (unlikely(!QTAILQ_EMPTY(&cpu->breakpoints))) {
+QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
+if (bp->pc == pc) {
+return bp;
+}
+}
+}
+} else {
+QTAILQ_FOREACH_CONTINUE(bp, entry) {
+if (bp->pc == pc) {
+return bp;
+}
+}
+}
+return NULL;
+}
+
 int cpu_watchpoint_insert(CPUState *cpu, vaddr addr, vaddr len,
   int flags, CPUWatchpoint **watchpoint);
 int cpu_watchpoint_remove(CPUState *cpu, vaddr addr,
diff --git a/translate-all_template.h b/translate-all_template.h
new file mode 100644
index 0000000000..6208916d08
--- /dev/null
+++ b/translate-all_template.h
@@ -0,0 +1,204 @@
+/*
+ * Generic intermediate code generation.
+ *
+ * Copyright (C) 2016 Lluís Vilanova 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef TRANSLATE_ALL_TEMPLATE_H
+#define TRANSLATE_ALL_TEMPLATE_H
+
+/*
+ * Include this header from a target-specific file, which must define the
+ * target-specific functions declared below.
+ *
+ * These must be paired with instructions in "exec/translate-all_template.h".
+ */
+
+
+#include "cpu.h"
+#include "qemu/error-report.h"
+
+
+static void gen_intermediate_code_target_init_disas_context(
+DisasContext *dc, CPUArchState *env);
+
+static void gen_intermediate_code_target_init_globals(
+DisasContext *dc, CPUArchState *

[Qemu-devel] [PATCH v5 4/6] target: [tcg] Redefine DISAS_* onto the generic translation framework (DJ_*)

2016-12-28 Thread Lluís Vilanova
Temporarily redefine DISAS_* values based on DJ_TARGET. They should
disappear as targets get ported to the generic framework.

Signed-off-by: Lluís Vilanova 
---
 include/exec/exec-all.h  |   11 +++
 target-arm/translate.h   |   15 ---
 target-cris/translate.c  |3 ++-
 target-m68k/translate.c  |3 ++-
 target-s390x/translate.c |3 ++-
 target-unicore32/translate.c |3 ++-
 6 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 0e45e1aedc..169da5ebe0 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -36,10 +36,13 @@ typedef ram_addr_t tb_page_addr_t;
 #endif
 
 /* is_jmp field values */
-#define DISAS_NEXT0 /* next instruction can be analyzed */
-#define DISAS_JUMP1 /* only pc was modified dynamically */
-#define DISAS_UPDATE  2 /* cpu state was modified dynamically */
-#define DISAS_TB_JUMP 3 /* only pc was modified statically */
+/* TODO: delete after all targets are transitioned to generic translation */
+#include "exec/translate-all_template.h"
+#define DISAS_NEXTDJ_NEXT   /* next instruction can be analyzed */
+#define DISAS_JUMP(DJ_TARGET + 0)   /* only pc was modified dynamically */
+#define DISAS_UPDATE  (DJ_TARGET + 1)   /* cpu state was modified dynamically */
+#define DISAS_TB_JUMP (DJ_TARGET + 2)   /* only pc was modified statically */
+#define DISAS_TARGET  (DJ_TARGET + 3)   /* base for target-specific values */
 
 #include "qemu/log.h"
 
diff --git a/target-arm/translate.h b/target-arm/translate.h
index 285e96f087..3dd4c4578e 100644
--- a/target-arm/translate.h
+++ b/target-arm/translate.h
@@ -105,21 +105,22 @@ static inline int default_exception_el(DisasContext *s)
 }
 
 /* target-specific extra values for is_jmp */
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
 /* These instructions trap after executing, so the A32/T32 decoder must
  * defer them until after the conditional execution state has been updated.
  * WFI also needs special handling when single-stepping.
  */
-#define DISAS_WFI 4
-#define DISAS_SWI 5
+#define DISAS_WFI (DISAS_TARGET + 0)
+#define DISAS_SWI (DISAS_TARGET + 1)
 /* For instructions which unconditionally cause an exception we can skip
  * emitting unreachable code at the end of the TB in the A64 decoder
  */
-#define DISAS_EXC 6
+#define DISAS_EXC (DISAS_TARGET + 2)
 /* WFE */
-#define DISAS_WFE 7
-#define DISAS_HVC 8
-#define DISAS_SMC 9
-#define DISAS_YIELD 10
+#define DISAS_WFE (DISAS_TARGET + 3)
+#define DISAS_HVC (DISAS_TARGET + 4)
+#define DISAS_SMC (DISAS_TARGET + 5)
+#define DISAS_YIELD (DISAS_TARGET + 6)
 
 #ifdef TARGET_AARCH64
 void a64_translate_init(void);
diff --git a/target-cris/translate.c b/target-cris/translate.c
index ebcf7863bf..001714c7c1 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -50,7 +50,8 @@
 #define BUG() (gen_BUG(dc, __FILE__, __LINE__))
 #define BUG_ON(x) ({if (x) BUG();})
 
-#define DISAS_SWI 5
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
+#define DISAS_SWI (DISAS_TARGET + 0)
 
 /* Used by the decoder.  */
 #define EXTRACT_FIELD(src, start, end) \
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 6da6f2b51b..b2b0555c80 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -143,7 +143,8 @@ typedef struct DisasContext {
 int done_mac;
 } DisasContext;
 
-#define DISAS_JUMP_NEXT 4
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
+#define DISAS_JUMP_NEXT (DISAS_TARGET + 0)
 
 #if defined(CONFIG_USER_ONLY)
 #define IS_USER(s) 1
diff --git a/target-s390x/translate.c b/target-s390x/translate.c
index a3992dae5a..75787e89e3 100644
--- a/target-s390x/translate.c
+++ b/target-s390x/translate.c
@@ -74,7 +74,8 @@ typedef struct {
 } u;
 } DisasCompare;
 
-#define DISAS_EXCP 4
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
+#define DISAS_EXCP (DISAS_TARGET + 0)
 
 #ifdef DEBUG_INLINE_BRANCHES
 static uint64_t inline_branch_hit[CC_OP_MAX];
diff --git a/target-unicore32/translate.c b/target-unicore32/translate.c
index 39eaa76b50..de0a64e1c8 100644
--- a/target-unicore32/translate.c
+++ b/target-unicore32/translate.c
@@ -45,9 +45,10 @@ typedef struct DisasContext {
 #define IS_USER(s)  1
 #endif
 
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
 /* These instructions trap after executing, so defer them until after the
conditional executions state has been updated.  */
-#define DISAS_SYSCALL 5
+#define DISAS_SYSCALL (DISAS_TARGET + 0)
 
 static TCGv_env cpu_env;
 static TCGv_i32 cpu_R[32];
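
The numbering scheme in this patch can be checked with a small sketch. The
enum and the DISAS_* macros follow the hunks above; TOY_DISAS_WFI,
TOY_DISAS_SWI and is_target_private() are hypothetical names added only for
illustration.

```c
/* Generic values, as introduced by the framework patch. */
typedef enum DisasJumpType {
    DJ_NEXT,
    DJ_TOO_MANY,
    DJ_TARGET,      /* first value available to target code */
} DisasJumpType;

/* Legacy names mapped onto the generic numbering, as in the patch. */
#define DISAS_NEXT    DJ_NEXT
#define DISAS_JUMP    (DJ_TARGET + 0)
#define DISAS_UPDATE  (DJ_TARGET + 1)
#define DISAS_TB_JUMP (DJ_TARGET + 2)
#define DISAS_TARGET  (DJ_TARGET + 3)   /* base for target-specific values */

/* Hypothetical target-private values stacked on top of DISAS_TARGET. */
#define TOY_DISAS_WFI (DISAS_TARGET + 0)
#define TOY_DISAS_SWI (DISAS_TARGET + 1)

/* Hypothetical helper: true for values a single target defines itself. */
static int is_target_private(int jmp_type)
{
    return jmp_type >= DISAS_TARGET;
}
```

Expressing every value as an offset keeps the legacy and target-private
constants from colliding with DJ_NEXT and DJ_TOO_MANY, whatever values the
generic enum ends up assigning.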




[Qemu-devel] [PATCH v5 2/6] queue: Add macro for incremental traversal

2016-12-28 Thread Lluís Vilanova
Adds macro QTAILQ_FOREACH_CONTINUE to support incremental list
traversal.

Signed-off-by: Lluís Vilanova 
---
 include/qemu/queue.h |   12 
 1 file changed, 12 insertions(+)

diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 342073fb4d..ea6130f1c9 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -415,6 +415,18 @@ struct {   
 \
 (var);  \
 (var) = ((var)->field.tqe_next))
 
+/**
+ * QTAILQ_FOREACH_CONTINUE:
+ * @var: Variable to resume iteration from.
+ * @field: Field in @var holding a QTAILQ_ENTRY for this queue.
+ *
+ * Resumes iteration on a queue from the element in @var.
+ */
+#define QTAILQ_FOREACH_CONTINUE(var, field) \
+for ((var) = ((var)->field.tqe_next);   \
+(var);  \
+(var) = ((var)->field.tqe_next))
+
 #define QTAILQ_FOREACH_SAFE(var, head, field, next_var) \
 for ((var) = ((head)->tqh_first);   \
 (var) && ((next_var) = ((var)->field.tqe_next), 1); \
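
A minimal sketch of how the new macro is meant to be used, written against the
BSD TAILQ macros from <sys/queue.h> so that it builds outside the QEMU tree.
TAILQ_FOREACH_CONTINUE is an analogue defined here for the example, and
Breakpoint and breakpoint_get() are hypothetical stand-ins for CPUBreakpoint
and cpu_breakpoint_get().

```c
#include <stddef.h>
#include <sys/queue.h>

/* Analogue of QTAILQ_FOREACH_CONTINUE for the BSD TAILQ macros: iteration
 * starts at the element after var, not at the list head. */
#define TAILQ_FOREACH_CONTINUE(var, field)          \
    for ((var) = TAILQ_NEXT((var), field);          \
         (var);                                     \
         (var) = TAILQ_NEXT((var), field))

typedef struct Breakpoint {
    unsigned long pc;
    TAILQ_ENTRY(Breakpoint) entry;
} Breakpoint;

TAILQ_HEAD(BreakpointList, Breakpoint);

/* With bp == NULL, return the first breakpoint at pc; otherwise resume the
 * search after bp. Returns NULL when no further match exists. */
static Breakpoint *breakpoint_get(struct BreakpointList *head,
                                  unsigned long pc, Breakpoint *bp)
{
    if (bp == NULL) {
        TAILQ_FOREACH(bp, head, entry) {
            if (bp->pc == pc) {
                return bp;
            }
        }
    } else {
        TAILQ_FOREACH_CONTINUE(bp, entry) {
            if (bp->pc == pc) {
                return bp;
            }
        }
    }
    return NULL;
}
```

A caller can feed each result back in as bp to visit every breakpoint at the
same address without rescanning the list from its head.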




[Qemu-devel] [RFC PATCH v5 0/6] translate: [tcg] Generic translation framework

2016-12-28 Thread Lluís Vilanova
This series proposes a generic (target-agnostic) instruction translation
framework.

It basically provides a generic main loop for instruction disassembly, which
calls target-specific functions when necessary. This generalization makes
inserting new code in the main loop easier, and helps keep all targets in
sync with respect to its contents.
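
As a rough illustration of that structure, here is a self-contained sketch of
a generic loop driving target hooks. It is not the code from the series: the
series declares static hook functions in a template header, whereas this
sketch swaps in a table of function pointers (TranslatorOps) to stay
standalone. The toy target, its hooks and translate_block() are all
hypothetical.

```c
#include <stdint.h>

typedef enum { DJ_NEXT, DJ_TOO_MANY, DJ_TARGET } DisasJumpType;

/* Reduced version of the architecture-agnostic context. */
typedef struct DisasContextBase {
    uint64_t pc_first;
    uint64_t pc_next;
    int jmp_type;
    unsigned int num_insns;
} DisasContextBase;

/* Hypothetical hook table filled in by each target. */
typedef struct TranslatorOps {
    void (*tb_start)(DisasContextBase *db);
    uint64_t (*disas_insn)(DisasContextBase *db);   /* returns the next pc */
    void (*tb_stop)(DisasContextBase *db);
} TranslatorOps;

/* Generic main loop: the only place that counts insns and decides to stop. */
static void translate_block(DisasContextBase *db, const TranslatorOps *ops,
                            uint64_t pc, unsigned int max_insns)
{
    db->pc_first = pc;
    db->pc_next = pc;
    db->jmp_type = DJ_NEXT;
    db->num_insns = 0;

    ops->tb_start(db);
    while (db->jmp_type == DJ_NEXT) {
        db->num_insns++;
        db->pc_next = ops->disas_insn(db);
        if (db->jmp_type == DJ_NEXT && db->num_insns >= max_insns) {
            db->jmp_type = DJ_TOO_MANY;
        }
    }
    ops->tb_stop(db);
}

/* Toy target: 4-byte insns, with a pretend branch at 0x100c. */
#define DJ_TOY_JUMP (DJ_TARGET + 0)

static void toy_tb_start(DisasContextBase *db) { (void)db; }
static void toy_tb_stop(DisasContextBase *db) { (void)db; }

static uint64_t toy_disas_insn(DisasContextBase *db)
{
    if (db->pc_next == 0x100c) {
        db->jmp_type = DJ_TOY_JUMP;
    }
    return db->pc_next + 4;
}

static const TranslatorOps toy_ops = {
    toy_tb_start, toy_disas_insn, toy_tb_stop,
};
```

Anything added to translate_block(), such as a tracing event per instruction,
takes effect for every target that supplies hooks, which is the main benefit
claimed above.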

This series also paves the way towards adding events to trace guest code
execution (BBLs and instructions).

I've ported i386/x86-64 and arm/aarch64 as an example to see how it fits in the
current organization, but will port the rest when this series gets merged.

Signed-off-by: Lluís Vilanova 
---

Changes in v5
=============

* Remove stray uses of "restrict" keyword.


Changes in v4
=============

* Document new macro QTAILQ_FOREACH_CONTINUE [Peter Maydell].
* Fix coding style errors reported by checkpatch.
* Remove use of "restrict" in added functions; it makes older gcc versions barf
  about compilation errors.


Changes in v3
=============

* Rebase on 0737f32daf.


Changes in v2
=============

* Port ARM and AARCH64 targets.
* Fold single-stepping checks into "max_insns" [Richard Henderson].
* Move instruction start marks to target code [Richard Henderson].
* Add target hook for TB start.
* Check for TCG temporary leaks.
* Move instruction disassembly into a target hook.
* Make breakpoint_hit() return an enum to accommodate targets' needs (ARM).


Lluís Vilanova (6):
  Pass generic CPUState to gen_intermediate_code()
  queue: Add macro for incremental traversal
  target: [tcg] Add generic translation framework
  target: [tcg] Redefine DISAS_* onto the generic translation framework (DJ_*)
  target: [tcg,i386] Port to generic translation framework
  target: [tcg,arm] Port to generic translation framework


 include/exec/exec-all.h   |   13 -
 include/exec/gen-icount.h |2 
 include/exec/translate-all_template.h |   73 +++
 include/qemu/queue.h  |   12 +
 include/qom/cpu.h |   22 +
 target-alpha/translate.c  |   11 -
 target-arm/translate-a64.c|  346 
 target-arm/translate.c|  720 +
 target-arm/translate.h|   41 +-
 target-cris/translate.c   |   20 -
 target-i386/translate.c   |  305 ++
 target-lm32/translate.c   |   22 +
 target-m68k/translate.c   |   18 -
 target-microblaze/translate.c |   22 +
 target-mips/translate.c   |   15 -
 target-moxie/translate.c  |   14 -
 target-openrisc/translate.c   |   22 +
 target-ppc/translate.c|   15 -
 target-s390x/translate.c  |   16 -
 target-sh4/translate.c|   15 -
 target-sparc/translate.c  |   11 -
 target-tilegx/translate.c |7 
 target-tricore/translate.c|9 
 target-unicore32/translate.c  |   20 -
 target-xtensa/translate.c |   13 -
 translate-all.c   |2 
 translate-all_template.h  |  204 +
 27 files changed, 1137 insertions(+), 853 deletions(-)
 create mode 100644 include/exec/translate-all_template.h
 create mode 100644 translate-all_template.h


To: qemu-devel@nongnu.org
Cc: Paolo Bonzini 
Cc: Peter Crosthwaite 
Cc: Richard Henderson 



Re: [Qemu-devel] [RFC PATCH v4 0/6] translate: [tcg] Generic translation framework

2016-12-28 Thread Lluís Vilanova
no-reply  writes:

> Hi,
> Your series failed automatic build test. Please find the testing commands and
> their output below. If you have docker installed, you can probably reproduce
> it locally.

I did try to compile all targets and it works for me (gcc 6.2.1). I've also
tried the oldest gcc I have (4.8.4) and it fails to link all programs on vanilla
QEMU, but compiles all the files otherwise (including my series).


Cheers,
  Lluis



Re: [Qemu-devel] [PATCH v5 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state

2016-12-28 Thread Lluís Vilanova
Richard Henderson writes:

> On 12/28/2016 06:08 AM, Lluís Vilanova wrote:
>> @@ -83,6 +85,13 @@ uint32_t tb_hash_func5(uint64_t a0, uint64_t b0, uint32_t e)
>> h32 += e * PRIME32_3;
>> h32  = rol32(h32, 17) * PRIME32_4;
>> 
>> +if (sizeof(TRACE_QHT_VCPU_DSTATE_TYPE) == sizeof(uint32_t)) {
>> +h32 += f * PRIME32_3;
>> +h32  = rol32(h32, 17) * PRIME32_4;
>> +} else {
>> +abort();
>> +}
>> +

> QEMU_BUILD_BUG_ON.

Right, thanks.

Lluis
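
For reference, a minimal sketch of what the suggestion amounts to: the size check moves from a runtime abort() to a compile-time assertion. BUILD_BUG_ON below is a stand-in built on C11 _Static_assert (it is not QEMU's definition of QEMU_BUILD_BUG_ON), dstate_t is a hypothetical stand-in for TRACE_QHT_VCPU_DSTATE_TYPE, and mix_dstate() is a made-up helper that isolates the two hashing lines from the hunk. The prime constants are the usual xxHash32 ones.

```c
#include <stdint.h>

/* Stand-in for QEMU_BUILD_BUG_ON: compilation fails when cond is true. */
#define BUILD_BUG_ON(cond) _Static_assert(!(cond), "build bug: " #cond)

/* Hypothetical stand-in for TRACE_QHT_VCPU_DSTATE_TYPE. */
typedef uint32_t dstate_t;

#define PRIME32_3 3266489917U
#define PRIME32_4 668265263U

static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
    return (word << shift) | (word >> (32 - shift));
}

/* Fold the dynamic tracing state into an existing 32-bit hash value. */
static uint32_t mix_dstate(uint32_t h32, dstate_t f)
{
    /* Checked by the compiler, so no else branch with abort() is needed. */
    BUILD_BUG_ON(sizeof(dstate_t) != sizeof(uint32_t));

    h32 += f * PRIME32_3;
    h32  = rol32(h32, 17) * PRIME32_4;
    return h32;
}
```

If the type ever changes size, the build breaks at this line instead of the guest aborting at run time.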



Re: [Qemu-devel] [PATCH v2] build: include sys/sysmacros.h for major() and minor()

2016-12-28 Thread Eric Blake
On 12/28/2016 08:53 AM, Christopher Covington wrote:
> The definitions of the major() and minor() macros are moving within glibc to
> <sys/sysmacros.h>. Include this header to avoid the following sorts of
> build-stopping messages:
> 

> The additional include allows the build to complete on Fedora 26 (Rawhide)
> with glibc version 2.24.90.
> 
> Signed-off-by: Christopher Covington 
> ---
>  include/sysemu/os-posix.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
> index b0a6c0695b..772d58f7ed 100644
> --- a/include/sysemu/os-posix.h
> +++ b/include/sysemu/os-posix.h
> @@ -28,6 +28,7 @@
>  
>  #include 
>  #include 
> +#include <sys/sysmacros.h>

I repeat what I said on v1:

Works for glibc; but <sys/sysmacros.h> is non-standard and not present
on some other systems, so this may fail to build elsewhere.  You'll
probably need a configure probe.  Autoconf also says that some platforms
have <sys/mkdev.h> instead of <sys/sysmacros.h> (per its AC_HEADER_MAJOR
macro).
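
Roughly what I have in mind, as an untested sketch: configure probes for the header and defines a macro, and the include is guarded by it. CONFIG_SYSMACROS is a name I made up for the probe result, and the __GLIBC__ fallback only exists so the snippet compiles on its own without a configure run; split_dev() is likewise just an illustration.

```c
#include <sys/types.h>

/*
 * CONFIG_SYSMACROS is a hypothetical macro that a configure probe would
 * define when <sys/sysmacros.h> exists.  The glibc fallback below is only
 * here to keep this sketch self-contained.
 */
#if !defined(CONFIG_SYSMACROS) && defined(__GLIBC__)
#define CONFIG_SYSMACROS 1
#endif

#ifdef CONFIG_SYSMACROS
#include <sys/sysmacros.h>
#endif

/* Split a device number into its major and minor parts. */
static void split_dev(dev_t dev, unsigned int *maj, unsigned int *min)
{
    *maj = major(dev);
    *min = minor(dev);
}
```

A platform that keeps these macros in a different header would need its own probe and its own guarded include next to this one.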

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v5 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state

2016-12-28 Thread Richard Henderson

On 12/28/2016 06:08 AM, Lluís Vilanova wrote:

@@ -83,6 +85,13 @@ uint32_t tb_hash_func5(uint64_t a0, uint64_t b0, uint32_t e)
 h32 += e * PRIME32_3;
 h32  = rol32(h32, 17) * PRIME32_4;

+if (sizeof(TRACE_QHT_VCPU_DSTATE_TYPE) == sizeof(uint32_t)) {
+h32 += f * PRIME32_3;
+h32  = rol32(h32, 17) * PRIME32_4;
+} else {
+abort();
+}
+


QEMU_BUILD_BUG_ON.


r~



Re: [Qemu-devel] [RFC PATCH v4 0/6] translate: [tcg] Generic translation framework

2016-12-28 Thread no-reply
Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Subject: [Qemu-devel] [RFC PATCH v4 0/6] translate: [tcg] Generic translation framework
Type: series
Message-id: 148293987753.31645.8166717498506500137.st...@fimbulvetr.bsc.es

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=16
make docker-test-quick@centos6
make docker-test-mingw@fedora
make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
379c21f target: [tcg, arm] Port to generic translation framework
c955be5 target: [tcg, i386] Port to generic translation framework
02ac4cd target: [tcg] Redefine DISAS_* onto the generic translation framework (DJ_*)
9cb1c12 target: [tcg] Add generic translation framework
d9d2d4d queue: Add macro for incremental traversal
8d9f6ec Pass generic CPUState to gen_intermediate_code()

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD   centos6
make[1]: Entering directory `/var/tmp/patchew-tester-tmp-ji3mp14i/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPYRUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=a49476d3c1a8
TERM=xterm
MAKEFLAGS= -j16
HISTSIZE=1000
J=16
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /var/tmp/qemu-build/install
BIOS directory    /var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
C compiler        cc
Host C compiler   cc
C++ compiler
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
QEMU_CFLAGS       -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0
-I/usr/lib64/glib-2.0/include -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes
-fno-strict-aliasing -fno-common -fwrapv -Wendif-labels -Wmissing-include-dirs
-Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self
-Wignored-qualifiers -Wold-style-declaration -Wold-style-definition
-Wtype-limits -fstack-protector-all
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes (1.2.14)
GTK support       no
GTK GL support    no
VTE support       no
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no
nettle kdf        no
libtasn1          no
curses support    no
virgl support     no
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw)
Block whitelist (ro)
VirtFS support    no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   no
xen support       no
brlapi support    no
bluez  support    no
Doc

[Qemu-devel] [PATCH v4 5/6] target: [tcg, i386] Port to generic translation framework

2016-12-28 Thread Lluís Vilanova
Signed-off-by: Lluís Vilanova 
---
 target-i386/translate.c |  304 ++-
 1 file changed, 140 insertions(+), 164 deletions(-)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 61d73e286f..a63627b470 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -69,6 +69,10 @@
 case (2 << 6) | (OP << 3) | 0 ... (2 << 6) | (OP << 3) | 7: \
 case (3 << 6) | (OP << 3) | 0 ... (3 << 6) | (OP << 3) | 7
 
+#include "exec/translate-all_template.h"
+#define DJ_JUMP (DJ_TARGET + 0) /* end of block due to call/jump */
+#define DJ_MISC (DJ_TARGET + 1) /* some other reason */
+
 //#define MACRO_TEST   1
 
 /* global register indexes */
@@ -94,7 +98,10 @@ static TCGv_i64 cpu_tmp1_i64;
 static int x86_64_hregs;
 #endif
 
+
 typedef struct DisasContext {
+DisasContextBase base;
+
 /* current insn context */
 int override; /* -1 if no override */
 int prefix;
@@ -102,8 +109,6 @@ typedef struct DisasContext {
 TCGMemOp dflag;
 target_ulong pc_start;
 target_ulong pc; /* pc = eip + cs_base */
-int is_jmp; /* 1 = means jump (stop translation), 2 means CPU
-   static state change (stop translation) */
 /* current block context */
 target_ulong cs_base; /* base of CS segment */
 int pe; /* protected mode */
@@ -124,12 +129,10 @@ typedef struct DisasContext {
 int cpl;
 int iopl;
 int tf; /* TF cpu flag */
-int singlestep_enabled; /* "hardware" single step enabled */
 int jmp_opt; /* use direct block chaining for direct jumps */
 int repz_opt; /* optimize jumps within repz instructions */
 int mem_index; /* select memory access functions */
 uint64_t flags; /* all execution flags */
-struct TranslationBlock *tb;
 int popl_esp_hack; /* for correct popl with esp base handling */
 int rip_offset; /* only used in x86_64, but left for simplicity */
 int cpuid_features;
@@ -140,6 +143,8 @@ typedef struct DisasContext {
 int cpuid_xsave_features;
 } DisasContext;
 
+#include "translate-all_template.h"
+
 static void gen_eob(DisasContext *s);
 static void gen_jmp(DisasContext *s, target_ulong eip);
 static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
@@ -1112,7 +1117,7 @@ static void gen_bpt_io(DisasContext *s, TCGv_i32 t_port, int ot)
 
 static inline void gen_ins(DisasContext *s, TCGMemOp ot)
 {
-if (s->tb->cflags & CF_USE_ICOUNT) {
+if (s->base.tb->cflags & CF_USE_ICOUNT) {
 gen_io_start();
 }
 gen_string_movl_A0_EDI(s);
@@ -1127,14 +1132,14 @@ static inline void gen_ins(DisasContext *s, TCGMemOp ot)
 gen_op_movl_T0_Dshift(ot);
 gen_op_add_reg_T0(s->aflag, R_EDI);
 gen_bpt_io(s, cpu_tmp2_i32, ot);
-if (s->tb->cflags & CF_USE_ICOUNT) {
+if (s->base.tb->cflags & CF_USE_ICOUNT) {
 gen_io_end();
 }
 }
 
 static inline void gen_outs(DisasContext *s, TCGMemOp ot)
 {
-if (s->tb->cflags & CF_USE_ICOUNT) {
+if (s->base.tb->cflags & CF_USE_ICOUNT) {
 gen_io_start();
 }
 gen_string_movl_A0_ESI(s);
@@ -1147,7 +1152,7 @@ static inline void gen_outs(DisasContext *s, TCGMemOp ot)
 gen_op_movl_T0_Dshift(ot);
 gen_op_add_reg_T0(s->aflag, R_ESI);
 gen_bpt_io(s, cpu_tmp2_i32, ot);
-if (s->tb->cflags & CF_USE_ICOUNT) {
+if (s->base.tb->cflags & CF_USE_ICOUNT) {
 gen_io_end();
 }
 }
@@ -2130,7 +2135,7 @@ static inline int insn_const_size(TCGMemOp ot)
 static inline bool use_goto_tb(DisasContext *s, target_ulong pc)
 {
 #ifndef CONFIG_USER_ONLY
-return (pc & TARGET_PAGE_MASK) == (s->tb->pc & TARGET_PAGE_MASK) ||
+return (pc & TARGET_PAGE_MASK) == (s->base.tb->pc & TARGET_PAGE_MASK) ||
(pc & TARGET_PAGE_MASK) == (s->pc_start & TARGET_PAGE_MASK);
 #else
 return true;
@@ -2145,7 +2150,7 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num, target_ulong eip)
 /* jump to same page: we can use a direct jump */
 tcg_gen_goto_tb(tb_num);
 gen_jmp_im(eip);
-tcg_gen_exit_tb((uintptr_t)s->tb + tb_num);
+tcg_gen_exit_tb((uintptr_t)s->base.tb + tb_num);
 } else {
 /* jump to another page: currently not optimized */
 gen_jmp_im(eip);
@@ -2166,7 +2171,7 @@ static inline void gen_jcc(DisasContext *s, int b,
 
 gen_set_label(l1);
 gen_goto_tb(s, 1, val);
-s->is_jmp = DISAS_TB_JUMP;
+s->base.jmp_type = DJ_JUMP;
 } else {
 l1 = gen_new_label();
 l2 = gen_new_label();
@@ -2237,11 +2242,11 @@ static void gen_movl_seg_T0(DisasContext *s, int seg_reg)
stop as a special handling must be done to disable hardware
interrupts for the next instruction */
 if (seg_reg == R_SS || (s->code32 && seg_reg < R_FS))
-s->is_jmp = DISAS_TB_JUMP;
+s->base.jmp_type = DJ_JUMP;
 } else {
 gen_op_movl_seg_T0_vm(seg_reg);
 if (seg_reg == R_S

[Qemu-devel] [PATCH v4 6/6] target: [tcg, arm] Port to generic translation framework

2016-12-28 Thread Lluís Vilanova
Signed-off-by: Lluís Vilanova 
---
 target-arm/translate-a64.c |  346 ++---
 target-arm/translate.c |  720 ++--
 target-arm/translate.h |   42 ++-
 3 files changed, 555 insertions(+), 553 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 6dc27a6115..d6f5a65b5a 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -296,17 +296,17 @@ static void gen_exception(int excp, uint32_t syndrome, uint32_t target_el)
 
 static void gen_exception_internal_insn(DisasContext *s, int offset, int excp)
 {
-gen_a64_set_pc_im(s->pc - offset);
+gen_a64_set_pc_im(s->base.pc_next - offset);
 gen_exception_internal(excp);
-s->is_jmp = DISAS_EXC;
+s->base.jmp_type = DJ_EXC;
 }
 
 static void gen_exception_insn(DisasContext *s, int offset, int excp,
uint32_t syndrome, uint32_t target_el)
 {
-gen_a64_set_pc_im(s->pc - offset);
+gen_a64_set_pc_im(s->base.pc_next - offset);
 gen_exception(excp, syndrome, target_el);
-s->is_jmp = DISAS_EXC;
+s->base.jmp_type = DJ_EXC;
 }
 
 static void gen_ss_advance(DisasContext *s)
@@ -334,7 +334,7 @@ static void gen_step_complete_exception(DisasContext *s)
 gen_ss_advance(s);
 gen_exception(EXCP_UDEF, syn_swstep(s->ss_same_el, 1, s->is_ldex),
   default_exception_el(s));
-s->is_jmp = DISAS_EXC;
+s->base.jmp_type = DJ_EXC;
 }
 
 static inline bool use_goto_tb(DisasContext *s, int n, uint64_t dest)
@@ -342,13 +342,14 @@ static inline bool use_goto_tb(DisasContext *s, int n, uint64_t dest)
 /* No direct tb linking with singlestep (either QEMU's or the ARM
  * debug architecture kind) or deterministic io
  */
-if (s->singlestep_enabled || s->ss_active || (s->tb->cflags & CF_LAST_IO)) {
+if (s->base.singlestep_enabled || s->ss_active ||
+(s->base.tb->cflags & CF_LAST_IO)) {
 return false;
 }
 
 #ifndef CONFIG_USER_ONLY
 /* Only link tbs from inside the same guest page */
-if ((s->tb->pc & TARGET_PAGE_MASK) != (dest & TARGET_PAGE_MASK)) {
+if ((s->base.tb->pc & TARGET_PAGE_MASK) != (dest & TARGET_PAGE_MASK)) {
 return false;
 }
 #endif
@@ -360,21 +361,21 @@ static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
 {
 TranslationBlock *tb;
 
-tb = s->tb;
+tb = s->base.tb;
 if (use_goto_tb(s, n, dest)) {
 tcg_gen_goto_tb(n);
 gen_a64_set_pc_im(dest);
 tcg_gen_exit_tb((intptr_t)tb + n);
-s->is_jmp = DISAS_TB_JUMP;
+s->base.jmp_type = DJ_TB_JUMP;
 } else {
 gen_a64_set_pc_im(dest);
 if (s->ss_active) {
 gen_step_complete_exception(s);
-} else if (s->singlestep_enabled) {
+} else if (s->base.singlestep_enabled) {
 gen_exception_internal(EXCP_DEBUG);
 } else {
 tcg_gen_exit_tb(0);
-s->is_jmp = DISAS_TB_JUMP;
+s->base.jmp_type = DJ_TB_JUMP;
 }
 }
 }
@@ -405,11 +406,11 @@ static void unallocated_encoding(DisasContext *s)
 qemu_log_mask(LOG_UNIMP, \
   "%s:%d: unsupported instruction encoding 0x%08x "  \
   "at pc=%016" PRIx64 "\n",  \
-  __FILE__, __LINE__, insn, s->pc - 4);  \
+  __FILE__, __LINE__, insn, s->base.pc_next - 4);\
 unallocated_encoding(s); \
 } while (0);
 
-static void init_tmp_a64_array(DisasContext *s)
+void init_tmp_a64_array(DisasContext *s)
 {
 #ifdef CONFIG_DEBUG_TCG
 int i;
@@ -1223,11 +1224,11 @@ static inline AArch64DecodeFn *lookup_disas_fn(const AArch64DecodeTable *table,
  */
 static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
 {
-uint64_t addr = s->pc + sextract32(insn, 0, 26) * 4 - 4;
+uint64_t addr = s->base.pc_next + sextract32(insn, 0, 26) * 4 - 4;
 
 if (insn & (1U << 31)) {
 /* C5.6.26 BL Branch with link */
-tcg_gen_movi_i64(cpu_reg(s, 30), s->pc);
+tcg_gen_movi_i64(cpu_reg(s, 30), s->base.pc_next);
 }
 
 /* C5.6.20 B Branch / C5.6.26 BL Branch with link */
@@ -1250,7 +1251,7 @@ static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
 sf = extract32(insn, 31, 1);
 op = extract32(insn, 24, 1); /* 0: CBZ; 1: CBNZ */
 rt = extract32(insn, 0, 5);
-addr = s->pc + sextract32(insn, 5, 19) * 4 - 4;
+addr = s->base.pc_next + sextract32(insn, 5, 19) * 4 - 4;
 
 tcg_cmp = read_cpu_reg(s, rt, sf);
 label_match = gen_new_label();
@@ -1258,7 +1259,7 @@ static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
 tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
 tcg_cmp, 0, label_match);
 
-gen_goto_tb(s, 0, s->pc);
+gen_goto_tb(s, 0, s->base.pc_next);
 gen_set_labe

[Qemu-devel] [PATCH v4 4/6] target: [tcg] Redefine DISAS_* onto the generic translation framework (DJ_*)

2016-12-28 Thread Lluís Vilanova
Temporarily redefine DISAS_* values based on DJ_TARGET. They should
disappear as targets get ported to the generic framework.

Signed-off-by: Lluís Vilanova 
---
 include/exec/exec-all.h  |   11 +++
 target-arm/translate.h   |   15 ---
 target-cris/translate.c  |3 ++-
 target-m68k/translate.c  |3 ++-
 target-s390x/translate.c |3 ++-
 target-unicore32/translate.c |3 ++-
 6 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 0e45e1aedc..169da5ebe0 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -36,10 +36,13 @@ typedef ram_addr_t tb_page_addr_t;
 #endif
 
 /* is_jmp field values */
-#define DISAS_NEXT0 /* next instruction can be analyzed */
-#define DISAS_JUMP1 /* only pc was modified dynamically */
-#define DISAS_UPDATE  2 /* cpu state was modified dynamically */
-#define DISAS_TB_JUMP 3 /* only pc was modified statically */
+/* TODO: delete after all targets are transitioned to generic translation */
+#include "exec/translate-all_template.h"
+#define DISAS_NEXTDJ_NEXT   /* next instruction can be analyzed */
+#define DISAS_JUMP(DJ_TARGET + 0)   /* only pc was modified dynamically */
+#define DISAS_UPDATE  (DJ_TARGET + 1)   /* cpu state was modified dynamically */
+#define DISAS_TB_JUMP (DJ_TARGET + 2)   /* only pc was modified statically */
+#define DISAS_TARGET  (DJ_TARGET + 3)   /* base for target-specific values */
 
 #include "qemu/log.h"
 
diff --git a/target-arm/translate.h b/target-arm/translate.h
index 285e96f087..3dd4c4578e 100644
--- a/target-arm/translate.h
+++ b/target-arm/translate.h
@@ -105,21 +105,22 @@ static inline int default_exception_el(DisasContext *s)
 }
 
 /* target-specific extra values for is_jmp */
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
 /* These instructions trap after executing, so the A32/T32 decoder must
  * defer them until after the conditional execution state has been updated.
  * WFI also needs special handling when single-stepping.
  */
-#define DISAS_WFI 4
-#define DISAS_SWI 5
+#define DISAS_WFI (DISAS_TARGET + 0)
+#define DISAS_SWI (DISAS_TARGET + 1)
 /* For instructions which unconditionally cause an exception we can skip
  * emitting unreachable code at the end of the TB in the A64 decoder
  */
-#define DISAS_EXC 6
+#define DISAS_EXC (DISAS_TARGET + 2)
 /* WFE */
-#define DISAS_WFE 7
-#define DISAS_HVC 8
-#define DISAS_SMC 9
-#define DISAS_YIELD 10
+#define DISAS_WFE (DISAS_TARGET + 3)
+#define DISAS_HVC (DISAS_TARGET + 4)
+#define DISAS_SMC (DISAS_TARGET + 5)
+#define DISAS_YIELD (DISAS_TARGET + 6)
 
 #ifdef TARGET_AARCH64
 void a64_translate_init(void);
diff --git a/target-cris/translate.c b/target-cris/translate.c
index ebcf7863bf..001714c7c1 100644
--- a/target-cris/translate.c
+++ b/target-cris/translate.c
@@ -50,7 +50,8 @@
 #define BUG() (gen_BUG(dc, __FILE__, __LINE__))
 #define BUG_ON(x) ({if (x) BUG();})
 
-#define DISAS_SWI 5
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
+#define DISAS_SWI (DISAS_TARGET + 0)
 
 /* Used by the decoder.  */
 #define EXTRACT_FIELD(src, start, end) \
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 6da6f2b51b..b2b0555c80 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -143,7 +143,8 @@ typedef struct DisasContext {
 int done_mac;
 } DisasContext;
 
-#define DISAS_JUMP_NEXT 4
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
+#define DISAS_JUMP_NEXT (DISAS_TARGET + 0)
 
 #if defined(CONFIG_USER_ONLY)
 #define IS_USER(s) 1
diff --git a/target-s390x/translate.c b/target-s390x/translate.c
index a3992dae5a..75787e89e3 100644
--- a/target-s390x/translate.c
+++ b/target-s390x/translate.c
@@ -74,7 +74,8 @@ typedef struct {
 } u;
 } DisasCompare;
 
-#define DISAS_EXCP 4
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
+#define DISAS_EXCP (DISAS_TARGET + 0)
 
 #ifdef DEBUG_INLINE_BRANCHES
 static uint64_t inline_branch_hit[CC_OP_MAX];
diff --git a/target-unicore32/translate.c b/target-unicore32/translate.c
index 39eaa76b50..de0a64e1c8 100644
--- a/target-unicore32/translate.c
+++ b/target-unicore32/translate.c
@@ -45,9 +45,10 @@ typedef struct DisasContext {
 #define IS_USER(s)  1
 #endif
 
+/* TODO: rename as DJ_* when transitioning this target to generic translation */
 /* These instructions trap after executing, so defer them until after the
conditional executions state has been updated.  */
-#define DISAS_SYSCALL 5
+#define DISAS_SYSCALL (DISAS_TARGET + 0)
 
 static TCGv_env cpu_env;
 static TCGv_i32 cpu_R[32];




[Qemu-devel] [PATCH v4 1/6] Pass generic CPUState to gen_intermediate_code()

2016-12-28 Thread Lluís Vilanova
Needed to implement a target-agnostic gen_intermediate_code() in the
future.

Signed-off-by: Lluís Vilanova 
Reviewed-by: David Gibson 
---
 include/exec/exec-all.h   |2 +-
 target-alpha/translate.c  |   11 +--
 target-arm/translate.c|   24 
 target-cris/translate.c   |   17 -
 target-i386/translate.c   |   13 ++---
 target-lm32/translate.c   |   22 +++---
 target-m68k/translate.c   |   15 +++
 target-microblaze/translate.c |   22 +++---
 target-mips/translate.c   |   15 +++
 target-moxie/translate.c  |   14 +++---
 target-openrisc/translate.c   |   22 +++---
 target-ppc/translate.c|   15 +++
 target-s390x/translate.c  |   13 ++---
 target-sh4/translate.c|   15 +++
 target-sparc/translate.c  |   11 +--
 target-tilegx/translate.c |7 +++
 target-tricore/translate.c|9 -
 target-unicore32/translate.c  |   17 -
 target-xtensa/translate.c |   13 ++---
 translate-all.c   |2 +-
 20 files changed, 133 insertions(+), 146 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a8c13cee66..0e45e1aedc 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -43,7 +43,7 @@ typedef ram_addr_t tb_page_addr_t;
 
 #include "qemu/log.h"
 
-void gen_intermediate_code(CPUArchState *env, struct TranslationBlock *tb);
+void gen_intermediate_code(CPUState *env, struct TranslationBlock *tb);
 void restore_state_to_opc(CPUArchState *env, struct TranslationBlock *tb,
   target_ulong *data);
 
diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 114927b751..6759ec28cc 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2873,10 +2873,9 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
 return ret;
 }
 
-void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
+void gen_intermediate_code(CPUState *cpu, struct TranslationBlock *tb)
 {
-AlphaCPU *cpu = alpha_env_get_cpu(env);
-CPUState *cs = CPU(cpu);
+CPUAlphaState *env = cpu->env_ptr;
 DisasContext ctx, *ctxp = &ctx;
 target_ulong pc_start;
 target_ulong pc_mask;
@@ -2891,7 +2890,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
 ctx.pc = pc_start;
 ctx.mem_idx = cpu_mmu_index(env, false);
 ctx.implver = env->implver;
-ctx.singlestep_enabled = cs->singlestep_enabled;
+ctx.singlestep_enabled = cpu->singlestep_enabled;
 
 #ifdef CONFIG_USER_ONLY
 ctx.ir = cpu_std_ir;
@@ -2934,7 +2933,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
 tcg_gen_insn_start(ctx.pc);
 num_insns++;
 
-if (unlikely(cpu_breakpoint_test(cs, ctx.pc, BP_ANY))) {
+if (unlikely(cpu_breakpoint_test(cpu, ctx.pc, BP_ANY))) {
 ret = gen_excp(&ctx, EXCP_DEBUG, 0);
 /* The address covered by the breakpoint must be included in
[tb->pc, tb->pc + tb->size) in order to for it to be
@@ -2996,7 +2995,7 @@ void gen_intermediate_code(CPUAlphaState *env, struct TranslationBlock *tb)
 && qemu_log_in_addr_range(pc_start)) {
 qemu_log_lock();
 qemu_log("IN: %s\n", lookup_symbol(pc_start));
-log_target_disas(cs, pc_start, ctx.pc - pc_start, 1);
+log_target_disas(cpu, pc_start, ctx.pc - pc_start, 1);
 qemu_log("\n");
 qemu_log_unlock();
 }
diff --git a/target-arm/translate.c b/target-arm/translate.c
index 0ad9070b45..3aa766901c 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -11589,10 +11589,10 @@ static bool insn_crosses_page(CPUARMState *env, DisasContext *s)
 }
 
 /* generate intermediate code for basic block 'tb'.  */
-void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
+void gen_intermediate_code(CPUState *cpu, TranslationBlock *tb)
 {
-ARMCPU *cpu = arm_env_get_cpu(env);
-CPUState *cs = CPU(cpu);
+CPUARMState *env = cpu->env_ptr;
+ARMCPU *arm_cpu = arm_env_get_cpu(env);
 DisasContext dc1, *dc = &dc1;
 target_ulong pc_start;
 target_ulong next_page_start;
@@ -11606,7 +11606,7 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
  * the A32/T32 complexity to do with conditional execution/IT blocks/etc.
  */
 if (ARM_TBFLAG_AARCH64_STATE(tb->flags)) {
-gen_intermediate_code_a64(cpu, tb);
+gen_intermediate_code_a64(arm_cpu, tb);
 return;
 }
 
@@ -11616,7 +11616,7 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
 
 dc->is_jmp = DISAS_NEXT;
 dc->pc = pc_start;
-dc->singlestep_enabled = cs->singlestep_enabled;
+dc->singlestep_enabled = cpu->singlestep_enabled;
 dc->condjmp = 0;
 

[Qemu-devel] [PATCH v4 3/6] target: [tcg] Add generic translation framework

2016-12-28 Thread Lluís Vilanova
Signed-off-by: Lluís Vilanova 
---
 include/exec/gen-icount.h |2 
 include/exec/translate-all_template.h |   73 
 include/qom/cpu.h |   22 
 translate-all_template.h  |  204 +
 4 files changed, 300 insertions(+), 1 deletion(-)
 create mode 100644 include/exec/translate-all_template.h
 create mode 100644 translate-all_template.h

diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 050de59b38..c91ac95ed7 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -45,7 +45,7 @@ static inline void gen_tb_start(TranslationBlock *tb)
 tcg_temp_free_i32(count);
 }
 
-static void gen_tb_end(TranslationBlock *tb, int num_insns)
+static inline void gen_tb_end(TranslationBlock *tb, int num_insns)
 {
 gen_set_label(exitreq_label);
 tcg_gen_exit_tb((uintptr_t)tb + TB_EXIT_REQUESTED);
diff --git a/include/exec/translate-all_template.h b/include/exec/translate-all_template.h
new file mode 100644
index 00..ea507f90c6
--- /dev/null
+++ b/include/exec/translate-all_template.h
@@ -0,0 +1,73 @@
+/*
+ * Generic intermediate code generation.
+ *
+ * Copyright (C) 2016 Lluís Vilanova 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef EXEC__TRANSLATE_ALL_TEMPLATE_H
+#define EXEC__TRANSLATE_ALL_TEMPLATE_H
+
+/*
+ * Include this header from a target-specific file, and add a
+ *
+ * DisasContextBase base;
+ *
+ * member in your target-specific DisasContext.
+ */
+
+
+#include "exec/exec-all.h"
+
+
+/**
+ * BreakpointHitType:
+ * @BH_MISS: No hit
+ * @BH_HIT_INSN: Hit, but continue translating instruction
+ * @BH_HIT_TB: Hit, stop translating TB
+ *
+ * How to react to a breakpoint hit.
+ */
+typedef enum BreakpointHitType {
+BH_MISS,
+BH_HIT_INSN,
+BH_HIT_TB,
+} BreakpointHitType;
+
+/**
+ * DisasJumpType:
+ * @DJ_NEXT: Next instruction in program order
+ * @DJ_TOO_MANY: Too many instructions executed
+ * @DJ_TARGET: Start of target-specific conditions
+ *
+ * What instruction to disassemble next.
+ */
+typedef enum DisasJumpType {
+DJ_NEXT,
+DJ_TOO_MANY,
+DJ_TARGET,
+} DisasJumpType;
+
+/**
+ * DisasContextBase:
+ * @tb: Translation block for this disassembly.
+ * @singlestep_enabled: "Hardware" single stepping enabled.
+ * @pc_first: Address of first guest instruction in this TB.
+ * @pc_next: Address of next guest instruction in this TB (current during
+ *   disassembly).
+ * @num_insns: Number of translated instructions (including current).
+ *
+ * Architecture-agnostic disassembly context.
+ */
+typedef struct DisasContextBase {
+TranslationBlock *tb;
+bool singlestep_enabled;
+target_ulong pc_first;
+target_ulong pc_next;
+DisasJumpType jmp_type;
+unsigned int num_insns;
+} DisasContextBase;
+
+#endif  /* EXEC__TRANSLATE_ALL_TEMPLATE_H */
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 3f79a8e955..64a288b066 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -948,6 +948,28 @@ static inline bool cpu_breakpoint_test(CPUState *cpu, vaddr pc, int mask)
 return false;
 }
 
+/* Get first breakpoint matching a PC */
+static inline CPUBreakpoint *cpu_breakpoint_get(CPUState *cpu, vaddr pc,
+CPUBreakpoint *bp)
+{
+if (likely(bp == NULL)) {
+if (unlikely(!QTAILQ_EMPTY(&cpu->breakpoints))) {
+QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
+if (bp->pc == pc) {
+return bp;
+}
+}
+}
+} else {
+QTAILQ_FOREACH_CONTINUE(bp, entry) {
+if (bp->pc == pc) {
+return bp;
+}
+}
+}
+return NULL;
+}
+
 int cpu_watchpoint_insert(CPUState *cpu, vaddr addr, vaddr len,
   int flags, CPUWatchpoint **watchpoint);
 int cpu_watchpoint_remove(CPUState *cpu, vaddr addr,
diff --git a/translate-all_template.h b/translate-all_template.h
new file mode 100644
index 00..6208916d08
--- /dev/null
+++ b/translate-all_template.h
@@ -0,0 +1,204 @@
+/*
+ * Generic intermediate code generation.
+ *
+ * Copyright (C) 2016 Lluís Vilanova 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef TRANSLATE_ALL_TEMPLATE_H
+#define TRANSLATE_ALL_TEMPLATE_H
+
+/*
+ * Include this header from a target-specific file, which must define the
+ * target-specific functions declared below.
+ *
+ * These must be paired with instructions in "exec/translate-all_template.h".
+ */
+
+
+#include "cpu.h"
+#include "qemu/error-report.h"
+
+
+static void gen_intermediate_code_target_init_disas_context(
+DisasContext *dc, CPUArchState *env);
+
+static void gen_intermediate_code_target_init_globals(
+DisasContext *dc, CPUArchState *

[Qemu-devel] [PATCH v4 2/6] queue: Add macro for incremental traversal

2016-12-28 Thread Lluís Vilanova
Adds macro QTAILQ_FOREACH_CONTINUE to support incremental list
traversal.

Signed-off-by: Lluís Vilanova 
---
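A minimal sketch of how the macro might be used, assuming a cut-down
element type. Item, item_next_match() and demo_count_after_first() are
hypothetical names for illustration; only the macro body comes from this
patch. Unlike QTAILQ_FOREACH, iteration starts at the element after @var,
so @var itself is never visited.

```c
#include <assert.h>
#include <stddef.h>

/* Macro body as added by this patch (normally from "qemu/queue.h"). */
#define QTAILQ_FOREACH_CONTINUE(var, field) \
    for ((var) = ((var)->field.tqe_next);   \
         (var);                             \
         (var) = ((var)->field.tqe_next))

/* Hypothetical element type; only the tqe_next link is modelled. */
typedef struct Item {
    int key;
    struct { struct Item *tqe_next; } entry;
} Item;

/* Return the first element *after* @it whose key matches, or NULL. */
static Item *item_next_match(Item *it, int key)
{
    QTAILQ_FOREACH_CONTINUE(it, entry) {
        if (it->key == key) {
            return it;
        }
    }
    return NULL;
}

/* Build the list 7 -> 3 -> 7 and count the matches that follow the head. */
static int demo_count_after_first(void)
{
    Item c = { 7, { NULL } };
    Item b = { 3, { &c } };
    Item a = { 7, { &b } };
    int n = 0;
    Item *it = &a;              /* pretend @a was the previous match */

    while ((it = item_next_match(it, 7)) != NULL) {
        n++;
    }
    return n;                   /* only @c follows @a with key 7 */
}
```

The caller keeps the last match and passes it back in to resume, which
is the pattern cpu_breakpoint_get() in patch 3/6 appears to rely on.
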
 include/qemu/queue.h |   12 
 1 file changed, 12 insertions(+)

diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 342073fb4d..ea6130f1c9 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -415,6 +415,18 @@ struct {   
 \
 (var);  \
 (var) = ((var)->field.tqe_next))
 
+/**
+ * QTAILQ_FOREACH_CONTINUE:
+ * @var: Variable to resume iteration from.
+ * @field: Field in @var holding a QTAILQ_ENTRY for this queue.
+ *
+ * Resumes iteration on a queue from the element in @var.
+ */
+#define QTAILQ_FOREACH_CONTINUE(var, field) \
+for ((var) = ((var)->field.tqe_next);   \
+(var);  \
+(var) = ((var)->field.tqe_next))
+
 #define QTAILQ_FOREACH_SAFE(var, head, field, next_var) \
 for ((var) = ((head)->tqh_first);   \
 (var) && ((next_var) = ((var)->field.tqe_next), 1); \




[Qemu-devel] [RFC PATCH v4 0/6] translate: [tcg] Generic translation framework

2016-12-28 Thread Lluís Vilanova
This series proposes a generic (target-agnostic) instruction translation
framework.

It provides a generic main loop for instruction disassembly that calls
target-specific functions when necessary. This generalization makes it
easier to insert new code into the main loop, and helps keep all targets in
sync with respect to its contents.

This series also paves the way towards adding events to trace guest code
execution (BBLs and instructions).

I've ported i386/x86-64 and arm/aarch64 as an example to see how it fits in the
current organization, but will port the rest when this series gets merged.

Signed-off-by: Lluís Vilanova 
---
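As a rough illustration of the main loop described above, here is a
self-contained sketch. It is not the code from this series: the series
uses a template header with static per-target functions, whereas the
sketch assumes a hypothetical hook table (TranslatorOps) and a toy target
with 4-byte instructions. Only the DJ_* names and the DisasContextBase
field names are taken from patch 3/6.

```c
#include <assert.h>

/* Names from patch 3/6; everything else below is hypothetical. */
typedef enum DisasJumpType {
    DJ_NEXT,
    DJ_TOO_MANY,
    DJ_TARGET,
} DisasJumpType;

/* Reduced version of the architecture-agnostic context. */
typedef struct DisasContextBase {
    unsigned long pc_next;
    DisasJumpType jmp_type;
    unsigned int num_insns;
} DisasContextBase;

/* Hypothetical per-target hooks called by the generic loop. */
typedef struct TranslatorOps {
    void (*tb_start)(DisasContextBase *db);
    unsigned long (*disas_insn)(DisasContextBase *db); /* returns next pc */
} TranslatorOps;

/* Generic loop: translate until the target stops or the budget runs out. */
static void translate_block(const TranslatorOps *ops, DisasContextBase *db,
                            unsigned long pc, unsigned int max_insns)
{
    db->pc_next = pc;
    db->jmp_type = DJ_NEXT;
    db->num_insns = 0;
    ops->tb_start(db);

    while (db->jmp_type == DJ_NEXT) {
        db->num_insns++;
        db->pc_next = ops->disas_insn(db);
        if (db->jmp_type == DJ_NEXT && db->num_insns >= max_insns) {
            db->jmp_type = DJ_TOO_MANY;
        }
    }
}

/* Toy target: fixed 4-byte instructions, block ends when pc reaches 0x10. */
static void toy_tb_start(DisasContextBase *db)
{
    (void)db;
}

static unsigned long toy_disas_insn(DisasContextBase *db)
{
    unsigned long next = db->pc_next + 4;
    if (next == 0x10) {
        db->jmp_type = DJ_TARGET;
    }
    return next;
}

/* Translate from @pc and report how many instructions were consumed. */
static unsigned int demo_translate(unsigned long pc, unsigned int max_insns)
{
    static const TranslatorOps ops = { toy_tb_start, toy_disas_insn };
    DisasContextBase db;

    translate_block(&ops, &db, pc, max_insns);
    return db.num_insns;
}
```

With this shape, code that should run for every instruction on every
target (for example a tracing call) would only need to be added once,
inside translate_block().
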

Changes in v4
=============

* Document new macro QTAILQ_FOREACH_CONTINUE [Peter Maydell].
* Fix coding style errors reported by checkpatch.
* Remove use of "restrict" in added functions; it makes older gcc versions barf
  about compilation errors.


Changes in v3
=============

* Rebase on 0737f32daf.


Changes in v2
=============

* Port ARM and AARCH64 targets.
* Fold single-stepping checks into "max_insns" [Richard Henderson].
* Move instruction start marks to target code [Richard Henderson].
* Add target hook for TB start.
* Check for TCG temporary leaks.
* Move instruction disassembly into a target hook.
* Make breakpoint_hit() return an enum to accommodate target's needs (ARM).


Lluís Vilanova (6):
  Pass generic CPUState to gen_intermediate_code()
  queue: Add macro for incremental traversal
  target: [tcg] Add generic translation framework
  target: [tcg] Redefine DISAS_* onto the generic translation framework 
(DJ_*)
  target: [tcg,i386] Port to generic translation framework
  target: [tcg,arm] Port to generic translation framework


 include/exec/exec-all.h   |   13 -
 include/exec/gen-icount.h |2 
 include/exec/translate-all_template.h |   73 +++
 include/qemu/queue.h  |   12 +
 include/qom/cpu.h |   22 +
 target-alpha/translate.c  |   11 -
 target-arm/translate-a64.c|  346 
 target-arm/translate.c|  720 +
 target-arm/translate.h|   41 +-
 target-cris/translate.c   |   20 -
 target-i386/translate.c   |  305 ++
 target-lm32/translate.c   |   22 +
 target-m68k/translate.c   |   18 -
 target-microblaze/translate.c |   22 +
 target-mips/translate.c   |   15 -
 target-moxie/translate.c  |   14 -
 target-openrisc/translate.c   |   22 +
 target-ppc/translate.c|   15 -
 target-s390x/translate.c  |   16 -
 target-sh4/translate.c|   15 -
 target-sparc/translate.c  |   11 -
 target-tilegx/translate.c |7 
 target-tricore/translate.c|9 
 target-unicore32/translate.c  |   20 -
 target-xtensa/translate.c |   13 -
 translate-all.c   |2 
 translate-all_template.h  |  204 +
 27 files changed, 1137 insertions(+), 853 deletions(-)
 create mode 100644 include/exec/translate-all_template.h
 create mode 100644 translate-all_template.h


To: qemu-devel@nongnu.org
Cc: Paolo Bonzini 
Cc: Peter Crosthwaite 
Cc: Richard Henderson 



[Qemu-devel] [Bug 1649042] Re: Ubuntu 16.04.1 LightDM Resolution Not Correct

2016-12-28 Thread Thomas Huth
OK, if it works with -vga virtio, I think we should close this bug as
WONTFIX, since the -vga vmware code is pretty much unmaintained as far
as I know (if somebody is willing to fix this there, too, feel free to
open this bug again).

** Changed in: qemu
   Status: New => Won't Fix

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1649042

Title:
  Ubuntu 16.04.1 LightDM Resolution Not Correct

Status in QEMU:
  Won't Fix

Bug description:
  My Specs:

  Slackware 14.2 x86_64 > Host
  Nvidia GPU GTX660M
  nvidia-driver-352.63
  QEMU 2.7.0

  Ubuntu 16.04.1 x86_64 > Guest
  Unity
  Xorg nouveau - 1:1.0.12-1build2

  These are the startup options for Ubuntu:

  qemu-system-x86_64 -drive format=raw,file=ubuntu.img \
  -cpu host \
  --enable-kvm \
  -smp 2 \
  -m 4096 \
  -vga vmware \
  -soundhw ac97 \
  -usbdevice tablet \
  -rtc base=localtime \
  -usbdevice host:0781:5575

  Unity desktop resolution set for 1440x900.

  I noticed when I come to the login screen to enter my password the
  LightDM resolution fills my entire desktop.

  I searched online and found this solution;

  cp ~/.config/monitor.xml /var/lib/lightdm/.config

  For now I'm assuming this step should not be needed and the resolution
  should be correctly detected and set?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1649042/+subscriptions



Re: [Qemu-devel] [PATCH] doc/pcie: correct command line examples

2016-12-28 Thread Andrew Jones
On Wed, Dec 28, 2016 at 03:24:30PM +0200, Marcel Apfelbaum wrote:
> On 12/27/2016 09:40 AM, Cao jin wrote:
> > Nit picking: Multi-function PCI Express Root Ports should mean that
> > 'addr' property is mandatory, and slot is optional because it is default
> > to 0, and 'chassis' is mandatory for 2nd & 3rd root port because it is
> > default to 0 too.
> > 
> > Bonus: fix a typo(2->3)
> > Signed-off-by: Cao jin 
> > ---
> >  docs/pcie.txt | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/docs/pcie.txt b/docs/pcie.txt
> > index 9fb20aaed9f4..54f05eaa71dc 100644
> > --- a/docs/pcie.txt
> > +++ b/docs/pcie.txt
> > @@ -110,18 +110,18 @@ Plug only PCI Express devices into PCI Express Ports.
> >-device 
> > ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
> >-device ,bus=root_port1
> >  2.2.2 Using multi-function PCI Express Root Ports:
> > -  -device 
> > ioh3420,id=root_port1,multifunction=on,chassis=x,slot=y[,bus=pcie.0][,addr=z.0]
> >  \
> > -  -device 
> > ioh3420,id=root_port2,chassis=x1,slot=y1[,bus=pcie.0][,addr=z.1] \
> > -  -device 
> > ioh3420,id=root_port3,chassis=x2,slot=y2[,bus=pcie.0][,addr=z.2] \
> > -2.2.2 Plugging a PCI Express device into a Switch:
> > +  -device 
> > ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0]
> >  \
> > +  -device 
> > ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
> > +  -device 
> > ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0] \
> > +2.2.3 Plugging a PCI Express device into a Switch:
> >-device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] 
> >  \
> >-device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] 
> >  \
> >-device 
> > xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1]]
> >  \
> >-device ,bus=downstream_port1
> > 
> >  Notes:
> > -  - (slot, chassis) pair is mandatory and must be
> > - unique for each PCI Express Root Port.
> > +  - (slot, chassis) pair is mandatory and must be unique for each
> > +PCI Express Root Port. slot is default to 0 when doesn't specify it.

Please rewrite last sentence as

 slot defaults to 0 when not specified.

> >- 'addr' parameter can be 0 for all the examples above.
> > 
> > 
> > 
> 
> Reviewed-by: Marcel Apfelbaum 
> 
> Thanks,
> Marcel
> 

Thanks,
drew



[Qemu-devel] Looking for a linux-user mode test

2016-12-28 Thread Sean Bruno
After some recent-ish changes to how user mode executes things/stuff,
I'm running into issues with the out of tree bsd-user mode code that
FreeBSD has been maintaining.  It looks like the host_signal_handler()
is never executed or registered correctly in our code.  I'm curious if
the linux-user code can handle this bit of configure script from m4.

https://people.freebsd.org/~sbruno/stack.c

If someone has the time/inclination, can this code be compiled for ARMv6
and executed in a linux chroot with the -strace argument applied?  I see
the following, which after much debugging seems to indicate that the
host_signal_handler() code is never executed as this code is requesting
that SIGSEGV be masked to its own handler.

https://people.freebsd.org/~sbruno/qemu-bsd-user-arm.txt

Prior to 7e6c57e2957c7d868f74bd0d53b5e861b495e1c7 this DTRT for our
ARMv6 targets.

sean





[Qemu-devel] [PATCH v2] build: include sys/sysmacros.h for major() and minor()

2016-12-28 Thread Christopher Covington
The definitions of the major() and minor() macros are moving within glibc to
<sys/sysmacros.h>. Include this header to avoid the following sorts of
build-stopping messages:

qga/commands-posix.c: In function ‘dev_major_minor’:
qga/commands-posix.c:656:13: error: In the GNU C Library, "major" is defined
 by <sys/sysmacros.h>. For historical compatibility, it is
 currently defined by <sys/types.h> as well, but we plan to
 remove this soon. To use "major", include <sys/sysmacros.h>
 directly. If you did not intend to use a system-defined macro
 "major", you should undefine it after including <sys/types.h>. [-Werror]
 *devmajor = major(st.st_rdev);
 ^~

qga/commands-posix.c:657:13: error: In the GNU C Library, "minor" is defined
 by <sys/sysmacros.h>. For historical compatibility, it is
 currently defined by <sys/types.h> as well, but we plan to
 remove this soon. To use "minor", include <sys/sysmacros.h>
 directly. If you did not intend to use a system-defined macro
 "minor", you should undefine it after including <sys/types.h>. [-Werror]
 *devminor = minor(st.st_rdev);
 ^~

The additional include allows the build to complete on Fedora 26 (Rawhide)
with glibc version 2.24.90.

Signed-off-by: Christopher Covington 
---
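For reference, a minimal sketch of the pattern that triggers the warning
and how the explicit include resolves it. dev_split() and
demo_roundtrip() are hypothetical helpers, not the code in
qga/commands-posix.c, and the sketch assumes a Linux libc that ships
<sys/sysmacros.h>.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/sysmacros.h>  /* major(), minor(), makedev() */

/* Split a device number into its major and minor parts. */
static void dev_split(dev_t dev, unsigned int *devmajor,
                      unsigned int *devminor)
{
    *devmajor = major(dev);
    *devminor = minor(dev);
}

/* Compose a device number and check that splitting it round-trips. */
static int demo_roundtrip(void)
{
    unsigned int ma, mi;

    dev_split(makedev(8, 1), &ma, &mi);
    return ma == 8 && mi == 1;
}
```
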
 include/sysemu/os-posix.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/sysemu/os-posix.h b/include/sysemu/os-posix.h
index b0a6c0695b..772d58f7ed 100644
--- a/include/sysemu/os-posix.h
+++ b/include/sysemu/os-posix.h
@@ -28,6 +28,7 @@
 
 #include 
 #include 
+#include <sys/sysmacros.h>
 #include 
 #include 
 #include 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora
Forum, a Linux Foundation Collaborative Project.




[Qemu-devel] [PATCH v5 7/7] trace: [trivial] Statically enable all guest events

2016-12-28 Thread Lluís Vilanova
The optimizations of this series make it feasible to have them
available on all builds.

Signed-off-by: Lluís Vilanova 
---
 trace-events |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/trace-events b/trace-events
index f74e1d3d22..0a0f4d9cd6 100644
--- a/trace-events
+++ b/trace-events
@@ -159,7 +159,7 @@ vcpu guest_cpu_reset(void)
 #
 # Mode: user, softmmu
 # Targets: TCG(all)
-disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", 
"vaddr=0x%016"PRIx64" info=%d"
+vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", 
"vaddr=0x%016"PRIx64" info=%d"
 
 # @num: System call number.
 # @arg*: System call argument value.
@@ -168,7 +168,7 @@ disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) 
"info=%d", "vaddr=0x
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, 
uint64_t arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, 
uint64_t arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" 
arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" 
arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
+vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, uint64_t 
arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, uint64_t 
arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" 
arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" 
arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
 
 # @num: System call number.
 # @ret: System call result value.
@@ -177,4 +177,4 @@ disable vcpu guest_user_syscall(uint64_t num, uint64_t 
arg1, uint64_t arg2, uint
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) 
"num=0x%016"PRIx64" ret=0x%016"PRIx64
+vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) "num=0x%016"PRIx64" 
ret=0x%016"PRIx64




[Qemu-devel] [PATCH v5 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state

2016-12-28 Thread Lluís Vilanova
Every vCPU now uses a separate set of TBs for each set of dynamic
tracing event state values. Each set of TBs can be used by any number of
vCPUs to maximize TB reuse when vCPUs have the same tracing state.

This feature is later used by tracetool to optimize tracing of guest
code events.

The maximum number of TB sets is defined as 2^E, where E is the number
of events that have the 'vcpu' property (their state is stored in
CPUState->trace_dstate).

For this to work, a change to the dynamic tracing state of a vCPU will
force it to flush its virtual TB cache (which is only indexed by
address), and fall back to the physical TB cache (which now contains the
vCPU's dynamic tracing state as part of the hashing function).

Signed-off-by: Lluís Vilanova 
---
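To make the lookup change concrete, here is a toy sketch of a TB table
whose key includes the tracing state. It assumes a small open-addressing
table and a made-up mixing function; ToyTB, toy_hash() and toy_tb_find()
are hypothetical and stand in for TranslationBlock, tb_hash_func() and
tb_htable_lookup(), which work differently in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyTB {
    uint64_t pc;
    uint32_t flags;
    uint32_t trace_vcpu_dstate; /* tracing state the block was built for */
    int in_use;
} ToyTB;

#define TOY_TABLE_SIZE 64
static ToyTB toy_table[TOY_TABLE_SIZE];

/* Hypothetical mixing function, not QEMU's tb_hash_func(). */
static uint32_t toy_hash(uint64_t pc, uint32_t flags, uint32_t dstate)
{
    uint64_t h = pc * 0x9e3779b97f4a7c15ull;

    h ^= ((uint64_t)flags << 32) | dstate;
    h *= 0xff51afd7ed558ccdull;
    return (uint32_t)(h >> 32);
}

/* Linear-probing lookup; inserts a new entry when @create is non-zero. */
static ToyTB *toy_tb_find(uint64_t pc, uint32_t flags, uint32_t dstate,
                          int create)
{
    uint32_t i, slot = toy_hash(pc, flags, dstate) % TOY_TABLE_SIZE;

    for (i = 0; i < TOY_TABLE_SIZE; i++) {
        ToyTB *tb = &toy_table[(slot + i) % TOY_TABLE_SIZE];

        if (!tb->in_use) {
            if (!create) {
                return NULL;
            }
            tb->pc = pc;
            tb->flags = flags;
            tb->trace_vcpu_dstate = dstate;
            tb->in_use = 1;
            return tb;
        }
        /* Same comparison idea as tb_cmp(): dstate is part of the key. */
        if (tb->pc == pc && tb->flags == flags &&
            tb->trace_vcpu_dstate == dstate) {
            return tb;
        }
    }
    return NULL;
}

/* Upper bound on TB sets for @nevents 'vcpu' events (nevents < 64). */
static uint64_t toy_max_tb_sets(unsigned int nevents)
{
    return 1ull << nevents;
}

/* Same pc and flags, different tracing state: two distinct blocks. */
static int demo_distinct_sets(void)
{
    ToyTB *a = toy_tb_find(0x1000, 0, 0x0, 1);
    ToyTB *b = toy_tb_find(0x1000, 0, 0x1, 1);

    return a && b && a != b &&
           toy_tb_find(0x1000, 0, 0x1, 0) == b &&
           toy_tb_find(0x1000, 0, 0x2, 0) == NULL;
}
```

Two vCPUs with the same tracing state compute the same key and therefore
share the block, which is the reuse property described above.
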
 cpu-exec.c|   26 +-
 include/exec/exec-all.h   |5 +
 include/exec/tb-hash-xx.h |   11 ++-
 include/exec/tb-hash.h|5 +++--
 include/qemu-common.h |3 +++
 tests/qht-bench.c |2 +-
 trace/control-target.c|3 +++
 trace/control.h   |3 +++
 translate-all.c   |   16 ++--
 9 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 1b7366efb0..a377505b9c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -262,6 +262,7 @@ struct tb_desc {
 CPUArchState *env;
 tb_page_addr_t phys_page1;
 uint32_t flags;
+TRACE_QHT_VCPU_DSTATE_TYPE trace_vcpu_dstate;
 };
 
 static bool tb_cmp(const void *p, const void *d)
@@ -273,6 +274,7 @@ static bool tb_cmp(const void *p, const void *d)
 tb->page_addr[0] == desc->phys_page1 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
+tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
 !atomic_read(&tb->invalid)) {
 /* check next page if needed */
 if (tb->page_addr[1] == -1) {
@@ -294,7 +296,8 @@ static bool tb_cmp(const void *p, const void *d)
 static TranslationBlock *tb_htable_lookup(CPUState *cpu,
   target_ulong pc,
   target_ulong cs_base,
-  uint32_t flags)
+  uint32_t flags,
+  uint32_t trace_vcpu_dstate)
 {
 tb_page_addr_t phys_pc;
 struct tb_desc desc;
@@ -303,10 +306,11 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu,
 desc.env = (CPUArchState *)cpu->env_ptr;
 desc.cs_base = cs_base;
 desc.flags = flags;
+desc.trace_vcpu_dstate = trace_vcpu_dstate;
 desc.pc = pc;
 phys_pc = get_page_addr_code(desc.env, pc);
 desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
-h = tb_hash_func(phys_pc, pc, flags);
+h = tb_hash_func(phys_pc, pc, flags, trace_vcpu_dstate);
 return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
@@ -318,16 +322,24 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 TranslationBlock *tb;
 target_ulong cs_base, pc;
 uint32_t flags;
+unsigned long trace_vcpu_dstate_bitmap;
+TRACE_QHT_VCPU_DSTATE_TYPE trace_vcpu_dstate;
 bool have_tb_lock = false;
 
+bitmap_copy(&trace_vcpu_dstate_bitmap, cpu->trace_dstate,
+trace_get_vcpu_event_count());
+memcpy(&trace_vcpu_dstate, &trace_vcpu_dstate_bitmap,
+   sizeof(trace_vcpu_dstate));
+
 /* we record a subset of the CPU state. It will
always be the same before a given translated block
is executed. */
 cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
 tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
 if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
- tb->flags != flags)) {
-tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+ tb->flags != flags ||
+ tb->trace_vcpu_dstate != trace_vcpu_dstate)) {
+tb = tb_htable_lookup(cpu, pc, cs_base, flags, trace_vcpu_dstate);
 if (!tb) {
 
 /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
@@ -341,7 +353,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 /* There's a chance that our desired tb has been translated while
  * taking the locks so we check again inside the lock.
  */
-tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+tb = tb_htable_lookup(cpu, pc, cs_base, flags, trace_vcpu_dstate);
 if (!tb) {
 /* if no translated code available, then translate it now */
 tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
@@ -465,6 +477,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 if (unlikely(atomic_read(&cpu->trace_dstate_delayed_req))) {
 bitmap_copy(cpu->trace_dstate, cpu->trace_dstate_delayed,
 trace_get_vcpu_event_count());
+tb_flush_jmp_cache_all(

[Qemu-devel] [PATCH v5 5/7] trace: [tcg] Do not generate TCG code to trace dynamically-disabled events

2016-12-28 Thread Lluís Vilanova
If an event is dynamically disabled, the TCG code that calls the
execution-time tracer is not generated.

This removes the overhead of execution-time tracers for dynamically
disabled events. As a bonus, it also avoids checking the event state when
the execution-time tracer is called from TCG-generated code (since
otherwise TCG would simply not call it).

Signed-off-by: Lluís Vilanova 
---
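The idea can be sketched outside of TCG as follows, assuming a toy
"translator" that emits opcodes into a buffer. ToyBlock, toy_emit() and
the OP_* values are hypothetical; in the real patch the conditional wraps
the generated gen_helper_*() call in tcg_h.py.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ToyOp { OP_INSN, OP_TRACE_HELPER };

typedef struct ToyBlock {
    enum ToyOp ops[8];
    size_t nops;
} ToyBlock;

static void toy_emit(ToyBlock *b, enum ToyOp op)
{
    b->ops[b->nops++] = op;
}

/*
 * Translate one guest instruction. The call to the execution-time tracer
 * is emitted only if the event is enabled at translation time.
 */
static void toy_translate_insn(ToyBlock *b, bool event_enabled)
{
    if (event_enabled) {
        toy_emit(b, OP_TRACE_HELPER);
    }
    toy_emit(b, OP_INSN);
}

/*
 * "Execute" the block and count tracer invocations. The event state is
 * not consulted here: a tracer op is only present if it was enabled.
 */
static unsigned int toy_execute(const ToyBlock *b)
{
    unsigned int traced = 0;

    for (size_t i = 0; i < b->nops; i++) {
        if (b->ops[i] == OP_TRACE_HELPER) {
            traced++;
        }
    }
    return traced;
}

/* Translate two instructions and report how many tracer calls ran. */
static unsigned int demo_traced_calls(bool event_enabled)
{
    ToyBlock b = { .nops = 0 };

    toy_translate_insn(&b, event_enabled);
    toy_translate_insn(&b, event_enabled);
    return toy_execute(&b);
}
```

This only stays correct if a block is never executed under a tracing
state other than the one it was translated for, which is what the
per-dstate TB sets from patch 4/7 are meant to guarantee.
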
 scripts/tracetool/__init__.py|1 +
 scripts/tracetool/format/h.py|   24 ++--
 scripts/tracetool/format/tcg_h.py|   19 ---
 scripts/tracetool/format/tcg_helper_c.py |3 ++-
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 365446fa53..63168ccdf0 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -264,6 +264,7 @@ class Event(object):
 return self._FMT.findall(self.fmt)
 
 QEMU_TRACE   = "trace_%(name)s"
+QEMU_TRACE_NOCHECK   = "_nocheck__" + QEMU_TRACE
 QEMU_TRACE_TCG   = QEMU_TRACE + "_tcg"
 QEMU_DSTATE  = "_TRACE_%(NAME)s_DSTATE"
 QEMU_EVENT   = "_TRACE_%(NAME)s_EVENT"
diff --git a/scripts/tracetool/format/h.py b/scripts/tracetool/format/h.py
index 3682f4e6a8..a78e50ef35 100644
--- a/scripts/tracetool/format/h.py
+++ b/scripts/tracetool/format/h.py
@@ -49,6 +49,19 @@ def generate(events, backend, group):
 backend.generate_begin(events, group)
 
 for e in events:
+# tracer without checks
+out('',
+'static inline void %(api)s(%(args)s)',
+'{',
+api=e.api(e.QEMU_TRACE_NOCHECK),
+args=e.args)
+
+if "disable" not in e.properties:
+backend.generate(e, group)
+
+out('}')
+
+# tracer wrapper with checks (per-vCPU tracing)
 if "vcpu" in e.properties:
 trace_cpu = next(iter(e.args))[1]
 cond = "trace_event_get_vcpu_state(%(cpu)s,"\
@@ -63,16 +76,15 @@ def generate(events, backend, group):
 'static inline void %(api)s(%(args)s)',
 '{',
 'if (%(cond)s) {',
+'%(api_nocheck)s(%(names)s);',
+'}',
+'}',
 api=e.api(),
+api_nocheck=e.api(e.QEMU_TRACE_NOCHECK),
 args=e.args,
+names=", ".join(e.args.names()),
 cond=cond)
 
-if "disable" not in e.properties:
-backend.generate(e, group)
-
-out('}',
-'}')
-
 backend.generate_end(events, group)
 
 out('#endif /* TRACE_%s_GENERATED_TRACERS_H */' % group.upper())
diff --git a/scripts/tracetool/format/tcg_h.py 
b/scripts/tracetool/format/tcg_h.py
index 5f213f6cba..71b5c09432 100644
--- a/scripts/tracetool/format/tcg_h.py
+++ b/scripts/tracetool/format/tcg_h.py
@@ -41,7 +41,7 @@ def generate(events, backend, group):
 
 for e in events:
 # just keep one of them
-if "tcg-trans" not in e.properties:
+if "tcg-exec" not in e.properties:
 continue
 
 out('static inline void %(name_tcg)s(%(args)s)',
@@ -53,12 +53,25 @@ def generate(events, backend, group):
 args_trans = e.original.event_trans.args
 args_exec = tracetool.vcpu.transform_args(
 "tcg_helper_c", e.original.event_exec, "wrapper")
+if "vcpu" in e.properties:
+trace_cpu = e.args.names()[0]
+cond = "trace_event_get_vcpu_state(%(cpu)s,"\
+   " TRACE_%(id)s)"\
+   % dict(
+   cpu=trace_cpu,
+   id=e.original.event_exec.name.upper())
+else:
+cond = "true"
+
 out('%(name_trans)s(%(argnames_trans)s);',
-'gen_helper_%(name_exec)s(%(argnames_exec)s);',
+'if (%(cond)s) {',
+'gen_helper_%(name_exec)s(%(argnames_exec)s);',
+'}',
 name_trans=e.original.event_trans.api(e.QEMU_TRACE),
 name_exec=e.original.event_exec.api(e.QEMU_TRACE),
 argnames_trans=", ".join(args_trans.names()),
-argnames_exec=", ".join(args_exec.names()))
+argnames_exec=", ".join(args_exec.names()),
+cond=cond)
 
 out('}')
 
diff --git a/scripts/tracetool/format/tcg_helper_c.py 
b/scripts/tracetool/format/tcg_helper_c.py
index cc26e03008..c2a05d756c 100644
--- a/scripts/tracetool/format/tcg_helper_c.py
+++ b/scripts/tracetool/format/tcg_helper_c.py
@@ -66,10 +66,11 @@ def generate(events, backend, group):
 
 out('void %(name_tcg)s(%(args_api)s)',
 '{',
+# NOTE: the check was already performed at TCG-generation time
 '%(name)s(%(args_call)s);',
 '}',
 name_tcg="helper_%s_proxy" % e.api(),
-

[Qemu-devel] [PATCH v5 2/7] trace: Make trace_get_vcpu_event_count() inlinable

2016-12-28 Thread Lluís Vilanova
Later patches will make use of it.

Signed-off-by: Lluís Vilanova 
---
 trace/control-internal.h |5 +
 trace/control.c  |9 ++---
 trace/control.h  |2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/trace/control-internal.h b/trace/control-internal.h
index a9d395a587..beb98a0d2c 100644
--- a/trace/control-internal.h
+++ b/trace/control-internal.h
@@ -16,6 +16,7 @@
 
 
 extern int trace_events_enabled_count;
+extern uint32_t trace_next_vcpu_id;
 
 
 static inline bool trace_event_is_pattern(const char *str)
@@ -82,6 +83,10 @@ static inline bool 
trace_event_get_vcpu_state_dynamic(CPUState *vcpu,
 return trace_event_get_vcpu_state_dynamic_by_vcpu_id(vcpu, vcpu_id);
 }
 
+static inline uint32_t trace_get_vcpu_event_count(void)
+{
+return trace_next_vcpu_id;
+}
 
 void trace_event_register_group(TraceEvent **events);
 
diff --git a/trace/control.c b/trace/control.c
index 1a7bee6ddc..52d0e343fa 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -36,7 +36,7 @@ typedef struct TraceEventGroup {
 static TraceEventGroup *event_groups;
 static size_t nevent_groups;
 static uint32_t next_id;
-static uint32_t next_vcpu_id;
+uint32_t trace_next_vcpu_id;
 
 QemuOptsList qemu_trace_opts = {
 .name = "trace",
@@ -65,7 +65,7 @@ void trace_event_register_group(TraceEvent **events)
 for (i = 0; events[i] != NULL; i++) {
 events[i]->id = next_id++;
 if (events[i]->vcpu_id != TRACE_VCPU_EVENT_NONE) {
-events[i]->vcpu_id = next_vcpu_id++;
+events[i]->vcpu_id = trace_next_vcpu_id++;
 }
 }
 event_groups = g_renew(TraceEventGroup, event_groups, nevent_groups + 1);
@@ -299,8 +299,3 @@ char *trace_opt_parse(const char *optarg)
 
 return trace_file;
 }
-
-uint32_t trace_get_vcpu_event_count(void)
-{
-return next_vcpu_id;
-}
diff --git a/trace/control.h b/trace/control.h
index ccaeac8552..80d326c4d1 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -237,7 +237,7 @@ char *trace_opt_parse(const char *optarg);
  *
  * Return the number of known vcpu-specific events
  */
-uint32_t trace_get_vcpu_event_count(void);
+static uint32_t trace_get_vcpu_event_count(void);
 
 
 #include "trace/control-internal.h"




[Qemu-devel] [PATCH v5 6/7] trace: [tcg, trivial] Re-align generated code

2016-12-28 Thread Lluís Vilanova
Last patch removed a nesting level in generated code. Re-align all code
generated by backends to be 4-column aligned.

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool/backend/dtrace.py |2 +-
 scripts/tracetool/backend/ftrace.py |   20 ++--
 scripts/tracetool/backend/log.py|   17 +
 scripts/tracetool/backend/simple.py |2 +-
 scripts/tracetool/backend/syslog.py |6 +++---
 scripts/tracetool/backend/ust.py|2 +-
 6 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/scripts/tracetool/backend/dtrace.py 
b/scripts/tracetool/backend/dtrace.py
index 79505c6b1a..b3a8645bf0 100644
--- a/scripts/tracetool/backend/dtrace.py
+++ b/scripts/tracetool/backend/dtrace.py
@@ -41,6 +41,6 @@ def generate_h_begin(events, group):
 
 
 def generate_h(event, group):
-out('QEMU_%(uppername)s(%(argnames)s);',
+out('QEMU_%(uppername)s(%(argnames)s);',
 uppername=event.name.upper(),
 argnames=", ".join(event.args.names()))
diff --git a/scripts/tracetool/backend/ftrace.py 
b/scripts/tracetool/backend/ftrace.py
index db9fe7ad57..dd0eda4441 100644
--- a/scripts/tracetool/backend/ftrace.py
+++ b/scripts/tracetool/backend/ftrace.py
@@ -29,17 +29,17 @@ def generate_h(event, group):
 if len(event.args) > 0:
 argnames = ", " + argnames
 
-out('{',
-'char ftrace_buf[MAX_TRACE_STRLEN];',
-'int unused __attribute__ ((unused));',
-'int trlen;',
-'if (trace_event_get_state(%(event_id)s)) {',
-'trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
-' "%(name)s " %(fmt)s "\\n" 
%(argnames)s);',
-'trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
-'unused = write(trace_marker_fd, ftrace_buf, trlen);',
-'}',
+out('{',
+'char ftrace_buf[MAX_TRACE_STRLEN];',
+'int unused __attribute__ ((unused));',
+'int trlen;',
+'if (trace_event_get_state(%(event_id)s)) {',
+'trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
+' "%(name)s " %(fmt)s "\\n" 
%(argnames)s);',
+'trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
+'unused = write(trace_marker_fd, ftrace_buf, trlen);',
 '}',
+'}',
 name=event.name,
 args=event.args,
 event_id="TRACE_" + event.name.upper(),
diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py
index 4f4a4d38b1..7d2c3abe75 100644
--- a/scripts/tracetool/backend/log.py
+++ b/scripts/tracetool/backend/log.py
@@ -35,14 +35,15 @@ def generate_h(event, group):
 else:
 cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-out('if (%(cond)s) {',
-'struct timeval _now;',
-'gettimeofday(&_now, NULL);',
-'qemu_log_mask(LOG_TRACE, "%%d@%%zd.%%06zd:%(name)s " 
%(fmt)s "\\n",',
-'  getpid(),',
-'  (size_t)_now.tv_sec, (size_t)_now.tv_usec',
-'  %(argnames)s);',
-'}',
+out('if (%(cond)s) {',
+'struct timeval _now;',
+'gettimeofday(&_now, NULL);',
+'qemu_log_mask(LOG_TRACE,',
+'  "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
+'  getpid(),',
+'  (size_t)_now.tv_sec, (size_t)_now.tv_usec',
+'  %(argnames)s);',
+'}',
 cond=cond,
 name=event.name,
 fmt=event.fmt.rstrip("\n"),
diff --git a/scripts/tracetool/backend/simple.py 
b/scripts/tracetool/backend/simple.py
index 85f61028e2..a28460b1e4 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -37,7 +37,7 @@ def generate_h_begin(events, group):
 
 
 def generate_h(event, group):
-out('_simple_%(api)s(%(args)s);',
+out('_simple_%(api)s(%(args)s);',
 api=event.api(),
 args=", ".join(event.args.names()))
 
diff --git a/scripts/tracetool/backend/syslog.py 
b/scripts/tracetool/backend/syslog.py
index b8ff2790c4..1ce627f0fc 100644
--- a/scripts/tracetool/backend/syslog.py
+++ b/scripts/tracetool/backend/syslog.py
@@ -35,9 +35,9 @@ def generate_h(event, group):
 else:
 cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-out('if (%(cond)s) {',
-'syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
-'}',
+out('if (%(cond)s) {',
+'syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
+'}',
 cond=cond,
 name=event.name,
 fmt=event.fmt.rstrip

[Qemu-devel] [PATCH v5 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache

2016-12-28 Thread Lluís Vilanova
The function is reused in later patches.

Signed-off-by: Lluís Vilanova 
---
 cputlb.c|2 +-
 include/exec/exec-all.h |6 ++
 translate-all.c |   14 +-
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 813279f3bc..9bf9960e1b 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -80,7 +80,7 @@ void tlb_flush(CPUState *cpu, int flush_global)
 
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
-memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+tb_flush_jmp_cache_all(cpu);
 
 env->vtlb_index = 0;
 env->tlb_flush_addr = -1;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a8c13cee66..57cd978578 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -256,6 +256,12 @@ struct TranslationBlock {
 };
 
 void tb_free(TranslationBlock *tb);
+/**
+ * tb_flush_jmp_cache_all:
+ *
+ * Flush the virtual translation block cache.
+ */
+void tb_flush_jmp_cache_all(CPUState *env);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 
diff --git a/translate-all.c b/translate-all.c
index 3dd9214904..29ccb9e546 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -941,11 +941,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data 
tb_flush_count)
 }
 
 CPU_FOREACH(cpu) {
-int i;
-
-for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
-atomic_set(&cpu->tb_jmp_cache[i], NULL);
-}
+tb_flush_jmp_cache_all(cpu);
 }
 
 tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -1741,6 +1737,14 @@ void tb_check_watchpoint(CPUState *cpu)
 }
 }
 
+void tb_flush_jmp_cache_all(CPUState *cpu)
+{
+int i;
+for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
+atomic_set(&cpu->tb_jmp_cache[i], NULL);
+}
+}
+
 #ifndef CONFIG_USER_ONLY
 /* in deterministic execution mode, instructions doing device I/Os
must be at the end of the TB */




[Qemu-devel] [PATCH v5 3/7] trace: [tcg] Delay changes to dynamic state when translating

2016-12-28 Thread Lluís Vilanova
This keeps consistency across all decisions taken during translation
when the dynamic state of a vCPU is changed in the middle of translating
some guest code.

Signed-off-by: Lluís Vilanova 
---
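A single-threaded sketch of the delay protocol, under the assumption
that the setter in trace/control-target.c behaves as the field
documentation below suggests. ToyCPU and the toy_*() functions are
hypothetical, a plain unsigned long replaces the bitmap, and the atomics
used by the real code are omitted.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyCPU {
    bool must_delay;              /* translation in progress */
    bool delayed_req;             /* a change arrived while delaying */
    unsigned long dstate;         /* state used by translation */
    unsigned long dstate_delayed; /* always reflects every change */
} ToyCPU;

/* Request a change to one event bit; may be deferred. */
static void toy_set_event(ToyCPU *cpu, unsigned int bit, bool on)
{
    unsigned long mask = 1ul << bit;

    if (on) {
        cpu->dstate_delayed |= mask;
    } else {
        cpu->dstate_delayed &= ~mask;
    }

    if (cpu->must_delay) {
        cpu->delayed_req = true;            /* apply later */
    } else {
        cpu->dstate = cpu->dstate_delayed;  /* apply immediately */
    }
}

/* Same ordering as the hunks in cpu_exec(): clear request, then delay. */
static void toy_translate_begin(ToyCPU *cpu)
{
    cpu->delayed_req = false;
    cpu->must_delay = true;
}

/* Stop delaying and apply whatever was requested in the meantime. */
static void toy_translate_end(ToyCPU *cpu)
{
    cpu->must_delay = false;
    if (cpu->delayed_req) {
        cpu->dstate = cpu->dstate_delayed;
    }
}

/* A change made mid-translation is invisible until translation ends. */
static int demo_delay(void)
{
    ToyCPU cpu = { false, false, 0, 0 };
    unsigned long during;

    toy_translate_begin(&cpu);
    toy_set_event(&cpu, 0, true);
    during = cpu.dstate;
    toy_translate_end(&cpu);

    return during == 0 && cpu.dstate == 1;
}
```
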
 cpu-exec.c |   26 ++
 include/qom/cpu.h  |7 +++
 qom/cpu.c  |4 
 trace/control-target.c |   11 +--
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4188fed3c6..1b7366efb0 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -33,6 +33,7 @@
 #include "hw/i386/apic.h"
 #endif
 #include "sysemu/replay.h"
+#include "trace/control.h"
 
 /* -icount align implementation. */
 
@@ -451,9 +452,21 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 #ifndef CONFIG_USER_ONLY
 } else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+/* delay changes to this vCPU's dstate during translation */
+atomic_set(&cpu->trace_dstate_delayed_req, false);
+atomic_set(&cpu->trace_dstate_must_delay, true);
+
 /* try to cause an exception pending in the log */
 cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
 *ret = -1;
+
+/* apply and disable delayed dstate changes */
+atomic_set(&cpu->trace_dstate_must_delay, false);
+if (unlikely(atomic_read(&cpu->trace_dstate_delayed_req))) {
+bitmap_copy(cpu->trace_dstate, cpu->trace_dstate_delayed,
+trace_get_vcpu_event_count());
+}
+
 return true;
 #endif
 }
@@ -634,8 +647,21 @@ int cpu_exec(CPUState *cpu)
 
 for(;;) {
 cpu_handle_interrupt(cpu, &last_tb);
+
+/* delay changes to this vCPU's dstate during translation */
+atomic_set(&cpu->trace_dstate_delayed_req, false);
+atomic_set(&cpu->trace_dstate_must_delay, true);
+
 tb = tb_find(cpu, last_tb, tb_exit);
 cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
+
+/* apply and disable delayed dstate changes */
+atomic_set(&cpu->trace_dstate_must_delay, false);
+if (unlikely(atomic_read(&cpu->trace_dstate_delayed_req))) {
+bitmap_copy(cpu->trace_dstate, cpu->trace_dstate_delayed,
+trace_get_vcpu_event_count());
+}
+
 /* Try to align the host and virtual clocks
if the guest is in advance */
 align_clocks(&sc, cpu);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 3f79a8e955..58255d06fa 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -295,6 +295,10 @@ struct qemu_work_item;
  * @kvm_fd: vCPU file descriptor for KVM.
  * @work_mutex: Lock to prevent multiple access to queued_work_*.
  * @queued_work_first: First asynchronous work pending.
+ * @trace_dstate_must_delay: Whether a change to trace_dstate must be delayed.
+ * @trace_dstate_delayed_req: Whether a change to trace_dstate was delayed.
+ * @trace_dstate_delayed: Delayed changes to trace_dstate (includes all changes
+ *to @trace_dstate).
  * @trace_dstate: Dynamic tracing state of events for this vCPU (bitmask).
  *
  * State of one CPU core or thread.
@@ -370,6 +374,9 @@ struct CPUState {
  * Dynamically allocated based on bitmap requried to hold up to
  * trace_get_vcpu_event_count() entries.
  */
+bool trace_dstate_must_delay;
+bool trace_dstate_delayed_req;
+unsigned long *trace_dstate_delayed;
 unsigned long *trace_dstate;
 
 /* TODO Move common fields from CPUArchState here. */
diff --git a/qom/cpu.c b/qom/cpu.c
index 03d9190f8c..d56496d28d 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -367,6 +367,9 @@ static void cpu_common_initfn(Object *obj)
 QTAILQ_INIT(&cpu->breakpoints);
 QTAILQ_INIT(&cpu->watchpoints);
 
+cpu->trace_dstate_must_delay = false;
+cpu->trace_dstate_delayed_req = false;
+cpu->trace_dstate_delayed = bitmap_new(trace_get_vcpu_event_count());
 cpu->trace_dstate = bitmap_new(trace_get_vcpu_event_count());
 
 cpu_exec_initfn(cpu);
@@ -375,6 +378,7 @@ static void cpu_common_initfn(Object *obj)
 static void cpu_common_finalize(Object *obj)
 {
 CPUState *cpu = CPU(obj);
+g_free(cpu->trace_dstate_delayed);
 g_free(cpu->trace_dstate);
 }
 
diff --git a/trace/control-target.c b/trace/control-target.c
index 7ebf6e0bcb..aba8db55de 100644
--- a/trace/control-target.c
+++ b/trace/control-target.c
@@ -69,13 +69,20 @@ void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
 if (state_pre != state) {
 if (state) {
 trace_events_enabled_count++;
-set_bit(vcpu_id, vcpu->trace_dstate);
+set_bit(vcpu_id, vcpu->trace_dstate_delayed);
+if (!atomic_read(&vcpu->trace_dstate_must_delay)) {
+set_bit(vcpu_id, vcpu->trace_dstate);
+

[Qemu-devel] [PATCH v5 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches

2016-12-28 Thread Lluís Vilanova
Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory
accesses), making it feasible to statically enable them by default on all QEMU
builds.

Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
(medium size - suns.pl) and the guest_mem_before event:

* vanilla, statically disabled
real    0m2,259s
user    0m2,252s
sys     0m0,004s

* vanilla, statically enabled (overhead: 2.18x)
real    0m4,921s
user    0m4,912s
sys     0m0,008s

* multi-tb, statically disabled (overhead: 0.99x) [within noise range]
real    0m2,228s
user    0m2,216s
sys     0m0,008s

* multi-tb, statically enabled (overhead: 0.99x) [within noise range]
real    0m2,229s
user    0m2,224s
sys     0m0,004s
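
For reference, the overhead factors above are simply each run's wall-clock
("real") time divided by the vanilla, statically disabled run. A small sketch
that recomputes them from the numbers quoted here (the variable names are only
illustrative):

```python
# Wall-clock ("real") times in seconds, copied from the runs listed above.
BASELINE = 2.259  # vanilla, statically disabled

runs = {
    "vanilla, statically enabled": 4.921,
    "multi-tb, statically disabled": 2.228,
    "multi-tb, statically enabled": 2.229,
}


def overhead(seconds, baseline=BASELINE):
    """Return the slowdown factor relative to the baseline run."""
    return seconds / baseline


ratios = {name: round(overhead(t), 2) for name, t in runs.items()}
for name, ratio in ratios.items():
    print("%-32s %.2fx" % (name, ratio))
```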


Right now, events with the 'tcg' property always generate TCG code to trace that
event at guest code execution time, where the event's dynamic state is checked.

This series adds a performance optimization where TCG code for events with the
'tcg' and 'vcpu' properties is not generated if the event is dynamically
disabled. This optimization raises two issues:

* An event can be dynamically disabled/enabled after the corresponding TCG code
  has been generated (i.e., a new TB with the corresponding code should be
  used).

* Each vCPU can have a different dynamic state for the same event (e.g., tracing
  the memory accesses of only one process pinned to a vCPU).

To handle both issues, this series integrates the dynamic tracing event state
into the TB hashing function, so that vCPUs tracing different events will use
separate TBs. Note that only events with the 'vcpu' property are used for
hashing (as stored in the bitmap of CPUState->trace_dstate).

This makes dynamic event state changes on vCPUs very efficient, since they can
use TBs produced by other vCPUs while on the same event state combination (or
produced by the same vCPU, earlier).
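
A rough illustration of why folding the tracing state into the lookup key
gives both separation and reuse. This is a toy Python model under the
assumption that a TB is identified by (pc, flags, dstate); the real hash also
covers the physical address, and the class and method names are hypothetical.

```python
class TBCache:
    """Toy stand-in for the shared TB hash table."""

    def __init__(self):
        self._table = {}
        self.translations = 0

    def find(self, pc, flags, dstate):
        # dstate is part of the key, so blocks translated under different
        # tracing states never alias each other.
        key = (pc, flags, dstate)
        tb = self._table.get(key)
        if tb is None:
            # Stands in for translating the guest code at pc.
            self.translations += 1
            tb = "tb(pc=%#x, dstate=%#x)" % (pc, dstate)
            self._table[key] = tb
        return tb


cache = TBCache()
tb_a = cache.find(0x1000, 0, 0b01)  # vCPU 0, event 0 enabled
tb_b = cache.find(0x1000, 0, 0b01)  # vCPU 1, same state: reuses vCPU 0's TB
tb_c = cache.find(0x1000, 0, 0b00)  # vCPU 2, tracing disabled: separate TB
```

Only two translations happen for the three lookups: the two vCPUs with the
same event state share one block.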

Discarded alternatives:

* Emitting TCG code to check if an event needs tracing, where we should still
  move the tracing call code to either a cold path (making tracing performance
  worse), or leave it inlined (making non-tracing performance worse).

* Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling
  it on a single vCPU will impact the performance of all other vCPUs that are
  not tracing that event.

Signed-off-by: Lluís Vilanova 
---

Changes in v5
=============

* Move define into "qemu-common.h" to allow compilation of tests.


Changes in v4
=============

* Incorporate trace_dstate into the TB hashing function instead of using
  multiple physical TB caches [suggested by Richard Henderson].


Changes in v3
=============

* Rebase on 0737f32daf.
* Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi].
* Refactor trace_get_vcpu_event_count() to be inlinable.
* Optimize cpu_tb_cache_set_requested() (hottest path).


Changes in v2
=============

* Fix bitmap copy in cpu_tb_cache_set_apply().
* Split generated code re-alignment into a separate patch [Daniel P. Berrange].


Lluís Vilanova (7):
  exec: [tcg] Refactor flush of per-CPU virtual TB cache
  trace: Make trace_get_vcpu_event_count() inlinable
  trace: [tcg] Delay changes to dynamic state when translating
  exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
  trace: [tcg] Do not generate TCG code to trace dinamically-disabled events
  trace: [tcg,trivial] Re-align generated code
  trace: [trivial] Statically enable all guest events


 cpu-exec.c   |   52 +++---
 cputlb.c |2 +
 include/exec/exec-all.h  |   11 ++
 include/exec/tb-hash-xx.h|   11 ++
 include/exec/tb-hash.h   |5 ++-
 include/qemu-common.h|3 ++
 include/qom/cpu.h|7 
 qom/cpu.c|4 ++
 scripts/tracetool/__init__.py|1 +
 scripts/tracetool/backend/dtrace.py  |2 +
 scripts/tracetool/backend/ftrace.py  |   20 ++--
 scripts/tracetool/backend/log.py |   17 +-
 scripts/tracetool/backend/simple.py  |2 +
 scripts/tracetool/backend/syslog.py  |6 ++-
 scripts/tracetool/backend/ust.py |2 +
 scripts/tracetool/format/h.py|   24 ++
 scripts/tracetool/format/tcg_h.py|   19 +--
 scripts/tracetool/format/tcg_helper_c.py |3 +-
 tests/qht-bench.c|2 +
 trace-events |6 ++-
 trace/control-internal.h |5 +++
 trace/control-target.c   |   14 +++-
 trace/control.c  |9 +
 trace/control.h  |5 ++-
 translate-all.c  |   30 +
 25 files changed, 198 insertions(+), 64 deletions(-)


To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi 

Re: [Qemu-devel] [PATCH] doc/pcie: correct command line examples

2016-12-28 Thread Marcel Apfelbaum

On 12/27/2016 09:40 AM, Cao jin wrote:

Nit picking: for multi-function PCI Express Root Ports, the 'addr' property
is mandatory; 'slot' is optional because it defaults to 0; and 'chassis' is
mandatory for the 2nd and 3rd root ports because it also defaults to 0.

Bonus: fix a typo (2->3)
Signed-off-by: Cao jin 
---
 docs/pcie.txt | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/pcie.txt b/docs/pcie.txt
index 9fb20aaed9f4..54f05eaa71dc 100644
--- a/docs/pcie.txt
+++ b/docs/pcie.txt
@@ -110,18 +110,18 @@ Plug only PCI Express devices into PCI Express Ports.
   -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] 
 \
   -device ,bus=root_port1
 2.2.2 Using multi-function PCI Express Root Ports:
-  -device 
ioh3420,id=root_port1,multifunction=on,chassis=x,slot=y[,bus=pcie.0][,addr=z.0] 
\
-  -device ioh3420,id=root_port2,chassis=x1,slot=y1[,bus=pcie.0][,addr=z.1] 
\
-  -device ioh3420,id=root_port3,chassis=x2,slot=y2[,bus=pcie.0][,addr=z.2] 
\
-2.2.2 Plugging a PCI Express device into a Switch:
+  -device 
ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] 
\
+  -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] 
\
+  -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0] 
\
+2.2.3 Plugging a PCI Express device into a Switch:
   -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
   -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] 
 \
   -device 
xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1]]
 \
   -device ,bus=downstream_port1

 Notes:
-  - (slot, chassis) pair is mandatory and must be
- unique for each PCI Express Root Port.
+  - (slot, chassis) pair is mandatory and must be unique for each
+PCI Express Root Port. slot is default to 0 when doesn't specify it.
   - 'addr' parameter can be 0 for all the examples above.





Reviewed-by: Marcel Apfelbaum 

Thanks,
Marcel



Re: [Qemu-devel] [PATCH 23/23] hw/arm/virt: Add board property to enable EL2

2016-12-28 Thread Andrew Jones
On Tue, Dec 13, 2016 at 10:36:24AM +, Peter Maydell wrote:
> Add a board level property to the virt board which will
> enable EL2 on the CPU if the user asks for it. The
> default is not to provide EL2. If EL2 is enabled then
> we will use SMC as our PSCI conduit, and report the
> virtualization support in the GICv3 device tree node.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/arm/virt.c | 45 +++--
>  1 file changed, 43 insertions(+), 2 deletions(-)
>

Reviewed-by: Andrew Jones  



Re: [Qemu-devel] [PATCH 21/23] hw/arm/virt: Support using SMC for PSCI

2016-12-28 Thread Andrew Jones
On Tue, Dec 13, 2016 at 10:36:22AM +, Peter Maydell wrote:
> If we are giving the guest a CPU with EL2, it is likely to
> want to use the HVC instruction itself, for instance for
> providing PSCI to inner guest VMs. This makes using HVC
> as the PSCI conduit for the outer QEMU a bad idea. We will
> want to use SMC instead in this case: this makes sense
> because QEMU's PSCI implementation is effectively an
> emulation of functionality provided by EL3 firmware.
> 
> Add code to support selecting the PSCI conduit to use,
> rather than hardcoding use of HVC.
> 
> Signed-off-by: Peter Maydell 
> ---
>  hw/arm/virt.c | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)
>

Reviewed-by: Andrew Jones  



Re: [Qemu-devel] [PATCH 22/23] target-arm: Enable EL2 feature bit on A53 and A57

2016-12-28 Thread Andrew Jones
On Tue, Dec 13, 2016 at 10:36:23AM +, Peter Maydell wrote:
> Enable the ARM_FEATURE_EL2 bit on Cortex-A53 and
> Cortex-A57, since this is all now sufficiently implemented
> to work with the GICv3. We provide the usual CPU property
> to disable it for backwards compatibility with the older
> virt boards.
> 
> In this commit, we disable the EL2 feature on the
> virt and ZynqMP boards, so there is no overall effect.
> Another commit will expose a board-level property to
> allow the user to enable EL2.
> 
> Signed-off-by: Peter Maydell 
> ---
>  target-arm/cpu.h |  2 ++
>  hw/arm/virt.c|  4 
>  hw/arm/xlnx-zynqmp.c |  2 ++
>  target-arm/cpu.c | 12 
>  target-arm/cpu64.c   |  2 ++
>  5 files changed, 22 insertions(+)
>

Reviewed-by: Andrew Jones  



Re: [Qemu-devel] [PATCH v4 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches

2016-12-28 Thread no-reply
Hi,

Your series failed automatic build test. Please find the testing commands and
their output below. If you have docker installed, you can probably reproduce it
locally.

Message-id: 148292774946.380.3638349228328753405.st...@fimbulvetr.bsc.es
Subject: [Qemu-devel] [PATCH v4 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=16
make docker-test-quick@centos6
make docker-test-mingw@fedora
make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
fa951db trace: [trivial] Statically enable all guest events
3965b43 trace: [tcg, trivial] Re-align generated code
9e50f3c trace: [tcg] Do not generate TCG code to trace dinamically-disabled events
0e45e8e exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
bcbae22 trace: [tcg] Delay changes to dynamic state when translating
ca96b75 trace: Make trace_get_vcpu_event_count() inlinable
2d7cc5e exec: [tcg] Refactor flush of per-CPU virtual TB cache

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into 'dtc'...
Submodule path 'dtc': checked out '65cc4d2748a2c2e6f27f1cf39e07a5dbabd80ebf'
  BUILD   centos6
make[1]: Entering directory `/var/tmp/patchew-tester-tmp-cfk9gpom/src'
  ARCHIVE qemu.tgz
  ARCHIVE dtc.tgz
  COPY    RUNNER
RUN test-quick in qemu:centos6 
Packages installed:
SDL-devel-1.2.14-7.el6_7.1.x86_64
ccache-3.1.6-2.el6.x86_64
epel-release-6-8.noarch
gcc-4.4.7-17.el6.x86_64
git-1.7.1-4.el6_7.1.x86_64
glib2-devel-2.28.8-5.el6.x86_64
libfdt-devel-1.4.0-1.el6.x86_64
make-3.81-23.el6.x86_64
package g++ is not installed
pixman-devel-0.32.8-1.el6.x86_64
tar-1.23-15.el6_8.x86_64
zlib-devel-1.2.3-29.el6.x86_64

Environment variables:
PACKAGES=libfdt-devel ccache tar git make gcc g++ zlib-devel 
glib2-devel SDL-devel pixman-devel epel-release
HOSTNAME=b3cd0d1df384
TERM=xterm
MAKEFLAGS= -j16
HISTSIZE=1000
J=16
USER=root
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
MAIL=/var/spool/mail/root
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
LANG=en_US.UTF-8
TARGET_LIST=
HISTCONTROL=ignoredups
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
LOGNAME=root
LESSOPEN=||/usr/bin/lesspipe.sh %s
FEATURES= dtc
DEBUG=
G_BROKEN_FILENAMES=1
CCACHE_HASHDIR=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu 
--prefix=/var/tmp/qemu-build/install
No C++ compiler available; disabling C++ specific optional code
Install prefix  /var/tmp/qemu-build/install
BIOS directory  /var/tmp/qemu-build/install/share/qemu
binary directory  /var/tmp/qemu-build/install/bin
library directory /var/tmp/qemu-build/install/lib
module directory  /var/tmp/qemu-build/install/lib/qemu
libexec directory /var/tmp/qemu-build/install/libexec
include directory /var/tmp/qemu-build/install/include
config directory  /var/tmp/qemu-build/install/etc
local state directory   /var/tmp/qemu-build/install/var
Manual directory  /var/tmp/qemu-build/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path   /tmp/qemu-test/src
C compiler  cc
Host C compiler   cc
C++ compiler
Objective-C compiler cc
ARFLAGS   rv
CFLAGS  -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
QEMU_CFLAGS   -I/usr/include/pixman-1 -pthread -I/usr/include/glib-2.0
-I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes
-Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes
-fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels -Wmissing-include-dirs
-Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self
-Wignored-qualifiers -Wold-style-declaration -Wold-style-definition
-Wtype-limits -fstack-protector-all
LDFLAGS   -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g
make  make
install   install
python  python -B
smbd  /usr/sbin/smbd
module support  no
host CPU  x86_64
host big endian   no
target list   x86_64-softmmu aarch64-softmmu
tcg debug enabled no
gprof enabled no
sparse enabled  no
strip binaries  yes
profiler  no
static build  no
pixman  system
SDL support   yes (1.2.14)
GTK support   no
GTK GL support  no
VTE support   no
TLS priority  NORMAL
GNUTLS support  no
GNUTLS rnd  no
libgcrypt no
libgcrypt kdf no
nettle  no
nettle kdf  no
libtasn1  no
curses support  no
virgl support no
curl support  no
mingw32 support   no
Audio drivers oss
Block whitelist (rw)
Block whitelist (ro)
VirtFS support  no
VNC support   yes
VNC SASL support  n

[Qemu-devel] [PATCH v4 5/7] trace: [tcg] Do not generate TCG code to trace dinamically-disabled events

2016-12-28 Thread Lluís Vilanova
If an event is dynamically disabled, the TCG code that calls the
execution-time tracer is not generated.

Removes the overheads of execution-time tracers for dynamically disabled
events. As a bonus, also avoids checking the event state when the
execution-time tracer is called from TCG-generated code (since otherwise
TCG would simply not call it).
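
To show the shape of what format/h.py ends up emitting for a per-vCPU event,
here is a stand-alone sketch. The two name templates are the ones from
scripts/tracetool/__init__.py; render() itself, the argument list and the
placeholder backend comment are illustrative assumptions, not tracetool's API.

```python
QEMU_TRACE = "trace_%(name)s"
QEMU_TRACE_NOCHECK = "_nocheck__" + QEMU_TRACE


def render(name, args, argnames, cond):
    """Return C text for one event: unchecked tracer plus checked wrapper."""
    api = QEMU_TRACE % {"name": name}
    api_nocheck = QEMU_TRACE_NOCHECK % {"name": name}
    lines = [
        # Called directly from TCG-generated code, which already checked
        # the event state at translation time.
        "static inline void %s(%s)" % (api_nocheck, args),
        "{",
        "    /* backend-specific tracing code goes here */",
        "}",
        "",
        # Regular entry point: checks the per-vCPU state at call time.
        "static inline void %s(%s)" % (api, args),
        "{",
        "    if (%s) {" % cond,
        "        %s(%s);" % (api_nocheck, argnames),
        "    }",
        "}",
    ]
    return "\n".join(lines)


text = render(
    "guest_mem_before",                      # event name used as an example
    "CPUState *cpu, uint64_t vaddr",         # hypothetical argument list
    "cpu, vaddr",
    "trace_event_get_vcpu_state(cpu, TRACE_GUEST_MEM_BEFORE)",
)
print(text)
```

The unchecked variant is what lets the TCG helper skip the second state check.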

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool/__init__.py|1 +
 scripts/tracetool/format/h.py|   24 ++--
 scripts/tracetool/format/tcg_h.py|   19 ---
 scripts/tracetool/format/tcg_helper_c.py |3 ++-
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 365446fa53..63168ccdf0 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -264,6 +264,7 @@ class Event(object):
 return self._FMT.findall(self.fmt)
 
 QEMU_TRACE   = "trace_%(name)s"
+QEMU_TRACE_NOCHECK   = "_nocheck__" + QEMU_TRACE
 QEMU_TRACE_TCG   = QEMU_TRACE + "_tcg"
 QEMU_DSTATE  = "_TRACE_%(NAME)s_DSTATE"
 QEMU_EVENT   = "_TRACE_%(NAME)s_EVENT"
diff --git a/scripts/tracetool/format/h.py b/scripts/tracetool/format/h.py
index 3682f4e6a8..a78e50ef35 100644
--- a/scripts/tracetool/format/h.py
+++ b/scripts/tracetool/format/h.py
@@ -49,6 +49,19 @@ def generate(events, backend, group):
 backend.generate_begin(events, group)
 
 for e in events:
+# tracer without checks
+out('',
+'static inline void %(api)s(%(args)s)',
+'{',
+api=e.api(e.QEMU_TRACE_NOCHECK),
+args=e.args)
+
+if "disable" not in e.properties:
+backend.generate(e, group)
+
+out('}')
+
+# tracer wrapper with checks (per-vCPU tracing)
 if "vcpu" in e.properties:
 trace_cpu = next(iter(e.args))[1]
 cond = "trace_event_get_vcpu_state(%(cpu)s,"\
@@ -63,16 +76,15 @@ def generate(events, backend, group):
 'static inline void %(api)s(%(args)s)',
 '{',
 'if (%(cond)s) {',
+'%(api_nocheck)s(%(names)s);',
+'}',
+'}',
 api=e.api(),
+api_nocheck=e.api(e.QEMU_TRACE_NOCHECK),
 args=e.args,
+names=", ".join(e.args.names()),
 cond=cond)
 
-if "disable" not in e.properties:
-backend.generate(e, group)
-
-out('}',
-'}')
-
 backend.generate_end(events, group)
 
 out('#endif /* TRACE_%s_GENERATED_TRACERS_H */' % group.upper())
diff --git a/scripts/tracetool/format/tcg_h.py b/scripts/tracetool/format/tcg_h.py
index 5f213f6cba..71b5c09432 100644
--- a/scripts/tracetool/format/tcg_h.py
+++ b/scripts/tracetool/format/tcg_h.py
@@ -41,7 +41,7 @@ def generate(events, backend, group):
 
 for e in events:
 # just keep one of them
-if "tcg-trans" not in e.properties:
+if "tcg-exec" not in e.properties:
 continue
 
 out('static inline void %(name_tcg)s(%(args)s)',
@@ -53,12 +53,25 @@ def generate(events, backend, group):
 args_trans = e.original.event_trans.args
 args_exec = tracetool.vcpu.transform_args(
 "tcg_helper_c", e.original.event_exec, "wrapper")
+if "vcpu" in e.properties:
+trace_cpu = e.args.names()[0]
+cond = "trace_event_get_vcpu_state(%(cpu)s,"\
+   " TRACE_%(id)s)"\
+   % dict(
+   cpu=trace_cpu,
+   id=e.original.event_exec.name.upper())
+else:
+cond = "true"
+
 out('%(name_trans)s(%(argnames_trans)s);',
-'gen_helper_%(name_exec)s(%(argnames_exec)s);',
+'if (%(cond)s) {',
+'gen_helper_%(name_exec)s(%(argnames_exec)s);',
+'}',
 name_trans=e.original.event_trans.api(e.QEMU_TRACE),
 name_exec=e.original.event_exec.api(e.QEMU_TRACE),
 argnames_trans=", ".join(args_trans.names()),
-argnames_exec=", ".join(args_exec.names()))
+argnames_exec=", ".join(args_exec.names()),
+cond=cond)
 
 out('}')
 
diff --git a/scripts/tracetool/format/tcg_helper_c.py b/scripts/tracetool/format/tcg_helper_c.py
index cc26e03008..c2a05d756c 100644
--- a/scripts/tracetool/format/tcg_helper_c.py
+++ b/scripts/tracetool/format/tcg_helper_c.py
@@ -66,10 +66,11 @@ def generate(events, backend, group):
 
 out('void %(name_tcg)s(%(args_api)s)',
 '{',
+# NOTE: the check was already performed at TCG-generation time
 '%(name)s(%(args_call)s);',
 '}',
 name_tcg="helper_%s_proxy" % e.api(),
-

[Qemu-devel] [PATCH v4 3/7] trace: [tcg] Delay changes to dynamic state when translating

2016-12-28 Thread Lluís Vilanova
This keeps consistency across all decisions taken during translation
when the dynamic state of a vCPU is changed in the middle of translating
some guest code.

Signed-off-by: Lluís Vilanova 
---
 cpu-exec.c |   26 ++
 include/qom/cpu.h  |7 +++
 qom/cpu.c  |4 
 trace/control-target.c |   11 +--
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4188fed3c6..1b7366efb0 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -33,6 +33,7 @@
 #include "hw/i386/apic.h"
 #endif
 #include "sysemu/replay.h"
+#include "trace/control.h"
 
 /* -icount align implementation. */
 
@@ -451,9 +452,21 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 #ifndef CONFIG_USER_ONLY
 } else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+/* delay changes to this vCPU's dstate during translation */
+atomic_set(&cpu->trace_dstate_delayed_req, false);
+atomic_set(&cpu->trace_dstate_must_delay, true);
+
 /* try to cause an exception pending in the log */
 cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
 *ret = -1;
+
+/* apply and disable delayed dstate changes */
+atomic_set(&cpu->trace_dstate_must_delay, false);
+if (unlikely(atomic_read(&cpu->trace_dstate_delayed_req))) {
+bitmap_copy(cpu->trace_dstate, cpu->trace_dstate_delayed,
+trace_get_vcpu_event_count());
+}
+
 return true;
 #endif
 }
@@ -634,8 +647,21 @@ int cpu_exec(CPUState *cpu)
 
 for(;;) {
 cpu_handle_interrupt(cpu, &last_tb);
+
+/* delay changes to this vCPU's dstate during translation */
+atomic_set(&cpu->trace_dstate_delayed_req, false);
+atomic_set(&cpu->trace_dstate_must_delay, true);
+
 tb = tb_find(cpu, last_tb, tb_exit);
 cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
+
+/* apply and disable delayed dstate changes */
+atomic_set(&cpu->trace_dstate_must_delay, false);
+if (unlikely(atomic_read(&cpu->trace_dstate_delayed_req))) {
+bitmap_copy(cpu->trace_dstate, cpu->trace_dstate_delayed,
+trace_get_vcpu_event_count());
+}
+
 /* Try to align the host and virtual clocks
if the guest is in advance */
 align_clocks(&sc, cpu);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 3f79a8e955..58255d06fa 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -295,6 +295,10 @@ struct qemu_work_item;
  * @kvm_fd: vCPU file descriptor for KVM.
  * @work_mutex: Lock to prevent multiple access to queued_work_*.
  * @queued_work_first: First asynchronous work pending.
+ * @trace_dstate_must_delay: Whether a change to trace_dstate must be delayed.
+ * @trace_dstate_delayed_req: Whether a change to trace_dstate was delayed.
+ * @trace_dstate_delayed: Delayed changes to trace_dstate (includes all changes
+ *to @trace_dstate).
  * @trace_dstate: Dynamic tracing state of events for this vCPU (bitmask).
  *
  * State of one CPU core or thread.
@@ -370,6 +374,9 @@ struct CPUState {
  * Dynamically allocated based on bitmap requried to hold up to
  * trace_get_vcpu_event_count() entries.
  */
+bool trace_dstate_must_delay;
+bool trace_dstate_delayed_req;
+unsigned long *trace_dstate_delayed;
 unsigned long *trace_dstate;
 
 /* TODO Move common fields from CPUArchState here. */
diff --git a/qom/cpu.c b/qom/cpu.c
index 03d9190f8c..d56496d28d 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -367,6 +367,9 @@ static void cpu_common_initfn(Object *obj)
 QTAILQ_INIT(&cpu->breakpoints);
 QTAILQ_INIT(&cpu->watchpoints);
 
+cpu->trace_dstate_must_delay = false;
+cpu->trace_dstate_delayed_req = false;
+cpu->trace_dstate_delayed = bitmap_new(trace_get_vcpu_event_count());
 cpu->trace_dstate = bitmap_new(trace_get_vcpu_event_count());
 
 cpu_exec_initfn(cpu);
@@ -375,6 +378,7 @@ static void cpu_common_initfn(Object *obj)
 static void cpu_common_finalize(Object *obj)
 {
 CPUState *cpu = CPU(obj);
+g_free(cpu->trace_dstate_delayed);
 g_free(cpu->trace_dstate);
 }
 
diff --git a/trace/control-target.c b/trace/control-target.c
index 7ebf6e0bcb..aba8db55de 100644
--- a/trace/control-target.c
+++ b/trace/control-target.c
@@ -69,13 +69,20 @@ void trace_event_set_vcpu_state_dynamic(CPUState *vcpu,
 if (state_pre != state) {
 if (state) {
 trace_events_enabled_count++;
-set_bit(vcpu_id, vcpu->trace_dstate);
+set_bit(vcpu_id, vcpu->trace_dstate_delayed);
+if (!atomic_read(&vcpu->trace_dstate_must_delay)) {
+set_bit(vcpu_id, vcpu->trace_dstate);
+

[Qemu-devel] [PATCH v4 6/7] trace: [tcg, trivial] Re-align generated code

2016-12-28 Thread Lluís Vilanova
Last patch removed a nesting level in generated code. Re-align all code
generated by backends to be 4-column aligned.

Signed-off-by: Lluís Vilanova 
---
 scripts/tracetool/backend/dtrace.py |2 +-
 scripts/tracetool/backend/ftrace.py |   20 ++--
 scripts/tracetool/backend/log.py|   17 +
 scripts/tracetool/backend/simple.py |2 +-
 scripts/tracetool/backend/syslog.py |6 +++---
 scripts/tracetool/backend/ust.py|2 +-
 6 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/scripts/tracetool/backend/dtrace.py b/scripts/tracetool/backend/dtrace.py
index 79505c6b1a..b3a8645bf0 100644
--- a/scripts/tracetool/backend/dtrace.py
+++ b/scripts/tracetool/backend/dtrace.py
@@ -41,6 +41,6 @@ def generate_h_begin(events, group):
 
 
 def generate_h(event, group):
-out('QEMU_%(uppername)s(%(argnames)s);',
+out('QEMU_%(uppername)s(%(argnames)s);',
 uppername=event.name.upper(),
 argnames=", ".join(event.args.names()))
diff --git a/scripts/tracetool/backend/ftrace.py b/scripts/tracetool/backend/ftrace.py
index db9fe7ad57..dd0eda4441 100644
--- a/scripts/tracetool/backend/ftrace.py
+++ b/scripts/tracetool/backend/ftrace.py
@@ -29,17 +29,17 @@ def generate_h(event, group):
 if len(event.args) > 0:
 argnames = ", " + argnames
 
-out('{',
-'char ftrace_buf[MAX_TRACE_STRLEN];',
-'int unused __attribute__ ((unused));',
-'int trlen;',
-'if (trace_event_get_state(%(event_id)s)) {',
-'trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
-' "%(name)s " %(fmt)s "\\n" 
%(argnames)s);',
-'trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
-'unused = write(trace_marker_fd, ftrace_buf, trlen);',
-'}',
+out('{',
+'char ftrace_buf[MAX_TRACE_STRLEN];',
+'int unused __attribute__ ((unused));',
+'int trlen;',
+'if (trace_event_get_state(%(event_id)s)) {',
+'trlen = snprintf(ftrace_buf, MAX_TRACE_STRLEN,',
+' "%(name)s " %(fmt)s "\\n" 
%(argnames)s);',
+'trlen = MIN(trlen, MAX_TRACE_STRLEN - 1);',
+'unused = write(trace_marker_fd, ftrace_buf, trlen);',
 '}',
+'}',
 name=event.name,
 args=event.args,
 event_id="TRACE_" + event.name.upper(),
diff --git a/scripts/tracetool/backend/log.py b/scripts/tracetool/backend/log.py
index 4f4a4d38b1..7d2c3abe75 100644
--- a/scripts/tracetool/backend/log.py
+++ b/scripts/tracetool/backend/log.py
@@ -35,14 +35,15 @@ def generate_h(event, group):
 else:
 cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-out('if (%(cond)s) {',
-'struct timeval _now;',
-'gettimeofday(&_now, NULL);',
-'qemu_log_mask(LOG_TRACE, "%%d@%%zd.%%06zd:%(name)s " 
%(fmt)s "\\n",',
-'  getpid(),',
-'  (size_t)_now.tv_sec, (size_t)_now.tv_usec',
-'  %(argnames)s);',
-'}',
+out('if (%(cond)s) {',
+'struct timeval _now;',
+'gettimeofday(&_now, NULL);',
+'qemu_log_mask(LOG_TRACE,',
+'  "%%d@%%zd.%%06zd:%(name)s " %(fmt)s "\\n",',
+'  getpid(),',
+'  (size_t)_now.tv_sec, (size_t)_now.tv_usec',
+'  %(argnames)s);',
+'}',
 cond=cond,
 name=event.name,
 fmt=event.fmt.rstrip("\n"),
diff --git a/scripts/tracetool/backend/simple.py b/scripts/tracetool/backend/simple.py
index 85f61028e2..a28460b1e4 100644
--- a/scripts/tracetool/backend/simple.py
+++ b/scripts/tracetool/backend/simple.py
@@ -37,7 +37,7 @@ def generate_h_begin(events, group):
 
 
 def generate_h(event, group):
-out('_simple_%(api)s(%(args)s);',
+out('_simple_%(api)s(%(args)s);',
 api=event.api(),
 args=", ".join(event.args.names()))
 
diff --git a/scripts/tracetool/backend/syslog.py b/scripts/tracetool/backend/syslog.py
index b8ff2790c4..1ce627f0fc 100644
--- a/scripts/tracetool/backend/syslog.py
+++ b/scripts/tracetool/backend/syslog.py
@@ -35,9 +35,9 @@ def generate_h(event, group):
 else:
 cond = "trace_event_get_state(%s)" % ("TRACE_" + event.name.upper())
 
-out('if (%(cond)s) {',
-'syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
-'}',
+out('if (%(cond)s) {',
+'syslog(LOG_INFO, "%(name)s " %(fmt)s %(argnames)s);',
+'}',
 cond=cond,
 name=event.name,
 fmt=event.fmt.rstrip

[Qemu-devel] [PATCH v4 4/7] exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state

2016-12-28 Thread Lluís Vilanova
Every vCPU now uses a separate set of TBs for each set of dynamic
tracing event state values. Each set of TBs can be used by any number of
vCPUs to maximize TB reuse when vCPUs have the same tracing state.

This feature is later used by tracetool to optimize tracing of guest
code events.

The maximum number of TB sets is defined as 2^E, where E is the number
of events that have the 'vcpu' property (their state is stored in
CPUState->trace_dstate).

For this to work, a change to the dynamic tracing state of a vCPU will
force it to flush its virtual TB cache (which is only indexed by
address), and fall back to the physical TB cache (which now contains the
vCPU's dynamic tracing state as part of the hashing function).
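
The interaction between the two caches can be sketched as follows. This is a
simplified Python model, not the code in this patch: the per-vCPU cache is
keyed by pc only, the shared cache by (pc, dstate), and all names here are
hypothetical.

```python
def max_tb_sets(nevents):
    """Upper bound on TB sets: one per combination of 'vcpu' event states."""
    return 2 ** nevents


class SharedTBs:
    """Toy physical TB cache: the tracing state is part of the key."""

    def __init__(self):
        self.table = {}

    def lookup(self, pc, dstate):
        key = (pc, dstate)
        if key not in self.table:
            self.table[key] = "tb(pc=%#x, dstate=%#x)" % (pc, dstate)
        return self.table[key]


class VCPU:
    def __init__(self, shared):
        self.shared = shared
        self.dstate = 0
        self.jmp_cache = {}  # toy virtual TB cache, indexed by pc only

    def set_dstate(self, dstate):
        if dstate != self.dstate:
            self.dstate = dstate
            # Entries were looked up under the old state, so drop them all.
            self.jmp_cache.clear()

    def find(self, pc):
        tb = self.jmp_cache.get(pc)
        if tb is None:
            tb = self.shared.lookup(pc, self.dstate)
            self.jmp_cache[pc] = tb
        return tb


shared = SharedTBs()
cpu = VCPU(shared)
tb_off = cpu.find(0x4000)        # translated with tracing disabled
cpu.set_dstate(0b1)              # enable event 0: virtual cache flushed
tb_on = cpu.find(0x4000)         # different TB for the new state
cpu.set_dstate(0)                # back to the original state
tb_off_again = cpu.find(0x4000)  # old TB is found again in the shared cache
```

Switching back to an earlier event state costs only the flush of the small
per-vCPU cache, since the previously translated blocks are still in the
shared table.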

Signed-off-by: Lluís Vilanova 
---
 cpu-exec.c|   26 +-
 include/exec/exec-all.h   |7 +++
 include/exec/tb-hash-xx.h |   10 +-
 include/exec/tb-hash.h|5 +++--
 tests/qht-bench.c |2 +-
 trace/control-target.c|3 +++
 trace/control.h   |3 +++
 translate-all.c   |   16 ++--
 8 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 1b7366efb0..a377505b9c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -262,6 +262,7 @@ struct tb_desc {
 CPUArchState *env;
 tb_page_addr_t phys_page1;
 uint32_t flags;
+TRACE_QHT_VCPU_DSTATE_TYPE trace_vcpu_dstate;
 };
 
 static bool tb_cmp(const void *p, const void *d)
@@ -273,6 +274,7 @@ static bool tb_cmp(const void *p, const void *d)
 tb->page_addr[0] == desc->phys_page1 &&
 tb->cs_base == desc->cs_base &&
 tb->flags == desc->flags &&
+tb->trace_vcpu_dstate == desc->trace_vcpu_dstate &&
 !atomic_read(&tb->invalid)) {
 /* check next page if needed */
 if (tb->page_addr[1] == -1) {
@@ -294,7 +296,8 @@ static bool tb_cmp(const void *p, const void *d)
 static TranslationBlock *tb_htable_lookup(CPUState *cpu,
   target_ulong pc,
   target_ulong cs_base,
-  uint32_t flags)
+  uint32_t flags,
+  uint32_t trace_vcpu_dstate)
 {
 tb_page_addr_t phys_pc;
 struct tb_desc desc;
@@ -303,10 +306,11 @@ static TranslationBlock *tb_htable_lookup(CPUState *cpu,
 desc.env = (CPUArchState *)cpu->env_ptr;
 desc.cs_base = cs_base;
 desc.flags = flags;
+desc.trace_vcpu_dstate = trace_vcpu_dstate;
 desc.pc = pc;
 phys_pc = get_page_addr_code(desc.env, pc);
 desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;
-h = tb_hash_func(phys_pc, pc, flags);
+h = tb_hash_func(phys_pc, pc, flags, trace_vcpu_dstate);
 return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
 }
 
@@ -318,16 +322,24 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 TranslationBlock *tb;
 target_ulong cs_base, pc;
 uint32_t flags;
+unsigned long trace_vcpu_dstate_bitmap;
+TRACE_QHT_VCPU_DSTATE_TYPE trace_vcpu_dstate;
 bool have_tb_lock = false;
 
+bitmap_copy(&trace_vcpu_dstate_bitmap, cpu->trace_dstate,
+trace_get_vcpu_event_count());
+memcpy(&trace_vcpu_dstate, &trace_vcpu_dstate_bitmap,
+   sizeof(trace_vcpu_dstate));
+
 /* we record a subset of the CPU state. It will
always be the same before a given translated block
is executed. */
 cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
 tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
 if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
- tb->flags != flags)) {
-tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+ tb->flags != flags ||
+ tb->trace_vcpu_dstate != trace_vcpu_dstate)) {
+tb = tb_htable_lookup(cpu, pc, cs_base, flags, trace_vcpu_dstate);
 if (!tb) {
 
 /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
@@ -341,7 +353,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 /* There's a chance that our desired tb has been translated while
  * taking the locks so we check again inside the lock.
  */
-tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+tb = tb_htable_lookup(cpu, pc, cs_base, flags, trace_vcpu_dstate);
 if (!tb) {
 /* if no translated code available, then translate it now */
 tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
@@ -465,6 +477,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 if (unlikely(atomic_read(&cpu->trace_dstate_delayed_req))) {
 bitmap_copy(cpu->trace_dstate, cpu->trace_dstate_delayed,
 trace_get_vcpu_event_count());
+tb_flush_jmp_cache_all(cpu);
 }
 
 return tr

[Qemu-devel] [PATCH v4 2/7] trace: Make trace_get_vcpu_event_count() inlinable

2016-12-28 Thread Lluís Vilanova
Later patches will make use of it.

Signed-off-by: Lluís Vilanova 
---
 trace/control-internal.h |    5 +++++
 trace/control.c          |    9 ++-------
 trace/control.h          |    2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/trace/control-internal.h b/trace/control-internal.h
index a9d395a587..beb98a0d2c 100644
--- a/trace/control-internal.h
+++ b/trace/control-internal.h
@@ -16,6 +16,7 @@
 
 
 extern int trace_events_enabled_count;
+extern uint32_t trace_next_vcpu_id;
 
 
 static inline bool trace_event_is_pattern(const char *str)
@@ -82,6 +83,10 @@ static inline bool trace_event_get_vcpu_state_dynamic(CPUState *vcpu,
 return trace_event_get_vcpu_state_dynamic_by_vcpu_id(vcpu, vcpu_id);
 }
 
+static inline uint32_t trace_get_vcpu_event_count(void)
+{
+return trace_next_vcpu_id;
+}
 
 void trace_event_register_group(TraceEvent **events);
 
diff --git a/trace/control.c b/trace/control.c
index 1a7bee6ddc..52d0e343fa 100644
--- a/trace/control.c
+++ b/trace/control.c
@@ -36,7 +36,7 @@ typedef struct TraceEventGroup {
 static TraceEventGroup *event_groups;
 static size_t nevent_groups;
 static uint32_t next_id;
-static uint32_t next_vcpu_id;
+uint32_t trace_next_vcpu_id;
 
 QemuOptsList qemu_trace_opts = {
 .name = "trace",
@@ -65,7 +65,7 @@ void trace_event_register_group(TraceEvent **events)
 for (i = 0; events[i] != NULL; i++) {
 events[i]->id = next_id++;
 if (events[i]->vcpu_id != TRACE_VCPU_EVENT_NONE) {
-events[i]->vcpu_id = next_vcpu_id++;
+events[i]->vcpu_id = trace_next_vcpu_id++;
 }
 }
 event_groups = g_renew(TraceEventGroup, event_groups, nevent_groups + 1);
@@ -299,8 +299,3 @@ char *trace_opt_parse(const char *optarg)
 
 return trace_file;
 }
-
-uint32_t trace_get_vcpu_event_count(void)
-{
-return next_vcpu_id;
-}
diff --git a/trace/control.h b/trace/control.h
index ccaeac8552..80d326c4d1 100644
--- a/trace/control.h
+++ b/trace/control.h
@@ -237,7 +237,7 @@ char *trace_opt_parse(const char *optarg);
  *
  * Return the number of known vcpu-specific events
  */
-uint32_t trace_get_vcpu_event_count(void);
+static uint32_t trace_get_vcpu_event_count(void);
 
 
 #include "trace/control-internal.h"




[Qemu-devel] [PATCH v4 7/7] trace: [trivial] Statically enable all guest events

2016-12-28 Thread Lluís Vilanova
The optimizations in this series make it feasible to have them
available on all builds.

Signed-off-by: Lluís Vilanova 
---
 trace-events |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/trace-events b/trace-events
index f74e1d3d22..0a0f4d9cd6 100644
--- a/trace-events
+++ b/trace-events
@@ -159,7 +159,7 @@ vcpu guest_cpu_reset(void)
 #
 # Mode: user, softmmu
 # Targets: TCG(all)
-disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", "vaddr=0x%016"PRIx64" info=%d"
+vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", "vaddr=0x%016"PRIx64" info=%d"
 
 # @num: System call number.
 # @arg*: System call argument value.
@@ -168,7 +168,7 @@ disable vcpu tcg guest_mem_before(TCGv vaddr, uint8_t info) "info=%d", "vaddr=0x
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, uint64_t arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
+vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5, uint64_t arg6, uint64_t arg7, uint64_t arg8) "num=0x%016"PRIx64" arg1=0x%016"PRIx64" arg2=0x%016"PRIx64" arg3=0x%016"PRIx64" arg4=0x%016"PRIx64" arg5=0x%016"PRIx64" arg6=0x%016"PRIx64" arg7=0x%016"PRIx64" arg8=0x%016"PRIx64
 
 # @num: System call number.
 # @ret: System call result value.
@@ -177,4 +177,4 @@ disable vcpu guest_user_syscall(uint64_t num, uint64_t arg1, uint64_t arg2, uint
 #
 # Mode: user
 # Targets: TCG(all)
-disable vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) "num=0x%016"PRIx64" ret=0x%016"PRIx64
+vcpu guest_user_syscall_ret(uint64_t num, uint64_t ret) "num=0x%016"PRIx64" ret=0x%016"PRIx64




[Qemu-devel] [PATCH v4 1/7] exec: [tcg] Refactor flush of per-CPU virtual TB cache

2016-12-28 Thread Lluís Vilanova
The function is reused in later patches.

Signed-off-by: Lluís Vilanova 
---
 cputlb.c                |    2 +-
 include/exec/exec-all.h |    6 ++++++
 translate-all.c         |   14 +++++++++-----
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 813279f3bc..9bf9960e1b 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -80,7 +80,7 @@ void tlb_flush(CPUState *cpu, int flush_global)
 
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
-memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+tb_flush_jmp_cache_all(cpu);
 
 env->vtlb_index = 0;
 env->tlb_flush_addr = -1;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a8c13cee66..57cd978578 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -256,6 +256,12 @@ struct TranslationBlock {
 };
 
 void tb_free(TranslationBlock *tb);
+/**
+ * tb_flush_jmp_cache_all:
+ *
+ * Flush the virtual translation block cache.
+ */
+void tb_flush_jmp_cache_all(CPUState *env);
 void tb_flush(CPUState *cpu);
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 
diff --git a/translate-all.c b/translate-all.c
index 3dd9214904..29ccb9e546 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -941,11 +941,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
 }
 
 CPU_FOREACH(cpu) {
-int i;
-
-for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
-atomic_set(&cpu->tb_jmp_cache[i], NULL);
-}
+tb_flush_jmp_cache_all(cpu);
 }
 
 tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -1741,6 +1737,14 @@ void tb_check_watchpoint(CPUState *cpu)
 }
 }
 
+void tb_flush_jmp_cache_all(CPUState *cpu)
+{
+int i;
+for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
+atomic_set(&cpu->tb_jmp_cache[i], NULL);
+}
+}
+
 #ifndef CONFIG_USER_ONLY
 /* in deterministic execution mode, instructions doing device I/Os
must be at the end of the TB */




[Qemu-devel] [PATCH v4 0/7] trace: [tcg] Optimize per-vCPU tracing states with separate TB caches

2016-12-28 Thread Lluís Vilanova
Optimizes tracing of events with the 'tcg' and 'vcpu' properties (e.g., memory
accesses), making it feasible to statically enable them by default on all QEMU
builds.

Some quick'n'dirty numbers with 400.perlbench (SPECcpu2006) on the train input
(medium size - suns.pl) and the guest_mem_before event:

* vanilla, statically disabled
real    0m2,259s
user    0m2,252s
sys     0m0,004s

* vanilla, statically enabled (overhead: 2.18x)
real    0m4,921s
user    0m4,912s
sys     0m0,008s

* multi-tb, statically disabled (overhead: 0.99x) [within noise range]
real    0m2,228s
user    0m2,216s
sys     0m0,008s

* multi-tb, statically enabled (overhead: 0.99x) [within noise range]
real    0m2,229s
user    0m2,224s
sys     0m0,004s


Right now, events with the 'tcg' property always generate TCG code to trace that
event at guest code execution time, where the event's dynamic state is checked.

This series adds a performance optimization where TCG code for events with the
'tcg' and 'vcpu' properties is not generated if the event is dynamically
disabled. This optimization raises two issues:

* An event can be dynamically disabled/enabled after the corresponding TCG code
  has been generated (i.e., a new TB with the corresponding code should be
  used).

* Each vCPU can have a different dynamic state for the same event (e.g., tracing
  the memory accesses of only one process pinned to a vCPU).

To handle both issues, this series integrates the dynamic tracing event state
into the TB hashing function, so that vCPUs tracing different events will use
separate TBs. Note that only events with the 'vcpu' property are used for
hashing (as stored in the bitmap of CPUState->trace_dstate).

This makes dynamic event state changes on vCPUs very efficient, since they can
use TBs produced by other vCPUs while on the same event state combination (or
produced by the same vCPU, earlier).
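
As a rough illustration of the idea above, here is a minimal, self-contained
sketch of a lookup key that carries the per-vCPU tracing state. Everything in
it is an assumption made for illustration: tb_key, mix(), toy_tb_hash() and
toy_tb_match() are hypothetical names, and the FNV-style mixer is a toy that
stands in for the real tb_hash_func(). Only the principle is taken from the
series: the trace state takes part in both the hash and the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the TB lookup key. */
typedef struct {
    uint64_t phys_pc;
    uint64_t pc;
    uint32_t flags;
    uint32_t trace_vcpu_dstate; /* bitmap of enabled 'vcpu' events */
} tb_key;

/* Toy FNV-style mixing step (assumption: NOT QEMU's hash function). */
static uint32_t mix(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 16777619u;
    return h;
}

/* Hash every key field, including the tracing state. */
static uint32_t toy_tb_hash(const tb_key *k)
{
    uint32_t h = 2166136261u;

    h = mix(h, (uint32_t)k->phys_pc);
    h = mix(h, (uint32_t)(k->phys_pc >> 32));
    h = mix(h, (uint32_t)k->pc);
    h = mix(h, (uint32_t)(k->pc >> 32));
    h = mix(h, k->flags);
    h = mix(h, k->trace_vcpu_dstate);
    return h;
}

/* Two keys name the same TB only if the tracing state matches as well. */
static bool toy_tb_match(const tb_key *a, const tb_key *b)
{
    return a->phys_pc == b->phys_pc &&
           a->pc == b->pc &&
           a->flags == b->flags &&
           a->trace_vcpu_dstate == b->trace_vcpu_dstate;
}
```

With such a key, two vCPUs executing the same guest code with different sets
of enabled events never share a TB, while a vCPU that returns to an earlier
event combination finds the TBs already generated for it.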

Discarded alternatives:

* Emitting TCG code to check whether an event needs tracing: we would still
  have to either move the tracing call to a cold path (making tracing
  performance worse) or leave it inlined (making non-tracing performance worse).

* Eliding TCG code only when *zero* vCPUs are tracing an event, since enabling
  it on a single vCPU will impact the performance of all other vCPUs that are
  not tracing that event.

Signed-off-by: Lluís Vilanova 
---

Changes in v4
=============

* Incorporate trace_dstate into the TB hashing function instead of using
  multiple physical TB caches [suggested by Richard Henderson].


Changes in v3
=============

* Rebase on 0737f32daf.
* Do not use reserved symbol prefixes ("__") [Stefan Hajnoczi].
* Refactor trace_get_vcpu_event_count() to be inlinable.
* Optimize cpu_tb_cache_set_requested() (hottest path).


Changes in v2
=

* Fix bitmap copy in cpu_tb_cache_set_apply().
* Split generated code re-alignment into a separate patch [Daniel P. Berrange].


Lluís Vilanova (7):
  exec: [tcg] Refactor flush of per-CPU virtual TB cache
  trace: Make trace_get_vcpu_event_count() inlinable
  trace: [tcg] Delay changes to dynamic state when translating
  exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state
  trace: [tcg] Do not generate TCG code to trace dynamically-disabled events
  trace: [tcg,trivial] Re-align generated code
  trace: [trivial] Statically enable all guest events


 cpu-exec.c   |   52 +++---
 cputlb.c |2 +
 include/exec/exec-all.h  |   13 
 include/exec/tb-hash-xx.h|   10 +-
 include/exec/tb-hash.h   |5 ++-
 include/qom/cpu.h|7 
 qom/cpu.c|4 ++
 scripts/tracetool/__init__.py|1 +
 scripts/tracetool/backend/dtrace.py  |2 +
 scripts/tracetool/backend/ftrace.py  |   20 ++--
 scripts/tracetool/backend/log.py |   17 +-
 scripts/tracetool/backend/simple.py  |2 +
 scripts/tracetool/backend/syslog.py  |6 ++-
 scripts/tracetool/backend/ust.py |2 +
 scripts/tracetool/format/h.py|   24 ++
 scripts/tracetool/format/tcg_h.py|   19 +--
 scripts/tracetool/format/tcg_helper_c.py |3 +-
 tests/qht-bench.c|2 +
 trace-events |6 ++-
 trace/control-internal.h |5 +++
 trace/control-target.c   |   14 +++-
 trace/control.c  |9 +
 trace/control.h  |5 ++-
 translate-all.c  |   30 +
 24 files changed, 196 insertions(+), 64 deletions(-)


To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi 
Cc: Eduardo Habkost 
Cc: Eric Blake