date:20170124

Re: [Qemu-devel] [PATCH] net: Mark 'vlan' parameter as deprecated

2017-01-24 Thread Thomas Huth

On 25.01.2017 08:50, Jason Wang wrote:
> 
> 
> On 2017年01月24日 17:42, Thomas Huth wrote:
>> The 'vlan' parameter is a continuous source of confusion for the users,
>> many people mix it up with the more common term VLAN (the link layer
>> packet encapsulation), and even if they realize that the QEMU 'vlan' is
>> rather some kind of network hub emulation, there is still a high risk
>> that they configure their QEMU networking in a wrong way with this
>> parameter (e.g. by hooking NICs together, so they get a 'loopback'
>> between one and the other NIC).
>> Thus at one point in time, we should finally get rid of the 'vlan'
>> feature in QEMU. Let's do a first step in this direction by declaring
>> the 'vlan' parameter as deprecated and informing the users to use the
>> 'netdev' parameter instead.
>>
>> Signed-off-by: Thomas Huth 
>> ---
>>   net/net.c | 6 ++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/net/net.c b/net/net.c
>> index 939fe31..fb7af3a 100644
>> --- a/net/net.c
>> +++ b/net/net.c
>> @@ -970,6 +970,7 @@ static int net_client_init1(const void *object,
>> bool is_netdev, Error **errp)
>>   const Netdev *netdev;
>>   const char *name;
>>   NetClientState *peer = NULL;
>> +static bool vlan_warned;
>> if (is_netdev) {
>>   netdev = object;
>> @@ -1050,6 +1051,11 @@ static int net_client_init1(const void *object,
>> bool is_netdev, Error **errp)
>>   !opts->u.nic.data->has_netdev) {
>>   peer = net_hub_add_port(net->has_vlan ? net->vlan : 0,
>> NULL);
>>   }
>> +
>> +if (net->has_vlan && !vlan_warned) {
>> +error_report("'vlan' is deprecated. Please use 'netdev'
>> instead.");
>> +vlan_warned = true;
>> +}
>>   }
>> if (net_client_init_fun[netdev->type](netdev, name, peer,
>> errp) < 0) {
> 
> Looks good, but do really want only warn once? Consider we have monitor
> command e.g "host_net_add".

I don't mind ... I can remove the "vlan_warned" check, but then you'll
get a lot of error messages at once if you start QEMU with multiple
"-net" parameters that use "vlan=...". Is that ok for you? If yes, I'll
send an updated v2 of my patch without that "vlan_warned" check.

 Thomas

Re: [Qemu-devel] [PATCH] net: Mark 'vlan' parameter as deprecated

2017-01-24 Thread Jason Wang




On 2017年01月24日 17:42, Thomas Huth wrote:

The 'vlan' parameter is a continuous source of confusion for the users,
many people mix it up with the more common term VLAN (the link layer
packet encapsulation), and even if they realize that the QEMU 'vlan' is
rather some kind of network hub emulation, there is still a high risk
that they configure their QEMU networking in a wrong way with this
parameter (e.g. by hooking NICs together, so they get a 'loopback'
between one and the other NIC).
Thus at one point in time, we should finally get rid of the 'vlan'
feature in QEMU. Let's do a first step in this direction by declaring
the 'vlan' parameter as deprecated and informing the users to use the
'netdev' parameter instead.

Signed-off-by: Thomas Huth 
---
  net/net.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/net/net.c b/net/net.c
index 939fe31..fb7af3a 100644
--- a/net/net.c
+++ b/net/net.c
@@ -970,6 +970,7 @@ static int net_client_init1(const void *object, bool 
is_netdev, Error **errp)
  const Netdev *netdev;
  const char *name;
  NetClientState *peer = NULL;
+static bool vlan_warned;
  
  if (is_netdev) {

  netdev = object;
@@ -1050,6 +1051,11 @@ static int net_client_init1(const void *object, bool 
is_netdev, Error **errp)
  !opts->u.nic.data->has_netdev) {
  peer = net_hub_add_port(net->has_vlan ? net->vlan : 0, NULL);
  }
+
+if (net->has_vlan && !vlan_warned) {
+error_report("'vlan' is deprecated. Please use 'netdev' instead.");
+vlan_warned = true;
+}
  }
  
  if (net_client_init_fun[netdev->type](netdev, name, peer, errp) < 0) {


Looks good, but do really want only warn once? Consider we have monitor 
command e.g "host_net_add".


Thanks

Re: [Qemu-devel] [PATCH RFC v4 16/20] intel_iommu: do replay when context invalidate

2017-01-24 Thread Jason Wang




On 2017年01月25日 14:44, Peter Xu wrote:

On Wed, Jan 25, 2017 at 06:37:23AM +, Tian, Kevin wrote:

From: Peter Xu [mailto:pet...@redhat.com]
Sent: Wednesday, January 25, 2017 11:46 AM

On Wed, Jan 25, 2017 at 11:09:39AM +0800, Jason Wang wrote:


On 2017年01月24日 12:52, Peter Xu wrote:

On Mon, Jan 23, 2017 at 06:36:17PM +0800, Jason Wang wrote:

On 2017年01月20日 21:08, Peter Xu wrote:

Before this one we only invalidate context cache when we receive context
entry invalidations. However it's possible that the invalidation also
contains a domain switch (only if cache-mode is enabled for vIOMMU). In
that case we need to notify all the registered components about the new
mapping.

Signed-off-by: Peter Xu 
---
  hw/i386/intel_iommu.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f9c5142..4b08b4d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1146,6 +1146,16 @@ static void

vtd_context_device_invalidate(IntelIOMMUState *s,

  trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(devfn_it),
   VTD_PCI_FUNC(devfn_it));
  vtd_as->context_cache_entry.context_cache_gen = 0;
+/*
+ * So a device is moving out of (or moving into) a
+ * domain, a replay() suites here to notify all the
+ * IOMMU_NOTIFIER_MAP registers about this change.
+ * This won't bring bad even if we have no such
+ * notifier registered - the IOMMU notification
+ * framework will skip MAP notifications if that
+ * happened.
+ */
+memory_region_iommu_replay_all(_as->iommu);

DSI and GLOBAL questions come back again or no need to care about them :) ?

DSI/GLOBAL hanldings are in patch 20 (though it'll be squashed into 18
in my next post). Is that what you mean above?

Seems not, I mean DSI/GLOBAL for context cache invalidation instead of IOTLB
:)

Yes, I should possibly do the same thing for context cache global
invalidations. IIUC context global invalidation should be a superset
of iotlb invalidation, so maybe I'll add one more patch to call iotlb
invalidation in context invalidation as well. Kevin/others, please
correct me if I misunderstood the spec. Thanks,


context invalidation is not superset of iotlb invalidation. The spec just
requires software to always follow a context-cache invalidation with
a PASID-cache invalidation, followed by an IOTLB invalidation.

Thanks for pointing out. If so, looks like current version suffice for
this, right? (so no further change needed for this one)

-- peterx



We could not depends on guest or driver behavior. I still prefer to add 
unamp for DSI/GLOBAL to prevent us from leaking mappings.


Thanks

Re: [Qemu-devel] [PATCH RFC v4 18/20] intel_iommu: enable vfio devices

2017-01-24 Thread Jason Wang




On 2017年01月25日 09:31, Alex Williamson wrote:

On Wed, 25 Jan 2017 09:19:25 +0800
Jason Wang  wrote:


On 2017年01月24日 03:40, Alex Williamson wrote:

On Mon, 23 Jan 2017 18:23:44 +0800
Jason Wang  wrote:
  

On 2017年01月23日 11:34, Peter Xu wrote:

On Mon, Jan 23, 2017 at 09:55:39AM +0800, Jason Wang wrote:

On 2017年01月22日 17:04, Peter Xu wrote:

On Sun, Jan 22, 2017 at 04:08:04PM +0800, Jason Wang wrote:

[...]
 

+static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
+   uint16_t domain_id, hwaddr addr,
+   uint8_t am)
+{
+IntelIOMMUNotifierNode *node;
+VTDContextEntry ce;
+int ret;
+
+QLIST_FOREACH(node, &(s->notifiers_list), next) {
+VTDAddressSpace *vtd_as = node->vtd_as;
+ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+   vtd_as->devfn, );
+if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
+vtd_page_walk(, addr, addr + (1 << am) * VTD_PAGE_SIZE,
+  vtd_page_invalidate_notify_hook,
+  (void *)_as->iommu, true);

Why not simply trigger the notifier here? (or is this vfio required?)

Because we may only want to notify part of the region - we are with
mask here, but not exact size.

Consider this: guest (with caching mode) maps 12K memory (4K*3 pages),
the mask will be extended to 16K in the guest. In that case, we need
to explicitly go over the page entry to know that the 4th page should
not be notified.

I see. Then it was required by vfio only, I think we can add a fast path for
!CM in this case by triggering the notifier directly.

I noted this down (to be further investigated in my todo), but I don't
know whether this can work, due to the fact that I think it is still
legal that guest merge more than one PSIs into one. For example, I
don't know whether below is legal:

- guest invalidate page (0, 4k)
- guest map new page (4k, 8k)
- guest send single PSI of (0, 8k)

In that case, it contains both map/unmap, and looks like it didn't
disobay the spec as well?

Not sure I get your meaning, you mean just send single PSI instead of two?
  
 

Another possible issue is, consider (with CM) a 16K contiguous iova with the
last page has already been mapped. In this case, if we want to map first
three pages, when handling IOTLB invalidation, am would be 16K, then the
last page will be mapped twice. Can this lead some issue?

I don't know whether guest has special handling of this kind of
request.

This seems quite usual I think? E.g iommu_flush_iotlb_psi() did:

static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
 struct dmar_domain *domain,
 unsigned long pfn, unsigned int pages,
 int ih, int map)
{
   unsigned int mask = ilog2(__roundup_pow_of_two(pages));
   uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
   u16 did = domain->iommu_did[iommu->seq_id];
...

  

Besides, imho to completely solve this problem, we still need that
per-domain tree. Considering that currently the tree is inside vfio, I
see this not a big issue as well.

Another issue I found is: with this series, VFIO_IOMMU_MAP_DMA seems
become guest trigger-able. And since VFIO allocate its own structure to
record dma mapping, this seems open a window for evil guest to exhaust
host memory which is even worse.

You're thinking of pci-assign, vfio does page accounting such that a
user can only lock pages up to their locked memory limit.  Exposing the
mapping ioctl within the guest is not a different problem from exposing
the ioctl to the host user from a vfio perspective.  Thanks,

Alex
  

Yes, but what if an evil guest that maps all iovas to the same gpa?

Doesn't matter, we'd account that gpa each time it's mapped, so
effectively the locked memory limit is equal to the iova size the user
can map.  Thanks,

Alex


I see. Good to know this.

Thanks

Re: [Qemu-devel] [RFC PATCH 1/3] Factor out {linux, bsd}-user/qemu.h

2017-01-24 Thread Alessandro Di Federico

On Tue, 24 Jan 2017 11:18:07 +
Peter Maydell  wrote:

> So my issue with this is that currently linux-user is pretty
> well maintained and tested, but bsd-user is basically unmaintained
> and not even compile tested. 

I can say it compiles. I've tried to build it on FreeBSD 11. Didn't
tested it though.

> So it's convenient to have bsd-user code be basically its own
> distinct standalone thing, because then when we fix linux-user we
> don't have to care too much about whether bsd-user might be
> accidentally broken by our changes.
> 
> If we had more active testing of bsd-user I'd be a bit keener
> about removing some of the code duplication.

Currently, the code looks exactly the same to me. If we don't factor it
out I'll have to make a third copy of it for libtcg, which is kinda
annoying. I'm factoring out only a portion of the header, which
probably doesn't change that much. If in the future there's some change
we're not sure about we can keep it in the non-factored out part.

I'll check if qemu-user for BSD has something in the testsuite.

--
Alessandro Di Federico
PhD student at Politecnico di Milano

Re: [Qemu-devel] [RFC PATCH 3/3] Introduce libtcg infrastructure

2017-01-24 Thread Alessandro Di Federico

On Tue, 24 Jan 2017 10:08:56 +
Marc-André Lureau  wrote:

>
> > * If there's at least a *-libtcg, compile everything as position
> >   independent code.
> 
> why not limit it to libtcg code?
> 

At least we have to enable it on "common" object files (those compiled
only once, independently from the current target), therefore I see
little benefit in compiling without PIC other object files.
I also think it might be a good idea to switch to PIE executables
altogether.

> 
> > * In case we're building libtcg, install the output binary in the
> >   $PREFIX/lib directory instead of $PREFIX/bin.
> >  
> 
> If it's installed globally, I think we have to deal with versioning,
> common prefix, and list of visible symbols.
> 

Yeah, for symbol visibility I was thinking to force all the symbols to
be hidden except for those in tcg.c, which would have a `tcg_` prefix.
I didn't think about versioning yet.

> 
> What's the plan with this tcg.c? Shouldn't the functions be stubs or
> library user callbacks?
> 
> What is test() doing?
> 
> Please add a libtcg test under tests/ (even minimal, like a basic
> translate/dump).
> 

This patch was to get a first round of opinions. The test function was
only supposed to reference some functions I know I'll use to make sure
that I'm linking in all the required object files.
In the next set of patches I'll provide something that actually does
something.

> 
> >
> >  ifdef CONFIG_USER_ONLY
> > +ifdef CONFIG_LIBTCG
> > +# libtcg
> > +QEMU_PROG=libtcg-$(TARGET_NAME)$(DSOSUF)
> > +QEMU_PROG_BUILD = $(QEMU_PROG)
> > +QEMU_CFLAGS+=-fPIC
> > +
> > +# Change the install directory
> > +PROGS_INSTALL_DIR := $(libdir)
> >  
> 
> okay, we may want to rename it to something like QEMU_BUILD perhaps
> 

You mean the QEMU_PROG_BUILD? ACK.

>
> >  install: all
> > +ifdef CONFIG_LIBTINYCODE
> > +   $(INSTALL_DATA) $(SRC_PATH)/tcg/tcg-opc.h
> > $(SRC_PATH)/libtcg/tcg.h "$(DESTDIR)$(includedir)"
> > +endif
> >  
> 
> LIBTINYCODE/LIBTCG?
> 

Leftover from my internal version.

>
> > -crypto-obj-y += aes.o
> > +crypto-obj-$(call lnot,$(CONFIG_LIBTCG)) += aes.o
> >  
> 
> why that change?
> 

aes.o was being linked in, but we don't use it in LIBTCG. Maybe this
should be fixed in an higher level.

> 
> > diff --git a/libtcg/tcg.h b/libtcg/tcg.h
> > new file mode 100644
> > index 00..e69de29bb2
> >  
> 
> What is this empty file for?
> 

Just a placeholder for the header file to install. This patch goal was
to show all the changes to the build system.

Thanks for the comments.

--
Alessandro Di Federico
PhD student at Politecnico di Milano

Re: [Qemu-devel] [RFC PATCH 2/3] *-user targets object files decoupling

2017-01-24 Thread Alessandro Di Federico

On Tue, 24 Jan 2017 10:09:25 +
Marc-André Lureau  wrote:

>
> diff --git a/exec.c b/exec.c
> index 47835c1dc1..66f8187281 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -672,9 +672,11 @@ void cpu_exec_unrealizefn(CPUState *cpu)
>  if (cc->vmsd != NULL) {
>  vmstate_unregister(NULL, cc->vmsd, cpu);
>  }
> +#ifndef CONFIG_USER_ONLY
>  if (qdev_get_vmsd(DEVICE(cpu)) == NULL) {
>  vmstate_unregister(NULL, _cpu_common, cpu);
>  }
> +#endif
> 
> 
> Shouldn't this #ifndef block also include the vmsd
> vmstate_unregister() above? That would be matching
> cpu_exec_realizefn() #ifndef.
> 

IIRC the idea here was to break a depency introduced by
"vmstate_cpu_common".
I'll include the previous if statement in the `ifndef`.

Let me know if I missed a comment on this patch.

--
Alessandro Di Federico
PhD student at Politecnico di Milano

Re: [Qemu-devel] [RFC PATCH 1/3] Factor out {linux, bsd}-user/qemu.h

2017-01-24 Thread Alessandro Di Federico

On Tue, 24 Jan 2017 10:10:08 +
Marc-André Lureau  wrote:

> 
> Looks good. Probably worth to mention that the main difference is in
> commit 658f2dc970996d547a641b5685e384ebe6f2648e not being applied to
> bsd-user.
> 

Yeah, I'll rebase my next patches on master.

My only doubt about this patch was whether the linux-* and bsd-*
implementations of put_user are equivalent. The linux one seems more
optimized, but they look equivalent to me.

--
Alessandro Di Federico
PhD student at Politecnico di Milano

Re: [Qemu-devel] [PATCH v2] qmp: Fix argument name in error message of device-list-properties

2017-01-24 Thread Markus Armbruster

Lin Ma  writes:

> The argument is called "typename", not "name".
>
> [Thanks to Markus for correcting the commit message]
>
> Signed-off-by: Lin Ma 
> ---
>  qmp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/qmp.c b/qmp.c
> index 0028f0b..886059e 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -531,12 +531,12 @@ DevicePropertyInfoList 
> *qmp_device_list_properties(const char *typename,
>  
>  klass = object_class_dynamic_cast(klass, TYPE_DEVICE);
>  if (klass == NULL) {
> -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "name", TYPE_DEVICE);
> +error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "typename", 
> TYPE_DEVICE);
>  return NULL;
>  }
>  
>  if (object_class_is_abstract(klass)) {
> -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "name",
> +error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "typename",
> "non-abstract device type");
>  return NULL;
>  }

Reviewed-by: Markus Armbruster

Re: [Qemu-devel] [PATCH] cirrus: fix oob access issue

2017-01-24 Thread Gerd Hoffmann

  Hi,

> > I have read all the discuss, very long and useful, but I think I still
> > need some
> > time to get a full understand. So I think one of you can provide the
> > formal patch to 
> > describe the issue in more detail.
> 
> Your patch is almost correct.

New version out for review.

> Gerd, do you think you can raise this with the SRT? (I mean, if you
> agree that this is a security issue.)

Yes, it is.  Getting a CVE number for it is wip.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] cirrus: fix oob access issue

2017-01-24 Thread Gerd Hoffmann

  Hi,

> > The negative pitch means (I think) that "addr" points to the lower
> > left corner of the rectangle.
> >
> > The second part guarantees that the last blitted byte fits (lower
> > right corner).
> 
> To which Gerd responded "upper left". In retrospect I don't understand
> why we didn't discuss that question further, as it now seems that we
> were both wrong -- "addr" stands for bottom right, in the negative pitch
> case.

/me looks at d3532a0db02296e687711b8cdc7791924efccea0 and I can't
remember I wrote that code :-o

And I can't remember the discussion either.

The good thing is I probably looked more careful at the code because of
that ...

> Unfortunately, the original patch was meant to address the
> then-embargoed CVE-2014-8106. Since we have a bug in that code (= a
> security fix), this issue should have been reported privately as well,

It has been reported privately first.  I've actually suggested to send
it to the public list without embargo, given that we are moving away
from cirrus so this is less critical than it used to be two years ago.
Cirrus isn't the default display adapter any more in qemu, since years,
and management apps (virt-manager, ovirt, ...) are following.

cheers,
  Gerd

[Qemu-devel] [PATCH] cirrus: fix oob access issue (CVE-2017-TODO)

2017-01-24 Thread Gerd Hoffmann

From: Li Qiang 

When doing bitblt copy in backward mode, we should minus the
blt width first just like the adding in the forward mode. This
can avoid the oob access of the front of vga's vram.

Signed-off-by: Li Qiang 
Message-id: 5887254f.863a240a.2c122.5...@mx.google.com

{ kraxel: with backward blits (negative pitch) addr is the topmost
  address, so check it as-is against vram size ]

Cc: qemu-sta...@nongnu.org
Cc: P J P 
Cc: Laszlo Ersek 
Cc: Paolo Bonzini 
Cc: Wolfgang Bumiller 
Fixes: d3532a0db02296e687711b8cdc7791924efccea0 (CVE-2014-8106)
Signed-off-by: Gerd Hoffmann 
---
 hw/display/cirrus_vga.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c
index 379910d..b8c29a6 100644
--- a/hw/display/cirrus_vga.c
+++ b/hw/display/cirrus_vga.c
@@ -277,10 +277,9 @@ static bool blit_region_is_unsafe(struct CirrusVGAState *s,
 }
 if (pitch < 0) {
 int64_t min = addr
-+ ((int64_t)s->cirrus_blt_height-1) * pitch;
-int32_t max = addr
-+ s->cirrus_blt_width;
-if (min < 0 || max > s->vga.vram_size) {
++ ((int64_t)s->cirrus_blt_height - 1) * pitch
+- s->cirrus_blt_width;
+if (min < 0 || addr > s->vga.vram_size) {
 return true;
 }
 } else {
-- 
1.8.3.1

[Qemu-devel] [PATCH 2/2] savevm: Add new helpers to process the different stages of loadvm/savevm

2017-01-24 Thread zhanghailiang

There are several stages during loadvm/savevm process. In different stage,
migration incoming processes different types of sections.
We want to control these stages more accuracy, it will benefit COLO
performance, we don't have to save type of QEMU_VM_SECTION_START
sections everytime while do checkpoint, besides, we want to separate
the process of saving/loading memory and devices state.

So we add three new helper functions: qemu_loadvm_state_begin(),
qemu_load_device_state() and qemu_savevm_live_state() to achieve
different process during migration.

Signed-off-by: zhanghailiang 
Signed-off-by: Li Zhijian 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/sysemu/sysemu.h |  4 
 migration/savevm.c  | 41 +
 2 files changed, 45 insertions(+)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index ff8ffb5..d9214bf 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -126,7 +126,11 @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, 
const char *name,
uint64_t *start_list,
uint64_t *length_list);
 
+void qemu_savevm_live_state(QEMUFile *f);
+
 int qemu_loadvm_state(QEMUFile *f);
+int qemu_loadvm_state_begin(QEMUFile *f);
+int qemu_load_device_state(QEMUFile *f);
 
 extern int autostart;
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 92b3d6c..04dee4f 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1264,6 +1264,13 @@ done:
 return ret;
 }
 
+void qemu_savevm_live_state(QEMUFile *f)
+{
+/* save QEMU_VM_SECTION_END section */
+qemu_savevm_state_complete_precopy(f, true);
+qemu_put_byte(f, QEMU_VM_EOF);
+}
+
 static int qemu_save_device_state(QEMUFile *f)
 {
 SaveStateEntry *se;
@@ -2064,6 +2071,40 @@ int qemu_loadvm_state(QEMUFile *f)
 return ret;
 }
 
+int qemu_loadvm_state_begin(QEMUFile *f)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+Error *local_err = NULL;
+int ret;
+
+if (qemu_savevm_state_blocked(_err)) {
+error_report_err(local_err);
+return -EINVAL;
+}
+/* Load QEMU_VM_SECTION_START section */
+ret = qemu_loadvm_state_main(f, mis);
+if (ret < 0) {
+error_report("Failed to loadvm begin work: %d", ret);
+}
+return ret;
+}
+
+int qemu_load_device_state(QEMUFile *f)
+{
+MigrationIncomingState *mis = migration_incoming_get_current();
+int ret;
+
+/* Load QEMU_VM_SECTION_FULL section */
+ret = qemu_loadvm_state_main(f, mis);
+if (ret < 0) {
+error_report("Failed to load device state: %d", ret);
+return ret;
+}
+
+cpu_synchronize_all_post_init();
+return 0;
+}
+
 void hmp_savevm(Monitor *mon, const QDict *qdict)
 {
 BlockDriverState *bs, *bs1;
-- 
1.8.3.1

[Qemu-devel] [PATCH 1/2] savevm: split save/find loadvm_handlers entry into two helper functions

2017-01-24 Thread zhanghailiang

COLO's checkpoint process is based on migration process,
everytime we do checkpoint we will repeat the process of savevm and loadvm.

So we will call qemu_loadvm_section_start_full() repeatedly, It will
add all migration sections information into loadvm_handlers list everytime,
which will lead to memory leak.

To fix it, we split the process of saving and finding section entry into two
helper functions, we will check if section info was exist in loadvm_handlers
list before save it.

This modifications have no side effect for normal migration.

Signed-off-by: zhanghailiang 
Reviewed-by: Dr. David Alan Gilbert 
---
 migration/savevm.c | 55 +++---
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index f9c06e9..92b3d6c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1805,6 +1805,37 @@ void loadvm_free_handlers(MigrationIncomingState *mis)
 }
 }
 
+static LoadStateEntry *loadvm_save_section_entry(MigrationIncomingState *mis,
+ SaveStateEntry *se,
+ uint32_t section_id,
+ uint32_t version_id)
+{
+LoadStateEntry *le;
+
+/* Add entry */
+le = g_malloc0(sizeof(*le));
+
+le->se = se;
+le->section_id = section_id;
+le->version_id = version_id;
+QLIST_INSERT_HEAD(>loadvm_handlers, le, entry);
+return le;
+}
+
+static LoadStateEntry *loadvm_find_section_entry(MigrationIncomingState *mis,
+ uint32_t section_id)
+{
+LoadStateEntry *le;
+
+QLIST_FOREACH(le, >loadvm_handlers, entry) {
+if (le->section_id == section_id) {
+break;
+}
+}
+
+return le;
+}
+
 static int
 qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
 {
@@ -1847,15 +1878,12 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis)
 return -EINVAL;
 }
 
-/* Add entry */
-le = g_malloc0(sizeof(*le));
-
-le->se = se;
-le->section_id = section_id;
-le->version_id = version_id;
-QLIST_INSERT_HEAD(>loadvm_handlers, le, entry);
-
-ret = vmstate_load(f, le->se, le->version_id);
+ /* Check if we have saved this section info before, if not, save it */
+le = loadvm_find_section_entry(mis, section_id);
+if (!le) {
+le = loadvm_save_section_entry(mis, se, section_id, version_id);
+}
+ret = vmstate_load(f, se, version_id);
 if (ret < 0) {
 error_report("error while loading state for instance 0x%x of"
  " device '%s'", instance_id, idstr);
@@ -1878,12 +1906,9 @@ qemu_loadvm_section_part_end(QEMUFile *f, 
MigrationIncomingState *mis)
 section_id = qemu_get_be32(f);
 
 trace_qemu_loadvm_state_section_partend(section_id);
-QLIST_FOREACH(le, >loadvm_handlers, entry) {
-if (le->section_id == section_id) {
-break;
-}
-}
-if (le == NULL) {
+
+le = loadvm_find_section_entry(mis, section_id);
+if (!le) {
 error_report("Unknown savevm section %d", section_id);
 return -EINVAL;
 }
-- 
1.8.3.1

[Qemu-devel] [PATCH 0/2] savevm: some improvements benefit COLO's later optimization

2017-01-24 Thread zhanghailiang

Hi,

This series is split from previous COLO frame version which i post
long time ago.

These two patches are prerequisite for the later COLO optimization.
Since it is independent, I'd like to post it as a single series.

Both of them have been reviewed by Dave before, compared with old
version, I fixed the titles/comments for both of them, and I kept the
reviewed-by tag. (Hi Dave, please give it a glance again, since
it has been a long time ;) ).

Please review. Thanks.


zhanghailiang (2):
  savevm: split save/find loadvm_handlers entry into two helper
functions
  savevm: Add new helpers to process the different stages of
loadvm/savevm

 include/sysemu/sysemu.h |  4 +++
 migration/savevm.c  | 96 +
 2 files changed, 85 insertions(+), 15 deletions(-)

-- 
1.8.3.1

Re: [Qemu-devel] [PATCH RFC v4 16/20] intel_iommu: do replay when context invalidate

2017-01-24 Thread Peter Xu

On Wed, Jan 25, 2017 at 06:37:23AM +, Tian, Kevin wrote:
> > From: Peter Xu [mailto:pet...@redhat.com]
> > Sent: Wednesday, January 25, 2017 11:46 AM
> > 
> > On Wed, Jan 25, 2017 at 11:09:39AM +0800, Jason Wang wrote:
> > >
> > >
> > > On 2017年01月24日 12:52, Peter Xu wrote:
> > > >On Mon, Jan 23, 2017 at 06:36:17PM +0800, Jason Wang wrote:
> > > >>
> > > >>On 2017年01月20日 21:08, Peter Xu wrote:
> > > >>>Before this one we only invalidate context cache when we receive 
> > > >>>context
> > > >>>entry invalidations. However it's possible that the invalidation also
> > > >>>contains a domain switch (only if cache-mode is enabled for vIOMMU). In
> > > >>>that case we need to notify all the registered components about the new
> > > >>>mapping.
> > > >>>
> > > >>>Signed-off-by: Peter Xu 
> > > >>>---
> > > >>>  hw/i386/intel_iommu.c | 10 ++
> > > >>>  1 file changed, 10 insertions(+)
> > > >>>
> > > >>>diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > > >>>index f9c5142..4b08b4d 100644
> > > >>>--- a/hw/i386/intel_iommu.c
> > > >>>+++ b/hw/i386/intel_iommu.c
> > > >>>@@ -1146,6 +1146,16 @@ static void
> > vtd_context_device_invalidate(IntelIOMMUState *s,
> > > >>>  trace_vtd_inv_desc_cc_device(bus_n, 
> > > >>> VTD_PCI_SLOT(devfn_it),
> > > >>>   VTD_PCI_FUNC(devfn_it));
> > > >>>  vtd_as->context_cache_entry.context_cache_gen = 0;
> > > >>>+/*
> > > >>>+ * So a device is moving out of (or moving into) a
> > > >>>+ * domain, a replay() suites here to notify all the
> > > >>>+ * IOMMU_NOTIFIER_MAP registers about this change.
> > > >>>+ * This won't bring bad even if we have no such
> > > >>>+ * notifier registered - the IOMMU notification
> > > >>>+ * framework will skip MAP notifications if that
> > > >>>+ * happened.
> > > >>>+ */
> > > >>>+memory_region_iommu_replay_all(_as->iommu);
> > > >>DSI and GLOBAL questions come back again or no need to care about them 
> > > >>:) ?
> > > >DSI/GLOBAL hanldings are in patch 20 (though it'll be squashed into 18
> > > >in my next post). Is that what you mean above?
> > >
> > > Seems not, I mean DSI/GLOBAL for context cache invalidation instead of 
> > > IOTLB
> > > :)
> > 
> > Yes, I should possibly do the same thing for context cache global
> > invalidations. IIUC context global invalidation should be a superset
> > of iotlb invalidation, so maybe I'll add one more patch to call iotlb
> > invalidation in context invalidation as well. Kevin/others, please
> > correct me if I misunderstood the spec. Thanks,
> > 
> 
> context invalidation is not superset of iotlb invalidation. The spec just
> requires software to always follow a context-cache invalidation with
> a PASID-cache invalidation, followed by an IOTLB invalidation.

Thanks for pointing out. If so, looks like current version suffice for
this, right? (so no further change needed for this one)

-- peterx

Re: [Qemu-devel] [PATCH RFC v4 16/20] intel_iommu: do replay when context invalidate

2017-01-24 Thread Tian, Kevin

> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Wednesday, January 25, 2017 11:46 AM
> 
> On Wed, Jan 25, 2017 at 11:09:39AM +0800, Jason Wang wrote:
> >
> >
> > On 2017年01月24日 12:52, Peter Xu wrote:
> > >On Mon, Jan 23, 2017 at 06:36:17PM +0800, Jason Wang wrote:
> > >>
> > >>On 2017年01月20日 21:08, Peter Xu wrote:
> > >>>Before this one we only invalidate context cache when we receive context
> > >>>entry invalidations. However it's possible that the invalidation also
> > >>>contains a domain switch (only if cache-mode is enabled for vIOMMU). In
> > >>>that case we need to notify all the registered components about the new
> > >>>mapping.
> > >>>
> > >>>Signed-off-by: Peter Xu 
> > >>>---
> > >>>  hw/i386/intel_iommu.c | 10 ++
> > >>>  1 file changed, 10 insertions(+)
> > >>>
> > >>>diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > >>>index f9c5142..4b08b4d 100644
> > >>>--- a/hw/i386/intel_iommu.c
> > >>>+++ b/hw/i386/intel_iommu.c
> > >>>@@ -1146,6 +1146,16 @@ static void
> vtd_context_device_invalidate(IntelIOMMUState *s,
> > >>>  trace_vtd_inv_desc_cc_device(bus_n, 
> > >>> VTD_PCI_SLOT(devfn_it),
> > >>>   VTD_PCI_FUNC(devfn_it));
> > >>>  vtd_as->context_cache_entry.context_cache_gen = 0;
> > >>>+/*
> > >>>+ * So a device is moving out of (or moving into) a
> > >>>+ * domain, a replay() suites here to notify all the
> > >>>+ * IOMMU_NOTIFIER_MAP registers about this change.
> > >>>+ * This won't bring bad even if we have no such
> > >>>+ * notifier registered - the IOMMU notification
> > >>>+ * framework will skip MAP notifications if that
> > >>>+ * happened.
> > >>>+ */
> > >>>+memory_region_iommu_replay_all(_as->iommu);
> > >>DSI and GLOBAL questions come back again or no need to care about them :) 
> > >>?
> > >DSI/GLOBAL hanldings are in patch 20 (though it'll be squashed into 18
> > >in my next post). Is that what you mean above?
> >
> > Seems not, I mean DSI/GLOBAL for context cache invalidation instead of IOTLB
> > :)
> 
> Yes, I should possibly do the same thing for context cache global
> invalidations. IIUC context global invalidation should be a superset
> of iotlb invalidation, so maybe I'll add one more patch to call iotlb
> invalidation in context invalidation as well. Kevin/others, please
> correct me if I misunderstood the spec. Thanks,
> 

context invalidation is not superset of iotlb invalidation. The spec just
requires software to always follow a context-cache invalidation with
a PASID-cache invalidation, followed by an IOTLB invalidation.

Thanks
Kevin

Re: [Qemu-devel] [PATCH v4 3/9] docs: VM Generation ID device description

2017-01-24 Thread Laszlo Ersek

On 01/25/17 02:43, b...@skyportsystems.com wrote:
> From: Ben Warren 
> 
> This patch is based off an earlier version by Gal Hammer (gham...@redhat.com)

(1) This commit msg line is too long; please wrap it at 74 chars.

> 
> Signed-off-by: Gal Hammer 
> Signed-off-by: Ben Warren 
> ---
>  docs/specs/vmgenid.txt | 40 
>  1 file changed, 40 insertions(+)
>  create mode 100644 docs/specs/vmgenid.txt
> 
> diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
> new file mode 100644
> index 000..d36ed5b
> --- /dev/null
> +++ b/docs/specs/vmgenid.txt
> @@ -0,0 +1,40 @@
> +VIRTUAL MACHINE GENERATION ID
> +=
> +
> +Copyright (C) 2016 Red Hat, Inc.
> +Copyright (C) 2017 Skyport Systems, Inc.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.
> +See the COPYING file in the top-level directory.
> +
> +===
> +
> +The VM generation ID (vmgenid) device is an emulated device which
> +exposes a 128-bit, cryptographically random, integer value identifier.
> +This allows management applications (e.g. libvirt) to notify the guest
> +operating system when the virtual machine is executed with a different
> +configuration (e.g. snapshot execution or creation from a template).
> +
> +This is specified on the web at: 
> http://go.microsoft.com/fwlink/?LinkId=260709
> +
> +---
> +
> +The vmgenid device is a sysbus device with the ACPI ID "QEMUGVID".

(2) The ID has a typo, it should be QEMUVGID (see the next patch).

> +
> +The device has one properties,

(3) "property", singular

> which can be set using the command line
> +argument or the QMP interface:
> + guid - sets the value of the GUID.  A special value "auto" instructs
> +QEMU to generate a new random GUID.
> +For example:
> +QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
> +
> +Or to change guid in runtime use:
> + set-vm-generation-id guid=124e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87
> + set-vm-generation-id guid=auto
> +
> +According to the specification, any change to the GUID executes an
> +ACPI notification. The vmgenid device triggers the \_GPE._E05 handler
> +which executes the ACPI Notify operation.
> +
> +Although not specified in Microsoft's document, it is assumed that the
> +device is expected to use the little-endian system.
> 

The above seems okay to me, but it's too terse. Please add an ASCII
diagram (or document in plain text) that describes what object in what
ACPI table points where, what fw_cfg files are used, and so on.
Practically, the entire idea behind patch #4.

I'll try to write more about this in response to patch #4. The goal is
to help QEMU developers understand patch #4 better.

... I have something like this in mind:

https://lists.nongnu.org/archive/html/qemu-devel/2015-09/msg03409.html

(4) I think you could simply steal the "Requirements" section (just
mention in the commit message that that part was originally written up
by me). I think this should be helpful, because we can more easily
validate the implementation against such a bullet list.

(5) For the rest of the document, I suggest a quick skim. Quite a good
chunk will not apply to your implementation (justifiedly!), but it will
explain to you how OVMF behaves in this regard. Plus, it should provide
an example or two for ASCII diagrams.

(If you are interested why that patch series was abandoned ultimately:
it's because Microsoft's ACPI interpreter doesn't support the
DataTableRegion operator, as we found out.
.)

(6) With regard to OVMF, I'd like to emphasize the "OVMF SDT Header
probe suppressor". (SDT stands for ACPI System Description Table.)

In your implementation, it should be a 40-byte zero-padding at the front
of the fw_cfg blob with the GUID. It is necessary due to how OVMF
processes the ADD_POINTER commands (explained in the linked document).

The gist of it is that OVMF will look for an ACPI table header wherever
an ADD_POINTER command points, therefore 36 bytes -- see the size of
ACPI_TABLE_HEADER_DEF -- should be padded out first, to suppress that
heuristics for the vmgenid GUID. Then 4 more bytes are needed to round
up the start address of the GUID to a multiple of 8.

(7) Another comment related to the fw_cfg file that contains the GUID:
If you check requirement R1e, it follows that the fw_cfg file hosting
the GUID should be placed at a 4KB boundary, and cover a full page.

Therefore the fw_cfg file structure should be like:

- 36 bytes zero padding for suppressing OVMF's SDT Header probe
- 4 bytes zero padding for getting to an 8-byte alignment
- 8 bytes vmgenid GUID
- 4048 bytes zero padding, to ensure that nothing else is allocated on
the same page, either by the firmware or the OS.

The above layout has a consequence for both the ACPI payload you
generate, and for how QEMU acts upon the address

[Qemu-devel] [PATCH v2] qmp: Fix argument name in error message of device-list-properties

2017-01-24 Thread Lin Ma

The argument is called "typename", not "name".

[Thanks to Markus for correcting the commit message]

Signed-off-by: Lin Ma 
---
 qmp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qmp.c b/qmp.c
index 0028f0b..886059e 100644
--- a/qmp.c
+++ b/qmp.c
@@ -531,12 +531,12 @@ DevicePropertyInfoList *qmp_device_list_properties(const 
char *typename,
 
 klass = object_class_dynamic_cast(klass, TYPE_DEVICE);
 if (klass == NULL) {
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "name", TYPE_DEVICE);
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "typename", 
TYPE_DEVICE);
 return NULL;
 }
 
 if (object_class_is_abstract(klass)) {
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "name",
+error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "typename",
"non-abstract device type");
 return NULL;
 }
-- 
2.9.2

Re: [Qemu-devel] [for 2.9] migration/tracing: Add tracing on save

2017-01-24 Thread Amit Shah

On (Mon) 12 Dec 2016 [12:58:38], Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Add some tracing to vmstate_subsection_save and vmstate_save_state
> to help in debugging when you're not sure if a conditional piece
> of data is being saved.
> 
> In vmstate_subsection_save I renamed the inner vmsd to avoid the aliasing
> and be able to print both names.
> 
> Signed-off-by: Dr. David Alan Gilbert 

Reviewed-by: Amit Shah 

Amit

Re: [Qemu-devel] [PATCH v5 00/18] VT-d: vfio enablement and misc enhances

2017-01-24 Thread Peter Xu

On Tue, Jan 24, 2017 at 04:48:49PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 24, 2017 at 07:49:10PM +0800, Peter Xu wrote:
> > On Tue, Jan 24, 2017 at 06:25:53PM +0800, Peter Xu wrote:
> > > This is v5 of vt-d vfio enablement series.
> > > 
> > > Most of the changes in v5 is not functionally, but related to
> > > comments, error_report()s, debugging, squashing patches, etc. (which
> > > are confirmed changes in v4), besides a tiny tweak when unmapping a
> > > whole address space (please see below changelog). There are still
> > > discussions in v4 thread, we can just continue there (or here), and
> > > from this version I'll remove RFC tag.
> > > 
> > > I didn't rebase to master since current master failed to build on my
> > > laptop (with a "vl.c/hax..." error), however this series should be
> > > okay to be applied cleanly upon it.
> > 
> > Sorry I forgot to append the todo list (please help adding in in case
> > I missed anything):
> > 
> > - don't need to notify IOTLB (psi/gsi/global) invalidations to devices
> >   that with ATS enabled
> > - investigate when guest map page while mask contains existing mapped
> >   pages (e.g. map 12k-16k first, then map 0-12k)
> > - coalesce unmap during page walk (currently, we send it once per
> >   page)
> > - when do PSI for unmap, whether we can send one notify directly
> >   instead of walking over the page table?
> > 
> > Above does not contain those that are still during discussion. And,
> > some of the entries still need tests to further prove it's
> > feasibility.
> > 
> > Thanks,
> > 
> > -- peterx
> 
> While valid I don't think above need to block merging.

Sorry for the delay of this series.

Looks like we still need to wait until we got explicit acks for vfio
patch 2-3 from Alex/Paolo/David since that'll be required by the
following series. (I'll address all Jason's new comments as well in
next post)

Do you like to merge the iommu cleanups first? Or you can also wait
for above to be confirmed, and I'll repost for a final version then
(if we don't have further comments).

Thanks,

-- peterx

Re: [Qemu-devel] [PATCH v5 02/18] vfio: introduce vfio_get_vaddr()

2017-01-24 Thread Peter Xu

On Tue, Jan 24, 2017 at 09:29:05AM -0700, Alex Williamson wrote:
> On Tue, 24 Jan 2017 18:25:55 +0800
> Peter Xu  wrote:
> 
> > A cleanup for vfio_iommu_map_notify(). Should have no functional change,
> > just to make the function shorter and easier to understand.
> > 
> > Signed-off-by: Peter Xu 
> > ---
> >  hw/vfio/common.c | 58 
> > +---
> >  1 file changed, 38 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 174f351..ce55dff 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -294,25 +294,14 @@ static bool 
> > vfio_listener_skipped_section(MemoryRegionSection *section)
> > section->offset_within_address_space & (1ULL << 63);
> >  }
> >  
> > -static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > +static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
> > +   bool *read_only)
> >  {
> > -VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > -VFIOContainer *container = giommu->container;
> > -hwaddr iova = iotlb->iova + giommu->iommu_offset;
> >  MemoryRegion *mr;
> >  hwaddr xlat;
> >  hwaddr len = iotlb->addr_mask + 1;
> > -void *vaddr;
> > -int ret;
> > -
> > -trace_vfio_iommu_map_notify(iotlb->perm == IOMMU_NONE ? "UNMAP" : 
> > "MAP",
> > -iova, iova + iotlb->addr_mask);
> > -
> > -if (iotlb->target_as != _space_memory) {
> > -error_report("Wrong target AS \"%s\", only system memory is 
> > allowed",
> > - iotlb->target_as->name ? iotlb->target_as->name : 
> > "none");
> > -return;
> > -}
> > +bool ret = false;
> > +bool writable = iotlb->perm & IOMMU_WO;
> >  
> >  /*
> >   * The IOMMU TLB entry we have just covers translation through
> > @@ -322,12 +311,13 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> > IOMMUTLBEntry *iotlb)
> >  rcu_read_lock();
> >  mr = address_space_translate(_space_memory,
> >   iotlb->translated_addr,
> > - , , iotlb->perm & IOMMU_WO);
> > + , , writable);
> >  if (!memory_region_is_ram(mr)) {
> >  error_report("iommu map to non memory area %"HWADDR_PRIx"",
> >   xlat);
> >  goto out;
> >  }
> > +
> >  /*
> >   * Translation truncates length to the IOMMU page size,
> >   * check that it did not truncate too much.
> > @@ -337,11 +327,41 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> > IOMMUTLBEntry *iotlb)
> >  goto out;
> >  }
> >  
> > +*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > +*read_only = !writable || mr->readonly;
> > +ret = true;
> > +
> > +out:
> > +rcu_read_unlock();
> > +return ret;
> > +}
> > +
> > +static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> > +{
> > +VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > +VFIOContainer *container = giommu->container;
> > +hwaddr iova = iotlb->iova + giommu->iommu_offset;
> > +bool read_only;
> > +void *vaddr;
> > +int ret;
> > +
> > +trace_vfio_iommu_map_notify(iotlb->perm == IOMMU_NONE ? "UNMAP" : 
> > "MAP",
> > +iova, iova + iotlb->addr_mask);
> > +
> > +if (iotlb->target_as != _space_memory) {
> > +error_report("Wrong target AS \"%s\", only system memory is 
> > allowed",
> > + iotlb->target_as->name ? iotlb->target_as->name : 
> > "none");
> > +return;
> > +}
> > +
> > +if (!vfio_get_vaddr(iotlb, , _only)) {
> > +return;
> > +}
> > +
> >  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > -vaddr = memory_region_get_ram_ptr(mr) + xlat;
> >  ret = vfio_dma_map(container, iova,
> > iotlb->addr_mask + 1, vaddr,
> > -   !(iotlb->perm & IOMMU_WO) || mr->readonly);
> > +   read_only);
> >  if (ret) {
> >  error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> >   "0x%"HWADDR_PRIx", %p) = %d (%m)",
> > @@ -357,8 +377,6 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> > IOMMUTLBEntry *iotlb)
> >   iotlb->addr_mask + 1, ret);
> >  }
> >  }
> > -out:
> > -rcu_read_unlock();
> 
> The comment from v4 still needs input from Paolo, is it valid to make
> use of vaddr (based on address_space_translate ->
> memory_region_get_ram_ptr) outside of the rcu read lock or could future
> BQL reduction efforts allow this to race?

Yes we should continue the discussion. Sorry for the inconvenient and
misunderstanding (I was only trying to provide a non-rfc version of
the series, in case Michael would like to pick up some cleanup any

Re: [Qemu-devel] [PATCH v4 2/9] linker-loader: Add new 'allocate and return address' cmd

2017-01-24 Thread Laszlo Ersek

On 01/25/17 02:43, b...@skyportsystems.com wrote:
> From: Ben Warren 
> 
> This adds a new linker-loader command to instruct the guest to allocate
> memory for a fw_cfg file and write the address back into another
> writeable fw_cfg file.  Knowing this address, QEMU can then write into
> guest memory at runtime.
> 
> Signed-off-by: Ben Warren 
> ---
>  hw/acpi/bios-linker-loader.c | 71 
> ++--
>  include/hw/acpi/bios-linker-loader.h |  7 
>  2 files changed, 75 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
> index d963ebe..1d991ba 100644
> --- a/hw/acpi/bios-linker-loader.c
> +++ b/hw/acpi/bios-linker-loader.c
> @@ -78,6 +78,22 @@ struct BiosLinkerLoaderEntry {
>  uint32_t length;
>  } cksum;
>  
> +/*
> + * COMMAND_ALLOCATE_RETURN_ADDR - allocate a table from 
> @alloc_ret_file
> + * subject to @alloc_ret_align alignment (must be power of 2)
> + * and @alloc_ret_zone (can be HIGH or FSEG) requirements.
> + * Additionally, return the address of the allocation in
> + * @addr_file.
> + *
> + * This may be used instead of COMMAND_ALLOCATE
> + */
> +struct {
> +char alloc_ret_file[BIOS_LINKER_LOADER_FILESZ];
> +uint32_t alloc_ret_align;
> +uint8_t alloc_ret_zone;
> +char alloc_ret_addr_file[BIOS_LINKER_LOADER_FILESZ];
> +};
> +
>  /* padding */
>  char pad[124];
>  };

(1) I remember that, when we discussed this command first, I provided
you with an explicit list of fields, for the new command structure.
Nonetheless, I suggest rephrasing both the comment block and the
structure definition in terms of the existent COMMAND_ALLOCATE:

- please give an actual struct tag to the structure that describes
COMMAND_ALLOCATE,
- reuse that type, as the first field, of the new structure,
- only add the new, last "addr_file" field explicitly.
  (This name is also simpler than "alloc_ret_addr_file".)

(2) Furthermore, the new union member lacks a name. It should be called
"alloc_ret_addr".

(3) The documentation can say something like,

  COMMAND_ALLOCATE_RETURN_ADDR - perform the COMMAND_ALLOCATE request
  described in @alloc_ret_addr.alloc, then write the resultant 8-byte
  allocation address, in little-endian encoding, to the fw_cfg file
  denoted by @alloc_ret_addr.addr_file.

> @@ -85,9 +101,10 @@ struct BiosLinkerLoaderEntry {
>  typedef struct BiosLinkerLoaderEntry BiosLinkerLoaderEntry;
>  
>  enum {
> -BIOS_LINKER_LOADER_COMMAND_ALLOCATE = 0x1,
> -BIOS_LINKER_LOADER_COMMAND_ADD_POINTER  = 0x2,
> -BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM = 0x3,
> +BIOS_LINKER_LOADER_COMMAND_ALLOCATE  = 0x1,
> +BIOS_LINKER_LOADER_COMMAND_ADD_POINTER   = 0x2,
> +BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM  = 0x3,
> +BIOS_LINKER_LOADER_COMMAND_ALLOCATE_RET_ADDR = 0x4,
>  };
>  
>  enum {
> @@ -278,3 +295,51 @@ void bios_linker_loader_add_pointer(BIOSLinker *linker,
>  
>  g_array_append_vals(linker->cmd_blob, , sizeof entry);
>  }
> +
> +/*
> + * bios_linker_loader_alloc_ret_addr: ask guest to load file into guest 
> memory
> + * and patch the address in another file

(4) I suggest: "and return the allocation address in another file".

> + *
> + * @linker: linker object instance
> + * @data_file: name of the file blob to be loaded
> + * @file_blob: pointer to blob corresponding to @file_name

(5) You renamed @file_name to @data_file, but then the reference on the
next line wasn't updated. I suggest the following names:

@data_file_name
@data_file_blob

and consistent references.

> + * @alloc_align: required minimal alignment in bytes. Must be a power of 2.
> + * @alloc_fseg: request allocation in FSEG zone (useful for the RSDP ACPI 
> table)
> + * @addr_file: destination file that will contain the address.
> + * This must already exist

(6) For consistency, I suggest @addr_file_name.

> + *
> + * Note: this command must precede any other linker command that uses
> + * the data file.
> + */
> +void bios_linker_loader_alloc_ret_addr(BIOSLinker *linker,
> +  const char *data_file,
> +  GArray *file_blob,
> +  uint32_t alloc_align,
> +  bool alloc_fseg,
> +  const char *addr_file)

(7) White space should be updated to line up the parameters with the
opening paren.

> +{
> +BiosLinkerLoaderEntry entry;
> +BiosLinkerFileEntry d_file = { g_strdup(data_file), file_blob};
> +
> +/* Address file is expected to already be loaded */
> +const BiosLinkerFileEntry *a_file =
> +bios_linker_find_file(linker, addr_file);

(8) That's incorrect. The fw_cfg file that receives the allocation
address is to be

Re: [Qemu-devel] [PATCH RFC v4 19/20] intel_iommu: unmap existing pages before replay

2017-01-24 Thread Peter Xu

On Wed, Jan 25, 2017 at 11:11:30AM +0800, Jason Wang wrote:
> 
> 
> On 2017年01月24日 15:31, Peter Xu wrote:
> >On Mon, Jan 23, 2017 at 06:40:12PM +0800, Jason Wang wrote:
> >>On 2017年01月20日 21:08, Peter Xu wrote:
> >>>  static int vtd_replay_hook(IOMMUTLBEntry *entry, void *private)
> >>>  {
> >>>  memory_region_notify_one((IOMMUNotifier *)private, entry);
> >>>@@ -2711,13 +2768,16 @@ static void vtd_iommu_replay(MemoryRegion *mr, 
> >>>IOMMUNotifier *n)
> >>>  if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, ) == 0) {
> >>>  /*
> >>>- * Scanned a valid context entry, walk over the pages and
> >>>- * notify when needed.
> >>>+ * Scanned a valid context entry, we first make sure to remove
> >>>+ * all existing mappings in old domain, by sending UNMAP to
> >>>+ * all the notifiers. Then, we walk over the pages and notify
> >>>+ * with existing mapped new entries in the new domain.
> >>>   */
> >>A question is what if the context cache was invalidated but the device were
> >>not moved to a new domain. Then the code here does not do anything I
> >>believe?
> >Yes, it'll unmap all the stuffs and remap them all. I think that's my
> >intention, and can we really avoid this?
> >
> >>I think we should move vtd_address_space_unmap() in the context
> >>entry invalidation processing.
> >IMHO we need this "whole umap" thing not only for context entry
> >invalidation, but all the places that need this replay, no? For
> >example, when we receive domain flush.
> >
> >Thanks,
> >
> >-- peterx
> >
> 
> Consider the case that we move device from domain A to no domain. Looks like
> current code did nothing since it can not get a valid context entry during
> replay?

Right. I should do the "whole region unmap" thing even without a valid
context entry. Will fix it in next post. Thanks,

-- peterx

Re: [Qemu-devel] [PATCH RFC v4 18/20] intel_iommu: enable vfio devices

2017-01-24 Thread Peter Xu

On Tue, Jan 24, 2017 at 09:24:29AM -0700, Alex Williamson wrote:

[...]

> > I see. Then this will be an strict requirement that we cannot do
> > coalescing during page walk, at least for mappings.
> > 
> > I didn't notice this before, but luckily current series is following
> > the rule above - we are basically doing the mapping in the unit of
> > pages. Normally, we should always be mapping with 4K pages, only if
> > guest provides huge pages in the VT-d page table, would we notify map
> > with >4K, though of course it can be either 2M/1G but never other
> > values.
> > 
> > The point is, guest should be aware of the existance of the above huge
> > pages, so it won't unmap (for example) a single 4k region within a 2M
> > huge page range. It'll either keep the huge page, or unmap the whole
> > huge page. In that sense, we are quite safe.
> > 
> > (for my own curiousity and out of topic: could I ask why we can't do
> >  that? e.g., we map 4K*2 pages, then we unmap the first 4K page?)
> 
> You understand why we can't do this in the hugepage case, right?  A
> hugepage means that at least one entire level of the page table is
> missing and that in order to unmap a subsection of it, we actually need
> to replace it with a new page table level, which cannot be done
> atomically relative to the rest of the PTEs in that entry.  Now what if
> we don't assume that hugepages are only the Intel defined 2MB & 1GB?
> AMD-Vi supports effectively arbitrary power of two page table entries.
> So what if we've passed a 2x 4K mapping where the physical pages were
> contiguous and vfio passed it as a direct 8K mapping to the IOMMU and
> the IOMMU has native support for 8K mappings.  We're in a similar
> scenario as the 2MB page, different page table layout though.

Thanks for the explaination. The AMD example is clear.

> 
> > > I would think (but please confirm), that when we're only tracking
> > > mappings generated by the guest OS that this works.  If the guest OS
> > > maps with 4k pages, we get map notifies for each of those 4k pages.  If
> > > they use 2MB pages, we get 2MB ranges and invalidations will come in
> > > the same granularity.  
> > 
> > I would agree (I haven't thought of a case that this might be a
> > problem).
> > 
> > > 
> > > An area of concern though is the replay mechanism in QEMU, I'll need to
> > > look for it in the code, but replaying an IOMMU domain into a new
> > > container *cannot* coalesce mappings or else it limits the granularity
> > > with which we can later accept unmaps. Take for instance a guest that
> > > has mapped a contiguous 2MB range with 4K pages.  They can unmap any 4K
> > > page within that range.  However if vfio gets a single 2MB mapping
> > > rather than 512 4K mappings, then the host IOMMU may use a hugepage
> > > mapping where our granularity is now 2MB.  Thanks,  
> > 
> > Is this the answer of my above question (which is for my own
> > curiosity)? If so, that'll kind of explain.
> > 
> > If it's just because vfio is smart enough on automatically using huge
> > pages when applicable (I believe it's for performance's sake), not
> > sure whether we can introduce a ioctl() to setup the iova_pgsizes
> > bitmap, as long as it is a subset of supported iova_pgsizes (from
> > VFIO_IOMMU_GET_INFO) - then when people wants to get rid of above
> > limitation, they can explicitly set the iova_pgsizes to only allow 4K
> > pages.
> > 
> > But, of course, this series can live well without it at least for now.
> 
> Yes, this is part of how vfio transparently makes use of hugepages in
> the IOMMU, we effectively disregard the supported page sizes bitmap
> (it's useless for anything other than determining the minimum page size
> anyway), and instead pass through the largest range of iovas which are
> physically contiguous.  The IOMMU driver can then make use of hugepages
> where available.  The VFIO_IOMMU_MAP_DMA ioctl does include a flags
> field where we could appropriate a bit to indicate map with minimum
> granularity, but that would not be as simple as triggering the
> disable_hugepages mapping path because the type1 driver would also need
> to flag the internal vfio_dma as being bisectable, if not simply
> converted to multiple vfio_dma structs internally.  Thanks,

I see, thanks!

-- peterx

Re: [Qemu-devel] [PATCH v4 1/9] ACPI: Add a function for building named qword entries

2017-01-24 Thread Laszlo Ersek

Hi Ben,

sorry about being late to reviewing this series. I hope I can now spend
more time on it.

- Please do not try to address my comments immediately. It's very
possible (even likely) that Igor, MST and myself could have different
opinions on things, so first please await agreement between your reviewers.

- I think you should have CC'd Igor and Michael directly. I'm adding
them to this reply; hopefully that will be enough for them to monitor
this series.

- I'll likely be unable to review everything with 100% coverage; so
addressing (or sufficiently refuting) my comments might not guarantee
that the next version will be merged at once.

With all that said:

On 01/25/17 02:43, b...@skyportsystems.com wrote:
> From: Ben Warren 
> 
> This is initially used to patch a 64-bit address into the VM Generation ID 
> SSDT

(1) I think this commit message line is overlong; I think we wrap at 74
chars or so. Not critical, but worth mentioning.

> 
> Signed-off-by: Ben Warren 
> ---
>  hw/acpi/aml-build.c | 28 
>  include/hw/acpi/aml-build.h |  4 
>  2 files changed, 32 insertions(+)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index b2a1e40..dc4edc2 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -285,6 +285,34 @@ build_append_named_dword(GArray *array, const char 
> *name_format, ...)
>  return offset;
>  }
>  
> +/*
> + * Build NAME(, 0x) where 0x is encoded as a qword,
> + * and return the offset to 0x for runtime patching.
> + *
> + * Warning: runtime patching is best avoided. Only use this as
> + * a replacement for DataTableRegion (for guests that don't
> + * support it).
> + */

(2) Since we're adding a qword (8-byte integer), the hexadecimal
constant should have 16 nibbles: 0x. (After copying the
comment from build_append_named_dword(), it should be actualized.)

(3) Normally the functions in this file that create AML operators carry
a note about the ACPI spec version and exact location that defines the
operator. I see that commit f20354910893 ("acpi: add
build_append_named_dword, returning an offset in buffer", 2016-03-01)
missed that too.

Unless this tradition has been willfully abandoned, I suggest that you
add the right reference here, and also (in retrospect) to
build_append_named_dword().

> +int
> +build_append_named_qword(GArray *array, const char *name_format, ...)
> +{
> +int offset;
> +va_list ap;
> +
> +build_append_byte(array, 0x08); /* NameOp */
> +va_start(ap, name_format);
> +build_append_namestringv(array, name_format, ap);
> +va_end(ap);
> +
> +build_append_byte(array, 0x0E); /* QWordPrefix */
> +
> +offset = array->len;
> +build_append_int_noprefix(array, 0x, 8);

(4) I guess the constant should be updated here too, for consistency
with the comment.

The rest looks okay. (I didn't verify 0x0E == QWordPrefix specifically,
because an error there would break the functionality immediately, and
you'd notice.)

Thanks!
Laszlo

> +assert(array->len == offset + 8);
> +
> +return offset;
> +}
> +
>  static GPtrArray *alloc_list;
>  
>  static Aml *aml_alloc(void)
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 559326c..dbf63cf 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -385,6 +385,10 @@ int
>  build_append_named_dword(GArray *array, const char *name_format, ...)
>  GCC_FMT_ATTR(2, 3);
>  
> +int
> +build_append_named_qword(GArray *array, const char *name_format, ...)
> +GCC_FMT_ATTR(2, 3);
> +
>  void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
> uint64_t len, int node, MemoryAffinityFlags flags);
>  
>

Re: [Qemu-devel] [PATCH RFC v4 16/20] intel_iommu: do replay when context invalidate

2017-01-24 Thread Peter Xu

On Wed, Jan 25, 2017 at 11:09:39AM +0800, Jason Wang wrote:
> 
> 
> On 2017年01月24日 12:52, Peter Xu wrote:
> >On Mon, Jan 23, 2017 at 06:36:17PM +0800, Jason Wang wrote:
> >>
> >>On 2017年01月20日 21:08, Peter Xu wrote:
> >>>Before this one we only invalidate context cache when we receive context
> >>>entry invalidations. However it's possible that the invalidation also
> >>>contains a domain switch (only if cache-mode is enabled for vIOMMU). In
> >>>that case we need to notify all the registered components about the new
> >>>mapping.
> >>>
> >>>Signed-off-by: Peter Xu 
> >>>---
> >>>  hw/i386/intel_iommu.c | 10 ++
> >>>  1 file changed, 10 insertions(+)
> >>>
> >>>diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> >>>index f9c5142..4b08b4d 100644
> >>>--- a/hw/i386/intel_iommu.c
> >>>+++ b/hw/i386/intel_iommu.c
> >>>@@ -1146,6 +1146,16 @@ static void 
> >>>vtd_context_device_invalidate(IntelIOMMUState *s,
> >>>  trace_vtd_inv_desc_cc_device(bus_n, 
> >>> VTD_PCI_SLOT(devfn_it),
> >>>   VTD_PCI_FUNC(devfn_it));
> >>>  vtd_as->context_cache_entry.context_cache_gen = 0;
> >>>+/*
> >>>+ * So a device is moving out of (or moving into) a
> >>>+ * domain, a replay() suites here to notify all the
> >>>+ * IOMMU_NOTIFIER_MAP registers about this change.
> >>>+ * This won't bring bad even if we have no such
> >>>+ * notifier registered - the IOMMU notification
> >>>+ * framework will skip MAP notifications if that
> >>>+ * happened.
> >>>+ */
> >>>+memory_region_iommu_replay_all(_as->iommu);
> >>DSI and GLOBAL questions come back again or no need to care about them :) ?
> >DSI/GLOBAL hanldings are in patch 20 (though it'll be squashed into 18
> >in my next post). Is that what you mean above?
> 
> Seems not, I mean DSI/GLOBAL for context cache invalidation instead of IOTLB
> :)

Yes, I should possibly do the same thing for context cache global
invalidations. IIUC context global invalidation should be a superset
of iotlb invalidation, so maybe I'll add one more patch to call iotlb
invalidation in context invalidation as well. Kevin/others, please
correct me if I misunderstood the spec. Thanks,

-- peterx

Re: [Qemu-devel] [PATCH] cirrus: fix oob access issue

2017-01-24 Thread Laszlo Ersek

On 01/25/17 02:18, Li Qiang wrote:
> 
> 
> 2017-01-25 0:12 GMT+08:00 Laszlo Ersek  >:
> 
> On 01/24/17 16:31, Wolfgang Bumiller wrote:
> > On Tue, Jan 24, 2017 at 01:29:58PM +0100, Gerd Hoffmann wrote:
> >>  if (pitch < 0) {
> >>  int64_t min = addr
> >> -+ ((int64_t)s->cirrus_blt_height-1) * pitch;
> >> ++ ((int64_t)s->cirrus_blt_height-1) * pitch
> >> +- s->cirrus_blt_width;
> >>  int32_t max = addr
> >>  + s->cirrus_blt_width;
> >>  if (min < 0 || max > s->vga.vram_size) {
> >>
> >
> > I believe this is incorrect. In this case (AFAIR), "addr"
> points to the
> > left-most pixel (= lowest address) of the bottom line (= highest
> > address).
> 
>  If I read the code correctly it is backwards *both* x and y
> axis, so
>  addr is the right-most pixel of the bottom line.
> >>>
> >>> What is "max" then? If "addr" is the right-most pixel of the bottom
> >>> line, then "max" has the highest address just past the
> rectangle, and
> >>> then adding anything non-negative to it makes no sense.
> >>
> >> That is (with the patch applied) inconsistent indeed.  We must either
> >> subtract s->cirrus_blt_width from min (addr == right-most), or
> add it to
> >> max (addr == left-most), but certainly not both.
> >>
> >>> ... Really as I remember it from the downstream review, the pitch is
> >>> negative (bottom-up), but the horizontal direction remains left
> to right.
> >>
> >> Looking at cirrus_vga_rop.h I see:
> >>  - cirrus_bitblt_rop_fwd_*() increment src and dst while walking the
> >>scanline, and
> >>  - cirrus_bitblt_rop_bkwd_*() decrement src and dst ...
> >>
> >> I still think x axis goes backwards too and therefore addr is the
> >> right-most pixel.
> >
> > I agree.
> >
> > Seeing how I've already been reading through that code I thought I'd
> > go over it again and too would say both min and max need to be
> adapted:
> >
> >  if (pitch < 0) {
> >  int64_t min = addr
> >  + ((int64_t)s->cirrus_blt_height-1) * pitch;
> > +- s->cirrus_blt_width;
> >  int32_t max = addr
> > -+ s->cirrus_blt_width;
> >  if (min < 0 || max > s->vga.vram_size) {
> >
> >
> > Here's the rest of my analysis in case anyone's interested (mostly to
> > justify the first part of this mail):
> >
> > -) Sign of the blit width/height & pitch values at the beginning:
> >
> > |s->cirrus_blt_width = (s->vga.gr [0x20] |
> (s->vga.gr [0x21] << 8)) + 1;
> > |s->cirrus_blt_height = (s->vga.gr [0x22] |
> (s->vga.gr [0x23] << 8)) + 1;
> > |s->cirrus_blt_dstpitch = (s->vga.gr [0x24]
> | (s->vga.gr [0x25] << 8));
> > |s->cirrus_blt_srcpitch = (s->vga.gr [0x26]
> | (s->vga.gr [0x27] << 8));
> >
> >vga.gr  is an uint8_t[]
> >
> > ==> all values are positive at this point
> >
> > -) Backward blits invert the sign:
> >
> > |if (s->cirrus_blt_mode & CIRRUS_BLTMODE_BACKWARDS) {
> > |s->cirrus_blt_dstpitch = -s->cirrus_blt_dstpitch;
> > |s->cirrus_blt_srcpitch = -s->cirrus_blt_srcpitch;
> >
> >
> > Starting with the simple one:
> >
> > -) Forward blits from cirrus_vga_rop.h:
> >Width (which is positive) is subtracted from the pitch (which
> >is also positive), turning srcpitch and dstpitch into values
> >representing the remaining bytes after the current row.
> >The pattern below repeats for all functions:
> >
> >   |static void
> >   |glue(cirrus_bitblt_rop_fwd_, ROP_NAME)(CirrusVGAState *s,
> >   | uint8_t *dst,const uint8_t *src,
> >   | int dstpitch,int srcpitch,
> >   | int bltwidth,int bltheight)
> >   |{
> >   |int x,y;
> >   |dstpitch -= bltwidth;
> >   |srcpitch -= bltwidth;
> >   |
> >   |if (bltheight > 1 && (dstpitch < 0 || srcpitch < 0)) {
> >   |return;
> >   |}
> >   |
> >   |for (y = 0; y < bltheight; y++) {
> >   |for (x = 0; x < bltwidth; x++) {
> >   |ROP_OP(dst, *src);
> >   |dst++;
> >   |src++;
> >   |}
> >   |dst += dstpitch;
> >   |src += srcpitch;
> >

Re: [Qemu-devel] [PATCH RFC v4 19/20] intel_iommu: unmap existing pages before replay

2017-01-24 Thread Jason Wang




On 2017年01月24日 15:31, Peter Xu wrote:

On Mon, Jan 23, 2017 at 06:40:12PM +0800, Jason Wang wrote:

On 2017年01月20日 21:08, Peter Xu wrote:

  static int vtd_replay_hook(IOMMUTLBEntry *entry, void *private)
  {
  memory_region_notify_one((IOMMUNotifier *)private, entry);
@@ -2711,13 +2768,16 @@ static void vtd_iommu_replay(MemoryRegion *mr, 
IOMMUNotifier *n)
  if (vtd_dev_to_context_entry(s, bus_n, vtd_as->devfn, ) == 0) {
  /*
- * Scanned a valid context entry, walk over the pages and
- * notify when needed.
+ * Scanned a valid context entry, we first make sure to remove
+ * all existing mappings in old domain, by sending UNMAP to
+ * all the notifiers. Then, we walk over the pages and notify
+ * with existing mapped new entries in the new domain.
   */

A question is what if the context cache was invalidated but the device were
not moved to a new domain. Then the code here does not do anything I
believe?

Yes, it'll unmap all the stuffs and remap them all. I think that's my
intention, and can we really avoid this?


I think we should move vtd_address_space_unmap() in the context
entry invalidation processing.

IMHO we need this "whole umap" thing not only for context entry
invalidation, but all the places that need this replay, no? For
example, when we receive domain flush.

Thanks,

-- peterx



Consider the case that we move device from domain A to no domain. Looks 
like current code did nothing since it can not get a valid context entry 
during replay?


Thanks

Re: [Qemu-devel] [PATCH RFC v4 16/20] intel_iommu: do replay when context invalidate

2017-01-24 Thread Jason Wang




On 2017年01月24日 12:52, Peter Xu wrote:

On Mon, Jan 23, 2017 at 06:36:17PM +0800, Jason Wang wrote:


On 2017年01月20日 21:08, Peter Xu wrote:

Before this one we only invalidate context cache when we receive context
entry invalidations. However it's possible that the invalidation also
contains a domain switch (only if cache-mode is enabled for vIOMMU). In
that case we need to notify all the registered components about the new
mapping.

Signed-off-by: Peter Xu 
---
  hw/i386/intel_iommu.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f9c5142..4b08b4d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1146,6 +1146,16 @@ static void 
vtd_context_device_invalidate(IntelIOMMUState *s,
  trace_vtd_inv_desc_cc_device(bus_n, VTD_PCI_SLOT(devfn_it),
   VTD_PCI_FUNC(devfn_it));
  vtd_as->context_cache_entry.context_cache_gen = 0;
+/*
+ * So a device is moving out of (or moving into) a
+ * domain, a replay() suites here to notify all the
+ * IOMMU_NOTIFIER_MAP registers about this change.
+ * This won't bring bad even if we have no such
+ * notifier registered - the IOMMU notification
+ * framework will skip MAP notifications if that
+ * happened.
+ */
+memory_region_iommu_replay_all(_as->iommu);

DSI and GLOBAL questions come back again or no need to care about them :) ?

DSI/GLOBAL hanldings are in patch 20 (though it'll be squashed into 18
in my next post). Is that what you mean above?


Seems not, I mean DSI/GLOBAL for context cache invalidation instead of 
IOTLB :)


Thanks



Thanks!

-- peterx

[Qemu-devel] [PATCH v4 9/9] tests: Add unit tests for the VM Generation ID feature

2017-01-24 Thread ben

From: Ben Warren 

The following tests are implemented:
* test that a GUID passed in by command line is propagated to the guest.
* test that changing the GUID at runtime via the monitor is reflected in
  the guest.
* test that the "auto" argument to the GUID generates a different, and
  correct GUID as seen by the guest.

  This patch is loosely based on a previous patch from:
  Gal Hammer   and Igor Mammedov 

Signed-off-by: Ben Warren 
---
 tests/Makefile.include |   2 +
 tests/vmgenid-test.c   | 184 +
 2 files changed, 186 insertions(+)
 create mode 100644 tests/vmgenid-test.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 22ea256..e2dcf15 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -239,6 +239,7 @@ check-qtest-i386-y += tests/usb-hcd-xhci-test$(EXESUF)
 gcov-files-i386-y += hw/usb/hcd-xhci.c
 check-qtest-i386-y += tests/pc-cpu-test$(EXESUF)
 check-qtest-i386-y += tests/q35-test$(EXESUF)
+check-qtest-i386-y += tests/vmgenid-test$(EXESUF)
 gcov-files-i386-y += hw/pci-host/q35.c
 check-qtest-i386-$(CONFIG_VHOST_NET_TEST_i386) += 
tests/vhost-user-test$(EXESUF)
 ifeq ($(CONFIG_VHOST_NET_TEST_i386),)
@@ -715,6 +716,7 @@ tests/ivshmem-test$(EXESUF): tests/ivshmem-test.o 
contrib/ivshmem-server/ivshmem
 tests/vhost-user-bridge$(EXESUF): tests/vhost-user-bridge.o 
contrib/libvhost-user/libvhost-user.o $(test-util-obj-y)
 tests/test-uuid$(EXESUF): tests/test-uuid.o $(test-util-obj-y)
 tests/test-arm-mptimer$(EXESUF): tests/test-arm-mptimer.o
+tests/vmgenid-test$(EXESUF): tests/vmgenid-test.o
 
 tests/migration/stress$(EXESUF): tests/migration/stress.o
$(call quiet-command, $(LINKPROG) -static -O3 $(PTHREAD_LIB) -o $@ $< 
,"LINK","$(TARGET_DIR)$@")
diff --git a/tests/vmgenid-test.c b/tests/vmgenid-test.c
new file mode 100644
index 000..0f5ba07
--- /dev/null
+++ b/tests/vmgenid-test.c
@@ -0,0 +1,184 @@
+/*
+ * QTest testcase for VM Generation ID
+ *
+ * Copyright (c) 2016 Red Hat, Inc.
+ * Copyright (c) 2017 Skyport Systems
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+#include 
+#include 
+#include "qemu/osdep.h"
+#include "qemu/bitmap.h"
+#include "qemu/uuid.h"
+#include "hw/acpi/acpi-defs.h"
+#include "acpi-utils.h"
+#include "libqtest.h"
+
+#define VGID_GUID "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
+
+static uint64_t vgia;
+
+typedef struct {
+AcpiTableHeader header;
+gchar name_op;
+gchar vgia[4];
+gchar val_op;
+uint64_t vgia_val;
+} QEMU_PACKED VgidTable;
+
+static uint64_t find_vgia(void)
+{
+uint32_t off;
+AcpiRsdpDescriptor rsdp_table;
+uint32_t rsdt;
+AcpiRsdtDescriptorRev1 rsdt_table;
+int tables_nr;
+uint32_t *tables;
+AcpiTableHeader ssdt_table;
+VgidTable vgid_table;
+int i;
+
+/* First, find the RSDP */
+for (off = 0xf; off < 0x10; off += 0x10) {
+uint8_t sig[] = "RSD PTR ";
+
+for (i = 0; i < sizeof sig - 1; ++i) {
+sig[i] = readb(off + i);
+}
+
+if (!memcmp(sig, "RSD PTR ", sizeof sig)) {
+break;
+}
+}
+g_assert_cmphex(off, <, 0x10);
+
+/* Parse the RSDP header so we can find the RSDT */
+ACPI_READ_FIELD(rsdp_table.signature, off);
+ACPI_ASSERT_CMP64(rsdp_table.signature, "RSD PTR ");
+
+ACPI_READ_FIELD(rsdp_table.checksum, off);
+ACPI_READ_ARRAY(rsdp_table.oem_id, off);
+ACPI_READ_FIELD(rsdp_table.revision, off);
+ACPI_READ_FIELD(rsdp_table.rsdt_physical_address, off);
+
+rsdt = rsdp_table.rsdt_physical_address;
+/* read the header */
+ACPI_READ_TABLE_HEADER(_table, rsdt);
+ACPI_ASSERT_CMP(rsdt_table.signature, "RSDT");
+
+/* compute the table entries in rsdt */
+tables_nr = (rsdt_table.length - sizeof(AcpiRsdtDescriptorRev1)) /
+sizeof(uint32_t);
+g_assert_cmpint(tables_nr, >, 0);
+
+/* get the addresses of the tables pointed by rsdt */
+tables = g_new0(uint32_t, tables_nr);
+ACPI_READ_ARRAY_PTR(tables, tables_nr, rsdt);
+
+for (i = 0; i < tables_nr; i++) {
+ACPI_READ_TABLE_HEADER(_table, tables[i]);
+if (!strncmp((char *)ssdt_table.oem_table_id, "VMGENID", 7)) {
+/* the first entry in the table should be VGIA
+ * That's all we need */
+ACPI_READ_FIELD(vgid_table.name_op, tables[i]);
+g_assert(vgid_table.name_op == 0x08);  /* name */
+ACPI_READ_ARRAY(vgid_table.vgia, tables[i]);
+g_assert(memcmp(vgid_table.vgia, "VGIA", 4) == 0);
+ACPI_READ_FIELD(vgid_table.val_op, tables[i]);
+g_assert(vgid_table.val_op == 0x0E);  /* qword */
+ACPI_READ_FIELD(vgid_table.vgia_val, tables[i]);
+return vgid_table.vgia_val;
+}
+}
+

[Qemu-devel] [PATCH v4 7/9] PC: Support dynamic sysbus on pc_i440fx

2017-01-24 Thread ben

From: Ben Warren 

This allows pc_i440fx-based machines to add new devices such as VM Generation ID
directly to the sysbus.

Signed-off-by: Ben Warren 
---
 hw/i386/pc_piix.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 9f102aa..c8ad99c 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -435,6 +435,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
 m->hot_add_cpu = pc_hot_add_cpu;
 m->default_machine_opts = "firmware=bios-256k.bin";
 m->default_display = "std";
+m->has_dynamic_sysbus = true;
 }
 
 static void pc_i440fx_2_9_machine_options(MachineClass *m)
-- 
2.7.4

[Qemu-devel] [PATCH v4 2/9] linker-loader: Add new 'allocate and return address' cmd

2017-01-24 Thread ben

From: Ben Warren 

This adds a new linker-loader command to instruct the guest to allocate
memory for a fw_cfg file and write the address back into another
writeable fw_cfg file.  Knowing this address, QEMU can then write into
guest memory at runtime.

Signed-off-by: Ben Warren 
---
 hw/acpi/bios-linker-loader.c | 71 ++--
 include/hw/acpi/bios-linker-loader.h |  7 
 2 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/bios-linker-loader.c b/hw/acpi/bios-linker-loader.c
index d963ebe..1d991ba 100644
--- a/hw/acpi/bios-linker-loader.c
+++ b/hw/acpi/bios-linker-loader.c
@@ -78,6 +78,22 @@ struct BiosLinkerLoaderEntry {
 uint32_t length;
 } cksum;
 
+/*
+ * COMMAND_ALLOCATE_RETURN_ADDR - allocate a table from @alloc_ret_file
+ * subject to @alloc_ret_align alignment (must be power of 2)
+ * and @alloc_ret_zone (can be HIGH or FSEG) requirements.
+ * Additionally, return the address of the allocation in
+ * @addr_file.
+ *
+ * This may be used instead of COMMAND_ALLOCATE
+ */
+struct {
+char alloc_ret_file[BIOS_LINKER_LOADER_FILESZ];
+uint32_t alloc_ret_align;
+uint8_t alloc_ret_zone;
+char alloc_ret_addr_file[BIOS_LINKER_LOADER_FILESZ];
+};
+
 /* padding */
 char pad[124];
 };
@@ -85,9 +101,10 @@ struct BiosLinkerLoaderEntry {
 typedef struct BiosLinkerLoaderEntry BiosLinkerLoaderEntry;
 
 enum {
-BIOS_LINKER_LOADER_COMMAND_ALLOCATE = 0x1,
-BIOS_LINKER_LOADER_COMMAND_ADD_POINTER  = 0x2,
-BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM = 0x3,
+BIOS_LINKER_LOADER_COMMAND_ALLOCATE  = 0x1,
+BIOS_LINKER_LOADER_COMMAND_ADD_POINTER   = 0x2,
+BIOS_LINKER_LOADER_COMMAND_ADD_CHECKSUM  = 0x3,
+BIOS_LINKER_LOADER_COMMAND_ALLOCATE_RET_ADDR = 0x4,
 };
 
 enum {
@@ -278,3 +295,51 @@ void bios_linker_loader_add_pointer(BIOSLinker *linker,
 
 g_array_append_vals(linker->cmd_blob, , sizeof entry);
 }
+
+/*
+ * bios_linker_loader_alloc_ret_addr: ask guest to load file into guest memory
+ * and patch the address in another file
+ *
+ * @linker: linker object instance
+ * @data_file: name of the file blob to be loaded
+ * @file_blob: pointer to blob corresponding to @file_name
+ * @alloc_align: required minimal alignment in bytes. Must be a power of 2.
+ * @alloc_fseg: request allocation in FSEG zone (useful for the RSDP ACPI 
table)
+ * @addr_file: destination file that will contain the address.
+ * This must already exist
+ *
+ * Note: this command must precede any other linker command that uses
+ * the data file.
+ */
+void bios_linker_loader_alloc_ret_addr(BIOSLinker *linker,
+  const char *data_file,
+  GArray *file_blob,
+  uint32_t alloc_align,
+  bool alloc_fseg,
+  const char *addr_file)
+{
+BiosLinkerLoaderEntry entry;
+BiosLinkerFileEntry d_file = { g_strdup(data_file), file_blob};
+
+/* Address file is expected to already be loaded */
+const BiosLinkerFileEntry *a_file =
+bios_linker_find_file(linker, addr_file);
+
+assert(!(alloc_align & (alloc_align - 1)));
+assert(a_file);
+
+g_array_append_val(linker->file_list, d_file);
+
+memset(, 0, sizeof entry);
+strncpy(entry.alloc_ret_file, data_file,
+sizeof entry.alloc_ret_file - 1);
+strncpy(entry.alloc_ret_addr_file, addr_file,
+sizeof entry.alloc_ret_addr_file - 1);
+entry.command = cpu_to_le32(BIOS_LINKER_LOADER_COMMAND_ALLOCATE_RET_ADDR);
+entry.alloc.align = cpu_to_le32(alloc_align);
+entry.alloc.zone = alloc_fseg ? BIOS_LINKER_LOADER_ALLOC_ZONE_FSEG :
+BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH;
+
+/* Alloc entries must come first, so prepend them */
+g_array_append_vals(linker->cmd_blob, , sizeof entry);
+}
diff --git a/include/hw/acpi/bios-linker-loader.h 
b/include/hw/acpi/bios-linker-loader.h
index fa1e5d1..69953e6 100644
--- a/include/hw/acpi/bios-linker-loader.h
+++ b/include/hw/acpi/bios-linker-loader.h
@@ -26,5 +26,12 @@ void bios_linker_loader_add_pointer(BIOSLinker *linker,
 const char *src_file,
 uint32_t src_offset);
 
+void bios_linker_loader_alloc_ret_addr(BIOSLinker *linker,
+   const char *data_file,
+   GArray *file_blob,
+   uint32_t alloc_align,
+   bool alloc_fseg,
+   const char *addr_file);
+
 void bios_linker_loader_cleanup(BIOSLinker *linker);
 #endif
-- 
2.7.4

[Qemu-devel] [PATCH v4 0/9] Add support for VM Generation ID

2017-01-24 Thread ben

From: Ben Warren 

This patch set adds support for passing a GUID to Windows guests.  It is a
re-implementation of previous patch sets written by Igor Mammedov et al, but
this time passing the GUID data as a fw_cfg blob.

This patch set has dependencies on new guest functionality, in particular the
support for a new linker-loader command and the ability to write back data
to QEMU over a DMA link.  Work is in flight in both SeaBIOS and OVMF to support 
this.
The SeaBIOS project is awaiting committing of this patch set before pulling in 
the
matching patches:
https://www.seabios.org/pipermail/seabios/2017-January/011097.html

v4->v3:
- Rebased to top of tree.
- Re-added document patch that was accidentally dropped from the last 
revision.
- Added VMState functionality so that VGIA is restored properly.
- Added Unit tests
v2->v3:
- Added second writeable fw_cfg for storing the VM Generaiton ID
  address.  This uses a new linker-loader command for instructing the
  guest to write back the allocated address.  A patch for SeaBIOS has been
  submitted 
(https://www.seabios.org/pipermail/seabios/2017-January/011079.html)
  and the resulting binary will need to be pulled into QEMU once accepted.
- Setting VM Generation ID by command line or qmp/hmp now accepts an "auto"
  value, whereby QEMU generates a random GUID.
- Incorporated review comments from v2 mainly around code styling and AML 
syntax
- Changed to use the E05 ACPI event instead of E00
v1->v2:
- Removed "changed" boolean parameter as it is unneeded
- Added ACPI Notify logic
- Style changes to pass checkpatch.pl
- Added support for dynamic sysbus to pc_piix boards

Ben Warren (7):
  ACPI: Add a function for building named qword entries
  linker-loader: Add new 'allocate and return address' cmd
  docs: VM Generation ID device description
  ACPI: Add Virtual Machine Generation ID support
  PC: Support dynamic sysbus on pc_i440fx
  tests: Move reusable ACPI macros into a new header file
  tests: Add unit tests for the VM Generation ID feature

Igor Mammedov (2):
  qmp/hmp: add query-vm-generation-id and 'info vm-generation-id'
commands
  qmp/hmp: add set-vm-generation-id commands

 default-configs/i386-softmmu.mak |   1 +
 default-configs/x86_64-softmmu.mak   |   1 +
 docs/specs/vmgenid.txt   |  40 ++
 hmp-commands-info.hx |  13 ++
 hmp-commands.hx  |  13 ++
 hmp.c|  21 +++
 hmp.h|   2 +
 hw/acpi/Makefile.objs|   1 +
 hw/acpi/aml-build.c  |  28 
 hw/acpi/bios-linker-loader.c |  71 +-
 hw/acpi/vmgenid.c| 244 +++
 hw/i386/acpi-build.c |   9 ++
 hw/i386/pc_piix.c|   1 +
 include/hw/acpi/acpi_dev_interface.h |   1 +
 include/hw/acpi/aml-build.h  |   4 +
 include/hw/acpi/bios-linker-loader.h |   7 +
 include/hw/acpi/vmgenid.h|  32 +
 qapi-schema.json |  31 +
 stubs/Makefile.objs  |   1 +
 stubs/vmgenid.c  |  14 ++
 tests/Makefile.include   |   2 +
 tests/acpi-utils.h   |  75 +++
 tests/bios-tables-test.c |  72 +--
 tests/vmgenid-test.c | 184 ++
 24 files changed, 794 insertions(+), 74 deletions(-)
 create mode 100644 docs/specs/vmgenid.txt
 create mode 100644 hw/acpi/vmgenid.c
 create mode 100644 include/hw/acpi/vmgenid.h
 create mode 100644 stubs/vmgenid.c
 create mode 100644 tests/acpi-utils.h
 create mode 100644 tests/vmgenid-test.c

-- 
2.7.4

[Qemu-devel] [PATCH v4 4/9] ACPI: Add Virtual Machine Generation ID support

2017-01-24 Thread ben

From: Ben Warren 

This implements the VM Generation ID feature by passing a 128-bit
GUID to the guest via a fw_cfg blob.
Any time the GUID changes, and ACPI notify event is sent to the guest

The user interface is a simple device with one parameter:
 - guid (string, must be "auto" or in UUID format
   ----)

Signed-off-by: Ben Warren 
---
 default-configs/i386-softmmu.mak |   1 +
 default-configs/x86_64-softmmu.mak   |   1 +
 hw/acpi/Makefile.objs|   1 +
 hw/acpi/vmgenid.c| 244 +++
 hw/i386/acpi-build.c |   9 ++
 include/hw/acpi/acpi_dev_interface.h |   1 +
 include/hw/acpi/vmgenid.h|  32 +
 7 files changed, 289 insertions(+)
 create mode 100644 hw/acpi/vmgenid.c
 create mode 100644 include/hw/acpi/vmgenid.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 0b51360..b2bccf6 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -56,3 +56,4 @@ CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
 CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
+CONFIG_ACPI_VMGENID=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index 7f89503..c6bd310 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -56,3 +56,4 @@ CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
 CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
+CONFIG_ACPI_VMGENID=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 6acf798..11c35bc 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -5,6 +5,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
+common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
new file mode 100644
index 000..63cb039
--- /dev/null
+++ b/hw/acpi/vmgenid.c
@@ -0,0 +1,244 @@
+/*
+ *  Virtual Machine Generation ID Device
+ *
+ *  Copyright (C) 2017 Skyport Systems.
+ *
+ *  Author: Ben Warren 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qmp-commands.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/vmgenid.h"
+#include "hw/nvram/fw_cfg.h"
+#include "sysemu/sysemu.h"
+
+Object *find_vmgenid_dev(Error **errp)
+{
+Object *obj = object_resolve_path_type("", VMGENID_DEVICE, NULL);
+if (!obj) {
+error_setg(errp, VMGENID_DEVICE " is not found");
+}
+return obj;
+}
+
+void vmgenid_build_acpi(GArray *table_offsets, GArray *table_data,
+ BIOSLinker *linker)
+{
+Object *obj;
+VmGenIdState *s;
+GArray *guid, *vgia;
+Aml *ssdt, *dev, *scope, *method, *addr, *if_ctx;
+uint32_t vgia_offset;
+
+obj = find_vmgenid_dev(NULL);
+if (!obj) {
+return;
+}
+s = VMGENID(obj);
+
+acpi_add_table(table_offsets, table_data);
+
+guid = g_array_new(false, true, sizeof(s->guid.data));
+g_array_append_val(guid, s->guid.data);
+vgia = g_array_new(false, true, sizeof(uint64_t));
+g_array_append_val(vgia, s->vgia);
+
+/* Put this in a separate SSDT table */
+ssdt = init_aml_allocator();
+
+/* Reserve space for header */
+acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
+
+/* Storage for the GUID address */
+vgia_offset = table_data->len +
+build_append_named_qword(ssdt->buf, "VGIA");
+scope = aml_scope("\\_SB");
+dev = aml_device("VGEN");
+aml_append(dev, aml_name_decl("_HID", aml_string("QEMUVGID")));
+aml_append(dev, aml_name_decl("_CID", aml_string("VM_Gen_Counter")));
+aml_append(dev, aml_name_decl("_DDN", aml_string("VM_Gen_Counter")));
+
+/* Simple status method to check that address is linked and non-zero */
+method = aml_method("_STA", 0, AML_NOTSERIALIZED);
+addr = aml_local(0);
+aml_append(method, aml_store(aml_int(0xf), addr));
+if_ctx = aml_if(aml_equal(aml_name("VGIA"), aml_int(0)));
+aml_append(if_ctx, aml_store(aml_int(0), addr));
+aml_append(method, if_ctx);
+aml_append(method, aml_return(addr));
+aml_append(dev, method);
+
+/* the ADDR method returns two 32-bit words representing the lower and
+ * upper halves * of the physical address of the fw_cfg blob
+ * (holding the GUID) */
+method = aml_method("ADDR", 0, AML_NOTSERIALIZED);
+
+addr = aml_local(0);
+aml_append(method, aml_store(aml_package(2), addr));
+
+aml_append(method, aml_store(aml_and(aml_name("VGIA"),
+aml_int(0x), NULL),

[Qemu-devel] [PATCH v4 3/9] docs: VM Generation ID device description

2017-01-24 Thread ben

From: Ben Warren 

This patch is based off an earlier version by Gal Hammer (gham...@redhat.com)

Signed-off-by: Gal Hammer 
Signed-off-by: Ben Warren 
---
 docs/specs/vmgenid.txt | 40 
 1 file changed, 40 insertions(+)
 create mode 100644 docs/specs/vmgenid.txt

diff --git a/docs/specs/vmgenid.txt b/docs/specs/vmgenid.txt
new file mode 100644
index 000..d36ed5b
--- /dev/null
+++ b/docs/specs/vmgenid.txt
@@ -0,0 +1,40 @@
+VIRTUAL MACHINE GENERATION ID
+=
+
+Copyright (C) 2016 Red Hat, Inc.
+Copyright (C) 2017 Skyport Systems, Inc.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+
+===
+
+The VM generation ID (vmgenid) device is an emulated device which
+exposes a 128-bit, cryptographically random, integer value identifier.
+This allows management applications (e.g. libvirt) to notify the guest
+operating system when the virtual machine is executed with a different
+configuration (e.g. snapshot execution or creation from a template).
+
+This is specified on the web at: http://go.microsoft.com/fwlink/?LinkId=260709
+
+---
+
+The vmgenid device is a sysbus device with the ACPI ID "QEMUGVID".
+
+The device has one properties, which can be set using the command line
+argument or the QMP interface:
+ guid - sets the value of the GUID.  A special value "auto" instructs
+QEMU to generate a new random GUID.
+For example:
+QEMU  -device vmgenid,guid="324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
+
+Or to change guid in runtime use:
+ set-vm-generation-id guid=124e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87
+ set-vm-generation-id guid=auto
+
+According to the specification, any change to the GUID executes an
+ACPI notification. The vmgenid device triggers the \_GPE._E05 handler
+which executes the ACPI Notify operation.
+
+Although not specified in Microsoft's document, it is assumed that the
+device is expected to use the little-endian system.
-- 
2.7.4

[Qemu-devel] [PATCH v4 1/9] ACPI: Add a function for building named qword entries

2017-01-24 Thread ben

From: Ben Warren 

This is initially used to patch a 64-bit address into the VM Generation ID SSDT

Signed-off-by: Ben Warren 
---
 hw/acpi/aml-build.c | 28 
 include/hw/acpi/aml-build.h |  4 
 2 files changed, 32 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index b2a1e40..dc4edc2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -285,6 +285,34 @@ build_append_named_dword(GArray *array, const char 
*name_format, ...)
 return offset;
 }
 
+/*
+ * Build NAME(, 0x) where 0x is encoded as a qword,
+ * and return the offset to 0x for runtime patching.
+ *
+ * Warning: runtime patching is best avoided. Only use this as
+ * a replacement for DataTableRegion (for guests that don't
+ * support it).
+ */
+int
+build_append_named_qword(GArray *array, const char *name_format, ...)
+{
+int offset;
+va_list ap;
+
+build_append_byte(array, 0x08); /* NameOp */
+va_start(ap, name_format);
+build_append_namestringv(array, name_format, ap);
+va_end(ap);
+
+build_append_byte(array, 0x0E); /* QWordPrefix */
+
+offset = array->len;
+build_append_int_noprefix(array, 0x, 8);
+assert(array->len == offset + 8);
+
+return offset;
+}
+
 static GPtrArray *alloc_list;
 
 static Aml *aml_alloc(void)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 559326c..dbf63cf 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -385,6 +385,10 @@ int
 build_append_named_dword(GArray *array, const char *name_format, ...)
 GCC_FMT_ATTR(2, 3);
 
+int
+build_append_named_qword(GArray *array, const char *name_format, ...)
+GCC_FMT_ATTR(2, 3);
+
 void build_srat_memory(AcpiSratMemoryAffinity *numamem, uint64_t base,
uint64_t len, int node, MemoryAffinityFlags flags);
 
-- 
2.7.4

[Qemu-devel] [PATCH v4 8/9] tests: Move reusable ACPI macros into a new header file

2017-01-24 Thread ben

From: Ben Warren 

Also usable by upcoming VM Generation ID tests

Signed-off-by: Ben Warren 
---
 tests/acpi-utils.h   | 75 
 tests/bios-tables-test.c | 72 +-
 2 files changed, 76 insertions(+), 71 deletions(-)
 create mode 100644 tests/acpi-utils.h

diff --git a/tests/acpi-utils.h b/tests/acpi-utils.h
new file mode 100644
index 000..d5e5eff
--- /dev/null
+++ b/tests/acpi-utils.h
@@ -0,0 +1,75 @@
+#ifndef TEST_ACPI_UTILS_H
+#define TEST_ACPI_UTILS_H
+
+/* DSDT and SSDTs format */
+typedef struct {
+AcpiTableHeader header;
+gchar *aml;/* aml bytecode from guest */
+gsize aml_len;
+gchar *aml_file;
+gchar *asl;/* asl code generated from aml */
+gsize asl_len;
+gchar *asl_file;
+bool tmp_files_retain;   /* do not delete the temp asl/aml */
+} QEMU_PACKED AcpiSdtTable;
+
+#define ACPI_READ_FIELD(field, addr)   \
+do {   \
+switch (sizeof(field)) {   \
+case 1:\
+field = readb(addr);   \
+break; \
+case 2:\
+field = readw(addr);   \
+break; \
+case 4:\
+field = readl(addr);   \
+break; \
+case 8:\
+field = readq(addr);   \
+break; \
+default:   \
+g_assert(false);   \
+}  \
+addr += sizeof(field);  \
+} while (0);
+
+#define ACPI_READ_ARRAY_PTR(arr, length, addr)  \
+do {\
+int idx;\
+for (idx = 0; idx < length; ++idx) {\
+ACPI_READ_FIELD(arr[idx], addr);\
+}   \
+} while (0);
+
+#define ACPI_READ_ARRAY(arr, addr)   \
+ACPI_READ_ARRAY_PTR(arr, sizeof(arr) / sizeof(arr[0]), addr)
+
+#define ACPI_READ_TABLE_HEADER(table, addr)  \
+do { \
+ACPI_READ_FIELD((table)->signature, addr);   \
+ACPI_READ_FIELD((table)->length, addr);  \
+ACPI_READ_FIELD((table)->revision, addr);\
+ACPI_READ_FIELD((table)->checksum, addr);\
+ACPI_READ_ARRAY((table)->oem_id, addr);  \
+ACPI_READ_ARRAY((table)->oem_table_id, addr);\
+ACPI_READ_FIELD((table)->oem_revision, addr);\
+ACPI_READ_ARRAY((table)->asl_compiler_id, addr); \
+ACPI_READ_FIELD((table)->asl_compiler_revision, addr);   \
+} while (0);
+
+#define ACPI_ASSERT_CMP(actual, expected) do { \
+uint32_t ACPI_ASSERT_CMP_le = cpu_to_le32(actual); \
+char ACPI_ASSERT_CMP_str[5] = {}; \
+memcpy(ACPI_ASSERT_CMP_str, _ASSERT_CMP_le, 4); \
+g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \
+} while (0)
+
+#define ACPI_ASSERT_CMP64(actual, expected) do { \
+uint64_t ACPI_ASSERT_CMP_le = cpu_to_le64(actual); \
+char ACPI_ASSERT_CMP_str[9] = {}; \
+memcpy(ACPI_ASSERT_CMP_str, _ASSERT_CMP_le, 8); \
+g_assert_cmpstr(ACPI_ASSERT_CMP_str, ==, expected); \
+} while (0)
+
+#endif  /* TEST_ACPI_UTILS_H */
diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 5404805..c642f7f 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -17,6 +17,7 @@
 #include "hw/acpi/acpi-defs.h"
 #include "hw/smbios/smbios.h"
 #include "qemu/bitmap.h"
+#include "acpi-utils.h"
 #include "boot-sector.h"
 
 #define MACHINE_PC "pc"
@@ -24,18 +25,6 @@
 
 #define ACPI_REBUILD_EXPECTED_AML "TEST_ACPI_REBUILD_AML"
 
-/* DSDT and SSDTs format */
-typedef struct {
-AcpiTableHeader header;
-gchar *aml;/* aml bytecode from guest */
-gsize aml_len;
-gchar *aml_file;
-gchar *asl;/* asl code generated from aml */
-gsize asl_len;
-gchar *asl_file;
-bool tmp_files_retain;   /* do not delete the temp asl/aml */
-} QEMU_PACKED AcpiSdtTable;
-
 typedef struct {
 const char *machine;
 const char *variant;
@@ -53,65 +42,6 @@ typedef struct {
 int required_struct_types_len;
 } test_data;
 
-#define ACPI_READ_FIELD(field, addr)   \
-do {   \
-switch (sizeof(field)) {   \
-case 1:\
-field = readb(addr);   \
-

[Qemu-devel] [PATCH v4 6/9] qmp/hmp: add set-vm-generation-id commands

2017-01-24 Thread ben

From: Igor Mammedov 

Add set-vm-generation-id command to set Virtual Machine
Generation ID counter.

QMP command example:
{ "execute": "set-vm-generation-id",
  "arguments": {
  "guid": "324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87"
  }
}

HMP command example:
set-vm-generation-id guid=324e6eaf-d1d1-4bf6-bf41-b9bb6c91fb87

Signed-off-by: Igor Mammedov 
Reviewed-by: Eric Blake 
Signed-off-by: Ben Warren 
---
 hmp-commands.hx  | 13 +
 hmp.c| 12 
 hmp.h|  1 +
 qapi-schema.json | 11 +++
 stubs/vmgenid.c  |  6 ++
 5 files changed, 43 insertions(+)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8819281..56744aa 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1775,5 +1775,18 @@ ETEXI
 },
 
 STEXI
+@item set-vm-generation-id @var{uuid}
+Set Virtual Machine Generation ID counter to @var{guid}
+ETEXI
+
+{
+.name   = "set-vm-generation-id",
+.args_type  = "guid:s",
+.params = "guid",
+.help   = "Set Virtual Machine Generation ID counter",
+.cmd = hmp_set_vm_generation_id,
+},
+
+STEXI
 @end table
 ETEXI
diff --git a/hmp.c b/hmp.c
index c2280e0..3a4db8b 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2573,3 +2573,15 @@ void hmp_info_vm_generation_id(Monitor *mon, const QDict 
*qdict)
 }
 qapi_free_GuidInfo(info);
 }
+
+void hmp_set_vm_generation_id(Monitor *mon, const QDict *qdict)
+{
+Error *errp = NULL;
+const char *guid = qdict_get_str(qdict, "guid");
+
+qmp_set_vm_generation_id(guid, );
+if (errp) {
+hmp_handle_error(mon, );
+return;
+}
+}
diff --git a/hmp.h b/hmp.h
index 799fd37..e0ac1e8 100644
--- a/hmp.h
+++ b/hmp.h
@@ -138,5 +138,6 @@ void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict 
*qdict);
 void hmp_info_dump(Monitor *mon, const QDict *qdict);
 void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
 void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
+void hmp_set_vm_generation_id(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 413ac52..e2ed75c 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -6046,3 +6046,14 @@
 # Since 2.9
 ##
 { 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
+
+##
+# @set-vm-generation-id:
+#
+# Set Virtual Machine Generation ID
+#
+# @guid: new GUID to set as Virtual Machine Generation ID
+#
+# Since 2.9
+##
+{ 'command': 'set-vm-generation-id', 'data': {'guid': 'str'} }
diff --git a/stubs/vmgenid.c b/stubs/vmgenid.c
index 8c448ac..d25d41b 100644
--- a/stubs/vmgenid.c
+++ b/stubs/vmgenid.c
@@ -6,3 +6,9 @@ GuidInfo *qmp_query_vm_generation_id(Error **errp)
 error_setg(errp, "this command is not currently supported");
 return NULL;
 }
+
+void qmp_set_vm_generation_id(const char *guid, Error **errp)
+{
+error_setg(errp, "this command is not currently supported");
+return;
+}
-- 
2.7.4

[Qemu-devel] [PATCH v4 5/9] qmp/hmp: add query-vm-generation-id and 'info vm-generation-id' commands

2017-01-24 Thread ben

From: Igor Mammedov 

Add commands to query Virtual Machine Generation ID counter.

QMP command example:
{ "execute": "query-vm-generation-id" }

HMP command example:
info vm-generation-id

Signed-off-by: Igor Mammedov 
Reviewed-by: Eric Blake 
Signed-off-by: Ben Warren 
---
 hmp-commands-info.hx | 13 +
 hmp.c|  9 +
 hmp.h|  1 +
 qapi-schema.json | 20 
 stubs/Makefile.objs  |  1 +
 stubs/vmgenid.c  |  8 
 6 files changed, 52 insertions(+)
 create mode 100644 stubs/vmgenid.c

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 55d50c4..b0bb052 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -802,6 +802,19 @@ Show information about hotpluggable CPUs
 ETEXI
 
 STEXI
+@item info vm-generation-id
+Show Virtual Machine Generation ID
+ETEXI
+
+{
+.name   = "vm-generation-id",
+.args_type  = "",
+.params = "",
+.help   = "Show Virtual Machine Generation ID",
+.cmd = hmp_info_vm_generation_id,
+},
+
+STEXI
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index 8522efe..c2280e0 100644
--- a/hmp.c
+++ b/hmp.c
@@ -2564,3 +2564,12 @@ void hmp_hotpluggable_cpus(Monitor *mon, const QDict 
*qdict)
 
 qapi_free_HotpluggableCPUList(saved);
 }
+
+void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict)
+{
+GuidInfo *info = qmp_query_vm_generation_id(NULL);
+if (info) {
+monitor_printf(mon, "%s\n", info->guid);
+}
+qapi_free_GuidInfo(info);
+}
diff --git a/hmp.h b/hmp.h
index 05daf7c..799fd37 100644
--- a/hmp.h
+++ b/hmp.h
@@ -137,5 +137,6 @@ void hmp_rocker_of_dpa_flows(Monitor *mon, const QDict 
*qdict);
 void hmp_rocker_of_dpa_groups(Monitor *mon, const QDict *qdict);
 void hmp_info_dump(Monitor *mon, const QDict *qdict);
 void hmp_hotpluggable_cpus(Monitor *mon, const QDict *qdict);
+void hmp_info_vm_generation_id(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index ac55f4a..413ac52 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -6026,3 +6026,23 @@
 #
 ##
 { 'command': 'query-hotpluggable-cpus', 'returns': ['HotpluggableCPU'] }
+
+##
+# @GuidInfo:
+#
+# GUID information.
+#
+# @guid: the globally unique identifier
+#
+# Since: 2.9
+##
+{ 'struct': 'GuidInfo', 'data': {'guid': 'str'} }
+
+##
+# @query-vm-generation-id:
+#
+# Show Virtual Machine Generation ID
+#
+# Since 2.9
+##
+{ 'command': 'query-vm-generation-id', 'returns': 'GuidInfo' }
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index a187295..0bffca6 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -35,3 +35,4 @@ stub-obj-y += qmp_pc_dimm_device_list.o
 stub-obj-y += target-monitor-defs.o
 stub-obj-y += target-get-monitor-def.o
 stub-obj-y += pc_madt_cpu_entry.o
+stub-obj-y += vmgenid.o
diff --git a/stubs/vmgenid.c b/stubs/vmgenid.c
new file mode 100644
index 000..8c448ac
--- /dev/null
+++ b/stubs/vmgenid.c
@@ -0,0 +1,8 @@
+#include "qemu/osdep.h"
+#include "qmp-commands.h"
+
+GuidInfo *qmp_query_vm_generation_id(Error **errp)
+{
+error_setg(errp, "this command is not currently supported");
+return NULL;
+}
-- 
2.7.4

Re: [Qemu-devel] [PATCH RFC v4 18/20] intel_iommu: enable vfio devices

2017-01-24 Thread Alex Williamson

On Wed, 25 Jan 2017 09:19:25 +0800
Jason Wang  wrote:

> On 2017年01月24日 03:40, Alex Williamson wrote:
> > On Mon, 23 Jan 2017 18:23:44 +0800
> > Jason Wang  wrote:
> >  
> >> On 2017年01月23日 11:34, Peter Xu wrote:  
> >>> On Mon, Jan 23, 2017 at 09:55:39AM +0800, Jason Wang wrote:  
>  On 2017年01月22日 17:04, Peter Xu wrote:  
> > On Sun, Jan 22, 2017 at 04:08:04PM +0800, Jason Wang wrote:
> >
> > [...]
> > 
> >>> +static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
> >>> +   uint16_t domain_id, 
> >>> hwaddr addr,
> >>> +   uint8_t am)
> >>> +{
> >>> +IntelIOMMUNotifierNode *node;
> >>> +VTDContextEntry ce;
> >>> +int ret;
> >>> +
> >>> +QLIST_FOREACH(node, &(s->notifiers_list), next) {
> >>> +VTDAddressSpace *vtd_as = node->vtd_as;
> >>> +ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
> >>> +   vtd_as->devfn, );
> >>> +if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
> >>> +vtd_page_walk(, addr, addr + (1 << am) * 
> >>> VTD_PAGE_SIZE,
> >>> +  vtd_page_invalidate_notify_hook,
> >>> +  (void *)_as->iommu, true);  
> >> Why not simply trigger the notifier here? (or is this vfio required?)  
> > Because we may only want to notify part of the region - we are with
> > mask here, but not exact size.
> >
> > Consider this: guest (with caching mode) maps 12K memory (4K*3 pages),
> > the mask will be extended to 16K in the guest. In that case, we need
> > to explicitly go over the page entry to know that the 4th page should
> > not be notified.  
>  I see. Then it was required by vfio only, I think we can add a fast path 
>  for
>  !CM in this case by triggering the notifier directly.  
> >>> I noted this down (to be further investigated in my todo), but I don't
> >>> know whether this can work, due to the fact that I think it is still
> >>> legal that guest merge more than one PSIs into one. For example, I
> >>> don't know whether below is legal:
> >>>
> >>> - guest invalidate page (0, 4k)
> >>> - guest map new page (4k, 8k)
> >>> - guest send single PSI of (0, 8k)
> >>>
> >>> In that case, it contains both map/unmap, and looks like it didn't
> >>> disobay the spec as well?  
> >> Not sure I get your meaning, you mean just send single PSI instead of two?
> >>  
> >>> 
>  Another possible issue is, consider (with CM) a 16K contiguous iova with 
>  the
>  last page has already been mapped. In this case, if we want to map first
>  three pages, when handling IOTLB invalidation, am would be 16K, then the
>  last page will be mapped twice. Can this lead some issue?  
> >>> I don't know whether guest has special handling of this kind of
> >>> request.  
> >> This seems quite usual I think? E.g iommu_flush_iotlb_psi() did:
> >>
> >> static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
> >> struct dmar_domain *domain,
> >> unsigned long pfn, unsigned int pages,
> >> int ih, int map)
> >> {
> >>   unsigned int mask = ilog2(__roundup_pow_of_two(pages));
> >>   uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
> >>   u16 did = domain->iommu_did[iommu->seq_id];
> >> ...
> >>
> >>  
> >>> Besides, imho to completely solve this problem, we still need that
> >>> per-domain tree. Considering that currently the tree is inside vfio, I
> >>> see this not a big issue as well.  
> >> Another issue I found is: with this series, VFIO_IOMMU_MAP_DMA seems
> >> become guest trigger-able. And since VFIO allocate its own structure to
> >> record dma mapping, this seems open a window for evil guest to exhaust
> >> host memory which is even worse.  
> > You're thinking of pci-assign, vfio does page accounting such that a
> > user can only lock pages up to their locked memory limit.  Exposing the
> > mapping ioctl within the guest is not a different problem from exposing
> > the ioctl to the host user from a vfio perspective.  Thanks,
> >
> > Alex
> >  
> 
> Yes, but what if an evil guest that maps all iovas to the same gpa?

Doesn't matter, we'd account that gpa each time it's mapped, so
effectively the locked memory limit is equal to the iova size the user
can map.  Thanks,

Alex

Re: [Qemu-devel] [PATCH RFC v4 18/20] intel_iommu: enable vfio devices

2017-01-24 Thread Jason Wang




On 2017年01月24日 03:40, Alex Williamson wrote:

On Mon, 23 Jan 2017 18:23:44 +0800
Jason Wang  wrote:


On 2017年01月23日 11:34, Peter Xu wrote:

On Mon, Jan 23, 2017 at 09:55:39AM +0800, Jason Wang wrote:

On 2017年01月22日 17:04, Peter Xu wrote:

On Sun, Jan 22, 2017 at 04:08:04PM +0800, Jason Wang wrote:

[...]
  

+static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
+   uint16_t domain_id, hwaddr addr,
+   uint8_t am)
+{
+IntelIOMMUNotifierNode *node;
+VTDContextEntry ce;
+int ret;
+
+QLIST_FOREACH(node, &(s->notifiers_list), next) {
+VTDAddressSpace *vtd_as = node->vtd_as;
+ret = vtd_dev_to_context_entry(s, pci_bus_num(vtd_as->bus),
+   vtd_as->devfn, );
+if (!ret && domain_id == VTD_CONTEXT_ENTRY_DID(ce.hi)) {
+vtd_page_walk(, addr, addr + (1 << am) * VTD_PAGE_SIZE,
+  vtd_page_invalidate_notify_hook,
+  (void *)_as->iommu, true);

Why not simply trigger the notifier here? (or is this vfio required?)

Because we may only want to notify part of the region - we are with
mask here, but not exact size.

Consider this: guest (with caching mode) maps 12K memory (4K*3 pages),
the mask will be extended to 16K in the guest. In that case, we need
to explicitly go over the page entry to know that the 4th page should
not be notified.

I see. Then it was required by vfio only, I think we can add a fast path for
!CM in this case by triggering the notifier directly.

I noted this down (to be further investigated in my todo), but I don't
know whether this can work, due to the fact that I think it is still
legal that guest merge more than one PSIs into one. For example, I
don't know whether below is legal:

- guest invalidate page (0, 4k)
- guest map new page (4k, 8k)
- guest send single PSI of (0, 8k)

In that case, it contains both map/unmap, and looks like it didn't
disobay the spec as well?

Not sure I get your meaning, you mean just send single PSI instead of two?

  

Another possible issue is, consider (with CM) a 16K contiguous iova with the
last page has already been mapped. In this case, if we want to map first
three pages, when handling IOTLB invalidation, am would be 16K, then the
last page will be mapped twice. Can this lead some issue?

I don't know whether guest has special handling of this kind of
request.

This seems quite usual I think? E.g iommu_flush_iotlb_psi() did:

static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
struct dmar_domain *domain,
unsigned long pfn, unsigned int pages,
int ih, int map)
{
  unsigned int mask = ilog2(__roundup_pow_of_two(pages));
  uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
  u16 did = domain->iommu_did[iommu->seq_id];
...



Besides, imho to completely solve this problem, we still need that
per-domain tree. Considering that currently the tree is inside vfio, I
see this not a big issue as well.

Another issue I found is: with this series, VFIO_IOMMU_MAP_DMA seems
become guest trigger-able. And since VFIO allocate its own structure to
record dma mapping, this seems open a window for evil guest to exhaust
host memory which is even worse.

You're thinking of pci-assign, vfio does page accounting such that a
user can only lock pages up to their locked memory limit.  Exposing the
mapping ioctl within the guest is not a different problem from exposing
the ioctl to the host user from a vfio perspective.  Thanks,

Alex



Yes, but what if an evil guest that maps all iovas to the same gpa?

Thanks

Re: [Qemu-devel] [PATCH] cirrus: fix oob access issue

2017-01-24 Thread Li Qiang

2017-01-25 0:12 GMT+08:00 Laszlo Ersek :

> On 01/24/17 16:31, Wolfgang Bumiller wrote:
> > On Tue, Jan 24, 2017 at 01:29:58PM +0100, Gerd Hoffmann wrote:
> >>  if (pitch < 0) {
> >>  int64_t min = addr
> >> -+ ((int64_t)s->cirrus_blt_height-1) * pitch;
> >> ++ ((int64_t)s->cirrus_blt_height-1) * pitch
> >> +- s->cirrus_blt_width;
> >>  int32_t max = addr
> >>  + s->cirrus_blt_width;
> >>  if (min < 0 || max > s->vga.vram_size) {
> >>
> >
> > I believe this is incorrect. In this case (AFAIR), "addr" points to
> the
> > left-most pixel (= lowest address) of the bottom line (= highest
> > address).
> 
>  If I read the code correctly it is backwards *both* x and y axis, so
>  addr is the right-most pixel of the bottom line.
> >>>
> >>> What is "max" then? If "addr" is the right-most pixel of the bottom
> >>> line, then "max" has the highest address just past the rectangle, and
> >>> then adding anything non-negative to it makes no sense.
> >>
> >> That is (with the patch applied) inconsistent indeed.  We must either
> >> subtract s->cirrus_blt_width from min (addr == right-most), or add it to
> >> max (addr == left-most), but certainly not both.
> >>
> >>> ... Really as I remember it from the downstream review, the pitch is
> >>> negative (bottom-up), but the horizontal direction remains left to
> right.
> >>
> >> Looking at cirrus_vga_rop.h I see:
> >>  - cirrus_bitblt_rop_fwd_*() increment src and dst while walking the
> >>scanline, and
> >>  - cirrus_bitblt_rop_bkwd_*() decrement src and dst ...
> >>
> >> I still think x axis goes backwards too and therefore addr is the
> >> right-most pixel.
> >
> > I agree.
> >
> > Seeing how I've already been reading through that code I thought I'd
> > go over it again and too would say both min and max need to be adapted:
> >
> >  if (pitch < 0) {
> >  int64_t min = addr
> >  + ((int64_t)s->cirrus_blt_height-1) * pitch;
> > +- s->cirrus_blt_width;
> >  int32_t max = addr
> > -+ s->cirrus_blt_width;
> >  if (min < 0 || max > s->vga.vram_size) {
> >
> >
> > Here's the rest of my analysis in case anyone's interested (mostly to
> > justify the first part of this mail):
> >
> > -) Sign of the blit width/height & pitch values at the beginning:
> >
> > |s->cirrus_blt_width = (s->vga.gr[0x20] | (s->vga.gr[0x21] <<
> 8)) + 1;
> > |s->cirrus_blt_height = (s->vga.gr[0x22] | (s->vga.gr[0x23] <<
> 8)) + 1;
> > |s->cirrus_blt_dstpitch = (s->vga.gr[0x24] | (s->vga.gr[0x25]
> << 8));
> > |s->cirrus_blt_srcpitch = (s->vga.gr[0x26] | (s->vga.gr[0x27]
> << 8));
> >
> >vga.gr is an uint8_t[]
> >
> > ==> all values are positive at this point
> >
> > -) Backward blits invert the sign:
> >
> > |if (s->cirrus_blt_mode & CIRRUS_BLTMODE_BACKWARDS) {
> > |s->cirrus_blt_dstpitch = -s->cirrus_blt_dstpitch;
> > |s->cirrus_blt_srcpitch = -s->cirrus_blt_srcpitch;
> >
> >
> > Starting with the simple one:
> >
> > -) Forward blits from cirrus_vga_rop.h:
> >Width (which is positive) is subtracted from the pitch (which
> >is also positive), turning srcpitch and dstpitch into values
> >representing the remaining bytes after the current row.
> >The pattern below repeats for all functions:
> >
> >   |static void
> >   |glue(cirrus_bitblt_rop_fwd_, ROP_NAME)(CirrusVGAState *s,
> >   | uint8_t *dst,const uint8_t *src,
> >   | int dstpitch,int srcpitch,
> >   | int bltwidth,int bltheight)
> >   |{
> >   |int x,y;
> >   |dstpitch -= bltwidth;
> >   |srcpitch -= bltwidth;
> >   |
> >   |if (bltheight > 1 && (dstpitch < 0 || srcpitch < 0)) {
> >   |return;
> >   |}
> >   |
> >   |for (y = 0; y < bltheight; y++) {
> >   |for (x = 0; x < bltwidth; x++) {
> >   |ROP_OP(dst, *src);
> >   |dst++;
> >   |src++;
> >   |}
> >   |dst += dstpitch;
> >   |src += srcpitch;
> >   |}
> >   |}
> >
> >The first access to src/dst is unmodified, so the lowest accessed
> >address is the initial address.
> >
> >Some functions iterate through pairs:
> >   |   for (x = 0; x < bltwidth; x+=2) {
> >   |  p1 = *dst;
> >   |  p2 = *(dst+1);
> >Since the loop uses `x += 2` this `+1` should not go out of bounds
> >provided the width is even (which if not the case should at least
> >have an even pitch value).
> >
> > Conclusion for forward blits:
> >We use (start + pitch * (height-1) + width) which seems obvious for
> >the main pattern but we also need to assert that the 2nd example
> >above cannot

[Qemu-devel] [PULL 00/14] target/xtensa updates

2017-01-24 Thread Max Filippov

Hi Peter,

please pull the following batch of updates for target/xtensa.

The following changes since commit a470b33259bf82ef2336bfcd5d07640562d3f63b:

  Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging 
(2016-12-22 19:23:51 +)

are available in the git repository at:

  git://github.com/OSLL/qemu-xtensa.git tags/20170124-xtensa

for you to fetch changes up to 3a3c9dc4ca2eaa612cbd5d4c85d674b15eadfb02:

  target-xtensa: implement RER/WER instructions (2017-01-16 19:19:03 -0800)


target/xtensa updates:

- refactor CCOUNT/CCOMPARE (use QEMU timers instead of instruction counting);
- support icount; run target/xtensa TCG tests with icount;
- implement SMP prerequisites: static vector selection, RUNSTALL and RER/WER.


Max Filippov (14):
  target/xtensa: add static vectors selection
  target/xtensa: implement RUNSTALL
  target/xtensa: refactor CCOUNT/CCOMPARE
  target/xtensa: support icount
  target/xtensa: don't continue translation after exception
  target/xtensa: tests: run tests with icount
  target/xtensa: tests: fix timer tests
  target/xtensa: tests: replace hardcoded interrupt masks
  target/xtensa: tests: add ccount write tests
  target/xtensa: fix ICACHE/DCACHE options detection
  target/xtensa: implement MEMCTL SR
  target/xtensa: tests: add memctl test
  target/xtensa: tests: clean up interrupt tests
  target-xtensa: implement RER/WER instructions

 hw/xtensa/pic_cpu.c   |  75 +++-
 target/xtensa/cpu.c   |  12 +-
 target/xtensa/cpu.h   |  60 --
 target/xtensa/helper.c|  13 ++
 target/xtensa/helper.h|   9 +-
 target/xtensa/op_helper.c |  73 ++--
 target/xtensa/overlay_tool.h  |  37 +-
 target/xtensa/translate.c | 245 ++
 tests/tcg/xtensa/Makefile |   2 +-
 tests/tcg/xtensa/test_interrupt.S |  27 +++--
 tests/tcg/xtensa/test_sr.S|   1 +
 tests/tcg/xtensa/test_timer.S | 105 +++-
 12 files changed, 456 insertions(+), 203 deletions(-)

-- 
2.1.4

Re: [Qemu-devel] update-linux-headers.sh fails on clean kernel dir

2017-01-24 Thread Sam Bobroff

On Tue, Jan 24, 2017 at 10:32:36AM +, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Jan 24, 2017 at 8:41 AM Sam Bobroff  wrote:
> 
> > Hi QEMU developers,
> >
> > If I run scripts/update-linux-headers.sh from a clean checkout of QEMU
> > and point it at a clean checkout of a recent linux kernel (4.10-rc1 or
> > later), it fails:
> >
> > $ scripts/update-linux-headers.sh ~/tmp/linux/
> >
> > ...
> >
> > scripts/Makefile.headersinst:62: *** Missing generated UAPI file
> > ./arch/arm/include/generated/uapi/asm/unistd-common.h.  Stop.
> >
> > This seems to be because the script passes the arch to the kernel
> > makefile using "SRCARCH" rather than "ARCH".
> >
> > (SRCARCH seems to be intended as an internal value, and setting it does
> > not propagate the setting to ARCH. Because ARCH is left empty, the
> > prerequisites that should generate unistd-common.h fail. If ARCH is set,
> > SRCARCH is set automatically and everything works.)
> >
> > Changing the script to use "ARCH" seems to fixe the problem.
> >
> 
> That's also what Documentation/kbuild/headers_install.txt documents.
> 
> 
> > (Note: when testing this be careful: unistd-common.h is not removed by
> > "make clean" in the kernel directory.)
> >
> > Does this seem correct?
> >
> > Should I send a patch even though it's a very small change?
> >
> >
> I think so, thanks

Thanks for the quick answer, I'll post a patch ASAP.

Perhaps you wouldn't mind commenting on this followup issue, since that
fix isn't enough to get the script working nicely...

With that fixed, update-linux-headers.sh still doesn't work as I would
expect. It seems that since commit
7e71da7f124fc47a2f2bc1fbfb32f0e8ee3e7d44
QEMU makes use of virtio_mmio.h but that file isn't installed by the
kernel's "install_headers" target so running update-linux-headers.sh
causes virtio_mmio.h to be deleted from the QEMU source rather than
updated.

(It looks like virtio_mmio.h was manually added by commit
5878b13642ddf8da44186ef93ac91319ff53668b )

It looks like the reason virtio_mmio.h isn't installed is that it's not
a uapi header and the reason this causes the deletion is that the file
has been placed in include/standard-headers/...

So, it seems there are several possible approaches:
(1) Move QEMU's copy of the file out of standard-headers.
(2) Ask the kernel to make virtio_mmio.h a uapi header.
(3) Change update-linux-headers.sh to manually copy that file.
(4) Manually copy that file in after running the script.

(1) seems to be the easiest to me, and (2) possibly the cleanest, but
what would you recommend? Should I try posting a patch for this too?

Thanks,
Sam.

[Qemu-devel] [RFC PATCH] linux-user: Add signal handling for x86_64

2017-01-24 Thread Pranith Kumar

Adopted from a previous patch posting:
https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg02079.html

CC: Allan Wirth 
CC: Peter Maydell 
Signed-off-by: Pranith Kumar 
---
 linux-user/signal.c  | 264 ---
 target/i386/cpu.h|   2 +
 target/i386/fpu_helper.c |  12 +++
 3 files changed, 242 insertions(+), 36 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 0a5bb4e26b..0248621d66 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -253,8 +253,7 @@ int do_sigprocmask(int how, const sigset_t *set, sigset_t 
*oldset)
 return 0;
 }
 
-#if !defined(TARGET_OPENRISC) && !defined(TARGET_UNICORE32) && \
-!defined(TARGET_X86_64)
+#if !defined(TARGET_OPENRISC) && !defined(TARGET_UNICORE32)
 /* Just set the guest's signal mask to the specified value; the
  * caller is assumed to have called block_signals() already.
  */
@@ -512,7 +511,7 @@ void signal_init(void)
 }
 }
 
-#if !(defined(TARGET_X86_64) || defined(TARGET_UNICORE32))
+#ifndef TARGET_UNICORE32
 /* Force a synchronously taken signal. The kernel force_sig() function
  * also forces the signal to "not blocked, not ignored", but for QEMU
  * that work is done in process_pending_signals().
@@ -819,9 +818,8 @@ int do_sigaction(int sig, const struct target_sigaction 
*act,
 return ret;
 }
 
-#if defined(TARGET_I386) && TARGET_ABI_BITS == 32
-
-/* from the Linux kernel */
+#if defined(TARGET_I386)
+/* from the Linux kernel - /arch/x86/include/uapi/asm/sigcontext.h */
 
 struct target_fpreg {
 uint16_t significand[4];
@@ -835,7 +833,7 @@ struct target_fpxreg {
 };
 
 struct target_xmmreg {
-abi_ulong element[4];
+uint32_t element[4];
 };
 
 struct target_fpstate {
@@ -860,33 +858,117 @@ struct target_fpstate {
 abi_ulong padding[56];
 };
 
-#define X86_FXSR_MAGIC 0x
+struct target_fpstate_32 {
+/* Regular FPU environment */
+uint32_t cw;
+uint32_t sw;
+uint32_t tag;
+uint32_t ipoff;
+uint32_t cssel;
+uint32_t dataoff;
+uint32_t datasel;
+struct target_fpreg _st[8];
+uint16_t  status;
+uint16_t  magic;  /* 0x = regular FPU data only */
 
-struct target_sigcontext {
+/* FXSR FPU environment */
+uint32_t _fxsr_env[6];   /* FXSR FPU env is ignored */
+uint32_t mxcsr;
+uint32_t reserved;
+struct target_fpxreg _fxsr_st[8]; /* FXSR FPU reg data is ignored */
+struct target_xmmreg _xmm[8];
+uint32_t padding[56];
+};
+
+struct target_fpstate_64 {
+/* FXSAVE format */
+uint16_t cw;
+uint16_t sw;
+uint16_t twd;
+uint16_t fop;
+uint64_t rip;
+uint64_t rdp;
+uint32_t mxcsr;
+uint32_t mxcsr_mask;
+uint32_t st_space[32];
+uint32_t xmm_space[64];
+uint32_t reserved[24];
+};
+
+#ifndef TARGET_X86_64
+# define target_fpstate target_fpstate_32
+#else
+# define target_fpstate target_fpstate_64
+#endif
+
+struct target_sigcontext_32 {
 uint16_t gs, __gsh;
 uint16_t fs, __fsh;
 uint16_t es, __esh;
 uint16_t ds, __dsh;
-abi_ulong edi;
-abi_ulong esi;
-abi_ulong ebp;
-abi_ulong esp;
-abi_ulong ebx;
-abi_ulong edx;
-abi_ulong ecx;
-abi_ulong eax;
-abi_ulong trapno;
-abi_ulong err;
-abi_ulong eip;
+uint32_t edi;
+uint32_t esi;
+uint32_t ebp;
+uint32_t esp;
+uint32_t ebx;
+uint32_t edx;
+uint32_t ecx;
+uint32_t eax;
+uint32_t trapno;
+uint32_t err;
+uint32_t eip;
 uint16_t cs, __csh;
-abi_ulong eflags;
-abi_ulong esp_at_signal;
+uint32_t eflags;
+uint32_t esp_at_signal;
 uint16_t ss, __ssh;
-abi_ulong fpstate; /* pointer */
-abi_ulong oldmask;
-abi_ulong cr2;
+uint32_t fpstate; /* pointer */
+uint32_t oldmask;
+uint32_t cr2;
+};
+
+struct target_sigcontext_64 {
+uint64_t r8;
+uint64_t r9;
+uint64_t r10;
+uint64_t r11;
+uint64_t r12;
+uint64_t r13;
+uint64_t r14;
+uint64_t r15;
+
+uint64_t rdi;
+uint64_t rsi;
+uint64_t rbp;
+uint64_t rbx;
+uint64_t rdx;
+uint64_t rax;
+uint64_t rcx;
+uint64_t rsp;
+uint64_t rip;
+
+uint64_t eflags;
+
+uint16_t cs;
+uint16_t gs;
+uint16_t fs;
+uint16_t ss;
+
+uint64_t err;
+uint64_t trapno;
+uint64_t oldmask;
+uint64_t cr2;
+
+uint64_t fpstate; /* pointer */
+uint64_t padding[8];
 };
 
+#ifndef TARGET_X86_64
+# define target_sigcontext target_sigcontext_32
+#else
+# define target_sigcontext target_sigcontext_64
+#endif
+
+/* see Linux/include/uapi/asm-generic/ucontext.h */
 struct target_ucontext {
 abi_ulong tuc_flags;
 abi_ulong tuc_link;
@@ -895,8 +977,8 @@ struct target_ucontext {
 target_sigset_t   tuc_sigmask;  /* mask last for extensibility */
 };
 
-struct sigframe
-{
+#ifndef TARGET_X86_64
+struct sigframe {
 abi_ulong pretcode;

Re: [Qemu-devel] [PATCH] 9pfs: fix offset error in v9fs_xattr_read()

2017-01-24 Thread Greg Kurz

On Tue, 24 Jan 2017 14:24:23 -0800 (PST)
Stefano Stabellini  wrote:

> On Tue, 24 Jan 2017, Greg Kurz wrote:
> > On Mon, 23 Jan 2017 12:20:57 -0800 (PST)
> > Stefano Stabellini  wrote:
> >   
> > > On Sat, 21 Jan 2017, Greg Kurz wrote:  
> > > > The current code tries to copy `read_count' bytes starting at offset
> > > > `offset' from a `read_count`-sized iovec. This causes v9fs_pack() to
> > > > fail with ENOBUFS.
> > > > 
> > > > Since the PDU iovec is already partially filled with `offset' bytes,
> > > > let's skip them when creating `qiov_full' and have v9fs_pack() to
> > > > copy the whole of it. Moreover, this is consistent with the other
> > > > places where v9fs_init_qiov_from_pdu() is called.
> > > > 
> > > > This fixes commit "bcb8998fac16 9pfs: call v9fs_init_qiov_from_pdu
> > > > before v9fs_pack".
> > > 
> > > Sorry about that!
> > > 
> > >   
> > > > Signed-off-by: Greg Kurz 
> > > > ---
> > > >  hw/9pfs/9p.c |4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > > index fa58877570f6..f0eef1a3ef53 100644
> > > > --- a/hw/9pfs/9p.c
> > > > +++ b/hw/9pfs/9p.c
> > > > @@ -1685,8 +1685,8 @@ static int v9fs_xattr_read(V9fsState *s, V9fsPDU 
> > > > *pdu, V9fsFidState *fidp,
> > > >  }
> > > >  offset += err;
> > > >  
> > > > -v9fs_init_qiov_from_pdu(_full, pdu, 0, read_count, false);
> > > > -err = v9fs_pack(qiov_full.iov, qiov_full.niov, offset,
> > > > +v9fs_init_qiov_from_pdu(_full, pdu, offset, read_count, 
> > > > false);
> > > 
> > > v9fs_init_qiov_from_pdu calls init_in_iov_from_pdu passing read_count as
> > > size argument. offset is not passed to init_in_iov_from_pdu, it is only
> > > used to initialized qiov_full.
> > > 
> > > In other words, don't we need to:
> > > 
> > > v9fs_init_qiov_from_pdu(_full, pdu, offset, read_count + offset, 
> > > false);
> > > 
> > > To make sure that qiov_full has "read_count + offset" bytes in it, and
> > > qiov_full is initialized skipping the first "offset" bytes?
> > >   
> > 
> > If we do that then qemu_iovec_concat() will skip offset bytes and then copy
> > read_count + offset bytes or am I missing something ?  
> 
> You are right, sorry. qemu_iovec_concat didn't do what I thought it
> would. In that case, I think we need to change v9fs_init_qiov_from_pdu
> (in addition to the changes already in this patch):
> 
> @@ -1659,12 +1659,14 @@ static void v9fs_init_qiov_from_pdu(QEMUIOVector 
> *qiov, V9fsPDU *pdu,
>  if (is_write) {
>  pdu->s->transport->init_out_iov_from_pdu(pdu, , );
>  } else {
> -pdu->s->transport->init_in_iov_from_pdu(pdu, , , size);
> +pdu->s->transport->init_in_iov_from_pdu(pdu, , , size + 
> skip);
>  }
>  
> The reason is the same as before: iov needs to have offset+read_count
> bytes in it, for qemu_iovec_concat to skip offset and iov still have
> read_count bytes left.
> 

Ah! I had missed that since virtio_init_in_iov_from_pdu doesn't use
the size argument, but I now understand :)

Thanks for the clarification.

> 
> > > > +err = v9fs_pack(qiov_full.iov, qiov_full.niov, 0,
> > > >  ((char *)fidp->fs.xattr.value) + off,
> > > >  read_count);
> > > >  qemu_iovec_destroy(_full);
> > > > 
> > 
> >

Re: [Qemu-devel] [PATCH v2] 9pfs: fix offset error in v9fs_xattr_read()

2017-01-24 Thread Stefano Stabellini

On Wed, 25 Jan 2017, Greg Kurz wrote:
> The current code tries to copy `read_count' bytes starting at offset
> `offset' from a `read_count`-sized iovec. This causes v9fs_pack() to
> fail with ENOBUFS.
> 
> Since the PDU iovec is already partially filled with `offset' bytes,
> let's skip them when creating `qiov_full' and have v9fs_pack() to
> copy the whole of it. Moreover, this is consistent with the other
> places where v9fs_init_qiov_from_pdu() is called.
> 
> This fixes commit "bcb8998fac16 9pfs: call v9fs_init_qiov_from_pdu
> before v9fs_pack".
> 
> Signed-off-by: Greg Kurz 

Reviewed-by: Stefano Stabellini 


> v2: - pass size + skip to the init_in_iov_from_pdu callback
> ---
>  hw/9pfs/9p.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> index fa58877570f6..f22d39064f6b 100644
> --- a/hw/9pfs/9p.c
> +++ b/hw/9pfs/9p.c
> @@ -1655,7 +1655,7 @@ static void v9fs_init_qiov_from_pdu(QEMUIOVector *qiov, 
> V9fsPDU *pdu,
>  if (is_write) {
>  pdu->s->transport->init_out_iov_from_pdu(pdu, , );
>  } else {
> -pdu->s->transport->init_in_iov_from_pdu(pdu, , , size);
> +pdu->s->transport->init_in_iov_from_pdu(pdu, , , size + 
> skip);
>  }
>  
>  qemu_iovec_init_external(, iov, niov);
> @@ -1685,8 +1685,8 @@ static int v9fs_xattr_read(V9fsState *s, V9fsPDU *pdu, 
> V9fsFidState *fidp,
>  }
>  offset += err;
>  
> -v9fs_init_qiov_from_pdu(_full, pdu, 0, read_count, false);
> -err = v9fs_pack(qiov_full.iov, qiov_full.niov, offset,
> +v9fs_init_qiov_from_pdu(_full, pdu, offset, read_count, false);
> +err = v9fs_pack(qiov_full.iov, qiov_full.niov, 0,
>  ((char *)fidp->fs.xattr.value) + off,
>  read_count);
>  qemu_iovec_destroy(_full);
>

Re: [Qemu-devel] [PATCH v2 3/3] xen-platform: add missing disk unplug option

2017-01-24 Thread Stefano Stabellini

On Tue, 24 Jan 2017, Paul Durrant wrote:
> The Xen HVM unplug protocol [1] specifies a mechanism to allow guests to
> request unplug of 'aux' disks (which is stated to mean all IDE disks,
> except the primary master). This patch adds support for that unplug request.
> 
> NOTE: The semantics of what happens if unplug of all disks and 'aux' disks
>   is simultaneously requests is not clear. The patch makes that
>   assumption that an 'all' request overrides an 'aux' request.
> 
> [1] 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/hvm-emulated-unplug.markdown
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Stefano Stabellini 


> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Cc: "Michael S. Tsirkin" 
> Cc: John Snow 
> ---
>  hw/i386/xen/xen_platform.c | 27 +++
>  hw/ide/piix.c  |  4 ++--
>  include/hw/ide.h   |  2 +-
>  3 files changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> index 7d41ebb..6010f35 100644
> --- a/hw/i386/xen/xen_platform.c
> +++ b/hw/i386/xen/xen_platform.c
> @@ -107,8 +107,12 @@ static void pci_unplug_nics(PCIBus *bus)
>  pci_for_each_device(bus, 0, unplug_nic, NULL);
>  }
>  
> -static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
> +static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
>  {
> +uint32_t flags = *(uint32_t *)opaque;
> +bool aux = (flags & UNPLUG_AUX_IDE_DISKS) &&
> +!(flags & UNPLUG_ALL_DISKS);
> +
>  /* We have to ignore passthrough devices */
>  if (!strcmp(d->name, "xen-pci-passthrough")) {
>  return;
> @@ -116,12 +120,14 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
> *o)
>  
>  switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
>  case PCI_CLASS_STORAGE_IDE:
> -pci_piix3_xen_ide_unplug(DEVICE(d));
> +pci_piix3_xen_ide_unplug(DEVICE(d), aux);
>  break;
>  
>  case PCI_CLASS_STORAGE_SCSI:
>  case PCI_CLASS_STORAGE_EXPRESS:
> -object_unparent(OBJECT(d));
> +if (!aux) {
> +object_unparent(OBJECT(d));
> +}
>  break;
>  
>  default:
> @@ -129,9 +135,9 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
>  }
>  }
>  
> -static void pci_unplug_disks(PCIBus *bus)
> +static void pci_unplug_disks(PCIBus *bus, uint32_t flags)
>  {
> -pci_for_each_device(bus, 0, unplug_disks, NULL);
> +pci_for_each_device(bus, 0, unplug_disks, );
>  }
>  
>  static void platform_fixed_ioport_writew(void *opaque, uint32_t addr, 
> uint32_t val)
> @@ -144,17 +150,14 @@ static void platform_fixed_ioport_writew(void *opaque, 
> uint32_t addr, uint32_t v
>  /* Unplug devices.  Value is a bitmask of which devices to
> unplug, with bit 0 the disk devices, bit 1 the network
> devices, and bit 2 the non-primary-master IDE devices. */
> -if (val & UNPLUG_ALL_DISKS) {
> +if (val & (UNPLUG_ALL_DISKS | UNPLUG_AUX_IDE_DISKS)) {
>  DPRINTF("unplug disks\n");
> -pci_unplug_disks(pci_dev->bus);
> +pci_unplug_disks(pci_dev->bus, val);
>  }
>  if (val & UNPLUG_ALL_NICS) {
>  DPRINTF("unplug nics\n");
>  pci_unplug_nics(pci_dev->bus);
>  }
> -if (val & UNPLUG_AUX_IDE_DISKS) {
> -DPRINTF("unplug auxiliary disks not supported\n");
> -}
>  break;
>  }
>  case 2:
> @@ -335,14 +338,14 @@ static void xen_platform_ioport_writeb(void *opaque, 
> hwaddr addr,
>   * If VMDP was to control both disk and LAN it would use 4.
>   * If it controlled just disk or just LAN, it would use 8 below.
>   */
> -pci_unplug_disks(pci_dev->bus);
> +pci_unplug_disks(pci_dev->bus, UNPLUG_ALL_DISKS);
>  pci_unplug_nics(pci_dev->bus);
>  }
>  break;
>  case 8:
>  switch (val) {
>  case 1:
> -pci_unplug_disks(pci_dev->bus);
> +pci_unplug_disks(pci_dev->bus, UNPLUG_ALL_DISKS);
>  break;
>  case 2:
>  pci_unplug_nics(pci_dev->bus);
> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index d5777fd..7e2d767 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -165,7 +165,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
> **errp)
>  pci_piix_init_ports(d);
>  }
>  
> -int pci_piix3_xen_ide_unplug(DeviceState *dev)
> +int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux)
>  {
>  PCIIDEState *pci_ide;
>  DriveInfo *di;
> @@ -174,7 +174,7 @@ int pci_piix3_xen_ide_unplug(DeviceState *dev)
>  
>  pci_ide = PCI_IDE(dev);
>

Re: [Qemu-devel] [PATCH v2 2/3] xen-platform: add support for unplugging NVMe disks...

2017-01-24 Thread Stefano Stabellini

On Tue, 24 Jan 2017, Paul Durrant wrote:
> ...not just IDE and SCSI.
> 
> This patch allows the Xen tool-stack to fully support of NVMe as an
> emulated disk type.
> 
> Signed-off-by: Paul Durrant 

Please update docs/misc/hvm-emulated-unplug.markdown in the Xen
repository first. It might be also worth clarifying that `1` actually
means all disks, not just IDE disks. Then, please add a reference to
that commit in the description of this patch.


> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: "Michael S. Tsirkin" 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> ---
>  hw/i386/xen/xen_platform.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> index f50915f..7d41ebb 100644
> --- a/hw/i386/xen/xen_platform.c
> +++ b/hw/i386/xen/xen_platform.c
> @@ -120,6 +120,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
>  break;
>  
>  case PCI_CLASS_STORAGE_SCSI:
> +case PCI_CLASS_STORAGE_EXPRESS:
>  object_unparent(OBJECT(d));
>  break;
>  
> -- 
> 2.1.4
>

Re: [Qemu-devel] [PATCH v2 1/3] xen-platform: re-structure unplug_disks

2017-01-24 Thread Stefano Stabellini

On Tue, 24 Jan 2017, Paul Durrant wrote:
> The current code is poorly structured and potentially leads to multiple
> config space reads when one is sufficient. Also the UNPLUG_ALL_IDE_DISKS
> flag is mis-named since it also results in SCSI disks being unplugged.
> 
> This patch renames the flag and re-structures the code to be more
> efficient, and readable.
> 
> Signed-off-by: Paul Durrant 

Reviewed-by: Stefano Stabellini 

> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Cc: "Michael S. Tsirkin" 
> 
> v2:
> - Fix style issue
> ---
>  hw/i386/xen/xen_platform.c | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> index 2e1e543..f50915f 100644
> --- a/hw/i386/xen/xen_platform.c
> +++ b/hw/i386/xen/xen_platform.c
> @@ -88,7 +88,7 @@ static void log_writeb(PCIXenPlatformState *s, char val)
>  }
>  
>  /* Xen Platform, Fixed IOPort */
> -#define UNPLUG_ALL_IDE_DISKS 1
> +#define UNPLUG_ALL_DISKS 1
>  #define UNPLUG_ALL_NICS 2
>  #define UNPLUG_AUX_IDE_DISKS 4
>  
> @@ -110,14 +110,21 @@ static void pci_unplug_nics(PCIBus *bus)
>  static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
>  {
>  /* We have to ignore passthrough devices */
> -if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
> -PCI_CLASS_STORAGE_IDE
> -&& strcmp(d->name, "xen-pci-passthrough") != 0) {
> +if (!strcmp(d->name, "xen-pci-passthrough")) {
> +return;
> +}
> +
> +switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
> +case PCI_CLASS_STORAGE_IDE:
>  pci_piix3_xen_ide_unplug(DEVICE(d));
> -} else if (pci_get_word(d->config + PCI_CLASS_DEVICE) ==
> -PCI_CLASS_STORAGE_SCSI
> -&& strcmp(d->name, "xen-pci-passthrough") != 0) {
> +break;
> +
> +case PCI_CLASS_STORAGE_SCSI:
>  object_unparent(OBJECT(d));
> +break;
> +
> +default:
> +break;
>  }
>  }
>  
> @@ -134,9 +141,9 @@ static void platform_fixed_ioport_writew(void *opaque, 
> uint32_t addr, uint32_t v
>  case 0: {
>  PCIDevice *pci_dev = PCI_DEVICE(s);
>  /* Unplug devices.  Value is a bitmask of which devices to
> -   unplug, with bit 0 the IDE devices, bit 1 the network
> +   unplug, with bit 0 the disk devices, bit 1 the network
> devices, and bit 2 the non-primary-master IDE devices. */
> -if (val & UNPLUG_ALL_IDE_DISKS) {
> +if (val & UNPLUG_ALL_DISKS) {
>  DPRINTF("unplug disks\n");
>  pci_unplug_disks(pci_dev->bus);
>  }
> -- 
> 2.1.4
>

[Qemu-devel] [PATCH v2] 9pfs: fix offset error in v9fs_xattr_read()

2017-01-24 Thread Greg Kurz

The current code tries to copy `read_count' bytes starting at offset
`offset' from a `read_count`-sized iovec. This causes v9fs_pack() to
fail with ENOBUFS.

Since the PDU iovec is already partially filled with `offset' bytes,
let's skip them when creating `qiov_full' and have v9fs_pack() to
copy the whole of it. Moreover, this is consistent with the other
places where v9fs_init_qiov_from_pdu() is called.

This fixes commit "bcb8998fac16 9pfs: call v9fs_init_qiov_from_pdu
before v9fs_pack".

Signed-off-by: Greg Kurz 
---
v2: - pass size + skip to the init_in_iov_from_pdu callback
---
 hw/9pfs/9p.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index fa58877570f6..f22d39064f6b 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1655,7 +1655,7 @@ static void v9fs_init_qiov_from_pdu(QEMUIOVector *qiov, 
V9fsPDU *pdu,
 if (is_write) {
 pdu->s->transport->init_out_iov_from_pdu(pdu, , );
 } else {
-pdu->s->transport->init_in_iov_from_pdu(pdu, , , size);
+pdu->s->transport->init_in_iov_from_pdu(pdu, , , size + skip);
 }
 
 qemu_iovec_init_external(, iov, niov);
@@ -1685,8 +1685,8 @@ static int v9fs_xattr_read(V9fsState *s, V9fsPDU *pdu, 
V9fsFidState *fidp,
 }
 offset += err;
 
-v9fs_init_qiov_from_pdu(_full, pdu, 0, read_count, false);
-err = v9fs_pack(qiov_full.iov, qiov_full.niov, offset,
+v9fs_init_qiov_from_pdu(_full, pdu, offset, read_count, false);
+err = v9fs_pack(qiov_full.iov, qiov_full.niov, 0,
 ((char *)fidp->fs.xattr.value) + off,
 read_count);
 qemu_iovec_destroy(_full);

[Qemu-devel] [Bug 1658634] Re: Can't get correct display with latest QEMU and OVMF BIOS

2017-01-24 Thread Kai Cong

It works well on my side with the patch. Thanks!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1658634

Title:
  Can't get correct display with latest QEMU and OVMF BIOS

Status in QEMU:
  Confirmed

Bug description:
  I tried to install a Ubuntu 16.04.1 Desktop 64bits with latest QEMU and OVMF 
UEFI BIOS, however I can't get correct display output with default vga 
configuration (-vga std). However, qemu works with a couple of different 
configurations:
  1. "-vga cirrus" + "-bios OVMF.fd": works
  2. "-vga std" + non-UEFI bios: works

  The same error with QEMU 2.8.0 release. Everything works well on
  2.7.0/1.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1658634/+subscriptions

[Qemu-devel] [PULL 15/16] vhost: skip ROM sections

2017-01-24 Thread Michael S. Tsirkin

vhost does not support RO protections on memory at the moment - adding
ROMs would mean that e.g. a buggy guest might change them in-memory - a
condition from which guest reset does not recover. Not nice.

We also definitely don't want to try logging writes into ROMs -
in particular guests set very high addresses for ROM BARs
so logging these writes would waste a lot of memory.

Maybe ROMs could be supported with the iotlb variant -
not sure, but there seems to be no good reason for virtio
to try to do DMA from ROM. So let's just skip ROM memory.

Suggested-by: Laurent Vivier 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Laurent Vivier 
Tested-by: Laurent Vivier 
---
 hw/virtio/vhost.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 9cacf55..94b8c4a 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -612,7 +612,8 @@ static void vhost_set_memory(MemoryListener *listener,
 
 static bool vhost_section(MemoryRegionSection *section)
 {
-return memory_region_is_ram(section->mr);
+return memory_region_is_ram(section->mr) &&
+!memory_region_is_rom(section->mr);
 }
 
 static void vhost_begin(MemoryListener *listener)
-- 
MST

[Qemu-devel] [PULL 14/16] virtio: make virtio_should_notify static

2017-01-24 Thread Michael S. Tsirkin

From: Paolo Bonzini 

Signed-off-by: Paolo Bonzini 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio.h | 1 -
 hw/virtio/virtio.c | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 6523bac..525da24 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -182,7 +182,6 @@ void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int 
*in_bytes,
unsigned int *out_bytes,
unsigned max_in_bytes, unsigned max_out_bytes);
 
-bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq);
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq);
 void virtio_notify(VirtIODevice *vdev, VirtQueue *vq);
 
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index cc17b97..b1c5563 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1383,7 +1383,7 @@ static void virtio_set_isr(VirtIODevice *vdev, int value)
 }
 }
 
-bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
+static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
 uint16_t old, new;
 bool v;
-- 
MST

[Qemu-devel] [PULL 13/16] pci: Convert msix_init() to Error and fix callers

2017-01-24 Thread Michael S. Tsirkin

From: Cao jin 

msix_init() reports errors with error_report(), which is wrong when
it's used in realize().  The same issue was fixed for msi_init() in
commit 1108b2f. In order to make the API change as small as possible,
leave the return value check to later patch.

For some devices(like e1000e, vmxnet3, nvme) who won't fail because of
msix_init's failure, suppress the error report by passing NULL error
object.

Bonus: add comment for msix_init.

CC: Jiri Pirko 
CC: Gerd Hoffmann 
CC: Dmitry Fleytman 
CC: Jason Wang 
CC: Michael S. Tsirkin 
CC: Hannes Reinecke 
CC: Paolo Bonzini 
CC: Alex Williamson 
CC: Markus Armbruster 
CC: Marcel Apfelbaum 
Signed-off-by: Cao jin 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/pci/msix.h  |  5 +++--
 hw/block/nvme.c|  2 +-
 hw/misc/ivshmem.c  |  8 
 hw/net/e1000e.c|  2 +-
 hw/net/rocker/rocker.c |  4 +++-
 hw/net/vmxnet3.c   |  2 +-
 hw/pci-bridge/gen_pcie_root_port.c |  3 +--
 hw/pci/msix.c  | 36 +++-
 hw/scsi/megasas.c  |  4 +++-
 hw/usb/hcd-xhci.c  |  4 ++--
 hw/vfio/pci.c  |  8 ++--
 hw/virtio/virtio-pci.c |  4 ++--
 12 files changed, 58 insertions(+), 24 deletions(-)

diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 048a29d..1f27658 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -9,9 +9,10 @@ MSIMessage msix_get_message(PCIDevice *dev, unsigned int 
vector);
 int msix_init(PCIDevice *dev, unsigned short nentries,
   MemoryRegion *table_bar, uint8_t table_bar_nr,
   unsigned table_offset, MemoryRegion *pba_bar,
-  uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos);
+  uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
+  Error **errp);
 int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries,
-uint8_t bar_nr);
+uint8_t bar_nr, Error **errp);
 
 void msix_write_config(PCIDevice *dev, uint32_t address, uint32_t val, int 
len);
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d479fd2..ae91a18 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -872,7 +872,7 @@ static int nvme_init(PCIDevice *pci_dev)
 pci_register_bar(>parent_obj, 0,
 PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64,
 >iomem);
-msix_init_exclusive_bar(>parent_obj, n->num_queues, 4);
+msix_init_exclusive_bar(>parent_obj, n->num_queues, 4, NULL);
 
 id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));
 id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index abeaf3d..70e71a5 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -749,13 +749,13 @@ static void ivshmem_reset(DeviceState *d)
 }
 }
 
-static int ivshmem_setup_interrupts(IVShmemState *s)
+static int ivshmem_setup_interrupts(IVShmemState *s, Error **errp)
 {
 /* allocate QEMU callback data for receiving interrupts */
 s->msi_vectors = g_malloc0(s->vectors * sizeof(MSIVector));
 
 if (ivshmem_has_feature(s, IVSHMEM_MSI)) {
-if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1)) {
+if (msix_init_exclusive_bar(PCI_DEVICE(s), s->vectors, 1, errp)) {
 return -1;
 }
 
@@ -897,8 +897,8 @@ static void ivshmem_common_realize(PCIDevice *dev, Error 
**errp)
 qemu_chr_fe_set_handlers(>server_chr, ivshmem_can_receive,
  ivshmem_read, NULL, s, NULL, true);
 
-if (ivshmem_setup_interrupts(s) < 0) {
-error_setg(errp, "failed to initialize interrupts");
+if (ivshmem_setup_interrupts(s, errp) < 0) {
+error_prepend(errp, "Failed to initialize interrupts: ");
 return;
 }
 }
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 77a4b3e..b8bc0fe 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -292,7 +292,7 @@ e1000e_init_msix(E1000EState *s)
 E1000E_MSIX_IDX, E1000E_MSIX_TABLE,
 >msix,
 E1000E_MSIX_IDX, E1000E_MSIX_PBA,
-0xA0);
+0xA0, NULL);
 
 if (res < 0) {
 trace_e1000e_msix_init_fail(res);
diff --git a/hw/net/rocker/rocker.c b/hw/net/rocker/rocker.c
index e9d215a..6e70fdd 100644
--- a/hw/net/rocker/rocker.c
+++ b/hw/net/rocker/rocker.c
@@ -1256,14 +1256,16 @@ static int rocker_msix_init(Rocker *r)
 {
 PCIDevice *dev

[Qemu-devel] [PULL 08/16] hw/ioh3420: derive from PCI Express Root Port base class

2017-01-24 Thread Michael S. Tsirkin

From: Marcel Apfelbaum 

Preserve only Intel specific details.

Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci-bridge/ioh3420.c | 121 ++--
 1 file changed, 15 insertions(+), 106 deletions(-)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index 84b7946..571ffe5 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -61,119 +61,28 @@ static uint8_t ioh3420_aer_vector(const PCIDevice *d)
 return 0;
 }
 
-static void ioh3420_aer_vector_update(PCIDevice *d)
+static int ioh3420_interrupts_init(PCIDevice *d, Error **errp)
 {
-pcie_aer_root_set_vector(d, ioh3420_aer_vector(d));
-}
-
-static void ioh3420_write_config(PCIDevice *d,
-   uint32_t address, uint32_t val, int len)
-{
-uint32_t root_cmd =
-pci_get_long(d->config + d->exp.aer_cap + PCI_ERR_ROOT_COMMAND);
-
-pci_bridge_write_config(d, address, val, len);
-ioh3420_aer_vector_update(d);
-pcie_cap_slot_write_config(d, address, val, len);
-pcie_aer_write_config(d, address, val, len);
-pcie_aer_root_write_config(d, address, val, len, root_cmd);
-}
-
-static void ioh3420_reset(DeviceState *qdev)
-{
-PCIDevice *d = PCI_DEVICE(qdev);
-
-ioh3420_aer_vector_update(d);
-pcie_cap_root_reset(d);
-pcie_cap_deverr_reset(d);
-pcie_cap_slot_reset(d);
-pcie_cap_arifwd_reset(d);
-pcie_aer_root_reset(d);
-pci_bridge_reset(qdev);
-pci_bridge_disable_base_limit(d);
-}
-
-static int ioh3420_initfn(PCIDevice *d)
-{
-PCIEPort *p = PCIE_PORT(d);
-PCIESlot *s = PCIE_SLOT(d);
 int rc;
-Error *err = NULL;
-
-pci_config_set_interrupt_pin(d->config, 1);
-pci_bridge_initfn(d, TYPE_PCIE_BUS);
-pcie_port_init_reg(d);
-
-rc = pci_bridge_ssvid_init(d, IOH_EP_SSVID_OFFSET,
-   IOH_EP_SSVID_SVID, IOH_EP_SSVID_SSID);
-if (rc < 0) {
-goto err_bridge;
-}
+Error *local_err = NULL;
 
 rc = msi_init(d, IOH_EP_MSI_OFFSET, IOH_EP_MSI_NR_VECTOR,
   IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
-  IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT, );
+  IOH_EP_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
+  _err);
 if (rc < 0) {
 assert(rc == -ENOTSUP);
-error_report_err(err);
-goto err_bridge;
+error_propagate(errp, local_err);
 }
 
-rc = pcie_cap_init(d, IOH_EP_EXP_OFFSET, PCI_EXP_TYPE_ROOT_PORT, p->port);
-if (rc < 0) {
-goto err_msi;
-}
-
-pcie_cap_arifwd_init(d);
-pcie_cap_deverr_init(d);
-pcie_cap_slot_init(d, s->slot);
-pcie_cap_root_init(d);
-
-pcie_chassis_create(s->chassis);
-rc = pcie_chassis_add_slot(s);
-if (rc < 0) {
-goto err_pcie_cap;
-}
-
-rc = pcie_aer_init(d, PCI_ERR_VER, IOH_EP_AER_OFFSET,
-   PCI_ERR_SIZEOF, );
-if (rc < 0) {
-error_report_err(err);
-goto err;
-}
-pcie_aer_root_init(d);
-ioh3420_aer_vector_update(d);
-
-return 0;
-
-err:
-pcie_chassis_del_slot(s);
-err_pcie_cap:
-pcie_cap_exit(d);
-err_msi:
-msi_uninit(d);
-err_bridge:
-pci_bridge_exitfn(d);
 return rc;
 }
 
-static void ioh3420_exitfn(PCIDevice *d)
+static void ioh3420_interrupts_uninit(PCIDevice *d)
 {
-PCIESlot *s = PCIE_SLOT(d);
-
-pcie_aer_exit(d);
-pcie_chassis_del_slot(s);
-pcie_cap_exit(d);
 msi_uninit(d);
-pci_bridge_exitfn(d);
 }
 
-static Property ioh3420_props[] = {
-DEFINE_PROP_BIT(COMPAT_PROP_PCP, PCIDevice, cap_present,
-QEMU_PCIE_SLTCAP_PCP_BITNR, true),
-DEFINE_PROP_END_OF_LIST()
-};
-
 static const VMStateDescription vmstate_ioh3420 = {
 .name = "ioh-3240-express-root-port",
 .version_id = 1,
@@ -191,25 +100,25 @@ static void ioh3420_class_init(ObjectClass *klass, void 
*data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_CLASS(klass);
 
-k->is_express = 1;
-k->is_bridge = 1;
-k->config_write = ioh3420_write_config;
-k->init = ioh3420_initfn;
-k->exit = ioh3420_exitfn;
 k->vendor_id = PCI_VENDOR_ID_INTEL;
 k->device_id = PCI_DEVICE_ID_IOH_EPORT;
 k->revision = PCI_DEVICE_ID_IOH_REV;
-set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
 dc->desc = "Intel IOH device id 3420 PCIE Root Port";
-dc->reset = ioh3420_reset;
 dc->vmsd = _ioh3420;
-dc->props = ioh3420_props;
+rpc->aer_vector = ioh3420_aer_vector;
+rpc->interrupts_init = ioh3420_interrupts_init;
+rpc->interrupts_uninit = ioh3420_interrupts_uninit;
+rpc->exp_offset = IOH_EP_EXP_OFFSET;
+rpc->aer_offset = IOH_EP_AER_OFFSET;
+rpc->ssvid_offset = IOH_EP_SSVID_OFFSET;
+

[Qemu-devel] [PULL 10/16] hw/i386: check if nvdimm is enabled before plugging

2017-01-24 Thread Michael S. Tsirkin

From: Haozhong Zhang 

The missing of 'nvdimm' in the machine type option '-M' means NVDIMM
is disabled. QEMU should refuse to plug any NVDIMM device in this case
and report the misconfiguration.

The behavior of NVDIMM on unsupported platform (HW/FW) is vendor
specific. For some vendors, it's undefined and the platform may do
anything. Thus, I think QEMU is free to choose the implementation.
Aborting QEMU (i.e. refusing to boot) is the easiest one.

Reported-by: Stefan Hajnoczi 
Signed-off-by: Haozhong Zhang 
Message-Id: 20170112110928.GF4621@stefanha-x1.localdomain
Message-Id: 20170111093630.2088-1-stefa...@redhat.com
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Xiao Guangrong 
Reviewed-by: Eduardo Habkost 
Reviewed-by: Stefan Hajnoczi 
---
 hw/i386/pc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c949cf0..47984c8 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1708,6 +1708,11 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
 }
 
 if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
+if (!pcms->acpi_nvdimm_state.is_enabled) {
+error_setg(_err,
+   "nvdimm is not enabled: missing 'nvdimm' in '-M'");
+goto out;
+}
 nvdimm_plug(>acpi_nvdimm_state);
 }
 
-- 
MST

[Qemu-devel] [PULL 16/16] vhost-user: delete chardev on cleanup

2017-01-24 Thread Michael S. Tsirkin

From: Marc-André Lureau 

Remove the chardev implicitly when cleaning up the netdev. This
prevents from reusing the chardev since it would be in an incorrect
state with the slave.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1256618

Signed-off-by: Marc-André Lureau 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Eric Blake 
---
 net/vhost-user.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index 7aff77e..179939f 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -151,7 +151,10 @@ static void vhost_user_cleanup(NetClientState *nc)
 s->vhost_net = NULL;
 }
 if (nc->queue_index == 0) {
+CharDriverState *chr = qemu_chr_fe_get_driver(>chr);
+
 qemu_chr_fe_deinit(>chr);
+qemu_chr_delete(chr);
 }
 
 qemu_purge_queued_packets(nc);
-- 
MST

[Qemu-devel] [PULL 05/16] pci: mark ROMs read-only

2017-01-24 Thread Michael S. Tsirkin

Looks like we didn't mark PCI ROMs as RO allowing
mischief such as guests writing there.
Further, e.g. vhost gets confused trying to allocate
enough space to log writes there. Fix it up.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Marcel Apfelbaum 
Tested-by: Laurent Vivier 
---
 hw/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index fe9acec..8843ebf 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2192,7 +2192,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
is_default_rom,
 snprintf(name, sizeof(name), "%s.rom", 
object_get_typename(OBJECT(pdev)));
 }
 pdev->has_rom = true;
-memory_region_init_ram(>rom, OBJECT(pdev), name, size, _fatal);
+memory_region_init_rom(>rom, OBJECT(pdev), name, size, _fatal);
 vmstate_register_ram(>rom, >qdev);
 ptr = memory_region_get_ram_ptr(>rom);
 load_image(path, ptr);
-- 
MST

[Qemu-devel] [PULL 09/16] hw/pcie: Introduce Generic PCI Express Root Port

2017-01-24 Thread Michael S. Tsirkin

From: Marcel Apfelbaum 

The Generic Root Port behaves almost the same as the
Intel's IOH device with id 3420, without having
Intel specific attributes.

The device has two purposes:
 (1) Can be used on both X86 and ARM machines.
 (2) It will allow us to tweak the behaviour
(e.g add vendor-specific PCI capabilities)
 - something that obviously cannot be done
   on a known device.

Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Tested-by: Andrea Bolognani 
---
 include/hw/pci/pci.h   |  1 +
 hw/pci-bridge/gen_pcie_root_port.c | 88 ++
 hw/pci-bridge/Makefile.objs|  2 +-
 3 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 hw/pci-bridge/gen_pcie_root_port.c

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 772692f..cbc1fdf 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -96,6 +96,7 @@
 #define PCI_DEVICE_ID_REDHAT_PXB 0x0009
 #define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
 #define PCI_DEVICE_ID_REDHAT_PXB_PCIE0x000b
+#define PCI_DEVICE_ID_REDHAT_PCIE_RP 0x000c
 #define PCI_DEVICE_ID_REDHAT_QXL 0x0100
 
 #define FMT_PCIBUS  PRIx64
diff --git a/hw/pci-bridge/gen_pcie_root_port.c 
b/hw/pci-bridge/gen_pcie_root_port.c
new file mode 100644
index 000..42bbe34
--- /dev/null
+++ b/hw/pci-bridge/gen_pcie_root_port.c
@@ -0,0 +1,88 @@
+/*
+ * Generic PCI Express Root Port emulation
+ *
+ * Copyright (C) 2017 Red Hat Inc
+ *
+ * Authors:
+ *   Marcel Apfelbaum 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/pci/msix.h"
+#include "hw/pci/pcie_port.h"
+
+#define TYPE_GEN_PCIE_ROOT_PORT"pcie-root-port"
+
+#define GEN_PCIE_ROOT_PORT_AER_OFFSET   0x100
+#define GEN_PCIE_ROOT_PORT_MSIX_NR_VECTOR   1
+
+static uint8_t gen_rp_aer_vector(const PCIDevice *d)
+{
+return 0;
+}
+
+static int gen_rp_interrupts_init(PCIDevice *d, Error **errp)
+{
+int rc;
+
+rc = msix_init_exclusive_bar(d, GEN_PCIE_ROOT_PORT_MSIX_NR_VECTOR, 0);
+
+if (rc < 0) {
+assert(rc == -ENOTSUP);
+error_setg(errp, "Unable to init msix vectors");
+} else {
+msix_vector_use(d, 0);
+}
+
+return rc;
+}
+
+static void gen_rp_interrupts_uninit(PCIDevice *d)
+{
+msix_uninit_exclusive_bar(d);
+}
+
+static const VMStateDescription vmstate_rp_dev = {
+.name = "pcie-root-port",
+.version_id = 1,
+.minimum_version_id = 1,
+.post_load = pcie_cap_slot_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_PCIE_DEVICE(parent_obj.parent_obj.parent_obj, PCIESlot),
+VMSTATE_STRUCT(parent_obj.parent_obj.parent_obj.exp.aer_log,
+   PCIESlot, 0, vmstate_pcie_aer_log, PCIEAERLog),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static void gen_rp_dev_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_CLASS(klass);
+
+k->vendor_id = PCI_VENDOR_ID_REDHAT;
+k->device_id = PCI_DEVICE_ID_REDHAT_PCIE_RP;
+dc->desc = "PCI Express Root Port";
+dc->vmsd = _rp_dev;
+rpc->aer_vector = gen_rp_aer_vector;
+rpc->interrupts_init = gen_rp_interrupts_init;
+rpc->interrupts_uninit = gen_rp_interrupts_uninit;
+rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
+}
+
+static const TypeInfo gen_rp_dev_info = {
+.name  = TYPE_GEN_PCIE_ROOT_PORT,
+.parent= TYPE_PCIE_ROOT_PORT,
+.class_init= gen_rp_dev_class_init,
+};
+
+ static void gen_rp_register_types(void)
+ {
+type_register_static(_rp_dev_info);
+ }
+ type_init(gen_rp_register_types)
diff --git a/hw/pci-bridge/Makefile.objs b/hw/pci-bridge/Makefile.objs
index 4f2039f..85e8e39 100644
--- a/hw/pci-bridge/Makefile.objs
+++ b/hw/pci-bridge/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += pci_bridge_dev.o
 common-obj-y += pci_expander_bridge.o
-common-obj-$(CONFIG_PCIE_PORT) += pcie_root_port.o
+common-obj-$(CONFIG_PCIE_PORT) += pcie_root_port.o gen_pcie_root_port.o
 common-obj-$(CONFIG_XIO3130) += xio3130_upstream.o xio3130_downstream.o
 common-obj-$(CONFIG_IOH3420) += ioh3420.o
 common-obj-$(CONFIG_I82801B11) += i82801b11.o
-- 
MST

[Qemu-devel] [PULL 11/16] msix: Follow CODING_STYLE

2017-01-24 Thread Michael S. Tsirkin

From: Cao jin 

CC: Markus Armbruster 
CC: Marcel Apfelbaum 
CC: Michael S. Tsirkin 

Reviewed-by: Markus Armbruster 
Acked-by: Marcel Apfelbaum 
Signed-off-by: Cao jin 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/pci/msix.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 0ec1cb1..0cee631 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -447,8 +447,10 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 {
 MSIMessage msg;
 
-if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector]) {
 return;
+}
+
 if (msix_is_masked(dev, vector)) {
 msix_set_pending(dev, vector);
 return;
@@ -483,8 +485,10 @@ void msix_reset(PCIDevice *dev)
 /* Mark vector as used. */
 int msix_vector_use(PCIDevice *dev, unsigned vector)
 {
-if (vector >= dev->msix_entries_nr)
+if (vector >= dev->msix_entries_nr) {
 return -EINVAL;
+}
+
 dev->msix_entry_used[vector]++;
 return 0;
 }
-- 
MST

[Qemu-devel] [PULL 06/16] intel_iommu: fix and simplify size calculation in process_device_iotlb_desc()

2017-01-24 Thread Michael S. Tsirkin

From: Jason Wang 

We don't use 1ULL which is wrong during size calculation. Fix it, and
while at it, switch to use cto64() and adds a comments to make it
simpler and easier to be understood.

Reported-by: Paolo Bonzini 
Cc: Paolo Bonzini 
Signed-off-by: Jason Wang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Paolo Bonzini 
---
 hw/i386/intel_iommu.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ec62239..3270fb9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1485,8 +1485,16 @@ static bool 
vtd_process_device_iotlb_desc(IntelIOMMUState *s,
 goto done;
 }
 
+/* According to ATS spec table 2.4:
+ * S = 0, bits 15:12 =  range size: 4K
+ * S = 1, bits 15:12 = xxx0 range size: 8K
+ * S = 1, bits 15:12 = xx01 range size: 16K
+ * S = 1, bits 15:12 = x011 range size: 32K
+ * S = 1, bits 15:12 = 0111 range size: 64K
+ * ...
+ */
 if (size) {
-sz = 1 << (ctz64(~(addr | (VTD_PAGE_MASK_4K - 1))) + 1);
+sz = (VTD_PAGE_SIZE * 2) << cto64(addr >> VTD_PAGE_SHIFT);
 addr &= ~(sz - 1);
 } else {
 sz = VTD_PAGE_SIZE;
-- 
MST

[Qemu-devel] [PULL 12/16] hcd-xhci: check & correct param before using it

2017-01-24 Thread Michael S. Tsirkin

From: Cao jin 

usb_xhci_realize() corrects invalid values of property "intrs"
automatically, but the uncorrected value is passed to msi_init(),
which chokes on invalid values.  Delay that until after the
correction.

Resources allocated by usb_xhci_init() are leaked when msi_init()
fails.  Fix by calling it after msi_init().

CC: Gerd Hoffmann 
CC: Markus Armbruster 
CC: Marcel Apfelbaum 
CC: Michael S. Tsirkin 

Reviewed-by: Markus Armbruster 
Acked-by: Marcel Apfelbaum 
Signed-off-by: Cao jin 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/usb/hcd-xhci.c | 37 ++---
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 4acf0c6..0ace273 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -3627,25 +3627,6 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 dev->config[PCI_CACHE_LINE_SIZE] = 0x10;
 dev->config[0x60] = 0x30; /* release number */
 
-usb_xhci_init(xhci);
-
-if (xhci->msi != ON_OFF_AUTO_OFF) {
-ret = msi_init(dev, 0x70, xhci->numintrs, true, false, );
-/* Any error other than -ENOTSUP(board's MSI support is broken)
- * is a programming error */
-assert(!ret || ret == -ENOTSUP);
-if (ret && xhci->msi == ON_OFF_AUTO_ON) {
-/* Can't satisfy user's explicit msi=on request, fail */
-error_append_hint(, "You have to use msi=auto (default) or "
-"msi=off with this machine type.\n");
-error_propagate(errp, err);
-return;
-}
-assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
-/* With msi=auto, we fall back to MSI off silently */
-error_free(err);
-}
-
 if (xhci->numintrs > MAXINTRS) {
 xhci->numintrs = MAXINTRS;
 }
@@ -3667,6 +3648,24 @@ static void usb_xhci_realize(struct PCIDevice *dev, 
Error **errp)
 xhci->max_pstreams_mask = 0;
 }
 
+if (xhci->msi != ON_OFF_AUTO_OFF) {
+ret = msi_init(dev, 0x70, xhci->numintrs, true, false, );
+/* Any error other than -ENOTSUP(board's MSI support is broken)
+ * is a programming error */
+assert(!ret || ret == -ENOTSUP);
+if (ret && xhci->msi == ON_OFF_AUTO_ON) {
+/* Can't satisfy user's explicit msi=on request, fail */
+error_append_hint(, "You have to use msi=auto (default) or "
+"msi=off with this machine type.\n");
+error_propagate(errp, err);
+return;
+}
+assert(!err || xhci->msi == ON_OFF_AUTO_AUTO);
+/* With msi=auto, we fall back to MSI off silently */
+error_free(err);
+}
+
+usb_xhci_init(xhci);
 xhci->mfwrap_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, xhci_mfwrap_timer, 
xhci);
 
 memory_region_init(>mem, OBJECT(xhci), "xhci", LEN_REGS);
-- 
MST

[Qemu-devel] [PULL 04/16] ARRAY_SIZE: check that argument is an array

2017-01-24 Thread Michael S. Tsirkin

It's a familiar pattern: some code uses ARRAY_SIZE, then refactoring
changes the argument from an array to a pointer to a dynamically
allocated buffer.  Code keeps compiling but any ARRAY_SIZE calls now
return the size of the pointer divided by element size.

Let's add build time checks to ARRAY_SIZE before we allow more
of these in the code-base.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Markus Armbruster 
Reviewed-by: Eric Blake 
---
 include/qemu/osdep.h | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 689f253..56c9e22 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -198,8 +198,15 @@ extern int daemon(int, int);
 #define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
 #endif
 
+/*
+ * &(x)[0] is always a pointer - if it's same type as x then the argument is a
+ * pointer, not an array.
+ */
+#define QEMU_IS_ARRAY(x) (!__builtin_types_compatible_p(typeof(x), \
+typeof(&(x)[0])))
 #ifndef ARRAY_SIZE
-#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
+#define ARRAY_SIZE(x) ((sizeof(x) / sizeof((x)[0])) + \
+   QEMU_BUILD_BUG_ON_ZERO(!QEMU_IS_ARRAY(x)))
 #endif
 
 int qemu_daemon(int nochdir, int noclose);
-- 
MST

[Qemu-devel] [PULL 07/16] hw/pcie: Introduce a base class for PCI Express Root Ports

2017-01-24 Thread Michael S. Tsirkin

From: Marcel Apfelbaum 

The 'base' PCI Express Root Port includes
the common code to be re-used for all
Root Ports implementations. Most of the code
was taken from the current implementation
of Intel's IOH 3420 Root Port.

Signed-off-by: Marcel Apfelbaum 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 default-configs/arm-softmmu.mak|   1 +
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 include/hw/pci/pcie_port.h |  19 +
 hw/pci-bridge/pcie_root_port.c | 171 +
 hw/pci-bridge/Makefile.objs|   1 +
 6 files changed, 194 insertions(+)
 create mode 100644 hw/pci-bridge/pcie_root_port.c

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 6de3e16..6f2a180 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -108,6 +108,7 @@ CONFIG_FSL_IMX25=y
 
 CONFIG_IMX_I2C=y
 
+CONFIG_PCIE_PORT=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 0b51360..9288838 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -51,6 +51,7 @@ CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
+CONFIG_PCIE_PORT=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index 7f89503..7d2c2d4 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -51,6 +51,7 @@ CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
 CONFIG_ACPI_NVDIMM=y
+CONFIG_PCIE_PORT=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/include/hw/pci/pcie_port.h b/include/hw/pci/pcie_port.h
index f7b64db..1333266 100644
--- a/include/hw/pci/pcie_port.h
+++ b/include/hw/pci/pcie_port.h
@@ -57,4 +57,23 @@ PCIESlot *pcie_chassis_find_slot(uint8_t chassis, uint16_t 
slot);
 int pcie_chassis_add_slot(struct PCIESlot *slot);
 void pcie_chassis_del_slot(PCIESlot *s);
 
+#define TYPE_PCIE_ROOT_PORT "pcie-root-port-base"
+#define PCIE_ROOT_PORT_CLASS(klass) \
+ OBJECT_CLASS_CHECK(PCIERootPortClass, (klass), TYPE_PCIE_ROOT_PORT)
+#define PCIE_ROOT_PORT_GET_CLASS(obj) \
+ OBJECT_GET_CLASS(PCIERootPortClass, (obj), TYPE_PCIE_ROOT_PORT)
+
+typedef struct PCIERootPortClass {
+PCIDeviceClass parent_class;
+
+uint8_t (*aer_vector)(const PCIDevice *dev);
+int (*interrupts_init)(PCIDevice *dev, Error **errp);
+void (*interrupts_uninit)(PCIDevice *dev);
+
+int exp_offset;
+int aer_offset;
+int ssvid_offset;
+int ssid;
+} PCIERootPortClass;
+
 #endif /* QEMU_PCIE_PORT_H */
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
new file mode 100644
index 000..cf36318
--- /dev/null
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -0,0 +1,171 @@
+/*
+ * Base class for PCI Express Root Ports
+ *
+ * Copyright (C) 2017 Red Hat Inc
+ *
+ * Authors:
+ *   Marcel Apfelbaum 
+ *
+ * Most of the code was migrated from hw/pci-bridge/ioh3420.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/pci/pcie_port.h"
+
+static void rp_aer_vector_update(PCIDevice *d)
+{
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
+
+if (rpc->aer_vector) {
+pcie_aer_root_set_vector(d, rpc->aer_vector(d));
+}
+}
+
+static void rp_write_config(PCIDevice *d, uint32_t address,
+uint32_t val, int len)
+{
+uint32_t root_cmd =
+pci_get_long(d->config + d->exp.aer_cap + PCI_ERR_ROOT_COMMAND);
+
+pci_bridge_write_config(d, address, val, len);
+rp_aer_vector_update(d);
+pcie_cap_slot_write_config(d, address, val, len);
+pcie_aer_write_config(d, address, val, len);
+pcie_aer_root_write_config(d, address, val, len, root_cmd);
+}
+
+static void rp_reset(DeviceState *qdev)
+{
+PCIDevice *d = PCI_DEVICE(qdev);
+
+rp_aer_vector_update(d);
+pcie_cap_root_reset(d);
+pcie_cap_deverr_reset(d);
+pcie_cap_slot_reset(d);
+pcie_cap_arifwd_reset(d);
+pcie_aer_root_reset(d);
+pci_bridge_reset(qdev);
+pci_bridge_disable_base_limit(d);
+}
+
+static void rp_realize(PCIDevice *d, Error **errp)
+{
+PCIEPort *p = PCIE_PORT(d);
+PCIESlot *s = PCIE_SLOT(d);
+PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(d);
+PCIERootPortClass *rpc = PCIE_ROOT_PORT_GET_CLASS(d);
+int rc;
+Error *local_err = NULL;
+
+pci_config_set_interrupt_pin(d->config, 1);
+pci_bridge_initfn(d, TYPE_PCIE_BUS);
+pcie_port_init_reg(d);
+
+rc = pci_bridge_ssvid_init(d, rpc->ssvid_offset, dc->vendor_id, rpc->ssid);
+if (rc < 0) {
+

[Qemu-devel] [PULL 01/16] compiler: drop ; after BUILD_BUG_ON

2017-01-24 Thread Michael S. Tsirkin

All users include the trailing ; anyway, let's require that -
it seems cleaner.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
---
 include/qemu/compiler.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 157698b..7512082 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -86,7 +86,8 @@
 #define type_check(t1,t2) ((t1*)0 - (t2*)0)
 
 #define QEMU_BUILD_BUG_ON(x) \
-typedef char glue(qemu_build_bug_on__,__LINE__)[(x)?-1:1] 
__attribute__((unused));
+typedef char glue(qemu_build_bug_on__, __LINE__)[(x) ? -1 : 1] \
+__attribute__((unused))
 
 #if defined __GNUC__
 # if !QEMU_GNUC_PREREQ(4, 4)
-- 
MST

[Qemu-devel] [PULL 02/16] compiler: rework BUG_ON using a struct

2017-01-24 Thread Michael S. Tsirkin

There are theoretical concerns that some compilers might not trigger
build failures on attempts to define an array of size (x ? -1 : 1) where
x is a variable and make it a variable sized array instead. Let rewrite
using a struct with a negative bit field size instead as there are no
dynamic bit field sizes.  This is similar to what Linux does.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
---
 include/qemu/compiler.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 7512082..1da500e 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -85,9 +85,12 @@
 #define typeof_field(type, field) typeof(((type *)0)->field)
 #define type_check(t1,t2) ((t1*)0 - (t2*)0)
 
-#define QEMU_BUILD_BUG_ON(x) \
-typedef char glue(qemu_build_bug_on__, __LINE__)[(x) ? -1 : 1] \
-__attribute__((unused))
+#define QEMU_BUILD_BUG_ON_STRUCT(x) \
+struct { \
+int:(x) ? -1 : 1; \
+}
+#define QEMU_BUILD_BUG_ON(x) typedef QEMU_BUILD_BUG_ON_STRUCT(x) \
+glue(qemu_build_bug_on__, __LINE__) __attribute__((unused))
 
 #if defined __GNUC__
 # if !QEMU_GNUC_PREREQ(4, 4)
-- 
MST

[Qemu-devel] [PULL 03/16] compiler: expression version of QEMU_BUILD_BUG_ON

2017-01-24 Thread Michael S. Tsirkin

QEMU_BUILD_BUG_ON uses a typedef in order to be safe
to use outside functions, but sometimes it's useful
to have a version that can be used within an expression.
Following what Linux does, introduce QEMU_BUILD_BUG_ON_ZERO
that return zero after checking condition at build time.

Signed-off-by: Michael S. Tsirkin 
Reviewed-by: Eric Blake 
Reviewed-by: Markus Armbruster 
---
 include/qemu/compiler.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 1da500e..e0fb18b 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -92,6 +92,9 @@
 #define QEMU_BUILD_BUG_ON(x) typedef QEMU_BUILD_BUG_ON_STRUCT(x) \
 glue(qemu_build_bug_on__, __LINE__) __attribute__((unused))
 
+#define QEMU_BUILD_BUG_ON_ZERO(x) (sizeof(QEMU_BUILD_BUG_ON_STRUCT(x)) - \
+   sizeof(QEMU_BUILD_BUG_ON_STRUCT(x)))
+
 #if defined __GNUC__
 # if !QEMU_GNUC_PREREQ(4, 4)
/* gcc versions before 4.4.x don't support gnu_printf, so use printf. */
-- 
MST

[Qemu-devel] [PULL 00/16] virtio, vhost, pci: fixes, features

2017-01-24 Thread Michael S. Tsirkin

The following changes since commit a9e404600a9bd1e6a26431fc89e5069092e67f14:

  Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20170124' into 
staging (2017-01-24 17:26:26 +)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to b562d945fc840131d290d4bf5af8d6cc3bb98a2b:

  vhost-user: delete chardev on cleanup (2017-01-24 21:49:25 +0200)


virtio, vhost, pci: fixes, features

generic pci root port support
fixes and cleanups all over the place

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>


Cao jin (3):
  msix: Follow CODING_STYLE
  hcd-xhci: check & correct param before using it
  pci: Convert msix_init() to Error and fix callers

Haozhong Zhang (1):
  hw/i386: check if nvdimm is enabled before plugging

Jason Wang (1):
  intel_iommu: fix and simplify size calculation in 
process_device_iotlb_desc()

Marc-André Lureau (1):
  vhost-user: delete chardev on cleanup

Marcel Apfelbaum (3):
  hw/pcie: Introduce a base class for PCI Express Root Ports
  hw/ioh3420: derive from PCI Express Root Port base class
  hw/pcie: Introduce Generic PCI Express Root Port

Michael S. Tsirkin (6):
  compiler: drop ; after BUILD_BUG_ON
  compiler: rework BUG_ON using a struct
  compiler: expression version of QEMU_BUILD_BUG_ON
  ARRAY_SIZE: check that argument is an array
  pci: mark ROMs read-only
  vhost: skip ROM sections

Paolo Bonzini (1):
  virtio: make virtio_should_notify static

 default-configs/arm-softmmu.mak|   1 +
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 include/hw/pci/msix.h  |   5 +-
 include/hw/pci/pci.h   |   1 +
 include/hw/pci/pcie_port.h |  19 +
 include/hw/virtio/virtio.h |   1 -
 include/qemu/compiler.h|  11 ++-
 include/qemu/osdep.h   |   9 +-
 hw/block/nvme.c|   2 +-
 hw/i386/intel_iommu.c  |  10 ++-
 hw/i386/pc.c   |   5 ++
 hw/misc/ivshmem.c  |   8 +-
 hw/net/e1000e.c|   2 +-
 hw/net/rocker/rocker.c |   4 +-
 hw/net/vmxnet3.c   |   2 +-
 hw/pci-bridge/gen_pcie_root_port.c |  87 +++
 hw/pci-bridge/ioh3420.c| 121 --
 hw/pci-bridge/pcie_root_port.c | 171 +
 hw/pci/msix.c  |  44 --
 hw/pci/pci.c   |   2 +-
 hw/scsi/megasas.c  |   4 +-
 hw/usb/hcd-xhci.c  |  41 +
 hw/vfio/pci.c  |   8 +-
 hw/virtio/vhost.c  |   3 +-
 hw/virtio/virtio-pci.c |   4 +-
 hw/virtio/virtio.c |   2 +-
 net/vhost-user.c   |   3 +
 hw/pci-bridge/Makefile.objs|   1 +
 29 files changed, 416 insertions(+), 157 deletions(-)
 create mode 100644 hw/pci-bridge/gen_pcie_root_port.c
 create mode 100644 hw/pci-bridge/pcie_root_port.c

Re: [Qemu-devel] [PATCH] 9pfs: fix offset error in v9fs_xattr_read()

2017-01-24 Thread Stefano Stabellini

On Tue, 24 Jan 2017, Greg Kurz wrote:
> On Mon, 23 Jan 2017 12:20:57 -0800 (PST)
> Stefano Stabellini  wrote:
> 
> > On Sat, 21 Jan 2017, Greg Kurz wrote:
> > > The current code tries to copy `read_count' bytes starting at offset
> > > `offset' from a `read_count`-sized iovec. This causes v9fs_pack() to
> > > fail with ENOBUFS.
> > > 
> > > Since the PDU iovec is already partially filled with `offset' bytes,
> > > let's skip them when creating `qiov_full' and have v9fs_pack() to
> > > copy the whole of it. Moreover, this is consistent with the other
> > > places where v9fs_init_qiov_from_pdu() is called.
> > > 
> > > This fixes commit "bcb8998fac16 9pfs: call v9fs_init_qiov_from_pdu
> > > before v9fs_pack".  
> > 
> > Sorry about that!
> > 
> > 
> > > Signed-off-by: Greg Kurz 
> > > ---
> > >  hw/9pfs/9p.c |4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
> > > index fa58877570f6..f0eef1a3ef53 100644
> > > --- a/hw/9pfs/9p.c
> > > +++ b/hw/9pfs/9p.c
> > > @@ -1685,8 +1685,8 @@ static int v9fs_xattr_read(V9fsState *s, V9fsPDU 
> > > *pdu, V9fsFidState *fidp,
> > >  }
> > >  offset += err;
> > >  
> > > -v9fs_init_qiov_from_pdu(_full, pdu, 0, read_count, false);
> > > -err = v9fs_pack(qiov_full.iov, qiov_full.niov, offset,
> > > +v9fs_init_qiov_from_pdu(_full, pdu, offset, read_count, false); 
> > >  
> > 
> > v9fs_init_qiov_from_pdu calls init_in_iov_from_pdu passing read_count as
> > size argument. offset is not passed to init_in_iov_from_pdu, it is only
> > used to initialized qiov_full.
> > 
> > In other words, don't we need to:
> > 
> > v9fs_init_qiov_from_pdu(_full, pdu, offset, read_count + offset, 
> > false);
> > 
> > To make sure that qiov_full has "read_count + offset" bytes in it, and
> > qiov_full is initialized skipping the first "offset" bytes?
> > 
> 
> If we do that then qemu_iovec_concat() will skip offset bytes and then copy
> read_count + offset bytes or am I missing something ?

You are right, sorry. qemu_iovec_concat didn't do what I thought it
would. In that case, I think we need to change v9fs_init_qiov_from_pdu
(in addition to the changes already in this patch):

@@ -1659,12 +1659,14 @@ static void v9fs_init_qiov_from_pdu(QEMUIOVector *qiov, 
V9fsPDU *pdu,
 if (is_write) {
 pdu->s->transport->init_out_iov_from_pdu(pdu, , );
 } else {
-pdu->s->transport->init_in_iov_from_pdu(pdu, , , size);
+pdu->s->transport->init_in_iov_from_pdu(pdu, , , size + skip);
 }
 
The reason is the same as before: iov needs to have offset+read_count
bytes in it, for qemu_iovec_concat to skip offset and iov still have
read_count bytes left.


> > > +err = v9fs_pack(qiov_full.iov, qiov_full.niov, 0,
> > >  ((char *)fidp->fs.xattr.value) + off,
> > >  read_count);
> > >  qemu_iovec_destroy(_full);
> > >   
> 
>

[Qemu-devel] QEMU websockets support is laggy?

2017-01-24 Thread Brian Rak

We've been considering switching over to using qemu's built in 
websockets support (to avoid the overhead of needing websockify 
running).  We've been seeing very poor performance after the switch (it 
takes the console 4-5 seconds to update after pressing a key).  So far, 
I haven't been able to find any indication of why this is happening.  
The exact same configuration works perfectly when running with 
websockify, but laggy when hitting qemu directly.


I've tried a few things (disabling encryption, bypassing our usual nginx 
proxy, even connecting via a ssh tunnel), and haven't made any sort of 
progress here.  Has anyone else seen this?  Any suggestions as to where 
I should start looking?

Re: [Qemu-devel] [Qemu-ppc] [PATCH] PPC: MMU compatibility check.

2017-01-24 Thread Valentin Plotkin

On Tue, 24 Jan 2017, Thomas Huth wrote:

Date: Tue, 24 Jan 2017 21:32:44 +0100
From: Thomas Huth 
To: Valentin Plotkin , qemu-triv...@nongnu.org
Cc: qemu-...@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-ppc] [PATCH] PPC: MMU compatibility check.

On 24.01.2017 19:56, Valentin Plotkin wrote:

Hi everyone,

I looked at the "qemu-system-ppc -nographic -cpu G2leGP3 -M ppce500" on
the BiteSizedTasks page. The segfault was caused by machine
initialization code which expected a certain MMU model, checked, so
unused SPR were read, returning zeros. bamboo and virtex machines are
affected as well, but it doesn't always cause segfault, usually running
into unmapped memory and failing somewhat more nicely.

I added the checks. It would be possible to add support for other MMU
models, but I'm not sure if there is any point (would any guest OS
support mutually exclusive CPU and machine)?

Hi,

great to have a fix for this crash! I don't think it make much sense to
add support for other MMU models here, so the simple checks should be
good enough.
However, your new code obviously does not follow the QEMU coding style.
Could you please run your patch through scripts/checkpatch.pl and fix
all issues that it reports? And when you resubmit, please make sure to
copy the maintainers on CC: as well (scripts/get_maintainers.pl is your
friend here).

Thanks,
 Thomas

Here is fengshuised version (at least I hope so). Thanks for guiding me.

Signed-off-by: Valentin Plotkin 
---
 hw/ppc/e500.c  | 6 ++
 hw/ppc/ppc440_bamboo.c | 6 ++
 hw/ppc/virtex_ml507.c  | 6 ++
 3 files changed, 18 insertions(+)

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index cf8b122..683d9a9 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -631,6 +631,12 @@ static uint64_t mmubooke_initial_mapsize(CPUPPCState 
*env)

 static void mmubooke_create_initial_mapping(CPUPPCState *env)
 {
+if (env->mmu_model != POWERPC_MMU_BOOKE206) {
+fprintf(stderr, "MMU model %i not supported by this machine.\n",
+env->mmu_model);
+exit(-1);
+}
+
 ppcmas_tlb_t *tlb = booke206_get_tlbm(env, 1, 0, 0);
 hwaddr size;
 int ps;
diff --git a/hw/ppc/ppc440_bamboo.c b/hw/ppc/ppc440_bamboo.c
index 5c535b1..793b758 100644
--- a/hw/ppc/ppc440_bamboo.c
+++ b/hw/ppc/ppc440_bamboo.c
@@ -124,6 +124,12 @@ static void 
mmubooke_create_initial_mapping(CPUPPCState *env,

  target_ulong va,
  hwaddr pa)
 {
+if (env->mmu_model != POWERPC_MMU_BOOKE) {
+fprintf(stderr, "MMU model %i not supported by this machine.\n",
+env->mmu_model);
+exit(-1);
+}
+
 ppcemb_tlb_t *tlb = >tlb.tlbe[0];

 tlb->attr = 0;
diff --git a/hw/ppc/virtex_ml507.c b/hw/ppc/virtex_ml507.c
index b97d966..c01415c 100644
--- a/hw/ppc/virtex_ml507.c
+++ b/hw/ppc/virtex_ml507.c
@@ -69,6 +69,12 @@ static void mmubooke_create_initial_mapping(CPUPPCState 
*env,

  target_ulong va,
  hwaddr pa)
 {
+if (env->mmu_model != POWERPC_MMU_BOOKE) {
+fprintf(stderr, "MMU model %i not supported by this machine.\n",
+env->mmu_model);
+exit(-1);
+}
+
 ppcemb_tlb_t *tlb = >tlb.tlbe[0];

 tlb->attr = 0;
--
2.5.5

calib...@sdf.org
SDF Public Access UNIX System - http://sdf.org

Re: [Qemu-devel] Commit 3a6c9 breaks QEMU on FreeBSD/Xen

2017-01-24 Thread Stefano Stabellini

On Tue, 24 Jan 2017, Stefano Stabellini wrote:
> On Tue, 24 Jan 2017, Roger Pau Monné wrote:
> > Hello,
> > 
> > The following commit:
> > 
> > commit 3a6c9172ac5951e6dac2b3f6cbce3cfccdec5894
> > Author: Juergen Gross 
> > Date:   Tue Nov 22 07:10:58 2016 +0100
> > 
> > xen: create qdev for each backend device
> > 
> > Prevents me from running QEMU on FreeBSD/Xen, the following is printed on 
> > the
> > QEMU log:
> > 
> > char device redirected to /dev/pts/2 (label serial0)
> > xen be core: xen be core: can't open gnttab device
> > can't open gnttab device
> > xen be core: xen be core: can't open gnttab device
> > can't open gnttab device
> > 
> > # xl create -c ~/domain.cfg
> > Parsing config from /root/domain.cfg
> > libxl: error: libxl_dm.c:2201:device_model_spawn_outcome: Domain 32:domain 
> > 32 device model: spawn failed (rc=-3)
> > libxl: error: libxl_create.c:1506:domcreate_devmodel_started: Domain 
> > 32:device model did not start: -3
> > libxl: error: libxl_dm.c:2315:kill_device_model: Device Model already exited
> > libxl: error: libxl.c:1572:libxl__destroy_domid: Domain 32:Non-existant 
> > domain
> > libxl: error: libxl.c:1531:domain_destroy_callback: Domain 32:Unable to 
> > destroy guest
> > libxl: error: libxl.c:1458:domain_destroy_cb: Domain 32:Destruction of 
> > domain failed
> > # cat /var/log/xen/qemu-dm-domain.log
> > char device redirected to /dev/pts/2 (label serial0)
> > xen be core: xen be core: can't open gnttab device
> > can't open gnttab device
> > xen be core: xen be core: can't open gnttab device
> > can't open gnttab device
> > 
> > I'm not really familiar with any of that code, but I think that using
> > qdev_init_nofail is wrong, since on FreeBSD/Xen for example we don't yet
> > support the gnttab device, so initialization of the Xen Qdisk backend can 
> > fail
> > (and possibly the same applies to Linux if someone decides to compile a 
> > kernel
> > without the gnttab device). Yet QEMU can be used without the Qdisk backend.
> 
> How did you manage to configure QEMU before? The configure script had
> xc_gnttab_open calls in it up to Xen 4.6.

I know the answer! Because the configure script only compiles the code,
doesn't try to run it. xc_gnttab_open compiled correctly but returned
error when executed. Is that right?


> I am happy to support a use case where the kernel doesn't have gntdev,
> but it needs to be explicit: we need to detect it in the configure
> script, then avoid the initialization of devices which require it.

I would still prefer configure to be able to detect this case. If it
cannot be made to detect it, then we can try to figure out a way to
catch the initialization errors at run time.

Re: [Qemu-devel] Commit 3a6c9 breaks QEMU on FreeBSD/Xen

2017-01-24 Thread Stefano Stabellini

On Tue, 24 Jan 2017, Roger Pau Monné wrote:
> Hello,
> 
> The following commit:
> 
> commit 3a6c9172ac5951e6dac2b3f6cbce3cfccdec5894
> Author: Juergen Gross 
> Date:   Tue Nov 22 07:10:58 2016 +0100
> 
> xen: create qdev for each backend device
> 
> Prevents me from running QEMU on FreeBSD/Xen, the following is printed on the
> QEMU log:
> 
> char device redirected to /dev/pts/2 (label serial0)
> xen be core: xen be core: can't open gnttab device
> can't open gnttab device
> xen be core: xen be core: can't open gnttab device
> can't open gnttab device
> 
> # xl create -c ~/domain.cfg
> Parsing config from /root/domain.cfg
> libxl: error: libxl_dm.c:2201:device_model_spawn_outcome: Domain 32:domain 32 
> device model: spawn failed (rc=-3)
> libxl: error: libxl_create.c:1506:domcreate_devmodel_started: Domain 
> 32:device model did not start: -3
> libxl: error: libxl_dm.c:2315:kill_device_model: Device Model already exited
> libxl: error: libxl.c:1572:libxl__destroy_domid: Domain 32:Non-existant domain
> libxl: error: libxl.c:1531:domain_destroy_callback: Domain 32:Unable to 
> destroy guest
> libxl: error: libxl.c:1458:domain_destroy_cb: Domain 32:Destruction of domain 
> failed
> # cat /var/log/xen/qemu-dm-domain.log
> char device redirected to /dev/pts/2 (label serial0)
> xen be core: xen be core: can't open gnttab device
> can't open gnttab device
> xen be core: xen be core: can't open gnttab device
> can't open gnttab device
> 
> I'm not really familiar with any of that code, but I think that using
> qdev_init_nofail is wrong, since on FreeBSD/Xen for example we don't yet
> support the gnttab device, so initialization of the Xen Qdisk backend can fail
> (and possibly the same applies to Linux if someone decides to compile a kernel
> without the gnttab device). Yet QEMU can be used without the Qdisk backend.

How did you manage to configure QEMU before? The configure script had
xc_gnttab_open calls in it up to Xen 4.6.

I am happy to support a use case where the kernel doesn't have gntdev,
but it needs to be explicit: we need to detect it in the configure
script, then avoid the initialization of devices which require it.

[Qemu-devel] [PULL v2 0/7] Nios2 architecture support

2017-01-24 Thread Richard Henderson

Version 2: Use PRIx64 in patch 1 to fix a 32-bit build failure.


r~


The following changes since commit a678502e4f7580a6f143f680404aaee57ac3f4b5:

  Merge remote-tracking branch 
'remotes/ehabkost/tags/x86-and-machine-pull-request' into staging (2017-01-24 
15:39:09 +)

are available in the git repository at:

  git://github.com/rth7680/qemu.git tags/pull-nios-20170124

for you to fetch changes up to e671711c9a8c1de540f035095e18458bc03968de:

  nios2: Add support for Nios-II R1 (2017-01-24 13:10:36 -0800)


nios2 target support


Chris Wulff (3):
  nios2: Add architecture emulation support
  nios2: Add IIC interrupt controller emulation
  nios2: Add periodic timer emulation

Marek Vasut (4):
  nios2: Add disas entries
  nios2: Add usermode binaries emulation
  nios2: Add Altera 10M50 GHRD emulation
  nios2: Add support for Nios-II R1

 MAINTAINERS  |8 +
 arch_init.c  |2 +
 configure|5 +
 default-configs/nios2-linux-user.mak |1 +
 default-configs/nios2-softmmu.mak|6 +
 disas/Makefile.objs  |1 +
 disas/nios2.c| 3534 ++
 hw/intc/Makefile.objs|1 +
 hw/intc/nios2_iic.c  |  103 +
 hw/nios2/10m50_devboard.c|  126 ++
 hw/nios2/Makefile.objs   |1 +
 hw/nios2/boot.c  |  223 +++
 hw/nios2/boot.h  |   11 +
 hw/nios2/cpu_pic.c   |   70 +
 hw/timer/Makefile.objs   |1 +
 hw/timer/altera_timer.c  |  237 +++
 include/disas/bfd.h  |6 +
 include/elf.h|2 +
 include/sysemu/arch_init.h   |1 +
 linux-user/elfload.c |   57 +
 linux-user/main.c|  140 +-
 linux-user/nios2/syscall_nr.h|  329 
 linux-user/nios2/target_cpu.h|   39 +
 linux-user/nios2/target_signal.h |   26 +
 linux-user/nios2/target_structs.h|   58 +
 linux-user/nios2/target_syscall.h|   37 +
 linux-user/nios2/termbits.h  |  220 +++
 linux-user/signal.c  |  239 ++-
 linux-user/syscall_defs.h|8 +-
 qemu-doc.texi|3 +
 target/nios2/Makefile.objs   |4 +
 target/nios2/cpu.c   |  237 +++
 target/nios2/cpu.h   |  272 +++
 target/nios2/helper.c|  313 +++
 target/nios2/helper.h|   27 +
 target/nios2/mmu.c   |  296 +++
 target/nios2/mmu.h   |   50 +
 target/nios2/monitor.c   |   35 +
 target/nios2/op_helper.c |   47 +
 target/nios2/translate.c |  958 +
 40 files changed, 7727 insertions(+), 7 deletions(-)
 create mode 100644 default-configs/nios2-linux-user.mak
 create mode 100644 default-configs/nios2-softmmu.mak
 create mode 100644 disas/nios2.c
 create mode 100644 hw/intc/nios2_iic.c
 create mode 100644 hw/nios2/10m50_devboard.c
 create mode 100644 hw/nios2/Makefile.objs
 create mode 100644 hw/nios2/boot.c
 create mode 100644 hw/nios2/boot.h
 create mode 100644 hw/nios2/cpu_pic.c
 create mode 100644 hw/timer/altera_timer.c
 create mode 100644 linux-user/nios2/syscall_nr.h
 create mode 100644 linux-user/nios2/target_cpu.h
 create mode 100644 linux-user/nios2/target_signal.h
 create mode 100644 linux-user/nios2/target_structs.h
 create mode 100644 linux-user/nios2/target_syscall.h
 create mode 100644 linux-user/nios2/termbits.h
 create mode 100644 target/nios2/Makefile.objs
 create mode 100644 target/nios2/cpu.c
 create mode 100644 target/nios2/cpu.h
 create mode 100644 target/nios2/helper.c
 create mode 100644 target/nios2/helper.h
 create mode 100644 target/nios2/mmu.c
 create mode 100644 target/nios2/mmu.h
 create mode 100644 target/nios2/monitor.c
 create mode 100644 target/nios2/op_helper.c
 create mode 100644 target/nios2/translate.c

Re: [Qemu-devel] [Qemu-block] [PATCH v4 0/7] add blkdebug tests

2017-01-24 Thread Eric Blake

ping

On 12/20/2016 01:19 PM, Eric Blake wrote:
> [oops, I forgot cc's on the cover letter, even though the rest of the
> series was properly broadcast]
> 
> On 12/20/2016 01:15 PM, Eric Blake wrote:
>> Based on Kevin's block-next branch:
>> http://repo.or.cz/qemu/kevin.git/shortlog/refs/heads/block-next
>>
>> Available as a tag at:
>> git fetch git://repo.or.cz/qemu/ericb.git nbd-blkdebug-v4
>>
>> v3 was:
>> https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00278.html
>>
>> Since then:
>> - Address comments from Max and Kevin
>>   - typo cleanups
>>   - no longer force 512-byte alignment
>>   - better naming
>> - Merge two series into one
>>
>> 001/7:[down] 'qcow2: Assert that cluster operations are aligned'
>> 002/7:[down] 'qcow2: Discard/zero clusters by byte count'
>> 003/7:[] [--] 'blkdebug: Sanity check block layer guarantees'
>> 004/7:[0015] [FC] 'blkdebug: Add pass-through write_zero and discard support'
>> 005/7:[0002] [FC] 'blkdebug: Simplify override logic'
>> 006/7:[0046] [FC] 'blkdebug: Add ability to override unmap geometries'
>> 007/7:[0015] [FC] 'tests: Add coverage for recent block geometry fixes'
>>
>> Eric Blake (7):
>>   qcow2: Assert that cluster operations are aligned
>>   qcow2: Discard/zero clusters by byte count
>>   blkdebug: Sanity check block layer guarantees
>>   blkdebug: Add pass-through write_zero and discard support
>>   blkdebug: Simplify override logic
>>   blkdebug: Add ability to override unmap geometries
>>   tests: Add coverage for recent block geometry fixes
>>
>>  qapi/block-core.json   |  24 +-
>>  block/qcow2.h  |   9 +-
>>  block/blkdebug.c   | 204 
>> +++--
>>  block/qcow2-cluster.c  |  29 ---
>>  block/qcow2-snapshot.c |   7 +-
>>  block/qcow2.c  |  21 ++---
>>  tests/qemu-iotests/173 | 114 +
>>  tests/qemu-iotests/173.out |  49 +++
>>  tests/qemu-iotests/group   |   1 +
>>  9 files changed, 413 insertions(+), 45 deletions(-)
>>  create mode 100755 tests/qemu-iotests/173
>>  create mode 100644 tests/qemu-iotests/173.out
>>
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG

2017-01-24 Thread Richard Henderson

On 01/24/2017 12:25 PM, Alex Bennée wrote:
> We are talking about doing the necessary annotation to all a givens
> targets loads and stores? I figured this code would morph and be tweaked
> when (if?) we get there.

Fair enough.


r~

Re: [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus

2017-01-24 Thread Richard Henderson

On 01/24/2017 12:34 PM, Alex Bennée wrote:
> 
> Richard Henderson  writes:
> 
>> On 01/19/2017 09:04 AM, Alex Bennée wrote:
>>> +/* flush_all_helper: run fn across all cpus
>>> + *
>>> + * If the wait flag is set then the src cpu's helper will be queued as
>>> + * "safe" work and the loop exited creating a synchronisation point
>>> + * where all queued work will be finished before execution starts
>>> + * again.
>>> + */
>>> +static void flush_all_helper(CPUState *src, bool wait,
>>> + run_on_cpu_func fn, run_on_cpu_data d)
>>> +{
>>> +CPUState *cpu;
>>> +
>>> +if (!wait) {
>>> +CPU_FOREACH(cpu) {
>>> +if (cpu != src) {
>>> +async_run_on_cpu(cpu, fn, d);
>>> +} else {
>>> +g_assert(qemu_cpu_is_self(src));
>>> +fn(src, d);
>>> +}
>>> +}
>>> +} else {
>>> +CPU_FOREACH(cpu) {
>>> +if (cpu != src) {
>>> +async_run_on_cpu(cpu, fn, d);
>>> +} else {
>>> +async_safe_run_on_cpu(cpu, fn, d);
>>> +}
>>> +
>>> +}
>>> +cpu_loop_exit(src);
>>> +}
>>> +}
>>
>> What's the rationale for not having the target do the exit itself?  Surely it
>> can tell, and simple end the TB after the insn.
> 
> It's more for the global sync functionality. I wanted to keep all the
> guts of re-starting the loop with the correct async_safe_work all in one
> place with a defined API for the guests rather than have them all do it
> themselves.
> 
> For the common case of not needing to sync across the cores I agree the
> guest is perfectly able to end the TB so its safe work completes next.
> In fact the ARM helper calls do exactly that.

Hmm.  Would it make more sense to have two functions then, for wait and !wait?
That would allow the wait function be QEMU_NORETURN, which might make it a bit
more obvious about the interface contract.


r~

Re: [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines

2017-01-24 Thread Richard Henderson

On 01/24/2017 12:31 PM, Alex Bennée wrote:
>> Why don't we just pass in this bitmap in the first place?  It's much better
>> than having to use varargs in tlb_flush_by_mmuidx...
> 
> We could. By not messing with the API it leaves the door open to having
> other non-MTTCG architectures that have lots of MMU indexes versus a
> hard limit based on page-size. That said I think the number of indexes
> also affects the size of the TLB so I guess the current design is
> limited for arbitrarily large sets if indexes?

We hard-limit at 12 indices, though even that is arguably too high.
I hope we never see more than PPC's current 8.

> Is ARM is the current outlier for this functionality? Apart from SPARC's
> two uses are we likely to see more architectures using this?

In theory, Alpha could use it to avoid ever flushing MMU_PHYS_IDX.  It appears
that there are a few others that could also avoid flushing a "mmu-disabled" 
index.

I suspect that PPC could make good use of it as well.  That one's complicated
enough that it probably needs a good going over -- especially for the non-local
flushes.

r~

Re: [Qemu-devel] [PATCH v7 18/27] cputlb: introduce tlb_flush_*_all_cpus

2017-01-24 Thread Alex Bennée

Richard Henderson  writes:

> On 01/19/2017 09:04 AM, Alex Bennée wrote:
>> +/* flush_all_helper: run fn across all cpus
>> + *
>> + * If the wait flag is set then the src cpu's helper will be queued as
>> + * "safe" work and the loop exited creating a synchronisation point
>> + * where all queued work will be finished before execution starts
>> + * again.
>> + */
>> +static void flush_all_helper(CPUState *src, bool wait,
>> + run_on_cpu_func fn, run_on_cpu_data d)
>> +{
>> +CPUState *cpu;
>> +
>> +if (!wait) {
>> +CPU_FOREACH(cpu) {
>> +if (cpu != src) {
>> +async_run_on_cpu(cpu, fn, d);
>> +} else {
>> +g_assert(qemu_cpu_is_self(src));
>> +fn(src, d);
>> +}
>> +}
>> +} else {
>> +CPU_FOREACH(cpu) {
>> +if (cpu != src) {
>> +async_run_on_cpu(cpu, fn, d);
>> +} else {
>> +async_safe_run_on_cpu(cpu, fn, d);
>> +}
>> +
>> +}
>> +cpu_loop_exit(src);
>> +}
>> +}
>
> What's the rationale for not having the target do the exit itself?  Surely it
> can tell, and simple end the TB after the insn.

It's more for the global sync functionality. I wanted to keep all the
guts of re-starting the loop with the correct async_safe_work all in one
place with a defined API for the guests rather than have them all do it
themselves.

For the common case of not needing to sync across the cores I agree the
guest is perfectly able to end the TB so its safe work completes next.
In fact the ARM helper calls do exactly that.

--
Alex Bennée

Re: [Qemu-devel] [Qemu-ppc] [PATCH] PPC: MMU compatibility check.

2017-01-24 Thread Thomas Huth

On 24.01.2017 19:56, Valentin Plotkin wrote:
> 
> Hi everyone,
> 
> I looked at the "qemu-system-ppc -nographic -cpu G2leGP3 -M ppce500" on
> the BiteSizedTasks page. The segfault was caused by machine
> initialization code which expected a certain MMU model, checked, so
> unused SPR were read, returning zeros. bamboo and virtex machines are
> affected as well, but it doesn't always cause segfault, usually running
> into unmapped memory and failing somewhat more nicely.
> 
> I added the checks. It would be possible to add support for other MMU
> models, but I'm not sure if there is any point (would any guest OS
> support mutually exclusive CPU and machine)?

 Hi,

great to have a fix for this crash! I don't think it make much sense to
add support for other MMU models here, so the simple checks should be
good enough.
However, your new code obviously does not follow the QEMU coding style.
Could you please run your patch through scripts/checkpatch.pl and fix
all issues that it reports? And when you resubmit, please make sure to
copy the maintainers on CC: as well (scripts/get_maintainers.pl is your
friend here).

 Thanks,
  Thomas

Re: [Qemu-devel] [PATCH v7 16/27] cputlb: add tlb_flush_by_mmuidx async routines

2017-01-24 Thread Alex Bennée

Richard Henderson  writes:

> On 01/19/2017 09:04 AM, Alex Bennée wrote:
>> +/* Helper function to slurp va_args list into a bitmap
>> + */
>> +static inline unsigned long make_mmu_index_bitmap(va_list args)
>> +{
>> +unsigned long bitmap = 0;
>> +int mmu_index = va_arg(args, int);
>> +
>> +/* An empty va_list would be a bad call */
>> +g_assert(mmu_index > 0);
>> +
>> +do {
>> +set_bit(mmu_index, );
>> +mmu_index = va_arg(args, int);
>> +} while (mmu_index >= 0);
>> +
>> +return bitmap;
>> +}
>> +
>
> Why don't we just pass in this bitmap in the first place?  It's much better
> than having to use varargs in tlb_flush_by_mmuidx...

We could. By not messing with the API it leaves the door open to having
other non-MTTCG architectures that have lots of MMU indexes versus a
hard limit based on page-size. That said I think the number of indexes
also affects the size of the TLB so I guess the current design is
limited for arbitrarily large sets if indexes?

Is ARM is the current outlier for this functionality? Apart from SPARC's
two uses are we likely to see more architectures using this?

--
Alex Bennée

[Qemu-devel] [PULL 00/31] Trivial patches for 2017-01-24

2017-01-24 Thread Michael Tokarev

Rebased, yes it was a trivial merge conflict (a trivial patch
changed comment in a removed section of a file). The same as
before, with the merge conflict fixed and an r-b added.

Not resending other stuff, just the cover letter.

Thanks,

/mjt

The following changes since commit a9e404600a9bd1e6a26431fc89e5069092e67f14:

  Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20170124' into 
staging (2017-01-24 17:26:26 +)

are available in the git repository at:

  git://git.corpit.ru/qemu.git tags/trivial-patches-fetch

for you to fetch changes up to 5658ffa39aae034458231bc4abfee57637b88c6e:

  hw/isa/isa-bus: Set category of the "isabus-bridge" device (2017-01-24 
23:26:54 +0300)


trivial patches for 2017-01-24


Cao jin (5):
  pcie: fix typo in comments
  doc/usb2: fix typo
  vfio: remove a duplicated word in comments
  util/mmap-alloc: check parameter before using
  util/mmap-alloc: refactor a little bit for readability

Frediano Ziglio (1):
  usb: Fix typo in documentation

Gerd Hoffmann (1):
  docs: sync pci-ids.txt

Marc-André Lureau (3):
  object.h: spelling fix
  object: make some funcs static
  win32: use glib gpoll if glib >= 2.50

Michael Tokarev (1):
  doc: don't mention -memory, it is -m

Paolo Bonzini (4):
  qemu-img: remove dead check
  pci-assign: avoid pointless stat
  block: remove dead check
  qga: fix erroneous argument to strerror

Peter Maydell (6):
  README: Add linux to macOS build info
  hw/i386/kvmvapic: Remove dead code in patch_hypercalls()
  lm32: milkymist-tmu2: fix another integer overflow
  disas/cris.c: Fix Coverity warning about unchecked NULL
  hw/display/framebuffer.c: Avoid overflow for framebuffers > 4GB
  scsi-disk: add 'fall through' comment to switch VERIFY cases

Po-Hsu Lin (1):
  qemu-options: cleanup duplicated help message for kernel_irqchip

Samuel Thibault (1):
  Drop duplicate display option documentation

Stefan Weil (4):
  hw/block/m25p80: Fix typo in local macro name
  Fix documentation and some comments (article, grammar)
  include: Fix typos found by codespell
  hw: Fix typos found by codespell

Thomas Huth (2):
  usb: Set category and description of the MTP device
  hw/isa/isa-bus: Set category of the "isabus-bridge" device

Ziyue Yang (2):
  gdbstub.c: fix GDB connection segfault caused by empty machines
  gdbstub.c: update old error report statements

 README |  1 +
 block.c|  2 +-
 disas/cris.c   |  2 +-
 docs/specs/pci-ids.txt |  3 +++
 docs/usb-storage.txt   |  2 +-
 docs/usb2.txt  |  2 +-
 gdbstub.c  | 17 -
 hw/block/m25p80.c  |  4 ++--
 hw/core/generic-loader.c   |  4 ++--
 hw/core/qdev-properties.c  |  2 +-
 hw/display/framebuffer.c   |  2 +-
 hw/display/milkymist-tmu2.c|  2 +-
 hw/display/xlnx_dp.c   |  4 ++--
 hw/i386/kvmvapic.c |  6 --
 hw/i386/pc.c   |  2 +-
 hw/i386/pci-assign-load-rom.c  | 16 
 hw/isa/isa-bus.c   |  1 +
 hw/net/cadence_gem.c   |  2 +-
 hw/net/spapr_llan.c|  4 ++--
 hw/pci/pcie.c  |  2 +-
 hw/ppc/spapr_drc.c |  2 +-
 hw/s390x/s390-pci-bus.h|  4 ++--
 hw/scsi/scsi-disk.c|  1 +
 hw/usb/dev-mtp.c   |  4 +++-
 hw/vfio/pci-quirks.c   |  2 +-
 hw/vfio/pci.c  |  4 ++--
 hw/virtio/virtio-crypto.c  |  2 +-
 include/glib-compat.h  |  2 +-
 include/hw/dma/xlnx_dpdma.h|  3 ++-
 include/hw/pci-host/q35.h  |  2 +-
 include/hw/register.h  |  2 +-
 include/qapi/dealloc-visitor.h |  2 +-
 include/qemu/qht.h |  2 +-
 include/qemu/xattr.h   |  2 +-
 include/qom/object.h   | 26 +-
 qemu-doc.texi  |  2 +-
 qemu-img.c | 12 +---
 qemu-options.hx|  7 +++
 qga/main.c |  4 ++--
 qom/object.c   |  4 ++--
 util/mmap-alloc.c  | 17 -
 util/oslib-win32.c |  2 ++
 util/uri.c |  4 ++--
 43 files changed, 88 insertions(+), 104 deletions(-)

Re: [Qemu-devel] [PATCH v7 05/27] tcg: add options for enabling MTTCG

2017-01-24 Thread Alex Bennée

Richard Henderson  writes:

> On 01/19/2017 09:04 AM, Alex Bennée wrote:
>> +void qemu_tcg_configure(QemuOpts *opts, Error **errp)
>> +{
>> +const char *t = qemu_opt_get(opts, "thread");
>> +if (t) {
>> +if (strcmp(t, "multi") == 0) {
>> +if (TCG_OVERSIZED_GUEST) {
>> +error_setg(errp, "No MTTCG when guest word size > hosts");
>
> I agree with this, since I don't ever imagine it'll be fixed.  It's a silly 
> use
> case in the first place considering the ubiquity of 64-bit hosts.

Funnily enough I know one kernel hacker who does use this to test their
arm64 kernels on their re-purposed armhf chromebooks. I've already
explained coming their way ;-)

>
>> +} else if (!check_tcg_memory_orders_compatible()) {
>> +error_setg(errp,
>> +   "No MTTCG when guest MO is stronger than host 
>> MO");
>
> This, on the other hand, means that one cannot even force multi for testing.  
> A
> warning perhaps?

I did toy with making that a warning when I first wrote it. I'll make it so.

> And how shall we handle a guest translate adding barriers as
> necessary to satisfy host memory ordering?

We are talking about doing the necessary annotation to all a givens
targets loads and stores? I figured this code would morph and be tweaked
when (if?) we get there.

--
Alex Bennée

Re: [Qemu-devel] [PATCH 1/1] Migration: libvirt live migration over RDMA of ipv6 addr failed

2017-01-24 Thread Michael Roth

Quoting Daniel P. Berrange (2017-01-24 11:36:19)
> On Tue, Jan 24, 2017 at 09:44:33AM -0600, David Dai wrote:
> > Using libvirt to do live migration over RDMA via ip v6 address failed. 
> > For example:
> > # virsh migrate  --live --migrateuri rdma://[deba::]:49152  \
> >   rhel73_host1_guest1 qemu+ssh://[deba::]/system --verbose
> > root@deba::'s password:
> > error: internal error: unable to execute QEMU command 'migrate': RDMA 
> > ERROR:  
> >   could not rdma_getaddrinfo address deba
> > 
> > As we can see, the ip v6 address used by rdma_getaddrinfo() has only "deba" 
> > part. It should be "deba::".
> > 
> > 1) According to rfc 3986, a literal ip v6 address should be enclosed 
> > in '[' and ']'.
> > When using virsh command to do live migration via ip v6 addresss, user
> > will input the ip v6 address with brackets (i.e. rdma://[deba::]:49152).
> > libvirt will parse command line option by calling virURIParse(). 
> > Inside it calls virStringStripIPv6Brackets() to strip off the brackets.
> > The uri passed in to virURIParse()  is:
> >"uri = rdma://[deba::]:49152"
> > Inside virURIParse() routine, it will strip off the bracket '[' and ']' if
> > it's ip v6 address. Then save the ip v6 address in this format "deba::" 
> > in the virURI->server field, and to be passed to qemu.
> > 
> > 2) At the beginning of migration, in qemu's qemu_rdma_data_init(host_port) 
> > routine, it calls inet_parse(host_port) routine to parse the ip v6 address 
> > and 
> > port string obtained from libvirt.
> > The input string host_port passed to qemu_rdma_data_init() can be:
> > "hostname:port", or
> > "ipv4address:port", or
> > "[ipv6address]:port" (i.e "[deba::]:49152"), or
> > "ipv6address:port" (i.e "deba:::49152").
> > Existing qemu api inet_parse() can handle the above first 3 cases properly, 
> >  
> > but didn't handle the last case ("ipv6address:port") correctly.
> > In this live migration over rdma via ip v6 address case, the server ip v6 
> > address obtained from libvirt doesn't contain the brackets '[' and ']' 
> > (i.e. "deba:::49152"). It caused inet_parse() to parse only "deba" 
> > part, 
> > and stopped at the 1st colon ':'. As the result, the subsequent 
> > rdma_getaddrinfo() with ip address "deba" will fail.
> > 
> > If we don't strip off brackets '[' and ']' for an ip v6 address in 
> > libvirt's 
> > virURIParse(), it will cause libvirt ipv6 ssh authentication failure.
> > 
> > NOTE:
> > If using libvirt to do live migration over TCP via ip v6 address:
> > # virsh migrate  --live --migrateuri tcp://[deba::]:49152  \
> >   rhel73_host1_guest1 qemu+ssh://[deba::]/system --verbose
> > It works fine.
> > In migrateuri of tcp case, libvirt will call virNetSocketNewConnectTCP()
> > directly to connect to remote "deba:::49152" after it strips off
> > the bracket '[' and ']' for an ip v6 address. 
> > On qemu side, fd_start_outgoing_migration() will be called to do migration.
> > It doesn't call inet_parse(). So we don't see issue in tcp case.
> > 
> > Solution:
> > I choose to fix the code in qemu's inet_parse() routine to parse the
> > ip v6 addresss w/o brackets properly (i.e. "deba:::49152" format).
> 
> That is not right - "deba::222:49152" is *not* a valid address. It
> is mandatory to use [] when providing a numeric IPv6 address. So
> there's nothing broken in QEMU here - libvirt needs fixing to pass
> correct data to QEMU

It's worth noting that inet_parse() is still somewhat broken in that
instead of rejecting "deba:::49152" it erroneously treats it as
an ipv4 address and passes it up the stack as host/port deba/.

> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|
>

Re: [Qemu-devel] [PATCH v3 1/8] make: move top level dir to end of include search path

2017-01-24 Thread Eric Blake

On 01/24/2017 05:01 AM, Daniel P. Berrange wrote:
> Currently the search path is
> 
>   1. source dir corresponding to input file (implicit by compiler)
>   2. top level build dir
>   3. top level source dir
>   4. top level source include/ dir
>   5. source dir corresponding to input file
>   6. build dir corresponding to output file
> 
> This causes a semantic difference in behaviour for builds
> where srcdir == builddir vs srcdir != builddir, because
> item 5 moves from end to start, when srcdir == builddir.

Rather, item 5 is a no-op (because it duplicated 1), and item 6 moves
from the end to the beginning when srcdir == builddir

> 
> As a general rule we also want to move to have all shared
> headers in the include/ dir, so move that ahead of the
> top level dirs in the search order.

Wait - are you proposing that you swap 4 to occur earlier than 2/3?...

> 
> Thus we now have:
> 
>   1. source dir corresponding to input file
>   2. build dir corresponding to output file
>   3. top level build dir
>   4. top level source dir
>   5. top level source include/ dir

...because this doesn't match that swap, and I don't see it in the patch
body (but I may have missed it; I'm not as strong at reviewing make as I
am at C)

> 
> and items 1+2 and 4+5 collapse into a single dir when srcdir==builddir

Isn't that items 3+4 (not 4+5) that collapse?

> so overall order remains the same.
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  rules.mak | 17 ++---
>  1 file changed, 10 insertions(+), 7 deletions(-)


> 
> diff --git a/rules.mak b/rules.mak
> index d5c516c..e09aabe 100644
> --- a/rules.mak
> +++ b/rules.mak
> @@ -26,8 +26,10 @@ QEMU_CXXFLAGS = -D__STDC_LIMIT_MACROS $(filter-out 
> -Wstrict-prototypes -Wmissing
>  # Flags for dependency generation
>  QEMU_DGFLAGS += -MMD -MP -MT $@ -MF $(@D)/$(*F).d
>  
> -# Same as -I$(SRC_PATH) -I., but for the nested source/object directories
> -QEMU_INCLUDES += -I$( +# Compiler searches the source file dir first, but in vpath builds
> +# we need to make it search the build dir too, before any other
> +# explicit search paths.
> +QEMU_LOCAL_INCLUDES = -I$(BUILD_DIR)/$(@D)

while this is the new code for 2, plus documentation that 1 is implicit.

>  
>  WL_U := -Wl,-u,
>  find-symbols = $(if $1, $(sort $(shell $(NM) -P -g $1 | $2)))
> @@ -61,7 +63,7 @@ expand-objs = $(strip $(sort $(filter %.o,$1)) \
>$(filter-out %.o %.mo,$1))
>  
>  %.o: %.c
> - $(call quiet-command,$(CC) $(QEMU_INCLUDES) $(QEMU_CFLAGS) 
> $(QEMU_DGFLAGS) $(CFLAGS) $($@-cflags) -c -o $@ $<,"CC","$(TARGET_DIR)$@")
> + $(call quiet-command,$(CC) $(QEMU_LOCAL_INCLUDES) $(QEMU_INCLUDES) 
> $(QEMU_CFLAGS) $(QEMU_DGFLAGS) $(CFLAGS) $($@-cflags) -c -o $@ 
> $<,"CC","$(TARGET_DIR)$@")

And the pre-pending of QEMU_LOCAL_INCLUDES is what changes the position
of the local directory from last to first, thus delaying the top level
dir to the end, but I don't see top/include/ moving.

These are now some long lines; is it worth taking the time to add \ line
splitting for legibility, either in this patch or as an add-on?

> @@ -359,6 +361,7 @@ define unnest-vars
>  $(eval $(o:%.mo=%$(DSOSUF)): module-common.o $($o-objs)),
>  $(error $o added in $v but $o-objs is not set)))
>  $(shell mkdir -p ./ $(sort $(dir $($v
> +$(shell cd $(BUILD_DIR) && mkdir -p ./ $(sort $(dir $($v
>  # Include all the .d files

Okay, this change makes sense (make sure all the build directories exist
in time; no-op for in-tree build, but helpful for VPATH), but seems
unrelated to the commit message.  Rebase snafu?

It looks like you're on the right track, but there's enough
discrepancies between the commit message and actual change that I'd
prefer a v4 before I grant R-b.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH v2 1/2] niagara: fail if a firmware file is missing

2017-01-24 Thread Artyom Tarasenko

fail if a firmware file is missing and not qtest_enabled(),
the later is necessary to allow some basic tests if
firmware is not available

Suggested-by: Peter Maydell 
Signed-off-by: Artyom Tarasenko 
---
 hw/sparc64/niagara.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/hw/sparc64/niagara.c b/hw/sparc64/niagara.c
index b55d4bb..edde86e 100644
--- a/hw/sparc64/niagara.c
+++ b/hw/sparc64/niagara.c
@@ -35,6 +35,8 @@
 #include "hw/timer/sun4v-rtc.h"
 #include "exec/address-spaces.h"
 #include "sysemu/block-backend.h"
+#include "qemu/error-report.h"
+#include "sysemu/qtest.h"
 
 
 typedef struct NiagaraBoardState {
@@ -85,6 +87,17 @@ typedef struct NiagaraBoardState {
 #define NIAGARA_OBP_OFFSET  0x8ULL
 #define PROM_SIZE_MAX   (4 * 1024 * 1024)
 
+static void add_rom_or_fail(const char *file, const hwaddr addr)
+{
+/* XXX remove qtest_enabled() check once firmware files are
+ * in the qemu tree
+ */
+if (!qtest_enabled() && rom_add_file_fixed(file, addr, -1)) {
+error_report("Unable to load a firmware for -M niagara");
+exit(1);
+}
+
+}
 /* Niagara hardware initialisation */
 static void niagara_init(MachineState *machine)
 {
@@ -119,14 +132,13 @@ static void niagara_init(MachineState *machine)
  "sun4v.prom", PROM_SIZE_MAX);
 memory_region_add_subregion(sysmem, NIAGARA_PROM_BASE, >prom);
 
-rom_add_file_fixed("nvram1", NIAGARA_NVRAM_BASE, -1);
-rom_add_file_fixed("1up-md.bin", NIAGARA_MD_ROM_BASE, -1);
-rom_add_file_fixed("1up-hv.bin", NIAGARA_HV_ROM_BASE, -1);
+add_rom_or_fail("nvram1", NIAGARA_NVRAM_BASE);
+add_rom_or_fail("1up-md.bin", NIAGARA_MD_ROM_BASE);
+add_rom_or_fail("1up-hv.bin", NIAGARA_HV_ROM_BASE);
 
-rom_add_file_fixed("reset.bin", NIAGARA_PROM_BASE, -1);
-rom_add_file_fixed("q.bin", NIAGARA_PROM_BASE + NIAGARA_Q_OFFSET, -1);
-rom_add_file_fixed("openboot.bin", NIAGARA_PROM_BASE + NIAGARA_OBP_OFFSET,
-   -1);
+add_rom_or_fail("reset.bin", NIAGARA_PROM_BASE);
+add_rom_or_fail("q.bin", NIAGARA_PROM_BASE + NIAGARA_Q_OFFSET);
+add_rom_or_fail("openboot.bin", NIAGARA_PROM_BASE + NIAGARA_OBP_OFFSET);
 
 /* the virtual ramdisk is kind of initrd, but it resides
outside of the partition RAM */
-- 
2.7.2

[Qemu-devel] [PATCH v2 2/2] niagara: check if a serial port is available

2017-01-24 Thread Artyom Tarasenko

Reported-by: Markus Armbruster 
Reviewed-by: Markus Armbruster 
Signed-off-by: Artyom Tarasenko 
---
 hw/sparc64/niagara.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/sparc64/niagara.c b/hw/sparc64/niagara.c
index edde86e..9a8d610 100644
--- a/hw/sparc64/niagara.c
+++ b/hw/sparc64/niagara.c
@@ -158,9 +158,10 @@ static void niagara_init(MachineState *machine)
 exit(1);
 }
 }
-serial_mm_init(sysmem, NIAGARA_UART_BASE, 0, NULL, 115200,
-   serial_hds[0], DEVICE_BIG_ENDIAN);
-
+if (serial_hds[0]) {
+serial_mm_init(sysmem, NIAGARA_UART_BASE, 0, NULL, 115200,
+   serial_hds[0], DEVICE_BIG_ENDIAN);
+}
 empty_slot_init(NIAGARA_IOBBASE, NIAGARA_IOBSIZE);
 sun4v_rtc_init(NIAGARA_RTC_BASE);
 }
-- 
2.7.2

[Qemu-devel] [PATCH v2 0/2] niagara fixes

2017-01-24 Thread Artyom Tarasenko

niagara fixes

Artyom Tarasenko (2):
  niagara: fail if a firmware file is missing
  niagara: check if a serial port is available

 hw/sparc64/niagara.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

-- 
2.7.2

Re: [Qemu-devel] [PATCH 03/17] migration: split common postcopy out of ram postcopy

2017-01-24 Thread Dr. David Alan Gilbert

* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> 24.01.2017 12:24, Juan Quintela wrote:
> > Vladimir Sementsov-Ogievskiy  wrote:
> > > Split common postcopy staff from ram postcopy staff.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > > ---
> > >   include/migration/migration.h |  1 +
> > >   migration/migration.c | 39 
> > > +++
> > >   migration/postcopy-ram.c  |  4 +++-
> > >   migration/savevm.c| 31 ++-
> > >   4 files changed, 53 insertions(+), 22 deletions(-)
> > > 
> > Hi
> > 
> > {
> > >   MigrationState *s;
> > > @@ -1587,9 +1592,11 @@ static int postcopy_start(MigrationState *ms, bool 
> > > *old_vm_running)
> > >* need to tell the destination to throw any pages it's already 
> > > received
> > >* that are dirty
> > >*/
> > > -if (ram_postcopy_send_discard_bitmap(ms)) {
> > > -error_report("postcopy send discard bitmap failed");
> > > -goto fail;
> > > +if (migrate_postcopy_ram()) {
> > > +if (ram_postcopy_send_discard_bitmap(ms)) {
> > > +error_report("postcopy send discard bitmap failed");
> > > +goto fail;
> > > +}
> > I will have preffered that for the ram commands, to embed the
> > migrate_postocpy_ram() check inside them, but that is taste, and
> > everyone has its own O:-)
> > 
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index e436cb2..c8a71c8 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -73,7 +73,7 @@ static struct mig_cmd_args {
> > >   [MIG_CMD_INVALID]  = { .len = -1, .name = "INVALID" },
> > >   [MIG_CMD_OPEN_RETURN_PATH] = { .len =  0, .name = 
> > > "OPEN_RETURN_PATH" },
> > >   [MIG_CMD_PING] = { .len = sizeof(uint32_t), .name = 
> > > "PING" },
> > > -[MIG_CMD_POSTCOPY_ADVISE]  = { .len = 16, .name = "POSTCOPY_ADVISE" 
> > > },
> > > +[MIG_CMD_POSTCOPY_ADVISE]  = { .len = -1, .name = "POSTCOPY_ADVISE" 
> > > },
> > >   [MIG_CMD_POSTCOPY_LISTEN]  = { .len =  0, .name = "POSTCOPY_LISTEN" 
> > > },
> > >   [MIG_CMD_POSTCOPY_RUN] = { .len =  0, .name = "POSTCOPY_RUN" },
> > >   [MIG_CMD_POSTCOPY_RAM_DISCARD] = {
> > > @@ -827,12 +827,17 @@ int qemu_savevm_send_packaged(QEMUFile *f, const 
> > > uint8_t *buf, size_t len)
> > >   /* Send prior to any postcopy transfer */
> > >   void qemu_savevm_send_postcopy_advise(QEMUFile *f)
> > >   {
> > > -uint64_t tmp[2];
> > > -tmp[0] = cpu_to_be64(getpagesize());
> > > -tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
> > > +if (migrate_postcopy_ram()) {
> > > +uint64_t tmp[2];
> > > +tmp[0] = cpu_to_be64(getpagesize());
> > > +tmp[1] = cpu_to_be64(1ul << qemu_target_page_bits());
> > > -trace_qemu_savevm_send_postcopy_advise();
> > > -qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_ADVISE, 16, (uint8_t 
> > > *)tmp);
> > > +trace_qemu_savevm_send_postcopy_advise();
> > > +qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_ADVISE,
> > > + 16, (uint8_t *)tmp);
> > > +} else {
> > > +qemu_savevm_command_send(f, MIG_CMD_POSTCOPY_ADVISE, 0, NULL);
> > > +}
> > >   }
> > >   /* Sent prior to starting the destination running in postcopy, discard 
> > > pages
> > I haven't yet figured out why you are reusing this command with a
> > different number of parameters.
> > For this to pass, I need that Dave comment on this.
> > 
> > So,
> > 
> > Reviewed-by: Juan Quintela 
> > 
> > conditioned that Dave agrees with this.
> 
> These parameters are unrelated if ram postcopy is disabled. So, I should be
> better to have a possibility of skipping them, then to send unneeded numbers
> and write separate code to read them from the stream (rewriting
> loadvm_postcopy_handle_advise to just read these two numbers and do nothing
> about ram postcopy for this case).

I think I'd prefer either a new command or keeping these fields (probably all 0 
?)
my worry is what happens in the case if we add a 3rd postcopy subfeature;
In your case we have three possibilities:

a) Postcopy RAM only - 16 bytes
b) Postcopy persistent-dirty-bitmap only - 0 bytes
c) Both - 16 bytes

Lets say we added postcopy-foo in the future and it wanted to add
another 16 bytes, what would it send if it was foo+persistent-dirty-bitmap and 
no RAM?
We'd end up with 16 bytes sent but you'd have to be very careful
never to get that confused with case (a) above.

(I don't feel too strongly about it though)

Dave

> -- 
> Best regards,
> Vladimir
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH v2 0/8] virtio: use MemoryRegionCache for descriptors and rings

2017-01-24 Thread Michael S. Tsirkin

On Tue, Jan 24, 2017 at 10:26:27AM -0800, no-re...@patchew.org wrote:
> qemu-system-x86_64: /tmp/qemu-test/src/exec.c:3180: 
> address_space_translate_cached: Assertion `addr < cache->len && *plen <= 
> cache->len - addr' failed.
> make: *** [check-tests/test-qga] Error 1
> make: *** Waiting for unfinished jobs

Interesting.

-- 
MST

Re: [Qemu-devel] [PATCH 05/10] armv7m: honour CCR.STACKALIGN on exception entry

2017-01-24 Thread Peter Maydell

On 24 January 2017 at 19:33, Richard Henderson  wrote:
> On 01/24/2017 11:16 AM, Peter Maydell wrote:
>> The CCR.STACKALIGN bit controls whether the CPU is supposed to force
>> 8-alignment of the stack pointer on entry to the exception handler.
>
> 8...
>
>> +/* Align stack pointer if the guest wants that */
>> +if ((env->regs[13] & 4) && (env->v7m.ccr & R_V7M_CCR_STKALIGN_MASK)) {
>
> 4...
>
>>  env->regs[13] -= 4;
>
> Not alignment.  "&= -4"?

We know SP is always at least a multiple of 4. If it's already
a multiple of 8 then (sp & 4) will be false and we leave sp alone.
Otherwise it's a multiple of 4 but not 8, and subtracting 4
makes it 8 aligned (and we set the saved-XPSR bit to indicate
that we need to undo that on exception exit).

You could maybe rephrase the code to look a bit closer to the
v7M ARM ARM pseudocode, but the way it's written now isn't wrong,
so since this patch is only trying to say "do this if STKALIGN
is set rather than all the time" just adjusting the if conditional
seemed the best thing to me.

(The pseudocode checks for "do we need to align" with
"SP<2> AND forcealign", and does the alignment by
ANDing with a mask constructed with "NOT(ZeroExtend(forcealign:'00',32))".
So we do the check the same way it does, but use a subtract
rather than an AND-NOT. (Since we know that bit 2 must be set
then subtracting 4 and masking that bit to 0 are the same thing.)

thanks
-- PMM

Re: [Qemu-devel] [PATCH v9 04/11] msix: check msix_init's return value

2017-01-24 Thread Michael S. Tsirkin

On Tue, Jan 24, 2017 at 07:18:14PM +0100, Paolo Bonzini wrote:
> 
> 
> On 17/01/2017 17:01, Michael S. Tsirkin wrote:
> >>> Doesn't do it for megasas & hcd-xhci, later patches will fix them.
> >>>
> >>> Signed-off-by: Cao jin 
> > I don't like this one, frankly. That's a bunch of code duplication.
> > I suspect vfio is the only one who might reasonably get EINVAL here.
> > So how about e.g. msix_validate_and_init that doesn't assert and use that
> > from vfio, then switch msix_init to assert instead?
> 
> The names we use normally would be msix_init and msix_init_nofail.
> Would still require a change through the whole tree, but it's more
> consistent at least.
> 
> Paolo

This area has seen too much noise already but OK I guess.
Also, msix_init_exclusive_bar probably should assert
internally, no need for two versions.

-- 
MST

[Qemu-devel] [PATCH 06/10] armv7m: set CFSR.UNDEFINSTR on undefined instructions

2017-01-24 Thread Peter Maydell

From: Michael Davidsaver 

When we take an exception for an undefined instruction, set the
appropriate CFSR bit.

Signed-off-by: Michael Davidsaver 
[PMM: tweaked commit message, comment]
Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7dc30f5..e6b1c36 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6072,6 +6072,7 @@ void arm_v7m_cpu_do_interrupt(CPUState *cs)
 switch (cs->exception_index) {
 case EXCP_UDEF:
 armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_USAGE);
+env->v7m.cfsr |= R_V7M_CFSR_UNDEFINSTR_MASK;
 return;
 case EXCP_SWI:
 /* The PC already points to the next instruction.  */
-- 
2.7.4

Re: [Qemu-devel] [PATCH 05/10] armv7m: honour CCR.STACKALIGN on exception entry

2017-01-24 Thread Richard Henderson

On 01/24/2017 11:16 AM, Peter Maydell wrote:
> The CCR.STACKALIGN bit controls whether the CPU is supposed to force
> 8-alignment of the stack pointer on entry to the exception handler.

8...

> +/* Align stack pointer if the guest wants that */
> +if ((env->regs[13] & 4) && (env->v7m.ccr & R_V7M_CCR_STKALIGN_MASK)) {

4...

>  env->regs[13] -= 4;

Not alignment.  "&= -4"?


r~

Re: [Qemu-devel] [PULL 0/2] s390x fixes

2017-01-24 Thread Peter Maydell

On 24 January 2017 at 15:08, Cornelia Huck <cornelia.h...@de.ibm.com> wrote:
> The following changes since commit 48cef39bf39846ce4aaf79d4a2ae620373b3e181:
>
>   Merge remote-tracking branch 'remotes/rth/tags/pull-hppa-20170123' into 
> staging (2017-01-24 09:52:42 +)
>
> are available in the git repository at:
>
>   git://github.com/cohuck/qemu tags/s390x-20170124
>
> for you to fetch changes up to 0cf4d747cb8d053e6a6161aadfd3531fa1a62be1:
>
>   s390x/kvm: fix cmma reset for KVM (2017-01-24 15:47:31 +0100)
>
> 
> Two s390x fixes: One for the kvm.c build failure, and one for a bug
> that might cause random guest crashes with zeroed out pages on host
> kernels with working cmma (< 4.6 and likely >= 4.10).
>
> 
>
> Christian Borntraeger (1):
>   s390x/kvm: fix cmma reset for KVM
>
> Cornelia Huck (1):
>   s390x/kvm: include hw_accel.h instead of kvm.h
>
>  target/s390x/kvm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCHv2] vhost-user: delete chardev on cleanup

2017-01-24 Thread Eric Blake

On 01/24/2017 01:02 PM, Marc-André Lureau wrote:
> Remove the chardev implicitely when cleaning up the netdev. This

s/implicitely/implicitly/

> prevents from reusing the chardev since it would be in an incorrect
> state with the slave.
> 
> Fixes:
> https://bugzilla.redhat.com/show_bug.cgi?id=1256618
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  net/vhost-user.c | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Eric Blake 

> 
> diff --git a/net/vhost-user.c b/net/vhost-user.c
> index 7aff77ee4a..179939f5c1 100644
> --- a/net/vhost-user.c
> +++ b/net/vhost-user.c
> @@ -151,7 +151,10 @@ static void vhost_user_cleanup(NetClientState *nc)
>  s->vhost_net = NULL;
>  }
>  if (nc->queue_index == 0) {
> +CharDriverState *chr = qemu_chr_fe_get_driver(>chr);
> +
>  qemu_chr_fe_deinit(>chr);
> +qemu_chr_delete(chr);
>  }
>  
>  qemu_purge_queued_packets(nc);
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH 03/10] armv7m: add state for v7M CCR, CFSR, HFSR, DFSR, MMFAR, BFAR

2017-01-24 Thread Peter Maydell

Add the structure fields, VMState fields, reset code and macros for
the v7M system control registers CCR, CFSR, HFSR, DFSR, MMFAR and
BFAR.

Signed-off-by: Peter Maydell 
---
 target/arm/cpu.h | 54 
 target/arm/cpu.c |  7 +++
 target/arm/machine.c | 10 --
 3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index b2cc329..4b062d2 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -21,6 +21,7 @@
 #define ARM_CPU_H
 
 #include "kvm-consts.h"
+#include "hw/registerfields.h"
 
 #if defined(TARGET_AARCH64)
   /* AArch64 definitions */
@@ -405,6 +406,12 @@ typedef struct CPUARMState {
 uint32_t vecbase;
 uint32_t basepri;
 uint32_t control;
+uint32_t ccr; /* Configuration and Control */
+uint32_t cfsr; /* Configurable Fault Status */
+uint32_t hfsr; /* HardFault Status */
+uint32_t dfsr; /* Debug Fault Status Register */
+uint32_t mmfar; /* MemManage Fault Address */
+uint32_t bfar; /* BusFault Address */
 int exception;
 } v7m;
 
@@ -1086,6 +1093,53 @@ enum arm_cpu_mode {
 #define ARM_IWMMXT_wCGR2   10
 #define ARM_IWMMXT_wCGR3   11
 
+/* V7M CCR bits */
+FIELD(V7M_CCR, NONBASETHRDENA, 0, 1)
+FIELD(V7M_CCR, USERSETMPEND, 1, 1)
+FIELD(V7M_CCR, UNALIGN_TRP, 3, 1)
+FIELD(V7M_CCR, DIV_0_TRP, 4, 1)
+FIELD(V7M_CCR, BFHFNMIGN, 8, 1)
+FIELD(V7M_CCR, STKALIGN, 9, 1)
+FIELD(V7M_CCR, DC, 16, 1)
+FIELD(V7M_CCR, IC, 17, 1)
+
+/* V7M CFSR bits for MMFSR */
+FIELD(V7M_CFSR, IACCVIOL, 0, 1)
+FIELD(V7M_CFSR, DACCVIOL, 1, 1)
+FIELD(V7M_CFSR, MUNSTKERR, 3, 1)
+FIELD(V7M_CFSR, MSTKERR, 4, 1)
+FIELD(V7M_CFSR, MLSPERR, 5, 1)
+FIELD(V7M_CFSR, MMARVALID, 7, 1)
+
+/* V7M CFSR bits for BFSR */
+FIELD(V7M_CFSR, IBUSERR, 8 + 0, 1)
+FIELD(V7M_CFSR, PRECISERR, 8 + 1, 1)
+FIELD(V7M_CFSR, IMPRECISERR, 8 + 2, 1)
+FIELD(V7M_CFSR, UNSTKERR, 8 + 3, 1)
+FIELD(V7M_CFSR, STKERR, 8 + 4, 1)
+FIELD(V7M_CFSR, LSPERR, 8 + 5, 1)
+FIELD(V7M_CFSR, BFARVALID, 8 + 7, 1)
+
+/* V7M CFSR bits for UFSR */
+FIELD(V7M_CFSR, UNDEFINSTR, 16 + 0, 1)
+FIELD(V7M_CFSR, INVSTATE, 16 + 1, 1)
+FIELD(V7M_CFSR, INVPC, 16 + 2, 1)
+FIELD(V7M_CFSR, NOCP, 16 + 3, 1)
+FIELD(V7M_CFSR, UNALIGNED, 16 + 8, 1)
+FIELD(V7M_CFSR, DIVBYZERO, 16 + 9, 1)
+
+/* V7M HFSR bits */
+FIELD(V7M_HFSR, VECTTBL, 1, 1)
+FIELD(V7M_HFSR, FORCED, 30, 1)
+FIELD(V7M_HFSR, DEBUGEVT, 31, 1)
+
+/* V7M DFSR bits */
+FIELD(V7M_DFSR, HALTED, 0, 1)
+FIELD(V7M_DFSR, BKPT, 1, 1)
+FIELD(V7M_DFSR, DWTTRAP, 2, 1)
+FIELD(V7M_DFSR, VCATCH, 3, 1)
+FIELD(V7M_DFSR, EXTERNAL, 4, 1)
+
 /* If adding a feature bit which corresponds to a Linux ELF
  * HWCAP bit, remember to update the feature-bit-to-hwcap
  * mapping in linux-user/elfload.c:get_elf_hwcap().
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 6395d5a..c804f59 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -188,6 +188,13 @@ static void arm_cpu_reset(CPUState *s)
 uint8_t *rom;
 
 env->daif &= ~PSTATE_I;
+
+/* The reset value of this bit is IMPDEF, but ARM recommends
+ * that it resets to 1, so QEMU always does that rather than making
+ * it dependent on CPU model.
+ */
+env->v7m.ccr = R_V7M_CCR_STKALIGN_MASK;
+
 rom = rom_ptr(0);
 if (rom) {
 /* Address zero is covered by ROM which hasn't yet been
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 8ed24bf..49e09a8 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -96,13 +96,19 @@ static bool m_needed(void *opaque)
 
 static const VMStateDescription vmstate_m = {
 .name = "cpu/m",
-.version_id = 2,
-.minimum_version_id = 2,
+.version_id = 3,
+.minimum_version_id = 3,
 .needed = m_needed,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(env.v7m.vecbase, ARMCPU),
 VMSTATE_UINT32(env.v7m.basepri, ARMCPU),
 VMSTATE_UINT32(env.v7m.control, ARMCPU),
+VMSTATE_UINT32(env.v7m.ccr, ARMCPU),
+VMSTATE_UINT32(env.v7m.cfsr, ARMCPU),
+VMSTATE_UINT32(env.v7m.hfsr, ARMCPU),
+VMSTATE_UINT32(env.v7m.dfsr, ARMCPU),
+VMSTATE_UINT32(env.v7m.mmfar, ARMCPU),
+VMSTATE_UINT32(env.v7m.bfar, ARMCPU),
 VMSTATE_INT32(env.v7m.exception, ARMCPU),
 VMSTATE_END_OF_LIST()
 }
-- 
2.7.4

[Qemu-devel] [PATCH] PPC: MMU compatibility check.

2017-01-24 Thread Valentin Plotkin




Hi everyone,

I looked at the "qemu-system-ppc -nographic -cpu G2leGP3 -M ppce500" on 
the BiteSizedTasks page. The segfault was caused by machine initialization 
code which expected a certain MMU model, checked, so unused SPR were read, 
returning zeros. bamboo and virtex machines are affected as well, but it 
doesn't always cause segfault, usually running into unmapped memory and 
failing somewhat more nicely.


I added the checks. It would be possible to add support for other MMU 
models, but I'm not sure if there is any point (would any guest OS support 
mutually exclusive CPU and machine)?


---
 hw/ppc/e500.c  | 5 +
 hw/ppc/ppc440_bamboo.c | 5 +
 hw/ppc/virtex_ml507.c  | 5 +
 3 files changed, 15 insertions(+)

diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index cf8b122..cd352f6 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -631,6 +631,11 @@ static uint64_t mmubooke_initial_mapsize(CPUPPCState 
*env)


 static void mmubooke_create_initial_mapping(CPUPPCState *env)
 {
+if(env->mmu_model!=POWERPC_MMU_BOOKE206){
+fprintf(stderr,"MMU model %i not supported by this machine.\n", 
env->mmu_model);

+exit(-1);
+}
+
 ppcmas_tlb_t *tlb = booke206_get_tlbm(env, 1, 0, 0);
 hwaddr size;
 int ps;
diff --git a/hw/ppc/ppc440_bamboo.c b/hw/ppc/ppc440_bamboo.c
index 5c535b1..61e7028 100644
--- a/hw/ppc/ppc440_bamboo.c
+++ b/hw/ppc/ppc440_bamboo.c
@@ -124,6 +124,11 @@ static void 
mmubooke_create_initial_mapping(CPUPPCState *env,

  target_ulong va,
  hwaddr pa)
 {
+if(env->mmu_model!=POWERPC_MMU_BOOKE){
+fprintf(stderr,"MMU model %i not supported by this machine.\n", 
env->mmu_model);

+exit(-1);
+}
+
 ppcemb_tlb_t *tlb = >tlb.tlbe[0];

 tlb->attr = 0;
diff --git a/hw/ppc/virtex_ml507.c b/hw/ppc/virtex_ml507.c
index b97d966..4041ff3 100644
--- a/hw/ppc/virtex_ml507.c
+++ b/hw/ppc/virtex_ml507.c
@@ -69,6 +69,11 @@ static void mmubooke_create_initial_mapping(CPUPPCState 
*env,

  target_ulong va,
  hwaddr pa)
 {
+if(env->mmu_model!=POWERPC_MMU_BOOKE){
+fprintf(stderr,"MMU model %i not supported by this machine.\n", 
env->mmu_model);

+exit(-1);
+}
+
 ppcemb_tlb_t *tlb = >tlb.tlbe[0];

 tlb->attr = 0;
--
2.5.5

Valentin Plotkin
calib...@sdf.org
SDF Public Access UNIX System - http://sdf.org

1 2 3 4 5 >

1 - 100 of 467 matches

Mail list logo