date:20160906

Re: [Qemu-devel] [PATCH v2 1/1] block-backend: allow flush on devices with open tray

2016-09-06 Thread Markus Armbruster

John Snow  writes:

> On 09/05/2016 05:15 AM, Markus Armbruster wrote:
>> John Snow  writes:
>>
>>> On 09/02/2016 01:44 AM, Markus Armbruster wrote:
 John Snow  writes:

> If a device still has an attached BDS because the medium has not yet
> been removed, we will be unable to migrate to a new host because
> blk_flush will return an error for that backend.
>
> Replace the call to blk_is_available to blk_is_inserted to weaken
> the check and allow flushes from the backend to work, while still
> disallowing flushes from the frontend/device model to work.
>
> This fixes a regression present in 2.6.0 caused by the following commit:
> fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
> block: Move some bdrv_*_all() functions to BB
>
> Signed-off-by: John Snow 
> ---
>  block/block-backend.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index effa038..36a32c3 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1058,7 +1058,7 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, 
> int64_t offset,
>  BlockAIOCB *blk_aio_flush(BlockBackend *blk,
>BlockCompletionFunc *cb, void *opaque)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
>  }
>
> @@ -1118,7 +1118,7 @@ int blk_co_pdiscard(BlockBackend *blk, int64_t 
> offset, int count)
>
>  int blk_co_flush(BlockBackend *blk)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return -ENOMEDIUM;
>  }
>
> @@ -1127,7 +1127,7 @@ int blk_co_flush(BlockBackend *blk)
>
>  int blk_flush(BlockBackend *blk)
>  {
> -if (!blk_is_available(blk)) {
> +if (!blk_is_inserted(blk)) {
>  return -ENOMEDIUM;
>  }

 Naive question: should we flush before we open the tray?

>>>
>>> Naive, but long-winded answer:
>>>
>>> There are two types of flushes to me, conceptually:
>>>
>>> Frontend and Backend.
>>>
>>> Frontend would be a request from the quest to flush. If the medium in
>>> question is absent, this should rightly fail. I do expect this to be
>>> handled at the device layer.
>>>
>>> Backend is a request from QEMU for various reasons, like needing to
>>> drain the queue for migration or savevm.
>>>
>>> What's happening here is that we have a backend request to flush a
>>> medium that is inaccessible to the guest
>>
>> Assuming we caught frontend requests at the device layer already.
>>
>>>  -- The flush all routine is
>>> ignorant of this fact. So we get a migration request with an open tray
>>> (because the user had opened it some time prior perhaps, and left it
>>> open unwittingly) and it fails because its inaccessible to the
>>> guest. Migration fails as a result.
>>>
>>> That seems wrong to me. A few ways to fix it:
>>>
>>> (1) Have internal flush requests be aware of the fact that there's
>>> nothing to flush here: this is a read-only media and we could skip it.
>>
>> Returning successfully because there's nothing to flush makes sense to
>> me.
>>
>
> Only slightly more involved. At the time I wrote the first patch, I
> don't think Denis' improvements had gone in yet, and I still haven't
> really looked at them to see how easy they are to co-opt here.
>
>>> (2) Allow flush to fail in a non-fatal way for cases where we need to
>>> flush, but cannot (-ENOMEDIUM) and where it would rightly fail on a
>>> real machine.
>>
>> I don't like -ENOMEDIUM when there's nothing to flush.
>>
>>> (3) Just allow flushes to work on devices not visible to the guest,
>>> which is what I've done here. Internal requests should work, Guest
>>> requests should fail.
>>
>> I guess that's okay, ...
>>
>
> Conceptually, QEMU should be allowed to flush things to its internal
> drive abstractions regardless of what it looks like to the guest
> whenever it decides it is necessary to.
>
> In practice, I don't know how clean we are about separating out who is
> requesting the flush to know which to deny.

Exactly.

>>> I was briefly concerned about "What if this lets us flush something we
>>> shouldn't?" and Max's take was "That doesn't seem so bad. CDROMs are
>>> RO anyway."
>>
>> ... even though I don't buy Max's reason.  Writable removable media
>> exist.  The argument I can buy is that internal requests are
>> fundamentally different from external requests.
>>
>
> Yeah, I think any such flush request should be denied by the device model.
>
>>> So I went with the easiest option here.
>>>
>>> To answer your question more directly, we aren't choosing to
>>> eject-then-flush, the user is. I can't move the flush before the eject
>>> unless we flush-on-eject internally, then mark the de

Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-06 Thread Tian, Kevin

> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Wednesday, September 07, 2016 1:41 AM
> 
> On Sat, 3 Sep 2016 22:04:56 +0530
> Kirti Wankhede  wrote:
> 
> > On 9/3/2016 3:18 AM, Paolo Bonzini wrote:
> > >
> > >
> > > On 02/09/2016 20:33, Kirti Wankhede wrote:
> > >>  We could even do:
> > 
> >  echo $UUID1:$GROUPA > create
> > 
> >  where $GROUPA is the group ID of a previously created mdev device into
> >  which $UUID1 is to be created and added to the same group.
> > >> 
> > >
> > > From the point of view of libvirt, I think I prefer Alex's idea.
> > >  could be an additional element in the nodedev-create XML:
> > >
> > > 
> > >   my-vgpu
> > >   pci__86_00_0
> > >   
> > > 
> > > 0695d332-7831-493f-9e71-1c85c8911a08
> > > group1
> > >   
> > > 
> > >
> > > (should group also be a UUID?)
> > >
> >
> > No, this should be a unique number in a system, similar to iommu_group.
> 
> Sorry, just trying to catch up on this thread after a long weekend.
> 
> We're talking about iommu groups here, we're not creating any sort of
> parallel grouping specific to mdev devices.  This is why my example
> created a device and then required the user to go find the group number
> given to that device in order to create another device within the same
> group.  iommu group numbering is not within the user's control and is
> not a uuid.  libvirt can refer to the group as anything it wants in the
> xml, but the host group number is allocated by the host, not under user
> control, is not persistent.  libvirt would just be giving it a name to
> know which devices are part of the same group.  Perhaps the runtime xml
> would fill in the group number once created.
> 
> There were also a lot of unanswered questions in my proposal, it's not
> clear that there's a standard algorithm for when mdev devices need to
> be grouped together.  Should we even allow groups to span multiple host
> devices?  Should they be allowed to span devices from different
> vendors?

I think we should limit the scope of iommu group for mdev here, which 
better only contains mdev belonging to same parent device. Spanning
multiple host devices (regardless of whether from different vendors)
are grouped based on physical isolation granularity. Better not to mix
two levels together. I'm not sure whether NVIDIA has requirement to
start all vGPUs together even when they come from different parent
devices. Hope not...

> 
> If we imagine a scenario of a group composed of a mix of Intel and
> NVIDIA vGPUs, what happens when an Intel device is opened first?  The
> NVIDIA driver wouldn't know about this, but it would know when the
> first NVIDIA device is opened and be able to establish p2p for the
> NVIDIA devices at that point.  Can we do what we need with that model?
> What if libvirt is asked to hot-add an NVIDIA vGPU?  It would need to
> do a create on the NVIDIA parent device with the existing group id, at
> which point the NVIDIA vendor driver could fail the device create if
> the p2p setup has already been done.  The Intel vendor driver might
> allow it.  Similar to open, the last close of the mdev device for a
> given vendor (which might not be the last close of mdev devices within
> the group) would need to trigger the offline process for that vendor.

I assume iommu group is for minimal isolation granularity. In higher
level we have VFIO container which could deliver both Intel vGPUs
and NVIDIA vGPUs to the same VM. Intel vGPUs each have its own
iommu group, while NVIDIA vGPUs of the same parent device may
be in one group.

Thanks
Kevin

Re: [Qemu-devel] [RFC v2 0/4] adding mdev bus and vfio support

2016-09-06 Thread Jike Song

On 09/07/2016 11:38 AM, Neo Jia wrote:
> On Wed, Sep 07, 2016 at 10:22:26AM +0800, Jike Song wrote:
>> On 09/02/2016 11:03 PM, Alex Williamson wrote:
>>> On Fri,  2 Sep 2016 16:16:08 +0800
>>> Jike Song  wrote:
>>>
 This patchset is based on NVidia's "Add Mediated device support" series, 
 version 6:

http://www.spinics.net/lists/kvm/msg136472.html
>>>
>>>
>>> Hi Jike,
>>>
>>> I'm thrilled by your active participation here, but I'm confused which
>>> versions I should be reviewing and where the primary development is
>>> going.  Kirti sent v7 a week ago, so I would have expected a revision
>>> based on that rather than a re-write based on v6 plus incorporation of a
>>> few of Kirti's patches directly.
>>
>> Hi Alex,
>>
>> [Sorry! replied this on Monday but it was silently dropped by the our 
>> firewall]
>>
>>
>>
>> The v1 of this patchset was send as incremental ones, basing on Nvidia's v6, 
>> to
>> demonstrate how is it possible and beneficial to:
>>
>>  1, Introduce an independent device between physical and mdev;
>>  2, Simplify vfio-mdev and make it the most flexible for vendor drivers;
>>
>> Unfortunately neither was understood or adopted in v7:
>>
>>  http://www.spinics.net/lists/kvm/msg137081.html
>>
>> So here came the v2, as a standalone series, to give a whole and straight
>> demonstration. The reason of still basing on v6:
>>
>>  - Addressed all v6 comments (except the iommu part);
>>  - There is no comments yet for v7 (except the sysfs ones);
>>
>>
>>
>>> I liked the last version of these
>>> changes a lot, but we need to figure out how to combine development
>>> because we do not have infinite cycles for review available :-\  Thanks!
>>
>> Fully understand.
>>
>> Here is the dilemma: v6 is an obsolete version to work upon, v7 is still not
>> at the direction we prefer. 
> 
> Hi Jike,
> 
> I wish I could meet you in person in KVM forum couple weeks ago so we can 
> have a
> better discussion.

I wish I could have that opportunity, too!

> We are trying our best to accommodate almost all requirements / comments from 
> use cases and code reviews while keeping little (or none) architectural 
> changes
> between revisions.

Yes I saw that, there was little architectural change from v6 to v7,
that's what I will argue for :)

>> We would be highly glad and thankful if Neo/Kirti
>> would adopt the code in their next version, which will certainly form a
>> more simple and consolidated base for future co-development; otherwise
>> and we could at least discuss the concerns, in case of any.
>>
> 
> As I have said in my previous response to you, if you have any questions about
> adopting the framework that we have developed, you are very welcome to
> comment/speak out on the code review thread like others. And if it is 
> reasonable
> request and won't break other vendors' use case, we will adopt it (one example
> is the online file and removing the mdev pci dependency).
> 

Not limited to having questions about adoption, right? :)

We do think the framework itself had too much unnecessary logic and
imposed limitations to vendor drivers, also it's clearly possible to be
simplified.

> Just some update for you regarding the v7 patches, currently we are very
> actively trying to lock down the sysfs and management interfaces discussion.
> 
> So, if you would like to make the upstream happen sooner, please join us in 
> the
> v7 and following patch discussion instead of rewriting them.
> 

So as you said, I would comment on the v7 series to propose both architectural
and implementation changes, hoping this will help more.


--
Thanks,
Jike

Re: [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution

2016-09-06 Thread Richard Henderson


On 09/06/2016 10:43 PM, Paolo Bonzini wrote:



On 07/09/2016 06:03, Richard Henderson wrote:



+if (mr->global_locking) {
+qemu_mutex_lock_iothread();
+locked = true;
+}
 memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
 iotlbentry->attrs);
+if (locked) {
+qemu_mutex_unlock_iothread();
+}


I'm not keen on this pattern.

(1) Why not use recursive locks?


Probably I'm biased by looking at this code too long, but... how would
they help?


You wouldn't check to see if the lock is already taken.


r~

Re: [Qemu-devel] [PATCH 2/3] memory: add iommu_notify_flag

2016-09-06 Thread David Gibson

On Wed, Sep 07, 2016 at 02:34:19PM +0800, Peter Xu wrote:
> On Wed, Sep 07, 2016 at 03:44:19PM +1000, David Gibson wrote:
> > > For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> > > "ADDITION" is nowhere better...
> > 
> > Right.. this brings up a good point.
> > 
> > Changing a mapping (i.e. overwriting an existing mapping with a
> > different one) would also need notification, even on x86, no?  Since
> > it implicitly invalidates the previous mapping.
> > 
> > I'm guessing the guest will avoid this by always unmapping before it
> > maps.  We still need to consider this possibility when designing the
> > notifier interface though.
> > 
> > It seems the real notification triggers here are:
> > * map - something is mapped which previously wasn't
> > * unmap - something is no longer mapped which was before
> > 
> > Note that whether the second needs to be triggered depends on the
> > *previous* state of that IOBA range, *not* on the permissions of the
> > new mapping (if any).
> > 
> > A "change" - replacing one mapping with another should count as both a
> > "map" and "unmap" event.
> 
> Yeah... For MAP/UNMAP, it is strange in another way: e.g. for vhost,
> it doesn't care about map/unmap, it cares about invalidated cache.

I think caring about invalidated cache *is* caring about unmap.  It
doesn't matter whether the new mapping is something or nothing - if
the old mapping is no longer valid, you need to invalidate the cache,
yes?

> So
> IIUC this is a question about "naming" but not the implementations...
> I suppose it is really a matter of taste, and both work for me (either
> INVALIDATION/CHANGE or UNMAP/MAP).

No.. it is a question of implementation.  My point is that I don't
think the new permission is sufficient information to let you know if
a notification is necessary.  You need to know if there was an
existing mapping at that IOBA.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v4 0/4] Introduce error_report_{fatal|abort}

2016-09-06 Thread Fam Zheng

On Wed, 09/07 14:06, Peter Xu wrote:
> v4 changes:
> - remove two standard headers since they are included in osdep.h
>   already [Fam]
> - make sure it passes build on all platforms (no --target-list
>   specified during configure)

Reviewed-by: Fam Zheng

Re: [Qemu-devel] [PATCH 2/3] memory: add iommu_notify_flag

2016-09-06 Thread Peter Xu

On Wed, Sep 07, 2016 at 03:44:19PM +1000, David Gibson wrote:
> > For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> > "ADDITION" is nowhere better...
> 
> Right.. this brings up a good point.
> 
> Changing a mapping (i.e. overwriting an existing mapping with a
> different one) would also need notification, even on x86, no?  Since
> it implicitly invalidates the previous mapping.
> 
> I'm guessing the guest will avoid this by always unmapping before it
> maps.  We still need to consider this possibility when designing the
> notifier interface though.
> 
> It seems the real notification triggers here are:
> * map - something is mapped which previously wasn't
> * unmap - something is no longer mapped which was before
> 
> Note that whether the second needs to be triggered depends on the
> *previous* state of that IOBA range, *not* on the permissions of the
> new mapping (if any).
> 
> A "change" - replacing one mapping with another should count as both a
> "map" and "unmap" event.

Yeah... For MAP/UNMAP, it is strange in another way: e.g. for vhost,
it doesn't care about map/unmap, it cares about invalidated cache. So
IIUC this is a question about "naming" but not the implementations...
I suppose it is really a matter of taste, and both work for me (either
INVALIDATION/CHANGE or UNMAP/MAP).

Thanks,

-- peterx

Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines

2016-09-06 Thread Gerd Hoffmann

  Hi,

> >> ports, if that's allowed). For example:
> >>
> >> -  1-32 ports needed: use root ports only
> >>
> >> - 33-64 ports needed: use 31 root ports, and one switch with 2-32
> >> downstream ports

I expect you rarely need any switches.  You can go multifunction with
the pcie root ports.  Which is how physical q35 works too btw, typically
the root ports are on slot 1c for intel chipsets:

nilsson root ~# lspci -s1c
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset
Family PCI Express Root Port 3 (rev c4)

Root bus has 32 slots, a few are taken (host bridge @ 00.0, lpc+sata @
1f.*, pci bridge @ 1e.0, maybe vga @ 01.0), leaving 28 free slots.  With
8 functions each you can have up to 224 root ports without any switches,
and you have not many pci bus numbers left until you hit the 256 busses
limit ...

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 1/1] mirror: fix improperly filled copy_bitmap for mirror block job

2016-09-06 Thread Jeff Cody

On Tue, Sep 06, 2016 at 06:11:19PM +0300, Denis V. Lunev wrote:
> bdrv_is_allocated_above() returns true in the case if qcow2 even for
> completely zeroed areas as BDRV_BLOCK_ALLOCATED flag is set in both
> cases.

Hi Denis,

Not just qcow2.  BDRV_BLOCK_ALLOCATED for a layer means the content of the
block is defined by that layer (even if that is a zero block, that doesn't
mean it is unallocated).  This applies to all image formats.

> Though we have completely zeroed out the image just above or it was
> already zeroed.

This is only true if we are in 'sync = full' mode.

> The patch stops using bdrv_is_allocated_above() wrapper and switches to
> bdrv_get_block_status_above() to distinguish zeroed areas and areas with
> data to avoid extra IO operations, which could be VERY slow.

For 'sync = top', we need to make sure to appropriately allocate the sector,
even if it is BDRV_BLOCK_ZERO.

> Signed-off-by: Denis V. Lunev 
> CC: Jeff Cody 
> CC: Kevin Wolf 
> CC: Max Reitz 
> ---
>  block/mirror.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index e0b3f41..87edbd8 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -552,7 +552,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob 
> *s)
>  BlockDriverState *base = s->base;
>  BlockDriverState *bs = blk_bs(s->common.blk);
>  BlockDriverState *target_bs = blk_bs(s->target);
> -int ret, n;
> +int n;
>  
>  end = s->bdev_length / BDRV_SECTOR_SIZE;
>  
> @@ -590,6 +590,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob 
> *s)
>  /* Just to make sure we are not exceeding int limit. */
>  int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
>   end - sector_num);
> +int64_t status;
> +BlockDriverState *file;
>  
>  mirror_throttle(s);
>  
> @@ -597,13 +599,14 @@ static int coroutine_fn 
> mirror_dirty_init(MirrorBlockJob *s)
>  return 0;
>  }
>  
> -ret = bdrv_is_allocated_above(bs, base, sector_num, nb_sectors, &n);
> -if (ret < 0) {
> -return ret;
> +status = bdrv_get_block_status_above(bs, base, sector_num,
> + nb_sectors, &n, &file);
> +if (status < 0) {
> +return status;
>  }
>  
>  assert(n > 0);
> -if (ret == 1) {
> +if (status & BDRV_BLOCK_DATA) {

I think this patch would work if this was changed to something like this:

 mark_dirty = base == NULL ? status & BDRV_BLOCK_DATA : /* 'full' */
 status & BDRV_BLOCK_ALLOCATED;

 if (mark_dirty) {


Thanks,
Jeff

>  bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
>  }



>  sector_num += n;
> -- 
> 2.7.4
>

[Qemu-devel] [PATCH v4 4/4] error-report: leverage error_report_abort()

2016-09-06 Thread Peter Xu

Two cases that can leverage error_report_abort():

Case 1:

  error_report(...);
  abort();

Case 2:

  error_setg(&error_abort, ...);

This patch converts these cases to use error_report_abort().

Script error_report_abort.cocci is added to automate the convertion.

Signed-off-by: Peter Xu 
---
 block/qcow2.c   |  4 ++--
 hw/block/fdc.c  |  6 +++---
 hw/i386/kvm/pci-assign.c|  4 ++--
 hw/intc/xics.c  |  5 ++---
 hw/pci/pci.c|  3 +--
 hw/ppc/spapr_drc.c  |  2 +-
 hw/vfio/platform.c  | 10 --
 net/netmap.c|  5 ++---
 net/tap-linux.c | 10 ++
 scripts/coccinelle/error_report_abort.cocci | 14 ++
 target-ppc/translate_init.c |  3 +--
 vl.c|  7 +++
 12 files changed, 41 insertions(+), 32 deletions(-)
 create mode 100644 scripts/coccinelle/error_report_abort.cocci

diff --git a/block/qcow2.c b/block/qcow2.c
index 91ef4df..d491564 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2254,8 +2254,8 @@ static int qcow2_create2(const char *filename, int64_t 
total_size,
 goto out;
 
 } else if (ret != 0) {
-error_report("Huh, first cluster in empty image is already in use?");
-abort();
+error_report_abort("Huh, first cluster in empty image is "
+   "already in use?");
 }
 
 /* Create a full header (including things like feature table) */
diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index f73af7d..14d2f20 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -372,9 +372,9 @@ static int pick_geometry(FDrive *drv)
 
 /* No match of any kind found -- fd_format is misconfigured, abort. */
 if (match == -1) {
-error_setg(&error_abort, "No candidate geometries present in table "
-   " for floppy drive type '%s'",
-   FloppyDriveType_lookup[drv->drive]);
+error_report_abort("No candidate geometries present in table "
+   " for floppy drive type '%s'",
+   FloppyDriveType_lookup[drv->drive]);
 }
 
 parse = &(fd_formats[match]);
diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 8238fbc..e27f307 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -465,8 +465,8 @@ static void assigned_dev_register_regions(PCIRegion 
*io_regions,
  * so should return EINVAL for a 3 byte read */
 ret = pread(pci_dev->v_addrs[i].region->resource_fd, &val, 3, 0);
 if (ret >= 0) {
-error_report("Unexpected return from I/O port read: %d", ret);
-abort();
+error_report_abort("Unexpected return from I/O port read: %d",
+   ret);
 } else if (errno != EINVAL) {
 error_report("Kernel doesn't support ioport resource "
  "access, hiding this region.");
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index cd48f42..86e5ddc8 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -84,9 +84,8 @@ void xics_cpu_setup(XICSState *xics, PowerPCCPU *cpu)
 break;
 
 default:
-error_report("XICS interrupt controller does not support this CPU "
- "bus model");
-abort();
+error_report_abort("XICS interrupt controller does not support "
+   "this CPU bus model");
 }
 }
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 0642def..f54b43c 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2606,8 +2606,7 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
 msg = msi_get_message(dev, vector);
 } else {
 /* Should never happen */
-error_report("%s: unknown interrupt type", __func__);
-abort();
+error_report_abort("%s: unknown interrupt type", __func__);
 }
 return msg;
 }
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 26a0679..8d59a8d 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -327,7 +327,7 @@ static void prop_get_fdt(Object *obj, Visitor *v, const 
char *name,
 break;
 }
 default:
-error_setg(&error_abort, "device FDT in unexpected state: %d", 
tag);
+error_report_abort("device FDT in unexpected state: %d", tag);
 }
 fdt_offset = fdt_offset_next;
 } while (fdt_depth != 0);
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a559e7b..1190c92 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -339,9 +339,8 @@ static void vfio_start_eventfd_injection(SysBusDevice 
*sbdev, qemu_irq irq)
 
 ret = vfio_set_trigger_eventfd(intp, vfio_intp_interrupt);
 if (ret) {
-error_report("vfio: failed to start eventfd signalin

[Qemu-devel] [PATCH v4 0/4] Introduce error_report_{fatal|abort}

2016-09-06 Thread Peter Xu

v4 changes:
- remove two standard headers since they are included in osdep.h
  already [Fam]
- make sure it passes build on all platforms (no --target-list
  specified during configure)

v3 changes:
- implement error_report_fatal using function [Markus]
- provide error_report_abort as well in seperate patch [Markus, Fam]

We have many use cases that first print some error messages, then
quit (by either exit() or abort()). This series introduce two helper
functions for that.

The old formats are mostly one of the following:

Case one:

  error_report(...);
  exit(1|EXIT_FAILURE) | abort();

Case two:

  error_setg(&error_{fatal|abort}, ...);

And we can convert either of the above cases into:

  error_report_{fatal|abort}(...);

Two coccinelle scripts are created to help automate the work, plus
some manual tweaks:

1. very long strings, fix for over-80-chars issues, to make sure it
   passes checkpatch.pl.

2. add "return XXX" for some non-void retcode functions.

The first two patches introduce the functions. The latter two apply
them.

Peter Xu (4):
  error-report: provide error_report_fatal()
  error-report: provide error_report_abort()
  error-report: leverage error_report_fatal()
  error-report: leverage error_report_abort()

 arch_init.c |   8 +-
 block/qcow2.c   |   4 +-
 bootdevice.c|   4 +-
 cpus.c  |   3 +-
 device_tree.c   |  61 +++---
 hw/9pfs/9p.c|   3 +-
 hw/alpha/dp264.c|  14 +-
 hw/arm/armv7m.c |   3 +-
 hw/arm/digic_boards.c   |   6 +-
 hw/arm/fsl-imx6.c   |   5 +-
 hw/arm/highbank.c   |   6 +-
 hw/arm/raspi.c  |   7 +-
 hw/arm/sabrelite.c  |   9 +-
 hw/arm/strongarm.c  |   6 +-
 hw/arm/sysbus-fdt.c |  41 ++--
 hw/arm/vexpress.c   |  14 +-
 hw/arm/virt.c   |  45 ++---
 hw/arm/xlnx-ep108.c |  10 +-
 hw/block/fdc.c  |   6 +-
 hw/block/tc58128.c  |   3 +-
 hw/block/virtio-blk.c   |   9 +-
 hw/char/exynos4210_uart.c   |   9 +-
 hw/core/machine.c   |   7 +-
 hw/core/platform-bus.c  |   8 +-
 hw/i386/intel_iommu.c   |   5 +-
 hw/i386/kvm/pci-assign.c|   4 +-
 hw/i386/pc.c|  36 ++--
 hw/i386/pc_piix.c   |   3 +-
 hw/i386/x86-iommu.c |   5 +-
 hw/ide/core.c   |   5 +-
 hw/intc/ioapic.c|   5 +-
 hw/intc/xics.c  |   5 +-
 hw/intc/xics_kvm.c  |  17 +-
 hw/m68k/an5206.c|   3 +-
 hw/microblaze/boot.c|   5 +-
 hw/mips/mips_fulong2e.c |   3 +-
 hw/mips/mips_jazz.c |   3 +-
 hw/mips/mips_malta.c|  26 ++-
 hw/mips/mips_mipssim.c  |   6 +-
 hw/net/virtio-net.c |  28 ++-
 hw/nvram/fw_cfg.c   |   5 +-
 hw/pci/pci.c|   8 +-
 hw/ppc/e500.c   |   5 +-
 hw/ppc/mac_newworld.c   |  14 +-
 hw/ppc/mac_oldworld.c   |  18 +-
 hw/ppc/mpc8544ds.c  |   4 +-
 hw/ppc/ppc405_boards.c  |  14 +-
 hw/ppc/prep.c   |  11 +-
 hw/ppc/spapr.c  |  89 -
 hw/ppc/spapr_drc.c  |   2 +-
 hw/ppc/virtex_ml507.c   |   5 +-
 hw/scsi/vhost-scsi.c|  11 +-
 hw/scsi/virtio-scsi.c   |  10 +-
 hw/sh4/shix.c   |   3 +-
 hw/smbios/smbios.c  |  42 ++---
 hw/sparc/sun4m.c|  19 +-
 hw/tricore/tricore_testboard.c  |   8 +-
 hw/unicore32/puv3.c |   9 +-
 hw/vfio/platform.c  |  10 +-
 hw/virtio/virtio.c  |  45 ++---
 hw/xtensa/sim.c |   5 +-
 hw/xtensa/xtfpga.c  |  17 +-
 include/qemu/error-report.h |   2 +
 net/netmap.c|   5 +-
 net/tap-linux.c |  10 +-
 numa.c  |  22 +--
 qemu-img.c  |  16 +-
 qemu-io.c   |   9 +-
 qemu-nbd.c  | 116

[Qemu-devel] [PATCH v4 2/4] error-report: provide error_report_abort()

2016-09-06 Thread Peter Xu

A twin for error_report_fatal(), for programming errors.

Signed-off-by: Peter Xu 
---
 include/qemu/error-report.h |  1 +
 util/qemu-error.c   | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h
index eb8260e..51a6f31 100644
--- a/include/qemu/error-report.h
+++ b/include/qemu/error-report.h
@@ -37,6 +37,7 @@ void error_set_progname(const char *argv0);
 void error_vreport(const char *fmt, va_list ap) GCC_FMT_ATTR(1, 0);
 void error_report(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
 void error_report_fatal(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
+void error_report_abort(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
 const char *error_get_progname(void);
 extern bool enable_timestamp_msg;
 
diff --git a/util/qemu-error.c b/util/qemu-error.c
index 684f543..59726ab 100644
--- a/util/qemu-error.c
+++ b/util/qemu-error.c
@@ -248,3 +248,14 @@ void error_report_fatal(const char *fmt, ...)
 
 exit(1);
 }
+
+void error_report_abort(const char *fmt, ...)
+{
+va_list ap;
+
+va_start(ap, fmt);
+error_vreport(fmt, ap);
+va_end(ap);
+
+abort();
+}
-- 
2.7.4

Re: [Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps

2016-09-06 Thread David Gibson

On Wed, Sep 07, 2016 at 01:32:22PM +0800, Peter Xu wrote:
> IOMMU Notifier list is used for notifying IO address mapping changes.
> Currently VFIO is the only user.
> 
> However it is possible that future consumer like vhost would like to
> only listen to part of its notifications (e.g., cache invalidations).
> 
> This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer
> grained control of it.
> 
> IOMMUNotifier contains a bitfield for the notify consumer describing
> what kind of notification it is interested in. Currently two kinds of
> notifications are defined:
> 
> - IOMMU_NOTIFIER_CHANGE:   for entry changes (additions)
> - IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates)

As noted on the other thread, I think the correct options for your
bitmap here are "map" and "unmap".  Which are triggered depends on the
permissions / existence of the *previous* mapping, as well as the new
one.

You could in fact have "map-read", "map-write", "unmap-read",
"unmap-write" as separate bitmap options (e.g. changing a mapping from
RO to WO would be both a read-unmap and write-map event).  I can't see
any real use for that though, so just "map" and "unmap" are probably
sufficient.

> When registering the IOMMU notifier, we need to specify one or multiple
> capability bit(s) to listen to.
> 
> When notifications are triggered, it will be checked against the
> notifier's capability bits, and only notifiers with registered bits will
> be notified.
> 
> Signed-off-by: Peter Xu 
> ---
>  hw/vfio/common.c  |  3 ++-
>  include/exec/memory.h | 39 ---
>  include/hw/vfio/vfio-common.h |  2 +-
>  memory.c  | 37 -
>  4 files changed, 63 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index b313e7c..b0cea2c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -293,7 +293,7 @@ static bool 
> vfio_listener_skipped_section(MemoryRegionSection *section)
> section->offset_within_address_space & (1ULL << 63);
>  }
>  
> -static void vfio_iommu_map_notify(Notifier *n, void *data)
> +static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
>  {
>  VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>  VFIOContainer *container = giommu->container;
> @@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener 
> *listener,
> section->offset_within_region;
>  giommu->container = container;
>  giommu->n.notify = vfio_iommu_map_notify;
> +giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL;

"caps" isn't really right.  It's a *requirement* that VFIO get all the
notifications, not a capability.  "caps" would only make sense on the
other side (the vIOMMU implementation).

>  QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
>  
>  memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 3e4d416..92f14db 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -67,6 +67,28 @@ struct IOMMUTLBEntry {
>  IOMMUAccessFlags perm;
>  };
>  
> +/*
> + * Bitmap for differnet IOMMUNotifier capabilities. Each notifier can
> + * register with one or multiple IOMMU Notifier capability bit(s).
> + */
> +typedef enum {
> +IOMMU_NOTIFIER_NONE = 0,
> +/* Notify cache invalidations */
> +IOMMU_NOTIFIER_INVALIDATION = 0x1,
> +/* Notify entry changes (newly created entries) */
> +IOMMU_NOTIFIER_CHANGE = 0x2,
> +} IOMMUNotifierCap;
> +
> +#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_INVALIDATION | \
> +IOMMU_NOTIFIER_CHANGE)
> +
> +struct IOMMUNotifier {
> +void (*notify)(struct IOMMUNotifier *notifier, void *data);
> +IOMMUNotifierCap notifier_caps;
> +QLIST_ENTRY(IOMMUNotifier) node;
> +};
> +typedef struct IOMMUNotifier IOMMUNotifier;
> +
>  /* New-style MMIO accessors can indicate that the transaction failed.
>   * A zero (MEMTX_OK) response means success; anything else is a failure
>   * of some kind. The memory subsystem will bitwise-OR together results
> @@ -201,7 +223,7 @@ struct MemoryRegion {
>  const char *name;
>  unsigned ioeventfd_nb;
>  MemoryRegionIoeventfd *ioeventfds;
> -NotifierList iommu_notify;
> +QLIST_HEAD(, IOMMUNotifier) iommu_notify;
>  };
>  
>  /**
> @@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr,
>   * IOMMU translation entries.
>   *
>   * @mr: the memory region to observe
> - * @n: the notifier to be added; the notifier receives a pointer to an
> - * #IOMMUTLBEntry as the opaque value; the pointer ceases to be
> - * valid on exit from the notifier.
> + * @n: the IOMMUNotifier to be added; the notify callback receives a
> + * pointer to an #IOMMUTLBEntry as the opaque value; the pointer
> + * ceases to be

[Qemu-devel] [PATCH v4 1/4] error-report: provide error_report_fatal()

2016-09-06 Thread Peter Xu

There are many places in current QEMU codes that needs to print some
error and then quit QEMU. Provide a new function for it.

Signed-off-by: Peter Xu 
---
 include/qemu/error-report.h |  1 +
 util/qemu-error.c   | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h
index 499ec8b..eb8260e 100644
--- a/include/qemu/error-report.h
+++ b/include/qemu/error-report.h
@@ -36,6 +36,7 @@ void error_printf_unless_qmp(const char *fmt, ...) 
GCC_FMT_ATTR(1, 2);
 void error_set_progname(const char *argv0);
 void error_vreport(const char *fmt, va_list ap) GCC_FMT_ATTR(1, 0);
 void error_report(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
+void error_report_fatal(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
 const char *error_get_progname(void);
 extern bool enable_timestamp_msg;
 
diff --git a/util/qemu-error.c b/util/qemu-error.c
index 1ef3566..684f543 100644
--- a/util/qemu-error.c
+++ b/util/qemu-error.c
@@ -237,3 +237,14 @@ void error_report(const char *fmt, ...)
 error_vreport(fmt, ap);
 va_end(ap);
 }
+
+void error_report_fatal(const char *fmt, ...)
+{
+va_list ap;
+
+va_start(ap, fmt);
+error_vreport(fmt, ap);
+va_end(ap);
+
+exit(1);
+}
-- 
2.7.4

Re: [Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add

2016-09-06 Thread David Gibson

On Wed, Sep 07, 2016 at 01:32:23PM +0800, Peter Xu wrote:
> Considering that we may have multiple IOMMU notifier consumers in the
> future, converting iommu_ops.notify_{started|stopped} into some more
> general form. Now we can trap all notifier registerations and
> deregistrations, rather than only the first ones.
> 
> Power was leveraging the notifier_{started|stopped}, adding iommu_user
> field for counting on Power guests to achieve the same goal.

Requiring each vIOMMU frontend to reference count or whatever seems
like a pain.  The semantics of notify_started() were designed to avoid
that.

Instead I'd suggest a callback which gets tripped any time the logical
OR of the requested notifications for all current notifiers changes.

> Suggested-by:  Paolo Bonzini 
> Signed-off-by: Peter Xu 
> ---
>  hw/i386/intel_iommu.c  |  4 ++--
>  hw/ppc/spapr_iommu.c   | 18 --
>  include/exec/memory.h  |  8 
>  include/hw/ppc/spapr.h |  1 +
>  memory.c   | 10 --
>  5 files changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 28c31a2..c6bd8f6 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -1974,7 +1974,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion 
> *iommu, hwaddr addr,
>  return ret;
>  }
>  
> -static void vtd_iommu_notify_started(MemoryRegion *iommu)
> +static void vtd_iommu_notifier_add(MemoryRegion *iommu, IOMMUNotifier *n)
>  {
>  VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
>  
> @@ -2348,7 +2348,7 @@ static void vtd_init(IntelIOMMUState *s)
>  memset(s->womask, 0, DMAR_REG_SIZE);
>  
>  s->iommu_ops.translate = vtd_iommu_translate;
> -s->iommu_ops.notify_started = vtd_iommu_notify_started;
> +s->iommu_ops.notifier_add = vtd_iommu_notifier_add;
>  s->root = 0;
>  s->root_extended = false;
>  s->dmar_enabled = false;
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 6bc4d4d..99c83a4 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -156,14 +156,20 @@ static uint64_t 
> spapr_tce_get_min_page_size(MemoryRegion *iommu)
>  return 1ULL << tcet->page_shift;
>  }
>  
> -static void spapr_tce_notify_started(MemoryRegion *iommu)
> +static void spapr_tce_notifier_add(MemoryRegion *iommu, IOMMUNotifier *n)
>  {
> -spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), true);
> +sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> +if (tcet->iommu_users++ == 0) {
> +spapr_tce_set_need_vfio(tcet, true);
> +}
>  }
>  
> -static void spapr_tce_notify_stopped(MemoryRegion *iommu)
> +static void spapr_tce_notifier_del(MemoryRegion *iommu, IOMMUNotifier *n)
>  {
> -spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), 
> false);
> +sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
> +if (--tcet->iommu_users == 0) {
> +spapr_tce_set_need_vfio(tcet, false);
> +}
>  }
>  
>  static int spapr_tce_table_post_load(void *opaque, int version_id)
> @@ -246,8 +252,8 @@ static const VMStateDescription vmstate_spapr_tce_table = 
> {
>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
>  .translate = spapr_tce_translate_iommu,
>  .get_min_page_size = spapr_tce_get_min_page_size,
> -.notify_started = spapr_tce_notify_started,
> -.notify_stopped = spapr_tce_notify_stopped,
> +.notifier_add = spapr_tce_notifier_add,
> +.notifier_del = spapr_tce_notifier_del,
>  };
>  
>  static int spapr_tce_table_realize(DeviceState *dev)
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 92f14db..9efcf1b 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -175,10 +175,10 @@ struct MemoryRegionIOMMUOps {
>  IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool 
> is_write);
>  /* Returns minimum supported page size */
>  uint64_t (*get_min_page_size)(MemoryRegion *iommu);
> -/* Called when the first notifier is set */
> -void (*notify_started)(MemoryRegion *iommu);
> -/* Called when the last notifier is removed */
> -void (*notify_stopped)(MemoryRegion *iommu);
> +/* Called when someone registers to the notify list */
> +void (*notifier_add)(MemoryRegion *iommu, IOMMUNotifier *n);
> +/* Called when someone unregisters from the notify list */
> +void (*notifier_del)(MemoryRegion *iommu, IOMMUNotifier *n);
>  };
>  
>  typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index caf7be9..08627ec 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -564,6 +564,7 @@ struct sPAPRTCETable {
>  MemoryRegion root, iommu;
>  struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only 
> */
>  QLIST_ENTRY(sPAPRTCETable) list;
> +int iommu_users;
>  };
>  
>  sPAPRTCETable *spapr_tce_find_by_liobn

Re: [Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-09-06 Thread David Gibson

On Wed, Sep 07, 2016 at 03:27:46PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2016-09-07 at 11:59 +1000, David Gibson wrote:
> > 
> > That does suggest an alternative approach though.  You could remove
> > XScomDevice entirely from QOM existence, and just expose the xscom
> > address space globally, much like address_space_memory.  The
> > individual devices could just register their own subregions within
> > it.
> > 
> > I'm not sure if the latter is a good idea, though.
> 
> Not globally, per chip.

Right, that's probably better.  Not immediately sure how the
scomdevice would get hold of its chip's scom AS, but we can probably
figure out something.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH 2/3] memory: add iommu_notify_flag

2016-09-06 Thread David Gibson

On Tue, Sep 06, 2016 at 06:31:42PM +0800, Peter Xu wrote:
> On Tue, Sep 06, 2016 at 10:19:14AM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 06/09/2016 10:17, Peter Xu wrote:
> > > After knowing the possibility that the two consumers might be
> > > mixturely used in the future (as David has mentioned), I'd vote for a
> > > bitmask for notification type:
> > > 
> > > IOMMU_NOTIFIER_NONE = 0,
> > > IOMMU_NOTIFIER_INVALIDATION = 1,
> > > IOMMU_NOTIFIER_ADDITION = 2,
> > 
> > ADDITION really should be "CHANGE" I think, so what about
> > IOMMU_NOTIFIER_INVALIDATE and IOMMU_NOTIFIER_CHANGE?
> 
> For "CHANGE", it sounds like a unmap() + a map(). However I'd say
> "ADDITION" is nowhere better...

Right.. this brings up a good point.

Changing a mapping (i.e. overwriting an existing mapping with a
different one) would also need notification, even on x86, no?  Since
it implicitly invalidates the previous mapping.

I'm guessing the guest will avoid this by always unmapping before it
maps.  We still need to consider this possibility when designing the
notifier interface though.

It seems the real notification triggers here are:
* map - something is mapped which previously wasn't
* unmap - something is no longer mapped which was before

Note that whether the second needs to be triggered depends on the
*previous* state of that IOBA range, *not* on the permissions of the
new mapping (if any).

A "change" - replacing one mapping with another should count as both a
"map" and "unmap" event.

> 
> Will use "CHANGE".
> 
> > 
> > For VFIO, would the "invalidate" and "add" callbacks use the same code
> > or different?
> 
> Currently vfio_iommu_map_notify() should be handling both.
> 
> Thanks,
> 
> -- peterx
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

signature.asc
Description: PGP signature

Re: [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution

2016-09-06 Thread Paolo Bonzini



On 07/09/2016 06:03, Richard Henderson wrote:
> 
>> +if (mr->global_locking) {
>> +qemu_mutex_lock_iothread();
>> +locked = true;
>> +}
>>  memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
>>  iotlbentry->attrs);
>> +if (locked) {
>> +qemu_mutex_unlock_iothread();
>> +}
> 
> I'm not keen on this pattern.
> 
> (1) Why not use recursive locks?

Probably I'm biased by looking at this code too long, but... how would
they help?

> (2) If there's a good reason why not, then perhaps
> 
>   if (mr->global_locking) {
> qemu_mutex_lock_iothread();
> do_something;
> qemu_mutex_unlock_iothread();
>   } else {
> do_something;
>   }

I went with the other one because the arguments to
memory_region_dispatch_read are relatively many, but this can work too
if Alex prefers it.

Paolo

Re: [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation

2016-09-06 Thread Paolo Bonzini



On 07/09/2016 05:25, Richard Henderson wrote:
>>
>> +/* Set to kick if we have to do more than one vCPU */
>> +if (CPU_NEXT(first_cpu)) {
>> +kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread,
>> +  &kick_timer);
> 
> I'm not especially keen on this pointer to local variable thing. 
> Perhaps better as
> 
>   kick_timer = timer_new_ns(..., NULL);
>   kick_timer->opaque = kick_timer;

Or put it in CPUState and pass that.

Paolo

> and avoid the indirection in kick_tcg_thread.

Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation

2016-09-06 Thread David Gibson

On Wed, Sep 07, 2016 at 10:17:42AM +0530, Nikunj A Dadhania wrote:
> David Gibson  writes:
> 
> > [ Unknown signature status ]
> > On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
> >> Signed-off-by: Nikunj A Dadhania 
> >
> > This really needs a comment indicating that this implementation isn't
> > strictly correct (although probably good enough in practice).
> 
> Sure. And it also does not help if someone uses any store other than
> store conditional, that isn't taken care.
> 
> Assumption here is the locking primitives use load with reservation and
> store conditional. And no other ld/st variant touch this memory.

So, a) I don't think this really relies on that: an ordinary store
(assuming it changes the value) will still get picked up the cmpxchg.
Well.. at least after a suitable memory barrier.  Matching memory
models between emulated and host cpus is a whole other kettle of fish.

I think this does matter, IIRC a kernel spin unlock on ppc is a
barrier + plain store, no load locked or store conditional.

> > Specifically a racing store which happens to store the same value
> > which was already in memory should clobber the reservation, but won't
> > with this implementation.
> >
> > I had a long discussion at KVM Forum with Emilio Costa about this, in
> > which I discovered just how hard it is to strictly implement
> > store-conditional semantics in terms of anything else.  So, this is
> > probably a reasonable substitute, but we should note the fact that
> > it's not 100%.
> 
> I will update the commit log.
> 
> Regards,
> Nikunj
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

[Qemu-devel] [PATCH v3 3/3] intel_iommu: allow invalidation typed notifiers

2016-09-06 Thread Peter Xu

Intel vIOMMU is still lacking of a complete IOMMU notifier mechanism.
Before that is achieved, let's open a door for vhost DMAR support, which
only requires device-IOTLB based cache invalidations.

Meanwhile, converting hw_error() to error_report() and exit(), to make
the error messages clean and obvious (so no CPU registers will be
dumped).

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c6bd8f6..227c931 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1978,10 +1978,14 @@ static void vtd_iommu_notifier_add(MemoryRegion *iommu, 
IOMMUNotifier *n)
 {
 VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
 
-hw_error("Device at bus %s addr %02x.%d requires iommu notifier which "
- "is currently not supported by intel-iommu emulation",
- vtd_as->bus->qbus.name, PCI_SLOT(vtd_as->devfn),
- PCI_FUNC(vtd_as->devfn));
+if (n->notifier_caps & IOMMU_NOTIFIER_CHANGE) {
+error_report("Device at bus %s addr %02x.%d requires iommu "
+ "notifier which is currently not supported by "
+ "intel-iommu emulation",
+ vtd_as->bus->qbus.name, PCI_SLOT(vtd_as->devfn),
+ PCI_FUNC(vtd_as->devfn));
+exit(1);
+}
 }
 
 static const VMStateDescription vtd_vmstate = {
-- 
2.7.4

[Qemu-devel] [PATCH v3 1/3] memory: introduce IOMMUNotifier and its caps

2016-09-06 Thread Peter Xu

IOMMU Notifier list is used for notifying IO address mapping changes.
Currently VFIO is the only user.

However it is possible that future consumer like vhost would like to
only listen to part of its notifications (e.g., cache invalidations).

This patch introduced IOMMUNotifier and IOMMUNotfierCap bits for a finer
grained control of it.

IOMMUNotifier contains a bitfield for the notify consumer describing
what kind of notification it is interested in. Currently two kinds of
notifications are defined:

- IOMMU_NOTIFIER_CHANGE:   for entry changes (additions)
- IOMMU_NOTIFIER_INVALIDATION: for entry removals (cache invalidates)

When registering the IOMMU notifier, we need to specify one or multiple
capability bit(s) to listen to.

When notifications are triggered, it will be checked against the
notifier's capability bits, and only notifiers with registered bits will
be notified.

Signed-off-by: Peter Xu 
---
 hw/vfio/common.c  |  3 ++-
 include/exec/memory.h | 39 ---
 include/hw/vfio/vfio-common.h |  2 +-
 memory.c  | 37 -
 4 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b313e7c..b0cea2c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -293,7 +293,7 @@ static bool 
vfio_listener_skipped_section(MemoryRegionSection *section)
section->offset_within_address_space & (1ULL << 63);
 }
 
-static void vfio_iommu_map_notify(Notifier *n, void *data)
+static void vfio_iommu_map_notify(IOMMUNotifier *n, void *data)
 {
 VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
 VFIOContainer *container = giommu->container;
@@ -454,6 +454,7 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
section->offset_within_region;
 giommu->container = container;
 giommu->n.notify = vfio_iommu_map_notify;
+giommu->n.notifier_caps = IOMMU_NOTIFIER_ALL;
 QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
 
 memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3e4d416..92f14db 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -67,6 +67,28 @@ struct IOMMUTLBEntry {
 IOMMUAccessFlags perm;
 };
 
+/*
+ * Bitmap for differnet IOMMUNotifier capabilities. Each notifier can
+ * register with one or multiple IOMMU Notifier capability bit(s).
+ */
+typedef enum {
+IOMMU_NOTIFIER_NONE = 0,
+/* Notify cache invalidations */
+IOMMU_NOTIFIER_INVALIDATION = 0x1,
+/* Notify entry changes (newly created entries) */
+IOMMU_NOTIFIER_CHANGE = 0x2,
+} IOMMUNotifierCap;
+
+#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_INVALIDATION | \
+IOMMU_NOTIFIER_CHANGE)
+
+struct IOMMUNotifier {
+void (*notify)(struct IOMMUNotifier *notifier, void *data);
+IOMMUNotifierCap notifier_caps;
+QLIST_ENTRY(IOMMUNotifier) node;
+};
+typedef struct IOMMUNotifier IOMMUNotifier;
+
 /* New-style MMIO accessors can indicate that the transaction failed.
  * A zero (MEMTX_OK) response means success; anything else is a failure
  * of some kind. The memory subsystem will bitwise-OR together results
@@ -201,7 +223,7 @@ struct MemoryRegion {
 const char *name;
 unsigned ioeventfd_nb;
 MemoryRegionIoeventfd *ioeventfds;
-NotifierList iommu_notify;
+QLIST_HEAD(, IOMMUNotifier) iommu_notify;
 };
 
 /**
@@ -620,11 +642,12 @@ void memory_region_notify_iommu(MemoryRegion *mr,
  * IOMMU translation entries.
  *
  * @mr: the memory region to observe
- * @n: the notifier to be added; the notifier receives a pointer to an
- * #IOMMUTLBEntry as the opaque value; the pointer ceases to be
- * valid on exit from the notifier.
+ * @n: the IOMMUNotifier to be added; the notify callback receives a
+ * pointer to an #IOMMUTLBEntry as the opaque value; the pointer
+ * ceases to be valid on exit from the notifier.
  */
-void memory_region_register_iommu_notifier(MemoryRegion *mr, Notifier *n);
+void memory_region_register_iommu_notifier(MemoryRegion *mr,
+   IOMMUNotifier *n);
 
 /**
  * memory_region_iommu_replay: replay existing IOMMU translations to
@@ -636,7 +659,8 @@ void memory_region_register_iommu_notifier(MemoryRegion 
*mr, Notifier *n);
  * @is_write: Whether to treat the replay as a translate "write"
  * through the iommu
  */
-void memory_region_iommu_replay(MemoryRegion *mr, Notifier *n, bool is_write);
+void memory_region_iommu_replay(MemoryRegion *mr, IOMMUNotifier *n,
+bool is_write);
 
 /**
  * memory_region_unregister_iommu_notifier: unregister a notifier for
@@ -646,7 +670,8 @@ void memory_region_iommu_replay(MemoryRegion *mr, Notifier 
*n, bool is_write);
  *  needs to be called
  * @n: the notifier to be removed.

[Qemu-devel] [PATCH v3 2/3] memory: generalize iommu_ops.notify_started to notifier_add

2016-09-06 Thread Peter Xu

Considering that we may have multiple IOMMU notifier consumers in the
future, converting iommu_ops.notify_{started|stopped} into some more
general form. Now we can trap all notifier registerations and
deregistrations, rather than only the first ones.

Power was leveraging the notifier_{started|stopped}, adding iommu_user
field for counting on Power guests to achieve the same goal.

Suggested-by:  Paolo Bonzini 
Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  |  4 ++--
 hw/ppc/spapr_iommu.c   | 18 --
 include/exec/memory.h  |  8 
 include/hw/ppc/spapr.h |  1 +
 memory.c   | 10 --
 5 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 28c31a2..c6bd8f6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1974,7 +1974,7 @@ static IOMMUTLBEntry vtd_iommu_translate(MemoryRegion 
*iommu, hwaddr addr,
 return ret;
 }
 
-static void vtd_iommu_notify_started(MemoryRegion *iommu)
+static void vtd_iommu_notifier_add(MemoryRegion *iommu, IOMMUNotifier *n)
 {
 VTDAddressSpace *vtd_as = container_of(iommu, VTDAddressSpace, iommu);
 
@@ -2348,7 +2348,7 @@ static void vtd_init(IntelIOMMUState *s)
 memset(s->womask, 0, DMAR_REG_SIZE);
 
 s->iommu_ops.translate = vtd_iommu_translate;
-s->iommu_ops.notify_started = vtd_iommu_notify_started;
+s->iommu_ops.notifier_add = vtd_iommu_notifier_add;
 s->root = 0;
 s->root_extended = false;
 s->dmar_enabled = false;
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 6bc4d4d..99c83a4 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -156,14 +156,20 @@ static uint64_t spapr_tce_get_min_page_size(MemoryRegion 
*iommu)
 return 1ULL << tcet->page_shift;
 }
 
-static void spapr_tce_notify_started(MemoryRegion *iommu)
+static void spapr_tce_notifier_add(MemoryRegion *iommu, IOMMUNotifier *n)
 {
-spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), true);
+sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+if (tcet->iommu_users++ == 0) {
+spapr_tce_set_need_vfio(tcet, true);
+}
 }
 
-static void spapr_tce_notify_stopped(MemoryRegion *iommu)
+static void spapr_tce_notifier_del(MemoryRegion *iommu, IOMMUNotifier *n)
 {
-spapr_tce_set_need_vfio(container_of(iommu, sPAPRTCETable, iommu), false);
+sPAPRTCETable *tcet = container_of(iommu, sPAPRTCETable, iommu);
+if (--tcet->iommu_users == 0) {
+spapr_tce_set_need_vfio(tcet, false);
+}
 }
 
 static int spapr_tce_table_post_load(void *opaque, int version_id)
@@ -246,8 +252,8 @@ static const VMStateDescription vmstate_spapr_tce_table = {
 static MemoryRegionIOMMUOps spapr_iommu_ops = {
 .translate = spapr_tce_translate_iommu,
 .get_min_page_size = spapr_tce_get_min_page_size,
-.notify_started = spapr_tce_notify_started,
-.notify_stopped = spapr_tce_notify_stopped,
+.notifier_add = spapr_tce_notifier_add,
+.notifier_del = spapr_tce_notifier_del,
 };
 
 static int spapr_tce_table_realize(DeviceState *dev)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 92f14db..9efcf1b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -175,10 +175,10 @@ struct MemoryRegionIOMMUOps {
 IOMMUTLBEntry (*translate)(MemoryRegion *iommu, hwaddr addr, bool 
is_write);
 /* Returns minimum supported page size */
 uint64_t (*get_min_page_size)(MemoryRegion *iommu);
-/* Called when the first notifier is set */
-void (*notify_started)(MemoryRegion *iommu);
-/* Called when the last notifier is removed */
-void (*notify_stopped)(MemoryRegion *iommu);
+/* Called when someone registers to the notify list */
+void (*notifier_add)(MemoryRegion *iommu, IOMMUNotifier *n);
+/* Called when someone unregisters from the notify list */
+void (*notifier_del)(MemoryRegion *iommu, IOMMUNotifier *n);
 };
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index caf7be9..08627ec 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -564,6 +564,7 @@ struct sPAPRTCETable {
 MemoryRegion root, iommu;
 struct VIOsPAPRDevice *vdev; /* for @bypass migration compatibility only */
 QLIST_ENTRY(sPAPRTCETable) list;
+int iommu_users;
 };
 
 sPAPRTCETable *spapr_tce_find_by_liobn(target_ulong liobn);
diff --git a/memory.c b/memory.c
index 45a3902..d913043 100644
--- a/memory.c
+++ b/memory.c
@@ -1518,9 +1518,8 @@ void memory_region_register_iommu_notifier(MemoryRegion 
*mr,
 {
 /* We need to register for at least one bitfield */
 assert(n->notifier_caps != IOMMU_NOTIFIER_NONE);
-if (mr->iommu_ops->notify_started &&
-QLIST_EMPTY(&mr->iommu_notify)) {
-mr->iommu_ops->notify_started(mr);
+if (mr->iommu_ops->notifier_add) {
+mr->iommu_ops->notifier_add(mr, n);
 }
 QLIST_INSERT_HEAD(&mr->iommu_notify, n, n

[Qemu-devel] [PATCH v3 0/3] Introduce IOMMUNotifier struct

2016-09-06 Thread Peter Xu

V3:
- use QLIST instead of embedding Notifier into IOMMUNotifier [Paolo]
- fix a build error for ppc64-softmmu

The idea originates from one of Alex's reply:

  https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg00254.html

But after further discussions, it seems that only adding a simple type
for notifier is not enough. This series introduced IOMMUNotifier
struct to replace the old Notifier interface. Along with it, we can
provide registration for one (or multiple) of the IOMMU notifications:

- cache invalidations
- entry changes

This is a support material for Jason's vhost dmar patchset.

Please read commit messages for detailed information. Thanks,

Peter Xu (3):
  memory: introduce IOMMUNotifier and its caps
  memory: generalize iommu_ops.notify_started to notifier_add
  intel_iommu: allow invalidation typed notifiers

 hw/i386/intel_iommu.c | 16 +--
 hw/ppc/spapr_iommu.c  | 18 +++--
 hw/vfio/common.c  |  3 ++-
 include/exec/memory.h | 47 +--
 include/hw/ppc/spapr.h|  1 +
 include/hw/vfio/vfio-common.h |  2 +-
 memory.c  | 43 +++
 7 files changed, 92 insertions(+), 38 deletions(-)

-- 
2.7.4

Re: [Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-09-06 Thread Benjamin Herrenschmidt

On Wed, 2016-09-07 at 11:59 +1000, David Gibson wrote:
> 
> That does suggest an alternative approach though.  You could remove
> XScomDevice entirely from QOM existence, and just expose the xscom
> address space globally, much like address_space_memory.  The
> individual devices could just register their own subregions within
> it.
> 
> I'm not sure if the latter is a good idea, though.

Not globally, per chip.

Cheers,
Ben.

Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation

2016-09-06 Thread Benjamin Herrenschmidt

On Wed, 2016-09-07 at 10:17 +0530, Nikunj A Dadhania wrote:
> > David Gibson  writes:
> 
> > 
> > [ Unknown signature status ]
> > On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
> > > 
> > > > > > Signed-off-by: Nikunj A Dadhania 
> > 
> > This really needs a comment indicating that this implementation isn't
> > strictly correct (although probably good enough in practice).
> 
> Sure. And it also does not help if someone uses any store other than
> store conditional, that isn't taken care.
> 
> Assumption here is the locking primitives use load with reservation and
> store conditional. And no other ld/st variant touch this memory.

This is an incorrect assumption. spin_unlock for example uses a normal
store.

That being said, you will observe the difference in value which should
hopefully make things work...

I *hope* we don't have anything that relies on a normal store of the same
value as the atomic breaking the reservation, I *think* we might get away
with it, but it is indeed fishy.

> > Specifically a racing store which happens to store the same value
> > which was already in memory should clobber the reservation, but won't
> > with this implementation.
> > 
> > I had a long discussion at KVM Forum with Emilio Costa about this, in
> > which I discovered just how hard it is to strictly implement
> > store-conditional semantics in terms of anything else.  So, this is
> > probably a reasonable substitute, but we should note the fact that
> > it's not 100%.
> 
> I will update the commit log.
> 
> Regards,
> Nikunj

[Qemu-devel] [PATCH v5 0/2] targte-i386: Add virtual L3 cache support

2016-09-06 Thread Longpeng(Mike)

This patchset add virtual L3 cache support.

For KVM's linux guest, this will reduces amouts of IPIs under some workloads.
In our experiments(vm:1*socket,8*cores,2*threads 
workload:SAP-HANA-PB-testsuite),
this reduces 85% guest's resched-IPIs, and the performance improves 7.2%~33.1%.

---
Chandes since v4:
  - rename "l3-cache-shared" to "l3-cache"

Changes since v3:
  - add 2.8 machine(2.7 was released), set compat_props on PC_COMPAT_2_8
  - makes the commit message more clearly.

Changes since v2:
  - add more useful commit mesage.
  - rename "compat-cache" to "l3-cache-shared".

Changes since v1:
  - fix the compat problem: set compat_props on PC_COMPAT_2_7.
  - fix a "intentionally introducde bug": make intel's and amd's consistently.
  - fix the CPUID.(EAX=4, ECX=3):EAX[25:14].
  - test the performance if vcpus running on sparate sockets: with L3 cache,
the performance improves 7.2%~33.1%(avg: 15.7%).

Longpeng(Mike) (2):
  pc: Add 2.8 machine
  target-i386: present virtual L3 cache info for vcpus

 hw/i386/pc_piix.c| 16 +---
 hw/i386/pc_q35.c | 13 +++--
 include/hw/i386/pc.h | 12 
 target-i386/cpu.c| 49 -
 target-i386/cpu.h|  6 ++
 5 files changed, 86 insertions(+), 10 deletions(-)

-- 
1.8.3.1

[Qemu-devel] [PATCH v5 1/2] pc: Add 2.8 machine

2016-09-06 Thread Longpeng(Mike)

This will used by the next patch.

Signed-off-by: Longpeng(Mike) 
---
 hw/i386/pc_piix.c| 16 +---
 hw/i386/pc_q35.c | 13 +++--
 include/hw/i386/pc.h |  3 +++
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index a07dc81..0308fee 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -438,13 +438,25 @@ static void pc_i440fx_machine_options(MachineClass *m)
 m->default_display = "std";
 }
 
-static void pc_i440fx_2_7_machine_options(MachineClass *m)
+static void pc_i440fx_2_8_machine_options(MachineClass *m)
 {
 pc_i440fx_machine_options(m);
 m->alias = "pc";
 m->is_default = 1;
 }
 
+DEFINE_I440FX_MACHINE(v2_8, "pc-i440fx-2.8", NULL,
+  pc_i440fx_2_8_machine_options);
+
+
+static void pc_i440fx_2_7_machine_options(MachineClass *m)
+{
+pc_i440fx_2_8_machine_options(m);
+m->is_default = 0;
+m->alias = NULL;
+SET_MACHINE_COMPAT(m, PC_COMPAT_2_7);
+}
+
 DEFINE_I440FX_MACHINE(v2_7, "pc-i440fx-2.7", NULL,
   pc_i440fx_2_7_machine_options);
 
@@ -453,8 +465,6 @@ static void pc_i440fx_2_6_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_i440fx_2_7_machine_options(m);
-m->is_default = 0;
-m->alias = NULL;
 pcmc->legacy_cpu_hotplug = true;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_6);
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index c0b9961..9559b81 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -292,12 +292,22 @@ static void pc_q35_machine_options(MachineClass *m)
 m->has_dynamic_sysbus = true;
 }
 
-static void pc_q35_2_7_machine_options(MachineClass *m)
+static void pc_q35_2_8_machine_options(MachineClass *m)
 {
 pc_q35_machine_options(m);
 m->alias = "q35";
 }
 
+DEFINE_Q35_MACHINE(v2_8, "pc-q35-2.8", NULL,
+   pc_q35_2_8_machine_options);
+
+static void pc_q35_2_7_machine_options(MachineClass *m)
+{
+pc_q35_2_8_machine_options(m);
+m->alias = NULL;
+SET_MACHINE_COMPAT(m, PC_COMPAT_2_7);
+}
+
 DEFINE_Q35_MACHINE(v2_7, "pc-q35-2.7", NULL,
pc_q35_2_7_machine_options);
 
@@ -305,7 +315,6 @@ static void pc_q35_2_6_machine_options(MachineClass *m)
 {
 PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
 pc_q35_2_7_machine_options(m);
-m->alias = NULL;
 pcmc->legacy_cpu_hotplug = true;
 SET_MACHINE_COMPAT(m, PC_COMPAT_2_6);
 }
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 74c175c..bd42cc7 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -367,6 +367,9 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
 int e820_get_num_entries(void);
 bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 
+#define PC_COMPAT_2_7 \
+HW_COMPAT_2_7
+
 #define PC_COMPAT_2_6 \
 HW_COMPAT_2_6 \
 {\
-- 
1.8.3.1

[Qemu-devel] [PATCH v5 2/2] target-i386: present virtual L3 cache info for vcpus

2016-09-06 Thread Longpeng(Mike)

Some software algorithms are based on the hardware's cache info, for example,
for x86 linux kernel, when cpu1 want to wakeup a task on cpu2, cpu1 will trigger
a resched IPI and told cpu2 to do the wakeup if they don't share low level
cache. Oppositely, cpu1 will access cpu2's runqueue directly if they share llc.
The relevant linux-kernel code as bellow:

static void ttwu_queue(struct task_struct *p, int cpu)
{
struct rq *rq = cpu_rq(cpu);
..
if (... && !cpus_share_cache(smp_processor_id(), cpu)) {
..
ttwu_queue_remote(p, cpu); /* will trigger RES IPI */
return;
}
..
ttwu_do_activate(rq, p, 0); /* access target's rq directly */
..
}

In real hardware, the cpus on the same socket share L3 cache, so one won't
trigger a resched IPIs when wakeup a task on others. But QEMU doesn't present a
virtual L3 cache info for VM, then the linux guest will trigger lots of RES IPIs
under some workloads even if the virtual cpus belongs to the same virtual 
socket.

For KVM, there will be lots of vmexit due to guest send IPIs.
The workload is a SAP HANA's testsuite, we run it one round(about 40 minuates)
and observe the (Suse11sp3)Guest's amounts of RES IPIs which triggering during
the period:
No-L3   With-L3(applied this patch)
cpu0:   363890  44582
cpu1:   373405  43109
cpu2:   340783  43797
cpu3:   333854  43409
cpu4:   327170  40038
cpu5:   325491  39922
cpu6:   319129  42391
cpu7:   306480  41035
cpu8:   161139  32188
cpu9:   164649  31024
cpu10:  149823  30398
cpu11:  149823  32455
cpu12:  164830  35143
cpu13:  172269  35805
cpu14:  179979  33898
cpu15:  194505  32754
avg:268963.640129.8

The VM's topology is "1*socket 8*cores 2*threads".
After present virtual L3 cache info for VM, the amounts of RES IPIs in guest
reduce 85%.

For KVM, vcpus send IPIs will cause vmexit which is expensive, so it can cause
severe performance degradation. We had tested the overall system performance if
vcpus actually run on sparate physical socket. With L3 cache, the performance
improves 7.2%~33.1%(avg:15.7%).

Signed-off-by: Longpeng(Mike) 
---
 include/hw/i386/pc.h |  9 +
 target-i386/cpu.c| 49 -
 target-i386/cpu.h|  6 ++
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index bd42cc7..079a957 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -367,7 +367,16 @@ int e820_add_entry(uint64_t, uint64_t, uint32_t);
 int e820_get_num_entries(void);
 bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
 
+#define PC_COMPAT_2_8 \
+{\
+.driver   = TYPE_X86_CPU,\
+.property = "l3-cache",\
+.value= "off",\
+},
+
+
 #define PC_COMPAT_2_7 \
+PC_COMPAT_2_8 \
 HW_COMPAT_2_7
 
 #define PC_COMPAT_2_6 \
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index ec674dc..5a5299a 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -57,6 +57,7 @@
 #define CPUID_2_L1D_32KB_8WAY_64B 0x2c
 #define CPUID_2_L1I_32KB_8WAY_64B 0x30
 #define CPUID_2_L2_2MB_8WAY_64B   0x7d
+#define CPUID_2_L3_16MB_16WAY_64B 0x4d
 
 
 /* CPUID Leaf 4 constants: */
@@ -131,11 +132,18 @@
 #define L2_LINES_PER_TAG   1
 #define L2_SIZE_KB_AMD   512
 
-/* No L3 cache: */
+/* Level 3 unified cache: */
 #define L3_SIZE_KB 0 /* disabled */
 #define L3_ASSOCIATIVITY   0 /* disabled */
 #define L3_LINES_PER_TAG   0 /* disabled */
 #define L3_LINE_SIZE   0 /* disabled */
+#define L3_N_LINE_SIZE 64
+#define L3_N_ASSOCIATIVITY 16
+#define L3_N_SETS   16384
+#define L3_N_PARTITIONS 1
+#define L3_N_DESCRIPTOR CPUID_2_L3_16MB_16WAY_64B
+#define L3_N_LINES_PER_TAG  1
+#define L3_N_SIZE_KB_AMD16384
 
 /* TLB definitions: */
 
@@ -2279,6 +2287,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 {
 X86CPU *cpu = x86_env_get_cpu(env);
 CPUState *cs = CPU(cpu);
+uint32_t pkg_offset;
 
 /* test if maximum index reached */
 if (index & 0x8000) {
@@ -2332,7 +2341,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 }
 *eax = 1; /* Number of CPUID[EAX=2] calls required */
 *ebx = 0;
-*ecx = 0;
+if (!cpu->enable_l3_cache) {
+*ecx = 0;
+} else {
+*ecx = L3_N_DESCRIPTOR;
+}
 *edx = (L1D_DESCRIPTOR << 16) | \
(L1I_DESCRIPTOR <<  8) | \
(L2_DESCRIPTOR);
@@ -2378,6 +2391,25 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *ecx = L2_SETS - 1;

Re: [Qemu-devel] [PATCH v2 1/3] memory: introduce IOMMUNotifier and its caps

2016-09-06 Thread Peter Xu

On Tue, Sep 06, 2016 at 04:29:52PM +0200, Paolo Bonzini wrote:
> Since you're walking the notifier list manually anyway, I think it's
> simpler to get rid of Notifier completely.  Otherwise, this looks pretty
> good!

Yes that should be cleaner. Will fix. Thanks!

-- peterx

Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC

2016-09-06 Thread Nikunj A Dadhania

David Gibson  writes:

> [ Unknown signature status ]
> On Fri, Sep 02, 2016 at 05:19:21PM +1000, Benjamin Herrenschmidt wrote:
>> On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
>> > The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
>> > Changes that were needed to enable PowerPC are pretty simple;
>> > 
>> > Patch 01: Take a iothread lock during hcall, as hcall can generate io 
>> > requests
>> >   02: For TCG, we were harcoding smt as 1, this gets rid of the 
>> > limitation
>> 
>> If we do this, we need to implement the shared SPRs properly and the
>> inter-thread doorbells...
>
> Exactly.  I think we want to get MTTCG working with multiple emulated
> cores first, then look into handling emulated hardware
> multi-threading.  2/4 suggests you are conflating the two, when
> they're really not related AFAICT.

Sure. Will do that in my next version.

Regards
Nikunj

Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation

2016-09-06 Thread Nikunj A Dadhania

David Gibson  writes:

> [ Unknown signature status ]
> On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
>> Signed-off-by: Nikunj A Dadhania 
>
> This really needs a comment indicating that this implementation isn't
> strictly correct (although probably good enough in practice).

Sure. And it also does not help if someone uses any store other than
store conditional, that isn't taken care.

Assumption here is the locking primitives use load with reservation and
store conditional. And no other ld/st variant touch this memory.

> Specifically a racing store which happens to store the same value
> which was already in memory should clobber the reservation, but won't
> with this implementation.
>
> I had a long discussion at KVM Forum with Emilio Costa about this, in
> which I discovered just how hard it is to strictly implement
> store-conditional semantics in terms of anything else.  So, this is
> probably a reasonable substitute, but we should note the fact that
> it's not 100%.

I will update the commit log.

Regards,
Nikunj

[Qemu-devel] [PATCH] scsi: pvscsi: limit process IO loop to maximum page count

2016-09-06 Thread P J P

From: Prasad J Pandit 

Vmware Paravirtual SCSI emulator while processing IO requests
could run into an infinite loop if 'pvscsi_ring_pop_req_descr'
always returned positive value. Limit IO loop to the maximum
page count.

Reported-by: Li Qiang 
Signed-off-by: Prasad J Pandit 
---
 hw/scsi/vmw_pvscsi.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/vmw_pvscsi.c b/hw/scsi/vmw_pvscsi.c
index babac5a..3e77a08 100644
--- a/hw/scsi/vmw_pvscsi.c
+++ b/hw/scsi/vmw_pvscsi.c
@@ -711,11 +711,13 @@ pvscsi_process_request_descriptor(PVSCSIState *s,
 static void
 pvscsi_process_io(PVSCSIState *s)
 {
+int descr_pa_cnt = 0;
 PVSCSIRingReqDesc descr;
 hwaddr next_descr_pa;
 
 assert(s->rings_info_valid);
-while ((next_descr_pa = pvscsi_ring_pop_req_descr(&s->rings)) != 0) {
+while (((next_descr_pa = pvscsi_ring_pop_req_descr(&s->rings)) != 0)
+&& descr_pa_cnt++ < PVSCSI_SETUP_RINGS_MAX_NUM_PAGES) {
 
 /* Only read after production index verification */
 smp_rmb();
-- 
2.5.5

Re: [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads

2016-09-06 Thread Nikunj A Dadhania

David Gibson  writes:

> [ Unknown signature status ]
> On Fri, Sep 02, 2016 at 12:02:54PM +0530, Nikunj A Dadhania wrote:
>> Signed-off-by: Nikunj A Dadhania 
>> ---
>>  target-ppc/kvm.c | 2 +-
>>  target-ppc/kvm_ppc.h | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
>> index dcb68b9..20eb450 100644
>> --- a/target-ppc/kvm.c
>> +++ b/target-ppc/kvm.c
>> @@ -2090,7 +2090,7 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int 
>> mpic_proxy)
>>  
>>  int kvmppc_smt_threads(void)
>>  {
>> -return cap_ppc_smt ? cap_ppc_smt : 1;
>> +return cap_ppc_smt ? cap_ppc_smt : 8;
>>  }
>>  
>>  #ifdef TARGET_PPC64
>> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
>> index 5461d10..053db0a 100644
>> --- a/target-ppc/kvm_ppc.h
>> +++ b/target-ppc/kvm_ppc.h
>> @@ -128,7 +128,7 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU 
>> *cpu, int mpic_proxy)
>>  
>>  static inline int kvmppc_smt_threads(void)
>>  {
>> -return 1;
>> +return 8;
>>  }
>>  
>>  static inline int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits)
>
> This doesn't make sense to me.  Allowing multiple hardware threads to
> be emulated in TCG (which I think will require a bunch of fixes) seems
> like a separate question to MTTCG - i.e. allowing separate vcpus to be
> TCG emulated in separate host threads.

Right, I was planning to drop this for now and introduce this later. As
noted in the other thread, i did not consider the multi-core case.

Regards,
Nikunj

Re: [Qemu-devel] [PATCH 0/3] memory: add IOMMU notifier type

2016-09-06 Thread David Gibson

On Tue, Sep 06, 2016 at 02:26:44PM +0800, Jason Wang wrote:
> 
> 
> On 2016年09月06日 13:49, Peter Xu wrote:
> > On Tue, Sep 06, 2016 at 03:06:17PM +1000, David Gibson wrote:
> > > On Mon, Sep 05, 2016 at 03:21:18PM +0800, Peter Xu wrote:
> > > > In the thread:
> > > > 
> > > >https://lists.gnu.org/archive/html/qemu-devel/2016-09/msg00254.html
> > > > 
> > > > Alex proposed a way for vhost DMAR to be enabled without breaking
> > > > existing protections on vIOMMU and device assignments. This series
> > > > tried to implement the idea, by introducing a IOMMU notifier type for
> > > > each IOMMU memory region.
> > > Hrm, I'm pretty dubious about this concept, since it's basically just
> > > an interim hack for an incomplete notifier implementation on x86.
> > > What makes just fixing the notifier so difficult?
> > Aviv is working on the full notifier support for that. It's been
> > months since his last post though. If he cannot continue it (due to
> > any reason), I can take it over. But for now, we may still need to
> > wait for his patches to fully enable a complete notifier mechanism.
> 
> Yes, and the issue is:
> 
> - There's no way for current intel IOMMU code to be notified when guest map
> a page. So it's impossible for intel IOMMU to co-work with vfio now.
> - A solution is caching mode (CM) support which requires a TLB invalidation
> even if it was a non-present to present changing, but this is still under
> development.

Right.  AIUI the whole point of CM is to allow this sort of
virtualization.

What I hadn't reaalized before was that even with CM==0, invalidations
were still notified, otherwise I didn't see how anything was possible.

> - VT-d spec requires: "Hardware implementations of this architecture must
> support operation corresponding to CM=0." So even if we have CM mode which
> can work with vfio, we still must support intel IOMMU with CM disabled. It
> was not only a spec requirement but also a performance consideration. (CM is
> usually slower)

Well.. this isn't a hardware implementation, so I don't think it's
really a spec requirement, although I can see why you'd want it so
that you can do something as near as possible indistinguishable from
hardware.

> So in conclusion, there's a mode for intel IOMMU that can't work with vfio
> at all. Fixing the notifier does not solve this since the notifier won't be
> able to be triggered.

Ok, I understand.  CM==1 will be able to have the full notifier, but
CM==0 won't and will never work with VFIO, so we need a way to detect
that cleanly.

> 
> > 
> > I don't know how POWER works to provide a complete notifier, and
> > whether POWER can selectively enable the notified items... But for
> > Intel VT-d, it provided two choices: by default, only cache
> > invalidations are notified (even, we can disalbe cache invaliations),
> > but if one want to have a complete notifier, just set the CM bit to 1.
> > So I just think it'll be cool if we can support both cases. E.g., for
> > vhost, it does not need to be notified with newly added entries, but
> > only cache invalidations. IMHO we can't just force vhost to use a
> > complete notifier while actually it only needs part of it.
> > 
> > Thanks,
> > 
> > -- peterx
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [RFC v4 21/28] tcg: enable tb_lock() for SoftMMU

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

tb_lock() has long been used for linux-user mode to protect code
generation. By enabling it now we prepare for MTTCG and ensure all code
generation is serialised by this lock. The other major structure that
needs protecting is the l1_map and its PageDesc structures. For the
SoftMMU case we also use tb_lock() to protect these structures instead
of linux-user mmap_lock() which as the name suggests serialises updates
to the structure as a result of guest mmap operations.

Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC v4 20/28] cpus: tweak sleeping and safe_work rules for MTTCG

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

Once TCG gains the ability to sleep individual threads we need to make
sure they don't sleep when safe work is pending as all threads need to
go through the process_queued_work function. Also if we have multiple
threads wait_for_safe_work can now sleep without deadlocking.

Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC v4 18/28] tcg: remove global exit_request

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

There are now only two uses of the global exit_request left.

The first ensures we exit the run_loop when we first start to process
pending work and in the kick handler. This is just as easily done by
setting the first_cpu->exit_request flag.

The second use is in the round robin kick routine. The global
exit_request ensured every vCPU would set its local exit_request and
cause a full exit of the loop. Now the iothread isn't being held while
running we can just rely on the kick handler to push us out as intended.

We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
cause us to exit the main loop and process any IO requests that might
come along.

Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH RFC 3/4] target-ppc: use atomic_cmpxchg for ld/st reservation

2016-09-06 Thread David Gibson

On Fri, Sep 02, 2016 at 12:02:55PM +0530, Nikunj A Dadhania wrote:
> Signed-off-by: Nikunj A Dadhania 

This really needs a comment indicating that this implementation isn't
strictly correct (although probably good enough in practice).
Specifically a racing store which happens to store the same value
which was already in memory should clobber the reservation, but won't
with this implementation.

I had a long discussion at KVM Forum with Emilio Costa about this, in
which I discovered just how hard it is to strictly implement
store-conditional semantics in terms of anything else.  So, this is
probably a reasonable substitute, but we should note the fact that
it's not 100%.

> ---
>  target-ppc/translate.c | 24 +---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 4a882b3..447c13e 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -72,6 +72,7 @@ static TCGv cpu_cfar;
>  #endif
>  static TCGv cpu_xer, cpu_so, cpu_ov, cpu_ca;
>  static TCGv cpu_reserve;
> +static TCGv cpu_reserve_val;
>  static TCGv cpu_fpscr;
>  static TCGv_i32 cpu_access_type;
>  
> @@ -176,6 +177,9 @@ void ppc_translate_init(void)
>  cpu_reserve = tcg_global_mem_new(cpu_env,
>   offsetof(CPUPPCState, reserve_addr),
>   "reserve_addr");
> +cpu_reserve_val = tcg_global_mem_new(cpu_env,
> + offsetof(CPUPPCState, reserve_val),
> + "reserve_val");
>  
>  cpu_fpscr = tcg_global_mem_new(cpu_env,
> offsetof(CPUPPCState, fpscr), "fpscr");
> @@ -3086,7 +3090,7 @@ static void gen_##name(DisasContext *ctx)   
>  \
>  }\
>  tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop);\
>  tcg_gen_mov_tl(cpu_reserve, t0); \
> -tcg_gen_st_tl(gpr, cpu_env, offsetof(CPUPPCState, reserve_val)); \
> +tcg_gen_mov_tl(cpu_reserve_val, gpr);\
>  tcg_temp_free(t0);   \
>  }
>  
> @@ -3112,14 +3116,28 @@ static void gen_conditional_store(DisasContext *ctx, 
> TCGv EA,
>int reg, int memop)
>  {
>  TCGLabel *l1;
> +TCGv_i32 tmp = tcg_temp_local_new_i32();
> +TCGv t0;
>  
> +tcg_gen_movi_i32(tmp, 0);
>  tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
>  l1 = gen_new_label();
>  tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
> -tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], 1 << CRF_EQ);
> -tcg_gen_qemu_st_tl(cpu_gpr[reg], EA, ctx->mem_idx, memop);
> +
> +t0 = tcg_temp_new();
> +tcg_gen_atomic_cmpxchg_tl(t0, EA, cpu_reserve_val, cpu_gpr[reg],
> +  ctx->mem_idx, DEF_MEMOP(memop));
> +tcg_gen_setcond_tl(TCG_COND_EQ, t0, t0, cpu_reserve_val);
> +tcg_gen_trunc_tl_i32(tmp, t0);
> +
>  gen_set_label(l1);
> +tcg_gen_shli_i32(tmp, tmp, CRF_EQ);
> +tcg_gen_or_i32(cpu_crf[0], cpu_crf[0], tmp);
>  tcg_gen_movi_tl(cpu_reserve, -1);
> +tcg_gen_movi_tl(cpu_reserve_val, 0);
> +
> +tcg_temp_free(t0);
> +tcg_temp_free_i32(tmp);
>  }
>  #endif
>  

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH RFC 2/4] target-ppc: with MTTCG report more threads

2016-09-06 Thread David Gibson

On Fri, Sep 02, 2016 at 12:02:54PM +0530, Nikunj A Dadhania wrote:
> Signed-off-by: Nikunj A Dadhania 
> ---
>  target-ppc/kvm.c | 2 +-
>  target-ppc/kvm_ppc.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index dcb68b9..20eb450 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -2090,7 +2090,7 @@ void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, int 
> mpic_proxy)
>  
>  int kvmppc_smt_threads(void)
>  {
> -return cap_ppc_smt ? cap_ppc_smt : 1;
> +return cap_ppc_smt ? cap_ppc_smt : 8;
>  }
>  
>  #ifdef TARGET_PPC64
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index 5461d10..053db0a 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -128,7 +128,7 @@ static inline void kvmppc_set_mpic_proxy(PowerPCCPU *cpu, 
> int mpic_proxy)
>  
>  static inline int kvmppc_smt_threads(void)
>  {
> -return 1;
> +return 8;
>  }
>  
>  static inline int kvmppc_or_tsr_bits(PowerPCCPU *cpu, uint32_t tsr_bits)

This doesn't make sense to me.  Allowing multiple hardware threads to
be emulated in TCG (which I think will require a bunch of fixes) seems
like a separate question to MTTCG - i.e. allowing separate vcpus to be
TCG emulated in separate host threads.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH RFC 0/4] Enable MTTCG on PowerPC

2016-09-06 Thread David Gibson

On Fri, Sep 02, 2016 at 05:19:21PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2016-09-02 at 12:02 +0530, Nikunj A Dadhania wrote:
> > The series is a first attempt at enabling Multi-Threaded TCG on PowerPC.
> > Changes that were needed to enable PowerPC are pretty simple;
> > 
> > Patch 01: Take a iothread lock during hcall, as hcall can generate io 
> > requests
> >   02: For TCG, we were harcoding smt as 1, this gets rid of the 
> > limitation
> 
> If we do this, we need to implement the shared SPRs properly and the
> inter-thread doorbells...

Exactly.  I think we want to get MTTCG working with multiple emulated
cores first, then look into handling emulated hardware
multi-threading.  2/4 suggests you are conflating the two, when
they're really not related AFAICT.

> 
> >   03: Use atomic_cmpxchg in store conditional
> >   04: With more threads, flush the entry from each cpu. 
> >   This can be optimized further.
> > 
> > The patches are based on the Alex Bennee's base enabling patches for 
> > MTTCG[1] and Emilios's cmpxchg atomics. The consolidated tree of the 
> > above patches is here:
> > 
> > https://github.com/stsquad/qemu/tree/mttcg/base-patches-v4-with-cmpxchg-atomics-v2
> > 
> > Apart from the above, PPC patches are based out of ppc-for-2.8 and 
> > load/store consolidation patches [2]
> > 
> > Series with all dependent patches available here: 
> > https://github.com/nikunjad/qemu/tree/ppc_mttcg_v1
> > 
> > Testing: 
> > 
> > 
> > -smp 4,cores=1,threads=4 -accel tcg,thread=multi
> > 
> > TODO
> > 
> > Implement msgsndp instructions(door-bell), newer kernels enable it 
> > depending on the PVR. I have been using following workaround to boot.
> > https://github.com/nikunjad/qemu/commit/2c10052c5f93418a6b920e6ba3ce1813fcf50bc4
> > 
> > [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg391966.html
> > [2] https://lists.gnu.org/archive/html/qemu-ppc/2016-08/msg00265.html
> > 
> > Nikunj A Dadhania (4):
> >   spapr-hcall: take iothread lock during handler call
> >   target-ppc: with MTTCG report more threads
> >   target-ppc: use atomic_cmpxchg for ld/st reservation
> >   target-ppc: flush tlb from all the cpu
> > 
> >  cputlb.c| 15 +++
> >  hw/ppc/spapr_hcall.c| 11 +--
> >  include/exec/exec-all.h |  2 ++
> >  target-ppc/kvm.c|  2 +-
> >  target-ppc/kvm_ppc.h|  2 +-
> >  target-ppc/mmu-hash64.c |  2 +-
> >  target-ppc/translate.c  | 24 +---
> >  7 files changed, 50 insertions(+), 8 deletions(-)
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [RFC v4 17/28] cpus: re-factor out handle_icount_deadline

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

In preparation for adding a MTTCG thread we re-factor out a bit of what
will be common code to handle the QEMU_CLOCK_VIRTUAL expiration.

Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC v4 16/28] tcg: drop global lock during TCG code execution

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

+if (mr->global_locking) {
+qemu_mutex_lock_iothread();
+locked = true;
+}
 memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
 iotlbentry->attrs);
+if (locked) {
+qemu_mutex_unlock_iothread();
+}


I'm not keen on this pattern.

(1) Why not use recursive locks?

(2) If there's a good reason why not, then perhaps

  if (mr->global_locking) {
qemu_mutex_lock_iothread();
do_something;
qemu_mutex_unlock_iothread();
  } else {
do_something;
  }

is a better pattern to use.


r~

Re: [Qemu-devel] [RFC v2 0/4] adding mdev bus and vfio support

2016-09-06 Thread Neo Jia

On Wed, Sep 07, 2016 at 10:22:26AM +0800, Jike Song wrote:
> On 09/02/2016 11:03 PM, Alex Williamson wrote:
> > On Fri,  2 Sep 2016 16:16:08 +0800
> > Jike Song  wrote:
> > 
> >> This patchset is based on NVidia's "Add Mediated device support" series, 
> >> version 6:
> >>
> >>http://www.spinics.net/lists/kvm/msg136472.html
> > 
> > 
> > Hi Jike,
> > 
> > I'm thrilled by your active participation here, but I'm confused which
> > versions I should be reviewing and where the primary development is
> > going.  Kirti sent v7 a week ago, so I would have expected a revision
> > based on that rather than a re-write based on v6 plus incorporation of a
> > few of Kirti's patches directly.
> 
> Hi Alex,
> 
> [Sorry! replied this on Monday but it was silently dropped by the our 
> firewall]
> 
> 
> 
> The v1 of this patchset was send as incremental ones, basing on Nvidia's v6, 
> to
> demonstrate how is it possible and beneficial to:
> 
>   1, Introduce an independent device between physical and mdev;
>   2, Simplify vfio-mdev and make it the most flexible for vendor drivers;
> 
> Unfortunately neither was understood or adopted in v7:
> 
>   http://www.spinics.net/lists/kvm/msg137081.html
> 
> So here came the v2, as a standalone series, to give a whole and straight
> demonstration. The reason of still basing on v6:
> 
>   - Addressed all v6 comments (except the iommu part);
>   - There is no comments yet for v7 (except the sysfs ones);
> 
> 
> 
> > I liked the last version of these
> > changes a lot, but we need to figure out how to combine development
> > because we do not have infinite cycles for review available :-\  Thanks!
> 
> Fully understand.
> 
> Here is the dilemma: v6 is an obsolete version to work upon, v7 is still not
> at the direction we prefer. 

Hi Jike,

I wish I could meet you in person in KVM forum couple weeks ago so we can have a
better discussion.

We are trying our best to accommodate almost all requirements / comments from 
use cases and code reviews while keeping little (or none) architectural changes
between revisions.

> We would be highly glad and thankful if Neo/Kirti
> would adopt the code in their next version, which will certainly form a
> more simple and consolidated base for future co-development; otherwise
> and we could at least discuss the concerns, in case of any.
> 

As I have said in my previous response to you, if you have any questions about
adopting the framework that we have developed, you are very welcome to
comment/speak out on the code review thread like others. And if it is reasonable
request and won't break other vendors' use case, we will adopt it (one example
is the online file and removing the mdev pci dependency).

Just some update for you regarding the v7 patches, currently we are very
actively trying to lock down the sysfs and management interfaces discussion.

So, if you would like to make the upstream happen sooner, please join us in the
v7 and following patch discussion instead of rewriting them.

Thanks,
Neo

> 
> --
> Thanks,
> Jike
> 
> >>
> >>
> >> Key Changes from Nvidia v6:
> >>
> >>- Introduced an independent struct device to host device, thereby
> >>  formed a physical-host-mdev hierarchy, and highly reused Linux
> >>  driver core support;
> >>
> >>- Added online/offline to mdev_bus_type, leveraging the 'online'
> >>  attr support from Linux driver core;
> >>
> >>- Removed mdev_class and other unecessary stuff;
> >>
> >>/*
> >> * Given the changes above, the code volume of mdev core driver
> >> * dramatically reduced by ~50%.
> >> */
> >>
> >>
> >>- Interfaces between vfio_mdev and vendor driver are high-level,
> >>  e.g. ioctl instead of get_irq_info/set_irq_info and reset,
> >>  start/stop became mdev oriented, etc.;
> >>
> >>/*
> >> * Given the changes above, the code volume of mdev core driver
> >> * dramatically reduced by ~64%.
> >> */
> >>
> >>
> >> Test
> >>
> >>- Tested with KVMGT
> >>
> >> TODO
> >>
> >>- Re-implement the attribute group of host device as long as the
> >>  sysfs hierarchy in discussion gets finalized;
> >>
> >>- Move common routines from current vfio-pci into a higher location,
> >>  export them for various VFIO bus drivers and/or mdev vendor drivers;
> >>
> >>- Add implementation examples for vendor drivers to Documentation;
> >>
> >>- Refine IOMMU changes
> >>
> >>
> >>
> >> Jike Song (2):
> >>   Mediated device Core driver
> >>   vfio: VFIO bus driver for MDEV devices
> >>
> >> Kirti Wankhede (2):
> >>   vfio iommu: Add support for mediated devices
> >>   docs: Add Documentation for Mediated devices
> >>
> >>  Documentation/vfio-mediated-device.txt | 203 ++
> >>  drivers/vfio/Kconfig   |   1 +
> >>  drivers/vfio/Makefile  |   1 +
> >>  drivers/vfio/mdev/Kconfig  |  18 ++
> >>  drivers/vfio/mdev/Makefile |   5 +
> >>  drivers/vfio/mdev/m

Re: [Qemu-devel] [RFC v4 15/28] tcg: rename tcg_current_cpu to tcg_current_rr_cpu

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

..and make the definition local to cpus. In preparation for MTTCG the
concept of a global tcg_current_cpu will no longer make sense. However
we still need to keep track of it in the single-threaded case to be able
to exit quickly when required.

qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.

For the time being the setting of the global exit_request remains.

Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC v4 14/28] tcg: add kick timer for single-threaded vCPU emulation

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long running basic block chains a kick to
ensure the next vCPU is scheduled. As this code is being re-factored and
rationalised we now do it explicitly here.

Signed-off-by: Alex Bennée 

---
v2
  - re-base fixes
  - get_ticks_per_sec() -> NANOSECONDS_PER_SEC
v3
  - add define for TCG_KICK_FREQ
  - fix checkpatch warning
v4
  - wrap next calc in inline qemu_tcg_next_kick() instead of macro
---
 cpus.c | 32 
 1 file changed, 32 insertions(+)

diff --git a/cpus.c b/cpus.c
index b5b45b8..8c49d6c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1185,9 +1185,34 @@ static void deal_with_unplugged_cpus(void)
 }
 }

+/* Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
+static void qemu_cpu_kick_no_halt(void);
+
+#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
+
+static inline int64_t qemu_tcg_next_kick(void)
+{
+return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
+}
+
+static void kick_tcg_thread(void *opaque)
+{
+QEMUTimer *self = *(QEMUTimer **) opaque;
+timer_mod(self, qemu_tcg_next_kick());
+qemu_cpu_kick_no_halt();
+}
+
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
 CPUState *cpu = arg;
+QEMUTimer *kick_timer;

 rcu_register_thread();

@@ -1211,6 +1236,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 }

+/* Set to kick if we have to do more than one vCPU */
+if (CPU_NEXT(first_cpu)) {
+kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread,
+  &kick_timer);


I'm not especially keen on this pointer to local variable thing.  Perhaps 
better as

  kick_timer = timer_new_ns(..., NULL);
  kick_timer->opaque = kick_timer;

and avoid the indirection in kick_tcg_thread.

Also, we should shut down the timer when the cpu is removed, surely?


r~

Re: [Qemu-devel] [PATCH] iscsi: Fix divide-by-zero regression on raw SG devices

2016-09-06 Thread Holger Schranz


Hi Eric,

I will test this patch within tomorrow. Is this o.k. for you?

Beste regards

Holger

Am 06.09.2016 um 21:04 schrieb Eric Blake:

When qemu uses iscsi devices in sg mode, iscsilun->block_size
is left at 0.  Prior to commits cf081fca and similar, when
block limits were tracked in sectors, this did not matter:
various block limits were just left at 0.  But when we started
scaling by block size, this caused SIGFPE.

One possible solution for SG mode is to just blindly skip ALL
of iscsi_refresh_limits(), since we already short circuit so
many other things in sg mode.  But this patch takes a slightly
more conservative approach, and merely guarantees that scaling
will succeed (for SG devices, the scaling is done to the block
layer default of 512 bytes, since we don't know of any iscsi
devices with a smaller block size), while still using multiples
of the original size.  Resulting limits may still be zero in SG
mode (that is, we only fix block_size used as a denominator, not
all uses).

Reported-by: Holger Schranz 
Signed-off-by: Eric Blake 
CC: qemu-sta...@nongnu.org
---

I would really appreciate Holger testing this patch. We could
also go with the much shorter patch that just does
if (bs->sg) { return; }
at the top of iscsi_refresh_limits(), but I'm not sure if that
would break anything else in the block layer (we had several,
but not all, limits that were provably left alone at 0 for
sg mode).

  block/iscsi.c | 12 
  1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 95ce9e1..ff84cd1 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1813,6 +1813,10 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)

  IscsiLun *iscsilun = bs->opaque;
  uint64_t max_xfer_len = iscsilun->use_16_for_rw ? 0x : 0x;
+unsigned int block_size = MIN_NON_ZERO(BDRV_SECTOR_SIZE,
+   iscsilun->block_size);
+
+assert(iscsilun->block_size >= BDRV_SECTOR_SIZE || bs->sg);

  bs->bl.request_alignment = iscsilun->block_size;

@@ -1820,12 +1824,12 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)
  max_xfer_len = MIN(max_xfer_len, iscsilun->bl.max_xfer_len);
  }

-if (max_xfer_len * iscsilun->block_size < INT_MAX) {
+if (max_xfer_len * block_size < INT_MAX) {
  bs->bl.max_transfer = max_xfer_len * iscsilun->block_size;
  }

  if (iscsilun->lbp.lbpu) {
-if (iscsilun->bl.max_unmap < 0x / iscsilun->block_size) {
+if (iscsilun->bl.max_unmap < 0x / block_size) {
  bs->bl.max_pdiscard =
  iscsilun->bl.max_unmap * iscsilun->block_size;
  }
@@ -1835,7 +1839,7 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)
  bs->bl.pdiscard_alignment = iscsilun->block_size;
  }

-if (iscsilun->bl.max_ws_len < 0x / iscsilun->block_size) {
+if (iscsilun->bl.max_ws_len < 0x / block_size) {
  bs->bl.max_pwrite_zeroes =
  iscsilun->bl.max_ws_len * iscsilun->block_size;
  }
@@ -1846,7 +1850,7 @@ static void iscsi_refresh_limits(BlockDriverState *bs, 
Error **errp)
  bs->bl.pwrite_zeroes_alignment = iscsilun->block_size;
  }
  if (iscsilun->bl.opt_xfer_len &&
-iscsilun->bl.opt_xfer_len < INT_MAX / iscsilun->block_size) {
+iscsilun->bl.opt_xfer_len < INT_MAX / block_size) {
  bs->bl.opt_transfer = pow2floor(iscsilun->bl.opt_xfer_len *
  iscsilun->block_size);
  }



---
Diese E-Mail wurde von Avast Antivirus-Software auf Viren geprüft.
https://www.avast.com/antivirus

Re: [Qemu-devel] [RFC v4 13/28] tcg: add options for enabling MTTCG

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

From: KONRAD Frederic 

We know there will be cases where MTTCG won't work until additional work
is done in the front/back ends to support. It will however be useful to
be able to turn it on.

As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.

Signed-off-by: KONRAD Frederic 
[AJB: move to -accel tcg,thread=multi|single, defaults]
Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC v4 11/28] tcg: move tcg_exec_all and helpers above thread fn

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

This is a pure mechanical change in preparation for up-coming
re-factoring. Instead of a forward declaration for tcg_exec_all it and
the associated helper functions are moved in front of the call from
qemu_tcg_cpu_thread_fn.

Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC v4 09/28] tcg: protect TBContext with tb_lock.

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:




Missing?  Or just remove it.


r~

Re: [Qemu-devel] [RFC v4 08/28] translate-all: Add assert_(memory|tb)_lock annotations

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

This adds calls to the assert_(memory|tb)_lock for all public APIs which
are documented as needing them held for linux-user mode. The asserts are
NOPs for system-mode although these will be converted when MTTCG is
enabled.

Signed-off-by: Alex Bennée 



Reviewed-by: Richard Henderson 


@@ -,7 +1121,7 @@ static void build_page_bitmap(PageDesc *p)
 tb_end = tb_start + tb->size;
 if (tb_end > TARGET_PAGE_SIZE) {
 tb_end = TARGET_PAGE_SIZE;
-}
+ }
 } else {


Introducing a whitespace error?


r~

Re: [Qemu-devel] [RFC v4 07/28] linux-user/elfload: ensure mmap_lock() held while setting up

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

Future patches will enforce the holding of mmap_lock() when we are
manipulating internal memory structures. Technically it doesn't matter
in the case of elfload as we haven't started executing yet. However it
is easier to grab the lock when required than special case the
translate-all API.

Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RFC v4 06/28] tcg: comment on which functions have to be called with tb_lock held

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

From: Paolo Bonzini 

softmmu requires more functions to be thread-safe, because translation
blocks can be invalidated from e.g. notdirty callbacks.  Probably the
same holds for user-mode emulation, it's just that no one has ever
tried to produce a coherent locking there.

This patch will guide the introduction of more tb_lock and tb_unlock
calls for system emulation.

Note that after this patch some (most) of the mentioned functions are
still called outside tb_lock/tb_unlock.  The next one will rectify this.

Signed-off-by: Paolo Bonzini 
Signed-off-by: Alex Bennée 


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PULL 00/66] ppc-for-2.8 queue 20160906

2016-09-06 Thread Benjamin Herrenschmidt

On Wed, 2016-09-07 at 07:52 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2016-09-06 at 23:09 +0200, Thomas Huth wrote:
> > 
> > The bad commit is: "ppc: Speed up load/store multiple"
> > 
> > There are two "#if defined(HOST_WORDS_BIGENDIAN)" sections in this
> > patch
> > which are both bad: The memcpy tries to copy 32-bit values into 64-
> > bit
> > registers, which of course does not work (unless you compile this
> > code
> > for a 32-bit host only where the size of a gpr is only 32-bit).
> 
> The instruction does that. It only copies 32-bits. I think it's
> correct.  

Except of course when the host is 64-bit and we enable with a 64-bit
target_ulong ... ugh ;-)

Drop the patch for now, I'll redo it later.

Cheers,
Ben.

Re: [Qemu-devel] [RFC v2 0/4] adding mdev bus and vfio support

2016-09-06 Thread Jike Song

On 09/02/2016 11:03 PM, Alex Williamson wrote:
> On Fri,  2 Sep 2016 16:16:08 +0800
> Jike Song  wrote:
> 
>> This patchset is based on NVidia's "Add Mediated device support" series, 
>> version 6:
>>
>>  http://www.spinics.net/lists/kvm/msg136472.html
> 
> 
> Hi Jike,
> 
> I'm thrilled by your active participation here, but I'm confused which
> versions I should be reviewing and where the primary development is
> going.  Kirti sent v7 a week ago, so I would have expected a revision
> based on that rather than a re-write based on v6 plus incorporation of a
> few of Kirti's patches directly.

Hi Alex,

[Sorry! replied this on Monday but it was silently dropped by the our firewall]



The v1 of this patchset was send as incremental ones, basing on Nvidia's v6, to
demonstrate how is it possible and beneficial to:

1, Introduce an independent device between physical and mdev;
2, Simplify vfio-mdev and make it the most flexible for vendor drivers;

Unfortunately neither was understood or adopted in v7:

http://www.spinics.net/lists/kvm/msg137081.html

So here came the v2, as a standalone series, to give a whole and straight
demonstration. The reason of still basing on v6:

- Addressed all v6 comments (except the iommu part);
- There is no comments yet for v7 (except the sysfs ones);



> I liked the last version of these
> changes a lot, but we need to figure out how to combine development
> because we do not have infinite cycles for review available :-\  Thanks!

Fully understand.

Here is the dilemma: v6 is an obsolete version to work upon, v7 is still not
at the direction we prefer. We would be highly glad and thankful if Neo/Kirti
would adopt the code in their next version, which will certainly form a
more simple and consolidated base for future co-development; otherwise
and we could at least discuss the concerns, in case of any.


--
Thanks,
Jike

>>
>>
>> Key Changes from Nvidia v6:
>>
>>  - Introduced an independent struct device to host device, thereby
>>formed a physical-host-mdev hierarchy, and highly reused Linux
>>driver core support;
>>
>>  - Added online/offline to mdev_bus_type, leveraging the 'online'
>>attr support from Linux driver core;
>>
>>  - Removed mdev_class and other unecessary stuff;
>>
>>  /*
>>   * Given the changes above, the code volume of mdev core driver
>>   * dramatically reduced by ~50%.
>>   */
>>
>>
>>  - Interfaces between vfio_mdev and vendor driver are high-level,
>>e.g. ioctl instead of get_irq_info/set_irq_info and reset,
>>start/stop became mdev oriented, etc.;
>>
>>  /*
>>   * Given the changes above, the code volume of mdev core driver
>>   * dramatically reduced by ~64%.
>>   */
>>
>>
>> Test
>>
>>  - Tested with KVMGT
>>
>> TODO
>>
>>  - Re-implement the attribute group of host device as long as the
>>sysfs hierarchy in discussion gets finalized;
>>
>>  - Move common routines from current vfio-pci into a higher location,
>>export them for various VFIO bus drivers and/or mdev vendor drivers;
>>
>>  - Add implementation examples for vendor drivers to Documentation;
>>
>>  - Refine IOMMU changes
>>
>>
>>
>> Jike Song (2):
>>   Mediated device Core driver
>>   vfio: VFIO bus driver for MDEV devices
>>
>> Kirti Wankhede (2):
>>   vfio iommu: Add support for mediated devices
>>   docs: Add Documentation for Mediated devices
>>
>>  Documentation/vfio-mediated-device.txt | 203 ++
>>  drivers/vfio/Kconfig   |   1 +
>>  drivers/vfio/Makefile  |   1 +
>>  drivers/vfio/mdev/Kconfig  |  18 ++
>>  drivers/vfio/mdev/Makefile |   5 +
>>  drivers/vfio/mdev/mdev_core.c  | 250 +
>>  drivers/vfio/mdev/mdev_driver.c| 155 ++
>>  drivers/vfio/mdev/mdev_private.h   |  29 ++
>>  drivers/vfio/mdev/mdev_sysfs.c | 155 ++
>>  drivers/vfio/mdev/vfio_mdev.c  | 187 
>>  drivers/vfio/vfio.c|  82 ++
>>  drivers/vfio/vfio_iommu_type1.c| 499 
>> +
>>  include/linux/mdev.h   | 159 +++
>>  include/linux/vfio.h   |  13 +-
>>  14 files changed, 1709 insertions(+), 48 deletions(-)
>>  create mode 100644 Documentation/vfio-mediated-device.txt
>>  create mode 100644 drivers/vfio/mdev/Kconfig
>>  create mode 100644 drivers/vfio/mdev/Makefile
>>  create mode 100644 drivers/vfio/mdev/mdev_core.c
>>  create mode 100644 drivers/vfio/mdev/mdev_driver.c
>>  create mode 100644 drivers/vfio/mdev/mdev_private.h
>>  create mode 100644 drivers/vfio/mdev/mdev_sysfs.c
>>  create mode 100644 drivers/vfio/mdev/vfio_mdev.c
>>  create mode 100644 include/linux/mdev.h
>>
>

Re: [Qemu-devel] [RFC v4 04/28] cpu-exec: include cpu_index in CPU_LOG_EXEC messages

2016-09-06 Thread Richard Henderson


On 08/11/2016 08:24 AM, Alex Bennée wrote:

Even more important when debugging MTTCG is seeing which vCPU is
currently executing.

Signed-off-by: Alex Bennée 
---
 cpu-exec.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH v7 02/20] qapi: Add lock-mode in blockdev-add options

2016-09-06 Thread Fam Zheng

On Tue, 09/06 18:18, Kevin Wolf wrote:
> Am 08.08.2016 um 15:13 hat Fam Zheng geschrieben:
> > To allow overriding the default locking behavior when opening the image.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  qapi/block-core.json | 19 ++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 5e2d7d7..d1eb197 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -2151,6 +2151,20 @@
> >  '*debug-level': 'int' } }
> >  
> >  ##
> > +# @BlockdevLockMode
> > +#
> > +# Describes how QEMU should lock the image.
> > +#
> > +# @off:   Disabled
> > +# @shared:Use shared lock for both RO and RW images.
> > +# @auto:  Use exclusive lock for RW images, and shared lock for RO 
> > images.
> > +#
> > +# Since: 2.7
> > +##
> > +{ 'enum': 'BlockdevLockMode',
> > +  'data': [ 'off', 'shared', 'auto' ] }
> 
> You decided to drop 'exclusive'? Didn't we agree last time that we
> should have both a real 'exclusive' and 'auto'?

I must have misunderstood it. I can add exclusive back.

Fam

Re: [Qemu-devel] [PATCH v7 03/20] block: Add and parse "lock-mode" option for image locking

2016-09-06 Thread Fam Zheng

On Tue, 09/06 18:33, Kevin Wolf wrote:
> > +lock_mode = qemu_opt_get(opts, BDRV_OPT_LOCK_MODE) ? : "off";
> 
> Am I missing some other place that overrides this or does it make the
> default "off" rather than "auto"?

It is in patch 19, once all tests are fixed (this patch cannot be reordered
after tests fixing because it involves specifying the lock-mode option.)

Re: [Qemu-devel] [PATCH v3 6/8] docker: make sure debootstrap is at least 1.0.67

2016-09-06 Thread Fam Zheng

On Tue, 09/06 22:05, Sascha Silbe wrote:
> debootstrap prior to 1.0.67 generated an empty sources.list during
> foreign bootstraps (Debian#732255 [1]). Fall back to the git checkout
> if the installed debootstrap version is too old.
> 
> [1] https://bugs.debian.org/732255
> 
> Signed-off-by: Sascha Silbe 
> ---
> v2→v3:
>   - fix unbalanced white space around pipes
>   - replaced GNU-specific version sort option with POSIX compliant set
> of options
> 
>  tests/docker/dockerfiles/debian-bootstrap.pre | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/tests/docker/dockerfiles/debian-bootstrap.pre 
> b/tests/docker/dockerfiles/debian-bootstrap.pre
> index 3d9f7f4..ee69e89 100755
> --- a/tests/docker/dockerfiles/debian-bootstrap.pre
> +++ b/tests/docker/dockerfiles/debian-bootstrap.pre
> @@ -3,6 +3,8 @@
>  # Simple wrapper for debootstrap, run in the docker build context
>  #
>  FAKEROOT=`which fakeroot 2> /dev/null`
> +# debootstrap < 1.0.67 generates empty sources.list, see Debian#732255
> +MIN_DEBOOTSTRAP_VERSION=1.0.67
>  
>  exit_and_skip()
>  {
> @@ -40,9 +42,17 @@ fi
>  #
>  
>  if [ -z $DEBOOTSTRAP_DIR ]; then
> +NEED_DEBOOTSTRAP=false
>  DEBOOTSTRAP=`which debootstrap 2> /dev/null`
>  if [ -z $DEBOOTSTRAP ]; then
>  echo "No debootstrap installed, attempting to install from SCM"
> +NEED_DEBOOTSTRAP=true
> +elif ! (echo "${MIN_DEBOOTSTRAP_VERSION}" ; "${DEBOOTSTRAP}" --version \
> +| cut -d ' ' -f 2) | sort -t . -n -k 1,1 -k 2,2 -k 3,3 -C; then

Like -V, -C is not available on OSX either. Maybe use "-c"? If so, I can fix it
when applying.

https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/sort.1.html

Fam

> +echo "debootstrap too old, attempting to install from SCM"
> +NEED_DEBOOTSTRAP=true
> +fi
> +if $NEED_DEBOOTSTRAP; then
>  DEBOOTSTRAP_SOURCE=https://anonscm.debian.org/git/d-i/debootstrap.git
>  git clone ${DEBOOTSTRAP_SOURCE} ./debootstrap.git
>  export DEBOOTSTRAP_DIR=./debootstrap.git
> -- 
> 1.9.1
> 
>

Re: [Qemu-devel] [PATCH v2 5/7] ppc/pnv: add a PnvCore object

2016-09-06 Thread David Gibson

On Tue, Sep 06, 2016 at 08:14:37AM +0200, Cédric Le Goater wrote:
> On 09/05/2016 06:02 AM, David Gibson wrote:
> > On Wed, Aug 31, 2016 at 06:34:13PM +0200, Cédric Le Goater wrote:
> >> This is largy inspired by sPAPRCPUCore with some simplification, no
> >> hotplug for instance. But the differences are small and the objects
> >> could possibly be merged.
> >>
> >> A set of PnvCore objects is added to the PnvChip and the device
> >> tree is populated looping on these cores.
> >>
> >> Real HW cpu ids are now generated depending on the chip cpu model, the
> >> chip id and a core mask. This id is stored in CPUState->cpu_index and
> >> in PnvCore->core_id and it is used to populate the device tree.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>
> >>  Changes since v1:
> >>
> >>  - changed name to PnvCore
> >>  - changed PnvChip core array type to a 'PnvCore *cores'
> >>  - introduced real cpu hw ids using a core mask from the chip
> >>  - reworked powernv_create_core_node() which populates the device tree
> >>  - added missing "ibm,pa-features" property 
> >>  - smp_cpus representing threads, used smp_cores instead to create the
> >>cores in the chip.
> >>  - removed the use of ppc_get_vcpu_dt_id() 
> >>  - added "POWER8E" and "POWER8NVL" cpu models to exercice the
> >>PnvChipClass
> >>
> >>  hw/ppc/Makefile.objs  |   2 +-
> >>  hw/ppc/pnv.c  | 204 
> >> ++
> >>  hw/ppc/pnv_core.c | 170 ++
> >>  include/hw/ppc/pnv.h  |   7 ++
> >>  include/hw/ppc/pnv_core.h |  47 +++
> >>  5 files changed, 429 insertions(+), 1 deletion(-)
> >>  create mode 100644 hw/ppc/pnv_core.c
> >>  create mode 100644 include/hw/ppc/pnv_core.h
> >>
> >> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> >> index f580e5c41413..08c213c40684 100644
> >> --- a/hw/ppc/Makefile.objs
> >> +++ b/hw/ppc/Makefile.objs
> >> @@ -6,7 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
> >> spapr_rtas.o
> >>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
> >>  obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
> >>  # IBM PowerNV
> >> -obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o
> >> +obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o pnv_core.o
> >>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> >>  obj-y += spapr_pci_vfio.o
> >>  endif
> >> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> >> index b6efb5e3ef07..daf9f459ab0e 100644
> >> --- a/hw/ppc/pnv.c
> >> +++ b/hw/ppc/pnv.c
> >> @@ -35,6 +35,7 @@
> >>  #include "hw/ppc/fdt.h"
> >>  #include "hw/ppc/ppc.h"
> >>  #include "hw/ppc/pnv.h"
> >> +#include "hw/ppc/pnv_core.h"
> >>  #include "hw/loader.h"
> >>  #include "exec/address-spaces.h"
> >>  #include "qemu/cutils.h"
> >> @@ -98,6 +99,136 @@ static int powernv_populate_memory(void *fdt)
> >>  return 0;
> >>  }
> >>  
> >> +/*
> >> + * The PowerNV cores (and threads) need to use real HW ids and not an
> >> + * incremental index like it has been done on other platforms. This HW
> >> + * id is called a PIR and is used in the device tree, in the XSCOM
> >> + * communication to address cores, in the interrupt servers.
> >> + */
> >> +static void powernv_create_core_node(PnvCore *pc, void *fdt,
> >> + int cpus_offset, int chip_id)
> >> +{
> >> +CPUCore *core = CPU_CORE(pc);
> >> +CPUState *cs = CPU(DEVICE(pc->threads));
> >> +DeviceClass *dc = DEVICE_GET_CLASS(cs);
> >> +PowerPCCPU *cpu = POWERPC_CPU(cs);
> >> +int smt_threads = ppc_get_compat_smt_threads(cpu);
> >> +CPUPPCState *env = &cpu->env;
> >> +PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
> >> +uint32_t servers_prop[smt_threads];
> >> +uint32_t gservers_prop[smt_threads * 2];
> >> +int i;
> >> +uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
> >> +   0x, 0x};
> >> +uint32_t tbfreq = PNV_TIMEBASE_FREQ;
> >> +uint32_t cpufreq = 10;
> >> +uint32_t page_sizes_prop[64];
> >> +size_t page_sizes_prop_size;
> >> +const uint8_t pa_features[] = { 24, 0,
> >> +0xf6, 0x3f, 0xc7, 0xc0, 0x80, 0xf0,
> >> +0x80, 0x00, 0x00, 0x00, 0x00, 0x00,
> >> +0x00, 0x00, 0x00, 0x00, 0x80, 0x00,
> >> +0x80, 0x00, 0x80, 0x00, 0x80, 0x00 };
> >> +int offset;
> >> +char *nodename;
> >> +
> >> +nodename = g_strdup_printf("%s@%x", dc->fw_name, core->core_id);
> >> +offset = fdt_add_subnode(fdt, cpus_offset, nodename);
> >> +_FDT(offset);
> >> +g_free(nodename);
> >> +
> >> +_FDT((fdt_setprop_cell(fdt, offset, "ibm,chip-id", chip_id)));
> >> +
> >> +_FDT((fdt_setprop_cell(fdt, offset, "reg", core->core_id)));
> >> +_FDT((fdt_setprop_cell(fdt, offset, "ibm,pir", core->core_id)));
> >> +_FDT((fdt_setprop_string(fdt, offset, "devic

Re: [Qemu-devel] [PATCH v2 6/7] ppc/pnv: add a XScomDevice to PnvCore

2016-09-06 Thread David Gibson

On Tue, Sep 06, 2016 at 03:54:10PM +0200, Cédric Le Goater wrote:
> On 09/05/2016 06:19 AM, David Gibson wrote:
> > On Wed, Aug 31, 2016 at 06:34:14PM +0200, Cédric Le Goater wrote:
> >> Now that we are using real HW ids for the cores in PowerNV chips, we
> >> can route the XSCOM accesses to them. We just need to attach a
> >> XScomDevice to each core with the associated ranges in the XSCOM
> >> address space.
> >>
> >> To start with, let's install the DTS (Digital Thermal Sensor) handlers
> >> which are easy to handle.
> >>
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>  hw/ppc/pnv.c  |  9 +++
> >>  hw/ppc/pnv_core.c | 67 
> >> +++
> >>  include/hw/ppc/pnv_core.h | 13 +
> >>  3 files changed, 89 insertions(+)
> >>
> >> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> >> index daf9f459ab0e..a31568415192 100644
> >> --- a/hw/ppc/pnv.c
> >> +++ b/hw/ppc/pnv.c
> >> @@ -527,6 +527,7 @@ static void pnv_chip_realize(DeviceState *dev, Error 
> >> **errp)
> >>  for (i = 0, core_hwid = 0; (core_hwid < sizeof(chip->cores_mask) * 8)
> >>   && (i < chip->num_cores); core_hwid++) {
> >>  PnvCore *pnv_core = &chip->cores[i];
> >> +DeviceState *qdev;
> >>  
> >>  if (!(chip->cores_mask & (1 << core_hwid))) {
> >>  continue;
> >> @@ -542,6 +543,14 @@ static void pnv_chip_realize(DeviceState *dev, Error 
> >> **errp)
> >>   &error_fatal);
> >>  object_unref(OBJECT(pnv_core));
> >>  i++;
> >> +
> >> +/* Attach the core to its XSCOM bus */
> >> +qdev = qdev_create(&chip->xscom->bus, TYPE_PNV_CORE_XSCOM);
> > 
> > Again, I think this breaks QOM lifetime rules.
> > 
> >> +qdev_prop_set_uint32(qdev, "core-pir",
> >> + P8_PIR(chip->chip_id, core_hwid));
> >> +qdev_init_nofail(qdev);
> > 
> > qdev_init_nofail() - which will abort on failure - shouldn't be used
> > in a realize() function, which has a way to gracefully report errors.
> 
> yes. I kind of knew that was ugly but the main purpose of the patch 
> is the use of scom. So I took the quick and dirty path :)
> 
> we should do what you propose below:
>  
> > So, in terms of the lifetime thing.  I think one permitted solution is
> > to embed the scom device state in the core device state and use
> > object_initialize().
> 
> yes.
> 
> > Alternatively, since SCOM is by its nature a sideband bus, I wonder if
> > it would make sense to have ScomDevice be a QOM interface, rather than
> > a full QOM class.  That way the core device itself (and other devices
> > with SCOM control) could present the SCOM device interface and be
> > placed directly on the SCOM bus without having to introduce an extra
> > object.  
> 
> what you are proposing is to have a PnvCore inherit from CPUCore and 
> XScomDevice ? I tried that and did not find a way for multi-inheritance. 

QOM supports multiple interface inheritance, but not multiple direct
inheritance.  That's why youd need to change XScomDevice from a class
to an interface, as I propose in a different mail.  Not sure if that
will cause other problems with the qbus infrastructure.  Have a look
at say, 'HotplugHandler' for an example of a QOM interface.

> But yes, we would get rid of the extra object. At the same time, that 
> extra object represents the xscom interface unit in a chiplet which 
> exists on real HW.
> 
> > It will probably make accessing the object innards in response to 
> > SCOM requests more straightforward as well.
> 
> The address space is probably the best approach. I have some experimental 
> patches which look pretty good :
> 
> address-space: xscom-0
>   -0007 (prio 0, RW): xscom-0
> 00b00200-00b0021f (prio 0, RW): lpc-xscom
> 00012000-0001207f (prio 0, RW): core-20-xscom
> 00012100-0001217f (prio 0, RW): core-21-xscom
> 00012200-0001227f (prio 0, RW): core-22-xscom
> 00012300-0001237f (prio 0, RW): core-23-xscom
> 00012400-0001247f (prio 0, RW): core-24-xscom
> 00012500-0001257f (prio 0, RW): core-25-xscom
> 00012600-0001267f (prio 0, RW): core-26-xscom
> 
> 
> 
> > I'm not entirely sure if sharing just an interface will be sufficient
> > for devices under a bus, but it's worth investigating.
> 
> yes. I need to step back a bit and look how I could rework the patchset 
> to QOMify the whole. This is really qdev oriented and there are a couple 
> of places where fixes are needed. I am seeing that better now. 
> 
> The address space should be investigated also.
> 
> >> +
> >> +pnv_core->xd = PNV_CORE_XSCOM(qdev);
> >>  }
> >>  g_free(typename);
> >>  
> >> diff --git a/hw/ppc/pnv_core.c b/hw/ppc/pnv_core.c
> >> index 825aea1194a1..feba374740dc 100644
> >> --- a/hw/ppc/pnv_core.c
>

Re: [Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-09-06 Thread David Gibson

On Wed, Sep 07, 2016 at 07:45:49AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2016-09-06 at 16:42 +0200, Cédric Le Goater wrote:
> > > Alternatively.. it might be simpler to just drop the SCOM as a
> > > separate device.  I think you could just hang the scom bus directly
> > > off the chip object.  IIUC the scom is basically the only chip-
> > level
> > > control mechanism, so this does make a certain amount of sense.
> > 
> > yes. I am exposing more and more stuff of the chip object under the 
> > xscom object so we should merge them. I agree. We will still need 
> > some XScomDevice of some sort.
> 
> What you can do is split it this way which matches the HW:
> 
>  - The chip object is the XSCOM parent, it owns the XSCOM bus,
> and expose functions (methods) to read/write XSCOMs. WE could rename
> XSCOM to "PIB" or "PCB" which is the real name of the bus ;-) But that
> might confuse things more than help .
> 
>  - A separate ADU object on each chip that is a SysDevice and does the
> MMIO bridge to XSCOM. It decodes the MMIO range for that chip and calls
> the above accessors.
> 
> That makes it easy to generate XSCOMs using different mechanisms if we
> wish to do so, which could come in handy, such as monitor commands, or
> if we ever do cosimulation with a separate BMC, a simulated FSI, all by
> just calling the first object's methods.

Not what I had in mind - I still was thinking of having the xscom
access unit do the translation and re-issue into the pcb bus address
space.

But sounds like this other approach could have some advantages.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-09-06 Thread David Gibson

On Tue, Sep 06, 2016 at 04:42:01PM +0200, Cédric Le Goater wrote:
> On 09/05/2016 05:39 AM, David Gibson wrote:
> > On Wed, Aug 31, 2016 at 06:34:11PM +0200, Cédric Le Goater wrote:
> >> From: Benjamin Herrenschmidt 
> >>
> >> XSCOM is an interface to a sideband bus provided by the POWER8 chip
> >> pervasive unit, which gives access to a number of facilities in the
> >> chip that are needed by the OPAL firmware and to a lesser extent,
> >> Linux. This is among others how the PCI Host bridges get configured
> >> at boot or how the LPC bus is accessed.
> >>
> >> This provides a simple bus and device type for devices sitting on
> >> XSCOM along with some facilities to optionally generate corresponding
> >> device-tree nodes
> >>
> >> Signed-off-by: Benjamin Herrenschmidt 
> >> [clg: updated for qemu-2.7
> >>   ported on new sPowerNVMachineState which was merged with PnvSystem
> >>   removed TRACE_XSCOM
> >>   fixed checkpatch errors
> >>   replaced assert with error_setg in xscom_realize()
> >>   reworked xscom_create
> >>   introduced the use of the chip_class for chip model contants
> >>   ]
> >> Signed-off-by: Cédric Le Goater 
> >> ---
> >>
> >>  They were some discussions on whether we should use a qemu
> >>  address_space instead of the xscom ranges defined in this patch. 
> >>  I gave it try, it is possible but it brings extra unnecessary calls
> >>  and complexity. I think the current solution is better.
> >>
> >>  hw/ppc/Makefile.objs   |   2 +-
> >>  hw/ppc/pnv.c   |  11 ++
> >>  hw/ppc/pnv_xscom.c | 408 
> >> +
> >>  include/hw/ppc/pnv.h   |   2 +
> >>  include/hw/ppc/pnv_xscom.h |  75 +
> >>  5 files changed, 497 insertions(+), 1 deletion(-)
> >>  create mode 100644 hw/ppc/pnv_xscom.c
> >>  create mode 100644 include/hw/ppc/pnv_xscom.h
> >>
> >> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> >> index 8105db7d5600..f580e5c41413 100644
> >> --- a/hw/ppc/Makefile.objs
> >> +++ b/hw/ppc/Makefile.objs
> >> @@ -6,7 +6,7 @@ obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o 
> >> spapr_rtas.o
> >>  obj-$(CONFIG_PSERIES) += spapr_pci.o spapr_rtc.o spapr_drc.o spapr_rng.o
> >>  obj-$(CONFIG_PSERIES) += spapr_cpu_core.o
> >>  # IBM PowerNV
> >> -obj-$(CONFIG_POWERNV) += pnv.o
> >> +obj-$(CONFIG_POWERNV) += pnv.o pnv_xscom.o
> >>  ifeq ($(CONFIG_PCI)$(CONFIG_PSERIES)$(CONFIG_LINUX), yyy)
> >>  obj-y += spapr_pci_vfio.o
> >>  endif
> >> diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
> >> index 06051268e200..a6e7f66b2c0a 100644
> >> --- a/hw/ppc/pnv.c
> >> +++ b/hw/ppc/pnv.c
> >> @@ -39,6 +39,8 @@
> >>  #include "exec/address-spaces.h"
> >>  #include "qemu/cutils.h"
> >>  
> >> +#include "hw/ppc/pnv_xscom.h"
> >> +
> >>  #include 
> >>  
> >>  #define FDT_ADDR0x0100
> >> @@ -103,6 +105,7 @@ static void *powernv_create_fdt(PnvMachineState *pnv,
> >>  char *buf;
> >>  const char plat_compat[] = "qemu,powernv\0ibm,powernv";
> >>  int off;
> >> +int i;
> >>  
> >>  fdt = g_malloc0(FDT_MAX_SIZE);
> >>  _FDT((fdt_create_empty_tree(fdt, FDT_MAX_SIZE)));
> >> @@ -142,6 +145,11 @@ static void *powernv_create_fdt(PnvMachineState *pnv,
> >>  /* Memory */
> >>  powernv_populate_memory(fdt);
> >>  
> >> +/* Populate XSCOM for each chip */
> >> +for (i = 0; i < pnv->num_chips; i++) {
> >> +xscom_populate_fdt(pnv->chips[i]->xscom, fdt, 0);
> >> +}
> > 
> > There will be a bunch of per-chip things in the fdt - I suggest a
> > common function to do all the fdt creation for a single chip.  Since
> > you've moved to using the fdt_rw functions exclusively, you don't have
> > the sequence constraints that would have made that awkward previously.
> 
> ok.
> 
> >>  return fdt;
> >>  }
> >>  
> >> @@ -305,6 +313,9 @@ static void pnv_chip_realize(DeviceState *dev, Error 
> >> **errp)
> >>  PnvChip *chip = PNV_CHIP(dev);
> >>  PnvChipClass *pcc = PNV_CHIP_GET_CLASS(chip);
> >>  
> >> +/* Set up XSCOM bus */
> >> +chip->xscom = xscom_create(chip);
> > 
> > So, this creates the xscom as a new device object, unfortunately doing
> > that from another object's realize() violations QOM lifetime rules as
> > I understand them.  I think - I have to admit I'm pretty confused on
> > this topic myself.
> > 
> > You could construct the scom from chip init, then realize it from chip
> > realize, but I think using object_new() (via qdev_create()) from
> > another object's init also breaks the rules.  You are however allowed
> > to allocate the scom state statically within the chip and use
> > object_initialize() instead, AIUI.
> > 
> > Alternatively.. it might be simpler to just drop the SCOM as a
> > separate device.  I think you could just hang the scom bus directly
> > off the chip object.  IIUC the scom is basically the only chip-level
> > control mechanism, so this does make a certain amount of sense.
> 
> yes. I am exposing more and more s

Re: [Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-09-06 Thread David Gibson

On Tue, Sep 06, 2016 at 04:42:58PM +0200, Cédric Le Goater wrote:
> On 09/06/2016 02:48 AM, David Gibson wrote:
> > On Mon, Sep 05, 2016 at 05:11:53PM +1000, Benjamin Herrenschmidt wrote:
> >> On Mon, 2016-09-05 at 13:39 +1000, David Gibson wrote:
>  +static XScomDevice *xscom_find_target(XScomState *s, uint32_t
> >>> pcb_addr,
>  +  uint32_t *range)
>  +{
>  +BusChild *bc;
>  +
>  +QTAILQ_FOREACH(bc, &s->bus->bus.children, sibling) {
>  +DeviceState *qd = bc->child;
>  +XScomDevice *xd = XSCOM_DEVICE(qd);
>  +unsigned int i;
>  +
>  +for (i = 0; i < MAX_XSCOM_RANGES; i++) {
>  +if (xd->ranges[i].addr <= pcb_addr &&
>  +(xd->ranges[i].addr + xd->ranges[i].size) >
> >>> pcb_addr) {
>  +*range = i;
>  +return xd;
>  +}
>  +}
>  +}
> >>>
> >>> Hmm.. you could set up a SCOM local address space using the
> >>> infrastructure in memory.c, rather than doing your own dispatch.
> >>
> >> There are pros and cons to this approach. The memory.c stuff comes with
> >> quite a lot of baggage, not all of it very shinny to be honest ;-) I
> >> still *hate* how it forces upon us a whole 128-bit integer arithmetic
> >> library just so that it can represent 1____ ... 
> > 
> > Ugh, yeah.  I tried to argue against this when it first came in, but
> > was overruled.
> > 
> >> It would be make more sense to use inclusive start/end instead and
> >> stick to 64-bits.
> >>
> >> That being said, we could do that. We'd have to shift the XSCOM
> >> addresses left by 3 since each address is an 8 bytes reigster and
> >> forbid non-8-bytes accesses.
> > 
> > Ok.  I'm not particularly fussed either way.
> 
> 
> The change does seem too invasive. I can give it a try in next
> version.
> 
> When a memory region is triggered, the impacted device will have
> to convert the address with xscom_to_pcb_addr(). There is some 
> dependency on the chip model because the translation is different. 
> That would be an extra op in the xscom device model I guess. 

Actually, I was still thinking of having an MR for the scom interface
unit, which does the xscom_to_pcb_addr() then re-issues the access in
the PCB address space.

But your suggestion might work too.

> Also, the main purpose of the XscomBus is to loop on the devices 
> to populate the device tree. I am wondering if we could just use 
> a simple list under the chip for that purpose. 
> 
> Thanks,
> 
> C. 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 7/7] monitor: fix crash for platforms without a CPU 0

2016-09-06 Thread David Gibson

On Tue, Sep 06, 2016 at 08:28:48AM +0200, Cédric Le Goater wrote:
> On 09/05/2016 06:27 AM, David Gibson wrote:
> > On Wed, Aug 31, 2016 at 06:34:15PM +0200, Cédric Le Goater wrote:
> >> On PowerNV, CPU ids start at 0x8 or 0x20, we don't have a CPU 0
> >> anymore. So let's use the first_cpu index to initialize the monitor.
> >>
> >> Signed-off-by: Cédric Le Goater 
> > 
> > So we need a patch like this - amongst other fixes - in order to allow
> > unplug of cpu 0.  I'm not really sure whether to push it ahead now, or
> > gather it up with other no-cpu-0 fixes and send them as a batch.
> 
> I think we could send it now as it removes an assumption on the cpu 
> number.

True enough.  I'll try to queue it once I've sorted out my first 2.8
pull request with the existing backlog.

> 
> C. 
> 
> 
> > 
> >> ---
> >>
> >>  So that you can dump the cpu list with the monitor :
> >>
> >>(qemu) info cpus
> >>* CPU #8: nip=0x0010 thread_id=7742
> >>  CPU #16: nip=0x0010 thread_id=7740
> >>  CPU #24: nip=0x0010 thread_id=7740
> >>  CPU #32: nip=0x0010 thread_id=7740
> >>  CPU #40: nip=0x0010 thread_id=7740
> >>  CPU #48: nip=0x0010 thread_id=7740
> >>  CPU #72: nip=0x0010 thread_id=7740
> >>  CPU #80: nip=0x0010 thread_id=7740
> >>  CPU #136: nip=0x0010 thread_id=7740
> >>  CPU #144: nip=0x0010 thread_id=7740
> >>  CPU #152: nip=0x0010 thread_id=7740
> >>  CPU #160: nip=0x0010 thread_id=7740
> >>  CPU #168: nip=0x0010 thread_id=7740
> >>  CPU #176: nip=0x0010 thread_id=7740
> >>  CPU #200: nip=0x0010 thread_id=7740
> >>  CPU #208: nip=0x0010 thread_id=7740
> >>
> >>  monitor.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/monitor.c b/monitor.c
> >> index e9009de09a6c..19b8ec14f40e 100644
> >> --- a/monitor.c
> >> +++ b/monitor.c
> >> @@ -1027,7 +1027,7 @@ int monitor_set_cpu(int cpu_index)
> >>  CPUState *mon_get_cpu(void)
> >>  {
> >>  if (!cur_mon->mon_cpu) {
> >> -monitor_set_cpu(0);
> >> +monitor_set_cpu(first_cpu->cpu_index);
> >>  }
> >>  cpu_synchronize_state(cur_mon->mon_cpu);
> >>  return cur_mon->mon_cpu;
> > 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-09-06 Thread David Gibson

On Wed, Sep 07, 2016 at 07:47:16AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2016-09-06 at 16:42 +0200, Cédric Le Goater wrote:
> > 
> > The change does seem too invasive. I can give it a try in next
> > version.
> > 
> > When a memory region is triggered, the impacted device will have
> > to convert the address with xscom_to_pcb_addr(). There is some 
> > dependency on the chip model because the translation is different. 
> > That would be an extra op in the xscom device model I guess. 
> 
> No. If you split the XSCOM bus from the MMIO -> XSCOM bridge (the ADU)
> then the conversion only happens in the former. You don't directly
> route the MMIOs over ! You intercept the MMIOs and use use the
> address_space_rw to "generate" the XSCOM accesses on the other side,
> like I do for the LPC bus.

Oh.. wait.. that is what I had in mind, I think I misinterpreted
something one of you said.

> 
> We need that anyway because of the way XSCOMs can manipulate the HMER
> etc...
> 
> > Also, the main purpose of the XscomBus is to loop on the devices 
> > to populate the device tree. I am wondering if we could just use 
> > a simple list under the chip for that purpose.
> 
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v2 1/7] docker.py: don't hang on large docker output

2016-09-06 Thread Eric Blake

On 09/06/2016 08:53 PM, Fam Zheng wrote:
>>>
>>> Too many blank lines? Otherwise looks good.
>>
>> You caught me sneaking in a PEP-8 compliance fix. ;)
> 
> Ah, no problem. I just didn't know.
> 
>>
>> Is qemu using a custom code style for Python sources? If so, is it
>> documented somewhere?
> 
> No, and following PEP-8 is completely fine.

Although it never hurts to mention stuff like this in the commit
message, to make it obvious that it was intentional :)

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 1/7] docker.py: don't hang on large docker output

2016-09-06 Thread Fam Zheng

On Tue, 09/06 20:31, Sascha Silbe wrote:
> Dear Fam,
> 
> Fam Zheng  writes:
> 
> > On Wed, 08/24 20:30, Sascha Silbe wrote:
> [tests/docker/docker.py]
> >> @@ -25,6 +25,10 @@ from tarfile import TarFile, TarInfo
> >>  from StringIO import StringIO
> >>  from shutil import copy, rmtree
> >>  
> >> +
> >> +DEVNULL = open(os.devnull, 'wb')
> >> +
> >> +
> >
> > Too many blank lines? Otherwise looks good.
> 
> You caught me sneaking in a PEP-8 compliance fix. ;)

Ah, no problem. I just didn't know.

> 
> Is qemu using a custom code style for Python sources? If so, is it
> documented somewhere?

No, and following PEP-8 is completely fine.

Fam

> 
> Thanks for reviewing!
> 
> Sascha
> -- 
> Softwareentwicklung Sascha Silbe, Niederhofenstraße 5/1, 71229 Leonberg
> https://se-silbe.de/
> USt-IdNr. DE281696641
> 
>

Re: [Qemu-devel] [PATCH v7] docs: add cpu-hotplug.txt

2016-09-06 Thread Dou Liyang


Hi, Eduardo

At 09/07/2016 04:05 AM, Eduardo Habkost wrote:

On Tue, Aug 23, 2016 at 01:17:01PM +0800, Dou Liyang wrote:

Hi Alexandre,

At 08/22/2016 04:56 PM, Alexandre DERUMIER wrote:

Hello,

I'm looking to implement cpu hotplug,

and I have a question about cpu flags

currently I have something like

-cpu qemu64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce
-smp 4,sockets=2,cores=2,maxcpus=4


Does I need to define flags like:

-smp 2,sockets=2,cores=2,maxcpus=4
-device 
qemu64-x86_64-cpu,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce,id=cpu1,socket-id=1,core-id=1,thread-id=0


I think we don't need to do that.
In my option, just like this:

-device qemu64-x86_64-cpu,id=cpu1,socket-id=1,..

Because QEMU sets the "-cpu" options in MachineState:

  current_machine->cpu_model = cpu_model;

when you add a CPU, QEMU can get the flag from the MachineState.


You don't need to repeat the flags, but that's not because of
MachineState::cpu_model, but because of the semantics of -smp:


Sorry for the reply.



The option:
  -smp MODEL,+FOO,+BAR


I guess you may mean "-cpu", not "-smp"


is internally translated to:
  -global MODEL.FOO=on
  -global MODEL.BAR=on
in addition to setting the CPU model for creating initial CPUs to
MODEL.



I see.




...

?


Another question,
is -smp mandatory ?  (if I want coldplug all cpus)


it's not mandatory. such as this:

  ./x86_64-softmmu/qemu-system-x86_64 -m 1G /image/fedora.img
  -enable-kvm -monitor stdio

 (qemu) info cpus
  * CPU #0: pc=0x81060586 (halted) thread_id=4032

the default number of CPUs is 1.



-smp sockets=2,cores=2,maxcpus=4



-device qemu64-x86_64-cpu,id=cpu1,socket-id=1,core-id=1,thread-id=0
-device qemu64-x86_64-cpu,id=cpu1,socket-id=1,core-id=2,thread-id=0
-device qemu64-x86_64-cpu,id=cpu3,socket-id=2,core-id=1,thread-id=0
-device qemu64-x86_64-cpu,id=cpu4,socket-id=2,core-id=2,thread-id=0

or does I need minimum 1 non unplugable cpu


As mentioned above, the default number of CPUs is 1, so "-smp
sockets=2" is the same as "-smp 1,sockets=2".

I assume you mean something like "-smp 0", but that doesn't work
today. I would like to eventually allow all CPUs to be created
using -device, but that's not possible yet.


I am interested in what is the benefit, if we can create all CPUs using
-device.   :)


Thanks,
Dou




-smp 1,sockets=2,cores=2,maxcpus=4
-device qemu64-x86_64-cpu,id=cpu1,socket-id=1,core-id=2,thread-id=0
-device qemu64-x86_64-cpu,id=cpu3,socket-id=2,core-id=1,thread-id=0
-device qemu64-x86_64-cpu,id=cpu4,socket-id=2,core-id=2,thread-id=0



I think that is better, and the socket-id/core-id/thread-id starts at
index 0

I am new to the community. Please don't mind, and take with a grain of
salt.   

Thanks,
Dou

Re: [Qemu-devel] [PULL 00/66] ppc-for-2.8 queue 20160906

2016-09-06 Thread David Gibson

On Tue, Sep 06, 2016 at 11:26:34PM +0200, Thomas Huth wrote:
> On 06.09.2016 22:55, Thomas Huth wrote:
> > On 06.09.2016 22:23, Paolo Bonzini wrote:
> >>
> >>
> >> On 06/09/2016 21:35, Thomas Huth wrote:
> >>> D'oh! I'll have a look...
> >>>
> > I also see warnings on non-KVM hosts during 'make check' which seem
> > to be produced by this new test:
> >
> > TEST: tests/boot-serial-test... (pid=2836)
> >   /i386/boot-serial/isapc:
> > warning: TCG doesn't support requested feature: CPUID.01H:EDX.vme [bit
> > 1]
> >>> That happens because the test is running QEMU with the isapc machine. I
> >>> also get that warning message if I simply run the following on the
> >>> command line:
> >>>
> >>> $ qemu-system-x86_64 -M isapc
> >>> warning: TCG doesn't support requested feature: CPUID.01H:EDX.vme [bit 1]
> >>>
> >>> So the warning is likely there since quite a while already, just nobody
> >>> is running with -M isapc nowadays anymore, so nobody noticed this ...
> >>> Does anybody have got a clue how to fix that warning?
> >>
> >> Implement vme... :)
> > 
> > Hmmm, I was rather looking for an easier solution. I think I also found
> > one: By using "-cpu qemu32", the warning does not occur anymore...
> 
> David,
> 
> since you likely got to respin this pull request anyway, could you
> please squash the following patch into my "tests: Check serial output
> of firmware boot of some machines" patch? This silences the warning
> about the unsupported vme TCG feature:
> 
> diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
> index b36c6bf..d98c564 100644
> --- a/tests/boot-serial-test.c
> +++ b/tests/boot-serial-test.c
> @@ -29,10 +29,10 @@ static testdef_t tests[] = {
>  { "ppc64", "ppce500", "", "U-Boot" },
>  { "ppc64", "prep", "", "Open Hack'Ware BIOS" },
>  { "ppc64", "pseries", "", "Open Firmware" },
> -{ "i386", "isapc", "-device sga", "SGABIOS" },
> +{ "i386", "isapc", "-cpu qemu32 -device sga", "SGABIOS" },
>  { "i386", "pc", "-device sga", "SGABIOS" },
>  { "i386", "q35", "-device sga", "SGABIOS" },
> -{ "x86_64", "isapc", "-device sga", "SGABIOS" },
> +{ "x86_64", "isapc", "-cpu qemu32 -device sga", "SGABIOS" },
>  { "x86_64", "q35", "-device sga", "SGABIOS" },
>  { "s390x", "s390-ccw-virtio",
>"-nodefaults -device sclpconsole,chardev=serial0", "virtio device" },

Done, thanks.  Seems to fix that warning.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH] tests/acpi: speedup acpi tests

2016-09-06 Thread Michael S. Tsirkin

On Tue, Sep 06, 2016 at 10:22:26PM +0200, Paolo Bonzini wrote:
> 
> 
> On 06/09/2016 21:21, Michael S. Tsirkin wrote:
> > On Tue, Sep 06, 2016 at 09:11:16PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 06/09/2016 17:51, Michael S. Tsirkin wrote:
>  Just use "-machine accel=kvm:tcg" and let QEMU do the hard work. :)
> 
>  Paolo
> >>>
> >>> Sounds good, but we really need to skip it when gsi
> >>> capability is not there (and when using kernel irqchip)
> >>> (because in that case we can't override apic irq0).
> >>
> >> We should just remove support for !kvm_has_gsi_routing on x86.  It's
> >> been there since November 2008.
> >>
> >> Paolo
> > 
> > Fine by me. Can you post a patch that does this?
> > This one can be a follow-up, and we can also
> > drop the conditional handling from acpi-build.
> 
> Go ahead and merge Marcel's patch.  I can remove kernel_irqchip=off once
> apic irq0 override is assumed.
> 
> Paolo

OK

Re: [Qemu-devel] [PATCH V13 00/12] Introduce COLO-Proxy

2016-09-06 Thread Zhang Chen




On 09/06/2016 04:02 PM, Jason Wang wrote:



On 2016年09月05日 17:37, Zhang Chen wrote:

COLO-proxy is a part of COLO project. COLO project is
composed of COLO-frame, COLO-proxy and block-replication.
It is used to compare the network package to help COLO
decide whether to do checkpoint. With COLO-proxy's help,
COLO greatly improves the performance.

The filter-redirector, filter-mirror, colo-compare
and filter-rewriter compose the COLO-proxy.

COLO-compare
It is used to compare the network package to help COLO decide
whether to do checkpoint.

Filter-rewriter
It will rewrite some of secondary packet to make
secondary guest's connection established successfully.
In this module we will rewrite tcp packet's ack to the secondary
from primary,and rewrite tcp packet's seq to the primary from
secondary.

The full version in this github:
https://github.com/zhangckid/qemu/tree/colo-v2.7-proxy-mode-compare-and-rewriter-sep5 



Look good, just find one minor issue. And please also:

- Fix the mingw build issue reported by the bot, you can test it 
through docker test

- Move the doc patch to the end of series

Thanks


OK~~ I will fix this in next version.

Thanks
Zhang Chen





v13:
   - add docs/colo-proxy.txt
   - add MAINTAINERS
   - remove unnecessary .h
   - remove QTAILQ_ENTRY(CompareState)
   - fix some comments
   - add find_and_check_chardev() to avoid code duplication
   - remove the "is_unix" in patch3
   - change error_report() to trace in patch4
   - use l2hdr_len here instead of ETH_HLEP
   - fix code style
   - remove colo_rm_connection()
   - remove hashtable_size
   - change g_queue_foreach() to g_queue_find_custom() in patch7
   - change trace_colo_compare_tcp_miscompare() to fprintf() in patch8
   - add codes not queue vlan packets

v12:
   - add qemu-char: Add qemu_chr_add_handlers_full() for GMaincontext
 to this series as the first patch.
   - update COLO net ascii figure.
   - add chardev socket check.
   - fix some typo.
   - add some comments.
   - rename net/colo-base.c to net/colo.c
   - rename network/transport_layer to network/transport_header.
   - move the job that clear coon_list when hashtable_size oversize
 to connection_get.
   - reuse connection_destroy() do colo_rm_connection().
   - fix pkt mem leak in colo_compare_connection().
 (result be released in g_queue_remove(), so it were not leak)
   - rename thread_name "compare" to "colo-compare".
   - change icmp compare to memcmp().

v11:
   - Make patch 5 to a independent patch series.
 [PATCH V3] qemu-char: Add qemu_chr_add_handlers_full() for 
GMaincontext

   - For Jason's comments, merge filter-rewriter to this series.
 (patch 7,8,9)
   - Add reverse_connection_key()
   - remove conn_list in filter-rewriter
   - remove unprocessed_connections
   - add some comments

v10:
   - fix typo
   - Should we make patch 5 independent with this series?
 This patch just add a API for qemu-char.

v9:
  p5:
   - use chr_update_read_handler_full() replace
 the chr_update_read_handler()
   - use io_watch_poll_prepare_full() replace
 the io_watch_poll_prepare()
   - use io_watch_poll_funcs_full replace
 the io_watch_poll_funcs
   - avoid code duplication

v8:
  p5:
   - add new patch:
 qemu-char: Add qemu_chr_add_handlers_full() for GMaincontext

v7:
  p5:
- add [PATCH]qemu-char: Fix context for g_source_attach()
  in this patch series.

v6:
  p6:
- add more commit log.
- fix icmp comparison to compare all packet.

  p5:
- add more cpmments in commit log.
- change REGULAR_CHECK_MS to REGULAR_PACKET_CHECK_MS
- make check old packet independent to compare thread
- remove thread_status

  p4:
- change this patch only about
  Connection and ConnectionKey.
- add some comments in commit log.
- remove mode in fill_connection_key().
- fix some comments and bug.
- move colo_conn_state to patch of
  "work with colo-frame"
- remove conn_list_lock.
- add MAX_QUEUE_SIZE, if primary_list or
  secondary_list biger than MAX_QUEUE_SIZE
  we will drop packet.

  p3:
- add new independent kernel jhash patch.

  p2:
- add new independent colo-base patch.

  p1:
- add a ascii figure and some comments to explain it
- move trace.h to p2
- move QTAILQ_HEAD(, CompareState) net_compares to
  patch of "work with colo-frame"
- add some comments in qemu-option.hx


v5:
  p3:
 - comments from Jason
   we poll and handle chardev in comapre thread,
   Through this way, there's no need for extra
   synchronization with main loop
   this depend on another patch:
   qemu-char: Fix context for g_source_attach()
 - remove QemuEvent
  p2:
 - remove conn->list_lock
  p1:
 - move compare_pri/sec_chr_in to p3
 - move compare_chr_send to p2

v4:
  p4:
 - add some comments
 - fix some trace-events
 - fix tcp compare error
  p3:
 - add rcu_read_lock().
 - fix trace name
 - fix ja

[Qemu-devel] [PATCH] hmp: Improve 'info mtree' with optional parm for mapinfo

2016-09-06 Thread Thorsten Kohfeldt

From: Thorsten Kohfeldt 
Date: Wed, 31 Aug 2016 22:43:14 +0200
Subject: [PATCH] hmp: Improve 'info mtree' with optional parm for mapinfo

Motivation
When 'tuning' 'quirks' for VFIO imported devices, it is not easy to
directly grasp the implications of the priorisation algorithms in place
for the 'layered mapping' of memory regions.
Even though there are rules (documented in docs/memory.txt), once in a
while one might question the correctness of the actual implementation of
the rules.
Particularly, I believe I have uncovered a divergence of (sub-)region
priorisation/order/visibility in a corner case of importing a device
(which requires a 'quirk') with mmap enabled vs. mmap disabled.
This modification provides a means of visualising the ACTUAL
mapping/visibility/occlusion of subregions within regions, whereas the
current info mtree command only lists the tree of regions (all, visible
and invisible ones).
It is primarily intended to provide support for easy presentation of my
cause, but I strongly believe this modification also has general purpose
advantages.

Functional implementation
A new OPTIONAL integer parameter, mapinfo-width, is added to the
monitor/hmp 'info mtree' command.
Effect:
When given and between 5 and 257, then each mtree output line is
prefixed with  columns of 'visibility samples', depicting
what qemu's memory region priorisation algorithms effectively make
visible/accessible at that sample address at the time of inquiry.
NOTE that
the sampling algorithm virtually cuts the memory region into (width - 1)
'slices' and computes (width) samples at the edges of those virtual
slices. Thus, it is probably a good idea to always request (2^n + 1)
samples.
ALSO NOTE that
the memory regions are NOT actually accessed at those 'samples', ONLY a
region PRIORISATION EVALUATION is performed for the sample addresses.
You can setup a default using environment variable
QEMU_INFO_MTREE_MAPINFO_WIDTH (must be in the Qemu instance's
environment; when unset, then the default is 0, i.e. off).
Giving negative values for sample-width results in using that default,
while values above 257 are reduced to 257, and values from 0 to 4 switch
the sampling off.

Technical implementation
3 functions are added to memory.c:
sane_mtree_info_sample_width() is used to sanitise the new parameter,
i.e. to provide a default and adjust it towards 'usability'. It is
called by existing function mtree_info().
mtree_print_mr_v_samples() is called for each mtree memory region (mr)
output line in order to print the 'mapinfo' prefix. The call is
performed by existing function mtree_print_mr() with parameter
'const MemoryRegion *mr', thus promising the object under investigation
is not modified.
mtree_mr_sample_reftype_marker() is used to traverse an mr subtree for a
given hardware address in order to basically find the first matching
enabled region and return its type. It is called by
mtree_print_mr_v_samples() for  'sample'
addresses. As an mr tree is traversed, limited recursion is involved.
Additionally, for existing functions memory_region_to_address_space(),
memory_region_size(), and memory_region_find_rcu() their parameter
'MemoryRegion *mr' had to be constrained to 'const MemoryRegion *mr' in
order to satisfy the compiler while attempting to emphasize the
'object is not modified' promise by insisting to pass *mr as const into
mtree_print_mr_v_samples(). This did not entail changes in the bodies of
mentioned functions.

Signed-off-by: Thorsten Kohfeldt 
---
 hmp-commands-info.hx  |   9 +-
 include/exec/memory.h |   7 +-
 memory.c  | 246 --
 monitor.c |   4 +-
 4 files changed, 250 insertions(+), 16 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 74446c6..1593238 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -264,16 +264,17 @@ ETEXI

 {
 .name   = "mtree",
-.args_type  = "",
-.params = "",
-.help   = "show memory tree",
+.args_type  = "mapinfo-width:l?",
+.params = "[mapinfo-width]",
+.help   = "show memory tree "
+  "(mapinfo-width: depict memory subregion mappings with 
leading characters)",
 .mhandler.cmd = hmp_info_mtree,
 },

 STEXI
 @item info mtree
 @findex mtree
-Show memory tree.
+Show memory tree optionally depicting subregion mappings.
 ETEXI

 {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 3e4d416..751483c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -536,7 +536,7 @@ struct Object *memory_region_owner(MemoryRegion *mr);
  *
  * @mr: the memory region being queried.
  */
-uint64_t memory_region_size(MemoryRegion *mr);
+uint64_t memory_region_size(const MemoryRegion *mr);

 /**
  * memory_region_is_ram: check whether a memory region is random access
@@ -1202,7 +1202,10 @@ void memory_global_dirty_log_start(void);
  */
 void memory_global_dirty_log_stop

Re: [Qemu-devel] [PATCH] eepro100: Frame Reception

2016-09-06 Thread Mateus Krepsky Ludwich

Ping.

http://patchwork.ozlabs.org/patch/662355/


On Tue, Aug 23, 2016 at 1:46 PM, Mateus Krepsky Ludwich
 wrote:
> Changed E100 device so it updates EOF, F, and Actual Count fields of Receive
> Frame Descriptor (RFD).
> Assuming RFD.actual_count equals to size.
>
> Signed-off-by: Mateus Krepsky Ludwich 
> ---
>  hw/net/eepro100.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
> index bab4dbf..c0f3816 100644
> --- a/hw/net/eepro100.c
> +++ b/hw/net/eepro100.c
> @@ -1739,6 +1739,7 @@ static ssize_t nic_receive(NetClientState *nc,
> const uint8_t * buf, size_t size)
>   &rx, sizeof(eepro100_rx_t));
>  uint16_t rfd_command = le16_to_cpu(rx.command);
>  uint16_t rfd_size = le16_to_cpu(rx.size);
> +uint16_t rfd_count = size | 0xc000 ; /* Setting EOF and F bits */
>
>  if (size > rfd_size) {
>  logout("Receive buffer (%" PRId16 " bytes) too small for data "
> @@ -1756,6 +1757,8 @@ static ssize_t nic_receive(NetClientState *nc,
> const uint8_t * buf, size_t size)
>  offsetof(eepro100_rx_t, status), rfd_status);
>  stw_le_pci_dma(&s->dev, s->ru_base + s->ru_offset +
>  offsetof(eepro100_rx_t, count), size);
> +stw_le_pci_dma(&s->dev, s->ru_base + s->ru_offset +
> +offsetof(eepro100_rx_t, count), rfd_count);
>  /* Early receive interrupt not supported. */
>  #if 0
>  eepro100_er_interrupt(s);
> --
> 1.9.1

[Qemu-devel] [PATCH v3 2/8] docker: avoid dependency on 'realpath' package

2016-09-06 Thread Sascha Silbe

The 'realpath' executable is shipped in a separate package that isn't
installed by default on some distros.

We already use 'readlink -e' (provided by GNU coreutils) in some other
part of the code, so let's settle for that instead.

Signed-off-by: Sascha Silbe 
---
Too bad there isn't a POSIX equivalent of this.

 tests/docker/Makefile.include | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 4f4707d..1b20db0 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -116,7 +116,7 @@ docker-run-%: docker-qemu-src
-e EXTRA_CONFIGURE_OPTS=$(EXTRA_CONFIGURE_OPTS) 
\
-e V=$V -e J=$J -e DEBUG=$(DEBUG)\
-e CCACHE_DIR=/var/tmp/ccache \
-   -v $$(realpath 
$(DOCKER_SRC_COPY)):/var/tmp/qemu:z$(COMMA)ro \
+   -v $$(readlink -e 
$(DOCKER_SRC_COPY)):/var/tmp/qemu:z$(COMMA)ro \
-v $(DOCKER_CCACHE_DIR):/var/tmp/ccache:z \
qemu:$(IMAGE) \
/var/tmp/qemu/run \
-- 
1.9.1

[Qemu-devel] [PATCH v3 4/8] docker: debian-bootstrap.pre: print helpful message if DEB_ARCH/DEB_TYPE unset

2016-09-06 Thread Sascha Silbe

The debian-bootstrap image doesn't choose a default architecture and
distribution version, instead the user has to set both DEB_ARCH and
DEB_TYPE in the environment. Print a reasonably helpful message if
either of them isn't set instead of complaining about "qemu-" being
missing or erroring out because we cannot cd to the mirror URL.

Signed-off-by: Sascha Silbe 
---
v2→v3:
  - send error messages to stderr

 tests/docker/dockerfiles/debian-bootstrap.pre | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tests/docker/dockerfiles/debian-bootstrap.pre 
b/tests/docker/dockerfiles/debian-bootstrap.pre
index 9b95a6b..3d9f7f4 100755
--- a/tests/docker/dockerfiles/debian-bootstrap.pre
+++ b/tests/docker/dockerfiles/debian-bootstrap.pre
@@ -15,6 +15,19 @@ exit_and_skip()
 if [ -z $FAKEROOT ]; then
 echo "Please install fakeroot to enable bootstraping" >&2
 exit_and_skip
+
+fi
+
+if [ -z "${DEB_ARCH}" ]; then
+echo "Please set DEB_ARCH to choose an architecture (e.g. armhf)" >&2
+exit_and_skip
+
+fi
+
+if [ -z "${DEB_TYPE}" ]; then
+echo "Please set DEB_TYPE to a Debian archive name (e.g. testing)" >&2
+exit_and_skip
+
 fi
 
 # We check in order for
-- 
1.9.1

[Qemu-devel] [PATCH v3 3/8] docker: debian-bootstrap.pre: print error messages to stderr

2016-09-06 Thread Sascha Silbe

Send error messages where they belong so they're seen even if stdout
is redirected to /dev/null.

Signed-off-by: Sascha Silbe 
---
v2→v3:
  - new patch

 tests/docker/dockerfiles/debian-bootstrap.pre | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/docker/dockerfiles/debian-bootstrap.pre 
b/tests/docker/dockerfiles/debian-bootstrap.pre
index 5d9c8d5..9b95a6b 100755
--- a/tests/docker/dockerfiles/debian-bootstrap.pre
+++ b/tests/docker/dockerfiles/debian-bootstrap.pre
@@ -13,7 +13,7 @@ exit_and_skip()
 # fakeroot is needed to run the bootstrap stage
 #
 if [ -z $FAKEROOT ]; then
-echo "Please install fakeroot to enable bootstraping"
+echo "Please install fakeroot to enable bootstraping" >&2
 exit_and_skip
 fi
 
@@ -38,7 +38,7 @@ if [ -z $DEBOOTSTRAP_DIR ]; then
 else
 DEBOOTSTRAP=${DEBOOTSTRAP_DIR}/debootstrap
 if [ ! -f $DEBOOTSTRAP ]; then
-echo "Couldn't find script at ${DEBOOTSTRAP}"
+echo "Couldn't find script at ${DEBOOTSTRAP}" >&2
 exit_and_skip
 fi
 fi
@@ -48,7 +48,7 @@ fi
 #
 BINFMT_DIR=/proc/sys/fs/binfmt_misc
 if [ ! -e $BINFMT_DIR ]; then
-   echo "binfmt_misc needs enabling for a QEMU bootstrap to work"
+   echo "binfmt_misc needs enabling for a QEMU bootstrap to work" >&2
exit_and_skip
 else
 # DEB_ARCH and QEMU arch names are not totally aligned
@@ -76,7 +76,7 @@ else
 ;;
 esac
 if [ ! -e "${BINFMT_DIR}/$QEMU" ]; then
-echo "No binfmt_misc rule to run $QEMU, can't bootstrap"
+echo "No binfmt_misc rule to run $QEMU, can't bootstrap" >&2
 exit_and_skip
 fi
 fi
-- 
1.9.1

[Qemu-devel] [PATCH v3 8/8] docker: silence debootstrap when --quiet is given

2016-09-06 Thread Sascha Silbe

If we silence docker when --quiet is given, we should also silence the
.pre script (i.e. debootstrap).

Only discards stdout, so some diagnostics (e.g. from git clone) are
still printed. Most of the verbose output is gone however and this way
we still have a chance to see error messages.

Signed-off-by: Sascha Silbe 
---
 tests/docker/docker.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/docker/docker.py b/tests/docker/docker.py
index efb2bf4..b85c165 100755
--- a/tests/docker/docker.py
+++ b/tests/docker/docker.py
@@ -239,8 +239,9 @@ class BuildCommand(SubCommand):
 # Is there a .pre file to run in the build context?
 docker_pre = os.path.splitext(args.dockerfile)[0]+".pre"
 if os.path.exists(docker_pre):
+stdout = DEVNULL if args.quiet else None
 rc = subprocess.call(os.path.realpath(docker_pre),
- cwd=docker_dir)
+ cwd=docker_dir, stdout=stdout)
 if rc == 3:
 print "Skip"
 return 0
-- 
1.9.1

[Qemu-devel] [PATCH v3 5/8] docker: print warning if EXECUTABLE is not set when building debootstrap image

2016-09-06 Thread Sascha Silbe

Building the debian-debootstrap image will usually fail if EXECUTABLE
isn't set (when using the Makefile). Warn the user in this case so
they know why it's failing.

Signed-off-by: Sascha Silbe 
---
 tests/docker/Makefile.include | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index 1b20db0..19d4cc7 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -44,6 +44,9 @@ docker-image: ${DOCKER_TARGETS}
 
 # General rule for building docker images
 docker-image-%: $(DOCKER_FILES_DIR)/%.docker
+   @if test "$@" = docker-image-debian-bootstrap -a -z "$(EXECUTABLE)"; 
then \
+   echo WARNING: EXECUTABLE is not set, debootstrap may fail. 2>&1 
; \
+   fi
$(call quiet-command,\
$(SRC_PATH)/tests/docker/docker.py build qemu:$* $< \
$(if $V,,--quiet) $(if $(NOCACHE),--no-cache) \
-- 
1.9.1

Re: [Qemu-devel] [PATCH v4 2/3] tests: make pc_alloc_init/init_flags/uninit generic

2016-09-06 Thread Thomas Huth

On 06.09.2016 15:17, Laurent Vivier wrote:
> And add support for ppc64.
> 
> Signed-off-by: Laurent Vivier 
> ---
> v2:
> - remove useless parenthesis, inline
> 
>  tests/Makefile.include  |  3 ++-
>  tests/libqos/libqos.h   |  2 +-
>  tests/libqos/malloc-ppc64.c | 38 ++
>  tests/libqos/malloc-ppc64.h | 17 +
>  tests/libqos/malloc.c   | 42 ++
>  tests/libqos/malloc.h   |  3 +++
>  6 files changed, 103 insertions(+), 2 deletions(-)
>  create mode 100644 tests/libqos/malloc-ppc64.c
>  create mode 100644 tests/libqos/malloc-ppc64.h
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 14be491..a286848 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -557,8 +557,9 @@ tests/test-crypto-block$(EXESUF): 
> tests/test-crypto-block.o $(test-crypto-obj-y)
>  
>  libqos-obj-y = tests/libqos/pci.o tests/libqos/fw_cfg.o tests/libqos/malloc.o
>  libqos-obj-y += tests/libqos/i2c.o tests/libqos/libqos.o
> +libqos-obj-y += tests/libqos/malloc-ppc64.o tests/libqos/malloc-pc.o
>  libqos-pc-obj-y = $(libqos-obj-y) tests/libqos/pci-pc.o
> -libqos-pc-obj-y += tests/libqos/malloc-pc.o tests/libqos/libqos-pc.o
> +libqos-pc-obj-y += tests/libqos/libqos-pc.o
>  libqos-pc-obj-y += tests/libqos/ahci.o
>  libqos-omap-obj-y = $(libqos-obj-y) tests/libqos/i2c-omap.o
>  libqos-imx-obj-y = $(libqos-obj-y) tests/libqos/i2c-imx.o
> diff --git a/tests/libqos/libqos.h b/tests/libqos/libqos.h
> index 604980d..7b71607 100644
> --- a/tests/libqos/libqos.h
> +++ b/tests/libqos/libqos.h
> @@ -3,7 +3,7 @@
>  
>  #include "libqtest.h"
>  #include "libqos/pci.h"
> -#include "libqos/malloc-pc.h"
> +#include "libqos/malloc.h"
>  
>  typedef struct QOSOps {
>  QGuestAllocator *(*init_allocator)(QAllocOpts);
> diff --git a/tests/libqos/malloc-ppc64.c b/tests/libqos/malloc-ppc64.c
> new file mode 100644
> index 000..1b31e33
> --- /dev/null
> +++ b/tests/libqos/malloc-ppc64.c
> @@ -0,0 +1,38 @@
> +/*
> + * libqos malloc support for PPC64
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqos/malloc-ppc64.h"
> +
> +#include "qemu-common.h"
> +
> +#define PAGE_SIZE 4096
> +
> +/* Memory must be a multiple of 256 MB,
> + * so we have at least 256MB
> + */
> +#define PPC64_MIN_SIZE 0x1000
> +
> +void ppc64_alloc_uninit(QGuestAllocator *allocator)
> +{
> +alloc_uninit(allocator);
> +}
> +
> +QGuestAllocator *ppc64_alloc_init_flags(QAllocOpts flags)
> +{
> +QGuestAllocator *s;
> +
> +s = alloc_init_flags(flags, 1 << 20, PPC64_MIN_SIZE);
> +alloc_set_page_size(s, PAGE_SIZE);
> +
> +return s;
> +}
> +
> +QGuestAllocator *ppc64_alloc_init(void)
> +{
> +return ppc64_alloc_init_flags(ALLOC_NO_FLAGS);
> +}
> diff --git a/tests/libqos/malloc-ppc64.h b/tests/libqos/malloc-ppc64.h
> new file mode 100644
> index 000..c2b2dff
> --- /dev/null
> +++ b/tests/libqos/malloc-ppc64.h
> @@ -0,0 +1,17 @@
> +/*
> + * libqos malloc support for PPC64
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef LIBQOS_MALLOC_PPC64_H
> +#define LIBQOS_MALLOC_PPC64_H
> +
> +#include "libqos/malloc.h"
> +
> +QGuestAllocator *ppc64_alloc_init(void);
> +QGuestAllocator *ppc64_alloc_init_flags(QAllocOpts flags);
> +void ppc64_alloc_uninit(QGuestAllocator *allocator);
> +
> +#endif
> diff --git a/tests/libqos/malloc.c b/tests/libqos/malloc.c
> index b8eff5f..6a02345 100644
> --- a/tests/libqos/malloc.c
> +++ b/tests/libqos/malloc.c
> @@ -12,6 +12,9 @@
>  
>  #include "qemu/osdep.h"
>  #include "libqos/malloc.h"
> +#include "libqos/malloc-pc.h"
> +#include "libqos/malloc-ppc64.h"
> +#include "libqtest.h"
>  #include "qemu-common.h"
>  #include "qemu/host-utils.h"
>  
> @@ -375,3 +378,42 @@ void migrate_allocator(QGuestAllocator *src,
>  QTAILQ_INSERT_HEAD(src->free, node, MLIST_ENTNAME);
>  return;
>  }
> +
> +QGuestAllocator *machine_alloc_init(void)
> +{
> +const char *arch = qtest_get_arch();
> +
> +if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> +return pc_alloc_init();
> +}
> +if (strcmp(arch, "ppc64") == 0) {
> +return ppc64_alloc_init();
> +}
> +return NULL;
> +}
> +
> +QGuestAllocator *machine_alloc_init_flags(QAllocOpts flags)
> +{
> +const char *arch = qtest_get_arch();
> +
> +if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> +return pc_alloc_init_flags(flags);
> +}
> +if (strcmp(arch, "ppc64") == 0) {
> +return ppc64_alloc_init_flags(flags);
> +}
> +return NULL;
> +}
> +
> +void machine_alloc_uninit(QGuestAllocator *allocator)
> +{
> +const char *arch = qtest_get_arch();
> +
> +if (strcmp(arch, "i386") == 0 || strcmp(arch, "

Re: [Qemu-devel] [PATCH v4 3/3] tests: add RTAS command in the protocol

2016-09-06 Thread Greg Kurz

On Tue,  6 Sep 2016 15:17:57 +0200
Laurent Vivier  wrote:

> Add a first test to validate the protocol:
> 
> - rtas/get-time-of-day compares the time
>   from the guest with the time from the host.
> 
> Signed-off-by: Laurent Vivier 
> ---
> v4:
> - use qemu_strtoXXX() instead strtoXX()
> 
> v3:
> - use mktimegm() instead of timegm()
>  
> v2:
> - add a missing space in qrtas_call() prototype
>  
>  hw/ppc/spapr_rtas.c | 19 
>  include/hw/ppc/spapr_rtas.h | 10 +++
>  qtest.c | 17 +++
>  tests/Makefile.include  |  3 ++
>  tests/libqos/rtas.c | 71 
> +
>  tests/libqos/rtas.h | 11 +++
>  tests/libqtest.c| 10 +++
>  tests/libqtest.h| 15 ++
>  tests/rtas-test.c   | 42 +++
>  9 files changed, 198 insertions(+)
>  create mode 100644 include/hw/ppc/spapr_rtas.h
>  create mode 100644 tests/libqos/rtas.c
>  create mode 100644 tests/libqos/rtas.h
>  create mode 100644 tests/rtas-test.c
> 
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index dc058e5..8aed56f 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -36,6 +36,7 @@
>  
>  #include "hw/ppc/spapr.h"
>  #include "hw/ppc/spapr_vio.h"
> +#include "hw/ppc/spapr_rtas.h"
>  #include "hw/ppc/ppc.h"
>  #include "qapi-event.h"
>  #include "hw/boards.h"
> @@ -691,6 +692,24 @@ target_ulong spapr_rtas_call(PowerPCCPU *cpu, 
> sPAPRMachineState *spapr,
>  return H_PARAMETER;
>  }
>  
> +uint64_t qtest_rtas_call(char *cmd, uint32_t nargs, uint64_t args,
> + uint32_t nret, uint64_t rets)
> +{
> +int token;
> +
> +for (token = 0; token < RTAS_TOKEN_MAX - RTAS_TOKEN_BASE; token++) {
> +if (strcmp(cmd, rtas_table[token].name) == 0) {
> +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> +PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> +
> +rtas_table[token].fn(cpu, spapr, token + RTAS_TOKEN_BASE,
> + nargs, args, nret, rets);
> +return H_SUCCESS;
> +}
> +}
> +return H_PARAMETER;
> +}
> +
>  void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn)
>  {
>  assert((token >= RTAS_TOKEN_BASE) && (token < RTAS_TOKEN_MAX));
> diff --git a/include/hw/ppc/spapr_rtas.h b/include/hw/ppc/spapr_rtas.h
> new file mode 100644
> index 000..383611f
> --- /dev/null
> +++ b/include/hw/ppc/spapr_rtas.h
> @@ -0,0 +1,10 @@
> +#ifndef HW_SPAPR_RTAS_H
> +#define HW_SPAPR_RTAS_H
> +/*
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +uint64_t qtest_rtas_call(char *cmd, uint32_t nargs, uint64_t args,
> + uint32_t nret, uint64_t rets);
> +#endif /* HW_SPAPR_RTAS_H */
> diff --git a/qtest.c b/qtest.c
> index 4c94708..beb62b4 100644
> --- a/qtest.c
> +++ b/qtest.c
> @@ -28,6 +28,9 @@
>  #include "qemu/option.h"
>  #include "qemu/error-report.h"
>  #include "qemu/cutils.h"
> +#ifdef TARGET_PPC64
> +#include "hw/ppc/spapr_rtas.h"
> +#endif
>  
>  #define MAX_IRQ 256
>  
> @@ -531,6 +534,20 @@ static void qtest_process_command(CharDriverState *chr, 
> gchar **words)
>  
>  qtest_send_prefix(chr);
>  qtest_send(chr, "OK\n");
> +#ifdef TARGET_PPC64
> +} else if (strcmp(words[0], "rtas") == 0) {
> +uint64_t res, args, ret;
> +unsigned long nargs, nret;
> +
> +g_assert(qemu_strtoul(words[2], NULL, 0, &nargs) == 0);
> +g_assert(qemu_strtoull(words[3], NULL, 0, &args) == 0);
> +g_assert(qemu_strtoul(words[4], NULL, 0, &nret) == 0);
> +g_assert(qemu_strtoull(words[5], NULL, 0, &ret) == 0);
> +res = qtest_rtas_call(words[1], nargs, args, nret, ret);
> +
> +qtest_send_prefix(chr);
> +qtest_sendf(chr, "OK %"PRIu64"\n", res);
> +#endif
>  } else if (qtest_enabled() && strcmp(words[0], "clock_step") == 0) {
>  int64_t ns;
>  
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index a286848..c456b8b 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -272,6 +272,7 @@ check-qtest-sparc-y += tests/prom-env-test$(EXESUF)
>  check-qtest-microblazeel-y = $(check-qtest-microblaze-y)
>  check-qtest-xtensaeb-y = $(check-qtest-xtensa-y)
>  check-qtest-ppc64-y += tests/postcopy-test$(EXESUF)
> +check-qtest-ppc64-y += tests/rtas-test$(EXESUF)
>  
>  check-qtest-generic-y += tests/qom-test$(EXESUF)
>  
> @@ -558,6 +559,7 @@ tests/test-crypto-block$(EXESUF): 
> tests/test-crypto-block.o $(test-crypto-obj-y)
>  libqos-obj-y = tests/libqos/pci.o tests/libqos/fw_cfg.o tests/libqos/malloc.o
>  libqos-obj-y += tests/libqos/i2c.o tests/libqos/libqos.o
>  libqos-obj-y += tests/libqos/malloc-ppc64.o tests/libqos/malloc-pc.o
> +libqos-obj-y += tests/libqos/rtas.o
>  libqos-pc-obj-y = $(libqos-obj-y) tests/

[Qemu-devel] [PATCH v3 6/8] docker: make sure debootstrap is at least 1.0.67

2016-09-06 Thread Sascha Silbe

debootstrap prior to 1.0.67 generated an empty sources.list during
foreign bootstraps (Debian#732255 [1]). Fall back to the git checkout
if the installed debootstrap version is too old.

[1] https://bugs.debian.org/732255

Signed-off-by: Sascha Silbe 
---
v2→v3:
  - fix unbalanced white space around pipes
  - replaced GNU-specific version sort option with POSIX compliant set
of options

 tests/docker/dockerfiles/debian-bootstrap.pre | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tests/docker/dockerfiles/debian-bootstrap.pre 
b/tests/docker/dockerfiles/debian-bootstrap.pre
index 3d9f7f4..ee69e89 100755
--- a/tests/docker/dockerfiles/debian-bootstrap.pre
+++ b/tests/docker/dockerfiles/debian-bootstrap.pre
@@ -3,6 +3,8 @@
 # Simple wrapper for debootstrap, run in the docker build context
 #
 FAKEROOT=`which fakeroot 2> /dev/null`
+# debootstrap < 1.0.67 generates empty sources.list, see Debian#732255
+MIN_DEBOOTSTRAP_VERSION=1.0.67
 
 exit_and_skip()
 {
@@ -40,9 +42,17 @@ fi
 #
 
 if [ -z $DEBOOTSTRAP_DIR ]; then
+NEED_DEBOOTSTRAP=false
 DEBOOTSTRAP=`which debootstrap 2> /dev/null`
 if [ -z $DEBOOTSTRAP ]; then
 echo "No debootstrap installed, attempting to install from SCM"
+NEED_DEBOOTSTRAP=true
+elif ! (echo "${MIN_DEBOOTSTRAP_VERSION}" ; "${DEBOOTSTRAP}" --version \
+| cut -d ' ' -f 2) | sort -t . -n -k 1,1 -k 2,2 -k 3,3 -C; then
+echo "debootstrap too old, attempting to install from SCM"
+NEED_DEBOOTSTRAP=true
+fi
+if $NEED_DEBOOTSTRAP; then
 DEBOOTSTRAP_SOURCE=https://anonscm.debian.org/git/d-i/debootstrap.git
 git clone ${DEBOOTSTRAP_SOURCE} ./debootstrap.git
 export DEBOOTSTRAP_DIR=./debootstrap.git
-- 
1.9.1

[Qemu-devel] [PATCH v3 7/8] docker: build debootstrap after cloning

2016-09-06 Thread Sascha Silbe

When using the git version of debootstrap (because no usable version
of debootstrap was installed on the host), we need to run 'make' so
that devices.tar.gz gets built. Otherwise the first debootstrap stage
will fail without printing any error message.

Signed-off-by: Sascha Silbe 
---
 tests/docker/dockerfiles/debian-bootstrap.pre | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/docker/dockerfiles/debian-bootstrap.pre 
b/tests/docker/dockerfiles/debian-bootstrap.pre
index ee69e89..4322fbb 100755
--- a/tests/docker/dockerfiles/debian-bootstrap.pre
+++ b/tests/docker/dockerfiles/debian-bootstrap.pre
@@ -57,6 +57,7 @@ if [ -z $DEBOOTSTRAP_DIR ]; then
 git clone ${DEBOOTSTRAP_SOURCE} ./debootstrap.git
 export DEBOOTSTRAP_DIR=./debootstrap.git
 DEBOOTSTRAP=./debootstrap.git/debootstrap
+(cd "${DEBOOTSTRAP_DIR}" && "${FAKEROOT}" make )
 fi
 else
 DEBOOTSTRAP=${DEBOOTSTRAP_DIR}/debootstrap
-- 
1.9.1

Re: [Qemu-devel] [PULL 00/66] ppc-for-2.8 queue 20160906

2016-09-06 Thread Thomas Huth

On 06.09.2016 16:04, Peter Maydell wrote:
> On 6 September 2016 at 04:39, David Gibson  
> wrote:
>> The following changes since commit e87d397e5ef66276ccc49b829527d605ca07d0ad:
>>
>>   Open 2.8 development tree (2016-09-05 11:38:54 +0100)
>>
>> are available in the git repository at:
>>
>>   git://github.com/dgibson/qemu.git tags/ppc-for-2.8-20160906
>>
>> for you to fetch changes up to 2ccbe3e29adb7c016bc5525d38d2a6e938c481af:
>>
>>   tests: Check serial output of firmware boot of some machines (2016-09-06 
>> 10:28:17 +1000)
>>
>> 
>> ppc patch queue for 2016-Sep-6
>>
>> This is my first pull request for the newly opened qemu-2.8 tree.  It
>> contains a heap of things that were too late for 2.7 and have been
>> queued for a while.  In particular:
>> * A number of preliminary patches for the powernv machine type
>> * A substantial cleanup of exception handling which will be
>>   necessary to support running a TCG with hypervisor
>>   facilities
>> * A start on support for POWER9
>> * Some TCG implementations for new POWER9 instructions
>> * Some TCG and related cleanups in preparation for POWER9
>> * Some assorted TCG optimizations
>> * An implementation of the H_CHANGE_LOGICAL_LAN_MAC hypercall
>>   which allows the MAC address to be changed on the PAPR virtual
>>   NIC.
>> * Add some extra test cases for several machines (this isn't
>>   strictly in the ppc code, but is most value to ppc)
>>
>> 
> 
> Hi. This fails 'make check' on ppc64be:
[...]
> TEST: tests/prom-env-test... (pid=14611)
>   /ppc64/prom-env/mac99:   **
> ERROR:/home/pm215/qemu/tests/prom-env-test.c:41:check_guest_memory:
> assertion failed (signature == MAGIC): (0x == 0xcafec0de)
> FAIL

D'oh! I'll have a look...

> I also see warnings on non-KVM hosts during 'make check' which seem
> to be produced by this new test:
> 
> TEST: tests/boot-serial-test... (pid=2836)
>   /i386/boot-serial/isapc:
> warning: TCG doesn't support requested feature: CPUID.01H:EDX.vme [bit
> 1]

That happens because the test is running QEMU with the isapc machine. I
also get that warning message if I simply run the following on the
command line:

$ qemu-system-x86_64 -M isapc
warning: TCG doesn't support requested feature: CPUID.01H:EDX.vme [bit 1]

So the warning is likely there since quite a while already, just nobody
is running with -M isapc nowadays anymore, so nobody noticed this ...
Does anybody have got a clue how to fix that warning?

 Thomas

Re: [Qemu-devel] [PATCH 1/9] tests: cris: force inlining

2016-09-06 Thread Edgar E. Iglesias

On Mon, Sep 05, 2016 at 01:54:04PM +0200, Rabin Vincent wrote:
> From: Rabin Vincent 
> 
> The CRIS tests expect that functions marked inline are always inline.
> With newer versions of GCC, building them results warnings like the
> following and spurious failures when they are run.
> 
> In file included from tests/tcg/cris/check_moveq.c:5:0:
> tests/tcg/cris/crisutils.h:66:20: warning: inlining failed in call to
> 'cris_tst_cc.constprop.0': call is unlikely and code size would grow 
> [-Winline]
> tests/tcg/cris/check_moveq.c:28:13: warning: called from here [-Winline]
> 
> Use the always_inline attribute when building them to fix this.

Hi Rabin,

Do you happen to have a public git tree/branch with these changes?

Best regards,
Edgar


> 
> Signed-off-by: Rabin Vincent 
> ---
>  tests/tcg/cris/sys.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tests/tcg/cris/sys.h b/tests/tcg/cris/sys.h
> index c5f88e1..b1bf4c5 100644
> --- a/tests/tcg/cris/sys.h
> +++ b/tests/tcg/cris/sys.h
> @@ -3,6 +3,8 @@
>  #define STRINGIFY(x) #x
>  #define TOSTRING(x) STRINGIFY(x)
>  
> +#define inline inline __attribute__((always_inline))
> +
>  #define CURRENT_LOCATION __FILE__ ":" TOSTRING(__LINE__)
>  
>  #define err() \
> -- 
> 2.1.4
>

Re: [Qemu-devel] [PULL 00/66] ppc-for-2.8 queue 20160906

2016-09-06 Thread Benjamin Herrenschmidt

On Tue, 2016-09-06 at 23:09 +0200, Thomas Huth wrote:
> The bad commit is: "ppc: Speed up load/store multiple"
> 
> There are two "#if defined(HOST_WORDS_BIGENDIAN)" sections in this patch
> which are both bad: The memcpy tries to copy 32-bit values into 64-bit
> registers, which of course does not work (unless you compile this code
> for a 32-bit host only where the size of a gpr is only 32-bit).

The instruction does that. It only copies 32-bits. I think it's
correct.    

> I'd suggest to remove the "#if defined(HOST_WORDS_BIGENDIAN)" part and
> use cpu_to_be32() and friends instead of the bswap32() here?

I tried that but it prevents the faster memcpy, the whole point was to
speed things up...

Cheers,
Ben.

Re: [Qemu-devel] [PATCH 5/9] target-cris: sync CC state at load/stores.

2016-09-06 Thread Edgar E. Iglesias

On Mon, Sep 05, 2016 at 12:02:14PM -0700, Richard Henderson wrote:
> On 09/05/2016 04:54 AM, Rabin Vincent wrote:
> >From: "Edgar E. Iglesias" 
> >
> >Icount may choose to abort and recompile a TB at any load/store.  We
> >need to sync the CC state at these insns.
> >
> >Signed-off-by: Edgar E. Iglesias 
> >Signed-off-by: Rabin Vincent 
> >---
> > target-cris/translate.c | 9 +
> > target-cris/translate_v10.c | 3 +++
> > 2 files changed, 12 insertions(+)
> >
> >diff --git a/target-cris/translate.c b/target-cris/translate.c
> >index f4a8d7d..c280e24 100644
> >--- a/target-cris/translate.c
> >+++ b/target-cris/translate.c
> >@@ -1098,6 +1098,9 @@ static void gen_load64(DisasContext *dc, TCGv_i64 dst, 
> >TCGv addr)
> > {
> > int mem_index = cpu_mmu_index(&dc->cpu->env, false);
> >
> >+/* Due to icount, we need to update the CC flags on load/stores.  */
> >+cris_evaluate_flags(dc);
> >+
> 
> This is not the proper way to handle this.  You should arrange to sync the
> CC flags in restore_state_to_opc.  There are plenty of examples in the tree
> you can examine.  It looks like there's plenty of room for cleanup for cris
> here.
>

Thanks, agreed.

Rabin, most of the CRIS patches that I left in the AXIS tree without upstreaming
need more work. I don't mind you posting them but keep in mind that there's
not much that can go up without tidy up.

Best regards,
Edgar

Re: [Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-09-06 Thread Benjamin Herrenschmidt

On Wed, 2016-09-07 at 07:47 +1000, Benjamin Herrenschmidt wrote:
> d be an extra op in the xscom device model I guess. 
> 
> No. If you split the XSCOM bus from the MMIO -> XSCOM bridge (the
> ADU)
> then the conversion only happens in the former. You don't directly
> route the MMIOs over ! You intercept the MMIOs and use use the
> address_space_rw to "generate" the XSCOM accesses on the other side,
> like I do for the LPC bus.
> 
> We need that anyway because of the way XSCOMs can manipulate the HMER
> etc...
> 
> > 
> > Also, the main purpose of the XscomBus is to loop on the devices 
> > to populate the device tree. I am wondering if we could just use 
> > a simple list under the chip for that purpose.

In fact, if you do the above, you no longer need a XSCOM device...

A number of "devices" can exist below a chip, all they need to have
XSCOMs is to register memory regions that are child of that chip's
xscom_region.

For device-tree, well, we could have a generic interface that anything
that can populate DT has and iterate through them. Or make a "chiplet"
class or something.

Cheers,
Ben.

Re: [Qemu-devel] [PATCH] tests/acpi: speedup acpi tests

2016-09-06 Thread Michael S. Tsirkin

On Tue, Sep 06, 2016 at 09:11:16PM +0200, Paolo Bonzini wrote:
> 
> 
> On 06/09/2016 17:51, Michael S. Tsirkin wrote:
> > > Just use "-machine accel=kvm:tcg" and let QEMU do the hard work. :)
> > > 
> > > Paolo
> > 
> > Sounds good, but we really need to skip it when gsi
> > capability is not there (and when using kernel irqchip)
> > (because in that case we can't override apic irq0).
> 
> We should just remove support for !kvm_has_gsi_routing on x86.  It's
> been there since November 2008.
> 
> Paolo

Fine by me. Can you post a patch that does this?
This one can be a follow-up, and we can also
drop the conditional handling from acpi-build.

-- 
MST

Re: [Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-09-06 Thread Benjamin Herrenschmidt

On Tue, 2016-09-06 at 16:42 +0200, Cédric Le Goater wrote:
> > Alternatively.. it might be simpler to just drop the SCOM as a
> > separate device.  I think you could just hang the scom bus directly
> > off the chip object.  IIUC the scom is basically the only chip-
> level
> > control mechanism, so this does make a certain amount of sense.
> 
> yes. I am exposing more and more stuff of the chip object under the 
> xscom object so we should merge them. I agree. We will still need 
> some XScomDevice of some sort.

What you can do is split it this way which matches the HW:

 - The chip object is the XSCOM parent, it owns the XSCOM bus,
and expose functions (methods) to read/write XSCOMs. WE could rename
XSCOM to "PIB" or "PCB" which is the real name of the bus ;-) But that
might confuse things more than help .

 - A separate ADU object on each chip that is a SysDevice and does the
MMIO bridge to XSCOM. It decodes the MMIO range for that chip and calls
the above accessors.

That makes it easy to generate XSCOMs using different mechanisms if we
wish to do so, which could come in handy, such as monitor commands, or
if we ever do cosimulation with a separate BMC, a simulated FSI, all by
just calling the first object's methods.

Cheers,
Ben.

Re: [Qemu-devel] [PATCH v2 3/7] ppc/pnv: Add XSCOM infrastructure

2016-09-06 Thread Benjamin Herrenschmidt

On Tue, 2016-09-06 at 16:42 +0200, Cédric Le Goater wrote:
> 
> The change does seem too invasive. I can give it a try in next
> version.
> 
> When a memory region is triggered, the impacted device will have
> to convert the address with xscom_to_pcb_addr(). There is some 
> dependency on the chip model because the translation is different. 
> That would be an extra op in the xscom device model I guess. 

No. If you split the XSCOM bus from the MMIO -> XSCOM bridge (the ADU)
then the conversion only happens in the former. You don't directly
route the MMIOs over ! You intercept the MMIOs and use use the
address_space_rw to "generate" the XSCOM accesses on the other side,
like I do for the LPC bus.

We need that anyway because of the way XSCOMs can manipulate the HMER
etc...

> Also, the main purpose of the XscomBus is to loop on the devices 
> to populate the device tree. I am wondering if we could just use 
> a simple list under the chip for that purpose.

Re: [Qemu-devel] [PATCH v4 2/3] tests: make pc_alloc_init/init_flags/uninit generic

2016-09-06 Thread Greg Kurz

On Tue,  6 Sep 2016 15:17:56 +0200
Laurent Vivier  wrote:

> And add support for ppc64.
> 
> Signed-off-by: Laurent Vivier 
> ---
> v2:
> - remove useless parenthesis, inline
> 

This works indeed but I'm just feeling curious about the QOSOps type introduced
by the following commit:

commit 90e5add6f2fa0b0bd9a4c1d5a4de2304b5f3e466
Author: John Snow 
Date:   Mon Jan 19 15:15:55 2015 -0500

libqos: add pc specific interface

Wouldn't this be better to implement something similar for ppc64 instead of
relying on strcmp() ?

Cheers.

--
Greg

>  tests/Makefile.include  |  3 ++-
>  tests/libqos/libqos.h   |  2 +-
>  tests/libqos/malloc-ppc64.c | 38 ++
>  tests/libqos/malloc-ppc64.h | 17 +
>  tests/libqos/malloc.c   | 42 ++
>  tests/libqos/malloc.h   |  3 +++
>  6 files changed, 103 insertions(+), 2 deletions(-)
>  create mode 100644 tests/libqos/malloc-ppc64.c
>  create mode 100644 tests/libqos/malloc-ppc64.h
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 14be491..a286848 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -557,8 +557,9 @@ tests/test-crypto-block$(EXESUF): 
> tests/test-crypto-block.o $(test-crypto-obj-y)
>  
>  libqos-obj-y = tests/libqos/pci.o tests/libqos/fw_cfg.o tests/libqos/malloc.o
>  libqos-obj-y += tests/libqos/i2c.o tests/libqos/libqos.o
> +libqos-obj-y += tests/libqos/malloc-ppc64.o tests/libqos/malloc-pc.o
>  libqos-pc-obj-y = $(libqos-obj-y) tests/libqos/pci-pc.o
> -libqos-pc-obj-y += tests/libqos/malloc-pc.o tests/libqos/libqos-pc.o
> +libqos-pc-obj-y += tests/libqos/libqos-pc.o
>  libqos-pc-obj-y += tests/libqos/ahci.o
>  libqos-omap-obj-y = $(libqos-obj-y) tests/libqos/i2c-omap.o
>  libqos-imx-obj-y = $(libqos-obj-y) tests/libqos/i2c-imx.o
> diff --git a/tests/libqos/libqos.h b/tests/libqos/libqos.h
> index 604980d..7b71607 100644
> --- a/tests/libqos/libqos.h
> +++ b/tests/libqos/libqos.h
> @@ -3,7 +3,7 @@
>  
>  #include "libqtest.h"
>  #include "libqos/pci.h"
> -#include "libqos/malloc-pc.h"
> +#include "libqos/malloc.h"
>  
>  typedef struct QOSOps {
>  QGuestAllocator *(*init_allocator)(QAllocOpts);
> diff --git a/tests/libqos/malloc-ppc64.c b/tests/libqos/malloc-ppc64.c
> new file mode 100644
> index 000..1b31e33
> --- /dev/null
> +++ b/tests/libqos/malloc-ppc64.c
> @@ -0,0 +1,38 @@
> +/*
> + * libqos malloc support for PPC64
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "libqos/malloc-ppc64.h"
> +
> +#include "qemu-common.h"
> +
> +#define PAGE_SIZE 4096
> +
> +/* Memory must be a multiple of 256 MB,
> + * so we have at least 256MB
> + */
> +#define PPC64_MIN_SIZE 0x1000
> +
> +void ppc64_alloc_uninit(QGuestAllocator *allocator)
> +{
> +alloc_uninit(allocator);
> +}
> +
> +QGuestAllocator *ppc64_alloc_init_flags(QAllocOpts flags)
> +{
> +QGuestAllocator *s;
> +
> +s = alloc_init_flags(flags, 1 << 20, PPC64_MIN_SIZE);
> +alloc_set_page_size(s, PAGE_SIZE);
> +
> +return s;
> +}
> +
> +QGuestAllocator *ppc64_alloc_init(void)
> +{
> +return ppc64_alloc_init_flags(ALLOC_NO_FLAGS);
> +}
> diff --git a/tests/libqos/malloc-ppc64.h b/tests/libqos/malloc-ppc64.h
> new file mode 100644
> index 000..c2b2dff
> --- /dev/null
> +++ b/tests/libqos/malloc-ppc64.h
> @@ -0,0 +1,17 @@
> +/*
> + * libqos malloc support for PPC64
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef LIBQOS_MALLOC_PPC64_H
> +#define LIBQOS_MALLOC_PPC64_H
> +
> +#include "libqos/malloc.h"
> +
> +QGuestAllocator *ppc64_alloc_init(void);
> +QGuestAllocator *ppc64_alloc_init_flags(QAllocOpts flags);
> +void ppc64_alloc_uninit(QGuestAllocator *allocator);
> +
> +#endif
> diff --git a/tests/libqos/malloc.c b/tests/libqos/malloc.c
> index b8eff5f..6a02345 100644
> --- a/tests/libqos/malloc.c
> +++ b/tests/libqos/malloc.c
> @@ -12,6 +12,9 @@
>  
>  #include "qemu/osdep.h"
>  #include "libqos/malloc.h"
> +#include "libqos/malloc-pc.h"
> +#include "libqos/malloc-ppc64.h"
> +#include "libqtest.h"
>  #include "qemu-common.h"
>  #include "qemu/host-utils.h"
>  
> @@ -375,3 +378,42 @@ void migrate_allocator(QGuestAllocator *src,
>  QTAILQ_INSERT_HEAD(src->free, node, MLIST_ENTNAME);
>  return;
>  }
> +
> +QGuestAllocator *machine_alloc_init(void)
> +{
> +const char *arch = qtest_get_arch();
> +
> +if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
> +return pc_alloc_init();
> +}
> +if (strcmp(arch, "ppc64") == 0) {
> +return ppc64_alloc_init();
> +}
> +return NULL;
> +}
> +
> +QGuestAllocator *machine_alloc_init_flags(QAllocOpts flags)
> +{
> +const char *arch = qtest_get_arch();
> +
> +if (strcmp(arch, "i386")

Re: [Qemu-devel] [PATCH v2 1/1] block-backend: allow flush on devices with open tray

2016-09-06 Thread John Snow




On 09/05/2016 05:15 AM, Markus Armbruster wrote:

John Snow  writes:


On 09/02/2016 01:44 AM, Markus Armbruster wrote:

John Snow  writes:


If a device still has an attached BDS because the medium has not yet
been removed, we will be unable to migrate to a new host because
blk_flush will return an error for that backend.

Replace the call to blk_is_available to blk_is_inserted to weaken
the check and allow flushes from the backend to work, while still
disallowing flushes from the frontend/device model to work.

This fixes a regression present in 2.6.0 caused by the following commit:
fe1a9cbc339bb54d20f1ca4c1e8788d16944d5cf
block: Move some bdrv_*_all() functions to BB

Signed-off-by: John Snow 
---
 block/block-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index effa038..36a32c3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1058,7 +1058,7 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t 
offset,
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque)
 {
-if (!blk_is_available(blk)) {
+if (!blk_is_inserted(blk)) {
 return blk_abort_aio_request(blk, cb, opaque, -ENOMEDIUM);
 }

@@ -1118,7 +1118,7 @@ int blk_co_pdiscard(BlockBackend *blk, int64_t offset, 
int count)

 int blk_co_flush(BlockBackend *blk)
 {
-if (!blk_is_available(blk)) {
+if (!blk_is_inserted(blk)) {
 return -ENOMEDIUM;
 }

@@ -1127,7 +1127,7 @@ int blk_co_flush(BlockBackend *blk)

 int blk_flush(BlockBackend *blk)
 {
-if (!blk_is_available(blk)) {
+if (!blk_is_inserted(blk)) {
 return -ENOMEDIUM;
 }


Naive question: should we flush before we open the tray?



Naive, but long-winded answer:

There are two types of flushes to me, conceptually:

Frontend and Backend.

Frontend would be a request from the quest to flush. If the medium in
question is absent, this should rightly fail. I do expect this to be
handled at the device layer.

Backend is a request from QEMU for various reasons, like needing to
drain the queue for migration or savevm.

What's happening here is that we have a backend request to flush a
medium that is inaccessible to the guest


Assuming we caught frontend requests at the device layer already.


 -- The flush all routine is
ignorant of this fact. So we get a migration request with an open tray
(because the user had opened it some time prior perhaps, and left it
open unwittingly) and it fails because its inaccessible to the
guest. Migration fails as a result.

That seems wrong to me. A few ways to fix it:

(1) Have internal flush requests be aware of the fact that there's
nothing to flush here: this is a read-only media and we could skip it.


Returning successfully because there's nothing to flush makes sense to
me.



Only slightly more involved. At the time I wrote the first patch, I 
don't think Denis' improvements had gone in yet, and I still haven't 
really looked at them to see how easy they are to co-opt here.



(2) Allow flush to fail in a non-fatal way for cases where we need to
flush, but cannot (-ENOMEDIUM) and where it would rightly fail on a
real machine.


I don't like -ENOMEDIUM when there's nothing to flush.


(3) Just allow flushes to work on devices not visible to the guest,
which is what I've done here. Internal requests should work, Guest
requests should fail.


I guess that's okay, ...



Conceptually, QEMU should be allowed to flush things to its internal 
drive abstractions regardless of what it looks like to the guest 
whenever it decides it is necessary to.


In practice, I don't know how clean we are about separating out who is 
requesting the flush to know which to deny.



I was briefly concerned about "What if this lets us flush something we
shouldn't?" and Max's take was "That doesn't seem so bad. CDROMs are
RO anyway."


... even though I don't buy Max's reason.  Writable removable media
exist.  The argument I can buy is that internal requests are
fundamentally different from external requests.



Yeah, I think any such flush request should be denied by the device model.


So I went with the easiest option here.

To answer your question more directly, we aren't choosing to
eject-then-flush, the user is. I can't move the flush before the eject
unless we flush-on-eject internally, then mark the device explicitly
as not needing to be flushed anymore, but that's essentially (1) up
there.

--js


I am at the moment inclined to stick with the existing solution "for 
now" and add hardening against flush requests from the guest later 
within the 2.8 window if needed.


--js

Re: [Qemu-devel] [virtio-dev][RFC v3] virtio-sdm: new device specification

2016-09-06 Thread Edgar E. Iglesias

Hi,

Sorry for the delay. I have a few questions.

I don't fully understand the purpose of this. Could you elaborate a little on 
that?
You need a real IPI signal to drive this I guess, so it's a little bit of a
chicken and egg problem.

A useful feature I can see is multiplexing a single underlying IPI (driving
an sdm channel) into multiple "virtual" IPI signals. Is that the main
use-case?

Regarding the boot and reset. Typically there's a reset signal you release and
the remote CPU starts running. You can ofcourse reset the CPU and restart
at anytime. Not sure if you need an additional BOOT virtual signal.

There may also be additional mechanisms to control the startup address of the 
remote CPU.
Any thoughts on that?

Best regards,
Edgar


On Tue, Aug 30, 2016 at 01:22:26PM +0200, Christian Pinto wrote:
> Hello,
> 
> are there any comments?
> 
> 
> Christian
> 
> 
> On 09/08/2016 09:37, Christian Pinto wrote:
> >This patch adds the specification of the Signal Dristribution Module virtio
> >device to the current virtio specification document.
> >
> >Signed-off-by: Christian Pinto 
> >Signed-off-by: Baptiste Reynal 
> >
> >---
> >v2 -> v3:
> >- Removed master field from configuration and replaced with device_id
> >- Added new RESET signal
> >- Added signals definition into device specs
> >- Added feature bits associated to signals
> >
> >v1 -> v2:
> >- Fixed some typos
> >- Removed dependencies from QEMU
> >- Added explanation on how SDM can be used in AMP systems
> >- Explained semantics of payload field in SDMSignalData struct
> >---
> >  content.tex|   2 +
> >  virtio-sdm.tex | 166 
> > +
> >  2 files changed, 168 insertions(+)
> >  create mode 100644 virtio-sdm.tex
> >
> >diff --git a/content.tex b/content.tex
> >index 3317916..7fcf779 100644
> >--- a/content.tex
> >+++ b/content.tex
> >@@ -5643,6 +5643,8 @@ descriptor for the \field{sense_len}, \field{residual},
> >  \field{status_qualifier}, \field{status}, \field{response} and
> >  \field{sense} fields.
> >+\input{virtio-sdm.tex}
> >+
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >  Currently there are three device-independent feature bits defined:
> >diff --git a/virtio-sdm.tex b/virtio-sdm.tex
> >new file mode 100644
> >index 000..ee43e23
> >--- /dev/null
> >+++ b/virtio-sdm.tex
> >@@ -0,0 +1,166 @@
> >+\section{Signal Distribution Module}\label{sec:Device Types / SDM Device}
> >+
> >+The virtio Signal Distribution Module is meant to enable Inter Processor 
> >signal
> >+exchange.
> >+An example are Inter Processor Interrupts used in AMP systems, for the
> >+processors involved to notify the presence of new data in the communication
> >+queues.
> >+In AMP systems signals are used for various purposes, an example are 
> >remoteproc
> >+or RPMSG. In the former signals are used by a master processor to trigger
> >+the boot of a slave processor. While the latter, RPMSG, uses virtio queues 
> >as a
> >+message exchange medium between processors. In this case the SDM can be 
> >used to
> >+generate the interrupt associated to the kick of a virtio queue.
> >+
> >+There are three signal types supported by the device, namely the
> >+\textit{IRQ} signal, \textit{BOOT} signal and \textit{RESET} signal. Such 
> >set of
> >+signals covers most of the needs of an AMP system, where a master processor 
> >can
> >+trigger the boot or reset of slave processors, and processors send IRQs 
> >between
> >+each other.
> >+
> >+\subsection{Device ID}\label{sec:Device Types / SDM Device / Device ID}
> >+
> >+21
> >+
> >+\subsection{Virtqueues}\label{sec:Device Types / SDM Device / Virtqueues}
> >+
> >+\begin{description}
> >+\item[0] hg_vq
> >+\item[1] gh_vq
> >+\end{description}
> >+
> >+Queue 0 is used for device-to-driver communication (i.e., notification of a
> >+signal), while queue 1 for driver-to-device communication.
> >+
> >+\subsection{Feature bits}\label{sec:Device Types / SDM Device / Feature 
> >bits}
> >+
> >+\begin{description}
> >+\item[VIRTIO_SDM_F_IRQ_SIG (0)] Device supports the IRQ signal.
> >+
> >+\item[VIRTIO_SDM_F_BOOT_SIG (1)] Device supports the BOOT signal.
> >+
> >+\item[VIRTIO_SDM_F_RESET_SIG (2)] Device supports the RESET signal.
> >+\end{description}
> >+
> >+This set of bits is used by each virtio SDM device to declare which of the
> >+signals it supports. By specification each device can support all or a 
> >subset of
> >+the defined signals.
> >+
> >+\subsection{Device configuration layout}\label{sec:Device Types / SDM 
> >Device /
> >+Device configuration layout}
> >+
> >+The device configuration is composed by three fields:
> >+\texttt{max_slaves}, \texttt{current_slaves} and the \texttt{device_id}.
> >+
> >+\begin{lstlisting}
> >+struct virtio_sdm_config {
> >+u16   max_slaves;
> >+u16   current_slaves;
> >+u32   device_id;
> >+};
> >+\end{lstlisting}
> >+
> >+The SDM device can be instantiated either as a master or as a slave. The 
>

Re: [Qemu-devel] [PATCH RFC v2 05/22] block/pcache: add aio requests into cache

2016-09-06 Thread Pavel Butsykin


On 01.09.2016 18:28, Kevin Wolf wrote:

Am 29.08.2016 um 19:10 hat Pavel Butsykin geschrieben:

For storing requests use an rbtree, here are add basic operations on the
rbtree to work with  cache nodes.

Signed-off-by: Pavel Butsykin 
---
  block/pcache.c | 190 -
  1 file changed, 189 insertions(+), 1 deletion(-)

diff --git a/block/pcache.c b/block/pcache.c
index 7f221d6..f5022f9 100644
--- a/block/pcache.c
+++ b/block/pcache.c
@@ -27,6 +27,7 @@
  #include "block/raw-aio.h"
  #include "qapi/error.h"
  #include "qapi/qmp/qstring.h"
+#include "qemu/rbtree.h"

  #define PCACHE_DEBUG

@@ -37,9 +38,53 @@
  #define DPRINTF(fmt, ...) do { } while (0)
  #endif

+typedef struct RbNodeKey {
+uint64_tnum;
+uint32_tsize;
+} RbNodeKey;
+
+typedef struct BlockNode {
+struct RbNode   rb_node;
+union {
+RbNodeKey   key;
+struct {
+uint64_tsector_num;
+uint32_tnb_sectors;
+};
+};


What's the deal with this union?

It just adds an alias, and 'sector_num'/'nb_sectors' are actually longer
to type than 'key.num' and 'key.size', so the only advantage that I
could image doesn't really apply.


Yes, you're right. Initially this was done to reduce the code in those
places where the name change will not confuse. But later it turned out
that such places are few.


But it brings problems: We always have to be careful to keep the two
structs in sync, and grepping for the field name will only bring up half
of the users because the other half uses the other alias.


Yep, relative to the rest of the Qemu code it looks unusual, and given
the small benefits it is need to remove :)


+QTAILQ_ENTRY(BlockNode) entry;
+} BlockNode;
+
+typedef struct PCNode {
+BlockNode cm;
+
+uint8_t  *data;
+} PCNode;


What do 'PC' and 'cm' mean?


PC - PrefetchCache, cm - common.

But I think it would be better to fix it:

PrefetchCache - PrefCacheNode, common - common.


+typedef struct ReqStor {
+struct {
+struct RbRoot root;
+CoMutex   lock;
+} tree;
+
+uint32_t curr_size;
+} ReqStor;


Same question for ReqStor. For an identifier that is used only three
times, it could be a bit more descriptive.


Storage requests


What unit has curr_size or what does it count? The nodes in the tree?


This is the current size of the storage requests.


Also, cur_ seems to be more common as a prefix than curr_.


OK


+typedef struct BDRVPCacheState {
+BlockDriverState **bs;


This is unused. (Good thing, it looks weird.)


Apparently I forgot to remove it.


+ReqStor pcache;
+
+struct {
+QTAILQ_HEAD(pcache_head, BlockNode) head;
+CoMutex lock;
+} list;
+} BDRVPCacheState;
+
  typedef struct PrefCacheAIOCB {
  BlockAIOCB common;

+BDRVPCacheState *s;


Not really needed, you already have acb->common.bs.


  QEMUIOVector *qiov;
  uint64_t sector_num;
  uint32_t nb_sectors;
@@ -64,6 +109,124 @@ static QemuOptsList runtime_opts = {
  },
  };

+#define PCNODE(_n) ((PCNode *)(_n))


container_of() would be preferable for type safety.


OK


+static int pcache_key_cmp(const RbNodeKey *key1, const RbNodeKey *key2)
+{
+assert(key1 != NULL);
+assert(key2 != NULL);
+
+if (key1->num >= key2->num + key2->size) {
+return 1;
+}
+if (key1->num + key1->size <= key2->num) {
+return -1;
+}
+
+return 0;
+}
+
+static void *node_insert(struct RbRoot *root, BlockNode *node)
+{
+struct RbNode **new = &(root->rb_node), *parent = NULL;
+
+/* Figure out where to put new node */
+while (*new) {
+BlockNode *this = container_of(*new, BlockNode, rb_node);
+int result = pcache_key_cmp(&node->key, &this->key);
+if (result == 0) {
+return this;
+}
+parent = *new;
+new = result < 0 ? &((*new)->rb_left) : &((*new)->rb_right);
+}
+/* Add new node and rebalance tree. */
+rb_link_node(&node->rb_node, parent, new);
+rb_insert_color(&node->rb_node, root);
+
+return node;
+}
+
+static inline PCNode *pcache_node_insert(struct RbRoot *root, PCNode *node)
+{
+return node_insert(root, &node->cm);
+}
+
+static inline void pcache_node_free(PCNode *node)
+{
+g_free(node->data);
+g_slice_free1(sizeof(*node), node);
+}


We moved away from g_slice_* because it turned out that it hurt more
than it helped.


Somewhere I saw it, thanks.


+static inline void *pcache_node_alloc(RbNodeKey* key)
+{
+PCNode *node = g_slice_alloc(sizeof(*node));
+
+node->cm.sector_num = key->num;
+node->cm.nb_sectors = key->size;


In other words, node->cm.key = *key;


Yep :)


+node->data = g_malloc(node->cm.nb_sectors << BDRV_SECTOR_BITS);
+
+return node;
+}
+
+static bool pcache_node_find_and_create(PrefCacheAIOCB *acb, RbNodeKey *key,
+PCNo

Re: [Qemu-devel] [PULL 00/66] ppc-for-2.8 queue 20160906

2016-09-06 Thread Paolo Bonzini



On 06/09/2016 21:35, Thomas Huth wrote:
> D'oh! I'll have a look...
> 
>> > I also see warnings on non-KVM hosts during 'make check' which seem
>> > to be produced by this new test:
>> > 
>> > TEST: tests/boot-serial-test... (pid=2836)
>> >   /i386/boot-serial/isapc:
>> > warning: TCG doesn't support requested feature: CPUID.01H:EDX.vme [bit
>> > 1]
> That happens because the test is running QEMU with the isapc machine. I
> also get that warning message if I simply run the following on the
> command line:
> 
> $ qemu-system-x86_64 -M isapc
> warning: TCG doesn't support requested feature: CPUID.01H:EDX.vme [bit 1]
> 
> So the warning is likely there since quite a while already, just nobody
> is running with -M isapc nowadays anymore, so nobody noticed this ...
> Does anybody have got a clue how to fix that warning?

Implement vme... :)

Paolo

Re: [Qemu-devel] [PATCH] tests/acpi: speedup acpi tests

2016-09-06 Thread Paolo Bonzini



On 06/09/2016 21:21, Michael S. Tsirkin wrote:
> On Tue, Sep 06, 2016 at 09:11:16PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 06/09/2016 17:51, Michael S. Tsirkin wrote:
 Just use "-machine accel=kvm:tcg" and let QEMU do the hard work. :)

 Paolo
>>>
>>> Sounds good, but we really need to skip it when gsi
>>> capability is not there (and when using kernel irqchip)
>>> (because in that case we can't override apic irq0).
>>
>> We should just remove support for !kvm_has_gsi_routing on x86.  It's
>> been there since November 2008.
>>
>> Paolo
> 
> Fine by me. Can you post a patch that does this?
> This one can be a follow-up, and we can also
> drop the conditional handling from acpi-build.

Go ahead and merge Marcel's patch.  I can remove kernel_irqchip=off once
apic irq0 override is assumed.

Paolo

Re: [Qemu-devel] [PATCH v7 0/4] Add Mediated device support

2016-09-06 Thread Alex Williamson

On Wed, 7 Sep 2016 01:05:11 +0530
Kirti Wankhede  wrote:

> On 9/6/2016 11:10 PM, Alex Williamson wrote:
> > On Sat, 3 Sep 2016 22:04:56 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 9/3/2016 3:18 AM, Paolo Bonzini wrote:  
> >>>
> >>>
> >>> On 02/09/2016 20:33, Kirti Wankhede wrote:
>   We could even do:
> >>
> >> echo $UUID1:$GROUPA > create
> >>
> >> where $GROUPA is the group ID of a previously created mdev device into
> >> which $UUID1 is to be created and added to the same group.
>  
> >>>
> >>> From the point of view of libvirt, I think I prefer Alex's idea.
> >>>  could be an additional element in the nodedev-create XML:
> >>>
> >>> 
> >>>   my-vgpu
> >>>   pci__86_00_0
> >>>   
> >>> 
> >>> 0695d332-7831-493f-9e71-1c85c8911a08
> >>> group1
> >>>   
> >>> 
> >>>
> >>> (should group also be a UUID?)
> >>> 
> >>
> >> No, this should be a unique number in a system, similar to iommu_group.  
> > 
> > Sorry, just trying to catch up on this thread after a long weekend.
> > 
> > We're talking about iommu groups here, we're not creating any sort of
> > parallel grouping specific to mdev devices.  
> 
> I thought we were talking about group of mdev devices and not iommu
> group. IIRC, there were concerns about it (this would be similar to
> UUID+instance) and that would (ab)use iommu groups.

What constraints does a group, which is not an iommu group, place on the
usage of the mdev devices?  What happens if we put two mdev devices in
the same "mdev group" and then assign them to separate VMs/users?  I
believe that the answer is that this theoretical "mdev group" doesn't
actually impose any constraints on the devices within the group or how
they're used.

vfio knows about iommu groups and we consider an iommu group to be the
unit of ownership for userspace.  Therefore by placing multiple mdev
devices within the same iommu group we can be assured that there's only
one user for that group.  Furthermore, the specific case for this
association on NVIDIA is to couple the hardware peer-to-peer resources
for the individual mdev devices.  Therefore this particular grouping
does imply a lack of isolation between those mdev devices involved in
the group.

For mdev devices which are actually isolated from one another, where
they don't poke these p2p holes, placing them in the same iommu group
is definitely an abuse of the interface and is going to lead to
problems with a single iommu context.  But how does libvirt know that
one type of mdev device needs to be grouped while another type doesn't?

There's really not much that I like about using iommu groups in this
way, it's just that they seem to solve this particular problem of
enforcing how such a group can be used and imposing a second form of
grouping onto the vfio infrastructure seems much too complex.

> I'm thinking about your suggestion, but would also like to know your
> thought how sysfs interface would look like? Its still no clear to me.
> Or will it be better to have grouping at mdev layer?

In previous replies I had proposed that a group could be an additional
argument when we write the mdev UUID to the create entry in sysfs.
This is specifically why I listed only the UUID when creating the first
mdev device and UUID:group when creating the second.  The user would
need to go determine the group ID allocated for the first entry to
specify creating the second within that same group.

I have no love for this proposal, it's functional but not elegant and
again leaves libvirt lost in trying to determine which devices need to
be grouped together and which have no business being grouped together.

Let's think through this further and let me make a couple assumptions
to get started:

1) iommu groups are the way that we want to group NVIDIA vGPUs because:
  a) The peer-to-peer resources represent an isolation gap between
 mdev devices, iommu groups represent sets of isolated devices.
  b) The 1:1 mapping of an iommu group to a user matches the NVIDIA
 device model.
  c) iommu_group_for_each_dev() gives the vendor driver the
 functionality it needs to perform a first-open/last-close
 device walk for configuring these p2p resources.

2) iommu groups as used by mdev devices should contain the minimum
number of devices in order to provide the maximum iommu context
flexibility.

Do we agree on these?  The corollary is that NVIDIA is going to suffer
reduced iommu granularity exactly because of the requirement to setup
p2p resources between mdev devices within the same VM.  This has
implications when guest iommus are in play (viommu).

So by default we want an iommu group per mdev.  This works for all mdev
devices as far as we know, including NVIDIA with the constraint that we
only have a single NVIDIA device per VM.

What if we want multiple NVIDIA devices?  We either need to create the
additional devices with a property which will place them into the sa

Re: [Qemu-devel] [PULL 00/66] ppc-for-2.8 queue 20160906

2016-09-06 Thread Thomas Huth

On 06.09.2016 22:55, Thomas Huth wrote:
> On 06.09.2016 22:23, Paolo Bonzini wrote:
>>
>>
>> On 06/09/2016 21:35, Thomas Huth wrote:
>>> D'oh! I'll have a look...
>>>
> I also see warnings on non-KVM hosts during 'make check' which seem
> to be produced by this new test:
>
> TEST: tests/boot-serial-test... (pid=2836)
>   /i386/boot-serial/isapc:
> warning: TCG doesn't support requested feature: CPUID.01H:EDX.vme [bit
> 1]
>>> That happens because the test is running QEMU with the isapc machine. I
>>> also get that warning message if I simply run the following on the
>>> command line:
>>>
>>> $ qemu-system-x86_64 -M isapc
>>> warning: TCG doesn't support requested feature: CPUID.01H:EDX.vme [bit 1]
>>>
>>> So the warning is likely there since quite a while already, just nobody
>>> is running with -M isapc nowadays anymore, so nobody noticed this ...
>>> Does anybody have got a clue how to fix that warning?
>>
>> Implement vme... :)
> 
> Hmmm, I was rather looking for an easier solution. I think I also found
> one: By using "-cpu qemu32", the warning does not occur anymore...

David,

since you likely got to respin this pull request anyway, could you
please squash the following patch into my "tests: Check serial output
of firmware boot of some machines" patch? This silences the warning
about the unsupported vme TCG feature:

diff --git a/tests/boot-serial-test.c b/tests/boot-serial-test.c
index b36c6bf..d98c564 100644
--- a/tests/boot-serial-test.c
+++ b/tests/boot-serial-test.c
@@ -29,10 +29,10 @@ static testdef_t tests[] = {
 { "ppc64", "ppce500", "", "U-Boot" },
 { "ppc64", "prep", "", "Open Hack'Ware BIOS" },
 { "ppc64", "pseries", "", "Open Firmware" },
-{ "i386", "isapc", "-device sga", "SGABIOS" },
+{ "i386", "isapc", "-cpu qemu32 -device sga", "SGABIOS" },
 { "i386", "pc", "-device sga", "SGABIOS" },
 { "i386", "q35", "-device sga", "SGABIOS" },
-{ "x86_64", "isapc", "-device sga", "SGABIOS" },
+{ "x86_64", "isapc", "-cpu qemu32 -device sga", "SGABIOS" },
 { "x86_64", "q35", "-device sga", "SGABIOS" },
 { "s390x", "s390-ccw-virtio",
   "-nodefaults -device sclpconsole,chardev=serial0", "virtio device" },

Thanks,
 Thomas

[Qemu-devel] [PATCH V4] tests/acpi: speedup acpi tests

2016-09-06 Thread Marcel Apfelbaum

Use kvm acceleration if available.
Disable kernel-irqchip and use qemu64 cpu
for both kvm and tcg cases.

Using kvm acceleration saves about a second
and disabling kernel-irqchip has no visible
performance impact.

Acked-by: Michael S. Tsirkin 
Signed-off-by: Marcel Apfelbaum 
---
v3->v4:
  - Add comment to explain why the kernel irqchip is disabled (Michael)

v2->v3:
  - Get rid of -cpu qemu64, is the default anyway (Paolo)
  - Removed a not needed chunk

v1->v2:
  - Disable kernel_irqchip an use qemu64 cpu (Michael)
  - Replace check_kvm function with -machine accel=kvm:tcg (Paolo)
  - Don't pass the machine parameter to test_acpi_one

 tests/bios-tables-test.c | 39 ---
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index de4019e..7e27ea9 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -711,9 +711,12 @@ static void test_acpi_one(const char *params, test_data 
*data)
 {
 char *args;
 
-args = g_strdup_printf("-net none -display none %s "
+/* Disable kernel irqchip to be able to override apic irq0. */
+args = g_strdup_printf("-machine %s,accel=%s,kernel-irqchip=off "
+   "-net none -display none %s "
"-drive id=hd0,if=none,file=%s,format=raw "
"-device ide-hd,drive=hd0 ",
+   data->machine, "kvm:tcg",
params ? params : "", disk);
 
 qtest_start(args);
@@ -758,7 +761,7 @@ static void test_acpi_piix4_tcg(void)
 data.machine = MACHINE_PC;
 data.required_struct_types = base_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types);
-test_acpi_one("-machine accel=tcg", &data);
+test_acpi_one(NULL, &data);
 free_test_data(&data);
 }
 
@@ -771,7 +774,7 @@ static void test_acpi_piix4_tcg_bridge(void)
 data.variant = ".bridge";
 data.required_struct_types = base_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types);
-test_acpi_one("-machine accel=tcg -device pci-bridge,chassis_nr=1", &data);
+test_acpi_one("-device pci-bridge,chassis_nr=1", &data);
 free_test_data(&data);
 }
 
@@ -783,7 +786,7 @@ static void test_acpi_q35_tcg(void)
 data.machine = MACHINE_Q35;
 data.required_struct_types = base_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types);
-test_acpi_one("-machine q35,accel=tcg", &data);
+test_acpi_one(NULL, &data);
 free_test_data(&data);
 }
 
@@ -796,7 +799,7 @@ static void test_acpi_q35_tcg_bridge(void)
 data.variant = ".bridge";
 data.required_struct_types = base_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(base_required_struct_types);
-test_acpi_one("-machine q35,accel=tcg -device pci-bridge,chassis_nr=1",
+test_acpi_one("-device pci-bridge,chassis_nr=1",
   &data);
 free_test_data(&data);
 }
@@ -808,8 +811,7 @@ static void test_acpi_piix4_tcg_cphp(void)
 memset(&data, 0, sizeof(data));
 data.machine = MACHINE_PC;
 data.variant = ".cphp";
-test_acpi_one("-machine accel=tcg"
-  " -smp 2,cores=3,sockets=2,maxcpus=6",
+test_acpi_one("-smp 2,cores=3,sockets=2,maxcpus=6",
   &data);
 free_test_data(&data);
 }
@@ -821,8 +823,7 @@ static void test_acpi_q35_tcg_cphp(void)
 memset(&data, 0, sizeof(data));
 data.machine = MACHINE_Q35;
 data.variant = ".cphp";
-test_acpi_one("-machine q35,accel=tcg"
-  " -smp 2,cores=3,sockets=2,maxcpus=6",
+test_acpi_one(" -smp 2,cores=3,sockets=2,maxcpus=6",
   &data);
 free_test_data(&data);
 }
@@ -840,7 +841,7 @@ static void test_acpi_q35_tcg_ipmi(void)
 data.variant = ".ipmibt";
 data.required_struct_types = ipmi_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(ipmi_required_struct_types);
-test_acpi_one("-machine q35,accel=tcg -device ipmi-bmc-sim,id=bmc0"
+test_acpi_one("-device ipmi-bmc-sim,id=bmc0"
   " -device isa-ipmi-bt,bmc=bmc0",
   &data);
 free_test_data(&data);
@@ -858,7 +859,7 @@ static void test_acpi_piix4_tcg_ipmi(void)
 data.variant = ".ipmikcs";
 data.required_struct_types = ipmi_required_struct_types;
 data.required_struct_types_len = ARRAY_SIZE(ipmi_required_struct_types);
-test_acpi_one("-machine accel=tcg -device ipmi-bmc-sim,id=bmc0"
+test_acpi_one("-device ipmi-bmc-sim,id=bmc0"
   " -device isa-ipmi-kcs,irq=0,bmc=bmc0",
   &data);
 free_test_data(&data);
@@ -876,14 +877,14 @@ int main(int argc, char *argv[])
 g_test_init(&argc, &argv, NULL);
 
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
-qtest_add_func("acpi/piix4/tcg", test_acpi_piix4_tcg);
-

1 2 3 4 5 >

1 - 100 of 432 matches

Mail list logo