date:20140818

Re: [Qemu-devel] [PATCH] net: Forbid dealing with packets when VM is not running

2014-08-18 Thread zhanghailiang


On 2014/8/18 20:27, Dr. David Alan Gilbert wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

For all NICs(except virtio-net) emulated by qemu,
Such as e1000, rtl8139, pcnet and ne2k_pci,
Qemu can still receive packets when VM is not running.
If this happened in *migration's* last PAUSE VM stage,
The new dirty RAM related to the packets will be missed,
And this will lead serious network fault in VM.


I'd like to understand more about when exactly this happens.
migration.c:migration_thread  in the last step, takes the iothread, puts
the runstate into RUN_STATE_FINISH_MIGRATE and only then calls 
qemu_savevm_state_complete
before unlocking the iothread.



Hmm, Sorry, the description above may not be exact.

Actually, the action of receiving packets action happens after the
migration thread unlock the iothread-lock(when VM is stopped), but
before the end of the migration(to be exact, before close the socket
fd, which is used to send data to destination).

I think before close the socket fd, we can not assure all the RAM of the
VM has been copied to the memory of network card or has been send to
the destination, we still have the chance to modify its content. (If i
am wrong, please let me know, Thanks. ;) )

If the above situation happens, it will dirty parts of the RAM which
will be send and parts of new dirty RAM page may be missed. As a result,
The contents of memory in the destination is not integral, And this
will lead network fault in VM.

Here i have made a test:
(1) Download the new qemu.
(2) Extend the time window between qemu_savevm_state_complete and
migrate_fd_cleanup(where qemu_fclose(s->file) will be called), patch
like(this will extend the chances of receiving packets between the time
window):
diff --git a/migration.c b/migration.c
index 8d675b3..597abf9 100644
--- a/migration.c
+++ b/migration.c
@@ -614,7 +614,7 @@ static void *migration_thread(void *opaque)
 qemu_savevm_state_complete(s->file);
 }
 qemu_mutex_unlock_iothread();
-
+sleep(2);/*extend the time window between stop vm and 
end migration*/

 if (ret < 0) {
 migrate_set_state(s, MIG_STATE_ACTIVE, 
MIG_STATE_ERROR);

 break;
(3) Start VM (suse11sp1) and in VM ping xxx -i 0.1
(4) Migrate the VM to another host.

And the *PING* command in VM will very likely fail with message:
'Destination HOST Unreachable', the NIC in VM will stay unavailable
unless you run 'service network restart'.(without step 2, you may have
to migration lots of times between two hosts before that happens).

Thanks,
zhanghailiang


If that's true, how does this problem happen - can the net code
still receive packets with the iothread lock taken?
qemu_savevm_state_complete does a call to the RAM code, so I think this should 
get
any RAM changes that happened before the lock was taken.

I'm gently worried about threading stuff as well - is all this net code
running off fd handlers on the main thread or are there separate threads?

Dave


To avoid this, we forbid receiving packets in generic net code when
VM is not running. Also, when the runstate changes back to running,
we definitely need to flush queues to get packets flowing again.

Here we implement this in the net layer:
(1) Judge the vm runstate in qemu_can_send_packet
(2) Add a member 'VMChangeStateEntry *vmstate' to struct NICState,
Which will listen for VM runstate changes.
(3) Register a handler function for VMstate change.
When vm changes back to running, we flush all queues in the callback function.
(4) Remove checking vm state in virtio_net_can_receive

Signed-off-by: zhanghailiang
---
  hw/net/virtio-net.c |  4 
  include/net/net.h   |  2 ++
  net/net.c   | 32 
  3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 268eff9..287d762 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -839,10 +839,6 @@ static int virtio_net_can_receive(NetClientState *nc)
  VirtIODevice *vdev = VIRTIO_DEVICE(n);
  VirtIONetQueue *q = virtio_net_get_subqueue(nc);

-if (!vdev->vm_running) {
-return 0;
-}
-
  if (nc->queue_index>= n->curr_queues) {
  return 0;
  }
diff --git a/include/net/net.h b/include/net/net.h
index ed594f9..a294277 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -8,6 +8,7 @@
  #include "net/queue.h"
  #include "migration/vmstate.h"
  #include "qapi-types.h"
+#include "sysemu/sysemu.h"

  #define MAX_QUEUE_NUM 1024

@@ -96,6 +97,7 @@ typedef struct NICState {
  NICConf *conf;
  void *opaque;
  bool peer_deleted;
+VMChangeStateEntry *vmstate;
  } NICState;

  NetClientState *qemu_find_netdev(const char *id);
diff --git a/net/net.c b/net/net.c
index 6d930ea..21f0d48 100644
--- a/net/net.c
+++ b/net/net.c
@@ -242,6 +242,29 @@ NetClientState *qemu_new_net_client(NetClientInfo *info,
  retur

Re: [Qemu-devel] [Qemu-trivial] [PATCH v7] block/vvfat: assert return value of fopen which may fail

2014-08-18 Thread zhanghailiang


On 2014/8/18 19:42, Michael Tokarev wrote:

18.08.2014 12:06, Peter Maydell wrote:

On 18 August 2014 09:00, zhanghailiang  wrote:

From: Li Liu

fopen() may return NULL which will cause setbuf() segmentfault

Signed-off-by: zhanghailiang
Signed-off-by: Li Liu
---
  block/vvfat.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/block/vvfat.c b/block/vvfat.c
index 70176b1..62023e1 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1084,6 +1084,7 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,

  DLOG(if (stderr == NULL) {
  stderr = fopen("vvfat.log", "a");
+assert(stderr);
  setbuf(stderr, NULL);
  })


An assertion is no better than a segfault.

Better I think would be to just remove this whole lump of code
entirely. Lots of other files do debug printing to stderr without
attempting to open a file if stderr happens to be NULL, why
should vvfat.c be special?


Indeed.  I've applied a patch which just removes these 6 lines of code
(and sent it to the list too, for review).

Thanks,

/mjt

.


OK, thanks.:)

Re: [Qemu-devel] [Question] Why doesn't PCIe hotplug work for Q35 machine?

2014-08-18 Thread Gonglei (Arei)

> >> Subject: Re: [Question] Why doesn't PCIe hotplug work for Q35 machine?
> >>
> >> On Sun, 2014-08-17 at 13:00 +0200, Michael S. Tsirkin wrote:
> >>> On Fri, Aug 15, 2014 at 07:33:29AM +, Gonglei (Arei) wrote:
>  Hi,
> 
>  I noticed that the qemu-2.1 release change log says
>  " PCIe: Basic hot-plug/hot-unplug support for Q35 machine."
>  And then I made a testing for the hotplugging function of Q35.
>  But I'm failed, and I got the dmesg log in guest os as below:
> 
>  [ 159.035250] Pciehp :05:00.0:pcie24: Button pressed on Slot (0 - 4)
>  [ 159.035274] Pciehp :05:00.0:pcie24: Card present on Slot (0 - 4)
>  [ 159.036517] Pciehp :05:00.0:pcie24: PCI slot #0 - 4 - powering on
> due
> >> to button press.
>  [ 159.188049] Pciehp :05:00.0:pcie24: Failed to check link status
>  [ 159.201968] Pciehp :05:00.0:pcie24: Card not present on Slot (0 - 
>  4)
>  [ 159.202529] Pciehp :05:00.0:pcie24: Already disabled on Slot (0 - 
>  4)
> 
>  Steps of testing:
> 
>  #1. QEMU version:
> 
>   The lateset master tree source.
> 
>  #2. Command line:
> 
>  ./qemu-system-x86_64 -enable-kvm -m 2048 -machine q35 -device
> >> ide-drive,bus=ide.2,drive=MacHDD \
>   -drive
> id=MacHDD,if=none,file=/mnt/sdb/gonglei/image/redhat_q35.img
> >> -monitor stdio -vnc :10 -readconfig ../docs/q35-chipset.cfg
>  QEMU 2.0.93 monitor - type 'help' for more information
>  (qemu) device_add
> >> virtio-net-pci,id=nic2,bus=pcie-switch-downstream-port-1-1,addr=1.0
> >>>
> >>> I don't think you can use any slot except slot 0 for pci express.
> >
> > OK. Does the PCIe specification say that?
> > I appreciate very much that you explain more.
> 
> The closest I could find is in "7.3. Configuration Transaction
> Rules"/"7.3.1. Device Number":
> 
> With non-ARI Devices, PCI Express components are restricted to
> implementing a single Device Number on their primary interface (Upstream
> Port) [...] Downstream Ports that do not have ARI Forwarding enabled
> must associate only Device 0 with the device attached to the Logical Bus
> representing the Link from the Port. Configuration Requests
> targeting the Bus Number associated with a Link specifying Device Number
> 0 are delivered to the device attached to the Link; Configuration
> Requests specifying all other Device Numbers (1-31)
> must be terminated by the Switch Downstream Port or the Root Port with
> an Unsupported Request Completion Status (equivalent to Master Abort in
> PCI).
> 
Thanks a lot, Paolo.

And I found another issue when cold-plugging don't using slot 0, the PCIe 
device also can't
be searched in guest os. 

So, I have some questions and ideas:

1. Does qemu support ARI Forwarding for PCIe at present? If yes, how to enable 
it ?
2. If not, we should add some check for PCIe root ports and downstream ports,
 meanwhile add explaining document.
3. Those check should add in general code level, both hotplug and coldplug.

Am I right?  Thanks.

Best regards,
-Gonglei

[Qemu-devel] [PATCH V3] spapr: Fix stale HTAB during live migration

2014-08-18 Thread Samuel Mendoza-Jonas

If a guest reboots during a running migration, changes to the
hash page table are not necessarily updated on the destination.
Opening a new file descriptor to the HTAB forces the migration
handler to resend the entire table.

Signed-off-by: Samuel Mendoza-Jonas 
---
Changes in v3: Pointed out by David, htab_save_iterate could
potentially try to read before htab_fd is open again.
Leave opening the fd to the functions trying to read.
Changes in v2: Forgot check on kvmppc_get_htab_fd return value
 hw/ppc/spapr.c | 25 +
 include/hw/ppc/spapr.h |  1 +
 2 files changed, 26 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 3a6d26d..5b41318 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -997,6 +997,10 @@ static void spapr_reset_htab(sPAPREnvironment *spapr)
 /* Kernel handles htab, we don't need to allocate one */
 spapr->htab_shift = shift;
 kvmppc_kern_htab = true;
+
+/* Check if we are overlapping a migration */
+if (spapr->htab_fd > 0)
+spapr->need_reset = true;
 } else {
 if (!spapr->htab) {
 /* Allocate an htab if we don't yet have one */
@@ -1156,6 +1160,7 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
 } else {
 assert(kvm_enabled());
 
+spapr->need_reset = false;
 spapr->htab_fd = kvmppc_get_htab_fd(false);
 if (spapr->htab_fd < 0) {
 fprintf(stderr, "Unable to open fd for reading hash table from 
KVM: %s\n",
@@ -1309,6 +1314,16 @@ static int htab_save_iterate(QEMUFile *f, void *opaque)
 if (!spapr->htab) {
 assert(kvm_enabled());
 
+if (atomic_cmpxchg(&spapr->need_reset, true, false) == true) {
+close(spapr->htab_fd);
+spapr->htab_fd = kvmppc_get_htab_fd(false);
+if (spapr->htab_fd < 0) {
+fprintf(stderr, "Unable to open fd for reading hash table from 
KVM: %s\n",
+strerror(errno));
+return -1;
+}
+}
+
 rc = kvmppc_save_htab(f, spapr->htab_fd,
   MAX_KVM_BUF_SIZE, MAX_ITERATION_NS);
 if (rc < 0) {
@@ -1340,6 +1355,16 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
 
 assert(kvm_enabled());
 
+if (atomic_cmpxchg(&spapr->need_reset, true, false) == true) {
+close(spapr->htab_fd);
+spapr->htab_fd = kvmppc_get_htab_fd(false);
+if (spapr->htab_fd < 0) {
+fprintf(stderr, "Unable to open fd for reading hash table from 
KVM: %s\n",
+strerror(errno));
+return -1;
+}
+}
+
 rc = kvmppc_save_htab(f, spapr->htab_fd, MAX_KVM_BUF_SIZE, -1);
 if (rc < 0) {
 return rc;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 0c2e3c5..9ab9827 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -71,6 +71,7 @@ typedef struct sPAPREnvironment {
 int htab_save_index;
 bool htab_first_pass;
 int htab_fd;
+bool need_reset;
 
 /* state for Dynamic Reconfiguration Connectors */
 sPAPRDrcEntry drc_table[SPAPR_DRC_TABLE_SIZE];
-- 
1.9.3

Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)

2014-08-18 Thread Paolo Bonzini

Il 18/08/2014 22:50, Hulin, Patrick - 0559 - MITLL ha scritto:
>> >Correct. Doesn¹t work. Haven¹t fully diagnosed why, but it doesn¹t seem
>> >to ever hit the current_tb_modified passage if you invalidate beforehand.
> Yeah - mem_io_pc doesn¹t get updated until we¹re inside io_write, so
> tb_invalidate_phys_page_range thinks we¹re inside a different TB. As a
> result, it¹s ³is this TB modified² check still returns false.

We can set that (and probably mem_io_vaddr) before the for loop, too.

Paolo

Re: [Qemu-devel] [PATCH v3 0/5] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset

2014-08-18 Thread Jan Kiszka

On 2014-08-19 06:08, Knut Omang wrote:
>> Are you depending on interrupt remapping? If not, my patches are a bit
>> hacky and may cause their own issues if you are unlucky.
>  
> It does not depend directly but interprets a NULL PciDevice pointer as
> the special bus number (0xff) for non-pci devices, so I have tried to

Yeah, this won't map nicely on future chipsets - unless we redefine
Q35_PSEUDO_XXX as VTD_PSEUDO_XXX, or so, and use the same values for all
of them. Or create an unconnected pseudo PCI bus for the chipset devices
that carry the pseudo bus number.

> take heights for that - I can rebase if so desired, but I would like to
> see the interrupt remapping emerge as well ;-)

Good, then we are already two ;). Let's get the baseline ready, then we
can discuss extensions like interrupt remapping.

Jan

signature.asc
Description: OpenPGP digital signature

[Qemu-devel] [PATCH V3] vhost_net: start/stop guest notifiers properly

2014-08-18 Thread Jason Wang

commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
support changed the order of stopping the device. Previously
vhost_dev_stop would disable backend and only afterwards, unset guest
notifiers. We now unset guest notifiers while vhost is still
active. This can lose interrupts causing guest networking to fail. In
particular, this has been observed during migration.

To adapt this, several other changes are needed:
- remove the hdev->started assertion in vhost.c since we may want to
start the guest notifiers before vhost starts and stop the guest
notifiers after vhost is stopped.
- introduce the vhost_net_set_vq_index() and call it before setting
guest notifiers. This is used to guarantee vhost_net has the correct
virtqueue index when setting guest notifiers.

Reported-by: "Zhangjie (HZ)" 
Cc: William Dauchy 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 

--
Changes from Michael's patch:
- Remove the assertion
Changes from V1:
- Rebase to latest
Changes from V2:
- Introduce vhost_net_set_vq_index() to unbreak multiqueue

Zhang Jie, please test this patch to see if it fixes the issue.
---
 hw/net/vhost_net.c | 31 +++
 hw/virtio/vhost.c  |  2 --
 2 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index f87c798..fe0203d 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -188,9 +188,13 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
 return vhost_dev_query(&net->dev, dev);
 }
 
+static void vhost_net_set_vq_index(struct vhost_net *net, int vq_index)
+{
+net->dev.vq_index = vq_index;
+}
+
 static int vhost_net_start_one(struct vhost_net *net,
-   VirtIODevice *dev,
-   int vq_index)
+   VirtIODevice *dev)
 {
 struct vhost_vring_file file = { };
 int r;
@@ -201,7 +205,6 @@ static int vhost_net_start_one(struct vhost_net *net,
 
 net->dev.nvqs = 2;
 net->dev.vqs = net->vqs;
-net->dev.vq_index = vq_index;
 
 r = vhost_dev_enable_notifiers(&net->dev, dev);
 if (r < 0) {
@@ -309,11 +312,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 }
 
 for (i = 0; i < total_queues; i++) {
-r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
-
-if (r < 0) {
-goto err;
-}
+vhost_net_set_vq_index(get_vhost_net(ncs[i].peer), i * 2);
 }
 
 r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
@@ -322,6 +321,14 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 goto err;
 }
 
+for (i = 0; i < total_queues; i++) {
+r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev);
+
+if (r < 0) {
+goto err;
+}
+}
+
 return 0;
 
 err:
@@ -339,16 +346,16 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState 
*ncs,
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
 int i, r;
 
+for (i = 0; i < total_queues; i++) {
+vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
+}
+
 r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
 if (r < 0) {
 fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
 fflush(stderr);
 }
 assert(r >= 0);
-
-for (i = 0; i < total_queues; i++) {
-vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
-}
 }
 
 void vhost_net_cleanup(struct vhost_net *net)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e55fe1c..5d7c40a 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -976,7 +976,6 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
VirtIODevice *vdev)
 bool vhost_virtqueue_pending(struct vhost_dev *hdev, int n)
 {
 struct vhost_virtqueue *vq = hdev->vqs + n - hdev->vq_index;
-assert(hdev->started);
 assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs);
 return event_notifier_test_and_clear(&vq->masked_notifier);
 }
@@ -988,7 +987,6 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, 
VirtIODevice *vdev, int n,
 struct VirtQueue *vvq = virtio_get_queue(vdev, n);
 int r, index = n - hdev->vq_index;
 
-assert(hdev->started);
 assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs);
 
 struct vhost_vring_file file = {
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v3 0/5] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset

2014-08-18 Thread Knut Omang

On Tue, 2014-08-19 at 06:08 +0200, Knut Omang wrote:
> On Mon, 2014-08-18 at 20:50 +0200, Jan Kiszka wrote:
> > On 2014-08-18 18:34, Knut Omang wrote:
> > > On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote:
> > >> On 2014-08-16 10:45, Jan Kiszka wrote:
> > >>> On 2014-08-16 09:54, Knut Omang wrote:
> >  On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> > > Hi Knut,
> > >
> > > 2014-08-15 19:15 GMT+08:00 Knut Omang :
> > >> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
> > >>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
> >  On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> > > On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
> > >> Hi,
> > >>
> > >> These patches are intended to introduce Intel IOMMU (VT-d) 
> > >> emulation to q35
> > >> chipset. The major job in these patches is to add support for 
> > >> emulating Intel
> > >> IOMMU according to the VT-d specification, including basic 
> > >> responses to CSRs
> > >> accesses, the logics of DMAR (DMA remapping) and DMA memory 
> > >> address
> > >> translations.
> > >
> > > Thanks!
> > > Looks very good overall, I noted some coding style issues - I 
> > > didn't
> > > bother reporting each issue in every place where it appears - 
> > > reported
> > > each issue once only, so please find and fix all instances of each
> > > issue.
> > 
> >  BTW, because I was in urgent need for virtual test environment for
> >  Jailhouse, I hacked interrupt remapping on top of Le's patches:
> > 
> >  http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
> > 
> >  The approach likely needs further discussions and refinements but 
> >  it
> >  already allows me to work on top with our hypervisor, and also 
> >  Linux.
> >  You can see from the last commit that Le's work made it pretty 
> >  easy to
> >  build this on top.
> > >>>
> > >>> Le,
> > >>>
> > >>> I have tried Jan's branch with my device setup which consists of a
> > >>> minimal q35 setup, an ioh3420 root port (specified as -device
> > >>> ioh3420,slot=0 ) and a pcie device plugged into that root port, 
> > >>> which
> > >>> gives the following lscpi -t:
> > >>>
> > >>> -[:00]-+-00.0
> > >>>+-01.0
> > >>>+-02.0
> > >>>+-03.0-[01]00.0
> > >>>+-04.0
> > >>>+-1f.0
> > >>>+-1f.2
> > >>>\-1f.3
> > >>>
> > >>> All seems to work beautifully (I see the ISA bridge happily receive
> > >>> translations) until the first DMA from my device model (at 1:00.0)
> > >>> arrives, at which point I get:
> > >>>
> > >>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] 
> > >>> fault addr fffa
> > >>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry 
> > >>> is clear
> > >>>
> > >>> I would have expected request device 01:00.0 for this.
> > >>> It is not clear to me yet if this is a weakness of the 
> > >>> implementation of
> > >>> ioh3420 or the iommu. Just wanted to let you know right away in 
> > >>> case you
> > >>> can shed some light to it or it is an easy fix,
> > >>>
> > >>> The device uses pci_dma_rw with itself as device pointer.
> > >>
> > >> To verify my hypothesis: with this rude hack my device now works much
> > >> better:
> > >>
> > >> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace 
> > >> *vtd_as,
> > >> int bus_num, int devfn,
> > >>  is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> > >>  } else {
> > >>  ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
> > >> +if (ret_fr)
> > >> +ret_fr = dev_to_context_entry(s, 1, 0, &ce);
> > >>  is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> > >>  if (ret_fr) {
> > >>  ret_fr = -ret_fr;
> > >>
> > >> Looking at how things look on hardware, multiple devices often 
> > >> receive
> > >> overlapping DMA address ranges for different physical addresses.
> > >>
> > >> So if I understand the way this works, every requester ID would also
> > >> need to have it's own unique VTDAddressSpace, as each pci
> > >> device/function sees a unique DMA address space..
> > >
> > > ioh3420 is a pcie-to-pcie bridge, right? 
> > 
> >  Yes.
> > 
> > > In my opinion, each pci-e
> > > device behind the pcie-to-pcie bridge can be assigned individually.
> > > For now I added the VT-d to q35 by just adding it to the root pci bus.
> > > You can see here in q35.c:
> > > pci_setup_iommu(pci_bus,

Re: [Qemu-devel] [PATCH v3 0/5] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset

2014-08-18 Thread Knut Omang

On Mon, 2014-08-18 at 20:50 +0200, Jan Kiszka wrote:
> On 2014-08-18 18:34, Knut Omang wrote:
> > On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote:
> >> On 2014-08-16 10:45, Jan Kiszka wrote:
> >>> On 2014-08-16 09:54, Knut Omang wrote:
>  On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> > Hi Knut,
> >
> > 2014-08-15 19:15 GMT+08:00 Knut Omang :
> >> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
> >>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
>  On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> > On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
> >> Hi,
> >>
> >> These patches are intended to introduce Intel IOMMU (VT-d) 
> >> emulation to q35
> >> chipset. The major job in these patches is to add support for 
> >> emulating Intel
> >> IOMMU according to the VT-d specification, including basic 
> >> responses to CSRs
> >> accesses, the logics of DMAR (DMA remapping) and DMA memory address
> >> translations.
> >
> > Thanks!
> > Looks very good overall, I noted some coding style issues - I didn't
> > bother reporting each issue in every place where it appears - 
> > reported
> > each issue once only, so please find and fix all instances of each
> > issue.
> 
>  BTW, because I was in urgent need for virtual test environment for
>  Jailhouse, I hacked interrupt remapping on top of Le's patches:
> 
>  http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap
> 
>  The approach likely needs further discussions and refinements but it
>  already allows me to work on top with our hypervisor, and also Linux.
>  You can see from the last commit that Le's work made it pretty easy 
>  to
>  build this on top.
> >>>
> >>> Le,
> >>>
> >>> I have tried Jan's branch with my device setup which consists of a
> >>> minimal q35 setup, an ioh3420 root port (specified as -device
> >>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
> >>> gives the following lscpi -t:
> >>>
> >>> -[:00]-+-00.0
> >>>+-01.0
> >>>+-02.0
> >>>+-03.0-[01]00.0
> >>>+-04.0
> >>>+-1f.0
> >>>+-1f.2
> >>>\-1f.3
> >>>
> >>> All seems to work beautifully (I see the ISA bridge happily receive
> >>> translations) until the first DMA from my device model (at 1:00.0)
> >>> arrives, at which point I get:
> >>>
> >>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault 
> >>> addr fffa
> >>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is 
> >>> clear
> >>>
> >>> I would have expected request device 01:00.0 for this.
> >>> It is not clear to me yet if this is a weakness of the implementation 
> >>> of
> >>> ioh3420 or the iommu. Just wanted to let you know right away in case 
> >>> you
> >>> can shed some light to it or it is an easy fix,
> >>>
> >>> The device uses pci_dma_rw with itself as device pointer.
> >>
> >> To verify my hypothesis: with this rude hack my device now works much
> >> better:
> >>
> >> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace 
> >> *vtd_as,
> >> int bus_num, int devfn,
> >>  is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >>  } else {
> >>  ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
> >> +if (ret_fr)
> >> +ret_fr = dev_to_context_entry(s, 1, 0, &ce);
> >>  is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >>  if (ret_fr) {
> >>  ret_fr = -ret_fr;
> >>
> >> Looking at how things look on hardware, multiple devices often receive
> >> overlapping DMA address ranges for different physical addresses.
> >>
> >> So if I understand the way this works, every requester ID would also
> >> need to have it's own unique VTDAddressSpace, as each pci
> >> device/function sees a unique DMA address space..
> >
> > ioh3420 is a pcie-to-pcie bridge, right? 
> 
>  Yes.
> 
> > In my opinion, each pci-e
> > device behind the pcie-to-pcie bridge can be assigned individually.
> > For now I added the VT-d to q35 by just adding it to the root pci bus.
> > You can see here in q35.c:
> > pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
> > So if we add a pcie-to-pcie bridge, we may have to call the
> > pci_setup_iommu() for that new bus. I don't know where to hook into
> > this now. :) If you know the mechanism behind that, you can try to add
> > that for the new bus. (I will dive into this after the clean up.)
> > What do you t

Re: [Qemu-devel] [PATCH] target-arm: Fix resetting issues on ARMv7-M CPUs

2014-08-18 Thread Martin Galvan

Ping http://patchwork.ozlabs.org/patch/379134/


On Mon, Aug 11, 2014 at 1:50 PM, Martin Galvan <
martin.gal...@tallertechnologies.com> wrote:

> When calling qemu_system_reset after startup on a Cortex-M CPU, the
> initial values of PC, MSP and the Thumb bit weren't set correctly. In
> particular, since Thumb was 0, an Usage Fault would arise immediately after
> trying to excecute any instruction on a Cortex-M.
>
> Signed-off-by: Martin Galvan 
> ---
>  target-arm/cpu.c | 19 ++-
>  target-arm/cpu.h |  4 
>  2 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/target-arm/cpu.c b/target-arm/cpu.c
> index 7cebb76..d436b59 100644
> --- a/target-arm/cpu.c
> +++ b/target-arm/cpu.c
> @@ -131,7 +131,6 @@ static void arm_cpu_reset(CPUState *s)
>  /* On ARMv7-M the CPSR_I is the value of the PRIMASK register, and is
> clear at reset.  Initial SP and PC are loaded from ROM.  */
>  if (IS_M(env)) {
> -uint32_t pc;
>  uint8_t *rom;
>  env->daif &= ~PSTATE_I;
>  rom = rom_ptr(0);
> @@ -140,11 +139,21 @@ static void arm_cpu_reset(CPUState *s)
> modified flash and reset itself.  However images
> loaded via -kernel have not been copied yet, so load the
> values directly from there.  */
> -env->regs[13] = ldl_p(rom) & 0xFFFC;
> -pc = ldl_p(rom + 4);
> -env->thumb = pc & 1;
> -env->regs[15] = pc & ~1;
> +env->initial_MSP = ldl_p(rom) & 0xFFFC;
> +env->initial_PC = ldl_p(rom + 4);
> +env->initial_PC &= ~1;
>  }
> +
> +/* If we do a system reset, rom will be NULL since its data
> +was zeroed when calling cpu_flush_icache_range at startup. Set
> +the initial registers here using the values we loaded from ROM
> +at startup. */
> +env->regs[13] = env->initial_MSP;
> +env->regs[15] = env->initial_PC;
> +
> +/* ARMv7-M only supports Thumb instructions. If this isn't
> +   set we'll get an Usage Fault. */
> +env->thumb = 1;
>  }
>
>  if (env->cp15.c1_sys & SCTLR_V) {
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index 79205ba..a56aebd 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -330,6 +330,10 @@ typedef struct CPUARMState {
>
>  void *nvic;
>  const struct arm_boot_info *boot_info;
> +
> +/* Initial MSP and PC for ARMv7-M CPUs */
> +uint32_t initial_MSP; /* Stored in 0x0 inside the guest ROM */
> +uint32_t initial_PC; /* Stored in 0x4 inside the guest ROM */
>  } CPUARMState;
>
>  #include "cpu-qom.h"
> --
> 1.9.1
>



-- 

[image: http://www.tallertechnologies.com]


Martín Galván

Software Engineer

Taller Technologies Argentina

San Lorenzo 47, 3rd Floor, Office 5

Córdoba, Argentina
Phone: 54 351 4217888 / +54 351 4218211

[image: http://www.linkedin.com/company/taller-technologies]
[image:
https://www.facebook.com/tallertechnologies]

Re: [Qemu-devel] [PATCH V2] vhost_net: start/stop guest notifiers properly

2014-08-18 Thread Jason Wang

On 08/19/2014 11:02 AM, Jason Wang wrote:
> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
> support changed the order of stopping the device. Previously
> vhost_dev_stop would disable backend and only afterwards, unset guest
> notifiers. We now unset guest notifiers while vhost is still
> active. This can lose interrupts causing guest networking to fail.
>
> Additionally, remove the hdev->started assert in vhost.c since we may
> want to start the guest notifiers before vhost starts and stop the
> guest notifiers after vhost is stopped.
>
> In particular, this has been observed during migration.
>
> Reported-by: "Zhangjie (HZ)" 
> Cc: William Dauchy 
> Signed-off-by: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 
>
> --

Looks like the patch breaks multiqueue test. I will post V3.

Re: [Qemu-devel] vhost-net issue with multiples interfaces using MQ

2014-08-18 Thread Jason Wang

On 08/18/2014 07:42 PM, William Dauchy wrote:
> Hello,
>
> Using qemu2.1.0, a linux v3.14.X x86_64 as host and a linux v3.12.X x86_64 as 
> guest
> I'm starting a VM with these network interfaces:
>
> [netdev "vifA.0"]
>   type = "tap"
>   vhost = "on"
>   ifname = "vifA.0"
>   downscript = "no"
>   script = "no"
>   queues = "2"
>
> [device "vifA.0"]
>   driver = "virtio-net-pci"
>   netdev = "vifA.0"
>   mac = "00:16:3e:1a:4e:11"
>   mq = "on"
>   vectors = "5"
>
> (and same config for two other interfaces but with a different MAC address)
>
> In the error message I'm getting during startup:
> unable to start vhost net: 95: falling back on userspace virtio

What's the qemu command line for your testing? I try simple command line
with 3 mq cards in qemu 2.1. Everything works fine.
> I saw that it's failing in `vhost_dev_start` where `vhost_dev_set_features` 
> is returning an error.
> r = vhost_dev_set_features(hdev, hdev->log_enabled);
> I'm hitting the issue starting from three interfaces, i.e no problem with one 
> or two.
> I also don't have the issue with disabling multi queue.
>
> Am I wrong on something? What can I do to help debug this situation?
>
> Regards,

Is this a regression? If yes, you can probably bisect to find the first
bad commit.

Thanks

[Qemu-devel] [PATCH V2] vhost_net: start/stop guest notifiers properly

2014-08-18 Thread Jason Wang

commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
support changed the order of stopping the device. Previously
vhost_dev_stop would disable backend and only afterwards, unset guest
notifiers. We now unset guest notifiers while vhost is still
active. This can lose interrupts causing guest networking to fail.

Additionally, remove the hdev->started assert in vhost.c since we may
want to start the guest notifiers before vhost starts and stop the
guest notifiers after vhost is stopped.

In particular, this has been observed during migration.

Reported-by: "Zhangjie (HZ)" 
Cc: William Dauchy 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 

--

Zhang Jie, please test this patch to see if it fixes the issue.
---
 hw/net/vhost_net.c | 20 ++--
 hw/virtio/vhost.c  |  2 --
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index f87c798..41d5158 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -308,6 +308,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 goto err;
 }
 
+r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
+if (r < 0) {
+error_report("Error binding guest notifier: %d", -r);
+goto err;
+}
+
 for (i = 0; i < total_queues; i++) {
 r = vhost_net_start_one(get_vhost_net(ncs[i].peer), dev, i * 2);
 
@@ -316,12 +322,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 }
 }
 
-r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
-if (r < 0) {
-error_report("Error binding guest notifier: %d", -r);
-goto err;
-}
-
 return 0;
 
 err:
@@ -339,16 +339,16 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState 
*ncs,
 VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
 int i, r;
 
+for (i = 0; i < total_queues; i++) {
+vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
+}
+
 r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
 if (r < 0) {
 fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
 fflush(stderr);
 }
 assert(r >= 0);
-
-for (i = 0; i < total_queues; i++) {
-vhost_net_stop_one(get_vhost_net(ncs[i].peer), dev);
-}
 }
 
 void vhost_net_cleanup(struct vhost_net *net)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e55fe1c..5d7c40a 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -976,7 +976,6 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
VirtIODevice *vdev)
 bool vhost_virtqueue_pending(struct vhost_dev *hdev, int n)
 {
 struct vhost_virtqueue *vq = hdev->vqs + n - hdev->vq_index;
-assert(hdev->started);
 assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs);
 return event_notifier_test_and_clear(&vq->masked_notifier);
 }
@@ -988,7 +987,6 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, 
VirtIODevice *vdev, int n,
 struct VirtQueue *vvq = virtio_get_queue(vdev, n);
 int r, index = n - hdev->vq_index;
 
-assert(hdev->started);
 assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs);
 
 struct vhost_vring_file file = {
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH] vhost_net: start/stop guest notifiers properly

2014-08-18 Thread Jason Wang

On 08/19/2014 03:53 AM, Michael S. Tsirkin wrote:
> On Mon, Aug 18, 2014 at 05:51:31PM +0800, Jason Wang wrote:
>> > commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
>> > support changed the order of stopping the device. Previously
>> > vhost_dev_stop would disable backend and only afterwards, unset guest
>> > notifiers. We now unset guest notifiers while vhost is still
>> > active. This can lose interrupts causing guest networking to fail.
>> > 
>> > Additionally, remove the hdev->started assert in vhost.c since we may
>> > want to start the guest notifiers before vhost starts and stop the
>> > guest notifiers after vhost is stopped.
>> > 
>> > In particular, this has been observed during migration.
>> > 
>> > Reported-by: "Zhangjie (HZ)" 
>> > Signed-off-by: Michael S. Tsirkin 
>> > Signed-off-by: Jason Wang 
> This doesn't seem to apply to master.
> Can you rebase please?

Yes, will send a new version.

Re: [Qemu-devel] [PATCH] vhost_net: start/stop guest notifiers properly

2014-08-18 Thread Jason Wang

On 08/18/2014 09:20 PM, William Dauchy wrote:
> On Mon, Aug 18, 2014 at 11:51 AM, Jason Wang  wrote:
>>  err:
>> @@ -254,16 +254,16 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState 
>> *ncs,
>>  VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>  int i, r;
>>
>> +for (i = 0; i < total_queues; i++) {
>> +vhost_net_stop_one(tap_get_vhost_net(ncs[i].peer), dev);
>> +}
>> +
>>  r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
>>  if (r < 0) {
>>  fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
>>  fflush(stderr);
>>  }
>>  assert(r >= 0);
>> -
>> -for (i = 0; i < total_queues; i++) {
>> -vhost_net_stop_one(tap_get_vhost_net(ncs[i].peer), dev);
>> -}
>>  }
> since
> ed8b4af Refactor virtio-net to use generic get_vhost_net
> get_vhost_net is used instead of tap_get_vhost_net
>
> Could you rebase your patch to facilitate tests or is it intentional?

Not intentional, my tree is out of date. I will rebase the patch.

Thanks for pointing this out.
>
> Thanks,

Re: [Qemu-devel] [PATCH] vhost_net: start/stop guest notifiers properly

2014-08-18 Thread Jason Wang

On 08/18/2014 08:11 PM, Zhangjie (HZ) wrote:
> On 2014/8/18 17:51, Jason Wang wrote:
>> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
>> support changed the order of stopping the device. Previously
>> vhost_dev_stop would disable backend and only afterwards, unset guest
>> notifiers. We now unset guest notifiers while vhost is still
>> active. This can lose interrupts causing guest networking to fail.
>>
>> Additionally, remove the hdev->started assert in vhost.c since we may
>> want to start the guest notifiers before vhost starts and stop the
>> guest notifiers after vhost is stopped.
>>
>> In particular, this has been observed during migration.
> Thanks! I will have a test about your patch today! :-)

The patch was reported not applied cleanly. I will rebase it and send a
new one.

Please test that patch.

Thanks.

[Qemu-devel] [PATCH] configure: no need to mkdir QMP

2014-08-18 Thread Liming Wang

commit 7537fe04 QMP: QMP/ -> docs/qmp/

Above commit has moved last QMP files to docs/qmp and it's not necessary
to create QMP directory. So remove it from configure.

Signed-off-by: Liming Wang 
---
 configure | 4 
 1 file changed, 4 deletions(-)

diff --git a/configure b/configure
index 283c71c..b3000fd 100755
--- a/configure
+++ b/configure
@@ -5335,10 +5335,6 @@ for rom in seabios vgabios ; do
 echo "LD=$ld" >> $config_mak
 done
 
-if test "$docs" = "yes" ; then
-  mkdir -p QMP
-fi
-
 # set up qemu-iotests in this build directory
 iotests_common_env="tests/qemu-iotests/common.env"
 iotests_check="tests/qemu-iotests/check"
-- 
1.9.1

[Qemu-devel] [PATCH v2 1/1] pc-dimm: Change PCDIMMDevice->node from UINT32 to INT32, and initialize it as -1.

2014-08-18 Thread Tang Chen

If user doesn't specify numa options, nb_numa_nodes will be 0. But 
PCDIMMDevice->node
is also initialized to 0. As a result, the following check will fail:

pc_dimm_realize()
{
..
if (dimm->node >= nb_numa_nodes) {
error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value %"
   PRIu32 "' which exceeds the number of numa nodes: %d",
   dimm->node, nb_numa_nodes);
return;
}
..
}

But this is not an error.

PCDIMMDevice->node should be initialized to -1. This is for users who do not use
NUMA.

Signed-off-by: Tang Chen 
---

Change log v1 -> v2:
1. Simplify the comment.
2. Move the definition of NO_NODE_ID near where it is used.

 hw/mem/pc-dimm.c | 7 ++-
 include/hw/mem/pc-dimm.h | 2 +-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index ec8b1a3..34109a2 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -195,9 +195,14 @@ out:
 return ret;
 }
 
+/* Default value for PCDIMMDevice->node (unless specified by user).
+ * In this case, SRAT won't be created.
+ */
+#define NO_NODE_ID -1
+
 static Property pc_dimm_properties[] = {
 DEFINE_PROP_UINT64(PC_DIMM_ADDR_PROP, PCDIMMDevice, addr, 0),
-DEFINE_PROP_UINT32(PC_DIMM_NODE_PROP, PCDIMMDevice, node, 0),
+DEFINE_PROP_INT32(PC_DIMM_NODE_PROP, PCDIMMDevice, node, NO_NODE_ID),
 DEFINE_PROP_INT32(PC_DIMM_SLOT_PROP, PCDIMMDevice, slot,
   PC_DIMM_UNASSIGNED_SLOT),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index 761eeef..82abb2f 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -53,7 +53,7 @@ typedef struct PCDIMMDevice {
 
 /* public */
 uint64_t addr;
-uint32_t node;
+int32_t node;
 int32_t slot;
 HostMemoryBackend *hostmem;
 } PCDIMMDevice;
-- 
1.8.4.2

Re: [Qemu-devel] [PATCH 1/1] pc-dimm: Change PCDIMMDevice->node from UINT32 to INT32, and initialize it as -1.

2014-08-18 Thread tangchen


Hi Michael, Paolo

Thanks for the advices. Will send a v2 patch soon.

Thanks.

On 08/19/2014 04:04 AM, Michael S. Tsirkin wrote:

On Mon, Aug 18, 2014 at 03:58:33PM +0200, Paolo Bonzini wrote:

Il 18/08/2014 15:56, Michael S. Tsirkin ha scritto:

+/* Initialize PCDIMMDevice->node to -1 so that even if user doesn't specify
+ * any numa option, PCDIMMDevice->node won't be 0, which indicates node0.
+ * In this case, SRAT won't be created, and guest kernel will fake a node.
+ */

I think you simply mean
/* default value for PCDIMMDevice->node (unless specified by user) */


+#define NO_NODE_ID -1

I think a comment about the SRAT would be useful.

Paolo

OK so add:

/* In this case, SRAT won't be created. */
.

Re: [Qemu-devel] [RFC PATCH 00/10] cpu: add device_add foo-x86_64-cpu and i386 cpu hot remove support

2014-08-18 Thread Gu Zheng

Hi Igor, Andreas,
Could you please help to review this series? Any comment is welcome.

Regards,
Gu
On 08/07/2014 12:53 PM, Gu Zheng wrote:

> This series is based on the previous patchset from Chen Fan:
> https://lists.nongnu.org/archive/html/qemu-devel/2014-05/msg02360.html
> https://lists.nongnu.org/archive/html/qemu-devel/2013-12/msg04266.html
> 
> Patch 1~3: add device_add foo-x86_64-cpu support
> These three patches try to make cpu hotplug with device_add, and make
> "-device foo-x86_64-cpu" available，also we can set apic-id
> property with command line, if without setting apic-id property,
> we offer the first unoccupied apic id as the default new apic id.
> When hotplug cpu with device_add, additional check of APIC ID will be
> done after cpu object initialization which was different from
> 'cpu_add' command that check 'ids' at the beginning.
> 
> Patch 4~10: add i386 cpu hot remove support
> Via implementing ACPI standard methods _EJ0 in ACPI table, after Guest
> OS remove one vCPU online, the fireware will store removed bitmap to
> QEMU, then QEMU could know to notify the assigned vCPU of exiting.
> Meanwhile, intruduce the QOM command 'device_del' to remove vCPU from
> QEMU itself.
> 
> Chen Fan (6):
>   cpu: introduce CpuTopoInfo structure for argument simplification
>   cpu: add device_add foo-x86_64-cpu support
>   x86: add x86_cpu_unrealizefn() for cpu apic remove
>   qom cpu: rename variable 'cpu_added_notifier' to
> 'cpu_hotplug_notifier'
>   i386: implement pc interface cpu_common_unrealizefn() in qom/cpu.c
>   cpu hotplug: implement function cpu_status_write() for vcpu ejection
> 
> Gu Zheng (4):
>   qom/cpu: move register_vmstate to common CPUClass.realizefn
>   i386: add cpu device_del support
>   qom cpu: add UNPLUG cpu notify support
>   cpus: reclaim allocated vCPU objects
> 
>  cpus.c|   44 
>  exec.c|   32 +
>  hw/acpi/cpu_hotplug.c |   55 +--
>  hw/acpi/ich9.c|   13 ++--
>  hw/acpi/piix4.c   |   21 +++---
>  hw/i386/acpi-dsdt-cpu-hotplug.dsl |6 ++-
>  hw/i386/kvm/apic.c|8 ++
>  hw/i386/pc.c  |2 +-
>  hw/intc/apic.c|   10 +++
>  hw/intc/apic_common.c |   26 ++-
>  include/hw/acpi/cpu_hotplug.h |   14 -
>  include/hw/acpi/ich9.h|2 +-
>  include/hw/cpu/icc_bus.h  |1 +
>  include/hw/i386/apic_internal.h   |3 +
>  include/qom/cpu.h |   12 +++
>  include/sysemu/kvm.h  |1 +
>  include/sysemu/sysemu.h   |2 +-
>  kvm-all.c |   57 +++-
>  qdev-monitor.c|1 +
>  qom/cpu.c |   29 +++--
>  target-i386/cpu-qom.h |1 +
>  target-i386/cpu.c |  138 
> -
>  target-i386/topology.h|   51 +
>  23 files changed, 464 insertions(+), 65 deletions(-)
>

Re: [Qemu-devel] How to create PCH to support those existing driver

2014-08-18 Thread Chen, Tiejun

On 2014/8/18 17:58, Michael S. Tsirkin wrote:

On Mon, Aug 18, 2014 at 05:01:25PM +0800, Chen, Tiejun wrote:

On 2014/8/18 16:21, Michael S. Tsirkin wrote:

On Mon, Aug 18, 2014 at 11:06:29AM +0800, Chen, Tiejun wrote:

On 2014/8/17 18:32, Michael S. Tsirkin wrote:

On Fri, Aug 15, 2014 at 09:58:40AM +0800, Chen, Tiejun wrote:

Michael and Paolo,

Please re-post discussion on list. These off list ones are just
wasting time since they invariably have to be repeated on list again.

Okay, now just reissue this discussion to all related guys. And do you think
we need to discuss in public, qemu and xen mail list?

Absolutely.

Now -CC qemu, xen and intel-gfx.

If I'm missing someone important please tell me as well.

After I created that new machine specific to IGD passthrough, xenigd, now I
will step next to register the PCH.

IIRC, our complete solution should be as follows:

#1 create a new machine based on piix, xenigd

This is done with Michael help.

#2 register ISA bridge

1> Its still fixed at 1f.0.
2> ISA bridge's vendor_id/device_id should be emulated but then

subsystem_vendor_id = PCI_VENDOR_ID_XEN;
subsystem_device_id = ISA Bridge's real device id

This mean we need to change driver to walk with this way.
For example, in
case of Linux native driver,

intel_detect_pch()
{
...
if (pch->subsystem_vendor == PCI_VENDOR_ID_XEN)
id = pch->subsystem_device & INTEL_PCH_DEVICE_ID_MASK;

Then driver can get that real device id by 'subsystem_device', right?

This is fine now but how to support those existing drivers which are just

Here correct one point, we don't need to care about supporting the legacy
driver since the legacy driver still should work qemu-traditional. So we
just make sure the existing driver with this subsystem_id way can support
those existing and legacy platform.

Now this is clear to me.

dependent on checking real vendor_id/device_id directly,

if (pch->vendor == PCI_VENDOR_ID_INTEL) {
unsigned short id = pch->device & INTEL_PCH_DEVICE_ID_MASK

Maybe I'm missing something, please hint me.

Thanks
Tiejun

The subsystem id was just one idea.

But from that email thread, "RH / Intel Virtualization Engineering Meeting -
Minutes 7/10", I didn't see other idea we should prefer currently.

What was finally agreed for future drivers is that guests will get all
information they need from the video card, this ID hack was needed only
for very old legacy devices.
http://article.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/42258
So this is for newer guests, they will work without need
for hacks, like any other device.

Actually we had a meeting to discuss our future solution, but seems you were
on vacation at that moment :)

In that meeting we had an agreement between us and some upstream guys.

We will have such a PCI capability structure in this PCI device to represent
all information in the future. This make sens to Intel as well.

Maybe Allen or Paolo known more details.

But obviously this a long-term solution, so currently we will work with this
subsystem_id way temporarily. And this way is accepted by those guys in the
meeting.

I don't see the point really. If you are modifying the driver,

Yes, we need to modify something in the driver.

why not modify it to its final form.

What's your final form?

Avoid poking at other devices besides the passed through card.
Get everything from BAR and other registers of the card.

Allen,

Could you reply this?

As I track that email thread, seems the follows is just a way you guys
achieve a better agreement.

"

why not set the subsys vid to the RH/Quamranet/Virtio VID, so it's
obvious for the use-match?

That's exactly the suggestion.  Though upstream they might be using the
XenSource id since the patches were for Xen.

Paolo
"
Or I'm missing something?

Thanks
Tiejun

I thought the point of this work is to make existing
linux/windows drivers work. Is it or isn't it?

Yes. We need change driver as I said previously like this,

>> 2> ISA bridge's vendor_id/device_id should be emulated but then
>>
>>subsystem_vendor_id = PCI_VENDOR_ID_XEN;
>>subsystem_device_id = ISA Bridge's real device id
>>
>> This mean we need to change driver to walk with this way.
>> For example, in
>> case of Linux native driver,
>>
>> intel_detect_pch()
>> {
>>...
>>if (pch->subsystem_vendor == PCI_VENDOR_ID_XEN)
>>id = pch->subsystem_device & INTEL_PCH_DEVICE_ID_MASK;

This is a minimal change to driver as we discussed.

Wrt changing drivers, change them to behave sanely, like all other
drivers, and avoid poking at the chipset.
http://article.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/42258
Seems to suggest one way to do this.
Can what is suggested there work for existing devices?

Again, Allen,

Are you sure if this suggestion can work?

[Qemu-devel] [RFC v1 2/2] arm: boot: Add EL jump-down code for Linux

2014-08-18 Thread Peter Crosthwaite

Linux should boot in EL2 or EL1. If in EL3, jump down before handing
off to Linux.

Signed-off-by: Peter Crosthwaite 
---

 hw/arm/boot.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 840f5da..f1f6365 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -35,6 +35,7 @@ typedef enum {
 FIXUP_GIC_CPU_IF, /* overwrite with GIC CPU interface address */
 FIXUP_BOOTREG,/* overwrite with boot register address */
 FIXUP_DSB,/* overwrite with correct DSB insn for cpu */
+FIXUP_EL, /* overwrite with kernel entry EL */
 FIXUP_MAX,
 } FixupType;
 
@@ -46,6 +47,20 @@ typedef struct ARMInsnFixup {
 } ARMInsnFixup;
 
 static const ARMInsnFixup bootloader_aarch64[] = {
+{ 0xd5384240 }, /* mrs x0, currentel */
+{ 0x7100301f }, /* cmp w0, #0xc */
+{ 0x5401 + (9 << 5) }, /* b.ne ELx_start */
+/* Jump down from EL3 to ELx */
+{ 0x1001 + (9 << 5) }, /* adr x1, ELx_start */
+{ 0xd53e1100 }, /* mrs x0, scr_el3 */
+{ 0xb240 }, /* orr x0, x0, #0x1 - SCR.NS */
+{ 0xb278 }, /* orr x0, x0, #0x80 - SCR.HCE */
+{ 0xd51e1100 }, /* msr scr_el3, x0 */
+{ 0xd2807820, FIXUP_EL, 7, 2 }, /* movz x0, 0x3c1 (+ EL<<2) */
+{ 0xd51e4000 }, /* msr spsr_el3, x0 */
+{ 0xd51e4021 }, /* msr elr_el3, x1 */
+{ 0xd69f03e0 }, /* eret */
+/* ELx_start: */
 { 0x58c0 }, /* ldr x0, arg ; Load the lower 32-bits of DTB */
 { 0xaa1f03e1 }, /* mov x1, xzr */
 { 0xaa1f03e2 }, /* mov x2, xzr */
@@ -141,6 +156,7 @@ static void write_bootloader(const char *name, hwaddr addr,
 case FIXUP_GIC_CPU_IF:
 case FIXUP_BOOTREG:
 case FIXUP_DSB:
+case FIXUP_EL:
 insn = deposit32(insn, shift, length, fixupcontext[fixup]);
 break;
 default:
@@ -583,6 +599,11 @@ void arm_load_kernel(ARMCPU *cpu, struct arm_boot_info 
*info)
 }
 fixupcontext[FIXUP_ENTRYPOINT] = entry;
 
+fixupcontext[FIXUP_EL] = 1;
+if (arm_feature(&cpu->env, ARM_FEATURE_EL2)) {
+fixupcontext[FIXUP_EL] = 2;
+}
+
 write_bootloader("bootloader", info->loader_start,
  primary_loader, fixupcontext);
 
-- 
2.0.1.1.gfbfc394

[Qemu-devel] [RFC v1 1/2] arm: boot: Add partial machine code fixup

2014-08-18 Thread Peter Crosthwaite

Allow a fixup to be used to deposit a field in a machine-code
instruction. The option shift and length fixup fields indicate the
field being deposited by the fixup.

Signed-off-by: Peter Crosthwaite 
---

 hw/arm/boot.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 1241761..840f5da 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -41,6 +41,8 @@ typedef enum {
 typedef struct ARMInsnFixup {
 uint32_t insn;
 FixupType fixup;
+int shift;
+int length;
 } ARMInsnFixup;
 
 static const ARMInsnFixup bootloader_aarch64[] = {
@@ -125,6 +127,10 @@ static void write_bootloader(const char *name, hwaddr addr,
 for (i = 0; i < len; i++) {
 uint32_t insn = insns[i].insn;
 FixupType fixup = insns[i].fixup;
+int shift = insns[i].shift;
+int length = insns[i].length ? insns[i].length : 32;
+
+assert(shift + length <= 32);
 
 switch (fixup) {
 case FIXUP_NONE:
@@ -135,7 +141,7 @@ static void write_bootloader(const char *name, hwaddr addr,
 case FIXUP_GIC_CPU_IF:
 case FIXUP_BOOTREG:
 case FIXUP_DSB:
-insn = fixupcontext[fixup];
+insn = deposit32(insn, shift, length, fixupcontext[fixup]);
 break;
 default:
 abort();
-- 
2.0.1.1.gfbfc394

[Qemu-devel] [RFC v1 0/2] EL3 support for AArch64 Linux bootloader

2014-08-18 Thread Peter Crosthwaite

Hi Peter, Edgar,

These patches add bootloader support for Edgars upcomming EL2/3 work.
This allows for Linux boot from EL3 without a bootloader or wrapper.

Regards,
Peter


Peter Crosthwaite (2):
  arm: boot: Add partial machine code fixup
  arm: boot: Add EL jump-down code for Linux

 hw/arm/boot.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

-- 
2.0.1.1.gfbfc394

[Qemu-devel] [PATCH target-arm v1 1/1] arm: translate-a64: Add CPU number to Debug info

2014-08-18 Thread Peter Crosthwaite

It's very useful when debugging SMP to know who disassembly or a CPU
state dump is being done on behalf of.

Signed-off-by: Peter Crosthwaite 
---

 target-arm/translate-a64.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index f04ca49..957d23f 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -141,7 +141,8 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
 cpu_fprintf(f, " ");
 }
 }
-cpu_fprintf(f, "PSTATE=%08x (flags %c%c%c%c)\n",
+cpu_fprintf(f, "CPU%d: PSTATE=%08x (flags %c%c%c%c)\n",
+cs->cpu_index,
 psr,
 psr & PSTATE_N ? 'N' : '-',
 psr & PSTATE_Z ? 'Z' : '-',
@@ -10988,7 +10989,7 @@ done_generating:
 #ifdef DEBUG_DISAS
 if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
 qemu_log("\n");
-qemu_log("IN: %s\n", lookup_symbol(pc_start));
+qemu_log("CPU: %d, IN: %s\n",  cs->cpu_index, lookup_symbol(pc_start));
 log_target_disas(env, pc_start, dc->pc - pc_start,
  4 | (dc->bswap_code << 1));
 qemu_log("\n");
-- 
2.0.1.1.gfbfc394

Re: [Qemu-devel] [RFC PATCH v2 05/13] spapr_pci: Introduce a liobn number generating macros

2014-08-18 Thread David Gibson

On Fri, Aug 15, 2014 at 08:12:27PM +1000, Alexey Kardashevskiy wrote:
> We are going to have multiple DMA windows per PHB and we want them to
> migrate so we need a predictable way of assigning LIOBNs.
> 
> This introduces a macro which makes up a LIOBN from fixed prefix,
> PHB index (unique PHB id) and window number.
> 
> This introduces a SPAPR_PCI_DMA_WINDOW_NUM() to know the window number
> from LIOBN, will be used in next patch(es) to distinguish the default
> 32bit windows from dynamic windows.
> 
> Signed-off-by: Alexey Kardashevskiy 

Looks sane enough.

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpCrTGcqbyZR.pgp
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH v2 02/13] spapr_iommu: Disable in-kernel IOMMU tables for >4GB windows

2014-08-18 Thread David Gibson

On Fri, Aug 15, 2014 at 08:12:24PM +1000, Alexey Kardashevskiy wrote:
> The existing KVM_CREATE_SPAPR_TCE ioctl only support 4G windows max.
> We are going to add huge DMA windows support so this will create small
> window and unexpectedly fail later.
> 
> This disables KVM_CREATE_SPAPR_TCE for windows bigger that 4GB. Since
> those windows are normally mapped at the boot time, there will be no
> performance impact.
> 
> Signed-off-by: Alexey Kardashevskiy 

I think perhaps the crucial point here is that the window size
parameter to the kernel ioctl() is only 32-bit, so there's no way of
expressing a TCE window > 4GB.

> ---
>  hw/ppc/spapr_iommu.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index f6e32a4..36f5d27 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -113,11 +113,11 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>  static int spapr_tce_table_realize(DeviceState *dev)
>  {
>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
> +uint64_t window_size = tcet->nb_table << tcet->page_shift;

tcet->nb_table is only 32-bit itself, so this isn't going to work as
intended without a cast.
-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpSiy2WxxfQ6.pgp
Description: PGP signature

Re: [Qemu-devel] [RFC PATCH v2 01/13] qom: Make object_child_foreach safe for objects removal

2014-08-18 Thread David Gibson

On Fri, Aug 15, 2014 at 08:12:23PM +1000, Alexey Kardashevskiy wrote:
> Current object_child_foreach() uses QTAILQ_FOREACH() to walk
> through children and that makes children removal from the callback
> impossible.
> 
> This makes object_child_foreach() use QTAILQ_FOREACH_SAFE().
> 
> Signed-off-by: Alexey Kardashevskiy 

Seems like a good idea to me.

Reviewed-by: David Gibson 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


pgpgaoHdyTPq2.pgp
Description: PGP signature

[Qemu-devel] [PATCH 12/12] spapr_pci: emit hotplug add/remove events during hotplug

2014-08-18 Thread Michael Roth

From: Tyrel Datwyler 

This uses extension of existing EPOW interrupt/event mechanism
to notify userspace tools like librtas/drmgr to handle
in-guest configuration/cleanup operations in response to
device_add/device_del.

Userspace tools that don't implement this extension will need
to be run manually in response/advance of device_add/device_del,
respectively.

Signed-off-by: Tyrel Datwyler 
Signed-off-by: Michael Roth 
---
 hw/ppc/spapr_pci.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 23864ab..17d37c0 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -944,6 +944,18 @@ static int spapr_device_hotplug_add(DeviceState *qdev, 
PCIDevice *dev)
 drc_entry_slot->state &= ~(uint32_t)INDICATOR_ENTITY_SENSE_MASK;
 drc_entry_slot->state |= encoded; /* and the slot */
 
+/* reliable unplug requires we wait for a transition from
+ * UNISOLATED->ISOLATED prior to device removal/deletion.
+ * However, slots populated by devices at boot-time will not
+ * have ever been set by guest tools to an UNISOLATED/populated
+ * state, so set this manually in the case of coldplug devices
+ */
+if (!DEVICE(dev)->hotplugged) {
+drc_entry_slot->state |= ENCODE_DRC_STATE(1,
+  INDICATOR_ISOLATION_MASK,
+  INDICATOR_ISOLATION_SHIFT);
+}
+
 /* add OF node for pci device and required OF DT properties */
 fdt_orig = g_malloc0(FDT_MAX_SIZE);
 offset = fdt_create(fdt_orig, FDT_MAX_SIZE);
@@ -1026,13 +1038,21 @@ static void spapr_device_hotplug_remove(DeviceState 
*qdev, PCIDevice *dev)
 static void spapr_phb_hot_plug(HotplugHandler *plug_handler,
DeviceState *plugged_dev, Error **errp)
 {
+int slot = PCI_SLOT(PCI_DEVICE(plugged_dev)->devfn);
+
 spapr_device_hotplug_add(DEVICE(plug_handler), PCI_DEVICE(plugged_dev));
+if (plugged_dev->hotplugged) {
+spapr_pci_hotplug_add_event(DEVICE(plug_handler), slot);
+}
 }
 
 static void spapr_phb_hot_unplug(HotplugHandler *plug_handler,
  DeviceState *plugged_dev, Error **errp)
 {
+int slot = PCI_SLOT(PCI_DEVICE(plugged_dev)->devfn);
+
 spapr_device_hotplug_remove(DEVICE(plug_handler), PCI_DEVICE(plugged_dev));
+spapr_pci_hotplug_remove_event(DEVICE(plug_handler), slot);
 }
 
 static void spapr_phb_realize(DeviceState *dev, Error **errp)
-- 
1.9.1

[Qemu-devel] [PATCH 03/12] spapr: add helper to retrieve a PHB/device DrcEntry

2014-08-18 Thread Michael Roth

Signed-off-by: Michael Roth 
---
 hw/ppc/spapr.c | 23 +++
 include/hw/ppc/spapr.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 90b25b3..39cb0bb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -309,6 +309,29 @@ sPAPRDrcEntry *spapr_phb_to_drc_entry(uint64_t buid)
  return NULL;
 }
 
+sPAPRDrcEntry *spapr_find_drc_entry(int drc_index)
+{
+int i, j;
+
+for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+sPAPRDrcEntry *phb_entry = &spapr->drc_table[i];
+if (phb_entry->drc_index == drc_index) {
+return phb_entry;
+}
+if (phb_entry->child_entries == NULL) {
+continue;
+}
+for (j = 0; j < SPAPR_DRC_PHB_SLOT_MAX; j++) {
+sPAPRDrcEntry *entry = &phb_entry->child_entries[j];
+if (entry->drc_index == drc_index) {
+return entry;
+}
+}
+ }
+
+ return NULL;
+}
+
 static void spapr_init_drc_table(void)
 {
 int i;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index c93794b..0ac1a19 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -516,5 +516,6 @@ int spapr_tcet_dma_dt(void *fdt, int node_off, const char 
*propname,
   sPAPRTCETable *tcet);
 sPAPRDrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state);
 sPAPRDrcEntry *spapr_phb_to_drc_entry(uint64_t buid);
+sPAPRDrcEntry *spapr_find_drc_entry(int drc_index);
 
 #endif /* !defined (__HW_SPAPR_H__) */
-- 
1.9.1

[Qemu-devel] [PATCH 01/12] spapr: populate DRC entries for root dt node

2014-08-18 Thread Michael Roth

From: Nathan Fontenot 

This add entries to the root OF node to advertise our PHBs as being
DR-capable in according with PAPR specification.

Each PHB is given a name of PHB, advertised as a PHB type,
and associated with a power domain of -1 (indicating to guests that
power management is handled automatically by hardware).

We currently allocate entries for up to 32 DR-capable PHBs, though
this limit can be increased later.

DrcEntry objects to track the state of the DR-connector associated
with each PHB are stored in a 32-entry array, and each DrcEntry has
in turn have a dynamically-sized number of child DR-connectors,
which we will use later to track the state of DR-connectors
associated with a PHB's physical slots.

Signed-off-by: Nathan Fontenot 
Signed-off-by: Michael Roth 
---
 hw/ppc/spapr.c | 143 +
 hw/ppc/spapr_pci.c |   1 +
 include/hw/ppc/spapr.h |  35 
 3 files changed, 179 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5c92707..d5e46c3 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -296,6 +296,143 @@ static hwaddr spapr_node0_size(void)
 return ram_size;
 }
 
+sPAPRDrcEntry *spapr_phb_to_drc_entry(uint64_t buid)
+{
+int i;
+
+for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+if (spapr->drc_table[i].phb_buid == buid) {
+return &spapr->drc_table[i];
+}
+ }
+
+ return NULL;
+}
+
+static void spapr_init_drc_table(void)
+{
+int i;
+
+memset(spapr->drc_table, 0, sizeof(spapr->drc_table));
+
+/* For now we only care about PHB entries */
+for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+spapr->drc_table[i].drc_index = 0x201 + i;
+}
+}
+
+sPAPRDrcEntry *spapr_add_phb_to_drc_table(uint64_t buid, uint32_t state)
+{
+sPAPRDrcEntry *empty_drc = NULL;
+sPAPRDrcEntry *found_drc = NULL;
+int i, phb_index;
+
+for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+if (spapr->drc_table[i].phb_buid == 0) {
+empty_drc = &spapr->drc_table[i];
+}
+
+if (spapr->drc_table[i].phb_buid == buid) {
+found_drc = &spapr->drc_table[i];
+break;
+}
+}
+
+if (found_drc) {
+return found_drc;
+}
+
+if (empty_drc) {
+empty_drc->phb_buid = buid;
+empty_drc->state = state;
+empty_drc->cc_state.fdt = NULL;
+empty_drc->cc_state.offset = 0;
+empty_drc->cc_state.depth = 0;
+empty_drc->cc_state.state = CC_STATE_IDLE;
+empty_drc->child_entries =
+g_malloc0(sizeof(sPAPRDrcEntry) * SPAPR_DRC_PHB_SLOT_MAX);
+phb_index = buid - SPAPR_PCI_BASE_BUID;
+for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
+empty_drc->child_entries[i].drc_index =
+SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + (i << 3);
+}
+return empty_drc;
+}
+
+return NULL;
+}
+
+static void spapr_create_drc_dt_entries(void *fdt)
+{
+char char_buf[1024];
+uint32_t int_buf[SPAPR_DRC_TABLE_SIZE + 1];
+uint32_t *entries;
+int offset, fdt_offset;
+int i, ret;
+
+fdt_offset = fdt_path_offset(fdt, "/");
+
+/* ibm,drc-indexes */
+memset(int_buf, 0, sizeof(int_buf));
+int_buf[0] = SPAPR_DRC_TABLE_SIZE;
+
+for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
+int_buf[i] = spapr->drc_table[i-1].drc_index;
+}
+
+ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-indexes", int_buf,
+  sizeof(int_buf));
+if (ret) {
+fprintf(stderr, "Couldn't finalize ibm,drc-indexes property\n");
+}
+
+/* ibm,drc-power-domains */
+memset(int_buf, 0, sizeof(int_buf));
+int_buf[0] = SPAPR_DRC_TABLE_SIZE;
+
+for (i = 1; i <= SPAPR_DRC_TABLE_SIZE; i++) {
+int_buf[i] = 0x;
+}
+
+ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-power-domains", int_buf,
+  sizeof(int_buf));
+if (ret) {
+fprintf(stderr, "Couldn't finalize ibm,drc-power-domains property\n");
+}
+
+/* ibm,drc-names */
+memset(char_buf, 0, sizeof(char_buf));
+entries = (uint32_t *)&char_buf[0];
+*entries = SPAPR_DRC_TABLE_SIZE;
+offset = sizeof(*entries);
+
+for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+offset += sprintf(char_buf + offset, "PHB %d", i + 1);
+char_buf[offset++] = '\0';
+}
+
+ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-names", char_buf, offset);
+if (ret) {
+fprintf(stderr, "Couldn't finalize ibm,drc-names property\n");
+}
+
+/* ibm,drc-types */
+memset(char_buf, 0, sizeof(char_buf));
+entries = (uint32_t *)&char_buf[0];
+*entries = SPAPR_DRC_TABLE_SIZE;
+offset = sizeof(*entries);
+
+for (i = 0; i < SPAPR_DRC_TABLE_SIZE; i++) {
+offset += sprintf(char_buf + offset, "PHB");
+char_buf[offset++] = '\0';
+}
+
+ret = fdt_setprop(fdt, fdt_offset, "ibm,drc-types", char_buf, offset);
+if (ret) {
+

[Qemu-devel] [PATCH 10/12] spapr_events: re-use EPOW event infrastructure for hotplug events

2014-08-18 Thread Michael Roth

From: Nathan Fontenot 

This extends the data structures currently used to report EPOW events to
gets via the check-exception RTAS interfaces to also include event types
for hotplug/unplug events.

This is currently undocumented and being finalized for inclusion in PAPR
specification, but we implement this here as an extension for guest
userspace tools to implement (existing guest kernels simply log these
events via a sysfs interface that's read by rtas_errd).

Signed-off-by: Nathan Fontenot 
Signed-off-by: Michael Roth 
---
 hw/ppc/spapr.c |   2 +-
 hw/ppc/spapr_events.c  | 215 -
 include/hw/ppc/spapr.h |   4 +-
 3 files changed, 180 insertions(+), 41 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 39cb0bb..825fd31 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1706,7 +1706,7 @@ static void ppc_spapr_init(MachineState *machine)
 spapr->fdt_skel = spapr_create_fdt_skel(initrd_base, initrd_size,
 kernel_size, kernel_le,
 boot_device, kernel_cmdline,
-spapr->epow_irq);
+spapr->check_exception_irq);
 assert(spapr->fdt_skel != NULL);
 }
 
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 1b6157d..c0be0e5 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -32,6 +32,8 @@
 
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_vio.h"
+#include "hw/pci/pci.h"
+#include "hw/pci-host/spapr.h"
 
 #include 
 
@@ -77,6 +79,7 @@ struct rtas_error_log {
 #define   RTAS_LOG_TYPE_ECC_UNCORR  0x0009
 #define   RTAS_LOG_TYPE_ECC_CORR0x000a
 #define   RTAS_LOG_TYPE_EPOW0x0040
+#define   RTAS_LOG_TYPE_HOTPLUG 0x00e5
 uint32_t extended_length;
 } QEMU_PACKED;
 
@@ -166,6 +169,38 @@ struct epow_log_full {
 struct rtas_event_log_v6_epow epow;
 } QEMU_PACKED;
 
+struct rtas_event_log_v6_hp {
+#define RTAS_LOG_V6_SECTION_ID_HOTPLUG  0x4850 /* HP */
+struct rtas_event_log_v6_section_header hdr;
+uint8_t hotplug_type;
+#define RTAS_LOG_V6_HP_TYPE_CPU  1
+#define RTAS_LOG_V6_HP_TYPE_MEMORY   2
+#define RTAS_LOG_V6_HP_TYPE_SLOT 3
+#define RTAS_LOG_V6_HP_TYPE_PHB  4
+#define RTAS_LOG_V6_HP_TYPE_PCI  5
+uint8_t hotplug_action;
+#define RTAS_LOG_V6_HP_ACTION_ADD1
+#define RTAS_LOG_V6_HP_ACTION_REMOVE 2
+uint8_t hotplug_identifier;
+#define RTAS_LOG_V6_HP_ID_DRC_NAME   1
+#define RTAS_LOG_V6_HP_ID_DRC_INDEX  2
+#define RTAS_LOG_V6_HP_ID_DRC_COUNT  3
+uint8_t reserved;
+union {
+uint32_t index;
+uint32_t count;
+char name[1];
+} drc;
+} QEMU_PACKED;
+
+struct hp_log_full {
+struct rtas_error_log hdr;
+struct rtas_event_log_v6 v6hdr;
+struct rtas_event_log_v6_maina maina;
+struct rtas_event_log_v6_mainb mainb;
+struct rtas_event_log_v6_hp hp;
+} QEMU_PACKED;
+
 #define EVENT_MASK_INTERNAL_ERRORS   0x8000
 #define EVENT_MASK_EPOW  0x4000
 #define EVENT_MASK_HOTPLUG   0x1000
@@ -181,29 +216,61 @@ struct epow_log_full {
 }  \
 } while (0)
 
-void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq)
+void spapr_events_fdt_skel(void *fdt, uint32_t check_exception_irq)
 {
-uint32_t epow_irq_ranges[] = {cpu_to_be32(epow_irq), cpu_to_be32(1)};
-uint32_t epow_interrupts[] = {cpu_to_be32(epow_irq), 0};
+uint32_t irq_ranges[] = {cpu_to_be32(check_exception_irq), cpu_to_be32(1)};
+uint32_t interrupts[] = {cpu_to_be32(check_exception_irq), 0};
 
 _FDT((fdt_begin_node(fdt, "event-sources")));
 
 _FDT((fdt_property(fdt, "interrupt-controller", NULL, 0)));
 _FDT((fdt_property_cell(fdt, "#interrupt-cells", 2)));
 _FDT((fdt_property(fdt, "interrupt-ranges",
-   epow_irq_ranges, sizeof(epow_irq_ranges;
+   irq_ranges, sizeof(irq_ranges;
 
 _FDT((fdt_begin_node(fdt, "epow-events")));
-_FDT((fdt_property(fdt, "interrupts",
-   epow_interrupts, sizeof(epow_interrupts;
+_FDT((fdt_property(fdt, "interrupts", interrupts, sizeof(interrupts;
 _FDT((fdt_end_node(fdt)));
 
 _FDT((fdt_end_node(fdt)));
 }
 
 static struct epow_log_full *pending_epow;
+static struct hp_log_full *pending_hp;
 static uint32_t next_plid;
 
+static void spapr_init_v6hdr(struct rtas_event_log_v6 *v6hdr)
+{
+v6hdr->b0 = RTAS_LOG_V6_B0_VALID | RTAS_LOG_V6_B0_NEW_LOG
+| RTAS_LOG_V6_B0_BIGENDIAN;
+v6hdr->b2 = RTAS_LOG_V6_B2_POWERPC_FORMAT
+| RTAS_LOG_V6_B2_LOG

[Qemu-devel] [PATCH 05/12] spapr_pci: add get/set-power-level RTAS interfaces

2014-08-18 Thread Michael Roth

From: Nathan Fontenot 

Signed-off-by: Nathan Fontenot 
Signed-off-by: Michael Roth 
---
 hw/ppc/spapr_pci.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 23a3477..f007dd6 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -511,6 +511,27 @@ static void rtas_set_indicator(PowerPCCPU *cpu, 
sPAPREnvironment *spapr,
 rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 }
 
+static void rtas_set_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+ uint32_t token, uint32_t nargs,
+ target_ulong args, uint32_t nret,
+ target_ulong rets)
+{
+/* we currently only use a single, "live insert" powerdomain for
+ * hotplugged/dlpar'd resources, so the power is always live/full (100)
+ */
+rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+rtas_st(rets, 1, 100);
+}
+
+static void rtas_get_power_level(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+  uint32_t token, uint32_t nargs,
+  target_ulong args, uint32_t nret,
+  target_ulong rets)
+{
+rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+rtas_st(rets, 1, 100);
+}
+
 static int pci_spapr_swizzle(int slot, int pin)
 {
 return (slot + pin) % PCI_NUM_PINS;
@@ -1175,6 +1196,10 @@ void spapr_pci_rtas_init(void)
 }
 spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator",
 rtas_set_indicator);
+spapr_rtas_register(RTAS_SET_POWER_LEVEL, "set-power-level",
+rtas_set_power_level);
+spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
+rtas_get_power_level);
 }
 
 static void spapr_pci_register_types(void)
-- 
1.9.1

[Qemu-devel] [PATCH 08/12] pci: allow 0 address for PCI IO regions

2014-08-18 Thread Michael Roth

Some kernels program a 0 address for io regions. PCI 3.0 spec
section 6.2.5.1 doesn't seem to disallow this.

Signed-off-by: Michael Roth 
---
 hw/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 351d320..9578749 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1035,7 +1035,7 @@ static pcibus_t pci_bar_address(PCIDevice *d,
 /* Check if 32 bit BAR wraps around explicitly.
  * TODO: make priorities correct and remove this work around.
  */
-if (last_addr <= new_addr || new_addr == 0 || last_addr >= UINT32_MAX) 
{
+if (last_addr <= new_addr || last_addr >= UINT32_MAX) {
 return PCI_BAR_UNMAPPED;
 }
 return new_addr;
-- 
1.9.1

[Qemu-devel] [PATCH 11/12] spapr_events: event-scan RTAS interface

2014-08-18 Thread Michael Roth

From: Tyrel Datwyler 

We don't actually rely on this interface to surface hotplug events, and
instead rely on the similar-but-interrupt-driven check-exception RTAS
interface used for EPOW events. However, the existence of this interface
is needed to ensure guest kernels initialize the event-reporting
interfaces which will in turn be used by userspace tools to handle these
events, so we implement this interface as a stub.

Signed-off-by: Tyrel Datwyler 
Signed-off-by: Michael Roth 
---
 hw/ppc/spapr.c | 1 +
 hw/ppc/spapr_events.c  | 9 +
 include/hw/ppc/spapr.h | 2 ++
 3 files changed, 12 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 825fd31..c65b13a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -702,6 +702,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
 refpoints, sizeof(refpoints;
 
 _FDT((fdt_property_cell(fdt, "rtas-error-log-max", RTAS_ERROR_LOG_MAX)));
+_FDT((fdt_property_cell(fdt, "rtas-event-scan-rate", 
RTAS_EVENT_SCAN_RATE)));
 
 /*
  * According to PAPR, rtas ibm,os-term, does not gaurantee a return
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index c0be0e5..bb80080 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -449,6 +449,14 @@ static void check_exception(PowerPCCPU *cpu, 
sPAPREnvironment *spapr,
 }
 }
 
+static void event_scan(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+uint32_t token, uint32_t nargs,
+target_ulong args,
+uint32_t nret, target_ulong rets)
+{
+rtas_st(rets, 0, 1); /* no error events found */
+}
+
 void spapr_events_init(sPAPREnvironment *spapr)
 {
 spapr->check_exception_irq = xics_alloc(spapr->icp, 0, 0, false);
@@ -456,4 +464,5 @@ void spapr_events_init(sPAPREnvironment *spapr)
 qemu_register_powerdown_notifier(&spapr->epow_notifier);
 spapr_rtas_register(RTAS_CHECK_EXCEPTION, "check-exception",
 check_exception);
+spapr_rtas_register(RTAS_EVENT_SCAN, "event-scan", event_scan);
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5382bf1..aab627f 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -484,6 +484,8 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr 
rtas_addr,
 
 #define RTAS_ERROR_LOG_MAX  2048
 
+#define RTAS_EVENT_SCAN_RATE1
+
 typedef struct sPAPRTCETable sPAPRTCETable;
 
 #define TYPE_SPAPR_TCE_TABLE "spapr-tce-table"
-- 
1.9.1

[Qemu-devel] [PATCH 06/12] spapr_pci: add get-sensor-state RTAS interface

2014-08-18 Thread Michael Roth

From: Mike Day 

Signed-off-by: Mike Day 
Signed-off-by: Michael Roth 
---
 hw/ppc/spapr_pci.c | 76 ++
 1 file changed, 76 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index f007dd6..8d1351d 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -66,6 +66,7 @@
 #define INDICATOR_DR_MASK   0x00e0   /* 9002 three bits */
 #define INDICATOR_ALLOCATION_MASK   0x0300   /* 9003 two bits */
 #define INDICATOR_EPOW_MASK 0x1c00   /* 9 three bits */
+#define INDICATOR_ENTITY_SENSE_MASK 0xe000   /* 9003 three bits */
 
 #define INDICATOR_ISOLATION_SHIFT   0x00 /* bit 0 */
 #define INDICATOR_GLOBAL_INTERRUPT_SHIFT0x01 /* bit 1 */
@@ -75,6 +76,10 @@
 #define INDICATOR_DR_SHIFT  0x05 /* bits 5-7 */
 #define INDICATOR_ALLOCATION_SHIFT  0x08 /* bits 8-9 */
 #define INDICATOR_EPOW_SHIFT0x0a /* bits 10-12 */
+#define INDICATOR_ENTITY_SENSE_SHIFT0x0d /* bits 13-15 */
+
+#define INDICATOR_ENTITY_SENSE_EMPTY0
+#define INDICATOR_ENTITY_SENSE_PRESENT  1
 
 #define DECODE_DRC_STATE(state, m, s)  \
 uint32_t)(state) & (uint32_t)(m))) >> (s))
@@ -532,6 +537,75 @@ static void rtas_get_power_level(PowerPCCPU *cpu, 
sPAPREnvironment *spapr,
 rtas_st(rets, 1, 100);
 }
 
+static void rtas_get_sensor_state(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+  uint32_t token, uint32_t nargs,
+  target_ulong args, uint32_t nret,
+  target_ulong rets)
+{
+uint32_t sensor = rtas_ld(args, 0);
+uint32_t drc_index = rtas_ld(args, 1);
+uint32_t sensor_state = 0, decoded = 0;
+uint32_t shift = 0, mask = 0;
+sPAPRDrcEntry *drc_entry = NULL;
+
+if (drc_index == 0) {  /* platform state sensor/indicator */
+sensor_state = spapr->state;
+} else { /* we should have a drc entry */
+drc_entry = spapr_find_drc_entry(drc_index);
+if (!drc_entry) {
+DPRINTF("unable to find DRC entry for index %x", drc_index);
+sensor_state = 0; /* empty */
+rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+return;
+}
+sensor_state = drc_entry->state;
+}
+switch (sensor) {
+case 9:  /* EPOW */
+shift = INDICATOR_EPOW_SHIFT;
+mask = INDICATOR_EPOW_MASK;
+break;
+case 9001: /* Isolation state */
+/* encode the new value into the correct bit field */
+shift = INDICATOR_ISOLATION_SHIFT;
+mask = INDICATOR_ISOLATION_MASK;
+break;
+case 9002: /* DR */
+shift = INDICATOR_DR_SHIFT;
+mask = INDICATOR_DR_MASK;
+break;
+case 9003: /* entity sense */
+shift = INDICATOR_ENTITY_SENSE_SHIFT;
+mask = INDICATOR_ENTITY_SENSE_MASK;
+break;
+case 9005: /* global interrupt */
+shift = INDICATOR_GLOBAL_INTERRUPT_SHIFT;
+mask = INDICATOR_GLOBAL_INTERRUPT_MASK;
+break;
+case 9006: /* error log */
+shift = INDICATOR_ERROR_LOG_SHIFT;
+mask = INDICATOR_ERROR_LOG_MASK;
+break;
+case 9007: /* identify */
+shift = INDICATOR_IDENTIFY_SHIFT;
+mask = INDICATOR_IDENTIFY_MASK;
+break;
+case 9009: /* reset */
+shift = INDICATOR_RESET_SHIFT;
+mask = INDICATOR_RESET_MASK;
+break;
+default:
+DPRINTF("rtas_get_sensor_state: sensor not implemented: %d",
+sensor);
+rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+return;
+}
+
+decoded = DECODE_DRC_STATE(sensor_state, mask, shift);
+rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+rtas_st(rets, 1, decoded);
+}
+
 static int pci_spapr_swizzle(int slot, int pin)
 {
 return (slot + pin) % PCI_NUM_PINS;
@@ -1200,6 +1274,8 @@ void spapr_pci_rtas_init(void)
 rtas_set_power_level);
 spapr_rtas_register(RTAS_GET_POWER_LEVEL, "get-power-level",
 rtas_get_power_level);
+spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
+rtas_get_sensor_state);
 }
 
 static void spapr_pci_register_types(void)
-- 
1.9.1

[Qemu-devel] [PATCH 09/12] spapr_pci: enable basic hotplug operations

2014-08-18 Thread Michael Roth

This enables hotplug for PHB bridges. Upon hotplug we generate the
OF-nodes required by PAPR specification and IEEE 1275-1994
"PCI Bus Binding to Open Firmware" for the device.

We associate the corresponding FDT for these nodes with the DrcEntry
corresponding to the slot, which will be fetched via
ibm,configure-connector RTAS calls by the guest as described by PAPR
specification. The FDT is cleaned up in the case of unplug.

Signed-off-by: Michael Roth 
---
 hw/ppc/spapr_pci.c | 235 +++--
 include/hw/ppc/spapr.h |   1 +
 2 files changed, 228 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 96a57be..23864ab 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -87,6 +87,17 @@
 #define ENCODE_DRC_STATE(val, m, s) \
 (((uint32_t)(val) << (s)) & (uint32_t)(m))
 
+#define FDT_MAX_SIZE0x1
+#define _FDT(exp) \
+do { \
+int ret = (exp);   \
+if (ret < 0) { \
+return ret;\
+}  \
+} while (0)
+
+static void spapr_drc_state_reset(sPAPRDrcEntry *drc_entry);
+
 static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
 {
 sPAPRPHBState *sphb;
@@ -476,6 +487,22 @@ static void rtas_set_indicator(PowerPCCPU *cpu, 
sPAPREnvironment *spapr,
 /* encode the new value into the correct bit field */
 shift = INDICATOR_ISOLATION_SHIFT;
 mask = INDICATOR_ISOLATION_MASK;
+if (drc_entry) {
+/* transition from unisolated to isolated for a hotplug slot
+ * entails completion of guest-side device unplug/cleanup, so
+ * we can now safely remove the device if qemu is waiting for
+ * it to be released
+ */
+if (DECODE_DRC_STATE(*pind, mask, shift) != indicator_state) {
+if (indicator_state == 0 && drc_entry->awaiting_release) {
+/* device_del has been called and host is waiting
+ * for guest to release/isolate device, go ahead
+ * and remove it now
+ */
+spapr_drc_state_reset(drc_entry);
+}
+}
+}
 break;
 case 9002: /* DR */
 shift = INDICATOR_DR_SHIFT;
@@ -816,6 +843,198 @@ static AddressSpace *spapr_pci_dma_iommu(PCIBus *bus, 
void *opaque, int devfn)
 return &phb->iommu_as;
 }
 
+static int spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, int offset,
+   int phb_index)
+{
+int slot = PCI_SLOT(dev->devfn);
+char slotname[16];
+bool is_bridge = 1;
+sPAPRDrcEntry *drc_entry, *drc_entry_slot;
+uint32_t reg[RESOURCE_CELLS_TOTAL * 8] = { 0 };
+uint32_t assigned[RESOURCE_CELLS_TOTAL * 8] = { 0 };
+int reg_size, assigned_size;
+
+drc_entry = spapr_phb_to_drc_entry(phb_index + SPAPR_PCI_BASE_BUID);
+g_assert(drc_entry);
+drc_entry_slot = &drc_entry->child_entries[slot];
+
+if (pci_default_read_config(dev, PCI_HEADER_TYPE, 1) ==
+PCI_HEADER_TYPE_NORMAL) {
+is_bridge = 0;
+}
+
+_FDT(fdt_setprop_cell(fdt, offset, "vendor-id",
+  pci_default_read_config(dev, PCI_VENDOR_ID, 2)));
+_FDT(fdt_setprop_cell(fdt, offset, "device-id",
+  pci_default_read_config(dev, PCI_DEVICE_ID, 2)));
+_FDT(fdt_setprop_cell(fdt, offset, "revision-id",
+  pci_default_read_config(dev, PCI_REVISION_ID, 1)));
+_FDT(fdt_setprop_cell(fdt, offset, "class-code",
+  pci_default_read_config(dev, PCI_CLASS_DEVICE, 2) << 
8));
+
+_FDT(fdt_setprop_cell(fdt, offset, "interrupts",
+  pci_default_read_config(dev, PCI_INTERRUPT_PIN, 1)));
+
+/* if this device is NOT a bridge */
+if (!is_bridge) {
+_FDT(fdt_setprop_cell(fdt, offset, "min-grant",
+pci_default_read_config(dev, PCI_MIN_GNT, 1)));
+_FDT(fdt_setprop_cell(fdt, offset, "max-latency",
+pci_default_read_config(dev, PCI_MAX_LAT, 1)));
+_FDT(fdt_setprop_cell(fdt, offset, "subsystem-id",
+pci_default_read_config(dev, PCI_SUBSYSTEM_ID, 2)));
+_FDT(fdt_setprop_cell(fdt, offset, "subsystem-vendor-id",
+pci_default_read_config(dev, PCI_SUBSYSTEM_VENDOR_ID, 2)));
+}
+
+_FDT(fdt_setprop_cell(fdt, offset, "cache-line-size",
+pci_default_read_config(dev, PCI_CACHE_LINE_SIZE, 1)));
+
+/* the following fdt cells are masked off the pci status register */
+int pci_status = pci_default_read_config(dev, PCI_STATUS, 2);
+_FDT(fdt_setprop_cell(fdt, offset, "devsel-speed",
+  PCI_STATUS_DEVSEL_MASK & pci_status));
+_FDT(fdt_setprop_cell(fd

[Qemu-devel] [PATCH 07/12] spapr_pci: add ibm, configure-connector RTAS interface

2014-08-18 Thread Michael Roth

Signed-off-by: Michael Roth 
---
 hw/ppc/spapr_pci.c | 111 +
 1 file changed, 111 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 8d1351d..96a57be 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -606,6 +606,115 @@ static void rtas_get_sensor_state(PowerPCCPU *cpu, 
sPAPREnvironment *spapr,
 rtas_st(rets, 1, decoded);
 }
 
+/* configure-connector work area offsets, int32_t units */
+#define CC_IDX_NODE_NAME_OFFSET 2
+#define CC_IDX_PROP_NAME_OFFSET 2
+#define CC_IDX_PROP_LEN 3
+#define CC_IDX_PROP_DATA_OFFSET 4
+
+#define CC_VAL_DATA_OFFSET ((CC_IDX_PROP_DATA_OFFSET + 1) * 4)
+#define CC_RET_NEXT_SIB 1
+#define CC_RET_NEXT_CHILD 2
+#define CC_RET_NEXT_PROPERTY 3
+#define CC_RET_PREV_PARENT 4
+#define CC_RET_ERROR RTAS_OUT_HW_ERROR
+#define CC_RET_SUCCESS RTAS_OUT_SUCCESS
+
+static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
+ sPAPREnvironment *spapr,
+ uint32_t token, uint32_t nargs,
+ target_ulong args, uint32_t nret,
+ target_ulong rets)
+{
+uint64_t wa_addr = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 0);
+sPAPRDrcEntry *drc_entry = NULL;
+sPAPRConfigureConnectorState *ccs;
+void *wa_buf;
+int32_t *wa_buf_int;
+hwaddr map_len = 0x1024;
+uint32_t drc_index;
+int rc = 0, next_offset, tag, prop_len, node_name_len;
+const struct fdt_property *prop;
+const char *node_name, *prop_name;
+
+wa_buf = cpu_physical_memory_map(wa_addr, &map_len, 1);
+if (!wa_buf) {
+rc = CC_RET_ERROR;
+goto error_exit;
+}
+wa_buf_int = wa_buf;
+
+drc_index = *(uint32_t *)wa_buf;
+drc_entry = spapr_find_drc_entry(drc_index);
+if (!drc_entry) {
+rc = -1;
+goto error_exit;
+}
+
+ccs = &drc_entry->cc_state;
+if (ccs->state == CC_STATE_PENDING) {
+/* fdt should've been been attached to drc_entry during
+ * realize/hotplug
+ */
+g_assert(ccs->fdt);
+ccs->depth = 0;
+ccs->offset = ccs->offset_start;
+ccs->state = CC_STATE_ACTIVE;
+}
+
+if (ccs->state == CC_STATE_IDLE) {
+rc = -1;
+goto error_exit;
+}
+
+retry:
+tag = fdt_next_tag(ccs->fdt, ccs->offset, &next_offset);
+
+switch (tag) {
+case FDT_BEGIN_NODE:
+ccs->depth++;
+node_name = fdt_get_name(ccs->fdt, ccs->offset, &node_name_len);
+wa_buf_int[CC_IDX_NODE_NAME_OFFSET] = CC_VAL_DATA_OFFSET;
+strcpy(wa_buf + wa_buf_int[CC_IDX_NODE_NAME_OFFSET], node_name);
+rc = CC_RET_NEXT_CHILD;
+break;
+case FDT_END_NODE:
+ccs->depth--;
+if (ccs->depth == 0) {
+/* reached the end of top-level node, declare success */
+ccs->state = CC_STATE_PENDING;
+rc = CC_RET_SUCCESS;
+} else {
+rc = CC_RET_PREV_PARENT;
+}
+break;
+case FDT_PROP:
+prop = fdt_get_property_by_offset(ccs->fdt, ccs->offset, &prop_len);
+prop_name = fdt_string(ccs->fdt, fdt32_to_cpu(prop->nameoff));
+wa_buf_int[CC_IDX_PROP_NAME_OFFSET] = CC_VAL_DATA_OFFSET;
+wa_buf_int[CC_IDX_PROP_LEN] = prop_len;
+wa_buf_int[CC_IDX_PROP_DATA_OFFSET] =
+CC_VAL_DATA_OFFSET + strlen(prop_name) + 1;
+strcpy(wa_buf + wa_buf_int[CC_IDX_PROP_NAME_OFFSET], prop_name);
+memcpy(wa_buf + wa_buf_int[CC_IDX_PROP_DATA_OFFSET],
+   prop->data, prop_len);
+rc = CC_RET_NEXT_PROPERTY;
+break;
+case FDT_END:
+rc = CC_RET_ERROR;
+break;
+default:
+ccs->offset = next_offset;
+goto retry;
+}
+
+ccs->offset = next_offset;
+
+error_exit:
+cpu_physical_memory_unmap(wa_buf, 0x1024, 1, 0x1024);
+rtas_st(rets, 0, rc);
+}
+
 static int pci_spapr_swizzle(int slot, int pin)
 {
 return (slot + pin) % PCI_NUM_PINS;
@@ -1276,6 +1385,8 @@ void spapr_pci_rtas_init(void)
 rtas_get_power_level);
 spapr_rtas_register(RTAS_GET_SENSOR_STATE, "get-sensor-state",
 rtas_get_sensor_state);
+spapr_rtas_register(RTAS_IBM_CONFIGURE_CONNECTOR, 
"ibm,configure-connector",
+rtas_ibm_configure_connector);
 }
 
 static void spapr_pci_register_types(void)
-- 
1.9.1

[Qemu-devel] [PATCH 04/12] spapr_pci: add set-indicator RTAS interface

2014-08-18 Thread Michael Roth

From: Mike Day 

Signed-off-by: Mike Day 
Signed-off-by: Michael Roth 
---
 hw/ppc/spapr_pci.c | 119 +
 include/hw/ppc/spapr.h |   3 ++
 2 files changed, 122 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 924d488..23a3477 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -36,6 +36,16 @@
 
 #include "hw/pci/pci_bus.h"
 
+/* #define DEBUG_SPAPR */
+
+#ifdef DEBUG_SPAPR
+#define DPRINTF(fmt, ...) \
+do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) \
+do { } while (0)
+#endif
+
 /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */
 #define RTAS_QUERY_FN   0
 #define RTAS_CHANGE_FN  1
@@ -47,6 +57,31 @@
 #define RTAS_TYPE_MSI   1
 #define RTAS_TYPE_MSIX  2
 
+/* For set-indicator RTAS interface */
+#define INDICATOR_ISOLATION_MASK0x0001   /* 9001 one bit */
+#define INDICATOR_GLOBAL_INTERRUPT_MASK 0x0002   /* 9005 one bit */
+#define INDICATOR_ERROR_LOG_MASK0x0004   /* 9006 one bit */
+#define INDICATOR_IDENTIFY_MASK 0x0008   /* 9007 one bit */
+#define INDICATOR_RESET_MASK0x0010   /* 9009 one bit */
+#define INDICATOR_DR_MASK   0x00e0   /* 9002 three bits */
+#define INDICATOR_ALLOCATION_MASK   0x0300   /* 9003 two bits */
+#define INDICATOR_EPOW_MASK 0x1c00   /* 9 three bits */
+
+#define INDICATOR_ISOLATION_SHIFT   0x00 /* bit 0 */
+#define INDICATOR_GLOBAL_INTERRUPT_SHIFT0x01 /* bit 1 */
+#define INDICATOR_ERROR_LOG_SHIFT   0x02 /* bit 2 */
+#define INDICATOR_IDENTIFY_SHIFT0x03 /* bit 3 */
+#define INDICATOR_RESET_SHIFT   0x04 /* bit 4 */
+#define INDICATOR_DR_SHIFT  0x05 /* bits 5-7 */
+#define INDICATOR_ALLOCATION_SHIFT  0x08 /* bits 8-9 */
+#define INDICATOR_EPOW_SHIFT0x0a /* bits 10-12 */
+
+#define DECODE_DRC_STATE(state, m, s)  \
+uint32_t)(state) & (uint32_t)(m))) >> (s))
+
+#define ENCODE_DRC_STATE(val, m, s) \
+(((uint32_t)(val) << (s)) & (uint32_t)(m))
+
 static sPAPRPHBState *find_phb(sPAPREnvironment *spapr, uint64_t buid)
 {
 sPAPRPHBState *sphb;
@@ -402,6 +437,80 @@ static void 
rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
 rtas_st(rets, 2, 1);/* 0 == level; 1 == edge */
 }
 
+static void rtas_set_indicator(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+   uint32_t token, uint32_t nargs,
+   target_ulong args, uint32_t nret,
+   target_ulong rets)
+{
+uint32_t indicator = rtas_ld(args, 0);
+uint32_t drc_index = rtas_ld(args, 1);
+uint32_t indicator_state = rtas_ld(args, 2);
+uint32_t encoded = 0, shift = 0, mask = 0;
+uint32_t *pind;
+sPAPRDrcEntry *drc_entry = NULL;
+
+if (drc_index == 0) { /* platform indicator */
+pind = &spapr->state;
+} else {
+drc_entry = spapr_find_drc_entry(drc_index);
+if (!drc_entry) {
+DPRINTF("rtas_set_indicator: unable to find drc_entry for %x",
+drc_index);
+rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+return;
+}
+pind = &drc_entry->state;
+}
+
+switch (indicator) {
+case 9:  /* EPOW */
+shift = INDICATOR_EPOW_SHIFT;
+mask = INDICATOR_EPOW_MASK;
+break;
+case 9001: /* Isolation state */
+/* encode the new value into the correct bit field */
+shift = INDICATOR_ISOLATION_SHIFT;
+mask = INDICATOR_ISOLATION_MASK;
+break;
+case 9002: /* DR */
+shift = INDICATOR_DR_SHIFT;
+mask = INDICATOR_DR_MASK;
+break;
+case 9003: /* Allocation State */
+shift = INDICATOR_ALLOCATION_SHIFT;
+mask = INDICATOR_ALLOCATION_MASK;
+break;
+case 9005: /* global interrupt */
+shift = INDICATOR_GLOBAL_INTERRUPT_SHIFT;
+mask = INDICATOR_GLOBAL_INTERRUPT_MASK;
+break;
+case 9006: /* error log */
+shift = INDICATOR_ERROR_LOG_SHIFT;
+mask = INDICATOR_ERROR_LOG_MASK;
+break;
+case 9007: /* identify */
+shift = INDICATOR_IDENTIFY_SHIFT;
+mask = INDICATOR_IDENTIFY_MASK;
+break;
+case 9009: /* reset */
+shift = INDICATOR_RESET_SHIFT;
+mask = INDICATOR_RESET_MASK;
+break;
+default:
+DPRINTF("rtas_set_indicator: indicator not implemented: %d",
+indicator);
+rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
+return;
+}
+
+encoded = ENCODE_DRC_STATE(indicator_state, mask, shift);
+/* clear the current indicator value */
+*pind &= ~mask;
+/* set the new value */
+*pind |= encoded;
+rtas_st(rets, 0, RTAS_OUT_SUCCESS);
+}
+
 static int pci_spapr_swizzle(int slot, int pin)
 {

[Qemu-devel] [PATCH v3 00/12] spapr: add support for pci hotplug

2014-08-18 Thread Michael Roth

These patches are based on ppc-next, and can also be obtained from:

https://github.com/mdroth/qemu/commits/spapr-pci-hotplug-v3-ppc-next

v3:
 * dropped emulation of firmware-managed BAR allocation. this will be
   introduced via a follow-up series via a -machine flag and tied to
   a separate hotplug event to avoid a race condition with guest vs.
   "firmware"-managed BAR allocation, in conjunction with required
   fixes to rpaphp hotplug kernel module to utilize this mode.
 * moved drc_table into sPAPREnvironment (Alexey)
 * moved INDICATOR_* constants and friends into spapr_pci.c (Alexey)
 * use prefixes for global types (DrcEntry/ConfigureConnectorState) (Alexey)
 * updated for new hotplug interface (Alexey)
 * fixed get-power-level to report current power-level rather than
   desired (Alexey)
 * rebased to latest ppc-next

v2:
  * re-ordered patches to fix build bisectability (Alexey)
  * replaced g_warning with DPRINTF in RTAS calls for guest errors (Alexey)
  * replaced g_warning with fprintf for qemu errors (Alexey)
  * updated RTAS calls to use pre-existing error/success macros (Alexey)
  * replaced DR_*/SENSOR_* macros with INDICATOR_* for set-indicator/
get-sensor-state (Alexey)

OVERVIEW

These patches add support for PCI hotplug for SPAPR guests. We advertise
each PHB as DR-capable (as defined by PAPR 13.5/13.6) with 32 hotpluggable
PCI slots per PHB, which models a standard PCI expansion device for Power
machines where the DRC name/loc-code/index for each slot are generated
based on bus/slot number.

This is compatible with existing guest kernel's via the rpaphp hotplug
module, and existing userspace tools such as drmgr/librtas/rtas_errd for
managing devices, in theory...

NOTES / ADDITIONAL DEPENDENCIES

This series relies on v1.2.19 or later of powerppc-utils (drmgr, rtas_errd,
ppc64-diag, and librtas components, specificially), which will automate
guest-side hotplug setup in response to an EPOW event emitted by QEMU. For
guests with older versions of powerpc-utils, a manual workaround must be
used (documented below).

PATCH LAYOUT

Patches
1-3   advertise PHBs and associated slots as hotpluggable to guests
4-7   add RTAS interfaces required for device configuration
8 fix for ppc (and other) guests that allocate IO bars starting
  at 0x0
9 enables device_add/device_del for spapr machines and
  guest-driven hotplug
10-12 define hotplug event structure and emit them in response to
  device_add/device_del

USAGE

For guests with powerpc-utils 1.2.19+:
  hotplug:
qemu:
  device_add e1000,id=slot0
  unplug:
qemu:
  device_del slot0

For guests with powerpc-utils prior to 1.2.19:
  hotplug:
qemu:
  device_add e1000,id=slot0
guest:
  drmgr -c pci -s "Slot 0" -n -a
  echo 1 >/sys/bus/pci/rescan
  unplug:
guest:
  drmgr -c pci -s "Slot 0" -n -r
  echo 1 >/sys/bus/pci/devices/:00:00.0/remove
qemu:
  device_del slot0

 hw/pci/pci.c|   2 +-
 hw/ppc/spapr.c  | 172 +-
 hw/ppc/spapr_events.c   | 224 
 hw/ppc/spapr_pci.c  | 689 
+--
 include/hw/pci-host/spapr.h |   1 +
 include/hw/ppc/spapr.h  |  46 -
 6 files changed, 1083 insertions(+), 51 deletions(-)

[Qemu-devel] [PATCH 02/12] spapr_pci: populate DRC dt entries for PHBs

2014-08-18 Thread Michael Roth

Reserve 32 entries of type PCI in each PHB's initial FDT. This
advertises to guests that each PHB is DR-capable device with
physical hotpluggable slots. This is necessary for allowing
hotplugging of devices to it later via bus rescan or guest rpaphp
hotplug module.

Each entry is assigned a name of "Slot <*32 +1>",
advertised as a hotpluggable PCI slot, and assigned to power domain
-1 to indicate to the guest that power management is handled by the
hardware.

This models a DR-capable PCI expansion device attached to a host/lpar
via a single PHB with 32 physical hotpluggable slots (as opposed to a
virtual bridge device with external management console). Hotplug will
be handled by the guest via bus rescan or the rpaphp hotplug module.

Signed-off-by: Michael Roth 
---
 hw/ppc/spapr.c  |   3 +-
 hw/ppc/spapr_pci.c  | 102 
 include/hw/pci-host/spapr.h |   1 +
 3 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d5e46c3..90b25b3 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -890,7 +890,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
 QLIST_FOREACH(phb, &spapr->phbs, list) {
 drc_entry = spapr_phb_to_drc_entry(phb->buid);
 g_assert(drc_entry);
-ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt);
+ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, drc_entry->drc_index,
+fdt);
 }
 
 if (ret < 0) {
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index e85134f..924d488 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -851,8 +851,104 @@ static int spapr_phb_children_dt(Object *child, void 
*opaque)
 return 1;
 }
 
+static void spapr_create_drc_phb_dt_entries(void *fdt, int bus_off, int 
phb_index)
+{
+char char_buf[1024];
+uint32_t int_buf[SPAPR_DRC_PHB_SLOT_MAX + 1];
+uint32_t *entries;
+int i, ret, offset;
+
+/* ibm,drc-indexes */
+memset(int_buf, 0 , sizeof(int_buf));
+int_buf[0] = SPAPR_DRC_PHB_SLOT_MAX;
+
+for (i = 1; i <= SPAPR_DRC_PHB_SLOT_MAX; i++) {
+int_buf[i] = SPAPR_DRC_DEV_ID_BASE + (phb_index << 8) + ((i - 1) << 3);
+}
+
+ret = fdt_setprop(fdt, bus_off, "ibm,drc-indexes", int_buf,
+  sizeof(int_buf));
+if (ret) {
+fprintf(stderr, "error adding 'ibm,drc-indexes' field for PHB FDT");
+}
+
+/* ibm,drc-power-domains */
+memset(int_buf, 0, sizeof(int_buf));
+int_buf[0] = SPAPR_DRC_PHB_SLOT_MAX;
+
+for (i = 1; i <= SPAPR_DRC_PHB_SLOT_MAX; i++) {
+int_buf[i] = 0x;
+}
+
+ret = fdt_setprop(fdt, bus_off, "ibm,drc-power-domains", int_buf,
+  sizeof(int_buf));
+if (ret) {
+fprintf(stderr,
+"error adding 'ibm,drc-power-domains' field for PHB FDT");
+}
+
+/* ibm,drc-names */
+memset(char_buf, 0, sizeof(char_buf));
+entries = (uint32_t *)&char_buf[0];
+*entries = SPAPR_DRC_PHB_SLOT_MAX;
+offset = sizeof(*entries);
+
+for (i = 1; i <= SPAPR_DRC_PHB_SLOT_MAX; i++) {
+offset += sprintf(char_buf + offset, "Slot %d",
+  (phb_index * SPAPR_DRC_PHB_SLOT_MAX) + i - 1);
+char_buf[offset++] = '\0';
+}
+
+ret = fdt_setprop(fdt, bus_off, "ibm,drc-names", char_buf, offset);
+if (ret) {
+fprintf(stderr, "error adding 'ibm,drc-names' field for PHB FDT");
+}
+
+/* ibm,drc-types */
+memset(char_buf, 0, sizeof(char_buf));
+entries = (uint32_t *)&char_buf[0];
+*entries = SPAPR_DRC_PHB_SLOT_MAX;
+offset = sizeof(*entries);
+
+for (i = 0; i < SPAPR_DRC_PHB_SLOT_MAX; i++) {
+offset += sprintf(char_buf + offset, "28");
+char_buf[offset++] = '\0';
+}
+
+ret = fdt_setprop(fdt, bus_off, "ibm,drc-types", char_buf, offset);
+if (ret) {
+fprintf(stderr, "error adding 'ibm,drc-types' field for PHB FDT");
+}
+
+/* we want the initial indicator state to be 0 - "empty", when we
+ * hot-plug an adaptor in the slot, we need to set the indicator
+ * to 1 - "present."
+ */
+
+/* ibm,indicator-9003 */
+memset(int_buf, 0, sizeof(int_buf));
+int_buf[0] = SPAPR_DRC_PHB_SLOT_MAX;
+
+ret = fdt_setprop(fdt, bus_off, "ibm,indicator-9003", int_buf,
+  sizeof(int_buf));
+if (ret) {
+fprintf(stderr, "error adding 'ibm,indicator-9003' field for PHB FDT");
+}
+
+/* ibm,sensor-9003 */
+memset(int_buf, 0, sizeof(int_buf));
+int_buf[0] = SPAPR_DRC_PHB_SLOT_MAX;
+
+ret = fdt_setprop(fdt, bus_off, "ibm,sensor-9003", int_buf,
+  sizeof(int_buf));
+if (ret) {
+fprintf(stderr, "error adding 'ibm,sensor-9003' field for PHB FDT");
+}
+}
+
 int spapr_populate_pci_dt(sPAPRPHBState *phb,
   uint32_t xics_phandle,
+  uint32_t drc_index,
   void

Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation

2014-08-18 Thread Joel Schopp


On 08/18/2014 05:11 PM, Peter Maydell wrote:
> On 18 August 2014 22:54, Joel Schopp  wrote:
>> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
>> +{
>> +PlatformDevtreeData *data = opaque;
>> +void *fdt = data->fdt;
>> +const char *parent_node = data->node;
>> +int compat_str_len;
>> +char *nodename;
>> +int i, ret;
>> +uint32_t *irq_attr;
>> +uint64_t *reg_attr;
>> +uint64_t mmio_base;
>> +uint64_t irq_number;
>> +gchar mmio_base_prop[8];
>> +gchar irq_number_prop[8];
>> +VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
>> +VFIODevice *vbasedev = &vdev->vbasedev;
>> +Object *obj = OBJECT(sbdev);
>> +
>> +mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
>> +
>> +nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
>> +   vbasedev->name,
>> +   mmio_base);
>> +
>> +qemu_fdt_add_subnode(fdt, nodename);
>> +
>> +compat_str_len = strlen(vdev->compat) + 1;
> At this point you've already substituted the NULs in,
> so you can't call strlen(), I think.
>
>> +qemu_fdt_setprop(fdt, nodename, "compatible",
>> +vdev->compat, compat_str_len);
>> +
>> +reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
>> +
>> +for (i = 0; i < vbasedev->num_regions; i++) {
>> +snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
>> +mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
>> +reg_attr[2*i] = 1;
>> +reg_attr[2*i+1] = mmio_base;
>> +reg_attr[2*i+2] = 1;
>> +reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
>> +}
>>
>> This should be 4 instead of 2.
>> Also, to support 64 bit systems I think this should be 2 instead of 1.
> Actually it depends entirely on what the board has done to
> create the device tree node that we're inserting this child
> node into. For ARM boot.c sets both #address-cells and
> #size-cells to 2 regardless of whether the system is 32
> or 64 bits, for simplicity. I imagine PPC does something
> different. If we're editing a dtb that the user passed in (which
> I think would be pretty lunatic so we shouldn't do this)
> we'd have to actually walk the dtb to try to figure out what
> the semantics of the reg property should be.
For the index [2*i],[2*i+1], etc is clearly a bug as when i = 1 it will
overwrite two of the values.  Changing that to [4*i],[4*i+1],etc fixes it.

I think you are right on the size.  I also wonder if the user doesn't
pass in a dtb if qemu should try to recreate the device-tree entry from
the platform device entry in the host kernel?  If so would that best be
done by recreating the values from /proc/device-tree ?

I also wish that qemu had a flag to output the generated dtb to a file
much like lkvm (kvmtool) has.

Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation

2014-08-18 Thread Peter Maydell

On 18 August 2014 22:54, Joel Schopp  wrote:
>
> +static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
> +{
> +PlatformDevtreeData *data = opaque;
> +void *fdt = data->fdt;
> +const char *parent_node = data->node;
> +int compat_str_len;
> +char *nodename;
> +int i, ret;
> +uint32_t *irq_attr;
> +uint64_t *reg_attr;
> +uint64_t mmio_base;
> +uint64_t irq_number;
> +gchar mmio_base_prop[8];
> +gchar irq_number_prop[8];
> +VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
> +VFIODevice *vbasedev = &vdev->vbasedev;
> +Object *obj = OBJECT(sbdev);
> +
> +mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
> +
> +nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
> +   vbasedev->name,
> +   mmio_base);
> +
> +qemu_fdt_add_subnode(fdt, nodename);
> +
> +compat_str_len = strlen(vdev->compat) + 1;

At this point you've already substituted the NULs in,
so you can't call strlen(), I think.

> +qemu_fdt_setprop(fdt, nodename, "compatible",
> +vdev->compat, compat_str_len);
> +
> +reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
> +
> +for (i = 0; i < vbasedev->num_regions; i++) {
> +snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
> +mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
> +reg_attr[2*i] = 1;
> +reg_attr[2*i+1] = mmio_base;
> +reg_attr[2*i+2] = 1;
> +reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
> +}
>
> This should be 4 instead of 2.
> Also, to support 64 bit systems I think this should be 2 instead of 1.

Actually it depends entirely on what the board has done to
create the device tree node that we're inserting this child
node into. For ARM boot.c sets both #address-cells and
#size-cells to 2 regardless of whether the system is 32
or 64 bits, for simplicity. I imagine PPC does something
different. If we're editing a dtb that the user passed in (which
I think would be pretty lunatic so we shouldn't do this)
we'd have to actually walk the dtb to try to figure out what
the semantics of the reg property should be.

thanks
-- PMM

[Qemu-devel] [RFC 3/4] ide: update ide_drive_get to work with both PCI-IDE and AHCI interfaces

2014-08-18 Thread John Snow

Signed-off-by: John Snow 
---
 hw/i386/pc_piix.c |  2 +-
 hw/ide/core.c | 11 +++
 include/hw/ide.h  |  3 ++-
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 47ac1b5..9da6f0e 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -239,7 +239,7 @@ static void pc_init1(MachineState *machine,
 
 pc_nic_init(isa_bus, pci_bus);
 
-ide_drive_get(hd, MAX_IDE_BUS);
+ide_drive_get(hd, IF_IDE, MAX_IDE_BUS);
 if (pci_enabled) {
 PCIDevice *dev;
 if (xen_enabled()) {
diff --git a/hw/ide/core.c b/hw/ide/core.c
index b48127f..408d7c1 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2481,16 +2481,19 @@ const VMStateDescription vmstate_ide_bus = {
 }
 };
 
-void ide_drive_get(DriveInfo **hd, int max_bus)
+void ide_drive_get(DriveInfo **hd, BlockInterfaceType type, int max_bus)
 {
 int i;
 
-if (drive_get_max_bus(IF_IDE) >= max_bus) {
+if (drive_get_max_bus(type) >= max_bus) {
 fprintf(stderr, "qemu: too many IDE bus: %d\n", max_bus);
 exit(1);
 }
 
-for(i = 0; i < max_bus * MAX_IDE_DEVS; i++) {
-hd[i] = drive_get(IF_IDE, i / MAX_IDE_DEVS, i % MAX_IDE_DEVS);
+for (i = 0; i < max_bus * if_get_max_devs(type); i++) {
+fprintf(stderr, "hd[%u] := drive_get_by_index(%d, %u);\n",
+i, type, i);
+hd[i] = drive_get_by_index(type, i);
+fprintf(stderr, "hd[%u] := %p\n", i, (void *)hd[i]);
 }
 }
diff --git a/include/hw/ide.h b/include/hw/ide.h
index bc8bd32..c4400ab 100644
--- a/include/hw/ide.h
+++ b/include/hw/ide.h
@@ -4,6 +4,7 @@
 #include "hw/isa/isa.h"
 #include "hw/pci/pci.h"
 #include "exec/memory.h"
+#include "sysemu/blockdev.h"
 
 #define MAX_IDE_DEVS   2
 
@@ -28,6 +29,6 @@ int ide_get_geometry(BusState *bus, int unit,
 int ide_get_bios_chs_trans(BusState *bus, int unit);
 
 /* ide/core.c */
-void ide_drive_get(DriveInfo **hd, int max_bus);
+void ide_drive_get(DriveInfo **hd, BlockInterfaceType type, int max_bus);
 
 #endif /* HW_IDE_H */
-- 
1.9.3

[Qemu-devel] [RFC 2/4] blockdev: add IF_AHCI to support -cdrom and -hd[a-d]

2014-08-18 Thread John Snow

Signed-off-by: John Snow 
---
 blockdev.c| 9 ++---
 include/sysemu/blockdev.h | 2 +-
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 58da77f..a9efe1f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -57,6 +57,7 @@ static const char *const if_name[IF_COUNT] = {
 [IF_SD] = "sd",
 [IF_VIRTIO] = "virtio",
 [IF_XEN] = "xen",
+[IF_AHCI] = "ahci",
 };
 
 static const int if_max_devs[IF_COUNT] = {
@@ -76,6 +77,7 @@ static const int if_max_devs[IF_COUNT] = {
  */
 [IF_IDE] = 2,
 [IF_SCSI] = 7,
+[IF_AHCI] = 1,
 };
 
 /*
@@ -876,7 +878,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 if (qemu_opts_id(all_opts) == NULL) {
 char *new_id;
 const char *mediastr = "";
-if (type == IF_IDE || type == IF_SCSI) {
+if (type == IF_IDE || type == IF_SCSI || type == IF_AHCI) {
 mediastr = (media == MEDIA_CDROM) ? "-cd" : "-hd";
 }
 if (max_devs) {
@@ -918,7 +920,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 werror = qemu_opt_get(legacy_opts, "werror");
 if (werror != NULL) {
 if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO &&
-type != IF_NONE) {
+type != IF_NONE && type != IF_AHCI) {
 error_report("werror is not supported by this bus type");
 goto fail;
 }
@@ -928,7 +930,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 rerror = qemu_opt_get(legacy_opts, "rerror");
 if (rerror != NULL) {
 if (type != IF_IDE && type != IF_VIRTIO && type != IF_SCSI &&
-type != IF_NONE) {
+type != IF_NONE && type != IF_AHCI) {
 error_report("rerror is not supported by this bus type");
 goto fail;
 }
@@ -966,6 +968,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type)
 
 switch(type) {
 case IF_IDE:
+case IF_AHCI:
 case IF_SCSI:
 case IF_XEN:
 case IF_NONE:
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 2420abd..6b02cf4 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -25,7 +25,7 @@ typedef enum {
  */
 IF_IDE = 0,
 IF_NONE,
-IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN,
+IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN, IF_AHCI,
 IF_COUNT
 } BlockInterfaceType;
 
-- 
1.9.3

[Qemu-devel] [RFC 0/4] Adding -cdrom, -hd[abcd] and -drive file=... to Q35

2014-08-18 Thread John Snow

Currently, the drive definitions created by drive_new() when using
the -drive file=...[,if=ide] or -cdrom or -hd[abcd] options are not
picked up by the Q35 initialization routine.

To fix this, we have to add hooks to search for these drives using
something like pc_piix's ide_drive_get and then add them using
something like pci_ide_create_devs.

Where it gets slightly wonky is the fact that if=ide is specified
to use two devices per bus, whereas AHCI does not utilize that
same master/slave mechanic. Therefore, many places inside of
blockdev.c where we add and define new drives use incorrect math
for AHCI devices and try to place them on impossible buses.
Notably -hdb and -hdd would become inaccessible.

To remedy this, I added a new interface type, IF_AHCI. Corresponding
to this change, I modified the default machine properties for Q35
to use this interface as a default.

The changes appear to work well, but where I'd like some feedback
is what should happen if people do something like:

qemu -M q35 -drive if=ide,file=fedora.qcow2

The code as presented here is not going to look for or attempt to
connect IDE devices, because it is now looking for /AHCI/ devices.

At worst, this may break a few existing scripts, but I actually think
that since the if=ide,file=... shorthand never worked to begin with,
the impact might actually be minimal.

But since the legacy IDE interface of the ICH9 is as of yet unemulated,
the if=ide drives don't have a reasonable place to go yet. I am also
not sure what a reasonable way to handle people specifying BOTH
if=ide and if=ahci drives would be.

John Snow (4):
  blockdev: add if_get_max_devs
  blockdev: add IF_AHCI to support -cdrom and -hd[a-d]
  ide: update ide_drive_get to work with both PCI-IDE and AHCI
interfaces
  ahci: implement -cdrom and -hd[a-d]

 blockdev.c| 18 +++---
 hw/i386/pc_piix.c |  2 +-
 hw/i386/pc_q35.c  |  4 
 hw/ide/ahci.c | 17 +
 hw/ide/ahci.h |  2 ++
 hw/ide/core.c | 11 +++
 include/hw/ide.h  |  3 ++-
 include/sysemu/blockdev.h |  3 ++-
 8 files changed, 50 insertions(+), 10 deletions(-)

-- 
1.9.3

[Qemu-devel] [RFC 4/4] ahci: implement -cdrom and -hd[a-d]

2014-08-18 Thread John Snow

Signed-off-by: John Snow 
---
 hw/i386/pc_q35.c |  4 
 hw/ide/ahci.c| 17 +
 hw/ide/ahci.h|  2 ++
 3 files changed, 23 insertions(+)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 4b5a274..4613565 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -86,6 +86,7 @@ static void pc_q35_init(MachineState *machine)
 DeviceState *icc_bridge;
 PcGuestInfo *guest_info;
 ram_addr_t lowmem;
+DriveInfo *hd[MAX_SATA_PORTS];
 
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
  * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
@@ -253,6 +254,8 @@ static void pc_q35_init(MachineState *machine)
true, "ich9-ahci");
 idebus[0] = qdev_get_child_bus(&ahci->qdev, "ide.0");
 idebus[1] = qdev_get_child_bus(&ahci->qdev, "ide.1");
+ide_drive_get(hd, IF_AHCI, MAX_SATA_PORTS);
+ahci_ide_create_devs(ahci, hd);
 
 if (usb_enabled(false)) {
 /* Should we create 6 UHCI according to ich9 spec? */
@@ -354,6 +357,7 @@ static QEMUMachine pc_q35_machine_v2_2 = {
 .name = "pc-q35-2.2",
 .alias = "q35",
 .init = pc_q35_init,
+.block_default_type = IF_AHCI,
 };
 
 #define PC_Q35_2_1_MACHINE_OPTIONS PC_Q35_2_2_MACHINE_OPTIONS
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 4cda0d0..bc8b8e9 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -1403,3 +1403,20 @@ static void sysbus_ahci_register_types(void)
 }
 
 type_init(sysbus_ahci_register_types)
+
+void ahci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table)
+{
+AHCIPCIState *d = ICH_AHCI(dev);
+AHCIState *ahci = &d->ahci;
+unsigned i;
+
+for (i = 0; i < ahci->ports; i++) {
+if (hd_table[i] == NULL) {
+continue;
+}
+fprintf(stderr, "ahci_ide_create_devs: ide_create_drive(bus:%p, 0, 
drive[index:%u])\n",
+(void *)&ahci->dev[i].port, i);
+ide_create_drive(&ahci->dev[i].port, 0, hd_table[i]);
+}
+
+}
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index 1543df7..41daedb 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -332,4 +332,6 @@ void ahci_uninit(AHCIState *s);
 
 void ahci_reset(AHCIState *s);
 
+void ahci_ide_create_devs(PCIDevice *dev, DriveInfo **hd_table);
+
 #endif /* HW_IDE_AHCI_H */
-- 
1.9.3

[Qemu-devel] [RFC 1/4] blockdev: add if_get_max_devs

2014-08-18 Thread John Snow

Signed-off-by: John Snow 
---
 blockdev.c| 9 +
 include/sysemu/blockdev.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 48bd9a3..58da77f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -110,6 +110,15 @@ void blockdev_auto_del(BlockDriverState *bs)
 }
 }
 
+int if_get_max_devs(BlockInterfaceType type)
+{
+if (type >= IF_IDE && type < IF_COUNT) {
+return if_max_devs[type];
+}
+
+return 0;
+}
+
 static int drive_index_to_bus_id(BlockInterfaceType type, int index)
 {
 int max_devs = if_max_devs[type];
diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 23a5d10..2420abd 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -48,6 +48,7 @@ struct DriveInfo {
 DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit);
 DriveInfo *drive_get_by_index(BlockInterfaceType type, int index);
 int drive_get_max_bus(BlockInterfaceType type);
+int if_get_max_devs(BlockInterfaceType type);
 DriveInfo *drive_get_next(BlockInterfaceType type);
 DriveInfo *drive_get_by_blockdev(BlockDriverState *bs);
 
-- 
1.9.3

[Qemu-devel] [PATCH] image-fuzzer: Trivial readability and formatting improvements

2014-08-18 Thread Maria Kustova

Signed-off-by: Maria Kustova 
---
 tests/image-fuzzer/qcow2/fuzz.py | 15 ++--
 tests/image-fuzzer/runner.py | 51 
 2 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/tests/image-fuzzer/qcow2/fuzz.py b/tests/image-fuzzer/qcow2/fuzz.py
index 6e272c6..c652dc9 100644
--- a/tests/image-fuzzer/qcow2/fuzz.py
+++ b/tests/image-fuzzer/qcow2/fuzz.py
@@ -123,7 +123,7 @@ def string_validator(current, strings):
 return validator(current, random.choice, strings)
 
 
-def selector(current, constraints, validate=int_validator):
+def selector(current, constraints, validate=None):
 """Select one value from all defined by constraints.
 
 Each constraint produces one random value satisfying to it. The function
@@ -131,6 +131,9 @@ def selector(current, constraints, validate=int_validator):
 constraints overlaps).
 """
 
+if validate is None:
+validate = int_validator
+
 def iter_validate(c):
 """Apply validate() only to constraints represented as lists.
 
@@ -337,9 +340,8 @@ def l1_entry(current):
 constraints = UINT64_V
 # Reserved bits are ignored
 # Added a possibility when only flags are fuzzed
-offset = 0x7fff & random.choice([selector(current,
-  constraints),
- current])
+offset = 0x7fff & \
+ random.choice([selector(current, constraints), current])
 is_cow = random.randint(0, 1)
 return offset + (is_cow << UINT64_M)
 
@@ -349,9 +351,8 @@ def l2_entry(current):
 constraints = UINT64_V
 # Reserved bits are ignored
 # Add a possibility when only flags are fuzzed
-offset = 0x3ffe & random.choice([selector(current,
-  constraints),
- current])
+offset = 0x3ffe & \
+ random.choice([selector(current, constraints), current])
 is_compressed = random.randint(0, 1)
 is_cow = random.randint(0, 1)
 is_zero = random.randint(0, 1)
diff --git a/tests/image-fuzzer/runner.py b/tests/image-fuzzer/runner.py
index fd97c40..b142577 100755
--- a/tests/image-fuzzer/runner.py
+++ b/tests/image-fuzzer/runner.py
@@ -73,7 +73,7 @@ def run_app(fd, q_args):
 """Exception for signal.alarm events."""
 pass
 
-def handler(*arg):
+def handler(*args):
 """Notify that an alarm event occurred."""
 raise Alarm
 
@@ -137,8 +137,8 @@ class TestEnv(object):
 self.init_path = os.getcwd()
 self.work_dir = work_dir
 self.current_dir = os.path.join(work_dir, 'test-' + test_id)
-self.qemu_img = os.environ.get('QEMU_IMG', 'qemu-img')\
-  .strip().split(' ')
+self.qemu_img = \
+os.environ.get('QEMU_IMG', 'qemu-img').strip().split(' ')
 self.qemu_io = os.environ.get('QEMU_IO', 'qemu-io').strip().split(' ')
 strings = ['%s%p%x%d', '.1024d', '%.2049d', '%p%p%p%p', '%x%x%x%x',
'%d%d%d%d', '%s%s%s%s', '%999s', '%08x', '%%20d',
@@ -247,10 +247,8 @@ class TestEnv(object):
 
 os.chdir(self.current_dir)
 backing_file_name, backing_file_fmt = self._create_backing_file()
-img_size = image_generator.create_image('test.img',
-backing_file_name,
-backing_file_fmt,
-fuzz_config)
+img_size = image_generator.create_image(
+'test.img', backing_file_name, backing_file_fmt, fuzz_config)
 for item in commands:
 shutil.copy('test.img', 'copy.img')
 # 'off' and 'len' are multiple of the sector size
@@ -263,7 +261,7 @@ class TestEnv(object):
 elif item[0] == 'qemu-io':
 current_cmd = list(self.qemu_io)
 else:
-multilog("Warning: test command '%s' is not defined.\n" \
+multilog("Warning: test command '%s' is not defined.\n"
  % item[0], sys.stderr, self.log, self.parent_log)
 continue
 # Replace all placeholders with their real values
@@ -279,29 +277,30 @@ class TestEnv(object):
"Backing file: %s\n" \
% (self.seed, " ".join(current_cmd),
   self.current_dir, backing_file_name)
-
 temp_log = StringIO.StringIO()
 try:
 retcode = run_app(temp_log, current_cmd)
 except OSError, e:
-multilog(test_summary + "Error: Start of '%s' failed. " \
- "Reason: %s\n\n" % (os.path.basename(
- current_cmd[0]), e[1]),
+multilog(test_summary +
+

Re: [Qemu-devel] [PATCH v5 10/10] hw/arm/dyn_sysbus_devtree: enable simple VFIO dynamic instantiation

2014-08-18 Thread Joel Schopp


+static void vfio_fdt_add_device_node(SysBusDevice *sbdev, void *opaque)
+{
+PlatformDevtreeData *data = opaque;
+void *fdt = data->fdt;
+const char *parent_node = data->node;
+int compat_str_len;
+char *nodename;
+int i, ret;
+uint32_t *irq_attr;
+uint64_t *reg_attr;
+uint64_t mmio_base;
+uint64_t irq_number;
+gchar mmio_base_prop[8];
+gchar irq_number_prop[8];
+VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(sbdev);
+VFIODevice *vbasedev = &vdev->vbasedev;
+Object *obj = OBJECT(sbdev);
+
+mmio_base = object_property_get_int(obj, "mmio[0]", NULL);
+
+nodename = g_strdup_printf("%s/%s@%" PRIx64, parent_node,
+   vbasedev->name,
+   mmio_base);
+
+qemu_fdt_add_subnode(fdt, nodename);
+
+compat_str_len = strlen(vdev->compat) + 1;
+qemu_fdt_setprop(fdt, nodename, "compatible",
+vdev->compat, compat_str_len);
+
+reg_attr = g_new(uint64_t, vbasedev->num_regions*4);
+
+for (i = 0; i < vbasedev->num_regions; i++) {
+snprintf(mmio_base_prop, sizeof(mmio_base_prop), "mmio[%d]", i);
+mmio_base = object_property_get_int(obj, mmio_base_prop, NULL);
+reg_attr[2*i] = 1;
+reg_attr[2*i+1] = mmio_base;
+reg_attr[2*i+2] = 1;
+reg_attr[2*i+3] = memory_region_size(&vdev->regions[i]->mem);
+}

This should be 4 instead of 2. 
Also, to support 64 bit systems I think this should be 2 instead of 1.

[Qemu-devel] [PATCH 2/2] fuzz: Make fuzzing functions and values relevant to the qemu implementation

2014-08-18 Thread Maria Kustova

Heuristic values were added to fuzzing constraints and vectors.

Signed-off-by: Maria Kustova 
---
 tests/image-fuzzer/qcow2/fuzz.py | 71 +---
 1 file changed, 37 insertions(+), 34 deletions(-)

diff --git a/tests/image-fuzzer/qcow2/fuzz.py b/tests/image-fuzzer/qcow2/fuzz.py
index 5852b4d..6e272c6 100644
--- a/tests/image-fuzzer/qcow2/fuzz.py
+++ b/tests/image-fuzzer/qcow2/fuzz.py
@@ -17,6 +17,7 @@
 #
 
 import random
+from sys import maxint as INT_MAX
 
 UINT8 = 0xff
 UINT16 = 0x
@@ -25,16 +26,21 @@ UINT64 = 0x
 # Most significant bit orders
 UINT32_M = 31
 UINT64_M = 63
+# Sizes
+UINT64_S = 8
 # Fuzz vectors
-UINT8_V = [0, 0x10, UINT8/4, UINT8/2 - 1, UINT8/2, UINT8/2 + 1, UINT8 - 1,
+UINT8_V = [0, 1, 0x10, UINT8/4, UINT8/2 - 1, UINT8/2, UINT8/2 + 1, UINT8 - 1,
UINT8]
-UINT16_V = [0, 0x100, 0x1000, UINT16/4, UINT16/2 - 1, UINT16/2, UINT16/2 + 1,
-UINT16 - 1, UINT16]
-UINT32_V = [0, 0x100, 0x1000, 0x1, 0x10, UINT32/4, UINT32/2 - 1,
-UINT32/2, UINT32/2 + 1, UINT32 - 1, UINT32]
-UINT64_V = UINT32_V + [0x100, 0x1000, 0x1, UINT64/4,
-   UINT64/2 - 1, UINT64/2, UINT64/2 + 1, UINT64 - 1,
-   UINT64]
+UINT16_V = [0, 1, 0x100, 0x1000, UINT16/4, UINT16/2 - 1, UINT16/2,
+UINT16/2 + 1, UINT16 - 1, UINT16]
+UINT32_V = UINT16_V + [UINT16 + 1, UINT16 + 2, 0x1, 0x10,
+   0x100, 0x1000, UINT32/4, UINT32/2 - 1,
+   UINT32/2, UINT32/2 + 1, UINT32 - 1, UINT32]
+# Exclude the vector of 16 bit values
+UINT64_V = UINT32_V[len(UINT16_V):] + \
+   [0, 1, UINT32 + 1, UINT32 + 2, 0x1, INT_MAX/UINT64_S - 1,
+INT_MAX / UINT64_S, INT_MAX/UINT64_S + 1, UINT64/4,
+UINT64/2 - 1, UINT64/2, UINT64/2 + 1, UINT64 - 1, UINT64]
 STRING_V = ['%s%p%x%d', '.1024d', '%.2049d', '%p%p%p%p', '%x%x%x%x',
 '%d%d%d%d', '%s%s%s%s', '%999s', '%08x', '%%20d', '%%20n',
 '%%20x', '%%20s', '%s%s%s%s%s%s%s%s%s%s', '%p%p%p%p%p%p%p%p%p%p',
@@ -124,6 +130,7 @@ def selector(current, constraints, validate=int_validator):
 randomly selects one value satisfying at least one constraint (depending on
 constraints overlaps).
 """
+
 def iter_validate(c):
 """Apply validate() only to constraints represented as lists.
 
@@ -135,33 +142,22 @@ def selector(current, constraints, 
validate=int_validator):
 else:
 return c
 
-fuzz_values = [iter_validate(c) for c in constraints]
-# Remove current for cases it's implicitly specified in constraints
-# Duplicate validator functionality to prevent decreasing of probability
-# to get one of allowable values
-# TODO: remove validators after implementation of intelligent selection
-# of fields will be fuzzed
-try:
-fuzz_values.remove(current)
-except ValueError:
-pass
+v_constraints = [x for x in constraints if x != current]
+fuzz_values = [iter_validate(c) for c in v_constraints]
 return random.choice(fuzz_values)
 
 
 def magic(current):
-"""Fuzz magic header field.
-
-The function just returns the current magic value and provides uniformity
-of calls for all fuzzing functions.
-"""
-return current
+"""Fuzz magic header field."""
+constraints = ['VMDK', 'QED', '', 'OOOM'] + \
+  [truncate_string(STRING_V, len(current))]
+return selector(current, constraints, string_validator)
 
 
 def version(current):
 """Fuzz version header field."""
 constraints = UINT32_V + [
-[(2, 3)],  # correct values
-[(0, 1), (4, UINT32)]
+[(0, 4)]  # includes valid values
 ]
 return selector(current, constraints)
 
@@ -195,16 +191,18 @@ def size(current):
 
 def crypt_method(current):
 """Fuzz crypt method header field."""
-constraints = UINT32_V + [
-1,
-[(2, UINT32)]
-]
+# UINT32_V includes valid values [0, 1]
+constraints = UINT32_V
 return selector(current, constraints)
 
 
 def l1_size(current):
 """Fuzz L1 table size header field."""
-constraints = UINT32_V
+# QCOW_MAX_L1_SIZE = 0x200
+max_size = 0x200 / UINT64_S
+constraints = UINT32_V + \
+  [max_size - 1, max_size, max_size + 1] + \
+  [[(0, current + 1)]]
 return selector(current, constraints)
 
 
@@ -222,12 +220,18 @@ def refcount_table_offset(current):
 
 def refcount_table_clusters(current):
 """Fuzz refcount table clusters header field."""
-constraints = UINT32_V
+# QCOW_MAX_REFTABLE_SIZE = 0x80, MIN_CLUSTER_BITS = 9 =>
+# max size of reftable in clusters = 1 << 14
+max_size = 1 << 14
+constraints = UINT32_V + \
+  [max_size - 1, max_size, max_size + 1] + \
+  [[(0, current + 1)]]
 return selector(current, constraints)
 
 
 def nb_snapshots(current

[Qemu-devel] [PATCH 1/2] runner: Expand the list of default test commands

2014-08-18 Thread Maria Kustova

Additional commands were added to the default runner list to cover all qcow2
related code. This qcow2 specificity is selected to reduce number of
non-relevant tests. After implementation of a fuzzer for a new format the
default list should be updated.

Signed-off-by: Maria Kustova 
---
 tests/image-fuzzer/runner.py | 75 
 1 file changed, 55 insertions(+), 20 deletions(-)

diff --git a/tests/image-fuzzer/runner.py b/tests/image-fuzzer/runner.py
index 2e1bd51..fd97c40 100755
--- a/tests/image-fuzzer/runner.py
+++ b/tests/image-fuzzer/runner.py
@@ -30,6 +30,9 @@ import getopt
 import StringIO
 import resource
 
+# All formats supported by the 'qemu-img create' command.
+WRITABLE_FORMATS = ['raw', 'vmdk', 'vdi', 'cow', 'qcow2', 'file', 'qed', 'vpc']
+
 try:
 import json
 except ImportError:
@@ -137,24 +140,57 @@ class TestEnv(object):
 self.qemu_img = os.environ.get('QEMU_IMG', 'qemu-img')\
   .strip().split(' ')
 self.qemu_io = os.environ.get('QEMU_IO', 'qemu-io').strip().split(' ')
-self.commands = [['qemu-img', 'check', '-f', 'qcow2', '$test_img'],
- ['qemu-img', 'info', '-f', 'qcow2', '$test_img'],
- ['qemu-io', '$test_img', '-c', 'read $off $len'],
- ['qemu-io', '$test_img', '-c', 'write $off $len'],
- ['qemu-io', '$test_img', '-c',
-  'aio_read $off $len'],
- ['qemu-io', '$test_img', '-c',
-  'aio_write $off $len'],
- ['qemu-io', '$test_img', '-c', 'flush'],
- ['qemu-io', '$test_img', '-c',
-  'discard $off $len'],
- ['qemu-io', '$test_img', '-c',
-  'truncate $off']]
-for fmt in ['raw', 'vmdk', 'vdi', 'cow', 'qcow2', 'file',
-'qed', 'vpc']:
+strings = ['%s%p%x%d', '.1024d', '%.2049d', '%p%p%p%p', '%x%x%x%x',
+   '%d%d%d%d', '%s%s%s%s', '%999s', '%08x', '%%20d',
+   '%%20n', '%%20x', '%%20s', '%s%s%s%s%s%s%s%s%s%s',
+   '%p%p%p%p%p%p%p%p%p%p',
+   '%#0123456x%08x%x%s%p%d%n%o%u%c%h%l%q%j%z%Z%t%i%e%g%f%a%C' +
+   '%S%08x%%', '%s x 129', '%x x 257']
+self.commands = [
+['qemu-img', 'check', '-f', 'qcow2', '$test_img'],
+['qemu-img', 'check', '-f', 'qcow2', '-r', 'leaks', '$test_img'],
+['qemu-img', 'check', '-f', 'qcow2', '-r', 'all', '$test_img'],
+['qemu-img', 'snapshot', '-c', 'new', '$test_img'],
+['qemu-img', 'info', '-f', 'qcow2', '$test_img'],
+['qemu-img', 'convert', '-c', '-f', 'qcow2', '-O', 'qcow2',
+ '$test_img', 'converted_image.qcow2'],
+['qemu-img', 'amend', '-o', 'compat=0.10', '-f', 'qcow2',
+ '$test_img'],
+['qemu-img', 'amend', '-o', 'lazy_refcounts=on', '-f', 'qcow2',
+ '$test_img'],
+['qemu-img', 'amend', '-o', 'lazy_refcounts=off', '-f', 'qcow2',
+ '$test_img'],
+['qemu-img', 'amend', '-o',
+ 'backing_file=' + random.choice(strings), '-f', 'qcow2',
+ '$test_img'],
+['qemu-img', 'amend', '-o', 'backing_fmt=' + 
random.choice(strings),
+ '-f', 'qcow2', '$test_img'],
+['qemu-io', '$test_img', '-c', 'read $off $len'],
+['qemu-io', '$test_img', '-c', 'read -p $off $len'],
+['qemu-io', '$test_img', '-c', 'write $off $len'],
+['qemu-io', '$test_img', '-c', 'write -c $off $len'],
+['qemu-io', '$test_img', '-c', 'write -p $off $len'],
+['qemu-io', '$test_img', '-c', 'write -z $off $len'],
+['qemu-io', '$test_img', '-c', 'aio_read $off $len'],
+['qemu-io', '$test_img', '-c', 'aio_write $off $len'],
+['qemu-io', '$test_img', '-c', 'flush'],
+['qemu-io', '$test_img', '-c', 'discard $off $len'],
+['qemu-io', '$test_img', '-c', 'truncate $off'],
+['qemu-io', '$test_img', '-c', 'info'],
+['qemu-io', '$test_img', '-c', 'map']
+]
+
+for fmt in WRITABLE_FORMATS:
+cache_opt = random.choice([
+[], ['-t', 'unsafe'],
+['-t', 'writethrough'],
+['-t', 'writeback'],
+['-t', 'none']
+])
+
 self.commands.append(
-['qemu-img', 'convert', '-f', 'qcow2', '-O', fmt,
- '$test_img', 'converted_image.' + fmt])
+['qemu-img', 'convert', '-f', 'qcow2', '-O', fmt] + cache_opt +
+['$test_img', 'converted_image.' + fmt])
 
 try:
 os.makedirs(self.current_dir)
@@ -177,9 +213,8 @@ class TestEnv(object):
 Format of a backing file is randomly cho

[Qemu-devel] [PATCH 0/2] image-fuzzer: Extend test coverage

2014-08-18 Thread Maria Kustova

This patch series contains changes improving test coverage.

Maria Kustova (2):
  runner: Expand the list of default test commands
  fuzz: Make fuzzing functions and values relevant to the qemu
implementation

 tests/image-fuzzer/qcow2/fuzz.py | 71 +++--
 tests/image-fuzzer/runner.py | 75 +---
 2 files changed, 92 insertions(+), 54 deletions(-)

-- 
1.9.3

Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)

2014-08-18 Thread Paolo Bonzini

Il 18/08/2014 19:47, Hulin, Patrick - 0559 - MITLL ha scritto:
>> We'll have done the page for the first byte at the top of 
>> helper_{le,be}_{ld,st}_name.  When we discover it's an unaligned
>> access, we should load and check the pte for the second page.  We
>> might have to shuffle those two tests around, since in theory the
>> second page could be I/O mapped and we'd want to pass off the
>> whole access to io_mem_*.
>> 
>> Since two adjacent pages can't conflict in our direct-mapped TLB,
>> we can then safely pass off the work to secondary helpers without
>> fear the first TLB entry will be flushed.
> 
> This isn’t about cross-page writes, although that might make fixing
> the problem a little tricky. The issue occurs with two adjacent TBs
> on the same page: because the individual writes are split up and done
> in reverse order, writes (and reads) off the back of the current TB
> happen twice. In the case of an xor this means the original xor gets
> undone, which is what breaks in Windows 7.

If you fill the TLB beforehand as suggested by Richard, you can reverse
again the direction of the "for" loop.  This should hopefully fix your bug.

Of course if the write is not cross-page, there's no need to do the TLB
pre-fill.

Thanks for the test case!

Paolo

Re: [Qemu-devel] [ARM - FCVT inst] : Difference in calculated value

2014-08-18 Thread Peter Maydell

On 18 August 2014 22:04, Gaurav Sharma  wrote:
> Hi Peter,
> I cross checked it with a AFM model, and the results are indeed different.
> The problem I think lies in how we treat de-normalized numbers which are too
> small to represent in half precision.
> In case of qemu
>>> if(exp < -10)
>>> return signed/unsigned zero.
> However, in case rounding is set, we ignore and we return zero. This may not
> be true and we may have a smallest possible denormalized number.

Cool. Can you provide your test case, then, please?

thanks
-- PMM

Re: [Qemu-devel] [ARM - FCVT inst] : Difference in calculated value

2014-08-18 Thread Gaurav Sharma

Hi Peter,
I cross checked it with a AFM model, and the results are indeed different.
The problem I think lies in how we treat de-normalized numbers which are
too small to represent in half precision.
In case of qemu
>> if(exp < -10)
>> return signed/unsigned zero.
However, in case rounding is set, we ignore and we return zero. This may
not be true and we may have a smallest possible denormalized number.

Thanks,
Gaurav


On Sun, Aug 17, 2014 at 1:14 AM, Peter Maydell 
wrote:

> On 16 August 2014 20:06, Gaurav Sharma  wrote:
> > Can some one confirm is this is an issue with qemu implementation ?
>
> It's on my todo list to look at. If you want to confirm it as a QEMU
> bug your best bet is to write a short test program and compare
> the output on QEMU against running it on real hardware.
>
> -- PMM
>

Re: [Qemu-devel] [Qemu-trivial] [PATCH v7] tests/bios-tables-test: check the value returned by fopen()

2014-08-18 Thread Peter Maydell

On 18 August 2014 20:36, Michael S. Tsirkin  wrote:
> Does test fail if this path is triggered?

If our test harness doesn't report failure when a test
binary returns with a non-zero exit code then the
harness is broken, because there are other test
binaries that rely on that.

thanks
-- PMM

Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)

2014-08-18 Thread Hulin, Patrick - 0559 - MITLL

On 8/18/14, 1:47 PM, "Hulin, Patrick - 0559 - MITLL"
 wrote:

>On Aug 18, 2014, at 1:37 PM, Richard Henderson  wrote:
>
>>On 08/16/2014 10:21 PM, Paolo Bonzini wrote:
>Would it work to just call tb_invalidate_phys_page_range before the
>helper_ret_stb loop?
>>I doubt it.
>
>Correct. Doesn¹t work. Haven¹t fully diagnosed why, but it doesn¹t seem
>to ever hit the current_tb_modified passage if you invalidate beforehand.

Yeah - mem_io_pc doesn¹t get updated until we¹re inside io_write, so
tb_invalidate_phys_page_range thinks we¹re inside a different TB. As a
result, it¹s ³is this TB modified² check still returns false.

I¹ve attached the correct source patch for the test case as well.

selfmodify.patch
Description: selfmodify.patch

selfmodify.flat
Description: selfmodify.flat

Re: [Qemu-devel] [PATCH 3/4] qcow2: Add runtime options for cache sizes

2014-08-18 Thread Max Reitz


On 18.08.2014 22:24, Max Reitz wrote:

On 18.08.2014 22:18, Eric Blake wrote:

On 08/18/2014 02:00 PM, Max Reitz wrote:

Add options for specifying the size of the metadata caches. This can
either be done directly for each cache (if only one is given, the other
will be derived according to a default ratio) or combined for both.

Signed-off-by: Max Reitz 
---
  block/qcow2.c | 112 
+++---

  block/qcow2.h |   3 ++
  2 files changed, 103 insertions(+), 12 deletions(-)

I would suspect that you need to also modify qapi/block-core.json to
document the new QMP parameters that can be tuned when hot-plugging a
qcow2 disk.


Oh, you're right - that structure is missing all of the corruption 
detection parameters as well. I'll prepare a follow-up patch tomorrow.


Make that "on Wednesday", as I won't be around tomorrow.

Max

Re: [Qemu-devel] [PATCH 3/4] qcow2: Add runtime options for cache sizes

2014-08-18 Thread Max Reitz


On 18.08.2014 22:18, Eric Blake wrote:

On 08/18/2014 02:00 PM, Max Reitz wrote:

Add options for specifying the size of the metadata caches. This can
either be done directly for each cache (if only one is given, the other
will be derived according to a default ratio) or combined for both.

Signed-off-by: Max Reitz 
---
  block/qcow2.c | 112 +++---
  block/qcow2.h |   3 ++
  2 files changed, 103 insertions(+), 12 deletions(-)

I would suspect that you need to also modify qapi/block-core.json to
document the new QMP parameters that can be tuned when hot-plugging a
qcow2 disk.


Oh, you're right - that structure is missing all of the corruption 
detection parameters as well. I'll prepare a follow-up patch tomorrow.


Max

Re: [Qemu-devel] [Qemu-trivial] [PATCH v7] slirp/misc: Use g_malloc() instead of malloc()

2014-08-18 Thread Jeff Cody

On Mon, Aug 18, 2014 at 03:32:21PM +0400, Michael Tokarev wrote:
> 18.08.2014 11:51, zhanghailiang пишет:
> > Here we don't check the return value of malloc() which may fail.
> > Use the g_malloc() instead, which will abort the program when
> > there is not enough memory.
> > 
> > Signed-off-by: zhanghailiang 
> > Reviewed-by: Alex Bennée 
> > ---
> >  slirp/misc.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/slirp/misc.c b/slirp/misc.c
> > index b8eb74c..f7fe497 100644
> > --- a/slirp/misc.c
> > +++ b/slirp/misc.c
> > @@ -54,7 +54,7 @@ int add_exec(struct ex_list **ex_ptr, int do_pty, char 
> > *exec,
> > }
> >  
> > tmp_ptr = *ex_ptr;
> > -   *ex_ptr = (struct ex_list *)malloc(sizeof(struct ex_list));
> > +   *ex_ptr = (struct ex_list *)g_malloc(sizeof(struct ex_list));
> 
> There's a convinient macro in glib, g_new(typename, numelts).  Also
> there's a less commonly used g_renew() which is like realloc, but it
> is not applicable here.
> 

If you are going to respin anyway, I recommend dropping the
superfluous (struct ex_list *) cast here as well.

> > (*ex_ptr)->ex_fport = port;
> > (*ex_ptr)->ex_addr = addr;
> > (*ex_ptr)->ex_pty = do_pty;
> > @@ -235,7 +235,7 @@ strdup(str)
> >  {
> > char *bptr;
> >  
> > -   bptr = (char *)malloc(strlen(str)+1);
> > +   bptr = (char *)g_malloc(strlen(str)+1);
> > strcpy(bptr, str);
> >  
> > return bptr;
> 
> Oh.  And this one should be removed completely.  It is a reimplementation
> of strdup() for system which lacks it.  This code should go, we don't build
> on such a system anyway and we always have g_strdup().  There's one more
> usage of strdup() in this file, btw.
> 
> I'm sorry for being so picky, and you're already at v7, but heck.. We should
> be more active at reviewing patches :)
> 
> Thanks,
> 
> /mjt
>

Re: [Qemu-devel] [PATCH 3/4] qcow2: Add runtime options for cache sizes

2014-08-18 Thread Eric Blake

On 08/18/2014 02:00 PM, Max Reitz wrote:
> Add options for specifying the size of the metadata caches. This can
> either be done directly for each cache (if only one is given, the other
> will be derived according to a default ratio) or combined for both.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/qcow2.c | 112 
> +++---
>  block/qcow2.h |   3 ++
>  2 files changed, 103 insertions(+), 12 deletions(-)

I would suspect that you need to also modify qapi/block-core.json to
document the new QMP parameters that can be tuned when hot-plugging a
qcow2 disk.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v6 07/10] linux-user: check return value of malloc()

2014-08-18 Thread Michael S. Tsirkin

On Thu, Aug 14, 2014 at 04:31:35PM +0300, Riku Voipio wrote:
> On Thu, Aug 14, 2014 at 03:29:18PM +0800, zhanghailiang wrote:
> > Signed-off-by: zhanghailiang 
> > Acked-by: Riku Voipio 
> 
> Applied to linux-user as Michael seemed wary of passing these via
> trivial.
> 
> Riku

Pls remember to add Cc qemu-trivial on bugfixes in your tree.


> >---
> >  linux-user/syscall.c | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> > index a50229d..8e5ccf1 100644
> > --- a/linux-user/syscall.c
> > +++ b/linux-user/syscall.c
> > @@ -2870,6 +2870,10 @@ static inline abi_long do_msgsnd(int msqid, abi_long 
> > msgp,
> >  if (!lock_user_struct(VERIFY_READ, target_mb, msgp, 0))
> >  return -TARGET_EFAULT;
> >  host_mb = malloc(msgsz+sizeof(long));
> > +if (!host_mb) {
> > +unlock_user_struct(target_mb, msgp, 0);
> > +return -TARGET_ENOMEM;
> > +}
> >  host_mb->mtype = (abi_long) tswapal(target_mb->mtype);
> >  memcpy(host_mb->mtext, target_mb->mtext, msgsz);
> >  ret = get_errno(msgsnd(msqid, host_mb, msgsz, msgflg));
> > -- 
> > 1.7.12.4
> > 
> >

Re: [Qemu-devel] [Qemu-trivial] [PATCH v6 03/10] virtio-blk: fix reference a pointer which might be freed

2014-08-18 Thread Michael S. Tsirkin

On Mon, Aug 18, 2014 at 03:49:22PM +0400, Michael Tokarev wrote:
> 14.08.2014 11:29, zhanghailiang wrote:
> > In function virtio_blk_handle_request, it may freed memory pointed by req,
> > So do not access member of req after calling this function.
> > 
> > Reviewed-by: Stefan Hajnoczi 
> > Signed-off-by: zhanghailiang 
> > ---
> >  hw/block/virtio-blk.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> > index c241c50..54a853a 100644
> > --- a/hw/block/virtio-blk.c
> > +++ b/hw/block/virtio-blk.c
> > @@ -458,7 +458,7 @@ static void virtio_blk_handle_output(VirtIODevice 
> > *vdev, VirtQueue *vq)
> >  static void virtio_blk_dma_restart_bh(void *opaque)
> >  {
> >  VirtIOBlock *s = opaque;
> > -VirtIOBlockReq *req = s->rq;
> > +VirtIOBlockReq *req = s->rq, *next = NULL;
> >  MultiReqBuffer mrb = {
> >  .num_writes = 0,
> >  };
> > @@ -469,8 +469,9 @@ static void virtio_blk_dma_restart_bh(void *opaque)
> >  s->rq = NULL;
> >  
> >  while (req) {
> > +next = req->next;
> >  virtio_blk_handle_request(req, &mrb);
> > -req = req->next;
> > +req = next;
> >  }
> >  
> >  virtio_submit_multiwrite(s->bs, &mrb);
> 
> So, finally, I've applied this patch:
> 
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -469,8 +469,9 @@ static void virtio_blk_dma_restart_bh(void *opaque)
>  s->rq = NULL;
> 
>  while (req) {
> +VirtIOBlockReq *next = req->next;
>  virtio_blk_handle_request(req, &mrb);
> -req = req->next;
> +req = next;
>  }
> 
>  virtio_submit_multiwrite(s->bs, &mrb);
> 
> and dropped Stefan's Reviewed-by on the way ;)
> 
> This is a bugfix after all ;)
> 
> Thanks,
> 
> /mjt


By the way, could you please add Cc qemu-stable on bugfixes
you have queued?
These are likely appopriate for 2.1.1.

-- 
MST

[Qemu-devel] [PATCH v2 3/4] qcow2: Add runtime options for cache sizes

2014-08-18 Thread Max Reitz

Add options for specifying the size of the metadata caches. This can
either be done directly for each cache (if only one is given, the other
will be derived according to a default ratio) or combined for both.

Signed-off-by: Max Reitz 
---
 block/qcow2.c | 112 +++---
 block/qcow2.h |   3 ++
 2 files changed, 103 insertions(+), 12 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 910d9cf..f9e045f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -442,6 +442,22 @@ static QemuOptsList qcow2_runtime_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "Check for unintended writes into an inactive L2 table",
 },
+{
+.name = QCOW2_OPT_CACHE_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Maximum combined metadata (L2 tables and refcount blocks) 
"
+"cache size",
+},
+{
+.name = QCOW2_OPT_L2_CACHE_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Maximum L2 table cache size",
+},
+{
+.name = QCOW2_OPT_REFCOUNT_CACHE_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Maximum refcount block cache size",
+},
 { /* end of list */ }
 },
 };
@@ -457,6 +473,61 @@ static const char 
*overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
 [QCOW2_OL_INACTIVE_L2_BITNR]= QCOW2_OPT_OVERLAP_INACTIVE_L2,
 };
 
+static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
+ uint64_t *refcount_cache_size, Error **errp)
+{
+uint64_t combined_cache_size;
+bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set;
+
+combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
+l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
+refcount_cache_size_set = qemu_opt_get(opts, 
QCOW2_OPT_REFCOUNT_CACHE_SIZE);
+
+combined_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_CACHE_SIZE, 0);
+*l2_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, 0);
+*refcount_cache_size = qemu_opt_get_size(opts,
+ QCOW2_OPT_REFCOUNT_CACHE_SIZE, 0);
+
+if (combined_cache_size_set) {
+if (l2_cache_size_set && refcount_cache_size_set) {
+error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE
+   " and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set "
+   "the same time");
+return;
+} else if (*l2_cache_size > combined_cache_size) {
+error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
+   QCOW2_OPT_CACHE_SIZE);
+return;
+} else if (*refcount_cache_size > combined_cache_size) {
+error_setg(errp, QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not exceed "
+   QCOW2_OPT_CACHE_SIZE);
+return;
+}
+
+if (l2_cache_size_set) {
+*refcount_cache_size = combined_cache_size - *l2_cache_size;
+} else if (refcount_cache_size_set) {
+*l2_cache_size = combined_cache_size - *refcount_cache_size;
+} else {
+*refcount_cache_size = combined_cache_size
+ / (DEFAULT_L2_REFCOUNT_SIZE_RATIO + 1);
+*l2_cache_size = combined_cache_size - *refcount_cache_size;
+}
+} else {
+if (!l2_cache_size_set && !refcount_cache_size_set) {
+*l2_cache_size = DEFAULT_L2_CACHE_BYTE_SIZE;
+*refcount_cache_size = *l2_cache_size
+ / DEFAULT_L2_REFCOUNT_SIZE_RATIO;
+} else if (!l2_cache_size_set) {
+*l2_cache_size = *refcount_cache_size
+   * DEFAULT_L2_REFCOUNT_SIZE_RATIO;
+} else if (!refcount_cache_size_set) {
+*refcount_cache_size = *l2_cache_size
+ / DEFAULT_L2_REFCOUNT_SIZE_RATIO;
+}
+}
+}
+
 static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -707,18 +778,43 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 }
 
-/* alloc L2 table/refcount block cache */
-l2_cache_size = DEFAULT_L2_CACHE_BYTE_SIZE / s->cluster_size;
+/* get L2 table/refcount block cache size from command line options */
+opts = qemu_opts_create(&qcow2_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail;
+}
+
+read_cache_sizes(opts, &l2_cache_size, &refcount_cache_size, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail;
+}
+
+l2_cache_size /= s->cluster_size;
 if (l2_cache_size < MIN_L2_CACHE_SIZE) {

[Qemu-devel] [PATCH v2 4/4] iotests: Add test for qcow2's cache options

2014-08-18 Thread Max Reitz

Add a test which tests various combinations of qcow2's cache options
(some of which are valid, some of which are not).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/103 | 99 ++
 tests/qemu-iotests/103.out | 29 ++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 129 insertions(+)
 create mode 100755 tests/qemu-iotests/103
 create mode 100644 tests/qemu-iotests/103.out

diff --git a/tests/qemu-iotests/103 b/tests/qemu-iotests/103
new file mode 100755
index 000..0f1dc9f
--- /dev/null
+++ b/tests/qemu-iotests/103
@@ -0,0 +1,99 @@
+#!/bin/bash
+#
+# Test case for qcow2 metadata cache size specification
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+here=$PWD
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+IMG_SIZE=64K
+
+_make_test_img $IMG_SIZE
+$QEMU_IO -c 'write -P 42 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo '=== Testing invalid option combinations ==='
+echo
+
+# all sizes set at the same time
+$QEMU_IO -c "open -o 
cache-size=1.25M,l2-cache-size=1M,refcount-cache-size=0.25M $TEST_IMG" \
+2>&1 | _filter_testdir | _filter_imgfmt
+# l2-cache-size may not exceed cache-size
+$QEMU_IO -c "open -o cache-size=1M,l2-cache-size=2M $TEST_IMG" 2>&1 \
+| _filter_testdir | _filter_imgfmt
+# refcount-cache-size may not exceed cache-size
+$QEMU_IO -c "open -o cache-size=1M,refcount-cache-size=2M $TEST_IMG" 2>&1 \
+| _filter_testdir | _filter_imgfmt
+# 0 should be a valid size (e.g. for enforcing the minimum), so this should not
+# work
+$QEMU_IO -c "open -o cache-size=0,l2-cache-size=0,refcount-cache-size=0 
$TEST_IMG" \
+2>&1 | _filter_testdir | _filter_imgfmt
+
+echo
+echo '=== Testing valid option combinations ==='
+echo
+
+# There should be a reasonable and working minimum
+$QEMU_IO -c "open -o cache-size=0 $TEST_IMG" -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+$QEMU_IO -c "open -o l2-cache-size=0 $TEST_IMG" -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+$QEMU_IO -c "open -o refcount-cache-size=0 $TEST_IMG" -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+
+# Derive cache sizes from combined size (with a reasonable ratio, but we cannot
+# test that)
+$QEMU_IO -c "open -o cache-size=2M $TEST_IMG" -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+# Fix one cache, derive the other
+$QEMU_IO -c "open -o cache-size=2M,l2-cache-size=1M $TEST_IMG" \
+ -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+$QEMU_IO -c "open -o cache-size=2M,refcount-cache-size=1M $TEST_IMG" \
+ -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+# Directly set both caches
+$QEMU_IO -c "open -o l2-cache-size=1M,refcount-cache-size=0.25M $TEST_IMG" \
+ -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/103.out b/tests/qemu-iotests/103.out
new file mode 100644
index 000..ddf6b5a
--- /dev/null
+++ b/tests/qemu-iotests/103.out
@@ -0,0 +1,29 @@
+QA output created by 103
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536 
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing invalid option combinations ===
+
+qemu-io: can't open device TEST_DIR/t.IMGFMT: cache-size, l2-cache-size and 
refcount-cache-size may not be set the same time
+qemu-io: can't open device TEST_DIR/t.IMGFMT: l2-cache-size may not exceed 
cache-size
+qemu-io: can't open device TEST_DIR/t.IMGFMT: refcount-cache-size may not 
exceed cache-size
+qemu-io: can't open device TEST_DIR/t.IMGFMT: cache-size, l2-cache-size and 
refcount-cache-size may not be set the same time
+
+=== Testing valid option combinations ===
+
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:

[Qemu-devel] [PATCH v2 1/4] qcow2: Constant cache size in bytes

2014-08-18 Thread Max Reitz

Specifying the metadata cache sizes in clusters results in less clusters
(and much less bytes) covered for small cluster sizes and vice versa.
Using a constant byte size reduces this difference, and makes it
possible to manually specify the cache size in an easily comprehensible
unit.

Signed-off-by: Max Reitz 
---
 block/qcow2.c | 16 ++--
 block/qcow2.h | 10 --
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 435e0e1..910d9cf 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -470,6 +470,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, 
int flags,
 uint64_t l1_vm_state_index;
 const char *opt_overlap_check;
 int overlap_check_template = 0;
+uint64_t l2_cache_size, refcount_cache_size;
 
 ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
 if (ret < 0) {
@@ -707,8 +708,19 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 /* alloc L2 table/refcount block cache */
-s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
-s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+l2_cache_size = DEFAULT_L2_CACHE_BYTE_SIZE / s->cluster_size;
+if (l2_cache_size < MIN_L2_CACHE_SIZE) {
+l2_cache_size = MIN_L2_CACHE_SIZE;
+}
+
+refcount_cache_size = l2_cache_size
+/ (DEFAULT_L2_REFCOUNT_SIZE_RATIO * s->cluster_size);
+if (refcount_cache_size < MIN_REFCOUNT_CACHE_SIZE) {
+refcount_cache_size = MIN_REFCOUNT_CACHE_SIZE;
+}
+
+s->l2_table_cache = qcow2_cache_create(bs, l2_cache_size);
+s->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size);
 if (s->l2_table_cache == NULL || s->refcount_block_cache == NULL) {
 error_setg(errp, "Could not allocate metadata caches");
 ret = -ENOMEM;
diff --git a/block/qcow2.h b/block/qcow2.h
index b49424b..671783d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -64,10 +64,16 @@
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
-#define L2_CACHE_SIZE 16
+#define MIN_L2_CACHE_SIZE 1 /* cluster */
 
 /* Must be at least 4 to cover all cases of refcount table growth */
-#define REFCOUNT_CACHE_SIZE 4
+#define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
+
+#define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */
+
+/* The refblock cache needs only a fourth of the L2 cache size to cover as many
+ * clusters */
+#define DEFAULT_L2_REFCOUNT_SIZE_RATIO 4
 
 #define DEFAULT_CLUSTER_SIZE 65536
 
-- 
2.0.4

[Qemu-devel] [PATCH v2 2/4] qcow2: Use g_try_new0() for cache array

2014-08-18 Thread Max Reitz

With a variable cache size, the number given to qcow2_cache_create() may
be huge. Therefore, use g_try_new0().

While at it, use g_new0() instead of g_malloc0() for allocating the
Qcow2Cache object.

Signed-off-by: Max Reitz 
---
 block/qcow2-cache.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index fe0615a..904f6b1 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -48,9 +48,12 @@ Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int 
num_tables)
 Qcow2Cache *c;
 int i;
 
-c = g_malloc0(sizeof(*c));
+c = g_new0(Qcow2Cache, 1);
 c->size = num_tables;
-c->entries = g_new0(Qcow2CachedTable, num_tables);
+c->entries = g_try_new0(Qcow2CachedTable, num_tables);
+if (!c->entries) {
+goto fail;
+}
 
 for (i = 0; i < c->size; i++) {
 c->entries[i].table = qemu_try_blockalign(bs->file, s->cluster_size);
@@ -62,8 +65,10 @@ Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int 
num_tables)
 return c;
 
 fail:
-for (i = 0; i < c->size; i++) {
-qemu_vfree(c->entries[i].table);
+if (c->entries) {
+for (i = 0; i < c->size; i++) {
+qemu_vfree(c->entries[i].table);
+}
 }
 g_free(c->entries);
 g_free(c);
-- 
2.0.4

[Qemu-devel] [PATCH v2 0/4] qcow2: Allow runtime specification of cache sizes

2014-08-18 Thread Max Reitz

Currently, the metadata cache size is only tunable on compile time
through macros. However, some users may want to use the minimal cache
size (for whatever reason) and others may want to increase the cache
size because they have enough memory and want to increase performance.

This series adds runtime options for setting the cache size in bytes
(which is an easily comprehensible unit) in various ways (by setting
each cache explicitly or the total size).


This series (patch 2) depends on Markus' series
"[PATCH v2 0/4] block: Use g_new() & friends more".


v2:
 - Patch 2: c->entries may be NULL in the fail path; respect that case


git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/4:[] [--] 'qcow2: Constant cache size in bytes'
002/4:[0006] [FC] 'qcow2: Use g_try_new0() for cache array'
003/4:[] [--] 'qcow2: Add runtime options for cache sizes'
004/4:[] [--] 'iotests: Add test for qcow2's cache options'


Max Reitz (4):
  qcow2: Constant cache size in bytes
  qcow2: Use g_try_new0() for cache array
  qcow2: Add runtime options for cache sizes
  iotests: Add test for qcow2's cache options

 block/qcow2-cache.c|  13 +++--
 block/qcow2.c  | 120 +
 block/qcow2.h  |  13 -
 tests/qemu-iotests/103 |  99 +
 tests/qemu-iotests/103.out |  29 +++
 tests/qemu-iotests/group   |   1 +
 6 files changed, 259 insertions(+), 16 deletions(-)
 create mode 100755 tests/qemu-iotests/103
 create mode 100644 tests/qemu-iotests/103.out

-- 
2.0.4

Re: [Qemu-devel] [PATCH v2 4/4] block: Drop some superfluous casts from void *

2014-08-18 Thread Jeff Cody

On Mon, Aug 18, 2014 at 06:10:43PM +0200, Markus Armbruster wrote:
> They clutter the code.  Unfortunately, I can't figure out how to make
> Coccinelle drop all of them, so I have to settle for common special
> cases:
> 
> @@
> type T;
> T *pt;
> void *pv;
> @@
> - pt = (T *)pv;
> + pt = pv;
> @@
> type T;
> @@
> - (T *)
>   (\(g_malloc\|g_malloc0\|g_realloc\|g_new\|g_new0\|g_renew\|
>g_try_malloc\|g_try_malloc0\|g_try_realloc\|
>g_try_new\|g_try_new0\|g_try_renew\)(...))
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block/vhdx-log.c| 2 +-
>  block/vvfat.c   | 8 
>  hw/ide/microdrive.c | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/block/vhdx-log.c b/block/vhdx-log.c
> index eb5c7a0..4267e60 100644
> --- a/block/vhdx-log.c
> +++ b/block/vhdx-log.c
> @@ -923,7 +923,7 @@ static int vhdx_log_write(BlockDriverState *bs, 
> BDRVVHDXState *s,
>  buffer = qemu_blockalign(bs, total_length);
>  memcpy(buffer, &new_hdr, sizeof(new_hdr));
>  
> -new_desc = (VHDXLogDescriptor *) (buffer + sizeof(new_hdr));
> +new_desc = (buffer + sizeof(new_hdr));

Agree with Max, in that the parenthesis could be removed.  Not worthy
of a respin normally, but since the point of this patch is to
unclutter the code, I guess it makes sense to fix it.

>  data_sector = buffer + (desc_sectors * VHDX_LOG_SECTOR_SIZE);
>  data_tmp = data;
>  
> diff --git a/block/vvfat.c b/block/vvfat.c
> index f877e85..401539d 100644
> --- a/block/vvfat.c
> +++ b/block/vvfat.c
> @@ -732,7 +732,7 @@ static int read_directory(BDRVVVFATState* s, int 
> mapping_index)
>   if(first_cluster == 0 && (is_dotdot || is_dot))
>   continue;
>  
> - buffer=(char*)g_malloc(length);
> + buffer=g_malloc(length);

You missed a spot to put spaces around the '=' here.  Nothing worthy
of a respin, of course - just a note in case you decide to respin.


[...]

With or without the changes above:

Reviewed-by: Jeff Cody

Re: [Qemu-devel] [PATCH 1/1] pc-dimm: Change PCDIMMDevice->node from UINT32 to INT32, and initialize it as -1.

2014-08-18 Thread Michael S. Tsirkin

On Mon, Aug 18, 2014 at 03:58:33PM +0200, Paolo Bonzini wrote:
> Il 18/08/2014 15:56, Michael S. Tsirkin ha scritto:
> > > +/* Initialize PCDIMMDevice->node to -1 so that even if user doesn't 
> > > specify
> > > + * any numa option, PCDIMMDevice->node won't be 0, which indicates node0.
> > > + * In this case, SRAT won't be created, and guest kernel will fake a 
> > > node.
> > > + */
> > 
> > I think you simply mean
> > /* default value for PCDIMMDevice->node (unless specified by user) */
> >
> > > +#define NO_NODE_ID -1
> 
> I think a comment about the SRAT would be useful.
> 
> Paolo

OK so add:

/* In this case, SRAT won't be created. */

[Qemu-devel] [PATCH V2 0/2] runner: Control test duration

2014-08-18 Thread Maria Kustova

The first patch adds the '--duration SECONDS' argument. After the specified
duration the runner allows to end the current test and then exits.

The second patch adds forced termination of a program under test, if the test
execution takes more than 10 minutes to indicate program freezes.

If a program under test hangs, then the specified test duration can be overrun
up to 10 minutes.

The patch series is based on https://github.com/stefanha/qemu/commits/block,
commit 07a45925fa88376f8583a333e74f7eeb0f455685

v1 -> v2:
 * Trivial fixes based on the review of Fam Zheng
 * Increased time-out (in some cases 5 minutes interval returned false
   negatives)

Maria Kustova (2):
  runner: Add an argument for test duration
  runner: Kill a program under test by time-out

 tests/image-fuzzer/runner.py | 50 +---
 1 file changed, 42 insertions(+), 8 deletions(-)

-- 
1.9.3

Re: [Qemu-devel] [PATCH] monitor: fix use after free

2014-08-18 Thread Michael S. Tsirkin

On Mon, Aug 18, 2014 at 02:05:46PM -0400, Luiz Capitulino wrote:
> On Sun, 17 Aug 2014 11:45:17 +0200
> "Michael S. Tsirkin"  wrote:
> 
> > The function monitor_fdset_dup_fd_find_remove() references member of
> > 'mon_fdset' which - when remove flag is set - may be freed in function
> > monitor_fdset_cleanup().
> > remove is set by monitor_fdset_dup_fd_remove which in practice
> > does not need the returned value, so make it void,
> > and return -1 from monitor_fdset_dup_fd_find_remove.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > Reported-by: zhanghailiang 
> 
> I've applied this one to the qmp tree, but I have one comment below.
> 
> > ---
> >  include/monitor/monitor.h | 2 +-
> >  monitor.c | 8 +---
> >  stubs/fdset-remove-fd.c   | 3 +--
> >  3 files changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> > index 3d6929d..78a5fc8 100644
> > --- a/include/monitor/monitor.h
> > +++ b/include/monitor/monitor.h
> > @@ -64,7 +64,7 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool 
> > has_fdset_id, int64_t fdset_id,
> >  Error **errp);
> >  int monitor_fdset_get_fd(int64_t fdset_id, int flags);
> >  int monitor_fdset_dup_fd_add(int64_t fdset_id, int dup_fd);
> > -int monitor_fdset_dup_fd_remove(int dup_fd);
> > +void monitor_fdset_dup_fd_remove(int dup_fd);
> >  int monitor_fdset_dup_fd_find(int dup_fd);
> >  
> >  #endif /* !MONITOR_H */
> > diff --git a/monitor.c b/monitor.c
> > index cdbaa60..ba1908f 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -2542,8 +2542,10 @@ static int monitor_fdset_dup_fd_find_remove(int 
> > dup_fd, bool remove)
> >  if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
> >  monitor_fdset_cleanup(mon_fdset);
> >  }
> > +return -1;
> 
> Returning -1 here looks wrong. A better fix is probably to split this
> function into two functions as it's doing two unrelated things currently.
> 
> I've applied the fix anyway because the only user of remove=true doesn't
> care about the return value and also because the bad semantic is less worse
> than the use after free...


I agree generally, it's just that I don't understand this function.
For example, doesn't it leak memory? When
monitor_fdset_dup_fd_find_remove drops a MonFdsetFd entry from list, it does
not seem to free it.



> > +} else {
> > +return mon_fdset->id;
> >  }
> > -return mon_fdset->id;
> >  }
> >  }
> >  }
> > @@ -2555,9 +2557,9 @@ int monitor_fdset_dup_fd_find(int dup_fd)
> >  return monitor_fdset_dup_fd_find_remove(dup_fd, false);
> >  }
> >  
> > -int monitor_fdset_dup_fd_remove(int dup_fd)
> > +void monitor_fdset_dup_fd_remove(int dup_fd)
> >  {
> > -return monitor_fdset_dup_fd_find_remove(dup_fd, true);
> > +monitor_fdset_dup_fd_find_remove(dup_fd, true);
> >  }
> >  
> >  int monitor_handle_fd_param(Monitor *mon, const char *fdname)
> > diff --git a/stubs/fdset-remove-fd.c b/stubs/fdset-remove-fd.c
> > index b3886d9..7f6d61e 100644
> > --- a/stubs/fdset-remove-fd.c
> > +++ b/stubs/fdset-remove-fd.c
> > @@ -1,7 +1,6 @@
> >  #include "qemu-common.h"
> >  #include "monitor/monitor.h"
> >  
> > -int monitor_fdset_dup_fd_remove(int dupfd)
> > +void monitor_fdset_dup_fd_remove(int dupfd)
> >  {
> > -return -1;
> >  }

[Qemu-devel] [PATCH V2 1/2] runner: Add an argument for test duration

2014-08-18 Thread Maria Kustova

After the specified duration the runner stops executing new tests, but it
doesn't interrupt running ones.

Reviewed-by: Fam Zheng 
Signed-off-by: Maria Kustova 
---
 tests/image-fuzzer/runner.py | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tests/image-fuzzer/runner.py b/tests/image-fuzzer/runner.py
index 3fa7fca..ea9916b 100755
--- a/tests/image-fuzzer/runner.py
+++ b/tests/image-fuzzer/runner.py
@@ -25,6 +25,7 @@ import subprocess
 import random
 import shutil
 from itertools import count
+import time
 import getopt
 import StringIO
 import resource
@@ -269,6 +270,7 @@ if __name__ == '__main__':
 
 Optional arguments:
   -h, --helpdisplay this help and exit
+  -d, --duration=NUMBER finish tests after NUMBER of seconds
   -c, --command=JSONrun tests for all commands specified in
 the JSON array
   -s, --seed=STRING seed for a test image generation,
@@ -325,10 +327,15 @@ if __name__ == '__main__':
 finally:
 test.finish()
 
+def should_continue(duration, start_time):
+"""Return True if a new test can be started and False otherwise."""
+current_time = int(time.time())
+return (duration is None) or (current_time - start_time < duration)
+
 try:
-opts, args = getopt.gnu_getopt(sys.argv[1:], 'c:hs:kv',
+opts, args = getopt.gnu_getopt(sys.argv[1:], 'c:hs:kvd:',
['command=', 'help', 'seed=', 'config=',
-'keep_passed', 'verbose'])
+'keep_passed', 'verbose', 'duration='])
 except getopt.error, e:
 print >>sys.stderr, \
 "Error: %s\n\nTry 'runner.py --help' for more information" % e
@@ -339,6 +346,8 @@ if __name__ == '__main__':
 log_all = False
 seed = None
 config = None
+duration = None
+
 for opt, arg in opts:
 if opt in ('-h', '--help'):
 usage()
@@ -357,6 +366,8 @@ if __name__ == '__main__':
 log_all = True
 elif opt in ('-s', '--seed'):
 seed = arg
+elif opt in ('-d', '--duration'):
+duration = int(arg)
 elif opt == '--config':
 try:
 config = json.loads(arg)
@@ -394,9 +405,11 @@ if __name__ == '__main__':
 resource.setrlimit(resource.RLIMIT_CORE, (-1, -1))
 # If a seed is specified, only one test will be executed.
 # Otherwise runner will terminate after a keyboard interruption
-for test_id in count(1):
+start_time = int(time.time())
+test_id = count(1)
+while should_continue(duration, start_time):
 try:
-run_test(str(test_id), seed, work_dir, run_log, cleanup,
+run_test(str(test_id.next()), seed, work_dir, run_log, cleanup,
  log_all, command, config)
 except (KeyboardInterrupt, SystemExit):
 sys.exit(1)
-- 
1.9.3

[Qemu-devel] [PATCH V2 2/2] runner: Kill a program under test by time-out

2014-08-18 Thread Maria Kustova

If a program under test get frozen, the test should finish and report about its
failure.
In such cases the runner waits for 10 minutes until the program ends its
execution. After this time-out the program will be terminated and the test will
be marked as failed.

For current limitation of test image size to 10 MB as a maximum an execution of
each command takes about several seconds in general, so 10 minutes is enough to
discriminate freeze, but not drastically increase an overall test duration.

Signed-off-by: Maria Kustova 
---
 tests/image-fuzzer/runner.py | 29 +
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/tests/image-fuzzer/runner.py b/tests/image-fuzzer/runner.py
index ea9916b..2e1bd51 100755
--- a/tests/image-fuzzer/runner.py
+++ b/tests/image-fuzzer/runner.py
@@ -65,14 +65,35 @@ def run_app(fd, q_args):
 """Start an application with specified arguments and return its exit code
 or kill signal depending on the result of execution.
 """
+
+class Alarm(Exception):
+"""Exception for signal.alarm events."""
+pass
+
+def handler(*arg):
+"""Notify that an alarm event occurred."""
+raise Alarm
+
+signal.signal(signal.SIGALRM, handler)
+signal.alarm(600)
+term_signal = signal.SIGKILL
 devnull = open('/dev/null', 'r+')
 process = subprocess.Popen(q_args, stdin=devnull,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
-out, err = process.communicate()
-fd.write(out)
-fd.write(err)
-return process.returncode
+try:
+out, err = process.communicate()
+signal.alarm(0)
+fd.write(out)
+fd.write(err)
+fd.flush()
+return process.returncode
+
+except Alarm:
+os.kill(process.pid, term_signal)
+fd.write('The command was terminated by timeout.\n')
+fd.flush()
+return -term_signal
 
 
 class TestException(Exception):
-- 
1.9.3

[Qemu-devel] [PATCH 4/4] iotests: Add test for qcow2's cache options

2014-08-18 Thread Max Reitz

Add a test which tests various combinations of qcow2's cache options
(some of which are valid, some of which are not).

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/103 | 99 ++
 tests/qemu-iotests/103.out | 29 ++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 129 insertions(+)
 create mode 100755 tests/qemu-iotests/103
 create mode 100644 tests/qemu-iotests/103.out

diff --git a/tests/qemu-iotests/103 b/tests/qemu-iotests/103
new file mode 100755
index 000..0f1dc9f
--- /dev/null
+++ b/tests/qemu-iotests/103
@@ -0,0 +1,99 @@
+#!/bin/bash
+#
+# Test case for qcow2 metadata cache size specification
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+here=$PWD
+tmp=/tmp/$$
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+IMG_SIZE=64K
+
+_make_test_img $IMG_SIZE
+$QEMU_IO -c 'write -P 42 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo '=== Testing invalid option combinations ==='
+echo
+
+# all sizes set at the same time
+$QEMU_IO -c "open -o 
cache-size=1.25M,l2-cache-size=1M,refcount-cache-size=0.25M $TEST_IMG" \
+2>&1 | _filter_testdir | _filter_imgfmt
+# l2-cache-size may not exceed cache-size
+$QEMU_IO -c "open -o cache-size=1M,l2-cache-size=2M $TEST_IMG" 2>&1 \
+| _filter_testdir | _filter_imgfmt
+# refcount-cache-size may not exceed cache-size
+$QEMU_IO -c "open -o cache-size=1M,refcount-cache-size=2M $TEST_IMG" 2>&1 \
+| _filter_testdir | _filter_imgfmt
+# 0 should be a valid size (e.g. for enforcing the minimum), so this should not
+# work
+$QEMU_IO -c "open -o cache-size=0,l2-cache-size=0,refcount-cache-size=0 
$TEST_IMG" \
+2>&1 | _filter_testdir | _filter_imgfmt
+
+echo
+echo '=== Testing valid option combinations ==='
+echo
+
+# There should be a reasonable and working minimum
+$QEMU_IO -c "open -o cache-size=0 $TEST_IMG" -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+$QEMU_IO -c "open -o l2-cache-size=0 $TEST_IMG" -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+$QEMU_IO -c "open -o refcount-cache-size=0 $TEST_IMG" -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+
+# Derive cache sizes from combined size (with a reasonable ratio, but we cannot
+# test that)
+$QEMU_IO -c "open -o cache-size=2M $TEST_IMG" -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+# Fix one cache, derive the other
+$QEMU_IO -c "open -o cache-size=2M,l2-cache-size=1M $TEST_IMG" \
+ -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+$QEMU_IO -c "open -o cache-size=2M,refcount-cache-size=1M $TEST_IMG" \
+ -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+# Directly set both caches
+$QEMU_IO -c "open -o l2-cache-size=1M,refcount-cache-size=0.25M $TEST_IMG" \
+ -c 'read -P 42 0 64k' \
+| _filter_qemu_io
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/103.out b/tests/qemu-iotests/103.out
new file mode 100644
index 000..ddf6b5a
--- /dev/null
+++ b/tests/qemu-iotests/103.out
@@ -0,0 +1,29 @@
+QA output created by 103
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536 
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing invalid option combinations ===
+
+qemu-io: can't open device TEST_DIR/t.IMGFMT: cache-size, l2-cache-size and 
refcount-cache-size may not be set the same time
+qemu-io: can't open device TEST_DIR/t.IMGFMT: l2-cache-size may not exceed 
cache-size
+qemu-io: can't open device TEST_DIR/t.IMGFMT: refcount-cache-size may not 
exceed cache-size
+qemu-io: can't open device TEST_DIR/t.IMGFMT: cache-size, l2-cache-size and 
refcount-cache-size may not be set the same time
+
+=== Testing valid option combinations ===
+
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:

Re: [Qemu-devel] [PATCH 2/4] qcow2: Use g_try_new0() for cache array

2014-08-18 Thread Max Reitz


On 18.08.2014 22:00, Max Reitz wrote:

With a variable cache size, the number given to qcow2_cache_create() may
be huge. Therefore, use g_try_new0().

While at it, use g_new0() instead of g_malloc0() for allocating the
Qcow2Cache object.

Signed-off-by: Max Reitz 
---
  block/qcow2-cache.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index fe0615a..6eabfaa 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -48,9 +48,12 @@ Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int 
num_tables)
  Qcow2Cache *c;
  int i;
  
-c = g_malloc0(sizeof(*c));

+c = g_new0(Qcow2Cache, 1);
  c->size = num_tables;
-c->entries = g_new0(Qcow2CachedTable, num_tables);
+c->entries = g_try_new0(Qcow2CachedTable, num_tables);
+if (!c->entries) {
+goto fail;
+}


Self-NACK: The fail path needs to acknowledge that c->entries may be NULL.

Max

[Qemu-devel] [PATCH 3/4] qcow2: Add runtime options for cache sizes

2014-08-18 Thread Max Reitz

Add options for specifying the size of the metadata caches. This can
either be done directly for each cache (if only one is given, the other
will be derived according to a default ratio) or combined for both.

Signed-off-by: Max Reitz 
---
 block/qcow2.c | 112 +++---
 block/qcow2.h |   3 ++
 2 files changed, 103 insertions(+), 12 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 910d9cf..f9e045f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -442,6 +442,22 @@ static QemuOptsList qcow2_runtime_opts = {
 .type = QEMU_OPT_BOOL,
 .help = "Check for unintended writes into an inactive L2 table",
 },
+{
+.name = QCOW2_OPT_CACHE_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Maximum combined metadata (L2 tables and refcount blocks) 
"
+"cache size",
+},
+{
+.name = QCOW2_OPT_L2_CACHE_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Maximum L2 table cache size",
+},
+{
+.name = QCOW2_OPT_REFCOUNT_CACHE_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Maximum refcount block cache size",
+},
 { /* end of list */ }
 },
 };
@@ -457,6 +473,61 @@ static const char 
*overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
 [QCOW2_OL_INACTIVE_L2_BITNR]= QCOW2_OPT_OVERLAP_INACTIVE_L2,
 };
 
+static void read_cache_sizes(QemuOpts *opts, uint64_t *l2_cache_size,
+ uint64_t *refcount_cache_size, Error **errp)
+{
+uint64_t combined_cache_size;
+bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set;
+
+combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
+l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
+refcount_cache_size_set = qemu_opt_get(opts, 
QCOW2_OPT_REFCOUNT_CACHE_SIZE);
+
+combined_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_CACHE_SIZE, 0);
+*l2_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, 0);
+*refcount_cache_size = qemu_opt_get_size(opts,
+ QCOW2_OPT_REFCOUNT_CACHE_SIZE, 0);
+
+if (combined_cache_size_set) {
+if (l2_cache_size_set && refcount_cache_size_set) {
+error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE
+   " and " QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not be set "
+   "the same time");
+return;
+} else if (*l2_cache_size > combined_cache_size) {
+error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
+   QCOW2_OPT_CACHE_SIZE);
+return;
+} else if (*refcount_cache_size > combined_cache_size) {
+error_setg(errp, QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not exceed "
+   QCOW2_OPT_CACHE_SIZE);
+return;
+}
+
+if (l2_cache_size_set) {
+*refcount_cache_size = combined_cache_size - *l2_cache_size;
+} else if (refcount_cache_size_set) {
+*l2_cache_size = combined_cache_size - *refcount_cache_size;
+} else {
+*refcount_cache_size = combined_cache_size
+ / (DEFAULT_L2_REFCOUNT_SIZE_RATIO + 1);
+*l2_cache_size = combined_cache_size - *refcount_cache_size;
+}
+} else {
+if (!l2_cache_size_set && !refcount_cache_size_set) {
+*l2_cache_size = DEFAULT_L2_CACHE_BYTE_SIZE;
+*refcount_cache_size = *l2_cache_size
+ / DEFAULT_L2_REFCOUNT_SIZE_RATIO;
+} else if (!l2_cache_size_set) {
+*l2_cache_size = *refcount_cache_size
+   * DEFAULT_L2_REFCOUNT_SIZE_RATIO;
+} else if (!refcount_cache_size_set) {
+*refcount_cache_size = *l2_cache_size
+ / DEFAULT_L2_REFCOUNT_SIZE_RATIO;
+}
+}
+}
+
 static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -707,18 +778,43 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 }
 
-/* alloc L2 table/refcount block cache */
-l2_cache_size = DEFAULT_L2_CACHE_BYTE_SIZE / s->cluster_size;
+/* get L2 table/refcount block cache size from command line options */
+opts = qemu_opts_create(&qcow2_runtime_opts, NULL, 0, &error_abort);
+qemu_opts_absorb_qdict(opts, options, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail;
+}
+
+read_cache_sizes(opts, &l2_cache_size, &refcount_cache_size, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail;
+}
+
+l2_cache_size /= s->cluster_size;
 if (l2_cache_size < MIN_L2_CACHE_SIZE) {

[Qemu-devel] [PATCH 0/4] qcow2: Allow runtime specification of cache sizes

2014-08-18 Thread Max Reitz

Currently, the metadata cache size is only tunable on compile time
through macros. However, some users may want to use the minimal cache
size (for whatever reason) and others may want to increase the cache
size because they have enough memory and want to increase performance.

This series adds runtime options for setting the cache size in bytes
(which is an easily comprehensible unit) in various ways (by setting
each cache explicitly or the total size).


This series (patch 2) depends on Markus' series
"[PATCH v2 0/4] block: Use g_new() & friends more".


Max Reitz (4):
  qcow2: Constant cache size in bytes
  qcow2: Use g_try_new0() for cache array
  qcow2: Add runtime options for cache sizes
  iotests: Add test for qcow2's cache options

 block/qcow2-cache.c|   7 ++-
 block/qcow2.c  | 120 +
 block/qcow2.h  |  13 -
 tests/qemu-iotests/103 |  99 +
 tests/qemu-iotests/103.out |  29 +++
 tests/qemu-iotests/group   |   1 +
 6 files changed, 255 insertions(+), 14 deletions(-)
 create mode 100755 tests/qemu-iotests/103
 create mode 100644 tests/qemu-iotests/103.out

-- 
2.0.4

[Qemu-devel] [PATCH 2/4] qcow2: Use g_try_new0() for cache array

2014-08-18 Thread Max Reitz

With a variable cache size, the number given to qcow2_cache_create() may
be huge. Therefore, use g_try_new0().

While at it, use g_new0() instead of g_malloc0() for allocating the
Qcow2Cache object.

Signed-off-by: Max Reitz 
---
 block/qcow2-cache.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index fe0615a..6eabfaa 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -48,9 +48,12 @@ Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int 
num_tables)
 Qcow2Cache *c;
 int i;
 
-c = g_malloc0(sizeof(*c));
+c = g_new0(Qcow2Cache, 1);
 c->size = num_tables;
-c->entries = g_new0(Qcow2CachedTable, num_tables);
+c->entries = g_try_new0(Qcow2CachedTable, num_tables);
+if (!c->entries) {
+goto fail;
+}
 
 for (i = 0; i < c->size; i++) {
 c->entries[i].table = qemu_try_blockalign(bs->file, s->cluster_size);
-- 
2.0.4

[Qemu-devel] [PATCH 1/4] qcow2: Constant cache size in bytes

2014-08-18 Thread Max Reitz

Specifying the metadata cache sizes in clusters results in less clusters
(and much less bytes) covered for small cluster sizes and vice versa.
Using a constant byte size reduces this difference, and makes it
possible to manually specify the cache size in an easily comprehensible
unit.

Signed-off-by: Max Reitz 
---
 block/qcow2.c | 16 ++--
 block/qcow2.h | 10 --
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 435e0e1..910d9cf 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -470,6 +470,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, 
int flags,
 uint64_t l1_vm_state_index;
 const char *opt_overlap_check;
 int overlap_check_template = 0;
+uint64_t l2_cache_size, refcount_cache_size;
 
 ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
 if (ret < 0) {
@@ -707,8 +708,19 @@ static int qcow2_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 /* alloc L2 table/refcount block cache */
-s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
-s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+l2_cache_size = DEFAULT_L2_CACHE_BYTE_SIZE / s->cluster_size;
+if (l2_cache_size < MIN_L2_CACHE_SIZE) {
+l2_cache_size = MIN_L2_CACHE_SIZE;
+}
+
+refcount_cache_size = l2_cache_size
+/ (DEFAULT_L2_REFCOUNT_SIZE_RATIO * s->cluster_size);
+if (refcount_cache_size < MIN_REFCOUNT_CACHE_SIZE) {
+refcount_cache_size = MIN_REFCOUNT_CACHE_SIZE;
+}
+
+s->l2_table_cache = qcow2_cache_create(bs, l2_cache_size);
+s->refcount_block_cache = qcow2_cache_create(bs, refcount_cache_size);
 if (s->l2_table_cache == NULL || s->refcount_block_cache == NULL) {
 error_setg(errp, "Could not allocate metadata caches");
 ret = -ENOMEM;
diff --git a/block/qcow2.h b/block/qcow2.h
index b49424b..671783d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -64,10 +64,16 @@
 #define MIN_CLUSTER_BITS 9
 #define MAX_CLUSTER_BITS 21
 
-#define L2_CACHE_SIZE 16
+#define MIN_L2_CACHE_SIZE 1 /* cluster */
 
 /* Must be at least 4 to cover all cases of refcount table growth */
-#define REFCOUNT_CACHE_SIZE 4
+#define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
+
+#define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */
+
+/* The refblock cache needs only a fourth of the L2 cache size to cover as many
+ * clusters */
+#define DEFAULT_L2_REFCOUNT_SIZE_RATIO 4
 
 #define DEFAULT_CLUSTER_SIZE 65536
 
-- 
2.0.4

Re: [Qemu-devel] [PATCH v2 3/4] qemu-io-cmds: g_renew() can't fail, bury dead error handling

2014-08-18 Thread Jeff Cody

On Mon, Aug 18, 2014 at 06:10:42PM +0200, Markus Armbruster wrote:
> Signed-off-by: Markus Armbruster 
> ---
>  qemu-io-cmds.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
> index afd8867..b224ede 100644
> --- a/qemu-io-cmds.c
> +++ b/qemu-io-cmds.c
> @@ -115,22 +115,13 @@ static char **breakline(char *input, int *count)
>  int c = 0;
>  char *p;
>  char **rval = g_new0(char *, 1);
> -char **tmp;
>  
>  while (rval && (p = qemu_strsep(&input, " ")) != NULL) {
>  if (!*p) {
>  continue;
>  }
>  c++;
> -tmp = g_renew(char *, rval, (c + 1));
> -if (!tmp) {
> -g_free(rval);
> -rval = NULL;
> -c = 0;
> -break;
> -} else {
> -rval = tmp;
> -}
> +rval = g_renew(char *, rval, (c + 1));
>  rval[c - 1] = p;
>  rval[c] = NULL;
>  }
> -- 
> 1.9.3
> 
>

Reviewed-by: Jeff Cody

Re: [Qemu-devel] [PATCH v2 2/4] block: Use g_new() & friends to avoid multiplying sizes

2014-08-18 Thread Jeff Cody

On Mon, Aug 18, 2014 at 06:10:41PM +0200, Markus Armbruster wrote:
> g_new(T, n) is safer than g_malloc(sizeof(*v) * n) for two reasons.
> One, it catches multiplication overflowing size_t.  Two, it returns
> T * rather than void *, which lets the compiler catch more type
> errors.
> 
> Perhaps a conversion to g_malloc_n() would be neater in places, but
> that's merely four years old, and we can't use such newfangled stuff.
> 
> This commit only touches allocations with size arguments of the form
> sizeof(T), plus two that use 4 instead of sizeof(uint32_t).  We can
> make the others safe by converting to g_malloc_n() when it becomes
> available to us in a couple of years.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block/bochs.c   |  2 +-
>  block/parallels.c   |  2 +-
>  block/qcow2-cache.c |  2 +-
>  block/qed-check.c   |  3 +--
>  block/rbd.c |  2 +-
>  block/sheepdog.c|  2 +-
>  hw/block/nvme.c |  8 
>  qemu-io-cmds.c  | 10 +-
>  8 files changed, 15 insertions(+), 16 deletions(-)
> 

[...]

Reviewed-by: Jeff Cody

Re: [Qemu-devel] [PATCH] vhost_net: start/stop guest notifiers properly

2014-08-18 Thread Michael S. Tsirkin

On Mon, Aug 18, 2014 at 05:51:31PM +0800, Jason Wang wrote:
> commit a9f98bb5ebe6fb1869321dcc58e72041ae626ad8 vhost: multiqueue
> support changed the order of stopping the device. Previously
> vhost_dev_stop would disable backend and only afterwards, unset guest
> notifiers. We now unset guest notifiers while vhost is still
> active. This can lose interrupts causing guest networking to fail.
> 
> Additionally, remove the hdev->started assert in vhost.c since we may
> want to start the guest notifiers before vhost starts and stop the
> guest notifiers after vhost is stopped.
> 
> In particular, this has been observed during migration.
> 
> Reported-by: "Zhangjie (HZ)" 
> Signed-off-by: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 


This doesn't seem to apply to master.
Can you rebase please?
> --
> 
> Zhang Jie, please test this patch to see if it fixes the issue.
> ---
>  hw/net/vhost_net.c | 20 ++--
>  hw/virtio/vhost.c  |  2 --
>  2 files changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 006576d..72084ba 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -223,6 +223,12 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
> *ncs,
>  goto err;
>  }
>  
> +r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
> +if (r < 0) {
> +error_report("Error binding guest notifier: %d", -r);
> +goto err;
> +}
> +
>  for (i = 0; i < total_queues; i++) {
>  r = vhost_net_start_one(tap_get_vhost_net(ncs[i].peer), dev, i * 2);
>  
> @@ -231,12 +237,6 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
> *ncs,
>  }
>  }
>  
> -r = k->set_guest_notifiers(qbus->parent, total_queues * 2, true);
> -if (r < 0) {
> -error_report("Error binding guest notifier: %d", -r);
> -goto err;
> -}
> -
>  return 0;
>  
>  err:
> @@ -254,16 +254,16 @@ void vhost_net_stop(VirtIODevice *dev, NetClientState 
> *ncs,
>  VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>  int i, r;
>  
> +for (i = 0; i < total_queues; i++) {
> +vhost_net_stop_one(tap_get_vhost_net(ncs[i].peer), dev);
> +}
> +
>  r = k->set_guest_notifiers(qbus->parent, total_queues * 2, false);
>  if (r < 0) {
>  fprintf(stderr, "vhost guest notifier cleanup failed: %d\n", r);
>  fflush(stderr);
>  }
>  assert(r >= 0);
> -
> -for (i = 0; i < total_queues; i++) {
> -vhost_net_stop_one(tap_get_vhost_net(ncs[i].peer), dev);
> -}
>  }
>  
>  void vhost_net_cleanup(struct vhost_net *net)
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 9e336ad..d74514a 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -969,7 +969,6 @@ void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
> VirtIODevice *vdev)
>  bool vhost_virtqueue_pending(struct vhost_dev *hdev, int n)
>  {
>  struct vhost_virtqueue *vq = hdev->vqs + n - hdev->vq_index;
> -assert(hdev->started);
>  assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs);
>  return event_notifier_test_and_clear(&vq->masked_notifier);
>  }
> @@ -981,7 +980,6 @@ void vhost_virtqueue_mask(struct vhost_dev *hdev, 
> VirtIODevice *vdev, int n,
>  struct VirtQueue *vvq = virtio_get_queue(vdev, n);
>  int r, index = n - hdev->vq_index;
>  
> -assert(hdev->started);
>  assert(n >= hdev->vq_index && n < hdev->vq_index + hdev->nvqs);
>  
>  struct vhost_vring_file file = {
> -- 
> 1.8.3.1

Re: [Qemu-devel] [PATCH 3/8] target-ppc: Bug Fix: rlwimi

2014-08-18 Thread Tom Musta

On 8/15/2014 3:05 PM, Richard Henderson wrote:
> On 08/11/2014 09:23 AM, Tom Musta wrote:
>> Also fix the special case of MB=31 and ME=0 to copy the entire contents
>> of the source GPR.
> 
> Err, that's not what you did.
> 
>>  if (likely(sh == 0 && mb == 0 && me == 31)) {
>> +#if defined(TARGET_PPC64)
>> +tcg_gen_mov_i64(cpu_gpr[rA(ctx->opcode)], cpu_gpr[rS(ctx->opcode)]);
>> +#else
>>  tcg_gen_ext32u_tl(cpu_gpr[rA(ctx->opcode)], 
>> cpu_gpr[rS(ctx->opcode)]);
>> +#endif
> 
> This is the reverse condition.  Which, true enough, should not be implemented
> with ext32u for PPC64.  But a MOV isn't right either, it is
> 
>   deposit(ra, rs, 0, 32)
> 
> Which does point out that we should probably implement anything MB <= ME and 
> SH
> == 31 - ME with the deposit opcode.
> 
> 
> r~
> 

Richard:

Good catch.  I found a bug in my test generator ... rlwimi is unusual in that 
the
"RA" register is both a source and a target.  A fix is forthcoming.

Thanks also for your other comments.  Unlike this one, I believe they are 
optimizations.
I will investigate and potentially publish some additional changes.  Alex has 
already
taken this series into his ppc-next, so the new patches will be relative to 
these.

Re: [Qemu-devel] [PATCH v2 1/4] block: Use g_new() & friends where that makes obvious sense

2014-08-18 Thread Jeff Cody

On Mon, Aug 18, 2014 at 06:10:40PM +0200, Markus Armbruster wrote:
> g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
> for two reasons.  One, it catches multiplication overflowing size_t.
> Two, it returns T * rather than void *, which lets the compiler catch
> more type errors.
> 
> Patch created with Coccinelle, with two manual changes on top:
> 
> * Add const to bdrv_iterate_format() to keep the types straight
> 
> * Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
>   inexplicably misses
> 
> Coccinelle semantic patch:
> 
> @@
> type T;
> @@
> -g_malloc(sizeof(T))
> +g_new(T, 1)
> @@
> type T;
> @@
> -g_try_malloc(sizeof(T))
> +g_try_new(T, 1)
> @@
> type T;
> @@
> -g_malloc0(sizeof(T))
> +g_new0(T, 1)
> @@
> type T;
> @@
> -g_try_malloc0(sizeof(T))
> +g_try_new0(T, 1)
> @@
> type T;
> expression n;
> @@
> -g_malloc(sizeof(T) * (n))
> +g_new(T, n)
> @@
> type T;
> expression n;
> @@
> -g_try_malloc(sizeof(T) * (n))
> +g_try_new(T, n)
> @@
> type T;
> expression n;
> @@
> -g_malloc0(sizeof(T) * (n))
> +g_new0(T, n)
> @@
> type T;
> expression n;
> @@
> -g_try_malloc0(sizeof(T) * (n))
> +g_try_new0(T, n)
> @@
> type T;
> expression p, n;
> @@
> -g_realloc(p, sizeof(T) * (n))
> +g_renew(T, p, n)
> @@
> type T;
> expression p, n;
> @@
> -g_try_realloc(p, sizeof(T) * (n))
> +g_try_renew(T, p, n)
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block-migration.c  |  6 +++---
>  block.c| 14 +++---
>  block/archipelago.c|  6 +++---
>  block/gluster.c|  8 
>  block/iscsi.c  |  2 +-
>  block/nfs.c|  2 +-
>  block/qcow.c   |  2 +-
>  block/qcow2-cluster.c  |  2 +-
>  block/qcow2-refcount.c |  8 
>  block/qcow2-snapshot.c |  8 
>  block/raw-posix.c  |  2 +-
>  block/rbd.c|  4 ++--
>  block/sheepdog.c   |  4 ++--
>  block/vdi.c|  2 +-
>  block/vhdx.c   |  4 ++--
>  block/vmdk.c   |  7 +++
>  block/vvfat.c  |  2 +-
>  blockdev-nbd.c |  2 +-
>  blockdev.c |  2 +-
>  hw/ide/ahci.c  |  2 +-
>  qemu-io-cmds.c |  2 +-
>  qemu-io.c  |  2 +-
>  22 files changed, 46 insertions(+), 47 deletions(-)

[...]

Reviewed-by: Jeff Cody

Re: [Qemu-devel] [Qemu-trivial] [PATCH v7] tests/bios-tables-test: check the value returned by fopen()

2014-08-18 Thread Michael S. Tsirkin

On Mon, Aug 18, 2014 at 06:24:02PM +0400, Michael Tokarev wrote:
> 18.08.2014 17:44, Michael S. Tsirkin wrote:
> > On Mon, Aug 18, 2014 at 03:38:12PM +0400, Michael Tokarev wrote:
> >> 18.08.2014 11:54, zhanghailiang wrote:
> >>> The function fopen() may fail, so check its return value.
> >>>
> >>> Signed-off-by: zhanghailiang 
> >>> Signed-off-by: Li Liu 
> >>> Reviewed-by: Alex Bennée 
> >>> ---
> >>>  tests/bios-tables-test.c | 5 +
> >>>  1 file changed, 5 insertions(+)
> >>>
> >>> diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
> >>> index 045eb27..feb3b58 100644
> >>> --- a/tests/bios-tables-test.c
> >>> +++ b/tests/bios-tables-test.c
> >>> @@ -790,6 +790,11 @@ int main(int argc, char *argv[])
> >>>  const char *arch = qtest_get_arch();
> >>>  FILE *f = fopen(disk, "w");
> >>>  int ret;
> >>> +
> >>> +if (!f) {
> >>> +fprintf(stderr, "Couldn't open \"%s\": %s", disk, 
> >>> strerror(errno));
> >>> +return -1;
> >>> +}
> >>
> >> Applied to -trivial, with s/-1/1/.  No need to resend it again.
> >>
> >> Thanks,
> >>
> >> /mjt
> >>
> >>>  fwrite(boot_sector, 1, sizeof boot_sector, f);
> >>>  fclose(f);
> > 
> > This doesn't seem appropriate to trivial tree:
> > there were some objections from Marcel, I'd like to
> > see them addressed.
> 
> I see the objections as addressed, don't you?
> 
> /mjt

Does test fail if this path is triggered?

[Qemu-devel] [PULL 1/3] monitor: Remove hardcoded watchdog event names

2014-08-18 Thread Luiz Capitulino

From: Hani Benhabiles 

Signed-off-by: Hani Benhabiles 
Signed-off-by: Luiz Capitulino 
---
 monitor.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/monitor.c b/monitor.c
index cdbaa60..48f0fdc 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4521,16 +4521,15 @@ void netdev_del_completion(ReadLineState *rs, int 
nb_args, const char *str)
 
 void watchdog_action_completion(ReadLineState *rs, int nb_args, const char 
*str)
 {
+int i;
+
 if (nb_args != 2) {
 return;
 }
 readline_set_completion_index(rs, strlen(str));
-add_completion_option(rs, str, "reset");
-add_completion_option(rs, str, "shutdown");
-add_completion_option(rs, str, "poweroff");
-add_completion_option(rs, str, "pause");
-add_completion_option(rs, str, "debug");
-add_completion_option(rs, str, "none");
+for (i = 0; WatchdogExpirationAction_lookup[i]; i++) {
+add_completion_option(rs, str, WatchdogExpirationAction_lookup[i]);
+}
 }
 
 void migrate_set_capability_completion(ReadLineState *rs, int nb_args,
-- 
1.9.3

[Qemu-devel] [PULL 2/3] dump.c: Fix memory leak issue in cleanup processing for dump_init()

2014-08-18 Thread Luiz Capitulino

From: Chen Gang 

In dump_init(), when failure occurs, need notice about 'fd' and memory
mapping. So call dump_cleanup() for it (need let all initializations at
front).

Also simplify dump_cleanup(): remove redundant 'ret' and redundant 'fd'
checking.

Signed-off-by: Chen Gang 
Reviewed-by: Laszlo Ersek 
Signed-off-by: Luiz Capitulino 
---
 dump.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/dump.c b/dump.c
index ce646bc..71d3e94 100644
--- a/dump.c
+++ b/dump.c
@@ -71,18 +71,14 @@ uint64_t cpu_to_dump64(DumpState *s, uint64_t val)
 
 static int dump_cleanup(DumpState *s)
 {
-int ret = 0;
-
 guest_phys_blocks_free(&s->guest_phys_blocks);
 memory_mapping_list_free(&s->list);
-if (s->fd != -1) {
-close(s->fd);
-}
+close(s->fd);
 if (s->resume) {
 vm_start();
 }
 
-return ret;
+return 0;
 }
 
 static void dump_error(DumpState *s, const char *reason)
@@ -1499,6 +1495,8 @@ static int dump_init(DumpState *s, int fd, bool 
has_format,
 s->begin = begin;
 s->length = length;
 
+memory_mapping_list_init(&s->list);
+
 guest_phys_blocks_init(&s->guest_phys_blocks);
 guest_phys_blocks_append(&s->guest_phys_blocks);
 
@@ -1526,7 +1524,6 @@ static int dump_init(DumpState *s, int fd, bool 
has_format,
 }
 
 /* get memory mapping */
-memory_mapping_list_init(&s->list);
 if (paging) {
 qemu_get_guest_memory_mapping(&s->list, &s->guest_phys_blocks, &err);
 if (err != NULL) {
@@ -1622,12 +1619,7 @@ static int dump_init(DumpState *s, int fd, bool 
has_format,
 return 0;
 
 cleanup:
-guest_phys_blocks_free(&s->guest_phys_blocks);
-
-if (s->resume) {
-vm_start();
-}
-
+dump_cleanup(s);
 return -1;
 }
 
-- 
1.9.3

[Qemu-devel] [PULL 0/3] QMP queue

2014-08-18 Thread Luiz Capitulino

Three little birds.

The following changes since commit 08ab59770da57648bfb8fc9be37f0ef7fb50b0f9:

  Merge remote-tracking branch 'remotes/mcayland/qemu-sparc' into staging 
(2014-08-18 12:55:02 +0100)

are available in the git repository at:


  git://repo.or.cz/qemu/qmp-unstable.git queue/qmp

for you to fetch changes up to b3dd1b8c295636e64ceb14cdc4db6420d7319e38:

  monitor: fix use after free (2014-08-18 14:39:10 -0400)


Chen Gang (1):
  dump.c: Fix memory leak issue in cleanup processing for dump_init()

Hani Benhabiles (1):
  monitor: Remove hardcoded watchdog event names

Michael S. Tsirkin (1):
  monitor: fix use after free

 dump.c| 18 +-
 include/monitor/monitor.h |  2 +-
 monitor.c | 19 ++-
 stubs/fdset-remove-fd.c   |  3 +--
 4 files changed, 17 insertions(+), 25 deletions(-)

[Qemu-devel] [PULL 3/3] monitor: fix use after free

2014-08-18 Thread Luiz Capitulino

From: "Michael S. Tsirkin" 

The function monitor_fdset_dup_fd_find_remove() references member of
'mon_fdset' which - when remove flag is set - may be freed in function
monitor_fdset_cleanup().
remove is set by monitor_fdset_dup_fd_remove which in practice
does not need the returned value, so make it void,
and return -1 from monitor_fdset_dup_fd_find_remove.

Reported-by: zhanghailiang 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Luiz Capitulino 
---
 include/monitor/monitor.h | 2 +-
 monitor.c | 8 +---
 stubs/fdset-remove-fd.c   | 3 +--
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 3d6929d..78a5fc8 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -64,7 +64,7 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
int64_t fdset_id,
 Error **errp);
 int monitor_fdset_get_fd(int64_t fdset_id, int flags);
 int monitor_fdset_dup_fd_add(int64_t fdset_id, int dup_fd);
-int monitor_fdset_dup_fd_remove(int dup_fd);
+void monitor_fdset_dup_fd_remove(int dup_fd);
 int monitor_fdset_dup_fd_find(int dup_fd);
 
 #endif /* !MONITOR_H */
diff --git a/monitor.c b/monitor.c
index 48f0fdc..34cee74 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2542,8 +2542,10 @@ static int monitor_fdset_dup_fd_find_remove(int dup_fd, 
bool remove)
 if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
 monitor_fdset_cleanup(mon_fdset);
 }
+return -1;
+} else {
+return mon_fdset->id;
 }
-return mon_fdset->id;
 }
 }
 }
@@ -2555,9 +2557,9 @@ int monitor_fdset_dup_fd_find(int dup_fd)
 return monitor_fdset_dup_fd_find_remove(dup_fd, false);
 }
 
-int monitor_fdset_dup_fd_remove(int dup_fd)
+void monitor_fdset_dup_fd_remove(int dup_fd)
 {
-return monitor_fdset_dup_fd_find_remove(dup_fd, true);
+monitor_fdset_dup_fd_find_remove(dup_fd, true);
 }
 
 int monitor_handle_fd_param(Monitor *mon, const char *fdname)
diff --git a/stubs/fdset-remove-fd.c b/stubs/fdset-remove-fd.c
index b3886d9..7f6d61e 100644
--- a/stubs/fdset-remove-fd.c
+++ b/stubs/fdset-remove-fd.c
@@ -1,7 +1,6 @@
 #include "qemu-common.h"
 #include "monitor/monitor.h"
 
-int monitor_fdset_dup_fd_remove(int dupfd)
+void monitor_fdset_dup_fd_remove(int dupfd)
 {
-return -1;
 }
-- 
1.9.3

Re: [Qemu-devel] [PATCH v3 0/5] intel-iommu: introduce Intel IOMMU (VT-d) emulation to q35 chipset

2014-08-18 Thread Jan Kiszka

On 2014-08-18 18:34, Knut Omang wrote:
> On Sat, 2014-08-16 at 10:47 +0200, Jan Kiszka wrote:
>> On 2014-08-16 10:45, Jan Kiszka wrote:
>>> On 2014-08-16 09:54, Knut Omang wrote:
 On Fri, 2014-08-15 at 19:37 +0800, Le Tan wrote:
> Hi Knut,
>
> 2014-08-15 19:15 GMT+08:00 Knut Omang :
>> On Fri, 2014-08-15 at 06:42 +0200, Knut Omang wrote:
>>> On Thu, 2014-08-14 at 14:10 +0200, Jan Kiszka wrote:
 On 2014-08-14 13:15, Michael S. Tsirkin wrote:
> On Mon, Aug 11, 2014 at 03:04:57PM +0800, Le Tan wrote:
>> Hi,
>>
>> These patches are intended to introduce Intel IOMMU (VT-d) emulation 
>> to q35
>> chipset. The major job in these patches is to add support for 
>> emulating Intel
>> IOMMU according to the VT-d specification, including basic responses 
>> to CSRs
>> accesses, the logics of DMAR (DMA remapping) and DMA memory address
>> translations.
>
> Thanks!
> Looks very good overall, I noted some coding style issues - I didn't
> bother reporting each issue in every place where it appears - reported
> each issue once only, so please find and fix all instances of each
> issue.

 BTW, because I was in urgent need for virtual test environment for
 Jailhouse, I hacked interrupt remapping on top of Le's patches:

 http://git.kiszka.org/?p=qemu.git;a=shortlog;h=refs/heads/queues/vtd-intremap

 The approach likely needs further discussions and refinements but it
 already allows me to work on top with our hypervisor, and also Linux.
 You can see from the last commit that Le's work made it pretty easy to
 build this on top.
>>>
>>> Le,
>>>
>>> I have tried Jan's branch with my device setup which consists of a
>>> minimal q35 setup, an ioh3420 root port (specified as -device
>>> ioh3420,slot=0 ) and a pcie device plugged into that root port, which
>>> gives the following lscpi -t:
>>>
>>> -[:00]-+-00.0
>>>+-01.0
>>>+-02.0
>>>+-03.0-[01]00.0
>>>+-04.0
>>>+-1f.0
>>>+-1f.2
>>>\-1f.3
>>>
>>> All seems to work beautifully (I see the ISA bridge happily receive
>>> translations) until the first DMA from my device model (at 1:00.0)
>>> arrives, at which point I get:
>>>
>>> [ 1663.732413] dmar: DMAR:[DMA Write] Request device [00:03.0] fault 
>>> addr fffa
>>> [ 1663.732413] DMAR:[fault reason 02] Present bit in context entry is 
>>> clear
>>>
>>> I would have expected request device 01:00.0 for this.
>>> It is not clear to me yet if this is a weakness of the implementation of
>>> ioh3420 or the iommu. Just wanted to let you know right away in case you
>>> can shed some light to it or it is an easy fix,
>>>
>>> The device uses pci_dma_rw with itself as device pointer.
>>
>> To verify my hypothesis: with this rude hack my device now works much
>> better:
>>
>> @@ -774,6 +780,8 @@ static void iommu_translate(VTDAddressSpace *vtd_as,
>> int bus_num, int devfn,
>>  is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
>>  } else {
>>  ret_fr = dev_to_context_entry(s, bus_num, devfn, &ce);
>> +if (ret_fr)
>> +ret_fr = dev_to_context_entry(s, 1, 0, &ce);
>>  is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
>>  if (ret_fr) {
>>  ret_fr = -ret_fr;
>>
>> Looking at how things look on hardware, multiple devices often receive
>> overlapping DMA address ranges for different physical addresses.
>>
>> So if I understand the way this works, every requester ID would also
>> need to have it's own unique VTDAddressSpace, as each pci
>> device/function sees a unique DMA address space..
>
> ioh3420 is a pcie-to-pcie bridge, right? 

 Yes.

> In my opinion, each pci-e
> device behind the pcie-to-pcie bridge can be assigned individually.
> For now I added the VT-d to q35 by just adding it to the root pci bus.
> You can see here in q35.c:
> pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
> So if we add a pcie-to-pcie bridge, we may have to call the
> pci_setup_iommu() for that new bus. I don't know where to hook into
> this now. :) If you know the mechanism behind that, you can try to add
> that for the new bus. (I will dive into this after the clean up.)
> What do you think?

 Thanks for the quick answer, that helped a lot!

 Looking into the details here I realize it is slightly more complicated:
 secondary buses are enumerated after device instantiation, as part of
 the host PCI enumeration, so if I add a similar setup call in the bridge
 setup, it will be

Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)

2014-08-18 Thread Hulin, Patrick - 0559 - MITLL

On 8/18/14, 1:47 PM, "Hulin, Patrick - 0559 - MITLL"
 wrote:

>On Aug 17, 2014, at 1:21 AM, Paolo Bonzini  wrote:
>
>> Il 15/08/2014 23:49, Hulin, Patrick - 0559 - MITLL ha scritto:
> In this case, the write is 8 bytes and unaligned, so it gets split
> into 8 single-byte writes. In stock QEMU, these writes are done in
> reverse order (see the loop in softmmu_template.h, line 402). The
> third decryption xor from Kernel Patch Protection should hit 4 bytes
> that are in the current TB and 4 bytes in the TB afterwards in linear
> order. Since this happens in reverse order, and the last 4 bytes of
> the write do not intersect the current TB, those writes happen
> successfully and QEMU's memory is modified. The 4th byte in linear
> order (the 5th in temporal order) then triggers the
> current_tb_modified flag and cpu_restore_state, longjmp'ing out.
> 
 Would it work to just call tb_invalidate_phys_page_range before the
 helper_ret_stb loop?
>>> 
>>> Maybe. I think there¹s another issue, which is that QEMU¹s ending up
>>> in the I/O read/write code instead of the normal memory RW. This could
>>> be QEMU messing up, it could be PatchGuard doing something weird, or it
>>> could be me misunderstanding what¹s going on. I never really figured
>>>out
>>> how the control flow works here.
>> 
>> That's okay.  Everything that's in the slow path goes down
>> io_mem_read/write (in this case TLB_NOTDIRTY is set for dirty-page
>> tracking and causes QEMU to choose that path).
>> 
>> Try making a self-contained test case using the kvm-unit-tests harness
>> (git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git).
>> 
>> Paolo
>
>Okay. I¹ve attached a test case. Note that since this has to run without
>KVM it messes around with the run script.

The friendly folks on IRC point out that I should probably include the
actual binary.



selfmodify.flat
Description: selfmodify.flat

Re: [Qemu-devel] [PATCH] monitor: fix use after free

2014-08-18 Thread Luiz Capitulino

On Sun, 17 Aug 2014 11:45:17 +0200
"Michael S. Tsirkin"  wrote:

> The function monitor_fdset_dup_fd_find_remove() references member of
> 'mon_fdset' which - when remove flag is set - may be freed in function
> monitor_fdset_cleanup().
> remove is set by monitor_fdset_dup_fd_remove which in practice
> does not need the returned value, so make it void,
> and return -1 from monitor_fdset_dup_fd_find_remove.
> 
> Signed-off-by: Michael S. Tsirkin 
> Reported-by: zhanghailiang 

I've applied this one to the qmp tree, but I have one comment below.

> ---
>  include/monitor/monitor.h | 2 +-
>  monitor.c | 8 +---
>  stubs/fdset-remove-fd.c   | 3 +--
>  3 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
> index 3d6929d..78a5fc8 100644
> --- a/include/monitor/monitor.h
> +++ b/include/monitor/monitor.h
> @@ -64,7 +64,7 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
> int64_t fdset_id,
>  Error **errp);
>  int monitor_fdset_get_fd(int64_t fdset_id, int flags);
>  int monitor_fdset_dup_fd_add(int64_t fdset_id, int dup_fd);
> -int monitor_fdset_dup_fd_remove(int dup_fd);
> +void monitor_fdset_dup_fd_remove(int dup_fd);
>  int monitor_fdset_dup_fd_find(int dup_fd);
>  
>  #endif /* !MONITOR_H */
> diff --git a/monitor.c b/monitor.c
> index cdbaa60..ba1908f 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2542,8 +2542,10 @@ static int monitor_fdset_dup_fd_find_remove(int 
> dup_fd, bool remove)
>  if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
>  monitor_fdset_cleanup(mon_fdset);
>  }
> +return -1;

Returning -1 here looks wrong. A better fix is probably to split this
function into two functions as it's doing two unrelated things currently.

I've applied the fix anyway because the only user of remove=true doesn't
care about the return value and also because the bad semantic is less worse
than the use after free...

> +} else {
> +return mon_fdset->id;
>  }
> -return mon_fdset->id;
>  }
>  }
>  }
> @@ -2555,9 +2557,9 @@ int monitor_fdset_dup_fd_find(int dup_fd)
>  return monitor_fdset_dup_fd_find_remove(dup_fd, false);
>  }
>  
> -int monitor_fdset_dup_fd_remove(int dup_fd)
> +void monitor_fdset_dup_fd_remove(int dup_fd)
>  {
> -return monitor_fdset_dup_fd_find_remove(dup_fd, true);
> +monitor_fdset_dup_fd_find_remove(dup_fd, true);
>  }
>  
>  int monitor_handle_fd_param(Monitor *mon, const char *fdname)
> diff --git a/stubs/fdset-remove-fd.c b/stubs/fdset-remove-fd.c
> index b3886d9..7f6d61e 100644
> --- a/stubs/fdset-remove-fd.c
> +++ b/stubs/fdset-remove-fd.c
> @@ -1,7 +1,6 @@
>  #include "qemu-common.h"
>  #include "monitor/monitor.h"
>  
> -int monitor_fdset_dup_fd_remove(int dupfd)
> +void monitor_fdset_dup_fd_remove(int dupfd)
>  {
> -return -1;
>  }

Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)

2014-08-18 Thread Hulin, Patrick - 0559 - MITLL

On Aug 17, 2014, at 1:21 AM, Paolo Bonzini  wrote:

> Il 15/08/2014 23:49, Hulin, Patrick - 0559 - MITLL ha scritto:
 In this case, the write is 8 bytes and unaligned, so it gets split
 into 8 single-byte writes. In stock QEMU, these writes are done in
 reverse order (see the loop in softmmu_template.h, line 402). The
 third decryption xor from Kernel Patch Protection should hit 4 bytes
 that are in the current TB and 4 bytes in the TB afterwards in linear
 order. Since this happens in reverse order, and the last 4 bytes of
 the write do not intersect the current TB, those writes happen
 successfully and QEMU's memory is modified. The 4th byte in linear
 order (the 5th in temporal order) then triggers the
 current_tb_modified flag and cpu_restore_state, longjmp'ing out.
 
>>> Would it work to just call tb_invalidate_phys_page_range before the
>>> helper_ret_stb loop?
>> 
>> Maybe. I think there’s another issue, which is that QEMU’s ending up
>> in the I/O read/write code instead of the normal memory RW. This could
>> be QEMU messing up, it could be PatchGuard doing something weird, or it
>> could be me misunderstanding what’s going on. I never really figured out
>> how the control flow works here.
> 
> That's okay.  Everything that's in the slow path goes down
> io_mem_read/write (in this case TLB_NOTDIRTY is set for dirty-page
> tracking and causes QEMU to choose that path).
> 
> Try making a self-contained test case using the kvm-unit-tests harness
> (git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git).
> 
> Paolo

Okay. I’ve attached a test case. Note that since this has to run without KVM it 
messes around with the run script.



selfmodify.patch
Description: selfmodify.patch

Re: [Qemu-devel] [PATCH v5 2/7] tests: Add virtio device initialization

2014-08-18 Thread Marc Marí

>El Mon, 18 Aug 2014 14:46:07 +0200
>Marc Marí  escribió:
> +void qvirtio_pci_device_enable(QVirtioPCIDevice *d)
> +{
> +qpci_device_enable(d->pdev);
> +d->addr = qpci_iomap(d->pdev, 0);
> +g_assert(d->addr != NULL);
> +}
> +

qpci_iomap changed its prototype in one of the lastest pull requests.
qpci_iomap should have a NULL as third parameter. Will change in
next version.

Marc

Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)

2014-08-18 Thread Hulin, Patrick - 0559 - MITLL

On Aug 18, 2014, at 1:37 PM, Richard Henderson  wrote:

> On 08/16/2014 10:21 PM, Paolo Bonzini wrote:
 Would it work to just call tb_invalidate_phys_page_range before the
 helper_ret_stb loop?
> 
> I doubt it.

Correct. Doesn’t work. Haven’t fully diagnosed why, but it doesn’t seem to ever 
hit the current_tb_modified passage if you invalidate beforehand.

> I believe that the proper solution is to force *both* page table entries into
> the TLB before performing any actual memory operations.
> 
> We'll have done the page for the first byte at the top of
> helper_{le,be}_{ld,st}_name.  When we discover it's an unaligned access, we
> should load and check the pte for the second page.  We might have to shuffle
> those two tests around, since in theory the second page could be I/O mapped 
> and
> we'd want to pass off the whole access to io_mem_*.
> 
> Since two adjacent pages can't conflict in our direct-mapped TLB, we can then
> safely pass off the work to secondary helpers without fear the first TLB entry
> will be flushed.

This isn’t about cross-page writes, although that might make fixing the problem 
a little tricky. The issue occurs with two adjacent TBs on the same page: 
because the individual writes are split up and done in reverse order, writes 
(and reads) off the back of the current TB happen twice. In the case of an xor 
this means the original xor gets undone, which is what breaks in Windows 7.

Re: [Qemu-devel] [PATCH v5 6/7] libqos: Added MSI-X support

2014-08-18 Thread Marc Marí

>El Mon, 18 Aug 2014 14:46:11 +0200
>Marc Marí  escribió:
> +void qpci_msix_enable(QPCIDevice *dev)
> +{
> +uint8_t addr;
> +uint16_t val;
> +uint32_t table;
> +uint8_t bir_table;
> +uint8_t bir_pba;
> +void *offset;
> +
> +addr = qpci_find_capability(dev, PCI_CAP_ID_MSIX);
> +g_assert_cmphex(addr, !=, 0);
> +
> +val = qpci_config_readw(dev, addr + PCI_MSIX_FLAGS);
> +qpci_config_writew(dev, addr + PCI_MSIX_FLAGS, val |
> PCI_MSIX_FLAGS_ENABLE); +
> +table = qpci_config_readl(dev, addr + PCI_MSIX_TABLE);
> +bir_table = table & PCI_MSIX_FLAGS_BIRMASK;
> +offset = qpci_iomap(dev, bir_table);
> +dev->msix_table = offset + (table & ~PCI_MSIX_FLAGS_BIRMASK);
> +
> +table = qpci_config_readl(dev, addr+PCI_MSIX_PBA);
> +bir_pba = table & PCI_MSIX_FLAGS_BIRMASK;
> +if (bir_pba != bir_table) {
> +offset = qpci_iomap(dev, bir_pba);

qpci_iomap changed its prototype in one of the lastest pull requests.
Both qpci_iomap should have a NULL as third parameter. Will change in
next version.

Marc

Re: [Qemu-devel] [RFC PATCH v2 10/13] linux headers update for DDW

2014-08-18 Thread Alex Williamson

On Fri, 2014-08-15 at 20:12 +1000, Alexey Kardashevskiy wrote:
> Since the changes are not in upstream yet, no tag or branch is specified here.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  linux-headers/linux/vfio.h | 37 -
>  1 file changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index 26c218e..f0aa97d 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -448,13 +448,48 @@ struct vfio_iommu_type1_dma_unmap {
>   */
>  struct vfio_iommu_spapr_tce_info {
>   __u32 argsz;
> - __u32 flags;/* reserved for future use */
> + __u32 flags;
> +#define VFIO_IOMMU_SPAPR_TCE_FLAG_DDW1 /* Support dynamic windows */
>   __u32 dma32_window_start;   /* 32 bit window start (bytes) */
>   __u32 dma32_window_size;/* 32 bit window size (bytes) */
>  };
>  
>  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO_IO(VFIO_TYPE, VFIO_BASE + 12)
>  
> +/*
> + * Dynamic DMA windows
> + */
> +struct vfio_iommu_spapr_tce_query {
> + __u32 argsz;
> + /* out */
> + __u32 windows_available;
> + __u32 page_size_mask;
> +};

Why do we need a new ioctl for this vs extending tce_info to include it?
That's sort of the point of including argsz and flags in the ioctl.

> +#define VFIO_IOMMU_SPAPR_TCE_QUERY   _IO(VFIO_TYPE, VFIO_BASE + 17)
> +
> +struct vfio_iommu_spapr_tce_create {
> + __u32 argsz;
> + /* in */
> + __u32 page_shift;
> + __u32 window_shift;
> + /* out */
> + __u64 start_addr;
> +
> +};
> +#define VFIO_IOMMU_SPAPR_TCE_CREATE  _IO(VFIO_TYPE, VFIO_BASE + 18)
> +
> +struct vfio_iommu_spapr_tce_remove {
> + __u32 argsz;
> + /* in */
> + __u64 start_addr;
> +};
> +#define VFIO_IOMMU_SPAPR_TCE_REMOVE  _IO(VFIO_TYPE, VFIO_BASE + 19)
> +
> +struct vfio_iommu_spapr_tce_reset {
> + __u32 argsz;
> +};
> +#define VFIO_IOMMU_SPAPR_TCE_RESET   _IO(VFIO_TYPE, VFIO_BASE + 20)
> +

argsz by itself seems rather pointless if we don't have a flags field to
augment the structure.  Thanks,

Alex

>  /* * */
>  
>  #endif /* VFIO_H */

Re: [Qemu-devel] QEMU, self-modifying code, and Windows 7 64-bit (no KVM)

2014-08-18 Thread Richard Henderson

On 08/16/2014 10:21 PM, Paolo Bonzini wrote:
>>> Would it work to just call tb_invalidate_phys_page_range before the
>>> helper_ret_stb loop?

I doubt it.

>> Maybe. I think there’s another issue, which is that QEMU’s ending up
>> in the I/O read/write code instead of the normal memory RW. This could
>> be QEMU messing up, it could be PatchGuard doing something weird, or it
>> could be me misunderstanding what’s going on. I never really figured out
>> how the control flow works here.
> 
> That's okay.  Everything that's in the slow path goes down
> io_mem_read/write (in this case TLB_NOTDIRTY is set for dirty-page
> tracking and causes QEMU to choose that path).
> 
> Try making a self-contained test case using the kvm-unit-tests harness
> (git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git).

I believe that the proper solution is to force *both* page table entries into
the TLB before performing any actual memory operations.

We'll have done the page for the first byte at the top of
helper_{le,be}_{ld,st}_name.  When we discover it's an unaligned access, we
should load and check the pte for the second page.  We might have to shuffle
those two tests around, since in theory the second page could be I/O mapped and
we'd want to pass off the whole access to io_mem_*.

Since two adjacent pages can't conflict in our direct-mapped TLB, we can then
safely pass off the work to secondary helpers without fear the first TLB entry
will be flushed.

r~

1 2 3 >

1 - 100 of 286 matches

Mail list logo