date:20160216

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Neo Jia

On Wed, Feb 17, 2016 at 07:46:15AM +, Tian, Kevin wrote:
> > From: Neo Jia
> > Sent: Wednesday, February 17, 2016 3:26 PM
> > 
> > 
> 
> > 
> > If your most concern is having this kind of path doesn't provide enough
> > information of the virtual device, we can add more sysfs attributes within 
> > the
> > directory of /sys/devices/virtual/vgpu/$UUID-$vgpu_idx/ to reflect the
> > information you want.
> 
> Like Gerd said, you can have something like this:
> 
> -device vfio-pci,sysfsdev=/sys/devices/virtual/vgpu/vgpu_idx/UUID

Hi Kevin,

The vgpu_idx is not a unique number at all.

For example, how would you locate the path for a given VM? Whoever configures
QEMU has to walk through *all* the current vgpu paths to find the UUID that
matches the QEMU VM UUID. This is not required if the UUID is part of the
device path.
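
A minimal sketch of that point, assuming the $UUID-$vgpu_idx directory layout
discussed in this thread. It is not code from the RFC patch:
vgpu_sysfs_path() is a hypothetical helper, and the base directory is simply
the path quoted above. With the UUID in the name, the configuring tool can
format the path directly instead of scanning sysfs:

```c
#include <stddef.h>
#include <stdio.h>

/* Base directory taken from the path discussed in this thread. */
#define VGPU_SYSFS_BASE "/sys/devices/virtual/vgpu"

/*
 * Format the proposed "$UUID-$vgpu_idx" device directory into buf.
 * Returns 0 on success, -1 if an argument is invalid or buf is too small.
 * Hypothetical helper for illustration; not part of QEMU or the RFC.
 */
static int vgpu_sysfs_path(char *buf, size_t len,
                           const char *vm_uuid, unsigned int vgpu_idx)
{
    int n;

    if (buf == NULL || vm_uuid == NULL || len == 0) {
        return -1;
    }
    n = snprintf(buf, len, "%s/%s-%u", VGPU_SYSFS_BASE, vm_uuid, vgpu_idx);
    if (n < 0 || (size_t)n >= len) {
        return -1;  /* encoding error or truncated output */
    }
    return 0;
}
```

The resulting string is what would be handed to -device vfio-pci,sysfsdev=...;
no directory walk is needed as long as the caller already knows the VM UUID.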

> 
> > 
> > Even with UUID, you don't need libvirt at all. you can get uuid by running
> > uuidgen command, I don't need libvirt to code up and test the RFC that I 
> > have
> > sent out early. :-)
> 
> although simple, it still creates unnecessary user space dependency for
> kernel resource management...

I think I have answered this: a UUID is not a user space or kernel space
concept, it is just a generic way to represent an object. It just makes sure
that the virtual gpu device directory can be uniquely addressed.

Thanks,
Neo

> 
> Thanks
> Kevin

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Tian, Kevin

> From: Neo Jia [mailto:c...@nvidia.com]
> Sent: Wednesday, February 17, 2016 3:32 PM
> 
> On Wed, Feb 17, 2016 at 07:52:53AM +0100, Gerd Hoffmann wrote:
> >   Hi,
> >
> > > The answer is simple, having a UUID as part of the device name will give 
> > > you a
> > > unique sysfs path that will be opened by QEMU.
> >
> > A descriptive name will work too, and I think it'll be easier to make
> > those names persistent because you don't have to store the uuids
> > somewhere to re-create the same setup after reboot.
> 
> Hi Gerd,
> 
> Right, UUID will be persistent cross reboot. The qemu vgpu path for a given 
> VM will
> not get changed when it gets reboots and multiple other devices have been
> created in the middle.

Curious why persistence matters here. It's completely OK to assign
another vgpu to this VM across reboot, as long as the new vgpu
provides the same capability as the previous one.

Thanks
Kevin

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Tian, Kevin

> From: Neo Jia
> Sent: Wednesday, February 17, 2016 3:26 PM
> 
> 
> >
> > Qemu doesn't need to know the relation between virtual/physical devices at
> > all. It's just a path regardless of how vgpu name is created (either with 
> > your
> > UUID proposal or my descriptive string proposal)
> 
> No, with path like above, QEMU needs to know the virtual device is created 
> from
> that physical device :00:02.0 right? (you have mentioned this yourself
> actually below.) If QEMU doesn't want to know that, then he will transfer the

I didn't say that Qemu needs to know it. I said "I can immediately know
...", which refers to a human being. :-)

Qemu only needs a path to open.


[...]
> 
> >
> > This is a typical way how device nodes are created within sysfs, e.g. on my
> > platform:
> >
> > $ ls /sys/class/drm/
> > card0/  card0-DP-2/ card0-HDMI-A-2/ controlD64/
> > card0-DP-1/ card0-HDMI-A-1/ card0-VGA-1/version
> >
> > $ ls /sys/bus/pci/devices/
> > :00:00.0  :00:14.0  :00:1a.0  :00:1c.1  :00:1f.2
> > :00:02.0  :00:16.0  :00:1b.0  :00:1d.0  :00:1f.3
> > :00:03.0  :00:19.0  :00:1c.0  :00:1f.0  :02:00.0
> >
> > We'd better keep such style when creating vgpu nodes in sysfs. UUID is
> > at most anther info suffixed to the default string (or in another file), if
> > necessary.
> 
> It doesn't apply here, your above example are all physical devices.
> 
> The reason I want to have UUID is it to match the instance of problem domain
> here - VM.

It doesn't matter whether it's physical or virtual. Sysfs includes many nodes
that do not physically exist.

The point I don't understand is why you insist that the only way to associate
a vgpu with a VM is by encoding the UUID in the vgpu name. libvirt maintains
many attributes (including other virtual devices) for a given VM in its
internal database. It's not a problem to look up a VM from a general vgpu
name.

[...]
> >
> > I'm fine to have another node to provide more vendor specific information. 
> > But
> > I don't want to make UUID mandatory when creating a vGPU instance, as
> > explained above. Today we can create VMs in KVM w/o using libvirt, and w/o
> > the need of allocating any UUID. Same thing should be supported for vgpu
> > feature too.
> 
> So, I really don't see any drawback of using UUID as part of the virtual 
> device
> directory, it is easy, simple and organically reflecting the relation between
> virtual device and the owner.

"organically reflecting" only when other database is included (as libvirt is 
involved),
while losing the merit which a descriptive name can bring.

> 
> Each QEMU process is representing a VM, and a UUID is associated with it. The
> virtual gpu device is just a virtual device of this owner, so the path is:
> 
> -device vfio-pci,sysfsdev=/sys/devices/virtual/vgpu/$UUID-$vgpu_idx/
> 
> You can have multiple virtual device per VM and QEMU doesn't need to know 
> which
> physical device it comes from, especially it will automatically know which
> virtual device it owns, so the -device vfio-pci path will be setup for free.

As I commented earlier, Qemu never needs to know that information
regardless of how the vgpu is named.

> 
> If your most concern is having this kind of path doesn't provide enough
> information of the virtual device, we can add more sysfs attributes within the
> directory of /sys/devices/virtual/vgpu/$UUID-$vgpu_idx/ to reflect the
> information you want.

Like Gerd said, you can have something like this:

-device vfio-pci,sysfsdev=/sys/devices/virtual/vgpu/vgpu_idx/UUID

> 
> Even with UUID, you don't need libvirt at all. you can get uuid by running
> uuidgen command, I don't need libvirt to code up and test the RFC that I have
> sent out early. :-)

Although simple, it still creates an unnecessary user space dependency for
kernel resource management...

Thanks
Kevin

Re: [Qemu-devel] [PATCH] add CephFS support in VirtFS

2016-02-16 Thread Jevon Qiao


Hi Daniel,

Thank you for reviewing my code, please see my reply in-line.
On 15/2/16 17:17, Daniel P. Berrange wrote:

On Sun, Feb 14, 2016 at 01:06:40PM +0800, Jevon Qiao wrote:

diff --git a/configure b/configure
index 83b40fc..cecece7 100755
--- a/configure
+++ b/configure
@@ -1372,6 +1377,7 @@ disabled with --disable-FEATURE, default is enabled if
available:
vhost-net   vhost-net acceleration support
spice   spice
rbd rados block device (rbd)
+  cephfs  Ceph File System

Inconsistent vertical alignment with surrounding text
This is just a display issue, I'll send the patch with 'git send-email' 
later after I address all the technical comments.

libiscsiiscsi support
libnfs  nfs support
smartcard   smartcard support (libcacard)
+/*
+ * Helper function for cephfs_preadv and cephfs_pwritev
+ */
+inline static ssize_t preadv_pwritev(struct ceph_mount_info *cmount, int
fd,

Your email client is mangling long lines, here and in many other
places in the file. Please either fix your email client to not
insert arbitrary line breaks, or use git send-email to submit
the patch.

Ditto.

+  const struct iovec *iov, int iov_cnt,
+  off_t offset, bool do_write)
+{
+ssize_t ret = 0;
+int i = 0;

Use size_t for iterators

I'll revise the code.

+size_t len = 0;
+void *buf, *buftmp;
+size_t bufoffset = 0;
+
+for (; i < iov_cnt; i++) {
+len += iov[i].iov_len;
+}

iov_size() does this calculation

Thanks for the suggestion.

+
+buf = malloc(len);

Use g_new0(uint8_t, len)

OK.

+if (buf == NULL) {
+errno = ENOMEM;
+return -1;
+}

and don't check ENOMEM;

Any reason for this?

+
+i = 0;
+buftmp = buf;
+if (do_write) {
+for (i = 0; i < iov_cnt; i++) {
+memcpy((buftmp + bufoffset), iov[i].iov_base, iov[i].iov_len);
+bufoffset += iov[i].iov_len;
+}
+ret = ceph_write(cmount, fd, buf, len, offset);
+if (ret <= 0) {
+   errno = -ret;
+   ret = -1;
+}
+} else {
+ret = ceph_read(cmount, fd, buf, len, offset);
+if (ret <= 0) {
+errno = -ret;
+ret = -1;
+} else {
+for (i = 0; i < iov_cnt; i++) {
+memcpy(iov[i].iov_base, (buftmp + bufoffset),
iov[i].iov_len);

Mangled long line again.

That's the email client issue.

+bufoffset += iov[i].iov_len;
+}
+}
+}
+
+free(buf);
+return ret;
+}
+
+static int cephfs_update_file_cred(struct ceph_mount_info *cmount,
+   const char *name, FsCred *credp)

Align the parameters on following line to the '('

I will revise the code.

+{
+int fd, ret;
+fd = ceph_open(cmount, name, O_NONBLOCK | O_NOFOLLOW, credp->fc_mode);
+if (fd < 0) {
+return fd;
+}
+ret = ceph_fchown(cmount, fd, credp->fc_uid, credp->fc_gid);
+if (ret < 0) {
+goto err_out;
+}
+ret = ceph_fchmod(cmount, fd, credp->fc_mode & 0);
+err_out:
+close(fd);
+return ret;
+}
+
+static int cephfs_lstat(FsContext *fs_ctx, V9fsPath *fs_path,
+struct stat *stbuf)
+{
+D_CEPHFS("cephfs_lstat");

All of these D_CEPHFS() lines you have are really inserting trace
points, so you should really use the QEMU trace facility instead
of a fprintf() based macro. ie add to trace-events and then
call the generated trace function for your event. Then get rid
of your D_CEPHFS macro.

I will revise the code.

+int ret;
+char *path = fs_path->data;
+struct cephfs_data *cfsdata = (struct cephfs_data *)fs_ctx->private;

fs_ctx->private is 'void *' so you don't need the (struct cephfs_data *)
cast there - 'void *' casts to anything automatically. The same issue
in all the other functions below too.

OK, I see.

+ret = ceph_lstat(cfsdata->cmount, path, stbuf);
+if (ret){
+errno = -ret;
+ret = -1;
+}
+return ret;
+}
+
+static ssize_t cephfs_readlink(FsContext *fs_ctx, V9fsPath *fs_path,
+   char *buf, size_t bufsz)
+{
+D_CEPHFS("cephfs_readlink");
+int ret;
+char *path = fs_path->data;
+struct cephfs_data *cfsdata = (struct cephfs_data *)fs_ctx->private;
+
+ret = ceph_readlink(cfsdata->cmount, path, buf, bufsz);
+return ret;
+}
+
+static int cephfs_close(FsContext *ctx, V9fsFidOpenState *fs)
+{
+D_CEPHFS("cephfs_close");
+struct cephfs_data *cfsdata = (struct cephfs_data*)ctx->private;
+
+return ceph_close(cfsdata->cmount, fs->fd);
+}
+
+static int cephfs_closedir(FsContext *ctx, V9fsFidOpenState *fs)
+{
+D_CEPHFS("cephfs_closedir");
+struct cephfs_data *cfsdata = (struct cephfs_data*)ctx->private;
+
+return ceph_closedir(cfsdata->cmount, (struct ceph_dir_result
*)fs->dir);
+}
+
+static int cephfs_open(FsContext *ctx, V9fsPath *fs_path,
+
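
For reference, the copy pattern that the preadv_pwritev() helper above relies
on can be sketched on its own. This is an illustration, not the patch's code:
the sketch_iov_*() names are hypothetical, and in the QEMU tree the iov_size()
helper mentioned in the review would replace the length loop. Unlike the
patch, the scatter side stops at the number of bytes actually read; that is an
assumption about how a short read should be handled. The sketch also uses
unsigned char pointers, since arithmetic on void * is a GNU extension.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Total number of bytes described by the iovec array. */
static size_t sketch_iov_len(const struct iovec *iov, int iov_cnt)
{
    size_t len = 0;

    for (int i = 0; i < iov_cnt; i++) {
        len += iov[i].iov_len;
    }
    return len;
}

/*
 * Gather (write path): copy every segment into the flat buffer buf,
 * which must hold at least sketch_iov_len() bytes. Returns bytes copied.
 */
static size_t sketch_iov_gather(void *buf, const struct iovec *iov,
                                int iov_cnt)
{
    unsigned char *dst = buf;
    size_t off = 0;

    for (int i = 0; i < iov_cnt; i++) {
        memcpy(dst + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    return off;
}

/*
 * Scatter (read path): copy at most nbytes from the flat buffer back
 * into the segments, stopping early on a short read. Returns bytes copied.
 */
static size_t sketch_iov_scatter(const struct iovec *iov, int iov_cnt,
                                 const void *buf, size_t nbytes)
{
    const unsigned char *src = buf;
    size_t off = 0;

    for (int i = 0; i < iov_cnt && off < nbytes; i++) {
        size_t chunk = iov[i].iov_len;

        if (chunk > nbytes - off) {
            chunk = nbytes - off;  /* only copy what was actually read */
        }
        memcpy(iov[i].iov_base, src + off, chunk);
        off += chunk;
    }
    return off;
}
```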

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Neo Jia

On Wed, Feb 17, 2016 at 07:52:53AM +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > The answer is simple, having a UUID as part of the device name will give 
> > you a
> > unique sysfs path that will be opened by QEMU.
> 
> A descriptive name will work too, and I think it'll be easier to make
> those names persistent because you don't have to store the uuids
> somewhere to re-create the same setup after reboot.

Hi Gerd,

Right, the UUID will be persistent across reboots. The qemu vgpu path for a
given VM will not change when the VM reboots, even if multiple other devices
have been created in the meantime.

> 
> > If you are worried about losing meaningful name here, we can create a sysfs 
> > file
> > to capture the vendor device description if you like.
> 
> You can also store the uuid in a sysfs file ...

Another benefit is that having the UUID as part of the virtual vgpu device
path allows whoever configures QEMU to discover the virtual device sysfs path
automatically, for free.

Thanks,
Neo

> 
> cheers,
>   Gerd
>

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Neo Jia

On Wed, Feb 17, 2016 at 06:02:36AM +, Tian, Kevin wrote:
> > From: Neo Jia
> > Sent: Wednesday, February 17, 2016 1:38 PM
> > > > >
> > > > >
> > > >
> > > > Hi Kevin,
> > > >
> > > > The answer is simple, having a UUID as part of the device name will 
> > > > give you a
> > > > unique sysfs path that will be opened by QEMU.
> > > >
> > > > vgpu-vendor-0 and vgpu-vendor-1 will not be unique as we can have 
> > > > multiple
> > > > virtual gpu devices per VM coming from same or different physical 
> > > > devices.
> > >
> > > That is not a problem. We can add physical device info too like 
> > > vgpu-vendor-0-0,
> > > vgpu-vendor-1-0, ...
> > >
> > > Please note Qemu doesn't care about the actual name. It just accepts a 
> > > sysfs path
> > > to open.
> > 
> > Hi Kevin,
> > 
> > No, I think you are making things even more complicated than it is required,
> > also it is not generic anymore as you are requiring the QEMU to know more 
> > than
> > he needs to.
> > 
> > The way you name those devices will require QEMU to know the relation
> > between virtual devices and physical devices. I don't think that is good.
> 
> I don't think you get my point. Look at how device is assigned in Qemu today:
> 
> -device vfio-pci,host=02:00.0
> 
> Then in a recent patch from Alex, Qemu will accept sysfsdev as well:
> 
> -device vfio-pci,sysfsdev=/sys/devices/pci:00/:00:1c.0/:02:00.0
> 
> Then with vgu (one example from Alex's original post):
> 
> -device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@:00:02.0

Hi Kevin,

I am fully aware of Alex's patch. That is just an example, and it doesn't
exclude using a UUID as the device path, as he mentioned at the beginning of
this long email thread. Actually, he has already agreed with this
UUID-$vgpu_idx path. :-)

Also, note that the proposal we have doesn't require anything like intel-vgpu
or nvidia-vgpu in the path. Keeping the vgpu device path free of
vendor-specific information is one of the requirements, IIRC.

-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@:00:02.0

> 
> Qemu doesn't need to know the relation between virtual/physical devices at
> all. It's just a path regardless of how vgpu name is created (either with your
> UUID proposal or my descriptive string proposal)

No, with a path like the one above, QEMU needs to know that the virtual device
is created from that physical device :00:02.0, right? (You have mentioned this
yourself below.) If QEMU doesn't want to know that, then it transfers the
burden to the upper layer of the stack, such as libvirt, which has to figure
out the right path for this new VM, since in vgpu<$id> the <$id> becomes
another generic number generated by the vgpu core driver.

So why not just use UUID here?

> 
> > 
> > My flow is like this:
> > 
> > libvirt creats a VM object, it will have a UUID. then it will use the UUID 
> > to
> > create virtual gpu devices, then it will pass the UUID to the QEMU (actually
> > QEMU already has the VM UUID), then it will just open up the unique path.
> 
> If you look at above example, it's not UUID itself being passed to Qemu. It's
> the sysfsdev path.

The UUID is always sent to QEMU; please look at your QEMU command line.

Yes, it is a sysfs path. Even with a UUID it is a sysfs path; the only
difference is that we have a UUID embedded within the device name.

> 
> > 
> > Also, you need to consider those 0-0 numbers are not generic as the UUID.
> 
> that encoding could be flexible to include any meaningful string. libvirt can
> itself manages how UUID is mapped to an actual vgpu name.
> 
> > >
> > > >
> > > > If you are worried about losing meaningful name here, we can create a 
> > > > sysfs file
> > > > to capture the vendor device description if you like.
> > > >
> > >
> > > Having the vgpu name descriptive is more informative imo. User can simply 
> > > check
> > > sysfs names to know raw information w/o relying on 3rd party agent to 
> > > query
> > > information around an opaque UUID.
> > >
> > 
> > You are actually arguing against your own design here, unfortunately. If you
> > look at your design carefully, it is your design actually require to have a 
> > 3rd
> > party code to figure out the VM and virtual gpu device relation as it is
> > never documented in the sysfs.
> 
> No. It's not about figuring out the relation between VM and vGPU. It's about
> figuring out some raw information about vGPU itself, such as:
> 
> vgpu0@:00:02.0, I can immediately know it's the 1st instance created on
> :00:02.0 device. While with an UUID, it doesn't speak anything useful 
> here.

Right, I get your point, but doesn't this require qemu to know that this
virtual device is created on that physical device, given that you already know
the virtual device is created on the physical device :00:02.0?

> 
> This is a typical way how device nodes are created within sysfs, e.g. on my
> platform:
> 
> $ ls

Re: [Qemu-devel] [PATCH 2/2] hw/9pfs: fix alignment issue when host filesystem block size is larger than client msize

2016-02-16 Thread Jevon Qiao


Hi Aneesh,

Thank you for reviewing my code, please see my reply in-line.
On 14/2/16 21:38, Aneesh Kumar K.V wrote:

Jevon Qiao  writes:


The following patch is to fix alignment issue when host filesystem block size
is larger than client msize.

Thanks,
Jevon

That is not the right format for sending a patch. You can send patches as a
series using git-send-email.
Yes, you're correct. I will send the patches later after I address all 
the technical comments.

From: Jevon Qiao 
Date: Sun, 14 Feb 2016 15:11:08 +0800
Subject: [PATCH] hw/9pfs: fix alignment issue when host filesystem block
size
   is larger than client msize.

Per the previous implementation, iounit will be assigned to be 0 after the
first if statement as (s->msize - P9_IOHDRSZ)/stbuf.f_bsize will be zero when
host filesystem block size is larger than msize. Finally, iounit will be equal
to s->msize - P9_IOHDRSZ, which is usually not aligned.

Signed-off-by: Jevon Qiao 
---
   hw/9pfs/virtio-9p.c | 19 ---
   1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index f972731..005d3a8 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -1326,7 +1326,7 @@ out_nofid:
   static int32_t get_iounit(V9fsPDU *pdu, V9fsPath *path)
   {
   struct statfs stbuf;
-int32_t iounit = 0;
+int32_t iounit = 0, unit = 0;
   V9fsState *s = pdu->s;

   /*
@@ -1334,8 +1334,21 @@ static int32_t get_iounit(V9fsPDU *pdu, V9fsPath
*path)
* and as well as less than (client msize - P9_IOHDRSZ))
*/
   if (!v9fs_co_statfs(pdu, path, &stbuf)) {
-iounit = stbuf.f_bsize;
-iounit *= (s->msize - P9_IOHDRSZ)/stbuf.f_bsize;
+/*
+ * If host filesystem block size is larger than client msize,
+ * we will use PAGESIZE as the unit. The reason why we choose
+ * PAGESIZE is because the data will be splitted in terms of
+ * PAGESIZE in the virtio layer. In this case, the final
+ * iounit is equal to the value of ((msize/unit) - 1) * unit.
+ */
+if (stbuf.f_bsize > s->msize) {
+iounit = 4096;
+unit = 4096;

Which page size should it be, guest or host? Also, why 4096? ppc64 uses a
64K page size.

The data to be read or written is first divided into pieces according to the
size of iounit and msize, and then mapped to pages before being added to the
virtqueue. Since all these operations happen on the guest side, the page size
should be the guest's. Please correct me if I'm wrong.

As for the number 4096, it's the default value on Linux. I did not take other
platforms into account; that's my fault. To make it suitable for all
platforms, shall I use the function getpagesize() here?

Thanks,
Jevon

+} else {
+iounit = stbuf.f_bsize;
+unit = stbuf.f_bsize;
+}
+iounit *= (s->msize - P9_IOHDRSZ)/unit;
   }
   if (!iounit) {
   iounit = s->msize - P9_IOHDRSZ;
--

-aneesh
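
The calculation under discussion can be sketched in isolation as follows. This
is a minimal illustration under stated assumptions, not the patch itself:
sketch_iounit() is a hypothetical function, the header size and the fallback
unit are passed in rather than hard-coded, and the block size is compared
against msize minus the header (the patch compares against msize). Note that
getpagesize() reports the host page size, so a guest page size would have to
be supplied some other way; the parameter leaves that choice to the caller.

```c
#include <stdint.h>

/*
 * Largest multiple of a unit that fits in (msize - hdr_size).
 * The unit is the filesystem block size when it fits, otherwise
 * fallback_unit (for example a page size chosen by the caller).
 * Falls back to the unaligned payload size if no multiple fits.
 */
static int32_t sketch_iounit(int32_t msize, int32_t hdr_size,
                             int64_t fs_bsize, int32_t fallback_unit)
{
    int32_t payload = msize - hdr_size;
    int32_t iounit = 0;
    int64_t unit;

    if (payload <= 0) {
        return 0;  /* no room for any payload */
    }
    unit = (fs_bsize > 0 && fs_bsize <= payload) ? fs_bsize : fallback_unit;
    if (unit > 0) {
        iounit = (int32_t)(unit * (payload / unit));
    }
    if (iounit == 0) {
        iounit = payload;  /* same last-resort value as the original code */
    }
    return iounit;
}
```

For instance, with an msize of 8192, a 24-byte header (an assumed value for
the example) and a 64K host block size, a 4096-byte fallback unit yields an
iounit of 4096 instead of the unaligned 8168.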

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Gerd Hoffmann

  Hi,

> The answer is simple, having a UUID as part of the device name will give you a
> unique sysfs path that will be opened by QEMU.

A descriptive name will work too, and I think it'll be easier to make
those names persistent because you don't have to store the uuids
somewhere to re-create the same setup after reboot.

> If you are worried about losing meaningful name here, we can create a sysfs 
> file
> to capture the vendor device description if you like.

You can also store the uuid in a sysfs file ...

cheers,
  Gerd

[Qemu-devel] [PATCH] scripts/kvm/kvm_stat: Fix missing right parantheses

2016-02-16 Thread Fam Zheng

Signed-off-by: Fam Zheng 
---
 scripts/kvm/kvm_stat | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kvm/kvm_stat b/scripts/kvm/kvm_stat
index 3cf1181..517fbec 100755
--- a/scripts/kvm/kvm_stat
+++ b/scripts/kvm/kvm_stat
@@ -800,7 +800,7 @@ def check_access(options):
 if options.tracepoints:
 sys.exit(1)
 
-sys.stderr.write("Falling back to debugfs statistics!\n"
+sys.stderr.write("Falling back to debugfs statistics!\n")
 options.debugfs = True
 sleep(5)
 
-- 
2.4.3

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Tian, Kevin

> From: Neo Jia
> Sent: Wednesday, February 17, 2016 1:38 PM
> > > >
> > > >
> > >
> > > Hi Kevin,
> > >
> > > The answer is simple, having a UUID as part of the device name will give 
> > > you a
> > > unique sysfs path that will be opened by QEMU.
> > >
> > > vgpu-vendor-0 and vgpu-vendor-1 will not be unique as we can have multiple
> > > virtual gpu devices per VM coming from same or different physical devices.
> >
> > That is not a problem. We can add physical device info too like 
> > vgpu-vendor-0-0,
> > vgpu-vendor-1-0, ...
> >
> > Please note Qemu doesn't care about the actual name. It just accepts a 
> > sysfs path
> > to open.
> 
> Hi Kevin,
> 
> No, I think you are making things even more complicated than it is required,
> also it is not generic anymore as you are requiring the QEMU to know more than
> he needs to.
> 
> The way you name those devices will require QEMU to know the relation
> between virtual devices and physical devices. I don't think that is good.

I don't think you get my point. Look at how a device is assigned in Qemu today:

-device vfio-pci,host=02:00.0

Then in a recent patch from Alex, Qemu will accept sysfsdev as well:

-device vfio-pci,sysfsdev=/sys/devices/pci:00/:00:1c.0/:02:00.0

Then with vgpu (one example from Alex's original post):

-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@:00:02.0

Qemu doesn't need to know the relation between virtual/physical devices at
all. It's just a path regardless of how vgpu name is created (either with your
UUID proposal or my descriptive string proposal)

> 
> My flow is like this:
> 
> libvirt creats a VM object, it will have a UUID. then it will use the UUID to
> create virtual gpu devices, then it will pass the UUID to the QEMU (actually
> QEMU already has the VM UUID), then it will just open up the unique path.

If you look at the above example, it's not the UUID itself being passed to
Qemu. It's the sysfsdev path.

> 
> Also, you need to consider those 0-0 numbers are not generic as the UUID.

That encoding could be flexible enough to include any meaningful string.
libvirt can itself manage how a UUID is mapped to an actual vgpu name.

> >
> > >
> > > If you are worried about losing meaningful name here, we can create a 
> > > sysfs file
> > > to capture the vendor device description if you like.
> > >
> >
> > Having the vgpu name descriptive is more informative imo. User can simply 
> > check
> > sysfs names to know raw information w/o relying on 3rd party agent to query
> > information around an opaque UUID.
> >
> 
> You are actually arguing against your own design here, unfortunately. If you
> look at your design carefully, it is your design actually require to have a 
> 3rd
> party code to figure out the VM and virtual gpu device relation as it is
> never documented in the sysfs.

No. It's not about figuring out the relation between VM and vGPU. It's about
figuring out some raw information about vGPU itself, such as:

With vgpu0@:00:02.0, I can immediately tell it's the 1st instance created on
the :00:02.0 device. A UUID, by contrast, doesn't say anything useful here.

This is the typical way device nodes are created within sysfs, e.g. on my
platform:

$ ls /sys/class/drm/
card0/       card0-DP-2/      card0-HDMI-A-2/  controlD64/
card0-DP-1/  card0-HDMI-A-1/  card0-VGA-1/     version

$ ls /sys/bus/pci/devices/
:00:00.0  :00:14.0  :00:1a.0  :00:1c.1  :00:1f.2
:00:02.0  :00:16.0  :00:1b.0  :00:1d.0  :00:1f.3
:00:03.0  :00:19.0  :00:1c.0  :00:1f.0  :02:00.0

We'd better keep that style when creating vgpu nodes in sysfs. The UUID is
at most another piece of info suffixed to the default string (or kept in
another file), if necessary.

> 
> In our current design, it doesn't require any 3rd party agent as the VM UUID 
> is
> part of the QEMU command already, and the VM UUID is already embedded within 
> the
> virtual device path.
> 
> Also, it doesn't require 3rd party to retrieve information as the virtual 
> device
> will just be a directory, we will have another file within each virtual gpu
> device directory, you can always cat the file to retrieve vendor information.
> 
> Let's use the UUID-$vgpu_idx as the virtual device directory name plus a 
> vendor
> description file within that directory, so we don't lose any additional
> information, also capture the VM and virtual device relation.
> 

I'm fine to have another node to provide more vendor specific information. But
I don't want to make UUID mandatory when creating a vGPU instance, as 
explained above. Today we can create VMs in KVM w/o using libvirt, and w/o 
the need of allocating any UUID. Same thing should be supported for vgpu
feature too.

Thanks
Kevin

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Neo Jia

On Tue, Feb 16, 2016 at 10:09:43PM -0700, Eric Blake wrote:
> * PGP Signed by an unknown key
> 
> On 02/16/2016 10:04 PM, Tian, Kevin wrote:
> 
> 
> ...rather than making readers scroll through 16k bytes of repetitions of
> the same things they saw earlier in the thread, but getting worse with
> each iteration due to excessive quoting.
> 

Hi Eric,

Sorry about that, I will pay attention to this.

Thanks,
Neo

> -- 
> Eric Blake   eblake redhat com+1-919-301-3266
> Libvirt virtualization library http://libvirt.org
> 
> 
> * Unknown Key
> * 0x2527436A

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Neo Jia

On Wed, Feb 17, 2016 at 05:04:31AM +, Tian, Kevin wrote:
> > From: Neo Jia
> > Sent: Wednesday, February 17, 2016 12:18 PM
> > 
> > On Wed, Feb 17, 2016 at 03:31:24AM +, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > Sent: Tuesday, February 16, 2016 4:49 PM
> > > >
> > > > On Tue, Feb 16, 2016 at 08:10:42AM +, Tian, Kevin wrote:
> > > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > > Sent: Tuesday, February 16, 2016 3:53 PM
> > > > > >
> > > > > > On Tue, Feb 16, 2016 at 07:40:47AM +, Tian, Kevin wrote:
> > > > > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > > > > Sent: Tuesday, February 16, 2016 3:37 PM
> > > > > > > >
> > > > > > > > On Tue, Feb 16, 2016 at 07:27:09AM +, Tian, Kevin wrote:
> > > > > > > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > > > > > > Sent: Tuesday, February 16, 2016 3:13 PM
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 16, 2016 at 06:49:30AM +, Tian, Kevin wrote:
> > > > > > > > > > > > From: Alex Williamson 
> > > > > > > > > > > > [mailto:alex.william...@redhat.com]
> > > > > > > > > > > > Sent: Thursday, February 04, 2016 3:33 AM
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, 2016-02-03 at 09:28 +0100, Gerd Hoffmann wrote:
> > > > > > > > > > > > >   Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Actually I have a long puzzle in this area. 
> > > > > > > > > > > > > > Definitely libvirt will
> > use
> > > > UUID
> > > > > > to
> > > > > > > > > > > > > > mark a VM. And obviously UUID is not recorded 
> > > > > > > > > > > > > > within KVM.
> > Then
> > > > how
> > > > > > does
> > > > > > > > > > > > > > libvirt talk to KVM based on UUID? It could be a 
> > > > > > > > > > > > > > good reference
> > to
> > > > this
> > > > > > design.
> > > > > > > > > > > > >
> > > > > > > > > > > > > libvirt keeps track which qemu instance belongs to 
> > > > > > > > > > > > > which vm.
> > > > > > > > > > > > > qemu also gets started with "-uuid ...", so one can 
> > > > > > > > > > > > > query qemu
> > via
> > > > > > > > > > > > > monitor ("info uuid") to figure what the uuid is.  It 
> > > > > > > > > > > > > is also in the
> > > > > > > > > > > > > smbios tables so the guest can see it in the system 
> > > > > > > > > > > > > information
> > table.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The uuid is not visible to the kernel though, the kvm 
> > > > > > > > > > > > > kernel driver
> > > > > > > > > > > > > doesn't know what the uuid is (and neither does 
> > > > > > > > > > > > > vfio).  qemu uses
> > > > file
> > > > > > > > > > > > > handles to talk to both kvm and vfio.  qemu notifies 
> > > > > > > > > > > > > both kvm
> > and
> > > > vfio
> > > > > > > > > > > > > about anything relevant events (guest address space 
> > > > > > > > > > > > > changes
> > etc)
> > > > and
> > > > > > > > > > > > > connects file descriptors (eventfd -> irqfd).
> > > > > > > > > > > >
> > > > > > > > > > > > I think the original link to using a VM UUID for the 
> > > > > > > > > > > > vGPU comes from
> > > > > > > > > > > > NVIDIA having a userspace component which might get 
> > > > > > > > > > > > launched
> > from
> > > > a udev
> > > > > > > > > > > > event as the vGPU is created or the set of vGPUs within 
> > > > > > > > > > > > that UUID
> > is
> > > > > > > > > > > > started.  Using the VM UUID then gives them a way to 
> > > > > > > > > > > > associate
> > that
> > > > > > > > > > > > userspace process with a VM instance.  Maybe it could 
> > > > > > > > > > > > register with
> > > > > > > > > > > > libvirt for some sort of service provided for the VM, I 
> > > > > > > > > > > > don't know.
> > > > > > > > > > >
> > > > > > > > > > > Intel doesn't have this requirement. It should be enough 
> > > > > > > > > > > as long as
> > > > > > > > > > > libvirt maintains which sysfs vgpu node is associated to 
> > > > > > > > > > > a VM UUID.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > qemu needs a sysfs node as handle to the vfio device, 
> > > > > > > > > > > > > something
> > > > > > > > > > > > > like /sys/devices/virtual/vgpu/.   can be 
> > > > > > > > > > > > > a uuid
> > if
> > > > you
> > > > > > want
> > > > > > > > > > > > > have it that way, but it could be pretty much 
> > > > > > > > > > > > > anything.  The sysfs
> > node
> > > > > > > > > > > > > will probably show up as-is in the libvirt xml when 
> > > > > > > > > > > > > assign a vgpu
> > to
> > > > a
> > > > > > > > > > > > > vm.  So the name should be something stable (i.e. 
> > > > > > > > > > > > > when using
> > a uuid
> > > > as
> > > > > > > > > > > > > name you should better not generate a new one on each 
> > > > > > > > > > > > > boot).
> > > > > > > > > > > >
> > > > > > > > > > > > Actually I don't think there's really a persistent 
> > > > > > > > > > > > naming issue, that's
> > > > > > > > > > > > probably where we diverge from the SR-IOV model.  
> > > > > > > > > > > > SR-IOV

Re: [Qemu-devel] [PATCH V3 2/2] tests/test-filter-mirror:add filter-mirror unit test

2016-02-16 Thread Zhang Chen




On 02/15/2016 01:54 PM, Jason Wang wrote:


On 02/04/2016 03:43 PM, Zhang Chen wrote:

From: ZhangChen 

Using qtest qmp interface to implement following cases:
1) add/remove filter-mirror
2) add a filter-mirror then delete the netdev
3) add/remove more than one filter-mirrors
4) add more than one filter-mirrors and then delete the netdev

The steps here are rather similar to test-netfilter.c. Let's try to
generalize them instead of duplicating code.


We think netfilter needs a common test case that covers the functions
shared by all filter plugins, so we will remove this from the patch and
write another patch for a common netfilter test in the future. For now
we will focus on filter-redirector, filter-rewriter and filter-compare.


5) add filter-mirror with:
-object filter-mirror,id=qtest-f0,netdev=qtest-bn0,queue=tx,outdev=mirror0

then inject packet from the socket connected to qtest-bn0,
filter-mirror will copy and mirror the packet to mirror0.
we read packet from mirror0 and then compare to what we inject.
del filter-mirror.

we start qemu with:
-netdev socket,id=qtest-bn0,listen=127.0.0.1:9005
-device e1000,netdev=qtest-bn0,id=qtest-e0
-chardev socket,id=mirror0,host=127.0.0.1,port=9003,server,nowait
-chardev socket,id=mirror1,host=127.0.0.1,port=9004,server,nowait

A hardcoded port is not good here since it may cause false positives
(consider that the tests may be triggered by lots of automated scripts, both
upstream and downstream). A better solution is using socketpair(2) and
passing pre-created fd(s) to a file chardev.


I will fix it in the next patch.

Thanks
zhangchen


Signed-off-by: zhangchen 
Signed-off-by: Wen Congyang 


[...]


.



--
Thanks
zhangchen
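
The socketpair(2) suggestion from the review above can be sketched as a small standalone program fragment. This is only an illustration under stated assumptions, not the test that was later written: the helper names make_test_channel() and roundtrip_check() are hypothetical, and how one end of the pair would be handed to QEMU is deliberately left out.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a connected pair of local stream sockets.
 * No TCP port is involved, so parallel test runs cannot collide.
 * Returns 0 on success, -1 on failure (errno is set by socketpair()).
 */
static int make_test_channel(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/*
 * Write `len` bytes into one end and read them back from the other.
 * Returns 0 if the bytes read match what was written, -1 otherwise.
 */
static int roundtrip_check(const char *msg, size_t len)
{
    int fds[2];
    char buf[64];
    int ok = -1;

    if (len > sizeof(buf) || make_test_channel(fds) < 0) {
        return -1;
    }
    if (write(fds[0], msg, len) == (ssize_t)len &&
        read(fds[1], buf, len) == (ssize_t)len &&
        memcmp(msg, buf, len) == 0) {
        ok = 0;
    }
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

In a real qtest one descriptor would stay with the test and the other would be passed to the QEMU process, which removes the fixed 9003/9004/9005 ports from the command line.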

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Eric Blake

On 02/16/2016 10:04 PM, Tian, Kevin wrote:
>> From: Neo Jia

[meta-comment]

...

 On Wed, 2016-02-03 at 09:28 +0100, Gerd Hoffmann wrote:
>   Hi,
>
>> Actually I have a long puzzle in this area. Definitely libvirt 
>> will
>> use
 UUID
>> to

With this much quoting, your mailer is breaking down.  It's not only
okay, but encouraged, to trim your message to just what you are directly
replying to...

>> Hi Kevin,
>>
>> The answer is simple, having a UUID as part of the device name will give you 
>> a
>> unique sysfs path that will be opened by QEMU.
>>
>> vgpu-vendor-0 and vgpu-vendor-1 will not be unique as we can have multiple
>> virtual gpu devices per VM coming from same or different physical devices.
> 
> That is not a problem. We can add physical device info too like 
> vgpu-vendor-0-0,
> vgpu-vendor-1-0, ...

...rather than making readers scroll through 16k bytes of repetitions of
the same things they saw earlier in the thread, but getting worse with
each iteration due to excessive quoting.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Tian, Kevin

> From: Neo Jia
> Sent: Wednesday, February 17, 2016 12:18 PM
> 
> On Wed, Feb 17, 2016 at 03:31:24AM +, Tian, Kevin wrote:
> > [...]

Re: [Qemu-devel] [PATCH v9 29/37] qapi: Eliminate empty visit_type_FOO_fields

2016-02-16 Thread Eric Blake

On 01/25/2016 10:04 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> For empty structs, such as the 'Abort' helper type used as part
>> of the 'transaction' command, we were emitting a no-op
>> visit_type_FOO_fields().  Optimize things to instead omit calls
>> for empty structs.  Generated code changes resemble:
>>

>> Another reason for doing this optimization is that it gets us
>> closer to merging the code for visiting structs and unions:
>> since flat unions have no local members, they do not need to
>> have a visit_type_UNION_fields() emitted, even when they start
>> sharing the code used to visit structs.
>>

> 
> I'm not sure the optimization is worthwhile by itself.  Empty structs
> are rare.  I'm reserving judgement until I see the struct/union
> unification.

We managed to pull off unification without this patch, and your argument
about making the generator more verbose with no-ops if it is less
maintenance is still resonating with me.  I think I'm going to drop this
and 30/37 in my next round of patches, with no real loss in functionality.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH RFC 0/4] ARM SMMUv3 Emulation

2016-02-16 Thread Prem Mallappa

Hi Peter,

I have access up to
"PRD03-GENC-010952 Version 11.0", updated 18/04/15.

infocenter.arm.com has only made SMMUv2 public.

Please let me know if a newer version is available
 - to the public, or
 - through channels, in which case I have to work with my organizational
contact to get it.

I am not expecting any fundamental operational changes. However, I'll take a
look if there's a new version.

Cheers,
/Prem


On Tue, Feb 16, 2016 at 4:20 PM, Peter Maydell 
wrote:

> On 11 January 2016 at 14:16,   wrote:
> > From: Prem Mallappa 
> >
> > Implementation Notes:
> >
> > - SMMUv3 model, as per ARM SMMUv3 11.0 spec
>
> I haven't reviewed any of this yet, but 11.0 is not the current
> revision of the SMMUv3 spec, so you should probably start by
> updating your code to the current version. Otherwise code review
> will be full of "this isn't what the spec says"...
>
> thanks
> -- PMM
>



-- 
Cheers,
/Prem

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Neo Jia

On Wed, Feb 17, 2016 at 03:31:24AM +, Tian, Kevin wrote:
> [...]

Re: [Qemu-devel] [PATCH V3 1/2] net/filter-mirror:Add filter-mirror

2016-02-16 Thread Zhang Chen




On 02/15/2016 03:06 PM, Zhang Chen wrote:



On 02/15/2016 01:23 PM, Jason Wang wrote:


On 02/04/2016 05:00 PM, Zhang Chen wrote:


On 02/04/2016 03:43 PM, Zhang Chen wrote:

From: ZhangChen 

Filter-mirror is a netfilter plugin.
It gives qemu the ability to copy and mirror guest's
net packet. we output packet to chardev.

To make it compact, how about "It gives qemu the ability to mirror
packets to a chardev."?


OK, will fix it in next version.


usage:

-netdev tap,id=hn0
-chardev socket,id=mirror0,host=ip_primary,port=X,server,nowait
-filter-mirror,id=m0,netdev=hn0,queue=tx/rx/all,outdev=mirror0

An issue with mirror (and dump) is that it can not work correctly with
the netdev that has a vnet header. Need to fix this, a possible solution
is to checksum the buffer and strip the header before passing it to a
chardev.



Thanks, I hadn't considered vnet; we will fix it in the next version.



We have discussed vnet within our team. We think filter-mirror does not need
to do any packet analysis; it should only mirror, and other work belongs in
other plugins such as filter-rewriter and filter-compare. If we have two guests
that both use a vnet header and we mirror one guest's packets to the other,
stripping the header before mirroring the packet will result in errors. So
let's strip the vnet header in another plugin and keep filter-mirror simple.
The same applies to filter-redirector.


Signed-off-by: ZhangChen 
Signed-off-by: Wen Congyang 
Reviewed-by: Yang Hongyang 
Reviewed-by: zhanghailiang 
---
   net/Makefile.objs   |   1 +
   net/filter-mirror.c | 171

   qemu-options.hx |   5 ++
   vl.c|   3 +-
   4 files changed, 179 insertions(+), 1 deletion(-)
   create mode 100644 net/filter-mirror.c

diff --git a/net/Makefile.objs b/net/Makefile.objs
index 5fa2f97..de06ebe 100644
--- a/net/Makefile.objs
+++ b/net/Makefile.objs
@@ -15,3 +15,4 @@ common-obj-$(CONFIG_VDE) += vde.o
   common-obj-$(CONFIG_NETMAP) += netmap.o
   common-obj-y += filter.o
   common-obj-y += filter-buffer.o
+common-obj-y += traffic-mirror.o

s/traffic-mirror.o/filter-mirror.o/ rebase error


diff --git a/net/filter-mirror.c b/net/filter-mirror.c
new file mode 100644
index 000..87ccaf5
--- /dev/null
+++ b/net/filter-mirror.c
@@ -0,0 +1,171 @@
+/*
+ * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2016 FUJITSU LIMITED
+ * Copyright (c) 2016 Intel Corporation
+ *
+ * Author: Zhang Chen 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#include "net/filter.h"
+#include "net/net.h"
+#include "qemu-common.h"
+#include "qapi/qmp/qerror.h"
+#include "qapi-visit.h"
+#include "qom/object.h"
+#include "qemu/main-loop.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/char.h"
+#include "qemu/iov.h"
+#include "qemu/sockets.h"
+
+#define FILTER_MIRROR(obj) \
+OBJECT_CHECK(MirrorState, (obj), TYPE_FILTER_MIRROR)
+
+#define TYPE_FILTER_MIRROR "filter-mirror"
+
+typedef struct MirrorState {
+NetFilterState parent_obj;
+char *outdev;
+CharDriverState *chr_out;
+} MirrorState;
+
+static ssize_t filter_mirror_send(NetFilterState *nf,
+   const struct iovec *iov,
+   int iovcnt)
+{
+MirrorState *s = FILTER_MIRROR(nf);
+ssize_t ret = 0;
+ssize_t size = 0;
+uint32_t len =  0;
+char *buf;
+
+size = iov_size(iov, iovcnt);
+len = htonl(size);
+if (!size) {
+return 0;
+}
+
+buf = g_malloc0(size);
+iov_to_buf(iov, iovcnt, 0, buf, size);
+ret = qemu_chr_fe_write_all(s->chr_out, (uint8_t *)&len, sizeof(len));
+if (ret < 0) {

I believe we should also fail when ret < sizeof(len) and modify the
caller check in filter_mirror_iov(). To make this a little bit easier,
there's no need to return ssize_t here (otherwise, caller need to call
iov_size() before checking the return value), just return 0 for success
and -EFXXX for failure.


OK, will fix it in next version

Thanks
zhangchen


Other looks good.



.





--
Thanks
zhangchen
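
The review comment above (treat a short write as a failure, and return 0 or a negative error code instead of a byte count) can be sketched outside QEMU roughly as follows. The names write_all_fn, send_framed() and mem_sink are hypothetical; the callback only stands in for a write-everything primitive such as qemu_chr_fe_write_all(), whose real signature is not reproduced here. Using -EIO for a short write is an assumption, since the thread leaves the exact error code open.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a "write everything" primitive: returns the
 * number of bytes written, or a negative errno value on error. */
typedef ssize_t (*write_all_fn)(void *opaque, const uint8_t *buf, size_t len);

/*
 * Send one packet as a 4-byte big-endian length followed by the payload.
 * Returns 0 on success and a negative errno value on failure, so the
 * caller never has to compare the result against the packet size.
 */
static int send_framed(write_all_fn write_all, void *opaque,
                       const uint8_t *payload, uint32_t size)
{
    uint32_t len = htonl(size);
    ssize_t ret;

    if (size == 0) {
        return 0;                           /* nothing to send */
    }

    ret = write_all(opaque, (const uint8_t *)&len, sizeof(len));
    if (ret != (ssize_t)sizeof(len)) {
        return ret < 0 ? (int)ret : -EIO;   /* short write also fails */
    }

    ret = write_all(opaque, payload, size);
    if (ret != (ssize_t)size) {
        return ret < 0 ? (int)ret : -EIO;
    }
    return 0;
}

/* In-memory sink for illustration only: accepts at most `cap` bytes
 * (cap must not exceed sizeof(data)). */
struct mem_sink {
    uint8_t data[64];
    size_t used;
    size_t cap;
};

static ssize_t mem_sink_write(void *opaque, const uint8_t *buf, size_t len)
{
    struct mem_sink *s = opaque;
    size_t room = s->cap - s->used;
    size_t n = len < room ? len : room;

    memcpy(s->data + s->used, buf, n);
    s->used += n;
    return (ssize_t)n;
}
```

With this shape the caller only checks for a non-zero return and does not need to call iov_size() again to interpret the result.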

Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-16 Thread Tian, Kevin

> From: Neo Jia [mailto:c...@nvidia.com]
> Sent: Tuesday, February 16, 2016 4:49 PM
> 
> On Tue, Feb 16, 2016 at 08:10:42AM +, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:c...@nvidia.com]
> > > Sent: Tuesday, February 16, 2016 3:53 PM
> > >
> > > On Tue, Feb 16, 2016 at 07:40:47AM +, Tian, Kevin wrote:
> > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > Sent: Tuesday, February 16, 2016 3:37 PM
> > > > >
> > > > > On Tue, Feb 16, 2016 at 07:27:09AM +, Tian, Kevin wrote:
> > > > > > > From: Neo Jia [mailto:c...@nvidia.com]
> > > > > > > Sent: Tuesday, February 16, 2016 3:13 PM
> > > > > > >
> > > > > > > On Tue, Feb 16, 2016 at 06:49:30AM +, Tian, Kevin wrote:
> > > > > > > > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > > > > > > > Sent: Thursday, February 04, 2016 3:33 AM
> > > > > > > > >
> > > > > > > > > On Wed, 2016-02-03 at 09:28 +0100, Gerd Hoffmann wrote:
> > > > > > > > > >   Hi,
> > > > > > > > > >
> > > > > > > > > > > Actually I have a long puzzle in this area. Definitely libvirt will use UUID to
> > > > > > > > > > > mark a VM. And obviously UUID is not recorded within KVM. Then how does
> > > > > > > > > > > libvirt talk to KVM based on UUID? It could be a good reference to this design.
> > > > > > > > > >
> > > > > > > > > > libvirt keeps track which qemu instance belongs to which vm.
> > > > > > > > > > qemu also gets started with "-uuid ...", so one can query qemu via
> > > > > > > > > > monitor ("info uuid") to figure what the uuid is.  It is also in the
> > > > > > > > > > smbios tables so the guest can see it in the system information table.
> > > > > > > > > >
> > > > > > > > > > The uuid is not visible to the kernel though, the kvm kernel driver
> > > > > > > > > > doesn't know what the uuid is (and neither does vfio).  qemu uses file
> > > > > > > > > > handles to talk to both kvm and vfio.  qemu notifies both kvm and vfio
> > > > > > > > > > about anything relevant events (guest address space changes etc) and
> > > > > > > > > > connects file descriptors (eventfd -> irqfd).
> > > > > > > > >
> > > > > > > > > I think the original link to using a VM UUID for the vGPU comes from
> > > > > > > > > NVIDIA having a userspace component which might get launched from a udev
> > > > > > > > > event as the vGPU is created or the set of vGPUs within that UUID is
> > > > > > > > > started.  Using the VM UUID then gives them a way to associate that
> > > > > > > > > userspace process with a VM instance.  Maybe it could register with
> > > > > > > > > libvirt for some sort of service provided for the VM, I don't know.
> > > > > > > >
> > > > > > > > Intel doesn't have this requirement. It should be enough as long as
> > > > > > > > libvirt maintains which sysfs vgpu node is associated to a VM UUID.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > qemu needs a sysfs node as handle to the vfio device, something
> > > > > > > > > > like /sys/devices/virtual/vgpu/.   can be a uuid if you want
> > > > > > > > > > have it that way, but it could be pretty much anything.  The sysfs node
> > > > > > > > > > will probably show up as-is in the libvirt xml when assign a vgpu to a
> > > > > > > > > > vm.  So the name should be something stable (i.e. when using a uuid as
> > > > > > > > > > name you should better not generate a new one on each boot).
> > > > > > > > >
> > > > > > > > > Actually I don't think there's really a persistent naming issue, that's
> > > > > > > > > probably where we diverge from the SR-IOV model.  SR-IOV cannot
> > > > > > > > > dynamically add a new VF, it needs to reset the number of VFs to zero,
> > > > > > > > > then re-allocate all of them up to the new desired count.  That has some
> > > > > > > > > obvious implications.  I think with both vendors here, we can
> > > > > > > > > dynamically allocate new vGPUs, so I would expect that libvirt would
> > > > > > > > > create each vGPU instance as it's needed.  None would be created by
> > > > > > > > > default without user interaction.
> > > > > > > > >
> > > > > > > > > Personally I think using a UUID makes sense, but it needs to be
> > > > > > > > > userspace policy whether that UUID has any implicit meaning like
> > > > > > > > > matching the VM UUID.  Having an index within a UUID bothers me a bit,

Re: [Qemu-devel] [PATCH v3] virtio-pci: call pci reset variant when guest requests reset.

2016-02-16 Thread Fam Zheng

On Thu, 01/28 16:08, Gerd Hoffmann wrote:
> Actually fixes linux not finding virtio 1.0 device virtqueues after
> reboot.  Which is new I think, any chance linux kernel virtio code
> became more strict in 4.3?
> 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Fam Zheng 
Tested-by: Fam Zheng 

> ---
>  hw/virtio/virtio-pci.c | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 94667e6..fb1b061 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -47,6 +47,7 @@
>  
>  static void virtio_pci_bus_new(VirtioBusState *bus, size_t bus_size,
> VirtIOPCIProxy *dev);
> +static void virtio_pci_reset(DeviceState *qdev);
>  
>  /* virtio device */
>  /* DeviceState to VirtIOPCIProxy. For use off data-path. TODO: use QOM. */
> @@ -404,9 +405,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  case VIRTIO_PCI_QUEUE_PFN:
>  pa = (hwaddr)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
>  if (pa == 0) {
> -virtio_pci_stop_ioeventfd(proxy);
> -virtio_reset(vdev);
> -msix_unuse_all_vectors(&proxy->pci_dev);
> +virtio_pci_reset(DEVICE(proxy));
>  }
>  else
>  virtio_queue_set_addr(vdev, vdev->queue_sel, pa);
> @@ -432,8 +431,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  }
>  
>  if (vdev->status == 0) {
> -virtio_reset(vdev);
> -msix_unuse_all_vectors(&proxy->pci_dev);
> +virtio_pci_reset(DEVICE(proxy));
>  }
>  
>  /* Linux before 2.6.34 drives the device without enabling
> @@ -1351,8 +1349,7 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
>  }
>  
>  if (vdev->status == 0) {
> -virtio_reset(vdev);
> -msix_unuse_all_vectors(&proxy->pci_dev);
> +virtio_pci_reset(DEVICE(proxy));
>  }
>  
>  break;
> -- 
> 1.8.3.1
> 
>

Re: [Qemu-devel] [PATCH] qed: fix bdrv_qed_drain

2016-02-16 Thread Fam Zheng

On Tue, 02/16 16:53, Paolo Bonzini wrote:
> The current implementation of bdrv_qed_drain can cause a double
> completion of a request.
> 
> The problem is that bdrv_qed_drain calls qed_plug_allocating_write_reqs
> unconditionally, but this is not correct if an allocating write
> is queued.  In this case, qed_unplug_allocating_write_reqs will
> restart the allocating write and possibly cause it to complete.
> The aiocb however is still in use for the L2/L1 table writes,
> and will then be completed again as soon as the table writes
> are stable.
> 
> The fix is to only call qed_plug_allocating_write_reqs and
> bdrv_aio_flush (which is the same as the timer callback---the patch
> makes this explicit) only if the timer was scheduled in the first
> place.  This fixes qemu-iotests test 011.
> 
> Cc: qemu-sta...@nongnu.org
> Cc: qemu-bl...@nongnu.org
> Cc: Stefan Hajnoczi 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/qed.c | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/block/qed.c b/block/qed.c
> index 404be1e..ebba220 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -380,12 +380,13 @@ static void bdrv_qed_drain(BlockDriverState *bs)
>  {
>  BDRVQEDState *s = bs->opaque;
>  
> -/* Cancel timer and start doing I/O that were meant to happen as if it
> - * fired, that way we get bdrv_drain() taking care of the ongoing requests
> - * correctly. */
> -qed_cancel_need_check_timer(s);
> -qed_plug_allocating_write_reqs(s);
> -bdrv_aio_flush(s->bs, qed_clear_need_check, s);
> +/* Fire the timer immediately in order to start doing I/O as soon as the
> + * header is flushed.
> + */
> +if (s->need_check_timer && timer_pending(s->need_check_timer)) {

We can assert(s->need_check_timer);

> +qed_cancel_need_check_timer(s);
> +qed_need_check_timer_cb(s);
> +}

What if an allocating write is queued (the else branch case)? Its completion
will be in bdrv_drain and it could arm the need_check_timer which is wrong.

We need to drain the allocating_write_reqs queue before checking the timer.

Fam

Re: [Qemu-devel] [PATCH v2 06/11] nvdimm acpi: initialize the resource used by NVDIMM ACPI

2016-02-16 Thread Xiao Guangrong




On 02/16/2016 07:00 PM, Igor Mammedov wrote:

On Tue, 16 Feb 2016 02:35:41 +0800
Xiao Guangrong  wrote:


On 02/16/2016 01:24 AM, Igor Mammedov wrote:

On Mon, 15 Feb 2016 23:53:13 +0800
Xiao Guangrong  wrote:


On 02/15/2016 09:32 PM, Igor Mammedov wrote:

On Mon, 15 Feb 2016 13:45:59 +0200
"Michael S. Tsirkin"  wrote:


On Mon, Feb 15, 2016 at 11:47:42AM +0100, Igor Mammedov wrote:

On Mon, 15 Feb 2016 18:13:38 +0800
Xiao Guangrong  wrote:


On 02/15/2016 05:18 PM, Michael S. Tsirkin wrote:

On Mon, Feb 15, 2016 at 10:11:05AM +0100, Igor Mammedov wrote:

On Sun, 14 Feb 2016 13:57:27 +0800
Xiao Guangrong  wrote:


On 02/08/2016 07:03 PM, Igor Mammedov wrote:

On Wed, 13 Jan 2016 02:50:05 +0800
Xiao Guangrong  wrote:


32 bits IO port starting from 0x0a18 in guest is reserved for NVDIMM
ACPI emulation. The table, NVDIMM_DSM_MEM_FILE, will be patched into
NVDIMM ACPI binary code

OSPM uses this port to tell QEMU the final address of the DSM memory
and notify QEMU to emulate the DSM method

Would you need to pass control to QEMU if each NVDIMM had its whole
label area MemoryRegion mapped right after its storage MemoryRegion?



No, label data is not mapped into guest's address space and it only
can be accessed by DSM method indirectly.

Yep, per spec label data should be accessed via _DSM but question
wasn't about it,


Ah, sorry, i missed your question.


Why would one map only a 4KB window and serialize label data through it
if it could be mapped as a whole? That way the _DSM method would be
much less complicated and there would be no need to add/support a protocol
for its serialization.



Is it ever accessed on data path? If not I prefer the current approach:


The label data is only accessed via two DSM commands - Get Namespace Label
Data and Set Namespace Label Data; no other place needs to be emulated.


limit the window used, the serialization protocol seems rather simple.



Yes.

Label data is at least 128k, which is large for the BIOS because it allocates
memory in the 0 ~ 4G range, which is a tight region. It also needs the guest OS
to support a larger max-xfer (the max size that can be transferred at one
time); the size in the current Linux NVDIMM driver is 4k.

However, using a larger DSM buffer can help us simplify NVDIMM hotplug for
the case where too many nvdimm devices are present in the system and their FIT
info cannot fit into one page. Each PMEM-only device needs 0xb8 bytes
and we can append 256 memory devices at most, so 12 pages are needed to
contain this info. The prototype we implemented uses a self-defined
protocol to read pieces of _FIT and concatenate them before returning to the
guest; please refer to:
https://github.com/xiaogr/qemu/commit/c46ce01c8433ac0870670304360b3c4aa414143a

As 12 pages is not a small region for the BIOS and the _FIT size may be extended
in future development (e.g. if PBLK is introduced), I am not sure if we need
this. Of course, another approach to simplify it is to limit the number of
NVDIMM devices to make sure their _FIT < 4k.
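
The page counts quoted above follow from simple arithmetic: 256 devices x 0xb8 bytes = 47104 bytes, which is 11.5 pages of 4k and therefore rounds up to 12. A minimal sketch of that calculation, assuming 4k pages and counting only the 0xb8-byte per-device structures mentioned in the thread (the function and macro names are hypothetical):

```c
#define FIT_PAGE_BYTES   4096u   /* 4k page, as assumed in the thread */
#define FIT_ENTRY_BYTES  0xb8u   /* per PMEM-only device, from the thread */

/* Number of whole pages needed to hold the FIT info of `ndev` devices. */
static unsigned fit_pages_needed(unsigned ndev)
{
    unsigned bytes = ndev * FIT_ENTRY_BYTES;

    return (bytes + FIT_PAGE_BYTES - 1) / FIT_PAGE_BYTES;
}

/* Largest device count whose FIT info still fits in a single page. */
static unsigned fit_max_devices_per_page(void)
{
    return FIT_PAGE_BYTES / FIT_ENTRY_BYTES;
}
```

Under the same assumptions, keeping _FIT below 4k would cap the configuration at 22 such devices.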

My suggestion is not to have only one label area for every NVDIMM but
rather to map each label area right after each NVDIMM's data memory.
That way _DSM can be made non-serialized and the guest could handle
label data in parallel.


I think that alignment considerations would mean we are burning up
1G of phys address space for this. For PAE we only have 64G
of this address space, so this would be a problem.

It's true that it will burn away address space, however that
just means that PAE guests would not be able to handle as many
NVDIMMs as 64bit guests. The same applies to DIMMs as well, with
alignment enforced. If one needs more DIMMs he/she can switch
to 64bit guest to use them.

It's a trade-off of inefficient GPA consumption vs efficient NVDIMM access.
Also with fully mapped label area for each NVDIMM we don't have to
introduce and maintain any guest visible serialization protocol
(protocol for serializing _DSM via 4K window) which becomes ABI.


It's true for label access but it is not for the long term as we will
need to support other _DSM commands such as vendor specific command,
PBLK dsm command, also NVDIMM MCE related commands will be introduced
in the future, so we will come back here at that time. :(

I believe for block mode NVDIMM would also need per NVDIMM mapping
for performance reasons (parallel access).
As for the rest, could those commands go via MMIO that we usually
use for control path?


So both input data and output data go through a single MMIO; we need to
introduce a protocol to pass these data. Isn't that complex?

And is there any MMIO we can reuse (more complex?) or should we allocate this
MMIO page (the old question: where to allocate it)?

Maybe you could reuse/extend memhotplug IO interface,
or alternatively as Michael suggested add a vendor specific PCI_Config,
I'd suggest PM

Re: [Qemu-devel] [PATCH v8 0/5] add ACPI node for fw_cfg on pc and arm

2016-02-16 Thread Gabriel L. Somlo

On Thu, Feb 11, 2016 at 05:06:00PM -0500, Gabriel L. Somlo wrote:
> Generate an ACPI DSDT node for fw_cfg on pc and arm guests.
> 
> New since v7:
> 
>   - edited commit blurb on 3/5 to match updated content, i.e. that
> the ACPI node is now inserted into the DSDT (no longer the SSDT).
> (Thanks to Igor Mammedov for catching that!)

BTW, regarding Igor's question about Windows starting to search for a
driver: Just for grins, I installed Windows 10 (with qemu git master
*before* this series was applied). Then, after applying the series,
DeviceManager was happy and had no unknown hardware listed.

Only after setting the fw_cfg _STA to 0x0F did I get an unknown device
like so: http://imagebin.ca/v/2XDUfONVF3bY

So, on XP, Windows 7, and Windows 10, the fw_cfg device showing up in
ACPI with a _STA set to 0x0B will *NOT* prompt the device manager to 
start making trouble :)
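
For readers unfamiliar with the _STA encoding, the difference between 0x0B and 0x0F is a single bit. A minimal C sketch, using the bit assignments from the ACPI specification; the macro and function names are made up for illustration and are not QEMU or Linux identifiers:

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA result bits as defined by the ACPI specification. */
#define ACPI_STA_PRESENT     (1u << 0)
#define ACPI_STA_ENABLED     (1u << 1)
#define ACPI_STA_SHOWN_IN_UI (1u << 2)
#define ACPI_STA_FUNCTIONING (1u << 3)

/* Hypothetical helper: true if the OS is asked to show the device in its UI. */
static bool sta_shown_in_ui(uint32_t sta)
{
    return (sta & ACPI_STA_SHOWN_IN_UI) != 0;
}
```

0x0B is present, enabled and functioning with the UI bit clear, so sta_shown_in_ui(0x0B) is false, while 0x0F additionally sets the UI bit.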

Thanks,
--Gabriel

> >New since v6:
> > - rebased to fit on top of fb306ff and f264d36, which moved things
> >   around in pc's acpi-build.c (only patch 3/5 affected);
> > - kernel-side fw_cfg sysfs driver accepted into upstream linux
> >
> >>New since v5:
> >>
> >>- rebased on top of latest QEMU git master
> >>
> >>>New since v4:
> >>>
> >>>   - rebased on top of Marc's DMA series
> >>>   - drop machine compat dependency for insertion into x86/ssdt
> >>> (patch 3/5), following agreement between Igor and Eduardo
> >>>   - [mm]io register range now covers DMA register as well, if
> >>> available.
> >>>   - s/bios/firmware in doc file updates
> >>>
> New since v3:
> 
>   - rebased to work on top of 87e896ab (introducing pc-*-25 classes),
> inserting fw_cfg acpi node only for machines >= 2.5.
> 
>   - reintroduce _STA with value 0x0B (bit 2 for u/i visibility turned
> off to avoid Windows complaining -- thanks Igor for catching that!)
> 
> If there's any other feedback besides questions regarding the
> appropriateness of "QEMU0002" as the value of _HID, please don't hesitate!
> 
> >New since v2:
> >
> > - pc/i386 node in ssdt only on machine types *newer* than 2.4
> >   (as suggested by Eduardo)
> >
> >I appreciate any further comments and reviews. Hopefully we can make
> >this palatable for upstream, modulo the lingering concerns about whether
> >"QEMU0002" is ok to use as the value of _HID, which I'll hopefully get
> >sorted out with the kernel crew...
> >
> >>New since v1:
> >>
> >>- expose control register size (suggested by Marc Marí)
> >>
> >>- leaving out _UID and _STA fields (thanks Shannon & Igor)
> >>
> >>- using "QEMU0002" as the value of _HID (thanks Michael)
> >>
> >>- added documentation blurb to docs/specs/fw_cfg.txt
> >>  (mainly to record usage of the "QEMU0002" string with fw_cfg).
> >>
> >>> This series adds a fw_cfg device node to the SSDT (on pc), or to the
> >>> DSDT (on arm).
> >>>
> >>>   - Patch 1/3 moves (and renames) the BIOS_CFG_IOPORT (0x510)
> >>> define from pc.c to pc.h, so that it could be used from
> >>> acpi-build.c in patch 2/3.
> >>> 
> >>>   - Patch 2/3 adds a fw_cfg node to the pc SSDT.
> >>> 
> >>>   - Patch 3/3 adds a fw_cfg node to the arm DSDT.
> >>>
> >>> I made up some names - "FWCF" for the node name, and "FWCF0001"
> >>> for _HID; no idea whether that's appropriate, or how else I should
> >>> figure out what to use instead...
> >>>
> >>> Also, using scope "\\_SB", based on where fw_cfg shows up in the
> >>> output of "info qtree". Again, if that's wrong, please point me in
> >>> the right direction.
> >>>
> >>> Re. 3/3 (also mentioned after the commit blurb in the patch itself),
> >>> I noticed none of the other DSDT entries contain a _STA field, 
> >>> wondering
> >>> why it would (not) make sense to include that, same as on the PC.
> 
> Gabriel L. Somlo (5):
>   fw_cfg: expose control register size in fw_cfg.h
>   pc: fw_cfg: move ioport base constant to pc.h
>   acpi: pc: add fw_cfg device node to dsdt
>   acpi: arm: add fw_cfg device node to dsdt
>   fw_cfg: document ACPI device node information
> 
>  docs/specs/fw_cfg.txt |  9 +
>  hw/arm/virt-acpi-build.c  | 15 +++
>  hw/i386/acpi-build.c  | 29 +
>  hw/i386/pc.c  |  5 ++---
>  hw/nvram/fw_cfg.c |  4 +++-
>  include/hw/i386/pc.h  |  2 ++
>  include/hw/nvram/fw_cfg.h |  3 +++
>  7 files changed, 63 insertions(+), 4 deletions(-)
> 
> -- 
> 2.4.3
>

[Qemu-devel] [PATCH] qapi-visit: Honor prefix of discriminator enum

2016-02-16 Thread Eric Blake

When we added support for a user-specified prefix for an enum
type (commit 351d36e), we forgot to teach the qapi-visit code
to honor that prefix in the case of using a prefixed enum as
the discriminator for a flat union.  While there is still some
on-list debate on whether we want to keep prefixes, we should
at least make it work as long as it is still part of the code
base.

Reported-by: Daniel P. Berrange 
Signed-off-by: Eric Blake 
---
 scripts/qapi-visit.py   | 3 ++-
 tests/qapi-schema/qapi-schema-test.json | 9 ++---
 tests/qapi-schema/qapi-schema-test.out  | 7 +--
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
index 0cc9b08..2bdb5a1 100644
--- a/scripts/qapi-visit.py
+++ b/scripts/qapi-visit.py
@@ -293,7 +293,8 @@ void visit_type_%(c_name)s(Visitor *v, const char *name, %(c_name)s **obj, Error
 case %(case)s:
 ''',
  case=c_enum_const(variants.tag_member.type.name,
-   var.name))
+   var.name,
+   variants.tag_member.type.prefix))
 if simple_union_type:
 ret += mcgen('''
 visit_type_%(c_type)s(v, "data", &(*obj)->u.%(c_name)s, );
diff --git a/tests/qapi-schema/qapi-schema-test.json b/tests/qapi-schema/qapi-schema-test.json
index 4b89527..353a34e 100644
--- a/tests/qapi-schema/qapi-schema-test.json
+++ b/tests/qapi-schema/qapi-schema-test.json
@@ -73,14 +73,17 @@
   'base': 'UserDefZero',
   'data': { 'string': 'str', 'enum1': 'EnumOne' } }

+{ 'struct': 'UserDefUnionBase2',
+  'base': 'UserDefZero',
+  'data': { 'string': 'str', 'enum1': 'QEnumTwo' } }
+
 # this variant of UserDefFlatUnion defaults to a union that uses fields with
 # allocated types to test corner cases in the cleanup/dealloc visitor
 { 'union': 'UserDefFlatUnion2',
-  'base': 'UserDefUnionBase',
+  'base': 'UserDefUnionBase2',
   'discriminator': 'enum1',
   'data': { 'value1' : 'UserDefC', # intentional forward reference
-'value2' : 'UserDefB',
-'value3' : 'UserDefA' } }
+'value2' : 'UserDefB' } }

 { 'alternate': 'UserDefAlternate',
   'data': { 'uda': 'UserDefA', 's': 'str', 'i': 'int' } }
diff --git a/tests/qapi-schema/qapi-schema-test.out b/tests/qapi-schema/qapi-schema-test.out
index 2c546b7..241aadb 100644
--- a/tests/qapi-schema/qapi-schema-test.out
+++ b/tests/qapi-schema/qapi-schema-test.out
@@ -121,11 +121,10 @@ object UserDefFlatUnion
 case value2: UserDefB
 case value3: UserDefB
 object UserDefFlatUnion2
-base UserDefUnionBase
+base UserDefUnionBase2
 tag enum1
 case value1: UserDefC
 case value2: UserDefB
-case value3: UserDefA
 object UserDefNativeListUnion
 member type: UserDefNativeListUnionKind optional=False
 case integer: :obj-intList-wrapper
@@ -167,6 +166,10 @@ object UserDefUnionBase
 base UserDefZero
 member string: str optional=False
 member enum1: EnumOne optional=False
+object UserDefUnionBase2
+base UserDefZero
+member string: str optional=False
+member enum1: QEnumTwo optional=False
 object UserDefZero
 member integer: int optional=False
 event __ORG.QEMU_X-EVENT __org.qemu_x-Struct
-- 
2.5.0

Re: [Qemu-devel] [PATCH v10 02/13] qapi: Forbid empty unions and useless alternates

2016-02-16 Thread Eric Blake

On 02/16/2016 09:08 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> Empty unions serve no purpose, and while we compile with gcc
>> which permits them, strict C99 forbids them.  We could inject
>> a dummy member (and in fact, we do for empty structs), but while
> 
> gen_variants() injects void *data. 
> 
>> empty structs make sense in qapi,
> 
> Suggest to cut the paragraph until here.

Side effect of rebasing - I originally had this patch after one that
deletes the 'data' member, but that requires a few other patches.  I'll
reword it to mention that we want to delete 'data', at which point we
would be left with an empty union if we didn't prohibit it at parse time.

>> @@ -613,7 +616,11 @@ def check_alternate(expr, expr_info):
>>  members = expr['data']
>>  types_seen = {}
>>
>> -# Check every branch
>> +# Check every branch; require at least two branches
>> +if len(members) < 2:
>> +raise QAPIExprError(expr_info,
>> +"Alternate '%s' should have at least two branches "
>> +"in 'data'" % name)
> 
> This is stricter than the commit message announced.  You can either
> relax to at least one branch, or amend the commit message.

Will amend; a later patch actually relies on the 2-or-more promise for
alternates, so it is worth documenting as intentional.

>>  A simple union type defines a mapping from automatic discriminator
>>  values to data types like in this example:
> 
> Missing: update of section "Alternate types".

Already done there (commit 7b1b98c4):

"An alternate type is one that allows a choice between two or more JSON
data types (string, integer, number, or object, but currently not..."

I didn't see anything worth rewording.

> 
> [tests/ diff looks good]
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] -qmp tcp:localhost:xxx, server, nowait inside console and in programm

2016-02-16 Thread Vasiliy Tolstov

Hi! I have a strange problem: I'm trying to run qemu with the -qmp flag. When I
run qemu with -qmp tcp:localhost:,server,nowait from a Linux
console, everything works fine and telnet gets the qemu banner, but when I run
it from inside a Go program, telnet connects but does not get the banner.
The qemu args are identical.
The difference is that qemu inside the program does not have stdin (it is
/dev/null) and stdout/stderr go to additional writers (a log writer).

Why might this be happening?

-- 
Vasiliy Tolstov,
e-mail: v.tols...@selfip.ru

Re: [Qemu-devel] [PATCH v2 7/9] i.MX: Add i.MX6 SOC implementation.

2016-02-16 Thread Peter Maydell

On 16 February 2016 at 21:47, Jean-Christophe DUBOIS
 wrote:
> In QEMU, other Cortex A9 (Versatilepb.c, Exynos, Zynq ...) are also setting
> has_el3 to false ...

So these generally are the "legacy" platforms which were
added before we ever had EL3 support in QEMU. For them it's hard
to turn the EL3 support on for the board even if in theory
it ought to be on, because we don't know what users are
running on it that we might break. With a new to QEMU board
we have an opportunity to get it right from the start.

-kernel I would expect to work, though, at least if the
only issue is the interrupt controller setup. It seems
worth investigating why it goes wrong.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 7/9] i.MX: Add i.MX6 SOC implementation.

2016-02-16 Thread Jean-Christophe DUBOIS


On 16/02/2016 22:06, Peter Maydell wrote:

On 16 February 2016 at 20:49, Jean-Christophe DUBOIS
 wrote:

On 16/02/2016 16:31, Peter Maydell wrote:

On 8 February 2016 at 22:08, Jean-Christophe Dubois 
wrote:

+object_property_set_bool(OBJECT(&s->cpu[i]), false,
+ "has_el3", &error_abort);

Do the CPUs in this board really not have EL3 ?


Well the Cortex A9 is certainly able to get to the secure mode. However, if
I enable it (has_el3 set to true), the OS (Linux or Xvisor) will not boot.
Disabling it allows both OSes to boot on the emulated i.MX6.

Would you have some idea on the reason for this "hang" during the boot when
EL3 is enabled?

How are you booting the OS/hypervisor? Via -kernel or via -bios ?


via -kernel ...


Usually if it doesn't boot this is because there's some bit of
boot rom/loader code that runs on the real h/w and isn't being
run in your QEMU setup, that does the initial setup of the h/w
in secure mode. (In particular, if the secure side doesn't set
the GIC interrupts to be NS accessible then things don't work
very well.)


Well I guess on real hw uboot is setting things so that everything work 
thereafter. But here I don't have uboot and I jump directly to Linux ...


In QEMU, other Cortex A9 (Versatilepb.c, Exynos, Zynq ...) are also 
setting has_el3 to false ...


JC


I wouldn't have expected that to be an issue for booting linux
via -kernel though because there we should be booting the kernel
NS and have a hack to configure the GIC appropriately.

Peter C may remember the details of how this should work
better than me.

thanks
-- PMM

[Qemu-devel] [RFC PATCH v4] fw/pci: Add support for mapping Intel IGD via QEMU

2016-02-16 Thread Alex Williamson

QEMU provides two fw_cfg files to support IGD.  The first holds the
OpRegion data which holds the Video BIOS Table (VBT).  This needs to
be copied into reserved memory and the address stored in the ASL
Storage register of the device at 0xFC offset in PCI config space.
The OpRegion is generally 8KB.  This file is named "etc/igd-opregion".

The second file tells us the required size of the stolen memory space
for the device.  This is a dummy file, it has no backing so we only
allocate the space without copying anything into it.  This space
requires 1MB alignment and is generally either 1MB or 2MB, depending
on the hardware config.  If the user has opted in QEMU to expose
additional stolen memory beyond the GTT (GGMS), the GMS may add an
additional 32MB to 512MB.  The base address of the reserved memory
allocated for this is written back to the Base Data of Stolen Memory
register (BDSM) at PCI config offset 0x5C on the device.  This file is
named "etc/igd-bdsm".

Signed-off-by: Alex Williamson 
---

v4: Back to a single patch thanks to Kevin's suggestion to use
memalign_tmphigh() for larger allocations.  Now creating
reserved space for stolen memory and writing the value to
the BDSM register is queued off the existence of a fw_cfg
file, just like the OpRegion.  The only difference is that
we don't copy the contents, just use the meta data.

 src/fw/pciinit.c |   47 +++
 1 file changed, 47 insertions(+)

diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index 0ed5dfb..dc2e433 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -269,6 +269,49 @@ static void ich9_smbus_setup(struct pci_device *dev, void *arg)
 pci_config_writeb(bdf, ICH9_SMB_HOSTC, ICH9_SMB_HOSTC_HST_EN);
 }
 
+static void intel_igd_setup(struct pci_device *dev, void *arg)
+{
+struct romfile_s *opregion = romfile_find("etc/igd-opregion");
+struct romfile_s *bdsm = romfile_find("etc/igd-bdsm");
+void *addr;
+u16 bdf = dev->bdf;
+
+if (opregion && opregion->size) {
+addr = memalign_high(PAGE_SIZE, opregion->size);
+if (!addr) {
+warn_noalloc();
+return;
+}
+
+if (opregion->copy(opregion, addr, opregion->size) < 0) {
+free(addr);
+return;
+}
+
+pci_config_writel(bdf, 0xFC, cpu_to_le32((u32)addr));
+
+dprintf(1, "Intel IGD OpRegion enabled at 0x%08x, size %dKB, dev "
+"%02x:%02x.%x\n", (u32)addr, opregion->size >> 10,
+pci_bdf_to_bus(bdf), pci_bdf_to_dev(bdf), pci_bdf_to_fn(bdf));
+}
+
+if (bdsm && bdsm->size) {
+addr = memalign_tmphigh(1024 * 1024, bdsm->size);
+if (!addr) {
+warn_noalloc();
+return;
+}
+
+e820_add((u32)addr, bdsm->size, E820_RESERVED);
+
+pci_config_writel(bdf, 0x5C, cpu_to_le32((u32)addr));
+
+dprintf(1, "Intel IGD BDSM enabled at 0x%08x, size %dMB, dev "
+"%02x:%02x.%x\n", (u32)addr, bdsm->size >> 20,
+pci_bdf_to_bus(bdf), pci_bdf_to_dev(bdf), pci_bdf_to_fn(bdf));
+}
+}
+
 static const struct pci_device_id pci_device_tbl[] = {
 /* PIIX3/PIIX4 PCI to ISA bridge */
 PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_0,
@@ -302,6 +345,10 @@ static const struct pci_device_id pci_device_tbl[] = {
 PCI_DEVICE_CLASS(PCI_VENDOR_ID_APPLE, 0x0017, 0xff00, apple_macio_setup),
 PCI_DEVICE_CLASS(PCI_VENDOR_ID_APPLE, 0x0022, 0xff00, apple_macio_setup),
 
+/* Intel IGD OpRegion setup */
+PCI_DEVICE_CLASS(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA,
+ intel_igd_setup),
+
 PCI_DEVICE_END,
 };

[Qemu-devel] [RFC PATCH v3 9/9] Intel IGD support for vfio

2016-02-16 Thread Alex Williamson


---
 hw/vfio/common.c  |2 
 hw/vfio/pci-quirks.c  |  476 +
 hw/vfio/pci.c |   68 ++
 hw/vfio/pci.h |9 +
 include/hw/vfio/vfio-common.h |2 
 trace-events  |7 +
 6 files changed, 562 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 879a657..c201bee 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -493,7 +493,7 @@ static void vfio_listener_release(VFIOContainer *container)
 memory_listener_unregister(&container->listener);
 }
 
-static struct vfio_info_cap_header *
+struct vfio_info_cap_header *
 vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id)
 {
 struct vfio_info_cap_header *hdr;
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 49ecf11..5828362 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -11,9 +11,11 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/nvram/fw_cfg.h"
 #include "pci.h"
 #include "trace.h"
 #include "qemu/range.h"
+#include "qemu/error-report.h"
 
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
 static bool vfio_pci_is(VFIOPCIDevice *vdev, uint32_t vendor, uint32_t device)
@@ -962,6 +964,479 @@ static void vfio_probe_rtl8168_bar2_quirk(VFIOPCIDevice *vdev, int nr)
 }
 
 /*
+ * Intel IGD support
+ *
+ * We need to do a few things to support Intel Integrated Graphics Devices:
+ *  1) Define a stolen memory region and trap I/O port writes programming it
+ *  2) Expose the OpRegion if one is provided to us
+ *  3) Copy key PCI config space register values from the host bridge
+ *  4) Create an LPC/ISA bridge and do the same for it.
+ *
+ * We mostly try to hide IGD stolen memory (GGMS/GMS) from the VM, but if a ROM
+ * is exposed it will try to use at least 1MB of GGMS stolen memory, apparently
+ * for VESA modes.  The ROM itself seems to contain the address of the host
+ * stolen memory range as execution of the vBIOS writes these addresses without
+ * probing the hardware.  Fortunately the vBIOS writes these addresses through
+ * I/O port space and they're only for use by the graphics device itself.
+ * Therefore we can intercept them without a performance penalty to native
+ * drivers and we can modify them to a new range without affecting
+ * functionality.  We ask the VM BIOS to allocate a new reserved range for this
+ * with the "etc/igd-bdsm" fw_cfg file, which is not actually readable, just a
+ * convenient way of providing a named tag with size.  If VGA is not disabled
+ * on IGD, we'll also automatically enable it through this process since
+ * execution of the vBIOS sort of implies VGA.  This can all be disabled by
+ * passing rombar=0 to the device.
+ *
+ * The remaining quirks are all enabled through vfio device specific regions
+ * and are triggered through discovery of those regions.  Exposing the OpRegion
+ * is mainly useful for the Video BIOS Table (VBT).  We create a copy of the
+ * OpRegion data and ask the VM BIOS to create storage space for it and copy it
+ * into VM memory using the "etc/igd-opregion" fw_cfg file.
+ *
+ * The host and ISA bridge features are necessary for IGD versions that do not
+ * support Intel's Universal Passthrough Mode (UPT).  UPT should be supported
+ * on BroadWell and newer GPUs.  However, not all guest drivers support this.
+ * Therefore if the IGD is an 8th generation or newer, we'll only initialize
+ * these devices if VGA mode is not supported.  Since VGA mode can be
+ * automatically enabled for the stolen memory support above, this means
+ * specifically disabling ROM support.
+ */
+
+/*
+ * This presumes the device is already known to be an Intel VGA device, so we
+ * take liberties in which device ID bits match which generation.
+ * See linux:include/drm/i915_pciids.h for IDs.
+ */
+static int igd_gen(VFIOPCIDevice *vdev)
+{
+if ((vdev->device_id & 0xfff) == 0xa84) {
+return 8; /* Broxton */
+}
+
+switch (vdev->device_id & 0xff00) {
+/* Old, untested, unavailable, unknown */
+case 0x0000:
+case 0x2500:
+case 0x2700:
+case 0x2900:
+case 0x2a00:
+case 0x2e00:
+case 0x3500:
+case 0xa000:
+return -1;
+/* SandyBridge, IvyBridge, ValleyView, Haswell */
+case 0x0100:
+case 0x0400:
+case 0x0a00:
+case 0x0c00:
+case 0x0d00:
+case 0x0f00:
+return 6;
+/* BroadWell, CherryView, SkyLake, KabyLake */
+case 0x1600:
+case 0x1900:
+case 0x2200:
+case 0x5900:
+return 8;
+}
+
+return 8; /* Assume newer is compatible */
+}
+
+typedef struct VFIOIGDQuirk {
+struct VFIOPCIDevice *vdev;
+uint32_t index;
+} VFIOIGDQuirk;
+
+#define IGD_GMCH 0x50 /* Graphics Control Register */
+#define IGD_BDSM 0x5c /* Base Data of Stolen Memory */
+#define IGD_ASLS 0xfc /* ASL Storage Register */
+
+static uint64_t vfio_igd_quirk_data_read(void *opaque,
+

[Qemu-devel] [PATCH] target-tricore: Fix wrong precedences on psw_write

2016-02-16 Thread Bastian Koppelmann

Wrong braces on the restore of the cached TCGv SV and V bits could lead to
a wrong PSW. While at it, remove the unnecessary braces for the restore
of the cached TCGv AV and SAV bits.

Signed-off-by: Bastian Koppelmann 
---
 target-tricore/helper.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target-tricore/helper.c b/target-tricore/helper.c
index a8fd418..7d96dad 100644
--- a/target-tricore/helper.c
+++ b/target-tricore/helper.c
@@ -127,9 +127,9 @@ uint32_t psw_read(CPUTriCoreState *env)
 void psw_write(CPUTriCoreState *env, uint32_t val)
 {
 env->PSW_USB_C = (val & MASK_USB_C);
-env->PSW_USB_V = (val & MASK_USB_V << 1);
-env->PSW_USB_SV = (val & MASK_USB_SV << 2);
-env->PSW_USB_AV = ((val & MASK_USB_AV) << 3);
-env->PSW_USB_SAV = ((val & MASK_USB_SAV) << 4);
+env->PSW_USB_V = (val & MASK_USB_V) << 1;
+env->PSW_USB_SV = (val & MASK_USB_SV) << 2;
+env->PSW_USB_AV = (val & MASK_USB_AV) << 3;
+env->PSW_USB_SAV = (val & MASK_USB_SAV) << 4;
 env->PSW = val;
 }
-- 
2.7.1
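
The bug fixed by this patch comes from C operator precedence: '<<' binds tighter than '&', so the mask was shifted before being applied. A minimal C sketch of the difference, using an illustrative single-bit mask rather than the real TriCore MASK_USB_* constants:

```c
#include <stdint.h>

#define MASK_BIT 0x40000000u  /* illustrative single-bit mask, not a TriCore constant */

/* Buggy form: '<<' binds tighter than '&', so this is val & (MASK_BIT << 1). */
static uint32_t extract_buggy(uint32_t val)
{
    return (val & MASK_BIT << 1);
}

/* Fixed form: mask first, then shift the selected bit. */
static uint32_t extract_fixed(uint32_t val)
{
    return (val & MASK_BIT) << 1;
}
```

For val = 0x40000000 the buggy form tests the wrong bit and yields 0, while the fixed form yields 0x80000000.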

[Qemu-devel] [RFC PATCH v3 7/9] vfio/pci: Fixup PCI option ROMs

2016-02-16 Thread Alex Williamson

Devices like Intel graphics are known to not only have bad checksums,
but also the wrong device ID.  This is not so surprising given that
the video BIOS is typically part of the system firmware image rather
than embedded into the device and needs to support any IGD device
installed into the system.

Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   30 ++
 1 file changed, 30 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8e20781..d0d0864 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -832,6 +832,36 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 break;
 }
 }
+
+/*
+ * Test the ROM signature against our device, if the vendor is correct
+ * but the device ID doesn't match, store the correct device ID and
+ * recompute the checksum.  Intel IGD devices need this and are known
+ * to have bogus checksums so we can't simply adjust the checksum.
+ */
+if (pci_get_word(vdev->rom) == 0xaa55 &&
+pci_get_word(vdev->rom + 0x18) + 8 < vdev->rom_size &&
+!memcmp(vdev->rom + pci_get_word(vdev->rom + 0x18), "PCIR", 4)) {
+uint16_t vid, did;
+
+vid = pci_get_word(vdev->rom + pci_get_word(vdev->rom + 0x18) + 4);
+did = pci_get_word(vdev->rom + pci_get_word(vdev->rom + 0x18) + 6);
+
+if (vid == vdev->vendor_id && did != vdev->device_id) {
+int i;
+uint8_t csum, *data = vdev->rom;
+
+pci_set_word(vdev->rom + pci_get_word(vdev->rom + 0x18) + 6,
+ vdev->device_id);
+data[6] = 0;
+
+for (csum = 0, i = 0; i < vdev->rom_size; i++) {
+csum += data[i];
+}
+
+data[6] = -csum;
+}
+}
 }
 
 static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
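
The checksum step in the hunk above relies on the convention that all bytes of the image sum to zero modulo 256. A minimal standalone C sketch of that recomputation; the helper names are hypothetical, and the checksum offset is passed in as a parameter (the patch itself uses data[6]):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store at data[csum_off] the byte that makes all
 * 'size' bytes sum to zero modulo 256, as the patch does with data[6].
 */
static void rom_fix_checksum(uint8_t *data, size_t size, size_t csum_off)
{
    uint8_t csum = 0;
    size_t i;

    data[csum_off] = 0;              /* exclude the old checksum byte */
    for (i = 0; i < size; i++) {
        csum += data[i];             /* uint8_t arithmetic wraps modulo 256 */
    }
    data[csum_off] = (uint8_t)-csum; /* two's complement cancels the sum */
}

/* Hypothetical helper: sum of all bytes modulo 256; 0 for a valid image. */
static uint8_t rom_byte_sum(const uint8_t *data, size_t size)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < size; i++) {
        sum += data[i];
    }
    return sum;
}
```

After rom_fix_checksum() runs, rom_byte_sum() over the same buffer returns 0 regardless of the previous (possibly bogus) checksum byte, which is why the patch recomputes the sum from scratch instead of adjusting the old value.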

[Qemu-devel] [RFC PATCH v3 5/9] linux-headers/vfio: Update for proposed capabilities list

2016-02-16 Thread Alex Williamson

Signed-off-by: Alex Williamson 
---
 linux-headers/linux/vfio.h |  101 +++-
 1 file changed, 99 insertions(+), 2 deletions(-)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index aa276bc..759b850 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -39,6 +39,13 @@
 #define VFIO_SPAPR_TCE_v2_IOMMU 7
 
 /*
+ * The No-IOMMU IOMMU offers no translation or isolation for devices and
+ * supports no ioctls outside of VFIO_CHECK_EXTENSION.  Use of VFIO's No-IOMMU
+ * code will taint the host kernel and should be used with extreme caution.
+ */
+#define VFIO_NOIOMMU_IOMMU 8
+
+/*
  * The IOCTL interface is designed for extensibility by embedding the
  * structure length (argsz) and flags into structures passed between
  * kernel and userspace.  We therefore use the _IO() macro for these
@@ -52,6 +59,33 @@
 #define VFIO_TYPE  (';')
 #define VFIO_BASE  100
 
+/*
+ * For extension of INFO ioctls, VFIO makes use of a capability chain
+ * designed after PCI/e capabilities.  A flag bit indicates whether
+ * this capability chain is supported and a field defined in the fixed
+ * structure defines the offset of the first capability in the chain.
+ * This field is only valid when the corresponding bit in the flags
+ * bitmap is set.  This offset field is relative to the start of the
+ * INFO buffer, as is the next field within each capability header.
+ * The id within the header is a shared address space per INFO ioctl,
+ * while the version field is specific to the capability id.  The
+ * contents following the header are specific to the capability id.
+ */
+struct vfio_info_cap_header {
+   __u16   id; /* Identifies capability */
+   __u16   version;    /* Version specific to the capability ID */
+   __u32   next;   /* Offset of next capability */
+};
+
+/*
+ * Callers of INFO ioctls passing insufficiently sized buffers will see
+ * the capability chain flag bit set, a zero value for the first capability
+ * offset (if available within the provided argsz), and argsz will be
+ * updated to report the necessary buffer size.  For compatibility, the
+ * INFO ioctl will not report error in this case, but the capability chain
+ * will not be available.
+ */
+
 /*  IOCTLs for VFIO file descriptor (/dev/vfio/vfio)  */
 
 /**
@@ -187,13 +221,73 @@ struct vfio_region_info {
 #define VFIO_REGION_INFO_FLAG_READ (1 << 0) /* Region supports read */
 #define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */
 #define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports mmap */
+#define VFIO_REGION_INFO_FLAG_CAPS (1 << 3) /* Info supports caps */
__u32   index;  /* Region index */
-   __u32   resv;   /* Reserved for alignment */
+   __u32   cap_offset; /* Offset within info struct of first cap */
__u64   size;   /* Region size (bytes) */
__u64   offset; /* Region offset from start of device fd */
 };
 #define VFIO_DEVICE_GET_REGION_INFO _IO(VFIO_TYPE, VFIO_BASE + 8)
 
+/*
+ * The sparse mmap capability allows finer granularity of specifying areas
+ * within a region with mmap support.  When specified, the user should only
+ * mmap the offset ranges specified by the areas array.  mmaps outside of the
+ * areas specified may fail (such as the range covering a PCI MSI-X table) or
+ * may result in improper device behavior.
+ *
+ * The structures below define version 1 of this capability.
+ */
+#define VFIO_REGION_INFO_CAP_SPARSE_MMAP   1
+
+struct vfio_region_sparse_mmap_area {
+   __u64   offset; /* Offset of mmap'able area within region */
+   __u64   size;   /* Size of mmap'able area */
+};
+
+struct vfio_region_info_cap_sparse_mmap {
+   struct vfio_info_cap_header header;
+   __u32   nr_areas;
+   __u32   reserved;
+   struct vfio_region_sparse_mmap_area areas[];
+};
+
+/*
+ * The device specific type capability allows regions unique to a specific
+ * device or class of devices to be exposed.  This helps solve the problem for
+ * vfio bus drivers of defining which region indexes correspond to which region
+ * on the device, without needing to resort to static indexes, as done by
+ * vfio-pci.  For instance, if we were to go back in time, we might remove
+ * VFIO_PCI_VGA_REGION_INDEX and let vfio-pci simply define that all indexes
+ * greater than or equal to VFIO_PCI_NUM_REGIONS are device specific and we'd
+ * make a "VGA" device specific type to describe the VGA access space.  This
+ * means that non-VGA devices wouldn't need to waste this index, and thus the
+ * address space associated with it due to implementation of device file
+ * descriptor offsets in vfio-pci.
+ *
+ * The current implementation is now part of the user ABI, so we can't use this
+ * for VGA, but there are other upcoming use cases, such

[Qemu-devel] [RFC PATCH v3 8/9] vfio/pci: Split out VGA setup

2016-02-16 Thread Alex Williamson

This could be set up later by device specific code, such as IGD
initialization.

Signed-off-by: Alex Williamson 
---
 hw/vfio/pci.c |   82 +
 hw/vfio/pci.h |2 +
 2 files changed, 50 insertions(+), 34 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d0d0864..368f40d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2053,6 +2053,52 @@ static VFIODeviceOps vfio_pci_ops = {
 .vfio_eoi = vfio_intx_eoi,
 };
 
+int vfio_populate_vga(VFIOPCIDevice *vdev)
+{
+VFIODevice *vbasedev = &vdev->vbasedev;
+struct vfio_region_info *reg_info;
+int ret;
+
+if (vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
+ret = vfio_get_region_info(vbasedev,
+   VFIO_PCI_VGA_REGION_INDEX, &reg_info);
+if (ret) {
+return ret;
+}
+
+if (!(reg_info->flags & VFIO_REGION_INFO_FLAG_READ) ||
+!(reg_info->flags & VFIO_REGION_INFO_FLAG_WRITE) ||
+reg_info->size < 0xbffff + 1) {
+error_report("vfio: Unexpected VGA info, flags 0x%lx, size 0x%lx",
+ (unsigned long)reg_info->flags,
+ (unsigned long)reg_info->size);
+g_free(reg_info);
+return -EINVAL;
+}
+
+vdev->vga = g_new0(VFIOVGA, 1);
+
+vdev->vga->fd_offset = reg_info->offset;
+vdev->vga->fd = vdev->vbasedev.fd;
+
+g_free(reg_info);
+
+vdev->vga->region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
+vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
+QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks);
+
+vdev->vga->region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE;
+vdev->vga->region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO;
+QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].quirks);
+
+vdev->vga->region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE;
+vdev->vga->region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI;
+QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks);
+}
+
+return 0;
+}
+
 static int vfio_populate_device(VFIOPCIDevice *vdev)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
@@ -2112,45 +2158,13 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
 
 g_free(reg_info);
 
-if ((vdev->features & VFIO_FEATURE_ENABLE_VGA) &&
-vbasedev->num_regions > VFIO_PCI_VGA_REGION_INDEX) {
-ret = vfio_get_region_info(vbasedev,
-   VFIO_PCI_VGA_REGION_INDEX, &reg_info);
+if (vdev->features & VFIO_FEATURE_ENABLE_VGA) {
+ret = vfio_populate_vga(vdev);
 if (ret) {
 error_report(
 "vfio: Device does not support requested feature x-vga");
 goto error;
 }
-
-if (!(reg_info->flags & VFIO_REGION_INFO_FLAG_READ) ||
-!(reg_info->flags & VFIO_REGION_INFO_FLAG_WRITE) ||
-reg_info->size < 0xb + 1) {
-error_report("vfio: Unexpected VGA info, flags 0x%lx, size 0x%lx",
- (unsigned long)reg_info->flags,
- (unsigned long)reg_info->size);
-g_free(reg_info);
-ret = -1;
-goto error;
-}
-
-vdev->vga = g_new0(VFIOVGA, 1);
-
-vdev->vga->fd_offset = reg_info->offset;
-vdev->vga->fd = vdev->vbasedev.fd;
-
-g_free(reg_info);
-
-vdev->vga->region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
-vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
-QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks);
-
-vdev->vga->region[QEMU_PCI_VGA_IO_LO].offset = QEMU_PCI_VGA_IO_LO_BASE;
-vdev->vga->region[QEMU_PCI_VGA_IO_LO].nr = QEMU_PCI_VGA_IO_LO;
-QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_LO].quirks);
-
-vdev->vga->region[QEMU_PCI_VGA_IO_HI].offset = QEMU_PCI_VGA_IO_HI_BASE;
-vdev->vga->region[QEMU_PCI_VGA_IO_HI].nr = QEMU_PCI_VGA_IO_HI;
-QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks);
 }
 
 irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index b8a7189..3976f68 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -157,4 +157,6 @@ void vfio_bar_quirk_exit(VFIOPCIDevice *vdev, int nr);
 void vfio_bar_quirk_finalize(VFIOPCIDevice *vdev, int nr);
 void vfio_setup_resetfn_quirk(VFIOPCIDevice *vdev);
 
+int vfio_populate_vga(VFIOPCIDevice *vdev);
+
 #endif /* HW_VFIO_VFIO_PCI_H */

[Qemu-devel] [RFC PATCH v3 6/9] vfio: Enable sparse mmap capability

2016-02-16 Thread Alex Williamson

The sparse mmap capability in a vfio region info allows vfio to tell
us which sub-areas of a region may be mmap'd.  Thus rather than
assuming a single mmap covers the entire region and later frobbing it
ourselves for things like the PCI MSI-X vector table, we can read that
directly from vfio.

Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c |   67 +++---
 trace-events |2 ++
 2 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 96ccb79..879a657 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -493,6 +493,54 @@ static void vfio_listener_release(VFIOContainer *container)
 memory_listener_unregister(&container->listener);
 }
 
+static struct vfio_info_cap_header *
+vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id)
+{
+struct vfio_info_cap_header *hdr;
+void *ptr = info;
+
+if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS)) {
+return NULL;
+}
+
+for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
+if (hdr->id == id) {
+return hdr;
+}
+}
+
+return NULL;
+}
+
+static void vfio_setup_region_sparse_mmaps(VFIORegion *region,
+   struct vfio_region_info *info)
+{
+struct vfio_info_cap_header *hdr;
+struct vfio_region_info_cap_sparse_mmap *sparse;
+int i;
+
+hdr = vfio_get_region_info_cap(info, VFIO_REGION_INFO_CAP_SPARSE_MMAP);
+if (!hdr) {
+return;
+}
+
+sparse = container_of(hdr, struct vfio_region_info_cap_sparse_mmap, header);
+
+trace_vfio_region_sparse_mmap_header(region->vbasedev->name,
+ region->nr, sparse->nr_areas);
+
+region->nr_mmaps = sparse->nr_areas;
+region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
+
+for (i = 0; i < region->nr_mmaps; i++) {
+region->mmaps[i].offset = sparse->areas[i].offset;
+region->mmaps[i].size = sparse->areas[i].size;
+trace_vfio_region_sparse_mmap_entry(i, region->mmaps[i].offset,
+region->mmaps[i].offset +
+region->mmaps[i].size);
+}
+}
+
 int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
   int index, const char *name)
 {
@@ -519,11 +567,14 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, 
VFIORegion *region,
 region->flags & VFIO_REGION_INFO_FLAG_MMAP &&
 !(region->size & ~qemu_real_host_page_mask)) {
 
-region->nr_mmaps = 1;
-region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
+vfio_setup_region_sparse_mmaps(region, info);
 
-region->mmaps[0].offset = 0;
-region->mmaps[0].size = region->size;
+if (!region->nr_mmaps) {
+region->nr_mmaps = 1;
+region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
+region->mmaps[0].offset = 0;
+region->mmaps[0].size = region->size;
+}
 }
 }
 
@@ -1083,6 +1134,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
 *info = g_malloc0(argsz);
 
 (*info)->index = index;
+retry:
 (*info)->argsz = argsz;
 
 if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
@@ -1090,6 +1142,13 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
 return -errno;
 }
 
+if ((*info)->argsz > argsz) {
+argsz = (*info)->argsz;
+*info = g_realloc(*info, argsz);
+
+goto retry;
+}
+
 return 0;
 }
 
diff --git a/trace-events b/trace-events
index 21cd28d..c2f48af 100644
--- a/trace-events
+++ b/trace-events
@@ -1736,6 +1736,8 @@ vfio_region_mmap(const char *name, unsigned long offset, 
unsigned long end) "Reg
 vfio_region_exit(const char *name, int index) "Device %s, region %d"
 vfio_region_finalize(const char *name, int index) "Device %s, region %d"
 vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps 
enabled: %d"
+vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
+vfio_region_sparse_mmap_entry(int i, off_t start, off_t end) "sparse entry %d [0x%lx - 0x%lx]"
 
 # hw/vfio/platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group 
#%d"
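The capability lookup in the patch above walks a chain of headers linked by byte offsets from the start of the info buffer, stopping when the next offset is 0. A minimal sketch of that walk follows. The struct cap_header layout, find_cap_offset() and build_demo_chain() are hypothetical stand-ins written for this illustration, not the kernel or QEMU definitions, and the hop limit is an extra defensive choice that the patch itself does not make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a capability header: `next` is a byte offset
 * from the start of the buffer, and 0 terminates the chain. */
struct cap_header {
    uint16_t id;
    uint16_t version;
    uint32_t next;
};

/* Return the offset of the first capability with the given id, or 0. */
static uint32_t find_cap_offset(const uint8_t *buf, size_t len,
                                uint32_t first, uint16_t id)
{
    size_t max_hops = len / sizeof(struct cap_header);
    uint32_t off = first;

    /* The hop limit keeps a corrupt, looping chain from spinning forever. */
    for (size_t hop = 0; off != 0 && hop < max_hops; hop++) {
        struct cap_header hdr;

        if ((size_t)off + sizeof(hdr) > len) {
            break;                      /* header would run past the buffer */
        }
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.id == id) {
            return off;
        }
        off = hdr.next;
    }
    return 0;
}

/* Build a two-entry demo chain (id 1 at offset 32, id 7 at offset 48). */
static uint32_t build_demo_chain(uint8_t *buf, size_t len)
{
    struct cap_header a = { .id = 1, .version = 1, .next = 48 };
    struct cap_header b = { .id = 7, .version = 1, .next = 0 };

    assert(len >= 48 + sizeof(b));
    memset(buf, 0, len);
    memcpy(buf + 32, &a, sizeof(a));
    memcpy(buf + 48, &b, sizeof(b));
    return 32;
}
```

Returning an offset rather than a pointer keeps the sketch independent of alignment; the patch returns a header pointer, which the caller then converts with container_of().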

[Qemu-devel] [RFC PATCH v3 2/9] vfio: Wrap VFIO_DEVICE_GET_REGION_INFO

2016-02-16 Thread Alex Williamson

In preparation for supporting capability chains on regions, wrap
ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for
each caller.

Signed-off-by: Alex Williamson 
---
 hw/vfio/common.c  |   18 +
 hw/vfio/pci.c |   81 +
 hw/vfio/platform.c|   13 ---
 include/hw/vfio/vfio-common.h |3 ++
 4 files changed, 69 insertions(+), 46 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 607ec70..e20fc4f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -959,6 +959,24 @@ void vfio_put_base_device(VFIODevice *vbasedev)
 close(vbasedev->fd);
 }
 
+int vfio_get_region_info(VFIODevice *vbasedev, int index,
+ struct vfio_region_info **info)
+{
+size_t argsz = sizeof(struct vfio_region_info);
+
+*info = g_malloc0(argsz);
+
+(*info)->index = index;
+(*info)->argsz = argsz;
+
+if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
+g_free(*info);
+return -errno;
+}
+
+return 0;
+}
+
 static int vfio_container_do_ioctl(AddressSpace *as, int32_t groupid,
int req, void *param)
 {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5524121..a52947b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -783,25 +783,25 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
 
 static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 {
-struct vfio_region_info reg_info = {
-.argsz = sizeof(reg_info),
-.index = VFIO_PCI_ROM_REGION_INDEX
-};
+struct vfio_region_info *reg_info;
 uint64_t size;
 off_t off = 0;
 ssize_t bytes;
 
-if (ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info)) {
+if (vfio_get_region_info(&vdev->vbasedev,
+ VFIO_PCI_ROM_REGION_INDEX, &reg_info)) {
 error_report("vfio: Error getting ROM info: %m");
 return;
 }
 
-trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info.size,
-(unsigned long)reg_info.offset,
-(unsigned long)reg_info.flags);
+trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info->size,
+(unsigned long)reg_info->offset,
+(unsigned long)reg_info->flags);
+
+vdev->rom_size = size = reg_info->size;
+vdev->rom_offset = reg_info->offset;
 
-vdev->rom_size = size = reg_info.size;
-vdev->rom_offset = reg_info.offset;
+g_free(reg_info);
 
 if (!vdev->rom_size) {
 vdev->rom_read_failed = true;
@@ -2026,7 +2026,7 @@ static VFIODeviceOps vfio_pci_ops = {
 static int vfio_populate_device(VFIOPCIDevice *vdev)
 {
 VFIODevice *vbasedev = &vdev->vbasedev;
-struct vfio_region_info reg_info = { .argsz = sizeof(reg_info) };
+struct vfio_region_info *reg_info;
 struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
 int i, ret = -1;
 
@@ -2048,72 +2048,73 @@ static int vfio_populate_device(VFIOPCIDevice *vdev)
 }
 
 for (i = VFIO_PCI_BAR0_REGION_INDEX; i < VFIO_PCI_ROM_REGION_INDEX; i++) {
-reg_info.index = i;
-
-ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+ret = vfio_get_region_info(vbasedev, i, &reg_info);
 if (ret) {
 error_report("vfio: Error getting region %d info: %m", i);
 goto error;
 }
 
 trace_vfio_populate_device_region(vbasedev->name, i,
-  (unsigned long)reg_info.size,
-  (unsigned long)reg_info.offset,
-  (unsigned long)reg_info.flags);
+  (unsigned long)reg_info->size,
+  (unsigned long)reg_info->offset,
+  (unsigned long)reg_info->flags);
 
 vdev->bars[i].region.vbasedev = vbasedev;
-vdev->bars[i].region.flags = reg_info.flags;
-vdev->bars[i].region.size = reg_info.size;
-vdev->bars[i].region.fd_offset = reg_info.offset;
+vdev->bars[i].region.flags = reg_info->flags;
+vdev->bars[i].region.size = reg_info->size;
+vdev->bars[i].region.fd_offset = reg_info->offset;
 vdev->bars[i].region.nr = i;
 QLIST_INIT(&vdev->bars[i].quirks);
-}
 
-reg_info.index = VFIO_PCI_CONFIG_REGION_INDEX;
+g_free(reg_info);
+}
 
-ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_REGION_INFO, &reg_info);
+ret = vfio_get_region_info(vbasedev,
+   VFIO_PCI_CONFIG_REGION_INDEX, &reg_info);
 if (ret) {
 error_report("vfio: Error getting config info: %m");
 goto error;
 }
 
 trace_vfio_populate_device_config(vdev->vbasedev.name,
-  (unsigned long)reg_info.size,
-
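The wrapper in this patch allocates the info structure for the caller, and patch 6/9 in the same series extends it to retry with a larger buffer when the kernel reports a bigger argsz. A minimal sketch of that grow-and-retry pattern follows, with no real ioctl involved. struct info, mock_query() and get_info() are hypothetical names invented for the sketch, and the mock only imitates the "report the size you need" behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical info header; a real one would be followed by extra data. */
struct info {
    uint32_t argsz;        /* in: buffer size, out: size actually needed */
    uint32_t index;
    uint32_t payload_len;  /* bytes of extra data the mock "filled in" */
};

#define MOCK_NEEDED 64u

/* Mock of the query: always reports the size it needs, and only fills
 * the payload when the caller's buffer was large enough. */
static int mock_query(struct info *info)
{
    uint32_t have = info->argsz;

    info->argsz = MOCK_NEEDED;
    info->payload_len = have >= MOCK_NEEDED
                        ? MOCK_NEEDED - (uint32_t)sizeof(*info) : 0;
    return 0;
}

/* Allocate, query, and grow until the reported size fits.
 * On success the caller owns *out and must free() it. */
static int get_info(uint32_t index, struct info **out)
{
    size_t argsz = sizeof(struct info);
    struct info *info;

    assert(out != NULL);
    *out = NULL;

    info = calloc(1, argsz);
    if (!info) {
        return -1;
    }
    info->index = index;

    for (;;) {
        struct info *bigger;

        info->argsz = (uint32_t)argsz;
        if (mock_query(info)) {
            free(info);
            return -1;
        }
        if (info->argsz <= argsz) {
            break;                      /* buffer was large enough */
        }
        argsz = info->argsz;
        bigger = realloc(info, argsz);  /* keeps index and other fields */
        if (!bigger) {
            free(info);
            return -1;
        }
        info = bigger;
    }

    *out = info;
    return 0;
}
```

Handing ownership of the buffer to the caller is what lets every call site in the diff shrink to one call plus a matching free once the fields have been copied out.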

[Qemu-devel] [RFC PATCH v3 4/9] vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions

2016-02-16 Thread Alex Williamson

Match common vfio code with setup, exit, and finalize functions for
BAR, quirk, and VGA management.  VGA is also changed to dynamic
allocation to match the other MemoryRegions.

Signed-off-by: Alex Williamson 
---
 hw/vfio/pci-quirks.c |   38 -
 hw/vfio/pci.c|  114 +-
 hw/vfio/pci.h|   10 ++--
 3 files changed, 71 insertions(+), 91 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index d626ec9..49ecf11 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -290,10 +290,10 @@ static void vfio_vga_probe_ati_3c3_quirk(VFIOPCIDevice 
*vdev)
 
 memory_region_init_io(quirk->mem, OBJECT(vdev), &vfio_ati_3c3_quirk, vdev,
   "vfio-ati-3c3-quirk", 1);
-memory_region_add_subregion(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem,
+memory_region_add_subregion(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem,
 3 /* offset 3 bytes from 0x3c0 */, quirk->mem);
 
-QLIST_INSERT_HEAD(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].quirks,
+QLIST_INSERT_HEAD(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks,
   quirk, next);
 
 trace_vfio_quirk_ati_3c3_probe(vdev->vbasedev.name);
@@ -428,7 +428,7 @@ static uint64_t vfio_nvidia_3d4_quirk_read(void *opaque,
 
 quirk->state = NONE;
 
-return vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
+return vfio_vga_read(&vdev->vga->region[QEMU_PCI_VGA_IO_HI],
  addr + 0x14, size);
 }
 
@@ -465,7 +465,7 @@ static void vfio_nvidia_3d4_quirk_write(void *opaque, 
hwaddr addr,
 break;
 }
 
-vfio_vga_write(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
+vfio_vga_write(&vdev->vga->region[QEMU_PCI_VGA_IO_HI],
addr + 0x14, data, size);
 }
 
@@ -481,7 +481,7 @@ static uint64_t vfio_nvidia_3d0_quirk_read(void *opaque,
 VFIONvidia3d0Quirk *quirk = opaque;
 VFIOPCIDevice *vdev = quirk->vdev;
 VFIONvidia3d0State old_state = quirk->state;
-uint64_t data = vfio_vga_read(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
+uint64_t data = vfio_vga_read(&vdev->vga->region[QEMU_PCI_VGA_IO_HI],
   addr + 0x10, size);
 
 quirk->state = NONE;
@@ -523,7 +523,7 @@ static void vfio_nvidia_3d0_quirk_write(void *opaque, 
hwaddr addr,
 }
 }
 
-vfio_vga_write(&vdev->vga.region[QEMU_PCI_VGA_IO_HI],
+vfio_vga_write(&vdev->vga->region[QEMU_PCI_VGA_IO_HI],
addr + 0x10, data, size);
 }
 
@@ -551,15 +551,15 @@ static void vfio_vga_probe_nvidia_3d0_quirk(VFIOPCIDevice 
*vdev)
 
 memory_region_init_io(&quirk->mem[0], OBJECT(vdev), &vfio_nvidia_3d4_quirk,
   data, "vfio-nvidia-3d4-quirk", 2);
-memory_region_add_subregion(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem,
+memory_region_add_subregion(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem,
 0x14 /* 0x3c0 + 0x14 */, &quirk->mem[0]);
 
 memory_region_init_io(&quirk->mem[1], OBJECT(vdev), &vfio_nvidia_3d0_quirk,
   data, "vfio-nvidia-3d0-quirk", 2);
-memory_region_add_subregion(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].mem,
+memory_region_add_subregion(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].mem,
 0x10 /* 0x3c0 + 0x10 */, &quirk->mem[1]);
 
-QLIST_INSERT_HEAD(&vdev->vga.region[QEMU_PCI_VGA_IO_HI].quirks,
+QLIST_INSERT_HEAD(&vdev->vga->region[QEMU_PCI_VGA_IO_HI].quirks,
   quirk, next);
 
 trace_vfio_quirk_nvidia_3d0_probe(vdev->vbasedev.name);
@@ -970,28 +970,28 @@ void vfio_vga_quirk_setup(VFIOPCIDevice *vdev)
 vfio_vga_probe_nvidia_3d0_quirk(vdev);
 }
 
-void vfio_vga_quirk_teardown(VFIOPCIDevice *vdev)
+void vfio_vga_quirk_exit(VFIOPCIDevice *vdev)
 {
 VFIOQuirk *quirk;
 int i, j;
 
-for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
-QLIST_FOREACH(quirk, &vdev->vga.region[i].quirks, next) {
+for (i = 0; i < ARRAY_SIZE(vdev->vga->region); i++) {
+QLIST_FOREACH(quirk, &vdev->vga->region[i].quirks, next) {
 for (j = 0; j < quirk->nr_mem; j++) {
-memory_region_del_subregion(&vdev->vga.region[i].mem,
+memory_region_del_subregion(&vdev->vga->region[i].mem,
 &quirk->mem[j]);
 }
 }
 }
 }
 
-void vfio_vga_quirk_free(VFIOPCIDevice *vdev)
+void vfio_vga_quirk_finalize(VFIOPCIDevice *vdev)
 {
 int i, j;
 
-for (i = 0; i < ARRAY_SIZE(vdev->vga.region); i++) {
-while (!QLIST_EMPTY(&vdev->vga.region[i].quirks)) {
-VFIOQuirk *quirk = QLIST_FIRST(&vdev->vga.region[i].quirks);
+for (i = 0; i < ARRAY_SIZE(vdev->vga->region); i++) {
+while (!QLIST_EMPTY(&vdev->vga->region[i].quirks)) {
+VFIOQuirk *quirk = QLIST_FIRST(&vdev->vga->region[i].quirks);
 QLIST_REMOVE(quirk, next);
 for (j = 0; j < quirk->nr_mem; j++) {
 object_unparent(OBJECT(&quirk->mem[j]));
@@ -1012,7 +1012,7 @@ void vfio_bar_quirk_setup(VFIOPCIDevice *vdev, int nr)

[Qemu-devel] [RFC PATCH v3 3/9] vfio: Generalize region support

2016-02-16 Thread Alex Williamson

Both platform and PCI vfio drivers create a "slow", I/O memory region
with one or more mmap memory regions overlaid when supported by the
device. Generalize this to a set of common helpers in the core that
pulls the region info from vfio, fills the region data, configures
slow mapping, and adds helpers for completing the mmap, enable/disable,
and teardown.  This can be immediately used by the PCI MSI-X code,
which needs to mmap around the MSI-X vector table.

This also changes VFIORegion.mem to be dynamically allocated because
otherwise we don't know how the caller has allocated VFIORegion and
therefore don't know whether to unreference it to destroy the
MemoryRegion or not.

Signed-off-by: Alex Williamson 
---
 hw/arm/sysbus-fdt.c   |2 
 hw/vfio/common.c  |  172 ++---
 hw/vfio/pci-quirks.c  |   24 +++---
 hw/vfio/pci.c |  168 +---
 hw/vfio/platform.c|   72 +++--
 include/hw/vfio/vfio-common.h |   23 -
 trace-events  |   10 ++
 7 files changed, 282 insertions(+), 189 deletions(-)

diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index 68a3de5..4dfa0e4 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -94,7 +94,7 @@ static int add_calxeda_midway_xgmac_fdt_node(SysBusDevice 
*sbdev, void *opaque)
 mmio_base = platform_bus_get_mmio_addr(pbus, sbdev, i);
 reg_attr[2 * i] = cpu_to_be32(mmio_base);
 reg_attr[2 * i + 1] = cpu_to_be32(
-memory_region_size(&vdev->regions[i]->mem));
+memory_region_size(vdev->regions[i]->mem));
 }
 ret = qemu_fdt_setprop(fdt, nodename, "reg", reg_attr,
vbasedev->num_regions * 2 * sizeof(uint32_t));
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e20fc4f..96ccb79 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -493,46 +493,162 @@ static void vfio_listener_release(VFIOContainer 
*container)
 memory_listener_unregister(&container->listener);
 }
 
-int vfio_mmap_region(Object *obj, VFIORegion *region,
- MemoryRegion *mem, MemoryRegion *submem,
- void **map, size_t size, off_t offset,
- const char *name)
+int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
+  int index, const char *name)
 {
-int ret = 0;
-VFIODevice *vbasedev = region->vbasedev;
+struct vfio_region_info *info;
+int ret;
+
+ret = vfio_get_region_info(vbasedev, index, &info);
+if (ret) {
+return ret;
+}
+
+region->vbasedev = vbasedev;
+region->flags = info->flags;
+region->size = info->size;
+region->fd_offset = info->offset;
+region->nr = index;
 
-if (!vbasedev->no_mmap && size && region->flags &
-VFIO_REGION_INFO_FLAG_MMAP) {
-int prot = 0;
+if (region->size) {
+region->mem = g_new0(MemoryRegion, 1);
+memory_region_init_io(region->mem, obj, &vfio_region_ops,
+  region, name, region->size);
 
-if (region->flags & VFIO_REGION_INFO_FLAG_READ) {
-prot |= PROT_READ;
+if (!vbasedev->no_mmap &&
+region->flags & VFIO_REGION_INFO_FLAG_MMAP &&
+!(region->size & ~qemu_real_host_page_mask)) {
+
+region->nr_mmaps = 1;
+region->mmaps = g_new0(VFIOMmap, region->nr_mmaps);
+
+region->mmaps[0].offset = 0;
+region->mmaps[0].size = region->size;
 }
+}
+
+g_free(info);
+
+trace_vfio_region_setup(vbasedev->name, index, name,
+region->flags, region->fd_offset, region->size);
+return 0;
+}
 
-if (region->flags & VFIO_REGION_INFO_FLAG_WRITE) {
-prot |= PROT_WRITE;
+int vfio_region_mmap(VFIORegion *region)
+{
+int i, prot = 0;
+char *name;
+
+if (!region->mem) {
+return 0;
+}
+
+prot |= region->flags & VFIO_REGION_INFO_FLAG_READ ? PROT_READ : 0;
+prot |= region->flags & VFIO_REGION_INFO_FLAG_WRITE ? PROT_WRITE : 0;
+
+for (i = 0; i < region->nr_mmaps; i++) {
+region->mmaps[i].mmap = mmap(NULL, region->mmaps[i].size, prot,
+ MAP_SHARED, region->vbasedev->fd,
+ region->fd_offset +
+ region->mmaps[i].offset);
+if (region->mmaps[i].mmap == MAP_FAILED) {
+int ret = -errno;
+
+trace_vfio_region_mmap_fault(memory_region_name(region->mem), i,
+ region->fd_offset +
+ region->mmaps[i].offset,
+ region->fd_offset +
+ region->mmaps[i].offset +
+ region->mmaps[i].size
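Two small decisions in the generalized helpers above are easy to isolate: a region is only considered for mmap when it is non-empty, flagged as mappable, and a whole number of host pages in size, and the mmap protection bits are derived from the region's read and write flags. The sketch below shows both checks in isolation. The REGION_FLAG_* constants, region_prot() and region_can_mmap() are local stand-ins assumed for this illustration; they are not taken from the vfio headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>

/* Local stand-ins for the region flag bits; the values are assumptions
 * made for this sketch only. */
#define REGION_FLAG_READ  (1u << 0)
#define REGION_FLAG_WRITE (1u << 1)
#define REGION_FLAG_MMAP  (1u << 2)

/* Translate region access flags into mmap() protection bits. */
static int region_prot(uint32_t flags)
{
    int prot = 0;

    prot |= (flags & REGION_FLAG_READ) ? PROT_READ : 0;
    prot |= (flags & REGION_FLAG_WRITE) ? PROT_WRITE : 0;
    return prot;
}

/* A region qualifies for mmap only if mmap is not disabled, the region
 * is non-empty and mappable, and its size is page aligned. */
static bool region_can_mmap(uint32_t flags, uint64_t size,
                            uint64_t page_size, bool no_mmap)
{
    /* page_size must be a non-zero power of two for the mask to work */
    assert(page_size && (page_size & (page_size - 1)) == 0);

    return !no_mmap && size != 0 &&
           (flags & REGION_FLAG_MMAP) &&
           !(size & (page_size - 1));
}
```

The expression page_size - 1 plays the role of ~qemu_real_host_page_mask in the patch: any bit set below the page boundary means the size is not page aligned.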

[Qemu-devel] [RFC PATCH v3 1/9] vfio: Add sysfsdev property for pci & platform

2016-02-16 Thread Alex Williamson

vfio-pci currently requires a host= parameter, which comes in the
form of a PCI address in [domain:] notation.  We
expect to find a matching entry in sysfs for that under
/sys/bus/pci/devices/.  vfio-platform takes a similar approach, but
defines the host= parameter to be a string, which can be matched
directly under /sys/bus/platform/devices/.  On the PCI side, we have
some interest in using vfio to expose vGPU devices.  These are not
actual discrete PCI devices, so they don't have a compatible host PCI
bus address or a device link where QEMU wants to look for it.  There's
also really no requirement that vfio can only be used to expose
physical devices, a new vfio bus and iommu driver could expose a
completely emulated device.  To fit within the vfio framework, it
would need a kernel struct device and associated IOMMU group, but
those are easy constraints to manage.

To support such devices, which would include vGPUs, that honor the
VFIO PCI programming API, but are not necessarily backed by a unique
PCI address, add support for specifying any device in sysfs.  The
vfio API already has support for probing the device type to ensure
compatibility with either vfio-pci or vfio-platform.

With this, a vfio-pci device could either be specified as:

-device vfio-pci,host=02:00.0

or

-device vfio-pci,sysfsdev=/sys/devices/pci:00/:00:1c.0/:02:00.0

or even

-device vfio-pci,sysfsdev=/sys/bus/pci/devices/:02:00.0

When vGPU support comes along, this might look something more like:

-device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@:00:02.0

NB - This is only a made up example path, but it should be noted that
the device namespace is global for vfio, a virtual device cannot
overlap with existing namespaces and should not create a name prone to
conflict, such as a simple instance number.

The same change is made for vfio-platform; specifying sysfsdev takes
precedence over the old host option.

Signed-off-by: Alex Williamson 
Tested-by: Eric Auger 
Reviewed-by: Eric Auger 
---
 hw/vfio/pci.c |  130 +
 hw/vfio/platform.c|   55 ++---
 include/hw/vfio/vfio-common.h |1 
 3 files changed, 86 insertions(+), 100 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 49f3d2d..5524121 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -895,12 +895,8 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
 /* Since pci handles romfile, just print a message and return */
 if (vfio_blacklist_opt_rom(vdev) && vdev->pdev.romfile) {
-error_printf("Warning : Device at %04x:%02x:%02x.%x "
- "is known to cause system instability issues during "
- "option rom execution. "
- "Proceeding anyway since user specified romfile\n",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function);
+error_printf("Warning : Device at %s is known to cause system 
instability issues during option rom execution. Proceeding anyway since user 
specified romfile\n",
+ vdev->vbasedev.name);
 }
 return;
 }
@@ -913,9 +909,7 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 pwrite(fd, &size, 4, offset) != 4 ||
 pread(fd, &size, 4, offset) != 4 ||
 pwrite(fd, &orig, 4, offset) != 4) {
-error_report("%s(%04x:%02x:%02x.%x) failed: %m",
- __func__, vdev->host.domain, vdev->host.bus,
- vdev->host.slot, vdev->host.function);
+error_report("%s(%s) failed: %m", __func__, vdev->vbasedev.name);
 return;
 }
 
@@ -927,29 +921,18 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 
 if (vfio_blacklist_opt_rom(vdev)) {
 if (dev->opts && qemu_opt_get(dev->opts, "rombar")) {
-error_printf("Warning : Device at %04x:%02x:%02x.%x "
- "is known to cause system instability issues during "
- "option rom execution. "
- "Proceeding anyway since user specified non zero 
value for "
- "rombar\n",
- vdev->host.domain, vdev->host.bus, vdev->host.slot,
- vdev->host.function);
+error_printf("Warning : Device at %s is known to cause system 
instability issues during option rom execution. Proceeding anyway since user 
specified non zero value for rombar\n",
+ vdev->vbasedev.name);
 } else {
-error_printf("Warning : Rom loading for device at "
- "%04x:%02x:%02x.%x has been disabled due to "
- "system instability issues. "
- "Specify rombar=1 or romfile to
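The commit message above describes two ways of naming the same device: a host PCI address, which maps to an entry under /sys/bus/pci/devices/, or an explicit sysfs path. A minimal sketch of how one could be derived from the other follows, using the address format string seen in the removed error messages. pci_sysfs_path() and sysfs_leaf() are hypothetical helpers for illustration, and taking the last path component as the printable device name is an assumption, since the truncated diff does not show how the name is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format the sysfs path for a host PCI address. Returns the snprintf()
 * result, so a value >= len means the output was truncated. */
static int pci_sysfs_path(char *buf, size_t len, unsigned domain,
                          unsigned bus, unsigned slot, unsigned function)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "/sys/bus/pci/devices/%04x:%02x:%02x.%x",
                    domain, bus, slot, function);
}

/* Return the last path component, used here as a printable device name. */
static const char *sysfs_leaf(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

With a name derived this way, messages can print one string for any device, which matches the diff's switch from four address fields to vdev->vbasedev.name.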

[Qemu-devel] [RFC PATCH v3 0/9] vfio: capability chains, sparse mmap, device specific regions, IGD support

2016-02-16 Thread Alex Williamson

v3:

Quite a bit of restructuring, functional differences include exposing
another fw_cfg file to indicate the size of the stolen memory region.
We really have no need to copy anything into stolen memory, so while
we tell SeaBIOS about it via a fw_cfg file, the data pointer is NULL
so it can't be read.

I'm also now reading the GGMS size from the GMCH register which
determines the size of the GTT region of stolen memory.  The vBIOS
is typically only using 1MB, but this is often 2MB in hardware.  I
also give the user the ability to specify a GMS value for further
stolen memory.  We default it to zero and it's an experimental option
so we can remove if it's not useful.  QEMU now does the virtualization
of the GMCH and BDSM registers, which it was sort of doing before
anyway, but vfio kernel no longer does anything special for them.
Getting the GGMS size requires that we know something about the IGD
version we're using, so code added for that.

One fun thing, IGD is really part of the reason that the x-vga option
is experimental.  IGD doesn't like to give up VGA routing.  Now we
can use that to our advantage.  If the hardware doesn't report VGA
disabled, we can automatically turn it on.  I position this around
all the other stuff of doing vBIOS and BDSM quirks, so it can be
disabled by specifying rombar=0 or using the new x-no-auto-vga
options.  I'm also using this to signal when to skip creating the
ISA bridge and messing with the host bridge.  If we have a Gen8 or
newer and rombar=0 is specified then we don't do any special setup,
which should enable Intel's Universal Passthrough Mode.  This is
already supported by libvirt, so it should make an easy path between
old and new modes.

Not seen here is a whole revision that created fake BARs on the ISA
bridge for the opregion and stolen memory such that they were
automatically mapped with no BIOS requirement.  That has a gap that
stolen memory gets disabled during BAR sizing and breaks altogether
if the guest moves the BAR.  It really only affects VESA mode, but
it's still enough to abandon that hack approach.

This will work with the previous kernel patches, but I'd recommend
v3 anyway, plus the PCI FLR reset delay on laptops.  You will
definitely need new SeaBIOS for this or else you'll get a hw_error.
Happy testing, reviews and feedback welcome.  Thanks,

Alex

v2:

IGD support is greatly expanded.  Due to feedback on the previous
series, QEMU no longer maps the OpRegion to the guest; we simply fill
a buffer and expose it as fw_cfg.  We could still do the mapping in
the future if there's value to it.

New features include the use of host and LPC bridge config space
provided through new vfio device specific regions.  This eliminates
the need for QEMU to go poking around in pci-sysfs.  Additionally the
host and LPC changes are now initiated by vfio-pci upon finding the
necessary regions to support these.  Thus the igd_passthru=on machine
option is not needed for this series.  This series no longer has any
dependency on Gerd's previous IGD series.

Also included is PCI option ROM fixups, which automatically fixes the
device ID in the ROM and recalculates the checksum for ROMs loaded
through vfio.  This is necessary for IGD as the ROM vfio provides us
through the shadow ROM space typically has the wrong ID and bogus
checksum.  It would also be useful for anyone "soft modding" a card
by specifying a different device ID and manually hacking the ROM.

Finally is a quirk to handle stolen memory and requires cooperation
with SeaBIOS.  We need the vBIOS, as enabled by the ROM support
above, for lighting up laptop panels (at least for my SNB system),
but that vBIOS tries to make use of host stolen memory, which either
overlaps VM RAM or empty space, which leads to VM memory corruption
or DMAR faults respectively.  We can prevent this by intercepting
the vBIOS programming of the device to instead use a buffer allocated
by SeaBIOS.  I'm amazed this works, but it does... at least for me.
Comments and testing feedback welcome.  You'll need this QEMU patch
series, the latest vfio patch series (including the PCI reset path
on laptops), and a new SeaBIOS patch series.  Thanks,

Alex


v1:

This is the QEMU complement to the vfio kernel capability chain
series.  This is RFC since it depends on those non-upstream kernel
changes.  Patch 1/ will be posted separately, it's somewhat unrelated,
but is in my build tree so I include it here for anyone that wants to
build this series.

This series includes sparse mmap support for avoiding mmaps over the
MSI-X vector table and device specific memory regions for IGD OpRegion
support.  MemoryRegions are significantly generalized for the former,
to make it really easy for each vfio region to be backed by none or
more mmap MemoryRegion.  The MSI-X vector table then either adds an
mmap region, or not via a legacy quirk or explicit sparse mmap
support.

IGD OpRegions are exposed as new device specific region, which simply
entails searching

Re: [Qemu-devel] [PATCH v1 6/9] target-arm/translate-a64.c: Unify some of the ldst_reg decoding

2016-02-16 Thread Sergey Fedorov

On 12.02.2016 17:33, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
>
> The various load/store variants under disas_ldst_reg can all reuse the
> same decoding for opc, size, rt and is_vector.
>
> This patch unifies the decoding in preparation for generating
> instruction syndromes for data aborts.
> This will allow us to reduce the number of places to hook in updates
> to the load/store state needed to generate the insn syndromes.
>
> No functional change.
>
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Sergey Fedorov 

> ---
>  target-arm/translate-a64.c | 41 +++--
>  1 file changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index bf31f8a..9e26d5e 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -2075,19 +2075,19 @@ static void disas_ldst_pair(DisasContext *s, uint32_t 
> insn)
>   * size: 00 -> 8 bit, 01 -> 16 bit, 10 -> 32 bit, 11 -> 64bit
>   * opc: 00 -> store, 01 -> loadu, 10 -> loads 64, 11 -> loads 32
>   */
> -static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn)
> +static void disas_ldst_reg_imm9(DisasContext *s, uint32_t insn,
> +int opc,
> +int size,
> +int rt,
> +bool is_vector)
>  {
> -int rt = extract32(insn, 0, 5);
>  int rn = extract32(insn, 5, 5);
>  int imm9 = sextract32(insn, 12, 9);
> -int opc = extract32(insn, 22, 2);
> -int size = extract32(insn, 30, 2);
>  int idx = extract32(insn, 10, 2);
>  bool is_signed = false;
>  bool is_store = false;
>  bool is_extended = false;
>  bool is_unpriv = (idx == 2);
> -bool is_vector = extract32(insn, 26, 1);
>  bool post_index;
>  bool writeback;
>  
> @@ -2194,19 +2194,19 @@ static void disas_ldst_reg_imm9(DisasContext *s, 
> uint32_t insn)
>   * Rn: address register or SP for base
>   * Rm: offset register or ZR for offset
>   */
> -static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn)
> +static void disas_ldst_reg_roffset(DisasContext *s, uint32_t insn,
> +   int opc,
> +   int size,
> +   int rt,
> +   bool is_vector)
>  {
> -int rt = extract32(insn, 0, 5);
>  int rn = extract32(insn, 5, 5);
>  int shift = extract32(insn, 12, 1);
>  int rm = extract32(insn, 16, 5);
> -int opc = extract32(insn, 22, 2);
>  int opt = extract32(insn, 13, 3);
> -int size = extract32(insn, 30, 2);
>  bool is_signed = false;
>  bool is_store = false;
>  bool is_extended = false;
> -bool is_vector = extract32(insn, 26, 1);
>  
>  TCGv_i64 tcg_rm;
>  TCGv_i64 tcg_addr;
> @@ -2283,14 +2283,14 @@ static void disas_ldst_reg_roffset(DisasContext *s, 
> uint32_t insn)
>   * Rn: base address register (inc SP)
>   * Rt: target register
>   */
> -static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn)
> +static void disas_ldst_reg_unsigned_imm(DisasContext *s, uint32_t insn,
> +int opc,
> +int size,
> +int rt,
> +bool is_vector)
>  {
> -int rt = extract32(insn, 0, 5);
>  int rn = extract32(insn, 5, 5);
>  unsigned int imm12 = extract32(insn, 10, 12);
> -bool is_vector = extract32(insn, 26, 1);
> -int size = extract32(insn, 30, 2);
> -int opc = extract32(insn, 22, 2);
>  unsigned int offset;
>  
>  TCGv_i64 tcg_addr;
> @@ -2349,20 +2349,25 @@ static void disas_ldst_reg_unsigned_imm(DisasContext 
> *s, uint32_t insn)
>  /* Load/store register (all forms) */
>  static void disas_ldst_reg(DisasContext *s, uint32_t insn)
>  {
> +int rt = extract32(insn, 0, 5);
> +int opc = extract32(insn, 22, 2);
> +bool is_vector = extract32(insn, 26, 1);
> +int size = extract32(insn, 30, 2);
> +
>  switch (extract32(insn, 24, 2)) {
>  case 0:
>  if (extract32(insn, 21, 1) == 1 && extract32(insn, 10, 2) == 2) {
> -disas_ldst_reg_roffset(s, insn);
> +disas_ldst_reg_roffset(s, insn, opc, size, rt, is_vector);
>  } else {
>  /* Load/store register (unscaled immediate)
>   * Load/store immediate pre/post-indexed
>   * Load/store register unprivileged
>   */
> -disas_ldst_reg_imm9(s, insn);
> +disas_ldst_reg_imm9(s, insn, opc, size, rt, is_vector);
>  }
>  break;
>  case 1:
> -disas_ldst_reg_unsigned_imm(s, insn);
> +disas_ldst_reg_unsigned_imm(s, insn, opc, size, rt, is_vector);
>  break;
>  default:
>

Re: [Qemu-devel] [PATCH v1 5/9] target-arm/translate-a64.c: Use extract32 in disas_ldst_reg_imm9

2016-02-16 Thread Sergey Fedorov

On 12.02.2016 17:33, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
>
> Use extract32 instead of open coding the bit masking when decoding
> is_signed and is_extended. This streamlines the decoding with some
> of the other ldst variants.
>
> No functional change.
>
> Signed-off-by: Edgar E. Iglesias 

Reviewed-by: Sergey Fedorov 

> ---
>  target-arm/translate-a64.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index 7f65aea..bf31f8a 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -2117,8 +2117,8 @@ static void disas_ldst_reg_imm9(DisasContext *s, 
> uint32_t insn)
>  return;
>  }
>  is_store = (opc == 0);
> -is_signed = opc & (1<<1);
> -is_extended = (size < 3) && (opc & 1);
> +is_signed = extract32(opc, 1, 1);
> +is_extended = (size < 3) && extract32(opc, 0, 1);
>  }
>  
>  switch (idx) {
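
For readers following along, the "no functional change" claim can be checked with a small sketch. Everything here is an assumption for illustration: extract32_sketch() is a local stand-in modelled on QEMU's extract32() helper, and the four wrapper functions are hypothetical names that only mirror the two expressions touched by the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in modelled on QEMU's extract32(): return 'length' bits of
 * 'value', starting at bit position 'start'. */
static inline uint32_t extract32_sketch(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Open-coded forms, as on the '-' lines of the hunk. */
static bool is_signed_masked(uint32_t opc)
{
    return opc & (1 << 1);
}

static bool is_extended_masked(uint32_t opc, uint32_t size)
{
    return (size < 3) && (opc & 1);
}

/* extract32-based forms, as on the '+' lines of the hunk. */
static bool is_signed_extracted(uint32_t opc)
{
    return extract32_sketch(opc, 1, 1);
}

static bool is_extended_extracted(uint32_t opc, uint32_t size)
{
    return (size < 3) && extract32_sketch(opc, 0, 1);
}

/* Compare both forms for every 2-bit opc and 2-bit size value. */
static bool decodings_agree(void)
{
    for (uint32_t opc = 0; opc < 4; opc++) {
        for (uint32_t size = 0; size < 4; size++) {
            if (is_signed_masked(opc) != is_signed_extracted(opc) ||
                is_extended_masked(opc, size) !=
                is_extended_extracted(opc, size)) {
                return false;
            }
        }
    }
    return true;
}
```

Since opc and size are both 2-bit fields, the sixteen combinations checked by decodings_agree() cover the whole input space of the two expressions.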

Re: [Qemu-devel] [PATCH v2 7/9] i.MX: Add i.MX6 SOC implementation.

2016-02-16 Thread Peter Maydell

On 16 February 2016 at 20:49, Jean-Christophe DUBOIS
 wrote:
> Le 16/02/2016 16:31, Peter Maydell a écrit :
>> On 8 February 2016 at 22:08, Jean-Christophe Dubois 
>> wrote:
>>> +object_property_set_bool(OBJECT(&s->cpu[i]), false,
>>> + "has_el3", &error_abort);
>>
>> Do the CPUs in this board really not have EL3 ?
>
>
> Well the Cortex A9 is certainly able to get to the secure mode. However, if
> I enable it (has_el3 set to true), the OS (Linux or Xvisor) will not boot.
> Disabling it allows both OSes to boot on the emulated i.MX6.
>
> Would you have some idea on the reason for this "hang" during the boot when
> EL3 is enabled?

How are you booting the OS/hypervisor? Via -kernel or via -bios ?
Usually if it doesn't boot this is because there's some bit of
boot rom/loader code that runs on the real h/w and isn't being
run in your QEMU setup, that does the initial setup of the h/w
in secure mode. (In particular, if the secure side doesn't set
the GIC interrupts to be NS accessible then things don't work
very well.)

I wouldn't have expected that to be an issue for booting linux
via -kernel though because there we should be booting the kernel
NS and have a hack to configure the GIC appropriately.

Peter C may remember the details of how this should work
better than me.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v10 09/13] qapi: Emit structs used as variants in topological order

2016-02-16 Thread Markus Armbruster

Eric Blake  writes:

> On 02/16/2016 10:03 AM, Markus Armbruster wrote:
>> Eric Blake  writes:
>> 
>>> Right now, we emit the branches of union types as a boxed pointer,
>>> and it suffices to have a forward declaration of the type.  However,
>>> a future patch will swap things to directly use the branch type,
>>> instead of hiding it behind a pointer.  For this to work, the
>>> compiler needs the full definition of the type, not just a forward
>>> declaration, prior to the union that is including the branch type.
>>> This patch just adds topological sorting to hoist all types
>>> mentioned in a branch of a union to be fully declared before the
>>> union itself.  The sort is always possible, because we do not
>>> allow circular union types that include themselves as a direct
>>> branch (it is, however, still possible to include a branch type
>>> that itself has a pointer to the union, for a type that can
>>> indirectly recursively nest itself - that remains safe, because
>>> the member of the branch type will remain a pointer, and the
>>> QMP representation of such a type adds another {} for each recurring
>>> layer of the union type).
>>>
>
>>> +ret = ''
>>> +if variants:
>>> +for v in variants.variants:
>>> +if isinstance(v.type, QAPISchemaObjectType) and \
>>> +   not v.type.is_implicit():
>>> +ret += gen_object(v.type.name, v.type.base,
>>> +  v.type.local_members, v.type.variants)
>> 
>> PEP 8:
>> 
>> The preferred way of wrapping long lines is by using Python's
>> implied line continuation inside parentheses, brackets and
>> braces. Long lines can be broken over multiple lines by wrapping
>> expressions in parentheses. These should be used in preference to
>> using a backslash for line continuation.
>> 
>> In this case:
>> 
>>if (isinstance(v.type, QAPISchemaObjectType) and
>>not v.type.is_implicit()):
>
> pep8 silently accepted my version, but complains about yours:
>
> scripts/qapi-types.py:65:5: E129 visually indented line with same indent
> as next logical line
>
> So the compromise for both of us is added indentation:
>
> if (isinstance(v.type, QAPISchemaObjectType) and
> not v.type.is_implicit()):
> ret += ...

Sold.

>
> Or, I could revisit my earlier proposal of:
>
> v.type.is_implicit(QAPISchemaObjectType)
>
> of giving .is_implicit() an optional parameter; if absent, all types are
> considered, but if present, the predicate is True only if the type of
> the object being queried matches the parameter type name.
>
> Here's the last time we discussed the tradeoffs of the shorter form:
> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg02272.html
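
The ordering constraint from the commit message quoted above can be illustrated in plain C. This is only a sketch: the struct names are hypothetical and are not the generated QAPI types. A forward declaration is enough while the branch is held through a pointer, but embedding the branch by value needs the complete type first, which is what the topological sort provides.

```c
#include <assert.h>
#include <stddef.h>

/* A forward declaration suffices for a boxed (pointer) branch. */
struct BranchSketch;

struct BoxedUnionSketch {
    int type;
    union {
        struct BranchSketch *branch;   /* pointer to an incomplete type: OK */
    } u;
};

/* The full definition must come before any by-value use. The branch may
 * still point back at a union type, so indirect recursion stays possible. */
struct BranchSketch {
    int value;
    struct BoxedUnionSketch *nested;
};

struct InlineUnionSketch {
    int type;
    union {
        struct BranchSketch branch;    /* by value: needs the complete type */
    } u;
};
```

Moving the definition of struct InlineUnionSketch above that of struct BranchSketch would fail to compile, because the compiler cannot size a member of incomplete type.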

[Qemu-devel] [PATCH 4/4] module: Rename machine_init() to opts_init()

2016-02-16 Thread Eduardo Habkost

The remaining users of machine_init() only call
qemu_add_opts(). Rename machine_init() to opts_init() and move it
closer to the qemu_add_opts() calls in vl.c.

Cc: "Michael S. Tsirkin" 
Cc: Igor Mammedov 
Cc: Gerd Hoffmann 
Signed-off-by: Eduardo Habkost 
---
 fsdev/qemu-fsdev-opts.c | 2 +-
 hw/acpi/core.c  | 2 +-
 hw/smbios/smbios.c  | 2 +-
 include/qemu/module.h   | 4 ++--
 ui/spice-core.c | 2 +-
 ui/vnc.c| 2 +-
 vl.c| 2 +-
 7 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsdev/qemu-fsdev-opts.c b/fsdev/qemu-fsdev-opts.c
index 0b4619f..88a4ac3 100644
--- a/fsdev/qemu-fsdev-opts.c
+++ b/fsdev/qemu-fsdev-opts.c
@@ -83,4 +83,4 @@ static void fsdev_register_config(void)
 qemu_add_opts(&qemu_fsdev_opts);
 qemu_add_opts(&qemu_virtfs_opts);
 }
-machine_init(fsdev_register_config);
+opts_init(fsdev_register_config);
diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index 3a14e90..714bc68 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -68,7 +68,7 @@ static void acpi_register_config(void)
 qemu_add_opts(&qemu_acpi_opts);
 }
 
-machine_init(acpi_register_config);
+opts_init(acpi_register_config);
 
 static int acpi_checksum(const uint8_t *data, int len)
 {
diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
index 3b5f9bd..1362e79 100644
--- a/hw/smbios/smbios.c
+++ b/hw/smbios/smbios.c
@@ -320,7 +320,7 @@ static void smbios_register_config(void)
 qemu_add_opts(&qemu_smbios_opts);
 }
 
-machine_init(smbios_register_config);
+opts_init(smbios_register_config);
 
 static void smbios_validate_table(void)
 {
diff --git a/include/qemu/module.h b/include/qemu/module.h
index 72d9498..24b61ec 100644
--- a/include/qemu/module.h
+++ b/include/qemu/module.h
@@ -42,14 +42,14 @@ static void __attribute__((constructor)) do_qemu_init_ ## 
function(void)\
 
 typedef enum {
 MODULE_INIT_BLOCK,
-MODULE_INIT_MACHINE,
+MODULE_INIT_OPTS,
 MODULE_INIT_QAPI,
 MODULE_INIT_QOM,
 MODULE_INIT_MAX
 } module_init_type;
 
 #define block_init(function) module_init(function, MODULE_INIT_BLOCK)
-#define machine_init(function) module_init(function, MODULE_INIT_MACHINE)
+#define opts_init(function) module_init(function, MODULE_INIT_OPTS)
 #define qapi_init(function) module_init(function, MODULE_INIT_QAPI)
 #define type_init(function) module_init(function, MODULE_INIT_QOM)
 
diff --git a/ui/spice-core.c b/ui/spice-core.c
index 4dbd99a..0038169 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -931,4 +931,4 @@ static void spice_register_config(void)
 {
 qemu_add_opts(&qemu_spice_opts);
 }
-machine_init(spice_register_config);
+opts_init(spice_register_config);
diff --git a/ui/vnc.c b/ui/vnc.c
index b6bbea5..2d06ca4 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3878,4 +3878,4 @@ static void vnc_register_config(void)
 {
 qemu_add_opts(&qemu_vnc_opts);
 }
-machine_init(vnc_register_config);
+opts_init(vnc_register_config);
diff --git a/vl.c b/vl.c
index 18e6086..6f7772c 100644
--- a/vl.c
+++ b/vl.c
@@ -2987,6 +2987,7 @@ int main(int argc, char **argv, char **envp)
 qemu_add_opts(&qemu_icount_opts);
 qemu_add_opts(&qemu_semihosting_config_opts);
 qemu_add_opts(&qemu_fw_cfg_opts);
+module_call_init(MODULE_INIT_OPTS);
 
 runstate_init();
 
@@ -2999,7 +3000,6 @@ int main(int argc, char **argv, char **envp)
 QLIST_INIT (&vm_change_state_head);
 os_setup_early_signal_handling();
 
-module_call_init(MODULE_INIT_MACHINE);
 machine_class = find_default_machine();
 cpu_model = NULL;
 snapshot = 0;
-- 
2.1.0
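
As a rough model of what the rename touches: the *_init() macros in include/qemu/module.h turn a function into a constructor that registers it under an init type, and module_call_init() later runs everything registered for one type. The sketch below is a simplified stand-in, not the QEMU implementation; the names register_init(), call_init(), sketch_init(), opts_init_sketch() and the fixed-size table are all assumptions made for illustration. It relies on the GCC/Clang constructor attribute.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { INIT_BLOCK, INIT_OPTS, INIT_MAX } init_type;

#define MAX_HOOKS 8

typedef void (*init_fn)(void);

/* Zero-initialised before any constructor runs, so registration is safe. */
static init_fn hooks[INIT_MAX][MAX_HOOKS];
static size_t hook_count[INIT_MAX];

/* Remember 'fn' so that a later call_init(type) can run it. */
static void register_init(init_fn fn, init_type type)
{
    assert(type < INIT_MAX && hook_count[type] < MAX_HOOKS);
    hooks[type][hook_count[type]++] = fn;
}

/* Run every function registered for 'type', in registration order. */
static void call_init(init_type type)
{
    for (size_t i = 0; i < hook_count[type]; i++) {
        hooks[type][i]();
    }
}

/* Expands to a constructor that registers 'function' before main() runs. */
#define sketch_init(function, type)                                   \
static void __attribute__((constructor)) do_init_ ## function(void)  \
{                                                                     \
    register_init(function, type);                                    \
}

#define opts_init_sketch(function) sketch_init(function, INIT_OPTS)

/* Hypothetical option-registration hook; it only counts its invocations. */
static int opts_registered;

static void demo_register_config(void)
{
    opts_registered++;
}
opts_init_sketch(demo_register_config)
```

The point of the model is the split between registration (at load time) and invocation (when the init type is called), which is why the patch can move the MODULE_INIT_OPTS call next to the other qemu_add_opts() calls in main() without touching the registering files beyond the macro name.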

[Qemu-devel] [PATCH 3/4] s390x/css: Use static initialization for channel_subsys fields

2016-02-16 Thread Eduardo Habkost

machine_init() will be gone, but we don't need it if we just
initialize the channel_subsys fields statically.

Cc: Cornelia Huck 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Alexander Graf 
Signed-off-by: Eduardo Habkost 
---
 hw/s390x/css.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 2e9659a..15eb154 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -62,7 +62,15 @@ typedef struct ChannelSubSys {
 QTAILQ_HEAD(, IoAdapter) io_adapters;
 } ChannelSubSys;
 
-static ChannelSubSys channel_subsys;
+static ChannelSubSys channel_subsys = {
+.pending_crws = QTAILQ_HEAD_INITIALIZER(channel_subsys.pending_crws),
+.do_crw_mchk = true,
+.sei_pending = false,
+.do_crw_mchk = true,
+.crws_lost = false,
+.chnmon_active = false,
+.io_adapters = QTAILQ_HEAD_INITIALIZER(channel_subsys.io_adapters),
+};
 
 int css_create_css_image(uint8_t cssid, bool default_image)
 {
@@ -1514,18 +1522,6 @@ int subch_device_load(SubchDev *s, QEMUFile *f)
 return 0;
 }
 
-
-static void css_init(void)
-{
-QTAILQ_INIT(&channel_subsys.pending_crws);
-channel_subsys.sei_pending = false;
-channel_subsys.do_crw_mchk = true;
-channel_subsys.crws_lost = false;
-channel_subsys.chnmon_active = false;
-QTAILQ_INIT(&channel_subsys.io_adapters);
-}
-machine_init(css_init);
-
 void css_reset_sch(SubchDev *sch)
 {
 PMCW *p = &sch->curr_status.pmcw;
-- 
2.1.0

[Qemu-devel] [PATCH 2/4] s390x/css: Allocate channel_subsys statically

2016-02-16 Thread Eduardo Habkost

There's no need to use g_malloc0() to allocate the channel_subsys
struct, just use a static variable.

Cc: Cornelia Huck 
Cc: Christian Borntraeger 
Cc: Richard Henderson 
Cc: Alexander Graf 
Signed-off-by: Eduardo Habkost 
---
 hw/s390x/css.c | 177 -
 1 file changed, 88 insertions(+), 89 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index c29068b..2e9659a 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -62,7 +62,7 @@ typedef struct ChannelSubSys {
 QTAILQ_HEAD(, IoAdapter) io_adapters;
 } ChannelSubSys;
 
-static ChannelSubSys *channel_subsys;
+static ChannelSubSys channel_subsys;
 
 int css_create_css_image(uint8_t cssid, bool default_image)
 {
@@ -70,12 +70,12 @@ int css_create_css_image(uint8_t cssid, bool default_image)
 if (cssid > MAX_CSSID) {
 return -EINVAL;
 }
-if (channel_subsys->css[cssid]) {
+if (channel_subsys.css[cssid]) {
 return -EBUSY;
 }
-channel_subsys->css[cssid] = g_malloc0(sizeof(CssImage));
+channel_subsys.css[cssid] = g_malloc0(sizeof(CssImage));
 if (default_image) {
-channel_subsys->default_cssid = cssid;
+channel_subsys.default_cssid = cssid;
 }
 return 0;
 }
@@ -90,7 +90,7 @@ int css_register_io_adapter(uint8_t type, uint8_t isc, bool 
swap,
 S390FLICStateClass *fsc = S390_FLIC_COMMON_GET_CLASS(fs);
 
 *id = 0;
-QTAILQ_FOREACH(adapter, &channel_subsys->io_adapters, sibling) {
+QTAILQ_FOREACH(adapter, &channel_subsys.io_adapters, sibling) {
 if ((adapter->type == type) && (adapter->isc == isc)) {
 *id = adapter->id;
 found = true;
@@ -110,7 +110,7 @@ int css_register_io_adapter(uint8_t type, uint8_t isc, bool 
swap,
 adapter->id = *id;
 adapter->isc = isc;
 adapter->type = type;
-QTAILQ_INSERT_TAIL(&channel_subsys->io_adapters, adapter, sibling);
+QTAILQ_INSERT_TAIL(&channel_subsys.io_adapters, adapter, sibling);
 } else {
 g_free(adapter);
 fprintf(stderr, "Unexpected error %d when registering adapter %d\n",
@@ -122,7 +122,7 @@ out:
 
 uint16_t css_build_subchannel_id(SubchDev *sch)
 {
-if (channel_subsys->max_cssid > 0) {
+if (channel_subsys.max_cssid > 0) {
 return (sch->cssid << 8) | (1 << 3) | (sch->ssid << 1) | 1;
 }
 return (sch->ssid << 1) | 1;
@@ -778,12 +778,12 @@ static void css_update_chnmon(SubchDev *sch)
 
 offset = sch->curr_status.pmcw.mbi << 5;
 count = address_space_lduw(&address_space_memory,
-   channel_subsys->chnmon_area + offset,
+   channel_subsys.chnmon_area + offset,
MEMTXATTRS_UNSPECIFIED,
NULL);
 count++;
 address_space_stw(&address_space_memory,
-  channel_subsys->chnmon_area + offset, count,
+  channel_subsys.chnmon_area + offset, count,
   MEMTXATTRS_UNSPECIFIED, NULL);
 }
 }
@@ -812,7 +812,7 @@ int css_do_ssch(SubchDev *sch, ORB *orb)
 }
 
 /* If monitoring is active, update counter. */
-if (channel_subsys->chnmon_active) {
+if (channel_subsys.chnmon_active) {
 css_update_chnmon(sch);
 }
 sch->channel_prog = orb->cpa;
@@ -971,16 +971,16 @@ int css_do_stcrw(CRW *crw)
 CrwContainer *crw_cont;
 int ret;
 
-crw_cont = QTAILQ_FIRST(&channel_subsys->pending_crws);
+crw_cont = QTAILQ_FIRST(&channel_subsys.pending_crws);
 if (crw_cont) {
-QTAILQ_REMOVE(&channel_subsys->pending_crws, crw_cont, sibling);
+QTAILQ_REMOVE(&channel_subsys.pending_crws, crw_cont, sibling);
 copy_crw_to_guest(crw, &crw_cont->crw);
 g_free(crw_cont);
 ret = 0;
 } else {
 /* List was empty, turn crw machine checks on again. */
 memset(crw, 0, sizeof(*crw));
-channel_subsys->do_crw_mchk = true;
+channel_subsys.do_crw_mchk = true;
 ret = 1;
 }
 
@@ -999,12 +999,12 @@ void css_undo_stcrw(CRW *crw)
 
 crw_cont = g_try_malloc0(sizeof(CrwContainer));
 if (!crw_cont) {
-channel_subsys->crws_lost = true;
+channel_subsys.crws_lost = true;
 return;
 }
 copy_crw_from_guest(&crw_cont->crw, crw);
 
-QTAILQ_INSERT_HEAD(&channel_subsys->pending_crws, crw_cont, sibling);
+QTAILQ_INSERT_HEAD(&channel_subsys.pending_crws, crw_cont, sibling);
 }
 
 int css_do_tpi(IOIntCode *int_code, int lowcore)
@@ -1022,9 +1022,9 @@ int css_collect_chp_desc(int m, uint8_t cssid, uint8_t 
f_chpid, uint8_t l_chpid,
 CssImage *css;
 
 if (!m && !cssid) {
-css = channel_subsys->css[channel_subsys->default_cssid];
+css = channel_subsys.css[channel_subsys.default_cssid];
 } else {
-css = channel_subsys->css[cssid];
+css = channel_subsys.css[cssid];
 }
 if (!css) {

[Qemu-devel] [PATCH 0/4] machine: Eliminate machine_init()/MODULE_INIT_MACHINE

2016-02-16 Thread Eduardo Habkost

There are currently three types of users of machine_init():
* type_register*() callers
* The channel_subsys initialization in hw/s390x/css.c
* qemu_add_opts() callers

This series:
* Changes type_register*() callers to use type_init()
* Changes s390x/css to simply initialize channel_subsys fields statically
* Replaces machine_init() with a new opts_init() helper, after
  all remaining machine_init() users are just qemu_add_opts()
  callers

Eduardo Habkost (4):
  machine: Use type_init() to register machine classes
  s390x/css: Allocate channel_subsys statically
  s390x/css: Use static initialization for channel_subsys fields
  module: Rename machine_init() to opts_init()

 fsdev/qemu-fsdev-opts.c |   2 +-
 hw/acpi/core.c  |   2 +-
 hw/arm/exynos4_boards.c |   2 +-
 hw/arm/gumstix.c|   2 +-
 hw/arm/highbank.c   |   2 +-
 hw/arm/nseries.c|   2 +-
 hw/arm/omap_sx1.c   |   2 +-
 hw/arm/realview.c   |   2 +-
 hw/arm/spitz.c  |   2 +-
 hw/arm/stellaris.c  |   2 +-
 hw/arm/versatilepb.c|   2 +-
 hw/arm/vexpress.c   |   2 +-
 hw/arm/virt.c   |   2 +-
 hw/lm32/lm32_boards.c   |   2 +-
 hw/mips/mips_jazz.c |   2 +-
 hw/ppc/ppc405_boards.c  |   2 +-
 hw/ppc/spapr.c  |   2 +-
 hw/s390x/css.c  | 185 +++-
 hw/smbios/smbios.c  |   2 +-
 hw/sparc/sun4m.c|   4 --
 hw/sparc64/sun4u.c  |   4 --
 hw/xtensa/xtfpga.c  |   2 +-
 include/hw/boards.h |   2 +-
 include/hw/i386/pc.h|   2 +-
 include/qemu/module.h   |   4 +-
 ui/spice-core.c |   2 +-
 ui/vnc.c|   2 +-
 vl.c|   2 +-
 28 files changed, 116 insertions(+), 129 deletions(-)

-- 
2.1.0

[Qemu-devel] [PATCH 1/4] machine: Use type_init() to register machine classes

2016-02-16 Thread Eduardo Habkost

Change all machine_init() users that simply call type_register*()
to use type_init().

Cc: Evgeny Voevodin 
Cc: Maksim Kozlov 
Cc: Igor Mitsyanko 
Cc: Dmitry Solodkiy 
Cc: Peter Maydell 
Cc: Rob Herring 
Cc: Andrzej Zaborowski 
Cc: Michael Walle 
Cc: "Hervé Poussineau" 
Cc: Aurelien Jarno 
Cc: Leon Alrae 
Cc: Alexander Graf 
Cc: David Gibson 
Cc: Blue Swirl 
Cc: Mark Cave-Ayland 
Cc: Max Filippov 
Cc: "Michael S. Tsirkin" 
Signed-off-by: Eduardo Habkost 
---
 hw/arm/exynos4_boards.c | 2 +-
 hw/arm/gumstix.c| 2 +-
 hw/arm/highbank.c   | 2 +-
 hw/arm/nseries.c| 2 +-
 hw/arm/omap_sx1.c   | 2 +-
 hw/arm/realview.c   | 2 +-
 hw/arm/spitz.c  | 2 +-
 hw/arm/stellaris.c  | 2 +-
 hw/arm/versatilepb.c| 2 +-
 hw/arm/vexpress.c   | 2 +-
 hw/arm/virt.c   | 2 +-
 hw/lm32/lm32_boards.c   | 2 +-
 hw/mips/mips_jazz.c | 2 +-
 hw/ppc/ppc405_boards.c  | 2 +-
 hw/ppc/spapr.c  | 2 +-
 hw/sparc/sun4m.c| 4 
 hw/sparc64/sun4u.c  | 4 
 hw/xtensa/xtfpga.c  | 2 +-
 include/hw/boards.h | 2 +-
 include/hw/i386/pc.h| 2 +-
 20 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/hw/arm/exynos4_boards.c b/hw/arm/exynos4_boards.c
index 42faa8c..5b11cd9 100644
--- a/hw/arm/exynos4_boards.c
+++ b/hw/arm/exynos4_boards.c
@@ -181,4 +181,4 @@ static void exynos4_machines_init(void)
 type_register_static(_type);
 }
 
-machine_init(exynos4_machines_init)
+type_init(exynos4_machines_init)
diff --git a/hw/arm/gumstix.c b/hw/arm/gumstix.c
index 626d338..d59d9ba 100644
--- a/hw/arm/gumstix.c
+++ b/hw/arm/gumstix.c
@@ -156,4 +156,4 @@ static void gumstix_machine_init(void)
 type_register_static(_type);
 }
 
-machine_init(gumstix_machine_init)
+type_init(gumstix_machine_init)
diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
index e25cf5e..e37378c 100644
--- a/hw/arm/highbank.c
+++ b/hw/arm/highbank.c
@@ -437,4 +437,4 @@ static void calxeda_machines_init(void)
 type_register_static(_type);
 }
 
-machine_init(calxeda_machines_init)
+type_init(calxeda_machines_init)
diff --git a/hw/arm/nseries.c b/hw/arm/nseries.c
index d9e61f7..9a5f33b 100644
--- a/hw/arm/nseries.c
+++ b/hw/arm/nseries.c
@@ -1450,4 +1450,4 @@ static void nseries_machine_init(void)
 type_register_static(_type);
 }
 
-machine_init(nseries_machine_init)
+type_init(nseries_machine_init)
diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
index 68236a3..cd50691 100644
--- a/hw/arm/omap_sx1.c
+++ b/hw/arm/omap_sx1.c
@@ -252,4 +252,4 @@ static void sx1_machine_init(void)
 type_register_static(_machine_v2_type);
 }
 
-machine_init(sx1_machine_init)
+type_init(sx1_machine_init)
diff --git a/hw/arm/realview.c b/hw/arm/realview.c
index 90429fc..481ae00 100644
--- a/hw/arm/realview.c
+++ b/hw/arm/realview.c
@@ -457,4 +457,4 @@ static void realview_machine_init(void)
 type_register_static(_pbx_a9_type);
 }
 
-machine_init(realview_machine_init)
+type_init(realview_machine_init)
diff --git a/hw/arm/spitz.c b/hw/arm/spitz.c
index 607cb58..c3048f3 100644
--- a/hw/arm/spitz.c
+++ b/hw/arm/spitz.c
@@ -1037,7 +1037,7 @@ static void spitz_machine_init(void)
 type_register_static(_type);
 }
 
-machine_init(spitz_machine_init)
+type_init(spitz_machine_init)
 
 static bool is_version_0(void *opaque, int version_id)
 {
diff --git a/hw/arm/stellaris.c b/hw/arm/stellaris.c
index de8dbb2..c3c72f1 100644
--- a/hw/arm/stellaris.c
+++ b/hw/arm/stellaris.c
@@ -1420,7 +1420,7 @@ static void stellaris_machine_init(void)
 type_register_static(_type);
 }
 
-machine_init(stellaris_machine_init)
+type_init(stellaris_machine_init)
 
 static void stellaris_i2c_class_init(ObjectClass *klass, void *data)
 {
diff --git a/hw/arm/versatilepb.c b/hw/arm/versatilepb.c
index d061f0f..5f7523e 100644
--- a/hw/arm/versatilepb.c
+++ b/hw/arm/versatilepb.c
@@ -419,7 +419,7 @@ static void versatile_machine_init(void)
 type_register_static(_type);
 }
 
-machine_init(versatile_machine_init)
+type_init(versatile_machine_init)
 
 static void vpb_sic_class_init(ObjectClass *klass, void *data)
 {
diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index 3154aea..9eca64c 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -798,4 +798,4 @@ static void vexpress_machine_init(void)
 type_register_static(_a15_info);
 }
 
-machine_init(vexpress_machine_init);
+type_init(vexpress_machine_init);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 44bbbea..69eef0b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1315,4 +1315,4 @@ static void machvirt_machine_init(void)
 type_register_static(_info);
 }

Re: [Qemu-devel] [PATCH v2 7/9] i.MX: Add i.MX6 SOC implementation.

2016-02-16 Thread Jean-Christophe DUBOIS


Le 16/02/2016 16:31, Peter Maydell a écrit :

On 8 February 2016 at 22:08, Jean-Christophe Dubois  
wrote:

For now we only support the following devices:
* up to 4 Cortex A9 cores
* A9 MPCORE (SCU, GIC, TWD)
* 5 i.MX UARTs
* 2 EPIT timers
* 1 GPT timer
* 3 I2C controllers
* 7 GPIO controllers
* 6 SDHC controllers
* 1 CCM device
* 1 SRC device
* various ROM/RAM areas.

Signed-off-by: Jean-Christophe Dubois 
---

Changes since V1:
  * use g_strdup_printf/g_free instead of local char array.
  * output a message on exit for unsupported number of cores.

  default-configs/arm-softmmu.mak |   1 +
  hw/arm/Makefile.objs|   1 +
  hw/arm/fsl-imx6.c   | 407 
  include/hw/arm/fsl-imx6.h   | 447 
  4 files changed, 856 insertions(+)
  create mode 100644 hw/arm/fsl-imx6.c
  create mode 100644 include/hw/arm/fsl-imx6.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index d9b90a5..ba3a380 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -99,6 +99,7 @@ CONFIG_ALLWINNER_A10_PIT=y
  CONFIG_ALLWINNER_A10_PIC=y
  CONFIG_ALLWINNER_A10=y

+CONFIG_FSL_IMX6=y
  CONFIG_FSL_IMX31=y
  CONFIG_FSL_IMX25=y

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 2195b60..ac383df 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -15,3 +15,4 @@ obj-$(CONFIG_STM32F205_SOC) += stm32f205_soc.o
  obj-$(CONFIG_XLNX_ZYNQMP) += xlnx-zynqmp.o xlnx-ep108.o
  obj-$(CONFIG_FSL_IMX25) += fsl-imx25.o imx25_pdk.o
  obj-$(CONFIG_FSL_IMX31) += fsl-imx31.o kzm.o
+obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o
diff --git a/hw/arm/fsl-imx6.c b/hw/arm/fsl-imx6.c
new file mode 100644
index 000..0faae27
--- /dev/null
+++ b/hw/arm/fsl-imx6.c
@@ -0,0 +1,407 @@
+/*
+ * Copyright (c) 2015 Jean-Christophe Dubois 
+ *
+ * i.MX6 SOC emulation.
+ *
+ * Based on hw/arm/fsl-imx31.c
+ *
+ *  This program is free software; you can redistribute it and/or modify it
+ *  under the terms of the GNU General Public License as published by the
+ *  Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful, but WITHOUT
+ *  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ *  FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ *  for more details.
+ *
+ *  You should have received a copy of the GNU General Public License along
+ *  with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "hw/arm/fsl-imx6.h"

Include "qemu/osdep.h" first, please (see comments on patch 8).


+#include "sysemu/sysemu.h"
+#include "exec/address-spaces.h"
+#include "hw/boards.h"
+#include "sysemu/char.h"
+#include "qemu/error-report.h"
+static void fsl_imx6_realize(DeviceState *dev, Error **errp)
+{
+FslIMX6State *s = FSL_IMX6(dev);
+uint16_t i;
+Error *err = NULL;
+
+for (i = 0; i < smp_cpus; i++) {
+
+if (smp_cpus == 1) {
+/* On uniprocessor, the CBAR is set to 0 */
+object_property_set_int(OBJECT(&s->cpu[i]), 0,
+"reset-cbar", &error_abort);

0 is the default for this property so you don't really need to set this.


+} else {
+object_property_set_int(OBJECT(&s->cpu[i]), FSL_IMX6_A9MPCORE_ADDR,
+"reset-cbar", &error_abort);
+}
+
+/* All CPU but CPU 0 start in power off mode */
+if (i) {
+object_property_set_bool(OBJECT(&s->cpu[i]), true,
+ "start-powered-off", &error_abort);
+}
+
+object_property_set_bool(OBJECT(&s->cpu[i]), false,
+ "has_el3", &error_abort);

Do the CPUs in this board really not have EL3 ?


Well, the Cortex A9 is certainly able to get to the secure mode. However, 
if I enable it (has_el3 set to true), the OS (Linux or Xvisor) will not 
boot. Disabling it allows both OSes to boot on the emulated i.MX6.


Would you have some idea on the reason for this "hang" during the boot 
when EL3 is enabled?


JC




Otherwise this looks OK.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 0/4] Error location reporting fixes

2016-02-16 Thread Markus Armbruster

Eduardo Habkost  writes:

> This fixes the following bugs in error reporting:
>
>   $ qemu-system-x86_64 -icount rr=x -vnc :0
>   qemu-system-x86_64: -vnc :0: Invalid icount rr option: x
>
>   $ qemu-system-x86_64 -m size= -vnc :0
>   qemu-system-x86_64: -vnc :0: missing 'size' option value
>
> The last command-line option (-vnc) is being shown in the error
> message, instead of the -m or -icount options.
>
> This also includes a patch submitted previously by Marcel, to
> ensure there are no ordering conflicts when applying the patches.
> Marcel's patch fixes the following bug:
>
>   $ qemu-system-x86_64 -M q35-1.5 -redir tcp:8022::22
>   qemu-system-x86_64: -redir tcp:8022::22: unsupported machine type
>   Use -machine help to list supported machines

Applied to error-next, thanks!

Re: [Qemu-devel] [PATCH 5/4] vl: Clean up machine selection in main().

2016-02-16 Thread Markus Armbruster

Laszlo Ersek  writes:

> On 02/16/16 15:57, Markus Armbruster wrote:
>> We set machine_class to the default first, and update it to the real
>> one later.  Any use of machine_class in between is almost certainly
>> wrong.  Set it once and for all instead.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  vl.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>> 
>> diff --git a/vl.c b/vl.c
>> index 7918e9f..098728c 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2748,8 +2748,9 @@ static const QEMUOption *lookup_opt(int argc, char 
>> **argv,
>>  return popt;
>>  }
>>  
>> -static void set_machine_options(MachineClass **machine_class)
>> +static MachineClass *select_machine(MachineClass *dflt)
>>  {
>> +MachineClass *machine_class = dflt;
>>  const char *optarg;
>>  QemuOpts *opts;
>>  Location loc;
>> @@ -2761,16 +2762,17 @@ static void set_machine_options(MachineClass 
>> **machine_class)
>>  
>>  optarg = qemu_opt_get(opts, "type");
>>  if (optarg) {
>> -*machine_class = machine_parse(optarg);
>> +machine_class = machine_parse(optarg);
>>  }
>>  
>> -if (*machine_class == NULL) {
>> +if (!machine_class) {
>>  error_report("No machine specified, and there is no default");
>>  error_printf("Use -machine help to list supported machines\n");
>>  exit(1);
>>  }
>>  
>>  loc_pop();
>> +return machine_class;
>>  }
>>  
>>  static int machine_set_property(void *opaque,
>> @@ -3075,7 +3077,6 @@ int main(int argc, char **argv, char **envp)
>>  os_setup_early_signal_handling();
>>  
>>  module_call_init(MODULE_INIT_MACHINE);
>> -machine_class = find_default_machine();
>>  cpu_model = NULL;
>>  snapshot = 0;
>>  cyls = heads = secs = 0;
>> @@ -4066,7 +4067,7 @@ int main(int argc, char **argv, char **envp)
>>  
>>  replay_configure(icount_opts);
>>  
>> -set_machine_options(&machine_class);
>> +machine_class = select_machine(find_default_machine());
>>  
>>  set_memory_options(&ram_slots, &maxram_size, machine_class);
>>  
>> 
>
> Sorry for not being more responsive in this thread. I read through the
> patches now (including this one), and they look good to me.
>
> I have one suggestion for the commit message of this patch, after
> checking "vl.c" (and keeping the earlier patches of the series in mind):
> after the statement
>
>   Any use of machine_class in between is almost certainly wrong
>
> can you please observe
>
>   (there are no such uses right now)
>
> ?

Done.

> series
> Reviewed-by: Laszlo Ersek 

Thanks!

Re: [Qemu-devel] [PATCH 5/4] vl: Clean up machine selection in main().

2016-02-16 Thread Markus Armbruster

Marcel Apfelbaum  writes:

> On 02/16/2016 04:57 PM, Markus Armbruster wrote:
>> We set machine_class to the default first, and update it to the real
>> one later.  Any use of machine_class in between is almost certainly
>> wrong.  Set it once and for all instead.
>>
>> Signed-off-by: Markus Armbruster 
>> ---
>>   vl.c | 11 ++-
>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/vl.c b/vl.c
>> index 7918e9f..098728c 100644
>> --- a/vl.c
>> +++ b/vl.c
>> @@ -2748,8 +2748,9 @@ static const QEMUOption *lookup_opt(int argc, char 
>> **argv,
>>   return popt;
>>   }
>>
>> -static void set_machine_options(MachineClass **machine_class)
>> +static MachineClass *select_machine(MachineClass *dflt)
>
> Hi Markus,
>
> I am no fan of "dflt" naming, but I can live with it.
>
>>   {
>> +MachineClass *machine_class = dflt;
>>   const char *optarg;
>>   QemuOpts *opts;
>>   Location loc;
>> @@ -2761,16 +2762,17 @@ static void set_machine_options(MachineClass 
>> **machine_class)
>>
>>   optarg = qemu_opt_get(opts, "type");
>>   if (optarg) {
>> -*machine_class = machine_parse(optarg);
>> +machine_class = machine_parse(optarg);
>>   }
>>
>> -if (*machine_class == NULL) {
>> +if (!machine_class) {
>>   error_report("No machine specified, and there is no default");
>>   error_printf("Use -machine help to list supported machines\n");
>>   exit(1);
>>   }
>>
>>   loc_pop();
>> +return machine_class;
>>   }
>>
>>   static int machine_set_property(void *opaque,
>> @@ -3075,7 +3077,6 @@ int main(int argc, char **argv, char **envp)
>>   os_setup_early_signal_handling();
>>
>>   module_call_init(MODULE_INIT_MACHINE);
>> -machine_class = find_default_machine();
>>   cpu_model = NULL;
>>   snapshot = 0;
>>   cyls = heads = secs = 0;
>> @@ -4066,7 +4067,7 @@ int main(int argc, char **argv, char **envp)
>>
>>   replay_configure(icount_opts);
>>
>> -set_machine_options(&machine_class);
>> +machine_class = select_machine(find_default_machine());
>
> I like the approach "going all the way", I would
> even hide the call to find_default_machine inside select_machine.

Good idea.  Squashing in:

diff --git a/vl.c b/vl.c
index 098728c..c0b6747 100644
--- a/vl.c
+++ b/vl.c
@@ -2748,9 +2748,9 @@ static const QEMUOption *lookup_opt(int argc, char **argv,
 return popt;
 }
 
-static MachineClass *select_machine(MachineClass *dflt)
+static MachineClass *select_machine(void)
 {
-MachineClass *machine_class = dflt;
+MachineClass *machine_class = find_default_machine();
 const char *optarg;
 QemuOpts *opts;
 Location loc;
@@ -4067,7 +4067,7 @@ int main(int argc, char **argv, char **envp)
 
 replay_configure(icount_opts);
 
-machine_class = select_machine(find_default_machine());
+machine_class = select_machine();
 
 set_memory_options(&ram_slots, &maxram_size, machine_class);
 
>
> Reviewed-by: Marcel Apfelbaum 

Thanks!
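
The shape of the refactoring discussed in this thread, returning the selected class instead of updating it through an out-parameter, and looking up the default inside the selector, can be sketched in isolation. All names below are hypothetical stand-ins, not the vl.c code; in particular the real error path reports and exits, whereas this sketch simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct MachineSketch {
    const char *name;
    int is_default;
} MachineSketch;

/* Hypothetical machine table for the example. */
static MachineSketch machines[] = {
    { "alpha", 0 },
    { "beta",  1 },
};

#define N_MACHINES (sizeof(machines) / sizeof(machines[0]))

/* Return the machine flagged as default, or NULL if there is none. */
static MachineSketch *find_default_sketch(void)
{
    for (size_t i = 0; i < N_MACHINES; i++) {
        if (machines[i].is_default) {
            return &machines[i];
        }
    }
    return NULL;
}

/* Return the machine called 'name', or NULL if it is unknown. */
static MachineSketch *parse_sketch(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_MACHINES; i++) {
        if (strcmp(machines[i].name, name) == 0) {
            return &machines[i];
        }
    }
    return NULL;
}

/* Decide once: start from the default and let an explicit 'type' option
 * override it. The caller assigns the result exactly one time. */
static MachineSketch *select_machine_sketch(const char *type_opt)
{
    MachineSketch *mc = find_default_sketch();

    if (type_opt) {
        mc = parse_sketch(type_opt);
    }
    return mc;
}
```

Because the caller's variable is written only once, there is no interval in which it holds a provisional default that later code could pick up by mistake.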

Re: [Qemu-devel] [PATCH v10 10/13] qapi: Don't box struct branch of alternate

2016-02-16 Thread Eric Blake

On 02/16/2016 12:07 PM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> There's no reason to do two malloc's for an alternate type visiting
>> a QAPI struct; let's just inline the struct directly as the C union
>> branch of the struct.
>>
>> Surprisingly, no clients were actually using the struct member prior
>> to this patch; some testsuite coverage is added to avoid future
>> regressions.
>>
>> Ultimately, we want to do the same treatment for QAPI unions, but
>> as that touches a lot more client code, it is better as a separate
>> patch.  So in the meantime, I had to hack in a way to test if we
>> are visiting an alternate type, within qapi-types:gen_variants();
>> the hack is possible because an earlier patch guaranteed that all
>> alternates have at least two branches, with at most one object
>> branch; the hack will go away in a later patch.
> 
> Suggest:
> 
>   Ultimately, we want to do the same treatment for QAPI unions, but as
>   that touches a lot more client code, it is better as a separate patch.
>   The two share gen_variants(), and I had to hack in a way to test if we
>   are visiting an alternate type: look for a non-object branch.  This
>   works because alternates have at least two branches, with at most one
>   object branch, and unions have only object branches.  The hack will go
>   away in a later patch.

Nicer.

> 
>> The generated code difference to qapi-types.h is relatively small,
>> made possible by a new c_type(is_member) parameter in qapi.py:
> 
> Let's drop the "made possible" clause here.

I was trying to document the new is_member parameter somewhere in the
commit message, but I can agree that it could be its own paragraph
rather than a clause here.

> 
> This is in addition to visit_type_BlockdevOptions(), so we need another
> name.
> 
> I can't quite see how the function is tied to alternates, though.
> 

I'm open to naming suggestions.  Also, I noticed that we have
'visit_type_FOO_fields' and 'visit_type_implicit_FOO'; so I didn't know
whether to name it 'visit_type_alternate_FOO' or
'visit_type_FOO_alternate'; it gets more interesting later in the series
when I completely delete 'visit_type_implicit_FOO'.

>>
>> and use it like this:
>>
>> | switch ((*obj)->type) {
>> | case QTYPE_QDICT:
>> |-visit_type_BlockdevOptions(v, name, &(*obj)->u.definition, &err);
>> |+visit_type_alternate_BlockdevOptions(v, name, &(*obj)->u.definition, &err);
> 
> Let's compare the two functions.  First visit_type_BlockdevOptions():
> 

> 
> Now visit_type_alternate_BlockdevOptions(), with differences annotated
> with //:
> 
> static void visit_type_alternate_BlockdevOptions(Visitor *v,
> const char *name,
> BlockdevOptions *obj, // one * less

Yep, because we no longer need to malloc a second object, so we no
longer need to propagate a change to obj back to the caller.

> Error **errp)
> {
> Error *err = NULL;
> 
> visit_start_struct(v, name, NULL, 0, &err); // NULL instead of the object
> // pointer suppresses malloc
> if (err) {
> goto out;
> }
> // null check dropped (obj can't be null)
> visit_type_BlockdevOptions_fields(v, obj, &err);

Also, here we pass 'obj'; visit_type_FOO() had to pass '*obj' (again,
because we have one less level of indirection, and 7/13 reduced the
indirection required in visit_type_FOO_fields()).

> // visit_start_union() + switch dropped
> error_propagate(errp, err);
> err = NULL;
> visit_end_struct(v, &err);
> out:
> error_propagate(errp, err);
> }
> 
> Why can we drop visit_start_union() + switch?

visit_start_union() is dropped because its only purpose was to determine
if the dealloc visitor needs to visit the default branch. When we had a
separate allocation, we did not want to visit the branch if the
discriminator was not successfully parsed, because otherwise we would
dereference NULL.  But now that we don't have a second pointer
allocation, we no longer have anything to dealloc, and we can no longer
dereference NULL. Explained better in 12/13, where I delete
visit_start_union() altogether.  But maybe I could keep it in this patch
in the meantime, to minimize confusion.

The dropped switch, on the other hand, looks to be a genuine bug.  Eeek.
That should absolutely be present, and it proves that our testsuite is
not strong enough, since it did not catch me on it.

And now that you've made me think about it, maybe I have yet another
idea.  Right now, we've split the visit of local members into
visit_type_FOO_fields(), while leaving the variant members to be visited
in visit_type_FOO()

visit_type_FOO_fields() is static, so we can change it without impacting
the entire tree; I could add a bool parameter to that function, and write:

visit_type_FOO() {
  visit_start_struct(obj)

Re: [Qemu-devel] [PATCH v2] qemu-options.hx: Improve documentation of chardev multiplexing mode

2016-02-16 Thread Kashyap Chamarthy

On Tue, Feb 16, 2016 at 05:28:58PM +, Peter Maydell wrote:
> The current documentation of chardev mux=on is rather brief and opaque;
> expand it to hopefully be a bit more helpful.
> 
> Signed-off-by: Peter Maydell 
> ---
> There was some discussion on #qemu yesterday evening about multiplexing,
> and "make the docs a bit less confusing" was one suggestion...
> 
> v1->v2 changes:
>  * include examples of the multiplexer use
>  * mention that some other command options implicitly create a mux
>  * link to the documentation of the mux's escape keys
>  * fix up the documentation of mux escape keys so it can actually
>be linked to
>  * drop the not-implemented "Ctrl-a ?" from the docs
>  * improve the documentation of the mux keys a bit (in particular
>mentioning -echr, and being more generic than just "console/monitor")
> 
> Our doc structure overall is pretty busted (why is all the documentation
> of generic stuff like -chardev lurking in "PC system emulation", for
> instance), so this is about as far as I want to go in cleaning up
> for now...
> 
>  qemu-doc.texi   | 30 --
>  qemu-options.hx | 45 +++--
>  2 files changed, 63 insertions(+), 12 deletions(-)
> 
> diff --git a/qemu-doc.texi b/qemu-doc.texi
> index c324da8..bc9dd13 100644
> --- a/qemu-doc.texi
> +++ b/qemu-doc.texi
> @@ -158,7 +158,8 @@ TODO (no longer available)
>  * pcsys_introduction:: Introduction
>  * pcsys_quickstart::   Quick Start
>  * sec_invocation:: Invocation
> -* pcsys_keys:: Keys
> +* pcsys_keys:: Keys in the graphical frontends
> +* mux_keys::   Keys in the character backend multiplexer
>  * pcsys_monitor::  QEMU Monitor
>  * disk_images::Disk Images
>  * pcsys_network::  Network emulation
> @@ -272,7 +273,7 @@ targets do not need a disk image.
>  @c man end
>  
>  @node pcsys_keys
> -@section Keys
> +@section Keys in the graphical frontends
>  
>  @c man begin OPTIONS
>  
> @@ -322,15 +323,23 @@ Toggle mouse and keyboard grab.
>  In the virtual consoles, you can use @key{Ctrl-Up}, @key{Ctrl-Down},
>  @key{Ctrl-PageUp} and @key{Ctrl-PageDown} to move in the back log.
>  
> -@kindex Ctrl-a h
> -During emulation, if you are using the @option{-nographic} option, use
> -@key{Ctrl-a h} to get terminal commands:
> +@c man end
> +
> +@node mux_keys
> +@section Keys in the character backend multiplexer
> +
> +@c man begin OPTIONS
> +
> +During emulation, if you are using a character backend multiplexer
> +(which is the default if you are using @option{-nographic}) then
> +several commands are available via an escape sequence. These
> +key sequences all start with an escape character, which is @key{Ctrl-a}
> +by default, but can be changed with @option{-echr}. The list below assumes
> +you're using the default.
>  
>  @table @key
>  @item Ctrl-a h
>  @kindex Ctrl-a h
> -@item Ctrl-a ?
> -@kindex Ctrl-a ?
>  Print this help
>  @item Ctrl-a x
>  @kindex Ctrl-a x
> @@ -346,10 +355,11 @@ Toggle console timestamps
>  Send break (magic sysrq in Linux)
>  @item Ctrl-a c
>  @kindex Ctrl-a c
> -Switch between console and monitor
> +Rotate between the frontends connected to the multiplexer (usually
> +this switches between the monitor and the console)
>  @item Ctrl-a Ctrl-a
> -@kindex Ctrl-a a
> -Send Ctrl-a
> +@kindex Ctrl-a Ctrl-a
> +Send the escape character to the frontend
>  @end table
>  @c man end
>  
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 2f0465e..7e6762e 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2162,8 +2162,49 @@ All devices must have an id, which can be any string 
> up to 127 characters long.
>  It is used to uniquely identify this device in other command line directives.
>  
>  A character device may be used in multiplexing mode by multiple front-ends.
> -The key sequence of @key{Control-a} and @key{c} will rotate the input focus
> -between attached front-ends. Specify @option{mux=on} to enable this mode.
> +Specify @option{mux=on} to enable this mode.
> +A multiplexer is a "1:N" device, and here the "1" end is your specified 
> chardev
> +backend, and the "N" end is the various parts of QEMU that can talk to a 
> chardev.
> +If you create a chardev with @option{id=myid} and @option{mux=on}, QEMU will
> +create a multiplexer with your specified ID, and you can then configure 
> multiple
> +front ends to use that chardev ID for their input/output. Up to four 
> different
> +front ends can be connected to a single multiplexed chardev. (Without
> +multiplexing enabled, a chardev can only be used by a single front end.)
> +For instance you could use this to allow a single stdio chardev to be used by
> +two serial ports and the QEMU monitor:
> +
> +@example
> +-chardev stdio,mux=on,id=char0 \
> +-mon chardev=char0,mode=readline,default \
> +-serial chardev:char0 \
> +-serial chardev:char0
> +@end example
> +
> +You can have more than one multiplexer in a

[Qemu-devel] [Bug 1490611] Re: Using qemu >=2.2.1 to convert raw->VHD (fixed) adds extra padding to the result file, which Microsoft Azure rejects as invalid

2016-02-16 Thread Jeff Cody

First, I'd say that if you are converting an image over to use on
Hyper-V, you would probably be better served using the VHDX format. It is
the newer format (completely different from VHD), and is supported by
QEMU as well.  It is better defined and more consistent (at least so far)
in its specification.

That said, I think for the specific VHD problem we could look at the
Creator field in the image.  My only reservations on that are:

1.) I haven't looked at the Creator field comprehensively across all
revisions of Hyper-V and Virtual PC.  But in my small sample size, it
seems feasible.

2.) It most likely won't be 100%, because of edge cases (e.g. I don't
know what happens when Hyper-V opens a Virtual-PC produced VHD file, and
under what circumstances it may or may not alter the Creator field)

But the above two reservations can be overcome with the appropriate
options that can be passed to the VHD format, to override the auto-
detection method.

I have access to both Virtual PC and Hyper-V, so I can put together a
small patch series tomorrow to try that out.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1490611

Title:
  Using qemu >=2.2.1 to convert raw->VHD (fixed) adds extra padding to
  the result file, which Microsoft Azure rejects as invalid

Status in QEMU:
  Fix Released

Bug description:
  Starting with a raw disk image, using "qemu-img convert" to convert
  from raw to VHD results in the output VHD file's virtual size being
  aligned to the nearest 516096 bytes (16 heads x 63 sectors per head x
  512 bytes per sector), instead of preserving the input file's size as
  the output VHD's virtual disk size.

  Microsoft Azure requires that disk images (VHDs) submitted for upload
  have virtual sizes aligned to a megabyte boundary. (Ex. 4096MB,
  4097MB, 4098MB, etc. are OK, 4096.5MB is rejected with an error.) This
  is reflected in Microsoft's documentation: https://azure.microsoft.com
  /en-us/documentation/articles/virtual-machines-linux-create-upload-
  vhd-generic/

  This is reproducible with the following set of commands (including the
  Azure command line tools from https://github.com/Azure/azure-xplat-
  cli). For the following example, I used qemu version 2.2.1:

  $ dd if=/dev/zero of=source-disk.img bs=1M count=4096

  $ stat source-disk.img 
File: ‘source-disk.img’
Size: 4294967296  Blocks: 798656 IO Block: 4096   regular file
  Device: fc01h/64513dInode: 13247963Links: 1
  Access: (0644/-rw-r--r--)  Uid: ( 1000/  smkent)   Gid: ( 1000/  smkent)
  Access: 2015-08-18 09:48:02.613988480 -0700
  Modify: 2015-08-18 09:48:02.825985646 -0700
  Change: 2015-08-18 09:48:02.825985646 -0700
   Birth: -

  $ qemu-img convert -f raw -o subformat=fixed -O vpc source-disk.img
  dest-disk.vhd

  $ stat dest-disk.vhd 
File: ‘dest-disk.vhd’
Size: 4296499712  Blocks: 535216 IO Block: 4096   regular file
  Device: fc01h/64513dInode: 13247964Links: 1
  Access: (0644/-rw-r--r--)  Uid: ( 1000/  smkent)   Gid: ( 1000/  smkent)
  Access: 2015-08-18 09:50:22.252077624 -0700
  Modify: 2015-08-18 09:49:24.424868868 -0700
  Change: 2015-08-18 09:49:24.424868868 -0700
   Birth: -

  $ azure vm image create testimage1 dest-disk.vhd -o linux -l "West US"
  info:Executing command vm image create
  + Retrieving storage accounts 
 
  info:VHD size : 4097 MB
  info:Uploading 4195800.5 KB
  Requested:100.0% Completed:100.0% Running:   0 Time: 1m 0s Speed:  6744 KB/s 
  info:https://[redacted].blob.core.windows.net/vm-images/dest-disk.vhd was uploaded successfully
  error:   The VHD https://[redacted].blob.core.windows.net/vm-images/dest-disk.vhd has an unsupported virtual size of 4296499200 bytes.  The size must be a whole number (in MBs).
  info:Error information has been recorded to /home/smkent/.azure/azure.err
  error:   vm image create command failed

  I also ran the above commands using qemu 2.4.0, which resulted in the
  same error as the conversion behavior is the same.

  However, qemu 2.1.1 and earlier (including qemu 2.0.0 installed by
  Ubuntu 14.04) does not pad the virtual disk size during conversion.
  Using qemu-img convert from qemu versions <=2.1.1 results in a VHD
  that is exactly the size of the raw input file plus 512 bytes (for the
  VHD footer). Those qemu versions do not attempt to realign the disk.
  As a result, Azure accepts VHD files created using those versions of
  qemu-img convert for upload.
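
  The arithmetic behind the rejection can be checked with a short C
  sketch. It only tests the two alignments mentioned in this report (a
  whole number of 16 x 63 x 512-byte cylinders, and a whole number of
  megabytes); it does not reproduce how qemu-img actually picks the
  geometry. The helper names are hypothetical and exist only for this
  illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* One CHS cylinder as described in the report:
 * 16 heads x 63 sectors per head x 512 bytes per sector = 516096 bytes. */
#define CHS_CYLINDER_BYTES (16ULL * 63ULL * 512ULL)
#define MIB_BYTES          (1024ULL * 1024ULL)

/* Hypothetical helper: true if the size is a whole number of megabytes,
 * which is the property the Azure upload check asks for. */
static bool is_mib_aligned(uint64_t virtual_size)
{
    return virtual_size % MIB_BYTES == 0;
}

/* Hypothetical helper: true if the size is a whole number of cylinders,
 * which is the property the padded VHD ends up with. */
static bool is_cylinder_aligned(uint64_t virtual_size)
{
    return virtual_size % CHS_CYLINDER_BYTES == 0;
}
```

  With the sizes from this report, the 4294967296-byte raw image is
  megabyte-aligned but not cylinder-aligned, while the 4296499200-byte
  virtual size reported by Azure is cylinder-aligned but not
  megabyte-aligned.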

  Is there a reason why newer qemu realigns the converted VHD file? It
  would be useful if an option were added to disable this feature, as
  current versions of qemu cannot be used to create VHD files for Azure
  using Microsoft's official instructions.


Re: [Qemu-devel] [PATCH v1 8/9] target-arm: A64: Create Instruction Syndromes for Data Aborts

2016-02-16 Thread Peter Maydell

On 12 February 2016 at 14:34, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Add support for generating the instruction syndrome for Data Aborts.
> These syndromes are used by hypervisors for example to trap and emulate
> memory accesses.
>
> We save the decoded data out-of-band with the TBs at translation time.
> When exceptions hit, the extra data attached to the TB is used to
> recreate the state needed to encode instruction syndromes.
> This avoids the need to emit moves with every load/store.

I think this patch also would be simpler if the encoded info
put in with the TBs was just the syndrome register, rather
than some other encoding.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v1 7/9] target-arm: Add the ARMInsnSyndrome type

2016-02-16 Thread Peter Maydell

On 12 February 2016 at 14:34, Edgar E. Iglesias
 wrote:
> From: "Edgar E. Iglesias" 
>
> Add the ARMInsnSyndrome type including helper functions to
> encode and decode it into an u32. This is in preparation for
> Instruction Syndrome generation for Data Aborts.
>
> No functional change.

I find this patch confusing -- syndromes are already 32 bits,
so why is the encoding of the syndrome information into 32 bits
not just the syndrome register format ?

thanks
-- PMM

Re: [Qemu-devel] [PATCH v10 10/13] qapi: Don't box struct branch of alternate

2016-02-16 Thread Markus Armbruster

Eric Blake  writes:

> There's no reason to do two malloc's for an alternate type visiting
> a QAPI struct; let's just inline the struct directly as the C union
> branch of the struct.
>
> Surprisingly, no clients were actually using the struct member prior
> to this patch; some testsuite coverage is added to avoid future
> regressions.
>
> Ultimately, we want to do the same treatment for QAPI unions, but
> as that touches a lot more client code, it is better as a separate
> patch.  So in the meantime, I had to hack in a way to test if we
> are visiting an alternate type, within qapi-types:gen_variants();
> the hack is possible because an earlier patch guaranteed that all
> alternates have at least two branches, with at most one object
> branch; the hack will go away in a later patch.

Suggest:

  Ultimately, we want to do the same treatment for QAPI unions, but as
  that touches a lot more client code, it is better as a separate patch.
  The two share gen_variants(), and I had to hack in a way to test if we
  are visiting an alternate type: look for a non-object branch.  This
  works because alternates have at least two branches, with at most one
  object branch, and unions have only object branches.  The hack will go
  away in a later patch.

> The generated code difference to qapi-types.h is relatively small,
> made possible by a new c_type(is_member) parameter in qapi.py:

Let's drop the "made possible" clause here.

>
> | struct BlockdevRef {
> | QType type;
> | union { /* union tag is @type */
> | void *data;
> |-BlockdevOptions *definition;
> |+BlockdevOptions definition;
> | char *reference;
> | } u;
> | };
>
> meanwhile, in qapi-visit.h, we create a new visit_type_alternate_Foo(),
> comparable to visit_type_implicit_Foo():
>
> |+static void visit_type_alternate_BlockdevOptions(Visitor *v, const char 
> *name, BlockdevOptions *obj, Error **errp)
> |+{
> |+Error *err = NULL;
> |+
> |+visit_start_struct(v, name, NULL, 0, );
> |+if (err) {
> |+goto out;
> |+}
> |+visit_type_BlockdevOptions_fields(v, obj, );
> |+error_propagate(errp, err);
> |+err = NULL;
> |+visit_end_struct(v, );
> |+out:
> |+error_propagate(errp, err);
> |+}

This is in addition to visit_type_BlockdevOptions(), so we need another
name.

I can't quite see how the function is tied to alternates, though.

>
> and use it like this:
>
> | switch ((*obj)->type) {
> | case QTYPE_QDICT:
> |-visit_type_BlockdevOptions(v, name, &(*obj)->u.definition, &err);
> |+visit_type_alternate_BlockdevOptions(v, name, &(*obj)->u.definition, &err);

Let's compare the two functions.  First visit_type_BlockdevOptions():

void visit_type_BlockdevOptions(Visitor *v,
                                const char *name,
                                BlockdevOptions **obj,
                                Error **errp)
{
    Error *err = NULL;

    visit_start_struct(v, name, (void **)obj, sizeof(BlockdevOptions),
                       &err);
    if (err) {
        goto out;
    }
    if (!*obj) {
        goto out_obj;
    }
    visit_type_BlockdevOptions_fields(v, *obj, &err);
    if (err) {
        goto out_obj;
    }
    if (!visit_start_union(v, !!(*obj)->u.data, &err) || err) {
        goto out_obj;
    }
    switch ((*obj)->driver) {
    case BLOCKDEV_DRIVER_ARCHIPELAGO:
        visit_type_implicit_BlockdevOptionsArchipelago(v,
            &(*obj)->u.archipelago, &err);
        break;
    [All the other cases...]
    default:
        abort();
    }
out_obj:
    error_propagate(errp, err);
    err = NULL;
    visit_end_struct(v, &err);
out:
    error_propagate(errp, err);
}

Now visit_type_alternate_BlockdevOptions(), with differences annotated
with //:

static void visit_type_alternate_BlockdevOptions(Visitor *v,
                                                 const char *name,
                                                 BlockdevOptions *obj, // one * less
                                                 Error **errp)
{
    Error *err = NULL;

    visit_start_struct(v, name, NULL, 0, &err); // NULL instead of the object
                                                // pointer suppresses malloc
    if (err) {
        goto out;
    }
    // null check dropped (obj can't be null)
    visit_type_BlockdevOptions_fields(v, obj, &err);
    // visit_start_union() + switch dropped
    error_propagate(errp, err);
    err = NULL;
    visit_end_struct(v, &err);
out:
    error_propagate(errp, err);
}

Why can we drop visit_start_union() + switch?

> | break;
> | case QTYPE_QSTRING:
> | visit_type_str(v, name, &(*obj)->u.reference, &err);
>
> Signed-off-by: Eric Blake 
>
> ---
> v10: new patch
> ---
>  scripts/qapi-types.py   | 10 ++-
>  scripts/qapi-visit.py   | 49 
>

Re: [Qemu-devel] [PATCH 06/17] qcow2-dirty-bitmap: add qcow2_dirty_bitmap_load()

2016-02-16 Thread Vladimir Sementsov-Ogievskiy


On 07.10.2015 02:01, John Snow wrote:


On 09/05/2015 12:43 PM, Vladimir Sementsov-Ogievskiy wrote:

This function loads block dirty bitmap from qcow2.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-dirty-bitmap.c | 155 +
  block/qcow2.c  |   2 +
  block/qcow2.h  |   5 ++
  include/block/block_int.h  |   5 ++
  4 files changed, 167 insertions(+)

diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
index 1260d1d..ea50137 100644
--- a/block/qcow2-dirty-bitmap.c
+++ b/block/qcow2-dirty-bitmap.c
@@ -99,6 +99,13 @@ static int check_constraints(int cluster_size,
  return fail ? -EINVAL : 0;
  }
  
+static QCowDirtyBitmapHeader *bitmap_header(BDRVQcowState *s,

+QCowDirtyBitmap *bitmap)
+{

BDRVQcow2State here and everywhere below, again.


+return (QCowDirtyBitmapHeader *)
+   (s->dirty_bitmap_directory + bitmap->offset);
+}
+
  static int directory_read(BlockDriverState *bs)
  {
  int ret;
@@ -195,3 +202,151 @@ out:
  }
  return ret;
  }
+
+static QCowDirtyBitmap *find_dirty_bitmap_by_name(BlockDriverState *bs,
+  const char *name)
+{
+BDRVQcowState *s = bs->opaque;
+QCowDirtyBitmap *bm, *end = s->dirty_bitmaps + s->nb_dirty_bitmaps;
+
+for (bm = s->dirty_bitmaps; bm < end; ++bm) {
+if (strcmp(bm->name, name) == 0) {
+return bm;
+}
+}
+
+return NULL;
+}
+

Whoops. This says to me we really need to prohibit bitmaps with the same
name from being stored in the same file, and mention this in the spec,
and test for it on load.

Perhaps we can create a hash-table and fail verification on open if
there's a collision. We can then use that hash-table here for
find_dirty_bitmap_by_name to speed up lookup since we already went
through the trouble of loading it.

Might help for large cases where we're approaching 64K bitmaps, will not
be too big of a performance hit for casual use.


So, it (hash table approach) may be implemented later




+/* dirty sectors in cluster is a number of sectors in the image, corresponding
+ * to one cluster of bitmap data */
+static uint64_t dirty_sectors_in_cluster(const BDRVQcowState *s,
+ const BdrvDirtyBitmap *bitmap)
+{
+uint32_t sector_granularity =
+bdrv_dirty_bitmap_granularity(bitmap) >> BDRV_SECTOR_BITS;
+
+return (uint64_t)sector_granularity * (s->cluster_size << 3);
+}
+
+/* load_bitmap()
+ * load dirty bitmap from Dirty Bitmap Table
+ * Dirty Bitmap Table entries are assumed to be in big endian format */
+static int load_bitmap(BlockDriverState *bs,
+   const uint64_t *dirty_bitmap_table,
+   uint32_t dirty_bitmap_table_size,
+   BdrvDirtyBitmap *bitmap)
+{
+int ret = 0;
+BDRVQcowState *s = bs->opaque;
+uint64_t sector, dsc;
+uint64_t bm_size = bdrv_dirty_bitmap_size(bitmap);

I found some of this hard to unwind, bear with me:

AKA, the number of sectors that bitmap tracks ...


+int cl_size = s->cluster_size;
+uint8_t *buf = NULL;
+uint32_t i, tab_size =
+size_to_clusters(s, bdrv_dirty_bitmap_data_size(bitmap, bm_size));
+

bdrv_dirty_bitmap_data_size(bitmap, COUNT) calculates for us how much
actual real size the lowest level of the hbitmap actually takes.

Then size_to_clusters tells us how many clusters we need to store that,
and therefore should map back to be the same as the predicted value,
dirty_bitmap_table_size.
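
The sizing arithmetic described above can be sketched in C as follows.
This is a simplified illustration, not the patch's code: it assumes one
bitmap bit per granularity-sized chunk of the image, 512-byte sectors,
and a granularity of at least 512 bytes. It also ignores any extra
rounding that bdrv_dirty_bitmap_data_size() may apply. Both function
names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_BITS 9 /* assumption: 512-byte sectors */

/* Image sectors covered by one cluster of bitmap data:
 * each byte of the cluster holds 8 bits, each bit covers one
 * granularity-sized chunk of the image. */
static uint64_t sectors_per_bitmap_cluster(uint32_t granularity_bytes,
                                           uint32_t cluster_size)
{
    uint32_t sector_granularity = granularity_bytes >> SECTOR_BITS;

    return (uint64_t)sector_granularity * ((uint64_t)cluster_size << 3);
}

/* Number of bitmap clusters (that is, table entries) needed to cover
 * an image of image_sectors sectors, rounding up. */
static uint64_t bitmap_table_entries(uint64_t image_sectors,
                                     uint32_t granularity_bytes,
                                     uint32_t cluster_size)
{
    uint64_t dsc = sectors_per_bitmap_cluster(granularity_bytes,
                                              cluster_size);

    return (image_sectors + dsc - 1) / dsc;
}
```

With 64 KiB granularity and 64 KiB clusters, one bitmap cluster covers
67108864 sectors, so a 1 TiB image (2147483648 sectors) needs 32 table
entries under these assumptions.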


+if (tab_size > dirty_bitmap_table_size) {
+return -EINVAL;
+}
+

I assume this is not == because the real table size might have padding
or other such things, but if the calculated tab size is bigger than the
actual then we have a problem.

But I think that you've passed in "dirty_bitmap_table_size" as the total
byte count of the table, but "tab_size" is computed here as the number
of entries. I think you should multiply tab_size by sizeof(uint64_t) and
test if they're equal.


+bdrv_clear_dirty_bitmap(bitmap);
+

Clear takes the aio_context for the associated BDS and then releases it...


+buf = g_malloc0(cl_size);
+dsc = dirty_sectors_in_cluster(s, bitmap);
+for (i = 0, sector = 0; i < tab_size; ++i, sector += dsc) {
+uint64_t end = MIN(bm_size, sector + dsc);
+uint64_t offset = be64_to_cpu(dirty_bitmap_table[i]);
+
+if (offset & DBM_TABLE_ENTRY_RESERVED_MASK) {
+ret = -EINVAL;
+goto finish;
+}
+
+/* zero offset means cluster unallocated */
+if (offset) {
+ret = bdrv_pread(bs->file, offset, buf, cl_size);
+if (ret < 0) {
+goto finish;
+}
+bdrv_dirty_bitmap_deserialize_part(bitmap, buf, sector, end);

...but at this point, I believe we're editing this

Re: [Qemu-devel] [PATCH v1 3/9] target-arm: Add the thumb/IL flag to syn_data_abort

2016-02-16 Thread Sergey Fedorov

On 12.02.2016 17:33, Edgar E. Iglesias wrote:
> From: "Edgar E. Iglesias" 
>
> Signed-off-by: Edgar E. Iglesias 
> ---
>  target-arm/internals.h | 4 +++-
>  target-arm/op_helper.c | 6 --
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/target-arm/internals.h b/target-arm/internals.h
> index 70bec4a..b1c483b 100644
> --- a/target-arm/internals.h
> +++ b/target-arm/internals.h
> @@ -360,9 +360,11 @@ static inline uint32_t syn_insn_abort(int same_el, int 
> ea, int s1ptw, int fsc)
>  }
>  
>  static inline uint32_t syn_data_abort(int same_el, int ea, int cm, int s1ptw,
> -  int wnr, int fsc)
> +  int wnr, int fsc,
> +  bool is_thumb)
>  {
>  return (EC_DATAABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
> +| (is_thumb ? 0 : ARM_EL_IL)
>  | (ea << 9) | (cm << 8) | (s1ptw << 7) | (wnr << 6) | fsc;
>  }
>  
> diff --git a/target-arm/op_helper.c b/target-arm/op_helper.c
> index bd48549..4e629e1 100644
> --- a/target-arm/op_helper.c
> +++ b/target-arm/op_helper.c
> @@ -115,7 +115,8 @@ void tlb_fill(CPUState *cs, target_ulong addr, int 
> is_write, int mmu_idx,
>  syn = syn_insn_abort(same_el, 0, fi.s1ptw, syn);
>  exc = EXCP_PREFETCH_ABORT;
>  } else {
> -syn = syn_data_abort(same_el, 0, 0, fi.s1ptw, is_write == 1, 
> syn);
> +syn = syn_data_abort(same_el, 0, 0, fi.s1ptw, is_write == 1, syn,
> + env->thumb);
>  if (is_write == 1 && arm_feature(env, ARM_FEATURE_V6)) {
>  fsr |= (1 << 11);
>  }
> @@ -161,7 +162,8 @@ void arm_cpu_do_unaligned_access(CPUState *cs, vaddr 
> vaddr, int is_write,
>  }
>  
>  raise_exception(env, EXCP_DATA_ABORT,
> -syn_data_abort(same_el, 0, 0, 0, is_write == 1, 0x21),
> +syn_data_abort(same_el, 0, 0, 0, is_write == 1, 0x21,
> +   env->thumb),
>  target_el);
>  }
>  

ESR_ELx.IL is about instruction length. Thumb instructions can be
32-bit-long. In such case, IL should be set to 1 even if env->thumb is
set. Additionally, a data abort exception for which the value of the ISV
bit is 0, should also set IL to 1, no matter what was the instruction
length. See ARM ARMv8 A.i, section D7.2.27 ESR_ELx, Exception Syndrome
Register (ELx).
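
As a rough sketch of the rule described above (not a proposed patch),
the IL bit could be derived from the instruction length and the ISV bit
rather than from env->thumb alone. The helper name and its two
parameters are hypothetical; the IL bit position (bit 25) is the one
defined for ESR_ELx.

```c
#include <stdbool.h>
#include <stdint.h>

#define ARM_EL_IL_SHIFT 25
#define ARM_EL_IL       (1U << ARM_EL_IL_SHIFT)

/* Hypothetical helper: returns the IL field for a data abort syndrome.
 *  - is_16bit_insn: the faulting instruction was a 16-bit Thumb encoding
 *  - isv:           the syndrome carries valid instruction information
 * IL is 1 for 32-bit instructions and whenever ISV is 0. */
static uint32_t syn_il_bit(bool is_16bit_insn, bool isv)
{
    if (!isv) {
        return ARM_EL_IL;
    }
    return is_16bit_insn ? 0 : ARM_EL_IL;
}
```

The point of the sketch is that a 32-bit Thumb instruction and a 16-bit
Thumb instruction give different results even though env->thumb is set
in both cases.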

Regards,
Sergey

Re: [Qemu-devel] [PATCH] usb: check RNDIS buffer offsets & length

2016-02-16 Thread P J P

  Hello Gerd,

+-- On Tue, 16 Feb 2016, Gerd Hoffmann wrote --+
| Moves up the check so it is done for every control xfer.  Good.
 ... 
| Why this is needed?  All control transfers go through do_token_setup
| first, so with the check moved in do_token_setup we should never ever
| trigger it here ...

  I see, okay.

| > -if (bufoffs + buflen > length)
| > +if (buflen > length || bufoffs >= length || bufoffs + buflen > length) {
| >  return USB_RET_STALL;
| > +}
| 
| What is this?  Not mentioned in the commit message.  Looks like integer
| overflow prevention to me (if correct: separate patch with proper commit
| message please).

  That's right. I've sent separate revised patches for the above two changes.
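
  For illustration, a minimal C sketch of why the extra comparisons matter:
  with 32-bit unsigned values the sum offset + length can wrap around and
  pass the original single check. The two function names are hypothetical;
  the second one mirrors the condition used in the revised patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original style check: accepts the range if offs + len <= total.
 * The addition is done in 32 bits and may wrap around. */
static bool range_ok_naive(uint32_t offs, uint32_t len, uint32_t total)
{
    return !(offs + len > total);
}

/* Revised style check: reject oversized len or offs first, so the
 * final addition cannot wrap when both are bounded by total. */
static bool range_ok_checked(uint32_t offs, uint32_t len, uint32_t total)
{
    return !(len > total || offs >= total || offs + len > total);
}
```

  For example, with offs = 0xFFFFFFF0, len = 0x20 and total = 4096, the sum
  wraps to 0x10, so the naive check accepts the range while the revised
  check rejects it.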

Thank you.
--
Prasad J Pandit / Red Hat Product Security Team
47AF CE69 3A90 54AA 9045 1053 DD13 3D32 FE5B 041F

[Qemu-devel] [PATCH 2/2] usb: check RNDIS buffer offsets & length

2016-02-16 Thread P J P

From: Prasad J Pandit 

When processing remote NDIS control message packets,
the USB Net device emulator uses a fixed length(4096) data buffer.
The incoming informationBufferOffset & Length combination could
overflow and cross that range. Check control message buffer
offsets and length to avoid it.

Reported-by: Qinghao Tang 
Signed-off-by: Prasad J Pandit 
---
 hw/usb/dev-network.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Update as per review
  -> https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg03475.html

diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index 8a4ff49..180adce 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -915,8 +915,9 @@ static int rndis_query_response(USBNetState *s,
 
 bufoffs = le32_to_cpu(buf->InformationBufferOffset) + 8;
 buflen = le32_to_cpu(buf->InformationBufferLength);
-if (bufoffs + buflen > length)
+if (buflen > length || bufoffs >= length || bufoffs + buflen > length) {
 return USB_RET_STALL;
+}
 
 infobuflen = ndis_query(s, le32_to_cpu(buf->OID),
 bufoffs + (uint8_t *) buf, buflen, infobuf,
@@ -961,8 +962,9 @@ static int rndis_set_response(USBNetState *s,
 
 bufoffs = le32_to_cpu(buf->InformationBufferOffset) + 8;
 buflen = le32_to_cpu(buf->InformationBufferLength);
-if (bufoffs + buflen > length)
+if (buflen > length || bufoffs >= length || bufoffs + buflen > length) {
 return USB_RET_STALL;
+}
 
 ret = ndis_set(s, le32_to_cpu(buf->OID),
 bufoffs + (uint8_t *) buf, buflen);
@@ -1212,8 +1214,9 @@ static void usb_net_handle_dataout(USBNetState *s, 
USBPacket *p)
 if (le32_to_cpu(msg->MessageType) == RNDIS_PACKET_MSG) {
 uint32_t offs = 8 + le32_to_cpu(msg->DataOffset);
 uint32_t size = le32_to_cpu(msg->DataLength);
-if (offs + size <= len)
+if (offs < len && size < len && offs + size <= len) {
 qemu_send_packet(qemu_get_queue(s->nic), s->out_buf + offs, size);
+}
 }
 s->out_ptr -= len;
 memmove(s->out_buf, &s->out_buf[len], s->out_ptr);
-- 
2.5.0

[Qemu-devel] [PATCH 0/2] usb: check RNDIS offsets & length

2016-02-16 Thread P J P

From: Prasad J Pandit 

Hello,

When processing remote NDIS control message packets, the USB Net
device emulator uses a fixed length(4096) data buffer. The incoming
packet length could exceed that OR informationBufferOffset & Length
combination could overflow and cross that range. These two patches
add checks to avoid such overflows.
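
As a rough illustration of the first check (a sketch, not the patch
itself): the requested transfer length is the little-endian 16-bit value
in bytes 6 and 7 of the 8-byte setup packet, and it has to fit in the
fixed-size data buffer. DATA_BUF_SIZE and both helper names are
assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define DATA_BUF_SIZE 4096 /* assumption: the fixed-length buffer above */

/* Extract the requested length from bytes 6..7 (little endian). */
static uint16_t setup_length(const uint8_t setup_buf[8])
{
    return (uint16_t)((setup_buf[7] << 8) | setup_buf[6]);
}

/* Hypothetical helper: true if the requested length fits the buffer. */
static bool setup_length_ok(const uint8_t setup_buf[8])
{
    return setup_length(setup_buf) <= DATA_BUF_SIZE;
}
```

A request whose length field decodes to 4097 would be rejected here,
while one that decodes to exactly 4096 would be accepted.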

Thank you.
---
Prasad J Pandit (2):
  usb: check RNDIS message length
  usb: check RNDIS buffer offsets & length

 hw/usb/core.c| 18 +-
 hw/usb/dev-network.c |  9 ++---
 2 files changed, 15 insertions(+), 12 deletions(-)

-- 
2.5.0

[Qemu-devel] [PATCH 1/2] usb: check RNDIS message length

2016-02-16 Thread P J P

From: Prasad J Pandit 

When processing remote NDIS control message packets, the USB Net
device emulator uses a fixed length(4096) data buffer. The incoming
packet length could exceed this limit. Add a check to avoid it.

Signed-off-by: Prasad J Pandit 
---
 hw/usb/core.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

Update as per review
  -> https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg03475.html

diff --git a/hw/usb/core.c b/hw/usb/core.c
index d0025db..7f46370 100644
--- a/hw/usb/core.c
+++ b/hw/usb/core.c
@@ -128,9 +128,16 @@ static void do_token_setup(USBDevice *s, USBPacket *p)
 }
 
 usb_packet_copy(p, s->setup_buf, p->iov.size);
+s->setup_index = 0;
 p->actual_length = 0;
 s->setup_len   = (s->setup_buf[7] << 8) | s->setup_buf[6];
-s->setup_index = 0;
+if (s->setup_len > sizeof(s->data_buf)) {
+fprintf(stderr,
+"usb_generic_handle_packet: ctrl buffer too small (%d > %zu)\n",
+s->setup_len, sizeof(s->data_buf));
+p->status = USB_RET_STALL;
+return;
+}
 
 request = (s->setup_buf[0] << 8) | s->setup_buf[1];
 value   = (s->setup_buf[3] << 8) | s->setup_buf[2];
@@ -151,13 +158,6 @@ static void do_token_setup(USBDevice *s, USBPacket *p)
 }
 s->setup_state = SETUP_STATE_DATA;
 } else {
-if (s->setup_len > sizeof(s->data_buf)) {
-fprintf(stderr,
-"usb_generic_handle_packet: ctrl buffer too small (%d > %zu)\n",
-s->setup_len, sizeof(s->data_buf));
-p->status = USB_RET_STALL;
-return;
-}
 if (s->setup_len == 0)
 s->setup_state = SETUP_STATE_ACK;
 else
@@ -176,7 +176,7 @@ static void do_token_in(USBDevice *s, USBPacket *p)
 request = (s->setup_buf[0] << 8) | s->setup_buf[1];
 value   = (s->setup_buf[3] << 8) | s->setup_buf[2];
 index   = (s->setup_buf[5] << 8) | s->setup_buf[4];
- 
+
 switch(s->setup_state) {
 case SETUP_STATE_ACK:
 if (!(s->setup_buf[0] & USB_DIR_IN)) {
-- 
2.5.0

Re: [Qemu-devel] [PATCH 05/17] qcow2-dirty-bitmap: read dirty bitmap directory

2016-02-16 Thread Vladimir Sementsov-Ogievskiy


On 07.10.2015 00:27, John Snow wrote:


On 09/05/2015 12:43 PM, Vladimir Sementsov-Ogievskiy wrote:

Adds qcow2_read_dirty_bitmaps, reading Dirty Bitmap Directory as
specified in docs/specs/qcow2.txt

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2-dirty-bitmap.c | 155 +
  block/qcow2.h  |  10 +++
  2 files changed, 165 insertions(+)

diff --git a/block/qcow2-dirty-bitmap.c b/block/qcow2-dirty-bitmap.c
index fd4e0ef..1260d1d 100644
--- a/block/qcow2-dirty-bitmap.c
+++ b/block/qcow2-dirty-bitmap.c
@@ -25,6 +25,9 @@
   * THE SOFTWARE.
   */
  
+#include "block/block_int.h"

+#include "block/qcow2.h"
+
  /* NOTICE: DBM here means Dirty Bitmap and used as a namespace for _internal_
   * constants. Please do not use this _internal_ abbreviation for other needs
   * and/or outside of this file. */
@@ -40,3 +43,155 @@
  
  /* bits [0, 8] U [56, 63] are reserved */

  #define DBM_TABLE_ENTRY_RESERVED_MASK 0xff000000000001ff
+
+void qcow2_free_dirty_bitmaps(BlockDriverState *bs)
+{
+BDRVQcowState *s = bs->opaque;

BDRVQcow2State here and everywhere else in this patch, now.


+int i;
+
+for (i = 0; i < s->nb_dirty_bitmaps; i++) {
+g_free(s->dirty_bitmaps[i].name);
+}
+g_free(s->dirty_bitmaps);
+s->dirty_bitmaps = NULL;
+s->nb_dirty_bitmaps = 0;
+
+g_free(s->dirty_bitmap_directory);
+s->dirty_bitmap_directory = NULL;
+}
+
+static void bitmap_header_to_cpu(QCowDirtyBitmapHeader *h)
+{
+be64_to_cpus(&h->dirty_bitmap_table_offset);
+be64_to_cpus(&h->nb_virtual_bits);
+be32_to_cpus(&h->dirty_bitmap_table_size);
+be32_to_cpus(&h->granularity_bits);
+be32_to_cpus(&h->flags);
+be16_to_cpus(&h->name_size);

I realize you probably got these functions by example from the other
qcow2 files, but what exactly is cpu*s* here? What does the *s* stand for?

I guess it refers to the in-place swapping variants that the Linux


yes, in-place swapping


kernel defines?

hmm, just a curiosity on my part ...

the function looks correct, anyway. :)


+}
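
To illustrate what "in-place swapping" means here, a minimal sketch follows. demo_be32_to_cpus is a stand-in written for this note, not the QEMU helper; it only shows the idea of converting a big-endian field to host order through a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for an in-place "big-endian 32-bit to CPU order" helper:
 * the four bytes at *p are read as big-endian and the host-order
 * value is stored back into the same location. */
static void demo_be32_to_cpus(uint32_t *p)
{
    const uint8_t *b = (const uint8_t *)p;

    assert(p != NULL);
    *p = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
         ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

The non-"s" variants return the converted value instead of modifying the argument.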
+
+static int calc_dir_entry_size(size_t name_size)
+{
+return align_offset(sizeof(QCowDirtyBitmapHeader) + name_size, 8);

Matches spec.


+}
+
+static int dir_entry_size(QCowDirtyBitmapHeader *h)
+{
+return calc_dir_entry_size(h->name_size);

OK.


+}
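
The entry size rule (header plus name, rounded up to a multiple of 8) can be sketched as below. This is a hedged illustration: demo_align_up is assumed to behave like the align_offset() used in the patch, and header_size stands in for sizeof(QCowDirtyBitmapHeader), whose layout is not shown in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of n; n must be a power of two. */
static size_t demo_align_up(size_t offset, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    return (offset + n - 1) & ~(n - 1);
}

/* header_size is a placeholder for the size of the on-disk header. */
static size_t demo_dir_entry_size(size_t header_size, size_t name_size)
{
    return demo_align_up(header_size + name_size, 8);
}
```

With a hypothetical 24-byte header, a 5-character name gives a 32-byte entry, and so does an 8-character name.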
+
+static int check_constraints(int cluster_size,
+ QCowDirtyBitmapHeader *h)
+{
+uint64_t phys_bitmap_bytes =
+(uint64_t)h->dirty_bitmap_table_size * cluster_size;
+uint64_t max_virtual_bits = (phys_bitmap_bytes * 8) << h->granularity_bits;
+
+int fail =
+(h->dirty_bitmap_table_offset % cluster_size) ||
+(h->dirty_bitmap_table_size > DBM_MAX_TABLE_SIZE) ||
+(phys_bitmap_bytes > DBM_MAX_PHYS_SIZE) ||
+(h->nb_virtual_bits > max_virtual_bits) ||
+(h->granularity_bits > DBM_MAX_GRANULARITY_BITS) ||
+(h->flags & DBM_RESERVED_FLAGS) ||
+(h->name_size > DBM_MAX_NAME_SIZE);
+

Function is a little dense, but appears to be correct -- apart from the
DBM_RESERVED_FLAGS issue I mentioned earlier.


with this patch there are no flags, they will be added with the 
following patches and this mask will be changed.






+return fail ? -EINVAL : 0;
+}
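
The size relation checked in check_constraints() can be worked through with concrete numbers. The sketch below only restates the arithmetic from the patch; the function name and the sample values are chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on nb_virtual_bits for a bitmap table of table_size
 * clusters: every stored bit covers 2^granularity_bits units. */
static uint64_t demo_max_virtual_bits(uint32_t table_size,
                                      uint32_t cluster_size,
                                      uint32_t granularity_bits)
{
    uint64_t phys_bitmap_bytes = (uint64_t)table_size * cluster_size;

    assert(granularity_bits < 64);
    return (phys_bitmap_bytes * 8) << granularity_bits;
}
```

For example, one 64 KiB cluster holds 2^19 bits; at granularity_bits = 16 the bound is 2^35.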
+
+static int directory_read(BlockDriverState *bs)
+{
+int ret;
+BDRVQcowState *s = bs->opaque;
+uint8_t *entry, *end;
+
+if (s->dirty_bitmap_directory != NULL) {
+/* already read */
+return -EEXIST;
+}
+
+s->dirty_bitmap_directory = g_try_malloc0(s->dirty_bitmap_directory_size);
+if (s->dirty_bitmap_directory == NULL) {
+return -ENOMEM;
+}
+

I assume we're trying here in case the directory size is garbage, as a
method of preventing garbage from crashing our program. Since
dirty_bitmap_directory_size was in theory already read in (by a function
checked in later in this series), did we not validate that input value?


Hmm, it is verified, but the allowed range is large.. I'm not sure, but 
it seems like someone asked me to use _try_ for user defined or large 
allocations, you or Stefan..





+ret = bdrv_pread(bs->file,
+ s->dirty_bitmap_directory_offset,
+ s->dirty_bitmap_directory,
+ s->dirty_bitmap_directory_size);
+if (ret < 0) {
+goto fail;
+}
+

Alright, so we read the entire directory into memory... which can be as
large as 64K * 1024, or 64MiB. A non-trivial size.


But, on the other hand, in normal cases with 1-2 bitmaps it will be 
small, and I'm not sure it is a good idea to implement a more 
complex solution now.


Also, if all 64K bitmaps were loaded into memory, they would take much 
more memory than 64 MiB.
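
The "try" allocation pattern discussed above can be sketched with plain libc. This is not the glib/QEMU code: calloc() stands in for g_try_malloc0(), the 64 MiB bound is taken from the 64K * 1024 figure in this thread, and rejecting a zero size is a choice made only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed upper bound from the discussion: 64K entries of up to 1024 bytes. */
#define DEMO_MAX_DIR_SIZE (65536ULL * 1024ULL)

/* Allocate a zeroed directory buffer, reporting failure to the caller
 * instead of aborting the program. */
static int demo_alloc_directory(uint64_t size, uint8_t **out)
{
    assert(out != NULL);
    *out = NULL;
    if (size == 0 || size > DEMO_MAX_DIR_SIZE) {
        return -EINVAL;            /* size field looks like garbage */
    }
    *out = calloc(1, (size_t)size);
    return *out ? 0 : -ENOMEM;     /* large but valid sizes may still fail */
}
```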





+entry = s->dirty_bitmap_directory;
+end = s->dirty_bitmap_directory + s->dirty_bitmap_directory_size;
+while (entry < end) {
+

Re: [Qemu-devel] [PATCH v6 7/8] hw/arm/sysbus-fdt: enable amd-xgbe dynamic instantiation

2016-02-16 Thread Peter Maydell

On 1 February 2016 at 13:51, Eric Auger  wrote:
> This patch allows the instantiation of the vfio-amd-xgbe device
> from the QEMU command line (-device vfio-amd-xgbe,host="").
>
> The guest is exposed with a device tree node that combines the description
> of both XGBE and PHY (representation supported from 4.2 onwards kernel):
> Documentation/devicetree/bindings/net/amd-xgbe.txt.
>
> There are 5 register regions, 6 interrupts including 4 optional
> edge-sensitive per-channel interrupts.
>
> Some property values are inherited from host device tree. Host device tree
> must feature a combined XGBE/PHY representation (>= 4.2 host kernel).
>
> 2 clock nodes (dma and ptp) also are created. It is checked those clocks
> are fixed on host side.
>
> AMD XGBE node creation function has a dependency on vfio Linux header and
> more generally node creation function for VFIO platform devices only make
> sense with CONFIG_LINUX so let's protect this code with #ifdef CONFIG_LINUX.
>
> Signed-off-by: Eric Auger 

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH v6 0/8] AMD XGBE KVM platform passthrough

2016-02-16 Thread Peter Maydell

On 1 February 2016 at 13:51, Eric Auger  wrote:
> This series allows to set up AMD XGBE passthrough. This was tested on AMD
> Seattle.
>
> The first upstreamed device supporting KVM platform passthrough was the
> Calxeda Midway XGMAC. Compared to this latter, the XGBE XGMAC exposes a
> much more complex device tree node.
>
> - First There are 2 device tree node formats:
> one where XGBE and PHY are described in separate nodes and another one
> that combines both description in a single node (only supported by 4.2
> onwards kernels). Only the combined description is supported for passthrough,
> meaning the host must be >= 4.2 and must feature a device tree with a combined
> description. The guest will also be exposed with a combined description,
> meaning only >= 4.2 guest are supported. It is not planned to support
> separate node representation since assignment of the PHY is less
> straightforward.
>
> - the XGMAC/PHY node depends on 2 clock nodes (DMA and PTP).
> The code checks those clocks are fixed to make sure they cannot be
> switched off at some point after the native driver gets unbound.
>
> - there are many property values to populate on guest side. Most of them
> cannot be hardcoded. That series implements host device tree blob extraction
> from the host /proc/device-tree (inspired from dtc implementation)
> and retrieve host property values to populate guest dtb.
>
> - the case where the host uses ACPI is not yet covered since there is
>   no usable ACPI description for this HW yet.
>
> The patches can be found at
> https://git.linaro.org/people/eric.auger/qemu.git/shortlog/refs/heads/v2.5.0-xgbe-v6
>
> Previous versions can be found at
> https://git.linaro.org/people/eric.auger/qemu.git/shortlog/refs/heads/v2.5.0-xgbe-v

I think you have review on everything in this series now, but I'm assuming
this is going to go via the vfio tree (or at any rate not via target-arm).
Let me know if that's wrong.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v6 8/8] hw/arm/sysbus-fdt: remove qemu_fdt_setprop returned value check

2016-02-16 Thread Peter Maydell

On 1 February 2016 at 13:51, Eric Auger  wrote:
> qemu_fdt_setprop asserts in case of error hence no need to check
> the returned value.
>
> Signed-off-by: Eric Auger 
>

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH v6 3/8] device_tree: introduce qemu_fdt_node_path

2016-02-16 Thread Peter Maydell

On 1 February 2016 at 13:51, Eric Auger  wrote:
> This new helper routine returns a NULL terminated array of
> node paths matching a node name and a compat string.
>
> Signed-off-by: Eric Auger 

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Qemu-devel] [PATCH v6 2/8] device_tree: introduce load_device_tree_from_sysfs

2016-02-16 Thread Peter Maydell

On 1 February 2016 at 13:51, Eric Auger  wrote:
> This function returns the host device tree blob from sysfs
> (/proc/device-tree). It uses a recursive function inspired
> from dtc read_fstree.
>
> Signed-off-by: Eric Auger 

Reviewed-by: Peter Maydell 

thanks
-- PMM

[Qemu-devel] [Bug 1490611] Re: Using qemu >=2.2.1 to convert raw->VHD (fixed) adds extra padding to the result file, which Microsoft Azure rejects as invalid

2016-02-16 Thread Stephen A. Zarkos

If you create a subformat option I would humbly recommend focusing on
Hyper-V and Azure compat, and then create an option that enforces the
legacy behavior with your patch.  This is essentially what some Linux
distros have started to do.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1490611

Title:
  Using qemu >=2.2.1 to convert raw->VHD (fixed) adds extra padding to
  the result file, which Microsoft Azure rejects as invalid

Status in QEMU:
  Fix Released

Bug description:
  Starting with a raw disk image, using "qemu-img convert" to convert
  from raw to VHD results in the output VHD file's virtual size being
  aligned to the nearest 516096 bytes (16 heads x 63 sectors per head x
  512 bytes per sector), instead of preserving the input file's size as
  the output VHD's virtual disk size.

  Microsoft Azure requires that disk images (VHDs) submitted for upload
  have virtual sizes aligned to a megabyte boundary. (Ex. 4096MB,
  4097MB, 4098MB, etc. are OK, 4096.5MB is rejected with an error.) This
  is reflected in Microsoft's documentation: https://azure.microsoft.com
  /en-us/documentation/articles/virtual-machines-linux-create-upload-
  vhd-generic/

  This is reproducible with the following set of commands (including the
  Azure command line tools from https://github.com/Azure/azure-xplat-
  cli). For the following example, I used qemu version 2.2.1:

  $ dd if=/dev/zero of=source-disk.img bs=1M count=4096

  $ stat source-disk.img 
File: ‘source-disk.img’
Size: 4294967296  Blocks: 798656 IO Block: 4096   regular file
  Device: fc01h/64513dInode: 13247963Links: 1
  Access: (0644/-rw-r--r--)  Uid: ( 1000/  smkent)   Gid: ( 1000/  smkent)
  Access: 2015-08-18 09:48:02.613988480 -0700
  Modify: 2015-08-18 09:48:02.825985646 -0700
  Change: 2015-08-18 09:48:02.825985646 -0700
   Birth: -

  $ qemu-img convert -f raw -o subformat=fixed -O vpc source-disk.img
  dest-disk.vhd

  $ stat dest-disk.vhd 
File: ‘dest-disk.vhd’
Size: 4296499712  Blocks: 535216 IO Block: 4096   regular file
  Device: fc01h/64513dInode: 13247964Links: 1
  Access: (0644/-rw-r--r--)  Uid: ( 1000/  smkent)   Gid: ( 1000/  smkent)
  Access: 2015-08-18 09:50:22.252077624 -0700
  Modify: 2015-08-18 09:49:24.424868868 -0700
  Change: 2015-08-18 09:49:24.424868868 -0700
   Birth: -

  $ azure vm image create testimage1 dest-disk.vhd -o linux -l "West US"
  info:Executing command vm image create
  + Retrieving storage accounts 
 
  info:VHD size : 4097 MB
  info:Uploading 4195800.5 KB
  Requested:100.0% Completed:100.0% Running:   0 Time: 1m 0s Speed:  6744 KB/s 
  info:https://[redacted].blob.core.windows.net/vm-images/dest-disk.vhd was 
uploaded successfully
  error:   The VHD 
https://[redacted].blob.core.windows.net/vm-images/dest-disk.vhd has an 
unsupported virtual size of 4296499200 bytes.  The size must be a whole number 
(in MBs).
  info:Error information has been recorded to /home/smkent/.azure/azure.err
  error:   vm image create command failed

  I also ran the above commands using qemu 2.4.0, which resulted in the
  same error as the conversion behavior is the same.

  However, qemu 2.1.1 and earlier (including qemu 2.0.0 installed by
  Ubuntu 14.04) does not pad the virtual disk size during conversion.
  Using qemu-img convert from qemu versions <=2.1.1 results in a VHD
  that is exactly the size of the raw input file plus 512 bytes (for the
  VHD footer). Those qemu versions do not attempt to realign the disk.
  As a result, Azure accepts VHD files created using those versions of
  qemu-img convert for upload.

  Is there a reason why newer qemu realigns the converted VHD file? It
  would be useful if an option were added to disable this feature, as
  current versions of qemu cannot be used to create VHD files for Azure
  using Microsoft's official instructions.
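
  The numbers in this report can be checked with a few lines of C. This
  is a small sketch using only the figures quoted above (the 516096-byte
  CHS unit, the 512-byte fixed VHD footer, and the two file sizes); the
  helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MIB        (1024ULL * 1024ULL)
#define CHS_UNIT   (16ULL * 63ULL * 512ULL)  /* 16 heads x 63 sectors x 512 bytes */
#define VHD_FOOTER 512ULL                    /* footer appended to a fixed VHD */

/* Azure accepts only virtual sizes that are a whole number of MiB. */
static int is_mib_aligned(uint64_t bytes)
{
    return bytes % MIB == 0;
}

/* Virtual disk size of a fixed VHD, given the size of the file on disk. */
static uint64_t fixed_vhd_virtual_size(uint64_t file_size)
{
    assert(file_size >= VHD_FOOTER);
    return file_size - VHD_FOOTER;
}
```

  The 4296499712-byte output file has a virtual size of 4296499200
  bytes, which is a multiple of 516096 but not of 1 MiB, while the
  4294967296-byte input is exactly 4096 MiB.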

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1490611/+subscriptions

Re: [Qemu-devel] [PULL 00/28] Bug fixes + NBD-over-TLS support patches for 2016-02-16

2016-02-16 Thread Peter Maydell

On 16 February 2016 at 16:34, Paolo Bonzini  wrote:
> The following changes since commit 80b5d6bfc1280fa06e2514a414690c0e5b4b514b:
>
>   Merge remote-tracking branch 'remotes/rth/tags/pull-i386-20160215' into 
> staging (2016-02-15 11:45:11 +)
>
> are available in the git repository at:
>
>   git://github.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to ddffee3904828f11d596a13bd3c8960d747c66d8:
>
>   nbd: enable use of TLS with nbd-server-start command (2016-02-16 17:17:49 
> +0100)
>
> 
> * Coverity fixes for IPMI and mptsas
> * qemu-char fixes from Daniel and Marc-André
> * Bug fixes that break qemu-iotests
> * Changes to fix reset from panicked state
> * checkpatch false positives for designated initializers
> * TLS support in the NBD servers and clients
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [RFC v2 0/8] KVM PCI/MSI passthrough with mach-virt

2016-02-16 Thread Peter Maydell

On 29 January 2016 at 16:53, Eric Auger  wrote:
> This series enables KVM PCI/MSI passthrough with mach-virt.
>
> A new memory region type is introduced (reserved iova). On
> vfio_listener_region_add this IOVA region is registered to the kernel with
> VFIO_IOMMU_MAP_DMA (using the new VFIO_DMA_MAP_FLAG_MSI_RESERVED_IOVA flag).
>
> The host VFIO PCI driver then can use this IOVA window to map some host
> physical addresses, accessed by passthrough'ed PCI devices, through the IOMMU.
> The first goal is to map host MSI controller frames (GICv2M, GITS_TRANSLATER).
>
> mach-virt currently instantiates a 16x64kB reserved IOVA window. This
> provisions for future usage. Most probably this exceeds MSI binding needs.
> To avoid wasting guest PA, we now map the reserved region onto the
> platform bus MMIO.
>
> The series includes Pranav/Tushar' series:
> QEMU, [v2 0/2] Generic PCIe host bridge INTx determination for INTx routing
> ((https://lists.nongnu.org/archive/html/qemu-devel/2015-04/msg04361.html))
>
> Those patches are not mandated for PCI/MSI passthrough to work but without
> those, the following warning is observed and can puzzle the end-user:
> "qemu-system-aarch64: PCI: Bug - unimplemented PCI INTx routing 
> (gpex-pcihost)"

I've replied with comments about the parts I care about; I'm leaving
the rest of the review to others (VFIO related, etc).

thanks
-- PMM

Re: [Qemu-devel] [RFC v2 7/8] hw: arm: virt: register reserved IOVA region

2016-02-16 Thread Peter Maydell

On 29 January 2016 at 16:53, Eric Auger  wrote:
> Registers a 16x64kB reserved iova region. Currently this iova
> region is used by the host kernel to map host MSI controller frames
> (GICv2m, GITS_TRANSLATER). The host kernel needs this iova window
> since it cannot program the PCIe device with MSI frame physical
> address (as opposed to x86) since the MSI write transactions go
> through the IOMMU.
>
> The reserved region is mapped on the platform bus.

I guess that keeps it neatly out of the way of everybody else :-)

> Signed-off-by: Eric Auger 
>
> ---
>
> RFC v1 -> RFC v2:
> - use the platform bus to map the reserved iova region
> ---
>  hw/arm/virt.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 3839c68..4b2a891 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -805,7 +805,7 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, 
> uint32_t gic_phandle,
>  }
>
>  static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
> -bool use_highmem)
> +bool use_highmem, MemoryRegion **reserved_reg)
>  {
>  hwaddr base_mmio = vbi->memmap[VIRT_PCIE_MMIO].base;
>  hwaddr size_mmio = vbi->memmap[VIRT_PCIE_MMIO].size;
> @@ -920,10 +920,16 @@ static void create_pcie(const VirtBoardInfo *vbi, 
> qemu_irq *pic,
>  qemu_fdt_setprop_cell(vbi->fdt, nodename, "#interrupt-cells", 1);
>  create_pcie_irq_map(vbi, vbi->gic_phandle, irq, nodename);
>
> +/* initialize the reserved iova region for MSI binding (16 x 64kb) */
> +*reserved_reg = g_new0(MemoryRegion, 1);
> +memory_region_init_reserved_iova(*reserved_reg, OBJECT(dev),
> + "reserved-iova",
> + 0x100000, &error_fatal);

So the only reason this is here is because we need to have a pointer to
the PCIe controller DeviceState, right? I think it would be better to
make create_pcie() return the DeviceState* instead of NULL. Then you
can either (a) pass the pcie controller pointer into create_platform_bus()
and have that create and map the reserved iova region, or (b) have a
separate function to create the reserved iova region. In any case I
think it fits more naturally with the rest of the platform bus code
rather than in the PCIe controller creation function.

thanks
-- PMM

Re: [Qemu-devel] [RFC v2 6/8] hw: platform-bus: enable to map any memory region onto the platform-bus

2016-02-16 Thread Peter Maydell

On 29 January 2016 at 16:53, Eric Auger  wrote:
> The platform bus currently is used to map dynamically instantiable
> platform device MMIO regions. The platform bus also can be seen as a
> pool of free guest physical addresses. We would like to use that pool
> to allocate a contiguous reserved IOVA region usable for MSI message
> address IOMMU mapping.
>
> This patch introduces platform_bus_map_region which enables to map any
> memory region onto the platform bus.
>
> Signed-off-by: Eric Auger 
> ---
>  hw/core/platform-bus.c| 26 --
>  include/hw/platform-bus.h |  7 +++
>  2 files changed, 23 insertions(+), 10 deletions(-)
>
> diff --git a/hw/core/platform-bus.c b/hw/core/platform-bus.c
> index aa55d01..7d0f5e0 100644
> --- a/hw/core/platform-bus.c
> +++ b/hw/core/platform-bus.c
> @@ -128,16 +128,14 @@ static void platform_bus_map_irq(PlatformBusDevice 
> *pbus, SysBusDevice *sbdev,
>  sysbus_connect_irq(sbdev, n, pbus->irqs[irqn]);
>  }
>
> -static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice 
> *sbdev,
> -  int n)
> +void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr)
>  {
> -MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
> -uint64_t size = memory_region_size(sbdev_mr);
> +uint64_t size = memory_region_size(mr);
>  uint64_t alignment = (1ULL << (63 - clz64(size + size - 1)));
>  uint64_t off;
>  bool found_region = false;
>
> -if (memory_region_is_mapped(sbdev_mr)) {
> +if (memory_region_is_mapped(mr)) {
>  /* Region is already mapped, nothing to do */
>  return;
>  }
> @@ -154,13 +152,21 @@ static void platform_bus_map_mmio(PlatformBusDevice 
> *pbus, SysBusDevice *sbdev,
>  }
>
>  if (!found_region) {
> -error_report("Platform Bus: Can not fit MMIO region of size %"PRIx64,
> - size);
> -exit(1);
> +error_setg(&error_fatal,
> +   "Platform Bus: Can not fit region %s of size %"PRIx64,
> +   mr->name, size);
>  }
>
> -/* Map the device's region into our Platform Bus MMIO space */
> -memory_region_add_subregion(&pbus->mmio, off, sbdev_mr);
> +/* Map the region into our Platform Bus MMIO space */
> +memory_region_add_subregion(&pbus->mmio, off, mr);
> +}
> +
> +static void platform_bus_map_mmio(PlatformBusDevice *pbus, SysBusDevice 
> *sbdev,
> +  int n)
> +{
> +MemoryRegion *sbdev_mr = sysbus_mmio_get_region(sbdev, n);
> +
> +platform_bus_map_region(pbus, sbdev_mr);
>  }
>
>  /*
> diff --git a/include/hw/platform-bus.h b/include/hw/platform-bus.h
> index bd42b83..ee19674 100644
> --- a/include/hw/platform-bus.h
> +++ b/include/hw/platform-bus.h
> @@ -54,4 +54,11 @@ int platform_bus_get_irqn(PlatformBusDevice *platform_bus, 
> SysBusDevice *sbdev,
>  hwaddr platform_bus_get_mmio_addr(PlatformBusDevice *pbus, SysBusDevice 
> *sbdev,
>int n);
>
> +/**
> + * platform_bus_map_region: map a region into the platform bus

s/region/MemoryRegion/

> + * @pbus: platform bus handle
> + * @mr: memory region handle
> + */
> +void platform_bus_map_region(PlatformBusDevice *pbus, MemoryRegion *mr);
> +
>  #endif /* !HW_PLATFORM_BUS_H */
> --
> 1.9.1

otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM
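
As a side note for readers, the alignment expression in the quoted hunk rounds the region size up to a power of two. The sketch below reproduces only that arithmetic; demo_clz64 is a portable stand-in for the clz64() helper the patch uses, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros for a non-zero 64-bit value. */
static int demo_clz64(uint64_t v)
{
    int n = 0;

    assert(v != 0);
    while (!(v & (1ULL << 63))) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Same expression as in the hunk: the smallest power of two >= size. */
static uint64_t demo_region_alignment(uint64_t size)
{
    assert(size != 0);
    return 1ULL << (63 - demo_clz64(size + size - 1));
}
```

A 64 KiB region is aligned to 64 KiB, and a 0x1800-byte region is aligned to 0x2000.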

Re: [Qemu-devel] [RFC v2 3/8] Generic PCIe host bridge INTx determination for INTx routing

2016-02-16 Thread Peter Maydell

On 29 January 2016 at 16:53, Eric Auger  wrote:
> This patch stores information about assigned legacy interrupt numbers in
> GPEX host structure.
> This is used during GPEX INTx number determination from a pin during
> INTx routing.
>
> Signed-off-by: Pranavkumar Sawargaonkar 
> Signed-off-by: Tushar Jagad 
> ---
>  hw/arm/virt.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 15658f4..3839c68 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -826,6 +826,7 @@ static void create_pcie(const VirtBoardInfo *vbi, 
> qemu_irq *pic,
>  char *nodename;
>  int i;
>  PCIHostState *pci;
> +GPEXHost *s;
>
>  dev = qdev_create(NULL, TYPE_GPEX_HOST);
>  qdev_init_nofail(dev);
> @@ -861,8 +862,11 @@ static void create_pcie(const VirtBoardInfo *vbi, 
> qemu_irq *pic,
>  /* Map IO port space */
>  sysbus_mmio_map(SYS_BUS_DEVICE(dev), 2, base_pio);
>
> +s = GPEX_HOST(dev);
> +
>  for (i = 0; i < GPEX_NUM_IRQS; i++) {
>  sysbus_connect_irq(SYS_BUS_DEVICE(dev), i, pic[irq + i]);
> +s->irq_num[i] = irq + i;
>  }

I don't think that the board code should be prodding stuff in the GPEXHost
struct like this -- device structs are supposed to be private to the
device implementation. If you need the information in the device then
you need to come up with a better API for this.
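
One possible shape for such an API, purely as a sketch: the device exposes a setter and the board calls it instead of writing to the struct. Everything below is hypothetical (DemoHost, DEMO_NUM_IRQS and both functions are stand-ins invented for this note, not existing QEMU interfaces).

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_IRQS 4   /* stand-in for GPEX_NUM_IRQS */

/* Stand-in for the device state; in a real device this would stay
 * private to the device implementation. */
typedef struct DemoHost {
    int irq_num[DEMO_NUM_IRQS];
} DemoHost;

/* Called by board code to tell the device which interrupt line
 * each INTx pin was wired to. */
static void demo_host_set_irq_num(DemoHost *h, int index, int gsi)
{
    assert(h != NULL);
    assert(index >= 0 && index < DEMO_NUM_IRQS);
    h->irq_num[index] = gsi;
}

/* Used by the device's INTx routing code. */
static int demo_host_get_irq_num(const DemoHost *h, int index)
{
    assert(h != NULL);
    assert(index >= 0 && index < DEMO_NUM_IRQS);
    return h->irq_num[index];
}
```

The board loop would then call the setter next to sysbus_connect_irq() rather than touching s->irq_num[] directly.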

thanks
-- PMM

[Qemu-devel] [PATCH v3 07/10] hw/sd/pxa2xx_mmci: convert to SysBusDevice object

2016-02-16 Thread Peter Maydell

Convert the pxa2xx_mmci device to be a sysbus device.

In this commit we only change the device itself, and leave
the interface to the SD card using the old non-SDBus APIs.

Signed-off-by: Peter Maydell 
Reviewed-by: Peter Crosthwaite 
---
 hw/sd/pxa2xx_mmci.c | 74 -
 1 file changed, 56 insertions(+), 18 deletions(-)

diff --git a/hw/sd/pxa2xx_mmci.c b/hw/sd/pxa2xx_mmci.c
index 81da7b7..cadbba3 100644
--- a/hw/sd/pxa2xx_mmci.c
+++ b/hw/sd/pxa2xx_mmci.c
@@ -12,16 +12,24 @@
 
 #include "qemu/osdep.h"
 #include "hw/hw.h"
+#include "hw/sysbus.h"
 #include "hw/arm/pxa.h"
 #include "hw/sd/sd.h"
 #include "hw/qdev.h"
+#include "hw/qdev-properties.h"
+
+#define TYPE_PXA2XX_MMCI "pxa2xx-mmci"
+#define PXA2XX_MMCI(obj) OBJECT_CHECK(PXA2xxMMCIState, (obj), TYPE_PXA2XX_MMCI)
+
+typedef struct PXA2xxMMCIState {
+SysBusDevice parent_obj;
 
-struct PXA2xxMMCIState {
 MemoryRegion iomem;
 qemu_irq irq;
 qemu_irq rx_dma;
 qemu_irq tx_dma;
 
+BlockBackend *blk;
 SDState *card;
 
 uint32_t status;
@@ -49,7 +57,7 @@ struct PXA2xxMMCIState {
 int resp_len;
 
 int cmdreq;
-};
+} PXA2xxMMCIState;
 
 #define MMC_STRPCL 0x00/* MMC Clock Start/Stop register */
 #define MMC_STAT   0x04/* MMC Status register */
@@ -475,31 +483,61 @@ PXA2xxMMCIState *pxa2xx_mmci_init(MemoryRegion *sysmem,
 BlockBackend *blk, qemu_irq irq,
 qemu_irq rx_dma, qemu_irq tx_dma)
 {
+DeviceState *dev;
+SysBusDevice *sbd;
 PXA2xxMMCIState *s;
 
-s = (PXA2xxMMCIState *) g_malloc0(sizeof(PXA2xxMMCIState));
-s->irq = irq;
-s->rx_dma = rx_dma;
-s->tx_dma = tx_dma;
-
-memory_region_init_io(&s->iomem, NULL, &pxa2xx_mmci_ops, s,
-  "pxa2xx-mmci", 0x00100000);
-memory_region_add_subregion(sysmem, base, &s->iomem);
-
-/* Instantiate the actual storage */
-s->card = sd_init(blk, false);
+dev = qdev_create(NULL, TYPE_PXA2XX_MMCI);
+s = PXA2XX_MMCI(dev);
+/* Reach into the device and initialize the SD card. This is
+ * unclean but will vanish when we update to SDBus APIs.
+ */
+s->card = sd_init(s->blk, false);
 if (s->card == NULL) {
 exit(1);
 }
-
-register_savevm(NULL, "pxa2xx_mmci", 0, 0,
-pxa2xx_mmci_save, pxa2xx_mmci_load, s);
-
+qdev_init_nofail(dev);
+sbd = SYS_BUS_DEVICE(dev);
+sysbus_mmio_map(sbd, 0, base);
+sysbus_connect_irq(sbd, 0, irq);
+qdev_connect_gpio_out_named(dev, "rx-dma", 0, rx_dma);
+qdev_connect_gpio_out_named(dev, "tx-dma", 0, tx_dma);
 return s;
 }
 
 void pxa2xx_mmci_handlers(PXA2xxMMCIState *s, qemu_irq readonly,
-qemu_irq coverswitch)
+  qemu_irq coverswitch)
 {
 sd_set_cb(s->card, readonly, coverswitch);
 }
+
+static void pxa2xx_mmci_instance_init(Object *obj)
+{
+PXA2xxMMCIState *s = PXA2XX_MMCI(obj);
+SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+DeviceState *dev = DEVICE(obj);
+
+memory_region_init_io(&s->iomem, obj, &pxa2xx_mmci_ops, s,
+  "pxa2xx-mmci", 0x00100000);
+sysbus_init_mmio(sbd, &s->iomem);
+sysbus_init_irq(sbd, &s->irq);
+qdev_init_gpio_out_named(dev, &s->rx_dma, "rx-dma", 1);
+qdev_init_gpio_out_named(dev, &s->tx_dma, "tx-dma", 1);
+
+register_savevm(NULL, "pxa2xx_mmci", 0, 0,
+pxa2xx_mmci_save, pxa2xx_mmci_load, s);
+}
+
+static const TypeInfo pxa2xx_mmci_info = {
+.name = TYPE_PXA2XX_MMCI,
+.parent = TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(PXA2xxMMCIState),
+.instance_init = pxa2xx_mmci_instance_init,
+};
+
+static void pxa2xx_mmci_register_types(void)
+{
+type_register_static(&pxa2xx_mmci_info);
+}
+
+type_init(pxa2xx_mmci_register_types)
-- 
1.9.1

[Qemu-devel] [PATCH v3 05/10] hw/sd/sdhci.c: Update to use SDBus APIs

2016-02-16 Thread Peter Maydell

Update the SDHCI code to use the new SDBus APIs.

This commit introduces the new command line options required
to connect a disk to sdhci-pci:

 -device sdhci-pci -drive id=mydrive,[...] -device sd,drive=mydrive

Signed-off-by: Peter Maydell 
---
 hw/sd/sdhci.c | 97 ---
 include/hw/sd/sdhci.h |  3 +-
 2 files changed, 69 insertions(+), 31 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 3d1eb85..396dd10 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -55,6 +55,9 @@
 } \
 } while (0)
 
+#define TYPE_SDHCI_BUS "sdhci-bus"
+#define SDHCI_BUS(obj) OBJECT_CHECK(SDBus, (obj), TYPE_SDHCI_BUS)
+
 /* Default SD/MMC host controller features information, which will be
  * presented in CAPABILITIES register of generic SD host controller at reset.
  * If not stated otherwise:
@@ -145,9 +148,9 @@ static void sdhci_raise_insertion_irq(void *opaque)
 }
 }
 
-static void sdhci_insert_eject_cb(void *opaque, int irq, int level)
+static void sdhci_set_inserted(DeviceState *dev, bool level)
 {
-SDHCIState *s = (SDHCIState *)opaque;
+SDHCIState *s = (SDHCIState *)dev;
 DPRINT_L1("Card state changed: %s!\n", level ? "insert" : "eject");
 
 if ((s->norintsts & SDHC_NIS_REMOVE) && level) {
@@ -172,9 +175,9 @@ static void sdhci_insert_eject_cb(void *opaque, int irq, 
int level)
 }
 }
 
-static void sdhci_card_readonly_cb(void *opaque, int irq, int level)
+static void sdhci_set_readonly(DeviceState *dev, bool level)
 {
-SDHCIState *s = (SDHCIState *)opaque;
+SDHCIState *s = (SDHCIState *)dev;
 
 if (level) {
 s->prnsts &= ~SDHC_WRITE_PROTECT;
@@ -186,6 +189,8 @@ static void sdhci_card_readonly_cb(void *opaque, int irq, 
int level)
 
 static void sdhci_reset(SDHCIState *s)
 {
+DeviceState *dev = DEVICE(s);
+
 timer_del(s->insert_timer);
 timer_del(s->transfer_timer);
 /* Set all registers to 0. Capabilities registers are not cleared
@@ -194,8 +199,11 @@ static void sdhci_reset(SDHCIState *s)
 memset(&s->sdmasysad, 0, (uintptr_t)&s->capareg - (uintptr_t)&s->sdmasysad);
 
 if (!s->noeject_quirk) {
-sd_set_cb(s->card, s->ro_cb, s->eject_cb);
+/* Reset other state based on current card insertion/readonly status */
+sdhci_set_inserted(dev, sdbus_get_inserted(&s->sdbus));
+sdhci_set_readonly(dev, sdbus_get_readonly(&s->sdbus));
 }
+
 s->data_count = 0;
 s->stopped_state = sdhc_not_stopped;
 }
@@ -213,7 +221,7 @@ static void sdhci_send_command(SDHCIState *s)
 request.cmd = s->cmdreg >> 8;
 request.arg = s->argument;
 DPRINT_L1("sending CMD%u ARG[0x%08x]\n", request.cmd, request.arg);
-rlen = sd_do_command(s->card, &request, response);
+rlen = sdbus_do_command(&s->sdbus, &request, response);
 
 if (s->cmdreg & SDHC_CMD_RESPONSE) {
 if (rlen == 4) {
@@ -269,7 +277,7 @@ static void sdhci_end_transfer(SDHCIState *s)
 request.cmd = 0x0C;
 request.arg = 0;
 DPRINT_L1("Automatically issue CMD%d %08x\n", request.cmd, request.arg);
-sd_do_command(s->card, &request, response);
+sdbus_do_command(&s->sdbus, &request, response);
 /* Auto CMD12 response goes to the upper Response register */
 s->rspreg[3] = (response[0] << 24) | (response[1] << 16) |
 (response[2] << 8) | response[3];
@@ -301,7 +309,7 @@ static void sdhci_read_block_from_card(SDHCIState *s)
 }
 
 for (index = 0; index < (s->blksize & 0x0fff); index++) {
-s->fifo_buffer[index] = sd_read_data(s->card);
+s->fifo_buffer[index] = sdbus_read_data(&s->sdbus);
 }
 
 /* New data now available for READ through Buffer Port Register */
@@ -394,7 +402,7 @@ static void sdhci_write_block_to_card(SDHCIState *s)
 }
 
 for (index = 0; index < (s->blksize & 0x0fff); index++) {
-sd_write_data(s->card, s->fifo_buffer[index]);
+sdbus_write_data(&s->sdbus, s->fifo_buffer[index]);
 }
 
 /* Next data can be written through BUFFER DATORT register */
@@ -476,7 +484,7 @@ static void sdhci_sdma_transfer_multi_blocks(SDHCIState *s)
 while (s->blkcnt) {
 if (s->data_count == 0) {
 for (n = 0; n < block_size; n++) {
-s->fifo_buffer[n] = sd_read_data(s->card);
+s->fifo_buffer[n] = sdbus_read_data(&s->sdbus);
 }
 }
 begin = s->data_count;
@@ -517,7 +525,7 @@ static void sdhci_sdma_transfer_multi_blocks(SDHCIState *s)
 s->sdmasysad += s->data_count - begin;
 if (s->data_count == block_size) {
 for (n = 0; n < block_size; n++) {
-sd_write_data(s->card, s->fifo_buffer[n]);
+sdbus_write_data(&s->sdbus, s->fifo_buffer[n]);
 }
 s->data_count = 0;
 if (s->trnmod & SDHC_TRNS_BLK_CNT_EN) {
@@ -549,7 +557,7 @@ static void sdhci_sdma_transfer_single_block(SDHCIState *s)

[Qemu-devel] [PATCH v3 01/10] hw/sd/sdhci.c: Remove x-drive property

2016-02-16 Thread Peter Maydell

The following commits will remove support for the old sdhci-pci
command line syntax using the x-drive property:
 -device sdhci-pci,x-drive=mydrive -drive id=mydrive,[...]
and replace it with an explicit sd device:
 -device sdhci-pci -drive id=mydrive,[...] -device sd,drive=mydrive

(This is OK because x-drive is experimental.)

This commit removes the x-drive property so that old style
command lines will fail with a reasonable error message:
  -device sdhci-pci,x-drive=mydrive: Property '.x-drive' not found

Signed-off-by: Peter Maydell 
Reviewed-by: Alistair Francis 
Reviewed-by: Peter Crosthwaite 
---
 hw/sd/sdhci.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 30e3bf4..3d1eb85 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1220,12 +1220,6 @@ const VMStateDescription sdhci_vmstate = {
 /* Capabilities registers provide information on supported features of this
  * specific host controller implementation */
 static Property sdhci_pci_properties[] = {
-/*
- * We currently fuse controller and card into a single device
- * model, but we intend to separate them.  For that purpose, the
- * properties that belong to the card are marked as experimental.
- */
-DEFINE_PROP_DRIVE("x-drive", SDHCIState, blk),
 DEFINE_PROP_UINT32("capareg", SDHCIState, capareg,
 SDHC_CAPAB_REG_DEFAULT),
 DEFINE_PROP_UINT32("maxcurr", SDHCIState, maxcurr, 0),
-- 
1.9.1

[Qemu-devel] [PATCH v3 06/10] sdhci_sysbus: Create SD card device in users, not the device itself

2016-02-16 Thread Peter Maydell

Move the creation of the SD card device from the sdhci_sysbus
device itself into the boards that create these devices.
This allows us to remove the cannot_instantiate_with_device_add
notation because we no longer call drive_get_next in the device
model.

Signed-off-by: Peter Maydell 
---
 hw/arm/xilinx_zynq.c | 17 -
 hw/arm/xlnx-ep108.c  | 21 +
 hw/arm/xlnx-zynqmp.c |  8 
 hw/sd/sdhci.c| 25 -
 4 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 66e7f27..a35983a 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -28,6 +28,7 @@
 #include "hw/misc/zynq-xadc.h"
 #include "hw/ssi/ssi.h"
 #include "qemu/error-report.h"
+#include "hw/sd/sd.h"
 
 #define NUM_SPI_FLASHES 4
 #define NUM_QSPI_FLASHES 2
@@ -154,8 +155,10 @@ static void zynq_init(MachineState *machine)
 MemoryRegion *address_space_mem = get_system_memory();
 MemoryRegion *ext_ram = g_new(MemoryRegion, 1);
 MemoryRegion *ocm_ram = g_new(MemoryRegion, 1);
-DeviceState *dev;
+DeviceState *dev, *carddev;
 SysBusDevice *busdev;
+DriveInfo *di;
+BlockBackend *blk;
 qemu_irq pic[64];
 int n;
 
@@ -245,11 +248,23 @@ static void zynq_init(MachineState *machine)
 sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, 0xE0100000);
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[56-IRQ_OFFSET]);
 
+di = drive_get_next(IF_SD);
+blk = di ? blk_by_legacy_dinfo(di) : NULL;
+carddev = qdev_create(qdev_get_child_bus(dev, "sd-bus"), TYPE_SD_CARD);
+qdev_prop_set_drive(carddev, "drive", blk, &error_fatal);
+object_property_set_bool(OBJECT(carddev), true, "realized", &error_fatal);
+
 dev = qdev_create(NULL, "generic-sdhci");
 qdev_init_nofail(dev);
 sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, 0xE0101000);
 sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, pic[79-IRQ_OFFSET]);
 
+di = drive_get_next(IF_SD);
+blk = di ? blk_by_legacy_dinfo(di) : NULL;
+carddev = qdev_create(qdev_get_child_bus(dev, "sd-bus"), TYPE_SD_CARD);
+qdev_prop_set_drive(carddev, "drive", blk, &error_fatal);
+object_property_set_bool(OBJECT(carddev), true, "realized", &error_fatal);
+
 dev = qdev_create(NULL, TYPE_ZYNQ_XADC);
 qdev_init_nofail(dev);
 sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, 0xF8007100);
diff --git a/hw/arm/xlnx-ep108.c b/hw/arm/xlnx-ep108.c
index 0de132a..a1bd283 100644
--- a/hw/arm/xlnx-ep108.c
+++ b/hw/arm/xlnx-ep108.c
@@ -59,6 +59,27 @@ static void xlnx_ep108_init(MachineState *machine)
 
 object_property_set_bool(OBJECT(&s->soc), true, "realized", &error_fatal);
 
+/* Create and plug in the SD cards */
+for (i = 0; i < XLNX_ZYNQMP_NUM_SDHCI; i++) {
+BusState *bus;
+DriveInfo *di = drive_get_next(IF_SD);
+BlockBackend *blk = di ? blk_by_legacy_dinfo(di) : NULL;
+DeviceState *carddev;
+char *bus_name;
+
+bus_name = g_strdup_printf("sd-bus%d", i);
+bus = qdev_get_child_bus(DEVICE(&s->soc), bus_name);
+g_free(bus_name);
+if (!bus) {
+error_report("No SD bus found for SD card %d", i);
+exit(1);
+}
+carddev = qdev_create(bus, TYPE_SD_CARD);
+qdev_prop_set_drive(carddev, "drive", blk, &error_fatal);
+object_property_set_bool(OBJECT(carddev), true, "realized",
+ &error_fatal);
+}
+
 for (i = 0; i < XLNX_ZYNQMP_NUM_SPIS; i++) {
 SSIBus *spi_bus;
 DeviceState *flash_dev;
diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 1508d08..4fbb635 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -327,6 +327,8 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
 sysbus_connect_irq(SYS_BUS_DEVICE(&s->sata), 0, gic_spi[SATA_INTR]);
 
 for (i = 0; i < XLNX_ZYNQMP_NUM_SDHCI; i++) {
+char *bus_name;
+
 object_property_set_bool(OBJECT(&s->sdhci[i]), true,
  "realized", &err);
 if (err) {
@@ -337,6 +339,12 @@ static void xlnx_zynqmp_realize(DeviceState *dev, Error **errp)
 sdhci_addr[i]);
 sysbus_connect_irq(SYS_BUS_DEVICE(&s->sdhci[i]), 0,
gic_spi[sdhci_intr[i]]);
+/* Alias controller SD bus to the SoC itself */
+bus_name = g_strdup_printf("sd-bus%d", i);
+object_property_add_alias(OBJECT(s), bus_name,
+  OBJECT(&s->sdhci[i]), "sd-bus",
+  &error_abort);
+g_free(bus_name);
 }
 
 for (i = 0; i < XLNX_ZYNQMP_NUM_SPIS; i++) {
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 396dd10..73e7c87 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1296,29 +1296,6 @@ static void sdhci_sysbus_realize(DeviceState *dev, Error ** errp)
 {
 SDHCIState *s = SYSBUS_SDHCI(dev);
 SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
-DriveInfo *di;
-BlockBackend *blk;
-

[Qemu-devel] [PATCH v3 08/10] hw/sd/pxa2xx_mmci: Update to use new SDBus APIs

2016-02-16 Thread Peter Maydell

Now the PXA2xx MMCI device is QOMified itself, we can
update it to use the SDBus APIs to talk to the SD card.

Signed-off-by: Peter Maydell 
---
 hw/sd/pxa2xx_mmci.c | 80 +++--
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/hw/sd/pxa2xx_mmci.c b/hw/sd/pxa2xx_mmci.c
index cadbba3..75986ea 100644
--- a/hw/sd/pxa2xx_mmci.c
+++ b/hw/sd/pxa2xx_mmci.c
@@ -17,10 +17,14 @@
 #include "hw/sd/sd.h"
 #include "hw/qdev.h"
 #include "hw/qdev-properties.h"
+#include "qemu/error-report.h"
 
 #define TYPE_PXA2XX_MMCI "pxa2xx-mmci"
 #define PXA2XX_MMCI(obj) OBJECT_CHECK(PXA2xxMMCIState, (obj), TYPE_PXA2XX_MMCI)
 
+#define TYPE_PXA2XX_MMCI_BUS "pxa2xx-mmci-bus"
+#define PXA2XX_MMCI_BUS(obj) OBJECT_CHECK(SDBus, (obj), TYPE_PXA2XX_MMCI_BUS)
+
 typedef struct PXA2xxMMCIState {
 SysBusDevice parent_obj;
 
@@ -28,9 +32,11 @@ typedef struct PXA2xxMMCIState {
 qemu_irq irq;
 qemu_irq rx_dma;
 qemu_irq tx_dma;
+qemu_irq inserted;
+qemu_irq readonly;
 
 BlockBackend *blk;
-SDState *card;
+SDBus sdbus;
 
 uint32_t status;
 uint32_t clkrt;
@@ -130,7 +136,7 @@ static void pxa2xx_mmci_fifo_update(PXA2xxMMCIState *s)
 
 if (s->cmdat & CMDAT_WR_RD) {
 while (s->bytesleft && s->tx_len) {
-sd_write_data(s->card, s->tx_fifo[s->tx_start ++]);
+sdbus_write_data(&s->sdbus, s->tx_fifo[s->tx_start++]);
 s->tx_start &= 0x1f;
 s->tx_len --;
 s->bytesleft --;
@@ -140,7 +146,7 @@ static void pxa2xx_mmci_fifo_update(PXA2xxMMCIState *s)
 } else
 while (s->bytesleft && s->rx_len < 32) {
 s->rx_fifo[(s->rx_start + (s->rx_len ++)) & 0x1f] =
-sd_read_data(s->card);
+sdbus_read_data(&s->sdbus);
 s->bytesleft --;
 s->intreq |= INT_RXFIFO_REQ;
 }
@@ -174,7 +180,7 @@ static void pxa2xx_mmci_wakequeues(PXA2xxMMCIState *s)
 request.arg = s->arg;
 request.crc = 0;   /* FIXME */
 
-rsplen = sd_do_command(s->card, &request, response);
+rsplen = sdbus_do_command(&s->sdbus, &request, response);
 s->intreq |= INT_END_CMD;
 
 memset(s->resp_fifo, 0, sizeof(s->resp_fifo));
@@ -483,32 +489,59 @@ PXA2xxMMCIState *pxa2xx_mmci_init(MemoryRegion *sysmem,
 BlockBackend *blk, qemu_irq irq,
 qemu_irq rx_dma, qemu_irq tx_dma)
 {
-DeviceState *dev;
+DeviceState *dev, *carddev;
 SysBusDevice *sbd;
 PXA2xxMMCIState *s;
+Error *err = NULL;
 
 dev = qdev_create(NULL, TYPE_PXA2XX_MMCI);
 s = PXA2XX_MMCI(dev);
-/* Reach into the device and initialize the SD card. This is
- * unclean but will vanish when we update to SDBus APIs.
- */
-s->card = sd_init(s->blk, false);
-if (s->card == NULL) {
-exit(1);
-}
-qdev_init_nofail(dev);
 sbd = SYS_BUS_DEVICE(dev);
 sysbus_mmio_map(sbd, 0, base);
 sysbus_connect_irq(sbd, 0, irq);
 qdev_connect_gpio_out_named(dev, "rx-dma", 0, rx_dma);
 qdev_connect_gpio_out_named(dev, "tx-dma", 0, tx_dma);
+
+/* Create and plug in the sd card */
+carddev = qdev_create(qdev_get_child_bus(dev, "sd-bus"), TYPE_SD_CARD);
+qdev_prop_set_drive(carddev, "drive", blk, &err);
+if (err) {
+error_report("failed to init SD card: %s", error_get_pretty(err));
+return NULL;
+}
+object_property_set_bool(OBJECT(carddev), true, "realized", &err);
+if (err) {
+error_report("failed to init SD card: %s", error_get_pretty(err));
+return NULL;
+}
+
 return s;
 }
 
+static void pxa2xx_mmci_set_inserted(DeviceState *dev, bool inserted)
+{
+PXA2xxMMCIState *s = PXA2XX_MMCI(dev);
+
+qemu_set_irq(s->inserted, inserted);
+}
+
+static void pxa2xx_mmci_set_readonly(DeviceState *dev, bool readonly)
+{
+PXA2xxMMCIState *s = PXA2XX_MMCI(dev);
+
+qemu_set_irq(s->readonly, readonly);
+}
+
 void pxa2xx_mmci_handlers(PXA2xxMMCIState *s, qemu_irq readonly,
   qemu_irq coverswitch)
 {
-sd_set_cb(s->card, readonly, coverswitch);
+DeviceState *dev = DEVICE(s);
+
+s->readonly = readonly;
+s->inserted = coverswitch;
+
+pxa2xx_mmci_set_inserted(dev, sdbus_get_inserted(&s->sdbus));
+pxa2xx_mmci_set_readonly(dev, sdbus_get_readonly(&s->sdbus));
 }
 
 static void pxa2xx_mmci_instance_init(Object *obj)
@@ -526,6 +559,17 @@ static void pxa2xx_mmci_instance_init(Object *obj)
 
 register_savevm(NULL, "pxa2xx_mmci", 0, 0,
 pxa2xx_mmci_save, pxa2xx_mmci_load, s);
+
+qbus_create_inplace(&s->sdbus, sizeof(s->sdbus),
+TYPE_PXA2XX_MMCI_BUS, DEVICE(obj), "sd-bus");
+}
+
+static void pxa2xx_mmci_bus_class_init(ObjectClass *klass, void *data)
+{
+SDBusClass *sbc = SD_BUS_CLASS(klass);
+
+sbc->set_inserted = pxa2xx_mmci_set_inserted;
+sbc->set_readonly = pxa2xx_mmci_set_readonly;
 }
 
 static const TypeInfo pxa2xx_mmci_info = {
@@ -535,9 +579,17

[Qemu-devel] [PATCH v3 13/14] block: Use bdrv_next() instead of bdrv_states

2016-02-16 Thread Max Reitz

There is no point in manually iterating through the bdrv_states list
when there is bdrv_next().
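
As a rough standalone illustration of the iteration idiom the hunks below switch to, here is a minimal sketch. It assumes a hypothetical Node type and a hypothetical node_next() helper standing in for BlockDriverState and bdrv_next(); none of these names exist in QEMU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BlockDriverState; not QEMU's real structure. */
typedef struct Node {
    int id;
    struct Node *next;
} Node;

/* Hypothetical private list head, hidden behind node_next(). */
static Node *node_list_head;

/* Prepend a node; only this file touches the list head directly. */
static void node_register(Node *n)
{
    assert(n != NULL);
    n->next = node_list_head;
    node_list_head = n;
}

/* Return the first node when passed NULL, otherwise the successor of n. */
static Node *node_next(Node *n)
{
    return n ? n->next : node_list_head;
}

/* Caller-side loop in the same shape as the converted loops in the patch. */
static int node_sum_ids(void)
{
    Node *n = NULL;
    int sum = 0;

    while ((n = node_next(n)) != NULL) {
        sum += n->id;
    }
    return sum;
}
```

The point of the idiom is that callers start from NULL and never name the underlying list, so the list can later be replaced or removed without touching them.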

Signed-off-by: Max Reitz 
---
 block.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index f3597ac..69fc696 100644
--- a/block.c
+++ b/block.c
@@ -3303,10 +3303,10 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
 
 void bdrv_invalidate_cache_all(Error **errp)
 {
-BlockDriverState *bs;
+BlockDriverState *bs = NULL;
 Error *local_err = NULL;
 
-QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
+while ((bs = bdrv_next(bs)) != NULL) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
@@ -3336,10 +3336,10 @@ static int bdrv_inactivate(BlockDriverState *bs)
 
 int bdrv_inactivate_all(void)
 {
-BlockDriverState *bs;
+BlockDriverState *bs = NULL;
 int ret;
 
-QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
+while ((bs = bdrv_next(bs)) != NULL) {
 AioContext *aio_context = bdrv_get_aio_context(bs);
 
 aio_context_acquire(aio_context);
@@ -4185,10 +4185,10 @@ bool bdrv_recurse_is_first_non_filter(BlockDriverState *bs,
  */
 bool bdrv_is_first_non_filter(BlockDriverState *candidate)
 {
-BlockDriverState *bs;
+BlockDriverState *bs = NULL;
 
 /* walk down the bs forest recursively */
-QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
+while ((bs = bdrv_next(bs)) != NULL) {
 bool perm;
 
 /* try to recurse in this top level bs */
-- 
2.7.1

[Qemu-devel] [PATCH v3 09/10] hw/sd/pxa2xx_mmci: Convert to VMStateDescription

2016-02-16 Thread Peter Maydell

Convert the pxa2xx_mmci device from manual save/load
functions to a VMStateDescription structure.

This is a migration compatibility break.
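
The new description in the diff below checks the FIFO indices and lengths on load. As a rough standalone illustration of that kind of bounds check, here is a minimal sketch; FifoState and fifo_state_valid() are hypothetical names chosen for this example, and only the array sizes (64, 32, 9) are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical reduced state holding only the FIFO bookkeeping fields. */
typedef struct FifoState {
    uint8_t tx_fifo[64];
    uint32_t tx_start;
    uint32_t tx_len;
    uint8_t rx_fifo[32];
    uint32_t rx_start;
    uint32_t rx_len;
    uint16_t resp_fifo[9];
    uint32_t resp_len;
} FifoState;

/*
 * Reject incoming state whose start offsets or lengths would index
 * outside the fixed-size FIFO arrays.
 */
static bool fifo_state_valid(const FifoState *s)
{
    assert(s != NULL);
    return s->tx_start < ARRAY_SIZE(s->tx_fifo)
        && s->rx_start < ARRAY_SIZE(s->rx_fifo)
        && s->tx_len <= ARRAY_SIZE(s->tx_fifo)
        && s->rx_len <= ARRAY_SIZE(s->rx_fifo)
        && s->resp_len <= ARRAY_SIZE(s->resp_fifo);
}
```

Start offsets use a strict bound because they are used as array indices, while lengths may equal the array size when a FIFO is full.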

Signed-off-by: Peter Maydell 
Reviewed-by: Peter Crosthwaite 
---
 hw/sd/pxa2xx_mmci.c | 156 +---
 1 file changed, 64 insertions(+), 92 deletions(-)

diff --git a/hw/sd/pxa2xx_mmci.c b/hw/sd/pxa2xx_mmci.c
index 75986ea..d9f5202 100644
--- a/hw/sd/pxa2xx_mmci.c
+++ b/hw/sd/pxa2xx_mmci.c
@@ -44,27 +44,72 @@ typedef struct PXA2xxMMCIState {
 uint32_t cmdat;
 uint32_t resp_tout;
 uint32_t read_tout;
-int blklen;
-int numblk;
+int32_t blklen;
+int32_t numblk;
 uint32_t intmask;
 uint32_t intreq;
-int cmd;
+int32_t cmd;
 uint32_t arg;
 
-int active;
-int bytesleft;
+int32_t active;
+int32_t bytesleft;
 uint8_t tx_fifo[64];
-int tx_start;
-int tx_len;
+uint32_t tx_start;
+uint32_t tx_len;
 uint8_t rx_fifo[32];
-int rx_start;
-int rx_len;
+uint32_t rx_start;
+uint32_t rx_len;
 uint16_t resp_fifo[9];
-int resp_len;
+uint32_t resp_len;
 
-int cmdreq;
+int32_t cmdreq;
 } PXA2xxMMCIState;
 
+static bool pxa2xx_mmci_vmstate_validate(void *opaque, int version_id)
+{
+PXA2xxMMCIState *s = opaque;
+
+return s->tx_start < ARRAY_SIZE(s->tx_fifo)
+&& s->rx_start < ARRAY_SIZE(s->rx_fifo)
+&& s->tx_len <= ARRAY_SIZE(s->tx_fifo)
+&& s->rx_len <= ARRAY_SIZE(s->rx_fifo)
+&& s->resp_len <= ARRAY_SIZE(s->resp_fifo);
+}
+
+
+static const VMStateDescription vmstate_pxa2xx_mmci = {
+.name = "pxa2xx-mmci",
+.version_id = 2,
+.minimum_version_id = 2,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(status, PXA2xxMMCIState),
+VMSTATE_UINT32(clkrt, PXA2xxMMCIState),
+VMSTATE_UINT32(spi, PXA2xxMMCIState),
+VMSTATE_UINT32(cmdat, PXA2xxMMCIState),
+VMSTATE_UINT32(resp_tout, PXA2xxMMCIState),
+VMSTATE_UINT32(read_tout, PXA2xxMMCIState),
+VMSTATE_INT32(blklen, PXA2xxMMCIState),
+VMSTATE_INT32(numblk, PXA2xxMMCIState),
+VMSTATE_UINT32(intmask, PXA2xxMMCIState),
+VMSTATE_UINT32(intreq, PXA2xxMMCIState),
+VMSTATE_INT32(cmd, PXA2xxMMCIState),
+VMSTATE_UINT32(arg, PXA2xxMMCIState),
+VMSTATE_INT32(cmdreq, PXA2xxMMCIState),
+VMSTATE_INT32(active, PXA2xxMMCIState),
+VMSTATE_INT32(bytesleft, PXA2xxMMCIState),
+VMSTATE_UINT32(tx_start, PXA2xxMMCIState),
+VMSTATE_UINT32(tx_len, PXA2xxMMCIState),
+VMSTATE_UINT32(rx_start, PXA2xxMMCIState),
+VMSTATE_UINT32(rx_len, PXA2xxMMCIState),
+VMSTATE_UINT32(resp_len, PXA2xxMMCIState),
+VMSTATE_VALIDATE("fifo size incorrect", pxa2xx_mmci_vmstate_validate),
+VMSTATE_UINT8_ARRAY(tx_fifo, PXA2xxMMCIState, 64),
+VMSTATE_UINT8_ARRAY(rx_fifo, PXA2xxMMCIState, 32),
+VMSTATE_UINT16_ARRAY(resp_fifo, PXA2xxMMCIState, 9),
+VMSTATE_END_OF_LIST()
+}
+};
+
 #define MMC_STRPCL 0x00/* MMC Clock Start/Stop register */
 #define MMC_STAT   0x04/* MMC Status register */
 #define MMC_CLKRT  0x08/* MMC Clock Rate register */
@@ -406,84 +451,6 @@ static const MemoryRegionOps pxa2xx_mmci_ops = {
 .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static void pxa2xx_mmci_save(QEMUFile *f, void *opaque)
-{
-PXA2xxMMCIState *s = (PXA2xxMMCIState *) opaque;
-int i;
-
-qemu_put_be32s(f, &s->status);
-qemu_put_be32s(f, &s->clkrt);
-qemu_put_be32s(f, &s->spi);
-qemu_put_be32s(f, &s->cmdat);
-qemu_put_be32s(f, &s->resp_tout);
-qemu_put_be32s(f, &s->read_tout);
-qemu_put_be32(f, s->blklen);
-qemu_put_be32(f, s->numblk);
-qemu_put_be32s(f, &s->intmask);
-qemu_put_be32s(f, &s->intreq);
-qemu_put_be32(f, s->cmd);
-qemu_put_be32s(f, &s->arg);
-qemu_put_be32(f, s->cmdreq);
-qemu_put_be32(f, s->active);
-qemu_put_be32(f, s->bytesleft);
-
-qemu_put_byte(f, s->tx_len);
-for (i = 0; i < s->tx_len; i ++)
-qemu_put_byte(f, s->tx_fifo[(s->tx_start + i) & 63]);
-
-qemu_put_byte(f, s->rx_len);
-for (i = 0; i < s->rx_len; i ++)
-qemu_put_byte(f, s->rx_fifo[(s->rx_start + i) & 31]);
-
-qemu_put_byte(f, s->resp_len);
-for (i = s->resp_len; i < 9; i ++)
-qemu_put_be16s(f, &s->resp_fifo[i]);
-}
-
-static int pxa2xx_mmci_load(QEMUFile *f, void *opaque, int version_id)
-{
-PXA2xxMMCIState *s = (PXA2xxMMCIState *) opaque;
-int i;
-
-qemu_get_be32s(f, &s->status);
-qemu_get_be32s(f, &s->clkrt);
-qemu_get_be32s(f, &s->spi);
-qemu_get_be32s(f, &s->cmdat);
-qemu_get_be32s(f, &s->resp_tout);
-qemu_get_be32s(f, &s->read_tout);
-s->blklen = qemu_get_be32(f);
-s->numblk = qemu_get_be32(f);
-qemu_get_be32s(f, &s->intmask);
-qemu_get_be32s(f, &s->intreq);
-s->cmd = qemu_get_be32(f);
-qemu_get_be32s(f, &s->arg);
-s->cmdreq =

[Qemu-devel] [PATCH v3 14/14] block: Remove bdrv_states list

2016-02-16 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c   | 31 +++
 blockdev.c|  7 ---
 include/block/block.h |  1 -
 include/block/block_int.h |  4 
 4 files changed, 3 insertions(+), 40 deletions(-)

diff --git a/block.c b/block.c
index 69fc696..95a7b61 100644
--- a/block.c
+++ b/block.c
@@ -72,8 +72,6 @@ struct BdrvDirtyBitmap {
 
 #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progress */
 
-struct BdrvStates bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states);
-
 static QTAILQ_HEAD(, BlockDriverState) graph_bdrv_states =
 QTAILQ_HEAD_INITIALIZER(graph_bdrv_states);
 
@@ -246,10 +244,7 @@ void bdrv_register(BlockDriver *bdrv)
 
 BlockDriverState *bdrv_new_root(void)
 {
-BlockDriverState *bs = bdrv_new();
-
-QTAILQ_INSERT_TAIL(&bdrv_states, bs, device_list);
-return bs;
+return bdrv_new();
 }
 
 BlockDriverState *bdrv_new(void)
@@ -2240,26 +2235,10 @@ void bdrv_close_all(void)
 }
 }
 
-/* Note that bs->device_list.tqe_prev is initially null,
- * and gets set to non-null by QTAILQ_INSERT_TAIL().  Establish
- * the useful invariant "bs in bdrv_states iff bs->tqe_prev" by
- * resetting it to null on remove.  */
-void bdrv_device_remove(BlockDriverState *bs)
-{
-QTAILQ_REMOVE(&bdrv_states, bs, device_list);
-bs->device_list.tqe_prev = NULL;
-}
-
-/* make a BlockDriverState anonymous by removing from bdrv_state and
- * graph_bdrv_state list.
-   Also, NULL terminate the device_name to prevent double remove */
+/* make a BlockDriverState anonymous by removing from graph_bdrv_state list.
+ * Also, NULL terminate the device_name to prevent double remove */
 void bdrv_make_anon(BlockDriverState *bs)
 {
-/* Take care to remove bs from bdrv_states only when it's actually
- * in it. */
-if (bs->device_list.tqe_prev) {
-bdrv_device_remove(bs);
-}
 if (bs->node_name[0] != '\0') {
 QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
 }
@@ -2296,10 +2275,6 @@ static void change_parent_backing_link(BlockDriverState *from,
 }
 if (from->blk) {
 blk_set_bs(from->blk, to);
-if (!to->device_list.tqe_prev) {
-QTAILQ_INSERT_BEFORE(from, to, device_list);
-}
-bdrv_device_remove(from);
 }
 }
 
diff --git a/blockdev.c b/blockdev.c
index 6370490..b8ebdda 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2417,11 +2417,6 @@ void qmp_x_blockdev_remove_medium(const char *device, Error **errp)
 goto out;
 }
 
-/* This follows the convention established by bdrv_make_anon() */
-if (bs->device_list.tqe_prev) {
-bdrv_device_remove(bs);
-}
-
 blk_remove_bs(blk);
 
 if (!blk_dev_has_tray(blk)) {
@@ -2469,8 +2464,6 @@ static void qmp_blockdev_insert_anon_medium(const char *device,
 
 blk_insert_bs(blk, bs);
 
-QTAILQ_INSERT_TAIL(&bdrv_states, bs, device_list);
-
 if (!blk_dev_has_tray(blk)) {
 /* For tray-less devices, blockdev-close-tray is a no-op (or may not be
  * called at all); therefore, the medium needs to be pushed into the
diff --git a/include/block/block.h b/include/block/block.h
index bd27e6c..6a9cfc6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -199,7 +199,6 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp);
 BlockDriverState *bdrv_new_root(void);
 BlockDriverState *bdrv_new(void);
-void bdrv_device_remove(BlockDriverState *bs);
 void bdrv_make_anon(BlockDriverState *bs);
 void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
 void bdrv_replace_in_backing_chain(BlockDriverState *old,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 9ef823a..fdcacab 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -442,8 +442,6 @@ struct BlockDriverState {
 char node_name[32];
 /* element of the list of named nodes building the graph */
 QTAILQ_ENTRY(BlockDriverState) node_list;
-/* element of the list of "drives" the guest sees */
-QTAILQ_ENTRY(BlockDriverState) device_list;
 /* element of the list of all BlockDriverStates (all_bdrv_states) */
 QTAILQ_ENTRY(BlockDriverState) bs_list;
 /* element of the list of monitor-owned BDS */
@@ -501,8 +499,6 @@ extern BlockDriver bdrv_file;
 extern BlockDriver bdrv_raw;
 extern BlockDriver bdrv_qcow2;
 
-extern QTAILQ_HEAD(BdrvStates, BlockDriverState) bdrv_states;
-
 /**
  * bdrv_setup_io_funcs:
  *
-- 
2.7.1

[Qemu-devel] [PATCH v3 00/10] hw/sd: QOMify sd.c (and pxa2xx_mmci)

2016-02-16 Thread Peter Maydell

This series attempts to QOMify sd.c (the actual SD card model),
including a proper QOM bus between the controller and the card.

TLDR: just a rebase and fix of #includes so it builds on current
master. Still need review on patches 4, 5, 6, and 8.


This series removes the experimental x-drive property on sdhci-pci;
the syntax for using that device changes:

instead of using the x-drive property:

  -device sdhci-pci,x-drive=mydrive -drive id=mydrive,[...]

we create a specific sd device (which is autoplugged into the
sdhci-pci device's "sd-bus" bus if you don't manually connect them):

  -device sdhci-pci -drive id=mydrive,[...] -device sd-card,drive=mydrive


The basic structure of the patch series is:
 * QOMify sd.c itself
 * Add the QOM bus APIs
 * Convert sdhci to use the QOM bus APIs
 * QOMify pxa2xx_mmci
 * Convert pxa2xx_mmci to use the QOM bus APIs

Some notes on compatibility/bisection:
 * the old-style non-QOMified controllers (which get their drive
   via blk_by_legacy_dinfo()) continue to work through the whole series
 * the only QOMified controller which doesn't do that is sdhci-pci,
   which uses the experimental x-drive property to take a drive. This
   support and property is removed in patch 1; support for the new
   syntax is added in patch 5.
   (Since we're breaking the old syntax anyway I chose to not try to
   introduce the new syntax in the same commit as breaking the old one;
   I think it makes the patches easier to review.)

I don't have any Xilinx test images, so I haven't been able to test
the changes to those boards (beyond confirming that the board doesn't
crash on startup and that 'info qtree' seems to show SD cards
connected to the right things). I have tested sdhci-pci and
the pxa2xx.

Changes v1->v2:
 * change from "sd"/TYPE_SD/SD() to "sd-card"/TYPE_SD_CARD/SD_CARD()
 * similarly SD_CLASS -> SD_CARD_CLASS; SD_GET_CLASS -> SD_CARD_GET_CLASS;
   SDClass -> SDCardClass
 * fix pxa cut-n-paste flub
 * use error_propagate() rather than assuming input errp is non-NULL
 * remove stray blank lines/etc
 * use the new QOM alias-this-bus functionality as SPI does for Xilinx
 * use ARRAY_SIZE rather than sizeof
 * fix failure to register vmstate
Changes v2->v3:
 * add a missing #include that meant we weren't building if applied
   to current master
 * made new file include qemu/osdep.h first

thanks
-- PMM

Peter Maydell (10):
  hw/sd/sdhci.c: Remove x-drive property
  hw/sd/sd.c: QOMify
  hw/sd/sd.c: Convert sd_reset() function into Device reset method
  hw/sd: Add QOM bus which SD cards plug in to
  hw/sd/sdhci.c: Update to use SDBus APIs
  sdhci_sysbus: Create SD card device in users, not the device itself
  hw/sd/pxa2xx_mmci: convert to SysBusDevice object
  hw/sd/pxa2xx_mmci: Update to use new SDBus APIs
  hw/sd/pxa2xx_mmci: Convert to VMStateDescription
  hw/sd/pxa2xx_mmci: Add reset function

 hw/arm/xilinx_zynq.c  |  17 ++-
 hw/arm/xlnx-ep108.c   |  21 
 hw/arm/xlnx-zynqmp.c  |   8 ++
 hw/sd/Makefile.objs   |   2 +-
 hw/sd/core.c  | 146 
 hw/sd/pxa2xx_mmci.c   | 306 --
 hw/sd/sd.c| 150 -
 hw/sd/sdhci.c |  82 --
 include/hw/sd/sd.h|  65 +++
 include/hw/sd/sdhci.h |   3 +-
 10 files changed, 624 insertions(+), 176 deletions(-)
 create mode 100644 hw/sd/core.c

-- 
1.9.1

[Qemu-devel] [PATCH v3 11/14] block: Add blk_next_root_bs()

2016-02-16 Thread Max Reitz

This function iterates over all BDSs attached to a BB. We are going to
need it when rewriting bdrv_next() so it no longer uses bdrv_states.
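
As a rough standalone illustration of the skip-empty iteration that the function in the diff below performs, here is a minimal sketch. Backend, Root, backend_all_next() and next_root() are hypothetical stand-ins for BlockBackend, BlockDriverState, blk_all_next() and blk_next_root_bs(); the real types carry far more state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Backend Backend;

/* Hypothetical stand-in for a root node that knows its owning backend. */
typedef struct Root {
    Backend *owner;
    int id;
} Root;

/* Hypothetical stand-in for a backend; root is NULL when nothing is attached. */
struct Backend {
    Root *root;
    Backend *next;
};

static Backend *all_backends;

/* Return the first backend when passed NULL, otherwise the successor of b. */
static Backend *backend_all_next(Backend *b)
{
    return b ? b->next : all_backends;
}

/*
 * Return the next root attached to any backend, skipping backends that
 * have no root. r must be NULL or a root that is attached to a backend.
 */
static Root *next_root(Root *r)
{
    Backend *b = NULL;

    if (r) {
        assert(r->owner);
        b = r->owner;
    }

    do {
        b = backend_all_next(b);
    } while (b && !b->root);

    return b ? b->root : NULL;
}
```

The do/while form advances at least once before testing, which is what lets the same loop serve both the "start from NULL" and the "continue from r" cases.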

Signed-off-by: Max Reitz 
---
 block/block-backend.c  | 24 
 include/sysemu/block-backend.h |  1 +
 2 files changed, 25 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index a918c35..d1621ec 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -269,6 +269,30 @@ BlockBackend *blk_next(BlockBackend *blk)
 }
 
 /*
+ * Iterates over all BlockDriverStates which are attached to a BlockBackend.
+ * This function is for use by bdrv_next().
+ *
+ * @bs must be NULL or a BDS that is attached to a BB.
+ */
+BlockDriverState *blk_next_root_bs(BlockDriverState *bs)
+{
+BlockBackend *blk;
+
+if (bs) {
+assert(bs->blk);
+blk = bs->blk;
+} else {
+blk = NULL;
+}
+
+do {
+blk = blk_all_next(blk);
+} while (blk && !blk->bs);
+
+return blk ? blk->bs : NULL;
+}
+
+/*
  * Add a BlockBackend into the list of backends referenced by the monitor.
  * Strictly for use by blockdev.c.
  */
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 3a47982..5ac9bd2 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -73,6 +73,7 @@ const char *blk_name(BlockBackend *blk);
 BlockBackend *blk_by_name(const char *name);
 bool blk_name_taken(const char *name);
 BlockBackend *blk_next(BlockBackend *blk);
+BlockDriverState *blk_next_root_bs(BlockDriverState *bs);
 void monitor_add_blk(BlockBackend *blk);
 void monitor_remove_blk(BlockBackend *blk);
 
-- 
2.7.1

[Qemu-devel] [PATCH v3 02/10] hw/sd/sd.c: QOMify

2016-02-16 Thread Peter Maydell

Turn the SD card into a QOM device.
This conversion only changes the device itself; the various
functions which are effectively methods on the device are not
touched at this point.

Signed-off-by: Peter Maydell 
Reviewed-by: Peter Crosthwaite 
---
 hw/sd/sd.c | 99 ++
 include/hw/sd/sd.h |  3 ++
 2 files changed, 80 insertions(+), 22 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index dd614b0..1681728 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -34,6 +34,8 @@
 #include "sysemu/block-backend.h"
 #include "hw/sd/sd.h"
 #include "qemu/bitmap.h"
+#include "hw/qdev-properties.h"
+#include "qemu/error-report.h"
 
 //#define DEBUG_SD 1
 
@@ -78,6 +80,8 @@ enum SDCardStates {
 };
 
 struct SDState {
+DeviceState parent_obj;
+
 uint32_t mode;/* current card mode, one of SDCardModes */
 int32_t state;/* current card state, one of SDCardStates */
 uint32_t ocr;
@@ -473,34 +477,26 @@ static const VMStateDescription sd_vmstate = {
 }
 };
 
-/* We do not model the chip select pin, so allow the board to select
-   whether card should be in SSI or MMC/SD mode.  It is also up to the
-   board to ensure that ssi transfers only occur when the chip select
-   is asserted.  */
+/* Legacy initialization function for use by non-qdevified callers */
 SDState *sd_init(BlockBackend *blk, bool is_spi)
 {
-SDState *sd;
+DeviceState *dev;
+Error *err = NULL;
 
-if (blk && blk_is_read_only(blk)) {
-fprintf(stderr, "sd_init: Cannot use read-only drive\n");
+dev = qdev_create(NULL, TYPE_SD_CARD);
+qdev_prop_set_drive(dev, "drive", blk, &err);
+if (err) {
+error_report("sd_init failed: %s", error_get_pretty(err));
 return NULL;
 }
-
-sd = (SDState *) g_malloc0(sizeof(SDState));
-sd->buf = blk_blockalign(blk, 512);
-sd->spi = is_spi;
-sd->enable = true;
-sd->blk = blk;
-sd_reset(sd);
-if (sd->blk) {
-/* Attach dev if not already attached.  (This call ignores an
- * error return code if sd->blk is already attached.) */
-/* FIXME ignoring blk_attach_dev() failure is dangerously brittle */
-blk_attach_dev(sd->blk, sd);
-blk_set_dev_ops(sd->blk, &sd_block_ops, sd);
+qdev_prop_set_bit(dev, "spi", is_spi);
+object_property_set_bool(OBJECT(dev), true, "realized", &err);
+if (err) {
+error_report("sd_init failed: %s", error_get_pretty(err));
+return NULL;
 }
-vmstate_register(NULL, -1, &sd_vmstate, sd);
-return sd;
+
+return SD_CARD(dev);
 }
 
 void sd_set_cb(SDState *sd, qemu_irq readonly, qemu_irq insert)
@@ -1769,3 +1765,62 @@ void sd_enable(SDState *sd, bool enable)
 {
 sd->enable = enable;
 }
+
+static void sd_instance_init(Object *obj)
+{
+SDState *sd = SD_CARD(obj);
+
+sd->enable = true;
+}
+
+static void sd_realize(DeviceState *dev, Error **errp)
+{
+SDState *sd = SD_CARD(dev);
+
+if (sd->blk && blk_is_read_only(sd->blk)) {
+error_setg(errp, "Cannot use read-only drive as SD card");
+return;
+}
+
+sd->buf = blk_blockalign(sd->blk, 512);
+
+if (sd->blk) {
+blk_set_dev_ops(sd->blk, &sd_block_ops, sd);
+}
+
+sd_reset(sd);
+}
+
+static Property sd_properties[] = {
+DEFINE_PROP_DRIVE("drive", SDState, blk),
+/* We do not model the chip select pin, so allow the board to select
+ * whether card should be in SSI or MMC/SD mode.  It is also up to the
+ * board to ensure that ssi transfers only occur when the chip select
+ * is asserted.  */
+DEFINE_PROP_BOOL("spi", SDState, spi, false),
+DEFINE_PROP_END_OF_LIST()
+};
+
+static void sd_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->realize = sd_realize;
+dc->props = sd_properties;
+dc->vmsd = &sd_vmstate;
+}
+
+static const TypeInfo sd_info = {
+.name = TYPE_SD_CARD,
+.parent = TYPE_DEVICE,
+.instance_size = sizeof(SDState),
+.class_init = sd_class_init,
+.instance_init = sd_instance_init,
+};
+
+static void sd_register_types(void)
+{
+type_register_static(&sd_info);
+}
+
+type_init(sd_register_types)
diff --git a/include/hw/sd/sd.h b/include/hw/sd/sd.h
index 79adb5b..404d589 100644
--- a/include/hw/sd/sd.h
+++ b/include/hw/sd/sd.h
@@ -68,6 +68,9 @@ typedef struct {
 
 typedef struct SDState SDState;
 
+#define TYPE_SD_CARD "sd-card"
+#define SD_CARD(obj) OBJECT_CHECK(SDState, (obj), TYPE_SD_CARD)
+
 SDState *sd_init(BlockBackend *bs, bool is_spi);
 int sd_do_command(SDState *sd, SDRequest *req,
   uint8_t *response);
-- 
1.9.1

[Qemu-devel] [PATCH v3 10/10] hw/sd/pxa2xx_mmci: Add reset function

2016-02-16 Thread Peter Maydell

Add a reset function to the pxa2xx_mmci device; previously it had
no handling for system reset at all.

Signed-off-by: Peter Maydell 
Reviewed-by: Peter Crosthwaite 
---
 hw/sd/pxa2xx_mmci.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/hw/sd/pxa2xx_mmci.c b/hw/sd/pxa2xx_mmci.c
index d9f5202..001d04e 100644
--- a/hw/sd/pxa2xx_mmci.c
+++ b/hw/sd/pxa2xx_mmci.c
@@ -511,6 +511,35 @@ void pxa2xx_mmci_handlers(PXA2xxMMCIState *s, qemu_irq readonly,
 pxa2xx_mmci_set_readonly(dev, sdbus_get_readonly(&s->sdbus));
 }
 
+static void pxa2xx_mmci_reset(DeviceState *d)
+{
+PXA2xxMMCIState *s = PXA2XX_MMCI(d);
+
+s->status = 0;
+s->clkrt = 0;
+s->spi = 0;
+s->cmdat = 0;
+s->resp_tout = 0;
+s->read_tout = 0;
+s->blklen = 0;
+s->numblk = 0;
+s->intmask = 0;
+s->intreq = 0;
+s->cmd = 0;
+s->arg = 0;
+s->active = 0;
+s->bytesleft = 0;
+s->tx_start = 0;
+s->tx_len = 0;
+s->rx_start = 0;
+s->rx_len = 0;
+s->resp_len = 0;
+s->cmdreq = 0;
+memset(s->tx_fifo, 0, sizeof(s->tx_fifo));
+memset(s->rx_fifo, 0, sizeof(s->rx_fifo));
+memset(s->resp_fifo, 0, sizeof(s->resp_fifo));
+}
+
 static void pxa2xx_mmci_instance_init(Object *obj)
 {
 PXA2xxMMCIState *s = PXA2XX_MMCI(obj);
@@ -533,6 +562,7 @@ static void pxa2xx_mmci_class_init(ObjectClass *klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 
 dc->vmsd = &vmstate_pxa2xx_mmci;
+dc->reset = pxa2xx_mmci_reset;
 }
 
 static void pxa2xx_mmci_bus_class_init(ObjectClass *klass, void *data)
-- 
1.9.1

[Qemu-devel] [PATCH v3 07/14] blockdev: Add list of monitor-owned BlockBackends

2016-02-16 Thread Max Reitz

The monitor does hold references to some BlockBackends so it should have
a list of those BBs; blk_backends is a different list, as it contains
references to all BBs (after a follow-up patch, that is), and that
should not be changed because we do need such a list.

monitor_remove_blk() is idempotent so that we can call it in
blockdev_auto_del() without having to care whether it had been called in
do_drive_del() before. monitor_add_blk() is idempotent for symmetry
reasons (monitor_remove_blk() is, so it would be strange for
monitor_add_blk() not to be).
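
As a rough standalone illustration of the flag-guarded add/remove pattern described above, here is a minimal sketch. Item, item_list_add() and item_list_remove() are hypothetical names, and the hand-rolled doubly linked list only stands in for QEMU's QTAILQ macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list element with a membership flag, like in_monitor_list. */
typedef struct Item {
    struct Item *prev;
    struct Item *next;
    bool in_list;
} Item;

static Item *item_head;
static Item *item_tail;
static int item_count;

/* Append it to the list; calling this again for a member is a no-op. */
static void item_list_add(Item *it)
{
    assert(it != NULL);
    if (it->in_list) {
        return;
    }
    it->prev = item_tail;
    it->next = NULL;
    if (item_tail) {
        item_tail->next = it;
    } else {
        item_head = it;
    }
    item_tail = it;
    it->in_list = true;
    item_count++;
}

/* Unlink it from the list; calling this for a non-member is a no-op. */
static void item_list_remove(Item *it)
{
    assert(it != NULL);
    if (!it->in_list) {
        return;
    }
    if (it->prev) {
        it->prev->next = it->next;
    } else {
        item_head = it->next;
    }
    if (it->next) {
        it->next->prev = it->prev;
    } else {
        item_tail = it->prev;
    }
    it->prev = it->next = NULL;
    it->in_list = false;
    item_count--;
}
```

Keeping the membership flag in the element itself means a second removal never touches stale link pointers, so callers on different code paths do not need to coordinate.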

Signed-off-by: Max Reitz 
---
 block/block-backend.c  | 34 +-
 blockdev.c |  8 
 include/sysemu/block-backend.h |  2 ++
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 62cff17..430d7d5 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -30,6 +30,8 @@ struct BlockBackend {
 BlockDriverState *bs;
 DriveInfo *legacy_dinfo;/* null unless created by drive_new() */
 QTAILQ_ENTRY(BlockBackend) link; /* for blk_backends */
+QTAILQ_ENTRY(BlockBackend) monitor_link; /* for monitor_block_backends */
+bool in_monitor_list;
 
 void *dev;  /* attached device model, if any */
 /* TODO change to DeviceState when all users are qdevified */
@@ -71,6 +73,11 @@ static void drive_info_del(DriveInfo *dinfo);
 static QTAILQ_HEAD(, BlockBackend) blk_backends =
 QTAILQ_HEAD_INITIALIZER(blk_backends);
 
+/* All BlockBackends referenced by the monitor and which are iterated through by
+ * blk_next() */
+static QTAILQ_HEAD(, BlockBackend) monitor_block_backends =
+QTAILQ_HEAD_INITIALIZER(monitor_block_backends);
+
 /*
  * Create a new BlockBackend with @name, with a reference count of one.
  * @name must not be null or empty.
@@ -260,7 +267,32 @@ void blk_remove_all_bs(void)
  */
 BlockBackend *blk_next(BlockBackend *blk)
 {
-return blk ? QTAILQ_NEXT(blk, link) : QTAILQ_FIRST(&blk_backends);
+return blk ? QTAILQ_NEXT(blk, monitor_link)
+   : QTAILQ_FIRST(&monitor_block_backends);
+}
+
+/*
+ * Add a BlockBackend into the list of backends referenced by the monitor.
+ * Strictly for use by blockdev.c.
+ */
+void monitor_add_blk(BlockBackend *blk)
+{
+if (!blk->in_monitor_list) {
+QTAILQ_INSERT_TAIL(&monitor_block_backends, blk, monitor_link);
+blk->in_monitor_list = true;
+}
+}
+
+/*
+ * Remove a BlockBackend from the list of backends referenced by the monitor.
+ * Strictly for use by blockdev.c.
+ */
+void monitor_remove_blk(BlockBackend *blk)
+{
+if (blk->in_monitor_list) {
+QTAILQ_REMOVE(&monitor_block_backends, blk, monitor_link);
+blk->in_monitor_list = false;
+}
 }
 
 /*
diff --git a/blockdev.c b/blockdev.c
index 46cd8a9..ba1f648 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -147,6 +147,7 @@ void blockdev_auto_del(BlockBackend *blk)
 DriveInfo *dinfo = blk_legacy_dinfo(blk);
 
 if (dinfo && dinfo->auto_del) {
+monitor_remove_blk(blk);
 blk_unref(blk);
 }
 }
@@ -643,6 +644,8 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
 
 blk_set_on_error(blk, on_read_error, on_write_error);
 
+monitor_add_blk(blk);
+
 err_no_bs_opts:
 qemu_opts_del(opts);
 QDECREF(interval_dict);
@@ -2813,6 +2816,8 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
 blk_remove_bs(blk);
 }
 
+monitor_remove_blk(blk);
+
 /* if we have a device attached to this BlockDriverState
  * then we need to make the drive anonymous until the device
  * can be removed.  If this is a drive with no device backing
@@ -3899,6 +3904,7 @@ void qmp_blockdev_add(BlockdevOptions *options, Error **errp)
 
 if (bs && bdrv_key_required(bs)) {
 if (blk) {
+monitor_remove_blk(blk);
 blk_unref(blk);
 } else {
 QTAILQ_REMOVE(&monitor_bdrv_states, bs, monitor_list);
@@ -3928,6 +3934,7 @@ void qmp_x_blockdev_del(bool has_id, const char *id,
 }
 
 if (has_id) {
+/* blk_by_name() never returns a BB that is not owned by the monitor */
 blk = blk_by_name(id);
 if (!blk) {
 error_setg(errp, "Cannot find block backend %s", id);
@@ -3975,6 +3982,7 @@ void qmp_x_blockdev_del(bool has_id, const char *id,
 }
 
 if (blk) {
+monitor_remove_blk(blk);
 blk_unref(blk);
 } else {
 QTAILQ_REMOVE(&monitor_bdrv_states, bs, monitor_list);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8027671..078690a 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -73,6 +73,8 @@ const char *blk_name(BlockBackend *blk);
 BlockBackend *blk_by_name(const char *name);
 bool blk_name_taken(const char *name);
 BlockBackend *blk_next(BlockBackend *blk);
+void monitor_add_blk(BlockBackend *blk);
+void monitor_remove_blk(BlockBackend

[Qemu-devel] [PATCH v3 09/14] block: Move some bdrv_*_all() functions to BB

2016-02-16 Thread Max Reitz

Move bdrv_commit_all() and bdrv_flush_all() to the BlockBackend level.
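
The two functions being moved follow different error policies, as the hunks below show: the commit loop returns at the first failure, while the flush loop keeps going and reports the first error it saw. A minimal sketch of that difference follows; SketchDev and the sketch_* names are hypothetical, and each device's result is preset rather than produced by real I/O.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device: op_result is what its commit/flush would return. */
typedef struct SketchDev {
    int op_result;
    int visited;    /* how many times a loop touched this device */
} SketchDev;

/* Commit policy: stop at the first failure and return it. */
int sketch_commit_all(SketchDev *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        devs[i].visited++;
        if (devs[i].op_result < 0) {
            return devs[i].op_result;
        }
    }
    return 0;
}

/* Flush policy: visit every device, remember only the first failure. */
int sketch_flush_all(SketchDev *devs, size_t n)
{
    int result = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int ret = devs[i].op_result;

        devs[i].visited++;
        if (ret < 0 && !result) {
            result = ret;
        }
    }
    return result;
}
```

With results {0, -5, -7}, both functions return -5, but only the flush variant reaches the third device.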

Signed-off-by: Max Reitz 
---
 block.c   | 20 
 block/block-backend.c | 44 +++
 block/io.c| 20 
 include/block/block.h |  2 --
 stubs/Makefile.objs   |  2 +-
 stubs/{bdrv-commit-all.c => blk-commit-all.c} |  4 +--
 6 files changed, 41 insertions(+), 51 deletions(-)
 rename stubs/{bdrv-commit-all.c => blk-commit-all.c} (53%)

diff --git a/block.c b/block.c
index a119840..5eac9a3 100644
--- a/block.c
+++ b/block.c
@@ -2531,26 +2531,6 @@ ro_cleanup:
 return ret;
 }
 
-int bdrv_commit_all(void)
-{
-BlockDriverState *bs;
-
-QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
-
-aio_context_acquire(aio_context);
-if (bs->drv && bs->backing) {
-int ret = bdrv_commit(bs);
-if (ret < 0) {
-aio_context_release(aio_context);
-return ret;
-}
-}
-aio_context_release(aio_context);
-}
-return 0;
-}
-
 /*
  * Return values:
  * 0- success
diff --git a/block/block-backend.c b/block/block-backend.c
index a10fe44..a918c35 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -908,11 +908,6 @@ int blk_flush(BlockBackend *blk)
 return bdrv_flush(blk->bs);
 }
 
-int blk_flush_all(void)
-{
-return bdrv_flush_all();
-}
-
 void blk_drain(BlockBackend *blk)
 {
 if (blk->bs) {
@@ -1361,5 +1356,42 @@ BlockBackendRootState *blk_get_root_state(BlockBackend *blk)
 
 int blk_commit_all(void)
 {
-return bdrv_commit_all();
+BlockBackend *blk = NULL;
+
+while ((blk = blk_all_next(blk)) != NULL) {
+AioContext *aio_context = blk_get_aio_context(blk);
+
+aio_context_acquire(aio_context);
+if (blk_is_available(blk) && blk->bs->drv && blk->bs->backing) {
+int ret = bdrv_commit(blk->bs);
+if (ret < 0) {
+aio_context_release(aio_context);
+return ret;
+}
+}
+aio_context_release(aio_context);
+}
+return 0;
+}
+
+int blk_flush_all(void)
+{
+BlockBackend *blk = NULL;
+int result = 0;
+
+while ((blk = blk_all_next(blk)) != NULL) {
+AioContext *aio_context = blk_get_aio_context(blk);
+int ret;
+
+aio_context_acquire(aio_context);
+if (blk_is_inserted(blk)) {
+ret = blk_flush(blk);
+if (ret < 0 && !result) {
+result = ret;
+}
+}
+aio_context_release(aio_context);
+}
+
+return result;
 }
diff --git a/block/io.c b/block/io.c
index a69bfc4..5f9b6d6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1445,26 +1445,6 @@ int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs,
  BDRV_REQ_ZERO_WRITE | flags);
 }
 
-int bdrv_flush_all(void)
-{
-BlockDriverState *bs = NULL;
-int result = 0;
-
-while ((bs = bdrv_next(bs))) {
-AioContext *aio_context = bdrv_get_aio_context(bs);
-int ret;
-
-aio_context_acquire(aio_context);
-ret = bdrv_flush(bs);
-if (ret < 0 && !result) {
-result = ret;
-}
-aio_context_release(aio_context);
-}
-
-return result;
-}
-
 typedef struct BdrvCoGetBlockStatusData {
 BlockDriverState *bs;
 BlockDriverState *base;
diff --git a/include/block/block.h b/include/block/block.h
index 1c4f4d8..db9e0f5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -273,7 +273,6 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
 void bdrv_refresh_limits(BlockDriverState *bs, Error **errp);
 int bdrv_commit(BlockDriverState *bs);
-int bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
 const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
@@ -374,7 +373,6 @@ int bdrv_inactivate_all(void);
 /* Ensure contents are flushed to disk.  */
 int bdrv_flush(BlockDriverState *bs);
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
-int bdrv_flush_all(void);
 void bdrv_close_all(void);
 void bdrv_drain(BlockDriverState *bs);
 void bdrv_drain_all(void);
diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index e922de9..9d9f1d0 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -1,5 +1,5 @@
 stub-obj-y += arch-query-cpu-def.o
-stub-obj-y += bdrv-commit-all.o
+stub-obj-y += blk-commit-all.o
 stub-obj-y += blockdev-close-all-bdrv-states.o
 stub-obj-y += clock-warp.o
 stub-obj-y += cpu-get-clock.o
diff --git a/stubs/bdrv-commit-all.c b/stubs/blk-commit-all.c
similarity index 53%
rename from stubs/bdrv-commit-all.c
rename to stubs/blk-commit-all.c
index

[Qemu-devel] [PATCH v3 04/10] hw/sd: Add QOM bus which SD cards plug in to

2016-02-16 Thread Peter Maydell

Add a QOM bus for SD cards to plug in to.

Note that since sd_enable() is used only by one board and there
only as part of a broken implementation, we do not provide it in
the SDBus API (but instead add a warning comment about the old
function). Whoever converts OMAP and the nseries boards to QOM
will need to either implement the card switch properly or move
the enable hack into the OMAP MMC controller model.

In the SDBus API, the old-style use of sd_set_cb to register some
qemu_irqs for notification of card insertion and write-protect
toggling is replaced with methods in the SDBusClass which the
card calls on status changes and methods in the SDClass which
the controller can call to find out the current status. The
query methods will allow us to remove the abuse of the 'register
irqs' API by controllers in their reset methods to trigger
the card to tell them about the current status again.
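
The split between notification methods and query methods can be sketched without QOM. This is a simplified illustration only: the Sketch* types and function names are hypothetical stand-ins for SDBus, SDBusClass and the card, and plain function pointers replace the class machinery used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical card state; NULL card pointer means an empty slot. */
typedef struct SketchCard {
    bool inserted;
    bool readonly;
} SketchCard;

/* Notification hooks a controller may implement; either may be NULL. */
typedef struct SketchBusOps {
    void (*set_inserted)(void *controller, bool inserted);
    void (*set_readonly)(void *controller, bool readonly);
} SketchBusOps;

typedef struct SketchBus {
    const SketchBusOps *ops;
    void *controller;
    SketchCard *card;
} SketchBus;

/* Query path: the controller asks for the current status on demand. */
bool sketch_bus_get_inserted(const SketchBus *bus)
{
    assert(bus != NULL);
    return bus->card ? bus->card->inserted : false;
}

bool sketch_bus_get_readonly(const SketchBus *bus)
{
    assert(bus != NULL);
    return bus->card ? bus->card->readonly : false;
}

/* Notification path: the card reports a status change to the controller. */
void sketch_bus_set_inserted(SketchBus *bus, bool inserted)
{
    assert(bus != NULL);
    if (bus->ops && bus->ops->set_inserted) {
        bus->ops->set_inserted(bus->controller, inserted);
    }
}

void sketch_bus_set_readonly(SketchBus *bus, bool readonly)
{
    assert(bus != NULL);
    if (bus->ops && bus->ops->set_readonly) {
        bus->ops->set_readonly(bus->controller, readonly);
    }
}

/* Hypothetical controller that only cares about card presence. */
typedef struct SketchController {
    bool card_present;
} SketchController;

void sketch_controller_set_inserted(void *opaque, bool inserted)
{
    ((SketchController *)opaque)->card_present = inserted;
}
```

Because the query functions exist, a controller's reset handler could read the status directly instead of re-registering callbacks to provoke a fresh notification.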

Signed-off-by: Peter Maydell 
---
 hw/sd/Makefile.objs |   2 +-
 hw/sd/core.c| 146 
 hw/sd/sd.c  |  47 +++--
 include/hw/sd/sd.h  |  62 ++
 4 files changed, 252 insertions(+), 5 deletions(-)
 create mode 100644 hw/sd/core.c

diff --git a/hw/sd/Makefile.objs b/hw/sd/Makefile.objs
index f1aed83..31c8330 100644
--- a/hw/sd/Makefile.objs
+++ b/hw/sd/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-$(CONFIG_PL181) += pl181.o
 common-obj-$(CONFIG_SSI_SD) += ssi-sd.o
-common-obj-$(CONFIG_SD) += sd.o
+common-obj-$(CONFIG_SD) += sd.o core.o
 common-obj-$(CONFIG_SDHCI) += sdhci.o
 
 obj-$(CONFIG_MILKYMIST) += milkymist-memcard.o
diff --git a/hw/sd/core.c b/hw/sd/core.c
new file mode 100644
index 000..14c2bdf
--- /dev/null
+++ b/hw/sd/core.c
@@ -0,0 +1,146 @@
+/*
+ * SD card bus interface code.
+ *
+ * Copyright (c) 2015 Linaro Limited
+ *
+ * Author:
+ *  Peter Maydell 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/qdev-core.h"
+#include "sysemu/block-backend.h"
+#include "hw/sd/sd.h"
+
+static SDState *get_card(SDBus *sdbus)
+{
+/* We only ever have one child on the bus so just return it */
+BusChild *kid = QTAILQ_FIRST(&sdbus->qbus.children);
+
+if (!kid) {
+return NULL;
+}
+return SD_CARD(kid->child);
+}
+
+int sdbus_do_command(SDBus *sdbus, SDRequest *req, uint8_t *response)
+{
+SDState *card = get_card(sdbus);
+
+if (card) {
+SDCardClass *sc = SD_CARD_GET_CLASS(card);
+
+return sc->do_command(card, req, response);
+}
+
+return 0;
+}
+
+void sdbus_write_data(SDBus *sdbus, uint8_t value)
+{
+SDState *card = get_card(sdbus);
+
+if (card) {
+SDCardClass *sc = SD_CARD_GET_CLASS(card);
+
+sc->write_data(card, value);
+}
+}
+
+uint8_t sdbus_read_data(SDBus *sdbus)
+{
+SDState *card = get_card(sdbus);
+
+if (card) {
+SDCardClass *sc = SD_CARD_GET_CLASS(card);
+
+return sc->read_data(card);
+}
+
+return 0;
+}
+
+bool sdbus_data_ready(SDBus *sdbus)
+{
+SDState *card = get_card(sdbus);
+
+if (card) {
+SDCardClass *sc = SD_CARD_GET_CLASS(card);
+
+return sc->data_ready(card);
+}
+
+return false;
+}
+
+bool sdbus_get_inserted(SDBus *sdbus)
+{
+SDState *card = get_card(sdbus);
+
+if (card) {
+SDCardClass *sc = SD_CARD_GET_CLASS(card);
+
+return sc->get_inserted(card);
+}
+
+return false;
+}
+
+bool sdbus_get_readonly(SDBus *sdbus)
+{
+SDState *card = get_card(sdbus);
+
+if (card) {
+SDCardClass *sc = SD_CARD_GET_CLASS(card);
+
+return sc->get_readonly(card);
+}
+
+return false;
+}
+
+void sdbus_set_inserted(SDBus *sdbus, bool inserted)
+{
+SDBusClass *sbc = SD_BUS_GET_CLASS(sdbus);
+BusState *qbus = BUS(sdbus);
+
+if (sbc->set_inserted) {
+sbc->set_inserted(qbus->parent, inserted);
+}
+}
+
+void sdbus_set_readonly(SDBus *sdbus, bool readonly)
+{
+SDBusClass *sbc = SD_BUS_GET_CLASS(sdbus);
+BusState *qbus = BUS(sdbus);
+
+if (sbc->set_readonly) {
+sbc->set_readonly(qbus->parent, readonly);
+}
+}
+
+static const TypeInfo sd_bus_info = {
+.name = TYPE_SD_BUS,
+.parent = TYPE_BUS,
+.instance_size = sizeof(SDBus),
+.class_size = sizeof(SDBusClass),
+};
+
+static void sd_bus_register_types(void)
+{
+

[Qemu-devel] [PATCH v3 12/14] block: Rewrite bdrv_next()

2016-02-16 Thread Max Reitz

Instead of using the bdrv_states list, iterate over all the
BlockDriverStates attached to BlockBackends, and over all the
monitor-owned BDSs afterwards (except for those attached to a BB).
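
The two-phase walk can be illustrated with fixed tables in place of the real lists. This is a sketch under stated assumptions: Node, the two arrays and every function name here are hypothetical, and the "attached" flag stands in for bs->blk being set.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    const char *name;
    int attached;   /* non-zero if a backend has this node as its root */
} Node;

/* Hypothetical tables standing in for the BB list and the monitor list. */
static Node *backend_roots[8];
static size_t n_backend_roots;
static Node *monitor_owned[8];
static size_t n_monitor_owned;

/* Return the entry after prev in tab, or the first entry if prev is NULL. */
static Node *next_in(Node **tab, size_t n, const Node *prev)
{
    if (!prev) {
        return n ? tab[0] : NULL;
    }
    for (size_t i = 0; i < n; i++) {
        if (tab[i] == prev) {
            return i + 1 < n ? tab[i + 1] : NULL;
        }
    }
    return NULL;
}

/* Phase 1: roots of backends. Phase 2: monitor-owned nodes not seen yet. */
Node *sketch_next(Node *prev)
{
    Node *n = prev;

    if (!n || n->attached) {
        n = next_in(backend_roots, n_backend_roots, n);
        if (n) {
            return n;
        }
    }

    /* Skip attached nodes here; phase 1 has already returned them. */
    do {
        n = next_in(monitor_owned, n_monitor_owned, n);
    } while (n && n->attached);
    return n;
}
```

A node that is both attached and monitor-owned is visited once, during the first phase, which is the point of the exception noted in the commit message.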

Signed-off-by: Max Reitz 
---
 block.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 5eac9a3..f3597ac 100644
--- a/block.c
+++ b/block.c
@@ -2979,12 +2979,23 @@ BlockDriverState *bdrv_next_node(BlockDriverState *bs)
 return QTAILQ_NEXT(bs, node_list);
 }
 
+/* Iterates over all top-level BlockDriverStates, i.e. BDSs that are owned by
+ * the monitor or attached to a BlockBackend */
 BlockDriverState *bdrv_next(BlockDriverState *bs)
 {
-if (!bs) {
-return QTAILQ_FIRST(&bdrv_states);
+if (!bs || bs->blk) {
+bs = blk_next_root_bs(bs);
+if (bs) {
+return bs;
+}
 }
-return QTAILQ_NEXT(bs, device_list);
+
+/* Ignore all BDSs that are attached to a BlockBackend here; they have been
+ * handled by the above block already */
+do {
+bs = bdrv_next_monitor_owned(bs);
+} while (bs && bs->blk);
+return bs;
 }
 
 const char *bdrv_get_node_name(const BlockDriverState *bs)
-- 
2.7.1

[Qemu-devel] [PATCH v3 06/14] block: Use blk_{commit, flush}_all() consistently

2016-02-16 Thread Max Reitz

Replace bdrv_commit_all() and bdrv_flush_all() by their BlockBackend
equivalents.

Signed-off-by: Max Reitz 
---
 blockdev.c  | 2 +-
 cpus.c  | 5 +++--
 qemu-char.c | 3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 8eff47d..46cd8a9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1175,7 +1175,7 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
 int ret;
 
 if (!strcmp(device, "all")) {
-ret = bdrv_commit_all();
+ret = blk_commit_all();
 } else {
 BlockDriverState *bs;
 AioContext *aio_context;
diff --git a/cpus.c b/cpus.c
index 898426c..2d34518 100644
--- a/cpus.c
+++ b/cpus.c
@@ -29,6 +29,7 @@
 #include "qapi/qmp/qerror.h"
 #include "qemu/error-report.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/block-backend.h"
 #include "exec/gdbstub.h"
 #include "sysemu/dma.h"
 #include "sysemu/kvm.h"
@@ -726,7 +727,7 @@ static int do_vm_stop(RunState state)
 }
 
 bdrv_drain_all();
-ret = bdrv_flush_all();
+ret = blk_flush_all();
 
 return ret;
 }
@@ -1428,7 +1429,7 @@ int vm_stop_force_state(RunState state)
 bdrv_drain_all();
 /* Make sure to return an error if the flush in a previous vm_stop()
  * failed. */
-return bdrv_flush_all();
+return blk_flush_all();
 }
 }
 
diff --git a/qemu-char.c b/qemu-char.c
index 1b7d5da..bd58df2 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -25,6 +25,7 @@
 #include "qemu-common.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/block-backend.h"
 #include "qemu/error-report.h"
 #include "qemu/timer.h"
 #include "sysemu/char.h"
@@ -561,7 +562,7 @@ static int mux_proc_byte(CharDriverState *chr, MuxDriver *d, int ch)
  break;
 }
 case 's':
-bdrv_commit_all();
+blk_commit_all();
 break;
 case 'b':
 qemu_chr_be_event(chr, CHR_EVENT_BREAK);
-- 
2.7.1

[Qemu-devel] [PATCH v3 08/14] blockdev: Remove blk_hide_on_behalf_of_hmp_drive_del()

2016-02-16 Thread Max Reitz

This function first removed the BlockBackend from the blk_backends list
and cleared its name so it would no longer be found by blk_name(); since
blk_next() now iterates through monitor_block_backends (which the BB is
removed from in hmp_drive_del()), this is no longer necessary.

Second, bdrv_make_anon() was called on the BDS. This was intended for
cases where the BDS was owned by that BB alone; in which case the BDS
will no longer exist at this point thanks to the blk_remove_bs() in
hmp_drive_del().

Therefore, this function does nothing useful anymore. Remove it.

Signed-off-by: Max Reitz 
---
 block/block-backend.c  | 25 ++---
 blockdev.c |  7 ++-
 include/sysemu/block-backend.h |  2 --
 3 files changed, 4 insertions(+), 30 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 430d7d5..a10fe44 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -69,7 +69,7 @@ static const AIOCBInfo block_backend_aiocb_info = {
 
 static void drive_info_del(DriveInfo *dinfo);
 
-/* All the BlockBackends (except for hidden ones) */
+/* All the BlockBackends */
 static QTAILQ_HEAD(, BlockBackend) blk_backends =
 QTAILQ_HEAD_INITIALIZER(blk_backends);
 
@@ -181,10 +181,7 @@ static void blk_delete(BlockBackend *blk)
 g_free(blk->root_state.throttle_group);
 throttle_group_unref(blk->root_state.throttle_state);
 }
-/* Avoid double-remove after blk_hide_on_behalf_of_hmp_drive_del() */
-if (blk->name[0]) {
-QTAILQ_REMOVE(&blk_backends, blk, link);
-}
+QTAILQ_REMOVE(&blk_backends, blk, link);
 g_free(blk->name);
 drive_info_del(blk->legacy_dinfo);
 block_acct_cleanup(&blk->stats);
@@ -400,24 +397,6 @@ BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo)
 }
 
 /*
- * Hide @blk.
- * @blk must not have been hidden already.
- * Make attached BlockDriverState, if any, anonymous.
- * Once hidden, @blk is invisible to all functions that don't receive
- * it as argument.  For example, blk_by_name() won't return it.
- * Strictly for use by do_drive_del().
- * TODO get rid of it!
- */
-void blk_hide_on_behalf_of_hmp_drive_del(BlockBackend *blk)
-{
-QTAILQ_REMOVE(&blk_backends, blk, link);
-blk->name[0] = 0;
-if (blk->bs) {
-bdrv_make_anon(blk->bs);
-}
-}
-
-/*
  * Disassociates the currently associated BlockDriverState from @blk.
  */
 void blk_remove_bs(BlockBackend *blk)
diff --git a/blockdev.c b/blockdev.c
index ba1f648..6b929a9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2818,13 +2818,10 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
 
 monitor_remove_blk(blk);
 
-/* if we have a device attached to this BlockDriverState
- * then we need to make the drive anonymous until the device
- * can be removed.  If this is a drive with no device backing
- * then we can just get rid of the block driver state right here.
+/* If this BlockBackend has a device attached to it, its refcount will be
+ * decremented when the device is removed; otherwise we have to do so here.
  */
 if (blk_get_attached_dev(blk)) {
-blk_hide_on_behalf_of_hmp_drive_del(blk);
 /* Further I/O must not pause the guest */
 blk_set_on_error(blk, BLOCKDEV_ON_ERROR_REPORT,
  BLOCKDEV_ON_ERROR_REPORT);
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 078690a..3a47982 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -80,8 +80,6 @@ BlockDriverState *blk_bs(BlockBackend *blk);
 void blk_remove_bs(BlockBackend *blk);
 void blk_insert_bs(BlockBackend *blk, BlockDriverState *bs);
 
-void blk_hide_on_behalf_of_hmp_drive_del(BlockBackend *blk);
-
 void blk_iostatus_enable(BlockBackend *blk);
 bool blk_iostatus_is_enabled(const BlockBackend *blk);
 BlockDeviceIoStatus blk_iostatus(const BlockBackend *blk);
-- 
2.7.1

[Qemu-devel] [PATCH v3 03/10] hw/sd/sd.c: Convert sd_reset() function into Device reset method

2016-02-16 Thread Peter Maydell

Convert the sd_reset() function into a proper Device reset method.

Signed-off-by: Peter Maydell 
Reviewed-by: Alistair Francis 
Reviewed-by: Peter Crosthwaite 
---
 hw/sd/sd.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 1681728..edd4c82 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -394,8 +394,9 @@ static inline uint64_t sd_addr_to_wpnum(uint64_t addr)
 return addr >> (HWBLOCK_SHIFT + SECTOR_SHIFT + WPGROUP_SHIFT);
 }
 
-static void sd_reset(SDState *sd)
+static void sd_reset(DeviceState *dev)
 {
+SDState *sd = SD_CARD(dev);
 uint64_t size;
 uint64_t sect;
 
@@ -436,7 +437,7 @@ static void sd_cardchange(void *opaque, bool load)
 
 qemu_set_irq(sd->inserted_cb, blk_is_inserted(sd->blk));
 if (blk_is_inserted(sd->blk)) {
-sd_reset(sd);
+sd_reset(DEVICE(sd));
 qemu_set_irq(sd->readonly_cb, sd->wp_switch);
 }
 }
@@ -680,7 +681,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd,
 
 default:
 sd->state = sd_idle_state;
-sd_reset(sd);
+sd_reset(DEVICE(sd));
 return sd->spi ? sd_r1 : sd_r0;
 }
 break;
@@ -1787,8 +1788,6 @@ static void sd_realize(DeviceState *dev, Error **errp)
 if (sd->blk) {
 blk_set_dev_ops(sd->blk, &sd_block_ops, sd);
 }
-
-sd_reset(sd);
 }
 
 static Property sd_properties[] = {
@@ -1808,6 +1807,7 @@ static void sd_class_init(ObjectClass *klass, void *data)
 dc->realize = sd_realize;
 dc->props = sd_properties;
 dc->vmsd = &sd_vmstate;
+dc->reset = sd_reset;
 }
 
 static const TypeInfo sd_info = {
-- 
1.9.1

[Qemu-devel] [PATCH v3 01/14] monitor: Use BB list for BB name completion

2016-02-16 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 monitor.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/monitor.c b/monitor.c
index 73eac17..7620e20 100644
--- a/monitor.c
+++ b/monitor.c
@@ -42,6 +42,7 @@
 #include "ui/console.h"
 #include "ui/input.h"
 #include "sysemu/blockdev.h"
+#include "sysemu/block-backend.h"
 #include "audio/audio.h"
 #include "disas/disas.h"
 #include "sysemu/balloon.h"
@@ -3467,7 +3468,7 @@ static void monitor_find_completion_by_table(Monitor *mon,
 int i;
 const char *ptype, *str, *name;
 const mon_cmd_t *cmd;
-BlockDriverState *bs;
+BlockBackend *blk = NULL;
 
 if (nb_args <= 1) {
 /* command completion */
@@ -3522,8 +3523,8 @@ static void monitor_find_completion_by_table(Monitor *mon,
 case 'B':
 /* block device name completion */
 readline_set_completion_index(mon->rs, strlen(str));
-for (bs = bdrv_next(NULL); bs; bs = bdrv_next(bs)) {
-name = bdrv_get_device_name(bs);
+while ((blk = blk_next(blk)) != NULL) {
+name = blk_name(blk);
 if (str[0] == '\0' ||
 !strncmp(name, str, strlen(str))) {
 readline_add_completion(mon->rs, name);
-- 
2.7.1

[Qemu-devel] [PATCH v3 04/14] block: Add blk_name_taken()

2016-02-16 Thread Max Reitz

There may be BlockBackends which are not returned by blk_by_name(), but
do exist and have a name. blk_name_taken() allows testing whether a
specific name is in use already, independent of whether the BlockBackend
with that name is accessible through blk_by_name().
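
The distinction between "can be looked up" and "name is in use" can be sketched as follows. This is only an illustration: NamedBackend and the sketch_* functions are hypothetical, and a flat array with a monitor_owned flag stands in for the two lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct NamedBackend {
    const char *name;
    bool monitor_owned;   /* visible to by-name lookup only when true */
} NamedBackend;

/* Lookup restricted to monitor-owned entries. */
const NamedBackend *sketch_by_name(const NamedBackend *all, size_t n,
                                   const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (all[i].monitor_owned && strcmp(all[i].name, name) == 0) {
            return &all[i];
        }
    }
    return NULL;
}

/* Collision check over every entry, regardless of ownership. */
bool sketch_name_taken(const NamedBackend *all, size_t n, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(all[i].name, name) == 0) {
            return true;
        }
    }
    return false;
}
```

Using the restricted lookup as a collision check would let a new backend reuse the name of one that still exists but is no longer monitor-owned.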

Signed-off-by: Max Reitz 
---
 block.c|  4 ++--
 block/block-backend.c  | 19 ++-
 include/sysemu/block-backend.h |  1 +
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index efc3c43..a119840 100644
--- a/block.c
+++ b/block.c
@@ -847,8 +847,8 @@ static void bdrv_assign_node_name(BlockDriverState *bs,
 return;
 }
 
-/* takes care of avoiding namespaces collisions */
-if (blk_by_name(node_name)) {
+/* takes care of avoiding namespace collisions */
+if (blk_name_taken(node_name)) {
 error_setg(errp, "node-name=%s is conflicting with a device id",
node_name);
 goto out;
diff --git a/block/block-backend.c b/block/block-backend.c
index 30decb4..45d4057 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -87,7 +87,7 @@ BlockBackend *blk_new(const char *name, Error **errp)
 error_setg(errp, "Invalid device name");
 return NULL;
 }
-if (blk_by_name(name)) {
+if (blk_name_taken(name)) {
 error_setg(errp, "Device with id '%s' already exists", name);
 return NULL;
 }
@@ -291,6 +291,23 @@ BlockBackend *blk_by_name(const char *name)
 }
 
 /*
+ * This function should be used to check whether a certain BlockBackend name is
+ * already taken; blk_by_name() will only search in the list of monitor-owned
+ * BlockBackends which is not necessarily complete.
+ */
+bool blk_name_taken(const char *name)
+{
+BlockBackend *blk = NULL;
+
+while ((blk = blk_all_next(blk)) != NULL) {
+if (!strcmp(name, blk->name)) {
+return true;
+}
+}
+return false;
+}
+
+/*
  * Return the BlockDriverState attached to @blk if any, else null.
  */
 BlockDriverState *blk_bs(BlockBackend *blk)
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index ec30331..3fbf822 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -71,6 +71,7 @@ void blk_unref(BlockBackend *blk);
 void blk_remove_all_bs(void);
 const char *blk_name(BlockBackend *blk);
 BlockBackend *blk_by_name(const char *name);
+bool blk_name_taken(const char *name);
 BlockBackend *blk_next(BlockBackend *blk);
 
 BlockDriverState *blk_bs(BlockBackend *blk);
-- 
2.7.1

[Qemu-devel] [PATCH v3 05/14] block: Add blk_commit_all()

2016-02-16 Thread Max Reitz

Later, we will remove bdrv_commit_all() and move its contents here, and
in order to replace bdrv_commit_all() calls by calls to blk_commit_all()
before doing so, we need to add it as an alias now.

Signed-off-by: Max Reitz 
---
 block/block-backend.c  | 5 +
 include/sysemu/block-backend.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 45d4057..62cff17 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1347,3 +1347,8 @@ BlockBackendRootState *blk_get_root_state(BlockBackend *blk)
 {
 return &blk->root_state;
 }
+
+int blk_commit_all(void)
+{
+return bdrv_commit_all();
+}
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 3fbf822..8027671 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -128,6 +128,7 @@ int blk_co_discard(BlockBackend *blk, int64_t sector_num, int nb_sectors);
 int blk_co_flush(BlockBackend *blk);
 int blk_flush(BlockBackend *blk);
 int blk_flush_all(void);
+int blk_commit_all(void);
 void blk_drain(BlockBackend *blk);
 void blk_drain_all(void);
 void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error,
-- 
2.7.1
