date:20160504

Re: [Qemu-devel] [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

2016-05-04 Thread Jike Song

On 05/04/2016 06:43 AM, Alex Williamson wrote:
> On Tue, 3 May 2016 00:10:41 +0530
> Kirti Wankhede  wrote:
>> +
>> +/*
>> + * Pin a set of guest PFNs and return their associated host PFNs for vGPU.
>> + * @vaddr [in]: array of guest PFNs
>> + * @npage [in]: count of array elements
>> + * @prot [in] : protection flags
>> + * @pfn_base[out] : array of host PFNs
>> + */
>> +int vfio_pin_pages(void *iommu_data, dma_addr_t *vaddr, long npage,
>> +   int prot, dma_addr_t *pfn_base)
>> +{
>> +struct vfio_iommu *iommu = iommu_data;
>> +struct vfio_domain *domain = NULL, *domain_vgpu = NULL;
>> +int i = 0, ret = 0;
>> +long retpage;
>> +dma_addr_t remote_vaddr = 0;
>> +dma_addr_t *pfn = pfn_base;
>> +struct vfio_dma *dma;
>> +
>> +if (!iommu || !pfn_base)
>> +return -EINVAL;
>> +
>> +if (list_empty(&iommu->domain_list)) {
>> +ret = -EINVAL;
>> +goto pin_done;
>> +}
>> +
>> +get_first_domains(iommu, &domain, &domain_vgpu);
>> +
>> +// Return error if vGPU domain doesn't exist
> 
> No c++ style comments please.
> 
>> +if (!domain_vgpu) {
>> +ret = -EINVAL;
>> +goto pin_done;
>> +}
>> +
>> +for (i = 0; i < npage; i++) {
>> +struct vfio_vgpu_pfn *p;
>> +struct vfio_vgpu_pfn *lpfn;
>> +unsigned long tpfn;
>> +dma_addr_t iova;
>> +
>> +mutex_lock(&iommu->lock);
>> +
>> +iova = vaddr[i] << PAGE_SHIFT;
>> +
>> +dma = vfio_find_dma(iommu, iova, 0 /*  size */);
>> +if (!dma) {
>> +mutex_unlock(&iommu->lock);
>> +ret = -EINVAL;
>> +goto pin_done;
>> +}
>> +
>> +remote_vaddr = dma->vaddr + iova - dma->iova;
>> +
>> +retpage = vfio_pin_pages_internal(domain_vgpu, remote_vaddr,
>> +  (long)1, prot, &tpfn);
>> +mutex_unlock(&iommu->lock);
>> +if (retpage <= 0) {
>> +WARN_ON(!retpage);
>> +ret = (int)retpage;
>> +goto pin_done;
>> +}
>> +
>> +pfn[i] = tpfn;
>> +
>> +mutex_lock(&domain_vgpu->lock);
>> +
>> +// search if pfn exist
>> +if ((p = vfio_find_vgpu_pfn(domain_vgpu, tpfn))) {
>> +atomic_inc(&p->ref_count);
>> +mutex_unlock(&domain_vgpu->lock);
>> +continue;
>> +}
> 
> The only reason I can come up with for why we'd want to integrate an
> api-only domain into the existing type1 code would be to avoid page
> accounting issues where we count locked pages once for a normal
> assigned device and again for a vgpu, but that's not what we're doing
> here.  We're not only locking the pages again regardless of them
> already being locked, we're counting every time we lock them through
> this new interface.  So there's really no point at all to making type1
> become this unsupportable.  In that case we should be pulling out the
> common code that we want to share from type1 and making a new type1
> compatible vfio iommu backend rather than conditionalizing everything
> here.
> 
>> +
>> +// add to pfn_list
>> +lpfn = kzalloc(sizeof(*lpfn), GFP_KERNEL);
>> +if (!lpfn) {
>> +ret = -ENOMEM;
>> +mutex_unlock(&domain_vgpu->lock);
>> +goto pin_done;
>> +}
>> +lpfn->vmm_va = remote_vaddr;
>> +lpfn->iova = iova;
>> +lpfn->pfn = pfn[i];
>> +lpfn->npage = 1;
>> +lpfn->prot = prot;
>> +atomic_inc(&lpfn->ref_count);
>> +vfio_link_vgpu_pfn(domain_vgpu, lpfn);
>> +mutex_unlock(&domain_vgpu->lock);
>> +}
>> +
>> +ret = i;
>> +
>> +pin_done:
>> +return ret;
>> +}
>> +EXPORT_SYMBOL(vfio_pin_pages);
>> +

IIUC, an api-only domain is a VFIO domain *without* underlying IOMMU
hardware. It just, as you said in another mail, "rather than
programming them into an IOMMU for a device, it simply stores the
translations for use by later requests".

That imposes a constraint on the gfx driver: the hardware IOMMU must be
disabled. Otherwise, if an IOMMU is present, the gfx driver eventually programs
the hardware IOMMU with the IOVA returned by pci_map_page or dma_map_page.
Meanwhile, the IOMMU backend for vgpu only maintains GPA <-> HPA
translations without any knowledge of the hardware IOMMU, so how is the
device model supposed to get an IOVA for a given GPA (and thereby the HPA
from the IOMMU backend here)?

If things go as guessed above, and vfio_pin_pages() only pins and
translates vaddr to a PFN, then it will be very difficult for the
device model to figure out:

1. for a given GPA, how to avoid calling dma_map_page multiple times?
2. for which page to call dma_unmap_page?
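
To make the two questions concrete, here is a minimal userspace sketch of the bookkeeping a device model would need: a per-GFN reference count, so that the mapping call runs only on the first reference and the unmapping call only when the last reference is dropped. Everything in it is an assumption for illustration: fake_dma_map() and fake_dma_unmap() are stand-ins for dma_map_page/dma_unmap_page, and the table, its size, and the names gfn_map_get()/gfn_map_put() are hypothetical and not part of the patch under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64               /* arbitrary capacity for the sketch */

struct gfn_mapping {
    uint64_t gfn;                    /* guest frame number (lookup key) */
    uint64_t iova;                   /* address handed to the device */
    unsigned int refs;               /* 0 means the slot is free */
};

static struct gfn_mapping table[MAX_TRACKED];
static unsigned int backend_maps;    /* how often the stand-in map ran */
static unsigned int backend_unmaps;  /* how often the stand-in unmap ran */

/* Stand-in for dma_map_page(): invents an IOVA and counts the call. */
static uint64_t fake_dma_map(uint64_t gfn)
{
    backend_maps++;
    return 0x100000000ULL + (gfn << 12);
}

/* Stand-in for dma_unmap_page(): only counts the call. */
static void fake_dma_unmap(uint64_t iova)
{
    (void)iova;
    backend_unmaps++;
}

static struct gfn_mapping *lookup(uint64_t gfn)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (table[i].refs && table[i].gfn == gfn) {
            return &table[i];
        }
    }
    return NULL;
}

/* Take a reference on gfn; maps it only on first use. Returns 0 or -1. */
int gfn_map_get(uint64_t gfn, uint64_t *iova)
{
    struct gfn_mapping *m = lookup(gfn);

    assert(iova != NULL);
    if (m) {
        m->refs++;
        *iova = m->iova;
        return 0;
    }
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (!table[i].refs) {
            table[i].gfn = gfn;
            table[i].iova = fake_dma_map(gfn);
            table[i].refs = 1;
            *iova = table[i].iova;
            return 0;
        }
    }
    return -1;                       /* table full */
}

/* Drop a reference; unmaps when the last one goes. Returns 0 or -1. */
int gfn_map_put(uint64_t gfn)
{
    struct gfn_mapping *m = lookup(gfn);

    if (!m) {
        return -1;                   /* never mapped, or already released */
    }
    if (--m->refs == 0) {
        fake_dma_unmap(m->iova);
    }
    return 0;
}
```

With this kind of table, question 1 becomes a lookup before mapping and question 2 becomes "the page whose count just reached zero". The open point in the mail remains who should own such a table when the IOMMU backend knows nothing about the hardware IOMMU.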

--
Th

Re: [Qemu-devel] Migration ToDo list (a.k.a. Rant)

2016-05-04 Thread Hailiang Zhang


Hi Juan,

On 2016/5/4 19:20, Juan Quintela wrote:


Hi

I am often asked what the ToDo list for migration is.  It was in my
head, and in random notes on my desk, so I am trying some
organization (yes, I will put this in the wiki).


- migration thread on reception
   would make it trivial to do other things while receiving, and would make
   postcopy easier too (I was going to put much easier, but postcopy is
   never easy).

- migration capabilities and parameters
   this is a mess.  No, it is worse than that.  I don't know who is to
   blame here, but something needs to be done:

  void qmp_migrate_set_parameters(bool has_compress_level,
 int64_t compress_level,
 bool has_compress_threads,
 int64_t compress_threads,
 bool has_decompress_threads,
 int64_t decompress_threads,
 bool has_x_cpu_throttle_initial,
 int64_t x_cpu_throttle_initial,
 bool has_x_cpu_throttle_increment,
 int64_t x_cpu_throttle_increment,
 bool has_multifd_threads,
 int64_t multifd_threads,
 Error **errp)



 Can we move this to an array of structs, please, pretty please?
 I think that for this one, the blame is on qmp

but we can continue:

migrate
migrate_cancel
migrate_incoming
migrate_start_postcopy

   Not a lot to do until here

migrate_set_capability
   Minor nitpick: if it only allows booleans, "migrate_set_capability x-multifd"
   should be equivalent to "migrate_set_capability x-multifd on"

migrate_set_cache_size
migrate_set_downtime
migrate_set_speed
   These three should be declared obsolete, deprecated, whatever, and
   implemented on top of the next one

migrate_set_parameter

Now to read the migration information:

  migrate_capabilities
good
  migrate_parameters
good
  migrate_cache_size
good, but we are missing migrate_speed and migrate_downtime, see
why I want them to be inside migrate_set_parameters

  migrate
now, this is... weird?  We put lots of information here, and
this is basically the only way to get information out.  To make
things more interesting, the values change meaning during
migration, and the fields it shows also change over time.
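
As a rough illustration of the "array of structs" request made above for qmp_migrate_set_parameters(), the sketch below replaces the ever-growing list of has_x/x argument pairs with one table of named entries. It is a minimal sketch under stated assumptions, not QEMU's or QMP's actual interface: struct mig_param, mig_find() and mig_set_parameters() are hypothetical names, and the parameter name strings are only modelled on the arguments shown in the signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One migration parameter; the layout is hypothetical, not QEMU's. */
struct mig_param {
    const char *name;   /* e.g. "compress-level" */
    bool has_value;     /* false: leave the current setting untouched */
    int64_t value;
};

/* Find a parameter by name; returns NULL if the name is unknown. */
static struct mig_param *mig_find(struct mig_param *params, size_t n,
                                  const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(params[i].name, name) == 0) {
            return &params[i];
        }
    }
    return NULL;
}

/*
 * Apply every update that carries a value.  All names are validated
 * first so that a bad request changes nothing.  Returns the number of
 * parameters changed, or -1 if any name is unknown.
 */
int mig_set_parameters(struct mig_param *params, size_t nparams,
                       const struct mig_param *updates, size_t nupdates)
{
    int changed = 0;

    assert(params != NULL && updates != NULL);

    for (size_t i = 0; i < nupdates; i++) {
        if (!mig_find(params, nparams, updates[i].name)) {
            return -1;
        }
    }
    for (size_t i = 0; i < nupdates; i++) {
        if (updates[i].has_value) {
            struct mig_param *p = mig_find(params, nparams, updates[i].name);

            p->has_value = true;
            p->value = updates[i].value;
            changed++;
        }
    }
    return changed;
}
```

With a table like this, adding a new parameter means adding one entry rather than two more function arguments, and the same table could back both the "set" and the "info" side.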

- info migrate
   This deserves its own item.  Let's see a typical output

(qemu)info migrate

capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-multifd: on

Aha, we have the capabilities, but not the parameters.  This is
historical, I know, but they don't belong here.

Migration status: completed
ok
total time: 1621 milliseconds
ok
downtime: 208 milliseconds
ok
setup: 9 milliseconds
ok

transferred ram: 609708 kbytes
kilo bytes, not pages

throughput: 27.64 mbps
but we measure bandwidth in megabytes per second
the previous one was in kilobytes

remaining ram: 0 kbytes

total ram: 2106180 kbytes
this amount doesn't change.  I can understand why it was here.

duplicate: 452528 pages
name is historical.  It really means pages filled with the same
character.  Although in practice it means zero pages

skipped: 0 pages
Even I don't remember what this means.
normal: 151064 pages
These are the normal pages that we have sent, i.e. pages that are neither
zero pages nor skipped pages.  Notice that we have put pages here, not
bytes, not kilobytes, but pages.

normal bytes: 604256 kbytes
Don't worry, we put for you the same number as kilobytes.

dirty sync count: 11
Number of iterations over the full ram.  Yes, I know, we are very,
very bad at naming.

And we still have more optional information that appears if we are doing
block migration, xbzrle, compression, rdma, etc, etc.

We also need to decide on units internally.  Some things are in bytes,
some are in kilobytes, some are in pages.  Some are in host pages, or
guest pages, or who knows :-(
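
The mixed units can at least be cross-checked with a little arithmetic. The sketch below converts the "normal" page count from the output above into kilobytes, assuming a 4096-byte page and 1 kbyte = 1024 bytes. Both are assumptions, although they are consistent with the figures shown (151064 pages against 604256 kbytes). The helper names and SKETCH_PAGE_SIZE are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size; the real value depends on the target/host. */
#define SKETCH_PAGE_SIZE 4096u

/* Convert a page count to bytes, refusing inputs that would overflow. */
uint64_t pages_to_bytes(uint64_t pages)
{
    assert(pages <= UINT64_MAX / SKETCH_PAGE_SIZE);
    return pages * SKETCH_PAGE_SIZE;
}

/* Convert bytes to kbytes (1 kbyte = 1024 bytes), rounding down. */
uint64_t bytes_to_kbytes(uint64_t bytes)
{
    return bytes / 1024;
}

/* Convenience wrapper: pages straight to kbytes. */
uint64_t pages_to_kbytes(uint64_t pages)
{
    return bytes_to_kbytes(pages_to_bytes(pages));
}
```

Under those assumptions pages_to_kbytes(151064) gives 604256, which matches the "normal bytes" line. Keeping one internal unit (for example bytes) and converting only at display time would avoid having to guess which unit a field uses.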


- Block migration (the migration/block.c one).  This is the bastard
   child of migration.  Much less tested, we should make a decision
   about letting it live or deprecating it.  Things needed, from memory:
  - functions should return the same values as ram.c
some functions don't have "exact" values, and return 1 when there
is more than one dirty block, etc, etc
  - if we continue maintaining it, allow it to have _some_ shared
devices and some non-shared ones, instead of everything?

- RDMA: Another stepchild

   This is really, really weird.  We don't use the normal infrastructure
   for RDMA, we use the ram_control_* stuff.  We should

Re: [Qemu-devel] [RFC PATCH v3 2/3] VFIO driver for vGPU device

2016-05-04 Thread Kirti Wankhede




On 5/5/2016 2:44 AM, Neo Jia wrote:

On Wed, May 04, 2016 at 11:06:19AM -0600, Alex Williamson wrote:

On Wed, 4 May 2016 03:23:13 +
"Tian, Kevin"  wrote:


From: Alex Williamson [mailto:alex.william...@redhat.com]
Sent: Wednesday, May 04, 2016 6:43 AM

+
+   if (gpu_dev->ops->write) {
+   ret = gpu_dev->ops->write(vgpu_dev,
+ user_data,
+ count,
+ vgpu_emul_space_config,
+ pos);
+   }
+
+   memcpy((void *)(vdev->vconfig + pos), (void *)user_data, count);


So write is expected to user_data to allow only the writable bits to be
changed?  What's really being saved in the vconfig here vs the vendor
vgpu driver?  It seems like we're only using it to cache the BAR
values, but we're not providing the BAR emulation here, which seems
like one of the few things we could provide so it's not duplicated in
every vendor driver.  But then we only need a few u32s to do that, not
all of config space.


We can borrow same vconfig emulation from existing vfio-pci driver.
But doing so doesn't mean that vendor vgpu driver cannot have its
own vconfig emulation further. vGPU is not like a real device, since
there may be no physical config space implemented for each vGPU.
So the vendor vGPU driver needs to create/emulate the virtualized
config space anyway, while the way it is created might be vendor specific.
So it is better to keep the interface to access the raw vconfig space from
the vendor vGPU driver.


I'm hoping config space will be very simple for a vgpu, so I don't know
that it makes sense to add that complexity early on.  Neo/Kirti, what
capabilities do you expect to provide?  Who provides the MSI
capability?  Is a PCIe capability provided?  Others?




From VGPU_VFIO point of view, VGPU_VFIO would not provide or modify any 
capabilities. Vendor vGPU driver should provide config space. Then 
vendor driver can provide PCI capabilities or PCIe capabilities, it 
might also have vendor specific information. VGPU_VFIO driver would not 
intercept that information.



Currently only standard PCI caps.

MSI cap is emulated by the vendor drivers via the above interface.

No PCIe caps so far.



The Nvidia vGPU device is a standard PCI device. We tested standard PCI caps.

Thanks,
Kirti.




+static ssize_t vgpu_dev_rw(void *device_data, char __user *buf,
+   size_t count, loff_t *ppos, bool iswrite)
+{
+   unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+   struct vfio_vgpu_device *vdev = device_data;
+
+   if (index >= VFIO_PCI_NUM_REGIONS)
+   return -EINVAL;
+
+   switch (index) {
+   case VFIO_PCI_CONFIG_REGION_INDEX:
+   return vgpu_dev_config_rw(vdev, buf, count, ppos, iswrite);
+
+   case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
+   return vgpu_dev_bar_rw(vdev, buf, count, ppos, iswrite);
+
+   case VFIO_PCI_ROM_REGION_INDEX:
+   case VFIO_PCI_VGA_REGION_INDEX:


Wait a sec, who's doing the VGA emulation?  We can't be claiming to
support a VGA region and then fail to provide read/write access to it
like we said it has.


On the Intel side, we plan not to support the VGA region when upstreaming
our KVMGT work, which means the Intel vGPU will be exposed only as a
secondary graphics card, so legacy VGA is not required. There is also no
VBIOS/ROM requirement. I guess we can remove the above two regions.


So this needs to be optional based on what the mediation driver
provides.  It seems like we're just making passthroughs for the vendor
mediation driver to speak vfio.


+
+static int vgpu_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+   int ret = 0;
+   struct vfio_vgpu_device *vdev = vma->vm_private_data;
+   struct vgpu_device *vgpu_dev;
+   struct gpu_device *gpu_dev;
+   u64 virtaddr = (u64)vmf->virtual_address;
+   u64 offset, phyaddr;
+   unsigned long req_size, pgoff;
+   pgprot_t pg_prot;
+
+   if (!vdev && !vdev->vgpu_dev)
+   return -EINVAL;
+
+   vgpu_dev = vdev->vgpu_dev;
+   gpu_dev  = vgpu_dev->gpu_dev;
+
+   offset   = vma->vm_pgoff << PAGE_SHIFT;
+   phyaddr  = virtaddr - vma->vm_start + offset;
+   pgoff= phyaddr >> PAGE_SHIFT;
+   req_size = vma->vm_end - virtaddr;
+   pg_prot  = vma->vm_page_prot;
+
+   if (gpu_dev->ops->validate_map_request) {
+   ret = gpu_dev->ops->validate_map_request(vgpu_dev, virtaddr, &pgoff,
+&req_size, &pg_prot);
+   if (ret)
+   return ret;
+
+   if (!req_size)
+   return -EINVAL;
+   }
+
+   ret = remap_pfn_range(vma, virtaddr, pgoff, req_size, pg_prot);


So not supporting validate_map_request() means that the user can
direc

Re: [Qemu-devel] [PATCH v6 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-05-04 Thread Peter Xu

On Thu, May 05, 2016 at 11:25:35AM +0800, Peter Xu wrote:
> Hi, all,
> 
> This is v6 for Intel IOMMU IR support. This series introduced quite
> a few new patches based on v5. Sorry for that (Yes, Jan is
> contributing to it as well, though most of which are really good
> ideas for me :). Hopefully we can get its convergence in this
> version.
> 
> To make the review easier, I tried to keep all the existing patches
> and indexes (also, this is easier for me too to do the
> modifications, and logically I feel this make more sense and clean,
> please let me know if I am wrong). Patches 1-18 are v5 patches, and
> patches 19-26 are newly added patches.
> 
> All the new patches may need more review, many of them are outside
> Intel IOMMU scope, and touching other part of codes, which I am
> still not very sure about.
> 
> Testing is only covering basic smoke test for the following matrix:
> 
> - IR enabled/disable
> - kernel irqchip off/split
> - network device: tap with/without vhost, e1000

Again... Missing link:

  https://github.com/xzpeter/qemu vtd-intr-v6

Thanks,

-- peterx

[Qemu-devel] [PATCH v4 3/6] qemu-io: Allow unaligned access by default

2016-05-04 Thread Eric Blake

There's no reason to require the user to specify a flag just so
they can pass in unaligned numbers.  Keep 'read -p' and 'write -p'
as no-ops so that I don't have to hunt down and update all users
of qemu-io, but otherwise make their behavior default as 'read' and
'write'.  Also fix 'write -z', 'readv', 'writev',
'aio_read', 'aio_write', and 'aio_write -z'.  For now, 'read -b',
'multiwrite', 'write -b', and 'write -c' still require alignment.

qemu-iotest 23 is updated to match, as the only test that was
previously explicitly expecting an error on an unaligned request.
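
For readers unfamiliar with the `& 0x1ff` tests that this patch removes: they check whether a value is a multiple of the 512-byte sector size, because 0x1ff is 511, the mask of the low nine bits. A minimal standalone sketch of that check follows. The helper names and SKETCH_SECTOR_SIZE are hypothetical and do not exist in qemu-io.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512   /* sector size implied by the 0x1ff mask */

/* True if value is a multiple of the sector size (low 9 bits all zero). */
bool is_sector_aligned(int64_t value)
{
    return (value & (SKETCH_SECTOR_SIZE - 1)) == 0;
}

/* Round a non-negative value down to the previous sector boundary. */
int64_t sector_align_down(int64_t value)
{
    assert(value >= 0);
    return value & ~(int64_t)(SKETCH_SECTOR_SIZE - 1);
}

/* Round a non-negative value up to the next sector boundary. */
int64_t sector_align_up(int64_t value)
{
    assert(value >= 0 && value <= INT64_MAX - (SKETCH_SECTOR_SIZE - 1));
    return sector_align_down(value + SKETCH_SECTOR_SIZE - 1);
}
```

After this patch, only the paths listed above as still requiring alignment would need a check of this kind.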

Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c |   63 +-
 tests/qemu-iotests/023.out | 2160 +---
 2 files changed, 1452 insertions(+), 771 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index dc6b0dc..8bcf742 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -395,12 +395,6 @@ create_iovec(BlockBackend *blk, QEMUIOVector *qiov, char **argv, int nr_iov,
 goto fail;
 }

-if (len & 0x1ff) {
-printf("length argument %" PRId64
-   " is not sector aligned\n", len);
-goto fail;
-}
-
 sizes[i] = len;
 count += len;
 }
@@ -634,7 +628,7 @@ static void read_help(void)
 " -b, -- read from the VM state rather than the virtual disk\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -l, -- length for pattern verification (only with -P)\n"
-" -p, -- allow unaligned access\n"
+" -p, -- ignored for back-compat\n"
 " -P, -- use a pattern to verify read data\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 " -s, -- start offset for pattern verification (only with -P)\n"
@@ -650,7 +644,7 @@ static const cmdinfo_t read_cmd = {
 .cfunc  = read_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-abCpqv] [-P pattern [-s off] [-l len]] off len",
+.args   = "[-abCqv] [-P pattern [-s off] [-l len]] off len",
 .oneline= "reads a number of bytes at a specified offset",
 .help   = read_help,
 };
@@ -658,7 +652,7 @@ static const cmdinfo_t read_cmd = {
 static int read_f(BlockBackend *blk, int argc, char **argv)
 {
 struct timeval t1, t2;
-bool Cflag = false, pflag = false, qflag = false, vflag = false;
+bool Cflag = false, qflag = false, vflag = false;
 bool Pflag = false, sflag = false, lflag = false, bflag = false;
 int c, cnt;
 char *buf;
@@ -686,7 +680,7 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 }
 break;
 case 'p':
-pflag = true;
+/* Ignored for back-compat */
 break;
 case 'P':
 Pflag = true;
@@ -718,11 +712,6 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 return qemuio_command_usage(&read_cmd);
 }

-if (bflag && pflag) {
-printf("-b and -p cannot be specified at the same time\n");
-return 0;
-}
-
 offset = cvtnum(argv[optind]);
 if (offset < 0) {
 print_cvtnum_err(offset, argv[optind]);
@@ -753,7 +742,7 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 return 0;
 }

-if (!pflag) {
+if (bflag) {
 if (offset & 0x1ff) {
 printf("offset %" PRId64 " is not sector aligned\n",
offset);
@@ -890,12 +879,6 @@ static int readv_f(BlockBackend *blk, int argc, char **argv)
 }
 optind++;

-if (offset & 0x1ff) {
-printf("offset %" PRId64 " is not sector aligned\n",
-   offset);
-return 0;
-}
-
 nr_iov = argc - optind;
 buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, 0xab);
 if (buf == NULL) {
@@ -952,7 +935,7 @@ static void write_help(void)
 " filled with a set pattern (0xcdcdcdcd).\n"
 " -b, -- write to the VM state rather than the virtual disk\n"
 " -c, -- write compressed data with blk_write_compressed\n"
-" -p, -- allow unaligned access\n"
+" -p, -- ignored for back-compat\n"
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
@@ -968,7 +951,7 @@ static const cmdinfo_t write_cmd = {
 .cfunc  = write_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-bcCpqz] [-P pattern ] off len",
+.args   = "[-bcCqz] [-P pattern ] off len",
 .oneline= "writes a number of bytes at a specified offset",
 .help   = write_help,
 };
@@ -976,7 +959,7 @@ static const cmdinfo_t write_cmd = {
 static int write_f(BlockBackend *blk, int argc, char **argv)
 {
 struct timeval t1, t2;
-bool Cflag = false, pflag = false, qflag = false, bflag = false;
+bool Cflag = false, qflag = false, bflag = false;
 bool Pflag = false, zflag = false, cflag = false;
 int c, cnt;
 char *buf = NULL;
@@ -998,7 +981,7 @@ static int write_f(BlockBackend *blk, int argc, char **a

[Qemu-devel] [PATCH v4 1/6] qemu-io: Add missing option documentation

2016-05-04 Thread Eric Blake

Commit 499afa2 added --image-opts, but forgot to document it in
--help.  Likewise for commit 9e8f183 and -d/--discard.

Finally, commit 10d9d75 removed -g/--growable, but forgot to
cull it from the valid short options.

Signed-off-by: Eric Blake 
---
 qemu-io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/qemu-io.c b/qemu-io.c
index 0598251..4aba7e0 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -221,6 +221,7 @@ static void usage(const char *name)
 "\n"
 "  --object OBJECTDEF   define an object such as 'secret' for\n"
 "   passwords and/or encryption keys\n"
+"  --image-opts treat file as option string\n"
 "  -c, --cmd STRING execute command with its arguments\n"
 "   from the given string\n"
 "  -f, --format FMT specifies the block driver to use\n"
@@ -230,6 +231,7 @@ static void usage(const char *name)
 "  -m, --misalign   misalign allocations for O_DIRECT\n"
 "  -k, --native-aio use kernel AIO implementation (on Linux only)\n"
 "  -t, --cache=MODE use the given cache mode for the image\n"
+"  -d, --discard=MODE   use the given discard mode for the image\n"
 "  -T, --trace FILE enable trace events listed in the given file\n"
 "  -h, --help   display this help and exit\n"
 "  -V, --versionoutput version information and exit\n"
@@ -410,7 +412,7 @@ static QemuOptsList file_opts = {
 int main(int argc, char **argv)
 {
 int readonly = 0;
-const char *sopt = "hVc:d:f:rsnmgkt:T:";
+const char *sopt = "hVc:d:f:rsnmkt:T:";
 const struct option lopt[] = {
 { "help", no_argument, NULL, 'h' },
 { "version", no_argument, NULL, 'V' },
-- 
2.5.5

[Qemu-devel] [PATCH v4 2/6] qemu-io: Use bool for command line flags

2016-05-04 Thread Eric Blake

We require a C99 compiler; let's use it to express what we
really mean.

Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c | 94 +-
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 0bbbc72..dc6b0dc 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -345,7 +345,7 @@ static void dump_buffer(const void *buffer, int64_t offset, int64_t len)
 }

 static void print_report(const char *op, struct timeval *t, int64_t offset,
- int64_t count, int64_t total, int cnt, int Cflag)
+ int64_t count, int64_t total, int cnt, bool Cflag)
 {
 char s1[64], s2[64], ts[64];

@@ -658,8 +658,8 @@ static const cmdinfo_t read_cmd = {
 static int read_f(BlockBackend *blk, int argc, char **argv)
 {
 struct timeval t1, t2;
-int Cflag = 0, pflag = 0, qflag = 0, vflag = 0;
-int Pflag = 0, sflag = 0, lflag = 0, bflag = 0;
+bool Cflag = false, pflag = false, qflag = false, vflag = false;
+bool Pflag = false, sflag = false, lflag = false, bflag = false;
 int c, cnt;
 char *buf;
 int64_t offset;
@@ -672,13 +672,13 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 while ((c = getopt(argc, argv, "bCl:pP:qs:v")) != -1) {
 switch (c) {
 case 'b':
-bflag = 1;
+bflag = true;
 break;
 case 'C':
-Cflag = 1;
+Cflag = true;
 break;
 case 'l':
-lflag = 1;
+lflag = true;
 pattern_count = cvtnum(optarg);
 if (pattern_count < 0) {
 print_cvtnum_err(pattern_count, optarg);
@@ -686,20 +686,20 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 }
 break;
 case 'p':
-pflag = 1;
+pflag = true;
 break;
 case 'P':
-Pflag = 1;
+Pflag = true;
 pattern = parse_pattern(optarg);
 if (pattern < 0) {
 return 0;
 }
 break;
 case 'q':
-qflag = 1;
+qflag = true;
 break;
 case 's':
-sflag = 1;
+sflag = true;
 pattern_offset = cvtnum(optarg);
 if (pattern_offset < 0) {
 print_cvtnum_err(pattern_offset, optarg);
@@ -707,7 +707,7 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 }
 break;
 case 'v':
-vflag = 1;
+vflag = true;
 break;
 default:
 return qemuio_command_usage(&read_cmd);
@@ -844,7 +844,7 @@ static const cmdinfo_t readv_cmd = {
 static int readv_f(BlockBackend *blk, int argc, char **argv)
 {
 struct timeval t1, t2;
-int Cflag = 0, qflag = 0, vflag = 0;
+bool Cflag = false, qflag = false, vflag = false;
 int c, cnt;
 char *buf;
 int64_t offset;
@@ -853,25 +853,25 @@ static int readv_f(BlockBackend *blk, int argc, char **argv)
 int nr_iov;
 QEMUIOVector qiov;
 int pattern = 0;
-int Pflag = 0;
+bool Pflag = false;

 while ((c = getopt(argc, argv, "CP:qv")) != -1) {
 switch (c) {
 case 'C':
-Cflag = 1;
+Cflag = true;
 break;
 case 'P':
-Pflag = 1;
+Pflag = true;
 pattern = parse_pattern(optarg);
 if (pattern < 0) {
 return 0;
 }
 break;
 case 'q':
-qflag = 1;
+qflag = true;
 break;
 case 'v':
-vflag = 1;
+vflag = true;
 break;
 default:
 return qemuio_command_usage(&readv_cmd);
@@ -976,8 +976,8 @@ static const cmdinfo_t write_cmd = {
 static int write_f(BlockBackend *blk, int argc, char **argv)
 {
 struct timeval t1, t2;
-int Cflag = 0, pflag = 0, qflag = 0, bflag = 0, Pflag = 0, zflag = 0;
-int cflag = 0;
+bool Cflag = false, pflag = false, qflag = false, bflag = false;
+bool Pflag = false, zflag = false, cflag = false;
 int c, cnt;
 char *buf = NULL;
 int64_t offset;
@@ -989,29 +989,29 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 while ((c = getopt(argc, argv, "bcCpP:qz")) != -1) {
 switch (c) {
 case 'b':
-bflag = 1;
+bflag = true;
 break;
 case 'c':
-cflag = 1;
+cflag = true;
 break;
 case 'C':
-Cflag = 1;
+Cflag = true;
 break;
 case 'p':
-pflag = 1;
+pflag = true;
 break;
 case 'P':
-Pflag = 1;
+Pflag = true;
 pattern = parse_pattern(optarg);
 if (pattern < 0) {
 return 0;
 }

[Qemu-devel] [PATCH v4 5/6] qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact

2016-05-04 Thread Eric Blake

When opening a file from the command line, qemu-io defaults
to BDRV_O_UNMAP but allows -d to give full control to disable
unmaps. But when opening via the 'open' command, qemu-io did
not set BDRV_O_UNMAP, and had no way to allow it.

Make it at least possible to symmetrically test things:
'qemu-io -d ignore' at the CLI now matches 'qemu-io> open'
in batch mode, and 'qemu-io' or 'qemu-io -d unmap' at
the CLI matches 'qemu-io> open -u'.

Signed-off-by: Eric Blake 
---
 qemu-io.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 4aba7e0..2196159 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -107,6 +107,7 @@ static void open_help(void)
 " -r, -- open file read-only\n"
 " -s, -- use snapshot file\n"
 " -n, -- disable host cache\n"
+" -u, -- allow discard and zero operations to unmap\n"
 " -o, -- options to be given to the block driver"
 "\n");
 }
@@ -120,7 +121,7 @@ static const cmdinfo_t open_cmd = {
 .argmin = 1,
 .argmax = -1,
 .flags  = CMD_NOFILE_OK,
-.args   = "[-Crsn] [-o options] [path]",
+.args   = "[-Crsnu] [-o options] [path]",
 .oneline= "open the file specified by path",
 .help   = open_help,
 };
@@ -144,7 +145,7 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
 QemuOpts *qopts;
 QDict *opts;

-while ((c = getopt(argc, argv, "snrgo:")) != -1) {
+while ((c = getopt(argc, argv, "snrguo:")) != -1) {
 switch (c) {
 case 's':
 flags |= BDRV_O_SNAPSHOT;
@@ -156,6 +157,9 @@ static int open_f(BlockBackend *blk, int argc, char **argv)
 case 'r':
 readonly = 1;
 break;
+case 'u':
+flags |= BDRV_O_UNMAP;
+break;
 case 'o':
 if (imageOpts) {
 printf("--image-opts and 'open -o' are mutually exclusive\n");
-- 
2.5.5

[Qemu-devel] [PATCH v4 6/6] qemu-io: Add 'write -z -u' to test MAY_UNMAP flag

2016-05-04 Thread Eric Blake

Make it easier to control whether the BDRV_REQ_MAY_UNMAP flag
can be passed through a write_zeroes command, by adding the '-u'
flag to qemu-io 'write -z' and 'aio_write -z'.  To be useful,
the device has to be opened with 'qemu-io -d unmap' (or the
just-added 'open -u' subcommand).

Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index ba811fe..e71bc5c 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -943,6 +943,7 @@ static void write_help(void)
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
+" -u, -- with -z, allow unmapping\n"
 " -z, -- write zeroes using blk_co_write_zeroes\n"
 "\n");
 }
@@ -955,7 +956,7 @@ static const cmdinfo_t write_cmd = {
 .cfunc  = write_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-bcCfqz] [-P pattern ] off len",
+.args   = "[-bcCfquz] [-P pattern ] off len",
 .oneline= "writes a number of bytes at a specified offset",
 .help   = write_help,
 };
@@ -974,7 +975,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 int64_t total = 0;
 int pattern = 0xcd;

-while ((c = getopt(argc, argv, "bcCfpP:qz")) != -1) {
+while ((c = getopt(argc, argv, "bcCfpP:quz")) != -1) {
 switch (c) {
 case 'b':
 bflag = true;
@@ -1001,6 +1002,9 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 case 'q':
 qflag = true;
 break;
+case 'u':
+flags |= BDRV_REQ_MAY_UNMAP;
+break;
 case 'z':
 zflag = true;
 break;
@@ -1023,6 +1027,11 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 return 0;
 }

+if ((flags & BDRV_REQ_MAY_UNMAP) && !zflag) {
+printf("-u requires -z to be specified\n");
+return 0;
+}
+
 if (zflag && Pflag) {
 printf("-z and -P cannot be specified at the same time\n");
 return 0;
@@ -1561,6 +1570,7 @@ static void aio_write_help(void)
 " -C, -- report statistics in a machine parsable format\n"
 " -f, -- use Force Unit Access semantics\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
+" -u, -- with -z, allow unmapping\n"
 " -z, -- write zeroes using blk_aio_write_zeroes\n"
 "\n");
 }
@@ -1572,7 +1582,7 @@ static const cmdinfo_t aio_write_cmd = {
 .cfunc  = aio_write_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-Cfqz] [-P pattern ] off len [len..]",
+.args   = "[-Cfquz] [-P pattern ] off len [len..]",
 .oneline= "asynchronously writes a number of bytes",
 .help   = aio_write_help,
 };
@@ -1596,6 +1606,9 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv)
 case 'q':
 ctx->qflag = true;
 break;
+case 'u':
+flags |= BDRV_REQ_MAY_UNMAP;
+break;
 case 'P':
 pattern = parse_pattern(optarg);
 if (pattern < 0) {
@@ -1623,6 +1636,11 @@ static int aio_write_f(BlockBackend *blk, int argc, char **argv)
 return 0;
 }

+if ((flags & BDRV_REQ_MAY_UNMAP) && !ctx->zflag) {
+printf("-u requires -z to be specified\n");
+return 0;
+}
+
 if (ctx->zflag && ctx->Pflag) {
 printf("-z and -P cannot be specified at the same time\n");
 g_free(ctx);
-- 
2.5.5

[Qemu-devel] [PATCH v4 4/6] qemu-io: Add 'write -f' to test FUA flag

2016-05-04 Thread Eric Blake

Make it easier to test block drivers with BDRV_REQ_FUA in
.supported_write_flags, by adding the '-f' flag to qemu-io to
conditionally pass the flag through to specific writes ('write',
'write -z', 'writev', 'aio_write', 'aio_write -z'). You'll want
to use 'qemu-io -t none' to actually make -f useful (as
otherwise, the default writethrough mode automatically sets the
FUA bit on every write).

Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c | 57 +
 1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 8bcf742..ba811fe 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -428,13 +428,13 @@ static int do_pread(BlockBackend *blk, char *buf, int64_t offset,
 }

 static int do_pwrite(BlockBackend *blk, char *buf, int64_t offset,
- int64_t count, int64_t *total)
+ int64_t count, int flags, int64_t *total)
 {
 if (count > INT_MAX) {
 return -ERANGE;
 }

-*total = blk_pwrite(blk, offset, (uint8_t *)buf, count, 0);
+*total = blk_pwrite(blk, offset, (uint8_t *)buf, count, flags);
 if (*total < 0) {
 return *total;
 }
@@ -446,6 +446,7 @@ typedef struct {
 int64_t offset;
 int64_t count;
 int64_t *total;
+int flags;
 int ret;
 bool done;
 } CoWriteZeroes;
@@ -454,7 +455,8 @@ static void coroutine_fn co_write_zeroes_entry(void *opaque)
 {
 CoWriteZeroes *data = opaque;

-data->ret = blk_co_write_zeroes(data->blk, data->offset, data->count, 0);
+data->ret = blk_co_write_zeroes(data->blk, data->offset, data->count,
+data->flags);
 data->done = true;
 if (data->ret < 0) {
 *data->total = data->ret;
@@ -465,7 +467,7 @@ static void coroutine_fn co_write_zeroes_entry(void *opaque)
 }

 static int do_co_write_zeroes(BlockBackend *blk, int64_t offset, int64_t count,
-  int64_t *total)
+  int flags, int64_t *total)
 {
 Coroutine *co;
 CoWriteZeroes data = {
@@ -473,6 +475,7 @@ static int do_co_write_zeroes(BlockBackend *blk, int64_t offset, int64_t count,
 .offset = offset,
 .count  = count,
 .total  = total,
+.flags  = flags,
 .done   = false,
 };

@@ -558,11 +561,11 @@ static int do_aio_readv(BlockBackend *blk, QEMUIOVector *qiov,
 }

 static int do_aio_writev(BlockBackend *blk, QEMUIOVector *qiov,
- int64_t offset, int *total)
+ int64_t offset, int flags, int *total)
 {
 int async_ret = NOT_DONE;

-blk_aio_pwritev(blk, offset, qiov, 0, aio_rw_done, &async_ret);
+blk_aio_pwritev(blk, offset, qiov, flags, aio_rw_done, &async_ret);
 while (async_ret == NOT_DONE) {
 main_loop_wait(false);
 }
@@ -935,6 +938,7 @@ static void write_help(void)
 " filled with a set pattern (0xcdcdcdcd).\n"
 " -b, -- write to the VM state rather than the virtual disk\n"
 " -c, -- write compressed data with blk_write_compressed\n"
+" -f, -- use Force Unit Access semantics\n"
 " -p, -- ignored for back-compat\n"
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
@@ -951,7 +955,7 @@ static const cmdinfo_t write_cmd = {
 .cfunc  = write_f,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-bcCqz] [-P pattern ] off len",
+.args   = "[-bcCfqz] [-P pattern ] off len",
 .oneline= "writes a number of bytes at a specified offset",
 .help   = write_help,
 };
@@ -961,6 +965,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 struct timeval t1, t2;
 bool Cflag = false, qflag = false, bflag = false;
 bool Pflag = false, zflag = false, cflag = false;
+int flags = 0;
 int c, cnt;
 char *buf = NULL;
 int64_t offset;
@@ -969,7 +974,7 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 int64_t total = 0;
 int pattern = 0xcd;

-while ((c = getopt(argc, argv, "bcCpP:qz")) != -1) {
+while ((c = getopt(argc, argv, "bcCfpP:qz")) != -1) {
 switch (c) {
 case 'b':
 bflag = true;
@@ -980,6 +985,9 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 case 'C':
 Cflag = true;
 break;
+case 'f':
+flags |= BDRV_REQ_FUA;
+break;
 case 'p':
 /* Ignored for back-compat */
 break;
@@ -1010,6 +1018,11 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 return 0;
 }

+if ((flags & BDRV_REQ_FUA) && (bflag + cflag)) {
+printf("-f and -b or -c cannot be specified at the same time\n");
+return 0;
+}
+
 if (zflag && Pflag) {
 printf("-z and -P cannot be specified at the same time\n");
 return 0;
@@ -1054,11 +1067,11 @@ static int write_f(BlockBackend *blk, int argc, char

[Qemu-devel] [PATCH v4 0/6] qemu-io: UI enhancements

2016-05-04 Thread Eric Blake

While working on NBD, I found myself cursing the qemu-io UI for
not letting me test various scenarios, particularly after fixing
NBD to serve at byte granularity [1].  And in the process of
writing these qemu-io enhancements, I also managed to flush out
several other bugs in the block layer proper, with fixes posted
separately, such as loss of BDRV_REQ_FUA during write_zeroes [2].

2.7 material, depends on Kevin's block-next:
git://repo.or.cz/qemu/kevin.git block-next
and on my pending "block: kill sector-based blk_write/read"
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg00557.html

Previously posted as part of a larger NBD series [3] at v3, hence
this series starts life at v4.

[1] commit df7b97ff
[2] https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg00285.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg03526.html

Also available as a tag at this location:
git fetch git://repo.or.cz/qemu/ericb.git nbd-qemu-io-v4

Changes since then:
More cleanups
Include readv/writev and aio_read/aio_write.
Update qemu-iotests 23 to match new 'write' semantics [kwolf]

001/6:[----] [--] 'qemu-io: Add missing option documentation'
002/6:[down] 'qemu-io: Use bool for command line flags'
003/6:[down] 'qemu-io: Allow unaligned access by default'
004/6:[0056] [FC] 'qemu-io: Add 'write -f' to test FUA flag'
005/6:[----] [--] 'qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact'
006/6:[0026] [FC] 'qemu-io: Add 'write -z -u' to test MAY_UNMAP flag'

Eric Blake (6):
  qemu-io: Add missing option documentation
  qemu-io: Use bool for command line flags
  qemu-io: Allow unaligned access by default
  qemu-io: Add 'write -f' to test FUA flag
  qemu-io: Add 'open -u' to set BDRV_O_UNMAP after the fact
  qemu-io: Add 'write -z -u' to test MAY_UNMAP flag

 qemu-io-cmds.c |  222 ++---
 qemu-io.c  |   12 +-
 tests/qemu-iotests/023.out | 2160 +---
 3 files changed, 1562 insertions(+), 832 deletions(-)

-- 
2.5.5

[Qemu-devel] [PATCH v6 24/26] kvm-irqchip: i386: add hook for add/remove virq

2016-05-04 Thread Peter Xu

Add two hooks that notify the arch code when MSI routes are added or
removed. On x86, one list is maintained for all existing MSI routes.

Signed-off-by: Peter Xu 
---
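For illustration only, the bookkeeping behind the two hooks can be
sketched in standalone C. The names below (RouteEntry, route_add_post,
route_release_post, route_count) are hypothetical, and a plain singly
linked list stands in for QEMU's QLIST macros. Unlike the x86 hunk
below, the sketch also frees the entry it unlinks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-route bookkeeping entry. */
typedef struct RouteEntry {
    int vector;               /* MSI/MSI-X vector index */
    int virq;                 /* virtual IRQ the route was given */
    struct RouteEntry *next;
} RouteEntry;

static RouteEntry *route_list;

/* "add" hook: remember a newly added route. Returns 0, or -1 on OOM. */
int route_add_post(int vector, int virq)
{
    RouteEntry *e = calloc(1, sizeof(*e));
    if (!e) {
        return -1;
    }
    e->vector = vector;
    e->virq = virq;
    e->next = route_list;     /* insert at head */
    route_list = e;
    return 0;
}

/* "release" hook: forget the route for virq. Returns 1 if one was removed. */
int route_release_post(int virq)
{
    RouteEntry **pp = &route_list;

    while (*pp) {
        if ((*pp)->virq == virq) {
            RouteEntry *victim = *pp;
            *pp = victim->next;   /* unlink */
            free(victim);
            return 1;
        }
        pp = &(*pp)->next;
    }
    return 0;
}

/* Number of routes currently tracked. */
int route_count(void)
{
    int n = 0;
    for (RouteEntry *e = route_list; e; e = e->next) {
        n++;
    }
    return n;
}
```

A caller would invoke route_add_post() right after a route has been
created and route_release_post() right after the virq has been cleared,
which is where the two hooks are called in kvm-all.c.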
 include/sysemu/kvm.h |  6 ++
 kvm-all.c|  2 ++
 target-arm/kvm.c | 11 +++
 target-i386/kvm.c| 38 ++
 target-mips/kvm.c| 11 +++
 target-ppc/kvm.c | 11 +++
 target-s390x/kvm.c   | 11 +++
 trace-events |  2 ++
 8 files changed, 92 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 20b52f0..94a7f63 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -355,6 +355,12 @@ void kvm_arch_init_irq_routing(KVMState *s);
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
  uint64_t address, uint32_t data, PCIDevice *dev);
 
+/* Notify arch about newly added MSI routes */
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+int vector, PCIDevice *dev);
+/* Notify arch about released MSI routes */
+int kvm_arch_release_virq_post(int virq);
+
 int kvm_arch_msi_data_to_gsi(uint32_t data);
 
 int kvm_set_irq(KVMState *s, int irq, int level);
diff --git a/kvm-all.c b/kvm-all.c
index f0dc769..a984564 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1073,6 +1073,7 @@ void kvm_irqchip_release_virq(KVMState *s, int virq)
 }
 }
 clear_gsi(s, virq);
+kvm_arch_release_virq_post(virq);
 }
 
 static unsigned int kvm_hash_msi(uint32_t data)
@@ -1221,6 +1222,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, 
PCIDevice *dev)
 }
 
 kvm_add_routing_entry(s, &kroute);
+kvm_arch_add_msi_route_post(&kroute, vector, dev);
 kvm_irqchip_commit_routes(s);
 
 return virq;
diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 3671032..90c293e 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -622,6 +622,17 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry 
*route,
 return 0;
 }
 
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+int vector, PCIDevice *dev)
+{
+return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 return (data - 32) & 0xffff;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d1a4d77..f043e45 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3355,6 +3355,44 @@ int kvm_arch_fixup_msi_route(struct 
kvm_irq_routing_entry *route,
 return 0;
 }
 
+typedef struct MSIRouteEntry MSIRouteEntry;
+
+struct MSIRouteEntry {
+PCIDevice *dev; /* Device pointer */
+int vector; /* MSI/MSIX vector index */
+int virq;   /* Virtual IRQ index */
+QLIST_ENTRY(MSIRouteEntry) list;
+};
+
+/* List of used GSI routes */
+static QLIST_HEAD(, MSIRouteEntry) msi_route_list = \
+QLIST_HEAD_INITIALIZER(msi_route_list);
+
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+int vector, PCIDevice *dev)
+{
+MSIRouteEntry *entry = g_new0(MSIRouteEntry, 1);
+entry->dev = dev;
+entry->vector = vector;
+entry->virq = route->gsi;
+QLIST_INSERT_HEAD(&msi_route_list, entry, list);
+trace_kvm_x86_add_msi_route(route->gsi);
+return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+MSIRouteEntry *entry, *next;
+QLIST_FOREACH_SAFE(entry, &msi_route_list, list, next) {
+if (entry->virq == virq) {
+trace_kvm_x86_remove_msi_route(virq);
+QLIST_REMOVE(entry, list);
+break;
+}
+}
+return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 abort();
diff --git a/target-mips/kvm.c b/target-mips/kvm.c
index 950bc05..1dd7904 100644
--- a/target-mips/kvm.c
+++ b/target-mips/kvm.c
@@ -1044,6 +1044,17 @@ int kvm_arch_fixup_msi_route(struct 
kvm_irq_routing_entry *route,
 return 0;
 }
 
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+int vector, PCIDevice *dev)
+{
+return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 abort();
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index c4c8146..143a2bf 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -2566,6 +2566,17 @@ int kvm_arch_fixup_msi_route(struct 
kvm_irq_routing_entry *route,
 return 0;
 }
 
+int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
+int vector, PCIDevice *dev)
+{
+return 0;
+}
+
+int kvm_arch_release_virq_post(int virq)
+{
+return 0;
+}
+
 int kvm_arch_msi_data_to_gsi(uint32_t data)
 {
 return data & 0xffff;
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index e1859ca..22d2ed4 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -2246,6 +2246,17 @@ int kvm_arch_fixup_

Re: [Qemu-devel] [PATCH 11/18] vhost-user: add shutdown support

2016-05-04 Thread Yuanhan Liu

Hello,

On Wed, May 04, 2016 at 10:13:49PM +0300, Michael S. Tsirkin wrote:
> How do you avoid it?
> 
> > > Management is required to make this robust, auto-reconnect
> > > is handy for people bypassing management.
> > 
> > tbh, I don't like autoreconnect. My previous series didn't include
> > this and assumed the feature would be supported only when qemu is
> > configured to be the server. I added reconnect upon request by users.
> 
> I don't have better solutions so OK I guess.

Yes, it's a request from me :)
Well, a few others may have requested it as well.

The reason I asked for this is that, so far, we have only one
vhost-user frontend, and that is QEMU. But we may have many backends;
I'm aware of 4 at the time of writing, including the vubr from QEMU.
While we could implement vhost-user client mode and reconnect
in all backends, it's clear that implementing it in the only frontend
(QEMU) brings more benefit.

OTOH, I could implement DPDK vhost-user as a client and try reconnecting
there, if that would make things a bit easier.

--yliu

[Qemu-devel] [PATCH v6 26/26] kvm-irqchip: do explicit commit when update irq

2016-05-04 Thread Peter Xu

Previously, we committed the GSI routes on every irqchip route
update. This is inefficient when many routes are updated at the
same time. This patch removes the commit phase from
kvm_irqchip_update_msi_route(). Instead, callers commit explicitly
once all routes have been updated.

Signed-off-by: Peter Xu 
---
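A minimal sketch of the update-then-commit pattern, assuming a toy
routing table: RouteTable, table_update_route, table_commit and
table_update_all are hypothetical names, and the "committed" array
merely stands in for the table that a real commit would push to the
kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROUTES 8

typedef struct RouteTable {
    unsigned staged[MAX_ROUTES];     /* userspace copy, cheap to edit */
    unsigned committed[MAX_ROUTES];  /* stand-in for the kernel's table */
    int commits;                     /* number of expensive commits done */
} RouteTable;

/* Update one route in the staged copy only; no commit happens here. */
int table_update_route(RouteTable *t, size_t idx, unsigned data)
{
    if (idx >= MAX_ROUTES) {
        return -1;
    }
    t->staged[idx] = data;
    return 0;
}

/* Push the whole staged table in one go. */
void table_commit(RouteTable *t)
{
    for (size_t i = 0; i < MAX_ROUTES; i++) {
        t->committed[i] = t->staged[i];
    }
    t->commits++;
}

/* Update n routes, then commit exactly once. */
int table_update_all(RouteTable *t, const unsigned *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (table_update_route(t, i, data[i]) < 0) {
            return -1;
        }
    }
    table_commit(t);
    return 0;
}
```

With this split, updating N routes costs one commit instead of N. The
price is that every caller of the update function has to remember the
explicit commit, which is why the hunks below add one at each call site.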
 hw/i386/kvm/pci-assign.c | 2 ++
 hw/misc/ivshmem.c| 1 +
 hw/vfio/pci.c| 1 +
 hw/virtio/virtio-pci.c   | 1 +
 include/sysemu/kvm.h | 2 +-
 kvm-all.c| 2 --
 kvm-stub.c   | 4 
 target-i386/kvm.c| 1 +
 8 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 85caf37..3615916 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -1015,6 +1015,7 @@ static void assigned_dev_update_msi_msg(PCIDevice 
*pci_dev)
 
 kvm_irqchip_update_msi_route(kvm_state, assigned_dev->msi_virq[0],
  msi_get_message(pci_dev, 0), pci_dev);
+kvm_irqchip_commit_routes(kvm_state);
 }
 
 static bool assigned_dev_msix_masked(MSIXTableEntry *entry)
@@ -1601,6 +1602,7 @@ static void assigned_dev_msix_mmio_write(void *opaque, 
hwaddr addr,
 if (ret) {
 error_report("Error updating irq routing entry (%d)", ret);
 }
+kvm_irqchip_commit_routes(kvm_state);
 }
 }
 }
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 6909346..953d7f8 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -325,6 +325,7 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned 
vector,
 if (ret < 0) {
 return ret;
 }
+kvm_irqchip_commit_routes(kvm_state);
 
 return kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, n, NULL, v->virq);
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2b2f935..eb09bc6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -458,6 +458,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, 
MSIMessage msg,
  PCIDevice *pdev)
 {
 kvm_irqchip_update_msi_route(kvm_state, vector->virq, msg, pdev);
+kvm_irqchip_commit_routes(kvm_state);
 }
 
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index df85f28..6342435 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -872,6 +872,7 @@ static int virtio_pci_vq_vector_unmask(VirtIOPCIProxy 
*proxy,
 if (ret < 0) {
 return ret;
 }
+kvm_irqchip_commit_routes(kvm_state);
 }
 }
 
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 94a7f63..b7a20eb 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -367,7 +367,6 @@ int kvm_set_irq(KVMState *s, int irq, int level);
 int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
 void kvm_irqchip_add_irq_route(KVMState *s, int gsi, int irqchip, int pin);
-void kvm_irqchip_commit_routes(KVMState *s);
 
 void kvm_put_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
 void kvm_get_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
@@ -490,6 +489,7 @@ static inline void cpu_synchronize_post_init(CPUState *cpu)
 int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
  PCIDevice *dev);
+void kvm_irqchip_commit_routes(KVMState *s);
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
diff --git a/kvm-all.c b/kvm-all.c
index 95f1df3..8106efb 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1034,8 +1034,6 @@ static int kvm_update_routing_entry(KVMState *s,
 
 *entry = *new_entry;
 
-kvm_irqchip_commit_routes(s);
-
 return 0;
 }
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 421c9ce..d2c1a5b 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -131,6 +131,10 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg,
 return -ENOSYS;
 }
 
+void kvm_irqchip_commit_routes(KVMState *s)
+{
+}
+
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
 {
 return -ENOSYS;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 579662b..80b3251 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -3382,6 +3382,7 @@ static void kvm_update_msi_routes_all(void *private, bool 
global,
 kvm_irqchip_update_msi_route(kvm_state, entry->virq,
  msg, entry->dev);
 }
+kvm_irqchip_commit_routes(kvm_state);
 trace_kvm_x86_update_msi_routes(cnt);
 }
 
-- 
2.4.11

[Qemu-devel] [PATCH v6 25/26] kvm-irqchip: x86: add msi route notify fn

2016-05-04 Thread Peter Xu

Add one more IEC notifier so that MSI routes learn about IEC
changes. When an interrupt invalidation happens, all registered MSI
routes are updated for all PCI devices.

Since both vfio and vhost are possible GSI route consumers, this patch
goes one step further to keep them safe in split irqchip mode and
when irqfd is enabled.

Signed-off-by: Peter Xu 
---
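The notifier mechanism can be sketched roughly as follows. This is a
standalone illustration, not the x86-iommu API: iec_register,
iec_notify_all and refresh_routes are hypothetical names, and a fixed
array replaces the notifier list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback signature modelled on the notifier used in this series. */
typedef void (*iec_notify_fn)(void *opaque, bool global,
                              uint32_t index, uint32_t mask);

#define MAX_NOTIFIERS 4

typedef struct Notifier {
    iec_notify_fn fn;
    void *opaque;
} Notifier;

static Notifier notifiers[MAX_NOTIFIERS];
static size_t nr_notifiers;

/* Register a consumer. Returns 0, or -1 if the table is full. */
int iec_register(iec_notify_fn fn, void *opaque)
{
    if (!fn || nr_notifiers >= MAX_NOTIFIERS) {
        return -1;
    }
    notifiers[nr_notifiers].fn = fn;
    notifiers[nr_notifiers].opaque = opaque;
    nr_notifiers++;
    return 0;
}

/* Called on invalidation: tell every registered consumer. */
void iec_notify_all(bool global, uint32_t index, uint32_t mask)
{
    for (size_t i = 0; i < nr_notifiers; i++) {
        notifiers[i].fn(notifiers[i].opaque, global, index, mask);
    }
}

/* Example consumer: counts how often it was asked to refresh all routes. */
void refresh_routes(void *opaque, bool global, uint32_t index, uint32_t mask)
{
    (void)global;
    (void)index;
    (void)mask;
    ++*(int *)opaque;
}
```

Like the consumer in this patch, refresh_routes() ignores the index and
mask and simply refreshes everything, trading precision for simplicity.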
 hw/pci/pci.c | 15 +++
 include/hw/pci/pci.h |  2 ++
 kvm-all.c| 10 +-
 target-i386/kvm.c| 30 ++
 trace-events |  1 +
 5 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index bb605ef..620f712 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2498,6 +2498,21 @@ PCIDevice *pci_get_function_0(PCIDevice *pci_dev)
 }
 }
 
+MSIMessage pci_get_msi_message(PCIDevice *dev, int vector)
+{
+MSIMessage msg;
+if (msix_enabled(dev)) {
+msg = msix_get_message(dev, vector);
+} else if (msi_enabled(dev)) {
+msg = msi_get_message(dev, vector);
+} else {
+/* Should never happen */
+error_report("%s: unknown interrupt type", __func__);
+abort();
+}
+return msg;
+}
+
 static const TypeInfo pci_device_type_info = {
 .name = TYPE_PCI_DEVICE,
 .parent = TYPE_DEVICE,
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index ef6ba51..04945ad 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -776,4 +776,6 @@ extern const VMStateDescription vmstate_pci_device;
 .offset = vmstate_offset_pointer(_state, _field, PCIDevice), \
 }
 
+MSIMessage pci_get_msi_message(PCIDevice *dev, int vector);
+
 #endif
diff --git a/kvm-all.c b/kvm-all.c
index a984564..95f1df3 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1186,15 +1186,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, 
PCIDevice *dev)
 MSIMessage msg = {0, 0};
 
 if (dev) {
-if (msix_enabled(dev)) {
-msg = msix_get_message(dev, vector);
-} else if (msi_enabled(dev)) {
-msg = msi_get_message(dev, vector);
-} else {
-/* Should never happen */
-error_report("%s: unknown interrupt type", __func__);
-abort();
-}
+msg = pci_get_msi_message(dev, vector);
 }
 
 if (kvm_gsi_direct_mapping()) {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index f043e45..579662b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -37,6 +37,7 @@
 #include "hw/i386/apic_internal.h"
 #include "hw/i386/apic-msidef.h"
 #include "hw/i386/intel_iommu.h"
+#include "hw/i386/x86-iommu.h"
 
 #include "exec/ioport.h"
 #include "standard-headers/asm-x86/hyperv.h"
@@ -3368,15 +3369,44 @@ struct MSIRouteEntry {
 static QLIST_HEAD(, MSIRouteEntry) msi_route_list = \
 QLIST_HEAD_INITIALIZER(msi_route_list);
 
+static void kvm_update_msi_routes_all(void *private, bool global,
+  uint32_t index, uint32_t mask)
+{
+int cnt = 0;
+MSIRouteEntry *entry;
+MSIMessage msg;
+/* TODO: explicit route update */
+QLIST_FOREACH(entry, &msi_route_list, list) {
+cnt++;
+msg = pci_get_msi_message(entry->dev, entry->vector);
+kvm_irqchip_update_msi_route(kvm_state, entry->virq,
+ msg, entry->dev);
+}
+trace_kvm_x86_update_msi_routes(cnt);
+}
+
 int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
 int vector, PCIDevice *dev)
 {
+static bool notify_list_inited = false;
 MSIRouteEntry *entry = g_new0(MSIRouteEntry, 1);
 entry->dev = dev;
 entry->vector = vector;
 entry->virq = route->gsi;
 QLIST_INSERT_HEAD(&msi_route_list, entry, list);
 trace_kvm_x86_add_msi_route(route->gsi);
+
+if (!notify_list_inited) {
+/* For the first time we do add route, add ourselves into
+ * IOMMU's IEC notify list if needed. */
+X86IOMMUState *iommu = x86_iommu_get_default();
+if (iommu) {
+x86_iommu_iec_register_notifier(iommu,
+kvm_update_msi_routes_all,
+NULL);
+}
+notify_list_inited = true;
+}
 return 0;
 }
 
diff --git a/trace-events b/trace-events
index 2eea7f7..2c7220f 100644
--- a/trace-events
+++ b/trace-events
@@ -1914,3 +1914,4 @@ aspeed_vic_write(uint64_t offset, unsigned size, uint32_t 
data) "To 0x%" PRIx64
 kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI 
%" PRIu32
 kvm_x86_add_msi_route(int virq) "Adding route entry for virq %d"
 kvm_x86_remove_msi_route(int virq) "Removing route entry for virq %d"
+kvm_x86_update_msi_routes(int num) "Updated %d MSI routes"
-- 
2.4.11

[Qemu-devel] [PATCH v6 23/26] kvm-irqchip: simplify kvm_irqchip_add_msi_route

2016-05-04 Thread Peter Xu

Change the MSIMessage parameter of kvm_irqchip_add_msi_route into the
vector number. The vector index provides more information than the
MSIMessage, since the MSIMessage can easily be retrieved from the
vector. This avoids fetching the MSIMessage every time before adding
MSI routes.

The vector info will also be used in the coming patches to enable
GSI route update notifications.

Signed-off-by: Peter Xu 
---
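To illustrate the interface change, here is a small standalone sketch
in which the route-adding function takes a vector index and looks the
message up itself. Msg, Dev, Route, dev_get_message, add_msi_route and
route_get are hypothetical stand-ins, not the QEMU types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Msg {
    uint64_t address;
    uint32_t data;
} Msg;

#define DEV_MAX_VECTORS 4

typedef struct Dev {
    Msg table[DEV_MAX_VECTORS];   /* per-vector message table */
    int nr_vectors;
} Dev;

#define MAX_ROUTES 8

typedef struct Route {
    const Dev *dev;
    int vector;
    Msg msg;
} Route;

static Route routes[MAX_ROUTES];
static int nr_routes;

/* Fetch the message programmed for a vector. Returns 0 or -1. */
int dev_get_message(const Dev *dev, int vector, Msg *out)
{
    if (!dev || vector < 0 || vector >= dev->nr_vectors) {
        return -1;
    }
    *out = dev->table[vector];
    return 0;
}

/*
 * Add a route given only (dev, vector). The message is looked up here,
 * so callers no longer fetch it themselves. Returns the route index
 * (a stand-in for the virq) or -1 on failure.
 */
int add_msi_route(const Dev *dev, int vector)
{
    Msg msg = {0, 0};

    if (nr_routes >= MAX_ROUTES) {
        return -1;
    }
    if (dev && dev_get_message(dev, vector, &msg) < 0) {
        return -1;
    }
    routes[nr_routes].dev = dev;
    routes[nr_routes].vector = vector;   /* kept for later re-lookup */
    routes[nr_routes].msg = msg;
    return nr_routes++;
}

/* Read back a route by index, or NULL if out of range. */
const Route *route_get(int virq)
{
    return (virq >= 0 && virq < nr_routes) ? &routes[virq] : NULL;
}
```

Keeping the vector alongside the device is what makes a later refresh
possible: the message can be fetched again at any time, whereas a
stored message would go stale.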
 hw/i386/kvm/pci-assign.c |  8 ++--
 hw/misc/ivshmem.c|  3 +--
 hw/vfio/pci.c| 11 +--
 hw/virtio/virtio-pci.c   |  9 +++--
 include/sysemu/kvm.h | 13 -
 kvm-all.c| 18 --
 kvm-stub.c   |  2 +-
 target-i386/kvm.c|  3 +--
 8 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index bf425a2..85caf37 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -974,10 +974,9 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 }
 
 if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
-MSIMessage msg = msi_get_message(pci_dev, 0);
 int virq;
 
-virq = kvm_irqchip_add_msi_route(kvm_state, msg, pci_dev);
+virq = kvm_irqchip_add_msi_route(kvm_state, 0, pci_dev);
 if (virq < 0) {
 perror("assigned_dev_update_msi: kvm_irqchip_add_msi_route");
 return;
@@ -1042,7 +1041,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 uint16_t entries_nr = 0;
 int i, r = 0;
 MSIXTableEntry *entry = adev->msix_table;
-MSIMessage msg;
 
 /* Get the usable entry number for allocating */
 for (i = 0; i < adev->msix_max; i++, entry++) {
@@ -1079,9 +1077,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 continue;
 }
 
-msg.address = entry->addr_lo | ((uint64_t)entry->addr_hi << 32);
-msg.data = entry->data;
-r = kvm_irqchip_add_msi_route(kvm_state, msg, pci_dev);
+r = kvm_irqchip_add_msi_route(kvm_state, i, pci_dev);
 if (r < 0) {
 return r;
 }
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index e40f23b..6909346 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -444,13 +444,12 @@ static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int 
vector,
  Error **errp)
 {
 PCIDevice *pdev = PCI_DEVICE(s);
-MSIMessage msg = msix_get_message(pdev, vector);
 int ret;
 
 IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
 assert(!s->msi_vectors[vector].pdev);
 
-ret = kvm_irqchip_add_msi_route(kvm_state, msg, pdev);
+ret = kvm_irqchip_add_msi_route(kvm_state, vector, pdev);
 if (ret < 0) {
 error_setg(errp, "kvm_irqchip_add_msi_route failed");
 return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d091d8c..2b2f935 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -417,11 +417,11 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool 
msix)
 }
 
 static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
-  MSIMessage *msg, bool msix)
+  int vector_n, bool msix)
 {
 int virq;
 
-if ((msix && vdev->no_kvm_msix) || (!msix && vdev->no_kvm_msi) || !msg) {
+if ((msix && vdev->no_kvm_msix) || (!msix && vdev->no_kvm_msi)) {
 return;
 }
 
@@ -429,7 +429,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, 
VFIOMSIVector *vector,
 return;
 }
 
-virq = kvm_irqchip_add_msi_route(kvm_state, *msg, &vdev->pdev);
+virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev);
 if (virq < 0) {
 event_notifier_cleanup(&vector->kvm_interrupt);
 return;
@@ -495,7 +495,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
unsigned int nr,
 vfio_update_kvm_msi_virq(vector, *msg, pdev);
 }
 } else {
-vfio_add_kvm_msi_virq(vdev, vector, msg, true);
+vfio_add_kvm_msi_virq(vdev, vector, nr, true);
 }
 
 /*
@@ -639,7 +639,6 @@ retry:
 
 for (i = 0; i < vdev->nr_vectors; i++) {
 VFIOMSIVector *vector = &vdev->msi_vectors[i];
-MSIMessage msg = msi_get_message(&vdev->pdev, i);
 
 vector->vdev = vdev;
 vector->virq = -1;
@@ -656,7 +655,7 @@ retry:
  * Attempt to enable route through KVM irqchip,
  * default to userspace handling if unavailable.
  */
-vfio_add_kvm_msi_virq(vdev, vector, &msg, false);
+vfio_add_kvm_msi_virq(vdev, vector, i, false);
 }
 
 /* Set interrupt type prior to possible interrupts */
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index bfedbbf..df85f28 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -727,14 +727,13 @@ static uint32_t virtio_read_config(PCIDevice *pci_dev,
 
 static int kvm_virtio_pci_vq_vector_use(VirtIOPCIPr

[Qemu-devel] [PATCH v6 19/26] intel_iommu: Add support for Extended Interrupt Mode

2016-05-04 Thread Peter Xu

From: Jan Kiszka 

As neither QEMU nor KVM supports more than 255 CPUs so far, this is
simple: we only need to switch the destination ID translation in
vtd_remap_irq_get if EIME is set.

Once CFI support is there, it will have to take EIM into account as
well. So far, there is nothing to do for this.

This patch allows x2APIC to be used in the split irqchip mode of KVM.

Signed-off-by: Jan Kiszka 
---
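The destination ID switch can be sketched as a pure function. The mask
and shift values are the ones visible in the hunk below; the function
name irte_dest and the macro names are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IR_APIC_DEST_MASK   0xff00U   /* bits 15:8 of the DST field */
#define IR_APIC_DEST_SHIFT  8

/*
 * Translate the destination field of a remapping entry.
 * With EIME set, all 32 bits are the destination ID.
 * Without it, only bits 15:8 carry the 8-bit APIC ID.
 */
uint32_t irte_dest(uint32_t dest_id, bool eime)
{
    if (eime) {
        return dest_id;
    }
    return (dest_id & IR_APIC_DEST_MASK) >> IR_APIC_DEST_SHIFT;
}
```

For example, a destination field of 0x1200 yields APIC ID 0x12 in the
legacy format and is passed through unchanged as 0x1200 when EIME is set.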
 hw/i386/intel_iommu.c  | 16 +---
 hw/i386/intel_iommu_internal.h |  2 ++
 include/hw/i386/intel_iommu.h  |  1 +
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index dc0e4ba..1e57125 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -922,6 +922,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState 
*s)
 value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
 s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
 s->intr_root = value & VTD_IRTA_ADDR_MASK;
+s->intr_eime = value & VTD_IRTA_EIME;
 
 /* Notify global invalidation */
 vtd_iec_notify_all(s, true, 0, 0);
@@ -2056,11 +2057,13 @@ static int vtd_remap_irq_get(IntelIOMMUState *iommu, 
uint16_t index, VTDIrq *irq
 irq->trigger_mode = irte.trigger_mode;
 irq->vector = irte.vector;
 irq->delivery_mode = irte.delivery_mode;
-/* Not support EIM yet: please refer to vt-d 9.10 DST bits */
+irq->dest = irte.dest_id;
+if (!iommu->intr_eime) {
 #define  VTD_IR_APIC_DEST_MASK (0xff00ULL)
 #define  VTD_IR_APIC_DEST_SHIFT(8)
-irq->dest = (irte.dest_id & VTD_IR_APIC_DEST_MASK) >> \
-VTD_IR_APIC_DEST_SHIFT;
+irq->dest = (irq->dest & VTD_IR_APIC_DEST_MASK) >>
+VTD_IR_APIC_DEST_SHIFT;
+}
 irq->dest_mode = irte.dest_mode;
 irq->redir_hint = irte.redir_hint;
 
@@ -2330,7 +2333,7 @@ static void vtd_init(IntelIOMMUState *s)
 s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
 if (ms->iommu_intr) {
-s->ecap |= VTD_ECAP_IR;
+s->ecap |= VTD_ECAP_IR | VTD_ECAP_EIM;
 }
 
 vtd_reset_context_cache(s);
@@ -2384,10 +2387,9 @@ static void vtd_init(IntelIOMMUState *s)
 vtd_define_quad(s, DMAR_FRCD_REG_0_2, 0, 0, 0x8000000000000000ULL);
 
 /*
- * Interrupt remapping registers, not support extended interrupt
- * mode for now.
+ * Interrupt remapping registers.
  */
-vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff00fULL, 0);
+vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xfffffffffffff80fULL, 0);
 }
 
 /* Should not reset address_spaces when reset because devices will still use
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 10c20fe..72b0114 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -176,6 +176,7 @@
 
 /* IRTA_REG */
 #define VTD_IRTA_ADDR_MASK  (VTD_HAW_MASK ^ 0xfffULL)
+#define VTD_IRTA_EIME   (1ULL << 11)
 #define VTD_IRTA_SIZE_MASK  (0xfULL)
 
 /* ECAP_REG */
@@ -184,6 +185,7 @@
 #define VTD_ECAP_QI (1ULL << 1)
 /* Interrupt Remapping support */
 #define VTD_ECAP_IR (1ULL << 3)
+#define VTD_ECAP_EIM(1ULL << 4)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 4fe92cf..c0c5819 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -261,6 +261,7 @@ struct IntelIOMMUState {
 bool intr_enabled;  /* Whether guest enabled IR */
 dma_addr_t intr_root;   /* Interrupt remapping table pointer */
 uint32_t intr_size; /* Number of IR table entries */
+bool intr_eime; /* Extended interrupt mode enabled */
 QLIST_HEAD(, VTD_IEC_Notifier) iec_notifiers; /* IEC notify list */
 };
 
-- 
2.4.11

[Qemu-devel] [PATCH v6 22/26] x86-iommu: replace existing VT-d hooks into X86 ones

2016-05-04 Thread Peter Xu

Previously, there were lots of VT-d hooks in common code (q35,
ioapic, etc.). A better way is to avoid using VT-d interfaces there.
This also lets us start abstracting functions that are common to the
Intel and the future AMD IOMMU devices.

This patch converts all the VT-d hooks into x86 ones and moves the IEC
notifier list from the VT-d state to the x86 state, so that it can be
further leveraged by the AMD IOMMU code.

Instead of searching the global device tree every time, one static
variable is declared to store the default system x86 IOMMU device.

Signed-off-by: Peter Xu 
---
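The cached default device can be pictured as below. This is only a
sketch: Device, iommu_set_default, iommu_get_default and has_iommu are
hypothetical names, and how and when the real code sets the pointer is
not visible in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the generic x86 IOMMU device state. */
typedef struct Device {
    const char *type;
} Device;

/* One static pointer replaces a device tree search on every lookup. */
static Device *default_iommu;

/* Record the system's IOMMU once, e.g. when the device is created. */
void iommu_set_default(Device *dev)
{
    default_iommu = dev;
}

/* Constant-time lookup; NULL means no IOMMU is present. */
Device *iommu_get_default(void)
{
    return default_iommu;
}

/* Vendor-neutral presence check, in the spirit of acpi_has_iommu(). */
bool has_iommu(void)
{
    return iommu_get_default() != NULL;
}
```

Callers such as the ACPI builder then only depend on the generic getter
and no longer need to know which vendor's IOMMU is behind it.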
 hw/i386/acpi-build.c  |  4 +--
 hw/i386/intel_iommu.c | 67 +
 hw/i386/x86-iommu.c   | 49 +++
 hw/intc/ioapic.c  |  6 ++--
 hw/pci-host/q35.c | 11 +++
 include/hw/i386/intel_iommu.h | 33 ---
 include/hw/i386/x86-iommu.h   | 77 ++-
 target-i386/kvm.c |  7 ++--
 8 files changed, 160 insertions(+), 94 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index b064bc2..6cc686e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -51,7 +51,7 @@
 #include "hw/i386/ich9.h"
 #include "hw/pci/pci_bus.h"
 #include "hw/pci-host/q35.h"
-#include "hw/i386/intel_iommu.h"
+#include "hw/i386/x86-iommu.h"
 #include "hw/timer/hpet.h"
 
 #include "hw/acpi/aml-build.h"
@@ -2677,7 +2677,7 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
 
 static bool acpi_has_iommu(void)
 {
-return !!vtd_iommu_get();
+return !!x86_iommu_get_default();
 }
 
 static
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 51ad0b5..bee85e4 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -25,6 +25,7 @@
 #include "intel_iommu_internal.h"
 #include "hw/pci/pci.h"
 #include "hw/boards.h"
+#include "hw/i386/x86-iommu.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -191,7 +192,7 @@ static void vtd_reset_context_cache(IntelIOMMUState *s)
 
 VTD_DPRINTF(CACHE, "global context_cache_gen=1");
 while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
-for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
+for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
 vtd_as = vtd_bus->dev_as[devfn_it];
 if (!vtd_as) {
 continue;
@@ -903,17 +904,7 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
 static void vtd_iec_notify_all(IntelIOMMUState *s, bool global,
uint32_t index, uint32_t mask)
 {
-VTD_IEC_Notifier *notifier;
-
-VTD_DPRINTF(INV, "notify IEC invalidate: global=%d, index=%u, mask=%u",
-global, index, mask);
-
-QLIST_FOREACH(notifier, &s->iec_notifiers, list) {
-if (notifier->iec_notify) {
-notifier->iec_notify(notifier->private, global,
- index, mask);
-}
-}
+x86_iommu_iec_notify_all(X86_IOMMU_DEVICE(s), global, index, mask);
 }
 
 static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
@@ -994,7 +985,7 @@ static void vtd_context_device_invalidate(IntelIOMMUState 
*s,
 vtd_bus = vtd_find_as_from_bus_num(s, VTD_SID_TO_BUS(source_id));
 if (vtd_bus) {
 devfn = VTD_SID_TO_DEVFN(source_id);
-for (devfn_it = 0; devfn_it < VTD_PCI_DEVFN_MAX; ++devfn_it) {
+for (devfn_it = 0; devfn_it < X86_IOMMU_PCI_DEVFN_MAX; ++devfn_it) {
 vtd_as = vtd_bus->dev_as[devfn_it];
 if (vtd_as && ((devfn_it & mask) == (devfn & mask))) {
 VTD_DPRINTF(INV, "invalidate context-cahce of devfn 0x%"PRIx16,
@@ -2037,7 +2028,7 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t 
index,
 return -VTD_FR_IR_IRTE_RSVD;
 }
 
-if (sid != VTD_SID_INVALID) {
+if (sid != X86_IOMMU_SID_INVALID) {
 /* Validate IRTE SID */
 switch (entry->sid_vtype) {
 case VTD_SVT_NONE:
@@ -2219,10 +2210,11 @@ do_not_translate:
 return 0;
 }
 
-int vtd_int_remap(void *iommu, MSIMessage *src, MSIMessage *dst,
-  uint16_t sid)
+static int vtd_int_remap(X86IOMMUState *iommu, MSIMessage *src,
+ MSIMessage *dst, uint16_t sid)
 {
-return vtd_interrupt_remap_msi(iommu, src, dst, sid);
+return vtd_interrupt_remap_msi(INTEL_IOMMU_DEVICE(iommu),
+   src, dst, sid);
 }
 
 static MemTxResult vtd_mem_ir_read(void *opaque, hwaddr addr,
@@ -2248,7 +2240,7 @@ static MemTxResult vtd_mem_ir_write(void *opaque, hwaddr 
addr,
 {
 int ret = 0;
 MSIMessage from = {0}, to = {0};
-uint16_t sid = VTD_SID_INVALID;
+uint16_t sid = X86_IOMMU_SID_INVALID;
 
 from.address = (uint64_t) addr + VTD_INTERRUPT_ADDR_FIRST;
 from.data = (uint32_t) value;
@@ -2293,24 +2285,18 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
 },
 };
 
-void vtd_iec_register_notifier(IntelIOM

[Qemu-devel] [PATCH v6 17/26] ioapic: keep RO bits for IOAPIC entry

2016-05-04 Thread Peter Xu

Currently the IOAPIC RO bits can be written. To better match
hardware, we should keep them read-only.

Reviewed-by: Radim Krčmář 
Signed-off-by: Peter Xu 
---
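The save-and-restore idea can be shown on a single 64-bit entry. In
this sketch the bit positions (12 for delivery status, 14 for remote
IRR) are assumptions based on the usual IOAPIC redirection entry
layout, and entry_write is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed positions of the read-only bits in a redirection entry. */
#define DELIV_STATUS  (1ULL << 12)
#define REMOTE_IRR    (1ULL << 14)
#define RO_BITS       (DELIV_STATUS | REMOTE_IRR)
#define RW_BITS       (~RO_BITS)

/*
 * Apply a 32-bit guest write to one half of a 64-bit entry while
 * keeping the read-only bits at their previous values.
 */
uint64_t entry_write(uint64_t old, uint32_t val, int high_half)
{
    uint64_t ro_bits = old & RO_BITS;   /* remember RO state first */
    uint64_t entry = old;

    if (high_half) {
        entry &= 0xffffffffULL;
        entry |= (uint64_t)val << 32;
    } else {
        entry &= ~0xffffffffULL;
        entry |= val;
    }

    /* Drop whatever the guest wrote into RO bits, then restore them. */
    entry &= RW_BITS;
    entry |= ro_bits;
    return entry;
}
```

A guest write can therefore neither set nor clear the two status bits:
whatever it writes there is masked out and the old values are put back.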
 hw/intc/ioapic.c  | 4 
 include/hw/i386/ioapic_internal.h | 5 +
 2 files changed, 9 insertions(+)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index b41ab89..d7ebb5c 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -307,6 +307,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
 default:
 index = (s->ioregsel - IOAPIC_REG_REDTBL_BASE) >> 1;
 if (index >= 0 && index < IOAPIC_NUM_PINS) {
+uint64_t ro_bits = s->ioredtbl[index] & IOAPIC_RO_BITS;
 if (s->ioregsel & 1) {
 s->ioredtbl[index] &= 0xffffffff;
 s->ioredtbl[index] |= (uint64_t)val << 32;
@@ -314,6 +315,9 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
 s->ioredtbl[index] &= ~0xffffffffULL;
 s->ioredtbl[index] |= val;
 }
+/* restore RO bits */
+s->ioredtbl[index] &= IOAPIC_RW_BITS;
+s->ioredtbl[index] |= ro_bits;
 ioapic_service(s);
 }
 }
diff --git a/include/hw/i386/ioapic_internal.h b/include/hw/i386/ioapic_internal.h
index d279f2d..31dafb3 100644
--- a/include/hw/i386/ioapic_internal.h
+++ b/include/hw/i386/ioapic_internal.h
@@ -48,6 +48,11 @@
 #define IOAPIC_LVT_DEST_MODE(1 << IOAPIC_LVT_DEST_MODE_SHIFT)
 #define IOAPIC_LVT_DELIV_MODE   (7 << IOAPIC_LVT_DELIV_MODE_SHIFT)
 
+/* Bits that are read-only for IOAPIC entry */
+#define IOAPIC_RO_BITS  (IOAPIC_LVT_REMOTE_IRR | \
+ IOAPIC_LVT_DELIV_STATUS)
+#define IOAPIC_RW_BITS  (~(uint64_t)IOAPIC_RO_BITS)
+
 #define IOAPIC_TRIGGER_EDGE 0
 #define IOAPIC_TRIGGER_LEVEL1
 
-- 
2.4.11

[Qemu-devel] [PATCH v6 16/26] ioapic: register VT-d IEC invalidate notifier

2016-05-04 Thread Peter Xu

Make IOAPIC the first consumer of VT-d IEC invalidation notifiers. This
is only used in the split irqchip case: when VT-d receives IR invalidation
requests, IOAPIC is notified so that it can update the kernel irq routes.
For simplicity, we update all IOAPIC routes, even if the invalidated
entries are not IOAPIC ones.

Signed-off-by: Peter Xu 
---
 hw/intc/ioapic.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index d6e88d5..b41ab89 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -30,6 +30,7 @@
 #include "sysemu/kvm.h"
 #include "target-i386/cpu.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/i386/intel_iommu.h"
 
 //#define DEBUG_IOAPIC
 
@@ -197,6 +198,14 @@ static void ioapic_update_kvm_routes(IOAPICCommonState *s)
 #endif
 }
 
+static void ioapic_iec_notifier(void *private, bool global,
+uint32_t index, uint32_t mask)
+{
+IOAPICCommonState *s = (IOAPICCommonState *)private;
+/* For simplicity, we just update all the routes */
+ioapic_update_kvm_routes(s);
+}
+
 void ioapic_eoi_broadcast(int vector)
 {
 IOAPICCommonState *s;
@@ -330,6 +339,18 @@ static void ioapic_realize(DeviceState *dev, Error **errp)
 qdev_init_gpio_in(dev, ioapic_set_irq, IOAPIC_NUM_PINS);
 
 ioapics[ioapic_no] = s;
+
+#ifdef CONFIG_KVM
+if (kvm_irqchip_is_split()) {
+IntelIOMMUState *iommu = vtd_iommu_get();
+if (iommu) {
+/* Register this IOAPIC with IOMMU IEC notifier, so that
+ * when there are IR invalidates, we can be notified to
+ * update kernel IR cache. */
+vtd_iec_register_notifier(iommu, ioapic_iec_notifier, s);
+}
+}
+#endif
 }
 
 static void ioapic_class_init(ObjectClass *klass, void *data)
-- 
2.4.11

[Qemu-devel] [PATCH v6 21/26] x86-iommu: introduce parent class

2016-05-04 Thread Peter Xu

Introduce a parent class named "x86-iommu" for intel-iommu devices. This
is preparation work for abstracting the functionality shared by the Intel
and AMD IOMMUs. For now, only the parent class is introduced; it does
nothing yet.
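
The patch makes the parent state the first member of IntelIOMMUState. A
minimal sketch of why first-member embedding lets a child object stand in
for its parent, in plain C without QOM; ParentState, ChildState and both
helpers are hypothetical names used only for illustration:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the parent and child state structs. */
typedef struct ParentState {
    int type_id;
} ParentState;

typedef struct ChildState {
    ParentState parent;   /* must stay the first member */
    int child_field;
} ChildState;

/* Upcast: always valid, the parent sits at offset 0 of the child. */
static ParentState *child_to_parent(ChildState *c)
{
    return &c->parent;
}

/* Downcast: only safe when p really points into a ChildState. */
static ChildState *parent_to_child(ParentState *p)
{
    return (ChildState *)p;
}
```

QOM cast macros add a run-time type check on top of this layout rule; the
sketch leaves that check out.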

Signed-off-by: Peter Xu 
---
 hw/i386/Makefile.objs |  2 +-
 hw/i386/intel_iommu.c |  4 ++--
 hw/i386/x86-iommu.c   | 42 ++
 hw/pci-host/q35.c |  2 +-
 include/hw/i386/intel_iommu.h |  3 ++-
 include/hw/i386/x86-iommu.h   | 35 +++
 6 files changed, 83 insertions(+), 5 deletions(-)
 create mode 100644 hw/i386/x86-iommu.c
 create mode 100644 include/hw/i386/x86-iommu.h

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index b52d5b8..90e94ff 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -2,7 +2,7 @@ obj-$(CONFIG_KVM) += kvm/
 obj-y += multiboot.o
 obj-y += pc.o pc_piix.o pc_q35.o
 obj-y += pc_sysfw.o
-obj-y += intel_iommu.o
+obj-y += x86-iommu.o intel_iommu.o
 obj-$(CONFIG_XEN) += ../xenpv/ xen/
 
 obj-y += kvmvapic.o
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index b8666b8..51ad0b5 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2343,7 +2343,7 @@ IntelIOMMUState *vtd_iommu_get(void)
 bool ambiguous = false;
 Object *intel_iommu = NULL;
 
-intel_iommu = object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE,
+intel_iommu = object_resolve_path_type("", TYPE_X86_IOMMU_DEVICE,
  &ambiguous);
 if (ambiguous)
 intel_iommu = NULL;
@@ -2479,7 +2479,7 @@ static void vtd_class_init(ObjectClass *klass, void *data)
 
 static const TypeInfo vtd_info = {
 .name  = TYPE_INTEL_IOMMU_DEVICE,
-.parent= TYPE_SYS_BUS_DEVICE,
+.parent= TYPE_X86_IOMMU_DEVICE,
 .instance_size = sizeof(IntelIOMMUState),
 .class_init= vtd_class_init,
 };
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
new file mode 100644
index 0000000..7338f98
--- /dev/null
+++ b/hw/i386/x86-iommu.c
@@ -0,0 +1,42 @@
+/*
+ * QEMU emulation of common X86 IOMMU
+ *
+ * Copyright (C) 2016 Peter Xu, Red Hat 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/boards.h"
+#include "hw/i386/x86-iommu.h"
+
+static void x86_iommu_class_init(ObjectClass *klass, void *data)
+{
+}
+
+static const TypeInfo x86_iommu_info = {
+.name  = TYPE_X86_IOMMU_DEVICE,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(X86IOMMUState),
+.class_init= x86_iommu_class_init,
+.abstract  = true,
+};
+
+static void x86_iommu_register_types(void)
+{
+type_register_static(&x86_iommu_info);
+}
+
+type_init(x86_iommu_register_types)
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index d32c123..fe19eff 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -441,7 +441,7 @@ static void mch_init_dmar(MCHPCIState *mch)
 PCIBus *pci_bus = PCI_BUS(qdev_get_parent_bus(DEVICE(mch)));
 
 mch->iommu = INTEL_IOMMU_DEVICE(qdev_create(NULL, TYPE_INTEL_IOMMU_DEVICE));
-object_property_add_child(OBJECT(mch), "intel-iommu",
+object_property_add_child(OBJECT(mch), "x86-iommu",
   OBJECT(mch->iommu), NULL);
 qdev_init_nofail(DEVICE(mch->iommu));
 sysbus_mmio_map(SYS_BUS_DEVICE(mch->iommu), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 0886cb7..3f4a46e 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -26,6 +26,7 @@
 #include "hw/i386/ioapic.h"
 #include "hw/pci/msi.h"
 #include "hw/sysbus.h"
+#include "hw/i386/x86-iommu.h"
 
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
@@ -240,7 +241,7 @@ typedef struct VTD_IEC_Notifier VTD_IEC_Notifier;
 
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
-SysBusDevice busdev;
+X86IOMMUState x86_iommu;
 MemoryRegion csrmem;
 uint8_t csr[DMAR_REG_SIZE]; /* register values */
 uint8_t wmask[DMAR_REG_SIZE];   /* R/W bytes */
diff --git a/include/hw/i386/x86-iommu.h b/include/hw/i386/x86-iommu.h
new file mode 100644
index 0000000..3987755
--- /dev/null
+++ b/include/hw/i386/x86-iommu.h
@@ -0,0 +1,35 @@
+/*
+ * Common IOMMU interface fo

[Qemu-devel] [PATCH v6 11/26] q35: ioapic: add support for emulated IOAPIC IR

2016-05-04 Thread Peter Xu

This patch translates all IOAPIC interrupts into MSI ones. A pseudo
IOAPIC address space is added to carry the MSI message. By default, it
is the system memory address space. When IR is enabled, it is the
IOMMU address space.

Currently, only emulated IOAPIC is supported.

Idea suggested by Jan Kiszka and Rita Sinha in the following patch:

https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg01933.html

Signed-off-by: Peter Xu 
---
 hw/i386/pc.c  |  3 +++
 hw/intc/ioapic.c  | 28 
 hw/pci-host/q35.c |  4 
 include/hw/i386/apic-msidef.h |  1 +
 include/hw/i386/ioapic_internal.h |  1 +
 include/hw/i386/pc.h  |  4 
 6 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 99437e0..365e82f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1395,6 +1395,9 @@ void pc_memory_init(PCMachineState *pcms,
 rom_add_option(option_rom[i].name, option_rom[i].bootindex);
 }
 pcms->fw_cfg = fw_cfg;
+
+/* Init default IOAPIC address space */
+pcms->ioapic_as = &address_space_memory;
 }
 
 qemu_irq pc_allocate_cpu_irq(void)
diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 378e663..92334a6 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -28,6 +28,8 @@
 #include "hw/i386/ioapic_internal.h"
 #include "include/hw/pci/msi.h"
 #include "sysemu/kvm.h"
+#include "target-i386/cpu.h"
+#include "hw/i386/apic-msidef.h"
 
 //#define DEBUG_IOAPIC
 
@@ -49,13 +51,15 @@ extern int ioapic_no;
 
 static void ioapic_service(IOAPICCommonState *s)
 {
+AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
+uint32_t addr, data;
 uint8_t i;
 uint8_t trig_mode;
 uint8_t vector;
 uint8_t delivery_mode;
 uint32_t mask;
 uint64_t entry;
-uint8_t dest;
+uint16_t dest_idx;
 uint8_t dest_mode;
 
 for (i = 0; i < IOAPIC_NUM_PINS; i++) {
@@ -66,7 +70,14 @@ static void ioapic_service(IOAPICCommonState *s)
 entry = s->ioredtbl[i];
 if (!(entry & IOAPIC_LVT_MASKED)) {
 trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-dest = entry >> IOAPIC_LVT_DEST_SHIFT;
+/*
+ * By default, this would be dest_id[8] +
+ * reserved[8]. When IR is enabled, this would be
+ * interrupt_index[15] + interrupt_format[1]. This
+ * field never means anything, but only used to
+ * generate corresponding MSI.
+ */
+dest_idx = entry >> IOAPIC_LVT_DEST_IDX_SHIFT;
 dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
 delivery_mode =
 (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
@@ -96,8 +107,17 @@ static void ioapic_service(IOAPICCommonState *s)
 #else
 (void)coalesce;
 #endif
-apic_deliver_irq(dest, dest_mode, delivery_mode, vector,
- trig_mode);
+/* No matter whether IR is enabled, we translate
+ * the IOAPIC message into a MSI one, and its
+ * address space will decide whether we need a
+ * translation. */
+addr = APIC_DEFAULT_ADDRESS | \
+(dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
+(dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
+data = (vector << MSI_DATA_VECTOR_SHIFT) |
+(trig_mode << MSI_DATA_TRIGGER_SHIFT) |
+(delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+stl_le_phys(ioapic_as, addr, data);
 }
 }
 }
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 70f897e..d32c123 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -437,6 +437,7 @@ static AddressSpace *q35_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 
 static void mch_init_dmar(MCHPCIState *mch)
 {
+PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
 PCIBus *pci_bus = PCI_BUS(qdev_get_parent_bus(DEVICE(mch)));
 
 mch->iommu = INTEL_IOMMU_DEVICE(qdev_create(NULL, TYPE_INTEL_IOMMU_DEVICE));
@@ -446,6 +447,9 @@ static void mch_init_dmar(MCHPCIState *mch)
 sysbus_mmio_map(SYS_BUS_DEVICE(mch->iommu), 0, Q35_HOST_BRIDGE_IOMMU_ADDR);
 
 pci_setup_iommu(pci_bus, q35_host_dma_iommu, mch->iommu);
+/* Pseudo address space under root PCI bus. */
+pcms->ioapic_as = q35_host_dma_iommu(pci_bus, mch->iommu,
+ Q35_PSEUDO_DEVFN_IOAPIC);
 }
 
 static void mch_realize(PCIDevice *d, Error **errp)
diff --git a/include/hw/i386/apic-msidef.h b/include/hw/i386/apic-msidef.h
index 6e2eb71..8b4d4cc 100644
--- a/include/hw/i386/apic-msidef.h
+++ b/include/hw/i386/apic-msidef.h
@@ -25,6 +25,7 @@
 #define MSI_ADDR_REDIRECTION_SHIFT  3
 
 #define MSI_ADDR_DEST_ID_SHIFT  12
+#defin

[Qemu-devel] [PATCH v6 15/26] intel_iommu: introduce IEC notifiers

2016-05-04 Thread Peter Xu

This patch introduces the Intel VT-d IEC (Interrupt Entry Cache)
invalidation notifier list. When the vIOMMU receives an IEC invalidation
request, all registered units are notified with the specific
invalidation request.
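
The register/notify pattern can be sketched in standalone C as follows. This
uses a hand-rolled singly linked list instead of QEMU's QLIST macros, and
every name in it (IecNotifier, iec_register, iec_notify_all,
count_invalidations) is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*iec_notify_fn)(void *opaque, bool global,
                              uint32_t index, uint32_t mask);

typedef struct IecNotifier {
    iec_notify_fn fn;
    void *opaque;
    struct IecNotifier *next;
} IecNotifier;

/* Register a callback at the head of the list; returns 0 on success. */
static int iec_register(IecNotifier **head, iec_notify_fn fn, void *opaque)
{
    IecNotifier *n = calloc(1, sizeof(*n));

    if (!n) {
        return -1;
    }
    n->fn = fn;
    n->opaque = opaque;
    n->next = *head;
    *head = n;
    return 0;
}

/* Forward one invalidation request to every registered consumer. */
static void iec_notify_all(IecNotifier *head, bool global,
                           uint32_t index, uint32_t mask)
{
    for (IecNotifier *n = head; n; n = n->next) {
        if (n->fn) {
            n->fn(n->opaque, global, index, mask);
        }
    }
}

/* Example consumer: counts how many invalidations it has seen. */
static void count_invalidations(void *opaque, bool global,
                                uint32_t index, uint32_t mask)
{
    (void)global;
    (void)index;
    (void)mask;
    ++*(int *)opaque;
}
```

A consumer that does not care about the granularity can ignore the
arguments and refresh all of its state, at the cost of some extra work.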

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 56 --
 hw/i386/intel_iommu_internal.h | 24 +++---
 include/hw/i386/intel_iommu.h  | 22 +
 3 files changed, 91 insertions(+), 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 701d792..dc0e4ba 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -900,6 +900,22 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
 (s->root_extended ? "(extended)" : ""));
 }
 
+static void vtd_iec_notify_all(IntelIOMMUState *s, bool global,
+   uint32_t index, uint32_t mask)
+{
+VTD_IEC_Notifier *notifier;
+
+VTD_DPRINTF(INV, "notify IEC invalidate: global=%d, index=%u, mask=%u",
+global, index, mask);
+
+QLIST_FOREACH(notifier, &s->iec_notifiers, list) {
+if (notifier->iec_notify) {
+notifier->iec_notify(notifier->private, global,
+ index, mask);
+}
+}
+}
+
 static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 {
 uint64_t value = 0;
@@ -907,7 +923,8 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
 s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
 s->intr_root = value & VTD_IRTA_ADDR_MASK;
 
-/* TODO: invalidate interrupt entry cache */
+/* Notify global invalidation */
+vtd_iec_notify_all(s, true, 0, 0);
 
 VTD_DPRINTF(CSR, "int remap table addr 0x%"PRIx64 " size %"PRIu32,
 s->intr_root, s->intr_size);
@@ -1409,6 +1426,21 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc)
 return true;
 }
 
+static bool vtd_process_inv_iec_desc(IntelIOMMUState *s,
+ VTDInvDesc *inv_desc)
+{
+VTD_DPRINTF(INV, "inv ir glob %d index %d mask %d",
+inv_desc->iec.granularity,
+inv_desc->iec.index,
+inv_desc->iec.index_mask);
+
+vtd_iec_notify_all(s, inv_desc->iec.granularity,
+   inv_desc->iec.index,
+   inv_desc->iec.index_mask);
+
+return true;
+}
+
 static bool vtd_process_inv_desc(IntelIOMMUState *s)
 {
 VTDInvDesc inv_desc;
@@ -1449,12 +1481,12 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 break;
 
 case VTD_INV_DESC_IEC:
-VTD_DPRINTF(INV, "Interrupt Entry Cache Invalidation "
-"not implemented yet");
-/*
- * Since currently we do not cache interrupt entries, we can
- * just mark this descriptor as "good" and move on.
- */
+VTD_DPRINTF(INV, "Invalidation Interrupt Entry Cache "
+"Descriptor hi 0x%"PRIx64 " lo 0x%"PRIx64,
+inv_desc.hi, inv_desc.lo);
+if (!vtd_process_inv_iec_desc(s, &inv_desc)) {
+return false;
+}
 break;
 
 default:
@@ -2212,6 +2244,15 @@ static const MemoryRegionOps vtd_mem_ir_ops = {
 },
 };
 
+void vtd_iec_register_notifier(IntelIOMMUState *s, vtd_iec_notify_fn fn,
+   void *data)
+{
+VTD_IEC_Notifier *notifier = g_new0(VTD_IEC_Notifier, 1);
+notifier->iec_notify = fn;
+notifier->private = data;
+QLIST_INSERT_HEAD(&s->iec_notifiers, notifier, list);
+}
+
 VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 {
 uintptr_t key = (uintptr_t)bus;
@@ -2374,6 +2415,7 @@ static void vtd_realize(DeviceState *dev, Error **errp)
  g_free, g_free);
 s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal,
   g_free, g_free);
+QLIST_INIT(&s->iec_notifiers);
 vtd_init(s);
 }
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e1a08cb..10c20fe 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -296,12 +296,28 @@ typedef enum VTDFaultReason {
 
 #define VTD_CONTEXT_CACHE_GEN_MAX   0xffffffffUL
 
+/* Interrupt Entry Cache Invalidation Descriptor: VT-d 6.5.2.7. */
+struct VTDInvDescIEC {
+uint32_t type:4;/* Should always be 0x4 */
+uint32_t granularity:1; /* If set, it's global IR invalidation */
+uint32_t resved_1:22;
+uint32_t index_mask:5;  /* 2^N for continuous int invalidation */
+uint32_t index:16;  /* Start index to invalidate */
+uint32_t reserved_2:16;
+};
+typedef struct VTDInvDescIEC VTDInvDescIEC;
+
 /* Queued Invalidation Descriptor */
-struct VTDInvDesc {
-uint64_t lo;
-uint64_t hi;
+union VTDInvDesc {
+struct {
+uint64_t lo;
+uint64_t hi;
+};
+

[Qemu-devel] [PATCH v6 20/26] intel_iommu: add SID validation for IR

2016-05-04 Thread Peter Xu

This patch enables SID validation. Invalid interrupts will be dropped.
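
A minimal sketch of the masked source-ID comparison used for the "verify all"
case, assuming the SQ field selects how many function-number bits are ignored.
The mask table mirrors the one in the patch; its first entry looks truncated
in this archive and 0xffff (compare all 16 bits) is assumed. sid_matches() is
a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Masks indexed by the SQ field: compare all bits, ignore function bit 2,
 * ignore function bits 2:1, ignore function bits 2:0.
 * The first value is an assumption (see the note above).
 */
static const uint16_t sq_mask[4] = { 0xffff, 0xfffb, 0xfff9, 0xfff8 };

/* Return true when the requester ID passes the masked comparison. */
static bool sid_matches(uint16_t irte_sid, uint16_t req_sid, unsigned sq)
{
    uint16_t mask = sq_mask[sq & 3];

    return (irte_sid & mask) == (req_sid & mask);
}
```

An interrupt whose requester ID fails this check would be dropped rather
than remapped. The bus-range case in the patch is not covered here.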

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 70 +++
 include/hw/i386/intel_iommu.h | 21 -
 target-i386/kvm.c |  3 +-
 3 files changed, 80 insertions(+), 14 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1e57125..b8666b8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2002,11 +2002,15 @@ static Property vtd_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+uint16_t vtd_svt_mask[VTD_SQ_MAX] = {0xffff, 0xfffb, 0xfff9, 0xfff8};
+
 /* Read IRTE entry with specific index */
 static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
-VTD_IRTE *entry)
+VTD_IRTE *entry, uint16_t sid)
 {
 dma_addr_t addr = 0x00;
+uint16_t mask;
+uint8_t bus, bus_max, bus_min;
 
 addr = iommu->intr_root + index * sizeof(*entry);
 if (dma_memory_read(&address_space_memory, addr, entry,
@@ -2033,23 +2037,57 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
 return -VTD_FR_IR_IRTE_RSVD;
 }
 
-/*
- * TODO: Check Source-ID corresponds to SVT (Source Validation
- * Type) bits
- */
+if (sid != VTD_SID_INVALID) {
+/* Validate IRTE SID */
+switch (entry->sid_vtype) {
+case VTD_SVT_NONE:
+VTD_DPRINTF(IR, "No SID validation for IRTE index %d", index);
+break;
+
+case VTD_SVT_ALL:
+mask = vtd_svt_mask[entry->sid_q];
+if ((entry->source_id & mask) != (sid & mask)) {
+VTD_DPRINTF(GENERAL, "SID validation for IRTE index "
+"%d failed (reqid 0x%04x sid 0x%04x)", index,
+sid, entry->source_id);
+return -VTD_FR_IR_SID_ERR;
+}
+break;
+
+case VTD_SVT_BUS:
+bus_max = entry->source_id >> 8;
+bus_min = entry->source_id & 0xff;
+bus = sid >> 8;
+if (bus > bus_max || bus < bus_min) {
+VTD_DPRINTF(GENERAL, "SID validation for IRTE index %d "
+"failed (bus %d outside %d-%d)", index, bus,
+bus_min, bus_max);
+return -VTD_FR_IR_SID_ERR;
+}
+break;
+
+default:
+VTD_DPRINTF(GENERAL, "Invalid SVT bits (0x%x) in IRTE index "
+"%d", entry->sid_vtype, index);
+/* Take this as verification failure. */
+return -VTD_FR_IR_SID_ERR;
+break;
+}
+}
 
 return 0;
 }
 
 /* Fetch IRQ information of specific IR index */
-static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, VTDIrq *irq)
+static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index,
+ VTDIrq *irq, uint16_t sid)
 {
 VTD_IRTE irte;
 int ret = 0;
 
 bzero(&irte, sizeof(irte));
 
-ret = vtd_irte_get(iommu, index, &irte);
+ret = vtd_irte_get(iommu, index, &irte, sid);
 if (ret) {
 return ret;
 }
@@ -2101,7 +2139,8 @@ static void vtd_generate_msi_message(VTDIrq *irq, MSIMessage *msg_out)
 /* Interrupt remapping for MSI/MSI-X entry */
 static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
MSIMessage *origin,
-   MSIMessage *translated)
+   MSIMessage *translated,
+   uint16_t sid)
 {
 int ret = 0;
 VTD_IR_MSIAddress addr;
@@ -2136,7 +2175,7 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
 
 index = addr.index_h << 15 | addr.index_l;
 
-ret = vtd_remap_irq_get(iommu, index, &irq);
+ret = vtd_remap_irq_get(iommu, index, &irq, sid);
 if (ret) {
 return ret;
 }
@@ -2180,9 +2219,10 @@ do_not_translate:
 return 0;
 }
 
-int vtd_int_remap(void *iommu, MSIMessage *src, MSIMessage *dst)
+int vtd_int_remap(void *iommu, MSIMessage *src, MSIMessage *dst,
+  uint16_t sid)
 {
-return vtd_interrupt_remap_msi(iommu, src, dst);
+return vtd_interrupt_remap_msi(iommu, src, dst, sid);
 }
 
 static MemTxResult vtd_mem_ir_read(void *opaque, hwaddr addr,
@@ -2208,11 +2248,17 @@ static MemTxResult vtd_mem_ir_write(void *opaque, hwaddr addr,
 {
 int ret = 0;
 MSIMessage from = {0}, to = {0};
+uint16_t sid = VTD_SID_INVALID;
 
 from.address = (uint64_t) addr + VTD_INTERRUPT_ADDR_FIRST;
 from.data = (uint32_t) value;
 
-ret = vtd_interrupt_remap_msi(opaque, &from, &to);
+if (!attrs.unspecified) {
+/* We have explicit Source ID */
+sid = attrs.requester_id;
+}
+
+ret = vtd_interrupt_remap_msi(opaque, &from, &to, sid);
 if (ret) {
 /* TODO: report error */
 VTD_DPRINTF(GENERAL, "int remap fail for addr 0x

[Qemu-devel] [PATCH v6 12/26] ioapic: introduce ioapic_entry_parse() helper

2016-05-04 Thread Peter Xu

Abstract IOAPIC entry parsing logic into a helper function.
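
As an illustration of what such a helper does, here is a minimal sketch that
splits a 64-bit redirection entry into its fields. Bit positions follow the
standard IOAPIC redirection entry layout; RteFields and rte_parse() are
hypothetical names, and the ExtINT special case handled by the patch is left
out:

```c
#include <stdint.h>

/* Fields of one redirection table entry (hypothetical struct). */
typedef struct RteFields {
    uint8_t vector;
    uint8_t delivery_mode;
    uint8_t dest_mode;
    uint8_t trig_mode;
    uint8_t masked;
    uint16_t dest_idx;
} RteFields;

/* Extract each field with a shift and a mask. */
static RteFields rte_parse(uint64_t entry)
{
    RteFields f;

    f.vector = entry & 0xff;                 /* bits 7:0   */
    f.delivery_mode = (entry >> 8) & 0x7;    /* bits 10:8  */
    f.dest_mode = (entry >> 11) & 1;         /* bit 11     */
    f.trig_mode = (entry >> 15) & 1;         /* bit 15     */
    f.masked = (entry >> 16) & 1;            /* bit 16     */
    f.dest_idx = (entry >> 48) & 0xffff;     /* bits 63:48 */
    return f;
}
```

Collecting the fields in one struct keeps the service loop free of
shift-and-mask details.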

Signed-off-by: Peter Xu 
---
 hw/intc/ioapic.c | 110 +++
 1 file changed, 54 insertions(+), 56 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 92334a6..d6e88d5 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -49,18 +49,56 @@ static IOAPICCommonState *ioapics[MAX_IOAPICS];
 /* global variable from ioapic_common.c */
 extern int ioapic_no;
 
+struct ioapic_entry_info {
+/* fields parsed from IOAPIC entries */
+uint8_t masked;
+uint8_t trig_mode;
+uint16_t dest_idx;
+uint8_t dest_mode;
+uint8_t delivery_mode;
+uint8_t vector;
+
+/* MSI message generated from above parsed fields */
+uint32_t addr;
+uint32_t data;
+};
+
+static void ioapic_entry_parse(uint64_t entry, struct ioapic_entry_info *info)
+{
+bzero(info, sizeof(*info));
+info->masked = (entry >> IOAPIC_LVT_MASKED_SHIFT) & 1;
+info->trig_mode = (entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1;
+/*
+ * By default, this would be dest_id[8] + reserved[8]. When IR
+ * is enabled, this would be interrupt_index[15] +
+ * interrupt_format[1]. This field never means anything, but
+ * only used to generate corresponding MSI.
+ */
+info->dest_idx = (entry >> IOAPIC_LVT_DEST_IDX_SHIFT) & 0xffff;
+info->dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
+info->delivery_mode = (entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) \
+& IOAPIC_DM_MASK;
+if (info->delivery_mode == IOAPIC_DM_EXTINT) {
+info->vector = pic_read_irq(isa_pic);
+} else {
+info->vector = entry & IOAPIC_VECTOR_MASK;
+}
+
+info->addr = APIC_DEFAULT_ADDRESS | \
+(info->dest_idx << MSI_ADDR_DEST_IDX_SHIFT) | \
+(info->dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
+info->data = (info->vector << MSI_DATA_VECTOR_SHIFT) | \
+(info->trig_mode << MSI_DATA_TRIGGER_SHIFT) | \
+(info->delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
+}
+
 static void ioapic_service(IOAPICCommonState *s)
 {
 AddressSpace *ioapic_as = PC_MACHINE(qdev_get_machine())->ioapic_as;
-uint32_t addr, data;
+struct ioapic_entry_info info;
 uint8_t i;
-uint8_t trig_mode;
-uint8_t vector;
-uint8_t delivery_mode;
 uint32_t mask;
 uint64_t entry;
-uint16_t dest_idx;
-uint8_t dest_mode;
 
 for (i = 0; i < IOAPIC_NUM_PINS; i++) {
 mask = 1 << i;
@@ -68,33 +106,18 @@ static void ioapic_service(IOAPICCommonState *s)
 int coalesce = 0;
 
 entry = s->ioredtbl[i];
-if (!(entry & IOAPIC_LVT_MASKED)) {
-trig_mode = ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1);
-/*
- * By default, this would be dest_id[8] +
- * reserved[8]. When IR is enabled, this would be
- * interrupt_index[15] + interrupt_format[1]. This
- * field never means anything, but only used to
- * generate corresponding MSI.
- */
-dest_idx = entry >> IOAPIC_LVT_DEST_IDX_SHIFT;
-dest_mode = (entry >> IOAPIC_LVT_DEST_MODE_SHIFT) & 1;
-delivery_mode =
-(entry >> IOAPIC_LVT_DELIV_MODE_SHIFT) & IOAPIC_DM_MASK;
-if (trig_mode == IOAPIC_TRIGGER_EDGE) {
+ioapic_entry_parse(entry, &info);
+if (!info.masked) {
+if (info.trig_mode == IOAPIC_TRIGGER_EDGE) {
 s->irr &= ~mask;
 } else {
 coalesce = s->ioredtbl[i] & IOAPIC_LVT_REMOTE_IRR;
 s->ioredtbl[i] |= IOAPIC_LVT_REMOTE_IRR;
 }
-if (delivery_mode == IOAPIC_DM_EXTINT) {
-vector = pic_read_irq(isa_pic);
-} else {
-vector = entry & IOAPIC_VECTOR_MASK;
-}
+
 #ifdef CONFIG_KVM
 if (kvm_irqchip_is_split()) {
-if (trig_mode == IOAPIC_TRIGGER_EDGE) {
+if (info.trig_mode == IOAPIC_TRIGGER_EDGE) {
 kvm_set_irq(kvm_state, i, 1);
 kvm_set_irq(kvm_state, i, 0);
 } else {
@@ -111,13 +134,7 @@ static void ioapic_service(IOAPICCommonState *s)
  * the IOAPIC message into a MSI one, and its
  * address space will decide whether we need a
  * translation. */
-addr = APIC_DEFAULT_ADDRESS | \
-(dest_idx << MSI_ADDR_DEST_IDX_SHIFT) |
-(dest_mode << MSI_ADDR_DEST_MODE_SHIFT);
-data = (vector << MSI_DATA_VECTOR_SHIFT) |
-(trig_mode << MSI_DATA_TRIGGER_SHIFT) |
-(delivery_mode << MSI_DATA_DELIVERY_MODE_SHIFT);
-stl_le_phys(ioapic_as, addr, data);
+stl_le_phys(ioa

[Qemu-devel] [PATCH v6 14/26] q35: add "intremap" parameter to enable IR

2016-05-04 Thread Peter Xu

A flag is added to specify whether to enable IR for the emulated IOMMU. By
default, interrupt remapping is not supported. To enable it, specify
something like:

$ qemu-system-x86_64 -M q35,iommu=on,intremap=on

To be clear, the following command:

$ qemu-system-x86_64 -M q35,iommu=on

will enable the IOMMU only, without interrupt remapping support.

Currently, Intel IOMMU IR only supports kernel-irqchip={off|split}. One
of the two must be specified in -M as well.

Signed-off-by: Peter Xu 
---
 hw/core/machine.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 276ad61..5994b9f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -300,6 +300,20 @@ static void machine_set_iommu(Object *obj, bool value, Error **errp)
 ms->iommu = value;
 }
 
+static bool machine_get_intremap(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->iommu_intr;
+}
+
+static void machine_set_intremap(Object *obj, bool value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->iommu_intr = value;
+}
+
 static void machine_set_suppress_vmdesc(Object *obj, bool value, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
@@ -480,6 +494,12 @@ static void machine_initfn(Object *obj)
 object_property_set_description(obj, "iommu",
 "Set on/off to enable/disable Intel IOMMU 
(VT-d)",
 NULL);
+object_property_add_bool(obj, "intremap", machine_get_intremap,
+ machine_set_intremap, NULL);
+object_property_set_description(obj, "intremap",
+"Set on/off to enable/disable IOMMU"
+" interrupt remapping",
+NULL);
 object_property_add_bool(obj, "suppress-vmdesc",
  machine_get_suppress_vmdesc,
  machine_set_suppress_vmdesc, NULL);
-- 
2.4.11

[Qemu-devel] [PATCH v6 06/26] intel_iommu: handle interrupt remap enable

2016-05-04 Thread Peter Xu

Handle writes to the IRE bit in the global command register.
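
A minimal sketch of the command/status handshake, assuming IRE is bit 25 of
the command register and IRES is bit 25 of the status register. IommuRegs and
handle_gcmd_write() are hypothetical stand-ins for the device state and the
write handler:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions of the enable bit and its status mirror. */
#define GCMD_IRE  (1u << 25)
#define GSTS_IRES (1u << 25)

/* Hypothetical, reduced device state. */
typedef struct IommuRegs {
    uint32_t gsts;        /* global status register   */
    bool intr_enabled;    /* cached IR enable state   */
} IommuRegs;

/* React only to bits that differ from the current status. */
static void handle_gcmd_write(IommuRegs *s, uint32_t val)
{
    uint32_t changed = s->gsts ^ val;

    if (changed & GCMD_IRE) {
        bool en = (val & GCMD_IRE) != 0;

        s->intr_enabled = en;
        if (en) {
            s->gsts |= GSTS_IRES;     /* report "enabled" to the driver  */
        } else {
            s->gsts &= ~GSTS_IRES;    /* report "disabled" to the driver */
        }
    }
}
```

The driver polls the status bit to learn that the command has taken effect.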

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 00b873c..4d14124 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1180,6 +1180,22 @@ static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
 }
 }
 
+/* Handle Interrupt Remap Enable/Disable */
+static void vtd_handle_gcmd_ire(IntelIOMMUState *s, bool en)
+{
+VTD_DPRINTF(CSR, "Interrupt Remap Enable %s", (en ? "on" : "off"));
+
+if (en) {
+s->intr_enabled = true;
+/* Ok - report back to driver */
+vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRES);
+} else {
+s->intr_enabled = false;
+/* Ok - report back to driver */
+vtd_set_clear_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_IRES, 0);
+}
+}
+
 /* Handle write to Global Command Register */
 static void vtd_handle_gcmd_write(IntelIOMMUState *s)
 {
@@ -1204,6 +1220,10 @@ static void vtd_handle_gcmd_write(IntelIOMMUState *s)
 /* Set/update the interrupt remapping root-table pointer */
 vtd_handle_gcmd_sirtp(s);
 }
+if (changed & VTD_GCMD_IRE) {
+/* Interrupt remap enable/disable */
+vtd_handle_gcmd_ire(s, val & VTD_GCMD_IRE);
+}
 }
 
 /* Handle write to Context Command Register */
-- 
2.4.11

[Qemu-devel] [PATCH v6 08/26] intel_iommu: provide helper function vtd_get_iommu

2016-05-04 Thread Peter Xu

Move the IOMMU lookup out of acpi_has_iommu() into VT-d code as a public
function, vtd_iommu_get().

Signed-off-by: Peter Xu 
---
 hw/i386/acpi-build.c  |  7 +--
 hw/i386/intel_iommu.c | 13 +
 include/hw/i386/intel_iommu.h |  2 ++
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 5d2d87b..b064bc2 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2677,12 +2677,7 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
 
 static bool acpi_has_iommu(void)
 {
-bool ambiguous;
-Object *intel_iommu;
-
-intel_iommu = object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE,
-   &ambiguous);
-return intel_iommu && !ambiguous;
+return !!vtd_iommu_get();
 }
 
 static
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4d14124..a44289f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2001,6 +2001,19 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
 return vtd_dev_as;
 }
 
+IntelIOMMUState *vtd_iommu_get(void)
+{
+bool ambiguous = false;
+Object *intel_iommu = NULL;
+
+intel_iommu = object_resolve_path_type("", TYPE_INTEL_IOMMU_DEVICE,
+ &ambiguous);
+if (ambiguous)
+intel_iommu = NULL;
+
+return (IntelIOMMUState *)intel_iommu;
+}
+
 /* Do the initialization. It will also be called when reset, so pay
  * attention when adding new initialization stuff.
  */
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 4914fe6..9ee84f7 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -196,5 +196,7 @@ struct IntelIOMMUState {
  * create a new one if none exists
  */
 VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
+/* Get default IOMMU object */
+IntelIOMMUState *vtd_iommu_get(void);
 
 #endif
-- 
2.4.11

[Qemu-devel] [PATCH v6 18/26] ioapic: clear remote irr bit for edge-triggered interrupts

2016-05-04 Thread Peter Xu

This is to better emulate IOAPIC version 0x1X hardware. The Linux kernel
relies on this "feature" to perform an explicit EOI, since the EOI register
had not yet been introduced in that hardware version. This also fixes the
issue that level-triggered interrupts failed to work when IR is enabled
(tested with Linux kernel version 4.5).
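
A minimal sketch of the rule being emulated, assuming Remote IRR is bit 14
and the trigger mode is bit 15 (0 = edge, 1 = level) of the redirection
entry; rte_fix_edge_remote_irr() is a hypothetical standalone variant that
returns the fixed entry instead of modifying it in place:

```c
#include <stdint.h>

/* Assumed bit positions from the standard redirection entry layout. */
#define RTE_REMOTE_IRR   (1ULL << 14)
#define RTE_TRIGGER_MODE (1ULL << 15)   /* 0 = edge, 1 = level */

/* An edge-triggered entry can never have Remote IRR set. */
static uint64_t rte_fix_edge_remote_irr(uint64_t entry)
{
    if (!(entry & RTE_TRIGGER_MODE)) {
        entry &= ~RTE_REMOTE_IRR;
    }
    return entry;
}
```

With this rule in place, a guest that masks the entry, switches it to edge
and then back to level ends up with Remote IRR cleared, which is the effect
of an explicit EOI.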

Reviewed-by: Radim Krčmář 
Signed-off-by: Peter Xu 
---
 hw/intc/ioapic.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index d7ebb5c..ce06954 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -281,6 +281,34 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int size)
 return val;
 }
 
+/*
+ * This is to satisfy the hack in Linux kernel. One hack of it is to
+ * simulate clearing the Remote IRR bit of IOAPIC entry using the
+ * following:
+ *
+ * "For IO-APIC's with EOI register, we use that to do an explicit EOI.
+ * Otherwise, we simulate the EOI message manually by changing the trigger
+ * mode to edge and then back to level, with RTE being masked during
+ * this."
+ *
+ * (See linux kernel __eoi_ioapic_pin() comment in commit c0205701)
+ *
+ * This is based on the assumption that, Remote IRR bit will be
+ * cleared by IOAPIC hardware when configured as edge-triggered
+ * interrupts.
+ *
+ * Without this, level-triggered interrupts in IR mode might fail to
+ * work correctly.
+ */
+static inline void
+ioapic_fix_edge_remote_irr(uint64_t *entry)
+{
+if (!(*entry & IOAPIC_LVT_TRIGGER_MODE)) {
+/* Edge-triggered interrupts, make sure remote IRR is zero */
+*entry &= ~((uint64_t)IOAPIC_LVT_REMOTE_IRR);
+}
+}
+
 static void
 ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
  unsigned int size)
@@ -318,6 +346,7 @@ ioapic_mem_write(void *opaque, hwaddr addr, uint64_t val,
 /* restore RO bits */
 s->ioredtbl[index] &= IOAPIC_RW_BITS;
 s->ioredtbl[index] |= ro_bits;
+ioapic_fix_edge_remote_irr(&s->ioredtbl[index]);
 ioapic_service(s);
 }
 }
-- 
2.4.11

[Qemu-devel] [PATCH v6 10/26] intel_iommu: Add support for PCI MSI remap

2016-05-04 Thread Peter Xu

This patch enables interrupt remapping for PCI devices.

To do this, a memory region "iommu_ir" is added as a child region
of the original iommu memory region, covering the range 0xfeeXXXXX (which
is the address range for APIC). All writes to this range are treated
as MSIs, and translation is carried out only when IR is enabled.
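
A minimal sketch of the address handling, assuming the interrupt range runs
from 0xfee00000 to 0xfeefffff. Both helper names are hypothetical; the second
one shows how a handler that only sees an offset into the child region can
rebuild the full MSI address:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds of the x86 interrupt address range. */
#define INTR_ADDR_FIRST 0xfee00000ULL
#define INTR_ADDR_LAST  0xfeefffffULL

/* True when a write to this address should be treated as an MSI. */
static bool is_interrupt_addr(uint64_t addr)
{
    return addr >= INTR_ADDR_FIRST && addr <= INTR_ADDR_LAST;
}

/* Region handlers receive offsets; add the base to get the MSI address. */
static uint64_t region_offset_to_msi_addr(uint64_t offset)
{
    return offset + INTR_ADDR_FIRST;
}
```

Writes outside this window keep going through the normal DMA translation
path.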

Idea suggested by Paolo Bonzini.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 247 +
 hw/i386/intel_iommu_internal.h |   2 +
 include/hw/i386/intel_iommu.h  |  52 +
 3 files changed, 301 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index a44289f..701d792 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1969,6 +1969,248 @@ static Property vtd_properties[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+/* Read IRTE entry with specific index */
+static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index,
+VTD_IRTE *entry)
+{
+dma_addr_t addr = 0x00;
+
+addr = iommu->intr_root + index * sizeof(*entry);
+if (dma_memory_read(&address_space_memory, addr, entry,
+sizeof(*entry))) {
+VTD_DPRINTF(GENERAL, "error: fail to access IR root at 0x%"PRIx64
+" + %"PRIu16, iommu->intr_root, index);
+return -VTD_FR_IR_ROOT_INVAL;
+}
+
+if (!entry->present) {
+VTD_DPRINTF(GENERAL, "error: present flag not set in IRTE"
+" entry index %u value 0x%"PRIx64 " 0x%"PRIx64,
+index, le64_to_cpu(entry->data[1]),
+le64_to_cpu(entry->data[0]));
+return -VTD_FR_IR_ENTRY_P;
+}
+
+if (entry->__reserved_0 || entry->__reserved_1 || \
+entry->__reserved_2) {
+VTD_DPRINTF(GENERAL, "error: IRTE entry index %"PRIu16
+" reserved fields non-zero: 0x%"PRIx64 " 0x%"PRIx64,
+index, le64_to_cpu(entry->data[1]),
+le64_to_cpu(entry->data[0]));
+return -VTD_FR_IR_IRTE_RSVD;
+}
+
+/*
+ * TODO: Check Source-ID corresponds to SVT (Source Validation
+ * Type) bits
+ */
+
+return 0;
+}
+
+/* Fetch IRQ information of specific IR index */
+static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, VTDIrq *irq)
+{
+VTD_IRTE irte;
+int ret = 0;
+
+bzero(&irte, sizeof(irte));
+
+ret = vtd_irte_get(iommu, index, &irte);
+if (ret) {
+return ret;
+}
+
+irq->trigger_mode = irte.trigger_mode;
+irq->vector = irte.vector;
+irq->delivery_mode = irte.delivery_mode;
+/* Not support EIM yet: please refer to vt-d 9.10 DST bits */
+#define  VTD_IR_APIC_DEST_MASK (0xff00ULL)
+#define  VTD_IR_APIC_DEST_SHIFT(8)
+irq->dest = (irte.dest_id & VTD_IR_APIC_DEST_MASK) >> \
+VTD_IR_APIC_DEST_SHIFT;
+irq->dest_mode = irte.dest_mode;
+irq->redir_hint = irte.redir_hint;
+
+VTD_DPRINTF(IR, "remapping interrupt index %d: trig:%u,vec:%u,"
+"deliver:%u,dest:%u,dest_mode:%u", index,
+irq->trigger_mode, irq->vector, irq->delivery_mode,
+irq->dest, irq->dest_mode);
+
+return 0;
+}
+
+/* Generate one MSI message from VTDIrq info */
+static void vtd_generate_msi_message(VTDIrq *irq, MSIMessage *msg_out)
+{
+VTD_MSIMessage msg = {};
+
+/* Generate address bits */
+msg.dest_mode = irq->dest_mode;
+msg.redir_hint = irq->redir_hint;
+msg.dest = irq->dest;
+msg.__addr_head = 0xfee;
+/* Keep this from original MSI address bits */
+msg.__not_used = irq->msi_addr_last_bits;
+
+/* Generate data bits */
+msg.vector = irq->vector;
+msg.delivery_mode = irq->delivery_mode;
+msg.level = 1;
+msg.trigger_mode = irq->trigger_mode;
+
+msg_out->address = msg.msi_addr;
+msg_out->data = msg.msi_data;
+}
+
+/* Interrupt remapping for MSI/MSI-X entry */
+static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated)
+{
+int ret = 0;
+VTD_IR_MSIAddress addr;
+uint16_t index = 0;
+VTDIrq irq = {0};
+
+assert(origin && translated);
+
+if (!iommu || !iommu->intr_enabled) {
+goto do_not_translate;
+}
+
+if (origin->address & VTD_MSI_ADDR_HI_MASK) {
+VTD_DPRINTF(GENERAL, "error: MSI addr high 32 bits nonzero"
+" during interrupt remapping: 0x%"PRIx32,
+(uint32_t)((origin->address & VTD_MSI_ADDR_HI_MASK) >> \
+VTD_MSI_ADDR_HI_SHIFT));
+return -VTD_FR_IR_REQ_RSVD;
+}
+
+addr.data = origin->address & VTD_MSI_ADDR_LO_MASK;
+if (addr.__head != 0xfee) {
+VTD_DPRINTF(GENERAL, "error: MSI addr low 32 bits invalid: "
+"0x%"PRIx32, addr.data);
+return -VTD_FR_IR_REQ_RSVD;
+}
+
+/* This is compatible mode. */
+if (

[Qemu-devel] [PATCH v6 04/26] acpi: add DMAR scope definition for root IOAPIC

2016-05-04 Thread Peter Xu

To enable interrupt remapping for the Intel IOMMU device, each IOAPIC
in the system reported via the ACPI MADT must be explicitly enumerated
under one specific remapping hardware unit. This patch adds the
root-complex IOAPIC to the default DMAR device.

Please refer to VT-d spec 8.3.1.1 for more information.
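
As a rough illustration of what ends up in the table, here is a sketch that serializes one IOAPIC device scope entry with a single path element into a byte buffer. The layout is my reading of the VT-d device scope structure (type, length, two reserved bytes, enumeration ID, start bus, then device/function byte pairs); the function name and macros are hypothetical and not taken from this patch. Note that every field except the reserved pair is a single byte, so no endian conversion is needed in this form.

```c
#include <stddef.h>
#include <stdint.h>

#define DMAR_SCOPE_TYPE_IOAPIC  0x03   /* device scope type: IOAPIC */
#define DMAR_SCOPE_HDR_LEN      6      /* bytes before the path array */
#define DMAR_SCOPE_PATH_LEN     2      /* one (device, function) pair */

/*
 * Write one device scope entry with a single path element into buf.
 * Returns the number of bytes written, or 0 if buf is too small.
 * (Hypothetical helper for illustration only.)
 */
size_t dmar_write_ioapic_scope(uint8_t *buf, size_t buf_len,
                               uint8_t ioapic_id, uint8_t bus,
                               uint8_t dev, uint8_t fn)
{
    const size_t len = DMAR_SCOPE_HDR_LEN + DMAR_SCOPE_PATH_LEN;

    if (buf_len < len) {
        return 0;
    }

    buf[0] = DMAR_SCOPE_TYPE_IOAPIC;  /* entry type */
    buf[1] = (uint8_t)len;            /* total length of this entry */
    buf[2] = 0;                       /* reserved */
    buf[3] = 0;                       /* reserved */
    buf[4] = ioapic_id;               /* enumeration ID (IOAPIC ID) */
    buf[5] = bus;                     /* start bus number */
    buf[6] = dev;                     /* path[0]: device */
    buf[7] = fn;                      /* path[0]: function */

    return len;
}
```

For the root-complex IOAPIC in this patch that would be IOAPIC ID 0 on pseudo bus 0xff, which gives an 8-byte entry appended right after the DRHD header.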

Signed-off-by: Peter Xu 
---
 hw/i386/acpi-build.c| 17 +++--
 include/hw/acpi/acpi-defs.h | 15 +++
 include/hw/pci-host/q35.h   |  9 +
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 80dd1bb..5d2d87b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -77,6 +77,9 @@
 #define ACPI_BUILD_DPRINTF(fmt, ...)
 #endif
 
+/* Default IOAPIC ID */
+#define ACPI_BUILD_IOAPIC_ID 0x0
+
 typedef struct AcpiMcfgInfo {
 uint64_t mcfg_base;
 uint32_t mcfg_size;
@@ -375,7 +378,6 @@ build_madt(GArray *table_data, GArray *linker, PCMachineState *pcms)
 io_apic = acpi_data_push(table_data, sizeof *io_apic);
 io_apic->type = ACPI_APIC_IO;
 io_apic->length = sizeof(*io_apic);
-#define ACPI_BUILD_IOAPIC_ID 0x0
 io_apic->io_apic_id = ACPI_BUILD_IOAPIC_ID;
 io_apic->address = cpu_to_le32(IO_APIC_DEFAULT_ADDRESS);
 io_apic->interrupt = cpu_to_le32(0);
@@ -2582,6 +2584,9 @@ build_dmar_q35(MachineState *ms, GArray *table_data, GArray *linker)
 AcpiTableDmar *dmar;
 AcpiDmarHardwareUnit *drhd;
 uint8_t dmar_flags = 0;
+AcpiDmarDeviceScope *scope = NULL;
+/* Root complex IOAPIC use one path[0] only */
+uint16_t scope_size = sizeof(*scope) + sizeof(uint16_t);
 
 if (ms->iommu_intr) {
 /* enable INTR for the IOMMU device */
@@ -2595,11 +2600,19 @@ build_dmar_q35(MachineState *ms, GArray *table_data, GArray *linker)
 /* DMAR Remapping Hardware Unit Definition structure */
 drhd = acpi_data_push(table_data, sizeof(*drhd));
 drhd->type = cpu_to_le16(ACPI_DMAR_TYPE_HARDWARE_UNIT);
-drhd->length = cpu_to_le16(sizeof(*drhd));   /* No device scope now */
+drhd->length = cpu_to_le16(sizeof(*drhd) + scope_size);
 drhd->flags = ACPI_DMAR_INCLUDE_PCI_ALL;
 drhd->pci_segment = cpu_to_le16(0);
 drhd->address = cpu_to_le64(Q35_HOST_BRIDGE_IOMMU_ADDR);
 
+/* Scope definition for the root-complex IOAPIC */
+scope = acpi_data_push(table_data, scope_size);
+scope->entry_type = cpu_to_le16(ACPI_DMAR_DEV_SCOPE_TYPE_IOAPIC);
+scope->length = scope_size;
+scope->enumeration_id = cpu_to_le16(ACPI_BUILD_IOAPIC_ID);
+scope->bus = cpu_to_le16(Q35_PSEUDO_BUS_PLATFORM);
+scope->path[0] = cpu_to_le16(Q35_PSEUDO_DEVFN_IOAPIC);
+
 build_header(linker, table_data, (void *)(table_data->data + dmar_start),
  "DMAR", table_data->len - dmar_start, 1, NULL, NULL);
 }
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index c7a03d4..2430af6 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -556,6 +556,20 @@ enum {
 /*
  * Sub-structures for DMAR
  */
+
+#define ACPI_DMAR_DEV_SCOPE_TYPE_IOAPIC (0x03)
+
+/* Device scope structure for DRHD. */
+struct AcpiDmarDeviceScope {
+uint8_t entry_type;
+uint8_t length;
+uint16_t reserved;
+uint8_t enumeration_id;
+uint8_t bus;
+uint16_t path[0];   /* list of dev:func pairs */
+} QEMU_PACKED;
+typedef struct AcpiDmarDeviceScope AcpiDmarDeviceScope;
+
 /* Type 0: Hardware Unit Definition */
 struct AcpiDmarHardwareUnit {
 uint16_t type;
@@ -564,6 +578,7 @@ struct AcpiDmarHardwareUnit {
 uint8_t reserved;
 uint16_t pci_segment;   /* The PCI Segment associated with this unit */
 uint64_t address;   /* Base address of remapping hardware register-set */
+AcpiDmarDeviceScope scope[0];
 } QEMU_PACKED;
 typedef struct AcpiDmarHardwareUnit AcpiDmarHardwareUnit;
 
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index c5c073d..9afc221 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -175,4 +175,13 @@ typedef struct Q35PCIHost {
 
 uint64_t mch_mcfg_base(void);
 
+/*
+ * Arbitrary but unique BDF number for IOAPIC device. This is only
+ * used when interrupt remapping is enabled.
+ *
+ * TODO: make sure there is no conflict with a real PCI bus
+ */
+#define Q35_PSEUDO_BUS_PLATFORM (0xff)
+#define Q35_PSEUDO_DEVFN_IOAPIC (0x00)
+
 #endif /* HW_Q35_H */
-- 
2.4.11

[Qemu-devel] [PATCH v6 07/26] intel_iommu: define several structs for IOMMU IR

2016-05-04 Thread Peter Xu

Several data structs are defined to better support the rest of the
patches: IRTE to parse remapping table entries, and IOAPIC/MSI related
structure bits to parse interrupt entries to be filled in by guest
kernel.
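
Since C bitfield layout depends on the compiler and host endianness, here is a sketch that decodes the same IRTE fields with explicit shifts, as a cross-check against the union defined in this patch. It assumes the two 64-bit words have already been converted to host byte order; the struct and function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of IRTE fields, decoded (hypothetical type for illustration). */
typedef struct {
    bool     present;
    uint8_t  dest_mode;
    uint8_t  redir_hint;
    uint8_t  trigger_mode;
    uint8_t  delivery_mode;
    uint8_t  vector;
    uint32_t dest_id;
    uint16_t source_id;
} IrteFields;

/* lo = IRTE bits 63:0, hi = IRTE bits 127:64, both in host byte order. */
IrteFields irte_decode(uint64_t lo, uint64_t hi)
{
    IrteFields f;

    f.present       = lo & 1;            /* bit 0 */
    f.dest_mode     = (lo >> 2) & 1;     /* bit 2 */
    f.redir_hint    = (lo >> 3) & 1;     /* bit 3 */
    f.trigger_mode  = (lo >> 4) & 1;     /* bit 4 */
    f.delivery_mode = (lo >> 5) & 0x7;   /* bits 7:5 */
    f.vector        = (lo >> 16) & 0xff; /* bits 23:16 */
    f.dest_id       = (uint32_t)(lo >> 32);
    f.source_id     = hi & 0xffff;       /* bits 79:64 */

    return f;
}

/* Without EIM, the 8-bit APIC ID sits in bits 15:8 of the dest field. */
uint8_t irte_xapic_dest(uint32_t dest_id)
{
    return (dest_id >> 8) & 0xff;
}
```

For example, a low word of 0x0000030000310011 decodes as present, level-triggered, vector 0x31, with xAPIC destination 3.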

Signed-off-by: Peter Xu 
---
 include/hw/i386/intel_iommu.h | 60 +++
 1 file changed, 60 insertions(+)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index cc49839..4914fe6 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -52,6 +52,9 @@ typedef struct IntelIOMMUState IntelIOMMUState;
 typedef struct VTDAddressSpace VTDAddressSpace;
 typedef struct VTDIOTLBEntry VTDIOTLBEntry;
 typedef struct VTDBus VTDBus;
+typedef union VTD_IRTE VTD_IRTE;
+typedef union VTD_IR_IOAPICEntry VTD_IR_IOAPICEntry;
+typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -90,6 +93,63 @@ struct VTDIOTLBEntry {
 bool write_flags;
 };
 
+/* Interrupt Remapping Table Entry Definition */
+union VTD_IRTE {
+struct {
+uint8_t present:1;  /* Whether entry present/available */
+uint8_t fault_disable:1;/* Fault Processing Disable */
+uint8_t dest_mode:1;/* Destination Mode */
+uint8_t redir_hint:1;   /* Redirection Hint */
+uint8_t trigger_mode:1; /* Trigger Mode */
+uint8_t delivery_mode:3;/* Delivery Mode */
+uint8_t __avail:4;  /* Available spaces for software */
+uint8_t __reserved_0:3; /* Reserved 0 */
+uint8_t irte_mode:1;/* IRTE Mode */
+uint8_t vector:8;   /* Interrupt Vector */
+uint8_t __reserved_1:8; /* Reserved 1 */
+uint32_t dest_id:32;/* Destination ID */
+uint16_t source_id:16;  /* Source-ID */
+uint8_t sid_q:2;/* Source-ID Qualifier */
+uint8_t sid_vtype:2;/* Source-ID Validation Type */
+uint64_t __reserved_2:44;   /* Reserved 2 */
+} QEMU_PACKED;
+uint64_t data[2];
+};
+
+/* Programming format for IOAPIC table entries */
+union VTD_IR_IOAPICEntry {
+struct {
+uint8_t vector:8;   /* Vector */
+uint8_t __zeros:3;  /* Reserved (all zero) */
+uint8_t index_h:1;  /* Interrupt Index bit 15 */
+uint8_t status:1;   /* Deliver Status */
+uint8_t polarity:1; /* Interrupt Polarity */
+uint8_t remote_irr:1;   /* Remote IRR */
+uint8_t trigger_mode:1; /* Trigger Mode */
+uint8_t mask:1; /* Mask */
+uint32_t __reserved:31; /* Reserved (should all zero) */
+uint8_t int_mode:1; /* Interrupt Format */
+uint16_t index_l:15;/* Interrupt Index bits 14-0 */
+} QEMU_PACKED;
+uint64_t data;
+};
+
+/* Programming format for MSI/MSI-X addresses */
+union VTD_IR_MSIAddress {
+struct {
+uint8_t __not_care:2;
+uint8_t index_h:1;  /* Interrupt index bit 15 */
+uint8_t sub_valid:1;/* SHV: Sub-Handle Valid bit */
+uint8_t int_mode:1; /* Interrupt format */
+uint16_t index_l:15;/* Interrupt index bit 14-0 */
+uint16_t __head:12; /* Should always be: 0x0fee */
+} QEMU_PACKED;
+uint32_t data;
+};
+
+/* When IR is enabled, all MSI/MSI-X data bits should be zero */
+#define VTD_IR_MSI_DATA  (0)
+
 /* The iommu (DMAR) device state struct */
 struct IntelIOMMUState {
 SysBusDevice busdev;
-- 
2.4.11

[Qemu-devel] [PATCH v6 09/26] intel_iommu: add IR translation faults defines

2016-05-04 Thread Peter Xu

Adding translation fault definitions for interrupt remapping. Please
refer to VT-d spec section 7.1.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu_internal.h | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 309833f..2a9987f 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -271,6 +271,19 @@ typedef enum VTDFaultReason {
  * context-entry.
  */
 VTD_FR_CONTEXT_ENTRY_TT,
+
+/* Interrupt remapping transition faults */
+VTD_FR_IR_REQ_RSVD = 0x20, /* One or more IR request reserved
+* fields set */
+VTD_FR_IR_INDEX_OVER = 0x21, /* Index value greater than max */
+VTD_FR_IR_ENTRY_P = 0x22,/* Present (P) not set in IRTE */
+VTD_FR_IR_ROOT_INVAL = 0x23, /* IR Root table invalid */
+VTD_FR_IR_IRTE_RSVD = 0x24,  /* IRTE Rsvd field non-zero with
+  * Present flag set */
+VTD_FR_IR_REQ_COMPAT = 0x25, /* Encountered compatible IR
+  * request while disabled */
+VTD_FR_IR_SID_ERR = 0x26,   /* Invalid Source-ID */
+
 /* This is not a normal fault reason. We use this to indicate some faults
  * that are not referenced by the VT-d specification.
  * Fault event with such reason should not be recorded.
-- 
2.4.11

[Qemu-devel] [PATCH v6 13/26] intel_iommu: add support for split irqchip

2016-05-04 Thread Peter Xu

In split irqchip mode, the IOAPIC runs in user space and only updates
the kernel irq routes when an entry changes. When IR is enabled, we
update the kernel directly with translated messages, so the kernel
routes act as a cache of the remapping entries.

Since KVM irqfd uses kernel gsi routes to deliver interrupts, once we
support split irqchip we support irqfd as well. Also, since kernel gsi
routes cache translated interrupts, irqfd delivery does not suffer any
performance impact from IR.

And since irqfd is supported, vhost devices can now work seamlessly
with IR. Logically this should cover both the vhost-net and vhost-user
cases.
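
To make the "kernel cache" idea concrete, here is a small standalone sketch of the fixup step: rebuild the 64-bit MSI address from the route's hi/lo halves, translate it, and write the result back, leaving the route untouched on failure. The types are stand-ins for MSIMessage and the MSI part of the kvm routing entry, and demo_remap() is a fake translation with fixed output, so none of this is the real QEMU or KVM API.

```c
#include <stdint.h>

/* Stand-in for MSIMessage. */
typedef struct {
    uint64_t address;
    uint32_t data;
} Msi;

/* Stand-in for the MSI fields of a kernel routing entry. */
typedef struct {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
} MsiRoute;

typedef int (*RemapFn)(const Msi *src, Msi *dst);

/* Fake translation: rejects a non-zero high half, else fixed result. */
int demo_remap(const Msi *src, Msi *dst)
{
    if (src->address >> 32) {
        return -1;
    }
    dst->address = 0xfee01000ULL;
    dst->data = 0x31;
    return 0;
}

/* Returns 0 on success, 1 if translation failed (route is untouched). */
int fixup_route(MsiRoute *route, RemapFn remap)
{
    Msi src, dst;

    src.address = ((uint64_t)route->address_hi << 32) | route->address_lo;
    src.data = route->data;

    if (remap(&src, &dst)) {
        return 1;
    }

    route->address_hi = (uint32_t)(dst.address >> 32);
    route->address_lo = (uint32_t)(dst.address & 0xffffffffULL);
    route->data = dst.data;
    return 0;
}
```

Because the translated message is what gets installed in the kernel, delivery through irqfd never has to consult the remapping table; the cost is that the routes must be refreshed whenever the guest invalidates interrupt entries.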

Signed-off-by: Peter Xu 
---
 include/hw/i386/intel_iommu.h |  2 ++
 target-i386/kvm.c | 24 
 trace-events  |  3 +++
 3 files changed, 29 insertions(+)

diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 5945670..5910e6f 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -25,6 +25,7 @@
 #include "sysemu/dma.h"
 #include "hw/i386/ioapic.h"
 #include "hw/pci/msi.h"
+#include "hw/sysbus.h"
 
 #define TYPE_INTEL_IOMMU_DEVICE "intel-iommu"
 #define INTEL_IOMMU_DEVICE(obj) \
@@ -250,5 +251,6 @@ struct IntelIOMMUState {
 VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn);
 /* Get default IOMMU object */
 IntelIOMMUState *vtd_iommu_get(void);
+int vtd_int_remap(void *iommu, MSIMessage *src, MSIMessage *dst);
 
 #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 799fdfa..ea5387c 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -36,6 +36,7 @@
 #include "hw/i386/apic.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/i386/apic-msidef.h"
+#include "hw/i386/intel_iommu.h"
 
 #include "exec/ioport.h"
 #include "standard-headers/asm-x86/hyperv.h"
@@ -43,6 +44,7 @@
 #include "hw/pci/msi.h"
 #include "migration/migration.h"
 #include "exec/memattrs.h"
+#include "trace.h"
 
 //#define DEBUG_KVM
 
@@ -3327,6 +3329,28 @@ int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
 int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
  uint64_t address, uint32_t data, PCIDevice *dev)
 {
+IntelIOMMUState *iommu = vtd_iommu_get();
+
+if (iommu) {
+int ret;
+MSIMessage src, dst;
+
+src.address = route->u.msi.address_hi;
+src.address <<= VTD_MSI_ADDR_HI_SHIFT;
+src.address |= route->u.msi.address_lo;
+src.data = route->u.msi.data;
+
+ret = vtd_int_remap(iommu, &src, &dst);
+if (ret) {
+trace_kvm_x86_fixup_msi_error(route->gsi);
+return 1;
+}
+
+route->u.msi.address_hi = dst.address >> VTD_MSI_ADDR_HI_SHIFT;
+route->u.msi.address_lo = dst.address & VTD_MSI_ADDR_LO_MASK;
+route->u.msi.data = dst.data;
+}
+
 return 0;
 }
 
diff --git a/trace-events b/trace-events
index 8350743..b03d310 100644
--- a/trace-events
+++ b/trace-events
@@ -1909,3 +1909,6 @@ aspeed_vic_update_fiq(int flags) "Raising FIQ: %d"
 aspeed_vic_update_irq(int flags) "Raising IRQ: %d"
 aspeed_vic_read(uint64_t offset, unsigned size, uint32_t value) "From 0x%" PRIx64 " of size %u: 0x%" PRIx32
 aspeed_vic_write(uint64_t offset, unsigned size, uint32_t data) "To 0x%" PRIx64 " of size %u: 0x%" PRIx32
+
+# target-i386/kvm.c
+kvm_x86_fixup_msi_error(uint32_t gsi) "VT-d failed to remap interrupt for GSI %" PRIu32
-- 
2.4.11

[Qemu-devel] [PATCH v6 02/26] intel_iommu: allow queued invalidation for IR

2016-05-04 Thread Peter Xu

Queued invalidation is required for IR. This patch adds basic support
for interrupt cache invalidate requests. Since no IR cache is
implemented yet, we can simply skip all interrupt cache invalidation
requests for now.
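
A minimal sketch of the dispatch this relies on: the descriptor type lives in the low four bits of the descriptor, and an IEC descriptor can be acknowledged without doing any work while nothing is cached. The type values mirror the defines in intel_iommu_internal.h as shown in this patch; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define INV_DESC_TYPE_MASK  0xfULL

enum {
    INV_DESC_CC    = 0x1,   /* context-cache invalidate */
    INV_DESC_IOTLB = 0x2,   /* IOTLB invalidate */
    INV_DESC_IEC   = 0x4,   /* interrupt entry cache invalidate */
    INV_DESC_WAIT  = 0x5,   /* invalidation wait */
};

/* Extract the descriptor type from the low quadword. */
unsigned inv_desc_type(uint64_t lo)
{
    return (unsigned)(lo & INV_DESC_TYPE_MASK);
}

/*
 * True if the descriptor can be completed as "good" with no action.
 * That only holds while no interrupt entries are cached anywhere.
 */
bool inv_desc_is_noop(uint64_t lo)
{
    return inv_desc_type(lo) == INV_DESC_IEC;
}
```

Once an interrupt entry cache (or a kernel-side copy of translated routes) exists, this shortcut no longer holds and the IEC case has to trigger a real flush.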

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 9 +
 hw/i386/intel_iommu_internal.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 347718f..4b0558e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1400,6 +1400,15 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s)
 }
 break;
 
+case VTD_INV_DESC_IEC:
+VTD_DPRINTF(INV, "Interrupt Entry Cache Invalidation "
+"not implemented yet");
+/*
+ * Since currently we do not cache interrupt entries, we can
+ * just mark this descriptor as "good" and move on.
+ */
+break;
+
 default:
 VTD_DPRINTF(GENERAL, "error: unkonw Invalidation Descriptor type "
 "hi 0x%"PRIx64 " lo 0x%"PRIx64 " type %"PRIu8,
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index e5f514c..b648e69 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -286,6 +286,8 @@ typedef struct VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_TYPE   0xf
 #define VTD_INV_DESC_CC 0x1 /* Context-cache Invalidate Desc */
 #define VTD_INV_DESC_IOTLB  0x2
+#define VTD_INV_DESC_IEC0x4 /* Interrupt Entry Cache
+   Invalidate Descriptor */
 #define VTD_INV_DESC_WAIT   0x5 /* Invalidation Wait Descriptor */
 #define VTD_INV_DESC_NONE   0   /* Not an Invalidate Descriptor */
 
-- 
2.4.11

[Qemu-devel] [PATCH v6 05/26] intel_iommu: define interrupt remap table addr register

2016-05-04 Thread Peter Xu

Define the Interrupt Remap Table Address register to store the IR
table pointer. Also handle global command register writes properly to
store the table pointer and its size.

One more debug flag "DEBUG_IR" is added for interrupt remapping.
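
As a worked example of the register decoding, the sketch below splits an IRTA value into the table base and the number of entries, using the same masks as this patch (39-bit host address width, low 4 bits as the size field S, table holds 2^(S+1) entries). The function names are made up for the example.

```c
#include <stdint.h>

#define HOST_ADDRESS_WIDTH  39
#define HAW_MASK            ((1ULL << HOST_ADDRESS_WIDTH) - 1)
#define IRTA_ADDR_MASK      (HAW_MASK ^ 0xfffULL)  /* 4K-aligned base */
#define IRTA_SIZE_MASK      0xfULL                 /* size field S */

/* Guest-physical base address of the interrupt remapping table. */
uint64_t irta_table_base(uint64_t irta)
{
    return irta & IRTA_ADDR_MASK;
}

/* Number of IRTEs in the table: 2^(S + 1). */
uint32_t irta_table_entries(uint64_t irta)
{
    return 1U << ((irta & IRTA_SIZE_MASK) + 1);
}
```

So a guest writing 0x12345007 gets a table at 0x12345000 with 256 entries, and the maximum S of 0xf gives 65536 entries.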

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 52 +-
 hw/i386/intel_iommu_internal.h |  4 
 include/hw/i386/intel_iommu.h  |  5 
 3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 17668d6..00b873c 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -30,7 +30,7 @@
 #ifdef DEBUG_INTEL_IOMMU
 enum {
 DEBUG_GENERAL, DEBUG_CSR, DEBUG_INV, DEBUG_MMU, DEBUG_FLOG,
-DEBUG_CACHE,
+DEBUG_CACHE, DEBUG_IR,
 };
 #define VTD_DBGBIT(x)   (1 << DEBUG_##x)
 static int vtd_dbgflags = VTD_DBGBIT(GENERAL) | VTD_DBGBIT(CSR);
@@ -900,6 +900,19 @@ static void vtd_root_table_setup(IntelIOMMUState *s)
 (s->root_extended ? "(extended)" : ""));
 }
 
+static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
+{
+uint64_t value = 0;
+value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
+s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
+s->intr_root = value & VTD_IRTA_ADDR_MASK;
+
+/* TODO: invalidate interrupt entry cache */
+
+VTD_DPRINTF(CSR, "int remap table addr 0x%"PRIx64 " size %"PRIu32,
+s->intr_root, s->intr_size);
+}
+
 static void vtd_context_global_invalidate(IntelIOMMUState *s)
 {
 s->context_cache_gen++;
@@ -1138,6 +1151,16 @@ static void vtd_handle_gcmd_srtp(IntelIOMMUState *s)
 vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_RTPS);
 }
 
+/* Set Interrupt Remap Table Pointer */
+static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s)
+{
+VTD_DPRINTF(CSR, "set Interrupt Remap Table Pointer");
+
+vtd_interrupt_remap_table_setup(s);
+/* Ok - report back to driver */
+vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS);
+}
+
 /* Handle Translation Enable/Disable */
 static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
 {
@@ -1177,6 +1200,10 @@ static void vtd_handle_gcmd_write(IntelIOMMUState *s)
 /* Queued Invalidation Enable */
 vtd_handle_gcmd_qie(s, val & VTD_GCMD_QIE);
 }
+if (val & VTD_GCMD_SIRTP) {
+/* Set/update the interrupt remapping root-table pointer */
+vtd_handle_gcmd_sirtp(s);
+}
 }
 
 /* Handle write to Context Command Register */
@@ -1838,6 +1865,23 @@ static void vtd_mem_write(void *opaque, hwaddr addr,
 vtd_update_fsts_ppf(s);
 break;
 
+case DMAR_IRTA_REG:
+VTD_DPRINTF(IR, "DMAR_IRTA_REG write addr 0x%"PRIx64
+", size %d, val 0x%"PRIx64, addr, size, val);
+if (size == 4) {
+vtd_set_long(s, addr, val);
+} else {
+vtd_set_quad(s, addr, val);
+}
+break;
+
+case DMAR_IRTA_REG_HI:
+VTD_DPRINTF(IR, "DMAR_IRTA_REG_HI write addr 0x%"PRIx64
+", size %d, val 0x%"PRIx64, addr, size, val);
+assert(size == 4);
+vtd_set_long(s, addr, val);
+break;
+
 default:
 VTD_DPRINTF(GENERAL, "error: unhandled reg write addr 0x%"PRIx64
 ", size %d, val 0x%"PRIx64, addr, size, val);
@@ -2017,6 +2061,12 @@ static void vtd_init(IntelIOMMUState *s)
 /* Fault Recording Registers, 128-bit */
 vtd_define_quad(s, DMAR_FRCD_REG_0_0, 0, 0, 0);
 vtd_define_quad(s, DMAR_FRCD_REG_0_2, 0, 0, 0x8000ULL);
+
+/*
+ * Interrupt remapping registers, not support extended interrupt
+ * mode for now.
+ */
+vtd_define_quad(s, DMAR_IRTA_REG, 0, 0xf00fULL, 0);
 }
 
 /* Should not reset address_spaces when reset because devices will still use
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 5b98a11..309833f 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -172,6 +172,10 @@
 #define VTD_RTADDR_RTT  (1ULL << 11)
 #define VTD_RTADDR_ADDR_MASK(VTD_HAW_MASK ^ 0xfffULL)
 
+/* IRTA_REG */
+#define VTD_IRTA_ADDR_MASK  (VTD_HAW_MASK ^ 0xfffULL)
+#define VTD_IRTA_SIZE_MASK  (0xfULL)
+
 /* ECAP_REG */
 /* (offset >> 4) << 8 */
 #define VTD_ECAP_IRO(DMAR_IOTLB_REG_OFFSET << 4)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 0d89796..cc49839 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -125,6 +125,11 @@ struct IntelIOMMUState {
 MemoryRegionIOMMUOps iommu_ops;
 GHashTable *vtd_as_by_busptr;   /* VTDBus objects indexed by PCIBus* reference */
 VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */
+
+/* interrupt remapping */
+bool intr_enabled;  /* Whether guest enabled IR */
+dma_addr_t intr_root;   /* Interrupt remapping table poi

[Qemu-devel] [PATCH v6 03/26] intel_iommu: set IR bit for ECAP register

2016-05-04 Thread Peter Xu

Enable IR in IOMMU Extended Capability register.

Signed-off-by: Peter Xu 
---
 hw/i386/intel_iommu.c  | 7 +++
 hw/i386/intel_iommu_internal.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4b0558e..17668d6 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -24,6 +24,7 @@
 #include "exec/address-spaces.h"
 #include "intel_iommu_internal.h"
 #include "hw/pci/pci.h"
+#include "hw/boards.h"
 
 /*#define DEBUG_INTEL_IOMMU*/
 #ifdef DEBUG_INTEL_IOMMU
@@ -1941,6 +1942,8 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn)
  */
 static void vtd_init(IntelIOMMUState *s)
 {
+MachineState *ms = MACHINE(qdev_get_machine());
+
 memset(s->csr, 0, DMAR_REG_SIZE);
 memset(s->wmask, 0, DMAR_REG_SIZE);
 memset(s->w1cmask, 0, DMAR_REG_SIZE);
@@ -1961,6 +1964,10 @@ static void vtd_init(IntelIOMMUState *s)
  VTD_CAP_SAGAW | VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS;
 s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
 
+if (ms->iommu_intr) {
+s->ecap |= VTD_ECAP_IR;
+}
+
 vtd_reset_context_cache(s);
 vtd_reset_iotlb(s);
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index b648e69..5b98a11 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -176,6 +176,8 @@
 /* (offset >> 4) << 8 */
 #define VTD_ECAP_IRO(DMAR_IOTLB_REG_OFFSET << 4)
 #define VTD_ECAP_QI (1ULL << 1)
+/* Interrupt Remapping support */
+#define VTD_ECAP_IR (1ULL << 3)
 
 /* CAP_REG */
 /* (offset >> 4) << 24 */
-- 
2.4.11

[Qemu-devel] [PATCH v6 01/26] acpi: enable INTR for DMAR report structure

2016-05-04 Thread Peter Xu

Introduce iommu_intr in MachineState to show whether IOMMU IR is
enabled. By default, IR is off.

In the ACPI DMA remapping reporting structure, set the INTR flag when
IR is enabled.

Signed-off-by: Peter Xu 
---
 hw/core/machine.c |  2 ++
 hw/i386/acpi-build.c  | 12 +---
 include/hw/boards.h   |  1 +
 include/hw/i386/intel_iommu.h |  2 ++
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 6dbbc85..276ad61 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -382,6 +382,8 @@ static void machine_initfn(Object *obj)
 ms->kvm_shadow_mem = -1;
 ms->dump_guest_core = true;
 ms->mem_merge = true;
+/* Disable interrupt remapping by default. */
+ms->iommu_intr = false;
 
 object_property_add_str(obj, "accel",
 machine_get_accel, machine_set_accel, NULL);
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 6477003..80dd1bb 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2575,16 +2575,22 @@ build_mcfg_q35(GArray *table_data, GArray *linker, AcpiMcfgInfo *info)
 }
 
 static void
-build_dmar_q35(GArray *table_data, GArray *linker)
+build_dmar_q35(MachineState *ms, GArray *table_data, GArray *linker)
 {
 int dmar_start = table_data->len;
 
 AcpiTableDmar *dmar;
 AcpiDmarHardwareUnit *drhd;
+uint8_t dmar_flags = 0;
+
+if (ms->iommu_intr) {
+/* enable INTR for the IOMMU device */
+dmar_flags |= DMAR_REPORT_F_INTR;
+}
 
 dmar = acpi_data_push(table_data, sizeof(*dmar));
 dmar->host_address_width = VTD_HOST_ADDRESS_WIDTH - 1;
-dmar->flags = 0;/* No intr_remap for now */
+dmar->flags = dmar_flags;
 
 /* DMAR Remapping Hardware Unit Definition structure */
 drhd = acpi_data_push(table_data, sizeof(*drhd));
@@ -2745,7 +2751,7 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 }
 if (acpi_has_iommu()) {
 acpi_add_table(table_offsets, tables_blob);
-build_dmar_q35(tables_blob, tables->linker);
+build_dmar_q35(MACHINE(pcms), tables_blob, tables->linker);
 }
 if (pcms->acpi_nvdimm_state.is_enabled) {
 nvdimm_build_acpi(table_offsets, tables_blob, tables->linker);
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 8d4fe56..43f4976 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -152,6 +152,7 @@ struct MachineState {
 bool igd_gfx_passthru;
 char *firmware;
 bool iommu;
+bool iommu_intr;
 bool suppress_vmdesc;
 bool enforce_config_section;
 
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index b024ffa..0d89796 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -44,6 +44,8 @@
 #define VTD_HOST_ADDRESS_WIDTH  39
 #define VTD_HAW_MASK((1ULL << VTD_HOST_ADDRESS_WIDTH) - 1)
 
+#define DMAR_REPORT_F_INTR  (1)
+
 typedef struct VTDContextEntry VTDContextEntry;
 typedef struct VTDContextCacheEntry VTDContextCacheEntry;
 typedef struct IntelIOMMUState IntelIOMMUState;
-- 
2.4.11

[Qemu-devel] [PATCH v6 00/26] IOMMU: Enable interrupt remapping for Intel IOMMU

2016-05-04 Thread Peter Xu

Hi, all,

This is v6 of the Intel IOMMU IR support series. It introduces quite
a few new patches on top of v5. Sorry for that (yes, Jan is
contributing to it as well, though most of them are really good
ideas for me :). Hopefully we can converge in this version.

To make review easier, I tried to keep all the existing patches and
their indexes (this also makes the modifications easier for me, and
logically I feel it makes more sense and is cleaner; please let me
know if I am wrong). Patches 1-18 are v5 patches, and patches 19-26
are newly added.

All the new patches may need more review. Many of them are outside
the Intel IOMMU scope and touch other parts of the code, which I am
still not very sure about.

Testing covers only a basic smoke test for the following matrix:

- IR enabled/disable
- kernel irqchip off/split
- network device: tap with/without vhost, e1000

Here's the change log. Please review. Thanks,

v6 changes:
- patch 10: use write_with_attrs() rather than write(), preparing
  for SID verification [Jan]
- patch 17-18: add r-b line from Radim [Radim]
- new patch 19: put together Jan's EIM patch [Jan]
- new patch 20: add SID validation process
- new patch 21-22: introduce X86IOMMU class, which is the parent of
  IntelIOMMU class. Patch 21 only introduce the class and did
  nothing, patch 22 cleaned up all the vtd_*() hooks into x86
  ones. This is only a start. In the future, we can abstract more
  things into X86IOMMU class, like iotlb, address spaces mgmt,
  etc. [Jan]
- new patch 23-25: this is to do IEC notify to all irqfd consumers
  like vhost/vfio. patch 23 changed interface for
  kvm_irqchip_add_msi_route(), provide vector info rather than a raw
  MSI message. Patch 24 added new hooks to do arch-specific
  notification on addition/deletion of msi routes. Patch 25 is x86
  specific, which added one more IEC notifier for msi routes. [Jan]
- new patch 26: this is to partially solve the issue that Jan has
  encountered (1 sec delay when invalidating IR cache).

v5 changes:
- patch 10: add vector checking for IOAPIC interrupts (this may help
  debug in the future, will only generate warning if specify
  IOMMU_DEBUG)
- patch 13: replace error_report() with a trace. [Jan]
- patch 14: rename parameter "intr" to "intremap", to be aligned
  with kernel parameter [Jan]
- patch 15: fix comments for vtd_iec_notify_fn
- patch 17 & 18 (added): fix issue when IR enabled with devices
  using level-triggered interrupts, like e1000. Adding it to the end
  of series, since this issue never happen without IR.

  Patch 17 adds read-only check for IOAPIC entries.
  Patch 18 clears remote IRR bit when entry configured as
  edge-triggered.

v4 changes (all patch number corresponds to v3):
- add one patch at the start of v3 series: I missed to send the
  first patch in v3. adding it in. [Jan]
- patch 9: add support for compatible mode (no reason not to support
  it, if not, we will get some warnings when using split irqchip)
- patch 11: further simplify ioapic_update_kvm_routes() using the
  helper function.
- patch 12: tweak on kvm_arch_fixup_msi_route() rather than
  ioapic_update_kvm_routes() only. [Radim]
- add patch 15: introduce IEC (Interrupt Entry Cache) invalidation
  notifier list. We can register to this list if we want to be
  notified when we got IR invalidation requests [Radim]
- add patch 16: let IOAPIC the first consumer for the above IEC
  notifier list. [Radim]
- several other trivial fixes (like moving some defines from .c to
  .h, moving several lines of changes from one patch to another to
  make it make more sense, etc.)

v3 changes (all patch numbers corresponds to v2):
- patch 1 (-> v3 patch 13)
  - move to the end of series [Alex]
- patch 10 (dropped)
  - drop this one, since re-worked on IOAPIC support, so we do not
    need this any more.
- patch 12 (-> v3 patch 10)
  - leverage MSI path for IOAPIC IR [Jan]
- patch 13 (v3 -> patch 9)
  - remove vtd_interrupt_remap_msi() declaration by reordering the
    functions [mst]
  - vtd_generate_msi_message(): init msg using {}, remove FIXME
    [mst]
- new patches
  - v3 patch 11: introduce ioapic_entry_parse() helper function
  - v3 patch 12: add support for kernel-irqchip=split. This needs
    more reviews, logically this should enable lots of things:
    splitted irqchip, irqfd, vhost, and irqfd support for
    passthrough devices (not tested). Please refer to the patch for
    more information.

v2 changes:
- patch 1
  - rename "int_remap" to "intr" in several places [Marcel]
  - remove "Intel" specific words in desc or commit message, prepare
    itself with further AMD support [Marcel]
  - avoid using object_property_get_bool() [Marcel]
- patch 5
  - use PCI bus number 0xff rather than 0xf0 for the IOAPIC scope
    definition. (please let me know if anyone knows how I can avoid
    user using PCI bus number 0xff... TIA)
- patch 11
  - fix comments [Marcel]
- all
  - remove intr_supported variable [Marcel

Re: [Qemu-devel] [PATCH 0/5] QOM'ify hw/display devices

2016-05-04 Thread xiaoqiang zhao




On 2016-05-04 22:26, Peter Maydell wrote:

On 24 March 2016 at 10:29, xiaoqiang zhao  wrote:

This patch set tries to QOM'ify hw/display files, see commit messages
for more details

xiaoqiang zhao (5):
   hw/display: QOM'ify exynos4210_fimd.c
   hw/display: QOM'ify jazz_led.c
   hw/display: QOM'ify milkymist-tmu2.c
   hw/display: QOM'ify milkymist-vgafb.c
   hw/display: QOM'ify pl110.c

Hi; I was going to review this series (apologies for taking so
long!), but looking at my email archive and the patchwork server
the patches in it seem a bit confused. I see seven patches, not five,
with rather odd patch number indications:
  1/5
  2/5
  3/6
  4/5
  5/6
  5/5
  6/6

(and 5/5 and 6/6 seem to be the same). Could you resend the
series with the correct patches in it, please?

thanks
-- PMM
I have resent the patches, but forgot to CC you, so I am sending you a
separate copy.

Re: [Qemu-devel] [RFC PATCH V3 4/4] colo-compare: add TCP, UDP, ICMP packet comparison

2016-05-04 Thread Zhang Chen




On 05/05/2016 11:03 AM, Zhang Chen wrote:



On 04/29/2016 03:44 AM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  net/colo-compare.c | 158 +++--
  1 file changed, 154 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 4b5a2d4..3dad461 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -385,9 +385,148 @@ static int colo_packet_compare(Packet *ppkt, Packet *spkt)
  }
  }

-static int colo_packet_compare_all(Packet *spkt, Packet *ppkt)
+/*
+ * called from the compare thread on the primary
+ * for compare tcp packet
+ * compare_tcp copied from Dr. David Alan Gilbert's branch
+ */
+static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
+{
+struct tcphdr *ptcp, *stcp;
+int res;
+char *sdebug, *ddebug;
+ptrdiff_t offset;
+
+trace_colo_compare_main("compare tcp");
+ptcp = (struct tcphdr *)ppkt->transport_layer;
+stcp = (struct tcphdr *)spkt->transport_layer;
+
+/* Initial is compare the whole packet */
+offset = 12; /* Hack! Skip virtio header */

So, when I post a set of patches and mark it saying that I know they've
got a lot of hacks in them, it's good for those reusing those patches
to check they need the hacks!

In my world I found I needed to skip over that header and I didn't understand
why; but hadn't figured out the details yet, and I'd added the 12 everywhere -
I think this is the only place you've got it, so it's almost certainly wrong.


In my testing that header was not present, so if I remove the
12-byte offset, is the function otherwise OK?




+if (ptcp->th_flags == stcp->th_flags &&
+((ptcp->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
+/* This is the syn/ack response from the guest to an incoming
+ * connection; the secondary won't have matched the sequence number
+ * Note: We should probably compare the IP level?
+ * Note hack: This already has the virtio offset
+ */
+offset = sizeof(ptcp->th_ack) + (void *)&ptcp->th_ack - ppkt->data;
+}
+/* Note - we want to compare everything as long as it's not the syn/ack? */
+assert(offset > 0);
+assert(spkt->size > offset);
+
+/* The 'identification' field in the IP header is *very* random
+ * it almost never matches.  Fudge this by ignoring differences in
+ * unfragmented packets; they'll normally sort themselves out if different
+ * anyway, and it should recover at the TCP level.
+ * An alternative would be to get both the primary and secondary to rewrite
+ * somehow; but that would need some sync traffic to sync the state
+ */
+if (ntohs(ppkt->ip->ip_off) & IP_DF) {
+spkt->ip->ip_id = ppkt->ip->ip_id;
+/* and the sum will be different if the IDs were different */
+spkt->ip->ip_sum = ppkt->ip->ip_sum;
+}
+
+res = memcmp(ppkt->data + offset, spkt->data + offset,
+ (spkt->size - offset));
+
+if (res && DEBUG_TCP_COMPARE) {
+sdebug = strdup(inet_ntoa(ppkt->ip->ip_src));
+ddebug = strdup(inet_ntoa(ppkt->ip->ip_dst));
+fprintf(stderr, "%s: src/dst: %s/%s offset=%zd p: seq/ack=%u/%u"
+" s: seq/ack=%u/%u res=%d flags=%x/%x\n", __func__,
+   sdebug, ddebug, offset,
+   ntohl(ptcp->th_seq), ntohl(ptcp->th_ack),
+   ntohl(stcp->th_seq), ntohl(stcp->th_ack),
+   res, ptcp->th_flags, stcp->th_flags);
+if (res && (ptcp->th_seq == stcp->th_seq)) {
+trace_colo_compare_with_int("Primary len", ppkt->size);
+colo_dump_packet(ppkt);
+trace_colo_compare_with_int("Secondary len", spkt->size);
+colo_dump_packet(spkt);
+}
Try and use meaningful tracing for this - don't use a 'compare_with_int'
trace; but use a name that says what you're doing - for example
trace_colo_tcp_miscompare ; that way if you're running COLO and just
want to see why you're getting so many miscompares, you can look
at this without turning on all the rest of the debug.


OK, I will fix this in the next version.



Also, in my version instead of using a DEBUG_TCP macro, I again used
the trace system, so, my code here was:

 if (trace_event_get_state(TRACE_COLO_PROXY_MISCOMPARE) && res) {

 that means you can switch it on and off at runtime using the
trace system.  Then just as it's running I can get to the (qemu) prompt
and do:
trace-event colo_proxy_miscompare on

and see what's happening without recompiling.


OK, I will fix it.




+g_free(sdebug);
+g_free(ddebug);
+}
+
+return res;
+}
+
+/*
+ * called from the compare thread on the primary
+ * for compare udp packet
+ */
+static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt)
+{
+int ret = 1;
+
+

[Qemu-devel] [PATCH RESEND 2/5] hw/display: QOM'ify jazz_led.c

2016-05-04 Thread xiaoqiang zhao

* Drop the old SysBus init function and use instance_init
* Move graphic_console_init into realize stage

Signed-off-by: xiaoqiang zhao 
---
 hw/display/jazz_led.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/hw/display/jazz_led.c b/hw/display/jazz_led.c
index 09dcdb4..b72fdb1 100644
--- a/hw/display/jazz_led.c
+++ b/hw/display/jazz_led.c
@@ -267,16 +267,20 @@ static const GraphicHwOps jazz_led_ops = {
 .text_update = jazz_led_text_update,
 };
 
-static int jazz_led_init(SysBusDevice *dev)
+static void jazz_led_init(Object *obj)
 {
-LedState *s = JAZZ_LED(dev);
+LedState *s = JAZZ_LED(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
-memory_region_init_io(&s->iomem, OBJECT(s), &led_ops, s, "led", 1);
+memory_region_init_io(&s->iomem, obj, &led_ops, s, "led", 1);
 sysbus_init_mmio(dev, &s->iomem);
+}
 
-s->con = graphic_console_init(DEVICE(dev), 0, &jazz_led_ops, s);
+static void jazz_led_realize(DeviceState *dev, Error **errp)
+{
+LedState *s = JAZZ_LED(dev);
 
-return 0;
+s->con = graphic_console_init(dev, 0, &jazz_led_ops, s);
 }
 
 static void jazz_led_reset(DeviceState *d)
@@ -291,18 +295,18 @@ static void jazz_led_reset(DeviceState *d)
 static void jazz_led_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = jazz_led_init;
 dc->desc = "Jazz LED display",
 dc->vmsd = &vmstate_jazz_led;
 dc->reset = jazz_led_reset;
+dc->realize = jazz_led_realize;
 }
 
 static const TypeInfo jazz_led_info = {
 .name  = TYPE_JAZZ_LED,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(LedState),
+.instance_init = jazz_led_init,
 .class_init= jazz_led_class_init,
 };
 
-- 
2.1.4

[Qemu-devel] [PATCH RESEND 4/5] hw/display: QOM'ify milkymist-vgafb.c

2016-05-04 Thread xiaoqiang zhao

* Drop the old SysBus init function and use instance_init
* Move graphic_console_init into realize stage

Signed-off-by: xiaoqiang zhao 
---
 hw/display/milkymist-vgafb.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/display/milkymist-vgafb.c b/hw/display/milkymist-vgafb.c
index 19ca256..39e16d6 100644
--- a/hw/display/milkymist-vgafb.c
+++ b/hw/display/milkymist-vgafb.c
@@ -292,17 +292,21 @@ static const GraphicHwOps vgafb_ops = {
 .gfx_update  = vgafb_update_display,
 };
 
-static int milkymist_vgafb_init(SysBusDevice *dev)
+static void milkymist_vgafb_init(Object *obj)
 {
-MilkymistVgafbState *s = MILKYMIST_VGAFB(dev);
+MilkymistVgafbState *s = MILKYMIST_VGAFB(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 memory_region_init_io(&s->regs_region, OBJECT(s), &vgafb_mmio_ops, s,
 "milkymist-vgafb", R_MAX * 4);
 sysbus_init_mmio(dev, &s->regs_region);
+}
 
-s->con = graphic_console_init(DEVICE(dev), 0, &vgafb_ops, s);
+static void milkymist_vgafb_realize(DeviceState *dev, Error **errp)
+{
+MilkymistVgafbState *s = MILKYMIST_VGAFB(dev);
 
-return 0;
+s->con = graphic_console_init(dev, 0, &vgafb_ops, s);
 }
 
 static int vgafb_post_load(void *opaque, int version_id)
@@ -331,18 +335,18 @@ static Property milkymist_vgafb_properties[] = {
 static void milkymist_vgafb_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = milkymist_vgafb_init;
 dc->reset = milkymist_vgafb_reset;
 dc->vmsd = &vmstate_milkymist_vgafb;
 dc->props = milkymist_vgafb_properties;
+dc->realize = milkymist_vgafb_realize;
 }
 
 static const TypeInfo milkymist_vgafb_info = {
 .name  = TYPE_MILKYMIST_VGAFB,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(MilkymistVgafbState),
+.instance_init = milkymist_vgafb_init,
 .class_init= milkymist_vgafb_class_init,
 };
 
-- 
2.1.4

[Qemu-devel] [PATCH RESEND 1/5] hw/display: QOM'ify exynos4210_fimd.c

2016-05-04 Thread xiaoqiang zhao

* Drop the old SysBus init function and use instance_init
* Move graphic_console_init into realize stage

Signed-off-by: xiaoqiang zhao 
---
 hw/display/exynos4210_fimd.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/hw/display/exynos4210_fimd.c b/hw/display/exynos4210_fimd.c
index 728eb21..e5be713 100644
--- a/hw/display/exynos4210_fimd.c
+++ b/hw/display/exynos4210_fimd.c
@@ -1909,9 +1909,10 @@ static const GraphicHwOps exynos4210_fimd_ops = {
 .gfx_update  = exynos4210_fimd_update,
 };
 
-static int exynos4210_fimd_init(SysBusDevice *dev)
+static void exynos4210_fimd_init(Object *obj)
 {
-Exynos4210fimdState *s = EXYNOS4210_FIMD(dev);
+Exynos4210fimdState *s = EXYNOS4210_FIMD(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 s->ifb = NULL;
 
@@ -1919,28 +1920,32 @@ static int exynos4210_fimd_init(SysBusDevice *dev)
 sysbus_init_irq(dev, &s->irq[1]);
 sysbus_init_irq(dev, &s->irq[2]);
 
-memory_region_init_io(&s->iomem, OBJECT(s), &exynos4210_fimd_mmio_ops, s,
+memory_region_init_io(&s->iomem, obj, &exynos4210_fimd_mmio_ops, s,
 "exynos4210.fimd", FIMD_REGS_SIZE);
 sysbus_init_mmio(dev, &s->iomem);
-s->console = graphic_console_init(DEVICE(dev), 0, &exynos4210_fimd_ops, s);
+}
 
-return 0;
+static void exynos4210_fimd_realize(DeviceState *dev, Error **errp)
+{
+Exynos4210fimdState *s = EXYNOS4210_FIMD(dev);
+
+s->console = graphic_console_init(dev, 0, &exynos4210_fimd_ops, s);
 }
 
 static void exynos4210_fimd_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
 dc->vmsd = &exynos4210_fimd_vmstate;
 dc->reset = exynos4210_fimd_reset;
-k->init = exynos4210_fimd_init;
+dc->realize = exynos4210_fimd_realize;
 }
 
 static const TypeInfo exynos4210_fimd_info = {
 .name = TYPE_EXYNOS4210_FIMD,
 .parent = TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(Exynos4210fimdState),
+.instance_init = exynos4210_fimd_init,
 .class_init = exynos4210_fimd_class_init,
 };
 
-- 
2.1.4

[Qemu-devel] [PATCH RESEND 5/5] hw/display: QOM'ify pl110.c

2016-05-04 Thread xiaoqiang zhao

* Drop the old SysBus init function and use instance_init
* Move graphic_console_init into realize stage

Signed-off-by: xiaoqiang zhao 
---
 hw/display/pl110.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/hw/display/pl110.c b/hw/display/pl110.c
index d589959..61418da 100644
--- a/hw/display/pl110.c
+++ b/hw/display/pl110.c
@@ -465,24 +465,24 @@ static const GraphicHwOps pl110_gfx_ops = {
 .gfx_update  = pl110_update_display,
 };
 
-static int pl110_initfn(SysBusDevice *sbd)
+static void pl110_init(Object *obj)
 {
-DeviceState *dev = DEVICE(sbd);
-PL110State *s = PL110(dev);
+DeviceState *dev = DEVICE(obj);
+PL110State *s = PL110(obj);
+SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
 
-memory_region_init_io(&s->iomem, OBJECT(s), &pl110_ops, s, "pl110", 0x1000);
+memory_region_init_io(&s->iomem, obj, &pl110_ops, s, "pl110", 0x1000);
 sysbus_init_mmio(sbd, &s->iomem);
 sysbus_init_irq(sbd, &s->irq);
 qdev_init_gpio_in(dev, pl110_mux_ctrl_set, 1);
-s->con = graphic_console_init(dev, 0, &pl110_gfx_ops, s);
-return 0;
 }
 
-static void pl110_init(Object *obj)
+static void pl110_realize(DeviceState *dev, Error **errp)
 {
-PL110State *s = PL110(obj);
+PL110State *s = PL110(dev);
 
 s->version = PL110;
+s->con = graphic_console_init(dev, 0, &pl110_gfx_ops, s);
 }
 
 static void pl110_versatile_init(Object *obj)
@@ -502,11 +502,10 @@ static void pl111_init(Object *obj)
 static void pl110_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = pl110_initfn;
 set_bit(DEVICE_CATEGORY_DISPLAY, dc->categories);
 dc->vmsd = &vmstate_pl110;
+dc->realize = pl110_realize;
 }
 
 static const TypeInfo pl110_info = {
-- 
2.1.4

[Qemu-devel] [PATCH RESEND 3/5] hw/display: QOM'ify milkymist-tmu2.c

2016-05-04 Thread xiaoqiang zhao

* Drop the old SysBus init function and use instance_init
* Move tmu2_glx_init into realize stage

Signed-off-by: xiaoqiang zhao 
---
 hw/display/milkymist-tmu2.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/hw/display/milkymist-tmu2.c b/hw/display/milkymist-tmu2.c
index 9bc88f9..df10bf4 100644
--- a/hw/display/milkymist-tmu2.c
+++ b/hw/display/milkymist-tmu2.c
@@ -443,21 +443,25 @@ static void milkymist_tmu2_reset(DeviceState *d)
 }
 }
 
-static int milkymist_tmu2_init(SysBusDevice *dev)
+static void milkymist_tmu2_init(Object *obj)
 {
-MilkymistTMU2State *s = MILKYMIST_TMU2(dev);
-
-if (tmu2_glx_init(s)) {
-return 1;
-}
+MilkymistTMU2State *s = MILKYMIST_TMU2(obj);
+SysBusDevice *dev = SYS_BUS_DEVICE(obj);
 
 sysbus_init_irq(dev, &s->irq);
 
-memory_region_init_io(&s->regs_region, OBJECT(s), &tmu2_mmio_ops, s,
+memory_region_init_io(&s->regs_region, obj, &tmu2_mmio_ops, s,
 "milkymist-tmu2", R_MAX * 4);
 sysbus_init_mmio(dev, &s->regs_region);
+}
 
-return 0;
+static void milkymist_tmu2_realize(DeviceState *dev, Error **errp)
+{
+MilkymistTMU2State *s = MILKYMIST_TMU2(dev);
+
+if (tmu2_glx_init(s)) {
+error_setg(errp, "tmu2_glx_init failed.");
+}
 }
 
 static const VMStateDescription vmstate_milkymist_tmu2 = {
@@ -473,9 +477,8 @@ static const VMStateDescription vmstate_milkymist_tmu2 = {
 static void milkymist_tmu2_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = milkymist_tmu2_init;
+dc->realize = milkymist_tmu2_realize;
 dc->reset = milkymist_tmu2_reset;
 dc->vmsd = &vmstate_milkymist_tmu2;
 }
@@ -484,6 +487,7 @@ static const TypeInfo milkymist_tmu2_info = {
 .name  = TYPE_MILKYMIST_TMU2,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(MilkymistTMU2State),
+.instance_init = milkymist_tmu2_init,
 .class_init= milkymist_tmu2_class_init,
 };
 
-- 
2.1.4

[Qemu-devel] [PATCH RESEND 0/5] QOM'ify hw/display devices

2016-05-04 Thread xiaoqiang zhao

This patch set tries to QOM'ify the hw/display files; see the commit
messages for more details.

xiaoqiang zhao (5):
  hw/display: QOM'ify exynos4210_fimd.c
  hw/display: QOM'ify jazz_led.c
  hw/display: QOM'ify milkymist-tmu2.c
  hw/display: QOM'ify milkymist-vgafb.c
  hw/display: QOM'ify pl110.c

 hw/display/exynos4210_fimd.c | 19 ---
 hw/display/jazz_led.c| 18 +++---
 hw/display/milkymist-tmu2.c  | 24 ++--
 hw/display/milkymist-vgafb.c | 16 ++--
 hw/display/pl110.c   | 19 +--
 5 files changed, 56 insertions(+), 40 deletions(-)

-- 
2.1.4
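
The split that this series applies to each device can be illustrated outside QEMU. The following is a minimal stand-alone sketch, not QEMU's QOM API: the type ToyDevice, the functions toy_instance_init() and toy_realize(), and the backend_ok flag are all hypothetical names chosen for illustration. It assumes, as the commit messages suggest, that the instance-init step only sets up state and cannot fail, while the realize step does the work that may fail and reports an error to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; not a QEMU type. */
typedef struct ToyDevice {
    bool regions_ready;   /* set by the infallible init step */
    bool console_ready;   /* set by the fallible realize step */
    bool backend_ok;      /* stands in for an external resource */
} ToyDevice;

/* Phase 1: must not fail, only initialises fields (compare instance_init). */
static void toy_instance_init(ToyDevice *d)
{
    assert(d != NULL);
    d->regions_ready = true;
    d->console_ready = false;
    d->backend_ok = true;
}

/*
 * Phase 2: may fail (compare realize). On failure a message is stored
 * in *errp and false is returned; the device is left unrealized.
 */
static bool toy_realize(ToyDevice *d, const char **errp)
{
    assert(d != NULL && d->regions_ready);
    if (!d->backend_ok) {
        if (errp) {
            *errp = "backend init failed";
        }
        return false;
    }
    d->console_ready = true;
    return true;
}
```

Keeping the fallible work in the second phase means an object can always be created and inspected, and a failure surfaces only when the device is actually brought up.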

Re: [Qemu-devel] [RFC PATCH V3 4/4] colo-compare: add TCP, UDP, ICMP packet comparison

2016-05-04 Thread Zhang Chen




On 04/29/2016 03:44 AM, Dr. David Alan Gilbert wrote:

* Zhang Chen (zhangchen.f...@cn.fujitsu.com) wrote:

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
---
  net/colo-compare.c | 158 +++--
  1 file changed, 154 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 4b5a2d4..3dad461 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -385,9 +385,148 @@ static int colo_packet_compare(Packet *ppkt, Packet *spkt)
  }
  }
  
-static int colo_packet_compare_all(Packet *spkt, Packet *ppkt)

+/*
+ * called from the compare thread on the primary
+ * for compare tcp packet
+ * compare_tcp copied from Dr. David Alan Gilbert's branch
+ */
+static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
+{
+struct tcphdr *ptcp, *stcp;
+int res;
+char *sdebug, *ddebug;
+ptrdiff_t offset;
+
+trace_colo_compare_main("compare tcp");
+ptcp = (struct tcphdr *)ppkt->transport_layer;
+stcp = (struct tcphdr *)spkt->transport_layer;
+
+/* Initial is compare the whole packet */
+offset = 12; /* Hack! Skip virtio header */

So, when I post a set of patches and mark it saying that I know they've
got a lot of hacks in them, it's good for those reusing those patches
to check they need the hacks!

In my world I found I needed to skip over that header and I didn't understand
why; but hadn't figured out the details yet, and I'd added the 12 everywhere -
I think this is the only place you've got it, so it's almost certainly wrong.


In my testing that header was not present, so if I remove the
12-byte offset, is the function otherwise OK?




+if (ptcp->th_flags == stcp->th_flags &&
+((ptcp->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
+/* This is the syn/ack response from the guest to an incoming
+ * connection; the secondary won't have matched the sequence number
+ * Note: We should probably compare the IP level?
+ * Note hack: This already has the virtio offset
+ */
+offset = sizeof(ptcp->th_ack) + (void *)&ptcp->th_ack - ppkt->data;
+}
+/* Note - we want to compare everything as long as it's not the syn/ack? */
+assert(offset > 0);
+assert(spkt->size > offset);
+
+/* The 'identification' field in the IP header is *very* random
+ * it almost never matches.  Fudge this by ignoring differences in
+ * unfragmented packets; they'll normally sort themselves out if different
+ * anyway, and it should recover at the TCP level.
+ * An alternative would be to get both the primary and secondary to rewrite
+ * somehow; but that would need some sync traffic to sync the state
+ */
+if (ntohs(ppkt->ip->ip_off) & IP_DF) {
+spkt->ip->ip_id = ppkt->ip->ip_id;
+/* and the sum will be different if the IDs were different */
+spkt->ip->ip_sum = ppkt->ip->ip_sum;
+}
+
+res = memcmp(ppkt->data + offset, spkt->data + offset,
+ (spkt->size - offset));
+
+if (res && DEBUG_TCP_COMPARE) {
+sdebug = strdup(inet_ntoa(ppkt->ip->ip_src));
+ddebug = strdup(inet_ntoa(ppkt->ip->ip_dst));
+fprintf(stderr, "%s: src/dst: %s/%s offset=%zd p: seq/ack=%u/%u"
+" s: seq/ack=%u/%u res=%d flags=%x/%x\n", __func__,
+   sdebug, ddebug, offset,
+   ntohl(ptcp->th_seq), ntohl(ptcp->th_ack),
+   ntohl(stcp->th_seq), ntohl(stcp->th_ack),
+   res, ptcp->th_flags, stcp->th_flags);
+if (res && (ptcp->th_seq == stcp->th_seq)) {
+trace_colo_compare_with_int("Primary len", ppkt->size);
+colo_dump_packet(ppkt);
+trace_colo_compare_with_int("Secondary len", spkt->size);
+colo_dump_packet(spkt);
+}

Try and use meaningful tracing for this - don't use a 'compare_with_int'
trace; but use a name that says what you're doing - for example
trace_colo_tcp_miscompare ; that way if you're running COLO and just
want to see why you're getting so many miscompares, you can look
at this without turning on all the rest of the debug.


OK, I will fix this in the next version.



Also, in my version instead of using a DEBUG_TCP macro, I again used
the trace system, so, my code here was:

 if (trace_event_get_state(TRACE_COLO_PROXY_MISCOMPARE) && res) {

 that means you can switch it on and off at runtime using the
trace system.  Then just as it's running I can get to the (qemu) prompt
and do:
trace-event colo_proxy_miscompare on

and see what's happening without recompiling.


OK, I will fix it.




+g_free(sdebug);
+g_free(ddebug);
+}
+
+return res;
+}
+
+/*
+ * called from the compare thread on the primary
+ * for compare udp packet
+ */
+static int colo_packet_compare_udp(Packet *spkt, Packet *ppkt)
+{
+int ret = 1;
+
+trace_colo_compare_main("compare udp");
+ret = colo_packet_c
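
The IP identification fudge in the quoted TCP comparison can be sketched on its own. This is a minimal illustration under stated assumptions, not the colo-compare code: SketchIpHdr is a hypothetical three-field stand-in for the real IP header, SKETCH_IP_DF assumes the usual 0x4000 value for the "Don't Fragment" flag, and the function names are invented. It shows the same idea as the patch: when the primary packet is unfragmented, copy its ID and checksum into the secondary packet so those fields cannot cause a miscompare.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* "Don't Fragment" flag in the IPv4 flags/fragment-offset field (host order). */
#define SKETCH_IP_DF 0x4000

/* Hypothetical subset of an IPv4 header; all fields are in network order. */
typedef struct SketchIpHdr {
    uint16_t ip_id;
    uint16_t ip_off;
    uint16_t ip_sum;
} SketchIpHdr;

/* ip_off_net is the 16-bit field as it appears on the wire. */
static bool sketch_ip_has_df(uint16_t ip_off_net)
{
    return (ntohs(ip_off_net) & SKETCH_IP_DF) != 0;
}

/* For unfragmented packets, make the secondary's ID and checksum match. */
static void sketch_mask_ip_id(const SketchIpHdr *p, SketchIpHdr *s)
{
    assert(p && s);
    if (sketch_ip_has_df(p->ip_off)) {
        s->ip_id = p->ip_id;
        /* The checksum differs whenever the IDs differed, so copy it too. */
        s->ip_sum = p->ip_sum;
    }
}
```

Note that this overwrites the secondary header in place, as the quoted patch does, so it is only suitable when the secondary packet is not needed unmodified afterwards.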

Re: [Qemu-devel] [PATCH 0/5] QOM'ify hw/display devices

2016-05-04 Thread xiaoqiang zhao




On 2016-05-04 22:26, Peter Maydell wrote:

On 24 March 2016 at 10:29, xiaoqiang zhao  wrote:

This patch set tries to QOM'ify the hw/display files; see the commit
messages for more details

xiaoqiang zhao (5):
   hw/display: QOM'ify exynos4210_fimd.c
   hw/display: QOM'ify jazz_led.c
   hw/display: QOM'ify milkymist-tmu2.c
   hw/display: QOM'ify milkymist-vgafb.c
   hw/display: QOM'ify pl110.c

Hi; I was going to review this series (apologies for taking so
long!), but looking at my email archive and the patchwork server
the patches in it seem a bit confused. I see seven patches, not five,
with rather odd patch number indications:
  1/5
  2/5
  3/6
  4/5
  5/6
  5/5
  6/6

(and 5/5 and 6/6 seem to be the same). Could you resend the
series with the correct patches in it, please?

thanks
-- PMM

Yes, there is a mistake; I will resend soon.

[Qemu-devel] [PATCH qemu] vfio: Fix 128 bit handling when deleting region

2016-05-04 Thread Alexey Kardashevskiy

7532d3cbf "vfio: Fix 128 bit handling" added support for 64bit IOMMU
memory regions when those are added to VFIO address space; however, the
removal code cannot cope with these, as int128_get64() will fail on
1<<64.

This copies 128bit handling from region_add() to region_del().

Since the only machine type which is actually going to use 64bit IOMMU
is pseries and it never really removes them (instead it will dynamically
add/remove subregions), this should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy 
---
 hw/vfio/common.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f27db36..fe5ec6a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -430,6 +430,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
 {
 VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 hwaddr iova, end;
+Int128 llend, llsize;
 int ret;
 
 if (vfio_listener_skipped_section(section)) {
@@ -468,21 +469,25 @@ static void vfio_listener_region_del(MemoryListener *listener,
 }
 
 iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
-end = (section->offset_within_address_space + int128_get64(section->size)) &
-  TARGET_PAGE_MASK;
+llend = int128_make64(section->offset_within_address_space);
+llend = int128_add(llend, section->size);
+llend = int128_and(llend, int128_exts64(TARGET_PAGE_MASK));
 
-if (iova >= end) {
+if (int128_ge(int128_make64(iova), llend)) {
 return;
 }
+end = int128_get64(int128_sub(llend, int128_one()));
 
-trace_vfio_listener_region_del(iova, end - 1);
+llsize = int128_sub(llend, int128_make64(iova));
 
-ret = vfio_dma_unmap(container, iova, end - iova);
+trace_vfio_listener_region_del(iova, end);
+
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
 memory_region_unref(section->mr);
 if (ret) {
 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%m)",
- container, iova, end - iova, ret);
+ container, iova, int128_get64(llsize), ret);
 }
 }
 
-- 
2.5.0.rc3
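
The reason the end address has to be computed in 128 bits can be shown with a small stand-alone sketch. It is not the QEMU code: it uses the GCC/Clang `unsigned __int128` extension in place of QEMU's Int128 helpers, and SKETCH_PAGE_SIZE and sketch_region_last() are hypothetical names. With 64-bit arithmetic, offset + size wraps to 0 for a region of size 2^64; computing the exclusive end in 128 bits and only converting the inclusive last byte (end - 1) back to 64 bits avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* compiler extension, not QEMU's Int128 */

#define SKETCH_PAGE_SIZE 4096u

/*
 * Compute the inclusive last byte of the page-aligned part of
 * [offset, offset + size), where size may be as large as 2^64.
 * Returns false if the aligned range is empty.
 */
static bool sketch_region_last(uint64_t offset, u128 size, uint64_t *last)
{
    const u128 mask = ~(u128)(SKETCH_PAGE_SIZE - 1);
    u128 start = ((u128)offset + SKETCH_PAGE_SIZE - 1) & mask; /* round up */
    u128 end = ((u128)offset + size) & mask;                   /* round down */

    /* The exclusive end may equal 2^64 but must not exceed it. */
    assert(end <= ((u128)1 << 64));

    if (start >= end) {
        return false;
    }
    /* end - 1 is at most 2^64 - 1, so it always fits in 64 bits. */
    *last = (uint64_t)(end - 1);
    return true;
}
```

For a region at offset 0 with size 2^64, the exclusive end does not fit in a uint64_t, while the inclusive last byte does; that is the same reason the patch subtracts one before calling int128_get64() on the end address.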

Re: [Qemu-devel] [PATCH 1/2] block: Invalidate all children

2016-05-04 Thread Fam Zheng

On Wed, 05/04 12:10, Kevin Wolf wrote:
> Am 19.04.2016 um 03:42 hat Fam Zheng geschrieben:
> > Currently we only recurse to bs->file, which will miss the children in 
> > quorum
> > and VMDK.
> > 
> > Recurse into the whole subtree to avoid that.
> > 
> > Signed-off-by: Fam Zheng 
> > ---
> >  block.c | 20 ++--
> >  1 file changed, 14 insertions(+), 6 deletions(-)
> > 
> > diff --git a/block.c b/block.c
> > index d4939b4..fa8b38f 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -3201,6 +3201,7 @@ void bdrv_init_with_whitelist(void)
> >  
> >  void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp)
> >  {
> > +BdrvChild *child;
> >  Error *local_err = NULL;
> >  int ret;
> >  
> > @@ -3215,13 +3216,20 @@ void bdrv_invalidate_cache(BlockDriverState *bs, 
> > Error **errp)
> >  
> >  if (bs->drv->bdrv_invalidate_cache) {
> >  bs->drv->bdrv_invalidate_cache(bs, &local_err);
> > -} else if (bs->file) {
> > -bdrv_invalidate_cache(bs->file->bs, &local_err);
> 
> The old behaviour was that we only recurse for bs->file if the block
> driver doesn't have its own implementation.
> 
> This means that in qcow2, for example, we call bdrv_invalidate_cache()
> explicitly for bs->file. If we can already invalidate it here, the call
> inside qcow2 and probably other drivers could go away.

Yes, will update. Thanks.

Fam

Re: [Qemu-devel] [PATCH 2/2] block: Inactivate all children

2016-05-04 Thread Fam Zheng

On Wed, 05/04 12:12, Kevin Wolf wrote:
> Am 19.04.2016 um 03:42 hat Fam Zheng geschrieben:
> > Currently we only inactivate the top BDS. Actually bdrv_inactivate
> > should be the opposite of bdrv_invalidate_cache.
> > 
> > Recurse into the whole subtree instead.
> > 
> > Signed-off-by: Fam Zheng 
> 
> Did you actually test this?
> 
> I would expect that bs->drv->bdrv_inactivate() fails now (as in
> assertion failure) if it has anything to flush to the image because
> bs->file has already be inactivated before. I think children need to be
> inactived after their parents.

OK, my test apparently failed to trigger that bdrv_pwritev() path. Good catch!

> 
> Nodes with multiple parents could actually become even more
> interesting...

I'll make it a two-pass recursion: one pass for calling drv->bdrv_inactivate and
the other for setting BDRV_O_INACTIVATE.

Fam
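
The two-pass ordering proposed above can be sketched on a toy tree. This is an illustration only, under the assumption that a node's flush writes to its children and therefore needs them to be active: SketchNode and the sketch_* functions are hypothetical and are not QEMU's BlockDriverState or block-layer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_CHILDREN 4

/* Hypothetical node in a block-graph-like tree. */
typedef struct SketchNode {
    struct SketchNode *children[SKETCH_MAX_CHILDREN];
    size_t nchildren;
    bool inactive;   /* stands in for the inactive flag */
    bool flushed;    /* set once the node has flushed to its children */
} SketchNode;

/* Pass 1: let every node flush while all of its children are still active. */
static void sketch_flush_all(SketchNode *n)
{
    for (size_t i = 0; i < n->nchildren; i++) {
        /* A flush writes to the children, so they must not be inactive yet. */
        assert(!n->children[i]->inactive);
    }
    n->flushed = true;
    for (size_t i = 0; i < n->nchildren; i++) {
        sketch_flush_all(n->children[i]);
    }
}

/* Pass 2: only now mark the whole subtree inactive. */
static void sketch_mark_inactive(SketchNode *n)
{
    n->inactive = true;
    for (size_t i = 0; i < n->nchildren; i++) {
        sketch_mark_inactive(n->children[i]);
    }
}

static void sketch_inactivate(SketchNode *root)
{
    assert(root != NULL);
    sketch_flush_all(root);
    sketch_mark_inactive(root);
}
```

Because no node is flagged until every node has flushed, a child shared by several parents is still writable when each parent flushes; in this sketch such a child is simply visited more than once.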

[Qemu-devel] [PATCH v6 07/20] scsi-disk: Switch to byte-based aio block access

2016-05-04 Thread Eric Blake

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

Signed-off-by: Eric Blake 
---
 hw/scsi/scsi-disk.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 1335392..5d98f7b 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -343,8 +343,9 @@ static void scsi_do_read(SCSIDiskReq *r, int ret)
 n = scsi_init_iovec(r, SCSI_DMA_BUF_SIZE);
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
  n * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
-r->req.aiocb = blk_aio_readv(s->qdev.conf.blk, r->sector, &r->qiov, n,
- scsi_read_complete, r);
+r->req.aiocb = blk_aio_preadv(s->qdev.conf.blk,
+  r->sector << BDRV_SECTOR_BITS, &r->qiov,
+  0, scsi_read_complete, r);
 }

 done:
@@ -504,7 +505,6 @@ static void scsi_write_data(SCSIRequest *req)
 {
 SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-uint32_t n;

 /* No data transfer may already be in progress */
 assert(r->req.aiocb == NULL);
@@ -544,11 +544,11 @@ static void scsi_write_data(SCSIRequest *req)
 r->req.aiocb = dma_blk_write(s->qdev.conf.blk, r->req.sg, r->sector,
  scsi_dma_complete, r);
 } else {
-n = r->qiov.size / 512;
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
- n * BDRV_SECTOR_SIZE, BLOCK_ACCT_WRITE);
-r->req.aiocb = blk_aio_writev(s->qdev.conf.blk, r->sector, &r->qiov, n,
-  scsi_write_complete, r);
+ r->qiov.size, BLOCK_ACCT_WRITE);
+r->req.aiocb = blk_aio_pwritev(s->qdev.conf.blk,
+   r->sector << BDRV_SECTOR_BITS, &r->qiov,
+   0, scsi_write_complete, r);
 }
 }

@@ -1730,13 +1730,11 @@ static void scsi_write_same_complete(void *opaque, int ret)
 if (data->iov.iov_len) {
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
  data->iov.iov_len, BLOCK_ACCT_WRITE);
-/* blk_aio_write doesn't like the qiov size being different from
- * nb_sectors, make sure they match.
- */
 qemu_iovec_init_external(&data->qiov, &data->iov, 1);
-r->req.aiocb = blk_aio_writev(s->qdev.conf.blk, data->sector,
-  &data->qiov, data->iov.iov_len / 512,
-  scsi_write_same_complete, data);
+r->req.aiocb = blk_aio_pwritev(s->qdev.conf.blk,
+   data->sector << BDRV_SECTOR_BITS,
+   &data->qiov, 0,
+   scsi_write_same_complete, data);
 return;
 }

@@ -1803,9 +1801,10 @@ static void scsi_disk_emulate_write_same(SCSIDiskReq *r, uint8_t *inbuf)
 scsi_req_ref(&r->req);
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
  data->iov.iov_len, BLOCK_ACCT_WRITE);
-r->req.aiocb = blk_aio_writev(s->qdev.conf.blk, data->sector,
-  &data->qiov, data->iov.iov_len / 512,
-  scsi_write_same_complete, data);
+r->req.aiocb = blk_aio_pwritev(s->qdev.conf.blk,
+   data->sector << BDRV_SECTOR_BITS,
+   &data->qiov, 0,
+   scsi_write_same_complete, data);
 }

 static void scsi_disk_emulate_write_data(SCSIRequest *req)
-- 
2.5.5
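
The unit conversion behind this patch can be written out as a small stand-alone sketch. It assumes 512-byte sectors, as the `/ 512` in the removed lines suggests; SKETCH_SECTOR_BITS and sketch_sectors_to_bytes() are hypothetical names, not QEMU definitions. A sector-based call takes a start sector and a sector count, while a byte-based call takes a byte offset and derives the length from the I/O vector.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SECTOR_BITS 9                           /* 512-byte sectors */
#define SKETCH_SECTOR_SIZE (1ULL << SKETCH_SECTOR_BITS)

/* Convert a (start sector, sector count) pair to (byte offset, byte length). */
static void sketch_sectors_to_bytes(uint64_t sector, uint32_t nb_sectors,
                                    int64_t *offset, uint64_t *bytes)
{
    /* The byte offset must still fit in a signed 64-bit value. */
    assert(sector <= ((uint64_t)INT64_MAX >> SKETCH_SECTOR_BITS));
    *offset = (int64_t)(sector << SKETCH_SECTOR_BITS);
    *bytes = (uint64_t)nb_sectors * SKETCH_SECTOR_SIZE;
}
```

In the patch the length is no longer passed at all, since the byte count is already carried by the qiov; only the offset needs the shift.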

[Qemu-devel] [PATCH v6 17/20] nbd: Switch to byte-based block access

2016-05-04 Thread Eric Blake

Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Signed-off-by: Eric Blake 
---
 qemu-nbd.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index c55b40f..c07ceef 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -159,12 +159,13 @@ static int find_partition(BlockBackend *blk, int partition,
   off_t *offset, off_t *size)
 {
 struct partition_record mbr[4];
-uint8_t data[512];
+uint8_t data[BDRV_SECTOR_SIZE];
 int i;
 int ext_partnum = 4;
 int ret;

-if ((ret = blk_read(blk, 0, data, 1)) < 0) {
+ret = blk_pread(blk, 0, data, sizeof(data));
+if (ret < 0) {
 error_report("error while reading: %s", strerror(-ret));
 exit(EXIT_FAILURE);
 }
@@ -182,10 +183,12 @@ static int find_partition(BlockBackend *blk, int partition,

 if (mbr[i].system == 0xF || mbr[i].system == 0x5) {
 struct partition_record ext[4];
-uint8_t data1[512];
+uint8_t data1[BDRV_SECTOR_SIZE];
 int j;

-if ((ret = blk_read(blk, mbr[i].start_sector_abs, data1, 1)) < 0) {
+ret = blk_pread(blk, mbr[i].start_sector_abs << BDRV_SECTOR_BITS,
+data1, sizeof(data1));
+if (ret < 0) {
 error_report("error while reading: %s", strerror(-ret));
 exit(EXIT_FAILURE);
 }
-- 
2.5.5

[Qemu-devel] [PATCH v6 16/20] atapi: Switch to byte-based block access

2016-05-04 Thread Eric Blake

Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Add new defines ATAPI_SECTOR_BITS and ATAPI_SECTOR_SIZE to
use anywhere we were previously scaling BDRV_SECTOR_* by 4,
for better legibility.

Signed-off-by: Eric Blake 

---
v4: add new defines for use in more places [jsnow]
---
 hw/ide/atapi.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/ide/atapi.c b/hw/ide/atapi.c
index 2bb606c..95056d9 100644
--- a/hw/ide/atapi.c
+++ b/hw/ide/atapi.c
@@ -28,6 +28,9 @@
 #include "hw/scsi/scsi.h"
 #include "sysemu/block-backend.h"

+#define ATAPI_SECTOR_BITS (2 + BDRV_SECTOR_BITS)
+#define ATAPI_SECTOR_SIZE (1 << ATAPI_SECTOR_BITS)
+
 static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret);

 static void padstr8(uint8_t *buf, int buf_size, const char *src)
@@ -111,7 +114,7 @@ cd_read_sector_sync(IDEState *s)
 {
 int ret;
 block_acct_start(blk_get_stats(s->blk), &s->acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+ ATAPI_SECTOR_SIZE, BLOCK_ACCT_READ);

 #ifdef DEBUG_IDE_ATAPI
 printf("cd_read_sector_sync: lba=%d\n", s->lba);
@@ -119,12 +122,12 @@ cd_read_sector_sync(IDEState *s)

 switch (s->cd_sector_size) {
 case 2048:
-ret = blk_read(s->blk, (int64_t)s->lba << 2,
-   s->io_buffer, 4);
+ret = blk_pread(s->blk, (int64_t)s->lba << ATAPI_SECTOR_BITS,
+s->io_buffer, ATAPI_SECTOR_SIZE);
 break;
 case 2352:
-ret = blk_read(s->blk, (int64_t)s->lba << 2,
-   s->io_buffer + 16, 4);
+ret = blk_pread(s->blk, (int64_t)s->lba << ATAPI_SECTOR_BITS,
+s->io_buffer + 16, ATAPI_SECTOR_SIZE);
 if (ret >= 0) {
 cd_data_to_raw(s->io_buffer, s->lba);
 }
@@ -182,7 +185,7 @@ static int cd_read_sector(IDEState *s)
 s->iov.iov_base = (s->cd_sector_size == 2352) ?
   s->io_buffer + 16 : s->io_buffer;

-s->iov.iov_len = 4 * BDRV_SECTOR_SIZE;
+s->iov.iov_len = ATAPI_SECTOR_SIZE;
 qemu_iovec_init_external(&s->qiov, &s->iov, 1);

 #ifdef DEBUG_IDE_ATAPI
@@ -190,7 +193,7 @@ static int cd_read_sector(IDEState *s)
 #endif

 block_acct_start(blk_get_stats(s->blk), &s->acct,
- 4 * BDRV_SECTOR_SIZE, BLOCK_ACCT_READ);
+ ATAPI_SECTOR_SIZE, BLOCK_ACCT_READ);

 ide_buffered_readv(s, (int64_t)s->lba << 2, &s->qiov, 4,
cd_read_sector_cb, s);
@@ -435,7 +438,7 @@ static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret)
 #endif

 s->bus->dma->iov.iov_base = (void *)(s->io_buffer + data_offset);
-s->bus->dma->iov.iov_len = n * 4 * 512;
+s->bus->dma->iov.iov_len = n * ATAPI_SECTOR_SIZE;
 qemu_iovec_init_external(&s->bus->dma->qiov, &s->bus->dma->iov, 1);

 s->bus->dma->aiocb = ide_buffered_readv(s, (int64_t)s->lba << 2,
-- 
2.5.5
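
As an illustration of the scaling used in the ATAPI patch above, here is a minimal standalone sketch. It assumes 512-byte block-layer sectors and redefines BDRV_SECTOR_BITS locally so that it compiles outside the QEMU tree; the two helper functions are hypothetical names and do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for QEMU's own definition (assumed: 512-byte sectors). */
#define BDRV_SECTOR_BITS 9

/* As introduced by the patch: one CD sector spans four 512-byte sectors. */
#define ATAPI_SECTOR_BITS (2 + BDRV_SECTOR_BITS)
#define ATAPI_SECTOR_SIZE (1 << ATAPI_SECTOR_BITS)

/* Hypothetical helper: byte offset of a CD logical block address,
 * as needed by a byte-based read. */
static int64_t atapi_lba_to_offset(int lba)
{
    assert(lba >= 0);
    return (int64_t)lba << ATAPI_SECTOR_BITS;
}

/* Hypothetical helper: the older sector-based form, which yields an
 * index in 512-byte units rather than a byte offset. */
static int64_t atapi_lba_to_bdrv_sector(int lba)
{
    assert(lba >= 0);
    return (int64_t)lba << 2;
}
```

Both forms address the same data: the byte offset equals the 512-byte sector index multiplied by 512, which is why the shift count grows from 2 to ATAPI_SECTOR_BITS when moving to a byte-based call.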

[Qemu-devel] [PATCH v6 13/20] pflash: Switch to byte-based block access

2016-05-04 Thread Eric Blake

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake 
---
 hw/block/pflash_cfi01.c | 12 ++--
 hw/block/pflash_cfi02.c | 12 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 106a775..3a1f85d 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -413,11 +413,11 @@ static void pflash_update(pflash_t *pfl, int offset,
 int offset_end;
 if (pfl->blk) {
 offset_end = offset + size;
-/* round to sectors */
-offset = offset >> 9;
-offset_end = (offset_end + 511) >> 9;
-blk_write(pfl->blk, offset, pfl->storage + (offset << 9),
-  offset_end - offset);
+/* widen to sector boundaries */
+offset = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+offset_end = QEMU_ALIGN_UP(offset_end, BDRV_SECTOR_SIZE);
+blk_pwrite(pfl->blk, offset, pfl->storage + offset,
+   offset_end - offset, 0);
 }
 }

@@ -739,7 +739,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)

 if (pfl->blk) {
 /* read the initial flash content */
-ret = blk_read(pfl->blk, 0, pfl->storage, total_len >> 9);
+ret = blk_pread(pfl->blk, 0, pfl->storage, total_len);

 if (ret < 0) {
 vmstate_unregister_ram(&pfl->mem, DEVICE(pfl));
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index b13172c..5f10610 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -253,11 +253,11 @@ static void pflash_update(pflash_t *pfl, int offset,
 int offset_end;
 if (pfl->blk) {
 offset_end = offset + size;
-/* round to sectors */
-offset = offset >> 9;
-offset_end = (offset_end + 511) >> 9;
-blk_write(pfl->blk, offset, pfl->storage + (offset << 9),
-  offset_end - offset);
+/* widen to sector boundaries */
+offset = QEMU_ALIGN_DOWN(offset, BDRV_SECTOR_SIZE);
+offset_end = QEMU_ALIGN_UP(offset_end, BDRV_SECTOR_SIZE);
+blk_pwrite(pfl->blk, offset, pfl->storage + offset,
+   offset_end - offset, 0);
 }
 }

@@ -622,7 +622,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error **errp)
 pfl->chip_len = chip_len;
 if (pfl->blk) {
 /* read the initial flash content */
-ret = blk_read(pfl->blk, 0, pfl->storage, chip_len >> 9);
+ret = blk_pread(pfl->blk, 0, pfl->storage, chip_len);
 if (ret < 0) {
 vmstate_unregister_ram(&pfl->orig_mem, DEVICE(pfl));
 error_setg(errp, "failed to read the initial flash content");
-- 
2.5.5
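
The pflash change above swaps shift-based sector rounding for byte-based alignment. The following sketch compares the two on plain integers. ALIGN_DOWN and ALIGN_UP are local stand-ins assumed to behave like QEMU_ALIGN_DOWN and QEMU_ALIGN_UP, SECTOR_SIZE stands in for BDRV_SECTOR_SIZE, and struct byte_range and both functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* stand-in for BDRV_SECTOR_SIZE */

/* Local stand-ins, assumed equivalent to QEMU_ALIGN_DOWN/QEMU_ALIGN_UP. */
#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

/* Hypothetical result type: a byte range [start, start + len). */
struct byte_range {
    int start;
    int len;
};

/* Byte-based form: widen [offset, offset + size) to sector boundaries. */
static struct byte_range widen_to_sectors(int offset, int size)
{
    struct byte_range r;
    int end = ALIGN_UP(offset + size, SECTOR_SIZE);

    r.start = ALIGN_DOWN(offset, SECTOR_SIZE);
    r.len = end - r.start;
    return r;
}

/* Older sector-based form, converted back to bytes for comparison. */
static struct byte_range widen_with_shifts(int offset, int size)
{
    int first = offset >> 9;
    int last = (offset + size + 511) >> 9;
    struct byte_range r = { first << 9, (last - first) << 9 };

    return r;
}
```

For non-negative offsets the two functions return the same range; the byte-based form simply avoids converting to sector units and back.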

[Qemu-devel] [PATCH v6 20/20] block: Kill unused sector-based blk_* functions

2016-05-04 Thread Eric Blake

Now that there are no remaining clients, we can drop the
sector-based blk_read(), blk_write(), blk_aio_readv(), and
blk_aio_writev().  Sadly, there are still remaining
sector-based interfaces, such as blk_*discard(), or
blk_write_compressed(); those will have to wait for another
day.

Signed-off-by: Eric Blake 
---
 include/sysemu/block-backend.h | 10 -
 block/block-backend.c  | 51 --
 2 files changed, 61 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 8d7839c..457efd9 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -90,12 +90,8 @@ void blk_attach_dev_nofail(BlockBackend *blk, void *dev);
 void blk_detach_dev(BlockBackend *blk, void *dev);
 void *blk_get_attached_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
-int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors);
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
   int count);
-int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
-  int nb_sectors);
 int blk_write_zeroes(BlockBackend *blk, int64_t offset,
  int count, BdrvRequestFlags flags);
 BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t offset,
@@ -107,15 +103,9 @@ int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
 int64_t blk_getlength(BlockBackend *blk);
 void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr);
 int64_t blk_nb_sectors(BlockBackend *blk);
-BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t sector_num,
-  QEMUIOVector *iov, int nb_sectors,
-  BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_preadv(BlockBackend *blk, int64_t offset,
QEMUIOVector *iov, BdrvRequestFlags flags,
BlockCompletionFunc *cb, void *opaque);
-BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
-   QEMUIOVector *iov, int nb_sectors,
-   BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
 QEMUIOVector *iov, BdrvRequestFlags flags,
 BlockCompletionFunc *cb, void *opaque);
diff --git a/block/block-backend.c b/block/block-backend.c
index 104852d..0551be4 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -772,24 +772,6 @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
 return rwco.ret;
 }

-static int blk_rw(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
-  int nb_sectors, CoroutineEntry co_entry,
-  BdrvRequestFlags flags)
-{
-if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return -EINVAL;
-}
-
-return blk_prw(blk, sector_num << BDRV_SECTOR_BITS, buf,
-   nb_sectors << BDRV_SECTOR_BITS, co_entry, flags);
-}
-
-int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors)
-{
-return blk_rw(blk, sector_num, buf, nb_sectors, blk_read_entry, 0);
-}
-
 int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
   int count)
 {
@@ -807,13 +789,6 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
 return ret;
 }

-int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
-  int nb_sectors)
-{
-return blk_rw(blk, sector_num, (uint8_t*) buf, nb_sectors,
-  blk_write_entry, 0);
-}
-
 int blk_write_zeroes(BlockBackend *blk, int64_t offset,
  int count, BdrvRequestFlags flags)
 {
@@ -985,19 +960,6 @@ int64_t blk_nb_sectors(BlockBackend *blk)
 return bdrv_nb_sectors(blk_bs(blk));
 }

-BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t sector_num,
-  QEMUIOVector *iov, int nb_sectors,
-  BlockCompletionFunc *cb, void *opaque)
-{
-if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return blk_abort_aio_request(blk, cb, opaque, -EINVAL);
-}
-
-assert(nb_sectors << BDRV_SECTOR_BITS == iov->size);
-return blk_aio_prwv(blk, sector_num << BDRV_SECTOR_BITS, iov->size, iov,
-blk_aio_read_entry, 0, cb, opaque);
-}
-
 BlockAIOCB *blk_aio_preadv(BlockBackend *blk, int64_t offset,
QEMUIOVector *iov, BdrvRequestFlags flags,
BlockCompletionFunc *cb, void *opaque)
@@ -1006,19 +968,6 @@ BlockAIOCB *blk_aio_preadv(BlockBackend *blk, int64_t offset,
 blk_aio_read_entry, flags, cb, opaque);
 }

-BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
-

[Qemu-devel] [PATCH v6 08/20] virtio: Switch to byte-based aio block access

2016-05-04 Thread Eric Blake

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

The trace is modified at the same time, and nb_sectors is now
unused.  Fix a comment typo while in the vicinity.

Signed-off-by: Eric Blake 
---
 hw/block/virtio-blk.c | 18 --
 trace-events  |  2 +-
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 3f88f8c..284e646 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -322,7 +322,6 @@ static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb,
 {
 QEMUIOVector *qiov = &mrb->reqs[start]->qiov;
 int64_t sector_num = mrb->reqs[start]->sector_num;
-int nb_sectors = mrb->reqs[start]->qiov.size / BDRV_SECTOR_SIZE;
 bool is_write = mrb->is_write;

 if (num_reqs > 1) {
@@ -331,7 +330,7 @@ static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb,
 int tmp_niov = qiov->niov;

 /* mrb->reqs[start]->qiov was initialized from external so we can't
- * modifiy it here. We need to initialize it locally and then add the
+ * modify it here. We need to initialize it locally and then add the
  * external iovecs. */
 qemu_iovec_init(qiov, niov);

@@ -343,23 +342,22 @@ static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb,
 qemu_iovec_concat(qiov, &mrb->reqs[i]->qiov, 0,
   mrb->reqs[i]->qiov.size);
 mrb->reqs[i - 1]->mr_next = mrb->reqs[i];
-nb_sectors += mrb->reqs[i]->qiov.size / BDRV_SECTOR_SIZE;
 }
-assert(nb_sectors == qiov->size / BDRV_SECTOR_SIZE);

-trace_virtio_blk_submit_multireq(mrb, start, num_reqs, sector_num,
- nb_sectors, is_write);
+trace_virtio_blk_submit_multireq(mrb, start, num_reqs,
+ sector_num << BDRV_SECTOR_BITS,
+ qiov->size, is_write);
 block_acct_merge_done(blk_get_stats(blk),
   is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ,
   num_reqs - 1);
 }

 if (is_write) {
-blk_aio_writev(blk, sector_num, qiov, nb_sectors,
+blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
+virtio_blk_rw_complete, mrb->reqs[start]);
+} else {
+blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0,
virtio_blk_rw_complete, mrb->reqs[start]);
-} else {
-blk_aio_readv(blk, sector_num, qiov, nb_sectors,
-  virtio_blk_rw_complete, mrb->reqs[start]);
 }
 }

diff --git a/trace-events b/trace-events
index b4acd2a..b588091 100644
--- a/trace-events
+++ b/trace-events
@@ -118,7 +118,7 @@ virtio_blk_req_complete(void *req, int status) "req %p status %d"
 virtio_blk_rw_complete(void *req, int ret) "req %p ret %d"
 virtio_blk_handle_write(void *req, uint64_t sector, size_t nsectors) "req %p sector %"PRIu64" nsectors %zu"
 virtio_blk_handle_read(void *req, uint64_t sector, size_t nsectors) "req %p sector %"PRIu64" nsectors %zu"
-virtio_blk_submit_multireq(void *mrb, int start, int num_reqs, uint64_t sector, size_t nsectors, bool is_write) "mrb %p start %d num_reqs %d sector %"PRIu64" nsectors %zu is_write %d"
+virtio_blk_submit_multireq(void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d"

 # hw/block/dataplane/virtio-blk.c
 virtio_blk_data_plane_start(void *s) "dataplane %p"
-- 
2.5.5

[Qemu-devel] [PATCH v6 19/20] qemu-io: Switch to byte-based block access

2016-05-04 Thread Eric Blake

qemu-io is the last user of several sector-based interfaces.
This patch upgrades to the new interfaces under the hood,
then deletes the resulting dead code.  Note that for maximum
back-compat, while the -p option is no longer required to get
blk_pread(), it is still needed to allow for unaligned access;
this is because qemu-iotest 23 relies on qemu-io rejecting
unaligned accesses without -p.  A later patch may clean up the
interface to be more user-friendly, but it's better to separate
what's done under the hood from what the user sees.

Signed-off-by: Eric Blake 
---
 qemu-io-cmds.c | 62 ++
 1 file changed, 10 insertions(+), 52 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index a3e3982..0bbbc72 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -419,40 +419,6 @@ fail:
 return buf;
 }

-static int do_read(BlockBackend *blk, char *buf, int64_t offset, int64_t count,
-   int64_t *total)
-{
-int ret;
-
-if (count >> 9 > INT_MAX) {
-return -ERANGE;
-}
-
-ret = blk_read(blk, offset >> 9, (uint8_t *)buf, count >> 9);
-if (ret < 0) {
-return ret;
-}
-*total = count;
-return 1;
-}
-
-static int do_write(BlockBackend *blk, char *buf, int64_t offset, int64_t count,
-int64_t *total)
-{
-int ret;
-
-if (count >> 9 > INT_MAX) {
-return -ERANGE;
-}
-
-ret = blk_write(blk, offset >> 9, (uint8_t *)buf, count >> 9);
-if (ret < 0) {
-return ret;
-}
-*total = count;
-return 1;
-}
-
 static int do_pread(BlockBackend *blk, char *buf, int64_t offset,
 int64_t count, int64_t *total)
 {
@@ -588,8 +554,7 @@ static int do_aio_readv(BlockBackend *blk, QEMUIOVector *qiov,
 {
 int async_ret = NOT_DONE;

-blk_aio_readv(blk, offset >> 9, qiov, qiov->size >> 9,
-  aio_rw_done, &async_ret);
+blk_aio_preadv(blk, offset, qiov, 0, aio_rw_done, &async_ret);
 while (async_ret == NOT_DONE) {
 main_loop_wait(false);
 }
@@ -603,8 +568,7 @@ static int do_aio_writev(BlockBackend *blk, QEMUIOVector *qiov,
 {
 int async_ret = NOT_DONE;

-blk_aio_writev(blk, offset >> 9, qiov, qiov->size >> 9,
-   aio_rw_done, &async_ret);
+blk_aio_pwritev(blk, offset, qiov, 0, aio_rw_done, &async_ret);
 while (async_ret == NOT_DONE) {
 main_loop_wait(false);
 }
@@ -670,7 +634,7 @@ static void read_help(void)
 " -b, -- read from the VM state rather than the virtual disk\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -l, -- length for pattern verification (only with -P)\n"
-" -p, -- use blk_pread to read the file\n"
+" -p, -- allow unaligned access\n"
 " -P, -- use a pattern to verify read data\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
 " -s, -- start offset for pattern verification (only with -P)\n"
@@ -805,12 +769,10 @@ static int read_f(BlockBackend *blk, int argc, char **argv)
 buf = qemu_io_alloc(blk, count, 0xab);

 gettimeofday(&t1, NULL);
-if (pflag) {
-cnt = do_pread(blk, buf, offset, count, &total);
-} else if (bflag) {
+if (bflag) {
 cnt = do_load_vmstate(blk, buf, offset, count, &total);
 } else {
-cnt = do_read(blk, buf, offset, count, &total);
+cnt = do_pread(blk, buf, offset, count, &total);
 }
 gettimeofday(&t2, NULL);

@@ -990,7 +952,7 @@ static void write_help(void)
 " filled with a set pattern (0xcdcdcdcd).\n"
 " -b, -- write to the VM state rather than the virtual disk\n"
 " -c, -- write compressed data with blk_write_compressed\n"
-" -p, -- use blk_pwrite to write the file\n"
+" -p, -- allow unaligned access\n"
 " -P, -- use different pattern to fill file\n"
 " -C, -- report statistics in a machine parsable format\n"
 " -q, -- quiet mode, do not show I/O statistics\n"
@@ -1106,16 +1068,14 @@ static int write_f(BlockBackend *blk, int argc, char **argv)
 }

 gettimeofday(&t1, NULL);
-if (pflag) {
-cnt = do_pwrite(blk, buf, offset, count, &total);
-} else if (bflag) {
+if (bflag) {
 cnt = do_save_vmstate(blk, buf, offset, count, &total);
 } else if (zflag) {
 cnt = do_co_write_zeroes(blk, offset, count, &total);
 } else if (cflag) {
 cnt = do_write_compressed(blk, buf, offset, count, &total);
 } else {
-cnt = do_write(blk, buf, offset, count, &total);
+cnt = do_pwrite(blk, buf, offset, count, &total);
 }
 gettimeofday(&t2, NULL);

@@ -1592,8 +1552,7 @@ static int aio_read_f(BlockBackend *blk, int argc, char **argv)
 gettimeofday(&ctx->t1, NULL);
 block_acct_start(blk_get_stats(blk), &ctx->acct, ctx->qiov.size,
  BLOCK_ACCT_READ);
-blk_aio_readv(blk, ctx->offset >> 9, &ctx->qiov,
-  ctx->qiov.size >> 9, aio_read_done, ctx);
+blk_aio_preadv(blk, ctx->offset, &ctx->qiov, 0, aio_read_done, ctx);
 return 0

[Qemu-devel] [PATCH v6 18/20] qemu-img: Switch to byte-based block access

2016-05-04 Thread Eric Blake

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake 
---
 qemu-img.c | 28 +++-
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 76430a8..491a460 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1088,7 +1088,8 @@ static int check_empty_sectors(BlockBackend *blk, int64_t sect_num,
uint8_t *buffer, bool quiet)
 {
 int pnum, ret = 0;
-ret = blk_read(blk, sect_num, buffer, sect_count);
+ret = blk_pread(blk, sect_num << BDRV_SECTOR_BITS, buffer,
+sect_count << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("Error while reading offset %" PRId64 " of %s: %s",
  sectors_to_bytes(sect_num), filename, strerror(-ret));
@@ -1301,7 +1302,8 @@ static int img_compare(int argc, char **argv)
 nb_sectors = MIN(pnum1, pnum2);
 } else if (allocated1 == allocated2) {
 if (allocated1) {
-ret = blk_read(blk1, sector_num, buf1, nb_sectors);
+ret = blk_pread(blk1, sector_num << BDRV_SECTOR_BITS, buf1,
+nb_sectors << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("Error while reading offset %" PRId64 " of %s:"
  " %s", sectors_to_bytes(sector_num), filename1,
@@ -1309,7 +1311,8 @@ static int img_compare(int argc, char **argv)
 ret = 4;
 goto out;
 }
-ret = blk_read(blk2, sector_num, buf2, nb_sectors);
+ret = blk_pread(blk2, sector_num << BDRV_SECTOR_BITS, buf2,
+nb_sectors << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("Error while reading offset %" PRId64
  " of %s: %s", sectors_to_bytes(sector_num),
@@ -1522,7 +1525,9 @@ static int convert_read(ImgConvertState *s, int64_t sector_num, int nb_sectors,
 bs_sectors = s->src_sectors[s->src_cur];

 n = MIN(nb_sectors, bs_sectors - (sector_num - s->src_cur_offset));
-ret = blk_read(blk, sector_num - s->src_cur_offset, buf, n);
+ret = blk_pread(blk,
+(sector_num - s->src_cur_offset) << BDRV_SECTOR_BITS,
+buf, n << BDRV_SECTOR_BITS);
 if (ret < 0) {
 return ret;
 }
@@ -1577,7 +1582,8 @@ static int convert_write(ImgConvertState *s, int64_t sector_num, int nb_sectors,
 if (!s->min_sparse ||
 is_allocated_sectors_min(buf, n, &n, s->min_sparse))
 {
-ret = blk_write(s->target, sector_num, buf, n);
+ret = blk_pwrite(s->target, sector_num << BDRV_SECTOR_BITS,
+ buf, n << BDRV_SECTOR_BITS, 0);
 if (ret < 0) {
 return ret;
 }
@@ -3024,7 +3030,8 @@ static int img_rebase(int argc, char **argv)
 n = old_backing_num_sectors - sector;
 }

-ret = blk_read(blk_old_backing, sector, buf_old, n);
+ret = blk_pread(blk_old_backing, sector << BDRV_SECTOR_BITS,
+buf_old, n << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("error while reading from old backing file");
 goto out;
@@ -3038,7 +3045,8 @@ static int img_rebase(int argc, char **argv)
 n = new_backing_num_sectors - sector;
 }

-ret = blk_read(blk_new_backing, sector, buf_new, n);
+ret = blk_pread(blk_new_backing, sector << BDRV_SECTOR_BITS,
+buf_new, n << BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_report("error while reading from new backing file");
 goto out;
@@ -3054,8 +3062,10 @@ static int img_rebase(int argc, char **argv)
 if (compare_sectors(buf_old + written * 512,
 buf_new + written * 512, n - written, &pnum))
 {
-ret = blk_write(blk, sector + written,
-buf_old + written * 512, pnum);
+ret = blk_pwrite(blk,
+ (sector + written) << BDRV_SECTOR_BITS,
+ buf_old + written * 512,
+ pnum << BDRV_SECTOR_BITS, 0);
 if (ret < 0) {
 error_report("Error while writing to COW image: %s",
 strerror(-ret));
-- 
2.5.5

[Qemu-devel] [PATCH v6 05/20] block: Introduce byte-based aio read/write

2016-05-04 Thread Eric Blake

blk_aio_readv() and blk_aio_writev() are annoying in that they
can't access sub-sector granularity, and cannot pass flags.
Also, they require the caller to pass redundant information
about the size of the I/O.

Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix
the flaws. The next few patches will upgrade callers, then
finally delete the old interfaces.

Signed-off-by: Eric Blake 
---
 include/sysemu/block-backend.h |  6 ++
 block/block-backend.c  | 16 
 2 files changed, 22 insertions(+)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 851376b..8d7839c 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -110,9 +110,15 @@ int64_t blk_nb_sectors(BlockBackend *blk);
 BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t sector_num,
   QEMUIOVector *iov, int nb_sectors,
   BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_preadv(BlockBackend *blk, int64_t offset,
+   QEMUIOVector *iov, BdrvRequestFlags flags,
+   BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque);
+BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
+QEMUIOVector *iov, BdrvRequestFlags flags,
+BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *blk_aio_discard(BlockBackend *blk,
diff --git a/block/block-backend.c b/block/block-backend.c
index f8f88a6..104852d 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -998,6 +998,14 @@ BlockAIOCB *blk_aio_readv(BlockBackend *blk, int64_t sector_num,
 blk_aio_read_entry, 0, cb, opaque);
 }

+BlockAIOCB *blk_aio_preadv(BlockBackend *blk, int64_t offset,
+   QEMUIOVector *iov, BdrvRequestFlags flags,
+   BlockCompletionFunc *cb, void *opaque)
+{
+return blk_aio_prwv(blk, offset, iov->size, iov,
+blk_aio_read_entry, flags, cb, opaque);
+}
+
 BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
BlockCompletionFunc *cb, void *opaque)
@@ -1011,6 +1019,14 @@ BlockAIOCB *blk_aio_writev(BlockBackend *blk, int64_t sector_num,
 blk_aio_write_entry, 0, cb, opaque);
 }

+BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t offset,
+QEMUIOVector *iov, BdrvRequestFlags flags,
+BlockCompletionFunc *cb, void *opaque)
+{
+return blk_aio_prwv(blk, offset, iov->size, iov,
+blk_aio_write_entry, flags, cb, opaque);
+}
+
 BlockAIOCB *blk_aio_flush(BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque)
 {
-- 
2.5.5
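
The redundancy mentioned in the commit message above can be sketched as follows. With the sector-based calls the caller passes nb_sectors even though the I/O vector already knows its own size, and the two must agree; the byte-based calls take the size from the vector alone. The structs and function names below are hypothetical simplifications, not QEMU types, and BDRV_SECTOR_BITS is redefined locally on the assumption of 512-byte sectors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9 /* stand-in, assumed 512-byte sectors */

/* Hypothetical sector-based request: length is stated twice. */
struct sector_req {
    int64_t sector_num;
    int nb_sectors;
    size_t iov_size; /* total bytes described by the I/O vector */
};

/* Hypothetical byte-based request: length comes from the vector only. */
struct byte_req {
    int64_t offset;
    size_t bytes;
};

/* The two length fields of a sector-based request must match. */
static bool sector_req_consistent(const struct sector_req *r)
{
    return r->nb_sectors >= 0 &&
           ((size_t)r->nb_sectors << BDRV_SECTOR_BITS) == r->iov_size;
}

/* Convert to the byte-based form; nb_sectors is no longer needed. */
static struct byte_req to_byte_req(const struct sector_req *r)
{
    struct byte_req b;

    assert(sector_req_consistent(r));
    b.offset = r->sector_num * ((int64_t)1 << BDRV_SECTOR_BITS);
    b.bytes = r->iov_size;
    return b;
}
```

A byte-based request can also describe lengths that are not a multiple of 512, which the sector-based form cannot express at all.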

[Qemu-devel] [PATCH v6 10/20] fdc: Switch to byte-based block access

2016-05-04 Thread Eric Blake

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Signed-off-by: Eric Blake 
---
 hw/block/fdc.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 3722275..f73af7d 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -223,6 +223,13 @@ static int fd_sector(FDrive *drv)
   NUM_SIDES(drv));
 }

+/* Returns current position, in bytes, for given drive */
+static int fd_offset(FDrive *drv)
+{
+g_assert(fd_sector(drv) < INT_MAX >> BDRV_SECTOR_BITS);
+return fd_sector(drv) << BDRV_SECTOR_BITS;
+}
+
 /* Seek to a new position:
  * returns 0 if already on right track
  * returns 1 if track changed
@@ -1629,8 +1636,8 @@ static int fdctrl_transfer_handler (void *opaque, int nchan,
 if (fdctrl->data_dir != FD_DIR_WRITE ||
 len < FD_SECTOR_LEN || rel_pos != 0) {
 /* READ & SCAN commands and realign to a sector for WRITE */
-if (blk_read(cur_drv->blk, fd_sector(cur_drv),
- fdctrl->fifo, 1) < 0) {
+if (blk_pread(cur_drv->blk, fd_offset(cur_drv),
+  fdctrl->fifo, BDRV_SECTOR_SIZE) < 0) {
 FLOPPY_DPRINTF("Floppy: error getting sector %d\n",
fd_sector(cur_drv));
 /* Sure, image size is too small... */
@@ -1657,8 +1664,8 @@ static int fdctrl_transfer_handler (void *opaque, int nchan,

 k->read_memory(fdctrl->dma, nchan, fdctrl->fifo + rel_pos,
fdctrl->data_pos, len);
-if (blk_write(cur_drv->blk, fd_sector(cur_drv),
-  fdctrl->fifo, 1) < 0) {
+if (blk_pwrite(cur_drv->blk, fd_offset(cur_drv),
+   fdctrl->fifo, BDRV_SECTOR_SIZE, 0) < 0) {
 FLOPPY_DPRINTF("error writing sector %d\n",
fd_sector(cur_drv));
 fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM | FD_SR0_SEEK, 0x00, 0x00);
@@ -1741,7 +1748,8 @@ static uint32_t fdctrl_read_data(FDCtrl *fdctrl)
fd_sector(cur_drv));
 return 0;
 }
-if (blk_read(cur_drv->blk, fd_sector(cur_drv), fdctrl->fifo, 1)
+if (blk_pread(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+  BDRV_SECTOR_SIZE)
 < 0) {
 FLOPPY_DPRINTF("error getting sector %d\n",
fd_sector(cur_drv));
@@ -1820,7 +1828,8 @@ static void fdctrl_format_sector(FDCtrl *fdctrl)
 }
 memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
 if (cur_drv->blk == NULL ||
-blk_write(cur_drv->blk, fd_sector(cur_drv), fdctrl->fifo, 1) < 0) {
+blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 FLOPPY_DPRINTF("error formatting sector %d\n", fd_sector(cur_drv));
 fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM | FD_SR0_SEEK, 0x00, 0x00);
 } else {
@@ -2243,8 +2252,8 @@ static void fdctrl_write_data(FDCtrl *fdctrl, uint32_t value)
 if (pos == FD_SECTOR_LEN - 1 ||
 fdctrl->data_pos == fdctrl->data_len) {
 cur_drv = get_cur_drv(fdctrl);
-if (blk_write(cur_drv->blk, fd_sector(cur_drv), fdctrl->fifo, 1)
-< 0) {
+if (blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 FLOPPY_DPRINTF("error writing sector %d\n",
fd_sector(cur_drv));
 break;
-- 
2.5.5
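
The fd_offset() helper added by the fdc patch above guards an int-sized shift against overflow. A minimal standalone sketch of the same guard is shown below. The function names are hypothetical, BDRV_SECTOR_BITS is redefined locally, and the check for negative sectors is an addition that the patch itself does not make.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define BDRV_SECTOR_BITS 9 /* stand-in, assumed 512-byte sectors */

/* True when sector << BDRV_SECTOR_BITS is accepted by the guard.
 * The upper bound mirrors the strict '<' used in the patch; the
 * lower bound is an extra assumption made for this sketch. */
static bool sector_offset_fits_int(int sector)
{
    return sector >= 0 && sector < (INT_MAX >> BDRV_SECTOR_BITS);
}

/* Byte offset of a sector, asserting that the result fits in an int. */
static int sector_to_offset(int sector)
{
    assert(sector_offset_fits_int(sector));
    return sector << BDRV_SECTOR_BITS;
}
```

A 1.44 MB floppy has 2880 sectors, so every offset on such an image sits far below the limit; the assertion documents the assumption rather than handling an expected case.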

[Qemu-devel] [PATCH v6 02/20] block: Drop private ioctl-only members of BlockRequest

2016-05-04 Thread Eric Blake

I was thrown by the fact that the public type BlockRequest had
an anonymous union, but no obvious discriminator.  Turns out
that the only client of the second branch of the union was code
internal to io.c, and that with a slight abuse of QEMUIOVector*
to pass a void* pointer, we can make the public interface less
confusing.

(Yes, I know that strict C doesn't guarantee that you can cast
void* to the wrong type and then back to void* - it only
guarantees the round trip from the original pointer type to
void* and back to the original type - but the qemu code base
already assumes in other places that all pointers are
interchangeable in representation).

Signed-off-by: Eric Blake 
---
 include/block/block.h | 16 
 block/io.c|  8 +---
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 0e8b4d1..754aae3 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -334,18 +334,10 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb);

 typedef struct BlockRequest {
 /* Fields to be filled by multiwrite caller */
-union {
-struct {
-int64_t sector;
-int nb_sectors;
-int flags;
-QEMUIOVector *qiov;
-};
-struct {
-int req;
-void *buf;
-};
-};
+int64_t sector;
+int nb_sectors;
+int flags;
+QEMUIOVector *qiov;
 BlockCompletionFunc *cb;
 void *opaque;

diff --git a/block/io.c b/block/io.c
index 0db1146..f15c0f4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2614,7 +2614,7 @@ static void coroutine_fn bdrv_co_aio_ioctl_entry(void *opaque)
 {
 BlockAIOCBCoroutine *acb = opaque;
 acb->req.error = bdrv_co_do_ioctl(acb->common.bs,
-  acb->req.req, acb->req.buf);
+  acb->req.flags, acb->req.qiov);
 bdrv_co_complete(acb);
 }

@@ -2628,8 +2628,10 @@ BlockAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,

 acb->need_bh = true;
 acb->req.error = -EINPROGRESS;
-acb->req.req = req;
-acb->req.buf = buf;
+/* Slight type abuse here, so we don't have to expose extra fields
+ * in the public struct BlockRequest */
+acb->req.flags = req;
+acb->req.qiov = buf;
 co = qemu_coroutine_create(bdrv_co_aio_ioctl_entry);
 qemu_coroutine_enter(co, acb);

-- 
2.5.5

[Qemu-devel] [PATCH v6 15/20] m25p80: Switch to byte-based block access

2016-05-04 Thread Eric Blake

Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.

Likewise for blk_aio_readv() and blk_aio_writev().

Signed-off-by: Eric Blake 
---
 hw/block/m25p80.c | 23 +++
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 906b712..5d30863 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -358,25 +358,21 @@ static void blk_sync_complete(void *opaque, int ret)

 static void flash_sync_page(Flash *s, int page)
 {
-int blk_sector, nb_sectors;
 QEMUIOVector iov;

 if (!s->blk || blk_is_read_only(s->blk)) {
 return;
 }

-blk_sector = (page * s->pi->page_size) / BDRV_SECTOR_SIZE;
-nb_sectors = DIV_ROUND_UP(s->pi->page_size, BDRV_SECTOR_SIZE);
 qemu_iovec_init(&iov, 1);
-qemu_iovec_add(&iov, s->storage + blk_sector * BDRV_SECTOR_SIZE,
-   nb_sectors * BDRV_SECTOR_SIZE);
-blk_aio_writev(s->blk, blk_sector, &iov, nb_sectors, blk_sync_complete,
-   NULL);
+qemu_iovec_add(&iov, s->storage + page * s->pi->page_size,
+   s->pi->page_size);
+blk_aio_pwritev(s->blk, page * s->pi->page_size, &iov, 0,
+blk_sync_complete, NULL);
 }

 static inline void flash_sync_area(Flash *s, int64_t off, int64_t len)
 {
-int64_t start, end, nb_sectors;
 QEMUIOVector iov;

 if (!s->blk || blk_is_read_only(s->blk)) {
@@ -384,13 +380,9 @@ static inline void flash_sync_area(Flash *s, int64_t off, int64_t len)
 }

 assert(!(len % BDRV_SECTOR_SIZE));
-start = off / BDRV_SECTOR_SIZE;
-end = (off + len) / BDRV_SECTOR_SIZE;
-nb_sectors = end - start;
 qemu_iovec_init(&iov, 1);
-qemu_iovec_add(&iov, s->storage + (start * BDRV_SECTOR_SIZE),
-nb_sectors * BDRV_SECTOR_SIZE);
-blk_aio_writev(s->blk, start, &iov, nb_sectors, blk_sync_complete, NULL);
+qemu_iovec_add(&iov, s->storage + off, len);
+blk_aio_pwritev(s->blk, off, &iov, 0, blk_sync_complete, NULL);
 }

 static void flash_erase(Flash *s, int offset, FlashCMD cmd)
@@ -907,8 +899,7 @@ static int m25p80_init(SSISlave *ss)
 s->storage = blk_blockalign(s->blk, s->size);

 /* FIXME: Move to late init */
-if (blk_read(s->blk, 0, s->storage,
- DIV_ROUND_UP(s->size, BDRV_SECTOR_SIZE))) {
+if (blk_pread(s->blk, 0, s->storage, s->size) < 0) {
 fprintf(stderr, "Failed to initialize SPI flash!\n");
 return 1;
 }
-- 
2.5.5

[Qemu-devel] [PATCH v6 09/20] xen_disk: Switch to byte-based aio block access

2016-05-04 Thread Eric Blake

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

Signed-off-by: Eric Blake 
---
 hw/block/xen_disk.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index d4ce380..064c116 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -554,9 +554,8 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
 block_acct_start(blk_get_stats(blkdev->blk), &ioreq->acct,
  ioreq->v.size, BLOCK_ACCT_READ);
 ioreq->aio_inflight++;
-blk_aio_readv(blkdev->blk, ioreq->start / BLOCK_SIZE,
-  &ioreq->v, ioreq->v.size / BLOCK_SIZE,
-  qemu_aio_complete, ioreq);
+blk_aio_preadv(blkdev->blk, ioreq->start, &ioreq->v, 0,
+   qemu_aio_complete, ioreq);
 break;
 case BLKIF_OP_WRITE:
 case BLKIF_OP_FLUSH_DISKCACHE:
@@ -569,9 +568,8 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
  ioreq->req.operation == BLKIF_OP_WRITE ?
  BLOCK_ACCT_WRITE : BLOCK_ACCT_FLUSH);
 ioreq->aio_inflight++;
-blk_aio_writev(blkdev->blk, ioreq->start / BLOCK_SIZE,
-   &ioreq->v, ioreq->v.size / BLOCK_SIZE,
-   qemu_aio_complete, ioreq);
+blk_aio_pwritev(blkdev->blk, ioreq->start, &ioreq->v, 0,
+qemu_aio_complete, ioreq);
 break;
 case BLKIF_OP_DISCARD:
 {
-- 
2.5.5

[Qemu-devel] [PATCH v6 14/20] sd: Switch to byte-based block access

2016-05-04 Thread Eric Blake

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

Greatly simplifies the code, now that we let the block layer
take care of alignment and read-modify-write on our behalf :)
In fact, we no longer need to include 'buf' in the migration
stream (although we do have to ensure that the stream remains
compatible).

Signed-off-by: Eric Blake 

---
v5: fix bug in sd_blk_write, drop sd->buf
---
 hw/sd/sd.c | 51 ---
 1 file changed, 4 insertions(+), 47 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index b66e5d2..87e3d23 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -123,7 +123,6 @@ struct SDState {
 qemu_irq readonly_cb;
 qemu_irq inserted_cb;
 BlockBackend *blk;
-uint8_t *buf;

 bool enable;
 };
@@ -551,7 +550,7 @@ static const VMStateDescription sd_vmstate = {
 VMSTATE_UINT64(data_start, SDState),
 VMSTATE_UINT32(data_offset, SDState),
 VMSTATE_UINT8_ARRAY(data, SDState, 512),
-VMSTATE_BUFFER_POINTER_UNSAFE(buf, SDState, 1, 512),
+VMSTATE_UNUSED_V(1, 512),
 VMSTATE_BOOL(enable, SDState),
 VMSTATE_END_OF_LIST()
 },
@@ -1577,57 +1576,17 @@ send_response:

 static void sd_blk_read(SDState *sd, uint64_t addr, uint32_t len)
 {
-uint64_t end = addr + len;
-
 DPRINTF("sd_blk_read: addr = 0x%08llx, len = %d\n",
 (unsigned long long) addr, len);
-if (!sd->blk || blk_read(sd->blk, addr >> 9, sd->buf, 1) < 0) {
+if (!sd->blk || blk_pread(sd->blk, addr, sd->data, len) < 0) {
 fprintf(stderr, "sd_blk_read: read error on host side\n");
-return;
 }
-
-if (end > (addr & ~511) + 512) {
-memcpy(sd->data, sd->buf + (addr & 511), 512 - (addr & 511));
-
-if (blk_read(sd->blk, end >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_read: read error on host side\n");
-return;
-}
-memcpy(sd->data + 512 - (addr & 511), sd->buf, end & 511);
-} else
-memcpy(sd->data, sd->buf + (addr & 511), len);
 }

 static void sd_blk_write(SDState *sd, uint64_t addr, uint32_t len)
 {
-uint64_t end = addr + len;
-
-if ((addr & 511) || len < 512)
-if (!sd->blk || blk_read(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: read error on host side\n");
-return;
-}
-
-if (end > (addr & ~511) + 512) {
-memcpy(sd->buf + (addr & 511), sd->data, 512 - (addr & 511));
-if (blk_write(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: write error on host side\n");
-return;
-}
-
-if (blk_read(sd->blk, end >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: read error on host side\n");
-return;
-}
-memcpy(sd->buf, sd->data + 512 - (addr & 511), end & 511);
-if (blk_write(sd->blk, end >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: write error on host side\n");
-}
-} else {
-memcpy(sd->buf + (addr & 511), sd->data, len);
-if (!sd->blk || blk_write(sd->blk, addr >> 9, sd->buf, 1) < 0) {
-fprintf(stderr, "sd_blk_write: write error on host side\n");
-}
+if (!sd->blk || blk_pwrite(sd->blk, addr, sd->data, len, 0) < 0) {
+fprintf(stderr, "sd_blk_write: write error on host side\n");
 }
 }

@@ -1925,8 +1884,6 @@ static void sd_realize(DeviceState *dev, Error **errp)
 return;
 }

-sd->buf = blk_blockalign(sd->blk, 512);
-
 if (sd->blk) {
 blk_set_dev_ops(sd->blk, &sd_block_ops, sd);
 }
-- 
2.5.5

[Qemu-devel] [PATCH v6 01/20] block: Allow BDRV_REQ_FUA through blk_pwrite()

2016-05-04 Thread Eric Blake

We have several block drivers that understand BDRV_REQ_FUA,
and emulate it in the block layer for the rest by a full flush.
But without a way to actually request BDRV_REQ_FUA during a
pass-through blk_pwrite(), FUA-aware block drivers like NBD are
forced to repeat the emulation logic of a full flush regardless
of whether the backend they are writing to could do it more
efficiently.

This patch just wires up a flags argument; followup patches
will actually make use of it in the NBD driver and in qemu-io.
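
As a rough illustration of why passing the flag through matters, the
sketch below models a backend that either honours FUA natively or falls
back to write-plus-flush.  Everything here is hypothetical (struct backend,
backend_pwrite(), and the REQ_FUA value are made up for the example); it
is not the block layer's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the FUA request flag; the value is arbitrary. */
enum { REQ_FUA = 1 << 0 };

/* Hypothetical backend that just counts what it was asked to do. */
struct backend {
    bool native_fua;  /* backend can do a forced-unit-access write itself */
    int writes;       /* ordinary writes issued */
    int fua_writes;   /* writes issued with native FUA */
    int flushes;      /* full flushes issued */
};

static int backend_pwrite(struct backend *b, unsigned flags)
{
    if ((flags & REQ_FUA) && b->native_fua) {
        /* The flag reached a backend that understands it: one operation. */
        b->fua_writes++;
        return 0;
    }

    b->writes++;
    if (flags & REQ_FUA) {
        /* Emulation: an ordinary write followed by a full flush. */
        b->flushes++;
    }
    return 0;
}
```

If the caller had no way to pass REQ_FUA down, it would have to issue the
flush itself in every case, even for a backend where native_fua is true.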

Signed-off-by: Eric Blake 
Acked-by: Denis V. Lunev 
---
 include/sysemu/block-backend.h |  3 ++-
 block/block-backend.c  |  6 --
 block/crypto.c |  2 +-
 block/parallels.c  |  2 +-
 block/qcow.c   |  8 
 block/qcow2.c  |  4 ++--
 block/qed.c|  6 +++---
 block/sheepdog.c   |  2 +-
 block/vdi.c|  4 ++--
 block/vhdx.c   |  5 +++--
 block/vmdk.c   | 10 +-
 block/vpc.c| 10 +-
 hw/nvram/spapr_nvram.c |  4 ++--
 nbd/server.c   |  2 +-
 qemu-io-cmds.c |  2 +-
 15 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index c62b6fe..6991b26 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -102,7 +102,8 @@ BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
  int nb_sectors, BdrvRequestFlags flags,
  BlockCompletionFunc *cb, void *opaque);
 int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count);
-int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count);
+int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
+   BdrvRequestFlags flags);
 int64_t blk_getlength(BlockBackend *blk);
 void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr);
 int64_t blk_nb_sectors(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index a7623e8..96c1d7c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -953,9 +953,11 @@ int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count)
 return count;
 }

-int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count)
+int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
+   BdrvRequestFlags flags)
 {
-int ret = blk_prw(blk, offset, (void*) buf, count, blk_write_entry, 0);
+int ret = blk_prw(blk, offset, (void *) buf, count, blk_write_entry,
+  flags);
 if (ret < 0) {
 return ret;
 }
diff --git a/block/crypto.c b/block/crypto.c
index 1903e84..32ba17c 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -91,7 +91,7 @@ static ssize_t block_crypto_write_func(QCryptoBlock *block,
 struct BlockCryptoCreateData *data = opaque;
 ssize_t ret;

-ret = blk_pwrite(data->blk, offset, buf, buflen);
+ret = blk_pwrite(data->blk, offset, buf, buflen, 0);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not write encryption header");
 return ret;
diff --git a/block/parallels.c b/block/parallels.c
index 324ed43..2d8bc87 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -512,7 +512,7 @@ static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
 memset(tmp, 0, sizeof(tmp));
 memcpy(tmp, &header, sizeof(header));

-ret = blk_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE);
+ret = blk_pwrite(file, 0, tmp, BDRV_SECTOR_SIZE, 0);
 if (ret < 0) {
 goto exit;
 }
diff --git a/block/qcow.c b/block/qcow.c
index 60ddb12..d6dc1b0 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -853,14 +853,14 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
 }

 /* write all the data */
-ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header));
+ret = blk_pwrite(qcow_blk, 0, &header, sizeof(header), 0);
 if (ret != sizeof(header)) {
 goto exit;
 }

 if (backing_file) {
 ret = blk_pwrite(qcow_blk, sizeof(header),
-backing_file, backing_filename_len);
+ backing_file, backing_filename_len, 0);
 if (ret != backing_filename_len) {
 goto exit;
 }
@@ -869,8 +869,8 @@ static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
 tmp = g_malloc0(BDRV_SECTOR_SIZE);
 for (i = 0; i < ((sizeof(uint64_t)*l1_size + BDRV_SECTOR_SIZE - 1)/
 BDRV_SECTOR_SIZE); i++) {
-ret = blk_pwrite(qcow_blk, header_size +
-BDRV_SECTOR_SIZE*i, tmp, BDRV_SECTOR_SIZE);
+ret = blk_pwrite(qcow_blk, header_size + BDRV_SECTOR_SIZE * i,
+ tmp, BDRV_SECTOR_SIZE, 0);
 if (ret != BDRV_SECTOR_SIZE) {

[Qemu-devel] [PATCH v6 11/20] nand: Switch to byte-based block access

2016-05-04 Thread Eric Blake

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

This file is doing some complex computations to map various
flash page sizes (256, 512, and 2048) atop generic uses of
512-byte sector operations.  Perhaps someone will want to tidy
up the file for fewer gymnastics in managing addresses and
offsets, and less wasteful visits of 256-byte pages, but it
was out of scope for this series, where I just went with the
mechanical conversion.
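
The mechanical conversion amounts to shifting both the sector number and
the sector count left by the sector-size exponent.  A minimal sketch,
assuming 512-byte sectors; struct byte_req and sectors_to_bytes() are
hypothetical helpers, not functions from this series:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BITS 9                    /* assumed: 512-byte sectors */
#define SECTOR_SIZE (1u << SECTOR_BITS)

/* Hypothetical pair describing a byte-based request. */
struct byte_req {
    uint64_t offset;  /* byte offset into the image */
    uint64_t count;   /* request length in bytes */
};

/* Convert a (sector, nb_sectors) request into (offset, count). */
static struct byte_req sectors_to_bytes(uint64_t sector, uint64_t nb_sectors)
{
    struct byte_req r = {
        .offset = sector << SECTOR_BITS,
        .count = nb_sectors << SECTOR_BITS,
    };
    return r;
}
```

So a call that used to read PAGE_SECTORS sectors starting at 'sector'
becomes a read of PAGE_SECTORS << 9 bytes at byte offset sector << 9, with
no change to which bytes of the image are touched.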

Signed-off-by: Eric Blake 

---
v5: fix missing edit
---
 hw/block/nand.c | 36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/hw/block/nand.c b/hw/block/nand.c
index 29c6596..c69e675 100644
--- a/hw/block/nand.c
+++ b/hw/block/nand.c
@@ -663,7 +663,8 @@ static void glue(nand_blk_write_, PAGE_SIZE)(NANDFlashState *s)
 sector = SECTOR(s->addr);
 off = (s->addr & PAGE_MASK) + s->offset;
 soff = SECTOR_OFFSET(s->addr);
-if (blk_read(s->blk, sector, iobuf, PAGE_SECTORS) < 0) {
+if (blk_pread(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+  PAGE_SECTORS << BDRV_SECTOR_BITS) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, sector);
 return;
 }
@@ -675,21 +676,24 @@ static void glue(nand_blk_write_, PAGE_SIZE)(NANDFlashState *s)
 MIN(OOB_SIZE, off + s->iolen - PAGE_SIZE));
 }

-if (blk_write(s->blk, sector, iobuf, PAGE_SECTORS) < 0) {
+if (blk_pwrite(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+   PAGE_SECTORS << BDRV_SECTOR_BITS, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, sector);
 }
 } else {
 off = PAGE_START(s->addr) + (s->addr & PAGE_MASK) + s->offset;
 sector = off >> 9;
 soff = off & 0x1ff;
-if (blk_read(s->blk, sector, iobuf, PAGE_SECTORS + 2) < 0) {
+if (blk_pread(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+  (PAGE_SECTORS + 2) << BDRV_SECTOR_BITS) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, sector);
 return;
 }

 mem_and(iobuf + soff, s->io, s->iolen);

-if (blk_write(s->blk, sector, iobuf, PAGE_SECTORS + 2) < 0) {
+if (blk_pwrite(s->blk, sector << BDRV_SECTOR_BITS, iobuf,
+   (PAGE_SECTORS + 2) << BDRV_SECTOR_BITS, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, sector);
 }
 }
@@ -716,17 +720,20 @@ static void glue(nand_blk_erase_, PAGE_SIZE)(NANDFlashState *s)
 i = SECTOR(addr);
 page = SECTOR(addr + (1 << (ADDR_SHIFT + s->erase_shift)));
 for (; i < page; i ++)
-if (blk_write(s->blk, i, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, i << BDRV_SECTOR_BITS, iobuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, i);
 }
 } else {
 addr = PAGE_START(addr);
 page = addr >> 9;
-if (blk_read(s->blk, page, iobuf, 1) < 0) {
+if (blk_pread(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+  BDRV_SECTOR_SIZE) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, page);
 }
 memset(iobuf + (addr & 0x1ff), 0xff, (~addr & 0x1ff) + 1);
-if (blk_write(s->blk, page, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, page);
 }

@@ -734,18 +741,20 @@ static void glue(nand_blk_erase_, PAGE_SIZE)(NANDFlashState *s)
 i = (addr & ~0x1ff) + 0x200;
 for (addr += ((PAGE_SIZE + OOB_SIZE) << s->erase_shift) - 0x200;
 i < addr; i += 0x200) {
-if (blk_write(s->blk, i >> 9, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, i, iobuf, BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n",
__func__, i >> 9);
 }
 }

 page = i >> 9;
-if (blk_read(s->blk, page, iobuf, 1) < 0) {
+if (blk_pread(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+  BDRV_SECTOR_SIZE) < 0) {
 printf("%s: read error in sector %" PRIu64 "\n", __func__, page);
 }
 memset(iobuf, 0xff, ((addr - 1) & 0x1ff) + 1);
-if (blk_write(s->blk, page, iobuf, 1) < 0) {
+if (blk_pwrite(s->blk, page << BDRV_SECTOR_BITS, iobuf,
+   BDRV_SECTOR_SIZE, 0) < 0) {
 printf("%s: write error in sector %" PRIu64 "\n", __func__, page);
 }
 }
@@ -760,7 +769,8 @@ static void glue(nand_blk_load_, PAGE_SIZE)(NANDFlashState *s,

 if (s->blk) {
 if (s->mem_oob) {
-

[Qemu-devel] [PATCH v6 03/20] block: Switch blk_read_unthrottled() to byte interface

2016-05-04 Thread Eric Blake

Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().

Signed-off-by: Eric Blake 
---
 include/sysemu/block-backend.h | 4 ++--
 block/block-backend.c  | 8 
 hw/block/hd-geometry.c | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 6991b26..662a106 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -92,8 +92,8 @@ void *blk_get_attached_dev(BlockBackend *blk);
 void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops, void *opaque);
 int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
  int nb_sectors);
-int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors);
+int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
+  int count);
 int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
   int nb_sectors);
 int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
diff --git a/block/block-backend.c b/block/block-backend.c
index 96c1d7c..e5a8a07 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -790,19 +790,19 @@ int blk_read(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
 return blk_rw(blk, sector_num, buf, nb_sectors, blk_read_entry, 0);
 }

-int blk_read_unthrottled(BlockBackend *blk, int64_t sector_num, uint8_t *buf,
- int nb_sectors)
+int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
+  int count)
 {
 BlockDriverState *bs = blk_bs(blk);
 int ret;

-ret = blk_check_request(blk, sector_num, nb_sectors);
+ret = blk_check_byte_request(blk, offset, count);
 if (ret < 0) {
 return ret;
 }

 bdrv_no_throttling_begin(bs);
-ret = blk_read(blk, sector_num, buf, nb_sectors);
+ret = blk_pread(blk, offset, buf, count);
 bdrv_no_throttling_end(bs);
 return ret;
 }
diff --git a/hw/block/hd-geometry.c b/hw/block/hd-geometry.c
index 6d02192..d388f13 100644
--- a/hw/block/hd-geometry.c
+++ b/hw/block/hd-geometry.c
@@ -66,7 +66,7 @@ static int guess_disk_lchs(BlockBackend *blk,
  * but also in async I/O mode. So the I/O throttling function has to
  * be disabled temporarily here, not permanently.
  */
-if (blk_read_unthrottled(blk, 0, buf, 1) < 0) {
+if (blk_pread_unthrottled(blk, 0, buf, BDRV_SECTOR_SIZE) < 0) {
 return -1;
 }
 /* test msdos magic */
-- 
2.5.5

[Qemu-devel] [PATCH v6 12/20] onenand: Switch to byte-based block access

2016-05-04 Thread Eric Blake

Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead.  Likewise for blk_read().

This particular device picks its size during onenand_initfn(),
and can be at most 0x8000 bytes; therefore, shifting an
'int sec' request to get back to a byte offset should never
overflow 32 bits.  But adding assertions to document that point
should not hurt.
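
To make the overflow argument concrete, here is a small sketch of the
check, assuming 512-byte sectors.  The helper names are hypothetical; the
condition is written to match the strict comparison used by the asserts
in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_BITS 9  /* assumed: 512-byte sectors */

/*
 * True if 'sec' can be shifted left by SECTOR_BITS without the result
 * exceeding 32 bits.  Negative values are rejected as well.
 */
static bool sector_shift_fits_u32(int sec)
{
    return sec >= 0 && (uint32_t)sec < (UINT32_MAX >> SECTOR_BITS);
}

/* Convert a sector number to a 32-bit byte offset, asserting it fits. */
static uint32_t sector_to_offset_u32(int sec)
{
    assert(sector_shift_fits_u32(sec));
    return (uint32_t)sec << SECTOR_BITS;
}
```

With SECTOR_BITS at 9, the largest accepted sector number is 8388606,
which is far above anything a device of the size described here can
produce.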

Signed-off-by: Eric Blake 

---
v5: compile-tested, add asserts that size never exceeds 0x8000,
so that we can consistently use uint32_t
---
 hw/block/onenand.c | 41 +++--
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/hw/block/onenand.c b/hw/block/onenand.c
index 883f4b1..8d84227 100644
--- a/hw/block/onenand.c
+++ b/hw/block/onenand.c
@@ -224,7 +224,8 @@ static void onenand_reset(OneNANDState *s, int cold)
 /* Lock the whole flash */
 memset(s->blockwp, ONEN_LOCK_LOCKED, s->blocks);

-if (s->blk_cur && blk_read(s->blk_cur, 0, s->boot[0], 8) < 0) {
+if (s->blk_cur && blk_pread(s->blk_cur, 0, s->boot[0],
+8 << BDRV_SECTOR_BITS) < 0) {
 hw_error("%s: Loading the BootRAM failed.\n", __func__);
 }
 }
@@ -240,8 +241,11 @@ static void onenand_system_reset(DeviceState *dev)
 static inline int onenand_load_main(OneNANDState *s, int sec, int secn,
 void *dest)
 {
+assert(UINT32_MAX >> BDRV_SECTOR_BITS > sec);
+assert(UINT32_MAX >> BDRV_SECTOR_BITS > secn);
 if (s->blk_cur) {
-return blk_read(s->blk_cur, sec, dest, secn) < 0;
+return blk_pread(s->blk_cur, sec << BDRV_SECTOR_BITS, dest,
+ secn << BDRV_SECTOR_BITS) < 0;
 } else if (sec + secn > s->secs_cur) {
 return 1;
 }
@@ -257,19 +261,22 @@ static inline int onenand_prog_main(OneNANDState *s, int sec, int secn,
 int result = 0;

 if (secn > 0) {
-uint32_t size = (uint32_t)secn * 512;
+uint32_t size = secn << BDRV_SECTOR_BITS;
+uint32_t offset = sec << BDRV_SECTOR_BITS;
+assert(UINT32_MAX >> BDRV_SECTOR_BITS > sec);
+assert(UINT32_MAX >> BDRV_SECTOR_BITS > secn);
 const uint8_t *sp = (const uint8_t *)src;
 uint8_t *dp = 0;
 if (s->blk_cur) {
 dp = g_malloc(size);
-if (!dp || blk_read(s->blk_cur, sec, dp, secn) < 0) {
+if (!dp || blk_pread(s->blk_cur, offset, dp, size) < 0) {
 result = 1;
 }
 } else {
 if (sec + secn > s->secs_cur) {
 result = 1;
 } else {
-dp = (uint8_t *)s->current + (sec << 9);
+dp = (uint8_t *)s->current + offset;
 }
 }
 if (!result) {
@@ -278,7 +285,7 @@ static inline int onenand_prog_main(OneNANDState *s, int sec, int secn,
 dp[i] &= sp[i];
 }
 if (s->blk_cur) {
-result = blk_write(s->blk_cur, sec, dp, secn) < 0;
+result = blk_pwrite(s->blk_cur, offset, dp, size, 0) < 0;
 }
 }
 if (dp && s->blk_cur) {
@@ -295,7 +302,8 @@ static inline int onenand_load_spare(OneNANDState *s, int sec, int secn,
 uint8_t buf[512];

 if (s->blk_cur) {
-if (blk_read(s->blk_cur, s->secs_cur + (sec >> 5), buf, 1) < 0) {
+uint32_t offset = (s->secs_cur + (sec >> 5)) << BDRV_SECTOR_BITS;
+if (blk_pread(s->blk_cur, offset, buf, BDRV_SECTOR_SIZE) < 0) {
 return 1;
 }
 memcpy(dest, buf + ((sec & 31) << 4), secn << 4);
@@ -304,7 +312,7 @@ static inline int onenand_load_spare(OneNANDState *s, int sec, int secn,
 } else {
 memcpy(dest, s->current + (s->secs_cur << 9) + (sec << 4), secn << 4);
 }
- 
+
 return 0;
 }

@@ -315,10 +323,12 @@ static inline int onenand_prog_spare(OneNANDState *s, int sec, int secn,
 if (secn > 0) {
 const uint8_t *sp = (const uint8_t *)src;
 uint8_t *dp = 0, *dpp = 0;
+uint32_t offset = (s->secs_cur + (sec >> 5)) << BDRV_SECTOR_BITS;
+assert(UINT32_MAX >> BDRV_SECTOR_BITS > s->secs_cur + (sec >> 5));
 if (s->blk_cur) {
 dp = g_malloc(512);
 if (!dp
-|| blk_read(s->blk_cur, s->secs_cur + (sec >> 5), dp, 1) < 0) {
+|| blk_pread(s->blk_cur, offset, dp, BDRV_SECTOR_SIZE) < 0) {
 result = 1;
 } else {
 dpp = dp + ((sec & 31) << 4);
@@ -336,8 +346,8 @@ static inline int onenand_prog_spare(OneNANDState *s, int sec, int secn,
 dpp[i] &= sp[i];
 }
 if (s->blk_cur) {
-result = blk_write(s->blk_cur, s->secs_cur + (sec >> 5),
-   dp, 1) < 0;
+result = blk_pwrite(s->blk_cur, offset, dp,
+BDRV_SECTOR_SIZE, 0) < 0;
 }
 }
 g_free(dp);
@@ -35

[Qemu-devel] [PATCH v6 04/20] block: Switch blk_*write_zeroes() to byte interface

2016-05-04 Thread Eric Blake

Sector-based blk_write() should die; convert the one-off
variant blk_write_zeroes() to use an offset/count interface
instead.  Likewise for blk_co_write_zeroes() and
blk_aio_write_zeroes().

Signed-off-by: Eric Blake 
---
 include/sysemu/block-backend.h | 12 ++--
 block/block-backend.c  | 33 +++--
 block/parallels.c  |  3 ++-
 hw/scsi/scsi-disk.c|  4 ++--
 qemu-img.c |  3 ++-
 qemu-io-cmds.c |  6 ++
 6 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 662a106..851376b 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -96,10 +96,10 @@ int blk_pread_unthrottled(BlockBackend *blk, int64_t offset, uint8_t *buf,
   int count);
 int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
   int nb_sectors);
-int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags);
-BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags,
+int blk_write_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags);
+BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags,
  BlockCompletionFunc *cb, void *opaque);
 int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count);
 int blk_pwrite(BlockBackend *blk, int64_t offset, const void *buf, int count,
@@ -179,8 +179,8 @@ int blk_get_open_flags_from_root_state(BlockBackend *blk);

 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
   BlockCompletionFunc *cb, void *opaque);
-int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags);
+int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags);
 int blk_write_compressed(BlockBackend *blk, int64_t sector_num,
  const uint8_t *buf, int nb_sectors);
 int blk_truncate(BlockBackend *blk, int64_t offset);
diff --git a/block/block-backend.c b/block/block-backend.c
index e5a8a07..f8f88a6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -814,11 +814,11 @@ int blk_write(BlockBackend *blk, int64_t sector_num, const uint8_t *buf,
   blk_write_entry, 0);
 }

-int blk_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags)
+int blk_write_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags)
 {
-return blk_rw(blk, sector_num, NULL, nb_sectors, blk_write_entry,
-  flags | BDRV_REQ_ZERO_WRITE);
+return blk_prw(blk, offset, NULL, count, blk_write_entry,
+   flags | BDRV_REQ_ZERO_WRITE);
 }

 static void error_callback_bh(void *opaque)
@@ -930,18 +930,12 @@ static void blk_aio_write_entry(void *opaque)
 blk_aio_complete(acb);
 }

-BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags,
+BlockAIOCB *blk_aio_write_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags,
  BlockCompletionFunc *cb, void *opaque)
 {
-if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return blk_abort_aio_request(blk, cb, opaque, -EINVAL);
-}
-
-return blk_aio_prwv(blk, sector_num << BDRV_SECTOR_BITS,
-nb_sectors << BDRV_SECTOR_BITS, NULL,
-blk_aio_write_entry, flags | BDRV_REQ_ZERO_WRITE,
-cb, opaque);
+return blk_aio_prwv(blk, offset, count, NULL, blk_aio_write_entry,
+flags | BDRV_REQ_ZERO_WRITE, cb, opaque);
 }

 int blk_pread(BlockBackend *blk, int64_t offset, void *buf, int count)
@@ -1444,15 +1438,10 @@ void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
 return qemu_aio_get(aiocb_info, blk_bs(blk), cb, opaque);
 }

-int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t sector_num,
- int nb_sectors, BdrvRequestFlags flags)
+int coroutine_fn blk_co_write_zeroes(BlockBackend *blk, int64_t offset,
+ int count, BdrvRequestFlags flags)
 {
-if (nb_sectors < 0 || nb_sectors > BDRV_REQUEST_MAX_SECTORS) {
-return -EINVAL;
-}
-
-return blk_co_pwritev(blk, sector_num << BDRV_SECTOR_BITS,
-  nb_sectors << BDRV_SECTOR_BITS, NULL,
+return blk_

[Qemu-devel] [PATCH v6 00/20] block: kill sector-based blk_write/read

2016-05-04 Thread Eric Blake

2.7 material, depends on Kevin's block-next:
git://repo.or.cz/qemu/kevin.git block-next

Previously posted as part of a larger NBD series [1] and as v5 [2].
Mostly orthogonal to Kevin's recent work to also kill sector
interfaces from the driver.

[1] https://lists.gnu.org/archive/html/qemu-devel/2016-04/msg03526.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg00235.html

Also available as a tag at this location:
git fetch git://repo.or.cz/qemu/ericb.git nbd-block-v6

Changes since then:
- patch 2: new patch for something that confused me
- patch 3 (was 12/14): moved earlier
- patch 4 (was 13/14): moved earlier, retitled, and touches more
interfaces
- patches 5-9: new, to cover blk_aio_{read,write}v
- patch 15: also cover blk_aio_{read,write}v
- patch 19: also cover blk_aio_{read,write}v, don't break
qemu-iotests 23 [kwolf]
- patch 20 (was 14/14): retitle, kill more dead code

I still plan to make qemu-io 'read' accept unaligned access
without requiring -p, but that will now come in a later series
that does other qemu-io UI improvements (besides, 20 patches is
big enough to review for now).  For qemu-iotests 23 to work
with unaligned 'read', I also have to fix 'readv' to be
unaligned; which is why this series grew to include blk_aio_*.
Note that there are still a few sector-aligned blk_* interfaces
left even after this series, but we are progressively getting
closer to using byte interfaces throughout the code base.

001/20:[] [--] 'block: Allow BDRV_REQ_FUA through blk_pwrite()'
002/20:[down] 'block: Drop private ioctl-only members of BlockRequest'
003/20:[] [--] 'block: Switch blk_read_unthrottled() to byte interface'
004/20:[down] 'block: Switch blk_*write_zeroes() to byte interface'
005/20:[down] 'block: Introduce byte-based aio read/write'
006/20:[down] 'ide: Switch to byte-based aio block access'
007/20:[down] 'scsi-disk: Switch to byte-based aio block access'
008/20:[down] 'virtio: Switch to byte-based aio block access'
009/20:[down] 'xen_disk: Switch to byte-based aio block access'
010/20:[] [--] 'fdc: Switch to byte-based block access'
011/20:[] [--] 'nand: Switch to byte-based block access'
012/20:[] [--] 'onenand: Switch to byte-based block access'
013/20:[] [--] 'pflash: Switch to byte-based block access'
014/20:[] [--] 'sd: Switch to byte-based block access'
015/20:[0020] [FC] 'm25p80: Switch to byte-based block access'
016/20:[] [--] 'atapi: Switch to byte-based block access'
017/20:[] [--] 'nbd: Switch to byte-based block access'
018/20:[] [--] 'qemu-img: Switch to byte-based block access'
019/20:[0041] [FC] 'qemu-io: Switch to byte-based block access'
020/20:[down] 'block: Kill unused sector-based blk_* functions'

Eric Blake (20):
  block: Allow BDRV_REQ_FUA through blk_pwrite()
  block: Drop private ioctl-only members of BlockRequest
  block: Switch blk_read_unthrottled() to byte interface
  block: Switch blk_*write_zeroes() to byte interface
  block: Introduce byte-based aio read/write
  ide: Switch to byte-based aio block access
  scsi-disk: Switch to byte-based aio block access
  virtio: Switch to byte-based aio block access
  xen_disk: Switch to byte-based aio block access
  fdc: Switch to byte-based block access
  nand: Switch to byte-based block access
  onenand: Switch to byte-based block access
  pflash: Switch to byte-based block access
  sd: Switch to byte-based block access
  m25p80: Switch to byte-based block access
  atapi: Switch to byte-based block access
  nbd: Switch to byte-based block access
  qemu-img: Switch to byte-based block access
  qemu-io: Switch to byte-based block access
  block: Kill unused sector-based blk_* functions

 hw/ide/internal.h  |   2 +-
 include/block/block.h  |  16 ++-
 include/sysemu/block-backend.h |  33 ++---
 include/sysemu/dma.h   |   4 +-
 block/block-backend.c  | 104 -
 block/crypto.c |   2 +-
 block/io.c |   8 ++--
 block/parallels.c  |   5 +-
 block/qcow.c   |   8 ++--
 block/qcow2.c  |   4 +-
 block/qed.c|   6 +--
 block/sheepdog.c   |   2 +-
 block/vdi.c|   4 +-
 block/vhdx.c   |   5 +-
 block/vmdk.c   |  10 ++--
 block/vpc.c|  10 ++--
 dma-helpers.c  |  14 +++---
 hw/block/fdc.c |  25 ++
 hw/block/hd-geometry.c |   2 +-
 hw/block/m25p80.c  |  23 +++--
 hw/block/nand.c|  36 --
 hw/block/onenand.c |  41 ++--
 hw/block/pflash_cfi01.c|  12 ++---
 hw/block/pflash_cfi02.c|  12 ++---
 hw/block/virtio-blk.c  |  18 ---
 hw/block/xen_disk.c|  10 ++--
 hw/ide/atapi.c |  19 
 hw/ide/core.c  |  10 ++--
 hw/ide/macio.c

[Qemu-devel] [PATCH v6 06/20] ide: Switch to byte-based aio block access

2016-05-04 Thread Eric Blake

Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.

The patch had to touch multiple files at once, because dma_blk_io()
takes pointers to the functions, and ide_issue_trim() piggybacks on
the same interface (while ignoring offset under the hood).
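
The coupling can be pictured with a simplified model: one function-pointer
type shared by every callback, a driver loop that advances a byte offset
per segment, and a trim-like callback that accepts the offset but ignores
it.  All names below (io_func, struct io_log, logging_read, logging_trim,
run_segments) are hypothetical and much simpler than the real DMAIOFunc
and dma_blk_io().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BITS 9  /* assumed: 512-byte sectors */

/* Hypothetical byte-based callback type shared by all I/O functions. */
typedef int io_func(uint64_t offset, size_t bytes, unsigned flags,
                    void *opaque);

/* Records what the callbacks were asked to do. */
struct io_log {
    uint64_t last_offset;
    size_t total_bytes;
    int calls;
};

/* Read-like callback: uses the byte offset it is handed. */
static int logging_read(uint64_t offset, size_t bytes, unsigned flags,
                        void *opaque)
{
    struct io_log *log = opaque;

    (void)flags;
    log->last_offset = offset;
    log->total_bytes += bytes;
    log->calls++;
    return 0;
}

/* Trim-like callback: same signature, but the offset is ignored. */
static int logging_trim(uint64_t offset, size_t bytes, unsigned flags,
                        void *opaque)
{
    struct io_log *log = opaque;

    (void)offset;
    (void)flags;
    log->total_bytes += bytes;
    log->calls++;
    return 0;
}

/* Driver loop: converts the start sector once, then advances in bytes. */
static int run_segments(io_func *fn, uint64_t start_sector,
                        const size_t *seg_bytes, size_t nseg, void *opaque)
{
    uint64_t offset = start_sector << SECTOR_BITS;

    for (size_t i = 0; i < nseg; i++) {
        if (fn(offset, seg_bytes[i], 0, opaque) < 0) {
            return -1;
        }
        offset += seg_bytes[i];
    }
    return 0;
}
```

Because every callback must match the one typedef, changing the type from
sectors to bytes forces all implementations and all callers to change in
the same patch.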

Signed-off-by: Eric Blake 
---
 hw/ide/internal.h|  2 +-
 include/sysemu/dma.h |  4 ++--
 dma-helpers.c| 14 +++---
 hw/ide/core.c| 10 +-
 hw/ide/macio.c   |  9 +++--
 5 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index d2c458f..ceb9e59 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -614,7 +614,7 @@ void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
 void ide_transfer_stop(IDEState *s);
 void ide_set_inactive(IDEState *s, bool more);
 BlockAIOCB *ide_issue_trim(BlockBackend *blk,
-int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+int64_t offset, QEMUIOVector *qiov, BdrvRequestFlags flags,
 BlockCompletionFunc *cb, void *opaque);
 BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
diff --git a/include/sysemu/dma.h b/include/sysemu/dma.h
index b0fbb9b..0f7cd4d 100644
--- a/include/sysemu/dma.h
+++ b/include/sysemu/dma.h
@@ -197,8 +197,8 @@ void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
 #endif

-typedef BlockAIOCB *DMAIOFunc(BlockBackend *blk, int64_t sector_num,
-  QEMUIOVector *iov, int nb_sectors,
+typedef BlockAIOCB *DMAIOFunc(BlockBackend *blk, int64_t offset,
+  QEMUIOVector *iov, BdrvRequestFlags flags,
   BlockCompletionFunc *cb, void *opaque);

 BlockAIOCB *dma_blk_io(BlockBackend *blk,
diff --git a/dma-helpers.c b/dma-helpers.c
index 4ad0bca..a6cc15f 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -73,7 +73,7 @@ typedef struct {
 BlockBackend *blk;
 BlockAIOCB *acb;
 QEMUSGList *sg;
-uint64_t sector_num;
+uint64_t offset;
 DMADirection dir;
 int sg_cur_index;
 dma_addr_t sg_cur_byte;
@@ -130,7 +130,7 @@ static void dma_blk_cb(void *opaque, int ret)
 trace_dma_blk_cb(dbs, ret);

 dbs->acb = NULL;
-dbs->sector_num += dbs->iov.size / 512;
+dbs->offset += dbs->iov.size;

 if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
 dma_complete(dbs, ret);
@@ -164,8 +164,8 @@ static void dma_blk_cb(void *opaque, int ret)
 qemu_iovec_discard_back(&dbs->iov, dbs->iov.size & ~BDRV_SECTOR_MASK);
 }

-dbs->acb = dbs->io_func(dbs->blk, dbs->sector_num, &dbs->iov,
-dbs->iov.size / 512, dma_blk_cb, dbs);
+dbs->acb = dbs->io_func(dbs->blk, dbs->offset, &dbs->iov, 0,
+dma_blk_cb, dbs);
 assert(dbs->acb);
 }

@@ -203,7 +203,7 @@ BlockAIOCB *dma_blk_io(
 dbs->acb = NULL;
 dbs->blk = blk;
 dbs->sg = sg;
-dbs->sector_num = sector_num;
+dbs->offset = sector_num << BDRV_SECTOR_BITS;
 dbs->sg_cur_index = 0;
 dbs->sg_cur_byte = 0;
 dbs->dir = dir;
@@ -219,7 +219,7 @@ BlockAIOCB *dma_blk_read(BlockBackend *blk,
  QEMUSGList *sg, uint64_t sector,
  void (*cb)(void *opaque, int ret), void *opaque)
 {
-return dma_blk_io(blk, sg, sector, blk_aio_readv, cb, opaque,
+return dma_blk_io(blk, sg, sector, blk_aio_preadv, cb, opaque,
   DMA_DIRECTION_FROM_DEVICE);
 }

@@ -227,7 +227,7 @@ BlockAIOCB *dma_blk_write(BlockBackend *blk,
   QEMUSGList *sg, uint64_t sector,
   void (*cb)(void *opaque, int ret), void *opaque)
 {
-return dma_blk_io(blk, sg, sector, blk_aio_writev, cb, opaque,
+return dma_blk_io(blk, sg, sector, blk_aio_pwritev, cb, opaque,
   DMA_DIRECTION_TO_DEVICE);
 }

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 41e6a2d..fe2bfba 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -442,7 +442,7 @@ static void ide_issue_trim_cb(void *opaque, int ret)
 }

 BlockAIOCB *ide_issue_trim(BlockBackend *blk,
-int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+int64_t offset, QEMUIOVector *qiov, BdrvRequestFlags flags,
 BlockCompletionFunc *cb, void *opaque)
 {
 TrimAIOCB *iocb;
@@ -616,8 +616,8 @@ BlockAIOCB *ide_buffered_readv(IDEState *s, int64_t sector_num,
 req->iov.iov_len = iov->size;
 qemu_iovec_init_external(&req->qiov, &req->iov, 1);

-aioreq = blk_aio_readv(s->blk, sector_num, &req->qiov, nb_sectors,
-   ide_buffered_readv_cb, req);
+aioreq = blk_aio_preadv(s->blk, sector_num << BDRV_SECTOR_BITS,
+&req->qiov, 0, ide_buffered_readv_cb, req);

 QLIST_INSERT_HEAD(&s->buffered_requests, req, list);
 re

Re: [Qemu-devel] [Bug 1576347] [NEW] Only one NVMe device is usable in Windows (10) guest

2016-05-04 Thread Keith Busch

> >   C:\Windows\system32>sg_vpd -p sn PD1
> >   Unit serial number VPD page:
> > Unit serial number: ___."

I checked your serial number against the SNT reference on nvmexpress.org and
it's definitely the wrong translation, so that has to be a guest OS driver bug
(Linux has the right translation if interested, but its use is deprecated).

I pinged some Windows comrades to see if a potential serial conflict prevents
both disks from surfacing.

I'm surprised to see this bad translation as I know of folks successfully
testing multiple nvme drives in various versions of Windows with both the OFA
and Microsoft drivers. An emulated NVMe is no different than real h/w for
namespace identification from the host's perspective.

Re: [Qemu-devel] [Bug 1576347] [NEW] Only one NVMe device is usable in Windows (10) guest

2016-05-04 Thread Keith Busch

On Fri, Apr 29, 2016 at 10:10:39AM +0100, Stefan Hajnoczi wrote:
> On Thu, Apr 28, 2016 at 05:44:21PM -, Tom Yan wrote:
> 
> CCing Keith Busch , maintainer of QEMU NVMe.
> Maybe he has an idea.

Thanks for the report. Sounds like a Windows-specific issue as I have no
problem with multiple nvme drives on my dev machines:

[Host]
# uname -r
4.6.0-rc5+

# qemu-system-x86_64 --version
QEMU emulator version 2.5.50, Copyright (c) 2003-2008 Fabrice Bellard

# qemu-system-x86_64 -m 4096 -smp 4 -enable-kvm debian.img \
-drive file=nvme.1.img,if=none,id=one -device nvme,drive=one,serial=foo \
-drive file=nvme.2.img,if=none,id=two -device nvme,drive=two,serial=bar 

[Guest]
# uname -r
4.5.0

# ls /dev/nvme*
/dev/nvme0  /dev/nvme0n1  /dev/nvme1  /dev/nvme1n1

# nvme id-ctrl /dev/nvme0 | grep sn
sn : foo

# nvme id-ctrl /dev/nvme1 | grep sn
sn : bar

> > When there are two NVMe devices specified, only the second one will be
> > usable in Windows. The following error is shown under "Device status" of
> > the failed NVMe controller in Device Manager:
> > 
> > "This device cannot start. (Code 10)
> > 
> > The I/O device is configured incorrectly or the configuration parameters
> > to the driver are incorrect."
> > 
> > The only thing seems suspicious to me is that the nvme emulation in qemu
> > does not have WWN/EUI-64 set for the devices, though I have no idea at
> > all whether that is mandatory:

These are not mandatory. They were only introduced in the 1.1 and 1.2 versions
of the NVMe spec, though we only cared to emulate the 1.0 portions rather than
provide a full featured NVMe controller.

That said, there needs to be care in the host OS to provide an appropriate
translation IF it is using a SCSI stack to talk to NVMe. Linux doesn't care,
but Windows does.
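
As a rough illustration of what "not set" means here, the following is a
minimal sketch, not code from QEMU or from any driver. It checks whether an
Identify Namespace buffer carries either identifier. The helper name
ns_has_unique_id is hypothetical; the byte offsets are the NGUID (bytes
104-119) and EUI64 (bytes 120-127) fields of the Identify Namespace data
structure. A 1.0-only controller leaves both fields zeroed, so a SCSI
translation layer has to derive its identifiers from something else.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets within the 4096-byte Identify Namespace data structure. */
#define ID_NS_NGUID_OFFSET 104 /* 16 bytes, introduced in NVMe 1.2 */
#define ID_NS_NGUID_LEN    16
#define ID_NS_EUI64_OFFSET 120 /* 8 bytes, introduced in NVMe 1.1 */
#define ID_NS_EUI64_LEN    8

/* Return true if every byte in [p, p + len) is zero. */
static bool all_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical helper: report whether the namespace advertises an NGUID
 * or an EUI64. Both fields are all zeroes when the controller does not
 * implement them.
 */
bool ns_has_unique_id(const uint8_t id_ns[4096])
{
    return !all_zero(id_ns + ID_NS_NGUID_OFFSET, ID_NS_NGUID_LEN) ||
           !all_zero(id_ns + ID_NS_EUI64_OFFSET, ID_NS_EUI64_LEN);
}
```

With a zero-filled buffer the helper returns false, which is the situation
described above for the emulated controller.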

> > "C:\Windows\system32>sg_vpd -i PD1
> > Device Identification VPD page:
> >   Addressed logical unit:
> > designator type: SCSI name string,  code set: UTF-8
> >   SCSI name string:
> >   8086QEMU NVMe Ctrl  00012BDAC262CF831698

The above looks reasonable for your second controller that had serial
2BDAC262CF831698.

> > C:\Windows\system32>sg_vpd -p sn PD1
> > Unit serial number VPD page:
> >   Unit serial number: ___."

This doesn't look like a very good SCSI-NVMe translation, and it is possibly
suspicious. But I don't know the first thing about Windows; does it care about
unique unit serial numbers in order to surface a "SCSI" disk?

[Qemu-devel] [PATCH 51/52] target-m68k: add cmpm

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 21 +
 1 file changed, 21 insertions(+)
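
For readers unfamiliar with the encoding, here is a minimal standalone
sketch, not part of the patch, of how the opcode/mask pair b108/f138
registered below selects CMPM (Ay)+,(Ax)+ and where the fields live. The
struct and the function name decode_cmpm are hypothetical. The sketch
assumes the standard M68000 layout: Ax in bits 11-9, size in bits 7-6, Ay
in bits 2-0. A size field of 3 is rejected because that pattern encodes
CMPA.L.

```c
#include <stdbool.h>
#include <stdint.h>

struct cmpm_fields {
    unsigned ay;         /* source address register, post-incremented */
    unsigned ax;         /* destination address register, post-incremented */
    unsigned size_bytes; /* 1, 2 or 4 */
};

/* Hypothetical helper: decode CMPM (Ay)+,(Ax)+ from a 16-bit opcode word. */
bool decode_cmpm(uint16_t insn, struct cmpm_fields *f)
{
    unsigned size;

    /* Same test the instruction table performs: mask f138, opcode b108. */
    if ((insn & 0xf138) != 0xb108) {
        return false;
    }
    size = (insn >> 6) & 3;
    if (size == 3) {
        /* Bits 8-6 == 111 is CMPA.L, not CMPM. */
        return false;
    }
    f->ay = insn & 7;
    f->ax = (insn >> 9) & 7;
    f->size_bytes = 1u << size;
    return true;
}
```

For example, 0xb308 decodes as cmpm.b (a0)+,(a1)+ and 0xb74a as
cmpm.w (a2)+,(a3)+.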

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 029c166..2d92bdd 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2758,6 +2758,26 @@ DISAS_INSN(cmpa)
 gen_update_cc_cmp(s, reg, src, opsize);
 }
 
+DISAS_INSN(cmpm)
+{
+TCGv src;
+TCGv dest;
+TCGv reg;
+int opsize;
+
+opsize = insn_opsize(insn);
+
+reg = AREG(insn, 0);
+src = gen_load(s, opsize, reg, 1);
+tcg_gen_addi_i32(reg, reg, opsize_bytes(opsize));
+
+reg = AREG(insn, 9);
+dest = gen_load(s, opsize, reg, 1);
+tcg_gen_addi_i32(reg, reg, opsize_bytes(opsize));
+
+gen_update_cc_cmp(s, dest, src, opsize);
+}
+
 DISAS_INSN(eor)
 {
 TCGv src;
@@ -4876,6 +4896,7 @@ void register_m68k_insns (CPUM68KState *env)
 INSN(cmpa,  b1c0, f1c0, CF_ISA_A);
 INSN(cmp,   b000, f100, M68000);
 INSN(eor,   b100, f100, M68000);
+INSN(cmpm,  b108, f138, M68000);
 INSN(cmpa,  b0c0, f0c0, M68000);
 INSN(eor,   b180, f1c0, CF_ISA_A);
 BASE(and,   c000, f000);
-- 
2.5.5

[Qemu-devel] [PATCH 52/52] target-m68k: sr/ccr cleanup

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 2d92bdd..4f3e8ca 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2239,29 +2239,29 @@ static void gen_set_sr_im(DisasContext *s, uint16_t val, int ccr_only)
 set_cc_op(s, CC_OP_FLAGS);
 }
 
-static void gen_set_sr(DisasContext *s, TCGv val, int ccr_only)
+static void gen_set_sr(CPUM68KState *env, DisasContext *s, uint16_t insn,
+   int ccr_only)
 {
-if (ccr_only) {
-gen_helper_set_ccr(cpu_env, val);
+if ((insn & 0x38) == 0) {
+if (ccr_only) {
+gen_helper_set_ccr(cpu_env, DREG(insn, 0));
+} else {
+gen_helper_set_sr(cpu_env, DREG(insn, 0));
+}
+set_cc_op(s, CC_OP_FLAGS);
+} else if ((insn & 0x3f) == 0x3c) {
+uint16_t val;
+val = read_im16(env, s);
+gen_set_sr_im(s, val, ccr_only);
 } else {
-gen_helper_set_sr(cpu_env, val);
+disas_undef(env, s, insn);
 }
-set_cc_op(s, CC_OP_FLAGS);
-}
-
-static void gen_move_to_sr(CPUM68KState *env, DisasContext *s, uint16_t insn,
-   int ccr_only)
-{
-TCGv src;
-
-SRC_EA(env, src, OS_WORD, 0, NULL);
-gen_set_sr(s, src, ccr_only);
 }
 
 
 DISAS_INSN(move_to_ccr)
 {
-gen_move_to_sr(env, s, insn, 1);
+gen_set_sr(env, s, insn, 1);
 }
 
 DISAS_INSN(not)
@@ -3901,7 +3901,7 @@ DISAS_INSN(move_to_sr)
 gen_exception(s, s->pc - 2, EXCP_PRIVILEGE);
 return;
 }
-gen_move_to_sr(env, s, insn, 0);
+gen_set_sr(env, s, insn, 0);
 gen_lookup_tb(s);
 }
 
@@ -4652,8 +4652,9 @@ DISAS_INSN(macsr_to_ccr)
 {
 TCGv tmp = tcg_temp_new();
 tcg_gen_andi_i32(tmp, QREG_MACSR, 0xf);
-gen_set_sr(s, tmp, 1);
+gen_helper_set_sr(cpu_env, tmp);
 tcg_temp_free(tmp);
+set_cc_op(s, CC_OP_FLAGS);
 }
 
 DISAS_INSN(to_mac)
-- 
2.5.5

[Qemu-devel] [PATCH 50/52] target-m68k: immediate ops manage word and byte operands

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 57 ++---
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index a22ee67..029c166 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -1842,52 +1842,65 @@ DISAS_INSN(bitop_im)
 DISAS_INSN(arith_im)
 {
 int op;
-uint32_t im;
+TCGv im;
 TCGv src1;
 TCGv dest;
 TCGv addr;
+int opsize;
 
 op = (insn >> 9) & 7;
-SRC_EA(env, src1, OS_LONG, 0, (op == 6) ? NULL : &addr);
-im = read_im32(env, s);
+opsize = insn_opsize(insn);
+switch (opsize) {
+case OS_BYTE:
+im = tcg_const_i32((int8_t)read_im8(env, s));
+break;
+case OS_WORD:
+im = tcg_const_i32((int16_t)read_im16(env, s));
+break;
+case OS_LONG:
+im = tcg_const_i32(read_im32(env, s));
+break;
+default:
+   abort();
+}
+SRC_EA(env, src1, opsize, 1, (op == 6) ? NULL : &addr);
 dest = tcg_temp_new();
 switch (op) {
 case 0: /* ori */
-tcg_gen_ori_i32(dest, src1, im);
-gen_logic_cc(s, dest, OS_LONG);
+tcg_gen_or_i32(dest, src1, im);
+gen_logic_cc(s, dest, opsize);
 break;
 case 1: /* andi */
-tcg_gen_andi_i32(dest, src1, im);
-gen_logic_cc(s, dest, OS_LONG);
+tcg_gen_and_i32(dest, src1, im);
+gen_logic_cc(s, dest, opsize);
 break;
 case 2: /* subi */
-tcg_gen_mov_i32(dest, src1);
-tcg_gen_setcondi_i32(TCG_COND_LTU, QREG_CC_X, dest, im);
-tcg_gen_subi_i32(dest, dest, im);
-gen_update_cc_add(dest, tcg_const_i32(im));
-set_cc_op(s, CC_OP_SUB);
+tcg_gen_setcond_i32(TCG_COND_LTU, QREG_CC_X, src1, im);
+tcg_gen_sub_i32(dest, src1, im);
+gen_update_cc_add(dest, im, opsize);
+set_cc_op(s, CC_OP_SUBB + opsize);
 break;
 case 3: /* addi */
-tcg_gen_mov_i32(dest, src1);
-tcg_gen_addi_i32(dest, dest, im);
-gen_update_cc_add(dest, tcg_const_i32(im));
-tcg_gen_setcondi_i32(TCG_COND_LTU, QREG_CC_X, dest, im);
-set_cc_op(s, CC_OP_ADD);
+tcg_gen_add_i32(dest, src1, im);
+gen_update_cc_add(dest, im, opsize);
+tcg_gen_setcond_i32(TCG_COND_LTU, QREG_CC_X, dest, im);
+set_cc_op(s, CC_OP_ADDB + opsize);
 break;
 case 5: /* eori */
-tcg_gen_xori_i32(dest, src1, im);
-gen_logic_cc(s, dest, OS_LONG);
+tcg_gen_xor_i32(dest, src1, im);
+gen_logic_cc(s, dest, opsize);
 break;
 case 6: /* cmpi */
-gen_update_cc_add(src1, tcg_const_i32(im));
-set_cc_op(s, CC_OP_CMP);
+gen_update_cc_cmp(s, src1, im, opsize);
 break;
 default:
 abort();
 }
+tcg_temp_free(im);
 if (op != 6) {
-DEST_EA(env, insn, OS_LONG, dest, &addr);
+DEST_EA(env, insn, opsize, dest, &addr);
 }
+tcg_temp_free(dest);
 }
 
 DISAS_INSN(cas)
-- 
2.5.5

[Qemu-devel] [PATCH 43/52] target-m68k: or can manage word and byte operands

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index cfe878a..15109ed 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2572,19 +2572,21 @@ DISAS_INSN(or)
 TCGv dest;
 TCGv src;
 TCGv addr;
+int opsize;
 
-reg = DREG(insn, 9);
+opsize = insn_opsize(insn);
+reg = gen_extend(DREG(insn, 9), opsize, 0);
 dest = tcg_temp_new();
 if (insn & 0x100) {
-SRC_EA(env, src, OS_LONG, 0, &addr);
+SRC_EA(env, src, opsize, 0, &addr);
 tcg_gen_or_i32(dest, src, reg);
-DEST_EA(env, insn, OS_LONG, dest, &addr);
+DEST_EA(env, insn, opsize, dest, &addr);
 } else {
-SRC_EA(env, src, OS_LONG, 0, NULL);
+SRC_EA(env, src, opsize, 0, NULL);
 tcg_gen_or_i32(dest, src, reg);
-tcg_gen_mov_i32(reg, dest);
+gen_partset_reg(opsize, DREG(insn, 9), dest);
 }
-gen_logic_cc(s, dest, OS_LONG);
+gen_logic_cc(s, dest, opsize);
 }
 
 DISAS_INSN(suba)
-- 
2.5.5

[Qemu-devel] [PATCH 46/52] target-m68k: introduce byte and word cc_ops

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/cpu.h   |  6 +--
 target-m68k/helper.c| 25 +
 target-m68k/translate.c | 99 ++---
 3 files changed, 80 insertions(+), 50 deletions(-)
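
The following standalone sketch, not part of the patch, shows why the
EXTSIGN step in the helper.c hunk below matters for byte and word
operations. It assumes, as the patch does, that the result and the second
operand are kept sign-extended to 32 bits and that the first operand is
reconstructed as res - src2. The function names are hypothetical; the size
index is 0 for byte, 1 for word, 2 for long, matching
op - CC_OP_ADDB.

```c
#include <stdint.h>

/* Sign-extend val from the operand size selected by index (0/1/2). */
static uint32_t extsign(uint32_t val, int index)
{
    if (index == 0) {
        /* Narrowing cast assumes the usual two's complement behaviour. */
        return (uint32_t)(int32_t)(int8_t)(uint8_t)val;
    }
    if (index == 1) {
        return (uint32_t)(int32_t)(int16_t)(uint16_t)val;
    }
    return val;
}

/* V flag of res = src1 + src2, reconstructing src1 as the patch does. */
int add_overflow(uint32_t res, uint32_t src2, int index)
{
    uint32_t src1 = extsign(res - src2, index);

    return ((res ^ src1) & ~(src1 ^ src2)) >> 31;
}

/* Same formula without the sign extension, for comparison only. */
int add_overflow_noext(uint32_t res, uint32_t src2)
{
    uint32_t src1 = res - src2;

    return ((res ^ src1) & ~(src1 ^ src2)) >> 31;
}
```

For the byte addition 0x7f + 0x01 the sign-extended result is 0xffffff80.
With the extension, src1 is recovered as 0x7f and the overflow is detected;
without it, src1 becomes 0xffffff7f and the overflow is missed.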

diff --git a/target-m68k/cpu.h b/target-m68k/cpu.h
index 5ce77e4..1112502 100644
--- a/target-m68k/cpu.h
+++ b/target-m68k/cpu.h
@@ -147,11 +147,11 @@ typedef enum {
 CC_OP_FLAGS,
 
 /* X in cc_x, C = X, N in cc_n, Z in cc_n, V via cc_n/cc_v.  */
-CC_OP_ADD,
-CC_OP_SUB,
+CC_OP_ADDB, CC_OP_ADDW, CC_OP_ADDL,
+CC_OP_SUBB, CC_OP_SUBW, CC_OP_SUBL,
 
 /* X in cc_x, {N,Z,C,V} via cc_n/cc_v.  */
-CC_OP_CMP,
+CC_OP_CMPB, CC_OP_CMPW, CC_OP_CMPL,
 
 /* X in cc_x, C = 0, V = 0, N in cc_n, Z in cc_n.  */
 CC_OP_LOGIC,
diff --git a/target-m68k/helper.c b/target-m68k/helper.c
index 76dda44..4d346a7 100644
--- a/target-m68k/helper.c
+++ b/target-m68k/helper.c
@@ -544,32 +544,41 @@ void HELPER(mac_set_flags)(CPUM68KState *env, uint32_t acc)
 }
 }
 
+#define EXTSIGN(val, index) ( \
+(index == 0) ? (int8_t)(val) : ((index == 1) ? (int16_t)(val) : (val)) \
+)
 
 #define COMPUTE_CCR(op, x, n, z, v, c) {   \
 switch (op) {  \
 case CC_OP_FLAGS:  \
 /* Everything in place.  */\
 break; \
-case CC_OP_ADD:\
+case CC_OP_ADDB:   \
+case CC_OP_ADDW:   \
+case CC_OP_ADDL:   \
 res = n;   \
 src2 = v;  \
-src1 = res - src2; \
+src1 = EXTSIGN(res - src2, op - CC_OP_ADDB);   \
 c = x; \
 z = n; \
 v = (res ^ src1) & ~(src1 ^ src2); \
 break; \
-case CC_OP_SUB:\
+case CC_OP_SUBB:   \
+case CC_OP_SUBW:   \
+case CC_OP_SUBL:   \
 res = n;   \
 src2 = v;  \
-src1 = res + src2; \
+src1 = EXTSIGN(res + src2, op - CC_OP_SUBB);   \
 c = x; \
 z = n; \
 v = (res ^ src1) & (src1 ^ src2);  \
 break; \
-case CC_OP_CMP:\
+case CC_OP_CMPB:   \
+case CC_OP_CMPW:   \
+case CC_OP_CMPL:   \
 src1 = n;  \
 src2 = v;  \
-res = src1 - src2; \
+res = EXTSIGN(src1 - src2, op - CC_OP_CMPB);   \
 n = res;   \
 z = res;   \
 c = src1 < src2;   \
@@ -590,16 +599,16 @@ uint32_t cpu_m68k_get_ccr(CPUM68KState *env)
 uint32_t res, src1, src2;
 
 x = env->cc_x;
-c = env->cc_c;
 n = env->cc_n;
 z = env->cc_z;
 v = env->cc_v;
+c = env->cc_c;
 
 COMPUTE_CCR(env->cc_op, x, n, z, v, c);
 
 n = n >> 31;
-v = v >> 31;
 z = (z == 0);
+v = v >> 31;
 
 return x*CCF_X + n*CCF_N + z*CCF_Z + v*CCF_V + c*CCF_C;
 }
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index a8e9b64..2b6ba15 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -176,9 +176,9 @@ typedef void (*disas_proc)(CPUM68KState *env, DisasContext *s, uint16_t insn);
 
 static const uint8_t cc_op_live[CC_OP_NB] = {
 [CC_OP_

[Qemu-devel] [PATCH 48/52] target-m68k: add/sub manage word and byte operands

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 71 ++---
 1 file changed, 38 insertions(+), 33 deletions(-)
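
A small standalone sketch, not part of the patch, of the quick-immediate
decoding used by addsubq below: the 3-bit field in bits 11-9 encodes the
values 1..8, with 0 standing for 8. The function names are hypothetical.
The second variant is the branch-free form of the same mapping.

```c
#include <stdint.h>

/* Quick immediate as decoded in addsubq: a field value of 0 means 8. */
int quick_imm(uint16_t insn)
{
    int imm = (insn >> 9) & 7;

    return imm == 0 ? 8 : imm;
}

/* Branch-free equivalent: maps 0 -> 8 and leaves 1..7 unchanged. */
int quick_imm_alt(uint16_t insn)
{
    int count = (insn >> 9) & 7;

    return ((count - 1) & 7) + 1;
}
```

For example, 0x5080 (addq.l #8,d0) yields 8 and 0x5241 (addq.w #1,d1)
yields 1 with either function.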

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index bd7394f..f880a2a 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -1620,35 +1620,37 @@ DISAS_INSN(addsub)
 TCGv tmp;
 TCGv addr;
 int add;
+int opsize;
 
 add = (insn & 0x4000) != 0;
-reg = DREG(insn, 9);
+opsize = insn_opsize(insn);
+reg = gen_extend(DREG(insn, 9), opsize, 1);
 dest = tcg_temp_new();
 if (insn & 0x100) {
-SRC_EA(env, tmp, OS_LONG, 0, &addr);
+SRC_EA(env, tmp, opsize, 1, &addr);
 src = reg;
 } else {
 tmp = reg;
-SRC_EA(env, src, OS_LONG, 0, NULL);
+SRC_EA(env, src, opsize, 1, NULL);
 }
 if (add) {
 tcg_gen_add_i32(dest, tmp, src);
 tcg_gen_setcond_i32(TCG_COND_LTU, QREG_CC_X, dest, src);
-set_cc_op(s, CC_OP_ADD);
+set_cc_op(s, CC_OP_ADDB + opsize);
 } else {
 tcg_gen_setcond_i32(TCG_COND_LTU, QREG_CC_X, tmp, src);
 tcg_gen_sub_i32(dest, tmp, src);
-set_cc_op(s, CC_OP_SUB);
+set_cc_op(s, CC_OP_SUBB + opsize);
 }
-gen_update_cc_add(dest, src);
+gen_update_cc_add(dest, src, opsize);
 if (insn & 0x100) {
-DEST_EA(env, insn, OS_LONG, dest, &addr);
+DEST_EA(env, insn, opsize, dest, &addr);
 } else {
-tcg_gen_mov_i32(reg, dest);
+gen_partset_reg(opsize, DREG(insn, 9), dest);
 }
+tcg_temp_free(dest);
 }
 
-
 /* Reverse the order of the bits in REG.  */
 DISAS_INSN(bitrev)
 {
@@ -2482,40 +2484,46 @@ DISAS_INSN(jump)
 
 DISAS_INSN(addsubq)
 {
-TCGv src1;
-TCGv src2;
+TCGv src;
 TCGv dest;
-int val;
+TCGv val;
+int imm;
 TCGv addr;
+int opsize;
 
-SRC_EA(env, src1, OS_LONG, 0, &addr);
-val = (insn >> 9) & 7;
-if (val == 0)
-val = 8;
+if ((insn & 070) == 010) {
+/* Operation on address register is always long.  */
+opsize = OS_LONG;
+} else
+opsize = insn_opsize(insn);
+SRC_EA(env, src, opsize, 1, &addr);
+imm = (insn >> 9) & 7;
+if (imm == 0)
+imm = 8;
+val = tcg_const_i32(imm);
 dest = tcg_temp_new();
-tcg_gen_mov_i32(dest, src1);
+tcg_gen_mov_i32(dest, src);
 if ((insn & 0x38) == 0x08) {
 /* Don't update condition codes if the destination is an
address register.  */
 if (insn & 0x0100) {
-tcg_gen_subi_i32(dest, dest, val);
+tcg_gen_sub_i32(dest, dest, val);
 } else {
-tcg_gen_addi_i32(dest, dest, val);
+tcg_gen_add_i32(dest, dest, val);
 }
 } else {
-src2 = tcg_const_i32(val);
 if (insn & 0x0100) {
-tcg_gen_setcond_i32(TCG_COND_LTU, QREG_CC_X, dest, src2);
-tcg_gen_sub_i32(dest, dest, src2);
-set_cc_op(s, CC_OP_SUB);
+tcg_gen_setcond_i32(TCG_COND_LTU, QREG_CC_X, dest, val);
+tcg_gen_sub_i32(dest, dest, val);
+set_cc_op(s, CC_OP_SUBB + opsize);
 } else {
-tcg_gen_add_i32(dest, dest, src2);
-tcg_gen_setcond_i32(TCG_COND_LTU, QREG_CC_X, dest, src2);
-set_cc_op(s, CC_OP_ADD);
+tcg_gen_add_i32(dest, dest, val);
+tcg_gen_setcond_i32(TCG_COND_LTU, QREG_CC_X, dest, val);
+set_cc_op(s, CC_OP_ADDB + opsize);
 }
-gen_update_cc_add(dest, src2);
+gen_update_cc_add(dest, val, opsize);
 }
-DEST_EA(env, insn, OS_LONG, dest, &addr);
+DEST_EA(env, insn, opsize, dest, &addr);
 }
 
 DISAS_INSN(tpf)
@@ -4804,16 +4812,13 @@ void register_m68k_insns (CPUM68KState *env)
 BASE(rts,   4e75, );
 INSN(movec, 4e7b, , CF_ISA_A);
 BASE(jump,  4e80, ffc0);
-INSN(jump,  4ec0, ffc0, CF_ISA_A);
-INSN(addsubq,   5180, f1c0, CF_ISA_A);
-INSN(jump,  4ec0, ffc0, M68000);
+BASE(jump,  4ec0, ffc0);
 INSN(addsubq,   5000, f080, M68000);
-INSN(addsubq,   5080, f0c0, M68000);
+BASE(addsubq,   5080, f0c0);
 INSN(scc,   50c0, f0f8, CF_ISA_A);
 INSN(scc_mem,   50c0, f0c0, M68000);
 INSN(scc,   50c0, f0f8, M68000);
 INSN(dbcc,  50c8, f0f8, M68000);
-INSN(addsubq,   5080, f1c0, CF_ISA_A);
 INSN(tpf,   51f8, fff8, CF_ISA_A);
 
 /* Branch instructions.  */
-- 
2.5.5

[Qemu-devel] [PATCH 44/52] target-m68k: and can manage word and byte operands

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 15109ed..9fed334 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2775,19 +2775,23 @@ DISAS_INSN(and)
 TCGv reg;
 TCGv dest;
 TCGv addr;
+int opsize;
 
-reg = DREG(insn, 9);
 dest = tcg_temp_new();
+
+opsize = insn_opsize(insn);
+reg = DREG(insn, 9);
 if (insn & 0x100) {
-SRC_EA(env, src, OS_LONG, 0, &addr);
+SRC_EA(env, src, opsize, 0, &addr);
 tcg_gen_and_i32(dest, src, reg);
-DEST_EA(env, insn, OS_LONG, dest, &addr);
+DEST_EA(env, insn, opsize, dest, &addr);
 } else {
-SRC_EA(env, src, OS_LONG, 0, NULL);
+SRC_EA(env, src, opsize, 0, NULL);
 tcg_gen_and_i32(dest, src, reg);
-tcg_gen_mov_i32(reg, dest);
+gen_partset_reg(opsize, reg, dest);
 }
-gen_logic_cc(s, dest, OS_LONG);
+gen_logic_cc(s, dest, opsize);
+tcg_temp_free(dest);
 }
 
 DISAS_INSN(adda)
-- 
2.5.5

[Qemu-devel] [PATCH 47/52] target-m68k: add addressing modes to neg

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 2b6ba15..bd7394f 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2194,16 +2194,20 @@ DISAS_INSN(move_from_ccr)
 
 DISAS_INSN(neg)
 {
-TCGv reg;
 TCGv src1;
+TCGv dest;
+TCGv addr;
+int opsize;
 
-reg = DREG(insn, 0);
-src1 = tcg_temp_new();
-tcg_gen_mov_i32(src1, reg);
-tcg_gen_neg_i32(reg, src1);
-gen_update_cc_add(reg, src1);
-tcg_gen_setcondi_i32(TCG_COND_NE, QREG_CC_X, src1, 0);
-set_cc_op(s, CC_OP_SUB);
+opsize = insn_opsize(insn);
+SRC_EA(env, src1, opsize, 1, &addr);
+dest = tcg_temp_new();
+tcg_gen_neg_i32(dest, src1);
+set_cc_op(s, CC_OP_SUBB + opsize);
+gen_update_cc_add(dest, src1, opsize);
+tcg_gen_setcondi_i32(TCG_COND_NE, QREG_CC_X, dest, 0);
+DEST_EA(env, insn, opsize, dest, &addr);
+tcg_temp_free(dest);
 }
 
 static void gen_set_sr_im(DisasContext *s, uint16_t val, int ccr_only)
-- 
2.5.5

[Qemu-devel] [PATCH 49/52] target-m68k: cmp manages word and bytes operands

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index f880a2a..a22ee67 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2724,10 +2724,9 @@ DISAS_INSN(cmp)
 int opsize;
 
 opsize = insn_opsize(insn);
-SRC_EA(env, src, opsize, -1, NULL);
-reg = DREG(insn, 9);
-gen_update_cc_add(reg, src);
-set_cc_op(s, CC_OP_CMP);
+SRC_EA(env, src, opsize, 1, NULL);
+reg = gen_extend(DREG(insn, 9), opsize, 1);
+gen_update_cc_cmp(s, reg, src, opsize);
 }
 
 DISAS_INSN(cmpa)
@@ -2743,8 +2742,7 @@ DISAS_INSN(cmpa)
 }
 SRC_EA(env, src, opsize, 1, NULL);
 reg = AREG(insn, 9);
-gen_update_cc_add(reg, src);
-set_cc_op(s, CC_OP_CMP);
+gen_update_cc_cmp(s, reg, src, opsize);
 }
 
 DISAS_INSN(eor)
-- 
2.5.5

[Qemu-devel] [PATCH 42/52] target-m68k: eor can manage word and byte operands

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 1c3c9a2..cfe878a 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2715,16 +2715,17 @@ DISAS_INSN(cmpa)
 DISAS_INSN(eor)
 {
 TCGv src;
-TCGv reg;
 TCGv dest;
 TCGv addr;
+int opsize;
 
-SRC_EA(env, src, OS_LONG, 0, &addr);
-reg = DREG(insn, 9);
+opsize = insn_opsize(insn);
+
+SRC_EA(env, src, opsize, 0, &addr);
 dest = tcg_temp_new();
-tcg_gen_xor_i32(dest, src, reg);
-gen_logic_cc(s, dest, OS_LONG);
-DEST_EA(env, insn, OS_LONG, dest, &addr);
+tcg_gen_xor_i32(dest, src, DREG(insn, 9));
+gen_logic_cc(s, dest, opsize);
+DEST_EA(env, insn, opsize, dest, &addr);
 }
 
 DISAS_INSN(exg)
-- 
2.5.5

[Qemu-devel] [PATCH 45/52] target-m68k: suba/adda can manage word operand

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
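
A minimal standalone sketch, not part of the patch, of the operand handling
this change introduces for adda: bit 8 of the opcode selects a long
operand, otherwise the source is a word that is sign-extended to 32 bits
before the addition (suba is symmetric). The function name adda_model is
hypothetical.

```c
#include <stdint.h>

/*
 * Model of ADDA: when bit 8 of the opcode is clear the source is a word
 * and is sign-extended to 32 bits; when set the full long is used.
 */
uint32_t adda_model(uint32_t an, uint32_t src, uint16_t insn)
{
    if (!(insn & 0x100)) {
        /* Narrowing cast assumes the usual two's complement behaviour. */
        src = (uint32_t)(int32_t)(int16_t)(uint16_t)src;
    }
    return an + src;
}
```

With an = 0x1000 and src = 0xffff, the word form (opcode 0xd0fc) gives
0x0fff, while the long form (opcode 0xd1fc) gives 0x10fff.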

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 9fed334..a8e9b64 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2594,7 +2594,7 @@ DISAS_INSN(suba)
 TCGv src;
 TCGv reg;
 
-SRC_EA(env, src, OS_LONG, 0, NULL);
+SRC_EA(env, src, (insn & 0x100) ? OS_LONG : OS_WORD, 1, NULL);
 reg = AREG(insn, 9);
 tcg_gen_sub_i32(reg, reg, src);
 }
@@ -2799,7 +2799,7 @@ DISAS_INSN(adda)
 TCGv src;
 TCGv reg;
 
-SRC_EA(env, src, OS_LONG, 0, NULL);
+SRC_EA(env, src, (insn & 0x100) ? OS_LONG : OS_WORD, 1, NULL);
 reg = AREG(insn, 9);
 tcg_gen_add_i32(reg, reg, src);
 }
@@ -4812,6 +4812,7 @@ void register_m68k_insns (CPUM68KState *env)
 INSN(subx_reg,  9100, f138, M68000);
 INSN(subx_mem,  9108, f138, M68000);
 INSN(suba,  91c0, f1c0, CF_ISA_A);
+INSN(suba,  90c0, f0c0, M68000);
 
 BASE(undef_mac, a000, f000);
 INSN(mac,   a000, f100, CF_EMAC);
-- 
2.5.5

[Qemu-devel] [PATCH 41/52] target-m68k: add addressing modes to not

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index df5ce94..1c3c9a2 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2226,11 +2226,17 @@ DISAS_INSN(move_to_ccr)
 
 DISAS_INSN(not)
 {
-TCGv reg;
+TCGv src1;
+TCGv dest;
+TCGv addr;
+int opsize;
 
-reg = DREG(insn, 0);
-tcg_gen_not_i32(reg, reg);
-gen_logic_cc(s, reg, OS_LONG);
+opsize = insn_opsize(insn);
+SRC_EA(env, src1, opsize, 1, &addr);
+dest = tcg_temp_new();
+tcg_gen_not_i32(dest, src1);
+DEST_EA(env, insn, opsize, dest, &addr);
+gen_logic_cc(s, dest, opsize);
 }
 
 DISAS_INSN(swap)
-- 
2.5.5

[Qemu-devel] [PATCH 40/52] target-m68k: add exg ops

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 53c3c41..df5ce94 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2721,6 +2721,45 @@ DISAS_INSN(eor)
 DEST_EA(env, insn, OS_LONG, dest, &addr);
 }
 
+DISAS_INSN(exg)
+{
+TCGv src;
+TCGv reg;
+TCGv dest;
+int exg_mode;
+
+exg_mode = insn & 0x1f8;
+
+dest = tcg_temp_new();
+switch (exg_mode) {
+case 0x140:
+/* exchange Dx and Dy */
+src = DREG(insn, 9);
+reg = DREG(insn, 0);
+tcg_gen_mov_i32(dest, src);
+tcg_gen_mov_i32(src, reg);
+tcg_gen_mov_i32(reg, dest);
+break;
+case 0x148:
+/* exchange Ax and Ay */
+src = AREG(insn, 9);
+reg = AREG(insn, 0);
+tcg_gen_mov_i32(dest, src);
+tcg_gen_mov_i32(src, reg);
+tcg_gen_mov_i32(reg, dest);
+break;
+case 0x188:
+/* exchange Dx and Ay */
+src = DREG(insn, 9);
+reg = AREG(insn, 0);
+tcg_gen_mov_i32(dest, src);
+tcg_gen_mov_i32(src, reg);
+tcg_gen_mov_i32(reg, dest);
+break;
+}
+tcg_temp_free(dest);
+}
+
 DISAS_INSN(and)
 {
 TCGv src;
@@ -4785,6 +4824,12 @@ void register_m68k_insns (CPUM68KState *env)
 INSN(cmpa,  b0c0, f0c0, M68000);
 INSN(eor,   b180, f1c0, CF_ISA_A);
 BASE(and,   c000, f000);
+INSN(undef, c140, f1f8, CF_ISA_A);
+INSN(exg,   c140, f1f8, M68000);
+INSN(undef, c148, f1f8, CF_ISA_A);
+INSN(exg,   c148, f1f8, M68000);
+INSN(undef, c188, f1f8, CF_ISA_A);
+INSN(exg,   c188, f1f8, M68000);
 BASE(mulw,  c0c0, f0c0);
 INSN(abcd_reg,  c100, f1f8, M68000);
 INSN(abcd_mem,  c108, f1f8, M68000);
-- 
2.5.5

Re: [Qemu-devel] [RFC PATCH v3 2/3] VFIO driver for vGPU device

2016-05-04 Thread Neo Jia

On Wed, May 04, 2016 at 11:06:19AM -0600, Alex Williamson wrote:
> On Wed, 4 May 2016 03:23:13 +
> "Tian, Kevin"  wrote:
> 
> > > From: Alex Williamson [mailto:alex.william...@redhat.com]
> > > Sent: Wednesday, May 04, 2016 6:43 AM  
> > > > +
> > > > +   if (gpu_dev->ops->write) {
> > > > +   ret = gpu_dev->ops->write(vgpu_dev,
> > > > + user_data,
> > > > + count,
> > > > + 
> > > > vgpu_emul_space_config,
> > > > + pos);
> > > > +   }
> > > > +
> > > > > +   memcpy((void *)(vdev->vconfig + pos), (void *)user_data, count);
> > > 
> > > So write is expected to user_data to allow only the writable bits to be
> > > changed?  What's really being saved in the vconfig here vs the vendor
> > > vgpu driver?  It seems like we're only using it to cache the BAR
> > > values, but we're not providing the BAR emulation here, which seems
> > > like one of the few things we could provide so it's not duplicated in
> > > every vendor driver.  But then we only need a few u32s to do that, not
> > > all of config space.  
> > 
> > We can borrow same vconfig emulation from existing vfio-pci driver.
> > But doing so doesn't mean that vendor vgpu driver cannot have its
> > own vconfig emulation further. vGPU is not like a real device, since
> > there may be no physical config space implemented for each vGPU.
> > So anyway vendor vGPU driver needs to create/emulate the virtualized 
> > config space while the way how is created might be vendor specific. 
> > So better to keep the interface to access raw vconfig space from
> > vendor vGPU driver.
> 
> I'm hoping config space will be very simple for a vgpu, so I don't know
> that it makes sense to add that complexity early on.  Neo/Kirti, what
> capabilities do you expect to provide?  Who provides the MSI
> capability?  Is a PCIe capability provided?  Others?

Currently only standard PCI caps.

MSI cap is emulated by the vendor drivers via the above interface.

No PCIe caps so far.

>  
> > > > +static ssize_t vgpu_dev_rw(void *device_data, char __user *buf,
> > > > +   size_t count, loff_t *ppos, bool iswrite)
> > > > +{
> > > > +   unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> > > > +   struct vfio_vgpu_device *vdev = device_data;
> > > > +
> > > > +   if (index >= VFIO_PCI_NUM_REGIONS)
> > > > +   return -EINVAL;
> > > > +
> > > > +   switch (index) {
> > > > +   case VFIO_PCI_CONFIG_REGION_INDEX:
> > > > > +   return vgpu_dev_config_rw(vdev, buf, count, ppos, iswrite);
> > > > +
> > > > +   case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
> > > > +   return vgpu_dev_bar_rw(vdev, buf, count, ppos, iswrite);
> > > > +
> > > > +   case VFIO_PCI_ROM_REGION_INDEX:
> > > > +   case VFIO_PCI_VGA_REGION_INDEX:  
> > > 
> > > Wait a sec, who's doing the VGA emulation?  We can't be claiming to
> > > support a VGA region and then fail to provide read/write access to it
> > > like we said it has.  
> > 
> > For Intel side we plan to not support VGA region when upstreaming our
> > KVMGT work, which means Intel vGPU will be exposed only as a 
> > secondary graphics card then so legacy VGA is not required. Also no
> > VBIOS/ROM requirement. Guess we can remove above two regions.
> 
> So this needs to be optional based on what the mediation driver
> provides.  It seems like we're just making passthroughs for the vendor
> mediation driver to speak vfio.
> 
> > > > +
> > > > > +static int vgpu_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > > > +{
> > > > +   int ret = 0;
> > > > +   struct vfio_vgpu_device *vdev = vma->vm_private_data;
> > > > +   struct vgpu_device *vgpu_dev;
> > > > +   struct gpu_device *gpu_dev;
> > > > +   u64 virtaddr = (u64)vmf->virtual_address;
> > > > +   u64 offset, phyaddr;
> > > > +   unsigned long req_size, pgoff;
> > > > +   pgprot_t pg_prot;
> > > > +
> > > > +   if (!vdev && !vdev->vgpu_dev)
> > > > +   return -EINVAL;
> > > > +
> > > > +   vgpu_dev = vdev->vgpu_dev;
> > > > +   gpu_dev  = vgpu_dev->gpu_dev;
> > > > +
> > > > +   offset   = vma->vm_pgoff << PAGE_SHIFT;
> > > > +   phyaddr  = virtaddr - vma->vm_start + offset;
> > > > +   pgoff= phyaddr >> PAGE_SHIFT;
> > > > +   req_size = vma->vm_end - virtaddr;
> > > > +   pg_prot  = vma->vm_page_prot;
> > > > +
> > > > +   if (gpu_dev->ops->validate_map_request) {
> > > > > +   ret = gpu_dev->ops->validate_map_request(vgpu_dev, virtaddr, &pgoff,
> > > > > +&req_size, &pg_prot);
> > > > +   if (ret)
> > > > +   return ret;

[Qemu-devel] [PATCH 38/52] target-m68k: add linkl

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)
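
The sequence that gen_link emits below can be modelled in plain C. This is
a minimal sketch, not part of the patch: the register struct, the flat
big-endian memory array and the names store_long and link_model are
assumptions made for illustration. The only difference between link and
linkl is whether the displacement comes from a 16-bit or a 32-bit
immediate.

```c
#include <stdint.h>

struct m68k_regs {
    uint32_t a[8]; /* a[7] is the stack pointer */
};

/* Store a 32-bit value big-endian into a flat memory array. */
static void store_long(uint8_t *mem, uint32_t addr, uint32_t val)
{
    mem[addr]     = (uint8_t)(val >> 24);
    mem[addr + 1] = (uint8_t)(val >> 16);
    mem[addr + 2] = (uint8_t)(val >> 8);
    mem[addr + 3] = (uint8_t)val;
}

/* Model of LINK An,#offset following the order of operations in gen_link. */
void link_model(struct m68k_regs *r, uint8_t *mem, unsigned an, int32_t offset)
{
    uint32_t tmp = r->a[7] - 4;      /* reserve a slot on the stack */

    store_long(mem, tmp, r->a[an]);  /* push the old frame pointer */
    if (an != 7) {
        r->a[an] = tmp;              /* An becomes the new frame pointer */
    }
    r->a[7] = tmp + (uint32_t)offset; /* allocate the frame (offset < 0) */
}
```

Starting with SP = 32 and A6 = 0x11223344, link_model(&r, mem, 6, -8)
leaves A6 = 28 and SP = 20, with the old A6 stored at address 28.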

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 80033fc..9a38235 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2351,21 +2351,36 @@ DISAS_INSN(mull)
 }
 }
 
-DISAS_INSN(link)
+static void gen_link(DisasContext *s, uint16_t insn, int32_t offset)
 {
-int16_t offset;
 TCGv reg;
 TCGv tmp;
 
-offset = cpu_ldsw_code(env, s->pc);
-s->pc += 2;
 reg = AREG(insn, 0);
 tmp = tcg_temp_new();
 tcg_gen_subi_i32(tmp, QREG_SP, 4);
 gen_store(s, OS_LONG, tmp, reg);
-if ((insn & 7) != 7)
+if ((insn & 7) != 7) {
 tcg_gen_mov_i32(reg, tmp);
+}
 tcg_gen_addi_i32(QREG_SP, tmp, offset);
+tcg_temp_free(tmp);
+}
+
+DISAS_INSN(link)
+{
+int16_t offset;
+
+offset = read_im16(env, s);
+gen_link(s, insn, offset);
+}
+
+DISAS_INSN(linkl)
+{
+int32_t offset;
+
+offset = read_im32(env, s);
+gen_link(s, insn, offset);
 }
 
 DISAS_INSN(unlk)
@@ -4661,6 +4676,7 @@ void register_m68k_insns (CPUM68KState *env)
 INSN(undef, 46c0, ffc0, M68000);
 INSN(move_to_sr, 46c0, ffc0, CF_ISA_A);
 INSN(nbcd,  4800, ffc0, M68000);
+INSN(linkl, 4808, fff8, M68000);
 BASE(pea,   4840, ffc0);
 BASE(swap,  4840, fff8);
 INSN(bkpt,  4848, fff8, M68000);
-- 
2.5.5

[Qemu-devel] [PATCH 36/52] target-m68k: inline shift ops

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 211 
 1 file changed, 176 insertions(+), 35 deletions(-)
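
The M68000 rule quoted in the comments below (V is set if the most
significant bit changes at any time during an arithmetic left shift) can be
checked with a small reference model. This is a standalone sketch, not part
of the patch and not the TCG sequence it generates; the function name
asl_overflow is hypothetical. It relies on the observation that the MSB
never changes exactly when the top count + 1 bits of the operand are all
equal.

```c
#include <stdint.h>

/*
 * Reference model of the M68000 V flag for ASL.
 * bits is the operand width (8, 16 or 32), count the shift amount.
 */
int asl_overflow(uint32_t val, int bits, int count)
{
    uint64_t v = val;
    uint64_t top;
    uint64_t ones;

    if (bits < 32) {
        v &= (1ull << bits) - 1; /* keep only the operand bits */
    }
    if (count >= bits) {
        /* Every bit passes through the MSB: V is set unless val is 0. */
        return v != 0;
    }
    top = v >> (bits - 1 - count);  /* the top count + 1 bits */
    ones = (1ull << (count + 1)) - 1;
    return top != 0 && top != ones; /* mixed bits: the MSB changed */
}
```

For example, shifting the byte 0x40 left by one changes the sign bit, so V
is set, while shifting 0xc0 left by one keeps it and V stays clear.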

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index d183a3c..d48ab66 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -551,7 +551,7 @@ static inline void gen_ext(TCGv res, TCGv val, int opsize, int sign)
 }
 }
 
-static TCGv gen_extend(TCGv val, int opsize, int sign)
+static inline TCGv gen_extend(TCGv val, int opsize, int sign)
 {
 TCGv tmp;
 
@@ -2615,19 +2615,51 @@ DISAS_INSN(addx_mem)
 gen_store(s, opsize, addr_dest, QREG_CC_N);
 }
 
-DISAS_INSN(shift_im)
+static inline void shift_im(DisasContext *s, uint16_t insn, int opsize)
 {
-TCGv reg = DREG(insn, 0);
 int count = (insn >> 9) & 7;
 int logical = insn & 8;
+int left = insn & 0x100;
+int bits = opsize_bytes(opsize) * 8;
+TCGv reg = gen_extend(DREG(insn, 0), opsize, !logical);
+TCGv zero;
 
-if (count == 0) {
-count = 8;
-}
+count = ((count - 1) & 0x7) + 1; /* 1..8 */
 
-if (insn & 0x100) {
-tcg_gen_shri_i32(QREG_CC_C, reg, 31 - count);
+zero = tcg_const_i32(0);
+if (left) {
+tcg_gen_shri_i32(QREG_CC_C, reg, bits - count);
 tcg_gen_shli_i32(QREG_CC_N, reg, count);
+
+/* Note that ColdFire always clears V,
+   while M68000 sets if the most significant bit is changed at
+   any time during the shift operation */
+tcg_gen_mov_i32(QREG_CC_V, zero);
+if (!logical && m68k_feature(s->env, M68K_FEATURE_M68000)) {
+/* if shift count >= bits, V is (reg != 0) */
+if (count >= bits) {
+tcg_gen_setcond_i32(TCG_COND_EQ, QREG_CC_V, reg, zero);
+/* adjust V: (1,0) -> (0,-1) */
+tcg_gen_subi_i32(QREG_CC_V, QREG_CC_V, 1);
+} else {
+TCGv t0 = tcg_temp_new();
+TCGv t1 = tcg_const_i32(bits - 1 - count);
+
+tcg_gen_shr_i32(QREG_CC_V, reg, t1);
+tcg_gen_sar_i32(t0, reg, t1);
+tcg_temp_free(t1);
+tcg_gen_not_i32(t0, t0);
+
+tcg_gen_setcond_i32(TCG_COND_EQ, QREG_CC_V, QREG_CC_V, zero);
+tcg_gen_setcond_i32(TCG_COND_EQ, t0, t0, zero);
+tcg_gen_or_i32(QREG_CC_V, QREG_CC_V, t0); /* V is !V here */
+
+tcg_temp_free(t0);
+
+/* adjust V: (1,0) -> (0,-1) */
+tcg_gen_subi_i32(QREG_CC_V, QREG_CC_V, 1);
+}
+}
 } else {
 tcg_gen_shri_i32(QREG_CC_C, reg, count - 1);
 if (logical) {
@@ -2635,30 +2667,28 @@ DISAS_INSN(shift_im)
 } else {
 tcg_gen_sari_i32(QREG_CC_N, reg, count);
 }
+tcg_gen_mov_i32(QREG_CC_V, zero);
 }
+
+gen_ext(QREG_CC_N, QREG_CC_N, opsize, 1);
 tcg_gen_andi_i32(QREG_CC_C, QREG_CC_C, 1);
 tcg_gen_mov_i32(QREG_CC_Z, QREG_CC_N);
 tcg_gen_mov_i32(QREG_CC_X, QREG_CC_C);
 
-/* Note that ColdFire always clears V, while M68000 sets it for
-   a change in the sign bit.  */
-if (!logical && m68k_feature(s->env, M68K_FEATURE_M68000)) {
-tcg_gen_xor_i32(QREG_CC_V, QREG_CC_N, reg);
-} else {
-tcg_gen_movi_i32(QREG_CC_V, 0);
-}
-
-tcg_gen_mov_i32(reg, QREG_CC_N);
+gen_partset_reg(opsize, DREG(insn, 0), QREG_CC_N);
 set_cc_op(s, CC_OP_FLAGS);
 }
 
-DISAS_INSN(shift_reg)
+static inline void shift_reg(DisasContext *s, uint16_t insn, int opsize)
 {
-TCGv reg, s32;
-TCGv_i64 t64, s64;
 int logical = insn & 8;
+int left = insn & 0x100;
+int bits = opsize_bytes(opsize) * 8;
+TCGv reg = gen_extend(DREG(insn, 0), opsize, !logical);
+TCGv s32;
+TCGv_i64 t64, s64;
+TCGv zero;
 
-reg = DREG(insn, 0);
 t64 = tcg_temp_new_i64();
 s64 = tcg_temp_new_i64();
 s32 = tcg_temp_new();
@@ -2669,44 +2699,148 @@ DISAS_INSN(shift_reg)
 tcg_gen_andi_i32(s32, DREG(insn, 9), 63);
 tcg_gen_extu_i32_i64(s64, s32);
 
-/* Non-arithmetic shift clears V.  Use it as a source zero here.  */
-tcg_gen_movi_i32(QREG_CC_V, 0);
+zero = tcg_const_i32(0);
 
-if (insn & 0x100) {
-tcg_gen_extu_i32_i64(t64, reg);
+tcg_gen_extu_i32_i64(t64, reg);
+if (left) {
+tcg_gen_shli_i64(t64, t64, 32 - bits);
 tcg_gen_shl_i64(t64, t64, s64);
 tcg_temp_free_i64(s64);
 tcg_gen_extr_i64_i32(QREG_CC_N, QREG_CC_C, t64);
 tcg_temp_free_i64(t64);
+tcg_gen_sari_i32(QREG_CC_N, QREG_CC_N, 32 - bits);
 tcg_gen_andi_i32(QREG_CC_C, QREG_CC_C, 1);
+
+/* Note that ColdFire always clears V,
+   while M68000 sets if the most significant bit is changed at
+   any time during the shift operation */
+tcg_gen_mov_i32(QREG_CC_V, zero);
+if (!logical && m68k_feature(s->env, M68K_FEATURE_M68000)) {
+
+TCGv t1 = tcg_co

[Qemu-devel] [PATCH 39/52] target-m68k: movem

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 51 ++---
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 9a38235..53c3c41 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -1714,6 +1714,8 @@ DISAS_INSN(movem)
 TCGv reg;
 TCGv tmp;
 int is_load;
+int opsize;
+int32_t incr;
 
 mask = read_im16(env, s);
 tmp = gen_lea(env, s, insn, OS_LONG);
@@ -1724,21 +1726,40 @@ DISAS_INSN(movem)
 addr = tcg_temp_new();
 tcg_gen_mov_i32(addr, tmp);
 is_load = ((insn & 0x0400) != 0);
-for (i = 0; i < 16; i++, mask >>= 1) {
-if (mask & 1) {
-if (i < 8)
-reg = DREG(i, 0);
-else
-reg = AREG(i, 0);
-if (is_load) {
-tmp = gen_load(s, OS_LONG, addr, 0);
-tcg_gen_mov_i32(reg, tmp);
-} else {
-gen_store(s, OS_LONG, addr, reg);
-}
-if (mask != 1)
-tcg_gen_addi_i32(addr, addr, 4);
-}
+opsize = (insn & 0x40) != 0 ? OS_LONG : OS_WORD;
+incr = opsize_bytes(opsize);
+if (!is_load && (insn & 070) == 040) {
+   for (i = 15; i >= 0; i--, mask >>= 1) {
+   if (mask & 1) {
+   if (i < 8)
+   reg = DREG(i, 0);
+   else
+   reg = AREG(i, 0);
+   gen_store(s, opsize, addr, reg);
+   if (mask != 1)
+   tcg_gen_subi_i32(addr, addr, incr);
+   }
+   }
+   tcg_gen_mov_i32(AREG(insn, 0), addr);
+} else {
+   for (i = 0; i < 16; i++, mask >>= 1) {
+   if (mask & 1) {
+   if (i < 8)
+   reg = DREG(i, 0);
+   else
+   reg = AREG(i, 0);
+   if (is_load) {
+   tmp = gen_load(s, opsize, addr, 1);
+   tcg_gen_mov_i32(reg, tmp);
+   } else {
+   gen_store(s, opsize, addr, reg);
+   }
+   if (mask != 1 || (insn & 070) == 030)
+   tcg_gen_addi_i32(addr, addr, incr);
+   }
+   }
+   if ((insn & 070) == 030)
+   tcg_gen_mov_i32(AREG(insn, 0), addr);
 }
 }
 
-- 
2.5.5

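For readers following the register-list walk in the movem patch above, here is a minimal sketch of the architectural behaviour in plain C. It is not the QEMU code: `struct cpu`, `movem_store_predec` and `movem_load_postinc` are hypothetical names, memory is a small flat array, and only the long-size case is modelled.

```c
#include <stdint.h>

/* Hypothetical CPU model: regs[0..7] are D0..D7, regs[8..15] are A0..A7. */
struct cpu {
    uint32_t regs[16];
    uint8_t mem[64];            /* tiny big-endian guest memory */
};

static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* MOVEM.L <list>,-(An): the mask is reversed, bit 0 selects A7 and
 * bit 15 selects D0, so registers are written from A7 down to D0. */
static void movem_store_predec(struct cpu *c, int an, uint16_t mask)
{
    uint32_t addr = c->regs[8 + an];

    for (int i = 15; i >= 0; i--, mask >>= 1) {
        if (mask & 1) {
            addr -= 4;
            store_be32(&c->mem[addr], c->regs[i]);
        }
    }
    c->regs[8 + an] = addr;     /* An ends at the lowest address written */
}

/* MOVEM.L (An)+,<list>: bit 0 selects D0, bit 15 selects A7. */
static void movem_load_postinc(struct cpu *c, int an, uint16_t mask)
{
    uint32_t addr = c->regs[8 + an];

    for (int i = 0; i < 16; i++, mask >>= 1) {
        if (mask & 1) {
            c->regs[i] = load_be32(&c->mem[addr]);
            addr += 4;
        }
    }
    c->regs[8 + an] = addr;     /* An ends just past the last word read */
}
```

Storing D0/D1 with the predecrement form and reloading them with the postincrement form should leave both the registers and An as they started, which is the usual push/pop pairing.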
[Qemu-devel] [PATCH 35/52] target-m68k: inline rotate ops

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 353 
 1 file changed, 353 insertions(+)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 1d05c6a..d183a3c 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2710,6 +2710,352 @@ DISAS_INSN(shift_reg)
 set_cc_op(s, CC_OP_FLAGS);
 }
 
+static inline void rotate(TCGv reg, TCGv shift, int left, int size)
+{
+if (size == 32) {
+if (left) {
+tcg_gen_rotl_i32(reg, reg, shift);
+} else {
+tcg_gen_rotr_i32(reg, reg, shift);
+}
+} else {
+TCGv t0;
+
+if (left) {
+tcg_gen_shl_i32(reg, reg, shift);
+} else {
+tcg_gen_shli_i32(reg, reg, size);
+tcg_gen_shr_i32(reg, reg, shift);
+}
+
+t0 = tcg_temp_new();
+tcg_gen_shri_i32(t0, reg, size);
+tcg_gen_or_i32(reg, reg, t0);
+tcg_temp_free(t0);
+if (size == 8) {
+tcg_gen_ext8s_i32(reg, reg);
+} else if (size == 16) {
+tcg_gen_ext16s_i32(reg, reg);
+}
+}
+
+if (left) {
+tcg_gen_andi_i32(QREG_CC_C, reg, 1);
+} else {
+tcg_gen_shri_i32(QREG_CC_C, reg, 31);
+}
+
+tcg_gen_movi_i32(QREG_CC_V, 0);
+tcg_gen_mov_i32(QREG_CC_N, reg);
+tcg_gen_mov_i32(QREG_CC_Z, reg);
+}
+
+static inline void rotate_x_flags(TCGv reg, int size)
+{
+switch (size) {
+case 8:
+tcg_gen_ext8s_i32(reg, reg);
+break;
+case 16:
+tcg_gen_ext16s_i32(reg, reg);
+break;
+default:
+break;
+}
+tcg_gen_mov_i32(QREG_CC_N, reg);
+tcg_gen_mov_i32(QREG_CC_Z, reg);
+tcg_gen_mov_i32(QREG_CC_C, QREG_CC_X);
+}
+
+static inline void rotate_x(TCGv dest, TCGv X, TCGv reg, TCGv shift,
+int left, int size)
+{
+TCGv_i64 t0, shift64;
+TCGv lo, hi;
+
+shift64 = tcg_temp_new_i64();
+tcg_gen_extu_i32_i64(shift64, shift);
+
+t0 = tcg_temp_new_i64();
+
+lo = tcg_temp_new();
+hi = tcg_temp_new();
+
+if (left) {
+/* create [reg:X:..] */
+
+if (size == 32) {
+tcg_gen_shli_i32(X, QREG_CC_X, 31);
+tcg_gen_concat_i32_i64(t0, X, reg);
+} else {
+tcg_gen_shli_i32(X, reg, 1);
+tcg_gen_or_i32(X, X, QREG_CC_X);
+tcg_gen_extu_i32_i64(t0, X);
+tcg_gen_shli_i64(t0, t0, 64 - size - 1);
+}
+
+/* rotate */
+
+tcg_gen_rotl_i64(t0, t0, shift64);
+tcg_temp_free_i64(shift64);
+
+/* result is [reg:..:reg:X] */
+
+tcg_gen_extr_i64_i32(lo, hi, t0);
+tcg_gen_andi_i32(X, lo, 1);
+
+tcg_gen_shri_i32(lo, lo, 1);
+tcg_gen_shri_i32(hi, hi, 32 - size);
+tcg_gen_or_i32(dest, lo, hi);
+} else {
+if (size == 32) {
+tcg_gen_concat_i32_i64(t0, reg, QREG_CC_X);
+} else {
+tcg_gen_shli_i32(X, QREG_CC_X, size);
+tcg_gen_or_i32(X, reg, X);
+tcg_gen_extu_i32_i64(t0, X);
+}
+
+tcg_gen_rotr_i64(t0, t0, shift64);
+tcg_temp_free_i64(shift64);
+
+/* result is value: [X:reg:..:reg] */
+
+tcg_gen_extr_i64_i32(lo, hi, t0);
+
+/* extract X */
+
+tcg_gen_shri_i32(X, hi, 31);
+
+/* extract result */
+
+tcg_gen_shli_i32(hi, hi, 1);
+tcg_gen_shri_i32(hi, hi, 32 - size);
+tcg_gen_or_i32(dest, lo, hi);
+}
+tcg_temp_free(hi);
+tcg_temp_free(lo);
+tcg_temp_free_i64(t0);
+
+tcg_gen_movi_i32(QREG_CC_V, 0); /* always cleared */
+}
+
+DISAS_INSN(rotate_im)
+{
+TCGv reg;
+TCGv shift;
+int tmp;
+int left = (insn & 0x100);
+
+reg = DREG(insn, 0);
+tmp = (insn >> 9) & 7;
+tmp = ((tmp - 1) & 7) + 1; /* 1..8 */
+
+shift = tcg_const_i32(tmp);
+if (insn & 8) {
+rotate(reg, shift, left, 32);
+} else {
+rotate_x(reg, QREG_CC_X, reg, shift, left, 32);
+rotate_x_flags(reg, 32);
+}
+tcg_temp_free(shift);
+
+set_cc_op(s, CC_OP_FLAGS);
+}
+
+DISAS_INSN(rotate8_im)
+{
+int left = (insn & 0x100);
+TCGv reg;
+TCGv shift;
+int tmp;
+
+reg = gen_extend(DREG(insn, 0), OS_BYTE, 0);
+
+tmp = (insn >> 9) & 7;
+tmp = ((tmp - 1) & 7) + 1; /* 1..8 */
+
+shift = tcg_const_i32(tmp);
+if (insn & 8) {
+rotate(reg, shift, left, 8);
+} else {
+rotate_x(reg, QREG_CC_X, reg, shift, left, 8);
+rotate_x_flags(reg, 8);
+}
+gen_partset_reg(OS_BYTE, DREG(insn, 0), reg);
+set_cc_op(s, CC_OP_FLAGS);
+}
+
+DISAS_INSN(rotate16_im)
+{
+int left = (insn & 0x100);
+TCGv reg;
+TCGv shift;
+int tmp;
+
+reg = gen_extend(DREG(insn, 0), OS_WORD, 0);
+tmp = (insn >> 9) & 7;
+tmp = ((tmp - 1) & 7) + 1; /* 1..8 */
+
+shift = tcg_const_i32(tmp);
+if (insn & 8) {
+rotate(reg, shift, left, 16);

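As a reference point for the rotate helpers in the (truncated) patch above, the following is a hedged sketch of the immediate count decoding and of a rotate-left-through-X in plain C. `imm_count` and `roxl` are hypothetical names, and the sketch models the instruction semantics directly instead of the TCG sequence used in the patch.

```c
#include <stdint.h>

/* The 3-bit count field encodes 1..8, with 0 meaning 8. */
static unsigned imm_count(unsigned insn)
{
    unsigned field = (insn >> 9) & 7;

    return ((field - 1) & 7) + 1;
}

/* ROXL: rotate a size-bit value left through the X flag.
 * size is 8, 16 or 32; *x holds X on entry and receives the new X. */
static uint32_t roxl(uint32_t val, unsigned size, unsigned count, unsigned *x)
{
    uint64_t mask = (size == 32) ? 0xffffffffull : ((1ull << size) - 1);
    uint64_t wmask = (mask << 1) | 1;          /* size + 1 bits: [X:val] */
    uint64_t wide = ((uint64_t)(*x & 1) << size) | (val & mask);
    unsigned n = count % (size + 1);           /* full turns are no-ops */

    if (n) {
        wide = ((wide << n) | (wide >> (size + 1 - n))) & wmask;
    }
    *x = (unsigned)(wide >> size) & 1;
    return (uint32_t)(wide & mask);
}
```

Treating X as one extra bit above the operand turns the through-X rotate into an ordinary rotate of a (size + 1)-bit quantity, which is the same idea as the 64-bit concatenation in `rotate_x`.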
[Qemu-devel] [PATCH 37/52] target-m68k: add cas/cas2 ops

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 linux-user/main.c   | 193 
 target-m68k/cpu.h   |   9 +++
 target-m68k/qregs.def   |   5 ++
 target-m68k/translate.c | 175 +++
 4 files changed, 382 insertions(+)

diff --git a/linux-user/main.c b/linux-user/main.c
index 74b02c7..3c51afe 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2994,6 +2994,194 @@ void cpu_loop(CPUMBState *env)
 
 #ifdef TARGET_M68K
 
+static int do_cas(CPUM68KState *env)
+{
+int size, is_cas;
+int cmp1_reg, upd1_reg;
+int cmp2_reg, upd2_reg;
+uint32_t dest1, cmp1, addr1;
+uint32_t dest2, cmp2, addr2;
+int segv = 0;
+int z;
+
+start_exclusive();
+
+/* cas_param bits
+ * 31-> CAS(0) / CAS2(1)
+ * 11:13 -> update reg 2
+ * 8:10  -> cmp reg 2
+ * 5:7   -> update reg 1
+ * 2:4   -> cmp reg 1
+ * 0:1   -> opsize
+ */
+
+is_cas = (env->cas_param & 0x8000) == 0;
+
+size = env->cas_param & 0x3;
+
+cmp1_reg = (env->cas_param >> 2) & 7;
+upd1_reg = (env->cas_param >> 5) & 7;
+cmp2_reg = (env->cas_param >> 8) & 7;
+upd2_reg = (env->cas_param >> 11) & 7;
+
+addr1 = env->cas_addr1;
+addr2 = env->cas_addr2;
+
+switch (size) {
+case OS_BYTE:
+segv = get_user_u8(dest1, addr1);
+cmp1 = (uint8_t)env->dregs[cmp1_reg];
+break;
+case OS_WORD:
+segv = get_user_u16(dest1, addr1);
+cmp1 = (uint16_t)env->dregs[cmp1_reg];
+break;
+case OS_LONG:
+default:
+segv = get_user_u32(dest1, addr1);
+cmp1 = env->dregs[cmp1_reg];
+break;
+}
+if (segv) {
+env->mmu.ar = addr1;
+goto done;
+}
+env->cc_n = dest1;
+env->cc_v = cmp1;
+z = dest1 - cmp1;
+env->cc_op = CC_OP_CMPB + size;
+
+if (is_cas) {
+/* CAS */
+
+/* if (addr1) == cmp1 then (addr1) = upd1 */
+
+if (z == 0) {
+switch (size) {
+case OS_BYTE:
+segv = put_user_u8(env->dregs[upd1_reg], addr1);
+break;
+case OS_WORD:
+segv = put_user_u16(env->dregs[upd1_reg], addr1);
+break;
+case OS_LONG:
+segv = put_user_u32(env->dregs[upd1_reg], addr1);
+break;
+default:
+break;
+}
+if (segv) {
+env->mmu.ar = addr1;
+}
+goto done;
+}
+/* else cmp1 = (addr1) */
+switch (size) {
+case OS_BYTE:
+env->dregs[cmp1_reg] = deposit32(env->dregs[cmp1_reg],
+0, 8, dest1);
+break;
+case OS_WORD:
+env->dregs[cmp1_reg] = deposit32(env->dregs[cmp1_reg],
+0, 16, dest1);
+break;
+case OS_LONG:
+env->dregs[cmp1_reg] = dest1;
+break;
+default:
+break;
+}
+} else {
+/* CAS2 */
+switch (size) {
+case OS_BYTE:
+segv = get_user_u8(dest2, addr2);
+cmp2 = (uint8_t)env->dregs[cmp2_reg];
+break;
+case OS_WORD:
+segv = get_user_u16(dest2, addr2);
+cmp2 = (uint16_t)env->dregs[cmp2_reg];
+break;
+case OS_LONG:
+default:
+segv = get_user_u32(dest2, addr2);
+cmp2 = env->dregs[cmp2_reg];
+break;
+}
+if (segv) {
+env->mmu.ar = addr2;
+goto done;
+}
+/* if (addr1) == cmp1 && (addr2) == cmp2 then
+ *(addr1) = upd1, (addr2) = upd2
+ */
+if (z == 0) {
+z = dest2 - cmp2;
+}
+if (z == 0) {
+switch (size) {
+case OS_BYTE:
+segv = put_user_u8(env->dregs[upd1_reg], addr1);
+break;
+case OS_WORD:
+segv = put_user_u16(env->dregs[upd1_reg], addr1);
+break;
+case OS_LONG:
+segv = put_user_u32(env->dregs[upd1_reg], addr1);
+break;
+default:
+break;
+}
+if (segv) {
+env->mmu.ar = addr1;
+}
+switch (size) {
+case OS_BYTE:
+segv = put_user_u8(env->dregs[upd2_reg], addr2);
+break;
+case OS_WORD:
+segv = put_user_u16(env->dregs[upd2_reg], addr2);
+break;
+case OS_LONG:
+segv = put_user_u32(env->dregs[upd2_reg], addr2);
+break;
+default:
+break;
+}
+if (segv) {
+env->mmu.ar = addr2;
+}
+goto done;
+}
+/* else cmp1 = (addr1), cmp2

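The compare-and-swap behaviour that do_cas() emulates can be summarised by a small sketch. It assumes a single 32-bit operand and no concurrency (the patch uses start_exclusive() for that); `cas32` is a hypothetical name and is not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* CAS Dc,Du,<ea>: if the memory operand equals Dc, store Du to memory;
 * otherwise load the memory operand into Dc.
 * Returns the resulting Z flag (true when the swap happened). */
static bool cas32(uint32_t *mem, uint32_t *dc, uint32_t du)
{
    if (*mem == *dc) {
        *mem = du;
        return true;
    }
    *dc = *mem;
    return false;
}
```

A guest retry loop would typically reload nothing on failure, since Dc already holds the fresh memory value after a failed compare.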
[Qemu-devel] [PATCH 34/52] target-m68k: add 64bit mull

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 54 ++---
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index b47f9c1..1d05c6a 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -2128,24 +2128,54 @@ DISAS_INSN(tas)
 DISAS_INSN(mull)
 {
 uint16_t ext;
-TCGv reg;
 TCGv src1;
-TCGv dest;
+int sign;
 
-/* The upper 32 bits of the product are discarded, so
-   muls.l and mulu.l are functionally equivalent.  */
 ext = read_im16(env, s);
-if (ext & 0x87ff) {
-gen_exception(s, s->pc - 4, EXCP_UNSUPPORTED);
+
+sign = ext & 0x800;
+
+if (ext & 0x400) {
+if (!m68k_feature(s->env, M68K_FEATURE_QUAD_MULDIV)) {
+gen_exception(s, s->pc - 4, EXCP_UNSUPPORTED);
+return;
+}
+
+SRC_EA(env, src1, OS_LONG, 0, NULL);
+
+if (sign) {
+tcg_gen_muls2_i32(DREG(ext, 12), DREG(ext, 0), src1, DREG(ext, 12));
+} else {
+tcg_gen_mulu2_i32(DREG(ext, 12), DREG(ext, 0), src1, DREG(ext, 12));
+}
+
+tcg_gen_movi_i32(QREG_CC_V, 0);
+tcg_gen_mov_i32(QREG_CC_C, QREG_CC_V);
+tcg_gen_mov_i32(QREG_CC_N, DREG(ext, 0));
+tcg_gen_or_i32(QREG_CC_Z, DREG(ext, 12), DREG(ext, 0));
+
+set_cc_op(s, CC_OP_FLAGS);
 return;
 }
-reg = DREG(ext, 12);
 SRC_EA(env, src1, OS_LONG, 0, NULL);
-dest = tcg_temp_new();
-tcg_gen_mul_i32(dest, src1, reg);
-tcg_gen_mov_i32(reg, dest);
-/* Unlike m68k, coldfire always clears the overflow bit.  */
-gen_logic_cc(s, dest, OS_LONG);
+if (m68k_feature(s->env, M68K_FEATURE_M68000)) {
+if (sign) {
+tcg_gen_muls2_i32(QREG_CC_N, QREG_CC_V, src1, DREG(ext, 12));
+} else {
+tcg_gen_mulu2_i32(QREG_CC_N, QREG_CC_V, src1, DREG(ext, 12));
+}
+tcg_gen_mov_i32(DREG(ext, 12), QREG_CC_N);
+
+tcg_gen_mov_i32(QREG_CC_Z, QREG_CC_N);
+tcg_gen_movi_i32(QREG_CC_C, 0);
+
+set_cc_op(s, CC_OP_FLAGS);
+} else {
+/* The upper 32 bits of the product are discarded, so
+   muls.l and mulu.l are functionally equivalent.  */
+tcg_gen_mul_i32(DREG(ext, 12), src1, DREG(ext, 12));
+gen_logic_cc(s, DREG(ext, 12), OS_LONG);
+}
 }
 
 DISAS_INSN(link)
-- 
2.5.5

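To make the 64-bit result path of the mull patch above easier to check, here is a minimal C sketch of a 32x32->64 multiply with the N and Z flags derived from the full product. `struct mul64`, `mulu_l_64` and `muls_l_64` are hypothetical names; the sketch mirrors the flag rules visible in the patch (N from the high word, Z from high OR low) and is not the TCG code itself.

```c
#include <stdint.h>

struct mul64 {
    uint32_t hi;    /* goes to Dh */
    uint32_t lo;    /* goes to Dl */
    int n;          /* sign of the 64-bit product */
    int z;          /* set only when all 64 bits are zero */
};

static struct mul64 split_product(uint64_t p)
{
    struct mul64 r;

    r.hi = (uint32_t)(p >> 32);
    r.lo = (uint32_t)p;
    r.n = (int)(r.hi >> 31);
    r.z = (r.hi | r.lo) == 0;
    return r;
}

/* mulu.l <ea>,Dh:Dl */
static struct mul64 mulu_l_64(uint32_t a, uint32_t b)
{
    return split_product((uint64_t)a * b);
}

/* muls.l <ea>,Dh:Dl */
static struct mul64 muls_l_64(int32_t a, int32_t b)
{
    return split_product((uint64_t)((int64_t)a * b));
}
```

The signed and unsigned variants only differ in the high word, which is why the 32-bit-result ColdFire path can treat them as equivalent.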
[Qemu-devel] [PATCH 33/52] target-m68k: inline divu/divs

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 linux-user/main.c   |   7 ++
 target-m68k/cpu.h   |   4 -
 target-m68k/helper.h|   2 -
 target-m68k/op_helper.c |  49 
 target-m68k/qregs.def   |   2 -
 target-m68k/translate.c | 198 +++-
 6 files changed, 169 insertions(+), 93 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 5f3ec97..74b02c7 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3034,6 +3034,13 @@ void cpu_loop(CPUM68KState *env)
 info._sifields._sigfault._addr = env->pc;
 queue_signal(env, info.si_signo, &info);
 break;
+case EXCP_DIV0:
+info.si_signo = TARGET_SIGFPE;
+info.si_errno = 0;
+info.si_code = TARGET_FPE_INTDIV;
+info._sifields._sigfault._addr = env->pc;
+queue_signal(env, info.si_signo, &info);
+break;
 case EXCP_TRAP0:
 {
 ts->sim_syscalls = 0;
diff --git a/target-m68k/cpu.h b/target-m68k/cpu.h
index 9415d2b..b300a92 100644
--- a/target-m68k/cpu.h
+++ b/target-m68k/cpu.h
@@ -94,10 +94,6 @@ typedef struct CPUM68KState {
 uint32_t macsr;
 uint32_t mac_mask;
 
-/* Temporary storage for DIV helpers.  */
-uint32_t div1;
-uint32_t div2;
-
 /* MMU status.  */
 struct {
 uint32_t ar;
diff --git a/target-m68k/helper.h b/target-m68k/helper.h
index 0b819cb..80ee4f8 100644
--- a/target-m68k/helper.h
+++ b/target-m68k/helper.h
@@ -2,8 +2,6 @@ DEF_HELPER_1(bitrev, i32, i32)
 DEF_HELPER_1(ff1, i32, i32)
 DEF_HELPER_2(bfffo, i32, i32, i32)
 DEF_HELPER_FLAGS_2(sats, TCG_CALL_NO_RWG_SE, i32, i32, i32)
-DEF_HELPER_2(divu, void, env, i32)
-DEF_HELPER_2(divs, void, env, i32)
 DEF_HELPER_2(set_sr, void, env, i32)
 DEF_HELPER_3(movec, void, env, i32, i32)
 
diff --git a/target-m68k/op_helper.c b/target-m68k/op_helper.c
index 71caba9..bf3c813 100644
--- a/target-m68k/op_helper.c
+++ b/target-m68k/op_helper.c
@@ -245,52 +245,3 @@ void HELPER(bitfield_store)(CPUM68KState *env, uint32_t addr, uint32_t offset,
 break;
 }
 }
-
-void HELPER(divu)(CPUM68KState *env, uint32_t word)
-{
-uint32_t num;
-uint32_t den;
-uint32_t quot;
-uint32_t rem;
-
-num = env->div1;
-den = env->div2;
-/* ??? This needs to make sure the throwing location is accurate.  */
-if (den == 0) {
-raise_exception(env, EXCP_DIV0);
-}
-quot = num / den;
-rem = num % den;
-
-env->cc_v = (word && quot > 0xffff ? -1 : 0);
-env->cc_z = quot;
-env->cc_n = quot;
-env->cc_c = 0;
-
-env->div1 = quot;
-env->div2 = rem;
-}
-
-void HELPER(divs)(CPUM68KState *env, uint32_t word)
-{
-int32_t num;
-int32_t den;
-int32_t quot;
-int32_t rem;
-
-num = env->div1;
-den = env->div2;
-if (den == 0) {
-raise_exception(env, EXCP_DIV0);
-}
-quot = num / den;
-rem = num % den;
-
-env->cc_v = (word && quot != (int16_t)quot ? -1 : 0);
-env->cc_z = quot;
-env->cc_n = quot;
-env->cc_c = 0;
-
-env->div1 = quot;
-env->div2 = rem;
-}
diff --git a/target-m68k/qregs.def b/target-m68k/qregs.def
index 156c0f5..51ff43b 100644
--- a/target-m68k/qregs.def
+++ b/target-m68k/qregs.def
@@ -7,7 +7,5 @@ DEFO32(CC_C, cc_c)
 DEFO32(CC_N, cc_n)
 DEFO32(CC_V, cc_v)
 DEFO32(CC_Z, cc_z)
-DEFO32(DIV1, div1)
-DEFO32(DIV2, div2)
 DEFO32(MACSR, macsr)
 DEFO32(MAC_MASK, mac_mask)
diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 00fd2f1..b47f9c1 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -1047,11 +1047,19 @@ static void gen_jmp(DisasContext *s, TCGv dest)
 s->is_jmp = DISAS_JUMP;
 }
 
+static inline void gen_raise_exception(int nr)
+{
+TCGv_i32 tmp = tcg_const_i32(nr);
+
+gen_helper_raise_exception(cpu_env, tmp);
+tcg_temp_free_i32(tmp);
+}
+
 static void gen_exception(DisasContext *s, uint32_t where, int nr)
 {
 update_cc_op(s);
 gen_jmp_im(s, where);
-gen_helper_raise_exception(cpu_env, tcg_const_i32(nr));
+gen_raise_exception(nr);
 }
 
 static inline void gen_addr_fault(DisasContext *s)
@@ -1179,64 +1187,182 @@ DISAS_INSN(mulw)
 
 DISAS_INSN(divw)
 {
-TCGv reg;
-TCGv tmp;
-TCGv src;
+TCGLabel *l1;
+TCGv t0, src;
+TCGv quot, rem;
 int sign;
 
 sign = (insn & 0x100) != 0;
-reg = DREG(insn, 9);
-if (sign) {
-tcg_gen_ext16s_i32(QREG_DIV1, reg);
-} else {
-tcg_gen_ext16u_i32(QREG_DIV1, reg);
-}
-SRC_EA(env, src, OS_WORD, sign, NULL);
-tcg_gen_mov_i32(QREG_DIV2, src);
+
+tcg_gen_movi_i32(QREG_CC_C, 0); /* C is always cleared, use as 0 */
+
+/* dest.l / src.w */
+
+SRC_EA(env, t0, OS_WORD, sign, NULL);
+
+src = tcg_temp_local_new_i32();
+tcg_gen_mov_i32(src, t0);
+l1 = gen_new_label();
+tcg_gen_brcondi_i32(TCG_COND_NE, src, 0, l1);
+tcg_gen_movi_i32(QREG_PC, s->insn_pc);
+gen_raise_exception(E

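For reference while reading the inlined divw above, the word-sized unsigned divide can be sketched in plain C as follows. `divu_w` is a hypothetical name; the sketch assumes the usual 68k rule that an overflowing quotient sets V and leaves the destination untouched, and it reports divide-by-zero to the caller instead of raising EXCP_DIV0.

```c
#include <stdbool.h>
#include <stdint.h>

/* DIVU.W <ea>,Dn: 32-bit dividend in Dn, 16-bit divisor.
 * On success Dn becomes remainder:quotient (high:low halves).
 * Returns false for a zero divisor, where the guest would trap. */
static bool divu_w(uint32_t *dn, uint16_t src, bool *v)
{
    uint32_t quot, rem;

    if (src == 0) {
        return false;
    }
    quot = *dn / src;
    rem = *dn % src;
    if (quot > 0xffff) {
        *v = true;              /* quotient does not fit in 16 bits */
        return true;            /* Dn is left unchanged */
    }
    *v = false;
    *dn = (rem << 16) | quot;
    return true;
}
```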
[Qemu-devel] [PATCH 30/52] target-m68k: add scc/dbcc

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 5914185..cd656fe 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -1096,6 +1096,49 @@ static void gen_jmp_tb(DisasContext *s, int n, uint32_t dest)
 s->is_jmp = DISAS_TB_JUMP;
 }
 
+DISAS_INSN(scc_mem)
+{
+TCGLabel *l1;
+int cond;
+TCGv dest;
+
+l1 = gen_new_label();
+cond = (insn >> 8) & 0xf;
+dest = tcg_temp_local_new();
+tcg_gen_movi_i32(dest, 0);
+gen_jmpcc(s, cond ^ 1, l1);
+tcg_gen_movi_i32(dest, 0xff);
+gen_set_label(l1);
+DEST_EA(env, insn, OS_BYTE, dest, NULL);
+tcg_temp_free(dest);
+}
+
+DISAS_INSN(dbcc)
+{
+TCGLabel *l1;
+TCGv reg;
+TCGv tmp;
+int16_t offset;
+uint32_t base;
+
+reg = DREG(insn, 0);
+base = s->pc;
+offset = (int16_t)read_im16(env, s);
+l1 = gen_new_label();
+gen_jmpcc(s, (insn >> 8) & 0xf, l1);
+
+tmp = tcg_temp_new();
+tcg_gen_ext16s_i32(tmp, reg);
+tcg_gen_addi_i32(tmp, tmp, -1);
+gen_partset_reg(OS_WORD, reg, tmp);
+tcg_gen_brcondi_i32(TCG_COND_EQ, tmp, -1, l1);
+update_cc_op(s);
+gen_jmp_tb(s, 1, base + offset);
+gen_set_label(l1);
+update_cc_op(s);
+gen_jmp_tb(s, 0, s->pc);
+}
+
 DISAS_INSN(undef_mac)
 {
 gen_exception(s, s->pc - 2, EXCP_LINEA);
@@ -3292,6 +3335,9 @@ void register_m68k_insns (CPUM68KState *env)
 INSN(addsubq,   5000, f080, M68000);
 INSN(addsubq,   5080, f0c0, M68000);
 INSN(scc,   50c0, f0f8, CF_ISA_A);
+INSN(scc_mem,   50c0, f0c0, M68000);
+INSN(scc,   50c0, f0f8, M68000);
+INSN(dbcc,  50c8, f0f8, M68000);
 INSN(addsubq,   5080, f1c0, CF_ISA_A);
 INSN(tpf,   51f8, fff8, CF_ISA_A);
 
-- 
2.5.5

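The control flow generated by the dbcc handler above corresponds to the following sketch in plain C. `dbcc_step` and its parameters are hypothetical names chosen for illustration: `base` stands for the address of the displacement word and `next_pc` for the address of the following instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* DBcc Dn,<label>: returns the next program counter.
 * If the condition holds, fall through. Otherwise decrement the low
 * word of Dn and branch unless the counter has wrapped to -1. */
static uint32_t dbcc_step(bool cond, uint32_t *dn, uint32_t base,
                          int16_t offset, uint32_t next_pc)
{
    uint16_t counter;

    if (cond) {
        return next_pc;
    }
    counter = (uint16_t)(*dn - 1);
    *dn = (*dn & 0xffff0000u) | counter;   /* upper word is preserved */
    if (counter == 0xffff) {
        return next_pc;                    /* loop exhausted */
    }
    return base + (uint32_t)(int32_t)offset;
}
```

Only the low 16 bits of Dn take part in the count, which matches the gen_partset_reg(OS_WORD, ...) call in the patch.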
[Qemu-devel] [PATCH 29/52] target-m68k: factorize flags computing

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/helper.c | 125 +--
 1 file changed, 42 insertions(+), 83 deletions(-)

diff --git a/target-m68k/helper.c b/target-m68k/helper.c
index 42a2f1c..e9e7cee 100644
--- a/target-m68k/helper.c
+++ b/target-m68k/helper.c
@@ -531,6 +531,46 @@ void HELPER(mac_set_flags)(CPUM68KState *env, uint32_t acc)
 }
 }
 
+
+#define COMPUTE_CCR(op, x, n, z, v, c) do {   \
+switch (op) {  \
+case CC_OP_FLAGS:  \
+/* Everything in place.  */\
+break; \
+case CC_OP_ADD:\
+res = n;   \
+src2 = v;  \
+src1 = res - src2; \
+c = x; \
+z = n; \
+v = (res ^ src1) & ~(src1 ^ src2); \
+break; \
+case CC_OP_SUB:\
+res = n;   \
+src2 = v;  \
+src1 = res + src2; \
+c = x; \
+z = n; \
+v = (res ^ src1) & (src1 ^ src2);  \
+break; \
+case CC_OP_CMP:\
+src1 = n;  \
+src2 = v;  \
+res = src1 - src2; \
+n = res;   \
+z = res;   \
+c = src1 < src2;   \
+v = (res ^ src1) & (src1 ^ src2);  \
+break; \
+case CC_OP_LOGIC:  \
+c = v = 0; \
+z = n; \
+break; \
+default:   \
+cpu_abort(CPU(m68k_env_get_cpu(env)), "Bad CC_OP %d", op); \
+}  \
+} while (0)
+
 uint32_t cpu_m68k_get_ccr(CPUM68KState *env)
 {
 uint32_t x, c, n, z, v;
@@ -542,47 +582,7 @@ uint32_t cpu_m68k_get_ccr(CPUM68KState *env)
 z = env->cc_z;
 v = env->cc_v;
 
-switch (env->cc_op) {
-case CC_OP_FLAGS:
-/* Everything in place.  */
-break;
-
-case CC_OP_ADD:
-res = n;
-src2 = v;
-src1 = res - src2;
-c = x;
-z = n;
-v = (res ^ src1) & ~(src1 ^ src2);
-break;
-
-case CC_OP_SUB:
-res = n;
-src2 = v;
-src1 = res + src2;
-c = x;
-z = n;
-v = (res ^ src1) & (src1 ^ src2);
-break;
-
-case CC_OP_CMP:
-src1 = n;
-src2 = v;
-res = src1 - src2;
-n = res;
-z = res;
-c = src1 < src2;
-v = (res ^ src1) & (src1 ^ src2);
-break;
-
-case CC_OP_LOGIC:
-c = v = 0;
-z = n;
-break;
-
-default:
-cpu_abort(CPU(m68k_env_get_cpu(env)), "Bad CC_OP %d", env->cc_op);
-}
+COMPUTE_CCR(env->cc_op, x, n, z, v, c);
 
 n = n >> 31;
 v = v >> 31;
@@ -615,48 +615,7 @@ void HELPER(flush_flags)(CPUM68KState *env, uint32_t cc_op)
 {
 uint32_t res, src1, src2;
 
-switch (cc_op) {
-case CC_OP_FLAGS:
-/* Everything up to date.  */
-return;
-
-case CC_OP_ADD:
-res = env->cc_n;
-src2 = env->cc_v;
-src1 = res - src2;
-env->cc_z = res;
-env->cc_c = env->cc_x;
-env->cc_v = (res ^ src1) & ~(src1 ^ src2);
-break;
-
-case CC_OP_SUB:
-res = env->cc_n;
-src2 = env->cc_v;
-src1 = res + src2;
-env->cc_z = res;

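The overflow formulas used for CC_OP_ADD and CC_OP_SUB in the patch above can be checked in isolation with a short C sketch. `add_overflow` and `sub_overflow` are hypothetical helper names; each returns the V flag as 0 or 1 for 32-bit operands.

```c
#include <stdint.h>

/* res = src1 + src2 overflows when both operands share a sign
 * and the result's sign differs from it. */
static int add_overflow(uint32_t src1, uint32_t src2)
{
    uint32_t res = src1 + src2;

    return (int)(((res ^ src1) & ~(src1 ^ src2)) >> 31);
}

/* res = src1 - src2 overflows when the operands differ in sign
 * and the result's sign differs from src1. */
static int sub_overflow(uint32_t src1, uint32_t src2)
{
    uint32_t res = src1 - src2;

    return (int)(((res ^ src1) & (src1 ^ src2)) >> 31);
}
```

Only bit 31 of each expression matters, which is why the helper in the patch shifts v right by 31 after computing it.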
[Qemu-devel] [PATCH 31/52] target-m68k: some bit ops cleanup

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 34 +++---
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index cd656fe..817f0b3 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -1300,39 +1300,36 @@ DISAS_INSN(bitop_reg)
 else
 opsize = OS_LONG;
 op = (insn >> 6) & 3;
-
-gen_flush_flags(s);
-
 SRC_EA(env, src1, opsize, 0, op ? &addr: NULL);
-src2 = DREG(insn, 9);
-dest = tcg_temp_new();
 
-tmp = tcg_temp_new();
+gen_flush_flags(s);
+src2 = tcg_temp_new();
 if (opsize == OS_BYTE)
-tcg_gen_andi_i32(tmp, src2, 7);
+tcg_gen_andi_i32(src2, DREG(insn, 9), 7);
 else
-tcg_gen_andi_i32(tmp, src2, 31);
+tcg_gen_andi_i32(src2, DREG(insn, 9), 31);
 
-src2 = tcg_const_i32(1);
-tcg_gen_shl_i32(src2, src2, tmp);
-tcg_temp_free(tmp);
+tmp = tcg_const_i32(1);
+tcg_gen_shl_i32(tmp, tmp, src2);
+tcg_temp_free(src2);
 
-tcg_gen_and_i32(QREG_CC_Z, src1, src2);
+tcg_gen_and_i32(QREG_CC_Z, src1, tmp);
 
+dest = tcg_temp_new();
 switch (op) {
 case 1: /* bchg */
-tcg_gen_xor_i32(dest, src1, src2);
+tcg_gen_xor_i32(dest, src1, tmp);
 break;
 case 2: /* bclr */
-tcg_gen_andc_i32(dest, src1, src2);
+tcg_gen_andc_i32(dest, src1, tmp);
 break;
 case 3: /* bset */
-tcg_gen_or_i32(dest, src1, src2);
+tcg_gen_or_i32(dest, src1, tmp);
 break;
 default: /* btst */
 break;
 }
-tcg_temp_free(src2);
+tcg_temp_free(tmp);
 if (op) {
 DEST_EA(env, insn, opsize, dest, &addr);
 }
@@ -1416,17 +1413,16 @@ DISAS_INSN(bitop_im)
 return;
 }
 
-gen_flush_flags(s);
-
 SRC_EA(env, src1, opsize, 0, op ? &addr: NULL);
 
+gen_flush_flags(s);
 if (opsize == OS_BYTE)
 bitnum &= 7;
 else
 bitnum &= 31;
 mask = 1 << bitnum;
 
-tcg_gen_andi_i32(QREG_CC_Z, src1, mask);
+   tcg_gen_andi_i32(QREG_CC_Z, src1, mask);
 
 if (op) {
 tmp = tcg_temp_new();
-- 
2.5.5

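The four bit operations handled by bitop_reg and bitop_im reduce to the following sketch in plain C. `bitop` is a hypothetical name; the op numbering (0 btst, 1 bchg, 2 bclr, 3 bset) and the modulo-8 versus modulo-32 bit number follow the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the updated operand; *z receives the Z flag, which reflects
 * the tested bit before any modification. */
static uint32_t bitop(int op, uint32_t src, uint32_t bitnum, bool is_byte,
                      bool *z)
{
    uint32_t mask = 1u << (bitnum & (is_byte ? 7 : 31));

    *z = (src & mask) == 0;
    switch (op) {
    case 1:                     /* bchg */
        return src ^ mask;
    case 2:                     /* bclr */
        return src & ~mask;
    case 3:                     /* bset */
        return src | mask;
    default:                    /* btst: operand is not written back */
        return src;
    }
}
```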
[Qemu-devel] [PATCH 28/52] target-m68k: add addx/subx/negx ops

2016-05-04 Thread Laurent Vivier

Signed-off-by: Laurent Vivier 
---
 target-m68k/translate.c | 200 ++--
 1 file changed, 160 insertions(+), 40 deletions(-)

diff --git a/target-m68k/translate.c b/target-m68k/translate.c
index 13ae953..5914185 100644
--- a/target-m68k/translate.c
+++ b/target-m68k/translate.c
@@ -1503,27 +1503,46 @@ DISAS_INSN(move)
 
 DISAS_INSN(negx)
 {
-TCGv reg, z;
+TCGv z;
+TCGv src;
+TCGv addr;
+int opsize;
 
-reg = DREG(insn, 0);
+opsize = insn_opsize(insn);
+SRC_EA(env, src, opsize, 1, &addr);
+
+gen_flush_flags(s); /* compute old Z */
+
+/* Perform subtract with borrow.
+ * (X, N) =  -(src + X);
+ */
 
-/* Perform subtraction with borrow.  */
 z = tcg_const_i32(0);
-tcg_gen_add2_i32(QREG_CC_N, QREG_CC_X, reg, z, QREG_CC_X, z);
+tcg_gen_add2_i32(QREG_CC_N, QREG_CC_X, src, z, QREG_CC_X, z);
 tcg_gen_sub2_i32(QREG_CC_N, QREG_CC_X, z, z, QREG_CC_N, QREG_CC_X);
 tcg_temp_free(z);
+gen_ext(QREG_CC_N, QREG_CC_N, opsize, 1);
+
 tcg_gen_andi_i32(QREG_CC_X, QREG_CC_X, 1);
 
 /* Compute signed-overflow for negation.  The normal formula for
-   subtraction is (res ^ src1) & (src1 ^ src2), but with src1==0
-   this simplies to res & src2.  */
-tcg_gen_and_i32(QREG_CC_V, QREG_CC_N, reg);
+ * subtraction is (res ^ src) & (src ^ dest), but with dest==0
+ * this simplifies to res & src.
+ */
+
+tcg_gen_and_i32(QREG_CC_V, QREG_CC_N, src);
 
 /* Copy the rest of the results into place.  */
-tcg_gen_mov_i32(QREG_CC_Z, QREG_CC_N);
+tcg_gen_or_i32(QREG_CC_Z, QREG_CC_Z, QREG_CC_N); /* !Z is sticky */
+gen_ext(QREG_CC_Z, QREG_CC_Z, opsize, 0);
+
 tcg_gen_mov_i32(QREG_CC_C, QREG_CC_X);
-tcg_gen_mov_i32(reg, QREG_CC_N);
+
 set_cc_op(s, CC_OP_FLAGS);
+
+/* result is in QREG_CC_N */
+
+DEST_EA(env, insn, opsize, QREG_CC_N, &addr);
 }
 
 DISAS_INSN(lea)
@@ -1943,32 +1962,78 @@ DISAS_INSN(suba)
 tcg_gen_sub_i32(reg, reg, src);
 }
 
-DISAS_INSN(subx)
+static inline void gen_subx(DisasContext *s, TCGv src, TCGv dest, int opsize)
 {
-TCGv reg, src, z;
+TCGv tmp;
 
-reg = DREG(insn, 9);
-src = DREG(insn, 0);
+gen_flush_flags(s); /* compute old Z */
 
-/* Perform subtract with borrow.  */
-z = tcg_const_i32(0);
-tcg_gen_add2_i32(QREG_CC_N, QREG_CC_X, src, z, QREG_CC_X, z);
-tcg_gen_sub2_i32(QREG_CC_N, QREG_CC_X, reg, z, QREG_CC_N, QREG_CC_X);
-tcg_temp_free(z);
+/* Perform subtract with borrow.
+ * (X, N) = dest - (src + X);
+ */
+
+tmp = tcg_const_i32(0);
+tcg_gen_add2_i32(QREG_CC_N, QREG_CC_X, src, tmp, QREG_CC_X, tmp);
+tcg_gen_sub2_i32(QREG_CC_N, QREG_CC_X, dest, tmp, QREG_CC_N, QREG_CC_X);
+gen_ext(QREG_CC_N, QREG_CC_N, opsize, 1);
 tcg_gen_andi_i32(QREG_CC_X, QREG_CC_X, 1);
 
-/* Compute signed-overflow for subtraction.  */
-tcg_gen_xor_i32(QREG_CC_V, QREG_CC_N, reg);
-tcg_gen_xor_i32(QREG_CC_Z, reg, src);
-tcg_gen_and_i32(QREG_CC_V, QREG_CC_V, QREG_CC_Z);
+/* Compute signed-overflow for subtraction.  */
+
+tcg_gen_xor_i32(QREG_CC_V, QREG_CC_N, dest);
+tcg_gen_xor_i32(tmp, dest, src);
+tcg_gen_and_i32(QREG_CC_V, QREG_CC_V, tmp);
+tcg_temp_free(tmp);
 
 /* Copy the rest of the results into place.  */
-tcg_gen_mov_i32(QREG_CC_Z, QREG_CC_N);
+tcg_gen_or_i32(QREG_CC_Z, QREG_CC_Z, QREG_CC_N); /* !Z is sticky */
+gen_ext(QREG_CC_Z, QREG_CC_Z, opsize, 0);
+
 tcg_gen_mov_i32(QREG_CC_C, QREG_CC_X);
-tcg_gen_mov_i32(reg, QREG_CC_N);
+
 set_cc_op(s, CC_OP_FLAGS);
+
+/* result is in QREG_CC_N */
+}
+
+DISAS_INSN(subx_reg)
+{
+TCGv dest;
+TCGv src;
+int opsize;
+
+opsize = insn_opsize(insn);
+
+src = gen_extend(DREG(insn, 0), opsize, 1);
+dest = gen_extend(DREG(insn, 9), opsize, 1);
+
+gen_subx(s, src, dest, opsize);
+
+gen_partset_reg(opsize, DREG(insn, 9), QREG_CC_N);
 }
 
+DISAS_INSN(subx_mem)
+{
+TCGv src;
+TCGv addr_src;
+TCGv dest;
+TCGv addr_dest;
+int opsize;
+
+opsize = insn_opsize(insn);
+
+addr_src = AREG(insn, 0);
+tcg_gen_subi_i32(addr_src, addr_src, opsize_bytes(opsize));
+src = gen_load(s, opsize, addr_src, 1);
+
+addr_dest = AREG(insn, 9);
+tcg_gen_subi_i32(addr_dest, addr_dest, opsize_bytes(opsize));
+dest = gen_load(s, opsize, addr_dest, 1);
+
+gen_subx(s, src, dest, opsize);
+
+gen_store(s, opsize, addr_dest, QREG_CC_N);
+}
 DISAS_INSN(mov3q)
 {
 TCGv src;
@@ -2058,29 +2123,76 @@ DISAS_INSN(adda)
 tcg_gen_add_i32(reg, reg, src);
 }
 
-DISAS_INSN(addx)
+static inline void gen_addx(DisasContext *s, TCGv src, TCGv dest, int opsize)
 {
-TCGv reg, src, z;
+TCGv tmp;
 
-reg = DREG(insn, 9);
-src = DREG(insn, 0);
+gen_flush_flags(s); /* compute old Z */
 
-/* Perform addition with carry.  */
-z = tcg_const_i32(0);
-tcg_gen_add2_i32(QREG_CC_N, QREG_CC_X, QREG_CC_X, z, reg, z);
-tcg_gen_add2_i32(QREG_CC_N, QREG_CC_X

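The "!Z is sticky" rule in the subx/negx changes above exists to support multi-precision arithmetic. A minimal sketch in plain C, assuming 32-bit limbs only; `struct ccr` and `subx32` are hypothetical names and not part of the patch.

```c
#include <stdint.h>

struct ccr {
    unsigned x;     /* extend (borrow) flag */
    unsigned z;     /* zero flag, only ever cleared by subx32 */
};

/* SUBX.L: dest - src - X. X receives the borrow out; Z is cleared on a
 * non-zero result and left unchanged on a zero result. */
static uint32_t subx32(uint32_t dest, uint32_t src, struct ccr *cc)
{
    uint64_t wide = (uint64_t)dest - src - (cc->x & 1);
    uint32_t res = (uint32_t)wide;

    cc->x = (unsigned)(wide >> 32) & 1;
    if (res != 0) {
        cc->z = 0;
    }
    return res;
}
```

A 64-bit subtraction would start with X clear and Z set, then call subx32 on the low limbs followed by the high limbs; Z stays set at the end only if every limb of the result was zero.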