date:20110808

Re: [PATCH] Permit -mem-path without sync mmu

2011-08-08 Thread David Gibson

On Fri, Aug 05, 2011 at 12:30:53PM -0300, Marcelo Tosatti wrote:
 On Fri, Aug 05, 2011 at 08:16:42AM +0200, Jan Kiszka wrote:
  On 2011-08-05 06:02, David Gibson wrote:
   At present, an explicit test disallows use of -mem-path when kvm is 
   enabled
   but KVM_CAP_SYNC_MMU is not set.  In particular, this prevents the user
   from using hugetlbfs to back the guest memory.
   
   I can see no reason for this check, and when I asked about it previously,
   the only theory offered was that this was a limitation of the very early
   days of kvm which only happened to match the SYNC_MMU flag by accident.
   
   This patch, therefore, removes the check.  This is of particular use to
   us on POWER, where we haven't yet implemented SYNC_MMU, but where backing
   the guest with hugepages is possible, and in fact mandatory (for now).
   
   Signed-off-by: David Gibson da...@gibson.dropbear.id.au
   ---
    exec.c |    5 -----
    1 files changed, 0 insertions(+), 5 deletions(-)
   
   diff --git a/exec.c b/exec.c
   index 476b507..041637c 100644
   --- a/exec.c
   +++ b/exec.c
   @@ -2818,11 +2818,6 @@ static void *file_ram_alloc(RAMBlock *block,
            return NULL;
        }
    
   -    if (kvm_enabled() && !kvm_has_sync_mmu()) {
   -        fprintf(stderr, "host lacks kvm mmu notifiers, -mem-path unsupported\n");
   -        return NULL;
   -    }
   -
        if (asprintf(&filename, "%s/qemu_back_mem.XX", path) == -1) {
            return NULL;
        }
  
  This is nothing trivial, see ce9a92411d in qemu-kvm or
  http://thread.gmane.org/gmane.comp.emulators.kvm.devel/27380. And it
  should rather target uq/master. CCing Avi, Marcelo, and the kvm list.
  
  Jan

Well, sending the patch flushed out the real reason for that check, at
least, as I thought it might.

 Yes, the check cannot be removed because there is the possibility of
 corruption using hugepages without mmu notifiers (described in the 
 archived message above).

Ok, so, if I understand the archived message correctly: first, this
check *is* all about hugepages - which is not obvious from the test
itself.

Second, if userspace qemu passing hugepages to kvm can cause (host)
kernel memory corruption, that is clearly a host kernel bug.  So am I
correct in thinking this is basically just a safety feature if qemu is
run on a buggy kernel?  Presumably this bug was corrected at some
point?  Is the presence of the SYNC_MMU feature just being used as a
proxy for "is this kernel recent enough to have the corruption bug
fixed"?

In any case this test sure as hell needs a big comment next to it
explaining this context.

 Why are mmu notifiers not implemented for PPC again?

It's just not done yet; we're working on it.  (That is, mmu notifiers
are certainly present on PPC, it's just they're not wired up to kvm,
yet).

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kvm PCI assignment VFIO ramblings

2011-08-08 Thread David Gibson

On Fri, Aug 05, 2011 at 09:10:09AM -0600, Alex Williamson wrote:
 On Fri, 2011-08-05 at 20:42 +1000, Benjamin Herrenschmidt wrote:
  Right. In fact to try to clarify the problem for everybody, I think we
  can distinguish two different classes of constraints that can
  influence the grouping of devices:
  
   1- Hard constraints. These are typically devices using the same RID or
  where the RID cannot be reliably guaranteed (the latter is the case with
  some PCIe-PCIX bridges which will take ownership of some transactions
  such as split but not all). Devices like that must be in the same
  domain. This is where PowerPC adds to what x86 does today the concept
  that the domains are pre-existing, since we use the RID for error
  isolation & MMIO segmenting as well, so we need to create those domains
  at boot time.
  
   2- Softer constraints. Those constraints derive from the fact that not
  applying them risks enabling the guest to create side effects outside of
  its sandbox. To some extent, there can be degrees of badness between
  the various things that can cause such constraints. Examples are shared
  LSIs (since trusting DisINTx can be chancy, see earlier discussions),
  potentially any set of functions in the same device can be problematic
  due to the possibility to get backdoor access to the BARs etc...
 
 This is what I've been trying to get to, hardware constraints vs system
 policy constraints.
 
  Now, what I derive from the discussion we've had so far, is that we need
  to find a proper fix for #1, but Alex and Avi seem to prefer that #2
  remains a matter of libvirt/user doing the right thing (basically
  keeping a loaded gun aimed at the user's foot with a very very very
  sweet trigger but heh, let's not start a flamewar here :-)
 
 Doesn't your own uncertainty of whether or not to allow this lead to the
 same conclusion, that it belongs in userspace policy?  I don't think we
 want to make white lists of which devices we trust to do DisINTx
 correctly part of the kernel interface, do we?  Thanks,

Yes, but the overall point is that both the hard and soft constraints
are much easier to handle if a group or iommu domain or whatever is a
persistent entity that can be set up once-per-boot by the admin with
whatever degree of safety they want, rather than a transient entity
tied to an fd's lifetime, which must be set up correctly, every time,
by the thing establishing it.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Re: [PATCH] [NEW] cgroup test * general smoke_test + module dependend subtests (memory test included) * library for future use in other tests (kvm)

2011-08-08 Thread Jiri Zupka

I'll go through this and let you know.

- Original Message -
 From: root r...@dhcp-26-193.brq.redhat.com
 
 cgroup.py:
 * structure for different cgroup subtests
 * contains basic cgroup-memory test
 
 cgroup_common.py:
 * library for cgroup handling (intended to be used from kvm test in
 the future)
 * universal smoke_test for every module
 
 cgroup_client.py:
 * application which is executed and controlled using cgroups
 * contains smoke, memory, cpu and devices tests which were manually
 tested to break cgroup rules and will be used in the cgroup.py
 subtests
 
 Signed-off-by: Lukas Doktor ldok...@redhat.com
 ---
 client/tests/cgroup/cgroup.py | 236 ++
 client/tests/cgroup/cgroup_client.py | 116 +
 client/tests/cgroup/control | 12 ++
 3 files changed, 364 insertions(+), 0 deletions(-)
 create mode 100755 client/tests/cgroup/cgroup.py
 create mode 100755 client/tests/cgroup/cgroup_client.py
 create mode 100644 client/tests/cgroup/control
 
 diff --git a/client/tests/cgroup/cgroup.py
 b/client/tests/cgroup/cgroup.py
 new file mode 100755
 index 000..d043d65
 --- /dev/null
 +++ b/client/tests/cgroup/cgroup.py
 @@ -0,0 +1,236 @@
 +from autotest_lib.client.bin import test
 +from autotest_lib.client.common_lib import error
 +import os, logging
 +import time
 +from cgroup_common import Cgroup as CG
 +from cgroup_common import CgroupModules
 +
 +class cgroup(test.test):
 +    """
 +    Tests the cgroup functionalities
 +    """
 +    version = 1
 +    _client = ""
 +    modules = CgroupModules()
 +
 +
 +    def run_once(self):
 +        """
 +        Try to access different resources which are restricted by cgroup.
 +        """
 +        logging.info('Start')
 +
 +        err = ""
 +        # Run available tests
 +        for i in ['memory']:
 +            try:
 +                if self.modules.get_pwd(i):
 +                    if (eval ("self.test_%s()" % i)):
 +                        err += "%s, " % i
 +                else:
 +                    logging.error("CGROUP: Skipping test_%s, module not "
 +                                  "available/mounted", i)
 +                    err += "%s, " % i
 +            except Exception, inst:
 +                logging.error("CGROUP: test_%s fatal failure: %s", i, inst)
 +                err += "%s, " % i
 +
 +        if err:
 +            raise error.TestFail('CGROUP: Some subtests failed (%s)' % err[:-2])
 +
 +
 +    def setup(self):
 +        """
 +        Setup
 +        """
 +        logging.info('Setup')
 +
 +        self._client = os.path.join(self.bindir, "cgroup_client.py")
 +
 +        _modules = ['cpuset', 'ns', 'cpu', 'cpuacct', 'memory', 'devices',
 +                    'freezer', 'net_cls', 'blkio']
 +        if (self.modules.init(_modules) <= 0):
 +            raise error.TestFail('Can\'t mount any cgroup modules')
 +
 +
 +    def cleanup(self):
 +        """
 +        Unmount all cgroups and remove directories
 +        """
 +        logging.info('Cleanup')
 +        self.modules.cleanup()
 +
 +
 +    #
 +    # TESTS
 +    #
 +    def test_memory(self):
 +        """
 +        Memory test
 +        """
 +        # Preparation
 +        logging.info("Entering 'test_memory'")
 +        item = CG('memory', self._client)
 +        if item.initialize(self.modules):
 +            logging.error("test_memory: cgroup init failed")
 +            return -1
 +
 +        if item.smoke_test():
 +            logging.error("test_memory: smoke_test failed")
 +            return -1
 +
 +        pwd = item.mk_cgroup()
 +        if pwd == None:
 +            logging.error("test_memory: Can't create cgroup")
 +            return -1
 +
 +        logging.debug("test_memory: Memory filling test")
 +
 +        f = open('/proc/meminfo','r')
 +        mem = f.readline()
 +        while not mem.startswith("MemFree"):
 +            mem = f.readline()
 +        # Use only 1G or max of the free memory
 +        mem = min(int(mem.split()[1])/1024, 1024)
 +        mem = max(mem, 100) # at least 100M
 +        if (item.get_property("memory.memsw.limit_in_bytes", supress=True)
 +                                                                != None):
 +            memsw = True
 +            # Clear swap
 +            os.system("swapoff -a")
 +            os.system("swapon -a")
 +            f.seek(0)
 +            swap = f.readline()
 +            while not swap.startswith("SwapTotal"):
 +                swap = f.readline()
 +            swap = int(swap.split()[1])/1024
 +            if swap < mem / 2:
 +                logging.error("Not enough swap memory to test 'memsw'")
 +                memsw = False
 +        else:
 +            # Doesn't support swap+memory limitation, disable swap
 +            logging.info("'memsw' not supported")
 +            os.system("swapoff -a")
 +            memsw = False
 +        logging.debug("test_memory: Initialization passed")
 +
 +
 +        # Fill the memory without cgroup limitation
 +        # Should pass
 +
 +        ps = item.test("memfill %d" % mem)
 +        ps.stdin.write('\n')
 +        i = 0
 +        while ps.poll() == None:
 +            if i > 60:
 +                break
 +            i += 1
 +            time.sleep(1)
 +        if i > 60:
 +            logging.error("test_memory: Memory filling failed (WO cgroup)")
 +            ps.terminate()
 +            return -1
 +        if not ps.stdout.readlines()[-1].startswith("PASS"):
 +            logging.error("test_memory: Unsuccessful memory filling "
 +                          "(WO cgroup)")
 +            return -1
 +        logging.debug("test_memory: Memfill WO cgroup passed")
 +
 +
 +        # Fill the memory with 1/2 memory limit
 +        # memsw: should swap out part of the process and pass
 +        # WO memsw: should fail (SIGKILL)
 +
 +        ps = item.test("memfill %d" % mem)
 +        if item.set_cgroup(ps.pid, pwd):
 +            logging.error("test_memory:

Re: [RFC PATCH]vhost-blk: In-kernel accelerator for virtio block device

2011-08-08 Thread Liu Yuan


On 08/08/2011 01:04 PM, Badari Pulavarty wrote:

On 8/7/2011 6:35 PM, Liu Yuan wrote:

On 08/06/2011 02:02 AM, Badari Pulavarty wrote:

On 8/5/2011 4:04 AM, Liu Yuan wrote:

On 08/05/2011 05:58 AM, Badari Pulavarty wrote:

Hi Liu Yuan,

I started testing your patches. I applied your kernel patch to 3.0
and applied QEMU to latest git.

I passed 6 blockdevices from the host to guest (4 vcpu, 4GB RAM).
I ran simple dd read tests from the guest on all block devices
(with various blocksizes, iflag=direct).

Unfortunately, system doesn't stay up. I immediately get into
panic on the host. I didn't get time to debug the problem. Wondering
if you have seen this issue before and/or you have new patchset
to try ?

Let me know.

Thanks,
Badari



Okay, it is actually a bug pointed out by MST on the other thread, 
that it needs a mutex for completion thread.


Now would you please try this attachment? This patch only applies to the
kernel part, on top of the v1 kernel patch.


This patch mainly moves completion thread into vhost thread as a 
function. As a result, both requests submitting and completion 
signalling is in the same thread.


Yuan


Unfortunately, dd tests (4 out of 6) in the guest hung. I see 
following messages


virtio_blk virtio2: requests: id 0 is not a head !
virtio_blk virtio3: requests: id 1 is not a head !
virtio_blk virtio5: requests: id 1 is not a head !
virtio_blk virtio1: requests: id 1 is not a head !

I still see host panics. I will collect the host panic and see if 
it's still the same or not.


Thanks,
Badari


Would you please show me how to reproduce it step by step? I tried dd 
with two block device attached, but didn't get hung nor panic.


Yuan


I did 6 dds on 6 block devices..

dd if=/dev/vdb of=/dev/null bs=1M iflag=direct 
dd if=/dev/vdc of=/dev/null bs=1M iflag=direct 
dd if=/dev/vdd of=/dev/null bs=1M iflag=direct 
dd if=/dev/vde of=/dev/null bs=1M iflag=direct 
dd if=/dev/vdf of=/dev/null bs=1M iflag=direct 
dd if=/dev/vdg of=/dev/null bs=1M iflag=direct 

I can reproduce the problem within 3 minutes :(

Thanks,
Badari


Ah...I made an embarrassing mistake: I tried to 'free()' a 
kmem_cache object.


Would you please revert the vblk-for-kernel-2 patch and apply the new 
one attached in this letter?


Yuan,
Thanks
diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
index ecaf6fe..7a24aba 100644
--- a/drivers/vhost/blk.c
+++ b/drivers/vhost/blk.c
@@ -47,6 +47,7 @@ struct vhost_blk {
 	struct eventfd_ctx *ectx;
 	struct file *efile;
 	struct task_struct *worker;
+	struct vhost_poll poll;
 };
 
 struct used_info {
@@ -62,6 +63,7 @@ static struct kmem_cache *used_info_cachep;
 static void blk_flush(struct vhost_blk *blk)
 {
 	vhost_poll_flush(&blk->vq.poll);
+	vhost_poll_flush(&blk->poll);
 }
 
 static long blk_set_features(struct vhost_blk *blk, u64 features)
@@ -146,11 +148,11 @@ static long blk_reset_owner(struct vhost_blk *b)
 	blk_stop(b);
 	blk_flush(b);
 	ret = vhost_dev_reset_owner(&b->dev);
-	if (b->worker) {
-		b->should_stop = 1;
-		smp_mb();
-		eventfd_signal(b->ectx, 1);
-	}
+//	if (b->worker) {
+//		b->should_stop = 1;
+//		smp_mb();
+//		eventfd_signal(b->ectx, 1);
+//	}
 err:
 	mutex_unlock(&b->dev.mutex);
 	return ret;
@@ -361,8 +363,8 @@ static long vhost_blk_ioctl(struct file *f, unsigned int ioctl,
 		default:
 			mutex_lock(&blk->dev.mutex);
 			ret = vhost_dev_ioctl(&blk->dev, ioctl, arg);
-			if (!ret && ioctl == VHOST_SET_OWNER)
-				ret = blk_set_owner(blk);
+//			if (!ret && ioctl == VHOST_SET_OWNER)
+//				ret = blk_set_owner(blk);
 			blk_flush(blk);
 			mutex_unlock(&blk->dev.mutex);
 			break;
@@ -480,10 +482,50 @@ static void handle_guest_kick(struct vhost_work *work)
 	handle_kick(blk);
 }
 
+static void handle_completetion(struct vhost_work* work)
+{
+	struct vhost_blk *blk = container_of(work, struct vhost_blk, poll.work);
+	struct timespec ts = { 0 };
+	int ret, i, nr;
+	u64 count;
+
+	do {
+		ret = eventfd_ctx_read(blk->ectx, 1, &count);
+	} while (unlikely(ret == -ERESTARTSYS));
+
+	do {
+		nr = kernel_read_events(blk->ioctx, count, MAX_EVENTS, events, &ts);
+	} while (unlikely(nr == -EINTR));
+	dprintk("%s, count %llu, nr %d\n", __func__, count, nr);
+
+	if (unlikely(nr <= 0))
+		return;
+
+	for (i = 0; i < nr; i++) {
+		struct used_info *u = (struct used_info *)events[i].obj;
+		int len, status;
+
+		dprintk("%s, head %d complete in %d\n", __func__, u->head, i);
+		len = io_event_ret(&events[i]);
+		//status = u->len == len ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
+		status = len > 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
+		if (copy_to_user(u->status, &status, sizeof status)) {
+			vq_err(&blk->vq, "%s failed to write status\n", __func__);
+			BUG(); /* FIXME: maybe a bit radical? */
+		}
+		vhost_add_used(&blk->vq, u->head, u->len);
+		kmem_cache_free(used_info_cachep, u);
+	}
+
+	vhost_signal(&blk->dev, &blk->vq);
+}
+
 static void eventfd_setup(struct vhost_blk *blk)
 {
 	blk->efile = eventfd_file_create(0, 0);
 	blk->ectx =

Re: [PATCH] Permit -mem-path without sync mmu

2011-08-08 Thread Avi Kivity


On 08/08/2011 09:03 AM, David Gibson wrote:

Second, if userspace qemu passing hugepages to kvm can cause (host)
kernel memory corruption, that is clearly a host kernel bug.  So am I
correct in thinking this is basically just a safety feature if qemu is
run on a buggy kernel?


Seems so, yes.  2.6.2[456] are exploitable.  We only found out after 
these were all released.



Presumably this bug was corrected at some
point?  Is the presence of the SYNC_MMU feature just being used as a
proxy for "is this kernel recent enough to have the corruption bug
fixed"?


SYNC_MMU actually fixes the bug.


In any case this test sure as hell needs a big comment next to it
explaining this context.


Yes.




  Why are mmu notifiers not implemented for PPC again?

It's just not done yet; we're working on it.  (That is, mmu notifiers
are certainly present on PPC, it's just they're not wired up to kvm,
yet).



If ppc doesn't have this issue even without SYNC_MMU, we can make the 
check x86 specific.


--
error compiling committee.c: too many arguments to function


Re: kvm PCI assignment VFIO ramblings

2011-08-08 Thread Avi Kivity


On 08/03/2011 05:04 AM, David Gibson wrote:

I still don't understand the distinction you're making.  We're saying
the group is owned by a given user or guest in the sense that no-one
else may use anything in the group (including host drivers).  At that
point none, some or all of the devices in the group may actually be
used by the guest.

You seem to be making a distinction between "owned by" and "assigned
to" and "used by" and I really don't see what it is.



Alex (and I) think that we should work with device/function granularity, 
as is common with other archs, and that the group thing is just a 
constraint on which functions may be assigned where, while you think 
that we should work at group granularity, with 1-function groups for 
archs which don't have constraints.


Is this an accurate way of putting it?

--
error compiling committee.c: too many arguments to function


[PATCH] Adds cgroup handling library

2011-08-08 Thread Lukas Doktor

[new] cgroup_common.py
* library for handling cgroups

Signed-off-by: Lukas Doktor ldok...@redhat.com
---
 client/tests/cgroup/cgroup.py|5 +-
 client/tests/cgroup/cgroup_common.py |  327 ++
 2 files changed, 331 insertions(+), 1 deletions(-)
 create mode 100755 client/tests/cgroup/cgroup_common.py

diff --git a/client/tests/cgroup/cgroup.py b/client/tests/cgroup/cgroup.py
index d043d65..112f012 100755
--- a/client/tests/cgroup/cgroup.py
+++ b/client/tests/cgroup/cgroup.py
@@ -118,6 +118,7 @@ class cgroup(test.test):
         # Fill the memory without cgroup limitation
         # Should pass
 
+        logging.debug("test_memory: Memfill WO cgroup")
         ps = item.test("memfill %d" % mem)
         ps.stdin.write('\n')
         i = 0
@@ -141,6 +142,7 @@ class cgroup(test.test):
         # memsw: should swap out part of the process and pass
         # WO memsw: should fail (SIGKILL)
 
+        logging.debug("test_memory: Memfill mem only limit")
         ps = item.test("memfill %d" % mem)
         if item.set_cgroup(ps.pid, pwd):
             logging.error("test_memory: Could not set cgroup")
@@ -187,6 +189,7 @@ class cgroup(test.test):
         # Fill the memory with 1/2 memory+swap limit
         # Should fail
 
+        logging.debug("test_memory: Memfill mem + swap limit")
         if memsw:
             ps = item.test("memfill %d" % mem)
             if item.set_cgroup(ps.pid, pwd):
@@ -226,11 +229,11 @@ class cgroup(test.test):
             logging.debug("test_memory: Memfill mem+swap cgroup passed")
 
         # cleanup
+        logging.debug("test_memory: Cleanup")
         if item.rm_cgroup(pwd):
             logging.error("test_memory: Can't remove cgroup directory")
             return -1
         os.system("swapon -a")
-        logging.debug("test_memory: Cleanup passed")
 
         logging.info("Leaving 'test_memory': PASSED")
         return 0
diff --git a/client/tests/cgroup/cgroup_common.py 
b/client/tests/cgroup/cgroup_common.py
new file mode 100755
index 000..3fd1cf7
--- /dev/null
+++ b/client/tests/cgroup/cgroup_common.py
@@ -0,0 +1,327 @@
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+"""
+Helpers for cgroup testing
+
+@copyright: 2011 Red Hat Inc.
+@author: Lukas Doktor ldok...@redhat.com
+"""
+import os, logging
+import subprocess
+from tempfile import mkdtemp
+import time
+
+class Cgroup:
+    """
+    Cgroup handling class
+    """
+    def __init__(self, module, _client):
+        """
+        Constructor
+        @param module: Name of the cgroup module
+        @param _client: Test script pwd+name
+        """
+        self.module = module
+        self._client = _client
+        self.root = None
+
+
+    def initialize(self, modules):
+        """
+        Initializes object for use
+        @param modules: array of all available cgroup modules
+        @return: 0 when PASSED
+        """
+        self.root = modules.get_pwd(self.module)
+        if self.root:
+            return 0
+        else:
+            logging.error("cg.initialize(): Module %s not found", self.module)
+            return -1
+        return 0
+
+
+    def mk_cgroup(self, root=None):
+        """
+        Creates new temporary cgroup
+        @param root: where to create this cgroup (default: self.root)
+        @return: 0 when PASSED
+        """
+        try:
+            if root:
+                pwd = mkdtemp(prefix='cgroup-', dir=root) + '/'
+            else:
+                pwd = mkdtemp(prefix='cgroup-', dir=self.root) + '/'
+        except Exception, inst:
+            logging.error("cg.mk_cgroup(): %s" , inst)
+            return None
+        return pwd
+
+
+    def rm_cgroup(self, pwd, supress=False):
+        """
+        Removes cgroup
+        @param pwd: cgroup directory
+        @param supress: supress output
+        @return: 0 when PASSED
+        """
+        try:
+            os.rmdir(pwd)
+        except Exception, inst:
+            if not supress:
+                logging.error("cg.rm_cgroup(): %s" , inst)
+            return -1
+        return 0
+
+
+    def test(self, cmd):
+        """
+        Executes cgroup_client.py with cmd parameter
+        @param cmd: command to be executed
+        @return: subprocess.Popen() process
+        """
+        logging.debug("cg.test(): executing parallel process '%s'" , cmd)
+        process = subprocess.Popen((self._client + ' ' + cmd), shell=True,
+                                   stdin=subprocess.PIPE, stdout=subprocess.PIPE,
+                                   stderr=subprocess.PIPE, close_fds=True)
+        return process
+
+
+    def is_cgroup(self, pid, pwd):
+        """
+        Checks if the 'pid' process is in 'pwd' cgroup
+        @param pid: pid of the process
+        @param pwd: cgroup directory
+        @return: 0 when is 'pwd' member
+        """
+        if open(pwd+'/tasks').readlines().count("%d\n

Missing cgroup_common.py

2011-08-08 Thread Lukas Doktor

Hi,

I'm sorry for missing cgroup_common.py in previous patchset. I forgot to add it 
to git. You can find it in attached patch.

Regards,
Lukáš


Re: [RFC] postcopy livemigration proposal

2011-08-08 Thread Dor Laor


On 08/08/2011 06:24 AM, Isaku Yamahata wrote:

This mail is on Yabusame: Postcopy Live Migration for Qemu/KVM
on which we'll give a talk at KVM-forum.
The purpose of this mail is to let developers know about it in advance
so that we can get better feedback on its design/implementation approach
early, before we start to implement it.


Background
==========
* What is postcopy livemigration
It is yet another live migration mechanism for Qemu/KVM, which
implements the migration technique known as postcopy or lazy
migration. Just after the migrate command is invoked, the execution
host of a VM is instantaneously switched to a destination host.

The benefit is, total migration time is shorter because it transfers
a page only once. On the other hand precopy may repeat sending the same
pages again and again because they can be dirtied.
The switching time from the source to the destination is several
hundred milliseconds so that it enables quick load balancing.
For details, please refer to the papers.
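
As a rough illustration of the "a page only once" argument above, here is a minimal sketch in Python. It is a toy model, not anything taken from the Yabusame code: the function names and all parameters (page counts, dirty rate, round limit, stop threshold) are assumptions made up for the example.

```python
def precopy_transfers(num_pages, dirty_per_round, max_rounds, stop_threshold):
    """Count pages sent by an iterative precopy migration (toy model).

    Assumption: the guest dirties a fixed number of pages during each
    copy round; real workloads are of course not this regular.
    """
    pending = num_pages          # the first round sends all of guest RAM
    sent = 0
    for _ in range(max_rounds):
        sent += pending
        # Pages dirtied while the round was in flight must be re-sent.
        pending = min(dirty_per_round, num_pages)
        if pending <= stop_threshold:
            break
    # Final stop-and-copy phase sends whatever is still dirty.
    return sent + pending


def postcopy_transfers(num_pages):
    """In postcopy every page crosses the wire exactly once."""
    return num_pages


pre = precopy_transfers(num_pages=1000, dirty_per_round=200,
                        max_rounds=5, stop_threshold=50)
post = postcopy_transfers(1000)
print("precopy sent %d pages, postcopy sent %d pages" % (pre, post))
```

With these made-up numbers precopy sends twice as many pages as postcopy; the gap shrinks when the guest dirties few pages per round.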

We believe this is useful for others so that we'd like to merge this
feature into the upstream qemu/kvm. The existing implementation that
we have right now is very ad-hoc because it's for academic research.
For the upstream merge, we're starting to re-design/implement it and
we'd like to get feedback early.  Although many improvements/optimizations
are possible, we should implement/merge the simple/clean, but extensible
as well, one at first and then improve/optimize it later.

postcopy livemigration will be introduced as an optional feature. The existing
precopy livemigration remains the default behavior.


* related links:
project page
http://sites.google.com/site/grivonhome/quick-kvm-migration

Enabling Instantaneous Relocation of Virtual Machines with a
Lightweight VMM Extension,
(proof-of-concept, ad-hoc prototype. not a new design)
http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-paper.pdf
http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-talk.pdf

Reactive consolidation of virtual machines enabled by postcopy live migration
(advantage for VM consolidation)
http://portal.acm.org/citation.cfm?id=1996125
http://www.emn.fr/x-info/ascola/lib/exe/fetch.php?media=internet:vtdc-postcopy.pdf

Qemu wiki
http://wiki.qemu.org/Features/PostCopyLiveMigration


Design/Implementation
=====================
The basic idea of postcopy livemigration is to use a sort of distributed
shared memory between the migration source and destination.

The migration procedure looks like
   - start migration
     stop the guest VM on the source and send the machine states except
     guest RAM to the destination
   - resume the guest VM on the destination without guest RAM contents
   - Hook guest access to pages, and pull page contents from the source
     This continues until all the pages are pulled to the destination

   The big picture is depicted at
   http://wiki.qemu.org/File:Postcopy-livemigration.png
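
The procedure above can be sketched as a small Python simulation. This is only an illustration of demand-pulling pages, under the assumption that a dictionary lookup stands in for the network fetch; the `Source` and `Destination` classes and their methods are hypothetical names, not part of qemu or of the proposed design.

```python
class Source:
    """Guest RAM left behind on the migration source (hypothetical model)."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.requests = 0        # how many pages were pulled over the "wire"

    def fetch(self, pfn):
        self.requests += 1
        return self.pages[pfn]


class Destination:
    """Guest resumed on the destination with no RAM contents yet."""

    def __init__(self, source, num_pages):
        self.source = source
        self.num_pages = num_pages
        self.ram = {}            # pages that have already arrived

    def read(self, pfn):
        # Hooked guest access: a missing page is pulled from the source.
        if pfn not in self.ram:
            self.ram[pfn] = self.source.fetch(pfn)
        return self.ram[pfn]

    def pull_remaining(self):
        # Background pull until every page is on the destination.
        for pfn in range(self.num_pages):
            if pfn not in self.ram:
                self.ram[pfn] = self.source.fetch(pfn)

    def complete(self):
        return len(self.ram) == self.num_pages


src = Source({pfn: "data-%d" % pfn for pfn in range(8)})
dst = Destination(src, 8)
dst.read(3)                      # first touch faults and pulls the page
dst.read(3)                      # second touch is served locally
faults_so_far = src.requests
dst.pull_remaining()
print(faults_so_far, src.requests, dst.complete())
```

Each page is fetched at most once, whether it arrives through a guest access or through the background pull.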


That's terrific  (nice video also)!
Orit and myself had the exact same idea too (now we can't patent it..).

Advantages:
- No down time due to memory copying.
- Efficient: reduces the needed traffic, since there is no need to
re-send pages.
- Reduces overall RAM consumption of the source and destination
as opposed to current live migration (both the source and the
destination allocate the memory until the live migration
completes). We can free copied memory once the destination guest
has received it and save RAM.
- Increases parallelism for SMP guests: we can have multiple
virtual CPUs handle their demand paging. Less time holding a
global lock, less thread contention.
- Virtual machines are using more and more memory resources;
for a virtual machine with a very large working set, doing live
migration with reasonable down time is impossible today.

Disadvantages:
- During the live migration the guest will run slower than in
today's live migration. We need to remember that even today
guests suffer from a performance penalty on the source during the
COW stage (memory copy).
- Failure of the source or destination or the network will cause
us to lose the running virtual machine. Those failures are very
rare.
In case there is shared storage we can store a copy of the
memory there, which can be recovered in case of such a failure.

Overall, it looks like a better approach for the vast majority of cases.
Hope it will get merged to kvm and become the default way.




There are several design points.
   - who takes care of pulling page contents.
     an independent daemon vs a thread in qemu
     The daemon approach is preferable because an independent daemon would
     make it easy to debug the postcopy memory mechanism without qemu.
     If required, it wouldn't be difficult to convert a daemon into
     a thread in qemu

   - connection between the source and the destination

Re: [Qemu-devel] [RFC] postcopy livemigration proposal

2011-08-08 Thread Stefan Hajnoczi

On Mon, Aug 8, 2011 at 4:24 AM, Isaku Yamahata yamah...@valinux.co.jp wrote:
 This mail is on Yabusame: Postcopy Live Migration for Qemu/KVM
 on which we'll give a talk at KVM-forum.

I'm curious if this approach is compatible with asynchronous page
faults?  The idea there was to tell the guest about a page fault so it
can continue to do useful work in the meantime (if the fault was in
guest userspace).

Stefan

Re: [RFC] postcopy livemigration proposal

2011-08-08 Thread Yaniv Kaul


On 08/08/2011 12:20, Dor Laor wrote:

On 08/08/2011 06:24 AM, Isaku Yamahata wrote:

This mail is on Yabusame: Postcopy Live Migration for Qemu/KVM
on which we'll give a talk at KVM-forum.
The purpose of this mail is to let developers know about it in advance
so that we can get better feedback on its design/implementation approach
early, before we start to implement it.


Background
==========
* What is postcopy livemigration
It is yet another live migration mechanism for Qemu/KVM, which
implements the migration technique known as postcopy or lazy
migration. Just after the migrate command is invoked, the execution
host of a VM is instantaneously switched to a destination host.

The benefit is, total migration time is shorter because it transfers
a page only once. On the other hand precopy may repeat sending the same
pages again and again because they can be dirtied.
The switching time from the source to the destination is several
hundred milliseconds so that it enables quick load balancing.
For details, please refer to the papers.

We believe this is useful for others so that we'd like to merge this
feature into the upstream qemu/kvm. The existing implementation that
we have right now is very ad-hoc because it's for academic research.
For the upstream merge, we're starting to re-design/implement it and
we'd like to get feedback early.  Although many improvements/optimizations
are possible, we should implement/merge the simple/clean, but extensible
as well, one at first and then improve/optimize it later.

postcopy livemigration will be introduced as an optional feature. The existing
precopy livemigration remains the default behavior.


* related links:
project page
http://sites.google.com/site/grivonhome/quick-kvm-migration

Enabling Instantaneous Relocation of Virtual Machines with a
Lightweight VMM Extension,
(proof-of-concept, ad-hoc prototype. not a new design)
http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-paper.pdf
http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-talk.pdf

Reactive consolidation of virtual machines enabled by postcopy live migration
(advantage for VM consolidation)
http://portal.acm.org/citation.cfm?id=1996125
http://www.emn.fr/x-info/ascola/lib/exe/fetch.php?media=internet:vtdc-postcopy.pdf

Qemu wiki
http://wiki.qemu.org/Features/PostCopyLiveMigration


Design/Implementation
=====================
The basic idea of postcopy livemigration is to use a sort of distributed
shared memory between the migration source and destination.

The migration procedure looks like
   - start migration
     stop the guest VM on the source and send the machine states except
     guest RAM to the destination
   - resume the guest VM on the destination without guest RAM contents
   - Hook guest access to pages, and pull page contents from the source
     This continues until all the pages are pulled to the destination

   The big picture is depicted at
   http://wiki.qemu.org/File:Postcopy-livemigration.png


That's terrific  (nice video also)!
Orit and myself had the exact same idea too (now we can't patent it..).

Advantages:
- No down time due to memory copying.
- Efficient: reduces the needed traffic, since there is no need to re-send pages.
- Reduces overall RAM consumption of the source and destination
as opposed to current live migration (where both the source and the
destination allocate the memory until the live migration
completes). We can free copied memory once the destination guest
has received it and save RAM.
- Increases parallelism for SMP guests: we can have multiple
virtual CPUs handle their demand paging. Less time holding a
global lock, less thread contention.
- Virtual machines are using more and more memory resources;
for a virtual machine with a very large working set, doing live
migration with reasonable downtime is impossible today.

Disadvantages:
- During the live migration the guest will run slower than in
today's live migration. We need to remember that even today
guests suffer from a performance penalty on the source during the
COW stage (memory copy).
- Failure of the source or destination or the network will cause
us to lose the running virtual machine. Those failures are very
rare.


I highly doubt that's acceptable in enterprise deployments.


In case there is shared storage we can store a copy of the
memory there, which can be recovered in case of such a failure.

Overall, it looks like a better approach for the vast majority of cases.
Hope it will get merged to kvm and become the default way.




There are several design points.
   - who takes care of pulling page contents.
 an independent daemon vs a thread in qemu
 The daemon approach is preferable because an independent daemon would
 make it easy to debug the postcopy memory mechanism without qemu.
 If required, it wouldn't be

Re: [Qemu-devel] [RFC] postcopy livemigration proposal

2011-08-08 Thread Isaku Yamahata

On Mon, Aug 08, 2011 at 10:38:35AM +0100, Stefan Hajnoczi wrote:
 On Mon, Aug 8, 2011 at 4:24 AM, Isaku Yamahata yamah...@valinux.co.jp wrote:
  This mail is on Yabusame: Postcopy Live Migration for Qemu/KVM
  on which we'll give a talk at KVM-forum.
 
 I'm curious if this approach is compatible with asynchronous page
 faults?  The idea there was to tell the guest about a page fault so it
 can continue to do useful work in the meantime (if the fault was in
 guest userspace).

Yes. It's quite possible to inject an async page fault into the guest
when the faulted page isn't available on the destination. At the same
time the page will be requested from the source of the migration.
I think it's not so difficult.
-- 
yamahata
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: percpu crash on NetBurst

2011-08-08 Thread Tejun Heo

Hello, Avi.

On Sun, Aug 07, 2011 at 06:32:35PM +0300, Avi Kivity wrote:
 qemu, under some conditions (-cpu host or -cpu kvm64), erroneously
 passes family=15 as the virtual cpuid.  This causes a BUG() in
 percpu code during late boot:
 
 ------------[ cut here ]------------
 kernel BUG at mm/percpu.c:577!

This means that free_percpu() was passed a pointer which doesn't point
to the start of an allocated area, i.e. the caller is trying to free an
invalid pointer.  Hmm... from the backtrace, it seems to be caused
by super_block->s_files.  Weird.

  [811060cc] free_percpu+0x8c/0x140
  [811462a5] __put_super+0x45/0x80
  [811463d5] put_super+0x25/0x40
  [8114651a] deactivate_locked_super+0x5a/0x70
  [81146f0e] deactivate_super+0x4e/0x70
  [811614e5] mntput_no_expire+0xb5/0x100
  [8116154f] mntput+0x1f/0x30
  [81245855] mq_put_mnt+0x15/0x20
  [81245f77] put_ipc_ns+0x47/0xa0
  [81080232] free_nsproxy+0x42/0x90
  [81080440] switch_task_namespaces+0x50/0x60
  [81080460] exit_task_namespaces+0x10/0x20
  [8105d29c] do_exit+0x46c/0x870
  [8105da02] do_group_exit+0x42/0xa0
  [8105da77] sys_exit_group+0x17/0x20
  [81521382] system_call_fastpath+0x16/0x1b
 Code: e7 41 89 54 24 14 e8 f2 fd ff ff 5b 41 5c 41 5d 41 5e 5d c3 31
 f6 31 db e9 f5 fe ff ff 45 31 ed 31 c9 31 db e9 02 ff ff ff 0f 0b
 0f 0b 55 48 89 e5 48 83 ec 20 48 89 5d e0 4c 89 65 e8 4c 89 6d
 RIP  [8110603e] pcpu_free_area+0x17e/0x180
  RSP 880001cabd18
 ---[ end trace 87bc11c05d27169e ]---
 
 I traced this to the kernel cpuid code determining the cache line size:
 
 arch/x86/kernel/cpu/intel.c:
 
 if (c->x86 == 15)
     c->x86_cache_alignment = c->x86_clflush_size * 2;
 
 If I comment out this code, the kernel boots and all is well.
 
 I suspect that the percpu code sometimes uses x86_cache_alignment
 and sometimes some hardcoded macro; I saw some negative elements of
 chunk->map[].

The negative elements indicate allocated areas.
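
To make the encoding concrete, here is a simplified model of such a map.
It is a sketch under assumptions, not the kernel's mm/percpu.c:
find_allocated_area() and free_area() are hypothetical names, each entry
holds an area size with a negative sign meaning "allocated", and merging of
neighbouring free areas is left out. Freeing an offset that is not the start
of an allocated area is the case that ends in the BUG() quoted above.

```c
#include <stddef.h>

/*
 * Return the index of the allocated area that starts exactly at byte
 * offset `off`, or -1 if `off` is inside an area, starts a free area,
 * or lies past the end of the map.
 */
int find_allocated_area(const int *map, size_t n, int off)
{
    int start = 0;

    for (size_t i = 0; i < n; i++) {
        int size = map[i] < 0 ? -map[i] : map[i];

        if (start == off)
            return map[i] < 0 ? (int)i : -1;
        if (start > off)
            break;              /* walked past it: off is mid-area */
        start += size;
    }
    return -1;
}

/*
 * Mark the area starting at `off` as free by flipping its sign.
 * Returns 0 on success and -1 for an invalid pointer; the kernel
 * would hit BUG() in the latter case instead of returning.
 */
int free_area(int *map, size_t n, int off)
{
    int i = find_allocated_area(map, n, off);

    if (i < 0)
        return -1;
    map[i] = -map[i];
    return 0;
}
```

With the map {-16, 32, -64, 16}, offsets 0 and 48 start allocated areas,
offset 16 starts a free one, and offset 8 points into the middle of the
first area.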

 All this applies to v3.0; current upstream (c2f340a69ca) fails even
 worse, haven't yet determined exactly why.
 
 I'm surprised this hasn't been reported before; Ingo, don't you have
 family=15 hosts in your test farm?

Hmmm... I can't trigger the problem w/ kvm64 (I tried mounting and
unmounting filesystems but it worked okay) and am quite skeptical this
is a widespread problem given that the percpu core code is used very
widely and hasn't seen a lot of changes lately.  Is there anything
specific you need to do to trigger the condition?  Can you try to
print out the s_files addresses being allocated and freed?

Thanks.

-- 
tejun

Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init

2011-08-08 Thread Michael S. Tsirkin

On Fri, Aug 05, 2011 at 08:52:25AM -0500, Anthony Liguori wrote:
 On 08/04/2011 08:05 AM, Avi Kivity wrote:
 From: Michael S. Tsirkin <m...@redhat.com>
 
 We originally did get config on map, so that
 following write accesses are done on an updated config.
 New memory API doesn't give us a callback
 on map, and arguably, devices don't know when
 cpu really can access there. So updating on
 init seems cleaner.
 
 Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
 Signed-off-by: Avi Kivity <a...@redhat.com>
 ---
  hw/virtio-pci.c |    7 ++++---
  1 files changed, 4 insertions(+), 3 deletions(-)
 
 diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
 index d685243..ca1f12f 100644
 --- a/hw/virtio-pci.c
 +++ b/hw/virtio-pci.c
 @@ -506,9 +506,6 @@ static void virtio_map(PCIDevice *pci_dev, int region_num,
      register_ioport_read(addr, config_len, 1, virtio_pci_config_readb, proxy);
      register_ioport_read(addr, config_len, 2, virtio_pci_config_readw, proxy);
      register_ioport_read(addr, config_len, 4, virtio_pci_config_readl, proxy);
 -
 -    if (vdev->config_len)
 -        vdev->get_config(vdev, vdev->config);
  }
 
  static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
 @@ -689,6 +686,10 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
      proxy->host_features |= 0x1 << VIRTIO_F_NOTIFY_ON_EMPTY;
      proxy->host_features |= 0x1 << VIRTIO_F_BAD_FEATURE;
      proxy->host_features = vdev->get_features(vdev, proxy->host_features);
 +
 +    if (vdev->config_len) {
 +        vdev->get_config(vdev, vdev->config);
 +    }
 
 Thinking more closely, I don't think this is right.
 
 Updating on map ensured that the config was refreshed after each
 time the bar was mapped.  At the very least, the config needs to be
 refreshed during reset because the guest may write to the guest
 space which should get cleared after reset.
 
 Regards,
 
 Anthony Liguori

Not sure I understand. Which register, for example,
do you have in mind?
Could you clarify please?


   }
 
   static int virtio_blk_init_pci(PCIDevice *pci_dev)

Re: [RFC] postcopy livemigration proposal

2011-08-08 Thread Nadav Har'El

 * What is postcopy live migration
 It is yet another live migration mechanism for Qemu/KVM, which
 implements the migration technique known as postcopy or lazy
 migration. Just after the migrate command is invoked, the execution
 host of a VM is instantaneously switched to a destination host.

Sounds like a cool idea.

 The benefit is, total migration time is shorter because it transfers
 a page only once. On the other hand precopy may repeat sending the same pages
 again and again because they can be dirtied.
 The switching time from the source to the destination is several
 hundred milliseconds so that it enables quick load balancing.
 For details, please refer to the papers.
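
The "only once" versus "again and again" contrast in the quoted text can be
put into numbers with a toy model. This is a sketch under loudly stated
assumptions, not a description of qemu's migration code: the function names
and all parameters are hypothetical, the guest dirties pages at a constant
rate, every dirtied page is distinct, and bandwidth is constant.

```c
/*
 * Pages sent by iterative precopy. Each round sends the current dirty
 * set; while it is in flight the guest dirties more pages, which form
 * the next round. Rounds stop once the dirty set is at most stop_pages
 * (or after max_rounds), and the remainder goes in a final
 * stop-and-copy pass. The multiplication can overflow for very large
 * inputs; this toy model ignores that.
 */
unsigned long precopy_pages_sent(unsigned long total_pages,
                                 unsigned long bw_pages_per_s,
                                 unsigned long dirty_pages_per_s,
                                 unsigned long stop_pages,
                                 int max_rounds)
{
    unsigned long sent = 0;
    unsigned long to_send = total_pages;

    for (int round = 0; round < max_rounds && to_send > stop_pages; round++) {
        unsigned long dirtied;

        sent += to_send;
        /* round time is to_send / bw, so dirtied = time * dirty rate */
        dirtied = to_send * dirty_pages_per_s / bw_pages_per_s;
        if (dirtied > total_pages)
            dirtied = total_pages;
        to_send = dirtied;
    }
    return sent + to_send;      /* final stop-and-copy pass */
}

/* Postcopy pulls every page exactly once. */
unsigned long postcopy_pages_sent(unsigned long total_pages)
{
    return total_pages;
}
```

For example, with 1000 pages, a link that moves 100 pages/s and a guest that
dirties 50 pages/s, this model sends 1990 pages under precopy against 1000
under postcopy. When the dirty rate reaches the link rate, precopy never
converges and only the round limit ends it.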

While these are the obvious benefits, the possible downside (that, as
always, depends on the workload) is the amount of time that the guest
workload runs more slowly than usual, waiting for pages it needs to
continue. There is a whole spectrum between the guest pausing completely
(which would solve all the problems of migration, but is often considered
unacceptable) and running at full-speed. Is it acceptable that the guest
runs at 90% speed during the migration? 50%? 10%?
I guess we could have nothing to lose from having both options, and choosing
the most appropriate technique for each guest!

 That's terrific  (nice video also)!
 Orit and myself had the exact same idea too (now we can't patent it..).

I think new implementation is not the only reason why you cannot patent
this idea :-) Demand-paged migration has actually been discussed (and done)
for nearly a quarter of a century (!) in the area of *process* migration.

The first use I'm aware of was in CMU's Accent 1987 - see [1].
Another paper, [2], written in 1991, discusses how process migration is done
in UCB's Sprite operating system, and evaluates the various alternatives
common at the time (20 years ago), including what it calls lazy copying,
which is more-or-less the same thing as post copy. Mosix (a project which, in some
sense, is still alive today) also used some sort of cross between pre-copying
(of dirty pages) and copying on-demand of clean pages (from their backing
store on the source machine).


References
[1] Attacking the Process Migration Bottleneck
 http://www.nd.edu/~dthain/courses/cse598z/fall2004/papers/accent.pdf
[2]  Transparent Process Migration: Design Alternatives and the Sprite
 Implementation
 http://nd.edu/~dthain/courses/cse598z/fall2004/papers/sprite-migration.pdf

 Advantages:
 - Virtual machines are using more and more memory resources;
 for a virtual machine with a very large working set, doing live
 migration with reasonable downtime is impossible today.

If a guest actually constantly uses (working set) most of its allocated
memory, it will basically be unable to do any significant amount of work
on the destination VM until this large working set is transferred to the
destination. So in this scenario, post copying doesn't give any
significant advantages over plain-old pause guest and send it to the
destination. Or am I missing something?

 Disadvantages:
 - During the live migration the guest will run slower than in
 today's live migration. We need to remember that even today
 guests suffer from a performance penalty on the source during the
 COW stage (memory copy).

I wonder if something like asynchronous page faults can help somewhat with
multi-process guest workloads (and modified (PV) guest OS). 

 - Failure of the source or destination or the network will cause
 us to lose the running virtual machine. Those failures are very
 rare.

How is this different from a VM running on a single machine that fails?
Just that the small probability of failure (roughly) doubles for the
relatively-short duration of the transfer?


-- 
Nadav Har'El|   Monday, Aug  8 2011, 8 Av 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |If glory comes after death, I'm not in a
http://nadav.harel.org.il   |hurry. (Latin proverb)

[PATCH] posix-aio-compat: fix latency issues

2011-08-08 Thread Avi Kivity

In certain circumstances, posix-aio-compat can incur a lot of latency:
 - threads are created by vcpu threads, so if vcpu affinity is set,
   aio threads inherit vcpu affinity.  This can cause many aio threads
   to compete for one cpu.
 - we can create up to max_threads (64) aio threads in one go; since a
   pthread_create can take around 30μs, we have up to 2ms of cpu time
   under a global lock.

Fix by:
 - moving thread creation to the main thread, so we inherit the main
   thread's affinity instead of the vcpu thread's affinity.
 - if a thread is currently being created, and we need to create yet
   another thread, let the thread being born create the new thread, reducing
   the amount of time we spend under the main thread.
 - drop the local lock while creating a thread (we may still hold the
   global mutex, though)

Note this doesn't eliminate latency completely; scheduler artifacts or
lack of host cpu resources can still cause it.  We may want pre-allocated
threads when this cannot be tolerated.

Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 posix-aio-compat.c |   48 ++--
 1 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index 8dc00cb..aa30673 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -30,6 +30,7 @@
 
 #include "block/raw-posix-aio.h"
 
+static void do_spawn_thread(void);
 
 struct qemu_paiocb {
     BlockDriverAIOCB common;
@@ -64,6 +65,9 @@ static pthread_attr_t attr;
 static int max_threads = 64;
 static int cur_threads = 0;
 static int idle_threads = 0;
+static int new_threads = 0;     /* backlog of threads we need to create */
+static int pending_threads = 0; /* threads created but not running yet */
+static QEMUBH *new_thread_bh;
 static QTAILQ_HEAD(, qemu_paiocb) request_list;
 
 #ifdef CONFIG_PREADV
@@ -311,6 +315,13 @@ static void *aio_thread(void *unused)
 
     pid = getpid();
 
+    mutex_lock(&lock);
+    if (new_threads) {
+        do_spawn_thread();
+    }
+    pending_threads--;
+    mutex_unlock(&lock);
+
     while (1) {
         struct qemu_paiocb *aiocb;
         ssize_t ret = 0;
@@ -381,11 +392,18 @@ static void *aio_thread(void *unused)
     return NULL;
 }
 
-static void spawn_thread(void)
+static void do_spawn_thread(void)
 {
     sigset_t set, oldset;
 
-    cur_threads++;
+    if (!new_threads) {
+        return;
+    }
+
+    new_threads--;
+    pending_threads++;
+
+    mutex_unlock(&lock);
 
     /* block all signals */
     if (sigfillset(&set)) die("sigfillset");
@@ -394,6 +412,31 @@ static void spawn_thread(void)
     thread_create(&thread_id, &attr, aio_thread, NULL);
 
     if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore");
+
+    mutex_lock(&lock);
+}
+
+static void spawn_thread_bh_fn(void *opaque)
+{
+    mutex_lock(&lock);
+    do_spawn_thread();
+    mutex_unlock(&lock);
+}
+
+static void spawn_thread(void)
+{
+    cur_threads++;
+    new_threads++;
+    /* If there are threads being created, they will spawn new workers, so
+     * we don't spend time creating many threads in a loop holding a mutex or
+     * starving the current vcpu.
+     *
+     * If there are no idle threads, ask the main thread to create one, so we
+     * inherit the correct affinity instead of the vcpu affinity.
+     */
+    if (!pending_threads) {
+        qemu_bh_schedule(new_thread_bh);
+    }
 }
 
 static void qemu_paio_submit(struct qemu_paiocb *aiocb)
@@ -665,6 +708,7 @@ int paio_init(void)
         die2(ret, "pthread_attr_setdetachstate");
 
     QTAILQ_INIT(&request_list);
+    new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL);
 
     posix_aio_state = s;
     return 0;
-- 
1.7.5.3
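
The chained creation scheme described in the commit message can be tried
outside qemu with plain pthreads. What follows is a minimal sketch under
assumptions, not the patch: request_workers(), wait_for_workers() and the
started counter are hypothetical additions, the caller of request_workers()
plays the role of the main thread (there is no bottom half here), and the
worker decrements pending_threads before it checks the backlog, which is a
choice of this sketch so that a request queued while a thread is being born
is always picked up by someone.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t started_cond = PTHREAD_COND_INITIALIZER;
static int new_threads;      /* backlog of workers still to create */
static int pending_threads;  /* workers created but not running yet */
static int started;          /* workers that have begun running */

static void *worker(void *unused);

/* Called with lock held; drops it around the slow pthread_create(). */
static void do_spawn_thread(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    if (!new_threads) {
        return;
    }
    new_threads--;
    pending_threads++;
    pthread_mutex_unlock(&lock);

    if (pthread_attr_init(&attr) ||
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) ||
        pthread_create(&tid, &attr, worker, NULL)) {
        abort();             /* keep the sketch simple: no recovery */
    }
    pthread_attr_destroy(&attr);

    pthread_mutex_lock(&lock);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    pending_threads--;       /* this worker is running now */
    do_spawn_thread();       /* the newborn creates the next one, if any */
    started++;
    pthread_cond_broadcast(&started_cond);
    pthread_mutex_unlock(&lock);
    /* a real worker would now loop servicing requests */
    return NULL;
}

/* Queue n workers; create one directly only if none is being born. */
void request_workers(int n)
{
    pthread_mutex_lock(&lock);
    for (int i = 0; i < n; i++) {
        new_threads++;
        if (!pending_threads) {
            do_spawn_thread();
        }
    }
    pthread_mutex_unlock(&lock);
}

/* Block until at least n workers have started; return the count. */
int wait_for_workers(int n)
{
    int count;

    pthread_mutex_lock(&lock);
    while (started < n) {
        pthread_cond_wait(&started_cond, &lock);
    }
    count = started;
    pthread_mutex_unlock(&lock);
    return count;
}
```

A caller would do request_workers(8) and then wait_for_workers(8); the
requesting thread pays for at most a few pthread_create() calls, and the
rest of the backlog is worked off by the threads as they come up.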


Re: [RFC] postcopy livemigration proposal

2011-08-08 Thread Dor Laor


On 08/08/2011 01:59 PM, Nadav Har'El wrote:

* What is postcopy live migration
It is yet another live migration mechanism for Qemu/KVM, which
implements the migration technique known as postcopy or lazy
migration. Just after the migrate command is invoked, the execution
host of a VM is instantaneously switched to a destination host.


Sounds like a cool idea.


The benefit is, total migration time is shorter because it transfers
a page only once. On the other hand precopy may repeat sending the same pages
again and again because they can be dirtied.
The switching time from the source to the destination is several
hundred milliseconds so that it enables quick load balancing.
For details, please refer to the papers.


While these are the obvious benefits, the possible downside (that, as
always, depends on the workload) is the amount of time that the guest
workload runs more slowly than usual, waiting for pages it needs to
continue. There is a whole spectrum between the guest pausing completely
(which would solve all the problems of migration, but is often considered
unacceptable) and running at full-speed. Is it acceptable that the guest
runs at 90% speed during the migration? 50%? 10%?
I guess we could have nothing to lose from having both options, and choosing
the most appropriate technique for each guest!


+1




That's terrific  (nice video also)!
Orit and myself had the exact same idea too (now we can't patent it..).


I think new implementation is not the only reason why you cannot patent
this idea :-) Demand-paged migration has actually been discussed (and done)
for nearly a quarter of a century (!) in the area of *process* migration.

The first use I'm aware of was in CMU's Accent 1987 - see [1].
Another paper, [2], written in 1991, discusses how process migration is done
in UCB's Sprite operating system, and evaluates the various alternatives
common at the time (20 years ago), including what it calls lazy copying,
which is more-or-less the same thing as post copy. Mosix (a project which, in some
sense, is still alive today) also used some sort of cross between pre-copying
(of dirty pages) and copying on-demand of clean pages (from their backing
store on the source machine).


References
[1] Attacking the Process Migration Bottleneck
  http://www.nd.edu/~dthain/courses/cse598z/fall2004/papers/accent.pdf


w/o reading the internals, patents enable you to implement an existing 
idea in a new field. Anyway, there won't be any patent in this case. 
Still, let's have the kvm innovation merged.



[2]  Transparent Process Migration: Design Alternatives and the Sprite
  Implementation
  http://nd.edu/~dthain/courses/cse598z/fall2004/papers/sprite-migration.pdf


Advantages:
 - Virtual machines are using more and more memory resources;
 for a virtual machine with a very large working set, doing live
 migration with reasonable downtime is impossible today.


If a guest actually constantly uses (working set) most of its allocated
memory, it will basically be unable to do any significant amount of work
on the destination VM until this large working set is transferred to the
destination. So in this scenario, post copying doesn't give any
significant advantages over plain-old pause guest and send it to the
destination. Or am I missing something?


There is one key advantage in this scheme/use case - if you have a guest 
with a very large working set, you'll need a very large downtime in 
order to migrate it with today's algorithm. With post copy (aka 
streaming/demand paging), the guest won't have any downtime but will run 
slower than expected.


There are guests today that are impractical to really live migrate.

btw: Even today, marking pages RO also carries some performance penalty.




Disadvantages:
 - During the live migration the guest will run slower than in
 today's live migration. We need to remember that even today
 guests suffer from a performance penalty on the source during the
 COW stage (memory copy).


I wonder if something like asynchronous page faults can help somewhat with
multi-process guest workloads (and modified (PV) guest OS).


They should come into play to some extent. Note that only newer Linux 
guests will benefit from them.





 - Failure of the source or destination or the network will cause
 us to lose the running virtual machine. Those failures are very
 rare.


How is this different from a VM running on a single machine that fails?
Just that the small probability of failure (roughly) doubles for the
relatively-short duration of the transfer?


Exactly my point, this is not a major disadvantage because of this low 
probability.



Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues

2011-08-08 Thread Anthony Liguori


On 08/08/2011 06:37 AM, Avi Kivity wrote:

In certain circumstances, posix-aio-compat can incur a lot of latency:
  - threads are created by vcpu threads, so if vcpu affinity is set,
aio threads inherit vcpu affinity.  This can cause many aio threads
to compete for one cpu.
  - we can create up to max_threads (64) aio threads in one go; since a
pthread_create can take around 30μs, we have up to 2ms of cpu time
under a global lock.

Fix by:
  - moving thread creation to the main thread, so we inherit the main
thread's affinity instead of the vcpu thread's affinity.
  - if a thread is currently being created, and we need to create yet
another thread, let the thread being born create the new thread, reducing
the amount of time we spend under the main thread.
  - drop the local lock while creating a thread (we may still hold the
global mutex, though)

Note this doesn't eliminate latency completely; scheduler artifacts or
lack of host cpu resources can still cause it.  We may want pre-allocated
threads when this cannot be tolerated.

Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.


Do you have a scenario where you can measure the benefits of this 
change?  The idle time in the thread pool is rather large, it surprises 
me that it'd be an issue in practice.


Regards,

Anthony Liguori



Signed-off-by: Avi Kivity <a...@redhat.com>
---
  posix-aio-compat.c |   48 ++--
  1 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index 8dc00cb..aa30673 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -30,6 +30,7 @@
 
 #include "block/raw-posix-aio.h"
 
+static void do_spawn_thread(void);
 
 struct qemu_paiocb {
     BlockDriverAIOCB common;
@@ -64,6 +65,9 @@ static pthread_attr_t attr;
 static int max_threads = 64;
 static int cur_threads = 0;
 static int idle_threads = 0;
+static int new_threads = 0;     /* backlog of threads we need to create */
+static int pending_threads = 0; /* threads created but not running yet */
+static QEMUBH *new_thread_bh;
 static QTAILQ_HEAD(, qemu_paiocb) request_list;
 
 #ifdef CONFIG_PREADV
@@ -311,6 +315,13 @@ static void *aio_thread(void *unused)
 
     pid = getpid();
 
+    mutex_lock(&lock);
+    if (new_threads) {
+        do_spawn_thread();
+    }
+    pending_threads--;
+    mutex_unlock(&lock);
+
     while (1) {
         struct qemu_paiocb *aiocb;
         ssize_t ret = 0;
@@ -381,11 +392,18 @@ static void *aio_thread(void *unused)
     return NULL;
 }
 
-static void spawn_thread(void)
+static void do_spawn_thread(void)
 {
     sigset_t set, oldset;
 
-    cur_threads++;
+    if (!new_threads) {
+        return;
+    }
+
+    new_threads--;
+    pending_threads++;
+
+    mutex_unlock(&lock);
 
     /* block all signals */
     if (sigfillset(&set)) die("sigfillset");
@@ -394,6 +412,31 @@ static void spawn_thread(void)
     thread_create(&thread_id, &attr, aio_thread, NULL);
 
     if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore");
+
+    mutex_lock(&lock);
+}
+
+static void spawn_thread_bh_fn(void *opaque)
+{
+    mutex_lock(&lock);
+    do_spawn_thread();
+    mutex_unlock(&lock);
+}
+
+static void spawn_thread(void)
+{
+    cur_threads++;
+    new_threads++;
+    /* If there are threads being created, they will spawn new workers, so
+     * we don't spend time creating many threads in a loop holding a mutex or
+     * starving the current vcpu.
+     *
+     * If there are no idle threads, ask the main thread to create one, so we
+     * inherit the correct affinity instead of the vcpu affinity.
+     */
+    if (!pending_threads) {
+        qemu_bh_schedule(new_thread_bh);
+    }
 }
 
 static void qemu_paio_submit(struct qemu_paiocb *aiocb)
@@ -665,6 +708,7 @@ int paio_init(void)
         die2(ret, "pthread_attr_setdetachstate");
 
     QTAILQ_INIT(&request_list);
+    new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL);
 
     posix_aio_state = s;
     return 0;



Re: [RFC] postcopy livemigration proposal

2011-08-08 Thread Avi Kivity


On 08/08/2011 06:24 AM, Isaku Yamahata wrote:

This mail is on Yabusame: Postcopy Live Migration for Qemu/KVM
on which we'll give a talk at KVM-forum.
The purpose of this mail is to let developers know about it in advance
so that we can get better feedback on its design/implementation approach
early, before we start implementing it.


Interesting; what is the impact of increased latency on memory reads?




There are several design points.
   - who takes care of pulling page contents.
 an independent daemon vs a thread in qemu
 The daemon approach is preferable because an independent daemon would
 make it easy to debug the postcopy memory mechanism without qemu.
 If required, it wouldn't be difficult to convert a daemon into
 a thread in qemu


Isn't this equivalent to touching each page in sequence?

Care must be taken that we don't post too many requests, or it could 
affect the latency of synchronous accesses by the guest.




   - connection between the source and the destination
 The connection for live migration can be re-used after sending machine
 state.

   - transfer protocol
 The existing protocol that exists today can be extended.

   - hooking guest RAM access
 Introduce a character device to handle page faults.
 When a page fault occurs, it queues a page request up to the user space daemon
 at the destination. The daemon pulls the page contents from the source
 and serves them into the character device. Then the page fault is resolved.


This doesn't play well with host swapping, transparent hugepages, or 
ksm, does it?


I see you note this later on.


* More on hooking guest RAM access
There are several candidates for the implementation. Our preference is
the character device approach.

   - inserting hooks into everywhere in qemu/kvm
 This is impractical

   - backing store for guest ram
 a block device or a file can be used to back guest RAM.
 Thus hook the guest ram access.

 pros
 - new device driver isn't needed.
 cons
 - future improvement would be difficult
 - some KVM host features (KSM, THP) wouldn't work

   - character device
 qemu mmap()s the dedicated character device, and then hooks page faults.

 pros
 - straightforward approach
 - future improvement would be easy
 cons
 - new driver is needed
 - some KVM host features (KSM, THP) wouldn't work
   They check if a given VMA is anonymous. This can be fixed.

   - swap device
 When creating guest, it is set up as if all the guest RAM is swapped out
 to a dedicated swap device, which may be nbd disk (or some kind of user
 space block device, BUSE?).
 When the VM tries to access memory, swap-in is triggered and IO to the
 swap device is issued. Then the IO to swap is routed to the daemon
 in user space with nbd protocol (or BUSE, AOE, iSCSI...). The daemon pulls
 pages from the migration source and services the IO request.

 pros
 - After the page transfer is complete, everything is the same as in the normal case.
 - no new device driver is needed
 cons
 - future improvement would be difficult
 - administration: setting up nbd, swap device



Using a swap device would be my preference.  We'd still be using 
anonymous memory so thp/ksm/ordinary swap still work.


It would need to be a special kind of swap device since we only want to 
swap in, and never out, to that device.  We'd also need a special way of 
telling the kernel that memory comes from that device.  In that it's 
similar to your second option.


Maybe we should use a backing file (using nbd) and have a madvise() call 
that converts the vma to anonymous memory once the migration is finished.


--
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues

2011-08-08 Thread Avi Kivity


On 08/08/2011 03:34 PM, Anthony Liguori wrote:

On 08/08/2011 06:37 AM, Avi Kivity wrote:

In certain circumstances, posix-aio-compat can incur a lot of latency:
  - threads are created by vcpu threads, so if vcpu affinity is set,
aio threads inherit vcpu affinity.  This can cause many aio threads
to compete for one cpu.
  - we can create up to max_threads (64) aio threads in one go; since a
pthread_create can take around 30μs, we have up to 2ms of cpu time
under a global lock.

Fix by:
  - moving thread creation to the main thread, so we inherit the main
thread's affinity instead of the vcpu thread's affinity.
  - if a thread is currently being created, and we need to create yet
another thread, let the thread being born create the new thread, reducing
the amount of time we spend under the main thread.
  - drop the local lock while creating a thread (we may still hold the
global mutex, though)

Note this doesn't eliminate latency completely; scheduler artifacts or
lack of host cpu resources can still cause it.  We may want pre-allocated
threads when this cannot be tolerated.

Thanks to Uli Obergfell of Red Hat for his excellent analysis and 
suggestions.


Do you have a scenario where you can measure the benefits of this change? 


It's a customer scenario, so I can't share it.  Not that I know exactly 
what happened there in terms of workload.


The idle time in the thread pool is rather large, it surprises me that 
it'd be an issue in practice.




Just starting up a virtio guest will fill the queue with  max_threads 
requests, and if the vcpu is pinned, all 64 thread creations and 
executions will have to run on the same cpu, and will likely preempt the 
vcpu since it's classified as a cpu hog by some schedulers.


--
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init

2011-08-08 Thread Anthony Liguori


On 08/08/2011 05:36 AM, Michael S. Tsirkin wrote:

Thinking more closely, I don't think this is right.

Updating on map ensured that the config was refreshed after each
time the bar was mapped.  At the very least, the config needs to be
refreshed during reset because the guest may write to the guest
space which should get cleared after reset.

Regards,

Anthony Liguori


Not sure I understand. Which register, for example,
do you have in mind?
Could you clarify please?


Actually, you never need to call config_get() AFAICT.  It's called in 
every read/write access.  So I think the code you changed is extraneous now.


Regards,

Anthony Liguori




  }

  static int virtio_blk_init_pci(PCIDevice *pci_dev)





Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init

2011-08-08 Thread Avi Kivity


On 08/08/2011 03:45 PM, Anthony Liguori wrote:


Actually, you never need to call config_get() AFAICT.  It's called in 
every read/write access.  So I think the code you changed is 
extraneous now.


Ok; I'll drop this patch and repost (and just remove the code in 
virtio_map()).


--
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues

2011-08-08 Thread Frediano Ziglio

2011/8/8 Avi Kivity a...@redhat.com:
 In certain circumstances, posix-aio-compat can incur a lot of latency:
  - threads are created by vcpu threads, so if vcpu affinity is set,
   aio threads inherit vcpu affinity.  This can cause many aio threads
   to compete for one cpu.
  - we can create up to max_threads (64) aio threads in one go; since a
   pthread_create can take around 30μs, we have up to 2ms of cpu time
   under a global lock.

 Fix by:
  - moving thread creation to the main thread, so we inherit the main
   thread's affinity instead of the vcpu thread's affinity.
  - if a thread is currently being created, and we need to create yet
   another thread, let thread being born create the new thread, reducing
   the amount of time we spend under the main thread.
  - drop the local lock while creating a thread (we may still hold the
   global mutex, though)

 Note this doesn't eliminate latency completely; scheduler artifacts or
 lack of host cpu resources can still cause it.  We may want pre-allocated
 threads when this cannot be tolerated.

 Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.

 Signed-off-by: Avi Kivity a...@redhat.com

Why not call pthread_attr_setaffinity_np (where available) before
thread creation, or sched_setaffinity at thread start, instead of telling
another thread to create a thread for us just to get the affinity cleared?

Regards
  Frediano

 ---
  posix-aio-compat.c |   48 ++--
  1 files changed, 46 insertions(+), 2 deletions(-)

 diff --git a/posix-aio-compat.c b/posix-aio-compat.c
 index 8dc00cb..aa30673 100644
 --- a/posix-aio-compat.c
 +++ b/posix-aio-compat.c
 @@ -30,6 +30,7 @@

  #include "block/raw-posix-aio.h"

 +static void do_spawn_thread(void);

  struct qemu_paiocb {
     BlockDriverAIOCB common;
 @@ -64,6 +65,9 @@ static pthread_attr_t attr;
  static int max_threads = 64;
  static int cur_threads = 0;
  static int idle_threads = 0;
 +static int new_threads = 0;     /* backlog of threads we need to create */
 +static int pending_threads = 0; /* threads created but not running yet */
 +static QEMUBH *new_thread_bh;
  static QTAILQ_HEAD(, qemu_paiocb) request_list;

  #ifdef CONFIG_PREADV
 @@ -311,6 +315,13 @@ static void *aio_thread(void *unused)

     pid = getpid();

 +    mutex_lock(&lock);
 +    if (new_threads) {
 +        do_spawn_thread();
 +    }
 +    pending_threads--;
 +    mutex_unlock(&lock);
 +
     while (1) {
         struct qemu_paiocb *aiocb;
         ssize_t ret = 0;
 @@ -381,11 +392,18 @@ static void *aio_thread(void *unused)
     return NULL;
  }

 -static void spawn_thread(void)
 +static void do_spawn_thread(void)
  {
     sigset_t set, oldset;

 -    cur_threads++;
 +    if (!new_threads) {
 +        return;
 +    }
 +
 +    new_threads--;
 +    pending_threads++;
 +
 +    mutex_unlock(&lock);

     /* block all signals */
     if (sigfillset(&set)) die("sigfillset");
 @@ -394,6 +412,31 @@ static void spawn_thread(void)
     thread_create(&thread_id, &attr, aio_thread, NULL);

     if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore");
 +
 +    mutex_lock(&lock);
 +}
 +
 +static void spawn_thread_bh_fn(void *opaque)
 +{
 +    mutex_lock(&lock);
 +    do_spawn_thread();
 +    mutex_unlock(&lock);
 +}
 +
 +static void spawn_thread(void)
 +{
 +    cur_threads++;
 +    new_threads++;
 +    /* If there are threads being created, they will spawn new workers, so
 +     * we don't spend time creating many threads in a loop holding a mutex or
 +     * starving the current vcpu.
 +     *
 +     * If there are no idle threads, ask the main thread to create one, so we
 +     * inherit the correct affinity instead of the vcpu affinity.
 +     */
 +    if (!pending_threads) {
 +        qemu_bh_schedule(new_thread_bh);
 +    }
  }

  static void qemu_paio_submit(struct qemu_paiocb *aiocb)
 @@ -665,6 +708,7 @@ int paio_init(void)
         die2(ret, "pthread_attr_setdetachstate");

     QTAILQ_INIT(&request_list);
 +    new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL);

     posix_aio_state = s;
     return 0;
 --
 1.7.5.3
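
The spawn chain described in the quoted patch can be modelled without any real threads. The sketch below is a single-threaded toy, not QEMU code: struct toy_pool and every function in it are hypothetical names, and it only tracks the two counters from the patch (new_threads, pending_threads) plus a flag standing in for the scheduled bottom half. It assumes each newly started worker runs its prologue before any further request arrives.

```c
#include <stdbool.h>

/* Hypothetical model of the counters the patch introduces. */
struct toy_pool {
    int new_threads;        /* backlog of threads we still need to create */
    int pending_threads;    /* threads created but not running yet */
    bool bh_scheduled;      /* stands in for qemu_bh_schedule(new_thread_bh) */
    int created_by_main;    /* creations done from the main-thread BH */
    int created_by_workers; /* creations done by a thread being born */
};

/* Models spawn_thread(): only record the request, never create here. */
static void toy_request_thread(struct toy_pool *p)
{
    p->new_threads++;
    if (!p->pending_threads) {
        p->bh_scheduled = true;
    }
}

/* Models do_spawn_thread(): take one entry off the backlog. */
static bool toy_create_one(struct toy_pool *p)
{
    if (!p->new_threads) {
        return false;
    }
    p->new_threads--;
    p->pending_threads++;
    return true;
}

/* Models the bottom half running in the main thread. */
static void toy_run_main_bh(struct toy_pool *p)
{
    if (!p->bh_scheduled) {
        return;
    }
    p->bh_scheduled = false;
    if (toy_create_one(p)) {
        p->created_by_main++;
    }
}

/* Models the prologue added to aio_thread(). */
static void toy_worker_starts(struct toy_pool *p)
{
    if (toy_create_one(p)) {
        p->created_by_workers++;
    }
    p->pending_threads--;
}

/* Run the main-thread BH once, then let every pending worker start. */
static void toy_drain(struct toy_pool *p)
{
    toy_run_main_bh(p);
    while (p->pending_threads > 0) {
        toy_worker_starts(p);
    }
}
```

In this model, 64 back-to-back requests followed by one toy_drain() leave the main thread with a single creation and hand the remaining 63 to the workers, which is the effect the commit message describes.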




Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues

2011-08-08 Thread Avi Kivity


On 08/08/2011 03:49 PM, Frediano Ziglio wrote:

2011/8/8 Avi Kivity a...@redhat.com:
  In certain circumstances, posix-aio-compat can incur a lot of latency:
- threads are created by vcpu threads, so if vcpu affinity is set,
 aio threads inherit vcpu affinity.  This can cause many aio threads
 to compete for one cpu.
- we can create up to max_threads (64) aio threads in one go; since a
 pthread_create can take around 30μs, we have up to 2ms of cpu time
 under a global lock.

  Fix by:
- moving thread creation to the main thread, so we inherit the main
 thread's affinity instead of the vcpu thread's affinity.
- if a thread is currently being created, and we need to create yet
 another thread, let thread being born create the new thread, reducing
 the amount of time we spend under the main thread.
- drop the local lock while creating a thread (we may still hold the
 global mutex, though)

  Note this doesn't eliminate latency completely; scheduler artifacts or
  lack of host cpu resources can still cause it.  We may want pre-allocated
  threads when this cannot be tolerated.

  Thanks to Uli Obergfell of Red Hat for his excellent analysis and 
suggestions.

  Signed-off-by: Avi Kivity a...@redhat.com

Why not call pthread_attr_setaffinity_np (where available) before
thread creation, or sched_setaffinity at thread start, instead of telling
another thread to create a thread for us just to get the affinity cleared?



The entire qemu process may be affined to a subset of the host cpus; we 
don't want to break that.


For example:

   taskset 0xf0 qemu 
   (qemu) info cpus
   <pin individual vcpu threads to host cpus>


--
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init

2011-08-08 Thread Michael S. Tsirkin

On Mon, Aug 08, 2011 at 07:45:19AM -0500, Anthony Liguori wrote:
 On 08/08/2011 05:36 AM, Michael S. Tsirkin wrote:
 Thinking more closely, I don't think this is right.
 
 Updating on map ensured that the config was refreshed after each
 time the bar was mapped.  At the very least, the config needs to be
 refreshed during reset because the guest may write to the guest
 space which should get cleared after reset.
 
 Regards,
 
 Anthony Liguori
 
 Not sure I understand. Which register, for example,
 do you have in mind?
 Could you clarify please?
 
 Actually, you never need to call config_get() AFAICT.  It's called
 in every read/write access.

Every read, yes. But every write? Are you sure?

  So I think the code you changed is
 extraneous now.
 
 Regards,
 
 Anthony Liguori


-- 
MST

[PATCH] Introduce short names for fixed width integer types

2011-08-08 Thread Avi Kivity

QEMU deals with a lot of fixed width integer types; their names
(uint64_t etc) are clumsy to use and take up a lot of space.

Following Linux, introduce shorter names, for example U64 for
uint64_t.

Signed-off-by: Avi Kivity a...@redhat.com
---
 qemu-common.h |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/qemu-common.h b/qemu-common.h
index 0fdecf1..52a2300 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -112,6 +112,15 @@ static inline char *realpath(const char *path, char 
*resolved_path)
 int qemu_main(int argc, char **argv, char **envp);
 #endif
 
+typedef int8_t   S8;
+typedef uint8_t  U8;
+typedef int16_t  S16;
+typedef uint16_t U16;
+typedef int32_t  S32;
+typedef uint32_t U32;
+typedef int64_t  S64;
+typedef uint64_t U64;
+
 /* bottom halves */
 typedef void QEMUBHFunc(void *opaque);
 
-- 
1.7.5.3


Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types

2011-08-08 Thread Anthony Liguori


On 08/08/2011 07:56 AM, Avi Kivity wrote:

QEMU deals with a lot of fixed width integer types; their names
(uint64_t etc) are clumsy to use and take up a lot of space.

Following Linux, introduce shorter names, for example U64 for
uint64_t.


Except Linux uses lower case letters.

I personally think Linux style is wrong here.  The int8_t types are 
standard types.


Besides, we save lots of characters by using 4-space tabs instead of 
8-space tabs.  We can afford to spend some of those saved characters on 
using proper type names :-)


Regards,

Anthony Liguori



Signed-off-by: Avi Kivity a...@redhat.com
---
  qemu-common.h |9 +
  1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/qemu-common.h b/qemu-common.h
index 0fdecf1..52a2300 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -112,6 +112,15 @@ static inline char *realpath(const char *path, char 
*resolved_path)
  int qemu_main(int argc, char **argv, char **envp);
  #endif

+typedef int8_t   S8;
+typedef uint8_t  U8;
+typedef int16_t  S16;
+typedef uint16_t U16;
+typedef int32_t  S32;
+typedef uint32_t U32;
+typedef int64_t  S64;
+typedef uint64_t U64;
+
  /* bottom halves */
  typedef void QEMUBHFunc(void *opaque);




Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init

2011-08-08 Thread Anthony Liguori


On 08/08/2011 07:56 AM, Michael S. Tsirkin wrote:

On Mon, Aug 08, 2011 at 07:45:19AM -0500, Anthony Liguori wrote:

On 08/08/2011 05:36 AM, Michael S. Tsirkin wrote:

Thinking more closely, I don't think this is right.

Updating on map ensured that the config was refreshed after each
time the bar was mapped.  At the very least, the config needs to be
refreshed during reset because the guest may write to the guest
space which should get cleared after reset.

Regards,

Anthony Liguori


Not sure I understand. Which register, for example,
do you have in mind?
Could you clarify please?


Actually, you never need to call config_get() AFAICT.  It's called
in every read/write access.


Every read, yes. But every write? Are you sure?


Yeah, not on write, but I think this is a bug.  get_config() should be 
called before doing the memcpy() in order to have a proper RMW.
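
To make the RMW point concrete, here is a minimal sketch with a stand-in device. struct toy_vdev, toy_get_config() and toy_set_config() are hypothetical and only mimic the shape of the config accessors under discussion; the sketch assumes the cached config copy can be stale relative to the device's own state.

```c
#include <stdint.h>
#include <string.h>

#define TOY_CFG_LEN 8

/* Hypothetical stand-in for a virtio device and its cached config. */
struct toy_vdev {
    uint8_t backend[TOY_CFG_LEN]; /* authoritative device state */
    uint8_t config[TOY_CFG_LEN];  /* cached copy exposed to the guest */
};

/* Refresh the cache from the device (plays the role of get_config). */
static void toy_get_config(struct toy_vdev *v)
{
    memcpy(v->config, v->backend, TOY_CFG_LEN);
}

/* Push the whole cache back to the device (plays the role of set_config). */
static void toy_set_config(struct toy_vdev *v)
{
    memcpy(v->backend, v->config, TOY_CFG_LEN);
}

/* Write path without a refresh: stale cache bytes overwrite the device. */
static void toy_config_writeb_no_refresh(struct toy_vdev *v, uint32_t addr,
                                         uint8_t val)
{
    if (addr >= TOY_CFG_LEN) {
        return;
    }
    memcpy(v->config + addr, &val, sizeof(val));
    toy_set_config(v);
}

/* Write path as a proper read-modify-write. */
static void toy_config_writeb_rmw(struct toy_vdev *v, uint32_t addr,
                                  uint8_t val)
{
    if (addr >= TOY_CFG_LEN) {
        return;
    }
    toy_get_config(v);                           /* read   */
    memcpy(v->config + addr, &val, sizeof(val)); /* modify */
    toy_set_config(v);                           /* write  */
}
```

With a stale (all-zero) cache, the first variant clobbers every byte except the one written; the second preserves the bytes the guest did not touch.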


Regards,

Anthony Liguori




  So I think the code you changed is
extraneous now.

Regards,

Anthony Liguori






[PATCH] kvm: balloon: test multiple devices

2011-08-08 Thread Amit Shah

Multiple balloon devices should not be allowed.  Check if the qemu we're
running under has the right fixes.

Signed-off-by: Amit Shah amit.s...@redhat.com
---
 client/tests/kvm/tests/balloon_check.py |   14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/balloon_check.py 
b/client/tests/kvm/tests/balloon_check.py
index 0b7f0f4..d79ed13 100644
--- a/client/tests/kvm/tests/balloon_check.py
+++ b/client/tests/kvm/tests/balloon_check.py
@@ -64,6 +64,18 @@ def run_balloon_check(test, params, env):
 fail += 1
 return fail
 
+    def multiple_devices():
+        """
+        Hot-plugging multiple balloon devices isn't allowed.
+        Ensure qemu fails hot-plugging a second device.
+        """
+        try:
+            vm.monitor.cmd("device_add virtio-balloon-pci")
+        except kvm_monitor.MonitorError, e:
+            # This is good.
+            return 0
+        logging.error("Multiple balloon devices allowed by this version of qemu")
+        return 1
 
 fail = 0
 vm = env.get_vm(params[main_vm])
@@ -100,6 +112,8 @@ def run_balloon_check(test, params, env):
 # we won't trigger guest OOM killer while running multiple iterations
 fail += balloon_memory(vm_assigned_mem)
 
+fail += multiple_devices()
+
 # Close stablished session
 session.close()
 # Check if any failures happen during the whole test
-- 
1.7.1


[PATCH v4 01/39] memory: rename PORTIO_END to PORTIO_END_OF_LIST

2011-08-08 Thread Avi Kivity

For consistency with other _END_OF_LIST macros.

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/memory.h b/memory.h
index 4e518b2..da00a3b 100644
--- a/memory.h
+++ b/memory.h
@@ -133,7 +133,7 @@ struct MemoryRegionPortio {
 IOPortWriteFunc *write;
 };
 
-#define PORTIO_END { }
+#define PORTIO_END_OF_LIST() { }
 
 /**
  * memory_region_init: Initialize a memory region
-- 
1.7.5.3
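
For readers unfamiliar with the _END_OF_LIST convention, the sketch below shows the general idea of a sentinel-terminated table. It is not taken from memory.h: struct toy_portio and TOY_PORTIO_END_OF_LIST() are hypothetical, and it assumes a consumer stops at the first entry whose size field is 0. The macro expands to an explicit zero initializer because an empty `{ }` is a compiler extension in older C standards.

```c
#include <stddef.h>

/* Hypothetical table entry, loosely shaped like a port I/O descriptor. */
struct toy_portio {
    unsigned offset;
    unsigned len;
    unsigned size;  /* access size in bytes; 0 marks the end of the list */
};

/* Function-like sentinel macro, in the style of other _END_OF_LIST macros. */
#define TOY_PORTIO_END_OF_LIST() { 0, 0, 0 }

/* Walk the table until the sentinel and count the real entries. */
static size_t toy_portio_count(const struct toy_portio *p)
{
    size_t n = 0;

    while (p[n].size != 0) {
        n++;
    }
    return n;
}

static const struct toy_portio toy_list[] = {
    { 0, 0x100, 1 },
    { 0, 0x100, 2 },
    { 0, 0x100, 4 },
    TOY_PORTIO_END_OF_LIST()
};
```

Because the terminator is part of the data, callers can pass the table as a bare pointer without a separate length argument.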


[PATCH v4 00/39] Memory API, batch 2: PCI devices

2011-08-08 Thread Avi Kivity

This is a mostly mindless conversion of all QEMU PCI devices to the memory API.
After this patchset is applied, it is no longer possible to create a PCI device
using the old API.

An immediate benefit is that PCI BARs that overlap each other are now handled
correctly: currently, the sequence

  map BAR 0
  map BAR 1 at an overlapping address
  unmap either BAR 0 or BAR 1

will leave a hole where the overlap exists.  With the patchset, the memory map
is restored correctly.
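
The hole can be reproduced in a toy model. The sketch below is not the memory API: struct toy_bar and the three helpers are hypothetical. It contrasts editing a flat address map directly on every map/unmap with rebuilding the map from the list of regions that are still mapped, assuming later entries in the list win where ranges overlap.

```c
#include <stddef.h>

#define TOY_SPACE_SIZE 16
#define TOY_UNMAPPED   (-1)

/* Hypothetical BAR descriptor for this toy model. */
struct toy_bar {
    int id;
    int start;
    int len;
    int mapped;
};

/* Direct edit: claim every address in the BAR's range. */
static void toy_flat_map(int *space, const struct toy_bar *b)
{
    for (int i = b->start; i < b->start + b->len; i++) {
        space[i] = b->id;
    }
}

/* Direct edit: clear the range, forgetting who else covered it. */
static void toy_flat_unmap(int *space, const struct toy_bar *b)
{
    for (int i = b->start; i < b->start + b->len; i++) {
        space[i] = TOY_UNMAPPED;
    }
}

/* Rebuild the whole map from the regions that are still mapped. */
static void toy_rebuild(int *space, const struct toy_bar *bars, size_t n)
{
    for (int i = 0; i < TOY_SPACE_SIZE; i++) {
        space[i] = TOY_UNMAPPED;
    }
    for (size_t k = 0; k < n; k++) {
        if (bars[k].mapped) {
            toy_flat_map(space, &bars[k]);
        }
    }
}
```

With BAR 0 at [0, 8) and BAR 1 at [4, 12), unmapping BAR 1 through toy_flat_unmap() leaves addresses 4 to 7 empty even though BAR 0 still covers them; toy_rebuild() gives them back to BAR 0.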

Note that overlaps of PCI BARs with memory or non-PCI resources are still not
resolved correctly; this will be fixed later on.

The vga patches have ugly intermediate states; however the result is fairly 
clean.

Changes from v3:
 - dropped virtio-pci config patch; will be fixed outside this patchset if
   necessary
 - minor style fixes

Changes from v2:
 - added patch from Michael simplifying virtio-pci config setup

Changes from v1:
 - cmd646 type fix
 - folded a fixlet into its parent

Avi Kivity (39):
  memory: rename PORTIO_END to PORTIO_END_OF_LIST
  pci: add API to get a BAR's mapped address
  vmsvga: don't remember pci BAR address in callback any more
  vga: convert vga and its derivatives to the memory API
  cirrus: simplify mmio BAR access functions
  cirrus: simplify bitblt BAR access functions
  cirrus: simplify vga window mmio access functions
  vga: simplify vga window mmio access functions
  cirrus: simplify linear framebuffer access functions
  Integrate I/O memory regions into qemu
  pci: pass I/O address space to new PCI bus
  pci: allow I/O BARs to be registered with pci_register_bar_region()
  rtl8139: convert to memory API
  ac97: convert to memory API
  e1000: convert to memory API
  eepro100: convert to memory API
  es1370: convert to memory API
  ide: convert to memory API
  ivshmem: convert to memory API
  virtio-pci: convert to memory API
  ahci: convert to memory API
  intel-hda: convert to memory API
  lsi53c895a: convert to memory API
  ppc: convert to memory API
  ne2000: convert to memory API
  pcnet: convert to memory API
  i6300esb: convert to memory API
  isa-mmio: convert to memory API
  sun4u: convert to memory API
  ehci: convert to memory API
  uhci: convert to memory API
  xen-platform: convert to memory API
  msix: convert to memory API
  pci: remove pci_register_bar_simple()
  pci: convert pci rom to memory API
  pci: remove pci_register_bar()
  pci: fold BAR mapping function into its caller
  pci: rename pci_register_bar_region() to pci_register_bar()
  pci: remove support for pre memory API BARs

 exec-memory.h  |5 +
 exec.c |   10 ++
 hw/ac97.c  |   88 ++-
 hw/apb_pci.c   |1 +
 hw/bonito.c|1 +
 hw/cirrus_vga.c|  459 ---
 hw/cuda.c  |6 +-
 hw/e1000.c |  113 ++
 hw/eepro100.c  |  181 -
 hw/es1370.c|   43 +++--
 hw/escc.c  |   42 +++---
 hw/escc.h  |2 +-
 hw/grackle_pci.c   |8 +-
 hw/gt64xxx.c   |4 +-
 hw/heathrow_pic.c  |   29 ++--
 hw/ide.h   |2 +-
 hw/ide/ahci.c  |   31 ++--
 hw/ide/ahci.h  |2 +-
 hw/ide/cmd646.c|  204 +++-
 hw/ide/ich.c   |3 +-
 hw/ide/macio.c |   36 +++--
 hw/ide/pci.c   |   25 ++--
 hw/ide/pci.h   |   19 ++-
 hw/ide/piix.c  |   63 ++--
 hw/ide/via.c   |   64 ++--
 hw/intel-hda.c |   35 +++--
 hw/isa.h   |2 +
 hw/isa_mmio.c  |   29 ++--
 hw/ivshmem.c   |  158 +++
 hw/lance.c |   31 ++--
 hw/lsi53c895a.c|  257 +++---
 hw/mac_dbdma.c |   32 ++--
 hw/mac_dbdma.h |4 +-
 hw/mac_nvram.c |   39 ++---
 hw/macio.c |   73 -
 hw/msix.c  |   64 +++-
 hw/msix.h  |6 +-
 hw/ne2000-isa.c|   13 +--
 hw/ne2000.c|   77 ++---
 hw/ne2000.h|8 +-
 hw/openpic.c   |   81 +-
 hw/openpic.h   |2 +-
 hw/pc.h|4 +-
 hw/pc_piix.c   |6 +-
 hw/pci.c   |  133 +---
 hw/pci.h   |   26 ++--
 hw/pci_internals.h |3 +-
 hw/pcnet-pci.c |   74 +
 hw/pcnet.h |4 +-
 hw/piix_pci.c  |   14 +-
 hw/ppc4xx_pci.c|1 +
 hw/ppc_mac.h   |   27 ++--
 hw/ppc_newworld.c  |   34 ++--
 hw/ppc_oldworld.c  |   27 ++--
 hw/ppc_prep.c  |2 +-
 hw/ppce500_pci.c   |7 +-
 hw/prep_pci.c  |8 +-
 hw/prep_pci.h  |4 +-
 hw/qxl-render.c|2 +-
 hw/qxl.c   |  129 ++--
 hw/qxl.h   |6 +-
 hw/rtl8139.c   |   70 
 hw/sh_pci.c|4 +-
 hw/sun4u.c |   53 +++
 hw/unin_pci.c  |   16 ++-
 hw/usb-ehci.c  |   36 +---
 hw/usb-ohci.c  |2 +-
 hw/usb-uhci.c  |   41 +++--
 hw/versatile_pci.c |2 +-
 hw/vga-isa-mm.c|   46 --
 hw/vga-isa.c   |   10 +-
 hw/vga-pci.c   |   27

[PATCH v4 22/39] intel-hda: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/intel-hda.c |   35 +++
 1 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/hw/intel-hda.c b/hw/intel-hda.c
index 5a2bc3a..1e4c71e 100644
--- a/hw/intel-hda.c
+++ b/hw/intel-hda.c
@@ -177,7 +177,7 @@ struct IntelHDAState {
 IntelHDAStream st[8];
 
 /* state */
-int mmio_addr;
+MemoryRegion mmio;
 uint32_t rirb_count;
 int64_t wall_base_ns;
 
@@ -1084,16 +1084,20 @@ static uint32_t intel_hda_mmio_readl(void *opaque, 
target_phys_addr_t addr)
 return intel_hda_reg_read(d, reg, 0xffffffff);
 }
 
-static CPUReadMemoryFunc * const intel_hda_mmio_read[3] = {
-intel_hda_mmio_readb,
-intel_hda_mmio_readw,
-intel_hda_mmio_readl,
-};
-
-static CPUWriteMemoryFunc * const intel_hda_mmio_write[3] = {
-intel_hda_mmio_writeb,
-intel_hda_mmio_writew,
-intel_hda_mmio_writel,
+static const MemoryRegionOps intel_hda_mmio_ops = {
+.old_mmio = {
+.read = {
+intel_hda_mmio_readb,
+intel_hda_mmio_readw,
+intel_hda_mmio_readl,
+},
+.write = {
+intel_hda_mmio_writeb,
+intel_hda_mmio_writew,
+intel_hda_mmio_writel,
+},
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 /* - */
@@ -1130,10 +1134,9 @@ static int intel_hda_init(PCIDevice *pci)
 /* HDCTL off 0x40 bit 0 selects signaling mode (1-HDA, 0 - Ac97) 18.1.19 */
 conf[0x40] = 0x01;
 
-d->mmio_addr = cpu_register_io_memory(intel_hda_mmio_read,
-  intel_hda_mmio_write, d,
-  DEVICE_NATIVE_ENDIAN);
-pci_register_bar_simple(&d->pci, 0, 0x4000, 0, d->mmio_addr);
+memory_region_init_io(&d->mmio, &intel_hda_mmio_ops, d,
+  "intel-hda", 0x4000);
+pci_register_bar_region(&d->pci, 0, 0, &d->mmio);
 if (d->msi) {
 msi_init(&d->pci, 0x50, 1, true, false);
 }
@@ -1149,7 +1152,7 @@ static int intel_hda_exit(PCIDevice *pci)
 IntelHDAState *d = DO_UPCAST(IntelHDAState, pci, pci);
 
 msi_uninit(&d->pci);
-cpu_unregister_io_memory(d->mmio_addr);
+memory_region_destroy(&d->mmio);
 return 0;
 }
 
-- 
1.7.5.3


[PATCH v4 13/39] rtl8139: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/rtl8139.c |   72 ++---
 1 files changed, 38 insertions(+), 34 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 5214b8c..f07af35 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -474,7 +474,6 @@ typedef struct RTL8139State {
 
 NICState *nic;
 NICConf conf;
-int rtl8139_mmio_io_addr;
 
 /* C ring mode */
 uint32_t   currTxDesc;
@@ -506,6 +505,9 @@ typedef struct RTL8139State {
 QEMUTimer *timer;
 int64_t TimerExpire;
 
+MemoryRegion bar_io;
+MemoryRegion bar_mem;
+
 /* Support migration to/from old versions */
 int rtl8139_mmio_io_addr_dummy;
 } RTL8139State;
@@ -3283,7 +3285,7 @@ static void rtl8139_pre_save(void *opaque)
 rtl8139_set_next_tctr_time(s, current_time);
 s->TCTR = muldiv64(current_time - s->TCTR_base, PCI_FREQUENCY,
 get_ticks_per_sec());
-s->rtl8139_mmio_io_addr_dummy = s->rtl8139_mmio_io_addr;
+s->rtl8139_mmio_io_addr_dummy = 0;
 }
 
 static const VMStateDescription vmstate_rtl8139 = {
@@ -3379,31 +3381,35 @@ static const VMStateDescription vmstate_rtl8139 = {
 /***/
 /* PCI RTL8139 definitions */
 
-static void rtl8139_ioport_map(PCIDevice *pci_dev, int region_num,
-   pcibus_t addr, pcibus_t size, int type)
-{
-RTL8139State *s = DO_UPCAST(RTL8139State, dev, pci_dev);
-
-register_ioport_write(addr, 0x100, 1, rtl8139_ioport_writeb, s);
-register_ioport_read( addr, 0x100, 1, rtl8139_ioport_readb,  s);
-
-register_ioport_write(addr, 0x100, 2, rtl8139_ioport_writew, s);
-register_ioport_read( addr, 0x100, 2, rtl8139_ioport_readw,  s);
-
-register_ioport_write(addr, 0x100, 4, rtl8139_ioport_writel, s);
-register_ioport_read( addr, 0x100, 4, rtl8139_ioport_readl,  s);
-}
+static const MemoryRegionPortio rtl8139_portio[] = {
+{ 0, 0x100, 1, .read = rtl8139_ioport_readb, },
+{ 0, 0x100, 1, .write = rtl8139_ioport_writeb, },
+{ 0, 0x100, 2, .read = rtl8139_ioport_readw, },
+{ 0, 0x100, 2, .write = rtl8139_ioport_writew, },
+{ 0, 0x100, 4, .read = rtl8139_ioport_readl, },
+{ 0, 0x100, 4, .write = rtl8139_ioport_writel, },
+PORTIO_END_OF_LIST()
+};
 
-static CPUReadMemoryFunc * const rtl8139_mmio_read[3] = {
-rtl8139_mmio_readb,
-rtl8139_mmio_readw,
-rtl8139_mmio_readl,
+static const MemoryRegionOps rtl8139_io_ops = {
+.old_portio = rtl8139_portio,
+.endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static CPUWriteMemoryFunc * const rtl8139_mmio_write[3] = {
-rtl8139_mmio_writeb,
-rtl8139_mmio_writew,
-rtl8139_mmio_writel,
+static const MemoryRegionOps rtl8139_mmio_ops = {
+.old_mmio = {
+.read = {
+rtl8139_mmio_readb,
+rtl8139_mmio_readw,
+rtl8139_mmio_readl,
+},
+.write = {
+rtl8139_mmio_writeb,
+rtl8139_mmio_writew,
+rtl8139_mmio_writel,
+},
+},
+.endianness = DEVICE_LITTLE_ENDIAN,
 };
 
 static void rtl8139_timer(void *opaque)
@@ -3432,7 +3438,8 @@ static int pci_rtl8139_uninit(PCIDevice *dev)
 {
 RTL8139State *s = DO_UPCAST(RTL8139State, dev, dev);
 
-cpu_unregister_io_memory(s->rtl8139_mmio_io_addr);
+memory_region_destroy(&s->bar_io);
+memory_region_destroy(&s->bar_mem);
 if (s->cplus_txbuffer) {
 qemu_free(s->cplus_txbuffer);
 s->cplus_txbuffer = NULL;
@@ -3462,15 +3469,12 @@ static int pci_rtl8139_init(PCIDevice *dev)
  * list bit in status register, and offset 0xdc seems unused. */
 pci_conf[PCI_CAPABILITY_LIST] = 0xdc;
 
-/* I/O handler for memory-mapped I/O */
-s->rtl8139_mmio_io_addr =
-cpu_register_io_memory(rtl8139_mmio_read, rtl8139_mmio_write, s,
-   DEVICE_LITTLE_ENDIAN);
-
-pci_register_bar(&s->dev, 0, 0x100,
-   PCI_BASE_ADDRESS_SPACE_IO,  rtl8139_ioport_map);
-
-pci_register_bar_simple(&s->dev, 1, 0x100, 0, s->rtl8139_mmio_io_addr);
+memory_region_init_io(&s->bar_io, &rtl8139_io_ops, s, "rtl8139", 0x100);
+memory_region_init_io(&s->bar_mem, &rtl8139_mmio_ops, s, "rtl8139", 0x100);
+pci_register_bar_region(&s->dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
+&s->bar_io);
+pci_register_bar_region(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
+&s->bar_mem);
 
 qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
-- 
1.7.5.3


[PATCH v4 21/39] ahci: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/ide/ahci.c |   31 +--
 hw/ide/ahci.h |2 +-
 hw/ide/ich.c  |3 +--
 3 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 1f008a3..e207ca0 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -276,12 +276,12 @@ static void  ahci_port_write(AHCIState *s, int port, int 
offset, uint32_t val)
 }
 }
 
-static uint32_t ahci_mem_readl(void *ptr, target_phys_addr_t addr)
+static uint64_t ahci_mem_read(void *opaque, target_phys_addr_t addr,
+  unsigned size)
 {
-AHCIState *s = ptr;
+AHCIState *s = opaque;
 uint32_t val = 0;
 
-addr = addr & 0xfff;
 if (addr < AHCI_GENERIC_HOST_CONTROL_REGS_MAX_ADDR) {
 switch (addr) {
 case HOST_CAP:
@@ -314,10 +314,10 @@ static uint32_t ahci_mem_readl(void *ptr, 
target_phys_addr_t addr)
 
 
 
-static void ahci_mem_writel(void *ptr, target_phys_addr_t addr, uint32_t val)
+static void ahci_mem_write(void *opaque, target_phys_addr_t addr,
+   uint64_t val, unsigned size)
 {
-AHCIState *s = ptr;
-addr = addr & 0xfff;
+AHCIState *s = opaque;
 
 /* Only aligned reads are allowed on AHCI */
 if (addr & 3) {
@@ -364,16 +364,10 @@ static void ahci_mem_writel(void *ptr, target_phys_addr_t 
addr, uint32_t val)
 
 }
 
-static CPUReadMemoryFunc * const ahci_readfn[3]={
-ahci_mem_readl,
-ahci_mem_readl,
-ahci_mem_readl
-};
-
-static CPUWriteMemoryFunc * const ahci_writefn[3]={
-ahci_mem_writel,
-ahci_mem_writel,
-ahci_mem_writel
+static MemoryRegionOps ahci_mem_ops = {
+.read = ahci_mem_read,
+.write = ahci_mem_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
 };
 
 static void ahci_reg_init(AHCIState *s)
@@ -1131,8 +1125,8 @@ void ahci_init(AHCIState *s, DeviceState *qdev, int ports)
 s->ports = ports;
 s->dev = qemu_mallocz(sizeof(AHCIDevice) * ports);
 ahci_reg_init(s);
-s->mem = cpu_register_io_memory(ahci_readfn, ahci_writefn, s,
-DEVICE_LITTLE_ENDIAN);
+/* XXX BAR size should be 1k, but that breaks, so bump it to 4k for now */
+memory_region_init_io(&s->mem, &ahci_mem_ops, s, "ahci", 0x1000);
 irqs = qemu_allocate_irqs(ahci_irq_set, s, s->ports);
 
 for (i = 0; i < s->ports; i++) {
@@ -1151,6 +1145,7 @@ void ahci_init(AHCIState *s, DeviceState *qdev, int ports)
 
 void ahci_uninit(AHCIState *s)
 {
+memory_region_destroy(&s->mem);
 qemu_free(s->dev);
 }
 
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index dc86951..e456193 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -289,7 +289,7 @@ struct AHCIDevice {
 typedef struct AHCIState {
 AHCIDevice *dev;
 AHCIControlRegs control_regs;
-int mem;
+MemoryRegion mem;
 int ports;
 qemu_irq irq;
 } AHCIState;
diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index d241ea8..698b5f6 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -98,8 +98,7 @@ static int pci_ich9_ahci_init(PCIDevice *dev)
 msi_init(dev, 0x50, 1, true, false);
 d->ahci.irq = d->card.irq[0];
 
-/* XXX BAR size should be 1k, but that breaks, so bump it to 4k for now */
-pci_register_bar_simple(&d->card, 5, 0x1000, 0, d->ahci.mem);
+pci_register_bar_region(&d->card, 5, 0, &d->ahci.mem);
 
 return 0;
 }
-- 
1.7.5.3


[PATCH v4 15/39] e1000: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/e1000.c |  114 +--
 1 files changed, 48 insertions(+), 66 deletions(-)

diff --git a/hw/e1000.c b/hw/e1000.c
index 96d84f9..dfc082b 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -82,7 +82,8 @@ typedef struct E1000State_st {
 PCIDevice dev;
 NICState *nic;
 NICConf conf;
-int mmio_index;
+MemoryRegion mmio;
+MemoryRegion io;
 
 uint32_t mac_reg[0x8000];
 uint16_t phy_reg[0x20];
@@ -151,14 +152,6 @@ static const char phy_regcap[0x20] = {
 };
 
 static void
-ioport_map(PCIDevice *pci_dev, int region_num, pcibus_t addr,
-   pcibus_t size, int type)
-{
-DBGOUT(IO, "e1000_ioport_map addr=0x%04"FMT_PCIBUS
-" size=0x%08"FMT_PCIBUS"\n", addr, size);
-}
-
-static void
 set_interrupt_cause(E1000State *s, int index, uint32_t val)
 {
 if (val)
@@ -905,7 +898,8 @@ static void (*macreg_writeops[])(E1000State *, int, 
uint32_t) = {
 enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) };
 
 static void
-e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
+e1000_mmio_write(void *opaque, target_phys_addr_t addr, uint64_t val,
+ unsigned size)
 {
 E1000State *s = opaque;
 unsigned int index = (addr & 0x1ffff) >> 2;
@@ -913,31 +907,15 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t addr, 
uint32_t val)
 if (index < NWRITEOPS && macreg_writeops[index]) {
 macreg_writeops[index](s, index, val);
 } else if (index < NREADOPS && macreg_readops[index]) {
-DBGOUT(MMIO, "e1000_mmio_writel RO %x: 0x%04x\n", index<<2, val);
+DBGOUT(MMIO, "e1000_mmio_writel RO %x: 0x%04"PRIx64"\n", index<<2, 
val);
 } else {
-DBGOUT(UNKNOWN, "MMIO unknown write addr=0x%08x,val=0x%08x\n",
+DBGOUT(UNKNOWN, "MMIO unknown write addr=0x%08x,val=0x%08"PRIx64"\n",
index<<2, val);
 }
 }
 
-static void
-e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-// emulate hw without byte enables: no RMW
-e1000_mmio_writel(opaque, addr & ~3,
-  (val & 0xffff) << (8*(addr & 3)));
-}
-
-static void
-e1000_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-// emulate hw without byte enables: no RMW
-e1000_mmio_writel(opaque, addr & ~3,
-  (val & 0xff) << (8*(addr & 3)));
-}
-
-static uint32_t
-e1000_mmio_readl(void *opaque, target_phys_addr_t addr)
+static uint64_t
+e1000_mmio_read(void *opaque, target_phys_addr_t addr, unsigned size)
 {
 E1000State *s = opaque;
 unsigned int index = (addr & 0x1ffff) >> 2;
@@ -950,20 +928,39 @@ e1000_mmio_readl(void *opaque, target_phys_addr_t addr)
 return 0;
 }
 
-static uint32_t
-e1000_mmio_readb(void *opaque, target_phys_addr_t addr)
+static const MemoryRegionOps e1000_mmio_ops = {
+.read = e1000_mmio_read,
+.write = e1000_mmio_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+};
+
+static uint64_t e1000_io_read(void *opaque, target_phys_addr_t addr,
+  unsigned size)
 {
-return ((e1000_mmio_readl(opaque, addr & ~3)) >>
-(8 * (addr & 3))) & 0xff;
+E1000State *s = opaque;
+
+(void)s;
+return 0;
 }
 
-static uint32_t
-e1000_mmio_readw(void *opaque, target_phys_addr_t addr)
+static void e1000_io_write(void *opaque, target_phys_addr_t addr,
+   uint64_t val, unsigned size)
 {
-return ((e1000_mmio_readl(opaque, addr & ~3)) >>
-(8 * (addr & 3))) & 0xffff;
+E1000State *s = opaque;
+
+(void)s;
 }
 
+static const MemoryRegionOps e1000_io_ops = {
+.read = e1000_io_read,
+.write = e1000_io_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
 static bool is_version_1(void *opaque, int version_id)
 {
 return version_id == 1;
@@ -1083,36 +1080,22 @@ static const uint32_t mac_reg_init[] = {
 
 /* PCI interface */
 
-static CPUWriteMemoryFunc * const e1000_mmio_write[] = {
-e1000_mmio_writeb, e1000_mmio_writew,  e1000_mmio_writel
-};
-
-static CPUReadMemoryFunc * const e1000_mmio_read[] = {
-e1000_mmio_readb,  e1000_mmio_readw,   e1000_mmio_readl
-};
-
 static void
-e1000_mmio_map(PCIDevice *pci_dev, int region_num,
-pcibus_t addr, pcibus_t size, int type)
+e1000_mmio_setup(E1000State *d)
 {
-E1000State *d = DO_UPCAST(E1000State, dev, pci_dev);
 int i;
 const uint32_t excluded_regs[] = {
 E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
 E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
 };
 
-
-DBGOUT(MMIO, "e1000_mmio_map addr=0x%08"FMT_PCIBUS" 0x%08"FMT_PCIBUS"\n",
-   addr, size);
-
-cpu_register_physical_memory(addr, PNPMMIO_SIZE, d->mmio_index);
-qemu_register_coalesced_mmio(addr, excluded_regs[0]);
-
+memory_region_init_io(d-mmio,

[PATCH v4 38/39] pci: rename pci_register_bar_region() to pci_register_bar()

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/ac97.c |5 ++---
 hw/cirrus_vga.c   |5 ++---
 hw/e1000.c|5 ++---
 hw/eepro100.c |7 +++
 hw/es1370.c   |2 +-
 hw/ide/cmd646.c   |   14 +-
 hw/ide/ich.c  |2 +-
 hw/ide/piix.c |3 +--
 hw/ide/via.c  |3 +--
 hw/intel-hda.c|2 +-
 hw/ivshmem.c  |   15 +++
 hw/lsi53c895a.c   |7 +++
 hw/macio.c|3 +--
 hw/ne2000.c   |2 +-
 hw/openpic.c  |4 ++--
 hw/pci.c  |6 +++---
 hw/pci.h  |4 ++--
 hw/pcnet-pci.c|4 ++--
 hw/qxl.c  |   16 
 hw/rtl8139.c  |6 ++
 hw/sun4u.c|6 ++
 hw/usb-ehci.c |2 +-
 hw/usb-ohci.c |2 +-
 hw/usb-uhci.c |3 +--
 hw/vga-pci.c  |3 +--
 hw/virtio-pci.c   |9 -
 hw/vmware_vga.c   |8 
 hw/wdt_i6300esb.c |2 +-
 hw/xen_platform.c |7 +++
 29 files changed, 68 insertions(+), 89 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index 52f0f0d..541d9a4 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -1316,9 +1316,8 @@ static int ac97_initfn (PCIDevice *dev)
 
 memory_region_init_io (&s->io_nam, &ac97_io_nam_ops, s, "ac97-nam", 1024);
 memory_region_init_io (&s->io_nabm, &ac97_io_nabm_ops, s, "ac97-nabm", 256);
-pci_register_bar_region (&s->dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->io_nam);
-pci_register_bar_region (&s->dev, 1, PCI_BASE_ADDRESS_SPACE_IO,
- &s->io_nabm);
+pci_register_bar (&s->dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->io_nam);
+pci_register_bar (&s->dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &s->io_nabm);
 qemu_register_reset (ac97_on_reset, s);
 AUD_register_card ("ac97", &s->card);
 ac97_on_reset (s);
diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index c9887ac..b489309 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2948,10 +2948,9 @@ static int pci_cirrus_vga_initfn(PCIDevice *dev)
  /* memory #0 LFB */
  /* memory #1 memory-mapped I/O */
  /* XXX: s->vga.vram_size must be a power of two */
- pci_register_bar_region(&d->dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH,
- &s->pci_bar);
+ pci_register_bar(&d->dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, &s->pci_bar);
  if (device_id == CIRRUS_ID_CLGD5446) {
- pci_register_bar_region(&d->dev, 1, 0, &s->cirrus_mmio_io);
+ pci_register_bar(&d->dev, 1, 0, &s->cirrus_mmio_io);
  }
  return 0;
 }
diff --git a/hw/e1000.c b/hw/e1000.c
index dfc082b..29b453f 100644
--- a/hw/e1000.c
+++ b/hw/e1000.c
@@ -1158,10 +1158,9 @@ static int pci_e1000_init(PCIDevice *pci_dev)
 
 e1000_mmio_setup(d);
 
-pci_register_bar_region(&d->dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
-&d->mmio);
+pci_register_bar(&d->dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);
 
-pci_register_bar_region(&d->dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->io);
+pci_register_bar(&d->dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &d->io);
 
 memmove(d->eeprom_data, e1000_eeprom_template,
 sizeof e1000_eeprom_template);
diff --git a/hw/eepro100.c b/hw/eepro100.c
index 04723f3..a636d30 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -1879,15 +1879,14 @@ static int e100_nic_init(PCIDevice *pci_dev)
 /* Handler for memory-mapped I/O */
 memory_region_init_io(&s->mmio_bar, &eepro100_ops, s, "eepro100-mmio",
   PCI_MEM_SIZE);
-pci_register_bar_region(&s->dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH,
-&s->mmio_bar);
+pci_register_bar(&s->dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, &s->mmio_bar);
 memory_region_init_io(&s->io_bar, &eepro100_ops, s, "eepro100-io",
   PCI_IO_SIZE);
-pci_register_bar_region(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &s->io_bar);
+pci_register_bar(&s->dev, 1, PCI_BASE_ADDRESS_SPACE_IO, &s->io_bar);
 /* FIXME: flash aliases to mmio?! */
 memory_region_init_io(&s->flash_bar, &eepro100_ops, s, "eepro100-flash",
   PCI_FLASH_SIZE);
-pci_register_bar_region(&s->dev, 2, 0, &s->flash_bar);
+pci_register_bar(&s->dev, 2, 0, &s->flash_bar);
 
 qemu_macaddr_default_if_unset(&s->conf.macaddr);
 logout("macaddr: %s\n", nic_dump(&s->conf.macaddr.a[0], 6));
diff --git a/hw/es1370.c b/hw/es1370.c
index 4e43c4a..a9387d1 100644
--- a/hw/es1370.c
+++ b/hw/es1370.c
@@ -1009,7 +1009,7 @@ static int es1370_initfn (PCIDevice *dev)
 c[PCI_MAX_LAT] = 0x80;
 
 memory_region_init_io (&s->io, &es1370_io_ops, s, "es1370", 256);
-pci_register_bar_region (&s->dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->io);
+pci_register_bar (&s->dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->io);
 qemu_register_reset (es1370_on_reset, s);
 
 AUD_register_card ("es1370", &s->card);
diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 13e6f2f..4d91e2c 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -270,16

[PATCH v4 24/39] ppc: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/cuda.c |6 ++-
 hw/escc.c |   42 +--
 hw/escc.h |2 +-
 hw/heathrow_pic.c |   29 --
 hw/ide.h  |2 +-
 hw/ide/macio.c|   36 ---
 hw/mac_dbdma.c|   32 ++--
 hw/mac_dbdma.h|4 ++-
 hw/mac_nvram.c|   39 ++---
 hw/macio.c|   74 +++-
 hw/openpic.c  |   81 +
 hw/openpic.h  |2 +-
 hw/ppc_mac.h  |   16 ++
 hw/ppc_newworld.c |   30 +--
 hw/ppc_oldworld.c |   23 +++
 15 files changed, 201 insertions(+), 217 deletions(-)

diff --git a/hw/cuda.c b/hw/cuda.c
index 065c362..5c92d81 100644
--- a/hw/cuda.c
+++ b/hw/cuda.c
@@ -117,6 +117,7 @@ typedef struct CUDATimer {
 } CUDATimer;
 
 typedef struct CUDAState {
+MemoryRegion mem;
 /* cuda registers */
 uint8_t b;  /* B-side data */
 uint8_t a;  /* A-side data */
@@ -722,7 +723,7 @@ static void cuda_reset(void *opaque)
 set_counter(s, &s->timers[1], 0xffff);
 }
 
-void cuda_init (int *cuda_mem_index, qemu_irq irq)
+void cuda_init (MemoryRegion **cuda_mem, qemu_irq irq)
 {
 struct tm tm;
 CUDAState *s = &cuda_state;
@@ -738,8 +739,9 @@ void cuda_init (int *cuda_mem_index, qemu_irq irq)
 s->tick_offset = (uint32_t)mktimegm(&tm) + RTC_OFFSET;
 
 s->adb_poll_timer = qemu_new_timer_ns(vm_clock, cuda_adb_poll, s);
-*cuda_mem_index = cpu_register_io_memory(cuda_read, cuda_write, s,
+cpu_register_io_memory(cuda_read, cuda_write, s,
  DEVICE_NATIVE_ENDIAN);
+*cuda_mem = &s->mem;
 vmstate_register(NULL, -1, &vmstate_cuda, s);
 qemu_register_reset(cuda_reset, s);
 }
diff --git a/hw/escc.c b/hw/escc.c
index f6fd919..bea5873 100644
--- a/hw/escc.c
+++ b/hw/escc.c
@@ -126,7 +126,7 @@ struct SerialState {
 SysBusDevice busdev;
 struct ChannelState chn[2];
 uint32_t it_shift;
-int mmio_index;
+MemoryRegion mmio;
 uint32_t disabled;
 uint32_t frequency;
 };
@@ -490,7 +490,8 @@ static void escc_update_parameters(ChannelState *s)
 qemu_chr_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
 }
 
-static void escc_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+static void escc_mem_write(void *opaque, target_phys_addr_t addr,
+   uint64_t val, unsigned size)
 {
 SerialState *serial = opaque;
 ChannelState *s;
@@ -592,7 +593,8 @@ static void escc_mem_writeb(void *opaque, 
target_phys_addr_t addr, uint32_t val)
 }
 }
 
-static uint32_t escc_mem_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t escc_mem_read(void *opaque, target_phys_addr_t addr,
+  unsigned size)
 {
 SerialState *serial = opaque;
 ChannelState *s;
@@ -627,6 +629,16 @@ static uint32_t escc_mem_readb(void *opaque, 
target_phys_addr_t addr)
 return 0;
 }
 
+static const MemoryRegionOps escc_mem_ops = {
+.read = escc_mem_read,
+.write = escc_mem_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
+};
+
 static int serial_can_receive(void *opaque)
 {
 ChannelState *s = opaque;
@@ -668,18 +680,6 @@ static void serial_event(void *opaque, int event)
 serial_receive_break(s);
 }
 
-static CPUReadMemoryFunc * const escc_mem_read[3] = {
-escc_mem_readb,
-NULL,
-NULL,
-};
-
-static CPUWriteMemoryFunc * const escc_mem_write[3] = {
-escc_mem_writeb,
-NULL,
-NULL,
-};
-
 static const VMStateDescription vmstate_escc_chn = {
 .name = "escc_chn",
 .version_id = 2,
@@ -712,7 +712,7 @@ static const VMStateDescription vmstate_escc = {
 }
 };
 
-int escc_init(target_phys_addr_t base, qemu_irq irqA, qemu_irq irqB,
+MemoryRegion *escc_init(target_phys_addr_t base, qemu_irq irqA, qemu_irq irqB,
   CharDriverState *chrA, CharDriverState *chrB,
   int clock, int it_shift)
 {
@@ -737,7 +737,7 @@ int escc_init(target_phys_addr_t base, qemu_irq irqA, 
qemu_irq irqB,
 }
 
 d = FROM_SYSBUS(SerialState, s);
-return d->mmio_index;
+return &d->mmio;
 }
 
 static const uint8_t keycodes[128] = {
@@ -901,7 +901,6 @@ void slavio_serial_ms_kbd_init(target_phys_addr_t base, 
qemu_irq irq,
 static int escc_init1(SysBusDevice *dev)
 {
 SerialState *s = FROM_SYSBUS(SerialState, dev);
-int io;
 unsigned int i;
 
 s->chn[0].disabled = s->disabled;
@@ -918,10 +917,9 @@ static int escc_init1(SysBusDevice *dev)
 s->chn[0].otherchn = &s->chn[1];
 s->chn[1].otherchn = &s->chn[0];
 
-io = cpu_register_io_memory(escc_mem_read, escc_mem_write, s,
-DEVICE_NATIVE_ENDIAN);
-

[PATCH v4 20/39] virtio-pci: convert to memory API

2011-08-08 Thread Avi Kivity

except msix.

[jan: fix build]

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/virtio-pci.c |   71 +--
 hw/virtio-pci.h |2 +-
 2 files changed, 28 insertions(+), 45 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index f3b3293..5df380d 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -162,7 +162,8 @@ static int 
virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
 {
 VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
 EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
-int r;
+int r = 0;
+
 if (assign) {
 r = event_notifier_init(notifier, 1);
 if (r < 0) {
@@ -170,24 +171,11 @@ static int 
virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
  __func__, r);
 return r;
 }
-r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-   proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-   n, assign);
-if (r < 0) {
-error_report("%s: unable to map ioeventfd: %d",
- __func__, r);
-event_notifier_cleanup(notifier);
-}
+memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
+  true, n, event_notifier_get_fd(notifier));
 } else {
-r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-   proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-   n, assign);
-if (r < 0) {
-error_report("%s: unable to unmap ioeventfd: %d",
- __func__, r);
-return r;
-}
-
+memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
+  true, n, event_notifier_get_fd(notifier));
 /* Handle the race condition where the guest kicked and we deassigned
  * before we got around to handling the kick.
  */
@@ -424,7 +412,6 @@ static uint32_t virtio_pci_config_readb(void *opaque, 
uint32_t addr)
 {
 VirtIOPCIProxy *proxy = opaque;
 uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-addr -= proxy->addr;
 if (addr < config)
 return virtio_ioport_read(proxy, addr);
 addr -= config;
@@ -435,7 +422,6 @@ static uint32_t virtio_pci_config_readw(void *opaque, uint32_t addr)
 {
 VirtIOPCIProxy *proxy = opaque;
 uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-addr -= proxy->addr;
 if (addr < config)
 return virtio_ioport_read(proxy, addr);
 addr -= config;
@@ -446,7 +432,6 @@ static uint32_t virtio_pci_config_readl(void *opaque, uint32_t addr)
 {
 VirtIOPCIProxy *proxy = opaque;
 uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-addr -= proxy->addr;
 if (addr < config)
 return virtio_ioport_read(proxy, addr);
 addr -= config;
@@ -457,7 +442,6 @@ static void virtio_pci_config_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
 uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-addr -= proxy->addr;
 if (addr < config) {
 virtio_ioport_write(proxy, addr, val);
 return;
@@ -470,7 +454,6 @@ static void virtio_pci_config_writew(void *opaque, uint32_t addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
 uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-addr -= proxy->addr;
 if (addr < config) {
 virtio_ioport_write(proxy, addr, val);
 return;
@@ -483,7 +466,6 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
 uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-addr -= proxy->addr;
 if (addr < config) {
 virtio_ioport_write(proxy, addr, val);
 return;
@@ -492,30 +474,26 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val)
 virtio_config_writel(proxy->vdev, addr, val);
 }
 
-static void virtio_map(PCIDevice *pci_dev, int region_num,
-   pcibus_t addr, pcibus_t size, int type)
-{
-VirtIOPCIProxy *proxy = container_of(pci_dev, VirtIOPCIProxy, pci_dev);
-VirtIODevice *vdev = proxy->vdev;
-unsigned config_len = VIRTIO_PCI_REGION_SIZE(pci_dev) + vdev->config_len;
-
-proxy->addr = addr;
-
-register_ioport_write(addr, config_len, 1, virtio_pci_config_writeb, proxy);
-register_ioport_write(addr, config_len, 2, virtio_pci_config_writew, proxy);
-register_ioport_write(addr, config_len, 4, virtio_pci_config_writel, proxy);
-register_ioport_read(addr, config_len, 1, virtio_pci_config_readb, proxy);
-register_ioport_read(addr, config_len, 2, virtio_pci_config_readw, proxy);
-register_ioport_read(addr, config_len, 4,

[PATCH v4 25/39] ne2000: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/ne2000-isa.c |   13 ++---
 hw/ne2000.c |   77 +-
 hw/ne2000.h |8 +
 3 files changed, 58 insertions(+), 40 deletions(-)

diff --git a/hw/ne2000-isa.c b/hw/ne2000-isa.c
index e41dbba..756ed5c 100644
--- a/hw/ne2000-isa.c
+++ b/hw/ne2000-isa.c
@@ -27,6 +27,7 @@
 #include "qdev.h"
 #include "net.h"
 #include "ne2000.h"
+#include "exec-memory.h"
 
 typedef struct ISANE2000State {
 ISADevice dev;
@@ -66,19 +67,11 @@ static int isa_ne2000_initfn(ISADevice *dev)
 ISANE2000State *isa = DO_UPCAST(ISANE2000State, dev, dev);
 NE2000State *s = &isa->ne2000;
 
-register_ioport_write(isa->iobase, 16, 1, ne2000_ioport_write, s);
-register_ioport_read(isa->iobase, 16, 1, ne2000_ioport_read, s);
+ne2000_setup_io(s, 0x20);
 isa_init_ioport_range(dev, isa->iobase, 16);
-
-register_ioport_write(isa->iobase + 0x10, 1, 1, ne2000_asic_ioport_write, s);
-register_ioport_read(isa->iobase + 0x10, 1, 1, ne2000_asic_ioport_read, s);
-register_ioport_write(isa->iobase + 0x10, 2, 2, ne2000_asic_ioport_write, s);
-register_ioport_read(isa->iobase + 0x10, 2, 2, ne2000_asic_ioport_read, s);
 isa_init_ioport_range(dev, isa->iobase + 0x10, 2);
-
-register_ioport_write(isa->iobase + 0x1f, 1, 1, ne2000_reset_ioport_write, s);
-register_ioport_read(isa->iobase + 0x1f, 1, 1, ne2000_reset_ioport_read, s);
 isa_init_ioport(dev, isa->iobase + 0x1f);
+memory_region_add_subregion(get_system_io(), isa->iobase, &s->io);
 
 isa_init_irq(dev, s->irq, isa->isairq);
 
diff --git a/hw/ne2000.c b/hw/ne2000.c
index f8acaae..5b76acf 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -297,7 +297,7 @@ ssize_t ne2000_receive(VLANClientState *nc, const uint8_t 
*buf, size_t size_)
 return size_;
 }
 
-void ne2000_ioport_write(void *opaque, uint32_t addr, uint32_t val)
+static void ne2000_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 NE2000State *s = opaque;
 int offset, page, index;
@@ -394,7 +394,7 @@ void ne2000_ioport_write(void *opaque, uint32_t addr, 
uint32_t val)
 }
 }
 
-uint32_t ne2000_ioport_read(void *opaque, uint32_t addr)
+static uint32_t ne2000_ioport_read(void *opaque, uint32_t addr)
 {
 NE2000State *s = opaque;
 int offset, page, ret;
@@ -544,7 +544,7 @@ static inline void ne2000_dma_update(NE2000State *s, int 
len)
 }
 }
 
-void ne2000_asic_ioport_write(void *opaque, uint32_t addr, uint32_t val)
+static void ne2000_asic_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 NE2000State *s = opaque;
 
@@ -564,7 +564,7 @@ void ne2000_asic_ioport_write(void *opaque, uint32_t addr, 
uint32_t val)
 }
 }
 
-uint32_t ne2000_asic_ioport_read(void *opaque, uint32_t addr)
+static uint32_t ne2000_asic_ioport_read(void *opaque, uint32_t addr)
 {
 NE2000State *s = opaque;
 int ret;
@@ -612,12 +612,12 @@ static uint32_t ne2000_asic_ioport_readl(void *opaque, 
uint32_t addr)
 return ret;
 }
 
-void ne2000_reset_ioport_write(void *opaque, uint32_t addr, uint32_t val)
+static void ne2000_reset_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 /* nothing to do (end of reset pulse) */
 }
 
-uint32_t ne2000_reset_ioport_read(void *opaque, uint32_t addr)
+static uint32_t ne2000_reset_ioport_read(void *opaque, uint32_t addr)
 {
 NE2000State *s = opaque;
 ne2000_reset(s);
@@ -676,27 +676,55 @@ static const VMStateDescription vmstate_pci_ne2000 = {
 }
 };
 
-/***/
-/* PCI NE2000 definitions */
+static uint64_t ne2000_read(void *opaque, target_phys_addr_t addr,
+unsigned size)
+{
+NE2000State *s = opaque;
 
-static void ne2000_map(PCIDevice *pci_dev, int region_num,
-   pcibus_t addr, pcibus_t size, int type)
+if (addr < 0x10 && size == 1) {
+return ne2000_ioport_read(s, addr);
+} else if (addr == 0x10) {
+if (size <= 2) {
+return ne2000_asic_ioport_read(s, addr);
+} else {
+return ne2000_asic_ioport_readl(s, addr);
+}
+} else if (addr == 0x1f && size == 1) {
+return ne2000_reset_ioport_read(s, addr);
+}
+return ((uint64_t)1 << (size * 8)) - 1;
+}
+
+static void ne2000_write(void *opaque, target_phys_addr_t addr,
+ uint64_t data, unsigned size)
 {
-PCINE2000State *d = DO_UPCAST(PCINE2000State, dev, pci_dev);
-NE2000State *s = &d->ne2000;
+NE2000State *s = opaque;
+
+if (addr < 0x10 && size == 1) {
+return ne2000_ioport_write(s, addr, data);
+} else if (addr == 0x10) {
+if (size <= 2) {
+return ne2000_asic_ioport_write(s, addr, data);
+} else {
+return ne2000_asic_ioport_writel(s, addr, data);
+}
+} else if (addr == 0x1f && size == 1) {
+return ne2000_reset_ioport_write(s, addr, data);
+

[PATCH v4 32/39] xen-platform: convert to memory API

2011-08-08 Thread Avi Kivity

Since this device bypasses PCI and registers I/O ports directly with
the system bus, it needs further attention.
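
As a rough illustration of the table-driven port I/O used for the fixed ports in this patch, the sketch below models a table of (offset, length, access size, handler) entries and a lookup over it. It is a standalone toy, not QEMU code: `ToyPortio`, `toy_table`, `toy_find_read` and the two handlers are hypothetical names, and the real `MemoryRegionPortio` dispatch may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a port I/O table entry (hypothetical, not QEMU's type). */
typedef struct ToyPortio {
    uint32_t offset;   /* first port covered, relative to the region start */
    uint32_t len;      /* number of ports covered */
    unsigned size;     /* access width in bytes; 0 terminates the table */
    uint32_t (*read)(void *opaque, uint32_t addr);
} ToyPortio;

/* Dummy handlers that only make the chosen entry observable. */
static uint32_t toy_readb(void *opaque, uint32_t addr)
{
    (void)opaque;
    return addr & 0xff;
}

static uint32_t toy_readw(void *opaque, uint32_t addr)
{
    (void)opaque;
    return 0x1000 | addr;
}

/* 16 ports with 2-byte and 1-byte read handlers and no 4-byte read. */
static const ToyPortio toy_table[] = {
    { 0, 16, 2, toy_readw },
    { 0, 16, 1, toy_readb },
    { 0, 0, 0, NULL },
};

/* Return the first entry matching both the offset range and the access
 * size, or NULL when no handler is registered for that combination. */
static const ToyPortio *toy_find_read(const ToyPortio *t, uint32_t off,
                                      unsigned size)
{
    for (; t->size != 0; t++) {
        if (t->read && size == t->size &&
            off >= t->offset && off - t->offset < t->len) {
            return t;
        }
    }
    return NULL;
}
```

A caller would look up the entry for an access and fall back to some default value when `toy_find_read()` returns NULL, for example for a 4-byte read in the table above.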

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/xen_platform.c |   83 -
 1 files changed, 50 insertions(+), 33 deletions(-)

diff --git a/hw/xen_platform.c b/hw/xen_platform.c
index fb6be6a..0b89075 100644
--- a/hw/xen_platform.c
+++ b/hw/xen_platform.c
@@ -32,8 +32,8 @@
 #include "xen_common.h"
 #include "net.h"
 #include "xen_backend.h"
-#include "rwhandler.h"
 #include "trace.h"
+#include "exec-memory.h"
 
 #include <xenguest.h>
 
@@ -51,6 +51,9 @@
 
 typedef struct PCIXenPlatformState {
 PCIDevice  pci_dev;
+MemoryRegion fixed_io;
+MemoryRegion bar;
+MemoryRegion mmio_bar;
 uint8_t flags; /* used only for version_id == 2 */
 int drivers_blacklisted;
 uint16_t driver_product_version;
@@ -221,21 +224,32 @@ static void platform_fixed_ioport_reset(void *opaque)
 platform_fixed_ioport_writeb(s, XEN_PLATFORM_IOPORT, 0);
 }
 
+const MemoryRegionPortio xen_platform_ioport[] = {
+{ 0, 16, 4, .write = platform_fixed_ioport_writel, },
+{ 0, 16, 2, .write = platform_fixed_ioport_writew, },
+{ 0, 16, 1, .write = platform_fixed_ioport_writeb, },
+{ 0, 16, 2, .read = platform_fixed_ioport_readw, },
+{ 0, 16, 1, .read = platform_fixed_ioport_readb, },
+PORTIO_END_OF_LIST()
+};
+
+static const MemoryRegionOps platform_fixed_io_ops = {
+.old_portio = xen_platform_ioport,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
 static void platform_fixed_ioport_init(PCIXenPlatformState* s)
 {
-register_ioport_write(XEN_PLATFORM_IOPORT, 16, 4, platform_fixed_ioport_writel, s);
-register_ioport_write(XEN_PLATFORM_IOPORT, 16, 2, platform_fixed_ioport_writew, s);
-register_ioport_write(XEN_PLATFORM_IOPORT, 16, 1, platform_fixed_ioport_writeb, s);
-register_ioport_read(XEN_PLATFORM_IOPORT, 16, 2, platform_fixed_ioport_readw, s);
-register_ioport_read(XEN_PLATFORM_IOPORT, 16, 1, platform_fixed_ioport_readb, s);
+memory_region_init_io(&s->fixed_io, &platform_fixed_io_ops, s,
+  "xen-fixed", 16);
+memory_region_add_subregion(get_system_io(), XEN_PLATFORM_IOPORT,
+&s->fixed_io);
 }
 
 /* Xen Platform PCI Device */
 
 static uint32_t xen_platform_ioport_readb(void *opaque, uint32_t addr)
 {
-addr &= 0xff;
-
 if (addr == 0) {
 return platform_fixed_ioport_readb(opaque, XEN_PLATFORM_IOPORT);
 } else {
@@ -247,9 +261,6 @@ static void xen_platform_ioport_writeb(void *opaque, 
uint32_t addr, uint32_t val
 {
 PCIXenPlatformState *s = opaque;
 
-addr &= 0xff;
-val  &= 0xff;
-
 switch (addr) {
 case 0: /* Platform flags */
 platform_fixed_ioport_writeb(opaque, XEN_PLATFORM_IOPORT, val);
@@ -262,15 +273,23 @@ static void xen_platform_ioport_writeb(void *opaque, 
uint32_t addr, uint32_t val
 }
 }
 
-static void platform_ioport_map(PCIDevice *pci_dev, int region_num, pcibus_t addr, pcibus_t size, int type)
-{
-PCIXenPlatformState *d = DO_UPCAST(PCIXenPlatformState, pci_dev, pci_dev);
+static MemoryRegionPortio xen_pci_portio[] = {
+{ 0, 0x100, 1, .read = xen_platform_ioport_readb, },
+{ 0, 0x100, 1, .write = xen_platform_ioport_writeb, },
+PORTIO_END_OF_LIST()
+};
+
+static const MemoryRegionOps xen_pci_io_ops = {
+.old_portio = xen_pci_portio,
+};
 
-register_ioport_write(addr, size, 1, xen_platform_ioport_writeb, d);
-register_ioport_read(addr, size, 1, xen_platform_ioport_readb, d);
+static void platform_ioport_bar_setup(PCIXenPlatformState *d)
+{
+memory_region_init_io(&d->bar, &xen_pci_io_ops, d, "xen-pci", 0x100);
 }
 
-static uint32_t platform_mmio_read(ReadWriteHandler *handler, pcibus_t addr, int len)
+static uint64_t platform_mmio_read(void *opaque, target_phys_addr_t addr,
+   unsigned size)
 {
 DPRINTF("Warning: attempted read from physical address "
 "0x" TARGET_FMT_plx " in xen platform mmio space\n", addr);
@@ -278,28 +297,24 @@ static uint32_t platform_mmio_read(ReadWriteHandler 
*handler, pcibus_t addr, int
 return 0;
 }
 
-static void platform_mmio_write(ReadWriteHandler *handler, pcibus_t addr,
-uint32_t val, int len)
+static void platform_mmio_write(void *opaque, target_phys_addr_t addr,
+uint64_t val, unsigned size)
 {
-DPRINTF("Warning: attempted write of 0x%x to physical "
+DPRINTF("Warning: attempted write of 0x%"PRIx64" to physical "
 "address 0x" TARGET_FMT_plx " in xen platform mmio space\n",
 val, addr);
 }
 
-static ReadWriteHandler platform_mmio_handler = {
+static const MemoryRegionOps platform_mmio_handler = {
 .read = platform_mmio_read,
 .write = platform_mmio_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };

[PATCH v4 02/39] pci: add API to get a BAR's mapped address

2011-08-08 Thread Avi Kivity

Some (hacky) devices, such as VMware svga, have a back-channel to read
this address back outside the normal configuration mechanisms.
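
The accessor added by this patch only reads the address recorded for a BAR. The sketch below shows that idea in a standalone toy model. `ToyPCIDevice`, `toy_map_bar` and `TOY_BAR_UNMAPPED` are hypothetical names, and the all-ones sentinel is an assumption chosen for illustration rather than a statement about QEMU's `PCI_BAR_UNMAPPED`.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_REGIONS  7
#define TOY_BAR_UNMAPPED (~(uint64_t)0)   /* assumed "not mapped" sentinel */

typedef struct ToyIORegion {
    uint64_t addr;   /* bus address the guest programmed, or the sentinel */
} ToyIORegion;

typedef struct ToyPCIDevice {
    ToyIORegion io_regions[TOY_NUM_REGIONS];
} ToyPCIDevice;

/* Start with every BAR unmapped. */
static void toy_pci_init(ToyPCIDevice *dev)
{
    for (int i = 0; i < TOY_NUM_REGIONS; i++) {
        dev->io_regions[i].addr = TOY_BAR_UNMAPPED;
    }
}

/* Record the address at which a BAR has been mapped. */
static void toy_map_bar(ToyPCIDevice *dev, int region_num, uint64_t addr)
{
    dev->io_regions[region_num].addr = addr;
}

/* Counterpart of the accessor in the patch: report the recorded address. */
static uint64_t toy_get_bar_addr(const ToyPCIDevice *dev, int region_num)
{
    return dev->io_regions[region_num].addr;
}
```

A device with a back-channel would call the accessor when the guest asks for the address and would have to cope with the sentinel if the BAR is not mapped yet.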

Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |5 +
 hw/pci.h |1 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 8621d3d..c2c2699 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -952,6 +952,11 @@ void pci_register_bar_region(PCIDevice *pci_dev, int 
region_num,
 pci_dev->io_regions[region_num].memory = memory;
 }
 
+pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
+{
+return pci_dev->io_regions[region_num].addr;
+}
+
 static void pci_bridge_filter(PCIDevice *d, pcibus_t *addr, pcibus_t *size,
   uint8_t type)
 {
diff --git a/hw/pci.h b/hw/pci.h
index c51156d..64282ad 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -207,6 +207,7 @@ void pci_register_bar_simple(PCIDevice *pci_dev, int 
region_num,
  pcibus_t size, uint8_t attr, ram_addr_t ram_addr);
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
  uint8_t attr, MemoryRegion *memory);
+pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
 
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
uint8_t offset, uint8_t size);
-- 
1.7.5.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v4 33/39] msix: convert to memory API

2011-08-08 Thread Avi Kivity

The msix table is defined as a subregion, to allow for a BAR that
mixes device specific regions with the msix table.
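
To make the subregion idea concrete, here is a small standalone model of a BAR container holding a device-specific region and an MSI-X table region at different offsets. All names (`ToyContainer`, `toy_add_subregion`, `toy_resolve`) and the sizes used are hypothetical; this is a sketch of the concept, not the memory API's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_SUB 4

typedef struct ToyRegion {
    const char *name;
    uint64_t offset;   /* offset inside the parent container */
    uint64_t size;
} ToyRegion;

typedef struct ToyContainer {
    uint64_t size;
    ToyRegion sub[TOY_MAX_SUB];
    size_t nsub;
} ToyContainer;

/* Add a subregion; fail if the table is full or it would not fit. */
static int toy_add_subregion(ToyContainer *c, const char *name,
                             uint64_t offset, uint64_t size)
{
    if (c->nsub == TOY_MAX_SUB || offset > c->size ||
        size > c->size - offset) {
        return -1;
    }
    c->sub[c->nsub].name = name;
    c->sub[c->nsub].offset = offset;
    c->sub[c->nsub].size = size;
    c->nsub++;
    return 0;
}

/* Find the subregion covering addr and compute the region-local offset. */
static const ToyRegion *toy_resolve(const ToyContainer *c, uint64_t addr,
                                    uint64_t *local)
{
    for (size_t i = 0; i < c->nsub; i++) {
        const ToyRegion *r = &c->sub[i];
        if (addr >= r->offset && addr - r->offset < r->size) {
            *local = addr - r->offset;
            return r;
        }
    }
    return NULL;
}
```

With this layout the handlers of each subregion see offsets relative to their own start, so the MSI-X table code does not need to know where in the BAR it was placed.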

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/ivshmem.c|   11 +
 hw/msix.c   |   64 +++
 hw/msix.h   |6 +---
 hw/pci.h|2 +-
 hw/virtio-pci.c |   16 -
 hw/virtio-pci.h |1 +
 6 files changed, 42 insertions(+), 58 deletions(-)

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index f80e7b6..bacba60 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -65,6 +65,7 @@ typedef struct IVShmemState {
  */
 MemoryRegion bar;
 MemoryRegion ivshmem;
+MemoryRegion msix_bar;
 uint64_t ivshmem_size; /* size of shared memory region */
 int shm_fd; /* shared memory file descriptor */
 
@@ -540,11 +541,11 @@ static void ivshmem_setup_msi(IVShmemState * s) {
 
 /* allocate the MSI-X vectors */
 
-if (!msix_init(&s->dev, s->vectors, 1, 0)) {
-pci_register_bar(&s->dev, 1,
- msix_bar_size(&s->dev),
- PCI_BASE_ADDRESS_SPACE_MEMORY,
- msix_mmio_map);
+memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
+if (!msix_init(&s->dev, s->vectors, &s->msix_bar, 1, 0)) {
+pci_register_bar_region(&s->dev, 1,
+PCI_BASE_ADDRESS_SPACE_MEMORY,
+&s->msix_bar);
 IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
 } else {
 IVSHMEM_DPRINTF("msix initialization failed\n");
diff --git a/hw/msix.c b/hw/msix.c
index e67e700..8536c3f 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -82,7 +82,8 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned 
short nentries,
 return 0;
 }
 
-static uint32_t msix_mmio_readl(void *opaque, target_phys_addr_t addr)
+static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
+   unsigned size)
 {
 PCIDevice *dev = opaque;
 unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
@@ -91,12 +92,6 @@ static uint32_t msix_mmio_readl(void *opaque, target_phys_addr_t addr)
 return pci_get_long(page + offset);
 }
 
-static uint32_t msix_mmio_read_unallowed(void *opaque, target_phys_addr_t addr)
-{
-fprintf(stderr, "MSI-X: only dword read is allowed!\n");
-return 0;
-}
-
 static uint8_t msix_pending_mask(int vector)
 {
 return 1 << (vector % 8);
@@ -169,8 +164,8 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
 }
 }
 
-static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
- uint32_t val)
+static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
+uint64_t val, unsigned size)
 {
 PCIDevice *dev = opaque;
 unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
@@ -179,37 +174,25 @@ static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
 msix_handle_mask_update(dev, vector);
 }
 
-static void msix_mmio_write_unallowed(void *opaque, target_phys_addr_t addr,
-  uint32_t val)
-{
-fprintf(stderr, "MSI-X: only dword write is allowed!\n");
-}
-
-static CPUWriteMemoryFunc * const msix_mmio_write[] = {
-msix_mmio_write_unallowed, msix_mmio_write_unallowed, msix_mmio_writel
-};
-
-static CPUReadMemoryFunc * const msix_mmio_read[] = {
-msix_mmio_read_unallowed, msix_mmio_read_unallowed, msix_mmio_readl
+static const MemoryRegionOps msix_mmio_ops = {
+.read = msix_mmio_read,
+.write = msix_mmio_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
 };
 
-/* Should be called from device's map method. */
-void msix_mmio_map(PCIDevice *d, int region_num,
-   pcibus_t addr, pcibus_t size, int type)
+static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
 {
 uint8_t *config = d->config + d->msix_cap;
 uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
 uint32_t offset = table & ~(MSIX_PAGE_SIZE - 1);
 /* TODO: for assigned devices, we'll want to make it possible to map
  * pending bits separately in case they are in a separate bar. */
-int table_bir = table & PCI_MSIX_FLAGS_BIRMASK;
 
-if (table_bir != region_num)
-return;
-if (size <= offset)
-return;
-cpu_register_physical_memory(addr + offset, size - offset,
- d->msix_mmio_index);
+memory_region_add_subregion(bar, offset, &d->msix_mmio);
 }
 
 static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
@@ -225,6 +208,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned 
nentries)
 /* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
  * modified, it should be retrieved with msix_bar_size. */
 int msix_init(struct PCIDevice *dev,

[PATCH v4 34/39] pci: remove pci_register_bar_simple()

2011-08-08 Thread Avi Kivity

Superseded by pci_register_bar_region().

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   17 -
 hw/pci.h |3 ---
 2 files changed, 0 insertions(+), 20 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index c00cbf8..7a70037 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -903,7 +903,6 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 r->filtered_size = size;
 r->type = type;
 r->map_func = map_func;
-r->ram_addr = IO_MEM_UNASSIGNED;
 r->memory = NULL;
 
 wmask = ~(size - 1);
@@ -923,13 +922,6 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 }
 }
 
-static void pci_simple_bar_mapfunc(PCIDevice *pci_dev, int region_num,
-   pcibus_t addr, pcibus_t size, int type)
-{
-cpu_register_physical_memory(addr, size,
- pci_dev->io_regions[region_num].ram_addr);
-}
-
 static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
   pcibus_t addr, pcibus_t size,
   int type)
@@ -942,15 +934,6 @@ static void pci_simple_bar_mapfunc_region(PCIDevice 
*pci_dev, int region_num,
 1);
 }
 
-void pci_register_bar_simple(PCIDevice *pci_dev, int region_num,
- pcibus_t size,  uint8_t attr, ram_addr_t ram_addr)
-{
-pci_register_bar(pci_dev, region_num, size,
- PCI_BASE_ADDRESS_SPACE_MEMORY | attr,
- pci_simple_bar_mapfunc);
-pci_dev->io_regions[region_num].ram_addr = ram_addr;
-}
-
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
  uint8_t attr, MemoryRegion *memory)
 {
diff --git a/hw/pci.h b/hw/pci.h
index a95e2ad..25e28b1 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -93,7 +93,6 @@ typedef struct PCIIORegion {
 pcibus_t filtered_size;
 uint8_t type;
 PCIMapIORegionFunc *map_func;
-ram_addr_t ram_addr;
 MemoryRegion *memory;
 MemoryRegion *address_space;
 } PCIIORegion;
@@ -204,8 +203,6 @@ PCIDevice *pci_register_device(PCIBus *bus, const char 
*name,
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
 pcibus_t size, uint8_t type,
 PCIMapIORegionFunc *map_func);
-void pci_register_bar_simple(PCIDevice *pci_dev, int region_num,
- pcibus_t size, uint8_t attr, ram_addr_t ram_addr);
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
  uint8_t attr, MemoryRegion *memory);
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
-- 
1.7.5.3


[PATCH v4 39/39] pci: remove support for pre memory API BARs

2011-08-08 Thread Avi Kivity

Not used anymore.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   33 ++---
 1 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 6547d2b..dc7271a 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -848,18 +848,7 @@ static void pci_unregister_io_regions(PCIDevice *pci_dev)
 r = &pci_dev->io_regions[i];
 if (!r->size || r->addr == PCI_BAR_UNMAPPED)
 continue;
-if (r->memory) {
-memory_region_del_subregion(r->address_space, r->memory);
-} else {
-if (r->type == PCI_BASE_ADDRESS_SPACE_IO) {
-isa_unassign_ioport(r->addr, r->filtered_size);
-} else {
-cpu_register_physical_memory(pci_to_cpu_addr(pci_dev->bus,
- r->addr),
- r->filtered_size,
- IO_MEM_UNASSIGNED);
-}
-}
+memory_region_del_subregion(r->address_space, r->memory);
 }
 }
 
@@ -1058,25 +1047,7 @@ static void pci_update_mappings(PCIDevice *d)
 
 /* now do the real mapping */
 if (r->addr != PCI_BAR_UNMAPPED) {
-if (r->memory) {
-memory_region_del_subregion(r->address_space, r->memory);
-} else if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
-int class;
-/* NOTE: specific hack for IDE in PC case:
-   only one byte must be mapped. */
-class = pci_get_word(d->config + PCI_CLASS_DEVICE);
-if (class == 0x0101 && r->size == 4) {
-isa_unassign_ioport(r->addr + 2, 1);
-} else {
-isa_unassign_ioport(r->addr, r->filtered_size);
-}
-} else {
-cpu_register_physical_memory(pci_to_cpu_addr(d->bus,
- r->addr),
- r->filtered_size,
- IO_MEM_UNASSIGNED);
-qemu_unregister_coalesced_mmio(r->addr, r->filtered_size);
-}
+memory_region_del_subregion(r->address_space, r->memory);
 }
 r->addr = new_addr;
 r->filtered_size = filtered_size;
-- 
1.7.5.3


[PATCH v4 23/39] lsi53c895a: convert to memory API

2011-08-08 Thread Avi Kivity

An optimization that fast-pathed DMA reads from the SCRIPTS memory
was removed in the process.  Likely it breaks with iommus anyway.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/lsi53c895a.c |  258 ---
 1 files changed, 56 insertions(+), 202 deletions(-)

diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index e9904c4..0ab8c78 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -185,9 +185,9 @@ typedef struct lsi_request {
 
 typedef struct {
 PCIDevice dev;
-int mmio_io_addr;
-int ram_io_addr;
-uint32_t script_ram_base;
+MemoryRegion mmio_io;
+MemoryRegion ram_io;
+MemoryRegion io_io;
 
 int carry; /* ??? Should this be an a visible register somewhere?  */
 int status;
@@ -391,10 +391,9 @@ static inline uint32_t read_dword(LSIState *s, uint32_t 
addr)
 {
 uint32_t buf;
 
-/* Optimize reading from SCRIPTS RAM.  */
-if ((addr & 0xffffe000) == s->script_ram_base) {
-return s->script_ram[(addr & 0x1fff) >> 2];
-}
+/* XXX: an optimization here used to fast-path the read from scripts
+ * memory.  But that bypasses any iommu.
+ */
 cpu_physical_memory_read(addr, (uint8_t *)&buf, 4);
 return cpu_to_le32(buf);
 }
@@ -1899,232 +1898,90 @@ static void lsi_reg_writeb(LSIState *s, int offset, 
uint8_t val)
 #undef CASE_SET_REG32
 }
 
-static void lsi_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t 
val)
+static void lsi_mmio_write(void *opaque, target_phys_addr_t addr,
+   uint64_t val, unsigned size)
 {
 LSIState *s = opaque;
 
 lsi_reg_writeb(s, addr & 0xff, val);
 }
 
-static void lsi_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t 
val)
-{
-LSIState *s = opaque;
-
-addr &= 0xff;
-lsi_reg_writeb(s, addr, val & 0xff);
-lsi_reg_writeb(s, addr + 1, (val >> 8) & 0xff);
-}
-
-static void lsi_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t 
val)
-{
-LSIState *s = opaque;
-
-addr &= 0xff;
-lsi_reg_writeb(s, addr, val & 0xff);
-lsi_reg_writeb(s, addr + 1, (val >> 8) & 0xff);
-lsi_reg_writeb(s, addr + 2, (val >> 16) & 0xff);
-lsi_reg_writeb(s, addr + 3, (val >> 24) & 0xff);
-}
-
-static uint32_t lsi_mmio_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t lsi_mmio_read(void *opaque, target_phys_addr_t addr,
+  unsigned size)
 {
 LSIState *s = opaque;
 
 return lsi_reg_readb(s, addr & 0xff);
 }
 
-static uint32_t lsi_mmio_readw(void *opaque, target_phys_addr_t addr)
-{
-LSIState *s = opaque;
-uint32_t val;
-
-addr &= 0xff;
-val = lsi_reg_readb(s, addr);
-val |= lsi_reg_readb(s, addr + 1) << 8;
-return val;
-}
-
-static uint32_t lsi_mmio_readl(void *opaque, target_phys_addr_t addr)
-{
-LSIState *s = opaque;
-uint32_t val;
-addr &= 0xff;
-val = lsi_reg_readb(s, addr);
-val |= lsi_reg_readb(s, addr + 1) << 8;
-val |= lsi_reg_readb(s, addr + 2) << 16;
-val |= lsi_reg_readb(s, addr + 3) << 24;
-return val;
-}
-
-static CPUReadMemoryFunc * const lsi_mmio_readfn[3] = {
-lsi_mmio_readb,
-lsi_mmio_readw,
-lsi_mmio_readl,
-};
-
-static CPUWriteMemoryFunc * const lsi_mmio_writefn[3] = {
-lsi_mmio_writeb,
-lsi_mmio_writew,
-lsi_mmio_writel,
+static const MemoryRegionOps lsi_mmio_ops = {
+.read = lsi_mmio_read,
+.write = lsi_mmio_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.impl = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
 };
 
-static void lsi_ram_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+static void lsi_ram_write(void *opaque, target_phys_addr_t addr,
+  uint64_t val, unsigned size)
 {
 LSIState *s = opaque;
 uint32_t newval;
+uint32_t mask;
 int shift;
 
-addr &= 0x1fff;
 newval = s->script_ram[addr >> 2];
 shift = (addr & 3) * 8;
-newval &= ~(0xff << shift);
+mask = ((uint64_t)1 << (size * 8)) - 1;
+newval &= ~(mask << shift);
 newval |= val << shift;
 s->script_ram[addr >> 2] = newval;
 }
 
-static void lsi_ram_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-LSIState *s = opaque;
-uint32_t newval;
-
-addr &= 0x1fff;
-newval = s->script_ram[addr >> 2];
-if (addr & 2) {
-newval = (newval & 0xffff) | (val << 16);
-} else {
-newval = (newval & 0xffff0000) | val;
-}
-s->script_ram[addr >> 2] = newval;
-}
-
-
-static void lsi_ram_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-LSIState *s = opaque;
-
-addr &= 0x1fff;
-s->script_ram[addr >> 2] = val;
-}
-
-static uint32_t lsi_ram_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t lsi_ram_read(void *opaque, target_phys_addr_t addr,
+ unsigned size)
 {
 LSIState *s = opaque;
 uint32_t val;
+uint32_t mask;
 
-addr =
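
The width-generic lsi_ram_write() above replaces three fixed-width handlers with one read-modify-write that builds its mask from the access size. A minimal standalone sketch of that idea follows, assuming naturally aligned accesses of 1, 2 or 4 bytes. The names size_mask(), ram_write() and SCRIPT_RAM_WORDS are illustrative only and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the device's script RAM: 2048 words = 8 KiB. */
#define SCRIPT_RAM_WORDS 2048

/* All-ones mask covering `size` bytes (size in 1..8). */
static uint64_t size_mask(unsigned size)
{
    assert(size >= 1 && size <= 8);
    /* Shifting a 64-bit value by 64 is undefined, so treat 8 specially. */
    return size == 8 ? UINT64_MAX : (((uint64_t)1 << (size * 8)) - 1);
}

/* Merge a 1-, 2- or 4-byte write at byte offset `addr` into a word array.
 * Assumes the access is naturally aligned, so it never crosses a word. */
static void ram_write(uint32_t *ram, uint32_t addr, uint64_t val, unsigned size)
{
    uint32_t mask, word;
    unsigned shift = (addr & 3) * 8;      /* bit position inside the word */

    assert(size <= 4 && (addr >> 2) < SCRIPT_RAM_WORDS);
    mask = (uint32_t)size_mask(size);
    word = ram[addr >> 2];
    word &= ~(mask << shift);             /* clear the bytes being replaced */
    word |= ((uint32_t)val & mask) << shift;
    ram[addr >> 2] = word;
}
```

Unlike the patch, the sketch also masks the incoming value before shifting, so stray high bits in val cannot leak into neighbouring bytes; that is a choice made for the example, not a statement about the driver.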

[PATCH v4 36/39] pci: remove pci_register_bar()

2011-08-08 Thread Avi Kivity

Superseded by pci_register_bar_region().  The implementations
are folded together.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   42 +-
 hw/pci.h |3 ---
 2 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index f885d4e..62b34d4 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -881,13 +881,25 @@ static int pci_unregister_device(DeviceState *dev)
 return 0;
 }
 
-void pci_register_bar(PCIDevice *pci_dev, int region_num,
-pcibus_t size, uint8_t type,
-PCIMapIORegionFunc *map_func)
+static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
+  pcibus_t addr, pcibus_t size,
+  int type)
+{
+PCIIORegion *r = &pci_dev->io_regions[region_num];
+
+memory_region_add_subregion_overlap(r->address_space,
+addr,
+r->memory,
+1);
+}
+
+void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
+ uint8_t type, MemoryRegion *memory)
 {
 PCIIORegion *r;
 uint32_t addr;
 uint64_t wmask;
+pcibus_t size = memory_region_size(memory);
 
 assert(region_num >= 0);
 assert(region_num < PCI_NUM_REGIONS);
@@ -902,7 +914,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 r->size = size;
 r->filtered_size = size;
 r->type = type;
-r->map_func = map_func;
+r->map_func = pci_simple_bar_mapfunc_region;
 r->memory = NULL;
 
 wmask = ~(size - 1);
@@ -920,29 +932,9 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 pci_set_long(pci_dev->wmask + addr, wmask & 0xffffffff);
 pci_set_long(pci_dev->cmask + addr, 0xffffffff);
 }
-}
-
-static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
-  pcibus_t addr, pcibus_t size,
-  int type)
-{
-PCIIORegion *r = &pci_dev->io_regions[region_num];
-
-memory_region_add_subregion_overlap(r->address_space,
-addr,
-r->memory,
-1);
-}
-
-void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
- uint8_t attr, MemoryRegion *memory)
-{
-pci_register_bar(pci_dev, region_num, memory_region_size(memory),
- attr,
- pci_simple_bar_mapfunc_region);
 pci_dev->io_regions[region_num].memory = memory;
 pci_dev->io_regions[region_num].address_space
-= attr & PCI_BASE_ADDRESS_SPACE_IO
+= type & PCI_BASE_ADDRESS_SPACE_IO
 ? pci_dev->bus->address_space_io
 : pci_dev->bus->address_space_mem;
 }
diff --git a/hw/pci.h b/hw/pci.h
index 6e2bcea..8028176 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -201,9 +201,6 @@ PCIDevice *pci_register_device(PCIBus *bus, const char 
*name,
PCIConfigReadFunc *config_read,
PCIConfigWriteFunc *config_write);
 
-void pci_register_bar(PCIDevice *pci_dev, int region_num,
-pcibus_t size, uint8_t type,
-PCIMapIORegionFunc *map_func);
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
  uint8_t attr, MemoryRegion *memory);
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
-- 
1.7.5.3


[PATCH v4 37/39] pci: fold BAR mapping function into its caller

2011-08-08 Thread Avi Kivity

There is only one function, so no need for a function pointer.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   25 +
 hw/pci.h |1 -
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 62b34d4..aa17395 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -881,18 +881,6 @@ static int pci_unregister_device(DeviceState *dev)
 return 0;
 }
 
-static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
-  pcibus_t addr, pcibus_t size,
-  int type)
-{
-PCIIORegion *r = &pci_dev->io_regions[region_num];
-
-memory_region_add_subregion_overlap(r->address_space,
-addr,
-r->memory,
-1);
-}
-
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
  uint8_t type, MemoryRegion *memory)
 {
@@ -914,7 +902,6 @@ void pci_register_bar_region(PCIDevice *pci_dev, int 
region_num,
 r->size = size;
 r->filtered_size = size;
 r->type = type;
-r->map_func = pci_simple_bar_mapfunc_region;
 r->memory = NULL;
 
 wmask = ~(size - 1);
@@ -1102,10 +1089,16 @@ static void pci_update_mappings(PCIDevice *d)
  * addr & (size - 1) != 0.
  */
 if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
-r->map_func(d, i, r->addr, r->filtered_size, r->type);
+memory_region_add_subregion_overlap(r->address_space,
+r->addr,
+r->memory,
+1);
 } else {
-r->map_func(d, i, pci_to_cpu_addr(d->bus, r->addr),
-r->filtered_size, r->type);
+memory_region_add_subregion_overlap(r->address_space,
+pci_to_cpu_addr(d->bus,
+r->addr),
+r->memory,
+1);
 }
 }
 }
diff --git a/hw/pci.h b/hw/pci.h
index 8028176..8d1662a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -92,7 +92,6 @@ typedef struct PCIIORegion {
 pcibus_t size;
 pcibus_t filtered_size;
 uint8_t type;
-PCIMapIORegionFunc *map_func;
 MemoryRegion *memory;
 MemoryRegion *address_space;
 } PCIIORegion;
-- 
1.7.5.3


[PATCH v4 30/39] ehci: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/usb-ehci.c |   36 +---
 1 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/hw/usb-ehci.c b/hw/usb-ehci.c
index 2b43895..6ef7798 100644
--- a/hw/usb-ehci.c
+++ b/hw/usb-ehci.c
@@ -370,8 +370,7 @@ struct EHCIState {
 PCIDevice dev;
 USBBus bus;
 qemu_irq irq;
-target_phys_addr_t mem_base;
-int mem;
+MemoryRegion mem;
 int companion_count;
 
 /* properties */
@@ -2179,29 +2178,15 @@ static void ehci_frame_timer(void *opaque)
 qemu_mod_timer(ehci->frame_timer, expire_time);
 }
 
-static CPUReadMemoryFunc *ehci_readfn[3]={
-ehci_mem_readb,
-ehci_mem_readw,
-ehci_mem_readl
-};
 
-static CPUWriteMemoryFunc *ehci_writefn[3]={
-ehci_mem_writeb,
-ehci_mem_writew,
-ehci_mem_writel
+static const MemoryRegionOps ehci_mem_ops = {
+.old_mmio = {
+.read = { ehci_mem_readb, ehci_mem_readw, ehci_mem_readl },
+.write = { ehci_mem_writeb, ehci_mem_writew, ehci_mem_writel },
+},
+.endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void ehci_map(PCIDevice *pci_dev, int region_num,
- pcibus_t addr, pcibus_t size, int type)
-{
-EHCIState *s =(EHCIState *)pci_dev;
-
-DPRINTF("ehci_map: region %d, addr %08" PRIx64 ", size %" PRId64 ", s->mem %08X\n",
-region_num, addr, size, s->mem);
-s->mem_base = addr;
-cpu_register_physical_memory(addr, size, s->mem);
-}
-
 static int usb_ehci_initfn(PCIDevice *dev);
 
 static USBPortOps ehci_port_ops = {
@@ -2316,11 +2301,8 @@ static int usb_ehci_initfn(PCIDevice *dev)
 
 qemu_register_reset(ehci_reset, s);
 
-s->mem = cpu_register_io_memory(ehci_readfn, ehci_writefn, s,
-DEVICE_LITTLE_ENDIAN);
-
-pci_register_bar(&s->dev, 0, MMIO_SIZE, PCI_BASE_ADDRESS_SPACE_MEMORY,
-ehci_map);
+memory_region_init_io(&s->mem, &ehci_mem_ops, s, "ehci", MMIO_SIZE);
+pci_register_bar_region(&s->dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mem);
 
 fprintf(stderr, "*** EHCI support is under development ***\n");
 
-- 
1.7.5.3


[PATCH v4 35/39] pci: convert pci rom to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   20 +++-
 hw/pci.h |3 ++-
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 7a70037..f885d4e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1855,11 +1855,6 @@ static uint8_t pci_find_capability_list(PCIDevice *pdev, 
uint8_t cap_id,
 return next;
 }
 
-static void pci_map_option_rom(PCIDevice *pdev, int region_num, pcibus_t addr, 
pcibus_t size, int type)
-{
-cpu_register_physical_memory(addr, size, pdev->rom_offset);
-}
-
 /* Patch the PCI vendor and device ids in a PCI rom image if necessary.
This is needed for an option rom which is used for more than one device. */
 static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size)
@@ -1963,9 +1958,9 @@ static int pci_add_option_rom(PCIDevice *pdev, bool 
is_default_rom)
 snprintf(name, sizeof(name), "%s.rom", pdev->qdev.info->vmsd->name);
 else
 snprintf(name, sizeof(name), "%s.rom", pdev->qdev.info->name);
-pdev->rom_offset = qemu_ram_alloc(&pdev->qdev, name, size);
-
-ptr = qemu_get_ram_ptr(pdev->rom_offset);
+pdev->has_rom = true;
+memory_region_init_ram(&pdev->rom, &pdev->qdev, name, size);
+ptr = memory_region_get_ram_ptr(&pdev->rom);
 load_image(path, ptr);
 qemu_free(path);
 
@@ -1976,19 +1971,18 @@ static int pci_add_option_rom(PCIDevice *pdev, bool 
is_default_rom)
 
 qemu_put_ram_ptr(ptr);
 
-pci_register_bar(pdev, PCI_ROM_SLOT, size,
- 0, pci_map_option_rom);
+pci_register_bar_region(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
 
 return 0;
 }
 
 static void pci_del_option_rom(PCIDevice *pdev)
 {
-if (!pdev->rom_offset)
+if (!pdev->has_rom)
 return;
 
-qemu_ram_free(pdev->rom_offset);
-pdev->rom_offset = 0;
+memory_region_destroy(&pdev->rom);
+pdev->has_rom = false;
 }
 
 /*
diff --git a/hw/pci.h b/hw/pci.h
index 25e28b1..6e2bcea 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -191,7 +191,8 @@ struct PCIDevice {
 
 /* Location of option rom */
 char *romfile;
-ram_addr_t rom_offset;
+bool has_rom;
+MemoryRegion rom;
 uint32_t rom_bar;
 };
 
-- 
1.7.5.3


[PATCH v4 31/39] uhci: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/usb-uhci.c |   42 --
 1 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c
index 824e3a5..ea38169 100644
--- a/hw/usb-uhci.c
+++ b/hw/usb-uhci.c
@@ -129,6 +129,7 @@ typedef struct UHCIPort {
 
 struct UHCIState {
 PCIDevice dev;
+MemoryRegion io_bar;
 USBBus bus; /* Note unused when we're a companion controller */
 uint16_t cmd; /* cmd register */
 uint16_t status;
@@ -1096,18 +1097,19 @@ static void uhci_frame_timer(void *opaque)
 qemu_mod_timer(s->frame_timer, s->expire_time);
 }
 
-static void uhci_map(PCIDevice *pci_dev, int region_num,
-pcibus_t addr, pcibus_t size, int type)
-{
-UHCIState *s = (UHCIState *)pci_dev;
-
-register_ioport_write(addr, 32, 2, uhci_ioport_writew, s);
-register_ioport_read(addr, 32, 2, uhci_ioport_readw, s);
-register_ioport_write(addr, 32, 4, uhci_ioport_writel, s);
-register_ioport_read(addr, 32, 4, uhci_ioport_readl, s);
-register_ioport_write(addr, 32, 1, uhci_ioport_writeb, s);
-register_ioport_read(addr, 32, 1, uhci_ioport_readb, s);
-}
+static const MemoryRegionPortio uhci_portio[] = {
+{ 0, 32, 2, .write = uhci_ioport_writew, },
+{ 0, 32, 2, .read = uhci_ioport_readw, },
+{ 0, 32, 4, .write = uhci_ioport_writel, },
+{ 0, 32, 4, .read = uhci_ioport_readl, },
+{ 0, 32, 1, .write = uhci_ioport_writeb, },
+{ 0, 32, 1, .read = uhci_ioport_readb, },
+PORTIO_END_OF_LIST()
+};
+
+static const MemoryRegionOps uhci_ioport_ops = {
+.old_portio = uhci_portio,
+};
 
 static USBPortOps uhci_port_ops = {
 .attach = uhci_attach,
@@ -1154,10 +1156,11 @@ static int usb_uhci_common_initfn(PCIDevice *dev)
 
 qemu_register_reset(uhci_reset, s);
 
+memory_region_init_io(&s->io_bar, &uhci_ioport_ops, s, "uhci", 0x20);
 /* Use region 4 for consistency with real hardware.  BSD guests seem
to rely on this.  */
-pci_register_bar(&s->dev, 4, 0x20,
-   PCI_BASE_ADDRESS_SPACE_IO, uhci_map);
+pci_register_bar_region(&s->dev, 4,
+PCI_BASE_ADDRESS_SPACE_IO, &s->io_bar);
 
 return 0;
 }
@@ -1177,6 +1180,14 @@ static int usb_uhci_vt82c686b_initfn(PCIDevice *dev)
 return usb_uhci_common_initfn(dev);
 }
 
+static int usb_uhci_exit(PCIDevice *dev)
+{
+UHCIState *s = DO_UPCAST(UHCIState, dev, dev);
+
+memory_region_destroy(&s->io_bar);
+return 0;
+}
+
 static Property uhci_properties[] = {
 DEFINE_PROP_STRING("masterbus", UHCIState, masterbus),
 DEFINE_PROP_UINT32("firstport", UHCIState, firstport, 0),
@@ -1189,6 +1200,7 @@ static PCIDeviceInfo uhci_info[] = {
 .qdev.size= sizeof(UHCIState),
 .qdev.vmsd= &vmstate_uhci,
 .init = usb_uhci_common_initfn,
+.exit = usb_uhci_exit,
 .vendor_id= PCI_VENDOR_ID_INTEL,
 .device_id= PCI_DEVICE_ID_INTEL_82371SB_2,
 .revision = 0x01,
@@ -1199,6 +1211,7 @@ static PCIDeviceInfo uhci_info[] = {
 .qdev.size= sizeof(UHCIState),
 .qdev.vmsd= &vmstate_uhci,
 .init = usb_uhci_common_initfn,
+.exit = usb_uhci_exit,
 .vendor_id= PCI_VENDOR_ID_INTEL,
 .device_id= PCI_DEVICE_ID_INTEL_82371AB_2,
 .revision = 0x01,
@@ -1209,6 +1222,7 @@ static PCIDeviceInfo uhci_info[] = {
 .qdev.size= sizeof(UHCIState),
 .qdev.vmsd= &vmstate_uhci,
 .init = usb_uhci_vt82c686b_initfn,
+.exit = usb_uhci_exit,
 .vendor_id= PCI_VENDOR_ID_VIA,
 .device_id= PCI_DEVICE_ID_VIA_UHCI,
 .revision = 0x01,
-- 
1.7.5.3
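
The uhci_portio[] table above swaps six registration calls for a static array closed by an end-of-list marker. The sketch below shows, under stated assumptions, how such a sentinel-terminated table can be searched by offset and access width. PortEntry, PORT_END_OF_LIST, find_port() and port_read() are hypothetical names for illustration and are not QEMU's MemoryRegionPortio implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*port_read_fn)(void *opaque, uint32_t addr);

/* Hypothetical table entry, loosely modelled on an offset/len/size triple. */
typedef struct PortEntry {
    uint32_t offset;    /* first port covered, relative to the region */
    uint32_t len;       /* number of ports covered; 0 terminates the table */
    unsigned size;      /* access width in bytes */
    port_read_fn read;  /* handler for reads of that width */
} PortEntry;

#define PORT_END_OF_LIST { 0, 0, 0, NULL }

/* Return the first entry matching both the port and the access width. */
static const PortEntry *find_port(const PortEntry *table, uint32_t addr,
                                  unsigned size)
{
    const PortEntry *e;

    for (e = table; e->len != 0; e++) {
        if (addr >= e->offset && addr - e->offset < e->len &&
            size == e->size && e->read != NULL) {
            return e;
        }
    }
    return NULL;
}

/* Read through the table; accesses nobody claims return all ones. */
static uint32_t port_read(const PortEntry *table, void *opaque,
                          uint32_t addr, unsigned size)
{
    const PortEntry *e = find_port(table, addr, size);

    return e ? e->read(opaque, addr) : UINT32_MAX;
}

/* Demo handlers and table, used only to exercise the lookup. */
static uint32_t demo_readb(void *opaque, uint32_t addr)
{
    (void)opaque;
    return 0x10 + addr;
}

static uint32_t demo_readw(void *opaque, uint32_t addr)
{
    (void)opaque;
    return 0x2000 + addr;
}

static const PortEntry demo_table[] = {
    { 0, 32, 2, demo_readw },
    { 0, 32, 1, demo_readb },
    PORT_END_OF_LIST
};
```

A table like this keeps the mapping as data, so nothing has to be re-registered each time the BAR moves.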


[PATCH v4 18/39] ide: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/ide/cmd646.c |  208 +++
 hw/ide/pci.c|   25 ---
 hw/ide/pci.h|   19 -
 hw/ide/piix.c   |   64 +
 hw/ide/via.c|   65 +
 5 files changed, 261 insertions(+), 120 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 56302b5..13e6f2f 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -44,35 +44,95 @@
 
 static void cmd646_update_irq(PCIIDEState *d);
 
-static void ide_map(PCIDevice *pci_dev, int region_num,
-pcibus_t addr, pcibus_t size, int type)
+static uint64_t cmd646_cmd_read(void *opaque, target_phys_addr_t addr,
+unsigned size)
 {
-PCIIDEState *d = DO_UPCAST(PCIIDEState, dev, pci_dev);
-IDEBus *bus;
-
-if (region_num <= 3) {
-bus = &d->bus[(region_num >> 1)];
-if (region_num & 1) {
-register_ioport_read(addr + 2, 1, 1, ide_status_read, bus);
-register_ioport_write(addr + 2, 1, 1, ide_cmd_write, bus);
+CMD646BAR *cmd646bar = opaque;
+
+if (addr != 2 || size != 1) {
+return ((uint64_t)1 << (size * 8)) - 1;
+}
+return ide_status_read(cmd646bar->bus, addr + 2);
+}
+
+static void cmd646_cmd_write(void *opaque, target_phys_addr_t addr,
+ uint64_t data, unsigned size)
+{
+CMD646BAR *cmd646bar = opaque;
+
+if (addr != 2 || size != 1) {
+return;
+}
+ide_cmd_write(cmd646bar->bus, addr + 2, data);
+}
+
+static MemoryRegionOps cmd646_cmd_ops = {
+.read = cmd646_cmd_read,
+.write = cmd646_cmd_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static uint64_t cmd646_data_read(void *opaque, target_phys_addr_t addr,
+ unsigned size)
+{
+CMD646BAR *cmd646bar = opaque;
+
+if (size == 1) {
+return ide_ioport_read(cmd646bar->bus, addr);
+} else if (addr == 0) {
+if (size == 2) {
+return ide_data_readw(cmd646bar->bus, addr);
 } else {
-register_ioport_write(addr, 8, 1, ide_ioport_write, bus);
-register_ioport_read(addr, 8, 1, ide_ioport_read, bus);
-
-/* data ports */
-register_ioport_write(addr, 2, 2, ide_data_writew, bus);
-register_ioport_read(addr, 2, 2, ide_data_readw, bus);
-register_ioport_write(addr, 4, 4, ide_data_writel, bus);
-register_ioport_read(addr, 4, 4, ide_data_readl, bus);
+return ide_data_readl(cmd646bar->bus, addr);
 }
 }
+return ((uint64_t)1 << (size * 8)) - 1;
 }
 
-static uint32_t bmdma_readb_common(PCIIDEState *pci_dev, BMDMAState *bm,
-   uint32_t addr)
+static void cmd646_data_write(void *opaque, target_phys_addr_t addr,
+ uint64_t data, unsigned size)
 {
+CMD646BAR *cmd646bar = opaque;
+
+if (size == 1) {
+return ide_ioport_write(cmd646bar->bus, addr, data);
+} else if (addr == 0) {
+if (size == 2) {
+return ide_data_writew(cmd646bar->bus, addr, data);
+} else {
+return ide_data_writel(cmd646bar->bus, addr, data);
+}
+}
+}
+
+static MemoryRegionOps cmd646_data_ops = {
+.read = cmd646_data_read,
+.write = cmd646_data_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static void setup_cmd646_bar(PCIIDEState *d, int bus_num)
+{
+IDEBus *bus = &d->bus[bus_num];
+CMD646BAR *bar = &d->cmd646_bar[bus_num];
+
+bar->bus = bus;
+bar->pci_dev = d;
+memory_region_init_io(&bar->cmd, &cmd646_cmd_ops, bar, "cmd646-cmd", 4);
+memory_region_init_io(&bar->data, &cmd646_data_ops, bar, "cmd646-data", 8);
+}
+
+static uint64_t bmdma_read(void *opaque, target_phys_addr_t addr,
+   unsigned size)
+{
+BMDMAState *bm = opaque;
+PCIIDEState *pci_dev = bm->pci_dev;
 uint32_t val;
 
+if (size != 1) {
+return ((uint64_t)1 << (size * 8)) - 1;
+}
+
 switch(addr & 3) {
 case 0:
 val = bm->cmd;
@@ -100,31 +160,22 @@ static uint32_t bmdma_readb_common(PCIIDEState *pci_dev, 
BMDMAState *bm,
 return val;
 }
 
-static uint32_t bmdma_readb_0(void *opaque, uint32_t addr)
+static void bmdma_write(void *opaque, target_phys_addr_t addr,
+uint64_t val, unsigned size)
 {
-PCIIDEState *pci_dev = opaque;
-BMDMAState *bm = &pci_dev->bmdma[0];
-
-return bmdma_readb_common(pci_dev, bm, addr);
-}
+BMDMAState *bm = opaque;
+PCIIDEState *pci_dev = bm->pci_dev;
 
-static uint32_t bmdma_readb_1(void *opaque, uint32_t addr)
-{
-PCIIDEState *pci_dev = opaque;
-BMDMAState *bm = &pci_dev->bmdma[1];
-
-return bmdma_readb_common(pci_dev, bm, addr);
-}
+if (size != 1) {
+return;
+}
 
-static void bmdma_writeb_common(PCIIDEState *pci_dev, BMDMAState *bm,

[PATCH v4 28/39] isa-mmio: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/isa.h  |2 ++
 hw/isa_mmio.c |   29 ++---
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/hw/isa.h b/hw/isa.h
index d2b6126..f1f2181 100644
--- a/hw/isa.h
+++ b/hw/isa.h
@@ -4,6 +4,7 @@
 /* ISA bus */
 
 #include "ioport.h"
+#include "memory.h"
 #include "qdev.h"
 
 typedef struct ISABus ISABus;
@@ -37,6 +38,7 @@ ISADevice *isa_create_simple(const char *name);
 
 extern target_phys_addr_t isa_mem_base;
 
+void isa_mmio_setup(MemoryRegion *mr, target_phys_addr_t size);
 void isa_mmio_init(target_phys_addr_t base, target_phys_addr_t size);
 
 /* dma.c */
diff --git a/hw/isa_mmio.c b/hw/isa_mmio.c
index ca957fb..3d2af1a 100644
--- a/hw/isa_mmio.c
+++ b/hw/isa_mmio.c
@@ -24,6 +24,7 @@
 
 #include "hw.h"
 #include "isa.h"
+#include "exec-memory.h"
 
 static void isa_mmio_writeb (void *opaque, target_phys_addr_t addr,
   uint32_t val)
@@ -58,25 +59,23 @@ static uint32_t isa_mmio_readl(void *opaque, 
target_phys_addr_t addr)
 return cpu_inl(addr & IOPORTS_MASK);
 }
 
-static CPUWriteMemoryFunc * const isa_mmio_write[] = {
-isa_mmio_writeb,
-isa_mmio_writew,
-isa_mmio_writel,
+static const MemoryRegionOps isa_mmio_ops = {
+.old_mmio = {
+.write = { isa_mmio_writeb, isa_mmio_writew, isa_mmio_writel },
+.read = { isa_mmio_readb, isa_mmio_readw, isa_mmio_readl, },
+},
+.endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static CPUReadMemoryFunc * const isa_mmio_read[] = {
-isa_mmio_readb,
-isa_mmio_readw,
-isa_mmio_readl,
-};
+void isa_mmio_setup(MemoryRegion *mr, target_phys_addr_t size)
+{
+memory_region_init_io(mr, &isa_mmio_ops, NULL, "isa-mmio", size);
+}
 
 void isa_mmio_init(target_phys_addr_t base, target_phys_addr_t size)
 {
-int isa_mmio_iomemtype;
+MemoryRegion *mr = qemu_malloc(sizeof(*mr));
 
-isa_mmio_iomemtype = cpu_register_io_memory(isa_mmio_read,
-isa_mmio_write,
-NULL,
-DEVICE_LITTLE_ENDIAN);
-cpu_register_physical_memory(base, size, isa_mmio_iomemtype);
+isa_mmio_setup(mr, size);
+memory_region_add_subregion(get_system_memory(), base, mr);
 }
-- 
1.7.5.3


[PATCH v4 26/39] pcnet: convert to memory API

2011-08-08 Thread Avi Kivity

Also related chips.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/lance.c |   31 ++-
 hw/pcnet-pci.c |   74 +--
 hw/pcnet.h |4 ++-
 3 files changed, 61 insertions(+), 48 deletions(-)

diff --git a/hw/lance.c b/hw/lance.c
index ddb1cbb..8e20360 100644
--- a/hw/lance.c
+++ b/hw/lance.c
@@ -55,8 +55,8 @@ static void parent_lance_reset(void *opaque, int irq, int 
level)
 pcnet_h_reset(&d->state);
 }
 
-static void lance_mem_writew(void *opaque, target_phys_addr_t addr,
- uint32_t val)
+static void lance_mem_write(void *opaque, target_phys_addr_t addr,
+uint64_t val, unsigned size)
 {
 SysBusPCNetState *d = opaque;
 
@@ -64,7 +64,8 @@ static void lance_mem_writew(void *opaque, target_phys_addr_t 
addr,
 pcnet_ioport_writew(&d->state, addr, val & 0xffff);
 }
 
-static uint32_t lance_mem_readw(void *opaque, target_phys_addr_t addr)
+static uint64_t lance_mem_read(void *opaque, target_phys_addr_t addr,
+   unsigned size)
 {
 SysBusPCNetState *d = opaque;
 uint32_t val;
@@ -74,16 +75,14 @@ static uint32_t lance_mem_readw(void *opaque, 
target_phys_addr_t addr)
 return val & 0xffff;
 }
 
-static CPUReadMemoryFunc * const lance_mem_read[3] = {
-NULL,
-lance_mem_readw,
-NULL,
-};
-
-static CPUWriteMemoryFunc * const lance_mem_write[3] = {
-NULL,
-lance_mem_writew,
-NULL,
+static const MemoryRegionOps lance_mem_ops = {
+.read = lance_mem_read,
+.write = lance_mem_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 2,
+.max_access_size = 2,
+},
 };
 
 static void lance_cleanup(VLANClientState *nc)
@@ -117,13 +116,11 @@ static int lance_init(SysBusDevice *dev)
 SysBusPCNetState *d = FROM_SYSBUS(SysBusPCNetState, dev);
 PCNetState *s = &d->state;
 
-s->mmio_index =
-cpu_register_io_memory(lance_mem_read, lance_mem_write, d,
-   DEVICE_NATIVE_ENDIAN);
+memory_region_init_io(&s->mmio, &lance_mem_ops, s, "lance-mmio", 4);
 
 qdev_init_gpio_in(&dev->qdev, parent_lance_reset, 1);
 
-sysbus_init_mmio(dev, 4, s->mmio_index);
+sysbus_init_mmio_region(dev, &s->mmio);
 
 sysbus_init_irq(dev, &s->irq);
 
diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c
index 216cf81..a25f565 100644
--- a/hw/pcnet-pci.c
+++ b/hw/pcnet-pci.c
@@ -46,6 +46,7 @@
 typedef struct {
 PCIDevice pci_dev;
 PCNetState state;
+MemoryRegion io_bar;
 } PCIPCNetState;
 
 static void pcnet_aprom_writeb(void *opaque, uint32_t addr, uint32_t val)
@@ -69,25 +70,41 @@ static uint32_t pcnet_aprom_readb(void *opaque, uint32_t 
addr)
 return val;
 }
 
-static void pcnet_ioport_map(PCIDevice *pci_dev, int region_num,
- pcibus_t addr, pcibus_t size, int type)
+static uint64_t pcnet_ioport_read(void *opaque, target_phys_addr_t addr,
+  unsigned size)
 {
-PCNetState *d = &DO_UPCAST(PCIPCNetState, pci_dev, pci_dev)->state;
+PCNetState *d = opaque;
 
-#ifdef PCNET_DEBUG_IO
-printf("pcnet_ioport_map addr=0x%04"FMT_PCIBUS" size=0x%04"FMT_PCIBUS"\n",
-   addr, size);
-#endif
+if (addr < 16 && size == 1) {
+return pcnet_aprom_readb(d, addr);
+} else if (addr >= 0x10 && addr < 0x20 && size == 2) {
+return pcnet_ioport_readw(d, addr);
+} else if (addr >= 0x10 && addr < 0x20 && size == 4) {
+return pcnet_ioport_readl(d, addr);
+}
+return ((uint64_t)1 << (size * 8)) - 1;
+}
 
-register_ioport_write(addr, 16, 1, pcnet_aprom_writeb, d);
-register_ioport_read(addr, 16, 1, pcnet_aprom_readb, d);
+static void pcnet_ioport_write(void *opaque, target_phys_addr_t addr,
+   uint64_t data, unsigned size)
+{
+PCNetState *d = opaque;
 
-register_ioport_write(addr + 0x10, 0x10, 2, pcnet_ioport_writew, d);
-register_ioport_read(addr + 0x10, 0x10, 2, pcnet_ioport_readw, d);
-register_ioport_write(addr + 0x10, 0x10, 4, pcnet_ioport_writel, d);
-register_ioport_read(addr + 0x10, 0x10, 4, pcnet_ioport_readl, d);
+if (addr < 16 && size == 1) {
+return pcnet_aprom_writeb(d, addr, data);
+} else if (addr >= 0x10 && addr < 0x20 && size == 2) {
+return pcnet_ioport_writew(d, addr, data);
+} else if (addr >= 0x10 && addr < 0x20 && size == 4) {
+return pcnet_ioport_writel(d, addr, data);
+}
 }
 
+static const MemoryRegionOps pcnet_io_ops = {
+.read = pcnet_ioport_read,
+.write = pcnet_ioport_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
 static void pcnet_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t 
val)
 {
 PCNetState *d = opaque;
@@ -202,16 +219,12 @@ static const VMStateDescription vmstate_pci_pcnet = {
 
 /* PCI interface */
 
-static CPUWriteMemoryFunc * const
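
The pcnet_ioport_read() callback above folds what used to be several per-width port registrations into one function that dispatches on offset and access size, answering anything else with all ones. A small self-contained sketch of that dispatch shape follows. DemoDev, all_ones() and demo_io_read() are made-up names, and the register layout is an assumption chosen only to mirror the 16-byte PROM plus 16-byte register window seen in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device: a 16-byte PROM followed by 16 bytes of registers. */
typedef struct DemoDev {
    uint8_t prom[16];
    uint16_t regs[8];
} DemoDev;

/* Value returned for accesses that no register claims. */
static uint64_t all_ones(unsigned size)
{
    return size >= 8 ? UINT64_MAX : (((uint64_t)1 << (size * 8)) - 1);
}

/* One read callback for the whole region, dispatching on (addr, size). */
static uint64_t demo_io_read(void *opaque, uint64_t addr, unsigned size)
{
    DemoDev *d = opaque;

    if (addr < 16 && size == 1) {
        return d->prom[addr];                   /* byte-wide PROM window */
    } else if (addr >= 0x10 && addr < 0x20 && size == 2) {
        return d->regs[(addr - 0x10) >> 1];     /* word-wide registers */
    }
    return all_ones(size);                      /* unhandled combination */
}
```

Keeping the fallthrough value width-dependent means a 2-byte read of an unclaimed port yields 0xffff rather than a truncated wider constant.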

[PATCH v4 29/39] sun4u: convert to memory API

2011-08-08 Thread Avi Kivity

fixes memory leak on repeated BAR map/unmap

Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/sun4u.c |   55 +--
 1 files changed, 25 insertions(+), 30 deletions(-)

diff --git a/hw/sun4u.c b/hw/sun4u.c
index d7dcaf0..cb76031 100644
--- a/hw/sun4u.c
+++ b/hw/sun4u.c
@@ -91,6 +91,12 @@ struct hwdef {
 uint64_t console_serial_base;
 };
 
+typedef struct EbusState {
+PCIDevice pci_dev;
+MemoryRegion bar0;
+MemoryRegion bar1;
+} EbusState;
+
 int DMA_get_channel_mode (int nchan)
 {
 return 0;
@@ -518,21 +524,6 @@ void cpu_tick_set_limit(CPUTimer *timer, uint64_t limit)
 }
 }
 
-static void ebus_mmio_mapfunc(PCIDevice *pci_dev, int region_num,
-  pcibus_t addr, pcibus_t size, int type)
-{
-EBUS_DPRINTF("Mapping region %d registers at %" FMT_PCIBUS "\n",
- region_num, addr);
-switch (region_num) {
-case 0:
-isa_mmio_init(addr, 0x100);
-break;
-case 1:
-isa_mmio_init(addr, 0x80);
-break;
-}
-}
-
 static void dummy_isa_irq_handler(void *opaque, int n, int level)
 {
 }
@@ -549,27 +540,31 @@ pci_ebus_init(PCIBus *bus, int devfn)
 }
 
 static int
-pci_ebus_init1(PCIDevice *s)
+pci_ebus_init1(PCIDevice *pci_dev)
 {
-isa_bus_new(&s->qdev);
+EbusState *s = DO_UPCAST(EbusState, pci_dev, pci_dev);
+
+isa_bus_new(&pci_dev->qdev);
 
-s->config[0x04] = 0x06; // command = bus master, pci mem
-s->config[0x05] = 0x00;
-s->config[0x06] = 0xa0; // status = fast back-to-back, 66MHz, no error
-s->config[0x07] = 0x03; // status = medium devsel
-s->config[0x09] = 0x00; // programming i/f
-s->config[0x0D] = 0x0a; // latency_timer
+pci_dev->config[0x04] = 0x06; // command = bus master, pci mem
+pci_dev->config[0x05] = 0x00;
+pci_dev->config[0x06] = 0xa0; // status = fast back-to-back, 66MHz, no error
+pci_dev->config[0x07] = 0x03; // status = medium devsel
+pci_dev->config[0x09] = 0x00; // programming i/f
+pci_dev->config[0x0D] = 0x0a; // latency_timer
 
-pci_register_bar(s, 0, 0x100, PCI_BASE_ADDRESS_SPACE_MEMORY,
-   ebus_mmio_mapfunc);
-pci_register_bar(s, 1, 0x80,  PCI_BASE_ADDRESS_SPACE_MEMORY,
-   ebus_mmio_mapfunc);
+isa_mmio_setup(s-bar0, 0x100);
+pci_register_bar_region(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY,
+s-bar0);
+isa_mmio_setup(s-bar1, 0x80);
+pci_register_bar_region(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY,
+s-bar1);
 return 0;
 }
 
 static PCIDeviceInfo ebus_info = {
 .qdev.name = "ebus",
-.qdev.size = sizeof(PCIDevice),
+.qdev.size = sizeof(EbusState),
 .init = pci_ebus_init1,
 .vendor_id = PCI_VENDOR_ID_SUN,
 .device_id = PCI_DEVICE_ID_SUN_EBUS,
-- 
1.7.5.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v4 27/39] i6300esb: convert to memory API

2011-08-08 Thread Avi Kivity

Also add missing destructor.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/wdt_i6300esb.c |   43 +--
 1 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/hw/wdt_i6300esb.c b/hw/wdt_i6300esb.c
index 53786ce..abc2e17 100644
--- a/hw/wdt_i6300esb.c
+++ b/hw/wdt_i6300esb.c
@@ -66,6 +66,7 @@
 /* Device state. */
 struct I6300State {
 PCIDevice dev;
+MemoryRegion io_mem;
 
 int reboot_enabled; /* Reboot on timer expiry.  The real action
  * performed depends on the -watchdog-action
@@ -355,6 +356,22 @@ static void i6300esb_mem_writel(void *vp, target_phys_addr_t addr, uint32_t val)
 }
 }
 
+static const MemoryRegionOps i6300esb_ops = {
+.old_mmio = {
+.read = {
+i6300esb_mem_readb,
+i6300esb_mem_readw,
+i6300esb_mem_readl,
+},
+.write = {
+i6300esb_mem_writeb,
+i6300esb_mem_writew,
+i6300esb_mem_writel,
+},
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
 static const VMStateDescription vmstate_i6300esb = {
 .name = "i6300esb_wdt",
 .version_id = sizeof(I6300State),
@@ -381,31 +398,28 @@ static const VMStateDescription vmstate_i6300esb = {
 static int i6300esb_init(PCIDevice *dev)
 {
 I6300State *d = DO_UPCAST(I6300State, dev, dev);
-int io_mem;
-static CPUReadMemoryFunc * const mem_read[3] = {
-i6300esb_mem_readb,
-i6300esb_mem_readw,
-i6300esb_mem_readl,
-};
-static CPUWriteMemoryFunc * const mem_write[3] = {
-i6300esb_mem_writeb,
-i6300esb_mem_writew,
-i6300esb_mem_writel,
-};
 
 i6300esb_debug("I6300State = %p\n", d);
 
 d->timer = qemu_new_timer_ns(vm_clock, i6300esb_timer_expired, d);
 d->previous_reboot_flag = 0;
 
-io_mem = cpu_register_io_memory(mem_read, mem_write, d,
-DEVICE_NATIVE_ENDIAN);
-pci_register_bar_simple(&d->dev, 0, 0x10, 0, io_mem);
+memory_region_init_io(&d->io_mem, &i6300esb_ops, d, "i6300esb", 0x10);
+pci_register_bar_region(&d->dev, 0, 0, &d->io_mem);
 /* qemu_register_coalesced_mmio (addr, 0x10); ? */
 
 return 0;
 }
 
+static int i6300esb_exit(PCIDevice *dev)
+{
+I6300State *d = DO_UPCAST(I6300State, dev, dev);
+
+memory_region_destroy(&d->io_mem);
+
+return 0;
+}
+
 static WatchdogTimerModel model = {
 .wdt_name = "i6300esb",
 .wdt_description = "Intel 6300ESB",
@@ -419,6 +433,7 @@ static PCIDeviceInfo i6300esb_info = {
 .config_read  = i6300esb_config_read,
 .config_write = i6300esb_config_write,
 .init = i6300esb_init,
+.exit = i6300esb_exit,
 .vendor_id= PCI_VENDOR_ID_INTEL,
 .device_id= PCI_DEVICE_ID_INTEL_ESB_9,
 .class_id = PCI_CLASS_SYSTEM_OTHER,
-- 
1.7.5.3


[PATCH v4 19/39] ivshmem: convert to memory API

2011-08-08 Thread Avi Kivity

Excluding msix.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/ivshmem.c |  148 --
 1 files changed, 50 insertions(+), 98 deletions(-)

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 3055dd2..f80e7b6 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -56,11 +56,15 @@ typedef struct IVShmemState {
 
 CharDriverState **eventfd_chr;
 CharDriverState *server_chr;
-int ivshmem_mmio_io_addr;
+MemoryRegion ivshmem_mmio;
 
 pcibus_t mmio_addr;
-pcibus_t shm_pci_addr;
-uint64_t ivshmem_offset;
+/* We might need to register the BAR before we actually have the memory.
+ * So prepare a container MemoryRegion for the BAR immediately and
+ * add a subregion when we have the memory.
+ */
+MemoryRegion bar;
+MemoryRegion ivshmem;
 uint64_t ivshmem_size; /* size of shared memory region */
 int shm_fd; /* shared memory file descriptor */
 
@@ -96,23 +100,6 @@ static inline bool is_power_of_two(uint64_t x) {
 return (x & (x - 1)) == 0;
 }
 
-static void ivshmem_map(PCIDevice *pci_dev, int region_num,
-pcibus_t addr, pcibus_t size, int type)
-{
-IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev);
-
-s->shm_pci_addr = addr;
-
-if (s->ivshmem_offset > 0) {
-cpu_register_physical_memory(s->shm_pci_addr, s->ivshmem_size,
-s->ivshmem_offset);
-}
-
-IVSHMEM_DPRINTF("guest pci addr = %" FMT_PCIBUS ", guest h/w addr = %"
-PRIu64 ", size = %" FMT_PCIBUS "\n", addr, s->ivshmem_offset, size);
-
-}
-
 /* accessing registers - based on rtl8139 */
 static void ivshmem_update_irq(IVShmemState *s, int val)
 {
@@ -168,15 +155,8 @@ static uint32_t ivshmem_IntrStatus_read(IVShmemState *s)
 return ret;
 }
 
-static void ivshmem_io_writew(void *opaque, target_phys_addr_t addr,
-uint32_t val)
-{
-
-IVSHMEM_DPRINTF("We shouldn't be writing words\n");
-}
-
-static void ivshmem_io_writel(void *opaque, target_phys_addr_t addr,
-uint32_t val)
+static void ivshmem_io_write(void *opaque, target_phys_addr_t addr,
+ uint64_t val, unsigned size)
 {
 IVShmemState *s = opaque;
 
@@ -219,20 +199,8 @@ static void ivshmem_io_writel(void *opaque, target_phys_addr_t addr,
 }
 }
 
-static void ivshmem_io_writeb(void *opaque, target_phys_addr_t addr,
-uint32_t val)
-{
-IVSHMEM_DPRINTF("We shouldn't be writing bytes\n");
-}
-
-static uint32_t ivshmem_io_readw(void *opaque, target_phys_addr_t addr)
-{
-
-IVSHMEM_DPRINTF("We shouldn't be reading words\n");
-return 0;
-}
-
-static uint32_t ivshmem_io_readl(void *opaque, target_phys_addr_t addr)
+static uint64_t ivshmem_io_read(void *opaque, target_phys_addr_t addr,
+unsigned size)
 {
 
 IVShmemState *s = opaque;
@@ -265,23 +233,14 @@ static uint32_t ivshmem_io_readl(void *opaque, target_phys_addr_t addr)
 return ret;
 }
 
-static uint32_t ivshmem_io_readb(void *opaque, target_phys_addr_t addr)
-{
-IVSHMEM_DPRINTF("We shouldn't be reading bytes\n");
-
-return 0;
-}
-
-static CPUReadMemoryFunc * const ivshmem_mmio_read[3] = {
-ivshmem_io_readb,
-ivshmem_io_readw,
-ivshmem_io_readl,
-};
-
-static CPUWriteMemoryFunc * const ivshmem_mmio_write[3] = {
-ivshmem_io_writeb,
-ivshmem_io_writew,
-ivshmem_io_writel,
+static const MemoryRegionOps ivshmem_mmio_ops = {
+.read = ivshmem_io_read,
+.write = ivshmem_io_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.impl = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
 };
 
 static void ivshmem_receive(void *opaque, const uint8_t *buf, int size)
@@ -371,12 +330,12 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) {
 
 ptr = mmap(0, s->ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
 
-s->ivshmem_offset = qemu_ram_alloc_from_ptr(&s->dev.qdev, "ivshmem.bar2",
-s->ivshmem_size, ptr);
+memory_region_init_ram_ptr(&s->ivshmem, &s->dev.qdev, "ivshmem.bar2",
+   s->ivshmem_size, ptr);
+memory_region_add_subregion(&s->bar, 0, &s->ivshmem);
 
 /* region for shared memory */
-pci_register_bar(&s->dev, 2, s->ivshmem_size,
-PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_map);
+pci_register_bar_region(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar);
 }
 
 static void close_guest_eventfds(IVShmemState *s, int posn)
@@ -401,8 +360,12 @@ static void setup_ioeventfds(IVShmemState *s) {
 
 for (i = 0; i <= s->max_peer; i++) {
 for (j = 0; j < s->peers[i].nb_eventfds; j++) {
-

[PATCH v4 16/39] eepro100: convert to memory API

2011-08-08 Thread Avi Kivity

Note: the existing code aliases the flash BAR into the MMIO bar.  This is
probably a bug.  This patch does not correct the problem.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/eepro100.c |  182 -
 1 files changed, 37 insertions(+), 145 deletions(-)

diff --git a/hw/eepro100.c b/hw/eepro100.c
index 9b6f4a5..04723f3 100644
--- a/hw/eepro100.c
+++ b/hw/eepro100.c
@@ -228,13 +228,14 @@ typedef struct {
 PCIDevice dev;
 /* Hash register (multicast mask array, multiple individual addresses). */
 uint8_t mult[8];
-int mmio_index;
+MemoryRegion mmio_bar;
+MemoryRegion io_bar;
+MemoryRegion flash_bar;
 NICState *nic;
 NICConf conf;
 uint8_t scb_stat;   /* SCB stat/ack byte */
 uint8_t int_stat;   /* PCI interrupt status */
 /* region must not be saved by nic_save. */
-uint32_t region1;   /* PCI region 1 address */
 uint16_t mdimem[32];
 eeprom_t *eeprom;
 uint32_t device;/* device variant */
@@ -1584,147 +1585,36 @@ static void eepro100_write4(EEPRO100State * s, uint32_t addr, uint32_t val)
 }
 }
 
-/*
- *
- * Port mapped I/O.
- *
- /
-
-static uint32_t ioport_read1(void *opaque, uint32_t addr)
-{
-EEPRO100State *s = opaque;
-#if 0
-logout("addr=%s\n", regname(addr));
-#endif
-return eepro100_read1(s, addr - s->region1);
-}
-
-static uint32_t ioport_read2(void *opaque, uint32_t addr)
-{
-EEPRO100State *s = opaque;
-return eepro100_read2(s, addr - s->region1);
-}
-
-static uint32_t ioport_read4(void *opaque, uint32_t addr)
-{
-EEPRO100State *s = opaque;
-return eepro100_read4(s, addr - s->region1);
-}
-
-static void ioport_write1(void *opaque, uint32_t addr, uint32_t val)
-{
-EEPRO100State *s = opaque;
-#if 0
-logout("addr=%s val=0x%02x\n", regname(addr), val);
-#endif
-eepro100_write1(s, addr - s->region1, val);
-}
-
-static void ioport_write2(void *opaque, uint32_t addr, uint32_t val)
-{
-EEPRO100State *s = opaque;
-eepro100_write2(s, addr - s->region1, val);
-}
-
-static void ioport_write4(void *opaque, uint32_t addr, uint32_t val)
-{
-EEPRO100State *s = opaque;
-eepro100_write4(s, addr - s->region1, val);
-}
-
-/***/
-/* PCI EEPRO100 definitions */
-
-static void pci_map(PCIDevice * pci_dev, int region_num,
-pcibus_t addr, pcibus_t size, int type)
-{
-EEPRO100State *s = DO_UPCAST(EEPRO100State, dev, pci_dev);
-
-TRACE(OTHER, logout("region %d, addr=0x%08"FMT_PCIBUS", "
-  "size=0x%08"FMT_PCIBUS", type=%d\n",
-  region_num, addr, size, type));
-
-assert(region_num == 1);
-register_ioport_write(addr, size, 1, ioport_write1, s);
-register_ioport_read(addr, size, 1, ioport_read1, s);
-register_ioport_write(addr, size, 2, ioport_write2, s);
-register_ioport_read(addr, size, 2, ioport_read2, s);
-register_ioport_write(addr, size, 4, ioport_write4, s);
-register_ioport_read(addr, size, 4, ioport_read4, s);
-
-s->region1 = addr;
-}
-
-/*
- *
- * Memory mapped I/O.
- *
- /
-
-static void pci_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-EEPRO100State *s = opaque;
-#if 0
-logout("addr=%s val=0x%02x\n", regname(addr), val);
-#endif
-eepro100_write1(s, addr, val);
-}
-
-static void pci_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
+static uint64_t eepro100_read(void *opaque, target_phys_addr_t addr,
+  unsigned size)
 {
 EEPRO100State *s = opaque;
-#if 0
-logout("addr=%s val=0x%02x\n", regname(addr), val);
-#endif
-eepro100_write2(s, addr, val);
-}
 
-static void pci_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-EEPRO100State *s = opaque;
-#if 0
-logout("addr=%s val=0x%02x\n", regname(addr), val);
-#endif
-eepro100_write4(s, addr, val);
-}
-
-static uint32_t pci_mmio_readb(void *opaque, target_phys_addr_t addr)
-{
-EEPRO100State *s = opaque;
-#if 0
-logout("addr=%s\n", regname(addr));
-#endif
-return eepro100_read1(s, addr);
+switch (size) {
+case 1: return eepro100_read1(s, addr);
+case 2: return eepro100_read2(s, addr);
+case 4: return eepro100_read4(s, addr);
+default: abort();
+}
 }
 
-static uint32_t pci_mmio_readw(void *opaque, target_phys_addr_t addr)
+static void eepro100_write(void *opaque, target_phys_addr_t addr,
+   uint64_t data, unsigned size)
 {
 EEPRO100State *s = opaque;
-#if 0
-

[PATCH v4 17/39] es1370: convert to memory API

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/es1370.c |   43 +--
 1 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/hw/es1370.c b/hw/es1370.c
index 1ed62b7..4e43c4a 100644
--- a/hw/es1370.c
+++ b/hw/es1370.c
@@ -268,6 +268,7 @@ struct chan {
 typedef struct ES1370State {
 PCIDevice dev;
 QEMUSoundCard card;
+MemoryRegion io;
 struct chan chan[NB_CHANNELS];
 SWVoiceOut *dac_voice[2];
 SWVoiceIn *adc_voice;
@@ -775,7 +776,6 @@ IO_READ_PROTO (es1370_readl)
 return val;
 }
 
-
 static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
int max, int *irq)
 {
@@ -906,23 +906,20 @@ static void es1370_adc_callback (void *opaque, int avail)
 es1370_run_channel (s, ADC_CHANNEL, avail);
 }
 
-static void es1370_map (PCIDevice *pci_dev, int region_num,
-pcibus_t addr, pcibus_t size, int type)
-{
-ES1370State *s = DO_UPCAST (ES1370State, dev, pci_dev);
-
-(void) region_num;
-(void) size;
-(void) type;
-
-register_ioport_write (addr, 0x40 * 4, 1, es1370_writeb, s);
-register_ioport_write (addr, 0x40 * 2, 2, es1370_writew, s);
-register_ioport_write (addr, 0x40, 4, es1370_writel, s);
+static const MemoryRegionPortio es1370_portio[] = {
+{ 0, 0x40 * 4, 1, .write = es1370_writeb, },
+{ 0, 0x40 * 2, 2, .write = es1370_writew, },
+{ 0, 0x40, 4, .write = es1370_writel, },
+{ 0, 0x40 * 4, 1, .read = es1370_readb, },
+{ 0, 0x40 * 2, 2, .read = es1370_readw, },
+{ 0, 0x40, 4, .read = es1370_readl, },
+PORTIO_END_OF_LIST()
+};
 
-register_ioport_read (addr, 0x40 * 4, 1, es1370_readb, s);
-register_ioport_read (addr, 0x40 * 2, 2, es1370_readw, s);
-register_ioport_read (addr, 0x40, 4, es1370_readl, s);
-}
+static const MemoryRegionOps es1370_io_ops = {
+.old_portio = es1370_portio,
+.endianness = DEVICE_LITTLE_ENDIAN,
+};
 
 static const VMStateDescription vmstate_es1370_channel = {
 .name = "es1370_channel",
@@ -1011,7 +1008,8 @@ static int es1370_initfn (PCIDevice *dev)
 c[PCI_MIN_GNT] = 0x0c;
 c[PCI_MAX_LAT] = 0x80;
 
-pci_register_bar (&s->dev, 0, 256, PCI_BASE_ADDRESS_SPACE_IO, es1370_map);
+memory_region_init_io (&s->io, &es1370_io_ops, s, "es1370", 256);
+pci_register_bar_region (&s->dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->io);
 qemu_register_reset (es1370_on_reset, s);
 
 AUD_register_card ("es1370", &s->card);
@@ -1019,6 +1017,14 @@ static int es1370_initfn (PCIDevice *dev)
 return 0;
 }
 
+static int es1370_exitfn(PCIDevice *dev)
+{
+ES1370State *s = DO_UPCAST (ES1370State, dev, dev);
+
+memory_region_destroy (&s->io);
+return 0;
+}
+
 int es1370_init (PCIBus *bus)
 {
 pci_create_simple (bus, -1, "ES1370");
@@ -1031,6 +1037,7 @@ static PCIDeviceInfo es1370_info = {
 .qdev.size= sizeof (ES1370State),
 .qdev.vmsd= &vmstate_es1370,
 .init = es1370_initfn,
+.exit = es1370_exitfn,
 .vendor_id= PCI_VENDOR_ID_ENSONIQ,
 .device_id= PCI_DEVICE_ID_ENSONIQ_ES1370,
 .class_id = PCI_CLASS_MULTIMEDIA_AUDIO,
-- 
1.7.5.3


[PATCH v4 14/39] ac97: convert to memory API

2011-08-08 Thread Avi Kivity

Fixes BAR sizing as well.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/ac97.c |   89 +++-
 1 files changed, 52 insertions(+), 37 deletions(-)

diff --git a/hw/ac97.c b/hw/ac97.c
index 0b59896..52f0f0d 100644
--- a/hw/ac97.c
+++ b/hw/ac97.c
@@ -160,8 +160,9 @@ typedef struct AC97LinkState {
 SWVoiceIn *voice_mc;
 int invalid_freq[3];
 uint8_t silence[128];
-uint32_t base[2];
 int bup_flag;
+MemoryRegion io_nam;
+MemoryRegion io_nabm;
 } AC97LinkState;
 
 enum {
@@ -583,7 +584,7 @@ static uint32_t nam_readw (void *opaque, uint32_t addr)
 {
 AC97LinkState *s = opaque;
 uint32_t val = ~0U;
-uint32_t index = addr - s->base[0];
+uint32_t index = addr;
 s->cas = 0;
 val = mixer_load (s, index);
 return val;
@@ -611,7 +612,7 @@ static void nam_writeb (void *opaque, uint32_t addr, uint32_t val)
 static void nam_writew (void *opaque, uint32_t addr, uint32_t val)
 {
 AC97LinkState *s = opaque;
-uint32_t index = addr - s->base[0];
+uint32_t index = addr;
 s->cas = 0;
 switch (index) {
 case AC97_Reset:
@@ -714,7 +715,7 @@ static uint32_t nabm_readb (void *opaque, uint32_t addr)
 {
 AC97LinkState *s = opaque;
 AC97BusMasterRegs *r = NULL;
-uint32_t index = addr - s->base[1];
+uint32_t index = addr;
 uint32_t val = ~0U;
 
 switch (index) {
@@ -769,7 +770,7 @@ static uint32_t nabm_readw (void *opaque, uint32_t addr)
 {
 AC97LinkState *s = opaque;
 AC97BusMasterRegs *r = NULL;
-uint32_t index = addr - s->base[1];
+uint32_t index = addr;
 uint32_t val = ~0U;
 
 switch (index) {
@@ -798,7 +799,7 @@ static uint32_t nabm_readl (void *opaque, uint32_t addr)
 {
 AC97LinkState *s = opaque;
 AC97BusMasterRegs *r = NULL;
-uint32_t index = addr - s->base[1];
+uint32_t index = addr;
 uint32_t val = ~0U;
 
 switch (index) {
@@ -848,7 +849,7 @@ static void nabm_writeb (void *opaque, uint32_t addr, uint32_t val)
 {
 AC97LinkState *s = opaque;
 AC97BusMasterRegs *r = NULL;
-uint32_t index = addr - s->base[1];
+uint32_t index = addr;
 switch (index) {
 case PI_LVI:
 case PO_LVI:
@@ -904,7 +905,7 @@ static void nabm_writew (void *opaque, uint32_t addr, uint32_t val)
 {
 AC97LinkState *s = opaque;
 AC97BusMasterRegs *r = NULL;
-uint32_t index = addr - s->base[1];
+uint32_t index = addr;
 switch (index) {
 case PI_SR:
 case PO_SR:
@@ -924,7 +925,7 @@ static void nabm_writel (void *opaque, uint32_t addr, uint32_t val)
 {
 AC97LinkState *s = opaque;
 AC97BusMasterRegs *r = NULL;
-uint32_t index = addr - s->base[1];
+uint32_t index = addr;
 switch (index) {
 case PI_BDBAR:
 case PO_BDBAR:
@@ -1230,31 +1231,33 @@ static const VMStateDescription vmstate_ac97 = {
 }
 };
 
-static void ac97_map (PCIDevice *pci_dev, int region_num,
-  pcibus_t addr, pcibus_t size, int type)
-{
-AC97LinkState *s = DO_UPCAST (AC97LinkState, dev, pci_dev);
-PCIDevice *d = &s->dev;
-
-if (!region_num) {
-s->base[0] = addr;
-register_ioport_read (addr, 256 * 1, 1, nam_readb, d);
-register_ioport_read (addr, 256 * 2, 2, nam_readw, d);
-register_ioport_read (addr, 256 * 4, 4, nam_readl, d);
-register_ioport_write (addr, 256 * 1, 1, nam_writeb, d);
-register_ioport_write (addr, 256 * 2, 2, nam_writew, d);
-register_ioport_write (addr, 256 * 4, 4, nam_writel, d);
-}
-else {
-s->base[1] = addr;
-register_ioport_read (addr, 64 * 1, 1, nabm_readb, d);
-register_ioport_read (addr, 64 * 2, 2, nabm_readw, d);
-register_ioport_read (addr, 64 * 4, 4, nabm_readl, d);
-register_ioport_write (addr, 64 * 1, 1, nabm_writeb, d);
-register_ioport_write (addr, 64 * 2, 2, nabm_writew, d);
-register_ioport_write (addr, 64 * 4, 4, nabm_writel, d);
-}
-}
+static const MemoryRegionPortio nam_portio[] = {
+{ 0, 256 * 1, 1, .read = nam_readb, },
+{ 0, 256 * 2, 2, .read = nam_readw, },
+{ 0, 256 * 4, 4, .read = nam_readl, },
+{ 0, 256 * 1, 1, .write = nam_writeb, },
+{ 0, 256 * 2, 2, .write = nam_writew, },
+{ 0, 256 * 4, 4, .write = nam_writel, },
+PORTIO_END_OF_LIST(),
+};
+
+static const MemoryRegionOps ac97_io_nam_ops = {
+.old_portio = nam_portio,
+};
+
+static const MemoryRegionPortio nabm_portio[] = {
+{ 0, 64 * 1, 1, .read = nabm_readb, },
+{ 0, 64 * 2, 2, .read = nabm_readw, },
+{ 0, 64 * 4, 4, .read = nabm_readl, },
+{ 0, 64 * 1, 1, .write = nabm_writeb, },
+{ 0, 64 * 2, 2, .write = nabm_writew, },
+{ 0, 64 * 4, 4, .write = nabm_writel, },
+PORTIO_END_OF_LIST()
+};
+
+static const MemoryRegionOps ac97_io_nabm_ops = {
+.old_portio = nabm_portio,
+};
 
 static void ac97_on_reset (void

[PATCH v4 06/39] cirrus: simplify bitblt BAR access functions

2011-08-08 Thread Avi Kivity

Make use of the memory API's ability to satisfy multi-byte accesses via
multiple single-byte accesses.
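
As a rough illustration of that fallback (a standalone sketch, not QEMU code; `byte_read_fn`, `read_via_bytes` and `demo_read_byte` are hypothetical names invented here), a wider little-endian read can be assembled from byte-wide reads in the same way the removed readw/readl wrappers below do by hand:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-wide read handler; this is NOT QEMU's callback type. */
typedef uint8_t (*byte_read_fn)(void *opaque, uint64_t addr);

/*
 * Assemble a little-endian value of `size` bytes (1, 2, 4 or 8) from
 * single-byte reads, lowest address first.
 */
static uint64_t read_via_bytes(byte_read_fn rd, void *opaque,
                               uint64_t addr, unsigned size)
{
    uint64_t v = 0;
    unsigned i;

    assert(size == 1 || size == 2 || size == 4 || size == 8);
    for (i = 0; i < size; i++) {
        /* Byte i lands in bits [8*i, 8*i + 7] of the result. */
        v |= (uint64_t)rd(opaque, addr + i) << (8 * i);
    }
    return v;
}

/* Toy backing store for the demo: `opaque` points at a plain byte array. */
static uint8_t demo_read_byte(void *opaque, uint64_t addr)
{
    const uint8_t *mem = opaque;

    return mem[addr];
}
```

In the patch itself, the equivalent behaviour is requested by setting `.impl.min_access_size` and `.impl.max_access_size` to 1, so only the byte-wide handler has to be kept.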

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/cirrus_vga.c |   81 +--
 1 files changed, 13 insertions(+), 68 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 4f57b92..c39acb9 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2446,37 +2446,23 @@ static void cirrus_linear_write(void *opaque, target_phys_addr_t addr,
  ***/
 
 
-static uint32_t cirrus_linear_bitblt_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t cirrus_linear_bitblt_read(void *opaque,
+  target_phys_addr_t addr,
+  unsigned size)
 {
+CirrusVGAState *s = opaque;
 uint32_t ret;
 
 /* XXX handle bitblt */
+(void)s;
 ret = 0xff;
 return ret;
 }
 
-static uint32_t cirrus_linear_bitblt_readw(void *opaque, target_phys_addr_t addr)
-{
-uint32_t v;
-
-v = cirrus_linear_bitblt_readb(opaque, addr);
-v |= cirrus_linear_bitblt_readb(opaque, addr + 1) << 8;
-return v;
-}
-
-static uint32_t cirrus_linear_bitblt_readl(void *opaque, target_phys_addr_t addr)
-{
-uint32_t v;
-
-v = cirrus_linear_bitblt_readb(opaque, addr);
-v |= cirrus_linear_bitblt_readb(opaque, addr + 1) << 8;
-v |= cirrus_linear_bitblt_readb(opaque, addr + 2) << 16;
-v |= cirrus_linear_bitblt_readb(opaque, addr + 3) << 24;
-return v;
-}
-
-static void cirrus_linear_bitblt_writeb(void *opaque, target_phys_addr_t addr,
-uint32_t val)
+static void cirrus_linear_bitblt_write(void *opaque,
+   target_phys_addr_t addr,
+   uint64_t val,
+   unsigned size)
 {
 CirrusVGAState *s = opaque;
 
@@ -2489,55 +2475,14 @@ static void cirrus_linear_bitblt_writeb(void *opaque, target_phys_addr_t addr,
 }
 }
 
-static void cirrus_linear_bitblt_writew(void *opaque, target_phys_addr_t addr,
-uint32_t val)
-{
-cirrus_linear_bitblt_writeb(opaque, addr, val & 0xff);
-cirrus_linear_bitblt_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-}
-
-static void cirrus_linear_bitblt_writel(void *opaque, target_phys_addr_t addr,
-uint32_t val)
-{
-cirrus_linear_bitblt_writeb(opaque, addr, val & 0xff);
-cirrus_linear_bitblt_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-cirrus_linear_bitblt_writeb(opaque, addr + 2, (val >> 16) & 0xff);
-cirrus_linear_bitblt_writeb(opaque, addr + 3, (val >> 24) & 0xff);
-}
-
-static uint64_t cirrus_linear_bitblt_read(void *opaque,
-  target_phys_addr_t addr,
-  unsigned size)
-{
-CirrusVGAState *s = opaque;
-
-switch (size) {
-case 1: return cirrus_linear_bitblt_readb(s, addr);
-case 2: return cirrus_linear_bitblt_readw(s, addr);
-case 4: return cirrus_linear_bitblt_readl(s, addr);
-default: abort();
-}
-};
-
-static void cirrus_linear_bitblt_write(void *opaque,
-   target_phys_addr_t addr,
-   uint64_t data,
-   unsigned size)
-{
-CirrusVGAState *s = opaque;
-
-switch (size) {
-case 1: return cirrus_linear_bitblt_writeb(s, addr, data);
-case 2: return cirrus_linear_bitblt_writew(s, addr, data);
-case 4: return cirrus_linear_bitblt_writel(s, addr, data);
-default: abort();
-}
-};
-
 static const MemoryRegionOps cirrus_linear_bitblt_io_ops = {
 .read = cirrus_linear_bitblt_read,
 .write = cirrus_linear_bitblt_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
 };
 
 static void unmap_bank(CirrusVGAState *s, unsigned bank)
-- 
1.7.5.3


[PATCH v4 04/39] vga: convert vga and its derivatives to the memory API

2011-08-08 Thread Avi Kivity

Convert all vga memory to the memory API.  Note we need to fall back to
get_system_memory(), since the various buses don't pass the vga window
as a memory region.

We no longer need to sync the dirty bitmap of the cirrus mapped memory
banks, since the memory API takes care of that for us.

[jan: fix vga-pci logging]

Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/cirrus_vga.c |  342 ---
 hw/qxl-render.c |2 +-
 hw/qxl.c|  135 --
 hw/qxl.h|6 +-
 hw/vga-isa-mm.c |   46 +---
 hw/vga-isa.c|   10 +-
 hw/vga-pci.c|   28 +
 hw/vga.c|  146 +++-
 hw/vga_int.h|   14 +--
 hw/vmware_vga.c |  143 ---
 10 files changed, 437 insertions(+), 435 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index f39d1f8..ad23c4a 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -32,6 +32,7 @@
 #include "console.h"
 #include "vga_int.h"
 #include "loader.h"
+#include "exec-memory.h"
 
 /*
  * TODO:
@@ -200,9 +201,14 @@ typedef void (*cirrus_fill_t)(struct CirrusVGAState *s,
 typedef struct CirrusVGAState {
 VGACommonState vga;
 
-int cirrus_linear_io_addr;
-int cirrus_linear_bitblt_io_addr;
-int cirrus_mmio_io_addr;
+MemoryRegion cirrus_linear_io;
+MemoryRegion cirrus_linear_bitblt_io;
+MemoryRegion cirrus_mmio_io;
+MemoryRegion pci_bar;
+bool linear_vram;  /* vga.vram mapped over cirrus_linear_io */
+MemoryRegion low_mem_container; /* container for 0xa0000-0xc0000 */
+MemoryRegion low_mem;   /* always mapped, overridden by: */
+MemoryRegion *cirrus_bank[2];   /*   aliases at 0xa0000-0xb0000  */
 uint32_t cirrus_addr_mask;
 uint32_t linear_mmio_mask;
 uint8_t cirrus_shadow_gr0;
@@ -612,7 +618,7 @@ static void cirrus_invalidate_region(CirrusVGAState * s, int off_begin,
 off_cur_end = (off_cur + bytesperline) & s->cirrus_addr_mask;
 off_cur &= TARGET_PAGE_MASK;
 while (off_cur < off_cur_end) {
-   cpu_physical_memory_set_dirty(s->vga.vram_offset + off_cur);
+   memory_region_set_dirty(&s->vga.vram, off_cur);
off_cur += TARGET_PAGE_SIZE;
}
off_begin += off_pitch;
@@ -1177,12 +1183,6 @@ static void cirrus_update_bank_ptr(CirrusVGAState * s, unsigned bank_index)
 }
 
 if (limit > 0) {
-/* Thinking about changing bank base? First, drop the dirty bitmap information
- * on the current location, otherwise we lose this pointer forever */
-if (s->vga.lfb_vram_mapped) {
-target_phys_addr_t base_addr = isa_mem_base + 0xa0000 + bank_index * 0x8000;
-cpu_physical_sync_dirty_bitmap(base_addr, base_addr + 0x8000);
-}
 s->cirrus_bank_base[bank_index] = offset;
 s->cirrus_bank_limit[bank_index] = limit;
 } else {
@@ -1921,8 +1921,8 @@ static void cirrus_mem_writeb_mode4and5_8bpp(CirrusVGAState * s,
 val <<= 1;
 dst++;
 }
-cpu_physical_memory_set_dirty(s->vga.vram_offset + offset);
-cpu_physical_memory_set_dirty(s->vga.vram_offset + offset + 7);
+memory_region_set_dirty(&s->vga.vram, offset);
+memory_region_set_dirty(&s->vga.vram, offset + 7);
 }
 
 static void cirrus_mem_writeb_mode4and5_16bpp(CirrusVGAState * s,
@@ -1946,8 +1946,8 @@ static void cirrus_mem_writeb_mode4and5_16bpp(CirrusVGAState * s,
 val <<= 1;
 dst += 2;
 }
-cpu_physical_memory_set_dirty(s->vga.vram_offset + offset);
-cpu_physical_memory_set_dirty(s->vga.vram_offset + offset + 15);
+memory_region_set_dirty(&s->vga.vram, offset);
+memory_region_set_dirty(&s->vga.vram, offset + 15);
 }
 
 /***
@@ -2057,8 +2057,7 @@ static void cirrus_vga_mem_writeb(void *opaque, target_phys_addr_t addr,
 mode = s->vga.gr[0x05] & 0x7;
 if (mode < 4 || mode > 5 || ((s->vga.gr[0x0B] & 0x4) == 0)) {
 *(s->vga.vram_ptr + bank_offset) = mem_value;
-   cpu_physical_memory_set_dirty(s->vga.vram_offset +
- bank_offset);
+   memory_region_set_dirty(&s->vga.vram, bank_offset);
 } else {
 if ((s->vga.gr[0x0B] & 0x14) != 0x14) {
 cirrus_mem_writeb_mode4and5_8bpp(s, mode,
@@ -2099,16 +2098,37 @@ static void cirrus_vga_mem_writel(void *opaque, target_phys_addr_t addr, uint32_
 cirrus_vga_mem_writeb(opaque, addr + 3, (val >> 24) & 0xff);
 }
 
-static CPUReadMemoryFunc * const cirrus_vga_mem_read[3] = {
-cirrus_vga_mem_readb,
-cirrus_vga_mem_readw,
-cirrus_vga_mem_readl,
+static uint64_t cirrus_vga_mem_read(void *opaque,
+target_phys_addr_t addr,
+uint32_t size)
+{
+CirrusVGAState *s = opaque;
+
+switch (size) {
+case 1: return

[PATCH v4 09/39] cirrus: simplify linear framebuffer access functions

2011-08-08 Thread Avi Kivity

Make use of the memory API's ability to satisfy multi-byte accesses via
multiple single-byte accesses.
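
The write direction works the same way. The sketch below is standalone and assumes little-endian byte order, as declared by `DEVICE_LITTLE_ENDIAN` in this patch; `byte_write_fn`, `write_via_bytes` and `demo_write_byte` are hypothetical names, not QEMU API. It mirrors what the removed writew/writel wrappers do explicitly:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-wide write handler; this is NOT QEMU's callback type. */
typedef void (*byte_write_fn)(void *opaque, uint64_t addr, uint8_t val);

/*
 * Split a little-endian write of `size` bytes (1, 2, 4 or 8) into
 * single-byte writes, lowest address first.
 */
static void write_via_bytes(byte_write_fn wr, void *opaque,
                            uint64_t addr, uint64_t val, unsigned size)
{
    unsigned i;

    assert(size == 1 || size == 2 || size == 4 || size == 8);
    for (i = 0; i < size; i++) {
        /* Byte i of the value goes to address addr + i. */
        wr(opaque, addr + i, (uint8_t)((val >> (8 * i)) & 0xff));
    }
}

/* Toy backing store for the demo: `opaque` points at a plain byte array. */
static void demo_write_byte(void *opaque, uint64_t addr, uint8_t val)
{
    uint8_t *mem = opaque;

    mem[addr] = val;
}
```

Because the splitting is generic, a device that only implements the byte case gets 16-bit and 32-bit accesses without any per-device wrapper code.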

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/cirrus_vga.c |   74 ++-
 1 files changed, 8 insertions(+), 66 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 2a9bd25..c9887ac 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2250,7 +2250,8 @@ static void cirrus_cursor_draw_line(VGACommonState *s1, uint8_t *d1, int scr_y)
  *
  ***/
 
-static uint32_t cirrus_linear_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t cirrus_linear_read(void *opaque, target_phys_addr_t addr,
+   unsigned size)
 {
 CirrusVGAState *s = opaque;
 uint32_t ret;
@@ -2278,28 +2279,8 @@ static uint32_t cirrus_linear_readb(void *opaque, target_phys_addr_t addr)
 return ret;
 }
 
-static uint32_t cirrus_linear_readw(void *opaque, target_phys_addr_t addr)
-{
-uint32_t v;
-
-v = cirrus_linear_readb(opaque, addr);
-v |= cirrus_linear_readb(opaque, addr + 1) << 8;
-return v;
-}
-
-static uint32_t cirrus_linear_readl(void *opaque, target_phys_addr_t addr)
-{
-uint32_t v;
-
-v = cirrus_linear_readb(opaque, addr);
-v |= cirrus_linear_readb(opaque, addr + 1) << 8;
-v |= cirrus_linear_readb(opaque, addr + 2) << 16;
-v |= cirrus_linear_readb(opaque, addr + 3) << 24;
-return v;
-}
-
-static void cirrus_linear_writeb(void *opaque, target_phys_addr_t addr,
-uint32_t val)
+static void cirrus_linear_write(void *opaque, target_phys_addr_t addr,
+uint64_t val, unsigned size)
 {
 CirrusVGAState *s = opaque;
 unsigned mode;
@@ -2339,49 +2320,6 @@ static void cirrus_linear_writeb(void *opaque, target_phys_addr_t addr,
 }
 }
 
-static void cirrus_linear_writew(void *opaque, target_phys_addr_t addr,
-uint32_t val)
-{
-cirrus_linear_writeb(opaque, addr, val & 0xff);
-cirrus_linear_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-}
-
-static void cirrus_linear_writel(void *opaque, target_phys_addr_t addr,
-uint32_t val)
-{
-cirrus_linear_writeb(opaque, addr, val & 0xff);
-cirrus_linear_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-cirrus_linear_writeb(opaque, addr + 2, (val >> 16) & 0xff);
-cirrus_linear_writeb(opaque, addr + 3, (val >> 24) & 0xff);
-}
-
-
-static uint64_t cirrus_linear_read(void *opaque, target_phys_addr_t addr,
-   unsigned size)
-{
-CirrusVGAState *s = opaque;
-
-switch (size) {
-case 1: return cirrus_linear_readb(s, addr);
-case 2: return cirrus_linear_readw(s, addr);
-case 4: return cirrus_linear_readl(s, addr);
-default: abort();
-}
-}
-
-static void cirrus_linear_write(void *opaque, target_phys_addr_t addr,
-uint64_t data, unsigned size)
-{
-CirrusVGAState *s = opaque;
-
-switch (size) {
-case 1: return cirrus_linear_writeb(s, addr, data);
-case 2: return cirrus_linear_writew(s, addr, data);
-case 4: return cirrus_linear_writel(s, addr, data);
-default: abort();
-}
-}
-
 /***
  *
  *  system to screen memory access
@@ -2859,6 +2797,10 @@ static const MemoryRegionOps cirrus_linear_io_ops = {
 .read = cirrus_linear_read,
 .write = cirrus_linear_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
 };
 
 static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci)
-- 
1.7.5.3


[PATCH v4 08/39] vga: simplify vga window mmio access functions

2011-08-08 Thread Avi Kivity

Make use of the memory API's ability to satisfy multi-byte accesses via
multiple single-byte accesses.

We have to keep vga_mem_{read,write}b() since they're used by cirrus.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/cirrus_vga.c |4 +-
 hw/vga.c|   56 +++---
 hw/vga_int.h|4 +-
 3 files changed, 12 insertions(+), 52 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 5ded1ff..2a9bd25 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -1966,7 +1966,7 @@ static uint64_t cirrus_vga_mem_read(void *opaque,
 uint32_t val;
 
     if ((s->vga.sr[0x07] & 0x01) == 0) {
-       return vga_mem_readb(s, addr);
+        return vga_mem_readb(&s->vga, addr);
 }
 
 if (addr  0x1) {
@@ -2011,7 +2011,7 @@ static void cirrus_vga_mem_write(void *opaque,
 unsigned mode;
 
     if ((s->vga.sr[0x07] & 0x01) == 0) {
-       vga_mem_writeb(s, addr, mem_value);
+        vga_mem_writeb(&s->vga, addr, mem_value);
 return;
 }
 
diff --git a/hw/vga.c b/hw/vga.c
index 8b6e6b6..33dc478 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -708,9 +708,8 @@ static void vbe_ioport_write_data(void *opaque, uint32_t 
addr, uint32_t val)
 #endif
 
 /* called for accesses between 0xa and 0xc */
-uint32_t vga_mem_readb(void *opaque, target_phys_addr_t addr)
+uint32_t vga_mem_readb(VGACommonState *s, target_phys_addr_t addr)
 {
-VGACommonState *s = opaque;
 int memory_map_mode, plane;
 uint32_t ret;
 
@@ -764,28 +763,9 @@ uint32_t vga_mem_readb(void *opaque, target_phys_addr_t 
addr)
 return ret;
 }
 
-static uint32_t vga_mem_readw(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-    v = vga_mem_readb(opaque, addr);
-    v |= vga_mem_readb(opaque, addr + 1) << 8;
-    return v;
-}
-
-static uint32_t vga_mem_readl(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-    v = vga_mem_readb(opaque, addr);
-    v |= vga_mem_readb(opaque, addr + 1) << 8;
-    v |= vga_mem_readb(opaque, addr + 2) << 16;
-    v |= vga_mem_readb(opaque, addr + 3) << 24;
-    return v;
-}
-
 /* called for accesses between 0xa and 0xc */
-void vga_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+void vga_mem_writeb(VGACommonState *s, target_phys_addr_t addr, uint32_t val)
 {
-VGACommonState *s = opaque;
 int memory_map_mode, plane, write_mode, b, func_select, mask;
 uint32_t write_mask, bit_mask, set_mask;
 
@@ -917,20 +897,6 @@ void vga_mem_writeb(void *opaque, target_phys_addr_t addr, 
uint32_t val)
 }
 }
 
-static void vga_mem_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-    vga_mem_writeb(opaque, addr, val & 0xff);
-    vga_mem_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-}
-
-static void vga_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-    vga_mem_writeb(opaque, addr, val & 0xff);
-    vga_mem_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-    vga_mem_writeb(opaque, addr + 2, (val >> 16) & 0xff);
-    vga_mem_writeb(opaque, addr + 3, (val >> 24) & 0xff);
-}
-
 typedef void vga_draw_glyph8_func(uint8_t *d, int linesize,
  const uint8_t *font_ptr, int h,
  uint32_t fgcol, uint32_t bgcol);
@@ -2105,12 +2071,7 @@ static uint64_t vga_mem_read(void *opaque, 
target_phys_addr_t addr,
 {
 VGACommonState *s = opaque;
 
-switch (size) {
-case 1: return vga_mem_readb(s, addr);
-case 2: return vga_mem_readw(s, addr);
-case 4: return vga_mem_readl(s, addr);
-default: abort();
-}
+return vga_mem_readb(s, addr);
 }
 
 static void vga_mem_write(void *opaque, target_phys_addr_t addr,
@@ -2118,18 +2079,17 @@ static void vga_mem_write(void *opaque, 
target_phys_addr_t addr,
 {
 VGACommonState *s = opaque;
 
-switch (size) {
-case 1: return vga_mem_writeb(s, addr, data);
-case 2: return vga_mem_writew(s, addr, data);
-case 4: return vga_mem_writel(s, addr, data);
-default: abort();
-}
+return vga_mem_writeb(s, addr, data);
 }
 
 const MemoryRegionOps vga_mem_ops = {
 .read = vga_mem_read,
 .write = vga_mem_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
 };
 
 static int vga_common_post_load(void *opaque, int version_id)
diff --git a/hw/vga_int.h b/hw/vga_int.h
index 4592d2c..100d98c 100644
--- a/hw/vga_int.h
+++ b/hw/vga_int.h
@@ -198,8 +198,8 @@ void vga_dirty_log_restart(VGACommonState *s);
 extern const VMStateDescription vmstate_vga_common;
 uint32_t vga_ioport_read(void *opaque, uint32_t addr);
 void vga_ioport_write(void *opaque, uint32_t addr, uint32_t val);
-uint32_t vga_mem_readb(void *opaque, target_phys_addr_t addr);
-void vga_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val);
+uint32_t vga_mem_readb(VGACommonState *s, target_phys_addr_t addr);
+void

[PATCH v4 12/39] pci: allow I/O BARs to be registered with pci_register_bar_region()

2011-08-08 Thread Avi Kivity

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c   |   43 +++
 hw/pci.h   |1 +
 hw/pci_internals.h |3 ++-
 3 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 0857644..c00cbf8 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -271,7 +271,8 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
     qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
     assert(PCI_FUNC(devfn_min) == 0);
     bus->devfn_min = devfn_min;
-    bus->address_space = address_space_mem;
+    bus->address_space_mem = address_space_mem;
+    bus->address_space_io = address_space_io;
 
     /* host bridge */
     QLIST_INIT(&bus->child);
@@ -847,12 +848,11 @@ static void pci_unregister_io_regions(PCIDevice *pci_dev)
         r = &pci_dev->io_regions[i];
         if (!r->size || r->addr == PCI_BAR_UNMAPPED)
             continue;
-        if (r->type == PCI_BASE_ADDRESS_SPACE_IO) {
-            isa_unassign_ioport(r->addr, r->filtered_size);
+        if (r->memory) {
+            memory_region_del_subregion(r->address_space, r->memory);
         } else {
-            if (r->memory) {
-                memory_region_del_subregion(pci_dev->bus->address_space,
-                                            r->memory);
+            if (r->type == PCI_BASE_ADDRESS_SPACE_IO) {
+                isa_unassign_ioport(r->addr, r->filtered_size);
             } else {
                 cpu_register_physical_memory(pci_to_cpu_addr(pci_dev->bus,
                                                              r->addr),
@@ -934,9 +934,11 @@ static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
                                           pcibus_t addr, pcibus_t size,
                                           int type)
 {
-    memory_region_add_subregion_overlap(pci_dev->bus->address_space,
+    PCIIORegion *r = &pci_dev->io_regions[region_num];
+
+    memory_region_add_subregion_overlap(r->address_space,
                                         addr,
-                                        pci_dev->io_regions[region_num].memory,
+                                        r->memory,
                                         1);
 }
 
@@ -953,9 +955,13 @@ void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
                              uint8_t attr, MemoryRegion *memory)
 {
     pci_register_bar(pci_dev, region_num, memory_region_size(memory),
-                     PCI_BASE_ADDRESS_SPACE_MEMORY | attr,
+                     attr,
                      pci_simple_bar_mapfunc_region);
     pci_dev->io_regions[region_num].memory = memory;
+    pci_dev->io_regions[region_num].address_space
+        = attr & PCI_BASE_ADDRESS_SPACE_IO
+        ? pci_dev->bus->address_space_io
+        : pci_dev->bus->address_space_mem;
 }
 
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
@@ -1090,7 +1096,9 @@ static void pci_update_mappings(PCIDevice *d)
 
         /* now do the real mapping */
         if (r->addr != PCI_BAR_UNMAPPED) {
-            if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
+            if (r->memory) {
+                memory_region_del_subregion(r->address_space, r->memory);
+            } else if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
                 int class;
                 /* NOTE: specific hack for IDE in PC case:
                    only one byte must be mapped. */
@@ -1101,16 +1109,11 @@ static void pci_update_mappings(PCIDevice *d)
                     isa_unassign_ioport(r->addr, r->filtered_size);
                 }
             } else {
-                if (r->memory) {
-                    memory_region_del_subregion(d->bus->address_space,
-                                                r->memory);
-                } else {
-                    cpu_register_physical_memory(pci_to_cpu_addr(d->bus,
-                                                                 r->addr),
-                                                 r->filtered_size,
-                                                 IO_MEM_UNASSIGNED);
-                    qemu_unregister_coalesced_mmio(r->addr, r->filtered_size);
-                }
+                cpu_register_physical_memory(pci_to_cpu_addr(d->bus,
+                                                             r->addr),
+                                             r->filtered_size,
+                                             IO_MEM_UNASSIGNED);
+                qemu_unregister_coalesced_mmio(r->addr, r->filtered_size);
             }
         }
         r->addr = new_addr;
diff --git a/hw/pci.h b/hw/pci.h
index 45b30fa..928e96c 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -95,6 +95,7 @@ typedef struct PCIIORegion {
 PCIMapIORegionFunc *map_func;
 ram_addr_t ram_addr;
 MemoryRegion *memory;
+MemoryRegion *address_space;
 } PCIIORegion;
 
 #define PCI_ROM_SLOT 6
diff --git a/hw/pci_internals.h

[PATCH v4 11/39] pci: pass I/O address space to new PCI bus

2011-08-08 Thread Avi Kivity

This lets us register BARs in the I/O address space.
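Once the bus holds both address spaces, a BAR's type bits are enough to choose between them. The sketch below shows that selection in isolation. The struct and function names (demo_region, demo_bus, demo_bar_space) are hypothetical stand-ins rather than QEMU types; only the PCI_BASE_ADDRESS_* values follow the standard PCI register layout, where bit 0 of a BAR marks an I/O BAR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI BAR type bits. */
#define PCI_BASE_ADDRESS_SPACE_IO     0x01
#define PCI_BASE_ADDRESS_SPACE_MEMORY 0x00
#define PCI_BASE_ADDRESS_MEM_PREFETCH 0x08

/* Hypothetical stand-ins for MemoryRegion and PCIBus. */
struct demo_region {
    const char *name;
};

struct demo_bus {
    struct demo_region *address_space_mem;
    struct demo_region *address_space_io;
};

/* Pick the address space a BAR should be mapped into from its type bits. */
static struct demo_region *demo_bar_space(struct demo_bus *bus, uint8_t attr)
{
    assert(bus != NULL);
    return (attr & PCI_BASE_ADDRESS_SPACE_IO) ? bus->address_space_io
                                              : bus->address_space_mem;
}
```

Other attribute bits, such as the prefetchable flag on a memory BAR, do not affect the choice, so a prefetchable BAR still lands in the memory address space.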

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/apb_pci.c   |1 +
 hw/bonito.c|1 +
 hw/grackle_pci.c   |8 ++--
 hw/gt64xxx.c   |4 +++-
 hw/pc.h|4 +++-
 hw/pc_piix.c   |6 +-
 hw/pci.c   |   18 --
 hw/pci.h   |   10 +++---
 hw/piix_pci.c  |   14 +-
 hw/ppc4xx_pci.c|1 +
 hw/ppc_mac.h   |   11 ---
 hw/ppc_newworld.c  |4 ++--
 hw/ppc_oldworld.c  |4 +++-
 hw/ppc_prep.c  |2 +-
 hw/ppce500_pci.c   |7 ---
 hw/prep_pci.c  |8 ++--
 hw/prep_pci.h  |4 +++-
 hw/sh_pci.c|4 +++-
 hw/unin_pci.c  |   16 
 hw/versatile_pci.c |2 +-
 20 files changed, 91 insertions(+), 38 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index 8b9939c..1638226 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -348,6 +348,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
     d->bus = pci_register_bus(&d->busdev.qdev, "pci",
                               pci_apb_set_irq, pci_pbm_map_irq, d,
                               get_system_memory(),
+                              get_system_io(),
                               0, 32);
     pci_bus_set_mem_base(d->bus, mem_base);
 
diff --git a/hw/bonito.c b/hw/bonito.c
index 5f62dda..8708e95 100644
--- a/hw/bonito.c
+++ b/hw/bonito.c
@@ -775,6 +775,7 @@ PCIBus *bonito_init(qemu_irq *pic)
     pcihost = FROM_SYSBUS(BonitoState, sysbus_from_qdev(dev));
     b = pci_register_bus(&pcihost->busdev.qdev, "pci", pci_bonito_set_irq,
                          pci_bonito_map_irq, pic, get_system_memory(),
+                         get_system_io(),
                          0x28, 32);
     pcihost->bus = b;
     qdev_init_nofail(dev);
diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c
index da67cf9..9a823e1 100644
--- a/hw/grackle_pci.c
+++ b/hw/grackle_pci.c
@@ -62,7 +62,8 @@ static void pci_grackle_reset(void *opaque)
 }
 
 PCIBus *pci_grackle_init(uint32_t base, qemu_irq *pic,
- MemoryRegion *address_space)
+ MemoryRegion *address_space_mem,
+ MemoryRegion *address_space_io)
 {
 DeviceState *dev;
 SysBusDevice *s;
@@ -75,7 +76,10 @@ PCIBus *pci_grackle_init(uint32_t base, qemu_irq *pic,
     d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci",
                                          pci_grackle_set_irq,
                                          pci_grackle_map_irq,
-                                         pic, address_space, 0, 4);
+                                         pic,
+                                         address_space_mem,
+                                         address_space_io,
+                                         0, 4);
 
     pci_create_simple(d->host_state.bus, 0, "grackle");
 
diff --git a/hw/gt64xxx.c b/hw/gt64xxx.c
index 65e63dd..d541558 100644
--- a/hw/gt64xxx.c
+++ b/hw/gt64xxx.c
@@ -1093,7 +1093,9 @@ PCIBus *gt64120_register(qemu_irq *pic)
     d = FROM_SYSBUS(GT64120State, s);
     d->pci.bus = pci_register_bus(&d->busdev.qdev, "pci",
                                   gt64120_pci_set_irq, gt64120_pci_map_irq,
-                                  pic, get_system_memory(),
+                                  pic,
+                                  get_system_memory(),
+                                  get_system_io(),
                                   PCI_DEVFN(18, 0), 4);
     d->ISD_handle = cpu_register_io_memory(gt64120_read, gt64120_write, d,
                                            DEVICE_NATIVE_ENDIAN);
diff --git a/hw/pc.h b/hw/pc.h
index a2de0fe..ec34db7 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -179,7 +179,9 @@ struct PCII440FXState;
 typedef struct PCII440FXState PCII440FXState;
 
 PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn,
-qemu_irq *pic, MemoryRegion *address_space,
+qemu_irq *pic,
+MemoryRegion *address_space_mem,
+MemoryRegion *address_space_io,
 ram_addr_t ram_size);
 void i440fx_init_memory_mappings(PCII440FXState *d);
 
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index c0a2abe..7dd5008 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -69,6 +69,7 @@ static void ioapic_init(IsaIrqState *isa_irq_state)
 
 /* PC hardware initialisation */
 static void pc_init1(MemoryRegion *system_memory,
+ MemoryRegion *system_io,
  ram_addr_t ram_size,
  const char *boot_device,
  const char *kernel_filename,
@@ -129,7 +130,7 @@ static void pc_init1(MemoryRegion *system_memory,
 
 if (pci_enabled) {
 pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_irq,
-

[PATCH v4 05/39] cirrus: simplify mmio BAR access functions

2011-08-08 Thread Avi Kivity

Make use of the memory API's ability to satisfy multi-byte accesses via
multiple single-byte accesses.

Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/cirrus_vga.c |   78 +-
 1 files changed, 8 insertions(+), 70 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index ad23c4a..4f57b92 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2827,12 +2827,11 @@ static void cirrus_vga_ioport_write(void *opaque, 
uint32_t addr, uint32_t val)
  *
  ***/
 
-static uint32_t cirrus_mmio_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t cirrus_mmio_read(void *opaque, target_phys_addr_t addr,
+                                 unsigned size)
 {
     CirrusVGAState *s = opaque;
 
-    addr &= CIRRUS_PNPMMIO_SIZE - 1;
-
     if (addr >= 0x100) {
         return cirrus_mmio_blt_read(s, addr - 0x100);
     } else {
@@ -2840,33 +2839,11 @@ static uint32_t cirrus_mmio_readb(void *opaque, target_phys_addr_t addr)
     }
 }
 
-static uint32_t cirrus_mmio_readw(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-
-    v = cirrus_mmio_readb(opaque, addr);
-    v |= cirrus_mmio_readb(opaque, addr + 1) << 8;
-    return v;
-}
-
-static uint32_t cirrus_mmio_readl(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-
-    v = cirrus_mmio_readb(opaque, addr);
-    v |= cirrus_mmio_readb(opaque, addr + 1) << 8;
-    v |= cirrus_mmio_readb(opaque, addr + 2) << 16;
-    v |= cirrus_mmio_readb(opaque, addr + 3) << 24;
-    return v;
-}
-
-static void cirrus_mmio_writeb(void *opaque, target_phys_addr_t addr,
-                               uint32_t val)
+static void cirrus_mmio_write(void *opaque, target_phys_addr_t addr,
+                              uint64_t val, unsigned size)
 {
     CirrusVGAState *s = opaque;
 
-    addr &= CIRRUS_PNPMMIO_SIZE - 1;
-
     if (addr >= 0x100) {
         cirrus_mmio_blt_write(s, addr - 0x100, val);
     } else {
@@ -2874,53 +2851,14 @@ static void cirrus_mmio_writeb(void *opaque, target_phys_addr_t addr,
     }
 }
 
-static void cirrus_mmio_writew(void *opaque, target_phys_addr_t addr,
-                               uint32_t val)
-{
-    cirrus_mmio_writeb(opaque, addr, val & 0xff);
-    cirrus_mmio_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-}
-
-static void cirrus_mmio_writel(void *opaque, target_phys_addr_t addr,
-                               uint32_t val)
-{
-    cirrus_mmio_writeb(opaque, addr, val & 0xff);
-    cirrus_mmio_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-    cirrus_mmio_writeb(opaque, addr + 2, (val >> 16) & 0xff);
-    cirrus_mmio_writeb(opaque, addr + 3, (val >> 24) & 0xff);
-}
-
-
-static uint64_t cirrus_mmio_read(void *opaque, target_phys_addr_t addr,
- unsigned size)
-{
-CirrusVGAState *s = opaque;
-
-switch (size) {
-case 1: return cirrus_mmio_readb(s, addr);
-case 2: return cirrus_mmio_readw(s, addr);
-case 4: return cirrus_mmio_readl(s, addr);
-default: abort();
-}
-};
-
-static void cirrus_mmio_write(void *opaque, target_phys_addr_t addr,
-  uint64_t data, unsigned size)
-{
-CirrusVGAState *s = opaque;
-
-switch (size) {
-case 1: return cirrus_mmio_writeb(s, addr, data);
-case 2: return cirrus_mmio_writew(s, addr, data);
-case 4: return cirrus_mmio_writel(s, addr, data);
-default: abort();
-}
-};
-
 static const MemoryRegionOps cirrus_mmio_io_ops = {
 .read = cirrus_mmio_read,
 .write = cirrus_mmio_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
 };
 
 /* load/save state */
-- 
1.7.5.3


[PATCH v4 10/39] Integrate I/O memory regions into qemu

2011-08-08 Thread Avi Kivity

get_system_io() returns the root I/O memory region.

Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
---
 exec-memory.h |5 +
 exec.c|   10 ++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/exec-memory.h b/exec-memory.h
index c439aba..334219f 100644
--- a/exec-memory.h
+++ b/exec-memory.h
@@ -28,6 +28,11 @@
  */
 MemoryRegion *get_system_memory(void);
 
+/* Get the root I/O port region.  This interface should only be used
+ * temporarily until a proper bus interface is available.
+ */
+MemoryRegion *get_system_io(void);
+
 /* Set the root memory region.  This region is the system memory map. */
 void set_system_memory_map(MemoryRegion *mr);
 
diff --git a/exec.c b/exec.c
index 719fff9..be7e4b2 100644
--- a/exec.c
+++ b/exec.c
@@ -113,6 +113,7 @@ static int in_migration;
 RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list) };
 
 static MemoryRegion *system_memory;
+static MemoryRegion *system_io;
 
 #endif
 
@@ -3830,6 +3831,10 @@ static void memory_map_init(void)
     system_memory = qemu_malloc(sizeof(*system_memory));
     memory_region_init(system_memory, "system", INT64_MAX);
     set_system_memory_map(system_memory);
+
+    system_io = qemu_malloc(sizeof(*system_io));
+    memory_region_init(system_io, "io", 65536);
+    set_system_io_map(system_io);
 }
 
 MemoryRegion *get_system_memory(void)
@@ -3837,6 +3842,11 @@ MemoryRegion *get_system_memory(void)
 return system_memory;
 }
 
+MemoryRegion *get_system_io(void)
+{
+return system_io;
+}
+
 #endif /* !defined(CONFIG_USER_ONLY) */
 
 /* physical memory access (slow version, mainly for debug) */
-- 
1.7.5.3


[PATCH v4 07/39] cirrus: simplify vga window mmio access functions

2011-08-08 Thread Avi Kivity

Make use of the memory API's ability to satisfy multi-byte accesses via
multiple single-byte accesses.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/cirrus_vga.c |   79 +++---
 1 files changed, 11 insertions(+), 68 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index c39acb9..5ded1ff 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -1956,7 +1956,9 @@ static void 
cirrus_mem_writeb_mode4and5_16bpp(CirrusVGAState * s,
  *
  ***/
 
-static uint32_t cirrus_vga_mem_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t cirrus_vga_mem_read(void *opaque,
+target_phys_addr_t addr,
+uint32_t size)
 {
 CirrusVGAState *s = opaque;
 unsigned bank_index;
@@ -1967,8 +1969,6 @@ static uint32_t cirrus_vga_mem_readb(void *opaque, 
target_phys_addr_t addr)
return vga_mem_readb(s, addr);
 }
 
-    addr &= 0x1ffff;
-
     if (addr < 0x10000) {
/* XXX handle bitblt */
/* video memory */
@@ -2000,28 +2000,10 @@ static uint32_t cirrus_vga_mem_readb(void *opaque, 
target_phys_addr_t addr)
 return val;
 }
 
-static uint32_t cirrus_vga_mem_readw(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-
-    v = cirrus_vga_mem_readb(opaque, addr);
-    v |= cirrus_vga_mem_readb(opaque, addr + 1) << 8;
-    return v;
-}
-
-static uint32_t cirrus_vga_mem_readl(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-
-    v = cirrus_vga_mem_readb(opaque, addr);
-    v |= cirrus_vga_mem_readb(opaque, addr + 1) << 8;
-    v |= cirrus_vga_mem_readb(opaque, addr + 2) << 16;
-    v |= cirrus_vga_mem_readb(opaque, addr + 3) << 24;
-    return v;
-}
-
-static void cirrus_vga_mem_writeb(void *opaque, target_phys_addr_t addr,
-  uint32_t mem_value)
+static void cirrus_vga_mem_write(void *opaque,
+ target_phys_addr_t addr,
+ uint64_t mem_value,
+ uint32_t size)
 {
 CirrusVGAState *s = opaque;
 unsigned bank_index;
@@ -2033,8 +2015,6 @@ static void cirrus_vga_mem_writeb(void *opaque, 
target_phys_addr_t addr,
 return;
 }
 
-    addr &= 0x1ffff;
-
     if (addr < 0x10000) {
if (s-cirrus_srcptr != s-cirrus_srcptr_end) {
/* bitblt */
@@ -2084,51 +2064,14 @@ static void cirrus_vga_mem_writeb(void *opaque, 
target_phys_addr_t addr,
 }
 }
 
-static void cirrus_vga_mem_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-    cirrus_vga_mem_writeb(opaque, addr, val & 0xff);
-    cirrus_vga_mem_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-}
-
-static void cirrus_vga_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-    cirrus_vga_mem_writeb(opaque, addr, val & 0xff);
-    cirrus_vga_mem_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-    cirrus_vga_mem_writeb(opaque, addr + 2, (val >> 16) & 0xff);
-    cirrus_vga_mem_writeb(opaque, addr + 3, (val >> 24) & 0xff);
-}
-
-static uint64_t cirrus_vga_mem_read(void *opaque,
-target_phys_addr_t addr,
-uint32_t size)
-{
-CirrusVGAState *s = opaque;
-
-switch (size) {
-case 1: return cirrus_vga_mem_readb(s, addr);
-case 2: return cirrus_vga_mem_readw(s, addr);
-case 4: return cirrus_vga_mem_readl(s, addr);
-default: abort();
-}
-}
-
-static void cirrus_vga_mem_write(void *opaque, target_phys_addr_t addr,
- uint64_t data, unsigned size)
-{
-CirrusVGAState *s = opaque;
-
-switch (size) {
-case 1: return cirrus_vga_mem_writeb(s, addr, data);
-case 2: return cirrus_vga_mem_writew(s, addr, data);
-case 4: return cirrus_vga_mem_writel(s, addr, data);
-default: abort();
-}
-};
-
 static const MemoryRegionOps cirrus_vga_mem_ops = {
 .read = cirrus_vga_mem_read,
 .write = cirrus_vga_mem_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
 };
 
 /***
-- 
1.7.5.3


[PATCH v4 03/39] vmsvga: don't remember pci BAR address in callback any more

2011-08-08 Thread Avi Kivity

We're going to remove the callback, so we can't use it to save the
address.  Use the pci API instead.
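The replacement depends on getting from the inner chip state back to the enclosing PCI device, which is what container_of does. A minimal sketch of that step follows. The demo_* structs and functions are hypothetical stand-ins for the vmsvga structures and for pci_get_bar_addr(), and the macro is a simplified version without the type checking that real implementations often add.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified container_of: recover a pointer to the enclosing struct
 * from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for PCIDevice: just the current BAR addresses. */
struct demo_pci_device {
    uint64_t bar_addr[6];
};

/* Hypothetical stand-in for the inner chip state. */
struct demo_chip {
    int depth;
};

/* Hypothetical stand-in for the PCI wrapper that embeds both. */
struct demo_pci_chip {
    struct demo_pci_device card;
    struct demo_chip chip;
};

/* Stand-in for a BAR address query on the PCI device. */
static uint64_t demo_get_bar_addr(struct demo_pci_device *dev, int region_num)
{
    assert(region_num >= 0 && region_num < 6);
    return dev->bar_addr[region_num];
}

/* Look the framebuffer base up on demand instead of caching it. */
static uint64_t demo_fb_start(struct demo_chip *s)
{
    struct demo_pci_chip *outer = container_of(s, struct demo_pci_chip, chip);

    return demo_get_bar_addr(&outer->card, 1);
}
```

Because the address is queried each time, there is no cached copy that a map callback would have to keep up to date.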

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/vmware_vga.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index 354c221..190b005 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -52,8 +52,6 @@ struct vmsvga_state_s {
 int on;
 } cursor;
 
-target_phys_addr_t vram_base;
-
 int index;
 int scratch_size;
 uint32_t *scratch;
@@ -761,8 +759,11 @@ static uint32_t vmsvga_value_read(void *opaque, uint32_t address)
     case SVGA_REG_BYTES_PER_LINE:
         return ((s->depth + 7) >> 3) * s->new_width;
 
-    case SVGA_REG_FB_START:
-        return s->vram_base;
+    case SVGA_REG_FB_START: {
+        struct pci_vmsvga_state_s *pci_vmsvga
+            = container_of(s, struct pci_vmsvga_state_s, chip);
+        return pci_get_bar_addr(&pci_vmsvga->card, 1);
+    }
 
     case SVGA_REG_FB_OFFSET:
         return 0x0;
@@ -1247,14 +1248,13 @@ static void pci_vmsvga_map_mem(PCIDevice *pci_dev, int region_num,
     struct vmsvga_state_s *s = &d->chip;
     ram_addr_t iomemtype;
 
-    s->vram_base = addr;
 #ifdef DIRECT_VRAM
     iomemtype = cpu_register_io_memory(vmsvga_vram_read,
                     vmsvga_vram_write, s, DEVICE_NATIVE_ENDIAN);
 #else
     iomemtype = s->vga.vram_offset | IO_MEM_RAM;
 #endif
-    cpu_register_physical_memory(s->vram_base, s->vga.vram_size,
+    cpu_register_physical_memory(addr, s->vga.vram_size,
                     iomemtype);
 
     s->vga.map_addr = addr;
-- 
1.7.5.3


Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types

2011-08-08 Thread Avi Kivity


On 08/08/2011 04:00 PM, Anthony Liguori wrote:

On 08/08/2011 07:56 AM, Avi Kivity wrote:

QEMU deals with a lot of fixed width integer types; their names
(uint64_t etc) are clumsy to use and take up a lot of space.

Following Linux, introduce shorter names, for example U64 for
uint64_t.


Except Linux uses lower case letters.

I personally think Linux style is wrong here.  The int8_t types are 
standard types.


Besides, we save lots of characters by using 4-space tabs instead of 
8-space tabs.  We can afford to spend some of those saved characters 
on using proper type names :-)




It's not about saving space, it's about improving readability.  We have 
about 21k uses of these types, they deserve short names.


--
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init

2011-08-08 Thread Michael S. Tsirkin

On Mon, Aug 08, 2011 at 08:02:08AM -0500, Anthony Liguori wrote:
 On 08/08/2011 07:56 AM, Michael S. Tsirkin wrote:
 On Mon, Aug 08, 2011 at 07:45:19AM -0500, Anthony Liguori wrote:
 On 08/08/2011 05:36 AM, Michael S. Tsirkin wrote:
 Thinking more closely, I don't think this is right.
 
 Updating on map ensured that the config was refreshed after each
 time the bar was mapped.  At the very least, the config needs to be
 refreshed during reset because the guest may write to the guest
 space which should get cleared after reset.
 
 Regards,
 
 Anthony Liguori
 
 Not sure I understand. Which register, for example,
 do you have in mind?
 Could you clarify please?
 
 Actually, you never need to call config_get() AFAICT.  It's called
 in every read/write access.
 
 Every read, yes. But every write? Are you sure?
 
 Yeah, not on write, but I think this is a bug.  get_config() should
 be called before doing the memcpy() in order to have a proper RMW.
 
 Regards,
 
 Anthony Liguori

Probably not noticeable because guests don't do the RMW
in practice.
We also send the config over on migration.
That's probably a bug as well ...

-- 
MST
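The read-modify-write ordering Anthony asks for above can be sketched as follows. This is only an illustration with a toy device: demo_dev, demo_get_config, demo_set_config and demo_config_write are hypothetical names, not the virtio-pci code, and the split into a "backend" copy and a "shadow" copy is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CONFIG_LEN 8

/* Hypothetical device: `backend` is the authoritative config, `shadow`
 * is the transport's cached copy that guest accesses operate on. */
struct demo_dev {
    uint8_t backend[DEMO_CONFIG_LEN];
    uint8_t shadow[DEMO_CONFIG_LEN];
};

/* Refresh the cached copy from the device. */
static void demo_get_config(struct demo_dev *d)
{
    memcpy(d->shadow, d->backend, DEMO_CONFIG_LEN);
}

/* Push the cached copy back to the device. */
static void demo_set_config(struct demo_dev *d)
{
    memcpy(d->backend, d->shadow, DEMO_CONFIG_LEN);
}

/* Partial config write done as a proper read-modify-write. */
static int demo_config_write(struct demo_dev *d, unsigned off,
                             const void *data, unsigned len)
{
    if (off > DEMO_CONFIG_LEN || len > DEMO_CONFIG_LEN - off) {
        return -1;                       /* out of range */
    }
    demo_get_config(d);                  /* read: refresh stale cache */
    memcpy(d->shadow + off, data, len);  /* modify: merge guest bytes */
    demo_set_config(d);                  /* write: push merged config */
    return 0;
}
```

Without the refresh step, a one-byte guest write would push whatever stale bytes the cache held for the rest of the config back to the device.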

Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init

2011-08-08 Thread Anthony Liguori


On 08/08/2011 08:14 AM, Michael S. Tsirkin wrote:

Probably not noticeable because guests don't do the RMW
in practice.
We also send the config over on migration.
That's probably a bug as well ...


It's probably unnecessary, but I don't think it's a bug..

Regards,

Anthony Liguori






Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types

2011-08-08 Thread Anthony Liguori


On 08/08/2011 08:12 AM, Avi Kivity wrote:

On 08/08/2011 04:00 PM, Anthony Liguori wrote:

On 08/08/2011 07:56 AM, Avi Kivity wrote:

QEMU deals with a lot of fixed width integer types; their names
(uint64_t etc) are clumsy to use and take up a lot of space.

Following Linux, introduce shorter names, for example U64 for
uint64_t.


Except Linux uses lower case letters.

I personally think Linux style is wrong here. The int8_t types are
standard types.

Besides, we save lots of characters by using 4-space tabs instead of
8-space tabs. We can afford to spend some of those saved characters on
using proper type names :-)



It's not about saving space, it's about improving readability. We have
about 21k uses of these types, they deserve short names.


This is one of the few areas that we're actually consistent with today. 
 Introducing a new set of types will just create inconsistency.


Most importantly, these are standard types.  Every modern library and C 
program should be using them.  TBH, having short names is just a bad 
case of NIH.


Regards,

Anthony Liguori


Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types

2011-08-08 Thread Peter Maydell

On 8 August 2011 13:56, Avi Kivity a...@redhat.com wrote:
 QEMU deals with a lot of fixed width integer types; their names
 (uint64_t etc) are clumsy to use and take up a lot of space.

 Following Linux, introduce shorter names, for example U64 for
 uint64_t.

Strongly disagree. uint64_t &c are standard types and it's
immediately clear to a competent C programmer what they are.
Random qemu-specific funny named types just introduce an
unnecessary level of indirection.

We only just recently managed to get rid of the nonstandard
typenames for these from fpu/...

-- PMM

Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types

2011-08-08 Thread Avi Kivity


On 08/08/2011 04:17 PM, Anthony Liguori wrote:


This is one of the few areas that we're actually consistent with 
today.  Introducing a new set of types will just create inconsistency.


Most importantly, these are standard types.  Every modern library and 
C program should be using them.  TBH, having short names is just a bad 
case of NIH.




Those are exactly the same types, compatible with all the libraries.  
NIH would be redefining them ourselves (and breaking pointer 
compatibility etc.)


--
error compiling committee.c: too many arguments to function
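The distinction drawn in this message, an alias versus a redefinition, can be shown in a few lines of C. U64 is the name from the proposal. my_u64, sum_u64 and demo_alias_sum are hypothetical names added for illustration, and the _Generic check assumes a C11 compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Alias, as in the proposal: U64 is just another name for uint64_t. */
typedef uint64_t U64;

/* Redefinition, the case called NIH above: a type chosen independently.
 * Whether it matches uint64_t depends on the platform; on typical LP64
 * Linux, uint64_t is unsigned long, so my_u64 * and uint64_t * are
 * incompatible pointer types there. */
typedef unsigned long long my_u64;

/* A library-style function written against the standard type. */
static uint64_t sum_u64(const uint64_t *v, unsigned n)
{
    uint64_t total = 0;
    unsigned i;

    for (i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* An array of the alias passes straight through, no cast needed. */
static uint64_t demo_alias_sum(void)
{
    U64 vals[3] = { 1, 2, 3 };

    return sum_u64(vals, 3);
}
```

A typedef never creates a new type, so `_Generic((U64)0, uint64_t: 1, default: 0)` evaluates to 1 on any conforming compiler; that is the compatibility the message refers to.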


Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues

2011-08-08 Thread Frediano Ziglio

2011/8/8 Avi Kivity a...@redhat.com:
 On 08/08/2011 03:49 PM, Frediano Ziglio wrote:

 2011/8/8 Avi Kivity a...@redhat.com:
   In certain circumstances, posix-aio-compat can incur a lot of latency:
     - threads are created by vcpu threads, so if vcpu affinity is set,
       aio threads inherit vcpu affinity.  This can cause many aio threads
       to compete for one cpu.
     - we can create up to max_threads (64) aio threads in one go; since a
       pthread_create can take around 30μs, we have up to 2ms of cpu time
       under a global lock.
 
   Fix by:
     - moving thread creation to the main thread, so we inherit the main
       thread's affinity instead of the vcpu thread's affinity.
     - if a thread is currently being created, and we need to create yet
       another thread, let the thread being born create the new thread,
       reducing the amount of time we spend under the main thread.
     - drop the local lock while creating a thread (we may still hold the
       global mutex, though)
 
   Note this doesn't eliminate latency completely; scheduler artifacts or
   lack of host cpu resources can still cause it.  We may want pre-allocated
   threads when this cannot be tolerated.
 
   Thanks to Uli Obergfell of Red Hat for his excellent analysis and
   suggestions.
 
   Signed-off-by: Avi Kivity a...@redhat.com

 Why not call pthread_attr_setaffinity_np (where available) before
 thread creation, or sched_setaffinity at thread start, instead of telling
 another thread to create a thread for us just to get affinity cleared?


 The entire qemu process may be affined to a subset of the host cpus; we
 don't want to break that.

 For example:

   taskset 0xf0 qemu 
   (qemu) info cpus
 pin individual vcpu threads to host cpus



Just call sched_getaffinity at program start, save the result to a global
variable and then set this affinity for io threads.
I haven't used affinity much, but from the manual it seems that if you
own the process you can set its affinity as you like.
IMHO this patch introduces a delay in io thread creation by posting
thread creation to another thread just to get a different affinity.

Frediano
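The save-and-restore approach suggested here could look roughly like the sketch below. It assumes Linux with glibc, where sched_getaffinity() and sched_setaffinity() with a pid of 0 act on the calling thread. The function and variable names are hypothetical, and the sketch does nothing about the mask changing after startup.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* CPU mask captured at program start (hypothetical globals). */
static cpu_set_t startup_affinity;
static int startup_affinity_saved;

/* Call once early in main(), before any vcpu thread has been pinned. */
static int save_startup_affinity(void)
{
    CPU_ZERO(&startup_affinity);
    if (sched_getaffinity(0, sizeof(startup_affinity),
                          &startup_affinity) != 0) {
        return -1;
    }
    startup_affinity_saved = 1;
    return 0;
}

/* Call at the top of each I/O worker thread: replaces whatever mask the
 * thread inherited from its creator with the saved startup mask. */
static int apply_startup_affinity(void)
{
    assert(startup_affinity_saved);
    return sched_setaffinity(0, sizeof(startup_affinity),
                             &startup_affinity);
}
```

The mask is a snapshot, so if the process is re-pinned from outside after startup, workers created later would still get the old mask.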

Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues

2011-08-08 Thread Avi Kivity


On 08/08/2011 04:21 PM, Frediano Ziglio wrote:


  The entire qemu process may be affined to a subset of the host cpus; we
  don't want to break that.

  For example:

 taskset 0xf0 qemu 
 (qemu) info cpus
  pin individual vcpu threads to host cpus



Just call sched_getaffinity at program start, save to a global
variable and then set this affinity for io threads.


This affinity may change later on.


I haven't used affinity much, but from the manual it seems that if you
own the process you can set its affinity as you like.
IMHO this patch introduces a delay in io thread creation by posting
thread creation to another thread just to get a different affinity.


It does.  But aio threads have a long life, so this happens very rarely.
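
For reference, here is a minimal sketch of the alternative discussed in this thread, assuming Linux with glibc: record the process affinity mask once at startup and hand it to new threads through the thread attributes. The names startup_mask, save_startup_affinity, spawn_io_thread and io_worker are illustrative and do not come from the patch. The saved mask goes stale if the process affinity is changed afterwards, which is the objection raised above.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Affinity mask of the process as seen at startup (hypothetical global). */
static cpu_set_t startup_mask;

/* Placeholder worker; a real aio thread would service requests here. */
static void *io_worker(void *arg)
{
    return arg;
}

/* Call once, early in main(), before any vcpu thread is pinned.
 * Returns 0 on success, -1 on error (errno set by sched_getaffinity). */
int save_startup_affinity(void)
{
    return sched_getaffinity(0, sizeof(startup_mask), &startup_mask);
}

/* Create a worker that starts with the saved mask instead of inheriting
 * the mask of the creating thread.  Returns 0 or a pthread error number. */
int spawn_io_thread(pthread_t *tid)
{
    pthread_attr_t attr;
    int r = pthread_attr_init(&attr);

    if (r != 0) {
        return r;
    }
    r = pthread_attr_setaffinity_np(&attr, sizeof(startup_mask),
                                    &startup_mask);
    if (r == 0) {
        r = pthread_create(tid, &attr, io_worker, NULL);
    }
    pthread_attr_destroy(&attr);
    return r;
}
```

This avoids the hand-off to a second thread, at the cost of using a mask that reflects only the state at startup.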

--
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types

2011-08-08 Thread Kevin Wolf

On 08.08.2011 15:00, Anthony Liguori wrote:
 On 08/08/2011 07:56 AM, Avi Kivity wrote:
 QEMU deals with a lot of fixed width integer types; their names
 (uint64_t etc) are clumsy to use and take up a lot of space.

 Following Linux, introduce shorter names, for example U64 for
 uint64_t.
 
 Except Linux uses lower case letters.
 
 I personally think Linux style is wrong here.  The int8_t types are 
 standard types.

I fully agree, we should use the standard types.

 Besides, we save lots of characters by using 4-space tabs instead of 
 8-space tabs.  We can afford to spend some of those saved characters on 
 using proper type names :-)

Heh, I like this reasoning. :-)

Kevin

Re: [Qemu-devel] [RFC] postcopy livemigration proposal

2011-08-08 Thread Dor Laor


On 08/08/2011 03:32 PM, Anthony Liguori wrote:

On 08/08/2011 04:20 AM, Dor Laor wrote:

On 08/08/2011 06:24 AM, Isaku Yamahata wrote:

This mail is about Yabusame: Postcopy Live Migration for Qemu/KVM,
on which we'll give a talk at KVM Forum.
The purpose of this mail is to let developers know about it in advance
so that we can get feedback on its design/implementation approach
early, before we start to implement it.


Background
==========
* What is postcopy livemigration?
It is yet another live migration mechanism for Qemu/KVM, which
implements the migration technique known as postcopy or lazy
migration. Just after the migrate command is invoked, the execution
host of a VM is instantaneously switched to a destination host.

The benefit is that total migration time is shorter because it transfers
each page only once. Precopy, on the other hand, may send the same pages
again and again because they can be dirtied.
The switching time from the source to the destination is several
hundred milliseconds, which enables quick load balancing.
For details, please refer to the papers.

We believe this is useful for others, so we'd like to merge this
feature into upstream qemu/kvm. The existing implementation that
we have right now is very ad hoc because it was written for academic research.
For the upstream merge, we're starting to redesign and reimplement it, and
we'd like to get feedback early. Although many improvements/optimizations
are possible, we should first implement and merge a simple, clean, but
extensible version, and then improve/optimize it later.

postcopy livemigration will be introduced as an optional feature. The
existing precopy livemigration remains the default behavior.


* related links:
project page
http://sites.google.com/site/grivonhome/quick-kvm-migration

Enabling Instantaneous Relocation of Virtual Machines with a
Lightweight VMM Extension,
(proof-of-concept, ad-hoc prototype. not a new design)
http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-paper.pdf
http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-talk.pdf

Reactive consolidation of virtual machines enabled by postcopy live
migration
(advantage for VM consolidation)
http://portal.acm.org/citation.cfm?id=1996125
http://www.emn.fr/x-info/ascola/lib/exe/fetch.php?media=internet:vtdc-postcopy.pdf



Qemu wiki
http://wiki.qemu.org/Features/PostCopyLiveMigration


Design/Implementation
=====================
The basic idea of postcopy livemigration is to use a sort of distributed
shared memory between the migration source and destination.

The migration procedure looks like
- start migration:
  stop the guest VM on the source and send the machine states except
  guest RAM to the destination
- resume the guest VM on the destination without guest RAM contents
- hook guest access to pages, and pull page contents from the source.
  This continues until all the pages are pulled to the destination
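
The procedure above can be sketched as a toy model in C. This is only an illustration under stated assumptions, not the Yabusame design: both "hosts" are plain structs in one process, the network pull is a memcpy, and all names (toy_vm, pull_page, guest_read, pull_remaining) are hypothetical. Addresses are assumed to lie inside the toy RAM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_NPAGES    8

/* Toy model: the source holds the full RAM image, the destination
 * starts with no page contents at all. */
struct toy_vm {
    unsigned char ram[TOY_NPAGES][TOY_PAGE_SIZE];
    bool present[TOY_NPAGES];
};

/* Stand-in for fetching one page from the source host over the network. */
static void pull_page(struct toy_vm *dst, const struct toy_vm *src, size_t pfn)
{
    memcpy(dst->ram[pfn], src->ram[pfn], TOY_PAGE_SIZE);
    dst->present[pfn] = true;
}

/* Hooked guest read on the destination: the first touch of a page
 * pulls it from the source and counts as one fault. */
unsigned char guest_read(struct toy_vm *dst, const struct toy_vm *src,
                         size_t addr, unsigned *faults)
{
    size_t pfn = addr / TOY_PAGE_SIZE;

    if (!dst->present[pfn]) {
        pull_page(dst, src, pfn);
        (*faults)++;
    }
    return dst->ram[pfn][addr % TOY_PAGE_SIZE];
}

/* Background transfer of everything not yet pulled; returns the number
 * of pages moved.  Migration is complete when this returns 0. */
size_t pull_remaining(struct toy_vm *dst, const struct toy_vm *src)
{
    size_t pulled = 0;

    for (size_t pfn = 0; pfn < TOY_NPAGES; pfn++) {
        if (!dst->present[pfn]) {
            pull_page(dst, src, pfn);
            pulled++;
        }
    }
    return pulled;
}
```

Each page crosses the "network" exactly once, whether it is pulled on demand or in the background.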

The big picture is depicted at
http://wiki.qemu.org/File:Postcopy-livemigration.png


That's terrific (nice video also)!
Orit and myself had the exact same idea too (now we can't patent it..).

Advantages:
- No down time due to memory copying.


But non-deterministic down time due to network latency while trying to
satisfy a page fault.


True but it is possible to limit it with some dedicated network or 
bandwidth reservation.





- Efficient, reduce needed traffic no need to re-send pages.


It's not quite that simple. Post-copy needs to introduce a protocol
capable of requesting pages.


Just another subsection.. (kidding), still it shouldn't be too 
complicated, just an offset+pagesize and return page_content/error
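
To make the "offset+pagesize, return page_content/error" idea concrete, here is a hedged sketch of what such a message pair might look like. The struct layouts, the fixed 4096-byte page, and the names page_request, page_reply and serve_page are all assumptions for illustration; no wire format has been proposed in this thread.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Request sent by the destination (hypothetical layout). */
struct page_request {
    uint64_t offset;   /* byte offset into guest RAM */
    uint32_t size;     /* must equal TOY_PAGE_SIZE in this sketch */
};

/* Reply sent by the source: an error code, then the page contents. */
struct page_reply {
    int32_t error;     /* 0 on success, -1 on a bad request */
    uint8_t data[TOY_PAGE_SIZE];
};

/* Source-side handler: validate the request and copy out one page. */
void serve_page(const uint8_t *ram, uint64_t ram_size,
                const struct page_request *req, struct page_reply *rep)
{
    if (req->size != TOY_PAGE_SIZE ||
        req->offset % TOY_PAGE_SIZE != 0 ||
        req->offset > ram_size ||
        ram_size - req->offset < req->size) {
        rep->error = -1;
        return;
    }
    memcpy(rep->data, ram + req->offset, req->size);
    rep->error = 0;
}
```

The message format itself is small; the round trips it implies are the harder part.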




I think in presenting something like this, it's important to collect
quite a bit of performance data. I'd suggest doing runs while running
jitterd in the guest to attempt to quantify the actual downtime
experienced too.

http://git.codemonkey.ws/cgit/jitterd.git/


and also comparing the speed that it takes for various benchmarks like 
iozone/netperf/linpack/..




There's a lot of potential in something like this, but it's not obvious
to me whether it's a net win. Should make for a very interesting
presentation :-)


- Reduce overall RAM consumption of the source and destination
as opposed to current live migration (both the source and the
destination allocate the memory until the live migration
completes). We can free copied memory once the destination guest
has received it and save RAM.
- Increase parallelism for SMP guests: we can have multiple
virtual CPUs handle their demand paging. Less time holding a
global lock, less thread contention.
- Virtual machines are using more and more memory resources;
for a virtual machine with a very large working set, doing live
migration with reasonable down time is impossible today.


This is really just a limitation of our implementation. In theory,
pre-copy allows you to exert fine grain resource control over the guest
which you can use to encourage convergence.


But a very

Re: [Qemu-devel] [PATCH v4 01/39] memory: rename PORTIO_END to PORTIO_END_OF_LIST

2011-08-08 Thread Anthony Liguori


On 08/08/2011 08:08 AM, Avi Kivity wrote:

For consistency with other _END_OF_LIST macros.

Signed-off-by: Avi Kivity a...@redhat.com


Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori


---
  memory.h |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/memory.h b/memory.h
index 4e518b2..da00a3b 100644
--- a/memory.h
+++ b/memory.h
@@ -133,7 +133,7 @@ struct MemoryRegionPortio {
  IOPortWriteFunc *write;
  };

-#define PORTIO_END { }
+#define PORTIO_END_OF_LIST() { }

  /**
   * memory_region_init: Initialize a memory region



Re: [PATCH v4 00/39] Memory API, batch 2: PCI devices

2011-08-08 Thread Michael S. Tsirkin

On Mon, Aug 08, 2011 at 04:08:53PM +0300, Avi Kivity wrote:
 This is a mostly mindless conversion of all QEMU PCI devices to the memory 
 API.
 After this patchset is applied, it is no longer possible to create a PCI 
 device
 using the old API.
 
 An immediate benefit is that PCI BARs that overlap each other are now handled
 correctly: currently, the sequence
 
   map BAR 0
   map BAR 1 at an overlapping address
   unmap either BAR 0 or BAR 1
 
 will leave a hole where the overlap exists.  With the patchset, the memory map
 is restored correctly.
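
 The hole can be reproduced with a toy model. This is a sketch under stated
 assumptions, not QEMU code: the address space is an array of slots holding
 the owning BAR id (-1 for unmapped), "later BAR wins" is an assumed overlap
 rule, ranges are assumed to fit in the array, and the names naive_map,
 naive_unmap and render are hypothetical.

```c
#include <stdbool.h>

#define SLOTS 16
#define NBARS 2

struct bar {
    int start;
    int len;
    bool mapped;
};

/* Old style: write the owner directly into the flat map. */
void naive_map(int map[SLOTS], const struct bar *b, int id)
{
    for (int i = b->start; i < b->start + b->len; i++) {
        map[i] = id;
    }
}

/* Old style: unmapping clears the whole range, including any part that
 * another, still mapped BAR also covers. */
void naive_unmap(int map[SLOTS], const struct bar *b)
{
    for (int i = b->start; i < b->start + b->len; i++) {
        map[i] = -1;
    }
}

/* Region-list style: the flat map is rebuilt from the set of mapped
 * regions, so removing one BAR re-exposes whatever lies underneath. */
void render(int map[SLOTS], const struct bar bars[NBARS])
{
    for (int i = 0; i < SLOTS; i++) {
        map[i] = -1;
    }
    for (int id = 0; id < NBARS; id++) {
        if (bars[id].mapped) {
            naive_map(map, &bars[id], id);
        }
    }
}
```

 With BAR 0 at slots 0..7 and BAR 1 at slots 4..11, naive_unmap() of BAR 1
 leaves slots 4..7 empty although BAR 0 is still mapped there; clearing
 bars[1].mapped and calling render() gives them back to BAR 0.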
 
 Note that overlaps of PCI BARs with memory or non-PCI resources are still not
 resolved correctly; this will be fixed later on.
 
 The vga patches have ugly intermediate states; however the result is fairly 
 clean.
 
 Changes from v3:
  - dropped virtio-pci config patch; will be fixed outside this patchset if
necessary
  - minor style fixes
 
 Changes from v2:
  - added patch from Michael simplifying virtio-pci config setup
 
 Changes from v1:
  - cmd646 type fix
  - folded a fixlet into its parent


For the series:

Acked-by: Michael S. Tsirkin m...@redhat.com

 Avi Kivity (39):
   memory: rename PORTIO_END to PORTIO_END_OF_LIST
   pci: add API to get a BAR's mapped address
   vmsvga: don't remember pci BAR address in callback any more
   vga: convert vga and its derivatives to the memory API
   cirrus: simplify mmio BAR access functions
   cirrus: simplify bitblt BAR access functions
   cirrus: simplify vga window mmio access functions
   vga: simplify vga window mmio access functions
   cirrus: simplify linear framebuffer access functions
   Integrate I/O memory regions into qemu
   pci: pass I/O address space to new PCI bus
   pci: allow I/O BARs to be registered with pci_register_bar_region()
   rtl8139: convert to memory API
   ac97: convert to memory API
   e1000: convert to memory API
   eepro100: convert to memory API
   es1370: convert to memory API
   ide: convert to memory API
   ivshmem: convert to memory API
   virtio-pci: convert to memory API
   ahci: convert to memory API
   intel-hda: convert to memory API
   lsi53c895a: convert to memory API
   ppc: convert to memory API
   ne2000: convert to memory API
   pcnet: convert to memory API
   i6300esb: convert to memory API
   isa-mmio: convert to memory API
   sun4u: convert to memory API
   ehci: convert to memory API
   uhci: convert to memory API
   xen-platform: convert to memory API
   msix: convert to memory API
   pci: remove pci_register_bar_simple()
   pci: convert pci rom to memory API
   pci: remove pci_register_bar()
   pci: fold BAR mapping function into its caller
   pci: rename pci_register_bar_region() to pci_register_bar()
   pci: remove support for pre memory API BARs
 
  exec-memory.h  |5 +
  exec.c |   10 ++
  hw/ac97.c  |   88 ++-
  hw/apb_pci.c   |1 +
  hw/bonito.c|1 +
  hw/cirrus_vga.c|  459 ---
  hw/cuda.c  |6 +-
  hw/e1000.c |  113 ++
  hw/eepro100.c  |  181 -
  hw/es1370.c|   43 +++--
  hw/escc.c  |   42 +++---
  hw/escc.h  |2 +-
  hw/grackle_pci.c   |8 +-
  hw/gt64xxx.c   |4 +-
  hw/heathrow_pic.c  |   29 ++--
  hw/ide.h   |2 +-
  hw/ide/ahci.c  |   31 ++--
  hw/ide/ahci.h  |2 +-
  hw/ide/cmd646.c|  204 +++-
  hw/ide/ich.c   |3 +-
  hw/ide/macio.c |   36 +++--
  hw/ide/pci.c   |   25 ++--
  hw/ide/pci.h   |   19 ++-
  hw/ide/piix.c  |   63 ++--
  hw/ide/via.c   |   64 ++--
  hw/intel-hda.c |   35 +++--
  hw/isa.h   |2 +
  hw/isa_mmio.c  |   29 ++--
  hw/ivshmem.c   |  158 +++
  hw/lance.c |   31 ++--
  hw/lsi53c895a.c|  257 +++---
  hw/mac_dbdma.c |   32 ++--
  hw/mac_dbdma.h |4 +-
  hw/mac_nvram.c |   39 ++---
  hw/macio.c |   73 -
  hw/msix.c  |   64 +++-
  hw/msix.h  |6 +-
  hw/ne2000-isa.c|   13 +--
  hw/ne2000.c|   77 ++---
  hw/ne2000.h|8 +-
  hw/openpic.c   |   81 +-
  hw/openpic.h   |2 +-
  hw/pc.h|4 +-
  hw/pc_piix.c   |6 +-
  hw/pci.c   |  133 +---
  hw/pci.h   |   26 ++--
  hw/pci_internals.h |3 +-
  hw/pcnet-pci.c |   74 +
  hw/pcnet.h |4 +-
  hw/piix_pci.c  |   14 +-
  hw/ppc4xx_pci.c|1 +
  hw/ppc_mac.h   |   27 ++--
  hw/ppc_newworld.c  |   34 ++--
  hw/ppc_oldworld.c  |   27 ++--
  hw/ppc_prep.c  |2 +-
  hw/ppce500_pci.c   |7 +-
  hw/prep_pci.c  |8 +-
  hw/prep_pci.h  |4 +-
  hw/qxl-render.c|2 +-
  hw/qxl.c   |  129 ++--
  hw/qxl.h   |6 +-
  hw/rtl8139.c   |   70 
  hw/sh_pci.c|4 +-
  hw/sun4u.c |

Re: [Qemu-devel] [PATCH v4 20/39] virtio-pci: convert to memory API

2011-08-08 Thread Anthony Liguori


On 08/08/2011 08:09 AM, Avi Kivity wrote:

except msix.

[jan: fix build]


This actually breaks the build:

  CC    libhw64/virtio-pci.o
cc1: warnings being treated as errors
/home/anthony/git/qemu/hw/virtio-pci.c: In function ‘virtio_write_config’:
/home/anthony/git/qemu/hw/virtio-pci.c:496:19: error: unused variable ‘vdev’
make[1]: *** [virtio-pci.o] Error 1
make: *** [subdir-libhw64] Error 2




Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
  hw/virtio-pci.c |   71 +--
  hw/virtio-pci.h |2 +-
  2 files changed, 28 insertions(+), 45 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index f3b3293..5df380d 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -162,7 +162,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
 {
     VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
     EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
-    int r;
+    int r = 0;
+
     if (assign) {
         r = event_notifier_init(notifier, 1);
         if (r < 0) {
@@ -170,24 +171,11 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
                          __func__, r);
             return r;
         }
-        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-                                       n, assign);
-        if (r < 0) {
-            error_report("%s: unable to map ioeventfd: %d",
-                         __func__, r);
-            event_notifier_cleanup(notifier);
-        }
+        memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
+                                  true, n, event_notifier_get_fd(notifier));
     } else {
-        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-                                       n, assign);
-        if (r < 0) {
-            error_report("%s: unable to unmap ioeventfd: %d",
-                         __func__, r);
-            return r;
-        }
-
+        memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
+                                  true, n, event_notifier_get_fd(notifier));
         /* Handle the race condition where the guest kicked and we deassigned
          * before we got around to handling the kick.
          */
@@ -424,7 +412,6 @@ static uint32_t virtio_pci_config_readb(void *opaque, uint32_t addr)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config)
         return virtio_ioport_read(proxy, addr);
     addr -= config;
@@ -435,7 +422,6 @@ static uint32_t virtio_pci_config_readw(void *opaque, uint32_t addr)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config)
         return virtio_ioport_read(proxy, addr);
     addr -= config;
@@ -446,7 +432,6 @@ static uint32_t virtio_pci_config_readl(void *opaque, uint32_t addr)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config)
         return virtio_ioport_read(proxy, addr);
     addr -= config;
@@ -457,7 +442,6 @@ static void virtio_pci_config_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config) {
         virtio_ioport_write(proxy, addr, val);
         return;
@@ -470,7 +454,6 @@ static void virtio_pci_config_writew(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config) {
         virtio_ioport_write(proxy, addr, val);
         return;
@@ -483,7 +466,6 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config) {
         virtio_ioport_write(proxy, addr, val);
         return;
@@ -492,30 +474,26 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val)
     virtio_config_writel(proxy->vdev, addr, val);
 }

-static void virtio_map(PCIDevice *pci_dev, int region_num,
-                       pcibus_t addr, pcibus_t size, int type)
-{
-    VirtIOPCIProxy *proxy = container_of(pci_dev, VirtIOPCIProxy, pci_dev);
-    VirtIODevice *vdev = proxy->vdev;
-    unsigned config_len = VIRTIO_PCI_REGION_SIZE(pci_dev) + vdev->config_len;
-
-    proxy->addr = addr;
-
-    register_ioport_write(addr, config_len, 1,
[PATCH v4.1 20/39] virtio-pci: convert to memory API

2011-08-08 Thread Avi Kivity

except msix.

[jan: fix build]

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---

v4.1: drop unused variable

 hw/virtio-pci.c |   70 --
 hw/virtio-pci.h |2 +-
 2 files changed, 27 insertions(+), 45 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index f3b3293..86c3229 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -162,7 +162,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
 {
     VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
     EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
-    int r;
+    int r = 0;
+
     if (assign) {
         r = event_notifier_init(notifier, 1);
         if (r < 0) {
@@ -170,24 +171,11 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
                          __func__, r);
             return r;
         }
-        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-                                       n, assign);
-        if (r < 0) {
-            error_report("%s: unable to map ioeventfd: %d",
-                         __func__, r);
-            event_notifier_cleanup(notifier);
-        }
+        memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
+                                  true, n, event_notifier_get_fd(notifier));
     } else {
-        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-                                       n, assign);
-        if (r < 0) {
-            error_report("%s: unable to unmap ioeventfd: %d",
-                         __func__, r);
-            return r;
-        }
-
+        memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
+                                  true, n, event_notifier_get_fd(notifier));
         /* Handle the race condition where the guest kicked and we deassigned
          * before we got around to handling the kick.
          */
@@ -424,7 +412,6 @@ static uint32_t virtio_pci_config_readb(void *opaque, uint32_t addr)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config)
         return virtio_ioport_read(proxy, addr);
     addr -= config;
@@ -435,7 +422,6 @@ static uint32_t virtio_pci_config_readw(void *opaque, uint32_t addr)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config)
         return virtio_ioport_read(proxy, addr);
     addr -= config;
@@ -446,7 +432,6 @@ static uint32_t virtio_pci_config_readl(void *opaque, uint32_t addr)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config)
         return virtio_ioport_read(proxy, addr);
     addr -= config;
@@ -457,7 +442,6 @@ static void virtio_pci_config_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config) {
         virtio_ioport_write(proxy, addr, val);
         return;
@@ -470,7 +454,6 @@ static void virtio_pci_config_writew(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config) {
         virtio_ioport_write(proxy, addr, val);
         return;
@@ -483,7 +466,6 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config) {
         virtio_ioport_write(proxy, addr, val);
         return;
@@ -492,25 +474,20 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val)
     virtio_config_writel(proxy->vdev, addr, val);
 }
 
-static void virtio_map(PCIDevice *pci_dev, int region_num,
-                       pcibus_t addr, pcibus_t size, int type)
-{
-    VirtIOPCIProxy *proxy = container_of(pci_dev, VirtIOPCIProxy, pci_dev);
-    VirtIODevice *vdev = proxy->vdev;
-    unsigned config_len = VIRTIO_PCI_REGION_SIZE(pci_dev) + vdev->config_len;
-
-    proxy->addr = addr;
-
-    register_ioport_write(addr, config_len, 1, virtio_pci_config_writeb, proxy);
-    register_ioport_write(addr, config_len, 2, virtio_pci_config_writew, proxy);
-    register_ioport_write(addr, config_len, 4, virtio_pci_config_writel, proxy);
-    register_ioport_read(addr, config_len, 1, virtio_pci_config_readb, proxy);
-    register_ioport_read(addr, config_len, 2, virtio_pci_config_readw, proxy);
-    register_ioport_read(addr,

Re: [Qemu-devel] [RFC] postcopy livemigration proposal

2011-08-08 Thread Anthony Liguori


On 08/08/2011 10:11 AM, Dor Laor wrote:

On 08/08/2011 03:32 PM, Anthony Liguori wrote:

On 08/08/2011 04:20 AM, Dor Laor wrote:


That's terrific (nice video also)!
Orit and myself had the exact same idea too (now we can't patent it..).

Advantages:
- No down time due to memory copying.


But non-deterministic down time due to network latency while trying to
satisfy a page fault.


True but it is possible to limit it with some dedicated network or
bandwidth reservation.


Yup.  Any technique that uses RDMA (which is basically what this is) 
requires dedicated network resources.



- Efficient, reduce needed traffic no need to re-send pages.


It's not quite that simple. Post-copy needs to introduce a protocol
capable of requesting pages.


Just another subsection.. (kidding), still it shouldn't be too
complicated, just an offset+pagesize and return page_content/error


What I meant by this is that there is potentially a lot of round trip
overhead.  Pre-copy migration works well with reasonably high latency
network connections because the downtime is capped only by the maximum
latency sending from one point to another.


But with something like this, the total downtime is 
2*max_latency*nb_pagefaults.  That's potentially pretty high.
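
To put a number on that, here is a small sketch of the formula. The function name and the sample figures (0.5 ms one-way latency, a 2 GiB guest with 4 KiB pages where every page faults once) are assumptions chosen for illustration, not measurements.

```c
#include <stdint.h>

/* Stall time summed over all faults: each fault costs one request and
 * one reply, hence the factor of two on the one-way latency. */
double postcopy_stall_seconds(double max_latency_s, uint64_t nb_pagefaults)
{
    return 2.0 * max_latency_s * (double)nb_pagefaults;
}
```

With those figures, 2 * 0.0005 s * 524288 faults comes to roughly 524 seconds of accumulated stall, spread over the run rather than taken in one pause.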


So it may be desirable to try to reduce nb_pagefaults by prefaulting in 
pages, etc.  Suffice to say, this ends up getting complicated and may 
end up burning network traffic too.



This is really just a limitation of our implementation. In theory,
pre-copy allows you to exert fine grain resource control over the guest
which you can use to encourage convergence.


But a very large guest with a large working set that changes faster
than the network bandwidth can keep up with might always need huge
down time with the current system.


In theory, you can do things like reduce the guest's priority to reduce
the amount of work it can do in order to encourage convergence.



One thing I think we need to do is put together a live migration
roadmap. We've got a lot of invasive efforts underway with live
migration and I fear that without some planning and serialization, some
of this useful work will get lost.


Some of them are parallel. I think all the readers here agree that post
copy migration should be an option while we need to maintain the current
one.


I actually think they need to be done mostly in sequence while cleaning 
up some of the current infrastructure.  I don't think we really should 
make any major changes (beyond maybe the separate thread) until we 
eliminate QEMUFile.


There's so much overhead involved in using QEMUFile today, I think it's 
hard to talk about performance data when we've got a major bottleneck 
sitting in the middle.


Regards,

Anthony Liguori

Re: [Qemu-devel] [RFC] postcopy livemigration proposal

2011-08-08 Thread Avi Kivity


On 08/08/2011 06:29 PM, Anthony Liguori wrote:



- Efficient, reduce needed traffic no need to re-send pages.


It's not quite that simple. Post-copy needs to introduce a protocol
capable of requesting pages.


Just another subsection.. (kidding), still it shouldn't be too
complicated, just an offset+pagesize and return page_content/error


What I meant by this is that there is potentially a lot of round trip
overhead.  Pre-copy migration works well with reasonably high latency
network connections because the downtime is capped only by the maximum
latency sending from one point to another.


But with something like this, the total downtime is 
2*max_latency*nb_pagefaults.  That's potentially pretty high.


Let's be generous and assume that the latency is dominated by page copy 
time.  So the total downtime is equal to the first live migration pass, 
~20 sec for 2GB on 1GbE.  It's distributed over potentially even more 
time, though.  If the guest does a lot of I/O, it may not be noticeable 
(esp. if we don't copy over pages read from disk).  If the guest is 
cpu/memory bound, it'll probably suck badly.
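
As a rough check of that estimate, a sketch that computes raw transfer time from size and link speed. It ignores protocol overhead, and the function name is an assumption.

```c
/* Time to push `bytes` through a link of `link_bits_per_s`, with no
 * allowance for framing, headers or retransmits. */
double transfer_seconds(double bytes, double link_bits_per_s)
{
    return bytes * 8.0 / link_bits_per_s;
}
```

For 2 GiB on a 1 Gbit/s link this gives about 17.2 seconds, which is consistent with ~20 sec once overhead is added; a single 4 KiB page takes about 33 microseconds on the wire.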




So it may be desirable to try to reduce nb_pagefaults by prefaulting 
in pages, etc.  Suffice to say, this ends up getting complicated and 
may end up burning network traffic too.


Yeah, and prefaulting in the background adds latency to synchronous 
requests.


This really needs excellent networking resources to work well.

--
error compiling committee.c: too many arguments to function


Re: [Qemu-devel] [RFC] postcopy livemigration proposal

2011-08-08 Thread Anthony Liguori


On 08/08/2011 10:36 AM, Avi Kivity wrote:

On 08/08/2011 06:29 PM, Anthony Liguori wrote:



- Efficient, reduce needed traffic no need to re-send pages.


It's not quite that simple. Post-copy needs to introduce a protocol
capable of requesting pages.


Just another subsection.. (kidding), still it shouldn't be too
complicated, just an offset+pagesize and return page_content/error


What I meant by this is that there is potentially a lot of round trip
overhead. Pre-copy migration works well with reasonably high latency
network connections because the downtime is capped only by the maximum
latency sending from one point to another.

But with something like this, the total downtime is
2*max_latency*nb_pagefaults. That's potentially pretty high.


Let's be generous and assume that the latency is dominated by page copy
time. So the total downtime is equal to the first live migration pass,
~20 sec for 2GB on 1GbE. It's distributed over potentially even more
time, though. If the guest does a lot of I/O, it may not be noticeable
(esp. if we don't copy over pages read from disk). If the guest is
cpu/memory bound, it'll probably suck badly.



So it may be desirable to try to reduce nb_pagefaults by prefaulting
in pages, etc. Suffice to say, this ends up getting complicated and
may end up burning network traffic too.


Yeah, and prefaulting in the background adds latency to synchronous
requests.

This really needs excellent networking resources to work well.


Yup, it's very similar to other technologies using RDMA (single system 
image, lock step execution, etc.).


Regards,

Anthony Liguori






Re: [Qemu-devel] [PATCH v4 00/39] Memory API, batch 2: PCI devices

2011-08-08 Thread Anthony Liguori


On 08/08/2011 08:08 AM, Avi Kivity wrote:

This is a mostly mindless conversion of all QEMU PCI devices to the memory API.
After this patchset is applied, it is no longer possible to create a PCI device
using the old API.

An immediate benefit is that PCI BARs that overlap each other are now handled
correctly: currently, the sequence

   map BAR 0
   map BAR 1 at an overlapping address
   unmap either BAR 0 or BAR 1

will leave a hole where the overlap exists.  With the patchset, the memory map
is restored correctly.

Note that overlaps of PCI BARs with memory or non-PCI resources are still not
resolved correctly; this will be fixed later on.

The vga patches have ugly intermediate states; however the result is fairly 
clean.


Applied all.  Thanks for taking this on, the results are very nice!

Regards,

Anthony Liguori



Changes from v3:
  - dropped virtio-pci config patch; will be fixed outside this patchset if
necessary
  - minor style fixes

Changes from v2:
  - added patch from Michael simplifying virtio-pci config setup

Changes from v1:
  - cmd646 type fix
  - folded a fixlet into its parent

Avi Kivity (39):
   memory: rename PORTIO_END to PORTIO_END_OF_LIST
   pci: add API to get a BAR's mapped address
   vmsvga: don't remember pci BAR address in callback any more
   vga: convert vga and its derivatives to the memory API
   cirrus: simplify mmio BAR access functions
   cirrus: simplify bitblt BAR access functions
   cirrus: simplify vga window mmio access functions
   vga: simplify vga window mmio access functions
   cirrus: simplify linear framebuffer access functions
   Integrate I/O memory regions into qemu
   pci: pass I/O address space to new PCI bus
   pci: allow I/O BARs to be registered with pci_register_bar_region()
   rtl8139: convert to memory API
   ac97: convert to memory API
   e1000: convert to memory API
   eepro100: convert to memory API
   es1370: convert to memory API
   ide: convert to memory API
   ivshmem: convert to memory API
   virtio-pci: convert to memory API
   ahci: convert to memory API
   intel-hda: convert to memory API
   lsi53c895a: convert to memory API
   ppc: convert to memory API
   ne2000: convert to memory API
   pcnet: convert to memory API
   i6300esb: convert to memory API
   isa-mmio: convert to memory API
   sun4u: convert to memory API
   ehci: convert to memory API
   uhci: convert to memory API
   xen-platform: convert to memory API
   msix: convert to memory API
   pci: remove pci_register_bar_simple()
   pci: convert pci rom to memory API
   pci: remove pci_register_bar()
   pci: fold BAR mapping function into its caller
   pci: rename pci_register_bar_region() to pci_register_bar()
   pci: remove support for pre memory API BARs

  exec-memory.h  |5 +
  exec.c |   10 ++
  hw/ac97.c  |   88 ++-
  hw/apb_pci.c   |1 +
  hw/bonito.c|1 +
  hw/cirrus_vga.c|  459 ---
  hw/cuda.c  |6 +-
  hw/e1000.c |  113 ++
  hw/eepro100.c  |  181 -
  hw/es1370.c|   43 +++--
  hw/escc.c  |   42 +++---
  hw/escc.h  |2 +-
  hw/grackle_pci.c   |8 +-
  hw/gt64xxx.c   |4 +-
  hw/heathrow_pic.c  |   29 ++--
  hw/ide.h   |2 +-
  hw/ide/ahci.c  |   31 ++--
  hw/ide/ahci.h  |2 +-
  hw/ide/cmd646.c|  204 +++-
  hw/ide/ich.c   |3 +-
  hw/ide/macio.c |   36 +++--
  hw/ide/pci.c   |   25 ++--
  hw/ide/pci.h   |   19 ++-
  hw/ide/piix.c  |   63 ++--
  hw/ide/via.c   |   64 ++--
  hw/intel-hda.c |   35 +++--
  hw/isa.h   |2 +
  hw/isa_mmio.c  |   29 ++--
  hw/ivshmem.c   |  158 +++
  hw/lance.c |   31 ++--
  hw/lsi53c895a.c|  257 +++---
  hw/mac_dbdma.c |   32 ++--
  hw/mac_dbdma.h |4 +-
  hw/mac_nvram.c |   39 ++---
  hw/macio.c |   73 -
  hw/msix.c  |   64 +++-
  hw/msix.h  |6 +-
  hw/ne2000-isa.c|   13 +--
  hw/ne2000.c|   77 ++---
  hw/ne2000.h|8 +-
  hw/openpic.c   |   81 +-
  hw/openpic.h   |2 +-
  hw/pc.h|4 +-
  hw/pc_piix.c   |6 +-
  hw/pci.c   |  133 +---
  hw/pci.h   |   26 ++--
  hw/pci_internals.h |3 +-
  hw/pcnet-pci.c |   74 +
  hw/pcnet.h |4 +-
  hw/piix_pci.c  |   14 +-
  hw/ppc4xx_pci.c|1 +
  hw/ppc_mac.h   |   27 ++--
  hw/ppc_newworld.c  |   34 ++--
  hw/ppc_oldworld.c  |   27 ++--
  hw/ppc_prep.c  |2 +-
  hw/ppce500_pci.c   |7 +-
  hw/prep_pci.c  |8 +-
  hw/prep_pci.h  |4 +-
  hw/qxl-render.c|2 +-
  hw/qxl.c   |  129 ++--
  hw/qxl.h   |6 +-
  hw/rtl8139.c   |   70 
  hw/sh_pci.c|4 +-
  hw/sun4u.c |   53

[PATCH 1/3] memory: reclaim resources when a memory region is destroyed for good

2011-08-08 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |   24 ++++++++++++++++++++++++
 memory.h |    1 +
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index be891c6..5e3d966 100644
--- a/memory.c
+++ b/memory.c
@@ -661,6 +661,25 @@ void memory_region_transaction_commit(void)
     memory_region_update_topology();
 }
 
+static void memory_region_destructor_none(MemoryRegion *mr)
+{
+}
+
+static void memory_region_destructor_ram(MemoryRegion *mr)
+{
+    qemu_ram_free(mr->ram_addr);
+}
+
+static void memory_region_destructor_ram_from_ptr(MemoryRegion *mr)
+{
+    qemu_ram_free_from_ptr(mr->ram_addr);
+}
+
+static void memory_region_destructor_iomem(MemoryRegion *mr)
+{
+    cpu_unregister_io_memory(mr->ram_addr);
+}
+
 void memory_region_init(MemoryRegion *mr,
                         const char *name,
                         uint64_t size)
@@ -671,6 +690,7 @@ void memory_region_init(MemoryRegion *mr,
     mr->addr = 0;
     mr->offset = 0;
     mr->terminates = false;
+    mr->destructor = memory_region_destructor_none;
     mr->priority = 0;
     mr->may_overlap = false;
     mr->alias = NULL;
@@ -833,6 +853,7 @@ static void memory_region_prepare_ram_addr(MemoryRegion *mr)
         return;
     }
 
+    mr->destructor = memory_region_destructor_iomem;
     mr->ram_addr = cpu_register_io_memory(memory_region_read_thunk,
                                           memory_region_write_thunk,
                                           mr,
@@ -860,6 +881,7 @@ void memory_region_init_ram(MemoryRegion *mr,
 {
     memory_region_init(mr, name, size);
     mr->terminates = true;
+    mr->destructor = memory_region_destructor_ram;
     mr->ram_addr = qemu_ram_alloc(dev, name, size);
     mr->backend_registered = true;
 }
@@ -872,6 +894,7 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
 {
     memory_region_init(mr, name, size);
     mr->terminates = true;
+    mr->destructor = memory_region_destructor_ram_from_ptr;
     mr->ram_addr = qemu_ram_alloc_from_ptr(dev, name, size, ptr);
     mr->backend_registered = true;
 }
@@ -890,6 +913,7 @@ void memory_region_init_alias(MemoryRegion *mr,
 void memory_region_destroy(MemoryRegion *mr)
 {
     assert(QTAILQ_EMPTY(&mr->subregions));
+    mr->destructor(mr);
     memory_region_clear_coalescing(mr);
     qemu_free((char *)mr->name);
     qemu_free(mr->ioeventfds);
diff --git a/memory.h b/memory.h
index da00a3b..c9252a2 100644
--- a/memory.h
+++ b/memory.h
@@ -109,6 +109,7 @@ struct MemoryRegion {
     target_phys_addr_t addr;
     target_phys_addr_t offset;
     bool backend_registered;
+    void (*destructor)(MemoryRegion *mr);
     ram_addr_t ram_addr;
     IORange iorange;
     bool terminates;
-- 
1.7.5.3
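
For readers skimming the archive, the pattern in the patch above (each init variant records a destructor, and destroy calls whatever was recorded) can be sketched outside QEMU roughly as follows. This is only an illustration: Region, the region_* helpers and the two counters are hypothetical stand-ins, not QEMU names, and the counters replace the real qemu_ram_free() and cpu_unregister_io_memory() calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MemoryRegion; only the fields needed here. */
typedef struct Region Region;
struct Region {
    int backend_handle;             /* stand-in for ram_addr */
    void (*destructor)(Region *r);  /* recorded by whichever init ran */
};

/* Counters standing in for the real backend release calls. */
static int ram_frees;   /* simulated RAM frees */
static int io_unregs;   /* simulated I/O memory unregistrations */

static void region_destructor_none(Region *r)  { (void)r; }
static void region_destructor_ram(Region *r)   { (void)r; ram_frees++; }
static void region_destructor_iomem(Region *r) { (void)r; io_unregs++; }

/* Base init: a region with no backend has nothing to release. */
void region_init(Region *r)
{
    r->backend_handle = 0;
    r->destructor = region_destructor_none;
}

/* RAM-backed region: remember that RAM must be freed on destroy. */
void region_init_ram(Region *r, int handle)
{
    region_init(r);
    r->destructor = region_destructor_ram;
    r->backend_handle = handle;
}

/* I/O region: remember that the I/O slot must be unregistered. */
void region_init_io(Region *r, int handle)
{
    region_init(r);
    r->destructor = region_destructor_iomem;
    r->backend_handle = handle;
}

/* Destroy does not need to know the region type; it just dispatches. */
void region_destroy(Region *r)
{
    assert(r->destructor != NULL);
    r->destructor(r);
}
```

The point of the design is that the destroy path stays a single call, and a new region type only has to supply its own destructor at init time.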

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] memory: add API for creating ROM/device regions

2011-08-08 Thread Avi Kivity

ROM/device regions act as mapped RAM for reads, and as I/O memory for
writes.  This allows emulation of flash devices.

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |   46 --
 memory.h |   34 ++
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/memory.c b/memory.c
index 5e3d966..beff98c 100644
--- a/memory.c
+++ b/memory.c
@@ -125,6 +125,7 @@ struct FlatRange {
     target_phys_addr_t offset_in_region;
     AddrRange addr;
     uint8_t dirty_log_mask;
+    bool readable;
 };
 
 /* Flattened global view of current active memory hierarchy.  Kept in sorted
@@ -164,7 +165,8 @@ static bool flatrange_equal(FlatRange *a, FlatRange *b)
 {
     return a->mr == b->mr
         && addrrange_equal(a->addr, b->addr)
-        && a->offset_in_region == b->offset_in_region;
+        && a->offset_in_region == b->offset_in_region
+        && a->readable == b->readable;
 }
 
 static void flatview_init(FlatView *view)
@@ -200,7 +202,8 @@ static bool can_merge(FlatRange *r1, FlatRange *r2)
     return addrrange_end(r1->addr) == r2->addr.start
         && r1->mr == r2->mr
         && r1->offset_in_region + r1->addr.size == r2->offset_in_region
-        && r1->dirty_log_mask == r2->dirty_log_mask;
+        && r1->dirty_log_mask == r2->dirty_log_mask
+        && r1->readable == r2->readable;
 }
 
 /* Attempt to simplify a view by merging ajacent ranges */
@@ -241,6 +244,10 @@ static void as_memory_range_add(AddressSpace *as, FlatRange *fr)
         region_offset = 0;
     }
 
+    if (!fr->readable) {
+        phys_offset &= TARGET_PAGE_MASK;
+    }
+
     cpu_register_physical_memory_log(fr->addr.start,
                                      fr->addr.size,
                                      phys_offset,
@@ -462,6 +469,7 @@ static void render_memory_region(FlatView *view,
             fr.offset_in_region = offset_in_region;
             fr.addr = addrrange_make(base, now);
             fr.dirty_log_mask = mr->dirty_log_mask;
+            fr.readable = mr->readable;
             flatview_insert(view, i, &fr);
             ++i;
             base += now;
@@ -480,6 +488,7 @@ static void render_memory_region(FlatView *view,
         fr.offset_in_region = offset_in_region;
         fr.addr = addrrange_make(base, remain);
         fr.dirty_log_mask = mr->dirty_log_mask;
+        fr.readable = mr->readable;
         flatview_insert(view, i, &fr);
     }
 }
@@ -680,6 +689,12 @@ static void memory_region_destructor_iomem(MemoryRegion *mr)
     cpu_unregister_io_memory(mr->ram_addr);
 }
 
+static void memory_region_destructor_rom_device(MemoryRegion *mr)
+{
+    qemu_ram_free(mr->ram_addr & TARGET_PAGE_MASK);
+    cpu_unregister_io_memory(mr->ram_addr & ~(TARGET_PAGE_MASK | IO_MEM_ROMD));
+}
+
 void memory_region_init(MemoryRegion *mr,
                         const char *name,
                         uint64_t size)
@@ -690,6 +705,7 @@ void memory_region_init(MemoryRegion *mr,
     mr->addr = 0;
     mr->offset = 0;
     mr->terminates = false;
+    mr->readable = true;
     mr->destructor = memory_region_destructor_none;
     mr->priority = 0;
     mr->may_overlap = false;
@@ -910,6 +926,24 @@ void memory_region_init_alias(MemoryRegion *mr,
     mr->alias_offset = offset;
 }
 
+void memory_region_init_rom_device(MemoryRegion *mr,
+                                   const MemoryRegionOps *ops,
+                                   DeviceState *dev,
+                                   const char *name,
+                                   uint64_t size)
+{
+    memory_region_init(mr, name, size);
+    mr->terminates = true;
+    mr->destructor = memory_region_destructor_rom_device;
+    mr->ram_addr = qemu_ram_alloc(dev, name, size);
+    mr->ram_addr |= cpu_register_io_memory(memory_region_read_thunk,
+                                           memory_region_write_thunk,
+                                           mr,
+                                           mr->ops->endianness);
+    mr->ram_addr |= IO_MEM_ROMD;
+    mr->backend_registered = true;
+}
+
 void memory_region_destroy(MemoryRegion *mr)
 {
     assert(QTAILQ_EMPTY(&mr->subregions));
@@ -967,6 +1001,14 @@ void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
     /* FIXME */
 }
 
+void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
+{
+    if (mr->readable != readable) {
+        mr->readable = readable;
+        memory_region_update_topology();
+    }
+}
+
 void memory_region_reset_dirty(MemoryRegion *mr, target_phys_addr_t addr,
                                target_phys_addr_t size, unsigned client)
 {
diff --git a/memory.h b/memory.h
index c9252a2..0553cc7 100644
--- a/memory.h
+++ b/memory.h
@@ -113,6 +113,7 @@ struct MemoryRegion {
     ram_addr_t ram_addr;
     IORange iorange;
     bool terminates;
+    bool readable;
     MemoryRegion *alias;
     target_phys_addr_t alias_offset;
     unsigned priority;
@@ -219,6 +220,25 @@ void memory_region_init_alias(MemoryRegion *mr,

[PATCH 3/3] memory: correct documentation typos

2011-08-08 Thread Avi Kivity

Noted by Drew Jones.

Signed-off-by: Avi Kivity a...@redhat.com
---
 docs/memory.txt |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/memory.txt b/docs/memory.txt
index 4460c06..3fc1683 100644
--- a/docs/memory.txt
+++ b/docs/memory.txt
@@ -15,7 +15,7 @@ The memory model provides support for
  - setting up coalesced memory for kvm
  - setting up ioeventfd regions for kvm
 
-Memory is modelled as an tree (really acyclic graph) of MemoryRegion objects.
+Memory is modelled as a tree (really acyclic graph) of MemoryRegion objects.
 The root of the tree is memory as seen from the CPU's viewpoint (the system
 bus).  Nodes in the tree represent other buses, memory controllers, and
 memory regions that have been rerouted.  Leaves are RAM and MMIO regions.
@@ -87,7 +87,7 @@ guest accesses an address:
   descending priority order
   - if the address lies outside the region offset/size, the subregion is
 discarded
-  - if the subregion is a leaf (RAM or MMIO), the seach terminates
+  - if the subregion is a leaf (RAM or MMIO), the search terminates
   - if the subregion is a container, the same algorithm is used within the
 subregion (after the address is adjusted by the subregion offset)
   - if the subregion is an alias, the search is continues at the alias target
@@ -128,7 +128,7 @@ so-called PCI hole, that allows a 32-bit PCI bus to exist in a system with
 4GB of memory.
 
 The memory controller diverts addresses in the range 640K-768K to the PCI
-address space.  This is modeled using the vga-window alias, mapped at a
+address space.  This is modelled using the vga-window alias, mapped at a
 higher priority so it obscures the RAM at the same addresses.  The vga window
 can be removed by programming the memory controller; this is modelled by
 removing the alias and exposing the RAM underneath.
@@ -164,7 +164,7 @@ various constraints can be supplied to control how these callbacks are called:
  - .impl.min_access_size, .impl.max_access_size define the access sizes
(in bytes) supported by the *implementation*; other access sizes will be
emulated using the ones available.  For example a 4-byte write will be
-   emulated using four 1-byte write, is .impl.max_access_size = 1.
+   emulated using four 1-byte write, if .impl.max_access_size = 1.
  - .impl.valid specifies that the *implementation* only supports unaligned
accesses; unaligned accesses will be emulated by two aligned accesses.
  - .old_portio and .old_mmio can be used to ease porting from code using
-- 
1.7.5.3
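
As a side note on the .impl.max_access_size text touched by the last hunk: the "4-byte write emulated using four 1-byte writes" case can be sketched as below. This is a standalone illustration, not the code in memory.c. emulate_write, ByteDevice and byte_device_write are hypothetical names, and the least-significant-byte-first ordering is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device write callback, shaped like the new-style ops. */
typedef void (*write_fn)(void *opaque, uint64_t addr, uint64_t val,
                         unsigned size);

/*
 * Emulate a write of `size` bytes on an implementation that handles at
 * most `impl_max` bytes per call.  Both are assumed to be powers of two,
 * so the pieces divide the access evenly.  Pieces are issued lowest
 * address first, carrying the least significant bytes first (assumed
 * little-endian layout).
 */
void emulate_write(write_fn impl_write, void *opaque, uint64_t addr,
                   uint64_t val, unsigned size, unsigned impl_max)
{
    unsigned step = size < impl_max ? size : impl_max;

    assert(step != 0 && size % step == 0);
    for (unsigned off = 0; off < size; off += step) {
        /* Mask selecting `step` bytes; avoid shifting by 64. */
        uint64_t mask = step >= 8 ? UINT64_MAX
                                  : ((UINT64_C(1) << (step * 8)) - 1);
        impl_write(opaque, addr + off, (val >> (off * 8)) & mask, step);
    }
}

/* Hypothetical byte-only device that records what it was sent. */
typedef struct {
    uint8_t bytes[8];
    unsigned calls;
} ByteDevice;

void byte_device_write(void *opaque, uint64_t addr, uint64_t val,
                       unsigned size)
{
    ByteDevice *d = opaque;

    assert(size == 1 && addr < sizeof(d->bytes));
    d->bytes[addr] = (uint8_t)val;
    d->calls++;
}
```

Calling emulate_write() with size 4 and impl_max 1 on the byte device results in four one-byte calls at consecutive addresses, which is the case the documentation sentence describes.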

[PATCH 0/3] Memory API updates

2011-08-08 Thread Avi Kivity

The following patches fix a resource leak, add a ROM/device API (for flash
devices which act like memory when read, and as an mmio device when written),
and correct typos in the documentation.

Avi Kivity (3):
  memory: reclaim resources when a memory region is destroyed for good
  memory: add API for creating ROM/device regions
  memory: correct documentation typos

 docs/memory.txt |8 +++---
 memory.c|   70 +-
 memory.h|   35 +++
 3 files changed, 107 insertions(+), 6 deletions(-)

-- 
1.7.5.3
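
To make the ROM/device behaviour described in the cover letter concrete, here is a rough standalone sketch: reads are served straight from backing storage while the region is readable, writes always reach the device model, and the device can switch reads over to I/O mode. RomDevice, romd_read, romd_write and the two command values are hypothetical and chosen only for illustration; they are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flash-like device with RAM-like backing storage. */
typedef struct RomDevice {
    uint8_t backing[16];   /* contents seen by plain memory reads */
    bool readable;         /* true: reads bypass the device model */
    uint8_t status;        /* value returned by reads in I/O mode */
    unsigned io_reads;     /* reads that went through the device model */
    unsigned io_writes;    /* writes (these always go through it) */
} RomDevice;

/* Illustrative command bytes; assumptions, not from the patch. */
enum { CMD_READ_STATUS = 0x70, CMD_READ_ARRAY = 0xFF };

uint8_t romd_read(RomDevice *d, size_t addr)
{
    assert(addr < sizeof(d->backing));
    if (d->readable) {
        return d->backing[addr];   /* behaves like mapped RAM */
    }
    d->io_reads++;                 /* behaves like an mmio device */
    return d->status;
}

void romd_write(RomDevice *d, size_t addr, uint8_t val)
{
    assert(addr < sizeof(d->backing));
    d->io_writes++;                /* writes never hit backing directly */
    if (val == CMD_READ_STATUS) {
        d->readable = false;       /* later reads go to the device */
    } else if (val == CMD_READ_ARRAY) {
        d->readable = true;        /* later reads come from backing */
    }
}
```

In the series itself the switch corresponds to memory_region_rom_device_set_readable(), which triggers a topology update only when the flag actually changes.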

[PATCH 16/24] sh_pci: convert to memory API

2011-08-08 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/sh_pci.c |   63 +++---
 1 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/hw/sh_pci.c b/hw/sh_pci.c
index cd86501..76061bb 100644
--- a/hw/sh_pci.c
+++ b/hw/sh_pci.c
@@ -33,13 +33,16 @@ typedef struct SHPCIState {
 PCIBus *bus;
 PCIDevice *dev;
 qemu_irq irq[4];
-int memconfig;
+MemoryRegion memconfig_p4;
+MemoryRegion memconfig_a7;
+MemoryRegion isa;
 uint32_t par;
 uint32_t mbr;
 uint32_t iobr;
 } SHPCIState;
 
-static void sh_pci_reg_write (void *p, target_phys_addr_t addr, uint32_t val)
+static void sh_pci_reg_write (void *p, target_phys_addr_t addr, uint64_t val,
+  unsigned size)
 {
 SHPCIState *pcic = p;
 switch(addr) {
@@ -54,10 +57,10 @@ static void sh_pci_reg_write (void *p, target_phys_addr_t 
addr, uint32_t val)
 break;
 case 0x1c8:
 if ((val  0xfffc) != (pcic-iobr  0xfffc)) {
-cpu_register_physical_memory(pcic-iobr  0xfffc, 0x4,
- IO_MEM_UNASSIGNED);
+memory_region_del_subregion(get_system_memory(), pcic-isa);
 pcic-iobr = val  0xfffc0001;
-isa_mmio_init(pcic-iobr  0xfffc, 0x4);
+memory_region_add_subregion(get_system_memory(),
+pcic-iobr  0xfffc, pcic-isa);
 }
 break;
 case 0x220:
@@ -66,7 +69,8 @@ static void sh_pci_reg_write (void *p, target_phys_addr_t 
addr, uint32_t val)
 }
 }
 
-static uint32_t sh_pci_reg_read (void *p, target_phys_addr_t addr)
+static uint64_t sh_pci_reg_read (void *p, target_phys_addr_t addr,
+ unsigned size)
 {
 SHPCIState *pcic = p;
 switch(addr) {
@@ -84,14 +88,14 @@ static uint32_t sh_pci_reg_read (void *p, 
target_phys_addr_t addr)
 return 0;
 }
 
-typedef struct {
-CPUReadMemoryFunc * const r[3];
-CPUWriteMemoryFunc * const w[3];
-} MemOp;
-
-static MemOp sh_pci_reg = {
-{ NULL, NULL, sh_pci_reg_read },
-{ NULL, NULL, sh_pci_reg_write },
+static const MemoryRegionOps sh_pci_reg_ops = {
+.read = sh_pci_reg_read,
+.write = sh_pci_reg_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
 };
 
 static int sh_pci_map_irq(PCIDevice *d, int irq_num)
@@ -110,11 +114,23 @@ static void sh_pci_map(SysBusDevice *dev, 
target_phys_addr_t base)
 {
 SHPCIState *s = FROM_SYSBUS(SHPCIState, dev);
 
-cpu_register_physical_memory(P4ADDR(base), 0x224, s-memconfig);
-cpu_register_physical_memory(A7ADDR(base), 0x224, s-memconfig);
-
+memory_region_add_subregion(get_system_memory(),
+P4ADDR(base),
+s-memconfig_p4);
+memory_region_add_subregion(get_system_memory(),
+A7ADDR(base),
+s-memconfig_a7);
 s-iobr = 0xfe24;
-isa_mmio_init(s-iobr, 0x4);
+memory_region_add_subregion(get_system_memory(), s-iobr, s-isa);
+}
+
+static void sh_pci_unmap(SysBusDevice *dev, target_phys_addr_t base)
+{
+SHPCIState *s = FROM_SYSBUS(SHPCIState, dev);
+
+memory_region_del_subregion(get_system_memory(), s-memconfig_p4);
+memory_region_del_subregion(get_system_memory(), s-memconfig_a7);
+memory_region_del_subregion(get_system_memory(), s-isa);
 }
 
 static int sh_pci_init_device(SysBusDevice *dev)
@@ -132,9 +148,14 @@ static int sh_pci_init_device(SysBusDevice *dev)
   get_system_memory(),
   get_system_io(),
   PCI_DEVFN(0, 0), 4);
-s-memconfig = cpu_register_io_memory(sh_pci_reg.r, sh_pci_reg.w,
-  s, DEVICE_NATIVE_ENDIAN);
-sysbus_init_mmio_cb(dev, 0x224, sh_pci_map);
+    memory_region_init_io(&s->memconfig_p4, &sh_pci_reg_ops, s,
+                          "sh_pci", 0x224);
+    memory_region_init_alias(&s->memconfig_a7, "sh_pci.2", &s->memconfig_p4,
+                             0, 0x224);
+isa_mmio_setup(s-isa, 0x4);
+sysbus_init_mmio_cb2(dev, sh_pci_map, sh_pci_unmap);
+sysbus_init_mmio_region(dev, s-memconfig_a7);
+sysbus_init_mmio_region(dev, s-isa);
 s-dev = pci_create_simple(s-bus, PCI_DEVFN(0, 0), sh_pci_host);
 return 0;
 }
-- 
1.7.5.3

[PATCH 08/24] tusb6010: move declarations to new file tusb6010.h

2011-08-08 Thread Avi Kivity

Avoid #include hell.

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/devices.h  |7 ---
 hw/nseries.c  |1 +
 hw/tusb6010.c |2 +-
 hw/tusb6010.h |   10 ++
 4 files changed, 12 insertions(+), 8 deletions(-)
 create mode 100644 hw/tusb6010.h

diff --git a/hw/devices.h b/hw/devices.h
index c788373..07fda83 100644
--- a/hw/devices.h
+++ b/hw/devices.h
@@ -47,13 +47,6 @@ void *tahvo_init(qemu_irq irq, int betty);
 
 void retu_key_event(void *retu, int state);
 
-/* tusb6010.c */
-typedef struct TUSBState TUSBState;
-TUSBState *tusb6010_init(qemu_irq intr);
-int tusb6010_sync_io(TUSBState *s);
-int tusb6010_async_io(TUSBState *s);
-void tusb6010_power(TUSBState *s, int on);
-
 /* tc6393xb.c */
 typedef struct TC6393xbState TC6393xbState;
 #define TC6393XB_RAM   0x11 /* amount of ram for Video and USB */
diff --git a/hw/nseries.c b/hw/nseries.c
index 6a5575e..5521f28 100644
--- a/hw/nseries.c
+++ b/hw/nseries.c
@@ -32,6 +32,7 @@
 #include "bt.h"
 #include "loader.h"
 #include "blockdev.h"
+#include "tusb6010.h"
 
 /* Nokia N8x0 support */
 struct n800_s {
diff --git a/hw/tusb6010.c b/hw/tusb6010.c
index ccd01ad..add748c 100644
--- a/hw/tusb6010.c
+++ b/hw/tusb6010.c
@@ -23,7 +23,7 @@
 #include "usb.h"
 #include "omap.h"
 #include "irq.h"
-#include "devices.h"
+#include "tusb6010.h"
 
 struct TUSBState {
 int iomemtype[2];
diff --git a/hw/tusb6010.h b/hw/tusb6010.h
new file mode 100644
index 000..6faa94d
--- /dev/null
+++ b/hw/tusb6010.h
@@ -0,0 +1,10 @@
+#ifndef TUSB6010_H
+#define TUSB6010_H
+
+typedef struct TUSBState TUSBState;
+TUSBState *tusb6010_init(qemu_irq intr);
+int tusb6010_sync_io(TUSBState *s);
+int tusb6010_async_io(TUSBState *s);
+void tusb6010_power(TUSBState *s, int on);
+
+#endif
-- 
1.7.5.3

[PATCH 15/24] sysbus: add a variant of sysbus_init_mmio_cb with an unmap callback

2011-08-08 Thread Avi Kivity

sysbus_init_mmio_cb() uses the destructive IO_MEM_UNASSIGNED to remove a
region.  Provide an alternative that calls an unmap callback, so the removal
may be done non-destructively.

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/sysbus.c |   15 +++
 hw/sysbus.h |3 +++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/hw/sysbus.c b/hw/sysbus.c
index ea442ac..64749e0 100644
--- a/hw/sysbus.c
+++ b/hw/sysbus.c
@@ -53,6 +53,8 @@ void sysbus_mmio_map(SysBusDevice *dev, int n, target_phys_addr_t addr)
         if (dev->mmio[n].memory) {
             memory_region_del_subregion(get_system_memory(),
                                         dev->mmio[n].memory);
+        } else if (dev->mmio[n].unmap) {
+            dev->mmio[n].unmap(dev, dev->mmio[n].addr);
         } else {
             cpu_register_physical_memory(dev->mmio[n].addr, dev->mmio[n].size,
                                          IO_MEM_UNASSIGNED);
@@ -117,6 +119,19 @@ void sysbus_init_mmio_cb(SysBusDevice *dev, target_phys_addr_t size,
     dev->mmio[n].cb = cb;
 }
 
+void sysbus_init_mmio_cb2(SysBusDevice *dev,
+                          mmio_mapfunc cb, mmio_mapfunc unmap)
+{
+    int n;
+
+    assert(dev->num_mmio < QDEV_MAX_MMIO);
+    n = dev->num_mmio++;
+    dev->mmio[n].addr = -1;
+    dev->mmio[n].size = 0;
+    dev->mmio[n].cb = cb;
+    dev->mmio[n].unmap = unmap;
+}
+
 void sysbus_init_mmio_region(SysBusDevice *dev, MemoryRegion *memory)
 {
     int n;
diff --git a/hw/sysbus.h b/hw/sysbus.h
index 5f62e2d..16fd969 100644
--- a/hw/sysbus.h
+++ b/hw/sysbus.h
@@ -23,6 +23,7 @@ struct SysBusDevice {
         target_phys_addr_t addr;
         target_phys_addr_t size;
         mmio_mapfunc cb;
+        mmio_mapfunc unmap;
         ram_addr_t iofunc;
         MemoryRegion *memory;
     } mmio[QDEV_MAX_MMIO];
@@ -48,6 +49,8 @@ void sysbus_init_mmio(SysBusDevice *dev, target_phys_addr_t size,
                       ram_addr_t iofunc);
 void sysbus_init_mmio_cb(SysBusDevice *dev, target_phys_addr_t size,
                             mmio_mapfunc cb);
+void sysbus_init_mmio_cb2(SysBusDevice *dev,
+                          mmio_mapfunc cb, mmio_mapfunc unmap);
 void sysbus_init_mmio_region(SysBusDevice *dev, MemoryRegion *memory);
 void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p);
 void sysbus_pass_irq(SysBusDevice *dev, SysBusDevice *target);
-- 
1.7.5.3
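
The map/unmap pairing described in the commit message above can be sketched in isolation as follows: when a device is remapped, its own unmap callback undoes the previous mapping before the map callback runs at the new address. Dev, dev_init_mmio_cb2, dev_mmio_map and the counting callbacks are hypothetical names for this sketch, not the sysbus API, and it models a single MMIO slot only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_UNMAPPED UINT64_MAX   /* stand-in for (target_phys_addr_t)-1 */

typedef struct Dev Dev;
typedef void (*mapfunc)(Dev *dev, uint64_t base);

struct Dev {
    uint64_t addr;            /* current mapping, or ADDR_UNMAPPED */
    mapfunc cb;               /* called to establish a mapping */
    mapfunc unmap;            /* called to undo the previous mapping */
    int maps;                 /* bookkeeping for the example only */
    int unmaps;
    uint64_t last_unmap_base;
};

/* Register a map callback together with its matching unmap callback. */
void dev_init_mmio_cb2(Dev *dev, mapfunc cb, mapfunc unmap)
{
    dev->addr = ADDR_UNMAPPED;
    dev->cb = cb;
    dev->unmap = unmap;
    dev->maps = 0;
    dev->unmaps = 0;
    dev->last_unmap_base = 0;
}

/* Move the device to `addr`, undoing any earlier mapping first. */
void dev_mmio_map(Dev *dev, uint64_t addr)
{
    if (dev->addr == addr) {
        return;                          /* already mapped here */
    }
    if (dev->addr != ADDR_UNMAPPED && dev->unmap) {
        dev->unmap(dev, dev->addr);      /* device removes its own regions */
    }
    dev->addr = addr;
    dev->cb(dev, addr);
}

/* Example callbacks that only count invocations. */
void count_map(Dev *dev, uint64_t base)
{
    (void)base;
    dev->maps++;
}

void count_unmap(Dev *dev, uint64_t base)
{
    dev->unmaps++;
    dev->last_unmap_base = base;
}
```

Because the device removes exactly what it added, nothing else living at the old address is clobbered, which is the non-destructive property the patch is after.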

[PATCH 03/24] arm_gic: convert to memory API

2011-08-08 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/arm_gic.c  |   22 --
 hw/armv7m_nvic.c  |3 ++-
 hw/mpcore.c   |   37 +
 hw/realview_gic.c |   38 +-
 4 files changed, 44 insertions(+), 56 deletions(-)

diff --git a/hw/arm_gic.c b/hw/arm_gic.c
index fb07314..83213dd 100644
--- a/hw/arm_gic.c
+++ b/hw/arm_gic.c
@@ -104,7 +104,7 @@ typedef struct gic_state
 int num_cpu;
 #endif
 
-int iomemtype;
+MemoryRegion iomem;
 } gic_state;
 
 /* TODO: Many places that call this routine could be optimized.  */
@@ -567,16 +567,12 @@ static void gic_dist_writel(void *opaque, 
target_phys_addr_t offset,
 gic_dist_writew(opaque, offset + 2, value  16);
 }
 
-static CPUReadMemoryFunc * const gic_dist_readfn[] = {
-   gic_dist_readb,
-   gic_dist_readw,
-   gic_dist_readl
-};
-
-static CPUWriteMemoryFunc * const gic_dist_writefn[] = {
-   gic_dist_writeb,
-   gic_dist_writew,
-   gic_dist_writel
+static const MemoryRegionOps gic_dist_ops = {
+.old_mmio = {
+.read = { gic_dist_readb, gic_dist_readw, gic_dist_readl, },
+.write = { gic_dist_writeb, gic_dist_writew, gic_dist_writel, },
+},
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 #ifndef NVIC
@@ -741,9 +737,7 @@ static void gic_init(gic_state *s)
     for (i = 0; i < NUM_CPU(s); i++) {
         sysbus_init_irq(&s->busdev, &s->parent_irq[i]);
     }
-    s->iomemtype = cpu_register_io_memory(gic_dist_readfn,
-                                          gic_dist_writefn, s,
-                                          DEVICE_NATIVE_ENDIAN);
+    memory_region_init_io(&s->iomem, &gic_dist_ops, s, "gic_dist", 0x1000);
     gic_reset(s);
     register_savevm(NULL, "arm_gic", -1, 1, gic_save, gic_load, s);
 }
diff --git a/hw/armv7m_nvic.c b/hw/armv7m_nvic.c
index 1df8d4d..bf8c3c5 100644
--- a/hw/armv7m_nvic.c
+++ b/hw/armv7m_nvic.c
@@ -13,6 +13,7 @@
 #include sysbus.h
 #include qemu-timer.h
 #include arm-misc.h
+#include exec-memory.h
 
 /* 32 internal lines (16 used for system exceptions) plus 64 external
interrupt lines.  */
@@ -384,7 +385,7 @@ static int armv7m_nvic_init(SysBusDevice *dev)
 nvic_state *s= FROM_SYSBUSGIC(nvic_state, dev);
 
 gic_init(s-gic);
-cpu_register_physical_memory(0xe000e000, 0x1000, s-gic.iomemtype);
+memory_region_add_subregion(get_system_memory(), 0xe000e000, 
s-gic.iomem);
 s-systick.timer = qemu_new_timer_ns(vm_clock, systick_timer_tick, s);
 vmstate_register(dev-qdev, -1, vmstate_nvic, s);
 return 0;
diff --git a/hw/mpcore.c b/hw/mpcore.c
index d778507..d6175cf 100644
--- a/hw/mpcore.c
+++ b/hw/mpcore.c
@@ -40,6 +40,8 @@ typedef struct mpcore_priv_state {
 int iomemtype;
 mpcore_timer_state timer[8];
 uint32_t num_cpu;
+MemoryRegion iomem;
+MemoryRegion container;
 } mpcore_priv_state;
 
 /* Per-CPU Timers.  */
@@ -151,7 +153,8 @@ static void mpcore_timer_init(mpcore_priv_state *mpcore,
 
 /* Per-CPU private memory mapped IO.  */
 
-static uint32_t mpcore_priv_read(void *opaque, target_phys_addr_t offset)
+static uint64_t mpcore_priv_read(void *opaque, target_phys_addr_t offset,
+ unsigned size)
 {
 mpcore_priv_state *s = (mpcore_priv_state *)opaque;
 int id;
@@ -203,7 +206,7 @@ bad_reg:
 }
 
 static void mpcore_priv_write(void *opaque, target_phys_addr_t offset,
-  uint32_t value)
+  uint64_t value, unsigned size)
 {
 mpcore_priv_state *s = (mpcore_priv_state *)opaque;
 int id;
@@ -250,23 +253,19 @@ bad_reg:
 hw_error(mpcore_priv_read: Bad offset %x\n, (int)offset);
 }
 
-static CPUReadMemoryFunc * const mpcore_priv_readfn[] = {
-   mpcore_priv_read,
-   mpcore_priv_read,
-   mpcore_priv_read
+static const MemoryRegionOps mpcore_priv_ops = {
+.read = mpcore_priv_read,
+.write = mpcore_priv_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static CPUWriteMemoryFunc * const mpcore_priv_writefn[] = {
-   mpcore_priv_write,
-   mpcore_priv_write,
-   mpcore_priv_write
-};
-
-static void mpcore_priv_map(SysBusDevice *dev, target_phys_addr_t base)
+static void mpcore_priv_map_setup(mpcore_priv_state *s)
 {
-mpcore_priv_state *s = FROM_SYSBUSGIC(mpcore_priv_state, dev);
-cpu_register_physical_memory(base, 0x1000, s-iomemtype);
-cpu_register_physical_memory(base + 0x1000, 0x1000, s-gic.iomemtype);
+memory_region_init(s-container, mpcode-priv-container, 0x2000);
+memory_region_init_io(s-iomem, mpcore_priv_ops, s, mpcode-priv,
+  0x1000);
+memory_region_add_subregion(s-container, 0, s-iomem);
+memory_region_add_subregion(s-container, 0x1000, s-gic.iomem);
 }
 
 static int mpcore_priv_init(SysBusDevice *dev)
@@ -275,10 +274,8 @@ static int mpcore_priv_init(SysBusDevice *dev)
 int i;
 
 gic_init(s-gic, s-num_cpu);
-s-iomemtype = cpu_register_io_memory(mpcore_priv_readfn,
-

[PATCH 14/24] stellaris_enet: convert to memory API

2011-08-08 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/stellaris_enet.c |   29 -
 1 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/hw/stellaris_enet.c b/hw/stellaris_enet.c
index 1291931..9f1f37a 100644
--- a/hw/stellaris_enet.c
+++ b/hw/stellaris_enet.c
@@ -69,7 +69,7 @@ typedef struct {
 NICState *nic;
 NICConf conf;
 qemu_irq irq;
-int mmio_index;
+MemoryRegion mmio;
 } stellaris_enet_state;
 
 static void stellaris_enet_update(stellaris_enet_state *s)
@@ -130,7 +130,8 @@ static int stellaris_enet_can_receive(VLANClientState *nc)
 return (s-np  31);
 }
 
-static uint32_t stellaris_enet_read(void *opaque, target_phys_addr_t offset)
+static uint64_t stellaris_enet_read(void *opaque, target_phys_addr_t offset,
+unsigned size)
 {
 stellaris_enet_state *s = (stellaris_enet_state *)opaque;
 uint32_t val;
@@ -198,7 +199,7 @@ static uint32_t stellaris_enet_read(void *opaque, 
target_phys_addr_t offset)
 }
 
 static void stellaris_enet_write(void *opaque, target_phys_addr_t offset,
-uint32_t value)
+ uint64_t value, unsigned size)
 {
 stellaris_enet_state *s = (stellaris_enet_state *)opaque;
 
@@ -303,17 +304,12 @@ static void stellaris_enet_write(void *opaque, 
target_phys_addr_t offset,
 }
 }
 
-static CPUReadMemoryFunc * const stellaris_enet_readfn[] = {
-   stellaris_enet_read,
-   stellaris_enet_read,
-   stellaris_enet_read
+static const MemoryRegionOps stellaris_enet_ops = {
+.read = stellaris_enet_read,
+.write = stellaris_enet_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static CPUWriteMemoryFunc * const stellaris_enet_writefn[] = {
-   stellaris_enet_write,
-   stellaris_enet_write,
-   stellaris_enet_write
-};
 static void stellaris_enet_reset(stellaris_enet_state *s)
 {
 s-mdv = 0x80;
@@ -391,7 +387,7 @@ static void stellaris_enet_cleanup(VLANClientState *nc)
 
     unregister_savevm(&s->busdev.qdev, "stellaris_enet", s);
 
-    cpu_unregister_io_memory(s->mmio_index);
+    memory_region_destroy(&s->mmio);
 
     qemu_free(s);
 }
@@ -408,10 +404,9 @@ static int stellaris_enet_init(SysBusDevice *dev)
 {
     stellaris_enet_state *s = FROM_SYSBUS(stellaris_enet_state, dev);
 
-    s->mmio_index = cpu_register_io_memory(stellaris_enet_readfn,
-                                           stellaris_enet_writefn, s,
-                                           DEVICE_NATIVE_ENDIAN);
-    sysbus_init_mmio(dev, 0x1000, s->mmio_index);
+    memory_region_init_io(&s->mmio, &stellaris_enet_ops, s, "stellaris_enet",
+                          0x1000);
+    sysbus_init_mmio_region(dev, &s->mmio);
     sysbus_init_irq(dev, &s->irq);
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
-- 
1.7.5.3

[PATCH 04/24] arm_sysctl: convert to memory API

2011-08-08 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/arm_sysctl.c |   27 ++-
 1 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/hw/arm_sysctl.c b/hw/arm_sysctl.c
index fd0c8bc..1838401 100644
--- a/hw/arm_sysctl.c
+++ b/hw/arm_sysctl.c
@@ -17,6 +17,7 @@
 
 typedef struct {
 SysBusDevice busdev;
+MemoryRegion iomem;
 uint32_t sys_id;
 uint32_t leds;
 uint16_t lockval;
@@ -80,7 +81,8 @@ static void arm_sysctl_reset(DeviceState *d)
 s-resetlevel = 0;
 }
 
-static uint32_t arm_sysctl_read(void *opaque, target_phys_addr_t offset)
+static uint64_t arm_sysctl_read(void *opaque, target_phys_addr_t offset,
+unsigned size)
 {
 arm_sysctl_state *s = (arm_sysctl_state *)opaque;
 
@@ -177,7 +179,7 @@ static uint32_t arm_sysctl_read(void *opaque, 
target_phys_addr_t offset)
 }
 
 static void arm_sysctl_write(void *opaque, target_phys_addr_t offset,
-  uint32_t val)
+ uint64_t val, unsigned size)
 {
 arm_sysctl_state *s = (arm_sysctl_state *)opaque;
 
@@ -284,16 +286,10 @@ static void arm_sysctl_write(void *opaque, 
target_phys_addr_t offset,
 }
 }
 
-static CPUReadMemoryFunc * const arm_sysctl_readfn[] = {
-   arm_sysctl_read,
-   arm_sysctl_read,
-   arm_sysctl_read
-};
-
-static CPUWriteMemoryFunc * const arm_sysctl_writefn[] = {
-   arm_sysctl_write,
-   arm_sysctl_write,
-   arm_sysctl_write
+static const MemoryRegionOps arm_sysctl_ops = {
+.read = arm_sysctl_read,
+.write = arm_sysctl_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static void arm_sysctl_gpio_set(void *opaque, int line, int level)
@@ -327,12 +323,9 @@ static void arm_sysctl_gpio_set(void *opaque, int line, int level)
 static int arm_sysctl_init1(SysBusDevice *dev)
 {
     arm_sysctl_state *s = FROM_SYSBUS(arm_sysctl_state, dev);
-    int iomemtype;
 
-    iomemtype = cpu_register_io_memory(arm_sysctl_readfn,
-                                       arm_sysctl_writefn, s,
-                                       DEVICE_NATIVE_ENDIAN);
-    sysbus_init_mmio(dev, 0x1000, iomemtype);
+    memory_region_init_io(&s->iomem, &arm_sysctl_ops, s, "arm-sysctl", 0x1000);
+    sysbus_init_mmio_region(dev, &s->iomem);
     qdev_init_gpio_in(&s->busdev.qdev, arm_sysctl_gpio_set, 2);
     /* ??? Save/restore.  */
     return 0;
-- 
1.7.5.3

[PATCH 18/24] versatile_pci: convert to memory API

2011-08-08 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/versatile_pci.c |   94 ---
 1 files changed, 44 insertions(+), 50 deletions(-)

diff --git a/hw/versatile_pci.c b/hw/versatile_pci.c
index e1d5c0b..43edf77 100644
--- a/hw/versatile_pci.c
+++ b/hw/versatile_pci.c
@@ -16,7 +16,9 @@ typedef struct {
 SysBusDevice busdev;
 qemu_irq irq[4];
 int realview;
-int mem_config;
+MemoryRegion mem_config;
+MemoryRegion mem_config2;
+MemoryRegion isa;
 } PCIVPBState;
 
 static inline uint32_t vpb_pci_config_addr(target_phys_addr_t addr)
@@ -24,55 +26,24 @@ static inline uint32_t 
vpb_pci_config_addr(target_phys_addr_t addr)
 return addr  0xff;
 }
 
-static void pci_vpb_config_writeb (void *opaque, target_phys_addr_t addr,
-   uint32_t val)
+static void pci_vpb_config_write(void *opaque, target_phys_addr_t addr,
+ uint64_t val, unsigned size)
 {
-pci_data_write(opaque, vpb_pci_config_addr (addr), val, 1);
+pci_data_write(opaque, vpb_pci_config_addr(addr), val, size);
 }
 
-static void pci_vpb_config_writew (void *opaque, target_phys_addr_t addr,
-   uint32_t val)
-{
-pci_data_write(opaque, vpb_pci_config_addr (addr), val, 2);
-}
-
-static void pci_vpb_config_writel (void *opaque, target_phys_addr_t addr,
-   uint32_t val)
-{
-pci_data_write(opaque, vpb_pci_config_addr (addr), val, 4);
-}
-
-static uint32_t pci_vpb_config_readb (void *opaque, target_phys_addr_t addr)
+static uint64_t pci_vpb_config_read(void *opaque, target_phys_addr_t addr,
+unsigned size)
 {
 uint32_t val;
-val = pci_data_read(opaque, vpb_pci_config_addr (addr), 1);
-return val;
+    val = pci_data_read(opaque, vpb_pci_config_addr(addr), size);
+    return val;
 }
 
-static uint32_t pci_vpb_config_readw (void *opaque, target_phys_addr_t addr)
-{
-uint32_t val;
-val = pci_data_read(opaque, vpb_pci_config_addr (addr), 2);
-return val;
-}
-
-static uint32_t pci_vpb_config_readl (void *opaque, target_phys_addr_t addr)
-{
-uint32_t val;
-val = pci_data_read(opaque, vpb_pci_config_addr (addr), 4);
-return val;
-}
-
-static CPUWriteMemoryFunc * const pci_vpb_config_write[] = {
-pci_vpb_config_writeb,
-pci_vpb_config_writew,
-pci_vpb_config_writel,
-};
-
-static CPUReadMemoryFunc * const pci_vpb_config_read[] = {
-pci_vpb_config_readb,
-pci_vpb_config_readw,
-pci_vpb_config_readl,
+static const MemoryRegionOps pci_vpb_config_ops = {
+.read = pci_vpb_config_read,
+.write = pci_vpb_config_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static int pci_vpb_map_irq(PCIDevice *d, int irq_num)
@@ -87,17 +58,35 @@ static void pci_vpb_set_irq(void *opaque, int irq_num, int 
level)
 qemu_set_irq(pic[irq_num], level);
 }
 
+
 static void pci_vpb_map(SysBusDevice *dev, target_phys_addr_t base)
 {
 PCIVPBState *s = (PCIVPBState *)dev;
 /* Selfconfig area.  */
-cpu_register_physical_memory(base + 0x0100, 0x100, s-mem_config);
+memory_region_add_subregion(get_system_memory(), base + 0x0100,
+s-mem_config);
 /* Normal config area.  */
-cpu_register_physical_memory(base + 0x0200, 0x100, s-mem_config);
+memory_region_add_subregion(get_system_memory(), base + 0x0200,
+s-mem_config2);
 
 if (s-realview) {
 /* IO memory area.  */
-isa_mmio_init(base + 0x0300, 0x0010);
+memory_region_add_subregion(get_system_memory(), base + 0x0300,
+s-isa);
+}
+}
+
+static void pci_vpb_unmap(SysBusDevice *dev, target_phys_addr_t base)
+{
+PCIVPBState *s = (PCIVPBState *)dev;
+/* Selfconfig area.  */
+memory_region_del_subregion(get_system_memory(), s-mem_config);
+/* Normal config area.  */
+memory_region_del_subregion(get_system_memory(), s-mem_config2);
+
+if (s-realview) {
+/* IO memory area.  */
+memory_region_del_subregion(get_system_memory(), s-isa);
 }
 }
 
@@ -117,10 +106,15 @@ static int pci_vpb_init(SysBusDevice *dev)
 
 /* ??? Register memory space.  */
 
-s->mem_config = cpu_register_io_memory(pci_vpb_config_read,
-   pci_vpb_config_write, bus,
-   DEVICE_LITTLE_ENDIAN);
-sysbus_init_mmio_cb(dev, 0x0400, pci_vpb_map);
+memory_region_init_io(&s->mem_config, &pci_vpb_config_ops, bus,
+  "pci-vpb-selfconfig", 0x100);
+memory_region_init_io(&s->mem_config2, &pci_vpb_config_ops, bus,
+  "pci-vpb-config", 0x100);
+if (s->realview) {
+isa_mmio_setup(&s->isa, 0x010);
+}
+
+sysbus_init_mmio_cb2(dev, pci_vpb_map, pci_vpb_unmap);

[PATCH 07/24] gt64xxx.c: convert to memory API

2011-08-08 Thread Avi Kivity

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/gt64xxx.c |   36 +++-
 1 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/hw/gt64xxx.c b/hw/gt64xxx.c
index d541558..6af9782 100644
--- a/hw/gt64xxx.c
+++ b/hw/gt64xxx.c
@@ -227,7 +227,7 @@
 #define PCI_MAPPING_ENTRY(regname)\
 target_phys_addr_t regname ##_start;  \
 target_phys_addr_t regname ##_length; \
-int regname ##_handle
+MemoryRegion regname ##_mem
 
 typedef struct GT64120State {
 SysBusDevice busdev;
@@ -269,9 +269,9 @@ static void gt64120_isd_mapping(GT64120State *s)
 target_phys_addr_t start = s->regs[GT_ISD] << 21;
 target_phys_addr_t length = 0x1000;
 
-if (s->ISD_length)
-cpu_register_physical_memory(s->ISD_start, s->ISD_length,
- IO_MEM_UNASSIGNED);
+if (s->ISD_length) {
+memory_region_del_subregion(get_system_memory(), &s->ISD_mem);
+}
 check_reserved_space(start, length);
 length = 0x1000;
 /* Map new address */
@@ -279,7 +279,7 @@ static void gt64120_isd_mapping(GT64120State *s)
 length, start, s->ISD_handle);
 s->ISD_start = start;
 s->ISD_length = length;
-cpu_register_physical_memory(s->ISD_start, s->ISD_length, s->ISD_handle);
+memory_region_add_subregion(get_system_memory(), s->ISD_start, &s->ISD_mem);
 }
 
 static void gt64120_pci_mapping(GT64120State *s)
@@ -290,7 +290,8 @@ static void gt64120_pci_mapping(GT64120State *s)
   /* Unmap old IO address */
   if (s->PCI0IO_length)
   {
-cpu_register_physical_memory(s->PCI0IO_start, s->PCI0IO_length, IO_MEM_UNASSIGNED);
+  memory_region_del_subregion(get_system_memory(), &s->PCI0IO_mem);
+  memory_region_destroy(&s->PCI0IO_mem);
   }
   /* Map new IO address */
   s->PCI0IO_start = s->regs[GT_PCI0IOLD] << 21;
@@ -301,7 +302,7 @@ static void gt64120_pci_mapping(GT64120State *s)
 }
 
 static void gt64120_writel (void *opaque, target_phys_addr_t addr,
-uint32_t val)
+uint64_t val, unsigned size)
 {
 GT64120State *s = opaque;
 uint32_t saddr;
@@ -579,8 +580,8 @@ static void gt64120_writel (void *opaque, target_phys_addr_t addr,
 }
 }
 
-static uint32_t gt64120_readl (void *opaque,
-   target_phys_addr_t addr)
+static uint64_t gt64120_readl (void *opaque,
+   target_phys_addr_t addr, unsigned size)
 {
 GT64120State *s = opaque;
 uint32_t val;
@@ -851,16 +852,10 @@ static uint32_t gt64120_readl (void *opaque,
 return val;
 }
 
-static CPUWriteMemoryFunc * const gt64120_write[] = {
-gt64120_writel,
-gt64120_writel,
-gt64120_writel,
-};
-
-static CPUReadMemoryFunc * const gt64120_read[] = {
-gt64120_readl,
-gt64120_readl,
-gt64120_readl,
+static const MemoryRegionOps isd_mem_ops = {
+.read = gt64120_readl,
+.write = gt64120_writel,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static int gt64120_pci_map_irq(PCIDevice *pci_dev, int irq_num)
@@ -1097,8 +1092,7 @@ PCIBus *gt64120_register(qemu_irq *pic)
   get_system_memory(),
   get_system_io(),
   PCI_DEVFN(18, 0), 4);
-d->ISD_handle = cpu_register_io_memory(gt64120_read, gt64120_write, d,
-   DEVICE_NATIVE_ENDIAN);
+memory_region_init_io(&d->ISD_mem, &isd_mem_ops, d, "isd-mem", 0x1000);
 
 pci_create_simple(d->pci.bus, PCI_DEVFN(0, 0), "gt64120_pci");
 return d->pci.bus;
-- 
1.7.5.3


[PATCH 09/24] omap_gpmc/nseries/tusb6010: convert to memory API

2011-08-08 Thread Avi Kivity

Somewhat clumsy since it needs a variable sized region.
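
To make the "variable sized region" remark concrete: the diff below wraps the attached device region in a container whose size is only known when the chip-select is mapped, so the container acts as a window that clips the device. The following is a minimal toy model of that idea, not QEMU code; `ToyRegion`, `toy_region_init`, `toy_add_subregion`, and `toy_resolve` are hypothetical names invented for this sketch and do not exist in the QEMU memory API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a memory region. NOT the QEMU MemoryRegion type. */
typedef struct ToyRegion {
    const char *name;
    uint64_t size;            /* number of bytes this region covers */
    uint64_t offset;          /* offset inside the parent region */
    struct ToyRegion *child;  /* at most one child, to keep the sketch short */
} ToyRegion;

static void toy_region_init(ToyRegion *r, const char *name, uint64_t size)
{
    r->name = name;
    r->size = size;
    r->offset = 0;
    r->child = NULL;
}

static void toy_add_subregion(ToyRegion *parent, uint64_t offset,
                              ToyRegion *child)
{
    assert(parent->child == NULL);  /* this toy supports a single child only */
    child->offset = offset;
    parent->child = child;
}

/*
 * Return the innermost region covering addr (relative to the start of r),
 * or NULL if addr lies outside r. A parent clips its child: an address
 * beyond the parent's size never reaches the child.
 */
static const ToyRegion *toy_resolve(const ToyRegion *r, uint64_t addr)
{
    if (addr >= r->size) {
        return NULL;
    }
    const ToyRegion *c = r->child;
    if (c != NULL && addr >= c->offset) {
        const ToyRegion *hit = toy_resolve(c, addr - c->offset);
        if (hit != NULL) {
            return hit;
        }
    }
    return r;
}
```

In this model, a device region of fixed size 0x1000 placed at offset 0 of a 0x400-byte container, itself mapped at 0x2000, is reachable only through 0x2000..0x23ff; changing the window size means rebuilding the container, which is presumably what makes the real conversion feel clumsy.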

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/omap.h  |3 ++-
 hw/omap_gpmc.c |   53 +
 hw/tusb6010.c  |   30 +-
 hw/tusb6010.h  |7 +--
 4 files changed, 49 insertions(+), 44 deletions(-)

diff --git a/hw/omap.h b/hw/omap.h
index a064353..c2fe54c 100644
--- a/hw/omap.h
+++ b/hw/omap.h
@@ -17,6 +17,7 @@
  * with this program; if not, see http://www.gnu.org/licenses/.
  */
 #ifndef hw_omap_h
+#include "memory.h"
 # define hw_omap_h "omap.h"
 
 # define OMAP_EMIFS_BASE   0x
@@ -119,7 +120,7 @@ void omap_sdrc_reset(struct omap_sdrc_s *s);
 struct omap_gpmc_s;
 struct omap_gpmc_s *omap_gpmc_init(target_phys_addr_t base, qemu_irq irq);
 void omap_gpmc_reset(struct omap_gpmc_s *s);
-void omap_gpmc_attach(struct omap_gpmc_s *s, int cs, int iomemtype,
+void omap_gpmc_attach(struct omap_gpmc_s *s, int cs, MemoryRegion *iomem,
 void (*base_upd)(void *opaque, target_phys_addr_t new),
 void (*unmap)(void *opaque), void *opaque);
 
diff --git a/hw/omap_gpmc.c b/hw/omap_gpmc.c
index 8bf3343..4901aba 100644
--- a/hw/omap_gpmc.c
+++ b/hw/omap_gpmc.c
@@ -21,10 +21,13 @@
 #include "hw.h"
 #include "flash.h"
 #include "omap.h"
+#include "memory.h"
+#include "exec-memory.h"
 
 /* General-Purpose Memory Controller */
 struct omap_gpmc_s {
 qemu_irq irq;
+MemoryRegion iomem;
 
 uint8_t sysconfig;
 uint16_t irqst;
@@ -39,7 +42,8 @@ struct omap_gpmc_s {
 uint32_t config[7];
 target_phys_addr_t base;
 size_t size;
-int iomemtype;
+MemoryRegion *iomem;
+MemoryRegion container;
 void (*base_update)(void *opaque, target_phys_addr_t new);
 void (*unmap)(void *opaque);
 void *opaque;
@@ -75,8 +79,12 @@ static void omap_gpmc_cs_map(struct omap_gpmc_cs_file_s *f, int base, int mask)
  * constant), the mask should cause wrapping of the address space, so
  * that the same memory becomes accessible at every <i>size</i> bytes
  * starting from <i>base</i>.  */
-if (f->iomemtype)
-cpu_register_physical_memory(f->base, f->size, f->iomemtype);
+if (f->iomem) {
+memory_region_init(&f->container, "omap-gpmc-file", f->size);
+memory_region_add_subregion(&f->container, 0, f->iomem);
+memory_region_add_subregion(get_system_memory(), f->base,
+&f->container);
+}
 
 if (f->base_update)
 f->base_update(f->opaque, f->base);
@@ -87,8 +95,11 @@ static void omap_gpmc_cs_unmap(struct omap_gpmc_cs_file_s *f)
 if (f->size) {
 if (f->unmap)
 f->unmap(f->opaque);
-if (f->iomemtype)
-cpu_register_physical_memory(f->base, f->size, IO_MEM_UNASSIGNED);
+if (f->iomem) {
+memory_region_del_subregion(get_system_memory(), &f->container);
+memory_region_del_subregion(&f->container, f->iomem);
+memory_region_destroy(&f->container);
+}
 f->base = 0;
 f->size = 0;
 }
@@ -132,7 +143,8 @@ void omap_gpmc_reset(struct omap_gpmc_s *s)
 ecc_reset(&s->ecc[i]);
 }
 
-static uint32_t omap_gpmc_read(void *opaque, target_phys_addr_t addr)
+static uint64_t omap_gpmc_read(void *opaque, target_phys_addr_t addr,
+   unsigned size)
 {
 struct omap_gpmc_s *s = (struct omap_gpmc_s *) opaque;
 int cs;
@@ -230,7 +242,7 @@ static uint32_t omap_gpmc_read(void *opaque, target_phys_addr_t addr)
 }
 
 static void omap_gpmc_write(void *opaque, target_phys_addr_t addr,
-uint32_t value)
+uint64_t value, unsigned size)
 {
 struct omap_gpmc_s *s = (struct omap_gpmc_s *) opaque;
 int cs;
@@ -249,7 +261,7 @@ static void omap_gpmc_write(void *opaque, target_phys_addr_t addr,
 
 case 0x010:/* GPMC_SYSCONFIG */
 if ((value >> 3) == 0x3)
-fprintf(stderr, "%s: bad SDRAM idle mode %i\n",
+fprintf(stderr, "%s: bad SDRAM idle mode %"PRIi64"\n",
 __FUNCTION__, value >> 3);
 if (value & 2)
 omap_gpmc_reset(s);
@@ -369,34 +381,27 @@ static void omap_gpmc_write(void *opaque, target_phys_addr_t addr,
 }
 }
 
-static CPUReadMemoryFunc * const omap_gpmc_readfn[] = {
-omap_badwidth_read32,  /* TODO */
-omap_badwidth_read32,  /* TODO */
-omap_gpmc_read,
-};
-
-static CPUWriteMemoryFunc * const omap_gpmc_writefn[] = {
-omap_badwidth_write32, /* TODO */
-omap_badwidth_write32, /* TODO */
-omap_gpmc_write,
+static const MemoryRegionOps omap_gpmc_ops = {
+/* TODO: specialize 4 byte writes? */
+.read = omap_gpmc_read,
+.write = omap_gpmc_write,
+.endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 struct omap_gpmc_s *omap_gpmc_init(target_phys_addr_t base, qemu_irq irq)
 {
-int iomemtype;
 struct omap_gpmc_s *s = (struct omap_gpmc_s *)
