Re: [PATCH] Permit -mem-path without sync mmu
On Fri, Aug 05, 2011 at 12:30:53PM -0300, Marcelo Tosatti wrote:
> On Fri, Aug 05, 2011 at 08:16:42AM +0200, Jan Kiszka wrote:
> > On 2011-08-05 06:02, David Gibson wrote:
> > > At present, an explicit test disallows use of -mem-path when kvm is
> > > enabled but KVM_CAP_SYNC_MMU is not set. In particular, this prevents
> > > the user from using hugetlbfs to back the guest memory.
> > >
> > > I can see no reason for this check, and when I asked about it
> > > previously, the only theory offered was that this was a limitation of
> > > the very early days of kvm which only happened to match the SYNC_MMU
> > > flag by accident.
> > >
> > > This patch, therefore, removes the check. This is of particular use
> > > to us on POWER, where we haven't yet implemented SYNC_MMU, but where
> > > backing the guest with hugepages is possible, and in fact mandatory
> > > (for now).
> > >
> > > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
> > > ---
> > >  exec.c |    5 -----
> > >  1 files changed, 0 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/exec.c b/exec.c
> > > index 476b507..041637c 100644
> > > --- a/exec.c
> > > +++ b/exec.c
> > > @@ -2818,11 +2818,6 @@ static void *file_ram_alloc(RAMBlock *block,
> > >          return NULL;
> > >      }
> > >
> > > -    if (kvm_enabled() && !kvm_has_sync_mmu()) {
> > > -        fprintf(stderr, "host lacks kvm mmu notifiers, -mem-path unsupported\n");
> > > -        return NULL;
> > > -    }
> > > -
> > >      if (asprintf(&filename, "%s/qemu_back_mem.XX", path) == -1) {
> > >          return NULL;
> > >      }
> >
> > This is nothing trivial, see ce9a92411d in qemu-kvm or
> > http://thread.gmane.org/gmane.comp.emulators.kvm.devel/27380.
> > And it should rather target uq/master.
> >
> > CCing Avi, Marcelo, and the kvm list.
> >
> > Jan

Well, sending the patch flushed out the real reason for that check, at
least, as I thought it might.

> Yes, the check cannot be removed because there is the possibility of
> corruption using hugepages without mmu notifiers (described in the
> archived message above).

Ok, so. If I understand the archived message correctly.

First, this check *is* all about hugepages - which is not obvious from
the test itself.

Second, if userspace qemu passing hugepages to kvm can cause (host)
kernel memory corruption, that is clearly a host kernel bug. So am I
correct in thinking this is basically just a safety feature if qemu is
run on a buggy kernel. Presumably this bug was corrected at some
point? Is the presence of the SYNC_MMU feature just being used as a
proxy for "is this kernel recent enough to have the corruption bug
fixed"?

In any case this test sure as hell needs a big comment next to it
explaining this context.

> Why are mmu notifiers not implemented for PPC again?

It's just not done yet; we're working on it. (That is, mmu notifiers
are certainly present on PPC, it's just they're not wired up to kvm,
yet).

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm PCI assignment VFIO ramblings
On Fri, Aug 05, 2011 at 09:10:09AM -0600, Alex Williamson wrote:
> On Fri, 2011-08-05 at 20:42 +1000, Benjamin Herrenschmidt wrote:
> > Right. In fact to try to clarify the problem for everybody, I think we
> > can distinguish two different classes of constraints that can influence
> > the grouping of devices:
> >
> >  1- Hard constraints. These are typically devices using the same RID or
> > where the RID cannot be reliably guaranteed (the latter is the case with
> > some PCIe-PCIX bridges which will take ownership of some transactions
> > such as split but not all). Devices like that must be in the same
> > domain. This is where PowerPC adds to what x86 does today the concept
> > that the domains are pre-existing, since we use the RID for error
> > isolation & MMIO segmenting as well, so we need to create those domains
> > at boot time.
> >
> >  2- Softer constraints. Those constraints derive from the fact that not
> > applying them risks enabling the guest to create side effects outside of
> > its sandbox. To some extent, there can be degrees of badness between
> > the various things that can cause such constraints. Examples are shared
> > LSIs (since trusting DisINTx can be chancy, see earlier discussions),
> > potentially any set of functions in the same device can be problematic
> > due to the possibility to get backdoor access to the BARs etc...
>
> This is what I've been trying to get to, hardware constraints vs system
> policy constraints.
>
> > Now, what I derive from the discussion we've had so far, is that we need
> > to find a proper fix for #1, but Alex and Avi seem to prefer that #2
> > remains a matter of libvirt/user doing the right thing (basically
> > keeping a loaded gun aimed at the user's foot with a very very very
> > sweet trigger but heh, let's not start a flamewar here :-)
>
> Doesn't your own uncertainty of whether or not to allow this lead to the
> same conclusion, that it belongs in userspace policy? I don't think we
> want to make white lists of which devices we trust to do DisINTx
> correctly part of the kernel interface, do we?
>
> Thanks,

Yes, but the overall point is that both the hard and soft constraints
are much easier to handle if a group or iommu domain or whatever is a
persistent entity that can be set up once-per-boot by the admin with
whatever degree of safety they want, rather than a transient entity
tied to an fd's lifetime, which must be set up correctly, every time,
by the thing establishing it.

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH] [NEW] cgroup test * general smoke_test + module dependend subtests (memory test included) * library for future use in other tests (kvm)
I'll go through this and let you know.

----- Original Message -----
From: root <r...@dhcp-26-193.brq.redhat.com>

cgroup.py:
* structure for different cgroup subtests
* contains basic cgroup-memory test

cgroup_common.py:
* library for cgroup handling (intended to be used from kvm test in
  the future)
* universal smoke_test for every module

cgroup_client.py:
* application which is executed and controlled using cgroups
* contains smoke, memory, cpu and devices tests which were manually
  tested to break cgroup rules and will be used in the cgroup.py
  subtests

Signed-off-by: Lukas Doktor <ldok...@redhat.com>
---
 client/tests/cgroup/cgroup.py        |  236 ++
 client/tests/cgroup/cgroup_client.py |  116 +
 client/tests/cgroup/control          |   12 ++
 3 files changed, 364 insertions(+), 0 deletions(-)
 create mode 100755 client/tests/cgroup/cgroup.py
 create mode 100755 client/tests/cgroup/cgroup_client.py
 create mode 100644 client/tests/cgroup/control

diff --git a/client/tests/cgroup/cgroup.py b/client/tests/cgroup/cgroup.py
new file mode 100755
index 000..d043d65
--- /dev/null
+++ b/client/tests/cgroup/cgroup.py
@@ -0,0 +1,236 @@
+from autotest_lib.client.bin import test
+from autotest_lib.client.common_lib import error
+import os, logging
+import time
+from cgroup_common import Cgroup as CG
+from cgroup_common import CgroupModules
+
+class cgroup(test.test):
+    """
+    Tests the cgroup functionalities
+    """
+    version = 1
+    _client = ""
+    modules = CgroupModules()
+
+
+    def run_once(self):
+        """
+        Try to access different resources which are restricted by cgroup.
+        """
+        logging.info('Start')
+
+        err = ""
+        # Run available tests
+        for i in ['memory']:
+            try:
+                if self.modules.get_pwd(i):
+                    if (eval ("self.test_%s()" % i)):
+                        err += "%s, " % i
+                else:
+                    logging.error("CGROUP: Skipping test_%s, module not "
+                                  "available/mounted", i)
+                    err += "%s, " % i
+            except Exception, inst:
+                logging.error("CGROUP: test_%s fatal failure: %s", i, inst)
+                err += "%s, " % i
+
+        if err:
+            raise error.TestFail('CGROUP: Some subtests failed (%s)' % err[:-2])
+
+
+    def setup(self):
+        """
+        Setup
+        """
+        logging.info('Setup')
+
+        self._client = os.path.join(self.bindir, "cgroup_client.py")
+
+        _modules = ['cpuset', 'ns', 'cpu', 'cpuacct', 'memory', 'devices',
+                    'freezer', 'net_cls', 'blkio']
+        if (self.modules.init(_modules) <= 0):
+            raise error.TestFail('Can\'t mount any cgroup modules')
+
+
+    def cleanup(self):
+        """
+        Unmount all cgroups and remove directories
+        """
+        logging.info('Cleanup')
+        self.modules.cleanup()
+
+
+    #
+    # TESTS
+    #
+    def test_memory(self):
+        """
+        Memory test
+        """
+        # Preparation
+        logging.info("Entering 'test_memory'")
+        item = CG('memory', self._client)
+        if item.initialize(self.modules):
+            logging.error("test_memory: cgroup init failed")
+            return -1
+
+        if item.smoke_test():
+            logging.error("test_memory: smoke_test failed")
+            return -1
+
+        pwd = item.mk_cgroup()
+        if pwd == None:
+            logging.error("test_memory: Can't create cgroup")
+            return -1
+
+        logging.debug("test_memory: Memory filling test")
+
+        f = open('/proc/meminfo','r')
+        mem = f.readline()
+        while not mem.startswith("MemFree"):
+            mem = f.readline()
+        # Use only 1G or max of the free memory
+        mem = min(int(mem.split()[1])/1024, 1024)
+        mem = max(mem, 100) # at least 100M
+        if (item.get_property("memory.memsw.limit_in_bytes", supress=True)
+                                                                != None):
+            memsw = True
+            # Clear swap
+            os.system("swapoff -a")
+            os.system("swapon -a")
+            f.seek(0)
+            swap = f.readline()
+            while not swap.startswith("SwapTotal"):
+                swap = f.readline()
+            swap = int(swap.split()[1])/1024
+            if swap < mem / 2:
+                logging.error("Not enough swap memory to test 'memsw'")
+                memsw = False
+        else:
+            # Doesn't support swap+memory limitation, disable swap
+            logging.info("'memsw' not supported")
+            os.system("swapoff -a")
+            memsw = False
+        logging.debug("test_memory: Initializition passed")
+
+
+        # Fill the memory without cgroup limitation
+        # Should pass
+
+        ps = item.test("memfill %d" % mem)
+        ps.stdin.write('\n')
+        i = 0
+        while ps.poll() == None:
+            if i > 60:
+                break
+            i += 1
+            time.sleep(1)
+        if i > 60:
+            logging.error("test_memory: Memory filling failed (WO cgroup)")
+            ps.terminate()
+            return -1
+        if not ps.stdout.readlines()[-1].startswith("PASS"):
+            logging.error("test_memory: Unsuccessful memory filling "
+                          "(WO cgroup)")
+            return -1
+        logging.debug("test_memory: Memfill WO cgroup passed")
+
+
+        # Fill the memory with 1/2 memory limit
+        # memsw: should swap out part of the process and pass
+        # WO memsw: should fail (SIGKILL)
+
+        ps = item.test("memfill %d" % mem)
+        if item.set_cgroup(ps.pid, pwd):
+            logging.error("test_memory:
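
For readers following the memory subtest in the patch above, the sizing logic it applies to /proc/meminfo can be sketched in isolation. This is only an illustration in modern Python, not code from the patch: the helper names `parse_meminfo`, `pick_memfill_mb` and `swap_is_sufficient` and the `SAMPLE` text are hypothetical, and the functions take the file contents as a string so they can be tried without root or a mounted cgroup hierarchy.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def pick_memfill_mb(meminfo_text, cap_mb=1024, floor_mb=100):
    """Free memory in MB, capped at cap_mb and raised to at least floor_mb."""
    free_mb = parse_meminfo(meminfo_text)["MemFree"] // 1024
    return max(min(free_mb, cap_mb), floor_mb)


def swap_is_sufficient(meminfo_text, mem_mb):
    """Mirror of the patch's rule: memsw is skipped when swap < mem / 2."""
    swap_mb = parse_meminfo(meminfo_text)["SwapTotal"] // 1024
    return swap_mb >= mem_mb // 2


# Hypothetical sample input, made up for illustration only.
SAMPLE = """MemTotal:        8000000 kB
MemFree:         4194304 kB
SwapTotal:       1048576 kB
"""

if __name__ == "__main__":
    mem = pick_memfill_mb(SAMPLE)
    print(mem, swap_is_sufficient(SAMPLE, mem))
```

Parsing the whole file once avoids the readline()/seek(0) dance in the patch, at the cost of reading fields the test never uses.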
Re: [RFC PATCH]vhost-blk: In-kernel accelerator for virtio block device
On 08/08/2011 01:04 PM, Badari Pulavarty wrote:
> On 8/7/2011 6:35 PM, Liu Yuan wrote:
> > On 08/06/2011 02:02 AM, Badari Pulavarty wrote:
> > > On 8/5/2011 4:04 AM, Liu Yuan wrote:
> > > > On 08/05/2011 05:58 AM, Badari Pulavarty wrote:
> > > > > Hi Liu Yuan,
> > > > >
> > > > > I started testing your patches. I applied your kernel patch to 3.0
> > > > > and applied QEMU to latest git.
> > > > >
> > > > > I passed 6 blockdevices from the host to guest (4 vcpu, 4GB RAM).
> > > > > I ran simple dd read tests from the guest on all block devices
> > > > > (with various blocksizes, iflag=direct).
> > > > >
> > > > > Unfortunately, system doesn't stay up. I immediately get into
> > > > > panic on the host. I didn't get time to debug the problem.
> > > > > Wondering if you have seen this issue before and/or you have new
> > > > > patchset to try ?
> > > > >
> > > > > Let me know.
> > > > >
> > > > > Thanks,
> > > > > Badari
> > > >
> > > > Okay, it is actually a bug pointed out by MST on the other thread,
> > > > that it needs a mutex for completion thread.
> > > >
> > > > Now would you please try this attachment? This patch only applies to
> > > > kernel part, on top of v1 kernel patch.
> > > >
> > > > This patch mainly moves completion thread into vhost thread as a
> > > > function. As a result, both requests submitting and completion
> > > > signalling is in the same thread.
> > > >
> > > > Yuan
> > >
> > > Unfortunately, dd tests (4 out of 6) in the guest hung. I see
> > > following messages
> > >
> > > virtio_blk virtio2: requests: id 0 is not a head !
> > > virtio_blk virtio3: requests: id 1 is not a head !
> > > virtio_blk virtio5: requests: id 1 is not a head !
> > > virtio_blk virtio1: requests: id 1 is not a head !
> > >
> > > I still see host panics. I will collect the host panic and see if
> > > its still same or not.
> > >
> > > Thanks,
> > > Badari
> >
> > Would you please show me how to reproduce it step by step? I tried dd
> > with two block device attached, but didn't get hung nor panic.
> >
> > Yuan
>
> I did 6 dds on 6 block devices..
>
> dd if=/dev/vdb of=/dev/null bs=1M iflag=direct
> dd if=/dev/vdc of=/dev/null bs=1M iflag=direct
> dd if=/dev/vdd of=/dev/null bs=1M iflag=direct
> dd if=/dev/vde of=/dev/null bs=1M iflag=direct
> dd if=/dev/vdf of=/dev/null bs=1M iflag=direct
> dd if=/dev/vdg of=/dev/null bs=1M iflag=direct
>
> I can reproduce the problem with in 3 minutes :(
>
> Thanks,
> Badari

Ah...I made an embarrassing mistake that I tried to 'free()' an
kmem_cache object.

Would you please revert the vblk-for-kernel-2 patch and apply the new
one attached in this letter?

Yuan,
Thanks

diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
index ecaf6fe..7a24aba 100644
--- a/drivers/vhost/blk.c
+++ b/drivers/vhost/blk.c
@@ -47,6 +47,7 @@ struct vhost_blk {
 	struct eventfd_ctx *ectx;
 	struct file *efile;
 	struct task_struct *worker;
+	struct vhost_poll poll;
 };
 
 struct used_info {
@@ -62,6 +63,7 @@ static struct kmem_cache *used_info_cachep;
 static void blk_flush(struct vhost_blk *blk)
 {
 	vhost_poll_flush(&blk->vq.poll);
+	vhost_poll_flush(&blk->poll);
 }
 
 static long blk_set_features(struct vhost_blk *blk, u64 features)
@@ -146,11 +148,11 @@ static long blk_reset_owner(struct vhost_blk *b)
 	blk_stop(b);
 	blk_flush(b);
 	ret = vhost_dev_reset_owner(&b->dev);
-	if (b->worker) {
-		b->should_stop = 1;
-		smp_mb();
-		eventfd_signal(b->ectx, 1);
-	}
+//	if (b->worker) {
+//		b->should_stop = 1;
+//		smp_mb();
+//		eventfd_signal(b->ectx, 1);
+//	}
 err:
 	mutex_unlock(&b->dev.mutex);
 	return ret;
@@ -361,8 +363,8 @@ static long vhost_blk_ioctl(struct file *f, unsigned int ioctl,
 	default:
 		mutex_lock(&blk->dev.mutex);
 		ret = vhost_dev_ioctl(&blk->dev, ioctl, arg);
-		if (!ret && ioctl == VHOST_SET_OWNER)
-			ret = blk_set_owner(blk);
+//		if (!ret && ioctl == VHOST_SET_OWNER)
+//			ret = blk_set_owner(blk);
 		blk_flush(blk);
 		mutex_unlock(&blk->dev.mutex);
 		break;
@@ -480,10 +482,50 @@ static void handle_guest_kick(struct vhost_work *work)
 	handle_kick(blk);
 }
 
+static void handle_completetion(struct vhost_work* work)
+{
+	struct vhost_blk *blk = container_of(work, struct vhost_blk, poll.work);
+	struct timespec ts = { 0 };
+	int ret, i, nr;
+	u64 count;
+
+	do {
+		ret = eventfd_ctx_read(blk->ectx, 1, &count);
+	} while (unlikely(ret == -ERESTARTSYS));
+
+	do {
+		nr = kernel_read_events(blk->ioctx, count, MAX_EVENTS, events, &ts);
+	} while (unlikely(nr == -EINTR));
+	dprintk("%s, count %llu, nr %d\n", __func__, count, nr);
+
+	if (unlikely(nr <= 0))
+		return;
+
+	for (i = 0; i < nr; i++) {
+		struct used_info *u = (struct used_info *)events[i].obj;
+		int len, status;
+
+		dprintk("%s, head %d complete in %d\n", __func__, u->head, i);
+		len = io_event_ret(&events[i]);
+		//status = u->len == len ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
+		status = len > 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
+		if (copy_to_user(u->status, &status, sizeof status)) {
+			vq_err(&blk->vq, "%s failed to write status\n", __func__);
+			BUG(); /* FIXME: maybe a bit radical? */
+		}
+		vhost_add_used(&blk->vq, u->head, u->len);
+		kmem_cache_free(used_info_cachep, u);
+	}
+
+	vhost_signal(&blk->dev, &blk->vq);
+}
+
 static void eventfd_setup(struct vhost_blk *blk)
 {
 	blk->efile = eventfd_file_create(0, 0);
 	blk->ectx =
Re: [PATCH] Permit -mem-path without sync mmu
On 08/08/2011 09:03 AM, David Gibson wrote:
> Second, if userspace qemu passing hugepages to kvm can cause (host)
> kernel memory corruption, that is clearly a host kernel bug. So am I
> correct in thinking this is basically just a safety feature if qemu is
> run on a buggy kernel.

Seems so, yes. 2.6.2[456] are exploitable. We only found out after
these were all released.

> Presumably this bug was corrected at some
> point? Is the presence of the SYNC_MMU feature just being used as a
> proxy for "is this kernel recent enough to have the corruption bug
> fixed"?

SYNC_MMU actually fixes the bug.

> In any case this test sure as hell needs a big comment next to it
> explaining this context.

Yes.

> > Why are mmu notifiers not implemented for PPC again?
>
> It's just not done yet; we're working on it. (That is, mmu notifiers
> are certainly present on PPC, it's just they're not wired up to kvm,
> yet).

If ppc doesn't have this issue even without SYNC_MMU, we can make the
check x86 specific.

--
error compiling committee.c: too many arguments to function
Re: kvm PCI assignment VFIO ramblings
On 08/03/2011 05:04 AM, David Gibson wrote:
> I still don't understand the distinction you're making. We're saying
> the group is "owned" by a given user or guest in the sense that no-one
> else may use anything in the group (including host drivers). At that
> point none, some or all of the devices in the group may actually be
> used by the guest.
>
> You seem to be making a distinction between "owned by" and "assigned
> to" and "used by" and I really don't see what it is.

Alex (and I) think that we should work with device/function granularity,
as is common with other archs, and that the group thing is just a
constraint on which functions may be assigned where, while you think
that we should work at group granularity, with 1-function groups for
archs which don't have constraints.

Is this an accurate way of putting it?

--
error compiling committee.c: too many arguments to function
[PATCH] Adds cgroup handling library
[new] cgroup_common.py
* library for handling cgroups

Signed-off-by: Lukas Doktor <ldok...@redhat.com>
---
 client/tests/cgroup/cgroup.py        |    5 +-
 client/tests/cgroup/cgroup_common.py |  327 ++
 2 files changed, 331 insertions(+), 1 deletions(-)
 create mode 100755 client/tests/cgroup/cgroup_common.py

diff --git a/client/tests/cgroup/cgroup.py b/client/tests/cgroup/cgroup.py
index d043d65..112f012 100755
--- a/client/tests/cgroup/cgroup.py
+++ b/client/tests/cgroup/cgroup.py
@@ -118,6 +118,7 @@ class cgroup(test.test):
         # Fill the memory without cgroup limitation
         # Should pass
 
+        logging.debug("test_memory: Memfill WO cgroup")
         ps = item.test("memfill %d" % mem)
         ps.stdin.write('\n')
         i = 0
@@ -141,6 +142,7 @@ class cgroup(test.test):
         # memsw: should swap out part of the process and pass
         # WO memsw: should fail (SIGKILL)
 
+        logging.debug("test_memory: Memfill mem only limit")
         ps = item.test("memfill %d" % mem)
         if item.set_cgroup(ps.pid, pwd):
             logging.error("test_memory: Could not set cgroup")
@@ -187,6 +189,7 @@ class cgroup(test.test):
         # Fill the memory with 1/2 memory+swap limit
         # Should fail
 
+        logging.debug("test_memory: Memfill mem + swap limit")
         if memsw:
             ps = item.test("memfill %d" % mem)
             if item.set_cgroup(ps.pid, pwd):
@@ -226,11 +229,11 @@ class cgroup(test.test):
             logging.debug("test_memory: Memfill mem+swap cgroup passed")
 
         # cleanup
+        logging.debug("test_memory: Cleanup")
         if item.rm_cgroup(pwd):
             logging.error("test_memory: Can't remove cgroup directory")
             return -1
         os.system("swapon -a")
-        logging.debug("test_memory: Cleanup passed")
 
         logging.info("Leaving 'test_memory': PASSED")
         return 0
diff --git a/client/tests/cgroup/cgroup_common.py b/client/tests/cgroup/cgroup_common.py
new file mode 100755
index 000..3fd1cf7
--- /dev/null
+++ b/client/tests/cgroup/cgroup_common.py
@@ -0,0 +1,327 @@
+#!/usr/bin/python
+# -*- coding: utf-8 -*-
+"""
+Helpers for cgroup testing
+
+@copyright: 2011 Red Hat Inc.
+@author: Lukas Doktor <ldok...@redhat.com>
+"""
+import os, logging
+import subprocess
+from tempfile import mkdtemp
+import time
+
+class Cgroup:
+    """
+    Cgroup handling class
+    """
+    def __init__(self, module, _client):
+        """
+        Constructor
+        @param module: Name of the cgroup module
+        @param _client: Test script pwd+name
+        """
+        self.module = module
+        self._client = _client
+        self.root = None
+
+
+    def initialize(self, modules):
+        """
+        Inicializes object for use
+        @param modules: array of all available cgroup modules
+        @return: 0 when PASSED
+        """
+        self.root = modules.get_pwd(self.module)
+        if self.root:
+            return 0
+        else:
+            logging.error("cg.initialize(): Module %s not found", self.module)
+            return -1
+        return 0
+
+
+    def mk_cgroup(self, root=None):
+        """
+        Creates new temporary cgroup
+        @param root: where to create this cgroup (default: self.root)
+        @return: 0 when PASSED
+        """
+        try:
+            if root:
+                pwd = mkdtemp(prefix='cgroup-', dir=root) + '/'
+            else:
+                pwd = mkdtemp(prefix='cgroup-', dir=self.root) + '/'
+        except Exception, inst:
+            logging.error("cg.mk_cgroup(): %s" , inst)
+            return None
+        return pwd
+
+
+    def rm_cgroup(self, pwd, supress=False):
+        """
+        Removes cgroup
+        @param pwd: cgroup directory
+        @param supress: supress output
+        @return: 0 when PASSED
+        """
+        try:
+            os.rmdir(pwd)
+        except Exception, inst:
+            if not supress:
+                logging.error("cg.rm_cgroup(): %s" , inst)
+            return -1
+        return 0
+
+
+    def test(self, cmd):
+        """
+        Executes cgroup_client.py with cmd parameter
+        @param cmd: command to be executed
+        @return: subprocess.Popen() process
+        """
+        logging.debug("cg.test(): executing paralel process '%s'" , cmd)
+        process = subprocess.Popen((self._client + ' ' + cmd), shell=True,
+                                   stdin=subprocess.PIPE, stdout=subprocess.PIPE,
+                                   stderr=subprocess.PIPE, close_fds=True)
+        return process
+
+
+    def is_cgroup(self, pid, pwd):
+        """
+        Checks if the 'pid' process is in 'pwd' cgroup
+        @param pid: pid of the process
+        @param pwd: cgroup directory
+        @return: 0 when is 'pwd' member
+        """
+        if open(pwd+'/tasks').readlines().count("%d\n
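
The is_cgroup() helper is cut off in this archive copy. As a rough illustration of the membership check its docstring describes (is the pid listed in the cgroup's tasks file?), here is a small standalone sketch in modern Python. It is not the patch's code: `pid_in_cgroup` and `demo` are hypothetical names, it returns a boolean rather than the 0-on-success convention used in the patch, and the demo uses a temporary directory standing in for a real cgroup directory.

```python
import os
import tempfile


def pid_in_cgroup(cgroup_dir, pid):
    """True if pid is listed in <cgroup_dir>/tasks (one PID per line)."""
    with open(os.path.join(cgroup_dir, "tasks")) as tasks:
        # Compare whole lines so that pid 12 does not match a line "1234".
        return any(line.strip() == str(pid) for line in tasks)


def demo():
    """Exercise the check against a fake tasks file in a temp directory."""
    with tempfile.TemporaryDirectory() as fake_cgroup:
        with open(os.path.join(fake_cgroup, "tasks"), "w") as f:
            f.write("101\n1234\n")
        return pid_in_cgroup(fake_cgroup, 1234), pid_in_cgroup(fake_cgroup, 12)


if __name__ == "__main__":
    print(demo())
```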
Missing cgroup_common.py
Hi,

I'm sorry for leaving cgroup_common.py out of the previous patchset; I
forgot to add it to git. You can find it in the attached patch.

Regards,
Lukáš
Re: [RFC] postcopy livemigration proposal
On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
> This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM"
> on which we'll give a talk at KVM-forum.
> The purpose of this mail is to let developers know about it in advance
> so that we can get better feedback on its design/implementation approach
> early, before we start to implement it.
>
>
> Background
> ==========
> * What is postcopy livemigration
> It is yet another live migration mechanism for Qemu/KVM, which
> implements the migration technique known as "postcopy" or "lazy"
> migration. Just after the "migrate" command is invoked, the execution
> host of a VM is instantaneously switched to a destination host.
>
> The benefit is that total migration time is shorter because it
> transfers a page only once. On the other hand, precopy may send the
> same pages again and again because they can be dirtied.
> The switching time from the source to the destination is several
> hundred milliseconds, so it enables quick load balancing.
> For details, please refer to the papers.
>
> We believe this is useful for others, so we'd like to merge this
> feature into the upstream qemu/kvm. The existing implementation that
> we have right now is very ad-hoc because it's for academic research.
> For the upstream merge, we're starting to re-design/implement it and
> we'd like to get feedback early. Although many improvements/optimizations
> are possible, we should implement/merge a simple/clean, but extensible,
> one at first and then improve/optimize it later.
>
> postcopy livemigration will be introduced as an optional feature. The
> existing precopy livemigration remains the default behavior.
>
>
> * related links:
> project page
> http://sites.google.com/site/grivonhome/quick-kvm-migration
>
> Enabling Instantaneous Relocation of Virtual Machines with a
> Lightweight VMM Extension,
> (proof-of-concept, ad-hoc prototype. not a new design)
> http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-paper.pdf
> http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-talk.pdf
>
> Reactive consolidation of virtual machines enabled by postcopy live
> migration
> (advantage for VM consolidation)
> http://portal.acm.org/citation.cfm?id=1996125
> http://www.emn.fr/x-info/ascola/lib/exe/fetch.php?media=internet:vtdc-postcopy.pdf
>
> Qemu wiki
> http://wiki.qemu.org/Features/PostCopyLiveMigration
>
>
> Design/Implementation
> =====================
> The basic idea of postcopy livemigration is to use a sort of distributed
> shared memory between the migration source and destination.
>
> The migration procedure looks like
> - start migration
>   stop the guest VM on the source and send the machine states except
>   guest RAM to the destination
> - resume the guest VM on the destination without guest RAM contents
> - Hook guest access to pages, and pull page contents from the source
>   This continues until all the pages are pulled to the destination
>
> The big picture is depicted at
> http://wiki.qemu.org/File:Postcopy-livemigration.png

That's terrific (nice video also)!
Orit and myself had the exact same idea too (now we can't patent it..).

Advantages:
- No down time due to memory copying.
- Efficient: reduces the needed traffic, no need to re-send pages.
- Reduces overall RAM consumption of the source and destination
  compared with current live migration (where both the source and the
  destination allocate the memory until the live migration
  completes). We can free copied memory once the destination guest
  has received it and save RAM.
- Increases parallelism for SMP guests: we can have multiple virtual
  CPUs handle their demand paging. Less time holding a global lock,
  less thread contention.
- Virtual machines are using more and more memory resources; for a
  virtual machine with a very large working set, doing live migration
  with reasonable down time is impossible today.

Disadvantages:
- During the live migration the guest will run slower than in
  today's live migration. We need to remember that even today
  guests suffer from a performance penalty on the source during the
  COW stage (memory copy).
- Failure of the source or destination or the network will cause
  us to lose the running virtual machine. Those failures are very
  rare.
  In case there is shared storage we can store a copy of the
  memory there, which can be recovered in case of such a failure.

Overall, it looks like a better approach for the vast majority of cases.
Hope it will get merged to kvm and become the default way.

> There are several design points.
> - who takes care of pulling page contents.
>   an independent daemon vs a thread in qemu
>   The daemon approach is preferable because an independent daemon would
>   make it easy to debug the postcopy memory mechanism without qemu.
>   If required, it wouldn't be difficult to convert a daemon into
>   a thread in qemu
>
> - connection between the source and the destination
Re: [Qemu-devel] [RFC] postcopy livemigration proposal
On Mon, Aug 8, 2011 at 4:24 AM, Isaku Yamahata <yamah...@valinux.co.jp> wrote:
> This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM"
> on which we'll give a talk at KVM-forum.

I'm curious if this approach is compatible with asynchronous page
faults? The idea there was to tell the guest about a page fault so it
can continue to do useful work in the meantime (if the fault was in
guest userspace).

Stefan
Re: [RFC] postcopy livemigration proposal
On 08/08/2011 12:20, Dor Laor wrote:
> On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
> > This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM"
> > on which we'll give a talk at KVM-forum.
> > The purpose of this mail is to let developers know about it in advance
> > so that we can get better feedback on its design/implementation
> > approach early, before we start to implement it.
> >
> >
> > Background
> > ==========
> > * What is postcopy livemigration
> > It is yet another live migration mechanism for Qemu/KVM, which
> > implements the migration technique known as "postcopy" or "lazy"
> > migration. Just after the "migrate" command is invoked, the execution
> > host of a VM is instantaneously switched to a destination host.
> >
> > The benefit is that total migration time is shorter because it
> > transfers a page only once. On the other hand, precopy may send the
> > same pages again and again because they can be dirtied.
> > The switching time from the source to the destination is several
> > hundred milliseconds, so it enables quick load balancing.
> > For details, please refer to the papers.
> >
> > We believe this is useful for others, so we'd like to merge this
> > feature into the upstream qemu/kvm. The existing implementation that
> > we have right now is very ad-hoc because it's for academic research.
> > For the upstream merge, we're starting to re-design/implement it and
> > we'd like to get feedback early. Although many
> > improvements/optimizations are possible, we should implement/merge a
> > simple/clean, but extensible, one at first and then improve/optimize
> > it later.
> >
> > postcopy livemigration will be introduced as an optional feature. The
> > existing precopy livemigration remains the default behavior.
> >
> >
> > * related links:
> > project page
> > http://sites.google.com/site/grivonhome/quick-kvm-migration
> >
> > Enabling Instantaneous Relocation of Virtual Machines with a
> > Lightweight VMM Extension,
> > (proof-of-concept, ad-hoc prototype. not a new design)
> > http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-paper.pdf
> > http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-talk.pdf
> >
> > Reactive consolidation of virtual machines enabled by postcopy live
> > migration
> > (advantage for VM consolidation)
> > http://portal.acm.org/citation.cfm?id=1996125
> > http://www.emn.fr/x-info/ascola/lib/exe/fetch.php?media=internet:vtdc-postcopy.pdf
> >
> > Qemu wiki
> > http://wiki.qemu.org/Features/PostCopyLiveMigration
> >
> >
> > Design/Implementation
> > =====================
> > The basic idea of postcopy livemigration is to use a sort of
> > distributed shared memory between the migration source and destination.
> >
> > The migration procedure looks like
> > - start migration
> >   stop the guest VM on the source and send the machine states except
> >   guest RAM to the destination
> > - resume the guest VM on the destination without guest RAM contents
> > - Hook guest access to pages, and pull page contents from the source
> >   This continues until all the pages are pulled to the destination
> >
> > The big picture is depicted at
> > http://wiki.qemu.org/File:Postcopy-livemigration.png
>
> That's terrific (nice video also)!
> Orit and myself had the exact same idea too (now we can't patent it..).
>
> Advantages:
> - No down time due to memory copying.
> - Efficient: reduces the needed traffic, no need to re-send pages.
> - Reduces overall RAM consumption of the source and destination
>   compared with current live migration (where both the source and the
>   destination allocate the memory until the live migration
>   completes). We can free copied memory once the destination guest
>   has received it and save RAM.
> - Increases parallelism for SMP guests: we can have multiple virtual
>   CPUs handle their demand paging. Less time holding a global lock,
>   less thread contention.
> - Virtual machines are using more and more memory resources; for a
>   virtual machine with a very large working set, doing live migration
>   with reasonable down time is impossible today.
>
> Disadvantages:
> - During the live migration the guest will run slower than in
>   today's live migration. We need to remember that even today
>   guests suffer from a performance penalty on the source during the
>   COW stage (memory copy).
> - Failure of the source or destination or the network will cause
>   us to lose the running virtual machine. Those failures are very
>   rare.

I highly doubt that's acceptable in enterprise deployments.

>   In case there is shared storage we can store a copy of the
>   memory there, which can be recovered in case of such a failure.
>
> Overall, it looks like a better approach for the vast majority of cases.
> Hope it will get merged to kvm and become the default way.
>
> > There are several design points.
> > - who takes care of pulling page contents.
> >   an independent daemon vs a thread in qemu
> >   The daemon approach is preferable because an independent daemon would
> >   make it easy to debug the postcopy memory mechanism without qemu.
> >   If required, it wouldn't be
Re: [Qemu-devel] [RFC] postcopy livemigration proposal
On Mon, Aug 08, 2011 at 10:38:35AM +0100, Stefan Hajnoczi wrote:
> On Mon, Aug 8, 2011 at 4:24 AM, Isaku Yamahata <yamah...@valinux.co.jp> wrote:
> > This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM"
> > on which we'll give a talk at KVM-forum.
>
> I'm curious if this approach is compatible with asynchronous page
> faults? The idea there was to tell the guest about a page fault so it
> can continue to do useful work in the meantime (if the fault was in
> guest userspace).

Yes. It's quite possible to inject an async page fault into the guest
when the faulted page isn't available on the destination.
At the same time the page will be requested from the source of the
migration. I think it's not so difficult.

--
yamahata
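
For readers of the archive, the procedure discussed in this thread (stop the guest on the source, resume it on the destination without RAM, pull pages when the guest touches them, then pull whatever is left) can be modelled with a toy sketch. Nothing here comes from the Yabusame code: the `Source` and `Destination` classes and `precopy_pages_sent` are hypothetical names, pages are plain dictionary entries, and the precopy count is a deliberately simplified model (one full pass plus one re-send per dirtied page per round).

```python
class Source:
    """Migration source: keeps guest RAM after the guest has been stopped."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.sent = 0  # number of page transfers served so far

    def fetch(self, pfn):
        self.sent += 1
        return self.pages[pfn]


class Destination:
    """Migration destination: the guest resumes here with no RAM contents."""

    def __init__(self, source, num_pages):
        self.source = source
        self.num_pages = num_pages
        self.ram = {}  # pages pulled so far, keyed by page frame number

    def access(self, pfn):
        """Guest touches a page; a miss models the hooked access."""
        if pfn not in self.ram:
            self.ram[pfn] = self.source.fetch(pfn)
        return self.ram[pfn]

    def pull_remaining(self):
        """Background pull of every page the guest has not touched yet."""
        for pfn in range(self.num_pages):
            self.access(pfn)


def precopy_pages_sent(num_pages, dirtied_per_round):
    """Simplified precopy cost: one full pass plus every re-sent dirty page."""
    return num_pages + sum(dirtied_per_round)


if __name__ == "__main__":
    src = Source({pfn: "page%d" % pfn for pfn in range(8)})
    dst = Destination(src, 8)
    dst.access(3)
    dst.access(3)          # second touch is served locally
    dst.pull_remaining()
    print(src.sent, precopy_pages_sent(8, [3, 2, 1]))
```

In this model each page crosses the wire exactly once under postcopy, which is the property the proposal contrasts with precopy's repeated sends of dirtied pages. The failure-mode concern raised in the thread is also visible: until pull_remaining() finishes, the destination cannot run without the source.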
Re: percpu crash on NetBurst
Hello, Avi.

On Sun, Aug 07, 2011 at 06:32:35PM +0300, Avi Kivity wrote:
> qemu, under some conditions (-cpu host or -cpu kvm64), erroneously passes family=15 as the virtual cpuid. This causes a BUG() in percpu code during late boot:
>
> [ cut here ]
> kernel BUG at mm/percpu.c:577!

This means that free_percpu() was passed a pointer which doesn't point to the start of an allocated area, i.e. the caller is trying to free an invalid pointer. Hmm... from the backtrace, it seems to be caused by super_block->s_files. Weird.

> [811060cc] free_percpu+0x8c/0x140
> [811462a5] __put_super+0x45/0x80
> [811463d5] put_super+0x25/0x40
> [8114651a] deactivate_locked_super+0x5a/0x70
> [81146f0e] deactivate_super+0x4e/0x70
> [811614e5] mntput_no_expire+0xb5/0x100
> [8116154f] mntput+0x1f/0x30
> [81245855] mq_put_mnt+0x15/0x20
> [81245f77] put_ipc_ns+0x47/0xa0
> [81080232] free_nsproxy+0x42/0x90
> [81080440] switch_task_namespaces+0x50/0x60
> [81080460] exit_task_namespaces+0x10/0x20
> [8105d29c] do_exit+0x46c/0x870
> [8105da02] do_group_exit+0x42/0xa0
> [8105da77] sys_exit_group+0x17/0x20
> [81521382] system_call_fastpath+0x16/0x1b
> Code: e7 41 89 54 24 14 e8 f2 fd ff ff 5b 41 5c 41 5d 41 5e 5d c3 31 f6 31 db e9 f5 fe ff ff 45 31 ed 31 c9 31 db e9 02 ff ff ff 0f 0b 0f 0b 55 48 89 e5 48 83 ec 20 48 89 5d e0 4c 89 65 e8 4c 89 6d
> RIP [8110603e] pcpu_free_area+0x17e/0x180
> RSP 880001cabd18
> ---[ end trace 87bc11c05d27169e ]---
>
> I traced this to the kernel cpuid code determining the cache line size:
>
> arch/x86/kernel/cpu/intel.c:
>     if (c->x86 == 15)
>         c->x86_cache_alignment = c->x86_clflush_size * 2;
>
> If I comment out this code, the kernel boots and all is well. I suspect that the percpu code sometimes uses x86_cache_alignment and sometimes some hardcoded macro; I saw some negative elements of chunk->map[].

The negative elements indicate allocated areas.

> All this applies to v3.0; current upstream (c2f340a69ca) fails even worse, haven't yet determined exactly why.
>
> I'm surprised this hasn't been reported before; Ingo, don't you have family=15 hosts in your test farm?

Hmmm... I can't trigger the problem w/ kvm64 (I tried mounting and unmounting filesystems but it worked okay), and I am quite skeptical that this is a widespread problem, given that the percpu core code is used very widely and hasn't seen a lot of changes lately. Is there anything specific you need to do to trigger the condition? Can you try to print out the s_files addresses being allocated and freed?

Thanks.

--
tejun
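Tejun's two remarks (free_percpu() must be given the start of an allocated area, and negative map elements mark allocated areas) can be illustrated with a toy model. This is only a sketch and not the code in mm/percpu.c: `toy_chunk`, `toy_init`, `toy_alloc` and `toy_free` are hypothetical names, merging of free neighbours is left out, and where the kernel would hit BUG() the sketch returns -1. The thread does not say which check sits at mm/percpu.c:577, so the sketch folds both plausible checks into one.

```c
#include <stdlib.h>

#define TOY_MAP_MAX 16

/* Toy chunk: map[i] > 0 is a free area of that size,
 * map[i] < 0 is an allocated area of size -map[i]. */
struct toy_chunk {
    int map[TOY_MAP_MAX];
    int map_used;
};

static void toy_init(struct toy_chunk *c, int size)
{
    c->map[0] = size;
    c->map_used = 1;
}

/* First-fit allocation; returns the offset, or -1 if nothing fits. */
static int toy_alloc(struct toy_chunk *c, int size)
{
    int off = 0;

    if (size <= 0)
        return -1;
    for (int i = 0; i < c->map_used; off += abs(c->map[i]), i++) {
        if (c->map[i] < size)
            continue;                 /* allocated (negative) or too small */
        if (c->map[i] > size) {       /* split off the free tail */
            if (c->map_used == TOY_MAP_MAX)
                return -1;
            for (int j = c->map_used; j > i + 1; j--)
                c->map[j] = c->map[j - 1];
            c->map[i + 1] = c->map[i] - size;
            c->map_used++;
        }
        c->map[i] = -size;            /* negative: now allocated */
        return off;
    }
    return -1;
}

/* Returns 0 on success, -1 where the kernel would hit BUG(). */
static int toy_free(struct toy_chunk *c, int freeme)
{
    int i, off;

    for (i = 0, off = 0; i < c->map_used; off += abs(c->map[i]), i++)
        if (off == freeme)
            break;
    if (i == c->map_used || c->map[i] > 0)
        return -1;                    /* not the start of an allocated area */
    c->map[i] = -c->map[i];           /* mark free again */
    return 0;
}
```

In this model, an allocator and a caller that disagree about alignment would hand `toy_free()` an offset that matches no area boundary, which is the kind of failure the BUG() reports.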
Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init
On Fri, Aug 05, 2011 at 08:52:25AM -0500, Anthony Liguori wrote:
> On 08/04/2011 08:05 AM, Avi Kivity wrote:
> > From: Michael S. Tsirkin m...@redhat.com
> >
> > We originally did get config on map, so that following write accesses are done on an updated config. New memory API doesn't give us a callback on map, and arguably, devices don't know when cpu really can access there. So updating on init seems cleaner.
> >
> > Signed-off-by: Michael S. Tsirkin m...@redhat.com
> > Signed-off-by: Avi Kivity a...@redhat.com
> > ---
> >  hw/virtio-pci.c |    7 ++++---
> >  1 files changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> > index d685243..ca1f12f 100644
> > --- a/hw/virtio-pci.c
> > +++ b/hw/virtio-pci.c
> > @@ -506,9 +506,6 @@ static void virtio_map(PCIDevice *pci_dev, int region_num,
> >      register_ioport_read(addr, config_len, 1, virtio_pci_config_readb, proxy);
> >      register_ioport_read(addr, config_len, 2, virtio_pci_config_readw, proxy);
> >      register_ioport_read(addr, config_len, 4, virtio_pci_config_readl, proxy);
> > -
> > -    if (vdev->config_len)
> > -        vdev->get_config(vdev, vdev->config);
> >  }
> >
> >  static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
> > @@ -689,6 +686,10 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
> >      proxy->host_features |= 0x1 << VIRTIO_F_NOTIFY_ON_EMPTY;
> >      proxy->host_features |= 0x1 << VIRTIO_F_BAD_FEATURE;
> >      proxy->host_features = vdev->get_features(vdev, proxy->host_features);
> > +
> > +    if (vdev->config_len) {
> > +        vdev->get_config(vdev, vdev->config);
> > +    }
>
> Thinking more closely, I don't think this is right.
>
> Updating on map ensured that the config was refreshed after each time the bar was mapped. At the very least, the config needs to be refreshed during reset because the guest may write to the guest space which should get cleared after reset.
>
> Regards,
>
> Anthony Liguori

Not sure I understand. Which register, for example, do you have in mind? Could you clarify please?

> >  }
> >
> >  static int virtio_blk_init_pci(PCIDevice *pci_dev)
Re: [RFC] postcopy livemigration proposal
> > * What is postcopy livemigration
> > It is yet another live migration mechanism for Qemu/KVM, which implements the migration technique known as postcopy or lazy migration. Just after the migrate command is invoked, the execution host of a VM is instantaneously switched to a destination host.

Sounds like a cool idea.

> > The benefit is that total migration time is shorter because it transfers each page only once. Precopy, on the other hand, may send the same pages again and again because they can be dirtied.
> > The switching time from the source to the destination is several hundred milliseconds, so it enables quick load balancing.
> > For details, please refer to the papers.

While these are the obvious benefits, the possible downside (that, as always, depends on the workload) is the amount of time that the guest workload runs more slowly than usual, waiting for pages it needs to continue. There is a whole spectrum between the guest pausing completely (which would solve all the problems of migration, but is often considered unacceptable) and running at full speed. Is it acceptable that the guest runs at 90% speed during the migration? 50%? 10%?

I guess we have nothing to lose from having both options, and choosing the most appropriate technique for each guest!

> That's terrific (nice video also)!
> Orit and myself had the exact same idea too (now we can't patent it..).

I think the new implementation is not the only reason why you cannot patent this idea :-) Demand-paged migration has actually been discussed (and done) for nearly a quarter of a century (!) in the area of *process* migration.

The first use I'm aware of was in CMU's Accent in 1987; see [1]. Another paper, [2], written in 1991, discusses how process migration is done in UCB's Sprite operating system, and evaluates the various alternatives common at the time (20 years ago), including what it calls "lazy copying", which is more or less the same thing as "post copy". Mosix (a project which, in some sense, is still alive today) also used some sort of cross between pre-copying (of dirty pages) and copying on demand of clean pages (from their backing store on the source machine).

References
[1] "Attacking the Process Migration Bottleneck" http://www.nd.edu/~dthain/courses/cse598z/fall2004/papers/accent.pdf
[2] "Transparent Process Migration: Design Alternatives and the Sprite Implementation" http://nd.edu/~dthain/courses/cse598z/fall2004/papers/sprite-migration.pdf

> Advantages:
> - Virtual machines are using more and more memory resources; for a virtual machine with a very large working set, doing live migration with reasonable down time is impossible today.

If a guest actually constantly uses (working set) most of its allocated memory, it will basically be unable to do any significant amount of work on the destination VM until this large working set is transferred to the destination. So in this scenario, post copying doesn't give any significant advantages over plain old "pause guest and send it to the destination". Or am I missing something?

> Disadvantages:
> - During the live migration the guest will run slower than in today's live migration. We need to remember that even today guests suffer a performance penalty on the source during the COW stage (memory copy).

I wonder if something like asynchronous page faults can help somewhat with multi-process guest workloads (and modified (PV) guest OS).

> - Failure of the source or destination or the network will cause us to lose the running virtual machine. Those failures are very rare.

How is this different from a VM running on a single machine that fails? Just that the small probability of failure (roughly) doubles for the relatively short duration of the transfer?

--
Nadav Har'El | Monday, Aug 8 2011, 8 Av 5771
n...@math.technion.ac.il
Phone +972-523-790466, ICQ 13349191
http://nadav.harel.org.il
If glory comes after death, I'm not in a hurry. (Latin proverb)
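The trade-off argued over in this thread (precopy may resend dirtied pages, while postcopy sends each page once) can be put into numbers with a toy model. This is only a sketch under strong assumptions that are not from the thread: the guest re-dirties its whole working set during every precopy round, precopy stops iterating once the dirty set fits a downtime budget or a round limit is reached, and all names (`toy_precopy`, `toy_postcopy_pages_sent`, `precopy_result`) are hypothetical.

```c
/* Outcome of a toy precopy migration, counted in pages. */
struct precopy_result {
    long pages_sent;      /* total pages put on the wire */
    long downtime_pages;  /* pages sent while the guest is paused */
    int rounds;           /* iterative rounds before the final pause */
};

/* Toy precopy: keep sending the dirty set while the guest runs, then
 * pause the guest and send whatever is still dirty. */
static struct precopy_result toy_precopy(long total_pages, long working_set,
                                         long max_downtime_pages,
                                         int max_rounds)
{
    struct precopy_result r = {0, 0, 0};
    long dirty = total_pages;           /* nothing has been sent yet */

    while (dirty > max_downtime_pages && r.rounds < max_rounds) {
        r.pages_sent += dirty;          /* send everything currently dirty */
        dirty = working_set;            /* assumed: guest re-dirtied it all */
        r.rounds++;
    }
    r.downtime_pages = dirty;           /* stop-and-copy phase */
    r.pages_sent += dirty;
    return r;
}

/* Toy postcopy: every page crosses the wire exactly once, on a guest
 * fault or via background pull; no RAM is sent during the switch. */
static long toy_postcopy_pages_sent(long total_pages)
{
    return total_pages;
}
```

With 1000 pages and a downtime budget of 200 pages, a working set of 100 pages converges after one round (1100 pages sent), while a working set of 600 pages never converges and only stops at the round limit. The model says nothing about how slowly the guest runs on the destination while it waits for pages, which is the cost raised above.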
[PATCH] posix-aio-compat: fix latency issues
In certain circumstances, posix-aio-compat can incur a lot of latency:
 - threads are created by vcpu threads, so if vcpu affinity is set, aio threads inherit vcpu affinity. This can cause many aio threads to compete for one cpu.
 - we can create up to max_threads (64) aio threads in one go; since a pthread_create can take around 30μs, we have up to 2ms of cpu time under a global lock.

Fix by:
 - moving thread creation to the main thread, so we inherit the main thread's affinity instead of the vcpu thread's affinity.
 - if a thread is currently being created, and we need to create yet another thread, let the thread being born create the new thread, reducing the amount of time we spend under the main thread.
 - drop the local lock while creating a thread (we may still hold the global mutex, though)

Note this doesn't eliminate latency completely; scheduler artifacts or lack of host cpu resources can still cause it. We may want pre-allocated threads when this cannot be tolerated.

Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.

Signed-off-by: Avi Kivity a...@redhat.com
---
 posix-aio-compat.c |   48 ++--
 1 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index 8dc00cb..aa30673 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -30,6 +30,7 @@
 
 #include "block/raw-posix-aio.h"
 
+static void do_spawn_thread(void);
 
 struct qemu_paiocb {
     BlockDriverAIOCB common;
@@ -64,6 +65,9 @@ static pthread_attr_t attr;
 static int max_threads = 64;
 static int cur_threads = 0;
 static int idle_threads = 0;
+static int new_threads = 0;     /* backlog of threads we need to create */
+static int pending_threads = 0; /* threads created but not running yet */
+static QEMUBH *new_thread_bh;
 static QTAILQ_HEAD(, qemu_paiocb) request_list;
 
 #ifdef CONFIG_PREADV
@@ -311,6 +315,13 @@ static void *aio_thread(void *unused)
 
     pid = getpid();
 
+    mutex_lock(&lock);
+    if (new_threads) {
+        do_spawn_thread();
+    }
+    pending_threads--;
+    mutex_unlock(&lock);
+
     while (1) {
         struct qemu_paiocb *aiocb;
         ssize_t ret = 0;
@@ -381,11 +392,18 @@ static void *aio_thread(void *unused)
     return NULL;
 }
 
-static void spawn_thread(void)
+static void do_spawn_thread(void)
 {
     sigset_t set, oldset;
 
-    cur_threads++;
+    if (!new_threads) {
+        return;
+    }
+
+    new_threads--;
+    pending_threads++;
+
+    mutex_unlock(&lock);
 
     /* block all signals */
     if (sigfillset(&set)) die("sigfillset");
@@ -394,6 +412,31 @@ static void spawn_thread(void)
     thread_create(&thread_id, &attr, aio_thread, NULL);
 
     if (sigprocmask(SIG_SETMASK, &oldset, NULL)) die("sigprocmask restore");
+
+    mutex_lock(&lock);
+}
+
+static void spawn_thread_bh_fn(void *opaque)
+{
+    mutex_lock(&lock);
+    do_spawn_thread();
+    mutex_unlock(&lock);
+}
+
+static void spawn_thread(void)
+{
+    cur_threads++;
+    new_threads++;
+    /* If there are threads being created, they will spawn new workers, so
+     * we don't spend time creating many threads in a loop holding a mutex or
+     * starving the current vcpu.
+     *
+     * If there are no idle threads, ask the main thread to create one, so we
+     * inherit the correct affinity instead of the vcpu affinity.
+     */
+    if (!pending_threads) {
+        qemu_bh_schedule(new_thread_bh);
+    }
 }
 
 static void qemu_paio_submit(struct qemu_paiocb *aiocb)
@@ -665,6 +708,7 @@ int paio_init(void)
         die2(ret, "pthread_attr_setdetachstate");
 
     QTAILQ_INIT(&request_list);
+    new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL);
 
     posix_aio_state = s;
     return 0;
--
1.7.5.3
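The "thread being born creates the next thread" idea in this patch can be tried outside QEMU with plain pthreads. The following is a minimal sketch, not the patch itself: it leaves out QEMU's bottom half, signal masking, and the request queue; `spawn_workers`, `worker`, `drained` and `started_threads` are hypothetical names added for the demonstration, and it assumes a build with `-pthread`. For scale, the commit message's estimate is 64 threads x 30 μs = 1920 μs, roughly 2 ms, if one caller creates them all in a loop.

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;
static int new_threads;      /* backlog of workers still to create */
static int pending_threads;  /* created, but not yet running */
static int started_threads;  /* workers that reached their entry point */

static void *worker(void *unused);

/* Called with `lock` held; drops it around the slow pthread_create(). */
static void do_spawn_thread(void)
{
    pthread_t tid;
    int err;

    if (!new_threads)
        return;
    new_threads--;
    pending_threads++;

    pthread_mutex_unlock(&lock);
    err = pthread_create(&tid, NULL, worker, NULL);
    if (!err)
        pthread_detach(tid);
    pthread_mutex_lock(&lock);

    if (err) {               /* give up on the backlog rather than hang */
        pending_threads--;
        new_threads = 0;
    }
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    do_spawn_thread();       /* the newborn thread creates the next one */
    pending_threads--;
    started_threads++;
    if (!new_threads && !pending_threads)
        pthread_cond_signal(&drained);
    pthread_mutex_unlock(&lock);
    return NULL;             /* a real worker would now serve requests */
}

/* Queue n workers; the caller creates only the first one itself.
 * Returns how many workers actually started. */
static int spawn_workers(int n)
{
    int started;

    pthread_mutex_lock(&lock);
    started_threads = 0;
    new_threads += n;
    if (!pending_threads)
        do_spawn_thread();
    while (new_threads || pending_threads)
        pthread_cond_wait(&drained, &lock);
    started = started_threads;
    pthread_mutex_unlock(&lock);
    return started;
}
```

The caller pays for one `pthread_create()` regardless of `n`; the remaining creations are chained through the new threads, each done with the lock dropped. Unlike the patch, the sketch blocks until the chain has drained, purely so that the result can be checked.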
Re: [RFC] postcopy livemigration proposal
On 08/08/2011 01:59 PM, Nadav Har'El wrote: * What's is postcopy livemigration It is is yet another live migration mechanism for Qemu/KVM, which implements the migration technique known as postcopy or lazy migration. Just after the migrate command is invoked, the execution host of a VM is instantaneously switched to a destination host. Sounds like a cool idea. The benefit is, total migration time is shorter because it transfer a page only once. On the other hand precopy may repeat sending same pages again and again because they can be dirtied. The switching time from the source to the destination is several hunderds mili seconds so that it enables quick load balancing. For details, please refer to the papers. While these are the obvious benefits, the possible downside (that, as always, depends on the workload) is the amount of time that the guest workload runs more slowly than usual, waiting for pages it needs to continue. There are a whole spectrum between the guest pausing completely (which would solve all the problems of migration, but is often considered unacceptible) and running at full-speed. Is it acceptable that the guest runs at 90% speed during the migration? 50%? 10%? I guess we could have nothing to lose from having both options, and choosing the most appropriate technique for each guest! +1 That's terrific (nice video also)! Orit and myself had the exact same idea too (now we can't patent it..). I think new implementation is not the only reason why you cannot patent this idea :-) Demand-paged migration has actually been discussed (and done) for nearly a quarter of a century (!) in the area of *process* migration. The first use I'm aware of was in CMU's Accent 1987 - see [1]. Another paper, [2], written in 1991, discusses how process migration is done in UCB's Sprite operating system, and evaluates the various alternatives common at the time (20 years ago), including what it calls lazy copying is more-or-less the same thing as post copy. 
Mosix (a project which, in some sense, is still alive to day) also used some sort of cross between pre-copying (of dirty pages) and copying on-demand of clean pages (from their backing store on the source machine). References [1] Attacking the Process Migration Bottleneck http://www.nd.edu/~dthain/courses/cse598z/fall2004/papers/accent.pdf w/o reading the internals, patents enable you to implement an existing idea on a new field. Anyway, there won't be no patent in this case. Still let's have the kvm innovation merged. [2] Transparent Process Migration: Design Alternatives and the Sprite Implementation http://nd.edu/~dthain/courses/cse598z/fall2004/papers/sprite-migration.pdf Advantages: - Virtual machines are using more and more memory resources , for a virtual machine with very large working set doing live migration with reasonable down time is impossible today. If a guest actually constantly uses (working set) most of its allocated memory, it will basically be unable to do any significant amount of work on the destination VM until this large working set is transfered to the destination. So in this scenario, post copying doesn't give any significant advantages over plain-old pause guest and send it to the destination. Or am I missing something? There is one key advantage in this scheme/use case - if you have a guest with a very large working set, you'll need a very large downtime in order to migrate it with today's algorithm. With post copy (aka streaming/demand paging), the guest won't have any downtime but will run slower than expected. There are guests today that is impractical to really live migrate them. btw: Even today, marking pages RO also carries some performance penalty. Disadvantageous: - During the live migration the guest will run slower than in today's live migration. We need to remember that even today guests suffer from performance penalty on the source during the COW stage (memory copy). 
I wonder if something like asynchronous page faults can help somewhat with multi-process guest workloads (and modified (PV) guest OS). They should come in to play for some extent. Note that only newer Linux guest will enjoy of them. - Failure of the source or destination or the network will cause us to lose the running virtual machine. Those failures are very rare. How is this different from a VM running on a single machine that fails? Just that the small probability of failure (roughly) doubles for the relatively-short duration of the transfer? Exactly my point, this is not a major disadvantage because of this low probability. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues
On 08/08/2011 06:37 AM, Avi Kivity wrote: In certain circumstances, posix-aio-compat can incur a lot of latency: - threads are created by vcpu threads, so if vcpu affinity is set, aio threads inherit vcpu affinity. This can cause many aio threads to compete for one cpu. - we can create up to max_threads (64) aio threads in one go; since a pthread_create can take around 30μs, we have up to 2ms of cpu time under a global lock. Fix by: - moving thread creation to the main thread, so we inherit the main thread's affinity instead of the vcpu thread's affinity. - if a thread is currently being created, and we need to create yet another thread, let thread being born create the new thread, reducing the amount of time we spend under the main thread. - drop the local lock while creating a thread (we may still hold the global mutex, though) Note this doesn't eliminate latency completely; scheduler artifacts or lack of host cpu resources can still cause it. We may want pre-allocated threads when this cannot be tolerated. Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions. Do you have a scenario where you can measure the benefits of this change? The idle time in the thread pool is rather large, it surprises me that it'd be an issue in practice. 
Regards, Anthony Liguori Signed-off-by: Avi Kivitya...@redhat.com --- posix-aio-compat.c | 48 ++-- 1 files changed, 46 insertions(+), 2 deletions(-) diff --git a/posix-aio-compat.c b/posix-aio-compat.c index 8dc00cb..aa30673 100644 --- a/posix-aio-compat.c +++ b/posix-aio-compat.c @@ -30,6 +30,7 @@ #include block/raw-posix-aio.h +static void do_spawn_thread(void); struct qemu_paiocb { BlockDriverAIOCB common; @@ -64,6 +65,9 @@ static pthread_attr_t attr; static int max_threads = 64; static int cur_threads = 0; static int idle_threads = 0; +static int new_threads = 0; /* backlog of threads we need to create */ +static int pending_threads = 0; /* threads created but not running yet */ +static QEMUBH *new_thread_bh; static QTAILQ_HEAD(, qemu_paiocb) request_list; #ifdef CONFIG_PREADV @@ -311,6 +315,13 @@ static void *aio_thread(void *unused) pid = getpid(); +mutex_lock(lock); +if (new_threads) { +do_spawn_thread(); +} +pending_threads--; +mutex_unlock(lock); + while (1) { struct qemu_paiocb *aiocb; ssize_t ret = 0; @@ -381,11 +392,18 @@ static void *aio_thread(void *unused) return NULL; } -static void spawn_thread(void) +static void do_spawn_thread(void) { sigset_t set, oldset; -cur_threads++; +if (!new_threads) { +return; +} + +new_threads--; +pending_threads++; + +mutex_unlock(lock); /* block all signals */ if (sigfillset(set)) die(sigfillset); @@ -394,6 +412,31 @@ static void spawn_thread(void) thread_create(thread_id,attr, aio_thread, NULL); if (sigprocmask(SIG_SETMASK,oldset, NULL)) die(sigprocmask restore); + +mutex_lock(lock); +} + +static void spawn_thread_bh_fn(void *opaque) +{ +mutex_lock(lock); +do_spawn_thread(); +mutex_unlock(lock); +} + +static void spawn_thread(void) +{ +cur_threads++; +new_threads++; +/* If there are threads being created, they will spawn new workers, so + * we don't spend time creating many threads in a loop holding a mutex or + * starving the current vcpu. 
+ * + * If there are no idle threads, ask the main thread to create one, so we + * inherit the correct affinity instead of the vcpu affinity. + */ +if (!pending_threads) { +qemu_bh_schedule(new_thread_bh); +} } static void qemu_paio_submit(struct qemu_paiocb *aiocb) @@ -665,6 +708,7 @@ int paio_init(void) die2(ret, pthread_attr_setdetachstate); QTAILQ_INIT(request_list); +new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL); posix_aio_state = s; return 0; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] postcopy livemigration proposal
On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
> This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM", on which we'll give a talk at KVM-forum. The purpose of this mail is to let developers know about it in advance, so that we can get better feedback on its design/implementation approach early, before we start to implement it.

Interesting; what is the impact of increased latency on memory reads?

> There are several design points.
> - who takes care of pulling page contents: an independent daemon vs a thread in qemu
>   The daemon approach is preferable because an independent daemon would make it easy to debug the postcopy memory mechanism without qemu. If required, it wouldn't be difficult to convert a daemon into a thread in qemu.

Isn't this equivalent to touching each page in sequence?

Care must be taken that we don't post too many requests, or it could affect the latency of synchronous accesses by the guest.

> - connection between the source and the destination
>   The connection for live migration can be re-used after sending machine state.
> - transfer protocol
>   The existing protocol can be extended.
> - hooking guest RAM access
>   Introduce a character device to handle page faults. When a page fault occurs, it queues a page request up to the user space daemon at the destination. The daemon pulls the page contents from the source and serves them into the character device. Then the page fault is resolved.

This doesn't play well with host swapping, transparent hugepages, or ksm, does it?

I see you note this later on.

> * More on hooking guest RAM access
> There are several candidates for the implementation. Our preference is the character device approach.
> - inserting hooks everywhere in qemu/kvm
>   This is impractical.
> - backing store for guest ram
>   A block device or a file can be used to back guest RAM. Thus hook the guest ram access.
>   pros
>   - new device driver isn't needed.
>   cons
>   - future improvement would be difficult
>   - some KVM host features (KSM, THP) wouldn't work
> - character device
>   qemu mmap()s the dedicated character device, and then hooks page faults.
>   pros
>   - straightforward approach
>   - future improvement would be easy
>   cons
>   - new driver is needed
>   - some KVM host features (KSM, THP) wouldn't work. They check if a given VMA is anonymous. This can be fixed.
> - swap device
>   When creating the guest, it is set up as if all the guest RAM is swapped out to a dedicated swap device, which may be an nbd disk (or some kind of user space block device, BUSE?). When the VM tries to access memory, swap-in is triggered and IO to the swap device is issued. Then the IO to swap is routed to the daemon in user space with the nbd protocol (or BUSE, AOE, iSCSI...). The daemon pulls pages from the migration source and services the IO request.
>   pros
>   - After the page transfer is complete, everything is the same as the normal case.
>   - no new device driver is needed
>   cons
>   - future improvement would be difficult
>   - administration: setting up nbd, swap device

Using a swap device would be my preference. We'd still be using anonymous memory, so thp/ksm/ordinary swap still work.

It would need to be a special kind of swap device, since we only want to swap in, and never out, to that device. We'd also need a special way of telling the kernel that memory comes from that device. In that it's similar to your second option.

Maybe we should use a backing file (using nbd) and have a madvise() call that converts the vma to anonymous memory once the migration is finished.

--
error compiling committee.c: too many arguments to function
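Whichever hooking mechanism is chosen, the behaviour being proposed is the same: a guest access to a page that has not arrived yet triggers a pull from the source, and a background puller fetches the rest. A toy user-space model of that flow is sketched below. Everything in it is an assumption for illustration: a `memcpy()` stands in for the network transfer, the presence check stands in for the page fault, and names such as `toy_dest`, `toy_guest_read` and `toy_background_pull` are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGES 8
#define TOY_PAGE_SIZE 16

struct toy_source {
    unsigned char ram[TOY_PAGES][TOY_PAGE_SIZE];
};

struct toy_dest {
    unsigned char ram[TOY_PAGES][TOY_PAGE_SIZE];
    bool present[TOY_PAGES];       /* has this page arrived yet? */
    const struct toy_source *src;
    int pulled;                    /* pages transferred so far */
    int faults;                    /* guest accesses that had to wait */
};

static void toy_dest_init(struct toy_dest *d, const struct toy_source *s)
{
    memset(d, 0, sizeof(*d));
    d->src = s;
}

/* Stand-in for the transfer performed by the daemon. */
static void toy_pull(struct toy_dest *d, int page)
{
    if (d->present[page])
        return;                    /* each page is transferred only once */
    memcpy(d->ram[page], d->src->ram[page], TOY_PAGE_SIZE);
    d->present[page] = true;
    d->pulled++;
}

/* Guest read on the destination: a missing page plays the "fault". */
static unsigned char toy_guest_read(struct toy_dest *d, int page, int off)
{
    if (!d->present[page]) {
        d->faults++;
        toy_pull(d, page);
    }
    return d->ram[page][off];
}

/* Background pull of one missing page; returns false when none is left. */
static bool toy_background_pull(struct toy_dest *d)
{
    for (int p = 0; p < TOY_PAGES; p++) {
        if (!d->present[p]) {
            toy_pull(d, p);
            return true;
        }
    }
    return false;
}
```

The model makes one point from the reply visible: the background puller and the fault path share the same transfer channel, so a real implementation has to decide how many background requests may be outstanding before they delay a synchronous fault. The sketch does not model that queueing.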
Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues
On 08/08/2011 03:34 PM, Anthony Liguori wrote:
> On 08/08/2011 06:37 AM, Avi Kivity wrote:
> > In certain circumstances, posix-aio-compat can incur a lot of latency:
> >  - threads are created by vcpu threads, so if vcpu affinity is set, aio threads inherit vcpu affinity. This can cause many aio threads to compete for one cpu.
> >  - we can create up to max_threads (64) aio threads in one go; since a pthread_create can take around 30μs, we have up to 2ms of cpu time under a global lock.
> >
> > Fix by:
> >  - moving thread creation to the main thread, so we inherit the main thread's affinity instead of the vcpu thread's affinity.
> >  - if a thread is currently being created, and we need to create yet another thread, let the thread being born create the new thread, reducing the amount of time we spend under the main thread.
> >  - drop the local lock while creating a thread (we may still hold the global mutex, though)
> >
> > Note this doesn't eliminate latency completely; scheduler artifacts or lack of host cpu resources can still cause it. We may want pre-allocated threads when this cannot be tolerated.
> >
> > Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.
>
> Do you have a scenario where you can measure the benefits of this change?

It's a customer scenario, so I can't share it. Not that I know exactly what happened there in terms of workload.

> The idle time in the thread pool is rather large, it surprises me that it'd be an issue in practice.

Just starting up a virtio guest will fill the queue with max_threads requests, and if the vcpu is pinned, all 64 thread creations and executions will have to run on the same cpu, and will likely preempt the vcpu since it's classified as a cpu hog by some schedulers.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init
On 08/08/2011 05:36 AM, Michael S. Tsirkin wrote:
> > Thinking more closely, I don't think this is right.
> >
> > Updating on map ensured that the config was refreshed after each time the bar was mapped. At the very least, the config needs to be refreshed during reset because the guest may write to the guest space which should get cleared after reset.
> >
> > Regards,
> >
> > Anthony Liguori
>
> Not sure I understand. Which register, for example, do you have in mind? Could you clarify please?

Actually, you never need to call config_get() AFAICT. It's called in every read/write access. So I think the code you changed is extraneous now.

Regards,

Anthony Liguori

> > >  }
> > >
> > >  static int virtio_blk_init_pci(PCIDevice *pci_dev)
Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init
On 08/08/2011 03:45 PM, Anthony Liguori wrote:
> Actually, you never need to call config_get() AFAICT. It's called in every read/write access. So I think the code you changed is extraneous now.

Ok; I'll drop this patch and repost (and just remove the code in virtio_map()).

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues
2011/8/8 Avi Kivity a...@redhat.com: In certain circumstances, posix-aio-compat can incur a lot of latency: - threads are created by vcpu threads, so if vcpu affinity is set, aio threads inherit vcpu affinity. This can cause many aio threads to compete for one cpu. - we can create up to max_threads (64) aio threads in one go; since a pthread_create can take around 30μs, we have up to 2ms of cpu time under a global lock. Fix by: - moving thread creation to the main thread, so we inherit the main thread's affinity instead of the vcpu thread's affinity. - if a thread is currently being created, and we need to create yet another thread, let thread being born create the new thread, reducing the amount of time we spend under the main thread. - drop the local lock while creating a thread (we may still hold the global mutex, though) Note this doesn't eliminate latency completely; scheduler artifacts or lack of host cpu resources can still cause it. We may want pre-allocated threads when this cannot be tolerated. Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions. Signed-off-by: Avi Kivity a...@redhat.com Why not calling pthread_attr_setaffinity_np (where available) before thread creation or shed_setaffinity at thread start instead of telling another thread to create a thread for us just to get affinity cleared? 
Regards Frediano --- posix-aio-compat.c | 48 ++-- 1 files changed, 46 insertions(+), 2 deletions(-) diff --git a/posix-aio-compat.c b/posix-aio-compat.c index 8dc00cb..aa30673 100644 --- a/posix-aio-compat.c +++ b/posix-aio-compat.c @@ -30,6 +30,7 @@ #include block/raw-posix-aio.h +static void do_spawn_thread(void); struct qemu_paiocb { BlockDriverAIOCB common; @@ -64,6 +65,9 @@ static pthread_attr_t attr; static int max_threads = 64; static int cur_threads = 0; static int idle_threads = 0; +static int new_threads = 0; /* backlog of threads we need to create */ +static int pending_threads = 0; /* threads created but not running yet */ +static QEMUBH *new_thread_bh; static QTAILQ_HEAD(, qemu_paiocb) request_list; #ifdef CONFIG_PREADV @@ -311,6 +315,13 @@ static void *aio_thread(void *unused) pid = getpid(); + mutex_lock(lock); + if (new_threads) { + do_spawn_thread(); + } + pending_threads--; + mutex_unlock(lock); + while (1) { struct qemu_paiocb *aiocb; ssize_t ret = 0; @@ -381,11 +392,18 @@ static void *aio_thread(void *unused) return NULL; } -static void spawn_thread(void) +static void do_spawn_thread(void) { sigset_t set, oldset; - cur_threads++; + if (!new_threads) { + return; + } + + new_threads--; + pending_threads++; + + mutex_unlock(lock); /* block all signals */ if (sigfillset(set)) die(sigfillset); @@ -394,6 +412,31 @@ static void spawn_thread(void) thread_create(thread_id, attr, aio_thread, NULL); if (sigprocmask(SIG_SETMASK, oldset, NULL)) die(sigprocmask restore); + + mutex_lock(lock); +} + +static void spawn_thread_bh_fn(void *opaque) +{ + mutex_lock(lock); + do_spawn_thread(); + mutex_unlock(lock); +} + +static void spawn_thread(void) +{ + cur_threads++; + new_threads++; + /* If there are threads being created, they will spawn new workers, so + * we don't spend time creating many threads in a loop holding a mutex or + * starving the current vcpu. 
+ * + * If there are no idle threads, ask the main thread to create one, so we + * inherit the correct affinity instead of the vcpu affinity. + */ + if (!pending_threads) { + qemu_bh_schedule(new_thread_bh); + } } static void qemu_paio_submit(struct qemu_paiocb *aiocb) @@ -665,6 +708,7 @@ int paio_init(void) die2(ret, pthread_attr_setdetachstate); QTAILQ_INIT(request_list); + new_thread_bh = qemu_bh_new(spawn_thread_bh_fn, NULL); posix_aio_state = s; return 0; -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues
On 08/08/2011 03:49 PM, Frediano Ziglio wrote:
> 2011/8/8 Avi Kivity <a...@redhat.com>:
>> In certain circumstances, posix-aio-compat can incur a lot of latency:
>>  - threads are created by vcpu threads, so if vcpu affinity is set,
>>    aio threads inherit vcpu affinity. This can cause many aio threads
>>    to compete for one cpu.
>>  - we can create up to max_threads (64) aio threads in one go; since a
>>    pthread_create can take around 30μs, we have up to 2ms of cpu time
>>    under a global lock.
>>
>> Fix by:
>>  - moving thread creation to the main thread, so we inherit the main
>>    thread's affinity instead of the vcpu thread's affinity.
>>  - if a thread is currently being created, and we need to create yet
>>    another thread, let the thread being born create the new thread,
>>    reducing the amount of time we spend under the main thread.
>>  - drop the local lock while creating a thread (we may still hold the
>>    global mutex, though)
>>
>> Note this doesn't eliminate latency completely; scheduler artifacts or
>> lack of host cpu resources can still cause it. We may want
>> pre-allocated threads when this cannot be tolerated.
>>
>> Thanks to Uli Obergfell of Red Hat for his excellent analysis and
>> suggestions.
>>
>> Signed-off-by: Avi Kivity <a...@redhat.com>
>
> Why not call pthread_attr_setaffinity_np (where available) before
> thread creation, or sched_setaffinity at thread start, instead of
> telling another thread to create a thread for us just to get affinity
> cleared?

The entire qemu process may be affined to a subset of the host cpus; we don't want to break that. For example:

    taskset 0xf0 qemu
    (qemu) info cpus
    pin individual vcpu threads to host cpus

--
error compiling committee.c: too many arguments to function
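The bookkeeping described in the "Fix by" list can be followed in a small single-threaded model. This is only a sketch under stated assumptions: the counter names new_threads and pending_threads come from the patch, but struct spawn_state, request_thread(), run_bottom_half() and worker_started() are hypothetical stand-ins, and no real threads, locks or bottom halves are involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the counters used by the patch. */
struct spawn_state {
    int cur_threads;      /* workers accounted for */
    int new_threads;      /* backlog of threads still to be created */
    int pending_threads;  /* created but not running yet */
    bool bh_scheduled;    /* main-loop callback has been requested */
    int created;          /* how many "pthread_create" calls happened */
};

/* Models do_spawn_thread(): create one worker if there is a backlog. */
static void do_spawn(struct spawn_state *s)
{
    if (!s->new_threads) {
        return;
    }
    s->new_threads--;
    s->pending_threads++;
    s->created++;         /* stand-in for the actual thread creation */
}

/* Models spawn_thread(): called on the request path (e.g. a vcpu thread).
 * It only records the need; creation happens elsewhere. */
static void request_thread(struct spawn_state *s)
{
    s->cur_threads++;
    s->new_threads++;
    if (!s->pending_threads) {
        s->bh_scheduled = true;   /* ask the main thread to create one */
    }
}

/* Models the bottom half running in the main thread. */
static void run_bottom_half(struct spawn_state *s)
{
    if (!s->bh_scheduled) {
        return;
    }
    s->bh_scheduled = false;
    do_spawn(s);
}

/* Models the start of aio_thread(): the newborn worker creates the next
 * worker from the backlog, then stops counting as pending. */
static void worker_started(struct spawn_state *s)
{
    assert(s->pending_threads > 0);
    do_spawn(s);
    s->pending_threads--;
}
```

In this model three back-to-back requests cost the main thread a single creation; the remaining two are created by the workers as they start, which is the chaining the patch description refers to.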
Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init
On Mon, Aug 08, 2011 at 07:45:19AM -0500, Anthony Liguori wrote:
> On 08/08/2011 05:36 AM, Michael S. Tsirkin wrote:
>>> Thinking more closely, I don't think this is right.
>>>
>>> Updating on map ensured that the config was refreshed after each
>>> time the bar was mapped. At the very least, the config needs to be
>>> refreshed during reset because the guest may write to the guest
>>> space which should get cleared after reset.
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>
>> Not sure I understand. Which register, for example, do you have in
>> mind? Could you clarify please?
>
> Actually, you never need to call config_get() AFAICT. It's called in
> every read/write access.

Every read, yes. But every write? Are you sure?

> So I think the code you changed is extraneous now.
>
> Regards,
>
> Anthony Liguori

--
MST
[PATCH] Introduce short names for fixed width integer types
QEMU deals with a lot of fixed width integer types; their names
(uint64_t etc) are clumsy to use and take up a lot of space.

Following Linux, introduce shorter names, for example U64 for
uint64_t.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 qemu-common.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/qemu-common.h b/qemu-common.h
index 0fdecf1..52a2300 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -112,6 +112,15 @@ static inline char *realpath(const char *path, char *resolved_path)
 int qemu_main(int argc, char **argv, char **envp);
 #endif
 
+typedef int8_t S8;
+typedef uint8_t U8;
+typedef int16_t S16;
+typedef uint16_t U16;
+typedef int32_t S32;
+typedef uint32_t U32;
+typedef int64_t S64;
+typedef uint64_t U64;
+
 /* bottom halves */
 typedef void QEMUBHFunc(void *opaque);
 
-- 
1.7.5.3
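For readers who want to see what the proposed aliases amount to, here is a minimal standalone sketch. The eight typedef names are taken from the patch; the compile-time width checks and the make_u64() helper are illustrative assumptions and are not part of the proposal.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Short aliases as proposed in the patch. */
typedef int8_t   S8;
typedef uint8_t  U8;
typedef int16_t  S16;
typedef uint16_t U16;
typedef int32_t  S32;
typedef uint32_t U32;
typedef int64_t  S64;
typedef uint64_t U64;

/* Illustrative only: the aliases have exactly the widths their names say. */
_Static_assert(sizeof(U8)  * CHAR_BIT == 8,  "U8 must be 8 bits");
_Static_assert(sizeof(U16) * CHAR_BIT == 16, "U16 must be 16 bits");
_Static_assert(sizeof(U32) * CHAR_BIT == 32, "U32 must be 32 bits");
_Static_assert(sizeof(U64) * CHAR_BIT == 64, "U64 must be 64 bits");

/* Hypothetical helper showing the aliases in use: combine two 32-bit
 * halves into one 64-bit value. */
static U64 make_u64(U32 hi, U32 lo)
{
    return ((U64)hi << 32) | lo;
}
```

Since these are plain typedefs, U64 and uint64_t are the same type and can be mixed freely; the change is purely one of spelling.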
Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types
On 08/08/2011 07:56 AM, Avi Kivity wrote:
> QEMU deals with a lot of fixed width integer types; their names
> (uint64_t etc) are clumsy to use and take up a lot of space.
>
> Following Linux, introduce shorter names, for example U64 for
> uint64_t.

Except Linux uses lower case letters.

I personally think Linux style is wrong here. The int8_t types are standard types.

Besides, we save lots of characters by using 4-space tabs instead of 8-space tabs. We can afford to spend some of those saved characters on using proper type names :-)

Regards,

Anthony Liguori

> Signed-off-by: Avi Kivity <a...@redhat.com>
> ---
>  qemu-common.h |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/qemu-common.h b/qemu-common.h
> index 0fdecf1..52a2300 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -112,6 +112,15 @@ static inline char *realpath(const char *path, char *resolved_path)
>  int qemu_main(int argc, char **argv, char **envp);
>  #endif
>
> +typedef int8_t S8;
> +typedef uint8_t U8;
> +typedef int16_t S16;
> +typedef uint16_t U16;
> +typedef int32_t S32;
> +typedef uint32_t U32;
> +typedef int64_t S64;
> +typedef uint64_t U64;
> +
>  /* bottom halves */
>  typedef void QEMUBHFunc(void *opaque);
Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init
On 08/08/2011 07:56 AM, Michael S. Tsirkin wrote:
> On Mon, Aug 08, 2011 at 07:45:19AM -0500, Anthony Liguori wrote:
>> On 08/08/2011 05:36 AM, Michael S. Tsirkin wrote:
>>>> Thinking more closely, I don't think this is right.
>>>>
>>>> Updating on map ensured that the config was refreshed after each
>>>> time the bar was mapped. At the very least, the config needs to be
>>>> refreshed during reset because the guest may write to the guest
>>>> space which should get cleared after reset.
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>>
>>> Not sure I understand. Which register, for example, do you have in
>>> mind? Could you clarify please?
>>
>> Actually, you never need to call config_get() AFAICT. It's called in
>> every read/write access.
>
> Every read, yes. But every write? Are you sure?

Yeah, not on write, but I think this is a bug. get_config() should be called before doing the memcpy() in order to have a proper read-modify-write (RMW).

Regards,

Anthony Liguori

>> So I think the code you changed is extraneous now.
>>
>> Regards,
>>
>> Anthony Liguori
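The read-modify-write point can be illustrated with a toy device. This is a sketch under stated assumptions, not the virtio-pci code: struct toy_dev, toy_get_config(), toy_set_config() and toy_config_write() are all hypothetical names, and the "live" array stands in for whatever state the real device callbacks consult.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_LEN 8

/* Hypothetical device: "live" is the authoritative state, "cache" is the
 * config-space copy that guest accesses are applied to. */
struct toy_dev {
    uint8_t live[CFG_LEN];
    uint8_t cache[CFG_LEN];
};

/* Refresh the cached copy from the device (the "read" step). */
static void toy_get_config(struct toy_dev *d)
{
    memcpy(d->cache, d->live, CFG_LEN);
}

/* Push the cached copy back to the device (the "write" step). */
static void toy_set_config(struct toy_dev *d)
{
    memcpy(d->live, d->cache, CFG_LEN);
}

/* Partial config write done as read-modify-write. Without the refresh
 * at the top, bytes outside [off, off + len) would be written back from
 * a possibly stale cache. Returns 0 on success, -1 if out of range. */
static int toy_config_write(struct toy_dev *d, size_t off,
                            const void *buf, size_t len)
{
    if (off > CFG_LEN || len > CFG_LEN - off) {
        return -1;
    }
    toy_get_config(d);                 /* read   */
    memcpy(d->cache + off, buf, len);  /* modify */
    toy_set_config(d);                 /* write  */
    return 0;
}
```

If the refresh is skipped and the cache is stale, a two-byte write would clobber the other six bytes of device state with old values, which is the failure mode the reply describes.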
[PATCH] kvm: balloon: test multiple devices
Multiple balloon devices should not be allowed. Check if the qemu we're
running under has the right fixes.

Signed-off-by: Amit Shah <amit.s...@redhat.com>
---
 client/tests/kvm/tests/balloon_check.py |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/balloon_check.py b/client/tests/kvm/tests/balloon_check.py
index 0b7f0f4..d79ed13 100644
--- a/client/tests/kvm/tests/balloon_check.py
+++ b/client/tests/kvm/tests/balloon_check.py
@@ -64,6 +64,18 @@ def run_balloon_check(test, params, env):
             fail += 1
         return fail
 
+    def multiple_devices():
+        """
+        Hot-plugging multiple balloon devices isn't allowed.
+        Ensure qemu fails hot-plugging a second device.
+        """
+        try:
+            vm.monitor.cmd("device_add virtio-balloon-pci")
+        except kvm_monitor.MonitorError, e:
+            # This is good.
+            return 0
+        logging.error("Multiple balloon devices allowed by this version of qemu")
+        return 1
 
     fail = 0
     vm = env.get_vm(params["main_vm"])
@@ -100,6 +112,8 @@ def run_balloon_check(test, params, env):
     # we won't trigger guest OOM killer while running multiple iterations
     fail += balloon_memory(vm_assigned_mem)
 
+    fail += multiple_devices()
+
     # Close stablished session
     session.close()
     # Check if any failures happen during the whole test
-- 
1.7.1
[PATCH v4 01/39] memory: rename PORTIO_END to PORTIO_END_OF_LIST
For consistency with other _END_OF_LIST macros.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 memory.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/memory.h b/memory.h
index 4e518b2..da00a3b 100644
--- a/memory.h
+++ b/memory.h
@@ -133,7 +133,7 @@ struct MemoryRegionPortio {
     IOPortWriteFunc *write;
 };
 
-#define PORTIO_END { }
+#define PORTIO_END_OF_LIST() { }
 
 /**
  * memory_region_init: Initialize a memory region
-- 
1.7.5.3
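The macro marks the end of a sentinel-terminated table. A minimal sketch of that pattern follows; everything prefixed toy_ is hypothetical, and treating size == 0 as the terminator is an assumption made for this example. The sketch also spells the empty entry as { 0, 0, 0 } because an empty { } initializer is a compiler extension in older C standards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical port I/O table entry; size == 0 is assumed to mark the
 * end of the list in this sketch. */
struct toy_portio {
    uint32_t offset;  /* first port, relative to the region base */
    uint32_t len;     /* number of ports covered */
    unsigned size;    /* access size in bytes; 0 terminates the table */
};

/* All-zero entry that ends a table. */
#define TOY_PORTIO_END_OF_LIST() { 0, 0, 0 }

static const struct toy_portio toy_ports[] = {
    { 0, 0x100, 1 },
    { 0, 0x100, 2 },
    { 0, 0x100, 4 },
    TOY_PORTIO_END_OF_LIST()
};

/* Count entries by walking the table until the terminator. */
static size_t toy_portio_count(const struct toy_portio *p)
{
    size_t n = 0;

    assert(p != NULL);
    while (p[n].size != 0) {
        n++;
    }
    return n;
}
```

Because the terminator carries the length implicitly, a table can be passed around as a bare pointer, at the cost that forgetting the final entry makes the walk run off the end of the array.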
[PATCH v4 00/39] Memory API, batch 2: PCI devices
This is a mostly mindless conversion of all QEMU PCI devices to the memory API. After this patchset is applied, it is no longer possible to create a PCI device using the old API.

An immediate benefit is that PCI BARs that overlap each other are now handled correctly: currently, the sequence

  map BAR 0
  map BAR 1 at an overlapping address
  unmap either BAR 0 or BAR 1

will leave a hole where the overlap exists. With the patchset, the memory map is restored correctly.

Note that overlaps of PCI BARs with memory or non-PCI resources are still not resolved correctly; this will be fixed later on.

The vga patches have ugly intermediate states; however the result is fairly clean.

Changes from v3:
 - dropped virtio-pci config patch; will be fixed outside this patchset if necessary
 - minor style fixes

Changes from v2:
 - added patch from Michael simplifying virtio-pci config setup

Changes from v1:
 - cmd646 type fix
 - folded a fixlet into its parent

Avi Kivity (39):
  memory: rename PORTIO_END to PORTIO_END_OF_LIST
  pci: add API to get a BAR's mapped address
  vmsvga: don't remember pci BAR address in callback any more
  vga: convert vga and its derivatives to the memory API
  cirrus: simplify mmio BAR access functions
  cirrus: simplify bitblt BAR access functions
  cirrus: simplify vga window mmio access functions
  vga: simplify vga window mmio access functions
  cirrus: simplify linear framebuffer access functions
  Integrate I/O memory regions into qemu
  pci: pass I/O address space to new PCI bus
  pci: allow I/O BARs to be registered with pci_register_bar_region()
  rtl8139: convert to memory API
  ac97: convert to memory API
  e1000: convert to memory API
  eepro100: convert to memory API
  es1370: convert to memory API
  ide: convert to memory API
  ivshmem: convert to memory API
  virtio-pci: convert to memory API
  ahci: convert to memory API
  intel-hda: convert to memory API
  lsi53c895a: convert to memory API
  ppc: convert to memory API
  ne2000: convert to memory API
  pcnet: convert to memory API
  i6300esb: convert to memory API
  isa-mmio: convert to memory API
  sun4u: convert to memory API
  ehci: convert to memory API
  uhci: convert to memory API
  xen-platform: convert to memory API
  msix: convert to memory API
  pci: remove pci_register_bar_simple()
  pci: convert pci rom to memory API
  pci: remove pci_register_bar()
  pci: fold BAR mapping function into its caller
  pci: rename pci_register_bar_region() to pci_register_bar()
  pci: remove support for pre memory API BARs

 exec-memory.h      |    5 +
 exec.c             |   10 ++
 hw/ac97.c          |   88 ++-
 hw/apb_pci.c       |    1 +
 hw/bonito.c        |    1 +
 hw/cirrus_vga.c    |  459 ---
 hw/cuda.c          |    6 +-
 hw/e1000.c         |  113 ++
 hw/eepro100.c      |  181 -
 hw/es1370.c        |   43 +++--
 hw/escc.c          |   42 +++---
 hw/escc.h          |    2 +-
 hw/grackle_pci.c   |    8 +-
 hw/gt64xxx.c       |    4 +-
 hw/heathrow_pic.c  |   29 ++--
 hw/ide.h           |    2 +-
 hw/ide/ahci.c      |   31 ++--
 hw/ide/ahci.h      |    2 +-
 hw/ide/cmd646.c    |  204 +++-
 hw/ide/ich.c       |    3 +-
 hw/ide/macio.c     |   36 +++--
 hw/ide/pci.c       |   25 ++--
 hw/ide/pci.h       |   19 ++-
 hw/ide/piix.c      |   63 ++--
 hw/ide/via.c       |   64 ++--
 hw/intel-hda.c     |   35 +++--
 hw/isa.h           |    2 +
 hw/isa_mmio.c      |   29 ++--
 hw/ivshmem.c       |  158 +++
 hw/lance.c         |   31 ++--
 hw/lsi53c895a.c    |  257 +++---
 hw/mac_dbdma.c     |   32 ++--
 hw/mac_dbdma.h     |    4 +-
 hw/mac_nvram.c     |   39 ++---
 hw/macio.c         |   73 -
 hw/msix.c          |   64 +++-
 hw/msix.h          |    6 +-
 hw/ne2000-isa.c    |   13 +--
 hw/ne2000.c        |   77 ++---
 hw/ne2000.h        |    8 +-
 hw/openpic.c       |   81 +-
 hw/openpic.h       |    2 +-
 hw/pc.h            |    4 +-
 hw/pc_piix.c       |    6 +-
 hw/pci.c           |  133 +---
 hw/pci.h           |   26 ++--
 hw/pci_internals.h |    3 +-
 hw/pcnet-pci.c     |   74 +
 hw/pcnet.h         |    4 +-
 hw/piix_pci.c      |   14 +-
 hw/ppc4xx_pci.c    |    1 +
 hw/ppc_mac.h       |   27 ++--
 hw/ppc_newworld.c  |   34 ++--
 hw/ppc_oldworld.c  |   27 ++--
 hw/ppc_prep.c      |    2 +-
 hw/ppce500_pci.c   |    7 +-
 hw/prep_pci.c      |    8 +-
 hw/prep_pci.h      |    4 +-
 hw/qxl-render.c    |    2 +-
 hw/qxl.c           |  129 ++--
 hw/qxl.h           |    6 +-
 hw/rtl8139.c       |   70
 hw/sh_pci.c        |    4 +-
 hw/sun4u.c         |   53 +++
 hw/unin_pci.c      |   16 ++-
 hw/usb-ehci.c      |   36 +---
 hw/usb-ohci.c      |    2 +-
 hw/usb-uhci.c      |   41 +++--
 hw/versatile_pci.c |    2 +-
 hw/vga-isa-mm.c    |   46 --
 hw/vga-isa.c       |   10 +-
 hw/vga-pci.c       |   27
[PATCH v4 22/39] intel-hda: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/intel-hda.c | 35 +++ 1 files changed, 19 insertions(+), 16 deletions(-) diff --git a/hw/intel-hda.c b/hw/intel-hda.c index 5a2bc3a..1e4c71e 100644 --- a/hw/intel-hda.c +++ b/hw/intel-hda.c @@ -177,7 +177,7 @@ struct IntelHDAState { IntelHDAStream st[8]; /* state */ -int mmio_addr; +MemoryRegion mmio; uint32_t rirb_count; int64_t wall_base_ns; @@ -1084,16 +1084,20 @@ static uint32_t intel_hda_mmio_readl(void *opaque, target_phys_addr_t addr) return intel_hda_reg_read(d, reg, 0x); } -static CPUReadMemoryFunc * const intel_hda_mmio_read[3] = { -intel_hda_mmio_readb, -intel_hda_mmio_readw, -intel_hda_mmio_readl, -}; - -static CPUWriteMemoryFunc * const intel_hda_mmio_write[3] = { -intel_hda_mmio_writeb, -intel_hda_mmio_writew, -intel_hda_mmio_writel, +static const MemoryRegionOps intel_hda_mmio_ops = { +.old_mmio = { +.read = { +intel_hda_mmio_readb, +intel_hda_mmio_readw, +intel_hda_mmio_readl, +}, +.write = { +intel_hda_mmio_writeb, +intel_hda_mmio_writew, +intel_hda_mmio_writel, +}, +}, +.endianness = DEVICE_NATIVE_ENDIAN, }; /* - */ @@ -1130,10 +1134,9 @@ static int intel_hda_init(PCIDevice *pci) /* HDCTL off 0x40 bit 0 selects signaling mode (1-HDA, 0 - Ac97) 18.1.19 */ conf[0x40] = 0x01; -d-mmio_addr = cpu_register_io_memory(intel_hda_mmio_read, - intel_hda_mmio_write, d, - DEVICE_NATIVE_ENDIAN); -pci_register_bar_simple(d-pci, 0, 0x4000, 0, d-mmio_addr); +memory_region_init_io(d-mmio, intel_hda_mmio_ops, d, + intel-hda, 0x4000); +pci_register_bar_region(d-pci, 0, 0, d-mmio); if (d-msi) { msi_init(d-pci, 0x50, 1, true, false); } @@ -1149,7 +1152,7 @@ static int intel_hda_exit(PCIDevice *pci) IntelHDAState *d = DO_UPCAST(IntelHDAState, pci, pci); msi_uninit(d-pci); -cpu_unregister_io_memory(d-mmio_addr); +memory_region_destroy(d-mmio); return 0; } -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm 
in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
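The conversion above moves the three per-width handlers (byte, word, long) from separate arrays into a single ops structure. How such a table might be consulted can be sketched as follows. This is an assumption-laden illustration: struct toy_old_mmio_ops, toy_size_index() and toy_dispatch_read() are hypothetical, and the real dispatch code in QEMU's memory core is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read handler type: one function per access width. */
typedef uint32_t toy_read_fn(void *opaque, uint64_t addr);

/* Hypothetical ops table: read[0] handles 1-byte, read[1] 2-byte and
 * read[2] 4-byte accesses. */
struct toy_old_mmio_ops {
    toy_read_fn *read[3];
};

static uint32_t toy_readb(void *opaque, uint64_t addr)
{
    (void)opaque; (void)addr;
    return 0x11;
}

static uint32_t toy_readw(void *opaque, uint64_t addr)
{
    (void)opaque; (void)addr;
    return 0x2222;
}

static uint32_t toy_readl(void *opaque, uint64_t addr)
{
    (void)opaque; (void)addr;
    return 0x44444444;
}

static const struct toy_old_mmio_ops toy_ops = {
    .read = { toy_readb, toy_readw, toy_readl },
};

/* Map an access size in bytes to a table slot, or -1 if unsupported. */
static int toy_size_index(unsigned size)
{
    switch (size) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    default: return -1;
    }
}

/* Pick the handler matching the access size and call it. */
static uint32_t toy_dispatch_read(const struct toy_old_mmio_ops *ops,
                                  void *opaque, uint64_t addr, unsigned size)
{
    int idx = toy_size_index(size);

    assert(idx >= 0 && ops->read[idx] != NULL);
    return ops->read[idx](opaque, addr);
}
```

Keeping the handlers together in one const structure lets the region carry a single pointer to its behaviour, with the endianness stored alongside rather than passed separately at registration time.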
[PATCH v4 13/39] rtl8139: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/rtl8139.c | 72 ++--- 1 files changed, 38 insertions(+), 34 deletions(-) diff --git a/hw/rtl8139.c b/hw/rtl8139.c index 5214b8c..f07af35 100644 --- a/hw/rtl8139.c +++ b/hw/rtl8139.c @@ -474,7 +474,6 @@ typedef struct RTL8139State { NICState *nic; NICConf conf; -int rtl8139_mmio_io_addr; /* C ring mode */ uint32_t currTxDesc; @@ -506,6 +505,9 @@ typedef struct RTL8139State { QEMUTimer *timer; int64_t TimerExpire; +MemoryRegion bar_io; +MemoryRegion bar_mem; + /* Support migration to/from old versions */ int rtl8139_mmio_io_addr_dummy; } RTL8139State; @@ -3283,7 +3285,7 @@ static void rtl8139_pre_save(void *opaque) rtl8139_set_next_tctr_time(s, current_time); s-TCTR = muldiv64(current_time - s-TCTR_base, PCI_FREQUENCY, get_ticks_per_sec()); -s-rtl8139_mmio_io_addr_dummy = s-rtl8139_mmio_io_addr; +s-rtl8139_mmio_io_addr_dummy = 0; } static const VMStateDescription vmstate_rtl8139 = { @@ -3379,31 +3381,35 @@ static const VMStateDescription vmstate_rtl8139 = { /***/ /* PCI RTL8139 definitions */ -static void rtl8139_ioport_map(PCIDevice *pci_dev, int region_num, - pcibus_t addr, pcibus_t size, int type) -{ -RTL8139State *s = DO_UPCAST(RTL8139State, dev, pci_dev); - -register_ioport_write(addr, 0x100, 1, rtl8139_ioport_writeb, s); -register_ioport_read( addr, 0x100, 1, rtl8139_ioport_readb, s); - -register_ioport_write(addr, 0x100, 2, rtl8139_ioport_writew, s); -register_ioport_read( addr, 0x100, 2, rtl8139_ioport_readw, s); - -register_ioport_write(addr, 0x100, 4, rtl8139_ioport_writel, s); -register_ioport_read( addr, 0x100, 4, rtl8139_ioport_readl, s); -} +static const MemoryRegionPortio rtl8139_portio[] = { +{ 0, 0x100, 1, .read = rtl8139_ioport_readb, }, +{ 0, 0x100, 1, .write = rtl8139_ioport_writeb, }, +{ 0, 0x100, 2, .read = rtl8139_ioport_readw, }, +{ 0, 0x100, 2, .write = rtl8139_ioport_writew, }, +{ 0, 0x100, 4, .read 
= rtl8139_ioport_readl, }, +{ 0, 0x100, 4, .write = rtl8139_ioport_writel, }, +PORTIO_END_OF_LIST() +}; -static CPUReadMemoryFunc * const rtl8139_mmio_read[3] = { -rtl8139_mmio_readb, -rtl8139_mmio_readw, -rtl8139_mmio_readl, +static const MemoryRegionOps rtl8139_io_ops = { +.old_portio = rtl8139_portio, +.endianness = DEVICE_LITTLE_ENDIAN, }; -static CPUWriteMemoryFunc * const rtl8139_mmio_write[3] = { -rtl8139_mmio_writeb, -rtl8139_mmio_writew, -rtl8139_mmio_writel, +static const MemoryRegionOps rtl8139_mmio_ops = { +.old_mmio = { +.read = { +rtl8139_mmio_readb, +rtl8139_mmio_readw, +rtl8139_mmio_readl, +}, +.write = { +rtl8139_mmio_writeb, +rtl8139_mmio_writew, +rtl8139_mmio_writel, +}, +}, +.endianness = DEVICE_LITTLE_ENDIAN, }; static void rtl8139_timer(void *opaque) @@ -3432,7 +3438,8 @@ static int pci_rtl8139_uninit(PCIDevice *dev) { RTL8139State *s = DO_UPCAST(RTL8139State, dev, dev); -cpu_unregister_io_memory(s-rtl8139_mmio_io_addr); +memory_region_destroy(s-bar_io); +memory_region_destroy(s-bar_mem); if (s-cplus_txbuffer) { qemu_free(s-cplus_txbuffer); s-cplus_txbuffer = NULL; @@ -3462,15 +3469,12 @@ static int pci_rtl8139_init(PCIDevice *dev) * list bit in status register, and offset 0xdc seems unused. 
*/ pci_conf[PCI_CAPABILITY_LIST] = 0xdc; -/* I/O handler for memory-mapped I/O */ -s-rtl8139_mmio_io_addr = -cpu_register_io_memory(rtl8139_mmio_read, rtl8139_mmio_write, s, - DEVICE_LITTLE_ENDIAN); - -pci_register_bar(s-dev, 0, 0x100, - PCI_BASE_ADDRESS_SPACE_IO, rtl8139_ioport_map); - -pci_register_bar_simple(s-dev, 1, 0x100, 0, s-rtl8139_mmio_io_addr); +memory_region_init_io(s-bar_io, rtl8139_io_ops, s, rtl8139, 0x100); +memory_region_init_io(s-bar_mem, rtl8139_mmio_ops, s, rtl8139, 0x100); +pci_register_bar_region(s-dev, 0, PCI_BASE_ADDRESS_SPACE_IO, +s-bar_io); +pci_register_bar_region(s-dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, +s-bar_mem); qemu_macaddr_default_if_unset(s-conf.macaddr); -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 21/39] ahci: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/ide/ahci.c | 31 +-- hw/ide/ahci.h |2 +- hw/ide/ich.c |3 +-- 3 files changed, 15 insertions(+), 21 deletions(-) diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c index 1f008a3..e207ca0 100644 --- a/hw/ide/ahci.c +++ b/hw/ide/ahci.c @@ -276,12 +276,12 @@ static void ahci_port_write(AHCIState *s, int port, int offset, uint32_t val) } } -static uint32_t ahci_mem_readl(void *ptr, target_phys_addr_t addr) +static uint64_t ahci_mem_read(void *opaque, target_phys_addr_t addr, + unsigned size) { -AHCIState *s = ptr; +AHCIState *s = opaque; uint32_t val = 0; -addr = addr 0xfff; if (addr AHCI_GENERIC_HOST_CONTROL_REGS_MAX_ADDR) { switch (addr) { case HOST_CAP: @@ -314,10 +314,10 @@ static uint32_t ahci_mem_readl(void *ptr, target_phys_addr_t addr) -static void ahci_mem_writel(void *ptr, target_phys_addr_t addr, uint32_t val) +static void ahci_mem_write(void *opaque, target_phys_addr_t addr, + uint64_t val, unsigned size) { -AHCIState *s = ptr; -addr = addr 0xfff; +AHCIState *s = opaque; /* Only aligned reads are allowed on AHCI */ if (addr 3) { @@ -364,16 +364,10 @@ static void ahci_mem_writel(void *ptr, target_phys_addr_t addr, uint32_t val) } -static CPUReadMemoryFunc * const ahci_readfn[3]={ -ahci_mem_readl, -ahci_mem_readl, -ahci_mem_readl -}; - -static CPUWriteMemoryFunc * const ahci_writefn[3]={ -ahci_mem_writel, -ahci_mem_writel, -ahci_mem_writel +static MemoryRegionOps ahci_mem_ops = { +.read = ahci_mem_read, +.write = ahci_mem_write, +.endianness = DEVICE_LITTLE_ENDIAN, }; static void ahci_reg_init(AHCIState *s) @@ -1131,8 +1125,8 @@ void ahci_init(AHCIState *s, DeviceState *qdev, int ports) s-ports = ports; s-dev = qemu_mallocz(sizeof(AHCIDevice) * ports); ahci_reg_init(s); -s-mem = cpu_register_io_memory(ahci_readfn, ahci_writefn, s, -DEVICE_LITTLE_ENDIAN); +/* XXX BAR size should be 1k, but that breaks, so bump it to 4k for now 
*/ +memory_region_init_io(s-mem, ahci_mem_ops, s, ahci, 0x1000); irqs = qemu_allocate_irqs(ahci_irq_set, s, s-ports); for (i = 0; i s-ports; i++) { @@ -1151,6 +1145,7 @@ void ahci_init(AHCIState *s, DeviceState *qdev, int ports) void ahci_uninit(AHCIState *s) { +memory_region_destroy(s-mem); qemu_free(s-dev); } diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h index dc86951..e456193 100644 --- a/hw/ide/ahci.h +++ b/hw/ide/ahci.h @@ -289,7 +289,7 @@ struct AHCIDevice { typedef struct AHCIState { AHCIDevice *dev; AHCIControlRegs control_regs; -int mem; +MemoryRegion mem; int ports; qemu_irq irq; } AHCIState; diff --git a/hw/ide/ich.c b/hw/ide/ich.c index d241ea8..698b5f6 100644 --- a/hw/ide/ich.c +++ b/hw/ide/ich.c @@ -98,8 +98,7 @@ static int pci_ich9_ahci_init(PCIDevice *dev) msi_init(dev, 0x50, 1, true, false); d-ahci.irq = d-card.irq[0]; -/* XXX BAR size should be 1k, but that breaks, so bump it to 4k for now */ -pci_register_bar_simple(d-card, 5, 0x1000, 0, d-ahci.mem); +pci_register_bar_region(d-card, 5, 0, d-ahci.mem); return 0; } -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 15/39] e1000: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/e1000.c | 114 +-- 1 files changed, 48 insertions(+), 66 deletions(-) diff --git a/hw/e1000.c b/hw/e1000.c index 96d84f9..dfc082b 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -82,7 +82,8 @@ typedef struct E1000State_st { PCIDevice dev; NICState *nic; NICConf conf; -int mmio_index; +MemoryRegion mmio; +MemoryRegion io; uint32_t mac_reg[0x8000]; uint16_t phy_reg[0x20]; @@ -151,14 +152,6 @@ static const char phy_regcap[0x20] = { }; static void -ioport_map(PCIDevice *pci_dev, int region_num, pcibus_t addr, - pcibus_t size, int type) -{ -DBGOUT(IO, e1000_ioport_map addr=0x%04FMT_PCIBUS -size=0x%08FMT_PCIBUS\n, addr, size); -} - -static void set_interrupt_cause(E1000State *s, int index, uint32_t val) { if (val) @@ -905,7 +898,8 @@ static void (*macreg_writeops[])(E1000State *, int, uint32_t) = { enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) }; static void -e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val) +e1000_mmio_write(void *opaque, target_phys_addr_t addr, uint64_t val, + unsigned size) { E1000State *s = opaque; unsigned int index = (addr 0x1) 2; @@ -913,31 +907,15 @@ e1000_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val) if (index NWRITEOPS macreg_writeops[index]) { macreg_writeops[index](s, index, val); } else if (index NREADOPS macreg_readops[index]) { -DBGOUT(MMIO, e1000_mmio_writel RO %x: 0x%04x\n, index2, val); +DBGOUT(MMIO, e1000_mmio_writel RO %x: 0x%04PRIx64\n, index2, val); } else { -DBGOUT(UNKNOWN, MMIO unknown write addr=0x%08x,val=0x%08x\n, +DBGOUT(UNKNOWN, MMIO unknown write addr=0x%08x,val=0x%08PRIx64\n, index2, val); } } -static void -e1000_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val) -{ -// emulate hw without byte enables: no RMW -e1000_mmio_writel(opaque, addr ~3, - (val 0x) (8*(addr 3))); -} - -static void -e1000_mmio_writeb(void *opaque, 
target_phys_addr_t addr, uint32_t val) -{ -// emulate hw without byte enables: no RMW -e1000_mmio_writel(opaque, addr ~3, - (val 0xff) (8*(addr 3))); -} - -static uint32_t -e1000_mmio_readl(void *opaque, target_phys_addr_t addr) +static uint64_t +e1000_mmio_read(void *opaque, target_phys_addr_t addr, unsigned size) { E1000State *s = opaque; unsigned int index = (addr 0x1) 2; @@ -950,20 +928,39 @@ e1000_mmio_readl(void *opaque, target_phys_addr_t addr) return 0; } -static uint32_t -e1000_mmio_readb(void *opaque, target_phys_addr_t addr) +static const MemoryRegionOps e1000_mmio_ops = { +.read = e1000_mmio_read, +.write = e1000_mmio_write, +.endianness = DEVICE_LITTLE_ENDIAN, +.impl = { +.min_access_size = 4, +.max_access_size = 4, +}, +}; + +static uint64_t e1000_io_read(void *opaque, target_phys_addr_t addr, + unsigned size) { -return ((e1000_mmio_readl(opaque, addr ~3)) -(8 * (addr 3))) 0xff; +E1000State *s = opaque; + +(void)s; +return 0; } -static uint32_t -e1000_mmio_readw(void *opaque, target_phys_addr_t addr) +static void e1000_io_write(void *opaque, target_phys_addr_t addr, + uint64_t val, unsigned size) { -return ((e1000_mmio_readl(opaque, addr ~3)) -(8 * (addr 3))) 0x; +E1000State *s = opaque; + +(void)s; } +static const MemoryRegionOps e1000_io_ops = { +.read = e1000_io_read, +.write = e1000_io_write, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + static bool is_version_1(void *opaque, int version_id) { return version_id == 1; @@ -1083,36 +1080,22 @@ static const uint32_t mac_reg_init[] = { /* PCI interface */ -static CPUWriteMemoryFunc * const e1000_mmio_write[] = { -e1000_mmio_writeb, e1000_mmio_writew, e1000_mmio_writel -}; - -static CPUReadMemoryFunc * const e1000_mmio_read[] = { -e1000_mmio_readb, e1000_mmio_readw, e1000_mmio_readl -}; - static void -e1000_mmio_map(PCIDevice *pci_dev, int region_num, -pcibus_t addr, pcibus_t size, int type) +e1000_mmio_setup(E1000State *d) { -E1000State *d = DO_UPCAST(E1000State, dev, pci_dev); int i; const uint32_t 
excluded_regs[] = { E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS, E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE }; - -DBGOUT(MMIO, e1000_mmio_map addr=0x%08FMT_PCIBUS 0x%08FMT_PCIBUS\n, - addr, size); - -cpu_register_physical_memory(addr, PNPMMIO_SIZE, d-mmio_index); -qemu_register_coalesced_mmio(addr, excluded_regs[0]); - +memory_region_init_io(d-mmio,
[PATCH v4 38/39] pci: rename pci_register_bar_region() to pci_register_bar()
Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/ac97.c |5 ++--- hw/cirrus_vga.c |5 ++--- hw/e1000.c|5 ++--- hw/eepro100.c |7 +++ hw/es1370.c |2 +- hw/ide/cmd646.c | 14 +- hw/ide/ich.c |2 +- hw/ide/piix.c |3 +-- hw/ide/via.c |3 +-- hw/intel-hda.c|2 +- hw/ivshmem.c | 15 +++ hw/lsi53c895a.c |7 +++ hw/macio.c|3 +-- hw/ne2000.c |2 +- hw/openpic.c |4 ++-- hw/pci.c |6 +++--- hw/pci.h |4 ++-- hw/pcnet-pci.c|4 ++-- hw/qxl.c | 16 hw/rtl8139.c |6 ++ hw/sun4u.c|6 ++ hw/usb-ehci.c |2 +- hw/usb-ohci.c |2 +- hw/usb-uhci.c |3 +-- hw/vga-pci.c |3 +-- hw/virtio-pci.c |9 - hw/vmware_vga.c |8 hw/wdt_i6300esb.c |2 +- hw/xen_platform.c |7 +++ 29 files changed, 68 insertions(+), 89 deletions(-) diff --git a/hw/ac97.c b/hw/ac97.c index 52f0f0d..541d9a4 100644 --- a/hw/ac97.c +++ b/hw/ac97.c @@ -1316,9 +1316,8 @@ static int ac97_initfn (PCIDevice *dev) memory_region_init_io (s-io_nam, ac97_io_nam_ops, s, ac97-nam, 1024); memory_region_init_io (s-io_nabm, ac97_io_nabm_ops, s, ac97-nabm, 256); -pci_register_bar_region (s-dev, 0, PCI_BASE_ADDRESS_SPACE_IO, s-io_nam); -pci_register_bar_region (s-dev, 1, PCI_BASE_ADDRESS_SPACE_IO, - s-io_nabm); +pci_register_bar (s-dev, 0, PCI_BASE_ADDRESS_SPACE_IO, s-io_nam); +pci_register_bar (s-dev, 1, PCI_BASE_ADDRESS_SPACE_IO, s-io_nabm); qemu_register_reset (ac97_on_reset, s); AUD_register_card (ac97, s-card); ac97_on_reset (s); diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index c9887ac..b489309 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -2948,10 +2948,9 @@ static int pci_cirrus_vga_initfn(PCIDevice *dev) /* memory #0 LFB */ /* memory #1 memory-mapped I/O */ /* XXX: s-vga.vram_size must be a power of two */ - pci_register_bar_region(d-dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, - s-pci_bar); + pci_register_bar(d-dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, s-pci_bar); if (device_id == CIRRUS_ID_CLGD5446) { - pci_register_bar_region(d-dev, 1, 0, 
s-cirrus_mmio_io); + pci_register_bar(d-dev, 1, 0, s-cirrus_mmio_io); } return 0; } diff --git a/hw/e1000.c b/hw/e1000.c index dfc082b..29b453f 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -1158,10 +1158,9 @@ static int pci_e1000_init(PCIDevice *pci_dev) e1000_mmio_setup(d); -pci_register_bar_region(d-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, -d-mmio); +pci_register_bar(d-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, d-mmio); -pci_register_bar_region(d-dev, 1, PCI_BASE_ADDRESS_SPACE_IO, d-io); +pci_register_bar(d-dev, 1, PCI_BASE_ADDRESS_SPACE_IO, d-io); memmove(d-eeprom_data, e1000_eeprom_template, sizeof e1000_eeprom_template); diff --git a/hw/eepro100.c b/hw/eepro100.c index 04723f3..a636d30 100644 --- a/hw/eepro100.c +++ b/hw/eepro100.c @@ -1879,15 +1879,14 @@ static int e100_nic_init(PCIDevice *pci_dev) /* Handler for memory-mapped I/O */ memory_region_init_io(s-mmio_bar, eepro100_ops, s, eepro100-mmio, PCI_MEM_SIZE); -pci_register_bar_region(s-dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, -s-mmio_bar); +pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_MEM_PREFETCH, s-mmio_bar); memory_region_init_io(s-io_bar, eepro100_ops, s, eepro100-io, PCI_IO_SIZE); -pci_register_bar_region(s-dev, 1, PCI_BASE_ADDRESS_SPACE_IO, s-io_bar); +pci_register_bar(s-dev, 1, PCI_BASE_ADDRESS_SPACE_IO, s-io_bar); /* FIXME: flash aliases to mmio?! 
*/ memory_region_init_io(s-flash_bar, eepro100_ops, s, eepro100-flash, PCI_FLASH_SIZE); -pci_register_bar_region(s-dev, 2, 0, s-flash_bar); +pci_register_bar(s-dev, 2, 0, s-flash_bar); qemu_macaddr_default_if_unset(s-conf.macaddr); logout(macaddr: %s\n, nic_dump(s-conf.macaddr.a[0], 6)); diff --git a/hw/es1370.c b/hw/es1370.c index 4e43c4a..a9387d1 100644 --- a/hw/es1370.c +++ b/hw/es1370.c @@ -1009,7 +1009,7 @@ static int es1370_initfn (PCIDevice *dev) c[PCI_MAX_LAT] = 0x80; memory_region_init_io (s-io, es1370_io_ops, s, es1370, 256); -pci_register_bar_region (s-dev, 0, PCI_BASE_ADDRESS_SPACE_IO, s-io); +pci_register_bar (s-dev, 0, PCI_BASE_ADDRESS_SPACE_IO, s-io); qemu_register_reset (es1370_on_reset, s); AUD_register_card (es1370, s-card); diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c index 13e6f2f..4d91e2c 100644 --- a/hw/ide/cmd646.c +++ b/hw/ide/cmd646.c @@ -270,16
[PATCH v4 24/39] ppc: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/cuda.c         |    6 ++-
 hw/escc.c         |   42 +--
 hw/escc.h         |    2 +-
 hw/heathrow_pic.c |   29 --
 hw/ide.h          |    2 +-
 hw/ide/macio.c    |   36 ---
 hw/mac_dbdma.c    |   32 ++--
 hw/mac_dbdma.h    |    4 ++-
 hw/mac_nvram.c    |   39 ++---
 hw/macio.c        |   74 +++-
 hw/openpic.c      |   81 +
 hw/openpic.h      |    2 +-
 hw/ppc_mac.h      |   16 ++
 hw/ppc_newworld.c |   30 +--
 hw/ppc_oldworld.c |   23 +++
 15 files changed, 201 insertions(+), 217 deletions(-)

diff --git a/hw/cuda.c b/hw/cuda.c
index 065c362..5c92d81 100644
--- a/hw/cuda.c
+++ b/hw/cuda.c
@@ -117,6 +117,7 @@ typedef struct CUDATimer {
 } CUDATimer;
 
 typedef struct CUDAState {
+    MemoryRegion mem;
     /* cuda registers */
     uint8_t b;      /* B-side data */
     uint8_t a;      /* A-side data */
@@ -722,7 +723,7 @@ static void cuda_reset(void *opaque)
     set_counter(s, &s->timers[1], 0xffff);
 }
 
-void cuda_init (int *cuda_mem_index, qemu_irq irq)
+void cuda_init (MemoryRegion **cuda_mem, qemu_irq irq)
 {
     struct tm tm;
     CUDAState *s = &cuda_state;
@@ -738,8 +739,9 @@ void cuda_init (int *cuda_mem_index, qemu_irq irq)
     s->tick_offset = (uint32_t)mktimegm(&tm) + RTC_OFFSET;
 
     s->adb_poll_timer = qemu_new_timer_ns(vm_clock, cuda_adb_poll, s);
-    *cuda_mem_index = cpu_register_io_memory(cuda_read, cuda_write, s,
+    cpu_register_io_memory(cuda_read, cuda_write, s,
                                              DEVICE_NATIVE_ENDIAN);
+    *cuda_mem = &s->mem;
     vmstate_register(NULL, -1, &vmstate_cuda, s);
     qemu_register_reset(cuda_reset, s);
 }
diff --git a/hw/escc.c b/hw/escc.c
index f6fd919..bea5873 100644
--- a/hw/escc.c
+++ b/hw/escc.c
@@ -126,7 +126,7 @@ struct SerialState {
     SysBusDevice busdev;
     struct ChannelState chn[2];
     uint32_t it_shift;
-    int mmio_index;
+    MemoryRegion mmio;
     uint32_t disabled;
     uint32_t frequency;
 };
@@ -490,7 +490,8 @@ static void escc_update_parameters(ChannelState *s)
     qemu_chr_ioctl(s->chr, CHR_IOCTL_SERIAL_SET_PARAMS, &ssp);
 }
 
-static void escc_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+static void escc_mem_write(void *opaque, target_phys_addr_t addr,
+                           uint64_t val, unsigned size)
 {
     SerialState *serial = opaque;
     ChannelState *s;
@@ -592,7 +593,8 @@ static void escc_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
     }
 }
 
-static uint32_t escc_mem_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t escc_mem_read(void *opaque, target_phys_addr_t addr,
+                              unsigned size)
 {
     SerialState *serial = opaque;
     ChannelState *s;
@@ -627,6 +629,16 @@ static uint32_t escc_mem_readb(void *opaque, target_phys_addr_t addr)
     return 0;
 }
 
+static const MemoryRegionOps escc_mem_ops = {
+    .read = escc_mem_read,
+    .write = escc_mem_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 1,
+    },
+};
+
 static int serial_can_receive(void *opaque)
 {
     ChannelState *s = opaque;
@@ -668,18 +680,6 @@ static void serial_event(void *opaque, int event)
         serial_receive_break(s);
 }
 
-static CPUReadMemoryFunc * const escc_mem_read[3] = {
-    escc_mem_readb,
-    NULL,
-    NULL,
-};
-
-static CPUWriteMemoryFunc * const escc_mem_write[3] = {
-    escc_mem_writeb,
-    NULL,
-    NULL,
-};
-
 static const VMStateDescription vmstate_escc_chn = {
     .name ="escc_chn",
     .version_id = 2,
@@ -712,7 +712,7 @@ static const VMStateDescription vmstate_escc = {
     }
 };
 
-int escc_init(target_phys_addr_t base, qemu_irq irqA, qemu_irq irqB,
+MemoryRegion *escc_init(target_phys_addr_t base, qemu_irq irqA, qemu_irq irqB,
               CharDriverState *chrA, CharDriverState *chrB,
               int clock, int it_shift)
 {
@@ -737,7 +737,7 @@ int escc_init(target_phys_addr_t base, qemu_irq irqA, qemu_irq irqB,
     }
 
     d = FROM_SYSBUS(SerialState, s);
-    return d->mmio_index;
+    return &d->mmio;
 }
 
 static const uint8_t keycodes[128] = {
@@ -901,7 +901,6 @@ void slavio_serial_ms_kbd_init(target_phys_addr_t base, qemu_irq irq,
 static int escc_init1(SysBusDevice *dev)
 {
     SerialState *s = FROM_SYSBUS(SerialState, dev);
-    int io;
     unsigned int i;
 
     s->chn[0].disabled = s->disabled;
@@ -918,10 +917,9 @@ static int escc_init1(SysBusDevice *dev)
     s->chn[0].otherchn = &s->chn[1];
     s->chn[1].otherchn = &s->chn[0];
 
-    io = cpu_register_io_memory(escc_mem_read, escc_mem_write, s,
-                                DEVICE_NATIVE_ENDIAN);
-
[PATCH v4 20/39] virtio-pci: convert to memory API
except msix.

[jan: fix build]

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/virtio-pci.c |   71 +--
 hw/virtio-pci.h |    2 +-
 2 files changed, 28 insertions(+), 45 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index f3b3293..5df380d 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -162,7 +162,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
 {
     VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
     EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
-    int r;
+    int r = 0;
+
     if (assign) {
         r = event_notifier_init(notifier, 1);
         if (r < 0) {
@@ -170,24 +171,11 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
                          __func__, r);
             return r;
         }
-        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-                                       n, assign);
-        if (r < 0) {
-            error_report("%s: unable to map ioeventfd: %d",
-                         __func__, r);
-            event_notifier_cleanup(notifier);
-        }
+        memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
+                                  true, n, event_notifier_get_fd(notifier));
     } else {
-        r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier),
-                                       proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY,
-                                       n, assign);
-        if (r < 0) {
-            error_report("%s: unable to unmap ioeventfd: %d",
-                         __func__, r);
-            return r;
-        }
-
+        memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2,
+                                  true, n, event_notifier_get_fd(notifier));
         /* Handle the race condition where the guest kicked and we deassigned
          * before we got around to handling the kick.
          */
@@ -424,7 +412,6 @@ static uint32_t virtio_pci_config_readb(void *opaque, uint32_t addr)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config)
         return virtio_ioport_read(proxy, addr);
     addr -= config;
@@ -435,7 +422,6 @@ static uint32_t virtio_pci_config_readw(void *opaque, uint32_t addr)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config)
         return virtio_ioport_read(proxy, addr);
     addr -= config;
@@ -446,7 +432,6 @@ static uint32_t virtio_pci_config_readl(void *opaque, uint32_t addr)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config)
         return virtio_ioport_read(proxy, addr);
     addr -= config;
@@ -457,7 +442,6 @@ static void virtio_pci_config_writeb(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config) {
         virtio_ioport_write(proxy, addr, val);
         return;
@@ -470,7 +454,6 @@ static void virtio_pci_config_writew(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config) {
         virtio_ioport_write(proxy, addr, val);
         return;
@@ -483,7 +466,6 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
     uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
-    addr -= proxy->addr;
     if (addr < config) {
         virtio_ioport_write(proxy, addr, val);
         return;
@@ -492,30 +474,26 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val)
     virtio_config_writel(proxy->vdev, addr, val);
 }
 
-static void virtio_map(PCIDevice *pci_dev, int region_num,
-                       pcibus_t addr, pcibus_t size, int type)
-{
-    VirtIOPCIProxy *proxy = container_of(pci_dev, VirtIOPCIProxy, pci_dev);
-    VirtIODevice *vdev = proxy->vdev;
-    unsigned config_len = VIRTIO_PCI_REGION_SIZE(pci_dev) + vdev->config_len;
-
-    proxy->addr = addr;
-
-    register_ioport_write(addr, config_len, 1, virtio_pci_config_writeb, proxy);
-    register_ioport_write(addr, config_len, 2, virtio_pci_config_writew, proxy);
-    register_ioport_write(addr, config_len, 4, virtio_pci_config_writel, proxy);
-    register_ioport_read(addr, config_len, 1, virtio_pci_config_readb, proxy);
-    register_ioport_read(addr, config_len, 2, virtio_pci_config_readw, proxy);
-    register_ioport_read(addr, config_len, 4,
[PATCH v4 25/39] ne2000: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/ne2000-isa.c |   13 ++---
 hw/ne2000.c     |   77 +-
 hw/ne2000.h     |    8 +
 3 files changed, 58 insertions(+), 40 deletions(-)

diff --git a/hw/ne2000-isa.c b/hw/ne2000-isa.c
index e41dbba..756ed5c 100644
--- a/hw/ne2000-isa.c
+++ b/hw/ne2000-isa.c
@@ -27,6 +27,7 @@
 #include "qdev.h"
 #include "net.h"
 #include "ne2000.h"
+#include "exec-memory.h"
 
 typedef struct ISANE2000State {
     ISADevice dev;
@@ -66,19 +67,11 @@ static int isa_ne2000_initfn(ISADevice *dev)
     ISANE2000State *isa = DO_UPCAST(ISANE2000State, dev, dev);
     NE2000State *s = &isa->ne2000;
 
-    register_ioport_write(isa->iobase, 16, 1, ne2000_ioport_write, s);
-    register_ioport_read(isa->iobase, 16, 1, ne2000_ioport_read, s);
+    ne2000_setup_io(s, 0x20);
     isa_init_ioport_range(dev, isa->iobase, 16);
-
-    register_ioport_write(isa->iobase + 0x10, 1, 1, ne2000_asic_ioport_write, s);
-    register_ioport_read(isa->iobase + 0x10, 1, 1, ne2000_asic_ioport_read, s);
-    register_ioport_write(isa->iobase + 0x10, 2, 2, ne2000_asic_ioport_write, s);
-    register_ioport_read(isa->iobase + 0x10, 2, 2, ne2000_asic_ioport_read, s);
     isa_init_ioport_range(dev, isa->iobase + 0x10, 2);
-
-    register_ioport_write(isa->iobase + 0x1f, 1, 1, ne2000_reset_ioport_write, s);
-    register_ioport_read(isa->iobase + 0x1f, 1, 1, ne2000_reset_ioport_read, s);
     isa_init_ioport(dev, isa->iobase + 0x1f);
+    memory_region_add_subregion(get_system_io(), isa->iobase, &s->io);
 
     isa_init_irq(dev, s->irq, isa->isairq);
 
diff --git a/hw/ne2000.c b/hw/ne2000.c
index f8acaae..5b76acf 100644
--- a/hw/ne2000.c
+++ b/hw/ne2000.c
@@ -297,7 +297,7 @@ ssize_t ne2000_receive(VLANClientState *nc, const uint8_t *buf, size_t size_)
     return size_;
 }
 
-void ne2000_ioport_write(void *opaque, uint32_t addr, uint32_t val)
+static void ne2000_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     NE2000State *s = opaque;
     int offset, page, index;
@@ -394,7 +394,7 @@ void ne2000_ioport_write(void *opaque, uint32_t addr, uint32_t val)
     }
 }
 
-uint32_t ne2000_ioport_read(void *opaque, uint32_t addr)
+static uint32_t ne2000_ioport_read(void *opaque, uint32_t addr)
 {
     NE2000State *s = opaque;
     int offset, page, ret;
@@ -544,7 +544,7 @@ static inline void ne2000_dma_update(NE2000State *s, int len)
     }
 }
 
-void ne2000_asic_ioport_write(void *opaque, uint32_t addr, uint32_t val)
+static void ne2000_asic_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     NE2000State *s = opaque;
 
@@ -564,7 +564,7 @@ void ne2000_asic_ioport_write(void *opaque, uint32_t addr, uint32_t val)
     }
 }
 
-uint32_t ne2000_asic_ioport_read(void *opaque, uint32_t addr)
+static uint32_t ne2000_asic_ioport_read(void *opaque, uint32_t addr)
 {
     NE2000State *s = opaque;
     int ret;
@@ -612,12 +612,12 @@ static uint32_t ne2000_asic_ioport_readl(void *opaque, uint32_t addr)
     return ret;
 }
 
-void ne2000_reset_ioport_write(void *opaque, uint32_t addr, uint32_t val)
+static void ne2000_reset_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     /* nothing to do (end of reset pulse) */
 }
 
-uint32_t ne2000_reset_ioport_read(void *opaque, uint32_t addr)
+static uint32_t ne2000_reset_ioport_read(void *opaque, uint32_t addr)
 {
     NE2000State *s = opaque;
     ne2000_reset(s);
@@ -676,27 +676,55 @@ static const VMStateDescription vmstate_pci_ne2000 = {
     }
 };
 
-/***/
-/* PCI NE2000 definitions */
+static uint64_t ne2000_read(void *opaque, target_phys_addr_t addr,
+                            unsigned size)
+{
+    NE2000State *s = opaque;
 
-static void ne2000_map(PCIDevice *pci_dev, int region_num,
-                       pcibus_t addr, pcibus_t size, int type)
+    if (addr < 0x10 && size == 1) {
+        return ne2000_ioport_read(s, addr);
+    } else if (addr == 0x10) {
+        if (size <= 2) {
+            return ne2000_asic_ioport_read(s, addr);
+        } else {
+            return ne2000_asic_ioport_readl(s, addr);
+        }
+    } else if (addr == 0x1f && size == 1) {
+        return ne2000_reset_ioport_read(s, addr);
+    }
+    return ((uint64_t)1 << (size * 8)) - 1;
+}
+
+static void ne2000_write(void *opaque, target_phys_addr_t addr,
+                         uint64_t data, unsigned size)
 {
-    PCINE2000State *d = DO_UPCAST(PCINE2000State, dev, pci_dev);
-    NE2000State *s = &d->ne2000;
+    NE2000State *s = opaque;
+
+    if (addr < 0x10 && size == 1) {
+        return ne2000_ioport_write(s, addr, data);
+    } else if (addr == 0x10) {
+        if (size <= 2) {
+            return ne2000_asic_ioport_write(s, addr, data);
+        } else {
+            return ne2000_asic_ioport_writel(s, addr, data);
+        }
+    } else if (addr == 0x1f && size == 1) {
+        return ne2000_reset_ioport_write(s, addr, data);
+
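As an aside on the fall-through case in ne2000_read(): an access that matches none of the register windows returns a value with every bit of the access width set. The following is only a standalone sketch of that computation; the helper name unassigned_read_value() is hypothetical and not part of the patch, and the explicit guard for 8-byte accesses is an addition here, since shifting a 64-bit value by 64 bits is undefined in C.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: the value handed back for
 * a read of `size` bytes that no register claims, i.e. all ones across
 * the access width. */
uint64_t unassigned_read_value(unsigned size)
{
    if (size >= 8) {
        /* (uint64_t)1 << 64 would be undefined behaviour, so the
         * full-width case is handled separately. */
        return UINT64_MAX;
    }
    return ((uint64_t)1 << (size * 8)) - 1;
}
```

For the 1-, 2- and 4-byte accesses that an I/O port region sees, this yields 0xff, 0xffff and 0xffffffff respectively.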
[PATCH v4 32/39] xen-platform: convert to memory API
Since this device bypasses PCI and registers I/O ports directly with the system bus, it needs further attention.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/xen_platform.c |   83 -
 1 files changed, 50 insertions(+), 33 deletions(-)

diff --git a/hw/xen_platform.c b/hw/xen_platform.c
index fb6be6a..0b89075 100644
--- a/hw/xen_platform.c
+++ b/hw/xen_platform.c
@@ -32,8 +32,8 @@
 #include "xen_common.h"
 #include "net.h"
 #include "xen_backend.h"
-#include "rwhandler.h"
 #include "trace.h"
+#include "exec-memory.h"
 
 #include <xenguest.h>
 
@@ -51,6 +51,9 @@
 
 typedef struct PCIXenPlatformState {
     PCIDevice  pci_dev;
+    MemoryRegion fixed_io;
+    MemoryRegion bar;
+    MemoryRegion mmio_bar;
     uint8_t flags; /* used only for version_id == 2 */
     int drivers_blacklisted;
     uint16_t driver_product_version;
@@ -221,21 +224,32 @@ static void platform_fixed_ioport_reset(void *opaque)
     platform_fixed_ioport_writeb(s, XEN_PLATFORM_IOPORT, 0);
 }
 
+const MemoryRegionPortio xen_platform_ioport[] = {
+    { 0, 16, 4, .write = platform_fixed_ioport_writel, },
+    { 0, 16, 2, .write = platform_fixed_ioport_writew, },
+    { 0, 16, 1, .write = platform_fixed_ioport_writeb, },
+    { 0, 16, 2, .read = platform_fixed_ioport_readw, },
+    { 0, 16, 1, .read = platform_fixed_ioport_readb, },
+    PORTIO_END_OF_LIST()
+};
+
+static const MemoryRegionOps platform_fixed_io_ops = {
+    .old_portio = xen_platform_ioport,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
 static void platform_fixed_ioport_init(PCIXenPlatformState* s)
 {
-    register_ioport_write(XEN_PLATFORM_IOPORT, 16, 4, platform_fixed_ioport_writel, s);
-    register_ioport_write(XEN_PLATFORM_IOPORT, 16, 2, platform_fixed_ioport_writew, s);
-    register_ioport_write(XEN_PLATFORM_IOPORT, 16, 1, platform_fixed_ioport_writeb, s);
-    register_ioport_read(XEN_PLATFORM_IOPORT, 16, 2, platform_fixed_ioport_readw, s);
-    register_ioport_read(XEN_PLATFORM_IOPORT, 16, 1, platform_fixed_ioport_readb, s);
+    memory_region_init_io(&s->fixed_io, &platform_fixed_io_ops, s,
+                          "xen-fixed", 16);
+    memory_region_add_subregion(get_system_io(), XEN_PLATFORM_IOPORT,
+                                &s->fixed_io);
 }
 
 /* Xen Platform PCI Device */
 
 static uint32_t xen_platform_ioport_readb(void *opaque, uint32_t addr)
 {
-    addr &= 0xff;
-
     if (addr == 0) {
         return platform_fixed_ioport_readb(opaque, XEN_PLATFORM_IOPORT);
     } else {
@@ -247,9 +261,6 @@ static void xen_platform_ioport_writeb(void *opaque, uint32_t addr, uint32_t val
 {
     PCIXenPlatformState *s = opaque;
 
-    addr &= 0xff;
-    val  &= 0xff;
-
     switch (addr) {
     case 0: /* Platform flags */
         platform_fixed_ioport_writeb(opaque, XEN_PLATFORM_IOPORT, val);
@@ -262,15 +273,23 @@ static void xen_platform_ioport_writeb(void *opaque, uint32_t addr, uint32_t val
     }
 }
 
-static void platform_ioport_map(PCIDevice *pci_dev, int region_num, pcibus_t addr, pcibus_t size, int type)
-{
-    PCIXenPlatformState *d = DO_UPCAST(PCIXenPlatformState, pci_dev, pci_dev);
+static MemoryRegionPortio xen_pci_portio[] = {
+    { 0, 0x100, 1, .read = xen_platform_ioport_readb, },
+    { 0, 0x100, 1, .write = xen_platform_ioport_writeb, },
+    PORTIO_END_OF_LIST()
+};
+
+static const MemoryRegionOps xen_pci_io_ops = {
+    .old_portio = xen_pci_portio,
+};
 
-    register_ioport_write(addr, size, 1, xen_platform_ioport_writeb, d);
-    register_ioport_read(addr, size, 1, xen_platform_ioport_readb, d);
+static void platform_ioport_bar_setup(PCIXenPlatformState *d)
+{
+    memory_region_init_io(&d->bar, &xen_pci_io_ops, d, "xen-pci", 0x100);
 }
 
-static uint32_t platform_mmio_read(ReadWriteHandler *handler, pcibus_t addr, int len)
+static uint64_t platform_mmio_read(void *opaque, target_phys_addr_t addr,
+                                   unsigned size)
 {
     DPRINTF("Warning: attempted read from physical address "
             "0x" TARGET_FMT_plx " in xen platform mmio space\n", addr);
@@ -278,28 +297,24 @@ static uint32_t platform_mmio_read(ReadWriteHandler *handler, pcibus_t addr, int
     return 0;
 }
 
-static void platform_mmio_write(ReadWriteHandler *handler, pcibus_t addr,
-                                uint32_t val, int len)
+static void platform_mmio_write(void *opaque, target_phys_addr_t addr,
+                                uint64_t val, unsigned size)
 {
-    DPRINTF("Warning: attempted write of 0x%x to physical "
+    DPRINTF("Warning: attempted write of 0x%"PRIx64" to physical "
             "address 0x" TARGET_FMT_plx " in xen platform mmio space\n",
             val, addr);
 }
 
-static ReadWriteHandler platform_mmio_handler = {
+static const MemoryRegionOps platform_mmio_handler = {
     .read = &platform_mmio_read,
     .write = &platform_mmio_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
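For readers unfamiliar with the .old_portio tables used in this patch, the idea is that each entry names a port range, an access width, and either a read or a write callback, and the list ends with a sentinel. The code below is a simplified, self-contained model of such a lookup, assuming a first-match policy. PortioEntry, portio_find() and the demo_* names are hypothetical and are not QEMU's MemoryRegionPortio implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port I/O table entry. */
typedef struct PortioEntry {
    uint32_t offset;   /* first port covered, relative to the region */
    uint32_t len;      /* number of ports covered */
    unsigned size;     /* access width in bytes; 0 terminates the table */
    uint32_t (*read)(void *opaque, uint32_t addr);
    void (*write)(void *opaque, uint32_t addr, uint32_t val);
} PortioEntry;

/* Return the first entry that covers `addr`, has the requested access
 * width and provides a callback for the requested direction, or NULL. */
const PortioEntry *portio_find(const PortioEntry *tbl, uint32_t addr,
                               unsigned size, int is_write)
{
    for (; tbl->size != 0; tbl++) {
        int has_cb = is_write ? (tbl->write != NULL) : (tbl->read != NULL);

        if (addr >= tbl->offset && addr - tbl->offset < tbl->len &&
            size == tbl->size && has_cb) {
            return tbl;
        }
    }
    return NULL;
}

/* Demo callbacks: byte reads echo the port number, word writes are
 * recorded in the uint32_t that `opaque` points to. */
uint32_t demo_readb(void *opaque, uint32_t addr)
{
    (void)opaque;
    return addr & 0xff;
}

void demo_writew(void *opaque, uint32_t addr, uint32_t val)
{
    *(uint32_t *)opaque = (addr << 16) | (val & 0xffff);
}

const PortioEntry demo_table[] = {
    { 0, 16, 2, NULL, demo_writew },
    { 0, 16, 1, demo_readb, NULL },
    { 0, 0, 0, NULL, NULL },          /* end-of-list sentinel */
};
```

With demo_table, a 1-byte read of port 3 resolves to the second entry, while a 1-byte write to the same port finds nothing, mirroring how the tables in the patch list reads and writes of each width separately.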
[PATCH v4 02/39] pci: add API to get a BAR's mapped address
This is needed by some (hacky) devices, such as VMware svga, that have a back-channel to read this address back outside the normal configuration mechanisms.

Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |    5 +++++
 hw/pci.h |    1 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 8621d3d..c2c2699 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -952,6 +952,11 @@ void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
     pci_dev->io_regions[region_num].memory = memory;
 }
 
+pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
+{
+    return pci_dev->io_regions[region_num].addr;
+}
+
 static void pci_bridge_filter(PCIDevice *d, pcibus_t *addr, pcibus_t *size,
                               uint8_t type)
 {
diff --git a/hw/pci.h b/hw/pci.h
index c51156d..64282ad 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -207,6 +207,7 @@ void pci_register_bar_simple(PCIDevice *pci_dev, int region_num,
                              pcibus_t size, uint8_t attr, ram_addr_t ram_addr);
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
                              uint8_t attr, MemoryRegion *memory);
+pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
 
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
                        uint8_t offset, uint8_t size);
-- 
1.7.5.3
[PATCH v4 33/39] msix: convert to memory API
The msix table is defined as a subregion, to allow for a BAR that mixes device specific regions with the msix table.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/ivshmem.c    |   11 +
 hw/msix.c       |   64 +++
 hw/msix.h       |    6 +---
 hw/pci.h        |    2 +-
 hw/virtio-pci.c |   16 -
 hw/virtio-pci.h |    1 +
 6 files changed, 42 insertions(+), 58 deletions(-)

diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index f80e7b6..bacba60 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -65,6 +65,7 @@ typedef struct IVShmemState {
      */
     MemoryRegion bar;
     MemoryRegion ivshmem;
+    MemoryRegion msix_bar;
     uint64_t ivshmem_size; /* size of shared memory region */
     int shm_fd; /* shared memory file descriptor */
 
@@ -540,11 +541,11 @@ static void ivshmem_setup_msi(IVShmemState * s) {
 
     /* allocate the MSI-X vectors */
 
-    if (!msix_init(&s->dev, s->vectors, 1, 0)) {
-        pci_register_bar(&s->dev, 1,
-                         msix_bar_size(&s->dev),
-                         PCI_BASE_ADDRESS_SPACE_MEMORY,
-                         msix_mmio_map);
+    memory_region_init(&s->msix_bar, "ivshmem-msix", 4096);
+    if (!msix_init(&s->dev, s->vectors, &s->msix_bar, 1, 0)) {
+        pci_register_bar_region(&s->dev, 1,
+                                PCI_BASE_ADDRESS_SPACE_MEMORY,
+                                &s->msix_bar);
         IVSHMEM_DPRINTF("msix initialized (%d vectors)\n", s->vectors);
     } else {
         IVSHMEM_DPRINTF("msix initialization failed\n");
diff --git a/hw/msix.c b/hw/msix.c
index e67e700..8536c3f 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -82,7 +82,8 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
     return 0;
 }
 
-static uint32_t msix_mmio_readl(void *opaque, target_phys_addr_t addr)
+static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr,
+                               unsigned size)
 {
     PCIDevice *dev = opaque;
     unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
@@ -91,12 +92,6 @@ static uint32_t msix_mmio_readl(void *opaque, target_phys_addr_t addr)
     return pci_get_long(page + offset);
 }
 
-static uint32_t msix_mmio_read_unallowed(void *opaque, target_phys_addr_t addr)
-{
-    fprintf(stderr, "MSI-X: only dword read is allowed!\n");
-    return 0;
-}
-
 static uint8_t msix_pending_mask(int vector)
 {
     return 1 << (vector % 8);
@@ -169,8 +164,8 @@ void msix_write_config(PCIDevice *dev, uint32_t addr,
     }
 }
 
-static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
-                             uint32_t val)
+static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
+                            uint64_t val, unsigned size)
 {
     PCIDevice *dev = opaque;
     unsigned int offset = addr & (MSIX_PAGE_SIZE - 1) & ~0x3;
@@ -179,37 +174,25 @@ static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
     msix_handle_mask_update(dev, vector);
 }
 
-static void msix_mmio_write_unallowed(void *opaque, target_phys_addr_t addr,
-                                      uint32_t val)
-{
-    fprintf(stderr, "MSI-X: only dword write is allowed!\n");
-}
-
-static CPUWriteMemoryFunc * const msix_mmio_write[] = {
-    msix_mmio_write_unallowed, msix_mmio_write_unallowed, msix_mmio_writel
-};
-
-static CPUReadMemoryFunc * const msix_mmio_read[] = {
-    msix_mmio_read_unallowed, msix_mmio_read_unallowed, msix_mmio_readl
+static const MemoryRegionOps msix_mmio_ops = {
+    .read = msix_mmio_read,
+    .write = msix_mmio_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
 };
 
-/* Should be called from device's map method. */
-void msix_mmio_map(PCIDevice *d, int region_num,
-                   pcibus_t addr, pcibus_t size, int type)
+static void msix_mmio_setup(PCIDevice *d, MemoryRegion *bar)
 {
     uint8_t *config = d->config + d->msix_cap;
     uint32_t table = pci_get_long(config + PCI_MSIX_TABLE);
     uint32_t offset = table & ~(MSIX_PAGE_SIZE - 1);
     /* TODO: for assigned devices, we'll want to make it possible to map
      * pending bits separately in case they are in a separate bar. */
-    int table_bir = table & PCI_MSIX_FLAGS_BIRMASK;
 
-    if (table_bir != region_num)
-        return;
-    if (size <= offset)
-        return;
-    cpu_register_physical_memory(addr + offset, size - offset,
-                                 d->msix_mmio_index);
+    memory_region_add_subregion(bar, offset, &d->msix_mmio);
 }
 
 static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
@@ -225,6 +208,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
 /* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
  * modified, it should be retrieved with msix_bar_size. */
 int msix_init(struct PCIDevice *dev,
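To make the offset arithmetic in msix_mmio_setup() and the MMIO accessors easier to follow, here is a small standalone sketch of the three mask operations involved. The constant values are assumptions for illustration (a 4 KiB MSIX_PAGE_SIZE and a 3-bit BAR indicator mask standing in for PCI_MSIX_FLAGS_BIRMASK), and the helper names are hypothetical, not functions from hw/msix.c.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define MSIX_PAGE_SIZE 0x1000u   /* assumption: table page is 4 KiB */
#define MSIX_BIR_MASK  0x7u      /* assumption: low 3 bits select the BAR */

/* Which BAR the table register points at (its low bits). */
uint32_t msix_table_bar(uint32_t table)
{
    return table & MSIX_BIR_MASK;
}

/* Page-aligned offset of the table inside that BAR; this is where the
 * MSI-X subregion would be placed within the container region. */
uint32_t msix_table_page_offset(uint32_t table)
{
    return table & ~(MSIX_PAGE_SIZE - 1);
}

/* Offset used by the read/write handlers: wrap the address into the
 * page, then round down to a dword boundary. */
uint32_t msix_dword_offset(uint64_t addr)
{
    return (uint32_t)addr & (MSIX_PAGE_SIZE - 1) & ~0x3u;
}
```

For example, a table register value of 0x2001 would select BAR 1 with the table page at offset 0x2000, so a BAR that also carries device-specific registers can keep them below that offset.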
[PATCH v4 34/39] pci: remove pci_register_bar_simple()
Superseded by pci_register_bar_region().

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   17 -----------------
 hw/pci.h |    3 ---
 2 files changed, 0 insertions(+), 20 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index c00cbf8..7a70037 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -903,7 +903,6 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
     r->filtered_size = size;
     r->type = type;
     r->map_func = map_func;
-    r->ram_addr = IO_MEM_UNASSIGNED;
     r->memory = NULL;
 
     wmask = ~(size - 1);
@@ -923,13 +922,6 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
     }
 }
 
-static void pci_simple_bar_mapfunc(PCIDevice *pci_dev, int region_num,
-                                   pcibus_t addr, pcibus_t size, int type)
-{
-    cpu_register_physical_memory(addr, size,
-                                 pci_dev->io_regions[region_num].ram_addr);
-}
-
 static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
                                           pcibus_t addr, pcibus_t size,
                                           int type)
@@ -942,15 +934,6 @@ static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
                                         1);
 }
 
-void pci_register_bar_simple(PCIDevice *pci_dev, int region_num,
-                             pcibus_t size, uint8_t attr, ram_addr_t ram_addr)
-{
-    pci_register_bar(pci_dev, region_num, size,
-                     PCI_BASE_ADDRESS_SPACE_MEMORY | attr,
-                     pci_simple_bar_mapfunc);
-    pci_dev->io_regions[region_num].ram_addr = ram_addr;
-}
-
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
                              uint8_t attr, MemoryRegion *memory)
 {
diff --git a/hw/pci.h b/hw/pci.h
index a95e2ad..25e28b1 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -93,7 +93,6 @@ typedef struct PCIIORegion {
     pcibus_t filtered_size;
     uint8_t type;
     PCIMapIORegionFunc *map_func;
-    ram_addr_t ram_addr;
     MemoryRegion *memory;
     MemoryRegion *address_space;
 } PCIIORegion;
@@ -204,8 +203,6 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name,
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
                             pcibus_t size, uint8_t type,
                             PCIMapIORegionFunc *map_func);
-void pci_register_bar_simple(PCIDevice *pci_dev, int region_num,
-                             pcibus_t size, uint8_t attr, ram_addr_t ram_addr);
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
                              uint8_t attr, MemoryRegion *memory);
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
-- 
1.7.5.3
[PATCH v4 39/39] pci: remove support for pre memory API BARs
Not used anymore.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   33 ++-------------------------------
 1 files changed, 2 insertions(+), 31 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 6547d2b..dc7271a 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -848,18 +848,7 @@ static void pci_unregister_io_regions(PCIDevice *pci_dev)
         r = &pci_dev->io_regions[i];
         if (!r->size || r->addr == PCI_BAR_UNMAPPED)
             continue;
-        if (r->memory) {
-            memory_region_del_subregion(r->address_space, r->memory);
-        } else {
-            if (r->type == PCI_BASE_ADDRESS_SPACE_IO) {
-                isa_unassign_ioport(r->addr, r->filtered_size);
-            } else {
-                cpu_register_physical_memory(pci_to_cpu_addr(pci_dev->bus,
-                                                             r->addr),
-                                             r->filtered_size,
-                                             IO_MEM_UNASSIGNED);
-            }
-        }
+        memory_region_del_subregion(r->address_space, r->memory);
     }
 }
 
@@ -1058,25 +1047,7 @@ static void pci_update_mappings(PCIDevice *d)
 
         /* now do the real mapping */
         if (r->addr != PCI_BAR_UNMAPPED) {
-            if (r->memory) {
-                memory_region_del_subregion(r->address_space, r->memory);
-            } else if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
-                int class;
-                /* NOTE: specific hack for IDE in PC case:
-                   only one byte must be mapped. */
-                class = pci_get_word(d->config + PCI_CLASS_DEVICE);
-                if (class == 0x0101 && r->size == 4) {
-                    isa_unassign_ioport(r->addr + 2, 1);
-                } else {
-                    isa_unassign_ioport(r->addr, r->filtered_size);
-                }
-            } else {
-                cpu_register_physical_memory(pci_to_cpu_addr(d->bus,
-                                                             r->addr),
-                                             r->filtered_size,
-                                             IO_MEM_UNASSIGNED);
-                qemu_unregister_coalesced_mmio(r->addr, r->filtered_size);
-            }
+            memory_region_del_subregion(r->address_space, r->memory);
         }
         r->addr = new_addr;
         r->filtered_size = filtered_size;
-- 
1.7.5.3
[PATCH v4 23/39] lsi53c895a: convert to memory API
An optimization that fast-pathed DMA reads from the SCRIPTS memory was removed in the process. Likely it breaks with iommus anyway.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/lsi53c895a.c |  258 ---
 1 files changed, 56 insertions(+), 202 deletions(-)

diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index e9904c4..0ab8c78 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -185,9 +185,9 @@ typedef struct lsi_request {
 
 typedef struct {
     PCIDevice dev;
-    int mmio_io_addr;
-    int ram_io_addr;
-    uint32_t script_ram_base;
+    MemoryRegion mmio_io;
+    MemoryRegion ram_io;
+    MemoryRegion io_io;
 
     int carry; /* ??? Should this be an a visible register somewhere?  */
     int status;
@@ -391,10 +391,9 @@ static inline uint32_t read_dword(LSIState *s, uint32_t addr)
 {
     uint32_t buf;
 
-    /* Optimize reading from SCRIPTS RAM.  */
-    if ((addr & 0xffffe000) == s->script_ram_base) {
-        return s->script_ram[(addr & 0x1fff) >> 2];
-    }
+    /* XXX: an optimization here used to fast-path the read from scripts
+     * memory.  But that bypasses any iommu.
+     */
     cpu_physical_memory_read(addr, (uint8_t *)&buf, 4);
     return cpu_to_le32(buf);
 }
@@ -1899,232 +1898,90 @@ static void lsi_reg_writeb(LSIState *s, int offset, uint8_t val)
 #undef CASE_SET_REG32
 }
 
-static void lsi_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+static void lsi_mmio_write(void *opaque, target_phys_addr_t addr,
+                           uint64_t val, unsigned size)
 {
     LSIState *s = opaque;
 
     lsi_reg_writeb(s, addr & 0xff, val);
 }
 
-static void lsi_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-    LSIState *s = opaque;
-
-    addr &= 0xff;
-    lsi_reg_writeb(s, addr, val & 0xff);
-    lsi_reg_writeb(s, addr + 1, (val >> 8) & 0xff);
-}
-
-static void lsi_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-    LSIState *s = opaque;
-
-    addr &= 0xff;
-    lsi_reg_writeb(s, addr, val & 0xff);
-    lsi_reg_writeb(s, addr + 1, (val >> 8) & 0xff);
-    lsi_reg_writeb(s, addr + 2, (val >> 16) & 0xff);
-    lsi_reg_writeb(s, addr + 3, (val >> 24) & 0xff);
-}
-
-static uint32_t lsi_mmio_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t lsi_mmio_read(void *opaque, target_phys_addr_t addr,
+                              unsigned size)
 {
     LSIState *s = opaque;
 
     return lsi_reg_readb(s, addr & 0xff);
 }
 
-static uint32_t lsi_mmio_readw(void *opaque, target_phys_addr_t addr)
-{
-    LSIState *s = opaque;
-    uint32_t val;
-
-    addr &= 0xff;
-    val = lsi_reg_readb(s, addr);
-    val |= lsi_reg_readb(s, addr + 1) << 8;
-    return val;
-}
-
-static uint32_t lsi_mmio_readl(void *opaque, target_phys_addr_t addr)
-{
-    LSIState *s = opaque;
-    uint32_t val;
-    addr &= 0xff;
-    val = lsi_reg_readb(s, addr);
-    val |= lsi_reg_readb(s, addr + 1) << 8;
-    val |= lsi_reg_readb(s, addr + 2) << 16;
-    val |= lsi_reg_readb(s, addr + 3) << 24;
-    return val;
-}
-
-static CPUReadMemoryFunc * const lsi_mmio_readfn[3] = {
-    lsi_mmio_readb,
-    lsi_mmio_readw,
-    lsi_mmio_readl,
-};
-
-static CPUWriteMemoryFunc * const lsi_mmio_writefn[3] = {
-    lsi_mmio_writeb,
-    lsi_mmio_writew,
-    lsi_mmio_writel,
+static const MemoryRegionOps lsi_mmio_ops = {
+    .read = lsi_mmio_read,
+    .write = lsi_mmio_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 1,
+    },
 };
 
-static void lsi_ram_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+static void lsi_ram_write(void *opaque, target_phys_addr_t addr,
+                          uint64_t val, unsigned size)
 {
     LSIState *s = opaque;
     uint32_t newval;
+    uint32_t mask;
     int shift;
 
-    addr &= 0x1fff;
     newval = s->script_ram[addr >> 2];
     shift = (addr & 3) * 8;
-    newval &= ~(0xff << shift);
+    mask = ((uint64_t)1 << (size * 8)) - 1;
+    newval &= ~(mask << shift);
     newval |= val << shift;
     s->script_ram[addr >> 2] = newval;
 }
 
-static void lsi_ram_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-    LSIState *s = opaque;
-    uint32_t newval;
-
-    addr &= 0x1fff;
-    newval = s->script_ram[addr >> 2];
-    if (addr & 2) {
-        newval = (newval & 0xffff) | (val << 16);
-    } else {
-        newval = (newval & 0xffff0000) | val;
-    }
-    s->script_ram[addr >> 2] = newval;
-}
-
-
-static void lsi_ram_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-    LSIState *s = opaque;
-
-    addr &= 0x1fff;
-    s->script_ram[addr >> 2] = val;
-}
-
-static uint32_t lsi_ram_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t lsi_ram_read(void *opaque, target_phys_addr_t addr,
+                             unsigned size)
 {
     LSIState *s = opaque;
     uint32_t val;
+    uint32_t mask;
 
-    addr &=
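The unified lsi_ram_write() replaces three width-specific writers with a single mask-and-shift merge into a 32-bit word. The sketch below shows that merge in isolation, assuming naturally aligned accesses of 1, 2 or 4 bytes. The function name ram_write_merge() is hypothetical, and unlike the patch it also masks val before shifting, which is an extra precaution added here rather than something the patch does.

```c
#include <stdint.h>

/* Hypothetical standalone version of the merge: write `size` bytes of
 * `val` at byte offset `addr` into an array of 32-bit words, leaving
 * the other bytes of the affected word untouched. Assumes size is 1, 2
 * or 4 and that the access does not cross a word boundary. */
void ram_write_merge(uint32_t *ram, uint32_t addr, uint64_t val,
                     unsigned size)
{
    uint32_t newval = ram[addr >> 2];
    unsigned shift = (addr & 3) * 8;
    /* Computing the mask in 64 bits keeps the size == 4 case defined. */
    uint32_t mask = (uint32_t)((((uint64_t)1) << (size * 8)) - 1);

    newval &= ~(mask << shift);                 /* clear the target bytes */
    newval |= ((uint32_t)val & mask) << shift;  /* insert the new bytes */
    ram[addr >> 2] = newval;
}
```

Starting from a word holding 0x11223344, a 1-byte write of 0xaa at offset 1 gives 0x1122aa44, and a following 2-byte write of 0xbeef at offset 2 gives 0xbeefaa44.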
[PATCH v4 36/39] pci: remove pci_register_bar()
Superseded by pci_register_bar_region(). The implementations are folded together.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   42 +++++++++++++++++-------------------------
 hw/pci.h |    3 ---
 2 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index f885d4e..62b34d4 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -881,13 +881,25 @@ static int pci_unregister_device(DeviceState *dev)
     return 0;
 }
 
-void pci_register_bar(PCIDevice *pci_dev, int region_num,
-                            pcibus_t size, uint8_t type,
-                            PCIMapIORegionFunc *map_func)
+static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
+                                          pcibus_t addr, pcibus_t size,
+                                          int type)
+{
+    PCIIORegion *r = &pci_dev->io_regions[region_num];
+
+    memory_region_add_subregion_overlap(r->address_space,
+                                        addr,
+                                        r->memory,
+                                        1);
+}
+
+void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
+                             uint8_t type, MemoryRegion *memory)
 {
     PCIIORegion *r;
     uint32_t addr;
     uint64_t wmask;
+    pcibus_t size = memory_region_size(memory);
 
     assert(region_num >= 0);
     assert(region_num < PCI_NUM_REGIONS);
@@ -902,7 +914,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
     r->size = size;
     r->filtered_size = size;
     r->type = type;
-    r->map_func = map_func;
+    r->map_func = pci_simple_bar_mapfunc_region;
     r->memory = NULL;
 
     wmask = ~(size - 1);
@@ -920,29 +932,9 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
         pci_set_long(pci_dev->wmask + addr, wmask & 0xffffffff);
         pci_set_long(pci_dev->cmask + addr, 0xffffffff);
     }
-}
-
-static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
-                                          pcibus_t addr, pcibus_t size,
-                                          int type)
-{
-    PCIIORegion *r = &pci_dev->io_regions[region_num];
-
-    memory_region_add_subregion_overlap(r->address_space,
-                                        addr,
-                                        r->memory,
-                                        1);
-}
-
-void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
-                             uint8_t attr, MemoryRegion *memory)
-{
-    pci_register_bar(pci_dev, region_num, memory_region_size(memory),
-                     attr,
-                     pci_simple_bar_mapfunc_region);
     pci_dev->io_regions[region_num].memory = memory;
     pci_dev->io_regions[region_num].address_space
-        = attr & PCI_BASE_ADDRESS_SPACE_IO
+        = type & PCI_BASE_ADDRESS_SPACE_IO
         ? pci_dev->bus->address_space_io
         : pci_dev->bus->address_space_mem;
 }
diff --git a/hw/pci.h b/hw/pci.h
index 6e2bcea..8028176 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -201,9 +201,6 @@ PCIDevice *pci_register_device(PCIBus *bus, const char *name,
                                PCIConfigReadFunc *config_read,
                                PCIConfigWriteFunc *config_write);
 
-void pci_register_bar(PCIDevice *pci_dev, int region_num,
-                            pcibus_t size, uint8_t type,
-                            PCIMapIORegionFunc *map_func);
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
                              uint8_t attr, MemoryRegion *memory);
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num);
-- 
1.7.5.3
[PATCH v4 37/39] pci: fold BAR mapping function into its caller
There is only one function, so no need for a function pointer.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   25
 hw/pci.h |    1
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 62b34d4..aa17395 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -881,18 +881,6 @@ static int pci_unregister_device(DeviceState *dev)
     return 0;
 }
 
-static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
-                                          pcibus_t addr, pcibus_t size,
-                                          int type)
-{
-    PCIIORegion *r = &pci_dev->io_regions[region_num];
-
-    memory_region_add_subregion_overlap(r->address_space,
-                                        addr,
-                                        r->memory,
-                                        1);
-}
-
 void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
                              uint8_t type, MemoryRegion *memory)
 {
@@ -914,7 +902,6 @@ void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
     r->size = size;
     r->filtered_size = size;
     r->type = type;
-    r->map_func = pci_simple_bar_mapfunc_region;
     r->memory = NULL;
 
     wmask = ~(size - 1);
@@ -1102,10 +1089,16 @@ static void pci_update_mappings(PCIDevice *d)
              * addr & (size - 1) != 0.
              */
             if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
-                r->map_func(d, i, r->addr, r->filtered_size, r->type);
+                memory_region_add_subregion_overlap(r->address_space,
+                                                    r->addr,
+                                                    r->memory,
+                                                    1);
             } else {
-                r->map_func(d, i, pci_to_cpu_addr(d->bus, r->addr),
-                            r->filtered_size, r->type);
+                memory_region_add_subregion_overlap(r->address_space,
+                                                    pci_to_cpu_addr(d->bus,
+                                                                    r->addr),
+                                                    r->memory,
+                                                    1);
             }
         }
     }
diff --git a/hw/pci.h b/hw/pci.h
index 8028176..8d1662a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -92,7 +92,6 @@ typedef struct PCIIORegion {
     pcibus_t size;
     pcibus_t filtered_size;
     uint8_t type;
-    PCIMapIORegionFunc *map_func;
     MemoryRegion *memory;
     MemoryRegion *address_space;
 } PCIIORegion;
--
1.7.5.3
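The refactoring in this patch can be illustrated outside QEMU. Below is a minimal sketch, assuming a toy region type: SketchRegion, sketch_to_cpu_addr and sketch_update_mapping are hypothetical names, and the 0x80000000 offset is an arbitrary stand-in for whatever pci_to_cpu_addr() does on a real bus. Once every region's callback pointed at the same function, the pointer carried no information, so the body can live directly in the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a BAR; this is not the real PCIIORegion. */
typedef struct SketchRegion {
    uint64_t addr;      /* address the guest programmed */
    int      is_io;     /* nonzero for I/O space BARs */
    uint64_t mapped_at; /* where the region ended up mapped */
} SketchRegion;

/* Hypothetical bus translation, playing the role of pci_to_cpu_addr(). */
static uint64_t sketch_to_cpu_addr(uint64_t bus_addr)
{
    return bus_addr + 0x80000000ULL;
}

/* Before: each region carried a callback pointer that always pointed at
 * the same function. After: that single body is written out here. */
static void sketch_update_mapping(SketchRegion *r)
{
    assert(r != NULL);
    if (r->is_io) {
        r->mapped_at = r->addr;
    } else {
        r->mapped_at = sketch_to_cpu_addr(r->addr);
    }
}
```

Dropping the pointer also shrinks the per-region struct by one field, which matches the hw/pci.h hunk above.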
[PATCH v4 30/39] ehci: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/usb-ehci.c |   36
 1 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/hw/usb-ehci.c b/hw/usb-ehci.c
index 2b43895..6ef7798 100644
--- a/hw/usb-ehci.c
+++ b/hw/usb-ehci.c
@@ -370,8 +370,7 @@ struct EHCIState {
     PCIDevice dev;
     USBBus bus;
     qemu_irq irq;
-    target_phys_addr_t mem_base;
-    int mem;
+    MemoryRegion mem;
     int companion_count;
 
     /* properties */
@@ -2179,29 +2178,15 @@ static void ehci_frame_timer(void *opaque)
     qemu_mod_timer(ehci->frame_timer, expire_time);
 }
 
-static CPUReadMemoryFunc *ehci_readfn[3]={
-    ehci_mem_readb,
-    ehci_mem_readw,
-    ehci_mem_readl
-};
 
-static CPUWriteMemoryFunc *ehci_writefn[3]={
-    ehci_mem_writeb,
-    ehci_mem_writew,
-    ehci_mem_writel
+static const MemoryRegionOps ehci_mem_ops = {
+    .old_mmio = {
+        .read = { ehci_mem_readb, ehci_mem_readw, ehci_mem_readl },
+        .write = { ehci_mem_writeb, ehci_mem_writew, ehci_mem_writel },
+    },
+    .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void ehci_map(PCIDevice *pci_dev, int region_num,
-                     pcibus_t addr, pcibus_t size, int type)
-{
-    EHCIState *s =(EHCIState *)pci_dev;
-
-    DPRINTF("ehci_map: region %d, addr %08" PRIx64 ", size %" PRId64 ", s->mem %08X\n",
-            region_num, addr, size, s->mem);
-    s->mem_base = addr;
-    cpu_register_physical_memory(addr, size, s->mem);
-}
-
 static int usb_ehci_initfn(PCIDevice *dev);
 
 static USBPortOps ehci_port_ops = {
@@ -2316,11 +2301,8 @@ static int usb_ehci_initfn(PCIDevice *dev)
 
     qemu_register_reset(ehci_reset, s);
 
-    s->mem = cpu_register_io_memory(ehci_readfn, ehci_writefn, s,
-                                    DEVICE_LITTLE_ENDIAN);
-
-    pci_register_bar(&s->dev, 0, MMIO_SIZE, PCI_BASE_ADDRESS_SPACE_MEMORY,
-                     ehci_map);
+    memory_region_init_io(&s->mem, &ehci_mem_ops, s, "ehci", MMIO_SIZE);
+    pci_register_bar_region(&s->dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mem);
 
     fprintf(stderr, "*** EHCI support is under development ***\n");
 
--
1.7.5.3
[PATCH v4 35/39] pci: convert pci rom to memory API
Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/pci.c |   20
 hw/pci.h |    3
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 7a70037..f885d4e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1855,11 +1855,6 @@ static uint8_t pci_find_capability_list(PCIDevice *pdev, uint8_t cap_id,
     return next;
 }
 
-static void pci_map_option_rom(PCIDevice *pdev, int region_num, pcibus_t addr, pcibus_t size, int type)
-{
-    cpu_register_physical_memory(addr, size, pdev->rom_offset);
-}
-
 /* Patch the PCI vendor and device ids in a PCI rom image if necessary.
    This is needed for an option rom which is used for more than one device. */
 static void pci_patch_ids(PCIDevice *pdev, uint8_t *ptr, int size)
@@ -1963,9 +1958,9 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom)
         snprintf(name, sizeof(name), "%s.rom", pdev->qdev.info->vmsd->name);
     else
         snprintf(name, sizeof(name), "%s.rom", pdev->qdev.info->name);
-    pdev->rom_offset = qemu_ram_alloc(&pdev->qdev, name, size);
-
-    ptr = qemu_get_ram_ptr(pdev->rom_offset);
+    pdev->has_rom = true;
+    memory_region_init_ram(&pdev->rom, &pdev->qdev, name, size);
+    ptr = memory_region_get_ram_ptr(&pdev->rom);
     load_image(path, ptr);
     qemu_free(path);
 
@@ -1976,19 +1971,18 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom)
 
     qemu_put_ram_ptr(ptr);
 
-    pci_register_bar(pdev, PCI_ROM_SLOT, size,
-                     0, pci_map_option_rom);
+    pci_register_bar_region(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
 
     return 0;
 }
 
 static void pci_del_option_rom(PCIDevice *pdev)
 {
-    if (!pdev->rom_offset)
+    if (!pdev->has_rom)
         return;
 
-    qemu_ram_free(pdev->rom_offset);
-    pdev->rom_offset = 0;
+    memory_region_destroy(&pdev->rom);
+    pdev->has_rom = false;
 }
 
 /*
diff --git a/hw/pci.h b/hw/pci.h
index 25e28b1..6e2bcea 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -191,7 +191,8 @@ struct PCIDevice {
     /* Location of option rom */
     char *romfile;
-    ram_addr_t rom_offset;
+    bool has_rom;
+    MemoryRegion rom;
     uint32_t rom_bar;
 };
--
1.7.5.3
[PATCH v4 31/39] uhci: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/usb-uhci.c |   42
 1 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c
index 824e3a5..ea38169 100644
--- a/hw/usb-uhci.c
+++ b/hw/usb-uhci.c
@@ -129,6 +129,7 @@ typedef struct UHCIPort {
 
 struct UHCIState {
     PCIDevice dev;
+    MemoryRegion io_bar;
     USBBus bus; /* Note unused when we're a companion controller */
     uint16_t cmd; /* cmd register */
     uint16_t status;
@@ -1096,18 +1097,19 @@ static void uhci_frame_timer(void *opaque)
     qemu_mod_timer(s->frame_timer, s->expire_time);
 }
 
-static void uhci_map(PCIDevice *pci_dev, int region_num,
-                    pcibus_t addr, pcibus_t size, int type)
-{
-    UHCIState *s = (UHCIState *)pci_dev;
-
-    register_ioport_write(addr, 32, 2, uhci_ioport_writew, s);
-    register_ioport_read(addr, 32, 2, uhci_ioport_readw, s);
-    register_ioport_write(addr, 32, 4, uhci_ioport_writel, s);
-    register_ioport_read(addr, 32, 4, uhci_ioport_readl, s);
-    register_ioport_write(addr, 32, 1, uhci_ioport_writeb, s);
-    register_ioport_read(addr, 32, 1, uhci_ioport_readb, s);
-}
+static const MemoryRegionPortio uhci_portio[] = {
+    { 0, 32, 2, .write = uhci_ioport_writew, },
+    { 0, 32, 2, .read = uhci_ioport_readw, },
+    { 0, 32, 4, .write = uhci_ioport_writel, },
+    { 0, 32, 4, .read = uhci_ioport_readl, },
+    { 0, 32, 1, .write = uhci_ioport_writeb, },
+    { 0, 32, 1, .read = uhci_ioport_readb, },
+    PORTIO_END_OF_LIST()
+};
+
+static const MemoryRegionOps uhci_ioport_ops = {
+    .old_portio = uhci_portio,
+};
 
 static USBPortOps uhci_port_ops = {
     .attach = uhci_attach,
@@ -1154,10 +1156,11 @@ static int usb_uhci_common_initfn(PCIDevice *dev)
 
     qemu_register_reset(uhci_reset, s);
 
+    memory_region_init_io(&s->io_bar, &uhci_ioport_ops, s, "uhci", 0x20);
     /* Use region 4 for consistency with real hardware.  BSD guests seem
        to rely on this.  */
-    pci_register_bar(&s->dev, 4, 0x20,
-                     PCI_BASE_ADDRESS_SPACE_IO, uhci_map);
+    pci_register_bar_region(&s->dev, 4,
+                            PCI_BASE_ADDRESS_SPACE_IO, &s->io_bar);
 
     return 0;
 }
@@ -1177,6 +1180,14 @@ static int usb_uhci_vt82c686b_initfn(PCIDevice *dev)
     return usb_uhci_common_initfn(dev);
 }
 
+static int usb_uhci_exit(PCIDevice *dev)
+{
+    UHCIState *s = DO_UPCAST(UHCIState, dev, dev);
+
+    memory_region_destroy(&s->io_bar);
+    return 0;
+}
+
 static Property uhci_properties[] = {
     DEFINE_PROP_STRING("masterbus", UHCIState, masterbus),
     DEFINE_PROP_UINT32("firstport", UHCIState, firstport, 0),
@@ -1189,6 +1200,7 @@ static PCIDeviceInfo uhci_info[] = {
         .qdev.size    = sizeof(UHCIState),
         .qdev.vmsd    = &vmstate_uhci,
         .init         = usb_uhci_common_initfn,
+        .exit         = usb_uhci_exit,
         .vendor_id    = PCI_VENDOR_ID_INTEL,
         .device_id    = PCI_DEVICE_ID_INTEL_82371SB_2,
         .revision     = 0x01,
@@ -1199,6 +1211,7 @@ static PCIDeviceInfo uhci_info[] = {
         .qdev.size    = sizeof(UHCIState),
         .qdev.vmsd    = &vmstate_uhci,
         .init         = usb_uhci_common_initfn,
+        .exit         = usb_uhci_exit,
         .vendor_id    = PCI_VENDOR_ID_INTEL,
         .device_id    = PCI_DEVICE_ID_INTEL_82371AB_2,
         .revision     = 0x01,
@@ -1209,6 +1222,7 @@ static PCIDeviceInfo uhci_info[] = {
         .qdev.size    = sizeof(UHCIState),
         .qdev.vmsd    = &vmstate_uhci,
         .init         = usb_uhci_vt82c686b_initfn,
+        .exit         = usb_uhci_exit,
         .vendor_id    = PCI_VENDOR_ID_VIA,
         .device_id    = PCI_DEVICE_ID_VIA_UHCI,
         .revision     = 0x01,
--
1.7.5.3
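The uhci_portio table in this patch replaces six register_ioport_*() calls with rows of (offset, length, access size, handler). How QEMU walks such a table is not shown in the patch; the following is only a toy sketch of the idea, with hypothetical names (SketchPortio, sketch_find, sketch_read) and an assumed all-ones result for accesses no row claims.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*SketchReadFn)(uint32_t offset);

/* One row: offsets [offset, offset + len) accessed with `size` bytes. */
typedef struct SketchPortio {
    uint32_t offset;
    uint32_t len;
    unsigned size;      /* 0 terminates the table */
    SketchReadFn read;
} SketchPortio;

/* Dummy handlers that make it visible which row was chosen. */
static uint32_t sketch_readb(uint32_t offset) { return 0x10u + offset; }
static uint32_t sketch_readw(uint32_t offset) { return 0x2000u + offset; }

static const SketchPortio sketch_table[] = {
    { 0, 32, 1, sketch_readb },
    { 0, 32, 2, sketch_readw },
    { 0, 0, 0, NULL },  /* end of list */
};

/* Find the row that covers this offset and access size, or NULL. */
static const SketchPortio *sketch_find(const SketchPortio *t,
                                       uint32_t offset, unsigned size)
{
    for (; t->size != 0; t++) {
        if (t->size == size && offset >= t->offset &&
            offset - t->offset < t->len) {
            return t;
        }
    }
    return NULL;
}

/* Dispatch a read; unclaimed accesses read as all ones in this sketch. */
static uint32_t sketch_read(uint32_t offset, unsigned size)
{
    const SketchPortio *row = sketch_find(sketch_table, offset, size);
    return row ? row->read(offset - row->offset) : 0xffffffffu;
}
```

Because offsets in the table are relative to the start of the region, the same table works wherever the guest places the BAR, which is what lets the per-map callback disappear.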
[PATCH v4 18/39] ide: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/ide/cmd646.c | 208 +++ hw/ide/pci.c| 25 --- hw/ide/pci.h| 19 - hw/ide/piix.c | 64 + hw/ide/via.c| 65 + 5 files changed, 261 insertions(+), 120 deletions(-) diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c index 56302b5..13e6f2f 100644 --- a/hw/ide/cmd646.c +++ b/hw/ide/cmd646.c @@ -44,35 +44,95 @@ static void cmd646_update_irq(PCIIDEState *d); -static void ide_map(PCIDevice *pci_dev, int region_num, -pcibus_t addr, pcibus_t size, int type) +static uint64_t cmd646_cmd_read(void *opaque, target_phys_addr_t addr, +unsigned size) { -PCIIDEState *d = DO_UPCAST(PCIIDEState, dev, pci_dev); -IDEBus *bus; - -if (region_num = 3) { -bus = d-bus[(region_num 1)]; -if (region_num 1) { -register_ioport_read(addr + 2, 1, 1, ide_status_read, bus); -register_ioport_write(addr + 2, 1, 1, ide_cmd_write, bus); +CMD646BAR *cmd646bar = opaque; + +if (addr != 2 || size != 1) { +return ((uint64_t)1 (size * 8)) - 1; +} +return ide_status_read(cmd646bar-bus, addr + 2); +} + +static void cmd646_cmd_write(void *opaque, target_phys_addr_t addr, + uint64_t data, unsigned size) +{ +CMD646BAR *cmd646bar = opaque; + +if (addr != 2 || size != 1) { +return; +} +ide_cmd_write(cmd646bar-bus, addr + 2, data); +} + +static MemoryRegionOps cmd646_cmd_ops = { +.read = cmd646_cmd_read, +.write = cmd646_cmd_write, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + +static uint64_t cmd646_data_read(void *opaque, target_phys_addr_t addr, + unsigned size) +{ +CMD646BAR *cmd646bar = opaque; + +if (size == 1) { +return ide_ioport_read(cmd646bar-bus, addr); +} else if (addr == 0) { +if (size == 2) { +return ide_data_readw(cmd646bar-bus, addr); } else { -register_ioport_write(addr, 8, 1, ide_ioport_write, bus); -register_ioport_read(addr, 8, 1, ide_ioport_read, bus); - -/* data ports */ -register_ioport_write(addr, 2, 2, ide_data_writew, bus); -register_ioport_read(addr, 2, 2, 
ide_data_readw, bus); -register_ioport_write(addr, 4, 4, ide_data_writel, bus); -register_ioport_read(addr, 4, 4, ide_data_readl, bus); +return ide_data_readl(cmd646bar-bus, addr); } } +return ((uint64_t)1 (size * 8)) - 1; } -static uint32_t bmdma_readb_common(PCIIDEState *pci_dev, BMDMAState *bm, - uint32_t addr) +static void cmd646_data_write(void *opaque, target_phys_addr_t addr, + uint64_t data, unsigned size) { +CMD646BAR *cmd646bar = opaque; + +if (size == 1) { +return ide_ioport_write(cmd646bar-bus, addr, data); +} else if (addr == 0) { +if (size == 2) { +return ide_data_writew(cmd646bar-bus, addr, data); +} else { +return ide_data_writel(cmd646bar-bus, addr, data); +} +} +} + +static MemoryRegionOps cmd646_data_ops = { +.read = cmd646_data_read, +.write = cmd646_data_write, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + +static void setup_cmd646_bar(PCIIDEState *d, int bus_num) +{ +IDEBus *bus = d-bus[bus_num]; +CMD646BAR *bar = d-cmd646_bar[bus_num]; + +bar-bus = bus; +bar-pci_dev = d; +memory_region_init_io(bar-cmd, cmd646_cmd_ops, bar, cmd646-cmd, 4); +memory_region_init_io(bar-data, cmd646_data_ops, bar, cmd646-data, 8); +} + +static uint64_t bmdma_read(void *opaque, target_phys_addr_t addr, + unsigned size) +{ +BMDMAState *bm = opaque; +PCIIDEState *pci_dev = bm-pci_dev; uint32_t val; +if (size != 1) { +return ((uint64_t)1 (size * 8)) - 1; +} + switch(addr 3) { case 0: val = bm-cmd; @@ -100,31 +160,22 @@ static uint32_t bmdma_readb_common(PCIIDEState *pci_dev, BMDMAState *bm, return val; } -static uint32_t bmdma_readb_0(void *opaque, uint32_t addr) +static void bmdma_write(void *opaque, target_phys_addr_t addr, +uint64_t val, unsigned size) { -PCIIDEState *pci_dev = opaque; -BMDMAState *bm = pci_dev-bmdma[0]; - -return bmdma_readb_common(pci_dev, bm, addr); -} +BMDMAState *bm = opaque; +PCIIDEState *pci_dev = bm-pci_dev; -static uint32_t bmdma_readb_1(void *opaque, uint32_t addr) -{ -PCIIDEState *pci_dev = opaque; -BMDMAState *bm = pci_dev-bmdma[1]; - 
-return bmdma_readb_common(pci_dev, bm, addr); -} +if (size != 1) { +return; +} -static void bmdma_writeb_common(PCIIDEState *pci_dev, BMDMAState *bm,
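The new cmd646 and bmdma read handlers return an expression of the form `((uint64_t)1 (size * 8)) - 1` for accesses they do not handle. The operator between `1` and `(size * 8)` was lost in this archive copy; a left shift is the natural reading, giving an all-ones value as wide as the access. A minimal sketch under that assumption follows. The helper name sketch_all_ones is made up, and the guard for 8-byte accesses is an addition of this sketch, since shifting a 64-bit value by 64 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones value for an access of `size` bytes (1 to 8).
 * The shift form is only defined for size < 8, so 8 is special-cased
 * here; the patch text itself does not show such a guard. */
static uint64_t sketch_all_ones(unsigned size)
{
    assert(size >= 1 && size <= 8);
    if (size == 8) {
        return UINT64_MAX;
    }
    return ((uint64_t)1 << (size * 8)) - 1;
}
```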
[PATCH v4 28/39] isa-mmio: convert to memory API
Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/isa.h      |    2
 hw/isa_mmio.c |   29
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/hw/isa.h b/hw/isa.h
index d2b6126..f1f2181 100644
--- a/hw/isa.h
+++ b/hw/isa.h
@@ -4,6 +4,7 @@
 /* ISA bus */
 
 #include "ioport.h"
+#include "memory.h"
 #include "qdev.h"
 
 typedef struct ISABus ISABus;
@@ -37,6 +38,7 @@ ISADevice *isa_create_simple(const char *name);
 
 extern target_phys_addr_t isa_mem_base;
 
+void isa_mmio_setup(MemoryRegion *mr, target_phys_addr_t size);
 void isa_mmio_init(target_phys_addr_t base, target_phys_addr_t size);
 
 /* dma.c */
diff --git a/hw/isa_mmio.c b/hw/isa_mmio.c
index ca957fb..3d2af1a 100644
--- a/hw/isa_mmio.c
+++ b/hw/isa_mmio.c
@@ -24,6 +24,7 @@
 
 #include "hw.h"
 #include "isa.h"
+#include "exec-memory.h"
 
 static void isa_mmio_writeb (void *opaque, target_phys_addr_t addr,
                              uint32_t val)
@@ -58,25 +59,23 @@ static uint32_t isa_mmio_readl(void *opaque, target_phys_addr_t addr)
     return cpu_inl(addr & IOPORTS_MASK);
 }
 
-static CPUWriteMemoryFunc * const isa_mmio_write[] = {
-    isa_mmio_writeb,
-    isa_mmio_writew,
-    isa_mmio_writel,
+static const MemoryRegionOps isa_mmio_ops = {
+    .old_mmio = {
+        .write = { isa_mmio_writeb, isa_mmio_writew, isa_mmio_writel },
+        .read = { isa_mmio_readb, isa_mmio_readw, isa_mmio_readl, },
+    },
+    .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static CPUReadMemoryFunc * const isa_mmio_read[] = {
-    isa_mmio_readb,
-    isa_mmio_readw,
-    isa_mmio_readl,
-};
+void isa_mmio_setup(MemoryRegion *mr, target_phys_addr_t size)
+{
+    memory_region_init_io(mr, &isa_mmio_ops, NULL, "isa-mmio", size);
+}
 
 void isa_mmio_init(target_phys_addr_t base, target_phys_addr_t size)
 {
-    int isa_mmio_iomemtype;
+    MemoryRegion *mr = qemu_malloc(sizeof(*mr));
 
-    isa_mmio_iomemtype = cpu_register_io_memory(isa_mmio_read,
-                                                isa_mmio_write,
-                                                NULL,
-                                                DEVICE_LITTLE_ENDIAN);
-    cpu_register_physical_memory(base, size, isa_mmio_iomemtype);
+    isa_mmio_setup(mr, size);
+    memory_region_add_subregion(get_system_memory(), base, mr);
 }
--
1.7.5.3
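The `.old_mmio` member used in this patch keeps the legacy layout of three handlers per direction, ordered byte, word, long. A minimal sketch of how a three-entry table like that can be indexed by access size is given below. It is not the QEMU dispatcher: sketch_size_index, sketch_mmio_read and the dummy handlers are hypothetical names, and rejecting other sizes with an assert is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*SketchMmioRead)(uint32_t addr);

/* Dummy handlers returning distinct values so the choice is visible. */
static uint32_t sketch_readb(uint32_t addr) { (void)addr; return 0x11u; }
static uint32_t sketch_readw(uint32_t addr) { (void)addr; return 0x2222u; }
static uint32_t sketch_readl(uint32_t addr) { (void)addr; return 0x44444444u; }

/* Index 0, 1, 2 holds the 1-, 2- and 4-byte handler respectively. */
static const SketchMmioRead sketch_read_tbl[3] = {
    sketch_readb, sketch_readw, sketch_readl,
};

/* Map an access size in bytes to a table index, or -1 if unsupported. */
static int sketch_size_index(unsigned size)
{
    switch (size) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    default: return -1;
    }
}

/* Pick the handler that matches the access size and call it. */
static uint32_t sketch_mmio_read(uint32_t addr, unsigned size)
{
    int idx = sketch_size_index(size);
    assert(idx >= 0);
    return sketch_read_tbl[idx](addr);
}
```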
[PATCH v4 26/39] pcnet: convert to memory API
Also related chips. Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/lance.c | 31 ++- hw/pcnet-pci.c | 74 +-- hw/pcnet.h |4 ++- 3 files changed, 61 insertions(+), 48 deletions(-) diff --git a/hw/lance.c b/hw/lance.c index ddb1cbb..8e20360 100644 --- a/hw/lance.c +++ b/hw/lance.c @@ -55,8 +55,8 @@ static void parent_lance_reset(void *opaque, int irq, int level) pcnet_h_reset(d-state); } -static void lance_mem_writew(void *opaque, target_phys_addr_t addr, - uint32_t val) +static void lance_mem_write(void *opaque, target_phys_addr_t addr, +uint64_t val, unsigned size) { SysBusPCNetState *d = opaque; @@ -64,7 +64,8 @@ static void lance_mem_writew(void *opaque, target_phys_addr_t addr, pcnet_ioport_writew(d-state, addr, val 0x); } -static uint32_t lance_mem_readw(void *opaque, target_phys_addr_t addr) +static uint64_t lance_mem_read(void *opaque, target_phys_addr_t addr, + unsigned size) { SysBusPCNetState *d = opaque; uint32_t val; @@ -74,16 +75,14 @@ static uint32_t lance_mem_readw(void *opaque, target_phys_addr_t addr) return val 0x; } -static CPUReadMemoryFunc * const lance_mem_read[3] = { -NULL, -lance_mem_readw, -NULL, -}; - -static CPUWriteMemoryFunc * const lance_mem_write[3] = { -NULL, -lance_mem_writew, -NULL, +static const MemoryRegionOps lance_mem_ops = { +.read = lance_mem_read, +.write = lance_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +.valid = { +.min_access_size = 2, +.max_access_size = 2, +}, }; static void lance_cleanup(VLANClientState *nc) @@ -117,13 +116,11 @@ static int lance_init(SysBusDevice *dev) SysBusPCNetState *d = FROM_SYSBUS(SysBusPCNetState, dev); PCNetState *s = d-state; -s-mmio_index = -cpu_register_io_memory(lance_mem_read, lance_mem_write, d, - DEVICE_NATIVE_ENDIAN); +memory_region_init_io(s-mmio, lance_mem_ops, s, lance-mmio, 4); qdev_init_gpio_in(dev-qdev, parent_lance_reset, 1); -sysbus_init_mmio(dev, 4, s-mmio_index); 
+sysbus_init_mmio_region(dev, s-mmio); sysbus_init_irq(dev, s-irq); diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c index 216cf81..a25f565 100644 --- a/hw/pcnet-pci.c +++ b/hw/pcnet-pci.c @@ -46,6 +46,7 @@ typedef struct { PCIDevice pci_dev; PCNetState state; +MemoryRegion io_bar; } PCIPCNetState; static void pcnet_aprom_writeb(void *opaque, uint32_t addr, uint32_t val) @@ -69,25 +70,41 @@ static uint32_t pcnet_aprom_readb(void *opaque, uint32_t addr) return val; } -static void pcnet_ioport_map(PCIDevice *pci_dev, int region_num, - pcibus_t addr, pcibus_t size, int type) +static uint64_t pcnet_ioport_read(void *opaque, target_phys_addr_t addr, + unsigned size) { -PCNetState *d = DO_UPCAST(PCIPCNetState, pci_dev, pci_dev)-state; +PCNetState *d = opaque; -#ifdef PCNET_DEBUG_IO -printf(pcnet_ioport_map addr=0x%04FMT_PCIBUS size=0x%04FMT_PCIBUS\n, - addr, size); -#endif +if (addr 16 size == 1) { +return pcnet_aprom_readb(d, addr); +} else if (addr = 0x10 addr 0x20 size == 2) { +return pcnet_ioport_readw(d, addr); +} else if (addr = 0x10 addr 0x20 size == 4) { +return pcnet_ioport_readl(d, addr); +} +return ((uint64_t)1 (size * 8)) - 1; +} -register_ioport_write(addr, 16, 1, pcnet_aprom_writeb, d); -register_ioport_read(addr, 16, 1, pcnet_aprom_readb, d); +static void pcnet_ioport_write(void *opaque, target_phys_addr_t addr, + uint64_t data, unsigned size) +{ +PCNetState *d = opaque; -register_ioport_write(addr + 0x10, 0x10, 2, pcnet_ioport_writew, d); -register_ioport_read(addr + 0x10, 0x10, 2, pcnet_ioport_readw, d); -register_ioport_write(addr + 0x10, 0x10, 4, pcnet_ioport_writel, d); -register_ioport_read(addr + 0x10, 0x10, 4, pcnet_ioport_readl, d); +if (addr 16 size == 1) { +return pcnet_aprom_writeb(d, addr, data); +} else if (addr = 0x10 addr 0x20 size == 2) { +return pcnet_ioport_writew(d, addr, data); +} else if (addr = 0x10 addr 0x20 size == 4) { +return pcnet_ioport_writel(d, addr, data); +} } +static const MemoryRegionOps pcnet_io_ops = { +.read = 
pcnet_ioport_read, +.write = pcnet_ioport_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + static void pcnet_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val) { PCNetState *d = opaque; @@ -202,16 +219,12 @@ static const VMStateDescription vmstate_pci_pcnet = { /* PCI interface */ -static CPUWriteMemoryFunc * const
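The new pcnet_ioport_read() and pcnet_ioport_write() replace per-range port registrations with one handler that routes on offset and access size. The comparison and logical operators were lost in this archive copy (`addr 16 size == 1`), so the conditions below are a reconstruction under the assumption that they read `addr < 16 && size == 1` and `addr >= 0x10 && addr < 0x20 && size == 2` (or `== 4`). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Which legacy handler an access would be routed to in this sketch. */
typedef enum SketchTarget {
    SK_APROM,   /* byte access to the first 16 bytes */
    SK_WORD,    /* 16-bit access to offsets 0x10..0x1f */
    SK_LONG,    /* 32-bit access to offsets 0x10..0x1f */
    SK_NONE     /* nothing claims the access */
} SketchTarget;

/* Route an access by offset within the BAR and by access size. */
static SketchTarget sketch_route(uint64_t addr, unsigned size)
{
    if (addr < 16 && size == 1) {
        return SK_APROM;
    } else if (addr >= 0x10 && addr < 0x20 && size == 2) {
        return SK_WORD;
    } else if (addr >= 0x10 && addr < 0x20 && size == 4) {
        return SK_LONG;
    }
    return SK_NONE;
}
```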
[PATCH v4 29/39] sun4u: convert to memory API
fixes memory leak on repeated BAR map/unmap Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Avi Kivity a...@redhat.com --- hw/sun4u.c | 55 +-- 1 files changed, 25 insertions(+), 30 deletions(-) diff --git a/hw/sun4u.c b/hw/sun4u.c index d7dcaf0..cb76031 100644 --- a/hw/sun4u.c +++ b/hw/sun4u.c @@ -91,6 +91,12 @@ struct hwdef { uint64_t console_serial_base; }; +typedef struct EbusState { +PCIDevice pci_dev; +MemoryRegion bar0; +MemoryRegion bar1; +} EbusState; + int DMA_get_channel_mode (int nchan) { return 0; @@ -518,21 +524,6 @@ void cpu_tick_set_limit(CPUTimer *timer, uint64_t limit) } } -static void ebus_mmio_mapfunc(PCIDevice *pci_dev, int region_num, - pcibus_t addr, pcibus_t size, int type) -{ -EBUS_DPRINTF(Mapping region %d registers at % FMT_PCIBUS \n, - region_num, addr); -switch (region_num) { -case 0: -isa_mmio_init(addr, 0x100); -break; -case 1: -isa_mmio_init(addr, 0x80); -break; -} -} - static void dummy_isa_irq_handler(void *opaque, int n, int level) { } @@ -549,27 +540,31 @@ pci_ebus_init(PCIBus *bus, int devfn) } static int -pci_ebus_init1(PCIDevice *s) +pci_ebus_init1(PCIDevice *pci_dev) { -isa_bus_new(s-qdev); +EbusState *s = DO_UPCAST(EbusState, pci_dev, pci_dev); + +isa_bus_new(pci_dev-qdev); -s-config[0x04] = 0x06; // command = bus master, pci mem -s-config[0x05] = 0x00; -s-config[0x06] = 0xa0; // status = fast back-to-back, 66MHz, no error -s-config[0x07] = 0x03; // status = medium devsel -s-config[0x09] = 0x00; // programming i/f -s-config[0x0D] = 0x0a; // latency_timer +pci_dev-config[0x04] = 0x06; // command = bus master, pci mem +pci_dev-config[0x05] = 0x00; +pci_dev-config[0x06] = 0xa0; // status = fast back-to-back, 66MHz, no error +pci_dev-config[0x07] = 0x03; // status = medium devsel +pci_dev-config[0x09] = 0x00; // programming i/f +pci_dev-config[0x0D] = 0x0a; // latency_timer -pci_register_bar(s, 0, 0x100, PCI_BASE_ADDRESS_SPACE_MEMORY, - ebus_mmio_mapfunc); -pci_register_bar(s, 1, 0x80, 
PCI_BASE_ADDRESS_SPACE_MEMORY, - ebus_mmio_mapfunc); +isa_mmio_setup(s-bar0, 0x100); +pci_register_bar_region(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, +s-bar0); +isa_mmio_setup(s-bar1, 0x80); +pci_register_bar_region(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, +s-bar1); return 0; } static PCIDeviceInfo ebus_info = { .qdev.name = ebus, -.qdev.size = sizeof(PCIDevice), +.qdev.size = sizeof(EbusState), .init = pci_ebus_init1, .vendor_id = PCI_VENDOR_ID_SUN, .device_id = PCI_DEVICE_ID_SUN_EBUS, -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
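The commit message credits this patch with fixing a leak on repeated BAR map/unmap. Presumably the old ebus_mmio_mapfunc() set up a fresh ISA MMIO region through isa_mmio_init() every time the guest remapped the BAR, and nothing released the previous one. The toy sketch below contrasts that shape with regions embedded in the device state. All names are hypothetical, the allocation counter exists only to make the difference observable, and the size 256 is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>

static int sketch_live_allocs;  /* allocations made and not yet freed */

typedef struct SketchRegion { unsigned size; } SketchRegion;

/* Old shape: every map callback allocates a region nobody frees. */
static SketchRegion *sketch_map_allocating(unsigned size)
{
    SketchRegion *mr = malloc(sizeof(*mr));
    assert(mr != NULL);
    mr->size = size;
    sketch_live_allocs++;
    return mr;
}

/* New shape: the region is a member of the device state, set up once. */
typedef struct SketchDevice {
    SketchRegion bar0;
} SketchDevice;

static void sketch_device_init(SketchDevice *d)
{
    d->bar0.size = 256;
}

/* Mapping reuses the embedded region, however often it is called. */
static SketchRegion *sketch_map_embedded(SketchDevice *d)
{
    return &d->bar0;
}
```

With the embedded form, the number of map/unmap cycles no longer affects how much memory the device holds.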
[PATCH v4 27/39] i6300esb: convert to memory API
Also add missing destructor.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/wdt_i6300esb.c |   43
 1 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/hw/wdt_i6300esb.c b/hw/wdt_i6300esb.c
index 53786ce..abc2e17 100644
--- a/hw/wdt_i6300esb.c
+++ b/hw/wdt_i6300esb.c
@@ -66,6 +66,7 @@
 /* Device state. */
 struct I6300State {
     PCIDevice dev;
+    MemoryRegion io_mem;
 
     int reboot_enabled;         /* Reboot on timer expiry.  The real action
                                  * performed depends on the -watchdog-action
@@ -355,6 +356,22 @@ static void i6300esb_mem_writel(void *vp, target_phys_addr_t addr, uint32_t val)
     }
 }
 
+static const MemoryRegionOps i6300esb_ops = {
+    .old_mmio = {
+        .read = {
+            i6300esb_mem_readb,
+            i6300esb_mem_readw,
+            i6300esb_mem_readl,
+        },
+        .write = {
+            i6300esb_mem_writeb,
+            i6300esb_mem_writew,
+            i6300esb_mem_writel,
+        },
+    },
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
 static const VMStateDescription vmstate_i6300esb = {
     .name = "i6300esb_wdt",
     .version_id = sizeof(I6300State),
@@ -381,31 +398,28 @@ static const VMStateDescription vmstate_i6300esb = {
 static int i6300esb_init(PCIDevice *dev)
 {
     I6300State *d = DO_UPCAST(I6300State, dev, dev);
-    int io_mem;
-    static CPUReadMemoryFunc * const mem_read[3] = {
-        i6300esb_mem_readb,
-        i6300esb_mem_readw,
-        i6300esb_mem_readl,
-    };
-    static CPUWriteMemoryFunc * const mem_write[3] = {
-        i6300esb_mem_writeb,
-        i6300esb_mem_writew,
-        i6300esb_mem_writel,
-    };
 
     i6300esb_debug("I6300State = %p\n", d);
 
     d->timer = qemu_new_timer_ns(vm_clock, i6300esb_timer_expired, d);
     d->previous_reboot_flag = 0;
 
-    io_mem = cpu_register_io_memory(mem_read, mem_write, d,
-                                    DEVICE_NATIVE_ENDIAN);
-    pci_register_bar_simple(&d->dev, 0, 0x10, 0, io_mem);
+    memory_region_init_io(&d->io_mem, &i6300esb_ops, d, "i6300esb", 0x10);
+    pci_register_bar_region(&d->dev, 0, 0, &d->io_mem);
     /* qemu_register_coalesced_mmio (addr, 0x10); ? */
 
     return 0;
 }
 
+static int i6300esb_exit(PCIDevice *dev)
+{
+    I6300State *d = DO_UPCAST(I6300State, dev, dev);
+
+    memory_region_destroy(&d->io_mem);
+
+    return 0;
+}
+
 static WatchdogTimerModel model = {
     .wdt_name = "i6300esb",
     .wdt_description = "Intel 6300ESB",
@@ -419,6 +433,7 @@ static PCIDeviceInfo i6300esb_info = {
     .config_read  = i6300esb_config_read,
     .config_write = i6300esb_config_write,
     .init         = i6300esb_init,
+    .exit         = i6300esb_exit,
     .vendor_id    = PCI_VENDOR_ID_INTEL,
     .device_id    = PCI_DEVICE_ID_INTEL_ESB_9,
     .class_id     = PCI_CLASS_SYSTEM_OTHER,
--
1.7.5.3
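The "missing destructor" is the new `.exit` hook that undoes what `.init` set up. The pairing can be sketched outside QEMU as follows. SketchDev, SketchDevInfo and sketch_unplug are hypothetical names, and treating a missing exit hook as a successful no-op is an assumption of this sketch rather than documented qdev behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device: one flag standing in for an initialised memory region. */
typedef struct SketchDev { int region_live; } SketchDev;

/* Toy device description with an init hook and an optional exit hook. */
typedef struct SketchDevInfo {
    int (*init)(SketchDev *d);
    int (*exit)(SketchDev *d);  /* may be NULL */
} SketchDevInfo;

static int sketch_init(SketchDev *d) { d->region_live = 1; return 0; }
static int sketch_exit(SketchDev *d) { d->region_live = 0; return 0; }

static const SketchDevInfo sketch_info = {
    .init = sketch_init,
    .exit = sketch_exit,
};

/* Run the exit hook if the device provides one. */
static int sketch_unplug(const SketchDevInfo *info, SketchDev *d)
{
    return info->exit ? info->exit(d) : 0;
}
```

A device description without an exit hook leaves whatever init created in place after unplug, which is the situation the patch corrects for i6300esb.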
[PATCH v4 19/39] ivshmem: convert to memory API
excluding msix. Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/ivshmem.c | 148 -- 1 files changed, 50 insertions(+), 98 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 3055dd2..f80e7b6 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -56,11 +56,15 @@ typedef struct IVShmemState { CharDriverState **eventfd_chr; CharDriverState *server_chr; -int ivshmem_mmio_io_addr; +MemoryRegion ivshmem_mmio; pcibus_t mmio_addr; -pcibus_t shm_pci_addr; -uint64_t ivshmem_offset; +/* We might need to register the BAR before we actually have the memory. + * So prepare a container MemoryRegion for the BAR immediately and + * add a subregion when we have the memory. + */ +MemoryRegion bar; +MemoryRegion ivshmem; uint64_t ivshmem_size; /* size of shared memory region */ int shm_fd; /* shared memory file descriptor */ @@ -96,23 +100,6 @@ static inline bool is_power_of_two(uint64_t x) { return (x (x - 1)) == 0; } -static void ivshmem_map(PCIDevice *pci_dev, int region_num, -pcibus_t addr, pcibus_t size, int type) -{ -IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev); - -s-shm_pci_addr = addr; - -if (s-ivshmem_offset 0) { -cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size, -s-ivshmem_offset); -} - -IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS , guest h/w addr = % -PRIu64 , size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size); - -} - /* accessing registers - based on rtl8139 */ static void ivshmem_update_irq(IVShmemState *s, int val) { @@ -168,15 +155,8 @@ static uint32_t ivshmem_IntrStatus_read(IVShmemState *s) return ret; } -static void ivshmem_io_writew(void *opaque, target_phys_addr_t addr, -uint32_t val) -{ - -IVSHMEM_DPRINTF(We shouldn't be writing words\n); -} - -static void ivshmem_io_writel(void *opaque, target_phys_addr_t addr, -uint32_t val) +static void ivshmem_io_write(void *opaque, target_phys_addr_t addr, + uint64_t val, unsigned size) { IVShmemState 
*s = opaque; @@ -219,20 +199,8 @@ static void ivshmem_io_writel(void *opaque, target_phys_addr_t addr, } } -static void ivshmem_io_writeb(void *opaque, target_phys_addr_t addr, -uint32_t val) -{ -IVSHMEM_DPRINTF(We shouldn't be writing bytes\n); -} - -static uint32_t ivshmem_io_readw(void *opaque, target_phys_addr_t addr) -{ - -IVSHMEM_DPRINTF(We shouldn't be reading words\n); -return 0; -} - -static uint32_t ivshmem_io_readl(void *opaque, target_phys_addr_t addr) +static uint64_t ivshmem_io_read(void *opaque, target_phys_addr_t addr, +unsigned size) { IVShmemState *s = opaque; @@ -265,23 +233,14 @@ static uint32_t ivshmem_io_readl(void *opaque, target_phys_addr_t addr) return ret; } -static uint32_t ivshmem_io_readb(void *opaque, target_phys_addr_t addr) -{ -IVSHMEM_DPRINTF(We shouldn't be reading bytes\n); - -return 0; -} - -static CPUReadMemoryFunc * const ivshmem_mmio_read[3] = { -ivshmem_io_readb, -ivshmem_io_readw, -ivshmem_io_readl, -}; - -static CPUWriteMemoryFunc * const ivshmem_mmio_write[3] = { -ivshmem_io_writeb, -ivshmem_io_writew, -ivshmem_io_writel, +static const MemoryRegionOps ivshmem_mmio_ops = { +.read = ivshmem_io_read, +.write = ivshmem_io_write, +.endianness = DEVICE_NATIVE_ENDIAN, +.impl = { +.min_access_size = 4, +.max_access_size = 4, +}, }; static void ivshmem_receive(void *opaque, const uint8_t *buf, int size) @@ -371,12 +330,12 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { ptr = mmap(0, s-ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); -s-ivshmem_offset = qemu_ram_alloc_from_ptr(s-dev.qdev, ivshmem.bar2, -s-ivshmem_size, ptr); +memory_region_init_ram_ptr(s-ivshmem, s-dev.qdev, ivshmem.bar2, + s-ivshmem_size, ptr); +memory_region_add_subregion(s-bar, 0, s-ivshmem); /* region for shared memory */ -pci_register_bar(s-dev, 2, s-ivshmem_size, -PCI_BASE_ADDRESS_SPACE_MEMORY, ivshmem_map); +pci_register_bar_region(s-dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, s-bar); } static void close_guest_eventfds(IVShmemState *s, 
int posn) @@ -401,8 +360,12 @@ static void setup_ioeventfds(IVShmemState *s) { for (i = 0; i = s-max_peer; i++) { for (j = 0; j s-peers[i].nb_eventfds; j++) { -
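The comment added to IVShmemState explains the design: the BAR may have to be registered before the shared memory exists, so an empty container region is registered right away and the RAM region is added as a subregion once it arrives. A toy model of that ordering is sketched below. It is not the QEMU memory API: the types and functions are hypothetical, and reading 0xff from an offset with no backing is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy backing store, standing in for the mmap()ed shared memory. */
typedef struct SketchRam { uint8_t *ptr; size_t size; } SketchRam;

/* Container registered as the BAR; backing may be attached later. */
typedef struct SketchContainer {
    size_t size;
    const SketchRam *sub;   /* NULL until the memory arrives */
} SketchContainer;

static void sketch_container_init(SketchContainer *c, size_t size)
{
    c->size = size;
    c->sub = NULL;
}

/* Attach the backing memory at offset 0 of the container. */
static void sketch_container_attach(SketchContainer *c, const SketchRam *ram)
{
    assert(ram->size <= c->size);
    c->sub = ram;
}

/* Read one byte; offsets with no backing read as 0xff in this sketch. */
static uint8_t sketch_container_read(const SketchContainer *c, size_t off)
{
    if (c->sub != NULL && off < c->sub->size) {
        return c->sub->ptr[off];
    }
    return 0xff;
}
```

The guest-visible BAR stays the same object throughout; only what sits inside it changes when the memory shows up.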
[PATCH v4 16/39] eepro100: convert to memory API
Note: the existing code aliases the flash BAR into the MMIO bar. This is probably a bug. This patch does not correct the problem. Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/eepro100.c | 182 - 1 files changed, 37 insertions(+), 145 deletions(-) diff --git a/hw/eepro100.c b/hw/eepro100.c index 9b6f4a5..04723f3 100644 --- a/hw/eepro100.c +++ b/hw/eepro100.c @@ -228,13 +228,14 @@ typedef struct { PCIDevice dev; /* Hash register (multicast mask array, multiple individual addresses). */ uint8_t mult[8]; -int mmio_index; +MemoryRegion mmio_bar; +MemoryRegion io_bar; +MemoryRegion flash_bar; NICState *nic; NICConf conf; uint8_t scb_stat; /* SCB stat/ack byte */ uint8_t int_stat; /* PCI interrupt status */ /* region must not be saved by nic_save. */ -uint32_t region1; /* PCI region 1 address */ uint16_t mdimem[32]; eeprom_t *eeprom; uint32_t device;/* device variant */ @@ -1584,147 +1585,36 @@ static void eepro100_write4(EEPRO100State * s, uint32_t addr, uint32_t val) } } -/* - * - * Port mapped I/O. 
- * - / - -static uint32_t ioport_read1(void *opaque, uint32_t addr) -{ -EEPRO100State *s = opaque; -#if 0 -logout(addr=%s\n, regname(addr)); -#endif -return eepro100_read1(s, addr - s-region1); -} - -static uint32_t ioport_read2(void *opaque, uint32_t addr) -{ -EEPRO100State *s = opaque; -return eepro100_read2(s, addr - s-region1); -} - -static uint32_t ioport_read4(void *opaque, uint32_t addr) -{ -EEPRO100State *s = opaque; -return eepro100_read4(s, addr - s-region1); -} - -static void ioport_write1(void *opaque, uint32_t addr, uint32_t val) -{ -EEPRO100State *s = opaque; -#if 0 -logout(addr=%s val=0x%02x\n, regname(addr), val); -#endif -eepro100_write1(s, addr - s-region1, val); -} - -static void ioport_write2(void *opaque, uint32_t addr, uint32_t val) -{ -EEPRO100State *s = opaque; -eepro100_write2(s, addr - s-region1, val); -} - -static void ioport_write4(void *opaque, uint32_t addr, uint32_t val) -{ -EEPRO100State *s = opaque; -eepro100_write4(s, addr - s-region1, val); -} - -/***/ -/* PCI EEPRO100 definitions */ - -static void pci_map(PCIDevice * pci_dev, int region_num, -pcibus_t addr, pcibus_t size, int type) -{ -EEPRO100State *s = DO_UPCAST(EEPRO100State, dev, pci_dev); - -TRACE(OTHER, logout(region %d, addr=0x%08FMT_PCIBUS, - size=0x%08FMT_PCIBUS, type=%d\n, - region_num, addr, size, type)); - -assert(region_num == 1); -register_ioport_write(addr, size, 1, ioport_write1, s); -register_ioport_read(addr, size, 1, ioport_read1, s); -register_ioport_write(addr, size, 2, ioport_write2, s); -register_ioport_read(addr, size, 2, ioport_read2, s); -register_ioport_write(addr, size, 4, ioport_write4, s); -register_ioport_read(addr, size, 4, ioport_read4, s); - -s-region1 = addr; -} - -/* - * - * Memory mapped I/O. 
- * - / - -static void pci_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val) -{ -EEPRO100State *s = opaque; -#if 0 -logout(addr=%s val=0x%02x\n, regname(addr), val); -#endif -eepro100_write1(s, addr, val); -} - -static void pci_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val) +static uint64_t eepro100_read(void *opaque, target_phys_addr_t addr, + unsigned size) { EEPRO100State *s = opaque; -#if 0 -logout(addr=%s val=0x%02x\n, regname(addr), val); -#endif -eepro100_write2(s, addr, val); -} -static void pci_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val) -{ -EEPRO100State *s = opaque; -#if 0 -logout(addr=%s val=0x%02x\n, regname(addr), val); -#endif -eepro100_write4(s, addr, val); -} - -static uint32_t pci_mmio_readb(void *opaque, target_phys_addr_t addr) -{ -EEPRO100State *s = opaque; -#if 0 -logout(addr=%s\n, regname(addr)); -#endif -return eepro100_read1(s, addr); +switch (size) { +case 1: return eepro100_read1(s, addr); +case 2: return eepro100_read2(s, addr); +case 4: return eepro100_read4(s, addr); +default: abort(); +} } -static uint32_t pci_mmio_readw(void *opaque, target_phys_addr_t addr) +static void eepro100_write(void *opaque, target_phys_addr_t addr, + uint64_t data, unsigned size) { EEPRO100State *s = opaque; -#if 0 -
[PATCH v4 17/39] es1370: convert to memory API
Reviewed-by: Richard Henderson <r...@twiddle.net>
Reviewed-by: Anthony Liguori <aligu...@us.ibm.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 hw/es1370.c |   43 +++++++++++++++++++++++++------------------
 1 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/hw/es1370.c b/hw/es1370.c
index 1ed62b7..4e43c4a 100644
--- a/hw/es1370.c
+++ b/hw/es1370.c
@@ -268,6 +268,7 @@ struct chan {
 typedef struct ES1370State {
     PCIDevice dev;
     QEMUSoundCard card;
+    MemoryRegion io;
     struct chan chan[NB_CHANNELS];
     SWVoiceOut *dac_voice[2];
     SWVoiceIn *adc_voice;
@@ -775,7 +776,6 @@ IO_READ_PROTO (es1370_readl)
     return val;
 }
 
-
 static void es1370_transfer_audio (ES1370State *s, struct chan *d, int loop_sel,
                                    int max, int *irq)
 {
@@ -906,23 +906,20 @@ static void es1370_adc_callback (void *opaque, int avail)
     es1370_run_channel (s, ADC_CHANNEL, avail);
 }
 
-static void es1370_map (PCIDevice *pci_dev, int region_num,
-                        pcibus_t addr, pcibus_t size, int type)
-{
-    ES1370State *s = DO_UPCAST (ES1370State, dev, pci_dev);
-
-    (void) region_num;
-    (void) size;
-    (void) type;
-
-    register_ioport_write (addr, 0x40 * 4, 1, es1370_writeb, s);
-    register_ioport_write (addr, 0x40 * 2, 2, es1370_writew, s);
-    register_ioport_write (addr, 0x40, 4, es1370_writel, s);
+static const MemoryRegionPortio es1370_portio[] = {
+    { 0, 0x40 * 4, 1, .write = es1370_writeb, },
+    { 0, 0x40 * 2, 2, .write = es1370_writew, },
+    { 0, 0x40, 4, .write = es1370_writel, },
+    { 0, 0x40 * 4, 1, .read = es1370_readb, },
+    { 0, 0x40 * 2, 2, .read = es1370_readw, },
+    { 0, 0x40, 4, .read = es1370_readl, },
+    PORTIO_END_OF_LIST()
+};
 
-    register_ioport_read (addr, 0x40 * 4, 1, es1370_readb, s);
-    register_ioport_read (addr, 0x40 * 2, 2, es1370_readw, s);
-    register_ioport_read (addr, 0x40, 4, es1370_readl, s);
-}
+static const MemoryRegionOps es1370_io_ops = {
+    .old_portio = es1370_portio,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+};
 
 static const VMStateDescription vmstate_es1370_channel = {
     .name = "es1370_channel",
@@ -1011,7 +1008,8 @@ static int es1370_initfn (PCIDevice *dev)
     c[PCI_MIN_GNT] = 0x0c;
     c[PCI_MAX_LAT] = 0x80;
 
-    pci_register_bar (&s->dev, 0, 256, PCI_BASE_ADDRESS_SPACE_IO, es1370_map);
+    memory_region_init_io (&s->io, &es1370_io_ops, s, "es1370", 256);
+    pci_register_bar_region (&s->dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->io);
     qemu_register_reset (es1370_on_reset, s);
 
     AUD_register_card ("es1370", &s->card);
@@ -1019,6 +1017,14 @@ static int es1370_initfn (PCIDevice *dev)
     return 0;
 }
 
+static int es1370_exitfn(PCIDevice *dev)
+{
+    ES1370State *s = DO_UPCAST (ES1370State, dev, dev);
+
+    memory_region_destroy (&s->io);
+    return 0;
+}
+
 int es1370_init (PCIBus *bus)
 {
     pci_create_simple (bus, -1, "ES1370");
@@ -1031,6 +1037,7 @@ static PCIDeviceInfo es1370_info = {
     .qdev.size    = sizeof (ES1370State),
     .qdev.vmsd    = &vmstate_es1370,
     .init         = es1370_initfn,
+    .exit         = es1370_exitfn,
     .vendor_id    = PCI_VENDOR_ID_ENSONIQ,
     .device_id    = PCI_DEVICE_ID_ENSONIQ_ES1370,
     .class_id     = PCI_CLASS_MULTIMEDIA_AUDIO,
-- 
1.7.5.3
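
The es1370 conversion replaces six register_ioport_*() calls with a static, sentinel-terminated table of (offset, length, width, callback) entries. As a rough illustration of how such a table can be consulted, here is a minimal self-contained sketch. It is not QEMU's code: PortHandler, PORT_TABLE_END, port_lookup() and the toy_* names are all made up for this example, and the real MemoryRegionPortio handling may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry, loosely shaped like the initializers in the
 * patch; this is NOT QEMU's MemoryRegionPortio definition. */
typedef struct PortHandler {
    uint32_t offset;   /* first port covered, relative to the BAR */
    uint32_t len;      /* number of ports covered; 0 terminates the table */
    unsigned size;     /* access width in bytes: 1, 2 or 4 */
    uint32_t (*read)(void *opaque, uint32_t addr);
    void (*write)(void *opaque, uint32_t addr, uint32_t val);
} PortHandler;

#define PORT_TABLE_END { 0, 0, 0, NULL, NULL }

/* Return the first entry that covers `addr` at width `size` and has a
 * callback for the requested direction, or NULL if none matches. */
static const PortHandler *port_lookup(const PortHandler *tbl, uint32_t addr,
                                      unsigned size, int is_write)
{
    assert(tbl != NULL);
    for (; tbl->len != 0; tbl++) {
        if (tbl->size != size || addr < tbl->offset ||
            addr - tbl->offset >= tbl->len) {
            continue;
        }
        if (is_write ? tbl->write != NULL : tbl->read != NULL) {
            return tbl;
        }
    }
    return NULL;
}

/* Toy device: four 32-bit registers at relative ports 0x00..0x0f. */
static uint32_t toy_regs[4];

static uint32_t toy_readl(void *opaque, uint32_t addr)
{
    (void)opaque;
    return toy_regs[(addr / 4) % 4];
}

static void toy_writel(void *opaque, uint32_t addr, uint32_t val)
{
    (void)opaque;
    toy_regs[(addr / 4) % 4] = val;
}

static const PortHandler toy_table[] = {
    { 0, 0x10, 4, NULL, toy_writel },
    { 0, 0x10, 4, toy_readl, NULL },
    PORT_TABLE_END
};
```

Because the addresses in the table are relative to the start of the region, the handlers no longer need to know where the BAR was mapped, which is why the map callback can go away.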
[PATCH v4 14/39] ac97: convert to memory API
fixes BAR sizing as well. Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/ac97.c | 89 +++- 1 files changed, 52 insertions(+), 37 deletions(-) diff --git a/hw/ac97.c b/hw/ac97.c index 0b59896..52f0f0d 100644 --- a/hw/ac97.c +++ b/hw/ac97.c @@ -160,8 +160,9 @@ typedef struct AC97LinkState { SWVoiceIn *voice_mc; int invalid_freq[3]; uint8_t silence[128]; -uint32_t base[2]; int bup_flag; +MemoryRegion io_nam; +MemoryRegion io_nabm; } AC97LinkState; enum { @@ -583,7 +584,7 @@ static uint32_t nam_readw (void *opaque, uint32_t addr) { AC97LinkState *s = opaque; uint32_t val = ~0U; -uint32_t index = addr - s-base[0]; +uint32_t index = addr; s-cas = 0; val = mixer_load (s, index); return val; @@ -611,7 +612,7 @@ static void nam_writeb (void *opaque, uint32_t addr, uint32_t val) static void nam_writew (void *opaque, uint32_t addr, uint32_t val) { AC97LinkState *s = opaque; -uint32_t index = addr - s-base[0]; +uint32_t index = addr; s-cas = 0; switch (index) { case AC97_Reset: @@ -714,7 +715,7 @@ static uint32_t nabm_readb (void *opaque, uint32_t addr) { AC97LinkState *s = opaque; AC97BusMasterRegs *r = NULL; -uint32_t index = addr - s-base[1]; +uint32_t index = addr; uint32_t val = ~0U; switch (index) { @@ -769,7 +770,7 @@ static uint32_t nabm_readw (void *opaque, uint32_t addr) { AC97LinkState *s = opaque; AC97BusMasterRegs *r = NULL; -uint32_t index = addr - s-base[1]; +uint32_t index = addr; uint32_t val = ~0U; switch (index) { @@ -798,7 +799,7 @@ static uint32_t nabm_readl (void *opaque, uint32_t addr) { AC97LinkState *s = opaque; AC97BusMasterRegs *r = NULL; -uint32_t index = addr - s-base[1]; +uint32_t index = addr; uint32_t val = ~0U; switch (index) { @@ -848,7 +849,7 @@ static void nabm_writeb (void *opaque, uint32_t addr, uint32_t val) { AC97LinkState *s = opaque; AC97BusMasterRegs *r = NULL; -uint32_t index = addr - s-base[1]; +uint32_t index = addr; switch (index) { 
case PI_LVI: case PO_LVI: @@ -904,7 +905,7 @@ static void nabm_writew (void *opaque, uint32_t addr, uint32_t val) { AC97LinkState *s = opaque; AC97BusMasterRegs *r = NULL; -uint32_t index = addr - s-base[1]; +uint32_t index = addr; switch (index) { case PI_SR: case PO_SR: @@ -924,7 +925,7 @@ static void nabm_writel (void *opaque, uint32_t addr, uint32_t val) { AC97LinkState *s = opaque; AC97BusMasterRegs *r = NULL; -uint32_t index = addr - s-base[1]; +uint32_t index = addr; switch (index) { case PI_BDBAR: case PO_BDBAR: @@ -1230,31 +1231,33 @@ static const VMStateDescription vmstate_ac97 = { } }; -static void ac97_map (PCIDevice *pci_dev, int region_num, - pcibus_t addr, pcibus_t size, int type) -{ -AC97LinkState *s = DO_UPCAST (AC97LinkState, dev, pci_dev); -PCIDevice *d = s-dev; - -if (!region_num) { -s-base[0] = addr; -register_ioport_read (addr, 256 * 1, 1, nam_readb, d); -register_ioport_read (addr, 256 * 2, 2, nam_readw, d); -register_ioport_read (addr, 256 * 4, 4, nam_readl, d); -register_ioport_write (addr, 256 * 1, 1, nam_writeb, d); -register_ioport_write (addr, 256 * 2, 2, nam_writew, d); -register_ioport_write (addr, 256 * 4, 4, nam_writel, d); -} -else { -s-base[1] = addr; -register_ioport_read (addr, 64 * 1, 1, nabm_readb, d); -register_ioport_read (addr, 64 * 2, 2, nabm_readw, d); -register_ioport_read (addr, 64 * 4, 4, nabm_readl, d); -register_ioport_write (addr, 64 * 1, 1, nabm_writeb, d); -register_ioport_write (addr, 64 * 2, 2, nabm_writew, d); -register_ioport_write (addr, 64 * 4, 4, nabm_writel, d); -} -} +static const MemoryRegionPortio nam_portio[] = { +{ 0, 256 * 1, 1, .read = nam_readb, }, +{ 0, 256 * 2, 2, .read = nam_readw, }, +{ 0, 256 * 4, 4, .read = nam_readl, }, +{ 0, 256 * 1, 1, .write = nam_writeb, }, +{ 0, 256 * 2, 2, .write = nam_writew, }, +{ 0, 256 * 4, 4, .write = nam_writel, }, +PORTIO_END_OF_LIST(), +}; + +static const MemoryRegionOps ac97_io_nam_ops = { +.old_portio = nam_portio, +}; + +static const MemoryRegionPortio 
nabm_portio[] = { +{ 0, 64 * 1, 1, .read = nabm_readb, }, +{ 0, 64 * 2, 2, .read = nabm_readw, }, +{ 0, 64 * 4, 4, .read = nabm_readl, }, +{ 0, 64 * 1, 1, .write = nabm_writeb, }, +{ 0, 64 * 2, 2, .write = nabm_writew, }, +{ 0, 64 * 4, 4, .write = nabm_writel, }, +PORTIO_END_OF_LIST() +}; + +static const MemoryRegionOps ac97_io_nabm_ops = { +.old_portio = nabm_portio, +}; static void ac97_on_reset (void
[PATCH v4 06/39] cirrus: simplify bitblt BAR access functions
Make use of the memory API's ability to satisfy multi-byte accesses via
multiple single-byte accesses.

Reviewed-by: Richard Henderson <r...@twiddle.net>
Reviewed-by: Anthony Liguori <aligu...@us.ibm.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 hw/cirrus_vga.c |   81 +--
 1 files changed, 13 insertions(+), 68 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 4f57b92..c39acb9 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2446,37 +2446,23 @@ static void cirrus_linear_write(void *opaque, target_phys_addr_t addr,
  *
  ***************************************/
 
-static uint32_t cirrus_linear_bitblt_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t cirrus_linear_bitblt_read(void *opaque,
+                                          target_phys_addr_t addr,
+                                          unsigned size)
 {
+    CirrusVGAState *s = opaque;
     uint32_t ret;
 
     /* XXX handle bitblt */
+    (void)s;
     ret = 0xff;
     return ret;
 }
 
-static uint32_t cirrus_linear_bitblt_readw(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-
-    v = cirrus_linear_bitblt_readb(opaque, addr);
-    v |= cirrus_linear_bitblt_readb(opaque, addr + 1) << 8;
-    return v;
-}
-
-static uint32_t cirrus_linear_bitblt_readl(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-
-    v = cirrus_linear_bitblt_readb(opaque, addr);
-    v |= cirrus_linear_bitblt_readb(opaque, addr + 1) << 8;
-    v |= cirrus_linear_bitblt_readb(opaque, addr + 2) << 16;
-    v |= cirrus_linear_bitblt_readb(opaque, addr + 3) << 24;
-    return v;
-}
-
-static void cirrus_linear_bitblt_writeb(void *opaque, target_phys_addr_t addr,
-                                        uint32_t val)
+static void cirrus_linear_bitblt_write(void *opaque,
+                                       target_phys_addr_t addr,
+                                       uint64_t val,
+                                       unsigned size)
 {
     CirrusVGAState *s = opaque;
 
@@ -2489,55 +2475,14 @@ static void cirrus_linear_bitblt_writeb(void *opaque, target_phys_addr_t addr,
     }
 }
 
-static void cirrus_linear_bitblt_writew(void *opaque, target_phys_addr_t addr,
-                                        uint32_t val)
-{
-    cirrus_linear_bitblt_writeb(opaque, addr, val & 0xff);
-    cirrus_linear_bitblt_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-}
-
-static void cirrus_linear_bitblt_writel(void *opaque, target_phys_addr_t addr,
-                                        uint32_t val)
-{
-    cirrus_linear_bitblt_writeb(opaque, addr, val & 0xff);
-    cirrus_linear_bitblt_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-    cirrus_linear_bitblt_writeb(opaque, addr + 2, (val >> 16) & 0xff);
-    cirrus_linear_bitblt_writeb(opaque, addr + 3, (val >> 24) & 0xff);
-}
-
-static uint64_t cirrus_linear_bitblt_read(void *opaque,
-                                          target_phys_addr_t addr,
-                                          unsigned size)
-{
-    CirrusVGAState *s = opaque;
-
-    switch (size) {
-    case 1: return cirrus_linear_bitblt_readb(s, addr);
-    case 2: return cirrus_linear_bitblt_readw(s, addr);
-    case 4: return cirrus_linear_bitblt_readl(s, addr);
-    default: abort();
-    }
-};
-
-static void cirrus_linear_bitblt_write(void *opaque,
-                                       target_phys_addr_t addr,
-                                       uint64_t data,
-                                       unsigned size)
-{
-    CirrusVGAState *s = opaque;
-
-    switch (size) {
-    case 1: return cirrus_linear_bitblt_writeb(s, addr, data);
-    case 2: return cirrus_linear_bitblt_writew(s, addr, data);
-    case 4: return cirrus_linear_bitblt_writel(s, addr, data);
-    default: abort();
-    }
-};
-
 static const MemoryRegionOps cirrus_linear_bitblt_io_ops = {
     .read = cirrus_linear_bitblt_read,
     .write = cirrus_linear_bitblt_write,
     .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 1,
+    },
 };
 
 static void unmap_bank(CirrusVGAState *s, unsigned bank)
-- 
1.7.5.3
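
For readers unfamiliar with what the .impl access-size limits delegate to the core, the following is a minimal self-contained sketch of the idea: a wider little-endian access is assembled from, or broken into, single-byte callbacks, which is exactly what the removed readw/readl/writew/writel helpers did by hand. This is only an illustration and not the memory API's actual implementation; ByteOps, split_read(), split_write() and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte-only device callbacks (not QEMU's MemoryRegionOps). */
typedef struct ByteOps {
    uint8_t (*readb)(void *opaque, uint64_t addr);
    void (*writeb)(void *opaque, uint64_t addr, uint8_t val);
} ByteOps;

/* Assemble a little-endian value from `size` single-byte reads. */
static uint64_t split_read(const ByteOps *ops, void *opaque,
                           uint64_t addr, unsigned size)
{
    uint64_t val = 0;
    unsigned i;

    assert(size == 1 || size == 2 || size == 4);
    for (i = 0; i < size; i++) {
        val |= (uint64_t)ops->readb(opaque, addr + i) << (8 * i);
    }
    return val;
}

/* Break a little-endian value into `size` single-byte writes. */
static void split_write(const ByteOps *ops, void *opaque,
                        uint64_t addr, uint64_t val, unsigned size)
{
    unsigned i;

    assert(size == 1 || size == 2 || size == 4);
    for (i = 0; i < size; i++) {
        ops->writeb(opaque, addr + i, (uint8_t)(val >> (8 * i)));
    }
}

/* Toy device: a plain byte array standing in for device state. */
static uint8_t toy_readb(void *opaque, uint64_t addr)
{
    return ((uint8_t *)opaque)[addr];
}

static void toy_writeb(void *opaque, uint64_t addr, uint8_t val)
{
    ((uint8_t *)opaque)[addr] = val;
}

static const ByteOps toy_ops = { toy_readb, toy_writeb };
```

With the splitting done once in common code, each device only has to supply the byte-wide callbacks.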
[PATCH v4 04/39] vga: convert vga and its derivatives to the memory API
Convert all vga memory to the memory API. Note we need to fall back to get_system_memory(), since the various buses don't pass the vga window as a memory region. We no longer need to sync the dirty bitmap of the cirrus mapped memory banks, since the memory API takes care of that for us. [jan: fix vga-pci logging] Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Avi Kivity a...@redhat.com --- hw/cirrus_vga.c | 342 --- hw/qxl-render.c |2 +- hw/qxl.c| 135 -- hw/qxl.h|6 +- hw/vga-isa-mm.c | 46 +--- hw/vga-isa.c| 10 +- hw/vga-pci.c| 28 + hw/vga.c| 146 +++- hw/vga_int.h| 14 +-- hw/vmware_vga.c | 143 --- 10 files changed, 437 insertions(+), 435 deletions(-) diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index f39d1f8..ad23c4a 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -32,6 +32,7 @@ #include console.h #include vga_int.h #include loader.h +#include exec-memory.h /* * TODO: @@ -200,9 +201,14 @@ typedef void (*cirrus_fill_t)(struct CirrusVGAState *s, typedef struct CirrusVGAState { VGACommonState vga; -int cirrus_linear_io_addr; -int cirrus_linear_bitblt_io_addr; -int cirrus_mmio_io_addr; +MemoryRegion cirrus_linear_io; +MemoryRegion cirrus_linear_bitblt_io; +MemoryRegion cirrus_mmio_io; +MemoryRegion pci_bar; +bool linear_vram; /* vga.vram mapped over cirrus_linear_io */ +MemoryRegion low_mem_container; /* container for 0xa-0xc */ +MemoryRegion low_mem; /* always mapped, overridden by: */ +MemoryRegion *cirrus_bank[2]; /* aliases at 0xa-0xb */ uint32_t cirrus_addr_mask; uint32_t linear_mmio_mask; uint8_t cirrus_shadow_gr0; @@ -612,7 +618,7 @@ static void cirrus_invalidate_region(CirrusVGAState * s, int off_begin, off_cur_end = (off_cur + bytesperline) s-cirrus_addr_mask; off_cur = TARGET_PAGE_MASK; while (off_cur off_cur_end) { - cpu_physical_memory_set_dirty(s-vga.vram_offset + off_cur); + memory_region_set_dirty(s-vga.vram, off_cur); off_cur += TARGET_PAGE_SIZE; } off_begin += off_pitch; @@ -1177,12 +1183,6 @@ static void 
cirrus_update_bank_ptr(CirrusVGAState * s, unsigned bank_index) } if (limit 0) { -/* Thinking about changing bank base? First, drop the dirty bitmap information - * on the current location, otherwise we lose this pointer forever */ -if (s-vga.lfb_vram_mapped) { -target_phys_addr_t base_addr = isa_mem_base + 0xa + bank_index * 0x8000; -cpu_physical_sync_dirty_bitmap(base_addr, base_addr + 0x8000); -} s-cirrus_bank_base[bank_index] = offset; s-cirrus_bank_limit[bank_index] = limit; } else { @@ -1921,8 +1921,8 @@ static void cirrus_mem_writeb_mode4and5_8bpp(CirrusVGAState * s, val = 1; dst++; } -cpu_physical_memory_set_dirty(s-vga.vram_offset + offset); -cpu_physical_memory_set_dirty(s-vga.vram_offset + offset + 7); +memory_region_set_dirty(s-vga.vram, offset); +memory_region_set_dirty(s-vga.vram, offset + 7); } static void cirrus_mem_writeb_mode4and5_16bpp(CirrusVGAState * s, @@ -1946,8 +1946,8 @@ static void cirrus_mem_writeb_mode4and5_16bpp(CirrusVGAState * s, val = 1; dst += 2; } -cpu_physical_memory_set_dirty(s-vga.vram_offset + offset); -cpu_physical_memory_set_dirty(s-vga.vram_offset + offset + 15); +memory_region_set_dirty(s-vga.vram, offset); +memory_region_set_dirty(s-vga.vram, offset + 15); } /*** @@ -2057,8 +2057,7 @@ static void cirrus_vga_mem_writeb(void *opaque, target_phys_addr_t addr, mode = s-vga.gr[0x05] 0x7; if (mode 4 || mode 5 || ((s-vga.gr[0x0B] 0x4) == 0)) { *(s-vga.vram_ptr + bank_offset) = mem_value; - cpu_physical_memory_set_dirty(s-vga.vram_offset + - bank_offset); + memory_region_set_dirty(s-vga.vram, bank_offset); } else { if ((s-vga.gr[0x0B] 0x14) != 0x14) { cirrus_mem_writeb_mode4and5_8bpp(s, mode, @@ -2099,16 +2098,37 @@ static void cirrus_vga_mem_writel(void *opaque, target_phys_addr_t addr, uint32_ cirrus_vga_mem_writeb(opaque, addr + 3, (val 24) 0xff); } -static CPUReadMemoryFunc * const cirrus_vga_mem_read[3] = { -cirrus_vga_mem_readb, -cirrus_vga_mem_readw, -cirrus_vga_mem_readl, +static uint64_t cirrus_vga_mem_read(void *opaque, 
+target_phys_addr_t addr, +uint32_t size) +{ +CirrusVGAState *s = opaque; + +switch (size) { +case 1: return
[PATCH v4 09/39] cirrus: simplify linear framebuffer access functions
Make use of the memory API's ability to satisfy multi-byte accesses via
multiple single-byte accesses.

Reviewed-by: Richard Henderson <r...@twiddle.net>
Reviewed-by: Anthony Liguori <aligu...@us.ibm.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 hw/cirrus_vga.c |   74 ++-
 1 files changed, 8 insertions(+), 66 deletions(-)

diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c
index 2a9bd25..c9887ac 100644
--- a/hw/cirrus_vga.c
+++ b/hw/cirrus_vga.c
@@ -2250,7 +2250,8 @@ static void cirrus_cursor_draw_line(VGACommonState *s1, uint8_t *d1, int scr_y)
  *
  ***************************************/
 
-static uint32_t cirrus_linear_readb(void *opaque, target_phys_addr_t addr)
+static uint64_t cirrus_linear_read(void *opaque, target_phys_addr_t addr,
+                                   unsigned size)
 {
     CirrusVGAState *s = opaque;
     uint32_t ret;
@@ -2278,28 +2279,8 @@ static uint32_t cirrus_linear_readb(void *opaque, target_phys_addr_t addr)
     return ret;
 }
 
-static uint32_t cirrus_linear_readw(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-
-    v = cirrus_linear_readb(opaque, addr);
-    v |= cirrus_linear_readb(opaque, addr + 1) << 8;
-    return v;
-}
-
-static uint32_t cirrus_linear_readl(void *opaque, target_phys_addr_t addr)
-{
-    uint32_t v;
-
-    v = cirrus_linear_readb(opaque, addr);
-    v |= cirrus_linear_readb(opaque, addr + 1) << 8;
-    v |= cirrus_linear_readb(opaque, addr + 2) << 16;
-    v |= cirrus_linear_readb(opaque, addr + 3) << 24;
-    return v;
-}
-
-static void cirrus_linear_writeb(void *opaque, target_phys_addr_t addr,
-                                 uint32_t val)
+static void cirrus_linear_write(void *opaque, target_phys_addr_t addr,
+                                uint64_t val, unsigned size)
 {
     CirrusVGAState *s = opaque;
     unsigned mode;
@@ -2339,49 +2320,6 @@ static void cirrus_linear_writeb(void *opaque, target_phys_addr_t addr,
     }
 }
 
-static void cirrus_linear_writew(void *opaque, target_phys_addr_t addr,
-                                 uint32_t val)
-{
-    cirrus_linear_writeb(opaque, addr, val & 0xff);
-    cirrus_linear_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-}
-
-static void cirrus_linear_writel(void *opaque, target_phys_addr_t addr,
-                                 uint32_t val)
-{
-    cirrus_linear_writeb(opaque, addr, val & 0xff);
-    cirrus_linear_writeb(opaque, addr + 1, (val >> 8) & 0xff);
-    cirrus_linear_writeb(opaque, addr + 2, (val >> 16) & 0xff);
-    cirrus_linear_writeb(opaque, addr + 3, (val >> 24) & 0xff);
-}
-
-
-static uint64_t cirrus_linear_read(void *opaque, target_phys_addr_t addr,
-                                   unsigned size)
-{
-    CirrusVGAState *s = opaque;
-
-    switch (size) {
-    case 1: return cirrus_linear_readb(s, addr);
-    case 2: return cirrus_linear_readw(s, addr);
-    case 4: return cirrus_linear_readl(s, addr);
-    default: abort();
-    }
-}
-
-static void cirrus_linear_write(void *opaque, target_phys_addr_t addr,
-                                uint64_t data, unsigned size)
-{
-    CirrusVGAState *s = opaque;
-
-    switch (size) {
-    case 1: return cirrus_linear_writeb(s, addr, data);
-    case 2: return cirrus_linear_writew(s, addr, data);
-    case 4: return cirrus_linear_writel(s, addr, data);
-    default: abort();
-    }
-}
-
 /***************************************
  *
  *  system to screen memory access
@@ -2859,6 +2797,10 @@ static const MemoryRegionOps cirrus_linear_io_ops = {
     .read = cirrus_linear_read,
     .write = cirrus_linear_write,
     .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 1,
+    },
 };
 
 static void cirrus_init_common(CirrusVGAState * s, int device_id, int is_pci)
-- 
1.7.5.3
[PATCH v4 08/39] vga: simplify vga window mmio access functions
Make use of the memory API's ability to satisfy multi-byte accesses via multiple single-byte accesses. We have to keep vga_mem_{read,write}b() since they're used by cirrus. Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/cirrus_vga.c |4 +- hw/vga.c| 56 +++--- hw/vga_int.h|4 +- 3 files changed, 12 insertions(+), 52 deletions(-) diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index 5ded1ff..2a9bd25 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -1966,7 +1966,7 @@ static uint64_t cirrus_vga_mem_read(void *opaque, uint32_t val; if ((s-vga.sr[0x07] 0x01) == 0) { - return vga_mem_readb(s, addr); +return vga_mem_readb(s-vga, addr); } if (addr 0x1) { @@ -2011,7 +2011,7 @@ static void cirrus_vga_mem_write(void *opaque, unsigned mode; if ((s-vga.sr[0x07] 0x01) == 0) { - vga_mem_writeb(s, addr, mem_value); +vga_mem_writeb(s-vga, addr, mem_value); return; } diff --git a/hw/vga.c b/hw/vga.c index 8b6e6b6..33dc478 100644 --- a/hw/vga.c +++ b/hw/vga.c @@ -708,9 +708,8 @@ static void vbe_ioport_write_data(void *opaque, uint32_t addr, uint32_t val) #endif /* called for accesses between 0xa and 0xc */ -uint32_t vga_mem_readb(void *opaque, target_phys_addr_t addr) +uint32_t vga_mem_readb(VGACommonState *s, target_phys_addr_t addr) { -VGACommonState *s = opaque; int memory_map_mode, plane; uint32_t ret; @@ -764,28 +763,9 @@ uint32_t vga_mem_readb(void *opaque, target_phys_addr_t addr) return ret; } -static uint32_t vga_mem_readw(void *opaque, target_phys_addr_t addr) -{ -uint32_t v; -v = vga_mem_readb(opaque, addr); -v |= vga_mem_readb(opaque, addr + 1) 8; -return v; -} - -static uint32_t vga_mem_readl(void *opaque, target_phys_addr_t addr) -{ -uint32_t v; -v = vga_mem_readb(opaque, addr); -v |= vga_mem_readb(opaque, addr + 1) 8; -v |= vga_mem_readb(opaque, addr + 2) 16; -v |= vga_mem_readb(opaque, addr + 3) 24; -return v; -} - /* called for accesses between 0xa and 0xc */ -void 
vga_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val) +void vga_mem_writeb(VGACommonState *s, target_phys_addr_t addr, uint32_t val) { -VGACommonState *s = opaque; int memory_map_mode, plane, write_mode, b, func_select, mask; uint32_t write_mask, bit_mask, set_mask; @@ -917,20 +897,6 @@ void vga_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val) } } -static void vga_mem_writew(void *opaque, target_phys_addr_t addr, uint32_t val) -{ -vga_mem_writeb(opaque, addr, val 0xff); -vga_mem_writeb(opaque, addr + 1, (val 8) 0xff); -} - -static void vga_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val) -{ -vga_mem_writeb(opaque, addr, val 0xff); -vga_mem_writeb(opaque, addr + 1, (val 8) 0xff); -vga_mem_writeb(opaque, addr + 2, (val 16) 0xff); -vga_mem_writeb(opaque, addr + 3, (val 24) 0xff); -} - typedef void vga_draw_glyph8_func(uint8_t *d, int linesize, const uint8_t *font_ptr, int h, uint32_t fgcol, uint32_t bgcol); @@ -2105,12 +2071,7 @@ static uint64_t vga_mem_read(void *opaque, target_phys_addr_t addr, { VGACommonState *s = opaque; -switch (size) { -case 1: return vga_mem_readb(s, addr); -case 2: return vga_mem_readw(s, addr); -case 4: return vga_mem_readl(s, addr); -default: abort(); -} +return vga_mem_readb(s, addr); } static void vga_mem_write(void *opaque, target_phys_addr_t addr, @@ -2118,18 +2079,17 @@ static void vga_mem_write(void *opaque, target_phys_addr_t addr, { VGACommonState *s = opaque; -switch (size) { -case 1: return vga_mem_writeb(s, addr, data); -case 2: return vga_mem_writew(s, addr, data); -case 4: return vga_mem_writel(s, addr, data); -default: abort(); -} +return vga_mem_writeb(s, addr, data); } const MemoryRegionOps vga_mem_ops = { .read = vga_mem_read, .write = vga_mem_write, .endianness = DEVICE_LITTLE_ENDIAN, +.impl = { +.min_access_size = 1, +.max_access_size = 1, +}, }; static int vga_common_post_load(void *opaque, int version_id) diff --git a/hw/vga_int.h b/hw/vga_int.h index 4592d2c..100d98c 100644 
--- a/hw/vga_int.h +++ b/hw/vga_int.h @@ -198,8 +198,8 @@ void vga_dirty_log_restart(VGACommonState *s); extern const VMStateDescription vmstate_vga_common; uint32_t vga_ioport_read(void *opaque, uint32_t addr); void vga_ioport_write(void *opaque, uint32_t addr, uint32_t val); -uint32_t vga_mem_readb(void *opaque, target_phys_addr_t addr); -void vga_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val); +uint32_t vga_mem_readb(VGACommonState *s, target_phys_addr_t addr); +void
[PATCH v4 12/39] pci: allow I/O BARs to be registered with pci_register_bar_region()
Reviewed-by: Richard Henderson <r...@twiddle.net>
Reviewed-by: Anthony Liguori <aligu...@us.ibm.com>
Signed-off-by: Avi Kivity <a...@redhat.com>
---
 hw/pci.c           |   43 +++++++++++++++++++++++--------------------
 hw/pci.h           |    1 +
 hw/pci_internals.h |    3 ++-
 3 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 0857644..c00cbf8 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -271,7 +271,8 @@ void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent,
     qbus_create_inplace(&bus->qbus, &pci_bus_info, parent, name);
     assert(PCI_FUNC(devfn_min) == 0);
     bus->devfn_min = devfn_min;
-    bus->address_space = address_space_mem;
+    bus->address_space_mem = address_space_mem;
+    bus->address_space_io = address_space_io;
 
     /* host bridge */
     QLIST_INIT(&bus->child);
@@ -847,12 +848,11 @@ static void pci_unregister_io_regions(PCIDevice *pci_dev)
         r = &pci_dev->io_regions[i];
         if (!r->size || r->addr == PCI_BAR_UNMAPPED)
             continue;
-        if (r->type == PCI_BASE_ADDRESS_SPACE_IO) {
-            isa_unassign_ioport(r->addr, r->filtered_size);
+        if (r->memory) {
+            memory_region_del_subregion(r->address_space, r->memory);
         } else {
-            if (r->memory) {
-                memory_region_del_subregion(pci_dev->bus->address_space,
-                                            r->memory);
+            if (r->type == PCI_BASE_ADDRESS_SPACE_IO) {
+                isa_unassign_ioport(r->addr, r->filtered_size);
             } else {
                 cpu_register_physical_memory(pci_to_cpu_addr(pci_dev->bus,
                                                              r->addr),
@@ -934,9 +934,11 @@ static void pci_simple_bar_mapfunc_region(PCIDevice *pci_dev, int region_num,
                                           pcibus_t addr, pcibus_t size,
                                           int type)
 {
-    memory_region_add_subregion_overlap(pci_dev->bus->address_space,
+    PCIIORegion *r = &pci_dev->io_regions[region_num];
+
+    memory_region_add_subregion_overlap(r->address_space,
                                         addr,
-                                        pci_dev->io_regions[region_num].memory,
+                                        r->memory,
                                         1);
 }
 
@@ -953,9 +955,13 @@ void pci_register_bar_region(PCIDevice *pci_dev, int region_num,
                              uint8_t attr, MemoryRegion *memory)
 {
     pci_register_bar(pci_dev, region_num, memory_region_size(memory),
-                     PCI_BASE_ADDRESS_SPACE_MEMORY | attr,
+                     attr,
                      pci_simple_bar_mapfunc_region);
     pci_dev->io_regions[region_num].memory = memory;
+    pci_dev->io_regions[region_num].address_space
+        = attr & PCI_BASE_ADDRESS_SPACE_IO
+        ? pci_dev->bus->address_space_io
+        : pci_dev->bus->address_space_mem;
 }
 
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
@@ -1090,7 +1096,9 @@ static void pci_update_mappings(PCIDevice *d)
 
         /* now do the real mapping */
         if (r->addr != PCI_BAR_UNMAPPED) {
-            if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
+            if (r->memory) {
+                memory_region_del_subregion(r->address_space, r->memory);
+            } else if (r->type & PCI_BASE_ADDRESS_SPACE_IO) {
                 int class;
                 /* NOTE: specific hack for IDE in PC case:
                    only one byte must be mapped. */
@@ -1101,16 +1109,11 @@ static void pci_update_mappings(PCIDevice *d)
                     isa_unassign_ioport(r->addr, r->filtered_size);
                 }
             } else {
-                if (r->memory) {
-                    memory_region_del_subregion(d->bus->address_space,
-                                                r->memory);
-                } else {
-                    cpu_register_physical_memory(pci_to_cpu_addr(d->bus,
-                                                                 r->addr),
-                                                 r->filtered_size,
-                                                 IO_MEM_UNASSIGNED);
-                    qemu_unregister_coalesced_mmio(r->addr, r->filtered_size);
-                }
+                cpu_register_physical_memory(pci_to_cpu_addr(d->bus,
+                                                             r->addr),
+                                             r->filtered_size,
+                                             IO_MEM_UNASSIGNED);
+                qemu_unregister_coalesced_mmio(r->addr, r->filtered_size);
             }
         }
         r->addr = new_addr;
diff --git a/hw/pci.h b/hw/pci.h
index 45b30fa..928e96c 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -95,6 +95,7 @@ typedef struct PCIIORegion {
     PCIMapIORegionFunc *map_func;
     ram_addr_t ram_addr;
     MemoryRegion *memory;
+    MemoryRegion *address_space;
 } PCIIORegion;
 
 #define PCI_ROM_SLOT 6
diff --git a/hw/pci_internals.h
[PATCH v4 11/39] pci: pass I/O address space to new PCI bus
This lets us register BARs in the I/O address space. Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/apb_pci.c |1 + hw/bonito.c|1 + hw/grackle_pci.c |8 ++-- hw/gt64xxx.c |4 +++- hw/pc.h|4 +++- hw/pc_piix.c |6 +- hw/pci.c | 18 -- hw/pci.h | 10 +++--- hw/piix_pci.c | 14 +- hw/ppc4xx_pci.c|1 + hw/ppc_mac.h | 11 --- hw/ppc_newworld.c |4 ++-- hw/ppc_oldworld.c |4 +++- hw/ppc_prep.c |2 +- hw/ppce500_pci.c |7 --- hw/prep_pci.c |8 ++-- hw/prep_pci.h |4 +++- hw/sh_pci.c|4 +++- hw/unin_pci.c | 16 hw/versatile_pci.c |2 +- 20 files changed, 91 insertions(+), 38 deletions(-) diff --git a/hw/apb_pci.c b/hw/apb_pci.c index 8b9939c..1638226 100644 --- a/hw/apb_pci.c +++ b/hw/apb_pci.c @@ -348,6 +348,7 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base, d-bus = pci_register_bus(d-busdev.qdev, pci, pci_apb_set_irq, pci_pbm_map_irq, d, get_system_memory(), + get_system_io(), 0, 32); pci_bus_set_mem_base(d-bus, mem_base); diff --git a/hw/bonito.c b/hw/bonito.c index 5f62dda..8708e95 100644 --- a/hw/bonito.c +++ b/hw/bonito.c @@ -775,6 +775,7 @@ PCIBus *bonito_init(qemu_irq *pic) pcihost = FROM_SYSBUS(BonitoState, sysbus_from_qdev(dev)); b = pci_register_bus(pcihost-busdev.qdev, pci, pci_bonito_set_irq, pci_bonito_map_irq, pic, get_system_memory(), + get_system_io(), 0x28, 32); pcihost-bus = b; qdev_init_nofail(dev); diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c index da67cf9..9a823e1 100644 --- a/hw/grackle_pci.c +++ b/hw/grackle_pci.c @@ -62,7 +62,8 @@ static void pci_grackle_reset(void *opaque) } PCIBus *pci_grackle_init(uint32_t base, qemu_irq *pic, - MemoryRegion *address_space) + MemoryRegion *address_space_mem, + MemoryRegion *address_space_io) { DeviceState *dev; SysBusDevice *s; @@ -75,7 +76,10 @@ PCIBus *pci_grackle_init(uint32_t base, qemu_irq *pic, d-host_state.bus = pci_register_bus(d-busdev.qdev, pci, pci_grackle_set_irq, pci_grackle_map_irq, - pic, address_space, 0, 
4); + pic, + address_space_mem, + address_space_io, + 0, 4); pci_create_simple(d-host_state.bus, 0, grackle); diff --git a/hw/gt64xxx.c b/hw/gt64xxx.c index 65e63dd..d541558 100644 --- a/hw/gt64xxx.c +++ b/hw/gt64xxx.c @@ -1093,7 +1093,9 @@ PCIBus *gt64120_register(qemu_irq *pic) d = FROM_SYSBUS(GT64120State, s); d-pci.bus = pci_register_bus(d-busdev.qdev, pci, gt64120_pci_set_irq, gt64120_pci_map_irq, - pic, get_system_memory(), + pic, + get_system_memory(), + get_system_io(), PCI_DEVFN(18, 0), 4); d-ISD_handle = cpu_register_io_memory(gt64120_read, gt64120_write, d, DEVICE_NATIVE_ENDIAN); diff --git a/hw/pc.h b/hw/pc.h index a2de0fe..ec34db7 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -179,7 +179,9 @@ struct PCII440FXState; typedef struct PCII440FXState PCII440FXState; PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn, -qemu_irq *pic, MemoryRegion *address_space, +qemu_irq *pic, +MemoryRegion *address_space_mem, +MemoryRegion *address_space_io, ram_addr_t ram_size); void i440fx_init_memory_mappings(PCII440FXState *d); diff --git a/hw/pc_piix.c b/hw/pc_piix.c index c0a2abe..7dd5008 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -69,6 +69,7 @@ static void ioapic_init(IsaIrqState *isa_irq_state) /* PC hardware initialisation */ static void pc_init1(MemoryRegion *system_memory, + MemoryRegion *system_io, ram_addr_t ram_size, const char *boot_device, const char *kernel_filename, @@ -129,7 +130,7 @@ static void pc_init1(MemoryRegion *system_memory, if (pci_enabled) { pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_irq, -
[PATCH v4 05/39] cirrus: simplify mmio BAR access functions
Make use of the memory API's ability to satisfy multi-byte accesses via multiple single-byte accesses. Reviewed-by: Richard Henderson r...@twiddle.net Signed-off-by: Avi Kivity a...@redhat.com --- hw/cirrus_vga.c | 78 +- 1 files changed, 8 insertions(+), 70 deletions(-) diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index ad23c4a..4f57b92 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -2827,12 +2827,11 @@ static void cirrus_vga_ioport_write(void *opaque, uint32_t addr, uint32_t val) * ***/ -static uint32_t cirrus_mmio_readb(void *opaque, target_phys_addr_t addr) +static uint64_t cirrus_mmio_read(void *opaque, target_phys_addr_t addr, + unsigned size) { CirrusVGAState *s = opaque; -addr = CIRRUS_PNPMMIO_SIZE - 1; - if (addr = 0x100) { return cirrus_mmio_blt_read(s, addr - 0x100); } else { @@ -2840,33 +2839,11 @@ static uint32_t cirrus_mmio_readb(void *opaque, target_phys_addr_t addr) } } -static uint32_t cirrus_mmio_readw(void *opaque, target_phys_addr_t addr) -{ -uint32_t v; - -v = cirrus_mmio_readb(opaque, addr); -v |= cirrus_mmio_readb(opaque, addr + 1) 8; -return v; -} - -static uint32_t cirrus_mmio_readl(void *opaque, target_phys_addr_t addr) -{ -uint32_t v; - -v = cirrus_mmio_readb(opaque, addr); -v |= cirrus_mmio_readb(opaque, addr + 1) 8; -v |= cirrus_mmio_readb(opaque, addr + 2) 16; -v |= cirrus_mmio_readb(opaque, addr + 3) 24; -return v; -} - -static void cirrus_mmio_writeb(void *opaque, target_phys_addr_t addr, - uint32_t val) +static void cirrus_mmio_write(void *opaque, target_phys_addr_t addr, + uint64_t val, unsigned size) { CirrusVGAState *s = opaque; -addr = CIRRUS_PNPMMIO_SIZE - 1; - if (addr = 0x100) { cirrus_mmio_blt_write(s, addr - 0x100, val); } else { @@ -2874,53 +2851,14 @@ static void cirrus_mmio_writeb(void *opaque, target_phys_addr_t addr, } } -static void cirrus_mmio_writew(void *opaque, target_phys_addr_t addr, - uint32_t val) -{ -cirrus_mmio_writeb(opaque, addr, val 0xff); -cirrus_mmio_writeb(opaque, addr + 1, (val 8) 0xff); -} - 
-static void cirrus_mmio_writel(void *opaque, target_phys_addr_t addr, - uint32_t val) -{ -cirrus_mmio_writeb(opaque, addr, val 0xff); -cirrus_mmio_writeb(opaque, addr + 1, (val 8) 0xff); -cirrus_mmio_writeb(opaque, addr + 2, (val 16) 0xff); -cirrus_mmio_writeb(opaque, addr + 3, (val 24) 0xff); -} - - -static uint64_t cirrus_mmio_read(void *opaque, target_phys_addr_t addr, - unsigned size) -{ -CirrusVGAState *s = opaque; - -switch (size) { -case 1: return cirrus_mmio_readb(s, addr); -case 2: return cirrus_mmio_readw(s, addr); -case 4: return cirrus_mmio_readl(s, addr); -default: abort(); -} -}; - -static void cirrus_mmio_write(void *opaque, target_phys_addr_t addr, - uint64_t data, unsigned size) -{ -CirrusVGAState *s = opaque; - -switch (size) { -case 1: return cirrus_mmio_writeb(s, addr, data); -case 2: return cirrus_mmio_writew(s, addr, data); -case 4: return cirrus_mmio_writel(s, addr, data); -default: abort(); -} -}; - static const MemoryRegionOps cirrus_mmio_io_ops = { .read = cirrus_mmio_read, .write = cirrus_mmio_write, .endianness = DEVICE_LITTLE_ENDIAN, +.impl = { +.min_access_size = 1, +.max_access_size = 1, +}, }; /* load/save state */ -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
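Editorial sketch, not part of the patch: the `.impl` block above declares that the device only implements one-byte accesses, leaving wider accesses to be split up by the memory core. The splitting can be pictured with the standalone code below. It is not QEMU code; `regs`, `dev_readb`, `dev_writeb`, `split_read` and `split_write` are hypothetical names, and little-endian ordering is assumed to match `DEVICE_LITTLE_ENDIAN` in the patch.

```c
#include <stdint.h>

/* Hypothetical byte-wide device: a 16-byte register file. */
static uint8_t regs[16];

/* The only access size this pretend device implements is one byte. */
static uint8_t dev_readb(unsigned addr)
{
    return regs[addr & 15];
}

static void dev_writeb(unsigned addr, uint8_t val)
{
    regs[addr & 15] = val;
}

/* Satisfy a size-byte little-endian read with single-byte reads. */
static uint64_t split_read(unsigned addr, unsigned size)
{
    uint64_t v = 0;
    unsigned i;

    for (i = 0; i < size; i++) {
        v |= (uint64_t)dev_readb(addr + i) << (8 * i);
    }
    return v;
}

/* Satisfy a size-byte little-endian write with single-byte writes. */
static void split_write(unsigned addr, uint64_t val, unsigned size)
{
    unsigned i;

    for (i = 0; i < size; i++) {
        dev_writeb(addr + i, (uint8_t)(val >> (8 * i)));
    }
}
```

This is the same shift-and-or pattern that the removed `readw`/`readl`/`writew`/`writel` helpers spelled out by hand, just written once as a loop.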
[PATCH v4 10/39] Integrate I/O memory regions into qemu
get_system_io() returns the root I/O memory region.

Reviewed-by: Richard Henderson r...@twiddle.net
Signed-off-by: Avi Kivity a...@redhat.com
---
 exec-memory.h |    5 +++++
 exec.c        |   10 ++++++++++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/exec-memory.h b/exec-memory.h
index c439aba..334219f 100644
--- a/exec-memory.h
+++ b/exec-memory.h
@@ -28,6 +28,11 @@
  */
 MemoryRegion *get_system_memory(void);
 
+/* Get the root I/O port region.  This interface should only be used
+ * temporarily until a proper bus interface is available.
+ */
+MemoryRegion *get_system_io(void);
+
 /* Set the root memory region.  This region is the system memory map. */
 void set_system_memory_map(MemoryRegion *mr);
 
diff --git a/exec.c b/exec.c
index 719fff9..be7e4b2 100644
--- a/exec.c
+++ b/exec.c
@@ -113,6 +113,7 @@ static int in_migration;
 RAMList ram_list = { .blocks = QLIST_HEAD_INITIALIZER(ram_list) };
 
 static MemoryRegion *system_memory;
+static MemoryRegion *system_io;
 
 #endif
 
@@ -3830,6 +3831,10 @@ static void memory_map_init(void)
     system_memory = qemu_malloc(sizeof(*system_memory));
     memory_region_init(system_memory, "system", INT64_MAX);
     set_system_memory_map(system_memory);
+
+    system_io = qemu_malloc(sizeof(*system_io));
+    memory_region_init(system_io, "io", 65536);
+    set_system_io_map(system_io);
 }
 
 MemoryRegion *get_system_memory(void)
@@ -3837,6 +3842,11 @@ MemoryRegion *get_system_memory(void)
     return system_memory;
 }
 
+MemoryRegion *get_system_io(void)
+{
+    return system_io;
+}
+
 #endif /* !defined(CONFIG_USER_ONLY) */
 
 /* physical memory access (slow version, mainly for debug) */
-- 
1.7.5.3
[PATCH v4 07/39] cirrus: simplify vga window mmio access functions
Make use of the memory API's ability to satisfy multi-byte accesses via multiple single-byte accesses. Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- hw/cirrus_vga.c | 79 +++--- 1 files changed, 11 insertions(+), 68 deletions(-) diff --git a/hw/cirrus_vga.c b/hw/cirrus_vga.c index c39acb9..5ded1ff 100644 --- a/hw/cirrus_vga.c +++ b/hw/cirrus_vga.c @@ -1956,7 +1956,9 @@ static void cirrus_mem_writeb_mode4and5_16bpp(CirrusVGAState * s, * ***/ -static uint32_t cirrus_vga_mem_readb(void *opaque, target_phys_addr_t addr) +static uint64_t cirrus_vga_mem_read(void *opaque, +target_phys_addr_t addr, +uint32_t size) { CirrusVGAState *s = opaque; unsigned bank_index; @@ -1967,8 +1969,6 @@ static uint32_t cirrus_vga_mem_readb(void *opaque, target_phys_addr_t addr) return vga_mem_readb(s, addr); } -addr = 0x1; - if (addr 0x1) { /* XXX handle bitblt */ /* video memory */ @@ -2000,28 +2000,10 @@ static uint32_t cirrus_vga_mem_readb(void *opaque, target_phys_addr_t addr) return val; } -static uint32_t cirrus_vga_mem_readw(void *opaque, target_phys_addr_t addr) -{ -uint32_t v; - -v = cirrus_vga_mem_readb(opaque, addr); -v |= cirrus_vga_mem_readb(opaque, addr + 1) 8; -return v; -} - -static uint32_t cirrus_vga_mem_readl(void *opaque, target_phys_addr_t addr) -{ -uint32_t v; - -v = cirrus_vga_mem_readb(opaque, addr); -v |= cirrus_vga_mem_readb(opaque, addr + 1) 8; -v |= cirrus_vga_mem_readb(opaque, addr + 2) 16; -v |= cirrus_vga_mem_readb(opaque, addr + 3) 24; -return v; -} - -static void cirrus_vga_mem_writeb(void *opaque, target_phys_addr_t addr, - uint32_t mem_value) +static void cirrus_vga_mem_write(void *opaque, + target_phys_addr_t addr, + uint64_t mem_value, + uint32_t size) { CirrusVGAState *s = opaque; unsigned bank_index; @@ -2033,8 +2015,6 @@ static void cirrus_vga_mem_writeb(void *opaque, target_phys_addr_t addr, return; } -addr = 0x1; - if (addr 0x1) { if (s-cirrus_srcptr != 
s-cirrus_srcptr_end) { /* bitblt */ @@ -2084,51 +2064,14 @@ static void cirrus_vga_mem_writeb(void *opaque, target_phys_addr_t addr, } } -static void cirrus_vga_mem_writew(void *opaque, target_phys_addr_t addr, uint32_t val) -{ -cirrus_vga_mem_writeb(opaque, addr, val 0xff); -cirrus_vga_mem_writeb(opaque, addr + 1, (val 8) 0xff); -} - -static void cirrus_vga_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val) -{ -cirrus_vga_mem_writeb(opaque, addr, val 0xff); -cirrus_vga_mem_writeb(opaque, addr + 1, (val 8) 0xff); -cirrus_vga_mem_writeb(opaque, addr + 2, (val 16) 0xff); -cirrus_vga_mem_writeb(opaque, addr + 3, (val 24) 0xff); -} - -static uint64_t cirrus_vga_mem_read(void *opaque, -target_phys_addr_t addr, -uint32_t size) -{ -CirrusVGAState *s = opaque; - -switch (size) { -case 1: return cirrus_vga_mem_readb(s, addr); -case 2: return cirrus_vga_mem_readw(s, addr); -case 4: return cirrus_vga_mem_readl(s, addr); -default: abort(); -} -} - -static void cirrus_vga_mem_write(void *opaque, target_phys_addr_t addr, - uint64_t data, unsigned size) -{ -CirrusVGAState *s = opaque; - -switch (size) { -case 1: return cirrus_vga_mem_writeb(s, addr, data); -case 2: return cirrus_vga_mem_writew(s, addr, data); -case 4: return cirrus_vga_mem_writel(s, addr, data); -default: abort(); -} -}; - static const MemoryRegionOps cirrus_vga_mem_ops = { .read = cirrus_vga_mem_read, .write = cirrus_vga_mem_write, .endianness = DEVICE_LITTLE_ENDIAN, +.impl = { +.min_access_size = 1, +.max_access_size = 1, +}, }; /*** -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 03/39] vmsvga: don't remember pci BAR address in callback any more
We're going to remove the callback, so we can't use it to save the
address.  Use the pci API instead.

Reviewed-by: Richard Henderson r...@twiddle.net
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/vmware_vga.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index 354c221..190b005 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -52,8 +52,6 @@ struct vmsvga_state_s {
         int on;
     } cursor;
 
-    target_phys_addr_t vram_base;
-
     int index;
     int scratch_size;
     uint32_t *scratch;
@@ -761,8 +759,11 @@ static uint32_t vmsvga_value_read(void *opaque, uint32_t address)
     case SVGA_REG_BYTES_PER_LINE:
         return ((s->depth + 7) >> 3) * s->new_width;
 
-    case SVGA_REG_FB_START:
-        return s->vram_base;
+    case SVGA_REG_FB_START: {
+        struct pci_vmsvga_state_s *pci_vmsvga
+            = container_of(s, struct pci_vmsvga_state_s, chip);
+        return pci_get_bar_addr(&pci_vmsvga->card, 1);
+    }
 
     case SVGA_REG_FB_OFFSET:
         return 0x0;
@@ -1247,14 +1248,13 @@ static void pci_vmsvga_map_mem(PCIDevice *pci_dev, int region_num,
     struct vmsvga_state_s *s = &d->chip;
     ram_addr_t iomemtype;
 
-    s->vram_base = addr;
 #ifdef DIRECT_VRAM
     iomemtype = cpu_register_io_memory(vmsvga_vram_read,
                     vmsvga_vram_write, s, DEVICE_NATIVE_ENDIAN);
 #else
     iomemtype = s->vga.vram_offset | IO_MEM_RAM;
 #endif
-    cpu_register_physical_memory(s->vram_base, s->vga.vram_size,
+    cpu_register_physical_memory(addr, s->vga.vram_size,
             iomemtype);
 
     s->vga.map_addr = addr;
-- 
1.7.5.3
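Editorial sketch, not part of the patch: the new `SVGA_REG_FB_START` case recovers the enclosing PCI wrapper from a pointer to its embedded chip state via `container_of`. The standalone code below shows the idea under simplifying assumptions. `chip_state`, `pci_wrapper`, `bar1_addr` and `fb_start` are hypothetical stand-ins for the QEMU structures, and the stored `bar1_addr` field stands in for whatever `pci_get_bar_addr()` would return.

```c
#include <stddef.h>
#include <stdint.h>

/* Step back from a member pointer to the structure that contains it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the device core state. */
struct chip_state {
    int depth;
};

/* Hypothetical stand-in for the PCI wrapper that embeds the core state. */
struct pci_wrapper {
    uint64_t bar1_addr;      /* assumed: where BAR 1 is currently mapped */
    struct chip_state chip;  /* embedded by value, not by pointer */
};

/* Given only the inner state, find the wrapper and read the BAR address. */
static uint64_t fb_start(struct chip_state *s)
{
    struct pci_wrapper *w = container_of(s, struct pci_wrapper, chip);

    return w->bar1_addr;
}
```

Because the address is looked up on demand, the inner state no longer needs its own cached copy, which is what lets the patch delete `vram_base`.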
Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types
On 08/08/2011 04:00 PM, Anthony Liguori wrote:
> On 08/08/2011 07:56 AM, Avi Kivity wrote:
>> QEMU deals with a lot of fixed width integer types; their names
>> (uint64_t etc) are clumsy to use and take up a lot of space.
>>
>> Following Linux, introduce shorter names, for example U64 for
>> uint64_t.
>
> Except Linux uses lower case letters.
>
> I personally think Linux style is wrong here. The int8_t types are
> standard types.
>
> Besides, we save lots of characters by using 4-space tabs instead of
> 8-space tabs. We can afford to spend some of those saved characters on
> using proper type names :-)

It's not about saving space, it's about improving readability. We have
about 21k uses of these types, they deserve short names.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init
On Mon, Aug 08, 2011 at 08:02:08AM -0500, Anthony Liguori wrote:
> On 08/08/2011 07:56 AM, Michael S. Tsirkin wrote:
>> On Mon, Aug 08, 2011 at 07:45:19AM -0500, Anthony Liguori wrote:
>>> On 08/08/2011 05:36 AM, Michael S. Tsirkin wrote:
>>>>> Thinking more closely, I don't think this is right.
>>>>>
>>>>> Updating on map ensured that the config was refreshed after each
>>>>> time the bar was mapped. At the very least, the config needs to be
>>>>> refreshed during reset because the guest may write to the guest
>>>>> space which should get cleared after reset.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Anthony Liguori
>>>>
>>>> Not sure I understand. Which register, for example, do you have in
>>>> mind? Could you clarify please?
>>>
>>> Actually, you never need to call config_get() AFAICT. It's called in
>>> every read/write access.
>>
>> Every read, yes. But every write? Are you sure?
>
> Yeah, not on write, but I think this is a bug. get_config() should be
> called before doing the memcpy() in order to have a proper RMW.
>
> Regards,
>
> Anthony Liguori

Probably not noticeable because guests don't do the RMW in practice.
We also send the config over on migration. That's probably a bug as
well ...

-- 
MST
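Editorial sketch, not from the thread: the read-modify-write concern raised above can be illustrated with a toy model. Nothing here is the virtio code; `struct dev`, `backend`, `config`, `get_config`, `set_config`, `config_write` and the `refresh_first` flag are all hypothetical, chosen only to show what can go wrong when a partial write is merged into a stale cached copy.

```c
#include <stdint.h>
#include <string.h>

#define CONFIG_LEN 8

/* Hypothetical device: authoritative state plus a cached config copy. */
struct dev {
    uint8_t backend[CONFIG_LEN]; /* what the device really holds */
    uint8_t config[CONFIG_LEN];  /* cached copy the guest writes into */
};

/* Refresh the cached copy from the device (the "read" of the RMW). */
static void get_config(struct dev *d)
{
    memcpy(d->config, d->backend, CONFIG_LEN);
}

/* Push the whole cached copy back to the device (the "write"). */
static void set_config(struct dev *d)
{
    memcpy(d->backend, d->config, CONFIG_LEN);
}

/* Partial guest write; refresh_first selects RMW or blind modify-write. */
static void config_write(struct dev *d, unsigned off,
                         const void *data, unsigned len, int refresh_first)
{
    if (off > CONFIG_LEN || len > CONFIG_LEN - off) {
        return; /* out of range: ignore */
    }
    if (refresh_first) {
        get_config(d);
    }
    memcpy(d->config + off, data, len); /* modify only the written bytes */
    set_config(d);
}
```

With `refresh_first` set, a one-byte write leaves the other seven bytes of `backend` as they were. Without it, whatever stale bytes sit in `config` are written back over the device state, which is the failure mode described in the reply above.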
Re: [Qemu-devel] [PATCH v3 01/39] virtio-pci: get config on init
On 08/08/2011 08:14 AM, Michael S. Tsirkin wrote:
> Probably not noticeable because guests don't do the RMW in practice.
> We also send the config over on migration. That's probably a bug as
> well ...

It's probably unnecessary, but I don't think it's a bug.

Regards,

Anthony Liguori
Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types
On 08/08/2011 08:12 AM, Avi Kivity wrote:
> On 08/08/2011 04:00 PM, Anthony Liguori wrote:
>> On 08/08/2011 07:56 AM, Avi Kivity wrote:
>>> QEMU deals with a lot of fixed width integer types; their names
>>> (uint64_t etc) are clumsy to use and take up a lot of space.
>>>
>>> Following Linux, introduce shorter names, for example U64 for
>>> uint64_t.
>>
>> Except Linux uses lower case letters.
>>
>> I personally think Linux style is wrong here. The int8_t types are
>> standard types.
>>
>> Besides, we save lots of characters by using 4-space tabs instead of
>> 8-space tabs. We can afford to spend some of those saved characters on
>> using proper type names :-)
>
> It's not about saving space, it's about improving readability. We have
> about 21k uses of these types, they deserve short names.

This is one of the few areas that we're actually consistent with today.
Introducing a new set of types will just create inconsistency.

Most importantly, these are standard types. Every modern library and C
program should be using them. TBH, having short names is just a bad
case of NIH.

Regards,

Anthony Liguori
Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types
On 8 August 2011 13:56, Avi Kivity a...@redhat.com wrote:
> QEMU deals with a lot of fixed width integer types; their names
> (uint64_t etc) are clumsy to use and take up a lot of space.
>
> Following Linux, introduce shorter names, for example U64 for
> uint64_t.

Strongly disagree. uint64_t &c are standard types and it's immediately
clear to a competent C programmer what they are. Random qemu-specific
funny named types just introduce an unnecessary level of indirection.

We only just recently managed to get rid of the nonstandard typenames
for these from fpu/...

-- PMM
Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types
On 08/08/2011 04:17 PM, Anthony Liguori wrote:
> This is one of the few areas that we're actually consistent with today.
> Introducing a new set of types will just create inconsistency.
>
> Most importantly, these are standard types. Every modern library and C
> program should be using them. TBH, having short names is just a bad
> case of NIH.

Those are exactly the same types, compatible with all the libraries.
NIH would be redefining them ourselves (and breaking pointer
compatibility etc.)

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues
2011/8/8 Avi Kivity a...@redhat.com:
> On 08/08/2011 03:49 PM, Frediano Ziglio wrote:
>> 2011/8/8 Avi Kivity a...@redhat.com:
>>> In certain circumstances, posix-aio-compat can incur a lot of latency:
>>>  - threads are created by vcpu threads, so if vcpu affinity is set,
>>>    aio threads inherit vcpu affinity. This can cause many aio threads
>>>    to compete for one cpu.
>>>  - we can create up to max_threads (64) aio threads in one go; since a
>>>    pthread_create can take around 30μs, we have up to 2ms of cpu time
>>>    under a global lock.
>>>
>>> Fix by:
>>>  - moving thread creation to the main thread, so we inherit the main
>>>    thread's affinity instead of the vcpu thread's affinity.
>>>  - if a thread is currently being created, and we need to create yet
>>>    another thread, let thread being born create the new thread,
>>>    reducing the amount of time we spend under the main thread.
>>>  - drop the local lock while creating a thread (we may still hold the
>>>    global mutex, though)
>>>
>>> Note this doesn't eliminate latency completely; scheduler artifacts or
>>> lack of host cpu resources can still cause it. We may want
>>> pre-allocated threads when this cannot be tolerated.
>>>
>>> Thanks to Uli Obergfell of Red Hat for his excellent analysis and
>>> suggestions.
>>>
>>> Signed-off-by: Avi Kivity a...@redhat.com
>>
>> Why not call pthread_attr_setaffinity_np (where available) before
>> thread creation, or sched_setaffinity at thread start, instead of
>> telling another thread to create a thread for us just to get affinity
>> cleared?
>
> The entire qemu process may be affined to a subset of the host cpus; we
> don't want to break that.
>
> For example:
>
>    taskset 0xf0 qemu
>    (qemu) info cpus
>    pin individual vcpu threads to host cpus

Just call sched_getaffinity at program start, save it to a global
variable and then set this affinity for io threads.
I haven't used affinity that much, but from the manual it seems that if
you own the process you can set affinity as you like.

IMHO this patch introduces a delay in io thread creation due to posting
thread creation to another thread just to set a different affinity.

Frediano
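Editorial note, not from the thread: the "up to 2ms" figure in the quoted patch description follows from its own two numbers, assuming all 64 threads are created back to back at roughly 30 µs each:

```latex
t_{\text{lock}} \approx 64 \times 30\,\mu\text{s} = 1920\,\mu\text{s} \approx 1.9\,\text{ms}
```

This is a worst-case estimate under that assumption; the description itself says "around 30μs", so the real figure varies.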
Re: [Qemu-devel] [PATCH] posix-aio-compat: fix latency issues
On 08/08/2011 04:21 PM, Frediano Ziglio wrote:
>> The entire qemu process may be affined to a subset of the host cpus; we
>> don't want to break that.
>>
>> For example:
>>
>>    taskset 0xf0 qemu
>>    (qemu) info cpus
>>    pin individual vcpu threads to host cpus
>
> Just call sched_getaffinity at program start, save to a global variable
> and then set this affinity for io threads.

This affinity may change later on.

> I didn't use affinity that much but from manual it seems that if you
> own process you can set affinity as you like.
>
> IMHO this patch introduce a delay in io thread creation due to posting
> thread creation to another thread just to set different affinity.

It does. But aio threads have a long life, so this happens very rarely.

-- 
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] Introduce short names for fixed width integer types
On 08.08.2011 15:00, Anthony Liguori wrote:
> On 08/08/2011 07:56 AM, Avi Kivity wrote:
>> QEMU deals with a lot of fixed width integer types; their names
>> (uint64_t etc) are clumsy to use and take up a lot of space.
>>
>> Following Linux, introduce shorter names, for example U64 for
>> uint64_t.
>
> Except Linux uses lower case letters.
>
> I personally think Linux style is wrong here. The int8_t types are
> standard types.

I fully agree, we should use the standard types.

> Besides, we save lots of characters by using 4-space tabs instead of
> 8-space tabs. We can afford to spend some of those saved characters on
> using proper type names :-)

Heh, I like this reasoning. :-)

Kevin
Re: [Qemu-devel] [RFC] postcopy livemigration proposal
On 08/08/2011 03:32 PM, Anthony Liguori wrote:
> On 08/08/2011 04:20 AM, Dor Laor wrote:
>> On 08/08/2011 06:24 AM, Isaku Yamahata wrote:
>>> This mail is on "Yabusame: Postcopy Live Migration for Qemu/KVM",
>>> on which we'll give a talk at KVM Forum.
>>> The purpose of this mail is to let developers know about it in
>>> advance so that we can get better feedback on its
>>> design/implementation approach early, before we start to implement it.
>>>
>>> Background
>>> ==========
>>> * What is postcopy live migration?
>>> It is yet another live migration mechanism for Qemu/KVM, which
>>> implements the migration technique known as "postcopy" or "lazy"
>>> migration. Just after the "migrate" command is invoked, the execution
>>> host of a VM is instantaneously switched to a destination host.
>>>
>>> The benefit is that total migration time is shorter because it
>>> transfers a page only once. On the other hand, precopy may repeat
>>> sending the same pages again and again because they can be dirtied.
>>> The switching time from the source to the destination is several
>>> hundred milliseconds, so it enables quick load balancing.
>>> For details, please refer to the papers.
>>>
>>> We believe this is useful for others, so we'd like to merge this
>>> feature into upstream qemu/kvm. The existing implementation that we
>>> have right now is very ad hoc because it's for academic research.
>>> For the upstream merge, we're starting to re-design/implement it and
>>> we'd like to get feedback early. Although many
>>> improvements/optimizations are possible, we should implement/merge the
>>> simple/clean, but extensible as well, one at first and then
>>> improve/optimize it later.
>>>
>>> Postcopy live migration will be introduced as an optional feature. The
>>> existing precopy live migration remains the default behavior.
>>>
>>> * Related links:
>>> project page
>>> http://sites.google.com/site/grivonhome/quick-kvm-migration
>>>
>>> Enabling Instantaneous Relocation of Virtual Machines with a
>>> Lightweight VMM Extension
>>> (proof-of-concept, ad-hoc prototype; not a new design)
>>> http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-paper.pdf
>>> http://grivon.googlecode.com/svn/pub/docs/ccgrid2010-hirofuchi-talk.pdf
>>>
>>> Reactive consolidation of virtual machines enabled by postcopy live
>>> migration
>>> (advantage for VM consolidation)
>>> http://portal.acm.org/citation.cfm?id=1996125
>>> http://www.emn.fr/x-info/ascola/lib/exe/fetch.php?media=internet:vtdc-postcopy.pdf
>>>
>>> Qemu wiki
>>> http://wiki.qemu.org/Features/PostCopyLiveMigration
>>>
>>> Design/Implementation
>>> =====================
>>> The basic idea of postcopy live migration is to use a sort of
>>> distributed shared memory between the migration source and
>>> destination.
>>>
>>> The migration procedure looks like
>>> - start migration
>>>   stop the guest VM on the source and send the machine states except
>>>   guest RAM to the destination
>>> - resume the guest VM on the destination without guest RAM contents
>>> - Hook guest access to pages, and pull page contents from the source
>>>   This continues until all the pages are pulled to the destination
>>>
>>> The big picture is depicted at
>>> http://wiki.qemu.org/File:Postcopy-livemigration.png
>>
>> That's terrific (nice video also)!
>> Orit and myself had the exact same idea too (now we can't patent it..).
>>
>> Advantages:
>> - No down time due to memory copying.
>
> But non-deterministic down time due to network latency while trying to
> satisfy a page fault.

True but it is possible to limit it with some dedicated network or
bandwidth reservation.

>> - Efficient, reduce needed traffic no need to re-send pages.
>
> It's not quite that simple. Post-copy needs to introduce a protocol
> capable of requesting pages.

Just another subsection.. (kidding), still it shouldn't be too
complicated, just an offset+pagesize and return page_content/error

> I think in presenting something like this, it's important to collect
> quite a bit of performance data. I'd suggest doing runs while running
> jitterd in the guest to attempt to quantify the actual downtime
> experienced too.
>
> http://git.codemonkey.ws/cgit/jitterd.git/

and also comparing the speed that it takes for various benchmarks like
iozone/netperf/linpack/..

> There's a lot of potential in something like this, but it's not obvious
> to me whether it's a net win. Should make for a very interesting
> presentation :-)
>
>> - Reduce overall RAM consumption of the source and destination as
>>   opposed from current live migration (both the source and the
>>   destination allocate the memory until the live migration completes).
>>   We can free copied memory once the destination guest received it and
>>   save RAM.
>> - Increase parallelism for SMP guests we can have multiple virtual CPU
>>   handle their demand paging. Less time to hold a global lock, less
>>   thread contention.
>> - Virtual machines are using more and more memory resources, for a
>>   virtual machine with very large working set doing live migration with
>>   reasonable down time is impossible today.
>
> This is really just a limitation of our implementation. In theory,
> pre-copy allows you to exert fine grain resource control over the guest
> which you can use to encourage convergence.

But a very
Re: [Qemu-devel] [PATCH v4 01/39] memory: rename PORTIO_END to PORTIO_END_OF_LIST
On 08/08/2011 08:08 AM, Avi Kivity wrote:
> For consistency with other _END_OF_LIST macros.
>
> Signed-off-by: Avi Kivity a...@redhat.com

Reviewed-by: Anthony Liguori aligu...@us.ibm.com

Regards,

Anthony Liguori

> ---
>  memory.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/memory.h b/memory.h
> index 4e518b2..da00a3b 100644
> --- a/memory.h
> +++ b/memory.h
> @@ -133,7 +133,7 @@ struct MemoryRegionPortio {
>      IOPortWriteFunc *write;
>  };
>
> -#define PORTIO_END { }
> +#define PORTIO_END_OF_LIST() { }
>
>  /**
>   * memory_region_init: Initialize a memory region
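Editorial sketch, not part of the patch: an `_END_OF_LIST` macro of this kind typically expands to an all-zero entry that marks the end of a table, so code can walk the table without a separate length. The standalone example below shows that pattern. `struct port_entry`, `PORT_END_OF_LIST`, `example_ports` and `count_ports` are hypothetical, and treating a zero `size` field as the terminator is an assumption made for the illustration.

```c
#include <stddef.h>

/* Hypothetical table entry, loosely modelled on a port I/O descriptor. */
struct port_entry {
    unsigned offset; /* first port covered by this entry */
    unsigned len;    /* number of ports covered */
    unsigned size;   /* access size in bytes; 0 marks the terminator */
};

/* All-zero sentinel that ends a table of entries. */
#define PORT_END_OF_LIST() { 0, 0, 0 }

static const struct port_entry example_ports[] = {
    { 0, 4, 1 },
    { 4, 2, 2 },
    PORT_END_OF_LIST(),
};

/* Walk the table until the sentinel is reached. */
static size_t count_ports(const struct port_entry *p)
{
    size_t n = 0;

    while (p->size != 0) {
        n++;
        p++;
    }
    return n;
}
```

Spelling the terminator as a function-like macro keeps each table self-describing at the point of definition, which is the consistency argument made in the commit message.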
Re: [PATCH v4 00/39] Memory API, batch 2: PCI devices
On Mon, Aug 08, 2011 at 04:08:53PM +0300, Avi Kivity wrote:
> This is a mostly mindless conversion of all QEMU PCI devices to the
> memory API. After this patchset is applied, it is no longer possible to
> create a PCI device using the old API.
>
> An immediate benefit is that PCI BARs that overlap each other are now
> handled correctly: currently, the sequence
>
>   map BAR 0
>   map BAR 1 at an overlapping address
>   unmap either BAR 0 or BAR 1
>
> will leave a hole where the overlap exists. With the patchset, the
> memory map is restored correctly.
>
> Note that overlaps of PCI BARs with memory or non-PCI resources are
> still not resolved correctly; this will be fixed later on.
>
> The vga patches have ugly intermediate states; however the result is
> fairly clean.
>
> Changes from v3:
>  - dropped virtio-pci config patch; will be fixed outside this patchset
>    if necessary
>  - minor style fixes
>
> Changes from v2:
>  - added patch from Michael simplifying virtio-pci config setup
>
> Changes from v1:
>  - cmd646 type fix
>  - folded a fixlet into its parent

For the series:

Acked-by: Michael S. Tsirkin m...@redhat.com

> Avi Kivity (39):
>   memory: rename PORTIO_END to PORTIO_END_OF_LIST
>   pci: add API to get a BAR's mapped address
>   vmsvga: don't remember pci BAR address in callback any more
>   vga: convert vga and its derivatives to the memory API
>   cirrus: simplify mmio BAR access functions
>   cirrus: simplify bitblt BAR access functions
>   cirrus: simplify vga window mmio access functions
>   vga: simplify vga window mmio access functions
>   cirrus: simplify linear framebuffer access functions
>   Integrate I/O memory regions into qemu
>   pci: pass I/O address space to new PCI bus
>   pci: allow I/O BARs to be registered with pci_register_bar_region()
>   rtl8139: convert to memory API
>   ac97: convert to memory API
>   e1000: convert to memory API
>   eepro100: convert to memory API
>   es1370: convert to memory API
>   ide: convert to memory API
>   ivshmem: convert to memory API
>   virtio-pci: convert to memory API
>   ahci: convert to memory API
>   intel-hda: convert to memory API
>   lsi53c895a: convert to memory API
>   ppc: convert to memory API
>   ne2000: convert to memory API
>   pcnet: convert to memory API
>   i6300esb: convert to memory API
>   isa-mmio: convert to memory API
>   sun4u: convert to memory API
>   ehci: convert to memory API
>   uhci: convert to memory API
>   xen-platform: convert to memory API
>   msix: convert to memory API
>   pci: remove pci_register_bar_simple()
>   pci: convert pci rom to memory API
>   pci: remove pci_register_bar()
>   pci: fold BAR mapping function into its caller
>   pci: rename pci_register_bar_region() to pci_register_bar()
>   pci: remove support for pre memory API BARs
>
>  exec-memory.h      |   5 +
>  exec.c             |  10 ++
>  hw/ac97.c          |  88 ++-
>  hw/apb_pci.c       |   1 +
>  hw/bonito.c        |   1 +
>  hw/cirrus_vga.c    | 459 ---
>  hw/cuda.c          |   6 +-
>  hw/e1000.c         | 113 ++
>  hw/eepro100.c      | 181 -
>  hw/es1370.c        |  43 +++--
>  hw/escc.c          |  42 +++---
>  hw/escc.h          |   2 +-
>  hw/grackle_pci.c   |   8 +-
>  hw/gt64xxx.c       |   4 +-
>  hw/heathrow_pic.c  |  29 ++--
>  hw/ide.h           |   2 +-
>  hw/ide/ahci.c      |  31 ++--
>  hw/ide/ahci.h      |   2 +-
>  hw/ide/cmd646.c    | 204 +++-
>  hw/ide/ich.c       |   3 +-
>  hw/ide/macio.c     |  36 +++--
>  hw/ide/pci.c       |  25 ++--
>  hw/ide/pci.h       |  19 ++-
>  hw/ide/piix.c      |  63 ++--
>  hw/ide/via.c       |  64 ++--
>  hw/intel-hda.c     |  35 +++--
>  hw/isa.h           |   2 +
>  hw/isa_mmio.c      |  29 ++--
>  hw/ivshmem.c       | 158 +++
>  hw/lance.c         |  31 ++--
>  hw/lsi53c895a.c    | 257 +++---
>  hw/mac_dbdma.c     |  32 ++--
>  hw/mac_dbdma.h     |   4 +-
>  hw/mac_nvram.c     |  39 ++---
>  hw/macio.c         |  73 -
>  hw/msix.c          |  64 +++-
>  hw/msix.h          |   6 +-
>  hw/ne2000-isa.c    |  13 +--
>  hw/ne2000.c        |  77 ++---
>  hw/ne2000.h        |   8 +-
>  hw/openpic.c       |  81 +-
>  hw/openpic.h       |   2 +-
>  hw/pc.h            |   4 +-
>  hw/pc_piix.c       |   6 +-
>  hw/pci.c           | 133 +---
>  hw/pci.h           |  26 ++--
>  hw/pci_internals.h |   3 +-
>  hw/pcnet-pci.c     |  74 +
>  hw/pcnet.h         |   4 +-
>  hw/piix_pci.c      |  14 +-
>  hw/ppc4xx_pci.c    |   1 +
>  hw/ppc_mac.h       |  27 ++--
>  hw/ppc_newworld.c  |  34 ++--
>  hw/ppc_oldworld.c  |  27 ++--
>  hw/ppc_prep.c      |   2 +-
>  hw/ppce500_pci.c   |   7 +-
>  hw/prep_pci.c      |   8 +-
>  hw/prep_pci.h      |   4 +-
>  hw/qxl-render.c    |   2 +-
>  hw/qxl.c           | 129 ++--
>  hw/qxl.h           |   6 +-
>  hw/rtl8139.c       |  70
>  hw/sh_pci.c        |   4 +-
>  hw/sun4u.c         |
Re: [Qemu-devel] [PATCH v4 20/39] virtio-pci: convert to memory API
On 08/08/2011 08:09 AM, Avi Kivity wrote: except msix. [jan: fix build] This actually breaks the build: CClibhw64/virtio-pci.o cc1: warnings being treated as errors /home/anthony/git/qemu/hw/virtio-pci.c: In function ‘virtio_write_config’: /home/anthony/git/qemu/hw/virtio-pci.c:496:19: error: unused variable ‘vdev’ make[1]: *** [virtio-pci.o] Error 1 make: *** [subdir-libhw64] Error 2 Reviewed-by: Richard Hendersonr...@twiddle.net Reviewed-by: Anthony Liguorialigu...@us.ibm.com Signed-off-by: Avi Kivitya...@redhat.com --- hw/virtio-pci.c | 71 +-- hw/virtio-pci.h |2 +- 2 files changed, 28 insertions(+), 45 deletions(-) diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index f3b3293..5df380d 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -162,7 +162,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, { VirtQueue *vq = virtio_get_queue(proxy-vdev, n); EventNotifier *notifier = virtio_queue_get_host_notifier(vq); -int r; +int r = 0; + if (assign) { r = event_notifier_init(notifier, 1); if (r 0) { @@ -170,24 +171,11 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, __func__, r); return r; } -r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier), - proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY, - n, assign); -if (r 0) { -error_report(%s: unable to map ioeventfd: %d, - __func__, r); -event_notifier_cleanup(notifier); -} +memory_region_add_eventfd(proxy-bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, + true, n, event_notifier_get_fd(notifier)); } else { -r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier), - proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY, - n, assign); -if (r 0) { -error_report(%s: unable to unmap ioeventfd: %d, - __func__, r); -return r; -} - +memory_region_del_eventfd(proxy-bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, + true, n, event_notifier_get_fd(notifier)); /* Handle the race condition where the guest kicked and we deassigned * before we got around to handling the kick. 
*/ @@ -424,7 +412,6 @@ static uint32_t virtio_pci_config_readb(void *opaque, uint32_t addr) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) return virtio_ioport_read(proxy, addr); addr -= config; @@ -435,7 +422,6 @@ static uint32_t virtio_pci_config_readw(void *opaque, uint32_t addr) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) return virtio_ioport_read(proxy, addr); addr -= config; @@ -446,7 +432,6 @@ static uint32_t virtio_pci_config_readl(void *opaque, uint32_t addr) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) return virtio_ioport_read(proxy, addr); addr -= config; @@ -457,7 +442,6 @@ static void virtio_pci_config_writeb(void *opaque, uint32_t addr, uint32_t val) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) { virtio_ioport_write(proxy, addr, val); return; @@ -470,7 +454,6 @@ static void virtio_pci_config_writew(void *opaque, uint32_t addr, uint32_t val) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) { virtio_ioport_write(proxy, addr, val); return; @@ -483,7 +466,6 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) { virtio_ioport_write(proxy, addr, val); return; @@ -492,30 +474,26 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val) virtio_config_writel(proxy-vdev, addr, val); } -static void virtio_map(PCIDevice *pci_dev, int region_num, - pcibus_t addr, pcibus_t size, int type) -{ -VirtIOPCIProxy *proxy = container_of(pci_dev, VirtIOPCIProxy, pci_dev); -VirtIODevice *vdev = proxy-vdev; -unsigned 
config_len = VIRTIO_PCI_REGION_SIZE(pci_dev) + vdev-config_len; - -proxy-addr = addr; - -register_ioport_write(addr, config_len, 1,
[PATCH v4.1 20/39] virtio-pci: convert to memory API
except msix. [jan: fix build] Reviewed-by: Richard Henderson r...@twiddle.net Reviewed-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Avi Kivity a...@redhat.com --- v4.1: drop unused variable hw/virtio-pci.c | 70 -- hw/virtio-pci.h |2 +- 2 files changed, 27 insertions(+), 45 deletions(-) diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index f3b3293..86c3229 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -162,7 +162,8 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, { VirtQueue *vq = virtio_get_queue(proxy-vdev, n); EventNotifier *notifier = virtio_queue_get_host_notifier(vq); -int r; +int r = 0; + if (assign) { r = event_notifier_init(notifier, 1); if (r 0) { @@ -170,24 +171,11 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, __func__, r); return r; } -r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier), - proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY, - n, assign); -if (r 0) { -error_report(%s: unable to map ioeventfd: %d, - __func__, r); -event_notifier_cleanup(notifier); -} +memory_region_add_eventfd(proxy-bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, + true, n, event_notifier_get_fd(notifier)); } else { -r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier), - proxy-addr + VIRTIO_PCI_QUEUE_NOTIFY, - n, assign); -if (r 0) { -error_report(%s: unable to unmap ioeventfd: %d, - __func__, r); -return r; -} - +memory_region_del_eventfd(proxy-bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, + true, n, event_notifier_get_fd(notifier)); /* Handle the race condition where the guest kicked and we deassigned * before we got around to handling the kick. 
*/ @@ -424,7 +412,6 @@ static uint32_t virtio_pci_config_readb(void *opaque, uint32_t addr) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) return virtio_ioport_read(proxy, addr); addr -= config; @@ -435,7 +422,6 @@ static uint32_t virtio_pci_config_readw(void *opaque, uint32_t addr) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) return virtio_ioport_read(proxy, addr); addr -= config; @@ -446,7 +432,6 @@ static uint32_t virtio_pci_config_readl(void *opaque, uint32_t addr) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) return virtio_ioport_read(proxy, addr); addr -= config; @@ -457,7 +442,6 @@ static void virtio_pci_config_writeb(void *opaque, uint32_t addr, uint32_t val) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) { virtio_ioport_write(proxy, addr, val); return; @@ -470,7 +454,6 @@ static void virtio_pci_config_writew(void *opaque, uint32_t addr, uint32_t val) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) { virtio_ioport_write(proxy, addr, val); return; @@ -483,7 +466,6 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val) { VirtIOPCIProxy *proxy = opaque; uint32_t config = VIRTIO_PCI_CONFIG(proxy-pci_dev); -addr -= proxy-addr; if (addr config) { virtio_ioport_write(proxy, addr, val); return; @@ -492,25 +474,20 @@ static void virtio_pci_config_writel(void *opaque, uint32_t addr, uint32_t val) virtio_config_writel(proxy-vdev, addr, val); } -static void virtio_map(PCIDevice *pci_dev, int region_num, - pcibus_t addr, pcibus_t size, int type) -{ -VirtIOPCIProxy *proxy = container_of(pci_dev, VirtIOPCIProxy, pci_dev); -VirtIODevice *vdev = proxy-vdev; -unsigned 
config_len = VIRTIO_PCI_REGION_SIZE(pci_dev) + vdev-config_len; - -proxy-addr = addr; - -register_ioport_write(addr, config_len, 1, virtio_pci_config_writeb, proxy); -register_ioport_write(addr, config_len, 2, virtio_pci_config_writew, proxy); -register_ioport_write(addr, config_len, 4, virtio_pci_config_writel, proxy); -register_ioport_read(addr, config_len, 1, virtio_pci_config_readb, proxy); -register_ioport_read(addr, config_len, 2, virtio_pci_config_readw, proxy); -register_ioport_read(addr,
Re: [Qemu-devel] [RFC] postcopy livemigration proposal
On 08/08/2011 10:11 AM, Dor Laor wrote:
On 08/08/2011 03:32 PM, Anthony Liguori wrote:
On 08/08/2011 04:20 AM, Dor Laor wrote:

That's terrific (nice video also)! Orit and myself had the exact same idea too (now we can't patent it..).

Advantages:
- No down time due to memory copying.

But non-deterministic down time due to network latency while trying to satisfy a page fault.

True but it is possible to limit it with some dedicated network or bandwidth reservation.

Yup. Any technique that uses RDMA (which is basically what this is) requires dedicated network resources.

- Efficient, reduce needed traffic no need to re-send pages.

It's not quite that simple. Post-copy needs to introduce a protocol capable of requesting pages.

Just another subsection.. (kidding), still it shouldn't be too complicated, just an offset+pagesize and return page_content/error

What I meant by this is that there is potentially a lot of round trip overhead. Pre-copy migration works well with reasonable high latency network connections because the downtime is capped only by the maximum latency sending from one point to another. But with something like this, the total downtime is 2*max_latency*nb_pagefaults. That's potentially pretty high. So it may be desirable to try to reduce nb_pagefaults by prefaulting in pages, etc. Suffice to say, this ends up getting complicated and may end up burning network traffic too.

This is really just a limitation of our implementation. In theory, pre-copy allows you to exert fine grain resource control over the guest which you can use to encourage convergence.

But a very large guest w/ large working set that changes more frequent than the network bandwidth might always need huge down time with the current system.

In theory, you can do things like reduce the guest's priority to reduce the amount of work it can do in order to encourage convergence.

One thing I think we need to do is put together a live migration roadmap. We've got a lot of invasive efforts underway with live migration and I fear that without some planning and serialization, some of this useful work will get lost.

Some of them are parallel. I think all the readers here agree that post copy migration should be an option while we need to maintain the current one.

I actually think they need to be done mostly in sequence while cleaning up some of the current infrastructure. I don't think we really should make any major changes (beyond maybe the separate thread) until we eliminate QEMUFile. There's so much overhead involved in using QEMUFile today, I think it's hard to talk about performance data when we've got a major bottleneck sitting in the middle.

Regards,
Anthony Liguori
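To put numbers on the 2*max_latency*nb_pagefaults estimate from the message above, here is a minimal sketch in C. The function name and the choice of microseconds are assumptions made for illustration; nothing like this exists in QEMU. It assumes every remote fault is served strictly one at a time, which is the worst case the formula describes.

```c
#include <stdint.h>

/*
 * Worst-case time the guest spends stalled on remote page faults.
 * Each fault costs one round trip: the request travels to the source
 * and the page travels back, hence the factor of two.
 * (Hypothetical helper, not a QEMU function.)
 */
static uint64_t postcopy_stall_us(uint64_t max_latency_us,
                                  uint64_t nb_pagefaults)
{
    return 2 * max_latency_us * nb_pagefaults;
}
```

With a 500 us one-way latency, 1000 serialized faults already add up to a full second of stall, which is why prefaulting to cut nb_pagefaults comes up immediately in the thread.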
Re: [Qemu-devel] [RFC] postcopy livemigration proposal
On 08/08/2011 06:29 PM, Anthony Liguori wrote:

- Efficient, reduce needed traffic no need to re-send pages.

It's not quite that simple. Post-copy needs to introduce a protocol capable of requesting pages.

Just another subsection.. (kidding), still it shouldn't be too complicated, just an offset+pagesize and return page_content/error

What I meant by this is that there is potentially a lot of round trip overhead. Pre-copy migration works well with reasonable high latency network connections because the downtime is capped only by the maximum latency sending from one point to another. But with something like this, the total downtime is 2*max_latency*nb_pagefaults. That's potentially pretty high.

Let's be generous and assume that the latency is dominated by page copy time. So the total downtime is equal to the first live migration pass, ~20 sec for 2GB on 1GbE. It's distributed over potentially even more time, though. If the guest does a lot of I/O, it may not be noticeable (esp. if we don't copy over pages read from disk). If the guest is cpu/memory bound, it'll probably suck badly.

So it may be desirable to try to reduce nb_pagefaults by prefaulting in pages, etc. Suffice to say, this ends up getting complicated and may end up burning network traffic too.

Yeah, and prefaulting in the background adds latency to synchronous requests. This really needs excellent networking resources to work well.

-- 
error compiling committee.c: too many arguments to function
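The "~20 sec for 2GB on 1GbE" figure can be checked with a back-of-the-envelope calculation. The sketch below is only that: the helper name is made up, it assumes the link runs at full line rate, and it ignores protocol overhead, so it gives a lower bound rather than a measured number.

```c
#include <stdint.h>

/*
 * Time to push `bytes` over a link of `link_bits_per_sec`, in
 * milliseconds, assuming the link is fully utilised and there is no
 * framing or protocol overhead. (Hypothetical helper for illustration.)
 */
static uint64_t copy_time_ms(uint64_t bytes, uint64_t link_bits_per_sec)
{
    return bytes * 8 * 1000 / link_bits_per_sec;
}
```

For 2 GiB over 10^9 bit/s this comes out at roughly 17 seconds, so ~20 seconds once real-world overhead is included is a reasonable round number.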
Re: [Qemu-devel] [RFC] postcopy livemigration proposal
On 08/08/2011 10:36 AM, Avi Kivity wrote:
On 08/08/2011 06:29 PM, Anthony Liguori wrote:

- Efficient, reduce needed traffic no need to re-send pages.

It's not quite that simple. Post-copy needs to introduce a protocol capable of requesting pages.

Just another subsection.. (kidding), still it shouldn't be too complicated, just an offset+pagesize and return page_content/error

What I meant by this is that there is potentially a lot of round trip overhead. Pre-copy migration works well with reasonable high latency network connections because the downtime is capped only by the maximum latency sending from one point to another. But with something like this, the total downtime is 2*max_latency*nb_pagefaults. That's potentially pretty high.

Let's be generous and assume that the latency is dominated by page copy time. So the total downtime is equal to the first live migration pass, ~20 sec for 2GB on 1GbE. It's distributed over potentially even more time, though. If the guest does a lot of I/O, it may not be noticeable (esp. if we don't copy over pages read from disk). If the guest is cpu/memory bound, it'll probably suck badly.

So it may be desirable to try to reduce nb_pagefaults by prefaulting in pages, etc. Suffice to say, this ends up getting complicated and may end up burning network traffic too.

Yeah, and prefaulting in the background adds latency to synchronous requests. This really needs excellent networking resources to work well.

Yup, it's very similar to other technologies using RDMA (single system image, lock step execution, etc.).

Regards,
Anthony Liguori
Re: [Qemu-devel] [PATCH v4 00/39] Memory API, batch 2: PCI devices
On 08/08/2011 08:08 AM, Avi Kivity wrote:

This is a mostly mindless conversion of all QEMU PCI devices to the memory API. After this patchset is applied, it is no longer possible to create a PCI device using the old API.

An immediate benefit is that PCI BARs that overlap each other are now handled correctly: currently, the sequence

  map BAR 0
  map BAR 1 at an overlapping address
  unmap either BAR 0 or BAR 1

will leave a hole where the overlap exists. With the patchset, the memory map is restored correctly.

Note that overlaps of PCI BARs with memory or non-PCI resources are still not resolved correctly; this will be fixed later on.

The vga patches have ugly intermediate states; however the result is fairly clean.

Applied all. Thanks for taking this on, the results are very nice!

Regards,
Anthony Liguori

Changes from v3:
 - dropped virtio-pci config patch; will be fixed outside this patchset if necessary
 - minor style fixes

Changes from v2:
 - added patch from Michael simplifying virtio-pci config setup

Changes from v1:
 - cmd646 type fix
 - folded a fixlet into its parent

Avi Kivity (39):
  memory: rename PORTIO_END to PORTIO_END_OF_LIST
  pci: add API to get a BAR's mapped address
  vmsvga: don't remember pci BAR address in callback any more
  vga: convert vga and its derivatives to the memory API
  cirrus: simplify mmio BAR access functions
  cirrus: simplify bitblt BAR access functions
  cirrus: simplify vga window mmio access functions
  vga: simplify vga window mmio access functions
  cirrus: simplify linear framebuffer access functions
  Integrate I/O memory regions into qemu
  pci: pass I/O address space to new PCI bus
  pci: allow I/O BARs to be registered with pci_register_bar_region()
  rtl8139: convert to memory API
  ac97: convert to memory API
  e1000: convert to memory API
  eepro100: convert to memory API
  es1370: convert to memory API
  ide: convert to memory API
  ivshmem: convert to memory API
  virtio-pci: convert to memory API
  ahci: convert to memory API
  intel-hda: convert to memory API
  lsi53c895a: convert to memory API
  ppc: convert to memory API
  ne2000: convert to memory API
  pcnet: convert to memory API
  i6300esb: convert to memory API
  isa-mmio: convert to memory API
  sun4u: convert to memory API
  ehci: convert to memory API
  uhci: convert to memory API
  xen-platform: convert to memory API
  msix: convert to memory API
  pci: remove pci_register_bar_simple()
  pci: convert pci rom to memory API
  pci: remove pci_register_bar()
  pci: fold BAR mapping function into its caller
  pci: rename pci_register_bar_region() to pci_register_bar()
  pci: remove support for pre memory API BARs

 exec-memory.h      |    5 +
 exec.c             |   10 ++
 hw/ac97.c          |   88 ++-
 hw/apb_pci.c       |    1 +
 hw/bonito.c        |    1 +
 hw/cirrus_vga.c    |  459 ---
 hw/cuda.c          |    6 +-
 hw/e1000.c         |  113 ++
 hw/eepro100.c      |  181 -
 hw/es1370.c        |   43 +++--
 hw/escc.c          |   42 +++---
 hw/escc.h          |    2 +-
 hw/grackle_pci.c   |    8 +-
 hw/gt64xxx.c       |    4 +-
 hw/heathrow_pic.c  |   29 ++--
 hw/ide.h           |    2 +-
 hw/ide/ahci.c      |   31 ++--
 hw/ide/ahci.h      |    2 +-
 hw/ide/cmd646.c    |  204 +++-
 hw/ide/ich.c       |    3 +-
 hw/ide/macio.c     |   36 +++--
 hw/ide/pci.c       |   25 ++--
 hw/ide/pci.h       |   19 ++-
 hw/ide/piix.c      |   63 ++--
 hw/ide/via.c       |   64 ++--
 hw/intel-hda.c     |   35 +++--
 hw/isa.h           |    2 +
 hw/isa_mmio.c      |   29 ++--
 hw/ivshmem.c       |  158 +++
 hw/lance.c         |   31 ++--
 hw/lsi53c895a.c    |  257 +++---
 hw/mac_dbdma.c     |   32 ++--
 hw/mac_dbdma.h     |    4 +-
 hw/mac_nvram.c     |   39 ++---
 hw/macio.c         |   73 -
 hw/msix.c          |   64 +++-
 hw/msix.h          |    6 +-
 hw/ne2000-isa.c    |   13 +--
 hw/ne2000.c        |   77 ++---
 hw/ne2000.h        |    8 +-
 hw/openpic.c       |   81 +-
 hw/openpic.h       |    2 +-
 hw/pc.h            |    4 +-
 hw/pc_piix.c       |    6 +-
 hw/pci.c           |  133 +---
 hw/pci.h           |   26 ++--
 hw/pci_internals.h |    3 +-
 hw/pcnet-pci.c     |   74 +
 hw/pcnet.h         |    4 +-
 hw/piix_pci.c      |   14 +-
 hw/ppc4xx_pci.c    |    1 +
 hw/ppc_mac.h       |   27 ++--
 hw/ppc_newworld.c  |   34 ++--
 hw/ppc_oldworld.c  |   27 ++--
 hw/ppc_prep.c      |    2 +-
 hw/ppce500_pci.c   |    7 +-
 hw/prep_pci.c      |    8 +-
 hw/prep_pci.h      |    4 +-
 hw/qxl-render.c    |    2 +-
 hw/qxl.c           |  129 ++--
 hw/qxl.h           |    6 +-
 hw/rtl8139.c       |   70
 hw/sh_pci.c        |    4 +-
 hw/sun4u.c         |   53
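The overlapping-BAR problem from the cover letter can be shown with a toy model. The sketch below is not QEMU code: the page table, the struct and all function names are invented for illustration. It contrasts writing each map/unmap straight into a flat table (which is roughly what the old API amounted to) with rebuilding the flat view from the set of mapped regions after every change, which is the idea behind the memory API.

```c
#include <stddef.h>

#define NPAGES     16
#define UNASSIGNED (-1)

/* Toy region: `len` pages starting at page `start`; must fit in NPAGES. */
struct region {
    int id;
    size_t start, len;
    int mapped;
};

static void flat_clear(int *table)
{
    for (size_t p = 0; p < NPAGES; p++) {
        table[p] = UNASSIGNED;
    }
}

/* Old style: a map overwrites whatever owned the pages before. */
static void flat_map(int *table, const struct region *r)
{
    for (size_t p = r->start; p < r->start + r->len; p++) {
        table[p] = r->id;
    }
}

/* Old style: an unmap marks the pages unassigned, destroying any overlap. */
static void flat_unmap(int *table, const struct region *r)
{
    for (size_t p = r->start; p < r->start + r->len; p++) {
        table[p] = UNASSIGNED;
    }
}

/* New style: recompute the whole view from the regions still mapped. */
static void render(int *table, const struct region *regions, size_t n)
{
    flat_clear(table);
    for (size_t i = 0; i < n; i++) {
        if (regions[i].mapped) {
            flat_map(table, &regions[i]);
        }
    }
}
```

With BAR 0 at pages 2..5 and BAR 1 at pages 4..7, flat_unmap() of BAR 1 leaves pages 4 and 5 unassigned even though BAR 0 still covers them, whereas clearing BAR 1's mapped flag and calling render() gives those pages back to BAR 0.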
[PATCH 1/3] memory: reclaim resources when a memory region is destroyed for good
Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |   24 ++++++++++++++++++++++++
 memory.h |    1 +
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/memory.c b/memory.c
index be891c6..5e3d966 100644
--- a/memory.c
+++ b/memory.c
@@ -661,6 +661,25 @@ void memory_region_transaction_commit(void)
     memory_region_update_topology();
 }
 
+static void memory_region_destructor_none(MemoryRegion *mr)
+{
+}
+
+static void memory_region_destructor_ram(MemoryRegion *mr)
+{
+    qemu_ram_free(mr->ram_addr);
+}
+
+static void memory_region_destructor_ram_from_ptr(MemoryRegion *mr)
+{
+    qemu_ram_free_from_ptr(mr->ram_addr);
+}
+
+static void memory_region_destructor_iomem(MemoryRegion *mr)
+{
+    cpu_unregister_io_memory(mr->ram_addr);
+}
+
 void memory_region_init(MemoryRegion *mr,
                         const char *name,
                         uint64_t size)
@@ -671,6 +690,7 @@ void memory_region_init(MemoryRegion *mr,
     mr->addr = 0;
     mr->offset = 0;
     mr->terminates = false;
+    mr->destructor = memory_region_destructor_none;
     mr->priority = 0;
     mr->may_overlap = false;
     mr->alias = NULL;
@@ -833,6 +853,7 @@ static void memory_region_prepare_ram_addr(MemoryRegion *mr)
         return;
     }
 
+    mr->destructor = memory_region_destructor_iomem;
     mr->ram_addr = cpu_register_io_memory(memory_region_read_thunk,
                                           memory_region_write_thunk,
                                           mr,
@@ -860,6 +881,7 @@ void memory_region_init_ram(MemoryRegion *mr,
 {
     memory_region_init(mr, name, size);
     mr->terminates = true;
+    mr->destructor = memory_region_destructor_ram;
     mr->ram_addr = qemu_ram_alloc(dev, name, size);
     mr->backend_registered = true;
 }
@@ -872,6 +894,7 @@ void memory_region_init_ram_ptr(MemoryRegion *mr,
 {
     memory_region_init(mr, name, size);
     mr->terminates = true;
+    mr->destructor = memory_region_destructor_ram_from_ptr;
     mr->ram_addr = qemu_ram_alloc_from_ptr(dev, name, size, ptr);
     mr->backend_registered = true;
 }
@@ -890,6 +913,7 @@ void memory_region_init_alias(MemoryRegion *mr,
 void memory_region_destroy(MemoryRegion *mr)
 {
     assert(QTAILQ_EMPTY(&mr->subregions));
+    mr->destructor(mr);
     memory_region_clear_coalescing(mr);
     qemu_free((char *)mr->name);
     qemu_free(mr->ioeventfds);
diff --git a/memory.h b/memory.h
index da00a3b..c9252a2 100644
--- a/memory.h
+++ b/memory.h
@@ -109,6 +109,7 @@ struct MemoryRegion {
     target_phys_addr_t addr;
     target_phys_addr_t offset;
     bool backend_registered;
+    void (*destructor)(MemoryRegion *mr);
     ram_addr_t ram_addr;
     IORange iorange;
     bool terminates;
-- 
1.7.5.3
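The pattern this patch uses is a per-object destructor pointer that each init variant sets, so the single destroy path does not need to know what kind of backing was allocated. A self-contained sketch of the same idea follows; Region, region_init_ram() and the freed_count counter are all hypothetical names used only for this example, and malloc()/free() stand in for QEMU's RAM allocator.

```c
#include <stdlib.h>

typedef struct Region Region;

struct Region {
    void *backing;                    /* NULL when nothing was allocated */
    void (*destructor)(Region *r);    /* chosen by the init variant */
};

static int freed_count;               /* only here to make the effect visible */

static void region_destructor_none(Region *r)
{
    (void)r;                          /* plain containers own no resources */
}

static void region_destructor_ram(Region *r)
{
    free(r->backing);
    r->backing = NULL;
    freed_count++;
}

static void region_init(Region *r)
{
    r->backing = NULL;
    r->destructor = region_destructor_none;
}

/* Returns 0 on success, -1 if the allocation failed. */
static int region_init_ram(Region *r, size_t size)
{
    region_init(r);
    r->backing = malloc(size);
    if (!r->backing) {
        return -1;
    }
    r->destructor = region_destructor_ram;
    return 0;
}

static void region_destroy(Region *r)
{
    r->destructor(r);
    r->destructor = region_destructor_none;   /* make a second destroy harmless */
}
```

Resetting the pointer in region_destroy() is a choice made for this sketch; the patch itself does not do that.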
[PATCH 2/3] memory: add API for creating ROM/device regions
ROM/device regions act as mapped RAM for reads, and as I/O memory for writes. This allows emulation of flash devices.

Signed-off-by: Avi Kivity a...@redhat.com
---
 memory.c |   46 ++++++++++++++++++++++++++++++++++++++++++++--
 memory.h |   34 ++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/memory.c b/memory.c
index 5e3d966..beff98c 100644
--- a/memory.c
+++ b/memory.c
@@ -125,6 +125,7 @@ struct FlatRange {
     target_phys_addr_t offset_in_region;
     AddrRange addr;
     uint8_t dirty_log_mask;
+    bool readable;
 };
 
 /* Flattened global view of current active memory hierarchy.  Kept in sorted
@@ -164,7 +165,8 @@ static bool flatrange_equal(FlatRange *a, FlatRange *b)
 {
     return a->mr == b->mr
         && addrrange_equal(a->addr, b->addr)
-        && a->offset_in_region == b->offset_in_region;
+        && a->offset_in_region == b->offset_in_region
+        && a->readable == b->readable;
 }
 
 static void flatview_init(FlatView *view)
@@ -200,7 +202,8 @@ static bool can_merge(FlatRange *r1, FlatRange *r2)
     return addrrange_end(r1->addr) == r2->addr.start
         && r1->mr == r2->mr
         && r1->offset_in_region + r1->addr.size == r2->offset_in_region
-        && r1->dirty_log_mask == r2->dirty_log_mask;
+        && r1->dirty_log_mask == r2->dirty_log_mask
+        && r1->readable == r2->readable;
 }
 
 /* Attempt to simplify a view by merging ajacent ranges */
@@ -241,6 +244,10 @@ static void as_memory_range_add(AddressSpace *as, FlatRange *fr)
         region_offset = 0;
     }
 
+    if (!fr->readable) {
+        phys_offset &= TARGET_PAGE_MASK;
+    }
+
     cpu_register_physical_memory_log(fr->addr.start,
                                      fr->addr.size,
                                      phys_offset,
@@ -462,6 +469,7 @@ static void render_memory_region(FlatView *view,
             fr.offset_in_region = offset_in_region;
             fr.addr = addrrange_make(base, now);
             fr.dirty_log_mask = mr->dirty_log_mask;
+            fr.readable = mr->readable;
             flatview_insert(view, i, &fr);
             ++i;
             base += now;
@@ -480,6 +488,7 @@ static void render_memory_region(FlatView *view,
         fr.offset_in_region = offset_in_region;
         fr.addr = addrrange_make(base, remain);
         fr.dirty_log_mask = mr->dirty_log_mask;
+        fr.readable = mr->readable;
         flatview_insert(view, i, &fr);
     }
 }
@@ -680,6 +689,12 @@ static void memory_region_destructor_iomem(MemoryRegion *mr)
     cpu_unregister_io_memory(mr->ram_addr);
 }
 
+static void memory_region_destructor_rom_device(MemoryRegion *mr)
+{
+    qemu_ram_free(mr->ram_addr & TARGET_PAGE_MASK);
+    cpu_unregister_io_memory(mr->ram_addr & ~(TARGET_PAGE_MASK | IO_MEM_ROMD));
+}
+
 void memory_region_init(MemoryRegion *mr,
                         const char *name,
                         uint64_t size)
@@ -690,6 +705,7 @@ void memory_region_init(MemoryRegion *mr,
     mr->addr = 0;
     mr->offset = 0;
     mr->terminates = false;
+    mr->readable = true;
     mr->destructor = memory_region_destructor_none;
     mr->priority = 0;
     mr->may_overlap = false;
@@ -910,6 +926,24 @@ void memory_region_init_alias(MemoryRegion *mr,
     mr->alias_offset = offset;
 }
 
+void memory_region_init_rom_device(MemoryRegion *mr,
+                                   const MemoryRegionOps *ops,
+                                   DeviceState *dev,
+                                   const char *name,
+                                   uint64_t size)
+{
+    memory_region_init(mr, name, size);
+    mr->terminates = true;
+    mr->destructor = memory_region_destructor_rom_device;
+    mr->ram_addr = qemu_ram_alloc(dev, name, size);
+    mr->ram_addr |= cpu_register_io_memory(memory_region_read_thunk,
+                                           memory_region_write_thunk,
+                                           mr,
+                                           mr->ops->endianness);
+    mr->ram_addr |= IO_MEM_ROMD;
+    mr->backend_registered = true;
+}
+
 void memory_region_destroy(MemoryRegion *mr)
 {
     assert(QTAILQ_EMPTY(&mr->subregions));
@@ -967,6 +1001,14 @@ void memory_region_set_readonly(MemoryRegion *mr, bool readonly)
     /* FIXME */
 }
 
+void memory_region_rom_device_set_readable(MemoryRegion *mr, bool readable)
+{
+    if (mr->readable != readable) {
+        mr->readable = readable;
+        memory_region_update_topology();
+    }
+}
+
 void memory_region_reset_dirty(MemoryRegion *mr, target_phys_addr_t addr,
                                target_phys_addr_t size, unsigned client)
 {
diff --git a/memory.h b/memory.h
index c9252a2..0553cc7 100644
--- a/memory.h
+++ b/memory.h
@@ -113,6 +113,7 @@ struct MemoryRegion {
     ram_addr_t ram_addr;
     IORange iorange;
     bool terminates;
+    bool readable;
     MemoryRegion *alias;
     target_phys_addr_t alias_offset;
     unsigned priority;
@@ -219,6 +220,25 @@ void memory_region_init_alias(MemoryRegion *mr,
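To make the ROM/device behaviour concrete, here is a toy model of a flash-like region: reads are served straight from backing storage while the region is readable, writes always go to a device callback, and the device can flip the region out of ROM mode. Everything here (RomDevice, the TOY_* command and ID values, the function names) is hypothetical and only meant to illustrate the semantics the commit message describes, not how QEMU implements them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CMD_READ_ID    0x90   /* hypothetical: leave ROM mode */
#define TOY_CMD_READ_ARRAY 0xff   /* hypothetical: return to ROM mode */
#define TOY_ID_VALUE       0x42   /* hypothetical ID byte */

typedef struct RomDevice {
    const uint8_t *rom;   /* backing storage; callers keep addr < size */
    size_t size;
    bool readable;        /* true: reads hit rom[]; false: reads hit the device */
} RomDevice;

/* Device-side read, used only while the region is not readable. */
static uint8_t toy_io_read(RomDevice *d, size_t addr)
{
    (void)d;
    (void)addr;
    return TOY_ID_VALUE;
}

/* Device-side write: interprets the value as a command. */
static void toy_io_write(RomDevice *d, size_t addr, uint8_t val)
{
    (void)addr;
    if (val == TOY_CMD_READ_ID) {
        d->readable = false;
    } else if (val == TOY_CMD_READ_ARRAY) {
        d->readable = true;
    }
}

static uint8_t romd_read(RomDevice *d, size_t addr)
{
    return d->readable ? d->rom[addr] : toy_io_read(d, addr);
}

static void romd_write(RomDevice *d, size_t addr, uint8_t val)
{
    toy_io_write(d, addr, val);   /* writes never modify rom[] directly */
}
```

The `readable` flag plays the role that memory_region_rom_device_set_readable() toggles in the patch: flipping it changes where reads are routed without touching the stored contents.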
[PATCH 3/3] memory: correct documentation typos
Noted by Drew Jones.

Signed-off-by: Avi Kivity a...@redhat.com
---
 docs/memory.txt |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/memory.txt b/docs/memory.txt
index 4460c06..3fc1683 100644
--- a/docs/memory.txt
+++ b/docs/memory.txt
@@ -15,7 +15,7 @@ The memory model provides support for
  - setting up coalesced memory for kvm
  - setting up ioeventfd regions for kvm
 
-Memory is modelled as an tree (really acyclic graph) of MemoryRegion objects.
+Memory is modelled as a tree (really acyclic graph) of MemoryRegion objects.
 The root of the tree is memory as seen from the CPU's viewpoint (the system
 bus).  Nodes in the tree represent other buses, memory controllers, and
 memory regions that have been rerouted.  Leaves are RAM and MMIO regions.
@@ -87,7 +87,7 @@ guest accesses an address:
   descending priority order
   - if the address lies outside the region offset/size, the subregion is
     discarded
-  - if the subregion is a leaf (RAM or MMIO), the seach terminates
+  - if the subregion is a leaf (RAM or MMIO), the search terminates
   - if the subregion is a container, the same algorithm is used within the
     subregion (after the address is adjusted by the subregion offset)
   - if the subregion is an alias, the search is continues at the alias target
@@ -128,7 +128,7 @@ so-called PCI hole, that allows a 32-bit PCI bus to exist in a system with
 4GB of memory.
 
 The memory controller diverts addresses in the range 640K-768K to the PCI
-address space.  This is modeled using the "vga-window" alias, mapped at a
+address space.  This is modelled using the "vga-window" alias, mapped at a
 higher priority so it obscures the RAM at the same addresses.  The vga window
 can be removed by programming the memory controller; this is modelled
 by removing the alias and exposing the RAM underneath.
@@ -164,7 +164,7 @@ various constraints can be supplied to control how these callbacks are called:
  - .impl.min_access_size, .impl.max_access_size define the access sizes
    (in bytes) supported by the *implementation*; other access sizes will be
    emulated using the ones available.  For example a 4-byte write will be
-   emulated using four 1-byte write, is .impl.max_access_size = 1.
+   emulated using four 1-byte write, if .impl.max_access_size = 1.
  - .impl.valid specifies that the *implementation* only supports unaligned
    accesses; unaligned accesses will be emulated by two aligned accesses.
  - .old_portio and .old_mmio can be used to ease porting from code using
-- 
1.7.5.3
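The last hunk touches the sentence about emulating a wide access with narrower ones. A minimal sketch of that splitting is below. It is not the QEMU implementation: write_fn, split_write() and the recorder are invented names, the sizes are assumed to be powers of two no larger than 8, and little-endian byte order is assumed for how the value is carved up.

```c
#include <stdint.h>

typedef void (*write_fn)(void *opaque, uint64_t addr, uint64_t val,
                         unsigned size);

/*
 * Emulate a `size`-byte write using accesses of at most `max_access`
 * bytes. Assumes both are powers of two in 1..8 and that the value is
 * laid out little-endian, so the lowest-addressed chunk gets the
 * least significant bytes.
 */
static void split_write(write_fn fn, void *opaque, uint64_t addr,
                        uint64_t val, unsigned size, unsigned max_access)
{
    unsigned step = size < max_access ? size : max_access;
    uint64_t mask = step >= 8 ? ~0ULL : ((1ULL << (step * 8)) - 1);

    for (unsigned off = 0; off < size; off += step) {
        fn(opaque, addr + off, (val >> (off * 8)) & mask, step);
    }
}

/* Test double: stores bytes and counts how many callbacks were made. */
struct recorder {
    uint8_t bytes[8];
    unsigned calls;
};

static void record_write(void *opaque, uint64_t addr, uint64_t val,
                         unsigned size)
{
    struct recorder *r = opaque;

    for (unsigned i = 0; i < size; i++) {
        r->bytes[addr + i] = (uint8_t)(val >> (i * 8));
    }
    r->calls++;
}
```

With max_access = 1, a 4-byte write of 0x11223344 turns into four 1-byte callbacks carrying 0x44, 0x33, 0x22 and 0x11 at consecutive addresses, which matches the example in the documentation text.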
[PATCH 0/3] Memory API updates
The following patches fix a resource leak, add a ROM/device API (for flash devices which act like memory when read, and as an mmio device when written), and correct typos in the documentation.

Avi Kivity (3):
  memory: reclaim resources when a memory region is destroyed for good
  memory: add API for creating ROM/device regions
  memory: correct documentation typos

 docs/memory.txt |    8 +++---
 memory.c        |   70 +-
 memory.h        |   35 +++
 3 files changed, 107 insertions(+), 6 deletions(-)

-- 
1.7.5.3
[PATCH 16/24] sh_pci: convert to memory API
Signed-off-by: Avi Kivity a...@redhat.com --- hw/sh_pci.c | 63 +++--- 1 files changed, 42 insertions(+), 21 deletions(-) diff --git a/hw/sh_pci.c b/hw/sh_pci.c index cd86501..76061bb 100644 --- a/hw/sh_pci.c +++ b/hw/sh_pci.c @@ -33,13 +33,16 @@ typedef struct SHPCIState { PCIBus *bus; PCIDevice *dev; qemu_irq irq[4]; -int memconfig; +MemoryRegion memconfig_p4; +MemoryRegion memconfig_a7; +MemoryRegion isa; uint32_t par; uint32_t mbr; uint32_t iobr; } SHPCIState; -static void sh_pci_reg_write (void *p, target_phys_addr_t addr, uint32_t val) +static void sh_pci_reg_write (void *p, target_phys_addr_t addr, uint64_t val, + unsigned size) { SHPCIState *pcic = p; switch(addr) { @@ -54,10 +57,10 @@ static void sh_pci_reg_write (void *p, target_phys_addr_t addr, uint32_t val) break; case 0x1c8: if ((val 0xfffc) != (pcic-iobr 0xfffc)) { -cpu_register_physical_memory(pcic-iobr 0xfffc, 0x4, - IO_MEM_UNASSIGNED); +memory_region_del_subregion(get_system_memory(), pcic-isa); pcic-iobr = val 0xfffc0001; -isa_mmio_init(pcic-iobr 0xfffc, 0x4); +memory_region_add_subregion(get_system_memory(), +pcic-iobr 0xfffc, pcic-isa); } break; case 0x220: @@ -66,7 +69,8 @@ static void sh_pci_reg_write (void *p, target_phys_addr_t addr, uint32_t val) } } -static uint32_t sh_pci_reg_read (void *p, target_phys_addr_t addr) +static uint64_t sh_pci_reg_read (void *p, target_phys_addr_t addr, + unsigned size) { SHPCIState *pcic = p; switch(addr) { @@ -84,14 +88,14 @@ static uint32_t sh_pci_reg_read (void *p, target_phys_addr_t addr) return 0; } -typedef struct { -CPUReadMemoryFunc * const r[3]; -CPUWriteMemoryFunc * const w[3]; -} MemOp; - -static MemOp sh_pci_reg = { -{ NULL, NULL, sh_pci_reg_read }, -{ NULL, NULL, sh_pci_reg_write }, +static const MemoryRegionOps sh_pci_reg_ops = { +.read = sh_pci_reg_read, +.write = sh_pci_reg_write, +.endianness = DEVICE_NATIVE_ENDIAN, +.valid = { +.min_access_size = 4, +.max_access_size = 4, +}, }; static int sh_pci_map_irq(PCIDevice *d, int irq_num) @@ -110,11 
+114,23 @@ static void sh_pci_map(SysBusDevice *dev, target_phys_addr_t base) { SHPCIState *s = FROM_SYSBUS(SHPCIState, dev); -cpu_register_physical_memory(P4ADDR(base), 0x224, s-memconfig); -cpu_register_physical_memory(A7ADDR(base), 0x224, s-memconfig); - +memory_region_add_subregion(get_system_memory(), +P4ADDR(base), +s-memconfig_p4); +memory_region_add_subregion(get_system_memory(), +A7ADDR(base), +s-memconfig_a7); s-iobr = 0xfe24; -isa_mmio_init(s-iobr, 0x4); +memory_region_add_subregion(get_system_memory(), s-iobr, s-isa); +} + +static void sh_pci_unmap(SysBusDevice *dev, target_phys_addr_t base) +{ +SHPCIState *s = FROM_SYSBUS(SHPCIState, dev); + +memory_region_del_subregion(get_system_memory(), s-memconfig_p4); +memory_region_del_subregion(get_system_memory(), s-memconfig_a7); +memory_region_del_subregion(get_system_memory(), s-isa); } static int sh_pci_init_device(SysBusDevice *dev) @@ -132,9 +148,14 @@ static int sh_pci_init_device(SysBusDevice *dev) get_system_memory(), get_system_io(), PCI_DEVFN(0, 0), 4); -s-memconfig = cpu_register_io_memory(sh_pci_reg.r, sh_pci_reg.w, - s, DEVICE_NATIVE_ENDIAN); -sysbus_init_mmio_cb(dev, 0x224, sh_pci_map); +memory_region_init_io(s-memconfig_p4, sh_pci_reg_ops, s, + sh_pci, 0x224); +memory_region_init_alias(s-memconfig_a7, sh_pci.2, s-memconfig_a7, + 0, 0x224); +isa_mmio_setup(s-isa, 0x4); +sysbus_init_mmio_cb2(dev, sh_pci_map, sh_pci_unmap); +sysbus_init_mmio_region(dev, s-memconfig_a7); +sysbus_init_mmio_region(dev, s-isa); s-dev = pci_create_simple(s-bus, PCI_DEVFN(0, 0), sh_pci_host); return 0; } -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/24] tusb6010: move declarations to new file tusb6010.h
Avoid #include hell.

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/devices.h  |    7 -------
 hw/nseries.c  |    1 +
 hw/tusb6010.c |    2 +-
 hw/tusb6010.h |   10 ++++++++++
 4 files changed, 12 insertions(+), 8 deletions(-)
 create mode 100644 hw/tusb6010.h

diff --git a/hw/devices.h b/hw/devices.h
index c788373..07fda83 100644
--- a/hw/devices.h
+++ b/hw/devices.h
@@ -47,13 +47,6 @@ void *tahvo_init(qemu_irq irq, int betty);
 
 void retu_key_event(void *retu, int state);
 
-/* tusb6010.c */
-typedef struct TUSBState TUSBState;
-TUSBState *tusb6010_init(qemu_irq intr);
-int tusb6010_sync_io(TUSBState *s);
-int tusb6010_async_io(TUSBState *s);
-void tusb6010_power(TUSBState *s, int on);
-
 /* tc6393xb.c */
 typedef struct TC6393xbState TC6393xbState;
 #define TC6393XB_RAM 0x110000 /* amount of ram for Video and USB */
diff --git a/hw/nseries.c b/hw/nseries.c
index 6a5575e..5521f28 100644
--- a/hw/nseries.c
+++ b/hw/nseries.c
@@ -32,6 +32,7 @@
 #include "bt.h"
 #include "loader.h"
 #include "blockdev.h"
+#include "tusb6010.h"
 
 /* Nokia N8x0 support */
 struct n800_s {
diff --git a/hw/tusb6010.c b/hw/tusb6010.c
index ccd01ad..add748c 100644
--- a/hw/tusb6010.c
+++ b/hw/tusb6010.c
@@ -23,7 +23,7 @@
 #include "usb.h"
 #include "omap.h"
 #include "irq.h"
-#include "devices.h"
+#include "tusb6010.h"
 
 struct TUSBState {
     int iomemtype[2];
diff --git a/hw/tusb6010.h b/hw/tusb6010.h
new file mode 100644
index 0000000..6faa94d
--- /dev/null
+++ b/hw/tusb6010.h
@@ -0,0 +1,10 @@
+#ifndef TUSB6010_H
+#define TUSB6010_H
+
+typedef struct TUSBState TUSBState;
+TUSBState *tusb6010_init(qemu_irq intr);
+int tusb6010_sync_io(TUSBState *s);
+int tusb6010_async_io(TUSBState *s);
+void tusb6010_power(TUSBState *s, int on);
+
+#endif
-- 
1.7.5.3
[PATCH 15/24] sysbus: add a variant of sysbus_init_mmio_cb with an unmap callback
sysbus_init_mmio_cb() uses the destructive IO_MEM_UNASSIGNED to remove a region. Provide an alternative that calls an unmap callback, so the removal may be done non-destructively.

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/sysbus.c |   15 +++++++++++++++
 hw/sysbus.h |    3 +++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/hw/sysbus.c b/hw/sysbus.c
index ea442ac..64749e0 100644
--- a/hw/sysbus.c
+++ b/hw/sysbus.c
@@ -53,6 +53,8 @@ void sysbus_mmio_map(SysBusDevice *dev, int n, target_phys_addr_t addr)
         if (dev->mmio[n].memory) {
             memory_region_del_subregion(get_system_memory(),
                                         dev->mmio[n].memory);
+        } else if (dev->mmio[n].unmap) {
+            dev->mmio[n].unmap(dev, dev->mmio[n].addr);
         } else {
             cpu_register_physical_memory(dev->mmio[n].addr, dev->mmio[n].size,
                                          IO_MEM_UNASSIGNED);
@@ -117,6 +119,19 @@ void sysbus_init_mmio_cb(SysBusDevice *dev, target_phys_addr_t size,
     dev->mmio[n].cb = cb;
 }
 
+void sysbus_init_mmio_cb2(SysBusDevice *dev,
+                          mmio_mapfunc cb, mmio_mapfunc unmap)
+{
+    int n;
+
+    assert(dev->num_mmio < QDEV_MAX_MMIO);
+    n = dev->num_mmio++;
+    dev->mmio[n].addr = -1;
+    dev->mmio[n].size = 0;
+    dev->mmio[n].cb = cb;
+    dev->mmio[n].unmap = unmap;
+}
+
 void sysbus_init_mmio_region(SysBusDevice *dev, MemoryRegion *memory)
 {
     int n;
diff --git a/hw/sysbus.h b/hw/sysbus.h
index 5f62e2d..16fd969 100644
--- a/hw/sysbus.h
+++ b/hw/sysbus.h
@@ -23,6 +23,7 @@ struct SysBusDevice {
         target_phys_addr_t addr;
         target_phys_addr_t size;
         mmio_mapfunc cb;
+        mmio_mapfunc unmap;
         ram_addr_t iofunc;
         MemoryRegion *memory;
     } mmio[QDEV_MAX_MMIO];
@@ -48,6 +49,8 @@ void sysbus_init_mmio(SysBusDevice *dev, target_phys_addr_t size,
                       ram_addr_t iofunc);
 void sysbus_init_mmio_cb(SysBusDevice *dev, target_phys_addr_t size,
                          mmio_mapfunc cb);
+void sysbus_init_mmio_cb2(SysBusDevice *dev,
+                          mmio_mapfunc cb, mmio_mapfunc unmap);
 void sysbus_init_mmio_region(SysBusDevice *dev, MemoryRegion *memory);
 void sysbus_init_irq(SysBusDevice *dev, qemu_irq *p);
 void sysbus_pass_irq(SysBusDevice *dev, SysBusDevice *target);
-- 
1.7.5.3
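The map/unmap pairing described in the commit message above can be sketched outside QEMU. This is a minimal stand-alone mock, not the sysbus code: the types `bus_device` and `mmio_slot`, the constant `ADDR_UNMAPPED`, and the functions `bus_init_mmio_cb2()`, `bus_mmio_map()`, `demo_map()` and `demo_unmap()` are all hypothetical names made up for illustration. It only shows the control flow: when a slot moves, the device's own unmap callback undoes the old mapping before the map callback installs the new one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's types; not the real definitions. */
typedef uint64_t phys_addr_t;
#define MAX_MMIO 4
#define ADDR_UNMAPPED ((phys_addr_t)-1)

struct bus_device;
typedef void (*mapfunc)(struct bus_device *dev, phys_addr_t addr);

struct mmio_slot {
    phys_addr_t addr;   /* current base address, or ADDR_UNMAPPED */
    mapfunc cb;         /* maps the slot at a new base */
    mapfunc unmap;      /* undoes the mapping at the old base */
};

struct bus_device {
    int num_mmio;
    struct mmio_slot mmio[MAX_MMIO];
    /* Bookkeeping used only by the demo callbacks below. */
    int mapped_regions;
    int unmap_calls;
    phys_addr_t last_base;
};

/* Register a slot with both callbacks (same idea as sysbus_init_mmio_cb2). */
static int bus_init_mmio_cb2(struct bus_device *dev, mapfunc cb, mapfunc unmap)
{
    int n;

    assert(dev->num_mmio < MAX_MMIO);
    n = dev->num_mmio++;
    dev->mmio[n].addr = ADDR_UNMAPPED;
    dev->mmio[n].cb = cb;
    dev->mmio[n].unmap = unmap;
    return n;
}

/* Move slot n to addr, undoing the previous mapping first if there is one. */
static void bus_mmio_map(struct bus_device *dev, int n, phys_addr_t addr)
{
    struct mmio_slot *slot = &dev->mmio[n];

    if (slot->addr == addr) {
        return;                         /* already mapped there */
    }
    if (slot->addr != ADDR_UNMAPPED && slot->unmap) {
        slot->unmap(dev, slot->addr);   /* non-destructive removal */
    }
    slot->addr = addr;
    if (slot->cb) {
        slot->cb(dev, addr);
    }
}

/* Demo callbacks: a real device would add/delete its subregions here. */
static void demo_map(struct bus_device *dev, phys_addr_t base)
{
    dev->mapped_regions++;
    dev->last_base = base;
}

static void demo_unmap(struct bus_device *dev, phys_addr_t base)
{
    (void)base;
    dev->mapped_regions--;
    dev->unmap_calls++;
}
```

Mapping the same slot twice at different addresses leaves exactly one region mapped, because the second call runs `demo_unmap()` before `demo_map()`.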
[PATCH 03/24] arm_gic: convert to memory API
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/arm_gic.c      |   22 --
 hw/armv7m_nvic.c  |    3 ++-
 hw/mpcore.c       |   37 +
 hw/realview_gic.c |   38 +
 4 files changed, 44 insertions(+), 56 deletions(-)

diff --git a/hw/arm_gic.c b/hw/arm_gic.c
index fb07314..83213dd 100644
--- a/hw/arm_gic.c
+++ b/hw/arm_gic.c
@@ -104,7 +104,7 @@ typedef struct gic_state
     int num_cpu;
 #endif
 
-    int iomemtype;
+    MemoryRegion iomem;
 } gic_state;
 
 /* TODO: Many places that call this routine could be optimized.  */
@@ -567,16 +567,12 @@ static void gic_dist_writel(void *opaque, target_phys_addr_t offset,
     gic_dist_writew(opaque, offset + 2, value >> 16);
 }
 
-static CPUReadMemoryFunc * const gic_dist_readfn[] = {
-   gic_dist_readb,
-   gic_dist_readw,
-   gic_dist_readl
-};
-
-static CPUWriteMemoryFunc * const gic_dist_writefn[] = {
-   gic_dist_writeb,
-   gic_dist_writew,
-   gic_dist_writel
+static const MemoryRegionOps gic_dist_ops = {
+    .old_mmio = {
+        .read = { gic_dist_readb, gic_dist_readw, gic_dist_readl, },
+        .write = { gic_dist_writeb, gic_dist_writew, gic_dist_writel, },
+    },
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 #ifndef NVIC
@@ -741,9 +737,7 @@ static void gic_init(gic_state *s)
     for (i = 0; i < NUM_CPU(s); i++) {
         sysbus_init_irq(&s->busdev, &s->parent_irq[i]);
     }
-    s->iomemtype = cpu_register_io_memory(gic_dist_readfn,
-                                          gic_dist_writefn, s,
-                                          DEVICE_NATIVE_ENDIAN);
+    memory_region_init_io(&s->iomem, &gic_dist_ops, s, "gic_dist", 0x1000);
     gic_reset(s);
     register_savevm(NULL, "arm_gic", -1, 1, gic_save, gic_load, s);
 }
diff --git a/hw/armv7m_nvic.c b/hw/armv7m_nvic.c
index 1df8d4d..bf8c3c5 100644
--- a/hw/armv7m_nvic.c
+++ b/hw/armv7m_nvic.c
@@ -13,6 +13,7 @@
 #include "sysbus.h"
 #include "qemu-timer.h"
 #include "arm-misc.h"
+#include "exec-memory.h"
 
 /* 32 internal lines (16 used for system exceptions) plus 64 external
    interrupt lines.  */
@@ -384,7 +385,7 @@ static int armv7m_nvic_init(SysBusDevice *dev)
     nvic_state *s= FROM_SYSBUSGIC(nvic_state, dev);
 
     gic_init(&s->gic);
-    cpu_register_physical_memory(0xe000e000, 0x1000, s->gic.iomemtype);
+    memory_region_add_subregion(get_system_memory(), 0xe000e000, &s->gic.iomem);
     s->systick.timer = qemu_new_timer_ns(vm_clock, systick_timer_tick, s);
     vmstate_register(&dev->qdev, -1, &vmstate_nvic, s);
     return 0;
diff --git a/hw/mpcore.c b/hw/mpcore.c
index d778507..d6175cf 100644
--- a/hw/mpcore.c
+++ b/hw/mpcore.c
@@ -40,6 +40,8 @@ typedef struct mpcore_priv_state {
     int iomemtype;
     mpcore_timer_state timer[8];
     uint32_t num_cpu;
+    MemoryRegion iomem;
+    MemoryRegion container;
 } mpcore_priv_state;
 
 /* Per-CPU Timers.  */
@@ -151,7 +153,8 @@ static void mpcore_timer_init(mpcore_priv_state *mpcore,
 
 /* Per-CPU private memory mapped IO.  */
 
-static uint32_t mpcore_priv_read(void *opaque, target_phys_addr_t offset)
+static uint64_t mpcore_priv_read(void *opaque, target_phys_addr_t offset,
+                                 unsigned size)
 {
     mpcore_priv_state *s = (mpcore_priv_state *)opaque;
     int id;
@@ -203,7 +206,7 @@ bad_reg:
 }
 
 static void mpcore_priv_write(void *opaque, target_phys_addr_t offset,
-                              uint32_t value)
+                              uint64_t value, unsigned size)
 {
     mpcore_priv_state *s = (mpcore_priv_state *)opaque;
     int id;
@@ -250,23 +253,19 @@ bad_reg:
     hw_error("mpcore_priv_read: Bad offset %x\n", (int)offset);
 }
 
-static CPUReadMemoryFunc * const mpcore_priv_readfn[] = {
-   mpcore_priv_read,
-   mpcore_priv_read,
-   mpcore_priv_read
+static const MemoryRegionOps mpcore_priv_ops = {
+    .read = mpcore_priv_read,
+    .write = mpcore_priv_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static CPUWriteMemoryFunc * const mpcore_priv_writefn[] = {
-   mpcore_priv_write,
-   mpcore_priv_write,
-   mpcore_priv_write
-};
-
-static void mpcore_priv_map(SysBusDevice *dev, target_phys_addr_t base)
+static void mpcore_priv_map_setup(mpcore_priv_state *s)
 {
-    mpcore_priv_state *s = FROM_SYSBUSGIC(mpcore_priv_state, dev);
-    cpu_register_physical_memory(base, 0x1000, s->iomemtype);
-    cpu_register_physical_memory(base + 0x1000, 0x1000, s->gic.iomemtype);
+    memory_region_init(&s->container, "mpcode-priv-container", 0x2000);
+    memory_region_init_io(&s->iomem, &mpcore_priv_ops, s, "mpcode-priv",
+                          0x1000);
+    memory_region_add_subregion(&s->container, 0, &s->iomem);
+    memory_region_add_subregion(&s->container, 0x1000, &s->gic.iomem);
 }
 
 static int mpcore_priv_init(SysBusDevice *dev)
@@ -275,10 +274,8 @@ static int mpcore_priv_init(SysBusDevice *dev)
     int i;
 
     gic_init(&s->gic, s->num_cpu);
-    s->iomemtype = cpu_register_io_memory(mpcore_priv_readfn,
-
[PATCH 14/24] stellaris_enet: convert to memory API
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/stellaris_enet.c |   29 -
 1 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/hw/stellaris_enet.c b/hw/stellaris_enet.c
index 1291931..9f1f37a 100644
--- a/hw/stellaris_enet.c
+++ b/hw/stellaris_enet.c
@@ -69,7 +69,7 @@ typedef struct {
     NICState *nic;
     NICConf conf;
     qemu_irq irq;
-    int mmio_index;
+    MemoryRegion mmio;
 } stellaris_enet_state;
 
 static void stellaris_enet_update(stellaris_enet_state *s)
@@ -130,7 +130,8 @@ static int stellaris_enet_can_receive(VLANClientState *nc)
     return (s->np < 31);
 }
 
-static uint32_t stellaris_enet_read(void *opaque, target_phys_addr_t offset)
+static uint64_t stellaris_enet_read(void *opaque, target_phys_addr_t offset,
+                                    unsigned size)
 {
     stellaris_enet_state *s = (stellaris_enet_state *)opaque;
     uint32_t val;
@@ -198,7 +199,7 @@ static uint32_t stellaris_enet_read(void *opaque, target_phys_addr_t offset)
 }
 
 static void stellaris_enet_write(void *opaque, target_phys_addr_t offset,
-                                 uint32_t value)
+                                 uint64_t value, unsigned size)
 {
     stellaris_enet_state *s = (stellaris_enet_state *)opaque;
 
@@ -303,17 +304,12 @@ static void stellaris_enet_write(void *opaque, target_phys_addr_t offset,
     }
 }
 
-static CPUReadMemoryFunc * const stellaris_enet_readfn[] = {
-   stellaris_enet_read,
-   stellaris_enet_read,
-   stellaris_enet_read
+static const MemoryRegionOps stellaris_enet_ops = {
+    .read = stellaris_enet_read,
+    .write = stellaris_enet_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static CPUWriteMemoryFunc * const stellaris_enet_writefn[] = {
-   stellaris_enet_write,
-   stellaris_enet_write,
-   stellaris_enet_write
-};
 static void stellaris_enet_reset(stellaris_enet_state *s)
 {
     s->mdv = 0x80;
@@ -391,7 +387,7 @@ static void stellaris_enet_cleanup(VLANClientState *nc)
 
     unregister_savevm(&s->busdev.qdev, "stellaris_enet", s);
 
-    cpu_unregister_io_memory(s->mmio_index);
+    memory_region_destroy(&s->mmio);
 
     qemu_free(s);
 }
@@ -408,10 +404,9 @@ static int stellaris_enet_init(SysBusDevice *dev)
 {
     stellaris_enet_state *s = FROM_SYSBUS(stellaris_enet_state, dev);
 
-    s->mmio_index = cpu_register_io_memory(stellaris_enet_readfn,
-                                           stellaris_enet_writefn, s,
-                                           DEVICE_NATIVE_ENDIAN);
-    sysbus_init_mmio(dev, 0x1000, s->mmio_index);
+    memory_region_init_io(&s->mmio, &stellaris_enet_ops, s, "stellaris_enet",
+                          0x1000);
+    sysbus_init_mmio_region(dev, &s->mmio);
     sysbus_init_irq(dev, &s->irq);
     qemu_macaddr_default_if_unset(&s->conf.macaddr);
 
-- 
1.7.5.3
[PATCH 04/24] arm_sysctl: convert to memory API
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/arm_sysctl.c |   27 ++-
 1 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/hw/arm_sysctl.c b/hw/arm_sysctl.c
index fd0c8bc..1838401 100644
--- a/hw/arm_sysctl.c
+++ b/hw/arm_sysctl.c
@@ -17,6 +17,7 @@
 
 typedef struct {
     SysBusDevice busdev;
+    MemoryRegion iomem;
     uint32_t sys_id;
     uint32_t leds;
     uint16_t lockval;
@@ -80,7 +81,8 @@ static void arm_sysctl_reset(DeviceState *d)
     s->resetlevel = 0;
 }
 
-static uint32_t arm_sysctl_read(void *opaque, target_phys_addr_t offset)
+static uint64_t arm_sysctl_read(void *opaque, target_phys_addr_t offset,
+                                unsigned size)
 {
     arm_sysctl_state *s = (arm_sysctl_state *)opaque;
 
@@ -177,7 +179,7 @@ static uint32_t arm_sysctl_read(void *opaque, target_phys_addr_t offset)
 }
 
 static void arm_sysctl_write(void *opaque, target_phys_addr_t offset,
-                          uint32_t val)
+                             uint64_t val, unsigned size)
 {
     arm_sysctl_state *s = (arm_sysctl_state *)opaque;
 
@@ -284,16 +286,10 @@ static void arm_sysctl_write(void *opaque, target_phys_addr_t offset,
     }
 }
 
-static CPUReadMemoryFunc * const arm_sysctl_readfn[] = {
-   arm_sysctl_read,
-   arm_sysctl_read,
-   arm_sysctl_read
-};
-
-static CPUWriteMemoryFunc * const arm_sysctl_writefn[] = {
-   arm_sysctl_write,
-   arm_sysctl_write,
-   arm_sysctl_write
+static const MemoryRegionOps arm_sysctl_ops = {
+    .read = arm_sysctl_read,
+    .write = arm_sysctl_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static void arm_sysctl_gpio_set(void *opaque, int line, int level)
@@ -327,12 +323,9 @@ static void arm_sysctl_gpio_set(void *opaque, int line, int level)
 static int arm_sysctl_init1(SysBusDevice *dev)
 {
     arm_sysctl_state *s = FROM_SYSBUS(arm_sysctl_state, dev);
-    int iomemtype;
 
-    iomemtype = cpu_register_io_memory(arm_sysctl_readfn,
-                                       arm_sysctl_writefn, s,
-                                       DEVICE_NATIVE_ENDIAN);
-    sysbus_init_mmio(dev, 0x1000, iomemtype);
+    memory_region_init_io(&s->iomem, &arm_sysctl_ops, s, "arm-sysctl", 0x1000);
+    sysbus_init_mmio_region(dev, &s->iomem);
     qdev_init_gpio_in(&s->busdev.qdev, arm_sysctl_gpio_set, 2);
     /* ??? Save/restore.  */
     return 0;
-- 
1.7.5.3
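The conversion above replaces two three-entry function tables (byte, word, long) with one read and one write callback that receive the access size as a parameter. Here is a minimal stand-alone sketch of that shape, assuming nothing from QEMU: `region_ops`, `demo_dev`, `size_mask()`, `demo_read()` and `demo_write()` are hypothetical names, and masking the value by access size is a choice made for this example rather than something the patch itself does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ops table: one callback per direction, size passed in. */
typedef uint64_t (*region_read_fn)(void *opaque, uint64_t offset,
                                   unsigned size);
typedef void (*region_write_fn)(void *opaque, uint64_t offset,
                                uint64_t value, unsigned size);

struct region_ops {
    region_read_fn read;
    region_write_fn write;
};

/* Hypothetical device with a single 32-bit register at offset 0x08. */
struct demo_dev {
    uint32_t leds;
};

/* All-ones mask covering `size` bytes (size is 1, 2, 4 or 8 in practice). */
static uint64_t size_mask(unsigned size)
{
    return size >= 8 ? UINT64_MAX : ((UINT64_C(1) << (size * 8)) - 1);
}

static uint64_t demo_read(void *opaque, uint64_t offset, unsigned size)
{
    struct demo_dev *s = opaque;

    switch (offset) {
    case 0x08:
        return s->leds & size_mask(size);
    default:
        return 0;   /* unimplemented registers read as zero in this mock */
    }
}

static void demo_write(void *opaque, uint64_t offset, uint64_t value,
                       unsigned size)
{
    struct demo_dev *s = opaque;

    switch (offset) {
    case 0x08:
        s->leds = (uint32_t)(value & size_mask(size));
        break;
    default:
        break;      /* writes to unknown offsets are ignored in this mock */
    }
}

/* A single table serves every access width. */
static const struct region_ops demo_ops = {
    .read = demo_read,
    .write = demo_write,
};
```

With the size available inside the callback, a device that treats all widths alike no longer has to list the same function three times, and one that cares about width can branch on `size` in one place.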
[PATCH 18/24] versatile_pci: convert to memory API
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/versatile_pci.c |   94 ---
 1 files changed, 44 insertions(+), 50 deletions(-)

diff --git a/hw/versatile_pci.c b/hw/versatile_pci.c
index e1d5c0b..43edf77 100644
--- a/hw/versatile_pci.c
+++ b/hw/versatile_pci.c
@@ -16,7 +16,9 @@ typedef struct {
     SysBusDevice busdev;
     qemu_irq irq[4];
     int realview;
-    int mem_config;
+    MemoryRegion mem_config;
+    MemoryRegion mem_config2;
+    MemoryRegion isa;
 } PCIVPBState;
 
 static inline uint32_t vpb_pci_config_addr(target_phys_addr_t addr)
@@ -24,55 +26,24 @@ static inline uint32_t vpb_pci_config_addr(target_phys_addr_t addr)
     return addr & 0xff;
 }
 
-static void pci_vpb_config_writeb (void *opaque, target_phys_addr_t addr,
-                                   uint32_t val)
+static void pci_vpb_config_write(void *opaque, target_phys_addr_t addr,
+                                 uint64_t val, unsigned size)
 {
-    pci_data_write(opaque, vpb_pci_config_addr (addr), val, 1);
+    pci_data_write(opaque, vpb_pci_config_addr(addr), val, size);
 }
 
-static void pci_vpb_config_writew (void *opaque, target_phys_addr_t addr,
-                                   uint32_t val)
-{
-    pci_data_write(opaque, vpb_pci_config_addr (addr), val, 2);
-}
-
-static void pci_vpb_config_writel (void *opaque, target_phys_addr_t addr,
-                                   uint32_t val)
-{
-    pci_data_write(opaque, vpb_pci_config_addr (addr), val, 4);
-}
-
-static uint32_t pci_vpb_config_readb (void *opaque, target_phys_addr_t addr)
+static uint64_t pci_vpb_config_read(void *opaque, target_phys_addr_t addr,
+                                    unsigned size)
 {
     uint32_t val;
-    val = pci_data_read(opaque, vpb_pci_config_addr (addr), 1);
-    return val;
+    val = pci_data_read(opaque, vpb_pci_config_addr(addr), size);
+    return val;
 }
 
-static uint32_t pci_vpb_config_readw (void *opaque, target_phys_addr_t addr)
-{
-    uint32_t val;
-    val = pci_data_read(opaque, vpb_pci_config_addr (addr), 2);
-    return val;
-}
-
-static uint32_t pci_vpb_config_readl (void *opaque, target_phys_addr_t addr)
-{
-    uint32_t val;
-    val = pci_data_read(opaque, vpb_pci_config_addr (addr), 4);
-    return val;
-}
-
-static CPUWriteMemoryFunc * const pci_vpb_config_write[] = {
-    pci_vpb_config_writeb,
-    pci_vpb_config_writew,
-    pci_vpb_config_writel,
-};
-
-static CPUReadMemoryFunc * const pci_vpb_config_read[] = {
-    pci_vpb_config_readb,
-    pci_vpb_config_readw,
-    pci_vpb_config_readl,
+static const MemoryRegionOps pci_vpb_config_ops = {
+    .read = pci_vpb_config_read,
+    .write = pci_vpb_config_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static int pci_vpb_map_irq(PCIDevice *d, int irq_num)
@@ -87,17 +58,35 @@ static void pci_vpb_set_irq(void *opaque, int irq_num, int level)
     qemu_set_irq(pic[irq_num], level);
 }
 
+
 static void pci_vpb_map(SysBusDevice *dev, target_phys_addr_t base)
 {
     PCIVPBState *s = (PCIVPBState *)dev;
     /* Selfconfig area.  */
-    cpu_register_physical_memory(base + 0x0100, 0x100, s->mem_config);
+    memory_region_add_subregion(get_system_memory(), base + 0x0100,
+                                &s->mem_config);
     /* Normal config area.  */
-    cpu_register_physical_memory(base + 0x0200, 0x100, s->mem_config);
+    memory_region_add_subregion(get_system_memory(), base + 0x0200,
+                                &s->mem_config2);
 
     if (s->realview) {
         /* IO memory area.  */
-        isa_mmio_init(base + 0x0300, 0x0010);
+        memory_region_add_subregion(get_system_memory(), base + 0x0300,
+                                    &s->isa);
+    }
+}
+
+static void pci_vpb_unmap(SysBusDevice *dev, target_phys_addr_t base)
+{
+    PCIVPBState *s = (PCIVPBState *)dev;
+    /* Selfconfig area.  */
+    memory_region_del_subregion(get_system_memory(), &s->mem_config);
+    /* Normal config area.  */
+    memory_region_del_subregion(get_system_memory(), &s->mem_config2);
+
+    if (s->realview) {
+        /* IO memory area.  */
+        memory_region_del_subregion(get_system_memory(), &s->isa);
     }
 }
 
@@ -117,10 +106,15 @@ static int pci_vpb_init(SysBusDevice *dev)
 
     /* ??? Register memory space.  */
 
-    s->mem_config = cpu_register_io_memory(pci_vpb_config_read,
-                                           pci_vpb_config_write, bus,
-                                           DEVICE_LITTLE_ENDIAN);
-    sysbus_init_mmio_cb(dev, 0x0400, pci_vpb_map);
+    memory_region_init_io(&s->mem_config, &pci_vpb_config_ops, bus,
+                          "pci-vpb-selfconfig", 0x100);
+    memory_region_init_io(&s->mem_config2, &pci_vpb_config_ops, bus,
+                          "pci-vpb-config", 0x100);
+    if (s->realview) {
+        isa_mmio_setup(&s->isa, 0x010);
+    }
+
+    sysbus_init_mmio_cb2(dev, pci_vpb_map, pci_vpb_unmap);
[PATCH 07/24] gt64xxx.c: convert to memory API
Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/gt64xxx.c |   36 +++-
 1 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/hw/gt64xxx.c b/hw/gt64xxx.c
index d541558..6af9782 100644
--- a/hw/gt64xxx.c
+++ b/hw/gt64xxx.c
@@ -227,7 +227,7 @@
 #define PCI_MAPPING_ENTRY(regname)            \
     target_phys_addr_t regname ##_start;      \
     target_phys_addr_t regname ##_length;     \
-    int regname ##_handle
+    MemoryRegion regname ##_mem
 
 typedef struct GT64120State {
     SysBusDevice busdev;
@@ -269,9 +269,9 @@ static void gt64120_isd_mapping(GT64120State *s)
     target_phys_addr_t start = s->regs[GT_ISD] << 21;
     target_phys_addr_t length = 0x1000;
 
-    if (s->ISD_length)
-        cpu_register_physical_memory(s->ISD_start, s->ISD_length,
-                                     IO_MEM_UNASSIGNED);
+    if (s->ISD_length) {
+        memory_region_del_subregion(get_system_memory(), &s->ISD_mem);
+    }
     check_reserved_space(&start, &length);
     length = 0x1000;
     /* Map new address */
@@ -279,7 +279,7 @@ static void gt64120_isd_mapping(GT64120State *s)
             length, start, s->ISD_handle);
     s->ISD_start = start;
     s->ISD_length = length;
-    cpu_register_physical_memory(s->ISD_start, s->ISD_length, s->ISD_handle);
+    memory_region_add_subregion(get_system_memory(), s->ISD_start, &s->ISD_mem);
 }
 
 static void gt64120_pci_mapping(GT64120State *s)
@@ -290,7 +290,8 @@ static void gt64120_pci_mapping(GT64120State *s)
 
     /* Unmap old IO address */
     if (s->PCI0IO_length) {
-        cpu_register_physical_memory(s->PCI0IO_start, s->PCI0IO_length, IO_MEM_UNASSIGNED);
+        memory_region_del_subregion(get_system_memory(), &s->PCI0IO_mem);
+        memory_region_destroy(&s->PCI0IO_mem);
     }
     /* Map new IO address */
     s->PCI0IO_start = s->regs[GT_PCI0IOLD] << 21;
@@ -301,7 +302,7 @@ static void gt64120_pci_mapping(GT64120State *s)
 }
 
 static void gt64120_writel (void *opaque, target_phys_addr_t addr,
-                            uint32_t val)
+                            uint64_t val, unsigned size)
 {
     GT64120State *s = opaque;
     uint32_t saddr;
@@ -579,8 +580,8 @@ static void gt64120_writel (void *opaque, target_phys_addr_t addr,
     }
 }
 
-static uint32_t gt64120_readl (void *opaque,
-                               target_phys_addr_t addr)
+static uint64_t gt64120_readl (void *opaque,
+                               target_phys_addr_t addr, unsigned size)
 {
     GT64120State *s = opaque;
     uint32_t val;
@@ -851,16 +852,10 @@ static uint32_t gt64120_readl (void *opaque,
     return val;
 }
 
-static CPUWriteMemoryFunc * const gt64120_write[] = {
-    gt64120_writel,
-    gt64120_writel,
-    gt64120_writel,
-};
-
-static CPUReadMemoryFunc * const gt64120_read[] = {
-    gt64120_readl,
-    gt64120_readl,
-    gt64120_readl,
+static const MemoryRegionOps isd_mem_ops = {
+    .read = gt64120_readl,
+    .write = gt64120_writel,
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 static int gt64120_pci_map_irq(PCIDevice *pci_dev, int irq_num)
@@ -1097,8 +1092,7 @@ PCIBus *gt64120_register(qemu_irq *pic)
                                 get_system_memory(),
                                 get_system_io(),
                                 PCI_DEVFN(18, 0), 4);
-    d->ISD_handle = cpu_register_io_memory(gt64120_read, gt64120_write, d,
-                                           DEVICE_NATIVE_ENDIAN);
+    memory_region_init_io(&d->ISD_mem, &isd_mem_ops, d, "isd-mem", 0x1000);
 
     pci_create_simple(d->pci.bus, PCI_DEVFN(0, 0), "gt64120_pci");
     return d->pci.bus;
-- 
1.7.5.3
[PATCH 09/24] omap_gpmc/nseries/tusb6010: convert to memory API
Somewhat clumsy since it needs a variable sized region.

Signed-off-by: Avi Kivity a...@redhat.com
---
 hw/omap.h      |    3 ++-
 hw/omap_gpmc.c |   53 +
 hw/tusb6010.c  |   30 +-
 hw/tusb6010.h  |    7 +--
 4 files changed, 49 insertions(+), 44 deletions(-)

diff --git a/hw/omap.h b/hw/omap.h
index a064353..c2fe54c 100644
--- a/hw/omap.h
+++ b/hw/omap.h
@@ -17,6 +17,7 @@
  * with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 #ifndef hw_omap_h
+#include "memory.h"
 # define hw_omap_h "omap.h"
 
 # define OMAP_EMIFS_BASE 0x
@@ -119,7 +120,7 @@ void omap_sdrc_reset(struct omap_sdrc_s *s);
 struct omap_gpmc_s;
 struct omap_gpmc_s *omap_gpmc_init(target_phys_addr_t base, qemu_irq irq);
 void omap_gpmc_reset(struct omap_gpmc_s *s);
-void omap_gpmc_attach(struct omap_gpmc_s *s, int cs, int iomemtype,
+void omap_gpmc_attach(struct omap_gpmc_s *s, int cs, MemoryRegion *iomem,
                 void (*base_upd)(void *opaque, target_phys_addr_t new),
                 void (*unmap)(void *opaque), void *opaque);
 
diff --git a/hw/omap_gpmc.c b/hw/omap_gpmc.c
index 8bf3343..4901aba 100644
--- a/hw/omap_gpmc.c
+++ b/hw/omap_gpmc.c
@@ -21,10 +21,13 @@
 #include "hw.h"
 #include "flash.h"
 #include "omap.h"
+#include "memory.h"
+#include "exec-memory.h"
 
 /* General-Purpose Memory Controller */
 struct omap_gpmc_s {
     qemu_irq irq;
+    MemoryRegion iomem;
 
     uint8_t sysconfig;
     uint16_t irqst;
@@ -39,7 +42,8 @@ struct omap_gpmc_s {
         uint32_t config[7];
         target_phys_addr_t base;
         size_t size;
-        int iomemtype;
+        MemoryRegion *iomem;
+        MemoryRegion container;
         void (*base_update)(void *opaque, target_phys_addr_t new);
         void (*unmap)(void *opaque);
         void *opaque;
@@ -75,8 +79,12 @@ static void omap_gpmc_cs_map(struct omap_gpmc_cs_file_s *f, int base, int mask)
      * constant), the mask should cause wrapping of the address space, so
      * that the same memory becomes accessible at every <i>size</i> bytes
      * starting from <i>base</i>.  */
-    if (f->iomemtype)
-        cpu_register_physical_memory(f->base, f->size, f->iomemtype);
+    if (f->iomem) {
+        memory_region_init(&f->container, "omap-gpmc-file", f->size);
+        memory_region_add_subregion(&f->container, 0, f->iomem);
+        memory_region_add_subregion(get_system_memory(), f->base,
+                                    &f->container);
+    }
 
     if (f->base_update)
         f->base_update(f->opaque, f->base);
@@ -87,8 +95,11 @@ static void omap_gpmc_cs_unmap(struct omap_gpmc_cs_file_s *f)
     if (f->size) {
         if (f->unmap)
             f->unmap(f->opaque);
-        if (f->iomemtype)
-            cpu_register_physical_memory(f->base, f->size, IO_MEM_UNASSIGNED);
+        if (f->iomem) {
+            memory_region_del_subregion(get_system_memory(), &f->container);
+            memory_region_del_subregion(&f->container, f->iomem);
+            memory_region_destroy(&f->container);
+        }
         f->base = 0;
         f->size = 0;
     }
@@ -132,7 +143,8 @@ void omap_gpmc_reset(struct omap_gpmc_s *s)
         ecc_reset(&s->ecc[i]);
 }
 
-static uint32_t omap_gpmc_read(void *opaque, target_phys_addr_t addr)
+static uint64_t omap_gpmc_read(void *opaque, target_phys_addr_t addr,
+                               unsigned size)
 {
     struct omap_gpmc_s *s = (struct omap_gpmc_s *) opaque;
     int cs;
@@ -230,7 +242,7 @@ static uint32_t omap_gpmc_read(void *opaque, target_phys_addr_t addr)
 }
 
 static void omap_gpmc_write(void *opaque, target_phys_addr_t addr,
-                uint32_t value)
+                            uint64_t value, unsigned size)
 {
     struct omap_gpmc_s *s = (struct omap_gpmc_s *) opaque;
     int cs;
@@ -249,7 +261,7 @@ static void omap_gpmc_write(void *opaque, target_phys_addr_t addr,
 
     case 0x010: /* GPMC_SYSCONFIG */
         if ((value >> 3) == 0x3)
-            fprintf(stderr, "%s: bad SDRAM idle mode %i\n",
+            fprintf(stderr, "%s: bad SDRAM idle mode %"PRIi64"\n",
                             __FUNCTION__, value >> 3);
         if (value & 2)
             omap_gpmc_reset(s);
@@ -369,34 +381,27 @@ static void omap_gpmc_write(void *opaque, target_phys_addr_t addr,
     }
 }
 
-static CPUReadMemoryFunc * const omap_gpmc_readfn[] = {
-    omap_badwidth_read32, /* TODO */
-    omap_badwidth_read32, /* TODO */
-    omap_gpmc_read,
-};
-
-static CPUWriteMemoryFunc * const omap_gpmc_writefn[] = {
-    omap_badwidth_write32, /* TODO */
-    omap_badwidth_write32, /* TODO */
-    omap_gpmc_write,
+static const MemoryRegionOps omap_gpmc_ops = {
+    /* TODO: specialize <4 byte writes? */
+    .read = omap_gpmc_read,
+    .write = omap_gpmc_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
 struct omap_gpmc_s *omap_gpmc_init(target_phys_addr_t base, qemu_irq irq)
 {
-    int iomemtype;
     struct omap_gpmc_s *s = (struct omap_gpmc_s *)