Re: [PATCH] virtio-blk: allow toggling host cache between writeback and writethrough
Il 05/07/2012 20:39, Badari Pulavarty ha scritto: > On Tue, 2012-07-03 at 15:19 +0200, Paolo Bonzini wrote: >> This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature, >> which exposes the cache mode in the configuration space and lets the >> driver modify it. The cache mode is exposed via sysfs. >> >> Even if the host does not support the new feature, the cache mode is >> visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable. >> >> Signed-off-by: Paolo Bonzini >> --- > > Hi Paolo, > > Curious - What is the host side change to support this ? QEMU would > close and re-open the device/file with the corresponding flags > (O_SYNC) ? QEMU is not using O_SYNC anymore; instead, it manually issues a flush after each write. We found this didn't penalize performance (in fact, for qcow2 metadata writes the fdatasyncs will be more coarse and there is a small improvement). So, when you toggle writethrough to writeback QEMU simply has to stop forcing flushes, and vice versa if you go to writethrough. > And also, is there a way to expose cache=none (O_DIRECT) to the guest ? Not yet. The main problem is that while we can invent something in virtio-blk, I'm not sure if there is an equivalent in SCSI for example. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
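The scheme Paolo describes reduces, on the host side, to flushing after each completed write while the guest has selected writethrough; toggling the cache mode is then just flipping a flag rather than reopening the image. A minimal sketch of that idea (illustrative only — the struct and function names below are made up, this is not the actual QEMU code, and short writes are ignored for brevity):

#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

struct blk_backend {
    int fd;                /* backing file or block device */
    bool writethrough;     /* flipped when the guest rewrites the WCE bit */
};

/* Complete a guest write; in writethrough mode, force the data out
 * before signalling completion. */
static int backend_pwrite(struct blk_backend *blk, const void *buf,
                          size_t len, off_t offset)
{
    if (pwrite(blk->fd, buf, len, offset) < 0)
        return -errno;
    if (blk->writethrough && fdatasync(blk->fd) < 0)
        return -errno;
    return 0;
}

Switching between writeback and writethrough is then a single store to blk->writethrough (plus, presumably, one flush when leaving writeback mode so previously cached data reaches the disk).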
Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
On Fri, 06 Jul 2012 11:20:06 +0800 Jason Wang wrote: > On 07/05/2012 08:51 PM, Sasha Levin wrote: > > On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote: > >> @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev) > >> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) > >> vi->has_cvq = true; > >> > >> + /* Use single tx/rx queue pair as default */ > >> + vi->num_queue_pairs = 1; > >> + vi->total_queue_pairs = num_queue_pairs; > > The code is using this "default" even if the amount of queue pairs it > > wants was specified during initialization. This basically limits any > > device to use 1 pair when starting up. > > > > Yes, currently the virtio-net driver would use 1 txq/txq by default > since multiqueue may not outperform in all kinds of workload. So it's > better to keep it as default and let user enable multiqueue by ethtool -L. > I would prefer that the driver sized number of queues based on number of online CPU's. That is what real hardware does. What kind of workload are you doing? If it is some DBMS benchmark then maybe the issue is that some CPU's need to be reserved. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
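For reference, the sizing Stephen suggests is essentially a one-liner in the probe path. A sketch reusing the field names from the RFC patch (illustrative only, not the actual driver change — the value still has to be clamped to what the device advertises):

/* Default to one queue pair per online CPU, capped by the number of
 * pairs the device offers, instead of hard-coding a single pair. */
static void virtnet_set_default_queues(struct virtnet_info *vi,
                                       u16 num_queue_pairs)
{
    vi->total_queue_pairs = num_queue_pairs;
    vi->num_queue_pairs = min_t(u16, num_online_cpus(), num_queue_pairs);
}

Whether that should really be the default, as opposed to opting in via ethtool -L as Jason proposes, is exactly the open question in this thread.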
SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
On Thu, 2012-07-05 at 20:01 -0700, Nicholas A. Bellinger wrote: > So I'm pretty sure this discrepancy is attributed to the small block > random I/O bottleneck currently present for all Linux/SCSI core LLDs > regardless of physical or virtual storage fabric. > > The SCSI wide host-lock less conversion that happened in .38 code back > in 2010, and subsequently having LLDs like virtio-scsi convert to run in > host-lock-less mode have helped to some extent.. But it's still not > enough.. > > Another example where we've been able to prove this bottleneck recently > is with the following target setup: > > *) Intel Romley production machines with 128 GB of DDR-3 memory > *) 4x FusionIO ioDrive 2 (1.5 TB @ PCI-e Gen2 x2) > *) Mellanox PCI-exress Gen3 HCA running at 56 gb/sec > *) Infiniband SRP Target backported to RHEL 6.2 + latest OFED > > In this setup using ib_srpt + IBLOCK w/ emulate_write_cache=1 + > iomemory_vsl export we end up avoiding SCSI core bottleneck on the > target machine, just as with the tcm_vhost example here for host kernel > side processing with vhost. > > Using Linux IB SRP initiator + Windows Server 2008 R2 SCSI-miniport SRP > (OFED) Initiator connected to four ib_srpt LUNs, we've observed that > MSFT SCSI is currently outperforming RHEL 6.2 on the order of ~285K vs. > ~215K with heavy random 4k WRITE iometer / fio tests. Note this with an > optimized queue_depth ib_srp client w/ noop I/O schedulering, but is > still lacking the host_lock-less patches on RHEL 6.2 OFED.. > > This bottleneck has been mentioned by various people (including myself) > on linux-scsi the last 18 months, and I've proposed that that it be > discussed at KS-2012 so we can start making some forward progress: Well, no, it hasn't. You randomly drop things like this into unrelated email (I suppose that is a mention in strict English construction) but it's not really enough to get anyone to pay attention since they mostly stopped reading at the top, if they got that far: most people just go by subject when wading through threads initially. But even if anyone noticed, a statement that RHEL6.2 (on a 2.6.32 kernel, which is now nearly three years old) is 25% slower than W2k8R2 on infiniband isn't really going to get anyone excited either (particularly when you mention OFED, which usually means a stack replacement on Linux anyway). What people might pay attention to is evidence that there's a problem in 3.5-rc6 (without any OFED crap). If you're not going to bother investigating, it has to be in an environment they can reproduce (so ordinary hardware, not infiniband) otherwise it gets ignored as an esoteric hardware issue. James -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
Il 06/07/2012 05:38, Nicholas A. Bellinger ha scritto: > So I imagine that setting inquiry/vpd/mode via configfs attribs to match > whatever the guest wants to see (or expects to see) can be enabled > via /sys/kernel/config/target/core/$HBA/$DEV/[wwn,attrib]/ easily to > whatever is required. > > However, beyond basic SCSI WWN related bits, I would avoid trying to > match complex SCSI target state between the in-kernel patch and QEMU > SCSI. Agreed. It should just be the bare minimum to make stable /dev/disk paths, well, stable between the two backends. >>> so that it is not possible to migrate one to the other. >> >> Migration between different backend types does not seem all that useful. >> The general rule is you need identical flags on both sides to allow >> migration, and it is not clear how valuable it is to relax this >> somewhat. > > I really need to learn more about how QEMU Live migration works wrt to > storage before saying how this may (or may not) work. vhost-scsi live migration should be easy to fix. You need some ioctl or eventfd mechanism to communicate to userspace that there is no pending I/O, but you need that anyway also for other operations (as simple as stopping the VM: QEMU guarantees that the "stop" monitor command returns only when there is no outstanding I/O). What worries me most is: 1) the amount of functionality that is lost with vhost-scsi, especially the new live operations that we're adding to QEMU; 2) whether any hook we introduce in the QEMU block layer will cause problems down the road when we set to fix the existing virtio-blk/virtio-scsi-qemu performance problems. This is the reason why I'm reluctant to merge the QEMU bits. The kernel bits are self-contained and are much less problematic. It may well be that _the same_ (or very similar) hooks will be needed by both tcm_vhost and high-performance userspace virtio backends. This would of course remove the objection. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On Thu, 2012-07-05 at 16:53 +0300, Michael S. Tsirkin wrote: > On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote: > > Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto: > > > > > > fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | > > > bare-metal raw block > > > > > > 25 Write / 75 Read | ~15K | ~45K | > > > ~70K > > > 75 Write / 25 Read | ~20K | ~55K | > > > ~60K > > > > This is impressive, but I think it's still not enough to justify the > > inclusion of tcm_vhost. In my opinion, vhost-blk/vhost-scsi are mostly > > worthwhile as drivers for improvements to QEMU performance. We want to > > add more fast paths to QEMU that let us move SCSI and virtio processing > > to separate threads, we have proof of concepts that this can be done, > > and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively. > > A general rant below: > > OTOH if it works, and adds value, we really should consider including code. > To me, it does not make sense to reject code just because in theory > someone could write even better code. Code walks. Time to marker matters too. > Yes I realize more options increases support. But downstreams can make > their own decisions on whether to support some configurations: > add a configure option to disable it and that's enough. > +1 for mst here. I think that type of sentiment deserves a toast at KS/LC in August. ;) > > In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two > > completely different devices that happen to speak the same SCSI > > transport. Not only virtio-scsi-vhost must be configured outside QEMU > > configuration outside QEMU is OK I think - real users use > management anyway. But maybe we can have helper scripts > like we have for tun? > > > and doesn't support -device; > > This needs to be fixed I think. > > > it (obviously) presents different > > inquiry/vpd/mode data than virtio-scsi-qemu, > > Why is this obvious and can't be fixed? Userspace virtio-scsi > is pretty flexible - can't it supply matching inquiry/vpd/mode data > so that switching is transparent to the guest? > So I imagine that setting inquiry/vpd/mode via configfs attribs to match whatever the guest wants to see (or expects to see) can be enabled via /sys/kernel/config/target/core/$HBA/$DEV/[wwn,attrib]/ easily to whatever is required. However, beyond basic SCSI WWN related bits, I would avoid trying to match complex SCSI target state between the in-kernel patch and QEMU SCSI. We've had this topic come up numerous times over the nears for other fabric modules (namely iscsi-target) and usually it end's up with a long email thread re-hashing history of failures until Linus starts yelling at the person who is pushing complex kernel <-> user split. The part where I start to get nervous is where you get into the cluster + multipath features. We have methods in TCM core that rebuild the exact state of this bits based upon external file metadata, based upon the running configfs layout. This is used by physical node failover + re-takeover to ensure the SCSI client sees exactly the same SCSI state. Trying to propagate up this type of complexity is where I think you go overboard. KISS and let's let fabric independent configfs (leaning on vfs) do the hard work for tracking these types of SCSI relationships. > > so that it is not possible to migrate one to the other. > > Migration between different backend types does not seem all that useful. 
> The general rule is you need identical flags on both sides to allow > migration, and it is not clear how valuable it is to relax this > somewhat. > I really need to learn more about how QEMU Live migration works wrt to storage before saying how this may (or may not) work. We certainly have no problems doing physical machine failover with target_core_mod for iscsi-target, and ATM I don't see why the QEMU userspace process driving the real-time configfs configuration of the storage fabric would not work.. > > I don't think vhost-scsi is particularly useful for virtualization, > > honestly. However, if it is useful for development, testing or > > benchmarking of lio itself (does this make any sense? :)) that could be > > by itself a good reason to include it. > > > > Paolo > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
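Coming back to the configfs point earlier in this message: from a management process, matching those attributes is just writing strings into the per-device configfs files. A rough illustration (the device path below is made up for the example; emulate_write_cache is the attribute already mentioned elsewhere in this thread):

#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write one value into a target_core configfs attribute, e.g.
 *   set_dev_attr("/sys/kernel/config/target/core/iblock_0/mydev",
 *                "attrib/emulate_write_cache", "1");
 */
static int set_dev_attr(const char *dev_dir, const char *attr,
                        const char *val)
{
    char path[PATH_MAX];
    int fd, ret = 0;

    snprintf(path, sizeof(path), "%s/%s", dev_dir, attr);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (write(fd, val, strlen(val)) != (ssize_t)strlen(val))
        ret = -1;
    close(fd);
    return ret;
}

The wwn/ and attrib/ groups under each $HBA/$DEV directory are what Nicholas is referring to; anything beyond those simple key/value knobs is where he argues the kernel/user split gets too complex.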
Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
On 07/05/2012 08:51 PM, Sasha Levin wrote: On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote: @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev) if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) vi->has_cvq = true; + /* Use single tx/rx queue pair as default */ + vi->num_queue_pairs = 1; + vi->total_queue_pairs = num_queue_pairs; The code is using this "default" even if the amount of queue pairs it wants was specified during initialization. This basically limits any device to use 1 pair when starting up. Yes, currently the virtio-net driver would use 1 txq/txq by default since multiqueue may not outperform in all kinds of workload. So it's better to keep it as default and let user enable multiqueue by ethtool -L. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next RFC V5 2/5] virtio_ring: move queue_index to vring_virtqueue
On 07/05/2012 07:40 PM, Sasha Levin wrote: On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote: Instead of storing the queue index in virtio infos, this patch moves them to vring_virtqueue and introduces helpers to set and get the value. This would simplify the management and tracing. Signed-off-by: Jason Wang This patch actually fails to compile: drivers/virtio/virtio_mmio.c: In function ‘vm_notify’: drivers/virtio/virtio_mmio.c:229:13: error: ‘struct virtio_mmio_vq_info’ has no member named ‘queue_index’ drivers/virtio/virtio_mmio.c: In function ‘vm_del_vq’: drivers/virtio/virtio_mmio.c:278:13: error: ‘struct virtio_mmio_vq_info’ has no member named ‘queue_index’ make[2]: *** [drivers/virtio/virtio_mmio.o] Error 1 It probably misses the following hunks: diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index f5432b6..12b6180 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -222,11 +222,10 @@ static void vm_reset(struct virtio_device *vdev) static void vm_notify(struct virtqueue *vq) { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev); - struct virtio_mmio_vq_info *info = vq->priv; /* We write the queue's selector into the notification register to * signal the other end */ - writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY); + writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY); } /* Notify all virtqueues on an interrupt. */ @@ -275,7 +274,7 @@ static void vm_del_vq(struct virtqueue *vq) vring_del_virtqueue(vq); /* Select and deactivate the queue */ - writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL); + writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_SEL); writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN); size = PAGE_ALIGN(vring_size(info->num, VIRTIO_MMIO_VRING_ALIGN)); Oops, I miss the virtio mmio part, thanks for pointing this. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On Thu, 2012-07-05 at 12:31 +0300, Michael S. Tsirkin wrote: > On Wed, Jul 04, 2012 at 07:01:05PM -0700, Nicholas A. Bellinger wrote: > > On Wed, 2012-07-04 at 18:05 +0300, Michael S. Tsirkin wrote: > > > I was talking about 4/6 first of all. > > > > So yeah, this code is still considered RFC at this point for-3.6, but > > I'd like to get this into target-pending/for-next in next week for more > > feedback and start collecting signoffs for the necessary pieces that > > effect existing vhost code. > > > > By that time the cmwq conversion of tcm_vhost should be in place as > > well.. > > I'll try to give some feedback but I think we do need > to see the qemu patches - they weren't posted yet, were they? > This driver has some userspace interface and once > that is merged it has to be supported. > So I think we need the buy-in from the qemu side at the principal level. > Stefan posted the QEMU vhost-scsi patches a few items, but I think it's been awhile since the last round of review. For the recent development's with tcm_vhost, I've been using Zhi's QEMU tree here: https://github.com/wuzhy/qemu/tree/vhost-scsi Other than a few printf I added to help me understand how it works, no function changes have been made to work with target-pending/tcm_vhost. Thank you, --nab -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCHv3 RFC 0/2] kvm: direct msix injection
Hi, Michael/Alex, do you have progress for device assignment issue fixing? https://bugzilla.kernel.org/show_bug.cgi?id=43328 Thanks, -Xudong > -Original Message- > From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On > Behalf Of Alex Williamson > Sent: Tuesday, July 03, 2012 1:08 AM > To: Jan Kiszka > Cc: Michael S. Tsirkin; kvm@vger.kernel.org > Subject: Re: [PATCHv3 RFC 0/2] kvm: direct msix injection > > On Mon, 2012-06-25 at 11:32 +0200, Jan Kiszka wrote: > > On 2012-06-11 13:19, Michael S. Tsirkin wrote: > > > We can deliver certain interrupts, notably MSIX, > > > from atomic context. > > > Here's an untested patch to do this (compiled only). > > > > > > Changes from v2: > > > Don't inject broadcast interrupts directly > > > Changes from v1: > > > Tried to address comments from v1, except unifying > > > with kvm_set_irq: passing flags to it looks too ugly. > > > Added a comment. > > > > > > Jan, you said you can test this? > > > > > > > > > Michael S. Tsirkin (2): > > > kvm: implement kvm_set_msi_inatomic > > > kvm: deliver msix interrupts from irq handler > > > > > > include/linux/kvm_host.h | 3 ++ > > > virt/kvm/assigned-dev.c | 31 ++-- > > > virt/kvm/irq_comm.c | 75 > > > > 3 files changed, 102 insertions(+), 7 deletions(-) > > > > > > > Finally-tested-by: Jan Kiszka > > Michael, we need either this or the simple oneshot patch to get device > assignment working again for 3.5. Are you planning to push this for > 3.5? Thanks, > > Alex > > > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On Thu, 2012-07-05 at 09:06 -0500, Anthony Liguori wrote: > On 07/05/2012 08:53 AM, Michael S. Tsirkin wrote: > > On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote: > >> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto: > >>> > >>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | > >>> bare-metal raw block > >>> > >>> 25 Write / 75 Read | ~15K | ~45K | > >>> ~70K > >>> 75 Write / 25 Read | ~20K | ~55K | > >>> ~60K > >> > >> This is impressive, but I think it's still not enough to justify the > >> inclusion of tcm_vhost. > > We have demonstrated better results at much higher IOP rates with virtio-blk > in > userspace so while these results are nice, there's no reason to believe we > can't > do this in userspace. > So I'm pretty sure this discrepancy is attributed to the small block random I/O bottleneck currently present for all Linux/SCSI core LLDs regardless of physical or virtual storage fabric. The SCSI wide host-lock less conversion that happened in .38 code back in 2010, and subsequently having LLDs like virtio-scsi convert to run in host-lock-less mode have helped to some extent.. But it's still not enough.. Another example where we've been able to prove this bottleneck recently is with the following target setup: *) Intel Romley production machines with 128 GB of DDR-3 memory *) 4x FusionIO ioDrive 2 (1.5 TB @ PCI-e Gen2 x2) *) Mellanox PCI-exress Gen3 HCA running at 56 gb/sec *) Infiniband SRP Target backported to RHEL 6.2 + latest OFED In this setup using ib_srpt + IBLOCK w/ emulate_write_cache=1 + iomemory_vsl export we end up avoiding SCSI core bottleneck on the target machine, just as with the tcm_vhost example here for host kernel side processing with vhost. Using Linux IB SRP initiator + Windows Server 2008 R2 SCSI-miniport SRP (OFED) Initiator connected to four ib_srpt LUNs, we've observed that MSFT SCSI is currently outperforming RHEL 6.2 on the order of ~285K vs. ~215K with heavy random 4k WRITE iometer / fio tests. Note this with an optimized queue_depth ib_srp client w/ noop I/O schedulering, but is still lacking the host_lock-less patches on RHEL 6.2 OFED.. This bottleneck has been mentioned by various people (including myself) on linux-scsi the last 18 months, and I've proposed that that it be discussed at KS-2012 so we can start making some forward progress: http://lists.linux-foundation.org/pipermail/ksummit-2012-discuss/2012-June/98.html, > >> In my opinion, vhost-blk/vhost-scsi are mostly > >> worthwhile as drivers for improvements to QEMU performance. We want to > >> add more fast paths to QEMU that let us move SCSI and virtio processing > >> to separate threads, we have proof of concepts that this can be done, > >> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively. > > > > A general rant below: > > > > OTOH if it works, and adds value, we really should consider including code. > > Users want something that has lots of features and performs really, really > well. > They want everything. > > Having one device type that is "fast" but has no features and another that is > "not fast" but has a lot of features forces the user to make a bad choice. > No > one wins in the end. > > virtio-scsi is brand new. It's not as if we've had any significant time to > make > virtio-scsi-qemu faster. In fact, tcm_vhost existed before virtio-scsi-qemu > did > if I understand correctly. 
> So based upon the data above, I'm going to make a prediction that MSFT guests connected with SCSI miniport <-> tcm_vhost will out perform Linux guests with virtio-scsi (w/ <= 3.5 host-lock-less) <-> tcm_vhost w/ connected to the same raw block flash iomemory_vsl backends. Of course that depends upon how fast virtio-scsi drivers get written for MSFT guests vs. us fixing the long-term performance bottleneck in our SCSI subsystem. ;) (Ksummit-2012 discuss CC'ed for the later) > > To me, it does not make sense to reject code just because in theory > > someone could write even better code. > > There is no theory. We have proof points with virtio-blk. > > > Code walks. Time to marker matters too. > > But guest/user facing decisions cannot be easily unmade and making the wrong > technical choices because of premature concerns of "time to market" just > result > in a long term mess. > > There is no technical reason why tcm_vhost is going to be faster than doing > it > in userspace. We can demonstrate this with virtio-blk. This isn't a > theoretical argument. > > > Yes I realize more options increases support. But downstreams can make > > their own decisions on whether to support some configurations: > > add a configure option to disable it and that's enough. > > > >> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two > >> completely diff
Re: [PATCH 3/3] virtio-blk: Add bio-based IO path for virtio-blk
On 07/04/2012 10:40 AM, Rusty Russell wrote: On Tue, 03 Jul 2012 08:39:39 +0800, Asias He wrote: On 07/02/2012 02:41 PM, Rusty Russell wrote: Sure, our guest merging might save us 100x as many exits as no merging. But since we're not doing many requests, does it matter? We can still have many requests with slow devices. The number of requests depends on the workload in guest. E.g. 512 IO threads in guest keeping doing IO. You can have many requests outstanding. But if the device is slow, the rate of requests being serviced must be low. Yes. Am I misunderstanding something? I thought if you could have a high rate of requests, it's not a slow device. Sure. -- Asias -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-ppc] [RFC PATCH 12/17] PowerPC: booke64: Add DO_KVM kernel hooks
On 07/04/2012 01:15 PM, Caraman Mihai Claudiu-B02008 wrote: >> >> From: Alexander Graf [ag...@suse.de] >> Sent: Wednesday, July 04, 2012 6:45 PM >> To: Caraman Mihai Claudiu-B02008 >> Cc: ; KVM list; linuxppc-dev; qemu-...@nongnu.org >> List; Benjamin Herrenschmidt >> Subject: Re: [Qemu-ppc] [RFC PATCH 12/17] PowerPC: booke64: Add DO_KVM >> kernel hooks >> >> On 04.07.2012, at 17:27, Caraman Mihai Claudiu-B02008 wrote: >> -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, July 04, 2012 5:30 PM To: Caraman Mihai Claudiu-B02008 Cc: ; KVM list; linuxppc-dev; qemu- p...@nongnu.org List; Benjamin Herrenschmidt Subject: Re: [Qemu-ppc] [RFC PATCH 12/17] PowerPC: booke64: Add DO_KVM kernel hooks On 25.06.2012, at 14:26, Mihai Caraman wrote: > Hook DO_KVM macro to 64-bit booke in a optimal way similar to 32-bit booke > see head_fsl_booke.S file. Extend interrupt handlers' parameter list with > interrupt vector numbers to accomodate the macro. Rework Guest Doorbell > handler to use the proper GSRRx save/restore registers. > Only the bolted version of tlb miss handers is addressed now. > > Signed-off-by: Mihai Caraman > --- > arch/powerpc/kernel/exceptions-64e.S | 114 --- --- > arch/powerpc/mm/tlb_low_64e.S| 14 +++- > 2 files changed, 92 insertions(+), 36 deletions(-) > > diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S > index 06f7aec..a60f81f 100644 > --- a/arch/powerpc/kernel/exceptions-64e.S > +++ b/arch/powerpc/kernel/exceptions-64e.S > @@ -25,6 +25,8 @@ > #include > #include > #include > +#include > +#include > > /* XXX This will ultimately add space for a special exception save > * structure used to save things like SRR0/SRR1, SPRGs, MAS, etc... > @@ -34,13 +36,24 @@ > */ > #define SPECIAL_EXC_FRAME_SIZE INT_FRAME_SIZE > > +#ifdef CONFIG_KVM_BOOKE_HV > +#define KVM_BOOKE_HV_MFSPR(reg, spr) \ > + BEGIN_FTR_SECTION \ > + mfspr reg, spr; \ > + END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) > +#else > +#define KVM_BOOKE_HV_MFSPR(reg, spr) > +#endif Bleks - this is ugly. >>> >>> I agree :) But I opted to keep the optimizations done for 32-bit. >>> Do we really need to open-code the #ifdef here? >>> >>> 32-bit implementation fortunately use asm macros, we can't nest defines. >>> Can't the feature section code determine that the feature is disabled and just always not include the code? >>> >>> CPU_FTR_EMB_HV is set even if KVM is not configured. >> >> I don't get the point then. Why not have the whole DO_KVM masked under >> FTR_SECTION_IFSET(CPU_FTR_EMB_HV)? Are there book3s_64 implementations >> without HV? > > I guess you refer to book3e_64. I don't know all implementations but > Embedded.HV category is optional. > >> Can't we just mfspr unconditionally in DO_KVM? > > I think Scott should better answer this question, I don't know why he opted > for the other approach. That was on 32-bit, where some of DO_KVM's users want SRR1 for their own purposes. > -.macro tlb_prolog_bolted addr > +.macro tlb_prolog_bolted intnum addr > mtspr SPRN_SPRG_TLB_SCRATCH,r13 > mfspr r13,SPRN_SPRG_PACA > std r10,PACA_EXTLB+EX_TLB_R10(r13) > mfcrr10 > std r11,PACA_EXTLB+EX_TLB_R11(r13) > +#ifdef CONFIG_KVM_BOOKE_HV > +BEGIN_FTR_SECTION > + mfspr r11, SPRN_SRR1 > +END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV) > +#endif This thing really should vanish behind DO_KVM :) >>> >>> Then let's do it first for 32-bit ;) >> >> You could #ifdef it in DO_KVM for 64-bit for now. IIRC it's not done on >> 32-bit because the register value is used even beyond DO_KVM there. 
> > Nope, 32-bit code is also guarded by CONFIG_KVM_BOOKE_HV. Only in the TLB miss handlers, not the normal exception prolog. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-ppc] [RFC PATCH 04/17] KVM: PPC64: booke: Add guest computation mode for irq delivery
On 07/04/2012 08:40 AM, Alexander Graf wrote: > On 25.06.2012, at 14:26, Mihai Caraman wrote: >> @@ -381,7 +386,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu >> *vcpu, >> set_guest_esr(vcpu, vcpu->arch.queued_esr); >> if (update_dear == true) >> set_guest_dear(vcpu, vcpu->arch.queued_dear); >> -kvmppc_set_msr(vcpu, vcpu->arch.shared->msr & msr_mask); >> +kvmppc_set_msr(vcpu, (vcpu->arch.shared->msr & msr_mask) >> +| msr_cm); > > Please split this computation out into its own variable and apply the masking > regardless. Something like > > ulong new_msr = vcpu->arch.shared->msr; > if (vcpu->arch.epcr & SPRN_EPCR_ICM) > new_msr |= MSR_CM; > new_msr &= msr_mask; > kvmppc_set_msr(vcpu, new_msr); This will fail to clear MSR[CM] in the odd but legal situation where you have MSR[CM] set but EPCR[ICM] unset. -Scott -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
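Put differently, the CM bit has to be recomputed in both directions rather than only OR-ed in. A sketch using the names from the patch under discussion (not a tested change):

/* Derive MSR[CM] purely from EPCR[ICM], so it is cleared as well as
 * set regardless of what msr_mask lets through from the old value. */
ulong new_msr = vcpu->arch.shared->msr & msr_mask;
new_msr &= ~MSR_CM;
if (vcpu->arch.epcr & SPRN_EPCR_ICM)
    new_msr |= MSR_CM;
kvmppc_set_msr(vcpu, new_msr);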
[PATCH] add PLE stats to kvmstat
I, and I expect others, have a keen interest in knowing how often we exit for PLE, and also how often that includes a yielding to another vcpu. The following adds two more counters to kvmstat to track the exits and the vcpu yields. This in no way changes PLE behavior, just helps us track what's going on. -Andrew Theurer Signed-off-by: Andrew Theurer arch/x86/include/asm/kvm_host.h |2 ++ arch/x86/kvm/svm.c |1 + arch/x86/kvm/vmx.c |1 + arch/x86/kvm/x86.c |2 ++ virt/kvm/kvm_main.c |1 + 5 files changed, 7 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 24b7647..aebba8a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -593,6 +593,8 @@ struct kvm_vcpu_stat { u32 hypercalls; u32 irq_injections; u32 nmi_injections; + u32 pause_exits; + u32 vcpu_yield_to; }; struct x86_instruction_info; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 7a41878..1c1b81e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3264,6 +3264,7 @@ static int interrupt_window_interception(struct vcpu_svm *svm) static int pause_interception(struct vcpu_svm *svm) { + ++svm->vcpu.stat.pause_exits; kvm_vcpu_on_spin(&(svm->vcpu)); return 1; } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index eeeb4a2..1309578 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5004,6 +5004,7 @@ out: */ static int handle_pause(struct kvm_vcpu *vcpu) { + ++vcpu->stat.pause_exits; skip_emulated_instruction(vcpu); kvm_vcpu_on_spin(vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8eacb2e..ad85403 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -143,6 +143,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { { "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) }, { "irq_injections", VCPU_STAT(irq_injections) }, { "nmi_injections", VCPU_STAT(nmi_injections) }, + { "pause_exits", VCPU_STAT(pause_exits) }, + { "vcpu_yield_to", VCPU_STAT(vcpu_yield_to) }, { "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) }, { "mmu_pte_write", VM_STAT(mmu_pte_write) }, { "mmu_pte_updated", VM_STAT(mmu_pte_updated) }, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 636bd08..d80b6cd 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1610,6 +1610,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me) if (kvm_vcpu_yield_to(vcpu)) { kvm->last_boosted_vcpu = i; yielded = 1; + ++vcpu->stat.vcpu_yield_to; break; } } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
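Once the patch is applied, the two counters show up as plain text files alongside the existing kvm stats in debugfs, so watching them needs nothing more than reading those files. The helper below is just an example (not part of the patch) and assumes debugfs is mounted at /sys/kernel/debug:

#include <stdio.h>

static long read_kvm_stat(const char *name)
{
    char path[256];
    long val = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/sys/kernel/debug/kvm/%s", name);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}

/* e.g.: printf("pause_exits=%ld vcpu_yield_to=%ld\n",
 *               read_kvm_stat("pause_exits"),
 *               read_kvm_stat("vcpu_yield_to"));
 * Sampling the two before and after a run shows how many PLE exits
 * actually resulted in a successful yield. */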
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On Thu, Jul 05, 2012 at 04:32:31PM +0200, Paolo Bonzini wrote: > Il 05/07/2012 15:53, Michael S. Tsirkin ha scritto: > > On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote: > >> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto: > >>> > >>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | > >>> bare-metal raw block > >>> > >>> 25 Write / 75 Read | ~15K | ~45K | > >>> ~70K > >>> 75 Write / 25 Read | ~20K | ~55K | > >>> ~60K > >> > >> This is impressive, but I think it's still not enough to justify the > >> inclusion of tcm_vhost. In my opinion, vhost-blk/vhost-scsi are mostly > >> worthwhile as drivers for improvements to QEMU performance. We want to > >> add more fast paths to QEMU that let us move SCSI and virtio processing > >> to separate threads, we have proof of concepts that this can be done, > >> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively. > > > > A general rant below: > > > > OTOH if it works, and adds value, we really should consider including code. > > To me, it does not make sense to reject code just because in theory > > someone could write even better code. > > It's not about writing better code. It's about having two completely > separate SCSI/block layers with completely different feature sets. You mean qemu one versus kernel one? Both exist anyway :) > > Code walks. Time to marker matters too. > > Yes I realize more options increases support. But downstreams can make > > their own decisions on whether to support some configurations: > > add a configure option to disable it and that's enough. > > > >> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two > >> completely different devices that happen to speak the same SCSI > >> transport. Not only virtio-scsi-vhost must be configured outside QEMU > > > > configuration outside QEMU is OK I think - real users use > > management anyway. But maybe we can have helper scripts > > like we have for tun? > > We could add hooks for vhost-scsi in the SCSI devices and let them > configure themselves. I'm not sure it is a good idea. This is exactly what virtio-net does. > >> and doesn't support -device; > > > > This needs to be fixed I think. > > To be clear, it supports -device for the virtio-scsi HBA itself; it > doesn't support using -drive/-device to set up the disks hanging off it. Fixable, isn't it? > >> it (obviously) presents different > >> inquiry/vpd/mode data than virtio-scsi-qemu, > > > > Why is this obvious and can't be fixed? Userspace virtio-scsi > > is pretty flexible - can't it supply matching inquiry/vpd/mode data > > so that switching is transparent to the guest? > > It cannot support anyway the whole feature set unless you want to port > thousands of lines from the kernel to QEMU (well, perhaps we'll get > there but it's far. And dually, the in-kernel target of course does not > support qcow2 and friends though perhaps you could imagine some hack > based on NBD. > > Paolo Exactly. Kernel also gains functionality all the time. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
On 07/05/2012 08:51 PM, Sasha Levin wrote: > On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote: >> @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev) >> if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) >> vi->has_cvq = true; >> >> + /* Use single tx/rx queue pair as default */ >> + vi->num_queue_pairs = 1; >> + vi->total_queue_pairs = num_queue_pairs; vi->total_queue_pairs also should be set to 1 vi->total_queue_pairs = 1; > > The code is using this "default" even if the amount of queue pairs it > wants was specified during initialization. This basically limits any > device to use 1 pair when starting up. > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Amos. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next RFC V5 4/5] virtio_net: multiqueue support
On 07/05/2012 06:29 PM, Jason Wang wrote: > This patch converts virtio_net to a multi queue device. After negotiated > VIRTIO_NET_F_MULTIQUEUE feature, the virtio device has many tx/rx queue pairs, > and driver could read the number from config space. > > The driver expects the number of rx/tx queue paris is equal to the number of > vcpus. To maximize the performance under this per-cpu rx/tx queue pairs, some > optimization were introduced: > > - Txq selection is based on the processor id in order to avoid contending a > lock > whose owner may exits to host. > - Since the txq/txq were per-cpu, affinity hint were set to the cpu that owns > the queue pairs. > > Signed-off-by: Krishna Kumar > Signed-off-by: Jason Wang > --- ... > > static int virtnet_probe(struct virtio_device *vdev) > { > - int err; > + int i, err; > struct net_device *dev; > struct virtnet_info *vi; > + u16 num_queues, num_queue_pairs; > + > + /* Find if host supports multiqueue virtio_net device */ > + err = virtio_config_val(vdev, VIRTIO_NET_F_MULTIQUEUE, > + offsetof(struct virtio_net_config, > + num_queues), &num_queues); > + > + /* We need atleast 2 queue's */ s/atleast/at least/ > + if (err || num_queues < 2) > + num_queues = 2; > + if (num_queues > MAX_QUEUES * 2) > + num_queues = MAX_QUEUES; num_queues = MAX_QUEUES * 2; MAX_QUEUES is the limitation of RX or TX. > + > + num_queue_pairs = num_queues / 2; ... -- Amos. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On 07/05/12 17:53, Bart Van Assche wrote: > On 07/05/12 01:52, Nicholas A. Bellinger wrote: >> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal >> raw block >> >> 25 Write / 75 Read | ~15K | ~45K | ~70K >> 75 Write / 25 Read | ~20K | ~55K | ~60K > > These numbers are interesting. To me these numbers mean that there is a > huge performance bottleneck in the virtio-scsi-raw storage path. Why is > the virtio-scsi-raw bandwidth only one third of the bare-metal raw block > bandwidth ? (replying to my own e-mail) Or maybe the above numbers mean that in the virtio-scsi-raw test I/O was serialized (I/O depth 1) while the other two tests use a large I/O depth (64) ? It can't be a coincidence that the virtio-scsi-raw results are close to the bare-metal results for I/O depth 1. Another question: which functionality does tcm_vhost provide that is not yet provided by the SCSI emulation code in qemu + tcm_loop ? Bart. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-blk: allow toggling host cache between writeback and writethrough
On Tue, 2012-07-03 at 15:19 +0200, Paolo Bonzini wrote: > This patch adds support for the new VIRTIO_BLK_F_CONFIG_WCE feature, > which exposes the cache mode in the configuration space and lets the > driver modify it. The cache mode is exposed via sysfs. > > Even if the host does not support the new feature, the cache mode is > visible (thanks to the existing VIRTIO_BLK_F_WCE), but not modifiable. > > Signed-off-by: Paolo Bonzini > --- Hi Paolo, Curious - What is the host side change to support this ? QEMU would close and re-open the device/file with the corresponding flags (O_SYNC) ? And also, is there a way to expose cache=none (O_DIRECT) to the guest ? Our cluster filesystem folks need a way to verify/guarantee that virtio-blk device has cache=none selected at host. Otherwise, they can not support a cluster filesystem running inside a VM (on virtio-blk). Thoughts ? Thanks, Badari -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] plan for device assignment upstream
On Wed, Jul 4, 2012 at 8:05 AM, Avi Kivity wrote: > On 07/03/2012 10:06 PM, Blue Swirl wrote: >> On Mon, Jul 2, 2012 at 9:43 AM, Avi Kivity wrote: >>> On 07/02/2012 12:30 PM, Jan Kiszka wrote: On 2012-07-02 11:18, Michael S. Tsirkin wrote: > I've been thinking hard about Jan's patches for device > assignment. Basically while I thought it makes sense > to make all devices: assignment and not - behave the > same and use same APIs for injecting irqs, Anthony thinks there is huge > value in making irq propagation hierarchical and device assignment > should be special cased. On the long term, we will need direct injection, ie. caching, to allow making it lock-less. Stepping through all intermediate layers will cause troubles, at least performance-wise, when having to take and drop a lock at each stop. >>> >>> So we precalculate everything beforehand. Instead of each qemu_irq >>> triggering a callback, calculating the next hop and firing the next >>> qemu_irq, configure each qemu_irq array with a function that describes >>> how to take the next hop. Whenever the configuration changes, >>> recalculate all routes. >> >> Yes, we had this discussion last year when I proposed the IRQ matrix: >> http://lists.nongnu.org/archive/html/qemu-devel/2011-09/msg00474.html >> >> One problem with the matrix is that it only works for enable/disable >> level, not for more complex situations like boolean logic or >> multiplexed outputs. > > I think we do need to support inverters etc. > >> Perhaps the devices should describe the currently valid logic with >> packet filter type mechanism? I think that could scale arbitrarily and >> it could be more friendly even as a kernel interface? > > Interesting idea. So qemu creates multiple eventfds, gives half to > devices and half to kvm (as irqfds), and configures bpf programs that > calculate the irqfd outputs from the vfio inputs. I wasn't thinking of using fds, I guess that could work too but just that the interface could be similar to packet filters. So a device which implements an enable switch and ORs 8 inputs to a global output could be implemented with: context = rule_init(); context = append_rule(context, R_OR, 8, &irq_array[]); context = append_rule(context, R_AND, 1, irq_enable); send_to_kernel_or_master_irq_controller(context); > > At least for x86 this is overkill. I would be okay with > one-input-one-output cases handled with the current code and everything > else routed through qemu. If this is efficient, some of the internal logic inside devices (for example PCI) could be implemented with the rules. Usually devices have one or just a few IRQ outputs but several possible internal sources for these. > > -- > error compiling committee.c: too many arguments to function > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
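To make the proposal a bit more concrete, here is a toy model of such a rule chain (entirely hypothetical — none of these names exist in qemu or the kernel; it just restates the pseudo-code above in compilable form):

#include <stdbool.h>
#include <stddef.h>

enum rule_op { R_OR, R_AND };

struct irq_rule {
    enum rule_op op;           /* how this rule combines its inputs */
    size_t ninputs;
    const bool *inputs;        /* current level of each input line */
    struct irq_rule *next;     /* next rule, combined with the result so far */
};

/* Evaluate a chain of rules: "OR of 8 inputs" followed by "AND with the
 * enable bit" models the example device above. */
static bool irq_rules_eval(const struct irq_rule *r)
{
    bool out = false, first = true;

    for (; r; r = r->next) {
        bool v = (r->op == R_AND);
        for (size_t i = 0; i < r->ninputs; i++)
            v = (r->op == R_OR) ? (v || r->inputs[i])
                                : (v && r->inputs[i]);
        out = first ? v
                    : (r->op == R_OR) ? (out || v) : (out && v);
        first = false;
    }
    return out;
}

Whatever evaluates the chain — qemu's core or, for the hot paths, the kernel — only needs the input levels and the precomputed rules, which is what makes the idea attractive for lock-less injection.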
Re: [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
On 07/05/12 17:47, Bart Van Assche wrote: > On 07/04/12 04:24, Nicholas A. Bellinger wrote: >> +/* Fill in status and signal that we are done processing this command >> + * >> + * This is scheduled in the vhost work queue so we are called with the owner >> + * process mm and can access the vring. >> + */ >> +static void vhost_scsi_complete_cmd_work(struct vhost_work *work) >> +{ > > As far as I can see vhost_scsi_complete_cmd_work() runs on the context > of a work queue kernel thread and hence doesn't have an mm context. Did > I misunderstand something ? Please ignore the above - I've found the answer in vhost_dev_ioctl() and vhost_dev_set_owner(). Bart. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On 07/05/12 01:52, Nicholas A. Bellinger wrote: > fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal > raw block > > 25 Write / 75 Read | ~15K | ~45K | ~70K > 75 Write / 25 Read | ~20K | ~55K | ~60K These numbers are interesting. To me these numbers mean that there is a huge performance bottleneck in the virtio-scsi-raw storage path. Why is the virtio-scsi-raw bandwidth only one third of the bare-metal raw block bandwidth ? Bart. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
On 07/04/12 04:24, Nicholas A. Bellinger wrote: > +/* Fill in status and signal that we are done processing this command > + * > + * This is scheduled in the vhost work queue so we are called with the owner > + * process mm and can access the vring. > + */ > +static void vhost_scsi_complete_cmd_work(struct vhost_work *work) > +{ As far as I can see vhost_scsi_complete_cmd_work() runs on the context of a work queue kernel thread and hence doesn't have an mm context. Did I misunderstand something ? Bart. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next RFC V5 0/5] Multiqueue virtio-net
On 07/05/2012 03:29 AM, Jason Wang wrote: Test result: 1) 1 vm 2 vcpu 1q vs 2q, 1 - 1q, 2 - 2q, no pinning - Guest to External Host TCP STREAM sessions size throughput1 throughput2 norm1 norm2 1 64 650.55 655.61 100% 24.88 24.86 99% 2 64 1446.81 1309.44 90% 30.49 27.16 89% 4 64 1430.52 1305.59 91% 30.78 26.80 87% 8 64 1450.89 1270.82 87% 30.83 25.95 84% Was the -D test-specific option used to set TCP_NODELAY? I'm guessing from your description of how packet sizes were smaller with multiqueue and your need to hack tcp_write_xmit() it wasn't but since we don't have the specific netperf command lines (hint hint :) I wanted to make certain. Instead of calling them throughput1 and throughput2, it might be more clear in future to identify them as singlequeue and multiqueue. Also, how are you combining the concurrent netperf results? Are you taking sums of what netperf reports, or are you gathering statistics outside of netperf? - TCP RR sessions size throughput1 throughput2 norm1 norm2 50 1 54695.41 84164.98 153% 1957.33 1901.31 97% A single instance TCP_RR test would help confirm/refute any non-trivial change in (effective) path length between the two cases. happy benchmarking, rick jones -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On Thu, Jul 05, 2012 at 04:47:43PM +0200, Paolo Bonzini wrote: > Il 05/07/2012 16:40, Michael S. Tsirkin ha scritto: > >> virtio-scsi is brand new. It's not as if we've had any significant > >> time to make virtio-scsi-qemu faster. In fact, tcm_vhost existed > >> before virtio-scsi-qemu did if I understand correctly. > > Yes. > > > Can't same can be said about virtio scsi - it seems to be > > slower so we force a bad choice between blk and scsi at the user? > > virtio-scsi supports multiple devices per PCI slot (or even function), > can talk to tapes, has better passthrough support for disks, and does a > bunch of other things that virtio-blk by design doesn't do. This > applies to both tcm_vhost and virtio-scsi-qemu. > > So far, all that virtio-scsi vs. virtio-blk benchmarks say is that more > benchmarking is needed. Some people see it faster, some people see it > slower. In some sense, it's consistent with the expectation that the > two should roughly be the same. :) Anyway, all I was saying is new technology often lacks some features of the old one. We are not forcing new inferior one on anyone, so we can let it mature it tree. > >> But guest/user facing decisions cannot be easily unmade and making > >> the wrong technical choices because of premature concerns of "time > >> to market" just result in a long term mess. > >> > >> There is no technical reason why tcm_vhost is going to be faster > >> than doing it in userspace. > > > > But doing what in userspace exactly? > > Processing virtqueues in separate threads, switching the block and SCSI > layer to fine-grained locking, adding some more fast paths. > > >> Basically, the issue is that the kernel has more complete SCSI > >> emulation that QEMU does right now. > >> > >> There are lots of ways to try to solve this--like try to reuse the > >> kernel code in userspace or just improving the userspace code. If > >> we were able to make the two paths identical, then I strongly > >> suspect there'd be no point in having tcm_vhost anyway. > > > > However, a question we should ask ourselves is whether this will happen > > in practice, and when. > > It's already happening, but it takes a substantial amount of preparatory > work before you can actually see results. > > Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] Fixes related to processing of qemu's -numa option
Changes since v2: - Using "unsigned long *" for the node_cpumask[]. - Use bitmap_new() instead of g_malloc0() for allocation. - Don't rely on "max_cpus" since it may not be initialized before the numa related qemu options are parsed & processed. Note: Continuing to use a new constant for allocation of the mask (This constant is currently set to 255 since with an 8bit APIC ID VCPUs can range from 0-254 in a guest. The APIC ID 255 (0xFF) is reserved for broadcast). Changes since v1: - Use bitmap functions that are already in qemu (instead of cpu_set_t macro's from sched.h) - Added a check for endvalue >= max_cpus. - Fix to address the round-robbing assignment when cpu's are not explicitly specified. Tested-by: Eduardo Habkost redhat.com> --- v1: -- The -numa option to qemu is used to create [fake] numa nodes and expose them to the guest OS instance. There are a couple of issues with the -numa option: a) Max VCPU's that can be specified for a guest while using the qemu's -numa option is 64. Due to a typecasting issue when the number of VCPUs is > 32 the VCPUs don't show up under the specified [fake] numa nodes. b) KVM currently has support for 160VCPUs per guest. The qemu's -numa option has only support for upto 64VCPUs per guest. This patch addresses these two issues. Below are examples of (a) and (b) a) >32 VCPUs are specified with the -numa option: /usr/local/bin/qemu-system-x86_64 \ -enable-kvm \ 71:01:01 \ -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4 ... Upstream qemu : -- QEMU 1.1.50 monitor - type 'help' for more information (qemu) info numa 6 nodes node 0 cpus: 0 1 2 3 4 5 6 7 8 9 32 33 34 35 36 37 38 39 40 41 node 0 size: 131072 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 node 1 size: 131072 MB node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 node 2 size: 131072 MB node 3 cpus: 30 node 3 size: 131072 MB node 4 cpus: node 4 size: 131072 MB node 5 cpus: 31 node 5 size: 131072 MB With the patch applied : --- QEMU 1.1.50 monitor - type 'help' for more information (qemu) info numa 6 nodes node 0 cpus: 0 1 2 3 4 5 6 7 8 9 node 0 size: 131072 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 131072 MB node 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 131072 MB node 3 cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 131072 MB node 4 cpus: 40 41 42 43 44 45 46 47 48 49 node 4 size: 131072 MB node 5 cpus: 50 51 52 53 54 55 56 57 58 59 node 5 size: 131072 MB b) >64 VCPUs specified with -numa option: /usr/local/bin/qemu-system-x86_64 \ -enable-kvm \ -cpu Westmere,+rdtscp,+pdpe1gb,+dca,+pdcm,+xtpr,+tm2,+est,+smx,+vmx,+ds_cpl,+monitor,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme \ -smp sockets=8,cores=10,threads=1 \ -numa node,nodeid=0,cpus=0-9,mem=64g \ -numa node,nodeid=1,cpus=10-19,mem=64g \ -numa node,nodeid=2,cpus=20-29,mem=64g \ -numa node,nodeid=3,cpus=30-39,mem=64g \ -numa node,nodeid=4,cpus=40-49,mem=64g \ -numa node,nodeid=5,cpus=50-59,mem=64g \ -numa node,nodeid=6,cpus=60-69,mem=64g \ -numa node,nodeid=7,cpus=70-79,mem=64g \ -m 524288 \ -name vm1 \ -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vm1.monitor,server,nowait \ -drive file=/dev/libvirt_lvm/vm.img,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \ -monitor stdio \ -net nic,macaddr=52:54:00:71:01:01 \ -net tap,ifname=tap0,script=no,downscript=no \ -vnc :4 ... Upstream qemu : -- only 63 CPUs in NUMA mode supported. 
only 64 CPUs in NUMA mode supported. QEMU 1.1.50 monitor - type 'help' for more information (qemu) info numa 8 nodes node 0 cpus: 6 7 8 9 38 39 40 41 70 71 72 73 node 0 size: 65536 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 42 43 44 45 46 47 48 49 50 51 74 75 76 77 78 79 node 1 size: 65536 MB node 2 cpus: 20 21 22 23 24 25 26 27 28 29 52 53 54 55 56 57 58 59 60 61 node 2 size: 65536 MB node 3 cpus: 30 62 node 3 size: 65536 MB node 4 cpus: node 4 size: 65536 MB node 5 cpus: node 5 size: 65536 MB node 6 cpus: 31 63 node 6 size: 65536 MB node 7 cpus: 0 1 2 3 4 5 32 33 34 35 36 37 64 65 66 67 68 69 node 7 size: 65536 MB With the patch applied : --- QEMU 1.1.50 monitor - type 'help' for more information (qemu) info numa 8 nodes node 0 cpus: 0 1 2 3 4 5 6 7 8 9 node 0 size: 65536 MB node 1 cpus: 10 11 12 13 14 15 16 17 18 19 node 1 size: 65536 MB node 2 cpus: 20 21 22 23 24 25 26 27 28 29 node 2 size: 65536 MB node 3 cpus: 30 31 32 33 34 35 36 37 38 39 node 3 size: 65536 MB node 4 cpus: 40 41 42 43 44 45 46 47 48 49 node 4 size: 65536 MB node 5 cpus: 50 51 52 53 54 55 56 57 58 59 node 5 size: 65536 MB node 6 cpus: 60 61 62 63 64 65 66 67 68 69 node 6 size: 65536 MB node 7 cpus: 70 71 72 73 74 75 76 77 78 79 Signed-off-
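As an aside, the parsing side of this fix is small. Below is a standalone sketch of recording a cpus=A-B value in an "unsigned long *" mask, written in plain C with hand-rolled bit helpers instead of qemu's bitmap_new()/bitmap_set(); the names are illustrative only:

#include <limits.h>
#include <stdlib.h>

#define MAX_CPUMASK_BITS 255   /* the allocation constant mentioned above */
#define BITS_PER_LONG    (CHAR_BIT * sizeof(unsigned long))
#define LONGS_FOR(n)     (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static void set_cpu_bit(unsigned long *mask, unsigned long cpu)
{
    mask[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

/* Parse "A" or "A-B" (the value of a cpus= option) into the mask;
 * reject end values outside the supported range. */
static int parse_cpu_range(const char *str, unsigned long *mask)
{
    char *end;
    unsigned long start = strtoul(str, &end, 10);
    unsigned long stop = start;

    if (*end == '-')
        stop = strtoul(end + 1, &end, 10);
    if (stop >= MAX_CPUMASK_BITS || stop < start)
        return -1;
    while (start <= stop)
        set_cpu_bit(mask, start++);
    return 0;
}

Usage would be something like: unsigned long mask[LONGS_FOR(MAX_CPUMASK_BITS)] = { 0 }; parse_cpu_range("30-39", mask); — enough to reproduce the node assignments shown in the examples above.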
[Bug 15486] amd_adac error
https://bugzilla.kernel.org/show_bug.cgi?id=15486

Alan changed:

           What        |Removed                      |Added
           Status      |NEW                          |RESOLVED
           CC          |                             |a...@lxorguk.ukuu.org.uk
           Component   |kvm                          |Video(DRI - non Intel)
           Version     |unspecified                  |2.5
           Resolution  |                             |OBSOLETE
           AssignedTo  |virtualization_kvm@kernel-bugs.osdl.org |drivers_video-dri@kernel-bugs.osdl.org
           Product     |Virtualization               |Drivers

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] kvm: KVM_EOIFD, an eventfd for EOIs
On Wed, Jul 04, 2012 at 10:24:54PM -0600, Alex Williamson wrote: > On Wed, 2012-07-04 at 17:00 +0300, Michael S. Tsirkin wrote: > > On Tue, Jul 03, 2012 at 01:21:29PM -0600, Alex Williamson wrote: > > > This new ioctl enables an eventfd to be triggered when an EOI is > > > written for a specified irqchip pin. By default this is a simple > > > notification, but we can also tie the eoifd to a level irqfd, which > > > enables the irqchip pin to be automatically de-asserted on EOI. > > > This mode is particularly useful for device-assignment applications > > > where the unmask and notify triggers a hardware unmask. The default > > > mode is most applicable to simple notify with no side-effects for > > > userspace usage, such as Qemu. > > > > > > Here we make use of the reference counting of the _irq_source > > > object allowing us to share it with an irqfd and cleanup regardless > > > of the release order. > > > > > > Signed-off-by: Alex Williamson > > > --- > > > > > > Documentation/virtual/kvm/api.txt | 21 > > > arch/x86/kvm/x86.c|1 > > > include/linux/kvm.h | 14 ++ > > > include/linux/kvm_host.h | 13 ++ > > > virt/kvm/eventfd.c| 208 > > > + > > > virt/kvm/kvm_main.c | 11 ++ > > > 6 files changed, 266 insertions(+), 2 deletions(-) > > > > > > diff --git a/Documentation/virtual/kvm/api.txt > > > b/Documentation/virtual/kvm/api.txt > > > index c7267d5..a38af14 100644 > > > --- a/Documentation/virtual/kvm/api.txt > > > +++ b/Documentation/virtual/kvm/api.txt > > > @@ -1988,6 +1988,27 @@ to independently assert level interrupts. The > > > KVM_IRQFD_FLAG_LEVEL > > > is only necessary on setup, teardown is identical to that above. > > > KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. > > > > > > +4.77 KVM_EOIFD > > > + > > > +Capability: KVM_CAP_EOIFD > > > > Maybe add a specific capability KVM_CAP_EOIFD_LEVEL_IRQFD too? > > Good idea, allows them to be split later. > > > > +Architectures: x86 > > > +Type: vm ioctl > > > +Parameters: struct kvm_eoifd (in) > > > +Returns: 0 on success, -1 on error > > > + > > > +KVM_EOIFD allows userspace to receive interrupt EOI notification > > > +through an eventfd. kvm_eoifd.fd specifies the eventfd used for > > > +notification and kvm_eoifd.gsi specifies the irchip pin, similar to > > > +KVM_IRQFD. The eoifd is removed using the KVM_EOIFD_FLAG_DEASSIGN > > > +flag, specifying both kvm_eoifd.fd and kvm_eoifd.gsi. > > > > This text reads like it would give you EOIs for any GSI, > > but you haven't actually implemented this for edge GSIs - and > > making it work would bloat the datapath for fast (edge) interrupts. > > I do allow you to register any GSI, but you won't be getting EOIs unless > it's operating in level triggered mode. Perhaps it's best to specify it > as unsupported and let some future patch create a new capability if > support is added. I'll add a comment. > > > What's the intended use of this part of the interface? qemu with > > irqchip disabled? > > VFIO should not be dependent on KVM, therefore when kvm is not enabled > we need to add an interface in qemu for devices to be notified of eoi. > > This doesn't currently exist. VFIO can take additional advantage of > irqchip when it is enabled, thus the interface below. However, I don't > feel I can propose an eoi notifier in qemu that stops working as soon as > irqchip is enabled, even if I'm the only user. This theoretical qemu > eoi notifier could then use the above when irqchip is enabled. Well internal qemu APIs are qemu's problem and can be addressed there. 
For example, can we make it mimic our interface: make qemu EOI notifier accept an object that includes qemu_irq without irqchip and irqfd with? In other words adding interface with no users looks weird. > > > + > > > +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the provided > > > +kvm_eoifd stucture includes a valid kvm_eoifd.irqfd file descriptor > > > +for a level irqfd configured using the KVM_IRQFD_FLAG_LEVEL flag. > > > +In this mode the level interrupt is de-asserted prior to EOI eventfd > > > +notification. The KVM_EOIFD_FLAG_LEVEL_IRQFD is only necessary on > > > +setup, teardown is identical to that above. > > > > It seems a single fd can not be used multiple times for > > different irqfds? In that case: > > 1. Need to document this limitation > > Ok > > > 2. This differs from other notifiers e.g. ioeventfd. > > But the same as an irqfd. However irqfd is about interrupting guest. eoifd is more like ioeventfd really: kvm writes ioeventfd/eoifd but reads irqfd. > Neither use case I'm thinking of needs to > allow eventfds to be re-used and knowing that an eventfd is unique on > our list makes matching it much easier. For instance, if we have an > eoifd bound to a level irqfd and it's being de-assigned, we'd have to > require the eoifd is de-assigned before th
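For illustration only (not code from the series), a minimal userspace sketch of the pairing described above: a level irqfd plus an eoifd so the pin is de-asserted on guest EOI. The field names follow the documentation quoted above; the exact kvm_eoifd layout lives in the patched <linux/kvm.h> and is assumed here, and KVM_IRQFD_FLAG_LEVEL / KVM_EOIFD_FLAG_LEVEL_IRQFD only exist with this RFC applied.

/*
 * Sketch, assuming headers from the patched tree.  Whether .gsi is
 * still required in LEVEL_IRQFD mode is an assumption.
 */
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int bind_level_irqfd_and_eoifd(int vm_fd, int gsi)
{
    int irq_efd = eventfd(0, EFD_CLOEXEC);   /* asserted by the device model */
    int eoi_efd = eventfd(0, EFD_CLOEXEC);   /* signaled by KVM on guest EOI */

    struct kvm_irqfd irqfd = {
        .fd    = irq_efd,
        .gsi   = gsi,
        .flags = KVM_IRQFD_FLAG_LEVEL,       /* level-triggered irqfd (RFC flag) */
    };
    if (ioctl(vm_fd, KVM_IRQFD, &irqfd) < 0)
        return -1;

    struct kvm_eoifd eoifd = {
        .fd    = eoi_efd,
        .gsi   = gsi,
        .irqfd = irq_efd,                    /* de-assert this irqfd on EOI */
        .flags = KVM_EOIFD_FLAG_LEVEL_IRQFD,
    };
    if (ioctl(vm_fd, KVM_EOIFD, &eoifd) < 0)
        return -1;

    /* Poll eoi_efd and re-write irq_efd if the device is still pending. */
    return eoi_efd;
}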
RE: [RFC PATCH 13/17] PowerPC: booke64: Use SPRG0/3 scratch for bolted TLB miss & crit int
> -Original Message- > From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] > Sent: Wednesday, June 27, 2012 1:16 AM > To: Caraman Mihai Claudiu-B02008 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- > d...@lists.ozlabs.org; qemu-...@nongnu.org; Anton Blanchard > Subject: Re: [RFC PATCH 13/17] PowerPC: booke64: Use SPRG0/3 scratch for > bolted TLB miss & crit int > > On Mon, 2012-06-25 at 15:26 +0300, Mihai Caraman wrote: > > Embedded.Hypervisor category defines GSPRG0..3 physical registers for > guests. > > Avoid SPRG4-7 usage as scratch in host exception handlers, otherwise > guest > > SPRG4-7 registers will be clobbered. > > For bolted TLB miss exception handlers, which is the version currently > > supported by KVM, use SPRN_SPRG_GEN_SCRATCH (aka SPRG0) instead of > > SPRN_SPRG_TLB_SCRATCH (aka SPRG6) and replace TLB with GEN PACA slots > to > > keep consistency. > > For critical exception handler use SPRG3 instead of SPRG7. > > Beware with SPRG3 usage. It's user space visible and we plan to use it > for other things (see Anton's patch to stick topology information in > there for use by the vdso). If you clobber it, you may want to restore > it later. In the booke3e case SPRG3 will not be clobbered by the guests, which access GSPRG3, but by the host exception handler. This means that we will have to restore SPRG3 even in the absence of KVM. My proposal is to add a PACA slot for r13 and save it in the same way you did with r12 in TLB_MISS_PROLOG. Then we can restore SPRG3 right in the prolog, thus also avoiding having to deal with it in KVM. The EXCEPTION_PROLOG is a common define for GEN/DBG/CRIT/MC; we use additional defines to specialize just the CRIT case. > I think Anton's patch should put the "proper" value we want in the PACA > anyway since we also need to restore it on exit from KVM, so you can > still use it as scratch, just restore the value before going to C. I just saw the last iteration of Anton's vdso patch that matches your description. Cheers, -Mike
[PATCH uq/master 9/9] virtio: move common irqfd handling out of virtio-pci
All transports can use the same event handler for the irqfd, though the exact mechanics of the assignment will be specific. Note that there are three states: handled by the kernel, handled in userspace, disabled. This also lets virtio use event_notifier_set_handler. Signed-off-by: Paolo Bonzini --- hw/virtio-pci.c | 37 ++--- hw/virtio.c | 24 hw/virtio.h |2 ++ kvm-all.c | 10 ++ kvm-stub.c | 10 ++ kvm.h |2 ++ 6 files changed, 58 insertions(+), 27 deletions(-) diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 36770fd..a66c946 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -496,25 +496,15 @@ static unsigned virtio_pci_get_features(void *opaque) return proxy->host_features; } -static void virtio_pci_guest_notifier_read(void *opaque) -{ -VirtQueue *vq = opaque; -EventNotifier *n = virtio_queue_get_guest_notifier(vq); -if (event_notifier_test_and_clear(n)) { -virtio_irq(vq); -} -} - static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy, unsigned int queue_no, unsigned int vector, MSIMessage msg) { VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no); +EventNotifier *n = virtio_queue_get_guest_notifier(vq); VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector]; -int fd, ret; - -fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vq)); +int ret; if (irqfd->users == 0) { ret = kvm_irqchip_add_msi_route(kvm_state, msg); @@ -525,7 +515,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy, } irqfd->users++; -ret = kvm_irqchip_add_irqfd(kvm_state, fd, irqfd->virq); +ret = kvm_irqchip_add_irq_notifier(kvm_state, n, irqfd->virq); if (ret < 0) { if (--irqfd->users == 0) { kvm_irqchip_release_virq(kvm_state, irqfd->virq); @@ -533,8 +523,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy, return ret; } -qemu_set_fd_handler(fd, NULL, NULL, NULL); - +virtio_queue_set_guest_notifier_fd_handler(vq, true, true); return 0; } @@ -543,19 +532,18 @@ static void kvm_virtio_pci_vq_vector_release(VirtIOPCIProxy *proxy, unsigned int vector) { VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no); +EventNotifier *n = virtio_queue_get_guest_notifier(vq); VirtIOIRQFD *irqfd = &proxy->vector_irqfd[vector]; -int fd, ret; - -fd = event_notifier_get_fd(virtio_queue_get_guest_notifier(vq)); +int ret; -ret = kvm_irqchip_remove_irqfd(kvm_state, fd, irqfd->virq); +ret = kvm_irqchip_remove_irq_notifier(kvm_state, n, irqfd->virq); assert(ret == 0); if (--irqfd->users == 0) { kvm_irqchip_release_virq(kvm_state, irqfd->virq); } -qemu_set_fd_handler(fd, virtio_pci_guest_notifier_read, NULL, vq); +virtio_queue_set_guest_notifier_fd_handler(vq, true, false); } static int kvm_virtio_pci_vector_use(PCIDevice *dev, unsigned vector, @@ -617,14 +605,9 @@ static int virtio_pci_set_guest_notifier(void *opaque, int n, bool assign) if (r < 0) { return r; } -qemu_set_fd_handler(event_notifier_get_fd(notifier), -virtio_pci_guest_notifier_read, NULL, vq); +virtio_queue_set_guest_notifier_fd_handler(vq, true, false); } else { -qemu_set_fd_handler(event_notifier_get_fd(notifier), -NULL, NULL, NULL); -/* Test and clear notifier before closing it, - * in case poll callback didn't have time to run. 
*/ -virtio_pci_guest_notifier_read(vq); +virtio_queue_set_guest_notifier_fd_handler(vq, false, false); event_notifier_cleanup(notifier); } diff --git a/hw/virtio.c b/hw/virtio.c index 197edf0..d146f86 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -984,6 +984,30 @@ VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n) return vdev->vq + n; } +static void virtio_queue_guest_notifier_read(EventNotifier *n) +{ +VirtQueue *vq = container_of(n, VirtQueue, guest_notifier); +if (event_notifier_test_and_clear(n)) { +virtio_irq(vq); +} +} + +void virtio_queue_set_guest_notifier_fd_handler(VirtQueue *vq, bool assign, +bool with_irqfd) +{ +if (assign && !with_irqfd) { +event_notifier_set_handler(&vq->guest_notifier, + virtio_queue_guest_notifier_read); +} else { +event_notifier_set_handler(&vq->guest_notifier, NULL); +} +if (!assign) { +/* Test and clear notifier before closing it, + * in case poll callback didn't have time to run. */ +virtio_queue_guest_notifier_read(&vq->guest_notifie
[PATCH uq/master 8/9] virtio: move common ioeventfd handling out of virtio-pci
All transports can use the same event handler for the ioeventfd, though the exact setup (address/memory region) will be specific. This lets virtio use event_notifier_set_handler. Signed-off-by: Paolo Bonzini --- hw/virtio-pci.c | 36 ++-- hw/virtio.c | 22 ++ hw/virtio.h |1 + 3 files changed, 25 insertions(+), 34 deletions(-) diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index a555728..36770fd 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -173,46 +173,18 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, __func__, r); return r; } +virtio_queue_set_host_notifier_fd_handler(vq, true); memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, true, n, notifier); } else { memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, true, n, notifier); -/* Handle the race condition where the guest kicked and we deassigned - * before we got around to handling the kick. - */ -if (event_notifier_test_and_clear(notifier)) { -virtio_queue_notify_vq(vq); -} - +virtio_queue_set_host_notifier_fd_handler(vq, false); event_notifier_cleanup(notifier); } return r; } -static void virtio_pci_host_notifier_read(void *opaque) -{ -VirtQueue *vq = opaque; -EventNotifier *n = virtio_queue_get_host_notifier(vq); -if (event_notifier_test_and_clear(n)) { -virtio_queue_notify_vq(vq); -} -} - -static void virtio_pci_set_host_notifier_fd_handler(VirtIOPCIProxy *proxy, -int n, bool assign) -{ -VirtQueue *vq = virtio_get_queue(proxy->vdev, n); -EventNotifier *notifier = virtio_queue_get_host_notifier(vq); -if (assign) { -qemu_set_fd_handler(event_notifier_get_fd(notifier), -virtio_pci_host_notifier_read, NULL, vq); -} else { -qemu_set_fd_handler(event_notifier_get_fd(notifier), -NULL, NULL, NULL); -} -} - static void virtio_pci_start_ioeventfd(VirtIOPCIProxy *proxy) { int n, r; @@ -232,8 +204,6 @@ static void virtio_pci_start_ioeventfd(VirtIOPCIProxy *proxy) if (r < 0) { goto assign_error; } - -virtio_pci_set_host_notifier_fd_handler(proxy, n, true); } proxy->ioeventfd_started = true; return; @@ -244,7 +214,6 @@ assign_error: continue; } -virtio_pci_set_host_notifier_fd_handler(proxy, n, false); r = virtio_pci_set_host_notifier_internal(proxy, n, false); assert(r >= 0); } @@ -266,7 +235,6 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy *proxy) continue; } -virtio_pci_set_host_notifier_fd_handler(proxy, n, false); r = virtio_pci_set_host_notifier_internal(proxy, n, false); assert(r >= 0); } diff --git a/hw/virtio.c b/hw/virtio.c index 168abe4..197edf0 100644 --- a/hw/virtio.c +++ b/hw/virtio.c @@ -988,6 +988,28 @@ EventNotifier *virtio_queue_get_guest_notifier(VirtQueue *vq) { return &vq->guest_notifier; } + +static void virtio_queue_host_notifier_read(EventNotifier *n) +{ +VirtQueue *vq = container_of(n, VirtQueue, host_notifier); +if (event_notifier_test_and_clear(n)) { +virtio_queue_notify_vq(vq); +} +} + +void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign) +{ +if (assign) { +event_notifier_set_handler(&vq->host_notifier, + virtio_queue_host_notifier_read); +} else { +event_notifier_set_handler(&vq->host_notifier, NULL); +/* Test and clear notifier before after disabling event, + * in case poll callback didn't have time to run. 
*/ +virtio_queue_host_notifier_read(&vq->host_notifier); +} +} + EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq) { return &vq->host_notifier; diff --git a/hw/virtio.h b/hw/virtio.h index 85aabe5..2949485 100644 --- a/hw/virtio.h +++ b/hw/virtio.h @@ -232,6 +232,7 @@ VirtQueue *virtio_get_queue(VirtIODevice *vdev, int n); int virtio_queue_get_id(VirtQueue *vq); EventNotifier *virtio_queue_get_guest_notifier(VirtQueue *vq); EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq); +void virtio_queue_set_host_notifier_fd_handler(VirtQueue *vq, bool assign); void virtio_queue_notify_vq(VirtQueue *vq); void virtio_irq(VirtQueue *vq); #endif -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
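To see what "all transports can use the same event handler" buys, here is a sketch of how a hypothetical non-PCI transport could wire up a host notifier using only the helpers added above; my_transport_* is a made-up name and the snippet assumes it is built inside the QEMU tree with this series applied.

/* Hypothetical transport, for illustration only. */
#include "hw/virtio.h"
#include "event_notifier.h"

static int my_transport_set_host_notifier(VirtIODevice *vdev, int n, bool assign)
{
    VirtQueue *vq = virtio_get_queue(vdev, n);
    EventNotifier *notifier = virtio_queue_get_host_notifier(vq);

    if (assign) {
        int r = event_notifier_init(notifier, 1);
        if (r < 0) {
            return r;
        }
        /* Generic handler: runs virtio_queue_notify_vq() on guest kicks. */
        virtio_queue_set_host_notifier_fd_handler(vq, true);
        /* ...then hook the notifier into the transport's doorbell... */
    } else {
        /* ...unhook the transport's doorbell first... */
        virtio_queue_set_host_notifier_fd_handler(vq, false);
        event_notifier_cleanup(notifier);
    }
    return 0;
}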
[PATCH uq/master 7/9] event_notifier: add event_notifier_set_handler
Win32 event notifiers are not file descriptors, so they will not be able to use qemu_set_fd_handler. But even if for now we only have a POSIX version of EventNotifier, we can add a specific function that wraps the call. The wrapper passes the EventNotifier as the opaque value so that it will be used with container_of. Signed-off-by: Paolo Bonzini --- event_notifier.c |7 +++ event_notifier.h |3 +++ 2 files changed, 10 insertions(+) diff --git a/event_notifier.c b/event_notifier.c index 99c376c..2c207e1 100644 --- a/event_notifier.c +++ b/event_notifier.c @@ -12,6 +12,7 @@ #include "qemu-common.h" #include "event_notifier.h" +#include "qemu-char.h" #ifdef CONFIG_EVENTFD #include @@ -45,6 +46,12 @@ int event_notifier_get_fd(EventNotifier *e) return e->fd; } +int event_notifier_set_handler(EventNotifier *e, + EventNotifierHandler *handler) +{ +return qemu_set_fd_handler(e->fd, (IOHandler *)handler, NULL, e); +} + int event_notifier_set(EventNotifier *e) { uint64_t value = 1; diff --git a/event_notifier.h b/event_notifier.h index 30c12dd..e5888ed 100644 --- a/event_notifier.h +++ b/event_notifier.h @@ -19,11 +19,14 @@ struct EventNotifier { int fd; }; +typedef void EventNotifierHandler(EventNotifier *); + void event_notifier_init_fd(EventNotifier *, int fd); int event_notifier_init(EventNotifier *, int active); void event_notifier_cleanup(EventNotifier *); int event_notifier_get_fd(EventNotifier *); int event_notifier_set(EventNotifier *); int event_notifier_test_and_clear(EventNotifier *); +int event_notifier_set_handler(EventNotifier *, EventNotifierHandler *); #endif -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
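The commit message notes that the handler gets the EventNotifier itself as its argument, so a caller embeds the notifier in its own structure and recovers it with container_of. A short sketch of that calling convention, assuming the QEMU headers from this series; MyDevice is a made-up example type.

#include "qemu-common.h"
#include "event_notifier.h"

typedef struct MyDevice {
    EventNotifier kick;
    int kicks_seen;
} MyDevice;

static void my_device_kicked(EventNotifier *e)
{
    /* The notifier is passed back, so recover the containing object. */
    MyDevice *dev = container_of(e, MyDevice, kick);
    if (event_notifier_test_and_clear(e)) {
        dev->kicks_seen++;
    }
}

static int my_device_init(MyDevice *dev)
{
    int r = event_notifier_init(&dev->kick, 0);
    if (r < 0) {
        return r;
    }
    return event_notifier_set_handler(&dev->kick, my_device_kicked);
}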
[PATCH uq/master 6/9] memory: pass EventNotifier, not eventfd
Under Win32, EventNotifiers will not have event_notifier_get_fd, so we cannot call it in common code such as hw/virtio-pci.c. Pass a pointer to the notifier, and only retrieve the file descriptor in kvm-specific code. Signed-off-by: Paolo Bonzini --- exec.c |8 hw/ivshmem.c|4 ++-- hw/vhost.c |4 ++-- hw/virtio-pci.c |4 ++-- hw/xen_pt.c |2 +- kvm-all.c | 19 +-- memory.c| 18 +- memory.h|9 + xen-all.c |6 -- 9 files changed, 42 insertions(+), 32 deletions(-) diff --git a/exec.c b/exec.c index 8244d54..29b5078 100644 --- a/exec.c +++ b/exec.c @@ -3212,13 +3212,13 @@ static void core_log_global_stop(MemoryListener *listener) static void core_eventfd_add(MemoryListener *listener, MemoryRegionSection *section, - bool match_data, uint64_t data, int fd) + bool match_data, uint64_t data, EventNotifier *e) { } static void core_eventfd_del(MemoryListener *listener, MemoryRegionSection *section, - bool match_data, uint64_t data, int fd) + bool match_data, uint64_t data, EventNotifier *e) { } @@ -3278,13 +3278,13 @@ static void io_log_global_stop(MemoryListener *listener) static void io_eventfd_add(MemoryListener *listener, MemoryRegionSection *section, - bool match_data, uint64_t data, int fd) + bool match_data, uint64_t data, EventNotifier *e) { } static void io_eventfd_del(MemoryListener *listener, MemoryRegionSection *section, - bool match_data, uint64_t data, int fd) + bool match_data, uint64_t data, EventNotifier *e) { } diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 19e164a..bba21c5 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -350,7 +350,7 @@ static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i) 4, true, (posn << 16) | i, - event_notifier_get_fd(&s->peers[posn].eventfds[i])); + &s->peers[posn].eventfds[i]); } static void ivshmem_del_eventfd(IVShmemState *s, int posn, int i) @@ -360,7 +360,7 @@ static void ivshmem_del_eventfd(IVShmemState *s, int posn, int i) 4, true, (posn << 16) | i, - event_notifier_get_fd(&s->peers[posn].eventfds[i])); + &s->peers[posn].eventfds[i]); } static void close_guest_eventfds(IVShmemState *s, int posn) diff --git a/hw/vhost.c b/hw/vhost.c index 43664e7..0fd8da8 100644 --- a/hw/vhost.c +++ b/hw/vhost.c @@ -737,13 +737,13 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev, static void vhost_eventfd_add(MemoryListener *listener, MemoryRegionSection *section, - bool match_data, uint64_t data, int fd) + bool match_data, uint64_t data, EventNotifier *e) { } static void vhost_eventfd_del(MemoryListener *listener, MemoryRegionSection *section, - bool match_data, uint64_t data, int fd) + bool match_data, uint64_t data, EventNotifier *e) { } diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 9342eed..a555728 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -174,10 +174,10 @@ static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy, return r; } memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, - true, n, event_notifier_get_fd(notifier)); + true, n, notifier); } else { memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY, 2, - true, n, event_notifier_get_fd(notifier)); + true, n, notifier); /* Handle the race condition where the guest kicked and we deassigned * before we got around to handling the kick. 
*/ diff --git a/hw/xen_pt.c b/hw/xen_pt.c index 3b6d186..fdf68aa 100644 --- a/hw/xen_pt.c +++ b/hw/xen_pt.c @@ -634,7 +634,7 @@ static void xen_pt_log_global_fns(MemoryListener *l) } static void xen_pt_eventfd_fns(MemoryListener *l, MemoryRegionSection *s, - bool match_data, uint64_t data, int fd) + bool match_data, uint64_t data, EventNotifier *n) { } diff --git a/kvm-all.c b/kvm-all.c index f8e4328..56f723e 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -32,6 +32,7 @@
[PATCH uq/master 5/9] ivshmem: wrap ivshmem_del_eventfd loops with transaction
Signed-off-by: Paolo Bonzini --- hw/ivshmem.c |4 1 file changed, 4 insertions(+) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 3cdbea2..19e164a 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -369,8 +369,12 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s->peers[posn].nb_eventfds; +memory_region_transaction_begin(); for (i = 0; i < guest_curr_max; i++) { ivshmem_del_eventfd(s, posn, i); +} +memory_region_transaction_commit(); +for (i = 0; i < guest_curr_max; i++) { event_notifier_cleanup(&s->peers[posn].eventfds[i]); } -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH uq/master 4/9] ivshmem: use EventNotifier and memory API
All of ivshmem's usage of eventfd now has a corresponding API in EventNotifier. Simplify the code by using it, and also use the memory API consistently to set up and tear down the ioeventfds. Signed-off-by: Paolo Bonzini --- hw/ivshmem.c | 63 -- 1 file changed, 35 insertions(+), 28 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 05559b6..3cdbea2 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -23,6 +23,7 @@ #include "kvm.h" #include "migration.h" #include "qerror.h" +#include "event_notifier.h" #include #include @@ -45,7 +46,7 @@ typedef struct Peer { int nb_eventfds; -int *eventfds; +EventNotifier *eventfds; } Peer; typedef struct EventfdEntry { @@ -63,7 +64,6 @@ typedef struct IVShmemState { CharDriverState *server_chr; MemoryRegion ivshmem_mmio; -pcibus_t mmio_addr; /* We might need to register the BAR before we actually have the memory. * So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -168,7 +168,6 @@ static void ivshmem_io_write(void *opaque, target_phys_addr_t addr, { IVShmemState *s = opaque; -uint64_t write_one = 1; uint16_t dest = val >> 16; uint16_t vector = val & 0xff; @@ -194,12 +193,8 @@ static void ivshmem_io_write(void *opaque, target_phys_addr_t addr, /* check doorbell range */ if (vector < s->peers[dest].nb_eventfds) { -IVSHMEM_DPRINTF("Writing %" PRId64 " to VM %d on vector %d\n", -write_one, dest, vector); -if (write(s->peers[dest].eventfds[vector], -&(write_one), 8) != 8) { -IVSHMEM_DPRINTF("error writing to eventfd\n"); -} +IVSHMEM_DPRINTF("Notifying VM %d on vector %d\n", dest, vector); +event_notifier_set(&s->peers[dest].eventfds[vector]); } break; default: @@ -279,12 +274,13 @@ static void fake_irqfd(void *opaque, const uint8_t *buf, int size) { msix_notify(pdev, entry->vector); } -static CharDriverState* create_eventfd_chr_device(void * opaque, int eventfd, -int vector) +static CharDriverState* create_eventfd_chr_device(void * opaque, EventNotifier *n, + int vector) { /* create a event character device based on the passed eventfd */ IVShmemState *s = opaque; CharDriverState * chr; +int eventfd = event_notifier_get_fd(n); chr = qemu_chr_open_eventfd(eventfd); @@ -347,6 +343,26 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar); } +static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i) +{ +memory_region_add_eventfd(&s->ivshmem_mmio, + DOORBELL, + 4, + true, + (posn << 16) | i, + event_notifier_get_fd(&s->peers[posn].eventfds[i])); +} + +static void ivshmem_del_eventfd(IVShmemState *s, int posn, int i) +{ +memory_region_del_eventfd(&s->ivshmem_mmio, + DOORBELL, + 4, + true, + (posn << 16) | i, + event_notifier_get_fd(&s->peers[posn].eventfds[i])); +} + static void close_guest_eventfds(IVShmemState *s, int posn) { int i, guest_curr_max; @@ -354,9 +370,8 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s->peers[posn].nb_eventfds; for (i = 0; i < guest_curr_max; i++) { -kvm_set_ioeventfd_mmio(s->peers[posn].eventfds[i], -s->mmio_addr + DOORBELL, (posn << 16) | i, 0, 4); -close(s->peers[posn].eventfds[i]); +ivshmem_del_eventfd(s, posn, i); +event_notifier_cleanup(&s->peers[posn].eventfds[i]); } g_free(s->peers[posn].eventfds); @@ -369,12 +384,7 @@ static void setup_ioeventfds(IVShmemState *s) { for (i = 0; i <= s->max_peer; i++) { for (j = 0; j < s->peers[i].nb_eventfds; j++) { -memory_region_add_eventfd(&s->ivshmem_mmio, - DOORBELL, - 4, - true, - (i << 16) | j, - 
s->peers[i].eventfds[j]); +ivshmem_add_eventfd(s, i, j); } } } @@ -476,14 +486,14 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) if (guest_max_eventfd == 0) { /* one eventfd per MSI vector */ -
[PATCH uq/master 3/9] event_notifier: add event_notifier_init_fd
Signed-off-by: Paolo Bonzini --- event_notifier.c |7 +++ event_notifier.h |3 ++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/event_notifier.c b/event_notifier.c index c339bfe..99c376c 100644 --- a/event_notifier.c +++ b/event_notifier.c @@ -10,11 +10,18 @@ * See the COPYING file in the top-level directory. */ +#include "qemu-common.h" #include "event_notifier.h" + #ifdef CONFIG_EVENTFD #include #endif +void event_notifier_init_fd(EventNotifier *e, int fd) +{ +e->fd = fd; +} + int event_notifier_init(EventNotifier *e, int active) { #ifdef CONFIG_EVENTFD diff --git a/event_notifier.h b/event_notifier.h index 9b2edf4..30c12dd 100644 --- a/event_notifier.h +++ b/event_notifier.h @@ -16,9 +16,10 @@ #include "qemu-common.h" struct EventNotifier { - int fd; +int fd; }; +void event_notifier_init_fd(EventNotifier *, int fd); int event_notifier_init(EventNotifier *, int active); void event_notifier_cleanup(EventNotifier *); int event_notifier_get_fd(EventNotifier *); -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
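The patch has no commit message, but per the cover letter the point of event_notifier_init_fd is to adopt a file descriptor that arrives from outside (for example the eventfds ivshmem receives from its server) so the rest of the code only sees the EventNotifier API. A minimal sketch, with incoming_fd as a placeholder for wherever the descriptor really comes from:

#include "event_notifier.h"

static void adopt_peer_eventfd(EventNotifier *n, int incoming_fd)
{
    event_notifier_init_fd(n, incoming_fd);
    /* From here on only the EventNotifier API is used, e.g.: */
    event_notifier_set(n);   /* ring the peer's doorbell once */
}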
[PATCH uq/master 2/9] event_notifier: remove event_notifier_test
This is broken; since the eventfd is used in nonblocking mode there is a race between reading and writing. Signed-off-by: Paolo Bonzini --- event_notifier.c | 15 --- event_notifier.h |1 - 2 files changed, 16 deletions(-) diff --git a/event_notifier.c b/event_notifier.c index 2b210f4..c339bfe 100644 --- a/event_notifier.c +++ b/event_notifier.c @@ -51,18 +51,3 @@ int event_notifier_test_and_clear(EventNotifier *e) int r = read(e->fd, &value, sizeof(value)); return r == sizeof(value); } - -int event_notifier_test(EventNotifier *e) -{ -uint64_t value; -int r = read(e->fd, &value, sizeof(value)); -if (r == sizeof(value)) { -/* restore previous value. */ -int s = write(e->fd, &value, sizeof(value)); -/* never blocks because we use EFD_SEMAPHORE. - * If we didn't we'd get EAGAIN on overflow - * and we'd have to write code to ignore it. */ -assert(s == sizeof(value)); -} -return r == sizeof(value); -} diff --git a/event_notifier.h b/event_notifier.h index efca852..9b2edf4 100644 --- a/event_notifier.h +++ b/event_notifier.h @@ -24,6 +24,5 @@ void event_notifier_cleanup(EventNotifier *); int event_notifier_get_fd(EventNotifier *); int event_notifier_set(EventNotifier *); int event_notifier_test_and_clear(EventNotifier *); -int event_notifier_test(EventNotifier *); #endif -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH uq/master 1/9] event_notifier: add event_notifier_set
EventNotifier right now cannot be used as an inter-thread communication primitive. It only works if something else (the kernel) sets the eventfd. Add a primitive to signal an EventNotifier that another thread is waiting on. Signed-off-by: Paolo Bonzini --- event_notifier.c |7 +++ event_notifier.h |1 + 2 files changed, 8 insertions(+) diff --git a/event_notifier.c b/event_notifier.c index 0b82981..2b210f4 100644 --- a/event_notifier.c +++ b/event_notifier.c @@ -38,6 +38,13 @@ int event_notifier_get_fd(EventNotifier *e) return e->fd; } +int event_notifier_set(EventNotifier *e) +{ +uint64_t value = 1; +int r = write(e->fd, &value, sizeof(value)); +return r == sizeof(value); +} + int event_notifier_test_and_clear(EventNotifier *e) { uint64_t value; diff --git a/event_notifier.h b/event_notifier.h index 886222c..efca852 100644 --- a/event_notifier.h +++ b/event_notifier.h @@ -22,6 +22,7 @@ struct EventNotifier { int event_notifier_init(EventNotifier *, int active); void event_notifier_cleanup(EventNotifier *); int event_notifier_get_fd(EventNotifier *); +int event_notifier_set(EventNotifier *); int event_notifier_test_and_clear(EventNotifier *); int event_notifier_test(EventNotifier *); -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
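A sketch of the inter-thread use case this enables: one thread signals the notifier, another waits on its file descriptor and consumes the event. Error handling is trimmed and the snippet assumes it is compiled inside the QEMU tree; the worker would be started with pthread_create() or qemu-thread.

#include <poll.h>
#include "event_notifier.h"

static EventNotifier done;

static int notifier_setup(void)
{
    return event_notifier_init(&done, 0);
}

static void *worker(void *arg)
{
    /* ... do work ... */
    event_notifier_set(&done);          /* wake up whoever is waiting */
    return NULL;
}

static void wait_for_worker(void)
{
    struct pollfd pfd = {
        .fd     = event_notifier_get_fd(&done),
        .events = POLLIN,
    };
    poll(&pfd, 1, -1);
    if (event_notifier_test_and_clear(&done)) {
        /* worker finished */
    }
}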
[PATCH uq/master 0/9] remove event_notifier_get_fd from non-KVM code
This is part 1 of a three-part series that expands usage of EventNotifier in QEMU (including AIO and the main loop). I started working on this when playing with the threaded block layer; the part of that work that I hope will be in 1.2 is generalizing posix-aio-compat.c to be a generic portable thread pool + porting AIO to Win32 (part 2). On top of this, discard can be easily made asynchronous (part 3), which is a prerequisite for enabling it. This first part does the necessary changes for porting EventNotifier to Win32. The Win32 version will not have event_notifier_get_fd, and thus I want to remove all calls in portable code. Instead, all functions used in portable code after this series take an EventNotifier; KVM-specific implementations retrieve the file descriptor internally (these calls are in hw/ivshmem.c, hw/vhost.c, kvm-all.c). Patches 1 to 6 cover ivshmem and the memory API, first adding the required EventNotifier APIs and then using them. Patches 7 to 9 do the same with KVM ioeventfd and irqfd, refactoring transport-independent code in the process from virtio-pci to virtio (the two steps are a bit hard to separate). Paolo Bonzini (9): event_notifier: add event_notifier_set event_notifier: remove event_notifier_test event_notifier: add event_notifier_init_fd ivshmem: use EventNotifier and memory API ivshmem: wrap ivshmem_del_eventfd loops with transaction memory: pass EventNotifier, not eventfd event_notifier: add event_notifier_set_handler virtio: move common ioeventfd handling out of virtio-pci virtio: move common irqfd handling out of virtio-pci event_notifier.c | 30 - event_notifier.h |8 -- exec.c |8 +++--- hw/ivshmem.c | 67 +++ hw/vhost.c |4 +-- hw/virtio-pci.c | 77 ++ hw/virtio.c | 46 hw/virtio.h |3 +++ hw/xen_pt.c |2 +- kvm-all.c| 29 +++- kvm-stub.c | 10 +++ kvm.h|2 ++ memory.c | 18 ++--- memory.h |9 --- xen-all.c|6 +++-- 15 files changed, 186 insertions(+), 133 deletions(-) -- 1.7.10.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] buildsys: Move msi[x] and virtio-pci from Makefile.objs to Makefile.target
On 2012-07-05 15:35, Hans de Goede wrote: > Building non-kvm versions of qemu from qemu-kvm.git results in a linker error > with undefined references to kvm_kernel_irqchip, expanded from the > kvm_irqchip_in_kernel macro in kvm.h: > > This patch fixes this. > > Note maybe a better fix would be to drop the test for !defined NEED_CPU_H > in the above macro ? The best solution is to push device assignment upstream and celebrate the funeral of qemu-kvm. ;) Until then, qemu-kvm-1.1 (not master) just has to pick "kvm: Enable use of kvm_irqchip_in_kernel in hwlib code" (bbf3b80401) from upstream. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
Il 05/07/2012 16:40, Michael S. Tsirkin ha scritto: >> virtio-scsi is brand new. It's not as if we've had any significant >> time to make virtio-scsi-qemu faster. In fact, tcm_vhost existed >> before virtio-scsi-qemu did if I understand correctly. Yes. > Can't same can be said about virtio scsi - it seems to be > slower so we force a bad choice between blk and scsi at the user? virtio-scsi supports multiple devices per PCI slot (or even function), can talk to tapes, has better passthrough support for disks, and does a bunch of other things that virtio-blk by design doesn't do. This applies to both tcm_vhost and virtio-scsi-qemu. So far, all that virtio-scsi vs. virtio-blk benchmarks say is that more benchmarking is needed. Some people see it faster, some people see it slower. In some sense, it's consistent with the expectation that the two should roughly be the same. :) >> But guest/user facing decisions cannot be easily unmade and making >> the wrong technical choices because of premature concerns of "time >> to market" just result in a long term mess. >> >> There is no technical reason why tcm_vhost is going to be faster >> than doing it in userspace. > > But doing what in userspace exactly? Processing virtqueues in separate threads, switching the block and SCSI layer to fine-grained locking, adding some more fast paths. >> Basically, the issue is that the kernel has more complete SCSI >> emulation that QEMU does right now. >> >> There are lots of ways to try to solve this--like try to reuse the >> kernel code in userspace or just improving the userspace code. If >> we were able to make the two paths identical, then I strongly >> suspect there'd be no point in having tcm_vhost anyway. > > However, a question we should ask ourselves is whether this will happen > in practice, and when. It's already happening, but it takes a substantial amount of preparatory work before you can actually see results. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case
On Mon, 2012-07-02 at 10:49 -0400, Rik van Riel wrote: > On 06/28/2012 06:55 PM, Vinod, Chegu wrote: > > Hello, > > > > I am just catching up on this email thread... > > > > Perhaps one of you may be able to help answer this query.. preferably along > > with some data. [BTW, I do understand the basic intent behind PLE in a > > typical [sweet spot] use case where there is over subscription etc. and the > > need to optimize the PLE handler in the host etc. ] > > > > In a use case where the host has fewer but much larger guests (say 40VCPUs > > and higher) and there is no over subscription (i.e. # of vcpus across > > guests<= physical cpus in the host and perhaps each guest has their vcpu's > > pinned to specific physical cpus for other reasons), I would like to > > understand if/how the PLE really helps ? For these use cases would it be > > ok to turn PLE off (ple_gap=0) since is no real need to take an exit and > > find some other VCPU to yield to ? > > Yes, that should be ok. > > On a related note, I wonder if we should increase the ple_gap > significantly. > > After all, 4096 cycles of spinning is not that much, when you > consider how much time is spent doing the subsequent vmexit, > scanning the other VCPU's status (200 cycles per cache miss), > deciding what to do, maybe poking another CPU, and eventually > a vmenter. > > A factor 4 increase in ple_gap might be what it takes to > get the amount of time spent spinning equal to the amount of > time spent on the host side doing KVM stuff... I was recently thinking the same thing as I have observed over 180,000 exits/sec from a 40-way VM on a 80-way host, where there should be no cpu overcommit. Also, the number of directed yields for this was only 1800/sec, so we have a 1% usefulness for our exits. I am wondering if the ple_window should be similar to the host scheduler task switching granularity, and not what we think a typical max cycles should be for holding a lock. BTW, I have a patch to add a couple PLE stats to kvmstat which I will send out shortly. -Andrew -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On Thu, Jul 05, 2012 at 09:06:35AM -0500, Anthony Liguori wrote: > On 07/05/2012 08:53 AM, Michael S. Tsirkin wrote: > >On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote: > >>Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto: > >>> > >>>fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal > >>>raw block > >>> > >>>25 Write / 75 Read | ~15K | ~45K | > >>>~70K > >>>75 Write / 25 Read | ~20K | ~55K | > >>>~60K > >> > >>This is impressive, but I think it's still not enough to justify the > >>inclusion of tcm_vhost. > > We have demonstrated better results at much higher IOP rates with > virtio-blk in userspace so while these results are nice, there's no > reason to believe we can't do this in userspace. > > >>In my opinion, vhost-blk/vhost-scsi are mostly > >>worthwhile as drivers for improvements to QEMU performance. We want to > >>add more fast paths to QEMU that let us move SCSI and virtio processing > >>to separate threads, we have proof of concepts that this can be done, > >>and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively. > > > >A general rant below: > > > >OTOH if it works, and adds value, we really should consider including code. > > Users want something that has lots of features and performs really, > really well. They want everything. > > Having one device type that is "fast" but has no features and > another that is "not fast" but has a lot of features forces the user > to make a bad choice. No one wins in the end. > > virtio-scsi is brand new. It's not as if we've had any significant > time to make virtio-scsi-qemu faster. In fact, tcm_vhost existed > before virtio-scsi-qemu did if I understand correctly. Can't same can be said about virtio scsi - it seems to be slower so we force a bad choice between blk and scsi at the user? > > >To me, it does not make sense to reject code just because in theory > >someone could write even better code. > > There is no theory. We have proof points with virtio-blk. > > >Code walks. Time to marker matters too. > > But guest/user facing decisions cannot be easily unmade and making > the wrong technical choices because of premature concerns of "time > to market" just result in a long term mess. > > There is no technical reason why tcm_vhost is going to be faster > than doing it in userspace. But doing what in userspace exactly? > We can demonstrate this with > virtio-blk. This isn't a theoretical argument. > > >Yes I realize more options increases support. But downstreams can make > >their own decisions on whether to support some configurations: > >add a configure option to disable it and that's enough. > > > >>In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two > >>completely different devices that happen to speak the same SCSI > >>transport. Not only virtio-scsi-vhost must be configured outside QEMU > > > >configuration outside QEMU is OK I think - real users use > >management anyway. But maybe we can have helper scripts > >like we have for tun? > > Asking a user to write a helper script is pretty awful... A developer can write a helper. A user should just use management. > > > >>and doesn't support -device; > > > >This needs to be fixed I think. > > > >>it (obviously) presents different > >>inquiry/vpd/mode data than virtio-scsi-qemu, > > > >Why is this obvious and can't be fixed? > > It's an entirely different emulation path. It's not a simple packet > protocol like virtio-net. It's a complex command protocol where the > backend maintains a very large amount of state. 
> > >Userspace virtio-scsi > >is pretty flexible - can't it supply matching inquiry/vpd/mode data > >so that switching is transparent to the guest? > > Basically, the issue is that the kernel has more complete SCSI > emulation that QEMU does right now. > > There are lots of ways to try to solve this--like try to reuse the > kernel code in userspace or just improving the userspace code. If > we were able to make the two paths identical, then I strongly > suspect there'd be no point in having tcm_vhost anyway. > > Regards, > > Anthony Liguori However, a question we should ask ourselves is whether this will happen in practice, and when. I have no idea, I am just asking questions. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
Il 05/07/2012 15:53, Michael S. Tsirkin ha scritto: > On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote: >> Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto: >>> >>> fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal >>> raw block >>> >>> 25 Write / 75 Read | ~15K | ~45K | ~70K >>> 75 Write / 25 Read | ~20K | ~55K | ~60K >> >> This is impressive, but I think it's still not enough to justify the >> inclusion of tcm_vhost. In my opinion, vhost-blk/vhost-scsi are mostly >> worthwhile as drivers for improvements to QEMU performance. We want to >> add more fast paths to QEMU that let us move SCSI and virtio processing >> to separate threads, we have proof of concepts that this can be done, >> and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively. > > A general rant below: > > OTOH if it works, and adds value, we really should consider including code. > To me, it does not make sense to reject code just because in theory > someone could write even better code. It's not about writing better code. It's about having two completely separate SCSI/block layers with completely different feature sets. > Code walks. Time to marker matters too. > Yes I realize more options increases support. But downstreams can make > their own decisions on whether to support some configurations: > add a configure option to disable it and that's enough. > >> In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two >> completely different devices that happen to speak the same SCSI >> transport. Not only virtio-scsi-vhost must be configured outside QEMU > > configuration outside QEMU is OK I think - real users use > management anyway. But maybe we can have helper scripts > like we have for tun? We could add hooks for vhost-scsi in the SCSI devices and let them configure themselves. I'm not sure it is a good idea. >> and doesn't support -device; > > This needs to be fixed I think. To be clear, it supports -device for the virtio-scsi HBA itself; it doesn't support using -drive/-device to set up the disks hanging off it. >> it (obviously) presents different >> inquiry/vpd/mode data than virtio-scsi-qemu, > > Why is this obvious and can't be fixed? Userspace virtio-scsi > is pretty flexible - can't it supply matching inquiry/vpd/mode data > so that switching is transparent to the guest? It cannot support anyway the whole feature set unless you want to port thousands of lines from the kernel to QEMU (well, perhaps we'll get there but it's far. And dually, the in-kernel target of course does not support qcow2 and friends though perhaps you could imagine some hack based on NBD. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On 07/05/2012 08:53 AM, Michael S. Tsirkin wrote: On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote: Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto: fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal raw block 25 Write / 75 Read | ~15K | ~45K | ~70K 75 Write / 25 Read | ~20K | ~55K | ~60K This is impressive, but I think it's still not enough to justify the inclusion of tcm_vhost. We have demonstrated better results at much higher IOP rates with virtio-blk in userspace so while these results are nice, there's no reason to believe we can't do this in userspace. In my opinion, vhost-blk/vhost-scsi are mostly worthwhile as drivers for improvements to QEMU performance. We want to add more fast paths to QEMU that let us move SCSI and virtio processing to separate threads, we have proof of concepts that this can be done, and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively. A general rant below: OTOH if it works, and adds value, we really should consider including code. Users want something that has lots of features and performs really, really well. They want everything. Having one device type that is "fast" but has no features and another that is "not fast" but has a lot of features forces the user to make a bad choice. No one wins in the end. virtio-scsi is brand new. It's not as if we've had any significant time to make virtio-scsi-qemu faster. In fact, tcm_vhost existed before virtio-scsi-qemu did if I understand correctly. To me, it does not make sense to reject code just because in theory someone could write even better code. There is no theory. We have proof points with virtio-blk. Code walks. Time to marker matters too. But guest/user facing decisions cannot be easily unmade and making the wrong technical choices because of premature concerns of "time to market" just result in a long term mess. There is no technical reason why tcm_vhost is going to be faster than doing it in userspace. We can demonstrate this with virtio-blk. This isn't a theoretical argument. Yes I realize more options increases support. But downstreams can make their own decisions on whether to support some configurations: add a configure option to disable it and that's enough. In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two completely different devices that happen to speak the same SCSI transport. Not only virtio-scsi-vhost must be configured outside QEMU configuration outside QEMU is OK I think - real users use management anyway. But maybe we can have helper scripts like we have for tun? Asking a user to write a helper script is pretty awful... and doesn't support -device; This needs to be fixed I think. it (obviously) presents different inquiry/vpd/mode data than virtio-scsi-qemu, Why is this obvious and can't be fixed? It's an entirely different emulation path. It's not a simple packet protocol like virtio-net. It's a complex command protocol where the backend maintains a very large amount of state. Userspace virtio-scsi is pretty flexible - can't it supply matching inquiry/vpd/mode data so that switching is transparent to the guest? Basically, the issue is that the kernel has more complete SCSI emulation that QEMU does right now. There are lots of ways to try to solve this--like try to reuse the kernel code in userspace or just improving the userspace code. If we were able to make the two paths identical, then I strongly suspect there'd be no point in having tcm_vhost anyway. 
Regards, Anthony Liguori so that it is not possible to migrate one to the other. Migration between different backend types does not seem all that useful. The general rule is you need identical flags on both sides to allow migration, and it is not clear how valuable it is to relax this somewhat. I don't think vhost-scsi is particularly useful for virtualization, honestly. However, if it is useful for development, testing or benchmarking of lio itself (does this make any sense? :)) that could be by itself a good reason to include it. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
On Thu, 5 Jul 2012 14:50:00 +0300 Gleb Natapov wrote: > > Note that "if (!nr_to_scan--)" check is removed since we do not try to > > free mmu pages from more than one VM. > > > IIRC this was proposed in the past that we should iterate over vm list > until freeing something eventually, but Avi was against it. I think the > probability of a VM with kvm->arch.n_used_mmu_pages == 0 is low, so > it looks OK to drop nr_to_scan to me. Since our batch size is 128, the minimum positive @nr_to_scan, it's almost impossible to see the effect of the check. Thanks, Takuya -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On Thu, Jul 05, 2012 at 12:22:33PM +0200, Paolo Bonzini wrote: > Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto: > > > > fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal > > raw block > > > > 25 Write / 75 Read | ~15K | ~45K | ~70K > > 75 Write / 25 Read | ~20K | ~55K | ~60K > > This is impressive, but I think it's still not enough to justify the > inclusion of tcm_vhost. In my opinion, vhost-blk/vhost-scsi are mostly > worthwhile as drivers for improvements to QEMU performance. We want to > add more fast paths to QEMU that let us move SCSI and virtio processing > to separate threads, we have proof of concepts that this can be done, > and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively. A general rant below: OTOH if it works, and adds value, we really should consider including code. To me, it does not make sense to reject code just because in theory someone could write even better code. Code walks. Time to marker matters too. Yes I realize more options increases support. But downstreams can make their own decisions on whether to support some configurations: add a configure option to disable it and that's enough. > In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two > completely different devices that happen to speak the same SCSI > transport. Not only virtio-scsi-vhost must be configured outside QEMU configuration outside QEMU is OK I think - real users use management anyway. But maybe we can have helper scripts like we have for tun? > and doesn't support -device; This needs to be fixed I think. > it (obviously) presents different > inquiry/vpd/mode data than virtio-scsi-qemu, Why is this obvious and can't be fixed? Userspace virtio-scsi is pretty flexible - can't it supply matching inquiry/vpd/mode data so that switching is transparent to the guest? > so that it is not possible to migrate one to the other. Migration between different backend types does not seem all that useful. The general rule is you need identical flags on both sides to allow migration, and it is not clear how valuable it is to relax this somewhat. > I don't think vhost-scsi is particularly useful for virtualization, > honestly. However, if it is useful for development, testing or > benchmarking of lio itself (does this make any sense? :)) that could be > by itself a good reason to include it. > > Paolo -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] kvm, Add x86_hyper_kvm to complete detect_hypervisor_platform check [v2]
On 07/05/2012 09:26 AM, Avi Kivity wrote: > Please copy at least kvm@vger.kernel.org, and preferably Marcelo as well > (the other kvm co-maintainer). > > While debugging I noticed that unlike all the other hypervisor code in the kernel, kvm does not have an entry for x86_hyper which is used in detect_hypervisor_platform() which results in a nice printk in the syslog. This is only really a stub function but it does make kvm more consistent with the other hypervisors. [v2]: add detect and _GPL export Signed-off-by: Prarit Bhargava Cc: Avi Kivity Cc: Gleb Natapov Cc: Alex Williamson Cc: Konrad Rzeszutek Wilk --- arch/x86/include/asm/hypervisor.h |1 + arch/x86/kernel/cpu/hypervisor.c |1 + arch/x86/kernel/kvm.c | 14 ++ 3 files changed, 16 insertions(+) diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h index 7a15153..b518c75 100644 --- a/arch/x86/include/asm/hypervisor.h +++ b/arch/x86/include/asm/hypervisor.h @@ -49,6 +49,7 @@ extern const struct hypervisor_x86 *x86_hyper; extern const struct hypervisor_x86 x86_hyper_vmware; extern const struct hypervisor_x86 x86_hyper_ms_hyperv; extern const struct hypervisor_x86 x86_hyper_xen_hvm; +extern const struct hypervisor_x86 x86_hyper_kvm; static inline bool hypervisor_x2apic_available(void) { diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c index 755f64f..6d6dd7a 100644 --- a/arch/x86/kernel/cpu/hypervisor.c +++ b/arch/x86/kernel/cpu/hypervisor.c @@ -37,6 +37,7 @@ static const __initconst struct hypervisor_x86 * const hypervisors[] = #endif &x86_hyper_vmware, &x86_hyper_ms_hyperv, + &x86_hyper_kvm, }; const struct hypervisor_x86 *x86_hyper; diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index e554e5a..865cd13 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -39,6 +39,7 @@ #include #include #include +#include static int kvmapf = 1; @@ -432,6 +433,19 @@ void __init kvm_guest_init(void) #endif } +static bool __init kvm_detect(void) +{ + if (!kvm_para_available()) + return false; + return true; +} + +const struct hypervisor_x86 x86_hyper_kvm __refconst = { + .name = "KVM", + .detect = kvm_detect, +}; +EXPORT_SYMBOL_GPL(x86_hyper_kvm); + static __init int activate_jump_labels(void) { if (has_steal_clock) { -- 1.7.9.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] buildsys: Move msi[x] and virtio-pci from Makefile.objs to Makefile.target
Building non-kvm versions of qemu from qemu-kvm.git results in a linker error with undefined references to kvm_kernel_irqchip, expanded from the kvm_irqchip_in_kernel macro in kvm.h: This patch fixes this. Note maybe a better fix would be to drop the test for !defined NEED_CPU_H in the above macro ? Signed-off-by: Hans de Goede --- Makefile.objs |2 -- Makefile.target |3 ++- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/Makefile.objs b/Makefile.objs index 264f1fe..8d49738 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -211,10 +211,8 @@ hw-obj-y = hw-obj-y += vl.o loader.o hw-obj-$(CONFIG_VIRTIO) += virtio-console.o hw-obj-y += usb/libhw.o -hw-obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o hw-obj-y += fw_cfg.o hw-obj-$(CONFIG_PCI) += pci_bridge.o pci_bridge_dev.o -hw-obj-$(CONFIG_PCI) += msix.o msi.o hw-obj-$(CONFIG_PCI) += shpc.o hw-obj-$(CONFIG_PCI) += slotid_cap.o hw-obj-$(CONFIG_PCI) += pci_host.o pcie_host.o diff --git a/Makefile.target b/Makefile.target index eda8637..ede8ed3 100644 --- a/Makefile.target +++ b/Makefile.target @@ -183,9 +183,10 @@ obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o # virtio has to be here due to weird dependency between PCI and virtio-net. # need to fix this properly obj-$(CONFIG_NO_PCI) += pci-stub.o -obj-$(CONFIG_PCI) += pci.o +obj-$(CONFIG_PCI) += pci.o msi.o msix.o obj-$(CONFIG_VIRTIO) += virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-serial-bus.o obj-$(CONFIG_VIRTIO) += virtio-scsi.o +obj-$(CONFIG_VIRTIO_PCI) += virtio-pci.o obj-y += vhost_net.o obj-$(CONFIG_VHOST_NET) += vhost.o obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/virtio-9p-device.o -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/8] KVM: Optimize MMU notifier's THP page invalidation -v4
On Mon, Jul 02, 2012 at 05:52:39PM +0900, Takuya Yoshikawa wrote: > v3->v4: Resolved trace_kvm_age_page() issue -- patch 6,7 > v2->v3: Fixed intersection calculations. -- patch 3, 8 > > Takuya > > Takuya Yoshikawa (8): > KVM: MMU: Use __gfn_to_rmap() to clean up kvm_handle_hva() > KVM: Introduce hva_to_gfn_memslot() for kvm_handle_hva() > KVM: MMU: Make kvm_handle_hva() handle range of addresses > KVM: Introduce kvm_unmap_hva_range() for > kvm_mmu_notifier_invalidate_range_start() > KVM: Separate rmap_pde from kvm_lpage_info->write_count > KVM: MMU: Add memslot parameter to hva handlers > KVM: MMU: Push trace_kvm_age_page() into kvm_age_rmapp() > KVM: MMU: Avoid handling same rmap_pde in kvm_handle_hva_range() > > arch/powerpc/include/asm/kvm_host.h |2 + > arch/powerpc/kvm/book3s_64_mmu_hv.c | 47 --- > arch/x86/include/asm/kvm_host.h |3 +- > arch/x86/kvm/mmu.c | 107 > +++ > arch/x86/kvm/x86.c | 11 > include/linux/kvm_host.h|8 +++ > virt/kvm/kvm_main.c |3 +- > 7 files changed, 131 insertions(+), 50 deletions(-) > > > >From v2: > > The new test result was impressively good, see below, and THP page > invalidation was more than 5 times faster on my x86 machine. > > Before: > ... > 19.852 us | __mmu_notifier_invalidate_range_start(); > 28.033 us | __mmu_notifier_invalidate_range_start(); > 19.066 us | __mmu_notifier_invalidate_range_start(); > 44.715 us | __mmu_notifier_invalidate_range_start(); > 31.613 us | __mmu_notifier_invalidate_range_start(); > 20.659 us | __mmu_notifier_invalidate_range_start(); > 19.979 us | __mmu_notifier_invalidate_range_start(); > 20.416 us | __mmu_notifier_invalidate_range_start(); > 20.632 us | __mmu_notifier_invalidate_range_start(); > 22.316 us | __mmu_notifier_invalidate_range_start(); > ... > > After: > ... > 4.089 us | __mmu_notifier_invalidate_range_start(); > 4.096 us | __mmu_notifier_invalidate_range_start(); > 3.560 us | __mmu_notifier_invalidate_range_start(); > 3.376 us | __mmu_notifier_invalidate_range_start(); > 3.772 us | __mmu_notifier_invalidate_range_start(); > 3.353 us | __mmu_notifier_invalidate_range_start(); > 3.332 us | __mmu_notifier_invalidate_range_start(); > 3.332 us | __mmu_notifier_invalidate_range_start(); > 3.332 us | __mmu_notifier_invalidate_range_start(); > 3.337 us | __mmu_notifier_invalidate_range_start(); > ... Neat. Andrea can you please ACK? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Qemu-ppc] [RFC PATCH 03/17] KVM: PPC64: booke: Add EPCR support in sregs
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Thursday, July 05, 2012 3:13 PM > To: Caraman Mihai Claudiu-B02008 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- > d...@lists.ozlabs.org; qemu-...@nongnu.org > Subject: Re: [Qemu-ppc] [RFC PATCH 03/17] KVM: PPC64: booke: Add EPCR > support in sregs > > On 07/05/2012 01:49 PM, Caraman Mihai Claudiu-B02008 wrote: > >> -Original Message- > >> From: Alexander Graf [mailto:ag...@suse.de] > >> Sent: Wednesday, July 04, 2012 4:34 PM > >> To: Caraman Mihai Claudiu-B02008 > >> Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- > >> d...@lists.ozlabs.org; qemu-...@nongnu.org > >> Subject: Re: [Qemu-ppc] [RFC PATCH 03/17] KVM: PPC64: booke: Add EPCR > >> support in sregs > >> > >> > >> On 25.06.2012, at 14:26, Mihai Caraman wrote: > >> > >>> Add KVM_SREGS_E_64 feature and EPCR spr support in get/set sregs > >>> for 64-bit hosts. > >> Please also implement a ONE_REG interface while at it. Over time, I'd > >> like to move towards ONE_REG instead of the messy regs/sregs API. > > ONE_REG doesn't seem to be implemented at all for book3e, I looked at > > kvm_vcpu_ioctl_set_one_reg/kvm_vcpu_ioctl_get_one_reg in booke.c file. > > > > I can take care of it soon but in a different patch set. It's ok like > this? > > Do it in a different patch, but as part of this patch set. Hmm ... then if you don't disagree I will do it as a prerequisite patch since I want to keep this patchset strictly for 64-bit support. I am not familiar with ONE_REG, is qemu tailored to use it? I need a way to test it. -Mike -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
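For reference, ONE_REG moves a single register at a time through struct kvm_one_reg (an id plus a userspace address), so it can be exercised from a small standalone test even before QEMU grows support. A sketch of such a test helper; vcpu_fd comes from KVM_CREATE_VCPU, and the EPCR register id is something this series would still have to define, so no id constant is shown here.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int set_reg_u64(int vcpu_fd, uint64_t id, uint64_t val)
{
    struct kvm_one_reg reg = {
        .id   = id,
        .addr = (uintptr_t)&val,
    };
    return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}

static int get_reg_u64(int vcpu_fd, uint64_t id, uint64_t *val)
{
    struct kvm_one_reg reg = {
        .id   = id,
        .addr = (uintptr_t)val,
    };
    return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
}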
Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote: > @@ -1387,6 +1404,10 @@ static int virtnet_probe(struct virtio_device *vdev) > if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) > vi->has_cvq = true; > > + /* Use single tx/rx queue pair as default */ > + vi->num_queue_pairs = 1; > + vi->total_queue_pairs = num_queue_pairs; The code is using this "default" even if the amount of queue pairs it wants was specified during initialization. This basically limits any device to use 1 pair when starting up. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 05.07.2012 10:51, Xiao Guangrong wrote: On 06/28/2012 05:11 PM, Peter Lieven wrote: that here is bascially whats going on: qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len 3 gpa 0xa val 0x10ff qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa gpa 0xa Read GPA qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio unsatisfied-read len 1 gpa 0xa val 0x0 qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason KVM_EXIT_MMIO (6) There are two mmio emulation after user-space-exit, it is caused by mmio read access which spans two pages. But it should be fixed by: commit f78146b0f9230765c6315b2e14f56112513389ad Author: Avi Kivity Date: Wed Apr 18 19:22:47 2012 +0300 KVM: Fix page-crossing MMIO MMIO that are split across a page boundary are currently broken - the code does not expect to be aborted by the exit to userspace for the first MMIO fragment. This patch fixes the problem by generalizing the current code for handling 16-byte MMIOs to handle a number of "fragments", and changes the MMIO code to create those fragments. Signed-off-by: Avi Kivity Signed-off-by: Marcelo Tosatti Could you please pull the code from: https://git.kernel.org/pub/scm/virt/kvm/kvm.git and trace it again? Thank you very much, this fixes the issue I have seen. Thanks, Peter -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
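To make the quoted commit message a bit more concrete, the splitting step it describes looks roughly like the sketch below. The struct and function names are invented for illustration (they are not the ones used by the commit); it assumes the usual kernel helpers min_t() and offset_in_page().

struct mmio_fragment {		/* illustrative, not the commit's type */
	gpa_t		gpa;	/* guest physical address of this piece */
	void		*data;	/* buffer for this piece */
	unsigned	len;	/* never crosses a page boundary */
};

static unsigned split_mmio(gpa_t gpa, void *data, unsigned len,
			   struct mmio_fragment *frag)
{
	unsigned nr = 0;

	while (len) {
		/* clamp each piece to the end of the current page */
		unsigned now = min_t(unsigned, len,
				     PAGE_SIZE - offset_in_page(gpa));

		frag[nr].gpa  = gpa;
		frag[nr].data = data;
		frag[nr].len  = now;
		nr++;

		gpa  += now;
		data += now;
		len  -= now;
	}

	/* the exit path then completes one fragment per return to
	 * userspace instead of assuming one round trip does it all */
	return nr;
}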
Re: [PATCH] MAINTAINERS: add kvm list for virtio components
On Thu, Jul 05, 2012 at 12:07:07PM +0200, Paolo Bonzini wrote: > The KVM list is followed by more people than the generic > virtualizat...@lists.linux-foundation.org mailing list, and is > already "de facto" the place where virtio patches are posted. I have no data on the first statement (do you?) and I disagree with the last statement, but have no objection to people adding kvm list as well. > pv-ops still has no other lists than > virtualizat...@lists.linux-foundation.org. > However, pv-ops patches will likely touch Xen or KVM files as well and > the respective mailing list will usually be reached as well. > > Signed-off-by: Paolo Bonzini So pls please replace the 1st paragraph in the commit log with "virtio changes are likely to affect many KVM users as virtio is the de-facto standard for PV devices under KVM". Otherwise ok. Acked-by: Michael S. Tsirkin > --- > MAINTAINERS |2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 14bc707..e265f2e 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -7340,6 +7340,7 @@ F: include/media/videobuf2-* > VIRTIO CONSOLE DRIVER > M: Amit Shah > L: virtualizat...@lists.linux-foundation.org > +L: kvm@vger.kernel.org > S: Maintained > F: drivers/char/virtio_console.c > F: include/linux/virtio_console.h > @@ -7348,6 +7349,7 @@ VIRTIO CORE, NET AND BLOCK DRIVERS > M: Rusty Russell > M: "Michael S. Tsirkin" > L: virtualizat...@lists.linux-foundation.org > +L: kvm@vger.kernel.org > S: Maintained > F: drivers/virtio/ > F: drivers/net/virtio_net.c > -- > 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-ppc] [RFC PATCH 03/17] KVM: PPC64: booke: Add EPCR support in sregs
On 07/05/2012 01:49 PM, Caraman Mihai Claudiu-B02008 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, July 04, 2012 4:34 PM To: Caraman Mihai Claudiu-B02008 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; qemu-...@nongnu.org Subject: Re: [Qemu-ppc] [RFC PATCH 03/17] KVM: PPC64: booke: Add EPCR support in sregs On 25.06.2012, at 14:26, Mihai Caraman wrote: Add KVM_SREGS_E_64 feature and EPCR spr support in get/set sregs for 64-bit hosts. Please also implement a ONE_REG interface while at it. Over time, I'd like to move towards ONE_REG instead of the messy regs/sregs API. ONE_REG doesn't seem to be implemented at all for book3e, I looked at kvm_vcpu_ioctl_set_one_reg/kvm_vcpu_ioctl_get_one_reg in booke.c file. I can take care of it soon but in a different patch set. It's ok like this? Do it in a different patch, but as part of this patch set. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] cpu: smp_wmb before launching cpus.
On 2012-07-05 13:02, liu ping fan wrote: > On Thu, Jul 5, 2012 at 6:16 PM, Jan Kiszka wrote: >> On 2012-07-05 12:10, liu ping fan wrote: >>> On Thu, Jul 5, 2012 at 2:46 PM, Jan Kiszka wrote: On 2012-07-05 04:18, Liu Ping Fan wrote: > Vcpu state must be set completely before receiving INIT-IPI,SIPI > > Signed-off-by: Liu Ping Fan > --- > kvm.h |1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/kvm.h b/kvm.h > index 9c7b0ea..5b3c228 100644 > --- a/kvm.h > +++ b/kvm.h > @@ -198,6 +198,7 @@ static inline void > cpu_synchronize_post_init(CPUArchState *env) > { > if (kvm_enabled()) { > kvm_cpu_synchronize_post_init(env); > +smp_wmb(); > } > } > > In theory, there should be no vcpu kick-off after this without some locking operations involved that imply barriers. Did you see real >>> >>> Yeah, but what if it is non-x86? >> >> The locking I'm referring to is arch independent. >> inconsistencies without this explicit one? >> >> Again: Did you see real issues or is this based on static analysis? >> > Just on static analysis Then please describe - also for the changelog - at least one case in details where this is needed. Thanks, Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
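For what such a description would have to establish: a producer-side smp_wmb() only orders anything if some consumer pairs it with smp_rmb(), or with an acquiring lock, which is the existing-locking point made above. The snippet below is a generic sketch with made-up names, assuming QEMU's barrier macros; it is not taken from the patch or from the actual INIT-IPI/SIPI path.

static CPUArchState *published_env;
static int env_ready;

/* Producer: the thread that finishes loading vcpu state. */
static void publish_vcpu(CPUArchState *env)
{
    published_env = env;   /* all vcpu state written before this point */
    smp_wmb();             /* order the state writes ...               */
    env_ready = 1;         /* ... before the flag that exposes them    */
}

/* Consumer: the thread that acts on the INIT-IPI/SIPI. */
static void consume_vcpu(void)
{
    if (env_ready) {
        smp_rmb();         /* pairs with the smp_wmb() above */
        /* only now is it safe to read published_env->... */
    }
}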
Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
On Thu, Jul 05, 2012 at 07:56:07PM +0900, Takuya Yoshikawa wrote: > The following commit changed mmu_shrink() so that it would skip VMs > whose n_used_mmu_pages is not zero and try to free pages from others: > Oops, > commit 1952639665e92481c34c34c3e2a71bf3e66ba362 > KVM: MMU: do not iterate over all VMs in mmu_shrink() > > This patch fixes the function so that it can free mmu pages as before. > > Note that "if (!nr_to_scan--)" check is removed since we do not try to > free mmu pages from more than one VM. > IIRC this was proposed in the past that we should iterate over vm list until freeing something eventually, but Avi was against it. I think the probability of a VM with kvm->arch.n_used_mmu_pages == 0 is low, so it looks OK to drop nr_to_scan to me. > Signed-off-by: Takuya Yoshikawa > Cc: Gleb Natapov > --- > arch/x86/kvm/mmu.c |5 + > 1 files changed, 1 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 3b53d9e..5fd268a 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -3957,11 +3957,8 @@ static int mmu_shrink(struct shrinker *shrink, struct > shrink_control *sc) >* want to shrink a VM that only started to populate its MMU >* anyway. >*/ > - if (kvm->arch.n_used_mmu_pages > 0) { > - if (!nr_to_scan--) > - break; > + if (!kvm->arch.n_used_mmu_pages) > continue; > - } > > idx = srcu_read_lock(&kvm->srcu); > spin_lock(&kvm->mmu_lock); > -- > 1.7.5.4 -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Qemu-ppc] [RFC PATCH 03/17] KVM: PPC64: booke: Add EPCR support in sregs
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Wednesday, July 04, 2012 4:34 PM > To: Caraman Mihai Claudiu-B02008 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- > d...@lists.ozlabs.org; qemu-...@nongnu.org > Subject: Re: [Qemu-ppc] [RFC PATCH 03/17] KVM: PPC64: booke: Add EPCR > support in sregs > > > On 25.06.2012, at 14:26, Mihai Caraman wrote: > > > Add KVM_SREGS_E_64 feature and EPCR spr support in get/set sregs > > for 64-bit hosts. > > Please also implement a ONE_REG interface while at it. Over time, I'd > like to move towards ONE_REG instead of the messy regs/sregs API. ONE_REG doesn't seem to be implemented at all for book3e, I looked at kvm_vcpu_ioctl_set_one_reg/kvm_vcpu_ioctl_get_one_reg in booke.c file. I can take care of it soon but in a different patch set. It's ok like this? -Mike -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [net-next RFC V5 2/5] virtio_ring: move queue_index to vring_virtqueue
On Thu, 2012-07-05 at 18:29 +0800, Jason Wang wrote: > Instead of storing the queue index in virtio infos, this patch moves them to > vring_virtqueue and introduces helpers to set and get the value. This would > simplify the management and tracing. > > Signed-off-by: Jason Wang This patch actually fails to compile: drivers/virtio/virtio_mmio.c: In function ‘vm_notify’: drivers/virtio/virtio_mmio.c:229:13: error: ‘struct virtio_mmio_vq_info’ has no member named ‘queue_index’ drivers/virtio/virtio_mmio.c: In function ‘vm_del_vq’: drivers/virtio/virtio_mmio.c:278:13: error: ‘struct virtio_mmio_vq_info’ has no member named ‘queue_index’ make[2]: *** [drivers/virtio/virtio_mmio.o] Error 1 It probably misses the following hunks: diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index f5432b6..12b6180 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -222,11 +222,10 @@ static void vm_reset(struct virtio_device *vdev) static void vm_notify(struct virtqueue *vq) { struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev); - struct virtio_mmio_vq_info *info = vq->priv; /* We write the queue's selector into the notification register to * signal the other end */ - writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY); + writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_NOTIFY); } /* Notify all virtqueues on an interrupt. */ @@ -275,7 +274,7 @@ static void vm_del_vq(struct virtqueue *vq) vring_del_virtqueue(vq); /* Select and deactivate the queue */ - writel(info->queue_index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL); + writel(virtqueue_get_queue_index(vq), vm_dev->base + VIRTIO_MMIO_QUEUE_SEL); writel(0, vm_dev->base + VIRTIO_MMIO_QUEUE_PFN); size = PAGE_ALIGN(vring_size(info->num, VIRTIO_MMIO_VRING_ALIGN)); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH 06/17] KVM: PPC: e500: Add emulation helper for getting instruction ea
> -Original Message- > From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc- > ow...@vger.kernel.org] On Behalf Of Alexander Graf > Sent: Wednesday, July 04, 2012 4:56 PM > To: Caraman Mihai Claudiu-B02008 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- > d...@lists.ozlabs.org; qemu-...@nongnu.org > Subject: Re: [RFC PATCH 06/17] KVM: PPC: e500: Add emulation helper for > getting instruction ea > > > On 25.06.2012, at 14:26, Mihai Caraman wrote: > > > Add emulation helper for getting instruction ea and refactor tlb > instruction > > emulation to use it. > > > > Signed-off-by: Mihai Caraman > > --- > > arch/powerpc/kvm/e500.h |6 +++--- > > arch/powerpc/kvm/e500_emulate.c | 21 ++--- > > arch/powerpc/kvm/e500_tlb.c | 23 ++- > > 3 files changed, 27 insertions(+), 23 deletions(-) > > > > diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h > > index 3e31098..70bfed4 100644 > > --- a/arch/powerpc/kvm/e500.h > > +++ b/arch/powerpc/kvm/e500.h > > @@ -130,9 +130,9 @@ int kvmppc_e500_emul_mt_mmucsr0(struct > kvmppc_vcpu_e500 *vcpu_e500, > > ulong value); > > int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu); > > int kvmppc_e500_emul_tlbre(struct kvm_vcpu *vcpu); > > -int kvmppc_e500_emul_tlbivax(struct kvm_vcpu *vcpu, int ra, int rb); > > -int kvmppc_e500_emul_tlbilx(struct kvm_vcpu *vcpu, int rt, int ra, int > rb); > > -int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, int rb); > > +int kvmppc_e500_emul_tlbivax(struct kvm_vcpu *vcpu, gva_t ea); > > +int kvmppc_e500_emul_tlbilx(struct kvm_vcpu *vcpu, int rt, gva_t ea); > > +int kvmppc_e500_emul_tlbsx(struct kvm_vcpu *vcpu, gva_t ea); > > int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500); > > void kvmppc_e500_tlb_uninit(struct kvmppc_vcpu_e500 *vcpu_e500); > > > > diff --git a/arch/powerpc/kvm/e500_emulate.c > b/arch/powerpc/kvm/e500_emulate.c > > index 8b99e07..81288f7 100644 > > --- a/arch/powerpc/kvm/e500_emulate.c > > +++ b/arch/powerpc/kvm/e500_emulate.c > > @@ -82,6 +82,17 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu > *vcpu, int rb) > > } > > #endif > > > > +static inline ulong kvmppc_get_ea_indexed(struct kvm_vcpu *vcpu, int > ra, int rb) > > +{ > > + ulong ea; > > + > > + ea = kvmppc_get_gpr(vcpu, rb); > > + if (ra) > > + ea += kvmppc_get_gpr(vcpu, ra); > > + > > + return ea; > > +} > > + > > Please move this one to arch/powerpc/include/asm/kvm_ppc.h. Yep. This is similar with what I had in my internal version before emulation refactoring took place upstream. The only difference is that I split the embedded and server implementation touching this files: arch/powerpc/include/asm/kvm_booke.h arch/powerpc/include/asm/kvm_book3s.h Which approach do you prefer? 
> > > int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, > >unsigned int inst, int *advance) > > { > > @@ -89,6 +100,7 @@ int kvmppc_core_emulate_op(struct kvm_run *run, > struct kvm_vcpu *vcpu, > > int ra = get_ra(inst); > > int rb = get_rb(inst); > > int rt = get_rt(inst); > > + gva_t ea; > > > > switch (get_op(inst)) { > > case 31: > > @@ -113,15 +125,18 @@ int kvmppc_core_emulate_op(struct kvm_run *run, > struct kvm_vcpu *vcpu, > > break; > > > > case XOP_TLBSX: > > - emulated = kvmppc_e500_emul_tlbsx(vcpu,rb); > > + ea = kvmppc_get_ea_indexed(vcpu, ra, rb); > > + emulated = kvmppc_e500_emul_tlbsx(vcpu, ea); > > break; > > > > case XOP_TLBILX: > > - emulated = kvmppc_e500_emul_tlbilx(vcpu, rt, ra, rb); > > + ea = kvmppc_get_ea_indexed(vcpu, ra, rb); > > + emulated = kvmppc_e500_emul_tlbilx(vcpu, rt, ea); > > What's the point in hiding ra+rb, but not rt? I like the idea of hiding > the register semantics, but please move rt into a local variable that > gets passed as pointer to kvmppc_e500_emul_tlbilx. Why to send it as a pointer? rt which should be rather named t in this case is an [in] value for tlbilx, according to section 6.11.4.9 in the PowerISA 2.06b. -Mike -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Wednesday, July 04, 2012 4:50 PM > To: Caraman Mihai Claudiu-B02008 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- > d...@lists.ozlabs.org; qemu-...@nongnu.org > Subject: Re: [Qemu-ppc] [RFC PATCH 05/17] KVM: PPC: booke: Extend MAS2 > EPN mask for 64-bit > > > On 25.06.2012, at 14:26, Mihai Caraman wrote: > > > Extend MAS2 EPN mask for 64-bit hosts, to retain most significant bits. > > Change get tlb eaddr to use this mask. > > Please see section 6.11.4.8 in the PowerISA 2.06b: > > MMU behavior is largely unaffected by whether the thread is in 32-bit > computation mode (MSRCM=0) or 64- bit computation mode (MSRCM=1). The > only differ- ences occur in the EPN field of the TLB entry and the EPN > field of MAS2. The differences are summarized here. > > * Executing a tlbwe instruction in 32-bit mode will set bits 0:31 > of the TLB EPN field to zero unless MAS0ATSEL is set, in which case those > bits are not written to zero. > * In 32-bit implementations, MAS2U can be used to read or write > EPN0:31 of MAS2. > > So if MSR.CM is not set tlbwe should mask the upper 32 bits out - which > can happen regardless of CONFIG_64BIT. MAS2_EPN reflects EPN field of MAS2 aka bits 0:51 (for MAV = 1.0) according to section 6.10.3.10 in the PowerISA 2.06b. MAS2_EPN is not used in tlbwe execution emulation, we have MAS2_VAL define for this case. > Also, we need to implement MAS2U, to potentially make the upper 32bits of > MAS2 available, right? But that one isn't as important as the first bit. MAS2U is guest privileged why does it need special care? Freescale core Manuals and EREF does not mention MAS2U so I think I our case it is not implemented. -Mike -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] cpu: smp_wmb before launching cpus.
On Thu, Jul 5, 2012 at 6:16 PM, Jan Kiszka wrote: > On 2012-07-05 12:10, liu ping fan wrote: >> On Thu, Jul 5, 2012 at 2:46 PM, Jan Kiszka wrote: >>> On 2012-07-05 04:18, Liu Ping Fan wrote: Vcpu state must be set completely before receiving INIT-IPI,SIPI Signed-off-by: Liu Ping Fan --- kvm.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/kvm.h b/kvm.h index 9c7b0ea..5b3c228 100644 --- a/kvm.h +++ b/kvm.h @@ -198,6 +198,7 @@ static inline void cpu_synchronize_post_init(CPUArchState *env) { if (kvm_enabled()) { kvm_cpu_synchronize_post_init(env); +smp_wmb(); } } >>> >>> In theory, there should be no vcpu kick-off after this without some >>> locking operations involved that imply barriers. Did you see real >> >> Yeah, but what if it is non-x86? > > The locking I'm referring to is arch independent. > >>> inconsistencies without this explicit one? > > Again: Did you see real issues or is this based on static analysis? > Just on static analysis Regards, pingfan > Jan > > -- > Siemens AG, Corporate Technology, CT RTC ITP SDP-DE > Corporate Competence Center Embedded Linux > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
The following commit changed mmu_shrink() so that it would skip VMs whose n_used_mmu_pages is not zero and try to free pages from others: commit 1952639665e92481c34c34c3e2a71bf3e66ba362 KVM: MMU: do not iterate over all VMs in mmu_shrink() This patch fixes the function so that it can free mmu pages as before. Note that "if (!nr_to_scan--)" check is removed since we do not try to free mmu pages from more than one VM. Signed-off-by: Takuya Yoshikawa Cc: Gleb Natapov --- arch/x86/kvm/mmu.c |5 + 1 files changed, 1 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 3b53d9e..5fd268a 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -3957,11 +3957,8 @@ static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc) * want to shrink a VM that only started to populate its MMU * anyway. */ - if (kvm->arch.n_used_mmu_pages > 0) { - if (!nr_to_scan--) - break; + if (!kvm->arch.n_used_mmu_pages) continue; - } idx = srcu_read_lock(&kvm->srcu); spin_lock(&kvm->mmu_lock); -- 1.7.5.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
This patch let the virtio_net driver can negotiate the number of queues it wishes to use through control virtqueue and export an ethtool interface to let use tweak it. As current multiqueue virtio-net implementation has optimizations on per-cpu virtuqueues, so only two modes were support: - single queue pair mode - multiple queue paris mode, the number of queues matches the number of vcpus The single queue mode were used by default currently due to regression of multiqueue mode in some test (especially in stream test). Since virtio core does not support paritially deleting virtqueues, so during mode switching the whole virtqueue were deleted and the driver would re-create the virtqueues it would used. btw. The queue number negotiating were defered to .ndo_open(), this is because only after feature negotitaion could we send the command to control virtqueue (as it may also use event index). Signed-off-by: Jason Wang --- drivers/net/virtio_net.c | 171 ++- include/linux/virtio_net.h |7 ++ 2 files changed, 142 insertions(+), 36 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 7410187..3339eeb 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -88,6 +88,7 @@ struct receive_queue { struct virtnet_info { u16 num_queue_pairs;/* # of RX/TX vq pairs */ + u16 total_queue_pairs; struct send_queue *sq[MAX_QUEUES] cacheline_aligned_in_smp; struct receive_queue *rq[MAX_QUEUES] cacheline_aligned_in_smp; @@ -137,6 +138,8 @@ struct padded_vnet_hdr { char padding[6]; }; +static const struct ethtool_ops virtnet_ethtool_ops; + static inline int txq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq) { int ret = virtqueue_get_queue_index(vq); @@ -802,22 +805,6 @@ static void virtnet_netpoll(struct net_device *dev) } #endif -static int virtnet_open(struct net_device *dev) -{ - struct virtnet_info *vi = netdev_priv(dev); - int i; - - for (i = 0; i < vi->num_queue_pairs; i++) { - /* Make sure we have some buffers: if oom use wq. */ - if (!try_fill_recv(vi->rq[i], GFP_KERNEL)) - queue_delayed_work(system_nrt_wq, - &vi->rq[i]->refill, 0); - virtnet_napi_enable(vi->rq[i]); - } - - return 0; -} - /* * Send command via the control virtqueue and check status. Commands * supported by the hypervisor, as indicated by feature bits, should @@ -873,6 +860,43 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi) rtnl_unlock(); } +static int virtnet_set_queues(struct virtnet_info *vi) +{ + struct scatterlist sg; + struct net_device *dev = vi->dev; + sg_init_one(&sg, &vi->num_queue_pairs, sizeof(vi->num_queue_pairs)); + + if (!vi->has_cvq) + return -EINVAL; + + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MULTIQUEUE, + VIRTIO_NET_CTRL_MULTIQUEUE_QNUM, &sg, 1, 0)){ + dev_warn(&dev->dev, "Fail to set the number of queue pairs to" +" %d\n", vi->num_queue_pairs); + return -EINVAL; + } + + return 0; +} + +static int virtnet_open(struct net_device *dev) +{ + struct virtnet_info *vi = netdev_priv(dev); + int i; + + for (i = 0; i < vi->num_queue_pairs; i++) { + /* Make sure we have some buffers: if oom use wq. 
*/ + if (!try_fill_recv(vi->rq[i], GFP_KERNEL)) + queue_delayed_work(system_nrt_wq, + &vi->rq[i]->refill, 0); + virtnet_napi_enable(vi->rq[i]); + } + + virtnet_set_queues(vi); + + return 0; +} + static int virtnet_close(struct net_device *dev) { struct virtnet_info *vi = netdev_priv(dev); @@ -1013,12 +1037,6 @@ static void virtnet_get_drvinfo(struct net_device *dev, } -static const struct ethtool_ops virtnet_ethtool_ops = { - .get_drvinfo = virtnet_get_drvinfo, - .get_link = ethtool_op_get_link, - .get_ringparam = virtnet_get_ringparam, -}; - #define MIN_MTU 68 #define MAX_MTU 65535 @@ -1235,7 +1253,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi) err: if (ret && names) - for (i = 0; i < vi->num_queue_pairs * 2; i++) + for (i = 0; i < total_vqs * 2; i++) kfree(names[i]); kfree(names); @@ -1373,7 +1391,6 @@ static int virtnet_probe(struct virtio_device *vdev) mutex_init(&vi->config_lock); vi->config_enable = true; INIT_WORK(&vi->config_work, virtnet_config_changed_work); - vi->num_queue_pairs = num_queue_pairs; /* If we can receive ANY GSO packets, we must allocate large ones. */ if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) || @@ -1387,6 +1404,
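The ethtool side is cut off in the diff above. A hedged sketch of what an ethtool -L handler for the two supported modes could look like follows; it reuses names from the driver code above, but it is an illustration rather than the missing hunk, and it leaves out the virtqueue teardown and re-creation the changelog mentions.

static int virtnet_set_channels(struct net_device *dev,
				struct ethtool_channels *channels)
{
	struct virtnet_info *vi = netdev_priv(dev);
	u16 queue_pairs = channels->combined_count;

	/* only a single queue pair or one pair per vcpu are supported */
	if (queue_pairs != 1 && queue_pairs != vi->total_queue_pairs)
		return -EINVAL;

	vi->num_queue_pairs = queue_pairs;

	/* negotiate the new count with the device over the control vq */
	return virtnet_set_queues(vi);
}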
[net-next RFC V5 4/5] virtio_net: multiqueue support
This patch converts virtio_net to a multi queue device. After negotiated VIRTIO_NET_F_MULTIQUEUE feature, the virtio device has many tx/rx queue pairs, and driver could read the number from config space. The driver expects the number of rx/tx queue paris is equal to the number of vcpus. To maximize the performance under this per-cpu rx/tx queue pairs, some optimization were introduced: - Txq selection is based on the processor id in order to avoid contending a lock whose owner may exits to host. - Since the txq/txq were per-cpu, affinity hint were set to the cpu that owns the queue pairs. Signed-off-by: Krishna Kumar Signed-off-by: Jason Wang --- drivers/net/virtio_net.c | 645 ++- include/linux/virtio_net.h |2 + 2 files changed, 452 insertions(+), 195 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 1db445b..7410187 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -26,6 +26,7 @@ #include #include #include +#include static int napi_weight = 128; module_param(napi_weight, int, 0444); @@ -41,6 +42,8 @@ module_param(gso, bool, 0444); #define VIRTNET_SEND_COMMAND_SG_MAX2 #define VIRTNET_DRIVER_VERSION "1.0.0" +#define MAX_QUEUES 256 + struct virtnet_stats { struct u64_stats_sync tx_syncp; struct u64_stats_sync rx_syncp; @@ -51,43 +54,69 @@ struct virtnet_stats { u64 rx_packets; }; -struct virtnet_info { - struct virtio_device *vdev; - struct virtqueue *rvq, *svq, *cvq; - struct net_device *dev; +/* Internal representation of a send virtqueue */ +struct send_queue { + /* Virtqueue associated with this send _queue */ + struct virtqueue *vq; + + /* TX: fragments + linear part + virtio header */ + struct scatterlist sg[MAX_SKB_FRAGS + 2]; +}; + +/* Internal representation of a receive virtqueue */ +struct receive_queue { + /* Virtqueue associated with this receive_queue */ + struct virtqueue *vq; + + /* Back pointer to the virtnet_info */ + struct virtnet_info *vi; + struct napi_struct napi; - unsigned int status; /* Number of input buffers, and max we've ever had. */ unsigned int num, max; + /* Work struct for refilling if we run low on memory. */ + struct delayed_work refill; + + /* Chain pages by the private ptr. */ + struct page *pages; + + /* RX: fragments + linear part + virtio header */ + struct scatterlist sg[MAX_SKB_FRAGS + 2]; +}; + +struct virtnet_info { + u16 num_queue_pairs;/* # of RX/TX vq pairs */ + + struct send_queue *sq[MAX_QUEUES] cacheline_aligned_in_smp; + struct receive_queue *rq[MAX_QUEUES] cacheline_aligned_in_smp; + struct virtqueue *cvq; + + struct virtio_device *vdev; + struct net_device *dev; + unsigned int status; + /* I like... big packets and I cannot lie! */ bool big_packets; /* Host will merge rx buffers for big packets (shake it! shake it!) */ bool mergeable_rx_bufs; + /* Has control virtqueue */ + bool has_cvq; + /* enable config space updates */ bool config_enable; /* Active statistics */ struct virtnet_stats __percpu *stats; - /* Work struct for refilling if we run low on memory. */ - struct delayed_work refill; - /* Work struct for config space updates */ struct work_struct config_work; /* Lock for config space updates */ struct mutex config_lock; - - /* Chain pages by the private ptr. 
*/ - struct page *pages; - - /* fragments + linear part + virtio header */ - struct scatterlist rx_sg[MAX_SKB_FRAGS + 2]; - struct scatterlist tx_sg[MAX_SKB_FRAGS + 2]; }; struct skb_vnet_hdr { @@ -108,6 +137,22 @@ struct padded_vnet_hdr { char padding[6]; }; +static inline int txq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq) +{ + int ret = virtqueue_get_queue_index(vq); + + /* skip ctrl vq */ + if (vi->has_cvq) + return (ret - 1) / 2; + else + return ret / 2; +} + +static inline int rxq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq) +{ + return virtqueue_get_queue_index(vq) / 2; +} + static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb) { return (struct skb_vnet_hdr *)skb->cb; @@ -117,22 +162,22 @@ static inline struct skb_vnet_hdr *skb_vnet_hdr(struct sk_buff *skb) * private is used to chain pages for big packets, put the whole * most recent used list in the beginning for reuse */ -static void give_pages(struct virtnet_info *vi, struct page *page) +static void give_pages(struct receive_queue *rq, struct page *page) { struct page *end; /* Find end of list, sew whole thing into vi->pages. */ for (end = page; end->private; end = (struc
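The diff above is truncated before the tx queue selection hook, so as an illustration of the "txq selection is based on the processor id" point in the changelog, such a hook would look roughly like this (signature per the .ndo_select_queue of this kernel generation; the body is a sketch, not the posted code):

static u16 virtnet_select_queue(struct net_device *dev, struct sk_buff *skb)
{
	struct virtnet_info *vi = netdev_priv(dev);

	/* one tx queue per cpu: the transmitting cpu picks "its" queue,
	 * wrapping in case fewer queue pairs than cpus were negotiated */
	return smp_processor_id() % vi->num_queue_pairs;
}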
[net-next RFC V5 3/5] virtio: introduce an API to set affinity for a virtqueue
Sometimes, virtio device need to configure irq affiniry hint to maximize the performance. Instead of just exposing the irq of a virtqueue, this patch introduce an API to set the affinity for a virtqueue. The api is best-effort, the affinity hint may not be set as expected due to platform support, irq sharing or irq type. Currently, only pci method were implemented and we set the affinity according to: - if device uses INTX, we just ignore the request - if device has per vq vector, we force the affinity hint - if the virtqueues share MSI, make the affinity OR over all affinities requested Signed-off-by: Jason Wang --- drivers/virtio/virtio_pci.c | 46 + include/linux/virtio_config.h | 21 ++ 2 files changed, 67 insertions(+), 0 deletions(-) diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index adb24f2..2ff0451 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -48,6 +48,7 @@ struct virtio_pci_device int msix_enabled; int intx_enabled; struct msix_entry *msix_entries; + cpumask_var_t *msix_affinity_masks; /* Name strings for interrupts. This size should be enough, * and I'm too lazy to allocate each name separately. */ char (*msix_names)[256]; @@ -276,6 +277,10 @@ static void vp_free_vectors(struct virtio_device *vdev) for (i = 0; i < vp_dev->msix_used_vectors; ++i) free_irq(vp_dev->msix_entries[i].vector, vp_dev); + for (i = 0; i < vp_dev->msix_vectors; i++) + if (vp_dev->msix_affinity_masks[i]) + free_cpumask_var(vp_dev->msix_affinity_masks[i]); + if (vp_dev->msix_enabled) { /* Disable the vector used for configuration */ iowrite16(VIRTIO_MSI_NO_VECTOR, @@ -293,6 +298,8 @@ static void vp_free_vectors(struct virtio_device *vdev) vp_dev->msix_names = NULL; kfree(vp_dev->msix_entries); vp_dev->msix_entries = NULL; + kfree(vp_dev->msix_affinity_masks); + vp_dev->msix_affinity_masks = NULL; } static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors, @@ -311,6 +318,15 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors, GFP_KERNEL); if (!vp_dev->msix_names) goto error; + vp_dev->msix_affinity_masks + = kzalloc(nvectors * sizeof *vp_dev->msix_affinity_masks, + GFP_KERNEL); + if (!vp_dev->msix_affinity_masks) + goto error; + for (i = 0; i < nvectors; ++i) + if (!alloc_cpumask_var(&vp_dev->msix_affinity_masks[i], + GFP_KERNEL)) + goto error; for (i = 0; i < nvectors; ++i) vp_dev->msix_entries[i].entry = i; @@ -607,6 +623,35 @@ static const char *vp_bus_name(struct virtio_device *vdev) return pci_name(vp_dev->pci_dev); } +/* Setup the affinity for a virtqueue: + * - force the affinity for per vq vector + * - OR over all affinities for shared MSI + * - ignore the affinity request if we're using INTX + */ +static int vp_set_vq_affinity(struct virtqueue *vq, int cpu) +{ + struct virtio_device *vdev = vq->vdev; + struct virtio_pci_device *vp_dev = to_vp_device(vdev); + struct virtio_pci_vq_info *info = vq->priv; + struct cpumask *mask; + unsigned int irq; + + if (!vq->callback) + return -EINVAL; + + if (vp_dev->msix_enabled) { + mask = vp_dev->msix_affinity_masks[info->msix_vector]; + irq = vp_dev->msix_entries[info->msix_vector].vector; + if (cpu == -1) + irq_set_affinity_hint(irq, NULL); + else { + cpumask_set_cpu(cpu, mask); + irq_set_affinity_hint(irq, mask); + } + } + return 0; +} + static struct virtio_config_ops virtio_pci_config_ops = { .get= vp_get, .set= vp_set, @@ -618,6 +663,7 @@ static struct virtio_config_ops virtio_pci_config_ops = { .get_features = vp_get_features, .finalize_features = 
vp_finalize_features, .bus_name = vp_bus_name, + .set_vq_affinity = vp_set_vq_affinity, }; static void virtio_pci_release_dev(struct device *_d) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index fc457f4..2c4a989 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -98,6 +98,7 @@ * vdev: the virtio_device * This returns a pointer to the bus name a la pci_name from which * the caller can then copy. + * @set_vq_affinity: set the affinity for a virtqueue. */ typedef void vq_callback_t(struct virtqueue *); struct virtio_config_ops { @@ -116,6 +117,7 @@ struct virti
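The virtio_config.h hunk above is truncated; assuming it also adds a virtqueue_set_affinity() wrapper around the new set_vq_affinity op (an assumption, since the wrapper is not visible in the quoted diff), a multiqueue driver could hint affinity along these lines:

/* Illustration only: pin queue pair i to cpu i, or clear the hints. */
static void virtnet_hint_affinity(struct virtnet_info *vi, bool set)
{
	int i;

	for (i = 0; i < vi->num_queue_pairs; i++) {
		int cpu = set ? i : -1;	/* -1 asks the transport to clear */

		virtqueue_set_affinity(vi->rq[i]->vq, cpu);
		virtqueue_set_affinity(vi->sq[i]->vq, cpu);
	}
}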
[net-next RFC V5 2/5] virtio_ring: move queue_index to vring_virtqueue
Instead of storing the queue index in virtio infos, this patch moves them to vring_virtqueue and introduces helpers to set and get the value. This would simplify the management and tracing. Signed-off-by: Jason Wang --- drivers/virtio/virtio_mmio.c |5 + drivers/virtio/virtio_pci.c | 12 +--- drivers/virtio/virtio_ring.c | 17 + include/linux/virtio.h |4 4 files changed, 27 insertions(+), 11 deletions(-) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index 453db0c..f5432b6 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -131,9 +131,6 @@ struct virtio_mmio_vq_info { /* the number of entries in the queue */ unsigned int num; - /* the index of the queue */ - int queue_index; - /* the virtual address of the ring queue */ void *queue; @@ -324,7 +321,6 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index, err = -ENOMEM; goto error_kmalloc; } - info->queue_index = index; /* Allocate pages for the queue - start with a queue as big as * possible (limited by maximum size allowed by device), drop down @@ -363,6 +359,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev, unsigned index, goto error_new_virtqueue; } + virtqueue_set_queue_index(vq, index); vq->priv = info; info->vq = vq; diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c index 2e03d41..adb24f2 100644 --- a/drivers/virtio/virtio_pci.c +++ b/drivers/virtio/virtio_pci.c @@ -79,9 +79,6 @@ struct virtio_pci_vq_info /* the number of entries in the queue */ int num; - /* the index of the queue */ - int queue_index; - /* the virtual address of the ring queue */ void *queue; @@ -202,11 +199,11 @@ static void vp_reset(struct virtio_device *vdev) static void vp_notify(struct virtqueue *vq) { struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev); - struct virtio_pci_vq_info *info = vq->priv; /* we write the queue's selector into the notification register to * signal the other end */ - iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY); + iowrite16(virtqueue_get_queue_index(vq), + vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY); } /* Handle a configuration change: Tell driver if it wants to know. */ @@ -402,7 +399,6 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index, if (!info) return ERR_PTR(-ENOMEM); - info->queue_index = index; info->num = num; info->msix_vector = msix_vec; @@ -425,6 +421,7 @@ static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index, goto out_activate_queue; } + virtqueue_set_queue_index(vq, index); vq->priv = info; info->vq = vq; @@ -467,7 +464,8 @@ static void vp_del_vq(struct virtqueue *vq) list_del(&info->node); spin_unlock_irqrestore(&vp_dev->lock, flags); - iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL); + iowrite16(virtqueue_get_queue_index(vq), + vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL); if (vp_dev->msix_enabled) { iowrite16(VIRTIO_MSI_NO_VECTOR, diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 5aa43c3..9c5aeea 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -106,6 +106,9 @@ struct vring_virtqueue /* How to notify other side. FIXME: commonalize hcalls! */ void (*notify)(struct virtqueue *vq); + /* Index of the queue */ + int queue_index; + #ifdef DEBUG /* They're supposed to lock for us. 
*/ unsigned int in_use; @@ -171,6 +174,20 @@ static int vring_add_indirect(struct vring_virtqueue *vq, return head; } +void virtqueue_set_queue_index(struct virtqueue *_vq, int queue_index) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + vq->queue_index = queue_index; +} +EXPORT_SYMBOL_GPL(virtqueue_set_queue_index); + +int virtqueue_get_queue_index(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + return vq->queue_index; +} +EXPORT_SYMBOL_GPL(virtqueue_get_queue_index); + /** * virtqueue_add_buf - expose buffer to other end * @vq: the struct virtqueue we're talking about. diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 8efd28a..0d8ed46 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -50,6 +50,10 @@ void *virtqueue_detach_unused_buf(struct virtqueue *vq); unsigned int virtqueue_get_vring_size(struct virtqueue *vq); +void virtqueue_set_queue_index(struct virtqueue *vq, int queue_index); + +int virtqueue_get_queue_index(struct virtqueue *vq);
[net-next RFC V5 1/5] virtio_net: Introduce VIRTIO_NET_F_MULTIQUEUE
From: Krishna Kumar Introduce VIRTIO_NET_F_MULTIQUEUE. Signed-off-by: Krishna Kumar Signed-off-by: Jason Wang --- include/linux/virtio_net.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h index 2470f54..1bc7e30 100644 --- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -51,6 +51,7 @@ #define VIRTIO_NET_F_CTRL_RX_EXTRA 20 /* Extra RX mode control support */ #define VIRTIO_NET_F_GUEST_ANNOUNCE 21 /* Guest can announce device on the * network */ +#define VIRTIO_NET_F_MULTIQUEUE22 /* Device supports multiple TXQ/RXQ */ #define VIRTIO_NET_S_LINK_UP 1 /* Link is up */ #define VIRTIO_NET_S_ANNOUNCE 2 /* Announcement is needed */ -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[net-next RFC V5 0/5] Multiqueue virtio-net
Hello All: This series is an update version of multiqueue virtio-net driver based on Krishna Kumar's work to let virtio-net use multiple rx/tx queues to do the packets reception and transmission. Please review and comments. Test Environment: - Intel(R) Xeon(R) CPU E5620 @ 2.40GHz, 8 cores 2 numa nodes - Two directed connected 82599 Test Summary: - Highlights: huge improvements on TCP_RR test - Lowlights: regression on small packet transmission, higher cpu utilization than single queue, need further optimization Analysis of the performance result: - I count the number of packets sending/receiving during the test, and multiqueue show much more ability in terms of packets per second. - For the tx regression, multiqueue send about 1-2 times of more packets compared to single queue, and the packets size were much smaller than single queue does. I suspect tcp does less batching in multiqueue, so I hack the tcp_write_xmit() to forece more batching, multiqueue works as well as singlequeue for both small transmission and throughput - I didn't pack the accelerate RFS with virtio-net in this sereis as it still need further shaping, for the one that interested in this please see: http://www.mail-archive.com/kvm@vger.kernel.org/msg64111.html Changes from V4: - Add ability to negotiate the number of queues through control virtqueue - Ethtool -{L|l} support and default the tx/rx queue number to 1 - Expose the API to set irq affinity instead of irq itself Changes from V3: - Rebase to the net-next - Let queue 2 to be the control virtqueue to obey the spec - Prodives irq affinity - Choose txq based on processor id References: - V4: https://lkml.org/lkml/2012/6/25/120 - V3: http://lwn.net/Articles/467283/ Test result: 1) 1 vm 2 vcpu 1q vs 2q, 1 - 1q, 2 - 2q, no pinning - Guest to External Host TCP STREAM sessions size throughput1 throughput2 norm1 norm2 1 64 650.55 655.61 100% 24.88 24.86 99% 2 64 1446.81 1309.44 90% 30.49 27.16 89% 4 64 1430.52 1305.59 91% 30.78 26.80 87% 8 64 1450.89 1270.82 87% 30.83 25.95 84% 1 256 1699.45 1779.58 104% 56.75 59.08 104% 2 256 4902.71 3446.59 70% 98.53 62.78 63% 4 256 4803.76 2980.76 62% 97.44 54.68 56% 8 256 5128.88 3158.74 61% 104.68 58.61 55% 1 512 2837.98 2838.42 100% 89.76 90.41 100% 2 512 6742.59 5495.83 81% 155.03 99.07 63% 4 512 9193.70 5900.17 64% 202.84 106.44 52% 8 512 9287.51 7107.79 76% 202.18 129.08 63% 1 1024 4166.42 4224.98 101% 128.55 129.86 101% 2 1024 6196.94 7823.08 126% 181.80 168.81 92% 4 1024 9113.62 9219.49 101% 235.15 190.93 81% 8 1024 9324.25 9402.66 100% 239.10 179.99 75% 1 2048 7441.63 6534.04 87% 248.01 215.63 86% 2 2048 7024.61 7414.90 105% 225.79 219.62 97% 4 2048 8971.49 9269.00 103% 278.94 220.84 79% 8 2048 9314.20 9359.96 100% 268.36 192.23 71% 1 4096 8282.60 8990.08 108% 277.45 320.05 115% 2 4096 9194.80 9293.78 101% 317.02 248.76 78% 4 4096 9340.73 9313.19 99% 300.34 230.35 76% 8 4096 9148.23 9347.95 102% 279.49 199.43 71% 1 16384 8787.89 8766.31 99% 312.38 316.53 101% 2 16384 9306.35 9156.14 98% 319.53 279.83 87% 4 16384 9177.81 9307.50 101% 312.69 230.07 73% 8 16384 9035.82 9188.00 101% 298.32 199.17 66% - TCP RR sessions size throughput1 throughput2 norm1 norm2 50 1 54695.41 84164.98 153% 1957.33 1901.31 97% 100 1 60141.88 88598.94 147% 2157.90 2000.45 92% 250 1 74763.56 135584.22 181% 2541.94 2628.59 103% 50 64 51628.38 82867.50 160% 1872.55 1812.16 96% 100 64 60367.73 84080.60 139% 2215.69 1867.69 84% 250 64 68502.70 124910.59 182% 2321.43 2495.76 107% 50 128 53477.08 77625.07 145% 1905.10 1870.99 98% 100 128 59697.56 74902.37 125% 
2230.66 1751.03 78% 250 128 71248.74 133963.55 188% 2453.12 2711.72 110% 50 256 47663.86 67742.63 142% 1880.45 1735.30 92% 100 256 54051.84 68738.57 127% 2123.03 1778.59 83% 250 256 68250.06 124487.90 182% 2321.89 2598.60 111% - External Host to Guest TCP STRAM sessions size throughput1 throughput2 norm1 norm2 1 64 847.71 864.83 102% 57.99 57.93 99% 2 64 1690.82 1544.94 91% 80.13 55.09 68% 4 64 3434.98 3455.53 100% 127.17 89.00 69% 8 64 5890.19 6557.35 111% 194.70 146.52 75% 1 256 2094.04 2109.14 100% 130.73 127.14 97% 2 256 5218.13 3731.97 71% 219.15 114.02 52% 4 256 6734.51 9213.47 136% 227.87 208.31 91% 8 256 6452.86 9402.78 145% 224.83 207.77 92% 1 512 3945.07 4203.68 106% 279.72 273.30 97% 2 512 7878.96 8122.55 103% 278.25 231.71 83% 4 512 7645.89 9402.13 122% 252.10 217.42 86% 8 512 6657.06 9403.71 141% 239.81 214.89 89% 1 1024 5729.06 5111.21 89% 289.38 303.09 104% 2 1024 8097.27 8159.67 100% 269.29 242.97 90% 4 1024 7778.93 8919.02 114% 261.28 205.50 78% 8 1024 6458.02 9360.02 144% 221.26 208.09 94% 1 2048 6426.94 5195.59 80% 292.52 307.47 105% 2 2048 8221.90 9025.66 109% 283.80 242.25 85% 4 2048 7364.72 8527.79 115% 248.10 198.36 79% 8 2048 6760.63 9161.07 135% 230.53 205.12 88% 1 4096 7247.02 6874.21 94% 276.23 287.68 104% 2 4096 8346.04 8818.65 105% 281.49 254.81 90% 4 4096 6710.00 9354.59 139% 216.41 210.13 97% 8 4096 6265.69 9406.87 150% 206.69 210.92 102% 1
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
Il 05/07/2012 03:52, Nicholas A. Bellinger ha scritto: > > fio randrw workload | virtio-scsi-raw | virtio-scsi+tcm_vhost | bare-metal > raw block > > 25 Write / 75 Read | ~15K | ~45K | ~70K > 75 Write / 25 Read | ~20K | ~55K | ~60K This is impressive, but I think it's still not enough to justify the inclusion of tcm_vhost. In my opinion, vhost-blk/vhost-scsi are mostly worthwhile as drivers for improvements to QEMU performance. We want to add more fast paths to QEMU that let us move SCSI and virtio processing to separate threads, we have proof of concepts that this can be done, and we can use vhost-blk/vhost-scsi to find bottlenecks more effectively. In fact, virtio-scsi-qemu and virtio-scsi-vhost are effectively two completely different devices that happen to speak the same SCSI transport. Not only virtio-scsi-vhost must be configured outside QEMU and doesn't support -device; it (obviously) presents different inquiry/vpd/mode data than virtio-scsi-qemu, so that it is not possible to migrate one to the other. I don't think vhost-scsi is particularly useful for virtualization, honestly. However, if it is useful for development, testing or benchmarking of lio itself (does this make any sense? :)) that could be by itself a good reason to include it. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] cpu: smp_wmb before launching cpus.
On 2012-07-05 12:10, liu ping fan wrote: > On Thu, Jul 5, 2012 at 2:46 PM, Jan Kiszka wrote: >> On 2012-07-05 04:18, Liu Ping Fan wrote: >>> Vcpu state must be set completely before receiving INIT-IPI,SIPI >>> >>> Signed-off-by: Liu Ping Fan >>> --- >>> kvm.h |1 + >>> 1 files changed, 1 insertions(+), 0 deletions(-) >>> >>> diff --git a/kvm.h b/kvm.h >>> index 9c7b0ea..5b3c228 100644 >>> --- a/kvm.h >>> +++ b/kvm.h >>> @@ -198,6 +198,7 @@ static inline void >>> cpu_synchronize_post_init(CPUArchState *env) >>> { >>> if (kvm_enabled()) { >>> kvm_cpu_synchronize_post_init(env); >>> +smp_wmb(); >>> } >>> } >>> >>> >> >> In theory, there should be no vcpu kick-off after this without some >> locking operations involved that imply barriers. Did you see real > > Yeah, but what if it is non-x86? The locking I'm referring to is arch independent. >> inconsistencies without this explicit one? Again: Did you see real issues or is this based on static analysis? Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] cpu: smp_wmb before launching cpus.
On Thu, Jul 5, 2012 at 2:46 PM, Jan Kiszka wrote: > On 2012-07-05 04:18, Liu Ping Fan wrote: >> Vcpu state must be set completely before receiving INIT-IPI,SIPI >> >> Signed-off-by: Liu Ping Fan >> --- >> kvm.h |1 + >> 1 files changed, 1 insertions(+), 0 deletions(-) >> >> diff --git a/kvm.h b/kvm.h >> index 9c7b0ea..5b3c228 100644 >> --- a/kvm.h >> +++ b/kvm.h >> @@ -198,6 +198,7 @@ static inline void >> cpu_synchronize_post_init(CPUArchState *env) >> { >> if (kvm_enabled()) { >> kvm_cpu_synchronize_post_init(env); >> +smp_wmb(); >> } >> } >> >> > > In theory, there should be no vcpu kick-off after this without some > locking operations involved that imply barriers. Did you see real Yeah, but what if it is non-x86? > inconsistencies without this explicit one? > > Jan > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] virtio-blk: add back VIRTIO_BLK_F_FLUSH
The old name is part of the userspace API, add it back for compatibility. Reported-by: Sasha Levin Signed-off-by: Paolo Bonzini --- include/linux/virtio_blk.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/include/linux/virtio_blk.h b/include/linux/virtio_blk.h index 18a1027..83a3116 100644 --- a/include/linux/virtio_blk.h +++ b/include/linux/virtio_blk.h @@ -41,6 +41,9 @@ #define VIRTIO_BLK_F_TOPOLOGY 10 /* Topology information is available */ #define VIRTIO_BLK_F_CONFIG_WCE11 /* Writeback mode available in config */ +/* Backwards-compatibility #defines for renamed features. */ +#define VIRTIO_BLK_F_FLUSH VIRTIO_BLK_F_WCE + #define VIRTIO_BLK_ID_BYTES20 /* ID string length */ struct virtio_blk_config { -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] MAINTAINERS: add kvm list for virtio components
The KVM list is followed by more people than the generic virtualizat...@lists.linux-foundation.org mailing list, and is already "de facto" the place where virtio patches are posted. pv-ops still has no other lists than virtualizat...@lists.linux-foundation.org. However, pv-ops patches will likely touch Xen or KVM files as well and the respective mailing list will usually be reached as well. Signed-off-by: Paolo Bonzini --- MAINTAINERS |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 14bc707..e265f2e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -7340,6 +7340,7 @@ F:include/media/videobuf2-* VIRTIO CONSOLE DRIVER M: Amit Shah L: virtualizat...@lists.linux-foundation.org +L: kvm@vger.kernel.org S: Maintained F: drivers/char/virtio_console.c F: include/linux/virtio_console.h @@ -7348,6 +7349,7 @@ VIRTIO CORE, NET AND BLOCK DRIVERS M: Rusty Russell M: "Michael S. Tsirkin" L: virtualizat...@lists.linux-foundation.org +L: kvm@vger.kernel.org S: Maintained F: drivers/virtio/ F: drivers/net/virtio_net.c -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On Wed, Jul 04, 2012 at 07:01:05PM -0700, Nicholas A. Bellinger wrote: > On Wed, 2012-07-04 at 18:05 +0300, Michael S. Tsirkin wrote: > > On Wed, Jul 04, 2012 at 04:52:00PM +0200, Paolo Bonzini wrote: > > > Il 04/07/2012 16:02, Michael S. Tsirkin ha scritto: > > > > On Wed, Jul 04, 2012 at 04:24:00AM +, Nicholas A. Bellinger wrote: > > > >> From: Nicholas Bellinger > > > >> > > > >> Hi folks, > > > >> > > > >> This series contains patches required to update tcm_vhost <-> > > > >> virtio-scsi > > > >> connected hosts <-> guests to run on v3.5-rc2 mainline code. This > > > >> series is > > > >> available on top of target-pending/auto-next here: > > > >> > > > >> > > > >> git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git > > > >> tcm_vhost > > > >> > > > >> This includes the necessary vhost changes from Stefan to to get > > > >> tcm_vhost > > > >> functioning, along a virtio-scsi LUN scanning change to address a > > > >> client bug > > > >> with tcm_vhost I ran into.. Also, tcm_vhost driver has been merged > > > >> into a single > > > >> source + header file that is now living under /drivers/vhost/, along > > > >> with latest > > > >> tcm_vhost changes from Zhi's tcm_vhost tree. > > > >> > > > >> Here are a couple of screenshots of the code in action using raw IBLOCK > > > >> backends provided by FusionIO ioDrive Duo: > > > >> > > > >>http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png > > > >>http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png > > > >> > > > >> So the next steps on my end will be converting tcm_vhost to submit > > > >> backend I/O from > > > >> cmwq context, along with fio benchmark numbers between > > > >> tcm_vhost/virtio-scsi and > > > >> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash. > > > > > > > > OK so this is an RFC, not for merge yet? > > > > > > Patch 6 definitely looks RFCish, but patch 5 should go in anyway. > > > > > > Paolo > > > > I was talking about 4/6 first of all. > > So yeah, this code is still considered RFC at this point for-3.6, but > I'd like to get this into target-pending/for-next in next week for more > feedback and start collecting signoffs for the necessary pieces that > effect existing vhost code. > > By that time the cmwq conversion of tcm_vhost should be in place as > well.. I'll try to give some feedback but I think we do need to see the qemu patches - they weren't posted yet, were they? This driver has some userspace interface and once that is merged it has to be supported. So I think we need the buy-in from the qemu side at the principal level. > > Anyway, it's best to split, not to mix RFCs and fixes. > > > > , I'll send patch #5 separately to linux-scsi -> James and CC > stable following Paolo's request. > > Thanks! > > --nab -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Qemu-ppc] [RFC PATCH 04/17] KVM: PPC64: booke: Add guest computation mode for irq delivery
> -Original Message- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Wednesday, July 04, 2012 4:41 PM > To: Caraman Mihai Claudiu-B02008 > Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; linuxppc- > d...@lists.ozlabs.org; qemu-...@nongnu.org > Subject: Re: [Qemu-ppc] [RFC PATCH 04/17] KVM: PPC64: booke: Add guest > computation mode for irq delivery > > > On 25.06.2012, at 14:26, Mihai Caraman wrote: > > > Signed-off-by: Mihai Caraman > > --- > > arch/powerpc/kvm/booke.c |8 +++- > > 1 files changed, 7 insertions(+), 1 deletions(-) > > > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c > > index d15c4b5..93b48e0 100644 > > --- a/arch/powerpc/kvm/booke.c > > +++ b/arch/powerpc/kvm/booke.c > > @@ -287,6 +287,7 @@ static int kvmppc_booke_irqprio_deliver(struct > kvm_vcpu *vcpu, > > bool crit; > > bool keep_irq = false; > > enum int_class int_class; > > + ulong msr_cm = 0; > > > > /* Truncate crit indicators in 32 bit mode */ > > if (!(vcpu->arch.shared->msr & MSR_SF)) { > > @@ -299,6 +300,10 @@ static int kvmppc_booke_irqprio_deliver(struct > kvm_vcpu *vcpu, > > /* ... and we're in supervisor mode */ > > crit = crit && !(vcpu->arch.shared->msr & MSR_PR); > > > > +#ifdef CONFIG_64BIT > > + msr_cm = vcpu->arch.epcr & SPRN_EPCR_ICM ? MSR_CM : 0; > > +#endif > > No need for the ifdef, no?. Just mask EPCR_ICM out in the 32-bit host > case, then this check is always false on 32-bit hosts. It will break e500v2. epcr field is declared only for CONFIG_KVM_BOOKE_HV, we can limit to this instead of CONFIG_64BIT. -Mike -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
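A sketch of the alternative suggested above, i.e. keying on CONFIG_KVM_BOOKE_HV (where vcpu->arch.epcr exists) instead of CONFIG_64BIT; the helper name is made up for illustration:

static inline ulong kvmppc_intr_msr_cm(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_KVM_BOOKE_HV
	/* deliver the interrupt in 64-bit mode iff the guest set EPCR.ICM */
	return (vcpu->arch.epcr & SPRN_EPCR_ICM) ? MSR_CM : 0;
#else
	return 0;
#endif
}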
Re: race between kvm-kmod-3.0 and kvm-kmod-3.3 // was: race condition in qemu-kvm-1.0.1
On 06/28/2012 05:11 PM, Peter Lieven wrote: > that here is bascially whats going on: > > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read len > 3 gpa 0xa val 0x10ff > qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa > gpa 0xa Read GPA > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio > unsatisfied-read len 1 gpa 0xa val 0x0 > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason > KVM_EXIT_MMIO (6) > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read > len 3 gpa 0xa val 0x10ff > qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa > gpa 0xa Read GPA > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio > unsatisfied-read len 1 gpa 0xa val 0x0 > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason > KVM_EXIT_MMIO (6) > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read > len 3 gpa 0xa val 0x10ff > qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa > gpa 0xa Read GPA > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio > unsatisfied-read len 1 gpa 0xa val 0x0 > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason > KVM_EXIT_MMIO (6) > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio read > len 3 gpa 0xa val 0x10ff > qemu-kvm-1.0-2506 [010] 60996.908000: vcpu_match_mmio: gva 0xa > gpa 0xa Read GPA > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_mmio: mmio > unsatisfied-read len 1 gpa 0xa val 0x0 > qemu-kvm-1.0-2506 [010] 60996.908000: kvm_userspace_exit: reason > KVM_EXIT_MMIO (6) > There are two mmio emulation after user-space-exit, it is caused by mmio read access which spans two pages. But it should be fixed by: commit f78146b0f9230765c6315b2e14f56112513389ad Author: Avi Kivity Date: Wed Apr 18 19:22:47 2012 +0300 KVM: Fix page-crossing MMIO MMIO that are split across a page boundary are currently broken - the code does not expect to be aborted by the exit to userspace for the first MMIO fragment. This patch fixes the problem by generalizing the current code for handling 16-byte MMIOs to handle a number of "fragments", and changes the MMIO code to create those fragments. Signed-off-by: Avi Kivity Signed-off-by: Marcelo Tosatti Could you please pull the code from: https://git.kernel.org/pub/scm/virt/kvm/kvm.git and trace it again? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virtio-blk: allow toggling host cache between writeback and writethrough
On Thu, 2012-07-05 at 08:47 +0200, Paolo Bonzini wrote: > Il 04/07/2012 23:11, Sasha Levin ha scritto: > > There are two things going on here: > > 1. Rename VIRTIO_BLK_F_FLUSH to VIRTIO_BLK_F_WCE > > 2. Add a new VIRTIO_BLK_F_CONFIG_WCE > > > > I'm concerned that the first change is going to break compilation for > > any code that included linux/virtio-blk.h and used VIRTIO_BLK_F_FLUSH. > > That would be nlkt, right? :) nlkt, lguest, and probably others. linux/virtio_blk.h is a public kernel header, which is supposed to be used from userspace - so I assume many others who implemented virtio-blk for any reason took advantage of that header. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html