[PATCH] kvm, ioapic: Fix an error field reference
From: Liu Yuan

ioapic_debug() in ioapic_deliver() refers to a field by the wrong name:
the destination field is dest_id, not dest. This patch corrects it.

Signed-off-by: Liu Yuan
---
 virt/kvm/ioapic.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 0b9df83..8df1ca1 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -167,7 +167,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 	ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
 		     "vector=%x trig_mode=%x\n",
-		     entry->fields.dest, entry->fields.dest_mode,
+		     entry->fields.dest_id, entry->fields.dest_mode,
 		     entry->fields.delivery_mode, entry->fields.vector,
 		     entry->fields.trig_mode);
--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [qemu-iotests][PATCH] Update rbd support
On Tue, Apr 12, 2011 at 10:42:00PM -0700, Josh Durgin wrote:
> > I suspect we only support the weird writing past size for the
> > file protocol, so we should only run the test for it.
> >
> > Or does sheepdog do anything special about it?
>
> Sheepdog supports it by truncating to the right size if a write
> would be past the end. I'm not sure if other protocols support
> it.

I've changed 016 to require the file or sheepdog protocols, and then
applied the rest of your patch. Thanks a lot!
[Autotest PATCH] KVM-test: Check if guest bootable after reseting several times
This test comes from a regression bug: the guest cannot find a bootable
device after being reset several times through the monitor command.

Signed-off-by: Amos Kong
---
 client/tests/kvm/tests/system_reset_bootable.py |   29 +++
 client/tests/kvm/tests_base.cfg.sample          |    7 ++
 2 files changed, 36 insertions(+), 0 deletions(-)
 create mode 100755 client/tests/kvm/tests/system_reset_bootable.py

diff --git a/client/tests/kvm/tests/system_reset_bootable.py b/client/tests/kvm/tests/system_reset_bootable.py
new file mode 100755
index 000..ca9fb70
--- /dev/null
+++ b/client/tests/kvm/tests/system_reset_bootable.py
@@ -0,0 +1,29 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_test_utils
+
+
+def run_system_reset_bootable(test, params, env):
+    """
+    KVM reset test:
+    1) Boot guest.
+    2) Send the system_reset monitor command several times.
+    3) Log into the guest to verify that it booted normally.
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    vm = env.get_vm(params["main_vm"])
+    vm.verify_alive()
+    timeout = float(params.get("login_timeout", 240))
+    reset_times = int(params.get("reset_times",20))
+    interval = int(params.get("reset_interval",10))
+    wait_time = int(params.get("wait_time_for_reset",60))
+    time.sleep(wait_time)
+
+    for i in range(reset_times):
+        vm.monitor.cmd("system_reset")
+        time.sleep(interval)
+
+    session = vm.wait_for_login(timeout=timeout)
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 7333ed0..ceafebe 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -961,6 +961,13 @@ variants:
         sleep_before_reset = 20
         kill_vm_on_error = yes
 
+    - system_reset_bootable:
+        type = system_reset_bootable
+        interval = 1
+        reset_times = 20
+        wait_time_for_reset = 120
+        kill_vm_on_error = yes
+
     - shutdown: install setup unattended_install.cdrom
         type = shutdown
         shutdown_method = shell
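The reset loop in this test can be tried without autotest or a real guest. The following is only a minimal sketch: FakeVM and FakeMonitor are hypothetical stand-ins that record monitor commands, and reset_and_login is an illustrative name, not part of the patch. It reads reset_interval, the key the test itself reads, whereas the sample config above sets interval.

```python
import time


class FakeMonitor:
    """Hypothetical stand-in for the autotest monitor object."""

    def __init__(self):
        self.commands = []

    def cmd(self, command):
        # Record the command instead of sending it to QEMU.
        self.commands.append(command)


class FakeVM:
    """Hypothetical stand-in for the autotest VM object."""

    def __init__(self):
        self.monitor = FakeMonitor()

    def wait_for_login(self, timeout):
        # A real guest would be probed over the network here.
        return "session"


def reset_and_login(vm, params, sleep=time.sleep):
    """Reset the guest reset_times times, then try to log in."""
    timeout = float(params.get("login_timeout", 240))
    reset_times = int(params.get("reset_times", 20))
    interval = int(params.get("reset_interval", 10))
    for _ in range(reset_times):
        vm.monitor.cmd("system_reset")
        sleep(interval)
    return vm.wait_for_login(timeout=timeout)


vm = FakeVM()
# Pass a no-op sleep so the sketch finishes immediately.
session = reset_and_login(vm, {"reset_times": 3}, sleep=lambda seconds: None)
```

Injecting the sleep function keeps the loop identical to the test while letting it run instantly.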
[Autotest PATCH] KVM-test: Simple stop/continue test
Change the guest state with monitor commands, verify the guest status, and
try to log into the guest over the network.

Signed-off-by: Jason Wang
Signed-off-by: Amos Kong
---
 client/tests/kvm/tests/stop_continue.py |   52 +++
 client/tests/kvm/tests_base.cfg.sample  |    4 ++
 2 files changed, 56 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/stop_continue.py

diff --git a/client/tests/kvm/tests/stop_continue.py b/client/tests/kvm/tests/stop_continue.py
new file mode 100644
index 000..c7d8025
--- /dev/null
+++ b/client/tests/kvm/tests/stop_continue.py
@@ -0,0 +1,52 @@
+import logging
+from autotest_lib.client.common_lib import error
+
+
+def run_stop_continue(test, params, env):
+    """
+    Suspend a running Virtual Machine and verify its state.
+
+    1) Boot the vm
+    2) Suspend the vm through stop command
+    3) Verify the state through info status command
+    4) Check if the ssh session to the guest is still responsive;
+       if it is, fail the test.
+
+    @param test: Kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    vm = env.get_vm(params["main_vm"])
+    vm.verify_alive()
+    timeout = float(params.get("login_timeout", 240))
+    session = vm.wait_for_login(timeout=timeout)
+
+    try:
+        logging.info("Suspend the virtual machine")
+        vm.monitor.cmd("stop")
+
+        logging.info("Verifying the status of virtual machine through monitor")
+        o = vm.monitor.info("status")
+        if 'paused' not in o and ( "u'running': False" not in str(o)):
+            logging.error(o)
+            raise error.TestFail("Fail to suspend through monitor command line")
+
+        logging.info("Check the session responsiveness")
+        if session.is_responsive():
+            raise error.TestFail("Session is still responsive after stop")
+
+        logging.info("Try to resume the guest")
+        vm.monitor.cmd("cont")
+
+        o = vm.monitor.info("status")
+        m_type = params.get("monitor_type", "human")
+        if ('human' in m_type and 'running' not in o) or\
+           ('qmp' in m_type and "u'running': True" not in str(o)):
+            logging.error(o)
+            raise error.TestFail("Could not continue the execution")
+
+        logging.info("Try to re-log into guest")
+        session = vm.wait_for_login(timeout=timeout)
+
+    finally:
+        session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 5d274f8..7333ed0 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -260,6 +260,10 @@ variants:
     - systemtap:
         test_control_file = systemtap.control
 
+    - stop_continue:
+        type = stop_continue
+        kill_vm_on_error = yes
+
     - linux_s3: install setup unattended_install.cdrom
         only Linux
         type = linux_s3
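The two status checks in this test differ by monitor type. A minimal sketch of a single helper follows; vm_is_running is a hypothetical name, and the sketch assumes that a human monitor returns text such as "VM status: running" while a QMP monitor returns a dictionary with a boolean "running" key. Neither format is confirmed by the patch beyond the substrings it tests for.

```python
def vm_is_running(status, monitor_type="human"):
    """Return True if the monitor status output says the guest is running.

    Assumption: QMP status is a dict with a boolean "running" key, and
    human monitor status is a string that contains "running" only when
    the guest is not paused.
    """
    if "qmp" in monitor_type:
        return bool(status.get("running"))
    return "running" in str(status)


# Hypothetical sample outputs, for illustration only.
human_paused = "VM status: paused"
human_running = "VM status: running"
qmp_paused = {"running": False}
qmp_running = {"running": True}
```

Reading the boolean directly, rather than matching "u'running': False" in the string form of the dictionary, would avoid depending on how a particular Python version prints unicode keys.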
[PATCH V3 7/8] Enable ixgbe to support zerocopy
Signed-off-by: Shirley Ma
---
 drivers/net/ixgbe/ixgbe_main.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 6f8adc7..68f1e93 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -7395,6 +7395,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 #endif /* IXGBE_FCOE */
 	if (pci_using_dac) {
 		netdev->features |= NETIF_F_HIGHDMA;
+		netdev->features |= NETIF_F_ZEROCOPY;
 		netdev->vlan_features |= NETIF_F_HIGHDMA;
 	}
How to use qemu-kvm with Fedora15-beta gnome3 (better vga driver ?)
Hi all,

I'm trying to use qemu-kvm to run Fedora15-beta with gnome3, but it reports
that the graphics hardware cannot run the gnome3-specific features and falls
back to gnome2. I checked the qemu-doc and tried all of the vga drivers listed
there; none of them works with gnome3. Does anyone know how to run qemu with
better virtual graphics hardware?

Thanks,

http://wiki.qemu.org/download/qemu-doc.html

‘-vga type’
    Select type of VGA card to emulate. Valid values for type are

    ‘cirrus’
        Cirrus Logic GD5446 Video card. All Windows versions starting from
        Windows 95 should recognize and use this graphic card. For optimal
        performances, use 16 bit color depth in the guest and the host OS.
        (This one is the default)
    ‘std’
        Standard VGA card with Bochs VBE extensions. If your guest OS
        supports the VESA 2.0 VBE extensions (e.g. Windows XP) and if you
        want to use high resolution modes (>= 1280x1024x16) then you should
        use this option.
    ‘vmware’
        VMWare SVGA-II compatible adapter. Use it if you have sufficiently
        recent XFree86/XOrg server or Windows guest with a driver for this
        card.

--
Cheng Renquan (程任全)
Re: [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
Krishna Kumar2 writes:
> Thanks Jason!
>
> So I can use my virtio-net guest driver and test with this patch?
> Please provide the script you use to start MQ guest.
>

Yes and thanks. The following simple script may help you start a macvtap
mq guest.

qemu_path=./qemu-system-x86_64
img_path=/home/kvm_autotest_root/images/mq.qcow2
vtap_dev=/dev/tap104
mac=96:88:12:1C:27:83
smp=2
mq=4

for i in `seq $mq`
do
    vtap+=" -netdev tap,id=hn$i,fd=$((i+100)) $((i+100))<>$vtap_dev"
    netdev+="hn$i#"
done

eval "$qemu_path $img_path $vtap -device virtio-net-pci,queues=$mq,netdev=$netdev,mac=$mac,vectors=32 -enable-kvm -smp $smp"

> Regards,
>
> - KK
>
> Jason Wang wrote on 04/20/2011 02:03:07 PM:
>
> > Jason Wang
> > 04/20/2011 02:03 PM
> >
> > To
> >
> > Krishna Kumar2/India/IBM@IBMIN, kvm@vger.kernel.org, m...@redhat.com,
> > net...@vger.kernel.org, ru...@rustcorp.com.au, qemu-
> > de...@nongnu.org, anth...@codemonkey.ws
> >
> > cc
> >
> > Subject
> >
> > [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
> >
> > Inspired by Krishna's patch
> > (http://www.spinics.net/lists/kvm/msg52098.html) and Michael's
> > suggestions. The following series adds the multiqueue support for
> > qemu and enables it for virtio-net (both userspace and vhost).
> >
> > The aim of this series is to simplify the management and achieve the
> > same performance with less code.
> >
> > The differences between this series and Krishna's are as follows:
> >
> > - Add the multiqueue support for qemu and also for userspace virtio-net
> > - Instead of hacking the vhost module to manipulate kthreads, this patch
> >   just implements the userspace based multiqueues and thus can re-use
> >   the existing vhost kernel-side code without any modification.
> > - Use 1:1 mapping between TX/RX pairs and vhost kthread because the
> >   implementation is based on userspace.
> > - The cli is also changed to make the mgmt easier, the -netdev option of
> >   qdev can now accept more than one id. You can start a multiqueue
> >   virtio-net device through:
> >   ./qemu-system-x86_64 -netdev tap,id=hn0,vhost=on,fd=X -netdev
> >   tap,id=hn0,vhost=on,fd=Y -device
> >   virtio-net-pci,netdev=hn0#hn1,queues=2 ...
> >
> > The series is very primitive and still needs polishing.
> >
> > Suggestions are welcome.
> > ---
> >
> > Jason Wang (2):
> >       net: Add multiqueue support
> >       virtio-net: add multiqueue support
> >
> >
> >  hw/qdev-properties.c |   37 -
> >  hw/qdev.h            |    3
> >  hw/vhost.c           |   26 ++-
> >  hw/vhost.h           |    1
> >  hw/vhost_net.c       |    7 +
> >  hw/vhost_net.h       |    2
> >  hw/virtio-net.c      |  409 ++++--
> >  hw/virtio-net.h      |    2
> >  hw/virtio-pci.c      |    1
> >  hw/virtio.h          |    1
> >  net.c                |   34 +++-
> >  net.h                |   15 +-
> >  12 files changed, 353 insertions(+), 185 deletions(-)
> >
> > --
> > Jason Wang
>
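The way the start script derives one tap netdev and one file descriptor per queue can be written out as a small function. This is a minimal sketch, not the script itself: build_mq_args is a hypothetical helper, the "#"-separated netdev list is the syntax proposed by this RFC series rather than a confirmed qemu option, and the sketch joins the ids with "#" instead of leaving a trailing "#" as the shell loop does.

```python
def build_mq_args(queues, first_fd=101, tap_dev="/dev/tap104",
                  mac="96:88:12:1C:27:83"):
    """Build the netdev/device arguments for a multiqueue macvtap guest.

    One tap netdev (hn1, hn2, ...) and one fd (101, 102, ...) is created
    per queue, matching the numbering used by the shell loop.
    """
    netdev_args = []
    redirections = []
    ids = []
    for i in range(queues):
        netdev_id = "hn%d" % (i + 1)
        fd = first_fd + i
        ids.append(netdev_id)
        netdev_args.append("-netdev tap,id=%s,fd=%d" % (netdev_id, fd))
        # Shell redirection that opens the macvtap device on this fd.
        redirections.append("%d<>%s" % (fd, tap_dev))
    device = ("-device virtio-net-pci,queues=%d,netdev=%s,mac=%s,vectors=32"
              % (queues, "#".join(ids), mac))
    return " ".join(netdev_args + [device] + redirections)


two_queue_args = build_mq_args(2)
```

The returned string is meant to be appended to the qemu command line and run through a shell, since the fd redirections are shell syntax.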
Re: [RFC PATCH 0/3 V8] QAPI: add inject-nmi qmp command
Hi, Anthony Liguori

Any suggestions?

Although all command line interfaces will be converted to use QMP
interfaces in 0.16, I hope inject-nmi can get into QAPI earlier, in 0.15.

Thanks,
Lai
Re: performance of virtual functions compared to virtio
On Wed, 2011-04-20 at 19:57 -0600, David Ahern wrote:
> In general should virtual functions outperform virtio+vhost for
> networking performance - latency and throughput?
>
> I have 2 VMs running on a host. Each VM has 2 nics -- one tied to a VF
> and the other going through virtio and a tap device like so:
>
>  ------     ------
> |      |---|  VF  |-------------
> |      |    ------              |
> | VM 1 |                        |
> |      |    -------             |
> |      |---|  tap  |---         |
>  ------     -------    |       ---
>                       ---     | e |
>                      | b |    | t |
>                      | r |    | h |
>                       ---     | 2 |
>  ------     -------    |       ---
> |      |---|  tap  |---         |
> |      |    -------             |
> | VM 2 |                        |
> |      |    ------              |
> |      |---|  VF  |-------------
>  ------     ------
>
> The network arguments to qemu-kvm are:
> -netdev type=tap,vhost=on,ifname=tap2,id=netdev1
> -device virtio-net-pci,mac=${mac},netdev=netdev1
>
> where ${mac} is unique to each VM and for the VF:
> -device pci-assign,host=${pciid}
>
> netserver is running within the VMs, and the netperf commands I am
> running are:
>
> netperf -p 12346 -H -l 20 -jcC -fM -v 2 -t TCP_RR -- -r 1024
> netperf -p 12346 -H -l 20 -jcC -fM -v 2 -t TCP_STREAM
>
> where changes depending on which interface I want to send the
> traffic through. To say the least results are a bit disappointing for
> the VF:
>
>                   latency      throughput
>                   (usec/Tran)  (MB/sec)
> Host-VM
>   over virtio     139.160      1199.40
>   over VF         488.124       209.22
>
> VM-VM
>   over virtio     322.056       773.54
>   over VF         488.051       328.88
>
> I am just getting started with VFs and could use some hints on how to
> improve the performance.

Device assignment via a VF provides the lowest latency and most
bandwidth for *getting data off the host system*, though virtio/vhost is
getting better. If all you care about is VM-VM on the same host or
VM-host, then virtio is only limited by memory bandwidth/latency and
host processor cycles. Your processor has 25GB/s of memory bandwidth.
On the other hand, the VF has to send data all the way out to the wire
and all the way back up through the NIC to get to the other VM/host.
You're using a 1Gb/s NIC. Your results actually seem to indicate you're
getting better than wire rate, so maybe you're only passing through an
internal switch on the NIC. In any case, VFs are not optimal for
communication within the same physical system. They are optimal for
off-host communication. Thanks,

Alex
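The "better than wire rate" remark can be checked with a little arithmetic. This is only a sketch under one assumption: that netperf's -fM option reports throughput in units of 2^20 bytes per second. The throughput figures are the VF rows from the quoted table.

```python
# 1 Gb/s NIC, as stated in the reply above.
WIRE_RATE_BITS_PER_SEC = 1e9

# Assumption: netperf -fM reports units of 2**20 bytes per second.
BYTES_PER_UNIT = 2 ** 20

# Wire rate expressed in the same unit as the table (about 119).
wire_rate = WIRE_RATE_BITS_PER_SEC / 8 / BYTES_PER_UNIT

# VF throughput rows from the quoted results, in MB/sec.
vf_throughput = {"Host-VM": 209.22, "VM-VM": 328.88}

# Ratio of measured throughput to what a 1 Gb/s wire could carry.
ratios = {path: rate / wire_rate for path, rate in vf_throughput.items()}
exceeds_wire_rate = all(ratio > 1.0 for ratio in ratios.values())
```

Both VF figures come out well above what a 1 Gb/s link can carry, which is consistent with the suggestion that the traffic never reaches the wire.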
performance of virtual functions compared to virtio
In general should virtual functions outperform virtio+vhost for
networking performance - latency and throughput?

I have 2 VMs running on a host. Each VM has 2 nics -- one tied to a VF
and the other going through virtio and a tap device like so:

 ------     ------
|      |---|  VF  |-------------
|      |    ------              |
| VM 1 |                        |
|      |    -------             |
|      |---|  tap  |---         |
 ------     -------    |       ---
                      ---     | e |
                     | b |    | t |
                     | r |    | h |
                      ---     | 2 |
 ------     -------    |       ---
|      |---|  tap  |---         |
|      |    -------             |
| VM 2 |                        |
|      |    ------              |
|      |---|  VF  |-------------
 ------     ------

The network arguments to qemu-kvm are:
-netdev type=tap,vhost=on,ifname=tap2,id=netdev1
-device virtio-net-pci,mac=${mac},netdev=netdev1

where ${mac} is unique to each VM and for the VF:
-device pci-assign,host=${pciid}

netserver is running within the VMs, and the netperf commands I am
running are:

netperf -p 12346 -H -l 20 -jcC -fM -v 2 -t TCP_RR -- -r 1024
netperf -p 12346 -H -l 20 -jcC -fM -v 2 -t TCP_STREAM

where changes depending on which interface I want to send the
traffic through. To say the least results are a bit disappointing for
the VF:

                  latency      throughput
                  (usec/Tran)  (MB/sec)
Host-VM
  over virtio     139.160      1199.40
  over VF         488.124       209.22

VM-VM
  over virtio     322.056       773.54
  over VF         488.051       328.88

I am just getting started with VFs and could use some hints on how to
improve the performance.

Host:
  Dell R410
  2 quad core E5620@2.40 GHz processors
  16 GB RAM
  Intel 82576 NIC (Gigabit ET Quad Port)
  Fedora 14
  kernel: 2.6.35.12-88.fc14.x86_64
  qemu-kvm-0.13.0-1.fc14.x86_64

VMs:
  Fedora 14
  kernel 2.6.35.11-83.fc14.x86_64
  2 vcpus
  1GB RAM

Thanks,
David
[PATCH V3 3/8] Add userspace buffers support in skb
This patch adds userspace buffers support in skb. A new struct skb_ubuf_info is needed to maintain the userspace buffers argument and index, a callback is used to notify userspace to release the buffers once lower device has done DMA (Last reference to that skb has gone). Signed-off-by: Shirley Ma --- include/linux/skbuff.h | 14 ++ net/core/skbuff.c | 15 ++- 2 files changed, 28 insertions(+), 1 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d0ae90a..47a187b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -189,6 +189,16 @@ enum { SKBTX_DRV_NEEDS_SK_REF = 1 << 3, }; +/* The callback notifies userspace to release buffers when skb DMA is done in + * lower device, the desc is used to track userspace buffer index. + */ +struct skb_ubuf_info { + /* support buffers allocation from userspace */ + void(*callback)(struct sk_buff *); + void*arg; + size_t desc; +}; + /* This data is invariant across clones and lives at * the end of the header data, ie. at skb->end. 
*/ @@ -211,6 +221,10 @@ struct skb_shared_info { /* Intermediate layers must ensure that destructor_arg * remains valid until skb destructor */ void * destructor_arg; + + /* DMA mapping from/to userspace buffers */ + struct skb_ubuf_info ubuf; + /* must be last field, see pskb_expand_head() */ skb_frag_t frags[MAX_SKB_FRAGS]; }; diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 7ebeed0..822c07d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -210,6 +210,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, shinfo = skb_shinfo(skb); memset(shinfo, 0, offsetof(struct skb_shared_info, dataref)); atomic_set(&shinfo->dataref, 1); + shinfo->ubuf.callback = NULL; + shinfo->ubuf.arg = NULL; kmemcheck_annotate_variable(shinfo->destructor_arg); if (fclone) { @@ -327,7 +329,15 @@ static void skb_release_data(struct sk_buff *skb) for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) put_page(skb_shinfo(skb)->frags[i].page); } - + /* +* if skb buf is from userspace, we need to notify the caller +* the lower device DMA has done; +*/ + if (skb_shinfo(skb)->ubuf.callback) { + skb_shinfo(skb)->ubuf.callback(skb); + skb_shinfo(skb)->ubuf.callback = NULL; + skb_shinfo(skb)->ubuf.arg = NULL; + } if (skb_has_frag_list(skb)) skb_drop_fraglist(skb); @@ -480,6 +490,9 @@ bool skb_recycle_check(struct sk_buff *skb, int skb_size) if (irqs_disabled()) return false; + if (shinfo->ubuf.callback) + return false; + if (skb_is_nonlinear(skb) || skb->fclone != SKB_FCLONE_UNAVAILABLE) return false; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
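The release-callback idea in this patch can be modelled outside the kernel. The following is a toy sketch only: SharedInfo, release_data and can_recycle are hypothetical names that loosely mirror skb_shared_info, skb_release_data() and skb_recycle_check(), and the reference counting is simplified to a plain integer.

```python
class SharedInfo:
    """Toy stand-in for skb_shared_info with the ubuf fields from the patch."""

    def __init__(self):
        self.callback = None   # mirrors ubuf.callback
        self.arg = None        # mirrors ubuf.arg
        self.dataref = 1       # simplified reference count


def release_data(owner, shinfo):
    """Drop one reference; notify the buffer owner when the last one goes."""
    shinfo.dataref -= 1
    if shinfo.dataref > 0:
        return False
    if shinfo.callback is not None:
        # The lower device is done with the buffer: tell userspace once,
        # then clear the fields so the callback cannot fire again.
        shinfo.callback(owner)
        shinfo.callback = None
        shinfo.arg = None
    return True


def can_recycle(shinfo):
    """A buffer that still belongs to userspace must not be recycled."""
    return shinfo.callback is None


released = []
info = SharedInfo()
info.callback = released.append
info.dataref = 2                            # a clone also holds a reference
recyclable_before = can_recycle(info)
first = release_data("skb-1", info)         # clone still alive, no callback
second = release_data("skb-1", info)        # last reference, callback fires
```

Clearing the callback after it runs is what keeps the notification to exactly one per buffer in this model.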
Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte
On Wed, 20 Apr 2011 14:18:12 +0300
Avi Kivity wrote:

> Correct. The reason I don't want the helper, is so we can use ptep_user
> in both places (not for efficiency, just to make sure it's exactly the
> same value).
>

Thank you for your explanation, now I've got the picture!

I will send a new patch taking into account your advice.

Takuya

> > The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a
> > fix soon.
>
Re: [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too
On Wed, 20 Apr 2011 15:33:16 +0200
"Roedel, Joerg" wrote:

> @@ -245,13 +257,17 @@ walk:
>  			goto error;
>
>  		if (write_fault && !is_dirty_gpte(pte)) {
> -			bool ret;
> +			int ret;
>
>  			trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
> -			ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
> +			ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn, index, pte,
>  				    pte|PT_DIRTY_MASK);
> -			if (ret)
> +			if (ret < 0) {
> +				present = false;
> +				goto error;
> +			} if (ret)
>  				goto walk;

Preferably else if or another line ? :)

Takuya

> +
>  			mark_page_dirty(vcpu->kvm, table_gfn);
>  			pte |= PT_DIRTY_MASK;
>  			walker->ptes[walker->level - 1] = pte;
> --
> 1.7.1
>
Re: [PATCH V3 5/8] Enable cxgb3 to support zerocopy
On Wed, 2011-04-20 at 13:58 -0700, Shirley Ma wrote:
> This flag can be ON when HIGHDMA and scatter/gather are supported. I
> will modify the patch to make it conditional.

I double-checked: it only needs the HIGHDMA condition, not scatter/gather.

thanks
Shirley
Re: [PATCH V3 5/8] Enable cxgb3 to support zerocopy
On Wed, 2011-04-20 at 13:52 -0700, Dimitris Michailidis wrote:
> The features handling has been reworked in net-next and patches like
> this won't apply as the code you're patching has changed. Also core
> code now does a lot of the related work and you'll need to tell it
> what to do with any new flags.

Ok, will do.

> What properties does a device or driver need to meet to set this flag?
> It seems to be set a bit too unconditionally. For example, does it
> work if one disables scatter/gather?

This flag can be ON when HIGHDMA and scatter/gather are supported. I
will modify the patch to make it conditional.

thanks
Shirley
Re: [PATCH V3 5/8] Enable cxgb3 to support zerocopy
On 04/20/2011 01:13 PM, Shirley Ma wrote:
> Signed-off-by: Shirley Ma
> ---
>
>  drivers/net/cxgb3/cxgb3_main.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
> index 9108931..93a1101 100644
> --- a/drivers/net/cxgb3/cxgb3_main.c
> +++ b/drivers/net/cxgb3/cxgb3_main.c
> @@ -3313,7 +3313,7 @@ static int __devinit init_one(struct pci_dev *pdev,
>  		netdev->features |= NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
>  		netdev->features |= NETIF_F_GRO;
>  		if (pci_using_dac)
> -			netdev->features |= NETIF_F_HIGHDMA;
> +			netdev->features |= NETIF_F_HIGHDMA | NETIF_F_ZEROCOPY;
>
>  		netdev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
>  		netdev->netdev_ops = &cxgb_netdev_ops;

The features handling has been reworked in net-next and patches like this
won't apply as the code you're patching has changed. Also core code now
does a lot of the related work and you'll need to tell it what to do with
any new flags.

What properties does a device or driver need to meet to set this flag? It
seems to be set a bit too unconditionally. For example, does it work if one
disables scatter/gather?
[PATCH v3 3/3] KVM: Use pci_store/load_saved_state() around VM device usage
Store the device saved state so that we can reload the device back to the original state when it's unassigned. This has the benefit that the state survives across pci_reset_function() calls via the PCI sysfs reset interface while the VM is using the device. Signed-off-by: Alex Williamson --- include/linux/kvm_host.h |1 + virt/kvm/assigned-dev.c | 18 ++ 2 files changed, 15 insertions(+), 4 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ab42855..9272db0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -513,6 +513,7 @@ struct kvm_assigned_dev_kernel { struct kvm *kvm; spinlock_t intx_lock; char irq_name[32]; + struct pci_saved_state *pci_saved_state; }; struct kvm_irq_mask_notifier { diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index ae72ae6..6cc4b97 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -197,8 +197,13 @@ static void kvm_free_assigned_device(struct kvm *kvm, { kvm_free_assigned_irq(kvm, assigned_dev); - __pci_reset_function(assigned_dev->dev); - pci_restore_state(assigned_dev->dev); + pci_reset_function(assigned_dev->dev); + if (pci_load_and_free_saved_state(assigned_dev->dev, + &assigned_dev->pci_saved_state)) + printk(KERN_INFO "%s: Couldn't reload %s saved state\n", + __func__, dev_name(&assigned_dev->dev->dev)); + else + pci_restore_state(assigned_dev->dev); pci_release_regions(assigned_dev->dev); pci_disable_device(assigned_dev->dev); @@ -516,7 +521,10 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm, pci_reset_function(dev); pci_save_state(dev); - + match->pci_saved_state = pci_store_saved_state(dev); + if (!match->pci_saved_state) + printk(KERN_DEBUG "%s: Couldn't store %s saved state\n", + __func__, dev_name(&dev->dev)); match->assigned_dev_id = assigned_dev->assigned_dev_id; match->host_segnr = assigned_dev->segnr; match->host_busnr = assigned_dev->busnr; @@ -546,7 +554,9 @@ out: mutex_unlock(&kvm->lock); return r; out_list_del: - 
pci_restore_state(dev); + if (pci_load_and_free_saved_state(dev, &match->pci_saved_state)) + printk(KERN_INFO "%s: Couldn't reload %s saved state\n", + __func__, dev_name(&dev->dev)); list_del(&match->list); pci_release_regions(dev); out_disable: -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/3] PCI: Add interfaces to store and load the device saved state
For KVM device assignment, we'd like to save off the state of a device prior to passing it to the guest and restore it later. We also want to allow pci_reset_function() to be called while the device is owned by the guest. This however overwrites and invalidates the struct pci_dev buffers, so we can't just manually call save and restore. Add generic interfaces for the saved state to be stored and reloaded back into struct pci_dev at a later time. Signed-off-by: Alex Williamson --- drivers/pci/pci.c | 98 +++ include/linux/pci.h |4 ++ 2 files changed, 102 insertions(+), 0 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index d2500a0..7631acf 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -976,6 +976,104 @@ void pci_restore_state(struct pci_dev *dev) dev->state_saved = false; } +struct pci_saved_state { + u32 config_space[16]; + struct pci_cap_saved cap_saved[0]; +}; + +/** + * pci_store_saved_state - Allocate and return an opaque struct containing + *the device saved state. + * @dev: PCI device that we're dealing with + * + * Return NULL if no state or error. 
+ */ +struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev) +{ + struct pci_saved_state *state; + struct pci_cap_saved_state *tmp; + struct pci_cap_saved *cap_saved; + struct hlist_node *pos; + size_t size; + + if (!dev->state_saved) + return NULL; + + size = sizeof(*state) + sizeof(struct pci_cap_saved); + + hlist_for_each_entry(tmp, pos, &dev->saved_cap_space, next) + size += sizeof(struct pci_cap_saved) + tmp->saved.size; + + state = kzalloc(size, GFP_KERNEL); + if (!state) + return NULL; + + memcpy(state->config_space, dev->saved_config_space, + sizeof(state->config_space)); + + cap_saved = state->cap_saved; + hlist_for_each_entry(tmp, pos, &dev->saved_cap_space, next) { + size_t len = sizeof(struct pci_cap_saved) + tmp->saved.size; + memcpy(cap_saved, &tmp->saved, len); + cap_saved = (struct pci_cap_saved *)((u8 *)cap_saved + len); + } + /* Empty cap_save terminates list */ + + return state; +} +EXPORT_SYMBOL_GPL(pci_store_saved_state); + +/** + * pci_load_saved_state - Reload the provided save state into struct pci_dev. 
+ * @dev: PCI device that we're dealing with + * @state: Saved state returned from pci_store_saved_state() + */ +int pci_load_saved_state(struct pci_dev *dev, struct pci_saved_state *state) +{ + struct pci_cap_saved *cap_saved; + + dev->state_saved = false; + + if (!state) + return 0; + + memcpy(dev->saved_config_space, state->config_space, + sizeof(state->config_space)); + + cap_saved = state->cap_saved; + while (cap_saved->size) { + struct pci_cap_saved_state *tmp; + + tmp = pci_find_saved_cap(dev, cap_saved->cap_nr); + if (!tmp || tmp->saved.size != cap_saved->size) + return -EINVAL; + + memcpy(tmp->saved.data, cap_saved->data, tmp->saved.size); + cap_saved = (struct pci_cap_saved *)((u8 *)cap_saved + + sizeof(struct pci_cap_saved) + cap_saved->size); + } + + dev->state_saved = true; + return 0; +} +EXPORT_SYMBOL_GPL(pci_load_saved_state); + +/** + * pci_load_and_free_saved_state - Reload the save state pointed to by state, + *and free the memory allocated for it. + * @dev: PCI device that we're dealing with + * @state: Pointer to saved state returned from pci_store_saved_state() + */ +int pci_load_and_free_saved_state(struct pci_dev *dev, + struct pci_saved_state **state) +{ + int ret = pci_load_saved_state(dev, *state); + kfree(*state); + *state = NULL; + return ret; +} +EXPORT_SYMBOL_GPL(pci_load_and_free_saved_state); + static int do_pci_enable_device(struct pci_dev *dev, int bars) { int err; diff --git a/include/linux/pci.h b/include/linux/pci.h index 46fd382..f2a6262 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -812,6 +812,10 @@ size_t pci_get_rom_size(struct pci_dev *pdev, void __iomem *rom, size_t size); /* Power management related routines */ int pci_save_state(struct pci_dev *dev); void pci_restore_state(struct pci_dev *dev); +struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev); +int pci_load_saved_state(struct pci_dev *dev, struct pci_saved_state *state); +int pci_load_and_free_saved_state(struct pci_dev *dev, + struct 
pci_saved_state **state); int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state); int pci_set_power_state(struct pci_dev *dev, pci_power_t state); pci_power_t pci_choose_state(struc
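The buffer built by pci_store_saved_state() is a run of variable-length capability records ended by an empty record. The following is a minimal sketch of that layout idea only: store and load are hypothetical names, and the 5-byte header (capability number, data size) is a toy format without the padding a C compiler would add to struct pci_cap_saved.

```python
import struct

# Toy record header: cap_nr (1 byte) and size (4 bytes), little-endian,
# no padding. This is not the in-kernel layout.
HEADER = struct.Struct("<BI")


def store(caps):
    """Pack (cap_nr, data) pairs back to back, ending with an empty header."""
    out = bytearray()
    for cap_nr, data in caps:
        out += HEADER.pack(cap_nr, len(data)) + data
    out += HEADER.pack(0, 0)  # a record with size 0 terminates the list
    return bytes(out)


def load(blob):
    """Walk the packed records until the zero-size terminator."""
    caps = []
    offset = 0
    while True:
        cap_nr, size = HEADER.unpack_from(blob, offset)
        if size == 0:
            break
        offset += HEADER.size
        caps.append((cap_nr, blob[offset:offset + size]))
        offset += size
    return caps


# Example records: 0x10 is the PCI Express capability ID, 0x07 is PCI-X.
saved_caps = [(0x10, b"\x01\x02\x03\x04"), (0x07, b"\xaa\xbb")]
blob = store(saved_caps)
restored = load(blob)
```

The patch additionally checks each record's size against the device's existing save buffer and fails with -EINVAL on a mismatch; that validation is left out of this sketch.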
[PATCH v3 1/3] PCI: Track the size of each saved capability data area
This will allow us to store and load it later. Signed-off-by: Alex Williamson --- drivers/pci/pci.c | 12 +++- include/linux/pci.h | 11 --- 2 files changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 2472e71..d2500a0 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -830,7 +830,7 @@ static int pci_save_pcie_state(struct pci_dev *dev) dev_err(&dev->dev, "buffer not found in %s\n", __func__); return -ENOMEM; } - cap = (u16 *)&save_state->data[0]; + cap = (u16 *)&save_state->saved.data[0]; pci_read_config_word(dev, pos + PCI_EXP_FLAGS, &flags); @@ -863,7 +863,7 @@ static void pci_restore_pcie_state(struct pci_dev *dev) pos = pci_find_capability(dev, PCI_CAP_ID_EXP); if (!save_state || pos <= 0) return; - cap = (u16 *)&save_state->data[0]; + cap = (u16 *)&save_state->saved.data[0]; pci_read_config_word(dev, pos + PCI_EXP_FLAGS, &flags); @@ -899,7 +899,8 @@ static int pci_save_pcix_state(struct pci_dev *dev) return -ENOMEM; } - pci_read_config_word(dev, pos + PCI_X_CMD, (u16 *)save_state->data); + pci_read_config_word(dev, pos + PCI_X_CMD, +(u16 *)save_state->saved.data); return 0; } @@ -914,7 +915,7 @@ static void pci_restore_pcix_state(struct pci_dev *dev) pos = pci_find_capability(dev, PCI_CAP_ID_PCIX); if (!save_state || pos <= 0) return; - cap = (u16 *)&save_state->data[0]; + cap = (u16 *)&save_state->saved.data[0]; pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]); } @@ -1771,7 +1772,8 @@ static int pci_add_cap_save_buffer( if (!save_state) return -ENOMEM; - save_state->cap_nr = cap; + save_state->saved.cap_nr = cap; + save_state->saved.size = size; pci_add_saved_cap(dev, save_state); return 0; diff --git a/include/linux/pci.h b/include/linux/pci.h index 96f70d7..46fd382 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -214,12 +214,17 @@ enum pci_bus_speed { PCI_SPEED_UNKNOWN = 0xff, }; -struct pci_cap_saved_state { - struct hlist_node next; +struct pci_cap_saved { char cap_nr; + unsigned int 
size; u32 data[0]; }; +struct pci_cap_saved_state { + struct hlist_node next; + struct pci_cap_saved saved; +}; + struct pcie_link_state; struct pci_vpd; struct pci_sriov; @@ -366,7 +371,7 @@ static inline struct pci_cap_saved_state *pci_find_saved_cap( struct hlist_node *pos; hlist_for_each_entry(tmp, pos, &pci_dev->saved_cap_space, next) { - if (tmp->cap_nr == cap) + if (tmp->saved.cap_nr == cap) return tmp; } return NULL; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
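The reason for recording `size` next to `data[]` can be sketched in userspace C. This is a minimal model under stated assumptions, not the kernel code: `cap_saved_alloc()` and `cap_saved_dup()` are hypothetical helpers, and `uint32_t data[]` stands in for the kernel's `u32 data[0]`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model of the patch's struct pci_cap_saved: a small header
 * followed by `size` bytes of capability data. */
struct cap_saved {
	char cap_nr;
	unsigned int size;	/* length of data[] in bytes */
	uint32_t data[];	/* C99 flexible array member */
};

/* Hypothetical helper: allocate a zeroed record with room for `size` bytes. */
static struct cap_saved *cap_saved_alloc(char cap_nr, unsigned int size)
{
	struct cap_saved *s = calloc(1, sizeof(*s) + size);

	if (!s)
		return NULL;
	s->cap_nr = cap_nr;
	s->size = size;
	return s;
}

/* Hypothetical helper: duplicate a record. The copy length is known only
 * because the record carries its own size. */
static struct cap_saved *cap_saved_dup(const struct cap_saved *src)
{
	struct cap_saved *copy;

	assert(src != NULL);
	copy = malloc(sizeof(*src) + src->size);
	if (copy)
		memcpy(copy, src, sizeof(*src) + src->size);
	return copy;
}
```

Without the `size` field a generic copy routine would need a per-capability table of lengths, which is the kind of maintenance a self-describing record avoids.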
[PATCH v3 0/3] Store and load PCI device saved state across function resets
v2 -> v3:
Saved structure has variable contents. Avi, see if this adds any credibility to the pci-core allocated opaque buffer. It was wrong in the previous versions to distill the variable device capability save list into a fixed struct. This should also eliminate any future maintenance specific to this storing and loading of state as capability save changes.

v1 -> v2:
Make the pointer passed around less opaque for type safety.

Bug https://bugs.launchpad.net/qemu/+bug/754591 is caused because the KVM module attempts to do a pci_save_state() before assigning the device to a VM, expecting that the saved state will remain valid until we release the device. This conflicts with our need to reset devices using PCI sysfs during a VM reset to quiesce the device. Any call to pci_reset_function() will overwrite the device saved state prior to reset, and reload and invalidate the state after. KVM then ends up trying to restore the state, but it's already invalid, so the device ends up with reset values. This series adds a mechanism to pull the saved state off the struct pci_dev and reload it later.

Thanks,

Alex
---

Alex Williamson (3):
      KVM: Use pci_store/load_saved_state() around VM device usage
      PCI: Add interfaces to store and load the device saved state
      PCI: Track the size of each saved capability data area

 drivers/pci/pci.c        |  110 --
 include/linux/kvm_host.h |    1
 include/linux/pci.h      |   15 +-
 virt/kvm/assigned-dev.c  |   18 ++--
 4 files changed, 132 insertions(+), 12 deletions(-)
Re: [PATCH V3 2/8] Add a new zerocopy device flag
Resubmit it with bit 31.

Signed-off-by: Shirley Ma
---
 include/linux/netdevice.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0249fe7..0998d3d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1067,6 +1067,9 @@ struct net_device {
 #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
 #define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
 
+/* Bit 31 is for device to map userspace buffers -- zerocopy */
+#define NETIF_F_ZEROCOPY	(1 << 31)
+
 /* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
 #define NETIF_F_GSO_MASK	0x00ff
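Picking a free feature bit by hand is easy to get wrong, and bit 31 is the sign bit of a 32-bit int. A minimal sketch of how a collision could be caught at compile time is below. Assumptions: the two existing flag values are copied from the quoted context, `KNOWN_FEATURES` and `supports_zerocopy()` are hypothetical names, and the constants are written as `1U << n` so that the shift into bit 31 is well defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted netdevice.h context. */
#define NETIF_F_RXHASH		(1U << 28)
#define NETIF_F_RXCSUM		(1U << 29)
/* Candidate bit from this resubmission, written as an unsigned constant. */
#define NETIF_F_ZEROCOPY	(1U << 31)

/* Hypothetical mask of the flags listed above; a real check would have
 * to cover every NETIF_F_* bit in the tree being patched. */
#define KNOWN_FEATURES		(NETIF_F_RXHASH | NETIF_F_RXCSUM)

/* Fails the build if the new flag reuses an existing bit. */
static_assert((NETIF_F_ZEROCOPY & KNOWN_FEATURES) == 0,
	      "NETIF_F_ZEROCOPY collides with an existing feature bit");

/* Test the flag the way a driver would test its lower device. */
static int supports_zerocopy(uint32_t features)
{
	return (features & NETIF_F_ZEROCOPY) != 0;
}
```

With `NETIF_F_ZEROCOPY` defined as `(1U << 29)` the static assertion would fail, since that value is already used by `NETIF_F_RXCSUM`.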
Re: [PATCH V3 2/8] Add a new zerocopy device flag
On Wed, 2011-04-20 at 13:24 -0700, Dimitris Michailidis wrote:
> Bit 30 is also taken in net-next.

How about 31?

Thanks
Shirley
[PATCH V3 6/8] macvtap/vhost TX zero copy support
Only when buffer size is greater than GOODCOPY_LEN (128), macvtap enables zero-copy. Signed-off-by: Shirley Ma --- drivers/net/macvtap.c | 124 - 1 files changed, 112 insertions(+), 12 deletions(-) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 6696e56..b4e6656 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -60,6 +60,7 @@ static struct proto macvtap_proto = { */ static dev_t macvtap_major; #define MACVTAP_NUM_DEVS 65536 +#define GOODCOPY_LEN (L1_CACHE_BYTES < 128 ? 128 : L1_CACHE_BYTES) static struct class *macvtap_class; static struct cdev macvtap_cdev; @@ -340,6 +341,7 @@ static int macvtap_open(struct inode *inode, struct file *file) { struct net *net = current->nsproxy->net_ns; struct net_device *dev = dev_get_by_index(net, iminor(inode)); + struct macvlan_dev *vlan = netdev_priv(dev); struct macvtap_queue *q; int err; @@ -369,6 +371,16 @@ static int macvtap_open(struct inode *inode, struct file *file) q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP; q->vnet_hdr_sz = sizeof(struct virtio_net_hdr); + /* +* so far only VM uses macvtap, enable zero copy between guest +* kernel and host kernel when lower device supports high memory +* DMA +*/ + if (vlan) { + if (vlan->lowerdev->features & NETIF_F_ZEROCOPY) + sock_set_flag(&q->sk, SOCK_ZEROCOPY); + } + err = macvtap_set_queue(dev, file, q); if (err) sock_put(&q->sk); @@ -433,6 +445,80 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad, return skb; } +/* set skb frags from iovec, this can move to core network code for reuse */ +static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from, + int offset, size_t count) +{ + int len = iov_length(from, count) - offset; + int copy = skb_headlen(skb); + int size, offset1 = 0; + int i = 0; + skb_frag_t *f; + + /* Skip over from offset */ + while (offset >= from->iov_len) { + offset -= from->iov_len; + ++from; + --count; + } + + /* copy up to skb headlen */ + while (copy > 0) { + size = 
min_t(unsigned int, copy, from->iov_len - offset); + if (copy_from_user(skb->data + offset1, from->iov_base + offset, + size)) + return -EFAULT; + if (copy > size) { + ++from; + --count; + } + copy -= size; + offset1 += size; + offset = 0; + } + + if (len == offset1) + return 0; + + while (count--) { + struct page *page[MAX_SKB_FRAGS]; + int num_pages; + unsigned long base; + + len = from->iov_len - offset1; + if (!len) { + offset1 = 0; + ++from; + continue; + } + base = (unsigned long)from->iov_base + offset1; + size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT; + num_pages = get_user_pages_fast(base, size, 0, &page[i]); + if ((num_pages != size) || + (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags)) + /* put_page is in skb free */ + return -EFAULT; + skb->data_len += len; + skb->len += len; + skb->truesize += len; + while (len) { + f = &skb_shinfo(skb)->frags[i]; + f->page = page[i]; + f->page_offset = base & ~PAGE_MASK; + f->size = min_t(int, len, PAGE_SIZE - f->page_offset); + skb_shinfo(skb)->nr_frags++; + /* increase sk_wmem_alloc */ + atomic_add(f->size, &skb->sk->sk_wmem_alloc); + base += f->size; + len -= f->size; + i++; + } + offset1 = 0; + ++from; + } + return 0; +} + /* * macvtap_skb_from_vnet_hdr and macvtap_skb_to_vnet_hdr should * be shared with the tun/tap driver. @@ -515,17 +601,19 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb, /* Get packet from user space buffer */ -static ssize_t macvtap_get_user(struct macvtap_queue *q, - const struct iovec *iv, size_t count, - int noblock) +static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m, + const struct iovec *iv, unsigned long total_len, +
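The page arithmetic in zerocopy_sg_from_iovec() can be checked in isolation. The sketch below is a userspace model, assuming 4 KiB pages; `pages_spanned()` and `first_frag_size()` are hypothetical helper names that mirror the `size = ...` expression and the `f->size = min_t(...)` expression in the patch.

```c
#include <assert.h>

/* Assumed 4 KiB pages; the kernel takes these from the architecture. */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))

/* Number of pages touched by the byte range [base, base + len).
 * Empty ranges are excluded, as the patch skips them before this step. */
static unsigned long pages_spanned(unsigned long base, unsigned long len)
{
	assert(len > 0);
	return ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
}

/* Bytes of the range that fall into its first page. */
static unsigned long first_frag_size(unsigned long base, unsigned long len)
{
	unsigned long room = PAGE_SIZE - (base & ~PAGE_MASK);

	return len < room ? len : room;
}
```

A 32-byte buffer starting 16 bytes before a page boundary spans two pages and contributes 16 bytes to its first fragment, which is why the fragment count depends on alignment as well as length.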
Re: [PATCH V3 2/8] Add a new zerocopy device flag
Thanks. I need to update it to bit 30.

Shirley
Re: [PATCH V3 2/8] Add a new zerocopy device flag
On 04/20/2011 01:09 PM, Shirley Ma wrote: Resubmit this patch with the new bit. Bit 30 is also taken in net-next. Signed-off-by: Shirley Ma --- include/linux/netdevice.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0249fe7..0998d3d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1067,6 +1067,9 @@ struct net_device { #define NETIF_F_RXHASH (1 << 28) /* Receive hashing offload */ #define NETIF_F_RXCSUM (1 << 29) /* Receive checksumming offload */ +/* Bit 30 is for device to map userspace buffers -- zerocopy */ +#define NETIF_F_ZEROCOPY (1 << 30) + /* Segmentation offload features */ #define NETIF_F_GSO_SHIFT 16 #define NETIF_F_GSO_MASK 0x00ff -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V3 0/8] macvtap/vhost TX zero copy support
Only when buffer size is greater than GOODCOPY_LEN (128), macvtap enables zero-copy. Signed-off-by: Shirley MA --- drivers/net/macvtap.c | 124 - 1 files changed, 112 insertions(+), 12 deletions(-) diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c index 6696e56..b4e6656 100644 --- a/drivers/net/macvtap.c +++ b/drivers/net/macvtap.c @@ -60,6 +60,7 @@ static struct proto macvtap_proto = { */ static dev_t macvtap_major; #define MACVTAP_NUM_DEVS 65536 +#define GOODCOPY_LEN (L1_CACHE_BYTES < 128 ? 128 : L1_CACHE_BYTES) static struct class *macvtap_class; static struct cdev macvtap_cdev; @@ -340,6 +341,7 @@ static int macvtap_open(struct inode *inode, struct file *file) { struct net *net = current->nsproxy->net_ns; struct net_device *dev = dev_get_by_index(net, iminor(inode)); + struct macvlan_dev *vlan = netdev_priv(dev); struct macvtap_queue *q; int err; @@ -369,6 +371,16 @@ static int macvtap_open(struct inode *inode, struct file *file) q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP; q->vnet_hdr_sz = sizeof(struct virtio_net_hdr); + /* +* so far only VM uses macvtap, enable zero copy between guest +* kernel and host kernel when lower device supports high memory +* DMA +*/ + if (vlan) { + if (vlan->lowerdev->features & NETIF_F_ZEROCOPY) + sock_set_flag(&q->sk, SOCK_ZEROCOPY); + } + err = macvtap_set_queue(dev, file, q); if (err) sock_put(&q->sk); @@ -433,6 +445,80 @@ static inline struct sk_buff *macvtap_alloc_skb(struct sock *sk, size_t prepad, return skb; } +/* set skb frags from iovec, this can move to core network code for reuse */ +static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from, + int offset, size_t count) +{ + int len = iov_length(from, count) - offset; + int copy = skb_headlen(skb); + int size, offset1 = 0; + int i = 0; + skb_frag_t *f; + + /* Skip over from offset */ + while (offset >= from->iov_len) { + offset -= from->iov_len; + ++from; + --count; + } + + /* copy up to skb headlen */ + while (copy > 0) { + size = 
min_t(unsigned int, copy, from->iov_len - offset); + if (copy_from_user(skb->data + offset1, from->iov_base + offset, + size)) + return -EFAULT; + if (copy > size) { + ++from; + --count; + } + copy -= size; + offset1 += size; + offset = 0; + } + + if (len == offset1) + return 0; + + while (count--) { + struct page *page[MAX_SKB_FRAGS]; + int num_pages; + unsigned long base; + + len = from->iov_len - offset1; + if (!len) { + offset1 = 0; + ++from; + continue; + } + base = (unsigned long)from->iov_base + offset1; + size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT; + num_pages = get_user_pages_fast(base, size, 0, &page[i]); + if ((num_pages != size) || + (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags)) + /* put_page is in skb free */ + return -EFAULT; + skb->data_len += len; + skb->len += len; + skb->truesize += len; + while (len) { + f = &skb_shinfo(skb)->frags[i]; + f->page = page[i]; + f->page_offset = base & ~PAGE_MASK; + f->size = min_t(int, len, PAGE_SIZE - f->page_offset); + skb_shinfo(skb)->nr_frags++; + /* increase sk_wmem_alloc */ + atomic_add(f->size, &skb->sk->sk_wmem_alloc); + base += f->size; + len -= f->size; + i++; + } + offset1 = 0; + ++from; + } + return 0; +} + /* * macvtap_skb_from_vnet_hdr and macvtap_skb_to_vnet_hdr should * be shared with the tun/tap driver. @@ -515,17 +601,19 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff *skb, /* Get packet from user space buffer */ -static ssize_t macvtap_get_user(struct macvtap_queue *q, - const struct iovec *iv, size_t count, - int noblock) +static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m, + const struct iovec *iv, unsigned long total_len, +
[PATCH V3 8/8] Enable benet to support zerocopy
Signed-off-by: Shirley Ma
---
 drivers/net/benet/be_main.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 7cb5a11..d7b7254 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -2982,6 +2982,7 @@ static int __devinit be_probe(struct pci_dev *pdev,
 	status = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
 	if (!status) {
 		netdev->features |= NETIF_F_HIGHDMA;
+		netdev->features |= NETIF_F_ZEROCOPY;
 	} else {
 		status = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
 		if (status) {
[PATCH V3 5/8] Enable cxgb3 to support zerocopy
Signed-off-by: Shirley Ma
---
 drivers/net/cxgb3/cxgb3_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 9108931..93a1101 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -3313,7 +3313,7 @@ static int __devinit init_one(struct pci_dev *pdev,
 	netdev->features |= NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
 	netdev->features |= NETIF_F_GRO;
 	if (pci_using_dac)
-		netdev->features |= NETIF_F_HIGHDMA;
+		netdev->features |= NETIF_F_HIGHDMA | NETIF_F_ZEROCOPY;
 
 	netdev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
 	netdev->netdev_ops = &cxgb_netdev_ops;
Re: [PATCH V3 2/8] Add a new zerocopy device flag
Resubmit this patch with the new bit.

Signed-off-by: Shirley Ma
---
 include/linux/netdevice.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0249fe7..0998d3d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1067,6 +1067,9 @@ struct net_device {
 #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
 #define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
 
+/* Bit 30 is for device to map userspace buffers -- zerocopy */
+#define NETIF_F_ZEROCOPY	(1 << 30)
+
 /* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
 #define NETIF_F_GSO_MASK	0x00ff
[PATCH V3 4/8] vhost TX zero copy support
This patch maintains the outstanding userspace buffers in the sequence in which they are delivered to vhost. The outstanding userspace buffers will be marked as done once the lower device buffers' DMA has finished. This is monitored through the last reference of the kfree_skb callback. Two buffer indices are used for this purpose.

The vhost passes the userspace buffers info to the lower device skb through message control. There may already be some completed DMAs when entering vhost handle_tx. In the worst case all buffers in the vq are in pending/done status, so we need to notify the guest to release DMA done buffers first before getting any new buffers from the vq.

Signed-off-by: Shirley --- drivers/vhost/net.c | 30 +++- drivers/vhost/vhost.c | 50 - drivers/vhost/vhost.h | 10 + 3 files changed, 87 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 2f7c76a..1bc4536 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -32,6 +32,8 @@ * Using this limit prevents one virtqueue from starving others. */ #define VHOST_NET_WEIGHT 0x8 +#define MAX_ZEROCOPY_PEND 64 + enum { VHOST_NET_VQ_RX = 0, VHOST_NET_VQ_TX = 1, @@ -129,6 +131,7 @@ static void handle_tx(struct vhost_net *net) int err, wmem; size_t hdr_size; struct socket *sock; + struct skb_ubuf_info pend; /* TODO: check that we are running from vhost_worker? 
*/ sock = rcu_dereference_check(vq->private_data, 1); @@ -151,6 +154,10 @@ static void handle_tx(struct vhost_net *net) hdr_size = vq->vhost_hlen; for (;;) { + /* Release DMAs done buffers first */ + if (sock_flag(sock->sk, SOCK_ZEROCOPY)) + vhost_zerocopy_signal_used(vq); + head = vhost_get_vq_desc(&net->dev, vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, @@ -166,6 +173,12 @@ static void handle_tx(struct vhost_net *net) set_bit(SOCK_ASYNC_NOSPACE, &sock->flags); break; } + /* If more outstanding DMAs, queue the work */ + if (sock_flag(sock->sk, SOCK_ZEROCOPY) && + (atomic_read(&vq->refcnt) > MAX_ZEROCOPY_PEND)) { + vhost_poll_queue(&vq->poll); + break; + } if (unlikely(vhost_enable_notify(vq))) { vhost_disable_notify(vq); continue; @@ -188,17 +201,30 @@ static void handle_tx(struct vhost_net *net) iov_length(vq->hdr, s), hdr_size); break; } + /* use msg_control to pass vhost zerocopy ubuf info to skb */ + if (sock_flag(sock->sk, SOCK_ZEROCOPY)) { + pend.callback = vhost_zerocopy_callback; + pend.arg = vq; + pend.desc = vq->upend_idx; + msg.msg_control = &pend; + msg.msg_controllen = sizeof(pend); + vq->heads[vq->upend_idx].id = head; + vq->upend_idx = (vq->upend_idx + 1) % UIO_MAXIOV; + atomic_inc(&vq->refcnt); + } /* TODO: Check specific error and bomb out unless ENOBUFS? 
*/ err = sock->ops->sendmsg(NULL, sock, &msg, len); if (unlikely(err < 0)) { - vhost_discard_vq_desc(vq, 1); + if (!sock_flag(sock->sk, SOCK_ZEROCOPY)) + vhost_discard_vq_desc(vq, 1); tx_poll_start(net, sock); break; } if (err != len) pr_debug("Truncated TX packet: " " len %d != %zd\n", err, len); - vhost_add_used_and_signal(&net->dev, vq, head, 0); + if (!sock_flag(sock->sk, SOCK_ZEROCOPY)) + vhost_add_used_and_signal(&net->dev, vq, head, 0); total_len += len; if (unlikely(total_len >= VHOST_NET_WEIGHT)) { vhost_poll_queue(&vq->poll); diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 2ab2912..09bcb1d 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -174,6 +174,9 @@ static void vhost_vq_reset(struct vhost_dev *dev, vq->call_ctx = NULL; vq->call = NULL; vq->log_ctx = NULL; + vq->upend_idx = 0; + vq->done_idx = 0; + atomic_set(&vq->refcnt, 0); } static int vhost_worker(void *data) @@ -230,7 +233,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
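The two-index bookkeeping described in the commit message can be modelled in a few lines of single-threaded C. This is only a sketch under stated assumptions: the completion side of the patch is not visible in this message, so `pend_complete()` and `pend_signal_used()` are hypothetical, `RING_SIZE` stands in for UIO_MAXIOV, and a plain counter stands in for the atomic `refcnt`.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8	/* stand-in for UIO_MAXIOV */

struct pend_ring {
	unsigned int head_id[RING_SIZE];	/* descriptor head per slot */
	bool dma_done[RING_SIZE];		/* set by the completion callback */
	unsigned int upend_idx;			/* next slot to hand to the device */
	unsigned int done_idx;			/* oldest slot not yet reported */
	unsigned int pending;			/* outstanding slots */
};

/* Record a buffer handed to the lower device.
 * Returns the slot used, or -1 if every slot is still outstanding. */
static int pend_submit(struct pend_ring *r, unsigned int head)
{
	unsigned int slot = r->upend_idx;

	if (r->pending == RING_SIZE)
		return -1;
	r->head_id[slot] = head;
	r->dma_done[slot] = false;
	r->upend_idx = (slot + 1) % RING_SIZE;
	r->pending++;
	return (int)slot;
}

/* Completion callback: DMA for this slot has finished. */
static void pend_complete(struct pend_ring *r, unsigned int slot)
{
	assert(slot < RING_SIZE);
	r->dma_done[slot] = true;
}

/* Report finished buffers in submission order, stopping at the first
 * slot that is still in flight. Returns the number of heads written. */
static unsigned int pend_signal_used(struct pend_ring *r,
				     unsigned int *out, unsigned int max)
{
	unsigned int n = 0;

	while (r->pending && n < max && r->dma_done[r->done_idx]) {
		out[n++] = r->head_id[r->done_idx];
		r->done_idx = (r->done_idx + 1) % RING_SIZE;
		r->pending--;
	}
	return n;
}
```

Because buffers are reported strictly in order, a slow DMA on the oldest slot holds back newer completions; that trade-off keeps the bookkeeping down to two indices.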
Re: [PATCH V3 2/8] Add a new zerocopy device flag
On Wed, 2011-04-20 at 12:44 -0700, Shirley Ma wrote:
> This zerocopy flag is used to support device DMA userspace buffers.
>
> Signed-off-by: Shirley Ma
> ---
>
>  include/linux/netdevice.h |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0249fe7..0998d3d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1067,6 +1067,9 @@ struct net_device {
>  #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
>  #define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
>
> +/* bit 29 is for device to map userspace buffers -- zerocopy */
> +#define NETIF_F_ZEROCOPY	(1 << 29)

Look above.

Ben.

>  /* Segmentation offload features */
>  #define NETIF_F_GSO_SHIFT	16
>  #define NETIF_F_GSO_MASK	0x00ff

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
[PATCH V3 2/8] Add a new zerocopy device flag
This zerocopy flag is used to support device DMA userspace buffers.

Signed-off-by: Shirley Ma
---
 include/linux/netdevice.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0249fe7..0998d3d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1067,6 +1067,9 @@ struct net_device {
 #define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
 #define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
 
+/* bit 29 is for device to map userspace buffers -- zerocopy */
+#define NETIF_F_ZEROCOPY	(1 << 29)
+
 /* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
 #define NETIF_F_GSO_MASK	0x00ff
[PATCH V3 1/8] Add a new sock zerocopy flag
This sock zerocopy flag is used to support lower level device DMA userspace buffers.

Signed-off-by: Shirley Ma
---
 include/net/sock.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 01810a3..daa0a80 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -562,6 +562,7 @@ enum sock_flags {
 	SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
 	SOCK_FASYNC, /* fasync() active */
 	SOCK_RXQ_OVFL,
+	SOCK_ZEROCOPY,
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
[PATCH V3 0/8] macvtap/vhost TX zero copy support
This patchset adds support for TX zero-copy between guest and host kernel through vhost. It significantly reduces CPU utilization on the local host on which the guest is located (it reduced vhost thread CPU usage by 30-50% in the single stream test). The patchset is based on the previous submission and comments from the community regarding when/how to handle guest kernel buffers to be released. This is the simplest approach I can think of after comparing with several other solutions.

This patchset includes:

1/8: Add a new sock zero-copy flag, SOCK_ZEROCOPY;
2/8: Add a new device flag, NETIF_F_ZEROCOPY, for lower level devices that support zero-copy;
3/8: Add a new struct skb_ubuf_info in skb_shared_info for the userspace buffer release callback when the lower device DMA is done for that skb;
4/8: Add vhost zero-copy callback in vhost when skb last refcnt is gone; add vhost_zerocopy_add_used_and_signal to notify guest to release TX skb buffers.
5/8: Add macvtap zero-copy in lower device when the packet being sent is larger than 128 bytes.
6/8: Add Chelsio 10Gb NIC to zero copy feature flag
7/8: Add Intel 10Gb NIC zero copy feature flag
8/8: Add Emulex 10Gb NIC zero copy feature flag

The patchset is built against the most recent linux 2.6.git. It has passed the netperf/netserver multiple streams stress test on the above NICs.

The single stream test results from the 2.6.37 kernel on Chelsio:

64K message size: copy_from_user dropped from 40% to 5%; vhost thread CPU utilization dropped from 76% to 28%

I am collecting more test results against the 2.6.39-rc3 kernel and will provide the test matrix later.

Thanks
Shirley
[Bug 33762] Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel
https://bugzilla.kernel.org/show_bug.cgi?id=33762 Anton Kochkov changed: What|Removed |Added Kernel Version||2.6.38 -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 33762] Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel
https://bugzilla.kernel.org/show_bug.cgi?id=33762 --- Comment #3 from Anton Kochkov 2011-04-20 16:40:49 --- Created an attachment (id=54832) --> (https://bugzilla.kernel.org/attachment.cgi?id=54832) Dmesg output -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 33762] Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel
https://bugzilla.kernel.org/show_bug.cgi?id=33762 --- Comment #2 from Anton Kochkov 2011-04-20 16:38:36 --- http://lists.nongnu.org/archive/html/qemu-devel/2011-04/msg01547.html -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 33762] Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel
https://bugzilla.kernel.org/show_bug.cgi?id=33762 --- Comment #1 from Anton Kochkov 2011-04-20 16:38:17 --- Additional discussion in qemu-devel mailing list -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 33762] New: Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel
https://bugzilla.kernel.org/show_bug.cgi?id=33762 Summary: Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel Product: Virtualization Version: unspecified Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: kvm AssignedTo: virtualization_...@kernel-bugs.osdl.org ReportedBy: anton.koch...@gmail.com Regression: No Created an attachment (id=54822) --> (https://bugzilla.kernel.org/attachment.cgi?id=54822) Kernel CONFIG I'm using 2.6.38 kernel sources with grsecurity/PaX patches on Gentoo Hardened linux on Intel iCore7 x64 host. Example guest is Debian-6.0-amd64. Grecurity -> Security level -> Virtualization enabled starting qemu as qemu-kvm -net tap,ifname=tap1,script=no -net nic -monitor stdio -m 256 -d cpu,in_asm,exec -s -boot d -cdrom debian-minimal.iso -hda debian.qcow2 (qemu) info kvm kvm support: enabled (qemu) info cpus * CPU #0: pc=0x0010017c (halted) thread_id=4688 (qemu) info pci Bus 0, device 0, function 0: Host bridge: PCI device 8086:1237 id "" Bus 0, device 1, function 0: ISA bridge: PCI device 8086:7000 id "" Bus 0, device 1, function 1: IDE controller: PCI device 8086:7010 BAR4: I/O at 0xc000 [0xc00f]. id "" Bus 0, device 1, function 3: Bridge: PCI device 8086:7113 IRQ 9. id "" Bus 0, device 2, function 0: VGA controller: PCI device 1013:00b8 BAR0: 32 bit prefetchable memory at 0xf000 [0xf1ff]. BAR1: 32 bit memory at 0xf200 [0xf2000fff]. BAR6: 32 bit memory at 0x [0xfffe]. 
id "" (qemu) info status VM status: running (qemu) info roms fw=genroms/vapic.bin size=0x002400 name="vapic.bin" addr=fffe size=0x02 mem=rom name="bios.bin" (qemu) info registers EAX= EBX=00187130 ECX=00187130 EDX= ESI= EDI= EBP= ESP=0ffcfeac EIP=0010017c EFL=0246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=1 ES =0028 00c09300 DPL=0 DS [-WA] CS =0020 00c09b00 DPL=0 CS32 [-RA] SS =0028 00c09300 DPL=0 DS [-WA] DS =0028 00c09300 DPL=0 DS [-WA] FS = GS = LDT= TR =0008 0580 0067 8b00 DPL=0 TSS32-busy GDT= ab80 002f IDT= 30b8 07ff CR0=0013 CR2= CR3= CR4= DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER= FCW=037f FSW=0020 [ST=0] FTW=00 MXCSR=1f80 FPR0=f44d002c6000 400d FPR1=80847fe7 400e FPR2=fa007fa24000 400e FPR3=80e88055f000 400e FPR4=ea61009c4000 400d FPR5=ea62009c4000 400c FPR6=bb7fffb9b000 400b FPR7=bb83ffb9b000 400b XMM00= XMM01= XMM02= XMM03= XMM04= XMM05= XMM06= XMM07= My emerge --info: app-shells/bash: 4.2_p8 dev-lang/python: 2.7.1-r1, 3.1.3-r1 dev-util/cmake: 2.8.4 sys-apps/baselayout: 2.0.2 sys-apps/openrc: 0.8.1 sys-apps/sandbox:2.5 sys-devel/autoconf: 2.68 sys-devel/automake: 1.11.1-r1 sys-devel/binutils: 2.21 sys-devel/gcc: 4.5.2 sys-devel/gcc-config: 1.4.1 sys-devel/libtool: 2.4-r1 sys-devel/make: 3.82 sys-kernel/linux-headers: 2.6.38 virtual/os-headers: 2.6.38 (sys-kernel/linux-headers) ACCEPT_KEYWORDS="amd64 ~amd64" ACCEPT_LICENSE="* -@EULA" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-march=core2 -mtune=generic -O2 -pipe" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /var/bind" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo" CXXFLAGS="-march=core2 -mtune=generic -O2 -pipe" DISTDIR="/usr/portage/distfiles" FEATURES="assume-digests binpkg-logs distlocks fixlafiles fixpackages news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn 
unmerge-logs unmerge-orphans userfetch" FFLAGS="" GENTOO_MIRRORS="ftp://rush.tisys.org/pub/gentoo/"; LDFLAGS="-Wl,-O1 -Wl,--as-needed" MAKEOPTS="-j9" PKGDIR="/usr/portage/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="" SYNC="rsync://rush.tisys.org/gentoo-porta
Re: [PATCH] kvm tools: Add read-only support for QCOW2 images
On Tue, 19 Apr 2011, Prasad Joshi wrote: On Tue, Apr 19, 2011 at 10:07 PM, Pekka Enberg wrote: This patch extends the QCOW1 format to also support QCOW2 images as specified by the following document: http://people.gnome.org/~markmc/qcow-image-format.html Cc: Asias He Cc: Cyrill Gorcunov Cc: Prasad Joshi Cc: Sasha Levin Cc: Ingo Molnar Signed-off-by: Pekka Enberg --- tools/kvm/include/kvm/qcow.h | 42 ++- tools/kvm/qcow.c | 177 +- 2 files changed, 181 insertions(+), 38 deletions(-) diff --git a/tools/kvm/include/kvm/qcow.h b/tools/kvm/include/kvm/qcow.h index 4be2597..afd776d 100644 --- a/tools/kvm/include/kvm/qcow.h +++ b/tools/kvm/include/kvm/qcow.h @@ -4,9 +4,17 @@ #include #define QCOW_MAGIC (('Q' << 24) | ('F' << 16) | ('I' << 8) | 0xfb) + #define QCOW1_VERSION 1 +#define QCOW2_VERSION 2 + +#define QCOW1_OFLAG_COMPRESSED (1LL << 63) + +#define QCOW1_OFLAG_MASK QCOW1_OFLAG_COMPRESSED -#define QCOW_OFLAG_COMPRESSED (1LL << 63) +#define QCOW2_OFLAG_COPIED (1LL << 63) +#define QCOW2_OFLAG_COMPRESSED (1LL << 62) +#define QCOW2_OFLAG_MASK (QCOW2_OFLAG_COPIED|QCOW2_OFLAG_COMPRESSED) struct qcow_table { u32 table_size; @@ -19,7 +27,16 @@ struct qcow { int fd; }; -struct qcow1_header { +struct qcow_header { + u64 size; /* in bytes */ + u64 l1_table_offset; + u32 l1_size; + u8 cluster_bits; + u8 l2_bits; + uint64_t oflag_mask; +}; + +struct qcow1_header_disk { u32 magic; u32 version; @@ -36,6 +53,27 @@ struct qcow1_header { u64 l1_table_offset; }; +struct qcow2_header_disk { + u32 magic; + u32 version; + + u64 backing_file_offset; + u32 backing_file_size; + + u32 cluster_bits; + u64 size; /* in bytes */ + u32 crypt_method; + + u32 l1_size; + u64 l1_table_offset; + + u64 refcount_table_offset; + u32 refcount_table_clusters; + + u32 nb_snapshots; + u64 snapshots_offset; +}; IMHO, as we start adding other features of QCOW, the two structures qcow2_header_disk and qcow_header might eventually become the same. 
No, the point of 'struct qcow2_header_disk' is to map to the on-disk representation. 'struct qcow_header' is the in-memory version of the data.

+ disk_image = disk_image__new(fd, h->size, &qcow1_disk_ops);

qcow1_disk_ops can be changed to qcow_disk_ops.

Sure, there are more qcow1 prefixes that need fixing now as well.
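The split between an on-disk header and an in-memory header can be sketched roughly as follows. This is only an illustration, not code from tools/kvm: `struct demo_header`, `be32()`, `be64()` and `parse_qcow2_header()` are hypothetical names. It assumes the on-disk fields are big-endian and sit at the offsets implied by the qcow2_header_disk layout quoted in this thread (magic at 0, version at 4, size at 24).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QCOW_MAGIC (('Q' << 24) | ('F' << 16) | ('I' << 8) | 0xfb)

/* Hypothetical in-memory view: host byte order, only what this demo needs. */
struct demo_header {
	uint32_t version;
	uint64_t size;		/* in bytes */
};

/* Read a big-endian 32-bit value from a raw byte buffer. */
static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value from a raw byte buffer. */
static uint64_t be64(const uint8_t *p)
{
	return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/*
 * Convert the raw on-disk bytes into the in-memory header.
 * Returns 0 on success, -1 if the buffer is too short, the magic
 * does not match, or the version is not 2.
 */
static int parse_qcow2_header(const uint8_t *buf, size_t len,
			      struct demo_header *out)
{
	assert(buf != NULL && out != NULL);

	if (len < 32)
		return -1;
	if (be32(buf) != (uint32_t)QCOW_MAGIC)
		return -1;
	if (be32(buf + 4) != 2)
		return -1;

	out->version = 2;
	out->size = be64(buf + 24);
	return 0;
}
```

Keeping the byte-order conversion in one parsing function means the rest of the code only ever sees host-order values, which is the separation described above.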
[PATCH] kvm tools: Add missing space before root= option
If the user passes their own options, we need an extra space, otherwise the options get joined. Signed-off-by: Cyrill Gorcunov --- tools/kvm/kvm-run.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6.git/tools/kvm/kvm-run.c = --- linux-2.6.git.orig/tools/kvm/kvm-run.c +++ linux-2.6.git/tools/kvm/kvm-run.c @@ -383,7 +383,7 @@ int kvm_cmd_run(int argc, const char **a } if (!strstr(real_cmdline, "root=")) - strlcat(real_cmdline, "root=/dev/vda rw ", sizeof(real_cmdline)); + strlcat(real_cmdline, " root=/dev/vda rw ", sizeof(real_cmdline)); if (image_filename) { kvm->disk_image = disk_image__open(image_filename, readonly_image);
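For illustration only, the joining problem can also be avoided with a small helper that adds the separator on demand. `append_opt()` is a hypothetical name, not part of tools/kvm, and it uses snprintf() rather than strlcat() so that it builds on libcs without strlcat(). Unlike the patch, which always prepends a space, this sketch inserts one only when needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append opt to cmdline, inserting a single space
 * when cmdline is non-empty and does not already end in one.
 * cmdline must be NUL-terminated within size bytes.
 * Returns 0 on success; returns -1 and leaves cmdline unchanged if the
 * result would not fit.
 */
static int append_opt(char *cmdline, size_t size, const char *opt)
{
	size_t len = strlen(cmdline);
	const char *sep = (len > 0 && cmdline[len - 1] != ' ') ? " " : "";
	int n;

	assert(len < size);

	n = snprintf(cmdline + len, size - len, "%s%s", sep, opt);
	if (n < 0 || (size_t)n >= size - len) {
		cmdline[len] = '\0';	/* undo the partial append */
		return -1;
	}
	return 0;
}
```

With "console=ttyS0" already in the buffer, appending "root=/dev/vda rw" yields two separate options instead of "console=ttyS0root=/dev/vda rw".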
Re: [PATCH 2/2] KVM: Use pci_store/load_saved_state() around VM device usage
On 04/20/2011 06:13 PM, Alex Williamson wrote:
> > > This is also why I changed the
> > > __pci_reset_function() back to a normal pci_reset_function(), so we're
> > > never left with an uninitialized device like we are now.
> > >
> > > We could be more verbose or return an error here, but we've gone for a
> > > long time not even doing this save/restore across VM usage, so I don't
> > > think it's worthy of preventing the device attachment if it fails.
> >
> > At least a log?
>
> Ok, I'm not sure what corrective action a user would take or what they
> should expect not to work, but I guess a KERN_DEBUG printk is reasonable.

"X didn't work" vs "X didn't work and I got this in the log"

> > Note avoiding the pointer would have removed the problem altogether.
>
> Returning a struct on store? We lose any kind of opacity that way since
> the caller needs to know about the struct then. I thought the pointer
> makes it clear the caller shouldn't be touching the contents, but if you
> think it's a better way to go, I can try it. Thanks,

Avoid the allocation altogether. Having the caller be responsible for storage (in our case, embed the struct instead of the pointer). You can encrypt the contents using the TPM, or maybe a comment indicating that the contents should suffice.

--
error compiling committee.c: too many arguments to function
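The "embed the struct instead of the pointer" idea can be sketched as below. All names here (`demo_saved_state`, `demo_assigned_dev`, `demo_store_state()`, `demo_load_state()`) are hypothetical; the real PCI saved-state structure and the KVM assigned-device structure are not shown in this thread, so this only illustrates why caller-owned storage removes the allocation failure path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CFG_WORDS 16

/* Hypothetical saved-state record, standing in for the opaque PCI one. */
struct demo_saved_state {
	bool valid;
	uint32_t cfg[DEMO_CFG_WORDS];
};

/* The owner embeds the record, so storing never needs an allocation. */
struct demo_assigned_dev {
	int id;
	struct demo_saved_state saved;	/* embedded, not a pointer */
};

/* Copy the config words into caller-provided storage; cannot fail. */
static void demo_store_state(struct demo_saved_state *s, const uint32_t *cfg)
{
	assert(s != NULL && cfg != NULL);
	memcpy(s->cfg, cfg, sizeof(s->cfg));
	s->valid = true;
}

/* Returns 0 after restoring, -1 if nothing was ever stored. */
static int demo_load_state(const struct demo_saved_state *s, uint32_t *cfg)
{
	if (!s->valid)
		return -1;
	memcpy(cfg, s->cfg, sizeof(s->cfg));
	return 0;
}
```

The trade-off raised in the thread still applies: with an embedded struct the caller has to see the full type, so opacity has to be kept by convention (for example a comment) rather than by the type system.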
Re: [PATCH 2/2] KVM: Use pci_store/load_saved_state() around VM device usage
On Wed, 2011-04-20 at 10:19 +0300, Avi Kivity wrote:
> On 04/18/2011 10:43 PM, Alex Williamson wrote:
> > On Sun, 2011-04-17 at 12:25 +0300, Avi Kivity wrote:
> > > On 04/15/2011 10:54 PM, Alex Williamson wrote:
> > > > Store the device saved state so that we can reload the device back
> > > > to the original state when it's unassigned. This has the benefit
> > > > that the state survives across pci_reset_function() calls via
> > > > the PCI sysfs reset interface while the VM is using the device.
> > >
> > > > @@ -516,7 +518,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> > > >
> > > >    pci_reset_function(dev);
> > > >    pci_save_state(dev);
> > > > -
> > > > + match->pci_saved_state = pci_store_saved_state(dev);
> > > >    match->assigned_dev_id = assigned_dev->assigned_dev_id;
> > >
> > > Error check?
> > >
> > > It might be better to give up the opacity of the data structure and make
> > > pci_saved_state the full struct, not a pointer.
> >
> > pci_store_saved_state() returns NULL on error, which is correctly
> > handled if we pass NULL to pci_load_saved_state() or a pointer to NULL
> > to pci_load_and_free_saved_state().
>
> But we silently swallow an error, this isn't good.
>
> > This is also why I changed the
> > __pci_reset_function() back to a normal pci_reset_function(), so we're
> > never left with an uninitialized device like we are now.
> >
> > We could be more verbose or return an error here, but we've gone for a
> > long time not even doing this save/restore across VM usage, so I don't
> > think it's worthy of preventing the device attachment if it fails.
>
> At least a log?

Ok, I'm not sure what corrective action a user would take or what they should expect not to work, but I guess a KERN_DEBUG printk is reasonable.

> Note avoiding the pointer would have removed the problem altogether.

Returning a struct on store? We lose any kind of opacity that way since the caller needs to know about the struct then. I thought the pointer makes it clear the caller shouldn't be touching the contents, but if you think it's a better way to go, I can try it.

Thanks,
Alex
Windows XP, Tablet, vnc and mouse clicks
Hello! I am facing a strange problem. I use qemu-kvm-0.13 under Fedora 14, with Windows XP installed in the virtual machine. Some mouse clicks are not delivered to the virtual machine. When I do a quick click (press the mouse button and release it immediately), it may not be delivered. If I do a long click (press the button, wait a bit, then release), it is always delivered.

The qemu startup command is standard:

/usr/bin/qemu-kvm -enable-kvm -name VM2 -nographic -vnc 0.0.0.0:2 -vga std -m 512 -smp 2 -boot c -pidfile /home/vms/run/2.pid -monitor unix:/home/vms/run/2.monitor,server,nowait -serial unix:/home/vms/run/2.serial,server,nowait -net nic,vlan=0,macaddr=FE:E1:DE:AD:00:11,model=virtio -net tap,vlan=0,ifname=tap_2_0,script=/home/vms/ifup,downscript=/home/vms/ifdown -drive media=disk,if=virtio,index=0,file=/dev/sdb,cache=none,boot=on -usb -usbdevice tablet

The operating system is Windows XP SP3 with virtio drivers for network and block devices. What is the problem?

--
Boris Dolgov.
Re: [Qemu-devel] QEMU-KVM and hardened (GRSEC/PaX) kernel
On 04/17/2011 01:45 AM, Антон Кочков wrote:
> Good day! I'm trying to get qemu-kvm working with hardened gentoo on a
> hardened kernel. When I use CONFIG_PAX_KERNPAGEXEC and
> CONFIG_PAX_MEM_UNDEREF, qemu just starts, goes into an infinite loop and
> takes 100% of one of my CPU cores, and it can't even be killed. It also
> doesn't respond on the qemu monitor or to remote gdb. When I disabled
> these two options, qemu-kvm starts and then stops (I mean the qemu
> monitor shows that the virtual machine is running, but there is no
> activity or output). The load is also about 0%. See details in bug
> http://bugs.gentoo.org/show_bug.cgi?id=363713
> Hope this info helps improve qemu-kvm.

As Blue says, the problem is likely in kvm, not qemu. Please try:
- hardened guest on soft host (I expect this to work)
- soft guest on hardened host (I expect this to fail).

Are you using an Intel or AMD host? Note virtualization hardware will play with segmentation and defeat all those games the hardened kernel plays.

--
error compiling committee.c: too many arguments to function
[PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too
On Wed, Apr 20, 2011 at 07:18:12AM -0400, Avi Kivity wrote:
> On 04/20/2011 02:06 PM, Roedel, Joerg wrote:
> > The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a
> > fix soon.
>
> Thanks.

Here is a fix for review. I am out of office starting in about an hour until next Tuesday, so any corrections will most likely not happen before then :) The patch is tested with npt and shadow paging as well as with npt-on-npt (64 bit with kvm).

Regards,
Joerg

From 6b1dcd9f17bbd482061180001d1f45c3adcef430 Mon Sep 17 00:00:00 2001 From: Joerg Roedel Date: Wed, 20 Apr 2011 15:22:21 +0200 Subject: [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too This patch makes the cmpxchg_gpte() function aware of the difference between l1-gfns and l2-gfns when nested virtualization is in use. This fixes a potential data-corruption problem in the l1-guest and makes the code work correctly (at least as correctly as the hardware which is emulated in this code) again. Cc: sta...@kernel.org Signed-off-by: Joerg Roedel --- arch/x86/kvm/paging_tmpl.h | 30 +++--- 1 files changed, 23 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 74f8567..e442bf4 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -78,15 +78,21 @@ static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl) return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT; } -static bool FNAME(cmpxchg_gpte)(struct kvm *kvm, +static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gfn_t table_gfn, unsigned index, pt_element_t orig_pte, pt_element_t new_pte) { pt_element_t ret; pt_element_t *table; struct page *page; + gpa_t gpa; - page = gfn_to_page(kvm, table_gfn); + gpa = mmu->translate_gpa(vcpu, table_gfn << PAGE_SHIFT, +PFERR_USER_MASK|PFERR_WRITE_MASK); + if (gpa == UNMAPPED_GVA) + return -EFAULT; + + page = gfn_to_page(vcpu->kvm, gpa_to_gfn(gpa)); table = kmap_atomic(page, KM_USER0); ret = CMPXCHG(&table[index], orig_pte, new_pte);
@@ -192,11 +198,17 @@ walk: #endif if (!eperm && !rsvd_fault && !(pte & PT_ACCESSED_MASK)) { + int ret; trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(pte)); - if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, - index, pte, pte|PT_ACCESSED_MASK)) + ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn, + index, pte, pte|PT_ACCESSED_MASK); + if (ret < 0) { + present = false; + break; + } else if (ret) goto walk; + mark_page_dirty(vcpu->kvm, table_gfn); pte |= PT_ACCESSED_MASK; } @@ -245,13 +257,17 @@ walk: goto error; if (write_fault && !is_dirty_gpte(pte)) { - bool ret; + int ret; trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte)); - ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte, + ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn, index, pte, pte|PT_DIRTY_MASK); - if (ret) + if (ret < 0) { + present = false; + goto error; + } if (ret) goto walk; + mark_page_dirty(vcpu->kvm, table_gfn); pte |= PT_DIRTY_MASK; walker->ptes[walker->level - 1] = pte; -- 1.7.1 -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
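The patch changes cmpxchg_gpte() from a bool to a three-way int result: negative when the table cannot be reached, positive when the entry changed underneath the walker, zero when the update went through. A userspace sketch of that convention, using C11 atomics in place of the kernel's CMPXCHG, might look like the following. `demo_cmpxchg_pte()`, `demo_mark_accessed()` and `DEMO_ACCESSED` are hypothetical names, and a NULL table pointer merely stands in for a failed translation.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ACCESSED (1ULL << 5)	/* stand-in for an "accessed" bit */

/*
 * Hypothetical stand-in for cmpxchg_gpte():
 *   < 0  the table could not be reached (here: NULL table)
 *   > 0  the entry no longer matches orig_pte; caller should retry
 *   0    new_pte was written
 */
static int demo_cmpxchg_pte(_Atomic uint64_t *table, unsigned index,
			    uint64_t orig_pte, uint64_t new_pte)
{
	if (table == NULL)
		return -EFAULT;

	if (!atomic_compare_exchange_strong(&table[index], &orig_pte, new_pte))
		return 1;
	return 0;
}

/*
 * Caller pattern: give up on a negative result, re-read and retry on
 * a positive one. Returns 0 once the bit is set, -EFAULT otherwise.
 */
static int demo_mark_accessed(_Atomic uint64_t *table, unsigned index)
{
	uint64_t pte;
	int ret;

	do {
		pte = table ? atomic_load(&table[index]) : 0;
		ret = demo_cmpxchg_pte(table, index, pte, pte | DEMO_ACCESSED);
	} while (ret > 0);

	return ret;
}
```

In the patch itself the retry is a `goto walk` that restarts the page-table walk, and the negative case marks the translation as not present; the loop above only models the same three outcomes.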
[PATCH 05/16] KVM: x86 emulator: drop vcpu argument from intercept callback
Making the emulator caller agnostic. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h |2 +- arch/x86/kvm/emulate.c |2 +- arch/x86/kvm/x86.c |4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 2c02e75..e2b082a 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -185,7 +185,7 @@ struct x86_emulate_ops { int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata); void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */ void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */ - int (*intercept)(struct kvm_vcpu *vcpu, + int (*intercept)(struct x86_emulate_ctxt *ctxt, struct x86_instruction_info *info, enum x86_intercept_stage stage); }; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 57c730b..55ca5a5 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -438,7 +438,7 @@ static int emulator_check_intercept(struct x86_emulate_ctxt *ctxt, .next_rip = ctxt->eip, }; - return ctxt->ops->intercept(ctxt->vcpu, &info, stage); + return ctxt->ops->intercept(ctxt, &info, stage); } static inline unsigned long ad_mask(struct decode_cache *c) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 16373a5..4f7248e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4365,11 +4365,11 @@ static void emulator_put_fpu(struct x86_emulate_ctxt *ctxt) preempt_enable(); } -static int emulator_intercept(struct kvm_vcpu *vcpu, +static int emulator_intercept(struct x86_emulate_ctxt *ctxt, struct x86_instruction_info *info, enum x86_intercept_stage stage) { - return kvm_x86_ops->check_intercept(vcpu, info, stage); + return kvm_x86_ops->check_intercept(emul_to_vcpu(ctxt), info, stage); } static struct x86_emulate_ops emulate_ops = { -- 1.7.1
[PATCH 04/16] KVM: x86 emulator: drop vcpu argument from cr/dr/cpl/msr callbacks
Making the emulator caller agnostic. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h | 14 +++--- arch/x86/kvm/emulate.c | 84 ++-- arch/x86/kvm/x86.c | 34 ++ 3 files changed, 73 insertions(+), 59 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 656046a..2c02e75 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -176,13 +176,13 @@ struct x86_emulate_ops { int seg); void (*get_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt); void (*get_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt); - ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); - int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); - int (*cpl)(struct kvm_vcpu *vcpu); - int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu); - int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu); - int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); - int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata); + ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr); + int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val); + int (*cpl)(struct x86_emulate_ctxt *ctxt); + int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest); + int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value); + int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data); + int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata); void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */ void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */ int (*intercept)(struct kvm_vcpu *vcpu, diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index d1e0a1b..57c730b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -596,7 +596,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt, if (addr.ea > lim || (u32)(addr.ea + size - 1) > lim) goto bad; } - cpl = 
ctxt->ops->cpl(ctxt->vcpu); + cpl = ctxt->ops->cpl(ctxt); rpl = ctxt->ops->get_segment_selector(ctxt, addr.seg) & 3; cpl = max(cpl, rpl); if (!(desc.type & 8)) { @@ -1248,7 +1248,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt, rpl = selector & 3; dpl = seg_desc.dpl; - cpl = ops->cpl(ctxt->vcpu); + cpl = ops->cpl(ctxt); switch (seg) { case VCPU_SREG_SS: @@ -1407,7 +1407,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt, int rc; unsigned long val, change_mask; int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT; - int cpl = ops->cpl(ctxt->vcpu); + int cpl = ops->cpl(ctxt); rc = emulate_pop(ctxt, ops, &val, len); if (rc != X86EMUL_CONTINUE) @@ -1852,7 +1852,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) setup_syscalls_segments(ctxt, ops, &cs, &ss); - ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data); + ops->get_msr(ctxt, MSR_STAR, &msr_data); msr_data >>= 32; cs_sel = (u16)(msr_data & 0xfffc); ss_sel = (u16)(msr_data + 8); @@ -1871,17 +1871,17 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) #ifdef CONFIG_X86_64 c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF; - ops->get_msr(ctxt->vcpu, + ops->get_msr(ctxt, ctxt->mode == X86EMUL_MODE_PROT64 ? 
MSR_LSTAR : MSR_CSTAR, &msr_data); c->eip = msr_data; - ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data); + ops->get_msr(ctxt, MSR_SYSCALL_MASK, &msr_data); ctxt->eflags &= ~(msr_data | EFLG_RF); #endif } else { /* legacy mode */ - ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data); + ops->get_msr(ctxt, MSR_STAR, &msr_data); c->eip = (u32)msr_data; ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF); @@ -1910,7 +1910,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) setup_syscalls_segments(ctxt, ops, &cs, &ss); - ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data); + ops->get_msr(ctxt, MSR_IA32_SYSENTER_CS, &msr_data); switch (ctxt->mode) { case X86EMUL_MODE_PROT32: if ((msr_data & 0xfffc) == 0x0) @@ -1938,10 +1938,10 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) ops->set_cached_descriptor(ctxt, &ss, 0, VCPU_S
[PATCH 03/16] KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks
Making the emulator caller agnostic. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h | 22 --- arch/x86/kvm/emulate.c | 112 ++-- arch/x86/kvm/x86.c | 39 +++-- 3 files changed, 90 insertions(+), 83 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 1348bdf..656046a 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -163,15 +163,19 @@ struct x86_emulate_ops { int size, unsigned short port, const void *val, unsigned int count); - bool (*get_cached_descriptor)(struct desc_struct *desc, u32 *base3, - int seg, struct kvm_vcpu *vcpu); - void (*set_cached_descriptor)(struct desc_struct *desc, u32 base3, - int seg, struct kvm_vcpu *vcpu); - u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu); - void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu); - unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu *vcpu); - void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu); - void (*get_idt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu); + bool (*get_cached_descriptor)(struct x86_emulate_ctxt *ctxt, + struct desc_struct *desc, u32 *base3, + int seg); + void (*set_cached_descriptor)(struct x86_emulate_ctxt *ctxt, + struct desc_struct *desc, u32 base3, + int seg); + u16 (*get_segment_selector)(struct x86_emulate_ctxt *ctxt, int seg); + void (*set_segment_selector)(struct x86_emulate_ctxt *ctxt, +u16 sel, int seg); + unsigned long (*get_cached_segment_base)(struct x86_emulate_ctxt *ctxt, +int seg); + void (*get_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt); + void (*get_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt); ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu); int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu); int (*cpl)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8af08a1..d1e0a1b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -495,7 
+495,7 @@ static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, if (ctxt->mode == X86EMUL_MODE_PROT64 && seg < VCPU_SREG_FS) return 0; - return ops->get_cached_segment_base(seg, ctxt->vcpu); + return ops->get_cached_segment_base(ctxt, seg); } static unsigned seg_override(struct x86_emulate_ctxt *ctxt, @@ -573,8 +573,8 @@ static int __linearize(struct x86_emulate_ctxt *ctxt, return emulate_gp(ctxt, 0); break; default: - usable = ctxt->ops->get_cached_descriptor(&desc, NULL, addr.seg, - ctxt->vcpu); + usable = ctxt->ops->get_cached_descriptor(ctxt, &desc, NULL, + addr.seg); if (!usable) goto bad; /* code segment or read-only data segment */ @@ -597,7 +597,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt, goto bad; } cpl = ctxt->ops->cpl(ctxt->vcpu); - rpl = ctxt->ops->get_segment_selector(addr.seg, ctxt->vcpu) & 3; + rpl = ctxt->ops->get_segment_selector(ctxt, addr.seg) & 3; cpl = max(cpl, rpl); if (!(desc.type & 8)) { /* data segment */ @@ -1142,14 +1142,14 @@ static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt, if (selector & 1 << 2) { struct desc_struct desc; memset (dt, 0, sizeof *dt); - if (!ops->get_cached_descriptor(&desc, NULL, VCPU_SREG_LDTR, - ctxt->vcpu)) + if (!ops->get_cached_descriptor(ctxt, &desc, NULL, + VCPU_SREG_LDTR)) return; dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? */ dt->address = get_desc_base(&desc); } else - ops->get_gdt(dt, ctxt->vcpu); + ops->get_gdt(ctxt, dt); } /* allowed just for 8 bytes segments */ @@ -1304,8 +1304,8 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt, return ret; } load: - ops->set_segment_selector(selector, seg, ctxt->vcpu); - ops->set_cached_descriptor(&seg_des
[PATCH 12/16] KVM: x86 emulator: add new ->halt() callback
Instead of reaching into vcpu internals. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/kvm/emulate.c |2 +- arch/x86/kvm/x86.c |6 ++ 3 files changed, 8 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index f890769..d30f1e9 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -186,6 +186,7 @@ struct x86_emulate_ops { int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value); int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data); int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata); + void (*halt)(struct x86_emulate_ctxt *ctxt); void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */ void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */ int (*intercept)(struct x86_emulate_ctxt *ctxt, diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 6fca45f..a2a5008 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3913,7 +3913,7 @@ special_insn: c->dst.type = OP_NONE; /* Disable writeback. 
*/ break; case 0xf4: /* hlt */ - ctxt->vcpu->arch.halt_request = 1; + ctxt->ops->halt(ctxt); break; case 0xf5: /* cmc */ /* complement carry flag from eflags reg */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8af49b3..2246cf1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4351,6 +4351,11 @@ static int emulator_set_msr(struct x86_emulate_ctxt *ctxt, return kvm_set_msr(emul_to_vcpu(ctxt), msr_index, data); } +static void emulator_halt(struct x86_emulate_ctxt *ctxt) +{ + emul_to_vcpu(ctxt)->arch.halt_request = 1; +} + static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt) { preempt_disable(); @@ -4400,6 +4405,7 @@ static struct x86_emulate_ops emulate_ops = { .set_dr = emulator_set_dr, .set_msr = emulator_set_msr, .get_msr = emulator_get_msr, + .halt= emulator_halt, .get_fpu = emulator_get_fpu, .put_fpu = emulator_put_fpu, .intercept = emulator_intercept, -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/16] KVM: x86 emulator: avoid using ctxt->vcpu in check_perm() callbacks
Unneeded for register access. Signed-off-by: Avi Kivity --- arch/x86/kvm/emulate.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 55ca5a5..8020f1b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2720,7 +2720,7 @@ static int check_svme(struct x86_emulate_ctxt *ctxt) static int check_svme_pa(struct x86_emulate_ctxt *ctxt) { - u64 rax = kvm_register_read(ctxt->vcpu, VCPU_REGS_RAX); + u64 rax = ctxt->decode.regs[VCPU_REGS_RAX]; /* Valid physical address? */ if (rax & 0x) @@ -2742,7 +2742,7 @@ static int check_rdtsc(struct x86_emulate_ctxt *ctxt) static int check_rdpmc(struct x86_emulate_ctxt *ctxt) { u64 cr4 = ctxt->ops->get_cr(ctxt, 4); - u64 rcx = kvm_register_read(ctxt->vcpu, VCPU_REGS_RCX); + u64 rcx = ctxt->decode.regs[VCPU_REGS_RCX]; if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) || (rcx > 3)) -- 1.7.1
[PATCH 10/16] KVM: x86 emulator: emulate CLTS internally
Avoid using ctxt->vcpu; we can do everything with ->get_cr() and ->set_cr(). A side effect is that we no longer activate the fpu on emulated CLTS; but that should be very rare. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_host.h |1 - arch/x86/kvm/emulate.c | 12 +++- arch/x86/kvm/x86.c |7 --- 3 files changed, 11 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a8616ca..9c3567e 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -691,7 +691,6 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); -int emulate_clts(struct kvm_vcpu *vcpu); int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index dc495a0..91c4a14 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2579,6 +2579,16 @@ static int em_invlpg(struct x86_emulate_ctxt *ctxt) return X86EMUL_CONTINUE; } +static int em_clts(struct x86_emulate_ctxt *ctxt) +{ + ulong cr0; + + cr0 = ctxt->ops->get_cr(ctxt, 0); + cr0 &= ~X86_CR0_TS; + ctxt->ops->set_cr(ctxt, 0, cr0); + return X86EMUL_CONTINUE; +} + static bool valid_cr(int nr) { switch (nr) { @@ -4079,7 +4089,7 @@ twobyte_insn: rc = emulate_syscall(ctxt, ops); break; case 0x06: - emulate_clts(ctxt->vcpu); + rc = em_clts(ctxt); break; case 0x09: /* wbinvd */ kvm_emulate_wbinvd(ctxt->vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7cd3a3b..a9e8386 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4153,13 +4153,6 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_emulate_wbinvd); -int emulate_clts(struct kvm_vcpu *vcpu) -{ - kvm_x86_ops->set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS)); - 
kvm_x86_ops->fpu_activate(vcpu); - return X86EMUL_CONTINUE; -} - int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long *dest) { return _kvm_get_dr(emul_to_vcpu(ctxt), dr, dest); -- 1.7.1
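The shape of em_clts() in this patch (read CR0 through a callback, clear the TS bit, write it back through a callback) can be tried out in userspace with a toy ops table. Everything prefixed `demo_` below is hypothetical, including the toy register array inside the context, which in KVM would live on the vcpu side behind the callbacks. The only architectural fact assumed is that CR0.TS is bit 3.

```c
#include <assert.h>

#define DEMO_CR0_TS (1UL << 3)	/* CR0.TS is bit 3 */

struct demo_ctxt;

/* Callback table: the emulator core only talks to its host through these. */
struct demo_ops {
	unsigned long (*get_cr)(struct demo_ctxt *ctxt, int cr);
	int (*set_cr)(struct demo_ctxt *ctxt, int cr, unsigned long val);
};

struct demo_ctxt {
	const struct demo_ops *ops;
	unsigned long cr[5];	/* toy register file for this demo only */
};

static unsigned long demo_get_cr(struct demo_ctxt *ctxt, int cr)
{
	assert(cr >= 0 && cr < 5);
	return ctxt->cr[cr];
}

static int demo_set_cr(struct demo_ctxt *ctxt, int cr, unsigned long val)
{
	assert(cr >= 0 && cr < 5);
	ctxt->cr[cr] = val;
	return 0;
}

static const struct demo_ops demo_default_ops = {
	.get_cr = demo_get_cr,
	.set_cr = demo_set_cr,
};

/* Same steps as em_clts(): read CR0, clear TS, write CR0 back. */
static int demo_clts(struct demo_ctxt *ctxt)
{
	unsigned long cr0 = ctxt->ops->get_cr(ctxt, 0);

	cr0 &= ~DEMO_CR0_TS;
	return ctxt->ops->set_cr(ctxt, 0, cr0);
}
```

Because demo_clts() never touches anything but the ops table, a different host could supply its own get_cr/set_cr without changing the emulation code, which is the point of the series.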
[PATCH 14/16] KVM: x86 emulator: add new ->wbinvd() callback
Instead of calling kvm_emulate_wbinvd() directly. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/kvm/emulate.c |2 +- arch/x86/kvm/x86.c |6 ++ 3 files changed, 8 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index d30840d..51341d6 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -187,6 +187,7 @@ struct x86_emulate_ops { int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data); int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata); void (*halt)(struct x86_emulate_ctxt *ctxt); + void (*wbinvd)(struct x86_emulate_ctxt *ctxt); int (*fix_hypercall)(struct x86_emulate_ctxt *ctxt); void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */ void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */ diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index a41f406..f683ce1 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4092,7 +4092,7 @@ twobyte_insn: rc = em_clts(ctxt); break; case 0x09: /* wbinvd */ - kvm_emulate_wbinvd(ctxt->vcpu); + ctxt->ops->wbinvd(ctxt); break; case 0x08: /* invd */ case 0x0d: /* GrpP (prefetch) */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4a2b40e..5d853d5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4154,6 +4154,11 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_emulate_wbinvd); +static void emulator_wbinvd(struct x86_emulate_ctxt *ctxt) +{ + kvm_emulate_wbinvd(emul_to_vcpu(ctxt)); +} + int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long *dest) { return _kvm_get_dr(emul_to_vcpu(ctxt), dr, dest); @@ -4408,6 +4413,7 @@ static struct x86_emulate_ops emulate_ops = { .set_msr = emulator_set_msr, .get_msr = emulator_get_msr, .halt= emulator_halt, + .wbinvd = emulator_wbinvd, .fix_hypercall = emulator_fix_hypercall, .get_fpu = emulator_get_fpu, 
.put_fpu = emulator_put_fpu, -- 1.7.1
[PATCH 16/16] KVM: x86 emulator: drop x86_emulate_ctxt::vcpu
No longer used. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h |2 -- arch/x86/kvm/x86.c |1 - 2 files changed, 0 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 51341d6..127ea3e 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -269,8 +269,6 @@ struct x86_emulate_ctxt { struct x86_emulate_ops *ops; /* Register state before/after emulation. */ - struct kvm_vcpu *vcpu; - unsigned long eflags; unsigned long eip; /* eip before instruction emulation */ /* Emulated execution mode, represented by an X86EMUL_MODE value. */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 65a5b0c..a831d5d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4463,7 +4463,6 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu) kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l); - vcpu->arch.emulate_ctxt.vcpu = vcpu; vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu); vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu); vcpu->arch.emulate_ctxt.mode = -- 1.7.1
[PATCH 15/16] KVM: Avoid using x86_emulate_ctxt.vcpu
We can use container_of() instead. Signed-off-by: Avi Kivity --- arch/x86/kvm/x86.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 5d853d5..65a5b0c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4366,7 +4366,7 @@ static void emulator_halt(struct x86_emulate_ctxt *ctxt) static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt) { preempt_disable(); - kvm_load_guest_fpu(ctxt->vcpu); + kvm_load_guest_fpu(emul_to_vcpu(ctxt)); /* * CR0.TS may reference the host fpu state, not the guest fpu state, * so it may be clear at this point. -- 1.7.1
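A minimal userspace sketch of the container_of() idea follows. The definition of emul_to_vcpu() is not shown in this series, so `demo_emul_to_vcpu()` and the `demo_` structures below are assumptions about its general shape, not the kernel's code. The macro is a simplified rendition without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace rendition of container_of(), without type checking. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_emulate_ctxt {
	unsigned long eip;
};

struct demo_vcpu {
	int vcpu_id;
	struct demo_emulate_ctxt emulate_ctxt;	/* embedded in the vcpu */
};

/*
 * Recover the enclosing vcpu from a pointer to its embedded context.
 * This only works because the context is a member of the vcpu, not a
 * separately allocated object.
 */
static struct demo_vcpu *demo_emul_to_vcpu(struct demo_emulate_ctxt *ctxt)
{
	assert(ctxt != NULL);
	return demo_container_of(ctxt, struct demo_vcpu, emulate_ctxt);
}
```

Since the enclosing object can be computed from the member's address, the context no longer needs to carry a back pointer to it.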
[PATCH 13/16] KVM: x86 emulator: add ->fix_hypercall() callback
Artificial, but needed to remove direct calls to KVM. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/include/asm/kvm_host.h|2 -- arch/x86/kvm/emulate.c |4 ++-- arch/x86/kvm/x86.c |6 +- 4 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index d30f1e9..d30840d 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -187,6 +187,7 @@ struct x86_emulate_ops { int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data); int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata); void (*halt)(struct x86_emulate_ctxt *ctxt); + int (*fix_hypercall)(struct x86_emulate_ctxt *ctxt); void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */ void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */ int (*intercept)(struct x86_emulate_ctxt *ctxt, diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d957d0d..6cfc1ab 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -752,8 +752,6 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, int kvm_emulate_hypercall(struct kvm_vcpu *vcpu); -int kvm_fix_hypercall(struct kvm_vcpu *vcpu); - int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code, void *insn, int insn_len); void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index a2a5008..a41f406 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -4025,7 +4025,7 @@ twobyte_insn: if (c->modrm_mod != 3 || c->modrm_rm != 1) goto cannot_emulate; - rc = kvm_fix_hypercall(ctxt->vcpu); + rc = ctxt->ops->fix_hypercall(ctxt); if (rc != X86EMUL_CONTINUE) goto done; @@ -4048,7 +4048,7 @@ twobyte_insn: if (c->modrm_mod == 3) { switch (c->modrm_rm) { case 1: - rc = kvm_fix_hypercall(ctxt->vcpu); + rc = 
ctxt->ops->fix_hypercall(ctxt); break; default: goto cannot_emulate; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2246cf1..4a2b40e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -152,6 +152,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = { u64 __read_mostly host_xcr0; +int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt); + static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu) { int i; @@ -4406,6 +4408,7 @@ static struct x86_emulate_ops emulate_ops = { .set_msr = emulator_set_msr, .get_msr = emulator_get_msr, .halt= emulator_halt, + .fix_hypercall = emulator_fix_hypercall, .get_fpu = emulator_get_fpu, .put_fpu = emulator_put_fpu, .intercept = emulator_intercept, @@ -5042,8 +5045,9 @@ out: } EXPORT_SYMBOL_GPL(kvm_emulate_hypercall); -int kvm_fix_hypercall(struct kvm_vcpu *vcpu) +int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt) { + struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); char instruction[3]; unsigned long rip = kvm_rip_read(vcpu); -- 1.7.1
[PATCH 01/16] KVM: x86 emulator: drop vcpu argument from memory read/write callbacks
Making the emulator caller agnostic. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h | 34 ++ arch/x86/kvm/emulate.c | 54 --- arch/x86/kvm/x86.c | 54 ++- 3 files changed, 75 insertions(+), 67 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 9b760c8..b4d8467 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -92,8 +92,9 @@ struct x86_emulate_ops { * @val: [OUT] Value read from memory, zero-extended to 'u_long'. * @bytes: [IN ] Number of bytes to read from memory. */ - int (*read_std)(unsigned long addr, void *val, - unsigned int bytes, struct kvm_vcpu *vcpu, + int (*read_std)(struct x86_emulate_ctxt *ctxt, + unsigned long addr, void *val, + unsigned int bytes, struct x86_exception *fault); /* @@ -103,8 +104,8 @@ struct x86_emulate_ops { * @val: [OUT] Value write to memory, zero-extended to 'u_long'. * @bytes: [IN ] Number of bytes to write to memory. */ - int (*write_std)(unsigned long addr, void *val, -unsigned int bytes, struct kvm_vcpu *vcpu, + int (*write_std)(struct x86_emulate_ctxt *ctxt, +unsigned long addr, void *val, unsigned int bytes, struct x86_exception *fault); /* * fetch: Read bytes of standard (non-emulated/special) memory. @@ -113,8 +114,8 @@ struct x86_emulate_ops { * @val: [OUT] Value read from memory, zero-extended to 'u_long'. * @bytes: [IN ] Number of bytes to read from memory. */ - int (*fetch)(unsigned long addr, void *val, -unsigned int bytes, struct kvm_vcpu *vcpu, + int (*fetch)(struct x86_emulate_ctxt *ctxt, +unsigned long addr, void *val, unsigned int bytes, struct x86_exception *fault); /* @@ -123,11 +124,9 @@ struct x86_emulate_ops { * @val: [OUT] Value read from memory, zero-extended to 'u_long'. * @bytes: [IN ] Number of bytes to read from memory. 
*/ - int (*read_emulated)(unsigned long addr, -void *val, -unsigned int bytes, -struct x86_exception *fault, -struct kvm_vcpu *vcpu); + int (*read_emulated)(struct x86_emulate_ctxt *ctxt, +unsigned long addr, void *val, unsigned int bytes, +struct x86_exception *fault); /* * write_emulated: Write bytes to emulated/special memory area. @@ -136,11 +135,10 @@ struct x86_emulate_ops { *required). * @bytes: [IN ] Number of bytes to write to memory. */ - int (*write_emulated)(unsigned long addr, - const void *val, + int (*write_emulated)(struct x86_emulate_ctxt *ctxt, + unsigned long addr, const void *val, unsigned int bytes, - struct x86_exception *fault, - struct kvm_vcpu *vcpu); + struct x86_exception *fault); /* * cmpxchg_emulated: Emulate an atomic (LOCKed) CMPXCHG operation on an @@ -150,12 +148,12 @@ struct x86_emulate_ops { * @new: [IN ] Value to write to @addr. * @bytes: [IN ] Number of bytes to access using CMPXCHG. */ - int (*cmpxchg_emulated)(unsigned long addr, + int (*cmpxchg_emulated)(struct x86_emulate_ctxt *ctxt, + unsigned long addr, const void *old, const void *new, unsigned int bytes, - struct x86_exception *fault, - struct kvm_vcpu *vcpu); + struct x86_exception *fault); int (*pio_in_emulated)(int size, unsigned short port, void *val, unsigned int count, struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 3c11703..ff64b17 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -645,8 +645,7 @@ static int segmented_read_std(struct x86_emulate_ctxt *ctxt, rc = linearize(ctxt, addr, size, false, &linear); if (rc != X86EMUL_CONTINUE) return rc; - return ctxt->ops->read_std(linear, data, size, ctxt->vcpu, - &ctxt->exception); + return ctxt->ops->read_std(ctxt, linear, data, size, &ctxt->exception); } static int do_fe
[PATCH 07/16] KVM: x86 emulator: add and use new callbacks set_idt(), set_gdt()
Replacing direct calls to realmode_lgdt(), realmode_lidt(). Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h |2 ++ arch/x86/include/asm/kvm_host.h|3 --- arch/x86/kvm/emulate.c | 14 +++--- arch/x86/kvm/x86.c | 26 -- 4 files changed, 21 insertions(+), 24 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index e2b082a..4d1546a 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -176,6 +176,8 @@ struct x86_emulate_ops { int seg); void (*get_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt); void (*get_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt); + void (*set_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt); + void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt); ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr); int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val); int (*cpl)(struct x86_emulate_ctxt *ctxt); diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e50bffc..a8616ca 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -681,9 +681,6 @@ static inline int emulate_instruction(struct kvm_vcpu *vcpu, return x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0); } -void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); -void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address); - void kvm_enable_efer_bits(u64); int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data); int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8020f1b..fb431f3 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3494,6 +3494,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt) int rc = X86EMUL_CONTINUE; int saved_dst_type = c->dst.type; int irq; /* Used for int 3, int, and into */ + struct desc_ptr desc_ptr; 
ctxt->decode.mem_read.pos = 0; @@ -4005,9 +4006,6 @@ twobyte_insn: switch (c->b) { case 0x01: /* lgdt, lidt, lmsw */ switch (c->modrm_reg) { - u16 size; - unsigned long address; - case 0: /* vmcall */ if (c->modrm_mod != 3 || c->modrm_rm != 1) goto cannot_emulate; @@ -4023,10 +4021,11 @@ twobyte_insn: break; case 2: /* lgdt */ rc = read_descriptor(ctxt, ops, c->src.addr.mem, -&size, &address, c->op_bytes); +&desc_ptr.size, &desc_ptr.address, +c->op_bytes); if (rc != X86EMUL_CONTINUE) goto done; - realmode_lgdt(ctxt->vcpu, size, address); + ctxt->ops->set_gdt(ctxt, &desc_ptr); /* Disable writeback. */ c->dst.type = OP_NONE; break; @@ -4041,11 +4040,12 @@ twobyte_insn: } } else { rc = read_descriptor(ctxt, ops, c->src.addr.mem, -&size, &address, +&desc_ptr.size, +&desc_ptr.address, c->op_bytes); if (rc != X86EMUL_CONTINUE) goto done; - realmode_lidt(ctxt->vcpu, size, address); + ctxt->ops->set_idt(ctxt, &desc_ptr); } /* Disable writeback. */ c->dst.type = OP_NONE; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4f7248e..7cd3a3b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4249,6 +4249,16 @@ static void emulator_get_idt(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt) kvm_x86_ops->get_idt(emul_to_vcpu(ctxt), dt); } +static void emulator_set_gdt(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt) +{ + kvm_x86_ops->set_gdt(emul_to_vcpu(ctxt), dt); +} + +static void emulator_set_idt(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt) +{ + kvm_x86_ops->set_idt(emul_to_vcpu(ctxt), dt); +} + static unsigned long emulator_get_cached_segment_base( struct x86_emulate_ctxt *ctxt, int seg) { @@ -4388,6 +4398,8 @@ static struct x86_emulate_ops emulate_ops = { .get_cach
[PATCH 08/16] KVM: x86 emulator: drop use of is_long_mode()
Requires ctxt->vcpu, which is to be abolished. Replace with open calls to get_msr(). Signed-off-by: Avi Kivity --- arch/x86/kvm/emulate.c | 19 --- 1 files changed, 12 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index fb431f3..a4227bf 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1844,12 +1844,14 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) struct desc_struct cs, ss; u64 msr_data; u16 cs_sel, ss_sel; + u64 efer = 0; /* syscall is not available in real mode */ if (ctxt->mode == X86EMUL_MODE_REAL || ctxt->mode == X86EMUL_MODE_VM86) return emulate_ud(ctxt); + ops->get_msr(ctxt, MSR_EFER, &efer); setup_syscalls_segments(ctxt, ops, &cs, &ss); ops->get_msr(ctxt, MSR_STAR, &msr_data); @@ -1857,7 +1859,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) cs_sel = (u16)(msr_data & 0xfffc); ss_sel = (u16)(msr_data + 8); - if (is_long_mode(ctxt->vcpu)) { + if (efer & EFER_LMA) { cs.d = 0; cs.l = 1; } @@ -1867,7 +1869,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) ops->set_segment_selector(ctxt, ss_sel, VCPU_SREG_SS); c->regs[VCPU_REGS_RCX] = c->eip; - if (is_long_mode(ctxt->vcpu)) { + if (efer & EFER_LMA) { #ifdef CONFIG_X86_64 c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF; @@ -1897,7 +1899,9 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) struct desc_struct cs, ss; u64 msr_data; u16 cs_sel, ss_sel; + u64 efer = 0; + ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); /* inject #GP if in real mode */ if (ctxt->mode == X86EMUL_MODE_REAL) return emulate_gp(ctxt, 0); @@ -1927,8 +1931,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops) cs_sel &= ~SELECTOR_RPL_MASK; ss_sel = cs_sel + 8; ss_sel &= ~SELECTOR_RPL_MASK; - if (ctxt->mode == X86EMUL_MODE_PROT64 - || is_long_mode(ctxt->vcpu)) { + if (ctxt->mode == X86EMUL_MODE_PROT64 || (efer & EFER_LMA)) { cs.d = 0; 
cs.l = 1; } @@ -2603,6 +2606,7 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt) struct decode_cache *c = &ctxt->decode; u64 new_val = c->src.val64; int cr = c->modrm_reg; + u64 efer = 0; static u64 cr_reserved_bits[] = { 0xULL, @@ -2620,7 +2624,7 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt) switch (cr) { case 0: { - u64 cr4, efer; + u64 cr4; if (((new_val & X86_CR0_PG) && !(new_val & X86_CR0_PE)) || ((new_val & X86_CR0_NW) && !(new_val & X86_CR0_CD))) return emulate_gp(ctxt, 0); @@ -2637,7 +2641,8 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt) case 3: { u64 rsvd = 0; - if (is_long_mode(ctxt->vcpu)) + ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); + if (efer & EFER_LMA) rsvd = CR3_L_MODE_RESERVED_BITS; else if (is_pae(ctxt->vcpu)) rsvd = CR3_PAE_RESERVED_BITS; @@ -2650,7 +2655,7 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt) break; } case 4: { - u64 cr4, efer; + u64 cr4; cr4 = ctxt->ops->get_cr(ctxt, 4); ctxt->ops->get_msr(ctxt, MSR_EFER, &efer); -- 1.7.1
[PATCH 11/16] KVM: x86 emulator: make emulate_invlpg() an emulator callback
Removing direct calls to KVM. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/include/asm/kvm_host.h|1 - arch/x86/kvm/emulate.c |2 +- arch/x86/kvm/x86.c |6 +++--- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 4d1546a..f890769 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -154,6 +154,7 @@ struct x86_emulate_ops { const void *new, unsigned int bytes, struct x86_exception *fault); + void (*invlpg)(struct x86_emulate_ctxt *ctxt, ulong addr); int (*pio_in_emulated)(struct x86_emulate_ctxt *ctxt, int size, unsigned short port, void *val, diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 9c3567e..d957d0d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -690,7 +690,6 @@ struct x86_emulate_ctxt; int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port); void kvm_emulate_cpuid(struct kvm_vcpu *vcpu); int kvm_emulate_halt(struct kvm_vcpu *vcpu); -int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address); int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu); void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 91c4a14..6fca45f 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2573,7 +2573,7 @@ static int em_invlpg(struct x86_emulate_ctxt *ctxt) rc = linearize(ctxt, c->src.addr.mem, 1, false, &linear); if (rc == X86EMUL_CONTINUE) - emulate_invlpg(ctxt->vcpu, linear); + ctxt->ops->invlpg(ctxt, linear); /* Disable writeback. 
*/ c->dst.type = OP_NONE; return X86EMUL_CONTINUE; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a9e8386..8af49b3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4128,10 +4128,9 @@ static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg) return kvm_x86_ops->get_segment_base(vcpu, seg); } -int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address) +static void emulator_invlpg(struct x86_emulate_ctxt *ctxt, ulong address) { - kvm_mmu_invlpg(vcpu, address); - return X86EMUL_CONTINUE; + kvm_mmu_invlpg(emul_to_vcpu(ctxt), address); } int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu) @@ -4382,6 +4381,7 @@ static struct x86_emulate_ops emulate_ops = { .read_emulated = emulator_read_emulated, .write_emulated = emulator_write_emulated, .cmpxchg_emulated= emulator_cmpxchg_emulated, + .invlpg = emulator_invlpg, .pio_in_emulated = emulator_pio_in_emulated, .pio_out_emulated= emulator_pio_out_emulated, .get_cached_descriptor = emulator_get_cached_descriptor, -- 1.7.1
[PATCH 09/16] KVM: x86 emulator: Replace calls to is_pae() and is_paging with ->get_cr()
Avoid use of ctxt->vcpu.

Signed-off-by: Avi Kivity
---
 arch/x86/kvm/emulate.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a4227bf..dc495a0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2644,9 +2644,9 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
 		ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
 		if (efer & EFER_LMA)
 			rsvd = CR3_L_MODE_RESERVED_BITS;
-		else if (is_pae(ctxt->vcpu))
+		else if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PAE)
 			rsvd = CR3_PAE_RESERVED_BITS;
-		else if (is_paging(ctxt->vcpu))
+		else if (ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PG)
 			rsvd = CR3_NONPAE_RESERVED_BITS;
 
 		if (new_val & rsvd)
-- 
1.7.1
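For readers following along, here is a minimal userspace sketch of the mode check that check_cr_write() performs once is_long_mode(), is_pae() and is_paging() are replaced by callbacks. The struct names (emu_ctxt, emu_ops, fake_regs), the enum, and the opaque pointer are hypothetical stand-ins, not the kernel definitions; only the bit positions and the EFER MSR index are architectural. The sketch returns a layout tag rather than a reserved-bit mask, since the mask values are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural constants (not specific to KVM). */
#define MSR_EFER     0xc0000080u
#define EFER_LMA     (1ULL << 10)  /* long mode active */
#define X86_CR4_PAE  (1ULL << 5)   /* physical address extension */
#define X86_CR0_PG   (1ULL << 31)  /* paging enabled */

/* Hypothetical stand-ins for the emulator context and its callback table. */
struct emu_ctxt;
struct emu_ops {
	int      (*get_msr)(struct emu_ctxt *ctxt, uint32_t index, uint64_t *val);
	uint64_t (*get_cr)(struct emu_ctxt *ctxt, int cr);
};
struct emu_ctxt {
	const struct emu_ops *ops;
	void *opaque;              /* owned by whoever supplies the callbacks */
};

enum cr3_layout { CR3_LONG_MODE, CR3_PAE, CR3_NONPAE, CR3_NO_PAGING };

/* Same decision order as the patch: EFER.LMA, then CR4.PAE, then CR0.PG. */
enum cr3_layout cr3_layout(struct emu_ctxt *ctxt)
{
	uint64_t efer = 0;

	assert(ctxt && ctxt->ops);
	ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
	if (efer & EFER_LMA)
		return CR3_LONG_MODE;
	if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PAE)
		return CR3_PAE;
	if (ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PG)
		return CR3_NONPAE;
	return CR3_NO_PAGING;
}

/* A fake register file so the callbacks have something to read. */
struct fake_regs { uint64_t efer, cr0, cr4; };

int fake_get_msr(struct emu_ctxt *ctxt, uint32_t index, uint64_t *val)
{
	struct fake_regs *r = ctxt->opaque;

	if (index != MSR_EFER)
		return -1;
	*val = r->efer;
	return 0;
}

uint64_t fake_get_cr(struct emu_ctxt *ctxt, int cr)
{
	struct fake_regs *r = ctxt->opaque;

	switch (cr) {
	case 0: return r->cr0;
	case 4: return r->cr4;
	default: return 0;
	}
}

const struct emu_ops fake_ops = {
	.get_msr = fake_get_msr,
	.get_cr  = fake_get_cr,
};
```

The point of the sketch is that cr3_layout() never touches a vcpu; anything that can answer get_msr() and get_cr() can drive it.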
[PATCH 02/16] KVM: x86 emulator: drop vcpu argument from pio callbacks
Making the emulator caller agnostic. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h | 10 ++ arch/x86/kvm/emulate.c |6 +++--- arch/x86/kvm/x86.c | 18 -- 3 files changed, 21 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index b4d8467..1348bdf 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -155,11 +155,13 @@ struct x86_emulate_ops { unsigned int bytes, struct x86_exception *fault); - int (*pio_in_emulated)(int size, unsigned short port, void *val, - unsigned int count, struct kvm_vcpu *vcpu); + int (*pio_in_emulated)(struct x86_emulate_ctxt *ctxt, + int size, unsigned short port, void *val, + unsigned int count); - int (*pio_out_emulated)(int size, unsigned short port, const void *val, - unsigned int count, struct kvm_vcpu *vcpu); + int (*pio_out_emulated)(struct x86_emulate_ctxt *ctxt, + int size, unsigned short port, const void *val, + unsigned int count); bool (*get_cached_descriptor)(struct desc_struct *desc, u32 *base3, int seg, struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index ff64b17..8af08a1 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -1125,7 +1125,7 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt, if (n == 0) n = 1; rc->pos = rc->end = 0; - if (!ops->pio_in_emulated(size, port, rc->data, n, ctxt->vcpu)) + if (!ops->pio_in_emulated(ctxt, size, port, rc->data, n)) return 0; rc->end = n * size; } @@ -3892,8 +3892,8 @@ special_insn: case 0xef: /* out dx,(e/r)ax */ c->dst.val = c->regs[VCPU_REGS_RDX]; do_io_out: - ops->pio_out_emulated(c->src.bytes, c->dst.val, - &c->src.val, 1, ctxt->vcpu); + ops->pio_out_emulated(ctxt, c->src.bytes, c->dst.val, + &c->src.val, 1); c->dst.type = OP_NONE; /* Disable writeback. 
*/ break; case 0xf4: /* hlt */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 274652a..e9040a9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4060,9 +4060,12 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd) } -static int emulator_pio_in_emulated(int size, unsigned short port, void *val, -unsigned int count, struct kvm_vcpu *vcpu) +static int emulator_pio_in_emulated(struct x86_emulate_ctxt *ctxt, + int size, unsigned short port, void *val, + unsigned int count) { + struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); + if (vcpu->arch.pio.count) goto data_avail; @@ -4090,10 +4093,12 @@ static int emulator_pio_in_emulated(int size, unsigned short port, void *val, return 0; } -static int emulator_pio_out_emulated(int size, unsigned short port, - const void *val, unsigned int count, - struct kvm_vcpu *vcpu) +static int emulator_pio_out_emulated(struct x86_emulate_ctxt *ctxt, +int size, unsigned short port, +const void *val, unsigned int count) { + struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); + trace_kvm_pio(1, port, size, count); vcpu->arch.pio.port = port; @@ -4614,7 +4619,8 @@ EXPORT_SYMBOL_GPL(x86_emulate_instruction); int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port) { unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX); - int ret = emulator_pio_out_emulated(size, port, &val, 1, vcpu); + int ret = emulator_pio_out_emulated(&vcpu->arch.emulate_ctxt, + size, port, &val, 1); /* do not return to emulator after return from userspace */ vcpu->arch.pio.count = 0; return ret; -- 1.7.1
[PATCH 00/16] Decouple the x86 emulator from the rest of kvm
This (longer than expected) patchset decouples the x86 emulator from the rest of kvm. All communication is now done via x86_emulate_ctxt fields and callbacks; there is no access to ctxt->vcpu (which is eliminated by the last patch).

Avi Kivity (16):
  KVM: x86 emulator: drop vcpu argument from memory read/write callbacks
  KVM: x86 emulator: drop vcpu argument from pio callbacks
  KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks
  KVM: x86 emulator: drop vcpu argument from cr/dr/cpl/msr callbacks
  KVM: x86 emulator: drop vcpu argument from intercept callback
  KVM: x86 emulator: avoid using ctxt->vcpu in check_perm() callbacks
  KVM: x86 emulator: add and use new callbacks set_idt(), set_gdt()
  KVM: x86 emulator: drop use of is_long_mode()
  KVM: x86 emulator: Replace calls to is_pae() and is_paging with ->get_cr()
  KVM: x86 emulator: emulate CLTS internally
  KVM: x86 emulator: make emulate_invlpg() an emulator callback
  KVM: x86 emulator: add new ->halt() callback
  KVM: x86 emulator: add ->fix_hypercall() callback
  KVM: x86 emulator: add new ->wbinvd() callback
  KVM: Avoid using x86_emulate_ctxt.vcpu
  KVM: x86 emulator: drop x86_emulate_ctxt::vcpu

 arch/x86/include/asm/kvm_emulate.h |   90 ++-
 arch/x86/include/asm/kvm_host.h    |    7 -
 arch/x86/kvm/emulate.c             |  321 +++-
 arch/x86/kvm/x86.c                 |  203 ++-
 4 files changed, 338 insertions(+), 283 deletions(-)
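As a rough illustration of the pattern this series moves toward, the sketch below shows an emulator that talks to its host only through a callback table taking the context, with the host recovering its own object from the embedded context. All names here (emu_ctxt, emu_ops, host_vcpu, ctxt_to_vcpu) are hypothetical; the offsetof arithmetic is one plausible way such a lookup could work, assuming the context is embedded in the host structure as `&vcpu->arch.emulate_ctxt` in the patches suggests.

```c
#include <assert.h>
#include <stddef.h>

struct emu_ctxt;

/* Callback table: every hook receives only the emulator context. */
struct emu_ops {
	void (*halt)(struct emu_ctxt *ctxt);
	void (*invlpg)(struct emu_ctxt *ctxt, unsigned long addr);
};

struct emu_ctxt {
	const struct emu_ops *ops;
};

/* Hypothetical host-side object that embeds the emulator context. */
struct host_vcpu {
	int halted;
	unsigned long last_invlpg;
	struct emu_ctxt emulate_ctxt;
};

/* Recover the enclosing host_vcpu from a pointer to its embedded context. */
struct host_vcpu *ctxt_to_vcpu(struct emu_ctxt *ctxt)
{
	return (struct host_vcpu *)((char *)ctxt -
				    offsetof(struct host_vcpu, emulate_ctxt));
}

/* Host-side implementations of the callbacks. */
void host_halt(struct emu_ctxt *ctxt)
{
	ctxt_to_vcpu(ctxt)->halted = 1;
}

void host_invlpg(struct emu_ctxt *ctxt, unsigned long addr)
{
	ctxt_to_vcpu(ctxt)->last_invlpg = addr;
}

const struct emu_ops host_ops = {
	.halt   = host_halt,
	.invlpg = host_invlpg,
};

/* Emulator side: knows nothing about struct host_vcpu. */
void emulate_hlt(struct emu_ctxt *ctxt)
{
	assert(ctxt && ctxt->ops && ctxt->ops->halt);
	ctxt->ops->halt(ctxt);
}

void emulate_invlpg(struct emu_ctxt *ctxt, unsigned long linear)
{
	assert(ctxt && ctxt->ops && ctxt->ops->invlpg);
	ctxt->ops->invlpg(ctxt, linear);
}
```

With this shape, the emulator code compiles without any knowledge of the host structure, which is what makes it caller agnostic.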
Re: [PATCH v2] KVM: emulator: Use linearize() when fetching instructions.
On 04/18/2011 07:05 PM, Nelson Elhage wrote:
> Since segments need to be handled slightly differently when fetching
> instructions, we add a __linearize helper that accepts a new 'fetch' boolean.
>
>  static int segmented_read_std(struct x86_emulate_ctxt *ctxt,
>  			      struct segmented_address addr,
>  			      void *data,
> @@ -637,11 +646,13 @@ static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
>  	int size, cur_size;
>
>  	if (eip == fc->end) {
> -		unsigned long linear = eip + ctxt->cs_base;
> -		if (ctxt->mode != X86EMUL_MODE_PROT64)
> -			linear &= (u32)-1;
> +		unsigned long linear;
> +		struct segmented_address addr = {VCPU_SREG_CS, eip};
>  		cur_size = fc->end - fc->start;
>  		size = min(15UL - cur_size, PAGE_SIZE - offset_in_page(eip));

Breaks immediately - the segmented_address initializer is backwards. I've fixed this in my tree.

-- 
error compiling committee.c: too many arguments to function
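A small sketch of why a positional initializer can go wrong here, and how designated initializers sidestep the problem. The field order below (effective address first, segment second) is an assumption inferred from the review comment; the actual kernel definition is not shown in this thread, and SREG_CS is a hypothetical stand-in for VCPU_SREG_CS.

```c
#include <assert.h>

/* Assumed layout: effective address first, then segment index. */
struct segmented_address {
	unsigned long ea;
	unsigned seg;
};

enum { SREG_CS = 1 };  /* hypothetical segment index, for illustration only */

/*
 * Positional form: values are assigned in declaration order, so with the
 * layout above this puts the segment index into .ea and eip into .seg.
 */
struct segmented_address make_cs_address_positional(unsigned long eip)
{
	struct segmented_address addr = { SREG_CS, eip };

	return addr;
}

/* Designated form: correct regardless of how the fields are ordered. */
struct segmented_address make_cs_address(unsigned long eip)
{
	struct segmented_address addr = { .seg = SREG_CS, .ea = eip };

	assert(addr.seg == SREG_CS && addr.ea == eip);
	return addr;
}
```

Naming the fields costs a few characters and keeps the initializer correct if the structure is ever reordered.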
Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte
On 04/20/2011 02:06 PM, Roedel, Joerg wrote:
> On Wed, Apr 20, 2011 at 06:05:08AM -0400, Avi Kivity wrote:
> > On 04/20/2011 12:35 PM, Roedel, Joerg wrote:
> > > This patch seems only to introduce another wrapper around
> > > kvm_read_guest_page_mmu(), so I don't see a problem in this patch.
> >
> > By patch 3, ptep_user will be computed in this function and no longer
> > available for setting the accessed bit later on.
> >
> > > The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or
> > > l2-gfn (by calling mmu->translate_gpa).
> >
> > But cmpxchg_gpte() does not.
>
> You are right, cmpxchg_gpte needs to handle this too. But the bug is not
> introduced with this patch-set it was there before.

Correct. The reason I don't want the helper, is so we can use ptep_user in both places (not for efficiency, just to make sure it's exactly the same value).

> The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a
> fix soon.

Thanks.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte
On Wed, Apr 20, 2011 at 06:05:08AM -0400, Avi Kivity wrote:
> On 04/20/2011 12:35 PM, Roedel, Joerg wrote:
> > This patch seems only to introduce another wrapper around
> > kvm_read_guest_page_mmu(), so I don't see a problem in this patch.
>
> By patch 3, ptep_user will be computed in this function and no longer
> available for setting the accessed bit later on.
>
> > The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or
> > l2-gfn (by calling mmu->translate_gpa).
>
> But cmpxchg_gpte() does not.

You are right, cmpxchg_gpte needs to handle this too. But the bug is not introduced with this patch-set it was there before. The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a fix soon.

Regards,

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
Re: 2.6.38.1 general protection fault
On 20.04.2011 11:28, Thomas Treutner wrote:
> On 03/28/2011 10:14 PM, Tomasz Chmielewski wrote:
> > On 28.03.2011 22:04, Andrea Arcangeli wrote:
> > > Tomasz, how easily can you reproduce?
> >
> > Well, this server runs 10 VMs or so, and it happens after 1-2 days of
> > uptime. I reverted now to a 2.6.35.x, as it had enough downtime with
> > 2.6.38 already ;) so I'd rather not experiment anymore for some time
> > with a kernel known to cause problems.
>
> Tomasz, to which exact kernel version (host+guests) did you switch and
> is it now stable?

I've switched the host to the latest 2.6.35.x and it's stable.

The guest kernel doesn't seem to make a difference here, but the majority of the guests are running a 2.6.38.x kernel (I had some weird issues with "events/0" taking 100% CPU on guests when I used 2.6.35, which made the guests crawl).

-- 
Tomasz Chmielewski
http://wpkg.org
[PATCH] KVM: x86 emulator: whitespace cleanups
Clean up lines longer than 80 columns. No code changes. Signed-off-by: Avi Kivity --- arch/x86/kvm/emulate.c | 96 +++- 1 files changed, 54 insertions(+), 42 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 88c1f7a..4986e1b 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -262,42 +262,42 @@ struct gprefix { "w", "r", _LO32, "r", "", "r") /* Instruction has three operands and one operand is stored in ECX register */ -#define __emulate_2op_cl(_op, _cl, _src, _dst, _eflags, _suffix, _type) \ - do { \ - unsigned long _tmp; \ - _type _clv = (_cl).val; \ - _type _srcv = (_src).val; \ - _type _dstv = (_dst).val; \ - \ - __asm__ __volatile__ ( \ - _PRE_EFLAGS("0", "5", "2") \ - _op _suffix " %4,%1 \n" \ - _POST_EFLAGS("0", "5", "2") \ - : "=m" (_eflags), "+r" (_dstv), "=&r" (_tmp) \ - : "c" (_clv) , "r" (_srcv), "i" (EFLAGS_MASK) \ - ); \ - \ - (_cl).val = (unsigned long) _clv; \ - (_src).val = (unsigned long) _srcv; \ - (_dst).val = (unsigned long) _dstv; \ +#define __emulate_2op_cl(_op, _cl, _src, _dst, _eflags, _suffix, _type) \ + do {\ + unsigned long _tmp; \ + _type _clv = (_cl).val;\ + _type _srcv = (_src).val; \ + _type _dstv = (_dst).val; \ + \ + __asm__ __volatile__ ( \ + _PRE_EFLAGS("0", "5", "2") \ + _op _suffix " %4,%1 \n" \ + _POST_EFLAGS("0", "5", "2") \ + : "=m" (_eflags), "+r" (_dstv), "=&r" (_tmp)\ + : "c" (_clv) , "r" (_srcv), "i" (EFLAGS_MASK) \ + ); \ + \ + (_cl).val = (unsigned long) _clv; \ + (_src).val = (unsigned long) _srcv; \ + (_dst).val = (unsigned long) _dstv; \ } while (0) -#define emulate_2op_cl(_op, _cl, _src, _dst, _eflags) \ - do { \ - switch ((_dst).bytes) { \ - case 2: \ - __emulate_2op_cl(_op, _cl, _src, _dst, _eflags, \ - "w", unsigned short); \ - break; \ - case 4: \ - __emulate_2op_cl(_op, _cl, _src, _dst, _eflags, \ - "l", unsigned int); \ - break; \ - case 8: \ - ON64(__emulate_2op_cl(_op, _cl, _src, _dst, _eflags, \ - "q", unsigned long)); \ - break; \ - } \ +#define 
emulate_2op_cl(_op, _cl, _src, _dst, _eflags) \ + do {\ + switch ((_dst).bytes) { \ + case 2:
Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte
On 04/20/2011 12:35 PM, Roedel, Joerg wrote: On Wed, Apr 20, 2011 at 05:07:12AM -0400, Avi Kivity wrote: > On 04/18/2011 09:34 PM, Takuya Yoshikawa wrote: > > From: Takuya Yoshikawa > > > > This will be optimized later. > > > > Signed-off-by: Takuya Yoshikawa > > --- > >arch/x86/kvm/paging_tmpl.h | 12 +--- > >1 files changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h > > index 74f8567..109939a 100644 > > --- a/arch/x86/kvm/paging_tmpl.h > > +++ b/arch/x86/kvm/paging_tmpl.h > > @@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) > > return access; > >} > > > > +static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, > > +gfn_t table_gfn, int offset, pt_element_t *ptep) > > +{ > > + return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep, > > + offset, sizeof(*ptep), > > + PFERR_USER_MASK | PFERR_WRITE_MASK); > > +} > > + > >/* > > * Fetch a guest pte for a guest virtual address > > */ > > @@ -160,9 +168,7 @@ walk: > > walker->table_gfn[walker->level - 1] = table_gfn; > > walker->pte_gpa[walker->level - 1] = pte_gpa; > > > > - if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn,&pte, > > - offset, sizeof(pte), > > - PFERR_USER_MASK|PFERR_WRITE_MASK)) { > > + if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) { > > present = false; > > break; > > } > > > I think it's better to avoid a separate function for this. The reason > is I'd like to use ptep_user for cmpxchg_gpte() later on in > walk_addr_generic(), so we use the same calculation for both read and > write. So please just inline the new code in walk_addr_generic(). > > In fact there's probably a bug there for nested npt - we use > gfn_to_page(table_gfn), but table_gfn is actually an ngfn, not a gfn. > Joerg, am I right here? This patch seems only to introduce another wrapper around kvm_read_guest_page_mmu(), so I don't see a problem in this patch. 
By patch 3, ptep_user will be computed in this function and no longer available for setting the accessed bit later on.

> The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or
> l2-gfn (by calling mmu->translate_gpa).

But cmpxchg_gpte() does not.

-- 
error compiling committee.c: too many arguments to function
Re: [PATCH 0/4] qemu-kvm: Sort out upstream merge regressions
On 04/18/2011 12:26 PM, Jan Kiszka wrote:
> Recent merge with upstream left some corners of qemu-kvm broken behind.
> This series addresses those I've spotted based on my merge experiments in
> the past months.

Applied all, thanks.

-- 
error compiling committee.c: too many arguments to function
Re: 2.6.38.1 general protection fault
On 03/28/2011 10:14 PM, Tomasz Chmielewski wrote:
> On 28.03.2011 22:04, Andrea Arcangeli wrote:
> > Tomasz, how easily can you reproduce?
>
> Well, this server runs 10 VMs or so, and it happens after 1-2 days of
> uptime. I reverted now to a 2.6.35.x, as it had enough downtime with
> 2.6.38 already ;) so I'd rather not experiment anymore for some time
> with a kernel known to cause problems.

Tomasz, to which exact kernel version (host+guests) did you switch and is it now stable?

thanks,
-t
Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte
On Wed, Apr 20, 2011 at 05:07:12AM -0400, Avi Kivity wrote: > On 04/18/2011 09:34 PM, Takuya Yoshikawa wrote: > > From: Takuya Yoshikawa > > > > This will be optimized later. > > > > Signed-off-by: Takuya Yoshikawa > > --- > > arch/x86/kvm/paging_tmpl.h | 12 +--- > > 1 files changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h > > index 74f8567..109939a 100644 > > --- a/arch/x86/kvm/paging_tmpl.h > > +++ b/arch/x86/kvm/paging_tmpl.h > > @@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu > > *vcpu, pt_element_t gpte) > > return access; > > } > > > > +static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu > > *mmu, > > +gfn_t table_gfn, int offset, pt_element_t > > *ptep) > > +{ > > + return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep, > > + offset, sizeof(*ptep), > > + PFERR_USER_MASK | PFERR_WRITE_MASK); > > +} > > + > > /* > >* Fetch a guest pte for a guest virtual address > >*/ > > @@ -160,9 +168,7 @@ walk: > > walker->table_gfn[walker->level - 1] = table_gfn; > > walker->pte_gpa[walker->level - 1] = pte_gpa; > > > > - if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn,&pte, > > - offset, sizeof(pte), > > - PFERR_USER_MASK|PFERR_WRITE_MASK)) { > > + if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) { > > present = false; > > break; > > } > > > I think it's better to avoid a separate function for this. The reason > is I'd like to use ptep_user for cmpxchg_gpte() later on in > walk_addr_generic(), so we use the same calculation for both read and > write. So please just inline the new code in walk_addr_generic(). > > In fact there's probably a bug there for nested npt - we use > gfn_to_page(table_gfn), but table_gfn is actually an ngfn, not a gfn. > Joerg, am I right here? This patch seems only to introduce another wrapper around kvm_read_guest_page_mmu(), so I don't see a problem in this patch. 
kvm_read_guest_page_mmu() already handles both cases, whether it is given
an l1-gfn or an l2-gfn, by calling mmu->translate_gpa().

Regards,

        Joerg

--
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
Re: [PATCH v2] KVM: emulator: Use linearize() when fetching instructions.
On 04/18/2011 07:05 PM, Nelson Elhage wrote:
> Since segments need to be handled slightly differently when fetching
> instructions, we add a __linearize helper that accepts a new 'fetch'
> boolean.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk
On 04/19/2011 06:47 AM, Takuya Yoshikawa wrote:
> So if a certain algorithm seems to be adopted, yes, I will test based
> on that. IIRC, no practically good algorithm has been found yet,
> right?

I think a simple sort based on size will provide the same optimization
(just the cache, not get_user()) without any downsides. Most memory in a
guest is usually in just one or two slots, that's the reason for the high
hit rate.

--
error compiling committee.c: too many arguments to function
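The "simple sort based on size" idea can be illustrated with a small
userspace sketch. This is not the KVM implementation: struct slot,
sort_slots() and find_slot() are hypothetical names, and the slot table is
reduced to a base frame number and a page count. The point is only that,
once the largest slots sit first, a plain linear search usually hits on the
first or second entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a KVM memory slot. */
struct slot {
    uint64_t base_gfn;  /* first guest frame number covered by the slot */
    uint64_t npages;    /* number of pages in the slot */
};

/* qsort() comparator: larger slots sort first. */
static int slot_cmp_size_desc(const void *a, const void *b)
{
    const struct slot *sa = a, *sb = b;

    if (sa->npages > sb->npages)
        return -1;
    if (sa->npages < sb->npages)
        return 1;
    return 0;
}

/* Sort once, whenever the slot table changes. */
static void sort_slots(struct slot *slots, size_t n)
{
    qsort(slots, n, sizeof(*slots), slot_cmp_size_desc);
}

/* Linear search; returns the index of the slot holding gfn, or -1. */
static int find_slot(const struct slot *slots, size_t n, uint64_t gfn)
{
    assert(slots != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (gfn >= slots[i].base_gfn &&
            gfn - slots[i].base_gfn < slots[i].npages)
            return (int)i;
    }
    return -1;
}
```

The sort cost is paid only when slots are added or removed, while every
lookup benefits, and no per-walk cache state has to be threaded through the
callers.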
Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte
On 04/18/2011 09:34 PM, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa
>
> This will be optimized later.
>
> Signed-off-by: Takuya Yoshikawa
> ---
>  arch/x86/kvm/paging_tmpl.h |   12 +++++++++---
>  1 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 74f8567..109939a 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
>          return access;
>  }
>
> +static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> +                                 gfn_t table_gfn, int offset, pt_element_t *ptep)
> +{
> +        return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep,
> +                                       offset, sizeof(*ptep),
> +                                       PFERR_USER_MASK | PFERR_WRITE_MASK);
> +}
> +
>  /*
>   * Fetch a guest pte for a guest virtual address
>   */
> @@ -160,9 +168,7 @@ walk:
>                  walker->table_gfn[walker->level - 1] = table_gfn;
>                  walker->pte_gpa[walker->level - 1] = pte_gpa;
>
> -                if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte,
> -                                            offset, sizeof(pte),
> -                                            PFERR_USER_MASK|PFERR_WRITE_MASK)) {
> +                if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset, &pte)) {
>                          present = false;
>                          break;
>                  }

I think it's better to avoid a separate function for this. The reason is
I'd like to use ptep_user for cmpxchg_gpte() later on in
walk_addr_generic(), so we use the same calculation for both read and
write. So please just inline the new code in walk_addr_generic().

In fact there's probably a bug there for nested npt - we use
gfn_to_page(table_gfn), but table_gfn is actually an ngfn, not a gfn.
Joerg, am I right here?

--
error compiling committee.c: too many arguments to function
Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk
On 04/18/2011 09:38 PM, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa
>
> We optimize multi level guest page table walk as follows:
>
>   1. We cache the memslot which, probably, includes the next guest page
>      tables to avoid searching for it many times.
>   2. We use get_user() instead of copy_from_user().
>
> Note that this is kind of a restricted way of Xiao's more generic
> work: "KVM: optimize memslots searching and cache GPN to GFN."

Good optimization. copy_from_user() really isn't optimized for short
buffers, I expect much of the improvement comes from that.

> +/*
> + * Read the guest PTE referred to by table_gfn and offset and put it into ptep.
> + *
> + * *slot_hint, if not NULL, should point to a memslot which probably includes
> + * the guest PTE.  The actual memslot will be put back into this so that
> + * callers can cache it.
> + */

Please drop the slot_hint optimization. First, it belongs in a separate
patch. Second, I prefer to see a generic slot sort instead of an ad-hoc
cache.

>  static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> -                                 gfn_t table_gfn, int offset, pt_element_t *ptep)
> +                                 gfn_t table_gfn, int offset, pt_element_t *ptep,
> +                                 struct kvm_memory_slot **slot_hint)
>  {
> -        return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep,
> -                                       offset, sizeof(*ptep),
> -                                       PFERR_USER_MASK | PFERR_WRITE_MASK);
> +        unsigned long addr;
> +        pt_element_t __user *ptep_user;
> +        gfn_t real_gfn;
> +
> +        real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
> +                                      PFERR_USER_MASK | PFERR_WRITE_MASK);
> +        if (real_gfn == UNMAPPED_GVA)
> +                return -EFAULT;
> +
> +        real_gfn = gpa_to_gfn(real_gfn);
> +
> +        if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn))
> +                *slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn);
> +
> +        addr = gfn_to_hva_memslot(*slot_hint, real_gfn);
> +        if (kvm_is_error_hva(addr))
> +                return -EFAULT;
> +
> +        ptep_user = (pt_element_t __user *)((void *)addr + offset);
> +        return get_user(*ptep, ptep_user);
>  }
>
>  /*
> @@ -130,6 +155,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>          gpa_t pte_gpa;
>          bool eperm, present, rsvd_fault;
>          int offset, write_fault, user_fault, fetch_fault;
> +        struct kvm_memory_slot *slot_cache = NULL;
>
>          write_fault = access & PFERR_WRITE_MASK;
>          user_fault = access & PFERR_USER_MASK;
> @@ -168,7 +194,8 @@ walk:
>                  walker->table_gfn[walker->level - 1] = table_gfn;
>                  walker->pte_gpa[walker->level - 1] = pte_gpa;
>
> -                if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset, &pte)) {
> +                if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn,
> +                                          offset, &pte, &slot_cache)) {
>                          present = false;
>                          break;
>                  }

--
error compiling committee.c: too many arguments to function
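For readers following the slot_hint discussion, the one-entry cache the
patch proposes can be modelled in userspace roughly as below. This is a
sketch under assumptions, not kernel code: struct memslot,
slot_contains(), search_slots() and lookup_with_hint() are hypothetical
stand-ins for kvm_memory_slot, gfn_in_memslot() and gfn_to_memslot(), and
the searches counter exists only to make the effect of the hint visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct kvm_memory_slot. */
struct memslot {
    uint64_t base_gfn;
    uint64_t npages;
};

static int slot_contains(const struct memslot *s, uint64_t gfn)
{
    return gfn >= s->base_gfn && gfn - s->base_gfn < s->npages;
}

/* The "slow path": scan the whole table. */
static const struct memslot *search_slots(const struct memslot *slots,
                                          size_t n, uint64_t gfn)
{
    for (size_t i = 0; i < n; i++)
        if (slot_contains(&slots[i], gfn))
            return &slots[i];
    return NULL;
}

/*
 * Look up gfn, reusing *hint when it still covers gfn. *hint is updated
 * so the caller can pass it to the next lookup; *searches counts how
 * often the slow path was taken.
 */
static const struct memslot *lookup_with_hint(const struct memslot *slots,
                                              size_t n, uint64_t gfn,
                                              const struct memslot **hint,
                                              unsigned int *searches)
{
    assert(hint != NULL && searches != NULL);

    if (*hint == NULL || !slot_contains(*hint, gfn)) {
        *hint = search_slots(slots, n, gfn);
        (*searches)++;
    }
    return *hint;
}
```

The extra pointer parameter on every caller is the cost of this approach,
which is presumably part of why a generic sort of the slot table was
preferred in the review.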
Re: [PATCH 3/4] kvm tools: Add debug feature to test the IO thread
* Pekka Enberg wrote:

> Sorry for the bikeshedding but wouldn't it be better to follow Git's
> lead and have something like
>
>   kvm config MyInstance-1 --set debug.io.delay.ms 100
>
> and
>
>   kvm config MyInstance-1 --list

Yeah, agreed - 'kvm config' is intuitive. I tried to think of something
better than 'kvm set' but failed.

( And no, being super diligent with high level, very user visible changes
  and names is not bikeshed painting. )

Note that 'git config' touches the .gitconfig IIRC - while here we really
also want to include runtime, dynamic configuration - but i think that
distinction is fine.

Now the whole 'kvm config' thing needs more thought and the whole
enumeration of KVM instances needs to be well thought out as well. How do
we list instances - 'kvm list' - or should perhaps 'kvm config' list all
the currently running instances?

Thanks,

        Ingo
Re: [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
Thanks Jason!

So I can use my virtio-net guest driver and test with this patch? Please
provide the script you use to start the MQ guest.

Regards,

- KK

Jason Wang wrote on 04/20/2011 02:03:07 PM:

> To: Krishna Kumar2/India/IBM@IBMIN, kvm@vger.kernel.org, m...@redhat.com,
>     net...@vger.kernel.org, ru...@rustcorp.com.au, qemu-de...@nongnu.org,
>     anth...@codemonkey.ws
> Subject: [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
>
> Inspired by Krishna's patch
> (http://www.spinics.net/lists/kvm/msg52098.html) and Michael's
> suggestions. The following series adds multiqueue support to qemu and
> enables it for virtio-net (both userspace and vhost).
>
> The aim of this series is to simplify the management and achieve the
> same performance with less code.
>
> The differences between this series and Krishna's are as follows:
>
> - Add multiqueue support for qemu and also for userspace virtio-net.
> - Instead of hacking the vhost module to manipulate kthreads, this patch
>   just implements userspace-based multiqueue and thus can re-use the
>   existing vhost kernel-side code without any modification.
> - Use a 1:1 mapping between TX/RX pairs and vhost kthreads because the
>   implementation is based on userspace.
> - The cli is also changed to make the management easier: the -netdev
>   option of qdev can now accept more than one id. You can start a
>   multiqueue virtio-net device through:
>
>   ./qemu-system-x86_64 -netdev tap,id=hn0,vhost=on,fd=X -netdev
>   tap,id=hn1,vhost=on,fd=Y -device virtio-net-pci,netdev=hn0#hn1,queues=2 ...
>
> The series is very primitive and still needs polishing.
>
> Suggestions are welcome.
> ---
>
> Jason Wang (2):
>       net: Add multiqueue support
>       virtio-net: add multiqueue support
>
>  hw/qdev-properties.c |   37
>  hw/qdev.h            |    3
>  hw/vhost.c           |   26
>  hw/vhost.h           |    1
>  hw/vhost_net.c       |    7
>  hw/vhost_net.h       |    2
>  hw/virtio-net.c      |  409
>  hw/virtio-net.h      |    2
>  hw/virtio-pci.c      |    1
>  hw/virtio.h          |    1
>  net.c                |   34
>  net.h                |   15
>  12 files changed, 353 insertions(+), 185 deletions(-)
>
> --
> Jason Wang
[RFC PATCH 2/2] virtio-net: add multiqueue support
This patch add the multiqueue ability to virtio-net for both userapce and vhost. With this patch the kernel side vhost could be reused without modification to support multiqueue virtio-net nics. Signed-off-by: Jason Wang --- hw/vhost.c | 26 ++- hw/vhost.h |1 hw/vhost_net.c |7 + hw/vhost_net.h |2 hw/virtio-net.c | 409 +++ hw/virtio-net.h |2 hw/virtio-pci.c |1 hw/virtio.h |1 8 files changed, 284 insertions(+), 165 deletions(-) diff --git a/hw/vhost.c b/hw/vhost.c index 14b571d..2301d53 100644 --- a/hw/vhost.c +++ b/hw/vhost.c @@ -450,10 +450,10 @@ static int vhost_virtqueue_init(struct vhost_dev *dev, target_phys_addr_t s, l, a; int r; struct vhost_vring_file file = { -.index = idx, +.index = idx % dev->nvqs, }; struct vhost_vring_state state = { -.index = idx, +.index = idx % dev->nvqs, }; struct VirtQueue *vvq = virtio_get_queue(vdev, idx); @@ -504,12 +504,13 @@ static int vhost_virtqueue_init(struct vhost_dev *dev, goto fail_alloc_ring; } -r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled); +r = vhost_virtqueue_set_addr(dev, vq, idx % dev->nvqs, dev->log_enabled); if (r < 0) { r = -errno; goto fail_alloc; } r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, true); + if (r < 0) { fprintf(stderr, "Error binding host notifier: %d\n", -r); goto fail_host_notifier; @@ -557,7 +558,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev, unsigned idx) { struct vhost_vring_state state = { -.index = idx, +.index = idx % dev->nvqs, }; int r; r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, false); @@ -648,10 +649,13 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev) goto fail; } -r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true); -if (r < 0) { -fprintf(stderr, "Error binding guest notifier: %d\n", -r); -goto fail_notifiers; +for (i = 0; i < hdev->nvqs; i++) { +r = vdev->binding->set_guest_notifier(vdev->binding_opaque, + hdev->start_idx + i, true); +if (r < 0) { +fprintf(stderr, "Error binding 
guest notifier: %d\n", -r); +goto fail_notifiers; +} } r = vhost_dev_set_features(hdev, hdev->log_enabled); @@ -667,7 +671,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev) r = vhost_virtqueue_init(hdev, vdev, hdev->vqs + i, - i); + hdev->start_idx + i); if (r < 0) { goto fail_vq; } @@ -694,7 +698,7 @@ fail_vq: vhost_virtqueue_cleanup(hdev, vdev, hdev->vqs + i, -i); +hdev->start_idx + i); } fail_mem: fail_features: @@ -712,7 +716,7 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev) vhost_virtqueue_cleanup(hdev, vdev, hdev->vqs + i, -i); +hdev->start_idx + i); } vhost_client_sync_dirty_bitmap(&hdev->client, 0, (target_phys_addr_t)~0x0ull); diff --git a/hw/vhost.h b/hw/vhost.h index c8c595a..48b9478 100644 --- a/hw/vhost.h +++ b/hw/vhost.h @@ -31,6 +31,7 @@ struct vhost_dev { struct vhost_memory *mem; struct vhost_virtqueue *vqs; int nvqs; +int start_idx; unsigned long long features; unsigned long long acked_features; unsigned long long backend_features; diff --git a/hw/vhost_net.c b/hw/vhost_net.c index 420e05f..7fc87f8 100644 --- a/hw/vhost_net.c +++ b/hw/vhost_net.c @@ -128,7 +128,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev) } int vhost_net_start(struct vhost_net *net, -VirtIODevice *dev) +VirtIODevice *dev, +int start_idx) { struct vhost_vring_file file = { }; int r; @@ -139,6 +140,7 @@ int vhost_net_start(struct vhost_net *net, net->dev.nvqs = 2; net->dev.vqs = net->vqs; +net->dev.start_idx = start_idx; r = vhost_dev_start(&net->dev, dev); if (r < 0) { return r; @@ -206,7 +208,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev) } int vhost_net_start(struct vhost_net *net, - VirtIODevice *dev) +VirtIODevice *dev, +int start_idx) { return -ENOSYS; } diff --git a/hw/vhost_net.h b/hw/vhost_net.h index 91
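The index arithmetic in the vhost.c hunks above can be sketched as
follows. This assumes, as the vhost_net.c hunk suggests, that each vhost
device keeps nvqs == 2 (one RX/TX pair) and is handed a start_idx into the
guest-visible virtqueue array; struct vq_window and both helpers are
hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical view of the two fields the patch relies on. */
struct vq_window {
    int nvqs;       /* virtqueues owned by this vhost device */
    int start_idx;  /* index of its first queue in the virtio device */
};

/* Index used towards the virtio device (notifiers, virtio_get_queue). */
static int global_vq_index(const struct vq_window *w, int i)
{
    assert(i >= 0 && i < w->nvqs);
    return w->start_idx + i;
}

/* Index used towards the vhost kernel side, as in "idx % dev->nvqs". */
static int local_vq_index(const struct vq_window *w, int idx)
{
    assert(idx >= w->start_idx && idx < w->start_idx + w->nvqs);
    return idx % w->nvqs;
}
```

Note that the modulo only maps back to the original local index when
start_idx is a multiple of nvqs (0, 2, 4, ... for nvqs == 2). Computing
idx - start_idx instead would not depend on that alignment.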
[RFC PATCH 1/2] net: Add multiqueue support
This patch adds the multiqueues support for emulated nics. Each VLANClientState pairs are now abstract as a queue instead of a nic, and multiple VLANClientState pointers were stored in the NICState and treated as the multiple queues of a single nic. The netdev options of qdev were now expanded to accept more than one netdev ids. A queue_index were also introduced to let the emulated nics know which queue the packet were came from or sent out. Virtio-net would be the first user. The legacy single queue nics can still run happily without modification as the the compatibility were kept. Signed-off-by: Jason Wang --- hw/qdev-properties.c | 37 ++--- hw/qdev.h|3 ++- net.c| 34 ++ net.h| 15 +++ 4 files changed, 69 insertions(+), 20 deletions(-) diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c index 1088a26..dd371e1 100644 --- a/hw/qdev-properties.c +++ b/hw/qdev-properties.c @@ -384,14 +384,37 @@ PropertyInfo qdev_prop_chr = { static int parse_netdev(DeviceState *dev, Property *prop, const char *str) { -VLANClientState **ptr = qdev_get_prop_ptr(dev, prop); +VLANClientState ***nc = qdev_get_prop_ptr(dev, prop); +const char *ptr = str; +int i = 0; +size_t len = strlen(str); +*nc = qemu_malloc(MAX_QUEUE_NUM * sizeof(VLANClientState *)); + +while (i < MAX_QUEUE_NUM && ptr < str + len) { +char *name = NULL; +char *this = strchr(ptr, '#'); + +if (this == NULL) { +name = strdup(ptr); +} else { +name = strndup(ptr, this - ptr); +} -*ptr = qemu_find_netdev(str); -if (*ptr == NULL) -return -ENOENT; -if ((*ptr)->peer) { -return -EEXIST; +(*nc)[i] = qemu_find_netdev(name); +if ((*nc)[i] == NULL) { +return -ENOENT; +} +if (((*nc)[i])->peer) { +return -EEXIST; +} + +if (this == NULL) { +break; +} +i++; +ptr = this + 1; } + return 0; } @@ -409,7 +432,7 @@ static int print_netdev(DeviceState *dev, Property *prop, char *dest, size_t len PropertyInfo qdev_prop_netdev = { .name = "netdev", .type = PROP_TYPE_NETDEV, -.size = sizeof(VLANClientState*), +.size = sizeof(VLANClientState 
**), .parse = parse_netdev, .print = print_netdev, }; diff --git a/hw/qdev.h b/hw/qdev.h index 8a13ec9..b438da0 100644 --- a/hw/qdev.h +++ b/hw/qdev.h @@ -257,6 +257,7 @@ extern PropertyInfo qdev_prop_pci_devfn; .defval= (bool[]) { (_defval) }, \ } + #define DEFINE_PROP_UINT8(_n, _s, _f, _d) \ DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t) #define DEFINE_PROP_UINT16(_n, _s, _f, _d) \ @@ -281,7 +282,7 @@ extern PropertyInfo qdev_prop_pci_devfn; #define DEFINE_PROP_STRING(_n, _s, _f) \ DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*) #define DEFINE_PROP_NETDEV(_n, _s, _f) \ -DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState*) +DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState**) #define DEFINE_PROP_VLAN(_n, _s, _f) \ DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, VLANState*) #define DEFINE_PROP_DRIVE(_n, _s, _f) \ diff --git a/net.c b/net.c index 4f777c3..a937e5d 100644 --- a/net.c +++ b/net.c @@ -227,16 +227,36 @@ NICState *qemu_new_nic(NetClientInfo *info, { VLANClientState *nc; NICState *nic; +int i; assert(info->type == NET_CLIENT_TYPE_NIC); assert(info->size >= sizeof(NICState)); -nc = qemu_new_net_client(info, conf->vlan, conf->peer, model, name); +nc = qemu_new_net_client(info, conf->vlan, conf->peers[0], model, name); nic = DO_UPCAST(NICState, nc, nc); nic->conf = conf; nic->opaque = opaque; +/* For compatiablity with single queue nic */ +nic->ncs[0] = nc; +nc->opaque = nic; + +for (i = 1 ; i < conf->queues; i++) { +VLANClientState *vc = qemu_mallocz(sizeof(*vc)); +vc->opaque = nic; +nic->ncs[i] = vc; +vc->peer = conf->peers[i]; +vc->info = info; +vc->queue_index = i; +vc->peer->peer = vc; +QTAILQ_INSERT_TAIL(&non_vlan_clients, vc, next); + +vc->send_queue = qemu_new_net_queue(qemu_deliver_packet, +qemu_deliver_packet_iov, +vc); +} + return nic; } @@ -272,11 +292,10 @@ void qemu_del_vlan_client(VLANClientState *vc) { /* If there is a peer NIC, delete and cleanup client, but do not free. 
*/ if (!vc->vlan && vc->peer && vc->peer->info->type == NET_CLIENT_TYPE_NIC) { -NICState *nic = DO_UPCAST(NICState, nc, vc->peer); -if (nic->peer_deleted) { +if (vc->peer_deleted) { retur
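The '#'-separated netdev list handled by parse_netdev() above can be
sketched in plain C as below. This is an illustration, not the QEMU code:
split_netdev_ids(), MAX_IDS and MAX_ID_LEN are hypothetical, and the sketch
copies each id into a caller-provided buffer instead of using
strdup()/strndup(), so there are no temporaries to free.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDS    16   /* hypothetical cap, standing in for MAX_QUEUE_NUM */
#define MAX_ID_LEN 32   /* hypothetical maximum id length, including NUL */

/*
 * Split "hn0#hn1#..." into ids[]. Returns the number of ids stored, or -1
 * if an id is empty or too long, or if there are more than MAX_IDS ids.
 */
static int split_netdev_ids(const char *str, char ids[MAX_IDS][MAX_ID_LEN])
{
    const char *p = str;
    int n = 0;

    assert(str != NULL);

    for (;;) {
        const char *sep = strchr(p, '#');
        size_t len = sep ? (size_t)(sep - p) : strlen(p);

        if (len == 0 || len >= MAX_ID_LEN || n == MAX_IDS)
            return -1;

        memcpy(ids[n], p, len);
        ids[n][len] = '\0';
        n++;

        if (sep == NULL)
            return n;       /* last id consumed */
        p = sep + 1;        /* continue after the separator */
    }
}
```

With the command line from the cover letter, "netdev=hn0#hn1" would yield
the two ids "hn0" and "hn1", each of which would then be resolved to its
own backend.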
[RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
Inspired by Krishna's patch (http://www.spinics.net/lists/kvm/msg52098.html)
and Michael's suggestions. The following series adds multiqueue support to
qemu and enables it for virtio-net (both userspace and vhost).

The aim of this series is to simplify the management and achieve the same
performance with less code.

The differences between this series and Krishna's are as follows:

- Add multiqueue support for qemu and also for userspace virtio-net.
- Instead of hacking the vhost module to manipulate kthreads, this patch
  just implements userspace-based multiqueue and thus can re-use the
  existing vhost kernel-side code without any modification.
- Use a 1:1 mapping between TX/RX pairs and vhost kthreads because the
  implementation is based on userspace.
- The cli is also changed to make the management easier: the -netdev
  option of qdev can now accept more than one id. You can start a
  multiqueue virtio-net device through:

  ./qemu-system-x86_64 -netdev tap,id=hn0,vhost=on,fd=X -netdev
  tap,id=hn1,vhost=on,fd=Y -device virtio-net-pci,netdev=hn0#hn1,queues=2 ...

The series is very primitive and still needs polishing.

Suggestions are welcome.
---

Jason Wang (2):
      net: Add multiqueue support
      virtio-net: add multiqueue support

 hw/qdev-properties.c |   37
 hw/qdev.h            |    3
 hw/vhost.c           |   26
 hw/vhost.h           |    1
 hw/vhost_net.c       |    7
 hw/vhost_net.h       |    2
 hw/virtio-net.c      |  409
 hw/virtio-net.h      |    2
 hw/virtio-pci.c      |    1
 hw/virtio.h          |    1
 net.c                |   34
 net.h                |   15
 12 files changed, 353 insertions(+), 185 deletions(-)

--
Jason Wang
Re: [PATCH 0/2] Some KVM fixes
On 04/18/2011 12:42 PM, Joerg Roedel wrote:
> Hi,
>
> these two patches fix one issue introduced with the recent
> emulator-intercept code (the issue was there before too, but hidden by
> other workaround code which was removed in the mentioned patch-set).
> The second patch fixes a problem introduced with the tsc-scaling
> patch-set where the TSC was not usable anymore after a guest-reboot.
>
> All-in-all, these fixes are no -stable material.

Applied, thanks.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] acpi_piix4: remove bad save/restore of cpus_sts
On 04/18/2011 04:56 PM, Isaku Yamahata wrote:
> This patch would fix the segfaults.
> But I suppose the following are necessary.
>
> - PIIX4PMState::gpe_cpu needs to be saved/loaded somewhere

Yes. Juan?

> - gpe_writeb() needs to handle PROC_BASE ... PROC_BASE+31
>   like gpe_readb(). To be honest, I don't see why gpe_readb/writeb()
>   are used for PROC_BASE...PROC_BASE + 31

Even before the merge, we didn't handle a write to this address. Perhaps
it's read-only? (that should explain no save/restore).

--
error compiling committee.c: too many arguments to function
Re: [PATCH v2 0/2] Store and load PCI device saved state across function resets
On 04/19/2011 11:12 PM, Alex Williamson wrote:
> v1 -> v2: Make the pointer passed around less opaque for type safety.
>
> Bug https://bugs.launchpad.net/qemu/+bug/754591 is caused because
> the KVM module attempts to do a pci_save_state() before assigning
> the device to a VM, expecting that the saved state will remain
> valid until we release the device. This is in conflict with our
> need to reset devices using PCI sysfs during a VM reset to quiesce
> the device. Any calls to pci_reset_function() will overwrite the
> device saved state prior to reset, and reload and invalidate the
> state after. KVM then ends up trying to restore the state, but
> it's already invalid, so the device ends up with reset values.
>
> This series adds a mechanism to pull the saved state off the
> struct pci_dev and reload it later. Thanks,

Based on the sizes of the patches, this should go in via the pci tree.

--
error compiling committee.c: too many arguments to function
Re: kvm hangs with 1GB or more memory assigned
(re-adding list)

On 04/15/2011 07:28 AM, Neal Murphy wrote:
> On Wednesday 06 April 2011 03:35:27 you wrote:
> > On 04/06/2011 06:22 AM, Neal Murphy wrote:
> > > ENVIRONS:
> > > I'm running
> > >
> > > - Debian Squeeze.
> > > - QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5)
> > > - 2.6.32-5-686-bigmem #1 SMP Tue Mar 8 22:14:55 UTC 2011 i686
> > >   GNU/Linux
> > > - Quad Phenom II 965, 8GB RAM
> > >
> > > I'm booting generic 2.6.35.11 through syslinux. The command is
> > > generated via a script I wrote. It works fine until I assign more
> > > than 1005M RAM to the VM; it's been working fine (at less than 1GB
> > > RAM) for many months. The system I am booting boots and runs fine
> > > on bare metal.
> > >
> > > I got the same results when I DLed and installed ver. 0.14.
> >
> > Looks like a guest BIOS issue.
> >
> > Please try qemu-0.14. Also try -cpu qemu64 instead of phenom.
> >
> > If those fail, we can attach with gdb and try to look at what's going
> > on, but let's try the simple tests first.
>
> BTW, is there a preferred 'set' of options to feed ./configure?

No.

> I used './configure --prefix=/opt/kvm --enable-kvm' to build.
>
> Finally got my system (Smoothwall/Roadster) stabilized and out the
> door, so I have time to dedicate to this.
>
> OK. I've built 0.14 (installed to /opt/kvm and I deleted Squeeze's
> older package) and am using Squeeze's kvm_amd module. I tried without
> -cpu and -smp; I tried -cpu qemu64. I eliminated the -vga and -serial
> options to no avail. It still chokes on 1005MB RAM.

Works for me. Please use -monitor stdio and issue the commands

  (qemu) info registers
  (qemu) x/50i $eip - 30

> So what's the next ste... ... Wait, just thought of something to try:
> 'rmmod kvm-amd'. ... Oho! Without the accelerator, it runs with at
> least 1588M RAM, but can't allocate 2000M RAM (this may be expected on
> a 32-bit OS with PAE).

2G limit is expected on i386.

> Does this help? Is it time to submit a debian bug?

If debian fixes the bug, that would be great.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 4/4 V2] kvm tools: Complete missing segments in a iov op using regular op
Hi,

[ Sasha, please remember to CC people who were involved in discussions! ]

On Mon, Apr 18, 2011 at 4:05 PM, Sasha Levin wrote:
> If any of the iov operations return mid-block, use regular ops to complete
> the current block and continue using iov ops.
>
> Signed-off-by: Sasha Levin
> ---
>  tools/kvm/read-write.c |   58 ++-
>  1 files changed, 51 insertions(+), 7 deletions(-)
>
> diff --git a/tools/kvm/read-write.c b/tools/kvm/read-write.c
> index 0c995c8..bf2e4a0 100644
> --- a/tools/kvm/read-write.c
> +++ b/tools/kvm/read-write.c
> @@ -189,10 +189,10 @@ static inline ssize_t get_iov_size(const struct iovec *iov, int iovcnt)
>  }
>
>  static inline void shift_iovec(const struct iovec **iov, int *iovcnt,
> -                               size_t nr, ssize_t *total, size_t *count, off_t *offset)
> +                               ssize_t *nr, ssize_t *total, size_t *count, off_t *offset)
>  {
> -        while (nr >= (*iov)->iov_len) {
> -                nr -= (*iov)->iov_len;
> +        while ((size_t)*nr >= (*iov)->iov_len) {
> +                *nr -= (*iov)->iov_len;
>                  *total += (*iov)->iov_len;
>                  *count -= (*iov)->iov_len;
>                  if (offset)
> @@ -218,7 +218,18 @@ ssize_t readv_in_full(int fd, const struct iovec *iov, int iovcnt)
>                          return -1;
>                  }
>
> -                shift_iovec(&iov, &iovcnt, nr, &total, &count, NULL);
> +                shift_iovec(&iov, &iovcnt, &nr, &total, &count, NULL);
> +
> +                while (nr > 0) {
> +                        ssize_t nr_readagain;
> +                        nr_readagain = xread(fd, iov->iov_base + nr,
> +                                             iov->iov_len - nr);
> +                        if (nr_readagain <= 0)
> +                                return total;
> +
> +                        nr += nr_readagain;
> +                        shift_iovec(&iov, &iovcnt, &nr, &total, &count, NULL);
> +                }
>          }
>
>          return total;

As mentioned on IRC, I hate this patch with a passion. ;-)

We don't do O_DIRECT now so this doesn't help with performance, and the
extra complexity it brings to the table isn't very appealing. Modifying
the struct iovec (or making a copy of it) seems to be a much nicer
approach.

                        Pekka
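The "make a copy of the struct iovec" alternative mentioned above could
look roughly like the sketch below. It is not the kvm tools code:
readv_full_copy() is a hypothetical name, and the error policy (return -1
on any readv() failure other than EINTR, return the byte count on EOF) is
an assumption. The caller's array is left untouched; only the private copy
is advanced after a short read.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read until every buffer in iov[] is full or EOF is reached.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t readv_full_copy(int fd, const struct iovec *iov, int iovcnt)
{
    struct iovec *copy, *cur;
    ssize_t total = 0;

    assert(iov != NULL && iovcnt > 0);

    copy = malloc((size_t)iovcnt * sizeof(*copy));
    if (copy == NULL)
        return -1;
    memcpy(copy, iov, (size_t)iovcnt * sizeof(*copy));
    cur = copy;

    while (iovcnt > 0) {
        ssize_t nr = readv(fd, cur, iovcnt);

        if (nr < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            total = -1;
            break;
        }
        if (nr == 0)
            break;              /* EOF */

        total += nr;

        /* Drop the segments that were filled completely. */
        while (iovcnt > 0 && (size_t)nr >= cur->iov_len) {
            nr -= (ssize_t)cur->iov_len;
            cur++;
            iovcnt--;
        }
        /* Trim the partially filled segment, if any. */
        if (iovcnt > 0 && nr > 0) {
            cur->iov_base = (char *)cur->iov_base + nr;
            cur->iov_len -= (size_t)nr;
        }
    }

    free(copy);
    return total;
}
```

Compared with the posted patch, there is a single loop around readv() and
no fallback to a separate non-vectored read for the remainder of a block,
at the cost of one small allocation per call.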
Re: [PATCH 2/2] KVM: Use pci_store/load_saved_state() around VM device usage
On 04/18/2011 10:43 PM, Alex Williamson wrote:
> On Sun, 2011-04-17 at 12:25 +0300, Avi Kivity wrote:
> > On 04/15/2011 10:54 PM, Alex Williamson wrote:
> > > Store the device saved state so that we can reload the device back
> > > to the original state when it's unassigned. This has the benefit
> > > that the state survives across pci_reset_function() calls via
> > > the PCI sysfs reset interface while the VM is using the device.
> > >
> > > @@ -516,7 +518,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> > >
> > >          pci_reset_function(dev);
> > >          pci_save_state(dev);
> > > -
> > > +        match->pci_saved_state = pci_store_saved_state(dev);
> > >          match->assigned_dev_id = assigned_dev->assigned_dev_id;
> >
> > Error check?
> >
> > It might be better to give up the opacity of the data structure and
> > make pci_saved_state the full struct, not a pointer.
>
> pci_store_saved_state() returns NULL on error, which is correctly
> handled if we pass NULL to pci_load_saved_state() or a pointer to NULL
> to pci_load_and_free_saved_state().

But we silently swallow an error, this isn't good.

> This is also why I changed the
> __pci_reset_function() back to a normal pci_reset_function(), so we're
> never left with an uninitialized device like we are now. We could be
> more verbose or return an error here, but we've gone for a long time not
> even doing this save/restore across VM usage, so I don't think it's
> worthy of preventing the device attachment if it fails.

At least a log? Note avoiding the pointer would have removed the problem
altogether.

--
error compiling committee.c: too many arguments to function