[PATCH] kvm, ioapic: Fix an error field reference

2011-04-20 Thread Liu Yuan
From: Liu Yuan 

The ioapic_debug() call in ioapic_deliver() refers to one field by the
wrong name. This patch corrects it.

Signed-off-by: Liu Yuan 
---
 virt/kvm/ioapic.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 0b9df83..8df1ca1 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -167,7 +167,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 
ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
 "vector=%x trig_mode=%x\n",
-entry->fields.dest, entry->fields.dest_mode,
+entry->fields.dest_id, entry->fields.dest_mode,
 entry->fields.delivery_mode, entry->fields.vector,
 entry->fields.trig_mode);
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [qemu-iotests][PATCH] Update rbd support

2011-04-20 Thread Christoph Hellwig
On Tue, Apr 12, 2011 at 10:42:00PM -0700, Josh Durgin wrote:
> > I suspect we only support the weird writing past size for the
> > file protocol, so we should only run the test for it.
> > 
> > Or does sheepdog do anything special about it?
> 
> Sheepdog supports it by truncating to the right size if a write
> would be past the end. I'm not sure if other protocols support
> it.

I've changed 016 to require the file or sheepdog protocols, and then
applied the rest of your patch.  Thanks a lot!


[Autotest PATCH] KVM-test: Check if guest bootable after resetting several times

2011-04-20 Thread Amos Kong
This test comes from a regression bug:
the guest cannot find a bootable device after being reset several times
with the system_reset monitor command.

Signed-off-by: Amos Kong 
---
 client/tests/kvm/tests/system_reset_bootable.py |   29 +++
 client/tests/kvm/tests_base.cfg.sample  |7 ++
 2 files changed, 36 insertions(+), 0 deletions(-)
 create mode 100755 client/tests/kvm/tests/system_reset_bootable.py

diff --git a/client/tests/kvm/tests/system_reset_bootable.py b/client/tests/kvm/tests/system_reset_bootable.py
new file mode 100755
index 000..ca9fb70
--- /dev/null
+++ b/client/tests/kvm/tests/system_reset_bootable.py
@@ -0,0 +1,29 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_test_utils
+
+
+def run_system_reset_bootable(test, params, env):
+    """
+    KVM reset test:
+    1) Boot guest.
+    2) Send the system_reset monitor command several times.
+    3) Log into the guest to verify that it boots normally.
+
+    @param test: kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    vm = env.get_vm(params["main_vm"])
+    vm.verify_alive()
+    timeout = float(params.get("login_timeout", 240))
+    reset_times = int(params.get("reset_times", 20))
+    interval = int(params.get("reset_interval", 10))
+    wait_time = int(params.get("wait_time_for_reset", 60))
+    time.sleep(wait_time)
+
+    for i in range(reset_times):
+        vm.monitor.cmd("system_reset")
+        time.sleep(interval)
+
+    session = vm.wait_for_login(timeout=timeout)
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 7333ed0..ceafebe 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -961,6 +961,13 @@ variants:
 sleep_before_reset = 20
 kill_vm_on_error = yes
 
+- system_reset_bootable:
+type = system_reset_bootable
+interval = 1
+reset_times = 20
+wait_time_for_reset = 120
+kill_vm_on_error = yes
+
 - shutdown: install setup unattended_install.cdrom
 type = shutdown
 shutdown_method = shell
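
The reset loop in system_reset_bootable.py can be tried without a real
guest. Below is a minimal sketch, assuming a hypothetical FakeMonitor
and FakeVM pair that only record commands. These stubs and the
reset_and_login() helper are illustrative names, not part of autotest;
the real test gets its VM from env.get_vm() and logs in through
vm.wait_for_login().

```python
import time


class FakeMonitor:
    """Hypothetical stand-in for the VM monitor; it only records commands."""

    def __init__(self):
        self.commands = []

    def cmd(self, command):
        self.commands.append(command)


class FakeVM:
    """Hypothetical stand-in for the autotest VM object."""

    def __init__(self):
        self.monitor = FakeMonitor()

    def wait_for_login(self, timeout):
        # A real guest would be contacted here; the stub always "boots".
        return "session"


def reset_and_login(vm, params, sleep=time.sleep):
    """Send system_reset several times, then check the guest still boots."""
    timeout = float(params.get("login_timeout", 240))
    reset_times = int(params.get("reset_times", 20))
    interval = int(params.get("reset_interval", 10))
    for _ in range(reset_times):
        vm.monitor.cmd("system_reset")
        sleep(interval)
    return vm.wait_for_login(timeout=timeout)


vm = FakeVM()
# Pass a no-op sleep so the sketch finishes immediately.
session = reset_and_login(vm, {"reset_times": "3"}, sleep=lambda s: None)
```

Note that the test as posted reads the key reset_interval, while the
sample configuration in this patch sets interval, so with that
configuration the default of 10 seconds would apply.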



[Autotest PATCH] KVM-test: Simple stop/continue test

2011-04-20 Thread Amos Kong
Change the guest state with monitor commands, verify the guest status,
and try to log in to the guest over the network.

Signed-off-by: Jason Wang 
Signed-off-by: Amos Kong 
---
 client/tests/kvm/tests/stop_continue.py |   52 +++
 client/tests/kvm/tests_base.cfg.sample  |4 ++
 2 files changed, 56 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/stop_continue.py

diff --git a/client/tests/kvm/tests/stop_continue.py b/client/tests/kvm/tests/stop_continue.py
new file mode 100644
index 000..c7d8025
--- /dev/null
+++ b/client/tests/kvm/tests/stop_continue.py
@@ -0,0 +1,52 @@
+import logging
+from autotest_lib.client.common_lib import error
+
+
+def run_stop_continue(test, params, env):
+    """
+    Suspend a running Virtual Machine and verify its state.
+
+    1) Boot the vm
+    2) Suspend the vm through stop command
+    3) Verify the state through info status command
+    4) Check whether the ssh session to the guest is still responsive;
+       if it is, fail the test.
+
+    @param test: Kvm test object
+    @param params: Dictionary with the test parameters
+    @param env: Dictionary with test environment.
+    """
+    vm = env.get_vm(params["main_vm"])
+    vm.verify_alive()
+    timeout = float(params.get("login_timeout", 240))
+    session = vm.wait_for_login(timeout=timeout)
+
+    try:
+        logging.info("Suspend the virtual machine")
+        vm.monitor.cmd("stop")
+
+        logging.info("Verifying the status of virtual machine through monitor")
+        o = vm.monitor.info("status")
+        if 'paused' not in o and ( "u'running': False" not in str(o)):
+            logging.error(o)
+            raise error.TestFail("Fail to suspend through monitor command line")
+
+        logging.info("Check the session responsiveness")
+        if session.is_responsive():
+            raise error.TestFail("Session is still responsive after stop")
+
+        logging.info("Try to resume the guest")
+        vm.monitor.cmd("cont")
+
+        o = vm.monitor.info("status")
+        m_type = params.get("monitor_type", "human")
+        if ('human' in m_type and 'running' not in o) or\
+           ('qmp' in m_type and "u'running': True" not in str(o)):
+            logging.error(o)
+            raise error.TestFail("Could not continue the execution")
+
+        logging.info("Try to re-log into guest")
+        session = vm.wait_for_login(timeout=timeout)
+
+    finally:
+        session.close()
diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index 5d274f8..7333ed0 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -260,6 +260,10 @@ variants:
 - systemtap:
 test_control_file = systemtap.control
 
+- stop_continue:
+type = stop_continue
+kill_vm_on_error = yes
+
 - linux_s3: install setup unattended_install.cdrom
 only Linux
 type = linux_s3
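
The status checks in stop_continue.py depend on the monitor type: the
human monitor returns text, the QMP monitor returns a dictionary. Below
is a minimal sketch of the same decision, assuming hypothetical helper
names is_paused() and is_running(). It inspects the dictionary directly
instead of matching on str(o) as the patch does, and the sample status
values in the usage lines are assumptions, not captured output.

```python
def is_paused(status):
    """Return True if 'info status' output says the VM is stopped.

    A dict is treated as a QMP reply with a boolean 'running' key;
    anything else is treated as human-monitor text.
    """
    if isinstance(status, dict):
        return status.get("running") is False
    return "paused" in status


def is_running(status):
    """Return True if 'info status' output says the VM is running."""
    if isinstance(status, dict):
        return status.get("running") is True
    return "running" in status


# Assumed sample replies for the two monitor types.
human_reply = "VM status: paused"
qmp_reply = {"running": True}
human_is_paused = is_paused(human_reply)
qmp_is_running = is_running(qmp_reply)
```

Checking the dictionary avoids relying on the textual form of the
reply, which str() renders differently across Python versions.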



[PATCH V3 7/8] Enable ixgbe to support zerocopy

2011-04-20 Thread Shirley Ma
Signed-off-by: Shirley Ma 
---

 drivers/net/ixgbe/ixgbe_main.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 6f8adc7..68f1e93 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -7395,6 +7395,7 @@ static int __devinit ixgbe_probe(struct pci_dev *pdev,
 #endif /* IXGBE_FCOE */
if (pci_using_dac) {
netdev->features |= NETIF_F_HIGHDMA;
+   netdev->features |= NETIF_F_ZEROCOPY;
netdev->vlan_features |= NETIF_F_HIGHDMA;
}
 




How to use qemu-kvm with Fedora15-beta gnome3 (better vga driver ?)

2011-04-20 Thread Cheng Renquan
Hi all,

I'm trying to use qemu-kvm to run Fedora15-beta with gnome3,
but it reports that the graphics hardware cannot run the gnome3-specific
features, and it falls back to gnome2.

I checked the qemu-doc and tried all of these vga drivers, but none of
them works with gnome3. Does anyone know how to run qemu with
better virtual graphics hardware? Thanks,

http://wiki.qemu.org/download/qemu-doc.html

‘-vga type’
Select type of VGA card to emulate. Valid values for type are

‘cirrus’
Cirrus Logic GD5446 Video card. All Windows versions starting from
Windows 95 should recognize and use this graphic card. For optimal
performances, use 16 bit color depth in the guest and the host OS.
(This one is the default)

‘std’
Standard VGA card with Bochs VBE extensions. If your guest OS supports
the VESA 2.0 VBE extensions (e.g. Windows XP) and if you want to use
high resolution modes (>= 1280x1024x16) then you should use this
option.

‘vmware’
VMWare SVGA-II compatible adapter. Use it if you have sufficiently
recent XFree86/XOrg server or Windows guest with a driver for this
card.


--
Cheng Renquan (程任全)


Re: [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)

2011-04-20 Thread Jason Wang
Krishna Kumar2 writes:
 > Thanks Jason!
 > 
 > So I can use my virtio-net guest driver and test with this patch?
 > Please provide the script you use to start MQ guest.
 > 

Yes, and thanks. The following simple script may help you start a macvtap
mq guest.

qemu_path=./qemu-system-x86_64
img_path=/home/kvm_autotest_root/images/mq.qcow2
vtap_dev=/dev/tap104
mac=96:88:12:1C:27:83
smp=2
mq=4

for i in `seq $mq`
do
vtap+=" -netdev tap,id=hn$i,fd=$((i+100)) $((i+100))<>$vtap_dev"
netdev+="hn$i#"
done

eval "$qemu_path $img_path $vtap -device virtio-net-pci,queues=$mq,netdev=$netdev,mac=$mac,vectors=32 -enable-kvm -smp $smp"
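
The loop in the script above builds one -netdev argument per queue and
joins the ids with '#'. Below is a minimal sketch of the same string
construction, assuming a hypothetical helper named build_mq_args(). It
only builds the argument list; the file descriptor redirections that the
shell script sets up (fd 101 and up opened on the tap device) are not
reproduced.

```python
def build_mq_args(queues, mac, first_fd=101):
    """Build the -netdev/-device arguments that the shell loop produces."""
    args = []
    ids = []
    for i in range(1, queues + 1):
        fd = first_fd + i - 1
        args += ["-netdev", "tap,id=hn%d,fd=%d" % (i, fd)]
        ids.append("hn%d" % i)
    # The script appends '#' after every id, so the list ends with '#'.
    netdev = "".join(name + "#" for name in ids)
    args += ["-device",
             "virtio-net-pci,queues=%d,netdev=%s,mac=%s,vectors=32"
             % (queues, netdev, mac)]
    return args


mq_args = build_mq_args(4, "96:88:12:1C:27:83")
```

With mq=4 this yields four tap netdevs named hn1 to hn4 on fds 101 to
104, and a single virtio-net-pci device that references all of them.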


 > Regards,
 > 
 > - KK
 > 
 > Jason Wang  wrote on 04/20/2011 02:03:07 PM:
 > 
 > > Jason Wang 
 > > 04/20/2011 02:03 PM
 > >
 > > To
 > >
 > > Krishna Kumar2/India/IBM@IBMIN, kvm@vger.kernel.org, m...@redhat.com,
 > > net...@vger.kernel.org, ru...@rustcorp.com.au, qemu-
 > > de...@nongnu.org, anth...@codemonkey.ws
 > >
 > > cc
 > >
 > > Subject
 > >
 > > [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
 > >
 > > Inspired by Krishna's patch
 > > (http://www.spinics.net/lists/kvm/msg52098.html) and Michael's
 > > suggestions.  The following series adds multiqueue support for
 > > qemu and enables it for virtio-net (both userspace and vhost).
 > >
 > > The aim of this series is to simplify the management and achieve the
 > > same performance with less code.
 > >
 > > The differences between this series and Krishna's are as follows:
 > >
 > > - Add multiqueue support for qemu and also for userspace virtio-net
 > > - Instead of hacking the vhost module to manipulate kthreads, this
 > > patch just implements userspace-based multiqueue and thus can re-use
 > > the existing vhost kernel-side code without any modification.
 > > - Use a 1:1 mapping between TX/RX pairs and vhost kthreads because the
 > > implementation is based on userspace.
 > > - The cli is also changed to make the mgmt easier: the -netdev option
 > > of qdev can now accept more than one id. You can start a multiqueue
 > > virtio-net device through:
 > > ./qemu-system-x86_64 -netdev tap,id=hn0,vhost=on,fd=X -netdev
 > > tap,id=hn1,vhost=on,fd=Y -device
 > > virtio-net-pci,netdev=hn0#hn1,queues=2 ...
 > >
 > > The series is very primitive and still needs polishing.
 > >
 > > Suggestions are welcome.
 > > ---
 > >
 > > Jason Wang (2):
 > >   net: Add multiqueue support
 > >   virtio-net: add multiqueue support
 > >
 > >
 > >  hw/qdev-properties.c |   37 -
 > >  hw/qdev.h|3
 > >  hw/vhost.c   |   26 ++-
 > >  hw/vhost.h   |1
 > >  hw/vhost_net.c   |7 +
 > >  hw/vhost_net.h   |2
 > >  hw/virtio-net.c  |  409 +++
 > > +--
 > >  hw/virtio-net.h  |2
 > >  hw/virtio-pci.c  |1
 > >  hw/virtio.h  |1
 > >  net.c|   34 +++-
 > >  net.h|   15 +-
 > >  12 files changed, 353 insertions(+), 185 deletions(-)
 > >
 > > --
 > > Jason Wang
 > 


Re: [RFC PATCH 0/3 V8] QAPI: add inject-nmi qmp command

2011-04-20 Thread Lai Jiangshan

Hi, Anthony Liguori

Any suggestion?

Although all command line interfaces will be converted to use QMP interfaces
in 0.16, I hope inject-nmi can come into QAPI earlier, in 0.15.

Thanks,
Lai


Re: performance of virtual functions compared to virtio

2011-04-20 Thread Alex Williamson
On Wed, 2011-04-20 at 19:57 -0600, David Ahern wrote:
> In general should virtual functions outperform virtio+vhost for
> networking performance - latency and throughput?
> 
> I have 2 VMs running on a host. Each VM has 2 nics -- one tied to a VF
> and the other going through virtio and a tap device like so:
> 
>    ------                 ----
>   |      |---------------| VF |---
>   |      |                ----    |
>   | VM 1 |                        |
>   |      |    -----               |
>   |      |---| tap |---           |
>    ------     -----    |         ---
>                       ---       | e |
>                      | b |      | t |
>                      | r |      | h |
>                       ---       | 2 |
>    ------     -----    |         ---
>   |      |---| tap |---           |
>   |      |    -----               |
>   | VM 2 |                        |
>   |      |                ----    |
>   |      |---------------| VF |---
>    ------                 ----
> 
> The network arguments to qemu-kvm are:
> -netdev type=tap,vhost=on,ifname=tap2,id=netdev1
> -device virtio-net-pci,mac=${mac},netdev=netdev1
> 
> where ${mac} is unique to each VM and for the VF:
> -device pci-assign,host=${pciid}
> 
> netserver is running within the VMs, and the netperf commands I am
> running are:
> 
>   netperf -p 12346 -H  -l 20 -jcC -fM -v 2 -t TCP_RR -- -r 1024
>   netperf -p 12346 -H  -l 20 -jcC -fM -v 2 -t TCP_STREAM
> 
> where  changes depending on which interface I want to send the
> traffic through. To say the least results are a bit disappointing for
> the VF:
> 
>                    latency       throughput
>                    (usec/Tran)   (MB/sec)
> Host-VM
>  over virtio       139.160       1199.40
>  over VF           488.124        209.22
> 
> VM-VM
>  over virtio       322.056        773.54
>  over VF           488.051        328.88
> 
> I am just getting started with VFs and could use some hints on how to
> improve the performance.

Device assignment via a VF provides the lowest latency and most
bandwidth for *getting data off the host system*, though virtio/vhost is
getting better.  If all you care about is VM-VM on the same host or
VM-host, then virtio is only limited by memory bandwidth/latency and
host processor cycles.  Your processor has 25GB/s of memory bandwidth.
On the other hand, the VF has to send data all the way out to the wire
and all the way back up through the NIC to get to the other VM/host.
You're using a 1Gb/s NIC.  Your results actually seem to indicate you're
getting better than wire rate, so maybe you're only passing through an
internal switch on the NIC.  In any case, VFs are not optimal for
communication within the same physical system.  They are optimal for
off-host communication.  Thanks,

Alex
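
The "better than wire rate" remark can be checked with a quick unit
conversion. Below is a minimal sketch, assuming that netperf's -fM
output unit is 2**20 bytes per second and that wire rate means 10**9
bits per second; mbytes_to_gbit() is a hypothetical helper name. With
10**6 bytes per MB the figures are slightly lower but still above
1 Gb/s.

```python
WIRE_RATE_GBIT = 1.0  # the 1 Gb/s NIC mentioned in the reply


def mbytes_to_gbit(mb_per_sec, bytes_per_mb=2**20):
    """Convert MB/s to Gb/s, where 1 Gb/s is 10**9 bits per second."""
    return mb_per_sec * bytes_per_mb * 8 / 1e9


# VF throughput figures from the results table, in MB/sec.
vf_results = {"Host-VM over VF": 209.22, "VM-VM over VF": 328.88}
vf_gbit = {name: mbytes_to_gbit(mb) for name, mb in vf_results.items()}

for name, gbit in sorted(vf_gbit.items()):
    relation = "above" if gbit > WIRE_RATE_GBIT else "at or below"
    print("%s: %.2f Gb/s (%s wire rate)" % (name, gbit, relation))
```

Both VF figures come out above 1 Gb/s under either definition of MB,
which is consistent with the traffic being switched inside the NIC
rather than leaving on the wire.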



performance of virtual functions compared to virtio

2011-04-20 Thread David Ahern
In general should virtual functions outperform virtio+vhost for
networking performance - latency and throughput?

I have 2 VMs running on a host. Each VM has 2 nics -- one tied to a VF
and the other going through virtio and a tap device like so:

   ------                 ----
  |      |---------------| VF |---
  |      |                ----    |
  | VM 1 |                        |
  |      |    -----               |
  |      |---| tap |---           |
   ------     -----    |         ---
                      ---       | e |
                     | b |      | t |
                     | r |      | h |
                      ---       | 2 |
   ------     -----    |         ---
  |      |---| tap |---           |
  |      |    -----               |
  | VM 2 |                        |
  |      |                ----    |
  |      |---------------| VF |---
   ------                 ----

The network arguments to qemu-kvm are:
-netdev type=tap,vhost=on,ifname=tap2,id=netdev1
-device virtio-net-pci,mac=${mac},netdev=netdev1

where ${mac} is unique to each VM and for the VF:
-device pci-assign,host=${pciid}

netserver is running within the VMs, and the netperf commands I am
running are:

  netperf -p 12346 -H  -l 20 -jcC -fM -v 2 -t TCP_RR -- -r 1024
  netperf -p 12346 -H  -l 20 -jcC -fM -v 2 -t TCP_STREAM

where  changes depending on which interface I want to send the
traffic through. To say the least results are a bit disappointing for
the VF:

                   latency       throughput
                   (usec/Tran)   (MB/sec)
Host-VM
 over virtio       139.160       1199.40
 over VF           488.124        209.22

VM-VM
 over virtio       322.056        773.54
 over VF           488.051        328.88

I am just getting started with VFs and could use some hints on how to
improve the performance.

Host:
  Dell R410
  2 quad core E5620@2.40 GHz processors
  16 GB RAM
  Intel 82576 NIC (Gigabit ET Quad Port)
  Fedora 14
  kernel: 2.6.35.12-88.fc14.x86_64
  qemu-kvm-0.13.0-1.fc14.x86_64

VMs:
  Fedora 14
  kernel 2.6.35.11-83.fc14.x86_64
  2 vcpus
  1GB RAM

Thanks,

David


[PATCH V3 3/8] Add userspace buffers support in skb

2011-04-20 Thread Shirley Ma
This patch adds userspace buffer support to skb. A new struct,
skb_ubuf_info, is needed to maintain the userspace buffer argument
and index. A callback is used to notify userspace to release the
buffers once the lower device has finished DMA (that is, once the last
reference to that skb has gone).

Signed-off-by: Shirley Ma 
---

 include/linux/skbuff.h |   14 ++
 net/core/skbuff.c  |   15 ++-
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d0ae90a..47a187b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -189,6 +189,16 @@ enum {
SKBTX_DRV_NEEDS_SK_REF = 1 << 3,
 };
 
+/* The callback notifies userspace to release buffers when skb DMA is done in
+ * lower device, the desc is used to track userspace buffer index.
+ */
+struct skb_ubuf_info {
+   /* support buffers allocation from userspace */
+   void(*callback)(struct sk_buff *);
+   void*arg;
+   size_t  desc;
+};
+
 /* This data is invariant across clones and lives at
  * the end of the header data, ie. at skb->end.
  */
@@ -211,6 +221,10 @@ struct skb_shared_info {
/* Intermediate layers must ensure that destructor_arg
 * remains valid until skb destructor */
void *  destructor_arg;
+
+   /* DMA mapping from/to userspace buffers */
+   struct skb_ubuf_info ubuf;
+
/* must be last field, see pskb_expand_head() */
skb_frag_t  frags[MAX_SKB_FRAGS];
 };
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7ebeed0..822c07d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -210,6 +210,8 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
shinfo = skb_shinfo(skb);
memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
atomic_set(&shinfo->dataref, 1);
+   shinfo->ubuf.callback = NULL;
+   shinfo->ubuf.arg = NULL;
kmemcheck_annotate_variable(shinfo->destructor_arg);
 
if (fclone) {
@@ -327,7 +329,15 @@ static void skb_release_data(struct sk_buff *skb)
for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
put_page(skb_shinfo(skb)->frags[i].page);
}
-
+   /*
+* if skb buf is from userspace, we need to notify the caller
+* the lower device DMA has done;
+*/
+   if (skb_shinfo(skb)->ubuf.callback) {
+   skb_shinfo(skb)->ubuf.callback(skb);
+   skb_shinfo(skb)->ubuf.callback = NULL;
+   skb_shinfo(skb)->ubuf.arg = NULL;
+   }
if (skb_has_frag_list(skb))
skb_drop_fraglist(skb);
 
@@ -480,6 +490,9 @@ bool skb_recycle_check(struct sk_buff *skb, int skb_size)
if (irqs_disabled())
return false;
 
+   if (shinfo->ubuf.callback)
+   return false;
+
if (skb_is_nonlinear(skb) || skb->fclone != SKB_FCLONE_UNAVAILABLE)
return false;
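
The callback handling added to skb_release_data() and
skb_recycle_check() can be modelled outside the kernel. Below is a toy
sketch, assuming a hypothetical SharedInfo class in place of
skb_shared_info; it is not kernel code, and the callback here receives
the SharedInfo object rather than an skb. It shows only the intended
behaviour: the callback fires once on release and is then cleared, and
a buffer with a pending callback is not recycled.

```python
class SharedInfo:
    """Toy model of skb_shared_info with the new ubuf fields."""

    def __init__(self, callback=None, arg=None):
        self.callback = callback
        self.arg = arg


def release_data(shinfo):
    """Model of the hunk added to skb_release_data()."""
    if shinfo.callback is not None:
        shinfo.callback(shinfo)
        # Clear both fields so a second release does not notify again.
        shinfo.callback = None
        shinfo.arg = None


def recycle_check(shinfo):
    """Model of the hunk added to skb_recycle_check()."""
    return shinfo.callback is None


released = []
info = SharedInfo(callback=lambda s: released.append(s.arg), arg="buf0")
recyclable_before = recycle_check(info)
release_data(info)
release_data(info)  # second call is a no-op
recyclable_after = recycle_check(info)
```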
 




Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte

2011-04-20 Thread Takuya Yoshikawa
On Wed, 20 Apr 2011 14:18:12 +0300
Avi Kivity  wrote:

> Correct.  The reason I don't want the helper, is so we can use ptep_user 
> in both places (not for efficiency, just to make sure it's exactly the 
> same value).
> 

Thank you for your explanation, now I've got the picture!

I will send a new patch taking into account your advice.

  Takuya


> > The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a
> > fix soon.
> 


Re: [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too

2011-04-20 Thread Takuya Yoshikawa
On Wed, 20 Apr 2011 15:33:16 +0200
"Roedel, Joerg"  wrote:

> @@ -245,13 +257,17 @@ walk:
>   goto error;
>  
>   if (write_fault && !is_dirty_gpte(pte)) {
> - bool ret;
> + int ret;
>  
>   trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
> - ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
> + ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn, index, pte,
>   pte|PT_DIRTY_MASK);
> - if (ret)
> + if (ret < 0) {
> + present = false;
> + goto error;
> + } if (ret)
>   goto walk;

Preferably else if or another line ? :)

Takuya


> +
>   mark_page_dirty(vcpu->kvm, table_gfn);
>   pte |= PT_DIRTY_MASK;
>   walker->ptes[walker->level - 1] = pte;
> -- 
> 1.7.1
> 



Re: [PATCH V3 5/8] Enable cxgb3 to support zerocopy

2011-04-20 Thread Shirley Ma
On Wed, 2011-04-20 at 13:58 -0700, Shirley Ma wrote:
> This flag can be ON when the device supports both HIGHDMA and
> scatter/gather. I will modify the patch to set it conditionally.

Double checked, it only needs HIGHDMA condition, not scatter/gather.

thanks
Shirley



Re: [PATCH V3 5/8] Enable cxgb3 to support zerocopy

2011-04-20 Thread Shirley Ma
On Wed, 2011-04-20 at 13:52 -0700, Dimitris Michailidis wrote:
> The features handling has been reworked in net-next and patches like
> this won't apply as the code you're patching has changed.  Also core
> code now does a lot of the related work and you'll need to tell it what
> to do with any new flags.
Ok, will do.

> What properties does a device or driver need to meet to set this flag?
> It seems to be set a bit too unconditionally.  For example, does it
> work if one disables scatter/gather?

This flag can be ON when the device supports both HIGHDMA and
scatter/gather. I will modify the patch to set it conditionally.

thanks
Shirley



Re: [PATCH V3 5/8] Enable cxgb3 to support zerocopy

2011-04-20 Thread Dimitris Michailidis

On 04/20/2011 01:13 PM, Shirley Ma wrote:

Signed-off-by: Shirley Ma 
---

 drivers/net/cxgb3/cxgb3_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 9108931..93a1101 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -3313,7 +3313,7 @@ static int __devinit init_one(struct pci_dev *pdev,
netdev->features |= NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
netdev->features |= NETIF_F_GRO;
if (pci_using_dac)
-   netdev->features |= NETIF_F_HIGHDMA;
+   netdev->features |= NETIF_F_HIGHDMA | NETIF_F_ZEROCOPY;
 
 		netdev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;

netdev->netdev_ops = &cxgb_netdev_ops;


The features handling has been reworked in net-next and patches like this 
won't apply as the code you're patching has changed.  Also core code now 
does a lot of the related work and you'll need to tell it what to do with 
any new flags.


What properties does a device or driver need to meet to set this flag?  It 
seems to be set a bit too unconditionally.  For example, does it work if one 
disables scatter/gather?



[PATCH v3 3/3] KVM: Use pci_store/load_saved_state() around VM device usage

2011-04-20 Thread Alex Williamson
Store the device saved state so that we can reload the device back
to the original state when it's unassigned.  This has the benefit
that the state survives across pci_reset_function() calls via
the PCI sysfs reset interface while the VM is using the device.

Signed-off-by: Alex Williamson 
---

 include/linux/kvm_host.h |1 +
 virt/kvm/assigned-dev.c  |   18 ++
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab42855..9272db0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -513,6 +513,7 @@ struct kvm_assigned_dev_kernel {
struct kvm *kvm;
spinlock_t intx_lock;
char irq_name[32];
+   struct pci_saved_state *pci_saved_state;
 };
 
 struct kvm_irq_mask_notifier {
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index ae72ae6..6cc4b97 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -197,8 +197,13 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 {
kvm_free_assigned_irq(kvm, assigned_dev);
 
-   __pci_reset_function(assigned_dev->dev);
-   pci_restore_state(assigned_dev->dev);
+   pci_reset_function(assigned_dev->dev);
+   if (pci_load_and_free_saved_state(assigned_dev->dev,
+ &assigned_dev->pci_saved_state))
+   printk(KERN_INFO "%s: Couldn't reload %s saved state\n",
+  __func__, dev_name(&assigned_dev->dev->dev));
+   else
+   pci_restore_state(assigned_dev->dev);
 
pci_release_regions(assigned_dev->dev);
pci_disable_device(assigned_dev->dev);
@@ -516,7 +521,10 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 
pci_reset_function(dev);
pci_save_state(dev);
-
+   match->pci_saved_state = pci_store_saved_state(dev);
+   if (!match->pci_saved_state)
+   printk(KERN_DEBUG "%s: Couldn't store %s saved state\n",
+  __func__, dev_name(&dev->dev));
match->assigned_dev_id = assigned_dev->assigned_dev_id;
match->host_segnr = assigned_dev->segnr;
match->host_busnr = assigned_dev->busnr;
@@ -546,7 +554,9 @@ out:
mutex_unlock(&kvm->lock);
return r;
 out_list_del:
-   pci_restore_state(dev);
+   if (pci_load_and_free_saved_state(dev, &match->pci_saved_state))
+   printk(KERN_INFO "%s: Couldn't reload %s saved state\n",
+  __func__, dev_name(&dev->dev));
list_del(&match->list);
pci_release_regions(dev);
 out_disable:



[PATCH v3 2/3] PCI: Add interfaces to store and load the device saved state

2011-04-20 Thread Alex Williamson
For KVM device assignment, we'd like to save off the state of a device
prior to passing it to the guest and restore it later.  We also want
to allow pci_reset_function() to be called while the device is owned
by the guest.  This however overwrites and invalidates the struct pci_dev
buffers, so we can't just manually call save and restore.  Add generic
interfaces for the saved state to be stored and reloaded back into
struct pci_dev at a later time.

Signed-off-by: Alex Williamson 
---

 drivers/pci/pci.c   |   98 +++
 include/linux/pci.h |4 ++
 2 files changed, 102 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d2500a0..7631acf 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -976,6 +976,104 @@ void pci_restore_state(struct pci_dev *dev)
dev->state_saved = false;
 }
 
+struct pci_saved_state {
+   u32 config_space[16];
+   struct pci_cap_saved cap_saved[0];
+};
+
+/**
+ * pci_store_saved_state - Allocate and return an opaque struct containing
+ *the device saved state.
+ * @dev: PCI device that we're dealing with
+ *
+ * Return NULL if no state or error.
+ */
+struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev)
+{
+   struct pci_saved_state *state;
+   struct pci_cap_saved_state *tmp;
+   struct pci_cap_saved *cap_saved;
+   struct hlist_node *pos;
+   size_t size;
+
+   if (!dev->state_saved)
+   return NULL;
+
+   size = sizeof(*state) + sizeof(struct pci_cap_saved);
+
+   hlist_for_each_entry(tmp, pos, &dev->saved_cap_space, next)
+   size += sizeof(struct pci_cap_saved) + tmp->saved.size;
+
+   state = kzalloc(size, GFP_KERNEL);
+   if (!state)
+   return NULL;
+
+   memcpy(state->config_space, dev->saved_config_space,
+  sizeof(state->config_space));
+
+   cap_saved = state->cap_saved;
+   hlist_for_each_entry(tmp, pos, &dev->saved_cap_space, next) {
+   size_t len = sizeof(struct pci_cap_saved) + tmp->saved.size;
+   memcpy(cap_saved, &tmp->saved, len);
+   cap_saved = (struct pci_cap_saved *)((u8 *)cap_saved + len);
+   }
+   /* Empty cap_save terminates list */
+
+   return state;
+}
+EXPORT_SYMBOL_GPL(pci_store_saved_state);
+
+/**
+ * pci_load_saved_state - Reload the provided save state into struct pci_dev.
+ * @dev: PCI device that we're dealing with
+ * @state: Saved state returned from pci_store_saved_state()
+ */
+int pci_load_saved_state(struct pci_dev *dev, struct pci_saved_state *state)
+{
+   struct pci_cap_saved *cap_saved;
+
+   dev->state_saved = false;
+
+   if (!state)
+   return 0;
+
+   memcpy(dev->saved_config_space, state->config_space,
+  sizeof(state->config_space));
+
+   cap_saved = state->cap_saved;
+   while (cap_saved->size) {
+   struct pci_cap_saved_state *tmp;
+
+   tmp = pci_find_saved_cap(dev, cap_saved->cap_nr);
+   if (!tmp || tmp->saved.size != cap_saved->size)
+   return -EINVAL;
+
+   memcpy(tmp->saved.data, cap_saved->data, tmp->saved.size);
+   cap_saved = (struct pci_cap_saved *)((u8 *)cap_saved +
+   sizeof(struct pci_cap_saved) + cap_saved->size);
+   }
+
+   dev->state_saved = true;
+   return 0;
+}
+EXPORT_SYMBOL_GPL(pci_load_saved_state);
+
+/**
+ * pci_load_and_free_saved_state - Reload the save state pointed to by state,
+ *and free the memory allocated for it.
+ * @dev: PCI device that we're dealing with
+ * @state: Pointer to saved state returned from pci_store_saved_state()
+ */
+int pci_load_and_free_saved_state(struct pci_dev *dev,
+ struct pci_saved_state **state)
+{
+   int ret = pci_load_saved_state(dev, *state);
+   kfree(*state);
+   *state = NULL;
+   return ret;
+}
+EXPORT_SYMBOL_GPL(pci_load_and_free_saved_state);
+
 static int do_pci_enable_device(struct pci_dev *dev, int bars)
 {
int err;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 46fd382..f2a6262 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -812,6 +812,10 @@ size_t pci_get_rom_size(struct pci_dev *pdev, void __iomem *rom, size_t size);
 /* Power management related routines */
 int pci_save_state(struct pci_dev *dev);
 void pci_restore_state(struct pci_dev *dev);
+struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev);
+int pci_load_saved_state(struct pci_dev *dev, struct pci_saved_state *state);
+int pci_load_and_free_saved_state(struct pci_dev *dev,
+ struct pci_saved_state **state);
 int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state);
 int pci_set_power_state(struct pci_dev *dev, pci_power_t state);
 pci_power_t pci_choose_state(struc
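
The store/load pair in this patch flattens a list of variable-length
capability records into one buffer and marks the end with an empty
record. Below is a minimal sketch of that layout, assuming hypothetical
store() and load() helpers and a packed little-endian header (one byte
for cap_nr, four bytes for size). The real struct pci_cap_saved layout
and padding differ; this only illustrates the zero-size terminator.

```python
import struct

# cap_nr (1 byte) and size (4 bytes); the packed layout is an assumption.
HEADER = struct.Struct("<BI")


def store(caps):
    """Flatten [(cap_nr, data_bytes), ...] and append an empty terminator."""
    blob = b"".join(HEADER.pack(nr, len(data)) + data for nr, data in caps)
    return blob + HEADER.pack(0, 0)


def load(blob):
    """Walk the records until the zero-size terminator is reached."""
    caps, offset = [], 0
    while True:
        nr, size = HEADER.unpack_from(blob, offset)
        if size == 0:
            return caps
        offset += HEADER.size
        caps.append((nr, blob[offset:offset + size]))
        offset += size


saved = [(0x10, b"\x01\x02\x03\x04"), (0x07, b"\xaa\xbb")]
restored = load(store(saved))
```

As in the patch, a record whose size is zero ends the walk, so the
scheme relies on every stored capability having a non-empty data area.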

[PATCH v3 1/3] PCI: Track the size of each saved capability data area

2011-04-20 Thread Alex Williamson
This will allow us to store and load it later.

Signed-off-by: Alex Williamson 
---

 drivers/pci/pci.c   |   12 +++-
 include/linux/pci.h |   11 ---
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2472e71..d2500a0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -830,7 +830,7 @@ static int pci_save_pcie_state(struct pci_dev *dev)
dev_err(&dev->dev, "buffer not found in %s\n", __func__);
return -ENOMEM;
}
-   cap = (u16 *)&save_state->data[0];
+   cap = (u16 *)&save_state->saved.data[0];
 
pci_read_config_word(dev, pos + PCI_EXP_FLAGS, &flags);
 
@@ -863,7 +863,7 @@ static void pci_restore_pcie_state(struct pci_dev *dev)
pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
if (!save_state || pos <= 0)
return;
-   cap = (u16 *)&save_state->data[0];
+   cap = (u16 *)&save_state->saved.data[0];
 
pci_read_config_word(dev, pos + PCI_EXP_FLAGS, &flags);
 
@@ -899,7 +899,8 @@ static int pci_save_pcix_state(struct pci_dev *dev)
return -ENOMEM;
}
 
-   pci_read_config_word(dev, pos + PCI_X_CMD, (u16 *)save_state->data);
+   pci_read_config_word(dev, pos + PCI_X_CMD,
+(u16 *)save_state->saved.data);
 
return 0;
 }
@@ -914,7 +915,7 @@ static void pci_restore_pcix_state(struct pci_dev *dev)
pos = pci_find_capability(dev, PCI_CAP_ID_PCIX);
if (!save_state || pos <= 0)
return;
-   cap = (u16 *)&save_state->data[0];
+   cap = (u16 *)&save_state->saved.data[0];
 
pci_write_config_word(dev, pos + PCI_X_CMD, cap[i++]);
 }
@@ -1771,7 +1772,8 @@ static int pci_add_cap_save_buffer(
if (!save_state)
return -ENOMEM;
 
-   save_state->cap_nr = cap;
+   save_state->saved.cap_nr = cap;
+   save_state->saved.size = size;
pci_add_saved_cap(dev, save_state);
 
return 0;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 96f70d7..46fd382 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -214,12 +214,17 @@ enum pci_bus_speed {
PCI_SPEED_UNKNOWN   = 0xff,
 };
 
-struct pci_cap_saved_state {
-   struct hlist_node next;
+struct pci_cap_saved {
char cap_nr;
+   unsigned int size;
u32 data[0];
 };
 
+struct pci_cap_saved_state {
+   struct hlist_node next;
+   struct pci_cap_saved saved;
+};
+
 struct pcie_link_state;
 struct pci_vpd;
 struct pci_sriov;
@@ -366,7 +371,7 @@ static inline struct pci_cap_saved_state 
*pci_find_saved_cap(
struct hlist_node *pos;
 
hlist_for_each_entry(tmp, pos, &pci_dev->saved_cap_space, next) {
-   if (tmp->cap_nr == cap)
+   if (tmp->saved.cap_nr == cap)
return tmp;
}
return NULL;



[PATCH v3 0/3] Store and load PCI device saved state across function resets

2011-04-20 Thread Alex Williamson
v2 -> v3:
  Saved structure has variable contents.

Avi, see if this adds any credibility to the pci-core allocated
opaque buffer.  It was wrong in the previous versions to distill
the variable device capability save list into a fixed struct.
This should also eliminate any future maintenance specific to
this storing and loading of state as capability save changes.

v1 -> v2:
  Make the pointer passed around less opaque for type safety.

Bug https://bugs.launchpad.net/qemu/+bug/754591 is caused because
the KVM module attempts to do a pci_save_state() before assigning
the device to a VM, expecting that the saved state will remain
valid until we release the device.  This is in conflict with our
need to reset devices using PCI sysfs during a VM reset to
quiesce the device.  Any calls to pci_reset_function() will
overwrite the device's saved state prior to reset, and reload and
invalidate the state after.  KVM then ends up trying to restore
the state, but it's already invalid, so the device ends up with
reset values.

This series adds a mechanism to pull the saved state off the
struct pci_dev and reload it later.  Thanks,

Alex
---

Alex Williamson (3):
  KVM: Use pci_store/load_saved_state() around VM device usage
  PCI: Add interfaces to store and load the device saved state
  PCI: Track the size of each saved capability data area


 drivers/pci/pci.c|  110 --
 include/linux/kvm_host.h |1 
 include/linux/pci.h  |   15 +-
 virt/kvm/assigned-dev.c  |   18 ++--
 4 files changed, 132 insertions(+), 12 deletions(-)
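
The conflict described above can be modelled in userspace with a toy device
that has a single save slot. This is only a sketch: every toy_* name is
hypothetical, the "device" is one integer, and the real interfaces
(pci_store_saved_state() and friends) operate on struct pci_dev and an
opaque struct pci_saved_state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy device: one live config value and a single save slot. */
struct toy_dev {
    int config;
    int saved;
    bool state_saved;
};

static void toy_save_state(struct toy_dev *d)
{
    d->saved = d->config;
    d->state_saved = true;
}

/* Restoring consumes the slot, so a second restore is a no-op. */
static void toy_restore_state(struct toy_dev *d)
{
    if (!d->state_saved)
        return;
    d->config = d->saved;
    d->state_saved = false;
}

/* A function reset reuses the same slot, clobbering what was there. */
static void toy_reset_function(struct toy_dev *d)
{
    toy_save_state(d);
    d->config = 0;          /* hardware reset values */
    toy_restore_state(d);
}

/* Detach a private copy of the slot; the caller owns the memory. */
static int *toy_store_saved_state(const struct toy_dev *d)
{
    int *copy;

    if (!d->state_saved)
        return NULL;
    copy = malloc(sizeof(*copy));
    if (copy)
        *copy = d->saved;
    return copy;
}

/* Put the private copy back into the slot and free it. */
static int toy_load_and_free_saved_state(struct toy_dev *d, int **state)
{
    if (!*state)
        return -1;
    d->saved = **state;
    d->state_saved = true;
    free(*state);
    *state = NULL;
    return 0;
}
```

In this model, a caller that keeps its own copy from toy_store_saved_state()
is unaffected by any number of toy_reset_function() calls in between, which
is the behaviour the series aims for.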


Re: [PATCH V3 2/8] Add a new zerocopy device flag

2011-04-20 Thread Shirley Ma
Resubmitting it with bit 31.

Signed-off-by: Shirley Ma 
---

 include/linux/netdevice.h |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0249fe7..0998d3d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1067,6 +1067,9 @@ struct net_device {
 #define NETIF_F_RXHASH (1 << 28) /* Receive hashing offload */
 #define NETIF_F_RXCSUM (1 << 29) /* Receive checksumming offload */
 
+/* Bit 31 is for device to map userspace buffers -- zerocopy */
+#define NETIF_F_ZEROCOPY   (1 << 31)
+
/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT  16
 #define NETIF_F_GSO_MASK   0x00ff
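
One detail worth checking with bit 31: assuming a 32-bit int, the
expression (1 << 31) shifts into the sign bit, which standard C leaves
undefined, so an unsigned constant is the safer spelling. A small sketch
with hypothetical DEMO_ names, assuming a 32-bit feature word:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag; UINT32_C(1) keeps the shift in unsigned arithmetic. */
#define DEMO_F_ZEROCOPY (UINT32_C(1) << 31)

/* Returns nonzero when the (hypothetical) zerocopy bit is set. */
static int demo_has_zerocopy(uint32_t features)
{
    return (features & DEMO_F_ZEROCOPY) != 0;
}
```

With the unsigned form, the mask compares equal to 0x80000000 without
relying on how a negative int converts.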




Re: [PATCH V3 2/8] Add a new zerocopy device flag

2011-04-20 Thread Shirley Ma
On Wed, 2011-04-20 at 13:24 -0700, Dimitris Michailidis wrote:
> Bit 30 is also taken in net-next.

How about 31?

Thanks
Shirley



[PATCH V3 6/8] macvtap/vhost TX zero copy support

2011-04-20 Thread Shirley Ma
macvtap enables zero-copy only when the buffer size is greater than
GOODCOPY_LEN (128).

Signed-off-by: Shirley Ma 
---

 drivers/net/macvtap.c |  124 -
 1 files changed, 112 insertions(+), 12 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 6696e56..b4e6656 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -60,6 +60,7 @@ static struct proto macvtap_proto = {
  */
 static dev_t macvtap_major;
 #define MACVTAP_NUM_DEVS 65536
+#define GOODCOPY_LEN  (L1_CACHE_BYTES < 128 ? 128 : L1_CACHE_BYTES)
 static struct class *macvtap_class;
 static struct cdev macvtap_cdev;
 
@@ -340,6 +341,7 @@ static int macvtap_open(struct inode *inode, struct
file *file)
 {
struct net *net = current->nsproxy->net_ns;
struct net_device *dev = dev_get_by_index(net, iminor(inode));
+   struct macvlan_dev *vlan = netdev_priv(dev);
struct macvtap_queue *q;
int err;
 
@@ -369,6 +371,16 @@ static int macvtap_open(struct inode *inode, struct
file *file)
q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
 
+   /*
+* so far only VM uses macvtap, enable zero copy between guest
+* kernel and host kernel when lower device supports high memory
+* DMA
+*/
+   if (vlan) {
+   if (vlan->lowerdev->features & NETIF_F_ZEROCOPY)
+   sock_set_flag(&q->sk, SOCK_ZEROCOPY);
+   }
+
err = macvtap_set_queue(dev, file, q);
if (err)
sock_put(&q->sk);
@@ -433,6 +445,80 @@ static inline struct sk_buff
*macvtap_alloc_skb(struct sock *sk, size_t prepad,
return skb;
 }
 
+/* set skb frags from iovec, this can move to core network code for
reuse */
+static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct
iovec *from,
+ int offset, size_t count)
+{
+   int len = iov_length(from, count) - offset;
+   int copy = skb_headlen(skb);
+   int size, offset1 = 0;
+   int i = 0;
+   skb_frag_t *f;
+
+   /* Skip over from offset */
+   while (offset >= from->iov_len) {
+   offset -= from->iov_len;
+   ++from;
+   --count;
+   }
+
+   /* copy up to skb headlen */
+   while (copy > 0) {
+   size = min_t(unsigned int, copy, from->iov_len - offset);
+   if (copy_from_user(skb->data + offset1, from->iov_base + offset,
+  size))
+   return -EFAULT;
+   if (copy > size) {
+   ++from;
+   --count;
+   }
+   copy -= size;
+   offset1 += size;
+   offset = 0;
+   }
+
+   if (len == offset1)
+   return 0;
+
+   while (count--) {
+   struct page *page[MAX_SKB_FRAGS];
+   int num_pages;
+   unsigned long base;
+
+   len = from->iov_len - offset1;
+   if (!len) {
+   offset1 = 0;
+   ++from;
+   continue;
+   }
+   base = (unsigned long)from->iov_base + offset1;
+   size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+   num_pages = get_user_pages_fast(base, size, 0, &page[i]);
+   if ((num_pages != size) ||
+   (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
+   /* put_page is in skb free */
+   return -EFAULT;
+   skb->data_len += len;
+   skb->len += len;
+   skb->truesize += len;
+   while (len) {
+   f = &skb_shinfo(skb)->frags[i];
+   f->page = page[i];
+   f->page_offset = base & ~PAGE_MASK;
+   f->size = min_t(int, len, PAGE_SIZE - f->page_offset);
+   skb_shinfo(skb)->nr_frags++;
+   /* increase sk_wmem_alloc */
+   atomic_add(f->size, &skb->sk->sk_wmem_alloc);
+   base += f->size;
+   len -= f->size;
+   i++;
+   }
+   offset1 = 0;
+   ++from;
+   }
+   return 0;
+}
+
 /*
  * macvtap_skb_from_vnet_hdr and macvtap_skb_to_vnet_hdr should
  * be shared with the tun/tap driver.
@@ -515,17 +601,19 @@ static int macvtap_skb_to_vnet_hdr(const struct
sk_buff *skb,
 
 
 /* Get packet from user space buffer */
-static ssize_t macvtap_get_user(struct macvtap_queue *q,
-   const struct iovec *iv, size_t count,
-   int noblock)
+static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr
*m,
+   const struct iovec *iv, unsigned long total_len,
+
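
The patch text is cut off here, so the exact test used in
macvtap_get_user() is not visible. Going only by the commit message, the
decision could be sketched as below. DEMO_L1_CACHE_BYTES and use_zerocopy()
are hypothetical, and the 64-byte cache line is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed cache line size for the sketch; the kernel uses L1_CACHE_BYTES. */
#define DEMO_L1_CACHE_BYTES 64
/* Same shape as GOODCOPY_LEN in the patch: never below 128 bytes. */
#define DEMO_GOODCOPY_LEN \
    (DEMO_L1_CACHE_BYTES < 128 ? 128 : DEMO_L1_CACHE_BYTES)

/*
 * Hypothetical helper: map user pages only when the socket was marked
 * zerocopy at open time and the buffer is larger than the threshold;
 * smaller buffers are cheaper to copy.
 */
static bool use_zerocopy(bool sock_zerocopy, size_t len)
{
    return sock_zerocopy && len > DEMO_GOODCOPY_LEN;
}
```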

Re: [PATCH V3 2/8] Add a new zerocopy device flag

2011-04-20 Thread Shirley Ma
Thanks. I need to update it to bit 30.

Shirley



Re: [PATCH V3 2/8] Add a new zerocopy device flag

2011-04-20 Thread Dimitris Michailidis

On 04/20/2011 01:09 PM, Shirley Ma wrote:
> Resubmit this patch with the new bit.

Bit 30 is also taken in net-next.

> Signed-off-by: Shirley Ma
> ---
>
>  include/linux/netdevice.h |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0249fe7..0998d3d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1067,6 +1067,9 @@ struct net_device {
>  #define NETIF_F_RXHASH (1 << 28) /* Receive hashing offload */
>  #define NETIF_F_RXCSUM (1 << 29) /* Receive checksumming offload */
>  
> +/* Bit 30 is for device to map userspace buffers -- zerocopy */
> +#define NETIF_F_ZEROCOPY   (1 << 30)
> +
> /* Segmentation offload features */
>  #define NETIF_F_GSO_SHIFT  16
>  #define NETIF_F_GSO_MASK   0x00ff




Re: [PATCH V3 0/8] macvtap/vhost TX zero copy support

2011-04-20 Thread Shirley Ma
macvtap enables zero-copy only when the buffer size is greater than
GOODCOPY_LEN (128).

Signed-off-by: Shirley MA 
---

 drivers/net/macvtap.c |  124 -
 1 files changed, 112 insertions(+), 12 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 6696e56..b4e6656 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -60,6 +60,7 @@ static struct proto macvtap_proto = {
  */
 static dev_t macvtap_major;
 #define MACVTAP_NUM_DEVS 65536
+#define GOODCOPY_LEN  (L1_CACHE_BYTES < 128 ? 128 : L1_CACHE_BYTES)
 static struct class *macvtap_class;
 static struct cdev macvtap_cdev;
 
@@ -340,6 +341,7 @@ static int macvtap_open(struct inode *inode, struct file 
*file)
 {
struct net *net = current->nsproxy->net_ns;
struct net_device *dev = dev_get_by_index(net, iminor(inode));
+   struct macvlan_dev *vlan = netdev_priv(dev);
struct macvtap_queue *q;
int err;
 
@@ -369,6 +371,16 @@ static int macvtap_open(struct inode *inode, struct file 
*file)
q->flags = IFF_VNET_HDR | IFF_NO_PI | IFF_TAP;
q->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
 
+   /*
+* so far only VM uses macvtap, enable zero copy between guest
+* kernel and host kernel when lower device supports high memory
+* DMA
+*/
+   if (vlan) {
+   if (vlan->lowerdev->features & NETIF_F_ZEROCOPY)
+   sock_set_flag(&q->sk, SOCK_ZEROCOPY);
+   }
+
err = macvtap_set_queue(dev, file, q);
if (err)
sock_put(&q->sk);
@@ -433,6 +445,80 @@ static inline struct sk_buff *macvtap_alloc_skb(struct 
sock *sk, size_t prepad,
return skb;
 }
 
+/* set skb frags from iovec, this can move to core network code for reuse */
+static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec 
*from,
+ int offset, size_t count)
+{
+   int len = iov_length(from, count) - offset;
+   int copy = skb_headlen(skb);
+   int size, offset1 = 0;
+   int i = 0;
+   skb_frag_t *f;
+
+   /* Skip over from offset */
+   while (offset >= from->iov_len) {
+   offset -= from->iov_len;
+   ++from;
+   --count;
+   }
+
+   /* copy up to skb headlen */
+   while (copy > 0) {
+   size = min_t(unsigned int, copy, from->iov_len - offset);
+   if (copy_from_user(skb->data + offset1, from->iov_base + offset,
+  size))
+   return -EFAULT;
+   if (copy > size) {
+   ++from;
+   --count;
+   }
+   copy -= size;
+   offset1 += size;
+   offset = 0;
+   }
+
+   if (len == offset1)
+   return 0;
+
+   while (count--) {
+   struct page *page[MAX_SKB_FRAGS];
+   int num_pages;
+   unsigned long base;
+
+   len = from->iov_len - offset1;
+   if (!len) {
+   offset1 = 0;
+   ++from;
+   continue;
+   }
+   base = (unsigned long)from->iov_base + offset1;
+   size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+   num_pages = get_user_pages_fast(base, size, 0, &page[i]);
+   if ((num_pages != size) ||
+   (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags))
+   /* put_page is in skb free */
+   return -EFAULT;
+   skb->data_len += len;
+   skb->len += len;
+   skb->truesize += len;
+   while (len) {
+   f = &skb_shinfo(skb)->frags[i];
+   f->page = page[i];
+   f->page_offset = base & ~PAGE_MASK;
+   f->size = min_t(int, len, PAGE_SIZE - f->page_offset);
+   skb_shinfo(skb)->nr_frags++;
+   /* increase sk_wmem_alloc */
+   atomic_add(f->size, &skb->sk->sk_wmem_alloc);
+   base += f->size;
+   len -= f->size;
+   i++;
+   }
+   offset1 = 0;
+   ++from;
+   }
+   return 0;
+}
+
 /*
  * macvtap_skb_from_vnet_hdr and macvtap_skb_to_vnet_hdr should
  * be shared with the tun/tap driver.
@@ -515,17 +601,19 @@ static int macvtap_skb_to_vnet_hdr(const struct sk_buff 
*skb,
 
 
 /* Get packet from user space buffer */
-static ssize_t macvtap_get_user(struct macvtap_queue *q,
-   const struct iovec *iv, size_t count,
-   int noblock)
+static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
+   const struct iovec *iv, unsigned long total_len,
+   
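
The page arithmetic in zerocopy_sg_from_iovec() above can be tried out in
userspace. The sketch below reproduces two pieces: how many pages a user
buffer spans, and how it is cut into per-page fragments. The DEMO_
constants (4 KiB pages), struct demo_frag and both function names are
hypothetical, and no pages are actually pinned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((uintptr_t)1 << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Same formula as the patch: offset in first page + length, rounded up. */
static size_t pages_spanned(uintptr_t base, size_t len)
{
    return ((base & ~DEMO_PAGE_MASK) + len + ~DEMO_PAGE_MASK)
            >> DEMO_PAGE_SHIFT;
}

/* Hypothetical stand-in for the offset/size part of skb_frag_t. */
struct demo_frag {
    size_t page_offset;
    size_t size;
};

/*
 * Cut [base, base + len) into fragments that never cross a page
 * boundary. Returns the fragment count, or 0 if max is too small.
 */
static size_t split_into_frags(uintptr_t base, size_t len,
                               struct demo_frag *out, size_t max)
{
    size_t n = 0;

    while (len && n < max) {
        size_t off = base & ~DEMO_PAGE_MASK;
        size_t room = DEMO_PAGE_SIZE - off;
        size_t sz = len < room ? len : room;

        out[n].page_offset = off;
        out[n].size = sz;
        base += sz;
        len -= sz;
        n++;
    }
    return len ? 0 : n;
}
```

For example, an 8-byte buffer starting 4 bytes before a page boundary spans
two pages and yields two 4-byte fragments, the second at offset 0.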

[PATCH V3 8/8] Enable benet to support zerocopy

2011-04-20 Thread Shirley Ma
Signed-off-by: Shirley Ma 
---

 drivers/net/benet/be_main.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 7cb5a11..d7b7254 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -2982,6 +2982,7 @@ static int __devinit be_probe(struct pci_dev *pdev,
status = dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
if (!status) {
netdev->features |= NETIF_F_HIGHDMA;
+   netdev->features |= NETIF_F_ZEROCOPY;
} else {
status = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
if (status) {




[PATCH V3 5/8] Enable cxgb3 to support zerocopy

2011-04-20 Thread Shirley Ma
Signed-off-by: Shirley Ma 
---

 drivers/net/cxgb3/cxgb3_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb3/cxgb3_main.c b/drivers/net/cxgb3/cxgb3_main.c
index 9108931..93a1101 100644
--- a/drivers/net/cxgb3/cxgb3_main.c
+++ b/drivers/net/cxgb3/cxgb3_main.c
@@ -3313,7 +3313,7 @@ static int __devinit init_one(struct pci_dev *pdev,
netdev->features |= NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
netdev->features |= NETIF_F_GRO;
if (pci_using_dac)
-   netdev->features |= NETIF_F_HIGHDMA;
+   netdev->features |= NETIF_F_HIGHDMA | NETIF_F_ZEROCOPY;
 
netdev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
netdev->netdev_ops = &cxgb_netdev_ops;




Re: [PATCH V3 2/8] Add a new zerocopy device flag

2011-04-20 Thread Shirley Ma
Resubmit this patch with the new bit.

Signed-off-by: Shirley Ma 
---

 include/linux/netdevice.h |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0249fe7..0998d3d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1067,6 +1067,9 @@ struct net_device {
 #define NETIF_F_RXHASH (1 << 28) /* Receive hashing offload */
 #define NETIF_F_RXCSUM (1 << 29) /* Receive checksumming offload */
 
+/* Bit 30 is for device to map userspace buffers -- zerocopy */
+#define NETIF_F_ZEROCOPY   (1 << 30)
+
/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT  16
 #define NETIF_F_GSO_MASK   0x00ff




[PATCH V3 4/8] vhost TX zero copy support

2011-04-20 Thread Shirley Ma
This patch maintains the outstanding userspace buffers in the
order in which they are delivered to vhost. An outstanding userspace
buffer is marked as done once the lower device has finished DMA on it.
Completion is detected through the callback invoked when the last
reference to the skb is dropped in kfree_skb. Two buffer indices are
used for this purpose.

vhost passes the userspace buffer info to the lower device skb
through message control. Some DMAs may already be done when vhost
enters handle_tx. In the worst case all buffers in the vq are in
pending/done status, so we need to notify the guest to release the
DMA-done buffers before getting any new buffers from the vq.

Signed-off-by: Shirley 
---

 drivers/vhost/net.c   |   30 +++-
 drivers/vhost/vhost.c |   50 -
 drivers/vhost/vhost.h |   10 +
 3 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2f7c76a..1bc4536 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -32,6 +32,8 @@
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x8
 
+#define MAX_ZEROCOPY_PEND 64
+
 enum {
VHOST_NET_VQ_RX = 0,
VHOST_NET_VQ_TX = 1,
@@ -129,6 +131,7 @@ static void handle_tx(struct vhost_net *net)
int err, wmem;
size_t hdr_size;
struct socket *sock;
+   struct skb_ubuf_info pend;
 
/* TODO: check that we are running from vhost_worker? */
sock = rcu_dereference_check(vq->private_data, 1);
@@ -151,6 +154,10 @@ static void handle_tx(struct vhost_net *net)
hdr_size = vq->vhost_hlen;
 
for (;;) {
+   /* Release DMAs done buffers first */
+   if (sock_flag(sock->sk, SOCK_ZEROCOPY))
+   vhost_zerocopy_signal_used(vq);
+
head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 ARRAY_SIZE(vq->iov),
 &out, &in,
@@ -166,6 +173,12 @@ static void handle_tx(struct vhost_net *net)
set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
break;
}
+   /* If more outstanding DMAs, queue the work */
+   if (sock_flag(sock->sk, SOCK_ZEROCOPY) &&
+   (atomic_read(&vq->refcnt) > MAX_ZEROCOPY_PEND)) {
+   vhost_poll_queue(&vq->poll);
+   break;
+   }
if (unlikely(vhost_enable_notify(vq))) {
vhost_disable_notify(vq);
continue;
@@ -188,17 +201,30 @@ static void handle_tx(struct vhost_net *net)
   iov_length(vq->hdr, s), hdr_size);
break;
}
+   /* use msg_control to pass vhost zerocopy ubuf info to skb */
+   if (sock_flag(sock->sk, SOCK_ZEROCOPY)) {
+   pend.callback = vhost_zerocopy_callback;
+   pend.arg = vq;
+   pend.desc = vq->upend_idx;
+   msg.msg_control = &pend;
+   msg.msg_controllen = sizeof(pend);
+   vq->heads[vq->upend_idx].id = head;
+   vq->upend_idx = (vq->upend_idx + 1) % UIO_MAXIOV;
+   atomic_inc(&vq->refcnt);
+   }
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(NULL, sock, &msg, len);
if (unlikely(err < 0)) {
-   vhost_discard_vq_desc(vq, 1);
+   if (!sock_flag(sock->sk, SOCK_ZEROCOPY))
+   vhost_discard_vq_desc(vq, 1);
tx_poll_start(net, sock);
break;
}
if (err != len)
pr_debug("Truncated TX packet: "
 " len %d != %zd\n", err, len);
-   vhost_add_used_and_signal(&net->dev, vq, head, 0);
+   if (!sock_flag(sock->sk, SOCK_ZEROCOPY))
+   vhost_add_used_and_signal(&net->dev, vq, head, 0);
total_len += len;
if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
vhost_poll_queue(&vq->poll);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2ab2912..09bcb1d 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -174,6 +174,9 @@ static void vhost_vq_reset(struct vhost_dev *dev,
vq->call_ctx = NULL;
vq->call = NULL;
vq->log_ctx = NULL;
+   vq->upend_idx = 0;
+   vq->done_idx = 0;
+   atomic_set(&vq->refcnt, 0);
 }
 
 static int vhost_worker(void *data)
@@ -230,7 +233,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
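
The vhost.c side of the patch is cut off here, so the following is only a
userspace sketch of the two-index scheme described in the commit message,
not the patch's code. upend_idx advances as buffers are handed to the lower
device, and done_idx advances over completed buffers in submission order.
All demo_* names, the slot states, and the ring size are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING 8     /* stands in for UIO_MAXIOV */

enum demo_slot { DEMO_FREE, DEMO_PENDING, DEMO_DONE };

struct demo_vq {
    enum demo_slot state[DEMO_RING];
    unsigned int head[DEMO_RING];   /* descriptor head per slot */
    unsigned int upend_idx;         /* next slot to submit into */
    unsigned int done_idx;          /* oldest slot not yet reported */
    int refcnt;                     /* buffers still outstanding */
};

/* Record a buffer handed to the lower device; -1 if the ring is full. */
static int demo_submit(struct demo_vq *vq, unsigned int head)
{
    unsigned int slot = vq->upend_idx;

    if (vq->state[slot] != DEMO_FREE)
        return -1;
    vq->head[slot] = head;
    vq->state[slot] = DEMO_PENDING;
    vq->upend_idx = (slot + 1) % DEMO_RING;
    vq->refcnt++;
    return (int)slot;
}

/* Completion callback: DMA on this slot has finished. */
static void demo_dma_done(struct demo_vq *vq, unsigned int slot)
{
    assert(slot < DEMO_RING && vq->state[slot] == DEMO_PENDING);
    vq->state[slot] = DEMO_DONE;
}

/* Report finished buffers in order, stopping at the first pending one. */
static size_t demo_signal_used(struct demo_vq *vq, unsigned int *out,
                               size_t max)
{
    size_t n = 0;

    while (n < max && vq->state[vq->done_idx] == DEMO_DONE) {
        out[n++] = vq->head[vq->done_idx];
        vq->state[vq->done_idx] = DEMO_FREE;
        vq->done_idx = (vq->done_idx + 1) % DEMO_RING;
        vq->refcnt--;
    }
    return n;
}
```

Completions may arrive out of order, but the guest only sees buffers
released in the order they were submitted; a slow DMA at the front holds
back the ones behind it.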
  

Re: [PATCH V3 2/8] Add a new zerocopy device flag

2011-04-20 Thread Ben Hutchings
On Wed, 2011-04-20 at 12:44 -0700, Shirley Ma wrote:
> This zerocopy flag is used to support device DMA userspace buffers.
> 
> Signed-off-by: Shirley Ma 
> ---
> 
>  include/linux/netdevice.h |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0249fe7..0998d3d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1067,6 +1067,9 @@ struct net_device {
>  #define NETIF_F_RXHASH   (1 << 28) /* Receive hashing offload */
>  #define NETIF_F_RXCSUM   (1 << 29) /* Receive checksumming 
> offload */
>  
> +/* bit 29 is for device to map userspace buffers -- zerocopy */
> +#define NETIF_F_ZEROCOPY (1 << 29)

Look above.

Ben.

>   /* Segmentation offload features */
>  #define NETIF_F_GSO_SHIFT16
>  #define NETIF_F_GSO_MASK 0x00ff
> 
> 

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



[PATCH V3 2/8] Add a new zerocopy device flag

2011-04-20 Thread Shirley Ma
This zerocopy flag is used to support device DMA userspace buffers.

Signed-off-by: Shirley Ma 
---

 include/linux/netdevice.h |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0249fe7..0998d3d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1067,6 +1067,9 @@ struct net_device {
 #define NETIF_F_RXHASH (1 << 28) /* Receive hashing offload */
 #define NETIF_F_RXCSUM (1 << 29) /* Receive checksumming offload */
 
+/* bit 29 is for device to map userspace buffers -- zerocopy */
+#define NETIF_F_ZEROCOPY   (1 << 29)
+
/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT  16
 #define NETIF_F_GSO_MASK   0x00ff
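
As posted, this version reuses bit 29, which the context lines show is
already NETIF_F_RXCSUM. A collision like that can be caught mechanically. A
minimal sketch, assuming a 32-bit feature word; first_collision() is a
hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the first existing flag that
 * shares a bit with candidate, or -1 if the candidate is free.
 */
static int first_collision(const uint32_t *flags, size_t n,
                           uint32_t candidate)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (flags[i] & candidate)
            return (int)i;
    return -1;
}
```

Feeding it the two flags visible in the hunk, (1 << 28) and (1 << 29), with
a candidate of (1 << 29) reports index 1, the RXCSUM entry.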




[PATCH V3 1/8] Add a new sock zerocopy flag

2011-04-20 Thread Shirley Ma
This sock zerocopy flag is used to support lower level device DMA 
userspace buffers.

Signed-off-by: Shirley Ma 
---

 include/net/sock.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 01810a3..daa0a80 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -562,6 +562,7 @@ enum sock_flags {
SOCK_TIMESTAMPING_SYS_HARDWARE, /* %SOF_TIMESTAMPING_SYS_HARDWARE */
SOCK_FASYNC, /* fasync() active */
SOCK_RXQ_OVFL,
+   SOCK_ZEROCOPY,
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
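
The entries of enum sock_flags are bit positions, not masks, so adding
SOCK_ZEROCOPY at the end costs one more bit in the socket's flag word. A
userspace sketch of that pattern, with hypothetical demo_ names standing in
for sock_set_flag() and sock_flag():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of enum sock_flags; values are bit positions. */
enum demo_sock_flags {
    DEMO_SOCK_FASYNC,
    DEMO_SOCK_RXQ_OVFL,
    DEMO_SOCK_ZEROCOPY,
    DEMO_SOCK_FLAG_COUNT
};

struct demo_sock {
    unsigned long flags;
};

/* Set the bit whose position is given by the enum value. */
static void demo_sock_set_flag(struct demo_sock *sk, enum demo_sock_flags f)
{
    assert(f < DEMO_SOCK_FLAG_COUNT);
    sk->flags |= 1UL << f;
}

/* Test the bit whose position is given by the enum value. */
static bool demo_sock_flag(const struct demo_sock *sk, enum demo_sock_flags f)
{
    assert(f < DEMO_SOCK_FLAG_COUNT);
    return (sk->flags >> f) & 1UL;
}
```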




[PATCH V3 0/8] macvtap/vhost TX zero copy support

2011-04-20 Thread Shirley Ma
This patchset adds support for TX zero-copy between guest and host
kernel through vhost. It significantly reduces CPU utilization on the
local host on which the guest is located (it reduced CPU usage of the
vhost thread by 30-50% in a single-stream test). The patchset is based
on the previous submission and on comments from the community regarding
when/how to handle the release of guest kernel buffers. This is the
simplest approach I can think of after comparing several other solutions.

This patchset includes:

1/8: Add a new sock zero-copy flag, SOCK_ZEROCOPY;

2/8: Add a new device flag, NETIF_F_ZEROCOPY, for lower-level devices
that support zero-copy;

3/8: Add a new struct skb_ubuf_info in skb_share_info for the userspace
buffer release callback, invoked when the lower device has finished DMA
for that skb;

4/8: Add a vhost zero-copy callback, invoked when the last skb refcnt is
gone; add vhost_zerocopy_add_used_and_signal to notify the guest to
release TX skb buffers;

5/8: Add macvtap zero-copy in the lower device when the packet being
sent is greater than 128 bytes;

6/8: Add the zero-copy feature flag to the Chelsio 10Gb NIC;

7/8: Add the zero-copy feature flag to the Intel 10Gb NIC;

8/8: Add the zero-copy feature flag to the Emulex 10Gb NIC.
The patchset is built against the most recent linux 2.6.git. It has passed
netperf/netserver multiple-stream stress tests on the above NICs.

The single stream test results from 2.6.37 kernel on Chelsio:

64K message size: copy_from_user dropped from 40% to 5%; vhost thread
cpu utilization dropped from 76% to 28%

I am collecting more test results against 2.6.39-rc3 kernel and will
provide the test matrix later.

Thanks
Shirley




[Bug 33762] Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel

2011-04-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=33762


Anton Kochkov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|                            |2.6.38




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


[Bug 33762] Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel

2011-04-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=33762





--- Comment #3 from Anton Kochkov 2011-04-20 16:40:49 ---
Created an attachment (id=54832)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=54832)
Dmesg output



[Bug 33762] Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel

2011-04-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=33762





--- Comment #2 from Anton Kochkov 2011-04-20 16:38:36 ---
http://lists.nongnu.org/archive/html/qemu-devel/2011-04/msg01547.html


[Bug 33762] Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel

2011-04-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=33762





--- Comment #1 from Anton Kochkov 2011-04-20 16:38:17 ---
Additional discussion in qemu-devel mailing list



[Bug 33762] New: Qemu-kvm infinite loop on hardened (Grsecurity/PaX) kernel

2011-04-20 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=33762

   Summary: Qemu-kvm infinite loop on hardened (Grsecurity/PaX)
kernel
   Product: Virtualization
   Version: unspecified
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: anton.koch...@gmail.com
Regression: No


Created an attachment (id=54822)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=54822)
Kernel CONFIG

I'm using 2.6.38 kernel sources with grsecurity/PaX patches on Gentoo Hardened
Linux on an Intel Core i7 x64 host. The example guest is Debian-6.0-amd64.

Grsecurity -> Security level -> Virtualization enabled

starting qemu as qemu-kvm -net tap,ifname=tap1,script=no -net nic -monitor
stdio -m 256 -d cpu,in_asm,exec -s -boot d -cdrom debian-minimal.iso -hda
debian.qcow2

(qemu) info kvm
kvm support: enabled
(qemu) info cpus
* CPU #0: pc=0x0010017c (halted) thread_id=4688 
(qemu) info pci
  Bus  0, device   0, function 0:
Host bridge: PCI device 8086:1237
  id ""
  Bus  0, device   1, function 0:
ISA bridge: PCI device 8086:7000
  id ""
  Bus  0, device   1, function 1:
IDE controller: PCI device 8086:7010
  BAR4: I/O at 0xc000 [0xc00f].
  id ""
  Bus  0, device   1, function 3:
Bridge: PCI device 8086:7113
  IRQ 9.
  id ""
  Bus  0, device   2, function 0:
VGA controller: PCI device 1013:00b8
  BAR0: 32 bit prefetchable memory at 0xf000 [0xf1ff].
  BAR1: 32 bit memory at 0xf200 [0xf2000fff].
  BAR6: 32 bit memory at 0x [0xfffe].
  id ""
(qemu) info status
VM status: running
(qemu) info roms
fw=genroms/vapic.bin size=0x002400 name="vapic.bin" 
addr=fffe size=0x02 mem=rom name="bios.bin" 
(qemu) info registers
EAX= EBX=00187130 ECX=00187130 EDX=
ESI= EDI= EBP= ESP=0ffcfeac
EIP=0010017c EFL=0246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=1
ES =0028   00c09300 DPL=0 DS   [-WA]
CS =0020   00c09b00 DPL=0 CS32 [-RA]
SS =0028   00c09300 DPL=0 DS   [-WA]
DS =0028   00c09300 DPL=0 DS   [-WA]
FS =   
GS =   
LDT=   
TR =0008 0580 0067 8b00 DPL=0 TSS32-busy
GDT= ab80 002f
IDT= 30b8 07ff
CR0=0013 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3= 
DR6=0ff0 DR7=0400
EFER=
FCW=037f FSW=0020 [ST=0] FTW=00 MXCSR=1f80
FPR0=f44d002c6000 400d FPR1=80847fe7 400e
FPR2=fa007fa24000 400e FPR3=80e88055f000 400e
FPR4=ea61009c4000 400d FPR5=ea62009c4000 400c
FPR6=bb7fffb9b000 400b FPR7=bb83ffb9b000 400b
XMM00= XMM01=
XMM02= XMM03=
XMM04= XMM05=
XMM06= XMM07=

My emerge --info:
app-shells/bash: 4.2_p8
dev-lang/python: 2.7.1-r1, 3.1.3-r1
dev-util/cmake:  2.8.4
sys-apps/baselayout: 2.0.2
sys-apps/openrc: 0.8.1
sys-apps/sandbox:2.5
sys-devel/autoconf:  2.68
sys-devel/automake:  1.11.1-r1
sys-devel/binutils:  2.21
sys-devel/gcc:   4.5.2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.4-r1
sys-devel/make:  3.82
sys-kernel/linux-headers: 2.6.38
virtual/os-headers:  2.6.38 (sys-kernel/linux-headers)
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=core2 -mtune=generic -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf
/etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/
/etc/php/cgi-php5.3/ext-active/ /etc/php/cli-php5.3/ext-active/
/etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=core2 -mtune=generic -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests binpkg-logs distlocks fixlafiles fixpackages news
parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn
unmerge-logs unmerge-orphans userfetch"
FFLAGS=""
GENTOO_MIRRORS="ftp://rush.tisys.org/pub/gentoo/";
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j9"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress
--force --whole-file --delete --stats --timeout=180 --exclude=/distfiles
--exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rush.tisys.org/gentoo-porta

Re: [PATCH] kvm tools: Add read-only support for QCOW2 images

2011-04-20 Thread Pekka Enberg

On Tue, 19 Apr 2011, Prasad Joshi wrote:


On Tue, Apr 19, 2011 at 10:07 PM, Pekka Enberg  wrote:
  This patch extends the QCOW1 format to also support QCOW2 images as specified
  by the following document:

   http://people.gnome.org/~markmc/qcow-image-format.html

  Cc: Asias He 
  Cc: Cyrill Gorcunov 
  Cc: Prasad Joshi 
  Cc: Sasha Levin 
  Cc: Ingo Molnar 
  Signed-off-by: Pekka Enberg 
  ---
   tools/kvm/include/kvm/qcow.h |   42 ++-
   tools/kvm/qcow.c             |  177 
+-
   2 files changed, 181 insertions(+), 38 deletions(-)

  diff --git a/tools/kvm/include/kvm/qcow.h b/tools/kvm/include/kvm/qcow.h
  index 4be2597..afd776d 100644
  --- a/tools/kvm/include/kvm/qcow.h
  +++ b/tools/kvm/include/kvm/qcow.h
  @@ -4,9 +4,17 @@
   #include 

   #define QCOW_MAGIC             (('Q' << 24) | ('F' << 16) | ('I' << 8) | 0xfb)
  +
   #define QCOW1_VERSION          1
  +#define QCOW2_VERSION          2
  +
  +#define QCOW1_OFLAG_COMPRESSED (1LL << 63)
  +
  +#define QCOW1_OFLAG_MASK       QCOW1_OFLAG_COMPRESSED

  -#define QCOW_OFLAG_COMPRESSED  (1LL << 63)
  +#define QCOW2_OFLAG_COPIED     (1LL << 63)
  +#define QCOW2_OFLAG_COMPRESSED (1LL << 62)
  +#define QCOW2_OFLAG_MASK       (QCOW2_OFLAG_COPIED|QCOW2_OFLAG_COMPRESSED)

   struct qcow_table {
         u32                     table_size;
  @@ -19,7 +27,16 @@ struct qcow {
         int                     fd;
   };

  -struct qcow1_header {
  +struct qcow_header {
  +       u64                     size; /* in bytes */
  +       u64                     l1_table_offset;
  +       u32                     l1_size;
  +       u8                      cluster_bits;
  +       u8                      l2_bits;
  +       uint64_t                oflag_mask;
  +};
  +
  +struct qcow1_header_disk {
         u32                     magic;
         u32                     version;

  @@ -36,6 +53,27 @@ struct qcow1_header {
         u64                     l1_table_offset;
   };

  +struct qcow2_header_disk {
  +       u32                     magic;
  +       u32                     version;
  +
  +       u64                     backing_file_offset;
  +       u32                     backing_file_size;
  +
  +       u32                     cluster_bits;
  +       u64                     size; /* in bytes */
  +       u32                     crypt_method;
  +
  +       u32                     l1_size;
  +       u64                     l1_table_offset;
  +
  +       u64                     refcount_table_offset;
  +       u32                     refcount_table_clusters;
  +
  +       u32                     nb_snapshots;
  +       u64                     snapshots_offset;
  +};

IMHO, as we start adding other features of QCOW, the two structures 
qcow2_header_disk and qcow_header might eventually become the same. 


No, the point of 'struct qcow2_header_disk' is to map to the on-disk 
representation. 'struct qcow_header' is the in-memory version of the data.



  +       disk_image = disk_image__new(fd, h->size, &qcow1_disk_ops);


qcow1_disk_ops can be changed to qcow_disk_ops.


Sure, there are more qcow1 prefixes that need fixing now as well.
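
The on-disk versus in-memory split discussed above can be sketched in userspace C. This is only an illustration, not the kvm tools code: 'struct mem_header', parse_qcow2_header() and strip_oflags() are hypothetical names, and the byte offsets assume the packed, big-endian QCOW2 header layout in the field order quoted in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QCOW_MAGIC        0x514649fbU      /* 'Q' 'F' 'I' 0xfb */
#define QCOW2_HDR_LEN     72               /* fixed fields decoded below */
#define OFLAG_COPIED      (1ULL << 63)
#define OFLAG_COMPRESSED  (1ULL << 62)

/* Hypothetical in-memory header: host-endian, only the fields we use. */
struct mem_header {
	uint64_t size;            /* virtual disk size in bytes */
	uint64_t l1_table_offset;
	uint32_t l1_size;
	uint8_t  cluster_bits;
	uint64_t oflag_mask;      /* flag bits to strip from table entries */
};

/* Read big-endian integers byte by byte, independent of host endianness. */
static uint32_t be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t be64(const uint8_t *p)
{
	return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Decode the on-disk bytes into the in-memory form; -1 on a bad header. */
int parse_qcow2_header(const uint8_t *buf, size_t len, struct mem_header *h)
{
	assert(buf && h);
	if (len < QCOW2_HDR_LEN || be32(buf) != QCOW_MAGIC || be32(buf + 4) != 2)
		return -1;
	h->cluster_bits    = (uint8_t)be32(buf + 20);
	h->size            = be64(buf + 24);
	h->l1_size         = be32(buf + 36);
	h->l1_table_offset = be64(buf + 40);
	h->oflag_mask      = OFLAG_COPIED | OFLAG_COMPRESSED;
	return 0;
}

/* Remove the format-specific flag bits from an L1/L2 table entry. */
uint64_t strip_oflags(uint64_t entry, const struct mem_header *h)
{
	return entry & ~h->oflag_mask;
}
```

Keeping the flag mask in the in-memory header lets the lookup code stay the same for both format versions, which seems to be the intent of the 'oflag_mask' field in the patch.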

[PATCH] kvm tools: Add missing space before root= option

2011-04-20 Thread Cyrill Gorcunov
If the user passes their own options, we need an extra space; otherwise the
options get joined.

Signed-off-by: Cyrill Gorcunov 
---
 tools/kvm/kvm-run.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.git/tools/kvm/kvm-run.c
=
--- linux-2.6.git.orig/tools/kvm/kvm-run.c
+++ linux-2.6.git/tools/kvm/kvm-run.c
@@ -383,7 +383,7 @@ int kvm_cmd_run(int argc, const char **a
}

if (!strstr(real_cmdline, "root="))
-   strlcat(real_cmdline, "root=/dev/vda rw ", sizeof(real_cmdline));
+   strlcat(real_cmdline, " root=/dev/vda rw ", sizeof(real_cmdline));

if (image_filename) {
kvm->disk_image = disk_image__open(image_filename, readonly_image);
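
The joined-options problem this patch fixes can be reproduced in userspace. The sketch below is not the approach the patch takes (the patch simply prepends a space to the literal); it uses a hypothetical helper, cmdline_append(), that inserts a separator only when one is needed and refuses to truncate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append 'opt' to the NUL-terminated command line in 'buf' (capacity
 * 'size'). A single space is inserted first when buf is non-empty and
 * does not already end in a space. Returns 0 on success, or -1 (leaving
 * buf untouched) when the result would not fit.
 */
int cmdline_append(char *buf, size_t size, const char *opt)
{
	size_t used = strlen(buf);
	size_t optlen = strlen(opt);
	int need_space = used > 0 && buf[used - 1] != ' ';

	assert(used < size);
	if (used + (need_space ? 1 : 0) + optlen + 1 > size)
		return -1;
	if (need_space)
		buf[used++] = ' ';
	memcpy(buf + used, opt, optlen + 1);   /* copies the terminating NUL */
	return 0;
}
```

Starting from "console=ttyS0", appending "root=/dev/vda rw" yields "console=ttyS0 root=/dev/vda rw" rather than the joined "console=ttyS0root=/dev/vda rw".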


Re: [PATCH 2/2] KVM: Use pci_store/load_saved_state() around VM device usage

2011-04-20 Thread Avi Kivity

On 04/20/2011 06:13 PM, Alex Williamson wrote:
> >  > This is also why I changed the
> >  >  __pci_reset_function() back to a normal pci_reset_function(), so we're
> >  >  never left with an uninitialized device like we are now.
> >  >
> >  >  We could be more verbose or return an error here, but we've gone for a
> >  >  long time not even doing this save/restore across VM usage, so I don't
> >  >  think it's worthy of preventing the device attachment if it fails.
> >
> >  At least a log?
>
> Ok, I'm not sure what corrective action a user would take or what they
> should expect not to work, but I guess a KERN_DEBUG printk is
> reasonable.

"X didn't work" vs "X didn't work and I got this in the log"

> >  Note avoiding the pointer would have removed the problem altogether.
>
> Returning a struct on store?  We lose any kind of opacity that way since
> the caller needs to know about the struct then.  I thought the pointer
> makes it clear the caller shouldn't be touching the contents, but if you
> think it's a better way to go, I can try it.  Thanks,


Avoid the allocation altogether by having the caller be responsible for 
storage (in our case, embed the struct instead of the pointer).


You can encrypt the contents using the TPM, or maybe a comment 
indicating that the contents should not be touched would suffice.


--
error compiling committee.c: too many arguments to function
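
A minimal userspace sketch of the "embed the struct" suggestion follows. The names (struct saved_state, struct assigned_dev, state_store(), state_load()) are hypothetical and do not come from the PCI core; the point is only that caller-owned storage removes the allocation and with it the NULL-on-failure case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_WORDS 16

/* Hypothetical saved-state container. Callers should not touch the contents. */
struct saved_state {
	bool     valid;
	uint32_t cfg[CFG_WORDS];
};

/* Hypothetical per-assignment record with the storage embedded directly. */
struct assigned_dev {
	int                id;
	struct saved_state saved;   /* no allocation, so no failure path */
};

/* Copy the live config words into caller-provided storage. Cannot fail. */
void state_store(struct saved_state *s, const uint32_t *cfg)
{
	assert(s && cfg);
	memcpy(s->cfg, cfg, sizeof(s->cfg));
	s->valid = true;
}

/* Copy the saved words back out; returns false if nothing was stored. */
bool state_load(const struct saved_state *s, uint32_t *cfg)
{
	assert(s && cfg);
	if (!s->valid)
		return false;
	memcpy(cfg, s->cfg, sizeof(s->cfg));
	return true;
}
```

The cost is the one raised in the thread: the struct definition has to be visible to the caller, so opacity rests on convention (or a comment) rather than on the type system.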



Re: [PATCH 2/2] KVM: Use pci_store/load_saved_state() around VM device usage

2011-04-20 Thread Alex Williamson
On Wed, 2011-04-20 at 10:19 +0300, Avi Kivity wrote:
> On 04/18/2011 10:43 PM, Alex Williamson wrote:
> > On Sun, 2011-04-17 at 12:25 +0300, Avi Kivity wrote:
> > >  On 04/15/2011 10:54 PM, Alex Williamson wrote:
> > >  >  Store the device saved state so that we can reload the device back
> > >  >  to the original state when it's unassigned.  This has the benefit
> > >  >  that the state survives across pci_reset_function() calls via
> > >  >  the PCI sysfs reset interface while the VM is using the device.
> > >
> > >  >  @@ -516,7 +518,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> > >  >
> > >  >pci_reset_function(dev);
> > >  >pci_save_state(dev);
> > >  >  -
> > >  >  + match->pci_saved_state = pci_store_saved_state(dev);
> > >  >match->assigned_dev_id = assigned_dev->assigned_dev_id;
> > >
> > >  Error check?
> > >
> > >  It might be better to give up the opacity of the data structure and make
> > >  pci_saved_state the full struct, not a pointer.
> >
> > pci_store_saved_state() returns NULL on error, which is correctly
> > handled if we pass NULL to pci_load_saved_state() or a pointer to NULL
> > to pci_load_and_free_saved_state().
> 
> But we silently swallow an error, this isn't good.
> 
> >This is also why I changed the
> > __pci_reset_function() back to a normal pci_reset_function(), so we're
> > never left with an uninitialized device like we are now.
> >
> > We could be more verbose or return an error here, but we've gone for a
> > long time not even doing this save/restore across VM usage, so I don't
> > think it's worthy of preventing the device attachment if it fails.
> 
> At least a log?

Ok, I'm not sure what corrective action a user would take or what they
should expect not to work, but I guess a KERN_DEBUG printk is
reasonable.

> Note avoiding the pointer would have removed the problem altogether.

Returning a struct on store?  We lose any kind of opacity that way since
the caller needs to know about the struct then.  I thought the pointer
makes it clear the caller shouldn't be touching the contents, but if you
think it's a better way to go, I can try it.  Thanks,

Alex
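
The behaviour described above (store returns NULL on failure, load tolerates NULL, and the failure is logged rather than silently swallowed) can be modelled in userspace as below. All names are hypothetical stand-ins, not the PCI core functions, and the 'fail' parameter exists only to force the error path for demonstration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STATE_BYTES 64

struct opaque_state { unsigned char bytes[STATE_BYTES]; };

/* Allocate and fill a state copy; returns NULL on (forced) failure. */
struct opaque_state *store_state(const unsigned char *src, int fail)
{
	struct opaque_state *s = fail ? NULL : malloc(sizeof(*s));

	if (s)
		memcpy(s->bytes, src, sizeof(s->bytes));
	return s;
}

/* Restore and free; a NULL slot is not an error, just nothing to do. */
int load_and_free_state(struct opaque_state **slot, unsigned char *dst)
{
	if (!slot || !*slot)
		return -1;
	memcpy(dst, (*slot)->bytes, sizeof((*slot)->bytes));
	free(*slot);
	*slot = NULL;
	return 0;
}

/*
 * Assignment carries on even when the store fails, but says so.
 * Returns 1 if a warning was logged, 0 otherwise.
 */
int assign_device(struct opaque_state **slot, const unsigned char *cfg, int fail)
{
	*slot = store_state(cfg, fail);
	if (!*slot) {
		fprintf(stderr, "assign: could not store device state\n");
		return 1;
	}
	return 0;
}
```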




Windows XP, Tablet, vnc and mouse clicks

2011-04-20 Thread Boris Dolgov
Hello!

I am facing a strange problem.

I use qemu-kvm-0.13 under Fedora 14, with Windows XP installed in
the virtual machine.
Some mouse clicks are not delivered to the virtual machine.
When I do a quick click (press the mouse button and release it
immediately), it is sometimes not delivered.
If I do a long click (press the button, wait a bit, then release),
it is always delivered.

The qemu startup command is standard:
/usr/bin/qemu-kvm -enable-kvm -name VM2 -nographic -vnc 0.0.0.0:2 -vga
std -m 512 -smp 2 -boot c -pidfile /home/vms/run/2.pid -monitor
unix:/home/vms/run/2.monitor,server,nowait -serial
unix:/home/vms/run/2.serial,server,nowait -net
nic,vlan=0,macaddr=FE:E1:DE:AD:00:11,model=virtio -net
tap,vlan=0,ifname=tap_2_0,script=/home/vms/ifup,downscript=/home/vms/ifdown
-drive media=disk,if=virtio,index=0,file=/dev/sdb,cache=none,boot=on
-usb -usbdevice tablet

Operating system is Windows XP SP3 with virtio drivers for network and
block devices.

What is the problem?

-- 
Boris Dolgov.


Re: [Qemu-devel] QEMU-KVM and hardened (GRSEC/PaX) kernel

2011-04-20 Thread Avi Kivity

On 04/17/2011 01:45 AM, Антон Кочков wrote:

Good day!
I'm trying to get qemu-kvm working with hardened Gentoo on a hardened kernel.
When I use CONFIG_PAX_KERNPAGEXEC and CONFIG_PAX_MEM_UNDEREF, qemu just starts,
goes into an infinite loop, and takes 100% of one of my CPU cores. It
can't even be killed.
It also doesn't respond to the qemu monitor or remote gdb.
When I disabled these two options, qemu-kvm started and then
stopped (I mean the qemu monitor shows that the virtual machine is running, but
there is no activity or output). Its load is also about 0%.
See details in bug http://bugs.gentoo.org/show_bug.cgi?id=363713

I hope this info helps improve qemu-kvm.



As Blue says, the problem is likely in kvm, not qemu.

Please try:
- hardened guest on soft host (I expect this to work)
- soft guest on hardened host (I expect this to fail).

Are you using an Intel or AMD host?

Note virtualization hardware will play with segmentation and defeat all 
those games the hardened kernel plays.


--
error compiling committee.c: too many arguments to function



[PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too

2011-04-20 Thread Roedel, Joerg
On Wed, Apr 20, 2011 at 07:18:12AM -0400, Avi Kivity wrote:
> On 04/20/2011 02:06 PM, Roedel, Joerg wrote:

> > The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a
> > fix soon.
> 
> Thanks.

Here is a fix for review. I am out of office starting in about an hour
until next Tuesday, so the corrections will most likely not happen
before then :)
The patch is tested with npt and shadow paging as well as with
npt-on-npt (64 bit with kvm).

Regards,

Joerg

From 6b1dcd9f17bbd482061180001d1f45c3adcef430 Mon Sep 17 00:00:00 2001
From: Joerg Roedel 
Date: Wed, 20 Apr 2011 15:22:21 +0200
Subject: [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too

This patch makes the cmpxchg_gpte() function aware of the
difference between l1-gfns and l2-gfns when nested
virtualization is in use. This fixes a potential
data-corruption problem in the l1-guest and makes the code
work correctly (at least as correctly as the hardware which is
emulated in this code) again.

Cc: sta...@kernel.org
Signed-off-by: Joerg Roedel 
---
 arch/x86/kvm/paging_tmpl.h |   30 +++---
 1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 74f8567..e442bf4 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -78,15 +78,21 @@ static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl)
return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT;
 }
 
-static bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
+static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 gfn_t table_gfn, unsigned index,
 pt_element_t orig_pte, pt_element_t new_pte)
 {
pt_element_t ret;
pt_element_t *table;
struct page *page;
+   gpa_t gpa;
 
-   page = gfn_to_page(kvm, table_gfn);
+   gpa = mmu->translate_gpa(vcpu, table_gfn << PAGE_SHIFT,
+PFERR_USER_MASK|PFERR_WRITE_MASK);
+   if (gpa == UNMAPPED_GVA)
+   return -EFAULT;
+
+   page = gfn_to_page(vcpu->kvm, gpa_to_gfn(gpa));
 
table = kmap_atomic(page, KM_USER0);
ret = CMPXCHG(&table[index], orig_pte, new_pte);
@@ -192,11 +198,17 @@ walk:
 #endif
 
if (!eperm && !rsvd_fault && !(pte & PT_ACCESSED_MASK)) {
+   int ret;
trace_kvm_mmu_set_accessed_bit(table_gfn, index,
   sizeof(pte));
-   if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn,
-   index, pte, pte|PT_ACCESSED_MASK))
+   ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn,
+   index, pte, pte|PT_ACCESSED_MASK);
+   if (ret < 0) {
+   present = false;
+   break;
+   } else if (ret)
goto walk;
+
mark_page_dirty(vcpu->kvm, table_gfn);
pte |= PT_ACCESSED_MASK;
}
@@ -245,13 +257,17 @@ walk:
goto error;
 
if (write_fault && !is_dirty_gpte(pte)) {
-   bool ret;
+   int ret;
 
trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte));
-   ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
+   ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn, index, pte,
pte|PT_DIRTY_MASK);
-   if (ret)
+   if (ret < 0) {
+   present = false;
+   goto error;
+   } if (ret)
goto walk;
+
mark_page_dirty(vcpu->kvm, table_gfn);
pte |= PT_DIRTY_MASK;
walker->ptes[walker->level - 1] = pte;
-- 
1.7.1



-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
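
The idea behind the patch above can be modelled in userspace: translate the table frame number through the nested mapping first, fail with -EFAULT if it is unmapped, and only then do the compare-and-exchange. Everything here is a toy: l1_mem, l2_to_l1, translate_gfn() and this cmpxchg_gpte() are hypothetical stand-ins with tiny fixed sizes, and C11 atomics replace the kernel's CMPXCHG macro. The return convention (negative on error, 0 when updated, positive when the entry changed underneath) follows the shape of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_FRAMES   8
#define PTES_PER_PG 4
#define UNMAPPED    UINT64_MAX
#define ACCESSED    (1ULL << 5)    /* x86 page-table "accessed" bit */

/* Toy L1 guest memory: NR_FRAMES page-table pages of PTES_PER_PG entries. */
static _Atomic uint64_t l1_mem[NR_FRAMES][PTES_PER_PG];

/* Toy nested mapping from L2 frame number to L1 frame number. */
uint64_t l2_to_l1[NR_FRAMES];

/* With nesting, an L2 frame number must be translated before use. */
uint64_t translate_gfn(uint64_t gfn, int nested)
{
	if (gfn >= NR_FRAMES)
		return UNMAPPED;
	return nested ? l2_to_l1[gfn] : gfn;
}

/*
 * Returns -EFAULT if the table frame cannot be translated, 0 if the
 * entry was updated, 1 if it no longer held 'orig_pte' (restart the walk).
 */
int cmpxchg_gpte(uint64_t table_gfn, unsigned index,
		 uint64_t orig_pte, uint64_t new_pte, int nested)
{
	uint64_t frame = translate_gfn(table_gfn, nested);

	if (frame == UNMAPPED || frame >= NR_FRAMES || index >= PTES_PER_PG)
		return -EFAULT;
	return atomic_compare_exchange_strong(&l1_mem[frame][index],
					      &orig_pte, new_pte) ? 0 : 1;
}
```

If the L2 frame number were used directly as an L1 frame number, the write would land in whatever L1 page happens to have that number, which is the data-corruption case the commit message mentions.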



[PATCH 05/16] KVM: x86 emulator: drop vcpu argument from intercept callback

2011-04-20 Thread Avi Kivity
Making the emulator caller agnostic.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |2 +-
 arch/x86/kvm/emulate.c |2 +-
 arch/x86/kvm/x86.c |4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 2c02e75..e2b082a 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -185,7 +185,7 @@ struct x86_emulate_ops {
int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */
void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */
-   int (*intercept)(struct kvm_vcpu *vcpu,
+   int (*intercept)(struct x86_emulate_ctxt *ctxt,
 struct x86_instruction_info *info,
 enum x86_intercept_stage stage);
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 57c730b..55ca5a5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -438,7 +438,7 @@ static int emulator_check_intercept(struct x86_emulate_ctxt *ctxt,
.next_rip   = ctxt->eip,
};
 
-   return ctxt->ops->intercept(ctxt->vcpu, &info, stage);
+   return ctxt->ops->intercept(ctxt, &info, stage);
 }
 
 static inline unsigned long ad_mask(struct decode_cache *c)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 16373a5..4f7248e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4365,11 +4365,11 @@ static void emulator_put_fpu(struct x86_emulate_ctxt *ctxt)
preempt_enable();
 }
 
-static int emulator_intercept(struct kvm_vcpu *vcpu,
+static int emulator_intercept(struct x86_emulate_ctxt *ctxt,
  struct x86_instruction_info *info,
  enum x86_intercept_stage stage)
 {
-   return kvm_x86_ops->check_intercept(vcpu, info, stage);
+   return kvm_x86_ops->check_intercept(emul_to_vcpu(ctxt), info, stage);
 }
 
 static struct x86_emulate_ops emulate_ops = {
-- 
1.7.1
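
The pattern in this patch, a callback that receives only the emulator context and recovers its owner on the host side, can be sketched in userspace. This assumes emul_to_vcpu() is a container_of()-style helper over an embedded context, which the patch itself does not show. All type and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct emu_ctxt { unsigned long eip; };

/* Hypothetical owner with the emulator context embedded, not pointed to. */
struct vcpu {
	int             id;
	struct emu_ctxt emulate_ctxt;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static inline struct vcpu *ctxt_to_vcpu(struct emu_ctxt *ctxt)
{
	return container_of(ctxt, struct vcpu, emulate_ctxt);
}

/* The emulator core sees only the context type in its callback table. */
struct emu_ops {
	int (*intercept)(struct emu_ctxt *ctxt, int stage);
};

/* Host-side callback: the only place that knows about struct vcpu. */
int host_intercept(struct emu_ctxt *ctxt, int stage)
{
	return ctxt_to_vcpu(ctxt)->id * 10 + stage;
}

/* Emulator-side caller: passes the context through, never a vcpu. */
int emulator_check_intercept(const struct emu_ops *ops,
			     struct emu_ctxt *ctxt, int stage)
{
	return ops->intercept(ctxt, stage);
}
```

With this split the emulator core could be driven by any caller that supplies an ops table, which is what "caller agnostic" in the changelog appears to mean.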



[PATCH 04/16] KVM: x86 emulator: drop vcpu argument from cr/dr/cpl/msr callbacks

2011-04-20 Thread Avi Kivity
Making the emulator caller agnostic.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |   14 +++---
 arch/x86/kvm/emulate.c |   84 ++--
 arch/x86/kvm/x86.c |   34 ++
 3 files changed, 73 insertions(+), 59 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 656046a..2c02e75 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -176,13 +176,13 @@ struct x86_emulate_ops {
 int seg);
void (*get_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
void (*get_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
-   ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
-   int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
-   int (*cpl)(struct kvm_vcpu *vcpu);
-   int (*get_dr)(int dr, unsigned long *dest, struct kvm_vcpu *vcpu);
-   int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
-   int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
-   int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
+   ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
+   int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
+   int (*cpl)(struct x86_emulate_ctxt *ctxt);
+   int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
+   int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
+   int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
+   int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */
void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */
int (*intercept)(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index d1e0a1b..57c730b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -596,7 +596,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
if (addr.ea > lim || (u32)(addr.ea + size - 1) > lim)
goto bad;
}
-   cpl = ctxt->ops->cpl(ctxt->vcpu);
+   cpl = ctxt->ops->cpl(ctxt);
rpl = ctxt->ops->get_segment_selector(ctxt, addr.seg) & 3;
cpl = max(cpl, rpl);
if (!(desc.type & 8)) {
@@ -1248,7 +1248,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
 
rpl = selector & 3;
dpl = seg_desc.dpl;
-   cpl = ops->cpl(ctxt->vcpu);
+   cpl = ops->cpl(ctxt);
 
switch (seg) {
case VCPU_SREG_SS:
@@ -1407,7 +1407,7 @@ static int emulate_popf(struct x86_emulate_ctxt *ctxt,
int rc;
unsigned long val, change_mask;
int iopl = (ctxt->eflags & X86_EFLAGS_IOPL) >> IOPL_SHIFT;
-   int cpl = ops->cpl(ctxt->vcpu);
+   int cpl = ops->cpl(ctxt);
 
rc = emulate_pop(ctxt, ops, &val, len);
if (rc != X86EMUL_CONTINUE)
@@ -1852,7 +1852,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
setup_syscalls_segments(ctxt, ops, &cs, &ss);
 
-   ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+   ops->get_msr(ctxt, MSR_STAR, &msr_data);
msr_data >>= 32;
cs_sel = (u16)(msr_data & 0xfffc);
ss_sel = (u16)(msr_data + 8);
@@ -1871,17 +1871,17 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 #ifdef CONFIG_X86_64
c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF;
 
-   ops->get_msr(ctxt->vcpu,
+   ops->get_msr(ctxt,
 ctxt->mode == X86EMUL_MODE_PROT64 ?
 MSR_LSTAR : MSR_CSTAR, &msr_data);
c->eip = msr_data;
 
-   ops->get_msr(ctxt->vcpu, MSR_SYSCALL_MASK, &msr_data);
+   ops->get_msr(ctxt, MSR_SYSCALL_MASK, &msr_data);
ctxt->eflags &= ~(msr_data | EFLG_RF);
 #endif
} else {
/* legacy mode */
-   ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
+   ops->get_msr(ctxt, MSR_STAR, &msr_data);
c->eip = (u32)msr_data;
 
ctxt->eflags &= ~(EFLG_VM | EFLG_IF | EFLG_RF);
@@ -1910,7 +1910,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
 
setup_syscalls_segments(ctxt, ops, &cs, &ss);
 
-   ops->get_msr(ctxt->vcpu, MSR_IA32_SYSENTER_CS, &msr_data);
+   ops->get_msr(ctxt, MSR_IA32_SYSENTER_CS, &msr_data);
switch (ctxt->mode) {
case X86EMUL_MODE_PROT32:
if ((msr_data & 0xfffc) == 0x0)
@@ -1938,10 +1938,10 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
ops->set_cached_descriptor(ctxt, &ss, 0, VCPU_S

[PATCH 03/16] KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks

2011-04-20 Thread Avi Kivity
Making the emulator caller agnostic.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |   22 ---
 arch/x86/kvm/emulate.c |  112 ++--
 arch/x86/kvm/x86.c |   39 +++--
 3 files changed, 90 insertions(+), 83 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 1348bdf..656046a 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -163,15 +163,19 @@ struct x86_emulate_ops {
int size, unsigned short port, const void *val,
unsigned int count);
 
-   bool (*get_cached_descriptor)(struct desc_struct *desc, u32 *base3,
- int seg, struct kvm_vcpu *vcpu);
-   void (*set_cached_descriptor)(struct desc_struct *desc, u32 base3,
- int seg, struct kvm_vcpu *vcpu);
-   u16 (*get_segment_selector)(int seg, struct kvm_vcpu *vcpu);
-   void (*set_segment_selector)(u16 sel, int seg, struct kvm_vcpu *vcpu);
-   unsigned long (*get_cached_segment_base)(int seg, struct kvm_vcpu *vcpu);
-   void (*get_gdt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
-   void (*get_idt)(struct desc_ptr *dt, struct kvm_vcpu *vcpu);
+   bool (*get_cached_descriptor)(struct x86_emulate_ctxt *ctxt,
+ struct desc_struct *desc, u32 *base3,
+ int seg);
+   void (*set_cached_descriptor)(struct x86_emulate_ctxt *ctxt,
+ struct desc_struct *desc, u32 base3,
+ int seg);
+   u16 (*get_segment_selector)(struct x86_emulate_ctxt *ctxt, int seg);
+   void (*set_segment_selector)(struct x86_emulate_ctxt *ctxt,
+u16 sel, int seg);
+   unsigned long (*get_cached_segment_base)(struct x86_emulate_ctxt *ctxt,
+int seg);
+   void (*get_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
+   void (*get_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
ulong (*get_cr)(int cr, struct kvm_vcpu *vcpu);
int (*set_cr)(int cr, ulong val, struct kvm_vcpu *vcpu);
int (*cpl)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8af08a1..d1e0a1b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -495,7 +495,7 @@ static unsigned long seg_base(struct x86_emulate_ctxt *ctxt,
if (ctxt->mode == X86EMUL_MODE_PROT64 && seg < VCPU_SREG_FS)
return 0;
 
-   return ops->get_cached_segment_base(seg, ctxt->vcpu);
+   return ops->get_cached_segment_base(ctxt, seg);
 }
 
 static unsigned seg_override(struct x86_emulate_ctxt *ctxt,
@@ -573,8 +573,8 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
return emulate_gp(ctxt, 0);
break;
default:
-   usable = ctxt->ops->get_cached_descriptor(&desc, NULL, addr.seg,
- ctxt->vcpu);
+   usable = ctxt->ops->get_cached_descriptor(ctxt, &desc, NULL,
+ addr.seg);
if (!usable)
goto bad;
/* code segment or read-only data segment */
@@ -597,7 +597,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
goto bad;
}
cpl = ctxt->ops->cpl(ctxt->vcpu);
-   rpl = ctxt->ops->get_segment_selector(addr.seg, ctxt->vcpu) & 3;
+   rpl = ctxt->ops->get_segment_selector(ctxt, addr.seg) & 3;
cpl = max(cpl, rpl);
if (!(desc.type & 8)) {
/* data segment */
@@ -1142,14 +1142,14 @@ static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt,
if (selector & 1 << 2) {
struct desc_struct desc;
memset (dt, 0, sizeof *dt);
-   if (!ops->get_cached_descriptor(&desc, NULL, VCPU_SREG_LDTR,
-   ctxt->vcpu))
+   if (!ops->get_cached_descriptor(ctxt, &desc, NULL,
+   VCPU_SREG_LDTR))
return;
 
dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? */
dt->address = get_desc_base(&desc);
} else
-   ops->get_gdt(dt, ctxt->vcpu);
+   ops->get_gdt(ctxt, dt);
 }
 
 /* allowed just for 8 bytes segments */
@@ -1304,8 +1304,8 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
return ret;
}
 load:
-   ops->set_segment_selector(selector, seg, ctxt->vcpu);
-   ops->set_cached_descriptor(&seg_des

[PATCH 12/16] KVM: x86 emulator: add new ->halt() callback

2011-04-20 Thread Avi Kivity
Instead of reaching into vcpu internals.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |2 +-
 arch/x86/kvm/x86.c |6 ++
 3 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index f890769..d30f1e9 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -186,6 +186,7 @@ struct x86_emulate_ops {
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
+   void (*halt)(struct x86_emulate_ctxt *ctxt);
void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */
void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */
int (*intercept)(struct x86_emulate_ctxt *ctxt,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6fca45f..a2a5008 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3913,7 +3913,7 @@ special_insn:
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
case 0xf4:  /* hlt */
-   ctxt->vcpu->arch.halt_request = 1;
+   ctxt->ops->halt(ctxt);
break;
case 0xf5:  /* cmc */
/* complement carry flag from eflags reg */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8af49b3..2246cf1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4351,6 +4351,11 @@ static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
return kvm_set_msr(emul_to_vcpu(ctxt), msr_index, data);
 }
 
+static void emulator_halt(struct x86_emulate_ctxt *ctxt)
+{
+   emul_to_vcpu(ctxt)->arch.halt_request = 1;
+}
+
 static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)
 {
preempt_disable();
@@ -4400,6 +4405,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_dr  = emulator_set_dr,
.set_msr = emulator_set_msr,
.get_msr = emulator_get_msr,
+   .halt= emulator_halt,
.get_fpu = emulator_get_fpu,
.put_fpu = emulator_put_fpu,
.intercept   = emulator_intercept,
-- 
1.7.1



[PATCH 06/16] KVM: x86 emulator: avoid using ctxt->vcpu in check_perm() callbacks

2011-04-20 Thread Avi Kivity
Unneeded for register access.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 55ca5a5..8020f1b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2720,7 +2720,7 @@ static int check_svme(struct x86_emulate_ctxt *ctxt)
 
 static int check_svme_pa(struct x86_emulate_ctxt *ctxt)
 {
-   u64 rax = kvm_register_read(ctxt->vcpu, VCPU_REGS_RAX);
+   u64 rax = ctxt->decode.regs[VCPU_REGS_RAX];
 
/* Valid physical address? */
if (rax & 0x)
@@ -2742,7 +2742,7 @@ static int check_rdtsc(struct x86_emulate_ctxt *ctxt)
 static int check_rdpmc(struct x86_emulate_ctxt *ctxt)
 {
u64 cr4 = ctxt->ops->get_cr(ctxt, 4);
-   u64 rcx = kvm_register_read(ctxt->vcpu, VCPU_REGS_RCX);
+   u64 rcx = ctxt->decode.regs[VCPU_REGS_RCX];
 
if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
(rcx > 3))
-- 
1.7.1



[PATCH 10/16] KVM: x86 emulator: emulate CLTS internally

2011-04-20 Thread Avi Kivity
Avoid using ctxt->vcpu; we can do everything with ->get_cr() and ->set_cr().

A side effect is that we no longer activate the fpu on emulated CLTS; but that
should be very rare.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_host.h |1 -
 arch/x86/kvm/emulate.c  |   12 +++-
 arch/x86/kvm/x86.c  |7 ---
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a8616ca..9c3567e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -691,7 +691,6 @@ int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
-int emulate_clts(struct kvm_vcpu *vcpu);
 int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dc495a0..91c4a14 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2579,6 +2579,16 @@ static int em_invlpg(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
 }
 
+static int em_clts(struct x86_emulate_ctxt *ctxt)
+{
+   ulong cr0;
+
+   cr0 = ctxt->ops->get_cr(ctxt, 0);
+   cr0 &= ~X86_CR0_TS;
+   ctxt->ops->set_cr(ctxt, 0, cr0);
+   return X86EMUL_CONTINUE;
+}
+
 static bool valid_cr(int nr)
 {
switch (nr) {
@@ -4079,7 +4089,7 @@ twobyte_insn:
rc = emulate_syscall(ctxt, ops);
break;
case 0x06:
-   emulate_clts(ctxt->vcpu);
+   rc = em_clts(ctxt);
break;
case 0x09:  /* wbinvd */
kvm_emulate_wbinvd(ctxt->vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7cd3a3b..a9e8386 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4153,13 +4153,6 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_wbinvd);
 
-int emulate_clts(struct kvm_vcpu *vcpu)
-{
-   kvm_x86_ops->set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS));
-   kvm_x86_ops->fpu_activate(vcpu);
-   return X86EMUL_CONTINUE;
-}
-
 int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long *dest)
 {
return _kvm_get_dr(emul_to_vcpu(ctxt), dr, dest);
-- 
1.7.1



[PATCH 14/16] KVM: x86 emulator: add new ->wbinvd() callback

2011-04-20 Thread Avi Kivity
Instead of calling kvm_emulate_wbinvd() directly.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |2 +-
 arch/x86/kvm/x86.c |6 ++
 3 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index d30840d..51341d6 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -187,6 +187,7 @@ struct x86_emulate_ops {
int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
void (*halt)(struct x86_emulate_ctxt *ctxt);
+   void (*wbinvd)(struct x86_emulate_ctxt *ctxt);
int (*fix_hypercall)(struct x86_emulate_ctxt *ctxt);
void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */
void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a41f406..f683ce1 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4092,7 +4092,7 @@ twobyte_insn:
rc = em_clts(ctxt);
break;
case 0x09:  /* wbinvd */
-   kvm_emulate_wbinvd(ctxt->vcpu);
+   ctxt->ops->wbinvd(ctxt);
break;
case 0x08:  /* invd */
case 0x0d:  /* GrpP (prefetch) */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4a2b40e..5d853d5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4154,6 +4154,11 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_wbinvd);
 
+static void emulator_wbinvd(struct x86_emulate_ctxt *ctxt)
+{
+   kvm_emulate_wbinvd(emul_to_vcpu(ctxt));
+}
+
 int emulator_get_dr(struct x86_emulate_ctxt *ctxt, int dr, unsigned long *dest)
 {
return _kvm_get_dr(emul_to_vcpu(ctxt), dr, dest);
@@ -4408,6 +4413,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_msr = emulator_set_msr,
.get_msr = emulator_get_msr,
.halt= emulator_halt,
+   .wbinvd  = emulator_wbinvd,
.fix_hypercall   = emulator_fix_hypercall,
.get_fpu = emulator_get_fpu,
.put_fpu = emulator_put_fpu,
-- 
1.7.1



[PATCH 16/16] KVM: x86 emulator: drop x86_emulate_ctxt::vcpu

2011-04-20 Thread Avi Kivity
No longer used.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |2 --
 arch/x86/kvm/x86.c |1 -
 2 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 51341d6..127ea3e 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -269,8 +269,6 @@ struct x86_emulate_ctxt {
struct x86_emulate_ops *ops;
 
/* Register state before/after emulation. */
-   struct kvm_vcpu *vcpu;
-
unsigned long eflags;
unsigned long eip; /* eip before instruction emulation */
/* Emulated execution mode, represented by an X86EMUL_MODE value. */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 65a5b0c..a831d5d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4463,7 +4463,6 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
 
kvm_x86_ops->get_cs_db_l_bits(vcpu, &cs_db, &cs_l);
 
-   vcpu->arch.emulate_ctxt.vcpu = vcpu;
vcpu->arch.emulate_ctxt.eflags = kvm_get_rflags(vcpu);
vcpu->arch.emulate_ctxt.eip = kvm_rip_read(vcpu);
vcpu->arch.emulate_ctxt.mode =
-- 
1.7.1



[PATCH 15/16] KVM: Avoid using x86_emulate_ctxt.vcpu

2011-04-20 Thread Avi Kivity
We can use container_of() instead.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d853d5..65a5b0c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4366,7 +4366,7 @@ static void emulator_halt(struct x86_emulate_ctxt *ctxt)
 static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)
 {
preempt_disable();
-   kvm_load_guest_fpu(ctxt->vcpu);
+   kvm_load_guest_fpu(emul_to_vcpu(ctxt));
/*
 * CR0.TS may reference the host fpu state, not the guest fpu state,
 * so it may be clear at this point.
-- 
1.7.1
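
For readers unfamiliar with the container_of() idiom the commit message
refers to, here is a minimal userspace sketch.  It assumes the emulator
context is embedded by value inside the vcpu structure; the struct names
and the ctxt_to_vcpu() helper below are hypothetical stand-ins, not the
kernel definitions, and the macro is a simplified version without the
kernel's type checking.

```c
#include <stddef.h>

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the emulator context and the vcpu. */
struct emulate_ctxt {
	unsigned long eflags;
};

struct vcpu_arch {
	int pad;
	struct emulate_ctxt emulate_ctxt;	/* embedded by value, not a pointer */
};

struct vcpu {
	int vcpu_id;
	struct vcpu_arch arch;
};

/* Recover the owning vcpu from a pointer to its embedded context. */
static struct vcpu *ctxt_to_vcpu(struct emulate_ctxt *ctxt)
{
	return container_of(ctxt, struct vcpu, arch.emulate_ctxt);
}
```

Because the offset is a compile-time constant, no back-pointer has to be
stored in the context, which is what lets the vcpu field go away.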



[PATCH 13/16] KVM: x86 emulator: add ->fix_hypercall() callback

2011-04-20 Thread Avi Kivity
Artificial, but needed to remove direct calls to KVM.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/include/asm/kvm_host.h|2 --
 arch/x86/kvm/emulate.c |4 ++--
 arch/x86/kvm/x86.c |6 +-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index d30f1e9..d30840d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -187,6 +187,7 @@ struct x86_emulate_ops {
int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
void (*halt)(struct x86_emulate_ctxt *ctxt);
+   int (*fix_hypercall)(struct x86_emulate_ctxt *ctxt);
void (*get_fpu)(struct x86_emulate_ctxt *ctxt); /* disables preempt */
void (*put_fpu)(struct x86_emulate_ctxt *ctxt); /* reenables preempt */
int (*intercept)(struct x86_emulate_ctxt *ctxt,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d957d0d..6cfc1ab 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -752,8 +752,6 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
 
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
-int kvm_fix_hypercall(struct kvm_vcpu *vcpu);
-
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code,
   void *insn, int insn_len);
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a2a5008..a41f406 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4025,7 +4025,7 @@ twobyte_insn:
if (c->modrm_mod != 3 || c->modrm_rm != 1)
goto cannot_emulate;
 
-   rc = kvm_fix_hypercall(ctxt->vcpu);
+   rc = ctxt->ops->fix_hypercall(ctxt);
if (rc != X86EMUL_CONTINUE)
goto done;
 
@@ -4048,7 +4048,7 @@ twobyte_insn:
if (c->modrm_mod == 3) {
switch (c->modrm_rm) {
case 1:
-   rc = kvm_fix_hypercall(ctxt->vcpu);
+   rc = ctxt->ops->fix_hypercall(ctxt);
break;
default:
goto cannot_emulate;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2246cf1..4a2b40e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -152,6 +152,8 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 
 u64 __read_mostly host_xcr0;
 
+int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
+
 static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
 {
int i;
@@ -4406,6 +4408,7 @@ static struct x86_emulate_ops emulate_ops = {
.set_msr = emulator_set_msr,
.get_msr = emulator_get_msr,
.halt= emulator_halt,
+   .fix_hypercall   = emulator_fix_hypercall,
.get_fpu = emulator_get_fpu,
.put_fpu = emulator_put_fpu,
.intercept   = emulator_intercept,
@@ -5042,8 +5045,9 @@ out:
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
 
-int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
+int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
 {
+   struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
char instruction[3];
unsigned long rip = kvm_rip_read(vcpu);
 
-- 
1.7.1



[PATCH 01/16] KVM: x86 emulator: drop vcpu argument from memory read/write callbacks

2011-04-20 Thread Avi Kivity
Making the emulator caller agnostic.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |   34 ++
 arch/x86/kvm/emulate.c |   54 ---
 arch/x86/kvm/x86.c |   54 ++-
 3 files changed, 75 insertions(+), 67 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 9b760c8..b4d8467 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -92,8 +92,9 @@ struct x86_emulate_ops {
 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
 *  @bytes: [IN ] Number of bytes to read from memory.
 */
-   int (*read_std)(unsigned long addr, void *val,
-   unsigned int bytes, struct kvm_vcpu *vcpu,
+   int (*read_std)(struct x86_emulate_ctxt *ctxt,
+   unsigned long addr, void *val,
+   unsigned int bytes,
struct x86_exception *fault);
 
/*
@@ -103,8 +104,8 @@ struct x86_emulate_ops {
 *  @val:   [OUT] Value write to memory, zero-extended to 'u_long'.
 *  @bytes: [IN ] Number of bytes to write to memory.
 */
-   int (*write_std)(unsigned long addr, void *val,
-unsigned int bytes, struct kvm_vcpu *vcpu,
+   int (*write_std)(struct x86_emulate_ctxt *ctxt,
+unsigned long addr, void *val, unsigned int bytes,
 struct x86_exception *fault);
/*
 * fetch: Read bytes of standard (non-emulated/special) memory.
@@ -113,8 +114,8 @@ struct x86_emulate_ops {
 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
 *  @bytes: [IN ] Number of bytes to read from memory.
 */
-   int (*fetch)(unsigned long addr, void *val,
-unsigned int bytes, struct kvm_vcpu *vcpu,
+   int (*fetch)(struct x86_emulate_ctxt *ctxt,
+unsigned long addr, void *val, unsigned int bytes,
 struct x86_exception *fault);
 
/*
@@ -123,11 +124,9 @@ struct x86_emulate_ops {
 *  @val:   [OUT] Value read from memory, zero-extended to 'u_long'.
 *  @bytes: [IN ] Number of bytes to read from memory.
 */
-   int (*read_emulated)(unsigned long addr,
-void *val,
-unsigned int bytes,
-struct x86_exception *fault,
-struct kvm_vcpu *vcpu);
+   int (*read_emulated)(struct x86_emulate_ctxt *ctxt,
+unsigned long addr, void *val, unsigned int bytes,
+struct x86_exception *fault);
 
/*
 * write_emulated: Write bytes to emulated/special memory area.
@@ -136,11 +135,10 @@ struct x86_emulate_ops {
 *required).
 *  @bytes: [IN ] Number of bytes to write to memory.
 */
-   int (*write_emulated)(unsigned long addr,
- const void *val,
+   int (*write_emulated)(struct x86_emulate_ctxt *ctxt,
+ unsigned long addr, const void *val,
  unsigned int bytes,
- struct x86_exception *fault,
- struct kvm_vcpu *vcpu);
+ struct x86_exception *fault);
 
/*
 * cmpxchg_emulated: Emulate an atomic (LOCKed) CMPXCHG operation on an
@@ -150,12 +148,12 @@ struct x86_emulate_ops {
 *  @new:   [IN ] Value to write to @addr.
 *  @bytes: [IN ] Number of bytes to access using CMPXCHG.
 */
-   int (*cmpxchg_emulated)(unsigned long addr,
+   int (*cmpxchg_emulated)(struct x86_emulate_ctxt *ctxt,
+   unsigned long addr,
const void *old,
const void *new,
unsigned int bytes,
-   struct x86_exception *fault,
-   struct kvm_vcpu *vcpu);
+   struct x86_exception *fault);
 
int (*pio_in_emulated)(int size, unsigned short port, void *val,
   unsigned int count, struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3c11703..ff64b17 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -645,8 +645,7 @@ static int segmented_read_std(struct x86_emulate_ctxt *ctxt,
rc = linearize(ctxt, addr, size, false, &linear);
if (rc != X86EMUL_CONTINUE)
return rc;
-   return ctxt->ops->read_std(linear, data, size, ctxt->vcpu,
-  &ctxt->exception);
+   return ctxt->ops->read_std(ctxt, linear, data, size, &ctxt->exception);
 }
 
 static int do_fe

[PATCH 07/16] KVM: x86 emulator: add and use new callbacks set_idt(), set_gdt()

2011-04-20 Thread Avi Kivity
Replacing direct calls to realmode_lgdt(), realmode_lidt().

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |2 ++
 arch/x86/include/asm/kvm_host.h|3 ---
 arch/x86/kvm/emulate.c |   14 +++---
 arch/x86/kvm/x86.c |   26 --
 4 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index e2b082a..4d1546a 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -176,6 +176,8 @@ struct x86_emulate_ops {
 int seg);
void (*get_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
void (*get_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
+   void (*set_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
+   void (*set_idt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt);
ulong (*get_cr)(struct x86_emulate_ctxt *ctxt, int cr);
int (*set_cr)(struct x86_emulate_ctxt *ctxt, int cr, ulong val);
int (*cpl)(struct x86_emulate_ctxt *ctxt);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e50bffc..a8616ca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -681,9 +681,6 @@ static inline int emulate_instruction(struct kvm_vcpu *vcpu,
return x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0);
 }
 
-void realmode_lgdt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-void realmode_lidt(struct kvm_vcpu *vcpu, u16 size, unsigned long address);
-
 void kvm_enable_efer_bits(u64);
 int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8020f1b..fb431f3 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3494,6 +3494,7 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
int rc = X86EMUL_CONTINUE;
int saved_dst_type = c->dst.type;
int irq; /* Used for int 3, int, and into */
+   struct desc_ptr desc_ptr;
 
ctxt->decode.mem_read.pos = 0;
 
@@ -4005,9 +4006,6 @@ twobyte_insn:
switch (c->b) {
case 0x01: /* lgdt, lidt, lmsw */
switch (c->modrm_reg) {
-   u16 size;
-   unsigned long address;
-
case 0: /* vmcall */
if (c->modrm_mod != 3 || c->modrm_rm != 1)
goto cannot_emulate;
@@ -4023,10 +4021,11 @@ twobyte_insn:
break;
case 2: /* lgdt */
rc = read_descriptor(ctxt, ops, c->src.addr.mem,
-&size, &address, c->op_bytes);
+&desc_ptr.size, &desc_ptr.address,
+c->op_bytes);
if (rc != X86EMUL_CONTINUE)
goto done;
-   realmode_lgdt(ctxt->vcpu, size, address);
+   ctxt->ops->set_gdt(ctxt, &desc_ptr);
/* Disable writeback. */
c->dst.type = OP_NONE;
break;
@@ -4041,11 +4040,12 @@ twobyte_insn:
}
} else {
rc = read_descriptor(ctxt, ops, c->src.addr.mem,
-&size, &address,
+&desc_ptr.size,
+&desc_ptr.address,
 c->op_bytes);
if (rc != X86EMUL_CONTINUE)
goto done;
-   realmode_lidt(ctxt->vcpu, size, address);
+   ctxt->ops->set_idt(ctxt, &desc_ptr);
}
/* Disable writeback. */
c->dst.type = OP_NONE;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f7248e..7cd3a3b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4249,6 +4249,16 @@ static void emulator_get_idt(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt)
kvm_x86_ops->get_idt(emul_to_vcpu(ctxt), dt);
 }
 
+static void emulator_set_gdt(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt)
+{
+   kvm_x86_ops->set_gdt(emul_to_vcpu(ctxt), dt);
+}
+
+static void emulator_set_idt(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt)
+{
+   kvm_x86_ops->set_idt(emul_to_vcpu(ctxt), dt);
+}
+
 static unsigned long emulator_get_cached_segment_base(
struct x86_emulate_ctxt *ctxt, int seg)
 {
@@ -4388,6 +4398,8 @@ static struct x86_emulate_ops emulate_ops = {
.get_cach

[PATCH 08/16] KVM: x86 emulator: drop use of is_long_mode()

2011-04-20 Thread Avi Kivity
Requires ctxt->vcpu, which is to be abolished.  Replace with open calls
to get_msr().

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |   19 ---
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index fb431f3..a4227bf 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1844,12 +1844,14 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
struct desc_struct cs, ss;
u64 msr_data;
u16 cs_sel, ss_sel;
+   u64 efer = 0;
 
/* syscall is not available in real mode */
if (ctxt->mode == X86EMUL_MODE_REAL ||
ctxt->mode == X86EMUL_MODE_VM86)
return emulate_ud(ctxt);
 
+   ops->get_msr(ctxt, MSR_EFER, &efer);
setup_syscalls_segments(ctxt, ops, &cs, &ss);
 
ops->get_msr(ctxt, MSR_STAR, &msr_data);
@@ -1857,7 +1859,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
cs_sel = (u16)(msr_data & 0xfffc);
ss_sel = (u16)(msr_data + 8);
 
-   if (is_long_mode(ctxt->vcpu)) {
+   if (efer & EFER_LMA) {
cs.d = 0;
cs.l = 1;
}
@@ -1867,7 +1869,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
ops->set_segment_selector(ctxt, ss_sel, VCPU_SREG_SS);
 
c->regs[VCPU_REGS_RCX] = c->eip;
-   if (is_long_mode(ctxt->vcpu)) {
+   if (efer & EFER_LMA) {
 #ifdef CONFIG_X86_64
c->regs[VCPU_REGS_R11] = ctxt->eflags & ~EFLG_RF;
 
@@ -1897,7 +1899,9 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
struct desc_struct cs, ss;
u64 msr_data;
u16 cs_sel, ss_sel;
+   u64 efer = 0;
 
+   ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
/* inject #GP if in real mode */
if (ctxt->mode == X86EMUL_MODE_REAL)
return emulate_gp(ctxt, 0);
@@ -1927,8 +1931,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
cs_sel &= ~SELECTOR_RPL_MASK;
ss_sel = cs_sel + 8;
ss_sel &= ~SELECTOR_RPL_MASK;
-   if (ctxt->mode == X86EMUL_MODE_PROT64
-   || is_long_mode(ctxt->vcpu)) {
+   if (ctxt->mode == X86EMUL_MODE_PROT64 || (efer & EFER_LMA)) {
cs.d = 0;
cs.l = 1;
}
@@ -2603,6 +2606,7 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
struct decode_cache *c = &ctxt->decode;
u64 new_val = c->src.val64;
int cr = c->modrm_reg;
+   u64 efer = 0;
 
static u64 cr_reserved_bits[] = {
0xULL,
@@ -2620,7 +2624,7 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
 
switch (cr) {
case 0: {
-   u64 cr4, efer;
+   u64 cr4;
if (((new_val & X86_CR0_PG) && !(new_val & X86_CR0_PE)) ||
((new_val & X86_CR0_NW) && !(new_val & X86_CR0_CD)))
return emulate_gp(ctxt, 0);
@@ -2637,7 +2641,8 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
case 3: {
u64 rsvd = 0;
 
-   if (is_long_mode(ctxt->vcpu))
+   ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
+   if (efer & EFER_LMA)
rsvd = CR3_L_MODE_RESERVED_BITS;
else if (is_pae(ctxt->vcpu))
rsvd = CR3_PAE_RESERVED_BITS;
@@ -2650,7 +2655,7 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
break;
}
case 4: {
-   u64 cr4, efer;
+   u64 cr4;
 
cr4 = ctxt->ops->get_cr(ctxt, 4);
ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
-- 
1.7.1



[PATCH 11/16] KVM: x86 emulator: make emulate_invlpg() an emulator callback

2011-04-20 Thread Avi Kivity
Removing direct calls to KVM.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/include/asm/kvm_host.h|1 -
 arch/x86/kvm/emulate.c |2 +-
 arch/x86/kvm/x86.c |6 +++---
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 4d1546a..f890769 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -154,6 +154,7 @@ struct x86_emulate_ops {
const void *new,
unsigned int bytes,
struct x86_exception *fault);
+   void (*invlpg)(struct x86_emulate_ctxt *ctxt, ulong addr);
 
int (*pio_in_emulated)(struct x86_emulate_ctxt *ctxt,
   int size, unsigned short port, void *val,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9c3567e..d957d0d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -690,7 +690,6 @@ struct x86_emulate_ctxt;
 int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port);
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
-int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address);
 int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 91c4a14..6fca45f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2573,7 +2573,7 @@ static int em_invlpg(struct x86_emulate_ctxt *ctxt)
 
rc = linearize(ctxt, c->src.addr.mem, 1, false, &linear);
if (rc == X86EMUL_CONTINUE)
-   emulate_invlpg(ctxt->vcpu, linear);
+   ctxt->ops->invlpg(ctxt, linear);
/* Disable writeback. */
c->dst.type = OP_NONE;
return X86EMUL_CONTINUE;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9e8386..8af49b3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4128,10 +4128,9 @@ static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg)
return kvm_x86_ops->get_segment_base(vcpu, seg);
 }
 
-int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address)
+static void emulator_invlpg(struct x86_emulate_ctxt *ctxt, ulong address)
 {
-   kvm_mmu_invlpg(vcpu, address);
-   return X86EMUL_CONTINUE;
+   kvm_mmu_invlpg(emul_to_vcpu(ctxt), address);
 }
 
 int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu)
@@ -4382,6 +4381,7 @@ static struct x86_emulate_ops emulate_ops = {
.read_emulated   = emulator_read_emulated,
.write_emulated  = emulator_write_emulated,
.cmpxchg_emulated= emulator_cmpxchg_emulated,
+   .invlpg  = emulator_invlpg,
.pio_in_emulated = emulator_pio_in_emulated,
.pio_out_emulated= emulator_pio_out_emulated,
.get_cached_descriptor = emulator_get_cached_descriptor,
-- 
1.7.1



[PATCH 09/16] KVM: x86 emulator: Replace calls to is_pae() and is_paging with ->get_cr()

2011-04-20 Thread Avi Kivity
Avoid use of ctxt->vcpu.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a4227bf..dc495a0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2644,9 +2644,9 @@ static int check_cr_write(struct x86_emulate_ctxt *ctxt)
ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
if (efer & EFER_LMA)
rsvd = CR3_L_MODE_RESERVED_BITS;
-   else if (is_pae(ctxt->vcpu))
+   else if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PAE)
rsvd = CR3_PAE_RESERVED_BITS;
-   else if (is_paging(ctxt->vcpu))
+   else if (ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PG)
rsvd = CR3_NONPAE_RESERVED_BITS;
 
if (new_val & rsvd)
-- 
1.7.1



[PATCH 02/16] KVM: x86 emulator: drop vcpu argument from pio callbacks

2011-04-20 Thread Avi Kivity
Making the emulator caller agnostic.

Signed-off-by: Avi Kivity 
---
 arch/x86/include/asm/kvm_emulate.h |   10 ++
 arch/x86/kvm/emulate.c |6 +++---
 arch/x86/kvm/x86.c |   18 --
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index b4d8467..1348bdf 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -155,11 +155,13 @@ struct x86_emulate_ops {
unsigned int bytes,
struct x86_exception *fault);
 
-   int (*pio_in_emulated)(int size, unsigned short port, void *val,
-  unsigned int count, struct kvm_vcpu *vcpu);
+   int (*pio_in_emulated)(struct x86_emulate_ctxt *ctxt,
+  int size, unsigned short port, void *val,
+  unsigned int count);
 
-   int (*pio_out_emulated)(int size, unsigned short port, const void *val,
-   unsigned int count, struct kvm_vcpu *vcpu);
+   int (*pio_out_emulated)(struct x86_emulate_ctxt *ctxt,
+   int size, unsigned short port, const void *val,
+   unsigned int count);
 
bool (*get_cached_descriptor)(struct desc_struct *desc, u32 *base3,
  int seg, struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ff64b17..8af08a1 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1125,7 +1125,7 @@ static int pio_in_emulated(struct x86_emulate_ctxt *ctxt,
if (n == 0)
n = 1;
rc->pos = rc->end = 0;
-   if (!ops->pio_in_emulated(size, port, rc->data, n, ctxt->vcpu))
+   if (!ops->pio_in_emulated(ctxt, size, port, rc->data, n))
return 0;
rc->end = n * size;
}
@@ -3892,8 +3892,8 @@ special_insn:
case 0xef: /* out dx,(e/r)ax */
c->dst.val = c->regs[VCPU_REGS_RDX];
do_io_out:
-   ops->pio_out_emulated(c->src.bytes, c->dst.val,
- &c->src.val, 1, ctxt->vcpu);
+   ops->pio_out_emulated(ctxt, c->src.bytes, c->dst.val,
+ &c->src.val, 1);
c->dst.type = OP_NONE;  /* Disable writeback. */
break;
case 0xf4:  /* hlt */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 274652a..e9040a9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4060,9 +4060,12 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
 }
 
 
-static int emulator_pio_in_emulated(int size, unsigned short port, void *val,
-unsigned int count, struct kvm_vcpu *vcpu)
+static int emulator_pio_in_emulated(struct x86_emulate_ctxt *ctxt,
+   int size, unsigned short port, void *val,
+   unsigned int count)
 {
+   struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+
if (vcpu->arch.pio.count)
goto data_avail;
 
@@ -4090,10 +4093,12 @@ static int emulator_pio_in_emulated(int size, unsigned short port, void *val,
return 0;
 }
 
-static int emulator_pio_out_emulated(int size, unsigned short port,
- const void *val, unsigned int count,
- struct kvm_vcpu *vcpu)
+static int emulator_pio_out_emulated(struct x86_emulate_ctxt *ctxt,
+int size, unsigned short port,
+const void *val, unsigned int count)
 {
+   struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+
trace_kvm_pio(1, port, size, count);
 
vcpu->arch.pio.port = port;
@@ -4614,7 +4619,8 @@ EXPORT_SYMBOL_GPL(x86_emulate_instruction);
 int kvm_fast_pio_out(struct kvm_vcpu *vcpu, int size, unsigned short port)
 {
unsigned long val = kvm_register_read(vcpu, VCPU_REGS_RAX);
-   int ret = emulator_pio_out_emulated(size, port, &val, 1, vcpu);
+   int ret = emulator_pio_out_emulated(&vcpu->arch.emulate_ctxt,
+   size, port, &val, 1);
/* do not return to emulator after return from userspace */
vcpu->arch.pio.count = 0;
return ret;
-- 
1.7.1



[PATCH 00/16] Decouple the x86 emulator from the rest of kvm

2011-04-20 Thread Avi Kivity
This (longer than expected) patchset decouples the x86 emulator from the rest
of kvm.  All communication is now done via x86_emulate_ctxt fields and
callbacks; there is no access to ctxt->vcpu (which is eliminated by the last
patch).

Avi Kivity (16):
  KVM: x86 emulator: drop vcpu argument from memory read/write
callbacks
  KVM: x86 emulator: drop vcpu argument from pio callbacks
  KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks
  KVM: x86 emulator: drop vcpu argument from cr/dr/cpl/msr callbacks
  KVM: x86 emulator: drop vcpu argument from intercept callback
  KVM: x86 emulator: avoid using ctxt->vcpu in check_perm() callbacks
  KVM: x86 emulator: add and use new callbacks set_idt(), set_gdt()
  KVM: x86 emulator: drop use of is_long_mode()
  KVM: x86 emulator: Replace calls to is_pae() and is_paging with
->get_cr()
  KVM: x86 emulator: emulate CLTS internally
  KVM: x86 emulator: make emulate_invlpg() an emulator callback
  KVM: x86 emulator: add new ->halt() callback
  KVM: x86 emulator: add ->fix_hypercall() callback
  KVM: x86 emulator: add new ->wbinvd() callback
  KVM: Avoid using x86_emulate_ctxt.vcpu
  KVM: x86 emulator: drop x86_emulate_ctxt::vcpu

 arch/x86/include/asm/kvm_emulate.h |   90 ++-
 arch/x86/include/asm/kvm_host.h|7 -
 arch/x86/kvm/emulate.c |  321 +++-
 arch/x86/kvm/x86.c |  203 ++-
 4 files changed, 338 insertions(+), 283 deletions(-)
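
One consequence of routing everything through the context and its
callbacks is that emulator code can be driven by a caller that is not
KVM at all, for instance a small test harness.  The sketch below
illustrates that idea only; every name in it (emu_ctxt, emu_ops,
fake_cr, emu_clts) is hypothetical and much simpler than the real
x86_emulate_ctxt and x86_emulate_ops.

```c
/* Hypothetical, cut-down context and callback table. */
struct emu_ctxt;

struct emu_ops {
	unsigned long (*get_cr)(struct emu_ctxt *ctxt, int cr);
	int (*set_cr)(struct emu_ctxt *ctxt, int cr, unsigned long val);
};

struct emu_ctxt {
	const struct emu_ops *ops;
};

/* A fake backend: control registers live in a plain array, no vcpu involved. */
static unsigned long fake_cr[5];

static unsigned long fake_get_cr(struct emu_ctxt *ctxt, int cr)
{
	(void)ctxt;
	return fake_cr[cr];
}

static int fake_set_cr(struct emu_ctxt *ctxt, int cr, unsigned long val)
{
	(void)ctxt;
	fake_cr[cr] = val;
	return 0;
}

static const struct emu_ops fake_ops = {
	.get_cr = fake_get_cr,
	.set_cr = fake_set_cr,
};

#define SK_CR0_TS (1UL << 3)	/* CR0.TS is architecturally bit 3 */

/* "Emulator" side: clears CR0.TS using nothing but the callbacks. */
static int emu_clts(struct emu_ctxt *ctxt)
{
	unsigned long cr0 = ctxt->ops->get_cr(ctxt, 0);

	cr0 &= ~SK_CR0_TS;
	return ctxt->ops->set_cr(ctxt, 0, cr0);
}
```

A harness would point ctxt.ops at fake_ops, preload fake_cr[0], call
emu_clts() and inspect the array afterwards; emu_clts() itself never
learns who is on the other side of the callbacks.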



Re: [PATCH v2] KVM: emulator: Use linearize() when fetching instructions.

2011-04-20 Thread Avi Kivity

On 04/18/2011 07:05 PM, Nelson Elhage wrote:
> Since segments need to be handled slightly differently when fetching
> instructions, we add a __linearize helper that accepts a new 'fetch' boolean.
>
>   static int segmented_read_std(struct x86_emulate_ctxt *ctxt,
>   struct segmented_address addr,
>   void *data,
> @@ -637,11 +646,13 @@ static int do_fetch_insn_byte(struct x86_emulate_ctxt *ctxt,
> int size, cur_size;
>
> if (eip == fc->end) {
> -   unsigned long linear = eip + ctxt->cs_base;
> -   if (ctxt->mode != X86EMUL_MODE_PROT64)
> -   linear &= (u32)-1;
> +   unsigned long linear;
> +   struct segmented_address addr = {VCPU_SREG_CS, eip};
> cur_size = fc->end - fc->start;
> size = min(15UL - cur_size, PAGE_SIZE - offset_in_page(eip));


Breaks immediately - the segmented_address initializer is backwards.  
I've fixed this in my tree.
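
A positional initializer like the one quoted above silently depends on
member order.  As a minimal sketch, assuming (as the remark implies) that
the struct declares the effective address before the segment, designated
initializers make the assignment independent of that order.  The struct
and constant below are hypothetical stand-ins, not the kernel's
definitions.

```c
/* Hypothetical stand-in: effective address first, segment second. */
struct seg_addr {
	unsigned long ea;
	unsigned seg;
};

enum { SK_SREG_CS = 1 };	/* illustrative segment index */

/* Build the address of the next instruction byte in the code segment. */
static struct seg_addr make_fetch_addr(unsigned long eip)
{
	/* Named members: still correct if the struct is ever reordered. */
	struct seg_addr addr = { .seg = SK_SREG_CS, .ea = eip };

	return addr;
}
```

With the positional form { SK_SREG_CS, eip } against this layout, the
segment index would land in ea and the instruction pointer in seg, and
the compiler would accept it without complaint since both are integers.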


--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte

2011-04-20 Thread Avi Kivity

On 04/20/2011 02:06 PM, Roedel, Joerg wrote:
> On Wed, Apr 20, 2011 at 06:05:08AM -0400, Avi Kivity wrote:
> >  On 04/20/2011 12:35 PM, Roedel, Joerg wrote:
>
> >  >  This patch seems only to introduce another wrapper around
> >  >  kvm_read_guest_page_mmu(), so I don't see a problem in this patch.
> >
> >  By patch 3, ptep_user will be computed in this function and no longer
> >  available for setting the accessed bit later on.
> >
> >  >  The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or
> >  >  l2-gfn (by calling mmu->translate_gpa).
> >
> >  But cmpxchg_gpte() does not.
>
> You are right, cmpxchg_gpte needs to handle this too. But the bug is not
> introduced with this patch-set it was there before.

Correct.  The reason I don't want the helper, is so we can use ptep_user 
in both places (not for efficiency, just to make sure it's exactly the 
same value).

> The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a
> fix soon.


Thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte

2011-04-20 Thread Roedel, Joerg
On Wed, Apr 20, 2011 at 06:05:08AM -0400, Avi Kivity wrote:
> On 04/20/2011 12:35 PM, Roedel, Joerg wrote:

> > This patch seems only to introduce another wrapper around
> > kvm_read_guest_page_mmu(), so I don't see a problem in this patch.
> 
> By patch 3, ptep_user will be computed in this function and no longer 
> available for setting the accessed bit later on.
> 
> > The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or
> > l2-gfn (by calling mmu->translate_gpa).
> 
> But cmpxchg_gpte() does not.

You are right, cmpxchg_gpte needs to handle this too. But the bug is not
introduced with this patch-set; it was there before.
The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a
fix soon.

Regards,

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632



Re: 2.6.38.1 general protection fault

2011-04-20 Thread Tomasz Chmielewski

On 20.04.2011 11:28, Thomas Treutner wrote:
> On 03/28/2011 10:14 PM, Tomasz Chmielewski wrote:
> > On 28.03.2011 22:04, Andrea Arcangeli wrote:
> >
> > > Tomasz, how easily can you reproduce?
> >
> > Well, this server runs 10 VMs or so, and it happens after 1-2 days of
> > uptime.
> >
> > I reverted now to a 2.6.35.x, as it had enough downtime with 2.6.38
> > already ;) so I'd rather not experiment anymore for some time with a
> > kernel known to cause problems.
>
> Tomasz, to which exact kernel version (host+guests) did you switch and
> is it now stable?

I've switched the host to the latest 2.6.35.x and it's stable.

The guest kernel doesn't seem to make a difference here, but the majority
of them are running a 2.6.38.x kernel (I had some weird issues with
"events/0" taking 100% CPU on guests when I used 2.6.35, which made the
guests crawl).



--
Tomasz Chmielewski
http://wpkg.org


[PATCH] KVM: x86 emulator: whitespace cleanups

2011-04-20 Thread Avi Kivity
Clean up lines longer than 80 columns.  No code changes.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/emulate.c |   96 +++-
 1 files changed, 54 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 88c1f7a..4986e1b 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -262,42 +262,42 @@ struct gprefix {
 "w", "r", _LO32, "r", "", "r")
 
 /* Instruction has three operands and one operand is stored in ECX register */
-#define __emulate_2op_cl(_op, _cl, _src, _dst, _eflags, _suffix, _type) \
-   do { \
-   unsigned long _tmp; \
-   _type _clv  = (_cl).val; \
-   _type _srcv = (_src).val; \
-   _type _dstv = (_dst).val; \
-   \
-   __asm__ __volatile__ ( \
-   _PRE_EFLAGS("0", "5", "2") \
-   _op _suffix " %4,%1 \n" \
-   _POST_EFLAGS("0", "5", "2") \
-   : "=m" (_eflags), "+r" (_dstv), "=&r" (_tmp) \
-   : "c" (_clv) , "r" (_srcv), "i" (EFLAGS_MASK) \
-   ); \
-   \
-   (_cl).val  = (unsigned long) _clv; \
-   (_src).val = (unsigned long) _srcv; \
-   (_dst).val = (unsigned long) _dstv; \
+#define __emulate_2op_cl(_op, _cl, _src, _dst, _eflags, _suffix, _type) \
+   do {\
+   unsigned long _tmp; \
+   _type _clv  = (_cl).val;\
+   _type _srcv = (_src).val;   \
+   _type _dstv = (_dst).val;   \
+   \
+   __asm__ __volatile__ (  \
+   _PRE_EFLAGS("0", "5", "2")  \
+   _op _suffix " %4,%1 \n" \
+   _POST_EFLAGS("0", "5", "2") \
+   : "=m" (_eflags), "+r" (_dstv), "=&r" (_tmp)\
+   : "c" (_clv) , "r" (_srcv), "i" (EFLAGS_MASK)   \
+   );  \
+   \
+   (_cl).val  = (unsigned long) _clv;  \
+   (_src).val = (unsigned long) _srcv; \
+   (_dst).val = (unsigned long) _dstv; \
} while (0)
 
-#define emulate_2op_cl(_op, _cl, _src, _dst, _eflags) \
-   do { \
-   switch ((_dst).bytes) { \
-   case 2: \
-   __emulate_2op_cl(_op, _cl, _src, _dst, _eflags, \
-   "w", unsigned short); \
-   break; \
-   case 4: \
-   __emulate_2op_cl(_op, _cl, _src, _dst, _eflags, \
-   "l", unsigned int); \
-   break; \
-   case 8: \
-   ON64(__emulate_2op_cl(_op, _cl, _src, _dst, _eflags, \
-   "q", unsigned long)); \
-   break; \
-   } \
+#define emulate_2op_cl(_op, _cl, _src, _dst, _eflags)  \
+   do {\
+   switch ((_dst).bytes) { \
+   case 2: 

Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte

2011-04-20 Thread Avi Kivity

On 04/20/2011 12:35 PM, Roedel, Joerg wrote:

On Wed, Apr 20, 2011 at 05:07:12AM -0400, Avi Kivity wrote:
>  On 04/18/2011 09:34 PM, Takuya Yoshikawa wrote:
>  >  From: Takuya Yoshikawa
>  >
>  >  This will be optimized later.
>  >
>  >  Signed-off-by: Takuya Yoshikawa
>  >  ---
>  >arch/x86/kvm/paging_tmpl.h |   12 +---
>  >1 files changed, 9 insertions(+), 3 deletions(-)
>  >
>  >  diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>  >  index 74f8567..109939a 100644
>  >  --- a/arch/x86/kvm/paging_tmpl.h
>  >  +++ b/arch/x86/kvm/paging_tmpl.h
>  >  @@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
>  >  return access;
>  >}
>  >
>  >  +static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
>  >  +gfn_t table_gfn, int offset, pt_element_t *ptep)
>  >  +{
>  >  +   return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep,
>  >  +  offset, sizeof(*ptep),
>  >  +  PFERR_USER_MASK | PFERR_WRITE_MASK);
>  >  +}
>  >  +
>  >/*
>  > * Fetch a guest pte for a guest virtual address
>  > */
>  >  @@ -160,9 +168,7 @@ walk:
>  >  walker->table_gfn[walker->level - 1] = table_gfn;
>  >  walker->pte_gpa[walker->level - 1] = pte_gpa;
>  >
>  >  -   if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn,&pte,
>  >  -   offset, sizeof(pte),
>  >  -   PFERR_USER_MASK|PFERR_WRITE_MASK)) {
>  >  +   if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) {
>  >  present = false;
>  >  break;
>  >  }
>
>
>  I think it's better to avoid a separate function for this.  The reason
>  is I'd like to use ptep_user for cmpxchg_gpte() later on in
>  walk_addr_generic(), so we use the same calculation for both read and
>  write.  So please just inline the new code in walk_addr_generic().
>
>  In fact there's probably a bug there for nested npt - we use
>  gfn_to_page(table_gfn), but table_gfn is actually an ngfn, not a gfn.
>  Joerg, am I right here?

This patch seems only to introduce another wrapper around
kvm_read_guest_page_mmu(), so I don't see a problem in this patch.


By patch 3, ptep_user will be computed in this function and no longer 
available for setting the accessed bit later on.



The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or
l2-gfn (by calling mmu->translate_gpa).


But cmpxchg_gpte() does not.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/4] qemu-kvm: Sort out upstream merge regressions

2011-04-20 Thread Avi Kivity

On 04/18/2011 12:26 PM, Jan Kiszka wrote:
> The recent merge with upstream left some corners of qemu-kvm broken.
> This series addresses those I've spotted based on my merge experiments
> in the past months.

Applied all, thanks.

--
error compiling committee.c: too many arguments to function



Re: 2.6.38.1 general protection fault

2011-04-20 Thread Thomas Treutner

On 03/28/2011 10:14 PM, Tomasz Chmielewski wrote:
> On 28.03.2011 22:04, Andrea Arcangeli wrote:
>
> > Tomasz, how easily can you reproduce?
>
> Well, this server runs 10 VMs or so, and it happens after 1-2 days of
> uptime.
>
> I reverted now to a 2.6.35.x, as it had enough downtime with 2.6.38
> already ;) so I'd rather not experiment anymore for some time with a
> kernel known to cause problems.

Tomasz, to which exact kernel version (host+guests) did you switch and
is it now stable?


thanks, -t


Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte

2011-04-20 Thread Roedel, Joerg
On Wed, Apr 20, 2011 at 05:07:12AM -0400, Avi Kivity wrote:
> On 04/18/2011 09:34 PM, Takuya Yoshikawa wrote:
> > From: Takuya Yoshikawa
> >
> > This will be optimized later.
> >
> > Signed-off-by: Takuya Yoshikawa
> > ---
> >   arch/x86/kvm/paging_tmpl.h |   12 +---
> >   1 files changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> > index 74f8567..109939a 100644
> > --- a/arch/x86/kvm/paging_tmpl.h
> > +++ b/arch/x86/kvm/paging_tmpl.h
> > @@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
> > return access;
> >   }
> >
> > +static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> > +gfn_t table_gfn, int offset, pt_element_t *ptep)
> > +{
> > +   return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep,
> > +  offset, sizeof(*ptep),
> > +  PFERR_USER_MASK | PFERR_WRITE_MASK);
> > +}
> > +
> >   /*
> >* Fetch a guest pte for a guest virtual address
> >*/
> > @@ -160,9 +168,7 @@ walk:
> > walker->table_gfn[walker->level - 1] = table_gfn;
> > walker->pte_gpa[walker->level - 1] = pte_gpa;
> >
> > -   if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn,&pte,
> > -   offset, sizeof(pte),
> > -   PFERR_USER_MASK|PFERR_WRITE_MASK)) {
> > +   if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) {
> > present = false;
> > break;
> > }
> 
> 
> I think it's better to avoid a separate function for this.  The reason 
> is I'd like to use ptep_user for cmpxchg_gpte() later on in 
> walk_addr_generic(), so we use the same calculation for both read and 
> write.  So please just inline the new code in walk_addr_generic().
> 
> In fact there's probably a bug there for nested npt - we use 
> gfn_to_page(table_gfn), but table_gfn is actually an ngfn, not a gfn.  
> Joerg, am I right here?

This patch seems only to introduce another wrapper around
kvm_read_guest_page_mmu(), so I don't see a problem in this patch.

The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or
l2-gfn (by calling mmu->translate_gpa).

Regards,

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632



Re: [PATCH v2] KVM: emulator: Use linearize() when fetching instructions.

2011-04-20 Thread Avi Kivity

On 04/18/2011 07:05 PM, Nelson Elhage wrote:
> Since segments need to be handled slightly differently when fetching
> instructions, we add a __linearize helper that accepts a new 'fetch' boolean.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk

2011-04-20 Thread Avi Kivity
On 04/19/2011 06:47 AM, Takuya Yoshikawa wrote:
> So if a certain algorithm seems likely to be adopted, yes, I will test
> based on that.  IIRC, no practically good algorithm has been found yet,
> right?

I think a simple sort based on size will provide the same optimization
(just the cache, not get_user()) without any downsides. Most memory in a
guest is usually in just one or two slots, that's the reason for the
high hit rate.
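
A rough userspace sketch of that idea, assuming a plain array of slots searched linearly; the struct and function names are made up for illustration and are not the KVM memslot API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model of a memory slot: a contiguous range of guest frame numbers. */
struct slot {
	unsigned long base_gfn;
	unsigned long npages;
};

/* Order slots from largest to smallest. */
static int cmp_slot_size_desc(const void *a, const void *b)
{
	const struct slot *sa = a, *sb = b;

	if (sa->npages > sb->npages)
		return -1;
	if (sa->npages < sb->npages)
		return 1;
	return 0;
}

static void sort_slots(struct slot *slots, size_t n)
{
	qsort(slots, n, sizeof(*slots), cmp_slot_size_desc);
}

/*
 * Linear search over the sorted array.  Returns the index of the slot
 * containing gfn, or -1 if none does.  With the largest slots first,
 * lookups into the big slots terminate after one or two comparisons.
 */
static int find_slot(const struct slot *slots, size_t n, unsigned long gfn)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (gfn >= slots[i].base_gfn &&
		    gfn < slots[i].base_gfn + slots[i].npages)
			return (int)i;
	}
	return -1;
}
```

The sort only has to run when the slot layout changes, so the lookup path itself carries no extra state, unlike a per-walk cache.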

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte

2011-04-20 Thread Avi Kivity

On 04/18/2011 09:34 PM, Takuya Yoshikawa wrote:

From: Takuya Yoshikawa

This will be optimized later.

Signed-off-by: Takuya Yoshikawa
---
  arch/x86/kvm/paging_tmpl.h |   12 +---
  1 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 74f8567..109939a 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
return access;
  }

+static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
+gfn_t table_gfn, int offset, pt_element_t *ptep)
+{
+   return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep,
+  offset, sizeof(*ptep),
+  PFERR_USER_MASK | PFERR_WRITE_MASK);
+}
+
  /*
   * Fetch a guest pte for a guest virtual address
   */
@@ -160,9 +168,7 @@ walk:
walker->table_gfn[walker->level - 1] = table_gfn;
walker->pte_gpa[walker->level - 1] = pte_gpa;

-   if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn,&pte,
-   offset, sizeof(pte),
-   PFERR_USER_MASK|PFERR_WRITE_MASK)) {
+   if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) {
present = false;
break;
}



I think it's better to avoid a separate function for this.  The reason 
is I'd like to use ptep_user for cmpxchg_gpte() later on in 
walk_addr_generic(), so we use the same calculation for both read and 
write.  So please just inline the new code in walk_addr_generic().


In fact there's probably a bug there for nested npt - we use 
gfn_to_page(table_gfn), but table_gfn is actually an ngfn, not a gfn.  
Joerg, am I right here?
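
The suspected mismatch can be illustrated with a toy model, assuming a read path that translates the nested frame number first and a write path that does not. Everything below (table contents, function names) is hypothetical and only mirrors the shape of the problem:

```c
#include <assert.h>

#define UNMAPPED ((unsigned long)-1)

/* Toy nested mapping: L2 guest frame number -> L1 guest frame number. */
static unsigned long translate_ngfn(unsigned long ngfn)
{
	static const unsigned long l2_to_l1[] = { 7, 3, 5 };

	if (ngfn >= sizeof(l2_to_l1) / sizeof(l2_to_l1[0]))
		return UNMAPPED;
	return l2_to_l1[ngfn];
}

/* Toy L1 memory: one word per guest frame, indexed by L1 gfn. */
static unsigned long l1_mem[8] = { 0, 10, 20, 30, 40, 50, 60, 70 };

/* Read path: translates the nested gfn before touching memory. */
static int read_translated(unsigned long table_gfn, unsigned long *val)
{
	unsigned long gfn = translate_ngfn(table_gfn);

	if (gfn == UNMAPPED)
		return -1;
	*val = l1_mem[gfn];
	return 0;
}

/* Buggy write path: uses table_gfn directly as if it were an L1 gfn. */
static int write_untranslated(unsigned long table_gfn, unsigned long val)
{
	if (table_gfn >= 8)
		return -1;
	l1_mem[table_gfn] = val;
	return 0;
}
```

In this model a write through the untranslated path lands in a different frame than the one the read path sees, which is the reason for wanting a single address calculation shared by both.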


--
error compiling committee.c: too many arguments to function



Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk

2011-04-20 Thread Avi Kivity

On 04/18/2011 09:38 PM, Takuya Yoshikawa wrote:

From: Takuya Yoshikawa

We optimize multi level guest page table walk as follows:

   1. We cache the memslot which, probably, includes the next guest page
  tables to avoid searching for it many times.
   2. We use get_user() instead of copy_from_user().

Note that this is kind of a restricted way of Xiao's more generic
work: "KVM: optimize memslots searching and cache GPN to GFN."


Good optimization.  copy_from_user() really isn't optimized for short 
buffers, I expect much of the improvement comes from that.



+/*
+ * Read the guest PTE referred to by table_gfn and offset and put it into ptep.
+ *
+ * *slot_hint, if not NULL, should point to a memslot which probably includes
+ * the guest PTE.  The actual memslot will be put back into this so that
+ * callers can cache it.
+ */


Please drop the slot_hint optimization.  First, it belongs in a separate 
patch.  Second, I prefer to see a generic slot sort instead of an ad-hoc 
cache.



  static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-gfn_t table_gfn, int offset, pt_element_t *ptep)
+gfn_t table_gfn, int offset, pt_element_t *ptep,
+struct kvm_memory_slot **slot_hint)
  {
-   return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep,
-  offset, sizeof(*ptep),
-  PFERR_USER_MASK | PFERR_WRITE_MASK);
+   unsigned long addr;
+   pt_element_t __user *ptep_user;
+   gfn_t real_gfn;
+
+   real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
+ PFERR_USER_MASK | PFERR_WRITE_MASK);
+   if (real_gfn == UNMAPPED_GVA)
+   return -EFAULT;
+
+   real_gfn = gpa_to_gfn(real_gfn);
+
+   if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn))
+   *slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn);
+
+   addr = gfn_to_hva_memslot(*slot_hint, real_gfn);
+   if (kvm_is_error_hva(addr))
+   return -EFAULT;
+
+   ptep_user = (pt_element_t __user *)((void *)addr + offset);
+   return get_user(*ptep, ptep_user);
  }

  /*
@@ -130,6 +155,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
gpa_t pte_gpa;
bool eperm, present, rsvd_fault;
int offset, write_fault, user_fault, fetch_fault;
+   struct kvm_memory_slot *slot_cache = NULL;

write_fault = access & PFERR_WRITE_MASK;
user_fault = access & PFERR_USER_MASK;
@@ -168,7 +194,8 @@ walk:
walker->table_gfn[walker->level - 1] = table_gfn;
walker->pte_gpa[walker->level - 1] = pte_gpa;

-   if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) {
+   if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn,
+ offset,&pte,&slot_cache)) {
present = false;
break;
}



--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/4] kvm tools: Add debug feature to test the IO thread

2011-04-20 Thread Ingo Molnar

* Pekka Enberg  wrote:

> Sorry for the bikeshedding but wouldn't it be better to follow Git's lead and 
> have something like
> 
>   kvm config MyInstance-1 --set debug.io.delay.ms 100
> 
> and
> 
>   kvm config MyInstance-1 --list

Yeah, agreed - 'kvm config' is intuitive. I tried to think of something better 
than 'kvm set' but failed.

 ( And no, being super diligent with high level, very user visible changes and
   names is not bikeshed painting. )

Note that 'git config' touches the .gitconfig IIRC - while here we really also 
want to include runtime, dynamic configuration - but i think that distinction 
is fine.

Now the whole 'kvm config' thing needs more thought and the whole enumeration 
of KVM instances needs to be well thought out as well. How do we list instances 
- 'kvm list' - or should perhaps 'kvm config' list all the currently running 
instances?

Thanks,

Ingo


Re: [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)

2011-04-20 Thread Krishna Kumar2
Thanks Jason!

So I can use my virtio-net guest driver and test with this patch?
Please provide the script you use to start MQ guest.

Regards,

- KK

Jason Wang  wrote on 04/20/2011 02:03:07 PM:

> Jason Wang 
> 04/20/2011 02:03 PM
>
> To
>
> Krishna Kumar2/India/IBM@IBMIN, kvm@vger.kernel.org, m...@redhat.com,
> net...@vger.kernel.org, ru...@rustcorp.com.au, qemu-
> de...@nongnu.org, anth...@codemonkey.ws
>
> cc
>
> Subject
>
> [RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)
>
> Inspired by Krishna's patch
> (http://www.spinics.net/lists/kvm/msg52098.html) and Michael's
> suggestions, the following series adds multiqueue support to qemu and
> enables it for virtio-net (both userspace and vhost).
>
> The aim of this series is to simplify the management and achieve the
> same performance with less code.
>
> The differences between this series and Krishna's are as follows:
>
> - Add multiqueue support for qemu and also for userspace virtio-net.
> - Instead of hacking the vhost module to manipulate kthreads, this series
> just implements userspace-based multiqueue and can thus reuse the
> existing vhost kernel-side code without any modification.
> - Use a 1:1 mapping between TX/RX pairs and vhost kthreads because the
> implementation is based on userspace.
> - The CLI is also changed to make management easier: the -netdev option
> of qdev can now accept more than one id. You can start a multiqueue
> virtio-net device through:
> ./qemu-system-x86_64 -netdev tap,id=hn0,vhost=on,fd=X -netdev
> tap,id=hn1,vhost=on,fd=Y -device virtio-net-pci,netdev=hn0#hn1,queues=2 ...
>
> The series is very primitive and still needs polishing.
>
> Suggestions are welcome.
> ---
>
> Jason Wang (2):
>   net: Add multiqueue support
>   virtio-net: add multiqueue support
>
>
>  hw/qdev-properties.c |   37 -
>  hw/qdev.h|3
>  hw/vhost.c   |   26 ++-
>  hw/vhost.h   |1
>  hw/vhost_net.c   |7 +
>  hw/vhost_net.h   |2
>  hw/virtio-net.c  |  409 +++
> +--
>  hw/virtio-net.h  |2
>  hw/virtio-pci.c  |1
>  hw/virtio.h  |1
>  net.c|   34 +++-
>  net.h|   15 +-
>  12 files changed, 353 insertions(+), 185 deletions(-)
>
> --
> Jason Wang



[RFC PATCH 2/2] virtio-net: add multiqueue support

2011-04-20 Thread Jason Wang
This patch adds multiqueue capability to virtio-net for both userspace and
vhost. With this patch, the kernel-side vhost code can be reused without
modification to support multiqueue virtio-net nics.
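
The index handling in the vhost.c hunks below can be sketched as follows. This assumes that every queue pair owns two consecutive virtqueues and that its start_idx is a multiple of two; the virtio-net side that chooses start_idx is not shown in this message, so the helper names and that layout are assumptions:

```c
#include <assert.h>

#define VQS_PER_PAIR 2   /* one RX and one TX ring per queue pair */

/* First global virtqueue index owned by queue pair 'pair' (assumed layout). */
static int pair_start_idx(int pair)
{
	return pair * VQS_PER_PAIR;
}

/* Ring index as seen by the vhost backend of the owning pair. */
static int vhost_local_index(int global_idx)
{
	return global_idx % VQS_PER_PAIR;
}
```

Under that assumption each vhost backend always sees ring indexes 0 and 1, which is what lets the kernel side stay unchanged.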

Signed-off-by: Jason Wang 
---
 hw/vhost.c  |   26 ++-
 hw/vhost.h  |1 
 hw/vhost_net.c  |7 +
 hw/vhost_net.h  |2 
 hw/virtio-net.c |  409 +++
 hw/virtio-net.h |2 
 hw/virtio-pci.c |1 
 hw/virtio.h |1 
 8 files changed, 284 insertions(+), 165 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 14b571d..2301d53 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -450,10 +450,10 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
 target_phys_addr_t s, l, a;
 int r;
 struct vhost_vring_file file = {
-.index = idx,
+.index = idx % dev->nvqs,
 };
 struct vhost_vring_state state = {
-.index = idx,
+.index = idx % dev->nvqs,
 };
 struct VirtQueue *vvq = virtio_get_queue(vdev, idx);
 
@@ -504,12 +504,13 @@ static int vhost_virtqueue_init(struct vhost_dev *dev,
 goto fail_alloc_ring;
 }
 
-r = vhost_virtqueue_set_addr(dev, vq, idx, dev->log_enabled);
+r = vhost_virtqueue_set_addr(dev, vq, idx % dev->nvqs, dev->log_enabled);
 if (r < 0) {
 r = -errno;
 goto fail_alloc;
 }
 r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, true);
+
 if (r < 0) {
 fprintf(stderr, "Error binding host notifier: %d\n", -r);
 goto fail_host_notifier;
@@ -557,7 +558,7 @@ static void vhost_virtqueue_cleanup(struct vhost_dev *dev,
 unsigned idx)
 {
 struct vhost_vring_state state = {
-.index = idx,
+.index = idx % dev->nvqs,
 };
 int r;
 r = vdev->binding->set_host_notifier(vdev->binding_opaque, idx, false);
@@ -648,10 +649,13 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 goto fail;
 }
 
-r = vdev->binding->set_guest_notifiers(vdev->binding_opaque, true);
-if (r < 0) {
-fprintf(stderr, "Error binding guest notifier: %d\n", -r);
-goto fail_notifiers;
+for (i = 0; i < hdev->nvqs; i++) {
+r = vdev->binding->set_guest_notifier(vdev->binding_opaque,
+  hdev->start_idx + i, true);
+if (r < 0) {
+fprintf(stderr, "Error binding guest notifier: %d\n", -r);
+goto fail_notifiers;
+}
 }
 
 r = vhost_dev_set_features(hdev, hdev->log_enabled);
@@ -667,7 +671,7 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 r = vhost_virtqueue_init(hdev,
  vdev,
  hdev->vqs + i,
- i);
+ hdev->start_idx + i);
 if (r < 0) {
 goto fail_vq;
 }
@@ -694,7 +698,7 @@ fail_vq:
 vhost_virtqueue_cleanup(hdev,
 vdev,
 hdev->vqs + i,
-i);
+hdev->start_idx + i);
 }
 fail_mem:
 fail_features:
@@ -712,7 +716,7 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
 vhost_virtqueue_cleanup(hdev,
 vdev,
 hdev->vqs + i,
-i);
+hdev->start_idx + i);
 }
 vhost_client_sync_dirty_bitmap(&hdev->client, 0,
(target_phys_addr_t)~0x0ull);
diff --git a/hw/vhost.h b/hw/vhost.h
index c8c595a..48b9478 100644
--- a/hw/vhost.h
+++ b/hw/vhost.h
@@ -31,6 +31,7 @@ struct vhost_dev {
 struct vhost_memory *mem;
 struct vhost_virtqueue *vqs;
 int nvqs;
+int start_idx;
 unsigned long long features;
 unsigned long long acked_features;
 unsigned long long backend_features;
diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index 420e05f..7fc87f8 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -128,7 +128,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
 }
 
 int vhost_net_start(struct vhost_net *net,
-VirtIODevice *dev)
+VirtIODevice *dev,
+int start_idx)
 {
 struct vhost_vring_file file = { };
 int r;
@@ -139,6 +140,7 @@ int vhost_net_start(struct vhost_net *net,
 
 net->dev.nvqs = 2;
 net->dev.vqs = net->vqs;
+net->dev.start_idx = start_idx;
 r = vhost_dev_start(&net->dev, dev);
 if (r < 0) {
 return r;
@@ -206,7 +208,8 @@ bool vhost_net_query(VHostNetState *net, VirtIODevice *dev)
 }
 
 int vhost_net_start(struct vhost_net *net,
-   VirtIODevice *dev)
+VirtIODevice *dev,
+int start_idx)
 {
 return -ENOSYS;
 }
diff --git a/hw/vhost_net.h b/hw/vhost_net.h
index 91

[RFC PATCH 1/2] net: Add multiqueue support

2011-04-20 Thread Jason Wang
This patch adds multiqueue support for emulated nics. Each pair of
VLANClientStates is now abstracted as a queue instead of a nic, and multiple
VLANClientState pointers are stored in the NICState and treated as the multiple
queues of a single nic. The netdev option of qdev is expanded to accept more
than one netdev id. A queue_index is also introduced to let the emulated nic
know which queue a packet came from or is sent out on. Virtio-net will be the
first user.

Legacy single-queue nics can still run without modification, as
compatibility is kept.
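
For illustration, splitting a '#'-separated id list such as "hn0#hn1" can be sketched in standalone C. This is not the parse_netdev() code from the patch; the function name and the MAX_IDS/MAX_IDLEN limits are hypothetical, and fixed-size buffers are used here so that nothing has to be freed:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDS   16   /* hypothetical limits for this sketch */
#define MAX_IDLEN 32

/*
 * Split "hn0#hn1" into individual ids.  Returns the number of ids stored,
 * or -1 if an id is empty, too long, or there are too many of them.
 */
static int split_netdev_ids(const char *str, char ids[][MAX_IDLEN], int max)
{
	int n = 0;

	for (;;) {
		const char *sep = strchr(str, '#');
		size_t len = sep ? (size_t)(sep - str) : strlen(str);

		if (len == 0 || len >= MAX_IDLEN || n >= max)
			return -1;
		memcpy(ids[n], str, len);
		ids[n][len] = '\0';
		n++;
		if (!sep)
			return n;
		str = sep + 1;
	}
}
```

Each id returned this way would then be looked up individually, one per queue.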

Signed-off-by: Jason Wang 
---
 hw/qdev-properties.c |   37 ++---
 hw/qdev.h|3 ++-
 net.c|   34 ++
 net.h|   15 +++
 4 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 1088a26..dd371e1 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -384,14 +384,37 @@ PropertyInfo qdev_prop_chr = {
 
 static int parse_netdev(DeviceState *dev, Property *prop, const char *str)
 {
-VLANClientState **ptr = qdev_get_prop_ptr(dev, prop);
+VLANClientState ***nc = qdev_get_prop_ptr(dev, prop);
+const char *ptr = str;
+int i = 0;
+size_t len = strlen(str);
+*nc = qemu_malloc(MAX_QUEUE_NUM * sizeof(VLANClientState *));
+
+while (i < MAX_QUEUE_NUM && ptr < str + len) {
+char *name = NULL;
+char *this = strchr(ptr, '#');
+
+if (this == NULL) {
+name = strdup(ptr);
+} else {
+name = strndup(ptr, this - ptr);
+}
 
-*ptr = qemu_find_netdev(str);
-if (*ptr == NULL)
-return -ENOENT;
-if ((*ptr)->peer) {
-return -EEXIST;
+(*nc)[i] = qemu_find_netdev(name);
+if ((*nc)[i] == NULL) {
+return -ENOENT;
+}
+if (((*nc)[i])->peer) {
+return -EEXIST;
+}
+
+if (this == NULL) {
+break;
+}
+i++;
+ptr = this + 1;
 }
+
 return 0;
 }
 
@@ -409,7 +432,7 @@ static int print_netdev(DeviceState *dev, Property *prop, char *dest, size_t len
 PropertyInfo qdev_prop_netdev = {
 .name  = "netdev",
 .type  = PROP_TYPE_NETDEV,
-.size  = sizeof(VLANClientState*),
+.size  = sizeof(VLANClientState **),
 .parse = parse_netdev,
 .print = print_netdev,
 };
diff --git a/hw/qdev.h b/hw/qdev.h
index 8a13ec9..b438da0 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h
@@ -257,6 +257,7 @@ extern PropertyInfo qdev_prop_pci_devfn;
 .defval= (bool[]) { (_defval) }, \
 }
 
+
 #define DEFINE_PROP_UINT8(_n, _s, _f, _d)   \
 DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_uint8, uint8_t)
 #define DEFINE_PROP_UINT16(_n, _s, _f, _d)  \
@@ -281,7 +282,7 @@ extern PropertyInfo qdev_prop_pci_devfn;
 #define DEFINE_PROP_STRING(_n, _s, _f) \
 DEFINE_PROP(_n, _s, _f, qdev_prop_string, char*)
 #define DEFINE_PROP_NETDEV(_n, _s, _f) \
-DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState*)
+DEFINE_PROP(_n, _s, _f, qdev_prop_netdev, VLANClientState**)
 #define DEFINE_PROP_VLAN(_n, _s, _f) \
 DEFINE_PROP(_n, _s, _f, qdev_prop_vlan, VLANState*)
 #define DEFINE_PROP_DRIVE(_n, _s, _f) \
diff --git a/net.c b/net.c
index 4f777c3..a937e5d 100644
--- a/net.c
+++ b/net.c
@@ -227,16 +227,36 @@ NICState *qemu_new_nic(NetClientInfo *info,
 {
 VLANClientState *nc;
 NICState *nic;
+int i;
 
 assert(info->type == NET_CLIENT_TYPE_NIC);
 assert(info->size >= sizeof(NICState));
 
-nc = qemu_new_net_client(info, conf->vlan, conf->peer, model, name);
+nc = qemu_new_net_client(info, conf->vlan, conf->peers[0], model, name);
 
 nic = DO_UPCAST(NICState, nc, nc);
 nic->conf = conf;
 nic->opaque = opaque;
 
+/* For compatibility with single queue nic */
+nic->ncs[0] = nc;
+nc->opaque = nic;
+
+for (i = 1 ; i < conf->queues; i++) {
+VLANClientState *vc = qemu_mallocz(sizeof(*vc));
+vc->opaque = nic;
+nic->ncs[i] = vc;
+vc->peer = conf->peers[i];
+vc->info = info;
+vc->queue_index = i;
+vc->peer->peer = vc;
+QTAILQ_INSERT_TAIL(&non_vlan_clients, vc, next);
+
+vc->send_queue = qemu_new_net_queue(qemu_deliver_packet,
+qemu_deliver_packet_iov,
+vc);
+}
+
 return nic;
 }
 
@@ -272,11 +292,10 @@ void qemu_del_vlan_client(VLANClientState *vc)
 {
 /* If there is a peer NIC, delete and cleanup client, but do not free. */
 if (!vc->vlan && vc->peer && vc->peer->info->type == NET_CLIENT_TYPE_NIC) {
-NICState *nic = DO_UPCAST(NICState, nc, vc->peer);
-if (nic->peer_deleted) {
+if (vc->peer_deleted) {
 retur

[RFC PATCH 0/2] Multiqueue support for qemu(virtio-net)

2011-04-20 Thread Jason Wang
Inspired by Krishna's patch (http://www.spinics.net/lists/kvm/msg52098.html) and
Michael's suggestions, the following series adds multiqueue support to qemu and
enables it for virtio-net (both userspace and vhost).

The aim of this series is to simplify the management and achieve the same
performance with less code.

The differences between this series and Krishna's are as follows:

- Add multiqueue support for qemu and also for userspace virtio-net.
- Instead of hacking the vhost module to manipulate kthreads, this series just
implements userspace-based multiqueue and can thus reuse the existing vhost
kernel-side code without any modification.
- Use a 1:1 mapping between TX/RX pairs and vhost kthreads because the
implementation is based on userspace.
- The CLI is also changed to make management easier: the -netdev option of qdev
can now accept more than one id. You can start a multiqueue virtio-net device
through:
./qemu-system-x86_64 -netdev tap,id=hn0,vhost=on,fd=X -netdev
tap,id=hn1,vhost=on,fd=Y -device virtio-net-pci,netdev=hn0#hn1,queues=2 ...

The series is very primitive and still needs polishing.

Suggestions are welcome.
---

Jason Wang (2):
  net: Add multiqueue support
  virtio-net: add multiqueue support


 hw/qdev-properties.c |   37 -
 hw/qdev.h|3 
 hw/vhost.c   |   26 ++-
 hw/vhost.h   |1 
 hw/vhost_net.c   |7 +
 hw/vhost_net.h   |2 
 hw/virtio-net.c  |  409 --
 hw/virtio-net.h  |2 
 hw/virtio-pci.c  |1 
 hw/virtio.h  |1 
 net.c|   34 +++-
 net.h|   15 +-
 12 files changed, 353 insertions(+), 185 deletions(-)

-- 
Jason Wang


Re: [PATCH 0/2] Some KVM fixes

2011-04-20 Thread Avi Kivity

On 04/18/2011 12:42 PM, Joerg Roedel wrote:
> Hi,
>
> The first of these two patches fixes an issue introduced with the
> recent emulator-intercept code (the issue was there before too, but
> hidden by other workaround code which was removed in the
> mentioned patch-set).
> The second patch fixes a problem introduced with the tsc-scaling
> patch-set where the TSC was not usable anymore after a
> guest-reboot.
> All in all, these fixes are not -stable material.

Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] acpi_piix4: remove bad save/restore of cpus_sts

2011-04-20 Thread Avi Kivity

On 04/18/2011 04:56 PM, Isaku Yamahata wrote:

> This patch would fix the segfaults. But I suppose the following
> are necessary.
>
> - PIIX4PMState::gpe_cpu needs to be saved/loaded somewhere

Yes.  Juan?

> - gpe_writeb() needs to handle PROC_BASE ... PROC_BASE+31
>   like gpe_readb(). To be honest, I don't see why gpe_readb/writeb()
>   are used for PROC_BASE...PROC_BASE + 31

Even before the merge, we didn't handle a write to this address.
Perhaps it's read-only? (That would explain the lack of save/restore.)


--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 0/2] Store and load PCI device saved state across function resets

2011-04-20 Thread Avi Kivity

On 04/19/2011 11:12 PM, Alex Williamson wrote:

> v1 -> v2:
>   Make the pointer passed around less opaque for type safety.
>
> Bug https://bugs.launchpad.net/qemu/+bug/754591 is caused because
> the KVM module attempts to do a pci_save_state() before assigning
> the device to a VM, expecting that the saved state will remain
> valid until we release the device.  This is in conflict with our
> need to reset devices using PCI sysfs during a VM reset to
> quiesce the device.  Any calls to pci_reset_function() will
> overwrite the device saved state prior to reset, and reload and
> invalidate the state after.  KVM then ends up trying to restore
> the state, but it's already invalid, so the device ends up with
> reset values.
>
> This series adds a mechanism to pull the saved state off the
> struct pci_dev and reload it later.  Thanks,


Based on the sizes of the patches, this should go in via the pci tree.
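
The failure mode described in the quoted cover letter, and the store/load
idea that addresses it, can be modeled in userspace roughly as follows. This
is only a toy sketch: struct toy_dev, struct toy_saved_state and all toy_*()
functions are hypothetical stand-ins that mimic the behaviour described
above; they are not the kernel's struct pci_dev or PCI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define CFG_WORDS 16

/* Hypothetical stand-in for struct pci_dev; not the kernel structure. */
struct toy_dev {
    unsigned int config[CFG_WORDS]; /* live config space */
    unsigned int saved[CFG_WORDS];  /* per-device save area */
    bool state_saved;               /* is the save area valid? */
};

/* Detached copy of the save area, owned by the caller. */
struct toy_saved_state {
    unsigned int config[CFG_WORDS];
};

static void toy_save_state(struct toy_dev *d)
{
    memcpy(d->saved, d->config, sizeof(d->saved));
    d->state_saved = true;
}

/* Restoring consumes the save area: it is invalid afterwards. */
static void toy_restore_state(struct toy_dev *d)
{
    if (!d->state_saved)
        return;
    memcpy(d->config, d->saved, sizeof(d->config));
    d->state_saved = false;
}

/* Models a reset that overwrites the save area, then invalidates it. */
static void toy_reset_function(struct toy_dev *d)
{
    toy_save_state(d);
    memset(d->config, 0, sizeof(d->config)); /* device reset */
    toy_restore_state(d);
}

/* Pull a copy of the saved state off the device; NULL on failure. */
static struct toy_saved_state *toy_store_saved_state(const struct toy_dev *d)
{
    struct toy_saved_state *s;

    if (!d->state_saved)
        return NULL;
    s = malloc(sizeof(*s));
    if (s)
        memcpy(s->config, d->saved, sizeof(s->config));
    return s;
}

/* Put a detached copy back into the save area; NULL just invalidates it. */
static void toy_load_saved_state(struct toy_dev *d,
                                 const struct toy_saved_state *s)
{
    d->state_saved = false;
    if (!s)
        return;
    memcpy(d->saved, s->config, sizeof(d->saved));
    d->state_saved = true;
}
```

In this model, a toy_reset_function() call between the owner's save and
restore leaves the save area invalid, so a later toy_restore_state() does
nothing. Loading the detached copy first makes the restore work again.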

--
error compiling committee.c: too many arguments to function



Re: kvm hangs with 1GB or more memory assigned

2011-04-20 Thread Avi Kivity

(re-adding list)

On 04/15/2011 07:28 AM, Neal Murphy wrote:

> On Wednesday 06 April 2011 03:35:27 you wrote:
> > On 04/06/2011 06:22 AM, Neal Murphy wrote:
> > > ENVIRONS:
> > > I'm running
> > >
> > > - Debian Squeeze.
> > > - QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5)
> > > - 2.6.32-5-686-bigmem #1 SMP Tue Mar 8 22:14:55 UTC 2011 i686 GNU/Linux
> > > - Quad Phenom II 965, 8GB RAM
> > >
> > > I'm booting generic 2.6.35.11 through syslinux. The command is generated
> > > via a script I wrote. It works fine until I assign more than 1005M RAM
> > > to the VM; it's been working fine (at less than 1GB RAM) for many
> > > months. The system I am booting boots and runs fine on bare metal.
> > >
> > > I got the same results when I DLed and installed ver. 0.14.
> >
> > Looks like a guest BIOS issue.
> >
> > Please try qemu-0.14.  Also try -cpu qemu64 instead of phenom.
> >
> > If those fail, we can attach with gdb and try to look at what's going
> > on, but let's try the simple tests first.
>
> BTW, is there a preferred 'set' of options to feed ./configure?

No.

> I used
> './configure --prefix=/opt/kvm --enable-kvm' to build.
>
> Finally got my system (Smoothwall/Roadster) stabilized and out the door, so I
> have time to dedicate to this.
>
> OK. I've built 0.14 (installed to /opt/kvm and I deleted Squeeze's older
> package) and am using Squeeze's kvm_amd module. I tried without -cpu and -smp;
> I tried -cpu qemu64. I eliminated the -vga and -serial options to no avail. It
> still chokes on 1005MB RAM.

Works for me.  Please use -monitor stdio and issue the commands

(qemu) info registers
(qemu) x/50i $eip - 30

> So what's the next ste... ... Wait, just thought of something to try: 'rmmod
> kvm-amd'. ... Oho! Without the accelerator, it runs with at least 1588M RAM,
> but can't allocate 2000M RAM (this may be expected on a 32-bit OS with PAE).

2G limit is expected on i386.

> Does this help? Is it time to submit a debian bug?

If debian fixes the bug, that would be great.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 4/4 V2] kvm tools: Complete missing segments in a iov op using regular op

2011-04-20 Thread Pekka Enberg
Hi,

[ Sasha, please remember to CC people who were involved in discussions! ]

On Mon, Apr 18, 2011 at 4:05 PM, Sasha Levin  wrote:
> If any of the iov operations return mid-block, use regular ops to complete
> the current block and continue using iov ops.
>
> Signed-off-by: Sasha Levin 
> ---
>  tools/kvm/read-write.c |   58 ++-
>  1 files changed, 51 insertions(+), 7 deletions(-)
>
> diff --git a/tools/kvm/read-write.c b/tools/kvm/read-write.c
> index 0c995c8..bf2e4a0 100644
> --- a/tools/kvm/read-write.c
> +++ b/tools/kvm/read-write.c
> @@ -189,10 +189,10 @@ static inline ssize_t get_iov_size(const struct iovec *iov, int iovcnt)
>  }
>
>  static inline void shift_iovec(const struct iovec **iov, int *iovcnt,
> -                               size_t nr, ssize_t *total, size_t *count, off_t *offset)
> +                               ssize_t *nr, ssize_t *total, size_t *count, off_t *offset)
>  {
> -       while (nr >= (*iov)->iov_len) {
> -               nr -= (*iov)->iov_len;
> +       while ((size_t)*nr >= (*iov)->iov_len) {
> +               *nr -= (*iov)->iov_len;
>                *total += (*iov)->iov_len;
>                *count -= (*iov)->iov_len;
>                if (offset)
> @@ -218,7 +218,18 @@ ssize_t readv_in_full(int fd, const struct iovec *iov, int iovcnt)
>                        return -1;
>                }
>
> -               shift_iovec(&iov, &iovcnt, nr, &total, &count, NULL);
> +               shift_iovec(&iov, &iovcnt, &nr, &total, &count, NULL);
> +
> +               while (nr > 0) {
> +                       ssize_t nr_readagain;
> +                       nr_readagain = xread(fd, iov->iov_base + nr,
> +                                            iov->iov_len - nr);
> +                       if (nr_readagain <= 0)
> +                               return total;
> +
> +                       nr += nr_readagain;
> +                       shift_iovec(&iov, &iovcnt, &nr, &total, &count, NULL);
> +               }
>        }
>
>        return total;

As mentioned on IRC, I hate this patch with a passion. ;-)

We don't do O_DIRECT now, so this doesn't help with performance, and the
extra complexity it brings to the table isn't very appealing.
Modifying the struct iovec (or making a copy of it) seems to be a much
nicer approach.
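
For illustration only, here is one way the "copy of the iovec" idea could
look. This is a minimal sketch, not the kvm tools code: the function name
readv_full_copy() is made up for this example, and it calls plain readv()
rather than the tool's own helpers. After a short read it advances iov_base
and iov_len in a private copy, so no per-segment fallback read is needed and
the caller's array is left untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Read until every buffer in iov is full, or until EOF or an error.
 * Returns the number of bytes read; returns -1 only if an error occurs
 * before any data was read.
 */
static ssize_t readv_full_copy(int fd, const struct iovec *iov, int iovcnt)
{
    struct iovec *copy, *cur;
    ssize_t total = 0;

    if (iovcnt <= 0)
        return 0;

    copy = malloc(sizeof(*copy) * (size_t)iovcnt);
    if (!copy)
        return -1;
    memcpy(copy, iov, sizeof(*copy) * (size_t)iovcnt);
    cur = copy;

    while (iovcnt > 0) {
        ssize_t nr = readv(fd, cur, iovcnt);

        if (nr < 0 && errno == EINTR)
            continue;
        if (nr <= 0) {
            if (nr < 0 && total == 0)
                total = -1;
            break;
        }
        total += nr;

        /* Drop the segments that were filled completely. */
        while (iovcnt > 0 && (size_t)nr >= cur->iov_len) {
            nr -= (ssize_t)cur->iov_len;
            cur++;
            iovcnt--;
        }

        /* Advance into a partially filled segment, if any. */
        if (nr > 0) {
            cur->iov_base = (char *)cur->iov_base + nr;
            cur->iov_len -= (size_t)nr;
        }
    }

    free(copy);
    return total;
}
```

The cost is one allocation per call; a caller that does not mind having its
iovec array modified could skip the copy and adjust the array in place.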

Pekka


Re: [PATCH 2/2] KVM: Use pci_store/load_saved_state() around VM device usage

2011-04-20 Thread Avi Kivity

On 04/18/2011 10:43 PM, Alex Williamson wrote:

> On Sun, 2011-04-17 at 12:25 +0300, Avi Kivity wrote:
> > On 04/15/2011 10:54 PM, Alex Williamson wrote:
> > > Store the device saved state so that we can reload the device back
> > > to the original state when it's unassigned.  This has the benefit
> > > that the state survives across pci_reset_function() calls via
> > > the PCI sysfs reset interface while the VM is using the device.
> >
> > > @@ -516,7 +518,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> > >
> > >  pci_reset_function(dev);
> > >  pci_save_state(dev);
> > > -
> > > +   match->pci_saved_state = pci_store_saved_state(dev);
> > >  match->assigned_dev_id = assigned_dev->assigned_dev_id;
> >
> > Error check?
> >
> > It might be better to give up the opacity of the data structure and make
> > pci_saved_state the full struct, not a pointer.
>
> pci_store_saved_state() returns NULL on error, which is correctly
> handled if we pass NULL to pci_load_saved_state() or a pointer to NULL
> to pci_load_and_free_saved_state().

But we silently swallow an error; this isn't good.

> This is also why I changed the
> __pci_reset_function() back to a normal pci_reset_function(), so we're
> never left with an uninitialized device like we are now.
>
> We could be more verbose or return an error here, but we've gone for a
> long time not even doing this save/restore across VM usage, so I don't
> think it's worthy of preventing the device attachment if it fails.

At least a log?

Note that avoiding the pointer would have removed the problem altogether.
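
To make the two options concrete, here is a toy userspace sketch. All names
(struct toy_saved_state, toy_store_alloc(), toy_store_embedded(),
toy_assign()) and the simulate_oom flag are hypothetical; this does not use
the kernel's PCI API. The pointer variant can fail and is logged instead of
silently ignored; the embedded variant has no failure path to check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CFG_WORDS 16

struct toy_saved_state {
    unsigned int config[CFG_WORDS];
};

/* Pointer variant: the allocation may fail, so callers must check. */
static struct toy_saved_state *toy_store_alloc(const unsigned int *cfg,
                                               bool simulate_oom)
{
    struct toy_saved_state *s = simulate_oom ? NULL : malloc(sizeof(*s));

    if (s)
        memcpy(s->config, cfg, sizeof(s->config));
    return s;
}

/* Embedded variant: the caller owns the storage, so nothing can fail. */
static void toy_store_embedded(struct toy_saved_state *out,
                               const unsigned int *cfg)
{
    memcpy(out->config, cfg, sizeof(out->config));
}

/*
 * Hypothetical assignment step: it does not refuse the assignment when
 * the store fails, but it logs the failure rather than swallowing it.
 * Returns true if the state was stored.
 */
static bool toy_assign(struct toy_saved_state **slot,
                       const unsigned int *cfg, bool simulate_oom)
{
    *slot = toy_store_alloc(cfg, simulate_oom);
    if (!*slot) {
        fprintf(stderr, "toy_assign: could not store saved state; "
                "device will keep reset values on release\n");
        return false;
    }
    return true;
}
```

The trade-off is the one raised in the thread: embedding the full struct
removes the error path but gives up the opacity of the data structure.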

--
error compiling committee.c: too many arguments to function
