Re: [PATCH 0/1] virtio: console: fix for early console

2011-09-22 Thread Rusty Russell
On Thu, 22 Sep 2011 20:59:13 +0200, Christian Borntraeger 
 wrote:
> On 22/09/11 20:14, Amit Shah wrote:
> > Hi Rusty,
> > 
> > This is a fix from Christian for early console handling with multiport
> > support.  Please apply.
> > 
> > Christian, I've made some changes to the patch as noted in the commit
> > message.  Nothing major, but an ACK would be nice.
> 
> The changes look fine.
> Acked-by: Christian Borntraeger 

Thanks, applied.

Cheers,
Rusty.


Re: inter VM / PF-VF communication

2011-09-22 Thread Sagar Borikar
> I'm not aware of any vendor these days that actually requires a PV driver
> for PF-VF communications.  I know some toyed with the idea years ago but I
> thought mailboxes have become the de facto standard.
Maybe because Intel started the mailbox implementation ;). But just
wondering from the hypervisor's point of view, shouldn't there be some way
to communicate between PF and VF? It might also help uncover security
holes that a rogue VF driver could exploit.

Are you also saying that Linux would depend on the hardware for this? I do
see a couple of papers on PF-VF communication in KVM through MMIO and
virtual PCI devices exposed to the guest.
> Is there a specific card you think needs a pv mailbox?
>
> Regards,
>
> Anthony Liguori
>
>>
>> Thanks
>> Sagar


Re: inter VM / PF-VF communication

2011-09-22 Thread Anthony Liguori

On 09/22/2011 10:23 AM, Sagar Borikar wrote:

All,

Sorry if I am not keeping up on the subject, but I wanted to know whether
there is any effort going on for inter-VM communication / PF-VF
communication (in the case of SR-IOV).
I see that most SR-IOV-capable NICs support mailboxes for that purpose,
to avoid the security hole.
Xen has a virtual device implementation for the same. Should I presume
that such an effort is not on the radar and the hardware needs to own the
responsibility of closing the security holes that a rogue VF could
expose?


I'm not aware of any vendor these days that actually requires a PV driver for 
PF-VF communications.  I know some toyed with the idea years ago but I thought 
mailboxes have become the de facto standard.


Is there a specific card you think needs a pv mailbox?

Regards,

Anthony Liguori



Thanks
Sagar




Re: [PATCH] KVM: emulate lapic tsc deadline timer for guest

2011-09-22 Thread Marcelo Tosatti
On Thu, Sep 22, 2011 at 11:22:02PM +0800, Liu, Jinsong wrote:
> Marcelo Tosatti wrote:
> > On Thu, Sep 22, 2011 at 04:55:52PM +0800, Liu, Jinsong wrote:
> >>> From 4d5b83aba40ce0d421add9a41a6c591a8590a32e Mon Sep 17 00:00:00 2001
> >> From: Liu, Jinsong 
> >> Date: Thu, 22 Sep 2011 14:00:08 +0800
> >> Subject: [PATCH 2/2] KVM: emulate lapic tsc deadline timer for guest
> >> 
> >> This patch emulates the LAPIC TSC deadline timer for the guest:
> >> Enumerate tsc deadline timer capability by CPUID;
> >> Enable tsc deadline timer mode by lapic MMIO;
> >> Start tsc deadline timer by WRMSR;
> >> 
> >> Signed-off-by: Liu, Jinsong 
> >> ---
> >>  arch/x86/include/asm/kvm_host.h |2 +
> >>  arch/x86/kvm/kvm_timer.h|2 +
> >>  arch/x86/kvm/lapic.c|  123 ---
> >>  arch/x86/kvm/lapic.h|3 +
> >>  arch/x86/kvm/x86.c  |   16 +-
> >>  5 files changed, 123 insertions(+), 23 deletions(-)
> > 
> > Looks good, please rebase against branch master of
> > 
> > git://github.com/avikivity/kvm.git
> 
> Thanks!
> 
> And for the qemu patch rebase, I guess one of the addresses below; which one is right?
> git://github.com/avikivity/qemu-kvm.git   or
> git://github.com/avikivity/qemu.git

qemu.git. 


Re: [PATCH 0/1] virtio: console: fix for early console

2011-09-22 Thread Christian Borntraeger
On 22/09/11 20:14, Amit Shah wrote:
> Hi Rusty,
> 
> This is a fix from Christian for early console handling with multiport
> support.  Please apply.
> 
> Christian, I've made some changes to the patch as noted in the commit
> message.  Nothing major, but an ACK would be nice.

The changes look fine.
Acked-by: Christian Borntraeger 

Christian


Re: VNC and SDL/VGA simultaneously?

2011-09-22 Thread Erik Rull

Hi all,

is there an update regarding simultaneous use of VNC and VGA output?

Thanks.

Best regards,

Erik



Bitman Zhou wrote:

We tried before with both spice (QXL) and VNC enabled at the same time
for the same VM. It works a little bit; I mean the VNC session can hold for
some time. I use gtk-vnc, and it looks like the qemu VNC implementation
sends some special packets that cause gtk-vnc to break.

BR
Bitman Zhou

On Thu, 2011-03-10 at 20:15 +0100, Erik Rull wrote:

Avi Kivity wrote:

On 03/09/2011 11:31 PM, Erik Rull wrote:

Hi all,

is it possible to parameterize qemu in a way where the VNC port and
the VGA output are available in parallel?



Not really, though it should be possible to do it with some effort.


My system screen remains dark if I run it with the -vnc :0 option, and
VNC is unavailable when SDL/VGA is available.



What's your use case?


I want to make remote support possible for a guest system that has only a
local network connection to the host and has a graphical console running
that the operator works on. (So "real" VNC is not available to the rest of
the world.)
It would also be okay if switching between VNC and SDL/VGA were possible at
runtime, so that remote support can do its work and then switch back to the
operator's screen.

Best regards,

Erik









[PATCH 1/1] virtio: console: wait for first console port for early console output

2011-09-22 Thread Amit Shah
From: Christian Borntraeger 

On s390 I have seen some random

"Warning: unable to open an initial console"

boot failures. It turns out that tty_open fails because the
hvc_alloc was not yet done. In former times this could not happen,
since the probe function automatically called hvc_alloc. With newer
versions (multiport) some host<->guest interaction is required
before hvc_alloc is called. This might be too late, especially if
an initramfs is involved. Let's use a completion if we have
multiport and an early console.

[Amit:
  * Use NULL instead of 0 for pointer comparison
  * Rename 'port_added' to 'early_console_added'
  * Re-format, re-word commit message
  * Rebase patch on top of current queue]

Signed-off-by: Christian Borntraeger 
Signed-off-by: Amit Shah 
---
 drivers/char/virtio_console.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 9ea3b5e..7f2c6e5 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -19,6 +19,7 @@
  */
 #include <linux/cdev.h>
 #include <linux/debugfs.h>
+#include <linux/completion.h>
 #include <linux/device.h>
 #include <linux/err.h>
 #include <linux/fs.h>
@@ -74,6 +75,7 @@ struct ports_driver_data {
 static struct ports_driver_data pdrvdata;
 
 DEFINE_SPINLOCK(pdrvdata_lock);
+DECLARE_COMPLETION(early_console_added);
 
 /* This struct holds information that's relevant only for console ports */
 struct console {
@@ -1366,6 +1368,7 @@ static void handle_control_message(struct ports_device *portdev,
break;
 
init_port_console(port);
+   complete(&early_console_added);
/*
 * Could remove the port here in case init fails - but
 * have to notify the host first.
@@ -1668,6 +1671,10 @@ static int __devinit virtcons_probe(struct virtio_device *vdev)
struct ports_device *portdev;
int err;
bool multiport;
+   bool early = early_put_chars != NULL;
+
+   /* Ensure to read early_put_chars now */
+   barrier();
 
portdev = kmalloc(sizeof(*portdev), GFP_KERNEL);
if (!portdev) {
@@ -1739,6 +1746,19 @@ static int __devinit virtcons_probe(struct virtio_device *vdev)
 
__send_control_msg(portdev, VIRTIO_CONSOLE_BAD_ID,
   VIRTIO_CONSOLE_DEVICE_READY, 1);
+
+   /*
+* If there was an early virtio console, assume that there are no
+* other consoles. We need to wait until the hvc_alloc matches the
+* hvc_instantiate, otherwise tty_open will complain, resulting in
+* a "Warning: unable to open an initial console" boot failure.
+* Without multiport this is done in add_port above. With multiport
+* this might take some host<->guest communication - thus we have to
+* wait.
+*/
+   if (multiport && early)
+   wait_for_completion(&early_console_added);
+
return 0;
 
 free_vqs:
-- 
1.7.6.2



[PATCH 0/1] virtio: console: fix for early console

2011-09-22 Thread Amit Shah
Hi Rusty,

This is a fix from Christian for early console handling with multiport
support.  Please apply.

Christian, I've made some changes to the patch as noted in the commit
message.  Nothing major, but an ACK would be nice.

Thanks.


Christian Borntraeger (1):
  virtio: console: wait for first console port for early console output

 drivers/char/virtio_console.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

-- 
1.7.6.2



[PATCH v2 2/2] pci-assign: Fix MSI-X capability test

2011-09-22 Thread Alex Williamson
Commit c4525754 added a capability check for KVM_CAP_DEVICE_MSIX,
which is unfortunately not exposed, resulting in MSIX never
being listed as a capability.  This breaks anything depending on
MSIX, such as igbvf.  Instead let's use a dummy call to
KVM_ASSIGN_SET_MSIX_NR which will return -EFAULT if the call
exists.
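
As an aside, a rough userspace sketch of the probe idea (an illustration under
assumptions, not code from this patch): issue the ioctl with a NULL argument
and look at the resulting error. A kernel that implements
KVM_ASSIGN_SET_MSIX_NR tries to copy the NULL pointer and fails with EFAULT;
a kernel that lacks the ioctl reports an unknown-ioctl error instead.

#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>   /* 2011-era headers providing KVM_ASSIGN_SET_MSIX_NR */

/* Returns 1 if the running kernel knows KVM_ASSIGN_SET_MSIX_NR.
 * vm_fd is a VM file descriptor obtained via KVM_CREATE_VM. */
static int kvm_has_assign_set_msix_nr(int vm_fd)
{
    int r = ioctl(vm_fd, KVM_ASSIGN_SET_MSIX_NR, NULL);

    /* EFAULT means the ioctl exists and tried to read our NULL argument. */
    return r == -1 && errno == EFAULT;
}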

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 137c409..f0a6ca9 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1212,7 +1212,10 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 }
 /* Expose MSI-X capability */
 pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
-if (pos != 0 && kvm_check_extension(kvm_state, KVM_CAP_DEVICE_MSIX)) {
+/* Would really like to test kvm_check_extension(, KVM_CAP_DEVICE_MSIX),
+ * but the kernel doesn't expose it.  Instead do a dummy call to
+ * KVM_ASSIGN_SET_MSIX_NR to see if it exists. */
+if (pos != 0 && kvm_assign_set_msix_nr(kvm_state, NULL) == -EFAULT) {
 int bar_nr;
 uint32_t msix_table_entry;
 



[PATCH v2 1/2] pci-assign: Re-order initfn for memory API

2011-09-22 Thread Alex Williamson
We now need to scan PCI capabilities and set up an MSI-X page
before we walk the device resources, since the overlay is now
set up during init instead of at the first mapping by the guest.

Signed-off-by: Alex Williamson 
---

 hw/device-assignment.c |   19 +++
 1 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 288f80c..137c409 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1603,6 +1603,17 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 goto out;
 }
 
+if (assigned_device_pci_cap_init(pci_dev) < 0) {
+goto out;
+}
+
+/* intercept MSI-X entry page in the MMIO */
+if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
+if (assigned_dev_register_msix_mmio(dev)) {
+goto out;
+}
+}
+
 /* handle real device's MMIO/PIO BARs */
 if (assigned_dev_register_regions(dev->real_device.regions,
   dev->real_device.region_number,
@@ -1618,9 +1629,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 dev->h_busnr = dev->host.bus;
 dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
 
-if (assigned_device_pci_cap_init(pci_dev) < 0)
-goto out;
-
 /* assign device to guest */
 r = assign_device(dev);
 if (r < 0)
@@ -1631,11 +1639,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 if (r < 0)
 goto assigned_out;
 
-/* intercept MSI-X entry page in the MMIO */
-if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX)
-if (assigned_dev_register_msix_mmio(dev))
-goto assigned_out;
-
 assigned_dev_load_option_rom(dev);
 QLIST_INSERT_HEAD(&devs, dev, next);
 



[PATCH v2 0/2] pci-assign: Fix MSI-X support

2011-09-22 Thread Alex Williamson
Assigned device MSI-X support hasn't been working, this fixes
it.  I believe this should also fix:

https://bugs.launchpad.net/qemu/+bug/830558

v2:
   Incorporate comments from Jan

WRT exposing KVM_CAP_DEVICE_MSIX, I'll send a patch with a big
comment noting that we can't rely on it for older kernels.  Maybe
some day we'll draw a line in the sand and be able to use it.
Thanks,

Alex

---

Alex Williamson (2):
  pci-assign: Fix MSI-X capability test
  pci-assign: Re-order initfn for memory API


 hw/device-assignment.c |   24 +++-
 1 files changed, 15 insertions(+), 9 deletions(-)


[PATCH 2/2] adds cgroup tests on KVM guests with first test

2011-09-22 Thread Lukas Doktor
basic structure:
 * similar to general client/tests/cgroup/ test (imports from the
   cgroup_common.py)
 * uses classes for better handling
 * improved logging and error handling
 * checks/repairs the guests after each subtest
 * subtest mapping is specified in the test dictionary in cgroup.py
 * allows specifying tests/repetitions in tests_base.cfg
(cgroup_tests = "re1[:loops] re2[:loops] ...")

TestBlkioBandwidthWeight{Read,Write}:
 * Two similar tests for blkio.weight functionality inside the guest using
   direct io and virtio_blk driver
 * Function:
 1) Adds a small (10MB) virtio_blk disk to each of 2 VMs
 2) Assigns each VM to a different cgroup and sets blkio.weight 100/1000
 3) Runs dd with flag=direct (read/write) from the virtio_blk disk
repeatedly
 4) After 1 minute checks the results. If the ratio is better than 1:3,
the test passes

Signed-off-by: Lukas Doktor 
---
 client/tests/kvm/subtests.cfg.sample |7 +
 client/tests/kvm/tests/cgroup.py |  316 ++
 2 files changed, 323 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/cgroup/__init__.py
 create mode 100644 client/tests/kvm/tests/cgroup.py

diff --git a/client/tests/cgroup/__init__.py b/client/tests/cgroup/__init__.py
new file mode 100644
index 000..e69de29
diff --git a/client/tests/kvm/subtests.cfg.sample b/client/tests/kvm/subtests.cfg.sample
index 74e550b..79e0656 100644
--- a/client/tests/kvm/subtests.cfg.sample
+++ b/client/tests/kvm/subtests.cfg.sample
@@ -848,6 +848,13 @@ variants:
 only Linux
 type = iofuzz
 
+- cgroup:
+type = cgroup
+# cgroup_tests = "re1[:loops] re2[:loops] ..."
+cgroup_tests = ".*:1"
+vms += " vm2"
+extra_params += " -snapshot"
+
 - virtio_console: install setup image_copy unattended_install.cdrom
 only Linux
 vms = ''
diff --git a/client/tests/kvm/tests/cgroup.py b/client/tests/kvm/tests/cgroup.py
new file mode 100644
index 000..4d0ec43
--- /dev/null
+++ b/client/tests/kvm/tests/cgroup.py
@@ -0,0 +1,316 @@
+"""
+cgroup autotest test (on KVM guest)
+@author: Lukas Doktor 
+@copyright: 2011 Red Hat, Inc.
+"""
+import logging, re, sys, tempfile, time, traceback
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.bin import utils
+from autotest_lib.client.tests.cgroup.cgroup_common import Cgroup, CgroupModules
+
+def run_cgroup(test, params, env):
+"""
+Tests the cgroup functions on KVM guests.
+ * Uses variable tests (marked by TODO comment) to map the subtests
+"""
+vms = None
+tests = None
+
+# Tests
+class _TestBlkioBandwidth:
+"""
+BlkioBandwidth dummy test
+ * Use it as a base class to an actual test!
+ * self.dd_cmd and attr '_set_properties' have to be implemented
+ * It prepares 2 vms and runs self.dd_cmd to simultaneously stress the
+machines. After 1 minute it kills the dd and gathers the throughput
+information.
+"""
+def __init__(self, vms, modules):
+"""
+Initialization
+@param vms: list of vms
+@param modules: initialized cgroup module class
+"""
+self.vms = vms  # Virt machines
+self.modules = modules  # cgroup module handler
+self.blkio = Cgroup('blkio', '')# cgroup blkio handler
+self.files = [] # Temporary files (files of virt disks)
+self.devices = []   # Temporary virt devices (PCI drive 1 per vm)
+self.dd_cmd = None  # DD command used to test the throughput
+
+def cleanup(self):
+"""
+Cleanup
+"""
+err = ""
+try:
+for i in range (2):
+vms[i].monitor.cmd("pci_del %s" % self.devices[i])
+self.files[i].close()
+except Exception, inst:
+err += "\nCan't remove PCI drive: %s" % inst
+try:
+del(self.blkio)
+except Exception, inst:
+err += "\nCan't remove Cgroup: %s" % inst
+
+if err:
+logging.error("Some parts of cleanup failed:%s", err)
+raise error.TestError("Some parts of cleanup failed:%s" % err)
+
+def init(self):
+"""
+Initialization
+ * assigns vm1 and vm2 into cgroups and sets the properties
+ * creates a new virtio device and adds it into vms
+"""
+if test.tagged_testname.find('virtio_blk') == -1:
+logging.warn("You are executing non-virtio_blk test but this "
+ "particular subtest uses manually added "
+ "'virtio_blk' device.")
+if not self.dd_cmd:
+raise error.TestError("Corrupt class, aren't you trying to run 
"
+ 

[PATCH 1/2] cgroup: cgroup_common.py bugfixes and modifications

2011-09-22 Thread Lukas Doktor
[FIX] incorrect prop/dir variable usage
[MOD] Use __del__() instead of cleanup() - simplifies the code with a small
drawback (failures can't be handled; anyway, they are not critical and were
never handled before...)

Signed-off-by: Lukas Doktor 
---
 client/tests/cgroup/cgroup_common.py |   41 +-
 1 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/client/tests/cgroup/cgroup_common.py b/client/tests/cgroup/cgroup_common.py
index 836a23e..2a95c76 100755
--- a/client/tests/cgroup/cgroup_common.py
+++ b/client/tests/cgroup/cgroup_common.py
@@ -25,8 +25,20 @@ class Cgroup(object):
 self.module = module
 self._client = _client
 self.root = None
+self.cgroups = []
 
 
+def __del__(self):
+"""
+Destructor
+"""
+self.cgroups.sort(reverse=True)
+for pwd in self.cgroups[:]:
+for task in self.get_property("tasks", pwd):
+if task:
+self.set_root_cgroup(int(task))
+self.rm_cgroup(pwd)
+
 def initialize(self, modules):
 """
 Initializes object for use.
@@ -57,6 +69,7 @@ class Cgroup(object):
 except Exception, inst:
 logging.error("cg.mk_cgroup(): %s" , inst)
 return None
+self.cgroups.append(pwd)
 return pwd
 
 
@@ -70,6 +83,10 @@ class Cgroup(object):
 """
 try:
 os.rmdir(pwd)
+self.cgroups.remove(pwd)
+except ValueError:
+logging.warn("cg.rm_cgroup(): Removed cgroup which wasn't created"
+ " using this Cgroup")
 except Exception, inst:
 if not supress:
 logging.error("cg.rm_cgroup(): %s" , inst)
@@ -329,6 +346,22 @@ class CgroupModules(object):
 self.modules.append([])
 self.mountdir = mkdtemp(prefix='cgroup-') + '/'
 
+def __del__(self):
+"""
+Unmount all cgroups and remove the mountdir
+"""
+for i in range(len(self.modules[0])):
+if self.modules[2][i]:
+try:
+os.system('umount %s -l' % self.modules[1][i])
+except:
+logging.warn("CGM: Couldn't unmount %s directory"
+ % self.modules[1][i])
+try:
+os.system('rm -rf %s' % self.mountdir)
+except:
+logging.warn("CGM: Couldn't remove the %s directory"
+ % self.mountdir)
 
 def init(self, _modules):
 """
@@ -376,13 +409,9 @@ class CgroupModules(object):
 
 def cleanup(self):
 """
-Unmount all cgroups and remove the mountdir.
+Kept for compatibility
 """
-for i in range(len(self.modules[0])):
-if self.modules[2][i]:
-utils.system('umount %s -l' % self.modules[1][i],
- ignore_status=True)
-shutil.rmtree(self.mountdir)
+pass
 
 
 def get_pwd(self, module):
-- 
1.7.6



[KVM-autotest][PATCH] cgroup test with KVM guest +first subtests

2011-09-22 Thread Lukas Doktor
Hi guys,

Do you remember the discussion about cgroup testing in autotest vs. LTP? I hope
there won't be any doubts about this one, as ground_test (+ the first 2 subtests)
is strictly focused on cgroup features enforced on KVM guest systems. More
subtests will follow if you approve the test structure (blkio_throttle,
memory, cpus...).

It doesn't matter whether we drop or keep the general 'cgroup' test: the
'cgroup_common.py' library can be imported either from the 'client/tests/cgroup/'
directory or directly from the 'client/tests/kvm/tests/' directory.

The modifications to the 'cgroup_common.py' library are backward compatible with
the general cgroup test.

See the commits for details.

Regards,
Lukáš Doktor



inter VM / PF-VF communication

2011-09-22 Thread Sagar Borikar
All,

Sorry if I am not keeping up on the subject, but I wanted to know whether
there is any effort going on for inter-VM communication / PF-VF
communication (in the case of SR-IOV).
I see that most SR-IOV-capable NICs support mailboxes for that purpose,
to avoid the security hole.
Xen has a virtual device implementation for the same. Should I presume
that such an effort is not on the radar and the hardware needs to own the
responsibility of closing the security holes that a rogue VF could
expose?

Thanks
Sagar


RE: [PATCH] KVM: emulate lapic tsc deadline timer for guest

2011-09-22 Thread Liu, Jinsong
Marcelo Tosatti wrote:
> On Thu, Sep 22, 2011 at 04:55:52PM +0800, Liu, Jinsong wrote:
> >>> From 4d5b83aba40ce0d421add9a41a6c591a8590a32e Mon Sep 17 00:00:00 2001
>> From: Liu, Jinsong 
>> Date: Thu, 22 Sep 2011 14:00:08 +0800
>> Subject: [PATCH 2/2] KVM: emulate lapic tsc deadline timer for guest
>> 
> >> This patch emulates the LAPIC TSC deadline timer for the guest:
>> Enumerate tsc deadline timer capability by CPUID;
>> Enable tsc deadline timer mode by lapic MMIO;
>> Start tsc deadline timer by WRMSR;
>> 
>> Signed-off-by: Liu, Jinsong 
>> ---
>>  arch/x86/include/asm/kvm_host.h |2 +
>>  arch/x86/kvm/kvm_timer.h|2 +
>>  arch/x86/kvm/lapic.c|  123 ---
>>  arch/x86/kvm/lapic.h|3 +
>>  arch/x86/kvm/x86.c  |   16 +-
>>  5 files changed, 123 insertions(+), 23 deletions(-)
> 
> Looks good, please rebase against branch master of
> 
> git://github.com/avikivity/kvm.git

Thanks!

And for the qemu patch rebase, I guess one of the addresses below; which one is right?
git://github.com/avikivity/qemu-kvm.git   or
git://github.com/avikivity/qemu.git


Re: [PATCH] qemu: Fix inject-nmi

2011-09-22 Thread Jan Kiszka
On 2011-09-22 11:50, Lai Jiangshan wrote:
> 
> From: KAMEZAWA Hiroyuki 
> Subject: [PATCH] Fix inject-nmi
> 
> Now, inject-nmi sends NMI to all cpus...but this doesn't emulate
> pc hardware 'NMI button', which triggers LINT1.
> 
> So, now, LINT1 mask is ignored by inject-nmi and NMIs are sent to
> all cpus without checking LINT1 mask.
> 
> Because Linux masks LINT1 of cpus other than 0, this makes trouble.
> For example, kdump cannot run sometimes.
> ---
>  hw/apic.c |7 +++
>  hw/apic.h |1 +
>  monitor.c |4 ++--
>  3 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/apic.c b/hw/apic.c
> index 69d6ac5..020305b 100644
> --- a/hw/apic.c
> +++ b/hw/apic.c
> @@ -205,6 +205,13 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
>  }
>  }
>  
> +void apic_deliver_lint1_intr(DeviceState *d)
> +{
> +APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
> +
> +   apic_local_deliver(s, APIC_LVT_LINT1);

This will cause a qemu crash when apic_state is NULL (non-SMP 486
systems). Moreover: wrong indentation.

You know that this won't work for qemu-kvm with in-kernel irqchip? You
may want to provide a patch for that tree, emulating the unavailable
LINT1 injection by testing the APIC configuration and then raising an
NMI as before if it is accepted.
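
For illustration, a minimal sketch of the kind of guard being asked for,
reusing only names from the quoted diff (hypothetical, not part of the
posted patch):

void apic_deliver_lint1_intr(DeviceState *d)
{
    APICState *s;

    /* Non-SMP 486 machines have no local APIC, so apic_state (and thus d)
     * is NULL there; bail out instead of upcasting a NULL pointer. */
    if (!d) {
        return;
    }

    s = DO_UPCAST(APICState, busdev.qdev, d);
    apic_local_deliver(s, APIC_LVT_LINT1);
}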

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: emulate lapic tsc deadline timer for guest

2011-09-22 Thread Marcelo Tosatti
On Thu, Sep 22, 2011 at 04:55:52PM +0800, Liu, Jinsong wrote:
> From 4d5b83aba40ce0d421add9a41a6c591a8590a32e Mon Sep 17 00:00:00 2001
> From: Liu, Jinsong 
> Date: Thu, 22 Sep 2011 14:00:08 +0800
> Subject: [PATCH 2/2] KVM: emulate lapic tsc deadline timer for guest
> 
> This patch emulates the LAPIC TSC deadline timer for the guest:
> Enumerate tsc deadline timer capability by CPUID;
> Enable tsc deadline timer mode by lapic MMIO;
> Start tsc deadline timer by WRMSR;
> 
> Signed-off-by: Liu, Jinsong 
> ---
>  arch/x86/include/asm/kvm_host.h |2 +
>  arch/x86/kvm/kvm_timer.h|2 +
>  arch/x86/kvm/lapic.c|  123 ---
>  arch/x86/kvm/lapic.h|3 +
>  arch/x86/kvm/x86.c  |   16 +-
>  5 files changed, 123 insertions(+), 23 deletions(-)

Looks good, please rebase against branch master of

git://github.com/avikivity/kvm.git 



Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT

2011-09-22 Thread Kevin O'Connor
On Thu, Sep 22, 2011 at 09:09:49AM +0300, Michael S. Tsirkin wrote:
> On Thu, Sep 22, 2011 at 12:35:13AM -0400, Kevin O'Connor wrote:
> > On Wed, Sep 21, 2011 at 03:44:13PM +0300, Michael S. Tsirkin wrote:
> > > The correct way to suppress hotplug is not to have _EJ0,
> > > so this is what this patch does: it probes PIIX and
> > > modifies DSDT to match.
> > 
> > The code to generate basic SSDT code isn't that difficult (see
> > build_ssdt and src/ssdt-proc.dsl).  Is there a compelling reason to
> > patch the DSDT versus just generating the necessary blocks in an SSDT?
> 
> I don't really care whether the code is in DSDT or SSDT;
> IMO there isn't much difference between build_ssdt and patching:
> the main reason is that build_ssdt uses offsets hardcoded to a specific
> binary (ssdt_proc and SD_OFFSET_*) while I used
> a script to extract the offsets.
> 
> I think we should avoid relying on a copy-pasted binary
> because I see the related ASL code changing in the near future
> (with multifunction and bridge support among others).
> 
> I can generalize the approach though, so that
> it can work for finding arbitrary names
> without writing more scripts, hopefully with the
> potential to address the hard-coded offsets in acpi.c
> as well. Does that sound interesting?

Replacing the hardcoding of offsets in src/ssdt-proc.dsl would be
nice.

I'll take a look at your new patches tonight.

-Kevin


Re: Best way to exchange data between host and guest?

2011-09-22 Thread Sasha Levin
On Thu, 2011-09-22 at 05:04 -0700, Anjali Kulkarni wrote:
> Hi,
> 
> What is the fastest way to exchange data between host and guest?
> 
> Thanks
> Anjali

With regards to speed, I guess it's the ivshmem device.
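
For a concrete picture, here is a minimal guest-side sketch of how ivshmem is
typically consumed: the shared memory is exposed as a PCI BAR (BAR2 on
ivshmem), which guest userspace can simply mmap through sysfs. The device
address and region size below are assumptions; adjust them to the actual
setup.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical BDF of the ivshmem device inside the guest. */
    const char *bar2 = "/sys/bus/pci/devices/0000:00:04.0/resource2";
    size_t size = 1 << 20;      /* must not exceed the shared region size */

    int fd = open(bar2, O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char *shm = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shm == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* Anything written here is visible to the host (and to other guests
     * sharing the same memory object). */
    strcpy(shm, "hello from the guest");
    munmap(shm, size);
    close(fd);
    return 0;
}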

-- 

Sasha.



[PATCHv2] virtio-console: wait for console ports

2011-09-22 Thread Christian Borntraeger
On s390 I have seen some random "Warning: unable to open an initial
console" boot failures. It turns out that tty_open fails because the
hvc_alloc was not yet done. In former times this could not happen,
since the probe function automatically called hvc_alloc. With newer
versions (multiport) some host<->guest interaction is required
before hvc_alloc is called. This might be too late, especially if
an initramfs is involved. Let's use a completion if we have
multiport and an early console.

Signed-off-by: Christian Borntraeger 


---
 drivers/char/virtio_console.c |   20 
 1 file changed, 20 insertions(+)

Index: b/drivers/char/virtio_console.c
===
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -19,6 +19,7 @@
  */
 #include <linux/cdev.h>
 #include <linux/debugfs.h>
+#include <linux/completion.h>
 #include <linux/device.h>
 #include <linux/err.h>
 #include <linux/fs.h>
@@ -73,6 +74,7 @@ struct ports_driver_data {
 static struct ports_driver_data pdrvdata;
 
 DEFINE_SPINLOCK(pdrvdata_lock);
+DECLARE_COMPLETION(port_added);
 
 /* This struct holds information that's relevant only for console ports */
 struct console {
@@ -1352,6 +1354,7 @@ static void handle_control_message(struc
break;
 
init_port_console(port);
+   complete(&port_added);
/*
 * Could remove the port here in case init fails - but
 * have to notify the host first.
@@ -1648,6 +1651,10 @@ static int __devinit virtcons_probe(stru
struct ports_device *portdev;
int err;
bool multiport;
+   bool early = early_put_chars != 0;
+
+   /* Ensure to read early_put_chars now */
+   barrier();
 
portdev = kmalloc(sizeof(*portdev), GFP_KERNEL);
if (!portdev) {
@@ -1719,6 +1726,19 @@ static int __devinit virtcons_probe(stru
 
__send_control_msg(portdev, VIRTIO_CONSOLE_BAD_ID,
   VIRTIO_CONSOLE_DEVICE_READY, 1);
+
+   /*
+* If there was an early virtio console, assume that there are no
+* other consoles. We need to wait until the hvc_alloc matches the
+* hvc_instantiate, otherwise tty_open will complain, resulting in
+* a "Warning: unable to open an initial console" boot failure.
+* Without multiport this is done in add_port above. With multiport
+* this might take some host<->guest communication - thus we have to
+* wait.
+*/
+   if (multiport && early)
+   wait_for_completion(&port_added);
+
return 0;
 
 free_vqs:


Best way to exchange data between host and guest?

2011-09-22 Thread Anjali Kulkarni
Hi,

What is the fastest way to exchange data between host and guest?

Thanks
Anjali



[PATCH] KVM: Enable "fast string operations" in MSR_IA32_MISC_ENABLE

2011-09-22 Thread Avi Kivity
Recent (3.0+) Linux guests check for the fast string bit in
MSR_IA32_MISC_ENABLE before enabling rep/movs-based memcpy and
related operations on family 6, model 13+ processors.

Enable the bit by default, as required by the specification.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/x86.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b37f18..459f2bf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6374,6 +6374,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 {
+   vcpu->arch.ia32_misc_enable_msr = MSR_IA32_MISC_ENABLE_FAST_STRING;
+
vcpu->arch.nmi_pending = false;
vcpu->arch.nmi_injected = false;
 
-- 
1.7.6.3



[PATCHv2 3/3] acpi: remove _RMV

2011-09-22 Thread Michael S. Tsirkin
The gen_pci_device macro is only used to add a _RMV
method to each slot device, so it is no longer needed:
the presence of _EJ0 now indicates that the slot is ejectable.
It also places two devices with the same _ADR
on the same bus, which isn't defined by the ACPI spec.
So let's remove it.

Signed-off-by: Michael S. Tsirkin 
---
 src/acpi-dsdt.dsl |   49 -
 1 files changed, 0 insertions(+), 49 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 440e315..055202b 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -145,12 +145,6 @@ DefinitionBlock (
 {
 B0EJ, 32,
 }
-
-OperationRegion(RMVC, SystemIO, 0xae0c, 0x04)
-Field(RMVC, DWordAcc, NoLock, WriteAsZeros)
-{
-PCRM, 32,
-}
 // Method _EJ0 can be patched by BIOS to EJ0_
 // at runtime, if the slot is detected to not support hotplug.
 // Extract the offset of the address dword and the
@@ -488,49 +482,6 @@ DefinitionBlock (
DRSJ, 32
}
}
-
-#define gen_pci_device(name, nr)\
-Device(SL##name) {  \
-Name (_ADR, nr##)   \
-Method (_RMV) { \
-If (And(\_SB.PCI0.PCRM, ShiftLeft(1, nr))) {\
-Return (0x1)\
-}   \
-Return (0x0)\
-}   \
-Name (_SUN, name)   \
-}
-
-/* VGA (slot 1) and ISA bus (slot 2) defined above */
-   gen_pci_device(3, 0x0003)
-   gen_pci_device(4, 0x0004)
-   gen_pci_device(5, 0x0005)
-   gen_pci_device(6, 0x0006)
-   gen_pci_device(7, 0x0007)
-   gen_pci_device(8, 0x0008)
-   gen_pci_device(9, 0x0009)
-   gen_pci_device(10, 0x000a)
-   gen_pci_device(11, 0x000b)
-   gen_pci_device(12, 0x000c)
-   gen_pci_device(13, 0x000d)
-   gen_pci_device(14, 0x000e)
-   gen_pci_device(15, 0x000f)
-   gen_pci_device(16, 0x0010)
-   gen_pci_device(17, 0x0011)
-   gen_pci_device(18, 0x0012)
-   gen_pci_device(19, 0x0013)
-   gen_pci_device(20, 0x0014)
-   gen_pci_device(21, 0x0015)
-   gen_pci_device(22, 0x0016)
-   gen_pci_device(23, 0x0017)
-   gen_pci_device(24, 0x0018)
-   gen_pci_device(25, 0x0019)
-   gen_pci_device(26, 0x001a)
-   gen_pci_device(27, 0x001b)
-   gen_pci_device(28, 0x001c)
-   gen_pci_device(29, 0x001d)
-   gen_pci_device(30, 0x001e)
-   gen_pci_device(31, 0x001f)
 }
 
 /* PCI IRQs */
-- 
1.7.5.53.gc233e


[PATCHv2 2/3] acpi: EJ0 method name patching

2011-09-22 Thread Michael S. Tsirkin
Modify ACPI to only supply _EJ0 methods for PCI
slots that support hotplug.

This is done by runtime patching:
- Instrument ASL code with ACPI_EXTRACT directives
  tagging _EJ0 and _ADR fields.
- At compile time, tools/acpi_extract.py looks for these methods
  in ASL source finds the matching AML, and stores the offsets
  of these methods in tables named aml_ej0_name and aml_adr_dword.
- At run time, go over aml_ej0_name, use aml_adr_dword
  to get slot information and check which slots
  support hotplug.

  If hotplug is disabled, we patch the _EJ0 NameString in ACPI table,
  replacing _EJ0 with EJ0_.

  Note that this has the same checksum, but
  is ignored by OSPM.

Note: the method used is robust in that we don't need
to change any offsets manually in case of ASL code changes.
As all parsing is done at compile time, any unexpected input causes a
build failure, not a runtime failure.

Signed-off-by: Michael S. Tsirkin 
---
 src/acpi-dsdt.dsl |   47 ++-
 src/acpi.c|   31 +++
 2 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 08412e2..440e315 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -16,6 +16,30 @@
  * License along with this library; if not, write to the Free Software
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
+
+/*
+Documentation of ACPI_EXTRACT_* directive tags:
+
+These directive tags are processed by tools/acpi_extract.py
+to output offset information from AML for BIOS runtime table generation.
+Each directive is of the form:
+ACPI_EXTRACT_<type> <array_name> <Operator> (...)
+and causes the extractor to create an array
+named <array_name> with offset, in the generated AML,
+of an object of a given type from the following <Operator>.
+
+A directive and array name must fit on a single code line.
+
+Object type in AML is verified, a mismatch causes a build failure.
+
+Directives and operators currently supported are:
+ACPI_EXTRACT_NAME_DWORD_CONST - extract a Dword Const object from Name()
+ACPI_EXTRACT_METHOD_STRING - extract a NameString from Method()
+ACPI_EXTRACT_NAME_STRING - extract a NameString from Name()
+
+ACPI_EXTRACT is not allowed anywhere else in code, except in comments.
+*/
+
 DefinitionBlock (
 "acpi-dsdt.aml",// Output Filename
 "DSDT", // Signature
@@ -127,15 +151,20 @@ DefinitionBlock (
 {
 PCRM, 32,
 }
-
-#define hotplug_slot(name, nr) \
-Device (S##name) {\
-   Name (_ADR, nr##)  \
-   Method (_EJ0,1) {  \
-Store(ShiftLeft(1, nr), B0EJ) \
-Return (0x0)  \
-   }  \
-   Name (_SUN, name)  \
+// Method _EJ0 can be patched by BIOS to EJ0_
+// at runtime, if the slot is detected to not support hotplug.
+// Extract the offset of the address dword and the
+// _EJ0 name to allow this patching.
+#define hotplug_slot(name, nr)\
+Device (S##name) {\
+ACPI_EXTRACT_NAME_DWORD_CONST aml_adr_dword   \
+Name (_ADR, nr##) \
+ACPI_EXTRACT_METHOD_STRING aml_ej0_name   \
+Method  (_EJ0, 1) {   \
+Store(ShiftLeft(1, nr), B0EJ) \
+Return (0x0)  \
+   }  \
+   Name (_SUN, name)  \
 }
 
hotplug_slot(1, 0x0001)
diff --git a/src/acpi.c b/src/acpi.c
index 6bb6ff6..f65f974 100644
--- a/src/acpi.c
+++ b/src/acpi.c
@@ -198,6 +198,8 @@ struct srat_memory_affinity
 u32reserved3[2];
 } PACKED;
 
+#define PCI_RMV_BASE 0xae0c
+
 #include "acpi-dsdt.hex"
 
 static void
@@ -237,12 +239,16 @@ static const struct pci_device_id fadt_init_tbl[] = {
 PCI_DEVICE_END
 };
 
+extern void link_time_assertion(void);
+
 static void *
 build_fadt(struct pci_device *pci)
 {
 struct fadt_descriptor_rev1 *fadt = malloc_high(sizeof(*fadt));
 struct facs_descriptor_rev1 *facs = memalign_high(64, sizeof(*facs));
 void *dsdt = malloc_high(sizeof(AmlCode));
+u32 rmvc_pcrm;
+int i;
 
 if (!fadt || !facs || !dsdt) {
 warn_noalloc();
@@ -257,6 +263,25 @@ build_fadt(struct pci_device *pci)
 /* DSDT */
 memcpy(dsdt, AmlCode, sizeof(AmlCode));
 
+/* Runtime patching of EJ0: to disable hotplug for a slot,
+ * replace the method name: _EJ0 by EJ0_. */
+if (ARRAY_SIZE(aml_ej0_name) != ARRAY_SIZE(aml_adr_dword)) {
+link_time_assertion();
+}
+rmvc_pcrm = inl(PCI_RMV_BASE);
+fo
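
[The listing is cut off above, right where the runtime patching loop starts.
Going by the commit description and the arrays the extractor generates
(aml_ej0_name, aml_adr_dword), the remainder presumably looks roughly like
the sketch below; the exact byte offsets are assumptions, not the literal
rest of the patch.]

    for (i = 0; i < ARRAY_SIZE(aml_ej0_name); i++) {
        /* The slot number sits in byte 2 of the _ADR dword (device << 16). */
        u8 slot = ((u8 *)dsdt)[aml_adr_dword[i] + 2] & 0x1f;

        /* A set bit in the PIIX removability register means the slot is
         * hot-pluggable, so keep its _EJ0; otherwise rename the method to
         * EJ0_ so OSPM ignores it (same length, same checksum). */
        if (!(rmvc_pcrm & (1 << slot)))
            memcpy((char *)dsdt + aml_ej0_name[i], "EJ0_", 4);
    }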

[PATCHv2 1/3] acpi: generate and parse mixed asl/aml listing

2011-09-22 Thread Michael S. Tsirkin
Use iasl -l flag to produce a mixed listing, where a
source line is followed by matching AML.

Add a tool tools/acpi_extract.py to process this
listing. The tool looks for ACPI_EXTRACT tags
in the ASL source and outputs matching AML offsets
in an array.

To make these directives pass through ASL without affecting AML,
and to make it possible to match AML to source exactly,
add a preprocessing stage, which prepares input for iasl,
and puts each ACPI_EXTRACT tag within a comment,
on a line by itself.

Signed-off-by: Michael S. Tsirkin 
---
 Makefile |   10 +-
 tools/acpi_extract.py|  195 ++
 tools/acpi_extract_preprocess.py |   37 +++
 3 files changed, 238 insertions(+), 4 deletions(-)
 create mode 100755 tools/acpi_extract.py
 create mode 100755 tools/acpi_extract_preprocess.py

diff --git a/Makefile b/Makefile
index 109091b..541b080 100644
--- a/Makefile
+++ b/Makefile
@@ -192,11 +192,13 @@ $(OUT)vgabios.bin: $(OUT)vgabios.bin.raw tools/buildrom.py
$(Q)./tools/buildrom.py $< $@
 
 ### dsdt build rules
-src/%.hex: src/%.dsl
+src/%.hex: src/%.dsl ./tools/acpi_extract_preprocess.py ./tools/acpi_extract.py
@echo "Compiling DSDT"
-   $(Q)cpp -P $< > $(OUT)$*.dsl.i
-   $(Q)iasl -tc -p $(OUT)$* $(OUT)$*.dsl.i
-   $(Q)cp $(OUT)$*.hex $@
+   $(Q)cpp -P $< > $(OUT)$*.dsl.i.orig
+   $(Q)./tools/acpi_extract_preprocess.py $(OUT)$*.dsl.i.orig > $(OUT)$*.dsl.i
+   $(Q)iasl -l -tc -p $(OUT)$* $(OUT)$*.dsl.i
+   $(Q)./tools/acpi_extract.py $(OUT)$*.lst > $(OUT)$*.off
+   $(Q)cat $(OUT)$*.hex $(OUT)$*.off > $@
 
 $(OUT)ccode32flat.o: src/acpi-dsdt.hex
 
diff --git a/tools/acpi_extract.py b/tools/acpi_extract.py
new file mode 100755
index 000..67efe35
--- /dev/null
+++ b/tools/acpi_extract.py
@@ -0,0 +1,195 @@
+#!/usr/bin/python
+
+# Process mixed ASL/AML listing (.lst file) produced by iasl -l
+# Locate and execute ACPI_EXTRACT directives, output offset info
+# 
+# Documentation of ACPI_EXTRACT_* directive tags:
+# 
+# These directive tags output offset information from AML for BIOS runtime
+# table generation.
+# Each directive is of the form:
+# ACPI_EXTRACT_<type> <array_name> <Operator> (...)
+# and causes the extractor to create an array
+# named <array_name> with offset, in the generated AML,
+# of an object of a given type in the following <Operator>.
+# 
+# A directive must fit on a single code line.
+# 
+# Object type in AML is verified, a mismatch causes a build failure.
+# 
+# Directives and operators currently supported are:
+# ACPI_EXTRACT_NAME_DWORD_CONST - extract a Dword Const object from Name()
+# ACPI_EXTRACT_METHOD_STRING - extract a NameString from Method()
+# ACPI_EXTRACT_NAME_STRING - extract a NameString from Name()
+# 
+# ACPI_EXTRACT is not allowed anywhere else in code, except in comments.
+
+import re;
+import sys;
+import fileinput;
+
+aml = []
+asl = []
+output = {}
+debug = ""
+
+class asl_line:
+line = None
+lineno = None
+aml_offset = None
+
+def die(diag):
+sys.stderr.write("Error: %s; %s\n" % (diag, debug))
+sys.exit(1)
+
+#Store an ASL command, matching AML offset, and input line (for debugging)
+def add_asl(lineno, line):
+l = asl_line()
+l.line = line
+l.lineno = lineno
+l.aml_offset = len(aml)
+asl.append(l)
+
+#Store an AML byte sequence
+#Verify that offset output by iasl matches # of bytes so far
+def add_aml(offset, line):
+o = int(offset, 16);
+# Sanity check: offset must match size of code so far
+if (o != len(aml)):
+die("Offset 0x%x != 0x%x" % (o, len(aml)))
+# Strip any trailing dots and ASCII dump after "
+line = re.sub(r'\s*\.*\s*".*$',"", line)
+# Strip traling whitespace
+line = re.sub(r'\s+$',"", line)
+# Strip leading whitespace
+line = re.sub(r'^\s+',"", line)
+# Split on whitespace
+code = re.split(r'\s+', line)
+for c in code:
+# Require a legal hex number, two digits
+if (not(re.search(r'^[0-9A-Fa-f][0-9A-Fa-f]$', c))):
+die("Unexpected octet %s" % c);
+aml.append(int(c, 16));
+
+# Process aml bytecode array, decoding AML
+
+# Given method offset, find its NameString offset
+def aml_method_string(offset):
+#0x14 MethodOp PkgLength NameString MethodFlags TermList
+if (aml[offset] != 0x14):
+die( "Method offset 0x%x: expected 0x14 actual 0x%x" %
+ (offset, aml[offset]));
+offset += 1;
+# PkgLength can be multibyte. Bits 8-7 give the # of extra bytes.
+pkglenbytes = aml[offset] >> 6;
+offset += 1 + pkglenbytes;
+return offset;
+
+# Given name offset, find its NameString offset
+def aml_name_string(offset):
+#0x08 NameOp NameString DataRef
+if (aml[offset] != 0x08):
+die( "Name offset 0x%x: expected 0x08 actual 0x%x" %
+ (offset, aml[offset]));
+return offset + 1;
+
+# Given data offset, find dword const offset
+def aml_data_dword_const(offset):
+#0x08 NameOp NameString Dat

[PATCHv2 0/3] acpi: fix up EJ0 in DSDT

2011-09-22 Thread Michael S. Tsirkin
This is a second iteration of the patch.  The patch has been
significantly reworked to address (offline) comments by Gleb.

I think the infrastructure created is generic enough
to be useful beyond the specific bug
that I would like to fix. Specifically, it
will be able to find the S3 Name object in order to patch it,
or to process the compiled CPU SSDT to avoid the need for
hardcoded offsets.

Please comment.

Main changes:
- tools rewritten in python
- Original ASL retains _EJ0 methods; the BIOS patches them to EJ0_
- generic ACPI_EXTRACT infrastructure that can match Method
  and Name Operators
- instead of matching a specific method name, insert tags
  in the original DSL source and match that to the AML

-

Here's a bug: the guest thinks it can eject the VGA device and the ISA bridge.

[root@dhcp74-172 ~]#lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 PCI bridge: Red Hat, Inc. Device 0001
00:04.0 Ethernet controller: Qumranet, Inc. Virtio network device
00:05.0 SCSI storage controller: Qumranet, Inc. Virtio block device
01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 03)

[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/1/
adapter  address  attention  latch  module  power
[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/2/
adapter  address  attention  latch  module  power

[root@dhcp74-172 ~]# echo 0 > /sys/bus/pci/slots/2/power 
[root@dhcp74-172 ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:03.0 PCI bridge: Red Hat, Inc. Device 0001
00:04.0 Ethernet controller: Qumranet, Inc. Virtio network device
00:05.0 SCSI storage controller: Qumranet, Inc. Virtio block device
01:00.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet 
Controller (rev 03)

This is wrong because slots 1 and 2 are marked as not hotpluggable
in qemu.

The reason is that our ACPI tables declare both _RMV with value 0
and an _EJ0 method for these slots. What happens in this case
is not documented by the ACPI spec, so Linux ignores _RMV
and Windows seems to ignore _EJ0.

The correct way to suppress hotplug is not to have _EJ0,
so this is what this patch does: it probes PIIX and
modifies DSDT to match.

With these patches applied, we get:

[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/1/
address
[root@dhcp74-172 ~]# ls /sys/bus/pci/slots/2/
address



Michael S. Tsirkin (3):
  acpi: generate and parse mixed asl/aml listing
  acpi: EJ0 method name patching
  acpi: remove _RMV

 Makefile |   10 +-
 src/acpi-dsdt.dsl|   96 ---
 src/acpi.c   |   31 ++
 tools/acpi_extract.py|  195 ++
 tools/acpi_extract_preprocess.py |   37 +++
 5 files changed, 307 insertions(+), 62 deletions(-)
 create mode 100755 tools/acpi_extract.py
 create mode 100755 tools/acpi_extract_preprocess.py

-- 
1.7.5.53.gc233e


Re: [RFC/PATCH] virtio-console: wait for console ports

2011-09-22 Thread Amit Shah
On (Thu) 22 Sep 2011 [13:20:07], Christian Borntraeger wrote:
> On 22/09/11 12:08, Amit Shah wrote:
> >> +  /* If there was an early virtio console, assume that there are no
> >> +   * other consoles. We need to wait until the hvc_alloc matches the
> >> +   * hvc_instantiate, otherwise tty_open will complain, resulting in
> >> +   * a "Warning: unable to open an initial console" boot failure.
> >> +   * Without multiport this is done in add_port above. With multiport
> >> +   * this might take some host<->guest communication - thus we have to
> >> +   * wait. */
> > 
> > This file uses comments in the form
> >   /*
> >* ...
> >*/
> 
> Will fix.
> 
> > 
> >> +  if (multiport && early)
> >> +  wait_for_completion(&port_added);
> >> +
> > 
> > Can there be a problem to not timeout this wait?  Maybe it's not a
> > real problem; just thinking out aloud.
> 
> I had the same thoughts. I then asked myself how big the timeout has to be,
> and the answer was it really depends on the host load. So we can certainly use
> wait_for_completion_timeout(&port_added, HZ*);
> which will work in 99.9% of all cases. It might still cause spurious boot
> failures, if for some reason it takes too long.

Yes; there's no deterministic way.

> So the big question is: is there a case where virtio is used as an early
> console
> but virtio_console does not register a console during probe? Ideas?

Currently only ppc and s390 use early_console.  And since you're
sending this patch now, I guess you've started using multiport in the
host recently.  So it's really up to you to decide :-)  I think this is
benign as of now.

Amit


Re: [RFC/PATCH] virtio-console: wait for console ports

2011-09-22 Thread Christian Borntraeger
On 22/09/11 12:08, Amit Shah wrote:
>> +/* If there was an early virtio console, assume that there are no
>> + * other consoles. We need to wait until the hvc_alloc matches the
>> + * hvc_instantiate, otherwise tty_open will complain, resulting in
>> + * a "Warning: unable to open an initial console" boot failure.
>> + * Without multiport this is done in add_port above. With multiport
>> + * this might take some host<->guest communication - thus we have to
>> + * wait. */
> 
> This file uses comments in the form
>   /*
>* ...
>*/

Will fix.

> 
>> +if (multiport && early)
>> +wait_for_completion(&port_added);
>> +
> 
> Can there be a problem to not timeout this wait?  Maybe it's not a
> real problem; just thinking out aloud.

I had the same thoughts. I then asked myself how big the timeout has to be,
and the answer was it really depends on the host load. So we can certainly use
wait_for_completion_timeout(&port_added, HZ*);
which will work in 99.9% of all cases. It might still cause spurious boot
failures, if for some reason it takes too long.
So the big question is: is there a case where virtio is used as an early console
but virtio_console does not register a console during probe? Ideas?

Christian


[PATCH 2/2] nVMX: Fix warning-causing idt-vectoring-info behavior

2011-09-22 Thread Nadav Har'El
When L0 wishes to inject an interrupt while L2 is running, it emulates an exit
to L1 with EXIT_REASON_EXTERNAL_INTERRUPT. This was explained in the original
nVMX patch 23, titled "Correct handling of interrupt injection".

Unfortunately, it is possible (though rare) that at this point there is valid
idt_vectoring_info in vmcs02. For example, L1 injected some interrupt to L2,
and when L2 tried to run this interrupt's handler, it got a page fault - so
it returns the original interrupt vector in idt_vectoring_info. The problem
is that if this is the case, we cannot exit to L1 with EXTERNAL_INTERRUPT
like we wished to, because the VMX spec guarantees that idt_vectoring_info
and exit_reason_external_interrupt can never happen together. This is not
just specified in the spec - a KVM L1 actually prints a kernel warning
"unexpected, valid vectoring info" if we violate this guarantee, and some
users noticed these warnings in L1's logs.

In order to better emulate a processor, which would never return the external
interrupt and the idt-vectoring-info together, we need to separate the two
injection steps: First, complete L1's injection into L2 (i.e., enter L2,
injecting to it the idt-vectoring-info); Second, after entry into L2 succeeds
and it exits back to L0, exit to L1 with the EXIT_REASON_EXTERNAL_INTERRUPT.
Most of this is already in the code - the only change we need is to remain
in L2 (and not exit to L1) in this case.

Note that the previous patch ensures (by using KVM_REQ_IMMEDIATE_EXIT) that
although we do enter L2 first, it will exit immediately after processing its
injection, allowing us to promptly inject to L1.

Note how we test vmcs12->idt_vectoring_info_field; This isn't really the
vmcs12 value (we haven't exited to L1 yet, so vmcs12 hasn't been updated),
but rather the place we save, at the end of vmx_vcpu_run, the vmcs02 value
of this field. This was explained in patch 25 ("Correct handling of idt
vectoring info") of the original nVMX patch series.

Thanks to Dave Allan and to Federico Simoncelli for reporting this bug,
to Abel Gordon for helping me figure out the solution, and to Avi Kivity
for helping to improve it.

Signed-off-by: Nadav Har'El 
---
 arch/x86/kvm/vmx.c |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- .before/arch/x86/kvm/vmx.c  2011-09-22 13:51:31.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-09-22 13:51:31.0 +0300
@@ -3993,11 +3993,12 @@ static void vmx_set_nmi_mask(struct kvm_
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
-   struct vmcs12 *vmcs12;
-   if (to_vmx(vcpu)->nested.nested_run_pending)
+   struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+   if (to_vmx(vcpu)->nested.nested_run_pending ||
+   (vmcs12->idt_vectoring_info_field &
+VECTORING_INFO_VALID_MASK))
return 0;
nested_vmx_vmexit(vcpu);
-   vmcs12 = get_vmcs12(vcpu);
vmcs12->vm_exit_reason = EXIT_REASON_EXTERNAL_INTERRUPT;
vmcs12->vm_exit_intr_info = 0;
/* fall through to normal code, but now in L1, not L2 */


[PATCH 1/2] nVMX: Add KVM_REQ_IMMEDIATE_EXIT

2011-09-22 Thread Nadav Har'El
This patch adds a new vcpu->requests bit, KVM_REQ_IMMEDIATE_EXIT.
This bit requests that when next entering the guest, we should run it only
for as little as possible, and exit again.

We use this new option in nested VMX: When L1 launches L2, but L0 wishes L1
to continue running so it can inject an event to it, we unfortunately cannot
just pretend to have run L2 for a little while - We must really launch L2,
otherwise certain one-off vmcs12 parameters (namely, L1 injection into L2)
will be lost. So the existing code runs L2 in this case.
But L2 could potentially run for a long time until it exits, and the
injection into L1 will be delayed. The new KVM_REQ_IMMEDIATE_EXIT allows us
to request that L2 will be entered, as necessary, but will exit as soon as
possible after entry.

Our implementation of this request uses smp_send_reschedule() to send a
self-IPI, with interrupts disabled. The interrupts remain disabled until the
guest is entered, and then, after the entry is complete (often including
processing an injection and jumping to the relevant handler), the physical
interrupt is noticed and causes an exit.

On recent Intel processors, we could have achieved the same goal by using
MTF instead of a self-IPI. Another technique worth considering in the future
is to use VM_EXIT_ACK_INTR_ON_EXIT and a highest-priority vector IPI - to
slightly improve performance by avoiding the useless interrupt handler
which ends up being called when smp_send_reschedule() is used.

Signed-off-by: Nadav Har'El 
---
 arch/x86/kvm/vmx.c   |   11 +++
 arch/x86/kvm/x86.c   |6 ++
 include/linux/kvm_host.h |1 +
 3 files changed, 14 insertions(+), 4 deletions(-)

--- .before/include/linux/kvm_host.h2011-09-22 13:51:31.0 +0300
+++ .after/include/linux/kvm_host.h 2011-09-22 13:51:31.0 +0300
@@ -48,6 +48,7 @@
 #define KVM_REQ_EVENT 11
 #define KVM_REQ_APF_HALT  12
 #define KVM_REQ_STEAL_UPDATE  13
+#define KVM_REQ_IMMEDIATE_EXIT14
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 
--- .before/arch/x86/kvm/x86.c  2011-09-22 13:51:31.0 +0300
+++ .after/arch/x86/kvm/x86.c   2011-09-22 13:51:31.0 +0300
@@ -5610,6 +5610,7 @@ static int vcpu_enter_guest(struct kvm_v
bool nmi_pending;
bool req_int_win = !irqchip_in_kernel(vcpu->kvm) &&
vcpu->run->request_interrupt_window;
+   bool req_immediate_exit = 0;
 
if (vcpu->requests) {
if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
@@ -5647,6 +5648,8 @@ static int vcpu_enter_guest(struct kvm_v
}
if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
record_steal_time(vcpu);
+   req_immediate_exit =
+   kvm_check_request(KVM_REQ_IMMEDIATE_EXIT, vcpu);
 
}
 
@@ -5706,6 +5709,9 @@ static int vcpu_enter_guest(struct kvm_v
 
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
 
+   if (req_immediate_exit)
+   smp_send_reschedule(vcpu->cpu);
+
kvm_guest_enter();
 
if (unlikely(vcpu->arch.switch_db_regs)) {
--- .before/arch/x86/kvm/vmx.c  2011-09-22 13:51:31.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2011-09-22 13:51:31.0 +0300
@@ -3858,12 +3858,15 @@ static bool nested_exit_on_intr(struct k
 static void enable_irq_window(struct kvm_vcpu *vcpu)
 {
u32 cpu_based_vm_exec_control;
-   if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
-   /* We can get here when nested_run_pending caused
-* vmx_interrupt_allowed() to return false. In this case, do
-* nothing - the interrupt will be injected later.
+   if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu)) {
+   /*
+* We get here if vmx_interrupt_allowed() said we can't
+* inject to L1 now because L2 must run. Ask L2 to exit
+* right after entry, so we can inject to L1 more promptly.
 */
+   kvm_make_request(KVM_REQ_IMMEDIATE_EXIT, vcpu);
return;
+   }
 
cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] nVMX injection corrections

2011-09-22 Thread Nadav Har'El
The following two patches solve two injection-related nested VMX issues:

 1. When we must run L2 next (namely on L1's VMLAUNCH/VMRESUME), injection
into L1 was delayed for an unknown amount of time - until L2 exits.
We now force (using a self IPI) an exit immediately after entry to L2,
so that the injection into L1 happens promptly.

 2. "unexpected, valid vectoring info" warnings appeared in L1.
These are fixed by correcting the emulation of concurrent L0->L1 and
L1->L2 injections: We cannot inject into L1 until the injection into L2
has been processed.

Patch statistics:
-

 arch/x86/kvm/vmx.c   |   18 +++---
 arch/x86/kvm/x86.c   |6 ++
 include/linux/kvm_host.h |1 +
 3 files changed, 18 insertions(+), 7 deletions(-)

--
Nadav Har'El
IBM Haifa Research Lab
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] virtio-console: wait for console ports

2011-09-22 Thread Amit Shah
Hi Christian,

On (Wed) 21 Sep 2011 [17:52:23], Christian Borntraeger wrote:
> Amit,
> 
> can you have a look at the patch below and give feedback or apply
> if appropriate?

The patch looks good.  Just a couple of comments:

> On s390 I have seen some random "Warning: unable to open an initial
> console" boot failure. Turns out that tty_open fails, because the
> hvc_alloc was not yet done. In former times this could not happen,
> since the probe function automatically called hvc_alloc. With newer
> versions (multiport) some host<->guest interaction is required
> before hvc_alloc is called. This might be too late, especially if
> an initramfs is involved. Lets use a completion if we have
> multiport and an early console.
> 
> Signed-off-by: Christian Borntraeger 
> 
> ---
>  drivers/char/virtio_console.c |   18 ++
>  1 file changed, 18 insertions(+)
> 
> Index: b/drivers/char/virtio_console.c
> ===
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -19,6 +19,7 @@
>   */
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -73,6 +74,7 @@ struct ports_driver_data {
>  static struct ports_driver_data pdrvdata;
>  
>  DEFINE_SPINLOCK(pdrvdata_lock);
> +DECLARE_COMPLETION(port_added);
>  
>  /* This struct holds information that's relevant only for console ports */
>  struct console {
> @@ -1352,6 +1354,7 @@ static void handle_control_message(struc
>   break;
>  
>   init_port_console(port);
> + complete(&port_added);
>   /*
>* Could remove the port here in case init fails - but
>* have to notify the host first.
> @@ -1648,6 +1651,10 @@ static int __devinit virtcons_probe(stru
>   struct ports_device *portdev;
>   int err;
>   bool multiport;
> + bool early = early_put_chars != 0;

Check for NULL instead of 0.  Is it necessary to create this variable
instead of checking for early_put_chars != NULL below?

> +
> + /* Ensure to read early_put_chars now */
> + barrier();
>  
>   portdev = kmalloc(sizeof(*portdev), GFP_KERNEL);
>   if (!portdev) {
> @@ -1719,6 +1726,17 @@ static int __devinit virtcons_probe(stru
>  
>   __send_control_msg(portdev, VIRTIO_CONSOLE_BAD_ID,
>  VIRTIO_CONSOLE_DEVICE_READY, 1);
> +
> + /* If there was an early virtio console, assume that there are no
> +  * other consoles. We need to wait until the hvc_alloc matches the
> +  * hvc_instantiate, otherwise tty_open will complain, resulting in
> +  * a "Warning: unable to open an initial console" boot failure.
> +  * Without multiport this is done in add_port above. With multiport
> +  * this might take some host<->guest communication - thus we have to
> +  * wait. */

This file uses comments in the form
  /*
   * ...
   */

> + if (multiport && early)
> + wait_for_completion(&port_added);
> +

Could it be a problem that this wait never times out?  Maybe it's not a
real problem; just thinking out loud.
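
For instance, a bounded variant could look roughly like this (untested
sketch; vdev is the probe's virtio_device argument):

	if (multiport && early)
		if (!wait_for_completion_timeout(&port_added, 5 * HZ))
			dev_warn(&vdev->dev,
				 "timed out waiting for a console port\n");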

Amit
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] qemu: Fix inject-nmi

2011-09-22 Thread Lai Jiangshan

From: KAMEZAWA Hiroyuki 
Subject: [PATCH] Fix inject-nmi

Currently, inject-nmi sends an NMI to all cpus, but this does not emulate
the PC hardware 'NMI button', which triggers LINT1.

So the LINT1 mask is ignored by inject-nmi and NMIs are sent to all cpus
without checking it.

Because Linux masks LINT1 on cpus other than cpu 0, this causes trouble;
for example, kdump sometimes fails to run.
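
Routing the NMI through the LVT means the per-cpu LINT1 mask is honored
automatically; conceptually, the local delivery path does something like
this (simplified sketch):

	uint32_t lvt = s->lvt[APIC_LVT_LINT1];

	if (lvt & (1 << 16))	/* LVT mask bit: Linux sets it on cpus != 0 */
		return;		/* so only the boot cpu actually gets the NMI */
	/* otherwise deliver according to the LVT's delivery mode (NMI) */
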
---
 hw/apic.c |7 +++
 hw/apic.h |1 +
 monitor.c |4 ++--
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 69d6ac5..020305b 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -205,6 +205,13 @@ void apic_deliver_pic_intr(DeviceState *d, int level)
 }
 }
 
+void apic_deliver_lint1_intr(DeviceState *d)
+{
+APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
+
+   apic_local_deliver(s, APIC_LVT_LINT1);
+}
+
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
 int __i, __j, __mask;\
diff --git a/hw/apic.h b/hw/apic.h
index c857d52..7ccf214 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -10,6 +10,7 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
  uint8_t trigger_mode);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
+void apic_deliver_lint1_intr(DeviceState *s);
 int apic_get_interrupt(DeviceState *s);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
diff --git a/monitor.c b/monitor.c
index cb485bf..d740478 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2614,9 +2614,9 @@ static void do_wav_capture(Monitor *mon, const QDict 
*qdict)
 static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
 CPUState *env;
-
+/* This emulates hardware NMI button. So, trigger LINT1 */
 for (env = first_cpu; env != NULL; env = env->next_cpu) {
-cpu_interrupt(env, CPU_INTERRUPT_NMI);
+apic_deliver_lint1_intr(env->apic_state);
 }
 
 return 0;
-- 
1.7.4.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Virtualization for ARM CPUs

2011-09-22 Thread MatteoGmail

Hi to all,

I'm interested in virtualization support for ARM architectures.

After googling, examining /proc/cpuinfo and reading ARM-related posts on
this mailing list, I still don't know the level of hardware virtualization
support in the various ARM CPUs.

Could anyone please point me to some detailed resources on the subject?

Thanks in advance. Matteo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] pci-assign: Fix MSI-X registration

2011-09-22 Thread Jan Kiszka
On 2011-09-22 05:12, Alex Williamson wrote:
> Commit c4525754 added a capability check for KVM_CAP_DEVICE_MSIX,
> which is unfortunately not exposed, resulting in MSIX never
> being listed as a capability. 

Oops. Should we fix this nevertheless in the kernel?

> This breaks anything depending on
> MSIX, such as igbvf.  Since we can't specifically check for MSIX
> support and KVM_CAP_ASSIGN_DEV_IRQ indicates more than just MSI,
> let's just revert c4525754 and replace it with a sanity check that
> we need KVM_CAP_ASSIGN_DEV_IRQ if the device supports any kind of
> interrupt (which is still mostly paranoia).
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  hw/device-assignment.c |   13 +
>  1 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 93913b3..b5bde68 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1189,8 +1189,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
> *pci_dev)
>  
>  /* Expose MSI capability
>   * MSI capability is the 1st capability in capability config */
> -pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI, 0);
> -if (pos != 0 && kvm_check_extension(kvm_state, KVM_CAP_ASSIGN_DEV_IRQ)) {
> +if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI, 0))) {
>  dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
>  /* Only 32-bit/no-mask currently supported */
>  if ((ret = pci_add_capability(pci_dev, PCI_CAP_ID_MSI, pos, 10)) < 
> 0) {
> @@ -1211,8 +1210,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
> *pci_dev)
>  pci_set_word(pci_dev->wmask + pos + PCI_MSI_DATA_32, 0x);
>  }
>  /* Expose MSI-X capability */
> -pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
> -if (pos != 0 && kvm_check_extension(kvm_state, KVM_CAP_DEVICE_MSIX)) {
> +if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0))) {
>  int bar_nr;
>  uint32_t msix_table_entry;
>  
> @@ -1606,6 +1604,13 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>  if (assigned_device_pci_cap_init(pci_dev) < 0)
>  goto out;
>  
> +if (!kvm_check_extension(kvm_state, KVM_CAP_ASSIGN_DEV_IRQ) &&
> +(dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX ||
> + dev->cap.available & ASSIGNED_DEVICE_CAP_MSI ||
> + assigned_dev_pci_read_byte(pci_dev, PCI_INTERRUPT_PIN) != 0)) {
> +goto out;
> +}
> +

That's not equivalent as it needlessly prevents IRQ support in the
absence of KVM_CAP_ASSIGN_DEV_IRQ.

Let's just fix the core issue and replace the test for
KVM_CAP_DEVICE_MSIX with a test call of KVM_ASSIGN_SET_MSIX_NR, passing
in a NULL struct. If it returns -EFAULT, the IOCTL is known and MSIX is
supported.
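
Something along these lines (untested sketch, with a hypothetical helper
name):

static bool kvm_device_msix_supported(KVMState *s)
{
    /* A NULL argument makes the kernel's copy_from_user() fail with
     * -EFAULT if KVM_ASSIGN_SET_MSIX_NR is implemented; an unknown
     * ioctl returns -ENOTTY instead. */
    return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, NULL) == -EFAULT;
}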

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] pci-assign: Re-order initfn for memory API

2011-09-22 Thread Jan Kiszka
On 2011-09-22 05:12, Alex Williamson wrote:
> We now need to scan PCI capabilities and setup an MSI-X page
> before we walk the device resources since the overlay is now
> setup during init instead of at the first mapping by the guest.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  hw/device-assignment.c |   16 
>  1 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 288f80c..93913b3 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -1603,6 +1603,14 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
>  goto out;
>  }
>  
> +if (assigned_device_pci_cap_init(pci_dev) < 0)
> +goto out;
> +
> +/* intercept MSI-X entry page in the MMIO */
> +if (dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX)
> +if (assigned_dev_register_msix_mmio(dev))
> +goto out;
> +

Please adjust the coding style at this chance.

Looks correct otherwise.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 03/11] KVM: x86: retry non-page-table writing instructions

2011-09-22 Thread Xiao Guangrong
If the emulation is caused by a #PF and the instruction is not a page-table
writing instruction, the VM-EXIT was caused by shadow page write protection;
we can zap the shadow page and retry the instruction directly

The idea is from Avi
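
The retry path (the x86.c hunk is cut off above) boils down to roughly the
following, using the fields and helpers added by this patch; gpa stands for
the already-translated guest physical address of the faulting write:

	if (!(emulation_type & EMULTYPE_RETRY))
		return false;
	if (x86_page_table_writing_insn(ctxt))
		return false;			/* page-table writes are emulated as before */
	if (ctxt->eip == vcpu->arch.last_retry_eip &&
	    cr2 == vcpu->arch.last_retry_addr)
		return false;			/* don't loop retrying the same access */

	vcpu->arch.last_retry_eip = ctxt->eip;
	vcpu->arch.last_retry_addr = cr2;
	kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
	return true;				/* re-enter the guest and retry */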

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/include/asm/kvm_host.h|5 
 arch/x86/kvm/emulate.c |5 
 arch/x86/kvm/mmu.c |   25 ++
 arch/x86/kvm/x86.c |   47 
 5 files changed, 77 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index a026507..9a4acf4 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -364,6 +364,7 @@ enum x86_intercept {
 #endif
 
 int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len);
+bool x86_page_table_writing_insn(struct x86_emulate_ctxt *ctxt);
 #define EMULATION_FAILED -1
 #define EMULATION_OK 0
 #define EMULATION_RESTART 1
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ab4241..27a25df 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -443,6 +443,9 @@ struct kvm_vcpu_arch {
 
cpumask_var_t wbinvd_dirty_mask;
 
+   unsigned long last_retry_eip;
+   unsigned long last_retry_addr;
+
struct {
bool halted;
gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
@@ -689,6 +692,7 @@ enum emulation_result {
 #define EMULTYPE_NO_DECODE (1 << 0)
 #define EMULTYPE_TRAP_UD   (1 << 1)
 #define EMULTYPE_SKIP  (1 << 2)
+#define EMULTYPE_RETRY (1 << 3)
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, unsigned long cr2,
int emulation_type, void *insn, int insn_len);
 
@@ -753,6 +757,7 @@ void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu);
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
   const u8 *new, int bytes,
   bool guest_initiated);
+int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index a10950a..8547958 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3702,6 +3702,11 @@ done:
return (rc != X86EMUL_CONTINUE) ? EMULATION_FAILED : EMULATION_OK;
 }
 
+bool x86_page_table_writing_insn(struct x86_emulate_ctxt *ctxt)
+{
+   return ctxt->d & PageTable;
+}
+
 static bool string_insn_completed(struct x86_emulate_ctxt *ctxt)
 {
/* The second termination condition only applies for REPE
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b01afee..4e53d6b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1997,7 +1997,7 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned 
int goal_nr_mmu_pages)
kvm->arch.n_max_mmu_pages = goal_nr_mmu_pages;
 }
 
-static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
+int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 {
struct kvm_mmu_page *sp;
struct hlist_node *node;
@@ -2006,7 +2006,7 @@ static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t 
gfn)
 
pgprintk("%s: looking for gfn %llx\n", __func__, gfn);
r = 0;
-
+   spin_lock(&kvm->mmu_lock);
for_each_gfn_indirect_valid_sp(kvm, sp, gfn, node) {
pgprintk("%s: gfn %llx role %x\n", __func__, gfn,
 sp->role.word);
@@ -2014,8 +2014,11 @@ static int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t 
gfn)
kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
}
kvm_mmu_commit_zap_page(kvm, &invalid_list);
+   spin_unlock(&kvm->mmu_lock);
+
return r;
 }
+EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
 
 static void mmu_unshadow(struct kvm *kvm, gfn_t gfn)
 {
@@ -3697,9 +3700,8 @@ int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, 
gva_t gva)
 
gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
 
-   spin_lock(&vcpu->kvm->mmu_lock);
r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
-   spin_unlock(&vcpu->kvm->mmu_lock);
+
return r;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt);
@@ -3720,10 +3722,18 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
 }
 
+static bool is_mmio_page_fault(struct kvm_vcpu *vcpu, gva_t addr)
+{
+   if (vcpu->arch.mmu.direct_map || mmu_is_nested(vcpu))
+   return vcpu_match_mmio_gpa(vcpu, addr);
+
+   return vcpu_match_mmio_gva(vcpu, addr);
+}
+
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
   void *insn, int insn_len)
 {
-   int r;
+   int r, emula

[PATCH] Qemu co-operation with kvm tsc deadline timer

2011-09-22 Thread Liu, Jinsong
>From 8c39f2ddbf7069342826a83e535c0c7b641d6501 Mon Sep 17 00:00:00 2001
From: Liu, Jinsong 
Date: Thu, 22 Sep 2011 16:28:13 +0800
Subject: [PATCH] Qemu co-operation with kvm tsc deadline timer

KVM adds emulation of the lapic tsc deadline timer for the guest.
This patch is the corresponding co-operation work on the qemu side.

Signed-off-by: Liu, Jinsong 
---
 target-i386/cpu.h |2 ++
 target-i386/kvm.c |7 +++
 target-i386/machine.c |1 +
 3 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 935d08a..62ff73c 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -283,6 +283,7 @@
 #define MSR_IA32_APICBASE_BSP   (1<<8)
 #define MSR_IA32_APICBASE_ENABLE(1<<11)
 #define MSR_IA32_APICBASE_BASE  (0xf<<12)
+#define MSR_IA32_TSCDEADLINE0x6e0
 
 #define MSR_MTRRcap0xfe
 #define MSR_MTRRcap_VCNT   8
@@ -687,6 +688,7 @@ typedef struct CPUX86State {
 uint64_t async_pf_en_msr;
 
 uint64_t tsc;
+uint64_t tsc_deadline;
 
 uint64_t mcg_status;
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index aa843f0..2d55070 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -942,6 +942,8 @@ static int kvm_put_msrs(CPUState *env, int level)
 }
 }
 
+kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, env->tsc_deadline);
+
 msr_data.info.nmsrs = n;
 
 return kvm_vcpu_ioctl(env, KVM_SET_MSRS, &msr_data);
@@ -1173,6 +1175,8 @@ static int kvm_get_msrs(CPUState *env)
 }
 }
 
+msrs[n++].index = MSR_IA32_TSCDEADLINE;
+
 msr_data.info.nmsrs = n;
 ret = kvm_vcpu_ioctl(env, KVM_GET_MSRS, &msr_data);
 if (ret < 0) {
@@ -1213,6 +1217,9 @@ static int kvm_get_msrs(CPUState *env)
 case MSR_IA32_TSC:
 env->tsc = msrs[i].data;
 break;
+case MSR_IA32_TSCDEADLINE:
+env->tsc_deadline = msrs[i].data;
+break;
 case MSR_VM_HSAVE_PA:
 env->vm_hsave = msrs[i].data;
 break;
diff --git a/target-i386/machine.c b/target-i386/machine.c
index 9aca8e0..25fa97d 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -410,6 +410,7 @@ static const VMStateDescription vmstate_cpu = {
 VMSTATE_UINT64_V(xcr0, CPUState, 12),
 VMSTATE_UINT64_V(xstate_bv, CPUState, 12),
 VMSTATE_YMMH_REGS_VARS(ymmh_regs, CPUState, CPU_NB_REGS, 12),
+VMSTATE_UINT64_V(tsc_deadline, CPUState, 13),
 VMSTATE_END_OF_LIST()
 /* The above list is not sorted /wrt version numbers, watch out! */
 },
-- 
1.6.5.6




[PATCH v4 11/11] KVM: MMU: improve write flooding detected

2011-09-22 Thread Xiao Guangrong
Detecting write-flooding does not work well. When we handle a page write, if
the last speculative spte has not been accessed we treat the page as
write-flooded; however, we install speculative sptes on many paths, such as
pte prefetch and page sync, so the last speculative spte may not point to the
written page, and the written page may still be accessed via other sptes.
Relying on the Accessed bit of the last speculative spte is therefore not
enough.

Instead of detecting whether the page was accessed, we can detect whether the
spte is accessed after it is written: if the spte is not accessed but is
written frequently, we treat the page as not being a page table, or as not
having been used for a long time.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |6 +--
 arch/x86/kvm/mmu.c  |   62 +++---
 arch/x86/kvm/paging_tmpl.h  |   12 +++
 3 files changed, 32 insertions(+), 48 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 927ba73..9d17238 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -239,6 +239,8 @@ struct kvm_mmu_page {
int clear_spte_count;
 #endif
 
+   int write_flooding_count;
+
struct rcu_head rcu;
 };
 
@@ -353,10 +355,6 @@ struct kvm_vcpu_arch {
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
 
-   gfn_t last_pt_write_gfn;
-   int   last_pt_write_count;
-   u64  *last_pte_updated;
-
struct fpu guest_fpu;
u64 xcr0;
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 13f4d2a..77030ea 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1652,6 +1652,18 @@ static void init_shadow_page_table(struct kvm_mmu_page 
*sp)
sp->spt[i] = 0ull;
 }
 
+static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
+{
+   sp->write_flooding_count = 0;
+}
+
+static void clear_sp_write_flooding_count(u64 *spte)
+{
+   struct kvm_mmu_page *sp =  page_header(__pa(spte));
+
+   __clear_sp_write_flooding_count(sp);
+}
+
 static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 gfn_t gfn,
 gva_t gaddr,
@@ -1695,6 +1707,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
} else if (sp->unsync)
kvm_mmu_mark_parents_unsync(sp);
 
+   __clear_sp_write_flooding_count(sp);
trace_kvm_mmu_get_page(sp, false);
return sp;
}
@@ -1847,15 +1860,6 @@ static void kvm_mmu_put_page(struct kvm_mmu_page *sp, 
u64 *parent_pte)
mmu_page_remove_parent_pte(sp, parent_pte);
 }
 
-static void kvm_mmu_reset_last_pte_updated(struct kvm *kvm)
-{
-   int i;
-   struct kvm_vcpu *vcpu;
-
-   kvm_for_each_vcpu(i, vcpu, kvm)
-   vcpu->arch.last_pte_updated = NULL;
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
u64 *parent_pte;
@@ -1915,7 +1919,6 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, 
struct kvm_mmu_page *sp,
}
 
sp->role.invalid = 1;
-   kvm_mmu_reset_last_pte_updated(kvm);
return ret;
 }
 
@@ -2360,8 +2363,6 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
}
}
kvm_release_pfn_clean(pfn);
-   if (speculative)
-   vcpu->arch.last_pte_updated = sptep;
 }
 
 static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
@@ -3522,13 +3523,6 @@ static void mmu_pte_write_flush_tlb(struct kvm_vcpu 
*vcpu, bool zap_page,
kvm_mmu_flush_tlb(vcpu);
 }
 
-static bool last_updated_pte_accessed(struct kvm_vcpu *vcpu)
-{
-   u64 *spte = vcpu->arch.last_pte_updated;
-
-   return !!(spte && (*spte & shadow_accessed_mask));
-}
-
 static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
const u8 *new, int *bytes)
 {
@@ -3569,22 +3563,16 @@ static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu 
*vcpu, gpa_t *gpa,
  * If we're seeing too many writes to a page, it may no longer be a page table,
  * or we may be forking, in which case it is better to unmap the page.
  */
-static bool detect_write_flooding(struct kvm_vcpu *vcpu, gfn_t gfn)
+static bool detect_write_flooding(struct kvm_mmu_page *sp, u64 *spte)
 {
-   bool flooded = false;
-
-   if (gfn == vcpu->arch.last_pt_write_gfn
-   && !last_updated_pte_accessed(vcpu)) {
-   ++vcpu->arch.last_pt_write_count;
-   if (vcpu->arch.last_pt_write_count >= 3)
-   flooded = true;
-   } else {
-   vcpu->arch.last_pt_write_gfn = gfn;
-   vcpu->arch.last_pt_write_count = 1;
-   vcpu->arch.last_pte_updated = NULL;
-   }
+   /*
+* Skip write-flooding detected for the sp whose level is 1, because
+ 

[PATCH] KVM: emulate lapic tsc deadline timer for guest

2011-09-22 Thread Liu, Jinsong
>From 4d5b83aba40ce0d421add9a41a6c591a8590a32e Mon Sep 17 00:00:00 2001
From: Liu, Jinsong 
Date: Thu, 22 Sep 2011 14:00:08 +0800
Subject: [PATCH 2/2] KVM: emulate lapic tsc deadline timer for guest

This patch emulates the lapic tsc deadline timer for the guest:
Enumerate tsc deadline timer capability by CPUID;
Enable tsc deadline timer mode by lapic MMIO;
Start tsc deadline timer by WRMSR;
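
For reference, a guest typically arms this timer as sketched below
(illustrative only; mmio_write32()/wrmsr()/rdtsc() are placeholder helpers
and the MMIO address assumes the default xAPIC base):

#define APIC_LVTT_REG		0xfee00320	/* LVT timer register at the default xAPIC base */
#define LVTT_TSC_DEADLINE	(2 << 17)	/* timer mode = TSC-deadline */
#define MSR_IA32_TSCDEADLINE	0x6e0

static void arm_tsc_deadline(uint64_t delta, uint8_t vector)
{
	/* prerequisite: CPUID.01H:ECX bit 24 (tsc deadline timer) is set */
	mmio_write32(APIC_LVTT_REG, LVTT_TSC_DEADLINE | vector);
	wrmsr(MSR_IA32_TSCDEADLINE, rdtsc() + delta);	/* fires when TSC >= deadline */
}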

Signed-off-by: Liu, Jinsong 
---
 arch/x86/include/asm/kvm_host.h |2 +
 arch/x86/kvm/kvm_timer.h|2 +
 arch/x86/kvm/lapic.c|  123 ---
 arch/x86/kvm/lapic.h|3 +
 arch/x86/kvm/x86.c  |   16 +-
 5 files changed, 123 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ab4241..b9d4291 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -673,6 +673,8 @@ u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t 
gfn);
 
 extern bool tdp_enabled;
 
+u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
+
 /* control of guest tsc rate supported? */
 extern bool kvm_has_tsc_control;
 /* minimum supported tsc_khz for guests */
diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h
index 64bc6ea..497dbaa 100644
--- a/arch/x86/kvm/kvm_timer.h
+++ b/arch/x86/kvm/kvm_timer.h
@@ -2,6 +2,8 @@
 struct kvm_timer {
struct hrtimer timer;
s64 period; /* unit: ns */
+   u32 timer_mode_mask;
+   u64 tscdeadline;
atomic_t pending;   /* accumulated triggered timers 
*/
bool reinject;
struct kvm_timer_ops *t_ops;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 57dcbd4..66b64b8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -135,9 +135,23 @@ static inline int apic_lvt_vector(struct kvm_lapic *apic, 
int lvt_type)
return apic_get_reg(apic, lvt_type) & APIC_VECTOR_MASK;
 }
 
+static inline int apic_lvtt_oneshot(struct kvm_lapic *apic)
+{
+   return ((apic_get_reg(apic, APIC_LVTT) &
+   apic->lapic_timer.timer_mode_mask) == APIC_LVT_TIMER_ONESHOT);
+}
+
 static inline int apic_lvtt_period(struct kvm_lapic *apic)
 {
-   return apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC;
+   return ((apic_get_reg(apic, APIC_LVTT) &
+   apic->lapic_timer.timer_mode_mask) == APIC_LVT_TIMER_PERIODIC);
+}
+
+static inline int apic_lvtt_tscdeadline(struct kvm_lapic *apic)
+{
+   return ((apic_get_reg(apic, APIC_LVTT) &
+   apic->lapic_timer.timer_mode_mask) ==
+   APIC_LVT_TIMER_TSCDEADLINE);
 }
 
 static inline int apic_lvt_nmi_mode(u32 lvt_val)
@@ -166,7 +180,7 @@ static inline int apic_x2apic_mode(struct kvm_lapic *apic)
 }
 
 static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
-   LVT_MASK | APIC_LVT_TIMER_PERIODIC, /* LVTT */
+   LVT_MASK ,  /* part LVTT mask, timer mode mask added at runtime */
LVT_MASK | APIC_MODE_MASK,  /* LVTTHMR */
LVT_MASK | APIC_MODE_MASK,  /* LVTPC */
LINT_MASK, LINT_MASK,   /* LVT0-1 */
@@ -570,6 +584,9 @@ static u32 __apic_read(struct kvm_lapic *apic, unsigned int 
offset)
break;
 
case APIC_TMCCT:/* Timer CCR */
+   if (apic_lvtt_tscdeadline(apic))
+   return 0;
+
val = apic_get_tmcct(apic);
break;
 
@@ -664,29 +681,32 @@ static void update_divide_count(struct kvm_lapic *apic)
 
 static void start_apic_timer(struct kvm_lapic *apic)
 {
-   ktime_t now = apic->lapic_timer.timer.base->get_time();
-
-   apic->lapic_timer.period = (u64)apic_get_reg(apic, APIC_TMICT) *
-   APIC_BUS_CYCLE_NS * apic->divide_count;
+   ktime_t now;
atomic_set(&apic->lapic_timer.pending, 0);
 
-   if (!apic->lapic_timer.period)
-   return;
-   /*
-* Do not allow the guest to program periodic timers with small
-* interval, since the hrtimers are not throttled by the host
-* scheduler.
-*/
-   if (apic_lvtt_period(apic)) {
-   if (apic->lapic_timer.period < NSEC_PER_MSEC/2)
-   apic->lapic_timer.period = NSEC_PER_MSEC/2;
-   }
+   if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
+   /* lapic timer in oneshot or peroidic mode */
+   now = apic->lapic_timer.timer.base->get_time();
+   apic->lapic_timer.period = (u64)apic_get_reg(apic, APIC_TMICT)
+   * APIC_BUS_CYCLE_NS * apic->divide_count;
+
+   if (!apic->lapic_timer.period)
+   return;
+   /*
+* Do not allow the guest to program periodic timers with small
+* interval, since the hrtimers are not throttled by the host
+* scheduler.
+*/
+   if (apic_lvtt_period(apic))

[PATCH v4 10/11] KVM: MMU: fix detecting misaligned accessed

2011-09-22 Thread Xiao Guangrong
Sometimes we only modify a single byte of a pte to update a status bit; for
example, the linux kernel uses clear_bit() to clear the r/w bit, and that
function is implemented with an 'andb' instruction. In this case
kvm_mmu_pte_write treats the write as a misaligned access and the shadow
page is zapped.
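
A quick worked example of why the old check misfires on such writes (values
follow the detect_write_misaligned() arithmetic in the hunk below):

	/* 8-byte gpte, 'andb' on its first byte: offset & 7 == 0, bytes == 1 */
	misaligned  = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);	/* == 0 */
	misaligned |= bytes < 4;						/* == 1 */

so the write was always flagged as misaligned and the page zapped; the new
early return catches exactly this aligned single-byte case.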

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6e39ec5..13f4d2a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3601,6 +3601,14 @@ static bool detect_write_misaligned(struct kvm_mmu_page 
*sp, gpa_t gpa,
 
offset = offset_in_page(gpa);
pte_size = sp->role.cr4_pae ? 8 : 4;
+
+   /*
+* Sometimes, the OS only writes the last one bytes to update status
+* bits, for example, in linux, andb instruction is used in clear_bit().
+*/
+   if (!(offset & (pte_size - 1)) && bytes == 1)
+   return false;
+
misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
misaligned |= bytes < 4;
 
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 09/11] KVM: MMU: split kvm_mmu_pte_write function

2011-09-22 Thread Xiao Guangrong
kvm_mmu_pte_write is too long; split it up for better readability

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |  194 
 1 files changed, 119 insertions(+), 75 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7b22f3a..6e39ec5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3529,48 +3529,28 @@ static bool last_updated_pte_accessed(struct kvm_vcpu 
*vcpu)
return !!(spte && (*spte & shadow_accessed_mask));
 }
 
-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-  const u8 *new, int bytes)
+static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
+   const u8 *new, int *bytes)
 {
-   gfn_t gfn = gpa >> PAGE_SHIFT;
-   union kvm_mmu_page_role mask = { .word = 0 };
-   struct kvm_mmu_page *sp;
-   struct hlist_node *node;
-   LIST_HEAD(invalid_list);
-   u64 entry, gentry, *spte;
-   unsigned pte_size, page_offset, misaligned, quadrant, offset;
-   int level, npte, r, flooded = 0;
-   bool remote_flush, local_flush, zap_page;
-
-   /*
-* If we don't have indirect shadow pages, it means no page is
-* write-protected, so we can exit simply.
-*/
-   if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
-   return;
-
-   zap_page = remote_flush = local_flush = false;
-   offset = offset_in_page(gpa);
-
-   pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
+   u64 gentry;
+   int r;
 
/*
 * Assume that the pte write on a page table of the same type
 * as the current vcpu paging mode since we update the sptes only
 * when they have the same mode.
 */
-   if (is_pae(vcpu) && bytes == 4) {
+   if (is_pae(vcpu) && *bytes == 4) {
/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-   gpa &= ~(gpa_t)7;
-   bytes = 8;
-
-   r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
+   *gpa &= ~(gpa_t)7;
+   *bytes = 8;
+   r = kvm_read_guest(vcpu->kvm, *gpa, &gentry, min(*bytes, 8));
if (r)
gentry = 0;
new = (const u8 *)&gentry;
}
 
-   switch (bytes) {
+   switch (*bytes) {
case 4:
gentry = *(const u32 *)new;
break;
@@ -3582,71 +3562,135 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t 
gpa,
break;
}
 
-   /*
-* No need to care whether allocation memory is successful
-* or not since pte prefetch is skiped if it does not have
-* enough objects in the cache.
-*/
-   mmu_topup_memory_caches(vcpu);
-   spin_lock(&vcpu->kvm->mmu_lock);
-   ++vcpu->kvm->stat.mmu_pte_write;
-   trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);
+   return gentry;
+}
+
+/*
+ * If we're seeing too many writes to a page, it may no longer be a page table,
+ * or we may be forking, in which case it is better to unmap the page.
+ */
+static bool detect_write_flooding(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+   bool flooded = false;
+
if (gfn == vcpu->arch.last_pt_write_gfn
&& !last_updated_pte_accessed(vcpu)) {
++vcpu->arch.last_pt_write_count;
if (vcpu->arch.last_pt_write_count >= 3)
-   flooded = 1;
+   flooded = true;
} else {
vcpu->arch.last_pt_write_gfn = gfn;
vcpu->arch.last_pt_write_count = 1;
vcpu->arch.last_pte_updated = NULL;
}
 
+   return flooded;
+}
+
+/*
+ * Misaligned accesses are too much trouble to fix up; also, they usually
+ * indicate a page is not used as a page table.
+ */
+static bool detect_write_misaligned(struct kvm_mmu_page *sp, gpa_t gpa,
+   int bytes)
+{
+   unsigned offset, pte_size, misaligned;
+
+   pgprintk("misaligned: gpa %llx bytes %d role %x\n",
+gpa, bytes, sp->role.word);
+
+   offset = offset_in_page(gpa);
+   pte_size = sp->role.cr4_pae ? 8 : 4;
+   misaligned = (offset ^ (offset + bytes - 1)) & ~(pte_size - 1);
+   misaligned |= bytes < 4;
+
+   return misaligned;
+}
+
+static u64 *get_written_sptes(struct kvm_mmu_page *sp, gpa_t gpa, int *nspte)
+{
+   unsigned page_offset, quadrant;
+   u64 *spte;
+   int level;
+
+   page_offset = offset_in_page(gpa);
+   level = sp->role.level;
+   *nspte = 1;
+   if (!sp->role.cr4_pae) {
+   page_offset <<= 1;  /* 32->64 */
+   /*
+* A 32-bit pde maps 4MB while the shadow pdes map
+* only 2MB.  So we need to double the offset again
+* and zap two pdes instead of one.
+*/
+   if (level =

[PATCH v4 08/11] KVM: MMU: remove unnecessary kvm_mmu_free_some_pages

2011-09-22 Thread Xiao Guangrong
In kvm_mmu_pte_write we do not need to allocate shadow pages, so calling
kvm_mmu_free_some_pages there is really unnecessary

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4128aba..7b22f3a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3589,7 +3589,6 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 */
mmu_topup_memory_caches(vcpu);
spin_lock(&vcpu->kvm->mmu_lock);
-   kvm_mmu_free_some_pages(vcpu);
++vcpu->kvm->stat.mmu_pte_write;
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);
if (gfn == vcpu->arch.last_pt_write_gfn
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add some pre-defination

2011-09-22 Thread Liu, Jinsong
>From cab4eb79efc498abbda19c5b10c7d0858349af5f Mon Sep 17 00:00:00 2001
From: Liu, Jinsong 
Date: Thu, 22 Sep 2011 09:49:05 +0800
Subject: [PATCH 1/2] Add some pre-defination

These definitions prepare for the KVM tsc deadline timer emulation, but are
themselves not KVM-specific.

Signed-off-by: Liu, Jinsong 
---
 arch/x86/include/asm/apicdef.h|2 ++
 arch/x86/include/asm/cpufeature.h |1 +
 arch/x86/include/asm/msr-index.h  |2 ++
 3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 34595d5..3925d80 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -100,7 +100,9 @@
 #defineAPIC_TIMER_BASE_CLKIN   0x0
 #defineAPIC_TIMER_BASE_TMBASE  0x1
 #defineAPIC_TIMER_BASE_DIV 0x2
+#defineAPIC_LVT_TIMER_ONESHOT  (0 << 17)
 #defineAPIC_LVT_TIMER_PERIODIC (1 << 17)
+#defineAPIC_LVT_TIMER_TSCDEADLINE  (2 << 17)
 #defineAPIC_LVT_MASKED (1 << 16)
 #defineAPIC_LVT_LEVEL_TRIGGER  (1 << 15)
 #defineAPIC_LVT_REMOTE_IRR (1 << 14)
diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 4258aac..823c4b6 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -120,6 +120,7 @@
 #define X86_FEATURE_X2APIC (4*32+21) /* x2APIC */
 #define X86_FEATURE_MOVBE  (4*32+22) /* MOVBE instruction */
 #define X86_FEATURE_POPCNT  (4*32+23) /* POPCNT instruction */
+#define X86_FEATURE_TSC_DEADLINE_TIMER (4*32+24) /* Tsc deadline timer */
 #define X86_FEATURE_AES(4*32+25) /* AES instructions */
 #define X86_FEATURE_XSAVE  (4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */
 #define X86_FEATURE_OSXSAVE(4*32+27) /* "" XSAVE enabled in the OS */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d52609a..a6962d9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -229,6 +229,8 @@
 #define MSR_IA32_APICBASE_ENABLE   (1<<11)
 #define MSR_IA32_APICBASE_BASE (0xf<<12)
 
+#define MSR_IA32_TSCDEADLINE   0x06e0
+
 #define MSR_IA32_UCODE_WRITE   0x0079
 #define MSR_IA32_UCODE_REV 0x008b
 
-- 
1.6.5.6




[PATCH v4 07/11] KVM: MMU: fast prefetch spte on invlpg path

2011-09-22 Thread Xiao Guangrong
Fast prefetch spte for the unsync shadow page on invlpg path

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |4 +---
 arch/x86/kvm/mmu.c  |   38 +++---
 arch/x86/kvm/paging_tmpl.h  |   30 ++
 arch/x86/kvm/x86.c  |4 ++--
 4 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 58ea3a7..927ba73 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -460,7 +460,6 @@ struct kvm_arch {
unsigned int n_requested_mmu_pages;
unsigned int n_max_mmu_pages;
unsigned int indirect_shadow_pages;
-   atomic_t invlpg_counter;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
 * Hash table of struct kvm_mmu_page.
@@ -754,8 +753,7 @@ int fx_init(struct kvm_vcpu *vcpu);
 
 void kvm_mmu_flush_tlb(struct kvm_vcpu *vcpu);
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-  const u8 *new, int bytes,
-  bool guest_initiated);
+  const u8 *new, int bytes);
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 805a9d5..4128aba 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3530,8 +3530,7 @@ static bool last_updated_pte_accessed(struct kvm_vcpu 
*vcpu)
 }
 
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-  const u8 *new, int bytes,
-  bool guest_initiated)
+  const u8 *new, int bytes)
 {
gfn_t gfn = gpa >> PAGE_SHIFT;
union kvm_mmu_page_role mask = { .word = 0 };
@@ -3540,7 +3539,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
LIST_HEAD(invalid_list);
u64 entry, gentry, *spte;
unsigned pte_size, page_offset, misaligned, quadrant, offset;
-   int level, npte, invlpg_counter, r, flooded = 0;
+   int level, npte, r, flooded = 0;
bool remote_flush, local_flush, zap_page;
 
/*
@@ -3555,19 +3554,16 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
-   invlpg_counter = atomic_read(&vcpu->kvm->arch.invlpg_counter);
-
/*
 * Assume that the pte write on a page table of the same type
 * as the current vcpu paging mode since we update the sptes only
 * when they have the same mode.
 */
-   if ((is_pae(vcpu) && bytes == 4) || !new) {
+   if (is_pae(vcpu) && bytes == 4) {
/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
-   if (is_pae(vcpu)) {
-   gpa &= ~(gpa_t)7;
-   bytes = 8;
-   }
+   gpa &= ~(gpa_t)7;
+   bytes = 8;
+
r = kvm_read_guest(vcpu->kvm, gpa, &gentry, min(bytes, 8));
if (r)
gentry = 0;
@@ -3593,22 +3589,18 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
 */
mmu_topup_memory_caches(vcpu);
spin_lock(&vcpu->kvm->mmu_lock);
-   if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
-   gentry = 0;
kvm_mmu_free_some_pages(vcpu);
++vcpu->kvm->stat.mmu_pte_write;
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);
-   if (guest_initiated) {
-   if (gfn == vcpu->arch.last_pt_write_gfn
-   && !last_updated_pte_accessed(vcpu)) {
-   ++vcpu->arch.last_pt_write_count;
-   if (vcpu->arch.last_pt_write_count >= 3)
-   flooded = 1;
-   } else {
-   vcpu->arch.last_pt_write_gfn = gfn;
-   vcpu->arch.last_pt_write_count = 1;
-   vcpu->arch.last_pte_updated = NULL;
-   }
+   if (gfn == vcpu->arch.last_pt_write_gfn
+   && !last_updated_pte_accessed(vcpu)) {
+   ++vcpu->arch.last_pt_write_count;
+   if (vcpu->arch.last_pt_write_count >= 3)
+   flooded = 1;
+   } else {
+   vcpu->arch.last_pt_write_gfn = gfn;
+   vcpu->arch.last_pt_write_count = 1;
+   vcpu->arch.last_pte_updated = NULL;
}
 
mask.cr0_wp = mask.cr4_pae = mask.nxe = 1;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index d8d3906..9efb860 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -672,20 +672,27 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t 
gva)
 {
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
-   gpa_t pte_gpa = -

[PATCH v4 06/11] KVM: MMU: cleanup FNAME(invlpg)

2011-09-22 Thread Xiao Guangrong
Directly use mmu_page_zap_pte to zap the spte in FNAME(invlpg), and remove
the code duplicated between FNAME(invlpg) and FNAME(sync_page)

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   16 ++--
 arch/x86/kvm/paging_tmpl.h |   44 +---
 2 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6a35024..805a9d5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1808,7 +1808,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, 
u64 *sptep,
}
 }
 
-static void mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
+static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
 u64 *spte)
 {
u64 pte;
@@ -1816,17 +1816,21 @@ static void mmu_page_zap_pte(struct kvm *kvm, struct 
kvm_mmu_page *sp,
 
pte = *spte;
if (is_shadow_present_pte(pte)) {
-   if (is_last_spte(pte, sp->role.level))
+   if (is_last_spte(pte, sp->role.level)) {
drop_spte(kvm, spte);
-   else {
+   if (is_large_pte(pte))
+   --kvm->stat.lpages;
+   } else {
child = page_header(pte & PT64_BASE_ADDR_MASK);
drop_parent_pte(child, spte);
}
-   } else if (is_mmio_spte(pte))
+   return true;
+   }
+
+   if (is_mmio_spte(pte))
mmu_spte_clear_no_track(spte);
 
-   if (is_large_pte(pte))
-   --kvm->stat.lpages;
+   return false;
 }
 
 static void kvm_mmu_page_unlink_children(struct kvm *kvm,
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 9299410..d8d3906 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -656,6 +656,18 @@ out_unlock:
return 0;
 }
 
+static gpa_t FNAME(get_level1_sp_gpa)(struct kvm_mmu_page *sp)
+{
+   int offset = 0;
+
+   WARN_ON(sp->role.level != 1);
+
+   if (PTTYPE == 32)
+   offset = sp->role.quadrant << PT64_LEVEL_BITS;
+
+   return gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
+}
+
 static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
 {
struct kvm_shadow_walk_iterator iterator;
@@ -663,7 +675,6 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
gpa_t pte_gpa = -1;
int level;
u64 *sptep;
-   int need_flush = 0;
 
vcpu_clear_mmio_info(vcpu, gva);
 
@@ -675,36 +686,20 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t 
gva)
 
sp = page_header(__pa(sptep));
if (is_last_spte(*sptep, level)) {
-   int offset, shift;
-
if (!sp->unsync)
break;
 
-   shift = PAGE_SHIFT -
- (PT_LEVEL_BITS - PT64_LEVEL_BITS) * level;
-   offset = sp->role.quadrant << shift;
-
-   pte_gpa = (sp->gfn << PAGE_SHIFT) + offset;
+   pte_gpa = FNAME(get_level1_sp_gpa)(sp);
pte_gpa += (sptep - sp->spt) * sizeof(pt_element_t);
 
-   if (is_shadow_present_pte(*sptep)) {
-   if (is_large_pte(*sptep))
-   --vcpu->kvm->stat.lpages;
-   drop_spte(vcpu->kvm, sptep);
-   need_flush = 1;
-   } else if (is_mmio_spte(*sptep))
-   mmu_spte_clear_no_track(sptep);
-
-   break;
+   if (mmu_page_zap_pte(vcpu->kvm, sp, sptep))
+   kvm_flush_remote_tlbs(vcpu->kvm);
}
 
if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
break;
}
 
-   if (need_flush)
-   kvm_flush_remote_tlbs(vcpu->kvm);
-
atomic_inc(&vcpu->kvm->arch.invlpg_counter);
 
spin_unlock(&vcpu->kvm->mmu_lock);
@@ -769,19 +764,14 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu 
*vcpu, gva_t vaddr,
  */
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
-   int i, offset, nr_present;
+   int i, nr_present = 0;
bool host_writable;
gpa_t first_pte_gpa;
 
-   offset = nr_present = 0;
-
/* direct kvm_mmu_page can not be unsync. */
BUG_ON(sp->role.direct);
 
-   if (PTTYPE == 32)
-   offset = sp->role.quadrant << PT64_LEVEL_BITS;
-
-   first_pte_gpa = gfn_to_gpa(sp->gfn) + offset * sizeof(pt_element_t);
+   first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
 
for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
unsigned pte_access;
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body

[PATCH v4 05/11] KVM: MMU: do not mark accessed bit on pte write path

2011-09-22 Thread Xiao Guangrong
In the current code, the accessed bit is always set when a page fault occurs,
so there is no need to set it on the pte write path

Signed-off-by: Xiao Guangrong 
---
 arch/x86/include/asm/kvm_host.h |1 -
 arch/x86/kvm/mmu.c  |   22 +-
 2 files changed, 1 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 27a25df..58ea3a7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -356,7 +356,6 @@ struct kvm_vcpu_arch {
gfn_t last_pt_write_gfn;
int   last_pt_write_count;
u64  *last_pte_updated;
-   gfn_t last_pte_gfn;
 
struct fpu guest_fpu;
u64 xcr0;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4e53d6b..6a35024 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2206,11 +2206,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (set_mmio_spte(sptep, gfn, pfn, pte_access))
return 0;
 
-   /*
-* We don't set the accessed bit, since we sometimes want to see
-* whether the guest actually used the pte (in order to detect
-* demand paging).
-*/
spte = PT_PRESENT_MASK;
if (!speculative)
spte |= shadow_accessed_mask;
@@ -2361,10 +2356,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 
*sptep,
}
}
kvm_release_pfn_clean(pfn);
-   if (speculative) {
+   if (speculative)
vcpu->arch.last_pte_updated = sptep;
-   vcpu->arch.last_pte_gfn = gfn;
-   }
 }
 
 static void nonpaging_new_cr3(struct kvm_vcpu *vcpu)
@@ -3532,18 +3525,6 @@ static bool last_updated_pte_accessed(struct kvm_vcpu 
*vcpu)
return !!(spte && (*spte & shadow_accessed_mask));
 }
 
-static void kvm_mmu_access_page(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-   u64 *spte = vcpu->arch.last_pte_updated;
-
-   if (spte
-   && vcpu->arch.last_pte_gfn == gfn
-   && shadow_accessed_mask
-   && !(*spte & shadow_accessed_mask)
-   && is_shadow_present_pte(*spte))
-   set_bit(PT_ACCESSED_SHIFT, (unsigned long *)spte);
-}
-
 void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
   const u8 *new, int bytes,
   bool guest_initiated)
@@ -3614,7 +3595,6 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
++vcpu->kvm->stat.mmu_pte_write;
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);
if (guest_initiated) {
-   kvm_mmu_access_page(vcpu, gfn);
if (gfn == vcpu->arch.last_pt_write_gfn
&& !last_updated_pte_accessed(vcpu)) {
++vcpu->arch.last_pt_write_count;
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 04/11] KVM: x86: cleanup port-in/port-out emulated

2011-09-22 Thread Xiao Guangrong
Remove the code duplicated between emulator_pio_in_emulated and
emulator_pio_out_emulated

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/x86.c |   59 ++-
 1 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 727a6af..a69a3e5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4327,32 +4327,24 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
return r;
 }
 
-
-static int emulator_pio_in_emulated(struct x86_emulate_ctxt *ctxt,
-   int size, unsigned short port, void *val,
-   unsigned int count)
+static int emulator_pio_in_out(struct kvm_vcpu *vcpu, int size,
+  unsigned short port, void *val,
+  unsigned int count, bool in)
 {
-   struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
-
-   if (vcpu->arch.pio.count)
-   goto data_avail;
-
-   trace_kvm_pio(0, port, size, count);
+   trace_kvm_pio(!in, port, size, count);
 
vcpu->arch.pio.port = port;
-   vcpu->arch.pio.in = 1;
+   vcpu->arch.pio.in = in;
vcpu->arch.pio.count  = count;
vcpu->arch.pio.size = size;
 
if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
-   data_avail:
-   memcpy(val, vcpu->arch.pio_data, size * count);
vcpu->arch.pio.count = 0;
return 1;
}
 
vcpu->run->exit_reason = KVM_EXIT_IO;
-   vcpu->run->io.direction = KVM_EXIT_IO_IN;
+   vcpu->run->io.direction = in ? KVM_EXIT_IO_IN : KVM_EXIT_IO_OUT;
vcpu->run->io.size = size;
vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE;
vcpu->run->io.count = count;
@@ -4361,36 +4353,37 @@ static int emulator_pio_in_emulated(struct 
x86_emulate_ctxt *ctxt,
return 0;
 }
 
-static int emulator_pio_out_emulated(struct x86_emulate_ctxt *ctxt,
-int size, unsigned short port,
-const void *val, unsigned int count)
+static int emulator_pio_in_emulated(struct x86_emulate_ctxt *ctxt,
+   int size, unsigned short port, void *val,
+   unsigned int count)
 {
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+   int ret;
 
-   trace_kvm_pio(1, port, size, count);
-
-   vcpu->arch.pio.port = port;
-   vcpu->arch.pio.in = 0;
-   vcpu->arch.pio.count = count;
-   vcpu->arch.pio.size = size;
-
-   memcpy(vcpu->arch.pio_data, val, size * count);
+   if (vcpu->arch.pio.count)
+   goto data_avail;
 
-   if (!kernel_pio(vcpu, vcpu->arch.pio_data)) {
+   ret = emulator_pio_in_out(vcpu, size, port, val, count, true);
+   if (ret) {
+data_avail:
+   memcpy(val, vcpu->arch.pio_data, size * count);
vcpu->arch.pio.count = 0;
return 1;
}
 
-   vcpu->run->exit_reason = KVM_EXIT_IO;
-   vcpu->run->io.direction = KVM_EXIT_IO_OUT;
-   vcpu->run->io.size = size;
-   vcpu->run->io.data_offset = KVM_PIO_PAGE_OFFSET * PAGE_SIZE;
-   vcpu->run->io.count = count;
-   vcpu->run->io.port = port;
-
return 0;
 }
 
+static int emulator_pio_out_emulated(struct x86_emulate_ctxt *ctxt,
+int size, unsigned short port,
+const void *val, unsigned int count)
+{
+   struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+
+   memcpy(vcpu->arch.pio_data, val, size * count);
+   return emulator_pio_in_out(vcpu, size, port, (void *)val, count, false);
+}
+
 static unsigned long get_segment_base(struct kvm_vcpu *vcpu, int seg)
 {
return kvm_x86_ops->get_segment_base(vcpu, seg);
-- 
1.7.5.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 02/11] KVM: x86: tag the instructions which are used to write page table

2011-09-22 Thread Xiao Guangrong
The idea is from Avi:
| tag instructions that are typically used to modify the page tables, and
| drop shadow if any other instruction is used.
| The list would include, I'd guess, and, or, bts, btc, mov, xchg, cmpxchg,
| and cmpxchg8b.

This patch tags those instructions; on the later path, the shadow page
is dropped if it is written by any other instruction

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/emulate.c |   37 +
 1 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f1e3be1..a10950a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -125,8 +125,9 @@
 #define Lock(1<<26) /* lock prefix is allowed for the instruction */
 #define Priv(1<<27) /* instruction generates #GP if current CPL != 0 */
 #define No64   (1<<28)
+#define PageTable   (1 << 29)   /* instruction used to write page table */
 /* Source 2 operand type */
-#define Src2Shift   (29)
+#define Src2Shift   (30)
 #define Src2None(OpNone << Src2Shift)
 #define Src2CL  (OpCL << Src2Shift)
 #define Src2ImmByte (OpImmByte << Src2Shift)
@@ -3033,10 +3034,10 @@ static struct opcode group7_rm7[] = {
 
 static struct opcode group1[] = {
I(Lock, em_add),
-   I(Lock, em_or),
+   I(Lock | PageTable, em_or),
I(Lock, em_adc),
I(Lock, em_sbb),
-   I(Lock, em_and),
+   I(Lock | PageTable, em_and),
I(Lock, em_sub),
I(Lock, em_xor),
I(0, em_cmp),
@@ -3096,18 +3097,21 @@ static struct group_dual group7 = { {
 
 static struct opcode group8[] = {
N, N, N, N,
-   D(DstMem | SrcImmByte | ModRM), D(DstMem | SrcImmByte | ModRM | Lock),
-   D(DstMem | SrcImmByte | ModRM | Lock), D(DstMem | SrcImmByte | ModRM | 
Lock),
+   D(DstMem | SrcImmByte | ModRM),
+   D(DstMem | SrcImmByte | ModRM | Lock | PageTable),
+   D(DstMem | SrcImmByte | ModRM | Lock),
+   D(DstMem | SrcImmByte | ModRM | Lock | PageTable),
 };
 
 static struct group_dual group9 = { {
-   N, D(DstMem64 | ModRM | Lock), N, N, N, N, N, N,
+   N, D(DstMem64 | ModRM | Lock | PageTable), N, N, N, N, N, N,
 }, {
N, N, N, N, N, N, N, N,
 } };
 
 static struct opcode group11[] = {
-   I(DstMem | SrcImm | ModRM | Mov, em_mov), X7(D(Undefined)),
+   I(DstMem | SrcImm | ModRM | Mov | PageTable, em_mov),
+   X7(D(Undefined)),
 };
 
 static struct gprefix pfx_0f_6f_0f_7f = {
@@ -3120,7 +3124,7 @@ static struct opcode opcode_table[256] = {
I(ImplicitOps | Stack | No64 | Src2ES, em_push_sreg),
I(ImplicitOps | Stack | No64 | Src2ES, em_pop_sreg),
/* 0x08 - 0x0F */
-   I6ALU(Lock, em_or),
+   I6ALU(Lock | PageTable, em_or),
I(ImplicitOps | Stack | No64 | Src2CS, em_push_sreg),
N,
/* 0x10 - 0x17 */
@@ -3132,7 +3136,7 @@ static struct opcode opcode_table[256] = {
I(ImplicitOps | Stack | No64 | Src2DS, em_push_sreg),
I(ImplicitOps | Stack | No64 | Src2DS, em_pop_sreg),
/* 0x20 - 0x27 */
-   I6ALU(Lock, em_and), N, N,
+   I6ALU(Lock | PageTable, em_and), N, N,
/* 0x28 - 0x2F */
I6ALU(Lock, em_sub), N, I(ByteOp | DstAcc | No64, em_das),
/* 0x30 - 0x37 */
@@ -3165,11 +3169,11 @@ static struct opcode opcode_table[256] = {
G(ByteOp | DstMem | SrcImm | ModRM | No64 | Group, group1),
G(DstMem | SrcImmByte | ModRM | Group, group1),
I2bv(DstMem | SrcReg | ModRM, em_test),
-   I2bv(DstMem | SrcReg | ModRM | Lock, em_xchg),
+   I2bv(DstMem | SrcReg | ModRM | Lock | PageTable, em_xchg),
/* 0x88 - 0x8F */
-   I2bv(DstMem | SrcReg | ModRM | Mov, em_mov),
+   I2bv(DstMem | SrcReg | ModRM | Mov | PageTable, em_mov),
I2bv(DstReg | SrcMem | ModRM | Mov, em_mov),
-   I(DstMem | SrcNone | ModRM | Mov, em_mov_rm_sreg),
+   I(DstMem | SrcNone | ModRM | Mov | PageTable, em_mov_rm_sreg),
D(ModRM | SrcMem | NoAccess | DstReg),
I(ImplicitOps | SrcMem16 | ModRM, em_mov_sreg_rm),
G(0, group1A),
@@ -3182,7 +3186,7 @@ static struct opcode opcode_table[256] = {
II(ImplicitOps | Stack, em_popf, popf), N, N,
/* 0xA0 - 0xA7 */
I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov),
-   I2bv(DstMem | SrcAcc | Mov | MemAbs, em_mov),
+   I2bv(DstMem | SrcAcc | Mov | MemAbs | PageTable, em_mov),
I2bv(SrcSI | DstDI | Mov | String, em_mov),
I2bv(SrcSI | DstDI | String, em_cmp),
/* 0xA8 - 0xAF */
@@ -3280,12 +3284,13 @@ static struct opcode twobyte_table[256] = {
D(DstMem | SrcReg | Src2CL | ModRM), N, N,
/* 0xA8 - 0xAF */
I(Stack | Src2GS, em_push_sreg), I(Stack | Src2GS, em_pop_sreg),
-   DI(ImplicitOps, rsm), D(DstMem | SrcReg | ModRM | BitOp | Lock),
+   DI(ImplicitOps, rsm),
+   D(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable),
D(DstMem | SrcReg | Src2ImmByte | ModRM),
D(DstMem | SrcReg 

[PATCH v4 01/11] KVM: MMU: avoid pte_list_desc running out in kvm_mmu_pte_write

2011-09-22 Thread Xiao Guangrong
kvm_mmu_pte_write() is unsafe since we need to allocate a pte_list_desc in the
function when an spte is prefetched. Unfortunately, we cannot know how many
sptes need to be prefetched on this path, which means we can run out of free
pte_list_desc objects in the cache and trigger the BUG_ON(). In addition, some
paths do not fill the cache at all, for example when an INS instruction is
emulated without going through the page fault path.

Signed-off-by: Xiao Guangrong 
---
 arch/x86/kvm/mmu.c |   25 -
 1 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5d7fbf0..b01afee 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -592,6 +592,11 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
return 0;
 }
 
+static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *cache)
+{
+   return cache->nobjs;
+}
+
 static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
  struct kmem_cache *cache)
 {
@@ -969,6 +974,14 @@ static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
return &linfo->rmap_pde;
 }
 
+static bool rmap_can_add(struct kvm_vcpu *vcpu)
+{
+   struct kvm_mmu_memory_cache *cache;
+
+   cache = &vcpu->arch.mmu_pte_list_desc_cache;
+   return mmu_memory_cache_free_objects(cache);
+}
+
 static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 {
struct kvm_mmu_page *sp;
@@ -3585,6 +3598,12 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
break;
}
 
+   /*
+* No need to care whether the memory allocation succeeded or not,
+* since pte prefetch is skipped if there are not enough objects
+* in the cache.
+*/
+   mmu_topup_memory_caches(vcpu);
spin_lock(&vcpu->kvm->mmu_lock);
if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
gentry = 0;
@@ -3655,7 +3674,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
mmu_page_zap_pte(vcpu->kvm, sp, spte);
if (gentry &&
  !((sp->role.word ^ vcpu->arch.mmu.base_role.word)
- & mask.word))
+ & mask.word) && rmap_can_add(vcpu))
mmu_pte_write_new_pte(vcpu, sp, spte, &gentry);
if (!remote_flush && need_remote_flush(entry, *spte))
remote_flush = true;
@@ -3716,10 +3735,6 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
goto out;
}
 
-   r = mmu_topup_memory_caches(vcpu);
-   if (r)
-   goto out;
-
er = x86_emulate_instruction(vcpu, cr2, 0, insn, insn_len);
 
switch (er) {
-- 
1.7.5.4



[PATCH v4 00/11] KVM: x86: optimize for writing guest page

2011-09-22 Thread Xiao Guangrong
This patchset is against the 'next' branch of https://github.com/avikivity/kvm.git.

In this version, some changes come from Avi's comments:
- fix instruction retry for nested guests
- skip write-flooding for the sp whose level is 1
- rename some functions


Re: [Autotest] [PATCH] KVM-test: Add two scripts to disable services for perf tests

2011-09-22 Thread Amos Kong
- Original Message -
> On Wed, Sep 7, 2011 at 3:07 AM, Amos Kong  wrote:
> > System services on the guest and host consume an unpredictable amount of
> > resources, which affects the perf results. We can use the two scripts below
> > to disable some services on the host and guest.
> >
> > stop_serivices_perf.sh is used to stop the running services.
> > off_service_perf.sh is used to turn services off when the host starts up.
> 
> Hi Amos, thanks! I've been thinking about those scripts, let me
> summarize my thoughts:
> 
>  * If we are going to make this using scripts, then we should try as
> much as possible to be distro agnostic.

>  * We could detect services present on guest first before trying to
> disable them, ie, instead of just issuing a service koan stop, verify
> if there's a /etc/init.d/koan first.

We have a blacklist; if one of the listed services exists in the guest
(/etc/init.d/*), we stop it.

Some services need to be disabled by 'chkconfig * off'; we can execute a
python/shell script during post installation, and checking whether a service
exists before disabling it is also necessary there.
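
As an illustration only (a rough sketch; the service names and the use of
chkconfig below are placeholders, not the proposed blacklist), that
post-install helper could look something like:

#!/usr/bin/env python
# Sketch of a guest post-install helper: disable blacklisted services at
# boot time, but only when the init script actually exists in the guest.
# The entries below are examples, not the real blacklist.
import os
import commands

BLACKLIST = ['auditd', 'cups', 'avahi-daemon']

for svc in BLACKLIST:
    if os.path.exists('/etc/init.d/%s' % svc):
        status, output = commands.getstatusoutput('chkconfig %s off' % svc)
        if status != 0:
            print 'Could not disable %s: %s' % (svc, output)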

>  * Have you considered doing this env setup using shell commands
>  typed
> on an ssh session, such as
> 
>  session = vm.wait_for_login()
>  session.cmd('service auditd stop')
> 
> Rather than a script? It's one less script to maintain under the
> scripts/ directory.
> 
>  * Your approach is a blacklist of services that will be turned off.
> What if we used a 'whitelist' approach and kept a list of the
> fundamental services needed and turn off *everything else*?

A blacklist is safe and better; a whitelist is difficult to maintain.
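
If we also want the in-session variant you mention above, a minimal sketch
could be along these lines (assuming the session object returned by
vm.wait_for_login() with the usual cmd()/cmd_status() helpers, and an example
service list, not the final one):

# Sketch only: stop blacklisted services in the guest over the ssh session.
# 'session' is assumed to come from vm.wait_for_login().
SERVICES_BLACKLIST = ['auditd', 'crond', 'smartd']

def stop_services(session, services=SERVICES_BLACKLIST):
    for svc in services:
        # Only touch services whose init script exists in this guest.
        if session.cmd_status('test -e /etc/init.d/%s' % svc) != 0:
            continue
        session.cmd('service %s stop' % svc, timeout=60)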

> Please let me know what you think about this. I'm not applying this
> for now.