Re: Live migration between Intel Q6600 and AMD Phenom II

2009-09-08 Thread Amit Shah
Hello,

On (Tue) Sep 08 2009 [13:32:39], Sterling Windmill wrote:
> I've read that it's possible to live migrate KVM guests between Intel and AMD 
> CPUs, is it also possible to migrate from a CPU without NPT/EPT to the Phenom 
> II that supports NPT? Will I lose out on any of the benefits NPT allows 
> without shutting down and restarting the guest?

Live migration between different vendors isn't tested enough for us to
be confident in saying it works well. There has been some work done in
the area, but there can always be bugs or unimplemented features. If you
do try it, please share your experiences whether good or bad.

I think NPT support should get enabled for your VM if the migration does
succeed.

> Also, any thoughts on how much more performant a 3.0GHz Phenom II will be for 
> running KVM guests than the 2.4GHz Intel Q6600?

The Q6600 does not support EPT, right? If so, the Phenom will be faster
as it does support NPT.

Amit


[PATCH -v2] QEMU-KVM: MCE: Relay UCR MCE to guest

2009-09-08 Thread Huang Ying
UCR (uncorrected recoverable) MCE is supported by recent Intel CPUs,
where some hardware errors, such as some memory errors, can be reported
without PCC (processor context corrupted). To recover from such an MCE,
the corresponding memory will be unmapped, and all processes accessing
the memory will be killed via SIGBUS.

For KVM, if QEMU/KVM is killed, all guest processes will be killed
too. So we relay SIGBUS from the host OS to the guest system via a UCR
MCE injection. Then the guest OS can isolate the corresponding memory
and kill only the necessary guest processes. A SIGBUS sent to the main
thread (not the VCPU threads) will be broadcast to all VCPU threads as
a UCR MCE.
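
(For illustration, a minimal standalone sketch of the host-side signal
plumbing this relies on: a SIGBUS handler installed with SA_SIGINFO that
tells action-optional (AO) errors apart from action-required (AR) ones by
si_code. The handler body is a hypothetical simplification; the real logic
is in the patch below.)

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #ifndef BUS_MCEERR_AR
    #define BUS_MCEERR_AR 4   /* action required: the faulting access hit bad memory */
    #endif
    #ifndef BUS_MCEERR_AO
    #define BUS_MCEERR_AO 5   /* action optional: error found asynchronously (scrubbing) */
    #endif

    /* Placeholder for the real UCR MCE injection into the guest. */
    static void inject_guest_mce(void *host_addr)
    {
        fprintf(stderr, "would inject UCR MCE for host address %p\n", host_addr);
    }

    static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        switch (si->si_code) {
        case BUS_MCEERR_AO:          /* error not yet consumed: try to recover */
            inject_guest_mce(si->si_addr);
            break;
        case BUS_MCEERR_AR:          /* error hit on an actual access: give up */
            fprintf(stderr, "Hardware memory error!\n");
            exit(1);
        default:
            fprintf(stderr, "Internal error!\n");
            exit(1);
        }
    }

    int main(void)
    {
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = sigbus_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGBUS, &sa, NULL);
        pause();                     /* wait for a (real or simulated) SIGBUS */
        return 0;
    }

Like the patch, the handler calls non-async-signal-safe functions; that
trade-off is acceptable here only because the process is about to inject an
error or exit.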

v2:

- Use qemu_ram_addr_from_host instead of a self-made one to convert from
  host addresses to guest RAM addresses. Thanks Anthony Liguori.

Signed-off-by: Huang Ying 

---
 cpu-common.h  |1 
 exec.c|   20 +--
 qemu-kvm.c|  154 ++
 target-i386/cpu.h |   20 ++-
 4 files changed, 178 insertions(+), 17 deletions(-)

--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -27,10 +27,23 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #define false 0
 #define true 1
 
+#ifndef PR_MCE_KILL
+#define PR_MCE_KILL 33
+#endif
+
+#ifndef BUS_MCEERR_AR
+#define BUS_MCEERR_AR 4
+#endif
+#ifndef BUS_MCEERR_AO
+#define BUS_MCEERR_AO 5
+#endif
+
 #define EXPECTED_KVM_API_VERSION 12
 
 #if EXPECTED_KVM_API_VERSION != KVM_API_VERSION
@@ -1507,6 +1520,37 @@ static void sig_ipi_handler(int n)
 {
 }
 
+static void sigbus_handler(int n, struct signalfd_siginfo *siginfo, void *ctx)
+{
+if (siginfo->ssi_code == BUS_MCEERR_AO) {
+uint64_t status;
+unsigned long paddr;
+CPUState *cenv;
+
+/* Hope we are lucky for AO MCE */
+if (do_qemu_ram_addr_from_host((void *)siginfo->ssi_addr, &paddr)) {
+fprintf(stderr, "Hardware memory error for memory used by "
+"QEMU itself instead of guest system!: %llx\n",
+(unsigned long long)siginfo->ssi_addr);
+return;
+}
+status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+| MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+| 0xc0;
+kvm_inject_x86_mce(first_cpu, 9, status,
+   MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr,
+   (MCM_ADDR_PHYS << 6) | 0xc);
+for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu)
+kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC,
+   MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0);
+return;
+} else if (siginfo->ssi_code == BUS_MCEERR_AR)
+fprintf(stderr, "Hardware memory error!\n");
+else
+fprintf(stderr, "Internal error in QEMU!\n");
+exit(1);
+}
+
 static void on_vcpu(CPUState *env, void (*func)(void *data), void *data)
 {
 struct qemu_work_item wi;
@@ -1649,29 +1693,102 @@ static void flush_queued_work(CPUState *
 pthread_cond_broadcast(&qemu_work_cond);
 }
 
+static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo)
+{
+#if defined(KVM_CAP_MCE) && defined(TARGET_I386)
+struct kvm_x86_mce mce = {
+.bank = 9,
+};
+unsigned long paddr;
+int r;
+
+if (env->mcg_cap && siginfo->si_addr
+&& (siginfo->si_code == BUS_MCEERR_AR
+|| siginfo->si_code == BUS_MCEERR_AO)) {
+if (siginfo->si_code == BUS_MCEERR_AR) {
+/* Fake an Intel architectural Data Load SRAR UCR */
+mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+| MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+| MCI_STATUS_AR | 0x134;
+mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
+mce.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV;
+} else {
+/* Fake an Intel architectural Memory scrubbing UCR */
+mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
+| MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
+| 0xc0;
+mce.misc = (MCM_ADDR_PHYS << 6) | 0xc;
+mce.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV;
+}
+if (do_qemu_ram_addr_from_host((void *)siginfo->si_addr, &paddr)) {
+fprintf(stderr, "Hardware memory error for memory used by "
+"QEMU itself instaed of guest system!\n");
+/* Hope we are lucky for AO MCE */
+if (siginfo->si_code == BUS_MCEERR_AO)
+return;
+else
+exit(1);
+}
+mce.addr = paddr;
+r = kvm_set_mce(env->kvm_cpu_state.vcpu_ctx, &mce);
+if (r < 0) {
+fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno));
+exit(1);
+}
+} else
+#endif
+{
+if (siginfo->si_code == BUS_MCEERR_AO)
+return;
+if (siginfo->si_code == BUS_MCEERR_AR)
+fprintf

[PATCH 4/4] KVM test: Rename BEFORE_YOU_START to README

2009-09-08 Thread Lucas Meneghel Rodrigues
Also, point to the latest online documentation.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/BEFORE_YOU_START |   19 ---
 client/tests/kvm/README   |3 +++
 2 files changed, 3 insertions(+), 19 deletions(-)
 delete mode 100644 client/tests/kvm/BEFORE_YOU_START
 create mode 100644 client/tests/kvm/README

diff --git a/client/tests/kvm/BEFORE_YOU_START 
b/client/tests/kvm/BEFORE_YOU_START
deleted file mode 100644
index 7478d9d..000
--- a/client/tests/kvm/BEFORE_YOU_START
+++ /dev/null
@@ -1,19 +0,0 @@
-Install kvm and load modules.
-Remove 'env' file if exists.
-Remove control.state file if exists.
-
-Copy kvm_tests.cfg.sample into kvm_tests.cfg
-Modify kvm_tests.cfg to your liking.
-Modify control if you prefer to "use your own kvm" (comment out kvm_install).
-
-Create those symbolic links under kvm or under 
-qemu   -> qemu-kvm binary (unless using kvm_install)
-qemu-img   -> qemu-img binary (unless using kvm_install)
-isos/  -> isos (mount or symlink)
-images/-> images (mount or symlink)
-autotest/  -> ../../ (an autotest client directroy)
-steps_data/-> steps_data dir (when available)
-
-Please make sure qemu points to an "installed" kvm-qemu executable, and
-not one just compiled in the source directory. An installed executable "knows"
-where to find its associated data-dir (e.g. for bios).
diff --git a/client/tests/kvm/README b/client/tests/kvm/README
new file mode 100644
index 000..88d2c15
--- /dev/null
+++ b/client/tests/kvm/README
@@ -0,0 +1,3 @@
+In order to get started, please refer to the online documentation:
+
+http://www.linux-kvm.org/page/KVM-Autotest/Client_Install
-- 
1.6.2.5



[PATCH 3/4] KVM test: Removing the fix_cdkeys.py program

2009-09-08 Thread Lucas Meneghel Rodrigues
That is no longer necessary, since we handle CD keys in
a separate configuration file.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/fix_cdkeys.py |   76 
 1 files changed, 0 insertions(+), 76 deletions(-)
 delete mode 100755 client/tests/kvm/fix_cdkeys.py

diff --git a/client/tests/kvm/fix_cdkeys.py b/client/tests/kvm/fix_cdkeys.py
deleted file mode 100755
index aa9fc3e..000
--- a/client/tests/kvm/fix_cdkeys.py
+++ /dev/null
@@ -1,76 +0,0 @@
-#!/usr/bin/python
-"""
-Program that replaces the CD keys present on a KVM autotest configuration file.
-
-...@copyright: Red Hat 2008-2009
-...@author: u...@redhat.com (Uri Lublin)
-"""
-
-import shutil, os, sys
-import common
-
-
-def file_to_lines(filename):
-f = open(filename, 'r')
-lines = f.readlines()
-f.close
-return lines
-
-def lines_to_file(filename, lines):
-f = open(filename, 'w')
-f.writelines(lines)
-f.close()
-
-def replace_var_with_val(lines, variables):
-new = []
-for line in lines:
-for (var,val) in variables:
-if var in line:
-print 'replacing %s with %s in "%s"' % (var, val, line[:-1])
-line = line.replace(var, val)
-print ' ... new line is "%s"' % (line[:-1])
-new.append(line)
-return new
-
-def filter_comments(line):
-return not line.strip().startswith('#')
-
-def filter_empty(line):
-return len(line.strip()) != 0
-
-def line_to_pair(line):
-x,y = line.split('=', 1)
-return (x.strip(), y.strip())
-
-def read_vars(varfile):
-varlines = file_to_lines(varfile)
-varlines = filter(filter_comments, varlines)
-varlines = filter(filter_empty,varlines)
-vars = map(line_to_pair, varlines)
-return vars
-
-def main(cfgfile, varfile):
-# first save a copy of the original file (if does not exist)
-backupfile = '%s.backup' % cfgfile
-if not os.path.exists(backupfile):
-shutil.copy(cfgfile, backupfile)
-
-vars = read_vars(varfile)
-datalines = file_to_lines(cfgfile)
-newlines = replace_var_with_val(datalines, vars)
-lines_to_file(cfgfile, newlines)
-
-
-if __name__ == '__main__':
-def die(msg, val):
-print msg
-sys.exit(val)
-if len(sys.argv) != 3:
-die('usage: %s  ', 1)
-cfgfile = sys.argv[1]
-varfile = sys.argv[2]
-if not os.path.exists(cfgfile):
-die('bad cfgfile "%s"' % cfgfile, 2)
-if not os.path.exists(varfile):
-die('bad varfile "%s"' % varfile, 2)
-main(cfgfile, varfile)
-- 
1.6.2.5



[PATCH 1/4] KVM test: Move top level docstrings, other cleanups

2009-09-08 Thread Lucas Meneghel Rodrigues
In order to prepare for the subsequent changes, I made
some cleanups on the kvm source files: I noticed that
the top level docstrings were placed before the imports
block, which does not follow the pattern found on other
files (my fault). This patch fixes that problem and also
fixes some places in scan_results.py where the 80 char
line width was not being obeyed. It also cleans up the
last places where we were using the shebang
#!/usr/bin/env python, which is not the preferred form
of the shebang across the project.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/calc_md5sum_1m.py   |8 
 client/tests/kvm/fix_cdkeys.py   |6 +++---
 client/tests/kvm/kvm_config.py   |   10 +-
 client/tests/kvm/kvm_guest_wizard.py |   12 ++--
 client/tests/kvm/kvm_subprocess.py   |8 
 client/tests/kvm/kvm_tests.py|8 
 client/tests/kvm/kvm_utils.py|   12 ++--
 client/tests/kvm/kvm_vm.py   |6 +++---
 client/tests/kvm/make_html_report.py |   13 +++--
 client/tests/kvm/ppm_utils.py|5 ++---
 client/tests/kvm/scan_results.py |   25 +
 client/tests/kvm/stepeditor.py   |8 
 client/tests/kvm/stepmaker.py|   13 +++--
 13 files changed, 68 insertions(+), 66 deletions(-)

diff --git a/client/tests/kvm/calc_md5sum_1m.py 
b/client/tests/kvm/calc_md5sum_1m.py
index 6660d0e..2325673 100755
--- a/client/tests/kvm/calc_md5sum_1m.py
+++ b/client/tests/kvm/calc_md5sum_1m.py
@@ -1,7 +1,4 @@
-#!/usr/bin/env python
-import os, sys
-import kvm_utils
-
+#!/usr/bin/python
 """
 Program that calculates the md5sum for the first megabyte of a file.
 It's faster than calculating the md5sum for the whole ISO image.
@@ -10,6 +7,9 @@ It's faster than calculating the md5sum for the whole ISO 
image.
 @author: Uri Lublin (u...@redhat.com)
 """
 
+import os, sys
+import kvm_utils
+
 
 if len(sys.argv) < 2:
 print 'usage: %s ' % sys.argv[0]
diff --git a/client/tests/kvm/fix_cdkeys.py b/client/tests/kvm/fix_cdkeys.py
index 7a821fa..aa9fc3e 100755
--- a/client/tests/kvm/fix_cdkeys.py
+++ b/client/tests/kvm/fix_cdkeys.py
@@ -1,7 +1,4 @@
 #!/usr/bin/python
-import shutil, os, sys
-import common
-
 """
 Program that replaces the CD keys present on a KVM autotest configuration file.
 
@@ -9,6 +6,9 @@ Program that replaces the CD keys present on a KVM autotest 
configuration file.
 @author: u...@redhat.com (Uri Lublin)
 """
 
+import shutil, os, sys
+import common
+
 
 def file_to_lines(filename):
 f = open(filename, 'r')
diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py
index da7988b..9404f28 100755
--- a/client/tests/kvm/kvm_config.py
+++ b/client/tests/kvm/kvm_config.py
@@ -1,15 +1,15 @@
 #!/usr/bin/python
-import logging, re, os, sys, StringIO, optparse
-import common
-from autotest_lib.client.common_lib import error
-from autotest_lib.client.common_lib import logging_config, logging_manager
-
 """
 KVM configuration file utility functions.
 
 @copyright: Red Hat 2008-2009
 """
 
+import logging, re, os, sys, StringIO, optparse
+import common
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.common_lib import logging_config, logging_manager
+
 
 class KvmLoggingConfig(logging_config.LoggingConfig):
 def configure_logging(self, results_dir=None, verbose=False):
diff --git a/client/tests/kvm/kvm_guest_wizard.py 
b/client/tests/kvm/kvm_guest_wizard.py
index 3d3f3b2..8bc85f2 100644
--- a/client/tests/kvm/kvm_guest_wizard.py
+++ b/client/tests/kvm/kvm_guest_wizard.py
@@ -1,3 +1,9 @@
+"""
+Utilities to perform automatic guest installation using step files.
+
+...@copyright: Red Hat 2008-2009
+"""
+
 import os, time, md5, re, shutil, logging
 from autotest_lib.client.common_lib import utils, error
 import kvm_utils, ppm_utils, kvm_subprocess
@@ -9,12 +15,6 @@ except ImportError:
 'please install python-imaging or the equivalent for your '
 'distro.')
 
-"""
-Utilities to perform automatic guest installation using step files.
-
-...@copyright: Red Hat 2008-2009
-"""
-
 
 def handle_var(vm, params, varname):
 var = params.get(varname)
diff --git a/client/tests/kvm/kvm_subprocess.py 
b/client/tests/kvm/kvm_subprocess.py
index 07303a8..5df9e9b 100755
--- a/client/tests/kvm/kvm_subprocess.py
+++ b/client/tests/kvm/kvm_subprocess.py
@@ -1,14 +1,14 @@
 #!/usr/bin/python
-import sys, subprocess, pty, select, os, time, signal, re, termios, fcntl
-import threading, logging, commands
-import common, kvm_utils
-
 """
 A class and functions used for running and controlling child processes.
 
 @copyright: 2008-2009 Red Hat Inc.
 """
 
+import sys, subprocess, pty, select, os, time, signal, re, termios, fcntl
+import threading, logging, commands
+import common, kvm_utils
+
 
 def run_bg(command, termination_func=None, output_func=None, output_prefix="",
timeout=1.0):
diff --git a/cli

Re: Modifying RAM during runtime on guest

2009-09-08 Thread Brian Jackson
On Tuesday 08 September 2009 03:52:07 pm Daniel Bareiro wrote:
> Hi all!
> 
> I'm trying to modify the amount of RAM that some of my guests have. The host
> has a 2.6.30 kernel with KVM-88.
> 
> In one of the guests I had no problems decreasing the amount of memory
> from 3584 MiB to 1024 MiB. This guest has the 2.6.26-2-686 stock kernel. I
> also tried to decrease the amount of RAM of another guest from 3584 MiB to
> 2048 MiB, but it didn't work. This other guest has the
> 2.6.24-etchnhalf.1-686-bigmem stock kernel. Does ballooning in the guest
> require 2.6.25 or later?


I don't know; if that kernel has a virtio-balloon driver, I'd think that's
all you need to balloon memory.


> 
> Thinking that it could be an impediment related to the guest kernel
> version, I tried to increase the memory of another guest running
> 2.6.26-2-686 from 512 MiB to 1024 MiB, but this didn't work either.


You can only grow memory back up to the amount you specified on the command
line, and only if you've already ballooned down. So if you specify "-m 1024M"
on the command line, then shrink it to 512, you can then balloon it back up
to a max of 1024.
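
(A hypothetical monitor session illustrating the above, assuming the guest
was started with "-m 1024" and a virtio balloon device; the "info balloon"
output shown is approximate and may differ between versions:)

  (qemu) balloon 512
  (qemu) info balloon
  balloon: actual=512
  (qemu) balloon 1024
  (qemu) info balloon
  balloon: actual=1024

Requests above 1024 would be limited by the -m ceiling, as described above.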


> 
> These are the statistics of memory usage on the host:
> 
> # free
>              total       used       free     shared    buffers     cached
> Mem:      16469828   14763460    1706368          0    7800712     202044
> -/+ buffers/cache:    6760704    9709124
> Swap:      8319948      19240    8300708
> 
> 
> 
> What could be the cause?
> 
> Thanks in advance for your reply.
> 
> Regards,
> Daniel
> 


Modifying RAM during runtime on guest

2009-09-08 Thread Daniel Bareiro
Hi all!

I'm trying to modify the amount of RAM that some of my guests have. The host
has a 2.6.30 kernel with KVM-88.

In one of the guests I had no problems decreasing the amount of memory
from 3584 MiB to 1024 MiB. This guest has the 2.6.26-2-686 stock kernel. I
also tried to decrease the amount of RAM of another guest from 3584 MiB to
2048 MiB, but it didn't work. This other guest has the
2.6.24-etchnhalf.1-686-bigmem stock kernel. Does ballooning in the guest
require 2.6.25 or later?

Thinking that it could be an impediment related to the guest kernel
version, I tried to increase the memory of another guest running
2.6.26-2-686 from 512 MiB to 1024 MiB, but this didn't work either.

These are the statistics of memory usage on the host:

# free
             total       used       free     shared    buffers     cached
Mem:      16469828   14763460    1706368          0    7800712     202044
-/+ buffers/cache:    6760704    9709124
Swap:      8319948      19240    8300708



What could be the cause?

Thanks in advance for your reply.

Regards,
Daniel
-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9  29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Squeeze - Linux user #188.598




Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-08 Thread Michael S. Tsirkin
On Tue, Sep 08, 2009 at 10:20:35AM -0700, Ira W. Snyder wrote:
> On Mon, Sep 07, 2009 at 01:15:37PM +0300, Michael S. Tsirkin wrote:
> > On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote:
> > > On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> > > > What it is: vhost net is a character device that can be used to reduce
> > > > the number of system calls involved in virtio networking.
> > > > Existing virtio net code is used in the guest without modification.
> > > > 
> > > > There's similarity with vringfd, with some differences and reduced scope
> > > > - uses eventfd for signalling
> > > > - structures can be moved around in memory at any time (good for 
> > > > migration)
> > > > - support memory table and not just an offset (needed for kvm)
> > > > 
> > > > common virtio related code has been put in a separate file vhost.c and
> > > > can be made into a separate module if/when more backends appear.  I used
> > > > Rusty's lguest.c as the source for developing this part : this supplied
> > > > me with witty comments I wouldn't be able to write myself.
> > > > 
> > > > What it is not: vhost net is not a bus, and not a generic new system
> > > > call. No assumptions are made on how guest performs hypercalls.
> > > > Userspace hypervisors are supported as well as kvm.
> > > > 
> > > > How it works: Basically, we connect virtio frontend (configured by
> > > > userspace) to a backend. The backend could be a network device, or a
> > > > tun-like device. In this version I only support raw socket as a backend,
> > > > which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
> > > > also configured by userspace, including vlan/mac etc.
> > > > 
> > > > Status:
> > > > This works for me, and I haven't see any crashes.
> > > > I have done some light benchmarking (with v4), compared to userspace, I
> > > > see improved latency (as I save up to 4 system calls per packet) but not
> > > > bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
> > > > ping benchmark (where there's no TSO) troughput is also improved.
> > > > 
> > > > Features that I plan to look at in the future:
> > > > - tap support
> > > > - TSO
> > > > - interrupt mitigation
> > > > - zero copy
> > > > 
> > > 
> > > Hello Michael,
> > > 
> > > I've started looking at vhost with the intention of using it over PCI to
> > > connect physical machines together.
> > > 
> > > The part that I am struggling with the most is figuring out which parts
> > > of the rings are in the host's memory, and which parts are in the
> > > guest's memory.
> > 
> > All rings are in guest's memory, to match existing virtio code.
> 
> Ok, this makes sense.
> 
> > vhost
> > assumes that the memory space of the hypervisor userspace process covers
> > the whole of guest memory.
> 
> Is this necessary? Why?

Because with virtio, the ring can give us arbitrary guest addresses.  If
the guest were limited to using a subset of addresses, the hypervisor
would only have to map those.
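
(For illustration, a minimal sketch of the kind of translation table being
described, using a hypothetical region layout rather than vhost's actual
structures: buffer addresses taken from the ring are treated as guest
physical addresses and looked up against regions that map them onto the
hypervisor process's address space, while the ring addresses themselves are
plain userspace addresses and are used directly.)

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical region descriptor: guest physical range -> userspace range. */
    struct mem_region {
        uint64_t guest_phys_addr;   /* start of the range in guest physical space */
        uint64_t memory_size;       /* length of the range in bytes */
        uint64_t userspace_addr;    /* where the range lives in the hypervisor process */
    };

    /*
     * Translate a guest physical address into a hypervisor userspace address.
     * Returns NULL when the address falls outside every registered region,
     * which the caller must treat as an error.
     */
    static void *gpa_to_hva(const struct mem_region *regions, size_t nregions,
                            uint64_t gpa)
    {
        for (size_t i = 0; i < nregions; i++) {
            const struct mem_region *r = &regions[i];

            if (gpa >= r->guest_phys_addr &&
                gpa - r->guest_phys_addr < r->memory_size)
                return (void *)(uintptr_t)(r->userspace_addr +
                                           (gpa - r->guest_phys_addr));
        }
        return NULL;
    }

    int main(void)
    {
        static uint8_t backing[4096];          /* stand-in for a chunk of guest RAM */
        struct mem_region regions[] = {
            { .guest_phys_addr = 0x100000, .memory_size = sizeof(backing),
              .userspace_addr = (uintptr_t)backing },
        };

        printf("gpa 0x100010 -> hva %p\n", gpa_to_hva(regions, 1, 0x100010));
        return 0;
    }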

> The assumption seems very wrong when you're
> doing data transport between two physical systems via PCI.
> I know vhost has not been designed for this specific situation, but it
> is good to be looking toward other possible uses.
> 
> > And there's a translation table.
> > Ring addresses are userspace addresses, they do not undergo translation.
> > 
> > > If I understand everything correctly, the rings are all userspace
> > > addresses, which means that they can be moved around in physical memory,
> > > and get pushed out to swap.
> > 
> > Unless they are locked, yes.
> > 
> > > AFAIK, this is impossible to handle when
> > > connecting two physical systems, you'd need the rings available in IO
> > > memory (PCI memory), so you can ioreadXX() them instead. To the best of
> > > my knowledge, I shouldn't be using copy_to_user() on an __iomem address.
> > > Also, having them migrate around in memory would be a bad thing.
> > > 
> > > Also, I'm having trouble figuring out how the packet contents are
> > > actually copied from one system to the other. Could you point this out
> > > for me?
> > 
> > The code in net/packet/af_packet.c does it when vhost calls sendmsg.
> > 
> 
> Ok. The sendmsg() implementation uses memcpy_fromiovec(). Is it possible
> to make this use a DMA engine instead?

Maybe.

> I know this was suggested in an earlier thread.

Yes, it might even give some performance benefit with e.g. I/O AT.

> > > Is there somewhere I can find the userspace code (kvm, qemu, lguest,
> > > etc.) code needed for interacting with the vhost misc device so I can
> > > get a better idea of how userspace is supposed to work?
> > 
> > Look in archives for k...@vger.kernel.org. the subject is qemu-kvm: vhost 
> > net.
> > 
> > > (Features
> > > negotiation, etc.)
> > > 
> > 
> > That's not yet implemented as there are no features yet.  I'm working on
> > tap support, which will add a feature bit.  Overall, qemu does an ioctl
> > to query supported features, and then acks them with another ioctl.  I'm
> > also trying to avoid duplicating functionality available elsewhere.  So
> > that to check e.g. TSO support, you'd just look at the underlying
> > hardware device you are binding to.

Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

2009-09-08 Thread Glauber Costa
On Tue, Sep 08, 2009 at 05:00:04PM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 08, 2009 at 04:37:52PM -0300, Glauber Costa wrote:
> > On Tue, Sep 08, 2009 at 03:41:59PM -0300, Marcelo Tosatti wrote:
> > > On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote:
> > > > KVM clock is great to avoid drifting in guest VMs running ontop of kvm.
> > > > However, the current mechanism will not propagate changes in wallclock 
> > > > value
> > > > upwards. This effectively means that in a large pool of VMs that need 
> > > > accurate timing,
> > > > all of them has to run NTP, instead of just the host doing it.
> > > > 
> > > > Since the host updates information in the shared memory area upon msr 
> > > > writes,
> > > > this patch introduces a worker that writes to that msr, and calls 
> > > > do_settimeofday
> > > > at fixed intervals, with second resolution. A interval of 0 determines 
> > > > that we
> > > > are not interested in this behaviour. A later patch will make this 
> > > > optional at
> > > > runtime
> > > > 
> > > > Signed-off-by: Glauber Costa 
> > > 
> > > As mentioned before, ntp already does this (and its not that heavy is
> > > it?).
> > > 
> > > For example, if ntp running on the host, it avoids stepping the clock
> > > backwards by slow adjustment, while the periodic frequency adjustment on
> > > the guest bypasses that.
> > 
> > Simple question: How do I run ntp in guests without network?
> 
> You don't.
For those guests, the mechanism I am proposing comes in handy.

Furthermore, it is not only optional, but disabled by default. And even if
you have a network, but a genuine reason not to run ntp in your VMs, you
can use it too.



Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

2009-09-08 Thread Anthony Liguori

Marcelo Tosatti wrote:
>> Simple question: How do I run ntp in guests without network?
>
> You don't.

Why bother doing this in the kernel?  Isn't this the sort of thing
vmchannel is supposed to handle?  open-vm-tools does this.


/me ducks

Regards,

Anthony Liguori




Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

2009-09-08 Thread Marcelo Tosatti
On Tue, Sep 08, 2009 at 04:37:52PM -0300, Glauber Costa wrote:
> On Tue, Sep 08, 2009 at 03:41:59PM -0300, Marcelo Tosatti wrote:
> > On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote:
> > > KVM clock is great to avoid drifting in guest VMs running ontop of kvm.
> > > However, the current mechanism will not propagate changes in wallclock 
> > > value
> > > upwards. This effectively means that in a large pool of VMs that need 
> > > accurate timing,
> > > all of them has to run NTP, instead of just the host doing it.
> > > 
> > > Since the host updates information in the shared memory area upon msr 
> > > writes,
> > > this patch introduces a worker that writes to that msr, and calls 
> > > do_settimeofday
> > > at fixed intervals, with second resolution. A interval of 0 determines 
> > > that we
> > > are not interested in this behaviour. A later patch will make this 
> > > optional at
> > > runtime
> > > 
> > > Signed-off-by: Glauber Costa 
> > 
> > As mentioned before, ntp already does this (and its not that heavy is
> > it?).
> > 
> > For example, if ntp running on the host, it avoids stepping the clock
> > backwards by slow adjustment, while the periodic frequency adjustment on
> > the guest bypasses that.
> 
> Simple question: How do I run ntp in guests without network?

You don't.



Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

2009-09-08 Thread Glauber Costa
On Tue, Sep 08, 2009 at 03:41:59PM -0300, Marcelo Tosatti wrote:
> On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote:
> > KVM clock is great to avoid drifting in guest VMs running ontop of kvm.
> > However, the current mechanism will not propagate changes in wallclock value
> > upwards. This effectively means that in a large pool of VMs that need 
> > accurate timing,
> > all of them has to run NTP, instead of just the host doing it.
> > 
> > Since the host updates information in the shared memory area upon msr 
> > writes,
> > this patch introduces a worker that writes to that msr, and calls 
> > do_settimeofday
> > at fixed intervals, with second resolution. A interval of 0 determines that 
> > we
> > are not interested in this behaviour. A later patch will make this optional 
> > at
> > runtime
> > 
> > Signed-off-by: Glauber Costa 
> 
> As mentioned before, ntp already does this (and it's not that heavy, is
> it?).
> 
> For example, if ntp is running on the host, it avoids stepping the clock
> backwards by slow adjustment, while the periodic frequency adjustment on
> the guest bypasses that.

Simple question: How do I run ntp in guests without network?



Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock

2009-09-08 Thread Marcelo Tosatti
On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote:
> KVM clock is great for avoiding drift in guest VMs running on top of kvm.
> However, the current mechanism will not propagate changes in the wallclock
> value upwards. This effectively means that in a large pool of VMs that need
> accurate timing, all of them have to run NTP, instead of just the host doing it.
> 
> Since the host updates information in the shared memory area upon msr writes,
> this patch introduces a worker that writes to that msr, and calls
> do_settimeofday at fixed intervals, with second resolution. An interval of 0
> means that we are not interested in this behaviour. A later patch will make
> this optional at runtime.
> 
> Signed-off-by: Glauber Costa 

As mentioned before, ntp already does this (and it's not that heavy, is
it?).

For example, if ntp is running on the host, it avoids stepping the clock
backwards by slow adjustment, while the periodic frequency adjustment on
the guest bypasses that.
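
(A hedged userspace sketch of the step-versus-slew distinction under
discussion, for illustration only: ntpd corrects small offsets gradually via
adjtimex(), while setting the time outright, which is what the guest-side
do_settimeofday worker effectively does, can move the clock backwards. The
calls below are ordinary Linux APIs, not code from the patch, and both need
CAP_SYS_TIME to succeed.)

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/timex.h>

    int main(void)
    {
        /* Slew: spread a +100 ms correction over time; the clock never jumps. */
        struct timex tx = { .modes = ADJ_OFFSET_SINGLESHOT, .offset = 100000 };
        if (adjtimex(&tx) == -1)
            perror("adjtimex");

        /* Step: set the time outright; an earlier value moves the clock backwards. */
        struct timeval now;
        gettimeofday(&now, NULL);
        now.tv_sec -= 1;                       /* deliberately one second in the past */
        if (settimeofday(&now, NULL) == -1)
            perror("settimeofday");
        return 0;
    }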

> ---
>  arch/x86/kernel/kvmclock.c |   70 ++-
>  1 files changed, 61 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index e5efcdc..555aab0 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -27,6 +27,7 @@
>  #define KVM_SCALE 22
>  
>  static int kvmclock = 1;
> +static unsigned int kvm_wall_update_interval = 0;
>  
>  static int parse_no_kvmclock(char *arg)
>  {
> @@ -39,24 +40,75 @@ early_param("no-kvmclock", parse_no_kvmclock);
>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, 
> hv_clock);
>  static struct pvclock_wall_clock wall_clock;
>  
> -/*
> - * The wallclock is the time of day when we booted. Since then, some time may
> - * have elapsed since the hypervisor wrote the data. So we try to account for
> - * that with system time
> - */
> -static unsigned long kvm_get_wallclock(void)
> +static void kvm_get_wall_ts(struct timespec *ts)
>  {
> - struct pvclock_vcpu_time_info *vcpu_time;
> - struct timespec ts;
>   int low, high;
> + struct pvclock_vcpu_time_info *vcpu_time;
>  
>   low = (int)__pa_symbol(&wall_clock);
>   high = ((u64)__pa_symbol(&wall_clock) >> 32);
>   native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
>  
>   vcpu_time = &get_cpu_var(hv_clock);
> - pvclock_read_wallclock(&wall_clock, vcpu_time, &ts);
> + pvclock_read_wallclock(&wall_clock, vcpu_time, ts);
>   put_cpu_var(hv_clock);
> +}
> +
> +static void kvm_sync_wall_clock(struct work_struct *work);
> +static DECLARE_DELAYED_WORK(kvm_sync_wall_work, kvm_sync_wall_clock);
> +
> +static void schedule_next_update(void)
> +{
> + struct timespec next;
> +
> + if ((kvm_wall_update_interval == 0) ||
> +(!kvm_para_available()) ||
> +(!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
> + return;
> +
> + next.tv_sec = kvm_wall_update_interval;
> + next.tv_nsec = 0;
> +
> + schedule_delayed_work(&kvm_sync_wall_work, timespec_to_jiffies(&next));
> +}
> +
> +static void kvm_sync_wall_clock(struct work_struct *work)
> +{
> + struct timespec now, after;
> + u64 nsec_delta;
> +
> + do {
> + kvm_get_wall_ts(&now);
> + do_settimeofday(&now);
> + kvm_get_wall_ts(&after);
> + nsec_delta = (u64)after.tv_sec * NSEC_PER_SEC + after.tv_nsec;
> + nsec_delta -= (u64)now.tv_sec * NSEC_PER_SEC + now.tv_nsec;
> + } while (nsec_delta > NSEC_PER_SEC / 8);
> +
> + schedule_next_update();
> +}
> +
> +static __init int init_updates(void)
> +{
> + schedule_next_update();
> + return 0;
> +}
> +/*
> + * It has to be run after workqueues are initialized, since we call
> + * schedule_delayed_work. Other than that, we have no specific requirements
> + */
> +late_initcall(init_updates);
> +
> +/*
> + * The wallclock is the time of day when we booted. Since then, some time may
> + * have elapsed since the hypervisor wrote the data. So we try to account for
> + * that with system time
> + */
> +static unsigned long kvm_get_wallclock(void)
> +{
> + struct timespec ts;
> +
> + kvm_get_wall_ts(&ts);
>  
>   return ts.tv_sec;
>  }
> -- 
> 1.6.2.2
> 


Re: [RFC] defer skb allocation in virtio_net -- mergable buff part

2009-09-08 Thread Shirley Ma
Thanks Michael for your detailed review comments. I am just back from my
vacation. I am working on what you have raised here.

Shirley



KVM: x86: drop duplicate kvm_flush_remote_tlb calls

2009-09-08 Thread Marcelo Tosatti

kvm_mmu_slot_remove_write_access already calls it.

Signed-off-by: Marcelo Tosatti 

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 891234b..f83e990 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2159,7 +2159,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
spin_lock(&kvm->mmu_lock);
kvm_mmu_slot_remove_write_access(kvm, log->slot);
spin_unlock(&kvm->mmu_lock);
-   kvm_flush_remote_tlbs(kvm);
memslot = &kvm->memslots[log->slot];
n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
memset(memslot->dirty_bitmap, 0, n);
@@ -4907,7 +4906,6 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
 
kvm_mmu_slot_remove_write_access(kvm, mem->slot);
spin_unlock(&kvm->mmu_lock);
-   kvm_flush_remote_tlbs(kvm);
 
return 0;
 }


KVM: SVM: remove needless mmap_sem acquision from nested_svm_map

2009-09-08 Thread Marcelo Tosatti

nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.

Signed-off-by: Marcelo Tosatti 

Index: kvm/arch/x86/kvm/svm.c
===
--- kvm.orig/arch/x86/kvm/svm.c
+++ kvm/arch/x86/kvm/svm.c
@@ -1394,10 +1394,7 @@ static void *nested_svm_map(struct vcpu_
 {
struct page *page;
 
-   down_read(&current->mm->mmap_sem);
page = gfn_to_page(svm->vcpu.kvm, gpa >> PAGE_SHIFT);
-   up_read(&current->mm->mmap_sem);
-
if (is_error_page(page))
goto error;
 


Live migration between Intel Q6600 and AMD Phenom II

2009-09-08 Thread Sterling Windmill
I've read that it's possible to live migrate KVM guests between Intel and AMD 
CPUs, is it also possible to migrate from a CPU without NPT/EPT to the Phenom 
II that supports NPT? Will I lose out on any of the benefits NPT allows without 
shutting down and restarting the guest?

Also, any thoughts on how much more performant a 3.0GHz Phenom II will be for 
running KVM guests than the 2.4GHz Intel Q6600?

Thanks in advance.


Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server

2009-09-08 Thread Ira W. Snyder
On Mon, Sep 07, 2009 at 01:15:37PM +0300, Michael S. Tsirkin wrote:
> On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote:
> > On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> > > What it is: vhost net is a character device that can be used to reduce
> > > the number of system calls involved in virtio networking.
> > > Existing virtio net code is used in the guest without modification.
> > > 
> > > There's similarity with vringfd, with some differences and reduced scope
> > > - uses eventfd for signalling
> > > - structures can be moved around in memory at any time (good for 
> > > migration)
> > > - support memory table and not just an offset (needed for kvm)
> > > 
> > > common virtio related code has been put in a separate file vhost.c and
> > > can be made into a separate module if/when more backends appear.  I used
> > > Rusty's lguest.c as the source for developing this part : this supplied
> > > me with witty comments I wouldn't be able to write myself.
> > > 
> > > What it is not: vhost net is not a bus, and not a generic new system
> > > call. No assumptions are made on how guest performs hypercalls.
> > > Userspace hypervisors are supported as well as kvm.
> > > 
> > > How it works: Basically, we connect virtio frontend (configured by
> > > userspace) to a backend. The backend could be a network device, or a
> > > tun-like device. In this version I only support raw socket as a backend,
> > > which can be bound to e.g. SR IOV, or to macvlan device.  Backend is
> > > also configured by userspace, including vlan/mac etc.
> > > 
> > > Status:
> > > This works for me, and I haven't see any crashes.
> > > I have done some light benchmarking (with v4), compared to userspace, I
> > > see improved latency (as I save up to 4 system calls per packet) but not
> > > bandwidth/CPU (as TSO and interrupt mitigation are not supported).  For
> > > ping benchmark (where there's no TSO) troughput is also improved.
> > > 
> > > Features that I plan to look at in the future:
> > > - tap support
> > > - TSO
> > > - interrupt mitigation
> > > - zero copy
> > > 
> > 
> > Hello Michael,
> > 
> > I've started looking at vhost with the intention of using it over PCI to
> > connect physical machines together.
> > 
> > The part that I am struggling with the most is figuring out which parts
> > of the rings are in the host's memory, and which parts are in the
> > guest's memory.
> 
> All rings are in guest's memory, to match existing virtio code.

Ok, this makes sense.

> vhost
> assumes that the memory space of the hypervisor userspace process covers
> the whole of guest memory.

Is this necessary? Why? The assumption seems very wrong when you're
doing data transport between two physical systems via PCI.

I know vhost has not been designed for this specific situation, but it
is good to be looking toward other possible uses.

> And there's a translation table.
> Ring addresses are userspace addresses, they do not undergo translation.
> 
> > If I understand everything correctly, the rings are all userspace
> > addresses, which means that they can be moved around in physical memory,
> > and get pushed out to swap.
> 
> Unless they are locked, yes.
> 
> > AFAIK, this is impossible to handle when
> > connecting two physical systems, you'd need the rings available in IO
> > memory (PCI memory), so you can ioreadXX() them instead. To the best of
> > my knowledge, I shouldn't be using copy_to_user() on an __iomem address.
> > Also, having them migrate around in memory would be a bad thing.
> > 
> > Also, I'm having trouble figuring out how the packet contents are
> > actually copied from one system to the other. Could you point this out
> > for me?
> 
> The code in net/packet/af_packet.c does it when vhost calls sendmsg.
> 

Ok. The sendmsg() implementation uses memcpy_fromiovec(). Is it possible
to make this use a DMA engine instead? I know this was suggested in an
earlier thread.

> > Is there somewhere I can find the userspace code (kvm, qemu, lguest,
> > etc.) code needed for interacting with the vhost misc device so I can
> > get a better idea of how userspace is supposed to work?
> 
> Look in archives for k...@vger.kernel.org. the subject is qemu-kvm: vhost net.
> 
> > (Features
> > negotiation, etc.)
> > 
> 
> That's not yet implemented as there are no features yet.  I'm working on
> tap support, which will add a feature bit.  Overall, qemu does an ioctl
> to query supported features, and then acks them with another ioctl.  I'm
> also trying to avoid duplicating functionality available elsewhere.  So
> that to check e.g. TSO support, you'd just look at the underlying
> hardware device you are binding to.
> 

Ok. Do you have plans to support the VIRTIO_NET_F_MRG_RXBUF feature in
the future? I found that this made an enormous improvement in throughput
on my virtio-net <-> virtio-net system. Perhaps it isn't needed with
vhost-net.

Thanks for replying,
Ira

Re: kvm ptrace 32bit DoS bug - bisected

2009-09-08 Thread Jan Kiszka
Marcelo Tosatti wrote:
> On Sun, Sep 06, 2009 at 02:50:00PM +0700, Antoine Martin wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA512
>>
>> [snip]
>>>> Is this an AMD host?
>>> Nope, Intel Core2, more host info :
>> I have put all the relevant binaries and their config files here:
>> http://uml.devloop.org.uk/kvmbug/
>> Host kernel, qemu binary, kvm guest kernel and the UML binary I have
>> used for bisecting.
> 
> Antoine,
> 
> Works for me with master branch. It's likely this commit fixed it:
> 
> commit 76d4622776d007de3f90f311591babc5f6ba6f39
> Author: Avi Kivity 
> Date:   Tue Sep 1 12:03:25 2009 +0300
> 
> KVM: VMX: Check cpl before emulating debug register access
> 
> Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
> code to emulate the instruction even though it was issued from guest
> userspace, possibly leading to an unexpected trap later.
> 
> It will be included in 2.6.30 / 2.6.27 stable (.29 is not maintained
> anymore).

Easy to check: Does the UML image still contain mov-to-db instructions?
If not, this commit cannot make the difference.

Jan





Re: kvm ptrace 32bit DoS bug - bisected

2009-09-08 Thread Marcelo Tosatti
On Sun, Sep 06, 2009 at 02:50:00PM +0700, Antoine Martin wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
> 
> [snip]
> >> Is this an AMD host? 
> > Nope, Intel Core2, more host info :
> I have put all the relevant binaries and their config files here:
> http://uml.devloop.org.uk/kvmbug/
> Host kernel, qemu binary, kvm guest kernel and the UML binary I have
> used for bisecting.

Antoine,

Works for me with master branch. It's likely this commit fixed it:

commit 76d4622776d007de3f90f311591babc5f6ba6f39
Author: Avi Kivity 
Date:   Tue Sep 1 12:03:25 2009 +0300

KVM: VMX: Check cpl before emulating debug register access

Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
code to emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.

It will be included in 2.6.30 / 2.6.27 stable (.29 is not maintained
anymore).



[PATCH] Adding a userspace application crash handling system to autotest

2009-09-08 Thread Lucas Meneghel Rodrigues
This patch adds a system that watches for user space
segmentation faults, writes core dumps and produces a basic
core dump analysis report. We believe that such a system will
be beneficial for autotest as a whole, since the ability to
get core dumps and dump analysis for each app crashing
during an autotest execution gives test engineers richer
debugging information.

The system is composed of 2 parts:

 * Modifications to the test code that enable core dump
generation, register a core handler script with the kernel
and check for generated core files at the end of each
test.

 * A core handler script that writes the core to each
test debug dir in a convenient way, together with a report
that currently consists of the name of the process that
died and a gdb stacktrace of the process. As the system
takes shape, we could add more scripts that do fancier
stuff (such as handlers that use frysk to get more info,
e.g. memory maps, provided that frysk is installed on
the machine).

This is a proof of concept of the system. I am sending it
to the mailing list at this early stage so I can get
feedback on the feature. The system passes my basic
tests:

 * Run a simple long test, such as the kvm test, and
then crash an application while the test is running. I
get reports generated in test.debugdir.

 * Run a slightly more complex control file, with 3 parallel
bonnie instances at once, and crash an application while the
test is running. I get reports generated in all
test.debugdirs.

3rd try:
 * Explicitly enable core dumps using the resource module
 * Fixed a bug in the crash detection code, and factored
   it into a utility function.

I believe we are good to go now.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/common_lib/test.py |   66 +-
 client/tools/crash_handler.py |  202 +
 2 files changed, 266 insertions(+), 2 deletions(-)
 create mode 100755 client/tools/crash_handler.py

diff --git a/client/common_lib/test.py b/client/common_lib/test.py
index 362c960..65b78a3 100644
--- a/client/common_lib/test.py
+++ b/client/common_lib/test.py
@@ -17,7 +17,7 @@
 #   tmpdir  eg. tmp/_
 
 import fcntl, os, re, sys, shutil, tarfile, tempfile, time, traceback
-import warnings, logging
+import warnings, logging, glob, resource
 
 from autotest_lib.client.common_lib import error
 from autotest_lib.client.bin import utils
@@ -31,7 +31,6 @@ class base_test:
 self.job = job
 self.pkgmgr = job.pkgmgr
 self.autodir = job.autodir
-
 self.outputdir = outputdir
 self.tagged_testname = os.path.basename(self.outputdir)
 self.resultsdir = os.path.join(self.outputdir, 'results')
@@ -40,6 +39,7 @@ class base_test:
 os.mkdir(self.profdir)
 self.debugdir = os.path.join(self.outputdir, 'debug')
 os.mkdir(self.debugdir)
+self.configure_crash_handler()
 self.bindir = bindir
 if hasattr(job, 'libdir'):
 self.libdir = job.libdir
@@ -54,6 +54,66 @@ class base_test:
 self.after_iteration_hooks = []
 
 
+def configure_crash_handler(self):
+"""
+Configure the crash handler by:
+ * Setting up core size to unlimited
+ * Putting an appropriate crash handler on 
/proc/sys/kernel/core_pattern
+ * Create files that the crash handler will use to figure which tests
+   are active at a given moment
+
+The crash handler will pick up the core file and write it to
+self.debugdir, and perform analysis on it to generate a report. The
+program also outputs some results to syslog.
+
+If multiple tests are running, an attempt to verify if we still have
+the old PID on the system process table to determine whether it is a
+parent of the current test execution. If we can't determine it, the
+core file and the report file will be copied to all test debug dirs.
+"""
+self.pattern_file = '/proc/sys/kernel/core_pattern'
+try:
+# Enable core dumps
+resource.setrlimit(resource.RLIMIT_CORE, (-1, -1))
+# Trying to backup core pattern and register our script
+self.core_pattern_backup = open(self.pattern_file, 'r').read()
+pattern_file = open(self.pattern_file, 'w')
+tools_dir = os.path.join(self.autodir, 'tools')
+crash_handler_path = os.path.join(tools_dir, 'crash_handler.py')
+pattern_file.write('|' + crash_handler_path + ' %p %t %u %s %h %e')
+# Writing the files that the crash handler is going to use
+self.debugdir_tmp_file = ('/tmp/autotest_results_dir.%s' %
+  os.getpid())
+utils.open_write_close(self.debugdir_tmp_file, self.debugdir + 
"\n")
+except Exception, e:
+self.crash_handling_enabled = False
+logging.error('Crash handling system disabled: %s' % e)

Re: [PATCH] QEMU-KVM: MCE: Relay UCR MCE to guest

2009-09-08 Thread Anthony Liguori

Huang Ying wrote:

> I find there is already a function named qemu_ram_addr_from_host which
> translates from a user space virtual address into a qemu RAM address. But I
> need a function that returns an error code instead of aborting in case no
> RAM address corresponds to the specified user space virtual address. So I
> plan to use the following code to deal with that.
>
> int do_qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
> ram_addr_t qemu_ram_addr_from_host(void *ptr);
>
> Does this follow the coding style of qemu?


I don't like the do_ prefix much but I don't have a better suggestion.
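
(For illustration, a minimal sketch of the proposed split, modeled on a
single contiguous RAM block; the lookup is a hypothetical stand-in, the point
being an error-returning variant plus a thin aborting wrapper that preserves
the behaviour existing callers expect.)

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef uint64_t ram_addr_t;

    /* Hypothetical single registered RAM block, for illustration only. */
    static uint8_t *phys_ram_base;
    static ram_addr_t phys_ram_size;

    /* Error-returning variant: 0 on success, -1 if ptr is not guest RAM. */
    static int do_qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
    {
        uint8_t *p = ptr;

        if (p < phys_ram_base || p >= phys_ram_base + phys_ram_size)
            return -1;
        *ram_addr = p - phys_ram_base;
        return 0;
    }

    /* Aborting variant keeps the behaviour existing callers rely on. */
    static ram_addr_t qemu_ram_addr_from_host(void *ptr)
    {
        ram_addr_t ram_addr;

        if (do_qemu_ram_addr_from_host(ptr, &ram_addr))
            abort();
        return ram_addr;
    }

    int main(void)
    {
        static uint8_t fake_ram[4096];
        ram_addr_t addr;
        int other;

        phys_ram_base = fake_ram;
        phys_ram_size = sizeof(fake_ram);

        if (do_qemu_ram_addr_from_host(fake_ram + 16, &addr) == 0)
            printf("guest RAM, ram_addr=%llu\n", (unsigned long long)addr);
        if (do_qemu_ram_addr_from_host(&other, &addr) != 0)
            printf("not guest RAM: the caller can report instead of aborting\n");
        (void)qemu_ram_addr_from_host(fake_ram);   /* would abort on a bad pointer */
        return 0;
    }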

>> If the instruction gets skipped, we may be leaking host memory because
>> the access never happened.

> There are two kinds of recoverable MCE, named SRAO (Software Recoverable
> Action Optional) and SRAR (Software Recoverable Action Required). Your
> example is an SRAR error, where the kernel will munmap the error page
> and send SIGBUS to qemu via force_sig_info, which will unblock SIGBUS
> and reset its action to SIG_DFL, so qemu will be terminated.
>
> If guest mode is interrupted, then because of the signal mask processing
> in the KVM kernel part, SIGBUS can be captured by qemu.

Ah, I didn't realize this path just worked.

--
Regards,

Anthony Liguori



[PATCH 3/3] KVM test: Renaming kvm_hugepages variant to hugepages

2009-09-08 Thread Lucas Meneghel Rodrigues
Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/kvm_tests.cfg.sample |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/kvm_tests.cfg.sample 
b/client/tests/kvm/kvm_tests.cfg.sample
index a83ef9b..fdf2963 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -620,8 +620,8 @@ variants:
 
 
 variants:
-- @kvm_smallpages:
-- kvm_hugepages:
+- @smallpages:
+- hugepages:
 pre_command = "/usr/bin/python scripts/hugepage.py /mnt/kvm_hugepage"
 extra_params += " -mem-path /mnt/kvm_hugepage"
 
@@ -638,7 +638,7 @@ variants:
 only Fedora.8.32
 only install setup boot shutdown
 only rtl8139
-only kvm_hugepages
+only hugepages
 - @sample1:
 only qcow2
 only ide
-- 
1.6.2.5



rtl8139 and qemu-kvm-0.11.0-rc2: NFS not responding

2009-09-08 Thread Sven Rudolph
Hello,

with the current qemu-kvm release candidate our diskless linux guests
cannot use their NFS root filesystem anymore.

  /usr/local/bin/qemu-system-x86_64 -m 4096 -smp 1 -boot n -net 
nic,macaddr=00:50:56:24:0b:57,model=rtl8139 -net 
tap,ifname=vm01,script=no,downscript=no

This boots via the pxe-rtl8139.bin Boot ROM and starts a
locally-developed diskless boot environment. (Unfortunately I'm not
able to describe this in detail, and I didn't manage to reproduce this
in an easier environment.) When this boot environment tries to mount
the new root filesystem via NFS, these messages appear, with
several seconds of waiting between each line:

  nfs: server 172.31.11.10 not responding, still trying
  nfs: server 172.31.11.10 OK

This continues until I kill the qemu process. While the problem occurs, the
guest's IP address can still be pinged.

The problem disappears with each of these:
- qemu-kvm-0.10.6
- model=virtio or model=e1000

These changes didn't help:
- -no-kvm-irqchip
- -no-kvm-pit
- -no-kvm
  - qemu-system-x86_64 instantly coredumps
- qemu-kvm-0.11.0-rc1

My environment:
- Host
  - Linux 2.6.31-rc9 x86_64
- kvm kernel components from this kernel
  - Dual-Socket AMD Opteron 2347 HE
- Guest
  - Linux 2.6.25.9 i686

I tried to watch this with tcpdump. Before the line 
"nfs: server 172.31.11.10 OK" it looks like this:

  13:45:28.665935 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.665940 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666162 IP 172.31.11.10.948098278 > 172.31.10.11.2049: 112 read [|nfs]
  13:45:28.666259 IP 172.31.11.10.964875494 > 172.31.10.11.2049: 112 read [|nfs]
  13:45:28.666345 IP 172.31.10.11.2049 > 172.31.11.10.948098278: reply ok 1472 
read
  13:45:28.666403 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666408 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666412 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666416 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666421 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666464 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666469 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666476 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666482 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666487 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666526 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666532 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666538 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666543 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666549 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666587 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666594 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666599 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.05 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.13 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.22 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.28 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666929 IP 172.31.10.11.2049 > 172.31.11.10.964875494: reply ok 1472 
read
  13:45:28.666935 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666940 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666944 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666949 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666991 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.666996 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667002 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667008 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667014 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667052 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667058 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667064 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667069 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667075 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667114 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667121 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667128 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667134 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667140 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667160 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667168 IP 172.31.10.11 > 172.31.11.10: udp
  13:45:28.667174 IP 172.31.10.11 > 172.31.11.10: udp

and after this line this appears:


  13:45:41.261362 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time 
exceeded, length 556
  13:45:41.682275 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time 
exceeded, length 556
  13:45:42.091219 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time 
exceeded, length 556
  13:45:42.942096 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time 
exceeded, length 556
  13:45:46.260603 arp who-has 172.31.10.11 tell 172.31.11.10
  13:45:46.260756 arp reply 172.31.10.11 is-at 00:c0:9f:ca:9a:78
  13:45:56.507139 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time 
exceeded, length 556
  13:45:57.207030 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time 
exceeded, length 556
  13:45:58.688814 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time 
exceeded, length 556


Thats all inf

Re: [qemu-kvm][PATCH] Add "sda" alias options to "hda" options

2009-09-08 Thread Tsuyoshi Ozawa
> -hda is deprecated in favor of -drive, please use -drive instead.

I see, it's better.
-- 
Tsuyoshi Ozawa



Re: [PATCH] QEMU-KVM: MCE: Relay UCR MCE to guest

2009-09-08 Thread Andi Kleen
On Mon, Sep 07, 2009 at 03:48:07PM -0500, Anthony Liguori wrote:
>>
>>  int kvm_set_irq_level(kvm_context_t kvm, int irq, int level, int *status)
>> @@ -1515,6 +1546,38 @@ static void sig_ipi_handler(int n)
>>  {
>>  }
>>
>> +static void sigbus_handler(int n, struct signalfd_siginfo *siginfo, void 
>> *ctx)
>> +{
>> +if (siginfo->ssi_code == BUS_MCEERR_AO) {
>> +uint64_t status;
>> +unsigned long paddr;
>> +CPUState *cenv;
>> +
>> +/* Hope we are lucky for AO MCE */
>>   
>
> Even if the error was limited to guest memory, it could have been generated 
> by either the kernel or userspace reading guest memory, no?

Only user space reads or asynchronously detected errors
(e.g. patrol scrubbing) are reported this way. The kernel reading
corrupted memory always leads to a panic currently.

>
> Does this potentially open a security hole for us?  Consider the following:
>
> 1) We happen to read guest memory and that causes an MCE.  For instance, 
> say we're in virtio.c and we read the virtio ring.
> 2) That should trigger the kernel to generate a sigbus.
> 3) We catch sigbus, and queue an MCE for delivery.
> 4) After sigbus handler completes, we're back in virtio.c, what was the 
> value of the memory operation we just completed?

Yes, for any error on accessing qemu-internal memory that is not
owned by the guest image you should abort. I thought Ying's patch
already did that, though, by aborting if there's no slot match.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [qemu-kvm][PATCH] Add "sda" alias options to "hda" options

2009-09-08 Thread Avi Kivity

On 09/07/2009 05:00 PM, Ozawa Tsuyoshi wrote:

qemu-kvm: Add "sda" alias options to "hda" options

I know that the name "hda" come from IDE drive, but I felt strange
when I use qemu to boot linux kernel directly as follows:

$ qemu-system-x86 -kernel vmlinux-2.6.28.15 -initrd
initrd.img-2.6.28.15 -hda vdisk.img

By applying this patch,  the command will change to:

$ qemu-system-x86 -kernel vmlinux-2.6.28.15 -initrd
initrd.img-2.6.28.15 -sda vdisk.img

The latter one seems to be more intuitive for me.

   


-hda is deprecated in favor of -drive, please use -drive instead.

--
error compiling committee.c: too many arguments to function
