Re: Live migration between Intel Q6600 and AMD Phenom II
Hello, On (Tue) Sep 08 2009 [13:32:39], Sterling Windmill wrote: > I've read that it's possible to live migrate KVM guests between Intel and AMD > CPUs, is it also possible to migrate from a CPU without NPT/EPT to the Phenom > II that supports NPT? Will I lose out on any of the benefits NPT allows > without shutting down and restarting the guest? Live migration between different vendors isn't tested enough for us to be confident in saying it works well. There has been some work done in the area, but there can always be bugs or unimplemented features. If you do try it, please share your experiences whether good or bad. I think NPT support should get enabled for your VM if the migration does succeed. > Also, any thoughts on how much more performant a 3.0GHz Phenom II will be for > running KVM guests than the 2.4GHz Intel Q6600? The Q6600 does not support EPT, right? If so, the Phenom will be faster as it does support NPT. Amit -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH -v2] QEMU-KVM: MCE: Relay UCR MCE to guest
UCR (uncorrected recovery) MCE is supported in recent Intel CPUs, where some hardware errors, such as memory errors, can be reported without PCC (processor context corrupted). To recover from such an MCE, the corresponding memory will be unmapped, and all processes accessing the memory will be killed via SIGBUS. For KVM, if QEMU/KVM is killed, all guest processes will be killed too. So we relay SIGBUS from the host OS to the guest system via a UCR MCE injection. Then the guest OS can isolate the corresponding memory and kill only the necessary guest processes. SIGBUS sent to the main thread (not VCPU threads) will be broadcast to all VCPU threads as a UCR MCE. v2: - Use qemu_ram_addr_from_host instead of a self-made one to convert from host address to guest RAM address. Thanks Anthony Liguori. Signed-off-by: Huang Ying --- cpu-common.h |1 exec.c| 20 +-- qemu-kvm.c| 154 ++ target-i386/cpu.h | 20 ++- 4 files changed, 178 insertions(+), 17 deletions(-) --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -27,10 +27,23 @@ #include #include #include +#include +#include #define false 0 #define true 1 +#ifndef PR_MCE_KILL +#define PR_MCE_KILL 33 +#endif + +#ifndef BUS_MCEERR_AR +#define BUS_MCEERR_AR 4 +#endif +#ifndef BUS_MCEERR_AO +#define BUS_MCEERR_AO 5 +#endif + #define EXPECTED_KVM_API_VERSION 12 #if EXPECTED_KVM_API_VERSION != KVM_API_VERSION @@ -1507,6 +1520,37 @@ static void sig_ipi_handler(int n) { } +static void sigbus_handler(int n, struct signalfd_siginfo *siginfo, void *ctx) +{ +if (siginfo->ssi_code == BUS_MCEERR_AO) { +uint64_t status; +unsigned long paddr; +CPUState *cenv; + +/* Hope we are lucky for AO MCE */ +if (do_qemu_ram_addr_from_host((void *)siginfo->ssi_addr, &paddr)) { +fprintf(stderr, "Hardware memory error for memory used by " +"QEMU itself instead of guest system!: %llx\n", +(unsigned long long)siginfo->ssi_addr); +return; +} +status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN +| MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S +| 0xc0; +kvm_inject_x86_mce(first_cpu, 9, status,
MCG_STATUS_MCIP | MCG_STATUS_RIPV, paddr, + (MCM_ADDR_PHYS << 6) | 0xc); +for (cenv = first_cpu->next_cpu; cenv != NULL; cenv = cenv->next_cpu) +kvm_inject_x86_mce(cenv, 1, MCI_STATUS_VAL | MCI_STATUS_UC, + MCG_STATUS_MCIP | MCG_STATUS_RIPV, 0, 0); +return; +} else if (siginfo->ssi_code == BUS_MCEERR_AR) +fprintf(stderr, "Hardware memory error!\n"); +else +fprintf(stderr, "Internal error in QEMU!\n"); +exit(1); +} + static void on_vcpu(CPUState *env, void (*func)(void *data), void *data) { struct qemu_work_item wi; @@ -1649,29 +1693,102 @@ static void flush_queued_work(CPUState * pthread_cond_broadcast(&qemu_work_cond); } +static void kvm_on_sigbus(CPUState *env, siginfo_t *siginfo) +{ +#if defined(KVM_CAP_MCE) && defined(TARGET_I386) +struct kvm_x86_mce mce = { +.bank = 9, +}; +unsigned long paddr; +int r; + +if (env->mcg_cap && siginfo->si_addr +&& (siginfo->si_code == BUS_MCEERR_AR +|| siginfo->si_code == BUS_MCEERR_AO)) { +if (siginfo->si_code == BUS_MCEERR_AR) { +/* Fake an Intel architectural Data Load SRAR UCR */ +mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN +| MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S +| MCI_STATUS_AR | 0x134; +mce.misc = (MCM_ADDR_PHYS << 6) | 0xc; +mce.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_EIPV; +} else { +/* Fake an Intel architectural Memory scrubbing UCR */ +mce.status = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN +| MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S +| 0xc0; +mce.misc = (MCM_ADDR_PHYS << 6) | 0xc; +mce.mcg_status = MCG_STATUS_MCIP | MCG_STATUS_RIPV; +} +if (do_qemu_ram_addr_from_host((void *)siginfo->si_addr, &paddr)) { +fprintf(stderr, "Hardware memory error for memory used by " +"QEMU itself instead of guest system!\n"); +/* Hope we are lucky for AO MCE */ +if (siginfo->si_code == BUS_MCEERR_AO) +return; +else +exit(1); +} +mce.addr = paddr; +r = kvm_set_mce(env->kvm_cpu_state.vcpu_ctx, &mce); +if (r < 0) { +fprintf(stderr, "kvm_set_mce: %s\n", strerror(errno)); +exit(1); +} +} else 
+#endif +{ +if (siginfo->si_code == BUS_MCEERR_AO) +return; +if (siginfo->si_code == BUS_MCEERR_AR) +fprintf
[PATCH 4/4] KVM test: Rename BEFORE_YOU_START to README
Also, point to the latest online documentation. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/BEFORE_YOU_START | 19 --- client/tests/kvm/README |3 +++ 2 files changed, 3 insertions(+), 19 deletions(-) delete mode 100644 client/tests/kvm/BEFORE_YOU_START create mode 100644 client/tests/kvm/README diff --git a/client/tests/kvm/BEFORE_YOU_START b/client/tests/kvm/BEFORE_YOU_START deleted file mode 100644 index 7478d9d..000 --- a/client/tests/kvm/BEFORE_YOU_START +++ /dev/null @@ -1,19 +0,0 @@ -Install kvm and load modules. -Remove 'env' file if exists. -Remove control.state file if exists. - -Copy kvm_tests.cfg.sample into kvm_tests.cfg -Modify kvm_tests.cfg to your liking. -Modify control if you prefer to "use your own kvm" (comment out kvm_install). - -Create those symbolic links under kvm or under -qemu -> qemu-kvm binary (unless using kvm_install) -qemu-img -> qemu-img binary (unless using kvm_install) -isos/ -> isos (mount or symlink) -images/-> images (mount or symlink) -autotest/ -> ../../ (an autotest client directroy) -steps_data/-> steps_data dir (when available) - -Please make sure qemu points to an "installed" kvm-qemu executable, and -not one just compiled in the source directory. An installed executable "knows" -where to find its associated data-dir (e.g. for bios). diff --git a/client/tests/kvm/README b/client/tests/kvm/README new file mode 100644 index 000..88d2c15 --- /dev/null +++ b/client/tests/kvm/README @@ -0,0 +1,3 @@ +In order to get started, please refer to the online documentation: + +http://www.linux-kvm.org/page/KVM-Autotest/Client_Install -- 1.6.2.5
[PATCH 3/4] KVM test: Removing the fix_cdkeys.py program
That is no longer necessary since we handle cd keys on a separate configuration file. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/fix_cdkeys.py | 76 1 files changed, 0 insertions(+), 76 deletions(-) delete mode 100755 client/tests/kvm/fix_cdkeys.py diff --git a/client/tests/kvm/fix_cdkeys.py b/client/tests/kvm/fix_cdkeys.py deleted file mode 100755 index aa9fc3e..000 --- a/client/tests/kvm/fix_cdkeys.py +++ /dev/null @@ -1,76 +0,0 @@ -#!/usr/bin/python -""" -Program that replaces the CD keys present on a KVM autotest configuration file. - -...@copyright: Red Hat 2008-2009 -...@author: u...@redhat.com (Uri Lublin) -""" - -import shutil, os, sys -import common - - -def file_to_lines(filename): -f = open(filename, 'r') -lines = f.readlines() -f.close -return lines - -def lines_to_file(filename, lines): -f = open(filename, 'w') -f.writelines(lines) -f.close() - -def replace_var_with_val(lines, variables): -new = [] -for line in lines: -for (var,val) in variables: -if var in line: -print 'replacing %s with %s in "%s"' % (var, val, line[:-1]) -line = line.replace(var, val) -print ' ... 
new line is "%s"' % (line[:-1]) -new.append(line) -return new - -def filter_comments(line): -return not line.strip().startswith('#') - -def filter_empty(line): -return len(line.strip()) != 0 - -def line_to_pair(line): -x,y = line.split('=', 1) -return (x.strip(), y.strip()) - -def read_vars(varfile): -varlines = file_to_lines(varfile) -varlines = filter(filter_comments, varlines) -varlines = filter(filter_empty,varlines) -vars = map(line_to_pair, varlines) -return vars - -def main(cfgfile, varfile): -# first save a copy of the original file (if does not exist) -backupfile = '%s.backup' % cfgfile -if not os.path.exists(backupfile): -shutil.copy(cfgfile, backupfile) - -vars = read_vars(varfile) -datalines = file_to_lines(cfgfile) -newlines = replace_var_with_val(datalines, vars) -lines_to_file(cfgfile, newlines) - - -if __name__ == '__main__': -def die(msg, val): -print msg -sys.exit(val) -if len(sys.argv) != 3: -die('usage: %s ', 1) -cfgfile = sys.argv[1] -varfile = sys.argv[2] -if not os.path.exists(cfgfile): -die('bad cfgfile "%s"' % cfgfile, 2) -if not os.path.exists(varfile): -die('bad varfile "%s"' % varfile, 2) -main(cfgfile, varfile) -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] KVM test: Move top level docstrings, other cleanups
In order to prepare for the subsequent changes, made some cleanups on the kvm source files: I've noticed that the top level docstrings were going before the imports block, and that does not follow the pattern found on other files (my fault). This patch fixes that problem and fixed some places on scan_results.py where 80 char line width was not being obeyed. Also, cleaned up the last places where we were using the shebang #/usr/bin/env python, which is not the preferred usage of the shebang across the project. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/calc_md5sum_1m.py |8 client/tests/kvm/fix_cdkeys.py |6 +++--- client/tests/kvm/kvm_config.py | 10 +- client/tests/kvm/kvm_guest_wizard.py | 12 ++-- client/tests/kvm/kvm_subprocess.py |8 client/tests/kvm/kvm_tests.py|8 client/tests/kvm/kvm_utils.py| 12 ++-- client/tests/kvm/kvm_vm.py |6 +++--- client/tests/kvm/make_html_report.py | 13 +++-- client/tests/kvm/ppm_utils.py|5 ++--- client/tests/kvm/scan_results.py | 25 + client/tests/kvm/stepeditor.py |8 client/tests/kvm/stepmaker.py| 13 +++-- 13 files changed, 68 insertions(+), 66 deletions(-) diff --git a/client/tests/kvm/calc_md5sum_1m.py b/client/tests/kvm/calc_md5sum_1m.py index 6660d0e..2325673 100755 --- a/client/tests/kvm/calc_md5sum_1m.py +++ b/client/tests/kvm/calc_md5sum_1m.py @@ -1,7 +1,4 @@ -#!/usr/bin/env python -import os, sys -import kvm_utils - +#!/usr/bin/python """ Program that calculates the md5sum for the first megabyte of a file. It's faster than calculating the md5sum for the whole ISO image. @@ -10,6 +7,9 @@ It's faster than calculating the md5sum for the whole ISO image. 
@author: Uri Lublin (u...@redhat.com) """ +import os, sys +import kvm_utils + if len(sys.argv) < 2: print 'usage: %s ' % sys.argv[0] diff --git a/client/tests/kvm/fix_cdkeys.py b/client/tests/kvm/fix_cdkeys.py index 7a821fa..aa9fc3e 100755 --- a/client/tests/kvm/fix_cdkeys.py +++ b/client/tests/kvm/fix_cdkeys.py @@ -1,7 +1,4 @@ #!/usr/bin/python -import shutil, os, sys -import common - """ Program that replaces the CD keys present on a KVM autotest configuration file. @@ -9,6 +6,9 @@ Program that replaces the CD keys present on a KVM autotest configuration file. @author: u...@redhat.com (Uri Lublin) """ +import shutil, os, sys +import common + def file_to_lines(filename): f = open(filename, 'r') diff --git a/client/tests/kvm/kvm_config.py b/client/tests/kvm/kvm_config.py index da7988b..9404f28 100755 --- a/client/tests/kvm/kvm_config.py +++ b/client/tests/kvm/kvm_config.py @@ -1,15 +1,15 @@ #!/usr/bin/python -import logging, re, os, sys, StringIO, optparse -import common -from autotest_lib.client.common_lib import error -from autotest_lib.client.common_lib import logging_config, logging_manager - """ KVM configuration file utility functions. @copyright: Red Hat 2008-2009 """ +import logging, re, os, sys, StringIO, optparse +import common +from autotest_lib.client.common_lib import error +from autotest_lib.client.common_lib import logging_config, logging_manager + class KvmLoggingConfig(logging_config.LoggingConfig): def configure_logging(self, results_dir=None, verbose=False): diff --git a/client/tests/kvm/kvm_guest_wizard.py b/client/tests/kvm/kvm_guest_wizard.py index 3d3f3b2..8bc85f2 100644 --- a/client/tests/kvm/kvm_guest_wizard.py +++ b/client/tests/kvm/kvm_guest_wizard.py @@ -1,3 +1,9 @@ +""" +Utilities to perform automatic guest installation using step files. 
+ +...@copyright: Red Hat 2008-2009 +""" + import os, time, md5, re, shutil, logging from autotest_lib.client.common_lib import utils, error import kvm_utils, ppm_utils, kvm_subprocess @@ -9,12 +15,6 @@ except ImportError: 'please install python-imaging or the equivalent for your ' 'distro.') -""" -Utilities to perform automatic guest installation using step files. - -...@copyright: Red Hat 2008-2009 -""" - def handle_var(vm, params, varname): var = params.get(varname) diff --git a/client/tests/kvm/kvm_subprocess.py b/client/tests/kvm/kvm_subprocess.py index 07303a8..5df9e9b 100755 --- a/client/tests/kvm/kvm_subprocess.py +++ b/client/tests/kvm/kvm_subprocess.py @@ -1,14 +1,14 @@ #!/usr/bin/python -import sys, subprocess, pty, select, os, time, signal, re, termios, fcntl -import threading, logging, commands -import common, kvm_utils - """ A class and functions used for running and controlling child processes. @copyright: 2008-2009 Red Hat Inc. """ +import sys, subprocess, pty, select, os, time, signal, re, termios, fcntl +import threading, logging, commands +import common, kvm_utils + def run_bg(command, termination_func=None, output_func=None, output_prefix="", timeout=1.0): diff --git a/cli
Re: Modifying RAM during runtime on guest
On Tuesday 08 September 2009 03:52:07 pm Daniel Bareiro wrote: > Hi all! > > I'm trying to modify the amount of RAM that has some of guests. Host has > 2.6.30 kernel with KVM-88. > > In one of guest I didn't have problems when decreasing the amount of memory > from 3584 MIB to 1024 MiB. This guest has 2.6.26-2-686 stock kernel. Also I > was trying to decrease the amount RAM of another guest from 3584 MiB to > 2048 MiB, but it didn't work. This other guest has > 2.6.24-etchnhalf.1-686-bigmem stock kernel. Does Ballooning in guest > require 2.6.25 or superior? I don't know, if that kernel has a virtio-balloon driver, I'd think that was all you need to balloon memory. > > Thinking that it could be an impediment related to the kernel version of > guest, I tried to increase the memory of another one guest with > 2.6.26-2-686 from 512 MIB to 1024 MIB, but this didn't work either. You can only grow memory up to the amount you specified on the command line if you've already ballooned down. So if you specify "-m 1024M" on the command line, then shrink it to 512, you could then balloon it back up to a max of 1024. > > These are the statistics of of memory usage in host: > > # free > total used free sharedbuffers cached > Mem: 16469828 147634601706368 07800712 202044 > -/+ buffers/cache:67607049709124 > Swap: 8319948 192408300708 > > > > Which can be the cause? > > Thanks in advance for your reply. > > Regards, > Daniel > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Modifying RAM during runtime on guest
Hi all! I'm trying to modify the amount of RAM that some of the guests have. The host has a 2.6.30 kernel with KVM-88. In one of the guests I didn't have problems when decreasing the amount of memory from 3584 MiB to 1024 MiB. This guest has a 2.6.26-2-686 stock kernel. I was also trying to decrease the amount of RAM of another guest from 3584 MiB to 2048 MiB, but it didn't work. This other guest has a 2.6.24-etchnhalf.1-686-bigmem stock kernel. Does ballooning in the guest require 2.6.25 or later? Thinking that it could be an impediment related to the guest's kernel version, I tried to increase the memory of another guest with 2.6.26-2-686 from 512 MiB to 1024 MiB, but this didn't work either. These are the statistics of memory usage on the host:

# free
             total       used       free     shared    buffers     cached
Mem:      16469828   14763460    1706368          0    7800712     202044
-/+ buffers/cache:    6760704    9709124
Swap:      8319948      19240    8300708

Which can be the cause? Thanks in advance for your reply. Regards, Daniel -- Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37 Powered by Debian GNU/Linux Squeeze - Linux user #188.598
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On Tue, Sep 08, 2009 at 10:20:35AM -0700, Ira W. Snyder wrote: > On Mon, Sep 07, 2009 at 01:15:37PM +0300, Michael S. Tsirkin wrote: > > On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote: > > > On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote: > > > > What it is: vhost net is a character device that can be used to reduce > > > > the number of system calls involved in virtio networking. > > > > Existing virtio net code is used in the guest without modification. > > > > > > > > There's similarity with vringfd, with some differences and reduced scope > > > > - uses eventfd for signalling > > > > - structures can be moved around in memory at any time (good for > > > > migration) > > > > - support memory table and not just an offset (needed for kvm) > > > > > > > > common virtio related code has been put in a separate file vhost.c and > > > > can be made into a separate module if/when more backends appear. I used > > > > Rusty's lguest.c as the source for developing this part : this supplied > > > > me with witty comments I wouldn't be able to write myself. > > > > > > > > What it is not: vhost net is not a bus, and not a generic new system > > > > call. No assumptions are made on how guest performs hypercalls. > > > > Userspace hypervisors are supported as well as kvm. > > > > > > > > How it works: Basically, we connect virtio frontend (configured by > > > > userspace) to a backend. The backend could be a network device, or a > > > > tun-like device. In this version I only support raw socket as a backend, > > > > which can be bound to e.g. SR IOV, or to macvlan device. Backend is > > > > also configured by userspace, including vlan/mac etc. > > > > > > > > Status: > > > > This works for me, and I haven't see any crashes. 
> > > > I have done some light benchmarking (with v4), compared to userspace, I > > > > see improved latency (as I save up to 4 system calls per packet) but not > > > > bandwidth/CPU (as TSO and interrupt mitigation are not supported). For > > > > ping benchmark (where there's no TSO) troughput is also improved. > > > > > > > > Features that I plan to look at in the future: > > > > - tap support > > > > - TSO > > > > - interrupt mitigation > > > > - zero copy > > > > > > > > > > Hello Michael, > > > > > > I've started looking at vhost with the intention of using it over PCI to > > > connect physical machines together. > > > > > > The part that I am struggling with the most is figuring out which parts > > > of the rings are in the host's memory, and which parts are in the > > > guest's memory. > > > > All rings are in guest's memory, to match existing virtio code. > > Ok, this makes sense. > > > vhost > > assumes that the memory space of the hypervisor userspace process covers > > the whole of guest memory. > > Is this necessary? Why? Because with virtio ring can give us arbitrary guest addresses. If guest was limited to using a subset of addresses, hypervisor would only have to map these. > The assumption seems very wrong when you're > doing data transport between two physical systems via PCI. > I know vhost has not been designed for this specific situation, but it > is good to be looking toward other possible uses. > > > And there's a translation table. > > Ring addresses are userspace addresses, they do not undergo translation. > > > > > If I understand everything correctly, the rings are all userspace > > > addresses, which means that they can be moved around in physical memory, > > > and get pushed out to swap. > > > > Unless they are locked, yes. > > > > > AFAIK, this is impossible to handle when > > > connecting two physical systems, you'd need the rings available in IO > > > memory (PCI memory), so you can ioreadXX() them instead. 
To the best of > > > my knowledge, I shouldn't be using copy_to_user() on an __iomem address. > > > Also, having them migrate around in memory would be a bad thing. > > > > > > Also, I'm having trouble figuring out how the packet contents are > > > actually copied from one system to the other. Could you point this out > > > for me? > > > > The code in net/packet/af_packet.c does it when vhost calls sendmsg. > > > > Ok. The sendmsg() implementation uses memcpy_fromiovec(). Is it possible > to make this use a DMA engine instead? Maybe. > I know this was suggested in an earlier thread. Yes, it might even give some performance benefit with e.g. I/O AT. > > > Is there somewhere I can find the userspace code (kvm, qemu, lguest, > > > etc.) code needed for interacting with the vhost misc device so I can > > > get a better idea of how userspace is supposed to work? > > > > Look in archives for k...@vger.kernel.org. the subject is qemu-kvm: vhost > > net. > > > > > (Features > > > negotiation, etc.) > > > > > > > That's not yet implemented as there are no features yet. I'm working on > > tap support, which will add a feature bit. Overall, qemu does an ioctl > > to query supported features, and then a
Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock
On Tue, Sep 08, 2009 at 05:00:04PM -0300, Marcelo Tosatti wrote: > On Tue, Sep 08, 2009 at 04:37:52PM -0300, Glauber Costa wrote: > > On Tue, Sep 08, 2009 at 03:41:59PM -0300, Marcelo Tosatti wrote: > > > On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote: > > > > KVM clock is great to avoid drifting in guest VMs running ontop of kvm. > > > > However, the current mechanism will not propagate changes in wallclock > > > > value > > > > upwards. This effectively means that in a large pool of VMs that need > > > > accurate timing, > > > > all of them has to run NTP, instead of just the host doing it. > > > > > > > > Since the host updates information in the shared memory area upon msr > > > > writes, > > > > this patch introduces a worker that writes to that msr, and calls > > > > do_settimeofday > > > > at fixed intervals, with second resolution. A interval of 0 determines > > > > that we > > > > are not interested in this behaviour. A later patch will make this > > > > optional at > > > > runtime > > > > > > > > Signed-off-by: Glauber Costa > > > > > > As mentioned before, ntp already does this (and its not that heavy is > > > it?). > > > > > > For example, if ntp running on the host, it avoids stepping the clock > > > backwards by slow adjustment, while the periodic frequency adjustment on > > > the guest bypasses that. > > > > Simple question: How do I run ntp in guests without network? > > You don't. For those guests, the mechanism I am proposing comes in handy. Furthermore, it is not only optional, but disabled by default. And even if you have a network but have a genuine reason not to use ntp in your VMs, you can use it too.
Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock
Marcelo Tosatti wrote: Simple question: How do I run ntp in guests without network? You don't. Why bother doing this in the kernel? Isn't this the sort of thing vmchannel is supposed to handle? open-vm-tools does this. /me ducks Regards, Anthony Liguori
Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock
On Tue, Sep 08, 2009 at 04:37:52PM -0300, Glauber Costa wrote: > On Tue, Sep 08, 2009 at 03:41:59PM -0300, Marcelo Tosatti wrote: > > On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote: > > > KVM clock is great to avoid drifting in guest VMs running ontop of kvm. > > > However, the current mechanism will not propagate changes in wallclock > > > value > > > upwards. This effectively means that in a large pool of VMs that need > > > accurate timing, > > > all of them has to run NTP, instead of just the host doing it. > > > > > > Since the host updates information in the shared memory area upon msr > > > writes, > > > this patch introduces a worker that writes to that msr, and calls > > > do_settimeofday > > > at fixed intervals, with second resolution. A interval of 0 determines > > > that we > > > are not interested in this behaviour. A later patch will make this > > > optional at > > > runtime > > > > > > Signed-off-by: Glauber Costa > > > > As mentioned before, ntp already does this (and its not that heavy is > > it?). > > > > For example, if ntp running on the host, it avoids stepping the clock > > backwards by slow adjustment, while the periodic frequency adjustment on > > the guest bypasses that. > > Simple question: How do I run ntp in guests without network? You don't.
Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock
On Tue, Sep 08, 2009 at 03:41:59PM -0300, Marcelo Tosatti wrote: > On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote: > > KVM clock is great to avoid drifting in guest VMs running ontop of kvm. > > However, the current mechanism will not propagate changes in wallclock value > > upwards. This effectively means that in a large pool of VMs that need > > accurate timing, > > all of them has to run NTP, instead of just the host doing it. > > > > Since the host updates information in the shared memory area upon msr > > writes, > > this patch introduces a worker that writes to that msr, and calls > > do_settimeofday > > at fixed intervals, with second resolution. A interval of 0 determines that > > we > > are not interested in this behaviour. A later patch will make this optional > > at > > runtime > > > > Signed-off-by: Glauber Costa > > As mentioned before, ntp already does this (and its not that heavy is > it?). > > For example, if ntp running on the host, it avoids stepping the clock > backwards by slow adjustment, while the periodic frequency adjustment on > the guest bypasses that. Simple question: How do I run ntp in guests without network?
Re: [PATCH v2 1/2] keep guest wallclock in sync with host clock
On Wed, Sep 02, 2009 at 10:34:57AM -0400, Glauber Costa wrote: > KVM clock is great to avoid drifting in guest VMs running ontop of kvm. > However, the current mechanism will not propagate changes in wallclock value > upwards. This effectively means that in a large pool of VMs that need > accurate timing, > all of them has to run NTP, instead of just the host doing it. > > Since the host updates information in the shared memory area upon msr writes, > this patch introduces a worker that writes to that msr, and calls > do_settimeofday > at fixed intervals, with second resolution. A interval of 0 determines that we > are not interested in this behaviour. A later patch will make this optional at > runtime > > Signed-off-by: Glauber Costa As mentioned before, ntp already does this (and its not that heavy is it?). For example, if ntp running on the host, it avoids stepping the clock backwards by slow adjustment, while the periodic frequency adjustment on the guest bypasses that. > --- > arch/x86/kernel/kvmclock.c | 70 ++- > 1 files changed, 61 insertions(+), 9 deletions(-) > > diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c > index e5efcdc..555aab0 100644 > --- a/arch/x86/kernel/kvmclock.c > +++ b/arch/x86/kernel/kvmclock.c > @@ -27,6 +27,7 @@ > #define KVM_SCALE 22 > > static int kvmclock = 1; > +static unsigned int kvm_wall_update_interval = 0; > > static int parse_no_kvmclock(char *arg) > { > @@ -39,24 +40,75 @@ early_param("no-kvmclock", parse_no_kvmclock); > static DEFINE_PER_CPU_SHARED_ALIGNED(struct pvclock_vcpu_time_info, > hv_clock); > static struct pvclock_wall_clock wall_clock; > > -/* > - * The wallclock is the time of day when we booted. Since then, some time may > - * have elapsed since the hypervisor wrote the data. 
So we try to account for > - * that with system time > - */ > -static unsigned long kvm_get_wallclock(void) > +static void kvm_get_wall_ts(struct timespec *ts) > { > - struct pvclock_vcpu_time_info *vcpu_time; > - struct timespec ts; > int low, high; > + struct pvclock_vcpu_time_info *vcpu_time; > > low = (int)__pa_symbol(&wall_clock); > high = ((u64)__pa_symbol(&wall_clock) >> 32); > native_write_msr(MSR_KVM_WALL_CLOCK, low, high); > > vcpu_time = &get_cpu_var(hv_clock); > - pvclock_read_wallclock(&wall_clock, vcpu_time, &ts); > + pvclock_read_wallclock(&wall_clock, vcpu_time, ts); > put_cpu_var(hv_clock); > +} > + > +static void kvm_sync_wall_clock(struct work_struct *work); > +static DECLARE_DELAYED_WORK(kvm_sync_wall_work, kvm_sync_wall_clock); > + > +static void schedule_next_update(void) > +{ > + struct timespec next; > + > + if ((kvm_wall_update_interval == 0) || > +(!kvm_para_available()) || > +(!kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE))) > + return; > + > + next.tv_sec = kvm_wall_update_interval; > + next.tv_nsec = 0; > + > + schedule_delayed_work(&kvm_sync_wall_work, timespec_to_jiffies(&next)); > +} > + > +static void kvm_sync_wall_clock(struct work_struct *work) > +{ > + struct timespec now, after; > + u64 nsec_delta; > + > + do { > + kvm_get_wall_ts(&now); > + do_settimeofday(&now); > + kvm_get_wall_ts(&after); > + nsec_delta = (u64)after.tv_sec * NSEC_PER_SEC + after.tv_nsec; > + nsec_delta -= (u64)now.tv_sec * NSEC_PER_SEC + now.tv_nsec; > + } while (nsec_delta > NSEC_PER_SEC / 8); > + > + schedule_next_update(); > +} > + > +static __init int init_updates(void) > +{ > + schedule_next_update(); > + return 0; > +} > +/* > + * It has to be run after workqueues are initialized, since we call > + * schedule_delayed_work. Other than that, we have no specific requirements > + */ > +late_initcall(init_updates); > + > +/* > + * The wallclock is the time of day when we booted. 
Since then, some time may > + * have elapsed since the hypervisor wrote the data. So we try to account for > + * that with system time > + */ > +static unsigned long kvm_get_wallclock(void) > +{ > + struct timespec ts; > + > + kvm_get_wall_ts(&ts); > > return ts.tv_sec; > } > -- > 1.6.2.2 > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] defer skb allocation in virtio_net -- mergable buff part
Thanks Michael for your detailed review comments. I am just back from my vacation. I am working on what you have raised here.

Shirley
KVM: x86: drop duplicate kvm_flush_remote_tlb calls
kvm_mmu_slot_remove_write_access already calls it.

Signed-off-by: Marcelo Tosatti

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 891234b..f83e990 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2159,7 +2159,6 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 	spin_lock(&kvm->mmu_lock);
 	kvm_mmu_slot_remove_write_access(kvm, log->slot);
 	spin_unlock(&kvm->mmu_lock);
-	kvm_flush_remote_tlbs(kvm);
 	memslot = &kvm->memslots[log->slot];
 	n = ALIGN(memslot->npages, BITS_PER_LONG) / 8;
 	memset(memslot->dirty_bitmap, 0, n);
@@ -4907,7 +4906,6 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
 	kvm_mmu_slot_remove_write_access(kvm, mem->slot);
 	spin_unlock(&kvm->mmu_lock);
-	kvm_flush_remote_tlbs(kvm);
 	return 0;
 }
KVM: SVM: remove needless mmap_sem acquision from nested_svm_map
nested_svm_map unnecessarily takes mmap_sem around gfn_to_page, since
gfn_to_page / get_user_pages are responsible for it.

Signed-off-by: Marcelo Tosatti

Index: kvm/arch/x86/kvm/svm.c
===
--- kvm.orig/arch/x86/kvm/svm.c
+++ kvm/arch/x86/kvm/svm.c
@@ -1394,10 +1394,7 @@ static void *nested_svm_map(struct vcpu_
 {
 	struct page *page;

-	down_read(&current->mm->mmap_sem);
 	page = gfn_to_page(svm->vcpu.kvm, gpa >> PAGE_SHIFT);
-	up_read(&current->mm->mmap_sem);
-
 	if (is_error_page(page))
 		goto error;
Live migration between Intel Q6600 and AMD Phenom II
I've read that it's possible to live migrate KVM guests between Intel and
AMD CPUs; is it also possible to migrate from a CPU without NPT/EPT to the
Phenom II that supports NPT? Will I lose out on any of the benefits NPT
allows without shutting down and restarting the guest?

Also, any thoughts on how much more performant a 3.0GHz Phenom II will be
for running KVM guests than the 2.4GHz Intel Q6600?

Thanks in advance.
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On Mon, Sep 07, 2009 at 01:15:37PM +0300, Michael S. Tsirkin wrote:
> On Thu, Sep 03, 2009 at 11:39:45AM -0700, Ira W. Snyder wrote:
> > On Thu, Aug 27, 2009 at 07:07:50PM +0300, Michael S. Tsirkin wrote:
> > > What it is: vhost net is a character device that can be used to reduce
> > > the number of system calls involved in virtio networking.
> > > Existing virtio net code is used in the guest without modification.
> > >
> > > There's similarity with vringfd, with some differences and reduced scope
> > > - uses eventfd for signalling
> > > - structures can be moved around in memory at any time (good for migration)
> > > - support memory table and not just an offset (needed for kvm)
> > >
> > > common virtio related code has been put in a separate file vhost.c and
> > > can be made into a separate module if/when more backends appear. I used
> > > Rusty's lguest.c as the source for developing this part : this supplied
> > > me with witty comments I wouldn't be able to write myself.
> > >
> > > What it is not: vhost net is not a bus, and not a generic new system
> > > call. No assumptions are made on how guest performs hypercalls.
> > > Userspace hypervisors are supported as well as kvm.
> > >
> > > How it works: Basically, we connect virtio frontend (configured by
> > > userspace) to a backend. The backend could be a network device, or a
> > > tun-like device. In this version I only support raw socket as a backend,
> > > which can be bound to e.g. SR IOV, or to macvlan device. Backend is
> > > also configured by userspace, including vlan/mac etc.
> > >
> > > Status:
> > > This works for me, and I haven't seen any crashes.
> > > I have done some light benchmarking (with v4), compared to userspace, I
> > > see improved latency (as I save up to 4 system calls per packet) but not
> > > bandwidth/CPU (as TSO and interrupt mitigation are not supported). For
> > > ping benchmark (where there's no TSO) throughput is also improved.
> > > Features that I plan to look at in the future:
> > > - tap support
> > > - TSO
> > > - interrupt mitigation
> > > - zero copy
> >
> > Hello Michael,
> >
> > I've started looking at vhost with the intention of using it over PCI to
> > connect physical machines together.
> >
> > The part that I am struggling with the most is figuring out which parts
> > of the rings are in the host's memory, and which parts are in the
> > guest's memory.
>
> All rings are in guest's memory, to match existing virtio code.

Ok, this makes sense.

> vhost assumes that the memory space of the hypervisor userspace process
> covers the whole of guest memory.

Is this necessary? Why? The assumption seems very wrong when you're
doing data transport between two physical systems via PCI. I know vhost
has not been designed for this specific situation, but it is good to be
looking toward other possible uses.

> And there's a translation table.
> Ring addresses are userspace addresses, they do not undergo translation.
>
> > If I understand everything correctly, the rings are all userspace
> > addresses, which means that they can be moved around in physical memory,
> > and get pushed out to swap.
>
> Unless they are locked, yes.
>
> > AFAIK, this is impossible to handle when
> > connecting two physical systems, you'd need the rings available in IO
> > memory (PCI memory), so you can ioreadXX() them instead. To the best of
> > my knowledge, I shouldn't be using copy_to_user() on an __iomem address.
> > Also, having them migrate around in memory would be a bad thing.
> >
> > Also, I'm having trouble figuring out how the packet contents are
> > actually copied from one system to the other. Could you point this out
> > for me?
>
> The code in net/packet/af_packet.c does it when vhost calls sendmsg.

Ok. The sendmsg() implementation uses memcpy_fromiovec(). Is it possible
to make this use a DMA engine instead? I know this was suggested in an
earlier thread.
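The memcpy_fromiovec() step discussed above gathers a scattered list of buffer segments into one contiguous buffer, consuming partial segments in order. A Python simulation of those semantics (an illustration, not the kernel API):

```python
def gather_iovec(iov, length):
    """Collect `length` bytes from an ordered list of byte segments,
    consuming partial segments the way memcpy_fromiovec does."""
    out = bytearray()
    for seg in iov:
        if length == 0:
            break
        take = min(len(seg), length)  # may consume only part of a segment
        out += seg[:take]
        length -= take
    if length:
        raise ValueError("iovec too short for requested length")
    return bytes(out)
```

A DMA engine replacement, as Ira suggests, would have to preserve exactly this segment-walking behavior while sourcing each segment with a hardware copy instead of a CPU memcpy.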
> > Is there somewhere I can find the userspace code (kvm, qemu, lguest,
> > etc.) code needed for interacting with the vhost misc device so I can
> > get a better idea of how userspace is supposed to work?
>
> Look in archives for k...@vger.kernel.org. the subject is qemu-kvm: vhost net.
>
> > (Features negotiation, etc.)
>
> That's not yet implemented as there are no features yet. I'm working on
> tap support, which will add a feature bit. Overall, qemu does an ioctl
> to query supported features, and then acks them with another ioctl. I'm
> also trying to avoid duplicating functionality available elsewhere. So
> that to check e.g. TSO support, you'd just look at the underlying
> hardware device you are binding to.

Ok. Do you have plans to support the VIRTIO_NET_F_MRG_RXBUF feature in
the future? I found that this made an enormous improvement in throughput
on my virtio-net <-> virtio-net system. Perhaps it isn't needed with
vhost-net.

Thanks for replying,
Ira
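The query/ack ioctl pair Michael describes reduces to bitmask negotiation: the host reports a feature set, the guest acks a subset, and acking an unoffered bit is an error. A sketch (the bit numbers are taken from the virtio/vhost headers of that era but should be treated as illustrative here):

```python
VHOST_F_LOG_ALL = 1 << 26         # illustrative vhost feature bit
VIRTIO_NET_F_MRG_RXBUF = 1 << 15  # mergeable receive buffers

HOST_FEATURES = VHOST_F_LOG_ALL | VIRTIO_NET_F_MRG_RXBUF

def query_features():
    """What a 'get features' ioctl would report."""
    return HOST_FEATURES

def ack_features(requested):
    """What a 'set features' ioctl would accept: reject acks of bits the
    host never offered, otherwise the agreed set is the intersection."""
    if requested & ~HOST_FEATURES:
        raise ValueError("guest acked a feature the host never offered")
    return requested & HOST_FEATURES
```

This is why "just look at the underlying hardware device" works for TSO: the backend's capabilities simply determine which bits appear in the host's offered mask.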
Re: kvm ptrace 32bit DoS bug - bisected
Marcelo Tosatti wrote:
> On Sun, Sep 06, 2009 at 02:50:00PM +0700, Antoine Martin wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA512
>>
>> [snip]
>>>> Is this an AMD host?
>>> Nope, Intel Core2, more host info :
>> I have put all the relevant binaries and their config files here:
>> http://uml.devloop.org.uk/kvmbug/
>> Host kernel, qemu binary, kvm guest kernel and the UML binary I have
>> used for bisecting.
>
> Antoine,
>
> Works for me with master branch. Its likely this commit fixed it:
>
> commit 76d4622776d007de3f90f311591babc5f6ba6f39
> Author: Avi Kivity
> Date:   Tue Sep 1 12:03:25 2009 +0300
>
>     KVM: VMX: Check cpl before emulating debug register access
>
>     Debug registers may only be accessed from cpl 0. Unfortunately, vmx will
>     code to emulate the instruction even though it was issued from guest
>     userspace, possibly leading to an unexpected trap later.
>
> It will be included in 2.6.30 / 2.6.27 stable (.29 is not maintained
> anymore).

Easy to check: Does the UML image still contain mov-to-db instructions?
If not, this commit cannot make the difference.

Jan
Re: kvm ptrace 32bit DoS bug - bisected
On Sun, Sep 06, 2009 at 02:50:00PM +0700, Antoine Martin wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> [snip]
> >> Is this an AMD host?
> > Nope, Intel Core2, more host info :
> I have put all the relevant binaries and their config files here:
> http://uml.devloop.org.uk/kvmbug/
> Host kernel, qemu binary, kvm guest kernel and the UML binary I have
> used for bisecting.

Antoine,

Works for me with master branch. It's likely this commit fixed it:

commit 76d4622776d007de3f90f311591babc5f6ba6f39
Author: Avi Kivity
Date:   Tue Sep 1 12:03:25 2009 +0300

    KVM: VMX: Check cpl before emulating debug register access

    Debug registers may only be accessed from cpl 0. Unfortunately, vmx will
    code to emulate the instruction even though it was issued from guest
    userspace, possibly leading to an unexpected trap later.

It will be included in 2.6.30 / 2.6.27 stable (.29 is not maintained
anymore).
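The substance of that commit is a one-line privilege test before emulation: mov-to/from-DR issued from guest userspace must raise #GP instead of being emulated. As a toy model of the decision:

```python
GP_FAULT = "inject #GP"
EMULATE = "emulate DR access"

def handle_dr_access(cpl):
    """Debug registers are CPL-0 only: refuse to emulate a debug-register
    access issued from guest userspace, injecting #GP instead. This is
    the check whose absence allowed the ptrace DoS reported above."""
    return EMULATE if cpl == 0 else GP_FAULT
```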
[PATCH] Adding a userspace application crash handling system to autotest
This patch adds a system that watches for user space segmentation faults, writes core dumps, and produces a basic core dump analysis report. We believe that such a system will be beneficial for autotest as a whole, since the ability to get core dumps and dump analysis for each app crashing during an autotest execution can give test engineers richer debugging information.

The system comprises two parts:

* Modifications to the test code that enable core dump generation, register a core handler script with the kernel, and check for generated core files at the end of each test.
* A core handler script that writes the core to each test debug dir in a convenient way, together with a report that currently consists of the process that died and a gdb stacktrace of that process.

As the system gets in shape, we could add more scripts that do fancier stuff (such as handlers that use frysk to get more info such as memory maps, provided that we have frysk installed on the machine).

This is the proof of concept of the system. I am sending it to the mailing list at this early stage so I can get feedback on the feature. The system passes my basic tests:

* Run a simple long test, such as the kvm test, and then crash an application while the test is running. I get reports generated on test.debugdir.
* Run a slightly more complex control file, with 3 parallel bonnie instances at once, and crash an application while the test is running. I get reports generated on all test.debugdirs.

3rd try:

* Explicitly enable core dumps using the resource module.
* Fixed a bug in the crash detection code, and factored it into a utility function.

I believe we are good to go now.
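The two kernel-facing pieces of this setup can be exercised directly. Only the rlimit part below actually runs unprivileged; writing /proc/sys/kernel/core_pattern needs root, so the pipe-handler registration is shown as data only (the handler path is a placeholder):

```python
import resource

def enable_core_dumps():
    """Raise the soft RLIMIT_CORE to the hard limit so crashing
    processes actually produce cores, and return the new limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)

# Registering a pipe handler: the kernel runs the program named after '|'
# with the core image on its stdin; %p/%t/%u/%s/%h/%e expand to pid,
# time, uid, signal number, hostname and executable name.
CORE_PATTERN_FILE = '/proc/sys/kernel/core_pattern'
PIPE_HANDLER = '|/path/to/crash_handler.py %p %t %u %s %h %e'

soft, hard = enable_core_dumps()
```

One design consequence of the pipe form: the handler runs as root with the core on stdin, so it must itself figure out which test debug dirs to copy into, which is why the patch drops per-test marker files under /tmp.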
Signed-off-by: Lucas Meneghel Rodrigues --- client/common_lib/test.py | 66 +- client/tools/crash_handler.py | 202 + 2 files changed, 266 insertions(+), 2 deletions(-) create mode 100755 client/tools/crash_handler.py diff --git a/client/common_lib/test.py b/client/common_lib/test.py index 362c960..65b78a3 100644 --- a/client/common_lib/test.py +++ b/client/common_lib/test.py @@ -17,7 +17,7 @@ # tmpdir eg. tmp/_ import fcntl, os, re, sys, shutil, tarfile, tempfile, time, traceback -import warnings, logging +import warnings, logging, glob, resource from autotest_lib.client.common_lib import error from autotest_lib.client.bin import utils @@ -31,7 +31,6 @@ class base_test: self.job = job self.pkgmgr = job.pkgmgr self.autodir = job.autodir - self.outputdir = outputdir self.tagged_testname = os.path.basename(self.outputdir) self.resultsdir = os.path.join(self.outputdir, 'results') @@ -40,6 +39,7 @@ class base_test: os.mkdir(self.profdir) self.debugdir = os.path.join(self.outputdir, 'debug') os.mkdir(self.debugdir) +self.configure_crash_handler() self.bindir = bindir if hasattr(job, 'libdir'): self.libdir = job.libdir @@ -54,6 +54,66 @@ class base_test: self.after_iteration_hooks = [] +def configure_crash_handler(self): +""" +Configure the crash handler by: + * Setting up core size to unlimited + * Putting an appropriate crash handler on /proc/sys/kernel/core_pattern + * Create files that the crash handler will use to figure which tests + are active at a given moment + +The crash handler will pick up the core file and write it to +self.debugdir, and perform analysis on it to generate a report. The +program also outputs some results to syslog. + +If multiple tests are running, an attempt to verify if we still have +the old PID on the system process table to determine whether it is a +parent of the current test execution. If we can't determine it, the +core file and the report file will be copied to all test debug dirs. 
+""" +self.pattern_file = '/proc/sys/kernel/core_pattern' +try: +# Enable core dumps +resource.setrlimit(resource.RLIMIT_CORE, (-1, -1)) +# Trying to backup core pattern and register our script +self.core_pattern_backup = open(self.pattern_file, 'r').read() +pattern_file = open(self.pattern_file, 'w') +tools_dir = os.path.join(self.autodir, 'tools') +crash_handler_path = os.path.join(tools_dir, 'crash_handler.py') +pattern_file.write('|' + crash_handler_path + ' %p %t %u %s %h %e') +# Writing the files that the crash handler is going to use +self.debugdir_tmp_file = ('/tmp/autotest_results_dir.%s' % + os.getpid()) +utils.open_write_close(self.debugdir_tmp_file, self.debugdir + "\n") +except Exception, e: +self.crash_handling_enabled = False +logging.error('Crash handling system disabled: %s' % e)
Re: [PATCH] QEMU-KVM: MCE: Relay UCR MCE to guest
Huang Ying wrote:
> I find there is already a function named qemu_ram_addr_from_host which
> translate from user space virtual address into qemu RAM address. But I
> need function to return a error code instead of abort in case of no RAM
> address corresponding specified user space virtual address. So I plan to
> use following code to deal with that.
>
> int do_qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
> ram_addr_t qemu_ram_addr_from_host(void *ptr);
>
> Does this follow the coding style of qemu?

I don't like the do_ prefix much but I don't have a better suggestion.

>> If the instruction gets skipped, we may be leaking host memory because
>> the access never happened.
>
> There are two kinds of recoverable MCE named SRAO (Software Recoverable
> Action Optional) and SRAR (Software Recoverable Action Required). For
> your example, it is a SRAR error. Where kernel will munmap the error
> page and send SIGBUS to qemu via force_sig_info, which will unblock
> SIGBUS and reset its action to SIG_DFL, so qemu will be terminated. If
> the guest mode is interrupted, because signal mask processing of KVM
> kernel part, SIGBUS can be captured by qemu.

Ah, I didn't realize this path just worked.

--
Regards,

Anthony Liguori
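Huang's proposed split — an inner do_ variant that returns an error code, with the existing qemu_ram_addr_from_host keeping its abort-on-miss behavior layered on top — can be modeled like this (the address map is hypothetical, purely for illustration):

```python
# hypothetical host-virtual -> guest-RAM-offset map:
#   host base address -> (ram_addr of block start, block size)
RAM_BLOCKS = {0x7f0000000000: (0x0, 512 << 20)}

def do_ram_addr_from_host(ptr):
    """Inner variant: return (0, ram_addr) on success, or (-1, None)
    when ptr does not fall inside any guest RAM block."""
    for host_base, (ram_addr, size) in RAM_BLOCKS.items():
        if host_base <= ptr < host_base + size:
            return 0, ram_addr + (ptr - host_base)
    return -1, None

def ram_addr_from_host(ptr):
    """Abort-on-failure wrapper, mirroring the existing qemu function."""
    err, ram_addr = do_ram_addr_from_host(ptr)
    if err:
        raise SystemExit("bad ram pointer")  # qemu would abort() here
    return ram_addr
```

The point of the split is exactly the SIGBUS case above: the MCE handler wants the error-returning variant so a fault in qemu's own (non-guest) memory can be reported and handled instead of aborting unconditionally.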
[PATCH 3/3] KVM test: Renaming kvm_hugepages variant to hugepages
Signed-off-by: Lucas Meneghel Rodrigues
---
 client/tests/kvm/kvm_tests.cfg.sample |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index a83ef9b..fdf2963 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -620,8 +620,8 @@ variants:
 variants:
-    - @kvm_smallpages:
-    - kvm_hugepages:
+    - @smallpages:
+    - hugepages:
         pre_command = "/usr/bin/python scripts/hugepage.py /mnt/kvm_hugepage"
         extra_params += " -mem-path /mnt/kvm_hugepage"
@@ -638,7 +638,7 @@
         only Fedora.8.32
         only install setup boot shutdown
         only rtl8139
-        only kvm_hugepages
+        only hugepages
     - @sample1:
         only qcow2
         only ide
--
1.6.2.5
rtl8139 and qemu-kvm-0.11.0-rc2: NFS not responding
Hello,

with the current qemu-kvm release candidate our diskless linux guests
cannot use their NFS root filesystem anymore.

/usr/local/bin/qemu-system-x86_64 -m 4096 -smp 1 -boot n -net nic,macaddr=00:50:56:24:0b:57,model=rtl8139 -net tap,ifname=vm01,script=no,downscript=no

This boots via the pxe-rtl8139.bin Boot ROM and starts a locally-developed
diskless boot environment. (Unfortunately I'm not able to describe this in
detail, and I didn't manage to reproduce this in an easier environment.)
When this boot environment tries to mount the new root filesystem via NFS,
these messages appear, waiting several seconds between each line:

nfs: server 172.31.11.10 not responding, still trying
nfs: server 172.31.11.10 OK

This continues until I kill the qemu process. During this problem the
guest's IP address can be pinged.

The problem disappears with each of these:
- qemu-kvm-0.10.6
- model=virtio or model=e1000

These changes didn't help:
- -no-kvm-irqchip
- -no-kvm-pit
- -no-kvm (qemu-system-x86_64 instantly coredumps)
- qemu-kvm-0.11.0-rc1

My environment:
- Host
  - Linux 2.6.31-rc9 x86_64
  - kvm kernel components from this kernel
  - Dual-Socket AMD Opteron 2347 HE
- Guest
  - Linux 2.6.25.9 i686

I tried to watch this with tcpdump.
Before the line "nfs: server 172.31.11.10 OK" it looks like this:

13:45:28.665935 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.665940 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666162 IP 172.31.11.10.948098278 > 172.31.10.11.2049: 112 read [|nfs]
13:45:28.666259 IP 172.31.11.10.964875494 > 172.31.10.11.2049: 112 read [|nfs]
13:45:28.666345 IP 172.31.10.11.2049 > 172.31.11.10.948098278: reply ok 1472 read
13:45:28.666403 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666408 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666412 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666416 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666421 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666464 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666469 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666476 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666482 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666487 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666526 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666532 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666538 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666543 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666549 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666587 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666594 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666599 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.05 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.13 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.22 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.28 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666929 IP 172.31.10.11.2049 > 172.31.11.10.964875494: reply ok 1472 read
13:45:28.666935 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666940 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666944 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666949 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666991 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.666996 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667002 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667008 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667014 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667052 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667058 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667064 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667069 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667075 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667114 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667121 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667128 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667134 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667140 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667160 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667168 IP 172.31.10.11 > 172.31.11.10: udp
13:45:28.667174 IP 172.31.10.11 > 172.31.11.10: udp

and after this line this appears:

13:45:41.261362 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time exceeded, length 556
13:45:41.682275 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time exceeded, length 556
13:45:42.091219 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time exceeded, length 556
13:45:42.942096 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time exceeded, length 556
13:45:46.260603 arp who-has 172.31.10.11 tell 172.31.11.10
13:45:46.260756 arp reply 172.31.10.11 is-at 00:c0:9f:ca:9a:78
13:45:56.507139 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time exceeded, length 556
13:45:57.207030 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time exceeded, length 556
13:45:58.688814 IP 172.31.11.10 > 172.31.10.11: ICMP ip reassembly time exceeded, length 556

Thats all inf
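The "ip reassembly time exceeded" messages fit the trace: each NFS-over-UDP read reply is one large datagram fragmented at the MTU, and losing any single fragment discards the whole reply. Rough arithmetic for the fragment count, assuming a 1500-byte MTU, a 20-byte IP header, and a 32 KiB rsize (the rsize is an assumption, not stated in the report):

```python
MTU = 1500
IP_HDR = 20
FRAG_PAYLOAD = MTU - IP_HDR  # 1480 bytes of IP payload per fragment

def fragments_for(udp_payload):
    """Number of IP fragments carrying a UDP datagram of this payload
    size (the 8-byte UDP header rides in the first fragment)."""
    total = udp_payload + 8
    return -(-total // FRAG_PAYLOAD)  # ceiling division
```

So a 32 KiB reply arrives as 23 back-to-back fragments — consistent with the long runs of unresolved "udp" lines above — and a single drop (e.g. the rtl8139 model's receive path falling behind) stalls the mount until the retransmit.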
Re: [qemu-kvm][PATCH] Add "sda" alias options to "hda" options
> -hda is deprecated in favor of -drive, please use -drive instead.

I see, it's better.

--
Tsuyoshi Ozawa
Re: [PATCH] QEMU-KVM: MCE: Relay UCR MCE to guest
On Mon, Sep 07, 2009 at 03:48:07PM -0500, Anthony Liguori wrote:
>>
>>  int kvm_set_irq_level(kvm_context_t kvm, int irq, int level, int *status)
>> @@ -1515,6 +1546,38 @@ static void sig_ipi_handler(int n)
>>  {
>>  }
>>
>> +static void sigbus_handler(int n, struct signalfd_siginfo *siginfo, void *ctx)
>> +{
>> +    if (siginfo->ssi_code == BUS_MCEERR_AO) {
>> +        uint64_t status;
>> +        unsigned long paddr;
>> +        CPUState *cenv;
>> +
>> +        /* Hope we are lucky for AO MCE */
>>
>
> Even if the error was limited to guest memory, it could have been
> generated by either the kernel or userspace reading guest memory, no?

Only user space reads or asynchronously detected errors (e.g. patrol
scrubbing) are reported this way. Kernel reading corrupted memory always
leads to panic currently.

> Does this potentially open a security hole for us? Consider the following:
>
> 1) We happen to read guest memory and that causes an MCE. For instance,
> say we're in virtio.c and we read the virtio ring.
> 2) That should trigger the kernel to generate a sigbus.
> 3) We catch sigbus, and queue an MCE for delivery.
> 4) After sigbus handler completes, we're back in virtio.c, what was the
> value of the memory operation we just completed?

Yes for any errors on accessing qemu internal memory that is not owned
by the guest image you should abort. I thought Ying's patch did that
already though, by aborting if there's no slot match.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
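The UCR status word the patch injects is built from MCi_STATUS flag bits; spelled out with the standard x86 MCA bit positions (the 0xC0 error code is taken verbatim from the patch; a note on what makes the error "recoverable" is in the final assertion):

```python
MCI_STATUS_VAL   = 1 << 63  # valid
MCI_STATUS_UC    = 1 << 61  # uncorrected error
MCI_STATUS_EN    = 1 << 60  # error reporting enabled
MCI_STATUS_MISCV = 1 << 59  # MISC register valid
MCI_STATUS_ADDRV = 1 << 58  # ADDR register valid
MCI_STATUS_PCC   = 1 << 57  # processor context corrupt
MCI_STATUS_S     = 1 << 56  # signaled
MCACOD           = 0xC0     # model-specific error code used by the patch

ucr_status = (MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN
              | MCI_STATUS_MISCV | MCI_STATUS_ADDRV | MCI_STATUS_S
              | MCACOD)

# "Recoverable" (AO) means uncorrected data was found, but the
# processor context is intact: UC set, PCC deliberately left clear.
assert ucr_status & MCI_STATUS_PCC == 0
```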
Re: [qemu-kvm][PATCH] Add "sda" alias options to "hda" options
On 09/07/2009 05:00 PM, Ozawa Tsuyoshi wrote:
> qemu-kvm: Add "sda" alias options to "hda" options
>
> I know that the name "hda" come from IDE drive, but I felt strange when
> I use qemu to boot linux kernel directly as follows:
>
> $ qemu-system-x86 -kernel vmlinux-2.6.28.15 -initrd initrd.img-2.6.28.15 -hda vdisk.img
>
> By applying this patch, the command will change to:
>
> $ qemu-system-x86 -kernel vmlinux-2.6.28.15 -initrd initrd.img-2.6.28.15 -sda vdisk.img
>
> The latter one seems to be more intuitive for me.

-hda is deprecated in favor of -drive, please use -drive instead.

--
error compiling committee.c: too many arguments to function