Unable to get MenuetOS booting
Hi,

I'm not able to get MenuetOS booting while using the kvm modules. QEMU hangs at floppy boot. Here is the procedure:

    cd /tmp
    wget -c 'http://www.menuetos.be/download.php?CurrentMenuetOS' -O menuetos.zip
    unzip -u menuetos.zip
    qemu-kvm -m 512 -fda M64-*.IMG -boot a

I have an Intel i7 920 and use kvm-88, kernel 2.6.31.4 (x86_64), and the 64-bit release of MenuetOS. Neither -no-kvm-irqchip nor -no-kvm-pit solves the issue, but -no-kvm does.

--
ubitux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Added VM Exit on RDTSC, trouble handling in userspace
Hi all,

In short, I need to trap RDTSC with a VM exit. That part works, but I'm having trouble handling the exit in userspace. I have added the hooks I need (I only care about VMX right now), but a piece of the puzzle is missing and I don't know which one. When I go back to userspace, the guest follows a different (faulty) execution path compared to handling the exit only in the kernel.

Here's what I've done:

1. Added the CPU_BASED_RDTSC_EXITING flag to MSR_IA32_VMX_PROCBASED_CTLS in vmx.c:setup_vmcs_config().

2. Defined KVM_EXIT_RDTSC, and hooked my handler for the exit into EXIT_REASON_RDTSC:

    static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu,
                                          struct kvm_run *kvm_run) = {
        // ...
        [EXIT_REASON_RDTSC] = handle_rdtsc,
        // ...
    };

    static int handle_rdtsc(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
    {
        u64 data;

        if (vmx_get_msr(vcpu, MSR_IA32_TIME_STAMP_COUNTER, &data)) {
            kvm_inject_gp(vcpu, 0);
            return 1;
        }

        vcpu->run->exit_reason = KVM_EXIT_RDTSC;
        vcpu->arch.regs[VCPU_REGS_RAX] = data & -1u;
        vcpu->arch.regs[VCPU_REGS_RDX] = (data >> 32) & -1u;

        skip_emulated_instruction(vcpu);

        // flag a need for userspace intervention
        // note: this works when we return 1 and we don't involve userspace
        return 0;
    }

3. Handled KVM_EXIT_RDTSC in libkvm.c:kvm_run():

    case KVM_EXIT_RDTSC:
        r = handle_rdtsc_usp(kvm, vcpu, env);
        break;

   via a handler where I do _nothing_:

    static int handle_rdtsc_usp(kvm_context_t kvm, int vcpu, void *data)
    {
        return 0;
    }

All well and good, right? I can add print statements to my userspace handle_rdtsc_usp() and see that I get in there just fine. However, when I try to boot Linux, the following code is called over and over and over, and Linux never loads:

    Breakpoint 4, 0xc01103d3 in ?? ()
    (gdb) x/10i $rip-10
    0xc01103c9: lea    0x0(%rdi,%riz,1),%edi
    0xc01103d0: push   %rbp
    0xc01103d1: mov    %esp,%ebp
    0xc01103d3: rdtsc
    0xc01103d5: pop    %rbp
    0xc01103d6: retq

If I only handle the exit in the kernel (by returning 1 from handle_rdtsc()), everything works and Linux loads! I counted the number of RDTSC exits before Linux fully loads to be somewhere around 20. If I exit all the way to userspace (return 0 in my handle_rdtsc()), that count is far exceeded, in number of exits, wall time, and the value of RDTSC.

So is anything glaringly wrong with my modifications? Maybe there is some extra state that needs to be restored on VM entry? Is there an interrupt flag that needs to be cleared? Maybe I need to do something with kvm_run.if_flag or kvm_run.ready_for_interrupt_injection?

Please, I need help, I'm losing sleep over this!

Thanks,
Kurt
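The register split done in handle_rdtsc() can be checked in isolation. What follows is only a minimal sketch in Python, assuming nothing beyond the fact that RDTSC reports the low 32 bits of the counter in EAX and the high 32 bits in EDX. The names split_tsc and join_tsc are hypothetical helpers for illustration, not KVM functions, and the sketch says nothing about why the userspace round trip misbehaves.

```python
MASK32 = 0xFFFFFFFF  # same value as the C expression -1u for a 32-bit unsigned int


def split_tsc(tsc):
    """Split a 64-bit counter value into (eax, edx), the way RDTSC reports it."""
    eax = tsc & MASK32
    edx = (tsc >> 32) & MASK32
    return eax, edx


def join_tsc(eax, edx):
    """Rebuild the 64-bit value a guest would compute from EDX:EAX."""
    return (edx << 32) | eax


if __name__ == "__main__":
    eax, edx = split_tsc(0x12345678ABC)
    print(hex(eax), hex(edx))
```

Round-tripping a value through split_tsc() and join_tsc() is a quick way to confirm that the masks and shifts written into RAX and RDX lose no bits.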
[PATCH][RFC] Xen PV-on-HVM guest support
As we discussed a while back, support for Xen PV-on-HVM guests can be implemented almost entirely in userspace, except for handling one annoying MSR that maps a Xen hypercall blob into guest address space. A generic mechanism to delegate MSR writes to userspace seems overkill and risks encouraging similar MSR abuse in the future. Thus this patch adds special support for the Xen HVM MSR. At Avi's suggestion[1] I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell KVM which MSR the guest will write to, as well as the starting address and size of the hypercall blobs (one each for 32-bit and 64-bit) that userspace has loaded from files. When the guest writes to the MSR, KVM copies one page of the blob from userspace to the guest. I've tested this patch against a hacked-up version of Gerd's userspace code[2]; I'm happy to share those hacks if anyone is interested. [1] http://www.mail-archive.com/kvm@vger.kernel.org/msg16065.html [2] http://git.et.redhat.com/?p=qemu-kraxel.git;a=log;h=refs/heads/xenner.v5 Signed-off-by: Ed Swierk --- diff -BurN a/include/asm-x86/kvm.h b/include/asm-x86/kvm.h --- a/include/asm-x86/kvm.h 2009-10-13 20:40:55.0 -0700 +++ b/include/asm-x86/kvm.h 2009-10-13 20:21:07.0 -0700 @@ -59,6 +59,7 @@ #define __KVM_HAVE_MSIX #define __KVM_HAVE_MCE #define __KVM_HAVE_PIT_STATE2 +#define __KVM_HAVE_XEN_HVM /* Architectural interrupt line count. 
*/ #define KVM_NR_INTERRUPTS 256 diff -BurN a/include/linux/kvm.h b/include/linux/kvm.h --- a/include/linux/kvm.h 2009-10-13 20:40:55.0 -0700 +++ b/include/linux/kvm.h 2009-10-13 20:21:26.0 -0700 @@ -476,6 +476,9 @@ #endif #define KVM_CAP_IOEVENTFD 36 #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37 +#ifdef __KVM_HAVE_XEN_HVM +#define KVM_CAP_XEN_HVM 90 +#endif #ifdef KVM_CAP_IRQ_ROUTING @@ -528,6 +531,14 @@ }; #endif +#ifdef KVM_CAP_XEN_HVM +struct kvm_xen_hvm_config { + __u32 msr; + __u64 blob_addr[2]; + __u8 blob_size[2]; +}; +#endif + #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) struct kvm_irqfd { @@ -586,6 +597,7 @@ #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) #define KVM_SET_BOOT_CPU_ID_IO(KVMIO, 0x78) #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd) +#define KVM_XEN_HVM_CONFIG_IOW(KVMIO, 0xa1, struct kvm_xen_hvm_config) /* * ioctls for vcpu fds diff -BurN a/include/linux/kvm_host.h b/include/linux/kvm_host.h --- a/include/linux/kvm_host.h 2009-10-13 20:40:55.0 -0700 +++ b/include/linux/kvm_host.h 2009-10-13 20:27:03.0 -0700 @@ -236,6 +236,10 @@ unsigned long mmu_notifier_seq; long mmu_notifier_count; #endif + +#ifdef KVM_CAP_XEN_HVM + struct kvm_xen_hvm_config xen_hvm_config; +#endif }; /* The guest did something we don't support. 
*/ diff -BurN a/x86/x86.c b/x86/x86.c --- a/x86/x86.c 2009-10-13 20:40:58.0 -0700 +++ b/x86/x86.c 2009-10-13 20:33:49.0 -0700 @@ -875,6 +875,33 @@ return 0; } +#ifdef KVM_CAP_XEN_HVM +static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data) +{ + int blob = !!(vcpu->arch.shadow_efer & EFER_LME); + u32 pnum = data & ~PAGE_MASK; + u64 paddr = data & PAGE_MASK; + u8 *page; + int r = 1; + printk(KERN_INFO "kvm: loading xen hvm blob %d page %d at %llx\n", + blob, pnum, paddr); + if (pnum >= vcpu->kvm->xen_hvm_config.blob_size[blob]) + goto out; + page = kzalloc(PAGE_SIZE, GFP_KERNEL); + if (!page) + goto out; + if (copy_from_user(page, (u8 *)vcpu->kvm->xen_hvm_config.blob_addr[blob] + + pnum * PAGE_SIZE, PAGE_SIZE)) + goto out_free; + kvm_write_guest(vcpu->kvm, paddr, page, PAGE_SIZE); + r = 0; +out_free: + kfree(page); +out: + return r; +} +#endif + int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) { switch (msr) { @@ -990,6 +1017,10 @@ "0x%x data 0x%llx\n", msr, data); break; default: +#ifdef KVM_CAP_XEN_HVM + if (msr && (msr == vcpu->kvm->xen_hvm_config.msr)) + return xen_hvm_config(vcpu, data); +#endif if (!ignore_msrs) { pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data); @@ -2453,6 +2484,17 @@ r = 0; break; } +#ifdef KVM_CAP_XEN_HVM + case KVM_XEN_HVM_CONFIG: { + r = -EFAULT; + printk(KERN_INFO "kvm: configuring xen hvm\n"); + if (copy_from_user(&kvm->xen_hvm_config, argp, + sizeof(struct kvm_xen_hvm_config))) + goto out; + r = 0; + break; + } +#endif default: ; } -- To unsubscribe from this list: se
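The way xen_hvm_config() decodes the value written to the MSR can be sketched outside the kernel. This is a rough Python illustration, assuming 4 KiB pages and the Linux convention that PAGE_MASK keeps the page-aligned bits; decode_xen_hvm_msr and blob_page_offset are hypothetical names, not part of the patch.

```python
PAGE_SIZE = 4096                  # assumption: 4 KiB x86 pages
PAGE_MASK = ~(PAGE_SIZE - 1)      # Linux convention: keeps the page-aligned bits


def decode_xen_hvm_msr(data):
    """Return (page_number, guest_physical_address) from the 64-bit MSR value."""
    data &= 0xFFFFFFFFFFFFFFFF
    pnum = data & ~PAGE_MASK      # low 12 bits: which page of the blob to copy
    paddr = data & PAGE_MASK      # remaining bits: page-aligned destination
    return pnum, paddr


def blob_page_offset(pnum, blob_size_pages):
    """Byte offset of the requested page in the blob, or None if out of range."""
    if pnum >= blob_size_pages:
        return None
    return pnum * PAGE_SIZE


if __name__ == "__main__":
    pnum, paddr = decode_xen_hvm_msr(0x12345003)
    print(pnum, hex(paddr), blob_page_offset(pnum, 4))
```

The range check mirrors the comparison against blob_size[blob] in the patch: a guest asking for a page past the end of the blob gets an error instead of a copy.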
Re: sync guest calls made async on host - SQLite performance
On Tue, Oct 13, 2009 at 9:09 PM, Matthew Tippett wrote:
> I believe that I have removed the benchmark from discussion, we are now
> looking at semantics of small writes followed by
...
> And quoting from Dustin
>
> ===
> I have tried this, exactly as you have described. The tests took:
>
> * 1162.08033204 seconds on native hardware
> * 2306.68306303 seconds in a kvm using if=scsi disk
> * 405.382308006 seconds in a kvm using if=virtio

Hang on now...

My timings are from running the Phoronix test *as you described*. I have not looked at what magic is happening inside of this Phoronix test. I am most certainly *not* speaking as to the quality or legitimacy of the test.

:-Dustin
Re: [Autotest] [PATCH] Add a kvm test guest_s4 which supports both Linux and Windows platform
On Tue, Oct 13, 2009 at 05:29:40PM -0300, Lucas Meneghel Rodrigues wrote: > Hi Yolkfull and Chen: > > Thanks for your test! I have some comments and doubts to clear, most > of them are about content of the messages delivered for the user and > some other details. > > On Sun, Sep 27, 2009 at 6:11 AM, Yolkfull Chow wrote: > > For this case, Ken Cao wrote the linux part previously and I did extensive > > modifications on Windows platform support. > > > > Signed-off-by: Ken Cao > > Signed-off-by: Yolkfull Chow > > --- > > client/tests/kvm/kvm_tests.cfg.sample | 14 +++ > > client/tests/kvm/tests/guest_s4.py | 66 > > + > > 2 files changed, 80 insertions(+), 0 deletions(-) > > create mode 100644 client/tests/kvm/tests/guest_s4.py > > > > diff --git a/client/tests/kvm/kvm_tests.cfg.sample > > b/client/tests/kvm/kvm_tests.cfg.sample > > index 285a38f..f9ecb61 100644 > > --- a/client/tests/kvm/kvm_tests.cfg.sample > > +++ b/client/tests/kvm/kvm_tests.cfg.sample > > @@ -94,6 +94,14 @@ variants: > > - linux_s3: install setup > > type = linux_s3 > > > > + - guest_s4: > > + type = guest_s4 > > + check_s4_support_cmd = grep -q disk /sys/power/state > > + test_s4_cmd = "cd /tmp/;nohup tcpdump -q -t ip host localhost" > > + check_s4_cmd = pgrep tcpdump > > + set_s4_cmd = echo disk > /sys/power/state > > + kill_test_s4_cmd = pkill tcpdump > > + > > - timedrift: install setup > > type = timedrift > > extra_params += " -rtc-td-hack" > > @@ -382,6 +390,12 @@ variants: > > # Alternative host load: > > #host_load_command = "dd if=/dev/urandom of=/dev/null" > > host_load_instances = 8 > > + guest_s4: > > + check_s4_support_cmd = powercfg /hibernate on > > + test_s4_cmd = start /B ping -n 3000 localhost > > + check_s4_cmd = tasklist | find /I "ping" > > + set_s4_cmd = rundll32.exe PowrProf.dll, SetSuspendState > > + kill_test_s4_cmd = taskkill /IM ping.exe /F > > > > variants: > > - Win2000: > > diff --git a/client/tests/kvm/tests/guest_s4.py > > b/client/tests/kvm/tests/guest_s4.py > > 
new file mode 100644 > > index 000..5d8fbdf > > --- /dev/null > > +++ b/client/tests/kvm/tests/guest_s4.py > > @@ -0,0 +1,66 @@ > > +import logging, time > > +from autotest_lib.client.common_lib import error > > +import kvm_test_utils, kvm_utils > > + > > + > > +def run_guest_s4(test, params, env): > > + """ > > + Suspend guest to disk,supports both Linux & Windows OSes. > > + > > + �...@param test: kvm test object. > > + �...@param params: Dictionary with test parameters. > > + �...@param env: Dictionary with the test environment. > > + """ > > + vm = kvm_test_utils.get_living_vm(env, params.get("main_vm")) > > + session = kvm_test_utils.wait_for_login(vm) > > + > > + logging.info("Checking whether VM supports S4") > > + status = session.get_command_status(params.get("check_s4_support_cmd")) > > + if status is None: > > + logging.error("Failed to check if S4 exists") > > + elif status != 0: > > + raise error.TestFail("Guest does not support S4") > > + > > + logging.info("Waiting for a while for X to start...") > > Yes, generally X starts a bit later than the SSH service, so I > understand the time being here, however: > > * In fact we are waiting for all services of the guest to be up and > functional, so depending on the level of load, I don't think 10s is > gonna make it. So I suggest something >= 30s Yeah,reasonable, we did ignore the circumstance with workload. But as you metioned,it can depend on different level of workload, therefore 30s may be not enough as well. Your idea that write a utility function waiting for some services up is good I think, thus it could be something like: def wait_services_up(services_list): ... and for this case: wait_services_up(["Xorg"]) for Linux and wait_services_up(["explore.exe"]) for Windows. > * It's also true that just wait for a given time and hope that it > will be OK kinda sucks, so ideally we need to write utility functions > to stablish as well as possible when all services of a host are fully > booted up. 
Stated this way, it looks simple, but it's not. > > Autotest experience suggests that there's no real sane way to > determine when a linux box is booted up, but we can take a > semi-rational approach and verify if all services for the current run > level have the status up or a similar approach. For windows, I was > talking to Yaniv Kaul and it seems that processing the output of the > 'sc query' command might give what we want. Bottom line, I'd like to > add a TODO item, and write a function to stablish (fairly confidently) > that a windows/linux guest is booted up. > > > + time.sleep(10) > > + > > + # Start up a program(tcpdump for linux OS & ping for M$ OS), as a flag. >
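The wait_services_up() idea floated in this reply could take roughly the following shape. This is a hedged sketch only: the is_running callable is hypothetical and would have to be supplied by the test (for example by running pgrep or tasklist through the guest session), and the timeout and step values are placeholders, not numbers anyone in the thread agreed on.

```python
import time


def wait_services_up(services, is_running, timeout=60.0, step=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until every name in services is reported running, or time out.

    is_running is a caller-supplied callable (hypothetical here) that takes
    a service name and returns True or False.
    """
    deadline = clock() + timeout
    pending = list(services)
    while True:
        # Keep only the services that are still not up.
        pending = [name for name in pending if not is_running(name)]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(step)
```

Compared with a fixed time.sleep(10), polling returns as soon as the services are up and fails explicitly when they never appear, so the chosen timeout only has to be an upper bound.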
Re: sync guest calls made async on host - SQLite performance
> No, it's an absurd assessment.
>
> You have additional layers of caching happening because you're running a
> guest from a filesystem on the host.

Comments below.

> A benchmark running under a guest that happens to be faster than the host
> does not indicate anything. It could be that the benchmark is poorly
> written.

I believe that I have removed the benchmark from discussion, we are now looking at semantics of small writes followed by

> What operation, specifically, do you think is not behaving properly under
> kvm? ext4 (karmic's default filesystem) does not enable barriers by
> default so it's unlikely this is anything barrier related.

Re-quoting me from two replies ago.

===
I dug deeper into the actual syscalls being made by sqlite. The salient part of the behaviour is small sequential writes followed by a fdatasync (effectively a metadata-free fsync).
===

And quoting from Dustin

===
I have tried this, exactly as you have described. The tests took:

* 1162.08033204 seconds on native hardware
* 2306.68306303 seconds in a kvm using if=scsi disk
* 405.382308006 seconds in a kvm using if=virtio
===

And finally Christoph

===
Can't remember anything like that. The "bug" was the complete lack of cache flush infrastructure for virtio, and the lack of advertising a volatile write cache on ide.
===

The _Operation_ that I believe is not behaving as expected is fdatasync under virtio. I understand your position that this is not a bug, but a configuration/packaging issue.

So I'll put it to you differently. When a Linux guest issues a fsync or fdatasync, what should occur?

o If the system has been configured in writeback mode then you don't worry about getting the data to the disk, so when the hypervisor has received the data, be happy with it.

o If the system is configured in writethrough mode, shouldn't the hypervisor look to get the data to disk ASAP? Whether this is immediately, or batched with other data, I'll leave it to you guys.

As mentioned above, I am not saying it is a bug in KVM, and it may well be a poor choice of configuration options within distributions. From what I can interpret from above, scsi and writethrough is the safest model to go for. By extension, for enterprise workloads where data integrity is more critical, the default configuration of KVM under Ubuntu and possibly other distributions may be a poor choice.

Regards,

Matthew
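The access pattern under discussion (small sequential writes, each followed by fdatasync) is easy to reproduce without SQLite or the Phoronix suite. The following is a minimal sketch for a Linux guest or host, not the benchmark used in this thread; the record size, record count, and the helper name timed_small_writes are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def timed_small_writes(path, count=50, size=512, sync=True):
    """Write count records of size bytes, optionally calling fdatasync after each.

    Returns the elapsed wall time in seconds.
    """
    record = b"x" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.perf_counter()
    try:
        for _ in range(count):
            os.write(fd, record)
            if sync:
                os.fdatasync(fd)  # flush file data, skipping non-essential metadata
    finally:
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "records.bin")
        print("with fdatasync:   ", timed_small_writes(target, sync=True))
        print("without fdatasync:", timed_small_writes(target, sync=False))
```

Running it once on native hardware and once in guests using if=scsi and if=virtio would show whether the per-record sync cost differs between the setups; if the synced and unsynced runs take about the same time, that would suggest the sync is being absorbed by a cache somewhere below the guest.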
Re: [Autotest] [PATCH] Add a kvm test guest_s4 which supports both Linux and Windows platform
I mostly agree with Lucas's comments and would like to add some of my own. - "Lucas Meneghel Rodrigues" wrote: > Hi Yolkfull and Chen: > > Thanks for your test! I have some comments and doubts to clear, most > of them are about content of the messages delivered for the user and > some other details. > > On Sun, Sep 27, 2009 at 6:11 AM, Yolkfull Chow > wrote: > > For this case, Ken Cao wrote the linux part previously and I did > extensive > > modifications on Windows platform support. > > > > Signed-off-by: Ken Cao > > Signed-off-by: Yolkfull Chow > > --- > > client/tests/kvm/kvm_tests.cfg.sample | 14 +++ > > client/tests/kvm/tests/guest_s4.py | 66 > + > > 2 files changed, 80 insertions(+), 0 deletions(-) > > create mode 100644 client/tests/kvm/tests/guest_s4.py > > > > diff --git a/client/tests/kvm/kvm_tests.cfg.sample > b/client/tests/kvm/kvm_tests.cfg.sample > > index 285a38f..f9ecb61 100644 > > --- a/client/tests/kvm/kvm_tests.cfg.sample > > +++ b/client/tests/kvm/kvm_tests.cfg.sample > > @@ -94,6 +94,14 @@ variants: > > - linux_s3: install setup > > type = linux_s3 > > > > + - guest_s4: > > + type = guest_s4 > > + check_s4_support_cmd = grep -q disk /sys/power/state > > + test_s4_cmd = "cd /tmp/;nohup tcpdump -q -t ip host > localhost" > > + check_s4_cmd = pgrep tcpdump > > + set_s4_cmd = echo disk > /sys/power/state > > + kill_test_s4_cmd = pkill tcpdump > > + > > - timedrift: install setup > > type = timedrift > > extra_params += " -rtc-td-hack" > > @@ -382,6 +390,12 @@ variants: > > # Alternative host load: > > #host_load_command = "dd if=/dev/urandom of=/dev/null" > > host_load_instances = 8 > > + guest_s4: > > + check_s4_support_cmd = powercfg /hibernate on > > + test_s4_cmd = start /B ping -n 3000 localhost When running the test in user mode with a windows guest, I found that the only way to keep ping.exe alive through s4 was "start ping -t localhost" (or "start ping -n 3000 localhost"), without /B. 
I tried this with windows XP and it should be confirmed with other guests as well. Running it any other way redirects ping.exe's stdout to the shell session, and since the shell session can't survive s4 in user mode, ping.exe terminates as well. Did you get it to work in user mode with "start /B" or did you just test it in TAP mode? I haven't tried it on linux yet, but I think nohup should do the job. > > + check_s4_cmd = tasklist | find /I "ping" This should work, but I think "ping.exe" would be slightly more specific than "ping", because who knows what else the guest might be running. It still won't save us from processes ending with "mapping.exe" for example. Too bad find doesn't support regular expressions. > > + set_s4_cmd = rundll32.exe PowrProf.dll, > SetSuspendState > > + kill_test_s4_cmd = taskkill /IM ping.exe /F > > > > variants: > > - Win2000: > > diff --git a/client/tests/kvm/tests/guest_s4.py > b/client/tests/kvm/tests/guest_s4.py > > new file mode 100644 > > index 000..5d8fbdf > > --- /dev/null > > +++ b/client/tests/kvm/tests/guest_s4.py > > @@ -0,0 +1,66 @@ > > +import logging, time > > +from autotest_lib.client.common_lib import error > > +import kvm_test_utils, kvm_utils > > + > > + > > +def run_guest_s4(test, params, env): > > + """ > > + Suspend guest to disk,supports both Linux & Windows OSes. > > + > > + �...@param test: kvm test object. > > + �...@param params: Dictionary with test parameters. > > + �...@param env: Dictionary with the test environment. 
> > + """ > > + vm = kvm_test_utils.get_living_vm(env, params.get("main_vm")) > > + session = kvm_test_utils.wait_for_login(vm) > > + > > + logging.info("Checking whether VM supports S4") > > + status = > session.get_command_status(params.get("check_s4_support_cmd")) > > + if status is None: > > + logging.error("Failed to check if S4 exists") > > + elif status != 0: > > + raise error.TestFail("Guest does not support S4") > > + > > + logging.info("Waiting for a while for X to start...") > > Yes, generally X starts a bit later than the SSH service, so I > understand the time being here, however: > > * In fact we are waiting for all services of the guest to be up and > functional, so depending on the level of load, I don't think 10s is > gonna make it. So I suggest something >= 30s > * It's also true that just wait for a given time and hope that it > will be OK kinda sucks, so ideally we need to write utility functions > to stablish as well as possible when all services of a host are fully > booted up. Stated this way, it looks simple, but it's not. > > Autotest experience suggests that there's no real sane way to > determine when a linux box is booted up, but we can take a > semi-rational approach and verify
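The concern about `find /I "ping"` also matching something like mapping.exe can be illustrated with a small comparison. This is a sketch in Python, assuming the check were done on the test side instead of inside the guest; the function names are hypothetical and the sample lines are invented for illustration, not real tasklist output.

```python
def substring_match(tasklist_lines, needle):
    """Mimic find /I: case-insensitive substring match on each line."""
    needle = needle.lower()
    return [line for line in tasklist_lines if needle in line.lower()]


def image_name_match(tasklist_lines, image_name):
    """Match only lines whose first column equals the image name exactly."""
    wanted = image_name.lower()
    matches = []
    for line in tasklist_lines:
        fields = line.split()
        if fields and fields[0].lower() == wanted:
            matches.append(line)
    return matches


# Invented sample lines, for illustration only.
SAMPLE = [
    "ping.exe      1234 Console 0  2,104 K",
    "mapping.exe   4321 Console 0  3,000 K",
    "explorer.exe  1500 Console 0 12,000 K",
]

if __name__ == "__main__":
    print(len(substring_match(SAMPLE, "ping.exe")))
    print(len(image_name_match(SAMPLE, "ping.exe")))
```

Comparing the whole first column avoids the false positive that any substring search, with or without the .exe suffix, would still allow.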
Re: sync guest calls made async on host - SQLite performance
Matthew Tippett wrote:
> Thanks Duncan for reproducing the behavior outside myself and Phoronix.
>
> I dug deeper into the actual syscalls being made by sqlite. The
> salient part of the behaviour is small sequential writes followed by a
> fdatasync (effectively a metadata-free fsync).
>
> As Dustin indicates,
>
> if scsi is used, you incur the cost of virtualization,
> if virtio is used, your guests fsyncs incur less cost.
>
> So back to the question to the kvm team. It appears that with the
> stock KVM setup customers who need higher data integrity (through
> fsync) should steer away from virtio for the moment.
>
> Is that assessment correct?

No, it's an absurd assessment.

You have additional layers of caching happening because you're running a guest from a filesystem on the host.

A benchmark running under a guest that happens to be faster than the host does not indicate anything. It could be that the benchmark is poorly written.

What operation, specifically, do you think is not behaving properly under kvm? ext4 (karmic's default filesystem) does not enable barriers by default so it's unlikely this is anything barrier related.

> Regards,
>
> Matthew

Regards,

Anthony Liguori
Re: sync guest calls made async on host - SQLite performance
On Sun, Oct 11, 2009 at 11:16:42AM +0200, Avi Kivity wrote:
> > if scsi is used, you incur the cost of virtualization,
> > if virtio is used, your guests fsyncs incur less cost.
> >
> >So back to the question to the kvm team. It appears that with the
> >stock KVM setup customers who need higher data integrity (through
> >fsync) should steer away from virtio for the moment.
> >
> >Is that assessment correct?
> >
>
> Christoph, wasn't there a bug where the guest didn't wait for requests
> in response to a barrier request?

Can't remember anything like that. The "bug" was the complete lack of cache flush infrastructure for virtio, and the lack of advertising a volatile write cache on ide.
Re: [PATCH] device assignment rom fixups
On Tue, Oct 13, 2009 at 05:20:34PM +0200, Gerd Hoffmann wrote:
> Use new rom loading infrastructure.
> Devices can simply register option roms now.
>
> Signed-off-by: Gerd Hoffmann

Applied, thanks.
Re: [PATCH v2] qemu-kvm: Fix configure to respect --kerneldir
On Tue, Oct 13, 2009 at 09:01:09PM +0200, Jan Kiszka wrote:
> This simplifies working with new features without having to update the
> locally mirrored headers. It also reduces the diff to upstream.
>
> Signed-off-by: Jan Kiszka

Applied, thanks.
Re: [Autotest] [PATCH 4/6] KVM test: Add unattended install script
On Tue, Oct 13, 2009 at 5:52 PM, Ryan Harper wrote:
> * Lucas Meneghel Rodrigues [2009-10-09 15:41]:
>> In order to make it possible to prepare the environment
>> for the guests installation, we have to:
>>
>>
>
>> +class UnattendedInstall(object):
>> +    """
>> +    Creates a floppy disk image that will contain a config file for unattended
>> +    OS install. Optionally, sets up a PXE install server using qemu built in
>> +    TFTP and DHCP servers to install a particular operating system. The
>> +    parameters to the script are retrieved from environment variables.
>> +    """
>> +    def __init__(self):
>> +        """
>> +        Gets params from environment variables and sets class attributes.
>> +        """
>> +        script_dir = os.path.dirname(sys.modules[__name__].__file__)
>> +        kvm_test_dir = os.path.abspath(os.path.join(script_dir, ".."))
>> +        images_dir = os.path.join(kvm_test_dir, 'images')
>> +        self.deps_dir = os.path.join(kvm_test_dir, 'deps')
>> +        self.unattended_dir = os.path.join(kvm_test_dir, 'unattended')
>> +
>> +        try:
>> +            tftp_root = os.environ['KVM_TEST_tftp']
>> +            self.tftp_root = os.path.join(images_dir, tftp_root)
>
> Testing this out, the directory is just slightly wrong. My tftp_root
> value ends up being:
>
> /home/rharper/work/git/autotest/client/tests/kvm/images/images/tftpboot
>
> The tftp param is built in kvm_vm.py by combining root_dir
> (/home/rharper/work/git/autotest/client/tests/kvm) + the tftp value from
> kvm_tests.cfg (defaults to 'images/tftpboot'). So if we want to keep
> the relative tftp path in kvm_tests.cfg, then I think we need to update
> the unattended script.
>
> I think what we want instead is:
>
> self.tftp_root = os.path.join(kvm_test_dir, tftp_root)
>
> I made this small change and I can now run the fc11 unattended install.

Thanks for pointing this out! I thought I had fixed this in the final patchset version I committed, but it turns out I didn't :)

Committed as http://autotest.kernel.org/changeset/3842

> diff --git a/client/tests/kvm/scripts/unattended.py b/client/tests/kvm/scripts/unattended.py
> index 6ceeef1..febea6e 100755
> --- a/client/tests/kvm/scripts/unattended.py
> +++ b/client/tests/kvm/scripts/unattended.py
> @@ -33,7 +33,7 @@ class UnattendedInstall(object):
>
>          try:
>              tftp_root = os.environ['KVM_TEST_tftp']
> -            self.tftp_root = os.path.join(images_dir, tftp_root)
> +            self.tftp_root = os.path.join(kvm_test_dir, tftp_root)
>              if not os.path.isdir(self.tftp_root):
>                  os.makedirs(self.tftp_root)
>          except KeyError:
>
>
> --
> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> ry...@us.ibm.com

--
Lucas
Re: [PATCH] v3: allow userspace to adjust kvmclock offset
On Tue, Oct 13, 2009 at 04:55:05PM -0400, Glauber Costa wrote:
> +    case KVM_SET_CLOCK: {
> +        struct timespec now;
> +        struct kvm_clock_data user_ns;
> +        u64 now_ns;
> +        long delta;

Shouldn't that read s64? I guess such a large value won't happen in practice, but the 32-bit case would truncate the value differently in the subtraction below.

Regards,
Frederik

> +
> +        r = -EFAULT;
> +        if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
> +            goto out;
> +
> +        r = 0;
> +        ktime_get_ts(&now);
> +        now_ns = timespec_to_ns(&now);
> +        delta = user_ns.clock - now_ns;
> +        kvm->arch.kvmclock_offset = delta;
> +        break;
> +    }
> +    case KVM_GET_CLOCK: {
> +        struct timespec now;
> +        struct kvm_clock_data user_ns;
> +        u64 now_ns;
> +
> +        ktime_get_ts(&now);
> +        now_ns = timespec_to_ns(&now);
> +        user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
> +
> +        if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
> +            r = -EFAULT;
> +
> +        break;
> +    }
> +
>      default:
>          ;
>      }
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index f8f8900..ad0ecbc 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -497,6 +497,11 @@ struct kvm_irqfd {
>      __u8 pad[20];
>  };
>
> +struct kvm_clock_data {
> +    __u64 clock;
> +    __u64 pad[2];
> +};
> +
>  /*
>   * ioctls for VM fds
>   */
> @@ -546,6 +551,8 @@ struct kvm_irqfd {
>  #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
>  #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
>  #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
> +#define KVM_SET_CLOCK _IOW(KVMIO, 0x7a, struct kvm_clock_data)
> +#define KVM_GET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
>
>  /*
>   * ioctls for vcpu fds
> --
> 1.6.2.2
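The truncation concern raised here can be made concrete with a small experiment. The sketch below uses Python's ctypes fixed-width integers to imitate storing the u64 difference in a 64-bit versus a 32-bit signed variable; it assumes `long` is 32 bits wide on a 32-bit host, and delta_as is a hypothetical helper, not kernel code.

```python
import ctypes

U64_MASK = 0xFFFFFFFFFFFFFFFF


def delta_as(ctype, user_clock_ns, now_ns):
    """Compute user_clock_ns - now_ns in u64 arithmetic, then store it in ctype."""
    diff = (user_clock_ns - now_ns) & U64_MASK  # u64 wrap-around, as in C
    return ctype(diff).value                    # ctypes truncates without checking


if __name__ == "__main__":
    # Ten seconds of skew, in nanoseconds.
    print(delta_as(ctypes.c_int64, 10_000_000_000, 0))
    print(delta_as(ctypes.c_int32, 10_000_000_000, 0))
```

Any skew of more than roughly 2.1 seconds (2^31 ns) no longer fits in a 32-bit signed value, so with these assumptions the truncated case would be reachable in practice on a 32-bit host, which supports declaring delta as s64.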
[PATCH] v3: allow userspace to adjust kvmclock offset
When we migrate a kvm guest that uses pvclock between two hosts, we may suffer a large skew. This is because there can be significant differences between the monotonic clock of the hosts involved. When a new host with a much larger monotonic time starts running the guest, the view of time will be significantly impacted. Situation is much worse when we do the opposite, and migrate to a host with a smaller monotonic clock. This proposed ioctl will allow userspace to inform us what is the monotonic clock value in the source host, so we can keep the time skew short, and more importantly, never goes backwards. Userspace may also need to trigger the current data, since from the first migration onwards, it won't be reflected by a simple call to clock_gettime() anymore. [ v2: uses a struct with a padding ] [ v3: provide an ioctl to get clock data too ] Signed-off-by: Glauber Costa --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/x86.c | 35 ++- include/linux/kvm.h |7 +++ 3 files changed, 42 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 179a919..c9b0d9f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -410,6 +410,7 @@ struct kvm_arch{ unsigned long irq_sources_bitmap; u64 vm_init_tsc; + s64 kvmclock_offset; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9601bc6..58a380a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -699,7 +699,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v) /* With all the info we got, fill in the values */ vcpu->hv_clock.system_time = ts.tv_nsec + -(NSEC_PER_SEC * (u64)ts.tv_sec); +(NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset; + /* * The interface expects us to write an even number signaling that the * update is finished. 
Since the guest won't see the intermediate @@ -2441,6 +2442,38 @@ long kvm_arch_vm_ioctl(struct file *filp, r = 0; break; } + case KVM_SET_CLOCK: { + struct timespec now; + struct kvm_clock_data user_ns; + u64 now_ns; + long delta; + + r = -EFAULT; + if (copy_from_user(&user_ns, argp, sizeof(user_ns))) + goto out; + + r = 0; + ktime_get_ts(&now); + now_ns = timespec_to_ns(&now); + delta = user_ns.clock - now_ns; + kvm->arch.kvmclock_offset = delta; + break; + } + case KVM_GET_CLOCK: { + struct timespec now; + struct kvm_clock_data user_ns; + u64 now_ns; + + ktime_get_ts(&now); + now_ns = timespec_to_ns(&now); + user_ns.clock = kvm->arch.kvmclock_offset + now_ns; + + if (copy_to_user(argp, &user_ns, sizeof(user_ns))) + r = -EFAULT; + + break; + } + default: ; } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index f8f8900..ad0ecbc 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -497,6 +497,11 @@ struct kvm_irqfd { __u8 pad[20]; }; +struct kvm_clock_data { + __u64 clock; + __u64 pad[2]; +}; + /* * ioctls for VM fds */ @@ -546,6 +551,8 @@ struct kvm_irqfd { #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) #define KVM_SET_BOOT_CPU_ID_IO(KVMIO, 0x78) #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd) +#define KVM_SET_CLOCK_IOW(KVMIO, 0x7a, struct kvm_clock_data) +#define KVM_GET_CLOCK_IOW(KVMIO, 0x7b, struct kvm_clock_data) /* * ioctls for vcpu fds -- 1.6.2.2 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
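The offset arithmetic behind the two ioctls can be walked through with plain numbers. This is a simplified model in Python, assuming the guest-visible clock is just the host monotonic time plus kvmclock_offset; set_clock and get_clock are hypothetical stand-ins for the ioctl bodies, and the host times are invented.

```python
NSEC_PER_SEC = 1_000_000_000


def set_clock(saved_guest_clock_ns, host_monotonic_ns):
    """Model of KVM_SET_CLOCK: offset between the saved clock and this host."""
    return saved_guest_clock_ns - host_monotonic_ns


def get_clock(offset_ns, host_monotonic_ns):
    """Model of KVM_GET_CLOCK: this host's monotonic time plus the offset."""
    return offset_ns + host_monotonic_ns


if __name__ == "__main__":
    # Source host has been up for 5000 s, destination only for 100 s.
    saved = get_clock(0, 5000 * NSEC_PER_SEC)          # read on the source
    offset = set_clock(saved, 100 * NSEC_PER_SEC)      # applied on the destination
    later = get_clock(offset, 101 * NSEC_PER_SEC)      # one second after migration
    print(saved, offset, later)
```

In this model the guest clock continues from the saved value even though the destination's own monotonic clock is far smaller, which is the "never goes backwards" property the patch description is after.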
Re: [Autotest] [PATCH 4/6] KVM test: Add unattended install script
* Lucas Meneghel Rodrigues [2009-10-09 15:41]:
> In order to make it possible to prepare the environment
> for the guests installation, we have to:
>
>
> +class UnattendedInstall(object):
> +    """
> +    Creates a floppy disk image that will contain a config file for
> unattended
> +    OS install. Optionally, sets up a PXE install server using qemu built in
> +    TFTP and DHCP servers to install a particular operating system. The
> +    parameters to the script are retrieved from environment variables.
> +    """
> +    def __init__(self):
> +        """
> +        Gets params from environment variables and sets class attributes.
> +        """
> +        script_dir = os.path.dirname(sys.modules[__name__].__file__)
> +        kvm_test_dir = os.path.abspath(os.path.join(script_dir, ".."))
> +        images_dir = os.path.join(kvm_test_dir, 'images')
> +        self.deps_dir = os.path.join(kvm_test_dir, 'deps')
> +        self.unattended_dir = os.path.join(kvm_test_dir, 'unattended')
> +
> +        try:
> +            tftp_root = os.environ['KVM_TEST_tftp']
> +            self.tftp_root = os.path.join(images_dir, tftp_root)

Testing this out, the directory is just slightly wrong. My tftp_root value ends up being:

/home/rharper/work/git/autotest/client/tests/kvm/images/images/tftpboot

The tftp param is built in kvm_vm.py by combining root_dir (/home/rharper/work/git/autotest/client/tests/kvm) with the tftp value from kvm_tests.cfg (which defaults to 'images/tftpboot'). So if we want to keep the relative tftp path in kvm_tests.cfg, then I think we need to update the unattended script. I think what we want instead is:

self.tftp_root = os.path.join(kvm_test_dir, tftp_root)

I made this small change and I can now run the fc11 unattended install.

diff --git a/client/tests/kvm/scripts/unattended.py b/client/tests/kvm/scripts/unattended.py
index 6ceeef1..febea6e 100755
--- a/client/tests/kvm/scripts/unattended.py
+++ b/client/tests/kvm/scripts/unattended.py
@@ -33,7 +33,7 @@ class UnattendedInstall(object):
         try:
             tftp_root = os.environ['KVM_TEST_tftp']
-            self.tftp_root = os.path.join(images_dir, tftp_root)
+            self.tftp_root = os.path.join(kvm_test_dir, tftp_root)
             if not os.path.isdir(self.tftp_root):
                 os.makedirs(self.tftp_root)
         except KeyError:

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com
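The doubled "images" component can be reproduced outside autotest. A minimal sketch, assuming KVM_TEST_tftp carries the relative default 'images/tftpboot' mentioned above; posixpath is used only so the result does not depend on the platform running the snippet:

```python
import posixpath

# Paths taken from the message above; the relative tftp value is the assumed
# default from kvm_tests.cfg.
kvm_test_dir = "/home/rharper/work/git/autotest/client/tests/kvm"
images_dir = posixpath.join(kvm_test_dir, "images")
tftp_root = "images/tftpboot"

# Original code: joins onto images_dir, so "images" appears twice.
before = posixpath.join(images_dir, tftp_root)

# Proposed change: join onto kvm_test_dir instead.
after = posixpath.join(kvm_test_dir, tftp_root)

print(before)
print(after)
```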
Re: [Autotest] [PATCH] Add a kvm test guest_s4 which supports both Linux and Windows platform
Hi Yolkfull and Chen:

Thanks for your test! I have some comments and doubts to clear up; most of them are about the content of the messages delivered to the user, plus some other details.

On Sun, Sep 27, 2009 at 6:11 AM, Yolkfull Chow wrote:
> For this case, Ken Cao wrote the linux part previously and I did extensive
> modifications on Windows platform support.
>
> Signed-off-by: Ken Cao
> Signed-off-by: Yolkfull Chow
> ---
>  client/tests/kvm/kvm_tests.cfg.sample |   14 +++
>  client/tests/kvm/tests/guest_s4.py    |   66 +
>  2 files changed, 80 insertions(+), 0 deletions(-)
>  create mode 100644 client/tests/kvm/tests/guest_s4.py
>
> diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
> index 285a38f..f9ecb61 100644
> --- a/client/tests/kvm/kvm_tests.cfg.sample
> +++ b/client/tests/kvm/kvm_tests.cfg.sample
> @@ -94,6 +94,14 @@ variants:
>      - linux_s3: install setup
>          type = linux_s3
>
> +    - guest_s4:
> +        type = guest_s4
> +        check_s4_support_cmd = grep -q disk /sys/power/state
> +        test_s4_cmd = "cd /tmp/;nohup tcpdump -q -t ip host localhost"
> +        check_s4_cmd = pgrep tcpdump
> +        set_s4_cmd = echo disk > /sys/power/state
> +        kill_test_s4_cmd = pkill tcpdump
> +
>      - timedrift: install setup
>          type = timedrift
>          extra_params += " -rtc-td-hack"
> @@ -382,6 +390,12 @@ variants:
>              # Alternative host load:
>              #host_load_command = "dd if=/dev/urandom of=/dev/null"
>              host_load_instances = 8
> +        guest_s4:
> +            check_s4_support_cmd = powercfg /hibernate on
> +            test_s4_cmd = start /B ping -n 3000 localhost
> +            check_s4_cmd = tasklist | find /I "ping"
> +            set_s4_cmd = rundll32.exe PowrProf.dll, SetSuspendState
> +            kill_test_s4_cmd = taskkill /IM ping.exe /F
>
>  variants:
>      - Win2000:
> diff --git a/client/tests/kvm/tests/guest_s4.py b/client/tests/kvm/tests/guest_s4.py
> new file mode 100644
> index 000..5d8fbdf
> --- /dev/null
> +++ b/client/tests/kvm/tests/guest_s4.py
> @@ -0,0 +1,66 @@
> +import logging, time
> +from autotest_lib.client.common_lib import error
> +import kvm_test_utils, kvm_utils
> +
> +
> +def run_guest_s4(test, params, env):
> +    """
> +    Suspend guest to disk,supports both Linux & Windows OSes.
> +
> +    @param test: kvm test object.
> +    @param params: Dictionary with test parameters.
> +    @param env: Dictionary with the test environment.
> +    """
> +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
> +    session = kvm_test_utils.wait_for_login(vm)
> +
> +    logging.info("Checking whether VM supports S4")
> +    status = session.get_command_status(params.get("check_s4_support_cmd"))
> +    if status is None:
> +        logging.error("Failed to check if S4 exists")
> +    elif status != 0:
> +        raise error.TestFail("Guest does not support S4")
> +
> +    logging.info("Waiting for a while for X to start...")

Yes, generally X starts a bit later than the SSH service, so I understand why the sleep is here, however:

* In fact we are waiting for all services of the guest to be up and functional, so depending on the level of load, I don't think 10s is gonna make it. So I suggest something >= 30s.

* It's also true that just waiting for a given time and hoping that it will be OK kinda sucks, so ideally we need to write utility functions to establish, as well as possible, when all services of a host are fully booted up. Stated this way, it looks simple, but it's not. Autotest experience suggests that there's no real sane way to determine when a linux box is booted up, but we can take a semi-rational approach and verify whether all services for the current run level have the status up, or something similar. For windows, I was talking to Yaniv Kaul and it seems that processing the output of the 'sc query' command might give what we want.

Bottom line, I'd like to add a TODO item, and write a function to establish (fairly confidently) that a windows/linux guest is booted up.

> +    time.sleep(10)
> +
> +    # Start up a program(tcpdump for linux OS & ping for M$ OS), as a flag.
> +    # If the program died after suspend, then fails this testcase.
> +    test_s4_cmd = params.get("test_s4_cmd")
> +    session.sendline(test_s4_cmd)
> +
> +    # Get the second session to start S4
> +    session2 = kvm_test_utils.wait_for_login(vm)
> +
> +    check_s4_cmd = params.get("check_s4_cmd")
> +    if session2.get_command_status(check_s4_cmd):
> +        raise error.TestError("Failed to launch %s background" % test_s4_cmd)
> +    logging.info("Launched command background in guest: %s" % test_s4_cmd)
> +
> +    # Implement S4
> +    logging.info("Start suspend to disk now...")
> +    session2.sendline(params.get("set_s4_cmd"))
> +
> +    if not kvm_utils.wait_for(vm.is_dead, 360, 30, 2):
> +        raise error.Te
[PATCH v2] qemu-kvm: Fix configure to respect --kerneldir
This simplifies working with new features without having to update the locally mirrored headers. It also reduces the diff to upstream. Signed-off-by: Jan Kiszka --- v2: Rebase over git head configure | 46 -- 1 files changed, 28 insertions(+), 18 deletions(-) diff --git a/configure b/configure index 2341772..fdefcf6 100755 --- a/configure +++ b/configure @@ -1346,24 +1346,7 @@ fi ## # kvm probe if test "$kvm" != "no" ; then - case "$cpu" in - i386 | x86_64) -kvm_arch="x86" -;; - ppc) -kvm_arch="powerpc" -;; - *) -kvm_arch="$cpu" -;; - esac - - kvm_cflags="-I$source_path/kvm/include" - kvm_cflags="$kvm_cflags -include $source_path/kvm/include/linux/config.h" - kvm_cflags="$kvm_cflags -I$source_path/kvm/include/$kvm_arch" - kvm_cflags="$kvm_cflags -idirafter $source_path/compat" - - cat > $TMPC < $TMPC < #if !defined(KVM_API_VERSION) || KVM_API_VERSION < 12 || KVM_API_VERSION > 12 #error Invalid KVM version @@ -1379,6 +1362,33 @@ if test "$kvm" != "no" ; then #endif int main(void) { return 0; } EOF + if test "$kerneldir" != "" ; then + kvm_cflags=-I"$kerneldir"/include + if test \( "$cpu" = "i386" -o "$cpu" = "x86_64" \) \ + -a -d "$kerneldir/arch/x86/include" ; then +kvm_cflags="$kvm_cflags -I$kerneldir/arch/x86/include" + elif test "$cpu" = "ppc" -a -d "$kerneldir/arch/powerpc/include" ; then + kvm_cflags="$kvm_cflags -I$kerneldir/arch/powerpc/include" +elif test -d "$kerneldir/arch/$cpu/include" ; then +kvm_cflags="$kvm_cflags -I$kerneldir/arch/$cpu/include" + fi + else + case "$cpu" in + i386 | x86_64) +kvm_arch="x86" +;; + ppc) +kvm_arch="powerpc" +;; + *) +kvm_arch="$cpu" +;; + esac + kvm_cflags="-I$source_path/kvm/include" + kvm_cflags="$kvm_cflags -include $source_path/kvm/include/linux/config.h" + kvm_cflags="$kvm_cflags -I$source_path/kvm/include/$kvm_arch" + fi + kvm_cflags="$kvm_cflags -idirafter $source_path/compat" if compile_prog "$kvm_cflags" "" ; then kvm=yes else signature.asc Description: OpenPGP digital signature
Re: [PATCH 1/1] kvm/mmu: Resolve compile warning
Thank you. Sorry for the noise.

Best regards

- Javier Martínez Canillas
+595 981 88 66 58

On Tue, Oct 13, 2009 at 1:10 PM, Marcelo Tosatti wrote:
> javier,
>
> This is fixed in the -next branch of kvm.git. Thanks.
>
> On Sun, Oct 11, 2009 at 02:28:23AM -0400, javier martinez canillas wrote:
>> I got this compile warning with today linux-next:
>>
>> arch/x86/kvm/mmu.c: In function ‘kvm_set_pte_rmapp’:
>> arch/x86/kvm/mmu.c:770: warning: cast to pointer from integer of different size
>> arch/x86/kvm/mmu.c: In function ‘kvm_set_spte_hva’:
>> arch/x86/kvm/mmu.c:849: warning: cast from pointer to integer of different size
>>
>> This patch solves the issue:
>>
>> Signed-off-by: Javier Martinez Canillas
Re: [PATCH 2/2] Don't sync mpstate to/from kernel when unneeded.
On Tue, Oct 13, 2009 at 03:36:13PM -0300, Marcelo Tosatti wrote: > On Tue, Oct 13, 2009 at 02:17:20PM +0200, Gleb Natapov wrote: > > mp_state, unlike other cpu state, can be changed not only from vcpu > > context it belongs to, but by other vcpus too. That makes its loading > > from kernel/saving back not safe if mp_state value is changed inside > > kernel between load and save. For example vcpu 1 loads mp_sate into > > user-space and the state is RUNNING, vcpu 0 sends INIT/SIPI to vcpu 1 > > so in-kernel mp_sate becomes SIPI, vcpu 1 save user-space copy into > > kernel and calls vcpu_run(). SIPI sate is lost. > > > > The patch copies mp_sate into kernel only when it is knows that > > int-kernel value is outdated. This happens on reset and vmload. > > > > Signed-off-by: Gleb Natapov > > --- > > hw/apic.c |1 + > > monitor.c |2 ++ > > qemu-kvm.c|9 - > > qemu-kvm.h|1 - > > target-i386/machine.c |3 +++ > > 5 files changed, 10 insertions(+), 6 deletions(-) > > > > diff --git a/hw/apic.c b/hw/apic.c > > index 2952675..729 100644 > > --- a/hw/apic.c > > +++ b/hw/apic.c > > @@ -512,6 +512,7 @@ void apic_init_reset(CPUState *env) > > if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) { > > env->mp_state > > = env->halted ? KVM_MP_STATE_UNINITIALIZED : > > KVM_MP_STATE_RUNNABLE; > > +kvm_load_mpstate(env); > > } > > #endif > > } > > diff --git a/monitor.c b/monitor.c > > index 7f0f5a9..dd8f2ca 100644 > > --- a/monitor.c > > +++ b/monitor.c > > @@ -350,6 +350,7 @@ static CPUState *mon_get_cpu(void) > > mon_set_cpu(0); > > } > > cpu_synchronize_state(cur_mon->mon_cpu); > > +kvm_save_mpstate(cur_mon->mon_cpu); > > return cur_mon->mon_cpu; > > } > > > > @@ -377,6 +378,7 @@ static void do_info_cpus(Monitor *mon) > > > > for(env = first_cpu; env != NULL; env = env->next_cpu) { > > cpu_synchronize_state(env); > > +kvm_save_mpstate(env); > > monitor_printf(mon, "%c CPU #%d:", > > (env == mon->mon_cpu) ? 
'*' : ' ', > > env->cpu_index); > > diff --git a/qemu-kvm.c b/qemu-kvm.c > > index 3765818..2a1e0ff 100644 > > --- a/qemu-kvm.c > > +++ b/qemu-kvm.c > > @@ -1609,11 +1609,6 @@ static void on_vcpu(CPUState *env, void (*func)(void > > *data), void *data) > > void kvm_arch_get_registers(CPUState *env) > > { > > kvm_arch_save_regs(env); > > - kvm_arch_save_mpstate(env); > > -#ifdef KVM_CAP_MP_STATE > > - if (kvm_irqchip_in_kernel(kvm_context)) > > - env->halted = (env->mp_state == KVM_MP_STATE_HALTED); > > -#endif > > Why don't you keep saving it here (so there's no need to do it > explicitly elsewhere), and only explictly loading? To keep kvm_arch_get_registers/kvm_arch_set_registers symmetric I guess. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
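The lost-SIPI sequence described in the quoted commit message can be replayed with a toy model. This is a sketch only: the dicts stand in for the in-kernel and userspace copies of mp_state, and the "dirty" flag is a hypothetical stand-in for "userspace knows its copy is newer" (in the patch that is the explicit kvm_load_mpstate() on reset and vmload), not something the patch defines.

```python
# Toy replay of the mp_state race; not QEMU or kernel code.
RUNNABLE = "RUNNABLE"
SIPI_RECEIVED = "SIPI_RECEIVED"


def run_sequence(save_only_if_dirty):
    kernel = {"mp_state": RUNNABLE}
    user = {"mp_state": None, "dirty": False}

    # vcpu 1 loads mp_state into userspace.
    user["mp_state"] = kernel["mp_state"]

    # vcpu 0 sends INIT/SIPI to vcpu 1: the in-kernel state changes underneath.
    kernel["mp_state"] = SIPI_RECEIVED

    # vcpu 1 writes its userspace copy back before calling vcpu_run().
    if not save_only_if_dirty or user["dirty"]:
        kernel["mp_state"] = user["mp_state"]

    return kernel["mp_state"]


old_behaviour = run_sequence(save_only_if_dirty=False)  # stale copy wins, SIPI lost
new_behaviour = run_sequence(save_only_if_dirty=True)   # kernel value is kept
```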
Re: [PATCH 2/2] Don't sync mpstate to/from kernel when unneeded.
On Tue, Oct 13, 2009 at 02:17:20PM +0200, Gleb Natapov wrote: > mp_state, unlike other cpu state, can be changed not only from vcpu > context it belongs to, but by other vcpus too. That makes its loading > from kernel/saving back not safe if mp_state value is changed inside > kernel between load and save. For example vcpu 1 loads mp_sate into > user-space and the state is RUNNING, vcpu 0 sends INIT/SIPI to vcpu 1 > so in-kernel mp_sate becomes SIPI, vcpu 1 save user-space copy into > kernel and calls vcpu_run(). SIPI sate is lost. > > The patch copies mp_sate into kernel only when it is knows that > int-kernel value is outdated. This happens on reset and vmload. > > Signed-off-by: Gleb Natapov > --- > hw/apic.c |1 + > monitor.c |2 ++ > qemu-kvm.c|9 - > qemu-kvm.h|1 - > target-i386/machine.c |3 +++ > 5 files changed, 10 insertions(+), 6 deletions(-) > > diff --git a/hw/apic.c b/hw/apic.c > index 2952675..729 100644 > --- a/hw/apic.c > +++ b/hw/apic.c > @@ -512,6 +512,7 @@ void apic_init_reset(CPUState *env) > if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) { > env->mp_state > = env->halted ? KVM_MP_STATE_UNINITIALIZED : > KVM_MP_STATE_RUNNABLE; > +kvm_load_mpstate(env); > } > #endif > } > diff --git a/monitor.c b/monitor.c > index 7f0f5a9..dd8f2ca 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -350,6 +350,7 @@ static CPUState *mon_get_cpu(void) > mon_set_cpu(0); > } > cpu_synchronize_state(cur_mon->mon_cpu); > +kvm_save_mpstate(cur_mon->mon_cpu); > return cur_mon->mon_cpu; > } > > @@ -377,6 +378,7 @@ static void do_info_cpus(Monitor *mon) > > for(env = first_cpu; env != NULL; env = env->next_cpu) { > cpu_synchronize_state(env); > +kvm_save_mpstate(env); > monitor_printf(mon, "%c CPU #%d:", > (env == mon->mon_cpu) ? 
'*' : ' ', > env->cpu_index); > diff --git a/qemu-kvm.c b/qemu-kvm.c > index 3765818..2a1e0ff 100644 > --- a/qemu-kvm.c > +++ b/qemu-kvm.c > @@ -1609,11 +1609,6 @@ static void on_vcpu(CPUState *env, void (*func)(void > *data), void *data) > void kvm_arch_get_registers(CPUState *env) > { > kvm_arch_save_regs(env); > - kvm_arch_save_mpstate(env); > -#ifdef KVM_CAP_MP_STATE > - if (kvm_irqchip_in_kernel(kvm_context)) > - env->halted = (env->mp_state == KVM_MP_STATE_HALTED); > -#endif Why don't you keep saving it here (so there's no need to do it explicitly elsewhere), and only explictly loading? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Complete cpu initialization before signaling main thread.
On Tue, Oct 13, 2009 at 03:23:48PM -0300, Marcelo Tosatti wrote:
> On Tue, Oct 13, 2009 at 03:19:08PM -0300, Marcelo Tosatti wrote:
> > > @@ -2003,15 +1991,25 @@ static void *ap_main_loop(void *_env)
> > >      on_vcpu(env, kvm_arch_do_ioperm, data);
> > >  #endif
> > >
> > > -    /* signal VCPU creation */
> > > +    setup_kernel_sigmask(env);
> > > +
> > >      pthread_mutex_lock(&qemu_mutex);
> > > +    cpu_single_env = env;
> > > +
> > > +    kvm_arch_init_vcpu(env);
> > > +#ifdef TARGET_I386
> > > +    kvm_tpr_vcpu_start(env);
> > > +#endif
> > > +
> > > +    kvm_arch_load_regs(env);
> > > +
> > > +    /* signal VCPU creation */
> > >      current_env->created = 1;
> > >      pthread_cond_signal(&qemu_vcpu_cond);
> > >
> > >      /* and wait for machine initialization */
> > >      while (!qemu_system_ready)
> > >          qemu_cond_wait(&qemu_system_cond);
> > > -    pthread_mutex_unlock(&qemu_mutex);
> >
> > You don't set cpu_single_env after reacquiring
> > qemu_mutex here (via qemu_cond_wait).
> >
>
> Also i'm curious about the failure.
This patch by itself doesn't fix the bug. The next one does. This one rearranges the code to make more sense: a CPU is reported as created only when it is initialized and ready to run.

>
> Why say, bsp should care about other cpu's register state while doing MP
> init?
>
Because vcpu init will reset the MP state, so if the bsp sends a SIPI to vcpu1 before vcpu1 is initialized, the SIPI will be lost.

> MP state is set via apic_reset, which happens before qemu_system_ready
> is set.
>
Without my next patch, MP state is set (by set I mean ioctl(mp_state)) only on vcpu_run.

--
Gleb.
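The ordering being discussed (finish per-vcpu initialization, then signal the creator) is the usual condition-variable handshake. A minimal Python sketch of that ordering, with made-up names; it only illustrates the pattern and is not a translation of qemu-kvm.c:

```python
import threading

lock = threading.Lock()
created_cond = threading.Condition(lock)
vcpu = {"initialized": False, "created": False}


def vcpu_thread():
    with lock:
        # Stands in for kvm_arch_init_vcpu() / kvm_arch_load_regs().
        vcpu["initialized"] = True
        # Only now report the vcpu as created and wake the main thread.
        vcpu["created"] = True
        created_cond.notify()


def main_thread_wait():
    with lock:
        while not vcpu["created"]:
            created_cond.wait()
        # Whatever the main thread does next sees a fully initialized vcpu.
        return vcpu["initialized"]


t = threading.Thread(target=vcpu_thread)
t.start()
seen_initialized = main_thread_wait()
t.join()
```

If "created" were set before the initialization step, the main thread could wake up and act on a vcpu whose state is about to be reset, which is the window described above.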
Re: [PATCH 1/2] Complete cpu initialization before signaling main thread.
On Tue, Oct 13, 2009 at 03:19:08PM -0300, Marcelo Tosatti wrote:
> On Tue, Oct 13, 2009 at 02:17:19PM +0200, Gleb Natapov wrote:
> > Otherwise some cpus may start executing code before others
> > are fully initialized.
> >
> > Signed-off-by: Gleb Natapov
> > ---
> >  qemu-kvm.c |   26 --
> >  1 files changed, 12 insertions(+), 14 deletions(-)
> >
> > diff --git a/qemu-kvm.c b/qemu-kvm.c
> > index 62ca050..3765818 100644
> > --- a/qemu-kvm.c
> > +++ b/qemu-kvm.c
> > @@ -1954,18 +1954,6 @@ static void process_irqchip_events(CPUState *env)
> >
> >  static int kvm_main_loop_cpu(CPUState *env)
> >  {
> > -    setup_kernel_sigmask(env);
> > -
> > -    pthread_mutex_lock(&qemu_mutex);
> > -
> > -    kvm_arch_init_vcpu(env);
> > -#ifdef TARGET_I386
> > -    kvm_tpr_vcpu_start(env);
> > -#endif
> > -
> > -    cpu_single_env = env;
> > -    kvm_arch_load_regs(env);
> > -
> >      while (1) {
> >          int run_cpu = !is_cpu_stopped(env);
> >          if (run_cpu && !kvm_irqchip_in_kernel(kvm_context)) {
> > @@ -2003,15 +1991,25 @@ static void *ap_main_loop(void *_env)
> >      on_vcpu(env, kvm_arch_do_ioperm, data);
> >  #endif
> >
> > -    /* signal VCPU creation */
> > +    setup_kernel_sigmask(env);
> > +
> >      pthread_mutex_lock(&qemu_mutex);
> > +    cpu_single_env = env;
> > +
> > +    kvm_arch_init_vcpu(env);
> > +#ifdef TARGET_I386
> > +    kvm_tpr_vcpu_start(env);
> > +#endif
> > +
> > +    kvm_arch_load_regs(env);
> > +
> > +    /* signal VCPU creation */
> >      current_env->created = 1;
> >      pthread_cond_signal(&qemu_vcpu_cond);
> >
> >      /* and wait for machine initialization */
> >      while (!qemu_system_ready)
> >          qemu_cond_wait(&qemu_system_cond);
> > -    pthread_mutex_unlock(&qemu_mutex);
>
> You don't set cpu_single_env after reacquiring
> qemu_mutex here (via qemu_cond_wait).
Hmm, as far as I can see it is not used any more until the kvm_run call. But maybe we should set it anyway.

--
Gleb.
Re: [PATCH 1/2] Complete cpu initialization before signaling main thread.
On Tue, Oct 13, 2009 at 03:19:08PM -0300, Marcelo Tosatti wrote:
> > @@ -2003,15 +1991,25 @@ static void *ap_main_loop(void *_env)
> >      on_vcpu(env, kvm_arch_do_ioperm, data);
> >  #endif
> >
> > -    /* signal VCPU creation */
> > +    setup_kernel_sigmask(env);
> > +
> >      pthread_mutex_lock(&qemu_mutex);
> > +    cpu_single_env = env;
> > +
> > +    kvm_arch_init_vcpu(env);
> > +#ifdef TARGET_I386
> > +    kvm_tpr_vcpu_start(env);
> > +#endif
> > +
> > +    kvm_arch_load_regs(env);
> > +
> > +    /* signal VCPU creation */
> >      current_env->created = 1;
> >      pthread_cond_signal(&qemu_vcpu_cond);
> >
> >      /* and wait for machine initialization */
> >      while (!qemu_system_ready)
> >          qemu_cond_wait(&qemu_system_cond);
> > -    pthread_mutex_unlock(&qemu_mutex);
>
> You don't set cpu_single_env after reacquiring
> qemu_mutex here (via qemu_cond_wait).
>

Also, I'm curious about the failure. Why, say, should the bsp care about other cpus' register state while doing MP init? MP state is set via apic_reset, which happens before qemu_system_ready is set.
Re: [PATCH 1/2] Complete cpu initialization before signaling main thread.
On Tue, Oct 13, 2009 at 02:17:19PM +0200, Gleb Natapov wrote: > Otherwise some cpus may start executing code before others > are fully initialized. > > Signed-off-by: Gleb Natapov > --- > qemu-kvm.c | 26 -- > 1 files changed, 12 insertions(+), 14 deletions(-) > > diff --git a/qemu-kvm.c b/qemu-kvm.c > index 62ca050..3765818 100644 > --- a/qemu-kvm.c > +++ b/qemu-kvm.c > @@ -1954,18 +1954,6 @@ static void process_irqchip_events(CPUState *env) > > static int kvm_main_loop_cpu(CPUState *env) > { > -setup_kernel_sigmask(env); > - > -pthread_mutex_lock(&qemu_mutex); > - > -kvm_arch_init_vcpu(env); > -#ifdef TARGET_I386 > -kvm_tpr_vcpu_start(env); > -#endif > - > -cpu_single_env = env; > -kvm_arch_load_regs(env); > - > while (1) { > int run_cpu = !is_cpu_stopped(env); > if (run_cpu && !kvm_irqchip_in_kernel(kvm_context)) { > @@ -2003,15 +1991,25 @@ static void *ap_main_loop(void *_env) > on_vcpu(env, kvm_arch_do_ioperm, data); > #endif > > -/* signal VCPU creation */ > +setup_kernel_sigmask(env); > + > pthread_mutex_lock(&qemu_mutex); > +cpu_single_env = env; > + > +kvm_arch_init_vcpu(env); > +#ifdef TARGET_I386 > +kvm_tpr_vcpu_start(env); > +#endif > + > +kvm_arch_load_regs(env); > + > +/* signal VCPU creation */ > current_env->created = 1; > pthread_cond_signal(&qemu_vcpu_cond); > > /* and wait for machine initialization */ > while (!qemu_system_ready) > qemu_cond_wait(&qemu_system_cond); > -pthread_mutex_unlock(&qemu_mutex); You don't set cpu_single_env after reacquiring qemu_mutex here (via qemu_cond_wait). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check
On Mon, Oct 12, 2009 at 08:51:40AM +0200, Jan Kiszka wrote:
> This (broken) check dates back to the days when this code was shared
> across architectures. x86 has IOMEM, so drop it.
>
> Signed-off-by: Jan Kiszka

Applied, thanks.
Re: [PATCH 1/1] kvm/mmu: Resolve compile warning
javier,

This is fixed in the -next branch of kvm.git. Thanks.

On Sun, Oct 11, 2009 at 02:28:23AM -0400, javier martinez canillas wrote:
> I got this compile warning with today linux-next:
>
> arch/x86/kvm/mmu.c: In function ‘kvm_set_pte_rmapp’:
> arch/x86/kvm/mmu.c:770: warning: cast to pointer from integer of different size
> arch/x86/kvm/mmu.c: In function ‘kvm_set_spte_hva’:
> arch/x86/kvm/mmu.c:849: warning: cast from pointer to integer of different size
>
> This patch solves the issue:
>
> Signed-off-by: Javier Martinez Canillas
Re: [PATCH 00/10] Clean up vcpu context structure
On Fri, Oct 09, 2009 at 03:03:08PM -0300, Glauber Costa wrote:
> This series aims at cleaning up the vcpu_context structure. I am not removing
> the fd field yet, because it is used in the ioctls, and I want to do that
> separately.
>
> But after this series, this structure exists only as a way to hold the file
> descriptor, and is much cleaner and much closer to upstream qemu than before.

Applied, thanks.
Re: Modifying RAM during runtime on guest
Hi Jim.

On Wednesday, 07 October 2009 14:21:15 -0400, Jim Paris wrote:

> > > > I noticed no-one answered this, and I just ran into the same
> > > > thing myself. As Avi pointed out earlier, it is a guest bug, and
> > > > upgrading the guest to 2.6.27 should fix it:
> > > > http://www.mail-archive.com/kvm@vger.kernel.org/msg10849.html

> > > At the moment I don't have Internet connectivity at home, but as
> > > soon as possible I will download the necessary software to
> > > compile 2.6.27 or later and then tell you the result of the
> > > tests.

> > After compiling Linux 2.6.30.3 the Debian way on a Debian GNU/Linux
> > Lenny guest, when I try to boot the guest with this kernel, the boot
> > process freezes at the "Loading, please wait..." message.
> >
> > In the logs I don't get any entries from the boot process with 2.6.30
> > (I think it is because the process itself didn't start). Can it
> > be due to a bug using 2.6.30.3 in the guest with host KVM-88?

> 2.6.30.3 should work fine, there must be some other problem. If you
> remove "quiet" from the boot command line you should see the kernel
> messages which may indicate the problem. I'd also recommend just
> trying a standard prebuilt Debian kernel.
> http://packages.debian.org/squeeze/linux-image-2.6.30-1-amd64

As we discussed in this thread [1], the problem was due to a patch that the Debian developers apply to their stock kernels, which enables libata only on systems that have a SATA controller. For that reason the Debian stock kernel saw the disks as hdX, while the 2.6.31.2 and 2.6.30.3 kernels I compiled myself saw them as sdX.

After booting with 2.6.3x, the panic is no longer observed when restoring the memory to its initial value.

Thanks for your reply.

Regards,
Daniel

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/41158

-- 
Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Squeeze - Linux user #188.598
Re: [PATCH qemu-kvm] Enable UFO on virtio-net and tap devices
On Fri, Oct 09, 2009 at 08:11:28AM +0100, Mark McLoughlin wrote:
> On Thu, 2009-10-08 at 15:31 -0700, Sridhar Samudrala wrote:
> > On Thu, 2009-10-08 at 11:07 +0100, Mark McLoughlin wrote:
> > > On Wed, 2009-10-07 at 14:50 -0700, Sridhar Samudrala wrote:
> > > > linux 2.6.32 includes UDP fragmentation offload support in software.
> > > > So we can enable UFO on the host tap device if supported and allow
> > > > setting UFO on virtio-net in the guest.
> > >
> > > Hmm, we really need to detect whether the host has tuntap UFO support
> > > before advertising it to the guest. Maybe in net_tap_fd_init() we should
> > > toggle TUN_F_UFO back and forth and check for EINVAL?
> >
> > Sure. Here is an updated patch that checks for UFO support in host.
>
> Looks good to me, thanks
>
> Acked-by: Mark McLoughlin

Applied, thanks.
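The "toggle TUN_F_UFO and check for EINVAL" idea from the quoted exchange could look roughly like the sketch below. It is an illustration under assumptions, not the code from the applied patch: set_offload is a hypothetical caller-supplied function that would wrap the TUNSETOFFLOAD ioctl on the tap fd, and only the TUN_F_UFO value (0x10, from linux/if_tun.h) is taken from the kernel headers.

```python
import errno

TUN_F_UFO = 0x10  # offload flag value from linux/if_tun.h


def probe_ufo(set_offload, current_flags=0):
    """Return True if the host tap device accepts TUN_F_UFO.

    set_offload(flags) is a hypothetical helper that applies the offload
    flags to the tap fd and raises OSError on failure.
    """
    try:
        set_offload(current_flags | TUN_F_UFO)  # try turning UFO on
    except OSError as e:
        if e.errno == errno.EINVAL:
            return False                        # host does not know the flag
        raise                                   # any other error is unexpected
    set_offload(current_flags)                  # toggle back to the old flags
    return True


# Usage with stand-in helpers instead of a real tap device:
applied = []


def host_with_ufo(flags):
    applied.append(flags)


def host_without_ufo(flags):
    if flags & TUN_F_UFO:
        raise OSError(errno.EINVAL, "Invalid argument")


supported = probe_ufo(host_with_ufo, current_flags=0x01)
unsupported = probe_ufo(host_without_ufo)
```

Only when the probe succeeds would the UFO feature be advertised to the guest.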
Re: [PATCH] include stdlib.h in qemu-kvm.h
On Thu, Oct 08, 2009 at 03:53:59PM -0300, Glauber Costa wrote:
> abort() needs it. Build with kvm disabled breaks without it.
>
> Signed-off-by: Glauber Costa

Applied, thanks.
[RFC][PATCH] kvm: x86: Add support for KVM_GET/PUT_VCPU_STATE
This is a demonstration patch for the new KVM IOCTLs proposed in [1]. It converts upstream kvm to use this in favor of the individual IOCTLs to get/set VCPU registers and related states. It works, fixes the missing NMI state handling but, of course, only makes sense if the interface is accepted by kvm. [1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/41550 --- kvm-all.c |2 kvm.h |2 target-i386/cpu.h |1 target-i386/kvm.c | 507 +++-- target-i386/machine.c |1 target-ppc/kvm.c |4 6 files changed, 294 insertions(+), 223 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 48ae26c..31bc2f8 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -150,6 +150,7 @@ static void kvm_reset_vcpu(void *opaque) { CPUState *env = opaque; +kvm_arch_reset_vcpu(env); if (kvm_arch_put_registers(env)) { fprintf(stderr, "Fatal: kvm vcpu reset failed\n"); abort(); @@ -201,6 +202,7 @@ int kvm_init_vcpu(CPUState *env) ret = kvm_arch_init_vcpu(env); if (ret == 0) { qemu_register_reset(kvm_reset_vcpu, env); +kvm_arch_reset_vcpu(env); ret = kvm_arch_put_registers(env); } err: diff --git a/kvm.h b/kvm.h index e7d5beb..6a82f6a 100644 --- a/kvm.h +++ b/kvm.h @@ -93,6 +93,8 @@ int kvm_arch_init(KVMState *s, int smp_cpus); int kvm_arch_init_vcpu(CPUState *env); +void kvm_arch_reset_vcpu(CPUState *env); + struct kvm_guest_debug; struct kvm_debug_exit_arch; diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 5929d28..37823fe 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -693,6 +693,7 @@ typedef struct CPUX86State { /* For KVM */ uint64_t interrupt_bitmap[256 / 64]; uint32_t mp_state; +uint32_t nmi_pending; /* in order to simplify APIC support, we leave this pointer to the user */ diff --git a/target-i386/kvm.c b/target-i386/kvm.c index aa90eff..05ff97a 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -221,6 +221,11 @@ int kvm_arch_init_vcpu(CPUState *env) return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data); } +void kvm_arch_reset_vcpu(CPUState *env) +{ +env->nmi_pending = 
0; +} + static int kvm_has_msr_star(CPUState *env) { static int has_msr_star; @@ -346,113 +351,93 @@ static void kvm_getput_reg(__u64 *kvm_reg, target_ulong *qemu_reg, int set) *qemu_reg = *kvm_reg; } -static int kvm_getput_regs(CPUState *env, int set) +static void kvm_getput_regs(CPUState *env, struct kvm_regs *regs, int set) { -struct kvm_regs regs; -int ret = 0; - -if (!set) { -ret = kvm_vcpu_ioctl(env, KVM_GET_REGS, ®s); -if (ret < 0) -return ret; -} - -kvm_getput_reg(®s.rax, &env->regs[R_EAX], set); -kvm_getput_reg(®s.rbx, &env->regs[R_EBX], set); -kvm_getput_reg(®s.rcx, &env->regs[R_ECX], set); -kvm_getput_reg(®s.rdx, &env->regs[R_EDX], set); -kvm_getput_reg(®s.rsi, &env->regs[R_ESI], set); -kvm_getput_reg(®s.rdi, &env->regs[R_EDI], set); -kvm_getput_reg(®s.rsp, &env->regs[R_ESP], set); -kvm_getput_reg(®s.rbp, &env->regs[R_EBP], set); +kvm_getput_reg(®s->rax, &env->regs[R_EAX], set); +kvm_getput_reg(®s->rbx, &env->regs[R_EBX], set); +kvm_getput_reg(®s->rcx, &env->regs[R_ECX], set); +kvm_getput_reg(®s->rdx, &env->regs[R_EDX], set); +kvm_getput_reg(®s->rsi, &env->regs[R_ESI], set); +kvm_getput_reg(®s->rdi, &env->regs[R_EDI], set); +kvm_getput_reg(®s->rsp, &env->regs[R_ESP], set); +kvm_getput_reg(®s->rbp, &env->regs[R_EBP], set); #ifdef TARGET_X86_64 -kvm_getput_reg(®s.r8, &env->regs[8], set); -kvm_getput_reg(®s.r9, &env->regs[9], set); -kvm_getput_reg(®s.r10, &env->regs[10], set); -kvm_getput_reg(®s.r11, &env->regs[11], set); -kvm_getput_reg(®s.r12, &env->regs[12], set); -kvm_getput_reg(®s.r13, &env->regs[13], set); -kvm_getput_reg(®s.r14, &env->regs[14], set); -kvm_getput_reg(®s.r15, &env->regs[15], set); +kvm_getput_reg(®s->r8, &env->regs[8], set); +kvm_getput_reg(®s->r9, &env->regs[9], set); +kvm_getput_reg(®s->r10, &env->regs[10], set); +kvm_getput_reg(®s->r11, &env->regs[11], set); +kvm_getput_reg(®s->r12, &env->regs[12], set); +kvm_getput_reg(®s->r13, &env->regs[13], set); +kvm_getput_reg(®s->r14, &env->regs[14], set); +kvm_getput_reg(®s->r15, 
&env->regs[15], set); #endif -kvm_getput_reg(&regs.rflags, &env->eflags, set); -kvm_getput_reg(&regs.rip, &env->eip, set); - -if (set) -ret = kvm_vcpu_ioctl(env, KVM_SET_REGS, &regs); - -return ret; +kvm_getput_reg(&regs->rflags, &env->eflags, set); +kvm_getput_reg(&regs->rip, &env->eip, set); } -static int kvm_put_fpu(CPUState *env) +static void kvm_put_fpu(CPUState *env, struct kvm_fpu *fpu) { -struct kvm_fpu fpu; int i; -memset(&fpu, 0, sizeof fpu); -fpu.fsw = env->fpus & ~(7 << 11); -fpu.fsw |= (env->fpstt &
[PATCH 2/4] KVM: Add unified KVM_GET/SET_VCPU_STATE IOCTL
Add a new IOCTL pair to retrieve or set the VCPU state in one chunk. More precisely, the IOCTL is able to process a list of substates to be read or written. This list is easily extensible without breaking the existing ABI, thus we will no longer have to add new IOCTLs when we discover a missing VCPU state field or want to support new hardware features. This patch establishes the generic infrastructure for KVM_GET/ SET_VCPU_STATE and adds support for the generic substates REGS, SREGS, FPU, and MP. To avoid code duplication, the entry point for the corresponding original IOCTLs are converted to make use of the new infrastructure internally, too. Signed-off-by: Jan Kiszka --- arch/ia64/kvm/kvm-ia64.c | 12 ++ arch/powerpc/kvm/powerpc.c | 12 ++ arch/s390/kvm/kvm-s390.c | 12 ++ arch/x86/kvm/x86.c | 12 ++ include/linux/kvm.h| 24 +++ include/linux/kvm_host.h |5 + virt/kvm/kvm_main.c| 318 +++- 7 files changed, 303 insertions(+), 92 deletions(-) diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 5fdeec5..c3450a6 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1991,3 +1991,15 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, vcpu_put(vcpu); return r; } + +int kvm_arch_vcpu_get_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base, + struct kvm_vcpu_substate *substate) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_set_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base, + struct kvm_vcpu_substate *substate) +{ + return -EINVAL; +} diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 5902bbc..3336ad5 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -436,3 +436,15 @@ int kvm_arch_init(void *opaque) void kvm_arch_exit(void) { } + +int kvm_arch_vcpu_get_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base, + struct kvm_vcpu_substate *substate) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_set_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base, + struct 
kvm_vcpu_substate *substate) +{ + return -EINVAL; +} diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 5445058..978ed6c 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -450,6 +450,18 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, return -EINVAL; /* not implemented yet */ } +int kvm_arch_vcpu_get_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base, + struct kvm_vcpu_substate *substate) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_set_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base, + struct kvm_vcpu_substate *substate) +{ + return -EINVAL; +} + static void __vcpu_run(struct kvm_vcpu *vcpu) { memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 11a6f2f..839b1c5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4662,6 +4662,18 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) } EXPORT_SYMBOL_GPL(kvm_put_guest_fpu); +int kvm_arch_vcpu_get_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base, + struct kvm_vcpu_substate *substate) +{ + return -EINVAL; +} + +int kvm_arch_vcpu_set_substate(struct kvm_vcpu *vcpu, uint8_t __user *arg_base, + struct kvm_vcpu_substate *substate) +{ + return -EINVAL; +} + void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) { if (vcpu->arch.time_page) { diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 7d8c382..da81b89 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -393,6 +393,26 @@ struct kvm_ioeventfd { __u8 pad[36]; }; +/* for KVM_GET_VCPU_STATE and KVM_SET_VCPU_STATE */ +#define KVM_VCPU_REGS 0 +#define KVM_VCPU_SREGS 1 +#define KVM_VCPU_FPU 2 +#define KVM_VCPU_MP 3 + +struct kvm_vcpu_substate { + __u32 type; + __u32 pad; + __s64 offset; +}; + +#define KVM_MAX_VCPU_SUBSTATES 64 + +struct kvm_vcpu_state { + __u32 nsubstates; /* number of elements in substates */ + __u32 nprocessed; /* return value: successfully processed substates */ + struct kvm_vcpu_substate 
substates[0]; +}; + #define KVMIO 0xAE /* @@ -480,6 +500,7 @@ struct kvm_ioeventfd { #endif #define KVM_CAP_IOEVENTFD 36 #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37 +#define KVM_CAP_VCPU_STATE 38 #ifdef KVM_CAP_IRQ_ROUTING @@ -642,6 +663,9 @@ struct kvm_irqfd { /* IA64 stack access */ #define KVM_IA64_VCPU_GET_STACK _IOR(KVMIO, 0x9a, void *) #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) +/* Available with KVM_CAP_VCP
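For readers following the proposal, here is a minimal userspace-side sketch of how a request buffer for the proposed interface might be laid out. It assumes, based on the arg_base plus offset convention visible in the patch, that each substate's offset is counted from the start of the ioctl argument. The structures are local copies of the ones proposed above (not taken from a released kernel header), build_state_request() and substate_payload() are hypothetical helper names, and no ioctl is actually issued.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies of the definitions proposed in the patch (assumption: the
 * final ABI, if any, may differ). */
#define KVM_VCPU_REGS  0
#define KVM_VCPU_SREGS 1
#define KVM_VCPU_FPU   2
#define KVM_VCPU_MP    3

struct kvm_vcpu_substate {
    uint32_t type;
    uint32_t pad;
    int64_t  offset;      /* payload position, relative to the buffer start */
};

struct kvm_vcpu_state {
    uint32_t nsubstates;  /* number of elements in substates */
    uint32_t nprocessed;  /* filled in by the kernel in the proposal */
    struct kvm_vcpu_substate substates[];
};

/* Hypothetical helper: allocate one zeroed buffer holding the header, the
 * substate list and all payloads back to back, and record each offset.
 * Payload alignment is not handled in this sketch. */
static struct kvm_vcpu_state *build_state_request(const uint32_t *types,
                                                  const size_t *sizes,
                                                  uint32_t n, size_t *total)
{
    size_t off = sizeof(struct kvm_vcpu_state) +
                 (size_t)n * sizeof(struct kvm_vcpu_substate);
    size_t len = off;
    struct kvm_vcpu_state *st;
    uint32_t i;

    for (i = 0; i < n; i++)
        len += sizes[i];

    st = calloc(1, len);
    if (!st)
        return NULL;

    st->nsubstates = n;
    for (i = 0; i < n; i++) {
        st->substates[i].type = types[i];
        st->substates[i].offset = (int64_t)off;
        off += sizes[i];
    }
    *total = len;
    return st;
}

/* Hypothetical helper: locate the payload of substate i inside the buffer. */
static void *substate_payload(struct kvm_vcpu_state *st, uint32_t i)
{
    return (uint8_t *)st + st->substates[i].offset;
}
```

With a layout like this, adding a new substate type only means appending another list entry and payload, which is the extensibility argument the patch description makes.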
[PATCH 0/4] Extensible VCPU state IOCTL
As you may have noticed, we are constantly adding IOCTLs whenever yet another state field has to be exchanged between kernel and user space. I was about to add one for the missing hidden NMI states (pending and masked), but Avi suggested taking this chance to invent a more easily extensible interface. And here comes my suggestion for VCPU states. Please see patch 2 for details on this approach; patch 4 demonstrates what extensions may look like in the future. I will follow up with a patch against qemu upstream to convert kvm_arch_get/put_registers to the new interface, i.e. query/set all substates via one IOCTL when available. I did not convert qemu-kvm, only added support for the NMI substate, as the corresponding code will likely be modified to use the upstream implementation anyway. Comments welcome, also suggestions for further substates to be added in this round. Jan Find this series also at git://git.kiszka.org/linux-kvm.git queues/vcpu-state Jan Kiszka (4): KVM: Reorder IOCTLs in main kvm.h KVM: Add unified KVM_GET/SET_VCPU_STATE IOCTL KVM: x86: Add support for KVM_GET/SET_VCPU_STATE KVM: x86: Add VCPU substate for NMI states arch/ia64/kvm/kvm-ia64.c| 12 ++ arch/powerpc/kvm/powerpc.c | 12 ++ arch/s390/kvm/kvm-s390.c| 12 ++ arch/x86/include/asm/kvm.h | 15 ++- arch/x86/include/asm/kvm_host.h |2 + arch/x86/kvm/svm.c | 22 +++ arch/x86/kvm/vmx.c | 30 arch/x86/kvm/x86.c | 243 - include/linux/kvm.h | 246 +-- include/linux/kvm_host.h|5 + virt/kvm/kvm_main.c | 318 +++--- 11 files changed, 637 insertions(+), 280 deletions(-)
[PATCH 1/4] KVM: Reorder IOCTLs in main kvm.h
Obviously, people tend to extend this header at the bottom - more or less blindly. Ensure that deprecated stuff gets its own corner again by moving things to the top. Also add some comments and reindent IOCTLs to make them more readable and reduce the risk of number collisions. Signed-off-by: Jan Kiszka --- include/linux/kvm.h | 228 ++- 1 files changed, 114 insertions(+), 114 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index f8f8900..7d8c382 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -14,12 +14,76 @@ #define KVM_API_VERSION 12 -/* for KVM_TRACE_ENABLE, deprecated */ +/* *** Deprecated interfaces *** */ + +#define KVM_TRC_SHIFT 16 + +#define KVM_TRC_ENTRYEXIT (1 << KVM_TRC_SHIFT) +#define KVM_TRC_HANDLER (1 << (KVM_TRC_SHIFT + 1)) + +#define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01) +#define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02) +#define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01) + +#define KVM_TRC_HEAD_SIZE 12 +#define KVM_TRC_CYCLE_SIZE 8 +#define KVM_TRC_EXTRA_MAX 7 + +#define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02) +#define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03) +#define KVM_TRC_PEND_INTR (KVM_TRC_HANDLER + 0x04) +#define KVM_TRC_IO_READ (KVM_TRC_HANDLER + 0x05) +#define KVM_TRC_IO_WRITE (KVM_TRC_HANDLER + 0x06) +#define KVM_TRC_CR_READ (KVM_TRC_HANDLER + 0x07) +#define KVM_TRC_CR_WRITE (KVM_TRC_HANDLER + 0x08) +#define KVM_TRC_DR_READ (KVM_TRC_HANDLER + 0x09) +#define KVM_TRC_DR_WRITE (KVM_TRC_HANDLER + 0x0A) +#define KVM_TRC_MSR_READ (KVM_TRC_HANDLER + 0x0B) +#define KVM_TRC_MSR_WRITE (KVM_TRC_HANDLER + 0x0C) +#define KVM_TRC_CPUID (KVM_TRC_HANDLER + 0x0D) +#define KVM_TRC_INTR (KVM_TRC_HANDLER + 0x0E) +#define KVM_TRC_NMI (KVM_TRC_HANDLER + 0x0F) +#define KVM_TRC_VMMCALL (KVM_TRC_HANDLER + 0x10) +#define KVM_TRC_HLT (KVM_TRC_HANDLER + 0x11) +#define KVM_TRC_CLTS (KVM_TRC_HANDLER + 0x12) +#define KVM_TRC_LMSW (KVM_TRC_HANDLER + 0x13) +#define KVM_TRC_APIC_ACCESS (KVM_TRC_HANDLER + 0x14) +#define 
KVM_TRC_TDP_FAULT (KVM_TRC_HANDLER + 0x15) +#define KVM_TRC_GTLB_WRITE (KVM_TRC_HANDLER + 0x16) +#define KVM_TRC_STLB_WRITE (KVM_TRC_HANDLER + 0x17) +#define KVM_TRC_STLB_INVAL (KVM_TRC_HANDLER + 0x18) +#define KVM_TRC_PPC_INSTR (KVM_TRC_HANDLER + 0x19) + struct kvm_user_trace_setup { - __u32 buf_size; /* sub_buffer size of each per-cpu */ - __u32 buf_nr; /* the number of sub_buffers of each per-cpu */ + __u32 buf_size; + __u32 buf_nr; }; +#define __KVM_DEPRECATED_MAIN_W_0x06 \ + _IOW(KVMIO, 0x06, struct kvm_user_trace_setup) +#define __KVM_DEPRECATED_MAIN_0x07 _IO(KVMIO, 0x07) +#define __KVM_DEPRECATED_MAIN_0x08 _IO(KVMIO, 0x08) + +#define __KVM_DEPRECATED_VM_R_0x70 _IOR(KVMIO, 0x70, struct kvm_assigned_irq) + +struct kvm_breakpoint { + __u32 enabled; + __u32 padding; + __u64 address; +}; + +struct kvm_debug_guest { + __u32 enabled; + __u32 pad; + struct kvm_breakpoint breakpoints[4]; + __u32 singlestep; +}; + +#define __KVM_DEPRECATED_VCPU_W_0x87 _IOW(KVMIO, 0x87, struct kvm_debug_guest) + +/* *** End of deprecated interfaces *** */ + + /* for KVM_CREATE_MEMORY_REGION */ struct kvm_memory_region { __u32 slot; @@ -329,24 +393,6 @@ struct kvm_ioeventfd { __u8 pad[36]; }; -#define KVM_TRC_SHIFT 16 -/* - * kvm trace categories - */ -#define KVM_TRC_ENTRYEXIT (1 << KVM_TRC_SHIFT) -#define KVM_TRC_HANDLER (1 << (KVM_TRC_SHIFT + 1)) /* only 12 bits */ - -/* - * kvm trace action - */ -#define KVM_TRC_VMENTRY (KVM_TRC_ENTRYEXIT + 0x01) -#define KVM_TRC_VMEXIT (KVM_TRC_ENTRYEXIT + 0x02) -#define KVM_TRC_PAGE_FAULT (KVM_TRC_HANDLER + 0x01) - -#define KVM_TRC_HEAD_SIZE 12 -#define KVM_TRC_CYCLE_SIZE 8 -#define KVM_TRC_EXTRA_MAX 7 - #define KVMIO 0xAE /* @@ -367,12 +413,10 @@ struct kvm_ioeventfd { */ #define KVM_GET_VCPU_MMAP_SIZE _IO(KVMIO, 0x04) /* in bytes */ #define KVM_GET_SUPPORTED_CPUID _IOWR(KVMIO, 0x05, struct kvm_cpuid2) -/* - * ioctls for kvm trace - */ -#define KVM_TRACE_ENABLE _IOW(KVMIO, 0x06, struct kvm_user_trace_setup) -#define KVM_TRACE_PAUSE _IO(KVMIO, 0x07) 
-#define KVM_TRACE_DISABLE _IO(KVMIO, 0x08) +#define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06 +#define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07 +#define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08 + /* * Extension capability list. */ @@ -500,52 +544,54 @@ struct kvm_irqfd { /* * ioctls for VM fds */ -#define KVM_SET_MEMORY_REGION _IOW(KVMIO, 0x40, struct kvm_memory_region) +#define KVM_SET_MEMORY_REGION _IOW(KVMIO,
[PATCH 4/4] KVM: x86: Add VCPU substate for NMI states
This plugs an NMI-related hole in the VCPU synchronization between kernel and user space. So far, neither pending NMIs nor the inhibit NMI mask was properly read/set which was able to cause problems on vmsave/restore, live migration and system reset. Fix it by making use of the new VCPU substate interface. Signed-off-by: Jan Kiszka --- arch/x86/include/asm/kvm.h |7 +++ arch/x86/include/asm/kvm_host.h |2 ++ arch/x86/kvm/svm.c | 22 ++ arch/x86/kvm/vmx.c | 30 ++ arch/x86/kvm/x86.c | 26 ++ 5 files changed, 87 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h index 1b184c3..fd5713a 100644 --- a/arch/x86/include/asm/kvm.h +++ b/arch/x86/include/asm/kvm.h @@ -256,5 +256,12 @@ struct kvm_reinject_control { #define KVM_X86_VCPU_MSRS 1000 #define KVM_X86_VCPU_CPUID 1001 #define KVM_X86_VCPU_LAPIC 1002 +#define KVM_X86_VCPU_NMI 1003 + +struct kvm_nmi_state { + __u8 pending; + __u8 masked; + __u8 pad1[2]; +}; #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 179a919..d22a0cd 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -513,6 +513,8 @@ struct kvm_x86_ops { unsigned char *hypercall_addr); void (*set_irq)(struct kvm_vcpu *vcpu); void (*set_nmi)(struct kvm_vcpu *vcpu); + int (*get_nmi_mask)(struct kvm_vcpu *vcpu); + void (*set_nmi_mask)(struct kvm_vcpu *vcpu, int masked); void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr, bool has_error_code, u32 error_code); int (*interrupt_allowed)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 279a2ae..67ff5f1 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2456,6 +2456,26 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu) !(svm->vcpu.arch.hflags & HF_NMI_MASK); } +static int svm_get_nmi_mask(struct kvm_vcpu *vcpu) +{ + struct vcpu_svm *svm = to_svm(vcpu); + + return !!(svm->vcpu.arch.hflags & HF_NMI_MASK); +} + +static void 
svm_set_nmi_mask(struct kvm_vcpu *vcpu, int masked) +{ + struct vcpu_svm *svm = to_svm(vcpu); + + if (masked) { + svm->vcpu.arch.hflags |= HF_NMI_MASK; + svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET); + } else { + svm->vcpu.arch.hflags &= ~HF_NMI_MASK; + svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET); + } +} + static int svm_interrupt_allowed(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu); @@ -2897,6 +2917,8 @@ static struct kvm_x86_ops svm_x86_ops = { .queue_exception = svm_queue_exception, .interrupt_allowed = svm_interrupt_allowed, .nmi_allowed = svm_nmi_allowed, + .get_nmi_mask = svm_get_nmi_mask, + .set_nmi_mask = svm_set_nmi_mask, .enable_nmi_window = enable_nmi_window, .enable_irq_window = enable_irq_window, .update_cr8_intercept = update_cr8_intercept, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 70020e5..5dd766b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2619,6 +2619,34 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) GUEST_INTR_STATE_NMI)); } +static int vmx_get_nmi_mask(struct kvm_vcpu *vcpu) +{ + if (!cpu_has_virtual_nmis()) + return to_vmx(vcpu)->soft_vnmi_blocked; + else + return !!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & + GUEST_INTR_STATE_NMI); +} + +static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, int masked) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + + if (!cpu_has_virtual_nmis()) { + if (vmx->soft_vnmi_blocked != masked) { + vmx->soft_vnmi_blocked = masked; + vmx->vnmi_blocked_time = 0; + } + } else { + if (masked) + vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO, + GUEST_INTR_STATE_NMI); + else + vmcs_clear_bits(GUEST_INTERRUPTIBILITY_INFO, + GUEST_INTR_STATE_NMI); + } +} + static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu) { return (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) && @@ -3957,6 +3985,8 @@ static struct kvm_x86_ops vmx_x86_ops = { .queue_exception = vmx_queue_exception, .interrupt_allowed = vmx_interrupt_allowed, .nmi_allowed = vmx_nmi_allowed, + .get_nmi_mask 
= vmx_get_nmi_mask, + .set_nmi_mask = vmx_set_nmi_mask, .enable_nmi_window = enable_nmi_window, .e
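To illustrate the hole this patch plugs, here is a minimal sketch of saving and restoring the two NMI flags through the proposed struct kvm_nmi_state. The structure layout is copied from the patch; struct mock_vcpu, MOCK_HF_NMI_MASK and the two helper functions are hypothetical stand-ins for kernel state and are not the actual get/set substate implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as proposed in the patch. */
struct kvm_nmi_state {
    uint8_t pending;
    uint8_t masked;
    uint8_t pad1[2];
};

/* Hypothetical stand-in for the per-VCPU kernel state. */
#define MOCK_HF_NMI_MASK (1u << 0)

struct mock_vcpu {
    int      nmi_pending;
    uint32_t hflags;
};

/* Save side: what a "get substate" call would have to report. */
static void mock_get_nmi_state(const struct mock_vcpu *vcpu,
                               struct kvm_nmi_state *out)
{
    out->pending = vcpu->nmi_pending ? 1 : 0;
    out->masked = (vcpu->hflags & MOCK_HF_NMI_MASK) ? 1 : 0;
    out->pad1[0] = 0;
    out->pad1[1] = 0;
}

/* Restore side: what a "set substate" call would have to apply. */
static void mock_set_nmi_state(struct mock_vcpu *vcpu,
                               const struct kvm_nmi_state *in)
{
    vcpu->nmi_pending = in->pending;
    if (in->masked)
        vcpu->hflags |= MOCK_HF_NMI_MASK;
    else
        vcpu->hflags &= ~MOCK_HF_NMI_MASK;
}
```

If neither flag is transferred, a destination VCPU starts with both cleared, which is how a pending NMI or an active NMI mask could get lost across save/restore or migration.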
[PATCH 3/4] KVM: x86: Add support for KVM_GET/SET_VCPU_STATE
Add support for getting/setting MSRs, CPUID tree, and the LAPIC via the new VCPU state interface. Also in this case we convert the existing IOCTLs to use the new infrastructure internally. The MSR interface has to be extended to pass back the number of processed MSRs via the header structure instead of the return code as the latter is not available with the new IOCTL. The semantics of the original KVM_GET/SET_MSRS are not affected by this change. Signed-off-by: Jan Kiszka --- arch/x86/include/asm/kvm.h |8 +- arch/x86/kvm/x86.c | 209 2 files changed, 138 insertions(+), 79 deletions(-) diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h index f02e87a..1b184c3 100644 --- a/arch/x86/include/asm/kvm.h +++ b/arch/x86/include/asm/kvm.h @@ -150,7 +150,7 @@ struct kvm_msr_entry { /* for KVM_GET_MSRS and KVM_SET_MSRS */ struct kvm_msrs { __u32 nmsrs; /* number of msrs in entries */ - __u32 pad; + __u32 nprocessed; /* return value: successfully processed entries */ struct kvm_msr_entry entries[0]; }; @@ -251,4 +251,10 @@ struct kvm_reinject_control { __u8 pit_reinject; __u8 reserved[31]; }; + +/* for KVM_GET/SET_VCPU_STATE */ +#define KVM_X86_VCPU_MSRS 1000 +#define KVM_X86_VCPU_CPUID 1001 +#define KVM_X86_VCPU_LAPIC 1002 + #endif /* _ASM_X86_KVM_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 839b1c5..733e2d3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1179,11 +1179,11 @@ static int __msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs *msrs, static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs, int (*do_msr)(struct kvm_vcpu *vcpu, unsigned index, u64 *data), - int writeback) + int writeback, int write_nprocessed) { struct kvm_msrs msrs; struct kvm_msr_entry *entries; - int r, n; + int r; unsigned size; r = -EFAULT; @@ -1204,15 +1204,22 @@ static int msr_io(struct kvm_vcpu *vcpu, struct kvm_msrs __user *user_msrs, if (copy_from_user(entries, user_msrs->entries, size)) goto out_free; - r = n = __msr_io(vcpu, 
&msrs, entries, do_msr); + r = __msr_io(vcpu, &msrs, entries, do_msr); if (r < 0) goto out_free; + msrs.nprocessed = r; + r = -EFAULT; + if (write_nprocessed && + copy_to_user(&user_msrs->nprocessed, &msrs.nprocessed, +sizeof(msrs.nprocessed))) + goto out_free; + if (writeback && copy_to_user(user_msrs->entries, entries, size)) goto out_free; - r = n; + r = msrs.nprocessed; out_free: vfree(entries); @@ -1785,55 +1792,36 @@ long kvm_arch_vcpu_ioctl(struct file *filp, { struct kvm_vcpu *vcpu = filp->private_data; void __user *argp = (void __user *)arg; + struct kvm_vcpu_substate substate; int r; - struct kvm_lapic_state *lapic = NULL; switch (ioctl) { - case KVM_GET_LAPIC: { - lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); - - r = -ENOMEM; - if (!lapic) - goto out; - r = kvm_vcpu_ioctl_get_lapic(vcpu, lapic); - if (r) - goto out; - r = -EFAULT; - if (copy_to_user(argp, lapic, sizeof(struct kvm_lapic_state))) - goto out; - r = 0; + case KVM_GET_LAPIC: + substate.type = KVM_X86_VCPU_LAPIC; + substate.offset = 0; + r = kvm_arch_vcpu_get_substate(vcpu, argp, &substate); break; - } - case KVM_SET_LAPIC: { - lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); - r = -ENOMEM; - if (!lapic) - goto out; - r = -EFAULT; - if (copy_from_user(lapic, argp, sizeof(struct kvm_lapic_state))) - goto out; - r = kvm_vcpu_ioctl_set_lapic(vcpu, lapic); - if (r) - goto out; - r = 0; + case KVM_SET_LAPIC: + substate.type = KVM_X86_VCPU_LAPIC; + substate.offset = 0; + r = kvm_arch_vcpu_set_substate(vcpu, argp, &substate); break; - } case KVM_INTERRUPT: { struct kvm_interrupt irq; r = -EFAULT; if (copy_from_user(&irq, argp, sizeof irq)) - goto out; + break; r = kvm_vcpu_ioctl_interrupt(vcpu, &irq); if (r) - goto out; + break; r = 0; break;
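The idea of reporting the number of processed MSRs through the header instead of the return code can be sketched in isolation as follows. This is a simplified userspace model, not the kernel code: struct msr_entry, struct msr_list (with a fixed four-entry array), msr_io_sketch() and read_known_msr() are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for kvm_msr_entry / kvm_msrs (assumption: a fixed
 * array replaces the flexible one to keep the sketch self-contained). */
struct msr_entry {
    uint32_t index;
    uint32_t reserved;
    uint64_t data;
};

struct msr_list {
    uint32_t nmsrs;       /* number of msrs in entries */
    uint32_t nprocessed;  /* return value: successfully processed entries */
    struct msr_entry entries[4];
};

typedef int (*msr_op)(uint32_t index, uint64_t *data);

/* Process entries in order and stop at the first failure. The count of
 * processed entries goes into the header, so the return code is free to
 * carry only success or failure of the call as a whole. */
static int msr_io_sketch(struct msr_list *msrs, msr_op op)
{
    uint32_t i;

    if (msrs->nmsrs > 4)
        return -1;

    for (i = 0; i < msrs->nmsrs; i++)
        if (op(msrs->entries[i].index, &msrs->entries[i].data))
            break;

    msrs->nprocessed = i;
    return 0;
}

/* Hypothetical accessor: only indices below 0x100 are "known". */
static int read_known_msr(uint32_t index, uint64_t *data)
{
    if (index >= 0x100)
        return 1;
    *data = (uint64_t)index * 2;
    return 0;
}
```

A caller would compare nprocessed against nmsrs after the call to detect a partially handled list.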
Re: Problem booting guest with Linux 2.6.3x
Hi Michael. On Saturday, 10 October 2009 20:10:16 +0400, Michael Tokarev wrote: > >>>But according to it seems, I could verify that the disks that are > >>>passed with -hdX in KVM-88 are mapped in 2.6.31.2 guests like > >>>SATA/SCSI devices. With Linux stock 2.6.26 these are mapped like > >>>IDE disks. Can it be due to some change in the kernel code related > >>>with KVM? > >>It has nothing to do with kvm. It's different kernel options, all > >>kernels since very early 2.6.x are able to see ide disks as hdX or > >>sdX, depending on the kernel options and modules loaded. There are > >>2 drivers for each IDE controller - IDE/ATA one, which creates hdX, > >>and PATA one which creates sdX. > >According to I was investigating, I have the impression that the > >newest kernels delegate this disks denomination to the use of libata. > >It would be that in 2.6.26 Debian stock kernel not yet was productive > >to be in experimental phase? > Debian "stock" kernel config does not enable ata devices, only ide ones. Apparently the Debian GNU/Linux stock kernels apply a patch [1] that enables libata only for SATA controllers. It surprises me that, with 2.6.31 being the latest stable branch from kernel.org, the Debian developers are still applying this patch. I had thought that libata was sufficiently stable by now. Thanks for your reply. Regards, Daniel [1] http://svn.debian.org/viewsvn/kernel/dists/trunk/linux-2.6/debian/patches/debian/drivers-ata-ata_piix-postpone-pata.patch?revision=13847&view=markup -- Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37 Powered by Debian GNU/Linux Squeeze - Linux user #188.598
[PATCH] device assignment rom fixups
Use new rom loading infrastructure. Devices can simply register option roms now. Signed-off-by: Gerd Hoffmann --- hw/device-assignment.c | 144 --- hw/device-assignment.h |1 - hw/pc.c|3 - 3 files changed, 61 insertions(+), 87 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 237060f..6f792db 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -37,6 +37,7 @@ #include "sysemu.h" #include "console.h" #include "device-assignment.h" +#include "loader.h" /* From linux/ioport.h */ #define IORESOURCE_IO 0x0100 /* Resource type */ @@ -56,6 +57,8 @@ #define DEBUG(fmt, ...) do { } while(0) #endif +static void assigned_dev_load_option_rom(AssignedDevice *dev); + static uint32_t guest_to_host_ioport(AssignedDevRegion *region, uint32_t addr) { return region->u.r_baseport + (addr - region->e_physbase); @@ -1168,6 +1171,7 @@ static int assigned_initfn(struct PCIDevice *pci_dev) if (assigned_dev_register_msix_mmio(dev)) goto assigned_out; +assigned_dev_load_option_rom(dev); return 0; assigned_out: @@ -1329,11 +1333,10 @@ struct option_rom_pci_header { * both 2KB and target page size. */ #define OPTION_ROM_ALIGN(x) (((x) + 2047) & ~2047) -static int scan_option_rom(uint8_t devfn, void *roms, ram_addr_t offset) +static void scan_option_rom(const char *name, uint8_t devfn, void *roms) { -int i, size, total_size; +int i, size; uint8_t csum; -ram_addr_t addr; struct option_rom_header *rom; struct option_rom_pci_header *pcih; @@ -1362,29 +1365,12 @@ static int scan_option_rom(uint8_t devfn, void *roms, ram_addr_t offset) rom = (struct option_rom_header *)((char *)rom + size); } - -return 0; +return; found: -/* The size should be both 2K-aligned and page-aligned */ -total_size = (TARGET_PAGE_SIZE < 2048) - ? 
OPTION_ROM_ALIGN(size + 1) - : TARGET_PAGE_ALIGN(size + 1); - -/* Size of all available ram space is 0x1 (0xd to 0xe) */ -if ((offset + total_size) > 0x1u) { -fprintf(stderr, "Option ROM size %x exceeds available space\n", size); -return 0; -} - -addr = qemu_ram_alloc(total_size); -cpu_register_physical_memory(0xd + offset, total_size, addr | IO_MEM_ROM); - -/* Write ROM data and devfn to phys_addr */ -cpu_physical_memory_write_rom(0xd + offset, (uint8_t *)rom, size); -cpu_physical_memory_write_rom(0xd + offset + size, &devfn, 1); - -return total_size; +rom_add_blob(name ? name : "assigned device", rom, size, + PC_ROM_MIN_OPTION, PC_ROM_MAX, PC_ROM_ALIGN); +return; } /* @@ -1392,75 +1378,67 @@ static int scan_option_rom(uint8_t devfn, void *roms, ram_addr_t offset) * load the corresponding ROM data to RAM. If an error occurs while loading an * option ROM, we just ignore that option ROM and continue with the next one. */ -ram_addr_t assigned_dev_load_option_roms(ram_addr_t rom_base_offset) +static void assigned_dev_load_option_rom(AssignedDevice *dev) { -ram_addr_t offset = rom_base_offset; -AssignedDevice *dev; - -QLIST_FOREACH(dev, &devs, next) { -int size, len; -void *buf; -FILE *fp; -uint8_t i = 1; -char rom_file[64]; +int size, len; +void *buf; +FILE *fp; +uint8_t i = 1; +char rom_file[64]; -snprintf(rom_file, sizeof(rom_file), - "/sys/bus/pci/devices/:%02x:%02x.%01x/rom", - dev->host.bus, dev->host.dev, dev->host.func); +snprintf(rom_file, sizeof(rom_file), + "/sys/bus/pci/devices/:%02x:%02x.%01x/rom", + dev->host.bus, dev->host.dev, dev->host.func); -if (access(rom_file, F_OK)) -continue; - -/* Write something to the ROM file to enable it */ -fp = fopen(rom_file, "wb"); -if (fp == NULL) -continue; -len = fwrite(&i, 1, 1, fp); -fclose(fp); -if (len != 1) -continue; - -/* The file has to be closed and reopened, otherwise it won't work */ -fp = fopen(rom_file, "rb"); -if (fp == NULL) -continue; +if (access(rom_file, F_OK)) +return; -fseek(fp, 0, SEEK_END); -size 
= ftell(fp); -fseek(fp, 0, SEEK_SET); +/* Write something to the ROM file to enable it */ +fp = fopen(rom_file, "wb"); +if (fp == NULL) +return; +len = fwrite(&i, 1, 1, fp); +fclose(fp); +if (len != 1) +return; -buf = malloc(size); -if (buf == NULL) { -fclose(fp); -continue; -} +/* The file has to be closed and reopened, otherwise it won't work */ +fp = fopen(rom_file, "rb"); +if (fp == NULL) +return; -fread(buf, size, 1, fp); -
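As background for the scan_option_rom() changes, here is a small sketch of the two checks such a scan typically relies on: rounding a size up with the OPTION_ROM_ALIGN macro from the patch, and validating an image by signature and checksum. The macro is copied from the patch; option_rom_image_ok() is a hypothetical helper, and the 0x55 0xAA signature, the length byte in 512-byte units and the zero-sum checksum are the conventional legacy option ROM layout, assumed here rather than taken from the excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* From the patch: round up to the next 2KB boundary. */
#define OPTION_ROM_ALIGN(x) (((x) + 2047) & ~2047)

/* Hypothetical helper: return 1 if the buffer starts with a plausible
 * option ROM image (signature, in-bounds length, bytes summing to zero). */
static int option_rom_image_ok(const uint8_t *rom, size_t len)
{
    size_t size, i;
    uint8_t csum = 0;

    if (len < 3 || rom[0] != 0x55 || rom[1] != 0xaa)
        return 0;

    size = (size_t)rom[2] * 512;   /* length byte counts 512-byte blocks */
    if (size == 0 || size > len)
        return 0;

    for (i = 0; i < size; i++)
        csum += rom[i];            /* wraps modulo 256 by design */

    return csum == 0;
}
```

For example, OPTION_ROM_ALIGN(1) and OPTION_ROM_ALIGN(2048) both yield 2048, while OPTION_ROM_ALIGN(2049) yields 4096.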
[PATCH] fix quoting in configure
Signed-off-by: Gerd Hoffmann --- configure |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/configure b/configure index 2341772..b0d5bd9 100755 --- a/configure +++ b/configure @@ -1414,7 +1414,7 @@ if test "$kvm_cap_pit" != "no" ; then #endif int main(void) { return 0; } EOF - if compile_prog $kvm_cflags ""; then + if compile_prog "$kvm_cflags" ""; then kvm_cap_pit=yes else if test "$kvm_cap_pit" = "yes" ; then @@ -1438,7 +1438,7 @@ if test "$kvm_cap_device_assignment" != "no" ; then #endif int main(void) { return 0; } EOF - if compile_prog $kvm_cflags "" ; then + if compile_prog "$kvm_cflags" "" ; then kvm_cap_device_assignment=yes else if test "$kvm_cap_device_assignment" = "yes" ; then -- 1.6.2.5
Re: kernel bug in kvm_intel
On Tue, Oct 13, 2009 at 08:50:07AM +0200, Avi Kivity wrote: > On 10/12/2009 08:42 PM, Andrew Theurer wrote: >> On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote: >> >>> On 10/09/2009 10:04 PM, Andrew Theurer wrote: >>> This is on latest master branch on kvm.git and qemu-kvm.git, running 12 Windows Server2008 VMs, and using oprofile. I ran again without oprofile and did not get the BUG. I am wondering if anyone else is seeing this. Thanks, -Andrew > Oct 9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel > paging request at 9fe9a2b4 > Oct 9 11:55:13 virtvictory-eth0 kernel: IP: [] > vmx_vcpu_run+0x26d/0x64f [kvm_intel] > >>> Can you run this through objdump or gdb to see what source this >>> corresponds to? >>> >>> >> Somewhere here I think (?) >> >> objdump -d >> > > > Look at the address where vmx_vcpu_run starts, add 0x26d, and show the > surrounding code. > > Thinking about it, it probably _is_ what you showed, due to module page > alignment. But please verify this; I can't reconcile the fault address > (9fe9a2b) with %rsp at the time of the fault. There are some scary errata in the Xeon X55xx spec update (such as AAK56, a corrupted RSP pushed on the stack on event injection, including the NMI that oprofile uses, right after VM exit). Andrew, you might want to make sure the firmware/BIOS is up to date on this machine before reproducing.
Re: kernel bug in kvm_intel
On Tue, 2009-10-13 at 08:50 +0200, Avi Kivity wrote: > On 10/12/2009 08:42 PM, Andrew Theurer wrote: > > On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote: > > > >> On 10/09/2009 10:04 PM, Andrew Theurer wrote: > >> > >>> This is on latest master branch on kvm.git and qemu-kvm.git, running > >>> 12 Windows Server2008 VMs, and using oprofile. I ran again without > >>> oprofile and did not get the BUG. I am wondering if anyone else is > >>> seeing this. > >>> > >>> Thanks, > >>> > >>> -Andrew > >>> > >>> > Oct 9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel > paging request at 9fe9a2b4 > Oct 9 11:55:13 virtvictory-eth0 kernel: IP: [] > vmx_vcpu_run+0x26d/0x64f [kvm_intel] > > >> Can you run this through objdump or gdb to see what source this > >> corresponds to? > >> > >> > > Somewhere here I think (?) > > > > objdump -d > > > > > Look at the address where vmx_vcpu_run starts, add 0x26d, and show the > surrounding code. > > Thinking about it, it probably _is_ what you showed, due to module page > alignment. But please verify this; I can't reconcile the fault address > (9fe9a2b) with %rsp at the time of the fault.
Here is the start of the function:
> 3884 :
> 3884: 55 push %rbp
> 3885: 48 89 e5 mov %rsp,%rbp
and 0x26d later is 0x3af1:
> 3ad2: 4c 8b b1 88 01 00 00 mov 0x188(%rcx),%r14
> 3ad9: 4c 8b b9 90 01 00 00 mov 0x190(%rcx),%r15
> 3ae0: 48 8b 89 20 01 00 00 mov 0x120(%rcx),%rcx
> 3ae7: 75 05 jne 3aee
> 3ae9: 0f 01 c2 vmlaunch
> 3aec: eb 03 jmp 3af1
> 3aee: 0f 01 c3 vmresume
> 3af1: 48 87 0c 24 xchg %rcx,(%rsp)
> 3af5: 48 89 81 18 01 00 00 mov %rax,0x118(%rcx)
> 3afc: 48 89 99 30 01 00 00 mov %rbx,0x130(%rcx)
> 3b03: ff 34 24 pushq (%rsp)
> 3b06: 8f 81 20 01 00 00 popq 0x120(%rcx)
-Andrew
[PATCH] v2: allow userspace to adjust kvmclock offset
When we migrate a kvm guest that uses pvclock between two hosts, we may suffer a large skew. This is because there can be significant differences between the monotonic clock of the hosts involved. When a new host with a much larger monotonic time starts running the guest, the view of time will be significantly impacted. Situation is much worse when we do the opposite, and migrate to a host with a smaller monotonic clock. This new proposed ioctl will allow userspace to inform us what is the monotonic clock value in the source host, so we can keep the time skew short, and more importantly, never goes backwards. [ v2: uses a struct with a padding ] Signed-off-by: Glauber Costa --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/x86.c | 20 +++- include/linux/kvm.h |6 ++ 3 files changed, 26 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 179a919..c9b0d9f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -410,6 +410,7 @@ struct kvm_arch{ unsigned long irq_sources_bitmap; u64 vm_init_tsc; + s64 kvmclock_offset; }; struct kvm_vm_stat { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 9601bc6..1b6c193 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -699,7 +699,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v) /* With all the info we got, fill in the values */ vcpu->hv_clock.system_time = ts.tv_nsec + -(NSEC_PER_SEC * (u64)ts.tv_sec); +(NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset; + /* * The interface expects us to write an even number signaling that the * update is finished. 
Since the guest won't see the intermediate @@ -2441,6 +2442,23 @@ long kvm_arch_vm_ioctl(struct file *filp, r = 0; break; } + case KVM_ADJUST_CLOCK: { + struct timespec now; + struct kvm_adjust_clock user_ns; + u64 now_ns; + long delta; + + r = -EFAULT; + if (copy_from_user(&user_ns, argp, sizeof(user_ns))) + goto out; + + r = 0; + ktime_get_ts(&now); + now_ns = timespec_to_ns(&now); + delta = user_ns.clock - now_ns; + kvm->arch.kvmclock_offset = delta; + break; + } default: ; } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index f8f8900..c07fc23 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -497,6 +497,11 @@ struct kvm_irqfd { __u8 pad[20]; }; +struct kvm_adjust_clock { + __u64 clock; + __u64 pad[2]; +}; + /* * ioctls for VM fds */ @@ -546,6 +551,7 @@ struct kvm_irqfd { #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config) #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78) #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd) +#define KVM_ADJUST_CLOCK _IOW(KVMIO, 0x7a, struct kvm_adjust_clock) /* * ioctls for vcpu fds -- 1.6.2.2
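The arithmetic behind the proposed ioctl can be modelled in a few lines. The sketch below is a userspace simulation under stated assumptions, not the kernel code: struct mock_vm, adjust_clock() and guest_system_time() are hypothetical names, host monotonic time is passed in as a plain nanosecond value, and the delta is kept in a 64-bit signed integer to match the s64 kvmclock_offset field (a plain long would be narrower than that on 32-bit hosts).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VM state touched by the patch. */
struct mock_vm {
    int64_t kvmclock_offset;   /* nanoseconds, may be negative */
};

/* Model of the KVM_ADJUST_CLOCK handler: remember how far the source
 * host's clock is from this host's monotonic clock right now. */
static void adjust_clock(struct mock_vm *vm, uint64_t source_clock_ns,
                         uint64_t host_now_ns)
{
    vm->kvmclock_offset = (int64_t)source_clock_ns - (int64_t)host_now_ns;
}

/* Model of the system_time computation: host monotonic time plus offset. */
static uint64_t guest_system_time(const struct mock_vm *vm,
                                  uint64_t host_now_ns)
{
    return (uint64_t)((int64_t)host_now_ns + vm->kvmclock_offset);
}
```

In this model a guest that last saw 10000 ns on the source and lands on a host whose monotonic clock reads 3000 ns gets an offset of +7000 ns, so its clock continues from 10000 ns instead of jumping backwards; migrating in the other direction yields a negative offset.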
Re: [Autotest] [PATCH] Little bug fix in pci_hotplug.py
Applied, thanks!

On Tue, Oct 13, 2009 at 6:13 AM, Yolkfull Chow wrote:
> If command executed timeout, the return value of status could be None,
> which is missed in judge statement:
>
>    if s:
>        ...
>
> Thanks Jason Wang for pointing this out.
>
> Signed-off-by: Yolkfull Chow
> ---
>  client/tests/kvm/tests/pci_hotplug.py |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/client/tests/kvm/tests/pci_hotplug.py b/client/tests/kvm/tests/pci_hotplug.py
> index 01d9447..3ad9ea2 100644
> --- a/client/tests/kvm/tests/pci_hotplug.py
> +++ b/client/tests/kvm/tests/pci_hotplug.py
> @@ -83,7 +83,7 @@ def run_pci_hotplug(test, params, env):
>
>      # Test the newly added device
>      s, o = session.get_command_status_output(params.get("pci_test_cmd"))
> -    if s:
> +    if s != 0:
>          raise error.TestFail("Check for %s device failed after PCI hotplug. "
>                               "Output: %s" % (test_type, o))
>
> --
> 1.6.2.5

--
Lucas
Re: [PATCH] allow userspace to adjust kvmclock offset
On Tue, Oct 13, 2009 at 03:31:08PM +0300, Avi Kivity wrote:
> On 10/13/2009 03:28 PM, Glauber Costa wrote:
>>
>>> Do we want an absolute or relative adjustment?
>>>
>> What exactly do you mean?
>>
>
> Absolute adjustment: clock = t
> Relative adjustment: clock += t

The delta is absolute, but the adjustment to the clock is relative. We take
the difference between what userspace passes us and what we currently have,
then add that difference to the clock, so we can make sure we won't go
backwards or suffer too big a skew.
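The arithmetic being discussed can be modelled in a few lines. This is only a sketch of the calculation in the posted patch (`delta = user_ns.clock - now_ns`, later added to the guest's system time); the function names and the uptime figures are made up for illustration and nothing here calls into KVM.

```python
NSEC_PER_SEC = 10**9


def compute_offset(source_clock_ns, dest_now_ns):
    # Same arithmetic as `delta = user_ns.clock - now_ns` in the patch.
    return source_clock_ns - dest_now_ns


def guest_time(dest_now_ns, offset_ns):
    # Same shape as: system_time = <host monotonic ns> + kvmclock_offset.
    return dest_now_ns + offset_ns


# Made-up values: the source host has been up 10 days, the destination 1 hour.
source_clock = 10 * 24 * 3600 * NSEC_PER_SEC
dest_now = 3600 * NSEC_PER_SEC

offset = compute_offset(source_clock, dest_now)

# Right after the ioctl the guest continues from the source host's value.
at_migration = guest_time(dest_now, offset)

# Later readings advance with the destination's monotonic clock.
five_seconds_later = guest_time(dest_now + 5 * NSEC_PER_SEC, offset)

print(at_migration == source_clock)
print(five_seconds_later - at_migration)
```

Migrating in the other direction simply yields a negative offset, which is why the patch stores it in a signed field.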
Re: [PATCH] allow userspace to adjust kvmclock offset
On 10/13/2009 03:28 PM, Glauber Costa wrote:
>
>> Do we want an absolute or relative adjustment?
>>
> What exactly do you mean?
>

Absolute adjustment: clock = t
Relative adjustment: clock += t

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [PATCH] fix msr list
On Mon, Oct 12, 2009 at 02:50:27PM -0300, Marcelo Tosatti wrote:
> On Tue, Oct 06, 2009 at 01:24:50PM -0400, Glauber Costa wrote:
> > For a while now, we are issuing a rdmsr instruction to find out which msrs in our
> > save list are really supported by the underlying machine. However, it fails to account
> > for kvm-specific msrs, such as the pvclock ones.
> >
> > This patch moves them to the beginning of the list, and skips testing them.
> >
> > Signed-off-by: Glauber Costa
>
> Applied, thanks.
>
> But this alone won't fix migration to include the two pvclock MSRs
> right?

Yes, exactly. It will just make userspace see the correct list, but we'd
still have to take action based on it.
Re: [PATCH] allow userspace to adjust kvmclock offset
On Mon, Oct 12, 2009 at 10:53:26AM +0200, Avi Kivity wrote:
> On 10/06/2009 07:24 PM, Glauber Costa wrote:
>> When we migrate a kvm guest that uses pvclock between two hosts, we may
>> suffer a large skew. This is because there can be significant differences
>> between the monotonic clock of the hosts involved. When a new host with
>> a much larger monotonic time starts running the guest, the view of time
>> will be significantly impacted.
>>
>> Situation is much worse when we do the opposite, and migrate to a host with
>> a smaller monotonic clock.
>>
>> This new proposed ioctl will allow userspace to inform us what is the monotonic
>> clock value in the source host, so we can keep the time skew short, and more
>> importantly, never goes backwards.
>>
>>
>
>> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
>> index f8f8900..0cd5ad8 100644
>> --- a/include/linux/kvm.h
>> +++ b/include/linux/kvm.h
>> @@ -546,6 +546,7 @@ struct kvm_irqfd {
>>  #define KVM_CREATE_PIT2 _IOW(KVMIO, 0x77, struct kvm_pit_config)
>>  #define KVM_SET_BOOT_CPU_ID _IO(KVMIO, 0x78)
>>  #define KVM_IOEVENTFD _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
>> +#define KVM_ADJUST_CLOCK _IOW(KVMIO, 0x7a, __u64)
>>
>
> Please change to a struct with some reserved space.

Ok, can do it.

>
> Do we want an absolute or relative adjustment?

What exactly do you mean?
[PATCH 2/2] Don't sync mpstate to/from kernel when unneeded.
mp_state, unlike other cpu state, can be changed not only from the vcpu
context it belongs to, but by other vcpus too. That makes loading it from
the kernel and saving it back unsafe if the mp_state value is changed inside
the kernel between load and save. For example, vcpu 1 loads mp_state into
user-space and the state is RUNNING; vcpu 0 sends INIT/SIPI to vcpu 1, so the
in-kernel mp_state becomes SIPI; vcpu 1 saves the user-space copy into the
kernel and calls vcpu_run(). The SIPI state is lost.

The patch copies mp_state into the kernel only when it is known that the
in-kernel value is outdated. This happens on reset and vmload.

Signed-off-by: Gleb Natapov
---
 hw/apic.c             |    1 +
 monitor.c             |    2 ++
 qemu-kvm.c            |    9 ++++-----
 qemu-kvm.h            |    1 -
 target-i386/machine.c |    3 +++
 5 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 2952675..729 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -512,6 +512,7 @@ void apic_init_reset(CPUState *env)
     if (kvm_enabled() && qemu_kvm_irqchip_in_kernel()) {
         env->mp_state
             = env->halted ? KVM_MP_STATE_UNINITIALIZED : KVM_MP_STATE_RUNNABLE;
+        kvm_load_mpstate(env);
     }
 #endif
 }
diff --git a/monitor.c b/monitor.c
index 7f0f5a9..dd8f2ca 100644
--- a/monitor.c
+++ b/monitor.c
@@ -350,6 +350,7 @@ static CPUState *mon_get_cpu(void)
         mon_set_cpu(0);
     }
     cpu_synchronize_state(cur_mon->mon_cpu);
+    kvm_save_mpstate(cur_mon->mon_cpu);
     return cur_mon->mon_cpu;
 }

@@ -377,6 +378,7 @@ static void do_info_cpus(Monitor *mon)

     for(env = first_cpu; env != NULL; env = env->next_cpu) {
         cpu_synchronize_state(env);
+        kvm_save_mpstate(env);
         monitor_printf(mon, "%c CPU #%d:",
                        (env == mon->mon_cpu) ? '*' : ' ',
                        env->cpu_index);
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 3765818..2a1e0ff 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1609,11 +1609,6 @@ static void on_vcpu(CPUState *env, void (*func)(void *data), void *data)
 void kvm_arch_get_registers(CPUState *env)
 {
     kvm_arch_save_regs(env);
-    kvm_arch_save_mpstate(env);
-#ifdef KVM_CAP_MP_STATE
-    if (kvm_irqchip_in_kernel(kvm_context))
-        env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
-#endif
 }

 static void do_kvm_cpu_synchronize_state(void *_env)
@@ -1707,6 +1702,10 @@ static void kvm_do_save_mpstate(void *_env)
     CPUState *env = _env;

     kvm_arch_save_mpstate(env);
+#ifdef KVM_CAP_MP_STATE
+    if (kvm_irqchip_in_kernel(kvm_context))
+        env->halted = (env->mp_state == KVM_MP_STATE_HALTED);
+#endif
 }

 void kvm_save_mpstate(CPUState *env)
diff --git a/qemu-kvm.h b/qemu-kvm.h
index d6748c7..e2a87b8 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -1186,7 +1186,6 @@ void kvm_arch_get_registers(CPUState *env);
 static inline void kvm_arch_put_registers(CPUState *env)
 {
     kvm_load_registers(env);
-    kvm_load_mpstate(env);
 }

 void kvm_cpu_synchronize_state(CPUState *env);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index e640dad..16d9c57 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -324,6 +324,7 @@ static void cpu_pre_save(void *opaque)
     int i, bit;

     cpu_synchronize_state(env);
+    kvm_save_mpstate(env);

     /* FPU */
     env->fpus_vmstate = (env->fpus & ~0x3800) | (env->fpstt & 0x7) << 11;
@@ -385,6 +386,8 @@ static int cpu_post_load(void *opaque, int version_id)
     }

     tlb_flush(env, 1);
+    kvm_load_mpstate(env);
+
     return 0;
 }
--
1.6.3.3
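The lost update described in the commit message can be replayed step by step. The sketch below is a deliberately simplified, single-threaded model: the variable names and state strings are illustrative only, and the `userspace_changed_state` flag is a hypothetical stand-in for "reset or vmload happened", not something taken from the qemu-kvm sources.

```python
def unconditional_sync():
    """Old behaviour: always write the userspace copy back before running."""
    kernel = "RUNNING"
    user_copy = kernel      # vcpu 1 loads mp_state into userspace
    kernel = "SIPI"         # vcpu 0 sends INIT/SIPI to vcpu 1
    kernel = user_copy      # vcpu 1 saves its stale copy back
    return kernel


def sync_only_when_outdated(userspace_changed_state):
    """Patched behaviour: write back only if userspace really changed it."""
    kernel = "RUNNING"
    user_copy = kernel
    kernel = "SIPI"
    if userspace_changed_state:  # assumed to be true only on reset/vmload
        kernel = user_copy
    return kernel


print(unconditional_sync())
print(sync_only_when_outdated(userspace_changed_state=False))
```

In the first function the SIPI is overwritten by the stale RUNNING value; in the second it survives because nothing is written back.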
[PATCH 1/2] Complete cpu initialization before signaling main thread.
Otherwise some cpus may start executing code before others
are fully initialized.

Signed-off-by: Gleb Natapov
---
 qemu-kvm.c |   26 ++++++++++++--------------
 1 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 62ca050..3765818 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1954,18 +1954,6 @@ static void process_irqchip_events(CPUState *env)

 static int kvm_main_loop_cpu(CPUState *env)
 {
-    setup_kernel_sigmask(env);
-
-    pthread_mutex_lock(&qemu_mutex);
-
-    kvm_arch_init_vcpu(env);
-#ifdef TARGET_I386
-    kvm_tpr_vcpu_start(env);
-#endif
-
-    cpu_single_env = env;
-    kvm_arch_load_regs(env);
-
     while (1) {
         int run_cpu = !is_cpu_stopped(env);
         if (run_cpu && !kvm_irqchip_in_kernel(kvm_context)) {
@@ -2003,15 +1991,25 @@ static void *ap_main_loop(void *_env)
     on_vcpu(env, kvm_arch_do_ioperm, data);
 #endif

-    /* signal VCPU creation */
+    setup_kernel_sigmask(env);
+
     pthread_mutex_lock(&qemu_mutex);
+    cpu_single_env = env;
+
+    kvm_arch_init_vcpu(env);
+#ifdef TARGET_I386
+    kvm_tpr_vcpu_start(env);
+#endif
+
+    kvm_arch_load_regs(env);
+
+    /* signal VCPU creation */
     current_env->created = 1;
     pthread_cond_signal(&qemu_vcpu_cond);

     /* and wait for machine initialization */
     while (!qemu_system_ready)
         qemu_cond_wait(&qemu_system_cond);
-    pthread_mutex_unlock(&qemu_mutex);

     kvm_main_loop_cpu(env);
     return NULL;
--
1.6.3.3
Re: [KVM-AUTOTEST,01/17] Add new module kvm_subprocess
- "Chen Cao" wrote:

> On Mon, Oct 12, 2009 at 09:07:45AM -0400, Michael Goldish wrote:
> > You're right, currently the sessions must be closed explicitly.
> > This is due to the fact that both qemu and ssh/telnet are handled by the
> > same code, and qemu has to keep running in the background if we want to
> > pass it from one test to another.
> >
> > To deal with this, first, we should use try..finally blocks to close all
> > sessions in tests. As far as I know all existing tests (or at least most
> > of them) already do this.
> >
> > Second, we can add a destructor function to kvm_shell_session that will
> > close the session automatically when it's no longer referenced. At the
> > moment this won't work because there's a thread running in the background
> > tracking output from the session, but this thread is usually not needed
> > for ssh/telnet (it's needed mainly for qemu), so we can selectively get
> > rid of it and allow the reference count to drop to zero when the test exits,
> > thus allowing the destructor to be called.
> >
> > I'll think of a way to do the second thing, and if it works, maybe we won't
> > need the first. But for now every test should close its sessions explicitly.
> >
> > BTW, I'm not sure I understand why cleaning up the sessions should be
> > exhausting in the case you presented. You can just wrap everything in one
> > big try..finally block:
> >
> > session = ...
> >
> > try:
> >     try:
> >     except:
> >     try:
> >     except:
> >     ...
> > finally:
> >     session.close()
> >
>
> Thanks for your explanation.
>
> It is just boring and error-prone to add lots of
> '(dst|src|tmp|etc)*session.close()' to our code (the internal version)
> in different files and a big number of functions. And some of the
> 'sessions' are in try...except blocks, and some are not.
>
> We have to keep track of where we started the sessions and close all of
> them when they are not needed any longer. I feel it's a little weird
> that we have to do the garbage-collection-like work while using
> python.
>
> So, since this is a known issue, or more precisely, a limitation of 'ease of
> use', I'm looking forward to your improvement, and I think we will
> also try to work it out at the same time.
>
> Thanks again for your help.
>
>
> Cao, Chen
> 2009-10-13

OK, agreed. I posted 3 patches that should fix this but I've only given
them minimal testing. Let me know what you think.

Thanks,
Michael

> > - Original Message -
> > From: "Chen Cao"
> > To: "Michael Goldish"
> > Cc: autot...@test.kernel.org, kvm@vger.kernel.org
> > Sent: Monday, October 12, 2009 8:55:59 AM (GMT+0200) Auto-Detected
> > Subject: Re: [KVM-AUTOTEST,01/17] Add new module kvm_subprocess
> >
> >
> > Hi, Michael,
> >
> > I found that if the sessions initialized using kvm_subprocess are not closed,
> > the processes will never exit, and /tmp/kvm_spawn will be filled with the
> > temporary files.
> >
> > And we can find in the code,
> > # kvm_subprocess.py
> > ...
> >     # Read from child and write to files/pipes
> >     while True:
> >         check_termination = False
> >         # Make a list of reader pipes whose buffers are not empty
> >         fds = [fd for (i, fd) in enumerate(reader_fds) if buffers[i]]
> >         # Wait until there's something to do
> >         r, w, x = select.select([shell_fd, inpipe_fd], fds, [], 0.5)
> >         # If a reader pipe is ready for writing --
> >         for (i, fd) in enumerate(reader_fds):
> >             if fd in w:
> >                 bytes_written = os.write(fd, buffers[i])
> >                 buffers[i] = buffers[i][bytes_written:]
> >         # If there's data to read from the child process --
> >         if shell_fd in r:
> >             try:
> >                 data = os.read(shell_fd, 16384)
> >             except OSError:
> >                 data = ""
> >             if not data:
> >                 check_termination = True
> >             # Remove carriage returns from the data -- they often cause
> >             # trouble and are normally not needed
> >             data = data.replace("\r", "")
> >             output_file.write(data)
> >             output_file.flush()
> >             for i in range(len(readers)):
> >                 buffers[i] += data
> >         # If os.read() raised an exception or there was nothing to read --
> >         if check_termination or shell_fd not in r:
> >             pid, status = os.waitpid(shell_pid, os.WNOHANG)
> >             if pid:
> >                 status = os.WEXITSTATUS(status)
> >                 break
> >         # If there's data to read from the client --
> >         if inpipe_fd in r:
> >             data = os.read(inpipe_fd, 1024)
> >             os.write(shell_fd, data)
> > ...
> >
> > that if session.close() is no
[KVM-AUTOTEST PATCH 3/3] KVM test: kvm_subprocess.py: automatically close unreferenced shell sessions
Note that if a session has a tracking thread (i.e. if output_func or
termination_func are set to something other than None) then the session
will not be garbage collected (it must be closed explicitly by the test).

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm_subprocess.py |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_subprocess.py b/client/tests/kvm/kvm_subprocess.py
index a625315..859aa2b 100755
--- a/client/tests/kvm/kvm_subprocess.py
+++ b/client/tests/kvm/kvm_subprocess.py
@@ -1010,6 +1010,10 @@ class kvm_shell_session(kvm_expect):
                             self.status_test_command)


+    def __del__(self):
+        self.close()
+
+
     def set_prompt(self, prompt):
         """
         Set the prompt attribute for later use by read_up_to_prompt.
--
1.5.4.1
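A rough illustration of what the destructor buys a test that forgets to close its session. `ShellSession` is a hypothetical stand-in, not the real kvm_shell_session, and the behaviour shown relies on CPython's reference counting running `__del__` as soon as the last reference goes away; other Python implementations may run it later.

```python
closed = []  # records which sessions were closed, for demonstration only


class ShellSession:
    """Hypothetical, minimal stand-in for kvm_shell_session."""

    def __init__(self, name):
        self.name = name
        self._open = True

    def close(self):
        # Idempotent, so an explicit close() followed by __del__ is harmless.
        if self._open:
            self._open = False
            closed.append(self.name)

    def __del__(self):
        self.close()


def run_test():
    session = ShellSession("ssh-1")
    # ... the test uses the session and returns without calling close() ...
    return session.name


run_test()
print(closed)
```

Keeping `close()` idempotent matters here because tests that already close their sessions explicitly will trigger it a second time from `__del__`.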
[KVM-AUTOTEST PATCH 2/3] KVM test: kvm_subprocess.py: use only unbound methods as close() hooks
close() will pass 'self' as a parameter to the hook functions, i.e. it will
call hook(self) instead of just hook(), thus allowing the use of unbound
methods rather than bound ones. This allows us to avoid self referencing:
if a bound method is used, a reference to it is kept in the class instance,
and if the method is bound to the same instance then we have a
self-reference that prevents garbage collection.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm_subprocess.py |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_subprocess.py b/client/tests/kvm/kvm_subprocess.py
index ede8081..a625315 100755
--- a/client/tests/kvm/kvm_subprocess.py
+++ b/client/tests/kvm/kvm_subprocess.py
@@ -490,7 +490,7 @@ class kvm_spawn:
         _wait(self.lock_server_running_filename)
         # Call all cleanup routines
         for hook in self.close_hooks:
-            hook()
+            hook(self)
         # Close reader file descriptors
         for fd in self.reader_fds.values():
             try:
@@ -583,7 +583,7 @@ class kvm_tail(kvm_spawn):
         """
         # Add a reader and a close hook
         self._add_reader("tail")
-        self._add_close_hook(self._join_thread)
+        self._add_close_hook(kvm_tail._join_thread)

         # Init the superclass
         kvm_spawn.__init__(self, command, id, echo, linesep)
--
1.5.4.1
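The self-reference described above can be observed directly with a weak reference. This is a sketch under the assumption of CPython reference counting; `Session` and `_cleanup` are hypothetical names, not the real kvm_spawn code. The cyclic garbage collector is switched off during the check so that only reference counting is at work.

```python
import gc
import weakref


class Session:
    """Hypothetical stand-in for kvm_spawn with a list of close hooks."""

    def __init__(self, use_bound_hook):
        self.close_hooks = []
        if use_bound_hook:
            # Bound method: the hook holds a reference to self, and self
            # holds the hook, which forms a reference cycle.
            self.close_hooks.append(self._cleanup)
        else:
            # Function looked up on the class: no reference to the instance.
            self.close_hooks.append(Session._cleanup)

    def _cleanup(self):
        pass


def is_freed_by_refcount(use_bound_hook):
    """Return True if dropping the last name frees the object immediately."""
    gc.disable()  # rely on reference counting only
    try:
        obj = Session(use_bound_hook)
        ref = weakref.ref(obj)
        del obj
        return ref() is None
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle left behind by the bound-hook case


print(is_freed_by_refcount(use_bound_hook=False))
print(is_freed_by_refcount(use_bound_hook=True))
```

With the class-level function the object goes away as soon as the last name is dropped, which is what allows a destructor to run when the test exits; with the bound method it lingers until a cycle collection, if one happens at all.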
[KVM-AUTOTEST PATCH 1/3] KVM test: kvm_subprocess.py: do not start tail thread by default
Start the tail thread only if the user specifies a non-None output_func or
termination_func.

Signed-off-by: Michael Goldish
---
 client/tests/kvm/kvm_subprocess.py |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_subprocess.py b/client/tests/kvm/kvm_subprocess.py
index 2ac062a..ede8081 100755
--- a/client/tests/kvm/kvm_subprocess.py
+++ b/client/tests/kvm/kvm_subprocess.py
@@ -596,9 +596,10 @@ class kvm_tail(kvm_spawn):
         self.output_prefix = output_prefix

         # Start the thread in the background
+        self.tail_thread = None
         self.__thread_kill_requested = False
-        self.tail_thread = threading.Thread(None, self._tail)
-        self.tail_thread.start()
+        if termination_func or output_func:
+            self._start_thread()


     def __getinitargs__(self):
@@ -617,6 +618,8 @@ class kvm_tail(kvm_spawn):
                 Must take a single parameter -- the exit status.
         """
         self.termination_func = termination_func
+        if termination_func and not self.tail_thread:
+            self._start_thread()


     def set_termination_params(self, termination_params):
@@ -637,6 +640,8 @@ class kvm_tail(kvm_spawn):
                 output from the process.  Must take a single string parameter.
         """
         self.output_func = output_func
+        if output_func and not self.tail_thread:
+            self._start_thread()


     def set_output_params(self, output_params):
@@ -726,6 +731,11 @@ class kvm_tail(kvm_spawn):
                 pass


+    def _start_thread(self):
+        self.tail_thread = threading.Thread(None, self._tail)
+        self.tail_thread.start()
+
+
     def _join_thread(self):
         # Wait for the tail thread to exit
         if self.tail_thread:
--
1.5.4.1
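The lazy-start pattern in this patch, reduced to a standalone sketch. `Tail` is a hypothetical, much simplified stand-in for kvm_tail: the placeholder `_tail` loop only waits for a stop event instead of reading process output, and `close()` is an assumed helper for shutting the thread down.

```python
import threading


class Tail:
    """Hypothetical, simplified stand-in for kvm_tail."""

    def __init__(self, output_func=None, termination_func=None):
        self.output_func = output_func
        self.termination_func = termination_func
        self.tail_thread = None
        self._stop = threading.Event()
        # Only pay for a background thread when somebody wants callbacks.
        if output_func or termination_func:
            self._start_thread()

    def set_output_func(self, output_func):
        self.output_func = output_func
        if output_func and not self.tail_thread:
            self._start_thread()

    def _start_thread(self):
        self.tail_thread = threading.Thread(target=self._tail)
        self.tail_thread.start()

    def _tail(self):
        # Placeholder: the real thread would follow the process output.
        self._stop.wait()

    def close(self):
        self._stop.set()
        if self.tail_thread:
            self.tail_thread.join()


plain = Tail()
print(plain.tail_thread)            # no callbacks, so no thread was started

plain.set_output_func(print)
print(plain.tail_thread.is_alive())  # thread started on demand
plain.close()
```

A session created without callbacks never owns a thread, so nothing keeps it referenced in the background.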
Re: [PATCH 1/1] virtio: adding __devexit to virtballoon_remove
Thanks, I already have this from Uwe.

Cheers,
Rusty.
[PATCH] Little bug fix in pci_hotplug.py
If the command times out, the returned status can be None, which the
following check misses:

    if s:
        ...

Thanks to Jason Wang for pointing this out.

Signed-off-by: Yolkfull Chow
---
 client/tests/kvm/tests/pci_hotplug.py |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/tests/pci_hotplug.py b/client/tests/kvm/tests/pci_hotplug.py
index 01d9447..3ad9ea2 100644
--- a/client/tests/kvm/tests/pci_hotplug.py
+++ b/client/tests/kvm/tests/pci_hotplug.py
@@ -83,7 +83,7 @@ def run_pci_hotplug(test, params, env):

     # Test the newly added device
     s, o = session.get_command_status_output(params.get("pci_test_cmd"))
-    if s:
+    if s != 0:
         raise error.TestFail("Check for %s device failed after PCI hotplug. "
                              "Output: %s" % (test_type, o))

--
1.6.2.5
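Why the one-character change matters can be shown in isolation. The two helpers below are hypothetical and only mirror the old and new conditions from the patch; they assume, as the commit message states, that a timed-out command yields a status of None.

```python
def failed_old(status):
    # Old condition: `if s:` treats None like 0, i.e. like success.
    return bool(status)


def failed_new(status):
    # New condition: `if s != 0:` treats anything but 0 as a failure.
    return status != 0


for status in (0, 1, None):
    print(status, failed_old(status), failed_new(status))
```

Only the None row differs between the two checks: the old one lets a timeout pass silently, the new one reports it as a failure.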
Re: QemuOpts changes breaks multiple nic options
Hi Tom,

On Mon, 2009-10-12 at 17:05 -0500, Tom Lendacky wrote:
> The recent change to QemuOpts for the -net nic option breaks specifying -net
> nic,... more than once. The net_init_nic function's return value in net.c is
> a table index, which is non-zero after the first time it is called. The
> qemu_opts_foreach function in qemu-option.c receives the non-zero return value
> and stops processing further -net options (like associated -net tap options).
>
> It looks like the usb net function makes use of the index value, so the fix
> might best be to have qemu_opts_foreach check for a return code < 0 as being
> an error?

Thanks for the report; I sent a patch to qemu-devel yesterday:

  http://lists.gnu.org/archive/html/qemu-devel/2009-10/msg01070.html

Cheers,
Mark.
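The failure mode Tom describes can be modelled without any QEMU code. The sketch below is an assumption-laden toy: `opts_foreach` and `make_init` are hypothetical names, the option strings are invented, and the "rc < 0" variant follows Tom's suggestion rather than the patch Mark links to, whose contents are not shown here.

```python
def opts_foreach(opts, func, is_error):
    """Call func on each option; stop at the first result flagged as error."""
    processed = []
    for opt in opts:
        rc = func(opt)
        processed.append(opt)
        if is_error(rc):
            break
    return processed


def make_init():
    """Build an init callback that returns a table index for nic options."""
    nics = []

    def init(opt):
        if opt.startswith("nic"):
            nics.append(opt)
            return len(nics) - 1  # 0 for the first nic, 1 for the second, ...
        return 0

    return init


opts = ["nic,model=a", "tap,id=0", "nic,model=b", "tap,id=1"]

# Any non-zero return stops the walk: the second nic's index (1) looks
# like an error, so the tap option after it is never processed.
stops_on_nonzero = opts_foreach(opts, make_init(), lambda rc: rc != 0)

# Only negative returns are errors: every option is processed.
stops_on_negative = opts_foreach(opts, make_init(), lambda rc: rc < 0)

print(stops_on_nonzero)
print(stops_on_negative)
```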
RE: Biweekly KVM Test report, kernel 94252... qemu 5cc3c...
On Monday, October 05, 2009 7:00 PM Avi Kivity wrote:
> On 09/29/2009 05:34 AM, Xu, Jiajun wrote:
>> Hi All,
>>
>> This Weekly KVM Testing Report against lastest kvm.git
>> 94252a58662dc4ca6191eac479efb40e0716865c and qemu-kvm.git
>> 5cc3cfb6c2254483ae324da407a13307fe7355f3.
>>
>> Qemu-kvm tree build issue is fixed by qemu commit 781774b38c90797add71d029b7fbee43200c66d4.
>> There is no other new bug found in this two weeks. There are 7 old bugs open in bug tracking.
>>
>>
>> Seven Old Issues:
>>
>> 1. Guest hang with exhausted IRQ sources error if 8 VFs assigned
>> https://sourceforge.net/tracker/?func=detail&aid=2847560&group_id=180599&atid=893831
>>
>
> Does the attached patch fix this issue?

With the attached patch, the VFs cannot be enabled, and the following errors appear:

igb :01:00.0: can't find IRQ for PCI INT A; probably buggy MP table
igb :01:00.0: setting latency timer to 64
igb :01:00.0: irq 88 for MSI/MSI-X
igb :01:00.0: irq 89 for MSI/MSI-X
igb :01:00.0: irq 90 for MSI/MSI-X
igb :01:00.0: irq 91 for MSI/MSI-X
igb :01:00.0: irq 92 for MSI/MSI-X
igb :01:00.0: irq 93 for MSI/MSI-X
igb :01:00.0: irq 94 for MSI/MSI-X
igb :01:00.0: irq 95 for MSI/MSI-X
igb :01:00.0: irq 96 for MSI/MSI-X
igb :01:00.0: Intel(R) Gigabit Ethernet Network Connection
igb :01:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 00:30:48:cb:79:e8
igb :01:00.0: eth0: PBA No: 0010ff-0ff
igb :01:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
igb :01:00.1: can't find IRQ for PCI INT B; probably buggy MP table
igb :01:00.1: setting latency timer to 64
igb :01:00.1: irq 97 for MSI/MSI-X
igb :01:00.1: irq 98 for MSI/MSI-X
igb :01:00.1: irq 99 for MSI/MSI-X
igb :01:00.1: irq 100 for MSI/MSI-X
igb :01:00.1: irq 101 for MSI/MSI-X
igb :01:00.1: irq 102 for MSI/MSI-X
igb :01:00.1: irq 103 for MSI/MSI-X
igb :01:00.1: irq 104 for MSI/MSI-X
igb :01:00.1: irq 105 for MSI/MSI-X
igb :01:00.1: Intel(R) Gigabit Ethernet Network Connection
igb :01:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 00:30:48:cb:79:e9
igb :01:00.1: eth1: PBA No: 0010ff-0ff
igb :01:00.1: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)

Best Regards
Jiajun
Re: kernel bug in kvm_intel
On 10/12/2009 08:42 PM, Andrew Theurer wrote:
> On Sun, 2009-10-11 at 07:19 +0200, Avi Kivity wrote:
>> On 10/09/2009 10:04 PM, Andrew Theurer wrote:
>>> This is on latest master branch on kvm.git and qemu-kvm.git, running 12
>>> Windows Server2008 VMs, and using oprofile. I ran again without
>>> oprofile and did not get the BUG. I am wondering if anyone else is
>>> seeing this.
>>>
>>> Thanks,
>>>
>>> -Andrew
>>>
>>> Oct 9 11:55:13 virtvictory-eth0 kernel: BUG: unable to handle kernel paging request at 9fe9a2b4
>>> Oct 9 11:55:13 virtvictory-eth0 kernel: IP: [] vmx_vcpu_run+0x26d/0x64f [kvm_intel]
>>
>> Can you run this through objdump or gdb to see what source this
>> corresponds to?
>
> Somewhere here I think (?)
>
> objdump -d

Look at the address where vmx_vcpu_run starts, add 0x26d, and show the
surrounding code.

Thinking about it, it probably _is_ what you showed, due to module page
alignment. But please verify this; I can't reconcile the fault address
(9fe9a2b) with %rsp at the time of the fault.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: I/O performance of VirtIO
Michael Tokarev wrote:
> René Pfeiffer wrote:
>> Hello!
>>
>> I just tested qemu-kvm-0.11.0 with the KVM module of kernel 2.6.31.1. I
>> noticed that the I/O performance of an unattended stock Debian Lenny
>> install dropped somehow. The test machines ran with kvm-88 and 2.6.30.x
>> before. The difference is very noticeable (went from about 5 minutes up
>> to 15-25 minutes). The two test machines have different CPUs (one is an
>> Intel Core2 CPU, the other runs with an AMD Athlon 64 X2 Dual).
>>
>> Is this the effect of added code regarding caching/data integrity to the
>> VirtIO block layer or somewhere else? The qemu-system-x86_64 seems to
>> hang a lot more in heavy I/O (showing 'D' in top/htop).
>>
>> The command line is quite straight-forward:
>> qemu-system-x86_64 -drive file=debian.qcow2,if=virtio,boot=on -cdrom \
>> /srv/isos/debian-502-i386-netinst.iso -smp 2 -boot d -m 512 -net nic \
>> -net user -usb
> ^
>
> Care to try with something more real than user-level networking?
> You're using netinstall which - apparently - tries to use some
> networking d/loading components etc, and userlevel networking is
> known to be very very slow

It can be particularly slow if you use in-kernel irqchips and the default
NIC emulation (up to 10 times slower), some effect I always wanted to
understand on a rainy day. So, when you actually want -net user, try
-no-kvm-irqchip.

Jan