Re: [PATCH: kvm 4/5] Fix hotremove of CPUs for KVM.
On 09/26/2009 10:54 PM, Avi Kivity wrote:
> First, I'm not sure per_cpu works for possible but not actual cpus.
> Second, we now eagerly allocate but lazily free, leading to lots of ifs
> and buts. I think the code can be cleaner by eagerly allocating and
> eagerly freeing.

Eager freeing requires a hotplug-remove notification to the arch layer. I had done that originally, but I'm not sure it is needed. How does per_cpu() work when defined in a module, anyway? The linker magic going on here evades a simple one-minute analysis.

Zach
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH: kvm 3/5] Fix hotadd of CPUs for KVM.
On 09/26/2009 10:52 PM, Avi Kivity wrote:
> On 09/25/2009 03:47 AM, Zachary Amsden wrote:
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1716,9 +1716,6 @@ static int kvm_cpu_hotplug(struct notifier_block *notifier, unsigned long val,
>>  {
>>  	int cpu = (long)v;
>>
>> -	if (!kvm_usage_count)
>> -		return NOTIFY_OK;
>> -
>
> Why? You'll now do hardware_enable() even if kvm is not in use.

Because otherwise you'll never allocate, and hardware_enable_all will fail:

Switch to broadcast mode on CPU1
svm_hardware_enable: svm_data is NULL on 1
kvm: enabling virtualization on CPU1 failed
qemu-system-x86[8698]: segfault at 20 ip 004db22f sp 7fff0a3b4560 error 6 in qemu-system-x86_64[40+21f000]

Perhaps I can make this work better by putting the allocation within hardware_enable_all.

Zach
buildbot failure in qemu-kvm on default_x86_64_debian_5_0
The Buildbot has detected a new failure of default_x86_64_debian_5_0 on qemu-kvm. Full details are available at:
http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_debian_5_0/builds/87
Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/
Buildslave for this Build: b1_qemu_kvm_1
Build Reason: The Nightly scheduler named 'nightly_default' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist:
BUILD FAILED: failed git
sincerely,
-The Buildbot
trivial patch: echo -e in ./configure
The following one-liner eliminates an annoying "-e" in the output of a ./configure run if your /bin/sh is not bash or ksh:

$ ./configure
...
IO thread          no
Install blobs      yes
-e KVM support     yes   <===
KVM trace support  no
fdt support        no
preadv support     no
$ _

(I dunno if it's a qemu or kvm thing.) Thanks!

/mjt

--- qemu-kvm-0.11.0/configure.sav	2009-09-23 11:30:02.0 +0400
+++ qemu-kvm-0.11.0/configure	2009-09-27 20:04:03.230408438 +0400
@@ -1591 +1591 @@
-echo -e "KVM support $kvm"
+echo "KVM support $kvm"
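An alternative, more portable fix (a sketch, not part of the original patch) is to use printf instead of echo: POSIX leaves echo's option handling unspecified, so dash prints a literal "-e" where bash consumes it, while printf behaves identically in every POSIX shell:

```shell
#!/bin/sh
# dash treats "-e" as ordinary text and prints it; bash treats it as an
# option. printf has no such ambiguity, so configure-style status lines
# come out the same under any /bin/sh.
kvm="yes"
printf '%s\n' "KVM support $kvm"
```

With this, the output is "KVM support yes" regardless of which shell runs the script.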
Unix domain socket device
Hi all,

as I can read from the website, one item on the kvm TODO list is "Add a Unix domain socket device. With this, the guest can talk to a pci device which is connected to a Unix domain socket on the host." It is classified as a smaller scale task that can be done by someone wishing to get involved. Since the Unix domain socket device is exactly what I need for my degree thesis, I can (in fact, I have to) develop this device, but I'm a little lost in the kvm sources and documentation; so I need someone to point me to the right place (documentation and source code) to start from. I have a fair knowledge of programming, especially in C, and a "not so bad" knowledge of the linux sources, since I supervised the porting of linux to the Sam440ep board [1].

Regards,
Giuseppe

[1] http://en.wikipedia.org/wiki/Sam440ep
Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
On Sun, Sep 27, 2009 at 04:18:00PM +0200, Avi Kivity wrote:
> On 09/27/2009 04:07 PM, Joerg Roedel wrote:
>>>> We can't find exactly which vcpu, but we can:
>>>>
>>>> - rule out threads that are not vcpus for this guest
>>>> - rule out threads that are already running
>>>>
>>>> A major problem with sleep() is that it effectively reduces the vm
>>>> priority relative to guests that don't have spinlock contention. By
>>>> selecting a random nonrunnable vcpu belonging to this guest, we at
>>>> least preserve the guest's timeslice.
>>>
>>> Ok, that makes sense. But before trying that we should probably try to
>>> call just yield() instead of schedule()? I remember someone from our
>>> team here at AMD did this for Xen a while ago and already had pretty
>>> good results with that. Xen has a completely different scheduler but
>>> maybe it's worth trying?
>>>
>>> yield() is a no-op in CFS.
>>
>> Hmm, true. At least when kernel.sched_compat_yield == 0, which it is on
>> my distro. If the scheduler would give us something like a real_yield()
>> function which assumes kernel.sched_compat_yield = 1, that might help.
>> At least it's better than sleeping for some random amount of time.
>
> Depends. If it's a global yield(), yes. If it's a local yield() that
> doesn't rebalance the runqueues we might be left with the spinning task
> re-running.

Only one runnable task on each cpu is unlikely in a situation of high vcpu overcommit (where pause filtering matters).

> Also, if yield means "give up the remainder of our timeslice", then we
> potentially end up sleeping a much longer random amount of time. If we
> yield to another vcpu in the same guest we might not care, but if we
> yield to some other guest we're seriously penalizing ourselves.
I agree that a directed yield with a possible rebalance would be good to have, but that is very intrusive to the scheduler code, and I think we should at least try whether this simpler approach already gives us good results.

Joerg
Re: [PATCH 5/5] Notify nested hypervisor of lost event injections
Hi Avi,

can you please apply this patch (only 5/5) directly, before Alex does a repost? It is pretty independent of the others, contains an important bugfix for nested svm, and should go in as soon as possible.

Joerg

On Fri, Sep 18, 2009 at 03:00:32PM +0200, Alexander Graf wrote:
> Normally when event_inj is valid the host CPU would write the contents to
> exit_int_info, so the hypervisor knows that the event wasn't injected.
>
> We failed to do so so far, so let's model closer to the CPU.
>
> Signed-off-by: Alexander Graf
> ---
>  arch/x86/kvm/svm.c | 16
>  1 files changed, 16 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 12ec8ee..75e3d75 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1643,6 +1643,22 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
>  	nested_vmcb->control.exit_info_2 = vmcb->control.exit_info_2;
>  	nested_vmcb->control.exit_int_info = vmcb->control.exit_int_info;
>  	nested_vmcb->control.exit_int_info_err = vmcb->control.exit_int_info_err;
> +
> +	/*
> +	 * If we emulate a VMRUN/#VMEXIT in the same host #vmexit cycle we have
> +	 * to make sure that we do not lose injected events. So check event_inj
> +	 * here and copy it to exit_int_info if it is valid.
> +	 * exit_int_info and event_inj can't be both valid because the below
> +	 * case only happens on a VMRUN instruction intercept which has not
> +	 * valid exit_int_info set.
> +	 */
> +	if (vmcb->control.event_inj & SVM_EVTINJ_VALID) {
> +		struct vmcb_control_area *nc = &nested_vmcb->control;
> +
> +		nc->exit_int_info = vmcb->control.event_inj;
> +		nc->exit_int_info_err = vmcb->control.event_inj_err;
> +	}
> +
>  	nested_vmcb->control.tlb_ctl = 0;
>  	nested_vmcb->control.event_inj = 0;
>  	nested_vmcb->control.event_inj_err = 0;
> --
> 1.6.0.2
Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
On 09/27/2009 04:07 PM, Joerg Roedel wrote:
> On Sun, Sep 27, 2009 at 03:47:55PM +0200, Avi Kivity wrote:
>> On 09/27/2009 03:46 PM, Joerg Roedel wrote:
>>>> We can't find exactly which vcpu, but we can:
>>>>
>>>> - rule out threads that are not vcpus for this guest
>>>> - rule out threads that are already running
>>>>
>>>> A major problem with sleep() is that it effectively reduces the vm
>>>> priority relative to guests that don't have spinlock contention. By
>>>> selecting a random nonrunnable vcpu belonging to this guest, we at
>>>> least preserve the guest's timeslice.
>>>
>>> Ok, that makes sense. But before trying that we should probably try to
>>> call just yield() instead of schedule()? I remember someone from our
>>> team here at AMD did this for Xen a while ago and already had pretty
>>> good results with that. Xen has a completely different scheduler but
>>> maybe it's worth trying?
>>
>> yield() is a no-op in CFS.
>
> Hmm, true. At least when kernel.sched_compat_yield == 0, which it is on
> my distro. If the scheduler would give us something like a real_yield()
> function which assumes kernel.sched_compat_yield = 1, that might help.
> At least it's better than sleeping for some random amount of time.

Depends. If it's a global yield(), yes. If it's a local yield() that doesn't rebalance the runqueues we might be left with the spinning task re-running.

Also, if yield means "give up the remainder of our timeslice", then we potentially end up sleeping a much longer random amount of time. If we yield to another vcpu in the same guest we might not care, but if we yield to some other guest we're seriously penalizing ourselves.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
On Sun, Sep 27, 2009 at 03:47:55PM +0200, Avi Kivity wrote:
> On 09/27/2009 03:46 PM, Joerg Roedel wrote:
>>> We can't find exactly which vcpu, but we can:
>>>
>>> - rule out threads that are not vcpus for this guest
>>> - rule out threads that are already running
>>>
>>> A major problem with sleep() is that it effectively reduces the vm
>>> priority relative to guests that don't have spinlock contention. By
>>> selecting a random nonrunnable vcpu belonging to this guest, we at
>>> least preserve the guest's timeslice.
>>
>> Ok, that makes sense. But before trying that we should probably try to
>> call just yield() instead of schedule()? I remember someone from our
>> team here at AMD did this for Xen a while ago and already had pretty
>> good results with that. Xen has a completely different scheduler but
>> maybe it's worth trying?
>
> yield() is a no-op in CFS.

Hmm, true. At least when kernel.sched_compat_yield == 0, which it is on my distro. If the scheduler would give us something like a real_yield() function which assumes kernel.sched_compat_yield = 1, that might help. At least it's better than sleeping for some random amount of time.

Joerg
Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
On 09/27/2009 03:46 PM, Joerg Roedel wrote:
>>> We can't find exactly which vcpu, but we can:
>>>
>>> - rule out threads that are not vcpus for this guest
>>> - rule out threads that are already running
>>>
>>> A major problem with sleep() is that it effectively reduces the vm
>>> priority relative to guests that don't have spinlock contention. By
>>> selecting a random nonrunnable vcpu belonging to this guest, we at
>>> least preserve the guest's timeslice.
>
> Ok, that makes sense. But before trying that we should probably try to
> call just yield() instead of schedule()? I remember someone from our
> team here at AMD did this for Xen a while ago and already had pretty
> good results with that. Xen has a completely different scheduler but
> maybe it's worth trying?

yield() is a no-op in CFS.
Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
On Sun, Sep 27, 2009 at 10:31:21AM +0200, Avi Kivity wrote:
> On 09/25/2009 11:43 PM, Joerg Roedel wrote:
>> On Wed, Sep 23, 2009 at 05:09:38PM +0300, Avi Kivity wrote:
>>> We haven't sorted out what is the correct thing to do here. I think we
>>> should go for a directed yield, but until we have it, you can use
>>> hrtimers to sleep for 100 microseconds and hope the holding vcpu will
>>> get scheduled. Even if it doesn't, we're only wasting a few percent cpu
>>> time instead of spinning.
>>
>> How do you plan to find out to which vcpu thread the current thread
>> should yield?
>
> We can't find exactly which vcpu, but we can:
>
> - rule out threads that are not vcpus for this guest
> - rule out threads that are already running
>
> A major problem with sleep() is that it effectively reduces the vm
> priority relative to guests that don't have spinlock contention. By
> selecting a random nonrunnable vcpu belonging to this guest, we at
> least preserve the guest's timeslice.

Ok, that makes sense. But before trying that we should probably try to call just yield() instead of schedule()? I remember someone from our team here at AMD did this for Xen a while ago and already had pretty good results with that. Xen has a completely different scheduler but maybe it's worth trying?

Joerg
Re: sync guest calls made async on host - SQLite performance
I have created a launchpad bug against qemu-kvm in Ubuntu:

https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/437473

Just re-iterating, my concern isn't so much performance as the integrity of stock KVM configurations with server or other workloads that expect sync file-IO requests to be honored and synchronous to the underlying physical disk. (That, and ensuring that sanity reigns where a benchmark doesn't show a guest operating 10 times faster than the host for the same test :)

Regards,
Matthew

-------- Original Message --------
Subject: Re: sync guest calls made async on host - SQLite performance
From: Avi Kivity
To: RW
Cc: kvm@vger.kernel.org
Date: 09/27/2009 07:37 AM

On 09/25/2009 10:00 AM, RW wrote:
> I think ext3 with "data=writeback" in a KVM and KVM started with
> "if=virtio,cache=none" is a little bit crazy. I don't know if this is
> the case with current Ubuntu Alpha but it looks like so.

It's not crazy: qemu bypasses the host page cache with cache=none, so the ext3 data= setting is immaterial.
Re: sync guest calls made async on host - SQLite performance
On 09/25/2009 10:00 AM, RW wrote:
> I think ext3 with "data=writeback" in a KVM and KVM started with
> "if=virtio,cache=none" is a little bit crazy. I don't know if this is
> the case with current Ubuntu Alpha but it looks like so.

It's not crazy: qemu bypasses the host page cache with cache=none, so the ext3 data= setting is immaterial.
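For context, cache=none is a per-drive setting: qemu opens the image with O_DIRECT, so guest writes go straight past the host page cache and the host filesystem's data= journaling mode no longer sits in that drive's data path. A hypothetical invocation (disk.img and the memory size are placeholders, not from the thread):

```shell
# With cache=none, guest I/O bypasses the host page cache (O_DIRECT),
# so only the guest's own caching policy applies; data=writeback vs
# data=ordered on the host ext3 makes no difference to this drive.
qemu-system-x86_64 \
    -drive file=disk.img,if=virtio,cache=none \
    -m 1024
```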
[PATCH] Change log message of VM login
We may use the function 'wait_for_login' several times in a case; only the first login should be logged as "Waiting for guest to be up".

Signed-off-by: Yolkfull Chow
---
 client/tests/kvm/kvm_test_utils.py | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_test_utils.py b/client/tests/kvm/kvm_test_utils.py
index 601b350..aa3f2ee 100644
--- a/client/tests/kvm/kvm_test_utils.py
+++ b/client/tests/kvm/kvm_test_utils.py
@@ -52,7 +52,7 @@ def wait_for_login(vm, nic_index=0, timeout=240):
     @param timeout: Time to wait before giving up.
     @return: A shell session object.
     """
-    logging.info("Waiting for guest '%s' to be up..." % vm.name)
+    logging.info("Trying to log into guest '%s'..." % vm.name)
     session = kvm_utils.wait_for(lambda: vm.remote_login(nic_index=nic_index),
                                  timeout, 0, 2)
     if not session:
--
1.6.2.5
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On 09/26/2009 12:32 AM, Gregory Haskins wrote:
> I realize in retrospect that my choice of words above implies vbus _is_
> complete, but this is not what I was saying. What I was trying to convey
> is that vbus is _more_ complete. Yes, in either case some kind of glue
> needs to be written. The difference is that vbus implements more of the
> glue generally, and leaves less required to be customized for each
> iteration.

No argument there. Since you care about non-virt scenarios and virtio doesn't, naturally vbus is a better fit for them as the code stands.

> Thanks for finally starting to acknowledge there's a benefit, at least.

I think I've mentioned vbus' finer grained layers as helpful here, though I doubt the value of this. Hypervisors are added rarely, while devices and drivers are added (and modified) much more often. I don't buy the anything-to-anything promise.

> To be more precise, IMO virtio is designed to be a performance oriented
> ring-based driver interface that supports all types of hypervisors (e.g.
> shmem based kvm, and non-shmem based Xen). vbus is designed to be a
> high-performance generic shared-memory interconnect (for rings or
> otherwise) framework for environments where linux is the underpinning
> "host" (physical or virtual). They are distinctly different, but
> complementary (the former addresses part of the front-end, and the
> latter addresses the back-end, and a different part of the front-end).

They're not truly complementary since they're incompatible. A 2.6.27 guest, or a Windows guest with the existing virtio drivers, won't work over vbus. Further, non-shmem virtio can't work over vbus. Since virtio is guest-oriented and host-agnostic, it can't ignore non-shared-memory hosts (even though it's unlikely virtio will be adopted there).

> In addition, the kvm-connector used in AlacrityVM's design strives to
> add value and improve performance via other mechanisms, such as dynamic
> allocation, interrupt coalescing (thus reducing exit-ratio, which is a
> serious issue in KVM)

Do you have measurements of inter-interrupt coalescing rates (excluding intra-interrupt coalescing)?

> and prioritizable/nestable signals.

That doesn't belong in a bus.

> Today there is a large performance disparity between what a KVM guest
> sees and what a native linux application sees on that same host. Just
> take a look at some of my graphs between "virtio" and "native", for
> example:
>
> http://developer.novell.com/wiki/images/b/b7/31-rc4_throughput.png

That's a red herring. The problem is not with virtio as an ABI, but with its implementation in userspace. vhost-net should offer equivalent performance to vbus.

> A dominant vbus design principle is to try to achieve the same IO
> performance for all "linux applications", whether they be literally
> userspace applications, or things like KVM vcpus or Ira's physical
> boards. It also aims to solve problems not previously expressible with
> current technologies (even virtio), like nested real-time.
>
> And even though you repeatedly insist otherwise, the neat thing here is
> that the two technologies mesh (at least under certain circumstances,
> like when virtio is deployed on a shared-memory friendly linux backend
> like KVM). I hope that my stack diagram below depicts that clearly.

Right, when you ignore the points where they don't fit, it's a perfect mesh. But that's not a strong argument for vbus; instead of adding vbus you could make virtio more friendly to non-virt.

> Actually, it _is_ a strong argument then, because adding vbus is what
> helps make virtio friendly to non-virt, at least for when performance
> matters.

As vhost-net shows, you can do that without vbus and without breaking compatibility.

> Right.

>> virtio assumes that it's in a virt scenario and that the guest
>> architecture already has enumeration and hotplug mechanisms which it
>> would prefer to use. That happens to be the case for kvm/x86.
>
> No, virtio doesn't assume that. Its stack provides the "virtio-bus"
> abstraction, and what it does assume is that it will be wired up to
> something underneath. Kvm/x86 conveniently has pci, so the virtio-pci
> adapter was created to reuse much of that facility. For other things
> like lguest and s390, something new had to be created underneath to
> make up for the lack of pci-like support.

Right, I was wrong there. But it does allow you to have a 1:1 mapping between native devices and virtio devices.

> So to answer your question, the difference is that the part that has to
> be customized in vbus should be a fraction of what needs to be
> customized with vhost, because it defines more of the stack.

But if you want to use the native mechanisms, vbus doesn't have any added value.

> First of all, that's incorrect. If you want to use the "native"
> mechanisms (via the way the vbus-connector is implemented, for instance)
> you at least still have the benefit that the backend design is more
> broadly re-u
[PATCH] Add a kvm test guest_s4 which supports both Linux and Windows platform
For this case, Ken Cao wrote the Linux part previously, and I did extensive modifications for Windows platform support.

Signed-off-by: Ken Cao
Signed-off-by: Yolkfull Chow
---
 client/tests/kvm/kvm_tests.cfg.sample | 14 +++
 client/tests/kvm/tests/guest_s4.py    | 66 +
 2 files changed, 80 insertions(+), 0 deletions(-)
 create mode 100644 client/tests/kvm/tests/guest_s4.py

diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample
index 285a38f..f9ecb61 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -94,6 +94,14 @@ variants:
     - linux_s3:
         install setup
         type = linux_s3
+    - guest_s4:
+        type = guest_s4
+        check_s4_support_cmd = grep -q disk /sys/power/state
+        test_s4_cmd = "cd /tmp/;nohup tcpdump -q -t ip host localhost"
+        check_s4_cmd = pgrep tcpdump
+        set_s4_cmd = echo disk > /sys/power/state
+        kill_test_s4_cmd = pkill tcpdump
+
     - timedrift:
         install setup
         type = timedrift
         extra_params += " -rtc-td-hack"
@@ -382,6 +390,12 @@ variants:
         # Alternative host load:
         #host_load_command = "dd if=/dev/urandom of=/dev/null"
         host_load_instances = 8
+        guest_s4:
+            check_s4_support_cmd = powercfg /hibernate on
+            test_s4_cmd = start /B ping -n 3000 localhost
+            check_s4_cmd = tasklist | find /I "ping"
+            set_s4_cmd = rundll32.exe PowrProf.dll, SetSuspendState
+            kill_test_s4_cmd = taskkill /IM ping.exe /F

     variants:
         - Win2000:
diff --git a/client/tests/kvm/tests/guest_s4.py b/client/tests/kvm/tests/guest_s4.py
new file mode 100644
index 000..5d8fbdf
--- /dev/null
+++ b/client/tests/kvm/tests/guest_s4.py
@@ -0,0 +1,66 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_test_utils, kvm_utils
+
+
+def run_guest_s4(test, params, env):
+    """
+    Suspend guest to disk; supports both Linux & Windows OSes.
+
+    @param test: kvm test object.
+    @param params: Dictionary with test parameters.
+    @param env: Dictionary with the test environment.
+    """
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm)
+
+    logging.info("Checking whether VM supports S4")
+    status = session.get_command_status(params.get("check_s4_support_cmd"))
+    if status is None:
+        logging.error("Failed to check if S4 exists")
+    elif status != 0:
+        raise error.TestFail("Guest does not support S4")
+
+    logging.info("Waiting for a while for X to start...")
+    time.sleep(10)
+
+    # Start up a program (tcpdump for Linux, ping for Windows) as a flag.
+    # If the program has died after suspend, this test case fails.
+    test_s4_cmd = params.get("test_s4_cmd")
+    session.sendline(test_s4_cmd)
+
+    # Get a second session to start S4
+    session2 = kvm_test_utils.wait_for_login(vm)
+
+    check_s4_cmd = params.get("check_s4_cmd")
+    if session2.get_command_status(check_s4_cmd):
+        raise error.TestError("Failed to launch %s in background" % test_s4_cmd)
+    logging.info("Launched background command in guest: %s" % test_s4_cmd)
+
+    # Initiate S4
+    logging.info("Starting suspend to disk now...")
+    session2.sendline(params.get("set_s4_cmd"))
+
+    if not kvm_utils.wait_for(vm.is_dead, 360, 30, 2):
+        raise error.TestFail("VM refuses to go down; suspend failed")
+    logging.info("VM suspended successfully.")
+
+    logging.info("VM suspended to disk. Sleeping 10 seconds before resume...")
+    time.sleep(10)
+
+    # Start the VM and check whether the program is still running
+    logging.info("Restarting VM now...")
+    if not vm.create():
+        raise error.TestError("Failed to start the VM again.")
+    if not vm.is_alive():
+        raise error.TestError("VM seems to be dead; test requires a live VM.")
+
+    # Check whether the test command is still alive
+    if session2.get_command_status(check_s4_cmd):
+        raise error.TestFail("%s died, indicating that S4 failed" % test_s4_cmd)
+
+    logging.info("VM resumed after S4")
+    session2.sendline(params.get("kill_test_s4_cmd"))
+    session.close()
+    session2.close()
--
1.6.2.5
Re: [PATCH: kvm 4/5] Fix hotremove of CPUs for KVM.
On 09/25/2009 03:47 AM, Zachary Amsden wrote:
> In the process of bringing down CPUs, the SVM / VMX structures associated
> with those CPUs are not freed. This may cause leaks when unloading and
> reloading the KVM module, as only the structures associated with online
> CPUs are cleaned up. So, clean up all possible CPUs, not just online ones.
>
> Signed-off-by: Zachary Amsden
> ---
>  arch/x86/kvm/svm.c | 2 +-
>  arch/x86/kvm/vmx.c | 7 +-
>  2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 8f99d0c..13ca268 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -525,7 +525,7 @@ static __exit void svm_hardware_unsetup(void)
>  {
>  	int cpu;
>
> -	for_each_online_cpu(cpu)
> +	for_each_possible_cpu(cpu)
>  		svm_cpu_uninit(cpu);
>
>  	__free_pages(pfn_to_page(iopm_base >> PAGE_SHIFT), IOPM_ALLOC_ORDER);
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index b8a8428..603bde3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1350,8 +1350,11 @@ static void free_kvm_area(void)
>  {
>  	int cpu;
>
> -	for_each_online_cpu(cpu)
> -		free_vmcs(per_cpu(vmxarea, cpu));
> +	for_each_possible_cpu(cpu)
> +		if (per_cpu(vmxarea, cpu)) {
> +			free_vmcs(per_cpu(vmxarea, cpu));
> +			per_cpu(vmxarea, cpu) = NULL;
> +		}
>  }
>
>  static __init int alloc_kvm_area(void)

First, I'm not sure per_cpu works for possible but not actual cpus. Second, we now eagerly allocate but lazily free, leading to lots of ifs and buts. I think the code can be cleaner by eagerly allocating and eagerly freeing.
Re: [PATCH: kvm 3/5] Fix hotadd of CPUs for KVM.
On 09/25/2009 03:47 AM, Zachary Amsden wrote:
> Both VMX and SVM require per-cpu memory allocation, which is done at
> module init time, for only online cpus. When bringing a new CPU online,
> we must also allocate this structure. The method chosen to implement
> this is to make the CPU online notifier available via a call to the
> arch code. This allows memory allocation to be done smoothly, without
> any need to allocate extra structures.
>
> Note: CPU up notifiers may call KVM callbacks before calling cpufreq
> callbacks. This would cause the CPU frequency not to be detected (and
> it is not always clear on non-constant TSC platforms what the bringup
> TSC rate will be, so the guess of using tsc_khz could be wrong). So, we
> clear the rate to zero in such a case and add logic to query it upon
> entry.
>
> Signed-off-by: Zachary Amsden
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/svm.c              | 15 +--
>  arch/x86/kvm/vmx.c              | 17 +
>  arch/x86/kvm/x86.c              | 13 +
>  include/linux/kvm_host.h        |  6 ++
>  virt/kvm/kvm_main.c             |  6 ++
>  6 files changed, 49 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 299cc1b..b7dd14b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -459,6 +459,7 @@ struct descriptor_table {
>  struct kvm_x86_ops {
>  	int (*cpu_has_kvm_support)(void);          /* __init */
>  	int (*disabled_by_bios)(void);             /* __init */
> +	int (*cpu_hotadd)(int cpu);
>  	int (*hardware_enable)(void *dummy);
>  	void (*hardware_disable)(void *dummy);
>  	void (*check_processor_compatibility)(void *rtn);
> @@ -791,6 +792,7 @@ asmlinkage void kvm_handle_fault_on_reboot(void);
>  	_ASM_PTR " 666b, 667b \n\t" \
>  	".popsection"
>
> +#define KVM_ARCH_WANT_HOTPLUG_NOTIFIER
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>  int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
>  int kvm_age_hva(struct kvm *kvm, unsigned long hva);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 9a4daca..8f99d0c 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -330,13 +330,13 @@ static int svm_hardware_enable(void *garbage)
>  		return -EBUSY;
>
>  	if (!has_svm()) {
> -		printk(KERN_ERR "svm_cpu_init: err EOPNOTSUPP on %d\n", me);
> +		printk(KERN_ERR "svm_hardware_enable: err EOPNOTSUPP on %d\n", me);
>  		return -EINVAL;
>  	}
>  	svm_data = per_cpu(svm_data, me);
>
>  	if (!svm_data) {
> -		printk(KERN_ERR "svm_cpu_init: svm_data is NULL on %d\n",
> +		printk(KERN_ERR "svm_hardware_enable: svm_data is NULL on %d\n",
>  		       me);
>  		return -EINVAL;
>  	}
> @@ -394,6 +394,16 @@ err_1:
>  }
>
> +static __cpuinit int svm_cpu_hotadd(int cpu)
> +{
> +	struct svm_cpu_data *svm_data = per_cpu(svm_data, cpu);
> +
> +	if (svm_data)
> +		return 0;
> +
> +	return svm_cpu_init(cpu);
> +}
> +
>  static void set_msr_interception(u32 *msrpm, unsigned msr,
>  				 int read, int write)
>  {
> @@ -2858,6 +2868,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>  	.hardware_setup = svm_hardware_setup,
>  	.hardware_unsetup = svm_hardware_unsetup,
>  	.check_processor_compatibility = svm_check_processor_compat,
> +	.cpu_hotadd = svm_cpu_hotadd,
>  	.hardware_enable = svm_hardware_enable,
>  	.hardware_disable = svm_hardware_disable,
>  	.cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 3fe0d42..b8a8428 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1408,6 +1408,22 @@ static __exit void hardware_unsetup(void)
>  	free_kvm_area();
>  }
>
> +static __cpuinit int vmx_cpu_hotadd(int cpu)
> +{
> +	struct vmcs *vmcs;
> +
> +	if (per_cpu(vmxarea, cpu))
> +		return 0;
> +
> +	vmcs = alloc_vmcs_cpu(cpu);
> +	if (!vmcs)
> +		return -ENOMEM;
> +
> +	per_cpu(vmxarea, cpu) = vmcs;
> +
> +	return 0;
> +}
> +
>  static void fix_pmode_dataseg(int seg, struct kvm_save_segment *save)
>  {
>  	struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
> @@ -3925,6 +3941,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>  	.hardware_setup = hardware_setup,
>  	.hardware_unsetup = hardware_unsetup,
>  	.check_processor_compatibility = vmx_check_processor_compat,
> +	.cpu_hotadd = vmx_cpu_hotadd,
>  	.hardware_enable = hardware_enable,
>  	.hardware_disable = hardware_disable,
>  	.cpu_has_accelerated_tpr = report_flexpriority,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c18e2fc..66c6bb9 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1326,6 +1326,8 @@ out:
>  void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	kvm_x86_ops->vcpu_load(vcpu, cpu);
> +	if (unlikely(per_cpu(cpu_
Re: [PATCH: kvm 3/6] Fix hotadd of CPUs for KVM.
On 09/24/2009 11:32 PM, Zachary Amsden wrote: On 09/24/2009 05:52 AM, Marcelo Tosatti wrote:

+static __cpuinit int vmx_cpu_hotadd(int cpu)
+{
+	struct vmcs *vmcs;
+
+	if (per_cpu(vmxarea, cpu))
+		return 0;
+
+	vmcs = alloc_vmcs_cpu(cpu);
+	if (!vmcs)
+		return -ENOMEM;
+
+	per_cpu(vmxarea, cpu) = vmcs;
+
+	return 0;
+}

Have to free in __cpuexit? Is it too wasteful to allocate statically with DEFINE_PER_CPU_PAGE_ALIGNED?

Unfortunately, I think it is. The VMX / SVM structures are quite large, and we can have a lot of potential CPUs.

I think percpu is only allocated when the cpu is online (it would still be wasteful if the modules were loaded but unused).

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH 1/1] KVM: fix lock imbalance
On 09/25/2009 10:33 AM, Jiri Slaby wrote: Stanse found 2 lock imbalances in kvm_request_irq_source_id and kvm_free_irq_source_id. They fail to unlock kvm->irq_lock on fail paths. Fix that by adding unlock labels at the end of the functions and jumping there from the fail paths. Applied, thanks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH 1/9] x86: Pick up local arch trace headers
On 09/25/2009 07:18 PM, Jan Kiszka wrote: This unbreaks 2.6.31 builds but also ensures that we always use the most recent ones. Signed-off-by: Jan Kiszka
---
 include/arch/x86/kvm | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 120000 include/arch/x86/kvm

diff --git a/include/arch/x86/kvm b/include/arch/x86/kvm
new file mode 120000
index 000..c635817
--- /dev/null
+++ b/include/arch/x86/kvm
@@ -0,0 +1 @@
+../../../x86
\ No newline at end of file

Shouldn't it be asm-x86? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH] qemu-kvm: Fix segfault on -no-kvm startup
On 09/25/2009 10:03 PM, Jan Kiszka wrote: Gleb Natapov wrote: On Fri, Sep 25, 2009 at 06:05:49PM +0200, Jan Kiszka wrote: The check for in-kernel irqchip must be protected by kvm_enabled, and we have a different wrapper for it. Why not move kvm_enabled() into kvm_irqchip_in_kernel()? It will return false if !kvm_enabled(). Yes, possible. But I'm not sure it's worth refactoring at this level. In any case, fix bugs first, refactor later. I think the whole irqchip interface has to go through some broader refactoring when pushing it upstream. The result should either be a specific, in-kernel-irqchip apic device or generic wrapper services that cover all cases, are easily compiled away in the absence of KVM, and avoid #ifdefs like below. s/when/before/ -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH] qemu-kvm: Fix segfault on -no-kvm startup
On 09/25/2009 07:05 PM, Jan Kiszka wrote: The check for in-kernel irqchip must be protected by kvm_enabled, and we have a different wrapper for it. Applied, thanks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
On 09/25/2009 11:43 PM, Joerg Roedel wrote: On Wed, Sep 23, 2009 at 05:09:38PM +0300, Avi Kivity wrote: We haven't sorted out what is the correct thing to do here. I think we should go for a directed yield, but until we have it, you can use hrtimers to sleep for 100 microseconds and hope the holding vcpu will get scheduled. Even if it doesn't, we're only wasting a few percent cpu time instead of spinning. How do you plan to find out to which vcpu thread the current thread should yield? We can't find exactly which vcpu, but we can: - rule out threads that are not vcpus for this guest - rule out threads that are already running A major problem with sleep() is that it effectively reduces the vm priority relative to guests that don't have spinlock contention. By selecting a random nonrunnable vcpu belonging to this guest, we at least preserve the guest's timeslice. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH] [RESEND] KVM:VMX: Add support for Pause-Loop Exiting
On 09/25/2009 04:11 AM, Zhai, Edwin wrote: Avi, hrtimer is used for sleep in the attached patch, which has a similar perf gain to the previous one. Maybe we can check in this patch first, and turn to direct yield in future, as you suggested.

+/*
+ * These 2 parameters are used to config the controls for Pause-Loop Exiting:
+ * ple_gap:    upper bound on the amount of time between two successive
+ *             executions of PAUSE in a loop.  Also indicate if ple enabled.
+ *             According to test, this time is usually smaller than 41 cycles.
+ * ple_window: upper bound on the amount of time a guest is allowed to execute
+ *             in a PAUSE loop.  Tests indicate that most spinlocks are held
+ *             for less than 2^12 cycles.
+ * Time is measured based on a counter that runs at the same rate as the TSC,
+ * refer SDM volume 3b section 21.6.13 & 22.1.3.
+ */
+#define KVM_VMX_DEFAULT_PLE_GAP    41
+#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+static int __read_mostly ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
+module_param(ple_gap, int, S_IRUGO);
+
+static int __read_mostly ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
+module_param(ple_window, int, S_IRUGO);

Shouldn't be __read_mostly since they're read very rarely (__read_mostly should be for variables that are very often read, and rarely written). I'm not even sure they should be parameters.

+/*
+ * Indicate a busy-waiting vcpu in spinlock.  We do not enable the PAUSE
+ * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
+ */
+static int handle_pause(struct kvm_vcpu *vcpu,
+			struct kvm_run *kvm_run)
+{
+	ktime_t expires;
+	skip_emulated_instruction(vcpu);
+
+	/* Sleep for 1 msec, and hope lock-holder got scheduled */
+	expires = ktime_add_ns(ktime_get(), 100UL);

I think this should be much lower, 50-100us. Maybe this should be a parameter. With 1 ms we're losing significant cpu time if the congestion clears.

+	set_current_state(TASK_INTERRUPTIBLE);
+	schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+

Please add a tracepoint for this (since it can cause significant change in behaviour), and move the logic to kvm_main.c. It will be reused by the AMD implementation, possibly my software spinlock detector, paravirtualized spinlocks, and hopefully other architectures.

+	return 1;
+}
+
+/*

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
On Fri, Sep 25, 2009 at 10:01:58AM -0700, Ira W. Snyder wrote:
> > +	case VHOST_SET_VRING_KICK:
> > +		r = copy_from_user(&f, argp, sizeof f);
> > +		if (r < 0)
> > +			break;
> > +		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> > +		if (IS_ERR(eventfp))
> > +			return PTR_ERR(eventfp);
> > +		if (eventfp != vq->kick) {
> > +			pollstop = filep = vq->kick;
> > +			pollstart = vq->kick = eventfp;
> > +		} else
> > +			filep = eventfp;
> > +		break;
> > +	case VHOST_SET_VRING_CALL:
> > +		r = copy_from_user(&f, argp, sizeof f);
> > +		if (r < 0)
> > +			break;
> > +		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> > +		if (IS_ERR(eventfp))
> > +			return PTR_ERR(eventfp);
> > +		if (eventfp != vq->call) {
> > +			filep = vq->call;
> > +			ctx = vq->call_ctx;
> > +			vq->call = eventfp;
> > +			vq->call_ctx = eventfp ?
> > +				eventfd_ctx_fileget(eventfp) : NULL;
> > +		} else
> > +			filep = eventfp;
> > +		break;
> > +	case VHOST_SET_VRING_ERR:
> > +		r = copy_from_user(&f, argp, sizeof f);
> > +		if (r < 0)
> > +			break;
> > +		eventfp = f.fd == -1 ? NULL : eventfd_fget(f.fd);
> > +		if (IS_ERR(eventfp))
> > +			return PTR_ERR(eventfp);
> > +		if (eventfp != vq->error) {
> > +			filep = vq->error;
> > +			vq->error = eventfp;
> > +			ctx = vq->error_ctx;
> > +			vq->error_ctx = eventfp ?
> > +				eventfd_ctx_fileget(eventfp) : NULL;
> > +		} else
> > +			filep = eventfp;
> > +		break;
>
> I'm not sure how these eventfd's save a trip to userspace.
>
> AFAICT, eventfd's cannot be used to signal another part of the kernel,
> they can only be used to wake up userspace.

Yes, they can. See irqfd code in virt/kvm/eventfd.c.

> In my system, when an IRQ for kick() comes in, I have an eventfd which
> gets signalled to notify userspace. When I want to send a call(), I have
> to use a special ioctl(), just like lguest does.
>
> Doesn't this mean that for call(), vhost is just going to signal an
> eventfd to wake up userspace, which is then going to call ioctl(), and
> then we're back in kernelspace. Seems like a wasted userspace
> round-trip.
>
> Or am I mis-reading this code?

Yes. Kernel can poll eventfd and deliver an interrupt directly without involving userspace.

> PS - you can see my current code at:
> http://www.mmarray.org/~iws/virtio-phys/
>
> Thanks,
> Ira

> > +	default:
> > +		r = -ENOIOCTLCMD;
> > +	}
> > +
> > +	if (pollstop && vq->handle_kick)
> > +		vhost_poll_stop(&vq->poll);
> > +
> > +	if (ctx)
> > +		eventfd_ctx_put(ctx);
> > +	if (filep)
> > +		fput(filep);
> > +
> > +	if (pollstart && vq->handle_kick)
> > +		vhost_poll_start(&vq->poll, vq->kick);
> > +
> > +	mutex_unlock(&vq->mutex);
> > +
> > +	if (pollstop && vq->handle_kick)
> > +		vhost_poll_flush(&vq->poll);
> > +	return 0;
> > +}
> > +
> > +long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, unsigned long arg)
> > +{
> > +	void __user *argp = (void __user *)arg;
> > +	long r;
> > +
> > +	mutex_lock(&d->mutex);
> > +	/* If you are not the owner, you can become one */
> > +	if (ioctl == VHOST_SET_OWNER) {
> > +		r = vhost_dev_set_owner(d);
> > +		goto done;
> > +	}
> > +
> > +	/* You must be the owner to do anything else */
> > +	r = vhost_dev_check_owner(d);
> > +	if (r)
> > +		goto done;
> > +
> > +	switch (ioctl) {
> > +	case VHOST_SET_MEM_TABLE:
> > +		r = vhost_set_memory(d, argp);
> > +		break;
> > +	default:
> > +		r = vhost_set_vring(d, ioctl, argp);
> > +		break;
> > +	}
> > +done:
> > +	mutex_unlock(&d->mutex);
> > +	return r;
> > +}
> > +
> > +static const struct vhost_memory_region *find_region(struct vhost_memory *mem,
> > +						     __u64 addr, __u32 len)
> > +{
> > +	struct vhost_memory_region *reg;
> > +	int i;
> > +	/* linear search is not brilliant, but we really have on the order of 6
> > +	 * regions in practice */
> > +	for (i = 0; i < mem->nregions; ++i) {
> > +		reg = mem->regions + i;
> > +		if (reg->guest_phys_addr <= addr &&
> > +		    reg->guest_phys_addr + reg->memory_size - 1 >= addr)
> > +			return reg;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +int translate_desc(struct vhost_dev *dev, u64 addr, u32 len,
> > +		   struct iovec iov[], int iov_size)
> > +{
> > +	const s
[ANNOUNCE] qemu-kvm-0.11.0 released
qemu-kvm-0.11.0 is now available. This release is based on the upstream qemu 0.11.0, plus kvm-specific enhancements.

Changes from the qemu-kvm-0.10 series:

- merge qemu 0.11.0
- qdev device model
- qemu-io
- i386: multiboot support for -kernel
- gdbstub: vCont support
- i386: control over boot menu
- i386: pc-0.10 compatibility machine type
- qcow2: use cache=writethrough by default
- i386: MCE emulation
- i386: host cpuid support
- slirp: host network config
- virtio: MSI-x support
- pci: allow devices to specify bus address
- migration: allow down time based threshold
- virtio-net: filtering support
- http block device support
- i386: expose numa topology to guests
- native preadv/pwritev support
- kvm: guest debugging support
- vnc: support for acls and gssapi
- monitor: allow multiple monitors
- device assignment: MSI-X support (Sheng Yang)
- device assignment: SR/IOV support (Sheng Yang)
- irqfd support (Gregory Haskins)
- drop libkvm, use some of the upstream kvm support (Glauber Costa)
- device assignment: option ROM support (Alex Williamson)
- x2apic support (Gleb Natapov)
- kvm/msi integration (Michael S. Tsirkin)
- hpet/kvm integration (Beth Kon)
- mce/kvm integration (Huang Ying)

http://www.linux-kvm.org

-- Do not meddle in the internals of kernels, for they are subtle and quick to panic.