Re: kvm: odd time values since kvmclock: set scheduler clock stable
On Thu, 21 May 2015 21:41:23 -0300 Marcelo Tosatti mtosa...@redhat.com wrote:

On Mon, May 18, 2015 at 10:13:03PM -0400, Sasha Levin wrote:
On 05/18/2015 10:02 PM, Sasha Levin wrote:
On 05/18/2015 08:13 PM, Marcelo Tosatti wrote:
On Mon, May 18, 2015 at 07:45:41PM -0400, Sasha Levin wrote:
On 05/18/2015 06:39 PM, Marcelo Tosatti wrote:
On Tue, May 12, 2015 at 07:17:24PM -0400, Sasha Levin wrote:

Hi all,

I'm seeing an odd jump in time values during boot of a KVM guest:

[...]
[0.000000] tsc: Detected 2260.998 MHz processor
[3376355.247558] Calibrating delay loop (skipped) preset value..
[...]

I've bisected it to:

Paolo, Sasha,

Although this might seem undesirable, there is no requirement for sched_clock to initialize at 0:

 * There is no strict promise about the base, although it tends to start
 * at 0 on boot (but people really shouldn't rely on that).

Sasha, are you seeing any problem other than the apparent time jump?

Nope, but I've looked at it again and it seems that it jumps to the host's clock (that is, in the example above the 3376355 value was the host's clock value).

Thanks,
Sasha

Sasha, that's right. It's the host monotonic clock.

It's worth figuring out if (and what) userspace breaks on that. I know it says that you shouldn't rely on that, but I'd happily place a bet on at least one userspace treating it as seconds since boot or something similar.

Didn't need to go far... In the guest:

# date
Tue May 19 02:11:46 UTC 2015
# echo hi > /dev/kmsg
[3907533.080112] hi
# dmesg -T
[Fri Jul 3 07:33:41 2015] hi

Sasha, can you give the suggested patch (hypervisor patch...) a try please? (With a patched guest, obviously.)

KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR

I've tried your v2, it works for me. My test case is very simple though: I just boot a VM, log in and reboot. This reproduces the issue Sasha reported 100% of the time for me (don't need a multi-vcpu guest either). Would be nice to hear from Sasha too.
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR
On Sat, 23 May 2015 17:06:29 -0300 Marcelo Tosatti mtosa...@redhat.com wrote:

Initialize the kvmclock base at kvmclock system MSR write time, so that the guest sees kvmclock counting from zero. This matches bare-metal behaviour when kvmclock in the guest sets sched clock stable.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Tested-by: Luiz Capitulino lcapitul...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cc2c759..ea40d24 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2188,6 +2188,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 					 &vcpu->requests);
 			ka->boot_vcpu_runs_old_kvmclock = tmp;
+
+			ka->kvmclock_offset = -get_kernel_ns();
 		}

 		vcpu->arch.time = data;
[PATCH] kvmclock: set scheduler clock stable
If you try to enable NOHZ_FULL on a guest today, you'll get the following error when the guest tries to deactivate the scheduler tick:

WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
NO_HZ FULL will not work with unstable sched clock
CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: events flush_to_ldisc
 8162a0c7 88011f583e88 814e6ba0 0002
 88011f583ed8 88011f583ec8 8104d095 88011f583eb8
 0003 0001 0001
Call Trace:
 IRQ [814e6ba0] dump_stack+0x4f/0x7b
 [8104d095] warn_slowpath_common+0x85/0xc0
 [8104d146] warn_slowpath_fmt+0x46/0x50
 [810bd2a9] can_stop_full_tick+0xb9/0x290
 [810bd9ed] tick_nohz_irq_exit+0x8d/0xb0
 [810511c5] irq_exit+0xc5/0x130
 [814f180a] smp_apic_timer_interrupt+0x4a/0x60
 [814eff5e] apic_timer_interrupt+0x6e/0x80
 EOI [814ee5d1] ? _raw_spin_unlock_irqrestore+0x31/0x60
 [8108bbc8] __wake_up+0x48/0x60
 [8134836c] n_tty_receive_buf_common+0x49c/0xba0
 [8134a6bf] ? tty_ldisc_ref+0x1f/0x70
 [81348a84] n_tty_receive_buf2+0x14/0x20
 [8134b390] flush_to_ldisc+0xe0/0x120
 [81064d05] process_one_work+0x1d5/0x540
 [81064c81] ? process_one_work+0x151/0x540
 [81065191] worker_thread+0x121/0x470
 [81065070] ? process_one_work+0x540/0x540
 [8106b4df] kthread+0xef/0x110
 [8106b3f0] ? __kthread_parkme+0xa0/0xa0
 [814ef4f2] ret_from_fork+0x42/0x70
 [8106b3f0] ? __kthread_parkme+0xa0/0xa0
---[ end trace 06e3507544a38866 ]---

However, it turns out that kvmclock does provide a stable sched_clock callback. So, let the scheduler know this, which in turn makes NOHZ_FULL work in the guest.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
PS: The original author of this patch is Marcelo. I did most of the testing and backported it to an older real-time kernel tree. Works like a charm.
 arch/x86/kernel/kvmclock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 42caaef..4e03921 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -24,6 +24,7 @@
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
 #include <linux/memblock.h>
+#include <linux/sched.h>

 #include <asm/x86_init.h>
 #include <asm/reboot.h>
@@ -265,6 +266,8 @@ void __init kvmclock_init(void)

 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
 		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+
+	set_sched_clock_stable();
 }

 int __init kvm_setup_vsyscall_timeinfo(void)
--
1.9.3
Re: [BUG] Balloon malfunctions with memory hotplug
On Mon, 2 Mar 2015 11:52:34 +0530 Amit Shah amit.s...@redhat.com wrote:

Another important detail is that I *suspect* a very similar bug already exists with 32-bit guests even without memory hotplug: what happens if you assign 6GB to a 32-bit guest without PAE support? I think the same problem we're seeing with memory hotplug will happen, and solution 1 won't fix it, although no one seems to care about 32-bit guests...

Not just 32-bit guests; even 64-bit guests restricted with mem= on the cmdline.

You're right. So, it's an already existing issue that becomes very apparent with memory hotplug.

I know we've discussed this in the past, and I recall virtio-balloon v2 was going to address all of this; sadly I've not kept up to date with it.

Me neither :(
Re: [Qemu-devel] [BUG] Balloon malfunctions with memory hotplug
On Fri, 27 Feb 2015 08:27:00 +0100 Markus Armbruster arm...@redhat.com wrote:

Luiz Capitulino lcapitul...@redhat.com writes:

Hello,

Reproducer:

1. Start QEMU with balloon and memory hotplug support:

   # qemu [...] -m 1G,slots=2,maxmem=2G -balloon virtio

2. Check balloon size:

   (qemu) info balloon
   balloon: actual=1024
   (qemu)

3. Hotplug some memory:

   (qemu) object_add memory-backend-ram,id=mem1,size=1G
   (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

4. This step is _not_ needed to reproduce the problem, but you may need to online the memory manually on Linux so that it becomes available in the guest

5. Check balloon size again:

   (qemu) info balloon
   balloon: actual=1024
   (qemu)

BUG: The guest now has 2GB of memory, but the balloon thinks the guest has 1GB

Impact other than info balloon?

You can only balloon what's reported by info balloon. If the guest was booted with 1GB but you hot added another 6GB, then you'll only be able to balloon 1GB.

One may think that the problem is that the balloon driver is ignoring hotplugged memory. This is not what's happening. If you do balloon your guest, there's nothing stopping the balloon driver in the guest from ballooning hotplugged memory. The problem is that the balloon device in QEMU needs to know the current amount of memory available to the guest.

Before memory hotplug this information was easy to obtain: the current amount of memory available to the guest is the memory the guest was booted with. This value is stored in the ram_size global variable in QEMU, and this is what the balloon device emulation code uses today. However, when memory is hotplugged ram_size is _not_ updated and the balloon device breaks.

I see two possible solutions for this problem:

1. In addition to reading ram_size, the balloon device in QEMU could scan pc-dimm devices to account for hotplugged memory.

This solution was already implemented by zhanghailiang:

http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg02362.html

It works, except that on Linux memory hotplug is a two-step procedure: first memory is inserted, then it has to be onlined from user-space. So, if memory is inserted but not onlined this solution gives the opposite problem: the balloon device will report a larger memory amount than the guest actually has. Can we live with that? I guess not, but I'm open for discussion. If QEMU could be notified when Linux onlines memory, then the problem would be gone. But I guess this can't be done.

2. Modify the balloon driver in the guest to inform the balloon device on the host about the current memory available to the guest. This way, whenever the balloon device in QEMU needs to know the current amount of memory in the guest, it asks the guest. This drops any usage of ram_size in the balloon device.

What happens when the guest lies?

There are two kinds of guests that would lie: a broken guest or a malicious guest. For a malicious guest, the worst case I can think of is that we get the same problem we have today. Not a big deal for the host, I guess. A broken guest has to be fixed :)

However, I'm getting to the conclusion that this solution will complicate things even more and may add a bunch of new problems. I'm leaning towards applying zhanghailiang's series for now.

This series fixes the best case: memory is inserted and the guest uses all of it right away. The worst case is: memory is inserted and the guest doesn't use it. In this case QEMU will allow you to balloon more memory than the guest is using, which can crash the guest. For example, the guest is booted with 1GB and you hot add 6GB, but the guest doesn't use it. info balloon will report 6GB and will allow you to balloon the guest down to 2GB, which will crash the guest. In theory I think this case has always been broken, but in practice it's very hard (almost impossible?) to reproduce in a Linux 64-bit guest, as you'd have to be able to start the guest with more memory than it can recognize.

I'm not completely sure this is feasible though. For example, what happens if the guest reports a memory amount to QEMU and right after this more memory is plugged? Besides, this solution is more complex than solution 1 and won't address older guests.

Another important detail is that I *suspect* a very similar bug already exists with 32-bit guests even without memory hotplug: what happens if you assign 6GB to a 32-bit guest without PAE support? I think the same problem we're seeing with memory hotplug will happen, and solution 1 won't fix it, although no one seems to care about 32-bit guests...

Fun...
Re: [BUG] Balloon malfunctions with memory hotplug
On Fri, 27 Feb 2015 12:09:20 +0800 zhanghailiang zhang.zhanghaili...@huawei.com wrote:

On 2015/2/27 3:26, Luiz Capitulino wrote:

Hello,

Reproducer:

1. Start QEMU with balloon and memory hotplug support:

   # qemu [...] -m 1G,slots=2,maxmem=2G -balloon virtio

2. Check balloon size:

   (qemu) info balloon
   balloon: actual=1024
   (qemu)

3. Hotplug some memory:

   (qemu) object_add memory-backend-ram,id=mem1,size=1G
   (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

4. This step is _not_ needed to reproduce the problem, but you may need to online the memory manually on Linux so that it becomes available in the guest

5. Check balloon size again:

   (qemu) info balloon
   balloon: actual=1024
   (qemu)

BUG: The guest now has 2GB of memory, but the balloon thinks the guest has 1GB

One may think that the problem is that the balloon driver is ignoring hotplugged memory. This is not what's happening. If you do balloon your guest, there's nothing stopping the balloon driver in the guest from ballooning hotplugged memory. The problem is that the balloon device in QEMU needs to know the current amount of memory available to the guest.

Before memory hotplug this information was easy to obtain: the current amount of memory available to the guest is the memory the guest was booted with. This value is stored in the ram_size global variable in QEMU, and this is what the balloon device emulation code uses today. However, when memory is hotplugged ram_size is _not_ updated and the balloon device breaks.

I see two possible solutions for this problem:

1. In addition to reading ram_size, the balloon device in QEMU could scan pc-dimm devices to account for hotplugged memory.

This solution was already implemented by zhanghailiang:

http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg02362.html

It works, except that on Linux memory hotplug is a two-step procedure: first memory is inserted, then it has to be onlined from user-space. So, if memory is inserted but not onlined this solution gives the opposite problem: the balloon device will report a larger memory amount than the guest actually has. Can we live with that? I guess not, but I'm open for discussion. If QEMU could be notified when Linux onlines memory, then the problem would be gone. But I guess this can't be done.

Yes, it is really a problem: balloon can't work well with memory block online/offline now. virtio-balloon can't be notified when a memory block is onlined/offlined now; actually, we can add this capability by using the existing kernel memory hotplug/unplug notifier mechanism (just a simple register_memory_notifier()).

I'm leaning towards applying your series now and doing anything that involves the guest on top, if needed.

2. Modify the balloon driver in the guest to inform the balloon device on the host about the current memory available to the guest. This way, whenever the balloon device in QEMU needs to know the current amount of memory in the guest, it asks the guest. This drops any usage of ram_size in the balloon device.

I'm not completely sure this is feasible though. For example, what happens if the guest reports a memory amount to QEMU and right after this more memory is plugged?

Hmm, I wonder why we notify the number of pages which should be adjusted to virtio-balloon; why not the memory 'target' size? Is there any special reason?

I don't know either. I guess it's just how the balloon feature was designed.

For a Linux guest, it can always know exactly its current real memory size, but QEMU may not, because the guest can online/offline memory blocks by itself. If virtio-balloon in the guest knew the balloon's 'target' size, it could calculate the exact memory size that should be adjusted, and could also take the corresponding action (fill or leak the balloon) when a memory block is onlined/offlined.

I'm not sure this would work, as all sorts of races are possible with memory allocations that may occur during or after the calculation is done. Your series makes the best case work, which is: memory is inserted and the guest uses all of it. Today not even the best case works.

Besides, this solution is more complex than solution 1 and won't address older guests.

Another important detail is that I *suspect* a very similar bug already exists with 32-bit guests even without memory hotplug: what happens if you assign 6GB to a 32-bit guest without PAE support? I think the same problem we're seeing with memory hotplug will happen, and solution 1 won't fix it, although no one seems to care about 32-bit guests...
[BUG] Balloon malfunctions with memory hotplug
Hello,

Reproducer:

1. Start QEMU with balloon and memory hotplug support:

   # qemu [...] -m 1G,slots=2,maxmem=2G -balloon virtio

2. Check balloon size:

   (qemu) info balloon
   balloon: actual=1024
   (qemu)

3. Hotplug some memory:

   (qemu) object_add memory-backend-ram,id=mem1,size=1G
   (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

4. This step is _not_ needed to reproduce the problem, but you may need to online the memory manually on Linux so that it becomes available in the guest

5. Check balloon size again:

   (qemu) info balloon
   balloon: actual=1024
   (qemu)

BUG: The guest now has 2GB of memory, but the balloon thinks the guest has 1GB

One may think that the problem is that the balloon driver is ignoring hotplugged memory. This is not what's happening. If you do balloon your guest, there's nothing stopping the balloon driver in the guest from ballooning hotplugged memory. The problem is that the balloon device in QEMU needs to know the current amount of memory available to the guest.

Before memory hotplug this information was easy to obtain: the current amount of memory available to the guest is the memory the guest was booted with. This value is stored in the ram_size global variable in QEMU, and this is what the balloon device emulation code uses today. However, when memory is hotplugged ram_size is _not_ updated and the balloon device breaks.

I see two possible solutions for this problem:

1. In addition to reading ram_size, the balloon device in QEMU could scan pc-dimm devices to account for hotplugged memory.

This solution was already implemented by zhanghailiang:

http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg02362.html

It works, except that on Linux memory hotplug is a two-step procedure: first memory is inserted, then it has to be onlined from user-space. So, if memory is inserted but not onlined this solution gives the opposite problem: the balloon device will report a larger memory amount than the guest actually has. Can we live with that? I guess not, but I'm open for discussion. If QEMU could be notified when Linux onlines memory, then the problem would be gone. But I guess this can't be done.

2. Modify the balloon driver in the guest to inform the balloon device on the host about the current memory available to the guest. This way, whenever the balloon device in QEMU needs to know the current amount of memory in the guest, it asks the guest. This drops any usage of ram_size in the balloon device.

I'm not completely sure this is feasible though. For example, what happens if the guest reports a memory amount to QEMU and right after this more memory is plugged? Besides, this solution is more complex than solution 1 and won't address older guests.

Another important detail is that I *suspect* a very similar bug already exists with 32-bit guests even without memory hotplug: what happens if you assign 6GB to a 32-bit guest without PAE support? I think the same problem we're seeing with memory hotplug will happen, and solution 1 won't fix it, although no one seems to care about 32-bit guests...
[PATCH] README: add information about memory usage
I got a report of someone trying to run the tests with a large amount of RAM (4GB), which broke the guest, as the free_memory() function (called by setup_vm()) will override the PCI hole. Let's document the memory constraints so that people don't do that.

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 README | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README b/README
index db525e3..0f5d810 100644
--- a/README
+++ b/README
@@ -18,6 +18,10 @@ This invocation runs the msr test case. The test outputs to stdio.
 Using qemu (supported since qemu 1.3):
 qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio -device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel ./x86/msr.flat

+Note that it's not necessary to specify the -m option to qemu. The default
+memory size is enough. Actually, the tests infrastructure doesn't support too
+much RAM anyway, so specifying a large amount of RAM may break it.
+
 Or use a runner script to detect the correct invocation:
 ./x86-run ./x86/msr.flat
 To select a specific qemu binary, specify the QEMU=path environment:
--
1.8.1.4
Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue
On Mon, 20 Jan 2014 12:43:45 +1030 Rusty Russell ru...@rustcorp.com.au wrote:

Luiz Capitulino lcapitul...@redhat.com writes:

On Fri, 17 Jan 2014 09:10:47 +1030 Rusty Russell ru...@rustcorp.com.au wrote:

Luiz Capitulino lcapitul...@redhat.com writes:

From: Luiz capitulino lcapitul...@redhat.com

This commit adds support for a new virtqueue called the message virtqueue.

OK, this needs a lot of thought (especially since reworking the virtio balloon is on the TODO list for the OASIS virtio technical committee...)

But AFAICT this is a way of explicitly saying "no" to the host's target (vs the implicit method of just not meeting the target). I'm not sure that gives enough information to the host. On the other hand, I'm not sure what information we *can* give. Should we instead be counter-proposing a target?

The problem is how to estimate a target value. I found it simpler to just try to obey what the host is asking for (and fail if not possible) than trying to make the guest negotiate with the host.

Understood, but we already do this the other way, where the host tells the guest how much memory to give up.

And is a guest expected to automatically inflate the balloon even if the host doesn't want the memory, or wait to be asked?

In my current design the guest always waits to be asked. Actually, all automatic inflates and deflates are started by the host. An advantage of this approach is that all the logic stays on the host, which keeps things simple, as it matches the current balloon design.

Btw, you asked about what information we could provide to the host and I forgot something important. The vmpressure notification (take a look at patch 1/4 and patch 2/4) does provide information that could be interesting to have in the host: a pressure level (low, medium or critical) which is part of the kernel's ABI with user-space.

I'm thinking about using this pressure level information to implement different policies for automatic ballooning, which could be set by the user or a management tool.

What does qemu do with this information?

There are two possible scenarios:

1. The balloon driver is currently inflating when it gets under pressure. QEMU resets num_pages to the current balloon size. This cancels the on-going inflate.

2. The balloon driver is not inflating, e.g. it's possibly sleeping. QEMU issues a deflate.

But note that those scenarios are not supposed to be used with the current device; they are part of the automatic ballooning feature. I CC'ed you on the QEMU patch, you can find it here in case you didn't see it:

http://marc.info/?l=kvm&m=138988966315690&w=2

Yes, caught that after I replied; I should have checked first.

It seems like we are still figuring out how to do ballooning. The best approach in cases like this is to make it simple, so I don't hate this.

Good news to me :)

But note that Daniel Kiper and I have been discussing a new virtio balloon for draft 2 of the standard. I'll CC you when I post that to one of the OASIS virtio mailing lists.

Thank you, I appreciate that.
[RFC PATCH 0/4] virtio_balloon: add pressure notification via a new virtqueue
The pressure notification added to the virtio-balloon driver by this series is going to be used by the host (QEMU, in this case) to implement automatic ballooning support. More details in patch 3/4 and patch 4/4.

Patch 1/4 adds in-kernel vmpressure support and is not really part of this series; it's added here for the convenience of anyone who wants to try automatic ballooning. Patch 2/4 is a hack to make in-kernel vmpressure work for something not related to cgroups; I'll improve it in later versions.

Glauber Costa (1):
  vmpressure: in-kernel notifications

Luiz capitulino (3):
  vmpressure in-kernel: hack
  virtio_balloon: add pressure notification via a new virtqueue
  virtio_balloon: skip inflation if guest is under pressure

 drivers/virtio/virtio_balloon.c     | 100
 include/linux/vmpressure.h          |   6 +++
 include/uapi/linux/virtio_balloon.h |   1 +
 mm/vmpressure.c                     |  58 +++--
 4 files changed, 151 insertions(+), 14 deletions(-)

--
1.8.1.4
[RFC PATCH 2/4] vmpressure in-kernel: hack
From: Luiz capitulino lcapitul...@redhat.com

1. Allow drivers to register private data
2. Allow drivers to pass css=NULL
3. Pass level to the callback

Signed-off-by: Luiz capitulino lcapitul...@redhat.com
---
 include/linux/vmpressure.h |  3 ++-
 mm/vmpressure.c            | 13 +
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 9102e53..de416b6 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -42,7 +42,8 @@ extern int vmpressure_register_event(struct cgroup_subsys_state *css,
 				     struct eventfd_ctx *eventfd,
 				     const char *args);
 extern int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
-					    void (*fn)(void));
+					    void (*fn)(void *data, int level),
+					    void *data);
 extern void vmpressure_unregister_event(struct cgroup_subsys_state *css,
 					struct cftype *cft,
 					struct eventfd_ctx *eventfd);
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 730e7c1..4ed0e85 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -132,9 +132,10 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned,
 struct vmpressure_event {
 	union {
 		struct eventfd_ctx *efd;
-		void (*fn)(void);
+		void (*fn)(void *data, int level);
 	};
 	enum vmpressure_levels level;
+	void *data;
 	bool kernel_event;
 	struct list_head node;
 };
@@ -152,7 +153,7 @@ static bool vmpressure_event(struct vmpressure *vmpr,
 	list_for_each_entry(ev, &vmpr->events, node) {
 		if (ev->kernel_event) {
-			ev->fn();
+			ev->fn(ev->data, level);
 		} else if (vmpr->notify_userspace && level >= ev->level) {
 			eventfd_signal(ev->efd, 1);
 			signalled = true;
@@ -352,21 +353,25 @@ int vmpressure_register_event(struct cgroup_subsys_state *css,
  * well-defined cgroup aware interface.
  */
 int vmpressure_register_kernel_event(struct cgroup_subsys_state *css,
-				     void (*fn)(void))
+				     void (*fn)(void *data, int level), void *data)
 {
-	struct vmpressure *vmpr = css_to_vmpressure(css);
+	struct vmpressure *vmpr;
 	struct vmpressure_event *ev;

+	vmpr = css ? css_to_vmpressure(css) : memcg_to_vmpressure(NULL);
+
 	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
 	if (!ev)
 		return -ENOMEM;

 	ev->kernel_event = true;
+	ev->data = data;
 	ev->fn = fn;

 	mutex_lock(&vmpr->events_lock);
 	list_add(&ev->node, &vmpr->events);
 	mutex_unlock(&vmpr->events_lock);
+
 	return 0;
 }
--
1.8.1.4
[RFC PATCH 1/4] vmpressure: in-kernel notifications
From: Glauber Costa glom...@openvz.org During the past weeks, it became clear to us that the shrinker interface we have right now works very well for some particular types of users, but not that well for others. The latter are usually people interested in one-shot notifications, that were forced to adapt themselves to the count+scan behavior of shrinkers. To do so, they had no choice than to greatly abuse the shrinker interface producing little monsters all over. During LSF/MM, one of the proposals that popped out during our session was to reuse Anton Voronstsov's vmpressure for this. They are designed for userspace consumption, but also provide a well-stablished, cgroup-aware entry point for notifications. This patch extends that to also support in-kernel users. Events that should be generated for in-kernel consumption will be marked as such, and for those, we will call a registered function instead of triggering an eventfd notification. Please note that due to my lack of understanding of each shrinker user, I will stay away from converting the actual users, you are all welcome to do so. 
Signed-off-by: Glauber Costa glom...@openvz.org Signed-off-by: Vladimir Davydov vdavy...@parallels.com Acked-by: Anton Vorontsov an...@enomsg.org Acked-by: Pekka Enberg penb...@kernel.org Reviewed-by: Greg Thelen gthe...@google.com Cc: Dave Chinner dchin...@redhat.com Cc: John Stultz john.stu...@linaro.org Cc: Andrew Morton a...@linux-foundation.org Cc: Joonsoo Kim iamjoonsoo@lge.com Cc: Michal Hocko mho...@suse.cz Cc: Kamezawa Hiroyuki kamezawa.hir...@jp.fujitsu.com Cc: Johannes Weiner han...@cmpxchg.org Signed-off-by: Luiz capitulino lcapitul...@redhat.com --- include/linux/vmpressure.h | 5 + mm/vmpressure.c| 53 +++--- 2 files changed, 55 insertions(+), 3 deletions(-) diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h index 3f3788d..9102e53 100644 --- a/include/linux/vmpressure.h +++ b/include/linux/vmpressure.h @@ -19,6 +19,9 @@ struct vmpressure { /* Have to grab the lock on events traversal or modifications. */ struct mutex events_lock; + /* False if only kernel users want to be notified, true otherwise. 
*/ + bool notify_userspace; + struct work_struct work; }; @@ -38,6 +41,8 @@ extern int vmpressure_register_event(struct cgroup_subsys_state *css, struct cftype *cft, struct eventfd_ctx *eventfd, const char *args); +extern int vmpressure_register_kernel_event(struct cgroup_subsys_state *css, + void (*fn)(void)); extern void vmpressure_unregister_event(struct cgroup_subsys_state *css, struct cftype *cft, struct eventfd_ctx *eventfd); diff --git a/mm/vmpressure.c b/mm/vmpressure.c index e0f6283..730e7c1 100644 --- a/mm/vmpressure.c +++ b/mm/vmpressure.c @@ -130,8 +130,12 @@ static enum vmpressure_levels vmpressure_calc_level(unsigned long scanned, } struct vmpressure_event { - struct eventfd_ctx *efd; + union { + struct eventfd_ctx *efd; + void (*fn)(void); + }; enum vmpressure_levels level; + bool kernel_event; struct list_head node; }; @@ -147,12 +151,15 @@ static bool vmpressure_event(struct vmpressure *vmpr, mutex_lock(vmpr-events_lock); list_for_each_entry(ev, vmpr-events, node) { - if (level = ev-level) { + if (ev-kernel_event) { + ev-fn(); + } else if (vmpr-notify_userspace level = ev-level) { eventfd_signal(ev-efd, 1); signalled = true; } } + vmpr-notify_userspace = false; mutex_unlock(vmpr-events_lock); return signalled; @@ -222,7 +229,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, * we account it too. */ if (!(gfp (__GFP_HIGHMEM | __GFP_MOVABLE | __GFP_IO | __GFP_FS))) - return; + goto schedule; /* * If we got here with no pages scanned, then that is an indicator @@ -239,8 +246,15 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, vmpr-scanned += scanned; vmpr-reclaimed += reclaimed; scanned = vmpr-scanned; + /* +* If we didn't reach this point, only kernel events will be triggered. +* It is the job of the worker thread to clean this up once the +* notifications are all delivered. 
+*/ + vmpr->notify_userspace = true; spin_unlock(&vmpr->sr_lock); +schedule: if (scanned < vmpressure_win) return; schedule_work(&vmpr->work); @@ -324,6 +338,39 @@ int vmpressure_register_event(struct cgroup_subsys_state *css
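The level computation that feeds both the new kernel-event path and the existing eventfd path is not shown in the hunks above. A rough userspace model of vmpressure_calc_level() looks like the following; the 60/95 thresholds mirror the kernel defaults of the era, but treat them and the simplified integer math as assumptions, not the exact code:

```c
#include <assert.h>

/* Rough, illustrative model of mm/vmpressure.c's vmpressure_calc_level():
 * "pressure" is the share of scanned pages that reclaim failed to free.
 * Thresholds are assumed from the kernel defaults of that time. */
enum vmpressure_levels { VMPRESSURE_LOW, VMPRESSURE_MEDIUM, VMPRESSURE_CRITICAL };

static enum vmpressure_levels calc_level(unsigned long scanned, unsigned long reclaimed)
{
    unsigned long pressure = 100 - (100 * reclaimed) / scanned;

    if (pressure >= 95)
        return VMPRESSURE_CRITICAL;
    if (pressure >= 60)
        return VMPRESSURE_MEDIUM;
    return VMPRESSURE_LOW;
}
```

A caller such as vmpressure() would compute this once per window of scanned pages and then dispatch kernel callbacks and eventfd signals as in the diff above.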
[RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue
From: Luiz capitulino lcapitul...@redhat.com This commit adds support for a new virtqueue called the message virtqueue. The message virtqueue can be used by guests to notify the host about important memory-related state changes in the guest. Currently, the only implemented notification is the "guest is under pressure" one, which informs the host that the guest is under memory pressure. This notification can be used to implement automatic memory ballooning in the host. For example, once learning that the guest is under pressure, the host could cancel an on-going inflate and/or start a deflate operation. Doing this through a virtqueue might seem like overkill, as all we're doing is transferring an integer between guest and host. However, using a virtqueue offers the following advantages: 1. We can reliably synchronize host and guest. That is, the guest will only continue once the host acknowledges the notification. This is important, because if the guest gets under pressure while inflating the balloon, it has to stop to give the host a chance to reset num_pages (see next commit) 2. It's extensible. We may (or may not) want to tell the host which pressure level the guest finds itself in (i.e. low, medium or critical) The lightweight alternative is to use a configuration space parameter. For this to work, though, the guest would have to wait for the host to acknowledge the receipt of a configuration change update. I could try this if the virtqueue is too much overkill. Finally, the guest learns it's under pressure by registering a callback with the in-kernel vmpressure API. 
FIXMEs: - vmpressure API is missing a de-registration routine - not completely sure my changes in virtballoon_probe() are correct Signed-off-by: Luiz capitulino lcapitul...@redhat.com --- drivers/virtio/virtio_balloon.c | 93 + include/uapi/linux/virtio_balloon.h | 1 + 2 files changed, 84 insertions(+), 10 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 5c4a95b..1c3ee71 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -29,6 +29,9 @@ #include <linux/module.h> #include <linux/balloon_compaction.h> +#include <linux/cgroup.h> +#include <linux/vmpressure.h> + /* * Balloon device works in 4K page units. So each page is pointed to by * multiple balloon pages. All memory counters in this driver are in balloon @@ -37,10 +40,12 @@ #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE >> VIRTIO_BALLOON_PFN_SHIFT) #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256 +#define VIRTIO_BALLOON_MSG_PRESSURE 1 + struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *message_vq; /* Where the ballooning thread waits for config to change. */ wait_queue_head_t config_change; @@ -51,6 +56,8 @@ struct virtio_balloon /* Waiting for host to ack the pages we released. */ wait_queue_head_t acked; + wait_queue_head_t message_acked; + /* Number of balloon pages we've told the Host we're not using. 
*/ unsigned int num_pages; /* @@ -71,6 +78,9 @@ struct virtio_balloon /* Memory statistics */ int need_stats_update; struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR]; + + /* Message virtqueue */ + atomic_t guest_pressure; }; static struct virtio_device_id id_table[] = { @@ -78,6 +88,41 @@ static struct virtio_device_id id_table[] = { { 0 }, }; +static inline bool guest_under_pressure(const struct virtio_balloon *vb) +{ + return atomic_read(&vb->guest_pressure) == 1; +} + +static void vmpressure_event_handler(void *data, int level) +{ + struct virtio_balloon *vb = data; + + atomic_set(&vb->guest_pressure, 1); + wake_up(&vb->config_change); +} + +static void tell_host_pressure(struct virtio_balloon *vb) +{ + const uint32_t msg = VIRTIO_BALLOON_MSG_PRESSURE; + struct scatterlist sg; + unsigned int len; + int err; + + sg_init_one(&sg, &msg, sizeof(msg)); + + err = virtqueue_add_outbuf(vb->message_vq, &sg, 1, vb, GFP_KERNEL); + if (err < 0) { + printk(KERN_WARNING "virtio-balloon: failed to send host message (%d)\n", err); + goto out; + } + virtqueue_kick(vb->message_vq); + + wait_event(vb->message_acked, virtqueue_get_buf(vb->message_vq, &len)); + +out: + atomic_set(&vb->guest_pressure, 0); +} + static u32 page_to_balloon_pfn(struct page *page) { unsigned long pfn = page_to_pfn(page); @@ -100,6 +145,13 @@ static void balloon_ack(struct virtqueue *vq) wake_up(&vb->acked); } +static void message_ack(struct virtqueue *vq) +{ + struct virtio_balloon *vb = vq->vdev->priv; + + wake_up(&vb->message_acked); +} + static void tell_host(struct virtio_balloon *vb
[RFC PATCH 4/4] virtio_balloon: skip inflation if guest is under pressure
From: Luiz capitulino lcapitul...@redhat.com This is necessary for automatic ballooning. If the guest gets under pressure while there's an on-going inflation operation, we want the guest to do the following: 1. Stop on-going inflation 2. Notify the host we're under pressure 3. Wait for host's acknowledge While the guest is waiting the host has the opportunity to reset num_pages to a value before the guest got into pressure. This will cancel current inflation. Signed-off-by: Luiz capitulino lcapitul...@redhat.com --- drivers/virtio/virtio_balloon.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 1c3ee71..7f5b7d2 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -188,8 +188,13 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num) mutex_lock(vb-balloon_lock); for (vb-num_pfns = 0; vb-num_pfns num; vb-num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { - struct page *page = balloon_page_enqueue(vb_dev_info); + struct page *page; + if (guest_under_pressure(vb)) { + break; + } + + page = balloon_page_enqueue(vb_dev_info); if (!page) { dev_info_ratelimited(vb-vdev-dev, Out of puff! Can't get %u pages\n, -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH] balloon: add automatic ballooning support
of three runs. We measure host swap I/O, QEMU as a host process and perf. info from the guest. Units:
- swap in/out: number of pages swapped
- Elapsed, user, sys: seconds
- total recs: total number of ebizzy records/s. This is a sum of all ebizzy runs for a VM

vanilla
===

Host
swap in: 36478.66 swap out: 372551.0

QEMU (as a process in the host)
---
         Elapsed  user    sys   CPU%  major f.  minor f.   total recs  swap in  swap out
vm-down: 395.42   309.60  3.72  79    2772.66   120046.66  4692.33     0        0
vm-up:   396.40   310     4.04  79    2053.66   208394.33  4684        0        0

Guest (ebizzy run in the guest)
---
         total recs  swap in  swap out
vm-down: 4692.33     0        0
vm-up:   4684        0        0

automatic balloon
=

Host
swap in: 2.66 swap out: 8225.33

QEMU (as a process in the host)
---
         Elapsed  user    sys   CPU%  major f.  minor f.  total recs  swap in  swap out
vm-down: 387.95   309.66  3.43  80    106.66    29497.33  4710.66     0        0
vm-up:   388.79   310.98  4.35  81    63.66     110307    4704.33     2.67     822.66

Guest (ebizzy run in the guest)
---
         total recs  swap in  swap out
vm-down: 4710.66     0        0
vm-up:   4704.33     2.67     822.66

Some conclusions:
- The number of pages swapped in the host and the number of QEMU's major faults is hugely reduced by automatic balloon
- Elapsed time is also better for the automatic balloon VMs; vm-down run time was 1.89% lower and vm-up 1.92% lower
- The records/s is about the same for both, which I guess means automatic balloon is not regressing this
- vm-up did swap a bit, not sure if this is a problem

Now the code, and I think I deserve a coffee after having written all this stuff... 
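As a quick sanity check on the elapsed-time percentages quoted above (the helper name is mine, not the benchmark's):

```c
#include <assert.h>

/* Verify the elapsed-time improvements claimed in the conclusions:
 * vm-down: 395.42 s -> 387.95 s, vm-up: 396.40 s -> 388.79 s. */
static double pct_lower(double vanilla, double autoballoon)
{
    return (vanilla - autoballoon) * 100.0 / vanilla;
}
```

Both results land at roughly 1.89% and 1.92%, matching the numbers in the conclusions.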
Signed-off-by: Luiz capitulino lcapitul...@redhat.com --- hw/virtio/virtio-balloon.c | 180 + hw/virtio/virtio-pci.c | 5 ++ hw/virtio/virtio-pci.h | 2 + include/hw/virtio/virtio-balloon.h | 21 - 4 files changed, 207 insertions(+), 1 deletion(-) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index d9754db..3b3b6d2 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -31,6 +31,139 @@ #include "hw/virtio/virtio-bus.h" +#define LINUX_MEMCG_DEF_PATH "/sys/fs/cgroup/memory" +#define AUTO_BALLOON_NR_PAGES ((32 * 1024 * 1024) >> VIRTIO_BALLOON_PFN_SHIFT) +#define AUTO_BALLOON_PRESSURE_PERIOD 60 + +void virtio_balloon_set_conf(DeviceState *dev, const VirtIOBalloonConf *bconf) +{ +VirtIOBalloon *s = VIRTIO_BALLOON(dev); +memcpy(&(s->bconf), bconf, sizeof(struct VirtIOBalloonConf)); +} + +static bool auto_balloon_enabled_cmdline(const VirtIOBalloon *s) +{ +return s->bconf.auto_balloon_enabled; +} + +static bool guest_in_pressure(const VirtIOBalloon *s) +{ +time_t t = s->autob_last_guest_pressure; +return difftime(time(NULL), t) <= AUTO_BALLOON_PRESSURE_PERIOD; +} + +static void inflate_guest(VirtIOBalloon *s) +{ +if (guest_in_pressure(s)) { +return; +} + +s->num_pages += AUTO_BALLOON_NR_PAGES; +virtio_notify_config(VIRTIO_DEVICE(s)); +} + +static void deflate_guest(VirtIOBalloon *s) +{ +if (!s->autob_cur_size) { +return; +} + +s->num_pages -= AUTO_BALLOON_NR_PAGES; +virtio_notify_config(VIRTIO_DEVICE(s)); +} + +static void virtio_balloon_handle_host_pressure(EventNotifier *ev) +{ +VirtIOBalloon *s = container_of(ev, VirtIOBalloon, event); + +if (!event_notifier_test_and_clear(ev)) { +fprintf(stderr, "virtio-balloon: failed to drain the notify pipe\n"); +return; +} + +inflate_guest(s); +} + +static void register_vmpressure(int cfd, int efd, int lfd, Error **errp) +{ +char *p; +ssize_t ret; + +p = g_strdup_printf("%d %d low", efd, lfd); +ret = write(cfd, p, strlen(p)); +if (ret < 0) { +error_setg_errno(errp, errno, "failed to write to control fd: %d", cfd); 
+} else { +g_assert(ret == strlen(p)); /* XXX: this should be always true, right? */ +} + +g_free(p); +} + +static int open_file_in_dir(const char *dir_path, const char *file, mode_t mode, +Error **errp) +{ +char *p; +int fd; + +p = g_strjoin("/", dir_path, file, NULL); +fd = qemu_open(p, mode); +if (fd < 0) { +error_setg_errno(errp, errno, "can't open '%s'", p); +} + +g_free(p); +return fd; +} + +static void automatic_balloon_init(VirtIOBalloon *s, const char *memcg_path, + Error **errp) +{ +Error *local_err = NULL; +int ret; + +if (!memcg_path) { +memcg_path = LINUX_MEMCG_DEF_PATH; +} + +s->lfd = open_file_in_dir(memcg_path, "memory.pressure_level", O_RDONLY
Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue
On Fri, 17 Jan 2014 09:10:47 +1030 Rusty Russell ru...@rustcorp.com.au wrote: Luiz Capitulino lcapitul...@redhat.com writes: From: Luiz capitulino lcapitul...@redhat.com This commit adds support to a new virtqueue called message virtqueue. OK, this needs a lot of thought (especially since reworking the virtio balloon is on the TODO list for the OASIS virtio technical committee...) But AFAICT this is a way of explicitly saying no to the host's target (vs the implicit method of just not meeting the target). I'm not sure that gives enough information to the host. On the other hand, I'm not sure what information we *can* give. Should we instead be counter-proposing a target? The problem is how to estimate a target value. I found it simpler to just try to obey what the host is asking for (and fail if not possible) than trying to make the guest negotiate with the host. What does qemu do with this information? There are two possible scenarios: 1. The balloon driver is currently inflating when it gets under pressure QEMU resets num_pages to the current balloon size. This cancels the on-going inflate 2. The balloon driver is not inflating, eg. it's possibly sleeping QEMU issues a deflate But note that those scenarios are not supposed to be used with the current device, they are part of the automatic ballooning feature. I CC'ed you on the QEMU patch, you can find it here in case you didn't see it: http://marc.info/?l=kvmm=138988966315690w=2 Thanks, Rusty. The message virtqueue can be used by guests to notify the host about important memory-related state changes in the guest. Currently, the only implemented notification is the guest is under pressure one, which informs the host that the guest is under memory pressure. This notification can be used to implement automatic memory ballooning in the host. For example, once learning that the guest is under pressure, the host could cancel an on-going inflate and/or start a deflate operation. 
Doing this through a virtqueue might seem like overkill, as all we're doing is to transfer an integer between guest and host. However, using a virtqueue offers the following advantages: 1. We can realibly synchronize host and guest. That is, the guest will only continue once the host acknowledges the notification. This is important, because if the guest gets under pressure while inflating the balloon, it has to stop to give the host a chance to reset num_pages (see next commit) 2. It's extensible. We may (or may not) want to tell the host which pressure level the guest finds itself in (ie. low, medium or critical) The lightweight alternative is to use a configuration space parameter. For this to work though, the guest would have to wait the for host to acknowloedge the receipt of a configuration change update. I could try this if the virtqueue is too overkill. Finally, the guest learns it's under pressure by registering a callback with the in-kernel vmpressure API. FIXMEs: - vmpressure API is missing an de-registration routine - not completely sure my changes in virtballoon_probe() are correct Signed-off-by: Luiz capitulino lcapitul...@redhat.com --- drivers/virtio/virtio_balloon.c | 93 + include/uapi/linux/virtio_balloon.h | 1 + 2 files changed, 84 insertions(+), 10 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 5c4a95b..1c3ee71 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -29,6 +29,9 @@ #include linux/module.h #include linux/balloon_compaction.h +#include linux/cgroup.h +#include linux/vmpressure.h + /* * Balloon device works in 4K page units. So each page is pointed to by * multiple balloon pages. 
All memory counters in this driver are in balloon @@ -37,10 +40,12 @@ #define VIRTIO_BALLOON_PAGES_PER_PAGE (unsigned)(PAGE_SIZE VIRTIO_BALLOON_PFN_SHIFT) #define VIRTIO_BALLOON_ARRAY_PFNS_MAX 256 +#define VIRTIO_BALLOON_MSG_PRESSURE 1 + struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *message_vq; /* Where the ballooning thread waits for config to change. */ wait_queue_head_t config_change; @@ -51,6 +56,8 @@ struct virtio_balloon /* Waiting for host to ack the pages we released. */ wait_queue_head_t acked; + wait_queue_head_t message_acked; + /* Number of balloon pages we've told the Host we're not using. */ unsigned int num_pages; /* @@ -71,6 +78,9 @@ struct virtio_balloon /* Memory statistics */ int need_stats_update; struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR]; + + /* Message virtqueue
Re: [RFC PATCH 3/4] virtio_balloon: add pressure notification via a new virtqueue
On Thu, 16 Jan 2014 20:38:19 -0500 Luiz Capitulino lcapitul...@redhat.com wrote: What does qemu do with this information? There are two possible scenarios: 1. The balloon driver is currently inflating when it gets under pressure QEMU resets num_pages to the current balloon size. This cancels the on-going inflate 2. The balloon driver is not inflating, eg. it's possibly sleeping QEMU issues a deflate But note that those scenarios are not supposed to be used with the current device, they are part of the automatic ballooning feature. I CC'ed you on the QEMU patch, you can find it here case you didn't see it: http://marc.info/?l=kvmm=138988966315690w=2 By current device I meant outside of automatic ballooning scope. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] virtio_balloon: update_balloon_size(): update correct field
According to the virtio spec, the device configuration field that should be updated after an inflation or deflation operation is the 'actual' field, not the 'num_pages' one. Commit 855e0c5288177bcb193f6f6316952d2490478e1c swapped them in update_balloon_size(). Fix it. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- drivers/virtio/virtio_balloon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index c444654..5c4a95b 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -285,7 +285,7 @@ static void update_balloon_size(struct virtio_balloon *vb) { __le32 actual = cpu_to_le32(vb-num_pages); - virtio_cwrite(vb-vdev, struct virtio_balloon_config, num_pages, + virtio_cwrite(vb-vdev, struct virtio_balloon_config, actual, actual); } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
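To make the contract explicit: in virtio-balloon's config space, num_pages is the host-written target and actual is the guest-written report. A minimal model of the fixed update_balloon_size() (a plain struct instead of virtio_cwrite(), endianness ignored) would look like:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the config-space contract restored by this patch:
 * 'num_pages' is host->guest (the requested target) and 'actual' is
 * guest->host (the achieved size), so update_balloon_size() must write
 * 'actual' only.  The bug wrote 'num_pages' instead. */
struct balloon_config {
    uint32_t num_pages;   /* written by the host */
    uint32_t actual;      /* written by the guest */
};

static void update_balloon_size(struct balloon_config *cfg, uint32_t vb_num_pages)
{
    cfg->actual = vb_num_pages;   /* the correct field per the virtio spec */
}
```

With the bug, the guest clobbered the host's own target field while the host kept reading a stale 'actual'.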
Re: [Qemu-devel] x86-64 apic panic on shutdown on 1.4.93.
On Wed, 26 Jun 2013 00:52:31 -0500 Rob Landley r...@landley.net wrote: I intermittently get this from current kernels running under currentish qemu-git. Look familiar to anybody? Which kernel do you run in the host? Is the guest doing anything special? reboot: machine restart general protection fault: fff2 [#1] CPU: 0 PID: 44 Comm: oneit Not tainted 3.10.0-rc7+ #3 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: 8800068fd500 ti: 880006a26000 task.ti: 880006a26000 RIP: 0010:[81014441] [81014441] lapic_shutdown+0x32/0x34 RSP: 0018:880006a27e28 EFLAGS: 0202 RAX: 2193fbf9078bfbf9 RBX: 0202 RCX: RDX: 81015f71 RSI: 00ff RDI: 00f0 RBP: fee1dead R08: 0400 R09: R10: R11: 88000699f500 R12: R13: 01234567 R14: 0004 R15: 00423872 FS: () GS:81308000() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 00657ad0 CR3: 0697c000 CR4: 06b0 DR0: DR1: DR2: DR3: DR6: DR7: Stack: 28121969 81013bda 03f9 81013dc5 8102ad4b Call Trace: [81013bda] ? native_machine_shutdown+0x6/0x1a [81013dc5] ? native_machine_restart+0x1d/0x31 [8102ad4b] ? SyS_reboot+0x126/0x15b [810374bc] ? schedule_tail+0x1e/0x44 [8122f57f] ? ret_from_fork+0xf/0xb0 [8122f690] ? system_call_fastpath+0x16/0x1b Code: 53 f6 c4 02 75 1b 31 c0 83 3d af 42 50 00 00 74 0c 31 c0 83 3d b4 42 50 00 00 0f 94 c0 85 c0 74 0a 9c 5b fa e8 88 ff ff ff 53 9d 5b c3 50 e8 11 ec 00 00 e8 d6 2f ff ff 48 8b 05 43 4b 32 00 bf RIP [81014441] lapic_shutdown+0x32/0x34 RSP 880006a27e28 ---[ end trace dd3c376274d1a087 ]--- -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
On Wed, 5 Jun 2013 21:18:37 -0400 Luiz Capitulino lcapitul...@redhat.com wrote: The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0, which in turn seems to generate an invalid virtio request. I didn't get this in practice, I found it by code review. On the other hand, such an invalid virtio request will cause errors in QEMU and fill_balloon() also performs the same check implemented by this commit. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com Acked-by: Rafael Aquini aqu...@redhat.com Andrew, can you pick this one? --- o v2 - Improve changelog drivers/virtio/virtio_balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index bd3ae32..71af7b5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb->deflate_vq); + if (vb->num_pfns != 0) + tell_host(vb, vb->deflate_vq); mutex_unlock(&vb->balloon_lock); release_pages_by_pfn(vb->pfns, vb->num_pfns); }
Re: [PATCH v2] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
On Thu, 6 Jun 2013 11:13:58 -0300 Rafael Aquini aqu...@redhat.com wrote: On Wed, Jun 05, 2013 at 09:18:37PM -0400, Luiz Capitulino wrote: The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0. Which in turn seems to generate an invalid virtio request. I didn't get this in practice, I found it by code review. On the other hand, such an invalid virtio request will cause errors in QEMU and fill_balloon() also performs the same check implemented by this commit. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com Acked-by: Rafael Aquini aqu...@redhat.com --- o v2 - Improve changelog drivers/virtio/virtio_balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index bd3ae32..71af7b5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb-deflate_vq); Luiz, sorry for not being clearer before. I was referring to add a commentary on code, to explain in short words why we should not get rid of this check point. Oh. + if (vb-num_pfns != 0) + tell_host(vb, vb-deflate_vq); mutex_unlock(vb-balloon_lock); If the comment is regarded as unnecessary, then just ignore my suggestion. I'm OK with your patch. :) IMHO, the code is clear enough. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0, which in turn seems to generate an invalid virtio request. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- PS: I didn't get this in practice. I found it by code review. On the other hand, automatic-ballooning was able to put such invalid requests in the virtqueue and QEMU would explode... PPS: Very lightly tested drivers/virtio/virtio_balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index bd3ae32..71af7b5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb->deflate_vq); + if (vb->num_pfns != 0) + tell_host(vb, vb->deflate_vq); mutex_unlock(&vb->balloon_lock); release_pages_by_pfn(vb->pfns, vb->num_pfns); } -- 1.8.1.4
Re: [PATCH] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
On Wed, 5 Jun 2013 18:24:49 -0300 Rafael Aquini aqu...@redhat.com wrote: On Wed, Jun 05, 2013 at 05:10:31PM -0400, Luiz Capitulino wrote: The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0. Which in turn seems to generate an invalid virtio request. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- PS: I didn't get this in practice. I found it by code review. On the other hand, automatic-ballooning was able to put such invalid requests in the virtqueue and QEMU would explode... Nice catch! The patch looks sane and replicates the check done at fill_balloon(). I think we also could use this P.S. as a commentary to let others aware of this scenario. Thanks Luiz! Want me to respin? Acked-by: Rafael Aquini aqu...@redhat.com Thanks for your review! PPS: Very lightly tested drivers/virtio/virtio_balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index bd3ae32..71af7b5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb-deflate_vq); + if (vb-num_pfns != 0) + tell_host(vb, vb-deflate_vq); mutex_unlock(vb-balloon_lock); release_pages_by_pfn(vb-pfns, vb-num_pfns); } -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] virtio_balloon: leak_balloon(): only tell host if we got pages deflated
The balloon_page_dequeue() function can return NULL. If it does for the first page being freed, then leak_balloon() will create a scatter list with len=0, which in turn seems to generate an invalid virtio request. I didn't get this in practice, I found it by code review. On the other hand, such an invalid virtio request will cause errors in QEMU and fill_balloon() also performs the same check implemented by this commit. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com Acked-by: Rafael Aquini aqu...@redhat.com --- o v2 - Improve changelog drivers/virtio/virtio_balloon.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index bd3ae32..71af7b5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -191,7 +191,8 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num) * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST); * is true, we *have* to do it in this order */ - tell_host(vb, vb->deflate_vq); + if (vb->num_pfns != 0) + tell_host(vb, vb->deflate_vq); mutex_unlock(&vb->balloon_lock); release_pages_by_pfn(vb->pfns, vb->num_pfns); } -- 1.8.1.4
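The essence of the fix can be modeled in plain C; the 'available' argument below is a made-up stand-in for balloon_page_dequeue() starting to return NULL, not part of the driver:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the patched leak_balloon(): dequeue up to 'num' pages,
 * stopping early when none are left, and only notify the host when at
 * least one PFN was actually collected -- a zero-length scatterlist
 * would form an invalid virtio request. */
static unsigned int leak_balloon(unsigned int num, unsigned int available,
                                 bool *told_host)
{
    unsigned int num_pfns = num < available ? num : available;

    *told_host = (num_pfns != 0);   /* the check this patch adds */
    return num_pfns;
}
```

Before the patch, the zero-page case still reached tell_host() and queued a zero-length buffer on the deflate virtqueue.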
Re: [PATCH uq/master] fix double free the memslot in kvm_set_phys_mem
On Fri, 31 May 2013 16:52:18 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: Luiz Capitulino reported that the guest refused to boot and qemu complained with: kvm_set_phys_mem: error unregistering overlapping slot: Invalid argument It is caused by commit 235e8982ad, which did a double free for the memslot, so that the second one raises the -EINVAL error. Fix it by resetting the memory size only if it is needed. Reported-by: Luiz Capitulino lcapitul...@redhat.com Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com Tested-by: Luiz Capitulino lcapitul...@redhat.com Thanks Xiao for the fix, and thanks everyone for debugging this issue! --- kvm-all.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 8e7bbf8..405480e 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -206,7 +206,8 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot) if (s->migration_log) { mem.flags |= KVM_MEM_LOG_DIRTY_PAGES; } -if (mem.flags & KVM_MEM_READONLY) { + +if (slot->memory_size && mem.flags & KVM_MEM_READONLY) { /* Set the slot size to 0 before setting the slot to the desired * value. This is needed based on KVM commit 75d61fbc. */ mem.memory_size = 0;
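The corrected condition can be isolated like this; the flag value matches the kvm uapi headers of the time, but treat it as illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the corrected test in kvm_set_user_memory_region(): the
 * "write size 0 first" workaround for read-only slots (KVM commit
 * 75d61fbc) must only run when the slot is actually being set up
 * (memory_size != 0); running it while deleting a slot unregisters the
 * same slot twice, hence the -EINVAL. */
#define KVM_MEM_READONLY (1u << 1)   /* assumed value from the kvm uapi */

static bool needs_zero_size_first(unsigned long slot_memory_size,
                                  unsigned int mem_flags)
{
    return slot_memory_size && (mem_flags & KVM_MEM_READONLY);
}
```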
PATCH] virtio-spec: small English/punctuation corrections
1. s/These are devices are/These devices are 2. s/Thefirst/The first 3. s/, Guest should/. Guest should Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- virtio-spec.lyx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/virtio-spec.lyx b/virtio-spec.lyx index 6e188d0..7e4ce71 100644 --- a/virtio-spec.lyx +++ b/virtio-spec.lyx @@ -116,7 +116,7 @@ description Peripheral Component Interconnect; a common device bus. Seehtt \end_inset devices. - These are devices are found in + These devices are found in \emph on virtual \emph default @@ -1558,7 +1558,7 @@ name sub:Feature-Bits \end_layout \begin_layout Standard -Thefirst configuration field indicates the features that the device supports. +The first configuration field indicates the features that the device supports. The bits are allocated as follows: \end_layout @@ -2919,7 +2919,7 @@ For each ring, guest should then disable interrupts by writing VRING_AVAIL_F_NO_ INTERRUPT flag in avail structure, if required. It can then process used ring entries finally enabling interrupts by clearing the VRING_AVAIL_F_NO_INTERRUPT flag or updating the EVENT_IDX field in - the available structure, Guest should then execute a memory barrier, and + the available structure. Guest should then execute a memory barrier, and then recheck the ring empty condition. This is necessary to handle the case where, after the last check and before enabling interrupts, an interrupt has been suppressed by the device: -- 1.8.1.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC v2 0/2] virtio_balloon: auto-ballooning support
On Thu, 16 May 2013 16:56:34 -0400 Sasha Levin sasha.le...@oracle.com wrote: On 05/09/2013 10:53 AM, Luiz Capitulino wrote: Hi, This series is a respin of automatic ballooning support I started working on last year. Patch 2/2 contains all relevant technical details and performance measurements results. This is in RFC state because it's a work in progress. Hi Luiz, Is there a virtio spec patch I could use to get it implemented on kvmtool? Not yet, this will come with v1. But I got some homework to do before posting it (more perf tests). -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/2] virtio_balloon: auto-ballooning support
On Sun, 12 May 2013 21:49:34 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Sun, May 12, 2013 at 12:36:09PM -0400, Rik van Riel wrote: On 05/12/2013 10:30 AM, Michael S. Tsirkin wrote: On Thu, May 09, 2013 at 10:53:49AM -0400, Luiz Capitulino wrote: Automatic ballooning consists of dynamically adjusting the guest's balloon according to memory pressure in the host and in the guest. This commit implements the guest side of automatic balloning, which basically consists of registering a shrinker callback with the kernel, which will try to deflate the guest's balloon by the amount of pages being requested. The shrinker callback is only registered if the host supports the VIRTIO_BALLOON_F_AUTO_BALLOON feature bit. OK so we have a new feature bit, such that: - if AUTO_BALLOON is set in host, guest can leak a page from a balloon at any time questions left unanswered - what meaning does num_pages have now? as large as we could go I see. This is the reverse of it's current meaning. I would suggest a new field instead of overriding the existing one. I'll write a spec patch as you suggested on irc and will decide what to do from there. - when will the balloon be re-inflated? I believe the qemu changes Luiz wrote address that side, with qemu-kvm getting notifications from the host kernel when there is memory pressure, and shrinking the guest in reaction to that notification. But it's the guest memory pressure we care about: - host asks balloon to inflate later - guest asks balloon to deflate with this patch guest takes priority, balloon deflates. So we should only inflate if guest does not need the memory. Inflate will actually fail if the guest doesn't have memory to fill the balloon. But in any case, and as you said elsewhere in this thread, inflate is not new and could be even done by mngt. So I don't think this should be changed in this patch. I'd like to see a spec patch addressing these questions. Would we ever want to mix the two types of ballooning? 
If yes possibly when we put a page in the balloon we might want to give host a hint this page might be leaked again soon. It might not be the same page, and the host really does not care which page it is. Whether we care depends on what we do with the page. For example, in the future we might care which numa node is used. The automatic inflation happens when the host needs to free up memory. This can be done today by management, with no need to change qemu. So automatic inflate, IMHO, does not need a feature flag. It's the automatic deflate in guest that's new. Makes sense. Automatic inflate is performed by the host. Here are some numbers. The test-case is to run 35 VMs (1G of RAM each) in parallel doing a kernel build. Host has 32GB of RAM and 16GB of swap. SWAP IN and SWAP OUT correspond to the number of pages swapped in and swapped out, respectively.

Auto-ballooning disabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    634      930980   1588522
2    610      627422   1362174
3    649      1079847  1616367
4    543      953289   1635379
5    642      913237   1514000

Auto-ballooning enabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    629      901      12537
2    624      981      18506
3    626      573      9085
4    631      2250     42534
5    627      1610     20808

So what exactly happens here? Much less swap in/out activity, but no gain in total runtime. Doesn't this show there's a bottleneck somewhere? Could be a problem in the implementation? It could also be an issue with the benchmark chosen, which may not have swap as its bottleneck at any point. However, the reduced swapping is still very promising! Isn't this a direct result of inflating the balloon? E.g. just inflating the balloon without the shrinker will make us avoid swap in host. What we would want to check is whether shrinking works as expected, and whether we need to speed up shrinking. As I say above, inflating the balloon is easy. A good benchmark would show how we can deflate and re-inflate it efficiently with demand. I'm going to measure this. Also, what happened with the balloon? Did we end up with balloon completely inflated?
deflated? In my test-case VMs are started with 1G. After the test, almost all of them have between 200-300MB. One question to consider: possibly if we are going to reuse the page in the balloon soon, we might want to bypass notify before use for it? Maybe that will help speed things up. Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- drivers/virtio/virtio_balloon.c | 55 + include/uapi/linux/virtio_balloon.h | 1 + 2 files changed, 56 insertions(+) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index 9d5fe2b..f9dcae8 100644
Re: [RFC 2/2] virtio_balloon: auto-ballooning support
On Mon, 13 May 2013 11:34:41 -0300 Rafael Aquini aqu...@redhat.com wrote: You're right, and the host's member is used to communicate the configured size to the guest's balloon device; however, not changing it when the shrinker causes the balloon to deflate will make the balloon thread wake up again in order to chase the balloon size target, won't it? I checked: I don't see the balloon thread waking up after the shrinker executes in my testing. Maybe that is because it only wakes up when QEMU notifies a config change. But anyway, I'll think about how to improve this as suggested by Michael too, as I seem to be changing num_pages' semantics with respect to the virtio spec. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
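The re-inflation concern in this exchange comes down to how the balloon thread chases the host-configured target. A minimal model of that computation, as a hedged sketch (the function name is mine, not the driver's, and the real driver works in units of 4K pages):

```python
# Model of the balloon thread's target chasing: the driver compares the
# host-requested target (num_pages in the device config space) against
# the pages it currently holds, and inflates (positive result) or
# deflates (negative result) by the difference. Purely illustrative.
def towards_target(config_target, currently_held):
    return config_target - currently_held

# Host raised the target: inflate by 64 pages.
print(towards_target(256, 192))
# Shrinker deflated below the unchanged target: the thread, if woken,
# would chase the balloon back up, which is the concern raised above.
print(towards_target(256, 100))
```

This also matches Luiz's observation that nothing happens in practice: the thread only re-evaluates this difference when QEMU notifies a config change.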
Re: [RFC 2/2] virtio_balloon: auto-ballooning support
On Mon, 13 May 2013 22:02:50 +0300 Michael S. Tsirkin m...@redhat.com wrote: On Mon, May 13, 2013 at 02:25:11PM -0400, Luiz Capitulino wrote: On Mon, 13 May 2013 11:34:41 -0300 Rafael Aquini aqu...@redhat.com wrote: You're right, and the host's member is used to communicate the configured size to the guest's balloon device; however, not changing it when the shrinker causes the balloon to deflate will make the balloon thread wake up again in order to chase the balloon size target, won't it? I checked: I don't see the balloon thread waking up after the shrinker executes in my testing. Maybe that is because it only wakes up when QEMU notifies a config change. Well, that's also a problem. Need some mechanism to re-inflate the balloon In this implementation this is done by QEMU, have you looked at the QEMU patch yet? https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg01295.html when guest memory pressure is down. In this implementation we do this when there's pressure in the host. I expect things to balance over time, and this seems to be what I'm observing in my testing, but of course we need more testing. virtio fs mechanism worth a look? Can you elaborate?
Re: [RFC 1/2] virtio_balloon: move balloon_lock mutex to callers
On Thu, 9 May 2013 18:03:09 -0300 Rafael Aquini aqu...@redhat.com wrote: On Thu, May 09, 2013 at 10:53:48AM -0400, Luiz Capitulino wrote: This commit moves the balloon_lock mutex out of the fill_balloon() and leak_balloon() functions to their callers. The reason for this change is that the next commit will introduce a shrinker callback for the balloon driver, which will also call leak_balloon() but will require different locking semantics.

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bd3ae32..9d5fe2b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -133,7 +133,6 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
-	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		struct page *page = balloon_page_enqueue(vb_dev_info);
@@ -154,7 +153,6 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 	/* Did we get any? */
 	if (vb->num_pfns != 0)
 		tell_host(vb, vb->inflate_vq);
-	mutex_unlock(&vb->balloon_lock);
 }

 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -176,7 +174,6 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
-	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		page = balloon_page_dequeue(vb_dev_info);
@@ -192,7 +189,6 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	 * is true, we *have* to do it in this order */
 	tell_host(vb, vb->deflate_vq);
-	mutex_unlock(&vb->balloon_lock);
 	release_pages_by_pfn(vb->pfns, vb->num_pfns);
 }

@@ -305,11 +301,13 @@ static int balloon(void *_vballoon)
 			 || freezing(current));
 		if (vb->need_stats_update)
 			stats_handle_request(vb);
+		mutex_lock(&vb->balloon_lock);
 		if (diff > 0)
 			fill_balloon(vb, diff);
 		else if (diff < 0)
 			leak_balloon(vb, -diff);
 		update_balloon_size(vb);
+		mutex_unlock(&vb->balloon_lock);
 	}
 	return 0;
 }

@@ -490,9 +488,11 @@ out:
 static void remove_common(struct virtio_balloon *vb)
 {
 	/* There might be pages left in the balloon: free them. */
+	mutex_lock(&vb->balloon_lock);
 	while (vb->num_pages)
 		leak_balloon(vb, vb->num_pages);
 	update_balloon_size(vb);
+	mutex_unlock(&vb->balloon_lock);

I think you will need to introduce the same change as above to virtballoon_restore() Thanks Rafael, I've fixed it in my tree.
Re: [RFC 2/2] virtio_balloon: auto-ballooning support
On Thu, 9 May 2013 18:15:19 -0300 Rafael Aquini aqu...@redhat.com wrote: On Thu, May 09, 2013 at 10:53:49AM -0400, Luiz Capitulino wrote: Automatic ballooning consists of dynamically adjusting the guest's balloon according to memory pressure in the host and in the guest. This commit implements the guest side of automatic ballooning, which basically consists of registering a shrinker callback with the kernel, which will try to deflate the guest's balloon by the number of pages being requested. The shrinker callback is only registered if the host supports the VIRTIO_BALLOON_F_AUTO_BALLOON feature bit. Automatic inflate is performed by the host. Here are some numbers. The test-case is to run 35 VMs (1G of RAM each) in parallel doing a kernel build. Host has 32GB of RAM and 16GB of swap. SWAP IN and SWAP OUT correspond to the number of pages swapped in and swapped out, respectively.

Auto-ballooning disabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    634      930980   1588522
2    610      627422   1362174
3    649      1079847  1616367
4    543      953289   1635379
5    642      913237   1514000

Auto-ballooning enabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    629      901      12537
2    624      981      18506
3    626      573      9085
4    631      2250     42534
5    627      1610     20808

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- Nice work Luiz! Just allow me a silly question, though. I have 100% more chances of committing sillinesses than you, so please go ahead. Since your shrinker doesn't change the balloon target size, Which target size are you referring to? The one in the host (member num_pages of VirtIOBalloon in QEMU)? If it's the one in the host, then my understanding is that that member is only used to communicate the new balloon target to the guest. The guest driver will only read it when told (by the host) to do so, and when it does the target value will be correct. Am I right? as soon as the shrink round finishes the balloon will re-inflate again, won't it?
Doesn't this cause a sort of balloon thrashing scenario, if both guest and host are suffering from memory pressure? The rest I have for the moment are only nitpicks :)

 drivers/virtio/virtio_balloon.c     | 55 +
 include/uapi/linux/virtio_balloon.h |  1 +
 2 files changed, 56 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9d5fe2b..f9dcae8 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -71,6 +71,9 @@ struct virtio_balloon
 	/* Memory statistics */
 	int need_stats_update;
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+	/* Memory shrinker */
+	struct shrinker shrinker;
 };

 static struct virtio_device_id id_table[] = {
@@ -126,6 +129,7 @@ static void set_page_pfns(u32 pfns[], struct page *page)
 		pfns[i] = page_to_balloon_pfn(page) + i;
 }

+/* This function should be called with vb->balloon_mutex held */
 static void fill_balloon(struct virtio_balloon *vb, size_t num)
 {
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
@@ -166,6 +170,7 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 	}
 }

+/* This function should be called with vb->balloon_mutex held */
 static void leak_balloon(struct virtio_balloon *vb, size_t num)
 {
 	struct page *page;
@@ -285,6 +290,45 @@ static void update_balloon_size(struct virtio_balloon *vb)
 			      &actual, sizeof(actual));
 }

+static unsigned long balloon_get_nr_pages(const struct virtio_balloon *vb)
+{
+	return vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static int balloon_shrinker(struct shrinker *shrinker, struct shrink_control *sc)
+{
+	unsigned int nr_pages, new_target;
+	struct virtio_balloon *vb;
+
+	vb = container_of(shrinker, struct virtio_balloon, shrinker);
+	if (!mutex_trylock(&vb->balloon_lock)) {
+		return -1;
+	}
+
+	nr_pages = balloon_get_nr_pages(vb);
+	if (!sc->nr_to_scan || !nr_pages) {
+		goto out;
+	}
+
+	/*
+	 * If the current balloon size is greater than the number of
+	 * pages being reclaimed by the kernel, deflate only the needed
+	 * amount. Otherwise deflate everything we have.
+	 */
+	new_target = 0;
+	if (nr_pages > sc->nr_to_scan) {
+		new_target = nr_pages - sc->nr_to_scan;
+	}

CodingStyle: you don't need the curly-braces for all these single statements above Oh, this comes from QEMU coding style. Fixed.

+	leak_balloon(vb, new_target);
+	update_balloon_size(vb);
+	nr_pages = balloon_get_nr_pages(vb);
+
+out:
+	mutex_unlock(&vb->balloon_lock);
+	return nr_pages;
+}
+
 static int balloon(void
Re: [RFC 2/2] virtio_balloon: auto-ballooning support
On Fri, 10 May 2013 09:20:46 -0400 Luiz Capitulino lcapitul...@redhat.com wrote: On Thu, 9 May 2013 18:15:19 -0300 Rafael Aquini aqu...@redhat.com wrote: On Thu, May 09, 2013 at 10:53:49AM -0400, Luiz Capitulino wrote: Automatic ballooning consists of dynamically adjusting the guest's balloon according to memory pressure in the host and in the guest. This commit implements the guest side of automatic ballooning, which basically consists of registering a shrinker callback with the kernel, which will try to deflate the guest's balloon by the number of pages being requested. The shrinker callback is only registered if the host supports the VIRTIO_BALLOON_F_AUTO_BALLOON feature bit. Automatic inflate is performed by the host. Here are some numbers. The test-case is to run 35 VMs (1G of RAM each) in parallel doing a kernel build. Host has 32GB of RAM and 16GB of swap. SWAP IN and SWAP OUT correspond to the number of pages swapped in and swapped out, respectively.

Auto-ballooning disabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    634      930980   1588522
2    610      627422   1362174
3    649      1079847  1616367
4    543      953289   1635379
5    642      913237   1514000

Auto-ballooning enabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    629      901      12537
2    624      981      18506
3    626      573      9085
4    631      2250     42534
5    627      1610     20808

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- Nice work Luiz! Just allow me a silly question, though. I have 100% more chances of committing sillinesses than you, so please go ahead. Since your shrinker doesn't change the balloon target size, Which target size are you referring to? The one in the host (member num_pages of VirtIOBalloon in QEMU)? If it's the one in the host, then my understanding is that that member is only used to communicate the new balloon target to the guest. The guest driver will only read it when told (by the host) to do so, and when it does the target value will be correct. Am I right? as soon as the shrink round finishes the balloon will re-inflate again, won't it?
Doesn't this cause a sort of balloon thrashing scenario, if both guest and host are suffering from memory pressure? Forgot to say that I didn't observe this in my testing. But I'll try harder as soon as we clarify which target size we're talking about.
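The deflate computation this thread is probing can be written out in a few lines. A hedged sketch, reading the patch's new_target as the balloon size to keep after a shrink request (the function name is mine, not the driver's):

```python
# Sketch of the shrinker callback's sizing logic from this series:
# if the balloon holds more pages than reclaim asked for, keep the
# remainder ballooned; otherwise give the whole balloon back.
def shrinker_new_target(nr_pages, nr_to_scan):
    if nr_pages > nr_to_scan:
        return nr_pages - nr_to_scan
    return 0

# 1000 ballooned pages, kernel wants 128 reclaimed: keep 872.
print(shrinker_new_target(1000, 128))
# Balloon smaller than the request: deflate it completely.
print(shrinker_new_target(64, 128))
```

Either way, the pages actually handed back are min(nr_to_scan, nr_pages), which is the mechanism that lets guest pressure win over the host's inflation request.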
[RFC v2] virtio-balloon: automatic ballooning support
Automatic ballooning consists of dynamically adjusting the guest's balloon according to memory pressure in the host and in the guest. This commit implements the host side of automatic ballooning, which basically consists of:

1. Registering with the memory.pressure_level API (from the Linux memory controller cgroup) for the MEDIUM pressure event. This is a new feature starting on Linux kernel 3.10. For more information on this please check Documentation/cgroups/memory.txt in the Linux kernel sources.

2. On MEDIUM pressure event reception, QEMU asks the guest kernel to inflate the balloon by 16MB.

3. This is only done if the guest negotiates VIRTIO_BALLOON_F_AUTO_BALLOON, which means the guest kernel's virtio-balloon driver also supports automatic ballooning.

Automatic deflate is performed by the guest. Here are some numbers. The test-case is to run 35 VMs (1G of RAM each) in parallel doing a kernel build. Host has 32GB of RAM and 16GB of swap. SWAP IN and SWAP OUT correspond to the number of pages swapped in and swapped out, respectively.

Auto-ballooning disabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    634      930980   1588522
2    610      627422   1362174
3    649      1079847  1616367
4    543      953289   1635379
5    642      913237   1514000

Auto-ballooning enabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    629      901      12537
2    624      981      18506
3    626      573      9085
4    631      2250     42534
5    627      1610     20808

FIXMEs/TODOs:

- Should we have a lower limit for guest memory? Otherwise it can reach 0 if too many events are received
- Or maybe we should rate-limit events?
- It seems that events are being lost when too many of them are sent at the same time on a busy host - Allow this to be dynamically enabled by mngt Signed-off-by: Luiz Capitulino lcapitul...@redhat.com --- o You can find my test script here: http://repo.or.cz/w/qemu/qmp-unstable.git/blob/refs/heads/balloon/auto-ballooning/memcg/rfc:/scripts/autob-test o You can find the guest driver counterpart code at: http://repo.or.cz/w/linux-2.6/luiz-linux-2.6.git/shortlog/refs/heads/virtio-balloon/auto-deflate/rfc o To play with automatic ballooning, do the following: 1. You'll need 3.9+ for the host kernel 2. Get the guest kernel bits from: git://repo.or.cz/linux-2.6/luiz-linux-2.6.git virtio-balloon/auto-deflate/rfc 3. Apply this patch to QEMU 4. Enable the balloon device in qemu with: -device virtio-balloon-pci,auto-balloon=true 5. Generate memory pressure in the host, or put QEMU in a memcg cgroup with limited memory. Watch the VM memory going down 6. Generate pressure in the guest to see it going up again (say, a kernel build with -j16) hw/virtio/virtio-balloon.c | 162 + hw/virtio/virtio-pci.c | 5 ++ hw/virtio/virtio-pci.h | 1 + include/hw/virtio/virtio-balloon.h | 15 4 files changed, 183 insertions(+) diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c index d669756..4b23360 100644 --- a/hw/virtio/virtio-balloon.c +++ b/hw/virtio/virtio-balloon.c @@ -31,6 +31,12 @@ #include hw/virtio/virtio-bus.h +void virtio_balloon_set_conf(DeviceState *dev, VirtIOBalloonConf *bconf) +{ +VirtIOBalloon *s = VIRTIO_BALLOON(dev); +memcpy((s-bconf), bconf, sizeof(struct VirtIOBalloonConf)); +} + static void balloon_page(void *addr, int deflate) { #if defined(__linux__) @@ -279,9 +285,21 @@ static void virtio_balloon_set_config(VirtIODevice *vdev, } } +static bool auto_balloon_enabled(const VirtIOBalloon *s) +{ +return s-bconf.auto_balloon; +} + static uint32_t virtio_balloon_get_features(VirtIODevice *vdev, uint32_t f) { +VirtIOBalloon *s = VIRTIO_BALLOON(vdev); + f |= (1 
VIRTIO_BALLOON_F_STATS_VQ); + +if (auto_balloon_enabled(s)) { +f |= (1 VIRTIO_BALLOON_F_AUTO_BALLOON); +} + return f; } @@ -336,6 +354,141 @@ static int virtio_balloon_load(QEMUFile *f, void *opaque, int version_id) return 0; } +static int open_sysfile(const char *path, const char *file, mode_t mode) +{ +char *p; +int fd; + +p = g_strjoin(/, path, file, NULL); +fd = qemu_open(p, mode); +if (fd 0) { +error_report(balloon: can't open '%s': %s, p, strerror(errno)); +} + +g_free(p); +return fd; +} + +static int write_fd(int fd, const char *fmt, ...) +{ +va_list ap; +char *str; +int ret; + +va_start(ap, fmt); +str = g_strdup_vprintf(fmt, ap); +va_end(ap); + +do { +ret = write(fd, str, strlen(str)); +} while (ret 0 errno == EINTR); + +if (ret 0) { +error_report(balloon: write failed: %s, strerror(errno)); +} + +g_free(str); +return ret; +} + +static bool guest_supports_auto_balloon(const VirtIOBalloon *s) +{ +VirtIODevice *vdev = VIRTIO_DEVICE(s); +return vdev-guest_features (1
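Step 1 of the commit message above (registering for MEDIUM pressure events) uses the cgroup-v1 memcg eventfd interface. A hedged sketch of that registration in Python rather than QEMU's C, with helper names of my own and the usual memcg mount path assumed (requires Linux and Python 3.10+ for os.eventfd; register_pressure_event is not run here since it needs a real cgroup):

```python
import os

def event_control_line(event_fd, pressure_fd, level="medium"):
    # Format cgroup.event_control expects: "<eventfd> <target fd> <args>".
    # 'medium' corresponds to the MEDIUM level the QEMU patch listens for.
    return "%d %d %s" % (event_fd, pressure_fd, level)

def register_pressure_event(memcg_path="/sys/fs/cgroup/memory", level="medium"):
    # Create an eventfd and tie it to memory.pressure_level of the memcg.
    efd = os.eventfd(0)
    pfd = os.open(os.path.join(memcg_path, "memory.pressure_level"), os.O_RDONLY)
    with open(os.path.join(memcg_path, "cgroup.event_control"), "w") as f:
        f.write(event_control_line(efd, pfd, level))
    os.close(pfd)
    return efd  # each 8-byte read from efd signals one pressure event

print(event_control_line(5, 6))  # the string a client would write
```

On each event, the QEMU patch then asks the guest to inflate by 16MB, per step 2.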
[RFC v2 0/2] virtio_balloon: auto-ballooning support
Hi, This series is a respin of automatic ballooning support I started working on last year. Patch 2/2 contains all relevant technical details and performance measurement results. This is in RFC state because it's a work in progress. Here's some information if you want to try automatic ballooning: 1. You'll need 3.9+ for the host kernel 2. Apply this series for the guest kernel 3. Grab the QEMU bits from: git://repo.or.cz/qemu/qmp-unstable.git balloon/auto-ballooning/memcg/rfc 4. Enable the balloon device in qemu with: -device virtio-balloon-pci,auto-balloon=true 5. Balloon the guest memory down, say from 1G to 256MB 6. Generate some pressure in the guest, say a kernel build with -j16 Any feedback is appreciated! Luiz Capitulino (2): virtio_balloon: move balloon_lock mutex to callers virtio_balloon: auto-ballooning support drivers/virtio/virtio_balloon.c | 63 ++--- include/uapi/linux/virtio_balloon.h | 1 + 2 files changed, 60 insertions(+), 4 deletions(-) -- 1.8.1.4
[RFC 2/2] virtio_balloon: auto-ballooning support
Automatic ballooning consists of dynamically adjusting the guest's balloon according to memory pressure in the host and in the guest. This commit implements the guest side of automatic ballooning, which basically consists of registering a shrinker callback with the kernel, which will try to deflate the guest's balloon by the number of pages being requested. The shrinker callback is only registered if the host supports the VIRTIO_BALLOON_F_AUTO_BALLOON feature bit. Automatic inflate is performed by the host. Here are some numbers. The test-case is to run 35 VMs (1G of RAM each) in parallel doing a kernel build. Host has 32GB of RAM and 16GB of swap. SWAP IN and SWAP OUT correspond to the number of pages swapped in and swapped out, respectively.

Auto-ballooning disabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    634      930980   1588522
2    610      627422   1362174
3    649      1079847  1616367
4    543      953289   1635379
5    642      913237   1514000

Auto-ballooning enabled:

RUN  TIME(s)  SWAP IN  SWAP OUT
1    629      901      12537
2    624      981      18506
3    626      573      9085
4    631      2250     42534
5    627      1610     20808

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 drivers/virtio/virtio_balloon.c     | 55 +
 include/uapi/linux/virtio_balloon.h |  1 +
 2 files changed, 56 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 9d5fe2b..f9dcae8 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -71,6 +71,9 @@ struct virtio_balloon
 	/* Memory statistics */
 	int need_stats_update;
 	struct virtio_balloon_stat stats[VIRTIO_BALLOON_S_NR];
+
+	/* Memory shrinker */
+	struct shrinker shrinker;
 };

 static struct virtio_device_id id_table[] = {
@@ -126,6 +129,7 @@ static void set_page_pfns(u32 pfns[], struct page *page)
 		pfns[i] = page_to_balloon_pfn(page) + i;
 }

+/* This function should be called with vb->balloon_mutex held */
 static void fill_balloon(struct virtio_balloon *vb, size_t num)
 {
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
@@ -166,6 +170,7 @@ static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 	}
 }

+/* This function should be called with vb->balloon_mutex held */
 static void leak_balloon(struct virtio_balloon *vb, size_t num)
 {
 	struct page *page;
@@ -285,6 +290,45 @@ static void update_balloon_size(struct virtio_balloon *vb)
 			      &actual, sizeof(actual));
 }

+static unsigned long balloon_get_nr_pages(const struct virtio_balloon *vb)
+{
+	return vb->num_pages / VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static int balloon_shrinker(struct shrinker *shrinker, struct shrink_control *sc)
+{
+	unsigned int nr_pages, new_target;
+	struct virtio_balloon *vb;
+
+	vb = container_of(shrinker, struct virtio_balloon, shrinker);
+	if (!mutex_trylock(&vb->balloon_lock)) {
+		return -1;
+	}
+
+	nr_pages = balloon_get_nr_pages(vb);
+	if (!sc->nr_to_scan || !nr_pages) {
+		goto out;
+	}
+
+	/*
+	 * If the current balloon size is greater than the number of
+	 * pages being reclaimed by the kernel, deflate only the needed
+	 * amount. Otherwise deflate everything we have.
+	 */
+	new_target = 0;
+	if (nr_pages > sc->nr_to_scan) {
+		new_target = nr_pages - sc->nr_to_scan;
+	}
+
+	leak_balloon(vb, new_target);
+	update_balloon_size(vb);
+	nr_pages = balloon_get_nr_pages(vb);
+
+out:
+	mutex_unlock(&vb->balloon_lock);
+	return nr_pages;
+}
+
 static int balloon(void *_vballoon)
 {
 	struct virtio_balloon *vb = _vballoon;
@@ -471,6 +515,13 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		goto out_del_vqs;
 	}

+	memset(&vb->shrinker, 0, sizeof(vb->shrinker));
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_AUTO_BALLOON)) {
+		vb->shrinker.shrink = balloon_shrinker;
+		vb->shrinker.seeks = DEFAULT_SEEKS;
+		register_shrinker(&vb->shrinker);
+	}
+
 	return 0;

 out_del_vqs:
@@ -487,6 +538,9 @@ out:
 static void remove_common(struct virtio_balloon *vb)
 {
+	if (vb->shrinker.shrink)
+		unregister_shrinker(&vb->shrinker);
+
 	/* There might be pages left in the balloon: free them. */
 	mutex_lock(&vb->balloon_lock);
 	while (vb->num_pages)
@@ -543,6 +597,7 @@ static int virtballoon_restore(struct virtio_device *vdev)
 static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
+	VIRTIO_BALLOON_F_AUTO_BALLOON,
 };

 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 5e26f61..bd378a4 100644
[RFC 1/2] virtio_balloon: move balloon_lock mutex to callers
This commit moves the balloon_lock mutex out of the fill_balloon() and leak_balloon() functions to their callers. The reason for this change is that the next commit will introduce a shrinker callback for the balloon driver, which will also call leak_balloon() but will require different locking semantics.

Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
---
 drivers/virtio/virtio_balloon.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index bd3ae32..9d5fe2b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -133,7 +133,6 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
-	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		struct page *page = balloon_page_enqueue(vb_dev_info);
@@ -154,7 +153,6 @@ static void fill_balloon(struct virtio_balloon *vb, size_t num)
 	/* Did we get any? */
 	if (vb->num_pfns != 0)
 		tell_host(vb, vb->inflate_vq);
-	mutex_unlock(&vb->balloon_lock);
 }

 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -176,7 +174,6 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
-	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
 	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
 		page = balloon_page_dequeue(vb_dev_info);
@@ -192,7 +189,6 @@ static void leak_balloon(struct virtio_balloon *vb, size_t num)
 	 * is true, we *have* to do it in this order */
 	tell_host(vb, vb->deflate_vq);
-	mutex_unlock(&vb->balloon_lock);
 	release_pages_by_pfn(vb->pfns, vb->num_pfns);
 }

@@ -305,11 +301,13 @@ static int balloon(void *_vballoon)
 			 || freezing(current));
 		if (vb->need_stats_update)
 			stats_handle_request(vb);
+		mutex_lock(&vb->balloon_lock);
 		if (diff > 0)
 			fill_balloon(vb, diff);
 		else if (diff < 0)
 			leak_balloon(vb, -diff);
 		update_balloon_size(vb);
+		mutex_unlock(&vb->balloon_lock);
 	}
 	return 0;
 }

@@ -490,9 +488,11 @@ out:
 static void remove_common(struct virtio_balloon *vb)
 {
 	/* There might be pages left in the balloon: free them. */
+	mutex_lock(&vb->balloon_lock);
 	while (vb->num_pages)
 		leak_balloon(vb, vb->num_pages);
 	update_balloon_size(vb);
+	mutex_unlock(&vb->balloon_lock);

 	/* Now we reset the device so we can clean up the queues. */
 	vb->vdev->config->reset(vb->vdev);
-- 
1.8.1.4
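The point of moving the lock to the callers is that the two callers need different acquisition styles: the balloon thread may block, while the shrinker in the next patch runs in the reclaim path and must back off instead. A hedged illustration of that split (Python threading standing in for the kernel mutex; function names are mine):

```python
import threading

balloon_lock = threading.Lock()

def balloon_thread_step(work):
    # Mirrors mutex_lock(&vb->balloon_lock) around fill/leak in balloon():
    # this caller is allowed to sleep waiting for the lock.
    with balloon_lock:
        return work()

def shrinker_step(work):
    # Mirrors mutex_trylock(): a reclaim-path caller must not block,
    # so it bails out and lets the kernel retry later.
    if not balloon_lock.acquire(blocking=False):
        return -1
    try:
        return work()
    finally:
        balloon_lock.release()

print(shrinker_step(lambda: "deflated"))  # lock is free, so the work runs
```

If the balloon thread holds the lock, shrinker_step returns -1 immediately instead of deadlocking against a fill_balloon() that is itself waiting on memory.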
Re: [Qemu-devel] KVM call minutes for 2013-04-23
On Tue, 23 Apr 2013 10:06:41 -0600 Eric Blake ebl...@redhat.com wrote: we can change drive_mirror to use a new command to see if there are the new features. drive-mirror changed in 1.4 to add optional buf-size parameter; right now, libvirt is forced to limit itself to 1.3 interface (no buf-size or granularity) because there is no introspection and no query-* command that witnesses that the feature is present. Idea was that we need to add a new query-drive-mirror-capabilities (name subject to bikeshedding) command into 1.5 that would let libvirt know that buf-size/granularity is usable (done right, it would also prevent the situation of buf-size being a write-only interface where it is set when starting the mirror but can not be queried later to see what size is in use). Unclear whether anyone was signing up to tackle the addition of a query command counterpart for drive-mirror in time for 1.5. I can do it. Nice write-up Eric!
Re: [Qemu-devel] KVM call minutes for 2013-04-23
On Wed, 24 Apr 2013 10:03:21 +0200 Stefan Hajnoczi stefa...@gmail.com wrote: On Tue, Apr 23, 2013 at 10:06:41AM -0600, Eric Blake wrote: On 04/23/2013 08:45 AM, Juan Quintela wrote: we can change drive_mirror to use a new command to see if there are the new features. drive-mirror changed in 1.4 to add optional buf-size parameter; right now, libvirt is forced to limit itself to 1.3 interface (no buf-size or granularity) because there is no introspection and no query-* command that witnesses that the feature is present. Idea was that we need to add a new query-drive-mirror-capabilities (name subject to bikeshedding) command into 1.5 that would let libvirt know that buf-size/granularity is usable (done right, it would also prevent the situation of buf-size being a write-only interface where it is set when starting the mirror but can not be queried later to see what size is in use). Unclear whether anyone was signing up to tackle the addition of a query command counterpart for drive-mirror in time for 1.5. Seems like the trivial solution is a query-command-capabilities QMP command. query-command-capabilities drive-mirror = ['buf-size'] It should only be a few lines of code and can be used for other commands that add optional parameters in the future. In other words: IMO, a separate command is better because we'll return CommandNotFound error if the command doesn't exist. If we add query-command-capabilities we'd need a new error class, otherwise a client won't be able to tell if the command passed as argument exists. Besides, separate commands tend to be simpler; and we already have query-migrate-capabilities anyway. The only disadvantage is some duplication and an increase in the number of commands, but I don't think this is avoidable. typedef struct mon_cmd_t { ... const char **capabilities; /* drive-mirror uses [buf-size, NULL] */ }; if we have a stable c-api we can do test cases that work. Having such a testsuite would make a stable C API more important. 
Writing tests in Python has been productive, see qemu-iotests 041 and friends. The tests spawn QEMU guests and use QMP to interact: Good to know. result = self.vm.qmp('query-block') self.assert_qmp(result, 'return[0]/inserted/file', target_img) Using this XPath-style syntax it's very easy to access the JSON. QEMU users tend not to use C, except libvirt. Even libvirt implements the QMP protocol dynamically and can handle optional arguments well. I don't think a static C API makes sense when we have an extensible JSON protocol. Let's use the extensibility to our advantage. Agreed.
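For concreteness, here is roughly what the proposed capability query could look like on the wire. Everything below is hypothetical: query-command-capabilities was only an idea in this thread, so the command name and return shape are illustrative, not QEMU's actual QMP schema:

```python
import json

# Hypothetical QMP exchange for the capability-discovery idea discussed
# above. QMP messages are JSON objects; a client would send the request
# on the socket and gate its use of buf-size on the reply.
request = {"execute": "query-command-capabilities",
           "arguments": {"command": "drive-mirror"}}
response = {"return": ["buf-size", "granularity"]}

wire = json.dumps(request)          # what actually goes over the socket
caps = response["return"]
print("buf-size" in caps)           # client decision point
```

If the command did not exist, the reply would instead carry an error object, which is exactly the CommandNotFound distinction Luiz raises above as an argument for separate per-feature query commands.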
Re: [Qemu-devel] [PATCH v10] kvm: notify host when the guest is panicked
On Wed, 29 Aug 2012 13:18:54 +0800 Wen Congyang we...@cn.fujitsu.com wrote: We can know the guest is panicked when the guest runs on xen. But we do not have such a feature on kvm. What's the status of this series? It got lost in my queue and I ended up not reviewing it, but it seems to be stuck. Another purpose of this feature is: a management app (for example: libvirt) can do an automatic dump when the guest is panicked. If the management app does not do an automatic dump, the guest's user can do a dump by hand if he sees the guest is panicked. We had three solutions to implement this feature: 1. use vmcall 2. use I/O port 3. use virtio-serial. We have decided to avoid touching the hypervisor. The reasons why I chose the I/O port are: 1. it is easier to implement 2. it does not depend on any virtual device 3. it can work when starting the kernel Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- Documentation/virtual/kvm/pv_event.txt | 32 arch/ia64/include/asm/kvm_para.h | 14 ++ arch/powerpc/include/asm/kvm_para.h| 14 ++ arch/s390/include/asm/kvm_para.h | 14 ++ arch/x86/include/asm/kvm_para.h| 27 +++ arch/x86/kernel/kvm.c | 25 + include/linux/kvm_para.h | 23 +++ 7 files changed, 149 insertions(+), 0 deletions(-) create mode 100644 Documentation/virtual/kvm/pv_event.txt diff --git a/Documentation/virtual/kvm/pv_event.txt b/Documentation/virtual/kvm/pv_event.txt new file mode 100644 index 000..bb04de0 --- /dev/null +++ b/Documentation/virtual/kvm/pv_event.txt @@ -0,0 +1,32 @@ +The KVM paravirtual event interface += + +Initializing the paravirtual event interface +== +kvm_pv_event_init() +Arguments: + None + +Return Value: + 0: The guest kernel can use paravirtual event interface. + 1: The guest kernel can't use paravirtual event interface. + +Querying whether the event can be ejected +== +kvm_pv_has_feature() +Arguments: + feature: The bit value of this paravirtual event to query + +Return Value: + 0 : The guest kernel can't eject this paravirtual event.
+ -1: The guest kernel can eject this paravirtual event. + + +Ejecting paravirtual event +== +kvm_pv_eject_event() +Arguments: + event: The event to be ejected. + +Return Value: + None diff --git a/arch/ia64/include/asm/kvm_para.h b/arch/ia64/include/asm/kvm_para.h index 2019cb9..b5ec658 100644 --- a/arch/ia64/include/asm/kvm_para.h +++ b/arch/ia64/include/asm/kvm_para.h @@ -31,6 +31,20 @@ static inline bool kvm_check_and_clear_guest_paused(void) return false; } +static inline int kvm_arch_pv_event_init(void) +{ + return 0; +} + +static inline unsigned int kvm_arch_pv_features(void) +{ + return 0; +} + +static inline void kvm_arch_pv_eject_event(unsigned int event) +{ +} + #endif #endif diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index c18916b..01b98c7 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -211,6 +211,20 @@ static inline bool kvm_check_and_clear_guest_paused(void) return false; } +static inline int kvm_arch_pv_event_init(void) +{ + return 0; +} + +static inline unsigned int kvm_arch_pv_features(void) +{ + return 0; +} + +static inline void kvm_arch_pv_eject_event(unsigned int event) +{ +} + #endif /* __KERNEL__ */ #endif /* __POWERPC_KVM_PARA_H__ */ diff --git a/arch/s390/include/asm/kvm_para.h b/arch/s390/include/asm/kvm_para.h index da44867..00ce058 100644 --- a/arch/s390/include/asm/kvm_para.h +++ b/arch/s390/include/asm/kvm_para.h @@ -154,6 +154,20 @@ static inline bool kvm_check_and_clear_guest_paused(void) return false; } +static inline int kvm_arch_pv_event_init(void) +{ + return 0; +} + +static inline unsigned int kvm_arch_pv_features(void) +{ + return 0; +} + +static inline void kvm_arch_pv_eject_event(unsigned int event) +{ +} + #endif #endif /* __S390_KVM_PARA_H */ diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 2f7712e..7d297f0 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -96,8 
+96,11 @@ struct kvm_vcpu_pv_apf_data { #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK #define KVM_PV_EOI_DISABLED 0x0 +#define KVM_PV_EVENT_PORT(0x505UL) + #ifdef __KERNEL__ #include asm/processor.h +#include linux/ioport.h extern void kvmclock_init(void); extern int kvm_register_clock(char *txt); @@ -228,6 +231,30 @@ static inline void kvm_disable_steal_time(void) } #endif +static inline int kvm_arch_pv_event_init(void) +{ +
Re: [Qemu-devel] KVM call agenda for September 25th
On Tue, 25 Sep 2012 07:57:53 -0500 Anthony Liguori anth...@codemonkey.ws wrote: Paolo Bonzini pbonz...@redhat.com writes: On 24/09/2012 13:28, Juan Quintela wrote: Hi Please send in any agenda items you are interested in covering. URI parsing library for glusterfs: libxml2 vs. in-tree fork of the same code. The call is a bit late for Bharata but I think copying is the way to go. Something I've been thinking about since this discussion started though. Maybe we could standardize on using URIs as short-hand syntax for backends. Agreed, just suggested this for qmp commands taking a file path and a fd name in another thread. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for September 25th
On Mon, 24 Sep 2012 13:48:26 +0200 Paolo Bonzini pbonz...@redhat.com wrote: On 24/09/2012 13:28, Juan Quintela wrote: Hi Please send in any agenda items you are interested in covering. URI parsing library for glusterfs: libxml2 vs. in-tree fork of the same code. In case we're going to have the call (otherwise let's discuss it on the list): - change blocking I/O to non-blocking I/O for qmp commands? We have a few qmp commands that do blocking I/O (e.g. screendump and dump-guest-memory). Theoretically, those commands could block forever. This is a more serious issue with the screendump command, which doesn't stop vcpus. I've never received a report about this, so maybe this is not an issue. But, while the perfect solution here is to have async commands, I was wondering if it would be feasible for synchronous commands like screendump to be changed to use non-blocking fds. This way we don't risk blocking.
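As background for the non-blocking discussion: a QMP client exchanges newline-delimited JSON over a socket. The sketch below is illustrative, not qemu or autotest API — the helper names, the unix-socket path argument, and the timeout value are assumptions. It shows the client side of a synchronous screendump, and why a client-side timeout only protects the caller: the blocking file I/O still happens inside qemu, which is exactly the problem discussed above.

```python
import json
import socket

def build_qmp_command(name, arguments=None):
    """Serialize one QMP command in the wire format (JSON object + newline)."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg) + "\n"

def parse_qmp_response(line):
    """Split a QMP reply line into (return-value, error-dict)."""
    reply = json.loads(line)
    return reply.get("return"), reply.get("error")

def qmp_screendump(sock_path, filename, timeout=30.0):
    """Issue 'screendump' over a QMP unix socket with a client-side timeout.

    The timeout only bounds how long *this client* waits; it cannot
    unblock qemu itself if the command blocks on file I/O.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    s.connect(sock_path)
    f = s.makefile("rw")
    f.readline()                                    # greeting banner
    f.write(build_qmp_command("qmp_capabilities"))  # mandatory handshake
    f.flush()
    f.readline()                                    # handshake reply
    f.write(build_qmp_command("screendump", {"filename": filename}))
    f.flush()
    return parse_qmp_response(f.readline())
```

Async commands (or non-blocking fds inside qemu, as suggested) are the only way to remove the blocking itself; the client timeout merely turns an indefinite hang into a bounded failure.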
Re: [Qemu-devel] KVM Call minutes for 2012-09-25
On Tue, 25 Sep 2012 16:59:00 +0200 Markus Armbruster arm...@redhat.com wrote: Juan Quintela quint...@redhat.com writes: Hi These are this week's minutes: - URI parsing library for glusterfs: libxml2 vs. in-tree fork of the same code. (Paolo) * code hasn't changed in 2 years, it is really stable * anthony wants to copy the code - there are several commands that do blocking IO dump-guest-memory/screen-dump convert to asynchronous commands after we move all to QAPI only two commands missing to port to QAPI, and one is posted on the list non-blocking IO to a file is a challenge (we have code in the block layer for it) - how to give errors from OpenFile to the caller putting errno as int: bad idea putting it as a strerror string: also a bad idea, no guarantees Use the identifiers instead of their non-portable numeric encodings or strerror() descriptions: EPERM, EINVAL, ... Yes, but for me the important point in this discussion is whether or not a new class is necessary. I think it isn't.
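Markus' point about symbolic identifiers can be demonstrated with Python's errno module: the names are what POSIX standardizes, while the numeric values and the strerror() texts vary across platforms and libcs, which is what makes them a poor wire format for errors.

```python
import errno
import os

# The symbolic name is stable across platforms; the numeric value is
# not guaranteed to be (POSIX fixes the names, not the numbers).
assert errno.errorcode[errno.EPERM] == "EPERM"
assert errno.errorcode[errno.ENOENT] == "ENOENT"

# strerror() text is locale- and libc-dependent -- another reason not
# to put it on the wire:
print(os.strerror(errno.EPERM))  # e.g. "Operation not permitted"
```

Sending "EPERM" lets the receiver map it back to its own local value with the same table, which a raw integer or a descriptive string cannot guarantee.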
Re: [PATCH] virt.kvm: Handle migrate errors using QMP monitor properly
On Tue, 21 Aug 2012 12:02:11 -0300 Lucas Meneghel Rodrigues l...@redhat.com wrote: When using QMP monitor as the sole monitor on KVM autotest (something that we sadly did not exercise on our test farms), starting qemu with -S and then issuing 'cont' will cause errors, since the error treatment with QMP monitors is more strict [1]. Take advantage of the fact that error treatment with the QMP json structures is much easier, and handle failures during migration accordingly. With this patch, migration works properly using only QMP monitors. [1] This means we probably should be more rigorous treating Human Monitor errors, but that's going to be handled later. CC: Qingtang Zhou qz...@redhat.com CC: Gerd Hoffmann kra...@redhat.com Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/virt/kvm_monitor.py | 8 +++- client/virt/kvm_vm.py | 10 +- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/client/virt/kvm_monitor.py b/client/virt/kvm_monitor.py index 8b5e251..9d8ed87 100644 --- a/client/virt/kvm_monitor.py +++ b/client/virt/kvm_monitor.py @@ -1152,7 +1152,13 @@ class QMPMonitor(Monitor): args = {uri: uri, blk: full_copy, inc: incremental_copy} -return self.cmd(migrate, args) +try: +return self.cmd(migrate, args) +except QMPCmdError, e: +if e.data['class'] == 'SockConnectInprogress': We've refactored our errors in QMP and most errors are going away (the one above included). The only errors that are staying for compatibility are: CommandNotFound, DeviceEncrypted, DeviceNotActive, DeviceNotFound, KVMMissingCap, MigrationExpected. All other errors are going to be simply GenericError. +logging.debug(Migrate socket connection still initializing...) 
+else: +raise e def migrate_set_speed(self, value): diff --git a/client/virt/kvm_vm.py b/client/virt/kvm_vm.py index 871b824..19d018d 100644 --- a/client/virt/kvm_vm.py +++ b/client/virt/kvm_vm.py @@ -1743,7 +1743,15 @@ class VM(virt_vm.BaseVM): output_params=(outfile,)) # start guest -self.monitor.cmd(cont) +if self.monitor.verify_status(paused): +try: +self.monitor.cmd(cont) +except kvm_monitor.QMPCmdError, e: +if ((e.data['class'] == MigrationExpected) and +(migration_mode is not None)): +logging.debug(Migration did not start yet...) +else: +raise e finally: fcntl.lockf(lockfile, fcntl.LOCK_UN)
Re: [PATCH] virt.kvm_monitor: Future proof migration handling on QMP monitor
On Wed, 22 Aug 2012 11:43:38 -0300 Lucas Meneghel Rodrigues l...@redhat.com wrote: With d46ad35c74, the exception handling for migrations happening when using a single QMP monitor relies on an exception class that's going to disappear in future versions of QEMU, being replaced by the GenericError class. So let's also handle this exception class. CC: Luiz Capitulino lcapitul...@redhat.com Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/virt/kvm_monitor.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/client/virt/kvm_monitor.py b/client/virt/kvm_monitor.py index 9d8ed87..932725b 100644 --- a/client/virt/kvm_monitor.py +++ b/client/virt/kvm_monitor.py @@ -1155,7 +1155,7 @@ class QMPMonitor(Monitor): try: return self.cmd(migrate, args) except QMPCmdError, e: -if e.data['class'] == 'SockConnectInprogress': +if e.data['class'] in ['SockConnectInprogress', 'GenericError']: logging.debug(Migrate socket connection still initializing...) else: raise e Patch looks correct now. There's only one small detail that I forgot to tell you in the previous email (sorry!). The SockConnectInprogress error shouldn't be returned by qemu. Actually, migration works most of the time even when this error is returned. That error was being used to communicate a special condition in qemu that was handled internally; however, due to a bug, the error ended up being propagated up and returned as a qmp response. This was fixed in qemu.git for 1.2. But this patch made me check this issue for 1.1 and it seems to exist there too, which means my fix has to be backported to -stable. There's no problem keeping this patch as is, though.
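The client-side policy that emerges from this patch and the previous one can be reduced to a small predicate. This is an illustrative sketch, not autotest code — the QMPCmdError class below is a stand-in for the kvm_monitor class of the same name, and the callable-based interface is an assumption for testability.

```python
# Error classes that can appear while the outgoing migration socket is
# still connecting: 'SockConnectInprogress' in older qemu, and
# 'GenericError' once qemu collapses most errors into that class.
BENIGN_MIGRATE_ERRORS = ("SockConnectInprogress", "GenericError")

class QMPCmdError(Exception):
    """Stand-in for the autotest QMPCmdError; carries the QMP error dict."""
    def __init__(self, data):
        super().__init__(data)
        self.data = data

def migrate_error_is_benign(exc):
    return exc.data.get("class") in BENIGN_MIGRATE_ERRORS

def issue_migrate(cmd, uri):
    """cmd: a callable issuing a QMP command, e.g. cmd("migrate", args).

    Returns the command result, or None when the error only means the
    migration socket is still being set up (caller should poll later).
    """
    try:
        return cmd("migrate", {"uri": uri})
    except QMPCmdError as e:
        if migrate_error_is_benign(e):
            return None
        raise
```

Note the trade-off Luiz points out in the follow-up: once everything is GenericError, this predicate also swallows genuinely fatal migrate failures, so the caller still has to verify migration progress afterwards.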
Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event
On Tue, 12 Jun 2012 14:55:37 +0800 Wen Congyang we...@cn.fujitsu.com wrote: +static void panicked_perform_action(void) +{ +switch(panicked_action) { +case PANICKED_REPORT: +panicked_mon_event(report); +break; + +case PANICKED_PAUSE: +panicked_mon_event(pause); +vm_stop(RUN_STATE_GUEST_PANICKED); +break; + +case PANICKED_QUIT: +panicked_mon_event(quit); +exit(0); +break; +} Having the data argument is not needed/wanted. The mngt app can guess it if it needs to know it, but I think it doesn't want to. Libvirt will do something when the kernel is panicked, so it should know the action on the qemu side. But the action will be set by libvirt itself, no?
Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event
On Tue, 12 Jun 2012 13:40:45 +0100 Daniel P. Berrange berra...@redhat.com wrote: On Tue, Jun 12, 2012 at 09:35:04AM -0300, Luiz Capitulino wrote: On Tue, 12 Jun 2012 14:55:37 +0800 Wen Congyang we...@cn.fujitsu.com wrote: +static void panicked_perform_action(void) +{ +switch(panicked_action) { +case PANICKED_REPORT: +panicked_mon_event(report); +break; + +case PANICKED_PAUSE: +panicked_mon_event(pause); +vm_stop(RUN_STATE_GUEST_PANICKED); +break; + +case PANICKED_QUIT: +panicked_mon_event(quit); +exit(0); +break; +} Having the data argument is not needed/wanted. The mngt app can guess it if it needs to know it, but I think it doesn't want to. Libvirt will do something when the kernel is panicked, so it should know the action on the qemu side. But the action will be set by libvirt itself, no? Sure, but the whole world isn't libvirt. If the process listening to the monitor is not the same as the process which launched the VM, then I think including the action is worthwhile. Besides, the way Wen has done this is identical to what we already do with QEVENT_WATCHDOG and I think it is desirable to keep consistency here. That's right, I had forgotten about the WATCHDOG event. Maybe it would make more sense to have this info in a query-command though, especially if we plan to have a command to change that setting. But I won't oppose having it in the event.
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Mon, 04 Jun 2012 12:56:41 +0800 Anthony Liguori anth...@codemonkey.ws wrote: On 05/25/2012 08:53 PM, Luiz Capitulino wrote: On Fri, 25 May 2012 13:01:37 +0100 Stefan Hajnoczi stefa...@gmail.com wrote: I agree it would be nice to drop entirely but I don't feel happy doing that to users who might have QEMU buried in scripts somewhere. One day they upgrade packages and suddenly their stuff doesn't work anymore. This is very similar to kqemu and I don't think we regret having dropped it. You couldn't imagine the number of complaints I got from users about dropping kqemu. It caused me considerable pain. Complaints ranged from downright hostile (I had to involve the Launchpad admins at one point because of a particular user) to entirely sympathetic. kqemu wasn't just a maintenance burden, it was preventing large guest memory support in KVM guests. There was no simple way around it without breaking the kqemu ABI and making significant changes to the kqemu module. Dropping features is not something that should be approached lightly and certainly not something that should be done just because you don't like a particular bit of code. It's not just because I don't like the code. Afaik, there are better external tools that seem to do exactly the same thing (and even seem to do it better). But as Markus said in the other thread, it's just advice, not a strong objection.
Re: [Qemu-devel] [PATCH 1/3] start vm after reseting it
On Mon, 21 May 2012 14:49:32 +0800 Wen Congyang we...@cn.fujitsu.com wrote: The guest should run after resetting it, but it does not run if its old state is RUN_STATE_INTERNAL_ERROR or RUN_STATE_PAUSED. Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- vl.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/vl.c b/vl.c index 23ab3a3..7f5fed8 100644 --- a/vl.c +++ b/vl.c @@ -1539,6 +1539,7 @@ static bool main_loop_should_exit(void) if (runstate_check(RUN_STATE_INTERNAL_ERROR) || runstate_check(RUN_STATE_SHUTDOWN)) { runstate_set(RUN_STATE_PAUSED); +vm_start(); Please, drop the runstate_set() call. I think you also have to call bdrv_iostatus_reset(), as qmp_cont() does. } } if (qemu_powerdown_requested()) {
Re: [Qemu-devel] [PATCH 3/3] deal with guest panicked event
On Mon, 21 May 2012 14:50:51 +0800 Wen Congyang we...@cn.fujitsu.com wrote: When the guest is panicked, it will write 0x1 to the port 0x505. So if qemu reads 0x1 from this port, we can do the following three things according to the parameter -onpanic: 1. emit QEVENT_GUEST_PANICKED only 2. emit QEVENT_GUEST_PANICKED and pause VM 3. emit QEVENT_GUEST_PANICKED and quit VM Note: if we emit QEVENT_GUEST_PANICKED only, and the management application does not receive this event (the management may not run when the event is emitted), the management won't know the guest is panicked. It will if it checks the vm status. Btw, please, split this further into a patch adding the event, another one adding the new runstate and then the rest. One more comment below. Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- kvm-all.c| 84 ++ kvm-stub.c |9 ++ kvm.h|3 ++ monitor.c|3 ++ monitor.h|1 + qapi-schema.json |6 +++- qemu-options.hx | 14 + qmp.c|3 +- vl.c | 17 ++- 9 files changed, 137 insertions(+), 3 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 9b73ccf..b5b0531 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -19,6 +19,7 @@ #include stdarg.h #include linux/kvm.h +#include linux/kvm_para.h #include qemu-common.h #include qemu-barrier.h @@ -29,6 +30,8 @@ #include bswap.h #include memory.h #include exec-memory.h +#include iorange.h +#include qemu-objects.h /* This check must be after config-host.h is included */ #ifdef CONFIG_EVENTFD @@ -1707,3 +1710,84 @@ int kvm_on_sigbus(int code, void *addr) { return kvm_arch_on_sigbus(code, addr); } + +/* Possible values for action parameter. 
*/ +#define PANICKED_REPORT 1 /* emit QEVENT_GUEST_PANICKED only */ +#define PANICKED_PAUSE 2 /* emit QEVENT_GUEST_PANICKED and pause VM */ +#define PANICKED_QUIT 3 /* emit QEVENT_GUEST_PANICKED and quit VM */ + +static int panicked_action = PANICKED_REPORT; + +static void kvm_pv_port_read(IORange *iorange, uint64_t offset, unsigned width, + uint64_t *data) +{ +*data = (1 KVM_PV_FEATURE_PANICKED); +} + +static void panicked_mon_event(const char *action) +{ +QObject *data; + +data = qobject_from_jsonf({ 'action': %s }, action); +monitor_protocol_event(QEVENT_GUEST_PANICKED, data); +qobject_decref(data); +} + +static void panicked_perform_action(void) +{ +switch(panicked_action) { +case PANICKED_REPORT: +panicked_mon_event(report); +break; + +case PANICKED_PAUSE: +panicked_mon_event(pause); +vm_stop(RUN_STATE_GUEST_PANICKED); +break; + +case PANICKED_QUIT: +panicked_mon_event(quit); +exit(0); +break; +} Having the data argument is not needed/wanted. The mngt app can guess it if it needs to know it, but I think it doesn't want to. 
+} + +static void kvm_pv_port_write(IORange *iorange, uint64_t offset, unsigned width, + uint64_t data) +{ +if (data == KVM_PV_PANICKED) +panicked_perform_action(); +} + +static void kvm_pv_port_destructor(IORange *iorange) +{ +g_free(iorange); +} + +static IORangeOps pv_io_range_ops = { +.read = kvm_pv_port_read, +.write = kvm_pv_port_write, +.destructor = kvm_pv_port_destructor, +}; + +void kvm_pv_port_init(void) +{ +IORange *pv_io_range = g_malloc(sizeof(IORange)); + +iorange_init(pv_io_range, pv_io_range_ops, 0x505, 1); +ioport_register(pv_io_range); +} + +int select_panicked_action(const char *p) +{ +if (strcasecmp(p, report) == 0) +panicked_action = PANICKED_REPORT; +else if (strcasecmp(p, pause) == 0) +panicked_action = PANICKED_PAUSE; +else if (strcasecmp(p, quit) == 0) +panicked_action = PANICKED_QUIT; +else +return -1; + +return 0; +} diff --git a/kvm-stub.c b/kvm-stub.c index 47c573d..4cf977e 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -128,3 +128,12 @@ int kvm_on_sigbus(int code, void *addr) { return 1; } + +void kvm_pv_port_init(void) +{ +} + +int select_panicked_action(const char *p) +{ +return -1; +} diff --git a/kvm.h b/kvm.h index 4ccae8c..95075cf 100644 --- a/kvm.h +++ b/kvm.h @@ -60,6 +60,9 @@ int kvm_has_gsi_routing(void); int kvm_allows_irq0_override(void); +void kvm_pv_port_init(void); +int select_panicked_action(const char *p); + #ifdef NEED_CPU_H int kvm_init_vcpu(CPUArchState *env); diff --git a/monitor.c b/monitor.c index 12a6fe2..83cb059 100644 --- a/monitor.c +++ b/monitor.c @@ -493,6 +493,9 @@ void monitor_protocol_event(MonitorEvent event, QObject *data) case QEVENT_WAKEUP: event_name = WAKEUP; break; +
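For readers skimming the C, the -onpanic dispatch in the patch above boils down to a small mapping plus a switch. The sketch below restates that control flow in Python for brevity; the names mirror the C code, while the event emission and VM control are stubbed out as callbacks (an assumption for illustration, not qemu API).

```python
# Mirrors the #defines in the patch: every action emits
# QEVENT_GUEST_PANICKED; pause/quit additionally stop or terminate the VM.
PANICKED_REPORT, PANICKED_PAUSE, PANICKED_QUIT = 1, 2, 3

_ACTIONS = {
    "report": PANICKED_REPORT,
    "pause": PANICKED_PAUSE,
    "quit": PANICKED_QUIT,
}

def select_panicked_action(p):
    """Mirror of the C select_panicked_action(): -1 on unknown strings."""
    return _ACTIONS.get(p.lower(), -1)

def panicked_perform_action(action, emit_event, stop_vm, quit_vm):
    """Dispatch on the configured action, as kvm_pv_port_write() does
    when the guest writes the panic value to port 0x505."""
    if action == PANICKED_REPORT:
        emit_event("report")
    elif action == PANICKED_PAUSE:
        emit_event("pause")
        stop_vm()
    elif action == PANICKED_QUIT:
        emit_event("quit")
        quit_vm()
```

The point of the data argument debate above is visible here: the string passed to emit_event duplicates information the management app could infer from the configured action, which is why Luiz suggests a query-command instead.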
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Mon, 28 May 2012 12:17:04 +0100 Stefan Hajnoczi stefa...@linux.vnet.ibm.com wrote: What we need to decide is whether it's okay to drop QEMU VLANs completely and change the dump command-line syntax? I'd vote for dropping it. I think vlan-hub doesn't hurt anyone because the code has been isolated and we keep backwards compatibility. So I'd personally still go the vlan-hub route for QEMU 1.x. Just to make it clear: I'm not against this series. I'm against having the functionality in qemu. If we want to keep the functionality, then I completely agree that this series is the way to go. Having glanced at this series I think it's beyond comparison with the current code...
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Fri, 25 May 2012 08:47:18 +0800 Zhi Yong Wu zwu.ker...@gmail.com wrote: On Fri, May 25, 2012 at 4:53 AM, Luiz Capitulino lcapitul...@redhat.com wrote: On Fri, 25 May 2012 01:59:06 +0800 zwu.ker...@gmail.com wrote: From: Zhi Yong Wu wu...@linux.vnet.ibm.com The patchset implements a network hub instead of vlan. The main work was done by Stefan, and I rebased it to the latest QEMU upstream, did some testing and am responsible for pushing it to QEMU upstream. Honest question: does it really pay off to have this in qemu vs. using one of It's said that it can speed up packet delivery, but I have not done all the benchmark testing. For more details, please refer to http://thread.gmane.org/gmane.comp.emulators.qemu/133362 the externally available solutions? Are there externally available solutions? :) What are they? Open vSwitch? I _guess_ that for non-linux unices we have only vde (do BSDs have a tap interface that can be used like we use it in linux?). For Linux we also have vde, openvswitch and macvtap in bridge mode.
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Fri, 25 May 2012 13:01:37 +0100 Stefan Hajnoczi stefa...@gmail.com wrote: I agree it would be nice to drop entirely but I don't feel happy doing that to users who might have QEMU buried in scripts somewhere. One day they upgrade packages and suddenly their stuff doesn't work anymore. This is very similar to kqemu and I don't think we regret having dropped it.
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Fri, 25 May 2012 14:59:25 +0200 Paolo Bonzini pbonz...@redhat.com wrote: On 25/05/2012 14:53, Luiz Capitulino wrote: I agree it would be nice to drop entirely but I don't feel happy doing that to users who might have QEMU buried in scripts somewhere. One day they upgrade packages and suddenly their stuff doesn't work anymore. This is very similar to kqemu and I don't think we regret having dropped it. It's not. kqemu was a maintenance burden; the aim of this patch is exactly to isolate the feature to command-line parsing and a magic net client. If you don't use -net, the new code is absolutely dead, unlike kqemu. Let me quote Stefan on this thread: The point of this patch series is to remove the special-case net.c code for the legacy vlan feature. Today's code makes it harder to implement a clean QOM model and is a burden for the net subsystem in general
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Fri, 25 May 2012 15:14:39 +0200 Paolo Bonzini pbonz...@redhat.com wrote: On 25/05/2012 15:07, Luiz Capitulino wrote: On Fri, 25 May 2012 14:59:25 +0200 Paolo Bonzini pbonz...@redhat.com wrote: On 25/05/2012 14:53, Luiz Capitulino wrote: I agree it would be nice to drop entirely but I don't feel happy doing that to users who might have QEMU buried in scripts somewhere. One day they upgrade packages and suddenly their stuff doesn't work anymore. This is very similar to kqemu and I don't think we regret having dropped it. It's not. kqemu was a maintenance burden; the aim of this patch is exactly to isolate the feature to command-line parsing and a magic net client. If you don't use -net, the new code is absolutely dead, unlike kqemu. Let me quote Stefan on this thread: The point of this patch series is to remove the special-case net.c code for the legacy vlan feature. Today's code makes it harder to implement a clean QOM model and is a burden for the net subsystem in general Still not sure what you mean... I meant it's a similar case. kqemu was a special case and a maintenance burden. We've dropped it and didn't regret it. What's stopping us from doing the same thing with vlans? we removed kqemu and didn't give an alternative. This time we are providing an alternative. Alternatives already exist, we don't have to provide them.
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Fri, 25 May 2012 15:19:28 +0200 Paolo Bonzini pbonz...@redhat.com wrote: On 25/05/2012 15:18, Luiz Capitulino wrote: Still not sure what you mean... I meant it's a similar case. kqemu was a special case and a maintenance burden. We've dropped it and didn't regret it. What's stopping us from doing the same thing with vlans? That we have an alternative, and that -net dump is actually useful. I haven't reviewed the series yet, but -net dump can work without this, can't it? It's always possible to have alternatives in qemu, the point is how far we're going on bloating it. we removed kqemu and didn't give an alternative. This time we are providing an alternative. Alternatives already exist, we don't have to provide them. Alternatives that require you to have root privileges (anything involving libvirt or iptables) are not really alternatives. It seems to me that vde doesn't require root, but even if it does, moving this outside of qemu would also be feasible.
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Fri, 25 May 2012 15:37:15 +0200 Paolo Bonzini pbonz...@redhat.com wrote: On 25/05/2012 15:30, Luiz Capitulino wrote: On Fri, 25 May 2012 15:19:28 +0200 Paolo Bonzini pbonz...@redhat.com wrote: On 25/05/2012 15:18, Luiz Capitulino wrote: Still not sure what you mean... I meant it's a similar case. kqemu was a special case and a maintenance burden. We've dropped it and didn't regret it. What's stopping us from doing the same thing with vlans? That we have an alternative, and that -net dump is actually useful. I haven't reviewed the series yet, but -net dump can work without this, can't it? -net dump requires putting a back-end, a front-end and the dump client in the same VLAN. So it is quite useless without this. VDE allows this too :) It's always possible to have alternatives in qemu, the point is how far we're going on bloating it. we removed kqemu and didn't give an alternative. This time we are providing an alternative. Alternatives already exist, we don't have to provide them. Alternatives that require you to have root privileges (anything involving libvirt or iptables) are not really alternatives. It seems to me that vde doesn't require root, but even if it does, moving this outside of qemu would also be feasible. Yeah, VDE probably includes something like a hub. But then we could drop even -net socket, -net udp, -net dump, and only leave in vde+tap+slirp. Or even move slirp into VDE. :) That's a very different thing. Let's start with what is hurting us. Do distributions package VDE at all? I'm not sure. But note that openvswitch is a better alternative for Linux.
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Fri, 25 May 2012 15:47:28 +0200 Paolo Bonzini pbonz...@redhat.com wrote: On 25/05/2012 15:43, Luiz Capitulino wrote: Yeah, VDE probably includes something like a hub. But then we could drop even -net socket, -net udp, -net dump, and only leave in vde+tap+slirp. Or even move slirp into VDE. :) That's a very different thing. Let's start with what is hurting us. But is it? The patch makes it quite clean. vlan is, and the cleanest solution is to drop it. But that's just my opinion, I won't (and possibly can't) nack this.
Re: [Qemu-devel] [PATCH v3 00/16] net: hub-based networking
On Fri, 25 May 2012 01:59:06 +0800 zwu.ker...@gmail.com wrote: From: Zhi Yong Wu wu...@linux.vnet.ibm.com The patchset implements a network hub instead of vlan. The main work was done by Stefan, and I rebased it to the latest QEMU upstream, did some testing and am responsible for pushing it to QEMU upstream. Honest question: does it really pay off to have this in qemu vs. using one of the externally available solutions?
Re: [Qemu-devel] KVM call agenda for May, Tuesday 8th
On Tue, 08 May 2012 07:14:11 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 05/07/2012 06:47 AM, Juan Quintela wrote: Hi Please send in any agenda items you are interested in covering. I've got a conflict at 9am as it turns out so I won't be able to attend. Does this mean the call is canceled?
Re: [Qemu-devel] Can VMX provide real mode support?
On Wed, 21 Mar 2012 15:48:43 +0200 Avi Kivity a...@redhat.com wrote: On 03/21/2012 03:40 PM, Jan Kiszka wrote: On 2012-03-21 13:38, GaoYi wrote: Hi Jan, Since the newest Intel-VT supports the guest OS under real mode, which was already supported in AMD-V, can the VMX in the latest KVM support that case? Yes, both with or without that unrestricted guest support (as Intel called it), real mode will generally work. Without that CPU feature, I seem to recall that there were some limitations for big real mode, not sure. Yes, big real mode will not work without unrestricted guest. There was some work to emulate it (module option emulate_invalid_guest_state), but it is not complete. Can you provide a pointer to this series?
Re: [Qemu-devel] [PATCH]qemu: deal with guest paniced event
On Mon, 27 Feb 2012 11:05:58 +0800 Wen Congyang we...@cn.fujitsu.com wrote: When the host knows the guest is panicked, it will set exit_reason to KVM_EXIT_GUEST_PANIC. So if qemu receives this exit_reason, we can send an event to tell the management application that the guest is panicked. Signed-off-by: Wen Congyang we...@cn.fujitsu.com --- kvm-all.c |3 +++ linux-headers/linux/kvm.h |1 + monitor.c |3 +++ monitor.h |1 + 4 files changed, 8 insertions(+), 0 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index c4babda..ae428ab 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1190,6 +1190,9 @@ int kvm_cpu_exec(CPUState *env) (uint64_t)run-hw.hardware_exit_reason); ret = -1; break; +case KVM_EXIT_GUEST_PANIC: +monitor_protocol_event(QEVENT_GUEST_PANICED, NULL); +break; The event alone is not enough, because the mngt app may miss it (e.g. the panic happens before the mngt app connects to qemu). A simple way to solve this would be to also add a new RunState called guest-panic and make the transition to it (besides sending the event). A more general way would be to model this after -drive's werror/rerror options, say guest-error=report|ignore|stop. When guest-error=stop, the mngt app will get a STOP event and can check the VM runstate to know if it's guest-panic.
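The werror/rerror-style option suggested above would parse roughly as sketched below. To be clear, guest-error=report|ignore|stop is a proposal made in this email, not an existing qemu option, and the helper name is illustrative.

```python
VALID_GUEST_ERROR_POLICIES = ("report", "ignore", "stop")

def parse_guest_error_option(optarg):
    """Parse a -drive-style suboption string, e.g. 'guest-error=stop'.

    Unknown keys are ignored (as -drive does for options it does not
    own in this sketch); an unknown policy value is rejected.
    """
    policy = "report"  # default: emit the event only
    for sub in optarg.split(","):
        key, _, value = sub.partition("=")
        if key == "guest-error":
            if value not in VALID_GUEST_ERROR_POLICIES:
                raise ValueError("bad guest-error policy: %r" % value)
            policy = value
    return policy
```

The appeal of the stop policy is exactly what the reply describes: the management app gets a standard STOP event and can then query the runstate, so a missed panic event no longer loses information.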
Re: [Qemu-devel] KVM call agenda for October 25
On Mon, 24 Oct 2011 13:02:05 +0100 Peter Maydell peter.mayd...@linaro.org wrote: On 24 October 2011 12:35, Paolo Bonzini pbonz...@redhat.com wrote: On 10/24/2011 01:04 PM, Juan Quintela wrote: Please send in any agenda items you are interested in covering. - What's left to merge for 1.0. Things on my list, FWIW: * current target-arm pullreq * PL041 support (needs another patch round to fix a minor bug Andrzej spotted) * cpu_single_env must be thread-local I submitted today the second round of QAPI conversions, which converts all existing QMP query commands to the QAPI (plus some fixes). I expect that to make 1.0. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address
On Fri, 29 Apr 2011 08:30:25 +0800 Huang Ying ying.hu...@intel.com wrote: On 04/28/2011 10:04 PM, Marcelo Tosatti wrote: On Thu, Apr 28, 2011 at 08:00:19AM -0500, Anthony Liguori wrote: On 04/27/2011 06:06 PM, Marcelo Tosatti wrote: On Fri, Nov 19, 2010 at 04:17:35PM +0800, Huang Ying wrote: On Tue, 2010-11-16 at 10:23 +0800, Huang Ying wrote: Author: Max Asbockmasb...@linux.vnet.ibm.com Add command x-gpa2hva to translate guest physical address to host virtual address. Because gpa to hva translation is not consistent, so this command is only used for debugging. The x-gpa2hva command provides one step in a chain of translations from guest virtual to guest physical to host virtual to host physical. Host physical is then used to inject a machine check error. As a consequence the HWPOISON code on the host and the MCE injection code in qemu-kvm are exercised. v3: - Rename to x-gpa2hva - Remove QMP support, because gpa2hva is not consistent Is this patch an acceptable solution for now? This command is useful for our testing. Anthony? Yeah, but it should come through qemu-devel, no? Yes, Huang Ying, can you please resend? Via QEMU git or uq/master branch of KVM git? If there isn't anything qemu-kvm.git specific, it should be against qemu.git. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
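The gpa-to-hva step in the translation chain described above (guest virtual → guest physical → host virtual → host physical) can be sketched as a lookup over guest RAM regions. The region table and base addresses below are made up for illustration; real QEMU resolves the mapping against its registered memory regions, and as the commit message notes, the result is not stable, which is why the command is debugging-only.

```python
# Hypothetical guest RAM layout: (guest_phys_base, size, host_virt_base).
RAM_REGIONS = [
    (0x00000000, 0x000a0000, 0x7f5600000000),
    (0x00100000, 0x3ff00000, 0x7f56000a0000),
]

def gpa2hva(gpa):
    # Find the RAM region backing this guest physical address, then
    # apply the same offset within the host virtual mapping.
    for gpa_base, size, hva_base in RAM_REGIONS:
        if gpa_base <= gpa < gpa_base + size:
            return hva_base + (gpa - gpa_base)
    raise ValueError("GPA %#x is not backed by guest RAM" % gpa)
```

For example, a GPA of 0x1000 falls in the first region, so the HVA is that region's host base plus 0x1000; an unbacked GPA (e.g. MMIO) has no host virtual address at all, one reason the translation is inconsistent.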
Re: [PATCH 0/2 V9] hmp,qmp: add inject-nmi
On Thu, 28 Apr 2011 11:35:20 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: Adds new QERR_UNSUPPORTED, converts nmi to inject-nmi and makes it support QMP. Lai, unfortunately this series still has some issues (like changing the HMP command name). I think V7 was the best submission so far, so I decided to do this: I've incorporated your v7 patches in a new series and fixed a few issues. I'll submit it for review.
Re: [Qemu-devel] [RFC PATCH 0/3 V8] QAPI: add inject-nmi qmp command
On Wed, 27 Apr 2011 09:54:34 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: On 04/26/2011 09:29 PM, Anthony Liguori wrote: On 04/26/2011 08:26 AM, Luiz Capitulino wrote: On Thu, 21 Apr 2011 11:23:54 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: Hi, Anthony Liguori Any suggestion? Although all command line interfaces will be converted to use QMP interfaces in 0.16, I hope inject-nmi comes into QAPI earlier, in 0.15. I don't know what Anthony thinks about adding new commands like this one that early to the new QMP interface, but adding them to current QMP will certainly cause less code churn on your side. That's what I'd recommend for now. Yeah, sorry, this whole series has been confused in the QAPI discussion. I did not intend for QAPI to be disruptive to current development. As far as I can tell, the last series that was posted (before the QAPI post) still had checkpatch.pl issues (scripts/checkpatch.pl btw) and we had agreed that once that was resolved, it would come in through Luiz's tree. Sorry, I didn't catch the meaning. Fix the checkpatch.pl issues of the V7 patch, and send it again? Yes, my recommendation for your series is: 1. Address checkpatch.pl errors 2. Change the HMP to use your implementation, which sends the NMI to all CPUs 3. Any other _code_ review comments I might be missing
Re: [RFC PATCH 0/3 V8] QAPI: add inject-nmi qmp command
On Thu, 21 Apr 2011 11:23:54 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: Hi, Anthony Liguori Any suggestion? Although all command line interfaces will be converted to use QMP interfaces in 0.16, I hope inject-nmi comes into QAPI earlier, in 0.15. I don't know what Anthony thinks about adding new commands like this one that early to the new QMP interface, but adding them to current QMP will certainly cause less code churn on your side. That's what I'd recommend for now.
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Wed, 20 Apr 2011 09:53:56 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: On 04/04/2011 09:09 PM, Anthony Liguori wrote: On 04/04/2011 07:19 AM, Markus Armbruster wrote: [Note cc: Anthony] Daniel P. Berrangeberra...@redhat.com writes: On Mon, Mar 07, 2011 at 05:46:28PM +0800, Lai Jiangshan wrote: From: Lai Jiangshanla...@cn.fujitsu.com Date: Mon, 7 Mar 2011 17:05:15 +0800 Subject: [PATCH 2/2] qemu,qmp: add inject-nmi qmp command inject-nmi command injects an NMI on all CPUs of guest. It is only supported for x86 guest currently, it will returns Unsupported error for non-x86 guest. --- hmp-commands.hx |2 +- monitor.c | 18 +- qmp-commands.hx | 29 + 3 files changed, 47 insertions(+), 2 deletions(-) Does anyone have any feedback on this addition, or are all new QMP patch proposals blocked pending Anthony's QAPI work ? That would be bad. Anthony, what's holding this back? It doesn't pass checkpath.pl. But I'd also expect this to come through Luiz's QMP tree. Regards, Anthony Liguori Hi, Anthony, I cannot find checkpath.pl in the source tree. It's ./scripts/checkpatch.pl And how/where to write errors descriptions? Is the following description suitable? ## # @inject-nmi: # # Inject an NMI on the guest. # # Returns: Nothing on success. # If the guest(non-x86) does not support NMI injection, Unsupported # # Since: 0.15.0 ## { 'command': 'inject-nmi' } Thanks, Lai -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Tue, 12 Apr 2011 21:31:18 +0300 Blue Swirl blauwir...@gmail.com wrote: On Tue, Apr 12, 2011 at 10:52 AM, Avi Kivity a...@redhat.com wrote: On 04/11/2011 08:15 PM, Blue Swirl wrote: On Mon, Apr 11, 2011 at 10:01 AM, Markus Armbrusterarm...@redhat.com  wrote:  Avi Kivitya...@redhat.com  writes:  On 04/08/2011 12:41 AM, Anthony Liguori wrote:  And it's a good thing to have, but exposing this as the only API to  do something as simple as generating a guest crash dump is not the  friendliest thing in the world to do to users.  nmi is a fine name for something that corresponds to a real-life nmi  button (often labeled NMI).  Agree. We could also introduce an alias mechanism for user friendly names, so nmi could be used in addition of full path. Aliases could be useful for device paths as well. Yes.  Perhaps limited to the human monitor. I'd limit all debugging commands (including NMI) to the human monitor. Why? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On Sat, 9 Apr 2011 13:34:43 +0300 Blue Swirl blauwir...@gmail.com wrote: On Sat, Apr 9, 2011 at 2:25 AM, Luiz Capitulino lcapitul...@redhat.com wrote: Hi there, Summary:  - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got  the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's  as fast as qemu-kvm.git)  - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried  with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I tried with qemu.git v0.13.0 in order to check if this was a regression, but I got the same problem... Then I inspected qemu-kvm.git under the assumption that it could have a fix that wasn't commited to qemu.git. Found this:  - commit 0836b77f0f65d56d08bdeffbac25cd6d78267dc9 which is merge, works  - commit cc015e9a5dde2f03f123357fa060acbdfcd570a4 does not work (it's slow) I tried a bisect, but it brakes due to gcc4 vs. gcc3 changes. Then I inspected commits manually, and found out that commit 64d7e9a4 doesn't work, which makes me think that the fix could be in the conflict resolution of 0836b77f, which makes me remember that I'm late for diner, so my conclusions at this point are not reliable :) Ideas? What is the test case? It's an external PXE server, command-line is: qemu -boot n -enable-kvm -net nic,model=virtio -net tap,ifname=vnet0,script= I tried PXE booting a 10M file with and without KVM and the results are pretty much the same with pcnet and e1000. time qemu -monitor stdio -boot n -net nic,model=e1000 -net user,tftp=.,bootfile=10M -net dump,file=foo -enable-kvm time qemu -monitor stdio -boot n -net nic,model=pcnet -net user,tftp=.,bootfile=10M -net dump,file=foo -enable-kvm time qemu -monitor stdio -boot n -net nic,model=e1000 -net user,tftp=.,bootfile=10M -net dump,file=foo time qemu -monitor stdio -boot n -net nic,model=pcnet -net user,tftp=.,bootfile=10M -net dump,file=foo All times are ~10s. Yeah, you're using the internal tftp server. 
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I tried with qemu.git v0.13.0 in order to check if this was a regression, but I got the same problem... Then I inspected qemu-kvm.git under the assumption that it could have a fix that wasn't commited to qemu.git. Found this: - commit 0836b77f0f65d56d08bdeffbac25cd6d78267dc9 which is merge, works - commit cc015e9a5dde2f03f123357fa060acbdfcd570a4 does not work (it's slow) I tried a bisect, but it brakes due to gcc4 vs. gcc3 changes. Then I inspected commits manually, and found out that commit 64d7e9a4 doesn't work, which makes me think that the fix could be in the conflict resolution of 0836b77f, which makes me remember that I'm late for diner, so my conclusions at this point are not reliable :) Can you run kvm_stat to see what the exit rates are? 
Here you go, both collected after the VM is fully booted: qemu.git: efer_reload0 0 exits 15976719599 fpu_reload 203 0 halt_exits 54427 halt_wakeup0 0 host_state_reload 29985170 hypercalls 0 0 insn_emulation 13449597341 insn_emulation_fail0 0 invlpg 9687 0 io_exits 85979 0 irq_exits 162179 4 irq_injections 1158227 irq_window 2071227 largepages 0 0 mmio_exits 954541 mmu_cache_miss 5307 0 mmu_flooded 2493 0 mmu_pde_zapped 1188 0 mmu_pte_updated 5355 0 mmu_pte_write 181550 0 mmu_recycled 0 0 mmu_shadow_zapped 6437 0 mmu_unsync15 0 nmi_injections 0 0 nmi_window 0 0 pf_fixed 73983 0 pf_guest4027 0 remote_tlb_flush 1 0 request_irq6 0 signal_exits 135731 2 tlb_flush 26760 0 qemu-kvm.git: efer_reload0 0 exits869724433 fpu_reload46 0 halt_exits 206 8 halt_wakeup7 0 host_state_reload 105173 8 hypercalls 0 0 insn_emulation 698411821 insn_emulation_fail0 0 invlpg 9682 0 io_exits 626201 0 irq_exits 22930 4 irq_injections 2815 8 irq_window 1029 0 largepages 0 0 mmio_exits 3657 0 mmu_cache_miss 5271 0 mmu_flooded 2466 0 mmu_pde_zapped 1146 0 mmu_pte_updated 5294 0 mmu_pte_write 191173 0 mmu_recycled 0 0 mmu_shadow_zapped 6405 0 mmu_unsync17 0 nmi_injections 0 0 nmi_window 0 0 pf_fixed 73580 0 pf_guest4169 0 remote_tlb_flush 1 0 request_irq0 0 signal_exits 24873 0 tlb_flush 26628 0 Maybe we're missing a coalesced io in qemu.git? It's also possible that gpxe is hitting the apic or pit quite a lot. Regards, Anthony Liguori Ideas? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Slow PXE boot in qemu.git (fast in qemu-kvm.git)
On Mon, 11 Apr 2011 13:00:32 -0600 Alex Williamson alex.william...@redhat.com wrote: On Mon, 2011-04-11 at 15:35 -0300, Luiz Capitulino wrote: On Fri, 08 Apr 2011 19:50:57 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/08/2011 06:25 PM, Luiz Capitulino wrote: Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I was having this problem too, but I think it's because I forgot to build qemu with --enable-io-thread, which is the default for qemu-kvm. Can you re-configure and build with that and see if it's fast? Thanks, Yes, nice catch, it's faster with I/O thread enabled, even seem faster than qemu-kvm.git. So, does this have to be fixed w/o I/O thread? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Slow PXE boot in qemu.git (fast in qemu-kvm.git)
Hi there, Summary: - PXE boot in qemu.git (HEAD f124a41) is quite slow, more than 5 minutes. Got the problem with e1000, virtio and rtl8139. However, pcnet *works* (it's as fast as qemu-kvm.git) - PXE boot in qemu-kvm.git (HEAD df85c051) is fast, less than a minute. Tried with e1000, virtio and rtl8139 (I don't remember if I tried with pcnet) I tried with qemu.git v0.13.0 in order to check if this was a regression, but I got the same problem... Then I inspected qemu-kvm.git under the assumption that it could have a fix that wasn't committed to qemu.git. Found this: - commit 0836b77f0f65d56d08bdeffbac25cd6d78267dc9, which is a merge, works - commit cc015e9a5dde2f03f123357fa060acbdfcd570a4 does not work (it's slow) I tried a bisect, but it breaks due to gcc4 vs. gcc3 changes. Then I inspected commits manually, and found out that commit 64d7e9a4 doesn't work, which makes me think that the fix could be in the conflict resolution of 0836b77f, which makes me remember that I'm late for dinner, so my conclusions at this point are not reliable :) Ideas?
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Mon, 04 Apr 2011 08:05:48 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/04/2011 07:54 AM, Avi Kivity wrote: On 04/04/2011 01:59 PM, Daniel P. Berrange wrote: Interesting that with HMP you need to specify a single CPU index, but with QMP it is injecting to all CPUs at once. Is there any compelling reason why we'd ever need the ability to only inject to a single CPU from an app developer POV ? When a PC has an NMI button, it is (I presume) connected to all CPUs' LINT1 pin, which is often configured as an NMI input. So the all-cpu variant corresponds to real hardware, while the single-cpu variant doesn't. wrt the app developer POV, the only use I'm aware of is that you can configure Windows to dump core when the NMI button is pressed and thus debug driver problems. It's likely more reliable when sent to all cpus. It either needs to be removed from HMP or added to QMP. HMP shouldn't have more features than QMP (even if those features are non-sensible). Is anyone against changing HMP behavior to send it to all CPUs? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Wed, 06 Apr 2011 13:03:37 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/06/2011 12:47 PM, Luiz Capitulino wrote: On Mon, 04 Apr 2011 08:05:48 -0500 Anthony Liguorianth...@codemonkey.ws wrote: On 04/04/2011 07:54 AM, Avi Kivity wrote: On 04/04/2011 01:59 PM, Daniel P. Berrange wrote: Interesting that with HMP you need to specify a single CPU index, but with QMP it is injecting to all CPUs at once. Is there any compelling reason why we'd ever need the ability to only inject to a single CPU from an app developer POV ? When a PC has an NMI button, it is (I presume) connected to all CPUs' LINT1 pin, which is often configured as an NMI input. So the all-cpu variant corresponds to real hardware, while the single-cpu variant doesn't. wrt the app developer POV, the only use I'm aware of is that you can configure Windows to dump core when the NMI button is pressed and thus debug driver problems. It's likely more reliable when sent to all cpus. It either needs to be removed from HMP or added to QMP. HMP shouldn't have more features than QMP (even if those features are non-sensible). Is anyone against changing HMP behavior to send it to all CPUs? Makes sense to me. So, Lai, in order to get this merged could you please do the following: 1. Address checkpath.pl errors 2. Change the HMP to use this implementation, which send the NMI to all CPUs 3. Any other review comments I might be missing :) Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Wed, 06 Apr 2011 20:17:47 +0200 Jan Kiszka jan.kis...@siemens.com wrote: On 2011-04-06 20:08, Luiz Capitulino wrote: On Wed, 06 Apr 2011 13:03:37 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/06/2011 12:47 PM, Luiz Capitulino wrote: On Mon, 04 Apr 2011 08:05:48 -0500 Anthony Liguorianth...@codemonkey.ws wrote: On 04/04/2011 07:54 AM, Avi Kivity wrote: On 04/04/2011 01:59 PM, Daniel P. Berrange wrote: Interesting that with HMP you need to specify a single CPU index, but with QMP it is injecting to all CPUs at once. Is there any compelling reason why we'd ever need the ability to only inject to a single CPU from an app developer POV ? When a PC has an NMI button, it is (I presume) connected to all CPUs' LINT1 pin, which is often configured as an NMI input. So the all-cpu variant corresponds to real hardware, while the single-cpu variant doesn't. wrt the app developer POV, the only use I'm aware of is that you can configure Windows to dump core when the NMI button is pressed and thus debug driver problems. It's likely more reliable when sent to all cpus. It either needs to be removed from HMP or added to QMP. HMP shouldn't have more features than QMP (even if those features are non-sensible). Is anyone against changing HMP behavior to send it to all CPUs? Makes sense to me. So, Lai, in order to get this merged could you please do the following: 1. Address checkpath.pl errors 2. Change the HMP to use this implementation, which send the NMI to all CPUs HMP is currently x86-only, thus it's probably OK to model it after some PC feature (though I don't know if there aren't NMI buttons with BP-only wirings). But will the consolidate version be defined for all architectures? We should avoid exporting x86-specific assumptions. Right, but honestly speaking, I don't know how this works for other arches. So, the best thing to do is to have a general design that can be used by any architecture. Of course that we can also add a new command later if needed. 
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Mon, 04 Apr 2011 14:19:58 +0200 Markus Armbruster arm...@redhat.com wrote: [Note cc: Anthony] Daniel P. Berrange berra...@redhat.com writes: On Mon, Mar 07, 2011 at 05:46:28PM +0800, Lai Jiangshan wrote: From: Lai Jiangshan la...@cn.fujitsu.com Date: Mon, 7 Mar 2011 17:05:15 +0800 Subject: [PATCH 2/2] qemu,qmp: add inject-nmi qmp command inject-nmi command injects an NMI on all CPUs of guest. It is only supported for x86 guest currently, it will returns Unsupported error for non-x86 guest. --- hmp-commands.hx |2 +- monitor.c | 18 +- qmp-commands.hx | 29 + 3 files changed, 47 insertions(+), 2 deletions(-) Does anyone have any feedback on this addition, or are all new QMP patch proposals blocked pending Anthony's QAPI work ? No, we agreed on merging stuff against current QMP. That would be bad. Anthony, what's holding this back? I remember Anthony asked for errors descriptions. We'd like to support it in libvirt and thus want it to be available in QMP, as well as HMP. @@ -2566,6 +2566,22 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) break; } } + +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +CPUState *env; + +for (env = first_cpu; env != NULL; env = env-next_cpu) +cpu_interrupt(env, CPU_INTERRUPT_NMI); + +return 0; +} +#else +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +qerror_report(QERR_UNSUPPORTED); +return -1; +} #endif Interesting that with HMP you need to specify a single CPU index, but with QMP it is injecting to all CPUs at once. Is there any compelling reason why we'd ever need the ability to only inject to a single CPU from an app developer POV ? Quoting my own executive summary on this issue: * Real hardware's NMI button injects all CPUs. This is the primary use case. * Lai said injecting a single CPU can be useful for debugging. Was deemed acceptable as secondary use case. Lai also pointed out that the human monitor's nmi command injects a single CPU. 
That was dismissed as irrelevant for QMP. * No other use cases have been presented.
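The two behaviours summarized above can be modeled in a few lines. This is an illustrative sketch, not QEMU code: the all-CPU variant mirrors the posted loop (`for (env = first_cpu; env != NULL; env = env->next_cpu) cpu_interrupt(env, CPU_INTERRUPT_NMI);`) and corresponds to a hardware NMI button wired to every CPU's LINT1 pin, while the single-CPU variant targets one cpu index as the old HMP command did. The class and function names are invented for the sketch.

```python
class VCPU:
    def __init__(self, index):
        self.index = index
        self.nmi_pending = False   # stands in for CPU_INTERRUPT_NMI being raised

def inject_nmi_all(cpus):
    # QMP semantics: broadcast to every CPU, like a front-panel NMI button.
    for cpu in cpus:
        cpu.nmi_pending = True

def inject_nmi_one(cpus, cpu_index):
    # Old HMP semantics: target a single cpu-index; report failure if absent.
    for cpu in cpus:
        if cpu.index == cpu_index:
            cpu.nmi_pending = True
            return True
    return False
```

The broadcast form also sidesteps the error handling the single-CPU form needs for an out-of-range index, which is part of why it is the friendlier primary interface.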
Re: [Qemu-devel] [PATCH 2/2 V7] qemu,qmp: add inject-nmi qmp command
On Mon, 04 Apr 2011 08:09:29 -0500 Anthony Liguori anth...@codemonkey.ws wrote: On 04/04/2011 07:19 AM, Markus Armbruster wrote: [Note cc: Anthony] Daniel P. Berrangeberra...@redhat.com writes: On Mon, Mar 07, 2011 at 05:46:28PM +0800, Lai Jiangshan wrote: From: Lai Jiangshanla...@cn.fujitsu.com Date: Mon, 7 Mar 2011 17:05:15 +0800 Subject: [PATCH 2/2] qemu,qmp: add inject-nmi qmp command inject-nmi command injects an NMI on all CPUs of guest. It is only supported for x86 guest currently, it will returns Unsupported error for non-x86 guest. --- hmp-commands.hx |2 +- monitor.c | 18 +- qmp-commands.hx | 29 + 3 files changed, 47 insertions(+), 2 deletions(-) Does anyone have any feedback on this addition, or are all new QMP patch proposals blocked pending Anthony's QAPI work ? That would be bad. Anthony, what's holding this back? It doesn't pass checkpath.pl. But I'd also expect this to come through Luiz's QMP tree. I had this ready in my tree some time ago, but you commented on that version asking for errors descriptions and other things, so I didn't push it. But we have to set expectations here. My tree will eventually die, specially wrt new interfaces where I expect you to jump in ASAP. First because you have to be sure a new interface conforms to QAPI, and second (and more importantly) because it's time to pass this on to someone else (preferably you). -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for March 21st
On Mon, 21 Mar 2011 13:58:35 +0100 Juan Quintela quint...@redhat.com wrote: Please, send in any agenda items you are interested in covering. - Patch merge speed. I just feel that patches are not being handled fast enough, so ... I looked at how many patches have been integrated since March 1st: - QAPI speedup merge proposal
Re: [PATCH V6 1/4 resend] nmi: convert cpu_index to cpu-index
On Mon, 21 Feb 2011 09:37:57 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: Hi, Luiz Capitulino Any problem? Sorry for the delay. Looks good in general to me now, there's only one small problem and it's the error message: (qemu) nmi 100 Parameter 'cpu-index' expects a CPU number (qemu) I would expect that kind of error message when no CPU number is provided, but in the case above the CPU number is provided but it happens to be invalid. Why? By the way, please add an introductory email with a proper changelog when submitting series/patches, so that it's easier to review. Thanks, Lai On 02/14/2011 06:09 PM, Lai Jiangshan wrote: cpu-index, which uses a hyphen, is a better name. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 5d4cb9e..e43ac7c 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -721,7 +721,7 @@ ETEXI #if defined(TARGET_I386) { .name = nmi, -.args_type = cpu_index:i, +.args_type = cpu-index:i, .params = cpu, .help = inject an NMI on the given CPU, .mhandler.cmd = do_inject_nmi, diff --git a/monitor.c b/monitor.c index 27883f8..a916771 100644 --- a/monitor.c +++ b/monitor.c @@ -2545,7 +2545,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) static void do_inject_nmi(Monitor *mon, const QDict *qdict) { CPUState *env; -int cpu_index = qdict_get_int(qdict, cpu_index); +int cpu_index = qdict_get_int(qdict, cpu-index); for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) {
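The error split Luiz asks for above can be sketched as a small validator: a missing cpu-index argument and an out-of-range value should produce different messages, instead of both printing "Parameter 'cpu-index' expects a CPU number". The function name and the out-of-range message text are illustrative, not QEMU's actual error strings.

```python
def check_cpu_index(args, num_cpus):
    # Missing argument: the generic "expects a CPU number" message fits.
    if "cpu-index" not in args:
        return "Parameter 'cpu-index' expects a CPU number"
    # Present but invalid: say so explicitly (message invented for the sketch).
    if not 0 <= args["cpu-index"] < num_cpus:
        return "CPU number %d is out of range" % args["cpu-index"]
    return None   # argument is valid
```

With this split, `nmi 100` on a 2-vCPU guest would complain about the range rather than suggesting the argument was absent.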
Re: [Qemu-devel] [PATCH V6 3/4] qmp, nmi: convert do_inject_nmi() to QObject
On Wed, 23 Feb 2011 13:25:38 -0600 Anthony Liguori anth...@codemonkey.ws wrote: On 01/27/2011 02:20 AM, Lai Jiangshan wrote: Make we can inject NMI via qemu-monitor-protocol. We use inject-nmi for the qmp command name, the meaning is clearer. Signed-off-by: Lai Jiangshanla...@cn.fujitsu.com --- diff --git a/hmp-commands.hx b/hmp-commands.hx index ec1a4db..e763bf9 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -725,7 +725,8 @@ ETEXI .params = [cpu], .help = Inject an NMI on all CPUs if no argument is given, otherwise inject it on the specified CPU, -.mhandler.cmd = do_inject_nmi, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, }, #endif STEXI diff --git a/monitor.c b/monitor.c index 387b020..1b1c0ba 100644 --- a/monitor.c +++ b/monitor.c @@ -2542,7 +2542,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) #endif #if defined(TARGET_I386) -static void do_inject_nmi(Monitor *mon, const QDict *qdict) +static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data) { CPUState *env; int cpu_index; @@ -2550,7 +2550,7 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) if (!qdict_haskey(qdict, cpu-index)) { for (env = first_cpu; env != NULL; env = env-next_cpu) cpu_interrupt(env, CPU_INTERRUPT_NMI); -return; +return 0; } cpu_index = qdict_get_int(qdict, cpu-index); @@ -2560,8 +2560,10 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict) kvm_inject_interrupt(env, CPU_INTERRUPT_NMI); else cpu_interrupt(env, CPU_INTERRUPT_NMI); -break; +return 0; } + +return -1; } #endif diff --git a/qmp-commands.hx b/qmp-commands.hx index 56c4d8b..a887dd5 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -429,6 +429,34 @@ Example: EQMP +#if defined(TARGET_I386) +{ +.name = inject-nmi, +.args_type = cpu-index:i?, +.params = [cpu], +.help = Inject an NMI on all CPUs if no argument is given, + otherwise inject it on the specified CPU, +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_inject_nmi, +}, 
+#endif +SQMP +inject-nmi +-- + +Inject an NMI on all CPUs or the given CPU (x86 only). + +Arguments: + +- cpu-index: the index of the CPU to be injected NMI (json-int, optional) + +Example: + +- { execute: inject-nmi, arguments: { cpu-index: 0 } } +- { return: {} } Please describe all expected errors. Don't hide this command for !defined(TARGET_I386), instead have it throw an error in the implementation. Don't have commands that multiple behavior based on the presence or absence of arguments. Make it take a list of cpus if you want the ability to inject the NMI to more than one CPU. We had this exactly same discussion last year: http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg01285.html Regards, Anthony Liguori + +EQMP + { .name = migrate, .args_type = detach:-d,blk:-b,inc:-i,uri:s, -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM Agenda for Feb 22
On Mon, 21 Feb 2011 14:13:04 -0600 Anthony Liguori anth...@codemonkey.ws wrote: On 02/21/2011 11:12 AM, Juan Quintela wrote: please send in any agenda items you are interested in covering. - 0.14.0 release is out, thanks to everyone that participated! Let's discuss what worked well, what could be improved. - 0.15 planning - Should we do a bump to 1.0? Maybe we should wait for QAPI? - Probably should have a placeholder for something like criteria for block format acceptance - GSoC projects We have only three projects so far, not sure what the impact is, but I think we need more. Also note that Google begins accepting mentoring organization applications on February 28. Regards, Anthony Liguori thanks, Juan.
Re: [PATCH V6 1/4] nmi: convert cpu_index to cpu-index
On Wed, 09 Feb 2011 14:46:32 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: On 02/01/2011 09:29 PM, Luiz Capitulino wrote: On Thu, 27 Jan 2011 16:20:27 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: cpu-index, which uses a hyphen, is a better name. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com It looks ok from a quick pass, but I can't apply it on current master, which commit is your HEAD? It is origin for this HEAD: commit 6f32e3d09d990fd50008756fcb446b55e0c0af79 Merge: f447f8c 0f46d15 Author: Marcelo Tosatti mtosa...@redhat.com Date: Fri Jan 21 20:34:47 2011 -0200 Merge branch 'upstream-merge' It can also be applied on today's master. The tree I used is: http://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git You should use Anthony's tree: git://git.qemu.org/qemu.git Thank you very much. Lai. Btw, please, do include the patch 0/0 with a general description about the series and a small changelog between changes. --- diff --git a/hmp-commands.hx b/hmp-commands.hx index 5d4cb9e..e43ac7c 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -721,7 +721,7 @@ ETEXI #if defined(TARGET_I386) { .name = nmi, -.args_type = cpu_index:i, +.args_type = cpu-index:i, .params = cpu, .help = inject an NMI on the given CPU, .mhandler.cmd = do_inject_nmi, diff --git a/monitor.c b/monitor.c index 27883f8..a916771 100644 --- a/monitor.c +++ b/monitor.c @@ -2545,7 +2545,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict) static void do_inject_nmi(Monitor *mon, const QDict *qdict) { CPUState *env; -int cpu_index = qdict_get_int(qdict, cpu_index); +int cpu_index = qdict_get_int(qdict, cpu-index); for (env = first_cpu; env != NULL; env = env-next_cpu) if (env-cpu_index == cpu_index) {
Re: [Qemu-devel] Re: KVM call agenda for Feb 1
On Mon, 31 Jan 2011 15:39:22 -0600 Anthony Liguori anth...@codemonkey.ws wrote: On 01/31/2011 12:10 PM, Jan Kiszka wrote: On 2011-01-31 11:02, Juan Quintela wrote: Please send in any agenda items you are interested in covering. o KVM upstream merge: status, plans, coordination o QMP support status for 0.14. Luiz and I already chatted about it today but it would be good to discuss in the call just to see if anyone has opinions. Basically, declare it fully supported with a few minor caveats (like human-monitor-passthrough is no more supported than the actual monitor, and recommendations about how to deal with devices in the device tree). o Summer of Code 2011 Reminder: tomorrow is the 0.14 stable fork. Regards, Anthony Liguori Jan
Re: [Qemu-devel] Re: KVM call agenda for Feb 1
On Tue, 1 Feb 2011 10:53:21 -0200 Luiz Capitulino lcapitul...@redhat.com wrote: On Mon, 31 Jan 2011 15:39:22 -0600 Anthony Liguori anth...@codemonkey.ws wrote: On 01/31/2011 12:10 PM, Jan Kiszka wrote: On 2011-01-31 11:02, Juan Quintela wrote: Please send in any agenda items you are interested in covering. o KVM upstream merge: status, plans, coordination o QMP support status for 0.14. Luiz and I already chatted about it today but it would be good to discuss in the call just to see if anyone has opinions. Basically, declare it fully supported with a few minor caveats (like human-monitor-passthrough is no more supported than the actual monitor, and recommendations about how to deal with devices in the device tree). o Summer of Code 2011 Forgot to mention the wiki page: http://wiki.qemu.org/Google_Summer_of_Code_2011
Re: [PATCH V6 1/4] nmi: convert cpu_index to cpu-index
On Thu, 27 Jan 2011 16:20:27 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: cpu-index, which uses a hyphen, is a better name. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com It looks ok from a quick pass, but I can't apply it on current master; which commit is your HEAD? Btw, please do include a patch 0/0 with a general description of the series and a small changelog between versions.
---

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 5d4cb9e..e43ac7c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -721,7 +721,7 @@ ETEXI
 #if defined(TARGET_I386)
     {
         .name       = "nmi",
-        .args_type  = "cpu_index:i",
+        .args_type  = "cpu-index:i",
         .params     = "cpu",
         .help       = "inject an NMI on the given CPU",
         .mhandler.cmd = do_inject_nmi,
diff --git a/monitor.c b/monitor.c
index 27883f8..a916771 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2545,7 +2545,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict)
 static void do_inject_nmi(Monitor *mon, const QDict *qdict)
 {
     CPUState *env;
-    int cpu_index = qdict_get_int(qdict, "cpu_index");
+    int cpu_index = qdict_get_int(qdict, "cpu-index");

     for (env = first_cpu; env != NULL; env = env->next_cpu)
         if (env->cpu_index == cpu_index) {
Re: Google Summer of Code 2011
On Sun, 30 Jan 2011 16:06:20 +0100 Alexander Graf ag...@suse.de wrote: On 28.01.2011, at 21:10, Luiz Capitulino wrote: Hi there, GSoC 2011 has been announced[1]. As we were pretty successful last year, I think we should participate again. I've already created a wiki page: http://wiki.qemu.org/Google_Summer_of_Code_2011 We should now populate it with projects, and people willing to be mentors should say so (or just add a project)[2]. Also, I'd like to do something different this year: I'd like to invite the libvirt people to join. There are two ways of doing this: 1. They join the program as a regular mentoring organization, or 2. They join with QEMU The second option means that libvirt can suggest and run its own projects (preferably with QEMU relevance), but from a GSoC perspective the projects will be part of the QEMU org. Keep in mind that every full org gets a free trip to the west coast for 2 people ;). So splitting up means we could almost do a mini-summit at the Google campus at Google's expense ;). Actually, they have a limited budget, and if you live too far away (say, in Brazil), the trip might not be 100% free :) Please coordinate that with Carol. Apparently traction for GSoC is declining (according to last year's summit), so there might be plenty of available slots this year. So I'd say sign up separately for now, and if you don't get accepted, just join forces with us! Yes, that's a good plan, and I fully agree that we get more benefits if we apply separately. It's the libvirt people's call.
Google Summer of Code 2011
Hi there, GSoC 2011 has been announced[1]. As we were pretty successful last year, I think we should participate again. I've already created a wiki page: http://wiki.qemu.org/Google_Summer_of_Code_2011 We should now populate it with projects and people willing to be mentors should say so (or just add a project)[2]. Also, I'd like to do something different this year, I'd like to invite libvirt people to join. There are two ways of doing this: 1. They join in the program as a regular mentoring organization, or 2. They join with QEMU The second option means that libvirt can suggest and run its own projects (preferably with QEMU relevance), but from a GSoC perspective, the project will be part of the QEMU org. Thanks! PS: Hope you don't mind the cross posting :) [1] http://google-opensource.blogspot.com/2011/01/google-summer-of-code-announced-at-lca.html [2] Please, note that being a mentor means having time to dedicate to your student
Re: [Qemu-devel] Re: KVM call agenda for Jan 25
On Mon, 24 Jan 2011 16:06:34 -0600 Anthony Liguori anth...@codemonkey.ws wrote: On 01/24/2011 07:25 AM, Chris Wright wrote: Please send in any agenda items you are interested in covering. - coroutines for the block layer - glib everywhere - Let's start planning our next release in advance, here's a simple example: http://wiki.qemu.org/Planning/0.15-example Regards, Anthony Liguori thanks, -chris
Re: [Qemu-devel] Re: KVM call agenda for Jan 25
On Tue, 25 Jan 2011 11:57:27 -0200 Luiz Capitulino lcapitul...@redhat.com wrote: On Mon, 24 Jan 2011 16:06:34 -0600 Anthony Liguori anth...@codemonkey.ws wrote: On 01/24/2011 07:25 AM, Chris Wright wrote: Please send in any agenda items you are interested in covering. - coroutines for the block layer - glib everywhere - Let's start planning our next release in advance, here's a simple example: http://wiki.qemu.org/Planning/0.15-example Forgot: - Google Summer of Code 2011 is on, are we interested? (note: I just saw the news, I don't have any information yet) Regards, Anthony Liguori thanks, -chris
Re: [PATCH V5 2/4] nmi: make cpu-index argument optional
Sorry for the long delay on this one. In general it looks good, I have just a few small comments. On Mon, 10 Jan 2011 17:27:51 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: When the argument cpu-index is not given, the nmi command will inject an NMI on all CPUs. Please state that we're changing the human monitor behavior here. This simulates the NMI button on a physical machine. Thanks to Markus Armbruster for correcting the logic that detects whether cpu-index is given. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
---

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 99b96a8..a49fcd4 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -721,9 +721,9 @@ ETEXI
 #if defined(TARGET_I386)
     {
         .name       = "nmi",
-        .args_type  = "cpu-index:i",
-        .params     = "cpu",
-        .help       = "inject an NMI on the given CPU",
+        .args_type  = "cpu-index:i?",
+        .params     = "[cpu]",
+        .help       = "inject an NMI on all CPUs or the given CPU",

IMO, it's better to be a bit more clear, something like: "Inject an NMI on all CPUs if no argument is given, otherwise inject it on the specified CPU."

         .mhandler.cmd = do_inject_nmi,
     },
 #endif
diff --git a/monitor.c b/monitor.c
index fd18887..952f67f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2520,8 +2520,15 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict)
 static void do_inject_nmi(Monitor *mon, const QDict *qdict)
 {
     CPUState *env;
-    int cpu_index = qdict_get_int(qdict, "cpu-index");
+    int cpu_index;
+
+    if (!qdict_get(qdict, "cpu-index")) {

Please, use qdict_haskey().

+        for (env = first_cpu; env != NULL; env = env->next_cpu)
+            cpu_interrupt(env, CPU_INTERRUPT_NMI);
+        return;
+    }
+
+    cpu_index = qdict_get_int(qdict, "cpu-index");
     for (env = first_cpu; env != NULL; env = env->next_cpu)
         if (env->cpu_index == cpu_index) {
             cpu_interrupt(env, CPU_INTERRUPT_NMI);
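The dispatch logic in the patch — broadcast when cpu-index is absent, target a single CPU otherwise — can be modeled outside QEMU. A minimal Python sketch (the CPU class and inject_nmi name are illustrative stand-ins, not QEMU or autotest APIs):

```python
class CPU:
    """Minimal stand-in for QEMU's CPUState, for illustration only."""
    def __init__(self, cpu_index):
        self.cpu_index = cpu_index
        self.nmi_pending = False

def inject_nmi(cpus, args):
    """Inject an NMI on all CPUs when 'cpu-index' is absent, else on one.

    Mirrors the patched do_inject_nmi(): the presence check (qdict_get,
    or qdict_haskey as the review suggests) decides between broadcast
    and targeted injection. Returns how many CPUs received the NMI.
    """
    if "cpu-index" not in args:          # no argument: NMI-button semantics
        for cpu in cpus:
            cpu.nmi_pending = True
        return len(cpus)
    hits = 0
    for cpu in cpus:                     # targeted injection
        if cpu.cpu_index == args["cpu-index"]:
            cpu.nmi_pending = True
            hits += 1
    return hits

cpus = [CPU(i) for i in range(4)]
print(inject_nmi(cpus, {}))               # -> 4 (all CPUs)
print(inject_nmi(cpus, {"cpu-index": 2})) # -> 1 (just CPU 2)
```

Note that, as in the C code, a bogus index simply matches no CPU; the QObject conversion in the next patch turns that into an error return.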
Re: [PATCH V5 3/4] qmp,nmi: convert do_inject_nmi() to QObject
On Mon, 10 Jan 2011 17:28:14 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: Make it possible to inject an NMI via the QEMU monitor protocol. We use inject-nmi for the QMP command name; the meaning is clearer. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
---

diff --git a/hmp-commands.hx b/hmp-commands.hx
index a49fcd4..4db413d 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -724,7 +724,8 @@ ETEXI
         .args_type  = "cpu-index:i?",
         .params     = "[cpu]",
         .help       = "inject an NMI on all CPUs or the given CPU",
-        .mhandler.cmd = do_inject_nmi,
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_inject_nmi,
     },
 #endif
 STEXI
diff --git a/monitor.c b/monitor.c
index 952f67f..1bee840 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2517,7 +2517,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict)
 #endif

 #if defined(TARGET_I386)
-static void do_inject_nmi(Monitor *mon, const QDict *qdict)
+static int do_inject_nmi(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     CPUState *env;
     int cpu_index;
@@ -2525,15 +2525,17 @@ static void do_inject_nmi(Monitor *mon, const QDict *qdict)
     if (!qdict_get(qdict, "cpu-index")) {
         for (env = first_cpu; env != NULL; env = env->next_cpu)
             cpu_interrupt(env, CPU_INTERRUPT_NMI);
-        return;
+        return 0;
     }

     cpu_index = qdict_get_int(qdict, "cpu-index");
     for (env = first_cpu; env != NULL; env = env->next_cpu)
         if (env->cpu_index == cpu_index) {
             cpu_interrupt(env, CPU_INTERRUPT_NMI);
-            break;
+            return 0;
         }
+
+    return -1;
 }
 #endif
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 56c4d8b..c2d619c 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -429,6 +429,33 @@ Example:

 EQMP

+#if defined(TARGET_I386)
+    {
+        .name       = "inject_nmi",

Please use a hyphen.

+        .args_type  = "cpu-index:i?",
+        .params     = "[cpu]",
+        .help       = "inject an NMI on all CPUs or the given CPU",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_inject_nmi,
+    },
+#endif
+SQMP
+inject_nmi
+----------
+
+Inject an NMI on the given CPU (x86 only).

Please explain that we can also inject on all CPUs.

+
+Arguments:
+
+- "cpu_index": the index of the CPU to be injected NMI (json-int)

It's actually cpu-index, and you should write (json-int, optional).

+
+Example:
+
+-> { "execute": "inject_nmi", "arguments": { "cpu-index": 0 } }

Hyphen.

+<- { "return": {} }
+
+EQMP
+
     {
         .name       = "migrate",
         .args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
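On the wire, the command documented above is just a JSON object on the QMP monitor socket. A minimal sketch of building the request with the hyphenated names the review asks for (standalone; only the standard json module, no QEMU required):

```python
import json

def build_inject_nmi(cpu_index=None):
    """Build an inject-nmi QMP request; "cpu-index" is optional (json-int)."""
    cmd = {"execute": "inject-nmi"}
    if cpu_index is not None:
        cmd["arguments"] = {"cpu-index": cpu_index}
    return json.dumps(cmd)

print(build_inject_nmi(0))
# {"execute": "inject-nmi", "arguments": {"cpu-index": 0}}
print(build_inject_nmi())
# {"execute": "inject-nmi"}

# A successful reply carries an empty "return" object:
reply = json.loads('{"return": {}}')
assert "error" not in reply and reply["return"] == {}
```

Omitting the argument matches the "inject on all CPUs" behavior the series introduces for the optional cpu-index.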
Re: [PATCH] KVM test: qmp_basic: Go through available monitors to find a qmp one
On Tue, 11 Jan 2011 14:55:47 -0200 Lucas Meneghel Rodrigues l...@redhat.com wrote: It is more convenient to look at all available monitors that the VM has and return the first qmp monitor than relying that the qmp monitor will be allways be the primary one. In case we can't find one, just error the test with a more descriptive message Also, clarify the exception thrown when the monitor is not responsive after the test. Signed-off-by: Qingtang Zhou qz...@redhat.com Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com Makes sense: Acked-by: Luiz Capitulino lcapitul...@redhat.com --- client/tests/kvm/tests/qmp_basic.py | 25 + 1 files changed, 17 insertions(+), 8 deletions(-) diff --git a/client/tests/kvm/tests/qmp_basic.py b/client/tests/kvm/tests/qmp_basic.py index 952da99..94ba9ee 100644 --- a/client/tests/kvm/tests/qmp_basic.py +++ b/client/tests/kvm/tests/qmp_basic.py @@ -1,4 +1,4 @@ -import kvm_test_utils +import kvm_test_utils, kvm_monitor from autotest_lib.client.common_lib import error def run_qmp_basic(test, params, env): @@ -384,13 +384,22 @@ def run_qmp_basic(test, params, env): vm = env.get_vm(params[main_vm]) vm.verify_alive() +# Look for the first qmp monitor available, otherwise, fail the test +qmp_monitor = None +for m in vm.monitors: +if isinstance(m, kvm_monitor.QMPMonitor): +qmp_monitor = m + +if qmp_monitor is None: +raise error.TestError('Could not find a QMP monitor, aborting test') + # Run all suites -greeting_suite(vm.monitor) -input_object_suite(vm.monitor) -argument_checker_suite(vm.monitor) -unknown_commands_suite(vm.monitor) -json_parsing_errors_suite(vm.monitor) +greeting_suite(qmp_monitor) +input_object_suite(qmp_monitor) +argument_checker_suite(qmp_monitor) +unknown_commands_suite(qmp_monitor) +json_parsing_errors_suite(qmp_monitor) # check if QMP is still alive -if not vm.monitor.is_responsive(): -raise error.TestFail('QEMU is not alive after QMP testing') +if not qmp_monitor.is_responsive(): +raise error.TestFail('QMP monitor is not 
responsive after testing') -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
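The monitor-selection loop in the patch can be condensed. A small sketch with dummy classes (note the patch's loop has no break, so as written it keeps the *last* matching monitor; a break, or next() as below, yields the first one, matching the commit message):

```python
class Monitor:
    """Dummy stand-in for the autotest monitor base class."""

class QMPMonitor(Monitor):
    """Dummy stand-in for kvm_monitor.QMPMonitor."""

def first_qmp_monitor(monitors):
    """Return the first QMP monitor in the list, or None if there is none.

    Equivalent to the patch's loop with an added break: stop scanning at
    the first isinstance() match instead of continuing to the end.
    """
    return next((m for m in monitors if isinstance(m, QMPMonitor)), None)

monitors = [Monitor(), QMPMonitor(), QMPMonitor()]
assert first_qmp_monitor(monitors) is monitors[1]   # first match, not last
assert first_qmp_monitor([Monitor()]) is None       # caller raises TestError
```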
Re: [Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Mon, 20 Dec 2010 14:09:05 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: On 12/17/2010 11:25 PM, Avi Kivity wrote: On 12/17/2010 01:22 PM, Luiz Capitulino wrote: I think Avi's suggestion is better, and I will use inject-nmi (without the cpu-index argument) to send an NMI to all CPUs, like a physical GUI. If someone wants to send an NMI to a set of CPUs, he can use inject-nmi multiple times. His suggestion is to drop _all_ arguments, right Avi? Yes. We don't need to drop the cpu-index argument; the upstream tools (libvirt etc.) can just issue the inject-nmi command without any argument when needed. Reasons to keep this argument: 1) Useful for a kernel developer or debugger sending an NMI to a specific CPU. Ok. 2) Share the code with the hmp version of nmi, and share the way these two commands (hmp version and qmp version) are used. This is bad. As a general rule, we shouldn't tweak QMP interfaces with the intention of sharing code with HMP or anything like that. Anyway, I buy your first argument, although I'm not a kernel developer so I'm just trusting your use case.
Re: [Qemu-devel] [PATCH 1/3] nmi: convert cpu_index to cpu-index
On Mon, 20 Dec 2010 18:00:34 +0100 Markus Armbruster arm...@redhat.com wrote: Lai Jiangshan la...@cn.fujitsu.com writes: cpu-index is a better name. Signed-off-by: Lai Jiangshan la...@cn.fujitsu.com
---

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 4befbe2..8de7aa3 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -721,7 +721,7 @@ ETEXI
 #if defined(TARGET_I386)
     {
         .name       = "nmi",
-        .args_type  = "cpu_index:i",
+        .args_type  = "cpu-index:i",
         .params     = "cpu",
         .help       = "inject an NMI on the given CPU",
         .mhandler.cmd = do_inject_nmi,
diff --git a/monitor.c b/monitor.c
index 5d74fe3..c16b39d 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2410,7 +2410,7 @@ static void do_wav_capture(Monitor *mon, const QDict *qdict)
 static void do_inject_nmi(Monitor *mon, const QDict *qdict)
 {
     CPUState *env;
-    int cpu_index = qdict_get_int(qdict, "cpu_index");
+    int cpu_index = qdict_get_int(qdict, "cpu-index");

     for (env = first_cpu; env != NULL; env = env->next_cpu)
         if (env->cpu_index == cpu_index) {

Fine with me, but it would be nice if we could make up our mind once and for all whether to use dash or underscore in monitor commands and arguments. I vote for dash, although we obviously can't change existing commands.
Re: [PATCH v2 2/2] qemu,qmp: convert do_inject_nmi() to QObject, QError
On Mon, 20 Dec 2010 08:47:46 -0200 Marcelo Tosatti mtosa...@redhat.com wrote: On Fri, Dec 10, 2010 at 09:20:26AM -0200, Luiz Capitulino wrote: On Fri, 10 Dec 2010 14:36:08 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote:

+SQMP
+inject_nmi
+----------
+
+Inject an NMI on the given CPU (x86 only).
+
+Arguments:
+
+- "cpu_index": the index of the CPU to be injected NMI (json-int)
+
+Example:
+
+-> { "execute": "inject_nmi", "arguments": { "cpu_index": 0 } }
+<- { "return": {} }
+
+EQMP
+

Avi, Anthony, can you please review this? Do we expect some kind of ack from the guest? Do we expect it to respond in some way? Looks good to me. Don't expect any response from the guest. Also note that the current series defines only one error condition: invalid cpu index. Can this fail in other ways? Not really. An NMI can be pending already (which means the current command has no effect), but I don't see the need to report that. Ok, thanks for the feedback Marcelo.
Re: [Qemu-devel] Re: [PATCH v3] qemu, qmp: convert do_inject_nmi() to QObject, QError
On Fri, 17 Dec 2010 14:20:15 +0800 Lai Jiangshan la...@cn.fujitsu.com wrote: On 12/16/2010 09:17 PM, Luiz Capitulino wrote: On Thu, 16 Dec 2010 15:11:50 +0200 Avi Kivity a...@redhat.com wrote: Why have an argument at all? Always NMI all CPUs. I think Avi's suggestion is better, and I will use inject-nmi (without the cpu-index argument) to send an NMI to all CPUs, like a physical GUI. If someone wants to send an NMI to a set of CPUs, he can use inject-nmi multiple times. His suggestion is to drop _all_ arguments, right Avi? This will simplify things, but you'll need a small refactoring to keep the human monitor behavior (which accepts a cpu index). I also like cpu-index, so I have to add another patch for converting the current cpu_index to cpu-index. Thanks, Lai
Re: KVM Test report, kernel d335b15... qemu cb1983b8...
On Thu, 16 Dec 2010 10:32:12 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 08:05 AM, Hao, Xudong wrote: Hi all, this is the KVM test result against kvm.git d335b156f9fafd177d0606cf845d9a2df2dc5431 and qemu-kvm.git cb1983b8809d0e06a97384a40bad1194a32fc814. Currently qemu-kvm fails to build on RHEL5 with an undeclared PCI_PM_CTRL_NO_SOFT_RST error. I saw there were already fix patches on the mailing list. Two bugs got fixed. Fixed issues: 1. Guest qemu process becomes a defunct process after being killed https://bugzilla.kernel.org/show_bug.cgi?id=23612 Good to see it was indeed fixed by -rc5. 2. [SR] qemu returning from the migrate command takes a long time https://sourceforge.net/tracker/?func=detail&aid=2942079&group_id=180599&atid=893831 Juan, Luiz, any idea what fixed this? I saw it too. The funny thing is that you've fixed it yourself :) commit 5e77aaa0d7d2f4ceaa4fcaf50f3a26d5150f34a6 Author: Avi Kivity a...@redhat.com Date: Wed Jul 7 19:44:22 2010 +0300 QEMUFileBuffered: indicate that we're ready when the underlying file is ready Xudong, please open qemu bugs on Launchpad (https://launchpad.net/qemu), not on SourceForge. Kernel bugs go to the kernel bugzilla as usual. We'll retire the SourceForge bug tracker.
Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError
On Thu, 16 Dec 2010 11:03:38 +0200 Avi Kivity a...@redhat.com wrote: On 12/15/2010 08:00 PM, Luiz Capitulino wrote: Looks like a GUI feature to me. Really? I can't see how you can build "NMI to all CPUs" from "NMI this CPU". Or am I misunderstanding you? I guess so. Avi referred to the 'nmi button on many machines'; I assumed he meant a virtual machine GUI, am I wrong? I meant a real machine's GUI (it's a physical button you can press with your finger, if you have thin fingers). Ok, I didn't know that, but I had another idea: the command could accept either a single cpu index or a list: { "execute": "inject-nmi", "arguments": { "cpus": 2 } } { "execute": "inject-nmi", "arguments": { "cpus": [1, 2, 3, 4] } } This has the feature of injecting the NMI into just some CPUs, although I'm not sure this is going to be desired/useful. If we agree on this we'll have to wait, because the monitor doesn't currently support hybrid arguments.
Re: [PATCH v3] qemu,qmp: convert do_inject_nmi() to QObject, QError
On Thu, 16 Dec 2010 12:51:14 +0200 Avi Kivity a...@redhat.com wrote: On 12/16/2010 12:48 PM, Luiz Capitulino wrote: Ok, I didn't know that, but I had another idea: the command could accept either a single cpu index or a list: { "execute": "inject-nmi", "arguments": { "cpus": 2 } } { "execute": "inject-nmi", "arguments": { "cpus": [1, 2, 3, 4] } } This has the feature of injecting the NMI into just some CPUs, although I'm not sure this is going to be desired/useful. If we agree on this we'll have to wait, because the monitor doesn't currently support hybrid arguments. I hope it never does. They're hard to support in old-school statically typed languages. We could accept only a list then.
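If the argument were kept as "a single index or a list", one way to sidestep the hybrid-argument problem Avi objects to is to normalize it to a list at the boundary. A hypothetical sketch, illustration only (nothing like this was proposed as code in the thread):

```python
def normalize_cpus(value):
    """Accept a single CPU index or a list of indices; always return a list.

    This models the hybrid "cpus" argument from the discussion. bool is
    excluded explicitly because it is a subclass of int in Python.
    """
    if isinstance(value, int) and not isinstance(value, bool):
        return [value]
    if (isinstance(value, list)
            and all(isinstance(v, int) and not isinstance(v, bool) for v in value)):
        return list(value)
    raise TypeError("cpus must be an integer or a list of integers")

print(normalize_cpus(2))          # [2]
print(normalize_cpus([1, 2, 3]))  # [1, 2, 3]
```

Accepting only a list, as suggested at the end, removes the need for any such normalization on the server side: statically typed clients then see a single, fixed argument type.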