Re: [PATCH -tip v9 3/7] kprobes: checks probe address is instruction boundary on x86
On Mon, Jun 01, 2009 at 08:37:31PM -0400, Masami Hiramatsu wrote:
> Ensure safeness of inserting kprobes by checking whether the specified
> address is at the first byte of an instruction on x86.
> This is done by decoding the probed function from its head to the probe
> point.
>
> Signed-off-by: Masami Hiramatsu
> Cc: Ananth N Mavinakayanahalli
> Cc: Jim Keniston
> Cc: Ingo Molnar

Acked-by: Ananth N Mavinakayanahalli

> ---
>
>  arch/x86/kernel/kprobes.c |   69 +
>  1 files changed, 69 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> index 7b5169d..41d524f 100644
> --- a/arch/x86/kernel/kprobes.c
> +++ b/arch/x86/kernel/kprobes.c
> @@ -48,12 +48,14 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
>  #include
>  #include
>  #include
> +#include
>
>  void jprobe_return_end(void);
>
> @@ -244,6 +246,71 @@ retry:
>  	}
>  }
>
> +/* Recover the probed instruction at addr for further analysis. */
> +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
> +{
> +	struct kprobe *kp;
> +	kp = get_kprobe((void *)addr);
> +	if (!kp)
> +		return -EINVAL;
> +
> +	/*
> +	 * Basically, kp->ainsn.insn has an original instruction.
> +	 * However, RIP-relative instruction can not do single-stepping
> +	 * at different place, fix_riprel() tweaks the displacement of
> +	 * that instruction. In that case, we can't recover the instruction
> +	 * from the kp->ainsn.insn.
> +	 *
> +	 * On the other hand, kp->opcode has a copy of the first byte of
> +	 * the probed instruction, which is overwritten by int3. And
> +	 * the instruction at kp->addr is not modified by kprobes except
> +	 * for the first byte, we can recover the original instruction
> +	 * from it and kp->opcode.
> +	 */
> +	memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
> +	buf[0] = kp->opcode;
> +	return 0;
> +}
> +
> +/* Dummy buffers for kallsyms_lookup */
> +static char __dummy_buf[KSYM_NAME_LEN];
> +
> +/* Check if paddr is at an instruction boundary */
> +static int __kprobes can_probe(unsigned long paddr)
> +{
> +	int ret;
> +	unsigned long addr, offset = 0;
> +	struct insn insn;
> +	kprobe_opcode_t buf[MAX_INSN_SIZE];
> +
> +	if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
> +		return 0;
> +
> +	/* Decode instructions */
> +	addr = paddr - offset;
> +	while (addr < paddr) {
> +		kernel_insn_init(&insn, (void *)addr);
> +		insn_get_opcode(&insn);
> +
> +		/* Check if the instruction has been modified. */
> +		if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
> +			ret = recover_probed_instruction(buf, addr);
> +			if (ret)
> +				/*
> +				 * Another debugging subsystem might insert
> +				 * this breakpoint. In that case, we can't
> +				 * recover it.
> +				 */
> +				return 0;
> +			kernel_insn_init(&insn, buf);
> +		}
> +		insn_get_length(&insn);
> +		addr += insn.length;
> +	}
> +
> +	return (addr == paddr);
> +}
> +
>  /*
>   * Returns non-zero if opcode modifies the interrupt flag.
>   */
> @@ -359,6 +426,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
>
>  int __kprobes arch_prepare_kprobe(struct kprobe *p)
>  {
> +	if (!can_probe((unsigned long)p->addr))
> +		return -EILSEQ;
>  	/* insn: must be on special executable page on x86. */
>  	p->ainsn.insn = get_insn_slot();
>  	if (!p->ainsn.insn)

> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America), Inc.
> Software Solutions Division
>
> e-mail: mhira...@redhat.com

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] CPU hard limits
On Fri, Jun 05, 2009 at 09:03:37AM +0300, Avi Kivity wrote:
> Balbir Singh wrote:
>>> I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1),
>>> and a cpu hog running in each group, how would the algorithm divide
>>> resources?
>>
>> As per the matrix calculation, but as soon as we reach an idle point,
>> we redistribute the b/w and start a new quantum so to speak, where all
>> groups are charged up to their hard limits.
>>
>> For your question, if there is a CPU hog running, it would be as per
>> the matrix calculation, since the system has no idle point during the
>> bandwidth period.
>
> So the groups with guarantees get a priority boost.  That's not a good
> side effect.

That happens only in the presence of idle cycles when other groups
[with or without guarantees] have nothing useful to do. So how would
that matter, since there is nothing else to run anyway?

Regards,
Bharata.
Re: [RFC] CPU hard limits
Balbir Singh wrote:
>> I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1),
>> and a cpu hog running in each group, how would the algorithm divide
>> resources?
>
> As per the matrix calculation, but as soon as we reach an idle point,
> we redistribute the b/w and start a new quantum so to speak, where all
> groups are charged up to their hard limits.
>
> For your question, if there is a CPU hog running, it would be as per
> the matrix calculation, since the system has no idle point during the
> bandwidth period.

So the groups with guarantees get a priority boost.  That's not a good
side effect.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [RFC] CPU hard limits
Bharata B Rao wrote:
> On Fri, Jun 05, 2009 at 01:27:55PM +0800, Balbir Singh wrote:
>> * Avi Kivity [2009-06-05 08:21:43]:
>>> Balbir Singh wrote:
>>>>> But then there is no other way to make a *guarantee*, guarantees
>>>>> come at a cost of idling resources, no? Can you show me any other
>>>>> combination that will provide the guarantee and without idling the
>>>>> system for the specified guarantees?
>>>>
>>>> OK, I see part of your concern, but I think we could do some
>>>> optimizations during design. For example if all groups have reached
>>>> their hard-limit and the system is idle, should we do start a new
>>>> hard limit interval and restart, so that idleness can be removed.
>>>> Would that be an acceptable design point?
>>>
>>> I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1),
>>> and a cpu hog running in each group, how would the algorithm divide
>>> resources?
>>
>> As per the matrix calculation, but as soon as we reach an idle point,
>> we redistribute the b/w and start a new quantum so to speak, where all
>> groups are charged up to their hard limits.
>
> But could there be client models where you are required to strictly
> adhere to the limit within the bandwidth and not provide more (by
> advancing the bandwidth period) in the presence of idle cycles?

That's the limit part.  I'd like to be able to specify limits and
guarantees on the same host and for the same groups; I don't think that
works when you advance the bandwidth period.

I think we need to treat guarantees as first-class goals, not something
derived from limits (in fact I think guarantees are more useful as they
can be used to provide SLAs).

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: KVM on Debian
Aaron Clausen wrote:
[]
> is too old to support this.  Is there a reasonably safe way of
> upgrading to one of the newer versions of KVM on this server?

Can't say for "safe", but you can grab my .debs which I use here on a
bunch of machines, from http://www.corpit.ru/debian/tls/kvm/ - both
binaries and sources.  To make them safer for you, you can download
the .dsc and .diff.gz, examine the content and build it yourself.

/mjt
Re: [RFC] CPU hard limits
On Fri, Jun 05, 2009 at 01:27:55PM +0800, Balbir Singh wrote:
> * Avi Kivity [2009-06-05 08:21:43]:
>
>> Balbir Singh wrote:
>>>> But then there is no other way to make a *guarantee*, guarantees
>>>> come at a cost of idling resources, no? Can you show me any other
>>>> combination that will provide the guarantee and without idling the
>>>> system for the specified guarantees?
>>>
>>> OK, I see part of your concern, but I think we could do some
>>> optimizations during design. For example if all groups have reached
>>> their hard-limit and the system is idle, should we do start a new
>>> hard limit interval and restart, so that idleness can be removed.
>>> Would that be an acceptable design point?
>>
>> I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1),
>> and a cpu hog running in each group, how would the algorithm divide
>> resources?
>
> As per the matrix calculation, but as soon as we reach an idle point,
> we redistribute the b/w and start a new quantum so to speak, where all
> groups are charged up to their hard limits.

But could there be client models where you are required to strictly
adhere to the limit within the bandwidth and not provide more (by
advancing the bandwidth period) in the presence of idle cycles?

Regards,
Bharata.
Re: [RFC PATCH v2 00/19] virtual-bus
On Fri, Jun 05, 2009 at 02:25:01PM +0930, Rusty Russell wrote:
> On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
>> Avi Kivity wrote:
>>> Gregory Haskins wrote:
>>> One idea is similar to signalfd() or eventfd()
>>
>> And thus the "kvm-eventfd" (irqfd/iosignalfd) interface project was
>> born. ;)
>
> The lguest patch queue already has such an interface :)  And I have a
> partially complete in-kernel virtio_pci patch with the same trick.
>
> I switched from "kernel created eventfd" to "userspace passes in
> eventfd" after a while though; it lets you connect multiple virtqueues
> to a single fd if you want.
>
> Combined with a minor change to allow any process with access to the
> lguest fd to queue interrupts, this allowed lguest to move to a
> thread-per-virtqueue model, which was a significant speedup as well as
> a nice code reduction.
>
> Here's the relevant kernel patch for reading.
>
> Thanks!
> Rusty.
>
> lguest: use eventfds for device notification
>
> Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY
> with an address: the main Launcher process returns with this address,
> and figures out what device to run.
>
> A far nicer model is to let processes bind an eventfd to an address:
> if we find one, we simply signal the eventfd.

A couple of (probably misguided) RCU questions/suggestions interspersed.
> Signed-off-by: Rusty Russell
> Cc: Davide Libenzi
> ---
>  drivers/lguest/Kconfig          |    2 -
>  drivers/lguest/core.c           |    8 ++--
>  drivers/lguest/lg.h             |    9
>  drivers/lguest/lguest_user.c    |   73
>  include/linux/lguest_launcher.h |    1
>  5 files changed, 89 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
> --- a/drivers/lguest/Kconfig
> +++ b/drivers/lguest/Kconfig
> @@ -1,6 +1,6 @@
>  config LGUEST
>  	tristate "Linux hypervisor example code"
> -	depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
> +	depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
>  	select HVC_DRIVER
>  	---help---
>  	  This is a very simple module which allows you to run
>
> diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
> --- a/drivers/lguest/core.c
> +++ b/drivers/lguest/core.c
> @@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
>  		/* It's possible the Guest did a NOTIFY hypercall to the
>  		 * Launcher, in which case we return from the read() now. */
>  		if (cpu->pending_notify) {
> -			if (put_user(cpu->pending_notify, user))
> -				return -EFAULT;
> -			return sizeof(cpu->pending_notify);
> +			if (!send_notify_to_eventfd(cpu)) {
> +				if (put_user(cpu->pending_notify, user))
> +					return -EFAULT;
> +				return sizeof(cpu->pending_notify);
> +			}
>  		}
>
>  		/* Check for signals */
>
> diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
> --- a/drivers/lguest/lg.h
> +++ b/drivers/lguest/lg.h
> @@ -82,6 +82,11 @@ struct lg_cpu {
>  	struct lg_cpu_arch arch;
>  };
>
> +struct lg_eventfds {
> +	unsigned long addr;
> +	struct file *event;
> +};
> +
>  /* The private info the thread maintains about the guest. */
>  struct lguest
>  {
> @@ -102,6 +107,9 @@ struct lguest
>  	unsigned int stack_pages;
>  	u32 tsc_khz;
>
> +	unsigned int num_eventfds;
> +	struct lg_eventfds *eventfds;
> +
>  	/* Dead? */
>  	const char *dead;
>  };
> @@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
>  void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
>  		const unsigned long *def);
>  void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
> +bool send_notify_to_eventfd(struct lg_cpu *cpu);
>  void init_clockdev(struct lg_cpu *cpu);
>  bool check_syscall_vector(struct lguest *lg);
>  int init_interrupts(void);
>
> diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
> --- a/drivers/lguest/lguest_user.c
> +++ b/drivers/lguest/lguest_user.c
> @@ -7,6 +7,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>  #include "lg.h"
>
>  /*L:055 When something happens, the Waker process needs a way to stop the
> @@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
>  	}
>  }
>
> +bool send_notify_to_eventfd(struct lg_cpu *cpu)
> +{
> +	unsigned int i;
> +
> +	/* lg->eventfds is RCU-protected */
> +	preempt_disable();

Suggest changing to rcu_read_lock() to match the synchronize_rcu().

> +	for (i = 0; i < cpu->lg->num_eventfds; i++) {
> +		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
> +			eventfd_signal(cpu->lg->eventfds[i].event, 1);

Shouldn't this be something like the following?
Re: [RFC] CPU hard limits
* Avi Kivity [2009-06-05 08:21:43]:

> Balbir Singh wrote:
>>> But then there is no other way to make a *guarantee*, guarantees come
>>> at a cost of idling resources, no? Can you show me any other
>>> combination that will provide the guarantee and without idling the
>>> system for the specified guarantees?
>>
>> OK, I see part of your concern, but I think we could do some
>> optimizations during design. For example if all groups have reached
>> their hard-limit and the system is idle, should we do start a new hard
>> limit interval and restart, so that idleness can be removed. Would
>> that be an acceptable design point?
>
> I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and
> a cpu hog running in each group, how would the algorithm divide
> resources?

As per the matrix calculation, but as soon as we reach an idle point,
we redistribute the b/w and start a new quantum so to speak, where all
groups are charged up to their hard limits.

For your question, if there is a CPU hog running, it would be as per
the matrix calculation, since the system has no idle point during the
bandwidth period.

--
	Balbir
Re: [RFC] CPU hard limits
Balbir Singh wrote:
>> But then there is no other way to make a *guarantee*, guarantees come
>> at a cost of idling resources, no? Can you show me any other
>> combination that will provide the guarantee and without idling the
>> system for the specified guarantees?
>
> OK, I see part of your concern, but I think we could do some
> optimizations during design. For example if all groups have reached
> their hard-limit and the system is idle, should we do start a new hard
> limit interval and restart, so that idleness can be removed. Would
> that be an acceptable design point?

I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and
a cpu hog running in each group, how would the algorithm divide
resources?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [RFC] CPU hard limits
* Avi Kivity [2009-06-05 08:16:21]:

> Balbir Singh wrote:
>>>> How, it works out fine in my calculation
>>>>
>>>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>>>> limited to 90%
>>>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>>>> limited to 90%
>>>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>>>> limited to 100%
>>>
>>> It's fine in that it satisfies the guarantees, but it is deeply
>>> suboptimal.  If I ran a cpu hog in the first group, while the other
>>> two were idle, it would be limited to 50% cpu.  On the other hand,
>>> if it consumed all 100% cpu it would still satisfy the guarantees
>>> (as the other groups are idle).
>>>
>>> The result is that in such a situation, wall clock time would double
>>> even though cpu resources are available.
>>
>> But then there is no other way to make a *guarantee*, guarantees come
>> at a cost of idling resources, no? Can you show me any other
>> combination that will provide the guarantee and without idling the
>> system for the specified guarantees?
>
> Suppose in my example cgroup 1 consumed 100% of the cpu resources and
> cgroups 2 and 3 were completely idle.  All of the guarantees are met
> (if cgroup 2 is idle, there's no need to give it the 10% cpu time it
> is guaranteed).
>
> If your only tool to achieve the guarantees is a limit system, then
> yes, the equation yields the correct results.  But given that it
> yields such inferior results, I think we need to look for a more
> involved solution.
>
> I think the limits method fits cases where it is difficult to evict a
> resource (say, disk quotas -- if you want to guarantee 10% of space to
> cgroup 1, you must limit all others to 90%).  But for processor usage,
> you can evict a cgroup instantly, so nothing prevents a cgroup from
> consuming all available resources as long as others do not contend for
> them.
Avi,

Could you look at my newer email, where I've mentioned that I see your
concern and discussed a design point, and comment? We could probably
take this discussion forward from there.

--
	Balbir
Re: [RFC] CPU hard limits
Balbir Singh wrote:
>>> How, it works out fine in my calculation
>>>
>>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>>> limited to 90%
>>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>>> limited to 90%
>>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>>> limited to 100%
>>
>> It's fine in that it satisfies the guarantees, but it is deeply
>> suboptimal.  If I ran a cpu hog in the first group, while the other
>> two were idle, it would be limited to 50% cpu.  On the other hand, if
>> it consumed all 100% cpu it would still satisfy the guarantees (as
>> the other groups are idle).
>>
>> The result is that in such a situation, wall clock time would double
>> even though cpu resources are available.
>
> But then there is no other way to make a *guarantee*, guarantees come
> at a cost of idling resources, no? Can you show me any other
> combination that will provide the guarantee and without idling the
> system for the specified guarantees?

Suppose in my example cgroup 1 consumed 100% of the cpu resources and
cgroups 2 and 3 were completely idle.  All of the guarantees are met
(if cgroup 2 is idle, there's no need to give it the 10% cpu time it is
guaranteed).

If your only tool to achieve the guarantees is a limit system, then
yes, the equation yields the correct results.  But given that it yields
such inferior results, I think we need to look for a more involved
solution.

I think the limits method fits cases where it is difficult to evict a
resource (say, disk quotas -- if you want to guarantee 10% of space to
cgroup 1, you must limit all others to 90%).  But for processor usage,
you can evict a cgroup instantly, so nothing prevents a cgroup from
consuming all available resources as long as others do not contend for
them.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [RFC] CPU hard limits
* Chris Friesen [2009-06-04 23:09:22]:

> Balbir Singh wrote:
>> But then there is no other way to make a *guarantee*, guarantees come
>> at a cost of idling resources, no? Can you show me any other
>> combination that will provide the guarantee and without idling the
>> system for the specified guarantees?
>
> The example given was two 10% guaranteed groups and one best-effort
> group.  Why would this require idling resources?
>
> If I have a hog in each group, the requirements would be met if the
> groups got 33, 33, and 33.  (Or 10/10/80, for that matter.)  If the
> second and third groups go idle, why not let the first group use 100%
> of the cpu?
>
> The only hard restriction is that the sum of the guarantees must be
> less than 100%.

Chris, I just responded to a variation of this; I think that some of
this could be handled during design. I just sent out the email a few
minutes ago. Could you look at that and respond?

--
	Balbir
Re: [RFC] CPU hard limits
* Balbir Singh [2009-06-05 12:49:46]:

> * Avi Kivity [2009-06-05 07:44:27]:
>
>> Balbir Singh wrote:
>>> On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity wrote:
>>>
>>>> Bharata B Rao wrote:
>>>>>> Another way is to place the 8 groups in a container group, and
>>>>>> limit that to 80%.  But that doesn't work if I want to provide
>>>>>> guarantees to several groups.
>>>>>
>>>>> Hmm why not? Reduce the guarantee of the container group and
>>>>> provide the same to additional groups?
>>>>
>>>> This method produces suboptimal results:
>>>>
>>>> $ cgroup-limits 10 10 0
>>>> [50.0, 50.0, 40.0]
>>>>
>>>> I want to provide two 10% guaranteed groups and one best-effort
>>>> group.  Using the limits method, no group can now use more than 50%
>>>> of the resources.  However, having the first group use 90% of the
>>>> resources does not violate any guarantees, but it is not allowed by
>>>> the solution.
>>>
>>> How, it works out fine in my calculation
>>>
>>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>>> limited to 90%
>>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>>> limited to 90%
>>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>>> limited to 100%
>>
>> It's fine in that it satisfies the guarantees, but it is deeply
>> suboptimal.  If I ran a cpu hog in the first group, while the other
>> two were idle, it would be limited to 50% cpu.  On the other hand, if
>> it consumed all 100% cpu it would still satisfy the guarantees (as
>> the other groups are idle).
>>
>> The result is that in such a situation, wall clock time would double
>> even though cpu resources are available.
>
> But then there is no other way to make a *guarantee*, guarantees come
> at a cost of idling resources, no? Can you show me any other
> combination that will provide the guarantee and without idling the
> system for the specified guarantees?
OK, I see part of your concern, but I think we could do some
optimizations during design. For example if all groups have reached
their hard-limit and the system is idle, should we do start a new hard
limit interval and restart, so that idleness can be removed. Would
that be an acceptable design point?

--
	Balbir
Re: [RFC] CPU hard limits
Balbir Singh wrote:
> But then there is no other way to make a *guarantee*, guarantees come
> at a cost of idling resources, no? Can you show me any other
> combination that will provide the guarantee and without idling the
> system for the specified guarantees?

The example given was two 10% guaranteed groups and one best-effort
group.  Why would this require idling resources?

If I have a hog in each group, the requirements would be met if the
groups got 33, 33, and 33.  (Or 10/10/80, for that matter.)  If the
second and third groups go idle, why not let the first group use 100%
of the cpu?

The only hard restriction is that the sum of the guarantees must be
less than 100%.

Chris
Re: [RFC PATCH v2 00/19] virtual-bus
On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
> Avi Kivity wrote:
>> Gregory Haskins wrote:
>> One idea is similar to signalfd() or eventfd()
>
> And thus the "kvm-eventfd" (irqfd/iosignalfd) interface project was
> born. ;)

The lguest patch queue already has such an interface :)  And I have a
partially complete in-kernel virtio_pci patch with the same trick.

I switched from "kernel created eventfd" to "userspace passes in
eventfd" after a while though; it lets you connect multiple virtqueues
to a single fd if you want.

Combined with a minor change to allow any process with access to the
lguest fd to queue interrupts, this allowed lguest to move to a
thread-per-virtqueue model, which was a significant speedup as well as
a nice code reduction.

Here's the relevant kernel patch for reading.

Thanks!
Rusty.

lguest: use eventfds for device notification

Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY
with an address: the main Launcher process returns with this address,
and figures out what device to run.

A far nicer model is to let processes bind an eventfd to an address:
if we find one, we simply signal the eventfd.
Signed-off-by: Rusty Russell
Cc: Davide Libenzi
---
 drivers/lguest/Kconfig          |    2 -
 drivers/lguest/core.c           |    8 ++--
 drivers/lguest/lg.h             |    9
 drivers/lguest/lguest_user.c    |   73
 include/linux/lguest_launcher.h |    1
 5 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
--- a/drivers/lguest/Kconfig
+++ b/drivers/lguest/Kconfig
@@ -1,6 +1,6 @@
 config LGUEST
 	tristate "Linux hypervisor example code"
-	depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
+	depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
 	select HVC_DRIVER
 	---help---
 	  This is a very simple module which allows you to run

diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
 		/* It's possible the Guest did a NOTIFY hypercall to the
 		 * Launcher, in which case we return from the read() now. */
 		if (cpu->pending_notify) {
-			if (put_user(cpu->pending_notify, user))
-				return -EFAULT;
-			return sizeof(cpu->pending_notify);
+			if (!send_notify_to_eventfd(cpu)) {
+				if (put_user(cpu->pending_notify, user))
+					return -EFAULT;
+				return sizeof(cpu->pending_notify);
+			}
 		}

 		/* Check for signals */

diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -82,6 +82,11 @@ struct lg_cpu {
 	struct lg_cpu_arch arch;
 };

+struct lg_eventfds {
+	unsigned long addr;
+	struct file *event;
+};
+
 /* The private info the thread maintains about the guest. */
 struct lguest
 {
@@ -102,6 +107,9 @@ struct lguest
 	unsigned int stack_pages;
 	u32 tsc_khz;

+	unsigned int num_eventfds;
+	struct lg_eventfds *eventfds;
+
 	/* Dead? */
 	const char *dead;
 };
@@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
 void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
 		const unsigned long *def);
 void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
+bool send_notify_to_eventfd(struct lg_cpu *cpu);
 void init_clockdev(struct lg_cpu *cpu);
 bool check_syscall_vector(struct lguest *lg);
 int init_interrupts(void);

diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -7,6 +7,8 @@
 #include
 #include
 #include
+#include
+#include
 #include "lg.h"

 /*L:055 When something happens, the Waker process needs a way to stop the
@@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
 	}
 }

+bool send_notify_to_eventfd(struct lg_cpu *cpu)
+{
+	unsigned int i;
+
+	/* lg->eventfds is RCU-protected */
+	preempt_disable();
+	for (i = 0; i < cpu->lg->num_eventfds; i++) {
+		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
+			eventfd_signal(cpu->lg->eventfds[i].event, 1);
+			cpu->pending_notify = 0;
+			break;
+		}
+	}
+	preempt_enable();
+	return cpu->pending_notify == 0;
+}
+
+static int add_eventfd(struct lguest *lg, unsigned long addr, int fd)
+{
+	struct lg_eventfds *new, *old;
+
+	if (!addr)
+		return -EINVAL;
+
+	/* Replace the old array with the new one, carefully: others can
+	 * be accessing it at the sa
Re: [RFC] CPU hard limits
* Avi Kivity [2009-06-05 07:44:27]:

> Balbir Singh wrote:
>> On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity wrote:
>>
>>> Bharata B Rao wrote:
>>>>> Another way is to place the 8 groups in a container group, and
>>>>> limit that to 80%.  But that doesn't work if I want to provide
>>>>> guarantees to several groups.
>>>>
>>>> Hmm why not? Reduce the guarantee of the container group and
>>>> provide the same to additional groups?
>>>
>>> This method produces suboptimal results:
>>>
>>> $ cgroup-limits 10 10 0
>>> [50.0, 50.0, 40.0]
>>>
>>> I want to provide two 10% guaranteed groups and one best-effort
>>> group.  Using the limits method, no group can now use more than 50%
>>> of the resources.  However, having the first group use 90% of the
>>> resources does not violate any guarantees, but it is not allowed by
>>> the solution.
>>
>> How, it works out fine in my calculation
>>
>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>> limited to 90%
>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>> limited to 90%
>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>> limited to 100%
>
> It's fine in that it satisfies the guarantees, but it is deeply
> suboptimal.  If I ran a cpu hog in the first group, while the other
> two were idle, it would be limited to 50% cpu.  On the other hand, if
> it consumed all 100% cpu it would still satisfy the guarantees (as the
> other groups are idle).
>
> The result is that in such a situation, wall clock time would double
> even though cpu resources are available.

But then there is no other way to make a *guarantee*, guarantees come
at a cost of idling resources, no? Can you show me any other
combination that will provide the guarantee and without idling the
system for the specified guarantees?

>> Now if we really have zeros, I would recommend using
>>
>> cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.
>>
>> Adding zeros to the calculation is not recommended. Does that help?
>
> What do you mean, it is not recommended? I have two groups which need
> at least 10% and one which does not need any guarantee, how do I
> express it?

Ignore this part of my comment.

> In any case, changing the zero to 1% does not materially change the
> results.

True.

--
	Balbir
Re: [RFC] CPU hard limits
Balbir Singh wrote: On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity wrote: Bharata B Rao wrote: Another way is to place the 8 groups in a container group, and limit that to 80%. But that doesn't work if I want to provide guarantees to several groups. Hmm, why not? Reduce the guarantee of the container group and provide the same to additional groups? This method produces suboptimal results: $ cgroup-limits 10 10 0 [50.0, 50.0, 40.0] I want to provide two 10% guaranteed groups and one best-effort group. Using the limits method, no group can now use more than 50% of the resources. However, having the first group use 90% of the resources does not violate any guarantees, but it is not allowed by the solution. How? It works out fine in my calculation: 50 + 40 for G2 and G3 makes sure that G1 gets 10%, since the others are limited to 90% 50 + 40 for G1 and G3 makes sure that G2 gets 10%, since the others are limited to 90% 50 + 50 for G1 and G2 makes sure that G3 gets 0%, since the others are limited to 100% It's fine in that it satisfies the guarantees, but it is deeply suboptimal. If I ran a cpu hog in the first group, while the other two were idle, it would be limited to 50% cpu. On the other hand, if it consumed all 100% cpu it would still satisfy the guarantees (as the other groups are idle). The result is that in such a situation, wall clock time would double even though cpu resources are available. Now if we really have zeros, I would recommend using cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output. Adding zeros to the calculation is not recommended. Does that help? What do you mean, it is not recommended? I have two groups which need at least 10% and one which does not need any guarantee; how do I express it? In any case, changing the zero to 1% does not materially change the results. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] CPU hard limits
On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity wrote: > Bharata B Rao wrote: >>> >>> Another way is to place the 8 groups in a container group, and limit >>> that to 80%. But that doesn't work if I want to provide guarantees to >>> several groups. >>> >> >> Hmm, why not? Reduce the guarantee of the container group and provide >> the same to additional groups? >> > > This method produces suboptimal results: > > $ cgroup-limits 10 10 0 > [50.0, 50.0, 40.0] > > I want to provide two 10% guaranteed groups and one best-effort group. > Using the limits method, no group can now use more than 50% of the > resources. However, having the first group use 90% of the resources does > not violate any guarantees, but it is not allowed by the solution. > How? It works out fine in my calculation: 50 + 40 for G2 and G3 makes sure that G1 gets 10%, since the others are limited to 90% 50 + 40 for G1 and G3 makes sure that G2 gets 10%, since the others are limited to 90% 50 + 50 for G1 and G2 makes sure that G3 gets 0%, since the others are limited to 100% Now if we really have zeros, I would recommend using cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output. Adding zeros to the calculation is not recommended. Does that help? Balbir -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
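[Editorial note] Balbir's arithmetic generalizes: with total capacity R, the guarantee a group actually receives is whatever the other groups' hard limits leave over. A quick sketch of that check (Python; R = 100 and the limits from the `cgroup-limits 10 10 0` example are taken from this thread, not from any real tool):

```python
def implied_guarantees(limits, R=100):
    # Each group is guaranteed R minus the sum of the OTHER groups' limits:
    # that much capacity can never be consumed by anyone else.
    total = sum(limits)
    return [R - (total - li) for li in limits]

# Limits produced by `cgroup-limits 10 10 0`:
print(implied_guarantees([50.0, 50.0, 40.0]))  # [10.0, 10.0, 0.0]
```

This mirrors the three lines of arithmetic above: G1 and G2 each keep 10%, G3 keeps 0%, because the other two groups are capped at 90% (or 100%) combined.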
Re: KVM on Debian
Hi, An update in the hope that this is useful to someone :-) On Fri, Jun 05, 2009 at 09:03:03AM +1200, Mark van Walraven wrote: > My next step is to try qemu-kvm, built from source. The Debianised libvirt > expects the kvm binaries to be in /usr/bin/kvm, so you can symlink them > from /usr/local/bin if you prefer to install there. I've also experimented > with a shell script wrapper in /usr/bin/kvm that condenses the output of > qemu-kvm --help so that libvirtd for Lenny works. Actually, the current Debian Lenny libvirt* packages (0.4.6-10) seem to work fine with qemu-kvm-0.10.5 built from source. All I needed to do was symlink /usr/local/bin/qemu-system-x86_64 to /usr/bin/kvm and copy extboot.bin into /usr/local/share/qemu/ (I used the one from the kvm 85+dfsg-3 package in Experimental). So far, so good. Mark. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] CPU hard limits
Bharata B Rao wrote: Another way is to place the 8 groups in a container group, and limit that to 80%. But that doesn't work if I want to provide guarantees to several groups. Hmm, why not? Reduce the guarantee of the container group and provide the same to additional groups? This method produces suboptimal results: $ cgroup-limits 10 10 0 [50.0, 50.0, 40.0] I want to provide two 10% guaranteed groups and one best-effort group. Using the limits method, no group can now use more than 50% of the resources. However, having the first group use 90% of the resources does not violate any guarantees, but it is not allowed by the solution.

#!/usr/bin/python
def calculate_limits(g, R):
    N = len(g)
    if N == 1:
        return [R]
    s = sum([R - gi for gi in g])
    return [(s - (R - gi) - (N - 2) * (R - gi)) / (N - 1) for gi in g]

import sys
print calculate_limits([float(x) for x in sys.argv[1:]], 100)

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
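[Editorial note] Avi's script restated for Python 3, with the algebra simplified: the original expression `s - (R - gi) - (N - 2)*(R - gi)` is just `s - (N - 1)*(R - gi)`, so each limit is `s/(N - 1) - (R - gi)`. This is a sketch matching the quoted output, not the canonical cgroup-limits tool:

```python
import sys

def calculate_limits(guarantees, R=100.0):
    # Cap each group i so that the other N-1 groups' limits sum to R - g_i,
    # leaving group i a guaranteed g_i even when everyone else is busy.
    N = len(guarantees)
    if N == 1:
        return [R]
    s = sum(R - g for g in guarantees)
    return [s / (N - 1) - (R - g) for g in guarantees]

if __name__ == "__main__":
    print(calculate_limits([float(x) for x in sys.argv[1:]]))
```

`calculate_limits([10, 10, 0])` reproduces the `[50.0, 50.0, 40.0]` quoted above, and `calculate_limits([10, 10])` gives the `90, 90` Balbir mentions for the zero-free case.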
Re: [RFC] CPU hard limits
* Avi Kivity [2009-06-04 15:19:22]: > Bharata B Rao wrote: > > 2. Need for hard limiting CPU resource > > -- > > - Pay-per-use: In enterprise systems that cater to multiple > > clients/customers > > where a customer demands a certain share of CPU resources and pays only > > that, CPU hard limits will be useful to hard limit the customer's job > > to consume only the specified amount of CPU resource. > > - In container based virtualization environments running multiple > > containers, > > hard limits will be useful to ensure a container doesn't exceed its > > CPU entitlement. > > - Hard limits can be used to provide guarantees. > > > How can hard limits provide guarantees? > > Let's take an example where I have 1 group that I wish to guarantee a > 20% share of the cpu, and another 8 groups with no limits or guarantees. > > One way to achieve the guarantee is to hard limit each of the 8 other > groups to 10%; the sum total of the limits is 80%, leaving 20% for the > guarantee group. The downside is the arbitrary limit imposed on the > other groups. > > Another way is to place the 8 groups in a container group, and limit > that to 80%. But that doesn't work if I want to provide guarantees to > several groups. > Hi, Avi, Take a look at http://wiki.openvz.org/Containers/Guarantees_for_resources and the associated program in the wiki page. -- Balbir -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] CPU hard limits
On Thu, Jun 04, 2009 at 03:19:22PM +0300, Avi Kivity wrote: > Bharata B Rao wrote: >> 2. Need for hard limiting CPU resource >> -- >> - Pay-per-use: In enterprise systems that cater to multiple clients/customers >> where a customer demands a certain share of CPU resources and pays only >> that, CPU hard limits will be useful to hard limit the customer's job >> to consume only the specified amount of CPU resource. >> - In container based virtualization environments running multiple containers, >> hard limits will be useful to ensure a container doesn't exceed its >> CPU entitlement. >> - Hard limits can be used to provide guarantees. >> > How can hard limits provide guarantees? > > Let's take an example where I have 1 group that I wish to guarantee a > 20% share of the cpu, and another 8 groups with no limits or guarantees. > > One way to achieve the guarantee is to hard limit each of the 8 other > groups to 10%; the sum total of the limits is 80%, leaving 20% for the > guarantee group. The downside is the arbitrary limit imposed on the > other groups. This method sounds very similar to the openvz method: http://wiki.openvz.org/Containers/Guarantees_for_resources > > Another way is to place the 8 groups in a container group, and limit > that to 80%. But that doesn't work if I want to provide guarantees to > several groups. Hmm, why not? Reduce the guarantee of the container group and provide the same to additional groups? Regards, Bharata. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH] qemu-kvm: Flush icache after dma operations for ia64
Jes Sorensen wrote: > Zhang, Xiantao wrote: >> Hi, Jes >> Have you verified whether it works for you? You may run a kernel >> build in the guest with 4 vcpus; if it can be done successfully >> without any error, it should be okay I think; otherwise, we may need >> to investigate further. :) Xiantao > > Hi Xiantao, > > I was able to run a 16 vCPU guest and build the kernel using make -j > 16. How quickly would the problem show up for you, on every run, or > should I run more tests? Hi Jes, Good news! On my machine, without the patch, an SMP guest can't build a whole kernel at all. So if you can build it without errors and use it to boot the guest, I think it should work well. Xiantao -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-1971512 ] failure to migrate guests with more than 4GB of RAM
Bugs item #1971512, was opened at 2008-05-24 17:45 Message generated for change (Comment added) made by mtosatti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1971512&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed Resolution: None Priority: 3 Private: No Submitted By: Marcelo Tosatti (mtosatti) Assigned to: Anthony Liguori (aliguori) Summary: failure to migrate guests with more than 4GB of RAM Initial Comment: The migration code assumes linear "phys_ram_base": [r...@localhost kvm-userspace.tip]# qemu/x86_64-softmmu/qemu-system-x86_64 -hda /root/images/marcelo5-io-test.img -m 4097 -net nic,model=rtl8139 -net tap,script=/root/iptables/ifup -incoming tcp://0:/ audit_log_user_command(): Connection refused audit_log_user_command(): Connection refused migration: memory size mismatch: recv 22032384 mine 4316999680 migrate_incoming_fd failed (rc=232) -- >Comment By: Marcelo Tosatti (mtosatti) Date: 2009-06-04 20:00 Message: This has been fixed by Glauber. -- Comment By: Jiajun Xu (jiajun) Date: 2008-12-15 22:37 Message: We did not run anyworkload, we do migration just after guest boots up and becomes idle. -- Comment By: Avi Kivity (avik) Date: 2008-12-14 11:45 Message: What workload is the guest running during the migration? -- Comment By: Jiajun Xu (jiajun) Date: 2008-12-09 23:09 Message: Open the bug again since Live Migration 4G guest still fail on my machine. Guest will call trace after Live Migration. -- Comment By: SourceForge Robot (sf-robot) Date: 2008-12-07 22:22 Message: This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker). 
-- Comment By: Jiajun Xu (jiajun) Date: 2008-11-25 01:52 Message: I tried latest commit, userspace.git 6e63ba19476753595e508713eb9daf559dc50bf6 with a 64-bit RHEL5.1 Guest. My host kernel is 2.6.26.2. And My host has 8GB memory and 4GB swap. Guest can be live migrated, but after that, guest will call trace. Maybe we can have a check with each other's environment. My steps as following: 1. qemu-system-x86_64 -incoming tcp:localhost: -m 4096 -net nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img 2. qemu-system-x86_64 -m 4096 -net nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img 3. In qemu console, type "migrate tcp:localhost:" The call trace messages in guest: ### Kernel BUG at block/elevator.c:560 invalid opcode: [1] SMP last sysfs file: /block/hda/removable CPU 0 Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc iscsi_tcp ib_iser libiscsi scsi_transport_iscsi rdma_ucm ib_ucm ib_srp ib_sdp rdma_cm ib_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_sa ib_uverbs ib_umad ib_mad ib_core dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac lp floppy pcspkr serio_raw 8139cp 8139too parport_pc parport mii ide_cd cdrom ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd Pid: 0, comm: swapper Not tainted 2.6.18-53.el5 #1 RIP: 0010:[] [] elv_dequeue_request+0x8/0x3c RSP: 0018:8040ddc0 EFLAGS: 00010046 RAX: 0001 RBX: 81011381b398 RCX: RDX: 81011381b398 RSI: 81011381b398 RDI: 81011fb912c0 RBP: 804abe18 R08: 80304108 R09: 0012 R10: 0022 R11: R12: R13: 0001 R14: 0086 R15: 8040deb8 FS: () GS:80396000() knlGS: CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b CR2: 2ad6f4d0 CR3: 0001126cc000 CR4: 06e0 Process swapper (pid: 0, threadinfo 803c6000, task 802dcae0) Stack: 8000ae3c 804abe18 804abe50 804abd00 0246 8003ba73 8003ba0c 804abe18 81011fbe5800 8000d2a5 81011fb8c5c0 Call Trace: [] 
ide_end_request+0xc6/0xfc [] ide_dma_intr+0x67/0xab [] ide_dma_intr+0x0/0xab [] ide_intr+0x16f/0x1df [] handle_IRQ_event+0x29/0x58 [] __do_IRQ+0xa4/0x105 [] do_IRQ+0xe7/0xf5 [] ret_from_intr+0x0/0xa [] __do_softir
[ kvm-Bugs-2624842 ] kernel BUG at /kvm-84/kernel/x86/kvm_main.c:2148!
Bugs item #2624842, was opened at 2009-02-21 14:27 Message generated for change (Comment added) made by mtosatti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2624842&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: kernel Group: None >Status: Closed Resolution: None Priority: 5 Private: No Submitted By: jb17bsome (jb17bsome) Assigned to: Nobody/Anonymous (nobody) Summary: kernel BUG at /kvm-84/kernel/x86/kvm_main.c:2148! Initial Comment: cpu: AMD Phenom 9750 (4) host distro: fedora 10 x86_64 host kernel: linus-2.6 git (v2.6.29-rc5-276-g2ec77fc) guest: any. I have tried fedora 10, windows nt 4, and windows 2008 images. kvm version: 84 usage: qemu-system-x86_64 -m 512 -no-kvm-pit and -no-kvm-irqchip cause the same bug. -no-kvm runs fine. -- >Comment By: Marcelo Tosatti (mtosatti) Date: 2009-06-04 19:57 Message: jb17bsome, Try unloading the virtualbox driver. -- Comment By: jb17bsome (jb17bsome) Date: 2009-05-06 21:13 Message: I upgraded my system to the latest F11 dev as of May 6 2009, but I get the same result... So the same kernel BUG bug at kvm_handle_fault_on_reboot. I attached another kbug dump (with a little context from dmesg). Is there anything else I can try? -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2624842&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2287677 ] kvm79 compiling errors (with-patched-kernel)
Bugs item #2287677, was opened at 2008-11-14 21:39 Message generated for change (Settings changed) made by mtosatti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: intel Group: None Status: Closed Resolution: None Priority: 5 Private: No Submitted By: Darkman (darkman82) Assigned to: Nobody/Anonymous (nobody) Summary: kvm79 compiling errors (with-patched-kernel) Initial Comment: config.mak : ARCH=i386 PROCESSOR=i386 PREFIX=/usr KERNELDIR=/usr/src/linux-2.6.27.6/ KERNELSOURCEDIR= LIBKVM_KERNELDIR=/root/kvm-79/kernel WANT_MODULE= CROSS_COMPILE= CC=gcc LD=ld OBJCOPY=objcopy AR=ar ERRORS: /root/kvm-79/qemu/qemu-kvm.c: In function 'ap_main_loop': /root/kvm-79/qemu/qemu-kvm.c:459: error: 'kvm_arch_do_ioperm' undeclared (first use in this function) /root/kvm-79/qemu/qemu-kvm.c:459: error: (Each undeclared identifier is reported only once /root/kvm-79/qemu/qemu-kvm.c:459: error: for each function it appears in.) /root/kvm-79/qemu/qemu-kvm.c: In function 'sigfd_handler': /root/kvm-79/qemu/qemu-kvm.c:544: warning: format '%ld' expects type 'long int', but argument 2 has type 'ssize_t' make[2]: *** [qemu-kvm.o] Error 1 make[2]: Leaving directory `/root/kvm-79/qemu/x86_64-softmmu' make[1]: *** [subdir-x86_64-softmmu] Error 2 make[1]: Leaving directory `/root/kvm-79/qemu' make: *** [qemu] Error 2 Same problem with 2.6.27.2 source. kvm78 works fine. -- >Comment By: Marcelo Tosatti (mtosatti) Date: 2009-06-04 19:51 Message: Please try with kvm-85, and reopen in case its still problematic. -- Comment By: Marcelo Tosatti (mtosatti) Date: 2009-06-04 19:51 Message: Please try with kvm-85, and reopen in case its still problematic. -- Comment By: Darkman (darkman82) Date: 2008-12-05 19:33 Message: It seems due to undefined USE_KVM_DEVICE_ASSIGNMENT. 
In qemu-kvm.h: qemu-kvm.h:95 #ifdef USE_KVM_DEVICE_ASSIGNMENT qemu-kvm.h:96 void kvm_ioperm(CPUState *env, void *data); qemu-kvm.h:97 void kvm_arch_do_ioperm(void *_data); qemu-kvm.h:98 #endif but in qemu-kvm.c we have qemu-kvm.c:457/* do ioperm for io ports of assigned devices */ qemu-kvm.c:458LIST_FOREACH(data, &ioperm_head, entries) qemu-kvm.c:459on_vcpu(env, kvm_arch_do_ioperm, data); without #ifdef block. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2287677 ] kvm79 compiling errors (with-patched-kernel)
Bugs item #2287677, was opened at 2008-11-14 21:39 Message generated for change (Comment added) made by mtosatti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: intel Group: None >Status: Closed Resolution: None Priority: 5 Private: No Submitted By: Darkman (darkman82) Assigned to: Nobody/Anonymous (nobody) Summary: kvm79 compiling errors (with-patched-kernel) Initial Comment: config.mak : ARCH=i386 PROCESSOR=i386 PREFIX=/usr KERNELDIR=/usr/src/linux-2.6.27.6/ KERNELSOURCEDIR= LIBKVM_KERNELDIR=/root/kvm-79/kernel WANT_MODULE= CROSS_COMPILE= CC=gcc LD=ld OBJCOPY=objcopy AR=ar ERRORS: /root/kvm-79/qemu/qemu-kvm.c: In function 'ap_main_loop': /root/kvm-79/qemu/qemu-kvm.c:459: error: 'kvm_arch_do_ioperm' undeclared (first use in this function) /root/kvm-79/qemu/qemu-kvm.c:459: error: (Each undeclared identifier is reported only once /root/kvm-79/qemu/qemu-kvm.c:459: error: for each function it appears in.) /root/kvm-79/qemu/qemu-kvm.c: In function 'sigfd_handler': /root/kvm-79/qemu/qemu-kvm.c:544: warning: format '%ld' expects type 'long int', but argument 2 has type 'ssize_t' make[2]: *** [qemu-kvm.o] Error 1 make[2]: Leaving directory `/root/kvm-79/qemu/x86_64-softmmu' make[1]: *** [subdir-x86_64-softmmu] Error 2 make[1]: Leaving directory `/root/kvm-79/qemu' make: *** [qemu] Error 2 Same problem with 2.6.27.2 source. kvm78 works fine. -- >Comment By: Marcelo Tosatti (mtosatti) Date: 2009-06-04 19:51 Message: Please try with kvm-85, and reopen in case its still problematic. -- Comment By: Darkman (darkman82) Date: 2008-12-05 19:33 Message: It seems due to undefined USE_KVM_DEVICE_ASSIGNMENT. 
In qemu-kvm.h: qemu-kvm.h:95 #ifdef USE_KVM_DEVICE_ASSIGNMENT qemu-kvm.h:96 void kvm_ioperm(CPUState *env, void *data); qemu-kvm.h:97 void kvm_arch_do_ioperm(void *_data); qemu-kvm.h:98 #endif but in qemu-kvm.c we have qemu-kvm.c:457/* do ioperm for io ports of assigned devices */ qemu-kvm.c:458LIST_FOREACH(data, &ioperm_head, entries) qemu-kvm.c:459on_vcpu(env, kvm_arch_do_ioperm, data); without #ifdef block. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2782199 ] linux_s3 ceased function
Bugs item #2782199, was opened at 2009-04-27 11:04 Message generated for change (Settings changed) made by mtosatti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed Resolution: None Priority: 5 Private: No Submitted By: Technologov (technologov) Assigned to: Nobody/Anonymous (nobody) Summary: linux_s3 ceased function Initial Comment: Test linux_s3, which worked fine with KVM-85rc3, ceased to function in the KVM-85rc5/rc6/final releases. S3 is a power sleep (suspend) test. Now it only works with 2 guests: RHEL 5 and Fedora 8 (32- and 64-bit). Previously, with KVM-85rc3, the linux_s3 test worked with the following guests: RHEL 4, RHEL 5, Fedora 8, Fedora 9, openSUSE 11.0, openSUSE 11.1, and Ubuntu 8.10. I see this as a regression. -Alexey, 27.4.2009. -- Comment By: Marcelo Tosatti (mtosatti) Date: 2009-06-04 19:47 Message: virtio_balloon thing -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2782199 ] linux_s3 ceased function
Bugs item #2782199, was opened at 2009-04-27 11:04 Message generated for change (Comment added) made by mtosatti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Technologov (technologov) Assigned to: Nobody/Anonymous (nobody) Summary: linux_s3 ceased function Initial Comment: Test linux_s3, which worked fine with KVM-85rc3, ceased to function in the KVM-85rc5/rc6/final releases. S3 is a power sleep (suspend) test. Now it only works with 2 guests: RHEL 5 and Fedora 8 (32- and 64-bit). Previously, with KVM-85rc3, the linux_s3 test worked with the following guests: RHEL 4, RHEL 5, Fedora 8, Fedora 9, openSUSE 11.0, openSUSE 11.1, and Ubuntu 8.10. I see this as a regression. -Alexey, 27.4.2009. -- >Comment By: Marcelo Tosatti (mtosatti) Date: 2009-06-04 19:47 Message: virtio_balloon thing -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2801212 ] sles10sp2 guest timer run too fast
Bugs item #2801212, was opened at 2009-06-04 11:17 Message generated for change (Settings changed) made by mtosatti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jiajun Xu (jiajun) >Assigned to: Marcelo Tosatti (mtosatti) Summary: sles10sp2 guest timer run too fast Initial Comment: With kvm.git Commit:7ff90748cebbfbafc8cfa6bdd633113cd9537789 qemu-kvm Commit:a1cd3c985c848dae73966f9601f15fbcade72f1, we found that the sles10sp2 guest clock runs much faster than real time, gaining about 27s for every 60s of real time. Reproduce steps: (1)qemu-system-x86_64 -m 1024 -net nic,macaddr=00:16:3e:6f:f3:d1,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/sles10sp2.img (2)Run ntpdate in guest: ntpdate sync_machine_ip && sleep 60 && ntpdate sync_machine_ip Current result: sles10sp2rc1-guest:~ # ntpdate sync_machine_ip && sleep 60 && ntpdate sync_machine_ip 31 May 23:16:59 ntpdate[3303]: step time server 192.168.198.248 offset -61.27418 31 May 23:17:32 ntpdate[3305]: step time server 192.168.198.248 offset -27.626469 sec -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2801459 ] i8042.c: No controller found...
Bugs item #2801459, was opened at 2009-06-04 19:39 Message generated for change (Tracker Item Submitted) made by mtosatti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801459&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Marcelo Tosatti (mtosatti) Assigned to: Marcelo Tosatti (mtosatti) Summary: i8042.c: No controller found... Initial Comment: http://marc.info/?l=qemu-devel&m=124329227728366&w=2 -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801459&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2801458 ] BUG at mmu.c:615 from localhost migration using ept+hugetlbf
Bugs item #2801458, was opened at 2009-06-04 19:36 Message generated for change (Tracker Item Submitted) made by mtosatti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801458&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: kernel Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Marcelo Tosatti (mtosatti) Assigned to: Marcelo Tosatti (mtosatti) Summary: BUG at mmu.c:615 from localhost migration using ept+hugetlbf Initial Comment: http://www.mail-archive.com/kvm@vger.kernel.org/msg16136.html -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801458&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH KVM VMX 0/2] Enable Unrestricted Guest
Hi Avi, I have modified the earlier patch as per you comments. I have prepared a separate patch for renaming rmode.active to rmode.vm86_active. And 2nd patch enables the Unrestricted Guest feature in the KVM. This patch will also work with unfixed (cpu reset state) qemu. Please apply. Thanks & Regards, Nitin Nitin A Kamble (2): KVM: VMX: Rename rmode.active to rmode.vm86_active KVM: VMX: Support Unrestricted Guest feature arch/x86/include/asm/kvm_host.h | 14 --- arch/x86/include/asm/vmx.h |1 + arch/x86/kvm/vmx.c | 77 +-- 3 files changed, 66 insertions(+), 26 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH KVM VMX 2/2] KVM: VMX: Support Unrestricted Guest feature
"Unrestricted Guest" feature is added in the VMX specification. Intel Westmere and onwards processors will support this feature. It allows kvm guests to run real mode and unpaged mode code natively in the VMX mode when EPT is turned on. With the unrestricted guest there is no need to emulate the guest real mode code in the vm86 container or in the emulator. Also the guest big real mode code works like native. The attached patch enhances KVM to use the unrestricted guest feature if available on the processor. It also adds a new kernel/module parameter to disable the unrestricted guest feature at the boot time. Signed-off-by: Nitin A Kamble --- arch/x86/include/asm/kvm_host.h | 12 + arch/x86/include/asm/vmx.h |1 + arch/x86/kvm/vmx.c | 49 ++ 3 files changed, 51 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 1cc901e..a1a96a5 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -37,12 +37,14 @@ #define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS | \ 0xFF00ULL) -#define KVM_GUEST_CR0_MASK\ - (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE \ -| X86_CR0_NW | X86_CR0_CD) +#define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST \ + (X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD) +#define KVM_GUEST_CR0_MASK \ + (KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST | X86_CR0_PG | X86_CR0_PE) +#define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST \ + (X86_CR0_WP | X86_CR0_NE | X86_CR0_TS | X86_CR0_MP) #define KVM_VM_CR0_ALWAYS_ON \ - (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE | X86_CR0_TS \ -| X86_CR0_MP) + (KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST | X86_CR0_PG | X86_CR0_PE) #define KVM_GUEST_CR4_MASK \ (X86_CR4_VME | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_PGE | X86_CR4_VMXE) #define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE) diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 498f944..c73da02 100644 --- a/arch/x86/include/asm/vmx.h +++ 
b/arch/x86/include/asm/vmx.h
@@ -55,6 +55,7 @@
 #define SECONDARY_EXEC_ENABLE_EPT 0x0002
 #define SECONDARY_EXEC_ENABLE_VPID 0x0020
 #define SECONDARY_EXEC_WBINVD_EXITING 0x0040
+#define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x0080

 #define PIN_BASED_EXT_INTR_MASK 0x0001
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d1ec8a9..b3d8a3a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -50,6 +50,10 @@
 module_param_named(flexpriority, flexpriority_enabled, bool, S_IRUGO);

 static int __read_mostly enable_ept = 1;
 module_param_named(ept, enable_ept, bool, S_IRUGO);

+static int __read_mostly enable_unrestricted_guest = 1;
+module_param_named(unrestricted_guest,
+			enable_unrestricted_guest, bool, S_IRUGO);
+
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);

@@ -270,6 +274,12 @@ static inline int cpu_has_vmx_ept(void)
 		SECONDARY_EXEC_ENABLE_EPT;
 }

+static inline int cpu_has_vmx_unrestricted_guest(void)
+{
+	return vmcs_config.cpu_based_2nd_exec_ctrl &
+		SECONDARY_EXEC_UNRESTRICTED_GUEST;
+}
+
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
 	return flexpriority_enabled &&
@@ -1201,7 +1211,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 		opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
 			SECONDARY_EXEC_WBINVD_EXITING |
 			SECONDARY_EXEC_ENABLE_VPID |
-			SECONDARY_EXEC_ENABLE_EPT;
+			SECONDARY_EXEC_ENABLE_EPT |
+			SECONDARY_EXEC_UNRESTRICTED_GUEST;
 		if (adjust_vmx_controls(min2, opt2,
 					MSR_IA32_VMX_PROCBASED_CTLS2,
 					&_cpu_based_2nd_exec_control) < 0)
@@ -1331,8 +1342,13 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_vpid())
 		enable_vpid = 0;

-	if (!cpu_has_vmx_ept())
+	if (!cpu_has_vmx_ept()) {
 		enable_ept = 0;
+		enable_unrestricted_guest = 0;
+	}
+
+	if (!cpu_has_vmx_unrestricted_guest())
+		enable_unrestricted_guest = 0;

 	if (!cpu_has_vmx_flexpriority())
 		flexpriority_enabled = 0;
@@ -1431,6 +1447,9 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 	unsigned long flags;
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

+	if (enable_unr
[PATCH KVM VMX 1/2] KVM: VMX: Rename rmode.active to rmode.vm86_active
That way the interpretation of rmode.active becomes clearer alongside the
unrestricted guest code.

Signed-off-by: Nitin A Kamble
---
 arch/x86/include/asm/kvm_host.h |    2 +-
 arch/x86/kvm/vmx.c              |   28 ++--
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1951d39..1cc901e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -339,7 +339,7 @@ struct kvm_vcpu_arch {
 	} interrupt;

 	struct {
-		int active;
+		int vm86_active;
 		u8 save_iopl;
 		struct kvm_save_segment {
 			u16 selector;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fd05fd2..d1ec8a9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -497,7 +497,7 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
 		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
 			eb |= 1u << BP_VECTOR;
 	}
-	if (vcpu->arch.rmode.active)
+	if (vcpu->arch.rmode.vm86_active)
 		eb = ~0;
 	if (enable_ept)
 		eb &= ~(1u << PF_VECTOR); /* bypass_guest_pf = 0 */
@@ -733,7 +733,7 @@ static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)

 static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
-	if (vcpu->arch.rmode.active)
+	if (vcpu->arch.rmode.vm86_active)
 		rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
 	vmcs_writel(GUEST_RFLAGS, rflags);
 }
@@ -790,7 +790,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
 	}

-	if (vcpu->arch.rmode.active) {
+	if (vcpu->arch.rmode.vm86_active) {
 		vmx->rmode.irq.pending = true;
 		vmx->rmode.irq.vector = nr;
 		vmx->rmode.irq.rip = kvm_rip_read(vcpu);
@@ -1370,7 +1370,7 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

 	vmx->emulation_required = 1;
-	vcpu->arch.rmode.active = 0;
+	vcpu->arch.rmode.vm86_active = 0;

 	vmcs_writel(GUEST_TR_BASE, vcpu->arch.rmode.tr.base);
 	vmcs_write32(GUEST_TR_LIMIT, vcpu->arch.rmode.tr.limit);
@@ -1432,7 +1432,7 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

 	vmx->emulation_required = 1;
-	vcpu->arch.rmode.active = 1;
+	vcpu->arch.rmode.vm86_active = 1;

 	vcpu->arch.rmode.tr.base = vmcs_readl(GUEST_TR_BASE);
 	vmcs_writel(GUEST_TR_BASE, rmode_tss_base(vcpu->kvm));
@@ -1616,10 +1616,10 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)

 	vmx_fpu_deactivate(vcpu);

-	if (vcpu->arch.rmode.active && (cr0 & X86_CR0_PE))
+	if (vcpu->arch.rmode.vm86_active && (cr0 & X86_CR0_PE))
 		enter_pmode(vcpu);

-	if (!vcpu->arch.rmode.active && !(cr0 & X86_CR0_PE))
+	if (!vcpu->arch.rmode.vm86_active && !(cr0 & X86_CR0_PE))
 		enter_rmode(vcpu);

 #ifdef CONFIG_X86_64
@@ -1675,7 +1675,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)

 static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-	unsigned long hw_cr4 = cr4 | (vcpu->arch.rmode.active ?
+	unsigned long hw_cr4 = cr4 | (vcpu->arch.rmode.vm86_active ?
 		KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);

 	vcpu->arch.cr4 = cr4;
@@ -1758,7 +1758,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 	struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
 	u32 ar;

-	if (vcpu->arch.rmode.active && seg == VCPU_SREG_TR) {
+	if (vcpu->arch.rmode.vm86_active && seg == VCPU_SREG_TR) {
 		vcpu->arch.rmode.tr.selector = var->selector;
 		vcpu->arch.rmode.tr.base = var->base;
 		vcpu->arch.rmode.tr.limit = var->limit;
@@ -1768,7 +1768,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
 	vmcs_writel(sf->base, var->base);
 	vmcs_write32(sf->limit, var->limit);
 	vmcs_write16(sf->selector, var->selector);
-	if (vcpu->arch.rmode.active && var->s) {
+	if (vcpu->arch.rmode.vm86_active && var->s) {
 		/*
 		 * Hack real-mode segments into vm86 compatibility.
 		 */
@@ -2337,7 +2337,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 		goto out;
 	}

-	vmx->vcpu.arch.rmode.active = 0;
+	vmx->vcpu.arch.rmode.vm86_active = 0;

 	vmx->soft_vnmi_blocked = 0;
@@ -2475,7 +2475,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
 	KVMTRACE_1D(INJ_VIRQ, vcpu, (u32)irq, handler);

 	++vcpu->stat.irq_injections;
-	if (vcpu->arch.rmode.active) {
+	if (vcpu->arch.rmode.vm86_active) {
 		vmx->rmode.irq.pend
Re: KVM on Debian
On Thu, Jun 04, 2009 at 01:37:54PM -0700, Aaron Clausen wrote:
> I'm running a production Debian Lenny server using KVM to run a couple
> of Windows and a couple of Linux guests. All is working well, but I
> want to give my Server 2003 guest access to a SCSI tape drive.
> Unfortunately, Debian is pretty conservative, and the version of KVM
> is too old to support this. Is there a reasonably safe way of
> upgrading to one of the newer versions of KVM on this server?

Backporting kvm from experimental is straightforward, and has worked fine
for me.

- Matt
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] QEMU KVM: i386: Fix the cpu reset state
As per the IA32 processor manual, the accessed bit is set to 1 in the
processor state after reset. The qemu pc cpu_reset code was missing this
accessed-bit setting.

Signed-off-by: Nitin A Kamble
---
 target-i386/helper.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 7fc5366..573fb5b 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -493,17 +493,23 @@ void cpu_reset(CPUX86State *env)
     env->tr.flags = DESC_P_MASK | (11 << DESC_TYPE_SHIFT);

     cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
-                           DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | DESC_R_MASK);
+                           DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK |
+                           DESC_R_MASK | DESC_A_MASK);
     cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
-                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+                           DESC_A_MASK);
     cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
-                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+                           DESC_A_MASK);
     cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
-                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+                           DESC_A_MASK);
     cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
-                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+                           DESC_A_MASK);
     cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
-                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+                           DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+                           DESC_A_MASK);

     env->eip = 0xfff0;
     env->regs[R_EDX] = env->cpuid_version;
--
1.6.0.6
Re: [RFC] CPU hard limits
Avi Kivity wrote:
> Bharata B Rao wrote:
>> 2. Need for hard limiting CPU resource
>> --
>> - Pay-per-use: In enterprise systems that cater to multiple
>>   clients/customers, where a customer demands a certain share of CPU
>>   resources and pays for only that, CPU hard limits are useful to
>>   restrict the customer's job to consuming only the specified amount
>>   of CPU resource.
>> - In container-based virtualization environments running multiple
>>   containers, hard limits are useful to ensure a container doesn't
>>   exceed its CPU entitlement.
>> - Hard limits can be used to provide guarantees.
>
> How can hard limits provide guarantees?

Hard limits are useful and desirable in situations where we would like to
maintain deterministic behavior. Placing a hard cap on the cpu usage of a
given task group (and configuring things so that this cpu time is not
overcommitted) allows us to create a hard guarantee that throughput for
that task group will not fluctuate as other workloads are added and
removed on the system.

Cache use and bus bandwidth in a multi-workload environment can still
cause a performance deviation, but these are second order compared to the
cpu scheduling guarantees themselves.

Mike Waychison
Re: KVM on Debian
On Thu, Jun 04, 2009 at 01:37:54PM -0700, Aaron Clausen wrote:
> I'm running a production Debian Lenny server using KVM to run a couple
> of Windows and a couple of Linux guests. All is working well, but I
> want to give my Server 2003 guest access to a SCSI tape drive.
> Unfortunately, Debian is pretty conservative, and the version of KVM
> is too old to support this. Is there a reasonably safe way of
> upgrading to one of the newer versions of KVM on this server?

I'm interested in this too. So far I have found that Lenny's libvirt
fails to parse the output of kvm --help, though this is fixed in the
libvirt in testing. The kvm package from experimental seems to work well,
after a day of testing. My next step is to try qemu-kvm, built from
source.

The Debianised libvirt expects the kvm binaries to be in /usr/bin/kvm, so
you can symlink them from /usr/local/bin if you prefer to install there.
I've also experimented with a shell script wrapper in /usr/bin/kvm that
condenses the output of qemu-kvm --help so that Lenny's libvirtd works.

Regards,

Mark.
KVM on Debian
I'm running a production Debian Lenny server using KVM to run a couple of
Windows and a couple of Linux guests. All is working well, but I want to
give my Server 2003 guest access to a SCSI tape drive. Unfortunately,
Debian is pretty conservative, and the version of KVM is too old to
support this. Is there a reasonably safe way of upgrading to one of the
newer versions of KVM on this server?

--
Aaron Clausen
mightymartia...@gmail.com
Re: [PATCH] use KVMState, as upstream do
On Thu, Jun 04, 2009 at 05:18:06PM -0300, Glauber Costa wrote:
> > > this first phase has nothing to do with functionality. To begin with,
> > > KVMState is qemu style, kvm_context_t is not, like it or not (I don't).
> >
> > I am not against this mechanical change at all, don't get me wrong. I
> > don't want to mix two kvm implementations together in strange ways.
>
> too late for not wanting anything strange to happen ;-)

You are right, I should have said "in stranger ways".

> But I do believe this is the way to turn qemu-kvm.git into something
> that feeds qemu.git. And that's what we all want.

Disagree with the first part, agree with the second :)

--
	Gleb.
Re: [PATCH] use KVMState, as upstream do
On Thu, Jun 04, 2009 at 11:09:52PM +0300, Gleb Natapov wrote:
> On Thu, Jun 04, 2009 at 05:10:51PM -0300, Glauber Costa wrote:
> > On Thu, Jun 04, 2009 at 11:00:46PM +0300, Gleb Natapov wrote:
> > > On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
> > > > On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> > > > > On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > > > > > This is a pretty mechanical change. To make the code look
> > > > > > closer to upstream qemu, I'm renaming kvm_context_t to
> > > > > > KVMState. The mid-term goal here is to start sharing code
> > > > > > wherever possible.
> > > > > >
> > > > > > Avi, please apply, or I'll send you a video of myself
> > > > > > dancing naked.
> > > > > >
> > > > > You can start recording it since I doubt this patch will apply cleanly
> > > > > to today's master (another mechanical change was applied). Regardless, I
> > > > > think trying to use bits of qemu kvm is dangerous. It has similar functions
> > > > > with the same names, but with different assumptions about the conditions
> > > > > they can be executed in (look at commit a5ddb119). I actually prefer to be
> > > > > different enough to not call an upstream qemu function by mistake.
> > > >
> > > > I did it against today's master. If new patches came in, it is just
> > > > a matter of regenerating this, since it is, as I said, mechanical.
> > > >
> > > > Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
> > > > are not included in the final object), there is no such risk.
> > > > Of course, I am aiming towards it, but the first step will be to change
> > > > the names of conflicting functions until we can pick qemu's implementation,
> > > > in which case the former will just go away.
> > > That is the point. We can't just pick qemu's implementation most of the
> > > times.
> > "until we can pick up qemu's implementation" potentially involves replacing
> > that particular piece with the upstream version first.
> >
> > > > If we are serious about merging qemu-kvm into qemu, I don't see a way out
> > > > of it. We should start changing things this way to accommodate it. Different
> > > > enough won't do.
> > > I don't really like the idea of morphing a working implementation to look
> > > like a non-working one. I do agree that qemu-kvm should be cleaned up
> > > substantially before going upstream. Upstream qemu kvm should go away then.
> > > I don't see much work done to enhance it anyway.
> > >
> > this first phase has nothing to do with functionality. To begin with,
> > KVMState is qemu style, kvm_context_t is not, like it or not (I don't).
> >
> I am not against this mechanical change at all, don't get me wrong. I
> don't want to mix two kvm implementations together in strange ways.
>
too late for not wanting anything strange to happen ;-)

But I do believe this is the way to turn qemu-kvm.git into something that
feeds qemu.git. And that's what we all want.
Re: [PATCH] use KVMState, as upstream do
On Thu, Jun 04, 2009 at 05:10:51PM -0300, Glauber Costa wrote:
> On Thu, Jun 04, 2009 at 11:00:46PM +0300, Gleb Natapov wrote:
> > On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
> > > On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> > > > On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > > > > This is a pretty mechanical change. To make the code look
> > > > > closer to upstream qemu, I'm renaming kvm_context_t to
> > > > > KVMState. The mid-term goal here is to start sharing code
> > > > > wherever possible.
> > > > >
> > > > > Avi, please apply, or I'll send you a video of myself
> > > > > dancing naked.
> > > > >
> > > > You can start recording it since I doubt this patch will apply cleanly
> > > > to today's master (another mechanical change was applied). Regardless, I
> > > > think trying to use bits of qemu kvm is dangerous. It has similar functions
> > > > with the same names, but with different assumptions about the conditions
> > > > they can be executed in (look at commit a5ddb119). I actually prefer to be
> > > > different enough to not call an upstream qemu function by mistake.
> > >
> > > I did it against today's master. If new patches came in, it is just
> > > a matter of regenerating this, since it is, as I said, mechanical.
> > >
> > > Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
> > > are not included in the final object), there is no such risk.
> > > Of course, I am aiming towards it, but the first step will be to change
> > > the names of conflicting functions until we can pick qemu's implementation,
> > > in which case the former will just go away.
> > That is the point. We can't just pick qemu's implementation most of the
> > times.
> "until we can pick up qemu's implementation" potentially involves replacing
> that particular piece with the upstream version first.
>
> > > If we are serious about merging qemu-kvm into qemu, I don't see a way out
> > > of it. We should start changing things this way to accommodate it. Different
> > > enough won't do.
> > I don't really like the idea of morphing a working implementation to look
> > like a non-working one. I do agree that qemu-kvm should be cleaned up
> > substantially before going upstream. Upstream qemu kvm should go away then.
> > I don't see much work done to enhance it anyway.
> >
> this first phase has nothing to do with functionality. To begin with,
> KVMState is qemu style, kvm_context_t is not, like it or not (I don't).
>
I am not against this mechanical change at all, don't get me wrong. I
don't want to mix two kvm implementations together in strange ways.

> I don't plan to introduce regressions, you can rest assured. But we _do_
> have to make things look much more qemuer, and that's what this patch
> aims at.

--
	Gleb.
Re: [PATCH] use KVMState, as upstream do
On Thu, Jun 04, 2009 at 11:00:46PM +0300, Gleb Natapov wrote:
> On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
> > On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> > > On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > > > This is a pretty mechanical change. To make the code look
> > > > closer to upstream qemu, I'm renaming kvm_context_t to
> > > > KVMState. The mid-term goal here is to start sharing code
> > > > wherever possible.
> > > >
> > > > Avi, please apply, or I'll send you a video of myself
> > > > dancing naked.
> > > >
> > > You can start recording it since I doubt this patch will apply cleanly
> > > to today's master (another mechanical change was applied). Regardless, I
> > > think trying to use bits of qemu kvm is dangerous. It has similar functions
> > > with the same names, but with different assumptions about the conditions
> > > they can be executed in (look at commit a5ddb119). I actually prefer to be
> > > different enough to not call an upstream qemu function by mistake.
> >
> > I did it against today's master. If new patches came in, it is just
> > a matter of regenerating this, since it is, as I said, mechanical.
> >
> > Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
> > are not included in the final object), there is no such risk.
> > Of course, I am aiming towards it, but the first step will be to change
> > the names of conflicting functions until we can pick qemu's implementation,
> > in which case the former will just go away.
> That is the point. We can't just pick qemu's implementation most of the
> times.

"until we can pick up qemu's implementation" potentially involves replacing
that particular piece with the upstream version first.

> > If we are serious about merging qemu-kvm into qemu, I don't see a way out
> > of it. We should start changing things this way to accommodate it. Different
> > enough won't do.
> I don't really like the idea of morphing a working implementation to look
> like a non-working one. I do agree that qemu-kvm should be cleaned up
> substantially before going upstream. Upstream qemu kvm should go away then.
> I don't see much work done to enhance it anyway.
>
this first phase has nothing to do with functionality. To begin with,
KVMState is qemu style, kvm_context_t is not, like it or not (I don't).

I don't plan to introduce regressions, you can rest assured. But we _do_
have to make things look much more qemuer, and that's what this patch
aims at.
Re: [PATCH 0/4] apic/ioapic kvm free implementation
On Thu, Jun 04, 2009 at 09:46:23PM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Jun 03, 2009 at 05:19:26PM -0400, Glauber Costa wrote:
> >> Same thing,
> >>
> >> addressing comments from gleb.
> >>
> > Jan, can you run your test on this one? It differs from the previous
> > one in halt handling.
>
> Still works for me.

Cool, thanks.

--
	Gleb.
Re: [PATCH] use KVMState, as upstream do
On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
> On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> > On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > > This is a pretty mechanical change. To make the code look
> > > closer to upstream qemu, I'm renaming kvm_context_t to
> > > KVMState. The mid-term goal here is to start sharing code
> > > wherever possible.
> > >
> > > Avi, please apply, or I'll send you a video of myself
> > > dancing naked.
> > >
> > You can start recording it since I doubt this patch will apply cleanly
> > to today's master (another mechanical change was applied). Regardless, I
> > think trying to use bits of qemu kvm is dangerous. It has similar functions
> > with the same names, but with different assumptions about the conditions
> > they can be executed in (look at commit a5ddb119). I actually prefer to be
> > different enough to not call an upstream qemu function by mistake.
>
> I did it against today's master. If new patches came in, it is just
> a matter of regenerating this, since it is, as I said, mechanical.
>
> Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
> are not included in the final object), there is no such risk.
> Of course, I am aiming towards it, but the first step will be to change
> the names of conflicting functions until we can pick qemu's implementation,
> in which case the former will just go away.

That is the point. We can't just pick qemu's implementation most of the
times.

> If we are serious about merging qemu-kvm into qemu, I don't see a way out
> of it. We should start changing things this way to accommodate it. Different
> enough won't do.

I don't really like the idea of morphing a working implementation to look
like a non-working one. I do agree that qemu-kvm should be cleaned up
substantially before going upstream. Upstream qemu kvm should go away then.
I don't see much work done to enhance it anyway.

--
	Gleb.
Re: [PATCH 0/4] apic/ioapic kvm free implementation
Gleb Natapov wrote:
> On Wed, Jun 03, 2009 at 05:19:26PM -0400, Glauber Costa wrote:
>> Same thing,
>>
>> addressing comments from gleb.
>>
> Jan, can you run your test on this one? It differs from the previous
> one in halt handling.

Still works for me.

Jan
Re: [PATCH] use KVMState, as upstream do
On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > This is a pretty mechanical change. To make the code look
> > closer to upstream qemu, I'm renaming kvm_context_t to
> > KVMState. The mid-term goal here is to start sharing code
> > wherever possible.
> >
> > Avi, please apply, or I'll send you a video of myself
> > dancing naked.
> >
> You can start recording it since I doubt this patch will apply cleanly
> to today's master (another mechanical change was applied). Regardless, I
> think trying to use bits of qemu kvm is dangerous. It has similar functions
> with the same names, but with different assumptions about the conditions
> they can be executed in (look at commit a5ddb119). I actually prefer to be
> different enough to not call an upstream qemu function by mistake.

I did it against today's master. If new patches came in, it is just
a matter of regenerating this, since it is, as I said, mechanical.

Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
are not included in the final object), there is no such risk.
Of course, I am aiming towards it, but the first step will be to change
the names of conflicting functions until we can pick qemu's implementation,
in which case the former will just go away.

If we are serious about merging qemu-kvm into qemu, I don't see a way out
of it. We should start changing things this way to accommodate it.
Different enough won't do.
Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure
Gregory Haskins wrote:
> Oh, I don't doubt that (in fact, I was pretty sure that was the case
> based on some of the optimizations I could see in studying the c_t_u()
> path). I just didn't realize there were other ways to do it if it's a
> non-"current" task. ;) I guess the enigma for me right now is what cost
> does switch_mm have? (That's not a slam against the suggested
> approach... I really do not know and am curious.)

switch_mm() is probably very cheap (it reloads cr3), but it does dirty the
current cpu's tlb. When the kernel needs to flush a process' tlb, it will
have to IPI that cpu in addition to all others. This takes place, for
example, after munmap() or after a page is swapped out (though significant
batching is done there). It's still plenty cheaper in my estimation.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [RFC PATCH v2 00/19] virtual-bus
Avi Kivity wrote:
> Gregory Haskins wrote:
>> Avi,
>>
>> Gregory Haskins wrote:
>>
>>> Todo:
>>> *) Develop some kind of hypercall registration mechanism for KVM so
>>>    that we can use that as an integration point instead of directly
>>>    hooking kvm hypercalls
>>
>> What would you like to see here? I now remember why I removed the
>> original patch I had for registration... it requires some kind of
>> discovery mechanism on its own. Note that this is hard, but I figured
>> it would make the overall series simpler if I didn't go this route and
>> instead just integrated with a statically allocated vector. That being
>> said, I have no problem adding this back in, but figure we should
>> discuss the approach so I don't go down a rat-hole ;)
>
> One idea is similar to signalfd() or eventfd(). Provide a kvm ioctl
> that takes a gsi and returns an fd. Writes to the fd change the state
> of the line, possibly triggering an interrupt. Another ioctl takes a
> hypercall number or pio port as well as an existing fd. Invocations
> of the hypercall or writes to the port write to the fd (using the same
> protocol as eventfd), so the other end can respond.
>
> The nice thing is that this can be used by both kernel and userspace
> components, and for kernel components, hypercalls can be either
> buffered or unbuffered.

And thus the "kvm-eventfd" (irqfd/iosignalfd) interface project was
born. ;)

(Michael, FYI: I will be pushing a vbus-v4 series at some point in the
near future that is expressed in terms of irqfd/iosignalfd, per the
conversation above. The patches in v3 and earlier are more intrusive to
the KVM core than they will be in final form.)

-Greg
Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure
Avi Kivity wrote:
> Gregory Haskins wrote:
>>> BTW, why did you decide to use get_user_pages?
>>> Would switch_mm + copy_to_user work as well,
>>> avoiding the page walk if all pages are present?
>>
>> Well, basic c_t_u() won't work because it's likely not "current" if you
>> are updating the ring from some other task, but I think you have
>> already figured that out based on the switch_mm suggestion. The simple
>> truth is I was not familiar with switch_mm at the time I wrote this
>> (nor am I now). If this is a superior method that allows you to
>> acquire c_t_u(some_other_ctx)-like behavior, I see no problem in
>> changing. I will look into this, and thanks for the suggestion!
>
> copy_to_user() is significantly faster than get_user_pages() + kmap() +
> memcpy() (or their variants).

Oh, I don't doubt that (in fact, I was pretty sure that was the case
based on some of the optimizations I could see in studying the c_t_u()
path). I just didn't realize there were other ways to do it if it's a
non-"current" task. ;) I guess the enigma for me right now is what cost
does switch_mm have? (That's not a slam against the suggested approach...
I really do not know and am curious.)

As an aside, note that we seem to be reviewing v2, where v3 is really the
last set I pushed. I think this patch is more or less the same across
both iterations, but FYI I would recommend looking at v3 instead.

-Greg
Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure
Gregory Haskins wrote:
>> BTW, why did you decide to use get_user_pages?
>> Would switch_mm + copy_to_user work as well,
>> avoiding the page walk if all pages are present?
>
> Well, basic c_t_u() won't work because it's likely not "current" if you
> are updating the ring from some other task, but I think you have already
> figured that out based on the switch_mm suggestion. The simple truth is
> I was not familiar with switch_mm at the time I wrote this (nor am I
> now). If this is a superior method that allows you to acquire
> c_t_u(some_other_ctx)-like behavior, I see no problem in changing. I
> will look into this, and thanks for the suggestion!

copy_to_user() is significantly faster than get_user_pages() + kmap() +
memcpy() (or their variants).

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure
Michael S. Tsirkin wrote:
> Also - if we just had a vmexit because a process executed io (or a
> hypercall), can't we just do copy_to_user there? Avi, I think at some
> point you said that we can?

You can do copy_to_user() wherever it is legal in Linux. Almost all of
kvm runs in process context, preemptible, and with interrupts enabled.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure
Michael S. Tsirkin wrote: > On Thu, Apr 09, 2009 at 12:30:57PM -0400, Gregory Haskins wrote: > >> +static unsigned long >> +task_memctx_copy_to(struct vbus_memctx *ctx, void *dst, const void *src, >> +unsigned long n) >> +{ >> +struct task_memctx *tm = to_task_memctx(ctx); >> +struct task_struct *p = tm->task; >> + >> +while (n) { >> +unsigned long offset = ((unsigned long)dst)%PAGE_SIZE; >> +unsigned long len = PAGE_SIZE - offset; >> +int ret; >> +struct page *pg; >> +void *maddr; >> + >> +if (len > n) >> +len = n; >> + >> +down_read(&p->mm->mmap_sem); >> +ret = get_user_pages(p, p->mm, >> + (unsigned long)dst, 1, 1, 0, &pg, NULL); >> + >> +if (ret != 1) { >> +up_read(&p->mm->mmap_sem); >> +break; >> +} >> + >> +maddr = kmap_atomic(pg, KM_USER0); >> +memcpy(maddr + offset, src, len); >> +kunmap_atomic(maddr, KM_USER0); >> +set_page_dirty_lock(pg); >> +put_page(pg); >> +up_read(&p->mm->mmap_sem); >> + >> +src += len; >> +dst += len; >> +n -= len; >> +} >> + >> +return n; >> +} >> > > BTW, why did you decide to use get_user_pages? > Would switch_mm + copy_to_user work as well > avoiding page walk if all pages are present? > Well, basic c_t_u() won't work because its likely not "current" if you are updating the ring from some other task, but I think you have already figured that out based on the switch_mm suggestion. The simple truth is I was not familiar with switch_mm at the time I wrote this (nor am I now). If this is a superior method that allows you to acquire c_t_u(some_other_ctx) like behavior, I see no problem in changing. I will look into this, and thanks for the suggestion! > Also - if we just had vmexit because a process executed > io (or hypercall), can't we just do copy_to_user there? > Avi, I think at some point you said that we can? > Right, and yes that will work I believe. We could always do a "if (p == current)" check to test for this. 
To date, I don't typically do anything mem-ops related directly in vcpu context, so this wasn't an issue...but that doesn't mean someone won't try in the future. Therefore, I agree we should strive to optimize it if we can.

Thanks Michael,
-Greg
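For illustration, the "p == current" fast path discussed in this exchange could look something like the following kernel-style sketch. This is not part of the posted series: task_memctx_copy_to_fast() is a hypothetical wrapper, and it assumes the existing task_memctx_copy_to() as the slow path.

```c
/*
 * Hypothetical sketch only.  If the caller is already running in the
 * target task (e.g. we are in the vcpu/ioctl thread that just vmexited),
 * plain copy_to_user() avoids the get_user_pages() walk entirely;
 * otherwise fall back to the page-pinning slow path.
 */
static unsigned long
task_memctx_copy_to_fast(struct vbus_memctx *ctx, void __user *dst,
			 const void *src, unsigned long n)
{
	struct task_memctx *tm = to_task_memctx(ctx);

	if (tm->task == current)
		return copy_to_user(dst, src, n);

	return task_memctx_copy_to(ctx, (void __force *)dst, src, n);
}
```

Both paths return the number of bytes left uncopied, matching the copy_to_user() convention, so callers would not need to change.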
[patch 4/4] KVM: switch irq injection/acking data structures to irq_lock
Protect irq injection/acking data structures with a separate irq_lock mutex. This fixes the following deadlock:

    CPU A                                   CPU B
    kvm_vm_ioctl_deassign_dev_irq()
      mutex_lock(&kvm->lock);               worker_thread()
      -> kvm_deassign_irq()                 -> kvm_assigned_dev_interrupt_work_handler()
        -> deassign_host_irq()                 mutex_lock(&kvm->lock);
          -> cancel_work_sync() [blocked]

Reported-by: Alex Williamson
Signed-off-by: Marcelo Tosatti

Index: kvm/arch/x86/kvm/i8254.c
===
--- kvm.orig/arch/x86/kvm/i8254.c
+++ kvm/arch/x86/kvm/i8254.c
@@ -651,10 +651,10 @@ static void __inject_pit_timer_intr(stru
 	struct kvm_vcpu *vcpu;
 	int i;
 
-	mutex_lock(&kvm->lock);
+	mutex_lock(&kvm->irq_lock);
 	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
 	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
-	mutex_unlock(&kvm->lock);
+	mutex_unlock(&kvm->irq_lock);
 
 	/*
 	 * Provides NMI watchdog support via Virtual Wire mode.

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -2099,10 +2099,10 @@ long kvm_arch_vm_ioctl(struct file *filp
 			goto out;
 		if (irqchip_in_kernel(kvm)) {
 			__s32 status;
-			mutex_lock(&kvm->lock);
+			mutex_lock(&kvm->irq_lock);
 			status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
 					irq_event.irq, irq_event.level);
-			mutex_unlock(&kvm->lock);
+			mutex_unlock(&kvm->irq_lock);
 			if (ioctl == KVM_IRQ_LINE_STATUS) {
 				irq_event.status = status;
 				if (copy_to_user(argp, &irq_event,
@@ -2348,12 +2348,11 @@ mmio:
 	 */
 	mutex_lock(&vcpu->kvm->lock);
 	mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
+	mutex_unlock(&vcpu->kvm->lock);
 	if (mmio_dev) {
 		kvm_iodevice_read(mmio_dev, gpa, bytes, val);
-		mutex_unlock(&vcpu->kvm->lock);
 		return X86EMUL_CONTINUE;
 	}
-	mutex_unlock(&vcpu->kvm->lock);
 
 	vcpu->mmio_needed = 1;
 	vcpu->mmio_phys_addr = gpa;
@@ -2403,12 +2402,11 @@ mmio:
 	 */
 	mutex_lock(&vcpu->kvm->lock);
 	mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 1);
+	mutex_unlock(&vcpu->kvm->lock);
 	if (mmio_dev) {
 		kvm_iodevice_write(mmio_dev, gpa, bytes, val);
-		mutex_unlock(&vcpu->kvm->lock);
 		return
		       X86EMUL_CONTINUE;
 	}
-	mutex_unlock(&vcpu->kvm->lock);
 
 	vcpu->mmio_needed = 1;
 	vcpu->mmio_phys_addr = gpa;
@@ -2731,7 +2729,6 @@ static void kernel_pio(struct kvm_io_dev
 {
 	/* TODO: String I/O for in kernel device */
 
-	mutex_lock(&vcpu->kvm->lock);
 	if (vcpu->arch.pio.in)
 		kvm_iodevice_read(pio_dev, vcpu->arch.pio.port,
 				  vcpu->arch.pio.size,
@@ -2740,7 +2737,6 @@ static void kernel_pio(struct kvm_io_dev
 		kvm_iodevice_write(pio_dev, vcpu->arch.pio.port,
 				   vcpu->arch.pio.size,
 				   pd);
-	mutex_unlock(&vcpu->kvm->lock);
 }
 
 static void pio_string_write(struct kvm_io_device *pio_dev,
@@ -2750,14 +2746,12 @@ static void pio_string_write(struct kvm_
 	void *pd = vcpu->arch.pio_data;
 	int i;
 
-	mutex_lock(&vcpu->kvm->lock);
 	for (i = 0; i < io->cur_count; i++) {
 		kvm_iodevice_write(pio_dev, io->port,
 				   io->size,
 				   pd);
 		pd += io->size;
 	}
-	mutex_unlock(&vcpu->kvm->lock);
 }
 
 static struct kvm_io_device *vcpu_find_pio_dev(struct kvm_vcpu *vcpu,
@@ -2794,7 +2788,9 @@ int kvm_emulate_pio(struct kvm_vcp
 	val = kvm_register_read(vcpu, VCPU_REGS_RAX);
 	memcpy(vcpu->arch.pio_data, &val, 4);
 
+	mutex_lock(&vcpu->kvm->lock);
 	pio_dev = vcpu_find_pio_dev(vcpu, port, size, !in);
+	mutex_unlock(&vcpu->kvm->lock);
 	if (pio_dev) {
 		kernel_pio(pio_dev, vcpu, vcpu->arch.pio_data);
 		complete_pio(vcpu);
@@ -2858,9 +2854,12 @@ int kvm_emulate_pio_string(struct kvm_vc
 	vcpu->arch.pio.guest_gva = address;
 
+	mutex_lock(&vcpu->kvm->lock);
 	pio_dev = vcpu_find_pio_dev(vcpu, port,
 				    vcpu->arch.pio.cur_count,
 				    !vcpu->arch.pio.in);
+	mutex_unlock(&vcpu->kvm->lock);
+
 	if (!vcpu->arch.pio.in) {
 		/* string PIO write */
 		ret = pio_copy_data(vcpu);
Index: kvm/virt/k
Re: [patch] VMX Unrestricted mode support
Nitin A Kamble wrote:
> Hi Avi,
> I find that the qemu processor reset state is not per the IA32
> processor specifications. (Section 8.1.1 of
> http://www.intel.com/Assets/PDF/manual/253668.pdf)
>
> In qemu-kvm.git, in file target-i386/helper.c, in function cpu_reset, the
> segment registers are initialized as follows:
>
>     cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | DESC_R_MASK);
>     cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
>     cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
>     cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
>     cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
>     cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
>
> The IA32 cpu reset state specification says that the Segment Accessed
> bit is also 1 at the time of cpu reset, so the above code should look
> like this:
>
>     cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | DESC_R_MASK | DESC_A_MASK);
>     cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK);
>     cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK);
>     cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK);
>     cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
>     cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
>                            DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
>
> This discrepancy is adding the need of the following function in the
> unrestricted guest patch.

As Avi already indicated: independent of the kvm workaround for older qemu versions, please post (to qemu-devel) a patch against upstream's git to fix the discrepancy.
Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
[patch 0/4] move irq protection role to separate lock v4
This is to fix a deadlock reported by Alex Williamson, while at the same time making it easier to allow PIO/MMIO regions to be registered and unregistered while a guest is alive.
[patch 3/4] KVM: introduce irq_lock, use it to protect ioapic
Introduce irq_lock, and use it to protect ioapic data structures.

Signed-off-by: Marcelo Tosatti

Index: kvm/include/linux/kvm_host.h
===
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -123,7 +123,6 @@ struct kvm_kernel_irq_routing_entry {
 };
 
 struct kvm {
-	struct mutex lock; /* protects the vcpus array and APIC accesses */
 	spinlock_t mmu_lock;
 	struct rw_semaphore slots_lock;
 	struct mm_struct *mm; /* userspace tied to this vm */
@@ -132,6 +131,7 @@ struct kvm {
 					KVM_PRIVATE_MEM_SLOTS];
 	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
 	struct list_head vm_list;
+	struct mutex lock;
 	struct kvm_io_bus mmio_bus;
 	struct kvm_io_bus pio_bus;
 #ifdef CONFIG_HAVE_KVM_EVENTFD
@@ -145,6 +145,7 @@ struct kvm {
 	struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
 #endif
+	struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
 	struct list_head irq_routing; /* of kvm_kernel_irq_routing_entry */
 	struct hlist_head mask_notifier_list;

Index: kvm/virt/kvm/ioapic.c
===
--- kvm.orig/virt/kvm/ioapic.c
+++ kvm/virt/kvm/ioapic.c
@@ -243,6 +243,7 @@ static void ioapic_mmio_read(struct kvm_
 	ioapic_debug("addr %lx\n", (unsigned long)addr);
 	ASSERT(!(addr & 0xf));	/* check alignment */
 
+	mutex_lock(&ioapic->kvm->irq_lock);
 	addr &= 0xff;
 	switch (addr) {
 	case IOAPIC_REG_SELECT:
@@ -269,6 +270,7 @@ static void ioapic_mmio_read(struct kvm_
 	default:
 		printk(KERN_WARNING "ioapic: wrong length %d\n", len);
 	}
+	mutex_unlock(&ioapic->kvm->irq_lock);
 }
 
 static void ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
@@ -280,6 +282,8 @@ static void ioapic_mmio_write(struct kvm
 	ioapic_debug("ioapic_mmio_write addr=%p len=%d val=%p\n",
 		     (void*)addr, len, val);
 	ASSERT(!(addr & 0xf));	/* check alignment */
+
+	mutex_lock(&ioapic->kvm->irq_lock);
 	if (len == 4 || len == 8)
 		data = *(u32 *) val;
 	else {
@@ -305,6 +309,7 @@ static void ioapic_mmio_write(struct kvm
 	default:
 		break;
 	}
+	mutex_unlock(&ioapic->kvm->irq_lock);
 }
 
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic)

Index:
kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -979,6 +979,7 @@ static struct kvm *kvm_create_vm(void)
 	kvm_io_bus_init(&kvm->pio_bus);
 	kvm_irqfd_init(kvm);
 	mutex_init(&kvm->lock);
+	mutex_init(&kvm->irq_lock);
 	kvm_io_bus_init(&kvm->mmio_bus);
 	init_rwsem(&kvm->slots_lock);
 	atomic_set(&kvm->users_count, 1);
[patch 1/4] KVM: x86: grab pic lock in kvm_pic_clear_isr_ack
isr_ack is protected by kvm_pic->lock.

Signed-off-by: Marcelo Tosatti

Index: kvm/arch/x86/kvm/i8259.c
===
--- kvm.orig/arch/x86/kvm/i8259.c
+++ kvm/arch/x86/kvm/i8259.c
@@ -72,8 +72,10 @@ static void pic_clear_isr(struct kvm_kpi
 void kvm_pic_clear_isr_ack(struct kvm *kvm)
 {
 	struct kvm_pic *s = pic_irqchip(kvm);
+	pic_lock(s);
 	s->pics[0].isr_ack = 0xff;
 	s->pics[1].isr_ack = 0xff;
+	pic_unlock(s);
 }
 
 /*
[patch 2/4] KVM: move coalesced_mmio locking to its own device
Move coalesced_mmio locking to its own device, instead of relying on kvm->lock.

Signed-off-by: Marcelo Tosatti

Index: kvm/virt/kvm/coalesced_mmio.c
===
--- kvm.orig/virt/kvm/coalesced_mmio.c
+++ kvm/virt/kvm/coalesced_mmio.c
@@ -31,10 +31,6 @@ static int coalesced_mmio_in_range(struc
 	if (!is_write)
 		return 0;
 
-	/* kvm->lock is taken by the caller and must be not released before
-	 * dev.read/write
-	 */
-
 	/* Are we able to batch it ? */
 
 	/* last is the first free entry
@@ -70,7 +66,7 @@ static void coalesced_mmio_write(struct
 	struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
 	struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
-	/* kvm->lock must be taken by caller before call to in_range()*/
+	spin_lock(&dev->lock);
 
 	/* copy data in first free entry of the ring */
@@ -79,6 +75,7 @@
 	memcpy(ring->coalesced_mmio[ring->last].data, val, len);
 	smp_wmb();
 	ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
+	spin_unlock(&dev->lock);
 }
 
 static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -101,6 +98,7 @@ int kvm_coalesced_mmio_init(struct kvm *
 	dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
 	if (!dev)
 		return -ENOMEM;
+	spin_lock_init(&dev->lock);
 	kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
 	dev->kvm = kvm;
 	kvm->coalesced_mmio_dev = dev;

Index: kvm/virt/kvm/coalesced_mmio.h
===
--- kvm.orig/virt/kvm/coalesced_mmio.h
+++ kvm/virt/kvm/coalesced_mmio.h
@@ -12,6 +12,7 @@
 struct kvm_coalesced_mmio_dev {
 	struct kvm_io_device dev;
 	struct kvm *kvm;
+	spinlock_t lock;
 	int nb_zones;
 	struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
 };
Re: TODO list for qemu+KVM networking performance v2
Michael S. Tsirkin wrote: > On Thu, Jun 04, 2009 at 01:16:05PM -0400, Gregory Haskins wrote: > >> Michael S. Tsirkin wrote: >> >>> As I'm new to qemu/kvm, to figure out how networking performance can be >>> improved, I >>> went over the code and took some notes. As I did this, I tried to record >>> ideas >>> from recent discussions and ideas that came up on improving performance. >>> Thus >>> this list. >>> >>> This includes a partial overview of networking code in a virtual >>> environment, with >>> focus on performance: I'm only interested in sending and receiving packets, >>> ignoring configuration etc. >>> >>> I have likely missed a ton of clever ideas and older discussions, and >>> probably >>> misunderstood some code. Please pipe up with corrections, additions, etc. >>> And >>> please don't take offence if I didn't attribute the idea correctly - most of >>> them are marked mst by I don't claim they are original. Just let me know. >>> >>> And there are a couple of trivial questions on the code - I'll >>> add answers here as they become available. >>> >>> I out up a copy at http://www.linux-kvm.org/page/Networking_Performance as >>> well, and intend to dump updates there from time to time. >>> >>> >> Hi Michael, >> Not sure if you have seen this, but I've already started to work on >> the code for in-kernel devices and have a (currently non-virtio based) >> proof-of-concept network device which you can for comparative data. 
You >> can find details here: >> >> http://lkml.org/lkml/2009/4/21/408 >> >> >> > > Thanks > > >> (Will look at your list later, to see if I can add anything) >> >>> --- >>> >>> Short term plans: I plan to start out with trying out the following ideas: >>> >>> save a copy in qemu on RX side in case of a single nic in vlan >>> implement virtio-host kernel module >>> >>> *detail on virtio-host-net kernel module project* >>> >>> virtio-host-net is a simple character device which gets memory layout >>> information >>> from qemu, and uses this to convert between virtio descriptors to skbs. >>> The skbs are then passed to/from raw socket (or we could bind virtio-host >>> to physical device like raw socket does TBD). >>> >>> Interrupts will be reported to eventfd descriptors, and device will poll >>> eventfd descriptors to get kicks from guest. >>> >>> >>> >> I currently have a virtio transport for vbus implemented, but it still >> needs a virtio-net device-model backend written. >> > > You mean virtio-ring implementation? > Right. > I intended to basically start by reusing the code from > Documentation/lguest/lguest.c > Isn't this all there is to it? > Not sure. I reused the ring code already in the kernel. > >> If you are interested, >> we can work on this together to implement your idea. Its on my "todo" >> list for vbus anyway, but I am currently distracted with the >> irqfd/iosignalfd projects which are prereqs for vbus to be considered >> for merge. >> >> Basically vbus is a framework for declaring in-kernel devices (not kvm >> specific, per se) with a full security/containment model, a >> hot-pluggable configuration engine, and a dynamically loadable >> device-model. The framework takes care of the details of signal-path >> and memory routing for you so that something like a virtio-net model can >> be implemented once and work in a variety of environments such as kvm, >> lguest, etc. >> >> Interested? 
>> -Greg
>
> It seems that a character device with a couple of ioctls would be simpler
> for an initial prototype.

Suit yourself, but I suspect that by the time you build the prototype you will either end up re-solving all the same problems anyway, or have diminished functionality (or both). It's actually very simple to declare a new virtio-vbus device, but the choice is yours. I can crank out a skeleton for you, if you like.

-Greg
Re: TODO list for qemu+KVM networking performance v2
On Thu, Jun 04, 2009 at 01:50:20PM -0400, Gregory Haskins wrote:
> Suit yourself, but I suspect that by the time you build the prototype
> you will either end up re-solving all the same problems anyway, or have
> diminished functionality (or both).

/me goes to look at vbus patches.

-- 
MST
Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure
On Thu, Apr 09, 2009 at 12:30:57PM -0400, Gregory Haskins wrote:
> +static unsigned long
> +task_memctx_copy_to(struct vbus_memctx *ctx, void *dst, const void *src,
> +                    unsigned long n)
> +{
> +	struct task_memctx *tm = to_task_memctx(ctx);
> +	struct task_struct *p = tm->task;
> +
> +	while (n) {
> +		unsigned long offset = ((unsigned long)dst)%PAGE_SIZE;
> +		unsigned long len = PAGE_SIZE - offset;
> +		int ret;
> +		struct page *pg;
> +		void *maddr;
> +
> +		if (len > n)
> +			len = n;
> +
> +		down_read(&p->mm->mmap_sem);
> +		ret = get_user_pages(p, p->mm,
> +				     (unsigned long)dst, 1, 1, 0, &pg, NULL);
> +
> +		if (ret != 1) {
> +			up_read(&p->mm->mmap_sem);
> +			break;
> +		}
> +
> +		maddr = kmap_atomic(pg, KM_USER0);
> +		memcpy(maddr + offset, src, len);
> +		kunmap_atomic(maddr, KM_USER0);
> +		set_page_dirty_lock(pg);
> +		put_page(pg);
> +		up_read(&p->mm->mmap_sem);
> +
> +		src += len;
> +		dst += len;
> +		n -= len;
> +	}
> +
> +	return n;
> +}

BTW, why did you decide to use get_user_pages? Would switch_mm + copy_to_user work as well, avoiding page walk if all pages are present?

Also - if we just had vmexit because a process executed io (or hypercall), can't we just do copy_to_user there? Avi, I think at some point you said that we can?

-- 
MST
Re: TODO list for qemu+KVM networking performance v2
On Thu, Jun 04, 2009 at 01:16:05PM -0400, Gregory Haskins wrote: > Michael S. Tsirkin wrote: > > As I'm new to qemu/kvm, to figure out how networking performance can be > > improved, I > > went over the code and took some notes. As I did this, I tried to record > > ideas > > from recent discussions and ideas that came up on improving performance. > > Thus > > this list. > > > > This includes a partial overview of networking code in a virtual > > environment, with > > focus on performance: I'm only interested in sending and receiving packets, > > ignoring configuration etc. > > > > I have likely missed a ton of clever ideas and older discussions, and > > probably > > misunderstood some code. Please pipe up with corrections, additions, etc. > > And > > please don't take offence if I didn't attribute the idea correctly - most of > > them are marked mst by I don't claim they are original. Just let me know. > > > > And there are a couple of trivial questions on the code - I'll > > add answers here as they become available. > > > > I out up a copy at http://www.linux-kvm.org/page/Networking_Performance as > > well, and intend to dump updates there from time to time. > > > > Hi Michael, > Not sure if you have seen this, but I've already started to work on > the code for in-kernel devices and have a (currently non-virtio based) > proof-of-concept network device which you can for comparative data. You > can find details here: > > http://lkml.org/lkml/2009/4/21/408 > > Thanks > (Will look at your list later, to see if I can add anything) > > --- > > > > Short term plans: I plan to start out with trying out the following ideas: > > > > save a copy in qemu on RX side in case of a single nic in vlan > > implement virtio-host kernel module > > > > *detail on virtio-host-net kernel module project* > > > > virtio-host-net is a simple character device which gets memory layout > > information > > from qemu, and uses this to convert between virtio descriptors to skbs. 
> > The skbs are then passed to/from raw socket (or we could bind virtio-host > > to physical device like raw socket does TBD). > > > > Interrupts will be reported to eventfd descriptors, and device will poll > > eventfd descriptors to get kicks from guest. > > > > > > I currently have a virtio transport for vbus implemented, but it still > needs a virtio-net device-model backend written. You mean virtio-ring implementation? I intended to basically start by reusing the code from Documentation/lguest/lguest.c Isn't this all there is to it? > If you are interested, > we can work on this together to implement your idea. Its on my "todo" > list for vbus anyway, but I am currently distracted with the > irqfd/iosignalfd projects which are prereqs for vbus to be considered > for merge. > > Basically vbus is a framework for declaring in-kernel devices (not kvm > specific, per se) with a full security/containment model, a > hot-pluggable configuration engine, and a dynamically loadable > device-model. The framework takes care of the details of signal-path > and memory routing for you so that something like a virtio-net model can > be implemented once and work in a variety of environments such as kvm, > lguest, etc. > > Interested? > -Greg > It seems that a character device with a couple of ioctls would be simpler for an initial prototype. -- MST -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: TODO list for qemu+KVM networking performance v2
Michael S. Tsirkin wrote: > As I'm new to qemu/kvm, to figure out how networking performance can be > improved, I > went over the code and took some notes. As I did this, I tried to record > ideas > from recent discussions and ideas that came up on improving performance. Thus > this list. > > This includes a partial overview of networking code in a virtual environment, > with > focus on performance: I'm only interested in sending and receiving packets, > ignoring configuration etc. > > I have likely missed a ton of clever ideas and older discussions, and probably > misunderstood some code. Please pipe up with corrections, additions, etc. And > please don't take offence if I didn't attribute the idea correctly - most of > them are marked mst by I don't claim they are original. Just let me know. > > And there are a couple of trivial questions on the code - I'll > add answers here as they become available. > > I out up a copy at http://www.linux-kvm.org/page/Networking_Performance as > well, and intend to dump updates there from time to time. > Hi Michael, Not sure if you have seen this, but I've already started to work on the code for in-kernel devices and have a (currently non-virtio based) proof-of-concept network device which you can for comparative data. You can find details here: http://lkml.org/lkml/2009/4/21/408 (Will look at your list later, to see if I can add anything) > --- > > Short term plans: I plan to start out with trying out the following ideas: > > save a copy in qemu on RX side in case of a single nic in vlan > implement virtio-host kernel module > > *detail on virtio-host-net kernel module project* > > virtio-host-net is a simple character device which gets memory layout > information > from qemu, and uses this to convert between virtio descriptors to skbs. > The skbs are then passed to/from raw socket (or we could bind virtio-host > to physical device like raw socket does TBD). 
> Interrupts will be reported to eventfd descriptors, and device will poll
> eventfd descriptors to get kicks from guest.

I currently have a virtio transport for vbus implemented, but it still needs a virtio-net device-model backend written. If you are interested, we can work on this together to implement your idea. It's on my "todo" list for vbus anyway, but I am currently distracted with the irqfd/iosignalfd projects which are prereqs for vbus to be considered for merge.

Basically vbus is a framework for declaring in-kernel devices (not kvm specific, per se) with a full security/containment model, a hot-pluggable configuration engine, and a dynamically loadable device-model. The framework takes care of the details of signal-path and memory routing for you so that something like a virtio-net model can be implemented once and work in a variety of environments such as kvm, lguest, etc.

Interested?
-Greg
TODO list for qemu+KVM networking performance v2
As I'm new to qemu/kvm, to figure out how networking performance can be improved, I went over the code and took some notes. As I did this, I tried to record ideas from recent discussions and ideas that came up on improving performance. Thus this list.

This includes a partial overview of networking code in a virtual environment, with focus on performance: I'm only interested in sending and receiving packets, ignoring configuration etc.

I have likely missed a ton of clever ideas and older discussions, and probably misunderstood some code. Please pipe up with corrections, additions, etc. And please don't take offence if I didn't attribute the idea correctly - most of them are marked mst, but I don't claim they are original. Just let me know.

And there are a couple of trivial questions on the code - I'll add answers here as they become available.

I put up a copy at http://www.linux-kvm.org/page/Networking_Performance as well, and intend to dump updates there from time to time.

Thanks, MST

---

There are many ways to set up networking in a virtual machine. Here's one: linux guest -> virtio-net -> virtio-pci -> qemu+kvm -> tap -> bridge. Let's take a look at this one.

Virtio is the guest side of things. Guest kernel virtio-net:

TX:
- Guest kernel allocates a packet (skb) in guest kernel memory and fills it in with data, passes it to networking stack.
- The skb is passed on to guest network driver (hard_start_xmit)
- skbs in flight are kept in send queue linked list, so that we can flush them when device is removed
  [ mst: optimization idea: virtqueue already tracks posted buffers. Add flush/purge operation and use that instead? ]
- skb is reformatted to scattergather format
  [ mst: idea to try: this does a copy for skb head, which might be costly especially for small/linear packets. Try to avoid this? Might need to tweak virtio interface.
  ]
- network driver adds the packet buffer on TX ring
- network driver does a kick which causes a VM exit
  [ mst: any way to mitigate # of VM exits here? Possibly could be done on host side as well. ]
  [ markmc: All of our efforts there have been on the host side, I think that's preferable than trying to do anything on the guest side. ]
- Full queue: we keep a single extra skb around:
  if we fail to transmit, we queue it
  [ mst: idea to try: what does it do to performance if we queue more packets? ]
  if we already have 1 outstanding packet, we stop the queue and discard the new packet
  [ mst: optimization idea: might be better to discard the old packet and queue the new one, e.g. with TCP old one might have timed out already ]
  [ markmc: the queue might soon be going away: 200905292346.04815.ru...@rustcorp.com.au http://archive.netbsd.se/?ml=linux-netdev&a=2009-05&m=10788575 ]
- We get each buffer from host as it is completed and free it
- TX interrupts are only enabled when queue is stopped, and when it is originally created (we disable them on completion)
  [ mst: idea: second part is probably unintentional. todo: we probably should disable interrupts when device is created. ]
- We poll for buffer completions:
  1. Before each TX
  2. On a timer tasklet (unless 3 is supported)
  3. When host sends us interrupt telling us that the queue is empty
     [ mst: idea to try: instead of empty, enable send interrupts on xmit when buffer is almost full (e.g. at least half empty): we are running out of buffers, it's important to free them ASAP. Can be done from host or from guest. ]
     [ Rusty proposing that we don't need (2) or (3) if the skbs are orphaned before start_xmit(). See subj "net: skb_orphan on dev_hard_start_xmit". ]
     [ rusty also seems to be suggesting that disabling VIRTIO_F_NOTIFY_ON_EMPTY on the host should help the case where the host out-paces the guest ]
  4.
  when queue is stopped or when first packet was sent after device was created (interrupts are enabled then)

RX:
- There are really 2 mostly separate code paths: with mergeable rx buffers support in host and without. I focus on mergeable buffers here since this is the default in recent qemu.
  [ mst: optimization idea: mark mergeable_rx_bufs as likely() then? ]
- Each skb has a 128 byte buffer at head and a single page for data. Only full pages are passed to virtio buffers.
  [ mst: for large packets, managing the 128 head buffers is w
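The TX full-queue policy described above (keep exactly one extra skb aside, stop the queue, drop anything further) can be sketched roughly as follows. This is a simplified, from-memory sketch of a 2.6.30-era virtio_net hard_start_xmit, not verbatim driver code; helper names like xmit_skb() and fields like vi->last_xmit_skb approximate the driver of that period.

```c
/* Simplified sketch (not verbatim) of the TX policy described above. */
static int start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);

	free_old_xmit_skbs(vi);		/* poll completions before each TX */

	if (xmit_skb(vi, skb) == 0) {
		vi->svq->vq_ops->kick(vi->svq);	/* the kick -> VM exit */
		return NETDEV_TX_OK;
	}

	/* Ring full: stash this one skb if none is stashed yet... */
	if (!vi->last_xmit_skb) {
		vi->last_xmit_skb = skb;
		netif_stop_queue(dev);
		return NETDEV_TX_OK;
	}

	/* ...otherwise drop the new packet -- the behavior questioned in
	 * the mst note above (arguably the *old* packet should go). */
	dev_kfree_skb_any(skb);
	dev->stats.tx_dropped++;
	return NETDEV_TX_OK;
}
```

The stashed skb is retried the next time the queue is woken, which is why only TX-stopped state needs interrupts enabled.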
Re: NV-CUDA: a new way in virtualization is possible?
> It would be possible to use this technology in the KVM/Qemu project to
> achieve better performance?
> It could be a significative step for the develop in virtualization
> technology?

Nothing is "impossible", but it is at least not obvious how to pull off such a trick. Qemu/KVM is not "embarrassingly parallelizable", at least not in a straightforward way imho.

> Someone, in experimental way, has (re)wrote the md-raid kernel modules using
> the CUDA framework to accelerate some features... and it seems that works
> fine.
> Why not for KVM/Qemu or related projects, including kernel/user-space
> extension?

RAID is "easy", as is FFT, graphics operations, cryptography etc. People have been parallelizing these algorithms for several years before even nvidia existed, and CUDA is just a new backend to apply more or less the same techniques. KVM/Qemu on the other hand is not 100% CPU bound and is also not trivial to massively parallelize, so you might find the task a bit hard.

HTH,
Pantelis
NV-CUDA: a new way in virtualization is possible?
Hello all! I have been a KVM/Qemu user for a long time and I'm very satisfied by its flexibility, power and portability - really a good project!

Recently, reading some technical articles on the internet, I discovered the great potential of the NV-CUDA framework for scientific and graphic computing that takes strong advantage of the most recent GPUs. Some people have used it for password recovery, realtime rendering, etc, with great results.

Would it be possible to use this technology in the KVM/Qemu project to achieve better performance? Could it be a significant step for development in virtualization technology?

Someone, experimentally, has (re)written the md-raid kernel modules using the CUDA framework to accelerate some features... and it seems to work fine. Why not for KVM/Qemu or related projects, including kernel/user-space extensions?

What do you think about this draft idea? Any feedback is welcome...
[ kvm-Bugs-2801212 ] sles10sp2 guest timer run too fast
Bugs item #2801212, was opened at 2009-06-04 08:17
Message generated for change (Tracker Item Submitted) made by jiajun
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: sles10sp2 guest timer run too fast

Initial Comment:
With kvm.git Commit:7ff90748cebbfbafc8cfa6bdd633113cd9537789 and qemu-kvm Commit:a1cd3c985c848dae73966f9601f15fbcade72f1, we found that a sles10sp2 guest runs much faster than real time, gaining about 27s for every 60s of real time.

Reproduce steps:
(1) qemu-system-x86_64 -m 1024 -net nic,macaddr=00:16:3e:6f:f3:d1,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/sles10sp2.img
(2) Run ntpdate in guest: ntpdate sync_machine_ip && sleep 60 && ntpdate sync_machine_ip

Current result:
sles10sp2rc1-guest:~ # ntpdate sync_machine_ip && sleep 60 && ntpdate sync_machine_ip
31 May 23:16:59 ntpdate[3303]: step time server 192.168.198.248 offset -61.27418
31 May 23:17:32 ntpdate[3305]: step time server 192.168.198.248 offset -27.626469 sec

--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599
Re: [PATCH 0/4] apic/ioapic kvm free implementation
On Wed, Jun 03, 2009 at 05:19:26PM -0400, Glauber Costa wrote: > Same thing, > > addressing comments from gleb. > > Jan, can you run your test on this one? It differs from the previous one in halt handling. -- Gleb.
Re: [KVM PATCH v2 0/2] irqfd: use POLLHUP notification for close()
Gregory Haskins wrote: (Applies to kvm.git/master:25deed73) Please see the header for 2/2 for a description. This patch series has been fully tested and appears to be working correctly. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: [PATCH] [2/2] KVM: Add VT-x machine check support
On Thu, Jun 04, 2009 at 04:49:50PM +0300, Avi Kivity wrote: > Andi Kleen wrote: > >>There's no good place as it breaks the nice exit handler table. You > >>could put it in vmx_complete_interrupts() next to NMI handling. > >> > > > >I think I came up with an easy, cheesy, but not too bad solution now that > >should work. It simply remembers the CPU in the vcpu structure and > >schedules back to it. That's fine for this purpose. > > > > We might not be able to schedule back in a timely manner. Why not hack > vmx_complete_interrupts()? You're still in the critical section so > you're guaranteed no delays or surprises. Yes, have to do that. My original scheme was too risky because the machine checks have synchronization mechanisms now and preemption has no time limit. I'll hack on it later today, hopefully have a patch tomorrow. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH] [2/2] KVM: Add VT-x machine check support
Andi Kleen wrote: There's no good place as it breaks the nice exit handler table. You could put it in vmx_complete_interrupts() next to NMI handling. I think I came up with an easy, cheesy, but not too bad solution now that should work. It simply remembers the CPU in the vcpu structure and schedules back to it. That's fine for this purpose. We might not be able to schedule back in a timely manner. Why not hack vmx_complete_interrupts()? You're still in the critical section so you're guaranteed no delays or surprises. -- error compiling committee.c: too many arguments to function
Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support
Hi Greg, On Wed, 2009-06-03 at 18:04 -0400, Gregory Haskins wrote: > Hi Mark, > So with the v5 release of iosignalfd, we now have the notion of a > "trigger", the API of which is as follows: > > --- > /*! > * \brief Assign an eventfd to an IO port (PIO or MMIO) > * > * Assigns an eventfd based file-descriptor to a specific PIO or MMIO > * address range. Any guest writes to the specified range will generate > * an eventfd signal. > * > * A data-match pointer can be optionally provided in "trigger" and only > * writes which match this value exactly will generate an event. The length > * of the trigger is established by the length of the overall IO range, and > * therefore must be in a natural byte-width for the IO routines of your > * particular architecture (e.g. 1, 2, 4, or 8 bytes on x86_64). This looks like it'll work fine for virtio-pci. Thanks, Mark.
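The trigger semantics quoted above can be modelled in a few lines of user space (a sketch only — the class and method names here are illustrative, not the kernel API): an eventfd is bound to an IO range, and an optional data-match value, whose width equals the range length, filters which guest writes generate a signal.

```python
# Toy model of the iosignalfd "trigger" described above (illustrative names,
# not the kernel interface): an eventfd is assigned to a PIO/MMIO range,
# optionally with a data-match value whose width equals the range size.

class IoSignalFd:
    def __init__(self, base, length, trigger=None):
        assert length in (1, 2, 4, 8)      # natural byte widths on x86_64
        self.base, self.length, self.trigger = base, length, trigger
        self.signals = 0                   # stand-in for eventfd_signal()

    def guest_write(self, addr, value, size):
        if addr != self.base or size != self.length:
            return False                   # not our range
        if self.trigger is not None and value != self.trigger:
            return False                   # data-match failed, no event
        self.signals += 1
        return True

fd = IoSignalFd(base=0x1000, length=4, trigger=0xdeadbeef)
fd.guest_write(0x1000, 0xdeadbeef, 4)   # matches -> signalled
fd.guest_write(0x1000, 0x0, 4)          # value mismatch -> ignored
print(fd.signals)                        # 1
```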
Re: [PATCH] [2/2] KVM: Add VT-x machine check support
On Thu, Jun 04, 2009 at 04:10:14PM +0300, Avi Kivity wrote: > Andi Kleen wrote: > >>vmcs access work because we have a preempt notifier called when we are > >>scheduled in, and will execute vmclear/vmptrld as necessary. Look at > >>kvm_preempt_ops in virt/kvm_main.c. > >> > > > >I see. So we need to move that check earlier. Do you have a preference > >where it should be? > > > > There's no good place as it breaks the nice exit handler table. You > could put it in vmx_complete_interrupts() next to NMI handling. I think I came up with an easy, cheesy, but not too bad solution now that should work. It simply remembers the CPU in the vcpu structure and schedules back to it. That's fine for this purpose. Currently testing the patch. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH] [2/2] KVM: Add VT-x machine check support
Andi Kleen wrote: vmcs access work because we have a preempt notifier called when we are scheduled in, and will execute vmclear/vmptrld as necessary. Look at kvm_preempt_ops in virt/kvm_main.c. I see. So we need to move that check earlier. Do you have a preference where it should be? There's no good place as it breaks the nice exit handler table. You could put it in vmx_complete_interrupts() next to NMI handling. -- error compiling committee.c: too many arguments to function
Re: [PATCH] qemu-kvm: Flush icache after dma operations for ia64
Zhang, Xiantao wrote: Hi, Jes Have you verified whether it works for you? You may run a kernel build in the guest with 4 vcpus; if it can be done successfully without any error, it should be okay I think, otherwise we may need to investigate it further. :) Xiantao Hi Xiantao, I was able to run a 16 vCPU guest and build the kernel using make -j 16. How quickly would the problem show up for you, on every run, or should I run more tests? Cheers, Jes
Re: [PATCH] revert part of 3db8b916e merge
Gleb Natapov wrote: kvm_*_mpstate() cannot be called from kvm_arch_*_registers() since kvm_arch_*_registers() is sometimes called from the io thread, but kvm_*_mpstate() can be called only by the cpu thread. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: [PATCH RFC] Do not use cpu_index in interface between libkvm and qemu
Gleb Natapov wrote: On vcpu creation a cookie is returned which is used in future communication. Applied, thanks. -- error compiling committee.c: too many arguments to function
Re: [PATCH] [2/2] KVM: Add VT-x machine check support
On Thu, Jun 04, 2009 at 03:49:03PM +0300, Avi Kivity wrote: > Andi Kleen wrote: > >>This assumption is incorrect. This code is executed after preemption > >>has been enabled, and we may have even slept before reaching it. > >> > > > >The only thing that counts here is the context before the machine > >check event. If there was a vmexit we know it was in guest context. > > > >The only requirement we have is that we're running still on the same > >CPU. I assume that's true, otherwise the vmcb accesses wouldn't work? > > > > It's not true, we're in preemptible context and may have even slept. > > vmcs access work because we have a preempt notifier called when we are > scheduled in, and will execute vmclear/vmptrld as necessary. Look at > kvm_preempt_ops in virt/kvm_main.c. I see. So we need to move that check earlier. Do you have a preference where it should be? -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH] [2/2] KVM: Add VT-x machine check support
Andi Kleen wrote: This assumption is incorrect. This code is executed after preemption has been enabled, and we may have even slept before reaching it. The only thing that counts here is the context before the machine check event. If there was a vmexit we know it was in guest context. The only requirement we have is that we're running still on the same CPU. I assume that's true, otherwise the vmcb accesses wouldn't work? It's not true, we're in preemptible context and may have even slept. vmcs access work because we have a preempt notifier called when we are scheduled in, and will execute vmclear/vmptrld as necessary. Look at kvm_preempt_ops in virt/kvm_main.c. We get both an explicit EXIT_REASON and an exception? These are different cases. The exception is #MC in guest context, the EXIT_REASON is when a #MC happens while the CPU is executing the VM entry microcode. I see, thanks. -- error compiling committee.c: too many arguments to function
[KVM PATCH v2 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl
Assigning an irqfd object to a kvm object creates a relationship that we currently manage by having the kvm object acquire/hold a file* reference to the underlying eventfd. The lifetime of these objects is properly maintained by decoupling the two objects whenever the irqfd is closed or kvm is closed, whichever comes first. However, the irqfd "close" method is less than ideal since it requires two system calls to complete (one for ioctl(kvmfd, IRQFD_DEASSIGN), the other for close(eventfd)). This dual-call approach was utilized because there was no notification mechanism on the eventfd side at the time irqfd was implemented. Recently, Davide proposed a patch to send a POLLHUP wakeup whenever an eventfd is about to close. So we eliminate the IRQFD_DEASSIGN ioctl (*) vector in favor of sensing the deassign automatically when the fd is closed. The resulting code is slightly more complex since we need to allow either side to sever the relationship independently. We utilize SRCU to guarantee stable concurrent access to the KVM pointer without adding additional atomic operations in the fast path. At minimum, this design should be acked by both Davide and Paul (cc'd). (*) The irqfd patch does not exist in any released tree, so the understanding is that we can alter the irqfd specific ABI without taking the normal precautions, such as CAP bits. Signed-off-by: Gregory Haskins CC: Davide Libenzi CC: Michael S. Tsirkin CC: Paul E. 
McKenney --- include/linux/kvm.h |2 - virt/kvm/eventfd.c | 177 +++ virt/kvm/kvm_main.c |3 + 3 files changed, 81 insertions(+), 101 deletions(-) diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 632a856..29b62cc 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -482,8 +482,6 @@ struct kvm_x86_mce { }; #endif -#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) - struct kvm_irqfd { __u32 fd; __u32 gsi; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index f3f2ea1..004c660 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -37,39 +37,92 @@ * */ struct _irqfd { + struct mutex lock; + struct srcu_structsrcu; struct kvm *kvm; int gsi; - struct file *file; struct list_head list; poll_tablept; wait_queue_head_t*wqh; wait_queue_t wait; - struct work_structwork; + struct work_structinject; }; static void irqfd_inject(struct work_struct *work) { - struct _irqfd *irqfd = container_of(work, struct _irqfd, work); - struct kvm *kvm = irqfd->kvm; + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); + struct kvm *kvm; + int idx; + + idx = srcu_read_lock(&irqfd->srcu); + + kvm = rcu_dereference(irqfd->kvm); + if (kvm) { + mutex_lock(&kvm->lock); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); + kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); + mutex_unlock(&kvm->lock); + } + + srcu_read_unlock(&irqfd->srcu, idx); +} + +static void +irqfd_disconnect(struct _irqfd *irqfd) +{ + struct kvm *kvm; + + mutex_lock(&irqfd->lock); + + kvm = rcu_dereference(irqfd->kvm); + rcu_assign_pointer(irqfd->kvm, NULL); + + mutex_unlock(&irqfd->lock); + + if (!kvm) + return; mutex_lock(&kvm->lock); - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1); - kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0); + list_del(&irqfd->list); mutex_unlock(&kvm->lock); + + /* +* It is important to not drop the kvm reference until the next grace +* period because there might be lockless references in flight up +* until then +*/ + 
synchronize_srcu(&irqfd->srcu); + kvm_put_kvm(kvm); } static int irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) { struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait); + unsigned long flags = (unsigned long)key; - /* -* The wake_up is called with interrupts disabled. Therefore we need -* to defer the IRQ injection until later since we need to acquire the -* kvm->lock to do so. -*/ - schedule_work(&irqfd->work); + if (flags & POLLIN) + /* +* The POLLIN wake_up is called with interrupts disabled. +* Therefore we need to defer the IRQ injection until later +* since we need to acquire the kvm->lock to do so. +*/ + schedule_work(&irqfd->inject); + + if (flags & POLLHUP) { + /* +
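The disconnect scheme in the patch above — publish NULL, wait out a grace period, then drop the last reference — can be sketched with a user-space analogue (illustrative Python, not kernel SRCU; the `SimpleSRCU` class and event names are invented for the demo):

```python
# User-space analogue of irqfd_disconnect(): clear the pointer, then wait
# for a grace period so a lockless reader already in flight can finish
# before the final reference is dropped.
import threading

class SimpleSRCU:
    """Toy stand-in for srcu_read_lock/srcu_read_unlock/synchronize_srcu."""
    def __init__(self):
        self._cv = threading.Condition()
        self._readers = 0

    def read_lock(self):
        with self._cv:
            self._readers += 1

    def read_unlock(self):
        with self._cv:
            self._readers -= 1
            self._cv.notify_all()

    def synchronize(self):
        with self._cv:  # block until all in-flight readers are done
            self._cv.wait_for(lambda: self._readers == 0)

srcu = SimpleSRCU()
kvm_ref = {"kvm": "live-kvm-object"}
events = []
in_read_section = threading.Event()
proceed = threading.Event()

def injector():                        # plays the role of irqfd_inject()
    srcu.read_lock()
    in_read_section.set()
    proceed.wait()                     # disconnect runs while we hold the lock
    events.append(f"inject saw {kvm_ref['kvm']}")  # observes None, skips inject
    srcu.read_unlock()

t = threading.Thread(target=injector)
t.start()
in_read_section.wait()
kvm_ref["kvm"] = None                  # rcu_assign_pointer(irqfd->kvm, NULL)
proceed.set()
srcu.synchronize()                     # returns only after the reader is done
events.append("kvm_put_kvm")           # now safe to drop the final reference
t.join()
print(events)                          # ['inject saw None', 'kvm_put_kvm']
```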
[KVM PATCH v2 0/2] irqfd: use POLLHUP notification for close()
(Applies to kvm.git/master:25deed73) Please see the header for 2/2 for a description. This patch series has been fully tested and appears to be working correctly. [Review notes: *) Paul has looked at the SRCU design and, to my knowledge, didn't find any holes. *) Michael, Avi, and myself agree that while the removal of the DEASSIGN vector is not desirable, the fix on close() is more important in the short-term. We can always add DEASSIGN support again in the future with a CAP bit. ] [Changelog: v2: *) Pulled in Davide's official patch for 1/2 from his submission accepted into -mmotm. *) Fixed patch 2/2 to use the "key" field as a bitmap in the wakeup logic, per Davide's feedback. v1: *) Initial release ] --- Davide Libenzi (1): Allow waiters to be notified about the eventfd file* going away, and give Gregory Haskins (1): kvm: use POLLHUP to close an irqfd instead of an explicit ioctl fs/eventfd.c| 10 +++ include/linux/kvm.h |2 - virt/kvm/eventfd.c | 177 +++ virt/kvm/kvm_main.c |3 + 4 files changed, 90 insertions(+), 102 deletions(-) -- Signature
[KVM PATCH v2 1/2] Allow waiters to be notified about the eventfd file* going away, and give
From: Davide Libenzi them a chance to unregister from the wait queue. This in turn allows eventfd users to use the eventfd file* w/out holding a live reference to it. After the eventfd user callback returns, any usage of the eventfd file* should be dropped. The eventfd user callback can acquire sleepy locks since it is invoked lockless. This is a feature, needed by KVM to avoid an awkward workaround when using eventfd. [gmh: pulled from -mmotm for inclusion in kvm.git] Signed-off-by: Davide Libenzi Tested-by: Gregory Haskins Signed-off-by: Andrew Morton Signed-off-by: Gregory Haskins --- fs/eventfd.c | 10 +- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index 3f0e197..72f5f8d 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -61,7 +61,15 @@ EXPORT_SYMBOL_GPL(eventfd_signal); static int eventfd_release(struct inode *inode, struct file *file) { - kfree(file->private_data); + struct eventfd_ctx *ctx = file->private_data; + + /* +* No need to hold the lock here, since we are on the file cleanup +* path and the ones still attached to the wait queue will be +* serialized by wake_up_locked_poll(). +*/ + wake_up_locked_poll(&ctx->wqh, POLLHUP); + kfree(ctx); return 0; }
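The close()-driven teardown this patch enables can be illustrated in user space with an ordinary pipe, where a poller likewise learns of the peer going away via POLLHUP rather than an explicit deassign call (a sketch for illustration; the patch itself delivers POLLHUP on the eventfd's in-kernel wait queue, which is not observable this way from user space):

```python
# POLLHUP-on-close in miniature: the poller learns that the other side of
# an fd pair went away with no teardown step other than close().
import os
import select

r, w = os.pipe()
poller = select.poll()
poller.register(r, select.POLLIN | select.POLLHUP)

os.close(w)                         # close() is the only teardown step
fd, mask = poller.poll(1000)[0]     # wakes up immediately

print(bool(mask & select.POLLHUP))  # True: the poller saw the hangup
os.close(r)
```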
[PATCH] cleanup acpi table creation
Current code is a mess. And addition of acpi tables is broken. Signed-off-by: Gleb Natapov diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c index 369cbef..fda4894 100755 --- a/kvm/bios/rombios32.c +++ b/kvm/bios/rombios32.c @@ -1293,15 +1293,13 @@ struct rsdp_descriptor /* Root System Descriptor Pointer */ uint8_treserved [3]; /* Reserved field must be 0 */ } __attribute__((__packed__)); -#define MAX_RSDT_ENTRIES 100 - /* * ACPI 1.0 Root System Description Table (RSDT) */ struct rsdt_descriptor_rev1 { ACPI_TABLE_HEADER_DEF /* ACPI common table header */ - uint32_t table_offset_entry [MAX_RSDT_ENTRIES]; /* Array of pointers to other */ + uint32_t table_offset_entry [0]; /* Array of pointers to other */ /* ACPI tables */ } __attribute__((__packed__)); @@ -1585,324 +1583,332 @@ static void acpi_build_srat_memory(struct srat_memory_affinity *numamem, return; } -/* base_addr must be a multiple of 4KB */ -void acpi_bios_init(void) +static void rsdp_build(struct rsdp_descriptor *rsdp, uint32_t rsdt) { -struct rsdp_descriptor *rsdp; -struct rsdt_descriptor_rev1 *rsdt; -struct fadt_descriptor_rev1 *fadt; -struct facs_descriptor_rev1 *facs; -struct multiple_apic_table *madt; -uint8_t *dsdt, *ssdt; + memset(rsdp, 0, sizeof(*rsdp)); + memcpy(rsdp->signature, "RSD PTR ", 8); #ifdef BX_QEMU -struct system_resource_affinity_table *srat; -struct acpi_20_hpet *hpet; -uint32_t hpet_addr; -#endif -uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, ssdt_addr; -uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size; -uint32_t srat_addr,srat_size; -uint16_t i, external_tables; -int nb_numa_nodes; -int nb_rsdt_entries = 0; - -/* reserve memory space for tables */ -#ifdef BX_USE_EBDA_TABLES -ebda_cur_addr = align(ebda_cur_addr, 16); -rsdp = (void *)(ebda_cur_addr); -ebda_cur_addr += sizeof(*rsdp); + memcpy(rsdp->oem_id, "QEMU ", 6); #else -bios_table_cur_addr = align(bios_table_cur_addr, 16); -rsdp = (void *)(bios_table_cur_addr); -bios_table_cur_addr += 
sizeof(*rsdp); + memcpy(rsdp->oem_id, "BOCHS ", 6); #endif + rsdp->rsdt_physical_address = rsdt; + rsdp->checksum = acpi_checksum((void*)rsdp, 20); +} -#ifdef BX_QEMU -external_tables = acpi_additional_tables(); -#else -external_tables = 0; -#endif +static uint32_t facs_build(uint32_t *addr) +{ + struct facs_descriptor_rev1 *facs; -addr = base_addr = ram_size - ACPI_DATA_SIZE; -rsdt_addr = addr; -rsdt = (void *)(addr); -rsdt_size = sizeof(*rsdt) + external_tables * 4; -addr += rsdt_size; + *addr = (*addr + 63) & ~63; /* 64 byte alignment for FACS */ + facs = (void*)(*addr); + *addr += sizeof(*facs); -fadt_addr = addr; -fadt = (void *)(addr); -addr += sizeof(*fadt); + memset(facs, 0, sizeof(*facs)); + memcpy(facs->signature, "FACS", 4); + facs->length = cpu_to_le32(sizeof(*facs)); + BX_INFO("Firmware waking vector %p\n", &facs->firmware_waking_vector); -/* XXX: FACS should be in RAM */ -addr = (addr + 63) & ~63; /* 64 byte alignment for FACS */ -facs_addr = addr; -facs = (void *)(addr); -addr += sizeof(*facs); + return (uint32_t)facs; +} -dsdt_addr = addr; -dsdt = (void *)(addr); -addr += sizeof(AmlCode); +static uint32_t dsdt_build(uint32_t *addr) +{ + uint8_t *dsdt = (void*)(*addr); -#ifdef BX_QEMU -qemu_cfg_select(QEMU_CFG_NUMA); -nb_numa_nodes = qemu_cfg_get64(); + *addr += sizeof(AmlCode); + + memcpy(dsdt, AmlCode, sizeof(AmlCode)); + + return (uint32_t)dsdt; +} + +static uint32_t fadt_build(uint32_t *addr, uint32_t facs, uint32_t dsdt) +{ + struct fadt_descriptor_rev1 *fadt = (void*)(*addr); + + *addr += sizeof(*fadt); + memset(fadt, 0, sizeof(*fadt)); + fadt->firmware_ctrl = facs; + fadt->dsdt = dsdt; + fadt->model = 1; + fadt->reserved1 = 0; + fadt->sci_int = cpu_to_le16(pm_sci_int); + fadt->smi_cmd = cpu_to_le32(SMI_CMD_IO_ADDR); + fadt->acpi_enable = 0xf1; + fadt->acpi_disable = 0xf0; + fadt->pm1a_evt_blk = cpu_to_le32(pm_io_base); + fadt->pm1a_cnt_blk = cpu_to_le32(pm_io_base + 0x04); + fadt->pm_tmr_blk = cpu_to_le32(pm_io_base + 0x08); + 
fadt->pm1_evt_len = 4; + fadt->pm1_cnt_len = 2; + fadt->pm_tmr_len = 4; + fadt->plvl2_lat = cpu_to_le16(0xfff); // C2 state not supported + fadt->plvl3_lat = cpu_to_le16(0xfff); // C3 state not supported + fadt->gpe0_blk = cpu_to_le32(0xafe0); + fadt->gpe0_blk_len = 4; + /* WBINVD + PROC_C1 + SLP_BUTTON + FIX_RTC */ + fadt->flags = cpu_to_le32((1 << 0) | (1 << 2) | (1 << 5) | (1 << 6)); + acpi_build_table_header((struct acpi_table_header *)fadt, "
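The acpi_checksum() call in rsdp_build() above relies on the ACPI convention that all bytes of the checksummed region, including the checksum byte itself, sum to zero mod 256. A minimal sketch (the sample 20-byte RSDP here is made up for illustration):

```python
# ACPI-style table checksum: the checksum byte is chosen so that all bytes
# of the covered region (here, the 20-byte ACPI 1.0 RSDP) sum to 0 mod 256.

def acpi_checksum(data: bytes) -> int:
    return (-sum(data)) & 0xFF

# Hypothetical 20-byte RSDP with the checksum byte (offset 8) still zero:
# 8-byte signature, checksum, 6-byte OEM id, revision, 4-byte RSDT address.
rsdp = bytearray(b"RSD PTR " + b"\x00" + b"BOCHS " + b"\x00" * 5)
assert len(rsdp) == 20
rsdp[8] = acpi_checksum(bytes(rsdp))

print(sum(rsdp) & 0xFF)   # 0 -> the table verifies
```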
Re: [PATCH] [2/2] KVM: Add VT-x machine check support
On Thu, Jun 04, 2009 at 02:48:17PM +0300, Avi Kivity wrote: > Andi Kleen wrote: > >[Avi could you please still consider this patch for your 2.6.31 patchqueue? > >It's fairly simple, but important to handle memory errors in guests] > > > > Oh yes, and it'll be needed for -stable. IIUC, right now a machine > check is trapped by the guest, so the guest is killed instead of the host? Yes, the guest will receive int 18. But it will not kill itself because the guest cannot access the machine check MSRs, so it will not see any machine check. So it's kind of ignored, which is pretty bad. > > >+/* > >+ * Trigger machine check on the host. We assume all the MSRs are already > >set up > >+ * by the CPU and that we still run on the same CPU as the MCE occurred > >on. > >+ * We pass a fake environment to the machine check handler because we want > >+ * the guest to be always treated like user space, no matter what context > >+ * it used internally. > >+ */ > > > > This assumption is incorrect. This code is executed after preemption > has been enabled, and we may have even slept before reaching it. The only thing that counts here is the context before the machine check event. If there was a vmexit we know it was in guest context. The only requirement we have is that we're running still on the same CPU. I assume that's true, otherwise the vmcb accesses wouldn't work? > > [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation, > >+[EXIT_REASON_MACHINE_CHECK] = handle_machine_check, > > }; > > > > static const int kvm_vmx_max_exit_handlers = > > > > We get both an explicit EXIT_REASON and an exception? These are different cases. The exception is #MC in guest context, the EXIT_REASON is when a #MC happens while the CPU is executing the VM entry microcode. -Andi -- a...@linux.intel.com -- Speaking for myself only. 
Re: [KVM PATCH v5 0/2] iosignalfd
Gregory Haskins wrote: Marcelo, Avi, and myself have previously agreed that Marcelo's mmio-locking cleanup should go in first. When that happens, I will need to rebase this series because it changes how you interface to the io_bus code. I should have mentioned that here, but forgot. (Speaking of, is there an ETA for when that code will be merged, Avi?) I had issues with the unbalanced locking the patchset introduced in coalesced_mmio; once these are resolved the patchset will be merged. -- error compiling committee.c: too many arguments to function
Re: [RFC] CPU hard limits
Bharata B Rao wrote: 2. Need for hard limiting CPU resource -- - Pay-per-use: In enterprise systems that cater to multiple clients/customers, where a customer demands a certain share of CPU resources and pays only for that, CPU hard limits will be useful to hard limit the customer's job to consume only the specified amount of CPU resource. - In container based virtualization environments running multiple containers, hard limits will be useful to ensure a container doesn't exceed its CPU entitlement. - Hard limits can be used to provide guarantees. How can hard limits provide guarantees? Let's take an example where I have 1 group that I wish to guarantee a 20% share of the cpu, and another 8 groups with no limits or guarantees. One way to achieve the guarantee is to hard limit each of the 8 other groups to 10%; the sum total of the limits is 80%, leaving 20% for the guarantee group. The downside is the arbitrary limit imposed on the other groups. Another way is to place the 8 groups in a container group, and limit that to 80%. But that doesn't work if I want to provide guarantees to several groups. -- error compiling committee.c: too many arguments to function
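The guarantee-via-limits arithmetic in Avi's example is simple enough to write down (a sketch of the reasoning only, not scheduler code; the function name is invented):

```python
# Turning a guarantee into hard limits, per the example above: to guarantee
# one group g% of the CPU, cap the remaining n groups so that their limits
# sum to (100 - g)%. Evenly split, each other group gets (100 - g) / n.

def limit_per_other_group(guarantee_pct, n_other_groups):
    return (100 - guarantee_pct) / n_other_groups

per_group = limit_per_other_group(20, 8)
print(per_group)   # 10.0 -> each of the 8 other groups is capped at 10%
```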
Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()
Gregory Haskins wrote: Since Paul ok'd (I think?) the srcu design, and the only other feedback was the key-bitmap thing from Davide, I will go ahead and push a v2 with just that one fix (unless there is any other feedback?) I'll do a detailed review on your next posting. When I see a long thread I go hide under the bed, where there is no Internet access. -- error compiling committee.c: too many arguments to function
Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()
Avi Kivity wrote: > Gregory Haskins wrote: >>> I agree that deassign is needed for reasons of symmetry, and that it >>> can be added later. >>> >>> >> Cool. >> >> FYI: Davide's patch has been accepted into -mm (Andrew CC'd). I am not >> sure of the protocol here, but I assume this means you can now safely >> pull it from -mm into kvm.git so the prerequisite for 2/2 is properly >> met. >> > > I'm not sure either. > > But I think I saw a "Thanks for catching that" for 2/2? > Ah, right! I queued that fix up eons ago after Davide's feedback and forgot that it was there waiting for me ;) Since Paul ok'd (I think?) the srcu design, and the only other feedback was the key-bitmap thing from Davide, I will go ahead and push a v2 with just that one fix (unless there is any other feedback?) -Greg
Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()
Gregory Haskins wrote: I agree that deassign is needed for reasons of symmetry, and that it can be added later. Cool. FYI: Davide's patch has been accepted into -mm (Andrew CC'd). I am not sure of the protocol here, but I assume this means you can now safely pull it from -mm into kvm.git so the prerequisite for 2/2 is properly met. I'm not sure either. But I think I saw a "Thanks for catching that" for 2/2? -- error compiling committee.c: too many arguments to function
Re: [PATCH] [2/2] KVM: Add VT-x machine check support
Andi Kleen wrote: [Avi could you please still consider this patch for your 2.6.31 patchqueue? It's fairly simple, but important to handle memory errors in guests] Oh yes, and it'll be needed for -stable. IIUC, right now a machine check is trapped by the guest, so the guest is killed instead of the host? +/* + * Trigger machine check on the host. We assume all the MSRs are already set up + * by the CPU and that we still run on the same CPU as the MCE occurred on. + * We pass a fake environment to the machine check handler because we want + * the guest to be always treated like user space, no matter what context + * it used internally. + */ This assumption is incorrect. This code is executed after preemption has been enabled, and we may have even slept before reaching it. NMI suffers from the same issue, see vmx_complete_interrupts(). You could handle it the same way. @@ -3150,6 +3171,7 @@ [EXIT_REASON_WBINVD] = handle_wbinvd, [EXIT_REASON_TASK_SWITCH] = handle_task_switch, [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation, + [EXIT_REASON_MACHINE_CHECK] = handle_machine_check, }; static const int kvm_vmx_max_exit_handlers = We get both an explicit EXIT_REASON and an exception? -- error compiling committee.c: too many arguments to function
Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()
Avi Kivity wrote: > Michael S. Tsirkin wrote: >> On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote: >> And having close not clean up the state unless you do an ioctl first is very messy IMO - I don't think you'll find any such examples in kernel. >>> I agree, and that is why I am advocating this POLLHUP solution. It was >>> only this other way to begin with because the technology didn't exist >>> until Davide showed me the light. >>> >>> Problem with your request is that I already looked into what is >>> essentially a bi-directional reference problem (for a different reason) >>> when I started the POLLHUP series. Its messy to do this in a way that >>> doesn't negatively impact the fast path (introducing locking, etc) or >>> make my head explode making sure it doesn't race. Afaict, we would >>> need >>> to solve this problem to do what you are proposing (patches welcome). >>> >>> If this hybrid decoupled-deassign + unified-close is indeed an >>> important >>> feature set, I suggest that we still consider this POLLHUP series for >>> inclusion, and then someone can re-introduce DEASSIGN support in the >>> future as a CAP bit extension. That way we at least get the desirable >>> close() properties that we both seem in favor of, and get this advanced >>> use case when we need it (and can figure out the locking design). >>> >>> >> >> FWIW, I took a look and yes, it is non-trivial. >> I concur, we can always add the deassign ioctl later. >> > > I agree that deassign is needed for reasons of symmetry, and that it > can be added later. > Cool. FYI: Davide's patch has been accepted into -mm (Andrew CC'd). I am not sure of the protocol here, but I assume this means you can now safely pull it from -mm into kvm.git so the prerequisite for 2/2 is properly met. -Greg
[PATCH] [2/2] KVM: Add VT-x machine check support
[Avi could you please still consider this patch for your 2.6.31 patchqueue? It's fairly simple, but important to handle memory errors in guests] VT-x needs an explicit MC vector intercept to handle machine checks in the hypervisor. It also has a special option to catch machine checks that happen during VT entry. Do these interceptions and forward them to the Linux machine check handler. Make it always look like user space is interrupted because the machine check handler treats kernel/user space differently. Thanks to Huang Ying and Jiang Yunhong for help and testing. Cc: ying.hu...@intel.com Signed-off-by: Andi Kleen --- arch/x86/include/asm/vmx.h |1 + arch/x86/kvm/vmx.c | 26 -- 2 files changed, 25 insertions(+), 2 deletions(-) Index: linux/arch/x86/include/asm/vmx.h === --- linux.orig/arch/x86/include/asm/vmx.h 2009-05-28 10:47:53.0 +0200 +++ linux/arch/x86/include/asm/vmx.h2009-06-04 11:58:49.0 +0200 @@ -247,6 +247,7 @@ #define EXIT_REASON_MSR_READ31 #define EXIT_REASON_MSR_WRITE 32 #define EXIT_REASON_MWAIT_INSTRUCTION 36 +#define EXIT_REASON_MACHINE_CHECK 41 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS 44 #define EXIT_REASON_EPT_VIOLATION 48 Index: linux/arch/x86/kvm/vmx.c === --- linux.orig/arch/x86/kvm/vmx.c 2009-05-28 10:47:53.0 +0200 +++ linux/arch/x86/kvm/vmx.c2009-06-04 12:05:44.0 +0200 @@ -32,6 +32,7 @@ #include #include #include +#include #define __ex(x) __kvm_handle_fault_on_reboot(x) @@ -478,7 +479,7 @@ { u32 eb; - eb = (1u << PF_VECTOR) | (1u << UD_VECTOR); + eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR); if (!vcpu->fpu_active) eb |= 1u << NM_VECTOR; if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) { @@ -2585,6 +2586,23 @@ return 0; } +/* + * Trigger machine check on the host. We assume all the MSRs are already set up + * by the CPU and that we still run on the same CPU as the MCE occurred on. 
+ * We pass a fake environment to the machine check handler because we want + * the guest to be always treated like user space, no matter what context + * it used internally. + */ +static int handle_machine_check(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) +{ + struct pt_regs regs = { + .cs = 3, /* Fake ring 3 no matter what the guest ran on */ + .flags = X86_EFLAGS_IF, + }; + do_machine_check(®s, 0); + return 1; +} + static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -2596,6 +2614,10 @@ vect_info = vmx->idt_vectoring_info; intr_info = vmcs_read32(VM_EXIT_INTR_INFO); + ex_no = intr_info & INTR_INFO_VECTOR_MASK; + if (ex_no == MCE_VECTOR) + return handle_machine_check(vcpu, kvm_run); + if ((vect_info & VECTORING_INFO_VALID_MASK) && !is_page_fault(intr_info)) printk(KERN_ERR "%s: unexpected, vectoring info 0x%x " @@ -2648,7 +2670,6 @@ return 1; } - ex_no = intr_info & INTR_INFO_VECTOR_MASK; switch (ex_no) { case DB_VECTOR: dr6 = vmcs_readl(EXIT_QUALIFICATION); @@ -3150,6 +3171,7 @@ [EXIT_REASON_WBINVD] = handle_wbinvd, [EXIT_REASON_TASK_SWITCH] = handle_task_switch, [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation, + [EXIT_REASON_MACHINE_CHECK] = handle_machine_check, }; static const int kvm_vmx_max_exit_handlers = -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] [1/2] x86: MCE: Define MCE_VECTOR
[This patch is already in the "mce3" branch in tip, but I'm including it
here because it's needed for the next patch.]

Signed-off-by: Andi Kleen

---
 arch/x86/include/asm/irq_vectors.h |  1 +
 1 file changed, 1 insertion(+)

Index: linux/arch/x86/include/asm/irq_vectors.h
===================================================================
--- linux.orig/arch/x86/include/asm/irq_vectors.h	2009-05-27 21:48:38.0 +0200
+++ linux/arch/x86/include/asm/irq_vectors.h	2009-05-27 21:48:38.0 +0200
@@ -25,6 +25,7 @@
  */

 #define NMI_VECTOR			0x02
+#define MCE_VECTOR			0x12

 /*
  * IDT vectors usable for external interrupt sources start
Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()
Michael S. Tsirkin wrote:
> On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote:
> And having close not clean up the state unless you do an ioctl first is
> very messy IMO - I don't think you'll find any such examples in kernel.
>> I agree, and that is why I am advocating this POLLHUP solution. It was
>> only this other way to begin with because the technology didn't exist
>> until Davide showed me the light.
>>
>> Problem with your request is that I already looked into what is
>> essentially a bi-directional reference problem (for a different reason)
>> when I started the POLLHUP series. It's messy to do this in a way that
>> doesn't negatively impact the fast path (introducing locking, etc.) or
>> make my head explode making sure it doesn't race. Afaict, we would need
>> to solve this problem to do what you are proposing (patches welcome).
>>
>> If this hybrid decoupled-deassign + unified-close is indeed an important
>> feature set, I suggest that we still consider this POLLHUP series for
>> inclusion, and then someone can re-introduce DEASSIGN support in the
>> future as a CAP bit extension. That way we at least get the desirable
>> close() properties that we both seem in favor of, and get this advanced
>> use case when we need it (and can figure out the locking design).
>
> FWIW, I took a look and yes, it is non-trivial.
> I concur, we can always add the deassign ioctl later.

I agree that deassign is needed for reasons of symmetry, and that it can
be added later.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v8] qemu-kvm: add irqfd support
Gregory Haskins wrote:
> irqfd lets you create an eventfd based file-descriptor to inject
> interrupts to a kvm guest. We associate one gsi per fd for fine-grained
> routing.
>
> [note: this is meant to work in conjunction with the POLLHUP version of
> irqfd, which has not yet been accepted into kvm.git]

Applied with two changes: added a dependency on CONFIG_EVENTFD (with the
kvm external module, you can have irqfd support without eventfd support),
and adjusted for the new libkvm location (libkvm-all.[ch]).

--
error compiling committee.c: too many arguments to function
Re: KVM-86 not exposing 64 bits CPU anymore, NICE
On Thu, 2009-06-04 at 09:20 +0200, Gilles PIETRI wrote:
> I'm quite pissed off. I just upgraded to kvm-86 on a host that has
> worked nicely on kvm-78 for quite some time. But since I was fearing
> the qcow2 corruption issues, I wanted to upgrade to kvm-86. After
> testing the performance, I decided to switch. How stupid that was.
> That was really putting too much trust in KVM.

Jim has already responded with details on the fix for the particular
issue, but speaking more generally ...

The kvm-XX releases are snapshots of the development tree. They do not go
through the kind of stabilisation cycle you would expect from a new
kernel release, for example.

If you want a KVM version you can trust, use the latest qemu-kvm-0.x.y
release with the stock version of kvm.ko that comes with your kernel or,
if you need particular new features, the latest kvm-kmod-2.6.z release.

Cheers,
Mark.
Re: KVM-86 not exposing 64 bits CPU anymore, NICE
On 04/06/2009 09:46, Jim Paris wrote:
> Gilles PIETRI wrote:
>> Hi,
>>
>> I'm quite pissed off. I just upgraded to kvm-86 on a host that has
>> worked nicely on kvm-78 for quite some time. But since I was fearing
>> the qcow2 corruption issues, I wanted to upgrade to kvm-86. After
>> testing the performance, I decided to switch. How stupid that was.
>> That was really putting too much trust in KVM.
>>
>> Now I can't have 64 bits CPUs on my guests. My host is running a
>> 2.6.27.7 kernel, and is x86_64 enabled. Until the upgrade, guests
>> were running x86_64 fine. Now, it says long mode can't be used or
>> something like that, and I can only have 32 bits guests.
>
> Please see
> http://www.mail-archive.com/kvm@vger.kernel.org/msg15757.html
> http://www.mail-archive.com/kvm@vger.kernel.org/msg15769.html
>
> -jim

Gonna check that, thanks a lot, this didn't get on my radar.

Regards,
Gilles
Re: KVM-86 not exposing 64 bits CPU anymore, NICE
- "Gilles PIETRI" wrote:
> Hi,
>
> I'm quite pissed off. I just upgraded to kvm-86 on a host that has
> worked nicely on kvm-78 for quite some time. But since I was fearing
> the qcow2 corruption issues, I wanted to upgrade to kvm-86. After
> testing the performance, I decided to switch. How stupid that was.
> That was really putting too much trust in KVM.
>
> Now I can't have 64 bits CPUs on my guests.
> My host is running a 2.6.27.7 kernel, and is x86_64 enabled.
> Until the upgrade, guests were running x86_64 fine.
> Now, it says long mode can't be used or something like that, and I can
> only have 32 bits guests.
>
> Looks really like the bug explained here:
> http://www.mail-archive.com/kvm@vger.kernel.org/msg09431.html
>
> If I use -no-kvm, it works, but obviously, I want to be able to have
> kvm support enabled.
>
> Now, I really am happy about this upgrade, and I'm gonna have to roll
> it back. I really would appreciate some help on this..
>
> Gilles

Hi Gilles,

What you are saying is very strange, because KVM-Autotest has passed all
tests for the KVM-86 release, and I can say that 64-bit guests work here
(both Intel & AMD, on RHEL 5.3/x64).

-Alexey