Re: [PATCH -tip v9 3/7] kprobes: checks probe address is instruction boundary on x86

2009-06-04 Thread Ananth N Mavinakayanahalli
On Mon, Jun 01, 2009 at 08:37:31PM -0400, Masami Hiramatsu wrote:
> Ensure the safety of inserting kprobes by checking whether the specified
> address is at the first byte of an instruction on x86.
> This is done by decoding the probed function from its head to the probe point.
> 
> Signed-off-by: Masami Hiramatsu 
> Cc: Ananth N Mavinakayanahalli 
> Cc: Jim Keniston 
> Cc: Ingo Molnar 

Acked-by: Ananth N Mavinakayanahalli 

> ---
> 
>  arch/x86/kernel/kprobes.c |   69 +
>  1 files changed, 69 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/kernel/kprobes.c b/arch/x86/kernel/kprobes.c
> index 7b5169d..41d524f 100644
> --- a/arch/x86/kernel/kprobes.c
> +++ b/arch/x86/kernel/kprobes.c
> @@ -48,12 +48,14 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/kallsyms.h>
> 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include <asm/insn.h>
> 
>  void jprobe_return_end(void);
> 
> @@ -244,6 +246,71 @@ retry:
>   }
>  }
> 
> +/* Recover the probed instruction at addr for further analysis. */
> +static int recover_probed_instruction(kprobe_opcode_t *buf, unsigned long addr)
> +{
> + struct kprobe *kp;
> + kp = get_kprobe((void *)addr);
> + if (!kp)
> + return -EINVAL;
> +
> + /*
> +  *  Basically, kp->ainsn.insn has the original instruction.
> +  *  However, a RIP-relative instruction cannot be single-stepped
> +  *  at a different place, so fix_riprel() tweaks the displacement
> +  *  of that instruction. In that case, we can't recover the
> +  *  instruction from kp->ainsn.insn.
> +  *
> +  *  On the other hand, kp->opcode has a copy of the first byte of
> +  *  the probed instruction, which is overwritten by int3. And
> +  *  since the instruction at kp->addr is not modified by kprobes
> +  *  except for the first byte, we can recover the original
> +  *  instruction from it and kp->opcode.
> +  */
> + memcpy(buf, kp->addr, MAX_INSN_SIZE * sizeof(kprobe_opcode_t));
> + buf[0] = kp->opcode;
> + return 0;
> +}
> +
> +/* Dummy buffers for kallsyms_lookup */
> +static char __dummy_buf[KSYM_NAME_LEN];
> +
> +/* Check if paddr is at an instruction boundary */
> +static int __kprobes can_probe(unsigned long paddr)
> +{
> + int ret;
> + unsigned long addr, offset = 0;
> + struct insn insn;
> + kprobe_opcode_t buf[MAX_INSN_SIZE];
> +
> + if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
> + return 0;
> +
> + /* Decode instructions */
> + addr = paddr - offset;
> + while (addr < paddr) {
> + kernel_insn_init(&insn, (void *)addr);
> + insn_get_opcode(&insn);
> +
> + /* Check if the instruction has been modified. */
> + if (OPCODE1(&insn) == BREAKPOINT_INSTRUCTION) {
> + ret = recover_probed_instruction(buf, addr);
> + if (ret)
> + /*
> +  * Another debugging subsystem might insert
> +  * this breakpoint. In that case, we can't
> +  * recover it.
> +  */
> + return 0;
> + kernel_insn_init(&insn, buf);
> + }
> + insn_get_length(&insn);
> + addr += insn.length;
> + }
> +
> + return (addr == paddr);
> +}
> +
>  /*
>   * Returns non-zero if opcode modifies the interrupt flag.
>   */
> @@ -359,6 +426,8 @@ static void __kprobes arch_copy_kprobe(struct kprobe *p)
> 
>  int __kprobes arch_prepare_kprobe(struct kprobe *p)
>  {
> + if (!can_probe((unsigned long)p->addr))
> + return -EILSEQ;
>   /* insn: must be on special executable page on x86. */
>   p->ainsn.insn = get_insn_slot();
>   if (!p->ainsn.insn)
> 
> 
> -- 
> Masami Hiramatsu
> 
> Software Engineer
> Hitachi Computer Products (America), Inc.
> Software Solutions Division
> 
> e-mail: mhira...@redhat.com
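
The approach in this patch reduces to a boundary walk: decode forward from
the function's first byte and check that the walk lands exactly on the probe
address.  A toy userspace illustration of that walk follows; the decode_len()
decoder below is a hypothetical stand-in for the kernel's real x86 decoder
(used in the patch via insn_get_opcode()/insn_get_length()), since real x86
decoding is far more involved:

#include <stdio.h>

/* Hypothetical toy decoder: knows just enough one-byte opcodes for the
 * demo below.  The kernel walk uses its full x86 decoder instead. */
static unsigned int decode_len(const unsigned char *p)
{
	switch (p[0]) {
	case 0x90: return 1;	/* nop */
	case 0xb8: return 5;	/* mov $imm32, %eax */
	case 0xc3: return 1;	/* ret */
	default:   return 1;	/* toy fallback, not real decoding */
	}
}

/* Decode from the function start; the probe offset is valid only if the
 * walk lands exactly on it (overshoot means it is mid-instruction). */
static int on_insn_boundary(const unsigned char *func, size_t offset)
{
	size_t pos = 0;

	while (pos < offset)
		pos += decode_len(func + pos);
	return pos == offset;
}

int main(void)
{
	/* nop; mov $1, %eax; ret */
	unsigned char code[] = { 0x90, 0xb8, 0x01, 0x00, 0x00, 0x00, 0xc3 };

	printf("offset 1: %d\n", on_insn_boundary(code, 1));	/* 1: boundary */
	printf("offset 2: %d\n", on_insn_boundary(code, 2));	/* 0: inside mov */
	return 0;
}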


Re: [RFC] CPU hard limits

2009-06-04 Thread Bharata B Rao
On Fri, Jun 05, 2009 at 09:03:37AM +0300, Avi Kivity wrote:
> Balbir Singh wrote:
>>> I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), 
>>> and a  cpu hog running in each group, how would the algorithm divide 
>>> resources?
>>>
>>> 
>>
>> As per the matrix calculation, but as soon as we reach an idle point,
>> we redistribute the b/w and start a new quantum so to speak, where all
>> groups are charged up to their hard limits.
>>
>> For your question, if there is a CPU hog running, it would be as per
>> the matrix calculation, since the system has no idle point during the
>> bandwidth period.
>>   
>
> So the groups with guarantees get a priority boost.  That's not a good  
> side effect.

That happens only in the presence of idle cycles, when other groups [with or
without guarantees] have nothing useful to do. So how would that matter,
since there is nothing else to run anyway?

Regards,
Bharata.


Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Balbir Singh wrote:
>> I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a
>> cpu hog running in each group, how would the algorithm divide resources?
>
> As per the matrix calculation, but as soon as we reach an idle point,
> we redistribute the b/w and start a new quantum so to speak, where all
> groups are charged up to their hard limits.
>
> For your question, if there is a CPU hog running, it would be as per
> the matrix calculation, since the system has no idle point during the
> bandwidth period.

So the groups with guarantees get a priority boost.  That's not a good 
side effect.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Bharata B Rao wrote:
> On Fri, Jun 05, 2009 at 01:27:55PM +0800, Balbir Singh wrote:
>> * Avi Kivity  [2009-06-05 08:21:43]:
>>
>>> Balbir Singh wrote:
>>>>> But then there is no other way to make a *guarantee*, guarantees come
>>>>> at a cost of idling resources, no? Can you show me any other
>>>>> combination that will provide the guarantee and without idling the
>>>>> system for the specified guarantees?
>>>>
>>>> OK, I see part of your concern, but I think we could do some
>>>> optimizations during design. For example if all groups have reached
>>>> their hard-limit and the system is idle, should we do start a new hard
>>>> limit interval and restart, so that idleness can be removed. Would
>>>> that be an acceptable design point?
>>>
>>> I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a
>>> cpu hog running in each group, how would the algorithm divide resources?
>>
>> As per the matrix calculation, but as soon as we reach an idle point,
>> we redistribute the b/w and start a new quantum so to speak, where all
>> groups are charged up to their hard limits.
>
> But could there be client models where you are required to strictly
> adhere to the limit within the bandwidth and not provide more (by advancing
> the bandwidth period) in the presence of idle cycles ?

That's the limit part.  I'd like to be able to specify limits and 
guarantees on the same host and for the same groups; I don't think that 
works when you advance the bandwidth period.


I think we need to treat guarantees as first-class goals, not something 
derived from limits (in fact I think guarantees are more useful as they 
can be used to provide SLAs).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: KVM on Debian

2009-06-04 Thread Michael Tokarev

Aaron Clausen wrote:
[]
> is too old to support this.  Is there a reasonably safe way of
> upgrading to one of the newer versions of KVM on this server?


Can't say for "safe", but you can grab my .debs, which I use here
on a bunch of machines, from http://www.corpit.ru/debian/tls/kvm/ -
both binaries and sources.  To make them safer for you, you
can download the .dsc and .diff.gz, examine the contents and build them
yourself.

/mjt



Re: [RFC] CPU hard limits

2009-06-04 Thread Bharata B Rao
On Fri, Jun 05, 2009 at 01:27:55PM +0800, Balbir Singh wrote:
> * Avi Kivity  [2009-06-05 08:21:43]:
> 
> > Balbir Singh wrote:
> >>> But then there is no other way to make a *guarantee*, guarantees come
> >>> at a cost of idling resources, no? Can you show me any other
> >>> combination that will provide the guarantee and without idling the
> >>> system for the specified guarantees?
> >>> 
> >>
> >> OK, I see part of your concern, but I think we could do some
> >> optimizations during design. For example if all groups have reached
> >> their hard-limit and the system is idle, should we do start a new hard
> >> limit interval and restart, so that idleness can be removed. Would
> >> that be an acceptable design point?
> >
> > I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a  
> > cpu hog running in each group, how would the algorithm divide resources?
> >
> 
> As per the matrix calculation, but as soon as we reach an idle point,
> we redistribute the b/w and start a new quantum so to speak, where all
> groups are charged up to their hard limits.

But could there be client models where you are required to strictly
adhere to the limit within the bandwidth and not provide more (by advancing
the bandwidth period) in the presence of idle cycles ?

Regards,
Bharata.


Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-04 Thread Paul E. McKenney
On Fri, Jun 05, 2009 at 02:25:01PM +0930, Rusty Russell wrote:
> On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
> > Avi Kivity wrote:
> > > Gregory Haskins wrote:
> > > One idea is similar to signalfd() or eventfd()
> >
> > And thus the "kvm-eventfd" (irqfd/iosignalfd) interface project was born.
> > ;)
> 
> The lguest patch queue already has such an interface :)  And I have a
> partially complete in-kernel virtio_pci patch with the same trick.
> 
> I switched from "kernel created eventfd" to "userspace passes in eventfd"
> after a while though; it lets you connect multiple virtqueues to a single fd
> if you want.
> 
> Combined with a minor change to allow any process with access to the lguest fd
> to queue interrupts, this allowed lguest to move to a thread-per-virtqueue
> model which was a significant speedup as well as nice code reduction.
> 
> Here's the relevant kernel patch for reading.
> 
> Thanks!
> Rusty.
> 
> lguest: use eventfds for device notification
> 
> Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
> an address: the main Launcher process returns with this address, and figures
> out what device to run.
> 
> A far nicer model is to let processes bind an eventfd to an address: if we
> find one, we simply signal the eventfd.

A couple of (probably misguided) RCU questions/suggestions interspersed.

> Signed-off-by: Rusty Russell 
> Cc: Davide Libenzi 
> ---
>  drivers/lguest/Kconfig          |    2 -
>  drivers/lguest/core.c           |    8 ++--
>  drivers/lguest/lg.h             |    9
>  drivers/lguest/lguest_user.c    |   73
>  include/linux/lguest_launcher.h |    1
>  5 files changed, 89 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
> --- a/drivers/lguest/Kconfig
> +++ b/drivers/lguest/Kconfig
> @@ -1,6 +1,6 @@
>  config LGUEST
>   tristate "Linux hypervisor example code"
> - depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
> + depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
>   select HVC_DRIVER
>   ---help---
> This is a very simple module which allows you to run
> diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
> --- a/drivers/lguest/core.c
> +++ b/drivers/lguest/core.c
> @@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
>   /* It's possible the Guest did a NOTIFY hypercall to the
>* Launcher, in which case we return from the read() now. */
>   if (cpu->pending_notify) {
> - if (put_user(cpu->pending_notify, user))
> - return -EFAULT;
> - return sizeof(cpu->pending_notify);
> + if (!send_notify_to_eventfd(cpu)) {
> + if (put_user(cpu->pending_notify, user))
> + return -EFAULT;
> + return sizeof(cpu->pending_notify);
> + }
>   }
> 
>   /* Check for signals */
> diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
> --- a/drivers/lguest/lg.h
> +++ b/drivers/lguest/lg.h
> @@ -82,6 +82,11 @@ struct lg_cpu {
>   struct lg_cpu_arch arch;
>  };
> 
> +struct lg_eventfds {
> + unsigned long addr;
> + struct file *event;
> +};
> +
>  /* The private info the thread maintains about the guest. */
>  struct lguest
>  {
> @@ -102,6 +107,9 @@ struct lguest
>   unsigned int stack_pages;
>   u32 tsc_khz;
> 
> + unsigned int num_eventfds;
> + struct lg_eventfds *eventfds;
> +
>   /* Dead? */
>   const char *dead;
>  };
> @@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
>  void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
>   const unsigned long *def);
>  void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
> +bool send_notify_to_eventfd(struct lg_cpu *cpu);
>  void init_clockdev(struct lg_cpu *cpu);
>  bool check_syscall_vector(struct lguest *lg);
>  int init_interrupts(void);
> diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
> --- a/drivers/lguest/lguest_user.c
> +++ b/drivers/lguest/lguest_user.c
> @@ -7,6 +7,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include "lg.h"
> 
>  /*L:055 When something happens, the Waker process needs a way to stop the
> @@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
>   }
>  }
> 
> +bool send_notify_to_eventfd(struct lg_cpu *cpu)
> +{
> + unsigned int i;
> +
> + /* lg->eventfds is RCU-protected */
> + preempt_disable();

Suggest changing to rcu_read_lock() to match the synchronize_rcu().
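
With that change, the reader side of send_notify_to_eventfd() would look
roughly like this (a sketch of the suggestion, not code from the thread):

	rcu_read_lock();
	for (i = 0; i < cpu->lg->num_eventfds; i++) {
		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
			eventfd_signal(cpu->lg->eventfds[i].event, 1);
			cpu->pending_notify = 0;
			break;
		}
	}
	rcu_read_unlock();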

> + for (i = 0; i < cpu->lg->num_eventfds; i++) {
> + if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
> + eventfd_signal(cpu->lg->eventfds[i].event, 1);

Shouldn't this be something like the following?


Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Avi Kivity  [2009-06-05 08:21:43]:

> Balbir Singh wrote:
>>> But then there is no other way to make a *guarantee*, guarantees come
>>> at a cost of idling resources, no? Can you show me any other
>>> combination that will provide the guarantee and without idling the
>>> system for the specified guarantees?
>>> 
>>
>> OK, I see part of your concern, but I think we could do some
>> optimizations during design. For example if all groups have reached
>> their hard-limit and the system is idle, should we do start a new hard
>> limit interval and restart, so that idleness can be removed. Would
>> that be an acceptable design point?
>
> I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a  
> cpu hog running in each group, how would the algorithm divide resources?
>

As per the matrix calculation, but as soon as we reach an idle point,
we redistribute the b/w and start a new quantum so to speak, where all
groups are charged up to their hard limits.

For your question, if there is a CPU hog running, it would be as per
the matrix calculation, since the system has no idle point during the
bandwidth period.

-- 
Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Balbir Singh wrote:
>> But then there is no other way to make a *guarantee*, guarantees come
>> at a cost of idling resources, no? Can you show me any other
>> combination that will provide the guarantee and without idling the
>> system for the specified guarantees?
>
> OK, I see part of your concern, but I think we could do some
> optimizations during design. For example if all groups have reached
> their hard-limit and the system is idle, should we do start a new hard
> limit interval and restart, so that idleness can be removed. Would
> that be an acceptable design point?

I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a
cpu hog running in each group, how would the algorithm divide resources?


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Avi Kivity  [2009-06-05 08:16:21]:

> Balbir Singh wrote:
>>>> How, it works out fine in my calculation
>>>>
>>>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>>>> limited to 90%
>>>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>>>> limited to 90%
>>>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>>>> limited to 100%
>>>>
>>> It's fine in that it satisfies the guarantees, but it is deeply   
>>> suboptimal.  If I ran a cpu hog in the first group, while the other 
>>> two  were idle, it would be limited to 50% cpu.  On the other hand, 
>>> if it  consumed all 100% cpu it would still satisfy the guarantees 
>>> (as the  other groups are idle).
>>>
>>> The result is that in such a situation, wall clock time would double  
>>> even though cpu resources are available.
>>> 
>>
>> But then there is no other way to make a *guarantee*, guarantees come
>> at a cost of idling resources, no? Can you show me any other
>> combination that will provide the guarantee and without idling the
>> system for the specified guarantees?
>>   
>
> Suppose in my example cgroup 1 consumed 100% of the cpu resources and  
> cgroup 2 and 3 were completely idle.  All of the guarantees are met (if  
> cgroup 2 is idle, there's no need to give it the 10% cpu time it is  
> guaranteed).
>
> If  your only tool to achieve the guarantees is a limit system, then  
> yes, the equation yields the correct results.  But given that it yields  
> such inferior results, I think we need to look for a more involved 
> solution.
>
> I think the limits method fits cases where it is difficult to evict a  
> resource (say, disk quotas -- if you want to guarantee 10% of space to  
> cgroups 1, you must limit all others to 90%).  But for processor usage,  
> you can evict a cgroup instantly, so nothing prevents a cgroup from  
> consuming all available resources as long as others do not contend for 
> them.

Avi,

Could you look at my newer email and comment, where I've mentioned
that I see your concern and discussed a design point. We could
probably take this discussion forward from there?

-- 
Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Balbir Singh wrote:
>>> How, it works out fine in my calculation
>>>
>>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>>> limited to 90%
>>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>>> limited to 90%
>>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>>> limited to 100%
>>
>> It's fine in that it satisfies the guarantees, but it is deeply
>> suboptimal.  If I ran a cpu hog in the first group, while the other two
>> were idle, it would be limited to 50% cpu.  On the other hand, if it
>> consumed all 100% cpu it would still satisfy the guarantees (as the
>> other groups are idle).
>>
>> The result is that in such a situation, wall clock time would double
>> even though cpu resources are available.
>
> But then there is no other way to make a *guarantee*, guarantees come
> at a cost of idling resources, no? Can you show me any other
> combination that will provide the guarantee and without idling the
> system for the specified guarantees?

Suppose in my example cgroup 1 consumed 100% of the cpu resources and
cgroup 2 and 3 were completely idle.  All of the guarantees are met (if
cgroup 2 is idle, there's no need to give it the 10% cpu time it is
guaranteed).

If your only tool to achieve the guarantees is a limit system, then
yes, the equation yields the correct results.  But given that it yields
such inferior results, I think we need to look for a more involved solution.

I think the limits method fits cases where it is difficult to evict a
resource (say, disk quotas -- if you want to guarantee 10% of space to
cgroup 1, you must limit all others to 90%).  But for processor usage,
you can evict a cgroup instantly, so nothing prevents a cgroup from
consuming all available resources as long as others do not contend for them.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Chris Friesen  [2009-06-04 23:09:22]:

> Balbir Singh wrote:
> 
> > But then there is no other way to make a *guarantee*, guarantees come
> > at a cost of idling resources, no? Can you show me any other
> > combination that will provide the guarantee and without idling the
> > system for the specified guarantees?
> 
> The example given was two 10% guaranteed groups and one best-effort
> group.  Why would this require idling resources?
> 
> If I have a hog in each group, the requirements would be met if the
> groups got 33, 33, and 33.  (Or 10/10/80, for that matter.)  If the
> second and third groups go idle, why not let the first group use 100% of
> the cpu?
> 
> The only hard restriction is that the sum of the guarantees must be less
> than 100%.
>

Chris,

I just responded to a variation of this; I think that some of this
could be handled during design. I just sent out the email a few
minutes ago. Could you look at that and respond?

-- 
Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Balbir Singh  [2009-06-05 12:49:46]:

> * Avi Kivity  [2009-06-05 07:44:27]:
> 
> > Balbir Singh wrote:
> >> On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity  wrote:
> >>   
> >>> Bharata B Rao wrote:
> >>> 
> >>>>> Another way is to place the 8 groups in a container group, and limit
> >>>>> that to 80%. But that doesn't work if I want to provide guarantees to
> >>>>> several groups.
> >>>>
> >>>> Hmm why not ? Reduce the guarantee of the container group and provide
> >>>> the same to additional groups ?
> >>>>
> >>> This method produces suboptimal results:
> >>>
> >>> $ cgroup-limits 10 10 0
> >>> [50.0, 50.0, 40.0]
> >>>
> >>> I want to provide two 10% guaranteed groups and one best-effort group.
> >>>  Using the limits method, no group can now use more than 50% of the
> >>> resources.  However, having the first group use 90% of the resources does
> >>> not violate any guarantees, but it not allowed by the solution.
> >>>
> >>> 
> >>
> >> How, it works out fine in my calculation
> >>
> >> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
> >> limited to 90%
> >> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
> >> limited to 90%
> >> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
> >> limited to 100%
> >>   
> >
> > It's fine in that it satisfies the guarantees, but it is deeply  
> > suboptimal.  If I ran a cpu hog in the first group, while the other two  
> > were idle, it would be limited to 50% cpu.  On the other hand, if it  
> > consumed all 100% cpu it would still satisfy the guarantees (as the  
> > other groups are idle).
> >
> > The result is that in such a situation, wall clock time would double  
> > even though cpu resources are available.
> 
> But then there is no other way to make a *guarantee*, guarantees come
> at a cost of idling resources, no? Can you show me any other
> combination that will provide the guarantee and without idling the
> system for the specified guarantees?

OK, I see part of your concern, but I think we could do some
optimizations during design. For example if all groups have reached
their hard-limit and the system is idle, should we do start a new hard
limit interval and restart, so that idleness can be removed. Would
that be an acceptable design point?

-- 
Balbir
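
The design point above amounts to an early refresh of the bandwidth period,
triggered from the idle path only when every group is throttled.  A rough
standalone sketch of that idea (all type and function names here are
hypothetical, not existing scheduler code):

#include <stdbool.h>
#include <stdint.h>

struct cpu_group {
	uint64_t quota;		/* hard limit per bandwidth period */
	uint64_t used;		/* time consumed in the current period */
	bool throttled;		/* has the group hit its limit? */
};

/*
 * Hypothetical hook for the idle path: if every group is sitting
 * throttled, the idleness is artificial, so start a new period early
 * and recharge everyone instead of leaving the CPU idle.
 */
static void maybe_start_new_period(struct cpu_group *g, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (!g[i].throttled)
			return;		/* genuine idleness; keep the period */

	for (i = 0; i < n; i++) {
		g[i].used = 0;
		g[i].throttled = false;
	}
}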


Re: [RFC] CPU hard limits

2009-06-04 Thread Chris Friesen
Balbir Singh wrote:

> But then there is no other way to make a *guarantee*, guarantees come
> at a cost of idling resources, no? Can you show me any other
> combination that will provide the guarantee and without idling the
> system for the specified guarantees?

The example given was two 10% guaranteed groups and one best-effort
group.  Why would this require idling resources?

If I have a hog in each group, the requirements would be met if the
groups got 33, 33, and 33.  (Or 10/10/80, for that matter.)  If the
second and third groups go idle, why not let the first group use 100% of
the cpu?

The only hard restriction is that the sum of the guarantees must be less
than 100%.

Chris


Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-04 Thread Rusty Russell
On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
> Avi Kivity wrote:
> > Gregory Haskins wrote:
> > One idea is similar to signalfd() or eventfd()
>
> And thus the "kvm-eventfd" (irqfd/iosignalfd) interface project was born.
> ;)

The lguest patch queue already has such an interface :)  And I have a
partially complete in-kernel virtio_pci patch with the same trick.

I switched from "kernel created eventfd" to "userspace passes in eventfd"
after a while though; it lets you connect multiple virtqueues to a single fd
if you want.

Combined with a minor change to allow any process with access to the lguest fd
to queue interrupts, this allowed lguest to move to a thread-per-virtqueue
model which was a significant speedup as well as nice code reduction.

Here's the relevant kernel patch for reading.

Thanks!
Rusty.

lguest: use eventfds for device notification

Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
an address: the main Launcher process returns with this address, and figures
out what device to run.

A far nicer model is to let processes bind an eventfd to an address: if we
find one, we simply signal the eventfd.

Signed-off-by: Rusty Russell 
Cc: Davide Libenzi 
---
 drivers/lguest/Kconfig          |    2 -
 drivers/lguest/core.c           |    8 ++--
 drivers/lguest/lg.h             |    9
 drivers/lguest/lguest_user.c    |   73
 include/linux/lguest_launcher.h |    1
 5 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
--- a/drivers/lguest/Kconfig
+++ b/drivers/lguest/Kconfig
@@ -1,6 +1,6 @@
 config LGUEST
tristate "Linux hypervisor example code"
-   depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
+   depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
select HVC_DRIVER
---help---
  This is a very simple module which allows you to run
diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
/* It's possible the Guest did a NOTIFY hypercall to the
 * Launcher, in which case we return from the read() now. */
if (cpu->pending_notify) {
-   if (put_user(cpu->pending_notify, user))
-   return -EFAULT;
-   return sizeof(cpu->pending_notify);
+   if (!send_notify_to_eventfd(cpu)) {
+   if (put_user(cpu->pending_notify, user))
+   return -EFAULT;
+   return sizeof(cpu->pending_notify);
+   }
}
 
/* Check for signals */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -82,6 +82,11 @@ struct lg_cpu {
struct lg_cpu_arch arch;
 };
 
+struct lg_eventfds {
+   unsigned long addr;
+   struct file *event;
+};
+
 /* The private info the thread maintains about the guest. */
 struct lguest
 {
@@ -102,6 +107,9 @@ struct lguest
unsigned int stack_pages;
u32 tsc_khz;
 
+   unsigned int num_eventfds;
+   struct lg_eventfds *eventfds;
+
/* Dead? */
const char *dead;
 };
@@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
 void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
const unsigned long *def);
 void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
+bool send_notify_to_eventfd(struct lg_cpu *cpu);
 void init_clockdev(struct lg_cpu *cpu);
 bool check_syscall_vector(struct lguest *lg);
 int init_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -7,6 +7,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "lg.h"
 
 /*L:055 When something happens, the Waker process needs a way to stop the
@@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
}
 }
 
+bool send_notify_to_eventfd(struct lg_cpu *cpu)
+{
+   unsigned int i;
+
+   /* lg->eventfds is RCU-protected */
+   preempt_disable();
+   for (i = 0; i < cpu->lg->num_eventfds; i++) {
+   if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
+   eventfd_signal(cpu->lg->eventfds[i].event, 1);
+   cpu->pending_notify = 0;
+   break;
+   }
+   }
+   preempt_enable();
+   return cpu->pending_notify == 0;
+}
+
+static int add_eventfd(struct lguest *lg, unsigned long addr, int fd)
+{
+   struct lg_eventfds *new, *old;
+
+   if (!addr)
+   return -EINVAL;
+
+   /* Replace the old array with the new one, carefully: others can
+* be accessing it at the sa
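
For orientation, a Launcher could bind an eventfd to a notify address roughly
as below.  This is a sketch only: the LHREQ_EVENTFD name comes from the
one-line lguest_launcher.h change in the diffstat above, but its value and
the exact write format are assumptions modeled on the existing LHREQ_*
launcher calls (the hunk defining it is cut off here):

#include <sys/eventfd.h>
#include <unistd.h>

#define LHREQ_EVENTFD	4	/* hypothetical value; the real one is in the
				 * lguest_launcher.h hunk */

/* Ask lguest to signal an eventfd whenever the Guest notifies @addr. */
static int bind_eventfd(int lguest_fd, unsigned long addr)
{
	int efd = eventfd(0, 0);
	unsigned long args[] = { LHREQ_EVENTFD, addr, (unsigned long)efd };

	if (efd < 0 || write(lguest_fd, args, sizeof(args)) < 0)
		return -1;
	return efd;	/* a per-virtqueue thread can now block reading this */
}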

Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Avi Kivity  [2009-06-05 07:44:27]:

> Balbir Singh wrote:
>> On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity  wrote:
>>   
>>> Bharata B Rao wrote:
>>>>> Another way is to place the 8 groups in a container group, and limit
>>>>> that to 80%. But that doesn't work if I want to provide guarantees to
>>>>> several groups.
>>>>
>>>> Hmm why not ? Reduce the guarantee of the container group and provide
>>>> the same to additional groups ?
>>>>
>>> This method produces suboptimal results:
>>>
>>> $ cgroup-limits 10 10 0
>>> [50.0, 50.0, 40.0]
>>>
>>> I want to provide two 10% guaranteed groups and one best-effort group.
>>>  Using the limits method, no group can now use more than 50% of the
>>> resources.  However, having the first group use 90% of the resources does
>>> not violate any guarantees, but it not allowed by the solution.
>>>
>>> 
>>
>> How, it works out fine in my calculation
>>
>> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
>> limited to 90%
>> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
>> limited to 90%
>> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
>> limited to 100%
>>   
>
> It's fine in that it satisfies the guarantees, but it is deeply  
> suboptimal.  If I ran a cpu hog in the first group, while the other two  
> were idle, it would be limited to 50% cpu.  On the other hand, if it  
> consumed all 100% cpu it would still satisfy the guarantees (as the  
> other groups are idle).
>
> The result is that in such a situation, wall clock time would double  
> even though cpu resources are available.

But then there is no other way to make a *guarantee*, guarantees come
at a cost of idling resources, no? Can you show me any other
combination that will provide the guarantee and without idling the
system for the specified guarantees?


>> Now if we really have zeros, I would recommend using
>>
>> cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.
>>
>> Adding zeros to the calculation is not recommended. Does that help?
>
> What do you mean, it is not recommended? I have two groups which need at  
> least 10% and one which does not need any guarantee, how do I express it?
>
Ignore this part of my comment

> In any case, changing the zero to 1% does not materially change the results.

True.

-- 
Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Balbir Singh wrote:
> On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity  wrote:
>> Bharata B Rao wrote:
>>>> Another way is to place the 8 groups in a container group, and limit
>>>> that to 80%. But that doesn't work if I want to provide guarantees to
>>>> several groups.
>>>
>>> Hmm why not ? Reduce the guarantee of the container group and provide
>>> the same to additional groups ?
>>
>> This method produces suboptimal results:
>>
>> $ cgroup-limits 10 10 0
>> [50.0, 50.0, 40.0]
>>
>> I want to provide two 10% guaranteed groups and one best-effort group.
>> Using the limits method, no group can now use more than 50% of the
>> resources.  However, having the first group use 90% of the resources does
>> not violate any guarantees, but it is not allowed by the solution.
>
> How, it works out fine in my calculation
>
> 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
> limited to 90%
> 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
> limited to 90%
> 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
> limited to 100%
>
> Now if we really have zeros, I would recommend using
>
> cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.
>
> Adding zeros to the calculation is not recommended. Does that help?

What do you mean, it is not recommended? I have two groups which need at
least 10% and one which does not need any guarantee, how do I express it?

In any case, changing the zero to 1% does not materially change the results.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity  wrote:
> Bharata B Rao wrote:
>>>
>>> Another way is to place the 8 groups in a container group, and limit
>>>  that to 80%. But that doesn't work if I want to provide guarantees to
>>>  several groups.
>>>
>>
>> Hmm why not ? Reduce the guarantee of the container group and provide
>> the same to additional groups ?
>>
>
> This method produces suboptimal results:
>
> $ cgroup-limits 10 10 0
> [50.0, 50.0, 40.0]
>
> I want to provide two 10% guaranteed groups and one best-effort group.
>  Using the limits method, no group can now use more than 50% of the
> resources.  However, having the first group use 90% of the resources does
> not violate any guarantees, but it is not allowed by the solution.
>

How, it works out fine in my calculation

50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
limited to 90%
50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
limited to 90%
50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
limited to 100%

Now if we really have zeros, I would recommend using

cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.

Adding zeros to the calculation is not recommended. Does that help?

Balbir


Re: KVM on Debian

2009-06-04 Thread Mark van Walraven
Hi,

An update in the hope that this is useful to someone :-)

On Fri, Jun 05, 2009 at 09:03:03AM +1200, Mark van Walraven wrote:
> My next step is to try qemu-kvm, built from source.  The Debianised libvirt
> expects the kvm binaries to be in /usr/bin/kvm, so you can symlink them
> from /usr/local/bin if you prefer to install there.  I've also experimented
> with shell script wrapper in /usr/bin/kvm that condenses the output of
> qemu-kvm --help so that libvirtd for Lenny works.

Actually, the current Debian Lenny libvirt* (0.4.6-10) seem to work
fine with qemu-kvm-0.10.5 built from source.  All I needed to do was
symlink /usr/local/bin/qemu-system-x86_64 to /usr/bin/kvm and copy
extboot.bin into /usr/local/share/qemu/ (I used the one from the
kvm 85+dfsg-3 package in Experimental).

So far, so good.

Mark.


Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Bharata B Rao wrote:
>> Another way is to place the 8 groups in a container group, and limit
>> that to 80%. But that doesn't work if I want to provide guarantees to
>> several groups.
>
> Hmm why not ? Reduce the guarantee of the container group and provide
> the same to additional groups ?

This method produces suboptimal results:

$ cgroup-limits 10 10 0
[50.0, 50.0, 40.0]

I want to provide two 10% guaranteed groups and one best-effort group.
Using the limits method, no group can now use more than 50% of the
resources.  However, having the first group use 90% of the resources
does not violate any guarantees, but it is not allowed by the solution.


#!/usr/bin/python

def calculate_limits(g, R):
    N = len(g)
    if N == 1:
        return [R]

    s = sum([R - gi for gi in g])
    return [(s - (R - gi) - (N - 2) * (R - gi)) / (N - 1)
            for gi in g]

import sys
print calculate_limits([float(x) for x in sys.argv[1:]], 100)
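
The script's expression simplifies to a closed form.  With R = 100 and
guarantees G = (10, 10, 0), as in the example above:

    L_i = \frac{s - (N-1)(R - G_i)}{N-1} = \frac{s}{N-1} - (R - G_i),
    \qquad s = \sum_{j=1}^{N} (R - G_j)

    s = 90 + 90 + 100 = 280: \quad L_1 = L_2 = \tfrac{280}{2} - 90 = 50,
    \quad L_3 = \tfrac{280}{2} - 100 = 40

So each group's limit is whatever remains after reserving every other
group's guarantee, averaged over the other N - 1 groups -- which
reproduces the [50.0, 50.0, 40.0] output.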

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Avi Kivity  [2009-06-04 15:19:22]:

> Bharata B Rao wrote:
> > 2. Need for hard limiting CPU resource
> > --
> > - Pay-per-use: In enterprise systems that cater to multiple 
> > clients/customers
> >   where a customer demands a certain share of CPU resources and pays only
> >   that, CPU hard limits will be useful to hard limit the customer's job
> >   to consume only the specified amount of CPU resource.
> > - In container based virtualization environments running multiple 
> > containers,
> >   hard limits will be useful to ensure a container doesn't exceed its
> >   CPU entitlement.
> > - Hard limits can be used to provide guarantees.
> >   
> How can hard limits provide guarantees?
> 
> Let's take an example where I have 1 group that I wish to guarantee a 
> 20% share of the cpu, and another 8 groups with no limits or guarantees.
> 
> One way to achieve the guarantee is to hard limit each of the 8 other 
> groups to 10%; the sum total of the limits is 80%, leaving 20% for the 
> guarantee group. The downside is the arbitrary limit imposed on the 
> other groups.
> 
> Another way is to place the 8 groups in a container group, and limit 
> that to 80%. But that doesn't work if I want to provide guarantees to 
> several groups.
>

Hi, Avi,

Take a look at
http://wiki.openvz.org/Containers/Guarantees_for_resources
and the associated program in the wiki page.

-- 
Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Bharata B Rao
On Thu, Jun 04, 2009 at 03:19:22PM +0300, Avi Kivity wrote:
> Bharata B Rao wrote:
>> 2. Need for hard limiting CPU resource
>> --
>> - Pay-per-use: In enterprise systems that cater to multiple clients/customers
>>   where a customer demands a certain share of CPU resources and pays only
>>   that, CPU hard limits will be useful to hard limit the customer's job
>>   to consume only the specified amount of CPU resource.
>> - In container based virtualization environments running multiple containers,
>>   hard limits will be useful to ensure a container doesn't exceed its
>>   CPU entitlement.
>> - Hard limits can be used to provide guarantees.
>>   
> How can hard limits provide guarantees?
>
> Let's take an example where I have 1 group that I wish to guarantee a  
>   20% share of the cpu, and another 8 groups with no limits or guarantees.
>
> One way to achieve the guarantee is to hard limit each of the 8 other  
> groups to 10%; the sum total of the limits is 80%, leaving 20% for the  
> guarantee group. The downside is the arbitrary limit imposed on the  
> other groups.

This method sounds very similar to the openvz method:
http://wiki.openvz.org/Containers/Guarantees_for_resources

>
> Another way is to place the 8 groups in a container group, and limit  
> that to 80%. But that doesn't work if I want to provide guarantees to  
> several groups.

Hmm why not ? Reduce the guarantee of the container group and provide
the same to additional groups ?

Regards,
Bharata.


RE: [PATCH] qemu-kvm: Flush icache after dma operations for ia64

2009-06-04 Thread Zhang, Xiantao
Jes Sorensen wrote:
> Zhang, Xiantao wrote:
>> Hi, Jes
>> Have you verified whether it works for you ?  You may run kernel
>> build in the guest with 4 vcpus,  if it can be done successfully
>> without any error, it should be Okay I think, otherwise, we may need
>> to investigate it further. :) Xiantao  
> 
> Hi Xiantao,
> 
> I was able to run a 16 vCPU guest and build the kernel using make -j
> 16. How quickly would the problem show up for you, on every run, or
> should I run more tests?

Hi Jes,
 Good news! On my machine, without the patch, an SMP guest can't build a whole
kernel at all. So if you can build it without errors and use it to boot up the
guest, I think it should work well.
Xiantao





[ kvm-Bugs-1971512 ] failure to migrate guests with more than 4GB of RAM

2009-06-04 Thread SourceForge.net
Bugs item #1971512, was opened at 2008-05-24 17:45
Message generated for change (Comment added) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1971512&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 3
Private: No
Submitted By: Marcelo Tosatti (mtosatti)
Assigned to: Anthony Liguori (aliguori)
Summary: failure to migrate guests with more than 4GB of RAM

Initial Comment:

The migration code assumes linear "phys_ram_base":

[r...@localhost kvm-userspace.tip]# qemu/x86_64-softmmu/qemu-system-x86_64 -hda 
/root/images/marcelo5-io-test.img -m 4097 -net nic,model=rtl8139 -net 
tap,script=/root/iptables/ifup -incoming tcp://0:/
audit_log_user_command(): Connection refused
audit_log_user_command(): Connection refused
migration: memory size mismatch: recv 22032384 mine 4316999680
migrate_incoming_fd failed (rc=232)


--

>Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 20:00

Message:
This has been fixed by Glauber.

--

Comment By: Jiajun Xu (jiajun)
Date: 2008-12-15 22:37

Message:
We did not run anyworkload, we do migration just after guest boots up and
becomes idle.

--

Comment By: Avi Kivity (avik)
Date: 2008-12-14 11:45

Message:
What workload is the guest running during the migration?

--

Comment By: Jiajun Xu (jiajun)
Date: 2008-12-09 23:09

Message:
Open the bug again since Live Migration 4G guest still fail on my machine.
Guest will call trace after Live Migration.

--

Comment By: SourceForge Robot (sf-robot)
Date: 2008-12-07 22:22

Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

--

Comment By: Jiajun Xu (jiajun)
Date: 2008-11-25 01:52

Message:
I tried latest commit, userspace.git
6e63ba19476753595e508713eb9daf559dc50bf6 with a 64-bit RHEL5.1 Guest. My
host kernel is 2.6.26.2. And My host has 8GB memory and 4GB swap.
Guest can be live migrated, but after that, guest will call trace.

Maybe we can have a check with each other's environment.

My steps as following:
1. qemu-system-x86_64 -incoming tcp:localhost: -m 4096  -net
nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net
tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
2. qemu-system-x86_64  -m 4096 -net
nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net
tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
3. In qemu console, type "migrate tcp:localhost:"

The call trace messages in guest:
###
Kernel BUG at block/elevator.c:560
invalid opcode:  [1] SMP 
last sysfs file: /block/hda/removable
CPU 0 
Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc
iscsi_tcp
ib_iser libiscsi scsi_transport_iscsi rdma_ucm ib_ucm ib_srp ib_sdp
rdma_cm
ib_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_sa ib_uverbs ib_umad ib_mad
ib_core
dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button
battery asus_acpi acpi_memhotplug ac lp floppy pcspkr serio_raw 8139cp
8139too
parport_pc parport mii ide_cd cdrom ata_piix libata sd_mod scsi_mod ext3
jbd
ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-53.el5 #1
RIP: 0010:[]  []
elv_dequeue_request+0x8/0x3c
RSP: 0018:8040ddc0  EFLAGS: 00010046
RAX: 0001 RBX: 81011381b398 RCX: 
RDX: 81011381b398 RSI: 81011381b398 RDI: 81011fb912c0
RBP: 804abe18 R08: 80304108 R09: 0012
R10: 0022 R11:  R12: 
R13: 0001 R14: 0086 R15: 8040deb8
FS:  () GS:80396000()
knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ad6f4d0 CR3: 0001126cc000 CR4: 06e0
Process swapper (pid: 0, threadinfo 803c6000, task
802dcae0)
Stack:  8000ae3c 804abe18 804abe50

 804abd00 0246 8003ba73 8003ba0c
 804abe18 81011fbe5800 8000d2a5 81011fb8c5c0
Call Trace:
   [] ide_end_request+0xc6/0xfc
 [] ide_dma_intr+0x67/0xab
 [] ide_dma_intr+0x0/0xab
 [] ide_intr+0x16f/0x1df
 [] handle_IRQ_event+0x29/0x58
 [] __do_IRQ+0xa4/0x105
 [] do_IRQ+0xe7/0xf5
 [] ret_from_intr+0x0/0xa
 [] __do_softir

[ kvm-Bugs-2624842 ] kernel BUG at /kvm-84/kernel/x86/kvm_main.c:2148!

2009-06-04 Thread SourceForge.net
Bugs item #2624842, was opened at 2009-02-21 14:27
Message generated for change (Comment added) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2624842&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: kernel
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: jb17bsome (jb17bsome)
Assigned to: Nobody/Anonymous (nobody)
Summary: kernel BUG at /kvm-84/kernel/x86/kvm_main.c:2148!

Initial Comment:
cpu: AMD Phenom 9750 (4)
host distro: fedora 10 x86_64
host kernel: linus-2.6 git (v2.6.29-rc5-276-g2ec77fc)
guest: any.  I have tried fedora 10, windows nt 4, and windows 2008 images.
kvm version: 84

usage:
qemu-system-x86_64 -m 512

-no-kvm-pit and -no-kvm-irqchip cause the same bug.
-no-kvm runs fine.


--

>Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 19:57

Message:
jb17bsome,

Try unloading the virtualbox driver.

--

Comment By: jb17bsome (jb17bsome)
Date: 2009-05-06 21:13

Message:
I upgraded my system to the latest F11 dev as of May 6 2009, but I get the
same result...
So the same kernel BUG bug at kvm_handle_fault_on_reboot. 
I attached another kbug dump (with a little context from dmesg).
Is there anything else I can try? 

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2624842&group_id=180599


[ kvm-Bugs-2287677 ] kvm79 compiling errors (with-patched-kernel)

2009-06-04 Thread SourceForge.net
Bugs item #2287677, was opened at 2008-11-14 21:39
Message generated for change (Settings changed) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Darkman (darkman82)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm79 compiling errors (with-patched-kernel)

Initial Comment:


config.mak :

ARCH=i386
PROCESSOR=i386
PREFIX=/usr
KERNELDIR=/usr/src/linux-2.6.27.6/
KERNELSOURCEDIR=
LIBKVM_KERNELDIR=/root/kvm-79/kernel
WANT_MODULE=
CROSS_COMPILE=
CC=gcc
LD=ld
OBJCOPY=objcopy
AR=ar


ERRORS:
/root/kvm-79/qemu/qemu-kvm.c: In function 'ap_main_loop':
/root/kvm-79/qemu/qemu-kvm.c:459: error: 'kvm_arch_do_ioperm' undeclared (first 
use in this function)
/root/kvm-79/qemu/qemu-kvm.c:459: error: (Each undeclared identifier is 
reported only once
/root/kvm-79/qemu/qemu-kvm.c:459: error: for each function it appears in.)
/root/kvm-79/qemu/qemu-kvm.c: In function 'sigfd_handler':
/root/kvm-79/qemu/qemu-kvm.c:544: warning: format '%ld' expects type 'long 
int', but argument 2 has type 'ssize_t'
make[2]: *** [qemu-kvm.o] Error 1
make[2]: Leaving directory `/root/kvm-79/qemu/x86_64-softmmu'
make[1]: *** [subdir-x86_64-softmmu] Error 2
make[1]: Leaving directory `/root/kvm-79/qemu'
make: *** [qemu] Error 2

Same problem with 2.6.27.2 source.

kvm78 works fine.

--

>Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 19:51

Message:
Please try with kvm-85, and reopen in case its still problematic.

--

Comment By: Darkman (darkman82)
Date: 2008-12-05 19:33

Message:
It seems due to undefined USE_KVM_DEVICE_ASSIGNMENT.

In qemu-kvm.h:

qemu-kvm.h:95 #ifdef USE_KVM_DEVICE_ASSIGNMENT
qemu-kvm.h:96 void kvm_ioperm(CPUState *env, void *data);
qemu-kvm.h:97 void kvm_arch_do_ioperm(void *_data);
qemu-kvm.h:98 #endif

but in qemu-kvm.c we have

qemu-kvm.c:457/* do ioperm for io ports of assigned devices */
qemu-kvm.c:458LIST_FOREACH(data, &ioperm_head, entries)
qemu-kvm.c:459on_vcpu(env, kvm_arch_do_ioperm, data);

without #ifdef block.
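
The fix that follows from this is to mirror the guard at the call site (a
sketch of the obvious change, not the actual commit):

#ifdef USE_KVM_DEVICE_ASSIGNMENT
	/* do ioperm for io ports of assigned devices */
	LIST_FOREACH(data, &ioperm_head, entries)
		on_vcpu(env, kvm_arch_do_ioperm, data);
#endif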

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599


[ kvm-Bugs-2287677 ] kvm79 compiling errors (with-patched-kernel)

2009-06-04 Thread SourceForge.net
Bugs item #2287677, was opened at 2008-11-14 21:39
Message generated for change (Comment added) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Darkman (darkman82)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm79 compiling errors (with-patched-kernel)

Initial Comment:


config.mak :

ARCH=i386
PROCESSOR=i386
PREFIX=/usr
KERNELDIR=/usr/src/linux-2.6.27.6/
KERNELSOURCEDIR=
LIBKVM_KERNELDIR=/root/kvm-79/kernel
WANT_MODULE=
CROSS_COMPILE=
CC=gcc
LD=ld
OBJCOPY=objcopy
AR=ar


ERRORS:
/root/kvm-79/qemu/qemu-kvm.c: In function 'ap_main_loop':
/root/kvm-79/qemu/qemu-kvm.c:459: error: 'kvm_arch_do_ioperm' undeclared (first 
use in this function)
/root/kvm-79/qemu/qemu-kvm.c:459: error: (Each undeclared identifier is 
reported only once
/root/kvm-79/qemu/qemu-kvm.c:459: error: for each function it appears in.)
/root/kvm-79/qemu/qemu-kvm.c: In function 'sigfd_handler':
/root/kvm-79/qemu/qemu-kvm.c:544: warning: format '%ld' expects type 'long 
int', but argument 2 has type 'ssize_t'
make[2]: *** [qemu-kvm.o] Error 1
make[2]: Leaving directory `/root/kvm-79/qemu/x86_64-softmmu'
make[1]: *** [subdir-x86_64-softmmu] Error 2
make[1]: Leaving directory `/root/kvm-79/qemu'
make: *** [qemu] Error 2

Same problem with 2.6.27.2 source.

kvm78 works fine.

--

>Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 19:51

Message:
Please try with kvm-85, and reopen in case its still problematic.

--

Comment By: Darkman (darkman82)
Date: 2008-12-05 19:33

Message:
It seems due to undefined USE_KVM_DEVICE_ASSIGNMENT.

In qemu-kvm.h:

qemu-kvm.h:95 #ifdef USE_KVM_DEVICE_ASSIGNMENT
qemu-kvm.h:96 void kvm_ioperm(CPUState *env, void *data);
qemu-kvm.h:97 void kvm_arch_do_ioperm(void *_data);
qemu-kvm.h:98 #endif

but in qemu-kvm.c we have

qemu-kvm.c:457/* do ioperm for io ports of assigned devices */
qemu-kvm.c:458LIST_FOREACH(data, &ioperm_head, entries)
qemu-kvm.c:459on_vcpu(env, kvm_arch_do_ioperm, data);

without #ifdef block.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599


[ kvm-Bugs-2782199 ] linux_s3 ceased function

2009-06-04 Thread SourceForge.net
Bugs item #2782199, was opened at 2009-04-27 11:04
Message generated for change (Settings changed) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: linux_s3 ceased function

Initial Comment:
Test linux_s3, that worked fine with KVM-85rc3 ceased to function in 
KVM-85rc5/rc6/final release.

S3 is a power sleep test. (suspend)

Now it only works with 2 guests: RHEL 5, Fedora 8. (32 and 64-bit)

Previously, with KVM-85rc3, linux_s3 test worked with the following guests: 
RHEL 4, RHEL 5, Fedora 8, Fedora 9, openSUSE 11.0, openSUSE 11.1, and Ubuntu 
8.10.

I see this as a regression.

-Alexey, 27.4.2009.

--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 19:47

Message:
virtio_balloon thing

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599


[ kvm-Bugs-2801212 ] sles10sp2 guest timer run too fast

2009-06-04 Thread SourceForge.net
Bugs item #2801212, was opened at 2009-06-04 11:17
Message generated for change (Settings changed) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
>Assigned to: Marcelo Tosatti (mtosatti)
Summary: sles10sp2 guest timer run too fast 

Initial Comment:
With kvm.git commit 7ff90748cebbfbafc8cfa6bdd633113cd9537789 and
qemu-kvm commit a1cd3c985c848dae73966f9601f15fbcade72f1, we found that the
sles10sp2 guest clock runs much faster than real time, gaining about 27s
for every 60s of real time.

Steps to reproduce:

(1) qemu-system-x86_64 -m 1024 -net nic,macaddr=00:16:3e:6f:f3:d1,model=rtl8139 
-net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/sles10sp2.img
(2) Run ntpdate in the guest: ntpdate sync_machine_ip && sleep 60 && ntpdate 
sync_machine_ip

Current result:

sles10sp2rc1-guest:~ #  ntpdate sync_machine_ip && sleep 60 && ntpdate 
sync_machine_ip
31 May 23:16:59 ntpdate[3303]: step time server 192.168.198.248 offset
-61.27418
31 May 23:17:32 ntpdate[3305]: step time server 192.168.198.248 offset
-27.626469 sec

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599


[ kvm-Bugs-2801459 ] i8042.c: No controller found...

2009-06-04 Thread SourceForge.net
Bugs item #2801459, was opened at 2009-06-04 19:39
Message generated for change (Tracker Item Submitted) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801459&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Marcelo Tosatti (mtosatti)
Assigned to: Marcelo Tosatti (mtosatti)
Summary: i8042.c: No controller found...

Initial Comment:
http://marc.info/?l=qemu-devel&m=124329227728366&w=2

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801459&group_id=180599


[ kvm-Bugs-2801458 ] BUG at mmu.c:615 from localhost migration using ept+hugetlbf

2009-06-04 Thread SourceForge.net
Bugs item #2801458, was opened at 2009-06-04 19:36
Message generated for change (Tracker Item Submitted) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801458&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: kernel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Marcelo Tosatti (mtosatti)
Assigned to: Marcelo Tosatti (mtosatti)
Summary: BUG at mmu.c:615 from localhost migration using ept+hugetlbf

Initial Comment:
http://www.mail-archive.com/kvm@vger.kernel.org/msg16136.html

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801458&group_id=180599


[PATCH KVM VMX 0/2] Enable Unrestricted Guest

2009-06-04 Thread Nitin A Kamble
Hi Avi,
  I have modified the earlier patch as per your comments. I have prepared a 
separate patch for renaming rmode.active to rmode.vm86_active, and the 2nd patch 
enables the Unrestricted Guest feature in KVM.
   This patch will also work with an unfixed (cpu reset state) qemu. 
Please apply.

Thanks & Regards,
Nitin

Nitin A Kamble (2):
  KVM: VMX: Rename rmode.active to rmode.vm86_active
  KVM: VMX: Support Unrestricted Guest feature

 arch/x86/include/asm/kvm_host.h |   14 ---
 arch/x86/include/asm/vmx.h  |1 +
 arch/x86/kvm/vmx.c  |   77 +--
 3 files changed, 66 insertions(+), 26 deletions(-)



[PATCH KVM VMX 2/2] KVM: VMX: Support Unrestricted Guest feature

2009-06-04 Thread Nitin A Kamble
"Unrestricted Guest" feature is added in the VMX specification.
Intel Westmere and onwards processors will support this feature.

It allows kvm guests to run real mode and unpaged mode
code natively in the VMX mode when EPT is turned on. With the
unrestricted guest there is no need to emulate the guest real mode code
in the vm86 container or in the emulator. Also the guest big real mode
code works like native.

  The attached patch enhances KVM to use the unrestricted guest feature
if available on the processor. It also adds a new kernel/module
parameter to disable the unrestricted guest feature at the boot time.
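
For example, assuming a modular kvm-intel build with this patch applied,
the feature could be disabled at module load time with:

    modprobe kvm-intel unrestricted_guest=0

and the current setting read back from
/sys/module/kvm_intel/parameters/unrestricted_guest.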

Signed-off-by: Nitin A Kamble 
---
 arch/x86/include/asm/kvm_host.h |   12 +
 arch/x86/include/asm/vmx.h  |1 +
 arch/x86/kvm/vmx.c  |   49 ++
 3 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1cc901e..a1a96a5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -37,12 +37,14 @@
 #define CR3_L_MODE_RESERVED_BITS (CR3_NONPAE_RESERVED_BITS |   \
  0xFF00ULL)
 
-#define KVM_GUEST_CR0_MASK\
-   (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE \
-| X86_CR0_NW | X86_CR0_CD)
+#define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST  \
+   (X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
+#define KVM_GUEST_CR0_MASK \
+   (KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST | X86_CR0_PG | X86_CR0_PE)
+#define KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST
\
+   (X86_CR0_WP | X86_CR0_NE | X86_CR0_TS | X86_CR0_MP)
 #define KVM_VM_CR0_ALWAYS_ON   \
-   (X86_CR0_PG | X86_CR0_PE | X86_CR0_WP | X86_CR0_NE | X86_CR0_TS \
-| X86_CR0_MP)
+   (KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST | X86_CR0_PG | X86_CR0_PE)
 #define KVM_GUEST_CR4_MASK \
(X86_CR4_VME | X86_CR4_PSE | X86_CR4_PAE | X86_CR4_PGE | X86_CR4_VMXE)
 #define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 498f944..c73da02 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -55,6 +55,7 @@
 #define SECONDARY_EXEC_ENABLE_EPT   0x0002
 #define SECONDARY_EXEC_ENABLE_VPID  0x0020
 #define SECONDARY_EXEC_WBINVD_EXITING  0x0040
+#define SECONDARY_EXEC_UNRESTRICTED_GUEST  0x0080
 
 
 #define PIN_BASED_EXT_INTR_MASK 0x0001
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d1ec8a9..b3d8a3a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -50,6 +50,10 @@ module_param_named(flexpriority, flexpriority_enabled, bool, 
S_IRUGO);
 static int __read_mostly enable_ept = 1;
 module_param_named(ept, enable_ept, bool, S_IRUGO);
 
+static int __read_mostly enable_unrestricted_guest = 1;
+module_param_named(unrestricted_guest,
+   enable_unrestricted_guest, bool, S_IRUGO);
+
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
@@ -270,6 +274,12 @@ static inline int cpu_has_vmx_ept(void)
SECONDARY_EXEC_ENABLE_EPT;
 }
 
+static inline int cpu_has_vmx_unrestricted_guest(void)
+{
+   return vmcs_config.cpu_based_2nd_exec_ctrl &
+   SECONDARY_EXEC_UNRESTRICTED_GUEST;
+}
+
 static inline int vm_need_virtualize_apic_accesses(struct kvm *kvm)
 {
return flexpriority_enabled &&
@@ -1201,7 +1211,8 @@ static __init int setup_vmcs_config(struct vmcs_config 
*vmcs_conf)
opt2 = SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
SECONDARY_EXEC_WBINVD_EXITING |
SECONDARY_EXEC_ENABLE_VPID |
-   SECONDARY_EXEC_ENABLE_EPT;
+   SECONDARY_EXEC_ENABLE_EPT |
+   SECONDARY_EXEC_UNRESTRICTED_GUEST;
if (adjust_vmx_controls(min2, opt2,
MSR_IA32_VMX_PROCBASED_CTLS2,
&_cpu_based_2nd_exec_control) < 0)
@@ -1331,8 +1342,13 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_vpid())
enable_vpid = 0;
 
-   if (!cpu_has_vmx_ept())
+   if (!cpu_has_vmx_ept()) {
enable_ept = 0;
+   enable_unrestricted_guest = 0;
+   }
+
+   if (!cpu_has_vmx_unrestricted_guest())
+   enable_unrestricted_guest = 0;
 
if (!cpu_has_vmx_flexpriority())
flexpriority_enabled = 0;
@@ -1431,6 +1447,9 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
unsigned long flags;
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
+   if (enable_unr

[PATCH KVM VMX 1/2] KVM: VMX: Rename rmode.active to rmode.vm86_active

2009-06-04 Thread Nitin A Kamble
That way the interpretation of rmode.active becomes clearer with the
unrestricted guest code.

Signed-off-by: Nitin A Kamble 
---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/vmx.c  |   28 ++--
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1951d39..1cc901e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -339,7 +339,7 @@ struct kvm_vcpu_arch {
} interrupt;
 
struct {
-   int active;
+   int vm86_active;
u8 save_iopl;
struct kvm_save_segment {
u16 selector;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fd05fd2..d1ec8a9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -497,7 +497,7 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
eb |= 1u << BP_VECTOR;
}
-   if (vcpu->arch.rmode.active)
+   if (vcpu->arch.rmode.vm86_active)
eb = ~0;
if (enable_ept)
eb &= ~(1u << PF_VECTOR); /* bypass_guest_pf = 0 */
@@ -733,7 +733,7 @@ static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
 
 static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
-   if (vcpu->arch.rmode.active)
+   if (vcpu->arch.rmode.vm86_active)
rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
vmcs_writel(GUEST_RFLAGS, rflags);
 }
@@ -790,7 +790,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, 
unsigned nr,
intr_info |= INTR_INFO_DELIVER_CODE_MASK;
}
 
-   if (vcpu->arch.rmode.active) {
+   if (vcpu->arch.rmode.vm86_active) {
vmx->rmode.irq.pending = true;
vmx->rmode.irq.vector = nr;
vmx->rmode.irq.rip = kvm_rip_read(vcpu);
@@ -1370,7 +1370,7 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
vmx->emulation_required = 1;
-   vcpu->arch.rmode.active = 0;
+   vcpu->arch.rmode.vm86_active = 0;
 
vmcs_writel(GUEST_TR_BASE, vcpu->arch.rmode.tr.base);
vmcs_write32(GUEST_TR_LIMIT, vcpu->arch.rmode.tr.limit);
@@ -1432,7 +1432,7 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
vmx->emulation_required = 1;
-   vcpu->arch.rmode.active = 1;
+   vcpu->arch.rmode.vm86_active = 1;
 
vcpu->arch.rmode.tr.base = vmcs_readl(GUEST_TR_BASE);
vmcs_writel(GUEST_TR_BASE, rmode_tss_base(vcpu->kvm));
@@ -1616,10 +1616,10 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned 
long cr0)
 
vmx_fpu_deactivate(vcpu);
 
-   if (vcpu->arch.rmode.active && (cr0 & X86_CR0_PE))
+   if (vcpu->arch.rmode.vm86_active && (cr0 & X86_CR0_PE))
enter_pmode(vcpu);
 
-   if (!vcpu->arch.rmode.active && !(cr0 & X86_CR0_PE))
+   if (!vcpu->arch.rmode.vm86_active && !(cr0 & X86_CR0_PE))
enter_rmode(vcpu);
 
 #ifdef CONFIG_X86_64
@@ -1675,7 +1675,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned 
long cr3)
 
 static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-   unsigned long hw_cr4 = cr4 | (vcpu->arch.rmode.active ?
+   unsigned long hw_cr4 = cr4 | (vcpu->arch.rmode.vm86_active ?
KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);
 
vcpu->arch.cr4 = cr4;
@@ -1758,7 +1758,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
u32 ar;
 
-   if (vcpu->arch.rmode.active && seg == VCPU_SREG_TR) {
+   if (vcpu->arch.rmode.vm86_active && seg == VCPU_SREG_TR) {
vcpu->arch.rmode.tr.selector = var->selector;
vcpu->arch.rmode.tr.base = var->base;
vcpu->arch.rmode.tr.limit = var->limit;
@@ -1768,7 +1768,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
vmcs_writel(sf->base, var->base);
vmcs_write32(sf->limit, var->limit);
vmcs_write16(sf->selector, var->selector);
-   if (vcpu->arch.rmode.active && var->s) {
+   if (vcpu->arch.rmode.vm86_active && var->s) {
/*
 * Hack real-mode segments into vm86 compatibility.
 */
@@ -2337,7 +2337,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
goto out;
}
 
-   vmx->vcpu.arch.rmode.active = 0;
+   vmx->vcpu.arch.rmode.vm86_active = 0;
 
vmx->soft_vnmi_blocked = 0;
 
@@ -2475,7 +2475,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
KVMTRACE_1D(INJ_VIRQ, vcpu, (u32)irq, handler);
 
++vcpu->stat.irq_injections;
-   if (vcpu->arch.rmode.active) {
+   if (vcpu->arch.rmode.vm86_active) {
vmx->rmode.irq.pend

Re: KVM on Debian

2009-06-04 Thread Matthew Palmer
On Thu, Jun 04, 2009 at 01:37:54PM -0700, Aaron Clausen wrote:
> I'm running a production Debian Lenny server using KVM to run a couple
> of Windows and a couple of Linux guests.  All is working well, but I
> want to give my Server 2003 guest access to a SCSI tape drive.
> Unfortunately, Debian is pretty conservative, and the version of KVM
> is too old to support this.  Is there a reasonably safe way of
> upgrading to one of the newer versions of KVM on this server?

Backporting kvm from experimental is straightforward, and has worked fine
for me.

- Matt


[PATCH] QEMU KVM: i386: Fix the cpu reset state

2009-06-04 Thread Nitin A Kamble
As per the IA32 processor manual, the accessed bit is set to 1 in the
processor state after reset. The qemu pc cpu_reset code was missing this
accessed-bit setting.

Signed-off-by: Nitin A Kamble 
---
 target-i386/helper.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 7fc5366..573fb5b 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -493,17 +493,23 @@ void cpu_reset(CPUX86State *env)
 env->tr.flags = DESC_P_MASK | (11 << DESC_TYPE_SHIFT);
 
 cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
-   DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
DESC_R_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK |
+   DESC_R_MASK | DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 
 env->eip = 0xfff0;
 env->regs[R_EDX] = env->cpuid_version;
-- 
1.6.0.6



Re: [RFC] CPU hard limits

2009-06-04 Thread Mike Waychison

Avi Kivity wrote:
> Bharata B Rao wrote:
>> 2. Need for hard limiting CPU resource
>> --
>> - Pay-per-use: In enterprise systems that cater to multiple clients/customers
>>   where a customer demands a certain share of CPU resources and pays only
>>   that, CPU hard limits will be useful to hard limit the customer's job
>>   to consume only the specified amount of CPU resource.
>> - In container based virtualization environments running multiple containers,
>>   hard limits will be useful to ensure a container doesn't exceed its
>>   CPU entitlement.
>> - Hard limits can be used to provide guarantees.
>
> How can hard limits provide guarantees?


Hard limits are useful and desirable in situations where we would like 
to maintain deterministic behavior.


Placing a hard cap on the cpu usage of a given task group (and 
configuring such that this cpu time is not overcommited) on a system 
allows us to create a hard guarantee that throughput for that task group 
will not fluctuate as other workloads are added and removed on the system.


Cache use and bus bandwidth in a multi-workload environment can still 
cause a performance deviation, but these are second order compared to 
the cpu scheduling guarantees themselves.


Mike Waychison


Re: KVM on Debian

2009-06-04 Thread Mark van Walraven
On Thu, Jun 04, 2009 at 01:37:54PM -0700, Aaron Clausen wrote:
> I'm running a production Debian Lenny server using KVM to run a couple
> of Windows and a couple of Linux guests.  All is working well, but I
> want to give my Server 2003 guest access to a SCSI tape drive.
> Unfortunately, Debian is pretty conservative, and the version of KVM
> is too old to support this.  Is there a reasonably safe way of
> upgrading to one of the newer versions of KVM on this server?

I'm interested in this too. So far I have found that Lenny's libvirt fails
to parse the output of kvm --help, though this is fixed in the libvirt in
testing.  The kvm package from experimental seems to work well - after a
day of testing.

My next step is to try qemu-kvm, built from source.  The Debianised libvirt
expects the kvm binaries to be in /usr/bin/kvm, so you can symlink them
from /usr/local/bin if you prefer to install there.  I've also experimented
with a shell script wrapper in /usr/bin/kvm that condenses the output of
qemu-kvm --help so that libvirtd for Lenny works.

Regards,

Mark.


KVM on Debian

2009-06-04 Thread Aaron Clausen
I'm running a production Debian Lenny server using KVM to run a couple
of Windows and a couple of Linux guests.  All is working well, but I
want to give my Server 2003 guest access to a SCSI tape drive.
Unfortunately, Debian is pretty conservative, and the version of KVM
is too old to support this.  Is there a reasonably safe way of
upgrading to one of the newer versions of KVM on this server?

-- 
Aaron Clausen
mightymartia...@gmail.com


Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Gleb Natapov
On Thu, Jun 04, 2009 at 05:18:06PM -0300, Glauber Costa wrote:
> > > this first phase has nothing to do with functionality. To begin with,
> > > KVMState is qemu style, kvm_context_t is not, like it or not (I don't).
> > > 
> > I am not against this mechanical change at all, don't get me wrong. I
> > don't want to mix two kvm implementation together in strange ways.
> > 
> too late for not wanting anything strange to happen ;-)
> 
You are right, I should have said "in stranger ways".

> But I do believe this is the way to turn qemu-kvm.git into something
> that feeds qemu.git. And that's what we all want.
Disagree with first part, agree with second :)

--
Gleb.


Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Glauber Costa
On Thu, Jun 04, 2009 at 11:09:52PM +0300, Gleb Natapov wrote:
> On Thu, Jun 04, 2009 at 05:10:51PM -0300, Glauber Costa wrote:
> > On Thu, Jun 04, 2009 at 11:00:46PM +0300, Gleb Natapov wrote:
> > > On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
> > > > On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> > > > > On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > > > > > This is a pretty mechanical change. To make code look
> > > > > > closer to upstream qemu, I'm renaming kvm_context_t to
> > > > > > KVMState. Mid term goal here is to start sharing code
> > > > > > whereas possible.
> > > > > > 
> > > > > > Avi, please apply, or I'll send you a video of myself
> > > > > > dancing naked.
> > > > > > 
> > > > > You can start recording it since I doubt this patch will apply cleanly
> > > > > to today's master (other mechanical change was applied). Regardless, I
> > > > > think trying to use bits of qemu kvm is dangerous. It has similar 
> > > > > function
> > > > > with same names, but with different assumptions about conditional they
> > > > > can be executed in (look at commit a5ddb119). I actually prefer to be
> > > > > different enough to not call upstream qemu function by mistake.
> > > > 
> > > > I did it against today's master. If new patches came in, is just
> > > > a matter of regenerating this, since it is, as I said, mechanical.
> > > > 
> > > > Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
> > > > are not included in the final object), there is no such risk.
> > > > Of course, I am aiming towards it, but the first step will be to change
> > > > the name of conflicting functions until we can pick qemu's 
> > > > implementation,
> > > > in which case the former will just go away.
> > > That is the point. We can't just pick qemu's implementation most of the
> > > times.
> > "until we can pick up qemu's implementation" potentially involves replacing
> > that particular piece with upstream version first.
> > 
> > > 
> > > > 
> > > > If we are serious about merging qemu-kvm into qemu, I don't see a way 
> > > > out
> > > > of it. We should start changing things this way to accomodate it. 
> > > > Different
> > > > enough won't do.
> > > I don't really like the idea to morph working implementation to look like
> > > non-working one. I do agree that qemu-kvm should be cleaned substantially
> > > before going upstream. Upstream qemu kvm should go away than. I don't
> > > see much work done to enhance it anyway.
> > > 
> > 
> > this first phase has nothing to do with functionality. To begin with,
> > KVMState is qemu style, kvm_context_t is not, like it or not (I don't).
> > 
> I am not against this mechanical change at all, don't get me wrong. I
> don't want to mix two kvm implementation together in strange ways.
> 
too late for not wanting anything strange to happen ;-)

But I do believe this is the way to turn qemu-kvm.git into something
that feeds qemu.git. And that's what we all want.



Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Gleb Natapov
On Thu, Jun 04, 2009 at 05:10:51PM -0300, Glauber Costa wrote:
> On Thu, Jun 04, 2009 at 11:00:46PM +0300, Gleb Natapov wrote:
> > On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
> > > On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> > > > On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > > > > This is a pretty mechanical change. To make code look
> > > > > closer to upstream qemu, I'm renaming kvm_context_t to
> > > > > KVMState. Mid term goal here is to start sharing code
> > > > > whereas possible.
> > > > > 
> > > > > Avi, please apply, or I'll send you a video of myself
> > > > > dancing naked.
> > > > > 
> > > > You can start recording it since I doubt this patch will apply cleanly
> > > > to today's master (other mechanical change was applied). Regardless, I
> > > > think trying to use bits of qemu kvm is dangerous. It has similar 
> > > > function
> > > > with same names, but with different assumptions about conditional they
> > > > can be executed in (look at commit a5ddb119). I actually prefer to be
> > > > different enough to not call upstream qemu function by mistake.
> > > 
> > > I did it against today's master. If new patches came in, is just
> > > a matter of regenerating this, since it is, as I said, mechanical.
> > > 
> > > Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
> > > are not included in the final object), there is no such risk.
> > > Of course, I am aiming towards it, but the first step will be to change
> > > the name of conflicting functions until we can pick qemu's implementation,
> > > in which case the former will just go away.
> > That is the point. We can't just pick qemu's implementation most of the
> > times.
> "until we can pick up qemu's implementation" potentially involves replacing
> that particular piece with upstream version first.
> 
> > 
> > > 
> > > If we are serious about merging qemu-kvm into qemu, I don't see a way out
> > > of it. We should start changing things this way to accomodate it. 
> > > Different
> > > enough won't do.
> > I don't really like the idea to morph working implementation to look like
> > non-working one. I do agree that qemu-kvm should be cleaned substantially
> > before going upstream. Upstream qemu kvm should go away than. I don't
> > see much work done to enhance it anyway.
> > 
> 
> this first phase has nothing to do with functionality. To begin with,
> KVMState is qemu style, kvm_context_t is not, like it or not (I don't).
> 
I am not against this mechanical change at all, don't get me wrong. I
don't want to mix two kvm implementations together in strange ways.

> I don't plan to introduce regressions, you can rest assured. But we _do_
> have to make things look much more qemuer, and that's what this patch
> aims at.

--
Gleb.


Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Glauber Costa
On Thu, Jun 04, 2009 at 11:00:46PM +0300, Gleb Natapov wrote:
> On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
> > On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> > > On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > > > This is a pretty mechanical change. To make code look
> > > > closer to upstream qemu, I'm renaming kvm_context_t to
> > > > KVMState. Mid term goal here is to start sharing code
> > > > whereas possible.
> > > > 
> > > > Avi, please apply, or I'll send you a video of myself
> > > > dancing naked.
> > > > 
> > > You can start recording it since I doubt this patch will apply cleanly
> > > to today's master (other mechanical change was applied). Regardless, I
> > > think trying to use bits of qemu kvm is dangerous. It has similar function
> > > with same names, but with different assumptions about conditional they
> > > can be executed in (look at commit a5ddb119). I actually prefer to be
> > > different enough to not call upstream qemu function by mistake.
> > 
> > I did it against today's master. If new patches came in, is just
> > a matter of regenerating this, since it is, as I said, mechanical.
> > 
> > Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
> > are not included in the final object), there is no such risk.
> > Of course, I am aiming towards it, but the first step will be to change
> > the name of conflicting functions until we can pick qemu's implementation,
> > in which case the former will just go away.
> That is the point. We can't just pick qemu's implementation most of the
> times.
"until we can pick up qemu's implementation" potentially involves replacing
that particular piece with upstream version first.

> 
> > 
> > If we are serious about merging qemu-kvm into qemu, I don't see a way out
> > of it. We should start changing things this way to accomodate it. Different
> > enough won't do.
> I don't really like the idea to morph working implementation to look like
> non-working one. I do agree that qemu-kvm should be cleaned substantially
> before going upstream. Upstream qemu kvm should go away than. I don't
> see much work done to enhance it anyway.
> 

this first phase has nothing to do with functionality. To begin with,
KVMState is qemu style, kvm_context_t is not, like it or not (I don't).

I don't plan to introduce regressions, you can rest assured. But we _do_
have to make things look much more qemuer, and that's what this patch
aims at.




Re: [PATCH 0/4] apic/ioapic kvm free implementation

2009-06-04 Thread Gleb Natapov
On Thu, Jun 04, 2009 at 09:46:23PM +0200, Jan Kiszka wrote:
> Gleb Natapov wrote:
> > On Wed, Jun 03, 2009 at 05:19:26PM -0400, Glauber Costa wrote:
> >> Same thing,
> >>
> >> addressing comments from gleb.
> >>
> >>
> > Jan, can you run your test on this one? It differs from previous one in
> > halt handling.
> 
> Still works for me.
> 
Cool, thanks.

--
Gleb.


Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Gleb Natapov
On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
> On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> > On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > > This is a pretty mechanical change. To make code look
> > > closer to upstream qemu, I'm renaming kvm_context_t to
> > > KVMState. Mid term goal here is to start sharing code
> > > whereas possible.
> > > 
> > > Avi, please apply, or I'll send you a video of myself
> > > dancing naked.
> > > 
> > You can start recording it since I doubt this patch will apply cleanly
> > to today's master (other mechanical change was applied). Regardless, I
> > think trying to use bits of qemu kvm is dangerous. It has similar function
> > with same names, but with different assumptions about conditional they
> > can be executed in (look at commit a5ddb119). I actually prefer to be
> > different enough to not call upstream qemu function by mistake.
> 
> I did it against today's master. If new patches came in, is just
> a matter of regenerating this, since it is, as I said, mechanical.
> 
> Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
> are not included in the final object), there is no such risk.
> Of course, I am aiming towards it, but the first step will be to change
> the name of conflicting functions until we can pick qemu's implementation,
> in which case the former will just go away.
That is the point. We can't just pick qemu's implementation most of the
time.

> 
> If we are serious about merging qemu-kvm into qemu, I don't see a way out
> of it. We should start changing things this way to accomodate it. Different
> enough won't do.
I don't really like the idea of morphing a working implementation to look
like a non-working one. I do agree that qemu-kvm should be cleaned up
substantially before going upstream. The upstream qemu kvm should go away
then. I don't see much work done to enhance it anyway.

--
Gleb.


Re: [PATCH 0/4] apic/ioapic kvm free implementation

2009-06-04 Thread Jan Kiszka
Gleb Natapov wrote:
> On Wed, Jun 03, 2009 at 05:19:26PM -0400, Glauber Costa wrote:
>> Same thing,
>>
>> addressing comments from gleb.
>>
>>
> Jan, can you run your test on this one? It differs from previous one in
> halt handling.

Still works for me.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Glauber Costa
On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
> On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
> > This is a pretty mechanical change. To make code look
> > closer to upstream qemu, I'm renaming kvm_context_t to
> > KVMState. Mid term goal here is to start sharing code
> > whereas possible.
> > 
> > Avi, please apply, or I'll send you a video of myself
> > dancing naked.
> > 
> You can start recording it since I doubt this patch will apply cleanly
> to today's master (other mechanical change was applied). Regardless, I
> think trying to use bits of qemu kvm is dangerous. It has similar function
> with same names, but with different assumptions about conditional they
> can be executed in (look at commit a5ddb119). I actually prefer to be
> different enough to not call upstream qemu function by mistake.

I did it against today's master. If new patches came in, it is just
a matter of regenerating this, since it is, as I said, mechanical.

Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
are not included in the final object), there is no such risk.
Of course, I am aiming towards it, but the first step will be to change
the name of conflicting functions until we can pick qemu's implementation,
in which case the former will just go away.

If we are serious about merging qemu-kvm into qemu, I don't see a way out
of it. We should start changing things this way to accommodate it. Different
enough won't do.



Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:
> Oh, I don't doubt that (in fact, I was pretty sure that was the case
> based on some of the optimizations I could see in studying the c_t_u()
> path).  I just didn't realize there were other ways to do it if it's a
> non "current" task. ;)
>
> I guess the enigma for me right now is what cost does switch_mm have?
> (That's not a slam against the suggested approach...I really do not know
> and am curious).


switch_mm() is probably very cheap (reloads cr3), but it does dirty the 
current cpu's tlb.  When the kernel needs to flush a process' tlb, it 
will have to IPI that cpu in addition to all others.  This takes place, 
for example, after munmap() or after a page is swapped out (though 
significant batching is done there).


It's still plenty cheaper in my estimation.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-04 Thread Gregory Haskins
Avi Kivity wrote:
> Gregory Haskins wrote:
>> Avi,
>>
>> Gregory Haskins wrote:
>>  
>>> Todo:
>>> *) Develop some kind of hypercall registration mechanism for KVM so
>>> that
>>>we can use that as an integration point instead of directly hooking
>>>kvm hypercalls
>>>   
>>
>> What would you like to see here?  I now remember why I removed the
>> original patch I had for registration...it requires some kind of
>> discovery mechanism on its own.  Note that this is hard, but I figured
>> it would make the overall series simpler if I didn't go this route and
>> instead just integrated with a statically allocated vector.  That being
>> said, I have no problem adding this back in but figure we should discuss
>> the approach so I don't go down a rat-hole ;)
>>
>>   
>
>
> One idea is similar to signalfd() or eventfd().  Provide a kvm ioctl
> that takes a gsi and returns an fd.  Writes to the fd change the state
> of the line, possible triggering an interrupt.  Another ioctl takes a
> hypercall number or pio port as well as an existing fd.  Invocations
> of the hypercall or writes to the port write to the fd (using the same
> protocol as eventfd), so the other end can respond.
>
> The nice thing is that this can be used by both kernel and userspace
> components, and for kernel components, hypercalls can be either
> buffered or unbuffered.


And thus the "kvm-eventfd" (irqfd/iosignalfd) interface project was born. ;)

(Michael FYI: so I will be pushing a vbus-v4 series at some point in the
near future that is expressed in terms of irqfd/iosignalfd, per the
conversation above.  The patches in v3 and earlier are more intrusive to
the KVM core than they will be in final form)

-Greg






Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Gregory Haskins
Avi Kivity wrote:
> Gregory Haskins wrote:
>
>   
>>> BTW, why did you decide to use get_user_pages?
>>> Would switch_mm + copy_to_user work as well
>>> avoiding page walk if all pages are present?
>>>   
>>
>> Well, basic c_t_u() won't work because its likely not "current" if you
>> are updating the ring from some other task, but I think you have already
>> figured that out based on the switch_mm suggestion.  The simple truth is
>> I was not familiar with switch_mm at the time I wrote this (nor am I
>> now).  If this is a superior method that allows you to acquire
>> c_t_u(some_other_ctx) like behavior, I see no problem in changing.  I
>> will look into this, and thanks for the suggestion!
>>   
>
> copy_to_user() is significantly faster than get_user_pages() + kmap()
> + memcmp() (or their variants).
>

Oh, I don't doubt that (in fact, I was pretty sure that was the case
based on some of the optimizations I could see in studying the c_t_u()
path).  I just didn't realize there were other ways to do it if it's a
non "current" task. ;)

I guess the enigma for me right now is what cost does switch_mm have?
(That's not a slam against the suggested approach...I really do not know
and am curious).

As an aside, note that we seem to be reviewing v2, where v3 is really
the last set I pushed.  I think this patch is more or less the same
across both iterations, but FYI that I would recommend looking at v3
instead.

-Greg





Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:
>> BTW, why did you decide to use get_user_pages?
>> Would switch_mm + copy_to_user work as well
>> avoiding page walk if all pages are present?
>
> Well, basic c_t_u() won't work because it's likely not "current" if you
> are updating the ring from some other task, but I think you have already
> figured that out based on the switch_mm suggestion.  The simple truth is
> I was not familiar with switch_mm at the time I wrote this (nor am I
> now).  If this is a superior method that allows you to acquire
> c_t_u(some_other_ctx) like behavior, I see no problem in changing.  I
> will look into this, and thanks for the suggestion!


copy_to_user() is significantly faster than get_user_pages() + kmap() + 
memcmp() (or their variants).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Avi Kivity

Michael S. Tsirkin wrote:
> Also - if we just had vmexit because a process executed
> io (or hypercall), can't we just do copy_to_user there?
> Avi, I think at some point you said that we can?


You can do copy_to_user() wherever it is legal in Linux.  Almost all of 
kvm runs in process context, preemptible, and with interrupts enabled.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Thu, Apr 09, 2009 at 12:30:57PM -0400, Gregory Haskins wrote:
>   
>> +static unsigned long
>> +task_memctx_copy_to(struct vbus_memctx *ctx, void *dst, const void *src,
>> +unsigned long n)
>> +{
>> +struct task_memctx *tm = to_task_memctx(ctx);
>> +struct task_struct *p = tm->task;
>> +
>> +while (n) {
>> +unsigned long offset = ((unsigned long)dst)%PAGE_SIZE;
>> +unsigned long len = PAGE_SIZE - offset;
>> +int ret;
>> +struct page *pg;
>> +void *maddr;
>> +
>> +if (len > n)
>> +len = n;
>> +
>> +down_read(&p->mm->mmap_sem);
>> +ret = get_user_pages(p, p->mm,
>> + (unsigned long)dst, 1, 1, 0, &pg, NULL);
>> +
>> +if (ret != 1) {
>> +up_read(&p->mm->mmap_sem);
>> +break;
>> +}
>> +
>> +maddr = kmap_atomic(pg, KM_USER0);
>> +memcpy(maddr + offset, src, len);
>> +kunmap_atomic(maddr, KM_USER0);
>> +set_page_dirty_lock(pg);
>> +put_page(pg);
>> +up_read(&p->mm->mmap_sem);
>> +
>> +src += len;
>> +dst += len;
>> +n -= len;
>> +}
>> +
>> +return n;
>> +}
>> 
>
> BTW, why did you decide to use get_user_pages?
> Would switch_mm + copy_to_user work as well
> avoiding page walk if all pages are present?
>   

Well, basic c_t_u() won't work because it's likely not "current" if you
are updating the ring from some other task, but I think you have already
figured that out based on the switch_mm suggestion.  The simple truth is
I was not familiar with switch_mm at the time I wrote this (nor am I
now).  If this is a superior method that allows you to acquire
c_t_u(some_other_ctx) like behavior, I see no problem in changing.  I
will look into this, and thanks for the suggestion!
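
For reference, a sketch of what the switch_mm-based path might look like,
assuming a use_mm()-style helper for temporarily adopting another task's
mm (such a helper currently lives as private code in fs/aio.c, so it
would need to be exported, and its safety outside kernel-thread context
would need checking):

static unsigned long
task_memctx_copy_to_direct(struct task_memctx *tm, void __user *dst,
                           const void *src, unsigned long n)
{
        unsigned long ret;

        use_mm(tm->task->mm);   /* make the target task's mm current */
        ret = copy_to_user(dst, src, n);
        unuse_mm(tm->task->mm);

        return ret;             /* bytes not copied, as with copy_to_user() */
}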

> Also - if we just had vmexit because a process executed
> io (or hypercall), can't we just do copy_to_user there?
> Avi, I think at some point you said that we can?
>   

Right, and yes, that will work I believe.  We could always do an "if (p ==
current)" check to test for this.  To date, I don't typically do
anything mem-ops related directly in vcpu context so this wasn't an
issue...but that doesn't mean someone won't try in the future.
Therefore, I agree we should strive to optimize it if we can.
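
A sketch of that check, following the return convention of the
task_memctx_copy_to() code quoted earlier in the thread (it returns the
number of bytes not copied):

        /* if the vmexit happened in the target task's context, its mm
         * is already current and plain copy_to_user() will do */
        if (tm->task == current)
                return copy_to_user(dst, src, n);

        /* otherwise fall back to the get_user_pages() walk */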

Thanks Michael,
-Greg





[patch 4/4] KVM: switch irq injection/acking data structures to irq_lock

2009-06-04 Thread Marcelo Tosatti
Protect irq injection/acking data structures with a separate irq_lock
mutex. This fixes the following deadlock:

CPU A                                   CPU B

kvm_vm_ioctl_deassign_dev_irq()         worker_thread()
  mutex_lock(&kvm->lock);               -> kvm_assigned_dev_interrupt_work_handler()
  -> kvm_deassign_irq()                      mutex_lock(&kvm->lock);   [blocked]
     -> deassign_host_irq()
        -> cancel_work_sync()  [blocked]

Reported-by: Alex Williamson 
Signed-off-by: Marcelo Tosatti 

Index: kvm/arch/x86/kvm/i8254.c
===
--- kvm.orig/arch/x86/kvm/i8254.c
+++ kvm/arch/x86/kvm/i8254.c
@@ -651,10 +651,10 @@ static void __inject_pit_timer_intr(stru
struct kvm_vcpu *vcpu;
int i;
 
-   mutex_lock(&kvm->lock);
+   mutex_lock(&kvm->irq_lock);
kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
-   mutex_unlock(&kvm->lock);
+   mutex_unlock(&kvm->irq_lock);
 
/*
 * Provides NMI watchdog support via Virtual Wire mode.
Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -2099,10 +2099,10 @@ long kvm_arch_vm_ioctl(struct file *filp
goto out;
if (irqchip_in_kernel(kvm)) {
__s32 status;
-   mutex_lock(&kvm->lock);
+   mutex_lock(&kvm->irq_lock);
status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
irq_event.irq, irq_event.level);
-   mutex_unlock(&kvm->lock);
+   mutex_unlock(&kvm->irq_lock);
if (ioctl == KVM_IRQ_LINE_STATUS) {
irq_event.status = status;
if (copy_to_user(argp, &irq_event,
@@ -2348,12 +2348,11 @@ mmio:
 */
mutex_lock(&vcpu->kvm->lock);
mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
+   mutex_unlock(&vcpu->kvm->lock);
if (mmio_dev) {
kvm_iodevice_read(mmio_dev, gpa, bytes, val);
-   mutex_unlock(&vcpu->kvm->lock);
return X86EMUL_CONTINUE;
}
-   mutex_unlock(&vcpu->kvm->lock);
 
vcpu->mmio_needed = 1;
vcpu->mmio_phys_addr = gpa;
@@ -2403,12 +2402,11 @@ mmio:
 */
mutex_lock(&vcpu->kvm->lock);
mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 1);
+   mutex_unlock(&vcpu->kvm->lock);
if (mmio_dev) {
kvm_iodevice_write(mmio_dev, gpa, bytes, val);
-   mutex_unlock(&vcpu->kvm->lock);
return X86EMUL_CONTINUE;
}
-   mutex_unlock(&vcpu->kvm->lock);
 
vcpu->mmio_needed = 1;
vcpu->mmio_phys_addr = gpa;
@@ -2731,7 +2729,6 @@ static void kernel_pio(struct kvm_io_dev
 {
/* TODO: String I/O for in kernel device */
 
-   mutex_lock(&vcpu->kvm->lock);
if (vcpu->arch.pio.in)
kvm_iodevice_read(pio_dev, vcpu->arch.pio.port,
  vcpu->arch.pio.size,
@@ -2740,7 +2737,6 @@ static void kernel_pio(struct kvm_io_dev
kvm_iodevice_write(pio_dev, vcpu->arch.pio.port,
   vcpu->arch.pio.size,
   pd);
-   mutex_unlock(&vcpu->kvm->lock);
 }
 
 static void pio_string_write(struct kvm_io_device *pio_dev,
@@ -2750,14 +2746,12 @@ static void pio_string_write(struct kvm_
void *pd = vcpu->arch.pio_data;
int i;
 
-   mutex_lock(&vcpu->kvm->lock);
for (i = 0; i < io->cur_count; i++) {
kvm_iodevice_write(pio_dev, io->port,
   io->size,
   pd);
pd += io->size;
}
-   mutex_unlock(&vcpu->kvm->lock);
 }
 
 static struct kvm_io_device *vcpu_find_pio_dev(struct kvm_vcpu *vcpu,
@@ -2794,7 +2788,9 @@ int kvm_emulate_pio(struct kvm_vcpu *vcp
val = kvm_register_read(vcpu, VCPU_REGS_RAX);
memcpy(vcpu->arch.pio_data, &val, 4);
 
+   mutex_lock(&vcpu->kvm->lock);
pio_dev = vcpu_find_pio_dev(vcpu, port, size, !in);
+   mutex_unlock(&vcpu->kvm->lock);
if (pio_dev) {
kernel_pio(pio_dev, vcpu, vcpu->arch.pio_data);
complete_pio(vcpu);
@@ -2858,9 +2854,12 @@ int kvm_emulate_pio_string(struct kvm_vc
 
vcpu->arch.pio.guest_gva = address;
 
+   mutex_lock(&vcpu->kvm->lock);
pio_dev = vcpu_find_pio_dev(vcpu, port,
vcpu->arch.pio.cur_count,
!vcpu->arch.pio.in);
+   mutex_unlock(&vcpu->kvm->lock);
+
if (!vcpu->arch.pio.in) {
/* string PIO write */
ret = pio_copy_data(vcpu);
Index: kvm/virt/k

Re: [patch] VMX Unrestricted mode support

2009-06-04 Thread Jan Kiszka
Nitin A Kamble wrote:
> Hi Avi,
>   I find that the qemu processor reset state is not per the IA32
> processor specifications. (Sections 8.1.1 of
> http://www.intel.com/Assets/PDF/manual/253668.pdf)
> 
> In qemu-kvm.git in file target-i386/helper.c in function cpu_reset the
> segment registers are initialized as follows:
> 
> cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
>DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
>   DESC_R_MASK);
> cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
>DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
> cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
>DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
> cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
>DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
> cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
>DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
> cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
>DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
> 
> While the IA32 cpu reset state specification says that Segment Accessed
> bit is also 1 at the time of cpu reset. so the above code should look
> like this:
> 
> cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0x, 0x,
>  DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
>  DESC_R_MASK | DESC_A_MASK);
> cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0x,
>  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK);
> cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0x,
>  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK| DESC_A_MASK);
> cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0x,
>  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |DESC_A_MASK);
> cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0x,
>  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
> cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0x,
>  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
> 
> This discrepancy is adding the need of the following function in the
> unrestricted guest patch.

As Avi already indicated: Independent of the kvm workaround for older
qemu versions, please post (to qemu-devel) a patch against upstream's
git to fix the discrepancy.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


[patch 0/4] move irq protection role to separate lock v4

2009-06-04 Thread Marcelo Tosatti
This is to fix a deadlock reported by Alex Williamson, while at
the same time making it easier to allow PIO/MMIO regions to be
registered/unregistered while a guest is alive.


-- 



[patch 3/4] KVM: introduce irq_lock, use it to protect ioapic

2009-06-04 Thread Marcelo Tosatti
Introduce irq_lock, and use to protect ioapic data structures.

Signed-off-by: Marcelo Tosatti 

Index: kvm/include/linux/kvm_host.h
===
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -123,7 +123,6 @@ struct kvm_kernel_irq_routing_entry {
 };
 
 struct kvm {
-   struct mutex lock; /* protects the vcpus array and APIC accesses */
spinlock_t mmu_lock;
struct rw_semaphore slots_lock;
struct mm_struct *mm; /* userspace tied to this vm */
@@ -132,6 +131,7 @@ struct kvm {
KVM_PRIVATE_MEM_SLOTS];
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
struct list_head vm_list;
+   struct mutex lock;
struct kvm_io_bus mmio_bus;
struct kvm_io_bus pio_bus;
 #ifdef CONFIG_HAVE_KVM_EVENTFD
@@ -145,6 +145,7 @@ struct kvm {
struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
 #endif
 
+   struct mutex irq_lock;
 #ifdef CONFIG_HAVE_KVM_IRQCHIP
struct list_head irq_routing; /* of kvm_kernel_irq_routing_entry */
struct hlist_head mask_notifier_list;
Index: kvm/virt/kvm/ioapic.c
===
--- kvm.orig/virt/kvm/ioapic.c
+++ kvm/virt/kvm/ioapic.c
@@ -243,6 +243,7 @@ static void ioapic_mmio_read(struct kvm_
ioapic_debug("addr %lx\n", (unsigned long)addr);
ASSERT(!(addr & 0xf));  /* check alignment */
 
+   mutex_lock(&ioapic->kvm->irq_lock);
addr &= 0xff;
switch (addr) {
case IOAPIC_REG_SELECT:
@@ -269,6 +270,7 @@ static void ioapic_mmio_read(struct kvm_
default:
printk(KERN_WARNING "ioapic: wrong length %d\n", len);
}
+   mutex_unlock(&ioapic->kvm->irq_lock);
 }
 
 static void ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len,
@@ -280,6 +282,8 @@ static void ioapic_mmio_write(struct kvm
ioapic_debug("ioapic_mmio_write addr=%p len=%d val=%p\n",
 (void*)addr, len, val);
ASSERT(!(addr & 0xf));  /* check alignment */
+
+   mutex_lock(&ioapic->kvm->irq_lock);
if (len == 4 || len == 8)
data = *(u32 *) val;
else {
@@ -305,6 +309,7 @@ static void ioapic_mmio_write(struct kvm
default:
break;
}
+   mutex_unlock(&ioapic->kvm->irq_lock);
 }
 
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic)
Index: kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -979,6 +979,7 @@ static struct kvm *kvm_create_vm(void)
kvm_io_bus_init(&kvm->pio_bus);
kvm_irqfd_init(kvm);
mutex_init(&kvm->lock);
+   mutex_init(&kvm->irq_lock);
kvm_io_bus_init(&kvm->mmio_bus);
init_rwsem(&kvm->slots_lock);
atomic_set(&kvm->users_count, 1);

-- 



[patch 1/4] KVM: x86: grab pic lock in kvm_pic_clear_isr_ack

2009-06-04 Thread Marcelo Tosatti
isr_ack is protected by kvm_pic->lock.

Signed-off-by: Marcelo Tosatti 

Index: kvm/arch/x86/kvm/i8259.c
===
--- kvm.orig/arch/x86/kvm/i8259.c
+++ kvm/arch/x86/kvm/i8259.c
@@ -72,8 +72,10 @@ static void pic_clear_isr(struct kvm_kpi
 void kvm_pic_clear_isr_ack(struct kvm *kvm)
 {
struct kvm_pic *s = pic_irqchip(kvm);
+   pic_lock(s);
s->pics[0].isr_ack = 0xff;
s->pics[1].isr_ack = 0xff;
+   pic_unlock(s);
 }
 
 /*

-- 



[patch 2/4] KVM: move coalesced_mmio locking to its own device

2009-06-04 Thread Marcelo Tosatti
Move coalesced_mmio locking to its own device, instead of relying on
kvm->lock.

Signed-off-by: Marcelo Tosatti 

Index: kvm/virt/kvm/coalesced_mmio.c
===
--- kvm.orig/virt/kvm/coalesced_mmio.c
+++ kvm/virt/kvm/coalesced_mmio.c
@@ -31,10 +31,6 @@ static int coalesced_mmio_in_range(struc
if (!is_write)
return 0;
 
-   /* kvm->lock is taken by the caller and must be not released before
- * dev.read/write
- */
-
/* Are we able to batch it ? */
 
/* last is the first free entry
@@ -70,7 +66,7 @@ static void coalesced_mmio_write(struct 
struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
struct kvm_coalesced_mmio_ring *ring = dev->kvm->coalesced_mmio_ring;
 
-   /* kvm->lock must be taken by caller before call to in_range()*/
+   spin_lock(&dev->lock);
 
/* copy data in first free entry of the ring */
 
@@ -79,6 +75,7 @@ static void coalesced_mmio_write(struct 
memcpy(ring->coalesced_mmio[ring->last].data, val, len);
smp_wmb();
ring->last = (ring->last + 1) % KVM_COALESCED_MMIO_MAX;
+   spin_unlock(&dev->lock);
 }
 
 static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -101,6 +98,7 @@ int kvm_coalesced_mmio_init(struct kvm *
dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
if (!dev)
return -ENOMEM;
+   spin_lock_init(&dev->lock);
kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
dev->kvm = kvm;
kvm->coalesced_mmio_dev = dev;
Index: kvm/virt/kvm/coalesced_mmio.h
===
--- kvm.orig/virt/kvm/coalesced_mmio.h
+++ kvm/virt/kvm/coalesced_mmio.h
@@ -12,6 +12,7 @@
 struct kvm_coalesced_mmio_dev {
struct kvm_io_device dev;
struct kvm *kvm;
+   spinlock_t lock;
int nb_zones;
struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
 };

-- 



Re: TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> On Thu, Jun 04, 2009 at 01:16:05PM -0400, Gregory Haskins wrote:
>   
>> Michael S. Tsirkin wrote:
>> 
>>> As I'm new to qemu/kvm, to figure out how networking performance can be 
>>> improved, I
>>> went over the code and took some notes.  As I did this, I tried to record 
>>> ideas
>>> from recent discussions and ideas that came up on improving performance. 
>>> Thus
>>> this list.
>>>
>>> This includes a partial overview of networking code in a virtual 
>>> environment, with
>>> focus on performance: I'm only interested in sending and receiving packets,
>>> ignoring configuration etc.
>>>
>>> I have likely missed a ton of clever ideas and older discussions, and 
>>> probably
>>> misunderstood some code. Please pipe up with corrections, additions, etc. 
>>> And
>>> please don't take offence if I didn't attribute the idea correctly - most of
>>> them are marked mst, but I don't claim they are original. Just let me know.
>>>
>>> And there are a couple of trivial questions on the code - I'll
>>> add answers here as they become available.
>>>
>>> I put up a copy at http://www.linux-kvm.org/page/Networking_Performance as
>>> well, and intend to dump updates there from time to time.
>>>   
>>>   
>> Hi Michael,
>>   Not sure if you have seen this, but I've already started to work on
>> the code for in-kernel devices and have a (currently non-virtio based)
>> proof-of-concept network device which you can use for comparative data.  You
>> can find details here:
>>
>> http://lkml.org/lkml/2009/4/21/408
>>
>> 
>> 
>
> Thanks
>
>   
>> (Will look at your list later, to see if I can add anything)
>> 
>>> ---
>>>
>>> Short term plans: I plan to start out with trying out the following ideas:
>>>
>>> save a copy in qemu on RX side in case of a single nic in vlan
>>> implement virtio-host kernel module
>>>
>>> *detail on virtio-host-net kernel module project*
>>>
>>> virtio-host-net is a simple character device which gets memory layout 
>>> information
>>> from qemu, and uses this to convert between virtio descriptors and skbs.
>>> The skbs are then passed to/from raw socket (or we could bind virtio-host
>>> to physical device like raw socket does TBD).
>>>
>>> Interrupts will be reported to eventfd descriptors, and device will poll
>>> eventfd descriptors to get kicks from guest.
>>>
>>>   
>>>   
>> I currently have a virtio transport for vbus implemented, but it still
>> needs a virtio-net device-model backend written.
>> 
>
> You mean virtio-ring implementation?
>   

Right.

> I intended to basically start by reusing the code from
> Documentation/lguest/lguest.c
> Isn't this all there is to it?
>   

Not sure.  I reused the ring code already in the kernel.

>   
>>  If you are interested,
>> we can work on this together to implement your idea.  It's on my "todo"
>> list for vbus anyway, but I am currently distracted with the
>> irqfd/iosignalfd projects which are prereqs for vbus to be considered
>> for merge.
>>
>> Basically vbus is a framework for declaring in-kernel devices (not kvm
>> specific, per se) with a full security/containment model, a
>> hot-pluggable configuration engine, and a dynamically loadable 
>> device-model.  The framework takes care of the details of signal-path
>> and memory routing for you so that something like a virtio-net model can
>> be implemented once and work in a variety of environments such as kvm,
>> lguest, etc.
>>
>> Interested?
>> -Greg
>>
>> 
>
> It seems that a character device with a couple of ioctls would be simpler
> for an initial prototype.
>   

Suit yourself, but I suspect that by the time you build the prototype
you will either end up re-solving all the same problems anyway, or have
diminished functionality (or both).  It's actually very simple to declare
a new virtio-vbus device, but the choice is yours.  I can crank out a
skeleton for you, if you like.

-Greg






Re: TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Michael S. Tsirkin
On Thu, Jun 04, 2009 at 01:50:20PM -0400, Gregory Haskins wrote:
> Suit yourself, but I suspect that by the time you build the prototype
> you will either end up re-solving all the same problems anyway, or have
> diminished functionality (or both).

/me goes to look at vbus patches.


-- 
MST


Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Michael S. Tsirkin
On Thu, Apr 09, 2009 at 12:30:57PM -0400, Gregory Haskins wrote:
> +static unsigned long
> +task_memctx_copy_to(struct vbus_memctx *ctx, void *dst, const void *src,
> + unsigned long n)
> +{
> + struct task_memctx *tm = to_task_memctx(ctx);
> + struct task_struct *p = tm->task;
> +
> + while (n) {
> + unsigned long offset = ((unsigned long)dst)%PAGE_SIZE;
> + unsigned long len = PAGE_SIZE - offset;
> + int ret;
> + struct page *pg;
> + void *maddr;
> +
> + if (len > n)
> + len = n;
> +
> + down_read(&p->mm->mmap_sem);
> + ret = get_user_pages(p, p->mm,
> +  (unsigned long)dst, 1, 1, 0, &pg, NULL);
> +
> + if (ret != 1) {
> + up_read(&p->mm->mmap_sem);
> + break;
> + }
> +
> + maddr = kmap_atomic(pg, KM_USER0);
> + memcpy(maddr + offset, src, len);
> + kunmap_atomic(maddr, KM_USER0);
> + set_page_dirty_lock(pg);
> + put_page(pg);
> + up_read(&p->mm->mmap_sem);
> +
> + src += len;
> + dst += len;
> + n -= len;
> + }
> +
> + return n;
> +}

BTW, why did you decide to use get_user_pages?
Would switch_mm + copy_to_user work as well
avoiding the page walk if all pages are present?

Also - if we just had a vmexit because a process executed
io (or a hypercall), can't we just do copy_to_user there?
Avi, I think at some point you said that we can?
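
For reference, a minimal sketch of what I have in mind - borrow the
target task's mm and go through the normal uaccess path instead of
walking pages by hand.  It assumes a use_mm()/unuse_mm() style helper
(like the one aio uses internally) can be called from this context,
which is exactly the open question:

	static unsigned long
	task_memctx_copy_to_direct(struct task_struct *p, void __user *dst,
				   const void *src, unsigned long n)
	{
		unsigned long uncopied;

		use_mm(p->mm);		/* temporarily adopt the task's mm */
		uncopied = copy_to_user(dst, src, n); /* faults pages in */
		unuse_mm(p->mm);

		return uncopied;	/* bytes NOT copied, as in the original */
	}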


-- 
MST


Re: TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Michael S. Tsirkin
On Thu, Jun 04, 2009 at 01:16:05PM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > As I'm new to qemu/kvm, to figure out how networking performance can be 
> > improved, I
> > went over the code and took some notes.  As I did this, I tried to record 
> > ideas
> > from recent discussions and ideas that came up on improving performance. 
> > Thus
> > this list.
> >
> > This includes a partial overview of networking code in a virtual 
> > environment, with
> > focus on performance: I'm only interested in sending and receiving packets,
> > ignoring configuration etc.
> >
> > I have likely missed a ton of clever ideas and older discussions, and 
> > probably
> > misunderstood some code. Please pipe up with corrections, additions, etc. 
> > And
> > please don't take offence if I didn't attribute the idea correctly - most of
> > them are marked mst, but I don't claim they are original. Just let me know.
> >
> > And there are a couple of trivial questions on the code - I'll
> > add answers here as they become available.
> >
> > I put up a copy at http://www.linux-kvm.org/page/Networking_Performance as
> > well, and intend to dump updates there from time to time.
> >   
> 
> Hi Michael,
>   Not sure if you have seen this, but I've already started to work on
> the code for in-kernel devices and have a (currently non-virtio based)
> proof-of-concept network device which you can use for comparative data.  You
> can find details here:
> 
> http://lkml.org/lkml/2009/4/21/408
> 
> 

Thanks

> (Will look at your list later, to see if I can add anything)
> > ---
> >
> > Short term plans: I plan to start out with trying out the following ideas:
> >
> > save a copy in qemu on RX side in case of a single nic in vlan
> > implement virtio-host kernel module
> >
> > *detail on virtio-host-net kernel module project*
> >
> > virtio-host-net is a simple character device which gets memory layout 
> > information
> > from qemu, and uses this to convert between virtio descriptors to skbs.
> > The skbs are then passed to/from raw socket (or we could bind virtio-host
> > to physical device like raw socket does TBD).
> >
> > Interrupts will be reported to eventfd descriptors, and device will poll
> > eventfd descriptors to get kicks from guest.
> >
> >   
> 
> I currently have a virtio transport for vbus implemented, but it still
> needs a virtio-net device-model backend written.

You mean virtio-ring implementation?
I intended to basically start by reusing the code from
Documentation/lguest/lguest.c
Isn't this all there is to it?

>  If you are interested,
> we can work on this together to implement your idea.  It's on my "todo"
> list for vbus anyway, but I am currently distracted with the
> irqfd/iosignalfd projects which are prereqs for vbus to be considered
> for merge.
> 
> Basically vbus is a framework for declaring in-kernel devices (not kvm
> specific, per se) with a full security/containment model, a
> hot-pluggable configuration engine, and a dynamically loadable 
> device-model.  The framework takes care of the details of signal-path
> and memory routing for you so that something like a virtio-net model can
> be implemented once and work in a variety of environments such as kvm,
> lguest, etc.
> 
> Interested?
> -Greg
> 

It seems that a character device with a couple of ioctls would be simpler
for an initial prototype.

-- 
MST


Re: TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Gregory Haskins
Michael S. Tsirkin wrote:
> As I'm new to qemu/kvm, to figure out how networking performance can be 
> improved, I
> went over the code and took some notes.  As I did this, I tried to record 
> ideas
> from recent discussions and ideas that came up on improving performance. Thus
> this list.
>
> This includes a partial overview of networking code in a virtual environment, 
> with
> focus on performance: I'm only interested in sending and receiving packets,
> ignoring configuration etc.
>
> I have likely missed a ton of clever ideas and older discussions, and probably
> misunderstood some code. Please pipe up with corrections, additions, etc. And
> please don't take offence if I didn't attribute the idea correctly - most of
> them are marked mst, but I don't claim they are original. Just let me know.
>
> And there are a couple of trivial questions on the code - I'll
> add answers here as they become available.
>
> I put up a copy at http://www.linux-kvm.org/page/Networking_Performance as
> well, and intend to dump updates there from time to time.
>   

Hi Michael,
  Not sure if you have seen this, but I've already started to work on
the code for in-kernel devices and have a (currently non-virtio based)
proof-of-concept network device which you can use for comparative data.  You
can find details here:

http://lkml.org/lkml/2009/4/21/408



(Will look at your list later, to see if I can add anything)
> ---
>
> Short term plans: I plan to start out with trying out the following ideas:
>
> save a copy in qemu on RX side in case of a single nic in vlan
> implement virtio-host kernel module
>
> *detail on virtio-host-net kernel module project*
>
> virtio-host-net is a simple character device which gets memory layout 
> information
> from qemu, and uses this to convert between virtio descriptors and skbs.
> The skbs are then passed to/from raw socket (or we could bind virtio-host
> to physical device like raw socket does TBD).
>
> Interrupts will be reported to eventfd descriptors, and device will poll
> eventfd descriptors to get kicks from guest.
>
>   

I currently have a virtio transport for vbus implemented, but it still
needs a virtio-net device-model backend written.  If you are interested,
we can work on this together to implement your idea.  It's on my "todo"
list for vbus anyway, but I am currently distracted with the
irqfd/iosignalfd projects which are prereqs for vbus to be considered
for merge.

Basically vbus is a framework for declaring in-kernel devices (not kvm
specific, per se) with a full security/containment model, a
hot-pluggable configuration engine, and a dynamically loadable 
device-model.  The framework takes care of the details of signal-path
and memory routing for you so that something like a virtio-net model can
be implemented once and work in a variety of environments such as kvm,
lguest, etc.

Interested?
-Greg





TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Michael S. Tsirkin
As I'm new to qemu/kvm, to figure out how networking performance can be 
improved, I
went over the code and took some notes.  As I did this, I tried to record ideas
from recent discussions and ideas that came up on improving performance. Thus
this list.

This includes a partial overview of networking code in a virtual environment, 
with
focus on performance: I'm only interested in sending and receiving packets,
ignoring configuration etc.

I have likely missed a ton of clever ideas and older discussions, and probably
misunderstood some code. Please pipe up with corrections, additions, etc. And
please don't take offence if I didn't attribute the idea correctly - most of
them are marked mst, but I don't claim they are original. Just let me know.

And there are a couple of trivial questions on the code - I'll
add answers here as they become available.

I put up a copy at http://www.linux-kvm.org/page/Networking_Performance as
well, and intend to dump updates there from time to time.

Thanks,
MST

---

There are many ways to set up networking in a virtual machine.
Here's one: linux guest -> virtio-net -> virtio-pci -> qemu+kvm -> tap -> 
bridge.
Let's take a look at this one.

Virtio is the guest side of things.

Guest kernel virtio-net:

TX:
- Guest kernel allocates a packet (skb) in guest kernel memory
  and fills it in with data, passes it to networking stack.
- The skb is passed on to guest network driver
  (hard_start_xmit)
- skbs in flight are kept in send queue linked list,
  so that we can flush them when device is removed
  [ mst: optimization idea: virtqueue already tracks
posted buffers. Add flush/purge operation and use that instead? ]
- skb is reformatted to scattergather format
  [ mst: idea to try: this does a copy for skb head,
which might be costly especially for small/linear packets.
Try to avoid this? Might need to tweak virtio interface.
  ]
- network driver adds the packet buffer on TX ring
- network driver does a kick which causes a VM exit
  [ mst: any way to mitigate # of VM exits here?
Possibly could be done on host side as well; see the sketch after
this list. ]
  [ markmc: All of our efforts there have been on the host side, I think
that's preferable than trying to do anything on the guest side. ]

- Full queue:
we keep a single extra skb around:
if we fail to transmit, we queue it
[ mst: idea to try: what does it do to
  performance if we queue more packets? ]
if we already have 1 outstanding packet,
we stop the queue and discard the new packet
[ mst: optimization idea: might be better to discard the old
  packet and queue the new one, e.g. with TCP old one
  might have timed out already ]
[ markmc: the queue might soon be going away:
   200905292346.04815.ru...@rustcorp.com.au
   
http://archive.netbsd.se/?ml=linux-netdev&a=2009-05&m=10788575
]

- We get each buffer from host as it is completed and free it
- TX interrupts are only enabled when queue is stopped,
  and when it is originally created (we disable them on completion)
  [ mst: idea: second part is probably unintentional.
todo: we probably should disable interrupts when device is created. 
]
- We poll for buffer completions:
  1. Before each TX 2. On a timer tasklet (unless 3 is supported)
  3. When host sends us interrupt telling us that the queue is empty
  [ mst: idea to try: instead of empty, enable send interrupts on xmit 
when
buffer is almost full (e.g. at least half empty): we are running 
out of
buffers, it's important to free them ASAP. Can be done
from host or from guest. ]
  [ Rusty proposing that we don't need (2) or (3) if the skbs are 
orphaned
before start_xmit(). See subj "net: skb_orphan on 
dev_hard_start_xmit".]
  [ rusty also seems to be suggesting that disabling 
VIRTIO_F_NOTIFY_ON_EMPTY
on the host should help the case where the host out-paces the guest
  ]
  4. when queue is stopped or when first packet was sent after device
 was created (interrupts are enabled then)
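
For illustration, a rough sketch of the guest-side kick suppression that
any exit-mitigation scheme would build on: the host sets
VRING_USED_F_NO_NOTIFY in the used ring while it is already polling, and
the guest checks the flag before kicking.  This mirrors what
drivers/virtio/virtio_ring.c does today; names are roughly as in that
file:

	static void maybe_kick(struct vring_virtqueue *vq)
	{
		/* the host clears this flag when it wants notifications */
		if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY))
			vq->notify(&vq->vq);	/* pio write -> VM exit */
	}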


RX:
- There are really 2 mostly separate code paths: with mergeable
  rx buffers support in host and without. I focus on mergeable
  buffers here since this is the default in recent qemu.
  [mst: optimization idea: mark mergeable_rx_bufs as likely() then?]
- Each skb has a 128 byte buffer at head and a single page for data.
  Only full pages are passed to virtio buffers.
  [ mst: for large packets, managing the 128 head buffers is w

Re: NV-CUDA: a new way in virtualization is possible?

2009-06-04 Thread Pantelis Koukousoulas
> Would it be possible to use this technology in the KVM/Qemu project to
> achieve better performance?
> Could it be a significant step forward for virtualization technology?

Nothing is "impossible", but it is at least not obvious how to pull off
such a trick. Qemu/KVM is not "embarrassingly parallelizable", at least
not in a straightforward way, IMHO.

> Someone has, experimentally, rewritten the md-raid kernel modules using
> the CUDA framework to accelerate some features... and it seems to work
> fine.
> Why not do the same for KVM/Qemu or related projects, including
> kernel/user-space extensions?

RAID is "easy", as are FFT, graphics operations, cryptography, etc. People
had been parallelizing these algorithms for years before nvidia even
existed; CUDA is just a new backend for applying more or less the same
techniques.

KVM/Qemu on the other hand are not 100% CPU bound and are also not
trivial to massively parallelize, so you might find the task a bit hard.

HTH,
Pantelis


NV-CUDA: a new way in virtualization is possible?

2009-06-04 Thread OneSoul

Hello all!

I have been a KVM/Qemu user for a long time and I am very satisfied with
its flexibility, power and portability - really a good project!


Recently, reading some technical articles on the internet, I discovered
the big potential of the NV-CUDA framework for scientific and graphics
computing, which takes strong advantage of the most recent GPUs. Some
people have used it for password recovery, realtime rendering, etc., with
great results.


Would it be possible to use this technology in the KVM/Qemu project to
achieve better performance?
Could it be a significant step forward for virtualization technology?


Someone has, experimentally, rewritten the md-raid kernel modules using
the CUDA framework to accelerate some features... and it seems to work
fine.
Why not do the same for KVM/Qemu or related projects, including
kernel/user-space extensions?


What do you think about this draft idea?

Any feedback is welcome...


[ kvm-Bugs-2801212 ] sles10sp2 guest timer run too fast

2009-06-04 Thread SourceForge.net
Bugs item #2801212, was opened at 2009-06-04 08:17
Message generated for change (Tracker Item Submitted) made by jiajun
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: sles10sp2 guest timer run too fast 

Initial Comment:
With kvm.git Commit:7ff90748cebbfbafc8cfa6bdd633113cd9537789
qemu-kvm Commit:a1cd3c985c848dae73966f9601f15fbcade72f1, we found that the
sles10sp2 guest clock runs much faster than real time, gaining about 27s
over each 60s of real time.

Reproduce steps:

(1)qemu-system-x86_64  -m 1024 -net nic,macaddr=00:16:3e:6f:f3:d1,model=rtl8139 
-net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/sles10sp2.img
(2)Run ntpdate in guest: ntpdate sync_machine_ip && sleep 60 && ntpdate 
sync_machine_ip

Current result:

sles10sp2rc1-guest:~ #  ntpdate sync_machine_ip && sleep 60 && ntpdate 
sync_machine_ip
31 May 23:16:59 ntpdate[3303]: step time server 192.168.198.248 offset
-61.27418
31 May 23:17:32 ntpdate[3305]: step time server 192.168.198.248 offset
-27.626469 sec

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599


Re: [PATCH 0/4] apic/ioapic kvm free implementation

2009-06-04 Thread Gleb Natapov
On Wed, Jun 03, 2009 at 05:19:26PM -0400, Glauber Costa wrote:
> Same thing,
> 
> addressing comments from gleb.
> 
> 
Jan, can you run your test on this one? It differs from the previous one
in halt handling.

--
Gleb.


Re: [KVM PATCH v2 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

(Applies to kvm.git/master:25deed73)

Please see the header for 2/2 for a description.  This patch series has been
fully tested and appears to be working correctly.

  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Andi Kleen
On Thu, Jun 04, 2009 at 04:49:50PM +0300, Avi Kivity wrote:
> Andi Kleen wrote:
> >>There's no good place as it breaks the nice exit handler table.  You 
> >>could put it in vmx_complete_interrupts() next to NMI handling.
> >>
> >
> >I think I came up with an easy, cheesy, but not too bad solution now that 
> >should work. It simply remembers the CPU in the vcpu structure and 
> >schedules back to it. That's fine for this purpose. 
> >  
> 
> We might not be able to schedule back in a timely manner.  Why not hack 
> vmx_complete_interrupts()?  You're still in the critical section so 
> you're guaranteed no delays or surprises.

Yes, I'll have to do that. My original scheme was too risky because
the machine check code has synchronization mechanisms now and
preemption has no time limit.

I'll hack on it later today; hopefully I'll have a patch tomorrow.

-Andi


-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Avi Kivity

Andi Kleen wrote:
There's no good place as it breaks the nice exit handler table.  You 
could put it in vmx_complete_interrupts() next to NMI handling.



I think I came up with an easy, cheesy, but not too bad solution now that should 
work. It simply remembers the CPU in the vcpu structure and schedules back to 
it. That's fine for this purpose. 
  


We might not be able to schedule back in a timely manner.  Why not hack 
vmx_complete_interrupts()?  You're still in the critical section so 
you're guaranteed no delays or surprises.


--
error compiling committee.c: too many arguments to function



Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support

2009-06-04 Thread Mark McLoughlin
Hi Greg,

On Wed, 2009-06-03 at 18:04 -0400, Gregory Haskins wrote:
> Hi Mark,
>   So with the v5 release of iosignalfd, we now have the notion of a
> "trigger", the API of which is as follows:
> 
> ---
> /*!
>  * \brief Assign an eventfd to an IO port (PIO or MMIO)
>  *
>  * Assigns an eventfd based file-descriptor to a specific PIO or MMIO
>  * address range.  Any guest writes to the specified range will generate
>  * an eventfd signal.
>  *
>  * A data-match pointer can be optionally provided in "trigger" and only
>  * writes which match this value exactly will generate an event.  The length
>  * of the trigger is established by the length of the overall IO range, and
>  * therefore must be in a natural byte-width for the IO routines of your
>  * particular architecture (e.g. 1, 2, 4, or 8 bytes on x86_64).
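
To restate the data-match semantics in code - this is only my reading of
the description above, with made-up structure and field names, not the
actual patch code:

	/* guest wrote 'val' (length 'len') to the registered range */
	static void iosignalfd_write(struct _iosignalfd *p,
				     const void *val, int len)
	{
		if (p->has_trigger && memcmp(val, &p->trigger, len))
			return;			/* data-match miss: no event */

		eventfd_signal(p->eventfd, 1);	/* wake the fd's waiters */
	}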

This looks like it'll work fine for virtio-pci.

Thanks,
Mark.



Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Andi Kleen
On Thu, Jun 04, 2009 at 04:10:14PM +0300, Avi Kivity wrote:
> Andi Kleen wrote:
> >>vmcs accesses work because we have a preempt notifier called when we are 
> >>scheduled in, and will execute vmclear/vmptrld as necessary.  Look at 
> >>kvm_preempt_ops in virt/kvm_main.c.
> >>
> >
> >I see. So we need to move that check earlier. Do you have a preference
> >where it should be?
> >  
> 
> There's no good place as it breaks the nice exit handler table.  You 
> could put it in vmx_complete_interrupts() next to NMI handling.

I think I came up with an easy, cheesy, but not too bad solution now that should 
work. It simply remembers the CPU in the vcpu structure and schedules back to 
it. That's fine for this purpose. 

Currently testing the patch.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Avi Kivity

Andi Kleen wrote:
vmcs accesses work because we have a preempt notifier called when we are 
scheduled in, and will execute vmclear/vmptrld as necessary.  Look at 
kvm_preempt_ops in virt/kvm_main.c.



I see. So we need to move that check earlier. Do you have a preference
where it should be?
  


There's no good place as it breaks the nice exit handler table.  You 
could put it in vmx_complete_interrupts() next to NMI handling.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] qemu-kvm: Flush icache after dma operations for ia64

2009-06-04 Thread Jes Sorensen

Zhang, Xiantao wrote:

Hi, Jes
Have you verified whether it works for you?  You could run a kernel build in
the guest with 4 vcpus; if it completes successfully without any error, it
should be okay I think, otherwise we may need to investigate further. :)
Xiantao 


Hi Xiantao,

I was able to run a 16 vCPU guest and build the kernel using make -j 16.
How quickly would the problem show up for you?  On every run, or should I
run more tests?

Cheers,
Jes


Re: [PATCH] revert part of 3db8b916e merge

2009-06-04 Thread Avi Kivity

Gleb Natapov wrote:

kvm_*_mpstate() cannot be called from kvm_arch_*_registers(),
since kvm_arch_*_registers() is sometimes called from the io thread, but
kvm_*_mpstate() can be called only by the cpu thread.
  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH RFC] Do not use cpu_index in interface between libkvm and qemu

2009-06-04 Thread Avi Kivity

Gleb Natapov wrote:

On vcpu creation a cookie is returned, which is used in future communication.

  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Andi Kleen
On Thu, Jun 04, 2009 at 03:49:03PM +0300, Avi Kivity wrote:
> Andi Kleen wrote:
> >>This assumption is incorrect.  This code is executed after preemption 
> >>has been enabled, and we may have even slept before reaching it.
> >>
> >
> >The only thing that counts here is the context before the machine
> >check event. If there was a vmexit we know it was in guest context.
> >
> >The only requirement we have is that we're running still on the same
> >CPU. I assume that's true, otherwise the vmcb accesses wouldn't work?
> >  
> 
> It's not true, we're in preemptible context and may have even slept.
> 
> vmcs accesses work because we have a preempt notifier called when we are 
> scheduled in, and will execute vmclear/vmptrld as necessary.  Look at 
> kvm_preempt_ops in virt/kvm_main.c.

I see. So we need to move that check earlier. Do you have a preference
where it should be?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Avi Kivity

Andi Kleen wrote:
This assumption is incorrect.  This code is executed after preemption 
has been enabled, and we may have even slept before reaching it.



The only thing that counts here is the context before the machine
check event. If there was a vmexit we know it was in guest context.

The only requirement we have is that we're still running on the same
CPU. I assume that's true, otherwise the vmcb accesses wouldn't work?
  


It's not true, we're in preemptible context and may have even slept.

vmcs accesses work because we have a preempt notifier called when we are 
scheduled in, and will execute vmclear/vmptrld as necessary.  Look at 
kvm_preempt_ops in virt/kvm_main.c.



We get both an explicit EXIT_REASON and an exception?



These are different cases. The exception is #MC in guest context,
the EXIT_REASON is when a #MC happens while the CPU is executing
the VM entry microcode.
  


I see, thanks.

--
error compiling committee.c: too many arguments to function



[KVM PATCH v2 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl

2009-06-04 Thread Gregory Haskins
Assigning an irqfd object to a kvm object creates a relationship that we
currently manage by having the kvm object acquire/hold a file* reference to
the underlying eventfd.  The lifetime of these objects is properly maintained
by decoupling the two objects whenever the irqfd is closed or kvm is closed,
whichever comes first.

However, the irqfd "close" method is less than ideal since it requires two
system calls to complete (one for ioctl(kvmfd, IRQFD_DEASSIGN), the other for
close(eventfd)).  This dual-call approach was utilized because there was no
notification mechanism on the eventfd side at the time irqfd was implemented.

Recently, Davide proposed a patch to send a POLLHUP wakeup whenever an
eventfd is about to close.  So we eliminate the IRQFD_DEASSIGN ioctl (*)
vector in favor of sensing the deassign automatically when the fd is closed.
The resulting code is slightly more complex since we need to
allow either side to sever the relationship independently.  We utilize SRCU
to guarantee stable concurrent access to the KVM pointer without adding
additional atomic operations in the fast path.
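
The pattern, distilled (a sketch only; the real code is in the patch
below):

	/* reader (fast path): no extra atomics beyond srcu itself */
	idx = srcu_read_lock(&irqfd->srcu);
	kvm = rcu_dereference(irqfd->kvm);
	if (kvm) {
		/* ... inject through kvm ... */
	}
	srcu_read_unlock(&irqfd->srcu, idx);

	/* writer (disconnect): NULL the pointer, then wait out a grace
	 * period before dropping the reference readers may still hold */
	rcu_assign_pointer(irqfd->kvm, NULL);
	synchronize_srcu(&irqfd->srcu);
	kvm_put_kvm(kvm);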

At minimum, this design should be acked by both Davide and Paul (cc'd).

(*) The irqfd patch does not exist in any released tree, so the understanding
is that we can alter the irqfd specific ABI without taking the normal
precautions, such as CAP bits.

Signed-off-by: Gregory Haskins 
CC: Davide Libenzi 
CC: Michael S. Tsirkin 
CC: Paul E. McKenney 
---

 include/linux/kvm.h |2 -
 virt/kvm/eventfd.c  |  177 +++
 virt/kvm/kvm_main.c |3 +
 3 files changed, 81 insertions(+), 101 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 632a856..29b62cc 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -482,8 +482,6 @@ struct kvm_x86_mce {
 };
 #endif
 
-#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
-
 struct kvm_irqfd {
__u32 fd;
__u32 gsi;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f3f2ea1..004c660 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -37,39 +37,92 @@
  * 
  */
 struct _irqfd {
+   struct mutex  lock;
+   struct srcu_structsrcu;
struct kvm   *kvm;
int   gsi;
-   struct file  *file;
struct list_head  list;
poll_tablept;
wait_queue_head_t*wqh;
wait_queue_t  wait;
-   struct work_structwork;
+   struct work_structinject;
 };
 
 static void
 irqfd_inject(struct work_struct *work)
 {
-   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
-   struct kvm *kvm = irqfd->kvm;
+   struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+   struct kvm *kvm;
+   int idx;
+
+   idx = srcu_read_lock(&irqfd->srcu);
+
+   kvm = rcu_dereference(irqfd->kvm);
+   if (kvm) {
+   mutex_lock(&kvm->lock);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+   mutex_unlock(&kvm->lock);
+   }
+
+   srcu_read_unlock(&irqfd->srcu, idx);
+}
+
+static void
+irqfd_disconnect(struct _irqfd *irqfd)
+{
+   struct kvm *kvm;
+
+   mutex_lock(&irqfd->lock);
+
+   kvm = rcu_dereference(irqfd->kvm);
+   rcu_assign_pointer(irqfd->kvm, NULL);
+
+   mutex_unlock(&irqfd->lock);
+
+   if (!kvm)
+   return;
 
mutex_lock(&kvm->lock);
-   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
-   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+   list_del(&irqfd->list);
mutex_unlock(&kvm->lock);
+
+   /*
+* It is important to not drop the kvm reference until the next grace
+* period because there might be lockless references in flight up
+* until then
+*/
+   synchronize_srcu(&irqfd->srcu);
+   kvm_put_kvm(kvm);
 }
 
 static int
 irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
+   unsigned long flags = (unsigned long)key;
 
-   /*
-* The wake_up is called with interrupts disabled.  Therefore we need
-* to defer the IRQ injection until later since we need to acquire the
-* kvm->lock to do so.
-*/
-   schedule_work(&irqfd->work);
+   if (flags & POLLIN)
+   /*
+* The POLLIN wake_up is called with interrupts disabled.
+* Therefore we need to defer the IRQ injection until later
+* since we need to acquire the kvm->lock to do so.
+*/
+   schedule_work(&irqfd->inject);
+
+   if (flags & POLLHUP) {
+   /*
+ 

[KVM PATCH v2 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Gregory Haskins
(Applies to kvm.git/master:25deed73)

Please see the header for 2/2 for a description.  This patch series has been
fully tested and appears to be working correctly.

[Review notes:
  *) Paul has looked at the SRCU design and, to my knowledge, didn't find
 any holes.
  *) Michael, Avi, and myself agree that while the removal of the DEASSIGN
 vector is not desirable, the fix on close() is more important in
 the short-term.  We can always add DEASSIGN support again in the
 future with a CAP bit.
]

[Changelog:

  v2:
 *) Pulled in Davide's official patch for 1/2 from his submission
accepted into -mmotm.
 *) Fixed patch 2/2 to use the "key" field as a bitmap in the wakeup
logic, per Davide's feedback.

  v1:
 *) Initial release
]
  

---

Davide Libenzi (1):
  Allow waiters to be notified about the eventfd file* going away, and give

Gregory Haskins (1):
  kvm: use POLLHUP to close an irqfd instead of an explicit ioctl


 fs/eventfd.c|   10 +++
 include/linux/kvm.h |2 -
 virt/kvm/eventfd.c  |  177 +++
 virt/kvm/kvm_main.c |3 +
 4 files changed, 90 insertions(+), 102 deletions(-)

-- 
Signature


[KVM PATCH v2 1/2] Allow waiters to be notified about the eventfd file* going away, and give

2009-06-04 Thread Gregory Haskins
From: Davide Libenzi 

them a chance to unregister from the wait queue.  This in turn allows
eventfd users to use the eventfd file* w/out holding a live reference to
it.

After the eventfd user callback returns, any usage of the eventfd file*
should be dropped.  The eventfd user callback can acquire sleepy locks
since it is invoked lockless.

This is a feature, needed by KVM to avoid an awkward workaround when using
eventfd.

[gmh: pulled from -mmotm for inclusion in kvm.git]

Signed-off-by: Davide Libenzi 
Tested-by: Gregory Haskins 
Signed-off-by: Andrew Morton 
Signed-off-by: Gregory Haskins 
---

 fs/eventfd.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 3f0e197..72f5f8d 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -61,7 +61,15 @@ EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
-   kfree(file->private_data);
+   struct eventfd_ctx *ctx = file->private_data;
+
+   /*
+* No need to hold the lock here, since we are on the file cleanup
+* path and the ones still attached to the wait queue will be
+* serialized by wake_up_locked_poll().
+*/
+   wake_up_locked_poll(&ctx->wqh, POLLHUP);
+   kfree(ctx);
return 0;
 }
 



[PATCH] cleanup acpi table creation

2009-06-04 Thread Gleb Natapov
The current code is a mess, and the addition of acpi tables is broken.

Signed-off-by: Gleb Natapov 
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 369cbef..fda4894 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1293,15 +1293,13 @@ struct rsdp_descriptor /* Root System 
Descriptor Pointer */
uint8_treserved [3];   /* Reserved 
field must be 0 */
 } __attribute__((__packed__));
 
-#define MAX_RSDT_ENTRIES 100
-
 /*
  * ACPI 1.0 Root System Description Table (RSDT)
  */
 struct rsdt_descriptor_rev1
 {
ACPI_TABLE_HEADER_DEF   /* ACPI common table 
header */
-   uint32_t table_offset_entry 
[MAX_RSDT_ENTRIES]; /* Array of pointers to other */
+   uint32_t table_offset_entry [0]; /* Array 
of pointers to other */
 /* ACPI tables */
 } __attribute__((__packed__));
 
@@ -1585,324 +1583,332 @@ static void acpi_build_srat_memory(struct 
srat_memory_affinity *numamem,
  return;
 }
 
-/* base_addr must be a multiple of 4KB */
-void acpi_bios_init(void)
+static void rsdp_build(struct rsdp_descriptor *rsdp, uint32_t rsdt)
 {
-struct rsdp_descriptor *rsdp;
-struct rsdt_descriptor_rev1 *rsdt;
-struct fadt_descriptor_rev1 *fadt;
-struct facs_descriptor_rev1 *facs;
-struct multiple_apic_table *madt;
-uint8_t *dsdt, *ssdt;
+ memset(rsdp, 0, sizeof(*rsdp));
+ memcpy(rsdp->signature, "RSD PTR ", 8);
 #ifdef BX_QEMU
-struct system_resource_affinity_table *srat;
-struct acpi_20_hpet *hpet;
-uint32_t hpet_addr;
-#endif
-uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, 
ssdt_addr;
-uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size;
-uint32_t srat_addr,srat_size;
-uint16_t i, external_tables;
-int nb_numa_nodes;
-int nb_rsdt_entries = 0;
-
-/* reserve memory space for tables */
-#ifdef BX_USE_EBDA_TABLES
-ebda_cur_addr = align(ebda_cur_addr, 16);
-rsdp = (void *)(ebda_cur_addr);
-ebda_cur_addr += sizeof(*rsdp);
+ memcpy(rsdp->oem_id, "QEMU  ", 6);
 #else
-bios_table_cur_addr = align(bios_table_cur_addr, 16);
-rsdp = (void *)(bios_table_cur_addr);
-bios_table_cur_addr += sizeof(*rsdp);
+ memcpy(rsdp->oem_id, "BOCHS ", 6);
 #endif
+ rsdp->rsdt_physical_address = rsdt;
+ rsdp->checksum = acpi_checksum((void*)rsdp, 20);
+}
 
-#ifdef BX_QEMU
-external_tables = acpi_additional_tables();
-#else
-external_tables = 0;
-#endif
+static uint32_t facs_build(uint32_t *addr)
+{
+ struct facs_descriptor_rev1 *facs;
 
-addr = base_addr = ram_size - ACPI_DATA_SIZE;
-rsdt_addr = addr;
-rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
-addr += rsdt_size;
+ *addr = (*addr + 63) & ~63; /* 64 byte alignment for FACS */
+ facs = (void*)(*addr);
+ *addr += sizeof(*facs);
 
-fadt_addr = addr;
-fadt = (void *)(addr);
-addr += sizeof(*fadt);
+ memset(facs, 0, sizeof(*facs));
+ memcpy(facs->signature, "FACS", 4);
+ facs->length = cpu_to_le32(sizeof(*facs));
+ BX_INFO("Firmware waking vector %p\n", &facs->firmware_waking_vector);
 
-/* XXX: FACS should be in RAM */
-addr = (addr + 63) & ~63; /* 64 byte alignment for FACS */
-facs_addr = addr;
-facs = (void *)(addr);
-addr += sizeof(*facs);
+ return (uint32_t)facs;
+}
 
-dsdt_addr = addr;
-dsdt = (void *)(addr);
-addr += sizeof(AmlCode);
+static uint32_t dsdt_build(uint32_t *addr)
+{
+ uint8_t *dsdt = (void*)(*addr);
 
-#ifdef BX_QEMU
-qemu_cfg_select(QEMU_CFG_NUMA);
-nb_numa_nodes = qemu_cfg_get64();
+ *addr += sizeof(AmlCode);
+
+ memcpy(dsdt, AmlCode, sizeof(AmlCode));
+
+ return (uint32_t)dsdt;
+}
+
+static uint32_t fadt_build(uint32_t *addr, uint32_t facs, uint32_t dsdt)
+{
+ struct fadt_descriptor_rev1 *fadt = (void*)(*addr);
+
+ *addr += sizeof(*fadt);
+ memset(fadt, 0, sizeof(*fadt));
+ fadt->firmware_ctrl = facs;
+ fadt->dsdt = dsdt;
+ fadt->model = 1;
+ fadt->reserved1 = 0;
+ fadt->sci_int = cpu_to_le16(pm_sci_int);
+ fadt->smi_cmd = cpu_to_le32(SMI_CMD_IO_ADDR);
+ fadt->acpi_enable = 0xf1;
+ fadt->acpi_disable = 0xf0;
+ fadt->pm1a_evt_blk = cpu_to_le32(pm_io_base);
+ fadt->pm1a_cnt_blk = cpu_to_le32(pm_io_base + 0x04);
+ fadt->pm_tmr_blk = cpu_to_le32(pm_io_base + 0x08);
+ fadt->pm1_evt_len = 4;
+ fadt->pm1_cnt_len = 2;
+ fadt->pm_tmr_len = 4;
+ fadt->plvl2_lat = cpu_to_le16(0xfff); // C2 state not supported
+ fadt->plvl3_lat = cpu_to_le16(0xfff); // C3 state not supported
+ fadt->gpe0_blk = cpu_to_le32(0xafe0);
+ fadt->gpe0_blk_len = 4;
+ /* WBINVD + PROC_C1 + SLP_BUTTON + FIX_RTC */
+ fadt->flags = cpu_to_le32((1 << 0) | (1 << 2) | (1 << 5) | (1 << 6));
+ acpi_build_table_header((struct acpi_table_header *)fadt, "

Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Andi Kleen
On Thu, Jun 04, 2009 at 02:48:17PM +0300, Avi Kivity wrote:
> Andi Kleen wrote:
> >[Avi could you please still consider this patch for your 2.6.31 patchqueue?
> >It's fairly simple, but important to handle memory errors in guests]
> >  
> 
> Oh yes, and it'll be needed for -stable.  IIUC, right now a machine 
> check is trapped by the guest, so the guest is killed instead of the host?

Yes, the guest will receive int 18.

But it will not kill itself because the guest cannot access the machine check
MSRs, so it will not see any machine check. So it's kind of ignored,
which is pretty bad.

> 
> >+/*
> >+ * Trigger machine check on the host. We assume all the MSRs are already 
> >set up
> >+ * by the CPU and that we still run on the same CPU as the MCE occurred 
> >on.
> >+ * We pass a fake environment to the machine check handler because we want
> >+ * the guest to be always treated like user space, no matter what context
> >+ * it used internally.
> >+ */
> >  
> 
> This assumption is incorrect.  This code is executed after preemption 
> has been enabled, and we may have even slept before reaching it.

The only thing that counts here is the context before the machine
check event. If there was a vmexit we know it was in guest context.

The only requirement we have is that we're still running on the same
CPU. I assume that's true, otherwise the vmcb accesses wouldn't work?

> > [EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
> >+[EXIT_REASON_MACHINE_CHECK]   = handle_machine_check,
> > };
> > 
> > static const int kvm_vmx_max_exit_handlers =
> >  
> 
> We get both an explicit EXIT_REASON and an exception?

These are different cases. The exception is #MC in guest context,
the EXIT_REASON is when a #MC happens while the CPU is executing
the VM entry microcode.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [KVM PATCH v5 0/2] iosignalfd

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

Marcelo, Avi, and I have previously agreed that Marcelo's
mmio-locking cleanup should go in first.  When that happens, I will
need to rebase this series because it changes how you interface to the
io_bus code.  I should have mentioned that here, but forgot.  (Speaking
of which, is there an ETA for when that code will be merged, Avi?)
  


I had issues with the unbalanced locking the patchset introduced in
coalesced_mmio; once these are resolved the patchset will be merged.


--
error compiling committee.c: too many arguments to function



Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Bharata B Rao wrote:

2. Need for hard limiting CPU resource
--
- Pay-per-use: In enterprise systems that cater to multiple clients/customers
  where a customer demands a certain share of CPU resources and pays only
  that, CPU hard limits will be useful to hard limit the customer's job
  to consume only the specified amount of CPU resource.
- In container based virtualization environments running multiple containers,
  hard limits will be useful to ensure a container doesn't exceed its
  CPU entitlement.
- Hard limits can be used to provide guarantees.
  

How can hard limits provide guarantees?

Let's take an example where I have 1 group that I wish to guarantee a 
20% share of the cpu, and another 8 groups with no limits or guarantees.


One way to achieve the guarantee is to hard limit each of the 8 other 
groups to 10%; the sum total of the limits is 80%, leaving 20% for the 
guarantee group. The downside is the arbitrary limit imposed on the 
other groups.
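
Spelling out the arithmetic (a throwaway user-space sketch, using the
hypothetical numbers from the example above):

	#include <stdio.h>

	int main(void)
	{
		int guarantee = 20;  /* % reserved for the guaranteed group */
		int others = 8;      /* groups with no requirements of their own */
		int limit = (100 - guarantee) / others;  /* 80 / 8 = 10% each */

		printf("per-group hard limit: %d%%\n", limit);
		printf("left for the guaranteed group: %d%%\n",
		       100 - limit * others);
		return 0;
	}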


Another way is to place the 8 groups in a container group, and limit 
that to 80%. But that doesn't work if I want to provide guarantees to 
several groups.


--
error compiling committee.c: too many arguments to function



Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

Since Paul ok'd (I think?) the srcu design, and the only other feedback
was the key-bitmap thing from Davide, I will go ahead and push a v2 with
just that one fix (unless there is any other feedback?)
  


I'll do a detailed review on your next posting.  When I see a long 
thread I go hide under the bed, where there is no Internet access.


--
error compiling committee.c: too many arguments to function



Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Gregory Haskins
Avi Kivity wrote:
> Gregory Haskins wrote:
>>> I agree that deassign is needed for reasons of symmetry, and that it
>>> can be added later.
>>>
>>> 
>> Cool.
>>
>> FYI: Davide's patch has been accepted into -mm (Andrew CC'd).  I am not
>> sure of the protocol here, but I assume this means you can now safely
>> pull it from -mm into kvm.git so the prerequisite for 2/2 is properly
>> met.
>>   
>
> I'm not sure either.
>
> But I think I saw a "Thanks for catching that" for 2/2?
>
Ah, right!  I queued that fix up eons ago after Davide's feedback and
forgot that it was there waiting for me ;)

Since Paul ok'd (I think?) the srcu design, and the only other feedback
was the key-bitmap thing from Davide, I will go ahead and push a v2 with
just that one fix (unless there is any other feedback?)

-Greg





Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

I agree that deassign is needed for reasons of symmetry, and that it
can be added later.



Cool.

FYI: Davide's patch has been accepted into -mm (Andrew CC'd).  I am not
sure of the protocol here, but I assume this means you can now safely
pull it from -mm into kvm.git so the prerequisite for 2/2 is properly met.
  


I'm not sure either.

But I think I saw a "Thanks for catching that" for 2/2?

--
error compiling committee.c: too many arguments to function



Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Avi Kivity

Andi Kleen wrote:

[Avi could you please still consider this patch for your 2.6.31 patchqueue?
It's fairly simple, but important to handle memory errors in guests]
  


Oh yes, and it'll be needed for -stable.  IIUC, right now a machine 
check is trapped by the guest, so the guest is killed instead of the host?



+/*
+ * Trigger machine check on the host. We assume all the MSRs are already set up
+ * by the CPU and that we still run on the same CPU as the MCE occurred on.
+ * We pass a fake environment to the machine check handler because we want
+ * the guest to be always treated like user space, no matter what context
+ * it used internally.
+ */
  


This assumption is incorrect.  This code is executed after preemption 
has been enabled, and we may have even slept before reaching it.


NMI suffers from the same issue, see vmx_complete_interrupts().  You 
could handle it the same way.
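
"The same way" would look roughly like this - a sketch only, with
hypothetical helper names, not tested:

	static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
	{
		u32 exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);

		/* Reflect the #MC into the host before preemption is
		 * re-enabled, so we are still on the CPU that took the
		 * exit.  is_machine_check() is a hypothetical helper
		 * testing the vector/type bits of exit_intr_info. */
		if (is_machine_check(exit_intr_info))
			kvm_machine_check();

		/* ... existing NMI/interrupt bookkeeping follows ... */
	}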



@@ -3150,6 +3171,7 @@
[EXIT_REASON_WBINVD]  = handle_wbinvd,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,
[EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
+   [EXIT_REASON_MACHINE_CHECK]   = handle_machine_check,
 };
 
 static const int kvm_vmx_max_exit_handlers =
  


We get both an explicit EXIT_REASON and an exception?

--
error compiling committee.c: too many arguments to function



Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Gregory Haskins
Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote:
>>  
 And having close not clean up the state unless you do an ioctl first
 is very messy IMO - I don't think you'll find any such examples in the
 kernel.

 
>>> I agree, and that is why I am advocating this POLLHUP solution.  It was
>>> only this other way to begin with because the technology didn't exist
>>> until Davide showed me the light.
>>>
>>> Problem with your request is that I already looked into what is
>>> essentially a bi-directional reference problem (for a different reason)
>>> when I started the POLLHUP series.  It's messy to do this in a way that
>>> doesn't negatively impact the fast path (introducing locking, etc) or
>>> make my head explode making sure it doesn't race.  Afaict, we would
>>> need
>>> to solve this problem to do what you are proposing (patches welcome).
>>>
>>> If this hybrid decoupled-deassign + unified-close is indeed an
>>> important
>>> feature set, I suggest that we still consider this POLLHUP series for
>>> inclusion, and then someone can re-introduce DEASSIGN support in the
>>> future as a CAP bit extension.  That way we at least get the desirable
>>> close() properties that we both seem in favor of, and get this advanced
>>> use case when we need it (and can figure out the locking design).
>>>
>>> 
>>
>> FWIW, I took a look and yes, it is non-trivial.
>> I concur, we can always add the deassign ioctl later.
>>   
>
> I agree that deassign is needed for reasons of symmetry, and that it
> can be added later.
>
Cool.

FYI: Davide's patch has been accepted into -mm (Andrew CC'd).  I am not
sure of the protocol here, but I assume this means you can now safely
pull it from -mm into kvm.git so the prerequisite for 2/2 is properly met.

-Greg



signature.asc
Description: OpenPGP digital signature


[PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Andi Kleen

[Avi could you please still consider this patch for your 2.6.31 patchqueue?
It's fairly simple, but important to handle memory errors in guests]

VT-x needs an explicit MC vector intercept to handle machine checks in the
hypervisor.

It also has a special option to catch machine checks that happen
during VT entry.

Add these intercepts and forward the machine checks to the Linux machine
check handler. Make it always look like user space was interrupted,
because the machine check handler treats kernel and user space differently.

Thanks to Huang Ying and Jiang Yunhong for help and testing.

Cc: ying.hu...@intel.com
Signed-off-by: Andi Kleen 

---
 arch/x86/include/asm/vmx.h |1 +
 arch/x86/kvm/vmx.c |   26 --
 2 files changed, 25 insertions(+), 2 deletions(-)

Index: linux/arch/x86/include/asm/vmx.h
===
--- linux.orig/arch/x86/include/asm/vmx.h   2009-05-28 10:47:53.0 
+0200
+++ linux/arch/x86/include/asm/vmx.h2009-06-04 11:58:49.0 +0200
@@ -247,6 +247,7 @@
 #define EXIT_REASON_MSR_READ31
 #define EXIT_REASON_MSR_WRITE   32
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_MACHINE_CHECK  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS 44
 #define EXIT_REASON_EPT_VIOLATION   48
Index: linux/arch/x86/kvm/vmx.c
===
--- linux.orig/arch/x86/kvm/vmx.c   2009-05-28 10:47:53.0 +0200
+++ linux/arch/x86/kvm/vmx.c2009-06-04 12:05:44.0 +0200
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
 
@@ -478,7 +479,7 @@
 {
u32 eb;
 
-   eb = (1u << PF_VECTOR) | (1u << UD_VECTOR);
+   eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR);
if (!vcpu->fpu_active)
eb |= 1u << NM_VECTOR;
if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
@@ -2585,6 +2586,23 @@
return 0;
 }
 
+/*
+ * Trigger machine check on the host. We assume all the MSRs are already set up
+ * by the CPU and that we still run on the same CPU as the MCE occurred on.
+ * We pass a fake environment to the machine check handler because we want
+ * the guest to be always treated like user space, no matter what context
+ * it used internally.
+ */
+static int handle_machine_check(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+   struct pt_regs regs = {
+   .cs = 3, /* Fake ring 3 no matter what the guest ran on */
+   .flags = X86_EFLAGS_IF,
+   };
+   do_machine_check(®s, 0);
+   return 1;
+}
+
 static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2596,6 +2614,10 @@
vect_info = vmx->idt_vectoring_info;
intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
 
+   ex_no = intr_info & INTR_INFO_VECTOR_MASK;
+   if (ex_no == MCE_VECTOR)
+   return handle_machine_check(vcpu, kvm_run);
+
if ((vect_info & VECTORING_INFO_VALID_MASK) &&
!is_page_fault(intr_info))
printk(KERN_ERR "%s: unexpected, vectoring info 0x%x "
@@ -2648,7 +2670,6 @@
return 1;
}
 
-   ex_no = intr_info & INTR_INFO_VECTOR_MASK;
switch (ex_no) {
case DB_VECTOR:
dr6 = vmcs_readl(EXIT_QUALIFICATION);
@@ -3150,6 +3171,7 @@
[EXIT_REASON_WBINVD]  = handle_wbinvd,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,
[EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
+   [EXIT_REASON_MACHINE_CHECK]   = handle_machine_check,
 };
 
 static const int kvm_vmx_max_exit_handlers =


[PATCH] [1/2] x86: MCE: Define MCE_VECTOR

2009-06-04 Thread Andi Kleen

[This patch is already in the "mce3" branch of -tip, but I'm including
it here because it's needed for the next patch.]

Signed-off-by: Andi Kleen 

---
 arch/x86/include/asm/irq_vectors.h |1 +
 1 file changed, 1 insertion(+)

Index: linux/arch/x86/include/asm/irq_vectors.h
===================================================================
--- linux.orig/arch/x86/include/asm/irq_vectors.h	2009-05-27 21:48:38.0 +0200
+++ linux/arch/x86/include/asm/irq_vectors.h	2009-05-27 21:48:38.0 +0200
@@ -25,6 +25,7 @@
  */
 
 #define NMI_VECTOR 0x02
+#define MCE_VECTOR 0x12
 
 /*
  * IDT vectors usable for external interrupt sources start


Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote:
  

And having close not clean up the state unless you do an ioctl first is
very messy IMO - I don't think you'll find any such examples in kernel.

  
  

I agree, and that is why I am advocating this POLLHUP solution.  It was
only this other way to begin with because the technology didn't exist
until Davide showed me the light.

Problem with your request is that I already looked into what is
essentially a bi-directional reference problem (for a different reason)
when I started the POLLHUP series.  Its messy to do this in a way that
doesn't negatively impact the fast path (introducing locking, etc) or
make my head explode making sure it doesn't race.  Afaict, we would need
to solve this problem to do what you are proposing (patches welcome).

If this hybrid decoupled-deassign + unified-close is indeed an important
feature set, I suggest that we still consider this POLLHUP series for
inclusion, and then someone can re-introduce DEASSIGN support in the
future as a CAP bit extension.  That way we at least get the desirable
close() properties that we both seem in favor of, and get this advanced
use case when we need it (and can figure out the locking design).




FWIW, I took a look and yes, it is non-trivial.
I concur, we can always add the deassign ioctl later.
  


I agree that deassign is needed for reasons of symmetry, and that it can 
be added later.
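
[For reference, a minimal sketch of the POLLHUP cleanup pattern under
discussion.  The names (struct _irqfd, irqfd_wakeup, irqfd_shutdown)
follow the irqfd series, and the kvm-specific calls are elided to
comments, so treat this as an illustration of the wakeup path rather
than the actual code under review:

#include <linux/kernel.h>
#include <linux/wait.h>
#include <linux/poll.h>
#include <linux/workqueue.h>

struct _irqfd {
	wait_queue_t		wait;		/* registered on the eventfd's waitqueue */
	struct work_struct	shutdown;	/* deferred teardown, process context */
	/* ... kvm back-pointer, gsi, list linkage elided ... */
};

/* Runs from a workqueue, where sleeping locks and kfree() are safe. */
static void irqfd_shutdown(struct work_struct *work)
{
	struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);

	/* ... unhook from the kvm instance, drop references, free irqfd ... */
}

/*
 * Invoked for every wakeup on the eventfd (registered via
 * init_waitqueue_func_entry() when the irqfd is assigned).  With
 * Davide's patch, eventfd_release() does a final wakeup with POLLHUP
 * in @key, which is how close() reaches us without a deassign ioctl.
 */
static int irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
	unsigned long flags = (unsigned long)key;

	if (flags & POLLIN) {
		/* fast path: inject the interrupt; no sleeping locks here */
	}
	if (flags & POLLHUP) {
		/* wakeup callbacks cannot sleep, so defer the teardown */
		schedule_work(&irqfd->shutdown);
	}
	return 0;
}

The point of the pattern is that the fast path stays lock-free, while
everything that might sleep is pushed out to the shutdown work item.]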


--
error compiling committee.c: too many arguments to function



Re: [PATCH v8] qemu-kvm: add irqfd support

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

irqfd lets you create an eventfd-based file descriptor to inject interrupts
into a kvm guest.  We associate one gsi per fd for fine-grained routing.

[note: this is meant to work in conjunction with the POLLHUP version of
 irqfd, which has not yet been accepted into kvm.git]
  


Applied with two changes: added a dependency on CONFIG_eventfd (with the 
kvm external module, you can have irqfd support without eventfd 
support), and adjusted for the new libkvm location (libkvm-all.[ch]).
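
[For reference, a minimal userspace sketch of the resulting usage,
assuming the KVM_IRQFD ioctl and struct kvm_irqfd layout that were
eventually merged; the patch under discussion may differ in detail:

#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/* Bind an eventfd to a guest gsi; returns the eventfd, or -1 on error. */
static int attach_irqfd(int vmfd, int gsi)
{
	struct kvm_irqfd assign;
	int efd = eventfd(0, 0);

	if (efd < 0)
		return -1;

	memset(&assign, 0, sizeof(assign));
	assign.fd  = efd;
	assign.gsi = gsi;
	if (ioctl(vmfd, KVM_IRQFD, &assign) < 0) {
		close(efd);
		return -1;
	}
	return efd;
}

/* Each counter increment injects one interrupt on the bound gsi. */
static void inject_irq(int efd)
{
	uint64_t one = 1;
	write(efd, &one, sizeof(one));
}

With the POLLHUP notification in place, a plain close() of the eventfd
is then enough to tear the binding down again, with no explicit
deassign step.]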


--
error compiling committee.c: too many arguments to function



Re: KVM-86 not exposing 64 bits CPU anymore, NICE

2009-06-04 Thread Mark McLoughlin
On Thu, 2009-06-04 at 09:20 +0200, Gilles PIETRI wrote:

> I'm quite pissed off. I just upgraded to kvm-86 on a host that has 
> worked nicely on kvm-78 for quite some time. But since I was fearing the 
> qcow2 corruption issues, I wanted to upgrade kvm-86. After testing the 
> performance, I decided to switch. How stupid that was. That was really 
> putting too much trust in KVM.

Jim has already responded with details on the fix for the particular
issue, but speaking more generally ...

The kvm-XX releases are snapshots of the development tree. They do not
go through the kind of stabilisation cycle you would expect from a
new kernel release, for example.

If you want a KVM version you can trust, use the latest qemu-kvm-0.x.y
release with the stock version of kvm.ko that comes with your kernel or,
if you need particular new features, the latest kvm-kmod-2.6.z release.

Cheers,
Mark.



Re: KVM-86 not exposing 64 bits CPU anymore, NICE

2009-06-04 Thread Gilles PIETRI

On 04/06/2009 09:46, Jim Paris wrote:

Gilles PIETRI wrote:

Hi,

I'm quite pissed off. I just upgraded to kvm-86 on a host that has  
worked nicely on kvm-78 for quite some time. But since I was fearing the  
qcow2 corruption issues, I wanted to upgrade kvm-86. After testing the  
performance, I decided to switch. How stupid that was. That was really  
putting too much trust in KVM.


Now I can't have 64 bits CPUs on my guests.
My host is running a 2.6.27.7 kernel, and is x86_64 enabled.
Until the upgrade, guests were running x86_64 fine.
Now, it says long mode can't be used or something like that, and I can  
only have 32 bits guests.


Please see
  http://www.mail-archive.com/kvm@vger.kernel.org/msg15757.html
  http://www.mail-archive.com/kvm@vger.kernel.org/msg15769.html

-jim


Gonna check that, thanks a lot, this didn't get on my radar..

Regards,

Gilles


Re: KVM-86 not exposing 64 bits CPU anymore, NICE

2009-06-04 Thread Alexey Eromenko

- "Gilles PIETRI"  wrote:

> Hi,
> 
> I'm quite pissed off. I just upgraded to kvm-86 on a host that has 
> worked nicely on kvm-78 for quite some time. But since I was fearing
> the 
> qcow2 corruption issues, I wanted to upgrade kvm-86. After testing the
> 
> performance, I decided to switch. How stupid that was. That was really
> 
> putting too much trust in KVM.
> 
> Now I can't have 64 bits CPUs on my guests.
> My host is running a 2.6.27.7 kernel, and is x86_64 enabled.
> Until the upgrade, guests were running x86_64 fine.
> Now, it says long mode can't be used or something like that, and I can
> 
> only have 32 bits guests.
> 
> Looks really like the bug explained here: 
> http://www.mail-archive.com/kvm@vger.kernel.org/msg09431.html
> 
> If I use -no-kvm, it works, but obviously, I want to be able to have
> kvm 
> support enabled.
> 
> Now, I really am happy about this upgrade, and I'm gonna have to roll
> it 
> back. I really would appreciate some help on this..
> 
> Gilles

Hi Gilles,

What you are saying is very strange, because KVM-Autotest has passed all
tests for the KVM-86 release, and I can say that 64-bit guests work here
(both Intel & AMD, on RHEL 5.3/x64).

-Alexey


  1   2   >