Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-25 Thread Avi Kivity
On 09/24/2012 06:21 PM, Avi Kivity wrote:
 On 09/24/2012 06:13 PM, Peter Zijlstra wrote:
  On Mon, 2012-09-24 at 18:10 +0200, Avi Kivity wrote:
   Its also still a LAPIC write -- disguised as an MSR though :/
  
  It's probably a whole lot faster though. 
  
  I've been told its not, I haven't tried it.

 I'll see if I can find a machine with it (don't see it on my Westmere,
 it's probably on one of the Bridges.  Or maybe the other Peter knows.



Before measuring TSC_DEADLINE, I measured writing to TMICT, it costs 32
cycles.  This is on a Sandy Bridge.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Takuya Yoshikawa
On Fri, 21 Sep 2012 23:15:40 +0530
Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

  How about doing cond_resched() instead?
 
  Actually, an actual call to yield() may be better.
 
  That will set scheduler hints to make the scheduler pick
  another task for one round, while preserving this task's
  top position in the runqueue.
 
 I am not a scheduler expert, but I am also inclined towards
 Rik's suggestion here since we set skip buddy here. Takuya?
 

Yes, I think it's better.
But I hope that experts in Cc will suggest the best way.

Thanks,
Takuya
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Avi Kivity
On 09/21/2012 03:00 PM, Raghavendra K T wrote:
 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 
 When PLE handler fails to find a better candidate to yield_to, it
 goes back and does spin again. This is acceptable when we do not
 have overcommit.
 But in overcommitted scenarios (especially when we have large
 number of small guests), it is better to yield.
 
 Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  virt/kvm/kvm_main.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 8323685..713b677 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1660,6 +1660,10 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   }
   }
   }
 + /* In overcommitted cases, yield instead of spinning */
 + if (!yielded  rq_nr_running()  1)
 + schedule();
 +

I think this is a no-op these (CFS) days.  To get schedule() to do
anything, you need to wake up a task, or let time pass, or block.
Otherwise it will see that nothing has changed and as far as it's
concerned you're still the best task to be running (otherwise it
wouldn't have picked you in the first place).


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 17:26 +0200, Avi Kivity wrote:
 I think this is a no-op these (CFS) days.  To get schedule() to do
 anything, you need to wake up a task, or let time pass, or block.
 Otherwise it will see that nothing has changed and as far as it's
 concerned you're still the best task to be running (otherwise it
 wouldn't have picked you in the first place). 

Time could have passed enough before calling this that there's now a
different/more eligible task around to schedule.

Esp. for a !PREEMPT kernel this is could be significant.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Avi Kivity
On 09/24/2012 05:34 PM, Peter Zijlstra wrote:
 On Mon, 2012-09-24 at 17:26 +0200, Avi Kivity wrote:
 I think this is a no-op these (CFS) days.  To get schedule() to do
 anything, you need to wake up a task, or let time pass, or block.
 Otherwise it will see that nothing has changed and as far as it's
 concerned you're still the best task to be running (otherwise it
 wouldn't have picked you in the first place). 
 
 Time could have passed enough before calling this that there's now a
 different/more eligible task around to schedule.

Wouldn't this correspond to the scheduler interrupt firing and causing a
reschedule?  I thought the timer was programmed for exactly the point in
time that CFS considers the right time for a switch.  But I'm basing
this on my mental model of CFS, not CFS itself.

 Esp. for a !PREEMPT kernel this is could be significant.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 17:43 +0200, Avi Kivity wrote:
 Wouldn't this correspond to the scheduler interrupt firing and causing a
 reschedule?  I thought the timer was programmed for exactly the point in
 time that CFS considers the right time for a switch.  But I'm basing
 this on my mental model of CFS, not CFS itself. 

No, we tried this for hrtimer kernels for a while, but programming
hrtimers the whole time (every actual task-switch) turns out to be far
too expensive. So we're back to HZ ticks and 'polling' the preemption
state.

Even if we remove all the hrtimer infrastructure overhead (can do with a
few hacks) setting the hardware requires going out to the LAPIC, which
is stupid slow.

Some hardware actually has fast/reliable/usable timers, sadly none of it
is popular.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Avi Kivity
On 09/24/2012 05:52 PM, Peter Zijlstra wrote:
 On Mon, 2012-09-24 at 17:43 +0200, Avi Kivity wrote:
 Wouldn't this correspond to the scheduler interrupt firing and causing a
 reschedule?  I thought the timer was programmed for exactly the point in
 time that CFS considers the right time for a switch.  But I'm basing
 this on my mental model of CFS, not CFS itself. 
 
 No, we tried this for hrtimer kernels for a while, but programming
 hrtimers the whole time (every actual task-switch) turns out to be far
 too expensive. So we're back to HZ ticks and 'polling' the preemption
 state.

Ok, so I wasn't completely off base.

With HZ=1000, we can only be faster than the poll by a millisecond than
the interrupt-driven schedule(), and we need to be a lot faster.

 Even if we remove all the hrtimer infrastructure overhead (can do with a
 few hacks) setting the hardware requires going out to the LAPIC, which
 is stupid slow.
 
 Some hardware actually has fast/reliable/usable timers, sadly none of it
 is popular.

There is the TSC deadline timer mode of newer Intels.  Programming the
timer is a simple wrmsr, and it will fire immediately if it already
expired.  Unfortunately on AMDs it is not available, and on virtual
hardware it will be slow (~1-2 usec).

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 17:58 +0200, Avi Kivity wrote:
 There is the TSC deadline timer mode of newer Intels.  Programming the
 timer is a simple wrmsr, and it will fire immediately if it already
 expired.  Unfortunately on AMDs it is not available, and on virtual
 hardware it will be slow (~1-2 usec). 

Its also still a LAPIC write -- disguised as an MSR though :/

Also, who gives a hoot about virtual crap ;-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Avi Kivity
On 09/24/2012 06:05 PM, Peter Zijlstra wrote:
 On Mon, 2012-09-24 at 17:58 +0200, Avi Kivity wrote:
 There is the TSC deadline timer mode of newer Intels.  Programming the
 timer is a simple wrmsr, and it will fire immediately if it already
 expired.  Unfortunately on AMDs it is not available, and on virtual
 hardware it will be slow (~1-2 usec). 
 
 Its also still a LAPIC write -- disguised as an MSR though :/

It's probably a whole lot faster though.

 Also, who gives a hoot about virtual crap ;-)

I only mentioned it to see if your virtual crap detector is still
working.  Looks like it's still in top condition, low latency and 100%
hit rate.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Peter Zijlstra
On Mon, 2012-09-24 at 18:10 +0200, Avi Kivity wrote:
  Its also still a LAPIC write -- disguised as an MSR though :/
 
 It's probably a whole lot faster though. 

I've been told its not, I haven't tried it.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-24 Thread Avi Kivity
On 09/24/2012 06:13 PM, Peter Zijlstra wrote:
 On Mon, 2012-09-24 at 18:10 +0200, Avi Kivity wrote:
  Its also still a LAPIC write -- disguised as an MSR though :/
 
 It's probably a whole lot faster though. 
 
 I've been told its not, I haven't tried it.

I'll see if I can find a machine with it (don't see it on my Westmere,
it's probably on one of the Bridges.  Or maybe the other Peter knows.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-21 Thread Rik van Riel

On 09/21/2012 08:00 AM, Raghavendra K T wrote:

From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

When PLE handler fails to find a better candidate to yield_to, it
goes back and does spin again. This is acceptable when we do not
have overcommit.
But in overcommitted scenarios (especially when we have large
number of small guests), it is better to yield.

Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com


Acked-by: Rik van Riel r...@redhat.com

--
All rights reversed
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-21 Thread Takuya Yoshikawa
On Fri, 21 Sep 2012 17:30:20 +0530
Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

 From: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 
 When PLE handler fails to find a better candidate to yield_to, it
 goes back and does spin again. This is acceptable when we do not
 have overcommit.
 But in overcommitted scenarios (especially when we have large
 number of small guests), it is better to yield.
 
 Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
 Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
 ---
  virt/kvm/kvm_main.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 8323685..713b677 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1660,6 +1660,10 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
   }
   }
   }
 + /* In overcommitted cases, yield instead of spinning */
 + if (!yielded  rq_nr_running()  1)
 + schedule();

How about doing cond_resched() instead?

I'm not sure whether checking more sched stuff in KVM code is a
good thing.

Takuya
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-21 Thread Rik van Riel

On 09/21/2012 09:46 AM, Takuya Yoshikawa wrote:

On Fri, 21 Sep 2012 17:30:20 +0530
Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:


From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

When PLE handler fails to find a better candidate to yield_to, it
goes back and does spin again. This is acceptable when we do not
have overcommit.
But in overcommitted scenarios (especially when we have large
number of small guests), it is better to yield.

Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
  virt/kvm/kvm_main.c |4 
  1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8323685..713b677 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1660,6 +1660,10 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
}
}
}
+   /* In overcommitted cases, yield instead of spinning */
+   if (!yielded  rq_nr_running()  1)
+   schedule();


How about doing cond_resched() instead?


Actually, an actual call to yield() may be better.

That will set scheduler hints to make the scheduler pick
another task for one round, while preserving this task's
top position in the runqueue.

--
All rights reversed
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: Be courteous to other VMs in overcommitted scenario in PLE handler

2012-09-21 Thread Raghavendra K T

On 09/21/2012 07:22 PM, Rik van Riel wrote:

On 09/21/2012 09:46 AM, Takuya Yoshikawa wrote:

On Fri, 21 Sep 2012 17:30:20 +0530
Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:


From: Raghavendra K T raghavendra...@linux.vnet.ibm.com

When PLE handler fails to find a better candidate to yield_to, it
goes back and does spin again. This is acceptable when we do not
have overcommit.
But in overcommitted scenarios (especially when we have large
number of small guests), it is better to yield.

Reviewed-by: Srikar Dronamraju sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
virt/kvm/kvm_main.c | 4 
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8323685..713b677 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1660,6 +1660,10 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
}
}
}
+ /* In overcommitted cases, yield instead of spinning */
+ if (!yielded  rq_nr_running()  1)
+ schedule();


How about doing cond_resched() instead?


Actually, an actual call to yield() may be better.

That will set scheduler hints to make the scheduler pick
another task for one round, while preserving this task's
top position in the runqueue.


I am not a scheduler expert, but I am also inclined towards
Rik's suggestion here since we set skip buddy here. Takuya?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html