Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-07-06 Thread Marcelo Tosatti
On Tue, Jun 19, 2012 at 04:51:04PM -0400, Rik van Riel wrote:
 On Wed, 20 Jun 2012 01:50:50 +0530
 Raghavendra K T <raghavendra...@linux.vnet.ibm.com> wrote:
 
  
  In ple handler code, last_boosted_vcpu (lbv) variable is
  serving as reference point to start when we enter.
 
  Also statistical analysis (below) is showing lbv is not very well
  distributed with current approach.
 
 You are the second person to spot this bug today (yes, today).
 
 Due to time zones, the first person has not had a chance yet to
 test the patch below, which might fix the issue...
 
 Please let me know how it goes.
 
 ---8<---
 
 If last_boosted_vcpu == 0, then we fall through all test cases and
 may end up with all VCPUs pouncing on vcpu 0.  With a large enough
 guest, this can result in enormous runqueue lock contention, which
 can prevent vcpu0 from running, leading to a livelock.
 
 Changing < to <= makes sure we properly handle that case.
 
 Signed-off-by: Rik van Riel <r...@redhat.com>

Applied, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-07-05 Thread Andrew Theurer
On Mon, 2012-07-02 at 10:49 -0400, Rik van Riel wrote:
 On 06/28/2012 06:55 PM, Vinod, Chegu wrote:
  Hello,
 
  I am just catching up on this email thread...
 
  Perhaps one of you may be able to help answer this query.. preferably along 
  with some data.  [BTW, I do understand the basic intent behind PLE in a 
  typical [sweet spot] use case where there is over subscription etc. and the 
  need to optimize the PLE handler in the host etc. ]
 
  In a use case where the host has fewer but much larger guests (say 40 VCPUs 
  and higher) and there is no over subscription (i.e. # of vcpus across 
  guests <= physical cpus in the host, and perhaps each guest has their vcpus 
  pinned to specific physical cpus for other reasons), I would like to 
  understand if/how the PLE really helps?  For these use cases would it be 
  ok to turn PLE off (ple_gap=0) since there is no real need to take an exit 
  and find some other VCPU to yield to?
 
 Yes, that should be ok.
 
 On a related note, I wonder if we should increase the ple_gap
 significantly.
 
 After all, 4096 cycles of spinning is not that much, when you
 consider how much time is spent doing the subsequent vmexit,
 scanning the other VCPU's status (200 cycles per cache miss),
 deciding what to do, maybe poking another CPU, and eventually
 a vmenter.
 
 A factor 4 increase in ple_gap might be what it takes to
 get the amount of time spent spinning equal to the amount of
 time spent on the host side doing KVM stuff...

I was recently thinking the same thing, as I have observed over 180,000
exits/sec from a 40-way VM on an 80-way host, where there should be no
cpu overcommit.  Also, the number of directed yields for this was only
1800/sec, so we have a 1% usefulness rate for our exits.  I am wondering
if the ple_window should be similar to the host scheduler's task-switching
granularity, rather than what we think a typical max cycle count should be
for holding a lock.

BTW, I have a patch to add a couple PLE stats to kvmstat which I will
send out shortly.

-Andrew






Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-07-02 Thread Rik van Riel

On 06/28/2012 06:55 PM, Vinod, Chegu wrote:

Hello,

I am just catching up on this email thread...

Perhaps one of you may be able to help answer this query.. preferably along 
with some data.  [BTW, I do understand the basic intent behind PLE in a typical 
[sweet spot] use case where there is over subscription etc. and the need to 
optimize the PLE handler in the host etc. ]

In a use case where the host has fewer but much larger guests (say 40 VCPUs and 
higher) and there is no over subscription (i.e. # of vcpus across guests <= 
physical cpus in the host, and perhaps each guest has their vcpus pinned to 
specific physical cpus for other reasons), I would like to understand if/how the 
PLE really helps?  For these use cases would it be ok to turn PLE off (ple_gap=0) 
since there is no real need to take an exit and find some other VCPU to yield to?


Yes, that should be ok.

On a related note, I wonder if we should increase the ple_gap
significantly.

After all, 4096 cycles of spinning is not that much, when you
consider how much time is spent doing the subsequent vmexit,
scanning the other VCPU's status (200 cycles per cache miss),
deciding what to do, maybe poking another CPU, and eventually
a vmenter.

A factor 4 increase in ple_gap might be what it takes to
get the amount of time spent spinning equal to the amount of
time spent on the host side doing KVM stuff...

--
All rights reversed


Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-07-02 Thread Raghavendra K T

On 07/02/2012 08:19 PM, Rik van Riel wrote:

On 06/28/2012 06:55 PM, Vinod, Chegu wrote:

Hello,

I am just catching up on this email thread...

Perhaps one of you may be able to help answer this query.. preferably
along with some data. [BTW, I do understand the basic intent behind
PLE in a typical [sweet spot] use case where there is over
subscription etc. and the need to optimize the PLE handler in the host
etc. ]

In a use case where the host has fewer but much larger guests (say
40 VCPUs and higher) and there is no over subscription (i.e. # of vcpus
across guests <= physical cpus in the host, and perhaps each guest has
their vcpus pinned to specific physical cpus for other reasons), I
would like to understand if/how the PLE really helps? For these use
cases would it be ok to turn PLE off (ple_gap=0) since there is no real
need to take an exit and find some other VCPU to yield to?


Yes, that should be ok.


I think this should be true when we have ple_window tuned to the correct
value for the guest (the same point you raised).

But otherwise, IMO, it is a very tricky question to answer. PLE currently
benefits even flush_tlb_ipi etc., apart from spinlocks. Having a properly
tuned value for all types of workload (+load) is really complicated.
Coming back to the ple_handler: IMHO, if we have a slight increase in
run-queue length, a directed yield may worsen the scenario.

(In the case Vinod explained, even though we succeed in setting the other
vcpu task as next_buddy, the caller itself gets scheduled out, so the
ganging effect is reduced. On top of this there is always the question:
have we chosen the right guy, or a really bad guy, to yield to?)




On a related note, I wonder if we should increase the ple_gap
significantly.


Did you mean ple_window?



After all, 4096 cycles of spinning is not that much, when you
consider how much time is spent doing the subsequent vmexit,
scanning the other VCPU's status (200 cycles per cache miss),
deciding what to do, maybe poking another CPU, and eventually
a vmenter.

A factor 4 increase in ple_gap might be what it takes to
get the amount of time spent spinning equal to the amount of
time spent on the host side doing KVM stuff...



I agree. I am experimenting with all these things left and right, along
with several optimization ideas I have. Hope to come back with the
experiment results soon.



Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-28 Thread Andrew Jones


- Original Message -
 In summary, current PV has huge benefit on non-PLE machine.
 
 On PLE machine, the results become very sensitive to load, type of
 workload and SPIN_THRESHOLD. Also PLE interference has significant
 effect on them. But still it has slight edge over non PV.
 

Hi Raghu,

sorry for my slow response. I'm on vacation right now (until the
9th of July) and I have limited access to mail. Also, thanks for
continuing the benchmarking. Question, when you compare PLE vs.
non-PLE, are you using different machines (one with and one
without), or are you disabling its use by loading the kvm module
with the ple_gap=0 modparam as I did?

Drew


Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-28 Thread Raghavendra K T

On 06/28/2012 09:30 PM, Andrew Jones wrote:



- Original Message -

In summary, current PV has huge benefit on non-PLE machine.

On PLE machine, the results become very sensitive to load, type of
workload and SPIN_THRESHOLD. Also PLE interference has significant
effect on them. But still it has slight edge over non PV.



Hi Raghu,

sorry for my slow response. I'm on vacation right now (until the
9th of July) and I have limited access to mail.


Ok. Happy Vacation :)

Also, thanks for

continuing the benchmarking. Question, when you compare PLE vs.
non-PLE, are you using different machines (one with and one
without), or are you disabling its use by loading the kvm module
with the ple_gap=0 modparam as I did?


Yes, I am doing the same when I say with PLE disabled and comparing the
benchmarks (i.e loading kvm module with ple_gap=0).

But older non-PLE results were on a different machine altogether. (I
had limited access to PLE machine).




RE: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-28 Thread Vinod, Chegu
Hello,

I am just catching up on this email thread... 

Perhaps one of you may be able to help answer this query.. preferably along 
with some data.  [BTW, I do understand the basic intent behind PLE in a typical 
[sweet spot] use case where there is over subscription etc. and the need to 
optimize the PLE handler in the host etc. ]

In a use case where the host has fewer but much larger guests (say 40 VCPUs and 
higher) and there is no over subscription (i.e. # of vcpus across guests <= 
physical cpus in the host, and perhaps each guest has their vcpus pinned to 
specific physical cpus for other reasons), I would like to understand if/how 
the PLE really helps?  For these use cases would it be ok to turn PLE off 
(ple_gap=0) since there is no real need to take an exit and find some other 
VCPU to yield to? 

Thanks
Vinod

-Original Message-
From: Raghavendra K T [mailto:raghavendra...@linux.vnet.ibm.com] 
Sent: Thursday, June 28, 2012 9:22 AM
To: Andrew Jones
Cc: Rik van Riel; Marcelo Tosatti; Srikar; Srivatsa Vaddagiri; Peter Zijlstra; 
Nikunj A. Dadhania; KVM; LKML; Gleb Natapov; Vinod, Chegu; Jeremy Fitzhardinge; 
Avi Kivity; Ingo Molnar
Subject: Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

On 06/28/2012 09:30 PM, Andrew Jones wrote:


 - Original Message -
 In summary, current PV has huge benefit on non-PLE machine.

 On PLE machine, the results become very sensitive to load, type of 
 workload and SPIN_THRESHOLD. Also PLE interference has significant 
 effect on them. But still it has slight edge over non PV.


 Hi Raghu,

 sorry for my slow response. I'm on vacation right now (until the 9th 
 of July) and I have limited access to mail.

Ok. Happy Vacation :)

Also, thanks for
 continuing the benchmarking. Question, when you compare PLE vs.
 non-PLE, are you using different machines (one with and one without), 
 or are you disabling its use by loading the kvm module with the 
 ple_gap=0 modparam as I did?

Yes, I am doing the same when I say with PLE disabled and comparing the 
benchmarks (i.e loading kvm module with ple_gap=0).

But older non-PLE results were on a different machine altogether. (I had 
limited access to PLE machine).




Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-27 Thread Raghavendra K T

On 06/24/2012 12:04 AM, Raghavendra K T wrote:

On 06/23/2012 02:30 AM, Raghavendra K T wrote:

On 06/22/2012 08:41 PM, Andrew Jones wrote:

[...]

My run for other benchmarks did not have Rik's patches, so re-spinning
everything with that now.

Here is the detailed info on env and benchmark I am currently trying.
Let me know if you have any comments

===
kernel 3.5.0-rc1 with Rik's Ple handler fix as base

Machine : Intel(R) Xeon(R) CPU X7560 @ 2.27GHz, 4 numa node, 256GB RAM,
32 core machine

Host: enterprise linux gcc version 4.4.6 20120305 (Red Hat 4.4.6-4)
(GCC) with test kernels
Guest: fedora 16 with different built-in kernel from same source tree.
32 vcpus 8GB memory. (configs not changed with patches except for
CONFIG_PARAVIRT_SPINLOCK)

Note: for Pv patches, SPIN_THRESHOLD is set to 4k

Benchmarks:
1) kernbench: kernbench-0.50

cmd:
echo 3 > /proc/sys/vm/drop_caches
ccache -C
kernbench -f -H -M -o 2*vcpu

Very first run in kernbench is omitted.

2) dbench: dbench version 4.00
cmd: dbench --warmup=30 -t 120 2*vcpu

3) hackbench:
https://build.opensuse.org/package/files?package=hackbench&project=benchmark

hackbench.c modified with loops=1
used hackbench with num-threads = 2* vcpu

4) Specjbb: specjbb2000-1.02
Input Properties:
ramp_up_seconds = 30
measurement_seconds = 120
forcegc = true
starting_number_warehouses = 1
increment_number_warehouses = 1
ending_number_warehouses = 8


5) sysbench: 0.4.12
sysbench --test=oltp --db-driver=pgsql prepare
sysbench --num-threads=2*vcpu --max-requests=10 --test=oltp
--oltp-table-size=50 --db-driver=pgsql --oltp-read-only run
Note that the driver for this is pgsql.


6) ebizzy: release 0.3
cmd: ebizzy -S 120

- specjbb ran for 1x and 2x; others mostly for 1x, 2x, 3x overcommit.
- overcommit of 2x means the same benchmark running on 2 guests.
- samples for each overcommit point are mostly 8.

Note: I ran kernbench with the old kernbench-0.50; maybe I can try kcbench
with ramfs if necessary.

will soon come with detailed results


With the above env, Here is the result I have for 4k SPIN_THRESHOLD.

Lower is better for following benchmarks:
kernbench: (time in sec)
hackbench: (time in sec)
sysbench : (time in sec)

Higher is better for following benchmarks:
specjbb: score (Throughput)
dbench : Throughput in MB/sec
ebizzy : records/sec

In summary, current PV has a huge benefit on non-PLE machines.

On PLE machines, the results become very sensitive to load, type of
workload and SPIN_THRESHOLD. PLE interference also has a significant
effect on them. But PV still has a slight edge over non-PV.

Overall, specjbb, sysbench, kernbench seem to do well with PV.

dbench has been a little unreliable (the same reason I have not published
the 2x, 3x results, though experimental values are included in the tarball)
but it seems to be on par with PV.

hackbench is better in the non-overcommit case, and ebizzy is better in
the overcommit case.

[ebizzy seems to be very sensitive w.r.t. SPIN_THRESHOLD].

I have still not experimented with SPIN_THRESHOLD of 2k/8k, or with/without
PLE, after applying Rik's fix.

+---+---+---++-+
  specjbb
+---+---+---++-+
|  base val | base stdev|   PV val  |  PV stdev  | %improve|
+---+---+---++-+
|114232.2500|21774.0660 |122591.| 18239.0900 | 7.31733 |
|112154.5000|19696.6860 |113386.2500| 22262.5890 | 1.09826 |
+---+---+---++-+

+---+---+---++-+
  kernbench
+---+---+---++-+
|  base val | base stdev|   PV val  |  PV stdev  | %improve|
+---+---+---++-+
|   48.9150 |   0.8608  |   48.5550 |   0.7372   | 0.74143 |
|   96.3691 |   7.9724  |   96.6367 |   1.6938   |-0.27691 |
|  192.6972 |   9.1881  |  188.3195 |   8.1267   | 2.32461 |
|  320.6500 |  29.6892  |  302.1225 |  16.0515   | 6.13245 |
+---+---+---++-+

+---+---+---++-+
  sysbench
+---+---+---++-+
|  base val | base stdev|   PV val  |  PV stdev  | %improve|
+---+---+---++-+
|   12.4082 |   0.2370  |   12.2797 |   0.1037   | 1.04644 |
|   14.1705 |   0.4272  |   14.0300 |   1.1478   | 1.00143 |
|   19.3769 |   1.0833  |   18.9745 |   0.0560   | 2.12074 |
|   24.5373 |   1.3237  |   22.3078 |   0.8999   | 9.99426 |
+---+---+---++-+

+---+---+---++-+
  hackbench
+---+---+---++-+
|  base val | base stdev|   PV val  |  PV stdev  | %improve|
+---+---+---++-+
|   73.2627 |  11.2413  |   67.5125 |   

Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case with benchmark detail attachment

2012-06-27 Thread Raghavendra K T

On 06/28/2012 01:57 AM, Raghavendra K T wrote:

On 06/24/2012 12:04 AM, Raghavendra K T wrote:

On 06/23/2012 02:30 AM, Raghavendra K T wrote:

On 06/22/2012 08:41 PM, Andrew Jones wrote:

[...]


(benchmark values will be attached in reply to this mail)


pv_benchmark_summary.bz2
Description: application/bzip


Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-27 Thread Raghavendra K T

On 06/21/2012 12:13 PM, Gleb Natapov wrote:

On Tue, Jun 19, 2012 at 04:51:04PM -0400, Rik van Riel wrote:

On Wed, 20 Jun 2012 01:50:50 +0530
Raghavendra K T <raghavendra...@linux.vnet.ibm.com> wrote:



In ple handler code, last_boosted_vcpu (lbv) variable is
serving as reference point to start when we enter.



Also statistical analysis (below) is showing lbv is not very well
distributed with current approach.


You are the second person to spot this bug today (yes, today).

Due to time zones, the first person has not had a chance yet to
test the patch below, which might fix the issue...

Please let me know how it goes.

---8<---

If last_boosted_vcpu == 0, then we fall through all test cases and
may end up with all VCPUs pouncing on vcpu 0.  With a large enough
guest, this can result in enormous runqueue lock contention, which
can prevent vcpu0 from running, leading to a livelock.

Changing < to <= makes sure we properly handle that case.

Signed-off-by: Rik van Riel <r...@redhat.com>
---
  virt/kvm/kvm_main.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..1da542b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i < last_boosted_vcpu) {
+			if (!pass && i <= last_boosted_vcpu) {
 				i = last_boosted_vcpu;
 				continue;
 			} else if (pass && i > last_boosted_vcpu)


Looks correct. We can simplify this by introducing something like:

#define kvm_for_each_vcpu_from(idx, n, vcpup, kvm) \
	for (n = atomic_read(&kvm->online_vcpus); \
	     n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
	     n--, idx = (idx+1) % atomic_read(&kvm->online_vcpus))



Gleb, Rik,
Any updates on this, or on Rik's patch status?
I can come up with the above suggested cleanup patch, with Gleb's
From and Signed-off-by.

Please let me know.



Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-23 Thread Raghavendra K T

On 06/23/2012 02:30 AM, Raghavendra K T wrote:

On 06/22/2012 08:41 PM, Andrew Jones wrote:

On Thu, Jun 21, 2012 at 04:56:08PM +0530, Raghavendra K T wrote:

Here are the results from kernbench.

PS: I think we should only take away that both patches perform better,
rather than reading into the actual numbers, since I am seeing more
variance especially in 3x. Maybe I can test with some more stable
benchmark if somebody points one out.


[...]

can we agree that, for kernbench, 1x = -j (2*#vcpu) in 1 VM,
1.5x = -j (2*#vcpu) in 1 VM and -j (#vcpu) in the other, and so on?
Also a SPIN_THRESHOLD of 4k?


Please forget about 1.5x above. I am not too sure about that.



Any ideas on benchmarks is welcome from all.



My run for other benchmarks did not have Rik's patches, so re-spinning
everything with that now.

Here is the detailed info on env and benchmark I am currently trying. 
Let me know if you have any comments


===
kernel 3.5.0-rc1 with Rik's Ple handler fix  as base

Machine : Intel(R) Xeon(R) CPU X7560  @ 2.27GHz, 4 numa node, 256GB RAM, 
32 core machine


Host: enterprise linux  gcc version 4.4.6 20120305 (Red Hat 4.4.6-4) 
(GCC) with test kernels

Guest: fedora 16 with different built-in kernel from same source tree.
32 vcpus 8GB memory. (configs not changed with patches except for
CONFIG_PARAVIRT_SPINLOCK)

Note: for Pv patches, SPIN_THRESHOLD is set to 4k

Benchmarks:
1) kernbench: kernbench-0.50

cmd:
echo 3 > /proc/sys/vm/drop_caches
ccache -C
kernbench -f -H -M -o 2*vcpu

Very first run in kernbench is omitted.

2) dbench: dbench version 4.00
cmd: dbench --warmup=30 -t 120 2*vcpu

3) hackbench:
https://build.opensuse.org/package/files?package=hackbench&project=benchmark
hackbench.c modified with loops=1
used hackbench with num-threads = 2* vcpu

4) Specjbb: specjbb2000-1.02
Input Properties:
  ramp_up_seconds = 30
  measurement_seconds = 120
  forcegc = true
  starting_number_warehouses = 1
  increment_number_warehouses = 1
  ending_number_warehouses = 8


5) sysbench: 0.4.12
sysbench --test=oltp --db-driver=pgsql prepare
sysbench --num-threads=2*vcpu --max-requests=10 --test=oltp 
--oltp-table-size=50 --db-driver=pgsql --oltp-read-only run

Note that the driver for this is pgsql.


6) ebizzy: release 0.3
cmd: ebizzy -S 120

- specjbb ran for 1x and 2x; others mostly for 1x, 2x, 3x overcommit.
- overcommit of 2x means the same benchmark running on 2 guests.
- samples for each overcommit point are mostly 8.

Note: I ran kernbench with the old kernbench-0.50; maybe I can try kcbench
with ramfs if necessary.

will soon come with detailed results

- Raghu




Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-22 Thread Andrew Jones
On Thu, Jun 21, 2012 at 04:56:08PM +0530, Raghavendra K T wrote:
 Here are the results from kernbench.
 
 PS: I think we should only take away that both patches perform better,
 rather than reading into the actual numbers, since I am seeing more
 variance especially in 3x. Maybe I can test with some more stable
 benchmark if somebody points one out.
 

Hi Raghu,

I wonder if we should back up and try to determine the best
benchmark/test environment first. I think kernbench is good, but
I wonder about how to simulate the overcommit, and to what degree
(1x, 3x, ??). What are you currently running to simulate overcommit
now? Originally we were running kernbench in one VM and cpu hogs
(bash infinite loops) in other VMs. Then we added vcpus and infinite
loops to get up to the desired overcommit. I saw later that you've
experimented with running kernbench in the other VMs as well, rather
than cpu hogs. Is that still the case?

I started playing with benchmarking these proposals myself, but so
far have stuck to the cpu hog, since I wanted to keep variability
limited.  However, when targeting a reasonable host loadavg with a
bunch of cpu hog vcpus, it limits the overcommit too much. I certainly
haven't tried 3x this way. So I'm inclined to throw out the cpu hog
approach as well. The question is, what to replace it with? It appears
that the performance of the PLE and pvticketlock proposals are quite
dependant on the level of overcommit, so we should choose a target
overcommit level and also a constraint on the host loadavg first,
then determine how to setup a test environment that fits it and yields
results with low variance.

Here are results from my 1.125x overcommit test environment using
cpu hogs.

kcbench (a.k.a kernbench) results; 'mean-time (stddev)'
  base-noPLE:   235.730 (25.932)
  base-PLE: 238.820 (11.199)
  rand_start-PLE:   283.193 (23.262)
  pvticketlocks-noPLE:  244.987 (7.562)
  pvticketlocks-PLE:247.597 (17.200)

base kernel:  3.5.0-rc3 + Rik's new last_boosted patch
rand_start kernel:3.5.0-rc3 + Raghu's proposed random start patch
pvticketlocks kernel: 3.5.0-rc3 + Rik's new last_boosted patch
+ Raghu's pvticketlock series

The relative standard deviations are as high as 11%, so I'm not
really pleased with the results, and they show degradation everywhere.
Below are the details of the benchmarking. Everything is there except
the kernel config, but our benchmarking should be reproducible with
nearly random configs anyway.

Drew

= host =
  - Intel(R) Xeon(R) CPU X7560 @ 2.27GHz
  - 64 cpus, 4 nodes, 64G mem
  - Fedora 17 with test kernels (see tests)

= benchmark =
  - one cpu hog F17 VM
- 64 vcpus, 8G mem
- all vcpus run a bash infinite loop
- kernel: 3.5.0-rc3
  - one kcbench (a.k.a kernbench) F17 VM
- 8 vcpus, 8G mem
- 'kcbench -d /mnt/ram', /mnt/ram is 1G ramfs
- kcbench-0.3-8.1.noarch, kcbench-data-2.6.38-0.1-9.fc17.noarch,
  kcbench-data-0.1-9.fc17.noarch
- gcc (GCC) 4.7.0 20120507 (Red Hat 4.7.0-5)
- kernel: same test kernel as host

= test 1: base, PLE disabled (ple_gap=0) =
  - kernel: 3.5.0-rc3 + Rik's last_boosted patch

Run 1 (-j 16):  4211 (e:237.43 P:637% U:697.98 S:815.46 F:0)
Run 2 (-j 16):  3834 (e:260.77 P:631% U:729.69 S:917.56 F:0)
Run 3 (-j 16):  4784 (e:208.99 P:644% U:638.17 S:708.63 F:0)

mean: 235.730 stddev: 25.932

= test 2: base, PLE enabled =
  - kernel: 3.5.0-rc3 + Rik's last_boosted patch

Run 1 (-j 16):  4335 (e:230.67 P:639% U:657.74 S:818.28 F:0)
Run 2 (-j 16):  4269 (e:234.20 P:647% U:743.43 S:772.52 F:0)
Run 3 (-j 16):  3974 (e:251.59 P:639% U:724.29 S:884.21 F:0)

mean: 238.820 stddev: 11.199

= test 3: rand_start, PLE enabled =
  - kernel: 3.5.0-rc3 + Raghu's random start patch

Run 1 (-j 16):  3898 (e:256.52 P:639% U:756.14 S:884.63 F:0)
Run 2 (-j 16):  3341 (e:299.27 P:633% U:857.49 S:1039.62 F:0)
Run 3 (-j 16):  3403 (e:293.79 P:635% U:857.21 S:1008.83 F:0)

mean: 283.193 stddev: 23.262

= test 4: pvticketlocks, PLE disabled (ple_gap=0) =
  - kernel: 3.5.0-rc3 + Rik's last_boosted patch + Raghu's pvticketlock series
  + PARAVIRT_SPINLOCKS=y config change

Run 1 (-j 16):  3963 (e:252.29 P:647% U:736.43 S:897.16 F:0)
Run 2 (-j 16):  4216 (e:237.19 P:650% U:706.68 S:837.42 F:0)
Run 3 (-j 16):  4073 (e:245.48 P:649% U:709.46 S:884.68 F:0)

mean: 244.987 stddev: 7.562

= test 5: pvticketlocks, PLE enabled =
  - kernel: 3.5.0-rc3 + Rik's last_boosted patch + Raghu's pvticketlock series
  + PARAVIRT_SPINLOCKS=y config change

Run 1 (-j 16):  3978 (e:251.32 P:629% U:758.86 S:824.29 F:0)
Run 2 (-j 16):  4369 (e:228.84 P:634% U:708.32 S:743.71 F:0)
Run 3 (-j 16):  3807 (e:262.63 P:626% U:767.03 S:877.96 F:0)

mean: 247.597 stddev: 17.200

Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-22 Thread Raghavendra K T

On 06/22/2012 08:41 PM, Andrew Jones wrote:

On Thu, Jun 21, 2012 at 04:56:08PM +0530, Raghavendra K T wrote:

Here are the results from kernbench.

PS: I think we should only take away that both patches perform better,
rather than reading into the actual numbers, since I am seeing more
variance especially in 3x. Maybe I can test with some more stable
benchmark if somebody points one out.



Hi Raghu,



First of all, thank you for your testing and for raising valid points.
It also opened an avenue for discussing all the different experiments
done over the past month (apart from tuning/benchmarking), which may
bring more feedback and valuable ideas from the community to optimize
performance further.


I shall discuss that in a separate reply to this mail.


I wonder if we should back up and try to determine the best
benchmark/test environment first.


I agree; we have to be able to reproduce similar results independently.
So far sysbench (and even pgbench) has been consistent. Currently I am
checking whether other benchmarks like hackbench (modified #loops) and
ebizzy/dbench have low variance.

[ but they too are dependent on #clients/threads etc. ]

I think kernbench is good, but

Yes, kernbench at least helped me to tune SPIN_THRESHOLD to a good
extent. But Jeremy had also pointed out that kernbench is a little
inconsistent.


I wonder about how to simulate the overcommit, and to what degree
(1x, 3x, ??). What are you currently running to simulate overcommit
now? Originally we were running kernbench in one VM and cpu hogs
(bash infinite loops) in other VMs. Then we added vcpus and infinite
loops to get up to the desired overcommit. I saw later that you've
experimented with running kernbench in the other VMs as well, rather
than cpu hogs. Is that still the case?



Yes, I am now running the same benchmark on all the guests.

On non-PLE, the while(1) cpu hogs played a good role in simulating LHP,
but on the PLE machine that did not seem to be the case.


I started playing with benchmarking these proposals myself, but so
far have stuck to the cpu hog, since I wanted to keep variability
limited.  However, when targeting a reasonable host loadavg with a
bunch of cpu hog vcpus, it limits the overcommit too much. I certainly
haven't tried 3x this way. So I'm inclined to throw out the cpu hog
approach as well. The question is, what to replace it with? It appears
that the performance of the PLE and pvticketlock proposals are quite
dependant on the level of overcommit, so we should choose a target
overcommit level and also a constraint on the host loadavg first,
then determine how to setup a test environment that fits it and yields
results with low variance.

Here are results from my 1.125x overcommit test environment using
cpu hogs.


At first the results seemed backward, but after seeing the individual
runs and variations, it seems that, except for rand start, all the
results should converge to zero difference. So if we run the same tests
again we may get completely different results.


IMO, on a 64 vcpu guest, running -j16 may not represent a 1x load, so I
believe it has resulted in more of an under-commit / nearly-1x result.
Maybe we should try at least #threads = #vcpu or 2*#vcpu.



kcbench (a.k.a kernbench) results; 'mean-time (stddev)'
   base-noPLE:   235.730 (25.932)
   base-PLE: 238.820 (11.199)
   rand_start-PLE:   283.193 (23.262)


The problem currently, as we know, is that in the PLE handler we may end
up choosing the same VCPU that was in the spin loop, which unfortunately
results in more cpu burning.

And by randomizing start_vcpu we are increasing that probability. We need
logic to avoid choosing a vcpu that has recently PL-exited, since it
cannot be a lock holder; the next eligible lock holder can be picked up
easily with the PV patches.


   pvticketlocks-noPLE:  244.987 (7.562)
   pvticketlocks-PLE:247.597 (17.200)

base kernel:  3.5.0-rc3 + Rik's new last_boosted patch
rand_start kernel:3.5.0-rc3 + Raghu's proposed random start patch
pvticketlocks kernel: 3.5.0-rc3 + Rik's new last_boosted patch
 + Raghu's pvticketlock series


OK, I believe the SPIN_THRESHOLD was 2k, right? What I observed is that
with a 2k threshold we see halt-exit overheads. Currently I am trying
mostly with 4k.



The relative standard deviations are as high as 11%. So I'm not
real pleased with the results, and they show degradation everywhere.
Below are the details of the benchmarking. Everything is there except
the kernel config, but our benchmarking should be reproducible with
nearly random configs anyway.

Drew

= host =
   - Intel(R) Xeon(R) CPU X7560 @ 2.27GHz
   - 64 cpus, 4 nodes, 64G mem
   - Fedora 17 with test kernels (see tests)

= benchmark =
   - one cpu hog F17 VM
 - 64 vcpus, 8G mem
 - all vcpus run a bash infinite loop
 - kernel: 3.5.0-rc3
   - one kcbench (a.k.a kernbench) F17 VM
 - 8 vcpus, 8G mem
 - 'kcbench -d /mnt/ram', /mnt/ram is 1G ramfs


may be we have to check whether 1GB RAM is ok when we have 

Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-21 Thread Gleb Natapov
On Tue, Jun 19, 2012 at 04:51:04PM -0400, Rik van Riel wrote:
 On Wed, 20 Jun 2012 01:50:50 +0530
 Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:
 
  
  In ple handler code, last_boosted_vcpu (lbv) variable is
  serving as reference point to start when we enter.
 
  Also statistical analysis (below) is showing lbv is not very well
  distributed with current approach.
 
 You are the second person to spot this bug today (yes, today).
 
 Due to time zones, the first person has not had a chance yet to
 test the patch below, which might fix the issue...
 
 Please let me know how it goes.
 
 ---8<---
 
 If last_boosted_vcpu == 0, then we fall through all test cases and
 may end up with all VCPUs pouncing on vcpu 0.  With a large enough
 guest, this can result in enormous runqueue lock contention, which
 can prevent vcpu0 from running, leading to a livelock.
 
 Changing < to <= makes sure we properly handle that case.
 
 Signed-off-by: Rik van Riel r...@redhat.com
 ---
  virt/kvm/kvm_main.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 7e14068..1da542b 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i < last_boosted_vcpu) {
+			if (!pass && i <= last_boosted_vcpu) {
 				i = last_boosted_vcpu;
 				continue;
 			} else if (pass && i > last_boosted_vcpu)
 
Looks correct. We can simplify this by introducing something like:

#define kvm_for_each_vcpu_from(idx, n, vcpup, kvm) \
	for (n = atomic_read(&kvm->online_vcpus); \
	     n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
	     n--, idx = (idx+1) % atomic_read(&kvm->online_vcpus))

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-21 Thread Raghavendra K T

On 06/21/2012 12:13 PM, Gleb Natapov wrote:

On Tue, Jun 19, 2012 at 04:51:04PM -0400, Rik van Riel wrote:

On Wed, 20 Jun 2012 01:50:50 +0530
Raghavendra K T <raghavendra...@linux.vnet.ibm.com> wrote:



In ple handler code, last_boosted_vcpu (lbv) variable is
serving as reference point to start when we enter.



Also statistical analysis (below) is showing lbv is not very well
distributed with current approach.


You are the second person to spot this bug today (yes, today).

Due to time zones, the first person has not had a chance yet to
test the patch below, which might fix the issue...

Please let me know how it goes.

---8<---

If last_boosted_vcpu == 0, then we fall through all test cases and
may end up with all VCPUs pouncing on vcpu 0.  With a large enough
guest, this can result in enormous runqueue lock contention, which
can prevent vcpu0 from running, leading to a livelock.

Changing < to <= makes sure we properly handle that case.

Signed-off-by: Rik van Riel <r...@redhat.com>
---
  virt/kvm/kvm_main.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..1da542b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i < last_boosted_vcpu) {
+			if (!pass && i <= last_boosted_vcpu) {
 				i = last_boosted_vcpu;
 				continue;
 			} else if (pass && i > last_boosted_vcpu)


Looks correct. We can simplify this by introducing something like:

#define kvm_for_each_vcpu_from(idx, n, vcpup, kvm) \
	for (n = atomic_read(&kvm->online_vcpus); \
	     n && (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
	     n--, idx = (idx+1) % atomic_read(&kvm->online_vcpus))



Thumbs up for this simplification. This really helps in all the places
where we want to start iterating from the middle.



Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-21 Thread Raghavendra K T

On 06/21/2012 01:42 AM, Raghavendra K T wrote:

On 06/20/2012 02:21 AM, Rik van Riel wrote:

On Wed, 20 Jun 2012 01:50:50 +0530
Raghavendra K T <raghavendra...@linux.vnet.ibm.com> wrote:


[...]

Please let me know how it goes.


Yes, have got result today, too tired to summarize. got better
performance result too. will come back again tomorrow morning.
have to post, randomized start point patch also, which I discussed to
know the opinion.



Here are the results from kernbench.

PS: I think we should only take away that both patches perform better,
rather than reading into the actual numbers, since I am seeing more
variance, especially at 3x. Maybe I can test with a more stable benchmark
if somebody points one out.

+-----------+-------------+------------+--------------+------------+
|   base    |  Rik patch  | % improve  | Random patch |  %improve  |
+-----------+-------------+------------+--------------+------------+
| 49.98     |   49.935    | 0.0901172  |  49.924286   |  0.111597  |
| 106.0051  |   89.25806  | 18.7625    |  88.122217   |  20.2933   |
| 189.82067 |   175.58783 | 8.10582    |  166.99989   |  13.6651   |
+-----------+-------------+------------+--------------+------------+

I also have posted result of randomizing starting point patch.

I agree that Rik's fix should ideally go into git ASAP. and when above
patches go into git, feel free to add,

Tested-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com

But I still see some questions unanswered.
1) Why can't we move the setting of last_boosted_vcpu up? It gives more
randomness. (As I said earlier, it gave degradation in the 1x case because
of the violent yields, but a performance benefit in the 3x case. The
degradation comes from most of them yielding back to the same spinning
guy, increasing busy-wait, but it gives a huge benefit with ple_window set
to higher values such as 32k/64k. That is a different issue altogether.)

2) Having the update of last_boosted_vcpu after yield_to does not seem
entirely correct, and having a common variable as the starting point may
not be that good either. Also, RR is a little slower.

Suppose we have a 64-vcpu guest and 4 vcpus enter the ple_handler: all of
them jumping on the same guy to yield to may not be good. Rather, I
personally feel that each of them starting at a different point would be
a good idea.

But this alone will not help; we need more filtering of eligible VCPUs,
e.g. in the first pass, don't choose a VCPU that has recently done a
PL exit. (Thanks Vatsa for brainstorming this.) Maybe Peter/Avi/Rik/Vatsa
can give more ideas in this area (I mean, how can we identify that a vcpu
has done a PL exit, or exited from spinlock context, etc.)

Another idea may be something like identifying the next eligible
lock holder (which is already possible with the PV patches) and doing a
yield_to to it.

Here are the stats from the randomized starting point patch. We can see
that the patch has amazing fairness w.r.t. the starting point. IMO, this
would be great only after we add more eligibility criteria for the target
vcpus (of yield_to).

Randomizing start index
===
snapshot1
PLE handler yield stat :
218416  176802  164554  141184  148495  154709  159871  145157
135476  158025  139997  247638  152498  18  122774  248228
158469  121825  138542  113351  164988  120432  136391  129855
172764  214015  158710  133049  83485   112134  81651   190878

PLE handler start stat :
547772  547725  547545  547931  547836  548656  548272  547849
548879  549012  547285  548185  548700  547132  548310  547286
547236  547307  548328  548059  547842  549152  547870  548340
548170  546996  546678  547842  547716  548096  547918  547546

snapshot2
==
PLE handler yield stat :
310690  222992  275829  156876  187354  185373  187584  155534
151578  205994  223731  320894  194995  167011  153415  286910
181290  143653  173988  181413  194505  170330  194455  181617
251108  226577  192070  143843  137878  166393  131405  250657

PLE handler start stat :
781335  782388  781837  782942  782025  781357  781950  781695
783183  783312  782004  782804  783766  780825  783232  781013
781587  781228  781642  781595  781665  783530  781546  781950
782268  781443  781327  781666  781907  781593  782105  781073


Sorry for attaching the patch inline; I am using a dumb client. Will post
it separately if needed.

---8<---

Currently the PLE handler uses a per-VM variable as the starting point.
Get rid of the variable and use a randomized starting point instead.
Thanks to Vatsa for the scheduler-related clarifications.

Suggested-by: Srikar sri...@linux.vnet.ibm.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c446435..9799cab 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -275,7 +275,6 @@ struct kvm {
 #endif
 	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
 	atomic_t online_vcpus;
-	int last_boosted_vcpu;
 	struct list_head vm_list;
 	struct mutex lock;
 	struct kvm_io_bus *buses[KVM_NR_BUSES];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 

Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-20 Thread Raghavendra K T

On 06/20/2012 02:21 AM, Rik van Riel wrote:

On Wed, 20 Jun 2012 01:50:50 +0530
Raghavendra K T <raghavendra...@linux.vnet.ibm.com> wrote:



In ple handler code, last_boosted_vcpu (lbv) variable is
serving as reference point to start when we enter.



Also statistical analysis (below) is showing lbv is not very well
distributed with current approach.


You are the second person to spot this bug today (yes, today).


Oh! really interesting.



Due to time zones, the first person has not had a chance yet to
test the patch below, which might fix the issue...


Maybe his timezone also falls near mine. I am also pretty late
now. :)



Please let me know how it goes.


Yes, have got result today, too tired to summarize. got better
performance result too. will come back again tomorrow morning.
have to post, randomized start point patch also, which I discussed to
know the opinion.



---8<---

If last_boosted_vcpu == 0, then we fall through all test cases and
may end up with all VCPUs pouncing on vcpu 0.  With a large enough
guest, this can result in enormous runqueue lock contention, which
can prevent vcpu0 from running, leading to a livelock.

Changing < to <= makes sure we properly handle that case.


Analysis shows the distribution is flatter now than before.
Here are the snapshots:
snapshot1
PLE handler yield stat :
66447   13  75510   65875   121298  92543  111267  79523
118134  105366  116441  114195  107493  6  86779   87733
84415   105778  94210   73197   55626   93036  112959  92035
95742   78558   72190   101719  94667   108593 63832   81580

PLE handler start stat :
334301  687807  384077  344917  504917  343988  439810  371389
466908  415509  394304  484276  376510  292821  370478  363727
366989  423441  392949  309706  292115  437900  413763  346135
364181  323031  348405  399593  336714  373995  302301  347383


snapshot2
PLE handler yield stat :
320547  267528  264316  164213  249246  182014  246468  225386
277179  310659  349767  310281  238680  187645  225791  266290
216202  316974  231077  216586  151679  356863  266031  213047
306229  182629  229334  241204  275975  265086  282218  242207

PLE handler start stat :
1335370  1378184  1252001  925414   1196973  951298   1219835  1108788
1265427  1290362  1308553  1271066  1107575  980036   1077210  1278611
1110779  1365130  1151200  1049859  937159   1577830  1209099  993391
1173766  987307   1144775  1102960  1100082  1177134  1207862  1119551




Signed-off-by: Rik van Riel <r...@redhat.com>
---
  virt/kvm/kvm_main.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..1da542b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i < last_boosted_vcpu) {
+			if (!pass && i <= last_boosted_vcpu) {


Hmmm, true, great catch. It was partial towards zero earlier.


 				i = last_boosted_vcpu;
 				continue;
 			} else if (pass && i > last_boosted_vcpu)






Re: [PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-20 Thread Rik van Riel

On 06/20/2012 04:12 PM, Raghavendra K T wrote:

On 06/20/2012 02:21 AM, Rik van Riel wrote:



Please let me know how it goes.


Yes, have got result today, too tired to summarize. got better
performance result too. will come back again tomorrow morning.
have to post, randomized start point patch also, which I discussed to
know the opinion.


The other person's problem has also gone away with this
patch.

Avi, could I convince you to apply this obvious bugfix
to kvm.git? :)


---8<---

If last_boosted_vcpu == 0, then we fall through all test cases and
may end up with all VCPUs pouncing on vcpu 0. With a large enough
guest, this can result in enormous runqueue lock contention, which
can prevent vcpu0 from running, leading to a livelock.

Changing < to <= makes sure we properly handle that case.




Signed-off-by: Rik van Riel <r...@redhat.com>
---
virt/kvm/kvm_main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..1da542b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i < last_boosted_vcpu) {
+			if (!pass && i <= last_boosted_vcpu) {
 				i = last_boosted_vcpu;
 				continue;
 			} else if (pass && i > last_boosted_vcpu)







--
All rights reversed


[PATCH] kvm: handle last_boosted_vcpu = 0 case

2012-06-19 Thread Rik van Riel
On Wed, 20 Jun 2012 01:50:50 +0530
Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

 
 In ple handler code, last_boosted_vcpu (lbv) variable is
 serving as reference point to start when we enter.

 Also statistical analysis (below) is showing lbv is not very well
 distributed with current approach.

You are the second person to spot this bug today (yes, today).

Due to time zones, the first person has not had a chance yet to
test the patch below, which might fix the issue...

Please let me know how it goes.

---8<---

If last_boosted_vcpu == 0, then we fall through all test cases and
may end up with all VCPUs pouncing on vcpu 0.  With a large enough
guest, this can result in enormous runqueue lock contention, which
can prevent vcpu0 from running, leading to a livelock.

Changing < to <= makes sure we properly handle that case.

Signed-off-by: Rik van Riel r...@redhat.com
---
 virt/kvm/kvm_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e14068..1da542b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1586,7 +1586,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me)
 	 */
 	for (pass = 0; pass < 2 && !yielded; pass++) {
 		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!pass && i < last_boosted_vcpu) {
+			if (!pass && i <= last_boosted_vcpu) {
 				i = last_boosted_vcpu;
 				continue;
 			} else if (pass && i > last_boosted_vcpu)
