Re: [PATCH V3 RFC 0/2] kvm: Improving undercommit scenarios

2012-11-29 Thread Raghavendra K T

On 11/29/2012 07:37 AM, Chegu Vinod wrote:

Tested-by: Chegu Vinod 



Thanks for testing..

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 RFC 0/2] kvm: Improving undercommit scenarios

2012-11-28 Thread Chegu Vinod

Tested-by: Chegu Vinod 


[PATCH V3 RFC 0/2] kvm: Improving undercommit scenarios

2012-11-26 Thread Raghavendra K T
 In undercommit scenarios such as #vcpu <= #pcpu, the PLE handler may
prove very costly: there is no need to iterate over the vcpus, and the
unsuccessful yield_to attempts only burn CPU.

 The first patch optimizes all yield_to calls by bailing out early when
 there is no need to continue in yield_to (i.e., when there is only one
 task in each of the source and target rq).

 The second patch uses that in the PLE handler. Further, when a yield_to
 fails we do not leave the PLE handler immediately; instead we try thrice,
 so that a single false failure return is statistically less likely to
 make us bail out. Otherwise moderate overcommit cases would be affected.
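
 The retry idea can be sketched in userspace C as below. This is only an
 illustration under stated assumptions: ple_spin_sketch, yield_fn,
 PLE_YIELD_TRIES and the two example callbacks are hypothetical names,
 and the callback convention (>0 yielded, 0 candidate skipped, <0
 failure hinting at undercommit) is an assumption of this sketch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical yield callback:
 *   > 0  yielded to the candidate vcpu
 *   == 0 candidate was not eligible, keep looking
 *   < 0  yield failed in a way that hints at undercommit
 */
typedef int (*yield_fn)(int vcpu, void *ctx);

#define PLE_YIELD_TRIES 3	/* "try thrice" before giving up */

/* Walk the other vcpus; return how many yield attempts were made. */
static int ple_spin_sketch(int nr_vcpus, int me, yield_fn try_yield, void *ctx)
{
	int tries = PLE_YIELD_TRIES;
	int attempts = 0;

	assert(try_yield != NULL);

	for (int i = 0; i < nr_vcpus; i++) {
		int yielded;

		if (i == me)
			continue;

		yielded = try_yield(i, ctx);
		attempts++;

		if (yielded > 0)
			break;			/* success, stop spinning */
		if (yielded < 0 && --tries == 0)
			break;			/* third failure, likely undercommit */
	}

	return attempts;
}

/* Example callback: every attempt fails as in an undercommitted host. */
static int always_undercommitted(int vcpu, void *ctx)
{
	(void)vcpu;
	(void)ctx;
	return -1;
}

/* Example callback: succeeds only for the vcpu number stored in ctx. */
static int yield_succeeds_on(int vcpu, void *ctx)
{
	return vcpu == *(int *)ctx ? 1 : 0;
}
```

 With always_undercommitted the loop stops after three attempts instead
 of walking all vcpus, while candidates that are merely skipped (return
 value 0) do not consume any of the three tries.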
 
 Results on the 3.7.0-rc6 kernel show around 140% improvement for ebizzy 1x
 and around 51% for dbench 1x on a 32-core PLE machine with a 32-vcpu guest.


base = 3.7.0-rc6 
machine: 32 core mx3850 x5 PLE mc

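+----+-----------+-----------+-----------+-----------+-----------+
              ebizzy (rec/sec, higher is better)
+----+-----------+-----------+-----------+-----------+-----------+
             base       stdev     patched       stdev    %improve
+----+-----------+-----------+-----------+-----------+-----------+
 1x     2511.3000     21.5409   6051.8000    170.2592   140.98276
 2x     2679.4000    332.4482   2692.3000    251.4005     0.48145
 3x     2253.5000    266.4243   2192.1667    178.9753    -2.72169
 4x     1784.3750    102.2699   2018.7500    187.5723    13.13485
+----+-----------+-----------+-----------+-----------+-----------+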

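+----+-----------+-----------+-----------+-----------+-----------+
          dbench (throughput in MB/sec, higher is better)
+----+-----------+-----------+-----------+-----------+-----------+
             base       stdev     patched       stdev    %improve
+----+-----------+-----------+-----------+-----------+-----------+
 1x     6677.4080    638.5048  10098.0060   3449.7026    51.22643
 2x     2012.6760     64.7642   2019.0440     62.6702     0.31639
 3x     1302.0783     40.8336   1292.7517     27.0515    -0.71629
 4x     3043.1725   3243.7281   4664.4662   5946.5741    53.27643
+----+-----------+-----------+-----------+-----------+-----------+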

Here are the no-PLE results for reference.
 ebizzy-1x_nople 7592.6000 rec/sec
 dbench_1x_nople 7853.6960 MB/sec

The result says we can still improve by 60% for ebizzy, but overall we are
getting impressive performance with the patches.
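
For anyone rechecking the numbers, the %improve column appears to be
(patched - base) / base * 100. A small C helper under that assumption
(pct_improve is a hypothetical name, not something from the patches):

```c
#include <assert.h>

/*
 * Relative change of "patched" over "base", in percent.
 * Assumes this is how the %improve column was computed.
 */
static double pct_improve(double base, double patched)
{
	assert(base > 0.0);
	return (patched - base) / base * 100.0;
}
```

pct_improve(2511.3, 6051.8) reproduces the ebizzy 1x entry and
pct_improve(6677.408, 10098.006) the dbench 1x entry. One possible
reading of the remaining 60% for ebizzy is the gap between the no-PLE
and patched results taken against the base, i.e.
(7592.6 - 6051.8) / 2511.3, which comes to roughly 61%; measured against
the patched result itself, pct_improve(6051.8, 7592.6) gives roughly 25%.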

 Changes Since V2:
 - Dropped global measures usage patch (Peter Zijlstra)
 - Do not bail out on first failure (Avi Kivity)
 - Try thrice for the failure of yield_to to get statistically more correct
   behaviour.

 Changes since V1:
 - Discard the idea of exporting nrrunning and optimize in core scheduler
   (Peter)
 - Use yield() instead of schedule in overcommit scenarios (Rik)
 - Use loadavg knowledge to detect undercommit/overcommit

 Peter Zijlstra (1):
  Bail out of yield_to when source and target runqueue has one task

 Raghavendra K T (1):
  Handle yield_to failure return for potential undercommit case

 Please let me know your comments and suggestions.

 Link for V2:
 https://lkml.org/lkml/2012/10/29/287

 Link for V1:
 https://lkml.org/lkml/2012/9/21/168

 kernel/sched/core.c | 25 +++--
 virt/kvm/kvm_main.c | 26 --
 2 files changed, 35 insertions(+), 16 deletions(-)
