Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-26 Thread Peter Zijlstra
On Tue, 2011-01-25 at 19:27 -0200, Glauber Costa wrote:
 On Tue, 2011-01-25 at 22:07 +0100, Peter Zijlstra wrote:
  On Tue, 2011-01-25 at 18:47 -0200, Glauber Costa wrote:
   On Tue, 2011-01-25 at 21:13 +0100, Peter Zijlstra wrote:
On Tue, 2011-01-25 at 18:02 -0200, Glauber Costa wrote:

 I fail to see how does clock_task influence cpu power.
 If we also have to touch clock_task for better accounting of other
 stuff, it is a separate story.
 But for cpu_power, I really fail. Please enlighten me.

static void update_rq_clock_task(struct rq *rq, s64 delta)
{
s64 irq_delta;

irq_delta = irq_time_read(cpu_of(rq)) - rq-prev_irq_time;

if (irq_delta  delta)
irq_delta = delta;

rq-prev_irq_time += irq_delta;
delta -= irq_delta;
rq-clock_task += delta;

if (irq_delta  sched_feat(NONIRQ_POWER))
sched_rt_avg_update(rq, irq_delta);
}

its done through that sched_rt_avg_update() (should probably rename
that), it computes a floating average of time not spend on fair tasks.

   It creates a dependency on CONFIG_IRQ_TIME_ACCOUNTING, though.
   This piece of code is simply compiled out if this option is disabled.
  
  We can pull this bit out and make the common bit also available for
  paravirt.
 
 scale_rt_power() seems to do the right thing, but all the path leading
 to it seem to work on rq-clock, rather than rq-clock_task.

Not quite, see how rq-clock_task is irq_delta less than the increment
to rq-clock? You want it to be your steal-time delta less too.

 Although I do can experiment with that as well, could you please
 elaborate on what are your reasons to prefer this over than variations
 of the method I proposed?

Because I want rq-clock_task to not include steal-time.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-26 Thread Glauber Costa
On Wed, 2011-01-26 at 10:57 +0100, Peter Zijlstra wrote:
 On Tue, 2011-01-25 at 19:27 -0200, Glauber Costa wrote:
  On Tue, 2011-01-25 at 22:07 +0100, Peter Zijlstra wrote:
   On Tue, 2011-01-25 at 18:47 -0200, Glauber Costa wrote:
On Tue, 2011-01-25 at 21:13 +0100, Peter Zijlstra wrote:
 On Tue, 2011-01-25 at 18:02 -0200, Glauber Costa wrote:
 
  I fail to see how does clock_task influence cpu power.
  If we also have to touch clock_task for better accounting of other
  stuff, it is a separate story.
  But for cpu_power, I really fail. Please enlighten me.
 
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
 s64 irq_delta;
 
 irq_delta = irq_time_read(cpu_of(rq)) - rq-prev_irq_time;
 
 if (irq_delta  delta)
 irq_delta = delta;
 
 rq-prev_irq_time += irq_delta;
 delta -= irq_delta;
 rq-clock_task += delta;
 
 if (irq_delta  sched_feat(NONIRQ_POWER))
 sched_rt_avg_update(rq, irq_delta);
 }
 
 its done through that sched_rt_avg_update() (should probably rename
 that), it computes a floating average of time not spend on fair tasks.
 
It creates a dependency on CONFIG_IRQ_TIME_ACCOUNTING, though.
This piece of code is simply compiled out if this option is disabled.
   
   We can pull this bit out and make the common bit also available for
   paravirt.
  
  scale_rt_power() seems to do the right thing, but all the path leading
  to it seem to work on rq-clock, rather than rq-clock_task.
 
 Not quite, see how rq-clock_task is irq_delta less than the increment
 to rq-clock? You want it to be your steal-time delta less too.
yes, but once this delta is subtracted from rq-clock_task, this value is not
used to dictate power, unless I am mistaken.

power is adjusted according to scale_rt_power(), which does it using the
values of rq-rt_avg, rq-age_stamp, and rq-clock.

So whatever I store into rq-clock_task, but not rq-clock (which
correct me if I'm wrong, is expected to be walltime), will not be used
to adjust cpu power, which is what I'm trying to achieve.

  Although I do can experiment with that as well, could you please
  elaborate on what are your reasons to prefer this over than variations
  of the method I proposed?
 
 Because I want rq-clock_task to not include steal-time.
Sure, fair deal. But at this point, those demands seem orthogonal to me.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-26 Thread Peter Zijlstra
On Wed, 2011-01-26 at 13:43 -0200, Glauber Costa wrote:

 yes, but once this delta is subtracted from rq-clock_task, this value is not
 used to dictate power, unless I am mistaken.
 
 power is adjusted according to scale_rt_power(), which does it using the
 values of rq-rt_avg, rq-age_stamp, and rq-clock.
 
 So whatever I store into rq-clock_task, but not rq-clock (which
 correct me if I'm wrong, is expected to be walltime), will not be used
 to adjust cpu power, which is what I'm trying to achieve.

No, see the below, it uses a per-cpu virt_steal_time() clock which is
expected to return steal-time in ns.

All time not accounted to -clock_task is accumulated in lost, and
passed into sched_rt_avg_update() and thus affects the cpu_power.

If it finds that 50% of the (recent) time is steal time, its cpu_power
will be 50%.

---
 kernel/sched.c  |   44 
 kernel/sched_features.h |2 +-
 2 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 18d38e4..c71384c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -523,6 +523,9 @@ struct rq {
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
u64 prev_irq_time;
 #endif
+#ifdef CONFIG_SCHED_PARAVIRT
+   u64 prev_steal_time;
+#endif
 
/* calc_load related fields */
unsigned long calc_load_update;
@@ -1888,11 +1891,15 @@ void account_system_vtime(struct task_struct *curr)
 }
 EXPORT_SYMBOL_GPL(account_system_vtime);
 
+#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
-   s64 irq_delta;
+   s64 lost_delta __maybe_unused;
+   s64 lost = 0;
 
-   irq_delta = irq_time_read(cpu_of(rq)) - rq-prev_irq_time;
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+   lost_delta = irq_time_read(cpu_of(rq)) - rq-prev_irq_time;
 
/*
 * Since irq_time is only updated on {soft,}irq_exit, we might run into
@@ -1909,26 +1916,31 @@ static void update_rq_clock_task(struct rq *rq, s64 
delta)
 * the current rq-clock timestamp, except that would require using
 * atomic ops.
 */
-   if (irq_delta  delta)
-   irq_delta = delta;
+   if (lost_delta  delta)
+   lost_delta = delta;
 
-   rq-prev_irq_time += irq_delta;
-   delta -= irq_delta;
-   rq-clock_task += delta;
+   rq-prev_irq_time += lost_delta;
+   lost += lost_delta;
+#endif
+#ifdef CONFIG_SCHED_PARAVIRT
+   lost_delta = virt_steal_time(cpu_of(rq)) - rq-prev_steal_time;
+   
+   /*
+* unlikely, unless steal_time accounting is iffy
+*/
+   if (lost + lost_delta  delta)
+   lost_delta = delta - lost;
 
-   if (irq_delta  sched_feat(NONIRQ_POWER))
-   sched_rt_avg_update(rq, irq_delta);
-}
+   rq-prev_steal_time += lost_delta;
+   lost += lost_delta
+#endif
 
-#else /* CONFIG_IRQ_TIME_ACCOUNTING */
+   rq-clock_task += delta - lost;
 
-static void update_rq_clock_task(struct rq *rq, s64 delta)
-{
-   rq-clock_task += delta;
+   if (lost  sched_feat(NONTASK_POWER))
+   sched_rt_avg_update(rq, lost);
 }
 
-#endif /* CONFIG_IRQ_TIME_ACCOUNTING */
-
 #include sched_idletask.c
 #include sched_fair.c
 #include sched_rt.c
diff --git a/kernel/sched_features.h b/kernel/sched_features.h
index 68e69ac..b334a2d 100644
--- a/kernel/sched_features.h
+++ b/kernel/sched_features.h
@@ -63,4 +63,4 @@ SCHED_FEAT(OWNER_SPIN, 1)
 /*
  * Decrement CPU power based on irq activity
  */
-SCHED_FEAT(NONIRQ_POWER, 1)
+SCHED_FEAT(NONTASK_POWER, 1)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-26 Thread Peter Zijlstra
On Wed, 2011-01-26 at 17:46 +0100, Peter Zijlstra wrote:
 it uses a per-cpu virt_steal_time() clock which is
 expected to return steal-time in ns. 

This clock should return u64 and wrap on u64 and be provided when
CONFIG_SCHED_PARAVIRT.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-26 Thread Glauber Costa
On Wed, 2011-01-26 at 17:46 +0100, Peter Zijlstra wrote:
 On Wed, 2011-01-26 at 13:43 -0200, Glauber Costa wrote:
 
  yes, but once this delta is subtracted from rq-clock_task, this value is 
  not
  used to dictate power, unless I am mistaken.
  
  power is adjusted according to scale_rt_power(), which does it using the
  values of rq-rt_avg, rq-age_stamp, and rq-clock.
  
  So whatever I store into rq-clock_task, but not rq-clock (which
  correct me if I'm wrong, is expected to be walltime), will not be used
  to adjust cpu power, which is what I'm trying to achieve.
 
 No, see the below, it uses a per-cpu virt_steal_time() clock which is
 expected to return steal-time in ns.
 
 All time not accounted to -clock_task is accumulated in lost, and
 passed into sched_rt_avg_update() and thus affects the cpu_power.
 
 If it finds that 50% of the (recent) time is steal time, its cpu_power
 will be 50%.
ok, now I get your proposal. It does make sense.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-25 Thread Glauber Costa
On Mon, 2011-01-24 at 20:51 +0100, Peter Zijlstra wrote:
 On Mon, 2011-01-24 at 16:51 -0200, Glauber Costa wrote:
   I would really much rather see you change update_rq_clock_task() and
   subtract your ns resolution steal time from our wall-time,
   update_rq_clock_task() already updates the cpu_power relative to the
   remaining time available.
  
  But then we get into the problems we already discussed in previous
  submissions, which is, we don't really want to alter any notion of
  wallclock time. Could you list any more concrete advantages of the
  specific way you proposed? 
 
 clock_task is the time spend on the task, by not taking steal time into
 account all steal time is accounted as service to whatever task was
 current when the vcpu wasn't running.
 
 It doesn't change wall-time in the sense of gtod, only the service time
 to tasks.
I fail to see how does clock_task influence cpu power.
If we also have to touch clock_task for better accounting of other
stuff, it is a separate story.
But for cpu_power, I really fail. Please enlighten me.

I did have slightly better results accounting the whole steal time
period away from total_ticks(*) than just with units.

* Please note again that ticks here is just a (bad, I admit) name.
This is not tick accounting, is just an estimate of how much useful work
your cpu did.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-25 Thread Peter Zijlstra
On Tue, 2011-01-25 at 18:02 -0200, Glauber Costa wrote:

 I fail to see how does clock_task influence cpu power.
 If we also have to touch clock_task for better accounting of other
 stuff, it is a separate story.
 But for cpu_power, I really fail. Please enlighten me.

static void update_rq_clock_task(struct rq *rq, s64 delta)
{
s64 irq_delta;

irq_delta = irq_time_read(cpu_of(rq)) - rq-prev_irq_time;

if (irq_delta  delta)
irq_delta = delta;

rq-prev_irq_time += irq_delta;
delta -= irq_delta;
rq-clock_task += delta;

if (irq_delta  sched_feat(NONIRQ_POWER))
sched_rt_avg_update(rq, irq_delta);
}

its done through that sched_rt_avg_update() (should probably rename
that), it computes a floating average of time not spend on fair tasks.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-25 Thread Glauber Costa
On Tue, 2011-01-25 at 21:13 +0100, Peter Zijlstra wrote:
 On Tue, 2011-01-25 at 18:02 -0200, Glauber Costa wrote:
 
  I fail to see how does clock_task influence cpu power.
  If we also have to touch clock_task for better accounting of other
  stuff, it is a separate story.
  But for cpu_power, I really fail. Please enlighten me.
 
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
 s64 irq_delta;
 
 irq_delta = irq_time_read(cpu_of(rq)) - rq-prev_irq_time;
 
 if (irq_delta  delta)
 irq_delta = delta;
 
 rq-prev_irq_time += irq_delta;
 delta -= irq_delta;
 rq-clock_task += delta;
 
 if (irq_delta  sched_feat(NONIRQ_POWER))
 sched_rt_avg_update(rq, irq_delta);
 }
 
 its done through that sched_rt_avg_update() (should probably rename
 that), it computes a floating average of time not spend on fair tasks.
 
It creates a dependency on CONFIG_IRQ_TIME_ACCOUNTING, though.
This piece of code is simply compiled out if this option is disabled.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-25 Thread Peter Zijlstra
On Tue, 2011-01-25 at 18:47 -0200, Glauber Costa wrote:
 On Tue, 2011-01-25 at 21:13 +0100, Peter Zijlstra wrote:
  On Tue, 2011-01-25 at 18:02 -0200, Glauber Costa wrote:
  
   I fail to see how does clock_task influence cpu power.
   If we also have to touch clock_task for better accounting of other
   stuff, it is a separate story.
   But for cpu_power, I really fail. Please enlighten me.
  
  static void update_rq_clock_task(struct rq *rq, s64 delta)
  {
  s64 irq_delta;
  
  irq_delta = irq_time_read(cpu_of(rq)) - rq-prev_irq_time;
  
  if (irq_delta  delta)
  irq_delta = delta;
  
  rq-prev_irq_time += irq_delta;
  delta -= irq_delta;
  rq-clock_task += delta;
  
  if (irq_delta  sched_feat(NONIRQ_POWER))
  sched_rt_avg_update(rq, irq_delta);
  }
  
  its done through that sched_rt_avg_update() (should probably rename
  that), it computes a floating average of time not spend on fair tasks.
  
 It creates a dependency on CONFIG_IRQ_TIME_ACCOUNTING, though.
 This piece of code is simply compiled out if this option is disabled.

We can pull this bit out and make the common bit also available for
paravirt.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-25 Thread Glauber Costa
On Tue, 2011-01-25 at 22:07 +0100, Peter Zijlstra wrote:
 On Tue, 2011-01-25 at 18:47 -0200, Glauber Costa wrote:
  On Tue, 2011-01-25 at 21:13 +0100, Peter Zijlstra wrote:
   On Tue, 2011-01-25 at 18:02 -0200, Glauber Costa wrote:
   
I fail to see how does clock_task influence cpu power.
If we also have to touch clock_task for better accounting of other
stuff, it is a separate story.
But for cpu_power, I really fail. Please enlighten me.
   
   static void update_rq_clock_task(struct rq *rq, s64 delta)
   {
   s64 irq_delta;
   
   irq_delta = irq_time_read(cpu_of(rq)) - rq-prev_irq_time;
   
   if (irq_delta  delta)
   irq_delta = delta;
   
   rq-prev_irq_time += irq_delta;
   delta -= irq_delta;
   rq-clock_task += delta;
   
   if (irq_delta  sched_feat(NONIRQ_POWER))
   sched_rt_avg_update(rq, irq_delta);
   }
   
   its done through that sched_rt_avg_update() (should probably rename
   that), it computes a floating average of time not spend on fair tasks.
   
  It creates a dependency on CONFIG_IRQ_TIME_ACCOUNTING, though.
  This piece of code is simply compiled out if this option is disabled.
 
 We can pull this bit out and make the common bit also available for
 paravirt.

scale_rt_power() seems to do the right thing, but all the path leading
to it seem to work on rq-clock, rather than rq-clock_task.

Although I do can experiment with that as well, could you please
elaborate on what are your reasons to prefer this over than variations
of the method I proposed?




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-24 Thread Peter Zijlstra
On Mon, 2011-01-24 at 13:06 -0500, Glauber Costa wrote:
 This is a first proposal for using steal time information
 to influence the scheduler. There are a lot of optimizations
 and fine grained adjustments to be done, but it is working reasonably
 so far for me (mostly)
 
 With this patch (and some host pinnings to demonstrate the situation),
 two vcpus with very different steal time (Say 80 % vs 1 %) will not get
 an even distribution of processes. This is a situation that can naturally
 arise, specially in overcommited scenarios. Previosly, the guest scheduler
 would wrongly think that all cpus have the same ability to run processes,
 lowering the overall throughput.
 
 Signed-off-by: Glauber Costa glom...@redhat.com
 CC: Rik van Riel r...@redhat.com
 CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
 CC: Peter Zijlstra pet...@infradead.org
 CC: Avi Kivity a...@redhat.com
 ---
  include/linux/sched.h |1 +
  kernel/sched.c|4 
  kernel/sched_fair.c   |   10 ++
  3 files changed, 15 insertions(+), 0 deletions(-)
 
 diff --git a/include/linux/sched.h b/include/linux/sched.h
 index d747f94..beab72d 100644
 --- a/include/linux/sched.h
 +++ b/include/linux/sched.h
 @@ -302,6 +302,7 @@ long io_schedule_timeout(long timeout);
  extern void cpu_init (void);
  extern void trap_init(void);
  extern void update_process_times(int user);
 +extern cputime_t (*hypervisor_steal_time)(void);
  extern void scheduler_tick(void);
  
  extern void sched_show_task(struct task_struct *p);
 diff --git a/kernel/sched.c b/kernel/sched.c
 index 3b3e88d..c460f0d 100644
 --- a/kernel/sched.c
 +++ b/kernel/sched.c
 @@ -501,6 +501,8 @@ struct rq {
   struct sched_domain *sd;
  
   unsigned long cpu_power;
 + unsigned long real_ticks;
 + unsigned long total_ticks;
  
   unsigned char idle_at_tick;
   /* For active balancing */
 @@ -3533,10 +3535,12 @@ static int touch_steal_time(int is_idle)
   if (is_idle)
   return 0;
  
 + rq-total_ticks++;
   if (steal) {
   account_steal_time(steal);
   return 1;
   }
 + rq-real_ticks++;
   return 0;
  }

yuck!! is ticks really the best you can do?

I thought kvm had a ns resolution steal-time clock?

 diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
 index c62ebae..1364c28 100644
 --- a/kernel/sched_fair.c
 +++ b/kernel/sched_fair.c
 @@ -2509,6 +2509,16 @@ static void update_cpu_power(struct sched_domain *sd, 
 int cpu)
   power = SCHED_LOAD_SHIFT;
   }
  
 + WARN_ON(cpu_rq(cpu)-real_ticks  cpu_rq(cpu)-total_ticks);
 +
 + if (cpu_rq(cpu)-total_ticks) {
 + power *= cpu_rq(cpu)-real_ticks;
 + power /= cpu_rq(cpu)-total_ticks;
 + }
 +
 + cpu_rq(cpu)-total_ticks = 0;
 + cpu_rq(cpu)-real_ticks = 0;
 +
   sdg-cpu_power_orig = power;
  
   if (sched_feat(ARCH_POWER))

I would really much rather see you change update_rq_clock_task() and
subtract your ns resolution steal time from our wall-time,
update_rq_clock_task() already updates the cpu_power relative to the
remaining time available.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-24 Thread Glauber Costa
On Mon, 2011-01-24 at 19:32 +0100, Peter Zijlstra wrote:
 On Mon, 2011-01-24 at 13:06 -0500, Glauber Costa wrote:
  This is a first proposal for using steal time information
  to influence the scheduler. There are a lot of optimizations
  and fine grained adjustments to be done, but it is working reasonably
  so far for me (mostly)
  
  With this patch (and some host pinnings to demonstrate the situation),
  two vcpus with very different steal time (Say 80 % vs 1 %) will not get
  an even distribution of processes. This is a situation that can naturally
  arise, specially in overcommited scenarios. Previosly, the guest scheduler
  would wrongly think that all cpus have the same ability to run processes,
  lowering the overall throughput.
  
  Signed-off-by: Glauber Costa glom...@redhat.com
  CC: Rik van Riel r...@redhat.com
  CC: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
  CC: Peter Zijlstra pet...@infradead.org
  CC: Avi Kivity a...@redhat.com
  ---
   include/linux/sched.h |1 +
   kernel/sched.c|4 
   kernel/sched_fair.c   |   10 ++
   3 files changed, 15 insertions(+), 0 deletions(-)
  
  diff --git a/include/linux/sched.h b/include/linux/sched.h
  index d747f94..beab72d 100644
  --- a/include/linux/sched.h
  +++ b/include/linux/sched.h
  @@ -302,6 +302,7 @@ long io_schedule_timeout(long timeout);
   extern void cpu_init (void);
   extern void trap_init(void);
   extern void update_process_times(int user);
  +extern cputime_t (*hypervisor_steal_time)(void);
   extern void scheduler_tick(void);
   
   extern void sched_show_task(struct task_struct *p);
  diff --git a/kernel/sched.c b/kernel/sched.c
  index 3b3e88d..c460f0d 100644
  --- a/kernel/sched.c
  +++ b/kernel/sched.c
  @@ -501,6 +501,8 @@ struct rq {
  struct sched_domain *sd;
   
  unsigned long cpu_power;
  +   unsigned long real_ticks;
  +   unsigned long total_ticks;
   
  unsigned char idle_at_tick;
  /* For active balancing */
  @@ -3533,10 +3535,12 @@ static int touch_steal_time(int is_idle)
  if (is_idle)
  return 0;
   
  +   rq-total_ticks++;
  if (steal) {
  account_steal_time(steal);
  return 1;
  }
  +   rq-real_ticks++;
  return 0;
   }
 
 yuck!! is ticks really the best you can do?
No, but it is simple enough for a first try. With those variables, we're
not accounting anything, but rather, getting a rough estimate of the %
of steal time compared to the useful (non-idle) time.

 I thought kvm had a ns resolution steal-time clock?
Yes, the one I introduced earlier in this series is nsec. However, user
and system will be accounted in usec at most, so there is no point in
using nsec here.

 
  diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
  index c62ebae..1364c28 100644
  --- a/kernel/sched_fair.c
  +++ b/kernel/sched_fair.c
  @@ -2509,6 +2509,16 @@ static void update_cpu_power(struct sched_domain 
  *sd, int cpu)
  power = SCHED_LOAD_SHIFT;
  }
   
  +   WARN_ON(cpu_rq(cpu)-real_ticks  cpu_rq(cpu)-total_ticks);
  +
  +   if (cpu_rq(cpu)-total_ticks) {
  +   power *= cpu_rq(cpu)-real_ticks;
  +   power /= cpu_rq(cpu)-total_ticks;
  +   }
  +
  +   cpu_rq(cpu)-total_ticks = 0;
  +   cpu_rq(cpu)-real_ticks = 0;
  +
  sdg-cpu_power_orig = power;
   
  if (sched_feat(ARCH_POWER))
 
 I would really much rather see you change update_rq_clock_task() and
 subtract your ns resolution steal time from our wall-time,
 update_rq_clock_task() already updates the cpu_power relative to the
 remaining time available.

But then we get into the problems we already discussed in previous
submissions, which is, we don't really want to alter any notion of
wallclock time. Could you list any more concrete advantages of the
specific way you proposed?

I found that updating cpu power directly is pretty simple, and seems
to work well enough, without disturbing any notion of real time.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-24 Thread Peter Zijlstra
On Mon, 2011-01-24 at 16:51 -0200, Glauber Costa wrote:
  I would really much rather see you change update_rq_clock_task() and
  subtract your ns resolution steal time from our wall-time,
  update_rq_clock_task() already updates the cpu_power relative to the
  remaining time available.
 
 But then we get into the problems we already discussed in previous
 submissions, which is, we don't really want to alter any notion of
 wallclock time. Could you list any more concrete advantages of the
 specific way you proposed? 

clock_task is the time spend on the task, by not taking steal time into
account all steal time is accounted as service to whatever task was
current when the vcpu wasn't running.

It doesn't change wall-time in the sense of gtod, only the service time
to tasks.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-24 Thread Peter Zijlstra
On Mon, 2011-01-24 at 16:51 -0200, Glauber Costa wrote:
 
  I thought kvm had a ns resolution steal-time clock?
 Yes, the one I introduced earlier in this series is nsec. However, user
 and system will be accounted in usec at most, so there is no point in
 using nsec here.

Well, the scheduler accounts most things in ns these days -- some archs
even do user/sys time in ns -- using ticks is really very crude, esp
when you've got better information.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/16] KVM-GST: adjust scheduler cpu power

2011-01-24 Thread Glauber Costa
On Mon, 2011-01-24 at 20:51 +0100, Peter Zijlstra wrote:
 On Mon, 2011-01-24 at 16:51 -0200, Glauber Costa wrote:
   I would really much rather see you change update_rq_clock_task() and
   subtract your ns resolution steal time from our wall-time,
   update_rq_clock_task() already updates the cpu_power relative to the
   remaining time available.
  
  But then we get into the problems we already discussed in previous
  submissions, which is, we don't really want to alter any notion of
  wallclock time. Could you list any more concrete advantages of the
  specific way you proposed? 
 
 clock_task is the time spend on the task, by not taking steal time into
 account all steal time is accounted as service to whatever task was
 current when the vcpu wasn't running.
 
 It doesn't change wall-time in the sense of gtod, only the service time
 to tasks.
Ok, I'll experiment with that and see how it goes.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html