Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-14 Thread Balbir Singh
* Rik van Riel  [2010-12-13 12:02:51]:

> On 12/11/2010 08:57 AM, Balbir Singh wrote:
> 
> >If the vcpu holding the lock runs more and is capped, the timeslice
> >transfer is a heuristic that will not help.
> 
> That indicates you really need the cap to be per guest, and
> not per VCPU.
>

Yes, I personally think so too, but I suspect there needs to be
broader agreement on the semantics. The VCPU semantics, in terms of
CPU power, currently apply to each VCPU as opposed to the entire
system (per guest).
 
> Having one VCPU spin on a lock (and achieve nothing), because
> the other one cannot give up the lock due to hitting its CPU
> cap could lead to showstoppingly bad performance.

Yes, that seems right!
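
To make the per-guest versus per-VCPU distinction concrete, here is a
tiny user-space sketch. It is purely illustrative; the names and
numbers are invented and none of this comes from the patches or from
any existing cap implementation. Under a per-VCPU cap the lock holder
gets throttled even though the guest as a whole still has headroom;
under a per-guest cap it could keep running:

#include <stdio.h>

/* Illustrative only: contrast a per-VCPU cap with a per-guest cap. */

/* With a per-VCPU cap, each VCPU is throttled on its own usage. */
static int may_run_per_vcpu(long vcpu_used_ns, long vcpu_cap_ns)
{
    return vcpu_used_ns <= vcpu_cap_ns;
}

/* With a per-guest cap, only the sum across all VCPUs matters. */
static int may_run_per_guest(long guest_used_ns, long guest_cap_ns)
{
    return guest_used_ns <= guest_cap_ns;
}

int main(void)
{
    /* 50% cap over a 100ms period: 50ms for the guest, or 25ms per VCPU. */
    const long period_ns = 100L * 1000 * 1000;
    const long guest_cap = period_ns / 2;      /* 50ms for the guest */
    const long vcpu_cap  = guest_cap / 2;      /* 25ms for each VCPU */

    /* The lock-holding VCPU has already run 40ms; the other one ran 5ms. */
    long holder_used = 40L * 1000 * 1000;
    long other_used  =  5L * 1000 * 1000;

    printf("per-VCPU cap:  lock holder may keep running? %s\n",
           may_run_per_vcpu(holder_used, vcpu_cap) ? "yes" : "no");
    printf("per-guest cap: lock holder may keep running? %s\n",
           may_run_per_guest(holder_used + other_used, guest_cap) ? "yes" : "no");
    return 0;
}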

-- 
Three Cheers,
Balbir


Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-13 Thread Rik van Riel

On 12/11/2010 08:57 AM, Balbir Singh wrote:


> If the vcpu holding the lock runs more and is capped, the timeslice
> transfer is a heuristic that will not help.


That indicates you really need the cap to be per guest, and
not per VCPU.

Having one VCPU spin on a lock (and achieve nothing), because
the other one cannot give up the lock due to hitting its CPU
cap could lead to showstoppingly bad performance.

--
All rights reversed


Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-13 Thread Avi Kivity

On 12/13/2010 02:39 PM, Balbir Singh wrote:

> * Avi Kivity  [2010-12-13 13:57:37]:

>  On 12/11/2010 03:57 PM, Balbir Singh wrote:
>  >* Avi Kivity   [2010-12-11 09:31:24]:
>  >
>  >>   On 12/10/2010 07:03 AM, Balbir Singh wrote:
>  >>   >>
>  >>   >>Scheduler people, please flame me with anything I may have done
>  >>   >>wrong, so I can do it right for a next version :)
>  >>   >>
>  >>   >
>  >>   >This is a good problem statement; there are other things to consider
>  >>   >as well
>  >>   >
>  >>   >1. If a hard limit feature is enabled underneath, donating the
>  >>   >timeslice would probably not make too much sense in that case
>  >>
>  >>   What's the alternative?
>  >>
>  >>   Consider a two vcpu guest with a 50% hard cap.  Suppose the workload
>  >>   involves ping-ponging within the guest.  If the scheduler decides to
>  >>   schedule the vcpus without any overlap, then the throughput will be
>  >>   dictated by the time slice.  If we allow donation, throughput is
>  >>   limited by context switch latency.
>  >>
>  >
>  >If the vcpu holding the lock runs more and is capped, the timeslice
>  >transfer is a heuristic that will not help.
>
>  Why not?  As long as we shift the cap as well.
>

> Shifting the cap would break it, no?


The total cap for the guest would remain.


> Anyway, that is something for us
> to keep track of as we add additional heuristics, not a show stopper.


Sure, as long as we see a way to fix it eventually.
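
As a purely hypothetical illustration of what "shifting the cap" could
mean (this is not code from the RFC; the structure and names are
invented): when a spinning VCPU donates the rest of its timeslice, the
unused part of its bandwidth quota could move with it, so the sum of
the per-VCPU quotas, i.e. the guest cap, stays constant:

#include <stdio.h>

/* Illustrative only: per-VCPU quotas whose sum is the per-guest cap. */
struct vcpu_quota {
    long quota_ns;   /* bandwidth this VCPU may still consume this period */
};

/*
 * Hypothetical "shift the cap" step accompanying a directed yield:
 * move up to 'ns' of unused quota from the spinner to the presumed
 * lock holder.  The per-guest total is unchanged.
 */
static void shift_quota(struct vcpu_quota *from, struct vcpu_quota *to, long ns)
{
    if (ns > from->quota_ns)
        ns = from->quota_ns;
    from->quota_ns -= ns;
    to->quota_ns   += ns;
}

int main(void)
{
    struct vcpu_quota spinner     = { 20L * 1000 * 1000 };  /* 20ms left */
    struct vcpu_quota lock_holder = {  5L * 1000 * 1000 };  /*  5ms left */
    long guest_total = spinner.quota_ns + lock_holder.quota_ns;

    /* The spinner donates 10ms of quota along with its timeslice. */
    shift_quota(&spinner, &lock_holder, 10L * 1000 * 1000);

    printf("spinner quota:     %ld ms\n", spinner.quota_ns / 1000000);
    printf("lock holder quota: %ld ms\n", lock_holder.quota_ns / 1000000);
    printf("guest total unchanged: %s\n",
           spinner.quota_ns + lock_holder.quota_ns == guest_total ? "yes" : "no");
    return 0;
}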

--
error compiling committee.c: too many arguments to function



Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-13 Thread Balbir Singh
* Avi Kivity  [2010-12-13 13:57:37]:

> On 12/11/2010 03:57 PM, Balbir Singh wrote:
> >* Avi Kivity  [2010-12-11 09:31:24]:
> >
> >>  On 12/10/2010 07:03 AM, Balbir Singh wrote:
> >>  >>
> >>  >>   Scheduler people, please flame me with anything I may have done
> >>  >>   wrong, so I can do it right for a next version :)
> >>  >>
> >>  >
> >>  >This is a good problem statement; there are other things to consider
> >>  >as well
> >>  >
> >>  >1. If a hard limit feature is enabled underneath, donating the
> >>  >timeslice would probably not make too much sense in that case
> >>
> >>  What's the alternative?
> >>
> >>  Consider a two vcpu guest with a 50% hard cap.  Suppose the workload
> >>  involves ping-ponging within the guest.  If the scheduler decides to
> >>  schedule the vcpus without any overlap, then the throughput will be
> >>  dictated by the time slice.  If we allow donation, throughput is
> >>  limited by context switch latency.
> >>
> >
> >If the vcpu holding the lock runs more and is capped, the timeslice
> >transfer is a heuristic that will not help.
> 
> Why not?  As long as we shift the cap as well.
>

Shifting the cap would break it, no? Anyway, that is something for us
to keep track of as we add additional heuristics, not a show stopper. 

-- 
Three Cheers,
Balbir


Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-13 Thread Avi Kivity

On 12/11/2010 03:57 PM, Balbir Singh wrote:

> * Avi Kivity  [2010-12-11 09:31:24]:

>  On 12/10/2010 07:03 AM, Balbir Singh wrote:
>  >>
>  >>   Scheduler people, please flame me with anything I may have done
>  >>   wrong, so I can do it right for a next version :)
>  >>
>  >
>  >This is a good problem statement; there are other things to consider
>  >as well
>  >
>  >1. If a hard limit feature is enabled underneath, donating the
>  >timeslice would probably not make too much sense in that case
>
>  What's the alternative?
>
>  Consider a two vcpu guest with a 50% hard cap.  Suppose the workload
>  involves ping-ponging within the guest.  If the scheduler decides to
>  schedule the vcpus without any overlap, then the throughput will be
>  dictated by the time slice.  If we allow donation, throughput is
>  limited by context switch latency.
>

> If the vcpu holding the lock runs more and is capped, the timeslice
> transfer is a heuristic that will not help.


Why not?  As long as we shift the cap as well.

--
error compiling committee.c: too many arguments to function



Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-13 Thread Balbir Singh
* Avi Kivity  [2010-12-11 09:31:24]:

> On 12/10/2010 07:03 AM, Balbir Singh wrote:
> >>
> >>  Scheduler people, please flame me with anything I may have done
> >>  wrong, so I can do it right for a next version :)
> >>
> >
> >This is a good problem statement; there are other things to consider
> >as well
> >
> >1. If a hard limit feature is enabled underneath, donating the
> >timeslice would probably not make too much sense in that case
> 
> What's the alternative?
> 
> Consider a two vcpu guest with a 50% hard cap.  Suppose the workload
> involves ping-ponging within the guest.  If the scheduler decides to
> schedule the vcpus without any overlap, then the throughput will be
> dictated by the time slice.  If we allow donation, throughput is
> limited by context switch latency.
>

If the vcpu holding the lock runs more and is capped, the timeslice
transfer is a heuristic that will not help.

-- 
Three Cheers,
Balbir


Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-10 Thread Avi Kivity

On 12/10/2010 07:03 AM, Balbir Singh wrote:

>
>  Scheduler people, please flame me with anything I may have done
>  wrong, so I can do it right for a next version :)
>

> This is a good problem statement; there are other things to consider
> as well
>
> 1. If a hard limit feature is enabled underneath, donating the
> timeslice would probably not make too much sense in that case


What's the alternative?

Consider a two vcpu guest with a 50% hard cap.  Suppose the workload 
involves ping-ponging within the guest.  If the scheduler decides to 
schedule the vcpus without any overlap, then the throughput will be 
dictated by the time slice.  If we allow donation, throughput is limited 
by context switch latency.
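
A rough back-of-the-envelope comparison of those two bounds, with
invented numbers purely for illustration (a few milliseconds for the
time slice, a few microseconds for a host context switch):

#include <stdio.h>

int main(void)
{
    /* Invented example numbers, only to show the orders of magnitude. */
    const double timeslice_s  = 3e-3;   /* ~3ms scheduler time slice */
    const double ctx_switch_s = 5e-6;   /* ~5us host context switch  */

    /* If each hand-off in the ping-pong has to wait for the other VCPU
     * to be scheduled, throughput is bounded by the time slice; with
     * timeslice donation it is bounded by context-switch latency. */
    printf("hand-offs/sec, bounded by time slice:     %.0f\n", 1.0 / timeslice_s);
    printf("hand-offs/sec, bounded by context switch: %.0f\n", 1.0 / ctx_switch_s);
    return 0;
}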



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-10 Thread Rik van Riel

On 12/10/2010 12:03 AM, Balbir Singh wrote:


> This is a good problem statement; there are other things to consider
> as well
>
> 1. If a hard limit feature is enabled underneath, donating the
> timeslice would probably not make too much sense in that case


The idea is to get the VCPU that is holding the lock to run
ASAP, so it can release the lock.


> 2. The implicit assumption is that spinning is bad, but for locks
> held for short durations that assumption is not true. I presume,
> from the problem statement above, that the h/w does the detection of
> when to pause, but that detection is not always correct, as you suggest above.


The hardware waits a certain number of spins before it traps
to the virt host.  Short-held locks, held by a virtual CPU
that is running, will not trigger the exception.
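
A simplified user-space model of that detection, for illustration only
(real hardware uses the PLE gap and window values programmed into the
VMCS by KVM; the thresholds and the logic below are a sketch, not the
actual implementation):

#include <stdio.h>

/*
 * Sketch of Pause Loop Exiting: PAUSEs closer together than 'gap'
 * cycles are treated as one spin loop; only when that loop has lasted
 * longer than 'window' cycles would the CPU trap to the host.
 * Short-held locks release before the window expires, so no exit.
 */
#define PLE_GAP      128     /* max cycles between PAUSEs of one loop */
#define PLE_WINDOW   4096    /* max cycles a pause loop may last      */

static int ple_should_exit(const long *pause_tsc, int n)
{
    long loop_start = pause_tsc[0];

    for (int i = 1; i < n; i++) {
        if (pause_tsc[i] - pause_tsc[i - 1] > PLE_GAP)
            loop_start = pause_tsc[i];      /* new, unrelated loop */
        else if (pause_tsc[i] - loop_start > PLE_WINDOW)
            return 1;                       /* spinning too long   */
    }
    return 0;                               /* short spin, no exit */
}

int main(void)
{
    /* A short spin: the running lock holder releases the lock quickly. */
    long short_spin[] = { 0, 50, 100, 150, 200 };
    /* A long spin: the lock holder was preempted, the spinner keeps going. */
    long long_spin[128];
    for (int i = 0; i < 128; i++)
        long_spin[i] = i * 50;

    printf("short spin traps to host? %d\n", ple_should_exit(short_spin, 5));
    printf("long spin traps to host?  %d\n", ple_should_exit(long_spin, 128));
    return 0;
}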


> 3. With respect to donating timeslices, don't scheduler cgroups
> and job isolation address that problem today?


No.

--
All rights reversed


Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-09 Thread Balbir Singh
* Rik van Riel  [2010-12-02 14:41:29]:

> When running SMP virtual machines, it is possible for one VCPU to be
> spinning on a spinlock, while the VCPU that holds the spinlock is not
> currently running, because the host scheduler preempted it to run
> something else.
> 
> Both Intel and AMD CPUs have a feature that detects when a virtual
> CPU is spinning on a lock and will trap to the host.
> 
> The current KVM code sleeps for a bit whenever that happens, which
> results in e.g. a 64 VCPU Windows guest taking forever and a bit to
> boot up.  This is because the VCPU holding the lock is actually
> running and not sleeping, so the pause is counter-productive.
> 
> In other workloads a pause can also be counter-productive, with
> spinlock detection resulting in one guest giving up its CPU time
> to the others.  Instead of spinning, it ends up simply not running
> much at all.
> 
> This patch series aims to fix that, by having a VCPU that spins
> give the remainder of its timeslice to another VCPU in the same
> guest before yielding the CPU - one that is runnable but got 
> preempted, hopefully the lock holder.
> 
> Scheduler people, please flame me with anything I may have done
> wrong, so I can do it right for a next version :)
>

This is a good problem statement; there are other things to consider
as well

1. If a hard limit feature is enabled underneath, donating the
timeslice would probably not make too much sense in that case
2. The implicit assumption is that spinning is bad, but for locks
held for short durations that assumption is not true. I presume,
from the problem statement above, that the h/w does the detection of
when to pause, but that detection is not always correct, as you suggest above.
3. With respect to donating timeslices, don't scheduler cgroups
and job isolation address that problem today?
 
-- 
Three Cheers,
Balbir


Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-05 Thread Avi Kivity

On 12/03/2010 12:41 AM, Chris Wright wrote:

> * Rik van Riel (r...@redhat.com) wrote:
>  When running SMP virtual machines, it is possible for one VCPU to be
>  spinning on a spinlock, while the VCPU that holds the spinlock is not
>  currently running, because the host scheduler preempted it to run
>  something else.
>
>  Both Intel and AMD CPUs have a feature that detects when a virtual
>  CPU is spinning on a lock and will trap to the host.
>
>  The current KVM code sleeps for a bit whenever that happens, which
>  results in e.g. a 64 VCPU Windows guest taking forever and a bit to
>  boot up.  This is because the VCPU holding the lock is actually
>  running and not sleeping, so the pause is counter-productive.

> Seems like simply increasing the spin window would help in that case?  Or is
> it just too contended a lock (I think they use MCS locks, so I can see a
> single wrong sleep causing real contention problems).


It may, but that just pushes the problem to a more contended lock or to 
a higher vcpu count.  We want something that works after PLE threshold 
tuning has failed.


--
error compiling committee.c: too many arguments to function



Re: [RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-02 Thread Chris Wright
* Rik van Riel (r...@redhat.com) wrote:
> When running SMP virtual machines, it is possible for one VCPU to be
> spinning on a spinlock, while the VCPU that holds the spinlock is not
> currently running, because the host scheduler preempted it to run
> something else.
> 
> Both Intel and AMD CPUs have a feature that detects when a virtual
> CPU is spinning on a lock and will trap to the host.
> 
> The current KVM code sleeps for a bit whenever that happens, which
> results in e.g. a 64 VCPU Windows guest taking forever and a bit to
> boot up.  This is because the VCPU holding the lock is actually
> running and not sleeping, so the pause is counter-productive.

Seems like simply increasing the spin window would help in that case?  Or is
it just too contended a lock (I think they use MCS locks, so I can see a
single wrong sleep causing real contention problems).

> In other workloads a pause can also be counter-productive, with
> spinlock detection resulting in one guest giving up its CPU time
> to the others.  Instead of spinning, it ends up simply not running
> much at all.
> 
> This patch series aims to fix that, by having a VCPU that spins
> give the remainder of its timeslice to another VCPU in the same
> guest before yielding the CPU - one that is runnable but got 
> preempted, hopefully the lock holder.


[RFC PATCH 0/3] directed yield for Pause Loop Exiting

2010-12-02 Thread Rik van Riel
When running SMP virtual machines, it is possible for one VCPU to be
spinning on a spinlock, while the VCPU that holds the spinlock is not
currently running, because the host scheduler preempted it to run
something else.

Both Intel and AMD CPUs have a feature that detects when a virtual
CPU is spinning on a lock and will trap to the host.

The current KVM code sleeps for a bit whenever that happens, which
results in e.g. a 64 VCPU Windows guest taking forever and a bit to
boot up.  This is because the VCPU holding the lock is actually
running and not sleeping, so the pause is counter-productive.

In other workloads a pause can also be counter-productive, with
spinlock detection resulting in one guest giving up its CPU time
to the others.  Instead of spinning, it ends up simply not running
much at all.

This patch series aims to fix that, by having a VCPU that spins
give the remainder of its timeslice to another VCPU in the same
guest before yielding the CPU - one that is runnable but got 
preempted, hopefully the lock holder.
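
As a toy illustration of that selection policy (this is not the patch
code; the data structures and names are invented), the idea is roughly:
on a pause-loop exit, pick another VCPU of the same guest that is
runnable but currently preempted, and hand it the remaining timeslice:

#include <stdio.h>

/* Toy model of a guest's VCPUs; none of this is the actual KVM code. */
struct vcpu {
    int id;
    int runnable;   /* wants to run                */
    int running;    /* currently on a physical CPU */
};

/*
 * When 'spinner' pause-loop-exits, pick a donation target: another
 * VCPU of the same guest that is runnable but not running (i.e. it
 * was preempted).  Scanning starts after the spinner so the choice
 * rotates rather than always hitting the same VCPU.
 */
static struct vcpu *pick_yield_target(struct vcpu *vcpus, int n, int spinner)
{
    for (int i = 1; i < n; i++) {
        struct vcpu *v = &vcpus[(spinner + i) % n];
        if (v->runnable && !v->running)
            return v;       /* hopefully the preempted lock holder */
    }
    return NULL;            /* nobody to donate to: plain yield    */
}

int main(void)
{
    struct vcpu vcpus[] = {
        { 0, 1, 1 },        /* the spinning VCPU itself */
        { 1, 1, 0 },        /* runnable but preempted   */
        { 2, 0, 0 },        /* halted / idle            */
    };
    struct vcpu *target = pick_yield_target(vcpus, 3, 0);

    if (target)
        printf("donate remaining timeslice to vcpu %d\n", target->id);
    else
        printf("no runnable preempted vcpu, plain yield\n");
    return 0;
}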

Scheduler people, please flame me with anything I may have done
wrong, so I can do it right for a next version :)

-- 
All rights reversed.