Re: [PATCH 17/17] Add timekeeping documentation

2010-06-18 Thread Andi Kleen
 The point is about hotplug CPUs.  Any hotplugged CPU will not have a 
 perfectly synchronized TSC, ever, even on a single socket, single crystal 
 board.

hotplug was in the next section, not in this.

Besides, most systems do not support hotplug CPUs.

-Andi
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 17/17] Add timekeeping documentation

2010-06-18 Thread Zachary Amsden

On 06/17/2010 09:49 PM, Andi Kleen wrote:

The point is about hotplug CPUs.  Any hotplugged CPU will not have a
perfectly synchronized TSC, ever, even on a single socket, single crystal
board.
 

hotplug was in the next section, not in this.
   


Yeah, I reread it and this section was totally confused.  Hopefully it 
makes more sense this time around.



Besides, most systems do not support hotplug CPUs.
   


Yet.


Re: [PATCH 17/17] Add timekeeping documentation

2010-06-17 Thread Andi Kleen
Zachary Amsden zams...@redhat.com writes:

I think listing all the obscure bits in the PIT was an attempt to 
weed out the weak and weary readers early, right?

 +this as well.  Several hardware limitations make the problem worse - if it is
 +not possible to write the full 32-bits of the TSC, it may be impossible to
 +match the TSC in newly arriving CPUs to that of the rest of the system,
 +resulting in unsynchronized TSCs.  This may be done by BIOS or system software,
 +but in practice, getting a perfectly synchronized TSC will not be possible
 +unless all values are read from the same clock, which generally only is
 +possible on single socket systems or those with special hardware
 +support.

That's not true, single crystal for all sockets is very common
as long as you only have a single motherboard.

Of course there might be other reasons why the TSC is unsynchronized
(e.g. stop count in C-states), but the single clock is not the problem.

 +3.4) TSC and C-states
 +
 +C-states, or idling states of the processor, especially C1E and deeper sleep
 +states may be problematic for TSC as well.  The TSC may stop advancing in such
 +a state, resulting in a TSC which is behind that of other CPUs when execution
 +is resumed.  Such CPUs must be detected and flagged by the operating system
 +based on CPU and chipset identifications.
 +
 +The TSC in such a case may be corrected by catching it up to a known external
 +clocksource.

... This is fixed in recent CPUs ...

 +
 +3.5) TSC frequency change / P-states
 +
 +To make things slightly more interesting, some CPUs may change frequency.  They
 +may or may not run the TSC at the same rate, and because the frequency change
 +may be staggered or slewed, at some points in time, the TSC rate may not be
 +known other than falling within a range of values.  In this case, the TSC will
 +not be a stable time source, and must be calibrated against a known, stable,
 +external clock to be a usable source of time.
 +
 +Whether the TSC runs at a constant rate or scales with the P-state is model
 +dependent and must be determined by inspecting CPUID, chipset or various MSR
 +fields.

... In general newer CPUs should not have problems with this anymore

 +
 +4) Virtualization Problems
 +
 +Timekeeping is especially problematic for virtualization because a number of
 +challenges arise.  The most obvious problem is that time is now shared between
 +the host and, potentially, a number of virtual machines.  This happens
 +naturally on X86 systems when SMM mode is used by the BIOS, but not to such a
 +degree nor with such frequency.  However, the fact that SMM mode may cause

The SMM reference here seems at best odd. 

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH 17/17] Add timekeeping documentation

2010-06-17 Thread Zachary Amsden

On 06/16/2010 10:55 PM, Andi Kleen wrote:

Zachary Amsden zams...@redhat.com writes:

I think listing all the obscure bits in the PIT was an attempt to
weed out the weak and weary readers early, right?
   


Very perceptive of you ;)

   

+this as well.  Several hardware limitations make the problem worse - if it is
+not possible to write the full 32-bits of the TSC, it may be impossible to
+match the TSC in newly arriving CPUs to that of the rest of the system,
+resulting in unsynchronized TSCs.  This may be done by BIOS or system software,
+but in practice, getting a perfectly synchronized TSC will not be possible
+unless all values are read from the same clock, which generally only is
+possible on single socket systems or those with special hardware
+support.
 

That's not true, single crystal for all sockets is very common
as long as you only have a single motherboard.

Of course there might be other reasons why the TSC is unsynchronized
(e.g. stop count in C-states), but the single clock is not the problem.
   


The point is about hotplug CPUs.  Any hotplugged CPU will not have a 
perfectly synchronized TSC, ever, even on a single socket, single 
crystal board.
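To illustrate why a hotplugged CPU can only be brought close to the rest of the system, never exactly onto it: software has to estimate the new CPU's TSC offset from readings that are themselves separated in time. The following is a minimal sketch, assuming a hypothetical exchange in which a reference CPU reads its TSC, the new CPU reads its own, and the reference reads again; the struct and function names are illustrative, not kernel interfaces. The residual uncertainty is half of the reference window, so the result is an estimate with an error bound rather than an exact match.

```c
#include <stdint.h>

/*
 * Hypothetical sample from one synchronization round: the reference CPU
 * reads its TSC (ref_before), the new CPU reads its own TSC (target), and
 * the reference CPU reads its TSC again (ref_after).
 */
struct tsc_sync_sample {
    uint64_t ref_before;
    uint64_t target;
    uint64_t ref_after;
};

/*
 * Estimate how far the target CPU's TSC is ahead (positive) or behind
 * (negative) the reference, assuming the target read happened at the
 * midpoint of the reference window.  The error bound is half the window.
 * Assumes TSC values are below 2^63, which holds in practice.
 */
static int64_t estimate_tsc_offset(const struct tsc_sync_sample *s,
                                   uint64_t *error_bound)
{
    uint64_t window = s->ref_after - s->ref_before;
    uint64_t midpoint = s->ref_before + window / 2;

    if (error_bound)
        *error_bound = window / 2;
    return (int64_t)s->target - (int64_t)midpoint;
}

/*
 * Use the round with the narrowest window: delays such as interrupts,
 * SMIs or cache misses only widen the window, so the narrowest one has
 * the tightest error bound.
 */
static int64_t best_tsc_offset(const struct tsc_sync_sample *samples,
                               int n, uint64_t *error_bound)
{
    int best = 0;

    for (int i = 1; i < n; i++) {
        uint64_t w = samples[i].ref_after - samples[i].ref_before;
        uint64_t bw = samples[best].ref_after - samples[best].ref_before;
        if (w < bw)
            best = i;
    }
    return estimate_tsc_offset(&samples[best], error_bound);
}
```

Taking the narrowest of several rounds tightens the bound, but it never reaches zero.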


   

+3.4) TSC and C-states
+
+C-states, or idling states of the processor, especially C1E and deeper sleep
+states may be problematic for TSC as well.  The TSC may stop advancing in such
+a state, resulting in a TSC which is behind that of other CPUs when execution
+is resumed.  Such CPUs must be detected and flagged by the operating system
+based on CPU and chipset identifications.
+
+The TSC in such a case may be corrected by catching it up to a known external
+clocksource.
 

... This is fixed in recent CPUs ...
   


And has a CPU flag associated with it (NONSTOP_TSC).  But whether it 
remains fixed across all models and vendors remains to be seen.
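As a sketch of what detection and catch-up might look like: on current Intel and AMD processors, the NONSTOP_TSC behaviour corresponds to the invariant-TSC bit, bit 8 of EDX in CPUID leaf 0x80000007. The code below only decodes a raw EDX value (executing CPUID is left out so the sketch runs anywhere). The catch-up helper is a hypothetical illustration of advancing a stopped TSC by the idle time measured on an external clocksource, assuming the TSC rate in kHz is already known.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x80000007, EDX bit 8: invariant TSC (constant rate and
 * does not stop in deep C-states). */
#define CPUID_80000007_EDX_INVARIANT_TSC (1u << 8)

/* Decode the flag from a raw EDX value supplied by the caller. */
static bool tsc_is_invariant(uint32_t cpuid_80000007_edx)
{
    return (cpuid_80000007_edx & CPUID_80000007_EDX_INVARIANT_TSC) != 0;
}

/*
 * Hypothetical catch-up: given the TSC value saved before entering an
 * idle state, the time spent idle according to an external clocksource
 * (in nanoseconds), and the TSC rate in kHz, return the value the TSC
 * would have reached had it kept counting.
 */
static uint64_t tsc_catch_up(uint64_t tsc_at_entry, uint64_t idle_ns,
                             uint64_t tsc_khz)
{
    /* A rate of N kHz is N cycles per millisecond.  Splitting the idle
     * time into whole milliseconds plus a remainder keeps the
     * multiplication from overflowing on long idle periods. */
    uint64_t ms = idle_ns / 1000000;
    uint64_t rem_ns = idle_ns % 1000000;

    return tsc_at_entry + ms * tsc_khz + rem_ns * tsc_khz / 1000000;
}
```

For example, 2.5 ms of idle time at 2 GHz (2000000 kHz) advances the TSC by 5000000 cycles.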



+
+3.5) TSC frequency change / P-states
+
+To make things slightly more interesting, some CPUs may change frequency.  They
+may or may not run the TSC at the same rate, and because the frequency change
+may be staggered or slewed, at some points in time, the TSC rate may not be
+known other than falling within a range of values.  In this case, the TSC will
+not be a stable time source, and must be calibrated against a known, stable,
+external clock to be a usable source of time.
+
+Whether the TSC runs at a constant rate or scales with the P-state is model
+dependent and must be determined by inspecting CPUID, chipset or various MSR
+fields.
 

... In general newer CPUs should not have problems with this anymore
   


But that's not the point.  Old CPUs will, and I'm detailing all of the 
existing issues, relevant to new CPUs or not.  A lot of these old CPUs 
are still in service and will be for quite some time.
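For those older CPUs, calibrating against a stable external clock, as section 3.5 describes, reduces to simple arithmetic. A minimal sketch, assuming the caller has already counted how many TSC cycles elapsed while the PIT (1.193182 MHz) advanced a known number of ticks; how those readings are taken is not shown, and the function names are hypothetical. Reporting a range over repeated measurements reflects the point that, while a frequency change is in progress, the rate is only known to lie within bounds. The kernel's own calibration code in arch/x86/kernel/tsc.c is considerably more careful than this.

```c
#include <stdint.h>

#define PIT_HZ 1193182ULL  /* PIT base clock, 1.193182 MHz */

/*
 * Estimate the TSC rate in kHz from the number of TSC cycles that
 * elapsed while the PIT advanced pit_ticks ticks.  Returns 0 if no PIT
 * ticks elapsed.  Intended for short windows: tsc_delta * PIT_HZ must
 * fit in 64 bits.
 */
static uint64_t tsc_khz_from_pit(uint64_t tsc_delta, uint64_t pit_ticks)
{
    if (pit_ticks == 0)
        return 0;
    /* tsc_hz = tsc_delta * PIT_HZ / pit_ticks; divide by 1000 for kHz. */
    return tsc_delta * PIT_HZ / (pit_ticks * 1000);
}

/*
 * Calibrate several windows and report the observed range.  If no
 * window is usable, *min_khz stays UINT64_MAX and *max_khz stays 0.
 */
static void tsc_khz_range(const uint64_t *tsc_deltas,
                          const uint64_t *pit_ticks, int n,
                          uint64_t *min_khz, uint64_t *max_khz)
{
    *min_khz = UINT64_MAX;
    *max_khz = 0;
    for (int i = 0; i < n; i++) {
        uint64_t khz = tsc_khz_from_pit(tsc_deltas[i], pit_ticks[i]);
        if (khz == 0)
            continue;  /* skip unusable measurements */
        if (khz < *min_khz)
            *min_khz = khz;
        if (khz > *max_khz)
            *max_khz = khz;
    }
}
```

A wide range between windows is a sign that the TSC rate is not constant on that CPU.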


   

+
+4) Virtualization Problems
+
+Timekeeping is especially problematic for virtualization because a number of
+challenges arise.  The most obvious problem is that time is now shared between
+the host and, potentially, a number of virtual machines.  This happens
+naturally on X86 systems when SMM mode is used by the BIOS, but not to such a
+degree nor with such frequency.  However, the fact that SMM mode may cause
 

The SMM reference here seems at best odd.
   


SMIs are notorious for frustrating writers of careful timing loops, and 
several pieces of kernel code take time measurements multiple times to 
rule out the outliers they cause.


Seems a perfectly reasonable reference to me; perhaps I should explain 
it better.
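A sketch of the outlier rejection mentioned above, assuming the timing samples have already been collected: because an SMI can only lengthen a measurement, never shorten it, taking the minimum of several runs and treating samples well above it as suspect is a common approach. The function names and the percentage tolerance are illustrative, not taken from any particular kernel routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest sample.  SMIs and interrupts only ever add time, so this is
 * the sample least likely to have been disturbed.  Returns UINT64_MAX
 * for an empty set. */
static uint64_t min_sample(const uint64_t *samples, size_t n)
{
    uint64_t best = UINT64_MAX;

    for (size_t i = 0; i < n; i++)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

/* Count samples more than tolerance_pct percent above the minimum; these
 * are likely to have been stretched by an SMI or an interrupt. */
static size_t count_outliers(const uint64_t *samples, size_t n,
                             unsigned tolerance_pct)
{
    if (n == 0)
        return 0;

    uint64_t best = min_sample(samples, n);
    uint64_t limit = best + best * tolerance_pct / 100;
    size_t outliers = 0;

    for (size_t i = 0; i < n; i++)
        if (samples[i] > limit)
            outliers++;
    return outliers;
}
```

With samples of 1000, 1010, 250000, 1005 and 998 cycles, the minimum is 998 and only the 250000-cycle sample exceeds a 5% tolerance.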



Re: [PATCH 17/17] Add timekeeping documentation

2010-06-16 Thread Zachary Amsden

On 06/15/2010 10:27 AM, Randy Dunlap wrote:

On Mon, 14 Jun 2010 21:34:19 -1000 Zachary Amsden wrote:

   

Basic informational document about x86 timekeeping and how KVM
is affected.
 

Nice job/information.  Thanks.

Just some typos etc. inline below.
   


Thanks for all the detailed feedback!  I'll include it all in the next 
revision.  I only have one response:



+ low, the count is halted.  If the output is low when the gate is lowered, the
+ output automatically goes high (this only affects timer 2).
+
+Mode 3: Square Wave.   This generates a sine wave.  The count determines the
 

a sine wave is a square wave???
   


Actually, yes.  The output from timer 2 goes through a low-pass filter 
which effectively smooths the square wave into a sine wave to make the PC 
speaker sound somewhat less annoying.  I didn't have room to depict this 
in the schematic.  Of course, the counter output isn't directly a sine 
wave, and the LPF may not be all that effective, but that was the intent.
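For reference, producing a tone on the speaker with timer 2 in mode 3 comes down to choosing a reload value and a control byte. A minimal sketch, assuming the standard 8254 control-word layout on port 0x43 (channel in bits 7-6, access mode in bits 5-4, operating mode in bits 3-1, BCD in bit 0). The helper names are hypothetical, and the actual port I/O (control byte to 0x43, low then high divisor byte to channel 2's data port 0x42, then setting bits 0 and 1 of port 61h) is omitted so the code runs anywhere.

```c
#include <stdint.h>

#define PIT_HZ 1193182u      /* PIT input clock, 1.193182 MHz */
#define PIT_ACCESS_LOHI 3u   /* access mode: low byte, then high byte */

/*
 * Build the byte written to the PIT control port (0x43):
 * bits 7-6 channel, bits 5-4 access mode, bits 3-1 operating mode,
 * bit 0 BCD (0 = 16-bit binary count).
 */
static uint8_t pit_control_word(unsigned channel, unsigned access,
                                unsigned mode, unsigned bcd)
{
    return (uint8_t)(((channel & 3u) << 6) | ((access & 3u) << 4) |
                     ((mode & 7u) << 1) | (bcd & 1u));
}

/*
 * Reload value giving the closest square-wave frequency to freq_hz,
 * rounded to nearest.  A result of 65536 is programmed as 0 (both bytes
 * zero).  Returns 0 for frequencies the 16-bit counter cannot reach.
 */
static uint32_t pit_divisor_for_hz(uint32_t freq_hz)
{
    if (freq_hz == 0)
        return 0;

    uint32_t div = (PIT_HZ + freq_hz / 2) / freq_hz;

    if (div == 0 || div > 65536)
        return 0;
    return div;
}
```

For 440 Hz this gives a divisor of 2712 and the control byte 0xB6 (channel 2, low/high access, mode 3, binary), i.e. a tone of about 439.96 Hz.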


Zach


[PATCH 17/17] Add timekeeping documentation

2010-06-15 Thread Zachary Amsden
Basic informational document about x86 timekeeping and how KVM
is affected.

Signed-off-by: Zachary Amsden zams...@redhat.com
---
 Documentation/kvm/timekeeping.txt |  599 +
 1 files changed, 599 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kvm/timekeeping.txt

diff --git a/Documentation/kvm/timekeeping.txt b/Documentation/kvm/timekeeping.txt
new file mode 100644
index 000..4ce8edf
--- /dev/null
+++ b/Documentation/kvm/timekeeping.txt
@@ -0,0 +1,599 @@
+
+   Timekeeping Virtualization for X86-Based Architectures
+
+   Zachary Amsden zams...@redhat.com
+   Copyright (c) 2010, Red Hat.  All rights reserved.
+
+1) Overview
+2) Timing Devices
+3) TSC Hardware
+4) Virtualization Problems
+
+=
+
+1) Overview
+
+One of the most complicated parts of the X86 platform, and specifically of
+the virtualization of this platform, is the plethora of timing devices available
+and the complexity of emulating those devices.  In addition, virtualization of
+time introduces a new set of challenges because it introduces a multiplexed
+division of time beyond the control of the guest CPU.
+
+First, we will describe the various timekeeping hardware available, then
+present some of the problems which arise and solutions available, giving
+specific recommendations for certain classes of KVM guests.
+
+The purpose of this document is to collect data and information relevant to
+time keeping which may be difficult to find elsewhere, specifically,
+information relevant to KVM and hardware based virtualization.
+
+=
+
+2) Timing Devices
+
+First we discuss the basic hardware devices available.  TSC and the related
+KVM clock are special enough to warrant a full exposition and are described in
+the following section.
+
+2.1) i8254 - PIT
+
+One of the first timer devices available is the programmable interval timer,
+or PIT.  The PIT has a fixed frequency 1.193182 MHz base clock and three
+channels which can be programmed to deliver periodic or one-shot interrupts.
+These three channels can be configured in different modes and have individual
+counters.  Channels 1 and 2 were not available for general use in the original
+IBM PC, and historically were connected to control RAM refresh and the PC
+speaker.  Now the PIT is typically integrated as part of an emulated chipset
+and a separate physical PIT is not used.
+
+The PIT uses I/O ports 0x40h - 0x43.  Access to the 16-bit counters is done
+using single or multiple byte access to the I/O ports.  There are 6 modes
+available, but not all modes are available to all timers, as only timer 2
+has a connected gate input, required for modes 1 and 5.  The gate line is
+controlled by port 61h, bit 0, as illustrated in the following diagram.
+
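+ --------------             ----------------
+|              |           |                |
+|  1.1932 MHz  |---------->| CLOCK      OUT | ---------> IRQ 0
+|    Clock     |   |       |                |
+ --------------    |    +->| GATE  TIMER 0  |
+                   |        ----------------
+                   |
+                   |        ----------------
+                   |       |                |
+                   |------>| CLOCK      OUT | ---------> 66.3 KHZ DRAM
+                   |       |                |            (aka /dev/null)
+                   |    +->| GATE  TIMER 1  |
+                   |        ----------------
+                   |
+                   |        ----------------
+                   |       |                |
+                   |------>| CLOCK      OUT | ---------> Port 61h, bit 5
+                           |                |      |
+Port 61h, bit 0 ---------->| GATE  TIMER 2  |       \_.
+                            ----------------         _|) Speaker
+                                                    / *
+Port 61h, bit 1 -----------------------------------/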
+
+The timer modes are now described.
+
+Mode 0: Single Timeout.   This is a one shot software timeout that counts down
+ when the gate is high (always true for timers 0 and 1).  When the count
+ reaches zero, the output goes high.
+
+Mode 1: Triggered One Shot.  The output is initially set high.  When the gate
+ line is set high, a countdown is initiated (which does not stop if the gate is
+ lowered), during which the output is set low.  When the count reaches zero,
+ the output goes high.
+
+Mode 2: Rate Generator.  The output is initially set high.  When the countdown
+ reaches 1, the output goes low for one count and then returns high.  The value
+ is reloaded and the countdown automatically resumes.  If the gate line goes
+ low, the count is halted.  If the output is low when the gate is lowered, the
+ output automatically goes high (this only affects timer 2).
+
+Mode 3: Square Wave.   This generates a sine wave.  The count 

Re: [PATCH 17/17] Add timekeeping documentation

2010-06-15 Thread Avi Kivity

On 06/15/2010 10:34 AM, Zachary Amsden wrote:

Basic informational document about x86 timekeeping and how KVM
is affected.
   


Excellent.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 17/17] Add timekeeping documentation

2010-06-15 Thread Randy Dunlap
On Mon, 14 Jun 2010 21:34:19 -1000 Zachary Amsden wrote:

 Basic informational document about x86 timekeeping and how KVM
 is affected.

Nice job/information.  Thanks.

Just some typos etc. inline below.


 Signed-off-by: Zachary Amsden zams...@redhat.com
 ---
  Documentation/kvm/timekeeping.txt |  599 +
  1 files changed, 599 insertions(+), 0 deletions(-)
  create mode 100644 Documentation/kvm/timekeeping.txt
 
 diff --git a/Documentation/kvm/timekeeping.txt b/Documentation/kvm/timekeeping.txt
 new file mode 100644
 index 000..4ce8edf
 --- /dev/null
 +++ b/Documentation/kvm/timekeeping.txt
 @@ -0,0 +1,599 @@
 +
 + Timekeeping Virtualization for X86-Based Architectures
 +
 + Zachary Amsden zams...@redhat.com
 + Copyright (c) 2010, Red Hat.  All rights reserved.
 +
 +1) Overview
 +2) Timing Devices
 +3) TSC Hardware
 +4) Virtualization Problems
 +
 +=
 +
 +1) Overview
 +
 +One of the most complicated parts of the X86 platform, and specifically,
 +the virtualization of this platform is the plethora of timing devices available
 +and the complexity of emulating those devices.  In addition, virtualization of
 +time introduces a new set of challenges because it introduces a multiplexed
 +division of time beyond the control of the guest CPU.
 +
 +First, we will describe the various timekeeping hardware available, then
 +present some of the problems which arise and solutions available, giving
 +specific recommendations for certain classes of KVM guests.
 +
 +The purpose of this document is to collect data and information relevant to
 +time keeping which may be difficult to find elsewhere, specifically,

   timekeeping

 +information relevant to KVM and hardware based virtualization.

   hardware-based

 +
 +=
 +
 +2) Timing Devices
 +
 +First we discuss the basic hardware devices available.  TSC and the related
 +KVM clock are special enough to warrant a full exposition and are described in
 +the following section.
 +
 +2.1) i8254 - PIT
 +
 +One of the first timer devices available is the programmable interval timer,
 +or PIT.  The PIT has a fixed frequency 1.193182 MHz base clock and three
 +channels which can be programmed to deliver periodic or one-shot interrupts.
 +These three channels can be configured in different modes and have individual
 +counters.  Channels 1 and 2 were not available for general use in the original
 +IBM PC, and historically were connected to control RAM refresh and the PC
 +speaker.  Now the PIT is typically integrated as part of an emulated chipset
 +and a separate physical PIT is not used.
 +
 +The PIT uses I/O ports 0x40h - 0x43.  Access to the 16-bit counters is done

   drop the 'h'   ^

 +using single or multiple byte access to the I/O ports.  There are 6 modes
 +available, but not all modes are available to all timers, as only timer 2
 +has a connected gate input, required for modes 1 and 5.  The gate line is
 +controlled by port 61h, bit 0, as illustrated in the following diagram.
 +
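 + --------------             ----------------
 +|              |           |                |
 +|  1.1932 MHz  |---------->| CLOCK      OUT | ---------> IRQ 0
 +|    Clock     |   |       |                |
 + --------------    |    +->| GATE  TIMER 0  |
 +                   |        ----------------
 +                   |
 +                   |        ----------------
 +                   |       |                |
 +                   |------>| CLOCK      OUT | ---------> 66.3 KHZ DRAM
 +                   |       |                |            (aka /dev/null)
 +                   |    +->| GATE  TIMER 1  |
 +                   |        ----------------
 +                   |
 +                   |        ----------------
 +                   |       |                |
 +                   |------>| CLOCK      OUT | ---------> Port 61h, bit 5
 +                           |                |      |
 +Port 61h, bit 0 ---------->| GATE  TIMER 2  |       \_.
 +                            ----------------         _|) Speaker
 +                                                    / *
 +Port 61h, bit 1 -----------------------------------/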
 +
 +The timer modes are now described.
 +
 +Mode 0: Single Timeout.   This is a one shot software timeout that counts down

   one-shot

 + when the gate is high (always true for timers 0 and 1).  When the count
 + reaches zero, the output goes high.
 +
 +Mode 1: Triggered One Shot.  The output is initially set high.  When the gate

 One-shot (or One-Shot)

 + line is set high, a countdown is initiated (which does not stop if the gate is
 + lowered), during which the output is set low.  When the count reaches zero,
 + the output goes high.
 +
 +Mode 2: