Re: [kvm-devel] pinning, tsc and apic

2008-05-15 Thread Ryan Harper
* Chris Wright <[EMAIL PROTECTED]> [2008-05-15 02:01]:
> * Anthony Liguori ([EMAIL PROTECTED]) wrote:
> >  From a quick look, I suspect that the number of wildly off TSC 
> > calibrations correspond to the VMs that are misbehaving.  I think this 
> > may mean that we have to re-examine the tsc delta computation.
> > 
> > 10_serial.log:time.c: Detected 1995.038 MHz processor.
> > 11_serial.log:time.c: Detected 2363.195 MHz processor.
> > 12_serial.log:time.c: Detected 2492.675 MHz processor.
> > 13_serial.log:time.c: Detected 1995.061 MHz processor.
> > 14_serial.log:time.c: Detected 1994.917 MHz processor.
> > 15_serial.log:time.c: Detected 4100.735 MHz processor.
> > 16_serial.log:time.c: Detected 2075.800 MHz processor.
> > 17_serial.log:time.c: Detected 2674.350 MHz processor.
> > 18_serial.log:time.c: Detected 1995.002 MHz processor.
> > 19_serial.log:time.c: Detected 1994.978 MHz processor.
> > 1_serial.log:time.c: Detected 4384.310 MHz processor.
> 
> Is this with pinning?  We at least know we're losing small bits on
> migration.  From my measurements it's ~3000 (outliers are 10-20k).
> 
> Also, what happens if you roll back to kvm-userspace 7f5c4d15ece5?

I'll try that next.

> 
> I'm using this:
> 
On tip, using the patch, I still see hosed guests and tons of apic round-robin
output, but the tsc calculation seems to have stabilized:

/tmp/10_serial.log:time.c: Detected 1995.018 MHz processor.
/tmp/11_serial.log:time.c: Detected 1995.009 MHz processor.
/tmp/12_serial.log:time.c: Detected 1995.012 MHz processor.
/tmp/13_serial.log:time.c: Detected 1995.013 MHz processor.
/tmp/14_serial.log:time.c: Detected 1995.016 MHz processor.
/tmp/15_serial.log:time.c: Detected 1995.020 MHz processor.
/tmp/16_serial.log:time.c: Detected 1995.020 MHz processor.
/tmp/18_serial.log:time.c: Detected 1995.020 MHz processor.
/tmp/19_serial.log:time.c: Detected 1995.023 MHz processor.
/tmp/1_serial.log:time.c: Detected 1995.008 MHz processor.
/tmp/20_serial.log:time.c: Detected 1995.011 MHz processor.
/tmp/21_serial.log:time.c: Detected 1995.016 MHz processor.
/tmp/22_serial.log:time.c: Detected 1995.016 MHz processor.
/tmp/23_serial.log:time.c: Detected 1995.013 MHz processor.
/tmp/24_serial.log:time.c: Detected 1995.018 MHz processor.
/tmp/25_serial.log:time.c: Detected 1995.030 MHz processor.
/tmp/26_serial.log:time.c: Detected 1995.021 MHz processor.
/tmp/27_serial.log:time.c: Detected 1995.026 MHz processor.
/tmp/28_serial.log:time.c: Detected 1995.016 MHz processor.
/tmp/29_serial.log:time.c: Detected 1995.012 MHz processor.
/tmp/2_serial.log:time.c: Detected 1995.020 MHz processor.
/tmp/30_serial.log:time.c: Detected 1995.021 MHz processor.
/tmp/31_serial.log:time.c: Detected 1995.021 MHz processor.
/tmp/32_serial.log:time.c: Detected 1995.008 MHz processor.
/tmp/33_serial.log:time.c: Detected 1995.015 MHz processor.
/tmp/34_serial.log:time.c: Detected 1995.018 MHz processor.
/tmp/35_serial.log:time.c: Detected 1995.017 MHz processor.
/tmp/36_serial.log:time.c: Detected 1995.013 MHz processor.
/tmp/37_serial.log:time.c: Detected 1995.003 MHz processor.
/tmp/38_serial.log:time.c: Detected 1995.036 MHz processor.
/tmp/39_serial.log:time.c: Detected 1995.020 MHz processor.
/tmp/3_serial.log:time.c: Detected 1995.017 MHz processor.
/tmp/40_serial.log:time.c: Detected 1994.998 MHz processor.
/tmp/41_serial.log:time.c: Detected 1995.015 MHz processor.
/tmp/43_serial.log:time.c: Detected 1995.007 MHz processor.
/tmp/44_serial.log:time.c: Detected 1995.029 MHz processor.
/tmp/45_serial.log:time.c: Detected 1995.009 MHz processor.
/tmp/46_serial.log:time.c: Detected 1995.025 MHz processor.
/tmp/47_serial.log:time.c: Detected 1995.019 MHz processor.
/tmp/48_serial.log:time.c: Detected 1995.013 MHz processor.
/tmp/4_serial.log:time.c: Detected 1995.024 MHz processor.
/tmp/5_serial.log:time.c: Detected 1995.016 MHz processor.
/tmp/6_serial.log:time.c: Detected 1995.023 MHz processor.
/tmp/7_serial.log:time.c: Detected 1995.036 MHz processor.
/tmp/8_serial.log:time.c: Detected 1995.013 MHz processor.
/tmp/9_serial.log:time.c: Detected 1995.014 MHz processor.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
[EMAIL PROTECTED]



Re: [kvm-devel] pinning, tsc and apic

2008-05-15 Thread Anthony Liguori
Chris Wright wrote:
> * Anthony Liguori ([EMAIL PROTECTED]) wrote:
>   
>>  From a quick look, I suspect that the number of wildly off TSC 
>> calibrations correspond to the VMs that are misbehaving.  I think this 
>> may mean that we have to re-examine the tsc delta computation.
>>
>> 10_serial.log:time.c: Detected 1995.038 MHz processor.
>> 11_serial.log:time.c: Detected 2363.195 MHz processor.
>> 12_serial.log:time.c: Detected 2492.675 MHz processor.
>> 13_serial.log:time.c: Detected 1995.061 MHz processor.
>> 14_serial.log:time.c: Detected 1994.917 MHz processor.
>> 15_serial.log:time.c: Detected 4100.735 MHz processor.
>> 16_serial.log:time.c: Detected 2075.800 MHz processor.
>> 17_serial.log:time.c: Detected 2674.350 MHz processor.
>> 18_serial.log:time.c: Detected 1995.002 MHz processor.
>> 19_serial.log:time.c: Detected 1994.978 MHz processor.
>> 1_serial.log:time.c: Detected 4384.310 MHz processor.
>> 
>
> Is this with pinning?  We at least know we're losing small bits on
> migration.  From my measurements it's ~3000 (outliers are 10-20k).
>
> Also, what happens if you roll back to kvm-userspace 7f5c4d15ece5?
>
> I'm using this:
>
> diff -up arch/x86/kvm/svm.c~svm arch/x86/kvm/svm.c
> --- arch/x86/kvm/svm.c~svm	2008-04-16 19:49:44.0 -0700
> +++ arch/x86/kvm/svm.c	2008-05-14 23:44:18.0 -0700
> @@ -621,6 +621,13 @@ static void svm_free_vcpu(struct kvm_vcp
>   kmem_cache_free(kvm_vcpu_cache, svm);
>  }
>  
> +static void svm_tsc_update(void *arg)
> +{
> + struct vcpu_svm *svm = arg;
> + rdtscll(svm->vcpu.arch.host_tsc);
> +
> +}
> +
>  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>   struct vcpu_svm *svm = to_svm(vcpu);
> @@ -633,6 +640,9 @@ static void svm_vcpu_load(struct kvm_vcp
>* Make sure that the guest sees a monotonically
>* increasing TSC.
>*/
> + if (vcpu->cpu != -1)
> + smp_call_function_single(vcpu->cpu, svm_tsc_update,
> +  svm, 0, 1);
>   

I like this approach because of its simplicity, although the IPI is not 
wonderful.  I was also thinking of using cpu_clock() to take a timestamp 
on vcpu_put, then on vcpu_load, take another timestamp and use the 
cyc2ns conversion to estimate the elapsed tsc ticks on the new cpu.
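
Roughly what I have in mind, as an untested sketch: last_host_ns would be a
new field in struct kvm_vcpu_arch, and the ns-to-cycles conversion just leans
on tsc_khz under the fixed-frequency assumption.

static void svm_vcpu_put(struct kvm_vcpu *vcpu)
{
	/* ... existing unload/save code ... */
	rdtscll(vcpu->arch.host_tsc);
	/* stamp when we left the cpu, in scheduler-clock nanoseconds */
	vcpu->arch.last_host_ns = cpu_clock(vcpu->cpu);
}

static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	u64 tsc_this, elapsed_cycles;

	/* nanoseconds the vcpu spent off-cpu */
	elapsed_cycles = cpu_clock(cpu) - vcpu->arch.last_host_ns;
	/* ns -> cycles at the (assumed constant) tsc_khz of the new cpu */
	elapsed_cycles *= tsc_khz;
	do_div(elapsed_cycles, 1000000);	/* ns * kHz / 10^6 = cycles */

	rdtscll(tsc_this);
	/* resume the guest at "where it left off" plus the off-cpu time */
	svm->vmcb->control.tsc_offset +=
		vcpu->arch.host_tsc + elapsed_cycles - tsc_this;

	/* ... rest of the existing svm_vcpu_load() ... */
}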

Regards,

Anthony Liguori

>   rdtscll(tsc_this);
>   delta = vcpu->arch.host_tsc - tsc_this;
>   svm->vmcb->control.tsc_offset += delta;
>
>   




Re: [kvm-devel] pinning, tsc and apic

2008-05-15 Thread Chris Wright
* Anthony Liguori ([EMAIL PROTECTED]) wrote:
>  From a quick look, I suspect that the number of wildly off TSC 
> calibrations correspond to the VMs that are misbehaving.  I think this 
> may mean that we have to re-examine the tsc delta computation.
> 
> 10_serial.log:time.c: Detected 1995.038 MHz processor.
> 11_serial.log:time.c: Detected 2363.195 MHz processor.
> 12_serial.log:time.c: Detected 2492.675 MHz processor.
> 13_serial.log:time.c: Detected 1995.061 MHz processor.
> 14_serial.log:time.c: Detected 1994.917 MHz processor.
> 15_serial.log:time.c: Detected 4100.735 MHz processor.
> 16_serial.log:time.c: Detected 2075.800 MHz processor.
> 17_serial.log:time.c: Detected 2674.350 MHz processor.
> 18_serial.log:time.c: Detected 1995.002 MHz processor.
> 19_serial.log:time.c: Detected 1994.978 MHz processor.
> 1_serial.log:time.c: Detected 4384.310 MHz processor.

Is this with pinning?  We at least know we're losing small bits on
migration.  From my measurements it's ~3000 (outliers are 10-20k).

Also, what happens if you roll back to kvm-userspace 7f5c4d15ece5?

I'm using this:

diff -up arch/x86/kvm/svm.c~svm arch/x86/kvm/svm.c
--- arch/x86/kvm/svm.c~svm  2008-04-16 19:49:44.0 -0700
+++ arch/x86/kvm/svm.c  2008-05-14 23:44:18.0 -0700
@@ -621,6 +621,13 @@ static void svm_free_vcpu(struct kvm_vcp
kmem_cache_free(kvm_vcpu_cache, svm);
 }
 
+static void svm_tsc_update(void *arg)
+{
+   struct vcpu_svm *svm = arg;
+   rdtscll(svm->vcpu.arch.host_tsc);
+
+}
+
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -633,6 +640,9 @@ static void svm_vcpu_load(struct kvm_vcp
 * Make sure that the guest sees a monotonically
 * increasing TSC.
 */
+   if (vcpu->cpu != -1)
+   smp_call_function_single(vcpu->cpu, svm_tsc_update,
+svm, 0, 1);
rdtscll(tsc_this);
delta = vcpu->arch.host_tsc - tsc_this;
svm->vmcb->control.tsc_offset += delta;




Re: [kvm-devel] pinning, tsc and apic

2008-05-14 Thread Anthony Liguori
Marcelo Tosatti wrote:
> On Mon, May 12, 2008 at 02:19:24PM -0500, Ryan Harper wrote:
>   
> Hi Ryan,
>
> There are two places that attempt to use delivery mode 7: kexec crash
> and io_apic_64.c::check_timer().
>
> The latter will happen if the guest fails to receive PIT IRQs for 10
> ticks. If you're using HZ=1000 that's 10ms. See timer_irq_works().
>
> The in-kernel pit emulation has logic which avoids injecting more than
> one IRQ during 10ms.
>
> Note that the guest 10ms delay is TSC based and uses only the lower
> 32-bits of the value. It is quite likely that the TSC adjustment results
> in them increasing more rapidly than they should.
>   

Or that the TSC is terribly miscalibrated.  Here is the log of all 48 
guests.  In this case, the host detects 1995.008 MHz as the frequency.  This 
is a Barcelona.  I suspect that we're masking the 
X86_FEATURE_CONSTANT_TSC though.

From a quick look, I suspect that the wildly off TSC 
calibrations correspond to the VMs that are misbehaving.  I think this 
may mean that we have to re-examine the tsc delta computation.

10_serial.log:time.c: Detected 1995.038 MHz processor.
11_serial.log:time.c: Detected 2363.195 MHz processor.
12_serial.log:time.c: Detected 2492.675 MHz processor.
13_serial.log:time.c: Detected 1995.061 MHz processor.
14_serial.log:time.c: Detected 1994.917 MHz processor.
15_serial.log:time.c: Detected 4100.735 MHz processor.
16_serial.log:time.c: Detected 2075.800 MHz processor.
17_serial.log:time.c: Detected 2674.350 MHz processor.
18_serial.log:time.c: Detected 1995.002 MHz processor.
19_serial.log:time.c: Detected 1994.978 MHz processor.
1_serial.log:time.c: Detected 4384.310 MHz processor.
20_serial.log:time.c: Detected 1994.969 MHz processor.
21_serial.log:time.c: Detected 3670.696 MHz processor.
22_serial.log:time.c: Detected 1994.997 MHz processor.
23_serial.log:time.c: Detected 2218.613 MHz processor.
24_serial.log:time.c: Detected 1995.048 MHz processor.
25_serial.log:time.c: Detected 1995.015 MHz processor.
26_serial.log:time.c: Detected 1994.957 MHz processor.
27_serial.log:time.c: Detected 1995.051 MHz processor.
28_serial.log:time.c: Detected 1995.021 MHz processor.
29_serial.log:time.c: Detected 3679.640 MHz processor.
2_serial.log:time.c: Detected 2191.105 MHz processor.
30_serial.log:time.c: Detected 1995.086 MHz processor.
31_serial.log:time.c: Detected 1995.071 MHz processor.
32_serial.log:time.c: Detected 1995.051 MHz processor.
33_serial.log:time.c: Detected 2331.760 MHz processor.
34_serial.log:time.c: Detected 1995.011 MHz processor.
35_serial.log:time.c: Detected 1995.050 MHz processor.
36_serial.log:time.c: Detected 1994.911 MHz processor.
37_serial.log:time.c: Detected 1994.905 MHz processor.
38_serial.log:time.c: Detected 1994.881 MHz processor.
39_serial.log:time.c: Detected 1995.027 MHz processor.
3_serial.log:time.c: Detected 2051.467 MHz processor.
40_serial.log:time.c: Detected 1994.987 MHz processor.
41_serial.log:time.c: Detected 1994.970 MHz processor.
42_serial.log:time.c: Detected 1994.952 MHz processor.
43_serial.log:time.c: Detected 1995.042 MHz processor.
44_serial.log:time.c: Detected 1994.998 MHz processor.
45_serial.log:time.c: Detected 1995.016 MHz processor.
46_serial.log:time.c: Detected 1995.006 MHz processor.
47_serial.log:time.c: Detected 1995.000 MHz processor.
4_serial.log:time.c: Detected 1995.112 MHz processor.
5_serial.log:time.c: Detected 1995.081 MHz processor.
6_serial.log:time.c: Detected 2017.303 MHz processor.
7_serial.log:time.c: Detected 1995.046 MHz processor.
8_serial.log:time.c: Detected 1994.951 MHz processor.
9_serial.log:time.c: Detected 2184.754 MHz processor.

Regards,

Anthony Liguori

> So can you try setting KVM_MAX_PIT_INTR_INTERVAL to a lower value? HZ/10
> or something.
>
> You can confirm this theory by booting the guests with "apic=debug".
>




Re: [kvm-devel] pinning, tsc and apic

2008-05-14 Thread Marcelo Tosatti
On Mon, May 12, 2008 at 02:19:24PM -0500, Ryan Harper wrote:
> I've been digging into some of the instability we see when running
> larger numbers of guests at the same time.  The test I'm currently using
> involves launching 64 1vcpu guests on an 8-way AMD box.  With the latest
> kvm-userspace git and kvm.git + Gerd's kvmclock fixes, I can launch all
> 64 of these 1 second apart, and only a handful (1 to 3)  end up not
> making it up.  In dmesg on the host, I get a couple messages:
> 
> [321365.362534] vcpu not ready for apic_round_robin
> 
> and 
> 
> [321503.023788] Unsupported delivery mode 7
> 
> Now, the interesting bit for me was when I used numactl to pin the guest
> to a processor, all of the guests come up with no issues at all.  As I
> looked into it, it means that we're not running any of the vcpu
> migration code which on svm is comprised of tsc_offset recalibration and
> apic migration, and on vmx, a little more per-vcpu work

Hi Ryan,

There are two places that attempt to use delivery mode 7: kexec crash
and io_apic_64.c::check_timer().

The latter will happen if the guest fails to receive PIT IRQs for 10
ticks. If you're using HZ=1000 that's 10ms. See timer_irq_works().

The in-kernel pit emulation has logic which avoids injecting more than
one IRQ during 10ms.

Note that the guest 10ms delay is TSC based and uses only the lower
32-bits of the value. It is quite likely that the TSC adjustment results
in them increasing more rapidly than they should.
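
To illustrate the guest side (this is a paraphrase, not the exact
io_apic_64.c code): the ~10-tick wait is an mdelay(), i.e. a TSC spin loop
on the lower 32 bits, so a TSC that runs too fast makes the check give up
before the PIT has had a chance to tick.

static int timer_irq_works_like(void)	/* hypothetical name */
{
	unsigned long t1 = jiffies;

	local_irq_enable();
	mdelay((10 * 1000) / HZ);	/* roughly 10 ticks worth of delay */

	/* did enough PIT interrupts arrive to advance jiffies? */
	return time_after(jiffies, t1 + 4);
}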

So can you try setting KVM_MAX_PIT_INTR_INTERVAL to a lower value? HZ/10
or something.
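
For example (assuming the define in arch/x86/kvm/i8254.h is the right knob
to turn):

/* rate-limit window for in-kernel PIT IRQ injection; HZ/10 per the above */
#define KVM_MAX_PIT_INTR_INTERVAL	(HZ / 10)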

You can confirm this theory by booting the guests with "apic=debug".



Re: [kvm-devel] pinning, tsc and apic

2008-05-13 Thread Ryan Harper
* Anthony Liguori <[EMAIL PROTECTED]> [2008-05-12 17:00]:
> Ryan Harper wrote:
> >>BTW, what if you don't pace-out the startups?  Do we still have issues 
> >>with that?
> >>
> >
> >Do you mean without the 1 second delay or with a longer delay?  My
> >experience is that delay helps (fewer hangs), but doesn't solve things
> >completely.
> >  
> 
> So you see problems when using numactl to pin and using a 0-second 
> delay?  The short delay may help reduce the number of CPU migrations 
> which would explain your observation.
> 
> If there are problems when doing a 0-second delay and numactl, then 
> perhaps it's not just a cpu-migration issue.

nodelay, w/pinning -> all OK
delay, w/pinning   -> all OK

With -no-kvm-irqchip (with or without any delay, 1 to 30 seconds), I get
in some guests:

..MP-BIOS bug: 8254 timer not connected to IO-APIC
Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the
'noapic' kernel parameter

> >In svm.c, I do think we account for most of that time since the delta
> >calculation will shift the guest time forward to the tsc value read in
> >svm_vcpu_load().  We'll still miss the time between fixing the offset
> >and when the guest can actually read its tsc.
> >  
> 
> Yes, which is the duration that the guest isn't scheduled on any 
> processor and the next time it runs happens to be on a different processor.
> 
> >>A possible way to fix this (that's only valid on a processor with a 
> >>fixed-frequency TSC), is to take a high-res timestamp on vcpu_put, and 
> >>then on vcpu_load, take the delta timestamp since the old TSC was saved, 
> >>and use the TSC frequency on the new pcpu to calculate the number of 
> >>elapsed cycles.
> >>
> >>Assuming a fixed frequency TSC, and a calibrated TSC across all 
> >>processors, you could get the same effects by using the VT tsc delta 
> >>logic.  Basically, it always uses the new CPU's TSC unless that would 
> >>cause the guest to move backwards in time.  As long as you have a 
> >>stable, calibrated TSC, this would work out.
> >>
> >>Can you try your old patch that did this and see if it fixes the problem?
> >>
> >
> >Yeah, I'll give it a spin.

Testing the old patch (no pinning, just the tsc check) doesn't help the
situation.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
[EMAIL PROTECTED]



Re: [kvm-devel] pinning, tsc and apic

2008-05-12 Thread Anthony Liguori
Ryan Harper wrote:
> * Anthony Liguori <[EMAIL PROTECTED]> [2008-05-12 15:05]:
>   
>> Ryan Harper wrote:
>> 
>>> I've been digging into some of the instability we see when running
>>> larger numbers of guests at the same time.  The test I'm currently using
>>> involves launching 64 1vcpu guests on an 8-way AMD box.
>>>   
>> Note this is a Barcelona system and therefore should have a 
>> fixed-frequency TSC.
>>
>> 
>>>  With the latest
>>> kvm-userspace git and kvm.git + Gerd's kvmclock fixes, I can launch all
>>> 64 of these 1 second apart,
>>>   
>> BTW, what if you don't pace-out the startups?  Do we still have issues 
>> with that?
>> 
>
> Do you mean without the 1 second delay or with a longer delay?  My
> experience is that delay helps (fewer hangs), but doesn't solve things
> completely.
>   

So you see problems when using numactl to pin and using a 0-second 
delay?  The short delay may help reduce the number of CPU migrations, 
which would explain your observation.

If there are problems when doing a 0-second delay and numactl, then 
perhaps it's not just a cpu-migration issue.

> In svm.c, I do think we account for most of that time since the delta
> calculation will shift the guest time forward to the tsc value read in
> svm_vcpu_load().  We'll still miss the time between fixing the offset
> and when the guest can actually read its tsc.
>   

Yes, which is the duration that the guest isn't scheduled on any 
processor and the next time it runs happens to be on a different processor.

>> A possible way to fix this (that's only valid on a processor with a 
>> fixed-frequency TSC), is to take a high-res timestamp on vcpu_put, and 
>> then on vcpu_load, take the delta timestamp since the old TSC was saved, 
>> and use the TSC frequency on the new pcpu to calculate the number of 
>> elapsed cycles.
>>
>> Assuming a fixed frequency TSC, and a calibrated TSC across all 
>> processors, you could get the same effects by using the VT tsc delta 
>> logic.  Basically, it always uses the new CPU's TSC unless that would 
>> cause the guest to move backwards in time.  As long as you have a 
>> stable, calibrated TSC, this would work out.
>>
>> Can you try your old patch that did this and see if it fixes the problem?
>> 
>
> Yeah, I'll give it a spin.
>   

Thanks,

Anthony Liguori





Re: [kvm-devel] pinning, tsc and apic

2008-05-12 Thread Ryan Harper
* Anthony Liguori <[EMAIL PROTECTED]> [2008-05-12 15:05]:
> Ryan Harper wrote:
> >I've been digging into some of the instability we see when running
> >larger numbers of guests at the same time.  The test I'm currently using
> >involves launching 64 1vcpu guests on an 8-way AMD box.
> 
> Note this is a Barcelona system and therefore should have a 
> fixed-frequency TSC.
> 
> >  With the latest
> >kvm-userspace git and kvm.git + Gerd's kvmclock fixes, I can launch all
> >64 of these 1 second apart,
> 
> BTW, what if you don't pace-out the startups?  Do we still have issues 
> with that?

Do you mean without the 1 second delay or with a longer delay?  My
experience is that delay helps (fewer hangs), but doesn't solve things
completely.

> 
> > and only a handful (1 to 3)  end up not
> >making it up.  In dmesg on the host, I get a couple messages:
> >
> >[321365.362534] vcpu not ready for apic_round_robin
> >
> >and 
> >
> >[321503.023788] Unsupported delivery mode 7
> >
> >Now, the interesting bit for me was when I used numactl to pin the guest
> >to a processor, all of the guests come up with no issues at all.  As I
> >looked into it, it means that we're not running any of the vcpu
> >migration code which on svm is comprised of tsc_offset recalibration and
> >apic migration, and on vmx, a little more per-vcpu work
> >  
> 
> Another data point is that -no-kvm-irqchip doesn't make the situation 
> better.

Right.  Let me clarify: I still see hung guests, but I need to validate
whether I see the apic-related messages on the host or not; I don't recall
for certain.

> >I've convinced myself that svm.c's tsc offset calculation works and
> >handles the migration from cpu to cpu quite well.  I added the following
> >snippet to trigger if we ever encountered the case where we migrated to
> >a tsc that was behind:
> >
> >rdtscll(tsc_this);
> >delta = vcpu->arch.host_tsc - tsc_this;
> >old_time = vcpu->arch.host_tsc + svm->vmcb->control.tsc_offset;
> >new_time = tsc_this + svm->vmcb->control.tsc_offset + delta;
> >if (new_time < old_time) {
> >printk(KERN_ERR "ACK! (CPU%d->CPU%d) time goes back %llu\n",
> >   vcpu->cpu, cpu, old_time - new_time);
> >}
> >svm->vmcb->control.tsc_offset += delta;
> >  
> 
> Time will never go backwards, but what can happen is that the TSC 
> frequency will slow down.  This is because upon VCPU migration, we don't 
> account for the time between vcpu_put on the old processor and vcpu_load 
> on the new processor.  This time then disappears.

In svm.c, I do think we account for most of that time since the delta
calculation will shift the guest time forward to the tsc value read in
svm_vcpu_load().  We'll still miss the time between fixing the offset
and when the guest can actually read its tsc.

> 
> A possible way to fix this (that's only valid on a processor with a 
> fixed-frequency TSC), is to take a high-res timestamp on vcpu_put, and 
> then on vcpu_load, take the delta timestamp since the old TSC was saved, 
> and use the TSC frequency on the new pcpu to calculate the number of 
> elapsed cycles.
> 
> Assuming a fixed frequency TSC, and a calibrated TSC across all 
> processors, you could get the same effects by using the VT tsc delta 
> logic.  Basically, it always uses the new CPU's TSC unless that would 
> cause the guest to move backwards in time.  As long as you have a 
> stable, calibrated TSC, this would work out.
> 
> Can you try your old patch that did this and see if it fixes the problem?

Yeah, I'll give it a spin.

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
[EMAIL PROTECTED]



Re: [kvm-devel] pinning, tsc and apic

2008-05-12 Thread Anthony Liguori
Ryan Harper wrote:
> I've been digging into some of the instability we see when running
> larger numbers of guests at the same time.  The test I'm currently using
> involves launching 64 1vcpu guests on an 8-way AMD box.

Note this is a Barcelona system and therefore should have a 
fixed-frequency TSC.

>   With the latest
> kvm-userspace git and kvm.git + Gerd's kvmclock fixes, I can launch all
> 64 of these 1 second apart,

BTW, what if you don't pace-out the startups?  Do we still have issues 
with that?

>  and only a handful (1 to 3)  end up not
> making it up.  In dmesg on the host, I get a couple messages:
>
> [321365.362534] vcpu not ready for apic_round_robin
>
> and 
>
> [321503.023788] Unsupported delivery mode 7
>
> Now, the interesting bit for me was when I used numactl to pin the guest
> to a processor, all of the guests come up with no issues at all.  As I
> looked into it, it means that we're not running any of the vcpu
> migration code which on svm is comprised of tsc_offset recalibration and
> apic migration, and on vmx, a little more per-vcpu work
>   

Another data point is that -no-kvm-irqchip doesn't make the situation 
better.

> I've convinced myself that svm.c's tsc offset calculation works and
> handles the migration from cpu to cpu quite well.  I added the following
> snippet to trigger if we ever encountered the case where we migrated to
> a tsc that was behind:
>
> rdtscll(tsc_this);
> delta = vcpu->arch.host_tsc - tsc_this;
> old_time = vcpu->arch.host_tsc + svm->vmcb->control.tsc_offset;
> new_time = tsc_this + svm->vmcb->control.tsc_offset + delta;
> if (new_time < old_time) {
> printk(KERN_ERR "ACK! (CPU%d->CPU%d) time goes back %llu\n",
>vcpu->cpu, cpu, old_time - new_time);
> }
> svm->vmcb->control.tsc_offset += delta;
>   

Time will never go backwards, but what can happen is that the TSC 
frequency will slow down.  This is because upon VCPU migration, we don't 
account for the time between vcpu_put on the old processor and vcpu_load 
on the new processor.  This time then disappears.
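
To put made-up numbers on it, assuming the two cpus' TSCs are in sync and
using the current svm.c delta logic:

	vcpu_put on cpu0:	vcpu->arch.host_tsc = 1,000,000
	(the vcpu then sits off-cpu for 50,000 cycles)
	vcpu_load on cpu1:	tsc_this = 1,050,000

	delta      = host_tsc - tsc_this = -50,000
	tsc_offset += delta

The guest's first TSC read on cpu1 is tsc_this + tsc_offset, i.e. exactly the
value it saw when it was descheduled.  The 50,000 off-cpu cycles never show
up, so across many migrations the guest TSC ticks slower than wall time.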

A possible way to fix this (that's only valid on a processor with a 
fixed-frequency TSC), is to take a high-res timestamp on vcpu_put, and 
then on vcpu_load, take the delta timestamp since the old TSC was saved, 
and use the TSC frequency on the new pcpu to calculate the number of 
elapsed cycles.

Assuming a fixed frequency TSC, and a calibrated TSC across all 
processors, you could get the same effects by using the VT tsc delta 
logic.  Basically, it always uses the new CPU's TSC unless that would 
cause the guest to move backwards in time.  As long as you have a 
stable, calibrated TSC, this would work out.
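
For concreteness, an untested sketch of that rule dropped into the
svm_vcpu_load() path (delta has to be signed here):

	u64 tsc_this;
	s64 delta;

	rdtscll(tsc_this);
	delta = vcpu->arch.host_tsc - tsc_this;
	/*
	 * Only compensate when the new cpu's TSC is behind the value the
	 * guest last saw; if it is ahead, let the guest run on it directly
	 * so the time spent off-cpu isn't hidden.
	 */
	if (delta > 0)
		svm->vmcb->control.tsc_offset += delta;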

Can you try your old patch that did this and see if it fixes the problem?

> Noting that vcpu->arch.host_tsc is the tsc of the previous cpu the vcpu
> was running on (see svm_put_vcpu()).  This allows me to check if we are
> in fact increasing the guest's view of the tsc.  I've not been able to
> trigger this at all when the vcpus are migrating.
>
> As for the apic, the migrate code seems to be rather simple, but I've
> not yet dived in to see if we've got anything racy in there:
>
> lapic.c:
> void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
> {
> struct kvm_lapic *apic = vcpu->arch.apic;
> struct hrtimer *timer;
>
> if (!apic)
> return;
>
> timer = &apic->timer.dev;
> if (hrtimer_cancel(timer))
> hrtimer_start(timer, timer->expires, HRTIMER_MODE_ABS);
> }
>
>   

There's a big FIXME in __apic_timer_fn() to make sure the timer runs 
on the current "pCPU".  As written, it's possible for the timer to 
fire on a different pcpu than the current vcpu's, but it wasn't obvious 
to me that it would cause problems.  Eddie, et al: care to elaborate on 
what the TODO was trying to address?

Regards,

Anthony Liguori

> Ryan Harper
> Software Engineer; Linux Technology Center
> IBM Corp., Austin, Tx
> (512) 838-9253   T/L: 678-9253
> [EMAIL PROTECTED]
>




[kvm-devel] pinning, tsc and apic

2008-05-12 Thread Ryan Harper
I've been digging into some of the instability we see when running
larger numbers of guests at the same time.  The test I'm currently using
involves launching 64 1vcpu guests on an 8-way AMD box.  With the latest
kvm-userspace git and kvm.git + Gerd's kvmclock fixes, I can launch all
64 of these 1 second apart, and only a handful (1 to 3)  end up not
making it up.  In dmesg on the host, I get a couple messages:

[321365.362534] vcpu not ready for apic_round_robin

and 

[321503.023788] Unsupported delivery mode 7

Now, the interesting bit for me was when I used numactl to pin the guest
to a processor, all of the guests come up with no issues at all.  As I
looked into it, it means that we're not running any of the vcpu
migration code, which on svm is comprised of tsc_offset recalibration and
apic migration, and on vmx, a little more per-vcpu work.

I've convinced myself that svm.c's tsc offset calculation works and
handles the migration from cpu to cpu quite well.  I added the following
snippet to trigger if we ever encountered the case where we migrated to
a tsc that was behind:

rdtscll(tsc_this);
delta = vcpu->arch.host_tsc - tsc_this;
old_time = vcpu->arch.host_tsc + svm->vmcb->control.tsc_offset;
new_time = tsc_this + svm->vmcb->control.tsc_offset + delta;
if (new_time < old_time) {
printk(KERN_ERR "ACK! (CPU%d->CPU%d) time goes back %llu\n",
   vcpu->cpu, cpu, old_time - new_time);
}
svm->vmcb->control.tsc_offset += delta;

Noting that vcpu->arch.host_tsc is the tsc of the previous cpu the vcpu
was running on (see svm_put_vcpu()).  This allows me to check if we are
in fact increasing the guest's view of the tsc.  I've not been able to
trigger this at all when the vcpus are migrating.

As for the apic, the migrate code seems to be rather simple, but I've
not yet dived in to see if we've got anything racy in there:

lapic.c:
void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
{
	struct kvm_lapic *apic = vcpu->arch.apic;
	struct hrtimer *timer;

	if (!apic)
		return;

	timer = &apic->timer.dev;
	if (hrtimer_cancel(timer))
		hrtimer_start(timer, timer->expires, HRTIMER_MODE_ABS);
}



Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
[EMAIL PROTECTED]
