Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-10-25 10:56, Paolo Bonzini wrote: > Il 10/10/2013 17:20, Paolo Bonzini ha scritto: Arthur, find some fix-ups for your test case below. It avoids printing from within L2 as this could deadlock when the timer fires and L1 then tries to print something. Also, it disables the preemption timer on leave so that it cannot fire later on again. If you want to fold this into your patch, feel free. Otherwise I can post a separate patch on top. >> Is that a Signed-off-by? :) > > Ping? Sent as separate patch two days ago. Jan -- Siemens AG, Corporate Technology, CT RTC ITP SES-DE Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Il 10/10/2013 17:20, Paolo Bonzini ha scritto: >> > Arthur, find some fix-ups for your test case below. It avoids printing >> > from within L2 as this could deadlock when the timer fires and L1 then >> > tries to print something. Also, it disables the preemption timer on >> > leave so that it cannot fire later on again. If you want to fold this >> > into your patch, feel free. Otherwise I can post a separate patch on >> > top. > Is that a Signed-off-by? :) Ping? Paolo
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Hi Jan, On Fri, Oct 11, 2013 at 12:12 AM, Jan Kiszka wrote: > On 2013-10-02 20:47, Jan Kiszka wrote: >> On 2013-09-30 11:08, Jan Kiszka wrote: >>> On 2013-09-26 17:04, Paolo Bonzini wrote: Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: > This patch contains the following two changes: > 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 > with some reasons not emulated by L1, preemption timer value should > be saved in such exits. > 2. Add support of "Save VMX-preemption timer value" VM-Exit controls > to nVMX. > > With this patch, nested VMX preemption timer features are fully > supported. > > Signed-off-by: Arthur Chunqi Li > --- > ChangeLog to v4: > Format changes and remove a flag in nested_vmx. > arch/x86/include/uapi/asm/msr-index.h |1 + > arch/x86/kvm/vmx.c| 44 > +++-- > 2 files changed, 43 insertions(+), 2 deletions(-) Hi all, the test fails for me if the preemption timer value is set to a value that is above ~2000 (which means ~65000 TSC cycles on this machine). The preemption timer seems to count faster than what is expected, for example only up to 4 million cycles if you set it to one million. So, I am leaving the patch out of kvm/queue for now, until I can test it on more processors. >>> >>> I've done some measurements with the help of ftrace on the time it takes >>> to let the preemption timer trigger (no adjustments via Arthur's patch >>> were involved): On my Core i7-620M, the preemption timer seems to tick >>> almost 10 times faster than spec and scale value (5) suggests. I've >>> loaded a value of 10, and it took about 130 µs until I got a vmexit >>> with reason PREEMPTION_TIMER (no other exits in between). >>> >>> qemu-system-x86-13765 [003] 298562.966079: bprint: >>> prepare_vmcs02: preempt val 10 >>> qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0 >>> qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason >>> PREEMPTION_TIMER rip 0x401fea info 0 0 >>> >>> That's a frequency of ~769 MHz. 
The TSC ticks at 2.66 GHz. But 769 MHz * >>> 2^5 is 24.6 GHz. I've read the spec several times, but it seems pretty >>> clear on this. It just doesn't match reality. Very strange. >> >> ...but documented: I found a related erratum for my processor (AAT59) >> and also for Xeon 5500 (AAK139). At least the current Haswell generation is >> not affected. I can test the patch on a Haswell board I have at work >> later this week. > > To complete this story: Arthur's patch works fine on a non-broken CPU > (here: i7-4770S). > > Arthur, find some fix-ups for your test case below. It avoids printing > from within L2 as this could deadlock when the timer fires and L1 then > tries to print something. Also, it disables the preemption timer on > leave so that it cannot fire later on again. If you want to fold this > into your patch, feel free. Otherwise I can post a separate patch on > top. I think this can be treated as a separate patch to our test suite. You can post it on top. I have tested it and it works fine. 
Arthur

> Jan
>
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index 4372878..66a4201 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -141,6 +141,9 @@ void preemption_timer_init()
>  	preempt_val = 1000;
>  	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
>  	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
> +
> +	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> +		printf("\tSave preemption value is not supported\n");
>  }
>
>  void preemption_timer_main()
> @@ -150,9 +153,7 @@
>  		printf("\tPreemption timer is not supported\n");
>  		return;
>  	}
> -	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> -		printf("\tSave preemption value is not supported\n");
> -	else {
> +	if (ctrl_exit_rev.clr & EXI_SAVE_PREEMPT) {
>  		set_stage(0);
>  		vmcall();
>  		if (get_stage() == 1)
> @@ -161,8 +162,8 @@
>  		while (1) {
>  			if (((rdtsc() - tsc_val) >> preempt_scale)
>  					> 10 * preempt_val) {
> -				report("Preemption timer", 0);
> -				break;
> +				set_stage(2);
> +				vmcall();
>  			}
>  		}
>  	}
> @@ -183,7 +184,7 @@ int preemption_timer_exit_handler()
>  			report("Preemption timer", 0);
>  		else
>  			report("Preemption timer", 1);
> -		return VMX_TEST_VMEXIT;
> +		break;
>  	case VMX_VMCALL:
>  		switch (get_stage()) {
>  		case 0:
> @@ -195,24 +196,29 @@ int preemption_timer_exit_handle
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Il 10/10/2013 18:12, Jan Kiszka ha scritto: > On 2013-10-02 20:47, Jan Kiszka wrote: >> On 2013-09-30 11:08, Jan Kiszka wrote: >>> On 2013-09-26 17:04, Paolo Bonzini wrote: Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: > This patch contains the following two changes: > 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 > with some reasons not emulated by L1, preemption timer value should > be save in such exits. > 2. Add support of "Save VMX-preemption timer value" VM-Exit controls > to nVMX. > > With this patch, nested VMX preemption timer features are fully > supported. > > Signed-off-by: Arthur Chunqi Li > --- > ChangeLog to v4: > Format changes and remove a flag in nested_vmx. > arch/x86/include/uapi/asm/msr-index.h |1 + > arch/x86/kvm/vmx.c| 44 > +++-- > 2 files changed, 43 insertions(+), 2 deletions(-) Hi all, the test fails for me if the preemption timer value is set to a value that is above ~2000 (which means ~65000 TSC cycles on this machine). The preemption timer seems to count faster than what is expected, for example only up to 4 million cycles if you set it to one million. So, I am leaving the patch out of kvm/queue for now, until I can test it on more processors. >>> >>> I've done some measurements with the help of ftrace on the time it takes >>> to let the preemption timer trigger (no adjustments via Arthur's patch >>> were involved): On my Core i7-620M, the preemption timer seems to tick >>> almost 10 times faster than spec and scale value (5) suggests. I've >>> loaded a value of 10, and it took about 130 µs until I got a vmexit >>> with reason PREEMPTION_TIMER (no other exists in between). >>> >>> qemu-system-x86-13765 [003] 298562.966079: bprint: >>> prepare_vmcs02: preempt val 10 >>> qemu-system-x86-13765 [003] 298562.966083: kvm_entry:vcpu 0 >>> qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason >>> PREEMPTION_TIMER rip 0x401fea info 0 0 >>> >>> That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. 
But 769 MHz * >>> 2^5 is 24.6 GHz. I've read the spec several times, but it seems pretty >>> clear on this. It just doesn't match reality. Very strange. >> >> ...but documented: I found a related erratum for my processor (AAT59) >> and also for Xeon 5500 (AAK139). At least the current Haswell generation is >> not affected. I can test the patch on a Haswell board I have at work >> later this week. > > To complete this story: Arthur's patch works fine on a non-broken CPU > (here: i7-4770S). > > Arthur, find some fix-ups for your test case below. It avoids printing > from within L2 as this could deadlock when the timer fires and L1 then > tries to print something. Also, it disables the preemption timer on > leave so that it cannot fire later on again. If you want to fold this > into your patch, feel free. Otherwise I can post a separate patch on > top. Is that a Signed-off-by? :) BTW, VirtualBox has a test for this erratum. It would be nice to skip the test when the processor is found to be buggy. I'll put Arthur's patch back. Thanks for testing! 
Paolo

static bool hmR0InitIntelIsSubjectToVmxPreemptionTimerErratum(void)
{
    uint32_t u = ASMCpuId_EAX(1);
    u &= ~(RT_BIT_32(14) | RT_BIT_32(15) | RT_BIT_32(28) | RT_BIT_32(29)
         | RT_BIT_32(30) | RT_BIT_32(31));
    if (   u == UINT32_C(0x000206E6) /* 323344.pdf - BA86   - D0 - Intel Xeon Processor 7500 Series */
        || u == UINT32_C(0x00020652) /* 323056.pdf - AAX65  - C2 - Intel Xeon Processor L3406 */
        || u == UINT32_C(0x00020652) /* 322814.pdf - AAT59  - C2 - Intel CoreTM i7-600, i5-500, i5-400 and i3-300 Mobile Processor Series */
        || u == UINT32_C(0x00020652) /* 322911.pdf - AAU65  - C2 - Intel CoreTM i5-600, i3-500 Desktop Processor Series and Intel Pentium Processor G6950 */
        || u == UINT32_C(0x00020655) /* 322911.pdf - AAU65  - K0 - Intel CoreTM i5-600, i3-500 Desktop Processor Series and Intel Pentium Processor G6950 */
        || u == UINT32_C(0x000106E5) /* 322373.pdf - AAO95  - B1 - Intel Xeon Processor 3400 Series */
        || u == UINT32_C(0x000106E5) /* 322166.pdf - AAN92  - B1 - Intel CoreTM i7-800 and i5-700 Desktop Processor Series */
        || u == UINT32_C(0x000106E5) /* 320767.pdf - AAP86  - B1 - Intel Core i7-900 Mobile Processor Extreme Edition Series, Intel Core i7-800 and i7-700 Mobile Processor Series */
        || u == UINT32_C(0x000106A0) /* 321333.pdf - AAM126 - C0 - Intel Xeon Processor 3500 Series Specification */
        || u == UINT32_C(0x000106A1) /* 321333.pdf - AAM126 - C1 - Intel Xeon Processor 3500 Series Specification */
        || u == UINT32_C(0x000106A4) /* 320836.pdf - AAJ124 - C0 - Intel Core i7-900 Desktop Processor Extreme Edition Series and Intel Core i7-900 Desktop Processor Series */
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-10-02 20:47, Jan Kiszka wrote: > On 2013-09-30 11:08, Jan Kiszka wrote: >> On 2013-09-26 17:04, Paolo Bonzini wrote: >>> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of "Save VMX-preemption timer value" VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li --- ChangeLog to v4: Format changes and remove a flag in nested_vmx. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 44 +++-- 2 files changed, 43 insertions(+), 2 deletions(-) >>> >>> Hi all, >>> >>> the test fails for me if the preemption timer value is set to a value >>> that is above ~2000 (which means ~65000 TSC cycles on this machine). >>> The preemption timer seems to count faster than what is expected, for >>> example only up to 4 million cycles if you set it to one million. >>> So, I am leaving the patch out of kvm/queue for now, until I can >>> test it on more processors. >> >> I've done some measurements with the help of ftrace on the time it takes >> to let the preemption timer trigger (no adjustments via Arthur's patch >> were involved): On my Core i7-620M, the preemption timer seems to tick >> almost 10 times faster than spec and scale value (5) suggests. I've >> loaded a value of 10, and it took about 130 µs until I got a vmexit >> with reason PREEMPTION_TIMER (no other exists in between). >> >> qemu-system-x86-13765 [003] 298562.966079: bprint: >> prepare_vmcs02: preempt val 10 >> qemu-system-x86-13765 [003] 298562.966083: kvm_entry:vcpu 0 >> qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason >> PREEMPTION_TIMER rip 0x401fea info 0 0 >> >> That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz * >> 2^5 is 24.6 GHz. 
I've read the spec several times, but it seems pretty >> clear on this. It just doesn't match reality. Very strange. > > ...but documented: I found a related erratum for my processor (AAT59) > and also for Xeon 5500 (AAK139). At least the current Haswell generation is > not affected. I can test the patch on a Haswell board I have at work > later this week. To complete this story: Arthur's patch works fine on a non-broken CPU (here: i7-4770S). Arthur, find some fix-ups for your test case below. It avoids printing from within L2 as this could deadlock when the timer fires and L1 then tries to print something. Also, it disables the preemption timer on leave so that it cannot fire later on again. If you want to fold this into your patch, feel free. Otherwise I can post a separate patch on top.

Jan

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 4372878..66a4201 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -141,6 +141,9 @@ void preemption_timer_init()
 	preempt_val = 1000;
 	vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
 	preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
+
+	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
+		printf("\tSave preemption value is not supported\n");
 }
 
 void preemption_timer_main()
@@ -150,9 +153,7 @@
 		printf("\tPreemption timer is not supported\n");
 		return;
 	}
-	if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
-		printf("\tSave preemption value is not supported\n");
-	else {
+	if (ctrl_exit_rev.clr & EXI_SAVE_PREEMPT) {
 		set_stage(0);
 		vmcall();
 		if (get_stage() == 1)
@@ -161,8 +162,8 @@
 		while (1) {
 			if (((rdtsc() - tsc_val) >> preempt_scale)
 					> 10 * preempt_val) {
-				report("Preemption timer", 0);
-				break;
+				set_stage(2);
+				vmcall();
 			}
 		}
 	}
@@ -183,7 +184,7 @@ int preemption_timer_exit_handler()
 			report("Preemption timer", 0);
 		else
 			report("Preemption timer", 1);
-		return VMX_TEST_VMEXIT;
+		break;
 	case VMX_VMCALL:
 		switch (get_stage()) {
 		case 0:
@@ -195,24 +196,29 @@ int preemption_timer_exit_handler()
 				EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
 				vmcs_write(EXI_CONTROLS, ctrl_exit);
 			}
-			break;
+			vmcs_write(GUEST_RIP, guest_rip + insn_len);
+			return VMX_TEST_RESUME;
 		case 1:
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Il 30/09/2013 11:08, Jan Kiszka ha scritto: > On 2013-09-26 17:04, Paolo Bonzini wrote: >> Hi all, >> >> the test fails for me if the preemption timer value is set to a >> value that is above ~2000 (which means ~65000 TSC cycles on this >> machine). The preemption timer seems to count faster than what >> is expected, for example only up to 4 million cycles if you set >> it to one million. So, I am leaving the patch out of kvm/queue >> for now, until I can test it on more processors. > > I've done some measurements with the help of ftrace on the time it > takes to let the preemption timer trigger (no adjustments via > Arthur's patch were involved): On my Core i7-620M, the preemption > timer seems to tick almost 10 times faster than spec and scale > value (5) suggests. I've loaded a value of 10, and it took > about 130 µs until I got a vmexit with reason PREEMPTION_TIMER (no > other exits in between). 10x is similar to what I was observing. Paolo
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-09-30 11:08, Jan Kiszka wrote: > On 2013-09-26 17:04, Paolo Bonzini wrote: >> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: >>> This patch contains the following two changes: >>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 >>> with some reasons not emulated by L1, preemption timer value should >>> be save in such exits. >>> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls >>> to nVMX. >>> >>> With this patch, nested VMX preemption timer features are fully >>> supported. >>> >>> Signed-off-by: Arthur Chunqi Li >>> --- >>> ChangeLog to v4: >>> Format changes and remove a flag in nested_vmx. >>> arch/x86/include/uapi/asm/msr-index.h |1 + >>> arch/x86/kvm/vmx.c| 44 >>> +++-- >>> 2 files changed, 43 insertions(+), 2 deletions(-) >> >> Hi all, >> >> the test fails for me if the preemption timer value is set to a value >> that is above ~2000 (which means ~65000 TSC cycles on this machine). >> The preemption timer seems to count faster than what is expected, for >> example only up to 4 million cycles if you set it to one million. >> So, I am leaving the patch out of kvm/queue for now, until I can >> test it on more processors. > > I've done some measurements with the help of ftrace on the time it takes > to let the preemption timer trigger (no adjustments via Arthur's patch > were involved): On my Core i7-620M, the preemption timer seems to tick > almost 10 times faster than spec and scale value (5) suggests. I've > loaded a value of 10, and it took about 130 µs until I got a vmexit > with reason PREEMPTION_TIMER (no other exists in between). > > qemu-system-x86-13765 [003] 298562.966079: bprint: > prepare_vmcs02: preempt val 10 > qemu-system-x86-13765 [003] 298562.966083: kvm_entry:vcpu 0 > qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason > PREEMPTION_TIMER rip 0x401fea info 0 0 > > That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz * > 2^5 is 24.6 GHz. 
I've read the spec several times, but it seems pretty > clear on this. It just doesn't match reality. Very strange. ...but documented: I found a related erratum for my processor (AAT59) and also for Xeon 5500 (AAK139). At least the current Haswell generation is not affected. I can test the patch on a Haswell board I have at work later this week. Jan
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-09-26 17:04, Paolo Bonzini wrote: > Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: >> This patch contains the following two changes: >> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 >> with some reasons not emulated by L1, preemption timer value should >> be saved in such exits. >> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls >> to nVMX. >> >> With this patch, nested VMX preemption timer features are fully >> supported. >> >> Signed-off-by: Arthur Chunqi Li >> --- >> ChangeLog to v4: >> Format changes and remove a flag in nested_vmx. >> arch/x86/include/uapi/asm/msr-index.h |1 + >> arch/x86/kvm/vmx.c| 44 >> +++-- >> 2 files changed, 43 insertions(+), 2 deletions(-) > > Hi all, > > the test fails for me if the preemption timer value is set to a value > that is above ~2000 (which means ~65000 TSC cycles on this machine). > The preemption timer seems to count faster than what is expected, for > example only up to 4 million cycles if you set it to one million. > So, I am leaving the patch out of kvm/queue for now, until I can > test it on more processors. I've done some measurements with the help of ftrace on the time it takes to let the preemption timer trigger (no adjustments via Arthur's patch were involved): On my Core i7-620M, the preemption timer seems to tick almost 10 times faster than spec and scale value (5) suggests. I've loaded a value of 10, and it took about 130 µs until I got a vmexit with reason PREEMPTION_TIMER (no other exits in between).

qemu-system-x86-13765 [003] 298562.966079: bprint: prepare_vmcs02: preempt val 10
qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0
qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason PREEMPTION_TIMER rip 0x401fea info 0 0

That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz * 2^5 is 24.6 GHz. I've read the spec several times, but it seems pretty clear on this. It just doesn't match reality. Very strange. 
Jan
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-09-27 08:37, Jan Kiszka wrote: > On 2013-09-26 22:44, Paolo Bonzini wrote: >> Il 26/09/2013 19:47, Paolo Bonzini ha scritto: >>> >>> If I only apply this hunk, which disables the preemption timer while >>> in L1: >>> >>> @@ -8396,6 +8375,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu) >>> >>> load_vmcs12_host_state(vcpu, vmcs12); >>> >>> + vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, >>> vmx_pin_based_exec_ctrl(vmx)); >>> + >>> /* Update TSC_OFFSET if TSC was changed while L2 ran */ >>> vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset); >>> >>> then the testcase works for somewhat larger values of the preemption timer >>> (up to ~150 TSC cycles), but then fails. > > Err, does this mean we run L1 with PIN_BASED_VM_EXEC_CONTROL of L2? > Ouch. Should be fixed independently. No, it doesn't mean this. L1 and L2 run on different VMCSs, thus each should be able to set its PIN_BASED_VM_EXEC_CONTROL independently. I have no clue ATM why that hunk makes a difference for you. Will have a closer look. BTW, aren't many VMCS fields written redundantly in prepare_vmcs02? I mean, those that weren't changed by L1 in the shadow VMCS or require updates for other reasons. There seems to be some room for saving a few cycles. Jan
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
On Thu, Sep 26, 2013 at 07:47:33PM +0200, Paolo Bonzini wrote: > Il 26/09/2013 19:19, Jan Kiszka ha scritto: > > On 2013-09-26 17:04, Paolo Bonzini wrote: > >> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: > >>> This patch contains the following two changes: > >>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 > >>> with some reasons not emulated by L1, preemption timer value should > >>> be saved in such exits. > >>> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls > >>> to nVMX. > >>> > >>> With this patch, nested VMX preemption timer features are fully > >>> supported. > >>> > >>> Signed-off-by: Arthur Chunqi Li > >>> --- > >>> ChangeLog to v4: > >>> Format changes and remove a flag in nested_vmx. > >>> arch/x86/include/uapi/asm/msr-index.h |1 + > >>> arch/x86/kvm/vmx.c| 44 > >>> +++-- > >>> 2 files changed, 43 insertions(+), 2 deletions(-) > >> > >> Hi all, > >> > >> the test fails for me if the preemption timer value is set to a value > >> that is above ~2000 (which means ~65000 TSC cycles on this machine). > >> The preemption timer seems to count faster than what is expected, for > >> example only up to 4 million cycles if you set it to one million. > >> So, I am leaving the patch out of kvm/queue for now, until I can > >> test it on more processors. > > > > So this behaviour is a regression of this patch (and your own version as > > well)? > > Without this patch Arthur's preemption timer test doesn't work at all. Have you ruled out test bugs? I see things like:

+	int i, j;
...
+	// Consume enough time to let L2->L0->L2 occurs
+	for(i = 0; i < 10; i++)
+		for (j = 0; j < 1; j++);

which can be optimized out. Or use of rdtsc() which can be problematic if host tsc is not synchronised.

--
Gleb.
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-09-26 22:44, Paolo Bonzini wrote: > Il 26/09/2013 19:47, Paolo Bonzini ha scritto: >> >> If I only apply this hunk, which disables the preemption timer while >> in L1: >> >> @@ -8396,6 +8375,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu) >> >> load_vmcs12_host_state(vcpu, vmcs12); >> >> + vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, >> vmx_pin_based_exec_ctrl(vmx)); >> + >> /* Update TSC_OFFSET if TSC was changed while L2 ran */ >> vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset); >> >> then the testcase works for somewhat larger values of the preemption timer >> (up to ~150 TSC cycles), but then fails. Err, does this mean we run L1 with PIN_BASED_VM_EXEC_CONTROL of L2? Ouch. Should be fixed independently. > > I mean if I apply it on top of current kvm/next, without Arthur's patch. > > If I apply the hunk on top of Arthur's patch nothing changes and the > timer testcase starts breaking around ~65000 TSC cycles. So there are several overlapping issues. > > It is a bit problematic that adding printks changes something, so that > the test starts passing. I haven't tried tracepoints yet. Yeah, just convert to trace_printk. > > Jan, which L1 is using the preemption timer? Any reason why you added > it? The one I'm going to present in Edinburgh. Will publish code soon, but it's unspectacular regarding preemption timer usage: I'm only using it to enforce immediate exits on next vmentry, i.e. set the timer value to 0. > I wonder if it isn't better to revert it, since it is quite broken. It's in upstream since 3.10, so reverting doesn't buy us anything, we need to fix it anyway for long-term stable. Jan
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Il 26/09/2013 19:47, Paolo Bonzini ha scritto: > > If I only apply this hunk, which disables the preemption timer while > in L1: > > @@ -8396,6 +8375,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu) > > load_vmcs12_host_state(vcpu, vmcs12); > > + vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx)); > + > /* Update TSC_OFFSET if TSC was changed while L2 ran */ > vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset); > > then the testcase works for somewhat larger values of the preemption timer > (up to ~150 TSC cycles), but then fails. I mean if I apply it on top of current kvm/next, without Arthur's patch. If I apply the hunk on top of Arthur's patch nothing changes and the timer testcase starts breaking around ~65000 TSC cycles. It is a bit problematic that adding printks changes something, so that the test starts passing. I haven't tried tracepoints yet. Jan, which L1 is using the preemption timer? Any reason why you added it? I wonder if it isn't better to revert it, since it is quite broken. Paolo
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Il 26/09/2013 19:19, Jan Kiszka ha scritto: > On 2013-09-26 17:04, Paolo Bonzini wrote: >> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: >>> This patch contains the following two changes: >>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 >>> with some reasons not emulated by L1, preemption timer value should >>> be saved in such exits. >>> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls >>> to nVMX. >>> >>> With this patch, nested VMX preemption timer features are fully >>> supported. >>> >>> Signed-off-by: Arthur Chunqi Li >>> --- >>> ChangeLog to v4: >>> Format changes and remove a flag in nested_vmx. >>> arch/x86/include/uapi/asm/msr-index.h |1 + >>> arch/x86/kvm/vmx.c| 44 >>> +++-- >>> 2 files changed, 43 insertions(+), 2 deletions(-) >> >> Hi all, >> >> the test fails for me if the preemption timer value is set to a value >> that is above ~2000 (which means ~65000 TSC cycles on this machine). >> The preemption timer seems to count faster than what is expected, for >> example only up to 4 million cycles if you set it to one million. >> So, I am leaving the patch out of kvm/queue for now, until I can >> test it on more processors. > > So this behaviour is a regression of this patch (and your own version as > well)? Without this patch Arthur's preemption timer test doesn't work at all. If I only apply this hunk, which disables the preemption timer while in L1:

@@ -8396,6 +8375,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
 
 	load_vmcs12_host_state(vcpu, vmcs12);
 
+	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
+
 	/* Update TSC_OFFSET if TSC was changed while L2 ran */
 	vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);

then the testcase works for somewhat larger values of the preemption timer (up to ~150 TSC cycles), but then fails. 
If I apply the full patch below, the save-timer-value control works but the timer stops working except for very small values of the preemption timer. >> @@ -7294,6 +7272,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu >> *vcpu) >> | (1 << VCPU_EXREG_CR3)); >> vcpu->arch.regs_dirty = 0; >> >> + if (is_guest_mode(vcpu)) { >> + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); >> + if ((vmcs12->pin_based_vm_exec_control & >> PIN_BASED_VMX_PREEMPTION_TIMER) && >> + (vmcs12->vm_exit_controls & >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { >> + vmcs12->vmx_preemption_timer_value = >> + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); > > Is the preemption timer guaranteed to continue to tick while in L0? I > think this is what you are relying on here, right? No, it is definitely not ticking. But I was trying to simplify Arthur's code as much as possible, while fixing the worst bug (i.e. that after L2->L0->L2 the timer value restarts ticking from the initial value). Support for save-timer-value comes for free, and doesn't affect the result of the testcase. But this would cause the preemption timer to be late. It doesn't explain why the preemption timer expires faster than the (scaled) TSC. 
Paolo
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-09-26 17:04, Paolo Bonzini wrote:
> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
>> This patch contains the following two changes:
>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0
>> with some reasons not emulated by L1, preemption timer value should
>> be save in such exits.
>> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls
>> to nVMX.
>>
>> With this patch, nested VMX preemption timer features are fully
>> supported.
>>
>> Signed-off-by: Arthur Chunqi Li
>> ---
>> ChangeLog to v4:
>> Format changes and remove a flag in nested_vmx.
>>  arch/x86/include/uapi/asm/msr-index.h |    1 +
>>  arch/x86/kvm/vmx.c                    |   44 +++--
>>  2 files changed, 43 insertions(+), 2 deletions(-)
>
> Hi all,
>
> the test fails for me if the preemption timer value is set to a value
> that is above ~2000 (which means ~65000 TSC cycles on this machine).
> The preemption timer seems to count faster than what is expected, for
> example only up to 4 million cycles if you set it to one million.
> So, I am leaving the patch out of kvm/queue for now, until I can
> test it on more processors.

So this behaviour is a regression of this patch (and your own version
as well)?

> I tried redoing the patch with a completely different logic,
> and I get exactly the same behavior and the same wrong values for the
> TSC. My patch (below is the diff from Arthur's) is a bit simpler; like
> Arthur I run L2 with the save-timer control set, but I do not need any
> adjustment upon entry to L2 because I just use the value left by the
> processor in the VMCS. Also, in your patch L1 is also running with
> the preemption timer, if I understand the code correctly.
> > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -6730,27 +6731,6 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, > u64 *info1, u64 *in > *info2 = vmcs_read32(VM_EXIT_INTR_INFO); > } > > -static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) > -{ > - u64 delta_tsc_l1; > - u32 preempt_val_l1, preempt_val_l2, preempt_scale; > - > - if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control & > - PIN_BASED_VMX_PREEMPTION_TIMER)) > - return; > - preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) & > - MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; > - preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); > - delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc()) > - - vcpu->arch.last_guest_tsc; > - preempt_val_l1 = delta_tsc_l1 >> preempt_scale; > - if (preempt_val_l2 <= preempt_val_l1) > - preempt_val_l2 = 0; > - else > - preempt_val_l2 -= preempt_val_l1; > - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); > -} > - > /* > * The guest has exited. See if we can fix it or if we need userspace > * assistance. > @@ -7161,8 +7141,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > *vcpu) > atomic_switch_perf_msrs(vmx); > debugctlmsr = get_debugctlmsr(); > > - if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending)) > - nested_adjust_preemption_timer(vcpu); > vmx->__launched = vmx->loaded_vmcs->launched; > asm( > /* Store host registers */ > @@ -7294,6 +7272,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > *vcpu) > | (1 << VCPU_EXREG_CR3)); > vcpu->arch.regs_dirty = 0; > > + if (is_guest_mode(vcpu)) { > + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); > + if ((vmcs12->pin_based_vm_exec_control & > PIN_BASED_VMX_PREEMPTION_TIMER) && > + (vmcs12->vm_exit_controls & > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { > + vmcs12->vmx_preemption_timer_value = > + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); Is the preemption timer guaranteed to continue to tick while in L0? I think this is what you are relying on here, right? 
Jan > + } > + } > + > vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); > > vmx->loaded_vmcs->launched = 1; > @@ -7633,7 +7633,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, > struct vmcs12 *vmcs12) > vmcs_write64(VMCS_LINK_POINTER, -1ull); > > vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, > - (vmcs_config.pin_based_exec_ctrl | > + (vmx_pin_based_exec_ctrl(vmx) | > vmcs12->pin_based_vm_exec_control)); > > if (vmcs12->pin_based_vm_exec_control & > PIN_BASED_VMX_PREEMPTION_TIMER) > @@ -8159,13 +8146,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, > struct vmcs12 *vmcs12) > vmcs12->guest_pending_dbg_exceptions = > vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); > > - if ((vmcs12->pin_based
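The nested_adjust_preemption_timer() helper that Paolo's diff removes (and that Arthur's patch introduces) boils down to a small piece of fixed-point arithmetic. A user-space sketch of that logic, with the VMCS and MSR reads replaced by plain parameters:

```c
/* Deduct from the saved L2 timer value the L1 time that elapsed while
 * we were in L0, converted from TSC cycles to timer ticks: the timer
 * counts down once every 2^scale TSC cycles, where scale is bits 4:0
 * of MSR_IA32_VMX_MISC. The result is clamped at 0 so that an
 * overshoot makes the timer fire immediately on the next entry. */
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F

uint32_t nested_adjust_preemption_timer_model(uint32_t preempt_val_l2,
					      uint64_t delta_tsc_l1,
					      uint64_t vmx_misc)
{
	uint32_t preempt_scale = vmx_misc & MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
	uint64_t preempt_val_l1 = delta_tsc_l1 >> preempt_scale;

	if (preempt_val_l2 <= preempt_val_l1)
		return 0;
	return preempt_val_l2 - (uint32_t)preempt_val_l1;
}
```

Gleb's Reviewed-by below notes the cost of this approach: an extra if() on every vmentry, since the adjustment must run on each re-entry to L2 that is not the initial nested run.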
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
> This patch contains the following two changes:
> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0
> with some reasons not emulated by L1, preemption timer value should
> be save in such exits.
> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls
> to nVMX.
>
> With this patch, nested VMX preemption timer features are fully
> supported.
>
> Signed-off-by: Arthur Chunqi Li
> ---
> ChangeLog to v4:
> Format changes and remove a flag in nested_vmx.
>  arch/x86/include/uapi/asm/msr-index.h |    1 +
>  arch/x86/kvm/vmx.c                    |   44 +++--
>  2 files changed, 43 insertions(+), 2 deletions(-)

Hi all,

the test fails for me if the preemption timer value is set to a value
that is above ~2000 (which means ~65000 TSC cycles on this machine).
The preemption timer seems to count faster than what is expected, for
example only up to 4 million cycles if you set it to one million.
So, I am leaving the patch out of kvm/queue for now, until I can
test it on more processors.

I tried redoing the patch with a completely different logic,
and I get exactly the same behavior and the same wrong values for the
TSC. My patch (below is the diff from Arthur's) is a bit simpler; like
Arthur I run L2 with the save-timer control set, but I do not need any
adjustment upon entry to L2 because I just use the value left by the
processor in the VMCS. Also, in your patch L1 is also running with
the preemption timer, if I understand the code correctly.
--- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6730,27 +6731,6 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *in *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } -static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) -{ - u64 delta_tsc_l1; - u32 preempt_val_l1, preempt_val_l2, preempt_scale; - - if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control & - PIN_BASED_VMX_PREEMPTION_TIMER)) - return; - preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) & - MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; - preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); - delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc()) - - vcpu->arch.last_guest_tsc; - preempt_val_l1 = delta_tsc_l1 >> preempt_scale; - if (preempt_val_l2 <= preempt_val_l1) - preempt_val_l2 = 0; - else - preempt_val_l2 -= preempt_val_l1; - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); -} - /* * The guest has exited. See if we can fix it or if we need userspace * assistance. @@ -7161,8 +7141,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) atomic_switch_perf_msrs(vmx); debugctlmsr = get_debugctlmsr(); - if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending)) - nested_adjust_preemption_timer(vcpu); vmx->__launched = vmx->loaded_vmcs->launched; asm( /* Store host registers */ @@ -7294,6 +7272,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) | (1 << VCPU_EXREG_CR3)); vcpu->arch.regs_dirty = 0; + if (is_guest_mode(vcpu)) { + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); + if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) && + (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + vmcs12->vmx_preemption_timer_value = + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + } + } + vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); vmx->loaded_vmcs->launched = 1; @@ -7633,7 +7633,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs_write64(VMCS_LINK_POINTER, -1ull); 
vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, - (vmcs_config.pin_based_exec_ctrl | + (vmx_pin_based_exec_ctrl(vmx) | vmcs12->pin_based_vm_exec_control)); if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) @@ -8159,13 +8146,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs12->guest_pending_dbg_exceptions = vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); - if ((vmcs12->pin_based_vm_exec_control & - PIN_BASED_VMX_PREEMPTION_TIMER) && - (vmcs12->vm_exit_controls & - VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) - vmcs12->vmx_preemption_timer_value = - vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); - /* * In some cases (usually, nested EPT), L2 is allowed to change its * own CR3 without exiting. If it has changed it, we must
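The "~2000 timer units means ~65000 TSC cycles" figure above can be reproduced from the architectural rate: the preemption timer counts down once every 2^scale TSC cycles, with scale in bits 4:0 of MSR_IA32_VMX_MISC. A sketch of that conversion, assuming the scale of 5 that the thread's measurements imply:

```c
/* Expected TSC-cycle budget for a given preemption timer value.
 * With scale = 5, 2000 timer units are 2000 << 5 = 64000 (~65000) TSC
 * cycles, and one million units should last 32 million cycles -- not
 * the ~4 million Paolo observed, which is the anomaly under debate. */
#include <assert.h>
#include <stdint.h>

uint64_t preemption_timer_to_tsc(uint32_t timer_val, unsigned int scale)
{
	return (uint64_t)timer_val << scale;
}
```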
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto: > This patch contains the following two changes: > 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 > with some reasons not emulated by L1, preemption timer value should > be save in such exits. > 2. Add support of "Save VMX-preemption timer value" VM-Exit controls > to nVMX. > > With this patch, nested VMX preemption timer features are fully > supported. > > Signed-off-by: Arthur Chunqi Li > --- > ChangeLog to v4: > Format changes and remove a flag in nested_vmx. > arch/x86/include/uapi/asm/msr-index.h |1 + > arch/x86/kvm/vmx.c| 44 > +++-- > 2 files changed, 43 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/uapi/asm/msr-index.h > b/arch/x86/include/uapi/asm/msr-index.h > index bb04650..b93e09a 100644 > --- a/arch/x86/include/uapi/asm/msr-index.h > +++ b/arch/x86/include/uapi/asm/msr-index.h > @@ -536,6 +536,7 @@ > > /* MSR_IA32_VMX_MISC bits */ > #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29) > +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F > /* AMD-V MSRs */ > > #define MSR_VM_CR 0xc0010114 > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 1f1da43..e1fa13a 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void) > #ifdef CONFIG_X86_64 > VM_EXIT_HOST_ADDR_SPACE_SIZE | > #endif > - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; > + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | > + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; > + if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) || > + !(nested_vmx_exit_ctls_high & > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { > + nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; > + nested_vmx_pinbased_ctls_high &= > ~PIN_BASED_VMX_PREEMPTION_TIMER; > + } > nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | > VM_EXIT_LOAD_IA32_EFER); > > @@ -6707,6 +6713,27 @@ static void 
vmx_get_exit_info(struct kvm_vcpu *vcpu, > u64 *info1, u64 *info2) > *info2 = vmcs_read32(VM_EXIT_INTR_INFO); > } > > +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) > +{ > + u64 delta_tsc_l1; > + u32 preempt_val_l1, preempt_val_l2, preempt_scale; > + > + if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control & > + PIN_BASED_VMX_PREEMPTION_TIMER)) > + return; > + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) & > + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; > + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); > + delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc()) > + - vcpu->arch.last_guest_tsc; > + preempt_val_l1 = delta_tsc_l1 >> preempt_scale; > + if (preempt_val_l2 <= preempt_val_l1) > + preempt_val_l2 = 0; > + else > + preempt_val_l2 -= preempt_val_l1; > + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); > +} > + > /* > * The guest has exited. See if we can fix it or if we need userspace > * assistance. > @@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > *vcpu) > atomic_switch_perf_msrs(vmx); > debugctlmsr = get_debugctlmsr(); > > + if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending)) > + nested_adjust_preemption_timer(vcpu); > vmx->__launched = vmx->loaded_vmcs->launched; > asm( > /* Store host registers */ > @@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, > struct vmcs12 *vmcs12) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > u32 exec_control; > + u32 exit_control; > > vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector); > vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector); > @@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, > struct vmcs12 *vmcs12) >* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER >* bits are further modified by vmx_set_efer() below. 
>*/ > - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); > + exit_control = vmcs_config.vmexit_ctrl; > + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) > + exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; > + vmcs_write32(VM_EXIT_CONTROLS, exit_control); > > /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are >* emulated by vmx_set_efer(), below. > @@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, > struct vmcs12 *vmcs12) > vmcs12->guest_pending_dbg_exceptions = > vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); > > + if ((vmcs12->pin_based_v
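The prepare_vmcs02() hunk quoted above derives the vmcs02 exit controls rather than copying the host's verbatim. A user-space sketch of that derivation (bit positions per the Intel SDM, not the kernel's headers):

```c
/* Start from the host's (L0's) exit controls and force the save-timer
 * bit whenever L1 runs L2 with the preemption timer active, so the
 * remaining count survives L2->L0 exits that L1 never sees. */
#include <assert.h>
#include <stdint.h>

#define PIN_BASED_VMX_PREEMPTION_TIMER    (1u << 6)   /* SDM: pin-based bit 6 */
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER (1u << 22)  /* SDM: exit-ctl bit 22 */

uint32_t build_vmcs02_exit_controls(uint32_t host_exit_ctrl,
				    uint32_t l1_pin_based_ctrl)
{
	uint32_t exit_control = host_exit_ctrl;

	if (l1_pin_based_ctrl & PIN_BASED_VMX_PREEMPTION_TIMER)
		exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
	return exit_control;
}
```

Note the bit is set based on L1's pin-based controls alone, regardless of whether L1 itself requested the save-timer exit control; saving is always needed internally for the L2->L0->L2 case.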
Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
On Mon, Sep 16, 2013 at 04:11:44PM +0800, Arthur Chunqi Li wrote: > This patch contains the following two changes: > 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 > with some reasons not emulated by L1, preemption timer value should > be save in such exits. > 2. Add support of "Save VMX-preemption timer value" VM-Exit controls > to nVMX. > > With this patch, nested VMX preemption timer features are fully > supported. > > Signed-off-by: Arthur Chunqi Li Reviewed-by: Gleb Natapov One more if() on vmentry path is unfortunate, but I do not see a way around it yet. > --- > ChangeLog to v4: > Format changes and remove a flag in nested_vmx. > arch/x86/include/uapi/asm/msr-index.h |1 + > arch/x86/kvm/vmx.c| 44 > +++-- > 2 files changed, 43 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/include/uapi/asm/msr-index.h > b/arch/x86/include/uapi/asm/msr-index.h > index bb04650..b93e09a 100644 > --- a/arch/x86/include/uapi/asm/msr-index.h > +++ b/arch/x86/include/uapi/asm/msr-index.h > @@ -536,6 +536,7 @@ > > /* MSR_IA32_VMX_MISC bits */ > #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29) > +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F > /* AMD-V MSRs */ > > #define MSR_VM_CR 0xc0010114 > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 1f1da43..e1fa13a 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void) > #ifdef CONFIG_X86_64 > VM_EXIT_HOST_ADDR_SPACE_SIZE | > #endif > - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; > + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | > + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; > + if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) || > + !(nested_vmx_exit_ctls_high & > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { > + nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; > + nested_vmx_pinbased_ctls_high &= > ~PIN_BASED_VMX_PREEMPTION_TIMER; > + } > nested_vmx_exit_ctls_high |= 
(VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | > VM_EXIT_LOAD_IA32_EFER); > > @@ -6707,6 +6713,27 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, > u64 *info1, u64 *info2) > *info2 = vmcs_read32(VM_EXIT_INTR_INFO); > } > > +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) > +{ > + u64 delta_tsc_l1; > + u32 preempt_val_l1, preempt_val_l2, preempt_scale; > + > + if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control & > + PIN_BASED_VMX_PREEMPTION_TIMER)) > + return; > + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) & > + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; > + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); > + delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc()) > + - vcpu->arch.last_guest_tsc; > + preempt_val_l1 = delta_tsc_l1 >> preempt_scale; > + if (preempt_val_l2 <= preempt_val_l1) > + preempt_val_l2 = 0; > + else > + preempt_val_l2 -= preempt_val_l1; > + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); > +} > + > /* > * The guest has exited. See if we can fix it or if we need userspace > * assistance. > @@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu > *vcpu) > atomic_switch_perf_msrs(vmx); > debugctlmsr = get_debugctlmsr(); > > + if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending)) > + nested_adjust_preemption_timer(vcpu); > vmx->__launched = vmx->loaded_vmcs->launched; > asm( > /* Store host registers */ > @@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, > struct vmcs12 *vmcs12) > { > struct vcpu_vmx *vmx = to_vmx(vcpu); > u32 exec_control; > + u32 exit_control; > > vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector); > vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector); > @@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, > struct vmcs12 *vmcs12) >* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER >* bits are further modified by vmx_set_efer() below. 
>*/ > - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); > + exit_control = vmcs_config.vmexit_ctrl; > + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) > + exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; > + vmcs_write32(VM_EXIT_CONTROLS, exit_control); > > /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are >* emulated by vmx_set_efer(), below. > @@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, > struct vmcs12 *vmcs12) > vmcs
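The nested_vmx_setup_ctls_msrs() hunk in the patch couples the two advertised capabilities: if the host lacks either the preemption timer or the "save value" exit control, both are hidden from L1. A minimal model of that filtering (bit positions per the Intel SDM):

```c
/* Advertise the nested preemption timer only together with the
 * save-value exit control: a timer whose count cannot be preserved
 * across L2->L0 exits would silently restart from its initial value,
 * which is exactly the bug this patch fixes. */
#include <assert.h>
#include <stdint.h>

#define PIN_BASED_VMX_PREEMPTION_TIMER    (1u << 6)   /* SDM: pin-based bit 6 */
#define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER (1u << 22)  /* SDM: exit-ctl bit 22 */

void filter_nested_timer_ctls(uint32_t *pinbased_ctls_high,
			      uint32_t *exit_ctls_high)
{
	if (!(*pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
	    !(*exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
		*exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
		*pinbased_ctls_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
	}
}
```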
[PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer
This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 with some reasons not emulated by L1, preemption timer value should be save in such exits. 2. Add support of "Save VMX-preemption timer value" VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li --- ChangeLog to v4: Format changes and remove a flag in nested_vmx. arch/x86/include/uapi/asm/msr-index.h |1 + arch/x86/kvm/vmx.c| 44 +++-- 2 files changed, 43 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h index bb04650..b93e09a 100644 --- a/arch/x86/include/uapi/asm/msr-index.h +++ b/arch/x86/include/uapi/asm/msr-index.h @@ -536,6 +536,7 @@ /* MSR_IA32_VMX_MISC bits */ #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29) +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F /* AMD-V MSRs */ #define MSR_VM_CR 0xc0010114 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1f1da43..e1fa13a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) || + !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { + nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + nested_vmx_pinbased_ctls_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER; + } nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6707,6 +6713,27 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) +{ + 
u64 delta_tsc_l1; + u32 preempt_val_l1, preempt_val_l2, preempt_scale; + + if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control & + PIN_BASED_VMX_PREEMPTION_TIMER)) + return; + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) & + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc()) + - vcpu->arch.last_guest_tsc; + preempt_val_l1 = delta_tsc_l1 >> preempt_scale; + if (preempt_val_l2 <= preempt_val_l1) + preempt_val_l2 = 0; + else + preempt_val_l2 -= preempt_val_l1; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2); +} + /* * The guest has exited. See if we can fix it or if we need userspace * assistance. @@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) atomic_switch_perf_msrs(vmx); debugctlmsr = get_debugctlmsr(); + if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending)) + nested_adjust_preemption_timer(vcpu); vmx->__launched = vmx->loaded_vmcs->launched; asm( /* Store host registers */ @@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 exec_control; + u32 exit_control; vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector); vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector); @@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below. */ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + exit_control = vmcs_config.vmexit_ctrl; + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) + exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + vmcs_write32(VM_EXIT_CONTROLS, exit_control); /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. 
@@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs12->guest_pending_dbg_exceptions = vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); + if ((vmcs12->pin_based_vm_exec_control & + PIN_BASED_VMX_PREEMPTION_TIMER) && + (vmcs12->vm_exit_controls & + VM_EXIT_SAVE_VMX_PREEMPTION_TIME
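The patch's one addition to msr-index.h is the scale mask for MSR_IA32_VMX_MISC. How it is used can be shown with a small decoding sketch: bits 4:0 of that MSR give X such that the preemption timer counts down once every 2^X TSC cycles (per the Intel SDM; the helper name here is illustrative).

```c
/* Decode the preemption-timer rate exponent from MSR_IA32_VMX_MISC
 * using the mask the patch adds. A raw MSR value stands in for
 * native_read_msr(MSR_IA32_VMX_MISC). */
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F

unsigned int preemption_timer_rate(uint64_t vmx_misc)
{
	return (unsigned int)(vmx_misc & MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE);
}
```

A rate of 5 (one tick per 32 TSC cycles) is what the measurements in this thread assume; the disputed behaviour is hardware apparently ticking faster than this decoded value suggests.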