Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-10-25 Thread Jan Kiszka
On 2013-10-25 10:56, Paolo Bonzini wrote:
> Il 10/10/2013 17:20, Paolo Bonzini ha scritto:
 Arthur, find some fix-ups for your test case below. It avoids printing
 from within L2 as this could deadlock when the timer fires and L1 then
 tries to print something. Also, it disables the preemption timer on
 leave so that it cannot fire later on again. If you want to fold this
 into your patch, feel free. Otherwise I can post a separate patch on
 top.
>> Is that a Signed-off-by? :)
> 
> Ping?

Sent as separate patch two days ago.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-10-25 Thread Paolo Bonzini
Il 10/10/2013 17:20, Paolo Bonzini ha scritto:
>> > Arthur, find some fix-ups for your test case below. It avoids printing
>> > from within L2 as this could deadlock when the timer fires and L1 then
>> > tries to print something. Also, it disables the preemption timer on
>> > leave so that it cannot fire later on again. If you want to fold this
>> > into your patch, feel free. Otherwise I can post a separate patch on
>> > top.
> Is that a Signed-off-by? :)

Ping?

Paolo


Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-10-11 Thread Arthur Chunqi Li
Hi Jan,

On Fri, Oct 11, 2013 at 12:12 AM, Jan Kiszka  wrote:
> On 2013-10-02 20:47, Jan Kiszka wrote:
>> On 2013-09-30 11:08, Jan Kiszka wrote:
>>> On 2013-09-26 17:04, Paolo Bonzini wrote:
 Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
> This patch contains the following two changes:
> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
> occurs for reasons not emulated by L1, the preemption timer value
> should be saved on such exits.
> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
> control to nVMX.
>
> With this patch, nested VMX preemption timer features are fully
> supported.
>
> Signed-off-by: Arthur Chunqi Li 
> ---
> ChangeLog to v4:
>Format changes and remove a flag in nested_vmx.
>  arch/x86/include/uapi/asm/msr-index.h |1 +
>  arch/x86/kvm/vmx.c|   44 
> +++--
>  2 files changed, 43 insertions(+), 2 deletions(-)

 Hi all,

 the test fails for me if the preemption timer value is set to a value
 that is above ~2000 (which means ~65000 TSC cycles on this machine).
 The preemption timer seems to count faster than what is expected, for
 example only up to 4 million cycles if you set it to one million.
 So, I am leaving the patch out of kvm/queue for now, until I can
 test it on more processors.
>>>
>>> I've done some measurements with the help of ftrace on the time it takes
>>> to let the preemption timer trigger (no adjustments via Arthur's patch
>>> were involved): On my Core i7-620M, the preemption timer seems to tick
>>> almost 10 times faster than the spec and scale value (5) suggest. I've
>>> loaded a value of 100000, and it took about 130 µs until I got a vmexit
>>> with reason PREEMPTION_TIMER (no other exits in between).
>>>
>>>  qemu-system-x86-13765 [003] 298562.966079: bprint: prepare_vmcs02: preempt val 100000
>>>  qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0
>>>  qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason PREEMPTION_TIMER rip 0x401fea info 0 0
>>>
>>> That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz *
>>> 2^5 is 24.6 GHz. I've read the spec several times, but it seems pretty
>>> clear on this. It just doesn't match reality. Very strange.
>>
>> ...but documented: I found a related erratum for my processor (AAT59)
>> and also for Xeon 5500 (AAK139). At least the current Haswell generation
>> is not affected. I can test the patch on a Haswell board I have at work
>> later this week.
>
> To complete this story: Arthur's patch works fine on a non-broken CPU
> (here: i7-4770S).
>
> Arthur, find some fix-ups for your test case below. It avoids printing
> from within L2 as this could deadlock when the timer fires and L1 then
> tries to print something. Also, it disables the preemption timer on
> leave so that it cannot fire later on again. If you want to fold this
> into your patch, feel free. Otherwise I can post a separate patch on
> top.
I think this can be treated as a separate patch to our test suite. You
can post it on top.
I have tested it and it works fine.

Arthur
>
> Jan
>
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index 4372878..66a4201 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -141,6 +141,9 @@ void preemption_timer_init()
> preempt_val = 1000;
> vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
> preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
> +
> +   if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> +   printf("\tSave preemption value is not supported\n");
>  }
>
>  void preemption_timer_main()
> @@ -150,9 +153,7 @@ void preemption_timer_main()
> printf("\tPreemption timer is not supported\n");
> return;
> }
> -   if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
> -   printf("\tSave preemption value is not supported\n");
> -   else {
> +   if (ctrl_exit_rev.clr & EXI_SAVE_PREEMPT) {
> set_stage(0);
> vmcall();
> if (get_stage() == 1)
> @@ -161,8 +162,8 @@ void preemption_timer_main()
> while (1) {
> if (((rdtsc() - tsc_val) >> preempt_scale)
> > 10 * preempt_val) {
> -   report("Preemption timer", 0);
> -   break;
> +   set_stage(2);
> +   vmcall();
> }
> }
>  }
> @@ -183,7 +184,7 @@ int preemption_timer_exit_handler()
> report("Preemption timer", 0);
> else
> report("Preemption timer", 1);
> -   return VMX_TEST_VMEXIT;
> +   break;
> case VMX_VMCALL:
> switch (get_stage()) {
> case 0:
> @@ -195,24 +196,29 @@ int preemption_timer_exit_handler()

Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-10-10 Thread Paolo Bonzini
Il 10/10/2013 18:12, Jan Kiszka ha scritto:
> On 2013-10-02 20:47, Jan Kiszka wrote:
>> On 2013-09-30 11:08, Jan Kiszka wrote:
>>> On 2013-09-26 17:04, Paolo Bonzini wrote:
 Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
> This patch contains the following two changes:
> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
> occurs for reasons not emulated by L1, the preemption timer value
> should be saved on such exits.
> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
> control to nVMX.
>
> With this patch, nested VMX preemption timer features are fully
> supported.
>
> Signed-off-by: Arthur Chunqi Li 
> ---
> ChangeLog to v4:
>   Format changes and remove a flag in nested_vmx.
>  arch/x86/include/uapi/asm/msr-index.h |1 +
>  arch/x86/kvm/vmx.c|   44 
> +++--
>  2 files changed, 43 insertions(+), 2 deletions(-)

 Hi all,

 the test fails for me if the preemption timer value is set to a value
 that is above ~2000 (which means ~65000 TSC cycles on this machine).
 The preemption timer seems to count faster than what is expected, for
 example only up to 4 million cycles if you set it to one million.
 So, I am leaving the patch out of kvm/queue for now, until I can
 test it on more processors.
>>>
>>> I've done some measurements with the help of ftrace on the time it takes
>>> to let the preemption timer trigger (no adjustments via Arthur's patch
>>> were involved): On my Core i7-620M, the preemption timer seems to tick
>>> almost 10 times faster than the spec and scale value (5) suggest. I've
>>> loaded a value of 100000, and it took about 130 µs until I got a vmexit
>>> with reason PREEMPTION_TIMER (no other exits in between).
>>>
>>>  qemu-system-x86-13765 [003] 298562.966079: bprint: prepare_vmcs02: preempt val 100000
>>>  qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0
>>>  qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason PREEMPTION_TIMER rip 0x401fea info 0 0
>>>
>>> That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz *
>>> 2^5 is 24.6 GHz. I've read the spec several times, but it seems pretty
>>> clear on this. It just doesn't match reality. Very strange.
>>
>> ...but documented: I found a related erratum for my processor (AAT59)
>> and also for Xeon 5500 (AAK139). At least the current Haswell generation
>> is not affected. I can test the patch on a Haswell board I have at work
>> later this week.
> 
> To complete this story: Arthur's patch works fine on a non-broken CPU
> (here: i7-4770S).
> 
> Arthur, find some fix-ups for your test case below. It avoids printing
> from within L2 as this could deadlock when the timer fires and L1 then
> tries to print something. Also, it disables the preemption timer on
> leave so that it cannot fire later on again. If you want to fold this
> into your patch, feel free. Otherwise I can post a separate patch on
> top.

Is that a Signed-off-by? :)

BTW, VirtualBox has a test for this erratum.  It would be nice to skip the test
when the processor is found to be buggy.

I'll put Arthur's patch back.  Thanks for testing!

Paolo

static bool hmR0InitIntelIsSubjectToVmxPreemptionTimerErratum(void)
{
    uint32_t u = ASMCpuId_EAX(1);
    u &= ~(RT_BIT_32(14) | RT_BIT_32(15) | RT_BIT_32(28) | RT_BIT_32(29) | RT_BIT_32(30) | RT_BIT_32(31));
    if (   u == UINT32_C(0x000206E6) /* 323344.pdf - BA86   - D0 - Intel Xeon Processor 7500 Series */
        || u == UINT32_C(0x00020652) /* 323056.pdf - AAX65  - C2 - Intel Xeon Processor L3406 */
        || u == UINT32_C(0x00020652) /* 322814.pdf - AAT59  - C2 - Intel CoreTM i7-600, i5-500, i5-400 and i3-300 Mobile Processor Series */
        || u == UINT32_C(0x00020652) /* 322911.pdf - AAU65  - C2 - Intel CoreTM i5-600, i3-500 Desktop Processor Series and Intel Pentium Processor G6950 */
        || u == UINT32_C(0x00020655) /* 322911.pdf - AAU65  - K0 - Intel CoreTM i5-600, i3-500 Desktop Processor Series and Intel Pentium Processor G6950 */
        || u == UINT32_C(0x000106E5) /* 322373.pdf - AAO95  - B1 - Intel Xeon Processor 3400 Series */
        || u == UINT32_C(0x000106E5) /* 322166.pdf - AAN92  - B1 - Intel CoreTM i7-800 and i5-700 Desktop Processor Series */
        || u == UINT32_C(0x000106E5) /* 320767.pdf - AAP86  - B1 - Intel Core i7-900 Mobile Processor Extreme Edition Series, Intel Core i7-800 and i7-700 Mobile Processor Series */
        || u == UINT32_C(0x000106A0) /* 321333.pdf - AAM126 - C0 - Intel Xeon Processor 3500 Series Specification */
        || u == UINT32_C(0x000106A1) /* 321333.pdf - AAM126 - C1 - Intel Xeon Processor 3500 Series Specification */
        || u == UINT32_C(0x000106A4) /* 320836.pdf - AAJ124 - C0 - Intel Core i7-900 Desktop Processor Extreme Edition Series and Intel Core i7-900 Desktop Processor Series */
        ...
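
Outside VirtualBox the same check could be written roughly as follows (a
sketch, not the test suite's actual code, assuming GCC's <cpuid.h> helper;
the stepping list is abbreviated to the entries visible above):

#include <cpuid.h>
#include <stdbool.h>

/* Mask the reserved bits 15:14 and 31:28 of CPUID.1:EAX (the same mask
 * the RT_BIT_32() chain builds) and compare the remaining
 * family/model/stepping signature against the affected steppings. */
static bool vmx_preempt_timer_erratum(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    eax &= ~0xF000C000u;
    return eax == 0x000206E6u    /* Xeon 7500 (BA86) */
        || eax == 0x00020652u    /* Xeon L3406, Core i7-6xx mobile, ... */
        || eax == 0x00020655u
        || eax == 0x000106E5u    /* Xeon 3400, Core i7-8xx/i5-7xx, ... */
        || eax == 0x000106A0u
        || eax == 0x000106A1u
        || eax == 0x000106A4u;   /* list truncated in the archive */
}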

Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-10-10 Thread Jan Kiszka
On 2013-10-02 20:47, Jan Kiszka wrote:
> On 2013-09-30 11:08, Jan Kiszka wrote:
>> On 2013-09-26 17:04, Paolo Bonzini wrote:
>>> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
 This patch contains the following two changes:
 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
 occurs for reasons not emulated by L1, the preemption timer value
 should be saved on such exits.
 2. Add support for the "Save VMX-preemption timer value" VM-Exit
 control to nVMX.

 With this patch, nested VMX preemption timer features are fully
 supported.

 Signed-off-by: Arthur Chunqi Li 
 ---
 ChangeLog to v4:
Format changes and remove a flag in nested_vmx.
  arch/x86/include/uapi/asm/msr-index.h |1 +
  arch/x86/kvm/vmx.c|   44 
 +++--
  2 files changed, 43 insertions(+), 2 deletions(-)
>>>
>>> Hi all,
>>>
>>> the test fails for me if the preemption timer value is set to a value
>>> that is above ~2000 (which means ~65000 TSC cycles on this machine).
>>> The preemption timer seems to count faster than what is expected, for
>>> example only up to 4 million cycles if you set it to one million.
>>> So, I am leaving the patch out of kvm/queue for now, until I can
>>> test it on more processors.
>>
>> I've done some measurements with the help of ftrace on the time it takes
>> to let the preemption timer trigger (no adjustments via Arthur's patch
>> were involved): On my Core i7-620M, the preemption timer seems to tick
>> almost 10 times faster than the spec and scale value (5) suggest. I've
>> loaded a value of 100000, and it took about 130 µs until I got a vmexit
>> with reason PREEMPTION_TIMER (no other exits in between).
>>
>>  qemu-system-x86-13765 [003] 298562.966079: bprint: prepare_vmcs02: preempt val 100000
>>  qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0
>>  qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason PREEMPTION_TIMER rip 0x401fea info 0 0
>>
>> That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz *
>> 2^5 is 24.6 GHz. I've read the spec several times, but it seems pretty
>> clear on this. It just doesn't match reality. Very strange.
> 
> ...but documented: I found a related erratum for my processor (AAT59)
> and also for Xeon 5500 (AAK139). At least the current Haswell generation
> is not affected. I can test the patch on a Haswell board I have at work
> later this week.

To complete this story: Arthur's patch works fine on a non-broken CPU
(here: i7-4770S).

Arthur, find some fix-ups for your test case below. It avoids printing
from within L2 as this could deadlock when the timer fires and L1 then
tries to print something. Also, it disables the preemption timer on
leave so that it cannot fire later on again. If you want to fold this
into your patch, feel free. Otherwise I can post a separate patch on
top.

Jan

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 4372878..66a4201 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -141,6 +141,9 @@ void preemption_timer_init()
preempt_val = 1000;
vmcs_write(PREEMPT_TIMER_VALUE, preempt_val);
preempt_scale = rdmsr(MSR_IA32_VMX_MISC) & 0x1F;
+
+   if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
+   printf("\tSave preemption value is not supported\n");
 }
 
 void preemption_timer_main()
@@ -150,9 +153,7 @@ void preemption_timer_main()
printf("\tPreemption timer is not supported\n");
return;
}
-   if (!(ctrl_exit_rev.clr & EXI_SAVE_PREEMPT))
-   printf("\tSave preemption value is not supported\n");
-   else {
+   if (ctrl_exit_rev.clr & EXI_SAVE_PREEMPT) {
set_stage(0);
vmcall();
if (get_stage() == 1)
@@ -161,8 +162,8 @@ void preemption_timer_main()
while (1) {
if (((rdtsc() - tsc_val) >> preempt_scale)
> 10 * preempt_val) {
-   report("Preemption timer", 0);
-   break;
+   set_stage(2);
+   vmcall();
}
}
 }
@@ -183,7 +184,7 @@ int preemption_timer_exit_handler()
report("Preemption timer", 0);
else
report("Preemption timer", 1);
-   return VMX_TEST_VMEXIT;
+   break;
case VMX_VMCALL:
switch (get_stage()) {
case 0:
@@ -195,24 +196,29 @@ int preemption_timer_exit_handler()
EXI_SAVE_PREEMPT) & ctrl_exit_rev.clr;
vmcs_write(EXI_CONTROLS, ctrl_exit);
}
-   break;
+   vmcs_write(GUEST_RIP, guest_rip + insn_len);
+   return VMX_TEST_RESUME;
case 1:
   

Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-10-03 Thread Paolo Bonzini

Il 30/09/2013 11:08, Jan Kiszka ha scritto:
> On 2013-09-26 17:04, Paolo Bonzini wrote:
>> Hi all,
>> 
>> the test fails for me if the preemption timer value is set to a 
>> value that is above ~2000 (which means ~65000 TSC cycles on this 
>> machine). The preemption timer seems to count faster than what
>> is expected, for example only up to 4 million cycles if you set
>> it to one million. So, I am leaving the patch out of kvm/queue
>> for now, until I can test it on more processors.
> 
> I've done some measurements with the help of ftrace on the time it 
> takes to let the preemption timer trigger (no adjustments via 
> Arthur's patch were involved): On my Core i7-620M, the preemption 
> timer seems to tick almost 10 times faster than the spec and scale
> value (5) suggest. I've loaded a value of 100000, and it took
> about 130 µs until I got a vmexit with reason PREEMPTION_TIMER (no
> other exits in between).

10x is similar to what I was observing.

Paolo


Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-10-02 Thread Jan Kiszka
On 2013-09-30 11:08, Jan Kiszka wrote:
> On 2013-09-26 17:04, Paolo Bonzini wrote:
>> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
>>> This patch contains the following two changes:
>>> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
>>> occurs for reasons not emulated by L1, the preemption timer value
>>> should be saved on such exits.
>>> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
>>> control to nVMX.
>>>
>>> With this patch, nested VMX preemption timer features are fully
>>> supported.
>>>
>>> Signed-off-by: Arthur Chunqi Li 
>>> ---
>>> ChangeLog to v4:
>>> Format changes and remove a flag in nested_vmx.
>>>  arch/x86/include/uapi/asm/msr-index.h |1 +
>>>  arch/x86/kvm/vmx.c|   44 
>>> +++--
>>>  2 files changed, 43 insertions(+), 2 deletions(-)
>>
>> Hi all,
>>
>> the test fails for me if the preemption timer value is set to a value
>> that is above ~2000 (which means ~65000 TSC cycles on this machine).
>> The preemption timer seems to count faster than what is expected, for
>> example only up to 4 million cycles if you set it to one million.
>> So, I am leaving the patch out of kvm/queue for now, until I can
>> test it on more processors.
> 
> I've done some measurements with the help of ftrace on the time it takes
> to let the preemption timer trigger (no adjustments via Arthur's patch
> were involved): On my Core i7-620M, the preemption timer seems to tick
> almost 10 times faster than the spec and scale value (5) suggest. I've
> loaded a value of 100000, and it took about 130 µs until I got a vmexit
> with reason PREEMPTION_TIMER (no other exits in between).
> 
>  qemu-system-x86-13765 [003] 298562.966079: bprint: prepare_vmcs02: preempt val 100000
>  qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0
>  qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason PREEMPTION_TIMER rip 0x401fea info 0 0
> 
> That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz *
> 2^5 is 24.6 GHz. I've read the spec several times, but it seems pretty
> clear on this. It just doesn't match reality. Very strange.

...but documented: I found a related erratum for my processor (AAT59)
and also for Xeon 5500 (AAK139). At least the current Haswell generation
is not affected. I can test the patch on a Haswell board I have at work
later this week.

Jan






Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-30 Thread Jan Kiszka
On 2013-09-26 17:04, Paolo Bonzini wrote:
> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
>> This patch contains the following two changes:
>> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
>> occurs for reasons not emulated by L1, the preemption timer value
>> should be saved on such exits.
>> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
>> control to nVMX.
>>
>> With this patch, nested VMX preemption timer features are fully
>> supported.
>>
>> Signed-off-by: Arthur Chunqi Li 
>> ---
>> ChangeLog to v4:
>>  Format changes and remove a flag in nested_vmx.
>>  arch/x86/include/uapi/asm/msr-index.h |1 +
>>  arch/x86/kvm/vmx.c|   44 
>> +++--
>>  2 files changed, 43 insertions(+), 2 deletions(-)
> 
> Hi all,
> 
> the test fails for me if the preemption timer value is set to a value
> that is above ~2000 (which means ~65000 TSC cycles on this machine).
> The preemption timer seems to count faster than what is expected, for
> example only up to 4 million cycles if you set it to one million.
> So, I am leaving the patch out of kvm/queue for now, until I can
> test it on more processors.

I've done some measurements with the help of ftrace on the time it takes
to let the preemption timer trigger (no adjustments via Arthur's patch
were involved): On my Core i7-620M, the preemption timer seems to tick
almost 10 times faster than the spec and scale value (5) suggest. I've
loaded a value of 100000, and it took about 130 µs until I got a vmexit
with reason PREEMPTION_TIMER (no other exits in between).

 qemu-system-x86-13765 [003] 298562.966079: bprint: prepare_vmcs02: preempt val 100000
 qemu-system-x86-13765 [003] 298562.966083: kvm_entry: vcpu 0
 qemu-system-x86-13765 [003] 298562.966212: kvm_exit: reason PREEMPTION_TIMER rip 0x401fea info 0 0

That's a frequency of ~769 MHz. The TSC ticks at 2.66 GHz. But 769 MHz *
2^5 is 24.6 GHz. I've read the spec several times, but it seems pretty
clear on this. It just doesn't match reality. Very strange.
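
To spell out the arithmetic (a quick sketch using the numbers above; per
the SDM the timer should count down at the TSC rate divided by 2^scale):

#include <stdio.h>

/* Back-of-the-envelope check of the measurement above; the constants are
 * the ones from this message (2.66 GHz TSC, scale 5, loaded value 100000,
 * observed 130 us). */
int main(void)
{
    double tsc_hz     = 2.66e9;   /* TSC frequency          */
    int    scale      = 5;        /* MSR_IA32_VMX_MISC[4:0] */
    double loaded     = 100000;   /* programmed timer value */
    double observed_s = 130e-6;   /* measured via ftrace    */

    double expected_s = loaded / (tsc_hz / (1 << scale));
    printf("expected %.0f us, observed %.0f us -> %.1fx too fast\n",
           expected_s * 1e6, observed_s * 1e6, expected_s / observed_s);
    /* prints: expected 1203 us, observed 130 us -> 9.3x too fast */
    return 0;
}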

Jan






Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-29 Thread Jan Kiszka
On 2013-09-27 08:37, Jan Kiszka wrote:
> On 2013-09-26 22:44, Paolo Bonzini wrote:
>> Il 26/09/2013 19:47, Paolo Bonzini ha scritto:
>>>
>>> If I only apply this hunk, which disables the preemption timer while
>>> in L1:
>>>
>>> @@ -8396,6 +8375,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
>>>
>>> load_vmcs12_host_state(vcpu, vmcs12);
>>>
>>> +   vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, 
>>> vmx_pin_based_exec_ctrl(vmx));
>>> +
>>> /* Update TSC_OFFSET if TSC was changed while L2 ran */
>>> vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
>>>
>>> then the testcase works for somewhat larger values of the preemption timer
>>> (up to ~150 TSC cycles), but then fails.
> 
> Err, does this mean we run L1 with PIN_BASED_VM_EXEC_CONTROL of L2?
> Ouch. Should be fixed independently.

No, it doesn't mean this. L1 and L2 run on different VMCSs, thus should
be able to set their PIN_BASED_VM_EXEC_CONTROL independently. I have no
clue ATM why that hunk makes a difference for you. Will have a
closer look.

BTW, aren't many VMCS fields written redundantly in prepare_vmcs02? I
mean those that weren't changed by L1 in the shadow VMCS and don't require
updates for other reasons. There seems to be some room for saving a few
cycles.

Jan






Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-29 Thread Gleb Natapov
On Thu, Sep 26, 2013 at 07:47:33PM +0200, Paolo Bonzini wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Il 26/09/2013 19:19, Jan Kiszka ha scritto:
> > On 2013-09-26 17:04, Paolo Bonzini wrote:
> >> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
> >>> This patch contains the following two changes:
> >>> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
> >>> occurs for reasons not emulated by L1, the preemption timer value
> >>> should be saved on such exits.
> >>> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
> >>> control to nVMX.
> >>>
> >>> With this patch, nested VMX preemption timer features are fully
> >>> supported.
> >>>
> >>> Signed-off-by: Arthur Chunqi Li 
> >>> ---
> >>> ChangeLog to v4:
> >>>   Format changes and remove a flag in nested_vmx.
> >>>  arch/x86/include/uapi/asm/msr-index.h |1 +
> >>>  arch/x86/kvm/vmx.c|   44 
> >>> +++--
> >>>  2 files changed, 43 insertions(+), 2 deletions(-)
> >>
> >> Hi all,
> >>
> >> the test fails for me if the preemption timer value is set to a value
> >> that is above ~2000 (which means ~65000 TSC cycles on this machine).
> >> The preemption timer seems to count faster than what is expected, for
> >> example only up to 4 million cycles if you set it to one million.
> >> So, I am leaving the patch out of kvm/queue for now, until I can
> >> test it on more processors.
> > 
> > So this behaviour is a regression of this patch (and your own version as
> > well)?
> 
> Without this patch Arthur's preemption timer test doesn't work at all.
> 
Have you ruled out test bugs?

I see things like:
+   int i, j;
...

+   // Consume enough time to let L2->L0->L2 occur
+   for (i = 0; i < 10; i++)
+   for (j = 0; j < 1; j++);

which can be optimized out. Or use of rdtsc() which can be problematic
if host tsc is not synchronised.
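
For instance, a version of that delay loop that cannot be optimized out
would look roughly like this (a sketch; the loop bounds are placeholders,
since the original constants were truncated in the archive):

/* The volatile counters plus the empty asm statement act as compiler
 * barriers, so the loops cannot be proven dead and deleted; the loop
 * really burns time and lets the L2->L0->L2 round trip happen. */
static void consume_time(void)
{
    volatile int i, j;

    for (i = 0; i < 10; i++)
        for (j = 0; j < 10000; j++)
            asm volatile("" ::: "memory");
}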


--
Gleb.


Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-26 Thread Jan Kiszka
On 2013-09-26 22:44, Paolo Bonzini wrote:
> Il 26/09/2013 19:47, Paolo Bonzini ha scritto:
>>
>> If I only apply this hunk, which disables the preemption timer while
>> in L1:
>>
>> @@ -8396,6 +8375,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
>>
>> load_vmcs12_host_state(vcpu, vmcs12);
>>
>> +   vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, 
>> vmx_pin_based_exec_ctrl(vmx));
>> +
>> /* Update TSC_OFFSET if TSC was changed while L2 ran */
>> vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
>>
>> then the testcase works for somewhat larger values of the preemption timer
>> (up to ~150 TSC cycles), but then fails.

Err, does this mean we run L1 with PIN_BASED_VM_EXEC_CONTROL of L2?
Ouch. Should be fixed independently.

> 
> I mean if I apply it on top of current kvm/next, without Arthur's patch.
> 
> If I apply the hunk on top of Arthur's patch nothing changes and the
> timer testcase starts breaking around ~65000 TSC cycles.

So there are several overlapping issues.

> 
> It is a bit problematic that adding printks changes something, so that
> the test starts passing.  I haven't tried tracepoints yet.

Yeah, just convert to trace_printk.

> 
> Jan, which L1 is using the preemption timer?  Any reason why you added
> it?

The one I'm going to present in Edinburgh. Will publish code soon, but
it's unspectacular regarding preemption timer usage: I'm only using it to
enforce immediate exits on the next vmentry, i.e. I set the timer value to 0.
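
In kvm-unit-tests style that boils down to something like the following (a
sketch, not the actual code; PIN_CONTROLS, PIN_PREEMPT and vmcs_read() are
assumed names for the pin-based control field, its preemption-timer bit and
the VMCS accessor):

/* Arm the VMX preemption timer with a count of 0 so the very next
 * vmentry exits immediately with reason PREEMPTION_TIMER. */
static void request_immediate_exit(void)
{
    u32 pin = vmcs_read(PIN_CONTROLS);

    vmcs_write(PIN_CONTROLS, pin | PIN_PREEMPT);
    vmcs_write(PREEMPT_TIMER_VALUE, 0);
}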

>  I wonder if it isn't better to revert it, since it is quite broken.

It's been upstream since 3.10, so reverting doesn't buy us anything; we
need to fix it anyway for long-term stable.

Jan






Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-26 Thread Paolo Bonzini
Il 26/09/2013 19:47, Paolo Bonzini ha scritto:
> 
> If I only apply this hunk, which disables the preemption timer while
> in L1:
> 
> @@ -8396,6 +8375,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
> 
> load_vmcs12_host_state(vcpu, vmcs12);
> 
> +   vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
> +
> /* Update TSC_OFFSET if TSC was changed while L2 ran */
> vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);
> 
> then the testcase works for somewhat larger values of the preemption timer
> (up to ~150 TSC cycles), but then fails.

I mean if I apply it on top of current kvm/next, without Arthur's patch.

If I apply the hunk on top of Arthur's patch nothing changes and the
timer testcase starts breaking around ~65000 TSC cycles.

It is a bit problematic that adding printks changes something, so that
the test starts passing.  I haven't tried tracepoints yet.

Jan, which L1 is using the preemption timer?  Any reason why you added
it?  I wonder if it isn't better to revert it, since it is quite broken.

Paolo


Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-26 Thread Paolo Bonzini

Il 26/09/2013 19:19, Jan Kiszka ha scritto:
> On 2013-09-26 17:04, Paolo Bonzini wrote:
>> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
>>> This patch contains the following two changes:
>>> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
>>> occurs for reasons not emulated by L1, the preemption timer value
>>> should be saved on such exits.
>>> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
>>> control to nVMX.
>>>
>>> With this patch, nested VMX preemption timer features are fully
>>> supported.
>>>
>>> Signed-off-by: Arthur Chunqi Li 
>>> ---
>>> ChangeLog to v4:
>>> Format changes and remove a flag in nested_vmx.
>>>  arch/x86/include/uapi/asm/msr-index.h |1 +
>>>  arch/x86/kvm/vmx.c|   44 
>>> +++--
>>>  2 files changed, 43 insertions(+), 2 deletions(-)
>>
>> Hi all,
>>
>> the test fails for me if the preemption timer value is set to a value
>> that is above ~2000 (which means ~65000 TSC cycles on this machine).
>> The preemption timer seems to count faster than what is expected, for
>> example only up to 4 million cycles if you set it to one million.
>> So, I am leaving the patch out of kvm/queue for now, until I can
>> test it on more processors.
> 
> So this behaviour is a regression of this patch (and your own version as
> well)?

Without this patch Arthur's preemption timer test doesn't work at all.

If I only apply this hunk, which disables the preemption timer while
in L1:

@@ -8396,6 +8375,8 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu)
 
load_vmcs12_host_state(vcpu, vmcs12);
 
+   vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, vmx_pin_based_exec_ctrl(vmx));
+
/* Update TSC_OFFSET if TSC was changed while L2 ran */
vmcs_write64(TSC_OFFSET, vmx->nested.vmcs01_tsc_offset);

then the testcase works for somewhat larger values of the preemption timer
(up to ~150 TSC cycles), but then fails.

If I apply the full patch below, the save-timer-value control works but the
timer stops working except for very small values of the preemption timer.

>> @@ -7294,6 +7272,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
>> *vcpu)
>>   | (1 << VCPU_EXREG_CR3));
>> vcpu->arch.regs_dirty = 0;
>>  
>> +   if (is_guest_mode(vcpu)) {
>> +   struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>> +   if ((vmcs12->pin_based_vm_exec_control & 
>> PIN_BASED_VMX_PREEMPTION_TIMER) &&
>> +   (vmcs12->vm_exit_controls & 
>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
>> +   vmcs12->vmx_preemption_timer_value =
>> +   vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> 
> Is the preemption timer guaranteed to continue to tick while in L0? I
> think this is what you are relying on here, right?

No, it is definitely not ticking.  But I was trying to simplify Arthur's
code as much as possible, while fixing the worst bug (i.e. that after
L2->L0->L2 the timer value restarts ticking from the initial value).
Support for save-timer-value comes for free, and doesn't affect the
result of the testcase.

But this would cause the preemption timer to be late.  It doesn't
explain why the preemption timer expires faster than the (scaled) TSC.

Paolo


Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-26 Thread Jan Kiszka
On 2013-09-26 17:04, Paolo Bonzini wrote:
> Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
>> This patch contains the following two changes:
>> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
>> occurs for reasons not emulated by L1, the preemption timer value
>> should be saved on such exits.
>> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
>> control to nVMX.
>>
>> With this patch, nested VMX preemption timer features are fully
>> supported.
>>
>> Signed-off-by: Arthur Chunqi Li 
>> ---
>> ChangeLog to v4:
>>  Format changes and remove a flag in nested_vmx.
>>  arch/x86/include/uapi/asm/msr-index.h |1 +
>>  arch/x86/kvm/vmx.c|   44 
>> +++--
>>  2 files changed, 43 insertions(+), 2 deletions(-)
> 
> Hi all,
> 
> the test fails for me if the preemption timer value is set to a value
> that is above ~2000 (which means ~65000 TSC cycles on this machine).
> The preemption timer seems to count faster than what is expected, for
> example only up to 4 million cycles if you set it to one million.
> So, I am leaving the patch out of kvm/queue for now, until I can
> test it on more processors.

So this behaviour is a regression of this patch (and your own version as
well)?

> 
> I tried redoing the patch with a completely different logic,
> and I get exactly the same behavior and the same wrong values for the
> TSC.  My patch (below is the diff from Arthur's) is a bit simpler; like
> Arthur I run L2 with the save-timer control set, but I do not need any
> adjustment upon entry to L2 because I just use the value left by the
> processor in the VMCS.  Also, in your patch L1 is also running with
> the preemption timer, if I understand the code correctly.
> 
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6730,27 +6731,6 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, 
> u64 *info1, u64 *in
> *info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>  }
>  
> -static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
> -{
> -   u64 delta_tsc_l1;
> -   u32 preempt_val_l1, preempt_val_l2, preempt_scale;
> -
> -   if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control &
> -   PIN_BASED_VMX_PREEMPTION_TIMER))
> -   return;
> -   preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
> -   MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
> -   preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> -   delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc())
> -   - vcpu->arch.last_guest_tsc;
> -   preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
> -   if (preempt_val_l2 <= preempt_val_l1)
> -   preempt_val_l2 = 0;
> -   else
> -   preempt_val_l2 -= preempt_val_l1;
> -   vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
> -}
> -
>  /*
>   * The guest has exited.  See if we can fix it or if we need userspace
>   * assistance.
> @@ -7161,8 +7141,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
> *vcpu)
> atomic_switch_perf_msrs(vmx);
> debugctlmsr = get_debugctlmsr();
>  
> -   if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending))
> -   nested_adjust_preemption_timer(vcpu);
> vmx->__launched = vmx->loaded_vmcs->launched;
> asm(
> /* Store host registers */
> @@ -7294,6 +7272,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
> *vcpu)
>   | (1 << VCPU_EXREG_CR3));
> vcpu->arch.regs_dirty = 0;
>  
> +   if (is_guest_mode(vcpu)) {
> +   struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +   if ((vmcs12->pin_based_vm_exec_control & 
> PIN_BASED_VMX_PREEMPTION_TIMER) &&
> +   (vmcs12->vm_exit_controls & 
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
> +   vmcs12->vmx_preemption_timer_value =
> +   vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);

Is the preemption timer guaranteed to continue to tick while in L0? I
think this is what you are relying on here, right?

Jan

> +   }
> +   }
> +
> vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
>  
> vmx->loaded_vmcs->launched = 1;
> @@ -7633,7 +7633,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
> vmcs_write64(VMCS_LINK_POINTER, -1ull);
>  
> vmcs_write32(PIN_BASED_VM_EXEC_CONTROL,
> -   (vmcs_config.pin_based_exec_ctrl |
> +   (vmx_pin_based_exec_ctrl(vmx) |
>  vmcs12->pin_based_vm_exec_control));
>  
> if (vmcs12->pin_based_vm_exec_control & 
> PIN_BASED_VMX_PREEMPTION_TIMER)
> @@ -8159,13 +8146,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
> vmcs12->guest_pending_dbg_exceptions =
> vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
>  
> -   if ((vmcs12->pin_based

Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-26 Thread Paolo Bonzini
Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
> This patch contains the following two changes:
> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
> occurs for reasons not emulated by L1, the preemption timer value
> should be saved on such exits.
> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
> control to nVMX.
> 
> With this patch, nested VMX preemption timer features are fully
> supported.
> 
> Signed-off-by: Arthur Chunqi Li 
> ---
> ChangeLog to v4:
>   Format changes and remove a flag in nested_vmx.
>  arch/x86/include/uapi/asm/msr-index.h |1 +
>  arch/x86/kvm/vmx.c|   44 
> +++--
>  2 files changed, 43 insertions(+), 2 deletions(-)

Hi all,

the test fails for me if the preemption timer value is set to a value
that is above ~2000 (which means ~65000 TSC cycles on this machine).
The preemption timer seems to count faster than what is expected, for
example only up to 4 million cycles if you set it to one million.
So, I am leaving the patch out of kvm/queue for now, until I can
test it on more processors.

I tried redoing the patch with a completely different logic,
and I get exactly the same behavior and the same wrong values for the
TSC.  My patch (below is the diff from Arthur's) is a bit simpler; like
Arthur I run L2 with the save-timer control set, but I do not need any
adjustment upon entry to L2 because I just use the value left by the
processor in the VMCS.  Also, in your patch L1 is also running with
the preemption timer, if I understand the code correctly.

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6730,27 +6731,6 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 
*info1, u64 *in
*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }
 
-static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
-{
-   u64 delta_tsc_l1;
-   u32 preempt_val_l1, preempt_val_l2, preempt_scale;
-
-   if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control &
-   PIN_BASED_VMX_PREEMPTION_TIMER))
-   return;
-   preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
-   MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
-   preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
-   delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc())
-   - vcpu->arch.last_guest_tsc;
-   preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
-   if (preempt_val_l2 <= preempt_val_l1)
-   preempt_val_l2 = 0;
-   else
-   preempt_val_l2 -= preempt_val_l1;
-   vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
-}
-
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -7161,8 +7141,6 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
atomic_switch_perf_msrs(vmx);
debugctlmsr = get_debugctlmsr();
 
-   if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending))
-   nested_adjust_preemption_timer(vcpu);
vmx->__launched = vmx->loaded_vmcs->launched;
asm(
/* Store host registers */
@@ -7294,6 +7272,15 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
  | (1 << VCPU_EXREG_CR3));
vcpu->arch.regs_dirty = 0;
 
+   if (is_guest_mode(vcpu)) {
+   struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+   if ((vmcs12->pin_based_vm_exec_control & 
PIN_BASED_VMX_PREEMPTION_TIMER) &&
+   (vmcs12->vm_exit_controls & 
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
+   vmcs12->vmx_preemption_timer_value =
+   vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+   }
+   }
+
vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
 
vmx->loaded_vmcs->launched = 1;
@@ -7633,7 +7633,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
vmcs_write64(VMCS_LINK_POINTER, -1ull);
 
vmcs_write32(PIN_BASED_VM_EXEC_CONTROL,
-   (vmcs_config.pin_based_exec_ctrl |
+   (vmx_pin_based_exec_ctrl(vmx) |
 vmcs12->pin_based_vm_exec_control));
 
if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
@@ -8159,13 +8146,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
vmcs12->guest_pending_dbg_exceptions =
vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
 
-   if ((vmcs12->pin_based_vm_exec_control &
-   PIN_BASED_VMX_PREEMPTION_TIMER) &&
-   (vmcs12->vm_exit_controls &
-   VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
-   vmcs12->vmx_preemption_timer_value =
-   vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
-
/*
 * In some cases (usually, nested EPT), L2 is allowed to change its
 * own CR3 without exiting. If it has changed it, we must

Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-25 Thread Paolo Bonzini
Il 16/09/2013 10:11, Arthur Chunqi Li ha scritto:
> This patch contains the following two changes:
> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
> occurs for reasons not emulated by L1, the preemption timer value
> should be saved on such exits.
> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
> control to nVMX.
> 
> With this patch, nested VMX preemption timer features are fully
> supported.
> 
> Signed-off-by: Arthur Chunqi Li 
> ---
> ChangeLog to v4:
>   Format changes and remove a flag in nested_vmx.
>  arch/x86/include/uapi/asm/msr-index.h |1 +
>  arch/x86/kvm/vmx.c|   44 
> +++--
>  2 files changed, 43 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/msr-index.h 
> b/arch/x86/include/uapi/asm/msr-index.h
> index bb04650..b93e09a 100644
> --- a/arch/x86/include/uapi/asm/msr-index.h
> +++ b/arch/x86/include/uapi/asm/msr-index.h
> @@ -536,6 +536,7 @@
>  
>  /* MSR_IA32_VMX_MISC bits */
>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F
>  /* AMD-V MSRs */
>  
>  #define MSR_VM_CR   0xc0010114
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1f1da43..e1fa13a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>  #ifdef CONFIG_X86_64
>   VM_EXIT_HOST_ADDR_SPACE_SIZE |
>  #endif
> - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
> + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
> + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> + if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
> + !(nested_vmx_exit_ctls_high & 
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
> + nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> + nested_vmx_pinbased_ctls_high &= 
> ~PIN_BASED_VMX_PREEMPTION_TIMER;
> + }
>   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> VM_EXIT_LOAD_IA32_EFER);
>  
> @@ -6707,6 +6713,27 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, 
> u64 *info1, u64 *info2)
>   *info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>  }
>  
> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
> +{
> + u64 delta_tsc_l1;
> + u32 preempt_val_l1, preempt_val_l2, preempt_scale;
> +
> + if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control &
> + PIN_BASED_VMX_PREEMPTION_TIMER))
> + return;
> + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
> + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
> + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> + delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc())
> + - vcpu->arch.last_guest_tsc;
> + preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
> + if (preempt_val_l2 <= preempt_val_l1)
> + preempt_val_l2 = 0;
> + else
> + preempt_val_l2 -= preempt_val_l1;
> + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
> +}
> +
>  /*
>   * The guest has exited.  See if we can fix it or if we need userspace
>   * assistance.
> @@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
> *vcpu)
>   atomic_switch_perf_msrs(vmx);
>   debugctlmsr = get_debugctlmsr();
>  
> + if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending))
> + nested_adjust_preemption_timer(vcpu);
>   vmx->__launched = vmx->loaded_vmcs->launched;
>   asm(
>   /* Store host registers */
> @@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>  {
>   struct vcpu_vmx *vmx = to_vmx(vcpu);
>   u32 exec_control;
> + u32 exit_control;
>  
>   vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>   vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
> @@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
>* bits are further modified by vmx_set_efer() below.
>*/
> - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
> + exit_control = vmcs_config.vmexit_ctrl;
> + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
> + exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> + vmcs_write32(VM_EXIT_CONTROLS, exit_control);
>  
>   /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
>* emulated by vmx_set_efer(), below.
> @@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>   vmcs12->guest_pending_dbg_exceptions =
>   vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
>  
> + if ((vmcs12->pin_based_v

Re: [PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-22 Thread Gleb Natapov
On Mon, Sep 16, 2013 at 04:11:44PM +0800, Arthur Chunqi Li wrote:
> This patch contains the following two changes:
> 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
> occurs for reasons not emulated by L1, the preemption timer value
> should be saved on such exits.
> 2. Add support for the "Save VMX-preemption timer value" VM-Exit
> control to nVMX.
> 
> With this patch, nested VMX preemption timer features are fully
> supported.
> 
> Signed-off-by: Arthur Chunqi Li 
Reviewed-by: Gleb Natapov 

One more if() on vmentry path is unfortunate, but I do not see a way
around it yet.

> ---
> ChangeLog to v4:
>   Format changes and remove a flag in nested_vmx.
>  arch/x86/include/uapi/asm/msr-index.h |1 +
>  arch/x86/kvm/vmx.c|   44 
> +++--
>  2 files changed, 43 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/msr-index.h 
> b/arch/x86/include/uapi/asm/msr-index.h
> index bb04650..b93e09a 100644
> --- a/arch/x86/include/uapi/asm/msr-index.h
> +++ b/arch/x86/include/uapi/asm/msr-index.h
> @@ -536,6 +536,7 @@
>  
>  /* MSR_IA32_VMX_MISC bits */
>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F
>  /* AMD-V MSRs */
>  
>  #define MSR_VM_CR   0xc0010114
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 1f1da43..e1fa13a 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>  #ifdef CONFIG_X86_64
>   VM_EXIT_HOST_ADDR_SPACE_SIZE |
>  #endif
> - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
> + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
> + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> + if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
> + !(nested_vmx_exit_ctls_high & 
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
> + nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> + nested_vmx_pinbased_ctls_high &= 
> ~PIN_BASED_VMX_PREEMPTION_TIMER;
> + }
>   nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> VM_EXIT_LOAD_IA32_EFER);
>  
> @@ -6707,6 +6713,27 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, 
> u64 *info1, u64 *info2)
>   *info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>  }
>  
> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
> +{
> + u64 delta_tsc_l1;
> + u32 preempt_val_l1, preempt_val_l2, preempt_scale;
> +
> + if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control &
> + PIN_BASED_VMX_PREEMPTION_TIMER))
> + return;
> + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
> + MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
> + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> + delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc())
> + - vcpu->arch.last_guest_tsc;
> + preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
> + if (preempt_val_l2 <= preempt_val_l1)
> + preempt_val_l2 = 0;
> + else
> + preempt_val_l2 -= preempt_val_l1;
> + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
> +}
> +
>  /*
>   * The guest has exited.  See if we can fix it or if we need userspace
>   * assistance.
> @@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
> *vcpu)
>   atomic_switch_perf_msrs(vmx);
>   debugctlmsr = get_debugctlmsr();
>  
> + if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending))
> + nested_adjust_preemption_timer(vcpu);
>   vmx->__launched = vmx->loaded_vmcs->launched;
>   asm(
>   /* Store host registers */
> @@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>  {
>   struct vcpu_vmx *vmx = to_vmx(vcpu);
>   u32 exec_control;
> + u32 exit_control;
>  
>   vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>   vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
> @@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>* we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
>* bits are further modified by vmx_set_efer() below.
>*/
> - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
> + exit_control = vmcs_config.vmexit_ctrl;
> + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
> + exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> + vmcs_write32(VM_EXIT_CONTROLS, exit_control);
>  
>   /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
>* emulated by vmx_set_efer(), below.
> @@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, 
> struct vmcs12 *vmcs12)
>   vmcs

[PATCH v5] KVM: nVMX: Fully support of nested VMX preemption timer

2013-09-16 Thread Arthur Chunqi Li
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
occurs for reasons not emulated by L1, the preemption timer value
should be saved on such exits.
2. Add support for the "Save VMX-preemption timer value" VM-Exit
control to nVMX.

With this patch, nested VMX preemption timer features are fully
supported.

Signed-off-by: Arthur Chunqi Li 
---
ChangeLog to v4:
Format changes and remove a flag in nested_vmx.
 arch/x86/include/uapi/asm/msr-index.h |1 +
 arch/x86/kvm/vmx.c|   44 +++--
 2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/uapi/asm/msr-index.h 
b/arch/x86/include/uapi/asm/msr-index.h
index bb04650..b93e09a 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -536,6 +536,7 @@
 
 /* MSR_IA32_VMX_MISC bits */
 #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
+#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F
 /* AMD-V MSRs */
 
 #define MSR_VM_CR   0xc0010114
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..e1fa13a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2204,7 +2204,13 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #ifdef CONFIG_X86_64
VM_EXIT_HOST_ADDR_SPACE_SIZE |
 #endif
-   VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+   VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+   VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+   if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
+   !(nested_vmx_exit_ctls_high & 
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
+   nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+   nested_vmx_pinbased_ctls_high &= 
~PIN_BASED_VMX_PREEMPTION_TIMER;
+   }
nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
  VM_EXIT_LOAD_IA32_EFER);
 
@@ -6707,6 +6713,27 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 
*info1, u64 *info2)
*info2 = vmcs_read32(VM_EXIT_INTR_INFO);
 }
 
+static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
+{
+   u64 delta_tsc_l1;
+   u32 preempt_val_l1, preempt_val_l2, preempt_scale;
+
+   if (!(get_vmcs12(vcpu)->pin_based_vm_exec_control &
+   PIN_BASED_VMX_PREEMPTION_TIMER))
+   return;
+   preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
+   MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
+   preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
+   delta_tsc_l1 = vmx_read_l1_tsc(vcpu, native_read_tsc())
+   - vcpu->arch.last_guest_tsc;
+   preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
+   if (preempt_val_l2 <= preempt_val_l1)
+   preempt_val_l2 = 0;
+   else
+   preempt_val_l2 -= preempt_val_l1;
+   vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
+}
+
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -7131,6 +7158,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
atomic_switch_perf_msrs(vmx);
debugctlmsr = get_debugctlmsr();
 
+   if (is_guest_mode(vcpu) && !(vmx->nested.nested_run_pending))
+   nested_adjust_preemption_timer(vcpu);
vmx->__launched = vmx->loaded_vmcs->launched;
asm(
/* Store host registers */
@@ -7518,6 +7547,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 exec_control;
+   u32 exit_control;
 
vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
@@ -7691,7 +7721,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
 * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
 * bits are further modified by vmx_set_efer() below.
 */
-   vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
+   exit_control = vmcs_config.vmexit_ctrl;
+   if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
+   exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
+   vmcs_write32(VM_EXIT_CONTROLS, exit_control);
 
/* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
 * emulated by vmx_set_efer(), below.
@@ -8090,6 +8123,13 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
vmcs12->guest_pending_dbg_exceptions =
vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
 
+   if ((vmcs12->pin_based_vm_exec_control &
+   PIN_BASED_VMX_PREEMPTION_TIMER) &&
+   (vmcs12->vm_exit_controls &
+   VM_EXIT_SAVE_VMX_PREEMPTION_TIME