Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

2016-03-10 Thread Paolo Bonzini


On 10/03/2016 13:14, Xiao Guangrong wrote:
>> More precisely, ignore_bits is only needed if guest EFER.NX=0 and we're
>> not in this CR0.WP=1/CR4.SMEP=0 situation.  In theory you could have
>> guest EFER.NX=1 and host EFER.NX=0.
> 
> It is not the case in Linux: the kernel always sets EFER.NX if CPUID reports it,
> arch/x86/kernel/head_64.S:
> 
>         /* Setup EFER (Extended Feature Enable Register) */
>         movl    $MSR_EFER, %ecx
>         rdmsr
>         btsl    $_EFER_SCE, %eax        /* Enable System Call */
>         btl     $20, %edi               /* No Execute supported? */
>         jnc     1f
>         btsl    $_EFER_NX, %eax
>         btsq    $_PAGE_BIT_NX, early_pmd_flags(%rip)
> 1:      wrmsr                           /* Make changes effective */
> 
> So if the guest sees NX in its CPUID, then host EFER.NX should be 1.

You're right, it is only a theoretical case.  But ignoring EFER.NX when
it is 1 is technically not correct; since we have to add some special
EFER_NX logic anyway, I preferred to make it pedantically right. :)

Paolo
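
For illustration only, here is a minimal sketch of the merge step under
discussion (not kernel code; only the architectural EFER bit positions are
real), showing why leaving NX in ignore_bits unconditionally would mishandle
the theoretical guest EFER.NX=1 / host EFER.NX=0 case:

#include <stdint.h>
#include <stdio.h>

#define EFER_SCE (1ULL << 0)    /* architectural EFER.SCE */
#define EFER_NX  (1ULL << 11)   /* architectural EFER.NXE */

/* Merge used on the shared-MSR path: ignored bits are taken from the host. */
static uint64_t merge_efer(uint64_t guest, uint64_t host, uint64_t ignore)
{
        return (guest & ~ignore) | (host & ignore);
}

int main(void)
{
        uint64_t host_efer  = EFER_SCE;                 /* theoretical host: NX=0 */
        uint64_t guest_efer = EFER_SCE | EFER_NX;       /* guest wants NX=1       */

        /* NX always ignored: the guest silently loses NX (prints 0x1). */
        printf("%#llx\n", (unsigned long long)
               merge_efer(guest_efer, host_efer, EFER_SCE | EFER_NX));

        /* NX ignored only when guest NX=0: the guest keeps NX (prints 0x801). */
        printf("%#llx\n", (unsigned long long)
               merge_efer(guest_efer, host_efer, EFER_SCE));
        return 0;
}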


Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

2016-03-10 Thread Xiao Guangrong



On 03/10/2016 06:09 PM, Paolo Bonzini wrote:



On 10/03/2016 09:27, Xiao Guangrong wrote:





+       if (!enable_ept) {
+               guest_efer |= EFER_NX;
+               ignore_bits |= EFER_NX;


Updating ignore_bits is not necessary, I think.


More precisely, ignore_bits is only needed if guest EFER.NX=0 and we're
not in this CR0.WP=1/CR4.SMEP=0 situation.  In theory you could have
guest EFER.NX=1 and host EFER.NX=0.


It is not the case in Linux: the kernel always sets EFER.NX if CPUID reports it,
arch/x86/kernel/head_64.S:

        /* Setup EFER (Extended Feature Enable Register) */
        movl    $MSR_EFER, %ecx
        rdmsr
        btsl    $_EFER_SCE, %eax        /* Enable System Call */
        btl     $20, %edi               /* No Execute supported? */
        jnc     1f
        btsl    $_EFER_NX, %eax
        btsq    $_PAGE_BIT_NX, early_pmd_flags(%rip)
1:      wrmsr                           /* Make changes effective */

So if the guest sees NX in its CPUID, then host EFER.NX should be 1.



This is what I came up with (plus some comments :)):

u64 guest_efer = vmx->vcpu.arch.efer;
u64 ignore_bits = 0;

if (!enable_ept) {
        if (boot_cpu_has(X86_FEATURE_SMEP))
                guest_efer |= EFER_NX;
        else if (!(guest_efer & EFER_NX))
                ignore_bits |= EFER_NX;
}


Your logic is right.

My suggestion is that we can keep ignore_bits = EFER_NX | EFER_SCE
(no need to adjust it conditionally), because EFER_NX must be the same
between guest and host if we switch EFER manually.


My patch is bigger but the resulting code is smaller and easier to follow:

guest_efer = vmx->vcpu.arch.efer;
if (!enable_ept)
        guest_efer |= EFER_NX;
...
if (...) {
        ...
} else {
        guest_efer &= ~ignore_bits;
        guest_efer |= host_efer & ignore_bits;
}


Agreed. :)


Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

2016-03-10 Thread Paolo Bonzini


On 10/03/2016 09:27, Xiao Guangrong wrote:
> So it only hurts boxes that have cpu_has_load_ia32_efer support; otherwise
> NX is inherited from the kernel (the kernel always sets NX if the CPU
> supports it), right?

Yes, but I think a box with !cpu_has_load_ia32_efer && SMEP does not exist.
On the other hand it only matters when EPT is disabled, so it's a weird
corner case that only happens during testing.

Paolo
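
For reference, an illustrative helper (not the real vmx.c code) that mirrors
the condition in update_transition_efer(): when it is true, EFER is swapped
atomically at VM-entry/VM-exit and the guest value is used verbatim, which is
where forcing NX matters; when it is false, the NX bit used to be taken from
host_efer via ignore_bits, and a Linux host's always-set NX masked the problem:

#include <stdbool.h>
#include <stdint.h>

#define EFER_NX (1ULL << 11)    /* architectural EFER.NXE */

/* Sketch of the check that selects the atomic EFER switch. */
static bool efer_switched_atomically(bool cpu_has_load_ia32_efer,
                                     bool enable_ept,
                                     uint64_t guest_efer, uint64_t host_efer)
{
        return cpu_has_load_ia32_efer ||
               (enable_ept && ((guest_efer ^ host_efer) & EFER_NX));
}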


Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

2016-03-10 Thread Paolo Bonzini


On 10/03/2016 09:27, Xiao Guangrong wrote:
>>
> 
>> +    if (!enable_ept) {
>> +            guest_efer |= EFER_NX;
>> +            ignore_bits |= EFER_NX;
> 
> Updating ignore_bits is not necessary, I think.

More precisely, ignore_bits is only needed if guest EFER.NX=0 and we're
not in this CR0.WP=1/CR4.SMEP=0 situation.  In theory you could have
guest EFER.NX=1 and host EFER.NX=0.

This is what I came up with (plus some comments :)):

u64 guest_efer = vmx->vcpu.arch.efer;
u64 ignore_bits = 0;

if (!enable_ept) {
        if (boot_cpu_has(X86_FEATURE_SMEP))
                guest_efer |= EFER_NX;
        else if (!(guest_efer & EFER_NX))
                ignore_bits |= EFER_NX;
}

>> -    guest_efer = vmx->vcpu.arch.efer;
>>      if (!(guest_efer & EFER_LMA))
>>              guest_efer &= ~EFER_LME;
>>      if (guest_efer != host_efer)
>>              add_atomic_switch_msr(vmx, MSR_EFER,
>>                                    guest_efer, host_efer);
> 
> So, why not set EFER_NX (if !ept) just in this branch to make the fix
> simpler?

I didn't like having

guest_efer = vmx->vcpu.arch.efer;
...
if (!enable_ept)
        guest_efer |= EFER_NX;
guest_efer &= ~ignore_bits;
guest_efer |= host_efer & ignore_bits;
...
if (...) {
        guest_efer = vmx->vcpu.arch.efer;
        if (!enable_ept)
                guest_efer |= EFER_NX;
        ...
}

My patch is bigger but the resulting code is smaller and easier to follow:

guest_efer = vmx->vcpu.arch.efer;
if (!enable_ept)
        guest_efer |= EFER_NX;
...
if (...) {
        ...
} else {
        guest_efer &= ~ignore_bits;
        guest_efer |= host_efer & ignore_bits;
}

Paolo


Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

2016-03-10 Thread Paolo Bonzini


On 10/03/2016 09:46, Xiao Guangrong wrote:
> 
>> Yes, all of these are needed. :) This is admittedly a bit odd, but
>> kvm-unit-tests access.flat tests this if you run it with "-cpu host"
>> and of course ept=0.
>>
>> KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
>> setting U=0 and W=1 in the shadow PTE.  This will cause a user write
>> to fault and a supervisor write to succeed (which is correct because
>> CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
> 
> BTW, it should be pte.u = 1 where you mentioned above.

Ok, will fix.

Paolo


Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

2016-03-10 Thread Xiao Guangrong



On 03/08/2016 07:44 PM, Paolo Bonzini wrote:

Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
setting U=0 and W=1 in the shadow PTE.  This will cause a user write
to fault and a supervisor write to succeed (which is correct because
CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.


BTW, it should be pte.u = 1 where you mentioned above.


Re: [PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

2016-03-10 Thread Xiao Guangrong



On 03/08/2016 07:44 PM, Paolo Bonzini wrote:

Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
setting U=0 and W=1 in the shadow PTE.  This will cause a user write
to fault and a supervisor write to succeed (which is correct because
CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
This enables user reads; it also disables supervisor writes, the next
of which will then flip the bits again.

When SMEP is in effect, however, U=0 will enable kernel execution of
this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0.  If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.


Good catch!

So it only hurts boxes that have cpu_has_load_ia32_efer support; otherwise
NX is inherited from the kernel (the kernel always sets NX if the CPU
supports it), right?



There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.




Cc: sta...@vger.kernel.org
Cc: Xiao Guangrong 
Cc: Andy Lutomirski 
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini 
---
  Documentation/virtual/kvm/mmu.txt |  3 ++-
  arch/x86/kvm/vmx.c| 25 +++--
  2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index daf9c0f742d2..c81731096a43 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -358,7 +358,8 @@ In the first case there are two additional complications:
  - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
the kernel may now execute it.  We handle this by also setting spte.nx.
If we get a user fetch or read fault, we'll change spte.u=1 and
-  spte.nx=gpte.nx back.
+  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
+  shadow paging is in use.
  - if CR4.SMAP is disabled: since the page has been changed to a kernel
page, it can not be reused when CR4.SMAP is enabled. We set
CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e51493ff4f9..91830809d837 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1863,20 +1863,20 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
guest_efer = vmx->vcpu.arch.efer;

/*
-* NX is emulated; LMA and LME handled by hardware; SCE meaningless
-* outside long mode
+* LMA and LME handled by hardware; SCE meaningless outside long mode.
 */
-   ignore_bits = EFER_NX | EFER_SCE;
+   ignore_bits = EFER_SCE;
  #ifdef CONFIG_X86_64
ignore_bits |= EFER_LMA | EFER_LME;
/* SCE is meaningful only in long mode on Intel */
if (guest_efer & EFER_LMA)
ignore_bits &= ~(u64)EFER_SCE;
  #endif
-   guest_efer &= ~ignore_bits;
-   guest_efer |= host_efer & ignore_bits;
-   vmx->guest_msrs[efer_offset].data = guest_efer;
-   vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+   /* NX is needed to handle CR0.WP=1, CR4.SMEP=1.  */



+   if (!enable_ept) {
+   guest_efer |= EFER_NX;
+   ignore_bits |= EFER_NX;


Updating ignore_bits is not necessary, I think.


+   }

clear_atomic_switch_msr(vmx, MSR_EFER);

@@ -1887,16 +1887,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 */
if (cpu_has_load_ia32_efer ||
(enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
-   guest_efer = vmx->vcpu.arch.efer;
if (!(guest_efer & EFER_LMA))
guest_efer &= ~EFER_LME;
if (guest_efer != host_efer)
add_atomic_switch_msr(vmx, MSR_EFER,
  guest_efer, host_efer);


So, why not set EFER_NX (if !ept) just in this branch to make the fix simpler?



[PATCH 1/2] KVM: MMU: fix ept=0/pte.u=0/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

2016-03-08 Thread Paolo Bonzini
Yes, all of these are needed. :) This is admittedly a bit odd, but
kvm-unit-tests access.flat tests this if you run it with "-cpu host"
and of course ept=0.

KVM handles supervisor writes of a pte.u=0/pte.w=0/CR0.WP=0 page by
setting U=0 and W=1 in the shadow PTE.  This will cause a user write
to fault and a supervisor write to succeed (which is correct because
CR0.WP=0).  A user read instead will flip U=0 to 1 and W=1 back to 0.
This enables user reads; it also disables supervisor writes, the next
of which will then flip the bits again.

When SMEP is in effect, however, U=0 will enable kernel execution of
this page.  To avoid this, KVM also sets NX=1 in the shadow PTE together
with U=0.  If the guest has not enabled NX, the result is a continuous
stream of page faults due to the NX bit being reserved.

The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER
switch.
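
As a minimal sketch of the spte transitions described above (illustrative
bit twiddling using the architectural R/W, U/S and XD bit positions, not the
actual shadow-MMU code):

#include <stdbool.h>
#include <stdint.h>

#define PT_WRITABLE_MASK (1ULL << 1)    /* architectural R/W bit */
#define PT_USER_MASK     (1ULL << 2)    /* architectural U/S bit */
#define PT_NX_MASK       (1ULL << 63)   /* architectural XD bit  */

/*
 * Supervisor write to a pte.w=0 page with CR0.WP=0: make the spte
 * kernel-only and writable.  With CR4.SMEP=1 the spte must also be
 * non-executable, which is only legal when EFER.NX is enabled;
 * otherwise bit 63 is reserved and every access faults.
 */
static uint64_t spte_for_supervisor_write(uint64_t spte, bool smep)
{
        spte &= ~PT_USER_MASK;
        spte |= PT_WRITABLE_MASK;
        if (smep)
                spte |= PT_NX_MASK;
        return spte;
}

/* A later user read or fetch flips the bits back and restores gpte.nx. */
static uint64_t spte_for_user_access(uint64_t spte, bool gpte_nx)
{
        spte |= PT_USER_MASK;
        spte &= ~PT_WRITABLE_MASK;
        if (gpte_nx)
                spte |= PT_NX_MASK;
        else
                spte &= ~PT_NX_MASK;
        return spte;
}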

There is another bug in the reserved bit check, which I've split to a
separate patch for easier application to stable kernels.

Cc: sta...@vger.kernel.org
Cc: Xiao Guangrong 
Cc: Andy Lutomirski 
Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd
Signed-off-by: Paolo Bonzini 
---
 Documentation/virtual/kvm/mmu.txt |  3 ++-
 arch/x86/kvm/vmx.c| 25 +++--
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index daf9c0f742d2..c81731096a43 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -358,7 +358,8 @@ In the first case there are two additional complications:
 - if CR4.SMEP is enabled: since we've turned the page into a kernel page,
   the kernel may now execute it.  We handle this by also setting spte.nx.
   If we get a user fetch or read fault, we'll change spte.u=1 and
-  spte.nx=gpte.nx back.
+  spte.nx=gpte.nx back.  For this to work, KVM forces EFER.NX to 1 when
+  shadow paging is in use.
 - if CR4.SMAP is disabled: since the page has been changed to a kernel
   page, it can not be reused when CR4.SMAP is enabled. We set
   CR4.SMAP && !CR0.WP into shadow page's role to avoid this case. Note,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6e51493ff4f9..91830809d837 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1863,20 +1863,20 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
guest_efer = vmx->vcpu.arch.efer;
 
/*
-* NX is emulated; LMA and LME handled by hardware; SCE meaningless
-* outside long mode
+* LMA and LME handled by hardware; SCE meaningless outside long mode.
 */
-   ignore_bits = EFER_NX | EFER_SCE;
+   ignore_bits = EFER_SCE;
 #ifdef CONFIG_X86_64
ignore_bits |= EFER_LMA | EFER_LME;
/* SCE is meaningful only in long mode on Intel */
if (guest_efer & EFER_LMA)
ignore_bits &= ~(u64)EFER_SCE;
 #endif
-   guest_efer &= ~ignore_bits;
-   guest_efer |= host_efer & ignore_bits;
-   vmx->guest_msrs[efer_offset].data = guest_efer;
-   vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+   /* NX is needed to handle CR0.WP=1, CR4.SMEP=1.  */
+   if (!enable_ept) {
+   guest_efer |= EFER_NX;
+   ignore_bits |= EFER_NX;
+   }
 
clear_atomic_switch_msr(vmx, MSR_EFER);
 
@@ -1887,16 +1887,21 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 */
if (cpu_has_load_ia32_efer ||
(enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
-   guest_efer = vmx->vcpu.arch.efer;
if (!(guest_efer & EFER_LMA))
guest_efer &= ~EFER_LME;
if (guest_efer != host_efer)
add_atomic_switch_msr(vmx, MSR_EFER,
  guest_efer, host_efer);
return false;
-   }
+   } else {
+   guest_efer &= ~ignore_bits;
+   guest_efer |= host_efer & ignore_bits;
 
-   return true;
+   vmx->guest_msrs[efer_offset].data = guest_efer;
+   vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
+
+   return true;
+   }
 }
 
 static unsigned long segment_base(u16 selector)
-- 
1.8.3.1



