Re: [PATCH v2 4/4] KVM: x86: get CPL from SS.DPL

2014-05-26 Thread Wei Huang
>>
>>
>> Is this specified anywhere in SDM as a requirement for x86 OS? If so,
>> maybe provide a pointer to support this.
>
>
> In the case of the Intel manuals, it mentions in several places that
> SS.DPL=CPL.  All the mentions are in the VMX sections of the manual, though
> I've found non-Intel material saying that system-management mode also used
> SS.DPL as the CPL:
>
> * "SS.DPL corresponds to the logical processor’s current privilege level
> (CPL)" (footnote in 26.3.1.5 Checks on Guest Non-Register State).
>
> * "SS.DPL is always loaded from the SS access-rights field. This will be the
> current privilege level (CPL) after the VM entry completes" (26.3.2.2
> Loading Guest Segment Registers and Descriptor-Table Registers)
>
> * "VMX-critical state [...] consists of the following: (1) SS.DPL (the
> current privilege level);" (34.14.1 Default Treatment of SMI delivery [in
> VMX mode]).
>
> Instead, AMD says that "the SS segment base, limit and attributes are not
> modified" by sysret.  It almost looks as if AMD processors never use SS.DPL;
> almost because searching "SS.attr" in the AMD manuals shows that the
> processor does write to SS.attr sometimes.  In the SVM documentation, it
> says "The processor reads the current privilege level from the CPL field in
> the VMCB, not from SS.DPL.  However, SS.DPL should match the CPL field" and
> sneakily leaves out what happens if they do not match...

My guess is that a VMCB with SS.DPL != CPL will fail during VMRUN. This
can be quickly tested by slightly tweaking the VMCB content of a regular VM.
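
To make the condition under test concrete, here is a minimal user-space sketch. It is not KVM code: `ss_attrib_dpl` and `vmcb_cpl_consistent` are hypothetical helper names, and the shift of 5 is assumed to mirror `SVM_SELECTOR_DPL_SHIFT` from the kernel's SVM header. What the hardware actually does on a mismatch is exactly the open question, so the sketch only reports whether the two fields agree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror SVM_SELECTOR_DPL_SHIFT: DPL is bits 6:5 of the attrib word. */
#define SS_ATTRIB_DPL_SHIFT 5

/* Hypothetical helper: extract the DPL from a VMCB-style segment attribute word. */
static inline unsigned int ss_attrib_dpl(uint16_t attrib)
{
	return (attrib >> SS_ATTRIB_DPL_SHIFT) & 3;
}

/* Hypothetical helper: true when the VMCB CPL field agrees with SS.DPL. */
static inline bool vmcb_cpl_consistent(uint16_t ss_attrib, uint8_t cpl)
{
	return ss_attrib_dpl(ss_attrib) == cpl;
}
```

An experiment along these lines would deliberately set up a VMCB for which `vmcb_cpl_consistent()` returns false and then observe the result of VMRUN.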

>
>
>>> case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
>>> value in the STAR MSR, but forces CPL=3 (Intel instead forces
>>> SS.DPL=SS.RPL=CPL=3).
>>
>>
>> Thinking out loud here... Should we force SYSRET SS.RPL to be 3 when
>> VM updates STAR MSR? Following the same thought, does it make sense to
>> check (and force) SS.DPL==3 when STAR MSR is being updated? Will
>> forcing SYSRET SS.DPL=3 break any OS? I think any reasonable OS would
>> probably set SS.RPL=SS.DPL=3.
>
>
> Yes, I wondered in fact how much the AMD behavior is a bug.
>
> We could emulate Intel behavior on AMD by shadowing the STAR MSR; the guest
> reads the intended SS.DPL and SS.RPL but the processor actually always runs
> with bits 49-48 of STAR set to 3.  This should ensure that CPL=SS.DPL always
> even on AMD.  I'm not sure if this has any worth though...

When SS.DPL != CPL for a VM, the worst case without the STAR emulation
proposed above is a crash of the VM, which it deserves. So I think we
are fine here.
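
For reference, the STAR shadowing idea quoted above could look roughly like the following sketch. Nothing like this is in the patch: `struct star_shadow` and `star_shadow_write` are hypothetical names, and the only fact taken from the thread is that bits 49-48 of STAR would always be 3 in the value the processor runs with, while the guest reads back what it wrote.

```c
#include <stdint.h>

/* Bits 49:48 of STAR, per the discussion above. */
#define STAR_SYSRET_RPL_SHIFT 48
#define STAR_SYSRET_RPL_MASK  (3ULL << STAR_SYSRET_RPL_SHIFT)

/* Hypothetical shadow: what the guest wrote vs. what the CPU would run with. */
struct star_shadow {
	uint64_t guest_value; /* returned to the guest on RDMSR */
	uint64_t hw_value;    /* value loaded into the real MSR */
};

/* Hypothetical WRMSR intercept: keep the guest's value, force bits 49:48 to 3. */
static inline struct star_shadow star_shadow_write(uint64_t guest_value)
{
	struct star_shadow s = {
		.guest_value = guest_value,
		.hw_value = guest_value | STAR_SYSRET_RPL_MASK,
	};
	return s;
}
```

The cost of this design is an extra MSR intercept for STAR, which is part of why its worth is questioned above.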

>
> Paolo
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 4/4] KVM: x86: get CPL from SS.DPL

2014-05-26 Thread Marcelo Tosatti
On Thu, May 15, 2014 at 06:51:31PM +0200, Paolo Bonzini wrote:
> CS.RPL is not equal to the CPL in the few instructions between
> setting CR0.PE and reloading CS.  And CS.DPL is also not equal
> to the CPL for conforming code segments.
> 
> However, SS.DPL *is* always equal to the CPL except for the weird
> case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
> value in the STAR MSR, but forces CPL=3 (Intel instead forces
> SS.DPL=SS.RPL=CPL=3).
> 
> So this patch:
> 
> - modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
> the above case with SYSRET is not broken further, and the way
> to fix it would be to pass the CPL to userspace and back
> 
> - modifies VMX to always return the CPL from SS.DPL (except
> forcing it to 0 if we are emulating real mode via vm86 mode;
> in vm86 mode all DPLs have to be 3, but real mode does allow
> privileged instructions).  It also removes the CPL cache,
> which becomes a duplicate of the SS access rights cache.
> 
> This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
> CR0.PE=1 but before CS has been reloaded.
> 
> Signed-off-by: Paolo Bonzini 


Reviewed-by: Marcelo Tosatti 



Re: [PATCH v2 4/4] KVM: x86: get CPL from SS.DPL

2014-05-26 Thread Paolo Bonzini

On 26/05/2014 01:21, Wei Huang wrote:
>> CS.RPL is not equal to the CPL in the few instructions between
>> setting CR0.PE and reloading CS.  And CS.DPL is also not equal
>> to the CPL for conforming code segments.
>
> Out of curiosity, could you elaborate on the problem with this
> CPL gap window, such as breaking any VMs or tests? From the Linux kernel
> code, it seems the kernel enables CR0.PE and immediately does an ljmpl to
> reload CS. This window is very small and I am curious how severe it
> could be.

It is almost guaranteed to happen if you run iPXE and in the meantime
kick QEMU continuously:

# qemu-system-x86_64 -monitor unix:/tmp/m1,server,nowait -net ... &
# yes 'info cpus' | nc -U /tmp/m1
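
As an illustration of why that window matters, the sketch below contrasts the pre-patch SVM rule (CPL taken from CS.RPL) with the post-patch rule (CPL taken from SS.DPL). It is a simplified model, not the kernel code: `struct guest_state` and both function names are made up for the example, and it assumes the cached SS descriptor still carries DPL 0 when the guest has just left real mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified guest state; field names are illustrative, not KVM's. */
struct guest_state {
	bool     cr0_pe;      /* CR0.PE: protected mode enabled */
	bool     eflags_vm;   /* EFLAGS.VM: virtual-8086 mode */
	uint16_t cs_selector; /* may still hold a real-mode value */
	uint8_t  ss_dpl;      /* DPL from the cached SS descriptor */
};

/* Pre-patch SVM rule: derive the CPL from CS.RPL. */
static inline int cpl_from_cs_rpl(const struct guest_state *g)
{
	if (!g->cr0_pe)
		return 0;
	if (g->eflags_vm)
		return 3;
	return g->cs_selector & 3;
}

/* Post-patch rule: the CPL is simply SS.DPL. */
static inline int cpl_from_ss_dpl(const struct guest_state *g)
{
	return g->ss_dpl & 3;
}
```

With `cr0_pe` set, a stale real-mode `cs_selector` such as 0x0203 and an `ss_dpl` of 0, the first function reports CPL 3 while the second reports CPL 0. If userspace reads and writes back the segment registers inside that window, the old rule recomputes the wrong CPL.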


>> However, SS.DPL *is* always equal to the CPL except for the weird
>
> Is this specified anywhere in SDM as a requirement for x86 OS? If so,
> maybe provide a pointer to support this.

In the case of the Intel manuals, it mentions in several places that
SS.DPL=CPL.  All the mentions are in the VMX sections of the manual,
though I've found non-Intel material saying that system-management mode
also used SS.DPL as the CPL:

* "SS.DPL corresponds to the logical processor’s current privilege level
(CPL)" (footnote in 26.3.1.5 Checks on Guest Non-Register State).

* "SS.DPL is always loaded from the SS access-rights field. This will be
the current privilege level (CPL) after the VM entry completes"
(26.3.2.2 Loading Guest Segment Registers and Descriptor-Table Registers)

* "VMX-critical state [...] consists of the following: (1) SS.DPL (the
current privilege level);" (34.14.1 Default Treatment of SMI delivery
[in VMX mode]).

Instead, AMD says that "the SS segment base, limit and attributes are
not modified" by sysret.  It almost looks as if AMD processors never use
SS.DPL; almost because searching "SS.attr" in the AMD manuals shows that
the processor does write to SS.attr sometimes.  In the SVM
documentation, it says "The processor reads the current privilege level
from the CPL field in the VMCB, not from SS.DPL.  However, SS.DPL should
match the CPL field" and sneakily leaves out what happens if they do not
match...

>> case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
>> value in the STAR MSR, but forces CPL=3 (Intel instead forces
>> SS.DPL=SS.RPL=CPL=3).
>
> Thinking out loud here... Should we force SYSRET SS.RPL to be 3 when
> VM updates STAR MSR? Following the same thought, does it make sense to
> check (and force) SS.DPL==3 when STAR MSR is being updated? Will
> forcing SYSRET SS.DPL=3 break any OS? I think any reasonable OS would
> probably set SS.RPL=SS.DPL=3.

Yes, I wondered in fact how much the AMD behavior is a bug.

We could emulate Intel behavior on AMD by shadowing the STAR MSR; the
guest reads the intended SS.DPL and SS.RPL but the processor actually
always runs with bits 49-48 of STAR set to 3.  This should ensure that
CPL=SS.DPL always even on AMD.  I'm not sure if this has any worth though...

Paolo

>> So this patch:
>>
>> - modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
>> the above case with SYSRET is not broken further, and the way
>> to fix it would be to pass the CPL to userspace and back
>>
>> - modifies VMX to always return the CPL from SS.DPL (except
>> forcing it to 0 if we are emulating real mode via vm86 mode;
>> in vm86 mode all DPLs have to be 3, but real mode does allow
>> privileged instructions).  It also removes the CPL cache,
>> which becomes a duplicate of the SS access rights cache.
>>
>> This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
>
> nit-picking: s/KVM_IOCTL_SET_SREGS/KVM_SET_SREGS IOCTL/, to
> match with IOCTL function name exactly.
>
>> CR0.PE=1 but before CS has been reloaded.
>>
>> Signed-off-by: Paolo Bonzini
>> ---
>>  arch/x86/include/asm/kvm_host.h |  1 -
>>  arch/x86/kvm/svm.c  | 35 ++-
>>  arch/x86/kvm/vmx.c  | 24 
>>  3 files changed, 18 insertions(+), 42 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index e21aee98a5c2..49314155b66c 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -130,7 +130,6 @@ enum kvm_reg_ex {
>>   VCPU_EXREG_PDPTR = NR_VCPU_REGS,
>>   VCPU_EXREG_CR3,
>>   VCPU_EXREG_RFLAGS,
>> - VCPU_EXREG_CPL,
>>   VCPU_EXREG_SEGMENTS,
>>  };
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 0b7d58d0c5fb..ec8366c5cfea 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -1338,21 +1338,6 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>>   wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
>>  }
>>
>> -static void svm_update_cpl(struct kvm_vcpu *vcpu)
>> -{
>> - struct vcpu_svm *svm = to_svm(vcpu);
>> - int cpl;
>> -
>> - if (!is_protmode(vcpu))
>> - cpl = 0;
>> - else if (svm->vmcb->save.rflags & X86_EFLAGS_VM)
>> - cpl = 3;
>> - else
>> - cpl = svm->vmcb->save.cs.selector & 0x3;
>> -
>> - svm->vmcb->save.cpl = cpl;
>> -}
>> -
>>  static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
>>  {
>>   return to_svm(vcpu)->vmcb->save.rflags;
>> @@ -1360,11 +1345,12 @@ static unsigned 

Re: [PATCH v2 4/4] KVM: x86: get CPL from SS.DPL

2014-05-25 Thread Wei Huang
> CS.RPL is not equal to the CPL in the few instructions between
> setting CR0.PE and reloading CS.  And CS.DPL is also not equal
> to the CPL for conforming code segments.

Out of curiosity, could you elaborate on the problem with this
CPL gap window, such as breaking any VMs or tests? From the Linux kernel
code, it seems the kernel enables CR0.PE and immediately does an ljmpl to
reload CS. This window is very small and I am curious how severe it
could be.

>
> However, SS.DPL *is* always equal to the CPL except for the weird

Is this specified anywhere in SDM as a requirement for x86 OS? If so,
maybe provide a pointer to support this.

> case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
> value in the STAR MSR, but forces CPL=3 (Intel instead forces
> SS.DPL=SS.RPL=CPL=3).

Thinking out loud here... Should we force SYSRET SS.RPL to be 3 when
VM updates STAR MSR? Following the same thought, does it make sense to
check (and force) SS.DPL==3 when STAR MSR is being updated? Will
forcing SYSRET SS.DPL=3 break any OS? I think any reasonable OS would
probably set SS.RPL=SS.DPL=3.

>
> So this patch:
>
> - modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
> the above case with SYSRET is not broken further, and the way
> to fix it would be to pass the CPL to userspace and back
>
> - modifies VMX to always return the CPL from SS.DPL (except
> forcing it to 0 if we are emulating real mode via vm86 mode;
> in vm86 mode all DPLs have to be 3, but real mode does allow
> privileged instructions).  It also removes the CPL cache,
> which becomes a duplicate of the SS access rights cache.
>
> This fixes doing KVM_IOCTL_SET_SREGS exactly after setting

nit-picking: s/KVM_IOCTL_SET_SREGS/KVM_SET_SREGS IOCTL/, to
match the ioctl name exactly.

> CR0.PE=1 but before CS has been reloaded.
>
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/include/asm/kvm_host.h |  1 -
>  arch/x86/kvm/svm.c  | 35 ++-
>  arch/x86/kvm/vmx.c  | 24 
>  3 files changed, 18 insertions(+), 42 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index e21aee98a5c2..49314155b66c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -130,7 +130,6 @@ enum kvm_reg_ex {
>   VCPU_EXREG_PDPTR = NR_VCPU_REGS,
>   VCPU_EXREG_CR3,
>   VCPU_EXREG_RFLAGS,
> - VCPU_EXREG_CPL,
>   VCPU_EXREG_SEGMENTS,
>  };
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 0b7d58d0c5fb..ec8366c5cfea 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1338,21 +1338,6 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>   wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
>  }
>
> -static void svm_update_cpl(struct kvm_vcpu *vcpu)
> -{
> - struct vcpu_svm *svm = to_svm(vcpu);
> - int cpl;
> -
> - if (!is_protmode(vcpu))
> - cpl = 0;
> - else if (svm->vmcb->save.rflags & X86_EFLAGS_VM)
> - cpl = 3;
> - else
> - cpl = svm->vmcb->save.cs.selector & 0x3;
> -
> - svm->vmcb->save.cpl = cpl;
> -}
> -
>  static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
>  {
>   return to_svm(vcpu)->vmcb->save.rflags;
> @@ -1360,11 +1345,12 @@ static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
>
>  static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
>  {
> - unsigned long old_rflags = to_svm(vcpu)->vmcb->save.rflags;
> -
> +   /*
> +* Any change of EFLAGS.VM is accompanied by a reload of SS
> +* (caused by either a task switch or an inter-privilege IRET),
> +* so we do not need to update the CPL here.
> +*/
>   to_svm(vcpu)->vmcb->save.rflags = rflags;
> - if ((old_rflags ^ rflags) & X86_EFLAGS_VM)
> - svm_update_cpl(vcpu);
>  }
>
>  static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
> @@ -1631,8 +1617,15 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
>   s->attrib |= (var->db & 1) << SVM_SELECTOR_DB_SHIFT;
>   s->attrib |= (var->g & 1) << SVM_SELECTOR_G_SHIFT;
>   }
> - if (seg == VCPU_SREG_CS)
> - svm_update_cpl(vcpu);
> +
> + /*
> + * This is always accurate, except if SYSRET returned to a segment
> + * with SS.DPL != 3.  Intel does not have this quirk, and always
> + * forces SS.DPL to 3 on sysret, so we ignore that case; fixing it
> + * would entail passing the CPL to userspace and back.
> + */
> + if (seg == VCPU_SREG_SS)
> + svm->vmcb->save.cpl = (s->attrib >> SVM_SELECTOR_DPL_SHIFT) & 3;
>
>   mark_dirty(svm->vmcb, VMCB_SEG);
>  }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6f7463f53ed9..a267108403f5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -414,7 +414,6 @@ struct vcpu_vmx {
>   struct kvm_vcpu   vcpu;
>   unsigned long host_rsp;
>   u8    fail;
> - u8    cpl;
>   bool  nmi_known_unmasked;
>   u32   exit_intr_info;
>   u32   idt_vectoring_info;
> @@ 

[PATCH v2 4/4] KVM: x86: get CPL from SS.DPL

2014-05-15 Thread Paolo Bonzini
CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS.  And CS.DPL is also not equal
to the CPL for conforming code segments.

However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but forces CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).

So this patch:

- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
the above case with SYSRET is not broken further, and the way
to fix it would be to pass the CPL to userspace and back

- modifies VMX to always return the CPL from SS.DPL (except
forcing it to 0 if we are emulating real mode via vm86 mode;
in vm86 mode all DPLs have to be 3, but real mode does allow
privileged instructions).  It also removes the CPL cache,
which becomes a duplicate of the SS access rights cache.

This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/svm.c  | 35 ++-
 arch/x86/kvm/vmx.c  | 24 
 3 files changed, 18 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e21aee98a5c2..49314155b66c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -130,7 +130,6 @@ enum kvm_reg_ex {
VCPU_EXREG_PDPTR = NR_VCPU_REGS,
VCPU_EXREG_CR3,
VCPU_EXREG_RFLAGS,
-   VCPU_EXREG_CPL,
VCPU_EXREG_SEGMENTS,
 };
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0b7d58d0c5fb..ec8366c5cfea 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1338,21 +1338,6 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 }
 
-static void svm_update_cpl(struct kvm_vcpu *vcpu)
-{
-   struct vcpu_svm *svm = to_svm(vcpu);
-   int cpl;
-
-   if (!is_protmode(vcpu))
-   cpl = 0;
-   else if (svm->vmcb->save.rflags & X86_EFLAGS_VM)
-   cpl = 3;
-   else
-   cpl = svm->vmcb->save.cs.selector & 0x3;
-
-   svm->vmcb->save.cpl = cpl;
-}
-
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 {
return to_svm(vcpu)->vmcb->save.rflags;
@@ -1360,11 +1345,12 @@ static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 
 static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
-   unsigned long old_rflags = to_svm(vcpu)->vmcb->save.rflags;
-
+   /*
+* Any change of EFLAGS.VM is accompanied by a reload of SS
+* (caused by either a task switch or an inter-privilege IRET),
+* so we do not need to update the CPL here.
+*/
to_svm(vcpu)->vmcb->save.rflags = rflags;
-   if ((old_rflags ^ rflags) & X86_EFLAGS_VM)
-   svm_update_cpl(vcpu);
 }
 
 static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
@@ -1631,8 +1617,15 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
s->attrib |= (var->db & 1) << SVM_SELECTOR_DB_SHIFT;
s->attrib |= (var->g & 1) << SVM_SELECTOR_G_SHIFT;
}
-   if (seg == VCPU_SREG_CS)
-   svm_update_cpl(vcpu);
+
+   /*
+* This is always accurate, except if SYSRET returned to a segment
+* with SS.DPL != 3.  Intel does not have this quirk, and always
+* forces SS.DPL to 3 on sysret, so we ignore that case; fixing it
+* would entail passing the CPL to userspace and back.
+*/
+   if (seg == VCPU_SREG_SS)
+   svm->vmcb->save.cpl = (s->attrib >> SVM_SELECTOR_DPL_SHIFT) & 3;
 
mark_dirty(svm->vmcb, VMCB_SEG);
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6f7463f53ed9..a267108403f5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -414,7 +414,6 @@ struct vcpu_vmx {
struct kvm_vcpu   vcpu;
unsigned long host_rsp;
u8    fail;
-   u8    cpl;
bool  nmi_known_unmasked;
u32   exit_intr_info;
u32   idt_vectoring_info;
@@ -3150,10 +3149,6 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
fix_pmode_seg(vcpu, VCPU_SREG_DS, &vmx->rmode.segs[VCPU_SREG_DS]);
fix_pmode_seg(vcpu, VCPU_SREG_FS, &vmx->rmode.segs[VCPU_SREG_FS]);
fix_pmode_seg(vcpu, VCPU_SREG_GS, &vmx->rmode.segs[VCPU_SREG_GS]);
-
-   /* CPL is always 0 when CPU enters protected mode */
-   __set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
-   vmx->cpl = 0;
 }
 
 static void fix_rmode_seg(int seg, struct kvm_segment *save)
@@ -3555,22 +3550,14 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-   if (!is_protmode(vcpu))
+   if (unlikely(vmx->rmode.vm86_active))
 

[PATCH v2 4/4] KVM: x86: get CPL from SS.DPL

2014-05-15 Thread Paolo Bonzini
CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS.  And CS.DPL is also not equal
to the CPL for conforming code segments.

However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but forces CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).
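
To make the CR0.PE window concrete, here is a minimal sketch, not KVM code. The struct, the helper names and the selector value 0x1003 are all hypothetical; the only thing taken from the text above is that the real CPL in that window is 0 while CS still holds a real-mode segment value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of a segment register: the visible
 * selector plus the DPL held in its hidden descriptor cache. */
struct sketch_seg {
    uint16_t selector;
    uint8_t  dpl;
};

/* Old approach: take the CPL from the low two bits of the CS selector. */
static inline int sketch_cpl_from_cs_rpl(const struct sketch_seg *cs)
{
    return cs->selector & 3;
}

/* Approach taken by this patch: take the CPL from SS.DPL. */
static inline int sketch_cpl_from_ss_dpl(const struct sketch_seg *ss)
{
    assert(ss->dpl <= 3);   /* DPL is a two-bit field */
    return ss->dpl;
}

/* Illustrative (made-up) state right after CR0.PE is set and before CS
 * is reloaded: the CS selector is still a real-mode segment value, so
 * its low bits carry no privilege information. */
static const struct sketch_seg sketch_cs_after_pe = { .selector = 0x1003, .dpl = 0 };
static const struct sketch_seg sketch_ss_after_pe = { .selector = 0x0000, .dpl = 0 };
```

With these assumed values the CS.RPL helper reports 3 although the CPU is still at CPL 0, while the SS.DPL helper reports 0.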

So this patch:

- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
the above case with SYSRET is not broken further, and the way
to fix it would be to pass the CPL to userspace and back

- modifies VMX to always return the CPL from SS.DPL (except
forcing it to 0 if we are emulating real mode via vm86 mode;
in vm86 mode all DPLs have to be 3, but real mode does allow
privileged instructions).  It also removes the CPL cache,
which becomes a duplicate of the SS access rights cache.
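
The resulting rule can be sketched roughly as follows. This is a standalone illustration rather than the code in the diff: the names are hypothetical, and the shift of 5 is my assumption about where the DPL field sits in the packed attribute word (the patch itself uses SVM_SELECTOR_DPL_SHIFT on the SVM side).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the two-bit DPL field in the packed
 * segment attribute / access-rights word. */
#define SKETCH_DPL_SHIFT 5

/* Extract the DPL from a packed attribute word. */
static inline int sketch_dpl(uint32_t attrib)
{
    return (attrib >> SKETCH_DPL_SHIFT) & 3;
}

/* CPL as described above: SS.DPL, except that real mode emulated via
 * vm86 mode must report 0, because vm86 forces every DPL to 3 while
 * real mode still allows privileged instructions. */
static inline int sketch_get_cpl(bool vm86_active, uint32_t ss_attrib)
{
    int cpl = vm86_active ? 0 : sketch_dpl(ss_attrib);

    assert(cpl >= 0 && cpl <= 3);
    return cpl;
}
```

For example, an SS attribute byte of 0xF3 (DPL 3) yields CPL 3 in protected mode, but CPL 0 when the hypothetical vm86_active flag is set.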

This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/svm.c  | 35 ++-
 arch/x86/kvm/vmx.c  | 24 
 3 files changed, 18 insertions(+), 42 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e21aee98a5c2..49314155b66c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -130,7 +130,6 @@ enum kvm_reg_ex {
VCPU_EXREG_PDPTR = NR_VCPU_REGS,
VCPU_EXREG_CR3,
VCPU_EXREG_RFLAGS,
-   VCPU_EXREG_CPL,
VCPU_EXREG_SEGMENTS,
 };
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0b7d58d0c5fb..ec8366c5cfea 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1338,21 +1338,6 @@ static void svm_vcpu_put(struct kvm_vcpu *vcpu)
wrmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
 }
 
-static void svm_update_cpl(struct kvm_vcpu *vcpu)
-{
-   struct vcpu_svm *svm = to_svm(vcpu);
-   int cpl;
-
-   if (!is_protmode(vcpu))
-   cpl = 0;
-   else if (svm->vmcb->save.rflags & X86_EFLAGS_VM)
-   cpl = 3;
-   else
-   cpl = svm->vmcb->save.cs.selector & 0x3;
-
-   svm->vmcb->save.cpl = cpl;
-}
-
 static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 {
return to_svm(vcpu)->vmcb->save.rflags;
@@ -1360,11 +1345,12 @@ static unsigned long svm_get_rflags(struct kvm_vcpu *vcpu)
 
 static void svm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
-   unsigned long old_rflags = to_svm(vcpu)->vmcb->save.rflags;
-
+   /*
+* Any change of EFLAGS.VM is accompanied by a reload of SS
+* (caused by either a task switch or an inter-privilege IRET),
+* so we do not need to update the CPL here.
+*/
to_svm(vcpu)->vmcb->save.rflags = rflags;
-   if ((old_rflags ^ rflags) & X86_EFLAGS_VM)
-   svm_update_cpl(vcpu);
 }
 
 static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
@@ -1631,8 +1617,15 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
s->attrib |= (var->db & 1) << SVM_SELECTOR_DB_SHIFT;
s->attrib |= (var->g & 1) << SVM_SELECTOR_G_SHIFT;
}
-   if (seg == VCPU_SREG_CS)
-   svm_update_cpl(vcpu);
+
+   /*
+* This is always accurate, except if SYSRET returned to a segment
+* with SS.DPL != 3.  Intel does not have this quirk, and always
+* forces SS.DPL to 3 on sysret, so we ignore that case; fixing it
+* would entail passing the CPL to userspace and back.
+*/
+   if (seg == VCPU_SREG_SS)
+   svm->vmcb->save.cpl = (s->attrib >> SVM_SELECTOR_DPL_SHIFT) & 3;
 
mark_dirty(svm->vmcb, VMCB_SEG);
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6f7463f53ed9..a267108403f5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -414,7 +414,6 @@ struct vcpu_vmx {
struct kvm_vcpu   vcpu;
unsigned long host_rsp;
u8    fail;
-   u8    cpl;
bool  nmi_known_unmasked;
u32   exit_intr_info;
u32   idt_vectoring_info;
@@ -3150,10 +3149,6 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
fix_pmode_seg(vcpu, VCPU_SREG_DS, &vmx->rmode.segs[VCPU_SREG_DS]);
fix_pmode_seg(vcpu, VCPU_SREG_FS, &vmx->rmode.segs[VCPU_SREG_FS]);
fix_pmode_seg(vcpu, VCPU_SREG_GS, &vmx->rmode.segs[VCPU_SREG_GS]);
-
-   /* CPL is always 0 when CPU enters protected mode */
-   __set_bit(VCPU_EXREG_CPL, (ulong *)&vcpu->arch.regs_avail);
-   vmx->cpl = 0;
 }
 
 static void fix_rmode_seg(int seg, struct kvm_segment *save)
@@ -3555,22 +3550,14 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
-   if (!is_protmode(vcpu))
+   if (unlikely(vmx->rmode.vm86_active))