Re: [PATCH v2 0/8] ccp: KVM: SVM: Use stack for SEV command buffers

2021-04-15 Thread Tom Lendacky
On 4/15/21 11:09 AM, Paolo Bonzini wrote:
> On 07/04/21 20:00, Tom Lendacky wrote:
>> For the series:
>>
>> Acked-by: Tom Lendacky
> 
> Shall I take this as a request (or permission, whatever :)) to merge it
> through the KVM tree?

Adding Herbert. Here's a link to the series:

https://lore.kernel.org/kvm/88eef561-6fd8-a495-0d60-ff688070c...@redhat.com/T/#m2bbdd12452970d3bd7d0b1464c22bf2f0227a9f1

I'm not sure how you typically do the cross-tree stuff. Patch 8 has a
requirement on patches 1-7. The arch/x86/kvm/svm/sev.c file tends to have
more activity/changes than drivers/crypto/ccp/sev-dev.{c,h}, so it would
make sense to take it through the KVM tree. But I think you need to verify
that with Herbert.

Thanks,
Tom

> 
> Paolo
> 


Re: [PATCH v3] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-09 Thread Tom Lendacky
On 4/9/21 9:38 AM, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> Access to the GHCB is mainly in the VMGEXIT path and it is known that the
> GHCB will be mapped. But there are two paths where it is possible the GHCB
> might not be mapped.
> 
> The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
> the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
> However, if a SIPI is performed without a corresponding AP Reset Hold,
> then the GHCB might not be mapped (depending on the previous VMEXIT),
> which will result in a NULL pointer dereference.
> 
> The svm_complete_emulated_msr() routine will update the GHCB to inform
> the caller of a RDMSR/WRMSR operation about any errors. While it is likely
> that the GHCB will be mapped in this situation, add a safeguard
> in this path to be certain a NULL pointer dereference is not encountered.
> 
> Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts 
> under SEV-ES")
> Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES 
> guest")
> Signed-off-by: Tom Lendacky 
> 
> ---
> 
> Changes from v2:
> - Removed WARN_ON_ONCE() from the sev_vcpu_deliver_sipi_vector() path
>   since it is guest triggerable and can crash systems with panic_on_warn
>   and replaced with pr_warn_once().

I messed up the changelog here: the WARN_ON_ONCE() was dropped and *not*
replaced with a pr_warn_once().

Thanks,
Tom

> 
> Changes from v1:
> - Added the svm_complete_emulated_msr() path as suggested by Sean
>   Christopherson
> - Add a WARN_ON_ONCE() to the sev_vcpu_deliver_sipi_vector() path
> ---
>  arch/x86/kvm/svm/sev.c | 3 +++
>  arch/x86/kvm/svm/svm.c | 2 +-
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 83e00e524513..0a539f8bc212 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2105,5 +2105,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu 
> *vcpu, u8 vector)
>* the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
>* non-zero value.
>*/
> + if (!svm->ghcb)
> + return;
> +
>   ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>  }
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 271196400495..534e52ba6045 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2759,7 +2759,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
> msr_data *msr_info)
>  static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
>  {
>   struct vcpu_svm *svm = to_svm(vcpu);
> - if (!sev_es_guest(vcpu->kvm) || !err)
> + if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
>   return kvm_complete_insn_gp(vcpu, err);
>  
>   ghcb_set_sw_exit_info_1(svm->ghcb, 1);
> 


Re: [PATCH v2] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-09 Thread Tom Lendacky
On 4/8/21 2:48 PM, Sean Christopherson wrote:
> On Thu, Apr 08, 2021, Tom Lendacky wrote:
>>
>>
>> On 4/8/21 12:37 PM, Sean Christopherson wrote:
>>> On Thu, Apr 08, 2021, Tom Lendacky wrote:
>>>> On 4/8/21 12:10 PM, Sean Christopherson wrote:
>>>>> On Thu, Apr 08, 2021, Tom Lendacky wrote:
>>>>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>>>>> index 83e00e524513..7ac67615c070 100644
>>>>>> --- a/arch/x86/kvm/svm/sev.c
>>>>>> +++ b/arch/x86/kvm/svm/sev.c
>>>>>> @@ -2105,5 +2105,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu 
>>>>>> *vcpu, u8 vector)
>>>>>>   * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
>>>>>>   * non-zero value.
>>>>>>   */
>>>>>> +if (WARN_ON_ONCE(!svm->ghcb))
>>>>>
>>>>> Isn't this guest triggerable?  I.e. send a SIPI without doing the reset 
>>>>> hold?
>>>>> If so, this should not WARN.
>>>>
>>>> Yes, it is a guest triggerable event. But a guest shouldn't be doing that,
>>>> so I thought adding the WARN_ON_ONCE() just to detect it wasn't bad.
>>>> Definitely wouldn't want a WARN_ON().
>>>
>>> WARNs are intended only for host issues, e.g. a malicious guest shouldn't be
>>> able to crash the host when running with panic_on_warn.
>>>
>>
>> Ah, yeah, forgot about panic_on_warn. I can go back to the original patch
>> or do a pr_warn_once(), any pref?
> 
> No strong preference.  If you think the print would be helpful for ongoing
> development, then it's probably worth adding.

For development, I'd want to see it all the time. But since it is guest
triggerable, the _once() method is really needed in production. So in the
latest version I just dropped the message/notification.

Thanks,
Tom

> 


[PATCH v3] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-09 Thread Tom Lendacky
From: Tom Lendacky 

Access to the GHCB is mainly in the VMGEXIT path and it is known that the
GHCB will be mapped. But there are two paths where it is possible the GHCB
might not be mapped.

The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
However, if a SIPI is performed without a corresponding AP Reset Hold,
then the GHCB might not be mapped (depending on the previous VMEXIT),
which will result in a NULL pointer dereference.

The svm_complete_emulated_msr() routine will update the GHCB to inform
the caller of a RDMSR/WRMSR operation about any errors. While it is likely
that the GHCB will be mapped in this situation, add a safeguard
in this path to be certain a NULL pointer dereference is not encountered.

Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts 
under SEV-ES")
Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
Signed-off-by: Tom Lendacky 

---

Changes from v2:
- Removed WARN_ON_ONCE() from the sev_vcpu_deliver_sipi_vector() path
  since it is guest triggerable and can crash systems with panic_on_warn
  and replaced with pr_warn_once().

Changes from v1:
- Added the svm_complete_emulated_msr() path as suggested by Sean
  Christopherson
- Add a WARN_ON_ONCE() to the sev_vcpu_deliver_sipi_vector() path
---
 arch/x86/kvm/svm/sev.c | 3 +++
 arch/x86/kvm/svm/svm.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 83e00e524513..0a539f8bc212 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2105,5 +2105,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, 
u8 vector)
 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
 * non-zero value.
 */
+   if (!svm->ghcb)
+   return;
+
ghcb_set_sw_exit_info_2(svm->ghcb, 1);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 271196400495..534e52ba6045 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2759,7 +2759,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
 static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
 {
struct vcpu_svm *svm = to_svm(vcpu);
-   if (!sev_es_guest(vcpu->kvm) || !err)
+   if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
return kvm_complete_insn_gp(vcpu, err);
 
ghcb_set_sw_exit_info_1(svm->ghcb, 1);
-- 
2.31.0



Re: [PATCH v2] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-08 Thread Tom Lendacky



On 4/8/21 12:37 PM, Sean Christopherson wrote:
> On Thu, Apr 08, 2021, Tom Lendacky wrote:
>> On 4/8/21 12:10 PM, Sean Christopherson wrote:
>>> On Thu, Apr 08, 2021, Tom Lendacky wrote:
>>>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>>>> index 83e00e524513..7ac67615c070 100644
>>>> --- a/arch/x86/kvm/svm/sev.c
>>>> +++ b/arch/x86/kvm/svm/sev.c
>>>> @@ -2105,5 +2105,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu 
>>>> *vcpu, u8 vector)
>>>> * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
>>>> * non-zero value.
>>>> */
>>>> +  if (WARN_ON_ONCE(!svm->ghcb))
>>>
>>> Isn't this guest triggerable?  I.e. send a SIPI without doing the reset 
>>> hold?
>>> If so, this should not WARN.
>>
>> Yes, it is a guest triggerable event. But a guest shouldn't be doing that,
>> so I thought adding the WARN_ON_ONCE() just to detect it wasn't bad.
>> Definitely wouldn't want a WARN_ON().
> 
> WARNs are intended only for host issues, e.g. a malicious guest shouldn't be
> able to crash the host when running with panic_on_warn.
> 

Ah, yeah, forgot about panic_on_warn. I can go back to the original patch
or do a pr_warn_once(), any pref?

Thanks,
Tom


Re: [PATCH v2] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-08 Thread Tom Lendacky
On 4/8/21 12:10 PM, Sean Christopherson wrote:
> On Thu, Apr 08, 2021, Tom Lendacky wrote:
>> From: Tom Lendacky 
>>
>> Access to the GHCB is mainly in the VMGEXIT path and it is known that the
>> GHCB will be mapped. But there are two paths where it is possible the GHCB
>> might not be mapped.
>>
>> The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
>> the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
>> However, if a SIPI is performed without a corresponding AP Reset Hold,
>> then the GHCB might not be mapped (depending on the previous VMEXIT),
>> which will result in a NULL pointer dereference.
>>
>> The svm_complete_emulated_msr() routine will update the GHCB to inform
>> the caller of a RDMSR/WRMSR operation about any errors. While it is likely
>> that the GHCB will be mapped in this situation, add a safeguard
>> in this path to be certain a NULL pointer dereference is not encountered.
>>
>> Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts 
>> under SEV-ES")
>> Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES 
>> guest")
>> Signed-off-by: Tom Lendacky 
>>
>> ---
>>
>> Changes from v1:
>> - Added the svm_complete_emulated_msr() path as suggested by Sean
>>   Christopherson
>> - Add a WARN_ON_ONCE() to the sev_vcpu_deliver_sipi_vector() path
>> ---
>>  arch/x86/kvm/svm/sev.c | 3 +++
>>  arch/x86/kvm/svm/svm.c | 2 +-
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 83e00e524513..7ac67615c070 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2105,5 +2105,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu 
>> *vcpu, u8 vector)
>>   * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
>>   * non-zero value.
>>   */
>> +if (WARN_ON_ONCE(!svm->ghcb))
> 
> Isn't this guest triggerable?  I.e. send a SIPI without doing the reset hold?
> If so, this should not WARN.

Yes, it is a guest triggerable event. But a guest shouldn't be doing that,
so I thought adding the WARN_ON_ONCE() just to detect it wasn't bad.
Definitely wouldn't want a WARN_ON().

Thanks,
Tom

> 
>> +return;
>> +
>>  ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>>  }
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 271196400495..534e52ba6045 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -2759,7 +2759,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
>> msr_data *msr_info)
>>  static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
>>  {
>>  struct vcpu_svm *svm = to_svm(vcpu);
>> -if (!sev_es_guest(vcpu->kvm) || !err)
>> +if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
>>  return kvm_complete_insn_gp(vcpu, err);
>>  
>>  ghcb_set_sw_exit_info_1(svm->ghcb, 1);
>> -- 
>> 2.31.0
>>


Re: [PATCH 1/1] x86/kvm/svm: Implement support for PSFD

2021-04-08 Thread Tom Lendacky
On 4/7/21 2:45 PM, Ramakrishna Saripalli wrote:
> From: Ramakrishna Saripalli 
> 
> Expose Predictive Store Forwarding capability to guests.
> Guests enable or disable PSF via SPEC_CTRL MSR.
> 
> Signed-off-by: Ramakrishna Saripalli 
> ---
>  arch/x86/kvm/cpuid.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 6bd2f8b830e4..9c4af0fef6d7 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -448,6 +448,8 @@ void kvm_set_cpu_caps(void)
>   kvm_cpu_cap_set(X86_FEATURE_INTEL_STIBP);
>   if (boot_cpu_has(X86_FEATURE_AMD_SSBD))
>   kvm_cpu_cap_set(X86_FEATURE_SPEC_CTRL_SSBD);
> + if (boot_cpu_has(X86_FEATURE_AMD_PSFD))
> + kvm_cpu_cap_set(X86_FEATURE_AMD_PSFD);
>  
>   kvm_cpu_cap_mask(CPUID_7_1_EAX,
>   F(AVX_VNNI) | F(AVX512_BF16)
> @@ -482,7 +484,7 @@ void kvm_set_cpu_caps(void)
>   kvm_cpu_cap_mask(CPUID_8000_0008_EBX,
>   F(CLZERO) | F(XSAVEERPTR) |
>   F(WBNOINVD) | F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | 
> F(VIRT_SSBD) |
> - F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON)
> + F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON) | 
> F(AMD_PSFD)

Please note that this patch has a pre-req against the PSFD support that
defines this feature:

https://lore.kernel.org/lkml/20210406155004.230790-2-rsari...@amd.com/#t
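
The hunks above reference X86_FEATURE_AMD_PSFD, which that prerequisite
patch defines. As a rough sketch of what it adds (assuming the PSFD bit
position documented for CPUID Fn8000_0008_EBX[28]; the exact wording and
comment come from that series, not from here):

	/* arch/x86/include/asm/cpufeatures.h */
	#define X86_FEATURE_AMD_PSFD	(13*32 + 28) /* Predictive Store Forwarding Disable */

Word 13 corresponds to the CPUID_8000_0008_EBX mask being updated above.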

Thanks,
Tom

>   );
>  
>   /*
> 


[PATCH v2] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-08 Thread Tom Lendacky
From: Tom Lendacky 

Access to the GHCB is mainly in the VMGEXIT path and it is known that the
GHCB will be mapped. But there are two paths where it is possible the GHCB
might not be mapped.

The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
However, if a SIPI is performed without a corresponding AP Reset Hold,
then the GHCB might not be mapped (depending on the previous VMEXIT),
which will result in a NULL pointer dereference.

The svm_complete_emulated_msr() routine will update the GHCB to inform
the caller of a RDMSR/WRMSR operation about any errors. While it is likely
that the GHCB will be mapped in this situation, add a safeguard
in this path to be certain a NULL pointer dereference is not encountered.

Fixes: f1c6366e3043 ("KVM: SVM: Add required changes to support intercepts 
under SEV-ES")
Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
Signed-off-by: Tom Lendacky 

---

Changes from v1:
- Added the svm_complete_emulated_msr() path as suggested by Sean
  Christopherson
- Add a WARN_ON_ONCE() to the sev_vcpu_deliver_sipi_vector() path
---
 arch/x86/kvm/svm/sev.c | 3 +++
 arch/x86/kvm/svm/svm.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 83e00e524513..7ac67615c070 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2105,5 +2105,8 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, 
u8 vector)
 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
 * non-zero value.
 */
+   if (WARN_ON_ONCE(!svm->ghcb))
+   return;
+
ghcb_set_sw_exit_info_2(svm->ghcb, 1);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 271196400495..534e52ba6045 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2759,7 +2759,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
 static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
 {
struct vcpu_svm *svm = to_svm(vcpu);
-   if (!sev_es_guest(vcpu->kvm) || !err)
+   if (!err || !sev_es_guest(vcpu->kvm) || WARN_ON_ONCE(!svm->ghcb))
return kvm_complete_insn_gp(vcpu, err);
 
ghcb_set_sw_exit_info_1(svm->ghcb, 1);
-- 
2.31.0



Re: [PATCH] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-08 Thread Tom Lendacky
On 4/8/21 11:14 AM, Paolo Bonzini wrote:
> On 08/04/21 18:04, Tom Lendacky wrote:
>>>>> +   if (!err || !sev_es_guest(vcpu->kvm) ||
>>>>> !WARN_ON_ONCE(svm->ghcb))
>>>> This should be WARN_ON_ONCE(!svm->ghcb), otherwise you'll get the right
>>>> result, but get a stack trace immediately.
>>> Doh, yep.
>> Actually, because of the "or's", this needs to be:
>>
>> if (!err || !sev_es_guest(vcpu->kvm) || (sev_es_guest(vcpu->kvm) &&
>> WARN_ON_ONCE(!svm->ghcb)))
> 
> No, || cuts the right-hand side if the left-hand side is true.  So:
> 
> - if err == 0, the rest is not evaluated
> 
> - if !sev_es_guest(vcpu->kvm), WARN_ON_ONCE(!svm->ghcb) is not evaluated

That's what I was doing in my head, but I guess I need more coffee... :)

Thanks,
Tom

> 
> Paolo
> 


Re: [PATCH] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-08 Thread Tom Lendacky



On 4/7/21 4:07 PM, Sean Christopherson wrote:
> On Wed, Apr 07, 2021, Tom Lendacky wrote:
>> On 4/7/21 3:08 PM, Sean Christopherson wrote:
>>> On Wed, Apr 07, 2021, Tom Lendacky wrote:
>>>> From: Tom Lendacky 
>>>>
>>>> The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
>>>> the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
>>>> However, if a SIPI is performed without a corresponding AP Reset Hold,
>>>> then the GHCB may not be mapped, which will result in a NULL pointer
>>>> dereference.
>>>>
>>>> Check that the GHCB is mapped before attempting the update.
>>>
>>> It's tempting to say the ghcb_set_*() helpers should guard against this, but
>>> that would add a lot of pollution and the vast majority of uses are very 
>>> clearly
>>> in the vmgexit path.  svm_complete_emulated_msr() is the only other case 
>>> that
>>> is non-obvious; would it make sense to sanity check svm->ghcb there as well?
>>
>> Hmm... I'm not sure if we can get here without having taken the VMGEXIT
>> path to start, but it certainly couldn't hurt to add it.
> 
> Yeah, AFAICT it should be impossible to reach the callback without a valid 
> ghcb,
> it'd be purely be a sanity check.
>  
>> I can submit a v2 with that unless you want to submit it (with one small
>> change below).
> 
> I'd say just throw it into v2.
> 
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index 019ac836dcd0..abe9c765628f 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -2728,7 +2728,8 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
>>> msr_data *msr_info)
>>>  static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
>>>  {
>>> struct vcpu_svm *svm = to_svm(vcpu);
>>> -   if (!sev_es_guest(vcpu->kvm) || !err)
>>> +
>>> +   if (!err || !sev_es_guest(vcpu->kvm) || !WARN_ON_ONCE(svm->ghcb))
>>
>> This should be WARN_ON_ONCE(!svm->ghcb), otherwise you'll get the right
>> result, but get a stack trace immediately.
> 
> Doh, yep.

Actually, because of the "or's", this needs to be:

if (!err || !sev_es_guest(vcpu->kvm) || (sev_es_guest(vcpu->kvm) && 
WARN_ON_ONCE(!svm->ghcb)))

Thanks,
Tom

> 


Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB

2021-04-08 Thread Tom Lendacky
On 4/7/21 2:45 PM, Borislav Petkov wrote:
> On Wed, Apr 07, 2021 at 01:25:55PM +0200, Borislav Petkov wrote:
>> On Tue, Apr 06, 2021 at 02:42:43PM -0500, Tom Lendacky wrote:
>>> The GHCB spec only defines the "0" reason code set. We could provide Linux
>>> its own reason code set with some more specific reason codes for
>>> failures, if that is needed.
>>
>> Why Linux only?
>>
>> Don't we want to have a generalized set of error codes which say what
>> has happened so that people can debug?
> 
> To quote Tom from IRC - and that is perfectly fine too, AFAIC:
> 
>  i'm ok with it, but i don't think it should be something dictated 
> by the spec.  the problem is if you want to provide a new error code then the 
> spec has to be updated constantly
>  that's why i said, pick a "reason code set" value and say those 
> are what Linux will use. We could probably document them in Documentation/
>  the error code thing was an issue when introduced as part of the 
> first spec.  that's why only a small number of reason codes are specified
> 
> Yap, makes sense. What we should do in the spec, though, is say: "This
> range is for vendor-specific error codes".
> 
> Also, is GHCBData[23:16] big enough and can we extend it simply? Or do
> we need the spec to at least dictate some ranges so that it can use some bits
> above, say, bit 32 or whatever the upper range of the extension is...

Hopefully we won't have 255 different reason codes. But if we get to that
point we should be able to expand the reason code field to 16-bits. Just
need to be sure that if any new fields are added between now and then,
they are added at bit 32 or above.
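
As a rough sketch of what a Linux-owned reason code set could look like
(the set number, macro names and codes below are purely illustrative and
come from neither the GHCB spec nor any posted patch):

	/* Reason code set 0 belongs to the GHCB spec; claim another one for Linux */
	#define SEV_TERM_SET_LINUX			1
	#define SEV_TERM_LINUX_PSC_FAILURE		0	/* page state change failed */
	#define SEV_TERM_LINUX_GHCB_REG_FAILURE		1	/* GHCB GPA registration failed */

	/*
	 * Termination request via the GHCB MSR protocol:
	 * GHCBData[11:0] = 0x100, [15:12] = reason code set, [23:16] = reason code.
	 */
	val  = 0x100;
	val |= (u64)SEV_TERM_SET_LINUX << 12;
	val |= (u64)SEV_TERM_LINUX_PSC_FAILURE << 16;
	sev_es_wr_ghcb_msr(val);
	VMGEXIT();

Documenting the set under Documentation/ would keep it out of the spec
while still giving people something to look up when a guest terminates.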

Thanks,
Tom

> 
> Hmmm.
> 


Re: [PATCH] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-07 Thread Tom Lendacky
On 4/7/21 3:08 PM, Sean Christopherson wrote:
> On Wed, Apr 07, 2021, Tom Lendacky wrote:
>> From: Tom Lendacky 
>>
>> The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
>> the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
>> However, if a SIPI is performed without a corresponding AP Reset Hold,
>> then the GHCB may not be mapped, which will result in a NULL pointer
>> dereference.
>>
>> Check that the GHCB is mapped before attempting the update.
> 
> It's tempting to say the ghcb_set_*() helpers should guard against this, but
> that would add a lot of pollution and the vast majority of uses are very 
> clearly
> in the vmgexit path.  svm_complete_emulated_msr() is the only other case that
> is non-obvious; would it make sense to sanity check svm->ghcb there as well?

Hmm... I'm not sure if we can get here without having taken the VMGEXIT
path to start, but it certainly couldn't hurt to add it.

I can submit a v2 with that unless you want to submit it (with one small
change below).

> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 019ac836dcd0..abe9c765628f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -2728,7 +2728,8 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct 
> msr_data *msr_info)
>  static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
>  {
> struct vcpu_svm *svm = to_svm(vcpu);
> -   if (!sev_es_guest(vcpu->kvm) || !err)
> +
> +   if (!err || !sev_es_guest(vcpu->kvm) || !WARN_ON_ONCE(svm->ghcb))

This should be WARN_ON_ONCE(!svm->ghcb), otherwise you'll get the right
result, but get a stack trace immediately.

Thanks,
Tom

> return kvm_complete_insn_gp(vcpu, err);
> 
>     ghcb_set_sw_exit_info_1(svm->ghcb, 1);
> 
>> Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES 
>> guest")
>> Signed-off-by: Tom Lendacky 
> 
> Either way:
> 
> Reviewed-by: Sean Christopherson  
> 
>> ---
>>  arch/x86/kvm/svm/sev.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index 83e00e524513..13758e3b106d 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -2105,5 +2105,6 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu 
>> *vcpu, u8 vector)
>>   * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
>>   * non-zero value.
>>   */
>> -ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>> +if (svm->ghcb)
>> +ghcb_set_sw_exit_info_2(svm->ghcb, 1);
>>  }
>> -- 
>> 2.31.0
>>


[PATCH] KVM: SVM: Make sure GHCB is mapped before updating

2021-04-07 Thread Tom Lendacky
From: Tom Lendacky 

The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
However, if a SIPI is performed without a corresponding AP Reset Hold,
then the GHCB may not be mapped, which will result in a NULL pointer
dereference.

Check that the GHCB is mapped before attempting the update.

Fixes: 647daca25d24 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
Signed-off-by: Tom Lendacky 
---
 arch/x86/kvm/svm/sev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 83e00e524513..13758e3b106d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2105,5 +2105,6 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, 
u8 vector)
 * the guest will set the CS and RIP. Set SW_EXIT_INFO_2 to a
 * non-zero value.
 */
-   ghcb_set_sw_exit_info_2(svm->ghcb, 1);
+   if (svm->ghcb)
+   ghcb_set_sw_exit_info_2(svm->ghcb, 1);
 }
-- 
2.31.0



Re: [PATCH v2 0/8] ccp: KVM: SVM: Use stack for SEV command buffers

2021-04-07 Thread Tom Lendacky
On 4/6/21 5:49 PM, Sean Christopherson wrote:
> This series teaches __sev_do_cmd_locked() to gracefully handle vmalloc'd
> command buffers by copying _all_ incoming data pointers to an internal
> buffer before sending the command to the PSP.  The SEV driver and KVM are
> then converted to use the stack for all command buffers.
> 
> Tested everything except sev_ioctl_do_pek_import(), I don't know anywhere
> near enough about the PSP to give it the right input.
> 
> v2:
>   - Rebase to kvm/queue, commit f96be2deac9b ("KVM: x86: Support KVM VMs
> sharing SEV context").
>   - Unconditionally copy @data to the internal buffer. [Christophe, Brijesh]
>   - Allocate a full page for the buffer. [Brijesh]
>   - Drop one set of the "!"s. [Christophe]
>   - Use virt_addr_valid() instead of is_vmalloc_addr() for the temporary
> patch (definitely feel free to drop the patch if it's not worth
> backporting). [Christophe]
>   - s/intput/input/. [Tom]
>   - Add a patch to free "sev" if init fails.  This is not strictly
> necessary (I think; I suck horribly when it comes to the driver
> framework).   But it felt wrong to not free cmd_buf on failure, and
> even more wrong to free cmd_buf but not sev.
> 
> v1:
>   - https://lkml.kernel.org/r/20210402233702.3291792-1-seanjc@google.com
> 
> Sean Christopherson (8):
>   crypto: ccp: Free SEV device if SEV init fails
>   crypto: ccp: Detect and reject "invalid" addresses destined for PSP
>   crypto: ccp: Reject SEV commands with mismatching command buffer
>   crypto: ccp: Play nice with vmalloc'd memory for SEV command structs
>   crypto: ccp: Use the stack for small SEV command buffers
>   crypto: ccp: Use the stack and common buffer for status commands
>   crypto: ccp: Use the stack and common buffer for INIT command
>   KVM: SVM: Allocate SEV command structures on local stack
> 
>  arch/x86/kvm/svm/sev.c   | 262 +--
>  drivers/crypto/ccp/sev-dev.c | 197 +-
>  drivers/crypto/ccp/sev-dev.h |   4 +-
>  3 files changed, 196 insertions(+), 267 deletions(-)

For the series:

Acked-by: Tom Lendacky 

> 


Re: [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active

2021-04-07 Thread Tom Lendacky
On 4/7/21 12:34 PM, Brijesh Singh wrote:
> 
> On 4/7/21 6:59 AM, Borislav Petkov wrote:
>> On Wed, Mar 24, 2021 at 11:44:18AM -0500, Brijesh Singh wrote:
>>> The SEV-SNP guest is required to perform GHCB GPA registration. This is
>> Why does it need to do that? Some additional security so as to not allow
>> changing the GHCB once it is established?
>>
>> I'm guessing that's enforced by the SNP fw and we cannot do that
>> retroactively for SEV...? Because it sounds like a nice little thing we
>> could do additionally.
> 
> The feature is part of the GHCB version 2 and is enforced by the
> hypervisor. I guess it can be extended for the ES. Since this feature
> was not available in GHCB version 1 (base ES) so it should be presented
> as an optional for the ES ?

GHCB GPA registration is only supported and required for SEV-SNP guests.
The final version of the spec documents that and should be published
within the next few days.

Thanks,
Tom

> 
> 
>>
>>> because the hypervisor may prefer that a guest use a consistent and/or
>>> specific GPA for the GHCB associated with a vCPU. For more information,
>>> see the GHCB specification section 2.5.2.
>> I think you mean
>>
>> "2.3.2 GHCB GPA Registration"
>>
>> Please use the section name too because that doc changes from time to
>> time.
>>
>> Also, you probably should update it here:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
> 
> Yes, the section may have changed since I wrote the description. Noted.
> I will refer the section name.
> 
> 
>>> diff --git a/arch/x86/boot/compressed/sev-snp.c 
>>> b/arch/x86/boot/compressed/sev-snp.c
>>> index 5c25103b0df1..a4c5e85699a7 100644
>>> --- a/arch/x86/boot/compressed/sev-snp.c
>>> +++ b/arch/x86/boot/compressed/sev-snp.c
>>> @@ -113,3 +113,29 @@ void sev_snp_set_page_shared(unsigned long paddr)
>>>  {
>>> sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
>>>  }
>>> +
>>> +void sev_snp_register_ghcb(unsigned long paddr)
>> Right and let's prefix SNP-specific functions with "snp_" only so that
>> it is clear which is which when looking at the code.
>>
>>> +{
>>> +   u64 pfn = paddr >> PAGE_SHIFT;
>>> +   u64 old, val;
>>> +
>>> +   if (!sev_snp_enabled())
>>> +   return;
>>> +
>>> +   /* save the old GHCB MSR */
>>> +   old = sev_es_rd_ghcb_msr();
>>> +
>>> +   /* Issue VMGEXIT */
>> No need for that comment.
>>
>>> +   sev_es_wr_ghcb_msr(GHCB_REGISTER_GPA_REQ_VAL(pfn));
>>> +   VMGEXIT();
>>> +
>>> +   val = sev_es_rd_ghcb_msr();
>>> +
>>> +   /* If the response GPA is not ours then abort the guest */
>>> +   if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_REGISTER_GPA_RESP) ||
>>> +   (GHCB_REGISTER_GPA_RESP_VAL(val) != pfn))
>>> +   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
>> Yet another example where using a specific termination reason could help
>> with debugging guests. Looking at the GHCB spec, I hope GHCBData[23:16]
>> is big enough for all reasons. I'm sure it can be extended ofc ...
> 
> 
> Maybe we can request the GHCB version 3 to add the extended error code.
> 
> 
>> :-)
>>
>>> +   /* Restore the GHCB MSR value */
>>> +   sev_es_wr_ghcb_msr(old);
>>> +}
>>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>>> index f514dad276f2..0523eb21abd7 100644
>>> --- a/arch/x86/include/asm/sev-snp.h
>>> +++ b/arch/x86/include/asm/sev-snp.h
>>> @@ -56,6 +56,13 @@ struct __packed snp_page_state_change {
>>> struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
>>>  };
>>>  
>>> +/* GHCB GPA register */
>>> +#define GHCB_REGISTER_GPA_REQ  0x012UL
>>> +#defineGHCB_REGISTER_GPA_REQ_VAL(v)
>>> (GHCB_REGISTER_GPA_REQ | ((v) << 12))
>>> +
>>> +#define GHCB_REGISTER_GPA_RESP 0x013UL
>> Let's append "UL" to the other request numbers for consistency.
>>
>> Thx.
>>


Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB

2021-04-07 Thread Tom Lendacky
On 4/7/21 8:35 AM, Brijesh Singh wrote:
> 
> On 4/7/21 6:16 AM, Borislav Petkov wrote:
>> On Tue, Apr 06, 2021 at 10:47:18AM -0500, Brijesh Singh wrote:
>>> Before the GHCB is established the caller does not need to save and
>>> restore MSRs. The page_state_change() uses the GHCB MSR protocol and it
>>> can be called before and after the GHCB is established hence I am saving
>>> and restoring GHCB MSRs.
>> I think you need to elaborate on that, maybe with an example. What the
>> other sites using the GHCB MSR currently do is:
>>
>> 1. request by writing it
>> 2. read the response
>>
>> None of them save and restore it.
>>
>> So why here?
> 
> GHCB provides two ways to exit from the guest to the hypervisor. The MSR
> protocol and NAEs. The MSR protocol is generally used before the GHCB is
> established. After the GHCB is established the guests typically uses the
> NAEs. All of the current call sites uses the MSR protocol before the
> GHCB is established so they do not need to save and restore the GHCB.
> The GHCB is established on the first #VC -
> arch/x86/boot/compressed/sev-es.c early_setup_sev_es(). The GHCB page
> must be a shared page:
> 
> early_setup_sev_es()
> 
>   set_page_decrypted()
> 
>    sev_snp_set_page_shared()
> 
> The sev_snp_set_page_shared() called before the GHCB is established.
> While exiting from the decompression the sev_es_shutdown_ghcb() is
> called to deinit the GHCB.
> 
> sev_es_shutdown_ghcb()
> 
>   set_page_encrypted()
> 
>     sev_snp_set_page_private()
> 
> Now that sev_snp_set_private() is called after the GHCB is established.

I believe the current SEV-ES code always sets the GHCB address in the GHCB
MSR before invoking VMGEXIT, so I think you're safe either way. Worth
testing at least.
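
That is, every NAE exit re-writes the GHCB GPA into the MSR before the
VMGEXIT, roughly like this (paraphrased and trimmed from the existing
sev_es_ghcb_hv_call() flow, so treat it as a sketch rather than the exact
code):

	ghcb_set_sw_exit_code(ghcb, exit_code);
	ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
	ghcb_set_sw_exit_info_2(ghcb, exit_info_2);

	sev_es_wr_ghcb_msr(__pa(ghcb));		/* GHCB GPA back into the MSR */
	VMGEXIT();

so any value left in the MSR by an earlier MSR-protocol request gets
overwritten before the next NAE event is raised.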

Thanks,
Tom

> 
> Since both the sev_snp_set_page_{shared, private}() uses the common
> routine to request the page change hence I choose the Page State Change
> MSR protocol. In one case the page state request happen before and after
> the GHCB is established. We need to save and restore the GHCB, otherwise we
> will lose the previously established GHCB GPA.
> 
> If needed then we can avoid the save and restore. The GHCB  provides a
> page state change NAE that can be used after the GHCB is established. If
> we go with it then code may look like this:
> 
> 1. Read the GHCB MSR to determine whether the GHCB is established.
> 
> 2. If GHCB is established then use the page state change NAE
> 
> 3. If GHCB is not established then use the page state change MSR protocol.
> 
> We can eliminate the restore but we still need the rdmsr. The code for
> using the NAE page state is going to be a bit larger. Since it is not in
> the hot path so I felt we stick with MSR protocol for the page state change.
> 
> I am open to suggestions. 
> 
> -Brijesh
> 


Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB

2021-04-06 Thread Tom Lendacky
On 4/6/21 10:47 AM, Brijesh Singh wrote:
> 
> On 4/6/21 5:33 AM, Borislav Petkov wrote:
>> On Wed, Mar 24, 2021 at 11:44:17AM -0500, Brijesh Singh wrote:
>>

...

>> *Any* and *all* page state changes which fail immediately terminate a
>> guest? Why?
> 
> 
> The hypervisor uses the RMPUPDATE instruction to add the pages in the
> RMP table. If RMPUPDATE fails, then it will be communicated to the
> guest. Now its up to guest on what it wants to do. I choose to terminate
> because guest can't resolve this step on its own. It needs help from the
> hypervisor and hypervisor has bailed on it. Depending on request type,
> the next step will either fail or we go into infinite loop. Lets
> consider an example:
> 
> 1. Guest asked to add a page as a private in RMP table.
> 
> 2. Hypervisor fail to add the page in the RMP table and return an error.
> 
> 3. Guest ignored the error code and moved to the step to validate the page.
> 
> 4. The page validation instruction expects that page must be added in
> the RMP table. In our case the page was not added in the RMP table. So
> it will cause #NPF (rmp violation).
> 
> 5. On #NPF, hypervisor will try adding the page as private but it will
> fail (same as #2). This will keep repeating and guest will not make any
> progress.
> 
> I choose to return "void" from page_state_change() because caller can't
> do anything with error code. Some of the failure may have security
> implication, terminate the guest  as soon as we detect an error condition.
> 
> 
>> Then, how do we communicate this to the guest user what has happened?
>>
>> Can GHCB_SEV_ES_REASON_GENERAL_REQUEST be something special like
>>
>> GHCB_SEV_ES_REASON_PSC_FAILURE
>>
>> or so, so that users know what has happened?
> 
> 
> Current GHCB does not have special code for this. But I think Linux
> guest can define a special code which can be used to indicate the
> termination reason.
> 
> Tom,
> 
> Any other suggestion ?

The GHCB spec only defines the "0" reason code set. We could provide Linux
its own reason code set with some more specific reason codes for
failures, if that is needed.

Thanks,
Tom

> 
> 
>>


Re: [RFCv1 7/7] KVM: unmap guest memory using poisoned pages

2021-04-06 Thread Tom Lendacky
On 4/6/21 9:33 AM, Dave Hansen wrote:
> On 4/6/21 12:44 AM, David Hildenbrand wrote:
>> On 02.04.21 17:26, Kirill A. Shutemov wrote:
>>> TDX architecture aims to provide resiliency against confidentiality and
>>> integrity attacks. Towards this goal, the TDX architecture helps enforce
>>> the enabling of memory integrity for all TD-private memory.
>>>
>>> The CPU memory controller computes the integrity check value (MAC) for
>>> the data (cache line) during writes, and it stores the MAC with the
>>> memory as meta-data. A 28-bit MAC is stored in the ECC bits.
>>>
>>> Checking of memory integrity is performed during memory reads. If the
>>> integrity check fails, the CPU poisons the cache line.
>>>
>>> On a subsequent consumption (read) of the poisoned data by software,
>>> there are two possible scenarios:
>>>
>>>   - Core determines that the execution can continue and it treats
>>>     poison with exception semantics signaled as a #MCE
>>>
>>>   - Core determines execution cannot continue,and it does an unbreakable
>>>     shutdown
>>>
>>> For more details, see Chapter 14 of Intel TDX Module EAS[1]
>>>
>>> As some integrity check failures may lead to system shutdown, the host
>>> kernel must not allow any writes to TD-private memory. This requirement
>>> clashes with KVM design: KVM expects the guest memory to be mapped into
>>> host userspace (e.g. QEMU).
>>
>> So what you are saying is that if QEMU would write to such memory, it
>> could crash the kernel? What a broken design.
> 
> IMNHO, the broken design is mapping the memory to userspace in the first
> place.  Why the heck would you actually expose something with the MMU to
> a context that can't possibly meaningfully access or safely write to it?
> 
> This started with SEV.  QEMU creates normal memory mappings with the SEV
> C-bit (encryption) disabled.  The kernel plumbs those into NPT, but when
> those are instantiated, they have the C-bit set.  So, we have mismatched
> mappings.  Where does that lead?  The two mappings not only differ in
> the encryption bit, causing one side to read gibberish if the other
> writes: they're not even cache coherent.

QEMU is running on the hypervisor side, so even if the C-bit is set for
its memory mappings, it would use the hypervisor key to access the memory,
not the guest key. So it doesn't matter from a QEMU perspective whether it
creates mappings with or without the C-bit. The C-bit in the NPT is only
used if the guest is accessing the memory as shared/un-encrypted, in which
case the hypervisor key is then used.

The latest EPYC hardware provides cache coherency for encrypted /
non-encrypted accesses (X86_FEATURE_SME_COHERENT).

> 
> That's the situation *TODAY*, even ignoring TDX.
> 
> BTW, I'm pretty sure I know the answer to the "why would you expose this
> to userspace" question: it's what QEMU/KVM did alreadhy for
> non-encrypted memory, so this was the quickest way to get SEV working.
> 
> So, I don't like the #MC either.  But, this series is a step in the
> right direction for TDX *AND* SEV.

So, yes, this is a step in the right direction.

Thanks,
Tom

> 


Re: [PATCH 2/5] crypto: ccp: Reject SEV commands with mismatching command buffer

2021-04-05 Thread Tom Lendacky



On 4/5/21 11:33 AM, Sean Christopherson wrote:
> On Mon, Apr 05, 2021, Tom Lendacky wrote:
>> On 4/2/21 6:36 PM, Sean Christopherson wrote:
>>> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
>>> index 6556d220713b..4c513318f16a 100644
>>> --- a/drivers/crypto/ccp/sev-dev.c
>>> +++ b/drivers/crypto/ccp/sev-dev.c
>>> @@ -141,6 +141,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int 
>>> *psp_ret)
>>> struct sev_device *sev;
>>> unsigned int phys_lsb, phys_msb;
>>> unsigned int reg, ret = 0;
>>> +   int buf_len;
>>>  
>>> if (!psp || !psp->sev_data)
>>> return -ENODEV;
>>> @@ -150,7 +151,11 @@ static int __sev_do_cmd_locked(int cmd, void *data, 
>>> int *psp_ret)
>>>  
>>> sev = psp->sev_data;
>>>  
>>> -   if (data && WARN_ON_ONCE(is_vmalloc_addr(data)))
>>> +   buf_len = sev_cmd_buffer_len(cmd);
>>> +   if (WARN_ON_ONCE(!!data != !!buf_len))
>>
>> Seems a bit confusing to me.  Can this just be:
>>
>>  if (WARN_ON_ONCE(data && !buf_len))
> 
> Or as Christophe pointed out, "!data != !buf_len".
> 
>> Or is this also trying to catch the case where buf_len is non-zero but
>> data is NULL?
> 
> Ya.  It's not necessary to detect "buf_len && !data", but it doesn't incur
> additional cost.  Is there a reason _not_ to disallow that?

Nope, no reason. I was just trying to process all the not signs :)
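
For anyone else untangling the negations: !!x collapses each side to 0 or
1, so the check fires when exactly one of the two is "present", i.e. it is
equivalent to

	/* reject a pointer with no known length, and a length with no pointer */
	if ((data && !buf_len) || (!data && buf_len))
		return -EINVAL;

which covers both of the mismatch cases Sean describes.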

Thanks,
Tom

> 


Re: [PATCH 2/5] crypto: ccp: Reject SEV commands with mismatching command buffer

2021-04-05 Thread Tom Lendacky
On 4/2/21 6:36 PM, Sean Christopherson wrote:
> WARN on and reject SEV commands that provide a valid data pointer, but do
> not have a known, non-zero length.  And conversely, reject commands that
> take a command buffer but none is provided.
> 
> Aside from sanity checking intput, disallowing a non-null pointer without

s/intput/input/

> a non-zero size will allow a future patch to cleanly handle vmalloc'd
> data by copying the data to an internal __pa() friendly buffer.
> 
> Note, this also effectively prevents callers from using commands that
> have a non-zero length and are not known to the kernel.  This is not an
> explicit goal, but arguably the side effect is a good thing from the
> kernel's perspective.
> 
> Cc: Brijesh Singh 
> Cc: Borislav Petkov 
> Cc: Tom Lendacky 
> Signed-off-by: Sean Christopherson 
> ---
>  drivers/crypto/ccp/sev-dev.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 6556d220713b..4c513318f16a 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -141,6 +141,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int 
> *psp_ret)
>   struct sev_device *sev;
>   unsigned int phys_lsb, phys_msb;
>   unsigned int reg, ret = 0;
> + int buf_len;
>  
>   if (!psp || !psp->sev_data)
>   return -ENODEV;
> @@ -150,7 +151,11 @@ static int __sev_do_cmd_locked(int cmd, void *data, int 
> *psp_ret)
>  
>   sev = psp->sev_data;
>  
> - if (data && WARN_ON_ONCE(is_vmalloc_addr(data)))
> + buf_len = sev_cmd_buffer_len(cmd);
> + if (WARN_ON_ONCE(!!data != !!buf_len))

Seems a bit confusing to me.  Can this just be:

if (WARN_ON_ONCE(data && !buf_len))

Or is this also trying to catch the case where buf_len is non-zero but
data is NULL?

Thanks,
Tom

> + return -EINVAL;
> +
> + if (WARN_ON_ONCE(data && is_vmalloc_addr(data)))
>   return -EINVAL;
>  
>   /* Get the physical address of the command buffer */
> @@ -161,7 +166,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int 
> *psp_ret)
>   cmd, phys_msb, phys_lsb, psp_timeout);
>  
>   print_hex_dump_debug("(in):  ", DUMP_PREFIX_OFFSET, 16, 2, data,
> -  sev_cmd_buffer_len(cmd), false);
> +  buf_len, false);
>  
>   iowrite32(phys_lsb, sev->io_regs + sev->vdata->cmdbuff_addr_lo_reg);
>   iowrite32(phys_msb, sev->io_regs + sev->vdata->cmdbuff_addr_hi_reg);
> @@ -197,7 +202,7 @@ static int __sev_do_cmd_locked(int cmd, void *data, int 
> *psp_ret)
>   }
>  
>   print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
> -  sev_cmd_buffer_len(cmd), false);
> +  buf_len, false);
>  
>   return ret;
>  }
> 


Re: [PATCH] X86: __set_clr_pte_enc() miscalculates physical address

2021-03-22 Thread Tom Lendacky
On 3/18/21 3:26 PM, Isaku Yamahata wrote:
> __set_clr_pte_enc() miscalculates the physical address to operate on.
> pfn is in units of PG_LEVEL_4K, not PG_LEVEL_{2M, 1G}.
> The shift size to get the physical address should be PAGE_SHIFT,
> not page_level_shift().
> 
> Fixes: dfaaec9033b8 ("x86: Add support for changing memory encryption 
> attribute in early boot")
> Reviewed-by: Kirill A. Shutemov 
> Signed-off-by: Isaku Yamahata 

Reviewed-by: Tom Lendacky 
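
A quick worked example of the bug, with made-up numbers: for a 2M-level
kpte whose pfn (a 4K-unit pfn) is 0x12345, the two shifts give

	pfn << PAGE_SHIFT              = 0x12345 << 12 = 0x0012345000
	pfn << page_level_shift(level) = 0x12345 << 21 = 0x2468a00000

so the old code ended up operating on a completely unrelated physical
range.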

> ---
>  arch/x86/mm/mem_encrypt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 4b01f7dbaf30..ae78cef79980 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -262,7 +262,7 @@ static void __init __set_clr_pte_enc(pte_t *kpte, int 
> level, bool enc)
>   if (pgprot_val(old_prot) == pgprot_val(new_prot))
>   return;
>  
> - pa = pfn << page_level_shift(level);
> + pa = pfn << PAGE_SHIFT;
>   size = page_level_size(level);
>  
>   /*
> 


Re: [PATCH v3 2/8] x86/sev: Do not require Hypervisor CPUID bit for SEV guests

2021-03-17 Thread Tom Lendacky
On 3/12/21 6:38 AM, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> A malicious hypervisor could disable the CPUID intercept for an SEV or
> SEV-ES guest and trick it into the no-SEV boot path, where it could
> potentially reveal secrets. This is not an issue for SEV-SNP guests,
> as the CPUID intercept can't be disabled for those.
> 
> Remove the Hypervisor CPUID bit check from the SEV detection code to
> protect against this kind of attack and add a Hypervisor bit equals
> zero check to the SME detection path to prevent non-SEV guests from
> trying to enable SME.
> 
> This handles the following cases:
> 
>   1) SEV(-ES) guest where CPUID intercept is disabled. The guest
>  will still see leaf 0x8000001f and the SEV bit. It can
>  retrieve the C-bit and boot normally.
> 
>   2) Non-SEV guests with intercepted CPUID will check SEV_STATUS
>  MSR and find it 0 and will try to enable SME. This will
>  fail when the guest finds MSR_K8_SYSCFG to be zero, as it
>  is emulated by KVM. But we can't rely on that, as there
>  might be other hypervisors which return this MSR with bit
>  23 set. The Hypervisor bit check will prevent that the
>  guest tries to enable SME in this case.
> 
>   3) Non-SEV guests on SEV capable hosts with CPUID intercept
>  disabled (by a malicious hypervisor) will try to boot into
>  the SME path. This will fail, but it is also not considered
>  a problem because non-encrypted guests have no protection
>  against the hypervisor anyway.
> 
> Signed-off-by: Joerg Roedel 

Acked-by: Tom Lendacky 

> ---
>  arch/x86/boot/compressed/mem_encrypt.S |  6 -
>  arch/x86/kernel/sev-es-shared.c|  6 +
>  arch/x86/mm/mem_encrypt_identity.c | 35 ++
>  3 files changed, 20 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
> b/arch/x86/boot/compressed/mem_encrypt.S
> index aa561795efd1..a6dea4e8a082 100644
> --- a/arch/x86/boot/compressed/mem_encrypt.S
> +++ b/arch/x86/boot/compressed/mem_encrypt.S
> @@ -23,12 +23,6 @@ SYM_FUNC_START(get_sev_encryption_bit)
>   push%ecx
>   push%edx
>  
> - /* Check if running under a hypervisor */
> - movl$1, %eax
> - cpuid
> - bt  $31, %ecx   /* Check the hypervisor bit */
> - jnc .Lno_sev
> -
>   movl$0x80000000, %eax   /* CPUID to check the highest leaf */
>   cpuid
>   cmpl$0x8000001f, %eax   /* See if 0x8000001f is available */
> diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
> index cdc04d091242..387b71669818 100644
> --- a/arch/x86/kernel/sev-es-shared.c
> +++ b/arch/x86/kernel/sev-es-shared.c
> @@ -186,7 +186,6 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned 
> long exit_code)
>* make it accessible to the hypervisor.
>*
>* In particular, check for:
> -  *  - Hypervisor CPUID bit
>*  - Availability of CPUID leaf 0x801f
>*  - SEV CPUID bit.
>*
> @@ -194,10 +193,7 @@ void __init do_vc_no_ghcb(struct pt_regs *regs, unsigned 
> long exit_code)
>* can't be checked here.
>*/
>  
> - if ((fn == 1 && !(regs->cx & BIT(31))))
> - /* Hypervisor bit */
> - goto fail;
> - else if (fn == 0x80000000 && (regs->ax < 0x8000001f))
> + if (fn == 0x80000000 && (regs->ax < 0x8000001f))
>   /* SEV leaf check */
>   goto fail;
>   else if ((fn == 0x8000001f && !(regs->ax & BIT(1))))
> diff --git a/arch/x86/mm/mem_encrypt_identity.c 
> b/arch/x86/mm/mem_encrypt_identity.c
> index 6c5eb6f3f14f..a19374d26101 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -503,14 +503,10 @@ void __init sme_enable(struct boot_params *bp)
>  
>  #define AMD_SME_BIT  BIT(0)
>  #define AMD_SEV_BIT  BIT(1)
> - /*
> -  * Set the feature mask (SME or SEV) based on whether we are
> -  * running under a hypervisor.
> -  */
> - eax = 1;
> - ecx = 0;
> - native_cpuid(&eax, &ebx, &ecx, &edx);
> - feature_mask = (ecx & BIT(31)) ? AMD_SEV_BIT : AMD_SME_BIT;
> +
> + /* Check the SEV MSR whether SEV or SME is enabled */
> + sev_status   = __rdmsr(MSR_AMD64_SEV);
> + feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : 
> AMD_SME_BIT;
>  
>   /*
>* Check for the SME/SEV feature:
> @@ -530,19 +526,26 @@ void __init sme_enable

Re: [Patch v3 1/2] cgroup: sev: Add misc cgroup controller

2021-03-12 Thread Tom Lendacky

On 3/12/21 2:51 PM, Sean Christopherson wrote:
> On Fri, Mar 12, 2021, Vipin Sharma wrote:
>> On Thu, Mar 11, 2021 at 07:59:03PM +0100, Michal Koutný wrote:
>>>> +#ifndef CONFIG_KVM_AMD_SEV
>>>> +/*
>>>> + * When this config is not defined, SEV feature is not supported and APIs in
>>>> + * this file are not used but this file still gets compiled into the KVM AMD
>>>> + * module.
>>>> + *
>>>> + * We will not have MISC_CG_RES_SEV and MISC_CG_RES_SEV_ES entries in the enum
>>>> + * misc_res_type {} defined in linux/misc_cgroup.h.
>>>
>>> BTW, was there any progress on conditioning sev.c build on
>>> CONFIG_KVM_AMD_SEV? (So that the defines workaround isn't needed.)
>>
>> Tom, Brijesh,
>> Is this something you guys thought about or have some plans to do in the
>> future? Basically to not include sev.c in compilation if
>> CONFIG_KVM_AMD_SEV is disabled.
>
> It's crossed my mind, but the number of stubs needed made me back off.  I'm
> certainly not opposed to the idea, it's just not a trivial change.


Right, I looked at it when I was doing the SEV-ES work and came to the 
same conclusion.
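
For anyone curious what the split would involve, a minimal sketch follows.
The stub list is nowhere near complete, which is exactly the non-trivial
part; the helper names are just examples of the kind of sev.c entry points
svm.c calls:

	# arch/x86/kvm/Makefile
	kvm-amd-y			+= svm/svm.o svm/nested.o svm/avic.o svm/pmu.o
	kvm-amd-$(CONFIG_KVM_AMD_SEV)	+= svm/sev.o

	/* arch/x86/kvm/svm/svm.h */
	#ifdef CONFIG_KVM_AMD_SEV
	int svm_mem_enc_op(struct kvm *kvm, void __user *argp);
	void sev_free_vcpu(struct kvm_vcpu *vcpu);
	/* ... every other sev.c entry point ... */
	#else
	static inline int svm_mem_enc_op(struct kvm *kvm, void __user *argp) { return -ENOTTY; }
	static inline void sev_free_vcpu(struct kvm_vcpu *vcpu) {}
	#endif

Every sev_*() call site in svm.c needs a stub along those lines, hence the
long tail of changes.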


Thanks,
Tom





Re: [PATCH 2/2] KVM: x86/mmu: Exclude the MMU_PRESENT bit from MMIO SPTE's generation

2021-03-09 Thread Tom Lendacky
On 3/8/21 8:19 PM, Sean Christopherson wrote:
> Drop bit 11, used for the MMU_PRESENT flag, from the set of bits used to
> store the generation number in MMIO SPTEs.  MMIO SPTEs with bit 11 set,
> which occurs when userspace creates 128+ memslots in an address space,
> get false positives for is_shadow_present_spte(), which lead to a variety
> of fireworks, crashes KVM, and likely hangs the host kernel.
> 
> Fixes: b14e28f37e9b ("KVM: x86/mmu: Use a dedicated bit to track 
> shadow/MMU-present SPTEs")
> Reported-by: Tom Lendacky 

Fixes the issue for me. Thanks, Sean.

Tested-by: Tom Lendacky 

> Reported-by: Paolo Bonzini 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/mmu/spte.h | 12 +++-
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
> index b53036d9ddf3..bca0ba11cccf 100644
> --- a/arch/x86/kvm/mmu/spte.h
> +++ b/arch/x86/kvm/mmu/spte.h
> @@ -101,11 +101,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & 
> SHADOW_ACC_TRACK_SAVED_MASK));
>  #undef SHADOW_ACC_TRACK_SAVED_MASK
>  
>  /*
> - * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
> + * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
>   * the memslots generation and is derived as follows:
>   *
> - * Bits 0-8 of the MMIO generation are propagated to spte bits 3-11
> - * Bits 9-19 of the MMIO generation are propagated to spte bits 52-62
> + * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
> + * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
>   *
>   * The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included 
> in
>   * the MMIO generation number, as doing so would require stealing a bit from
> @@ -116,7 +116,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & 
> SHADOW_ACC_TRACK_SAVED_MASK));
>   */
>  
>  #define MMIO_SPTE_GEN_LOW_START  3
> -#define MMIO_SPTE_GEN_LOW_END11
> +#define MMIO_SPTE_GEN_LOW_END10
>  
>  #define MMIO_SPTE_GEN_HIGH_START 52
>  #define MMIO_SPTE_GEN_HIGH_END   62
> @@ -125,12 +125,14 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & 
> SHADOW_ACC_TRACK_SAVED_MASK));
>   MMIO_SPTE_GEN_LOW_START)
>  #define MMIO_SPTE_GEN_HIGH_MASK  
> GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \
>   MMIO_SPTE_GEN_HIGH_START)
> +static_assert(!(SPTE_MMU_PRESENT_MASK &
> + (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));
>  
>  #define MMIO_SPTE_GEN_LOW_BITS   (MMIO_SPTE_GEN_LOW_END - 
> MMIO_SPTE_GEN_LOW_START + 1)
>  #define MMIO_SPTE_GEN_HIGH_BITS  (MMIO_SPTE_GEN_HIGH_END - 
> MMIO_SPTE_GEN_HIGH_START + 1)
>  
>  /* remember to adjust the comment above as well if you change these */
> -static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
> +static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);
>  
>  #define MMIO_SPTE_GEN_LOW_SHIFT  (MMIO_SPTE_GEN_LOW_START - 0)
>  #define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - 
> MMIO_SPTE_GEN_LOW_BITS)
> 


Re: [PATCH 0/3] PSP TEE driver update and bug fixes

2021-03-09 Thread Tom Lendacky
On 3/9/21 2:11 AM, Rijo Thomas wrote:
> The first patch helps to improve the response time by reducing the
> polling time of the tee command status variable.
> 
> Second patch is a bug fix to handle multi-threaded use-case.
> During testing, race condition was seen due to missing synchronisation
> in writes to the TEE ring buffer. This patch helps to resolve that.
> 
> Third patch is to update the copyright year for the tee driver files.
> 

Just something to think about, and not as part of this patch series, but
consider submitting a patch that adds you as maintainer of the TEE
portion of the driver (see how the SEV portion is handled).
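
The SEV portion is covered by its own MAINTAINERS stanza scoped to the
sev* files; a TEE entry could mirror it along these lines (illustrative
only, not an actual submission):

	AMD CRYPTOGRAPHIC COPROCESSOR (CCP) DRIVER - TEE SUPPORT
	M:	Rijo Thomas <...>
	L:	linux-crypto@vger.kernel.org
	S:	Supported
	F:	drivers/crypto/ccp/tee-*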

Thanks,
Tom

> Rijo Thomas (3):
>   crypto: ccp - reduce tee command status polling interval from 5ms to
> 1ms
>   crypto: ccp - fix command queuing to TEE ring buffer
>   crypto: ccp - update copyright year for tee
> 
>  drivers/crypto/ccp/tee-dev.c | 57 
>  drivers/crypto/ccp/tee-dev.h | 20 +++--
>  2 files changed, 57 insertions(+), 20 deletions(-)
> 


Re: [PATCH 3/3] crypto: ccp - update copyright year for tee

2021-03-09 Thread Tom Lendacky
On 3/9/21 2:11 AM, Rijo Thomas wrote:
> Update the copyright year for PSP TEE driver files.
> 
> Signed-off-by: Rijo Thomas 

The copyright updates really should occur as part of the changes in the
other patches vs a separate patch.

Thanks,
Tom

> ---
>  drivers/crypto/ccp/tee-dev.c | 2 +-
>  drivers/crypto/ccp/tee-dev.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/crypto/ccp/tee-dev.c b/drivers/crypto/ccp/tee-dev.c
> index 1aa264815028..8cade4775115 100644
> --- a/drivers/crypto/ccp/tee-dev.c
> +++ b/drivers/crypto/ccp/tee-dev.c
> @@ -5,7 +5,7 @@
>   * Author: Rijo Thomas 
>   * Author: Devaraj Rangasamy 
>   *
> - * Copyright 2019 Advanced Micro Devices, Inc.
> + * Copyright (C) 2019,2021 Advanced Micro Devices, Inc.
>   */
>  
>  #include 
> diff --git a/drivers/crypto/ccp/tee-dev.h b/drivers/crypto/ccp/tee-dev.h
> index dbeb7d289acb..49d26158b71e 100644
> --- a/drivers/crypto/ccp/tee-dev.h
> +++ b/drivers/crypto/ccp/tee-dev.h
> @@ -1,6 +1,6 @@
>  /* SPDX-License-Identifier: MIT */
>  /*
> - * Copyright 2019 Advanced Micro Devices, Inc.
> + * Copyright (C) 2019,2021 Advanced Micro Devices, Inc.
>   *
>   * Author: Rijo Thomas 
>   * Author: Devaraj Rangasamy 
> 


[tip: x86/seves] x86/virtio: Have SEV guests enforce restricted virtio memory access

2021-03-08 Thread tip-bot2 for Tom Lendacky
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 229164175ff0c61ff581e6bf37fbfcb608b6e9bb
Gitweb:
https://git.kernel.org/tip/229164175ff0c61ff581e6bf37fbfcb608b6e9bb
Author:Tom Lendacky 
AuthorDate:Thu, 04 Mar 2021 16:40:11 -06:00
Committer: Borislav Petkov 
CommitterDate: Mon, 08 Mar 2021 20:41:33 +01:00

x86/virtio: Have SEV guests enforce restricted virtio memory access

An SEV guest requires that virtio devices use the DMA API to allow the
hypervisor to successfully access guest memory as needed.

The VIRTIO_F_VERSION_1 and VIRTIO_F_ACCESS_PLATFORM features tell virtio
to use the DMA API. Add arch_has_restricted_virtio_memory_access() for
x86, to fail the device probe if these features have not been set for the
device when running as an SEV guest.

 [ bp: Fix -Wmissing-prototypes warning
   Reported-by: kernel test robot  ]

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
Link: 
https://lkml.kernel.org/r/b46e0211f77ca1831f11132f969d470a6ffc9267.1614897610.git.thomas.lenda...@amd.com
---
 arch/x86/Kconfig  | 1 +
 arch/x86/mm/mem_encrypt.c | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879..e80e726 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1518,6 +1518,7 @@ config AMD_MEM_ENCRYPT
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
select INSTRUCTION_DECODER
+   select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
help
  Say yes to enable support for the encryption of system memory.
  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 4b01f7d..f3eb53f 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -484,3 +485,8 @@ void __init mem_encrypt_init(void)
print_mem_encrypt_feature_info();
 }
 
+int arch_has_restricted_virtio_memory_access(void)
+{
+   return sev_active();
+}
+EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);


Re: [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs

2021-03-08 Thread Tom Lendacky
On 2/25/21 2:47 PM, Sean Christopherson wrote:
> Introduce MMU_PRESENT to explicitly track which SPTEs are "present" from
> the MMU's perspective.  Checking for shadow-present SPTEs is a very
> common operation for the MMU, particularly in hot paths such as page
> faults.  With the addition of "removed" SPTEs for the TDP MMU,
> identifying shadow-present SPTEs is quite costly especially since it
> requires checking multiple 64-bit values.
> 
> On 64-bit KVM, this reduces the footprint of kvm.ko's .text by ~2k bytes.
> On 32-bit KVM, this increases the footprint by ~200 bytes, but only
> because gcc now inlines several more MMU helpers, e.g. drop_parent_pte().
> 
> Signed-off-by: Sean Christopherson 
> ---
>   arch/x86/kvm/mmu/spte.c |  8 
>   arch/x86/kvm/mmu/spte.h | 11 ++-
>   2 files changed, 14 insertions(+), 5 deletions(-)

I'm trying to run a guest on my Rome system using the queue branch, but
I'm encountering an error that I bisected to this commit. In the guest
(during OVMF boot) I see:

error: kvm run failed Invalid argument
RAX= RBX=ffc12792 RCX=7f58401a 
RDX=7faaf808
RSI=0010 RDI=ffc12792 RBP=ffc12792 
RSP=7faaf740
R8 =0792 R9 =7faaf808 R10=ffc12793 
R11=03f8
R12=0010 R13= R14=7faaf808 
R15=0012
RIP=7f6e9a90 RFL=0246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0030   00c09300 DPL=0 DS   [-WA]
CS =0038   00a09b00 DPL=0 CS64 [-RA]
SS =0030   00c09300 DPL=0 DS   [-WA]
DS =0030   00c09300 DPL=0 DS   [-WA]
FS =0030   00c09300 DPL=0 DS   [-WA]
GS =0030   00c09300 DPL=0 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =   8b00 DPL=0 TSS64-busy
GDT= 7f5ee698 0047
IDT= 7f186018 0fff
CR0=80010033 CR2= CR3=7f801000 CR4=0668
DR0= DR1= DR2= 
DR3= 
DR6=0ff0 DR7=0400
EFER=0d00
Code=22 00 00 e8 c0 e6 ff ff 48 83 c4 20 45 84 ed 74 07 fb eb 04 <44> 88 65 00 
58 5b 5d 41 5c 41 5d c3 55 48 0f af 3d 1b 37 00 00 be 20 00 00 00 48 03 3d 17

On the hypervisor, I see the following:

[   55.886136] get_mmio_spte: detect reserved bits on spte, addr 0xffc12792, 
dump hierarchy:
[   55.895284] -- spte 0x1344a0827 level 4.
[   55.900059] -- spte 0x134499827 level 3.
[   55.904877] -- spte 0x165bf0827 level 2.
[   55.909651] -- spte 0xff800ffc12817 level 1.

When I kill the guest, I get a kernel panic:

[   95.539683] __pte_list_remove: 40567a6a 0->BUG
[   95.545481] kernel BUG at arch/x86/kvm/mmu/mmu.c:896!
[   95.551133] invalid opcode:  [#1] SMP NOPTI
[   95.556192] CPU: 142 PID: 5054 Comm: qemu-system-x86 Tainted: GW 
5.11.0-rc4-sos-sev-es #1
[   95.566872] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS 
REX1006G 01/25/2020
[   95.575900] RIP: 0010:__pte_list_remove.cold+0x2e/0x48 [kvm]
[   95.582312] Code: c7 c6 40 6f f3 c0 48 c7 c7 aa da f3 c0 e8 79 3d a7 cd 0f 
0b 48 89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 87 da f3 c0 e8 61 3d a7 cd <0f> 0b 48 
89 fa 48 c7 c6 40 6f f3 c0 48 c7 c7 98 da f3 c0 e8 49 3d
[   95.603271] RSP: 0018:c900143e7c78 EFLAGS: 00010246
[   95.609093] RAX: 002a RBX:  RCX: 
[   95.617058] RDX:  RSI: 88900e598950 RDI: 88900e598950
[   95.625019] RBP: 888165bf0090 R08: 88900e598950 R09: c900143e7a98
[   95.632980] R10: 0001 R11: 0001 R12: c9000ff29000
[   95.640944] R13: c900143e7d18 R14: 0098 R15: 
[   95.648912] FS:  () GS:88900e58() 
knlGS:
[   95.657951] CS:  0010 DS:  ES:  CR0: 80050033
[   95.664361] CR2: 7fb328d20c80 CR3: 0001476d2000 CR4: 00350ee0
[   95.672326] Call Trace:
[   95.675065]  mmu_page_zap_pte+0xf9/0x130 [kvm]
[   95.680103]  __kvm_mmu_prepare_zap_page+0x6d/0x380 [kvm]
[   95.686088]  kvm_mmu_zap_all+0x5e/0xe0 [kvm]
[   95.690911]  kvm_mmu_notifier_release+0x2b/0x60 [kvm]
[   95.696614]  __mmu_notifier_release+0x71/0x1e0
[   95.701585]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[   95.707512]  ? __khugepaged_exit+0x111/0x160
[   95.712289]  exit_mmap+0x15b/0x1f0
[   95.716092]  ? __khugepaged_exit+0x111/0x160
[   95.720857]  ? kmem_cache_free+0x210/0x3f0
[   95.725428]  ? kmem_cache_free+0x387/0x3f0
[   95.729998]  mmput+0x56/0x130
[   95.733312]  do_exit+0x341/0xb50
[   95.736923]  do_group_exit+0x3a/0xa0
[   95.740925]  __x64_sys_exit_group+0x14/0x20
[   95.745600]  do_syscall_64+0x33/0x40
[   95.749601]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   95.755241] RIP: 0033:0x7fb

[tip: x86/seves] x86/virtio: Have SEV guests enforce restricted virtio memory access

2021-03-08 Thread tip-bot2 for Tom Lendacky
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 9da54be651f8a856d9e6c14183d0df948e222103
Gitweb:
https://git.kernel.org/tip/9da54be651f8a856d9e6c14183d0df948e222103
Author:Tom Lendacky 
AuthorDate:Thu, 04 Mar 2021 16:40:11 -06:00
Committer: Borislav Petkov 
CommitterDate: Mon, 08 Mar 2021 12:54:43 +01:00

x86/virtio: Have SEV guests enforce restricted virtio memory access

An SEV guest requires that virtio devices use the DMA API to allow the
hypervisor to successfully access guest memory as needed.

The VIRTIO_F_VERSION_1 and VIRTIO_F_ACCESS_PLATFORM features tell virtio
to use the DMA API. Add arch_has_restricted_virtio_memory_access() for
x86, to fail the device probe if these features have not been set for the
device when running as an SEV guest.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
Link: 
https://lkml.kernel.org/r/b46e0211f77ca1831f11132f969d470a6ffc9267.1614897610.git.thomas.lenda...@amd.com
---
 arch/x86/Kconfig  | 1 +
 arch/x86/mm/mem_encrypt.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879..e80e726 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1518,6 +1518,7 @@ config AMD_MEM_ENCRYPT
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
select INSTRUCTION_DECODER
+   select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
help
  Say yes to enable support for the encryption of system memory.
  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 4b01f7d..667283f 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -484,3 +484,8 @@ void __init mem_encrypt_init(void)
print_mem_encrypt_feature_info();
 }
 
+int arch_has_restricted_virtio_memory_access(void)
+{
+   return sev_active();
+}
+EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);


[PATCH] x86/virtio: Have SEV guests enforce restricted virtio memory access

2021-03-04 Thread Tom Lendacky
From: Tom Lendacky 

An SEV guest requires that virtio devices use the DMA API to allow the
hypervisor to successfully access guest memory as needed.

The VIRTIO_F_VERSION_1 and VIRTIO_F_ACCESS_PLATFORM features tell virtio
to use the DMA API. Add arch_has_restricted_virtio_memory_access() for
x86, to fail the device probe if these features have not been set for the
device when running as an SEV guest.

Cc: Brijesh Singh 
Signed-off-by: Tom Lendacky 
---
 arch/x86/Kconfig  | 1 +
 arch/x86/mm/mem_encrypt.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2792879d398e..e80e7268d2c6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1518,6 +1518,7 @@ config AMD_MEM_ENCRYPT
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
select INSTRUCTION_DECODER
+   select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
help
  Say yes to enable support for the encryption of system memory.
  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 4b01f7dbaf30..667283f3dcfa 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -484,3 +484,8 @@ void __init mem_encrypt_init(void)
print_mem_encrypt_feature_info();
 }
 
+int arch_has_restricted_virtio_memory_access(void)
+{
+   return sev_active();
+}
+EXPORT_SYMBOL_GPL(arch_has_restricted_virtio_memory_access);
-- 
2.30.0



[PATCH] crypto: ccp - Don't initialize SEV support without the SEV feature

2021-03-03 Thread Tom Lendacky
From: Tom Lendacky 

If SEV has been disabled (e.g. through BIOS), the driver probe will still
issue SEV firmware commands. The SEV INIT firmware command will return an
error in this situation, but the error code is a general error code that
doesn't highlight the exact reason.

Add a check for X86_FEATURE_SEV in sev_dev_init() and emit a meaningful
message and skip attempting to initialize the SEV firmware if the feature
is not enabled. Since building the SEV code is dependent on X86_64, adding
the check won't cause any build problems.

Cc: John Allen 
Cc: Brijesh Singh 
Signed-off-by: Tom Lendacky 
---
 drivers/crypto/ccp/sev-dev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 476113e12489..b9fc8d7aca73 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -971,6 +972,11 @@ int sev_dev_init(struct psp_device *psp)
struct sev_device *sev;
int ret = -ENOMEM;
 
+   if (!boot_cpu_has(X86_FEATURE_SEV)) {
+   dev_info_once(dev, "SEV: memory encryption not enabled by BIOS\n");
+   return 0;
+   }
+
sev = devm_kzalloc(dev, sizeof(*sev), GFP_KERNEL);
if (!sev)
goto e_err;
-- 
2.30.0



Re: [RFC] KVM: x86: Support KVM VMs sharing SEV context

2021-02-25 Thread Tom Lendacky

On 2/24/21 9:44 PM, Steve Rutherford wrote:

On Wed, Feb 24, 2021 at 1:00 AM Nathan Tempelman  wrote:


@@ -1186,6 +1195,10 @@ int svm_register_enc_region(struct kvm *kvm,
 if (!sev_guest(kvm))
 return -ENOTTY;

+   /* If kvm is mirroring encryption context it isn't responsible for it */
+   if (is_mirroring_enc_context(kvm))
+   return -ENOTTY;
+


Is this necessary? Same for unregister. When we looked at
sev_pin_memory, I believe we concluded that double pinning was safe.


 if (range->addr > ULONG_MAX || range->size > ULONG_MAX)
 return -EINVAL;

@@ -1252,6 +1265,10 @@ int svm_unregister_enc_region(struct kvm *kvm,
 struct enc_region *region;
 int ret;

+   /* If kvm is mirroring encryption context it isn't responsible for it */
+   if (is_mirroring_enc_context(kvm))
+   return -ENOTTY;
+
 mutex_lock(&kvm->lock);

 if (!sev_guest(kvm)) {
@@ -1282,6 +1299,65 @@ int svm_unregister_enc_region(struct kvm *kvm,
 return ret;
  }

+int svm_vm_copy_asid_to(struct kvm *kvm, unsigned int mirror_kvm_fd)
+{
+   struct file *mirror_kvm_file;
+   struct kvm *mirror_kvm;
+   struct kvm_sev_info *mirror_kvm_sev;
+   unsigned int asid;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;


You definitely don't want this: this is the function that turns the vm
into an SEV guest (marks SEV as active).


The sev_guest() function does not set sev->active, it only checks it. The 
sev_guest_init() function is where sev->active is set.
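
For reference, sev_guest() is just a read of that flag. Roughly (quoting
from memory, so treat this as a sketch rather than the exact code):

static bool sev_guest(struct kvm *kvm)
{
#ifdef CONFIG_KVM_AMD_SEV
	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;

	/* Read-only check; sev->active is only set in sev_guest_init() */
	return sev->active;
#else
	return false;
#endif
}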




(Not an issue with this patch, but a broader issue) I believe
sev_guest lacks the necessary acquire/release barriers on sev->active,


The svm_mem_enc_op() takes the kvm lock and that is the only way into the 
sev_guest_init() function where sev->active is set.


Thanks,
Tom


since it's called without the kvm lock. I mean, it's x86, so the only
one that's going to hose you is the compiler for this type of access.
There should be an smp_rmb() after the access in sev_guest and an
smp_wmb() before the access in SEV_GUEST_INIT and here.


+
+   mutex_lock(&kvm->lock);
+
+   /* Mirrors of mirrors should work, but let's not get silly */
+   if (is_mirroring_enc_context(kvm)) {
+   ret = -ENOTTY;
+   goto failed;
+   }
+
+   mirror_kvm_file = fget(mirror_kvm_fd);
+   if (!kvm_is_kvm(mirror_kvm_file)) {
+   ret = -EBADF;
+   goto failed;
+   }
+
+   mirror_kvm = mirror_kvm_file->private_data;
+
+   if (mirror_kvm == kvm || is_mirroring_enc_context(mirror_kvm)) {

Just check if the source is an sev_guest and that the destination is
not an sev_guest.

I reviewed earlier incarnations of this, and think the high-level idea
is sound. I'd like to see kvm-selftests for this patch, and plan on
collaborating with AMD to help make those happen.



Re: [PATCH 6/7] x86/boot/compressed/64: Check SEV encryption in 32-bit boot-path

2021-02-10 Thread Tom Lendacky

On 2/10/21 10:47 AM, Dave Hansen wrote:

On 2/10/21 2:21 AM, Joerg Roedel wrote:

+   /* Store to memory and keep it in the registers */
+   movl%eax, rva(sev_check_data)(%ebp)
+   movl%ebx, rva(sev_check_data+4)(%ebp)
+
+   /* Enable paging to see if encryption is active */
+   movl%cr0, %edx  /* Backup %cr0 in %edx */
+   movl$(X86_CR0_PG | X86_CR0_PE), %ecx /* Enable Paging and Protected mode */
+   movl%ecx, %cr0
+
+   cmpl%eax, rva(sev_check_data)(%ebp)
+   jne 3f
+   cmpl%ebx, rva(sev_check_data+4)(%ebp)
+   jne 3f


Also, I know that turning paging on is a *BIG* barrier.  But, I didn't
think it has any effect on the caches.

I would expect that the underlying physical address of 'sev_check_data'
would change when paging gets enabled because paging sets the C bit.
So, how does the write of 'sev_check_data' get out of the caches and
into memory where it can be read back with the new physical address?

I think there's some bit of the SEV architecture that I'm missing.


Non-paging memory accesses are always considered private (APM Volume 2, 
15.34.4) and thus are cached with the C bit set.


Thanks,
Tom





Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-02-04 Thread Tom Lendacky

On 2/3/21 6:39 PM, Ashish Kalra wrote:

From: Brijesh Singh 

The ioctl is used to retrieve a guest's shared pages list.



...

  
+int svm_get_shared_pages_list(struct kvm *kvm,

+ struct kvm_shared_pages_list *list)
+{
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+   struct shared_region_array_entry *array;
+   struct shared_region *pos;
+   int ret, nents = 0;
+   unsigned long sz;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (!list->size)
+   return -EINVAL;
+
+   if (!sev->shared_pages_list_count)
+   return put_user(0, list->pnents);
+
+   sz = sev->shared_pages_list_count * sizeof(struct shared_region_array_entry);
+   if (sz > list->size)
+   return -E2BIG;
+
+   array = kmalloc(sz, GFP_KERNEL);
+   if (!array)
+   return -ENOMEM;
+
+   mutex_lock(&kvm->lock);


I think this lock needs to be taken before the memory size is calculated. 
If the list is expanded after obtaining the size and before taking the 
lock, you will run off the end of the array.
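
Something along these lines, i.e. take the lock first and then compute
the size and build the array under it (a sketch only, error paths
abbreviated):

	mutex_lock(&kvm->lock);

	sz = sev->shared_pages_list_count * sizeof(struct shared_region_array_entry);
	if (sz > list->size) {
		mutex_unlock(&kvm->lock);
		return -E2BIG;
	}

	array = kmalloc(sz, GFP_KERNEL);
	if (!array) {
		mutex_unlock(&kvm->lock);
		return -ENOMEM;
	}

	list_for_each_entry(pos, &sev->shared_pages_list, list) {
		array[nents].gfn_start = pos->gfn_start;
		array[nents++].gfn_end = pos->gfn_end;
	}
	mutex_unlock(&kvm->lock);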


Thanks,
Tom


+   list_for_each_entry(pos, &sev->shared_pages_list, list) {
+   array[nents].gfn_start = pos->gfn_start;
+   array[nents++].gfn_end = pos->gfn_end;
+   }
+   mutex_unlock(&kvm->lock);
+
+   ret = -EFAULT;
+   if (copy_to_user(list->buffer, array, sz))
+   goto out;
+   if (put_user(nents, list->pnents))
+   goto out;
+   ret = 0;
+out:
+   kfree(array);
+   return ret;
+}
+


Re: [PATCH v10 08/16] KVM: X86: Introduce KVM_HC_PAGE_ENC_STATUS hypercall

2021-02-04 Thread Tom Lendacky

On 2/3/21 6:38 PM, Ashish Kalra wrote:

From: Brijesh Singh 

This hypercall is used by the SEV guest to notify a change in the page
encryption status to the hypervisor. The hypercall should be invoked
only when the encryption attribute is changed from encrypted -> decrypted
and vice versa. By default all guest pages are considered encrypted.

The patch introduces a new shared pages list implemented as a
sorted linked list to track the shared/unencrypted regions marked by the
guest hypercall.



...


+
+   if (enc) {
+   ret = remove_shared_region(gfn_start, gfn_end,
+  &sev->shared_pages_list);
+   if (ret != -ENOMEM)
+   sev->shared_pages_list_count += ret;
+   } else {
+   ret = add_shared_region(gfn_start, gfn_end,
+   &sev->shared_pages_list);
+   if (ret > 0)
+   sev->shared_pages_list_count++;
+   }


I would move the shared_pages_list_count updates into the add/remove 
functions and then just return 0 or a -E error code from those 
functions. It seems simpler than "adding" ret or checking for a greater 
than 0 return code.
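
In other words, something like this at the call site (the helper
signatures are only illustrative; they would need the kvm_sev_info
pointer so the count can be adjusted internally):

	if (enc)
		ret = remove_shared_region(sev, gfn_start, gfn_end);
	else
		ret = add_shared_region(sev, gfn_start, gfn_end);

with add_shared_region()/remove_shared_region() doing the
sev->shared_pages_list_count++/-- themselves and returning 0 or -Exxx.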


Thanks,
Tom


+
+   mutex_unlock(&kvm->lock);
+   return ret;
+}
+
  int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
  {
struct kvm_sev_cmd sev_cmd;
@@ -1693,6 +1842,7 @@ void sev_vm_destroy(struct kvm *kvm)
  
  	sev_unbind_asid(kvm, sev->handle);

sev_asid_free(sev->asid);
+   sev->shared_pages_list_count = 0;
  }
  
  void __init sev_hardware_setup(void)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f923e14e87df..bb249ec625fc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4536,6 +4536,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.complete_emulated_msr = svm_complete_emulated_msr,
  
  	.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,

+
+   .page_enc_status_hc = svm_page_enc_status_hc,
  };
  
  static struct kvm_x86_init_ops svm_init_ops __initdata = {

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0fe874ae5498..6437c1fa1f24 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -79,6 +79,9 @@ struct kvm_sev_info {
unsigned long pages_locked; /* Number of pages locked */
struct list_head regions_list;  /* List of registered regions */
u64 ap_jump_table;  /* SEV-ES AP Jump Table address */
+   /* List and count of shared pages */
+   int shared_pages_list_count;
+   struct list_head shared_pages_list;
  };
  
  struct kvm_svm {

@@ -472,6 +475,8 @@ int nested_svm_check_exception(struct vcpu_svm *svm, 
unsigned nr,
   bool has_error_code, u32 error_code);
  int nested_svm_exit_special(struct vcpu_svm *svm);
  void sync_nested_vmcb_control(struct vcpu_svm *svm);
+int svm_page_enc_status_hc(struct kvm *kvm, unsigned long gpa,
+  unsigned long npages, unsigned long enc);
  
  extern struct kvm_x86_nested_ops svm_nested_ops;
  
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c

index cc60b1fc3ee7..bcbf53851612 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7705,6 +7705,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.can_emulate_instruction = vmx_can_emulate_instruction,
.apic_init_signal_blocked = vmx_apic_init_signal_blocked,
.migrate_timers = vmx_migrate_timers,
+   .page_enc_status_hc = NULL,
  
  	.msr_filter_changed = vmx_msr_filter_changed,

.complete_emulated_msr = kvm_complete_insn_gp,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76bce832cade..2f17f0f9ace7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8162,6 +8162,12 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
kvm_sched_yield(vcpu->kvm, a0);
ret = 0;
break;
+   case KVM_HC_PAGE_ENC_STATUS:
+   ret = -KVM_ENOSYS;
+   if (kvm_x86_ops.page_enc_status_hc)
+   ret = kvm_x86_ops.page_enc_status_hc(vcpu->kvm,
+   a0, a1, a2);
+   break;
default:
ret = -KVM_ENOSYS;
break;
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 8b86609849b9..847b83b75dc8 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -29,6 +29,7 @@
  #define KVM_HC_CLOCK_PAIRING  9
  #define KVM_HC_SEND_IPI   10
  #define KVM_HC_SCHED_YIELD11
+#define KVM_HC_PAGE_ENC_STATUS 12
  
  /*

   * hypercalls use architecture specific



Re: [PATCH] swiotlb: Validate bounce size in the sync/unmap path

2021-02-02 Thread Tom Lendacky
On 2/2/21 10:37 AM, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 25, 2021 at 07:33:35PM +0100, Martin Radev wrote:
>> On Mon, Jan 18, 2021 at 10:14:28AM -0500, Konrad Rzeszutek Wilk wrote:
>>> On Mon, Jan 18, 2021 at 12:44:58PM +0100, Martin Radev wrote:
>>>> On Wed, Jan 13, 2021 at 12:30:17PM +0100, Christoph Hellwig wrote:
>>>>> On Tue, Jan 12, 2021 at 04:07:29PM +0100, Martin Radev wrote:
>>>>>> The size of the buffer being bounced is not checked if it happens
>>>>>> to be larger than the size of the mapped buffer. Because the size
>>>>>> can be controlled by a device, as it's the case with virtio devices,
>>>>>> this can lead to memory corruption.
>>>>>>
>>>>>
>>>>> I'm really worried about all these hodge podge hacks for not trusted
>>>>> hypervisors in the I/O stack.  Instead of trying to harden protocols
>>>>> that are fundamentally not designed for this, how about instead coming
>>>>> up with a new paravirtualized I/O interface that is specifically
>>>>> designed for use with an untrusted hypervisor from the start?
>>>>
>>>> Your comment makes sense but then that would require the cooperation
>>>> of these vendors and the cloud providers to agree on something meaningful.
>>>> I am also not sure whether the end result would be better than hardening
>>>> this interface to catch corruption. There is already some validation in
>>>> unmap path anyway.
>>>>
>>>> Another possibility is to move this hardening to the common virtio code,
>>>> but I think the code may become more complicated there since it would
>>>> require tracking both the dma_addr and length for each descriptor.
>>>
>>> Christoph,
>>>
>>> I've been wrestling with the same thing - this is specific to busted
>>> drivers. And in reality you could do the same thing with a hardware
>>> virtio device (see example in http://thunderclap.io/) - where the
>>> mitigation is 'enable the IOMMU to do its job.'.
>>>
>>> AMD SEV documents speak about utilizing IOMMU to do this (AMD SEV-SNP)..
>>> and while that is great in the future, SEV without IOMMU is now here.
>>>
>>> Doing a full circle here, this issue can be exploited with virtio
>>> but you could say do that with real hardware too if you hacked the
>>> firmware, so if you say used Intel SR-IOV NIC that was compromised
>>> on an AMD SEV machine, and plumbed in the guest - the IOMMU inside
>>> of the guest would be SWIOTLB code. Last line of defense against
>>> bad firmware to say.
>>>
>>> As such I am leaning towards taking this code, but I am worried
>>> about the performance hit .. but perhaps I shouldn't as if you
>>> are using SWIOTLB=force already you are kind of taking a
>>> performance hit?
>>>
>>
>> I have not measured the performance degradation. This will hit all AMD SEV,
>> Intel TDX, IBM Protected Virtualization VMs. I don't expect the hit to
>> be large since there are only few added operations per hundreads of copied
>> bytes. I could try to measure the performance hit by running some benchmark
>> with virtio-net/virtio-blk/virtio-rng.
>>
>> Earlier I said:
>>>> Another possibility is to move this hardening to the common virtio code,
>>>> but I think the code may become more complicated there since it would
>>>> require tracking both the dma_addr and length for each descriptor.
>>
>> Unfortunately, this doesn't make sense. Even if there's validation for
>> the size in the common virtio layer, there will be some other device
>> which controls a dma_addr and length passed to dma_unmap* in the
>> corresponding driver. The device can target a specific dma-mapped private
>> buffer by changing the dma_addr and set a good length to overwrite buffers
>> following it.
>>
>> So, instead of doing the check in every driver and hitting a performance
>> cost even when swiotlb is not used, it's probably better to fix it in
>> swiotlb.
>>
>> @Tom Lendacky, do you think that it makes sense to harden swiotlb or
>> some other approach may be better for the SEV features?
> 
> I am not Tom, but this change seems the right way forward regardless if
> is TDX, AMD SEV, or any other architecture that encrypt memory and use
> SWIOTLB.

Sorry, I missed the @Tom before. I'm with Konrad and believe it makes
sense to add these checks.

I'm not sure if there would be a better approach for all confidential
computing technologies. SWIOTLB works nicely, but is limited because of
the 32-bit compatible memory location. Being able to have buffers above
the 32-bit limit would alleviate that, but that is support that would have
to be developed.
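
For what it's worth, the hardening being discussed boils down to a clamp
like the following in the bounce path (a sketch only; "orig_size" stands
in for whatever bookkeeping records the size at map time):

	/*
	 * Never copy more than was originally mapped, so a device-controlled
	 * length cannot overrun the original buffer.
	 */
	if (WARN_ONCE(size > orig_size, "swiotlb: bounce size exceeds mapping\n"))
		size = orig_size;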

Thanks,
Tom

> 
> Let me queue it up in development branch and do some regression testing.
>>


[tip: x86/seves] x86/sev-es: Do not unroll string I/O for SEV-ES guests

2021-02-02 Thread tip-bot2 for Tom Lendacky
The following commit has been merged into the x86/seves branch of tip:

Commit-ID: 62a08a7193dc9107904aaa51a04ba3ba2959f745
Gitweb:
https://git.kernel.org/tip/62a08a7193dc9107904aaa51a04ba3ba2959f745
Author:Tom Lendacky 
AuthorDate:Mon, 01 Feb 2021 12:26:27 -06:00
Committer: Borislav Petkov 
CommitterDate: Tue, 02 Feb 2021 16:25:05 +01:00

x86/sev-es: Do not unroll string I/O for SEV-ES guests

Under the GHCB specification, SEV-ES guests can support string I/O.
The current #VC handler contains this support, so remove the need to
unroll kernel string I/O operations. This will reduce the number of #VC
exceptions generated as well as the number of VM exits for the guest.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
Link: 
https://lkml.kernel.org/r/3de04b5b638546ac75d42ba52307fe1a922173d3.1612203987.git.thomas.lenda...@amd.com
---
 arch/x86/mm/mem_encrypt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index c79e573..d55ea77 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -474,9 +474,10 @@ void __init mem_encrypt_init(void)
swiotlb_update_mem_attributes();
 
/*
-* With SEV, we need to unroll the rep string I/O instructions.
+* With SEV, we need to unroll the rep string I/O instructions,
+* but SEV-ES supports them through the #VC handler.
 */
-   if (sev_active())
+   if (sev_active() && !sev_es_active())
static_branch_enable(&sev_enable_key);
 
print_mem_encrypt_feature_info();


Re: [PATCH] x86/sev-es: Do not unroll string I/O for SEV-ES guests

2021-02-01 Thread Tom Lendacky

On 2/1/21 12:26 PM, Tom Lendacky wrote:

From: Tom Lendacky 

Under the GHCB specification, SEV-ES guests can support string I/O. The
current #VC handler contains this support, so remove the need to unroll
kernel string I/O operations. This will reduce the number of #VC
exceptions generated as well as the number of VMEXITs for the guest.

Signed-off-by: Tom Lendacky 
---
  arch/x86/mm/mem_encrypt.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index c79e5736ab2b..d55ea77e1ca8 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -474,9 +474,10 @@ void __init mem_encrypt_init(void)
swiotlb_update_mem_attributes();
  
  	/*

-* With SEV, we need to unroll the rep string I/O instructions.
+* With SEV, we need to unroll the rep string I/O instructions,
+* but SEV-ES supports them through the #VC handler.
 */
-   if (sev_active())
+   if (sev_active() && !sev_es_active())
static_branch_enable(&sev_enable_key);


This brings up a question. The name implies that this is a general SEV 
related key. However, it's currently only used for the string I/O 
operations. If further usage of this key is added in the future, then this 
would probably need to be split into two keys, the sev_enable_key and an 
sev_unroll_io_key.


Is it worth documenting that in the comment? Or should the key be renamed now?
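
For the hypothetical split, mem_encrypt_init() would end up with
something like this (sev_unroll_io_key does not exist today, it is just
the name used above):

	if (sev_active())
		static_branch_enable(&sev_enable_key);

	/* Only unroll rep string I/O when there is no #VC handler to do it */
	if (sev_active() && !sev_es_active())
		static_branch_enable(&sev_unroll_io_key);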

Thanks,
Tom

  
  	print_mem_encrypt_feature_info();


base-commit: a7e0bdf1b07ea6169930ec42b0bdb17e1c1e3bb0



[PATCH] x86/sev-es: Do not unroll string I/O for SEV-ES guests

2021-02-01 Thread Tom Lendacky
From: Tom Lendacky 

Under the GHCB specification, SEV-ES guests can support string I/O. The
current #VC handler contains this support, so remove the need to unroll
kernel string I/O operations. This will reduce the number of #VC
exceptions generated as well as the number of VMEXITs for the guest.

Signed-off-by: Tom Lendacky 
---
 arch/x86/mm/mem_encrypt.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index c79e5736ab2b..d55ea77e1ca8 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -474,9 +474,10 @@ void __init mem_encrypt_init(void)
swiotlb_update_mem_attributes();
 
/*
-* With SEV, we need to unroll the rep string I/O instructions.
+* With SEV, we need to unroll the rep string I/O instructions,
+* but SEV-ES supports them through the #VC handler.
 */
-   if (sev_active())
+   if (sev_active() && !sev_es_active())
static_branch_enable(&sev_enable_key);
 
print_mem_encrypt_feature_info();

base-commit: a7e0bdf1b07ea6169930ec42b0bdb17e1c1e3bb0
-- 
2.30.0



Re: [PATCH V2] Fix unsynchronized access to sev members through svm_register_enc_region

2021-01-27 Thread Tom Lendacky

On 1/27/21 3:54 PM, Sean Christopherson wrote:

On Wed, Jan 27, 2021, Peter Gonda wrote:

Grab kvm->lock before pinning memory when registering an encrypted
region; sev_pin_memory() relies on kvm->lock being held to ensure
correctness when checking and updating the number of pinned pages.


...

+
+   list_add_tail(&region->list, &sev->regions_list);
+   mutex_unlock(&kvm->lock);
+
/*
 * The guest may change the memory encryption attribute from C=0 -> C=1
 * or vice versa for this memory range. Lets make sure caches are
@@ -1133,13 +1143,6 @@ int svm_register_enc_region(struct kvm *kvm,
 */
sev_clflush_pages(region->pages, region->npages);


I don't think it actually matters, but it feels like the flush should be done
before adding the region to the list.  That would also make this sequence
consistent with the other flows.

Tom, any thoughts?


I don't think it matters, either. This does keep the flushing outside of 
the mutex, so if you are doing parallel operations, that should help speed 
things up a bit.


Thanks,
Tom





Re: [PATCH] Fix unsynchronized access to sev members through svm_register_enc_region

2021-01-26 Thread Tom Lendacky
On 1/26/21 12:54 PM, Peter Gonda wrote:
> sev_pin_memory assumes that callers hold the kvm->lock. This was true for
> all callers except svm_register_enc_region since it does not originate
> from svm_mem_enc_op. Also added lockdep annotation to help prevent
> future regressions.

I'm not exactly sure what the problem is that you're fixing. What is the
symptom that you're seeing?

> 
> Tested: Booted SEV enabled VM on host.
> 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: Paolo Bonzini 
> Cc: Joerg Roedel 
> Cc: Tom Lendacky 
> Cc: Brijesh Singh 
> Cc: Sean Christopherson 
> Cc: x...@kernel.org
> Cc: k...@vger.kernel.org
> Cc: sta...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Fixes: 116a2214c5173 (KVM: SVM: Pin guest memory when SEV is active)

I can't find this commit. The Linux and KVM trees have it as:

1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active")

> Signed-off-by: Peter Gonda 
> 
> ---
>  arch/x86/kvm/svm.c | 16 +---

This patch won't apply, as it has already been a few releases since svm.c
was moved to the arch/x86/kvm/svm directory and this function now lives in
sev.c.

Thanks,
Tom

>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index afdc5b44fe9f..9884e57f3d0f 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1699,6 +1699,8 @@ static struct page **sev_pin_memory(struct kvm *kvm, 
> unsigned long uaddr,
>   struct page **pages;
>   unsigned long first, last;
>  
> + lockdep_assert_held(&kvm->lock);
> +
>   if (ulen == 0 || uaddr + ulen < uaddr)
>   return NULL;
>  
> @@ -7228,12 +7230,19 @@ static int svm_register_enc_region(struct kvm *kvm,
>   if (!region)
>   return -ENOMEM;
>  
> + mutex_lock(&kvm->lock);
>   region->pages = sev_pin_memory(kvm, range->addr, range->size, &region->npages, 1);
>   if (!region->pages) {
>   ret = -ENOMEM;
>   goto e_free;
>   }
>  
> + region->uaddr = range->addr;
> + region->size = range->size;
> +
> + list_add_tail(&region->list, &sev->regions_list);
> + mutex_unlock(&kvm->lock);
> +
>   /*
>* The guest may change the memory encryption attribute from C=0 -> C=1
>* or vice versa for this memory range. Lets make sure caches are
> @@ -7242,13 +7251,6 @@ static int svm_register_enc_region(struct kvm *kvm,
>*/
>   sev_clflush_pages(region->pages, region->npages);
>  
> - region->uaddr = range->addr;
> - region->size = range->size;
> -
> - mutex_lock(&kvm->lock);
> - list_add_tail(&region->list, &sev->regions_list);
> - mutex_unlock(&kvm->lock);
> -
>   return ret;
>  
>  e_free:
> 


Re: [PATCH 1/3] KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest

2021-01-25 Thread Tom Lendacky
On 1/22/21 5:50 PM, Sean Christopherson wrote:
> Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the
> GPRs' dirty bits are set from time zero and never cleared, i.e. will

Ah, missed that, bad assumption on my part.

> always be seen as dirty.  The obvious alternative would be to clear
> the dirty bits when appropriate, but removing the dirty checks is
> desirable as it allows reverting GPR dirty+available tracking, which
> adds overhead to all flavors of x86 VMs.
> 
> Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
> by the GHCB spec, which allows the hypervisor (or guest) to provide
> unnecessary info; it's the guest's responsibility to consume only what
> it needs (the hypervisor is untrusted after all).
> 
>   The guest and hypervisor can supply additional state if desired but
>   must not rely on that additional state being provided.

Yes, that's true.

I'm ok with removing the tracking if that's desired. Otherwise, we can add
a vcpu->arch.regs_dirty = 0 in sev_es_sync_from_ghcb().
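
i.e. roughly this at the end of sev_es_sync_from_ghcb() (sketch):

	/* The GHCB-supplied values are now the source of truth */
	vcpu->arch.regs_dirty = 0;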

Thanks,
Tom

> 
> Cc: Brijesh Singh 
> Cc: Tom Lendacky 
> Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/sev.c | 15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index c8ffdbc81709..ac652bc476ae 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1415,16 +1415,13 @@ static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
>* to be returned:
>*   GPRs RAX, RBX, RCX, RDX
>*
> -  * Copy their values to the GHCB if they are dirty.
> +  * Copy their values, even if they may not have been written during the
> +  * VM-Exit.  It's the guest's responsibility to not consume random data.
>*/
> - if (kvm_register_is_dirty(vcpu, VCPU_REGS_RAX))
> - ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
> - if (kvm_register_is_dirty(vcpu, VCPU_REGS_RBX))
> - ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
> - if (kvm_register_is_dirty(vcpu, VCPU_REGS_RCX))
> - ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
> - if (kvm_register_is_dirty(vcpu, VCPU_REGS_RDX))
> - ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
> + ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
> + ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
> + ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
> + ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
>  }
>  
>  static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
> 


Re: [PATCH 3/3] KVM: SVM: Sync GPRs to the GHCB only after VMGEXIT

2021-01-22 Thread Tom Lendacky

On 1/22/21 5:50 PM, Sean Christopherson wrote:

Sync GPRs to the GHCB on VMRUN only if a sync is needed, i.e. if the
previous exit was a VMGEXIT and the guest is expecting some data back.



The start of sev_es_sync_to_ghcb() checks if the GHCB has been mapped, 
which only occurs on VMGEXIT, and exits early if not. And 
sev_es_sync_from_ghcb() is only called if the GHCB has been successfully 
mapped. The only thing in between is sev_es_validate_vmgexit(), which will 
terminate the VM on error. So I don't think this patch is needed.
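
That existing guard is essentially the following (a sketch of the shape
of the check, not the exact code):

	struct ghcb *ghcb = svm->ghcb;

	/* The GHCB is only mapped across a VMGEXIT; nothing to sync otherwise */
	if (!ghcb)
		return;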


Thanks,
Tom


Cc: Brijesh Singh 
Cc: Tom Lendacky 
Signed-off-by: Sean Christopherson 
---
  arch/x86/kvm/svm/sev.c | 15 ++-
  arch/x86/kvm/svm/svm.h |  1 +
  2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ac652bc476ae..9bd1e1650eb3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1418,10 +1418,13 @@ static void sev_es_sync_to_ghcb(struct vcpu_svm *svm)
 * Copy their values, even if they may not have been written during the
 * VM-Exit.  It's the guest's responsibility to not consume random data.
 */
-   ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
-   ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
-   ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
-   ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
+   if (svm->need_sync_to_ghcb) {
+   ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]);
+   ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]);
+   ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]);
+   ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]);
+   svm->need_sync_to_ghcb = false;
+   }
  }
  
  static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)

@@ -1441,8 +1444,10 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm)
 * VMMCALL allows the guest to provide extra registers. KVM also
 * expects RSI for hypercalls, so include that, too.
 *
-* Copy their values to the appropriate location if supplied.
+* Copy their values to the appropriate location if supplied, and
+* flag that a sync back to the GHCB is needed on the next VMRUN.
 */
+   svm->need_sync_to_ghcb = true;
memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs));
  
  	vcpu->arch.regs[VCPU_REGS_RAX] = ghcb_get_rax_if_valid(ghcb);

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 0fe874ae5498..4e2e5f9fbfc2 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -192,6 +192,7 @@ struct vcpu_svm {
u64 ghcb_sa_len;
bool ghcb_sa_sync;
bool ghcb_sa_free;
+   bool need_sync_to_ghcb;
  };
  
  struct svm_cpu_data {




Re: [PATCH v3 13/13] KVM: SVM: Skip SEV cache flush if no ASIDs have been used

2021-01-22 Thread Tom Lendacky

On 1/22/21 2:21 PM, Sean Christopherson wrote:

Skip SEV's expensive WBINVD and DF_FLUSH if there are no SEV ASIDs
waiting to be reclaimed, e.g. if SEV was never used.  This "fixes" an
issue where the DF_FLUSH fails during hardware teardown if the original
SEV_INIT failed.  Ideally, SEV wouldn't be marked as enabled in KVM if
SEV_INIT fails, but that's a problem for another day.

Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 23 +++
  1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 73da2af1e25d..0a4715e60b88 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -56,9 +56,14 @@ struct enc_region {
unsigned long size;
  };
  
-static int sev_flush_asids(void)

+static int sev_flush_asids(int min_asid, int max_asid)
  {
-   int ret, error = 0;
+   int ret, pos, error = 0;
+
+   /* Check if there are any ASIDs to reclaim before performing a flush */
+   pos = find_next_bit(sev_reclaim_asid_bitmap, max_sev_asid, min_asid);
+   if (pos >= max_asid)
+   return -EBUSY;
  
  	/*

 * DEACTIVATE will clear the WBINVD indicator causing DF_FLUSH to fail,
@@ -80,14 +85,7 @@ static int sev_flush_asids(void)
  /* Must be called with the sev_bitmap_lock held */
  static bool __sev_recycle_asids(int min_asid, int max_asid)
  {
-   int pos;
-
-   /* Check if there are any ASIDs to reclaim before performing a flush */
-   pos = find_next_bit(sev_reclaim_asid_bitmap, max_sev_asid, min_asid);
-   if (pos >= max_asid)
-   return false;
-
-   if (sev_flush_asids())
+   if (sev_flush_asids(min_asid, max_asid))
return false;
  
  	/* The flush process will flush all reclaimable SEV and SEV-ES ASIDs */

@@ -1324,10 +1322,11 @@ void sev_hardware_teardown(void)
if (!sev_enabled)
return;
  
+	/* No need to take sev_bitmap_lock, all VMs have been destroyed. */

+   sev_flush_asids(0, max_sev_asid);
+
bitmap_free(sev_asid_bitmap);
bitmap_free(sev_reclaim_asid_bitmap);
-
-   sev_flush_asids();
  }
  
  int sev_cpu_init(struct svm_cpu_data *sd)




Re: [PATCH v3 07/13] KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)

2021-01-22 Thread Tom Lendacky

On 1/22/21 2:21 PM, Sean Christopherson wrote:

Enable the 'sev' and 'sev_es' module params by default instead of having
them conditioned on CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT.  The extra
Kconfig is pointless as KVM SEV/SEV-ES support is already controlled via
CONFIG_KVM_AMD_SEV, and CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT has the
unfortunate side effect of enabling all the SEV-ES _guest_ code due to
it being dependent on CONFIG_AMD_MEM_ENCRYPT=y.

Cc: Borislav Petkov 
Cc: Tom Lendacky 
Cc: Brijesh Singh 
Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 2b8ebe2f1caf..75a83e2a8a89 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -29,11 +29,11 @@
  
  #ifdef CONFIG_KVM_AMD_SEV

  /* enable/disable SEV support */
-static bool sev_enabled = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+static bool sev_enabled = true;
  module_param_named(sev, sev_enabled, bool, 0444);
  
  /* enable/disable SEV-ES support */

-static bool sev_es_enabled = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+static bool sev_es_enabled = true;
  module_param_named(sev_es, sev_es_enabled, bool, 0444);
  #else
  #define sev_enabled false



Re: [PATCH v3 02/13] KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails

2021-01-22 Thread Tom Lendacky

On 1/22/21 2:21 PM, Sean Christopherson wrote:

Free sev_asid_bitmap if the reclaim bitmap allocation fails, otherwise
KVM will unnecessarily keep the bitmap when SEV is not fully enabled.

Freeing the page is also necessary to avoid introducing a bug when a
future patch eliminates svm_sev_enabled() in favor of using the global
'sev' flag directly.  While sev_hardware_enabled() checks max_sev_asid,
which is true even if KVM setup fails, 'sev' will be true if and only
if KVM setup fully succeeds.

Fixes: 33af3a7ef9e6 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")
Cc: Tom Lendacky 
Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c8ffdbc81709..ec742dabbd5b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1274,8 +1274,11 @@ void __init sev_hardware_setup(void)
goto out;
  
  	sev_reclaim_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);

-   if (!sev_reclaim_asid_bitmap)
+   if (!sev_reclaim_asid_bitmap) {
+   bitmap_free(sev_asid_bitmap);
+   sev_asid_bitmap = NULL;
goto out;
+   }
  
  	pr_info("SEV supported: %u ASIDs\n", max_sev_asid - min_sev_asid + 1);

sev_supported = true;



Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-21 Thread Tom Lendacky

On 1/21/21 9:55 AM, Tejun Heo wrote:

Hello,

On Thu, Jan 21, 2021 at 08:55:07AM -0600, Tom Lendacky wrote:

The hardware will allow any SEV capable ASID to be run as SEV-ES, however,
the SEV firmware will not allow the activation of an SEV-ES VM to be
assigned to an ASID greater than or equal to the SEV minimum ASID value. The
reason for the latter is to prevent an !SEV-ES ASID starting out as an
SEV-ES guest and then disabling the SEV-ES VMCB bit that is used by VMRUN.
This would result in the downgrading of the security of the VM without the
VM realizing it.

As a result, you have a range of ASIDs that can only run SEV-ES VMs and a
range of ASIDs that can only run SEV VMs.


I see. That makes sense. What's the downside of SEV-ES compared to SEV w/o
ES? Are there noticeable performance / feature penalties or is the split
mostly for backward compatibility?


SEV-ES is an incremental enhancement of SEV where the register state of 
the guest is protected/encrypted. As with a lot of performance questions, 
the answer is ...it depends. With SEV-ES, there is additional overhead 
associated with a world switch (VMRUN/VMEXIT) to restore and save 
additional register state. Also, exit events are now divided up into 
automatic exits (AE) and non-automatic exits (NAE). NAE events result in a 
new #VC exception being generated where the guest is then required to use 
the VMGEXIT instruction to communicate only the state necessary to perform 
the operation. A CPUID instruction is a good example, where a shared page 
is used to communicate required state to the hypervisor to perform the 
CPUID emulation, which then returns the results back through the shared 
page to the guest. So it all depends on how often the workload in question 
performs operations that result in a VMEXIT of the vCPU, etc.
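
As a rough sketch of the CPUID case from the guest's #VC handler (the
structure mirrors the kernel's handler, but names are from memory and
details such as the XSS handling for leaf 0xD are omitted):

	/* Expose only the state needed for CPUID via the shared GHCB page */
	ghcb_set_rax(ghcb, regs->ax);
	ghcb_set_rcx(ghcb, regs->cx);

	/* VMGEXIT with the SVM_EXIT_CPUID exit code; the HV emulates CPUID */
	ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
	if (ret != ES_OK)
		return ret;

	/* The results come back through the same shared GHCB page */
	regs->ax = ghcb_get_rax(ghcb);
	regs->bx = ghcb_get_rbx(ghcb);
	regs->cx = ghcb_get_rcx(ghcb);
	regs->dx = ghcb_get_rdx(ghcb);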


Thanks,
Tom



Thanks.



Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-21 Thread Tom Lendacky

On 1/20/21 10:40 AM, Tejun Heo wrote:

Hello,

On Tue, Jan 19, 2021 at 11:13:51PM -0800, Vipin Sharma wrote:

Can you please elaborate? I skimmed through the amd manual and it seemed to
say that SEV-ES ASIDs are superset of SEV but !SEV-ES ASIDs. What's the use
case for mixing those two?


For example, customers can be given options for which kind of protection they
want to choose for their workloads based on factors like data protection
requirement, cost, speed, etc.


So, what I'm looking for is a bit more in-depth analysis than that. ie. What's
the downside of SEV && !SEV-ES and is the distinction something inherently
useful?


In terms of features SEV-ES is a superset of SEV, but that doesn't mean SEV-ES
ASIDs are a superset of SEV ASIDs. SEV ASIDs cannot be used for SEV-ES VMs
and similarly SEV-ES ASIDs cannot be used for SEV VMs. Once a system is
booted, based on the BIOS settings each type will have their own
capacity and that number cannot be changed until the next boot and BIOS
changes.


Here's an excerpt from the AMD's system programming manual, section 15.35.2:

   On some systems, there is a limitation on which ASID values can be used on
   SEV guests that are run with SEV-ES disabled. While SEV-ES may be enabled
   on any valid SEV ASID (as defined by CPUID Fn8000_001F[ECX]), there are
   restrictions on which ASIDs may be used for SEV guests with SEV-ES
   disabled. CPUID Fn8000_001F[EDX] indicates the minimum ASID value that
   must be used for an SEV-enabled, SEV-ES-disabled guest. For example, if
   CPUID Fn8000_001F[EDX] returns the value 5, then any VMs which use ASIDs
   1-4 and which enable SEV must also enable SEV-ES.


The hardware will allow any SEV capable ASID to be run as SEV-ES, however, 
the SEV firmware will not allow the activation of an SEV-ES VM to be 
assigned to an ASID greater than or equal to the SEV minimum ASID value. 
The reason for the latter is to prevent an !SEV-ES ASID starting out as an 
SEV-ES guest and then disabling the SEV-ES VMCB bit that is used by VMRUN. 
This would result in the downgrading of the security of the VM without the 
VM realizing it.


As a result, you have a range of ASIDs that can only run SEV-ES VMs and a 
range of ASIDs that can only run SEV VMs.
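
To make that concrete, using the APM example above where CPUID
Fn8000_001F[EDX] returns 5, and assuming CPUID Fn8000_001F[ECX] reports
509 usable ASIDs:

	ASIDs 1 .. 4     can only be used for SEV-ES guests
	ASIDs 5 .. 509   can only be used for SEV (non-ES) guests

The split point is fixed by the BIOS settings for the life of that boot.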


Thanks,
Tom




We are not mixing the two types of ASIDs, they are separate and used
separately.


Maybe in practice, the key management on the BIOS side is implemented in a
more restricted way but at least the processor manual says differently.


I'm very reluctant to ack vendor specific interfaces for a few reasons but
most importantly because they usually indicate the abstraction and/or the
underlying feature not being sufficiently developed, and they tend to become
baggage after a while. So, here are my suggestions:


My first patch was only for SEV, but soon we got comments that this can
be abstracted and used by TDX and SEID for their use cases.

I see this patch as providing an abstraction for simple accounting of
resources used for creating secure execution contexts. Here, secure
execution is achieved through different means. SEID, TDX, and SEV
provide security using different features and capabilities. I am not
sure if we will reach a point where all three and other vendors will use
the same approach and technology for this purpose.

Instead of each one coming up with their own resource tracking for their
features, this patch is providing a common framework and cgroup for
tracking these resources.


What's implemented is a shared place where similar things can be thrown in,
but from the user's perspective the underlying hardware feature isn't really
abstracted. It's just exposing whatever hardware knobs there are. If you
look at any other cgroup controllers, nothing is exposing this level of
hardware dependent details and I'd really like to keep it that way.

So, what I'm asking for is more in-depth analysis of the landscape and
inherent differences among different vendor implementations to see whether
there can be better approaches or we should just wait and see.


* If there can be a shared abstraction which hopefully makes intuitive
   sense, that'd be ideal. It doesn't have to be one knob but it shouldn't be
   something arbitrary to specific vendors.


I think we should see these as features provided on a host. Tasks can
be executed securely on a host with the guarantees provided by the
specific feature (SEV, SEV-ES, TDX, SEID) used by the task.

I don't think each H/W vendor can agree to a common set of security
guarantees and approach.


Do TDX and SEID have multiple key types tho?


* If we aren't there yet and vendor-specific interface is a must, attach
   that part to an interface which is already vendor-aware.

Sorry, I don't understand this approach. Can you please give more
details about it?


Attaching the interface to kvm side, most likely, instead of exposing the
feature through cgroup.

Thanks.



Re: [PATCH] x86/sev: Add AMD_SEV_ES_GUEST Kconfig for including SEV-ES support

2021-01-18 Thread Tom Lendacky

On 1/18/21 12:03 PM, Paolo Bonzini wrote:

On 16/01/21 06:40, Tom Lendacky wrote:



Introduce a new Kconfig, AMD_SEV_ES_GUEST, to control the inclusion of
support for running as an SEV-ES guest.  Pivoting on AMD_MEM_ENCRYPT for
guest SEV-ES support is undesirable for host-only kernel builds as
AMD_MEM_ENCRYPT is also required to enable KVM/host support for SEV and
SEV-ES.


I believe only KVM_AMD_SEV is required to enable the KVM support to run 
SEV and SEV-ES guests. The AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT setting is 
only used to determine whether to enable the KVM SEV/SEV-ES support by 
default on module load.


Right:

     if (IS_ENABLED(CONFIG_KVM_AMD_SEV) && sev) {
     sev_hardware_setup();
     } else {
     sev = false;
     sev_es = false;
     }

I removed the addition to "config AMD_MEM_ENCRYPT" from Sean's patch, but 
(despite merging it not once but twice) I don't really like the hidden 
dependency on AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT and thus AMD_MEM_ENCRYPT.  
Is there any reason to not always enable sev/sev_es by default?


I don't remember where the AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT suggestion 
originally came from. I thought it was from review feedback on the 
original SEV patches, but can't find anything about it. @Brijesh might 
remember.


But I see no reason not to enable them by default.

Thanks,
Tom



Paolo



Re: [PATCH] x86/sev: Add AMD_SEV_ES_GUEST Kconfig for including SEV-ES support

2021-01-15 Thread Tom Lendacky

On 1/15/21 6:25 PM, Sean Christopherson wrote:

Introduce a new Kconfig, AMD_SEV_ES_GUEST, to control the inclusion of
support for running as an SEV-ES guest.  Pivoting on AMD_MEM_ENCRYPT for
guest SEV-ES support is undesirable for host-only kernel builds as
AMD_MEM_ENCRYPT is also required to enable KVM/host support for SEV and
SEV-ES.


I believe only KVM_AMD_SEV is required to enable the KVM support to run 
SEV and SEV-ES guests. The AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT setting is 
only used to determine whether to enable the KVM SEV/SEV-ES support by 
default on module load.


Thanks,
Tom



A dedicated Kconfig also makes it easier to understand exactly what is
and isn't supported in a given configuration.

Opportunistically update the AMD_MEM_ENCRYPT help text to note that it
also enables support for SEV guests.

Cc: Tom Lendacky 
Cc: Brijesh Singh 
Signed-off-by: Sean Christopherson 
---

Tested everything except an actual SEV-ES guest, I don't yet have a
workflow for testing those.

  arch/x86/Kconfig   | 13 -
  arch/x86/boot/compressed/Makefile  |  2 +-
  arch/x86/boot/compressed/idt_64.c  |  7 ---
  arch/x86/boot/compressed/idt_handlers_64.S |  2 +-
  arch/x86/boot/compressed/misc.h|  5 -
  arch/x86/entry/entry_64.S  |  2 +-
  arch/x86/include/asm/idtentry.h|  2 +-
  arch/x86/include/asm/mem_encrypt.h | 12 
  arch/x86/include/asm/realmode.h|  4 ++--
  arch/x86/include/asm/sev-es.h  |  2 +-
  arch/x86/kernel/Makefile   |  2 +-
  arch/x86/kernel/head64.c   |  6 +++---
  arch/x86/kernel/head_64.S  |  6 +++---
  arch/x86/kernel/idt.c  |  2 +-
  arch/x86/kernel/kvm.c  |  4 ++--
  arch/x86/mm/mem_encrypt.c  |  2 ++
  arch/x86/realmode/rm/header.S  |  2 +-
  arch/x86/realmode/rm/trampoline_64.S   |  4 ++--
  18 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f851179ff0..5f03e6313113 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1527,12 +1527,14 @@ config AMD_MEM_ENCRYPT
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
-   select INSTRUCTION_DECODER
help
  Say yes to enable support for the encryption of system memory.
  This requires an AMD processor that supports Secure Memory
  Encryption (SME).
  
+	  This also enables support for running as a Secure Encrypted

+ Virtualization (SEV) guest.
+
  config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
bool "Activate AMD Secure Memory Encryption (SME) by default"
default y
@@ -1547,6 +1549,15 @@ config AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
  If set to N, then the encryption of system memory can be
  activated with the mem_encrypt=on command line option.
  
+config AMD_SEV_ES_GUEST

+   bool "AMD Secure Encrypted Virtualization - Encrypted State (SEV-ES) Guest support"
+   depends on AMD_MEM_ENCRYPT
+   select INSTRUCTION_DECODER
+   help
+ Enable support for running as a Secure Encrypted Virtualization -
+ Encrypted State (SEV-ES) Guest.  This enables SEV-ES boot protocol
+ changes, #VC handling, SEV-ES specific hypercalls, etc...
+
  # Common NUMA Features
  config NUMA
bool "NUMA Memory Allocation and Scheduler Support"
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index e0bc3988c3fa..8c036b6fc0c2 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -92,7 +92,7 @@ ifdef CONFIG_X86_64
vmlinux-objs-y += $(obj)/idt_64.o $(obj)/idt_handlers_64.o
vmlinux-objs-y += $(obj)/mem_encrypt.o
vmlinux-objs-y += $(obj)/pgtable_64.o
-   vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o
+   vmlinux-objs-$(CONFIG_AMD_SEV_ES_GUEST) += $(obj)/sev-es.o
  endif
  
  vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o

diff --git a/arch/x86/boot/compressed/idt_64.c b/arch/x86/boot/compressed/idt_64.c
index 804a502ee0d2..916dde4a84b6 100644
--- a/arch/x86/boot/compressed/idt_64.c
+++ b/arch/x86/boot/compressed/idt_64.c
@@ -33,8 +33,9 @@ void load_stage1_idt(void)
boot_idt_desc.address = (unsigned long)boot_idt;
  
  
-	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))

-   set_idt_entry(X86_TRAP_VC, boot_stage1_vc);
+#ifdef CONFIG_AMD_SEV_ES_GUEST
+   set_idt_entry(X86_TRAP_VC, boot_stage1_vc);
+#endif
  
  	load_boot_idt(&boot_idt_desc);

  }
@@ -46,7 +47,7 @@ void load_stage2_idt(void)
  
  	set_idt_entry(X86_TRAP_PF, boot_page_fault);
  
-#ifdef CONFIG_AMD_MEM_ENCRYPT

+#ifdef CONFIG_AMD_SEV_ES_GUEST
set_idt_entry(X86_TRAP_VC, boot_stage2_vc);
  #endif
  
diff --git a/arch/x86/boot/compressed/

Re: [PATCH v2 14/14] KVM: SVM: Skip SEV cache flush if no ASIDs have been used

2021-01-15 Thread Tom Lendacky

On 1/13/21 6:37 PM, Sean Christopherson wrote:

Skip SEV's expensive WBINVD and DF_FLUSH if there are no SEV ASIDs
waiting to be reclaimed, e.g. if SEV was never used.  This "fixes" an
issue where the DF_FLUSH fails during hardware teardown if the original
SEV_INIT failed.  Ideally, SEV wouldn't be marked as enabled in KVM if
SEV_INIT fails, but that's a problem for another day.

Signed-off-by: Sean Christopherson 
---
  arch/x86/kvm/svm/sev.c | 22 ++
  1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 23a4bead4a82..e71bc742d8da 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -56,9 +56,14 @@ struct enc_region {
unsigned long size;
  };
  
-static int sev_flush_asids(void)

+static int sev_flush_asids(int min_asid, int max_asid)
  {
-   int ret, error = 0;
+   int ret, pos, error = 0;
+
+   /* Check if there are any ASIDs to reclaim before performing a flush */
+   pos = find_next_bit(sev_reclaim_asid_bitmap, max_sev_asid, min_asid);
+   if (pos >= max_asid)
+   return -EBUSY;
  
  	/*

 * DEACTIVATE will clear the WBINVD indicator causing DF_FLUSH to fail,
@@ -80,14 +85,7 @@ static int sev_flush_asids(void)
  /* Must be called with the sev_bitmap_lock held */
  static bool __sev_recycle_asids(int min_asid, int max_asid)
  {
-   int pos;
-
-   /* Check if there are any ASIDs to reclaim before performing a flush */
-   pos = find_next_bit(sev_reclaim_asid_bitmap, max_sev_asid, min_asid);
-   if (pos >= max_asid)
-   return false;
-
-   if (sev_flush_asids())
+   if (sev_flush_asids(min_asid, max_asid))
return false;
  
  	/* The flush process will flush all reclaimable SEV and SEV-ES ASIDs */

@@ -1323,10 +1321,10 @@ void sev_hardware_teardown(void)
if (!sev_enabled)
return;
  
+	sev_flush_asids(0, max_sev_asid);


I guess you could have called __sev_recycle_asids(0, max_sev_asid) here 
and left things unchanged up above. It would do the extra bitmap_xor() and 
bitmap_zero() operations, though. What do you think?


Also, maybe a comment about not needing the bitmap lock because this is 
during teardown.
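
For illustration only, the teardown path with that alternative might end up
looking something like this (an untested sketch of the suggestion above):

void sev_hardware_teardown(void)
{
	if (!sev_enabled)
		return;

	/* No need for sev_bitmap_lock here, nothing else runs during teardown */
	__sev_recycle_asids(0, max_sev_asid);

	bitmap_free(sev_asid_bitmap);
	bitmap_free(sev_reclaim_asid_bitmap);
}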


Thanks,
Tom


+
bitmap_free(sev_asid_bitmap);
bitmap_free(sev_reclaim_asid_bitmap);
-
-   sev_flush_asids();
  }
  
  int sev_cpu_init(struct svm_cpu_data *sd)




Re: [PATCH v2 13/14] KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:37 PM, Sean Christopherson wrote:

Remove the forward declaration of sev_flush_asids(), which is only a few
lines above the function itself.

No functional change intended.

Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7e14514dd083..23a4bead4a82 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -41,7 +41,6 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
  #endif /* CONFIG_KVM_AMD_SEV */
  
  static u8 sev_enc_bit;

-static int sev_flush_asids(void);
  static DECLARE_RWSEM(sev_deactivate_lock);
  static DEFINE_MUTEX(sev_bitmap_lock);
  unsigned int max_sev_asid;



Re: [PATCH v2 12/14] KVM: SVM: Drop redundant svm_sev_enabled() helper

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:37 PM, Sean Christopherson wrote:

Replace calls to svm_sev_enabled() with direct checks on sev_enabled, or
in the case of svm_mem_enc_op, simply drop the call to svm_sev_enabled().
This effectively replaces checks against a valid max_sev_asid with checks
against sev_enabled.  sev_enabled is forced off by sev_hardware_setup()
if max_sev_asid is invalid, all call sites are guaranteed to run after
sev_hardware_setup(), and all of the checks care about SEV being fully
enabled (as opposed to intentionally handling the scenario where
max_sev_asid is valid but SEV enabling fails due to OOM).

Signed-off-by: Sean Christopherson 


Ultimately, the #ifdef CONFIG_KVM_AMD_SEV that you added, which #defines 
sev_enabled and sev_es_enabled to false, resolves the build issue that occurs 
when kvm_amd is built into the kernel and ccp is built as a module; that is 
the case svm_sev_enabled() was originally created to handle.

Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 6 +++---
  arch/x86/kvm/svm/svm.h | 5 -
  2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a2c3e2d42a7f..7e14514dd083 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1057,7 +1057,7 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
struct kvm_sev_cmd sev_cmd;
int r;
  
-	if (!svm_sev_enabled() || !sev_enabled)

+   if (!sev_enabled)
return -ENOTTY;
  
  	if (!argp)

@@ -1321,7 +1321,7 @@ void __init sev_hardware_setup(void)
  
  void sev_hardware_teardown(void)

  {
-   if (!svm_sev_enabled())
+   if (!sev_enabled)
return;
  
  	bitmap_free(sev_asid_bitmap);

@@ -1332,7 +1332,7 @@ void sev_hardware_teardown(void)
  
  int sev_cpu_init(struct svm_cpu_data *sd)

  {
-   if (!svm_sev_enabled())
+   if (!sev_enabled)
return 0;
  
  	sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1, sizeof(void *),

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4eb4bab0ca3e..8cb4395b58a0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -569,11 +569,6 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
  
  extern unsigned int max_sev_asid;
  
-static inline bool svm_sev_enabled(void)

-{
-   return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
-}
-
  void sev_vm_destroy(struct kvm *kvm);
  int svm_mem_enc_op(struct kvm *kvm, void __user *argp);
  int svm_register_enc_region(struct kvm *kvm,



Re: [PATCH v2 11/14] KVM: SVM: Move SEV VMCB tracking allocation to sev.c

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:37 PM, Sean Christopherson wrote:

Move the allocation of the SEV VMCB array to sev.c to help pave the way
toward encapsulating SEV enabling wholly within sev.c.

No functional change intended.

Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 13 +
  arch/x86/kvm/svm/svm.c | 17 -
  arch/x86/kvm/svm/svm.h |  1 +
  3 files changed, 22 insertions(+), 9 deletions(-)



Re: [PATCH v2 11/14] KVM: SVM: Move SEV VMCB tracking allocation to sev.c

2021-01-14 Thread Tom Lendacky

On 1/14/21 3:37 PM, Brijesh Singh wrote:


On 1/13/21 6:37 PM, Sean Christopherson wrote:

Move the allocation of the SEV VMCB array to sev.c to help pave the way
toward encapsulating SEV enabling wholly within sev.c.

No functional change intended.

Signed-off-by: Sean Christopherson 
---
  arch/x86/kvm/svm/sev.c | 13 +
  arch/x86/kvm/svm/svm.c | 17 -
  arch/x86/kvm/svm/svm.h |  1 +
  3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1a143340103e..a2c3e2d42a7f 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1330,6 +1330,19 @@ void sev_hardware_teardown(void)
sev_flush_asids();
  }
  
+int sev_cpu_init(struct svm_cpu_data *sd)

+{
+   if (!svm_sev_enabled())
+   return 0;
+
+   sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1, sizeof(void *),
+ GFP_KERNEL | __GFP_ZERO);



I saw Tom recommended using kzalloc... instead of __GFP_ZERO in a previous


kcalloc :)
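
I.e., something like this (untested):

	sd->sev_vmcbs = kcalloc(max_sev_asid + 1, sizeof(void *), GFP_KERNEL);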

Thanks,
Tom


patch. With that fixed,

Reviewed-by: Brijesh Singh 



+   if (!sd->sev_vmcbs)
+   return -ENOMEM;
+
+   return 0;
+}
+
  /*
   * Pages used by hardware to hold guest encrypted state must be flushed before
   * returning them to the system.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index bb7b99743bea..89b95fb87a0c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -552,23 +552,22 @@ static void svm_cpu_uninit(int cpu)
  static int svm_cpu_init(int cpu)
  {
struct svm_cpu_data *sd;
+   int ret;
  
  	sd = kzalloc(sizeof(struct svm_cpu_data), GFP_KERNEL);

if (!sd)
return -ENOMEM;
sd->cpu = cpu;
sd->save_area = alloc_page(GFP_KERNEL);
-   if (!sd->save_area)
+   if (!sd->save_area) {
+   ret = -ENOMEM;
goto free_cpu_data;
+   }
clear_page(page_address(sd->save_area));
  
-	if (svm_sev_enabled()) {

-   sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1,
- sizeof(void *),
- GFP_KERNEL | __GFP_ZERO);
-   if (!sd->sev_vmcbs)
-   goto free_save_area;
-   }
+   ret = sev_cpu_init(sd);
+   if (ret)
+   goto free_save_area;
  
  	per_cpu(svm_data, cpu) = sd;
  
@@ -578,7 +577,7 @@ static int svm_cpu_init(int cpu)

__free_page(sd->save_area);
  free_cpu_data:
kfree(sd);
-   return -ENOMEM;
+   return ret;
  
  }
  
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h

index 8e169835f52a..4eb4bab0ca3e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -583,6 +583,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
  void pre_sev_run(struct vcpu_svm *svm, int cpu);
  void __init sev_hardware_setup(void);
  void sev_hardware_teardown(void);
+int sev_cpu_init(struct svm_cpu_data *sd);
  void sev_free_vcpu(struct kvm_vcpu *vcpu);
  int sev_handle_vmgexit(struct vcpu_svm *svm);
  int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in);


Re: [PATCH v2 10/14] KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:37 PM, Sean Christopherson wrote:

Query max_sev_asid directly after setting it instead of bouncing through
its wrapper, svm_sev_enabled().  Using the wrapper is unnecessary
obfuscation.

No functional change intended.

Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 02a66008e9b9..1a143340103e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1278,8 +1278,7 @@ void __init sev_hardware_setup(void)
  
  	/* Maximum number of encrypted guests supported simultaneously */

max_sev_asid = ecx;
-
-   if (!svm_sev_enabled())
+   if (!max_sev_asid)
goto out;
  
  	/* Minimum ASID value that should be used for SEV guest */




Re: [PATCH v2 09/14] KVM: SVM: Unconditionally invoke sev_hardware_teardown()

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:37 PM, Sean Christopherson wrote:

Remove the redundant svm_sev_enabled() check when calling
sev_hardware_teardown(), the teardown helper itself does the check.
Removing the check from svm.c will eventually allow dropping
svm_sev_enabled() entirely.

No functional change intended.

Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/svm.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f89f702b2a58..bb7b99743bea 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -887,8 +887,7 @@ static void svm_hardware_teardown(void)
  {
int cpu;
  
-	if (svm_sev_enabled())

-   sev_hardware_teardown();
+   sev_hardware_teardown();
  
  	for_each_possible_cpu(cpu)

svm_cpu_uninit(cpu);



Re: [PATCH v2 08/14] KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:37 PM, Sean Christopherson wrote:

Define sev_enabled and sev_es_enabled as 'false' and explicitly #ifdef
out all of sev_hardware_setup() if CONFIG_KVM_AMD_SEV=n.  This kills
three birds at once:

   - Makes sev_enabled and sev_es_enabled off by default if
 CONFIG_KVM_AMD_SEV=n.  Previously, they could be on by default if
 CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y, regardless of KVM SEV
 support.

   - Hides the sev and sev_es module params when CONFIG_KVM_AMD_SEV=n.

   - Resolves a false positive -Wnonnull in __sev_recycle_asids() that is
 currently masked by the equivalent IS_ENABLED(CONFIG_KVM_AMD_SEV)
 check in svm_sev_enabled(), which will be dropped in a future patch.

Cc: Tom Lendacky 
Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 9 -
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a024edabaca5..02a66008e9b9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -28,12 +28,17 @@
  #define __ex(x) __kvm_handle_fault_on_reboot(x)
  
  /* enable/disable SEV support */

+#ifdef CONFIG_KVM_AMD_SEV
  static bool sev_enabled = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
  module_param_named(sev, sev_enabled, bool, 0444);
  
  /* enable/disable SEV-ES support */

  static bool sev_es_enabled = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
  module_param_named(sev_es, sev_es_enabled, bool, 0444);
+#else
+#define sev_enabled false
+#define sev_es_enabled false
+#endif /* CONFIG_KVM_AMD_SEV */
  
  static u8 sev_enc_bit;

  static int sev_flush_asids(void);
@@ -1253,11 +1258,12 @@ void sev_vm_destroy(struct kvm *kvm)
  
  void __init sev_hardware_setup(void)

  {
+#ifdef CONFIG_KVM_AMD_SEV
unsigned int eax, ebx, ecx, edx;
bool sev_es_supported = false;
bool sev_supported = false;
  
-	if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev_enabled)

+   if (!sev_enabled)
goto out;
  
  	/* Does the CPU support SEV? */

@@ -1311,6 +1317,7 @@ void __init sev_hardware_setup(void)
  out:
sev_enabled = sev_supported;
sev_es_enabled = sev_es_supported;
+#endif
  }
  
  void sev_hardware_teardown(void)




Re: [PATCH v2 03/14] KVM: SVM: Move SEV module params/variables to sev.c

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:36 PM, Sean Christopherson wrote:

Unconditionally invoke sev_hardware_setup() when configuring SVM and
handle clearing the module params/variable 'sev' and 'sev_es' in
sev_hardware_setup().  This allows making said variables static within
sev.c and reduces the odds of a collision with guest code, e.g. the guest
side of things has already laid claim to 'sev_enabled'.

Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 11 +++
  arch/x86/kvm/svm/svm.c | 15 +--
  arch/x86/kvm/svm/svm.h |  2 --
  3 files changed, 12 insertions(+), 16 deletions(-)



Re: [PATCH v2 02/14] KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails

2021-01-14 Thread Tom Lendacky

On 1/14/21 11:12 AM, Sean Christopherson wrote:

On Thu, Jan 14, 2021, Tom Lendacky wrote:

On 1/13/21 6:36 PM, Sean Christopherson wrote:

Free sev_asid_bitmap if the reclaim bitmap allocation fails, otherwise
KVM will unnecessarily keep the bitmap when SEV is not fully enabled.

Freeing the page is also necessary to avoid introducing a bug when a
future patch eliminates svm_sev_enabled() in favor of using the global
'sev' flag directly.  While svm_sev_enabled() checks max_sev_asid,
which is true even if KVM setup fails, 'sev' will be true if and only
if KVM setup fully succeeds.

Fixes: 33af3a7ef9e6 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")


Oops, missed this last time... I don't think the Fixes: tag is needed 
anymore unless you don't want the memory consumption of the first bitmap, 
should the allocation of the second bitmap fail, until kvm_amd is 
rmmod'ed. Up to you.


Thanks,
Tom


Cc: Tom Lendacky 
Signed-off-by: Sean Christopherson 
---
   arch/x86/kvm/svm/sev.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c8ffdbc81709..0eeb6e1b803d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1274,8 +1274,10 @@ void __init sev_hardware_setup(void)
goto out;
sev_reclaim_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
-   if (!sev_reclaim_asid_bitmap)
+   if (!sev_reclaim_asid_bitmap) {
+   bitmap_free(sev_asid_bitmap);


Until that future change, you probably need to do sev_asid_bitmap = NULL
here to avoid an issue in sev_hardware_teardown() when it tries to free it
again.


Argh, you're right.  Thanks!



Re: [PATCH v2 06/14] x86/sev: Drop redundant and potentially misleading 'sev_enabled'

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:37 PM, Sean Christopherson wrote:

Drop the sev_enabled flag and switch its one user over to sev_active().
sev_enabled was made redundant with the introduction of sev_status in
commit b57de6cd1639 ("x86/sev-es: Add SEV-ES Feature Detection").
sev_enabled and sev_active() are guaranteed to be equivalent, as each is
true iff 'sev_status & MSR_AMD64_SEV_ENABLED' is true, and are only ever
written in tandem (ignoring compressed boot's version of sev_status).

Removing sev_enabled avoids confusion over whether it refers to the guest
or the host, and will also allow KVM to usurp "sev_enabled" for its own
purposes.

No functional change intended.

Signed-off-by: Sean Christopherson 


Reviewed-by: Tom Lendacky 


---
  arch/x86/include/asm/mem_encrypt.h |  1 -
  arch/x86/mm/mem_encrypt.c  | 12 +---
  arch/x86/mm/mem_encrypt_identity.c |  1 -
  3 files changed, 5 insertions(+), 9 deletions(-)



Re: [PATCH v2 01/14] KVM: SVM: Zero out the VMCB array used to track SEV ASID association

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:36 PM, Sean Christopherson wrote:

Zero out the array of VMCB pointers so that pre_sev_run() won't see
garbage when querying the array to detect when an SEV ASID is being
associated with a new VMCB.  In practice, reading random values is all
but guaranteed to be benign as a false negative (which is extremely
unlikely on its own) can only happen on CPU0 on the first VMRUN and would
only cause KVM to skip the ASID flush.  For anything bad to happen, a
previous instance of KVM would have to exit without flushing the ASID,
_and_ KVM would have to not flush the ASID at any time while building the
new SEV guest.

Cc: Borislav Petkov 
Cc: Tom Lendacky 
Cc: Brijesh Singh 
Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is 
enabled")
Signed-off-by: Sean Christopherson 
---
  arch/x86/kvm/svm/svm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7ef171790d02..ccf52c5531fb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -573,7 +573,7 @@ static int svm_cpu_init(int cpu)
if (svm_sev_enabled()) {
sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1,
  sizeof(void *),
- GFP_KERNEL);
+ GFP_KERNEL | __GFP_ZERO);


Alternatively, this call could just be changed to kcalloc().

Either way,

Reviewed-by: Tom Lendacky 


if (!sd->sev_vmcbs)
goto free_save_area;
}



Re: [PATCH v2 02/14] KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails

2021-01-14 Thread Tom Lendacky

On 1/13/21 6:36 PM, Sean Christopherson wrote:

Free sev_asid_bitmap if the reclaim bitmap allocation fails, otherwise
KVM will unnecessarily keep the bitmap when SEV is not fully enabled.

Freeing the page is also necessary to avoid introducing a bug when a
future patch eliminates svm_sev_enabled() in favor of using the global
'sev' flag directly.  While svm_sev_enabled() checks max_sev_asid,
which is true even if KVM setup fails, 'sev' will be true if and only
if KVM setup fully succeeds.

Fixes: 33af3a7ef9e6 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")
Cc: Tom Lendacky 
Signed-off-by: Sean Christopherson 
---
  arch/x86/kvm/svm/sev.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c8ffdbc81709..0eeb6e1b803d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1274,8 +1274,10 @@ void __init sev_hardware_setup(void)
goto out;
  
  	sev_reclaim_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);

-   if (!sev_reclaim_asid_bitmap)
+   if (!sev_reclaim_asid_bitmap) {
+   bitmap_free(sev_asid_bitmap);


Until that future change, you probably need to do sev_asid_bitmap = NULL 
here to avoid an issue in sev_hardware_teardown() when it tries to free it 
again.
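
Something along these lines (sketch only):

	sev_reclaim_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
	if (!sev_reclaim_asid_bitmap) {
		bitmap_free(sev_asid_bitmap);
		sev_asid_bitmap = NULL;	/* keep sev_hardware_teardown() from freeing it again */
		goto out;
	}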


Thanks,
Tom


goto out;
+   }
  
  	pr_info("SEV supported: %u ASIDs\n", max_sev_asid - min_sev_asid + 1);

sev_supported = true;



Re: [PATCH V2] x86/sev-es: Fix SEV-ES #VC handler for string port IO

2021-01-11 Thread Tom Lendacky

On 1/10/21 1:11 AM, Hyunwook (Wooky) Baek wrote:

Don't assume dest/source buffers are userspace addresses when manually
copying data for string I/O or MOVS MMIO, as {get,put}_user() will fail
if handed a kernel address and ultimately lead to a kernel panic.

Signed-off-by: Hyunwook (Wooky) Baek 
Acked-by: David Rientjes 
---

This patch is tested by invoking INSB/OUTSB instructions in kernel space in a
SEV-ES-enabled VM. Without the patch, the kernel crashed with the following
message:
   "SEV-ES: Unsupported exception in #VC instruction emulation - can't continue"
With the patch, the instructions successfully read/wrote the string from/to
the I/O port.


Shouldn't this have a Fixes: tag?

Thanks,
Tom



  arch/x86/kernel/sev-es.c | 12 
  1 file changed, 12 insertions(+)



Re: [PATCH 11/13] KVM: SVM: Drop redundant svm_sev_enabled() helper

2021-01-11 Thread Tom Lendacky
On 1/8/21 6:47 PM, Sean Christopherson wrote:
> Replace calls to svm_sev_enabled() with direct checks on sev_enabled, or
> in the case of svm_mem_enc_op, simply drop the call to svm_sev_enabled().
> This effectively replaces checks against a valid max_sev_asid with checks
> against sev_enabled.  sev_enabled is forced off by sev_hardware_setup()
> if max_sev_asid is invalid, all call sites are guaranteed to run after
> sev_hardware_setup(), and all of the checks care about SEV being fully
> enabled (as opposed to intentionally handling the scenario where
> max_sev_asid is valid but SEV enabling fails due to OOM).
> 
> Signed-off-by: Sean Christopherson 
> ---
>   arch/x86/kvm/svm/sev.c | 6 +++---
>   arch/x86/kvm/svm/svm.h | 5 -
>   2 files changed, 3 insertions(+), 8 deletions(-)
> 

With CONFIG_KVM=y, CONFIG_KVM_AMD=y and CONFIG_CRYPTO_DEV_CCP_DD=m, I get
the following build warning:

make: Entering directory '/root/kernels/kvm-build-x86_64'
  DESCEND  objtool   
  CALLscripts/atomic/check-atomics.sh
  CALLscripts/checksyscalls.sh
  CHK include/generated/compile.h   

  CC  arch/x86/kvm/svm/svm.o


  CC  arch/x86/kvm/svm/nested.o 

  CC  arch/x86/kvm/svm/avic.o   


  CC  arch/x86/kvm/svm/sev.o  
In file included from ./include/linux/cpumask.h:12, 
 from ./arch/x86/include/asm/cpumask.h:5,
 from ./arch/x86/include/asm/msr.h:11,
 from ./arch/x86/include/asm/processor.h:22,
 from ./arch/x86/include/asm/cpufeature.h:5,


 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:78,
 from ./include/linux/percpu.h:6,
 from ./include/linux/context_tracking_state.h:5,
 from ./include/linux/hardirq.h:5,
 from ./include/linux/kvm_host.h:7,
 from arch/x86/kvm/svm/sev.c:11:
In function ‘bitmap_zero’,
inlined from ‘__sev_recycle_asids’ at arch/x86/kvm/svm/sev.c:92:2,
inlined from ‘sev_asid_new’ at arch/x86/kvm/svm/sev.c:113:16,
inlined from ‘sev_guest_init’ at arch/x86/kvm/svm/sev.c:195:9:
./include/linux/bitmap.h:238:2: warning: argument 1 null where non-null 
expected [-Wnonnull]
  238 |  memset(dst, 0, len);
  |  ^~~
In file included from ./arch/x86/include/asm/string.h:5,
 from ./include/linux/string.h:20,
 from ./include/linux/bitmap.h:9,
 from ./include/linux/cpumask.h:12,
 from ./arch/x86/include/asm/cpumask.h:5,
 from ./arch/x86/include/asm/msr.h:11,
 from ./arch/x86/include/asm/processor.h:22,
 from ./arch/x86/include/asm/cpufeature.h:5,
 from ./arch/x86/include/asm/thread_info.h:53,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:78,
 from ./include/linux/percpu.h:6,
 from ./include/linux/context_tracking_state.h:5,
 from ./include/linux/hardirq.h:5,
 from ./include/linux/kvm_host.h:7,
 from arch/x86/kvm/svm/sev.c:11:
arch/x86/kvm/svm/sev.c: In function ‘sev_guest_init’:
./arch/x86/include/asm/string_64.h:18:7: note: in a call to function ‘memset’ 
declared here
   18 | void *memset(void *s, int c, size_t n);
  |   ^~

Thanks,
Tom

> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 8c34c467a09d..1b9174a49b65 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1052,7 +1052,7 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   struct kvm_sev_cmd sev_cmd;
>   int r;
>   
> - if (!svm_sev_enabled() || !sev_enabled)
> + if (!sev_enabled)
>   return -ENOTTY;
>   
>   if (!argp)
> @@ -1314,7 +1314,7 @@ void __init sev_hardware_setup(void)
>   
>   void sev_hardware_teardown(void)
>   {
> - if (!svm_sev_enabled())
> + if (!sev_enabled)
>   return;
>   
>   bitmap_free(sev_asid_bitmap);
> @@ -1325,7 +1325,7 @@ void sev_hardware_teardown(void)
>   
>   int sev_cpu_init(struct svm_cpu_data *sd)
>   {
> - if (!svm_sev_enabled())
> +

Re: [PATCH 06/13] x86/sev: Rename global "sev_enabled" flag to "sev_guest"

2021-01-11 Thread Tom Lendacky

On 1/11/21 10:02 AM, Tom Lendacky wrote:

On 1/8/21 6:47 PM, Sean Christopherson wrote:

Use "guest" instead of "enabled" for the global "running as an SEV guest"
flag to avoid confusion over whether "sev_enabled" refers to the guest or
the host.  This will also allow KVM to usurp "sev_enabled" for its own
purposes.

No functional change intended.

Signed-off-by: Sean Christopherson 


Acked-by: Tom Lendacky 


Ah, I tried building with CONFIG_KVM=y and CONFIG_KVM_AMD=y and got a 
build error:


In file included from arch/x86/kvm/svm/svm.c:43:
arch/x86/kvm/svm/svm.h:222:20: error: ‘sev_guest’ redeclared as different 
kind of symbol

  222 | static inline bool sev_guest(struct kvm *kvm)
  |^
In file included from ./include/linux/mem_encrypt.h:17,
 from ./arch/x86/include/asm/page_types.h:7,
 from ./arch/x86/include/asm/page.h:9,
 from ./arch/x86/include/asm/thread_info.h:12,
 from ./include/linux/thread_info.h:38,
 from ./arch/x86/include/asm/preempt.h:7,
 from ./include/linux/preempt.h:78,
 from ./include/linux/percpu.h:6,
 from ./include/linux/context_tracking_state.h:5,
 from ./include/linux/hardirq.h:5,
 from ./include/linux/kvm_host.h:7,
 from arch/x86/kvm/svm/svm.c:3:
./arch/x86/include/asm/mem_encrypt.h:23:13: note: previous declaration of 
‘sev_guest’ was here

   23 | extern bool sev_guest;
  | ^

Thanks,
Tom




---
  arch/x86/include/asm/mem_encrypt.h | 2 +-
  arch/x86/mm/mem_encrypt.c  | 4 ++--
  arch/x86/mm/mem_encrypt_identity.c | 2 +-
  3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h

index 2f62bbdd9d12..9b3990928674 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -20,7 +20,7 @@
  extern u64 sme_me_mask;
  extern u64 sev_status;
-extern bool sev_enabled;
+extern bool sev_guest;
  void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
   unsigned long decrypted_kernel_vaddr,
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index bc0833713be9..0f798355de03 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -44,7 +44,7 @@ EXPORT_SYMBOL(sme_me_mask);
  DEFINE_STATIC_KEY_FALSE(sev_enable_key);
  EXPORT_SYMBOL_GPL(sev_enable_key);
-bool sev_enabled __section(".data");
+bool sev_guest __section(".data");
  /* Buffer used for early in-place encryption by BSP, no locking needed */
  static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
@@ -344,7 +344,7 @@ int __init early_set_memory_encrypted(unsigned long 
vaddr, unsigned long size)

   */
  bool sme_active(void)
  {
-    return sme_me_mask && !sev_enabled;
+    return sme_me_mask && !sev_guest;
  }
  bool sev_active(void)
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c

index 6c5eb6f3f14f..91b6b899c02b 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -545,7 +545,7 @@ void __init sme_enable(struct boot_params *bp)
  /* SEV state cannot be controlled by a command line option */
  sme_me_mask = me_mask;
-    sev_enabled = true;
+    sev_guest = true;
  physical_mask &= ~sme_me_mask;
  return;
  }



Re: [PATCH 07/13] KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables

2021-01-11 Thread Tom Lendacky

On 1/8/21 6:47 PM, Sean Christopherson wrote:

Rename sev and sev_es to sev_enabled and sev_es_enabled respectively to
better align with other KVM terminology, and to avoid pseudo-shadowing
when the variables are moved to sev.c in a future patch ('sev' is often
used for local struct kvm_sev_info pointers).

No functional change intended.

Signed-off-by: Sean Christopherson 


Acked-by: Tom Lendacky 


---
  arch/x86/kvm/svm/sev.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8ba93b8fa435..a024edabaca5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -28,12 +28,12 @@
  #define __ex(x) __kvm_handle_fault_on_reboot(x)
  
  /* enable/disable SEV support */

-static int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
-module_param(sev, int, 0444);
+static bool sev_enabled = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+module_param_named(sev, sev_enabled, bool, 0444);
  
  /* enable/disable SEV-ES support */

-static int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
-module_param(sev_es, int, 0444);
+static bool sev_es_enabled = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+module_param_named(sev_es, sev_es_enabled, bool, 0444);
  
  static u8 sev_enc_bit;

  static int sev_flush_asids(void);
@@ -213,7 +213,7 @@ static int sev_guest_init(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
  
  static int sev_es_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)

  {
-   if (!sev_es)
+   if (!sev_es_enabled)
return -ENOTTY;
  
  	to_kvm_svm(kvm)->sev_info.es_active = true;

@@ -1052,7 +1052,7 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
struct kvm_sev_cmd sev_cmd;
int r;
  
-	if (!svm_sev_enabled() || !sev)

+   if (!svm_sev_enabled() || !sev_enabled)
return -ENOTTY;
  
  	if (!argp)

@@ -1257,7 +1257,7 @@ void __init sev_hardware_setup(void)
bool sev_es_supported = false;
bool sev_supported = false;
  
-	if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev)

+   if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev_enabled)
goto out;
  
  	/* Does the CPU support SEV? */

@@ -1294,7 +1294,7 @@ void __init sev_hardware_setup(void)
sev_supported = true;
  
  	/* SEV-ES support requested? */

-   if (!sev_es)
+   if (!sev_es_enabled)
goto out;
  
  	/* Does the CPU support SEV-ES? */

@@ -1309,8 +1309,8 @@ void __init sev_hardware_setup(void)
sev_es_supported = true;
  
  out:

-   sev = sev_supported;
-   sev_es = sev_es_supported;
+   sev_enabled = sev_supported;
+   sev_es_enabled = sev_es_supported;
  }
  
  void sev_hardware_teardown(void)




Re: [PATCH 06/13] x86/sev: Rename global "sev_enabled" flag to "sev_guest"

2021-01-11 Thread Tom Lendacky

On 1/8/21 6:47 PM, Sean Christopherson wrote:

Use "guest" instead of "enabled" for the global "running as an SEV guest"
flag to avoid confusion over whether "sev_enabled" refers to the guest or
the host.  This will also allow KVM to usurp "sev_enabled" for its own
purposes.

No functional change intended.

Signed-off-by: Sean Christopherson 


Acked-by: Tom Lendacky 


---
  arch/x86/include/asm/mem_encrypt.h | 2 +-
  arch/x86/mm/mem_encrypt.c  | 4 ++--
  arch/x86/mm/mem_encrypt_identity.c | 2 +-
  3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 2f62bbdd9d12..9b3990928674 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -20,7 +20,7 @@
  
  extern u64 sme_me_mask;

  extern u64 sev_status;
-extern bool sev_enabled;
+extern bool sev_guest;
  
  void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,

 unsigned long decrypted_kernel_vaddr,
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index bc0833713be9..0f798355de03 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -44,7 +44,7 @@ EXPORT_SYMBOL(sme_me_mask);
  DEFINE_STATIC_KEY_FALSE(sev_enable_key);
  EXPORT_SYMBOL_GPL(sev_enable_key);
  
-bool sev_enabled __section(".data");

+bool sev_guest __section(".data");
  
  /* Buffer used for early in-place encryption by BSP, no locking needed */

  static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
@@ -344,7 +344,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, 
unsigned long size)
   */
  bool sme_active(void)
  {
-   return sme_me_mask && !sev_enabled;
+   return sme_me_mask && !sev_guest;
  }
  
  bool sev_active(void)

diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
index 6c5eb6f3f14f..91b6b899c02b 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/mm/mem_encrypt_identity.c
@@ -545,7 +545,7 @@ void __init sme_enable(struct boot_params *bp)
  
  		/* SEV state cannot be controlled by a command line option */

sme_me_mask = me_mask;
-   sev_enabled = true;
+   sev_guest = true;
physical_mask &= ~sme_me_mask;
return;
}



Re: [PATCH 03/13] KVM: SVM: Move SEV module params/variables to sev.c

2021-01-11 Thread Tom Lendacky

On 1/11/21 4:42 AM, Vitaly Kuznetsov wrote:

Sean Christopherson  writes:


Unconditionally invoke sev_hardware_setup() when configuring SVM and
handle clearing the module params/variable 'sev' and 'sev_es' in
sev_hardware_setup().  This allows making said variables static within
sev.c and reduces the odds of a collision with guest code, e.g. the guest
side of things has already laid claim to 'sev_enabled'.

Signed-off-by: Sean Christopherson 
---
  arch/x86/kvm/svm/sev.c | 11 +++
  arch/x86/kvm/svm/svm.c | 15 +--
  arch/x86/kvm/svm/svm.h |  2 --
  3 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0eeb6e1b803d..8ba93b8fa435 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -27,6 +27,14 @@
  
  #define __ex(x) __kvm_handle_fault_on_reboot(x)
  
+/* enable/disable SEV support */

+static int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+module_param(sev, int, 0444);
+
+/* enable/disable SEV-ES support */
+static int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+module_param(sev_es, int, 0444);


Two stupid questions (and not really related to your patch) for
self-education, if I may:

1) Why do we rely on CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT (which
sounds like it controls the guest side of things) to set defaults here?


I thought it was a review comment, but I'm not able to find it now.

Brijesh probably remembers better than me.



2) It appears to be possible to do 'modprobe kvm_amd sev=0 sev_es=1', which
looks like a bogus configuration; should we make an effort to
validate the correctness upon module load?


This will still result in an overall sev=0 sev_es=0. Is the question just 
about issuing a message based on the initial values specified?
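
For reference, with the current sev_hardware_setup() flow (fragment below,
detection details elided), sev=0 short-circuits the SEV-ES checks as well:

	if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev)
		goto out;		/* the sev_es checks below are never reached */

	/* ... SEV and SEV-ES CPU/firmware detection ... */

out:
	sev = sev_supported;		/* still false */
	sev_es = sev_es_supported;	/* still false, despite sev_es=1 */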


Thanks,
Tom




+
  static u8 sev_enc_bit;
  static int sev_flush_asids(void);
  static DECLARE_RWSEM(sev_deactivate_lock);
@@ -1249,6 +1257,9 @@ void __init sev_hardware_setup(void)
bool sev_es_supported = false;
bool sev_supported = false;
  
+	if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev)

+   goto out;
+
/* Does the CPU support SEV? */
if (!boot_cpu_has(X86_FEATURE_SEV))
goto out;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index ccf52c5531fb..f89f702b2a58 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -189,14 +189,6 @@ module_param(vls, int, 0444);
  static int vgif = true;
  module_param(vgif, int, 0444);
  
-/* enable/disable SEV support */

-int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
-module_param(sev, int, 0444);
-
-/* enable/disable SEV-ES support */
-int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
-module_param(sev_es, int, 0444);
-
  bool __read_mostly dump_invalid_vmcb;
  module_param(dump_invalid_vmcb, bool, 0644);
  
@@ -976,12 +968,7 @@ static __init int svm_hardware_setup(void)

kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
}
  
-	if (IS_ENABLED(CONFIG_KVM_AMD_SEV) && sev) {

-   sev_hardware_setup();
-   } else {
-   sev = false;
-   sev_es = false;
-   }
+   sev_hardware_setup();
  
  	svm_adjust_mmio_mask();
  
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h

index 0fe874ae5498..8e169835f52a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -408,8 +408,6 @@ static inline bool gif_set(struct vcpu_svm *svm)
  #define MSR_CR3_LONG_MBZ_MASK 0xfff0U
  #define MSR_INVALID   0xU
  
-extern int sev;

-extern int sev_es;
  extern bool dump_invalid_vmcb;
  
  u32 svm_msrpm_offset(u32 msr);




Re: [PATCH 03/13] KVM: SVM: Move SEV module params/variables to sev.c

2021-01-11 Thread Tom Lendacky
On 1/8/21 6:47 PM, Sean Christopherson wrote:
> Unconditionally invoke sev_hardware_setup() when configuring SVM and
> handle clearing the module params/variable 'sev' and 'sev_es' in
> sev_hardware_setup().  This allows making said variables static within
> sev.c and reduces the odds of a collision with guest code, e.g. the guest
> side of things has already laid claim to 'sev_enabled'.
> 
> Signed-off-by: Sean Christopherson 
> ---
>   arch/x86/kvm/svm/sev.c | 11 +++
>   arch/x86/kvm/svm/svm.c | 15 +--
>   arch/x86/kvm/svm/svm.h |  2 --
>   3 files changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0eeb6e1b803d..8ba93b8fa435 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -27,6 +27,14 @@
>   
>   #define __ex(x) __kvm_handle_fault_on_reboot(x)
>   
> +/* enable/disable SEV support */
> +static int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> +module_param(sev, int, 0444);
> +
> +/* enable/disable SEV-ES support */
> +static int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> +module_param(sev_es, int, 0444);
> +
>   static u8 sev_enc_bit;
>   static int sev_flush_asids(void);
>   static DECLARE_RWSEM(sev_deactivate_lock);
> @@ -1249,6 +1257,9 @@ void __init sev_hardware_setup(void)
>   bool sev_es_supported = false;
>   bool sev_supported = false;
>   
> + if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev)
> + goto out;
> +
>   /* Does the CPU support SEV? */
>   if (!boot_cpu_has(X86_FEATURE_SEV))
>   goto out;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ccf52c5531fb..f89f702b2a58 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -189,14 +189,6 @@ module_param(vls, int, 0444);
>   static int vgif = true;
>   module_param(vgif, int, 0444);
>   
> -/* enable/disable SEV support */
> -int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> -module_param(sev, int, 0444);
> -
> -/* enable/disable SEV-ES support */
> -int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> -module_param(sev_es, int, 0444);
> -
>   bool __read_mostly dump_invalid_vmcb;
>   module_param(dump_invalid_vmcb, bool, 0644);
>   
> @@ -976,12 +968,7 @@ static __init int svm_hardware_setup(void)
>   kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
>   }
>   
> - if (IS_ENABLED(CONFIG_KVM_AMD_SEV) && sev) {
> - sev_hardware_setup();
> - } else {
> - sev = false;
> - sev_es = false;
> - }
> + sev_hardware_setup();

I believe the reason for the original if statement was similar to:

  853c110982ea ("KVM: x86: support CONFIG_KVM_AMD=y with 
CONFIG_CRYPTO_DEV_CCP_DD=m")

But with the removal of sev_platform_status() from sev_hardware_setup(),
I think it's ok to call sev_hardware_setup() no matter what now.

Thanks,
Tom

>   
>   svm_adjust_mmio_mask();
>   
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0fe874ae5498..8e169835f52a 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -408,8 +408,6 @@ static inline bool gif_set(struct vcpu_svm *svm)
>   #define MSR_CR3_LONG_MBZ_MASK   0xfff0U
>   #define MSR_INVALID 0xU
>   
> -extern int sev;
> -extern int sev_es;
>   extern bool dump_invalid_vmcb;
>   
>   u32 svm_msrpm_offset(u32 msr);
> 


Re: [PATCH 01/13] KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails

2021-01-11 Thread Tom Lendacky

On 1/8/21 6:47 PM, Sean Christopherson wrote:

Free sev_asid_bitmap if the reclaim bitmap allocation fails, otherwise
it will be leaked as sev_hardware_teardown() frees the bitmaps if and
only if SEV is fully enabled (which obviously isn't the case if SEV
setup fails).


The svm_sev_enabled() function is only based on CONFIG_KVM_AMD_SEV and 
max_sev_asid. So sev_hardware_teardown() should still free everything if 
it was allocated since we never change max_sev_asid, no?
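
For reference, the helper as it stands today (same definition quoted
elsewhere in this series):

static inline bool svm_sev_enabled(void)
{
	return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
}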


Thanks,
Tom



Fixes: 33af3a7ef9e6 ("KVM: SVM: Reduce WBINVD/DF_FLUSH invocations")
Cc: Tom Lendacky 
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson 
---
  arch/x86/kvm/svm/sev.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c8ffdbc81709..0eeb6e1b803d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1274,8 +1274,10 @@ void __init sev_hardware_setup(void)
goto out;
  
  	sev_reclaim_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);

-   if (!sev_reclaim_asid_bitmap)
+   if (!sev_reclaim_asid_bitmap) {
+   bitmap_free(sev_asid_bitmap);
goto out;
+   }
  
  	pr_info("SEV supported: %u ASIDs\n", max_sev_asid - min_sev_asid + 1);

sev_supported = true;



Re: [PATCH v5.1 27/34] KVM: SVM: Add support for booting APs in an SEV-ES guest

2021-01-07 Thread Tom Lendacky

On 1/7/21 12:13 PM, Paolo Bonzini wrote:

On 04/01/21 21:20, Tom Lendacky wrote:

From: Tom Lendacky 

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
   Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
   support. It is up to the guest to transfer control of the AP to the
   proper location.

Subsequent AP boot:
   KVM will expect to receive an AP Reset Hold exit event indicating that
   the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
   awaken it. When the AP Reset Hold exit event is received, KVM will place
   the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
   sequence, KVM will make the vCPU runnable. It is again up to the guest
   to then transfer control of the AP to the proper location.

   To differentiate between an actual HLT and an AP Reset Hold, a new MP
   state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
   placed in upon receiving the AP Reset Hold exit event. Additionally, to
   communicate the AP Reset Hold exit event up to userspace (if needed), a
   new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky 


Queued, thanks.


Thanks, Paolo!

Tom



Paolo



Re: [PATCH v3 1/3] KVM: SVM: use vmsave/vmload for saving/restoring additional host state

2021-01-07 Thread Tom Lendacky




On 1/7/21 9:32 AM, Tom Lendacky wrote:

On 1/5/21 11:20 AM, Sean Christopherson wrote:

On Tue, Jan 05, 2021, Michael Roth wrote:
@@ -3703,16 +3688,9 @@ static noinstr void svm_vcpu_enter_exit(struct 
kvm_vcpu *vcpu,

  if (sev_es_guest(svm->vcpu.kvm)) {
  __svm_sev_es_vcpu_run(svm->vmcb_pa);
  } else {
-    __svm_vcpu_run(svm->vmcb_pa, (unsigned long 
*)&svm->vcpu.arch.regs);

-
-#ifdef CONFIG_X86_64
-    native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
-#else
-    loadsegment(fs, svm->host.fs);
-#ifndef CONFIG_X86_32_LAZY_GS
-    loadsegment(gs, svm->host.gs);
-#endif
-#endif
+    __svm_vcpu_run(svm->vmcb_pa, (unsigned long 
*)&svm->vcpu.arch.regs,

+   page_to_phys(per_cpu(svm_data,
+    vcpu->cpu)->save_area));


Does this need to use __sme_page_pa()?


Yes, it should now. The SEV-ES support added the SME encryption bit to the 
MSR_VM_HSAVE_PA MSR, so we need to be consistent in how the data is read 
and written.


Oh, and also in svm_vcpu_load().
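
I.e., something like this (untested sketch; svm_vcpu_load() would need the
equivalent treatment):

		__svm_vcpu_run(svm->vmcb_pa, (unsigned long *)&svm->vcpu.arch.regs,
			       __sme_page_pa(per_cpu(svm_data, vcpu->cpu)->save_area));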

Thanks,
Tom


 > Thanks,
Tom




  }
  /*


...


diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 6feb8c08f45a..89f4e8e7bf0e 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -33,6 +33,7 @@
   * __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
   * @vmcb_pa:    unsigned long
   * @regs:    unsigned long * (to guest registers)
+ * @hostsa_pa:    unsigned long
   */
  SYM_FUNC_START(__svm_vcpu_run)
  push %_ASM_BP
@@ -47,6 +48,9 @@ SYM_FUNC_START(__svm_vcpu_run)
  #endif
  push %_ASM_BX
+    /* Save @hostsa_pa */
+    push %_ASM_ARG3
+
  /* Save @regs. */
  push %_ASM_ARG2
@@ -154,6 +158,12 @@ SYM_FUNC_START(__svm_vcpu_run)
  xor %r15d, %r15d
  #endif
+    /* "POP" @hostsa_pa to RAX. */
+    pop %_ASM_AX
+
+    /* Restore host user state and FS/GS base */
+    vmload %_ASM_AX


This VMLOAD needs the "handle fault on reboot" goo.  Seeing the code, I 
think
I'd prefer to handle this in C code, especially if Paolo takes the 
svm_ops.h

patch[*].  Actually, I think with that patch it'd make sense to move the
existing VMSAVE+VMLOAD for the guest into svm.c, too.  And completely 
unrelated,
the fault handling in svm/vmenter.S can be cleaned up a smidge to 
eliminate the

JMPs.

Paolo, what do you think about me folding these patches into my series 
to do the
above cleanups?  And maybe sending a pull request for the end result?  
(I'd also
like to add on a patch to use the user return MSR mechanism for 
MSR_TSC_AUX).


[*] 
https://lkml.kernel.org/r/20201231002702.2223707-8-seanjc@google.com




+
  pop %_ASM_BX
  #ifdef CONFIG_X86_64
--
2.25.1



Re: [PATCH v3 1/3] KVM: SVM: use vmsave/vmload for saving/restoring additional host state

2021-01-07 Thread Tom Lendacky

On 1/5/21 8:37 AM, Michael Roth wrote:

Using a guest workload which simply issues 'hlt' in a tight loop to
generate VMEXITs, it was observed (on a recent EPYC processor) that a
significant amount of the VMEXIT overhead measured on the host was the
result of MSR reads/writes in svm_vcpu_load/svm_vcpu_put according to
perf:

   67.49%--kvm_arch_vcpu_ioctl_run
   |
   |--23.13%--vcpu_put
   |  kvm_arch_vcpu_put
   |  |
   |  |--21.31%--native_write_msr
   |  |
   |   --1.27%--svm_set_cr4
   |
   |--16.11%--vcpu_load
   |  |
   |   --15.58%--kvm_arch_vcpu_load
   | |
   | |--13.97%--svm_set_cr4
   | |  |
   | |  |--12.64%--native_read_msr

Most of these MSRs relate to 'syscall'/'sysenter' and segment bases, and
can be saved/restored using 'vmsave'/'vmload' instructions rather than
explicit MSR reads/writes. In doing so there is a significant reduction
in the svm_vcpu_load/svm_vcpu_put overhead measured for the above
workload:

   50.92%--kvm_arch_vcpu_ioctl_run
   |
   |--19.28%--disable_nmi_singlestep
   |
   |--13.68%--vcpu_load
   |  kvm_arch_vcpu_load
   |  |
   |  |--9.19%--svm_set_cr4
   |  |  |
   |  |   --6.44%--native_read_msr
   |  |
   |   --3.55%--native_write_msr
   |
   |--6.05%--kvm_inject_nmi
   |--2.80%--kvm_sev_es_mmio_read
   |--2.19%--vcpu_put
   |  |
   |   --1.25%--kvm_arch_vcpu_put
   | native_write_msr

Quantifying this further, if we look at the raw cycle counts for a
normal iteration of the above workload (according to 'rdtscp'),
kvm_arch_vcpu_ioctl_run() takes ~4600 cycles from start to finish with
the current behavior. Using 'vmsave'/'vmload', this is reduced to
~2800 cycles, a savings of 39%.

While this approach doesn't seem to manifest in any noticeable
improvement for more realistic workloads like UnixBench, netperf, and
kernel builds, likely due to their exit paths generally involving IO
with comparatively high latencies, it does improve overall overhead
of KVM_RUN significantly, which may still be noticeable for certain
situations. It also simplifies some aspects of the code.

With this change, explicit save/restore is no longer needed for the
following host MSRs, since they are documented[1] as being part of the
VMCB State Save Area:

   MSR_STAR, MSR_LSTAR, MSR_CSTAR,
   MSR_SYSCALL_MASK, MSR_KERNEL_GS_BASE,
   MSR_IA32_SYSENTER_CS,
   MSR_IA32_SYSENTER_ESP,
   MSR_IA32_SYSENTER_EIP,
   MSR_FS_BASE, MSR_GS_BASE

and only the following MSR needs individual handling in
svm_vcpu_put/svm_vcpu_load:

   MSR_TSC_AUX

We could drop the host_save_user_msrs array/loop and instead handle
MSR read/write of MSR_TSC_AUX directly, but we leave that for now as
a potential follow-up.

Since 'vmsave'/'vmload' also handles the LDTR and FS/GS segment
registers (and associated hidden state)[2], some of the code
previously used to handle this is no longer needed, so we drop it
as well.

The first public release of the SVM spec[3] also documents the same
handling for the host state in question, so we make these changes
unconditionally.

Also worth noting is that we 'vmsave' to the same page that is
subsequently used by 'vmrun' to record some host additional state. This
is okay, since, in accordance with the spec[2], the additional state
written to the page by 'vmrun' does not overwrite any fields written by
'vmsave'. This has also been confirmed through testing (for the above
CPU, at least).

[1] AMD64 Architecture Programmer's Manual, Rev 3.33, Volume 2, Appendix B, 
Table B-2
[2] AMD64 Architecture Programmer's Manual, Rev 3.31, Volume 3, Chapter 4, 
VMSAVE/VMLOAD
[3] Secure Virtual Machine Architecture Reference Manual, Rev 3.01

Suggested-by: Tom Lendacky 
Signed-off-by: Michael Roth 
---
  arch/x86/kvm/svm/svm.c | 36 +++-
  arch/x86/kvm/svm/svm.h | 19 +--
  arch/x86/kvm/svm/vmenter.S | 10 ++
  3 files changed, 18 insertions(+), 47 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 941e5251e13f..7a7e9b7d47a7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1420,16 +1420,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int 
cpu)
if (sev_es_guest(svm->vcpu.kvm)) {
sev_es_vcpu_load(svm, cpu);
} else {
-#ifdef CONFIG_X86_64
-   rdmsrl(MSR_GS_BASE, to

Re: [PATCH v3 1/3] KVM: SVM: use vmsave/vmload for saving/restoring additional host state

2021-01-07 Thread Tom Lendacky

On 1/5/21 11:20 AM, Sean Christopherson wrote:

On Tue, Jan 05, 2021, Michael Roth wrote:

@@ -3703,16 +3688,9 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu 
*vcpu,
if (sev_es_guest(svm->vcpu.kvm)) {
__svm_sev_es_vcpu_run(svm->vmcb_pa);
} else {
-   __svm_vcpu_run(svm->vmcb_pa, (unsigned long 
*)&svm->vcpu.arch.regs);
-
-#ifdef CONFIG_X86_64
-   native_wrmsrl(MSR_GS_BASE, svm->host.gs_base);
-#else
-   loadsegment(fs, svm->host.fs);
-#ifndef CONFIG_X86_32_LAZY_GS
-   loadsegment(gs, svm->host.gs);
-#endif
-#endif
+   __svm_vcpu_run(svm->vmcb_pa, (unsigned long 
*)&svm->vcpu.arch.regs,
+  page_to_phys(per_cpu(svm_data,
+   vcpu->cpu)->save_area));


Does this need to use __sme_page_pa()?


Yes, it should now. The SEV-ES support added the SME encryption bit to the 
MSR_VM_HSAVE_PA MSR, so we need to be consistent in how the data is read 
and written.


Thanks,
Tom




}
  
  	/*


...


diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 6feb8c08f45a..89f4e8e7bf0e 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -33,6 +33,7 @@
   * __svm_vcpu_run - Run a vCPU via a transition to SVM guest mode
   * @vmcb_pa:  unsigned long
   * @regs: unsigned long * (to guest registers)
+ * @hostsa_pa: unsigned long
   */
  SYM_FUNC_START(__svm_vcpu_run)
push %_ASM_BP
@@ -47,6 +48,9 @@ SYM_FUNC_START(__svm_vcpu_run)
  #endif
push %_ASM_BX
  
+	/* Save @hostsa_pa */

+   push %_ASM_ARG3
+
/* Save @regs. */
push %_ASM_ARG2
  
@@ -154,6 +158,12 @@ SYM_FUNC_START(__svm_vcpu_run)

xor %r15d, %r15d
  #endif
  
+	/* "POP" @hostsa_pa to RAX. */

+   pop %_ASM_AX
+
+   /* Restore host user state and FS/GS base */
+   vmload %_ASM_AX


This VMLOAD needs the "handle fault on reboot" goo.  Seeing the code, I think
I'd prefer to handle this in C code, especially if Paolo takes the svm_ops.h
patch[*].  Actually, I think with that patch it'd make sense to move the
existing VMSAVE+VMLOAD for the guest into svm.c, too.  And completely unrelated,
the fault handling in svm/vmenter.S can be cleaned up a smidge to eliminate the
JMPs.

Paolo, what do you think about me folding these patches into my series to do the
above cleanups?  And maybe sending a pull request for the end result?  (I'd also
like to add on a patch to use the user return MSR mechanism for MSR_TSC_AUX).

[*] 
https://lkml.kernel.org/r/20201231002702.2223707-8-seanjc@google.com


+
pop %_ASM_BX
  
  #ifdef CONFIG_X86_64

--
2.25.1



[PATCH v5.1 27/34] KVM: SVM: Add support for booting APs in an SEV-ES guest

2021-01-04 Thread Tom Lendacky
From: Tom Lendacky 

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
  Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
  support. It is up to the guest to transfer control of the AP to the
  proper location.

Subsequent AP boot:
  KVM will expect to receive an AP Reset Hold exit event indicating that
  the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
  awaken it. When the AP Reset Hold exit event is received, KVM will place
  the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
  sequence, KVM will make the vCPU runnable. It is again up to the guest
  to then transfer control of the AP to the proper location.

  To differentiate between an actual HLT and an AP Reset Hold, a new MP
  state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
  placed in upon receiving the AP Reset Hold exit event. Additionally, to
  communicate the AP Reset Hold exit event up to userspace (if needed), a
  new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/lapic.c|  2 +-
 arch/x86/kvm/svm/sev.c  | 22 ++
 arch/x86/kvm/svm/svm.c  | 10 ++
 arch/x86/kvm/svm/svm.h  |  2 ++
 arch/x86/kvm/vmx/vmx.c  |  2 ++
 arch/x86/kvm/x86.c  | 26 +-
 include/uapi/linux/kvm.h|  2 ++
 8 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 39707e72b062..23d7b203c060 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1287,6 +1287,8 @@ struct kvm_x86_ops {
void (*migrate_timers)(struct kvm_vcpu *vcpu);
void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
+
+   void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
 };
 
 struct kvm_x86_nested_ops {
@@ -1468,6 +1470,7 @@ int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, 
unsigned short port, int in);
 int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
+int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
 int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6a87623aa578..a2f08ed777d8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2898,7 +2898,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
/* evaluate pending_events before reading the vector */
smp_rmb();
sipi_vector = apic->sipi_vector;
-   kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
+   kvm_x86_ops.vcpu_deliver_sipi_vector(vcpu, sipi_vector);
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
}
}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e57847ff8bd2..a08cbc04cb4d 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1563,6 +1563,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
goto vmgexit_err;
break;
case SVM_VMGEXIT_NMI_COMPLETE:
+   case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
break;
@@ -1888,6 +1889,9 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_NMI_COMPLETE:
ret = svm_invoke_exit_handler(svm, SVM_EXIT_IRET);
break;
+   case SVM_VMGEXIT_AP_HLT_LOOP:
+   ret = kvm_emulate_ap_reset_hold(&svm->vcpu);
+   break;
case SVM_VMGEXIT_AP_JUMP_TABLE: {
struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
 
@@ -2040,3 +2044,21 @@ void sev_es_v

Re: [PATCH v5 27/34] KVM: SVM: Add support for booting APs for an SEV-ES guest

2021-01-04 Thread Tom Lendacky

On 12/15/20 2:25 PM, Tom Lendacky wrote:

On 12/14/20 1:46 PM, Tom Lendacky wrote:

On 12/14/20 10:03 AM, Paolo Bonzini wrote:

On 10/12/20 18:10, Tom Lendacky wrote:

From: Tom Lendacky 

+case SVM_VMGEXIT_AP_HLT_LOOP:
+svm->ap_hlt_loop = true;


This value needs to be communicated to userspace.  Let's get this right
from the beginning and use a new KVM_MP_STATE_* value instead (perhaps
reuse KVM_MP_STATE_STOPPED but for x86 #define it as
KVM_MP_STATE_AP_HOLD_RECEIVED?).


Ok, let me look into this.


Paolo, is this something along the lines of what you were thinking, or am
I off base? I created kvm_emulate_ap_reset_hold() to keep the code
consolidated and remove the duplication, but can easily make those changes
local to sev.c. I'd also like to rename SVM_VMGEXIT_AP_HLT_LOOP to
SVM_VMGEXIT_AP_RESET_HOLD to more closely match the GHCB document, but
that can be done later (if possible, since it is already part of the uapi
include file).


Paolo, a quick ping after the holidays as to whether this is the approach 
you were thinking. I think there are a couple of places in x86.c to update 
(vcpu_block() and kvm_arch_vcpu_ioctl_get_mpstate()), also.
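
Roughly what I have in mind for those two spots (sketch only, assuming the
new state should be treated exactly like KVM_MP_STATE_HALTED in both
paths):

	/* In vcpu_block(), wake an AP Reset Hold like a normal HLT: */
	switch (vcpu->arch.mp_state) {
	case KVM_MP_STATE_HALTED:
	case KVM_MP_STATE_AP_RESET_HOLD:
		vcpu->arch.pv.pv_unhalted = false;
		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
		fallthrough;
	case KVM_MP_STATE_RUNNABLE:
		vcpu->arch.apf.halted = false;
		break;
	}

	/* In kvm_arch_vcpu_ioctl_get_mpstate(): */
	if ((vcpu->arch.mp_state == KVM_MP_STATE_HALTED ||
	     vcpu->arch.mp_state == KVM_MP_STATE_AP_RESET_HOLD) &&
	    vcpu->arch.pv.pv_unhalted)
		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
	else
		mp_state->mp_state = vcpu->arch.mp_state;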


Thanks,
Tom



Thanks,
Tom

---
KVM: SVM: Add support for booting APs for an SEV-ES guest

From: Tom Lendacky 

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
   Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
   support. It is up to the guest to transfer control of the AP to the
   proper location.

Subsequent AP boot:
   KVM will expect to receive an AP Reset Hold exit event indicating that
   the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
   awaken it. When the AP Reset Hold exit event is received, KVM will place
   the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
   sequence, KVM will make the vCPU runnable. It is again up to the guest
   to then transfer control of the AP to the proper location.

   To differentiate between an actual HLT and an AP Reset Hold, a new MP
   state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
   placed in upon receiving the AP Reset Hold exit event. Additionally, to
   communicate the AP Reset Hold exit event up to userspace (if needed), a
   new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky 
---
  arch/x86/include/asm/kvm_host.h |3 +++
  arch/x86/kvm/lapic.c|2 +-
  arch/x86/kvm/svm/sev.c  |   22 ++
  arch/x86/kvm/svm/svm.c  |   10 ++
  arch/x86/kvm/svm/svm.h  |2 ++
  arch/x86/kvm/vmx/vmx.c  |2 ++
  arch/x86/kvm/x86.c  |   20 +---
  include/uapi/linux/kvm.h|2 ++
  8 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 39707e72b062..23d7b203c060 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1287,6 +1287,8 @@ struct kvm_x86_ops {
void (*migrate_timers)(struct kvm_vcpu *vcpu);
void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
+
+   void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
  };
  
  struct kvm_x86_nested_ops {

@@ -1468,6 +1470,7 @@ int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, 
unsigned short port, int in);
  int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
  int kvm_emulate_halt(struct kvm_vcpu *vcpu);
  int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
+int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
  int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
  
  void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6a87623aa578..a2f08ed777d8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2898,7 +2898,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
/* evaluate pending_events before 

Re: [PATCH v2 2/2] KVM: SVM: Add support for Virtual SPEC_CTRL

2021-01-04 Thread Tom Lendacky

On 12/22/20 4:31 PM, Babu Moger wrote:

Newer AMD processors have a feature to virtualize the use of the
SPEC_CTRL MSR. A hypervisor may wish to impose speculation controls on
guest execution or a guest may want to impose its own speculation
controls. Therefore, the processor implements both host and guest
versions of SPEC_CTRL. Presence of this feature is indicated via CPUID
function 0x8000_000A_EDX[20]: GuestSpecCtrl.  Hypervisors are not
required to enable this feature since it is automatically enabled on
processors that support it.

When in host mode, the host SPEC_CTRL value is in effect and writes
update only the host version of SPEC_CTRL. On a VMRUN, the processor
loads the guest version of SPEC_CTRL from the VMCB. When the guest
writes SPEC_CTRL, only the guest version is updated. On a VMEXIT,
the guest version is saved into the VMCB and the processor returns
to only using the host SPEC_CTRL for speculation control. The guest
SPEC_CTRL is located at offset 0x2E0 in the VMCB.


With the SEV-ES hypervisor support now in the tree, this will need to add 
support in sev_es_sync_vmsa() to put the initial svm->spec_ctrl value in 
the SEV-ES VMSA.
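
Something along these lines in sev_es_sync_vmsa() should do it (sketch
only; assumes the save area field is named spec_ctrl as in this patch):

	/* Sync the initial SPEC_CTRL value into the encrypted VMSA */
	save->spec_ctrl = svm->spec_ctrl;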




The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed
with the hypervisor SPEC_CTRL setting. This allows the hypervisor to
ensure a minimum SPEC_CTRL if desired.

This support also fixes an issue where a guest may sometimes see an
inconsistent value for the SPEC_CTRL MSR on processors that support
this feature. With the current SPEC_CTRL support, the first write to
SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL
MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it
will be 0x0, instead of the actual expected value. There isn’t a
security concern here, because the host SPEC_CTRL value is or’ed with
the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value.
KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL
MSR just before the VMRUN, so it will always have the actual value
even though it doesn’t appear that way in the guest. The guest will
only see the proper value for the SPEC_CTRL register if the guest was
to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL
support, the MSR interception of SPEC_CTRL is disabled during
vmcb_init, so this will no longer be an issue.

Signed-off-by: Babu Moger 
---
  arch/x86/include/asm/svm.h |4 +++-
  arch/x86/kvm/svm/svm.c |   29 +
  2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 71d630bb5e08..753b25db427c 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -248,12 +248,14 @@ struct vmcb_save_area {
u64 br_to;
u64 last_excp_from;
u64 last_excp_to;
+   u8 reserved_12[72];
+   u32 spec_ctrl;  /* Guest version of SPEC_CTRL at 0x2E0 */
  
  	/*

 * The following part of the save area is valid only for
 * SEV-ES guests when referenced through the GHCB.
 */
-   u8 reserved_7[104];
+   u8 reserved_7[28];
u64 reserved_8; /* rax already available at 0x01f8 */
u64 rcx;
u64 rdx;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 79b3a564f1c9..6d3db3e8cdfe 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1230,6 +1230,16 @@ static void init_vmcb(struct vcpu_svm *svm)
  
  	svm_check_invpcid(svm);
  
+	/*

+* If the host supports V_SPEC_CTRL then disable the interception
+* of MSR_IA32_SPEC_CTRL.
+*/
+   if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL)) {
+   save->spec_ctrl = svm->spec_ctrl;
+   set_msr_interception(&svm->vcpu, svm->msrpm,
+MSR_IA32_SPEC_CTRL, 1, 1);
+   }
+


I thought Jim's feedback was to keep the support as originally coded with 
respect to the MSR intercept and only update the svm_vcpu_run() to either 
read/write the MSR or read/write the save area value based on the feature. 
So I think this can be removed.
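
That is, something shaped like this around VMRUN (sketch only, not a
tested patch; the rest of svm_vcpu_run() is elided):

	if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
		svm->vmcb->save.spec_ctrl = svm->spec_ctrl;
	else
		x86_spec_ctrl_set_guest(svm->spec_ctrl, svm->virt_spec_ctrl);

	svm_vcpu_enter_exit(vcpu, svm);

	if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
		svm->spec_ctrl = svm->vmcb->save.spec_ctrl;
	else
		x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl);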



if (kvm_vcpu_apicv_active(&svm->vcpu))
avic_init_vmcb(svm);
  
@@ -2549,7 +2559,10 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

!guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD))
return 1;
  
-		msr_info->data = svm->spec_ctrl;

+   if (static_cpu_has(X86_FEATURE_V_SPEC_CTRL))
+   msr_info->data = svm->vmcb->save.spec_ctrl;
+   else
+   msr_info->data = svm->spec_ctrl;


This is unneeded since svm->vmcb->save.spec_ctrl is saved in 
svm->spec_ctrl on VMEXIT.



break;
case MSR_AMD64_VIRT_SPEC_CTRL:
if (!msr_info->host_initiated &&
@@ -2640,6 +2653,8 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct 
msr_data *msr)
return 1;
  
  		svm->spe

Re: [PATCH v1 00/19] x86/insn: Add an insn_decode() API

2020-12-27 Thread Tom Lendacky

On 12/23/20 11:42 AM, Borislav Petkov wrote:

From: Borislav Petkov 

Hi,

here's v1 with the requested change to return -ENODATA on short input to
the decoder. The rest is as in the previous submission.

Only lightly tested.

Thx.

changelog:
==

That is, provided this is how we want to control what the instruction
decoder decodes - by supplying the length of the buffer it should look
at.

We could also say that probably there should be a way to say "decode
only the first insn in the buffer and ignore the rest". That is all up
to the use cases so I'm looking for suggestions here.


That's the way it works today, right?  One instruction, no matter the 
length of the buffer (assuming the length is long enough to include a full 
instruction)?


Because the callers of the decode may rely on parsing only the current 
instruction (like SEV-ES), it should probably default to that (although 
most of the call points are being updated so you could supply a boolean to 
indicate one vs many instructions). The caller doesn't necessarily know 
the length of the instruction, so it may provide a buffer of max 
instruction length.
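
In other words, a caller that only cares about the current instruction
would look something like this (sketch; using the signature proposed in
this series, which may still change):

	struct insn insn;
	int ret;

	/*
	 * Hand the decoder a MAX_INSN_SIZE window; it stops after the
	 * first instruction and only fails with -ENODATA if even that
	 * one doesn't fit in the buffer.
	 */
	ret = insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64);
	if (ret < 0)
		return ret;

	/* insn.length now covers just the one decoded instruction. */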


Also, if you want to parse more than one instruction at a time, wouldn't 
you need to maintain register context within the parsing, I don't think 
that is done today. Or you could chain together some instruction contexts 
to identify each instruction that was parsed?


Thanks,
Tom





Re: [PATCH v5 34/34] KVM: SVM: Provide support to launch and run an SEV-ES guest

2020-12-16 Thread Tom Lendacky
On 12/10/20 11:10 AM, Tom Lendacky wrote:
> From: Tom Lendacky 
>
> An SEV-ES guest is started by invoking a new SEV initialization ioctl,
> KVM_SEV_ES_INIT. This identifies the guest as an SEV-ES guest, which is
> used to drive the appropriate ASID allocation, VMSA encryption, etc.
>
> Before being able to run an SEV-ES vCPU, the vCPU VMSA must be encrypted
> and measured. This is done using the LAUNCH_UPDATE_VMSA command after all
> calls to LAUNCH_UPDATE_DATA have been performed, but before LAUNCH_MEASURE
> has been performed. In order to establish the encrypted VMSA, the current
> (traditional) VMSA and the GPRs are synced to the page that will hold the
> encrypted VMSA and then LAUNCH_UPDATE_VMSA is invoked. The vCPU is then
> marked as having protected guest state.
>
> Signed-off-by: Tom Lendacky 
> ---
> +
> + /* Sync registers */
> + save->rax = svm->vcpu.arch.regs[VCPU_REGS_RAX];
> + save->rbx = svm->vcpu.arch.regs[VCPU_REGS_RBX];
> + save->rcx = svm->vcpu.arch.regs[VCPU_REGS_RCX];
> + save->rdx = svm->vcpu.arch.regs[VCPU_REGS_RDX];
> + save->rsp = svm->vcpu.arch.regs[VCPU_REGS_RSP];
> + save->rbp = svm->vcpu.arch.regs[VCPU_REGS_RBP];
> + save->rsi = svm->vcpu.arch.regs[VCPU_REGS_RSI];
> + save->rdi = svm->vcpu.arch.regs[VCPU_REGS_RDI];
> + save->r8  = svm->vcpu.arch.regs[VCPU_REGS_R8];
> + save->r9  = svm->vcpu.arch.regs[VCPU_REGS_R9];
> + save->r10 = svm->vcpu.arch.regs[VCPU_REGS_R10];
> + save->r11 = svm->vcpu.arch.regs[VCPU_REGS_R11];
> + save->r12 = svm->vcpu.arch.regs[VCPU_REGS_R12];
> + save->r13 = svm->vcpu.arch.regs[VCPU_REGS_R13];
> + save->r14 = svm->vcpu.arch.regs[VCPU_REGS_R14];
> + save->r15 = svm->vcpu.arch.regs[VCPU_REGS_R15];
> + save->rip = svm->vcpu.arch.regs[VCPU_REGS_RIP];
> +

Paolo, I just noticed that a 32-bit build will fail because of R8-R15
references, sorry about that (I'm kind of surprised krobot hasn't
complained). This should take care of it:

---
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4045de7f8f8b..84b3ee15f4ec 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -529,6 +529,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->rbp = svm->vcpu.arch.regs[VCPU_REGS_RBP];
save->rsi = svm->vcpu.arch.regs[VCPU_REGS_RSI];
save->rdi = svm->vcpu.arch.regs[VCPU_REGS_RDI];
+#ifdef CONFIG_X86_64
save->r8  = svm->vcpu.arch.regs[VCPU_REGS_R8];
save->r9  = svm->vcpu.arch.regs[VCPU_REGS_R9];
save->r10 = svm->vcpu.arch.regs[VCPU_REGS_R10];
@@ -537,6 +538,7 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->r13 = svm->vcpu.arch.regs[VCPU_REGS_R13];
save->r14 = svm->vcpu.arch.regs[VCPU_REGS_R14];
save->r15 = svm->vcpu.arch.regs[VCPU_REGS_R15];
+#endif
save->rip = svm->vcpu.arch.regs[VCPU_REGS_RIP];

/* Sync some non-GPR registers before encrypting */


Re: [PATCH v5 27/34] KVM: SVM: Add support for booting APs for an SEV-ES guest

2020-12-15 Thread Tom Lendacky
On 12/14/20 1:46 PM, Tom Lendacky wrote:
> On 12/14/20 10:03 AM, Paolo Bonzini wrote:
>> On 10/12/20 18:10, Tom Lendacky wrote:
>>> From: Tom Lendacky 
>>>
>>> +case SVM_VMGEXIT_AP_HLT_LOOP:
>>> +svm->ap_hlt_loop = true;
>>
>> This value needs to be communicated to userspace.  Let's get this right
>> from the beginning and use a new KVM_MP_STATE_* value instead (perhaps
>> reuse KVM_MP_STATE_STOPPED but for x86 #define it as
>> KVM_MP_STATE_AP_HOLD_RECEIVED?).
> 
> Ok, let me look into this.

Paolo, is this something along the lines of what you were thinking, or am
I off base? I created kvm_emulate_ap_reset_hold() to keep the code
consolidated and remove the duplication, but can easily make those changes
local to sev.c. I'd also like to rename SVM_VMGEXIT_AP_HLT_LOOP to
SVM_VMGEXIT_AP_RESET_HOLD to more closely match the GHCB document, but
that can be done later (if possible, since it is already part of the uapi
include file).

Thanks,
Tom

---
KVM: SVM: Add support for booting APs for an SEV-ES guest

From: Tom Lendacky 

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
  Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
  support. It is up to the guest to transfer control of the AP to the
  proper location.

Subsequent AP boot:
  KVM will expect to receive an AP Reset Hold exit event indicating that
  the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
  awaken it. When the AP Reset Hold exit event is received, KVM will place
  the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
  sequence, KVM will make the vCPU runnable. It is again up to the guest
  to then transfer control of the AP to the proper location.

  To differentiate between an actual HLT and an AP Reset Hold, a new MP
  state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
  placed in upon receiving the AP Reset Hold exit event. Additionally, to
  communicate the AP Reset Hold exit event up to userspace (if needed), a
  new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/kvm_host.h |3 +++
 arch/x86/kvm/lapic.c|2 +-
 arch/x86/kvm/svm/sev.c  |   22 ++
 arch/x86/kvm/svm/svm.c  |   10 ++
 arch/x86/kvm/svm/svm.h  |2 ++
 arch/x86/kvm/vmx/vmx.c  |2 ++
 arch/x86/kvm/x86.c  |   20 +---
 include/uapi/linux/kvm.h|2 ++
 8 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 39707e72b062..23d7b203c060 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1287,6 +1287,8 @@ struct kvm_x86_ops {
void (*migrate_timers)(struct kvm_vcpu *vcpu);
void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
+
+   void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
 };
 
 struct kvm_x86_nested_ops {
@@ -1468,6 +1470,7 @@ int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, 
unsigned short port, int in);
 int kvm_emulate_cpuid(struct kvm_vcpu *vcpu);
 int kvm_emulate_halt(struct kvm_vcpu *vcpu);
 int kvm_vcpu_halt(struct kvm_vcpu *vcpu);
+int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu);
 int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu);
 
 void kvm_get_segment(struct kvm_vcpu *vcpu, struct kvm_segment *var, int seg);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6a87623aa578..a2f08ed777d8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2898,7 +2898,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
/* evaluate pending_events before reading the vector */
smp_rmb();
sipi_vector = apic->sipi_vector;
-   kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
+ 

Re: [PATCH v5 00/34] SEV-ES hypervisor support

2020-12-15 Thread Tom Lendacky
On 12/14/20 12:13 PM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> From: Tom Lendacky 
>>
>> This patch series provides support for running SEV-ES guests under KVM.
>>
> 
> I'm queuing everything except patch 27, there's time to include it later
> in 5.11.
> 
> Regarding MSRs, take a look at the series I'm sending shortly (or perhaps
> in a couple hours).  For now I'll keep it in kvm/queue, but the plan is to
> get acks quickly and/or just include it in 5.11.  Please try the kvm/queue
> branch to see if I screwed up anything.

I pulled and built kvm/queue and was able to launch a single vCPU SEV-ES
guest through OVMF and part way into the kernel before I hit an error. The
kernel tries to get the AP jump table address (which was part of patch
27). If I apply the following patch (just the jump table support from
patch 27), I can successfully boot a single vCPU SEV-ES guest:

KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting

From: Tom Lendacky 

The GHCB specification requires the hypervisor to save the address of an
AP Jump Table so that, for example, vCPUs that have been parked by UEFI
can be started by the OS. Provide support for the AP Jump Table set/get
exit code.

Signed-off-by: Tom Lendacky 
---
 arch/x86/kvm/svm/sev.c |   28 
 arch/x86/kvm/svm/svm.h |1 +
 2 files changed, 29 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 6eb097714d43..8b5ef0fe4490 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 
+#include 
+
 #include "x86.h"
 #include "svm.h"
 #include "cpuid.h"
@@ -1559,6 +1561,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
goto vmgexit_err;
break;
case SVM_VMGEXIT_NMI_COMPLETE:
+   case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
break;
default:
@@ -1883,6 +1886,31 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_NMI_COMPLETE:
ret = svm_invoke_exit_handler(svm, SVM_EXIT_IRET);
break;
+   case SVM_VMGEXIT_AP_JUMP_TABLE: {
+   struct kvm_sev_info *sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+
+   switch (control->exit_info_1) {
+   case 0:
+   /* Set AP jump table address */
+   sev->ap_jump_table = control->exit_info_2;
+   break;
+   case 1:
+   /* Get AP jump table address */
+   ghcb_set_sw_exit_info_2(ghcb, sev->ap_jump_table);
+   break;
+   default:
+			pr_err("svm: vmgexit: unsupported AP jump table request - exit_info_1=%#llx\n",
+  control->exit_info_1);
+   ghcb_set_sw_exit_info_1(ghcb, 1);
+   ghcb_set_sw_exit_info_2(ghcb,
+   X86_TRAP_UD |
+   SVM_EVTINJ_TYPE_EXEPT |
+   SVM_EVTINJ_VALID);
+   }
+
+   ret = 1;
+   break;
+   }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
		vcpu_unimpl(&svm->vcpu,
			    "vmgexit: unsupported event - exit_info_1=%#llx, exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a5067f776ce0..5431e6335e2e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -78,6 +78,7 @@ struct kvm_sev_info {
int fd; /* SEV device fd */
unsigned long pages_locked; /* Number of pages locked */
struct list_head regions_list;  /* List of registered regions */
+   u64 ap_jump_table;  /* SEV-ES AP Jump Table address */
 };
 
 struct kvm_svm {


Re: [PATCH 3/3] KVM: x86: introduce complete_emulated_msr callback

2020-12-14 Thread Tom Lendacky
On 12/14/20 12:32 PM, Paolo Bonzini wrote:
> This will be used by SEV-ES to inject MSR failure via the GHCB.
> 
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Tom Lendacky 

(Changed Sean's email on this reply, but missed the others...)

> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/svm/svm.c  | 1 +
>  arch/x86/kvm/vmx/vmx.c  | 1 +
>  arch/x86/kvm/x86.c  | 8 
>  4 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8cf6b0493d49..18aa15e6fadd 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1285,6 +1285,7 @@ struct kvm_x86_ops {
>  
>   void (*migrate_timers)(struct kvm_vcpu *vcpu);
>   void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
> + int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
>  };
>  
>  struct kvm_x86_nested_ops {
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 801e0a641258..4067d511be08 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4306,6 +4306,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>   .apic_init_signal_blocked = svm_apic_init_signal_blocked,
>  
>   .msr_filter_changed = svm_msr_filter_changed,
> + .complete_emulated_msr = kvm_complete_insn_gp,
>  };
>  
>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 849be2a9f260..55fa51c0cd9d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7701,6 +7701,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
>   .migrate_timers = vmx_migrate_timers,
>  
>   .msr_filter_changed = vmx_msr_filter_changed,
> + .complete_emulated_msr = kvm_complete_insn_gp,
>   .cpu_dirty_log_size = vmx_cpu_dirty_log_size,
>  };
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2f1bc52e70c0..6c4482b97c91 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1642,12 +1642,12 @@ static int complete_emulated_rdmsr(struct kvm_vcpu 
> *vcpu)
>   kvm_rdx_write(vcpu, vcpu->run->msr.data >> 32);
>   }
>  
> - return kvm_complete_insn_gp(vcpu, err);
> + return kvm_x86_ops.complete_emulated_msr(vcpu, err);
>  }
>  
>  static int complete_emulated_wrmsr(struct kvm_vcpu *vcpu)
>  {
> - return kvm_complete_insn_gp(vcpu, vcpu->run->msr.error);
> + return kvm_x86_ops.complete_emulated_msr(vcpu, vcpu->run->msr.error);
>  }
>  
>  static u64 kvm_msr_reason(int r)
> @@ -1720,7 +1720,7 @@ int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
>   trace_kvm_msr_read_ex(ecx);
>   }
>  
> - return kvm_complete_insn_gp(vcpu, r);
> + return kvm_x86_ops.complete_emulated_msr(vcpu, r);
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_rdmsr);
>  
> @@ -1747,7 +1747,7 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
>   else
>   trace_kvm_msr_write_ex(ecx, data);
>  
> - return kvm_complete_insn_gp(vcpu, r);
> + return kvm_x86_ops.complete_emulated_msr(vcpu, r);
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr);
>  
> 


Re: [PATCH 2/3] KVM: x86: use kvm_complete_insn_gp in emulating RDMSR/WRMSR

2020-12-14 Thread Tom Lendacky
On 12/14/20 12:32 PM, Paolo Bonzini wrote:
> Simplify the four functions that handle {kernel,user} {rd,wr}msr, there
> is still some repetition between the two instances of rdmsr but the
> whole business of calling kvm_inject_gp and kvm_skip_emulated_instruction
> can be unified nicely.
> 
> Because complete_emulated_wrmsr now becomes essentially a call to
> kvm_complete_insn_gp, remove complete_emulated_msr.
> 
> Signed-off-by: Paolo Bonzini 

Just two minor nits below.

Reviewed-by: Tom Lendacky 

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a3fdc16cfd6f..2f1bc52e70c0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1634,27 +1634,20 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 
> data)
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_msr);
>  
>  
>   /* MSR read failed? Inject a #GP */

This comment isn't accurate any more, maybe just delete it?

> - if (r) {
> + if (!r) {
> + trace_kvm_msr_read(ecx, data);
> +
> + kvm_rax_write(vcpu, data & -1u);
> + kvm_rdx_write(vcpu, (data >> 32) & -1u);
> + } else {
>   trace_kvm_msr_read_ex(ecx);
> - kvm_inject_gp(vcpu, 0);
> - return 1;
>   }
>  
> - trace_kvm_msr_read(ecx, data);
> -
> - kvm_rax_write(vcpu, data & -1u);
> - kvm_rdx_write(vcpu, (data >> 32) & -1u);
> - return kvm_skip_emulated_instruction(vcpu);
> + return kvm_complete_insn_gp(vcpu, r);
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_rdmsr);
>  
> @@ -1750,14 +1742,12 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
>   return r;
>  
>   /* MSR write failed? Inject a #GP */

Ditto on this comment.

Thanks,
Tom

> - if (r > 0) {
> + if (!r)
> + trace_kvm_msr_write(ecx, data);
> + else
>   trace_kvm_msr_write_ex(ecx, data);
> - kvm_inject_gp(vcpu, 0);
> - return 1;
> - }
>  
> - trace_kvm_msr_write(ecx, data);
> - return kvm_skip_emulated_instruction(vcpu);
> + return kvm_complete_insn_gp(vcpu, r);
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr);
>  
> 


Re: [PATCH 1/3] KVM: x86: remove bogus #GP injection

2020-12-14 Thread Tom Lendacky
On 12/14/20 12:32 PM, Paolo Bonzini wrote:
> There is no need to inject a #GP from kvm_mtrr_set_msr, kvm_emulate_wrmsr will
> handle it.
> 
> Signed-off-by: Paolo Bonzini 

Reviewed-by: Tom Lendacky 

> ---
>  arch/x86/kvm/mtrr.c | 6 +-
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
> index 7f0059aa30e1..f472fdb6ae7e 100644
> --- a/arch/x86/kvm/mtrr.c
> +++ b/arch/x86/kvm/mtrr.c
> @@ -84,12 +84,8 @@ bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 
> data)
>   } else
>   /* MTRR mask */
>   mask |= 0x7ff;
> - if (data & mask) {
> - kvm_inject_gp(vcpu, 0);
> - return false;
> - }
>  
> - return true;
> + return (data & mask) == 0;
>  }
>  EXPORT_SYMBOL_GPL(kvm_mtrr_valid);
>  
> 


Re: [PATCH v5 27/34] KVM: SVM: Add support for booting APs for an SEV-ES guest

2020-12-14 Thread Tom Lendacky
On 12/14/20 10:03 AM, Paolo Bonzini wrote:
> On 10/12/20 18:10, Tom Lendacky wrote:
>> From: Tom Lendacky 
>>
>> Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
>> where the guest vCPU register state is updated and then the vCPU is VMRUN
>> to begin execution of the AP. For an SEV-ES guest, this won't work because
>> the guest register state is encrypted.
>>
>> Following the GHCB specification, the hypervisor must not alter the guest
>> register state, so KVM must track an AP/vCPU boot. Should the guest want
>> to park the AP, it must use the AP Reset Hold exit event in place of, for
>> example, a HLT loop.
>>
>> First AP boot (first INIT-SIPI-SIPI sequence):
>>    Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
>>    support. It is up to the guest to transfer control of the AP to the
>>    proper location.
>>
>> Subsequent AP boot:
>>    KVM will expect to receive an AP Reset Hold exit event indicating that
>>    the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
>>    awaken it. When the AP Reset Hold exit event is received, KVM will place
>>    the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
>>    sequence, KVM will make the vCPU runnable. It is again up to the guest
>>    to then transfer control of the AP to the proper location.
>>
>> The GHCB specification also requires the hypervisor to save the address of
>> an AP Jump Table so that, for example, vCPUs that have been parked by UEFI
>> can be started by the OS. Provide support for the AP Jump Table set/get
>> exit code.
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>   arch/x86/include/asm/kvm_host.h |  2 ++
>>   arch/x86/kvm/svm/sev.c  | 50 +
>>   arch/x86/kvm/svm/svm.c  |  7 +
>>   arch/x86/kvm/svm/svm.h  |  3 ++
>>   arch/x86/kvm/x86.c  |  9 ++
>>   5 files changed, 71 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h
>> b/arch/x86/include/asm/kvm_host.h
>> index 048b08437c33..60a3b9d33407 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1286,6 +1286,8 @@ struct kvm_x86_ops {
>>     void (*migrate_timers)(struct kvm_vcpu *vcpu);
>>   void (*msr_filter_changed)(struct kvm_vcpu *vcpu);
>> +
>> +    void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
>>   };
>>     struct kvm_x86_nested_ops {
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index a7531de760b5..b47285384b1f 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -17,6 +17,8 @@
>>   #include 
>>   #include 
>>   +#include 
>> +
>>   #include "x86.h"
>>   #include "svm.h"
>>   #include "cpuid.h"
>> @@ -1449,6 +1451,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm
>> *svm)
>>   if (!ghcb_sw_scratch_is_valid(ghcb))
>>   goto vmgexit_err;
>>   break;
>> +    case SVM_VMGEXIT_AP_HLT_LOOP:
>> +    case SVM_VMGEXIT_AP_JUMP_TABLE:
>>   case SVM_VMGEXIT_UNSUPPORTED_EVENT:
>>   break;
>>   default:
>> @@ -1770,6 +1774,35 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
>>   control->exit_info_2,
>>   svm->ghcb_sa);
>>   break;
>> +    case SVM_VMGEXIT_AP_HLT_LOOP:
>> +    svm->ap_hlt_loop = true;
> 
> This value needs to be communicated to userspace.  Let's get this right
> from the beginning and use a new KVM_MP_STATE_* value instead (perhaps
> reuse KVM_MP_STATE_STOPPED but for x86 #define it as
> KVM_MP_STATE_AP_HOLD_RECEIVED?).

Ok, let me look into this.

> 
>> @@ -68,6 +68,7 @@ struct kvm_sev_info {
>>  int fd;    /* SEV device fd */
>>  unsigned long pages_locked; /* Number of pages locked */
>>  struct list_head regions_list;  /* List of registered regions */
>> +    u64 ap_jump_table;    /* SEV-ES AP Jump Table address */
> 
> Do you have any plans for migration of this value?  How does the guest
> ensure that the hypervisor does not screw with it?

I'll be sure that this is part of the SEV-ES live migration support.

For SEV-ES, we can't guarantee that the hypervisor doesn't screw with it.
This is something that SEV-SNP will be able to address.

Thanks,
Tom

> 
> Paolo
> 


Re: [PATCH v5 16/34] KVM: SVM: Add support for SEV-ES GHCB MSR protocol function 0x100

2020-12-14 Thread Tom Lendacky



On 12/14/20 9:49 AM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> +    pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
>> +    reason_set, reason_code);
>> +    fallthrough;
>> +    }
> 
> It would be nice to send these to userspace instead as a follow-up.

I'll look into doing that.

Thanks,
Tom

> 
> Paolo
> 


Re: [PATCH v5 12/34] KVM: SVM: Add initial support for a VMGEXIT VMEXIT

2020-12-14 Thread Tom Lendacky
On 12/14/20 9:45 AM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> @@ -3184,6 +3186,8 @@ static int svm_invoke_exit_handler(struct vcpu_svm
>> *svm, u64 exit_code)
>>   return halt_interception(svm);
>>   else if (exit_code == SVM_EXIT_NPF)
>>   return npf_interception(svm);
>> +    else if (exit_code == SVM_EXIT_VMGEXIT)
>> +    return sev_handle_vmgexit(svm);
> 
> Are these common enough to warrant putting them in this short list?

A VMGEXIT exit occurs for any of the listed NAE events in the GHCB
specification (e.g. CPUID, RDMSR/WRMSR, MMIO, port IO, etc.) if those
events are being intercepted (or triggered in the case of MMIO). It will
depend on what is considered common. Since SVM_EXIT_MSR was already in the
list, I figured I should add VMGEXIT.

Thanks,
Tom

> 
> Paolo
> 
>>   #endif
>>   return svm_exit_handlers[exit_code](svm);
>>   }
> 


Re: [PATCH v5 00/34] SEV-ES hypervisor support

2020-12-14 Thread Tom Lendacky
On 12/14/20 12:13 PM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> From: Tom Lendacky 
>>
>> This patch series provides support for running SEV-ES guests under KVM.
>>
> 
> I'm queuing everything except patch 27, there's time to include it later
> in 5.11.

Thanks, Paolo!

I'll start looking at updating patch 27.

> 
> Regarding MSRs, take a look at the series I'm sending shortly (or perhaps
> in a couple hours).  For now I'll keep it in kvm/queue, but the plan is to
> get acks quickly and/or just include it in 5.11.  Please try the kvm/queue
> branch to see if I screwed up anything.

Ok, I'll take a look at the kvm/queue tree.

Thanks,
Tom

> 
> Paolo
> 


Re: [PATCH v5 08/34] KVM: SVM: Prevent debugging under SEV-ES

2020-12-14 Thread Tom Lendacky
On 12/14/20 9:41 AM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> Additionally, an SEV-ES guest must only and always intercept DR7 reads and
>> writes. Update set_dr_intercepts() and clr_dr_intercepts() to account for
>> this.
> 
> I cannot see it, where is this documented?

That is documented in the GHCB specification, section 4.5 Debug Register
Support:

https://developer.amd.com/wp-content/resources/56421.pdf

Thanks,
Tom

> 
> Paolo
> 


Re: [PATCH v5 07/34] KVM: SVM: Add required changes to support intercepts under SEV-ES

2020-12-14 Thread Tom Lendacky
On 12/14/20 9:33 AM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> @@ -2797,7 +2838,27 @@ static int svm_set_msr(struct kvm_vcpu *vcpu,
>> struct msr_data *msr)
>>     static int wrmsr_interception(struct vcpu_svm *svm)
>>   {
>> -    return kvm_emulate_wrmsr(&svm->vcpu);
>> +    u32 ecx;
>> +    u64 data;
>> +
>> +    if (!sev_es_guest(svm->vcpu.kvm))
>> +    return kvm_emulate_wrmsr(&svm->vcpu);
>> +
>> +    ecx = kvm_rcx_read(&svm->vcpu);
>> +    data = kvm_read_edx_eax(&svm->vcpu);
>> +    if (kvm_set_msr(&svm->vcpu, ecx, data)) {
>> +    trace_kvm_msr_write_ex(ecx, data);
>> +    ghcb_set_sw_exit_info_1(svm->ghcb, 1);
>> +    ghcb_set_sw_exit_info_2(svm->ghcb,
>> +    X86_TRAP_GP |
>> +    SVM_EVTINJ_TYPE_EXEPT |
>> +    SVM_EVTINJ_VALID);
>> +    return 1;
>> +    }
>> +
>> +    trace_kvm_msr_write(ecx, data);
>> +
>> +    return kvm_skip_emulated_instruction(&svm->vcpu);
>>   }
>>     static int msr_interception(struct vcpu_svm *svm)
> 
> This code duplication is ugly, and does not work with userspace MSR
> filters too.

Agree and I missed that the userspace MSR support went in.

> 
> But we can instead trap the completion of the MSR read/write to use
> ghcb_set_sw_exit_info_1 instead of kvm_inject_gp, with a callback like
> 
> static int svm_complete_emulated_msr(struct kvm_vcpu *vcpu, int err)
> {
>     if (!sev_es_guest(svm->vcpu.kvm) || !err)
>     return kvm_complete_insn_gp(&svm->vcpu, err);
> 
>     ghcb_set_sw_exit_info_1(svm->ghcb, 1);
>     ghcb_set_sw_exit_info_2(svm->ghcb,
>     X86_TRAP_GP |
>     SVM_EVTINJ_TYPE_EXEPT |
>     SVM_EVTINJ_VALID);
>     return 1;
> }

If we use the kvm_complete_insn_gp() we lose the tracing and it needs to
be able to deal with read completion setting the registers. It also needs
to deal with both kvm_emulate_rdmsr/wrmsr() when not bouncing to
userspace. Let me take a shot at covering all the cases and see what I can
come up with.
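
For example, the tracing and the RAX/RDX update for reads could stay in
the common code, with only the #GP-vs-GHCB decision living in the
callback -- roughly (a sketch combining the two patches above, userspace
MSR-filter exit path omitted for brevity, not the final code):

	int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
	{
		u32 ecx = kvm_rcx_read(vcpu);
		u64 data;
		int r = kvm_get_msr(vcpu, ecx, &data);

		if (!r) {
			trace_kvm_msr_read(ecx, data);
			kvm_rax_write(vcpu, data & -1u);
			kvm_rdx_write(vcpu, (data >> 32) & -1u);
		} else {
			trace_kvm_msr_read_ex(ecx);
		}

		/* SEV-ES: the callback reports the error through the GHCB
		 * instead of injecting a #GP directly. */
		return kvm_x86_ops.complete_emulated_msr(vcpu, r);
	}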

I noticed that the userspace completion path doesn't have tracing
invocations, trace_kvm_msr_read/write_ex() or trace_kvm_msr_read/write(),
is that by design?

> 
> 
> ...
> .complete_emulated_msr = svm_complete_emulated_msr,
> 
>> @@ -2827,7 +2888,14 @@ static int interrupt_window_interception(struct
>> vcpu_svm *svm)
>>   static int pause_interception(struct vcpu_svm *svm)
>>   {
>>   struct kvm_vcpu *vcpu = &svm->vcpu;
>> -    bool in_kernel = (svm_get_cpl(vcpu) == 0);
>> +    bool in_kernel;
>> +
>> +    /*
>> + * CPL is not made available for an SEV-ES guest, so just set
>> in_kernel
>> + * to true.
>> + */
>> +    in_kernel = (sev_es_guest(svm->vcpu.kvm)) ? true
>> +  : (svm_get_cpl(vcpu) == 0);
>>     if (!kvm_pause_in_guest(vcpu->kvm))
>>   grow_ple_window(vcpu);
> 
> See below.
> 
>> @@ -3273,6 +3351,13 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>>   struct vcpu_svm *svm = to_svm(vcpu);
>>   struct vmcb *vmcb = svm->vmcb;
>>   +    /*
>> + * SEV-ES guests do not expose RFLAGS. Use the VMCB interrupt mask
>> + * bit to determine the state of the IF flag.
>> + */
>> +    if (sev_es_guest(svm->vcpu.kvm))
>> +    return !(vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK);
> 
> This seems wrong, you have to take into account SVM_INTERRUPT_SHADOW_MASK
> as well.  Also, even though GIF is not really used by SEV-ES guests, I
> think it's nicer to put this check afterwards.
> 
> That is:
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 4372e45c8f06..2dd9c9698480 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3247,7 +3247,14 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>  if (!gif_set(svm))
>  return true;
> 
> -    if (is_guest_mode(vcpu)) {
> +    if (sev_es_guest(svm->vcpu.kvm)) {
> +    /*
> + * SEV-ES guests do not expose RFLAGS. Use the VMCB interrupt mask
> + * bit to determine the state of the IF flag.
> + */
> +    if (!(vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK))
> +    return true;
> +    } else if (is_guest_mode(vcpu)) {
>  /* As long as interrupts are being delivered...  */
>  if ((svm->nested.ctl.int_c

Re: [PATCH v5 02/34] KVM: SVM: Remove the call to sev_platform_status() during setup

2020-12-14 Thread Tom Lendacky
On 12/14/20 6:29 AM, Paolo Bonzini wrote:
> On 10/12/20 18:09, Tom Lendacky wrote:
>> From: Tom Lendacky 
>>
>> When both KVM support and the CCP driver are built into the kernel instead
>> of as modules, KVM initialization can happen before CCP initialization. As
>> a result, sev_platform_status() will return a failure when it is called
>> from sev_hardware_setup(), when this isn't really an error condition.
>>
>> Since sev_platform_status() doesn't need to be called at this time anyway,
>> remove the invocation from sev_hardware_setup().
>>
>> Signed-off-by: Tom Lendacky 
>> ---
>>   arch/x86/kvm/svm/sev.c | 22 +-
>>   1 file changed, 1 insertion(+), 21 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
>> index c0b14106258a..a4ba5476bf42 100644
>> --- a/arch/x86/kvm/svm/sev.c
>> +++ b/arch/x86/kvm/svm/sev.c
>> @@ -1127,9 +1127,6 @@ void sev_vm_destroy(struct kvm *kvm)
>>     int __init sev_hardware_setup(void)
>>   {
>> -    struct sev_user_data_status *status;
>> -    int rc;
>> -
>>   /* Maximum number of encrypted guests supported simultaneously */
>>   max_sev_asid = cpuid_ecx(0x8000001F);
>>   @@ -1148,26 +1145,9 @@ int __init sev_hardware_setup(void)
>>   if (!sev_reclaim_asid_bitmap)
>>   return 1;
>>   -    status = kmalloc(sizeof(*status), GFP_KERNEL);
>> -    if (!status)
>> -    return 1;
>> -
>> -    /*
>> - * Check SEV platform status.
>> - *
>> - * PLATFORM_STATUS can be called in any state, if we failed to query
>> - * the PLATFORM status then either PSP firmware does not support SEV
>> - * feature or SEV firmware is dead.
>> - */
>> -    rc = sev_platform_status(status, NULL);
>> -    if (rc)
>> -    goto err;
>> -
>>   pr_info("SEV supported\n");
>>   -err:
>> -    kfree(status);
>> -    return rc;
>> +    return 0;
>>   }
>>     void sev_hardware_teardown(void)
>>
> 
> Queued with Cc: stable.
> 
> Note that sev_platform_status now can become static within
> drivers/crypto/ccp/sev-dev.c.

Nice catch. I'll look at doing a follow-on patch to change that.

Thanks,
Tom

> 
> Paolo


[PATCH v5 01/34] x86/cpu: Add VM page flush MSR availability as a CPUID feature

2020-12-10 Thread Tom Lendacky
From: Tom Lendacky 

On systems that do not have hardware enforced cache coherency between
encrypted and unencrypted mappings of the same physical page, the
hypervisor can use the VM page flush MSR (0xc001011e) to flush the cache
contents of an SEV guest page. When a small number of pages are being
flushed, this can be used in place of issuing a WBINVD across all CPUs.
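
So a flush path can be gated on the new synthetic feature bit, for
example (illustrative only; the per-page MSR write loop itself is added
by a later patch in this series):

	if (boot_cpu_has(X86_FEATURE_VM_PAGE_FLUSH)) {
		/* flush just the affected guest pages, one
		 * MSR_AMD64_VM_PAGE_FLUSH write per page */
	} else {
		wbinvd_on_all_cpus();
	}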

CPUID 0x8000_001f_eax[2] is used to determine if the VM page flush MSR is
available. Add a CPUID feature to indicate it is supported and define the
MSR.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/msr-index.h   | 1 +
 arch/x86/kernel/cpu/scattered.c| 1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index dad350d42ecf..54df367b3180 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -237,6 +237,7 @@
 #define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports 
the VMCALL instruction */
 #define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers 
VMMCALL hypercall instruction */
 #define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted 
Virtualization - Encrypted State */
+#define X86_FEATURE_VM_PAGE_FLUSH  ( 8*32+21) /* "" VM Page Flush MSR is 
supported */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE   ( 9*32+ 0) /* RDFSBASE, WRFSBASE, 
RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 972a34d93505..abfc9b0fbd8d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -470,6 +470,7 @@
 #define MSR_AMD64_ICIBSEXTDCTL 0xc001103c
 #define MSR_AMD64_IBSOPDATA4   0xc001103d
 #define MSR_AMD64_IBS_REG_COUNT_MAX8 /* includes MSR_AMD64_IBSBRTARGET */
+#define MSR_AMD64_VM_PAGE_FLUSH0xc001011e
 #define MSR_AMD64_SEV_ES_GHCB  0xc0010130
 #define MSR_AMD64_SEV  0xc0010131
 #define MSR_AMD64_SEV_ENABLED_BIT  0
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 866c9a9bcdee..236924930bf0 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
	{ X86_FEATURE_SEV,		CPUID_EAX,  1, 0x8000001f, 0 },
	{ X86_FEATURE_SEV_ES,		CPUID_EAX,  3, 0x8000001f, 0 },
	{ X86_FEATURE_SME_COHERENT,	CPUID_EAX, 10, 0x8000001f, 0 },
+	{ X86_FEATURE_VM_PAGE_FLUSH,	CPUID_EAX,  2, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.28.0



[PATCH v5 03/34] KVM: SVM: Add support for SEV-ES capability in KVM

2020-12-10 Thread Tom Lendacky
From: Tom Lendacky 

Add support to KVM for determining if a system is capable of supporting
SEV-ES as well as determining if a guest is an SEV-ES guest.

Signed-off-by: Tom Lendacky 
---
 arch/x86/kvm/Kconfig   |  3 ++-
 arch/x86/kvm/svm/sev.c | 47 ++
 arch/x86/kvm/svm/svm.c | 20 +-
 arch/x86/kvm/svm/svm.h | 17 ++-
 4 files changed, 66 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f92dfd8ef10d..7ac592664c52 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -100,7 +100,8 @@ config KVM_AMD_SEV
depends on KVM_AMD && X86_64
depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
help
-   Provides support for launching Encrypted VMs on AMD processors.
+ Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
+ with Encrypted State (SEV-ES) on AMD processors.
 
 config KVM_MMU_AUDIT
bool "Audit KVM MMU"
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a4ba5476bf42..9bf5e9dadff5 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -932,7 +932,7 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
struct kvm_sev_cmd sev_cmd;
int r;
 
-   if (!svm_sev_enabled())
+   if (!svm_sev_enabled() || !sev)
return -ENOTTY;
 
if (!argp)
@@ -1125,29 +1125,58 @@ void sev_vm_destroy(struct kvm *kvm)
sev_asid_free(sev->asid);
 }
 
-int __init sev_hardware_setup(void)
+void __init sev_hardware_setup(void)
 {
+   unsigned int eax, ebx, ecx, edx;
+   bool sev_es_supported = false;
+   bool sev_supported = false;
+
+   /* Does the CPU support SEV? */
+   if (!boot_cpu_has(X86_FEATURE_SEV))
+   goto out;
+
+   /* Retrieve SEV CPUID information */
+	cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
+
/* Maximum number of encrypted guests supported simultaneously */
-	max_sev_asid = cpuid_ecx(0x8000001F);
+   max_sev_asid = ecx;
 
if (!svm_sev_enabled())
-   return 1;
+   goto out;
 
/* Minimum ASID value that should be used for SEV guest */
-	min_sev_asid = cpuid_edx(0x8000001F);
+   min_sev_asid = edx;
 
/* Initialize SEV ASID bitmaps */
sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
if (!sev_asid_bitmap)
-   return 1;
+   goto out;
 
sev_reclaim_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);
if (!sev_reclaim_asid_bitmap)
-   return 1;
+   goto out;
 
-   pr_info("SEV supported\n");
+   pr_info("SEV supported: %u ASIDs\n", max_sev_asid - min_sev_asid + 1);
+   sev_supported = true;
 
-   return 0;
+   /* SEV-ES support requested? */
+   if (!sev_es)
+   goto out;
+
+   /* Does the CPU support SEV-ES? */
+   if (!boot_cpu_has(X86_FEATURE_SEV_ES))
+   goto out;
+
+   /* Has the system been allocated ASIDs for SEV-ES? */
+   if (min_sev_asid == 1)
+   goto out;
+
+   pr_info("SEV-ES supported: %u ASIDs\n", min_sev_asid - 1);
+   sev_es_supported = true;
+
+out:
+   sev = sev_supported;
+   sev_es = sev_es_supported;
 }
 
 void sev_hardware_teardown(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6dc337b9c231..a1ea30c98629 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -187,9 +187,13 @@ static int vgif = true;
 module_param(vgif, int, 0444);
 
 /* enable/disable SEV support */
-static int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
 module_param(sev, int, 0444);
 
+/* enable/disable SEV-ES support */
+int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+module_param(sev_es, int, 0444);
+
 static bool __read_mostly dump_invalid_vmcb = 0;
 module_param(dump_invalid_vmcb, bool, 0644);
 
@@ -959,15 +963,11 @@ static __init int svm_hardware_setup(void)
kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
}
 
-   if (sev) {
-   if (boot_cpu_has(X86_FEATURE_SEV) &&
-   IS_ENABLED(CONFIG_KVM_AMD_SEV)) {
-   r = sev_hardware_setup();
-   if (r)
-   sev = false;
-   } else {
-   sev = false;
-   }
+   if (IS_ENABLED(CONFIG_KVM_AMD_SEV) && sev) {
+   sev_hardware_setup();
+   } else {
+   sev = false;
+   sev_es = false;
}
 
svm_adjust_mmio_mask();
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index fdff76eb6ceb..56d950df82e5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm

[PATCH v5 02/34] KVM: SVM: Remove the call to sev_platform_status() during setup

2020-12-10 Thread Tom Lendacky
From: Tom Lendacky 

When both KVM support and the CCP driver are built into the kernel instead
of as modules, KVM initialization can happen before CCP initialization. As
a result, sev_platform_status() will return a failure when it is called
from sev_hardware_setup(), when this isn't really an error condition.

Since sev_platform_status() doesn't need to be called at this time anyway,
remove the invocation from sev_hardware_setup().

Signed-off-by: Tom Lendacky 
---
 arch/x86/kvm/svm/sev.c | 22 +-
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index c0b14106258a..a4ba5476bf42 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1127,9 +1127,6 @@ void sev_vm_destroy(struct kvm *kvm)
 
 int __init sev_hardware_setup(void)
 {
-   struct sev_user_data_status *status;
-   int rc;
-
/* Maximum number of encrypted guests supported simultaneously */
	max_sev_asid = cpuid_ecx(0x8000001F);
 
@@ -1148,26 +1145,9 @@ int __init sev_hardware_setup(void)
if (!sev_reclaim_asid_bitmap)
return 1;
 
-   status = kmalloc(sizeof(*status), GFP_KERNEL);
-   if (!status)
-   return 1;
-
-   /*
-* Check SEV platform status.
-*
-* PLATFORM_STATUS can be called in any state, if we failed to query
-* the PLATFORM status then either PSP firmware does not support SEV
-* feature or SEV firmware is dead.
-*/
-   rc = sev_platform_status(status, NULL);
-   if (rc)
-   goto err;
-
pr_info("SEV supported\n");
 
-err:
-   kfree(status);
-   return rc;
+   return 0;
 }
 
 void sev_hardware_teardown(void)
-- 
2.28.0



[PATCH v5 04/34] KVM: SVM: Add GHCB accessor functions for retrieving fields

2020-12-10 Thread Tom Lendacky
From: Tom Lendacky 

Update the GHCB accessor functions to add functions for retrieving GHCB
fields by name. Update existing code to use the new accessor functions.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/svm.h   | 10 ++
 arch/x86/kernel/cpu/vmware.c | 12 ++--
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 71d630bb5e08..1edf24f51b53 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -379,6 +379,16 @@ struct vmcb {
					(unsigned long *)&ghcb->save.valid_bitmap);	\
	}								\
									\
+	static inline u64 ghcb_get_##field(struct ghcb *ghcb)		\
+	{								\
+		return ghcb->save.field;				\
+	}								\
+									\
+	static inline u64 ghcb_get_##field##_if_valid(struct ghcb *ghcb)	\
+	{								\
+		return ghcb_##field##_is_valid(ghcb) ? ghcb->save.field : 0;	\
+	}								\
+									\
	static inline void ghcb_set_##field(struct ghcb *ghcb, u64 value)	\
	{								\
		__set_bit(GHCB_BITMAP_IDX(field),			\
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 924571fe5864..c6ede3b3d302 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -501,12 +501,12 @@ static bool vmware_sev_es_hcall_finish(struct ghcb *ghcb, 
struct pt_regs *regs)
  ghcb_rbp_is_valid(ghcb)))
return false;
 
-   regs->bx = ghcb->save.rbx;
-   regs->cx = ghcb->save.rcx;
-   regs->dx = ghcb->save.rdx;
-   regs->si = ghcb->save.rsi;
-   regs->di = ghcb->save.rdi;
-   regs->bp = ghcb->save.rbp;
+   regs->bx = ghcb_get_rbx(ghcb);
+   regs->cx = ghcb_get_rcx(ghcb);
+   regs->dx = ghcb_get_rdx(ghcb);
+   regs->si = ghcb_get_rsi(ghcb);
+   regs->di = ghcb_get_rdi(ghcb);
+   regs->bp = ghcb_get_rbp(ghcb);
 
return true;
 }
-- 
2.28.0



[PATCH v5 07/34] KVM: SVM: Add required changes to support intercepts under SEV-ES

2020-12-10 Thread Tom Lendacky
From: Tom Lendacky 

When a guest is running under SEV-ES, the hypervisor cannot access the
guest register state. There are numerous places in the KVM code where
certain registers are accessed that are not allowed to be accessed (e.g.
RIP, CR0, etc). Add checks to prevent register accesses and add intercept
update support at various points within the KVM code.

Also, when handling a VMGEXIT, exceptions are passed back through the
GHCB. Since the RDMSR/WRMSR intercepts (may) inject a #GP on error,
update the SVM intercepts to handle this for SEV-ES guests.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/svm.h |   3 +-
 arch/x86/kvm/svm/svm.c | 111 +
 arch/x86/kvm/x86.c |   6 +-
 3 files changed, 107 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1edf24f51b53..bce28482d63d 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -178,7 +178,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define LBR_CTL_ENABLE_MASK BIT_ULL(0)
 #define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
 
-#define SVM_INTERRUPT_SHADOW_MASK 1
+#define SVM_INTERRUPT_SHADOW_MASK  BIT_ULL(0)
+#define SVM_GUEST_INTERRUPT_MASK   BIT_ULL(1)
 
 #define SVM_IOIO_STR_SHIFT 2
 #define SVM_IOIO_REP_SHIFT 3
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index cd4c9884e5a8..857d0d3f2752 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "trace.h"
@@ -340,6 +341,13 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
+   /*
+* SEV-ES does not expose the next RIP. The RIP update is controlled by
+* the type of exit and the #VC handler in the guest.
+*/
+   if (sev_es_guest(vcpu->kvm))
+   goto done;
+
if (nrips && svm->vmcb->control.next_rip != 0) {
WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
svm->next_rip = svm->vmcb->control.next_rip;
@@ -351,6 +359,8 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
} else {
kvm_rip_write(vcpu, svm->next_rip);
}
+
+done:
svm_set_interrupt_shadow(vcpu, 0);
 
return 1;
@@ -1652,9 +1662,18 @@ static void svm_set_gdt(struct kvm_vcpu *vcpu, struct 
desc_ptr *dt)
 
 static void update_cr0_intercept(struct vcpu_svm *svm)
 {
-   ulong gcr0 = svm->vcpu.arch.cr0;
-   u64 *hcr0 = &svm->vmcb->save.cr0;
+   ulong gcr0;
+   u64 *hcr0;
+
+   /*
+* SEV-ES guests must always keep the CR intercepts cleared. CR
+* tracking is done using the CR write traps.
+*/
+   if (sev_es_guest(svm->vcpu.kvm))
+   return;
 
+   gcr0 = svm->vcpu.arch.cr0;
+   hcr0 = &svm->vmcb->save.cr0;
*hcr0 = (*hcr0 & ~SVM_CR0_SELECTIVE_MASK)
| (gcr0 & SVM_CR0_SELECTIVE_MASK);
 
@@ -1674,7 +1693,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
struct vcpu_svm *svm = to_svm(vcpu);
 
 #ifdef CONFIG_X86_64
-   if (vcpu->arch.efer & EFER_LME) {
+   if (vcpu->arch.efer & EFER_LME && !vcpu->arch.guest_state_protected) {
if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
vcpu->arch.efer |= EFER_LMA;
svm->vmcb->save.efer |= EFER_LMA | EFER_LME;
@@ -2608,7 +2627,29 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 static int rdmsr_interception(struct vcpu_svm *svm)
 {
-   return kvm_emulate_rdmsr(&svm->vcpu);
+   u32 ecx;
+   u64 data;
+
+   if (!sev_es_guest(svm->vcpu.kvm))
+   return kvm_emulate_rdmsr(&svm->vcpu);
+
+   ecx = kvm_rcx_read(&svm->vcpu);
+   if (kvm_get_msr(&svm->vcpu, ecx, &data)) {
+   trace_kvm_msr_read_ex(ecx);
+   ghcb_set_sw_exit_info_1(svm->ghcb, 1);
+   ghcb_set_sw_exit_info_2(svm->ghcb,
+   X86_TRAP_GP |
+   SVM_EVTINJ_TYPE_EXEPT |
+   SVM_EVTINJ_VALID);
+   return 1;
+   }
+
+   trace_kvm_msr_read(ecx, data);
+
+   kvm_rax_write(&svm->vcpu, data & -1u);
+   kvm_rdx_write(&svm->vcpu, (data >> 32) & -1u);
+
+   return kvm_skip_emulated_instruction(&svm->vcpu);
 }
 
 static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 data)
@@ -2797,7 +2838,27 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 
 static int wrmsr_interception(struct vcpu_svm *svm)
 {
-   return kvm_emulate_wrmsr(&svm->vcpu);
+   u32 ecx;
+   u64 data;
+
+

[PATCH v5 05/34] KVM: SVM: Add support for the SEV-ES VMSA

2020-12-10 Thread Tom Lendacky
From: Tom Lendacky 

Allocate a page during vCPU creation to be used as the encrypted VM save
area (VMSA) for the SEV-ES guest. Provide a flag in the kvm_vcpu_arch
structure that indicates whether the guest state is protected.

Before freeing a VMSA page that has been encrypted, the cache contents
must be flushed using the MSR_AMD64_VM_PAGE_FLUSH MSR.
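
A minimal sketch of the per-page flush, assuming the VM Page Flush MSR
is present and using a hypothetical helper name (the full logic,
including the WBINVD fallback, is in sev_flush_guest_memory() below):

/*
 * Sketch: flush one encrypted page via the VM Page Flush MSR. The MSR
 * value is the page-aligned virtual address of the page OR'd with the
 * guest's ASID.
 */
static void sketch_flush_encrypted_page(unsigned long va, unsigned int asid)
{
	wrmsrl(MSR_AMD64_VM_PAGE_FLUSH, (va & PAGE_MASK) | asid);
}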

[ i386 build warnings ]
Reported-by: kernel test robot 
Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/kvm_host.h |  3 ++
 arch/x86/kvm/svm/sev.c  | 67 +
 arch/x86/kvm/svm/svm.c  | 24 +++-
 arch/x86/kvm/svm/svm.h  |  5 +++
 4 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f002cdb13a0b..8cf6b0493d49 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,6 +805,9 @@ struct kvm_vcpu_arch {
 */
bool enforce;
} pv_cpuid;
+
+   /* Protected Guests */
+   bool guest_state_protected;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9bf5e9dadff5..fb4a411f7550 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "x86.h"
 #include "svm.h"
@@ -1190,6 +1191,72 @@ void sev_hardware_teardown(void)
sev_flush_asids();
 }
 
+/*
+ * Pages used by hardware to hold guest encrypted state must be flushed before
+ * returning them to the system.
+ */
+static void sev_flush_guest_memory(struct vcpu_svm *svm, void *va,
+  unsigned long len)
+{
+   /*
+* If hardware enforced cache coherency for encrypted mappings of the
+* same physical page is supported, nothing to do.
+*/
+   if (boot_cpu_has(X86_FEATURE_SME_COHERENT))
+   return;
+
+   /*
+* If the VM Page Flush MSR is supported, use it to flush the page
+* (using the page virtual address and the guest ASID).
+*/
+   if (boot_cpu_has(X86_FEATURE_VM_PAGE_FLUSH)) {
+   struct kvm_sev_info *sev;
+   unsigned long va_start;
+   u64 start, stop;
+
+   /* Align start and stop to page boundaries. */
+   va_start = (unsigned long)va;
+   start = (u64)va_start & PAGE_MASK;
+   stop = PAGE_ALIGN((u64)va_start + len);
+
+   if (start < stop) {
+   sev = &to_kvm_svm(svm->vcpu.kvm)->sev_info;
+
+   while (start < stop) {
+   wrmsrl(MSR_AMD64_VM_PAGE_FLUSH,
+  start | sev->asid);
+
+   start += PAGE_SIZE;
+   }
+
+   return;
+   }
+
+   WARN(1, "Address overflow, using WBINVD\n");
+   }
+
+   /*
+* Hardware should always have one of the above features,
+* but if not, use WBINVD and issue a warning.
+*/
+   WARN_ONCE(1, "Using WBINVD to flush guest memory\n");
+   wbinvd_on_all_cpus();
+}
+
+void sev_free_vcpu(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_svm *svm;
+
+   if (!sev_es_guest(vcpu->kvm))
+   return;
+
+   svm = to_svm(vcpu);
+
+   if (vcpu->arch.guest_state_protected)
+   sev_flush_guest_memory(svm, svm->vmsa, PAGE_SIZE);
+   __free_page(virt_to_page(svm->vmsa));
+}
+
 void pre_sev_run(struct vcpu_svm *svm, int cpu)
 {
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a1ea30c98629..cd4c9884e5a8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1289,6 +1289,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm;
struct page *vmcb_page;
+   struct page *vmsa_page = NULL;
int err;
 
BUILD_BUG_ON(offsetof(struct vcpu_svm, vcpu) != 0);
@@ -1299,9 +1300,19 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
if (!vmcb_page)
goto out;
 
+   if (sev_es_guest(svm->vcpu.kvm)) {
+   /*
+* SEV-ES guests require a separate VMSA page used to contain
+* the encrypted register state of the guest.
+*/
+   vmsa_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+   if (!vmsa_page)
+   goto error_free_vmcb_page;
+   }
+
err = avic_init_vcpu(svm);
if (err)
-   goto error_free_vmcb_page;
+   goto error_free_vmsa_page;
 
/* We initialize this flag to true to make sure that the is_running
 * bit would be set the first time the vcpu is loaded.
@@ -1311,12 +132

[PATCH v5 13/34] KVM: SVM: Create trace events for VMGEXIT processing

2020-12-10 Thread Tom Lendacky
From: Tom Lendacky 

Add trace events for entry to and exit from VMGEXIT processing. The vCPU
id and the exit reason will be common for the trace events. The exit info
fields will represent the input and output values for the entry and exit
events, respectively.
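
As a usage note, the two tracepoints bracket the time spent handling a
VMGEXIT (the exact call sites are in the hunks below):

	/* On VMGEXIT entry, once the GHCB has been mapped: */
	trace_kvm_vmgexit_enter(svm->vcpu.vcpu_id, ghcb);

	/* Before re-entering the guest, just before the GHCB is unmapped: */
	trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);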

Signed-off-by: Tom Lendacky 
---
 arch/x86/kvm/svm/sev.c |  6 +
 arch/x86/kvm/trace.h   | 53 ++
 arch/x86/kvm/x86.c |  2 ++
 3 files changed, 61 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 54e6894b26d2..da473c6b725e 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -15,10 +15,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "x86.h"
 #include "svm.h"
 #include "cpuid.h"
+#include "trace.h"
 
 static int sev_flush_asids(void);
 static DECLARE_RWSEM(sev_deactivate_lock);
@@ -1464,6 +1466,8 @@ static void pre_sev_es_run(struct vcpu_svm *svm)
if (!svm->ghcb)
return;
 
+   trace_kvm_vmgexit_exit(svm->vcpu.vcpu_id, svm->ghcb);
+
sev_es_sync_to_ghcb(svm);
 
kvm_vcpu_unmap(&svm->vcpu, &svm->ghcb_map, true);
@@ -1528,6 +1532,8 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
svm->ghcb = svm->ghcb_map.hva;
ghcb = svm->ghcb_map.hva;
 
+   trace_kvm_vmgexit_enter(svm->vcpu.vcpu_id, ghcb);
+
exit_code = ghcb_get_sw_exit_code(ghcb);
 
ret = sev_es_validate_vmgexit(svm);
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index aef960f90f26..7da931a511c9 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1578,6 +1578,59 @@ TRACE_EVENT(kvm_hv_syndbg_get_msr,
  __entry->vcpu_id, __entry->vp_index, __entry->msr,
  __entry->data)
 );
+
+/*
+ * Tracepoint for the start of VMGEXIT processing
+ */
+TRACE_EVENT(kvm_vmgexit_enter,
+   TP_PROTO(unsigned int vcpu_id, struct ghcb *ghcb),
+   TP_ARGS(vcpu_id, ghcb),
+
+   TP_STRUCT__entry(
+   __field(unsigned int, vcpu_id)
+   __field(u64, exit_reason)
+   __field(u64, info1)
+   __field(u64, info2)
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id = vcpu_id;
+   __entry->exit_reason = ghcb->save.sw_exit_code;
+   __entry->info1   = ghcb->save.sw_exit_info_1;
+   __entry->info2   = ghcb->save.sw_exit_info_2;
+   ),
+
+   TP_printk("vcpu %u, exit_reason %llx, exit_info1 %llx, exit_info2 %llx",
+ __entry->vcpu_id, __entry->exit_reason,
+ __entry->info1, __entry->info2)
+);
+
+/*
+ * Tracepoint for the end of VMGEXIT processing
+ */
+TRACE_EVENT(kvm_vmgexit_exit,
+   TP_PROTO(unsigned int vcpu_id, struct ghcb *ghcb),
+   TP_ARGS(vcpu_id, ghcb),
+
+   TP_STRUCT__entry(
+   __field(unsigned int, vcpu_id)
+   __field(u64, exit_reason)
+   __field(u64, info1)
+   __field(u64, info2)
+   ),
+
+   TP_fast_assign(
+   __entry->vcpu_id = vcpu_id;
+   __entry->exit_reason = ghcb->save.sw_exit_code;
+   __entry->info1   = ghcb->save.sw_exit_info_1;
+   __entry->info2   = ghcb->save.sw_exit_info_2;
+   ),
+
+   TP_printk("vcpu %u, exit_reason %llx, exit_info1 %llx, exit_info2 %llx",
+ __entry->vcpu_id, __entry->exit_reason,
+ __entry->info1, __entry->info2)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index de0e35083df5..d89736066b39 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11321,3 +11321,5 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
-- 
2.28.0



[PATCH v5 08/34] KVM: SVM: Prevent debugging under SEV-ES

2020-12-10 Thread Tom Lendacky
From: Tom Lendacky 

Since the guest register state of an SEV-ES guest is encrypted, debugging
is not supported. Update the code to prevent guest debugging when the
guest has protected state.

Additionally, for an SEV-ES guest, DR7 reads and writes must always be
intercepted and no other debug register intercepts may be set. Update
set_dr_intercepts() and clr_dr_intercepts() to account for this.
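
A minimal sketch of the guard used for the debug register paths, modeled
on the svm_set_dr7() hunk below and using a hypothetical helper name:
when the guest state is protected, the write is simply dropped because
the real DR7 lives in the encrypted VMSA.

static void sketch_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/* Encrypted guest state cannot be modified by the hypervisor. */
	if (vcpu->arch.guest_state_protected)
		return;

	svm->vmcb->save.dr7 = value;
	vmcb_mark_dirty(svm->vmcb, VMCB_DR);
}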

Signed-off-by: Tom Lendacky 
---
 arch/x86/kvm/svm/svm.c |  9 +
 arch/x86/kvm/svm/svm.h | 81 +++---
 arch/x86/kvm/x86.c |  3 ++
 3 files changed, 57 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 857d0d3f2752..513cf667dff4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1806,6 +1806,9 @@ static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
 {
struct vmcb *vmcb = svm->vmcb;
 
+   if (svm->vcpu.arch.guest_state_protected)
+   return;
+
if (unlikely(value != vmcb->save.dr6)) {
vmcb->save.dr6 = value;
vmcb_mark_dirty(vmcb, VMCB_DR);
@@ -1816,6 +1819,9 @@ static void svm_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
+   if (vcpu->arch.guest_state_protected)
+   return;
+
get_debugreg(vcpu->arch.db[0], 0);
get_debugreg(vcpu->arch.db[1], 1);
get_debugreg(vcpu->arch.db[2], 2);
@@ -1834,6 +1840,9 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
+   if (vcpu->arch.guest_state_protected)
+   return;
+
svm->vmcb->save.dr7 = value;
vmcb_mark_dirty(svm->vmcb, VMCB_DR);
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 80a359f3cf20..abfe53d6b3dc 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -199,6 +199,28 @@ static inline struct kvm_svm *to_kvm_svm(struct kvm *kvm)
return container_of(kvm, struct kvm_svm, kvm);
 }
 
+static inline bool sev_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+   return sev->active;
+#else
+   return false;
+#endif
+}
+
+static inline bool sev_es_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+
+   return sev_guest(kvm) && sev->es_active;
+#else
+   return false;
+#endif
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
vmcb->control.clean = 0;
@@ -250,21 +272,24 @@ static inline void set_dr_intercepts(struct vcpu_svm *svm)
 {
struct vmcb *vmcb = get_host_vmcb(svm);
 
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
+   if (!sev_es_guest(svm->vcpu.kvm)) {
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_READ);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_READ);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_READ);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_READ);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_READ);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_READ);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_READ);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
+   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRITE);
+   }
+
vmcb_set_intercept(&vmcb->control, INTERCEPT_DR7_READ);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR0_WRITE);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR1_WRITE);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR2_WRITE);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR3_WRITE);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR4_WRITE);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR5_WRITE);
-   vmcb_set_intercept(&vmcb->control, INTERCEPT_DR6_WRI
