Re: [RFC Part2 PATCH 05/30] x86: define RMP violation #PF error code

2021-04-20 Thread Brijesh Singh


On 4/20/21 5:32 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 12:04:11PM -0500, Brijesh Singh wrote:
>
> Btw, for all your patches where the subject prefix is only "x86:":
>
> The tip tree preferred format for patch subject prefixes is
> 'subsys/component:', e.g. 'x86/apic:', 'x86/mm/fault:', 'sched/fair:',
> 'genirq/core:'. Please do not use file names or complete file paths as
> prefix. 'git log path/to/file' should give you a reasonable hint in most
> cases.
>
> The condensed patch description in the subject line should start with a
> uppercase letter and should be written in imperative tone.
>
> Please go over them and fix that up.

Sure, I will go over each patch and fix them.

thanks


Re: [PATCH] KVM: x86: document behavior of measurement ioctls with len==0

2021-04-20 Thread Brijesh Singh


On 4/20/21 4:34 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini 


Reviewed-by: Brijesh Singh 

Thanks

> ---
>  Documentation/virt/kvm/amd-memory-encryption.rst | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
> b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 469a6308765b..34ce2d1fcb89 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -148,6 +148,9 @@ measurement. Since the guest owner knows the initial 
> contents of the guest at
>  boot, the measurement can be verified by comparing it to what the guest owner
>  expects.
>  
> +If len is zero on entry, the measurement blob length is written to len and
> +uaddr is unused.
> +
>  Parameters (in): struct  kvm_sev_launch_measure
>  
>  Returns: 0 on success, -negative on error
> @@ -271,6 +274,9 @@ report containing the SHA-256 digest of the guest memory 
> and VMSA passed through
>  commands and signed with the PEK. The digest returned by the command should 
> match the digest
>  used by the guest owner with the KVM_SEV_LAUNCH_MEASURE.
>  
> +If len is zero on entry, the measurement blob length is written to len and
> +uaddr is unused.
> +
>  Parameters (in): struct kvm_sev_attestation
>  
>  Returns: 0 on success, -negative on error


Re: [RFC Part2 PATCH 04/30] x86/mm: split the physmap when adding the page in RMP table

2021-04-19 Thread Brijesh Singh


On 4/19/21 1:10 PM, Andy Lutomirski wrote:
>
>> On Apr 19, 2021, at 10:58 AM, Dave Hansen  wrote:
>>
>> On 4/19/21 10:46 AM, Brijesh Singh wrote:
>>> - guest wants to make gpa 0x1000 a shared page. To support this, we
>>> need to psmash the large RMP entry into 512 4K entries. The psmash
>>> instruction breaks the large RMP entry into 512 4K entries without
>>> affecting the previous validation. Now we need to force the host to
>>> use the 4K page level instead of 2MB.
>>>
>>> To my understanding, the Linux kernel fault handler does not build the page
>>> tables on demand for kernel addresses. All kernel addresses are
>>> pre-mapped at boot. Currently, I am proactively splitting the physmap
>>> to avoid running into a situation where the x86 page level is greater than
>>> the RMP page level.
>> In other words, if the host maps guest memory with 2M mappings, the
>> guest can induce page faults in the host.  The only way the host can
>> avoid this is to map everything with 4k mappings.
>>
>> If the host does not avoid this, it could end up in the situation where
>> it gets page faults on access to kernel data structures.  Imagine if a
>> kernel stack page ended up in the same 2M mapping as a guest page.  I
>> *think* the next write to the kernel stack would end up double-faulting.
> I’m confused by this scenario. This should only affect physical pages that 
> are in the 2M area that contains guest memory. But, if we have a 2M direct 
> map PMD entry that contains kernel data and guest private memory, we’re 
> already in a situation in which the kernel touching that memory would machine 
> check, right?

When SEV-SNP is enabled in the host, a page can be in one of the
following states:

1. Hypervisor (assigned=0, validated=0)

2. Firmware (assigned=1, immutable=1)

3. Context/VMSA (assigned=1, vmsa=1)

4. Guest private (assigned=1, validated=1)
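
Roughly, a helper to classify an RMP entry into these states could look
like the sketch below (the enum and helper name are illustrative only;
the bitfields follow the struct rmpentry layout quoted elsewhere in this
thread):

enum snp_page_state_kind {
        SNP_PAGE_HYPERVISOR,            /* assigned=0, validated=0 */
        SNP_PAGE_FIRMWARE,              /* assigned=1, immutable=1 */
        SNP_PAGE_VMSA,                  /* assigned=1, vmsa=1      */
        SNP_PAGE_GUEST_PRIVATE,         /* assigned=1, validated=1 */
};

static enum snp_page_state_kind rmpentry_state(struct rmpentry *e)
{
        if (!e->info.assigned)
                return SNP_PAGE_HYPERVISOR;
        if (e->info.immutable)
                return SNP_PAGE_FIRMWARE;
        if (e->info.vmsa)
                return SNP_PAGE_VMSA;
        return SNP_PAGE_GUEST_PRIVATE;
}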


You are right that we should never run into a situation where kernel
data and a guest page end up in the same PMD entry.

During SEV VM creation, KVM allocates one firmware page and one VMSA
page for each vcpu. The firmware page is used by the SEV-SNP firmware
to keep some private metadata. The VMSA page contains the guest register
state. I am more concerned about the pages allocated by KVM for the
VMSA and firmware. These pages are not guest private per se. To avoid
getting into this situation we can probably create an SNP buffer pool;
all the firmware and VMSA pages would come from this pool.

Another challenging case: KVM maps a guest page and writes to it. One
such example is the GHCB page. If the mapped address is mapped with a
2MB PMD entry then we will get an RMP violation.


> ISTM we should fully unmap any guest private page from the kernel and all 
> host user pagetables before actually making it be a guest private page.


Re: [RFC Part2 PATCH 04/30] x86/mm: split the physmap when adding the page in RMP table

2021-04-19 Thread Brijesh Singh


On 4/19/21 7:32 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 12:04:10PM -0500, Brijesh Singh wrote:
>> A write from the hypervisor goes through the RMP checks. When the
>> hypervisor writes to pages, hardware checks to ensure that the assigned
>> bit in the RMP is zero (i.e. the page is shared). If the page table entry that
>> gives the sPA indicates that the target page size is a large page, then
>> all RMP entries for the 4KB constituting pages of the target must have the
>> assigned bit 0.
> Hmm, so this is important: I read this such that we can have a 2M
> page table entry but the RMP table can contain 4K entries for the
> corresponding 512 4K pages. Is that correct?

Yes that is correct.


>
> If so, then there's a certain discrepancy here and I'd expect that if
> the page gets split/collapsed, depending on the result, the RMP table
> should be updated too, so that it remains in sync.

Yes that is correct. For write access to succeed we need both the x86
and RMP page tables in sync.

>
> For example:
>
> * mm decides to group all 512 4K entries into a 2M entry, RMP table gets
> updated in the end to reflect that

To my understanding, we don't group 512 4K entries into a 2M entry for the
kernel address range. We do this for userspace addresses through the
khugepaged daemon. If the page tables get out of sync then it will cause an
RMP violation; Patch #7 adds support to split the pages on demand.


>
> * mm decides to split a page, RMP table gets updated too, for the same
> reason.
>
> In this way, RMP table will be always in sync with the pagetables.
>
> I know, I probably am missing something but that makes most sense to
> me instead of noticing the discrepancy and getting to work then, when
> handling the RMP violation.
>
> Or?
>
> Thx.
>


Re: [RFC Part2 PATCH 02/30] x86/sev-snp: add RMP entry lookup helpers

2021-04-15 Thread Brijesh Singh


On 4/15/21 2:50 PM, Borislav Petkov wrote:
> On Thu, Apr 15, 2021 at 01:08:09PM -0500, Brijesh Singh wrote:
>> This is from Family 19h Model 01h Rev B01, the processor which
>> introduces the SNP feature. Yes, I have already uploaded the PPR on the BZ.
>>
>> The PPR is also available at AMD: https://www.amd.com/en/support/tech-docs
> Please add the link in the bugzilla to the comments here - this is the
> reason why stuff is being uploaded in the first place, because those
> vendor sites tend to change and those links become stale with time.

Will do.


>
>> I guess I was trying to shorten the name. I am good with struct rmpentry;
> Yes please - typedefs are used only in very specific cases.
>
>> All those magic numbers are documented in the PPR.
> We use defines - not magic numbers. For example
>
> #define RMPTABLE_ENTRIES_OFFSET 0x4000
>
> The 8 is probably
>
> PAGE_SHIFT - RMPENTRY_SHIFT
>
> because you have GPA bits [50:12] and an RMP entry is 16 bytes, i.e., 1 << 4.
>
> With defines it is actually clear what the computation is doing - with
> naked numbers not really.

Sure, I will add macros to make it more readable.
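
Something along these lines (a rough sketch, final names may differ),
keeping the 0x4000 entries offset and the 16-byte entry size from the
PPR visible:

/* Offset of the first RMP entry from the start of the RMP table (per the PPR). */
#define RMPTABLE_ENTRIES_OFFSET         0x4000
/* An RMP entry is 16 bytes, i.e. 1 << 4. */
#define RMPENTRY_SHIFT                  4

/* Byte offset of the RMP entry covering a given system physical address. */
#define rmptable_page_offset(paddr)                                     \
        (RMPTABLE_ENTRIES_OFFSET +                                      \
         (((unsigned long)(paddr)) >> (PAGE_SHIFT - RMPENTRY_SHIFT)))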



Re: [RFC Part2 PATCH 03/30] x86: add helper functions for RMPUPDATE and PSMASH instruction

2021-04-15 Thread Brijesh Singh


On 4/15/21 1:00 PM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 12:04:09PM -0500, Brijesh Singh wrote:
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 06394b6d56b2..7a0138cb3e17 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -644,3 +644,44 @@ rmpentry_t *lookup_page_in_rmptable(struct page *page, 
>> int *level)
>>  return entry;
>>  }
>>  EXPORT_SYMBOL_GPL(lookup_page_in_rmptable);
>> +
>> +int rmptable_psmash(struct page *page)
> psmash() should be enough like all those other wrappers around insns.

Noted.


>
>> +{
>> +unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
>> +int ret;
>> +
>> +if (!static_branch_unlikely(&snp_enable_key))
>> +return -ENXIO;
>> +
>> +/* Retry if another processor is modifying the RMP entry. */
> Also, a comment here should say which binutils version supports the
> insn mnemonic so that it can be converted to "psmash" later. Ditto for
> rmpupdate below.
>
> Looking at the binutils repo, it looks like since version 2.36.
>
> /me rebuilds objdump...

Sure, I will add comment.

>> +do {
>> +asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
>> +  : "=a"(ret)
>> +  : "a"(spa)
>> +  : "memory", "cc");
>> +} while (ret == PSMASH_FAIL_INUSE);
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(rmptable_psmash);
>> +
>> +int rmptable_rmpupdate(struct page *page, struct rmpupdate *val)
> rmpupdate()
>
>> +{
>> +unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
>> +bool flush = true;
>> +int ret;
>> +
>> +if (!static_branch_unlikely(&snp_enable_key))
>> +return -ENXIO;
>> +
>> +/* Retry if another processor is modifying the RMP entry. */
>> +do {
>> +asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
>> + : "=a"(ret)
>> + : "a"(spa), "c"((unsigned long)val), "d"(flush)
>   ^^^
>
> what's the cast for?
No need to cast it. I will drop in next round.
> "d"(flush)?

Hmm, either I copied this function from pvalidate or an old internal APM
may have had the flush. I will fix it in the next rev. Thanks for pointing it out.
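
For reference, a sketch of the cleaned-up wrapper I have in mind,
assuming RMPUPDATE only takes the sPA in rAX and a pointer to the new
RMP state in rCX (the retry error-code name below is illustrative):

/* "rmpupdate" mnemonic is supported in binutils 2.36 and newer. */
static int rmpupdate(struct page *page, struct rmpupdate *val)
{
        unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
        int ret;

        if (!static_branch_unlikely(&snp_enable_key))
                return -ENXIO;

        /* Retry if another processor is modifying the RMP entry. */
        do {
                asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
                             : "=a" (ret)
                             : "a" (spa), "c" (val)
                             : "memory", "cc");
        } while (ret == RMPUPDATE_FAIL_OVERLAP);    /* name illustrative */

        return ret;
}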


>
> There's nothing in the APM talking about RMPUPDATE taking an input arg
> in %rdx?
>


Re: [RFC Part2 PATCH 02/30] x86/sev-snp: add RMP entry lookup helpers

2021-04-15 Thread Brijesh Singh


On 4/15/21 12:03 PM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 12:04:08PM -0500, Brijesh Singh wrote:
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> Also, why is all this SNP stuff landing in this file instead of in sev.c
> or so which is AMD-specific?
>
I don't have any strong reason to keep in mem_encrypt.c. All these can
go in sev.c. I will move them in next version.


Re: [RFC Part2 PATCH 02/30] x86/sev-snp: add RMP entry lookup helpers

2021-04-15 Thread Brijesh Singh
Hi Boris,


On 4/15/21 11:57 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 12:04:08PM -0500, Brijesh Singh wrote:
>> The lookup_page_in_rmptable() can be used by the host to read the RMP
>> entry for a given page. The RMP entry format is documented in PPR
>> section 2.1.5.2.
> I see
>
> Table 15-36. Fields of an RMP Entry
>
> in the APM.
>
> Which PPR do you mean? Also, you know where to put those documents,
> right?

This is from Family 19h Model 01h Rev B01, the processor which
introduces the SNP feature. Yes, I have already uploaded the PPR on the BZ.

The PPR is also available at AMD: https://www.amd.com/en/support/tech-docs


>> +/* RMP table entry format (PPR section 2.1.5.2) */
>> +struct __packed rmpentry {
>> +union {
>> +struct {
>> +uint64_t assigned:1;
>> +uint64_t pagesize:1;
>> +uint64_t immutable:1;
>> +uint64_t rsvd1:9;
>> +uint64_t gpa:39;
>> +uint64_t asid:10;
>> +uint64_t vmsa:1;
>> +uint64_t validated:1;
>> +uint64_t rsvd2:1;
>> +} info;
>> +uint64_t low;
>> +};
>> +uint64_t high;
>> +};
>> +
>> +typedef struct rmpentry rmpentry_t;
> Eww, a typedef. Why?
>
> struct rmpentry is just fine.


I guess I was trying to shorten the name. I am good with struct rmpentry;


>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 39461b9cb34e..06394b6d56b2 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -34,6 +34,8 @@
>>  
>>  #include "mm_internal.h"
>>  
> <--- Needs a comment here to explain the magic 0x4000 and the magic
> shift by 8.


All those magic numbers are documented in the PPR. The APM does not provide
the offset of the entry inside the RMP table; this is where we need to
refer to the PPR.

>> +#define rmptable_page_offset(x) (0x4000 + (((unsigned long) x) >> 8))
>> +
>>  /*
>>   * Since SME related variables are set early in the boot process they must
>>   * reside in the .data section so as not to be zeroed out when the .bss
>> @@ -612,3 +614,33 @@ static int __init mem_encrypt_snp_init(void)
>>   * SEV-SNP must be enabled across all CPUs, so make the initialization as a 
>> late initcall.
>>   */
>>  late_initcall(mem_encrypt_snp_init);
>> +
>> +rmpentry_t *lookup_page_in_rmptable(struct page *page, int *level)
> snp_lookup_page_in_rmptable()

Noted.


>> +{
>> +unsigned long phys = page_to_pfn(page) << PAGE_SHIFT;
>> +rmpentry_t *entry, *large_entry;
>> +unsigned long vaddr;
>> +
>> +if (!static_branch_unlikely(&snp_enable_key))
>> +return NULL;
>> +
>> +vaddr = rmptable_start + rmptable_page_offset(phys);
>> +if (WARN_ON(vaddr > rmptable_end))
> Do you really want to spew a warn on splat for each wrong vaddr? What
> for?
I guess I was using it during my development and there is no need for
it. I will remove it.
>
>> +return NULL;
>> +
>> +entry = (rmpentry_t *)vaddr;
>> +
>> +/*
>> + * Check if this page is covered by the large RMP entry. This is needed 
>> to get
>> + * the page level used in the RMP entry.
>> + *
> No need for a new line in the comment and no need for the "e.g." thing
> either.
>
> Also, s/the large RMP entry/a large RMP entry/g.
Noted.
>
>> + * e.g. if the page is covered by the large RMP entry then page size is 
>> set in the
>> + *   base RMP entry.
>> + */
>> +vaddr = rmptable_start + rmptable_page_offset(phys & PMD_MASK);
>> +large_entry = (rmpentry_t *)vaddr;
>> +*level = rmpentry_pagesize(large_entry);
>> +
>> +return entry;
>> +}
>> +EXPORT_SYMBOL_GPL(lookup_page_in_rmptable);
> Exported for kvm?

The current users of this are: KVM, CCP and the page fault handler.

-Brijesh



Re: [RFC Part2 PATCH 01/30] x86: Add the host SEV-SNP initialization support

2021-04-14 Thread Brijesh Singh


On 4/14/21 2:27 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 12:04:07PM -0500, Brijesh Singh wrote:
>> @@ -538,6 +540,10 @@
>>  #define MSR_K8_SYSCFG   0xc0010010
>>  #define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT   23
>>  #define MSR_K8_SYSCFG_MEM_ENCRYPT   BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
>> +#define MSR_K8_SYSCFG_SNP_EN_BIT24
>> +#define MSR_K8_SYSCFG_SNP_EN
>> BIT_ULL(MSR_K8_SYSCFG_SNP_EN_BIT)
>> +#define MSR_K8_SYSCFG_SNP_VMPL_EN_BIT   25
>> +#define MSR_K8_SYSCFG_SNP_VMPL_EN   BIT_ULL(MSR_K8_SYSCFG_SNP_VMPL_EN_BIT)
>>  #define MSR_K8_INT_PENDING_MSG  0xc0010055
>>  /* C1E active bits in int pending message */
>>  #define K8_INTP_C1E_ACTIVE_MASK 0x1800
> Ok, I believe it is finally time to make this MSR architectural and drop
> this silliness with "K8" in the name. If you wanna send me a prepatch which
> converts all like this:
>
> MSR_K8_SYSCFG -> MSR_AMD64_SYSCFG
>
> I'll gladly take it. If you prefer me to do it, I'll gladly do it.


I will send a patch to address it.

>
>> @@ -44,12 +45,16 @@ u64 sev_check_data __section(".data") = 0;
>>  EXPORT_SYMBOL(sme_me_mask);
>>  DEFINE_STATIC_KEY_FALSE(sev_enable_key);
>>  EXPORT_SYMBOL_GPL(sev_enable_key);
>> +DEFINE_STATIC_KEY_FALSE(snp_enable_key);
>> +EXPORT_SYMBOL_GPL(snp_enable_key);
>>  
>>  bool sev_enabled __section(".data");
>>  
>>  /* Buffer used for early in-place encryption by BSP, no locking needed */
>>  static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
>>  
>> +static unsigned long rmptable_start, rmptable_end;
> __ro_after_init I guess.

Yes.

>
>> +
>>  /*
>>   * When SNP is active, this routine changes the page state from private to 
>> shared before
>>   * copying the data from the source to destination and restore after the 
>> copy. This is required
>> @@ -528,3 +533,82 @@ void __init mem_encrypt_init(void)
>>  print_mem_encrypt_feature_info();
>>  }
>>  
>> +static __init void snp_enable(void *arg)
>> +{
>> +u64 val;
>> +
>> +rdmsrl_safe(MSR_K8_SYSCFG, &val);
> Why is this one _safe but the wrmsr isn't? Also, _safe returns a value -
> check it pls and return early.
No strong reason to use _safe. We reached here after all the CPUID
checks. I will drop the _safe.
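
i.e. something like this (assuming the MSR_K8_SYSCFG -> MSR_AMD64_SYSCFG
rename discussed above has been applied):

static void __init snp_enable(void *arg)
{
        u64 val;

        rdmsrl(MSR_AMD64_SYSCFG, val);

        val |= MSR_AMD64_SYSCFG_SNP_EN;
        val |= MSR_AMD64_SYSCFG_SNP_VMPL_EN;

        wrmsrl(MSR_AMD64_SYSCFG, val);
}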
>
>> +
>> +val |= MSR_K8_SYSCFG_SNP_EN;
>> +val |= MSR_K8_SYSCFG_SNP_VMPL_EN;
>> +
>> +wrmsrl(MSR_K8_SYSCFG, val);
>> +}
>> +
>> +static __init int rmptable_init(void)
>> +{
>> +u64 rmp_base, rmp_end;
>> +unsigned long sz;
>> +void *start;
>> +u64 val;
>> +
>> +rdmsrl_safe(MSR_AMD64_RMP_BASE, &rmp_base);
>> +rdmsrl_safe(MSR_AMD64_RMP_END, &rmp_end);
> Ditto, why _safe if you're checking CPUID?
>
>> +
>> +if (!rmp_base || !rmp_end) {
>> +pr_info("SEV-SNP: Memory for the RMP table has not been 
>> reserved by BIOS\n");
>> +return 1;
>> +}
>> +
>> +sz = rmp_end - rmp_base + 1;
>> +
>> +start = memremap(rmp_base, sz, MEMREMAP_WB);
>> +if (!start) {
>> +pr_err("SEV-SNP: Failed to map RMP table 0x%llx-0x%llx\n", 
>> rmp_base, rmp_end);
>   ^^^
>
> That prefix is done by doing
>
> #undef pr_fmt
> #define pr_fmt(fmt) "SEV-SNP: " fmt
>
> before the SNP-specific functions.
Sure, I will use it.
>
>> +return 1;
>> +}
>> +
>> +/*
>> + * Check if SEV-SNP is already enabled, this can happen if we are 
>> coming from kexec boot.
>> + * Do not initialize the RMP table when SEV-SNP is already.
>> + */
> comment can be 80 cols wide.
Noted.
>
>> +rdmsrl_safe(MSR_K8_SYSCFG, &val);
> As above.
>
>> +if (val & MSR_K8_SYSCFG_SNP_EN)
>> +goto skip_enable;
>> +
>> +/* Initialize the RMP table to zero */
>> +memset(start, 0, sz);
>> +
>> +/* Flush the caches to ensure that data is written before we enable the 
>> SNP */
>> +wbinvd_on_all_cpus();
>> +
>> +/* Enable the SNP feature */
>> +on_each_cpu(snp_enable, NULL, 1);
> What happens if you boot only a subset of the CPUs and then others get
> hotplugged later? IOW, you need a CPU hotplug notifier which enables the
> feature bit on newly arrived CPUs.
>
> Which makes me wonder whether it makes sense to have this in an initcall
> and not put it instead in

Re: [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit

2021-04-12 Thread Brijesh Singh


On 4/12/21 8:05 AM, Borislav Petkov wrote:
> On Mon, Apr 12, 2021 at 07:55:01AM -0500, Brijesh Singh wrote:
>> The cur_entry is updated by the hypervisor. While building the psc
>> buffer the guest sets the cur_entry=0 and the end_entry point to the
>> last valid entry. The cur_entry is incremented by the hypervisor after
>> it successfully processes one 4K page. As per the spec, the hypervisor
>> could get interrupted in middle of the page state change and cur_entry
>> allows the guest to resume the page state change from the point where it
>> was interrupted.
> This is non-obvious and belongs in a comment above it. Otherwise it
> looks weird.


Sure, I will add the comment and provide reference to the GHCB section.

Thanks





Re: [RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit

2021-04-12 Thread Brijesh Singh


On 4/12/21 6:49 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:24AM -0500, Brijesh Singh wrote:
>> @@ -161,3 +162,108 @@ void __init early_snp_set_memory_shared(unsigned long 
>> vaddr, unsigned long paddr
>>   /* Ask hypervisor to make the memory shared in the RMP table. */
>>  early_snp_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
>>  }
>> +
>> +static int snp_page_state_vmgexit(struct ghcb *ghcb, struct 
>> snp_page_state_change *data)
> That function name definitely needs changing. The
> vmgexit_page_state_change() one too. They're currenty confusing as hell
> and I can't know what each one does without looking at its function
> body.
>
>> +{
>> +struct snp_page_state_header *hdr;
>> +int ret = 0;
>> +
>> +hdr = &data->header;
>> +
>> +/*
>> + * The hypervisor can return before processing all the entries, the 
>> loop below retries
>> + * until all the entries are processed.
>> + */
>> +while (hdr->cur_entry <= hdr->end_entry) {
> This doesn't make any sense: snp_set_page_state() builds a "set" of
> pages to change their state in a loop and this one iterates *again* over
> *something* which I'm not even clear on what.
>
> Is something setting cur_entry to end_entry eventually?
>
> In any case, why not issue those page state changes one-by-one in
> snp_set_page_state() or is it possible that HV can do a couple of
> them in one go so you have to poke it here until it sets cur_entry ==
> end_entry?


The cur_entry is updated by the hypervisor. While building the psc
buffer the guest sets cur_entry=0 and end_entry points to the
last valid entry. The cur_entry is incremented by the hypervisor after
it successfully processes one 4K page. As per the spec, the hypervisor
could get interrupted in the middle of the page state change and cur_entry
allows the guest to resume the page state change from the point where it
was interrupted.

>
>> +ghcb_set_sw_scratch(ghcb, (u64)__pa(data));


Since we can get interrupted while executing the PSC, just to be safe I
re-initialize the scratch area with our buffer instead of relying on old
values.


> Why do you have to call that here for every loop iteration...
>
>> +ret = vmgexit_page_state_change(ghcb, data);


As per the spec the caller must check that the cur_entry > end_entry to
determine whether all the entries are processed. If not then retry the
state change. The hypervisor will skip the previously processed entries.
The snp_page_state_vmgexit() is implemented to return only after all the
entries are changed.
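
So the intent is roughly the following (a condensed sketch of the logic,
with the spec-mandated retry made explicit in the comments; mapping a
non-zero exit_info_2 to -EIO is illustrative):

static int snp_page_state_vmgexit(struct ghcb *ghcb,
                                  struct snp_page_state_change *data)
{
        struct snp_page_state_header *hdr = &data->header;
        int ret;

        /*
         * The hypervisor advances hdr->cur_entry as it processes entries and
         * may return early (e.g. when interrupted). Keep re-issuing the
         * VMGEXIT until cur_entry > end_entry, i.e. everything is processed.
         */
        while (hdr->cur_entry <= hdr->end_entry) {
                /* Re-point the scratch area at our buffer on every iteration. */
                ghcb_set_sw_scratch(ghcb, (u64)__pa(data));

                ret = vmgexit_page_state_change(ghcb, data);

                /* The PSC VMGEXIT can pass an error code through exit_info_2. */
                if (ret || ghcb->save.sw_exit_info_2)
                        return ret ? ret : -EIO;
        }

        return 0;
}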


>  and in that function too?!
>
>> +/* Page State Change VMGEXIT can pass error code through 
>> exit_info_2. */
>> +if (ret || ghcb->save.sw_exit_info_2)
>> +break;
>> +}
>> +
>> +return ret;
> You don't need that ret variable - just return value directly.


Noted.

>
>> +}
>> +
>> +static void snp_set_page_state(unsigned long paddr, unsigned int npages, 
>> int op)
>> +{
>> +unsigned long paddr_end, paddr_next;
>> +struct snp_page_state_change *data;
>> +struct snp_page_state_header *hdr;
>> +struct snp_page_state_entry *e;
>> +struct ghcb_state state;
>> +struct ghcb *ghcb;
>> +int ret, idx;
>> +
>> +paddr = paddr & PAGE_MASK;
>> +paddr_end = paddr + (npages << PAGE_SHIFT);
>> +
>> +ghcb = sev_es_get_ghcb();
> That function can return NULL.


Ah good point. Will fix in next rev.

>
>> +data = (struct snp_page_state_change *)ghcb->shared_buffer;
>> +hdr = &data->header;
>> +e = &(data->entry[0]);
> So
>
>   e = data->entry;
>
> ?


Sure I can do that. It reads better that way.


>> +memset(data, 0, sizeof (*data));
>> +
>> +for (idx = 0; paddr < paddr_end; paddr = paddr_next) {
> As before, a while loop pls.


Noted.

>
>> +int level = PG_LEVEL_4K;
> Why does this needs to happen on each loop iteration? It looks to me you
> wanna do below:
>
>   e->pagesize = X86_RMP_PG_LEVEL(PG_LEVEL_4K);
>
> instead.


Noted. I will remove the local variable.


>> +
>> +/* If we cannot fit more request then issue VMGEXIT before 
>> going further.  */
>  any more requests
>
> No "we" pls.


Noted.

>
>> +if (hdr->end_entry == (SNP_PAGE_STATE_CHANGE_MAX_ENTRY - 1)) {
>> +ret = snp_page_state_vmgexit(ghcb, data);
>> +

Re: [RFC Part1 PATCH 11/13] x86/kernel: validate rom memory before accessing when SEV-SNP is active

2021-04-09 Thread Brijesh Singh


On 4/9/21 11:53 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:22AM -0500, Brijesh Singh wrote:
>> +/*
>> + * The ROM memory is not part of the E820 system RAM and is not 
>> prevalidated by the BIOS.
>> + * The kernel page table maps the ROM region as encrypted memory, the 
>> SEV-SNP requires
>> + * the all the encrypted memory must be validated before the access.
>> + */
>> +if (sev_snp_active()) {
>> +unsigned long n, paddr;
>> +
>> +n = ((system_rom_resource.end + 1) - video_rom_resource.start) 
>> >> PAGE_SHIFT;
>> +paddr = video_rom_resource.start;
>> +early_snp_set_memory_private((unsigned long)__va(paddr), paddr, 
>> n);
>> +}
> I don't like this sprinkling of SNP-special stuff that needs to be done,
> around the tree. Instead, pls define a function called
>
>   snp_prep_memory(unsigned long pa, unsigned int num_pages, enum 
> operation);
>
> or so which does all the manipulation needed and the callsites only
> simply unconditionally call that function so that all detail is
> extracted and optimized away when not config-enabled.


Sure, I will do this in the next rev.


>
> Thx.
>


Re: [PATCH] KVM: SVM: Add support for KVM_SEV_SEND_CANCEL command

2021-04-08 Thread Brijesh Singh


On 4/1/21 8:44 PM, Steve Rutherford wrote:
> After completion of SEND_START, but before SEND_FINISH, the source VMM can
> issue the SEND_CANCEL command to stop a migration. This is necessary so
> that a cancelled migration can restart with a new target later.
>
> Signed-off-by: Steve Rutherford 
> ---
>  .../virt/kvm/amd-memory-encryption.rst|  9 +++
>  arch/x86/kvm/svm/sev.c| 24 +++
>  include/linux/psp-sev.h   | 10 
>  include/uapi/linux/kvm.h  |  2 ++
>  4 files changed, 45 insertions(+)


Can we add a new case statement in sev_cmd_buffer_len()
[drivers/crypto/ccp/sev-dev.c] for this command? I understand that the
command just contains the handle. I have found dyndbg very helpful; if
the command is not added to sev_cmd_buffer_len() then we don't dump
the command buffer.
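
Something along these lines in sev_cmd_buffer_len() (assuming the data
structure added by this patch is named struct sev_data_send_cancel):

        case SEV_CMD_SEND_START:        return sizeof(struct sev_data_send_start);
        case SEV_CMD_SEND_CANCEL:       return sizeof(struct sev_data_send_cancel);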

With that fixed.

Reviewed-by: Brijesh Singh 




Re: [RFC Part1 PATCH 09/13] x86/kernel: add support to validate memory in early enc attribute change

2021-04-08 Thread Brijesh Singh


On 4/8/21 6:40 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:20AM -0500, Brijesh Singh wrote:
>> @@ -63,6 +63,10 @@ struct __packed snp_page_state_change {
>>  #define GHCB_REGISTER_GPA_RESP  0x013UL
>>  #define GHCB_REGISTER_GPA_RESP_VAL(val) ((val) >> 12)
>>  
>> +/* Macro to convert the x86 page level to the RMP level and vice versa */
>> +#define X86_RMP_PG_LEVEL(level) (((level) == PG_LEVEL_4K) ? 
>> RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
>> +#define RMP_X86_PG_LEVEL(level) (((level) == RMP_PG_SIZE_4K) ? 
>> PG_LEVEL_4K : PG_LEVEL_2M)
> Please add those with the patch which uses them for the first time.
>
> Also, it seems to me the names should be
>
> X86_TO_RMP_PG_LEVEL
> RMP_TO_X86_PG_LEVEL
>
> ...

Noted.

>> @@ -56,3 +56,108 @@ void sev_snp_register_ghcb(unsigned long paddr)
>>  /* Restore the GHCB MSR value */
>>  sev_es_wr_ghcb_msr(old);
>>  }
>> +
>> +static void sev_snp_issue_pvalidate(unsigned long vaddr, unsigned int 
>> npages, bool validate)
> pvalidate_pages() I guess.

Noted.

>
>> +{
>> +unsigned long eflags, vaddr_end, vaddr_next;
>> +int rc;
>> +
>> +vaddr = vaddr & PAGE_MASK;
>> +vaddr_end = vaddr + (npages << PAGE_SHIFT);
>> +
>> +for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> Yuck, that vaddr_next gets initialized at the end of the loop. How about
> using a while loop here instead?
>
>   while (vaddr < vaddr_end) {
>
>   ...
>
>   vaddr += PAGE_SIZE;
>   }
>
> then you don't need vaddr_next at all. Ditto for all the other loops in
> this patch which iterate over pages.
Yes, I will switch to using a while loop in the next rev.
>
>> +rc = __pvalidate(vaddr, RMP_PG_SIZE_4K, validate, &eflags);
> So this function gets only 4K pages to pvalidate?

The early routines use the GHCB MSR protocol for the validation. The
GHCB MSR protocol supports 4K pages only. The early routines can be called
before the GHCB is established.


>
>> +
> ^ Superfluous newline.
Noted.
>> +if (rc) {
>> +pr_err("Failed to validate address 0x%lx ret %d\n", 
>> vaddr, rc);
> You can combine the pr_err and dump_stack() below into a WARN() here:
>
>   WARN(rc, ...);
Noted.
>> +goto e_fail;
>> +}
>> +
>> +/* Check for the double validation condition */
>> +if (eflags & X86_EFLAGS_CF) {
>> +pr_err("Double %salidation detected (address 0x%lx)\n",
>> +validate ? "v" : "inv", vaddr);
>> +goto e_fail;
>> +}
> As before - this should be communicated by a special retval from
> __pvalidate().
Yes.
>
>> +
>> +vaddr_next = vaddr + PAGE_SIZE;
>> +}
>> +
>> +return;
>> +
>> +e_fail:
>> +/* Dump stack for the debugging purpose */
>> +dump_stack();
>> +
>> +/* Ask to terminate the guest */
>> +sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> Another termination reason to #define.
>
>> +}
>> +
>> +static void __init early_snp_set_page_state(unsigned long paddr, unsigned 
>> int npages, int op)
>> +{
>> +unsigned long paddr_end, paddr_next;
>> +u64 old, val;
>> +
>> +paddr = paddr & PAGE_MASK;
>> +paddr_end = paddr + (npages << PAGE_SHIFT);
>> +
>> +/* save the old GHCB MSR */
>> +old = sev_es_rd_ghcb_msr();
>> +
>> +for (; paddr < paddr_end; paddr = paddr_next) {
>> +
>> +/*
>> + * Use the MSR protocol VMGEXIT to request the page state 
>> change. We use the MSR
>> + * protocol VMGEXIT because in early boot we may not have the 
>> full GHCB setup
>> + * yet.
>> + */
>> +sev_es_wr_ghcb_msr(GHCB_SNP_PAGE_STATE_REQ_GFN(paddr >> 
>> PAGE_SHIFT, op));
>> +VMGEXIT();
> Yeah, I know we don't always strictly adhere to 80 columns but there's
> no real need not to fit that in 80 cols here so please shorten names and
> comments. Ditto for the rest.
Noted.
>
>> +
>> +val = sev_es_rd_ghcb_msr();
>> +
>> +/* Read the response, if the page state change failed then 
>> terminate the guest. */
>> +if (GHCB_SEV_GHCB_RESP_CODE(val) != 
>> GHCB_SNP_PAGE_STATE_CHANGE_RESP)
>> +   

Re: [RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active

2021-04-07 Thread Brijesh Singh


On 4/7/21 6:59 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:18AM -0500, Brijesh Singh wrote:
>> The SEV-SNP guest is required to perform GHCB GPA registration. This is
> Why does it need to do that? Some additional security so as to not allow
> changing the GHCB once it is established?
>
> I'm guessing that's enforced by the SNP fw and we cannot do that
> retroactively for SEV...? Because it sounds like a nice little thing we
> could do additionally.

The feature is part of GHCB version 2 and is enforced by the
hypervisor. I guess it can be extended for ES. Since this feature
was not available in GHCB version 1 (base ES), should it be presented
as optional for ES?


>
>> because the hypervisor may prefer that a guest use a consistent and/or
>> specific GPA for the GHCB associated with a vCPU. For more information,
>> see the GHCB specification section 2.5.2.
> I think you mean
>
> "2.3.2 GHCB GPA Registration"
>
> Please use the section name too because that doc changes from time to
> time.
>
> Also, you probably should update it here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=206537
>

Yes, the section may have changed since I wrote the description. Noted.
I will refer to the section name.


>> diff --git a/arch/x86/boot/compressed/sev-snp.c 
>> b/arch/x86/boot/compressed/sev-snp.c
>> index 5c25103b0df1..a4c5e85699a7 100644
>> --- a/arch/x86/boot/compressed/sev-snp.c
>> +++ b/arch/x86/boot/compressed/sev-snp.c
>> @@ -113,3 +113,29 @@ void sev_snp_set_page_shared(unsigned long paddr)
>>  {
>>  sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
>>  }
>> +
>> +void sev_snp_register_ghcb(unsigned long paddr)
> Right and let's prefix SNP-specific functions with "snp_" only so that
> it is clear which is wcich when looking at the code.
>
>> +{
>> +u64 pfn = paddr >> PAGE_SHIFT;
>> +u64 old, val;
>> +
>> +if (!sev_snp_enabled())
>> +return;
>> +
>> +/* save the old GHCB MSR */
>> +old = sev_es_rd_ghcb_msr();
>> +
>> +/* Issue VMGEXIT */
> No need for that comment.
>
>> +sev_es_wr_ghcb_msr(GHCB_REGISTER_GPA_REQ_VAL(pfn));
>> +VMGEXIT();
>> +
>> +val = sev_es_rd_ghcb_msr();
>> +
>> +/* If the response GPA is not ours then abort the guest */
>> +if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_REGISTER_GPA_RESP) ||
>> +(GHCB_REGISTER_GPA_RESP_VAL(val) != pfn))
>> +sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
> Yet another example where using a specific termination reason could help
> with debugging guests. Looking at the GHCB spec, I hope GHCBData[23:16]
> is big enough for all reasons. I'm sure it can be extended ofc ...


Maybe we can request that GHCB version 3 add an extended error code.


> :-)
>
>> +/* Restore the GHCB MSR value */
>> +sev_es_wr_ghcb_msr(old);
>> +}
>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>> index f514dad276f2..0523eb21abd7 100644
>> --- a/arch/x86/include/asm/sev-snp.h
>> +++ b/arch/x86/include/asm/sev-snp.h
>> @@ -56,6 +56,13 @@ struct __packed snp_page_state_change {
>>  struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
>>  };
>>  
>> +/* GHCB GPA register */
>> +#define GHCB_REGISTER_GPA_REQ   0x012UL
>> +#define GHCB_REGISTER_GPA_REQ_VAL(v)
>> (GHCB_REGISTER_GPA_REQ | ((v) << 12))
>> +
>> +#define GHCB_REGISTER_GPA_RESP  0x013UL
> Let's append "UL" to the other request numbers for consistency.
>
> Thx.
>


Re: [PATCH v2 0/8] ccp: KVM: SVM: Use stack for SEV command buffers

2021-04-07 Thread Brijesh Singh


On 4/6/21 5:49 PM, Sean Christopherson wrote:
> This series teaches __sev_do_cmd_locked() to gracefully handle vmalloc'd
> command buffers by copying _all_ incoming data pointers to an internal
> buffer before sending the command to the PSP.  The SEV driver and KVM are
> then converted to use the stack for all command buffers.
>
> Tested everything except sev_ioctl_do_pek_import(), I don't know anywhere
> near enough about the PSP to give it the right input.
>
> v2:
>   - Rebase to kvm/queue, commit f96be2deac9b ("KVM: x86: Support KVM VMs
> sharing SEV context").
>   - Unconditionally copy @data to the internal buffer. [Christophe, Brijesh]
>   - Allocate a full page for the buffer. [Brijesh]
>   - Drop one set of the "!"s. [Christophe]
>   - Use virt_addr_valid() instead of is_vmalloc_addr() for the temporary
> patch (definitely feel free to drop the patch if it's not worth
> backporting). [Christophe]
>   - s/intput/input/. [Tom]
>   - Add a patch to free "sev" if init fails.  This is not strictly
> necessary (I think; I suck horribly when it comes to the driver
> framework).   But it felt wrong to not free cmd_buf on failure, and
> even more wrong to free cmd_buf but not sev.
>
> v1:
>   - 
> https://lkml.kernel.org/r/20210402233702.3291792-1-seanjc@google.com
>
> Sean Christopherson (8):
>   crypto: ccp: Free SEV device if SEV init fails
>   crypto: ccp: Detect and reject "invalid" addresses destined for PSP
>   crypto: ccp: Reject SEV commands with mismatching command buffer
>   crypto: ccp: Play nice with vmalloc'd memory for SEV command structs
>   crypto: ccp: Use the stack for small SEV command buffers
>   crypto: ccp: Use the stack and common buffer for status commands
>   crypto: ccp: Use the stack and common buffer for INIT command
>   KVM: SVM: Allocate SEV command structures on local stack
>
>  arch/x86/kvm/svm/sev.c   | 262 +--
>  drivers/crypto/ccp/sev-dev.c | 197 +++++-
>  drivers/crypto/ccp/sev-dev.h |   4 +-
>  3 files changed, 196 insertions(+), 267 deletions(-)
>

Thanks Sean.

Reviewed-by: Brijesh Singh 




Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB

2021-04-07 Thread Brijesh Singh


On 4/7/21 9:21 AM, Tom Lendacky wrote:
> On 4/7/21 8:35 AM, Brijesh Singh wrote:
>> On 4/7/21 6:16 AM, Borislav Petkov wrote:
>>> On Tue, Apr 06, 2021 at 10:47:18AM -0500, Brijesh Singh wrote:
>>>> Before the GHCB is established the caller does not need to save and
>>>> restore MSRs. The page_state_change() uses the GHCB MSR protocol and it
>>>> can be called before and after the GHCB is established hence I am saving
>>>> and restoring GHCB MSRs.
>>> I think you need to elaborate on that, maybe with an example. What the
>>> other sites using the GHCB MSR currently do is:
>>>
>>> 1. request by writing it
>>> 2. read the response
>>>
>>> None of them save and restore it.
>>>
>>> So why here?
>> GHCB provides two ways to exit from the guest to the hypervisor: the MSR
>> protocol and NAEs. The MSR protocol is generally used before the GHCB is
>> established. After the GHCB is established the guest typically uses the
>> NAEs. All of the current call sites use the MSR protocol before the
>> GHCB is established so they do not need to save and restore the GHCB.
>> The GHCB is established on the first #VC -
>> arch/x86/boot/compressed/sev-es.c early_setup_sev_es(). The GHCB page
>> must be a shared page:
>>
>> early_setup_sev_es()
>>
>>   set_page_decrypted()
>>
>>    sev_snp_set_page_shared()
>>
>> The sev_snp_set_page_shared() is called before the GHCB is established.
>> While exiting from the decompression the sev_es_shutdown_ghcb() is
>> called to deinit the GHCB.
>>
>> sev_es_shutdown_ghcb()
>>
>>   set_page_encrypted()
>>
>>     sev_snp_set_page_private()
>>
>> So sev_snp_set_page_private() is called after the GHCB is established.
> I believe the current SEV-ES code always sets the GHCB address in the GHCB
> MSR before invoking VMGEXIT, so I think you're safe either way. Worth
> testing at least.


Ah, I didn't realize that the sev_es_ghcb_hv_call() helper sets the GHCB
MSR before invoking VMGEXIT. I should be able to drop the save and
restore during the page state change. Thanks Tom.




Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB

2021-04-07 Thread Brijesh Singh


On 4/7/21 6:16 AM, Borislav Petkov wrote:
> On Tue, Apr 06, 2021 at 10:47:18AM -0500, Brijesh Singh wrote:
>> Before the GHCB is established the caller does not need to save and
>> restore MSRs. The page_state_change() uses the GHCB MSR protocol and it
>> can be called before and after the GHCB is established hence I am saving
>> and restoring GHCB MSRs.
> I think you need to elaborate on that, maybe with an example. What the
> other sites using the GHCB MSR currently do is:
>
> 1. request by writing it
> 2. read the response
>
> None of them save and restore it.
>
> So why here?

GHCB provides two ways to exit from the guest to the hypervisor: the MSR
protocol and NAEs. The MSR protocol is generally used before the GHCB is
established. After the GHCB is established the guest typically uses the
NAEs. All of the current call sites use the MSR protocol before the
GHCB is established so they do not need to save and restore the GHCB.
The GHCB is established on the first #VC -
arch/x86/boot/compressed/sev-es.c early_setup_sev_es(). The GHCB page
must be a shared page:

early_setup_sev_es()

  set_page_decrypted()

   sev_snp_set_page_shared()

The sev_snp_set_page_shared() is called before the GHCB is established.
While exiting from the decompression the sev_es_shutdown_ghcb() is
called to deinit the GHCB.

sev_es_shutdown_ghcb()

  set_page_encrypted()

    sev_snp_set_page_private()

So sev_snp_set_page_private() is called after the GHCB is established.

Since both sev_snp_set_page_{shared, private}() use a common
routine to request the page change, I chose the Page State Change
MSR protocol. The page state request can thus happen both before and after
the GHCB is established. We need to save and restore the GHCB MSR,
otherwise we will lose the previously established GHCB GPA.

If needed then we can avoid the save and restore. The GHCB provides a
page state change NAE event that can be used after the GHCB is established.
If we go with it then the flow may look like this (see the sketch below):

1. Read the GHCB MSR to determine whether the GHCB is established.

2. If the GHCB is established then use the page state change NAE event.

3. If the GHCB is not established then use the page state change MSR protocol.

We can eliminate the restore but we still need the rdmsr. The code for
using the NAE page state change is going to be a bit larger. Since it is not
in the hot path, I felt we should stick with the MSR protocol for the page
state change.
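
As a sketch, the dispatch would look roughly like this (all helper names
below are hypothetical, just to illustrate the flow):

static void snp_set_page_state(unsigned long paddr, unsigned int npages, int op)
{
        /*
         * Hypothetical helper: reads the GHCB MSR and returns true once a
         * GHCB GPA has been registered/established.
         */
        if (snp_ghcb_established())
                /* Use the Page State Change NAE event via the shared GHCB. */
                snp_page_state_change_nae(paddr, npages, op);
        else
                /* No GHCB yet: fall back to the GHCB MSR protocol, 4K at a time. */
                snp_page_state_change_msr_proto(paddr, npages, op);
}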

I am open to suggestions. 

-Brijesh



Re: [RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB

2021-04-06 Thread Brijesh Singh


On 4/6/21 5:33 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:17AM -0500, Brijesh Singh wrote:
>> Many of the integrity guarantees of SEV-SNP are enforced through the
>> Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
>> particular page of DRAM should be mapped. The VMs can request the
>> hypervisor to add pages in the RMP table via the Page State Change VMGEXIT
>> defined in the GHCB specification section 2.5.1 and 4.1.6. Inside each RMP
>> entry is a Validated flag; this flag is automatically cleared to 0 by the
>> CPU hardware when a new RMP entry is created for a guest. Each VM page
>> can be either validated or invalidated, as indicated by the Validated
>> flag in the RMP entry. Memory access to a private page that is not
>> validated generates a #VC. A VM can use PVALIDATE instruction to validate
>> the private page before using it.
> I guess this should say "A VM must use the PVALIDATE insn to validate
> that private page before using it." Otherwise it can't use it, right.
> Thus the "must" and not "can".


Noted, I should have used "must".

>
>> To maintain the security guarantee of SEV-SNP guests, when transitioning
>> a memory from private to shared, the guest must invalidate the memory range
>> before asking the hypervisor to change the page state to shared in the RMP
>> table.
> So first you talk about memory pages, now about memory range...
>
>> After the page is mapped private in the page table, the guest must issue a
> ... and now about pages again. Let's talk pages only pls.


Noted, I will stick to memory pages. Thanks.


>
>> page state change VMGEXIT to make the memory private in the RMP table and
>> validate it. If the memory is not validated after its added in the RMP table
>> as private, then a VC exception (page-not-validated) will be raised.
> Didn't you just say this already above?


Yes, I said it at the start of the commit message; I will work to avoid
the repetition.


>> We do
> Who's "we"?
>
>> not support the page-not-validated exception yet, so it will crash the guest.
>>
>> On boot, BIOS should have validated the entire system memory. During
>> the kernel decompression stage, the VC handler uses the
>> set_memory_decrypted() to make the GHCB page shared (i.e clear encryption
>> attribute). And while exiting from the decompression, it calls the
>> set_memory_encyrpted() to make the page private.
> Hmm, that commit message needs reorganizing, from
> Documentation/process/submitting-patches.rst:
>
>  "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
>   instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
>   to do frotz", as if you are giving orders to the codebase to change
>   its behaviour."
>
> So this should say something along the lines of "Add helpers for validating
> pages in the decompression stage" or so.

I will improve the commit message to avoid using "we" or "I".
"Add helpers" looks good, thanks.


>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: Joerg Roedel 
>> Cc: "H. Peter Anvin" 
>> Cc: Tony Luck 
>> Cc: Dave Hansen 
>> Cc: "Peter Zijlstra (Intel)" 
>> Cc: Paolo Bonzini 
>> Cc: Tom Lendacky 
>> Cc: David Rientjes 
>> Cc: Sean Christopherson 
>> Cc: x...@kernel.org
>> Cc: k...@vger.kernel.org
> Btw, you don't really need to add those CCs to the patch - it is enough
> if you Cc the folks when you send the patches with git.


Noted.

>
>> Signed-off-by: Brijesh Singh 
>> ---
>>  arch/x86/boot/compressed/Makefile   |   1 +
>>  arch/x86/boot/compressed/ident_map_64.c |  18 
>>  arch/x86/boot/compressed/sev-snp.c  | 115 
>>  arch/x86/boot/compressed/sev-snp.h  |  25 ++
>>  4 files changed, 159 insertions(+)
>>  create mode 100644 arch/x86/boot/compressed/sev-snp.c
>>  create mode 100644 arch/x86/boot/compressed/sev-snp.h
>>
>> diff --git a/arch/x86/boot/compressed/Makefile 
>> b/arch/x86/boot/compressed/Makefile
>> index e0bc3988c3fa..4d422aae8a86 100644
>> --- a/arch/x86/boot/compressed/Makefile
>> +++ b/arch/x86/boot/compressed/Makefile
>> @@ -93,6 +93,7 @@ ifdef CONFIG_X86_64
>>  vmlinux-objs-y += $(obj)/mem_encrypt.o
>>  vmlinux-objs-y += $(obj)/pgtable_64.o
>>  vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o
>> +vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-snp.o
> Yeah, as before, make that a single sev.o and p

Re: [PATCH 3/5] crypto: ccp: Play nice with vmalloc'd memory for SEV command structs

2021-04-05 Thread Brijesh Singh


On 4/5/21 10:06 AM, Sean Christopherson wrote:
> On Sun, Apr 04, 2021, Christophe Leroy wrote:
>> Le 03/04/2021 à 01:37, Sean Christopherson a écrit :
>>> @@ -152,11 +153,21 @@ static int __sev_do_cmd_locked(int cmd, void *data, 
>>> int *psp_ret)
>>> sev = psp->sev_data;
>>> buf_len = sev_cmd_buffer_len(cmd);
>>> -   if (WARN_ON_ONCE(!!data != !!buf_len))
>>> +   if (WARN_ON_ONCE(!!__data != !!buf_len))
>>> return -EINVAL;
>>> -   if (WARN_ON_ONCE(data && is_vmalloc_addr(data)))
>>> -   return -EINVAL;
>>> +   if (__data && is_vmalloc_addr(__data)) {
>>> +   /*
>>> +* If the incoming buffer is virtually allocated, copy it to
>>> +* the driver's scratch buffer as __pa() will not work for such
>>> +* addresses, vmalloc_to_page() is not guaranteed to succeed,
>>> +* and vmalloc'd data may not be physically contiguous.
>>> +*/
>>> +   data = sev->cmd_buf;
>>> +   memcpy(data, __data, buf_len);
>>> +   } else {
>>> +   data = __data;
>>> +   }
>> I don't know how big commands are, but if they are small, it would probably
>> be more efficient to inconditionnally copy them to the buffer rather then
>> doing the test.
> Brijesh, I assume SNP support will need to copy the commands unconditionally? 
> If
> yes, it probably makes sense to do so now and avoid vmalloc dependencies
> completely.  And I think that would allow for the removal of status_cmd_buf 
> and
> init_cmd_buf, or is there another reason those dedicated buffers exist?


Yes, we need to copy the commands unconditionally for the SNP support.
It makes sense to avoid the vmalloc dependencies. I can't think of any
reason why we would need the status_cmd_buf and init_cmd_buf after those
changes.
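
For the unconditional copy, I am thinking of something along the lines of
the sketch below (a fragment based on your quoted hunk, with the vmalloc
check dropped):

        buf_len = sev_cmd_buffer_len(cmd);
        if (WARN_ON_ONCE(!!data != !!buf_len))
                return -EINVAL;

        /*
         * Always stage the caller's data in the driver's own buffer: __pa()
         * is then always valid, and (later, for SNP) the page backing
         * cmd_buf can be transitioned to the firmware-owned state before
         * issuing the command.
         */
        if (data) {
                memcpy(sev->cmd_buf, data, buf_len);
                data = sev->cmd_buf;
        }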




Re: [PATCH 0/5] ccp: KVM: SVM: Use stack for SEV command buffers

2021-04-04 Thread Brijesh Singh
Hi Sean,

On 4/2/21 6:36 PM, Sean Christopherson wrote:
> While doing minor KVM cleanup to account various kernel allocations, I
> noticed that all of the SEV command buffers are allocated via kmalloc(),
> even for commands whose payloads is smaller than a pointer.  After much
> head scratching, the only reason I could come up with for dynamically
> allocating the command data is CONFIG_VMAP_STACK=y.
>
> This series teaches __sev_do_cmd_locked() to gracefully handle vmalloc'd
> command buffers by copying such buffers an internal buffer before sending
> the command to the PSP.  The SEV driver and KVM are then converted to use
> the stack for all command buffers.

Thanks for the series. Post the SNP series, I was going to move all the
command buffer allocations to the stack. You are ahead of me :). I can
certainly build upon your series.

The behavior of the SEV legacy commands changes when the SNP firmware is
in the INIT state: any buffer that the firmware writes to for a legacy
command must be in the firmware-owned state before the command is issued.
One of my patches in the SNP series uses an internal memory buffer before
sending the command to the PSP.

Looking forward to the SNP support, may I ask you to remove the
vmalloc'd buffer check and use a page for the internal buffer? In the SNP
series, I can simply transition the internal page to the firmware state
before issuing the command.


> The first patch is optional, I included it in case someone wants to
> backport it to stable kernels.  It wouldn't actually fix bugs, but it
> would make debugging issues a lot easier if they did pop up.
>
> Tested everything except sev_ioctl_do_pek_import(), I don't know anywhere
> near enough about the PSP to give it the right input.
>
> Based on kvm/queue, commit f96be2deac9b ("KVM: x86: Support KVM VMs
> sharing SEV context") to avoid a minor conflict.
>
> Sean Christopherson (5):
>   crypto: ccp: Detect and reject vmalloc addresses destined for PSP
>   crypto: ccp: Reject SEV commands with mismatching command buffer
>   crypto: ccp: Play nice with vmalloc'd memory for SEV command structs
>   crypto: ccp: Use the stack for small SEV command buffers
>   KVM: SVM: Allocate SEV command structures on local stack
>
>  arch/x86/kvm/svm/sev.c   | 262 +--
>  drivers/crypto/ccp/sev-dev.c | 161 ++---
>  drivers/crypto/ccp/sev-dev.h |   7 +
>  3 files changed, 184 insertions(+), 246 deletions(-)
>


Re: [RFC Part1 PATCH 05/13] X86/sev-es: move few helper functions in common file

2021-04-02 Thread Brijesh Singh


On 4/2/21 2:27 PM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:16AM -0500, Brijesh Singh wrote:
>> The sev_es_terminate() and sev_es_{wr,rd}_ghcb_msr() helper functions
>> in a common file so that it can be used by both the SEV-ES and SEV-SNP.
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: Joerg Roedel 
>> Cc: "H. Peter Anvin" 
>> Cc: Tony Luck 
>> Cc: Dave Hansen 
>> Cc: "Peter Zijlstra (Intel)" 
>> Cc: Paolo Bonzini 
>> Cc: Tom Lendacky 
>> Cc: David Rientjes 
>> Cc: Sean Christopherson 
>> Cc: x...@kernel.org
>> Cc: k...@vger.kernel.org
>> Signed-off-by: Brijesh Singh 
>> ---
>>  arch/x86/boot/compressed/sev-common.c | 32 +++
>>  arch/x86/boot/compressed/sev-es.c | 22 ++
>>  arch/x86/kernel/sev-common-shared.c   | 31 ++
>>  arch/x86/kernel/sev-es-shared.c   | 21 +++---
>>  4 files changed, 68 insertions(+), 38 deletions(-)
>>  create mode 100644 arch/x86/boot/compressed/sev-common.c
>>  create mode 100644 arch/x86/kernel/sev-common-shared.c
> Yeah, once you merge it all into sev.c and sev-shared.c, that patch is
> not needed anymore.


Agreed. Renaming sev-es.{c,h} -> sev.{c,h} will certainly help.
Additionally, I noticed that the GHCB MSR helper macros are duplicated
between arch/x86/include/asm/sev-es.h and arch/x86/kvm/svm/svm.h. I
am creating a new file (arch/x86/include/asm/sev-common.h) that will
consolidate all the helper macros common between the guest and the
hypervisor.

>
> Thx.
>


Re: [RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure

2021-04-01 Thread Brijesh Singh


On 4/1/21 5:32 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:15AM -0500, Brijesh Singh wrote:
>> An SNP-active guest will use the page state change VNAE MGEXIT defined in
> I guess this was supposed to mean "NAE VMGEXIT" but pls write "NAE" out
> at least once so that reader can find its way around the spec.


Noted. I will fix in next rev.

>> the GHCB specification section 4.1.6 to ask the hypervisor to make the
>> guest page private or shared in the RMP table. In addition to the
>> private/shared, the guest can also ask the hypervisor to split or
>> combine multiple 4K validated pages as a single 2M page or vice versa.
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: Joerg Roedel 
>> Cc: "H. Peter Anvin" 
>> Cc: Tony Luck 
>> Cc: Dave Hansen 
>> Cc: "Peter Zijlstra (Intel)" 
>> Cc: Paolo Bonzini 
>> Cc: Tom Lendacky 
>> Cc: David Rientjes 
>> Cc: Sean Christopherson 
>> Cc: x...@kernel.org
>> Cc: k...@vger.kernel.org
>> Signed-off-by: Brijesh Singh 
>> ---
>>  arch/x86/include/asm/sev-snp.h  | 34 +
>>  arch/x86/include/uapi/asm/svm.h |  1 +
>>  2 files changed, 35 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>> index 5a6d1367cab7..f514dad276f2 100644
>> --- a/arch/x86/include/asm/sev-snp.h
>> +++ b/arch/x86/include/asm/sev-snp.h
>> @@ -22,6 +22,40 @@
>>  #define RMP_PG_SIZE_2M  1
>>  #define RMP_PG_SIZE_4K  0
>>  
>> +/* Page State Change MSR Protocol */
>> +#define GHCB_SNP_PAGE_STATE_CHANGE_REQ  0x0014
>> +#define GHCB_SNP_PAGE_STATE_REQ_GFN(v, o)   
>> (GHCB_SNP_PAGE_STATE_CHANGE_REQ | \
>> + ((unsigned long)((o) & 
>> 0xf) << 52) | \
>> + (((v) << 12) & 
>> 0xff))
> This macro needs to be more readable and I'm not sure the masking is
> correct. IOW, something like this perhaps:
>
> #define GHCB_SNP_PAGE_STATE_REQ_GFN(va, operation)\
>   operation) & 0xf) << 52) | ((va) & GENMASK_ULL(51, 12)) | 
> GHCB_SNP_PAGE_STATE_CHANGE_REQ)


I guess I was trying to keep it consistent with the sev-es.h macro
definitions, in which the command comes before the fields. In the next
version, I will use msb-to-lsb ordering.


>
> where you have each GHCBData element at the proper place, msb to lsb.
> Now, GHCB spec says:
>
>   "GHCBData[51:12] – Guest physical frame number"
>
> and I'm not clear as to what this macro takes: a virtual address or a
> pfn. If it is a pfn, then you need to do:
>
>   (((pfn) << 12) & GENMASK_ULL(51, 0))
>
> but if it is a virtual address you need to do what I have above. Since
> you do "<< 12" then it must be a pfn already but then you should call
> the argument "pfn" so that it is clear what it takes.


Yes, the macro takes the pfn.

> Your mask above covers [55:0] but [55:52] is the page operation so the
> high bit in that mask needs to be 51.

Ack. I will limit the mask to bit 51 so that we don't go outside the pfn
field width. Thank you for pointing it out.
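
i.e. roughly the following, where GENMASK_ULL keeps the GFN confined to
bits 51:12:

#define GHCB_SNP_PAGE_STATE_CHANGE_REQ  0x0014UL

/* GHCBData[55:52] = operation, [51:12] = GFN, [11:0] = request code */
#define GHCB_SNP_PAGE_STATE_REQ_GFN(pfn, op)                            \
        ((((unsigned long)(op) & 0xf) << 52) |                          \
         (((unsigned long)(pfn) << 12) & GENMASK_ULL(51, 12)) |         \
         GHCB_SNP_PAGE_STATE_CHANGE_REQ)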


> AFAICT, ofc.
>
>> +#define SNP_PAGE_STATE_PRIVATE  1
>> +#define SNP_PAGE_STATE_SHARED   2
>> +#define SNP_PAGE_STATE_PSMASH   3
>> +#define SNP_PAGE_STATE_UNSMASH  4
>> +
>> +#define GHCB_SNP_PAGE_STATE_CHANGE_RESP 0x0015
>> +#define GHCB_SNP_PAGE_STATE_RESP_VAL(val)   ((val) >> 32)
> 
>
> Some stray tabs here and above, pls pay attention to vertical alignment too.


I noticed that sev-es.h uses tabs when defining the macros that build
commands. Another example where I tried to keep a bit of consistency with
sev-es.h. I will drop it in the next rev.


>
>> +
>> +/* Page State Change NAE event */
>> +#define SNP_PAGE_STATE_CHANGE_MAX_ENTRY 253
>> +struct __packed snp_page_state_header {
>> +uint16_t cur_entry;
>> +uint16_t end_entry;
>> +uint32_t reserved;
>> +};
>> +
>> +struct __packed snp_page_state_entry {
>> +uint64_t cur_page:12;
>> +uint64_t gfn:40;
>> +uint64_t operation:4;
>> +uint64_t pagesize:1;
>> +uint64_t reserved:7;
>> +};
> Any particular reason for the uint_t types or can you use our
> shorter u types?

IIRC, the spec structure has uint_t, so I used it as-is. No
strong reason f

Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction

2021-03-26 Thread Brijesh Singh


On 3/26/21 2:12 PM, Borislav Petkov wrote:
> On Fri, Mar 26, 2021 at 01:22:24PM -0500, Brijesh Singh wrote:
>> Should I do the same for the sev-es.c ? Currently, I am keeping all the
>> SEV-SNP specific changes in sev-snp.{c,h}. After a rename of
>> sev-es.{c,h} from both the arch/x86/kernel and arch-x86/boot/compressed
>> I can add the SNP specific stuff to it.
>>
>> Thoughts ?
> SNP depends on the whole functionality in SEV-ES, right? Which means,
> SNP will need all the functionality of sev-es.c.

Yes, SEV-SNP needs the whole SEV-ES functionality. I will add a
pre-patch to rename sev-es to sev, then add the SNP changes in sev.c.

thanks

>
> But sev-es.c is a lot more code than the header and snp is
>
>  arch/x86/kernel/sev-snp.c   | 269 
>
> oh well, not so much.
>
> I guess a single
>
> arch/x86/kernel/sev.c
>
> is probably ok.
>
> We can always do arch/x86/kernel/sev/ later and split stuff then when it
> starts getting real fat and impacts complication times.
>
> Btw, there's also arch/x86/kernel/sev-es-shared.c and that can be
>
> arch/x86/kernel/sev-shared.c
>
> then.
>
> Thx.
>


Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction

2021-03-26 Thread Brijesh Singh


On 3/26/21 2:22 PM, Borislav Petkov wrote:
> On Fri, Mar 26, 2021 at 10:42:56AM -0500, Brijesh Singh wrote:
>> There is no strong reason for a separate sev-snp.h. I will add a
>> pre-patch to rename sev-es.h to sev.h and add SNP stuff to it.
> Thx.
>
>> I was trying to adhere to existing functions which uses a direct
>> instruction opcode.
> That's not really always the case:
>
> arch/x86/include/asm/special_insns.h
>
> The "__" prefixed things should mean lower abstraction level helpers and
> we drop the ball on those sometimes.
>
>> It's not duplicate error code. The EAX returns an actual error code. The
>> rFlags contains additional information. We want both the codes available
>> to the caller so that it can make a proper decision.
>>
>> e.g.
>>
>> 1. A callers validate an address 0x1000. The instruction validated it
>> and return success.
> Your function returns PVALIDATE_SUCCESS.
>
>> 2. Caller asked to validate the same address again. The instruction will
>> return success but since the address was validated before hence
>> rFlags.CF will be set to indicate that PVALIDATE instruction did not
>> made any change in the RMP table.
> Your function returns PVALIDATE_VALIDATED_ALREADY or so.
>
>> You are correct that currently I am using only carry flag. So far we
>> don't need other flags. What do you think about something like this:
>>
>> * Add a new user defined error code
>>
>>  #define PVALIDATE_FAIL_NOUPDATE        255 /* The error is returned if
>> rFlags.CF set */
> Or that.
>
>> * Remove the rFlags parameters from the __pvalidate()
> Yes, it seems unnecessary at the moment. And I/O function arguments are
> always yuck.
>
>> * Update the __pvalidate to check the rFlags.CF and if set then return
>> the new user-defined error code.
> Yap, you can convert all that to pvalidate() return values, methinks,
> and then make that function simpler for callers because they should
> not have to deal with rFLAGS - only return values to denote what the
> function did.


Ack. I will make the required changes in the next version.

>
> Thx.
>


Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction

2021-03-26 Thread Brijesh Singh


On 3/26/21 10:42 AM, Brijesh Singh wrote:
> On 3/26/21 9:30 AM, Borislav Petkov wrote:
>> On Wed, Mar 24, 2021 at 11:44:14AM -0500, Brijesh Singh wrote:
>>>  arch/x86/include/asm/sev-snp.h | 52 ++
>> Hmm, a separate header.
>>
>> Yeah, I know we did sev-es.h but I think it all should be in a single
>> sev.h which contains all AMD-specific memory encryption declarations.
>> It's not like it is going to be huge or so, by the looks of how big
>> sev-es.h is.
>>
>> Or is there a particular need to have a separate snp header?
>>
>> If not, please do a pre-patch which renames sev-es.h to sev.h and then
>> add the SNP stuff to it.
>
> There is no strong reason for a separate sev-snp.h. I will add a
> pre-patch to rename sev-es.h to sev.h and add SNP stuff to it.


Should I do the same for the sev-es.c ? Currently, I am keeping all the
SEV-SNP specific changes in sev-snp.{c,h}. After a rename of
sev-es.{c,h} from both the arch/x86/kernel and arch-x86/boot/compressed
I can add the SNP specific stuff to it.

Thoughts ?

>
>>>  1 file changed, 52 insertions(+)
>>>  create mode 100644 arch/x86/include/asm/sev-snp.h
>>>
>>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>>> new file mode 100644
>>> index ..5a6d1367cab7
>>> --- /dev/null
>>> +++ b/arch/x86/include/asm/sev-snp.h
>>> @@ -0,0 +1,52 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +/*
>>> + * AMD SEV Secure Nested Paging Support
>>> + *
>>> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
>>> + *
>>> + * Author: Brijesh Singh 
>>> + */
>>> +
>>> +#ifndef __ASM_SECURE_NESTED_PAGING_H
>>> +#define __ASM_SECURE_NESTED_PAGING_H
>>> +
>>> +#ifndef __ASSEMBLY__
>>> +#include  /* native_save_fl() */
>> Where is that used? Looks like leftovers.
>
> Initially I was thinking to use the native_save_fl() to read the rFlags
> but then realized that what if rFlags get changed between the call to
> pvalidate instruction and native_save_fl(). I will remove this header
> inclusion. Thank you for pointing.
>
>>> +
>>> +/* Return code of __pvalidate */
>>> +#define PVALIDATE_SUCCESS  0
>>> +#define PVALIDATE_FAIL_INPUT   1
>>> +#define PVALIDATE_FAIL_SIZEMISMATCH6
>>> +
>>> +/* RMP page size */
>>> +#define RMP_PG_SIZE_2M 1
>>> +#define RMP_PG_SIZE_4K 0
>>> +
>>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>>> +static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int 
>>> validate,
>> Why the "__" prefix?
> I was trying to adhere to existing functions which uses a direct
> instruction opcode. Most of those function have "__" prefix (e.g
> __mwait, __tpause, ..).
>
> Should I drop the __prefix ?
>
>  
>
>>> + unsigned long *rflags)
>>> +{
>>> +   unsigned long flags;
>>> +   int rc;
>>> +
>>> +   asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
>>> +"pushf; pop %0\n\t"
>> Ewww, PUSHF is expensive.
>>
>>> +: "=rm"(flags), "=a"(rc)
>>> +: "a"(vaddr), "c"(rmp_psize), "d"(validate)
>>> +: "memory", "cc");
>>> +
>>> +   *rflags = flags;
>>> +   return rc;
>> Hmm, rc *and* rflags. Manual says "Upon completion, a return code is
>> stored in EAX. rFLAGS bits OF, ZF, AF, PF and SF are set based on this
>> return code."
>>
>> So what exactly does that mean and is the return code duplicated in
>> rFLAGS?
>
> It's not duplicate error code. The EAX returns an actual error code. The
> rFlags contains additional information. We want both the codes available
> to the caller so that it can make a proper decision.
>
> e.g.
>
> 1. A callers validate an address 0x1000. The instruction validated it
> and return success.
>
> 2. Caller asked to validate the same address again. The instruction will
> return success but since the address was validated before hence
> rFlags.CF will be set to indicate that PVALIDATE instruction did not
> made any change in the RMP table.
>
>> If so, can you return a single value which has everything you need to
>> know?
>>
>> I see that you're using the retval only for the carry flag to check
>> whether the page has alr

Re: [RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction

2021-03-26 Thread Brijesh Singh


On 3/26/21 9:30 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:14AM -0500, Brijesh Singh wrote:
>>  arch/x86/include/asm/sev-snp.h | 52 ++
> Hmm, a separate header.
>
> Yeah, I know we did sev-es.h but I think it all should be in a single
> sev.h which contains all AMD-specific memory encryption declarations.
> It's not like it is going to be huge or so, by the looks of how big
> sev-es.h is.
>
> Or is there a particular need to have a separate snp header?
>
> If not, please do a pre-patch which renames sev-es.h to sev.h and then
> add the SNP stuff to it.


There is no strong reason for a separate sev-snp.h. I will add a
pre-patch to rename sev-es.h to sev.h and add SNP stuff to it.


>
>>  1 file changed, 52 insertions(+)
>>  create mode 100644 arch/x86/include/asm/sev-snp.h
>>
>> diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
>> new file mode 100644
>> index ..5a6d1367cab7
>> --- /dev/null
>> +++ b/arch/x86/include/asm/sev-snp.h
>> @@ -0,0 +1,52 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * AMD SEV Secure Nested Paging Support
>> + *
>> + * Copyright (C) 2021 Advanced Micro Devices, Inc.
>> + *
>> + * Author: Brijesh Singh 
>> + */
>> +
>> +#ifndef __ASM_SECURE_NESTED_PAGING_H
>> +#define __ASM_SECURE_NESTED_PAGING_H
>> +
>> +#ifndef __ASSEMBLY__
>> +#include  /* native_save_fl() */
> Where is that used? Looks like leftovers.


Initially I was thinking of using native_save_fl() to read rFlags,
but then realized that rFlags could change between the PVALIDATE
instruction and the native_save_fl() call. I will remove this header
inclusion. Thank you for pointing it out.

>
>> +
>> +/* Return code of __pvalidate */
>> +#define PVALIDATE_SUCCESS   0
>> +#define PVALIDATE_FAIL_INPUT1
>> +#define PVALIDATE_FAIL_SIZEMISMATCH 6
>> +
>> +/* RMP page size */
>> +#define RMP_PG_SIZE_2M  1
>> +#define RMP_PG_SIZE_4K  0
>> +
>> +#ifdef CONFIG_AMD_MEM_ENCRYPT
>> +static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int 
>> validate,
> Why the "__" prefix?

I was trying to adhere to existing functions which use a direct
instruction opcode. Most of those functions have a "__" prefix (e.g.
__mwait, __tpause, ...).

Should I drop the __ prefix?

 

>
>> +  unsigned long *rflags)
>> +{
>> +unsigned long flags;
>> +int rc;
>> +
>> +asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
>> + "pushf; pop %0\n\t"
> Ewww, PUSHF is expensive.
>
>> + : "=rm"(flags), "=a"(rc)
>> + : "a"(vaddr), "c"(rmp_psize), "d"(validate)
>> + : "memory", "cc");
>> +
>> +*rflags = flags;
>> +return rc;
> Hmm, rc *and* rflags. Manual says "Upon completion, a return code is
> stored in EAX. rFLAGS bits OF, ZF, AF, PF and SF are set based on this
> return code."
>
> So what exactly does that mean and is the return code duplicated in
> rFLAGS?


It's not a duplicate error code. EAX returns the actual error code, and
rFlags contains additional information. We want both available to the
caller so that it can make a proper decision.

e.g.

1. A caller validates address 0x1000. The instruction validates it
and returns success.

2. The caller asks to validate the same address again. The instruction will
return success, but since the address was validated before, rFlags.CF
will be set to indicate that the PVALIDATE instruction did not make any
change in the RMP table.

> If so, can you return a single value which has everything you need to
> know?
>
> I see that you're using the retval only for the carry flag to check
> whether the page has already been validated so I think you could define
> a set of return value defines from that function which callers can
> check.


You are correct that currently I am using only the carry flag. So far we
don't need the other flags. What do you think about something like this:

* Add a new user-defined error code

 #define PVALIDATE_FAIL_NOUPDATE        255 /* returned if rFlags.CF is set */

* Remove the rFlags parameter from __pvalidate()

* Update __pvalidate() to check rFlags.CF and, if set, return
the new user-defined error code.
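
For illustration, the reworked helper could then look roughly like this
(a sketch of the plan above, not the final patch; CC_SET()/CC_OUT() from
<asm/asm.h> capture rFLAGS.CF without a PUSHF):

/* Return code of pvalidate */
#define PVALIDATE_SUCCESS		0
#define PVALIDATE_FAIL_INPUT		1
#define PVALIDATE_FAIL_SIZEMISMATCH	6
/* Software-defined: rFLAGS.CF was set, i.e. no change was made to the RMP */
#define PVALIDATE_FAIL_NOUPDATE		255

static inline int pvalidate(unsigned long vaddr, int rmp_psize, int validate)
{
	bool no_rmpupdate;
	int rc;

	/* "pvalidate" mnemonic encoded as opcode bytes F2 0F 01 FF */
	asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
		     CC_SET(c)
		     : CC_OUT(c) (no_rmpupdate), "=a" (rc)
		     : "a" (vaddr), "c" (rmp_psize), "d" (validate)
		     : "memory", "cc");

	if (no_rmpupdate)
		return PVALIDATE_FAIL_NOUPDATE;

	return rc;
}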


> And looking above again, you do have PVALIDATE_* defines except that
> nothing's using them. Use them please.

Actually, the later patches do make use of the error codes.

Re: [RFC Part2 PATCH 01/30] x86: Add the host SEV-SNP initialization support

2021-03-25 Thread Brijesh Singh


On 3/25/21 10:51 AM, Dave Hansen wrote:
> On 3/25/21 8:31 AM, Brijesh Singh wrote:
>> On 3/25/21 9:58 AM, Dave Hansen wrote:
>>>> +static int __init mem_encrypt_snp_init(void)
>>>> +{
>>>> +  if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
>>>> +  return 1;
>>>> +
>>>> +  if (rmptable_init()) {
>>>> +  setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>>>> +  return 1;
>>>> +  }
>>>> +
>>>> +  static_branch_enable(_enable_key);
>>>> +
>>>> +  return 0;
>>>> +}
>>> Could you explain a bit why 'snp_enable_key' is needed in addition to
>>> X86_FEATURE_SEV_SNP?
>>
>> The X86_FEATURE_SEV_SNP indicates that hardware supports the feature --
>> this does not necessary means that SEV-SNP is enabled in the host.
> I think you're confusing the CPUID bit that initially populates
> X86_FEATURE_SEV_SNP with the X86_FEATURE bit.  We clear X86_FEATURE bits
> all the time for features that the kernel turns off, even while the
> hardware supports it.


Ah, yes, I was getting mixed up. I will see if we can remove
snp_key_enabled and use the feature check instead.


> Look at what we do in init_ia32_feat_ctl() for SGX, for instance.  We
> then go on to use X86_FEATURE_SGX at runtime to see if SGX was disabled,
> even though the hardware supports it.
>
>>> For a lot of features, we just use cpu_feature_enabled(), which does
>>> both compile-time and static_cpu_has().  This whole series seems to lack
>>> compile-time disables for the code that it adds, like the code it adds
>>> to arch/x86/mm/fault.c or even mm/memory.c.
>> Noted, I will add the #ifdef  to make sure that its compiled out when
>> the config does not have the AMD_MEM_ENCRYPTION enabled.
> IS_ENABLED() tends to be nicer for these things.
>
> Even better is if you coordinate these with your X86_FEATURE_SEV_SNP
> checks.  Then, put X86_FEATURE_SEV_SNP in disabled-features.h, and you
> can use cpu_feature_enabled(X86_FEATURE_SEV_SNP) as both a
> (statically-patched) runtime *AND* compile-time check without an
> explicit #ifdefs.

I will try to improve this in v2 and will try IS_ENABLED().
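
As a sketch of that pattern (the placement and Kconfig symbol below are
assumptions for illustration), the feature bit gets a disabled-features.h
entry so cpu_feature_enabled() folds in the compile-time check:

/* arch/x86/include/asm/disabled-features.h */
#ifdef CONFIG_AMD_MEM_ENCRYPT
# define DISABLE_SEV_SNP	0
#else
# define DISABLE_SEV_SNP	(1 << (X86_FEATURE_SEV_SNP & 31))
#endif
/* ... and OR DISABLE_SEV_SNP into the DISABLED_MASK word that holds the bit */

/* callers then need a single check, statically patched at runtime and
 * compiled out when the config is off:
 */
	if (!cpu_feature_enabled(X86_FEATURE_SEV_SNP))
		return 1;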




Re: [RFC Part2 PATCH 01/30] x86: Add the host SEV-SNP initialization support

2021-03-25 Thread Brijesh Singh


On 3/25/21 9:58 AM, Dave Hansen wrote:
>> +static int __init mem_encrypt_snp_init(void)
>> +{
>> +if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
>> +return 1;
>> +
>> +if (rmptable_init()) {
>> +setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>> +return 1;
>> +}
>> +
>> +static_branch_enable(_enable_key);
>> +
>> +return 0;
>> +}
> Could you explain a bit why 'snp_enable_key' is needed in addition to
> X86_FEATURE_SEV_SNP?


X86_FEATURE_SEV_SNP indicates that the hardware supports the feature --
this does not necessarily mean that SEV-SNP is enabled in the host. The
snp_enable_key static key is later used by the kernel and drivers to check
whether SEV-SNP is enabled, e.g. when a driver calls the RMPUPDATE
instruction, the rmpupdate helper routine checks whether SNP is
enabled. If SEV-SNP is not enabled, the instruction would cause a #UD.
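
For illustration only, the guard looks conceptually like this (the
__rmpupdate() low-level helper name is an assumption, not from the patch):

static int rmpupdate(struct page *page, struct rmpupdate *val)
{
	/* RMPUPDATE would #UD if SEV-SNP is not enabled, so bail out early. */
	if (!static_branch_unlikely(&snp_enable_key))
		return -ENXIO;

	return __rmpupdate(page_to_pfn(page) << PAGE_SHIFT, val);
}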

>
> For a lot of features, we just use cpu_feature_enabled(), which does
> both compile-time and static_cpu_has().  This whole series seems to lack
> compile-time disables for the code that it adds, like the code it adds
> to arch/x86/mm/fault.c or even mm/memory.c.


Noted, I will add the #ifdef to make sure that it is compiled out when
the config does not have AMD_MEM_ENCRYPT enabled.


>


Re: [RFC Part2 PATCH 07/30] mm: add support to split the large THP based on RMP violation

2021-03-25 Thread Brijesh Singh


On 3/25/21 9:48 AM, Dave Hansen wrote:
> On 3/24/21 10:04 AM, Brijesh Singh wrote:
>> When SEV-SNP is enabled globally in the system, a write from the hypervisor
>> can raise an RMP violation. We can resolve the RMP violation by splitting
>> the virtual address to a lower page level.
>>
>> e.g
>> - guest made a page shared in the RMP entry so that the hypervisor
>>   can write to it.
>> - the hypervisor has mapped the pfn as a large page. A write access
>>   will cause an RMP violation if one of the pages within the 2MB region
>>   is a guest private page.
>>
>> The above RMP violation can be resolved by simply splitting the large
>> page.
> What if the large page is provided by hugetlbfs?

I was not able to find a method to split the large pages in
hugetlbfs. Unfortunately, at this time a VMM cannot use backing
memory from the hugetlbfs pool. An SEV-SNP aware VMM can use either
transparent hugepages or small pages.


>
> What if the kernel uses the direct map to access the page instead of the
> userspace mapping?


See Patch 04/30. Currently, we split the kernel direct map to 4K
before adding the page to the RMP table, to avoid having to split the
pages later because of an RMP fault.
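
A rough sketch of that ordering (the rmpupdate_assign() wrapper name is
illustrative; set_memory_4k() is the existing x86 helper):

static int rmpupdate_assign(struct page *page, struct rmpupdate *val)
{
	unsigned long vaddr = (unsigned long)page_address(page);
	int ret;

	/*
	 * Split the kernel direct mapping of this page down to 4K before
	 * the RMP entry is written, so the x86 page level can never be
	 * larger than the RMP page level.
	 */
	ret = set_memory_4k(vaddr, 1);
	if (ret)
		return ret;

	return rmptable_rmpupdate(page, val);
}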


>
>> The architecture specific code will read the RMP entry to determine
>> if the fault can be resolved by splitting and propagating the request
>> to split the page by setting newly introduced fault flag
>> (FAULT_FLAG_PAGE_SPLIT). If the fault cannot be resolved by splitting,
>> then a SIGBUS signal is sent to terminate the process.
> Are users just supposed to know what memory types are compatible with
> SEV-SNP?  Basically, don't use anything that might map a guest using
> non-4k entries, except THP?


Currently, the VMM will need to know the compatible memory types and use
them for allocating the backing pages.

>
> This does seem like a rather nasty aspect of the hardware.  For
> everything else, if the virtualization page tables and the x86 tables
> disagree, the TLB just sees the smallest page size.
>
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index 7605e06a6dd9..f6571563f433 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -1305,6 +1305,70 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned 
>> long hw_error_code,
>>  }
>>  NOKPROBE_SYMBOL(do_kern_addr_fault);
>>  
>> +#define RMP_FAULT_RETRY 0
>> +#define RMP_FAULT_KILL  1
>> +#define RMP_FAULT_PAGE_SPLIT2
>> +
>> +static inline size_t pages_per_hpage(int level)
>> +{
>> +return page_level_size(level) / PAGE_SIZE;
>> +}
>> +
>> +/*
>> + * The RMP fault can happen when a hypervisor attempts to write to:
>> + * 1. a guest owned page or
>> + * 2. any pages in the large page is a guest owned page.
>> + *
>> + * #1 will happen only when a process or VMM is attempting to modify the 
>> guest page
>> + * without the guests cooperation. If a guest wants a VMM to be able to 
>> write to its memory
>> + * then it should make the page shared. If we detect #1, kill the process 
>> because we can not
>> + * resolve the fault.
>> + *
>> + * #2 can happen when the page level does not match between the RMP entry 
>> and x86
>> + * page table walk, e.g the page is mapped as a large page in the x86 page 
>> table but its
>> + * added as a 4K shared page in the RMP entry. This can be resolved by 
>> splitting the address
>> + * into a smaller page level.
>> + */
> These comments need to get wrapped a bit sooner.  Could you try to match
> some of the others in the file?


Noted.


>
>> +static int handle_rmp_page_fault(unsigned long hw_error_code, unsigned long 
>> address)
>> +{
>> +unsigned long pfn, mask;
>> +int rmp_level, level;
>> +rmpentry_t *e;
>> +pte_t *pte;
>> +
>> +/* Get the native page level */
>> +pte = lookup_address_in_mm(current->mm, address, );
>> +if (unlikely(!pte))
>> +return RMP_FAULT_KILL;
>> +
>> +pfn = pte_pfn(*pte);
>> +if (level > PG_LEVEL_4K) {
>> +mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
>> +pfn |= (address >> PAGE_SHIFT) & mask;
>> +}
> What is this trying to do, exactly?


Trying to calculate the pfn within the large entry.

The lookup above returns the base pfn of the large page; we need to find
the index within the large page to calculate the faulting PFN.
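
For a 2M mapping the math works out like this (worked example of the
snippet being discussed, with PAGE_SHIFT == 12):

	pfn  = pte_pfn(*pte);                  /* base pfn of the 2M page    */
	mask = pages_per_hpage(PG_LEVEL_2M) -  /* 512 - 1 = 0x1ff            */
	       pages_per_hpage(PG_LEVEL_4K);
	pfn |= (address >> PAGE_SHIFT) & mask; /* add the 4K index inside 2M */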

>
>> +/* Get the page level from the RMP entry. */
>> +e = lookup_page_in_rmptable(pfn_to_page(pfn), _leve

Re: [RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature

2021-03-25 Thread Brijesh Singh


On 3/25/21 5:54 AM, Borislav Petkov wrote:
> On Wed, Mar 24, 2021 at 11:44:12AM -0500, Brijesh Singh wrote:
>> Add CPU feature detection for Secure Encrypted Virtualization with
>> Secure Nested Paging. This feature adds a strong memory integrity
>> protection to help prevent malicious hypervisor-based attacks like
>> data replay, memory re-mapping, and more.
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: Joerg Roedel 
>> Cc: "H. Peter Anvin" 
>> Cc: Tony Luck 
>> Cc: Dave Hansen 
>> Cc: "Peter Zijlstra (Intel)" 
>> Cc: Paolo Bonzini 
>> Cc: Tom Lendacky 
>> Cc: David Rientjes 
>> Cc: Sean Christopherson 
>> Cc: x...@kernel.org
>> Cc: k...@vger.kernel.org
>> Signed-off-by: Brijesh Singh 
>> ---
>>  arch/x86/include/asm/cpufeatures.h | 1 +
>>  arch/x86/kernel/cpu/amd.c  | 3 ++-
>>  arch/x86/kernel/cpu/scattered.c| 1 +
>>  3 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/cpufeatures.h 
>> b/arch/x86/include/asm/cpufeatures.h
>> index 84b887825f12..a5b369f10bcd 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -238,6 +238,7 @@
>>  #define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers 
>> VMMCALL hypercall instruction */
>>  #define X86_FEATURE_SEV_ES  ( 8*32+20) /* AMD Secure Encrypted 
>> Virtualization - Encrypted State */
>>  #define X86_FEATURE_VM_PAGE_FLUSH   ( 8*32+21) /* "" VM Page Flush MSR is 
>> supported */
>> +#define X86_FEATURE_SEV_SNP ( 8*32+22) /* AMD Secure Encrypted 
>> Virtualization - Secure Nested Paging */
> That leaf got a separate word now: word 19.
>
> For the future: pls redo your patches against tip/master because it has
> the latest state of affairs in tip-land.


For the early feedback I was trying to find one tree which could be used
for building both the guest and hypervisor at once. In the future, I will
submit part 1 against tip/master and part 2 against kvm/master. Thanks.


>
>>  /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
>>  #define X86_FEATURE_FSGSBASE( 9*32+ 0) /* RDFSBASE, 
>> WRFSBASE, RDGSBASE, WRGSBASE instructions*/
>> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
>> index f8ca66f3d861..39f7a4b5b04c 100644
>> --- a/arch/x86/kernel/cpu/amd.c
>> +++ b/arch/x86/kernel/cpu/amd.c
>> @@ -586,7 +586,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 
>> *c)
>>   *If BIOS has not enabled SME then don't advertise the
>>   *SME feature (set in scattered.c).
>>   *   For SEV: If BIOS has not enabled SEV then don't advertise the
>> - *SEV and SEV_ES feature (set in scattered.c).
>> + *SEV, SEV_ES and SEV_SNP feature (set in scattered.c).
> So you can remove the "scattered.c" references in the comments here.
>
>>   *
>>   *   In all cases, since support for SME and SEV requires long mode,
>>   *   don't advertise the feature under CONFIG_X86_32.
>> @@ -618,6 +618,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 
>> *c)
>>  clear_sev:
>>  setup_clear_cpu_cap(X86_FEATURE_SEV);
>>  setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
>> +setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
>>  }
>>  }
>>  
>> diff --git a/arch/x86/kernel/cpu/scattered.c 
>> b/arch/x86/kernel/cpu/scattered.c
>> index 236924930bf0..eaec1278dc2e 100644
>> --- a/arch/x86/kernel/cpu/scattered.c
>> +++ b/arch/x86/kernel/cpu/scattered.c
>> @@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>>  { X86_FEATURE_SEV_ES,   CPUID_EAX,  3, 0x801f, 0 },
>>  { X86_FEATURE_SME_COHERENT, CPUID_EAX, 10, 0x801f, 0 },
>>  { X86_FEATURE_VM_PAGE_FLUSH,CPUID_EAX,  2, 0x801f, 0 },
>> +{ X86_FEATURE_SEV_SNP,  CPUID_EAX,  4, 0x801f, 0 },
>>  { 0, 0, 0, 0, 0 }
>>  };
> And this too.
>
> Thx.
>


Re: [RFC Part2 PATCH 05/30] x86: define RMP violation #PF error code

2021-03-25 Thread Brijesh Singh


On 3/24/21 1:03 PM, Dave Hansen wrote:
>> diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
>> index 10b1de500ab1..107f9d947e8d 100644
>> --- a/arch/x86/include/asm/trap_pf.h
>> +++ b/arch/x86/include/asm/trap_pf.h
>> @@ -12,6 +12,7 @@
>>   *   bit 4 ==   1: fault was an instruction 
>> fetch
>>   *   bit 5 ==   1: protection keys block access
>>   *   bit 15 ==  1: SGX MMU page-fault
>> + *   bit 31 ==  1: fault was an RMP violation
>>   */
>>  enum x86_pf_error_code {
>>  X86_PF_PROT =   1 << 0,
>> @@ -21,6 +22,7 @@ enum x86_pf_error_code {
>>  X86_PF_INSTR=   1 << 4,
>>  X86_PF_PK   =   1 << 5,
>>  X86_PF_SGX  =   1 << 15,
>> +X86_PF_RMP  =   1ull << 31,
>>  };
> Man, I hope AMD and Intel are talking to each other about these bits.  :)
>
> Either way, this is hitting the limits of what I know about how enums
> are implemented.  I had internalized that they are just an 'int', but
> that doesn't seem quite right.  It sounds like they must be implemented
> using *an* integer type, but not necessarily 'int' itself.
>
> Either way, '1<<31' doesn't fit in a 32-bit signed int.  But, gcc at
> least doesn't seem to blow the enum up into a 64-bit type, which is nice.
>
> Could we at least start declaring these with BIT()?


Sure, I can use the BIT() macro to define the bits. Do you want me to
update all of the fault codes to use BIT() or just the one I am adding
in this patch?




Re: [RFC Part2 PATCH 06/30] x86/fault: dump the RMP entry on #PF

2021-03-24 Thread Brijesh Singh


On 3/24/21 12:47 PM, Andy Lutomirski wrote:
> On Wed, Mar 24, 2021 at 10:04 AM Brijesh Singh  wrote:
>> If hardware detects an RMP violation, it will raise a page-fault exception
>> with the RMP bit set. To help the debug, dump the RMP entry of the faulting
>> address.
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: Joerg Roedel 
>> Cc: "H. Peter Anvin" 
>> Cc: Tony Luck 
>> Cc: Dave Hansen 
>> Cc: "Peter Zijlstra (Intel)" 
>> Cc: Paolo Bonzini 
>> Cc: Tom Lendacky 
>> Cc: David Rientjes 
>> Cc: Sean Christopherson 
>> Cc: x...@kernel.org
>> Cc: k...@vger.kernel.org
>> Signed-off-by: Brijesh Singh 
>> ---
>>  arch/x86/mm/fault.c | 75 +
>>  1 file changed, 75 insertions(+)
>>
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index f39b551f89a6..7605e06a6dd9 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -31,6 +31,7 @@
>>  #include  /* VMALLOC_START, ...   */
>>  #include   /* kvm_handle_async_pf  */
>>  #include   /* fixup_vdso_exception()   */
>> +#include/* lookup_rmpentry ...  */
>>
>>  #define CREATE_TRACE_POINTS
>>  #include 
>> @@ -147,6 +148,76 @@ is_prefetch(struct pt_regs *regs, unsigned long 
>> error_code, unsigned long addr)
>>  DEFINE_SPINLOCK(pgd_lock);
>>  LIST_HEAD(pgd_list);
>>
>> +static void dump_rmpentry(struct page *page, rmpentry_t *e)
>> +{
>> +   unsigned long paddr = page_to_pfn(page) << PAGE_SHIFT;
>> +
>> +   pr_alert("RMPEntry paddr 0x%lx [assigned=%d immutable=%d pagesize=%d 
>> gpa=0x%lx asid=%d "
>> +   "vmsa=%d validated=%d]\n", paddr, rmpentry_assigned(e), 
>> rmpentry_immutable(e),
>> +   rmpentry_pagesize(e), rmpentry_gpa(e), rmpentry_asid(e), 
>> rmpentry_vmsa(e),
>> +   rmpentry_validated(e));
>> +   pr_alert("RMPEntry paddr 0x%lx %016llx %016llx\n", paddr, e->high, 
>> e->low);
>> +}
>> +
>> +static void show_rmpentry(unsigned long address)
>> +{
>> +   struct page *page = virt_to_page(address);
> This is an error path, and I don't think you have any particular
> guarantee that virt_to_page(address) is valid.  Please add appropriate
> validation or use one of the slow lookup helpers.


Noted, thanks for the quick feedback.
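
A minimal sketch of that validation (an assumption on my part, not the
actual follow-up patch): only translate the address when it is a valid
kernel direct-map address.

static void show_rmpentry(unsigned long address)
{
	struct page *page;
	rmpentry_t *e;
	int level;

	/* The faulting address may not be a valid kernel mapping; check first. */
	if (!virt_addr_valid(address))
		return;

	page = virt_to_page(address);
	e = lookup_page_in_rmptable(page, &level);
	if (e)
		dump_rmpentry(page, e);
}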




[RFC Part2 PATCH 28/30] KVM: SVM: add support to handle Page State Change VMGEXIT

2021-03-24 Thread Brijesh Singh
SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change NAE event
as defined in the GHCB specification section 4.1.6.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm/sev.c | 58 ++
 arch/x86/kvm/svm/svm.h |  4 +++
 2 files changed, 62 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 8f046b45c424..35e7a7bbf878 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2141,6 +2141,7 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
case SVM_VMGEXIT_AP_HLT_LOOP:
case SVM_VMGEXIT_AP_JUMP_TABLE:
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
+   case SVM_VMGEXIT_PAGE_STATE_CHANGE:
break;
default:
goto vmgexit_err;
@@ -2425,6 +2426,7 @@ static int __snp_handle_page_state_change(struct kvm_vcpu 
*vcpu, int op, gpa_t g
case SNP_PAGE_STATE_PRIVATE:
rc = snp_make_page_private(vcpu, gpa, pfn, level);
break;
+   /* TODO: Add USMASH and PSMASH support */
default:
rc = -EINVAL;
break;
@@ -2445,6 +2447,53 @@ static int __snp_handle_page_state_change(struct 
kvm_vcpu *vcpu, int op, gpa_t g
return rc;
 }
 
+static unsigned long snp_handle_page_state_change(struct vcpu_svm *svm, struct 
ghcb *ghcb)
+{
+   struct snp_page_state_entry *entry;
+   struct kvm_vcpu *vcpu = >vcpu;
+   struct snp_page_state_change *info;
+   unsigned long rc;
+   int level, op;
+   gpa_t gpa;
+
+   if (!sev_snp_guest(vcpu->kvm))
+   return -ENXIO;
+
+   if (!setup_vmgexit_scratch(svm, true, sizeof(ghcb->save.sw_scratch))) {
+   pr_err("vmgexit: scratch area is not setup.\n");
+   return -EINVAL;
+   }
+
+   info = (struct snp_page_state_change *)svm->ghcb_sa;
+   entry = >entry[info->header.cur_entry];
+
+   if ((info->header.cur_entry >= SNP_PAGE_STATE_CHANGE_MAX_ENTRY) ||
+   (info->header.end_entry >= SNP_PAGE_STATE_CHANGE_MAX_ENTRY) ||
+   (info->header.cur_entry > info->header.end_entry))
+   return VMGEXIT_PAGE_STATE_INVALID_HEADER;
+
+   while (info->header.cur_entry <= info->header.end_entry) {
+   entry = >entry[info->header.cur_entry];
+   gpa = gfn_to_gpa(entry->gfn);
+   level = RMP_X86_PG_LEVEL(entry->pagesize);
+   op = entry->operation;
+
+   if (!IS_ALIGNED(gpa, page_level_size(level))) {
+   rc = VMGEXIT_PAGE_STATE_INVALID_ENTRY;
+   goto out;
+   }
+
+   rc = __snp_handle_page_state_change(vcpu, op, gpa, level);
+   if (rc)
+   goto out;
+
+   info->header.cur_entry++;
+   }
+
+out:
+   return rc;
+}
+
 static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
struct vmcb_control_area *control = >vmcb->control;
@@ -2667,6 +2716,15 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
ret = 1;
break;
}
+   case SVM_VMGEXIT_PAGE_STATE_CHANGE: {
+   unsigned long rc;
+
+   ret = 1;
+
+   rc = snp_handle_page_state_change(svm, ghcb);
+   ghcb_set_sw_exit_info_2(ghcb, rc);
+   break;
+   }
case SVM_VMGEXIT_UNSUPPORTED_EVENT:
vcpu_unimpl(>vcpu,
"vmgexit: unsupported event - exit_info_1=%#llx, 
exit_info_2=%#llx\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 31bc9cc12c44..9fcfceb4d71e 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -600,6 +600,10 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 #defineGHCB_MSR_PAGE_STATE_CHANGE_RSVD_POS 12
 #defineGHCB_MSR_PAGE_STATE_CHANGE_RSVD_MASK0xf
 
+#define VMGEXIT_PAGE_STATE_INVALID_HEADER  0x10001
+#define VMGEXIT_PAGE_STATE_INVALID_ENTRY   0x10002
+#define VMGEXIT_PAGE_STATE_FIRMWARE_ERROR(x)   ((x & 0x) | 0x2)
+
 #define GHCB_MSR_TERM_REQ  0x100
 #define GHCB_MSR_TERM_REASON_SET_POS   12
 #define GHCB_MSR_TERM_REASON_SET_MASK  0xf
-- 
2.17.1



[RFC Part2 PATCH 30/30] KVM: X86: Add support to handle the RMP nested page fault

2021-03-24 Thread Brijesh Singh
Follow the recommendations from APM2 sections 15.36.10 and 15.36.11 to
resolve the RMP violation encountered during the NPT table walk.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/mmu/mmu.c  | 20 
 arch/x86/kvm/svm/sev.c  | 57 +
 arch/x86/kvm/svm/svm.c  |  1 +
 arch/x86/kvm/svm/svm.h  |  2 ++
 5 files changed, 82 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ea584606885..79dec4f93808 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1311,6 +1311,8 @@ struct kvm_x86_ops {
void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
int (*get_tdp_max_page_level)(struct kvm_vcpu *vcpu, gpa_t gpa, int 
max_level);
+   int (*handle_rmp_page_fault)(struct kvm_vcpu *vcpu, gpa_t gpa, 
kvm_pfn_t pfn,
+   int level, u64 error_code);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1e057e046ca4..ec396169706f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5105,6 +5105,18 @@ int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, 
gva_t gva)
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt);
 
+static int handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 
error_code)
+{
+   kvm_pfn_t pfn;
+   int level;
+
+   if (unlikely(!kvm_mmu_get_tdp_walk(vcpu, gpa, , )))
+   return RET_PF_RETRY;
+
+   kvm_x86_ops.handle_rmp_page_fault(vcpu, gpa, pfn, level, error_code);
+   return RET_PF_RETRY;
+}
+
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
   void *insn, int insn_len)
 {
@@ -5121,6 +5133,14 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t 
cr2_or_gpa, u64 error_code,
goto emulate;
}
 
+   if (unlikely(error_code & PFERR_GUEST_RMP_MASK)) {
+   r = handle_rmp_page_fault(vcpu, cr2_or_gpa, error_code);
+   if (r == RET_PF_RETRY)
+   return 1;
+   else
+   return r;
+   }
+
if (r == RET_PF_INVALID) {
r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa,
  lower_32_bits(error_code), false);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 35e7a7bbf878..dbb4f15de9ba 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2924,3 +2924,60 @@ int sev_get_tdp_max_page_level(struct kvm_vcpu *vcpu, 
gpa_t gpa, int max_level)
 
return min_t(uint32_t, level, max_level);
 }
+
+int snp_handle_rmp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t pfn,
+ int level, u64 error_code)
+{
+   int rlevel, rc = 0;
+   rmpentry_t *e;
+   bool private;
+   gfn_t gfn;
+
+   e = lookup_page_in_rmptable(pfn_to_page(pfn), );
+   if (!e)
+   return 1;
+
+   private = !!(error_code & PFERR_GUEST_ENC_MASK);
+
+   /*
+* See APM section 15.36.11 on how to handle the RMP fault for the 
large pages.
+*
+*  npt  rmpaccess  action
+*  --
+*  4k   2M C=1   psmash
+*  xx  C=1   if page is not private then add a new RMP 
entry
+*  xx  C=0   if page is private then make it shared
+*  2M   4k C=x   zap
+*/
+   if ((error_code & PFERR_GUEST_SIZEM_MASK) ||
+   ((level == PG_LEVEL_4K) && (rlevel == PG_LEVEL_2M) && private)) {
+   rc = snp_rmptable_psmash(vcpu, pfn);
+   goto zap_gfn;
+   }
+
+   /*
+* If it's a private access, and the page is not assigned in the RMP 
table, create a
+* new private RMP entry.
+*/
+   if (!rmpentry_assigned(e) && private) {
+   rc = snp_make_page_private(vcpu, gpa, pfn, PG_LEVEL_4K);
+   goto zap_gfn;
+   }
+
+   /*
+* If it's a shared access, then make the page shared in the RMP table.
+*/
+   if (rmpentry_assigned(e) && !private)
+   rc = snp_make_page_shared(vcpu, gpa, pfn, PG_LEVEL_4K);
+
+zap_gfn:
+   /*
+* Now that we have updated the RMP pagesize, zap the existing rmaps for
+* large entry ranges so that nested p

[RFC Part2 PATCH 29/30] KVM: X86: export the kvm_zap_gfn_range() for the SNP use

2021-03-24 Thread Brijesh Singh
While resolving an RMP page fault, we may run into cases where the page
level between the RMP entry and the TDP does not match and the 2M RMP entry
must be split into 4K RMP entries, or a 2M TDP page needs to be broken
into multiple 4K pages.

To keep the RMP and TDP page levels in sync, zap the gfn range
after splitting the pages in the RMP entry. The zap forces the
TDP to get rebuilt with the new page level.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/mmu.h  | 2 --
 arch/x86/kvm/mmu/mmu.c  | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 074605408970..5ea584606885 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1397,6 +1397,8 @@ void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
 
 int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3);
 bool pdptrs_changed(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e7c4e55215bf..5f7ebe4afd63 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -223,8 +223,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, 
struct kvm_mmu *mmu,
return -(u32)fault & errcode;
 }
 
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 147f22bda6e7..1e057e046ca4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5569,6 +5569,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, 
gfn_t gfn_end)
 
spin_unlock(>mmu_lock);
 }
+EXPORT_SYMBOL_GPL(kvm_zap_gfn_range);
 
 static bool slot_rmap_write_protect(struct kvm *kvm,
struct kvm_rmap_head *rmap_head)
-- 
2.17.1



[RFC Part2 PATCH 27/30] KVM: SVM: add support to handle MSR based Page State Change VMGEXIT

2021-03-24 Thread Brijesh Singh
SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
table to be private or shared using the Page State Change MSR protocol
as defined in the GHCB specification section 2.5.1.

Before changing the page state in the RMP entry, look up the page in
the TDP to make sure that there is a valid mapping for it. If the mapping
exists, try to find a workable page level between the TDP and RMP for
the page. If the page is not mapped in the TDP, then create a fault such
that it gets mapped before we change the page state in the RMP entry.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm/sev.c | 148 +
 arch/x86/kvm/svm/svm.h |  11 +++
 2 files changed, 159 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7c242c470eba..8f046b45c424 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -25,6 +25,7 @@
 #include "svm.h"
 #include "cpuid.h"
 #include "trace.h"
+#include "mmu.h"
 
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
 
@@ -2322,6 +2323,128 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 
value)
svm->vmcb->control.ghcb_gpa = value;
 }
 
+static int snp_rmptable_psmash(struct kvm_vcpu *vcpu, kvm_pfn_t pfn)
+{
+   pfn = pfn & ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+
+   /* Split the 2MB-page RMP entry into a corresponding set of contiguous 
4KB-page RMP entry */
+   return rmptable_psmash(pfn_to_page(pfn));
+}
+
+static int snp_make_page_shared(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t 
pfn, int level)
+{
+   struct rmpupdate val;
+   int rc, rmp_level;
+   rmpentry_t *e;
+
+   e = lookup_page_in_rmptable(pfn_to_page(pfn), _level);
+   if (!e)
+   return -EINVAL;
+
+   if (!rmpentry_assigned(e))
+   return 0;
+
+   /* Log if the entry is validated */
+   if (rmpentry_validated(e))
+   pr_debug_ratelimited("Remove RMP entry for a validated gpa 
0x%llx\n", gpa);
+
+   /*
+* Is the page part of an existing 2M RMP entry ? Split the 2MB into 
multiple of 4K-page
+* before making the memory shared.
+*/
+   if ((level == PG_LEVEL_4K) && (rmp_level == PG_LEVEL_2M)) {
+   rc = snp_rmptable_psmash(vcpu, pfn);
+   if (rc)
+   return rc;
+   }
+
+   memset(, 0, sizeof(val));
+   val.pagesize = X86_RMP_PG_LEVEL(level);
+   return rmptable_rmpupdate(pfn_to_page(pfn), );
+}
+
+static int snp_make_page_private(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t 
pfn, int level)
+{
+   struct kvm_sev_info *sev = _kvm_svm(vcpu->kvm)->sev_info;
+   struct rmpupdate val;
+   int rmp_level;
+   rmpentry_t *e;
+
+   e = lookup_page_in_rmptable(pfn_to_page(pfn), _level);
+   if (!e)
+   return -EINVAL;
+
+   /* Log if the entry is validated */
+   if (rmpentry_validated(e))
+   pr_err_ratelimited("Asked to make a pre-validated gpa %llx 
private\n", gpa);
+
+   memset(, 0, sizeof(val));
+   val.gpa = gpa;
+   val.asid = sev->asid;
+   val.pagesize = X86_RMP_PG_LEVEL(level);
+   val.assigned = true;
+
+   return rmptable_rmpupdate(pfn_to_page(pfn), );
+}
+
+static int __snp_handle_page_state_change(struct kvm_vcpu *vcpu, int op, gpa_t 
gpa, int level)
+{
+   struct kvm *kvm = vcpu->kvm;
+   gpa_t end, next_gpa;
+   int rc, tdp_level;
+   kvm_pfn_t pfn;
+
+   end = gpa + page_level_size(level);
+
+   for (; end > gpa; gpa = next_gpa) {
+   /*
+* Get the pfn and level for the gpa from the nested page table.
+*
+* If the TDP walk failed, then its safe to say that we don't 
have a valid
+* mapping for the gpa in the nested page table. Create a fault 
to map the
+* page is nested page table.
+*/
+   if (!kvm_mmu_get_tdp_walk(vcpu, gpa, , _level)) {
+   pfn = kvm_mmu_map_tdp_page(vcpu, gpa, PFERR_USER_MASK, 
level);
+   if (is_error_noslot_pfn(pfn))
+   goto out;
+
+   if (!kvm_mmu_get_tdp_walk(vcpu, gpa, , _level))
+   goto out;
+   }
+
+   /* Adjust the level so that we don't go higher than the backing 
page level */
+   level = min_t(size_t, level, tdp_level);
+
+   spin_lock(>mmu_lock);
+
+   switch (op) {

[RFC Part2 PATCH 26/30] KVM: SVM: add support to handle GHCB GPA register VMGEXIT

2021-03-24 Thread Brijesh Singh
SEV-SNP guests are required to perform a GHCB GPA registration (see
section 2.5.2 in GHCB specification). Before using a GHCB GPA for a vCPU
the first time, a guest must register the vCPU GHCB GPA. If the hypervisor
can work with the guest-requested GPA then it must respond back with the
same GPA, otherwise it returns -1.

On every VMEXIT, verify that the GHCB GPA matches the registered value.
If a mismatch is detected, abort the guest.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm/sev.c | 28 
 arch/x86/kvm/svm/svm.h | 15 +++
 2 files changed, 43 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index e66be4d305b9..7c242c470eba 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2378,6 +2378,28 @@ static int sev_handle_vmgexit_msr_protocol(struct 
vcpu_svm *svm)
  GHCB_MSR_INFO_POS);
break;
}
+   case GHCB_MSR_GHCB_GPA_REGISTER_REQ: {
+   kvm_pfn_t pfn;
+   u64 gfn;
+
+   gfn = get_ghcb_msr_bits(svm,
+   GHCB_MSR_GHCB_GPA_REGISTER_VALUE_MASK,
+   GHCB_MSR_GHCB_GPA_REGISTER_VALUE_POS);
+
+   pfn = kvm_vcpu_gfn_to_pfn(vcpu, gfn);
+   if (is_error_noslot_pfn(pfn))
+   gfn = GHCB_MSR_GHCB_GPA_REGISTER_ERROR;
+   else
+   svm->ghcb_registered_gpa = gfn_to_gpa(gfn);
+
+   set_ghcb_msr_bits(svm, gfn,
+ GHCB_MSR_GHCB_GPA_REGISTER_VALUE_MASK,
+ GHCB_MSR_GHCB_GPA_REGISTER_VALUE_POS);
+   set_ghcb_msr_bits(svm, GHCB_MSR_GHCB_GPA_REGISTER_RESP,
+ GHCB_MSR_INFO_MASK,
+ GHCB_MSR_INFO_POS);
+   break;
+   }
case GHCB_MSR_TERM_REQ: {
u64 reason_set, reason_code;
 
@@ -2418,6 +2440,12 @@ int sev_handle_vmgexit(struct vcpu_svm *svm)
return -EINVAL;
}
 
+   /* SEV-SNP guest requires that the GHCB GPA must be registered */
+   if (sev_snp_guest(svm->vcpu.kvm) && !ghcb_gpa_is_registered(svm, 
ghcb_gpa)) {
+   vcpu_unimpl(>vcpu, "vmgexit: GHCB GPA [%#llx] is not 
registered.\n", ghcb_gpa);
+   return -EINVAL;
+   }
+
if (kvm_vcpu_map(>vcpu, ghcb_gpa >> PAGE_SHIFT, >ghcb_map)) {
/* Unable to map GHCB from guest */
vcpu_unimpl(>vcpu, "vmgexit: error mapping GHCB [%#llx] 
from guest\n",
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9b095f8fc0cf..0de7c77b0d59 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -194,6 +194,8 @@ struct vcpu_svm {
u64 ghcb_sa_len;
bool ghcb_sa_sync;
bool ghcb_sa_free;
+
+   u64 ghcb_registered_gpa;
 };
 
 struct svm_cpu_data {
@@ -254,6 +256,13 @@ static inline bool sev_snp_guest(struct kvm *kvm)
 #endif
 }
 
+#define GHCB_GPA_INVALID   0x
+
+static inline bool ghcb_gpa_is_registered(struct vcpu_svm *svm, u64 val)
+{
+   return svm->ghcb_registered_gpa == val;
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
vmcb->control.clean = 0;
@@ -574,6 +583,12 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
 #define GHCB_MSR_CPUID_REG_POS 30
 #define GHCB_MSR_CPUID_REG_MASK0x3
 
+#define GHCB_MSR_GHCB_GPA_REGISTER_REQ 0x012
+#define GHCB_MSR_GHCB_GPA_REGISTER_VALUE_POS   12
+#define GHCB_MSR_GHCB_GPA_REGISTER_VALUE_MASK  0xf
+#define GHCB_MSR_GHCB_GPA_REGISTER_RESP0x013
+#define GHCB_MSR_GHCB_GPA_REGISTER_ERROR   0xf
+
 #define GHCB_MSR_TERM_REQ  0x100
 #define GHCB_MSR_TERM_REASON_SET_POS   12
 #define GHCB_MSR_TERM_REASON_SET_MASK  0xf
-- 
2.17.1



[RFC Part2 PATCH 24/30] KVM: X86: define new RMP check related #NPF error bits

2021-03-24 Thread Brijesh Singh
When SEV-SNP is enabled globally, the hardware places restrictions on all
memory accesses based on the RMP entry, whether the hypervisor or a VM
performs the accesses. When hardware encounters an RMP access violation
during a guest access, it will cause a #VMEXIT(NPF).

See APM2 section 15.36.10 for more details.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/kvm_host.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 93dc4f232964..074605408970 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -216,8 +216,12 @@ enum x86_intercept_stage;
 #define PFERR_RSVD_BIT 3
 #define PFERR_FETCH_BIT 4
 #define PFERR_PK_BIT 5
+#define PFERR_GUEST_RMP_BIT 31
 #define PFERR_GUEST_FINAL_BIT 32
 #define PFERR_GUEST_PAGE_BIT 33
+#define PFERR_GUEST_ENC_BIT 34
+#define PFERR_GUEST_SIZEM_BIT 35
+#define PFERR_GUEST_VMPL_BIT 36
 
 #define PFERR_PRESENT_MASK (1U << PFERR_PRESENT_BIT)
 #define PFERR_WRITE_MASK (1U << PFERR_WRITE_BIT)
@@ -227,6 +231,10 @@ enum x86_intercept_stage;
 #define PFERR_PK_MASK (1U << PFERR_PK_BIT)
 #define PFERR_GUEST_FINAL_MASK (1ULL << PFERR_GUEST_FINAL_BIT)
 #define PFERR_GUEST_PAGE_MASK (1ULL << PFERR_GUEST_PAGE_BIT)
+#define PFERR_GUEST_RMP_MASK (1ULL << PFERR_GUEST_RMP_BIT)
+#define PFERR_GUEST_ENC_MASK (1ULL << PFERR_GUEST_ENC_BIT)
+#define PFERR_GUEST_SIZEM_MASK (1ULL << PFERR_GUEST_SIZEM_BIT)
+#define PFERR_GUEST_VMPL_MASK (1ULL << PFERR_GUEST_VMPL_BIT)
 
 #define PFERR_NESTED_GUEST_PAGE (PFERR_GUEST_PAGE_MASK |   \
 PFERR_WRITE_MASK | \
-- 
2.17.1



[RFC Part2 PATCH 25/30] KVM: X86: update page-fault trace to log the 64-bit error code

2021-03-24 Thread Brijesh Singh
The page-fault error code is a 64-bit value, but the trace prints only
the lower 32-bits. Some of the SEV-SNP RMP fault error codes are
available in the upper 32-bits.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/trace.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 2de30c20bc26..16236a8b42eb 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -329,12 +329,12 @@ TRACE_EVENT(kvm_inj_exception,
  * Tracepoint for page fault.
  */
 TRACE_EVENT(kvm_page_fault,
-   TP_PROTO(unsigned long fault_address, unsigned int error_code),
+   TP_PROTO(unsigned long fault_address, u64 error_code),
TP_ARGS(fault_address, error_code),
 
TP_STRUCT__entry(
__field(unsigned long,  fault_address   )
-   __field(unsigned int,   error_code  )
+   __field(u64,error_code  )
),
 
TP_fast_assign(
@@ -342,7 +342,7 @@ TRACE_EVENT(kvm_page_fault,
__entry->error_code = error_code;
),
 
-   TP_printk("address %lx error_code %x",
+   TP_printk("address %lx error_code %llx",
  __entry->fault_address, __entry->error_code)
 );
 
-- 
2.17.1



[RFC Part2 PATCH 23/30] KVM: X86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use

2021-03-24 Thread Brijesh Singh
SEV-SNP VMs may call the page state change VMGEXIT to mark a GPA
as private or shared in the RMP table. The page state change VMGEXIT
will contain the RMP page level to be used in the RMP entry. If the
page level between the TDP and RMP does not match, it will result
in a nested page fault (RMP violation).

The SEV-SNP VMGEXIT handler will use kvm_mmu_get_tdp_walk() to get
the current page level in the TDP for the given GPA and calculate a
workable page level. If a GPA is mapped as a 4K page in the TDP, but
the guest requested to add the GPA as 2M in the RMP entry, then the
2M request will be broken into 4K pages to keep the RMP and TDP
page levels in sync.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/mmu.h |  1 +
 arch/x86/kvm/mmu/mmu.c | 29 +
 2 files changed, 30 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 70dce26a5882..e7c4e55215bf 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -110,6 +110,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, 
u32 error_code,
   bool prefault);
 
 int kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, int 
max_level);
+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, 
int *level);
 
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t 
cr2_or_gpa,
u32 err, bool prefault)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 33104943904b..147f22bda6e7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3828,6 +3828,35 @@ int kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t 
gpa, u32 error_code, int m
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page);
 
+bool kvm_mmu_get_tdp_walk(struct kvm_vcpu *vcpu, gpa_t gpa, kvm_pfn_t *pfn, 
int *level)
+{
+   u64 sptes[PT64_ROOT_MAX_LEVEL + 1];
+   int leaf, root;
+
+   if (is_tdp_mmu_root(vcpu->kvm, vcpu->arch.mmu->root_hpa))
+   leaf = kvm_tdp_mmu_get_walk(vcpu, gpa, sptes, );
+   else
+   leaf = get_walk(vcpu, gpa, sptes, );
+
+   if (unlikely(leaf < 0))
+   return false;
+
+   /* Check if the leaf SPTE is present */
+   if (!is_shadow_present_pte(sptes[leaf]))
+   return false;
+
+   *pfn = spte_to_pfn(sptes[leaf]);
+   if (leaf > PG_LEVEL_4K) {
+   u64 page_mask = KVM_PAGES_PER_HPAGE(leaf) - 
KVM_PAGES_PER_HPAGE(leaf - 1);
+   *pfn |= (gpa_to_gfn(gpa) & page_mask);
+   }
+
+   *level = leaf;
+
+   return true;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_get_tdp_walk);
+
 static void nonpaging_init_context(struct kvm_vcpu *vcpu,
   struct kvm_mmu *context)
 {
-- 
2.17.1



[RFC Part2 PATCH 22/30] x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV

2021-03-24 Thread Brijesh Singh
Introduce a helper to directly fault-in a TDP page without going
through the full page fault path. This allows SEV-SNP to build
the nested page table while handling the page state change VMGEXIT.
A guest may issue a page state change VMGEXIT before accessing the
page. By creating a fault-in, we can get the TDP page level and PFN,
which will be used while calculating the RMP page size.

An SEV-SNP guest issues the page state change VMGEXIT followed by PVALIDATE.
If the page is not present in the TDP then PVALIDATE will cause a nested
page fault. If we can build the TDP while handling the page state change
VMGEXIT, we can also avoid a nested page fault due to the page not
being present.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/mmu.h |  2 ++
 arch/x86/kvm/mmu/mmu.c | 20 
 2 files changed, 22 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 261be1d2032b..70dce26a5882 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -109,6 +109,8 @@ static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu)
 int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
   bool prefault);
 
+int kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, int 
max_level);
+
 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t 
cr2_or_gpa,
u32 err, bool prefault)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e55df7b4e297..33104943904b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3808,6 +3808,26 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, 
u32 error_code,
 max_level, true);
 }
 
+int kvm_mmu_map_tdp_page(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, int 
max_level)
+{
+   int r;
+
+   /*
+* Loop on the page fault path to handle the case where an mmu_notifier
+* invalidation triggers RET_PF_RETRY.  In the normal page fault path,
+* KVM needs to resume the guest in case the invalidation changed any
+* of the page fault properties, i.e. the gpa or error code.  For this
+* path, the gpa and error code are fixed by the caller, and the caller
+* expects failure if and only if the page fault can't be fixed.
+*/
+   do {
+   r = direct_page_fault(vcpu, gpa, error_code, false, max_level, 
true);
+   } while (r == RET_PF_RETRY);
+
+   return r;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_map_tdp_page);
+
 static void nonpaging_init_context(struct kvm_vcpu *vcpu,
   struct kvm_mmu *context)
 {
-- 
2.17.1



[RFC Part2 PATCH 20/30] KVM: SVM: add KVM_SEV_SNP_LAUNCH_FINISH command

2021-03-24 Thread Brijesh Singh
The KVM_SEV_SNP_LAUNCH_FINISH command finalizes the cryptographic digest and
stores it as the measurement of the guest at launch.

While finalizing the launch flow, it also issues the LAUNCH_UPDATE command
to encrypt the VMSA pages.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm/sev.c   | 131 +++
 include/uapi/linux/kvm.h |  13 
 2 files changed, 144 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4037430b8d56..810fd2b8a9ff 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1380,6 +1380,117 @@ static int snp_launch_update(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static int snp_launch_update_vmsa(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
+   struct sev_data_snp_launch_update *data;
+   int i, ret;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return -ENOMEM;
+
+   data->gctx_paddr = __sme_page_pa(sev->snp_context);
+   data->page_type = SNP_PAGE_TYPE_VMSA;
+
+   for (i = 0; i < kvm->created_vcpus; i++) {
+   struct vcpu_svm *svm = to_svm(kvm->vcpus[i]);
+   struct rmpupdate e = {};
+
+   /* Perform some pre-encryption checks against the VMSA */
+   ret = sev_es_sync_vmsa(svm);
+   if (ret)
+   goto e_free;
+
+   /* Transition the VMSA page to a firmware state. */
+   e.assigned = 1;
+   e.immutable = 1;
+   e.asid = sev->asid;
+   e.gpa = -1;
+   e.pagesize = RMP_PG_SIZE_4K;
+   ret = rmptable_rmpupdate(virt_to_page(svm->vmsa), );
+   if (ret)
+   goto e_free;
+
+   /* Issue the SNP command to encrypt the VMSA */
+   data->address = __sme_pa(svm->vmsa);
+   ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, 
data, >error);
+   if (ret) {
+   snp_page_reclaim(virt_to_page(svm->vmsa), 
RMP_PG_SIZE_4K);
+   goto e_free;
+   }
+
+   svm->vcpu.arch.guest_state_protected = true;
+   }
+
+e_free:
+   kfree(data);
+
+   return ret;
+}
+
+static int snp_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
+   struct sev_data_snp_launch_finish *data;
+   void *id_block = NULL, *id_auth = NULL;
+   struct kvm_sev_snp_launch_finish params;
+   int ret;
+
+   if (!sev_snp_guest(kvm))
+   return -ENOTTY;
+
+   if (!sev->snp_context)
+   return -EINVAL;
+
+   if (copy_from_user(, (void __user *)(uintptr_t)argp->data, 
sizeof(params)))
+   return -EFAULT;
+
+   /* Measure all vCPUs using LAUNCH_UPDATE before we finalize the launch 
flow. */
+   ret = snp_launch_update_vmsa(kvm, argp);
+   if (ret)
+   return ret;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return -ENOMEM;
+
+   if (params.id_block_en) {
+   id_block = psp_copy_user_blob(params.id_block_uaddr, 
KVM_SEV_SNP_ID_BLOCK_SIZE);
+   if (IS_ERR(id_block)) {
+   ret = PTR_ERR(id_block);
+   goto e_free;
+   }
+
+   data->id_block_en = 1;
+   data->id_block_paddr = __sme_pa(id_block);
+   }
+
+   if (params.auth_key_en) {
+   id_auth = psp_copy_user_blob(params.id_auth_uaddr, 
KVM_SEV_SNP_ID_AUTH_SIZE);
+   if (IS_ERR(id_auth)) {
+   ret = PTR_ERR(id_auth);
+   goto e_free_id_block;
+   }
+
+   data->auth_key_en = 1;
+   data->id_auth_paddr = __sme_pa(id_auth);
+   }
+
+   data->gctx_paddr = __sme_page_pa(sev->snp_context);
+   ret = sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_FINISH, data, >error);
+
+   kfree(id_auth);
+
+e_free_id_block:
+   kfree(id_block);
+
+e_free:
+   kfree(data);
+
+   return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -1439,6 +1550,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_SNP_LAUNCH_UPDATE:
r = snp_launch_update(kvm, _cmd);
break;
+   case KVM_SEV_SNP_LAUNCH_F

[RFC Part2 PATCH 17/30] KVM: SVM: add KVM_SEV_SNP_LAUNCH_START command

2021-03-24 Thread Brijesh Singh
KVM_SEV_SNP_LAUNCH_START begins the launch process for an SEV-SNP guest.
The command initializes a cryptographic digest context used to construct the
measurement of the guest. If the guest is expected to be migrated, the
command also binds a migration agent (MA) to the guest.

For more information, see the SEV-SNP spec sections 4.5 and 8.11.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm/sev.c   | 221 ++-
 arch/x86/kvm/svm/svm.h   |   1 +
 include/uapi/linux/kvm.h |   8 ++
 3 files changed, 229 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 36042a2b19b3..7652e57f7e01 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -37,6 +37,9 @@ static unsigned int min_sev_asid;
 static unsigned long *sev_asid_bitmap;
 static unsigned long *sev_reclaim_asid_bitmap;
 
+static void snp_free_context_page(struct page *page);
+static int snp_decommission_context(struct kvm *kvm);
+
 struct enc_region {
struct list_head list;
unsigned long npages;
@@ -1069,6 +1072,181 @@ static int sev_snp_guest_init(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return 0;
 }
 
+static int snp_page_reclaim(struct page *page, int rmppage_size)
+{
+   struct sev_data_snp_page_reclaim *data;
+   struct rmpupdate e = {};
+   int rc, error;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return -ENOMEM;
+
+   data->paddr = __sme_page_pa(page) | rmppage_size;
+   rc = sev_snp_reclaim(data, );
+   if (rc)
+   goto e_free;
+
+   rc = rmptable_rmpupdate(page, );
+
+e_free:
+   kfree(data);
+
+   return rc;
+}
+
+static void snp_free_context_page(struct page *page)
+{
+   /* Reclaim the page before changing the attribute */
+   if (snp_page_reclaim(page, RMP_PG_SIZE_4K)) {
+   pr_info("SEV-SNP: failed to reclaim page, leaking it.\n");
+   return;
+   }
+
+   __free_page(page);
+}
+
+static struct page *snp_alloc_context_page(void)
+{
+   struct rmpupdate val = {};
+   struct page *page = NULL;
+   int rc;
+
+   page = alloc_page(GFP_KERNEL);
+   if (!page)
+   return NULL;
+
+   /* Transition the context page to the firmware state.*/
+   val.immutable = 1;
+   val.assigned = 1;
+   val.pagesize = RMP_PG_SIZE_4K;
+   rc = rmptable_rmpupdate(page, );
+   if (rc)
+   goto e_free;
+
+   return page;
+
+e_free:
+   __free_page(page);
+
+   return NULL;
+}
+
+static struct page *snp_context_create(struct kvm *kvm, struct kvm_sev_cmd 
*argp)
+{
+   struct sev_data_snp_gctx_create *data;
+   struct page *context = NULL;
+   int rc;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return NULL;
+
+   /* Allocate memory for context page */
+   context = snp_alloc_context_page();
+   if (!context)
+   goto e_free;
+
+   data->gctx_paddr = __sme_page_pa(context);
+   rc = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_GCTX_CREATE, data, 
>error);
+   if (rc) {
+   snp_free_context_page(context);
+   context = NULL;
+   }
+
+e_free:
+   kfree(data);
+
+   return context;
+}
+
+static int snp_bind_asid(struct kvm *kvm, int *error)
+{
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
+   struct sev_data_snp_activate *data;
+   int asid = sev_get_asid(kvm);
+   int ret, retry_count = 0;
+
+   data = kmalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return -ENOMEM;
+
+   /* Activate ASID on the given context */
+   data->gctx_paddr = __sme_page_pa(sev->snp_context);
+   data->asid   = asid;
+again:
+   ret = sev_issue_cmd(kvm, SEV_CMD_SNP_ACTIVATE, data, error);
+
+   /* Check if the DF_FLUSH is required, and try again */
+   if (ret && (*error == SEV_RET_DFFLUSH_REQUIRED) && (!retry_count)) {
+   /* Guard DEACTIVATE against WBINVD/DF_FLUSH used in ASID 
recycling */
+   down_read(_deactivate_lock);
+   wbinvd_on_all_cpus();
+   ret = sev_guest_snp_df_flush(error);
+   up_read(_deactivate_lock);
+
+   if (ret)
+   goto e_free;
+
+   /* only one retry */
+   retry_count = 1;
+
+   goto again;
+   }
+
+e_free:
+   kfree(data);
+
+   return ret;
+}
+
+static int snp_launch_start(struct k

[RFC Part2 PATCH 21/30] KVM: X86: Add kvm_x86_ops to get the max page level for the TDP

2021-03-24 Thread Brijesh Singh
When running an SEV-SNP VM, the sPA used to index the RMP entry is
obtained through the TDP translation (gva->gpa->spa). The TDP page
level is checked against the page level programmed in the RMP entry.
If the page level does not match, then it will cause a nested page
fault with the RMP bit set to indicate the RMP violation.

To resolve the fault, we must match the page levels between the TDP
and RMP entry. Add a new kvm_x86_op (get_tdp_max_page_level) that
can be used to query the current RMP page size. The page fault
handler will call the architecture code to get the maximum allowed
page level for the GPA and limit the TDP page level.
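
For illustration only (not part of this patch), the flow the new hook enables
in the fault path is roughly:

/*
 * Sketch of the level clamping described above:
 *
 *	pfn   = gfn_to_pfn(kvm, gpa_to_gfn(gpa));		// gpa -> spa
 *	e     = lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
 *	level = min(requested_level, rmp_level);		// never map at a larger
 *								// level than the RMP entry
 */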

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu/mmu.c  |  6 --
 arch/x86/kvm/svm/sev.c  | 20 
 arch/x86/kvm/svm/svm.c  |  1 +
 arch/x86/kvm/svm/svm.h  |  1 +
 arch/x86/kvm/vmx/vmx.c  |  8 
 6 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ccd5f8090ff6..93dc4f232964 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1302,6 +1302,7 @@ struct kvm_x86_ops {
 
void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
+   int (*get_tdp_max_page_level)(struct kvm_vcpu *vcpu, gpa_t gpa, int 
max_level);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6d16481aa29d..e55df7b4e297 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3747,11 +3747,13 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, 
gpa_t gpa, u32 error_code,
 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa,
u32 error_code, bool prefault)
 {
+   int max_level = kvm_x86_ops.get_tdp_max_page_level(vcpu, gpa, 
PG_LEVEL_2M);
+
pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);
 
/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault,
-PG_LEVEL_2M, false);
+max_level, false);
 }
 
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
@@ -3792,7 +3794,7 @@ int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, 
u32 error_code,
 {
int max_level;
 
-   for (max_level = KVM_MAX_HUGEPAGE_LEVEL;
+   for (max_level = kvm_x86_ops.get_tdp_max_page_level(vcpu, gpa, 
KVM_MAX_HUGEPAGE_LEVEL);
 max_level > PG_LEVEL_4K;
 max_level--) {
int page_num = KVM_PAGES_PER_HPAGE(max_level);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 810fd2b8a9ff..e66be4d305b9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2670,3 +2670,23 @@ struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
 
return pfn_to_page(pfn);
 }
+
+int sev_get_tdp_max_page_level(struct kvm_vcpu *vcpu, gpa_t gpa, int max_level)
+{
+   rmpentry_t *e;
+   kvm_pfn_t pfn;
+   int level;
+
+   if (!sev_snp_guest(vcpu->kvm))
+   return max_level;
+
+   pfn = gfn_to_pfn(vcpu->kvm, gpa_to_gfn(gpa));
+   if (is_error_noslot_pfn(pfn))
+   return max_level;
+
+   e = lookup_page_in_rmptable(pfn_to_page(pfn), );
+   if (unlikely(!e))
+   return max_level;
+
+   return min_t(uint32_t, level, max_level);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 72fc1bd8737c..73259a3564eb 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4563,6 +4563,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.vcpu_deliver_sipi_vector = svm_vcpu_deliver_sipi_vector,
 
.alloc_apic_backing_page = svm_alloc_apic_backing_page,
+   .get_tdp_max_page_level = sev_get_tdp_max_page_level,
 };
 
 static struct kvm_x86_init_ops svm_init_ops __initdata = {
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 97efdca498ed..9b095f8fc0cf 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -606,6 +606,7 @@ void sev_es_vcpu_put(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
 void sev_snp_init_vmcb(struct vcpu_svm *svm);
+int sev_get_tdp_max_page_level(struct kvm_vcpu *vcpu, gpa_t gpa, int 
max_level);
 
 /* vmenter.S */
 
d

[RFC Part2 PATCH 19/30] KVM: SVM: Reclaim the guest pages when SEV-SNP VM terminates

2021-03-24 Thread Brijesh Singh
The guest pages of an SEV-SNP VM may be added as private pages in the RMP
table (assigned bit is set). While terminating the guest, we must unassign
those pages so that they are transitioned back to the hypervisor state
before they can be freed.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm/sev.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1a0c8c95d178..4037430b8d56 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1517,6 +1517,47 @@ find_enc_region(struct kvm *kvm, struct kvm_enc_region 
*range)
 static void __unregister_enc_region_locked(struct kvm *kvm,
   struct enc_region *region)
 {
+   struct rmpupdate val = {};
+   unsigned long i, pfn;
+   rmpentry_t *e;
+   int level, rc;
+
+   /*
+* On SEV-SNP, the guest memory pages are assigned in the RMP table. 
Un-assigned them
+* before releasing the memory.
+*/
+   if (sev_snp_guest(kvm)) {
+   for (i = 0; i < region->npages; i++) {
+   pfn = page_to_pfn(region->pages[i]);
+
+   if (need_resched())
+   schedule();
+
+   e = lookup_page_in_rmptable(region->pages[i], );
+   if (!e) {
+   pr_err("SEV-SNP: failed to read RMP entry (pfn 
0x%lx\n", pfn);
+   continue;
+   }
+
+   /* If its not a guest assigned page then skip it */
+   if (!rmpentry_assigned(e))
+   continue;
+
+   /* Is the page part of a 2MB RMP entry? */
+   if (level == PG_LEVEL_2M) {
+   val.pagesize = RMP_PG_SIZE_2M;
+   pfn &= ~(KVM_PAGES_PER_HPAGE(PG_LEVEL_2M) - 1);
+   } else {
+   val.pagesize = RMP_PG_SIZE_4K;
+   }
+
+   /* Transition the page to hypervisor owned. */
+   rc = rmptable_rmpupdate(pfn_to_page(pfn), );
+   if (rc)
+   pr_err("SEV-SNP: failed to release pfn 0x%lx 
ret=%d\n", pfn, rc);
+   }
+   }
+
sev_unpin_memory(kvm, region->pages, region->npages);
list_del(>list);
kfree(region);
-- 
2.17.1



[RFC Part2 PATCH 18/30] KVM: SVM: add KVM_SEV_SNP_LAUNCH_UPDATE command

2021-03-24 Thread Brijesh Singh
The KVM_SEV_SNP_LAUNCH_UPDATE command can be used to insert data into the
guest's memory. The data is encrypted with the cryptographic context
created with KVM_SEV_SNP_LAUNCH_START.

In addition to inserting data, it can insert two special pages
into the guest's memory: the secrets page and the CPUID page.

For more information, see the SEV-SNP spec sections 4.5 and 8.12.
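
For illustration only (not part of this patch), a VMM-side call looks roughly
like the sketch below; vm_fd, sev_fd, guest_mem, len and page_type are
placeholders, and the field names follow the handler in this patch.

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical VMM helper; assumes the uAPI additions from this series. */
static int snp_launch_update(int vm_fd, int sev_fd, void *guest_mem,
			     __u64 len, __u8 page_type)
{
	struct kvm_sev_snp_launch_update update = {
		.uaddr     = (__u64)(unsigned long)guest_mem, /* page-aligned HVA */
		.len       = len,
		.page_type = page_type, /* e.g. normal, secrets or CPUID page */
	};
	struct kvm_sev_cmd cmd = {
		.id     = KVM_SEV_SNP_LAUNCH_UPDATE,
		.data   = (__u64)(unsigned long)&update,
		.sev_fd = sev_fd,
	};

	return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}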

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm/sev.c   | 136 +++
 include/uapi/linux/kvm.h |  18 ++
 2 files changed, 154 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 7652e57f7e01..1a0c8c95d178 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1247,6 +1247,139 @@ static int snp_launch_start(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return rc;
 }
 
+static struct kvm_memory_slot *hva_to_memslot(struct kvm *kvm, unsigned long 
hva)
+{
+   struct kvm_memslots *slots = kvm_memslots(kvm);
+   struct kvm_memory_slot *memslot;
+
+   kvm_for_each_memslot(memslot, slots) {
+   if (hva >= memslot->userspace_addr &&
+   hva < memslot->userspace_addr + (memslot->npages << 
PAGE_SHIFT))
+   return memslot;
+   }
+
+   return NULL;
+}
+
+static bool hva_to_gpa(struct kvm *kvm, unsigned long hva, gpa_t *gpa)
+{
+   struct kvm_memory_slot *memslot;
+   gpa_t gpa_offset;
+
+   memslot = hva_to_memslot(kvm, hva);
+   if (!memslot)
+   return false;
+
+   gpa_offset = hva - memslot->userspace_addr;
+   *gpa = ((memslot->base_gfn << PAGE_SHIFT) + gpa_offset);
+
+   return true;
+}
+
+static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   unsigned long npages, vaddr, vaddr_end, i, next_vaddr;
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
+   struct sev_data_snp_launch_update *data;
+   struct kvm_sev_snp_launch_update params;
+   int *error = >error;
+   struct kvm_vcpu *vcpu;
+   struct page **inpages;
+   struct rmpupdate e;
+   int ret;
+
+   if (!sev_snp_guest(kvm))
+   return -ENOTTY;
+
+   if (!sev->snp_context)
+   return -EINVAL;
+
+   if (copy_from_user(, (void __user *)(uintptr_t)argp->data, 
sizeof(params)))
+   return -EFAULT;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return -ENOMEM;
+
+   data->gctx_paddr = __sme_page_pa(sev->snp_context);
+   data->vmpl1_perms = 0xf;
+   data->vmpl2_perms = 0xf;
+   data->vmpl3_perms = 0xf;
+
+   /* Lock the user memory. */
+   inpages = sev_pin_memory(kvm, params.uaddr, params.len, , 1);
+   if (!inpages) {
+   ret = -ENOMEM;
+   goto e_free;
+   }
+
+   vcpu = kvm_get_vcpu(kvm, 0);
+   vaddr = params.uaddr;
+   vaddr_end = vaddr + params.len;
+
+   for (i = 0; vaddr < vaddr_end; vaddr = next_vaddr, i++) {
+   unsigned long psize, pmask;
+   int level = PG_LEVEL_4K;
+   gpa_t gpa;
+
+   if (!hva_to_gpa(kvm, vaddr, )) {
+   ret = -EINVAL;
+   goto e_unpin;
+   }
+
+   psize = page_level_size(level);
+   pmask = page_level_mask(level);
+   gpa = gpa & pmask;
+
+   /* Transition the page state to pre-guest */
+   memset(, 0, sizeof(e));
+   e.assigned = 1;
+   e.gpa = gpa;
+   e.asid = sev_get_asid(kvm);
+   e.immutable = true;
+   e.pagesize = X86_RMP_PG_LEVEL(level);
+   ret = rmptable_rmpupdate(inpages[i], );
+   if (ret) {
+   ret = -EFAULT;
+   goto e_unpin;
+   }
+
+   data->address = __sme_page_pa(inpages[i]);
+   data->page_size = e.pagesize;
+   data->page_type = params.page_type;
+   ret = __sev_issue_cmd(argp->sev_fd, SEV_CMD_SNP_LAUNCH_UPDATE, 
data, error);
+   if (ret) {
+   snp_page_reclaim(inpages[i], e.pagesize);
+   goto e_unpin;
+   }
+
+   next_vaddr = (vaddr & pmask) + psize;
+   }
+
+e_unpin:
+   /* Content of memory is updated, mark pages dirty */
+   memset(, 0, sizeof(e));
+   for (i = 0; i < npages; i++) {
+   set_page_dirty_lock(in

[RFC Part2 PATCH 16/30] KVM: SVM: add KVM_SNP_INIT command

2021-03-24 Thread Brijesh Singh
The KVM_SNP_INIT command is used by the hypervisor to initialize the
SEV-SNP platform context. In a typical workflow, this command should be the
first command issued. When creating an SEV-SNP guest, the VMM must use this
command instead of KVM_SEV_INIT or KVM_SEV_ES_INIT.
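
For illustration only (not part of this patch), the VMM side is the usual
KVM_MEMORY_ENCRYPT_OP pattern; vm_fd is the VM file descriptor and sev_fd an
open /dev/sev fd.

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical VMM helper; assumes the uAPI additions from this series. */
static int snp_init(int vm_fd, int sev_fd)
{
	struct kvm_sev_cmd cmd = {
		.id     = KVM_SEV_SNP_INIT,
		.sev_fd = sev_fd,
	};

	/* Must be the first SEV command issued for an SEV-SNP guest. */
	return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}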

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm/sev.c   | 41 ++--
 arch/x86/kvm/svm/svm.c   |  5 +
 arch/x86/kvm/svm/svm.h   |  1 +
 include/uapi/linux/kvm.h |  3 +++
 4 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 4d5be5d2b05c..36042a2b19b3 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -189,7 +189,10 @@ static int sev_guest_init(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
if (asid < 0)
return ret;
 
-   ret = sev_platform_init(>error);
+   if (sev->snp_active)
+   ret = sev_snp_init(>error);
+   else
+   ret = sev_platform_init(>error);
if (ret)
goto e_free;
 
@@ -206,12 +209,19 @@ static int sev_guest_init(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
 
 static int sev_es_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 {
+   int ret;
+
if (!sev_es)
return -ENOTTY;
 
+   /* Must be set so that sev_asid_new() allocates ASID from the ES ASID 
range. */
to_kvm_svm(kvm)->sev_info.es_active = true;
 
-   return sev_guest_init(kvm, argp);
+   ret = sev_guest_init(kvm, argp);
+   if (ret)
+   to_kvm_svm(kvm)->sev_info.es_active = false;
+
+   return ret;
 }
 
 static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
@@ -1042,6 +1052,23 @@ static int sev_launch_secret(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_snp_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
+   int rc;
+
+   if (!sev_snp)
+   return -ENOTTY;
+
+   rc = sev_es_guest_init(kvm, argp);
+   if (rc)
+   return rc;
+
+   sev->snp_active = true;
+
+   return 0;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -1092,6 +1119,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_LAUNCH_SECRET:
r = sev_launch_secret(kvm, _cmd);
break;
+   case KVM_SEV_SNP_INIT:
+   r = sev_snp_guest_init(kvm, _cmd);
+   break;
default:
r = -EINVAL;
goto out;
@@ -1955,6 +1985,13 @@ int sev_es_string_io(struct vcpu_svm *svm, int size, 
unsigned int port, int in)
svm->ghcb_sa, svm->ghcb_sa_len, in);
 }
 
+void sev_snp_init_vmcb(struct vcpu_svm *svm)
+{
+   struct vmcb_save_area *save = >vmcb->save;
+
+   save->sev_features |= SVM_SEV_FEATURES_SNP_ACTIVE;
+}
+
 void sev_es_init_vmcb(struct vcpu_svm *svm)
 {
struct kvm_vcpu *vcpu = >vcpu;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 13df2cbfc361..72fc1bd8737c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1281,6 +1281,11 @@ static void init_vmcb(struct vcpu_svm *svm)
/* Perform SEV-ES specific VMCB updates */
sev_es_init_vmcb(svm);
}
+
+   if (sev_snp_guest(svm->vcpu.kvm)) {
+   /* Perform SEV-SNP specific VMCB Updates */
+   sev_snp_init_vmcb(svm);
+   }
}
 
vmcb_mark_all_dirty(svm->vmcb);
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 9e8cd39bd703..9d41735699c6 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -604,6 +604,7 @@ void sev_es_vcpu_load(struct vcpu_svm *svm, int cpu);
 void sev_es_vcpu_put(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu);
+void sev_snp_init_vmcb(struct vcpu_svm *svm);
 
 /* vmenter.S */
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 374c67875cdb..e0e7dd71a863 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1594,6 +1594,9 @@ enum sev_cmd_id {
/* Guest certificates commands */
KVM_SEV_CERT_EXPORT,
 
+   /* SNP specific commands */
+   KVM_SEV_SNP_INIT,
+
KVM_SEV_NR_MAX,
 };
 
-- 
2.17.1



[RFC Part2 PATCH 14/30] KVM: SVM: make AVIC backing, VMSA and VMCB memory allocation SNP safe

2021-03-24 Thread Brijesh Singh
When SEV-SNP is globally enabled on a system, the VMRUN instruction
performs additional security checks on the AVIC backing, VMSA, and VMCB
pages. On a successful VMRUN, these pages are marked "in-use" by the
hardware in the RMP entry, and any attempt to modify the RMP entry for
these pages will result in a page fault (RMP violation check).

While performing the RMP check, hardware will try to create a 2MB TLB
entry for large page accesses. When it does this, it first reads the RMP
entry for the base of the 2MB region and verifies that all of this memory
is safe. If the AVIC backing, VMSA, or VMCB memory happens to be at the
base of the 2MB region, then the RMP check will fail because of the
"in-use" marking on the base entry of this 2MB region.

e.g.

1. A VMCB was allocated at a 2MB-aligned address.
2. The VMRUN instruction marks this RMP entry as "in-use".
3. Another process allocated some other page of memory that happened to be
   within the same 2MB region.
4. That process tried to write to its page through the physmap.

If the physmap entry in step #4 uses a large (1G/2M) page, then the
hardware will attempt to create a 2M TLB entry. The hardware will find
that the "in-use" bit is set in the RMP entry (because it was a
VMCB page) and will trigger an RMP violation check.

See APM2 section 15.36.12 for more information on VMRUN checks when
SEV-SNP is globally active.

A generic allocator can return a page which is 2MB aligned and which will
not be safe to use when SEV-SNP is globally enabled. Add a
snp_safe_alloc_page() helper that can be used for allocating SNP-safe
memory. The helper allocates an order-1 page (two contiguous pages),
splits it into two order-0 pages, frees one of them, and keeps the page
that is not 2MB aligned.
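
A concrete instance of the problem and the fix, with made-up addresses
(illustrative only):

/*
 *   - The allocator returns PA 0x200000 for the VMCB (2MB aligned); VMRUN
 *     marks its RMP entry "in-use".
 *   - Another page inside [0x200000, 0x400000) is later written through a
 *     2MB physmap mapping; the page walk checks the RMP entry at the 2MB
 *     base (0x200000), sees "in-use", and faults.
 *
 * snp_safe_alloc_page() below avoids this by allocating two contiguous
 * pages and returning whichever one is *not* 2MB aligned, so the returned
 * page can never be the base entry of a 2MB region.
 */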

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Co-developed-by: Marc Orr 
Signed-off-by: Marc Orr 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/lapic.c|  5 -
 arch/x86/kvm/svm/sev.c  | 27 +++
 arch/x86/kvm/svm/svm.c  | 16 ++--
 arch/x86/kvm/svm/svm.h  |  1 +
 5 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3d6616f6f6ef..ccd5f8090ff6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1301,6 +1301,7 @@ struct kvm_x86_ops {
int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err);
 
void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector);
+   void *(*alloc_apic_backing_page)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_x86_nested_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 43cceadd073e..7e0151838273 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2425,7 +2425,10 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int 
timer_advance_ns)
 
vcpu->arch.apic = apic;
 
-   apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+   if (kvm_x86_ops.alloc_apic_backing_page)
+   apic->regs = kvm_x86_ops.alloc_apic_backing_page(vcpu);
+   else
+   apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
if (!apic->regs) {
printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
   vcpu->vcpu_id);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b720837faf5a..4d5be5d2b05c 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2079,3 +2079,30 @@ void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, 
u8 vector)
 */
ghcb_set_sw_exit_info_2(svm->ghcb, 1);
 }
+
+struct page *snp_safe_alloc_page(struct kvm_vcpu *vcpu)
+{
+   unsigned long pfn;
+   struct page *p;
+
+   if (!snp_key_active())
+   return alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+
+   p = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, 1);
+   if (!p)
+   return NULL;
+
+   /* split the page order */
+   split_page(p, 1);
+
+   /* Find a non-2M aligned page */
+   pfn = page_to_pfn(p);
+   if (IS_ALIGNED(__pfn_to_phys(pfn), PMD_SIZE)) {
+   pfn++;
+   __free_page(p);
+   } else {
+   __free_page(pfn_to_page(pfn + 1));
+   }
+
+   return pfn_to_page(pfn);
+}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index aa7ff4685c87..13df2cbfc361 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1324,7 +1324,7 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
svm = to_svm(vcpu);
 
err = -ENOMEM;
-   vmcb_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+   

[RFC Part2 PATCH 15/30] KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area

2021-03-24 Thread Brijesh Singh
The hypervisor uses the SEV_FEATURES field (offset 3B0h) in the Save State
Area to control the SEV-SNP guest features such as SNPActive, vTOM,
ReflectVC, etc. An SEV-SNP guest can read the SEV_FEATURES field through
the SEV_STATUS MSR.

See APM2 Table 15-34 and B-4 for more details.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: Vitaly Kuznetsov 
Cc: Wanpeng Li 
Cc: Jim Mattson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/svm.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1c561945b426..c38783a1d24f 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -212,6 +212,15 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define SVM_NESTED_CTL_SEV_ENABLE  BIT(1)
 #define SVM_NESTED_CTL_SEV_ES_ENABLE   BIT(2)
 
+#define SVM_SEV_FEATURES_SNP_ACTIVEBIT(0)
+#define SVM_SEV_FEATURES_VTOM  BIT(1)
+#define SVM_SEV_FEATURES_REFLECT_VCBIT(2)
+#define SVM_SEV_FEATURES_RESTRICTED_INJECTION  BIT(3)
+#define SVM_SEV_FEATURES_ALTERNATE_INJECTION   BIT(4)
+#define SVM_SEV_FEATURES_DEBUG_SWAPBIT(5)
+#define SVM_SEV_FEATURES_PREVENT_HOST_IBS  BIT(6)
+#define SVM_SEV_FEATURES_BTB_ISOLATION BIT(7)
+
 struct vmcb_seg {
u16 selector;
u16 attrib;
@@ -293,7 +302,8 @@ struct vmcb_save_area {
u64 sw_exit_info_1;
u64 sw_exit_info_2;
u64 sw_scratch;
-   u8 reserved_11[56];
+   u64 sev_features;
+   u8 reserved_11[48];
u64 xcr0;
u8 valid_bitmap[16];
u64 x87_state_gpa;
-- 
2.17.1



[RFC Part2 PATCH 09/30] crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP

2021-03-24 Thread Brijesh Singh
Before SNP VMs can be launched, the platform must be appropriately
configured and initialized. Platform initialization is accomplished via
the SNP_INIT command.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 drivers/crypto/ccp/sev-dev.c | 164 ++-
 drivers/crypto/ccp/sev-dev.h |   2 +
 include/linux/psp-sev.h  |  16 
 3 files changed, 179 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 8a9fd843ad9e..c983a8b040c3 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -21,8 +21,10 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
+#include 
 
 #include "psp-dev.h"
 #include "sev-dev.h"
@@ -30,6 +32,7 @@
 #define DEVICE_NAME"sev"
 #define SEV_FW_FILE"amd/sev.fw"
 #define SEV_FW_NAME_SIZE   64
+#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
 
 static DEFINE_MUTEX(sev_cmd_mutex);
 static struct sev_misc_dev *misc_dev;
@@ -574,6 +577,93 @@ static int sev_update_firmware(struct device *dev)
return ret;
 }
 
+static void snp_set_hsave_pa(void *arg)
+{
+   wrmsrl(MSR_VM_HSAVE_PA, 0);
+}
+
+static int __sev_snp_init_locked(int *error)
+{
+   struct psp_device *psp = psp_master;
+   struct sev_device *sev;
+   int rc = 0;
+
+   if (!psp || !psp->sev_data)
+   return -ENODEV;
+
+   sev = psp->sev_data;
+
+   if (sev->snp_inited && sev->state >= SEV_STATE_INIT)
+   return 0;
+
+   if (!snp_key_active()) {
+   dev_notice(sev->dev, "SNP is not enabled\n");
+   return -ENODEV;
+   }
+
+   /* SNP_INIT requires the MSR_VM_HSAVE_PA must be set to 0h across all 
cores. */
+   on_each_cpu(snp_set_hsave_pa, NULL, 1);
+
+   /* Prepare for first SEV guest launch after INIT */
+   wbinvd_on_all_cpus();
+
+   /* Issue the SNP_INIT firmware command. */
+   rc = __sev_do_cmd_locked(SEV_CMD_SNP_INIT, NULL, error);
+   if (rc)
+   return rc;
+
+   sev->snp_inited = true;
+   sev->state = SEV_STATE_INIT;
+   dev_dbg(sev->dev, "SEV-SNP firmware initialized\n");
+
+   return rc;
+}
+
+int sev_snp_init(int *error)
+{
+   int rc;
+
+   mutex_lock(_cmd_mutex);
+   rc = __sev_snp_init_locked(error);
+   mutex_unlock(_cmd_mutex);
+
+   return rc;
+}
+EXPORT_SYMBOL_GPL(sev_snp_init);
+
+static int __sev_snp_shutdown_locked(int *error)
+{
+   struct sev_device *sev = psp_master->sev_data;
+   int ret;
+
+   ret = __sev_do_cmd_locked(SEV_CMD_SNP_SHUTDOWN, NULL, error);
+   if (ret)
+   return ret;
+
+   wbinvd_on_all_cpus();
+
+   ret = __sev_do_cmd_locked(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+   if (ret)
+   dev_err(sev->dev, "SEV-SNP firmware DF_FLUSH failed\n");
+
+   sev->snp_inited = false;
+   sev->state = SEV_STATE_UNINIT;
+   dev_dbg(sev->dev, "SEV-SNP firmware shutdown\n");
+
+   return ret;
+}
+
+static int sev_snp_shutdown(int *error)
+{
+   int rc;
+
+   mutex_lock(_cmd_mutex);
+   rc = __sev_snp_shutdown_locked(NULL);
+   mutex_unlock(_cmd_mutex);
+
+   return rc;
+}
+
 static int sev_ioctl_do_pek_import(struct sev_issue_cmd *argp, bool writable)
 {
struct sev_device *sev = psp_master->sev_data;
@@ -1043,6 +1133,42 @@ int sev_issue_cmd_external_user(struct file *filep, 
unsigned int cmd,
 }
 EXPORT_SYMBOL_GPL(sev_issue_cmd_external_user);
 
+static int sev_snp_firmware_state(void)
+{
+   struct sev_data_snp_platform_status_buf *buf = NULL;
+   struct page *status_page = NULL;
+   int state = SEV_STATE_UNINIT;
+   int rc, error;
+
+   status_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+   if (!status_page)
+   return -ENOMEM;
+
+   buf = kzalloc(sizeof(*buf), GFP_KERNEL_ACCOUNT);
+   if (!buf) {
+   __free_page(status_page);
+   return -ENOMEM;
+   }
+
+   buf->status_paddr = __sme_page_pa(status_page);
+   rc = sev_do_cmd(SEV_CMD_SNP_PLATFORM_STATUS, buf, );
+
+   /*
+* The status buffer is allocated as a hypervisor page. As per the SEV 
spec,
+* if the firmware is in INIT state then status buffer must be either a
+* the firmware page or the default page. Since our status buffer is in
+* the hypervisor page, so, if firmware is in INIT state then we should
+* fail with INVALID_PAGE_STATE.
+*/
+   if (rc && error == S

[RFC Part2 PATCH 13/30] KVM: SVM: add initial SEV-SNP support

2021-03-24 Thread Brijesh Singh
The next generation of SEV is called SEV-SNP (Secure Nested Paging).
SEV-SNP builds upon existing SEV and SEV-ES functionality while adding new
hardware-based security protection. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks such as data
replay, memory re-mapping, and more, creating an isolated execution
environment.

The SNP feature can be enabled in KVM by passing the sev-snp module
parameter.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm/sev.c | 17 +
 arch/x86/kvm/svm/svm.c |  5 +
 arch/x86/kvm/svm/svm.h | 13 +
 3 files changed, 35 insertions(+)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 48017fef1cd9..b720837faf5a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -19,6 +19,7 @@
 #include 
 
 #include 
+#include 
 
 #include "x86.h"
 #include "svm.h"
@@ -1249,6 +1250,7 @@ void sev_vm_destroy(struct kvm *kvm)
 void __init sev_hardware_setup(void)
 {
unsigned int eax, ebx, ecx, edx;
+   bool sev_snp_supported = false;
bool sev_es_supported = false;
bool sev_supported = false;
 
@@ -1298,9 +1300,24 @@ void __init sev_hardware_setup(void)
pr_info("SEV-ES supported: %u ASIDs\n", min_sev_asid - 1);
sev_es_supported = true;
 
+   /* SEV-SNP support requested? */
+   if (!sev_snp)
+   goto out;
+
+   /* Does the CPU support SEV-SNP? */
+   if (!boot_cpu_has(X86_FEATURE_SEV_SNP))
+   goto out;
+
+   if (!snp_key_active())
+   goto out;
+
+   pr_info("SEV-SNP supported: %u ASIDs\n", min_sev_asid - 1);
+   sev_snp_supported = true;
+
 out:
sev = sev_supported;
sev_es = sev_es_supported;
+   sev_snp = sev_snp_supported;
 }
 
 void sev_hardware_teardown(void)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3442d44ca53b..aa7ff4685c87 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -197,6 +197,10 @@ module_param(sev, int, 0444);
 int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
 module_param(sev_es, int, 0444);
 
+/* enable/disable SEV-SNP support */
+int sev_snp = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
+module_param(sev_snp, int, 0444);
+
 bool __read_mostly dump_invalid_vmcb;
 module_param(dump_invalid_vmcb, bool, 0644);
 
@@ -986,6 +990,7 @@ static __init int svm_hardware_setup(void)
} else {
sev = false;
sev_es = false;
+   sev_snp = false;
}
 
svm_adjust_mmio_mask();
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6e7d070f8b86..3dd60d2a567a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -73,6 +73,7 @@ enum {
 struct kvm_sev_info {
bool active;/* SEV enabled guest */
bool es_active; /* SEV-ES enabled guest */
+   bool snp_active;/* SEV-SNP enabled guest */
unsigned int asid;  /* ASID used for this guest */
unsigned int handle;/* SEV firmware handle */
int fd; /* SEV device fd */
@@ -241,6 +242,17 @@ static inline bool sev_es_guest(struct kvm *kvm)
 #endif
 }
 
+static inline bool sev_snp_guest(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_AMD_SEV
+   struct kvm_sev_info *sev = _kvm_svm(kvm)->sev_info;
+
+   return sev_es_guest(kvm) && sev->snp_active;
+#else
+   return false;
+#endif
+}
+
 static inline void vmcb_mark_all_dirty(struct vmcb *vmcb)
 {
vmcb->control.clean = 0;
@@ -407,6 +419,7 @@ static inline bool gif_set(struct vcpu_svm *svm)
 
 extern int sev;
 extern int sev_es;
+extern int sev_snp;
 extern bool dump_invalid_vmcb;
 
 u32 svm_msrpm_offset(u32 msr);
-- 
2.17.1



[RFC Part2 PATCH 10/30] crypto: ccp: shutdown SNP firmware on kexec

2021-03-24 Thread Brijesh Singh
When the kernel is getting ready to kexec, it calls device_shutdown() to
allow drivers to clean up before the kexec. If the SEV firmware is
initialized, shut it down before kexec'ing into the new kernel.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 drivers/crypto/ccp/sev-dev.c | 18 --
 drivers/crypto/ccp/sp-pci.c  | 12 
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index c983a8b040c3..562501c43d8f 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1110,6 +1110,15 @@ int sev_dev_init(struct psp_device *psp)
return ret;
 }
 
+static void sev_firmware_shutdown(void)
+{
+   if (boot_cpu_has(X86_FEATURE_SEV))
+   sev_platform_shutdown(NULL);
+
+   if (boot_cpu_has(X86_FEATURE_SEV_SNP))
+   sev_snp_shutdown(NULL);
+}
+
 void sev_dev_destroy(struct psp_device *psp)
 {
struct sev_device *sev = psp->sev_data;
@@ -1117,6 +1126,8 @@ void sev_dev_destroy(struct psp_device *psp)
if (!sev)
return;
 
+   sev_firmware_shutdown();
+
if (sev->misc)
kref_put(_dev->refcount, sev_exit);
 
@@ -1272,12 +1283,7 @@ void sev_pci_exit(void)
if (!psp_master->sev_data)
return;
 
-   if (boot_cpu_has(X86_FEATURE_SEV))
-   sev_platform_shutdown(NULL);
-
-   if (boot_cpu_has(X86_FEATURE_SEV_SNP))
-   sev_snp_shutdown(NULL);
-
+   sev_firmware_shutdown();
 
if (sev_es_tmr) {
/* The TMR area was encrypted, flush it from the cache */
diff --git a/drivers/crypto/ccp/sp-pci.c b/drivers/crypto/ccp/sp-pci.c
index f471dbaef1fb..9210bfda91a2 100644
--- a/drivers/crypto/ccp/sp-pci.c
+++ b/drivers/crypto/ccp/sp-pci.c
@@ -239,6 +239,17 @@ static int sp_pci_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
return ret;
 }
 
+static void sp_pci_shutdown(struct pci_dev *pdev)
+{
+   struct device *dev = >dev;
+   struct sp_device *sp = dev_get_drvdata(dev);
+
+   if (!sp)
+   return;
+
+   sp_destroy(sp);
+}
+
 static void sp_pci_remove(struct pci_dev *pdev)
 {
struct device *dev = >dev;
@@ -368,6 +379,7 @@ static struct pci_driver sp_pci_driver = {
.id_table = sp_pci_table,
.probe = sp_pci_probe,
.remove = sp_pci_remove,
+   .shutdown = sp_pci_shutdown,
.driver.pm = _pci_pm_ops,
 };
 
-- 
2.17.1



[RFC Part2 PATCH 11/30] crypto:ccp: provide APIs to issue SEV-SNP commands

2021-03-24 Thread Brijesh Singh
Provide the APIs for the hypervisor to manage an SEV-SNP guest. The
SEV-SNP commands are defined in the SEV-SNP firmware specification.
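
For illustration only (not part of this patch), a typical in-kernel caller
reclaims a firmware-state page before returning it to the page allocator;
'page' is assumed to be a struct page previously transitioned to the
firmware state.

/* Hypothetical caller sketch. */
static void reclaim_firmware_page(struct page *page)
{
	struct sev_data_snp_page_reclaim data = {};
	int rc, error;

	data.paddr = __sme_pa(page_address(page));
	rc = sev_snp_reclaim(&data, &error);
	if (rc)
		pr_err("SEV-SNP: PAGE_RECLAIM failed, rc=%d fw_error=%d\n", rc, error);
}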

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 drivers/crypto/ccp/sev-dev.c | 41 +
 include/linux/psp-sev.h  | 85 
 2 files changed, 126 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 562501c43d8f..242c4775eb56 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -1019,6 +1019,47 @@ int sev_guest_df_flush(int *error)
 }
 EXPORT_SYMBOL_GPL(sev_guest_df_flush);
 
+int sev_guest_snp_decommission(struct sev_data_snp_decommission *data, int 
*error)
+{
+   return sev_do_cmd(SEV_CMD_SNP_DECOMMISSION, data, error);
+}
+EXPORT_SYMBOL_GPL(sev_guest_snp_decommission);
+
+int sev_guest_snp_df_flush(int *error)
+{
+   return sev_do_cmd(SEV_CMD_SNP_DF_FLUSH, NULL, error);
+}
+EXPORT_SYMBOL_GPL(sev_guest_snp_df_flush);
+
+int sev_snp_reclaim(struct sev_data_snp_page_reclaim *data, int *error)
+{
+   return sev_do_cmd(SEV_CMD_SNP_PAGE_RECLAIM, data, error);
+}
+EXPORT_SYMBOL_GPL(sev_snp_reclaim);
+
+int sev_snp_unsmash(unsigned long paddr, int *error)
+{
+   struct sev_data_snp_page_unsmash *data;
+   int rc;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return -ENOMEM;
+
+   data->paddr = paddr;
+   rc = sev_do_cmd(SEV_CMD_SNP_PAGE_UNSMASH, data, error);
+
+   kfree(data);
+   return rc;
+}
+EXPORT_SYMBOL_GPL(sev_snp_unsmash);
+
+int sev_guest_snp_dbg_decrypt(struct sev_data_snp_dbg *data, int *error)
+{
+   return sev_do_cmd(SEV_CMD_SNP_DBG_DECRYPT, data, error);
+}
+EXPORT_SYMBOL_GPL(sev_guest_snp_dbg_decrypt);
+
 static void sev_exit(struct kref *ref)
 {
misc_deregister(_dev->misc);
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index ec45c18c3b0a..32532df37446 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -821,6 +821,80 @@ int sev_guest_df_flush(int *error);
  */
 int sev_guest_decommission(struct sev_data_decommission *data, int *error);
 
+/**
+ * sev_guest_snp_df_flush - perform SNP DF_FLUSH command
+ *
+ * @sev_ret: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEVif the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO   if the sev returned a non-zero return code
+ */
+int sev_guest_snp_df_flush(int *error);
+
+/**
+ * sev_guest_snp_decommission - perform SNP_DECOMMISSION command
+ *
+ * @decommission: sev_data_decommission structure to be processed
+ * @sev_ret: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEVif the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO   if the sev returned a non-zero return code
+ */
+int sev_guest_snp_decommission(struct sev_data_snp_decommission *data, int 
*error);
+
+/**
+ * sev_snp_reclaim - perform SNP_PAGE_RECLAIM command
+ *
+ * @decommission: sev_snp_page_reclaim structure to be processed
+ * @sev_ret: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEVif the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO   if the sev returned a non-zero return code
+ */
+int sev_snp_reclaim(struct sev_data_snp_page_reclaim *data, int *error);
+
+/**
+ * sev_snp_unsmash - perform SNP_PAGE_UNSMASH command
+ *
+ * @decommission: sev_snp_page_unmash structure to be processed
+ * @sev_ret: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEVif the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO   if the sev returned a non-zero return code
+ */
+int sev_snp_unsmash(unsigned long paddr, int *error);
+
+/**
+ * sev_guest_snp_dbg_decrypt - perform SEV SNP_DBG_DECRYPT command
+ *
+ * @sev_ret: sev command return code
+ *
+ * Returns:
+ * 0 if the sev successfully processed the command
+ * -%ENODEVif the sev device is not available
+ * -%ENOTSUPP  if the sev does not support SEV
+ * -%ETIMEDOUT if the sev command timed out
+ * -%EIO   if the sev returned a non-zero return code
+ */
+int sev_guest_snp_dbg_decrypt(struct sev_data_snp_dbg *data, int *error);
+
+
 void *psp_copy_user_blob(u64 uaddr, u32 len);
 
 #else  /*

[RFC Part2 PATCH 12/30] crypto ccp: handle the legacy SEV command when SNP is enabled

2021-03-24 Thread Brijesh Singh
The behavior of the SEV-legacy commands is altered when the SNP firmware
is in the INIT state. When SNP is in the INIT state, all memory that an
SEV-legacy command causes the firmware to write to must be in the firmware
state before the command is issued.

See SEV-SNP spec section 5.3.7 for more detail.
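
For illustration only, a rough sketch of the flow implemented below:

/*
 *	cmd_page = alloc_page();			// hypervisor scratch page
 *	memcpy(page_address(cmd_page), data, len);	// stage the command buffer
 *	rmpupdate(cmd_page -> firmware state);		// firmware may now write it
 *	... issue the legacy SEV command using cmd_page's physical address ...
 *	rmpupdate(cmd_page -> hypervisor state);	// take the page back
 *	memcpy(data, page_address(cmd_page), len);	// copy results to the caller
 */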

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 drivers/crypto/ccp/sev-dev.c | 90 +---
 drivers/crypto/ccp/sev-dev.h |  1 +
 2 files changed, 85 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 242c4775eb56..4aa9d4505d71 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -148,12 +148,35 @@ static int sev_cmd_buffer_len(int cmd)
return 0;
 }
 
+static bool sev_legacy_cmd_buf_writable(int cmd)
+{
+   switch (cmd) {
+   case SEV_CMD_PLATFORM_STATUS:
+   case SEV_CMD_GUEST_STATUS:
+   case SEV_CMD_LAUNCH_START:
+   case SEV_CMD_RECEIVE_START:
+   case SEV_CMD_LAUNCH_MEASURE:
+   case SEV_CMD_SEND_START:
+   case SEV_CMD_SEND_UPDATE_DATA:
+   case SEV_CMD_SEND_UPDATE_VMSA:
+   case SEV_CMD_PEK_CSR:
+   case SEV_CMD_PDH_CERT_EXPORT:
+   case SEV_CMD_GET_ID:
+   return true;
+   default:
+   return false;
+   }
+}
+
 static int __sev_do_cmd_locked(int cmd, void *data, int *psp_ret)
 {
+   size_t cmd_buf_len = sev_cmd_buffer_len(cmd);
struct psp_device *psp = psp_master;
struct sev_device *sev;
unsigned int phys_lsb, phys_msb;
unsigned int reg, ret = 0;
+   struct page *cmd_page = NULL;
+   struct rmpupdate e = {};
 
if (!psp || !psp->sev_data)
return -ENODEV;
@@ -163,15 +186,47 @@ static int __sev_do_cmd_locked(int cmd, void *data, int 
*psp_ret)
 
sev = psp->sev_data;
 
+   /*
+* Check If SNP is initialized and we are asked to execute a legacy
+* command that requires write by the firmware in the command buffer.
+* In that case use an intermediate command buffer page to complete the
+* operation.
+*
+* NOTE: If the command buffer contains a pointer which will be modified
+* by the firmware then caller must take care of it.
+*/
+   if (sev->snp_inited && sev_legacy_cmd_buf_writable(cmd)) {
+   cmd_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+   if (!cmd_page)
+   return -ENOMEM;
+
+   memcpy(page_address(cmd_page), data, cmd_buf_len);
+
+   /* make it as a firmware page */
+   e.immutable = true;
+   e.assigned = true;
+   ret = rmptable_rmpupdate(cmd_page, );
+   if (ret) {
+   dev_err(sev->dev, "sev cmd id %#x, failed to change to 
firmware state (spa 0x%lx ret %d).\n",
+   cmd, page_to_pfn(cmd_page) << PAGE_SHIFT, ret);
+   goto e_free;
+   }
+   }
+
/* Get the physical address of the command buffer */
-   phys_lsb = data ? lower_32_bits(__psp_pa(data)) : 0;
-   phys_msb = data ? upper_32_bits(__psp_pa(data)) : 0;
+   if (cmd_page) {
+   phys_lsb = data ? lower_32_bits(__sme_page_pa(cmd_page)) : 0;
+   phys_msb = data ? upper_32_bits(__sme_page_pa(cmd_page)) : 0;
+   } else {
+   phys_lsb = data ? lower_32_bits(__psp_pa(data)) : 0;
+   phys_msb = data ? upper_32_bits(__psp_pa(data)) : 0;
+   }
 
dev_dbg(sev->dev, "sev command id %#x buffer 0x%08x%08x timeout %us\n",
cmd, phys_msb, phys_lsb, psp_timeout);
 
print_hex_dump_debug("(in):  ", DUMP_PREFIX_OFFSET, 16, 2, data,
-sev_cmd_buffer_len(cmd), false);
+cmd_buf_len, false);
 
iowrite32(phys_lsb, sev->io_regs + sev->vdata->cmdbuff_addr_lo_reg);
iowrite32(phys_msb, sev->io_regs + sev->vdata->cmdbuff_addr_hi_reg);
@@ -185,6 +240,24 @@ static int __sev_do_cmd_locked(int cmd, void *data, int 
*psp_ret)
 
/* wait for command completion */
ret = sev_wait_cmd_ioc(sev, , psp_timeout);
+
+   /* if an intermediate page is used then copy the data back to original. 
*/
+   if (cmd_page) {
+   int rc;
+
+   /* make it as a hypervisor page */
+   memset(, 0, sizeof(struct rmpupdate));
+   rc = rmptable_rmpupdate(cmd_page, );
+   if (rc) {
+   dev_err(sev->dev, "

[RFC Part2 PATCH 08/30] crypto:ccp: define the SEV-SNP commands

2021-03-24 Thread Brijesh Singh
AMD introduced the next generation of SEV called SEV-SNP (Secure Nested
Paging). SEV-SNP builds upon existing SEV and SEV-ES functionality
while adding new hardware security protection.

Define the commands and structures used to communicate with the AMD-SP
when creating and managing SEV-SNP guests. The SEV-SNP firmware spec
is available at developer.amd.com/sev.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 drivers/crypto/ccp/sev-dev.c |  11 ++
 include/linux/psp-sev.h  | 210 +++
 include/uapi/linux/psp-sev.h |  27 +
 3 files changed, 248 insertions(+)

diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
index 476113e12489..8a9fd843ad9e 100644
--- a/drivers/crypto/ccp/sev-dev.c
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -128,6 +128,17 @@ static int sev_cmd_buffer_len(int cmd)
case SEV_CMD_LAUNCH_UPDATE_SECRET:  return sizeof(struct 
sev_data_launch_secret);
case SEV_CMD_DOWNLOAD_FIRMWARE: return sizeof(struct 
sev_data_download_firmware);
case SEV_CMD_GET_ID:return sizeof(struct 
sev_data_get_id);
+   case SEV_CMD_SNP_GCTX_CREATE:   return sizeof(struct 
sev_data_snp_gctx_create);
+   case SEV_CMD_SNP_LAUNCH_START:  return sizeof(struct 
sev_data_snp_launch_start);
+   case SEV_CMD_SNP_LAUNCH_UPDATE: return sizeof(struct 
sev_data_snp_launch_update);
+   case SEV_CMD_SNP_ACTIVATE:  return sizeof(struct 
sev_data_snp_activate);
+   case SEV_CMD_SNP_DECOMMISSION:  return sizeof(struct 
sev_data_snp_decommission);
+   case SEV_CMD_SNP_PAGE_RECLAIM:  return sizeof(struct 
sev_data_snp_page_reclaim);
+   case SEV_CMD_SNP_GUEST_STATUS:  return sizeof(struct 
sev_data_snp_guest_status);
+   case SEV_CMD_SNP_LAUNCH_FINISH: return sizeof(struct 
sev_data_snp_launch_finish);
+   case SEV_CMD_SNP_PAGE_UNSMASH:  return sizeof(struct 
sev_data_snp_page_unsmash);
+   case SEV_CMD_SNP_PLATFORM_STATUS:   return sizeof(struct 
sev_data_snp_platform_status_buf);
+   case SEV_CMD_SNP_GUEST_REQUEST: return sizeof(struct 
sev_data_snp_guest_request);
default:return 0;
}
 
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index 49d155cd2dfe..df89f0207099 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -83,6 +83,34 @@ enum sev_cmd {
SEV_CMD_DBG_DECRYPT = 0x060,
SEV_CMD_DBG_ENCRYPT = 0x061,
 
+   /* SNP specific commands */
+   SEV_CMD_SNP_INIT= 0x81,
+   SEV_CMD_SNP_SHUTDOWN= 0x82,
+   SEV_CMD_SNP_PLATFORM_STATUS = 0x83,
+   SEV_CMD_SNP_DF_FLUSH= 0x84,
+   SEV_CMD_SNP_DOWNLOAD_FIRMWARE   = 0x85,
+   SEV_CMD_SNP_GET_ID  = 0x86,
+   SEV_CMD_SNP_DECOMMISSION= 0x90,
+   SEV_CMD_SNP_ACTIVATE= 0x91,
+   SEV_CMD_SNP_GUEST_STATUS= 0x92,
+   SEV_CMD_SNP_GCTX_CREATE = 0x93,
+   SEV_CMD_SNP_GUEST_REQUEST   = 0x94,
+   SEV_CMD_SNP_ACTIVATE_EX = 0x95,
+   SEV_CMD_SNP_LAUNCH_START= 0xA0,
+   SEV_CMD_SNP_LAUNCH_UPDATE   = 0xA1,
+   SEV_CMD_SNP_LAUNCH_FINISH   = 0xA2,
+   SEV_CMD_SNP_DBG_DECRYPT = 0xB0,
+   SEV_CMD_SNP_DBG_ENCRYT  = 0xB1,
+   SEV_CMD_SNP_PAGE_SWAP_OUT   = 0xC0,
+   SEV_CMD_SNP_PAGE_SWAP_IN= 0xC1,
+   SEV_CMD_SNP_PAGE_MOVE   = 0xC2,
+   SEV_CMD_SNP_PAGE_MD_INIT= 0xC3,
+   SEV_CMD_SNP_PAGE_MD_RECLAIM = 0xC4,
+   SEV_CMD_SNP_PAGE_RO_RECLAIM = 0xC5,
+   SEV_CMD_SNP_PAGE_RO_RESTORE = 0xC6,
+   SEV_CMD_SNP_PAGE_RECLAIM= 0xC7,
+   SEV_CMD_SNP_PAGE_UNSMASH= 0xC8,
+
SEV_CMD_MAX,
 };
 
@@ -483,6 +511,188 @@ struct sev_data_dbg {
u32 len;/* In */
 } __packed;
 
+/**
+ * struct sev_data_snp_platform_status_buf - SNP_PLATFORM_STATUS command params
+ *
+ * @address: physical address where the status should be copied
+ */
+struct sev_data_snp_platform_status_buf {
+   u64 status_paddr;   /* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_download_firmware - SNP_DOWNLOAD_FIRMWARE command params
+ *
+ * @address: physical address of firmware image
+ * @len: len of the firmware image
+ */
+struct sev_data_snp_download_firmware {
+   u64 address;/* In */
+   u32 len;/* In */
+} __packed;
+
+/**
+ * struct sev_data_snp_gctx_create - SNP_GCTX_C

[RFC Part2 PATCH 06/30] x86/fault: dump the RMP entry on #PF

2021-03-24 Thread Brijesh Singh
If hardware detects an RMP violation, it will raise a page-fault exception
with the RMP bit set. To help with debugging, dump the RMP entry of the faulting
address.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/fault.c | 75 +
 1 file changed, 75 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f39b551f89a6..7605e06a6dd9 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -31,6 +31,7 @@
 #include  /* VMALLOC_START, ...   */
 #include   /* kvm_handle_async_pf  */
 #include   /* fixup_vdso_exception()   */
+#include/* lookup_rmpentry ...  */
 
 #define CREATE_TRACE_POINTS
 #include 
@@ -147,6 +148,76 @@ is_prefetch(struct pt_regs *regs, unsigned long 
error_code, unsigned long addr)
 DEFINE_SPINLOCK(pgd_lock);
 LIST_HEAD(pgd_list);
 
+static void dump_rmpentry(struct page *page, rmpentry_t *e)
+{
+   unsigned long paddr = page_to_pfn(page) << PAGE_SHIFT;
+
+   pr_alert("RMPEntry paddr 0x%lx [assigned=%d immutable=%d pagesize=%d 
gpa=0x%lx asid=%d "
+   "vmsa=%d validated=%d]\n", paddr, rmpentry_assigned(e), 
rmpentry_immutable(e),
+   rmpentry_pagesize(e), rmpentry_gpa(e), rmpentry_asid(e), 
rmpentry_vmsa(e),
+   rmpentry_validated(e));
+   pr_alert("RMPEntry paddr 0x%lx %016llx %016llx\n", paddr, e->high, 
e->low);
+}
+
+static void show_rmpentry(unsigned long address)
+{
+   struct page *page = virt_to_page(address);
+   rmpentry_t *entry, *large_entry;
+   int level, rmp_level;
+   pgd_t *pgd;
+   pte_t *pte;
+
+   /* Get the RMP entry for the fault address */
+   entry = lookup_page_in_rmptable(page, _level);
+   if (!entry) {
+   pr_alert("SEV-SNP: failed to read RMP entry for address 
0x%lx\n", address);
+   return;
+   }
+
+   dump_rmpentry(page, entry);
+
+   /*
+* If fault occurred during the large page walk, dump the RMP entry at 
base of 2MB page.
+*/
+   pgd = __va(read_cr3_pa());
+   pgd += pgd_index(address);
+   pte = lookup_address_in_pgd(pgd, address, );
+   if ((level > PG_LEVEL_4K) && (!IS_ALIGNED(address, PMD_SIZE))) {
+   address = address & PMD_MASK;
+   large_entry = lookup_page_in_rmptable(virt_to_page(address), 
_level);
+   if (!large_entry) {
+   pr_alert("SEV-SNP: failed to read large RMP entry 
0x%lx\n",
+   address & PMD_MASK);
+   return;
+   }
+
+   dump_rmpentry(virt_to_page(address), large_entry);
+   }
+
+   /*
+* If the RMP entry at the faulting address was not assigned, then dump 
may not provide
+* any useful debug information. Iterate through the entire 2MB region, 
and dump the RMP
+* entries if one of the bit in the RMP entry is set.
+*/
+   if (!rmpentry_assigned(entry)) {
+   unsigned long start, end;
+
+   start = address & PMD_MASK;
+   end = start + PMD_SIZE;
+
+   for (; start < end; start += PAGE_SIZE) {
+   entry = lookup_page_in_rmptable(virt_to_page(start), 
_level);
+   if (!entry)
+   return;
+
+   /* If any of the bits in RMP entry is set then dump it 
*/
+   if (entry->high || entry->low)
+   pr_alert("RMPEntry paddr %lx: %016llx 
%016llx\n",
+   page_to_pfn(page) << PAGE_SHIFT, 
entry->high, entry->low);
+   }
+   }
+}
+
 #ifdef CONFIG_X86_32
 static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 {
@@ -580,6 +651,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long 
error_code, unsigned long ad
}
 
dump_pagetable(address);
+
+   if (error_code & X86_PF_RMP)
+   show_rmpentry(address);
+
 }
 
 static noinline void
-- 
2.17.1



[RFC Part2 PATCH 07/30] mm: add support to split the large THP based on RMP violation

2021-03-24 Thread Brijesh Singh
When SEV-SNP is enabled globally in the system, a write from the hypervisor
can raise an RMP violation. We can resolve the RMP violation by splitting
the mapping to a lower page level.

e.g.
- guest made a page shared in the RMP entry so that the hypervisor
  can write to it.
- the hypervisor has mapped the pfn as a large page. A write access
  will cause an RMP violation if one of the pages within the 2MB region
  is a guest private page.

The above RMP violation can be resolved by simply splitting the large
page.

The architecture-specific code reads the RMP entry to determine whether the
fault can be resolved by splitting; if so, it propagates the request to split
the page by setting the newly introduced fault flag (FAULT_FLAG_PAGE_SPLIT).
If the fault cannot be resolved by splitting, a SIGBUS signal is sent to
terminate the process.
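
As an illustration (not part of the patch itself), the level-mismatch check
roughly works as sketched below; rmp_level_for_pfn() is a hypothetical
stand-in for the RMP lookup helper added earlier in this series:

/*
 * Sketch only: decide whether an RMP #PF can be fixed by splitting the
 * x86 mapping. Example: a fault 0x3000 into a 2MB x86 mapping hits 4K
 * pfn (huge_pfn + 3); if the RMP tracks that pfn as a shared 4K page,
 * the 2MB x86 mapping must be split.
 */
static bool rmp_fault_can_split(unsigned long address, pte_t *pte, int x86_level)
{
	unsigned long pfn = pte_pfn(*pte);
	int rmp_level;

	if (x86_level > PG_LEVEL_4K) {
		/* index of the faulting 4K page within the huge mapping */
		unsigned long mask = (page_level_size(x86_level) >> PAGE_SHIFT) - 1;

		pfn |= (address >> PAGE_SHIFT) & mask;
	}

	rmp_level = rmp_level_for_pfn(pfn);	/* hypothetical helper */

	/* shared page, but x86 maps it at a larger size than the RMP does */
	return x86_level > rmp_level;
}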

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/fault.c | 81 +
 include/linux/mm.h  |  6 +++-
 mm/memory.c | 11 ++
 3 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 7605e06a6dd9..f6571563f433 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1305,6 +1305,70 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long 
hw_error_code,
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
+#define RMP_FAULT_RETRY		0
+#define RMP_FAULT_KILL 1
+#define RMP_FAULT_PAGE_SPLIT   2
+
+static inline size_t pages_per_hpage(int level)
+{
+   return page_level_size(level) / PAGE_SIZE;
+}
+
+/*
+ * The RMP fault can happen when a hypervisor attempts to write to:
+ * 1. a guest-owned page or
+ * 2. any page within a large page that is a guest-owned page.
+ *
+ * #1 will happen only when a process or VMM is attempting to modify the guest
+ * page without the guest's cooperation. If a guest wants a VMM to be able to
+ * write to its memory then it should make the page shared. If we detect #1,
+ * kill the process because we cannot resolve the fault.
+ *
+ * #2 can happen when the page level does not match between the RMP entry and
+ * the x86 page table walk, e.g. the page is mapped as a large page in the x86
+ * page table but it is added as a 4K shared page in the RMP entry. This can
+ * be resolved by splitting the mapping to a smaller page level.
+ */
+static int handle_rmp_page_fault(unsigned long hw_error_code, unsigned long 
address)
+{
+   unsigned long pfn, mask;
+   int rmp_level, level;
+   rmpentry_t *e;
+   pte_t *pte;
+
+   /* Get the native page level */
+   pte = lookup_address_in_mm(current->mm, address, );
+   if (unlikely(!pte))
+   return RMP_FAULT_KILL;
+
+   pfn = pte_pfn(*pte);
+   if (level > PG_LEVEL_4K) {
+   mask = pages_per_hpage(level) - pages_per_hpage(level - 1);
+   pfn |= (address >> PAGE_SHIFT) & mask;
+   }
+
+   /* Get the page level from the RMP entry. */
+	e = lookup_page_in_rmptable(pfn_to_page(pfn), &rmp_level);
+   if (!e) {
+   pr_alert("SEV-SNP: failed to lookup RMP entry for address 0x%lx 
pfn 0x%lx\n",
+address, pfn);
+   return RMP_FAULT_KILL;
+   }
+
+	/* It's a guest-owned page */
+   if (rmpentry_assigned(e))
+   return RMP_FAULT_KILL;
+
+   /*
+	 * It's a shared page but the page level does not match between the
+	 * native page-table walk and the RMP entry.
+*/
+   if (level > rmp_level)
+   return RMP_FAULT_PAGE_SPLIT;
+
+   return RMP_FAULT_RETRY;
+}
+
 /* Handle faults in the user portion of the address space */
 static inline
 void do_user_addr_fault(struct pt_regs *regs,
@@ -1315,6 +1379,7 @@ void do_user_addr_fault(struct pt_regs *regs,
struct task_struct *tsk;
struct mm_struct *mm;
vm_fault_t fault;
+   int ret;
unsigned int flags = FAULT_FLAG_DEFAULT;
 
tsk = current;
@@ -1377,6 +1442,22 @@ void do_user_addr_fault(struct pt_regs *regs,
if (hw_error_code & X86_PF_INSTR)
flags |= FAULT_FLAG_INSTRUCTION;
 
+   /*
+* If its an RMP violation, see if we can resolve it.
+*/
+   if ((hw_error_code & X86_PF_RMP)) {
+   ret = handle_rmp_page_fault(hw_error_code, address);
+   if (ret == RMP_FAULT_PAGE_SPLIT) {
+   flags |= FAULT_FLAG_PAGE_SPLIT;
+   } else if (ret == RMP_FAULT_KILL) {
+   fault |= VM_FAULT_SIGBUS;
+   mm_fault_error(regs, hw_error_code, address, fault

[RFC Part2 PATCH 05/30] x86: define RMP violation #PF error code

2021-03-24 Thread Brijesh Singh
Bit 31 in the page-fault error code will be set when the processor
encounters an RMP violation.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/trap_pf.h | 2 ++
 arch/x86/mm/fault.c| 1 +
 2 files changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/trap_pf.h b/arch/x86/include/asm/trap_pf.h
index 10b1de500ab1..107f9d947e8d 100644
--- a/arch/x86/include/asm/trap_pf.h
+++ b/arch/x86/include/asm/trap_pf.h
@@ -12,6 +12,7 @@
  *   bit 4 ==  1: fault was an instruction fetch
  *   bit 5 ==  1: protection keys block access
  *   bit 15 == 1: SGX MMU page-fault
+ *   bit 31 == 1: fault was an RMP violation
  */
 enum x86_pf_error_code {
X86_PF_PROT =   1 << 0,
@@ -21,6 +22,7 @@ enum x86_pf_error_code {
X86_PF_INSTR=   1 << 4,
X86_PF_PK   =   1 << 5,
X86_PF_SGX  =   1 << 15,
+   X86_PF_RMP  =   1ull << 31,
 };
 
 #endif /* _ASM_X86_TRAP_PF_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..f39b551f89a6 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -547,6 +547,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long 
error_code, unsigned long ad
 !(error_code & X86_PF_PROT) ? "not-present page" :
 (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
 (error_code & X86_PF_PK)? "protection keys violation" :
+(error_code & X86_PF_RMP)   ? "rmp violation" :
   "permissions violation");
 
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
-- 
2.17.1



[RFC Part2 PATCH 04/30] x86/mm: split the physmap when adding the page in RMP table

2021-03-24 Thread Brijesh Singh
The integrity guarantee of SEV-SNP is enforced through the RMP table.
The RMP is used in conjunction with standard x86 and IOMMU page
tables to enforce memory restrictions and page access rights. The
RMP is indexed by system physical address, and is checked at the end
of CPU and IOMMU table walks. The RMP check is enforced as soon as
SEV-SNP is enabled globally in the system. Not every memory access
requires an RMP check. In particular, read accesses from the
hypervisor do not require RMP checks because data confidentiality
is already protected via memory encryption. When hardware encounters
an RMP check failure, it raises a page-fault exception. The RMP bit in
the fault error code can be used to determine if the fault was due to an
RMP check failure.

A write from the hypervisor goes through the RMP checks. When the
hypervisor writes to pages, hardware checks that the assigned
bit in the RMP is zero (i.e. the page is shared). If the page table entry
that gives the sPA indicates that the target page size is a large page, then
all RMP entries for the constituent 4KB pages of the target must have the
assigned bit clear. If any of those entries has the assigned bit set, hardware
will raise an RMP violation. To resolve it, we must split the page table
entry leading to the target page into 4K.

This poses a challenge for the Linux memory model. The Linux kernel
creates a direct mapping of all the physical memory -- referred to as
the physmap. The physmap may contain a valid mapping of guest-owned pages.
During the page table walk, we may get into the situation where one
of the pages within the large page is owned by the guest (i.e. the assigned
bit is set in the RMP). A write to a non-guest page within the large page
will raise an RMP violation. To work around it, we call set_memory_4k() to
split the physmap before adding the page in the RMP table. This ensures that
the pages added in the RMP table are used as 4K in the physmap.

The splitting of the physmap is a temporary solution until we work to
improve the kernel page fault handler to split the pages on demand.
One disadvantage of splitting is that eventually we will end up
breaking down the entire physmap unless we combine the split pages back
into a large page. I am open to suggestions on various approaches we could
take to address this problem.
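
For illustration only, the intended ordering looks roughly like the sketch
below; it reuses rmptable_rmpupdate() from the previous patch in this series
and is not part of this patch:

/*
 * Sketch: assign one 4K page to a guest in the RMP table, splitting the
 * physmap first so the host never writes through a 2MB mapping that
 * covers a guest-private 4K page.
 */
static int example_rmp_assign_page(struct page *page, u64 gpa, u32 asid)
{
	struct rmpupdate val = {
		.gpa      = gpa,
		.asid     = asid,
		.assigned = 1,
		.pagesize = RMP_PG_SIZE_4K,
	};
	int ret;

	/* force the direct mapping of this page down to 4K */
	ret = set_memory_4k((unsigned long)page_to_virt(page), 1);
	if (ret)
		return ret;

	return rmptable_rmpupdate(page, &val);
}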

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/mem_encrypt.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 7a0138cb3e17..4047acb37c30 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -674,6 +674,12 @@ int rmptable_rmpupdate(struct page *page, struct rmpupdate 
*val)
	if (!static_branch_unlikely(&snp_enable_key))
return -ENXIO;
 
+   ret = set_memory_4k((unsigned long)page_to_virt(page), 1);
+   if (ret) {
+   pr_err("SEV-SNP: failed to split physical address 0x%lx 
(%d)\n", spa, ret);
+   return ret;
+   }
+
/* Retry if another processor is modifying the RMP entry. */
do {
asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
-- 
2.17.1



[RFC Part2 PATCH 03/30] x86: add helper functions for RMPUPDATE and PSMASH instruction

2021-03-24 Thread Brijesh Singh
The RMPUPDATE instruction writes a new RMP entry in the RMP Table. The
hypervisor will use the instruction to add pages to the RMP table. See
APM3 for details on the instruction operations.

The PSMASH instruction expands a 2MB RMP entry into a corresponding set of
contiguous 4KB-Page RMP entries. The hypervisor will use this instruction
to adjust the RMP entry without invalidating the previous RMP entry.
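
As a usage sketch (illustrative only, not part of the patch), a caller that
needs to adjust the RMP page size might do something like:

/*
 * Sketch: break a 2MB RMP entry into 512 4K entries, e.g. before the
 * guest converts one 4K sub-page to shared, using the rmptable_psmash()
 * wrapper added below. The previously validated state is preserved.
 */
static int example_smash_rmp_entry(unsigned long pfn_2mb_aligned)
{
	struct page *page = pfn_to_page(pfn_2mb_aligned);
	int ret;

	ret = rmptable_psmash(page);
	if (ret)
		pr_warn("PSMASH failed for pfn 0x%lx: %d\n", pfn_2mb_aligned, ret);

	return ret;
}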

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/sev-snp.h | 27 ++
 arch/x86/mm/mem_encrypt.c  | 41 ++
 2 files changed, 68 insertions(+)

diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index 2aa14b38c5ed..199d88a38c76 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -96,6 +96,29 @@ typedef struct rmpentry rmpentry_t;
 #define rmpentry_gpa(x)((unsigned long)(x)->info.gpa)
 #define rmpentry_immutable(x)  ((x)->info.immutable)
 
+
+/* Return code of RMPUPDATE */
+#define RMPUPDATE_SUCCESS  0
+#define RMPUPDATE_FAIL_INPUT   1
+#define RMPUPDATE_FAIL_PERMISSION  2
+#define RMPUPDATE_FAIL_INUSE   3
+#define RMPUPDATE_FAIL_OVERLAP 4
+
+struct rmpupdate {
+   u64 gpa;
+   u8 assigned;
+   u8 pagesize;
+   u8 immutable;
+   u8 rsvd;
+   u32 asid;
+} __packed;
+
+/* Return code of PSMASH */
+#define PSMASH_FAIL_INPUT  1
+#define PSMASH_FAIL_PERMISSION 2
+#define PSMASH_FAIL_INUSE  3
+#define PSMASH_FAIL_BADADDR4
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 #include 
 
@@ -124,6 +147,8 @@ void __init early_snp_set_memory_shared(unsigned long 
vaddr, unsigned long paddr
 int snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
 int snp_set_memory_private(unsigned long vaddr, unsigned int npages);
 rmpentry_t *lookup_page_in_rmptable(struct page *page, int *level);
+int rmptable_psmash(struct page *page);
+int rmptable_rmpupdate(struct page *page, struct rmpupdate *e);
 
 extern struct static_key_false snp_enable_key;
 static inline bool snp_key_active(void)
@@ -155,6 +180,8 @@ static inline int snp_set_memory_shared(unsigned long 
vaddr, unsigned int npages
 static inline int snp_set_memory_private(unsigned long vaddr, unsigned int 
npages) { return 0; }
 static inline bool snp_key_active(void) { return false; }
 static inline rpmentry_t *lookup_page_in_rmptable(struct page *page, int 
*level) { return NULL; }
+static inline int rmptable_psmash(struct page *page) { return -ENXIO; }
+static inline int rmptable_rmpupdate(struct page *page, struct rmpupdate *e) { 
return -ENXIO; }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 06394b6d56b2..7a0138cb3e17 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -644,3 +644,44 @@ rmpentry_t *lookup_page_in_rmptable(struct page *page, int 
*level)
return entry;
 }
 EXPORT_SYMBOL_GPL(lookup_page_in_rmptable);
+
+int rmptable_psmash(struct page *page)
+{
+   unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
+   int ret;
+
+	if (!static_branch_unlikely(&snp_enable_key))
+   return -ENXIO;
+
+   /* Retry if another processor is modifying the RMP entry. */
+   do {
+   asm volatile(".byte 0xF3, 0x0F, 0x01, 0xFF"
+ : "=a"(ret)
+ : "a"(spa)
+ : "memory", "cc");
+   } while (ret == PSMASH_FAIL_INUSE);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(rmptable_psmash);
+
+int rmptable_rmpupdate(struct page *page, struct rmpupdate *val)
+{
+   unsigned long spa = page_to_pfn(page) << PAGE_SHIFT;
+   bool flush = true;
+   int ret;
+
+	if (!static_branch_unlikely(&snp_enable_key))
+   return -ENXIO;
+
+   /* Retry if another processor is modifying the RMP entry. */
+   do {
+   asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFE"
+: "=a"(ret)
+: "a"(spa), "c"((unsigned long)val), "d"(flush)
+: "memory", "cc");
+	} while (ret == RMPUPDATE_FAIL_INUSE);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(rmptable_rmpupdate);
-- 
2.17.1



[RFC Part2 PATCH 02/30] x86/sev-snp: add RMP entry lookup helpers

2021-03-24 Thread Brijesh Singh
lookup_page_in_rmptable() can be used by the host to read the RMP
entry for a given page. The RMP entry format is documented in PPR
section 2.1.5.2.
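
A minimal usage sketch (not part of the patch) of the helper added below:

/*
 * Sketch: check whether a page is currently assigned to a guest before
 * the host writes to it. lookup_page_in_rmptable() returns NULL when
 * SEV-SNP is not enabled, in which case there is nothing to check.
 */
static bool example_page_is_guest_owned(struct page *page)
{
	rmpentry_t *e;
	int level;

	e = lookup_page_in_rmptable(page, &level);
	if (!e)
		return false;

	return rmpentry_assigned(e);
}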

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/sev-snp.h | 31 +++
 arch/x86/mm/mem_encrypt.c  | 32 
 2 files changed, 63 insertions(+)

diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index f7280d5c6158..2aa14b38c5ed 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -67,6 +67,35 @@ struct __packed snp_page_state_change {
 #define X86_RMP_PG_LEVEL(level)(((level) == PG_LEVEL_4K) ? 
RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
 #define RMP_X86_PG_LEVEL(level)(((level) == RMP_PG_SIZE_4K) ? 
PG_LEVEL_4K : PG_LEVEL_2M)
 
+/* RMP table entry format (PPR section 2.1.5.2) */
+struct __packed rmpentry {
+   union {
+   struct {
+   uint64_t assigned:1;
+   uint64_t pagesize:1;
+   uint64_t immutable:1;
+   uint64_t rsvd1:9;
+   uint64_t gpa:39;
+   uint64_t asid:10;
+   uint64_t vmsa:1;
+   uint64_t validated:1;
+   uint64_t rsvd2:1;
+   } info;
+   uint64_t low;
+   };
+   uint64_t high;
+};
+
+typedef struct rmpentry rmpentry_t;
+
+#define rmpentry_assigned(x)   ((x)->info.assigned)
+#define rmpentry_pagesize(x)   (RMP_X86_PG_LEVEL((x)->info.pagesize))
+#define rmpentry_vmsa(x)   ((x)->info.vmsa)
+#define rmpentry_asid(x)   ((x)->info.asid)
+#define rmpentry_validated(x)  ((x)->info.validated)
+#define rmpentry_gpa(x)((unsigned long)(x)->info.gpa)
+#define rmpentry_immutable(x)  ((x)->info.immutable)
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 #include 
 
@@ -94,6 +123,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, 
unsigned long paddr
unsigned int npages);
 int snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
 int snp_set_memory_private(unsigned long vaddr, unsigned int npages);
+rmpentry_t *lookup_page_in_rmptable(struct page *page, int *level);
 
 extern struct static_key_false snp_enable_key;
 static inline bool snp_key_active(void)
@@ -124,6 +154,7 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned 
long paddr, unsigned i
 static inline int snp_set_memory_shared(unsigned long vaddr, unsigned int 
npages) { return 0; }
 static inline int snp_set_memory_private(unsigned long vaddr, unsigned int 
npages) { return 0; }
 static inline bool snp_key_active(void) { return false; }
+static inline rpmentry_t *lookup_page_in_rmptable(struct page *page, int 
*level) { return NULL; }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 39461b9cb34e..06394b6d56b2 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -34,6 +34,8 @@
 
 #include "mm_internal.h"
 
+#define rmptable_page_offset(x)	(0x4000 + (((unsigned long) x) >> 8))
+
 /*
  * Since SME related variables are set early in the boot process they must
  * reside in the .data section so as not to be zeroed out when the .bss
@@ -612,3 +614,33 @@ static int __init mem_encrypt_snp_init(void)
  * SEV-SNP must be enabled across all CPUs, so make the initialization as a 
late initcall.
  */
 late_initcall(mem_encrypt_snp_init);
+
+rmpentry_t *lookup_page_in_rmptable(struct page *page, int *level)
+{
+   unsigned long phys = page_to_pfn(page) << PAGE_SHIFT;
+   rmpentry_t *entry, *large_entry;
+   unsigned long vaddr;
+
+	if (!static_branch_unlikely(&snp_enable_key))
+   return NULL;
+
+   vaddr = rmptable_start + rmptable_page_offset(phys);
+   if (WARN_ON(vaddr > rmptable_end))
+   return NULL;
+
+   entry = (rmpentry_t *)vaddr;
+
+   /*
+* Check if this page is covered by the large RMP entry. This is needed 
to get
+* the page level used in the RMP entry.
+*
+* e.g. if the page is covered by the large RMP entry then page size is 
set in the
+*   base RMP entry.
+*/
+   vaddr = rmptable_start + rmptable_page_offset(phys & PMD_MASK);
+   large_entry = (rmpentry_t *)vaddr;
+   *level = rmpentry_pagesize(large_entry);
+
+   return entry;
+}
+EXPORT_SYMBOL_GPL(lookup_page_in_rmptable);
-- 
2.17.1



[RFC Part2 PATCH 01/30] x86: Add the host SEV-SNP initialization support

2021-03-24 Thread Brijesh Singh
The memory integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). The RMP is a single data
structure shared across the system that contains one entry for every 4K
page of DRAM that may be used by SEV-SNP VMs. The goal of the RMP is to
track the owner of each page of memory. Pages of memory can be owned by
the hypervisor, owned by a specific VM, or owned by the AMD-SP. See APM2
section 15.36.3 for more detail on the RMP.

The RMP table is used to enforce access control to memory. The table itself
is not directly writable by software. New CPU instructions (RMPUPDATE,
PVALIDATE, RMPADJUST) are used to manipulate the RMP entries.

Based on the platform configuration, the BIOS reserves the memory used
for the RMP table. The start and end addresses of the RMP table can be
queried by reading the RMP_BASE and RMP_END MSRs. If RMP_BASE and
RMP_END are not set, then the SEV-SNP feature is disabled.

The SEV-SNP feature is enabled only after the RMP table is successfully
initialized.
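
A minimal sketch of the RMP window discovery described above (the real
initialization is in the diff below):

/* Sketch only: read the BIOS-reserved RMP window and bail out if unset. */
static int __init example_get_rmp_window(u64 *start, u64 *len)
{
	u64 rmp_base, rmp_end;

	rdmsrl(MSR_AMD64_RMP_BASE, rmp_base);
	rdmsrl(MSR_AMD64_RMP_END, rmp_end);

	if (!rmp_base || !rmp_end)
		return -ENODEV;	/* BIOS did not reserve the RMP, keep SNP disabled */

	*start = rmp_base;
	*len   = rmp_end - rmp_base + 1;
	return 0;
}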

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/msr-index.h |  6 +++
 arch/x86/include/asm/sev-snp.h   | 10 
 arch/x86/mm/mem_encrypt.c| 84 
 3 files changed, 100 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b03694e116fe..1142d31eb06c 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -481,6 +481,8 @@
 #define MSR_AMD64_SEV_ENABLED  BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
 #define MSR_AMD64_SEV_ES_ENABLED   BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
 #define MSR_AMD64_SEV_SNP_ENABLED  BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
+#define MSR_AMD64_RMP_BASE 0xc0010132
+#define MSR_AMD64_RMP_END  0xc0010133
 
 #define MSR_AMD64_VIRT_SPEC_CTRL   0xc001011f
 
@@ -538,6 +540,10 @@
 #define MSR_K8_SYSCFG  0xc0010010
 #define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT  23
 #define MSR_K8_SYSCFG_MEM_ENCRYPT  BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
+#define MSR_K8_SYSCFG_SNP_EN_BIT   24
+#define MSR_K8_SYSCFG_SNP_EN   BIT_ULL(MSR_K8_SYSCFG_SNP_EN_BIT)
+#define MSR_K8_SYSCFG_SNP_VMPL_EN_BIT  25
+#define MSR_K8_SYSCFG_SNP_VMPL_EN  BIT_ULL(MSR_K8_SYSCFG_SNP_VMPL_EN_BIT)
 #define MSR_K8_INT_PENDING_MSG 0xc0010055
 /* C1E active bits in int pending message */
 #define K8_INTP_C1E_ACTIVE_MASK0x1800
diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index 59b57a5f6524..f7280d5c6158 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -68,6 +68,8 @@ struct __packed snp_page_state_change {
 #define RMP_X86_PG_LEVEL(level)(((level) == RMP_PG_SIZE_4K) ? 
PG_LEVEL_4K : PG_LEVEL_2M)
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
+#include 
+
 static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
  unsigned long *rflags)
 {
@@ -93,6 +95,13 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, 
unsigned long paddr
 int snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
 int snp_set_memory_private(unsigned long vaddr, unsigned int npages);
 
+extern struct static_key_false snp_enable_key;
+static inline bool snp_key_active(void)
+{
+	return static_branch_unlikely(&snp_enable_key);
+}
+
+
 #else  /* !CONFIG_AMD_MEM_ENCRYPT */
 
 static inline int __pvalidate(unsigned long vaddr, int psize, int validate, 
unsigned long *eflags)
@@ -114,6 +123,7 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned 
long paddr, unsigned i
 }
 static inline int snp_set_memory_shared(unsigned long vaddr, unsigned int 
npages) { return 0; }
 static inline int snp_set_memory_private(unsigned long vaddr, unsigned int 
npages) { return 0; }
+static inline bool snp_key_active(void) { return false; }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 35af2f21b8f1..39461b9cb34e 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "mm_internal.h"
 
@@ -44,12 +45,16 @@ u64 sev_check_data __section(".data") = 0;
 EXPORT_SYMBOL(sme_me_mask);
 DEFINE_STATIC_KEY_FALSE(sev_enable_key);
 EXPORT_SYMBOL_GPL(sev_enable_key);
+DEFINE_STATIC_KEY_FALSE(snp_enable_key);
+EXPORT_SYMBOL_GPL(snp_enable_key);
 
 bool sev_enabled __section(".data");
 
 /* Buffer used for early in-place encryption by BSP, no locking needed */
 static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
 
+static unsigned lon

[RFC Part2 PATCH 00/30] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

2021-03-24 Thread Brijesh Singh
This part of the Secure Nested Paging (SEV-SNP) series focuses on the
changes required in a host OS for SEV-SNP support. The series builds upon
SEV-SNP Part-1: https://marc.info/?l=kvm&m=161660430125343&w=2 .

This series provides the basic building blocks to support booting SEV-SNP
VMs; it does not cover all of the security enhancements introduced by
SEV-SNP, such as interrupt protection.

The CCP driver is enhanced to provide new APIs that use the SEV-SNP
specific commands defined in the SEV-SNP firmware specification. The KVM
driver uses those APIs to create and manage the SEV-SNP guests.

The GHCB specification version 2 introduces a new set of NAE events that are
used by the SEV-SNP guest to communicate with the hypervisor. The series
provides support to handle the following new NAE events:
- Register GHCB GPA
- Page State Change Request

The RMP check is enforced as soon as SEV-SNP is enabled. Not every memory
access requires an RMP check. In particular, read accesses from the
hypervisor do not require RMP checks because data confidentiality is
already protected via memory encryption. When hardware encounters an RMP
check failure, it raises a page-fault exception. If the RMP check failure
is due to a page-size mismatch, then the large page is split to resolve
the fault. See patches 4 and 7 for further details.

The series does not provide support for the following SEV-SNP specific
NAE events yet:

* Query Attestation 
* AP bring up
* Interrupt security

The series is based on kvm/master commit:
  87aa9ec939ec KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs

The complete source is available at
https://github.com/AMDESE/linux/tree/sev-snp-part-2-rfc1

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org

Additional resources
-
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
 
APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf
(section 15.36)

GHCB spec v2:
  The draft specification is posted on AMD-SEV-SNP mailing list:
   https://lists.suse.com/mailman/private/amd-sev-snp/

  Copy of draft spec is also available at 
  
https://github.com/AMDESE/AMDSEV/blob/sev-snp-devel/docs/56421-Guest_Hypervisor_Communication_Block_Standardization.pdf

GHCB spec v1:
SEV-SNP firmware specification:
 https://developer.amd.com/sev/

Brijesh Singh (30):
  x86: Add the host SEV-SNP initialization support
  x86/sev-snp: add RMP entry lookup helpers
  x86: add helper functions for RMPUPDATE and PSMASH instruction
  x86/mm: split the physmap when adding the page in RMP table
  x86: define RMP violation #PF error code
  x86/fault: dump the RMP entry on #PF
  mm: add support to split the large THP based on RMP violation
  crypto:ccp: define the SEV-SNP commands
  crypto: ccp: Add support to initialize the AMD-SP for SEV-SNP
  crypto: ccp: shutdown SNP firmware on kexec
  crypto:ccp: provide APIs to issue SEV-SNP commands
  crypto ccp: handle the legacy SEV command when SNP is enabled
  KVM: SVM: add initial SEV-SNP support
  KVM: SVM: make AVIC backing, VMSA and VMCB memory allocation SNP safe
  KVM: SVM: define new SEV_FEATURES field in the VMCB Save State Area
  KVM: SVM: add KVM_SNP_INIT command
  KVM: SVM: add KVM_SEV_SNP_LAUNCH_START command
  KVM: SVM: add KVM_SEV_SNP_LAUNCH_UPDATE command
  KVM: SVM: Reclaim the guest pages when SEV-SNP VM terminates
  KVM: SVM: add KVM_SEV_SNP_LAUNCH_FINISH command
  KVM: X86: Add kvm_x86_ops to get the max page level for the TDP
  x86/mmu: Introduce kvm_mmu_map_tdp_page() for use by SEV
  KVM: X86: Introduce kvm_mmu_get_tdp_walk() for SEV-SNP use
  KVM: X86: define new RMP check related #NPF error bits
  KVM: X86: update page-fault trace to log the 64-bit error code
  KVM: SVM: add support to handle GHCB GPA register VMGEXIT
  KVM: SVM: add support to handle MSR based Page State Change VMGEXIT
  KVM: SVM: add support to handle Page State Change VMGEXIT
  KVM: X86: export the kvm_zap_gfn_range() for the SNP use
  KVM: X86: Add support to handle the RMP nested page fault

 arch/x86/include/asm/kvm_host.h  |  14 +
 arch/x86/include/asm/msr-index.h |   6 +
 arch/x86/include/asm/sev-snp.h   |  68 +++
 arch/x86/include/asm/svm.h   |  12 +-
 arch/x86/include/asm/trap_pf.h   |   2 +
 arch/x86/kvm/lapic.c |   5 +-
 arch/x86/kvm/mmu.h   |   5 +-
 arch/x86/kvm/mmu/mmu.c   |  76 ++-
 arch/x86/kvm/svm/sev.c   | 925 ++-
 arch/x86/kvm/svm/svm.c   |  28 +-
 arch/x86/kvm/svm/svm.h   |  49 ++
 arch/x86/kvm/trace.h |   6 +-
 arch/x86/kvm/vmx/vmx.c   |   8 +
 arch/x86/mm/fault.c  | 157 ++
 arch/x86/mm/mem_encrypt.c|

[RFC Part1 PATCH 13/13] x86/kernel: add support to validate memory when changing C-bit

2021-03-24 Thread Brijesh Singh
The set_memory_{encrypted,decrypted}() helpers are used for changing the pages
from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.

If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:

1. Issue the page state change VMGEXIT to add the memory region in
   the RMP table.
2. Validate the memory region after the RMP entry is added.

To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before the encryption attribute
is removed from the page table:

1. Invalidate the page.
2. Issue the page state change VMGEXIT to remove the page from RMP table.

To change the page state in the RMP table, use the Page State Change
VMGEXIT defined in the GHCB spec section 4.1.6.
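
The ordering described above can be sketched as follows; this is a minimal
sketch, not the patch itself: change_page_attr_c_bit() is a hypothetical
placeholder for the existing C-bit update done by set_memory_{encrypted,
decrypted}(), and the real hooks live in __set_memory_enc_dec() in the diff
below.

/* Sketch only: ordering of RMP/PVALIDATE steps around the C-bit change. */
static int example_enc_dec_order(unsigned long addr, int numpages, bool enc)
{
	int ret;

	/* shared: invalidate and flip the RMP entry *before* the C-bit is cleared */
	if (!enc) {
		ret = snp_set_memory_shared(addr, numpages);
		if (ret)
			return ret;
	}

	ret = change_page_attr_c_bit(addr, numpages, enc);	/* hypothetical */
	if (ret)
		return ret;

	/* private: flip the RMP entry and validate *after* the C-bit is set */
	if (enc)
		ret = snp_set_memory_private(addr, numpages);

	return ret;
}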

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/sev-es.h  |   2 +
 arch/x86/include/asm/sev-snp.h |   4 ++
 arch/x86/kernel/sev-es.c   |   7 +++
 arch/x86/kernel/sev-snp.c  | 106 +
 arch/x86/mm/pat/set_memory.c   |  19 ++
 5 files changed, 138 insertions(+)

diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index 33838a8f8495..8715e41e2c8f 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -109,6 +109,7 @@ static __always_inline void sev_es_nmi_complete(void)
 extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
 extern struct ghcb *sev_es_get_ghcb(struct ghcb_state *state);
 extern void sev_es_put_ghcb(struct ghcb_state *state);
+extern int vmgexit_page_state_change(struct ghcb *ghcb, void *data);
 
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
@@ -118,6 +119,7 @@ static inline void sev_es_nmi_complete(void) { }
 static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
 static inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state) { return 
NULL; }
 static inline void sev_es_put_ghcb(struct ghcb_state *state) { }
+static inline int vmgexit_page_state_change(struct ghcb *ghcb, void *data) { 
return 0; }
 #endif
 
 #endif
diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index c4b096206062..59b57a5f6524 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -90,6 +90,8 @@ void __init early_snp_set_memory_private(unsigned long vaddr, 
unsigned long padd
unsigned int npages);
 void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long 
paddr,
unsigned int npages);
+int snp_set_memory_shared(unsigned long vaddr, unsigned int npages);
+int snp_set_memory_private(unsigned long vaddr, unsigned int npages);
 
 #else  /* !CONFIG_AMD_MEM_ENCRYPT */
 
@@ -110,6 +112,8 @@ early_snp_set_memory_shared(unsigned long vaddr, unsigned 
long paddr, unsigned i
 {
return 0;
 }
+static inline int snp_set_memory_shared(unsigned long vaddr, unsigned int 
npages) { return 0; }
+static inline int snp_set_memory_private(unsigned long vaddr, unsigned int 
npages) { return 0; }
 
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index d4957b3fc43f..7309be685440 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -586,6 +586,13 @@ static bool __init sev_es_setup_ghcb(void)
return true;
 }
 
+int vmgexit_page_state_change(struct ghcb *ghcb, void *data)
+{
+   ghcb_set_sw_scratch(ghcb, (u64)__pa(data));
+
+   return sev_es_ghcb_hv_call(ghcb, NULL, SVM_VMGEXIT_PAGE_STATE_CHANGE, 
0, 0);
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 static void sev_es_ap_hlt_loop(void)
 {
diff --git a/arch/x86/kernel/sev-snp.c b/arch/x86/kernel/sev-snp.c
index ff9b35bfb05c..d236089c0739 100644
--- a/arch/x86/kernel/sev-snp.c
+++ b/arch/x86/kernel/sev-snp.c
@@ -15,6 +15,7 @@
 
 #include 
 #include 
+#include 
 
 static inline u64 sev_es_rd_ghcb_msr(void)
 {
@@ -161,3 +162,108 @@ void __init early_snp_set_memory_shared(unsigned long 
vaddr, unsigned long paddr
 /* Ask hypervisor to make the memory shared in the RMP table. */
early_snp_set_page_state(paddr, npages, SNP_PAGE_STATE_SHARED);
 }
+
+static int snp_page_state_vmgexit(struct ghcb *ghcb, struct 
snp_page_state_change *data)
+{
+   struct snp_page_state_header *hdr;
+   int ret = 0;
+
+	hdr = &data->header;
+
+   /*
+	 * The hypervisor can return before processing all the entries, so the
+	 * loop below retries until all the entries are processed.
+*/
+   while (hdr->cur_entry <= hdr->end_entry) {
+   ghcb_set

[RFC Part1 PATCH 12/13] x86/sev-es: make GHCB get and put helper accessible outside

2021-03-24 Thread Brijesh Singh
The SEV-SNP support extends the GHCB specification with a few SNP-specific
VMGEXITs. Those VMGEXITs will be implemented in sev-snp.c. Make the GHCB
get/put helpers available outside of sev-es.c so that the SNP VMGEXIT code
can avoid duplicating the GHCB get/put logic.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/sev-es.h | 9 +
 arch/x86/kernel/sev-es.c  | 8 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/sev-es.h b/arch/x86/include/asm/sev-es.h
index cf1d957c7091..33838a8f8495 100644
--- a/arch/x86/include/asm/sev-es.h
+++ b/arch/x86/include/asm/sev-es.h
@@ -81,6 +81,10 @@ extern void vc_no_ghcb(void);
 extern void vc_boot_ghcb(void);
 extern bool handle_vc_boot_ghcb(struct pt_regs *regs);
 
+struct ghcb_state {
+   struct ghcb *ghcb;
+};
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 extern struct static_key_false sev_es_enable_key;
 extern void __sev_es_ist_enter(struct pt_regs *regs);
@@ -103,12 +107,17 @@ static __always_inline void sev_es_nmi_complete(void)
__sev_es_nmi_complete();
 }
 extern int __init sev_es_efi_map_ghcbs(pgd_t *pgd);
+extern struct ghcb *sev_es_get_ghcb(struct ghcb_state *state);
+extern void sev_es_put_ghcb(struct ghcb_state *state);
+
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
 static inline int sev_es_setup_ap_jump_table(struct real_mode_header *rmh) { 
return 0; }
 static inline void sev_es_nmi_complete(void) { }
 static inline int sev_es_efi_map_ghcbs(pgd_t *pgd) { return 0; }
+static inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state) { return 
NULL; }
+static inline void sev_es_put_ghcb(struct ghcb_state *state) { }
 #endif
 
 #endif
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 004bf1102dc1..d4957b3fc43f 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -98,10 +98,6 @@ struct sev_es_runtime_data {
bool ghcb_registered;
 };
 
-struct ghcb_state {
-   struct ghcb *ghcb;
-};
-
 static DEFINE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
 DEFINE_STATIC_KEY_FALSE(sev_es_enable_key);
 
@@ -178,7 +174,7 @@ void noinstr __sev_es_ist_exit(void)
this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long 
*)ist);
 }
 
-static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
+struct ghcb *sev_es_get_ghcb(struct ghcb_state *state)
 {
struct sev_es_runtime_data *data;
struct ghcb *ghcb;
@@ -213,7 +209,7 @@ static __always_inline struct ghcb *sev_es_get_ghcb(struct 
ghcb_state *state)
return ghcb;
 }
 
-static __always_inline void sev_es_put_ghcb(struct ghcb_state *state)
+void sev_es_put_ghcb(struct ghcb_state *state)
 {
struct sev_es_runtime_data *data;
struct ghcb *ghcb;
-- 
2.17.1



[RFC Part1 PATCH 10/13] X86: kernel: make the bss.decrypted section shared in RMP table

2021-03-24 Thread Brijesh Singh
The encryption attribute for the bss.decrypted region is cleared in the
initial page table build. This is because the section contains data that
needs to be shared between the guest and the hypervisor.

When SEV-SNP is active, just clearing the encryption attribute in the
page table is not enough. We also need to make the page shared in the
RMP table.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kernel/head64.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 5e9beb77cafd..1bf005d38ebc 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Manage page tables very early on.
@@ -288,6 +289,19 @@ unsigned long __head __startup_64(unsigned long physaddr,
if (mem_encrypt_active()) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
+
+   /*
+* The bss.decrypted region is mapped decrypted in the initial 
page table.
+* If SEV-SNP is active then transition the page to shared in 
the RMP table
+* so that it is consistent with the page table attribute 
change below.
+*/
+   if (sev_snp_active()) {
+   unsigned long npages;
+
+   npages = PAGE_ALIGN(vaddr_end - vaddr) >> PAGE_SHIFT;
+   early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), 
npages);
+   }
+
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
i = pmd_index(vaddr);
pmd[i] -= sme_get_me_mask();
-- 
2.17.1



[RFC Part1 PATCH 07/13] x86/compressed: register GHCB memory when SNP is active

2021-03-24 Thread Brijesh Singh
The SEV-SNP guest is required to perform GHCB GPA registration. This is
because the hypervisor may prefer that a guest use a consistent and/or
specific GPA for the GHCB associated with a vCPU. For more information,
see the GHCB specification section 2.5.2.

Currently, we do not support working with the hypervisor-preferred GPA. If
the hypervisor cannot work with our provided GPA, then we will terminate
the boot.
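
As a worked example of the MSR protocol exchange (values chosen purely for
illustration, using the macros added in this patch): for a GHCB page at pfn
0x7f2d6, the guest writes GHCB_REGISTER_GPA_REQ_VAL(0x7f2d6) =
(0x7f2d6 << 12) | 0x012 = 0x7f2d6012 to the GHCB MSR and issues VMGEXIT. It
accepts the registration only if the response code read back is
GHCB_REGISTER_GPA_RESP (0x013) and GHCB_REGISTER_GPA_RESP_VAL() of the
response yields the same pfn; otherwise it terminates the guest.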

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/boot/compressed/sev-es.c  |  4 
 arch/x86/boot/compressed/sev-snp.c | 26 ++
 arch/x86/include/asm/sev-snp.h | 11 +++
 3 files changed, 41 insertions(+)

diff --git a/arch/x86/boot/compressed/sev-es.c 
b/arch/x86/boot/compressed/sev-es.c
index 58b15b7c1aa7..c85d3d9ec57a 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "error.h"
 
@@ -118,6 +119,9 @@ static bool early_setup_sev_es(void)
/* Initialize lookup tables for the instruction decoder */
inat_init_tables();
 
+	/* The SEV-SNP guest requires that the GHCB GPA be registered */
+	sev_snp_register_ghcb(__pa(&boot_ghcb_page));
+
return true;
 }
 
diff --git a/arch/x86/boot/compressed/sev-snp.c 
b/arch/x86/boot/compressed/sev-snp.c
index 5c25103b0df1..a4c5e85699a7 100644
--- a/arch/x86/boot/compressed/sev-snp.c
+++ b/arch/x86/boot/compressed/sev-snp.c
@@ -113,3 +113,29 @@ void sev_snp_set_page_shared(unsigned long paddr)
 {
sev_snp_set_page_private_shared(paddr, SNP_PAGE_STATE_SHARED);
 }
+
+void sev_snp_register_ghcb(unsigned long paddr)
+{
+   u64 pfn = paddr >> PAGE_SHIFT;
+   u64 old, val;
+
+   if (!sev_snp_enabled())
+   return;
+
+   /* save the old GHCB MSR */
+   old = sev_es_rd_ghcb_msr();
+
+   /* Issue VMGEXIT */
+   sev_es_wr_ghcb_msr(GHCB_REGISTER_GPA_REQ_VAL(pfn));
+   VMGEXIT();
+
+   val = sev_es_rd_ghcb_msr();
+
+   /* If the response GPA is not ours then abort the guest */
+   if ((GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_REGISTER_GPA_RESP) ||
+   (GHCB_REGISTER_GPA_RESP_VAL(val) != pfn))
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
+
+   /* Restore the GHCB MSR value */
+   sev_es_wr_ghcb_msr(old);
+}
diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index f514dad276f2..0523eb21abd7 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -56,6 +56,13 @@ struct __packed snp_page_state_change {
struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
 };
 
+/* GHCB GPA register */
+#define GHCB_REGISTER_GPA_REQ  0x012UL
+#defineGHCB_REGISTER_GPA_REQ_VAL(v)
(GHCB_REGISTER_GPA_REQ | ((v) << 12))
+
+#define GHCB_REGISTER_GPA_RESP 0x013UL
+#defineGHCB_REGISTER_GPA_RESP_VAL(val) ((val) >> 12)
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
  unsigned long *rflags)
@@ -73,6 +80,8 @@ static inline int __pvalidate(unsigned long vaddr, int 
rmp_psize, int validate,
return rc;
 }
 
+void sev_snp_register_ghcb(unsigned long paddr);
+
 #else  /* !CONFIG_AMD_MEM_ENCRYPT */
 
 static inline int __pvalidate(unsigned long vaddr, int psize, int validate, 
unsigned long *eflags)
@@ -80,6 +89,8 @@ static inline int __pvalidate(unsigned long vaddr, int psize, 
int validate, unsi
return 0;
 }
 
+static inline void sev_snp_register_ghcb(unsigned long paddr) { }
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 #endif /* __ASSEMBLY__ */
-- 
2.17.1



[RFC Part1 PATCH 11/13] x86/kernel: validate rom memory before accessing when SEV-SNP is active

2021-03-24 Thread Brijesh Singh
probe_roms() accesses the memory range (0xc0000 - 0x100000) to probe
various ROMs. The memory range is not part of the E820 system RAM
range. The memory range is mapped as private (i.e. encrypted) in the page
table.

When SEV-SNP is active, all private memory must be validated before
access. The ROM range was not part of the E820 map, so the guest BIOS
did not validate it. An access to unvalidated memory will cause a #VC
exception. We don't have a #VC exception handler ready to validate the
memory on demand, so let's validate the ROM memory region before it is
accessed.
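
For reference (numbers derived from the standard PC ROM layout, not stated in
the patch): with the legacy ROM window 0xc0000 - 0xfffff this works out to
(0x100000 - 0xc0000) >> PAGE_SHIFT = 64 pages that need to be validated
before probing, which matches the computation in the diff below.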

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kernel/probe_roms.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index 9e1def3744f2..65640b401b9c 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 static struct resource system_rom_resource = {
.name   = "System ROM",
@@ -202,6 +204,19 @@ void __init probe_roms(void)
unsigned char c;
int i;
 
+   /*
+* The ROM memory is not part of the E820 system RAM and is not 
prevalidated by the BIOS.
+* The kernel page table maps the ROM region as encrypted memory, the 
SEV-SNP requires
+* the all the encrypted memory must be validated before the access.
+*/
+   if (sev_snp_active()) {
+   unsigned long n, paddr;
+
+   n = ((system_rom_resource.end + 1) - video_rom_resource.start) 
>> PAGE_SHIFT;
+   paddr = video_rom_resource.start;
+   early_snp_set_memory_private((unsigned long)__va(paddr), paddr, 
n);
+   }
+
/* video rom */
upper = adapter_rom_resources[0].start;
for (start = video_rom_resource.start; start < upper; start += 2048) {
-- 
2.17.1



[RFC Part1 PATCH 05/13] X86/sev-es: move few helper functions in common file

2021-03-24 Thread Brijesh Singh
Move the sev_es_terminate() and sev_es_{wr,rd}_ghcb_msr() helper functions
into a common file so that they can be used by both SEV-ES and SEV-SNP.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/boot/compressed/sev-common.c | 32 +++
 arch/x86/boot/compressed/sev-es.c | 22 ++
 arch/x86/kernel/sev-common-shared.c   | 31 ++
 arch/x86/kernel/sev-es-shared.c   | 21 +++---
 4 files changed, 68 insertions(+), 38 deletions(-)
 create mode 100644 arch/x86/boot/compressed/sev-common.c
 create mode 100644 arch/x86/kernel/sev-common-shared.c

diff --git a/arch/x86/boot/compressed/sev-common.c 
b/arch/x86/boot/compressed/sev-common.c
new file mode 100644
index ..d81ff7a3a67d
--- /dev/null
+++ b/arch/x86/boot/compressed/sev-common.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Brijesh Singh 
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * This file is not compiled stand-alone. It is includes directly in the
+ * sev-es.c and sev-snp.c.
+ */
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+   unsigned long low, high;
+
+   asm volatile("rdmsr" : "=a" (low), "=d" (high) :
+   "c" (MSR_AMD64_SEV_ES_GHCB));
+
+   return ((high << 32) | low);
+}
+
+static inline void sev_es_wr_ghcb_msr(u64 val)
+{
+   u32 low, high;
+
+	low  = val & 0xffffffffUL;
+   high = val >> 32;
+
+   asm volatile("wrmsr" : : "c" (MSR_AMD64_SEV_ES_GHCB),
+   "a"(low), "d" (high) : "memory");
+}
diff --git a/arch/x86/boot/compressed/sev-es.c 
b/arch/x86/boot/compressed/sev-es.c
index 27826c265aab..58b15b7c1aa7 100644
--- a/arch/x86/boot/compressed/sev-es.c
+++ b/arch/x86/boot/compressed/sev-es.c
@@ -54,26 +54,8 @@ static unsigned long insn_get_seg_base(struct pt_regs *regs, 
int seg_reg_idx)
return 0UL;
 }
 
-static inline u64 sev_es_rd_ghcb_msr(void)
-{
-   unsigned long low, high;
-
-   asm volatile("rdmsr" : "=a" (low), "=d" (high) :
-   "c" (MSR_AMD64_SEV_ES_GHCB));
-
-   return ((high << 32) | low);
-}
-
-static inline void sev_es_wr_ghcb_msr(u64 val)
-{
-   u32 low, high;
-
-	low  = val & 0xffffffffUL;
-   high = val >> 32;
-
-   asm volatile("wrmsr" : : "c" (MSR_AMD64_SEV_ES_GHCB),
-   "a"(low), "d" (high) : "memory");
-}
+/* Provides sev_es_{wr,rd}_ghcb_msr() */
+#include "sev-common.c"
 
 static enum es_result vc_decode_insn(struct es_em_ctxt *ctxt)
 {
diff --git a/arch/x86/kernel/sev-common-shared.c 
b/arch/x86/kernel/sev-common-shared.c
new file mode 100644
index ..6229566add6f
--- /dev/null
+++ b/arch/x86/kernel/sev-common-shared.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Brijesh Singh 
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * This file is not compiled stand-alone. It contains code shared
+ * between the pre-decompression boot code and the running Linux kernel
+ * and is included directly into both code-bases.
+ */
+
+static void sev_es_terminate(unsigned int reason)
+{
+   u64 val = GHCB_SEV_TERMINATE;
+
+   /*
+* Tell the hypervisor what went wrong - only reason-set 0 is
+* currently supported.
+*/
+   val |= GHCB_SEV_TERMINATE_REASON(0, reason);
+
+   /* Request Guest Termination from Hypvervisor */
+   sev_es_wr_ghcb_msr(val);
+   VMGEXIT();
+
+   while (true)
+   asm volatile("hlt\n" : : : "memory");
+}
+
diff --git a/arch/x86/kernel/sev-es-shared.c b/arch/x86/kernel/sev-es-shared.c
index cdc04d091242..669e15678387 100644
--- a/arch/x86/kernel/sev-es-shared.c
+++ b/arch/x86/kernel/sev-es-shared.c
@@ -14,6 +14,9 @@
 #define has_cpuflag(f) boot_cpu_has(f)
 #endif
 
+/* Provides sev_es_terminate() */
+#include "sev-common-shared.c"
+
 static bool __init sev_es_check_cpu_features(void)
 {
if (!has_cpuflag(X86_FEATURE_RDRAND)) {
@@ -24,24 +27,6 @@ static bool __init sev_es_check_cpu_features(void)
return true;
 }
 
-static void sev_es_terminate(unsigned int reason)
-{
-   u64 val = GHCB_SEV_TERMINATE;
-
-   /*
-* Tell the hypervisor what went wrong - only reason-set 0 is
-* currently supported.
-*/
-   val |= GHCB_SEV_TERMINATE_REASON(0, reason);
-
-   /* Request Guest Termination from Hypvervisor */
-   sev_es_wr_ghcb_msr(val);
-   VMGEXIT();
-
-   while (true)
-   asm volatile("hlt\n" : : : "memory");
-}
-
 static bool sev_es_negotiate_protocol(void)
 {
u64 val;
-- 
2.17.1



[RFC Part1 PATCH 06/13] x86/compressed: rescinds and validate the memory used for the GHCB

2021-03-24 Thread Brijesh Singh
Many of the integrity guarantees of SEV-SNP are enforced through the
Reverse Map Table (RMP). Each RMP entry contains the GPA at which a
particular page of DRAM should be mapped. The VMs can request the
hypervisor to add pages in the RMP table via the Page State Change VMGEXIT
defined in the GHCB specification section 2.5.1 and 4.1.6. Inside each RMP
entry is a Validated flag; this flag is automatically cleared to 0 by the
CPU hardware when a new RMP entry is created for a guest. Each VM page
can be either validated or invalidated, as indicated by the Validated
flag in the RMP entry. Memory access to a private page that is not
validated generates a #VC exception. A VM can use the PVALIDATE instruction
to validate the private page before using it.

To maintain the security guarantee of SEV-SNP guests, when transitioning
memory from private to shared, the guest must invalidate the memory range
before asking the hypervisor to change the page state to shared in the RMP
table.

After the page is mapped private in the page table, the guest must issue a
page state change VMGEXIT to make the memory private in the RMP table and
validate it. If the memory is not validated after it is added in the RMP table
as private, then a #VC exception (page-not-validated) will be raised. We do
not support the page-not-validated exception yet, so it will crash the guest.

On boot, the BIOS should have validated the entire system memory. During
the kernel decompression stage, the #VC handler uses
set_memory_decrypted() to make the GHCB page shared (i.e. clear the
encryption attribute). And while exiting from the decompression, it calls
set_memory_encrypted() to make the page private.
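
A minimal sketch of the validation step described above, using the
__pvalidate() wrapper introduced earlier in this series (the error handling
here is illustrative only):

/* Sketch: validate one 4K private page after it is made private in the RMP. */
static void example_validate_page(unsigned long vaddr)
{
	unsigned long rflags;
	int rc;

	rc = __pvalidate(vaddr, RMP_PG_SIZE_4K, 1, &rflags);

	/* rc != 0: PVALIDATE failed; CF set: the page was already validated */
	if (rc || (rflags & X86_EFLAGS_CF))
		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
}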

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/boot/compressed/Makefile   |   1 +
 arch/x86/boot/compressed/ident_map_64.c |  18 
 arch/x86/boot/compressed/sev-snp.c  | 115 
 arch/x86/boot/compressed/sev-snp.h  |  25 ++
 4 files changed, 159 insertions(+)
 create mode 100644 arch/x86/boot/compressed/sev-snp.c
 create mode 100644 arch/x86/boot/compressed/sev-snp.h

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index e0bc3988c3fa..4d422aae8a86 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -93,6 +93,7 @@ ifdef CONFIG_X86_64
vmlinux-objs-y += $(obj)/mem_encrypt.o
vmlinux-objs-y += $(obj)/pgtable_64.o
vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-es.o
+   vmlinux-objs-$(CONFIG_AMD_MEM_ENCRYPT) += $(obj)/sev-snp.o
 endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
diff --git a/arch/x86/boot/compressed/ident_map_64.c 
b/arch/x86/boot/compressed/ident_map_64.c
index f7213d0943b8..0a420ce5550f 100644
--- a/arch/x86/boot/compressed/ident_map_64.c
+++ b/arch/x86/boot/compressed/ident_map_64.c
@@ -37,6 +37,8 @@
 #include  /* For COMMAND_LINE_SIZE */
 #undef _SETUP
 
+#include "sev-snp.h"
+
 extern unsigned long get_cmd_line_ptr(void);
 
 /* Used by PAGE_KERN* macros: */
@@ -278,12 +280,28 @@ static int set_clr_page_flags(struct x86_mapping_info 
*info,
if ((set | clr) & _PAGE_ENC)
clflush_page(address);
 
+   /*
+* If the encryption attribute is being cleared, then change the page 
state to
+* shared in the RMP entry. Change of the page state must be done 
before the
+* PTE updates.
+*/
+   if (clr & _PAGE_ENC)
+   sev_snp_set_page_shared(pte_pfn(*ptep) << PAGE_SHIFT);
+
/* Update PTE */
pte = *ptep;
pte = pte_set_flags(pte, set);
pte = pte_clear_flags(pte, clr);
set_pte(ptep, pte);
 
+   /*
+* If the encryption attribute is being set, then change the page state 
to
+* private in the RMP entry. The page state must be done after the PTE
+* is updated.
+*/
+   if (set & _PAGE_ENC)
+   sev_snp_set_page_private(pte_pfn(*ptep) << PAGE_SHIFT);
+
/* Flush TLB after changing encryption attribute */
write_cr3(top_level_pgt);
 
diff --git a/arch/x86/boot/compressed/sev-snp.c 
b/arch/x86/boot/compressed/sev-snp.c
new file mode 100644
index ..5c25103b0df1
--- /dev/null
+++ b/arch/x86/boot/compressed/sev-snp.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD SEV SNP support
+ *
+ * Author: Brijesh Singh 
+ *
+ */
+
+#include "misc.h"
+#include "error.h"
+
+#include 
+#include 
+#include 
+
+#include "sev-snp.h"
+
+static bool sev_snp_enabled(void)
+{
+   unsigned long low, high;
+   u64 val;
+
+   asm volatile("rdmsr\n" : "=a&quo

[RFC Part1 PATCH 09/13] x86/kernel: add support to validate memory in early enc attribute change

2021-03-24 Thread Brijesh Singh
The early_set_memory_{encrypted,decrypted}() helpers are used for changing the
page from decrypted (shared) to encrypted (private) and vice versa.
When SEV-SNP is active, the page state transition needs to go through
additional steps.

If the page is transitioned from shared to private, then perform the
following after the encryption attribute is set in the page table:

1. Issue the page state change VMGEXIT to add the page as a private
   in the RMP table.
2. Validate the page after it is successfully added in the RMP table.

To maintain the security guarantees, if the page is transitioned from
private to shared, then perform the following before clearing the
encryption attribute from the page table.

1. Invalidate the page.
2. Issue the page state change VMGEXIT to make the page shared in the
   RMP table.

The early_set_memory_{encrypted,decrypted}() helpers can be called before the
full GHCB is set up, so use the SNP page state MSR protocol VMGEXIT defined in
GHCB specification section 2.3.1 to request the page state change in the RMP
table.
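
A minimal sketch of the MSR protocol request described above, assuming the
macros from the earlier page state change patch in this series and the
existing sev-es MSR helpers (GHCB_SEV_GHCB_RESP_CODE() comes from the
existing SEV-ES code); the real implementation is in the diff below:

/* Sketch: ask the hypervisor to change the RMP state of one 4K page. */
static void __init example_early_page_state_change(unsigned long paddr, int op)
{
	u64 val;

	/* op is SNP_PAGE_STATE_PRIVATE or SNP_PAGE_STATE_SHARED */
	sev_es_wr_ghcb_msr(GHCB_SNP_PAGE_STATE_REQ_GFN(paddr >> PAGE_SHIFT, op));
	VMGEXIT();

	val = sev_es_rd_ghcb_msr();

	/* any non-zero error code or unexpected response code is fatal */
	if (GHCB_SEV_GHCB_RESP_CODE(val) != GHCB_SNP_PAGE_STATE_CHANGE_RESP ||
	    GHCB_SNP_PAGE_STATE_RESP_VAL(val))
		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);
}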

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/sev-snp.h |  20 +++
 arch/x86/kernel/sev-snp.c  | 105 +
 arch/x86/mm/mem_encrypt.c  |  40 -
 3 files changed, 163 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index 0523eb21abd7..c4b096206062 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -63,6 +63,10 @@ struct __packed snp_page_state_change {
 #define GHCB_REGISTER_GPA_RESP 0x013UL
 #defineGHCB_REGISTER_GPA_RESP_VAL(val) ((val) >> 12)
 
+/* Macro to convert the x86 page level to the RMP level and vice versa */
+#define X86_RMP_PG_LEVEL(level)(((level) == PG_LEVEL_4K) ? 
RMP_PG_SIZE_4K : RMP_PG_SIZE_2M)
+#define RMP_X86_PG_LEVEL(level)(((level) == RMP_PG_SIZE_4K) ? 
PG_LEVEL_4K : PG_LEVEL_2M)
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
  unsigned long *rflags)
@@ -82,6 +86,11 @@ static inline int __pvalidate(unsigned long vaddr, int 
rmp_psize, int validate,
 
 void sev_snp_register_ghcb(unsigned long paddr);
 
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long 
paddr,
+   unsigned int npages);
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long 
paddr,
+   unsigned int npages);
+
 #else  /* !CONFIG_AMD_MEM_ENCRYPT */
 
 static inline int __pvalidate(unsigned long vaddr, int psize, int validate, 
unsigned long *eflags)
@@ -91,6 +100,17 @@ static inline int __pvalidate(unsigned long vaddr, int 
psize, int validate, unsi
 
 static inline void sev_snp_register_ghcb(unsigned long paddr) { }
 
+static inline void __init
+early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr, 
unsigned int npages)
+{
+}
+static inline void __init
+early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr, unsigned 
int npages)
+{
+}
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/kernel/sev-snp.c b/arch/x86/kernel/sev-snp.c
index d32225c2b653..ff9b35bfb05c 100644
--- a/arch/x86/kernel/sev-snp.c
+++ b/arch/x86/kernel/sev-snp.c
@@ -56,3 +56,108 @@ void sev_snp_register_ghcb(unsigned long paddr)
/* Restore the GHCB MSR value */
sev_es_wr_ghcb_msr(old);
 }
+
+static void sev_snp_issue_pvalidate(unsigned long vaddr, unsigned int npages, 
bool validate)
+{
+   unsigned long eflags, vaddr_end, vaddr_next;
+   int rc;
+
+   vaddr = vaddr & PAGE_MASK;
+   vaddr_end = vaddr + (npages << PAGE_SHIFT);
+
+   for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+   rc = __pvalidate(vaddr, RMP_PG_SIZE_4K, validate, );
+
+   if (rc) {
+   pr_err("Failed to validate address 0x%lx ret %d\n", 
vaddr, rc);
+   goto e_fail;
+   }
+
+   /* Check for the double validation condition */
+   if (eflags & X86_EFLAGS_CF) {
+   pr_err("Double %salidation detected (address 0x%lx)\n",
+   validate ? "v" : "inv", vaddr);
+   goto e_fail;
+   }
+
+   vaddr_next = vaddr + PAGE_SIZE;
+   }
+
+   return;
+
+e_fail:
+   /* Dump stack for the debugging purpose */
+   dump_stack();
+
+   /* Ask to terminate the guest */
+   sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);

[RFC Part1 PATCH 04/13] x86/sev-snp: define page state change VMGEXIT structure

2021-03-24 Thread Brijesh Singh
An SNP-active guest will use the page state change NAE VMGEXIT defined in
the GHCB specification section 4.1.6 to ask the hypervisor to make the
guest page private or shared in the RMP table. In addition to the
private/shared, the guest can also ask the hypervisor to split or
combine multiple 4K validated pages as a single 2M page or vice versa.
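
To make the buffer layout concrete, a minimal sketch of how a guest could
fill the structure to request that a run of 4K pages be made shared
(illustrative only; the helper name is made up here, and the actual
SVM_VMGEXIT_PAGE_STATE_CHANGE exit that consumes the buffer is issued by
later patches in the series):

static void example_fill_psc(struct snp_page_state_change *psc,
			     unsigned long paddr, unsigned int npages)
{
	unsigned int i, n = min_t(unsigned int, npages,
				  SNP_PAGE_STATE_CHANGE_MAX_ENTRY);

	if (!n)
		return;

	memset(psc, 0, sizeof(*psc));

	psc->header.cur_entry = 0;
	psc->header.end_entry = n - 1;

	for (i = 0; i < n; i++) {
		psc->entry[i].gfn = (paddr >> PAGE_SHIFT) + i;
		psc->entry[i].pagesize = RMP_PG_SIZE_4K;
		psc->entry[i].operation = SNP_PAGE_STATE_SHARED;
	}
}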

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/sev-snp.h  | 34 +
 arch/x86/include/uapi/asm/svm.h |  1 +
 2 files changed, 35 insertions(+)

diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
index 5a6d1367cab7..f514dad276f2 100644
--- a/arch/x86/include/asm/sev-snp.h
+++ b/arch/x86/include/asm/sev-snp.h
@@ -22,6 +22,40 @@
 #define RMP_PG_SIZE_2M 1
 #define RMP_PG_SIZE_4K 0
 
+/* Page State Change MSR Protocol */
+#define GHCB_SNP_PAGE_STATE_CHANGE_REQ 0x0014
+#defineGHCB_SNP_PAGE_STATE_REQ_GFN(v, o)   
(GHCB_SNP_PAGE_STATE_CHANGE_REQ | \
+((unsigned long)((o) & 
0xf) << 52) | \
+(((v) << 12) & 
0xff))
+#defineSNP_PAGE_STATE_PRIVATE  1
+#defineSNP_PAGE_STATE_SHARED   2
+#defineSNP_PAGE_STATE_PSMASH   3
+#defineSNP_PAGE_STATE_UNSMASH  4
+
+#define GHCB_SNP_PAGE_STATE_CHANGE_RESP0x0015
+#defineGHCB_SNP_PAGE_STATE_RESP_VAL(val)   ((val) >> 32)
+
+/* Page State Change NAE event */
+#define SNP_PAGE_STATE_CHANGE_MAX_ENTRY253
+struct __packed snp_page_state_header {
+   uint16_t cur_entry;
+   uint16_t end_entry;
+   uint32_t reserved;
+};
+
+struct __packed snp_page_state_entry {
+   uint64_t cur_page:12;
+   uint64_t gfn:40;
+   uint64_t operation:4;
+   uint64_t pagesize:1;
+   uint64_t reserved:7;
+};
+
+struct __packed snp_page_state_change {
+   struct snp_page_state_header header;
+   struct snp_page_state_entry entry[SNP_PAGE_STATE_CHANGE_MAX_ENTRY];
+};
+
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
  unsigned long *rflags)
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 554f75fe013c..751867aa432f 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -108,6 +108,7 @@
 #define SVM_VMGEXIT_AP_JUMP_TABLE  0x8005
 #define SVM_VMGEXIT_SET_AP_JUMP_TABLE  0
 #define SVM_VMGEXIT_GET_AP_JUMP_TABLE  1
+#define SVM_VMGEXIT_PAGE_STATE_CHANGE  0x8010
 #define SVM_VMGEXIT_UNSUPPORTED_EVENT  0x8000
 
 #define SVM_EXIT_ERR   -1
-- 
2.17.1



[RFC Part1 PATCH 08/13] x86/sev-es: register GHCB memory when SEV-SNP is active

2021-03-24 Thread Brijesh Singh
The SEV-SNP guest is required to perform GHCB GPA registration. This is
because the hypervisor may prefer that a guest use a consistent and/or
specific GPA for the GHCB associated with a vCPU. For more information,
see the GHCB specification section 2.5.2.

During boot, init_ghcb() allocates a per-CPU GHCB page. On the very first
#VC exception, the exception handler switches to using the per-CPU GHCB page
allocated during init_ghcb(). The GHCB page must be registered in
the current vcpu context.
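
For illustration, the MSR protocol exchange for the registration could look
roughly like the sketch below (not the patch itself; the request value 0x012
comes from GHCB spec section 2.5.2, and the response is checked against the
GHCB_REGISTER_GPA_RESP definitions added elsewhere in this series):

static void example_snp_register_ghcb(unsigned long paddr)
{
	u64 pfn = paddr >> PAGE_SHIFT;
	u64 old, val;

	/* Preserve the current GHCB MSR value across the exchange. */
	old = sev_es_rd_ghcb_msr();

	/* Request: GHCB GPA registration with the GFN in bits 63:12. */
	sev_es_wr_ghcb_msr(0x012ULL | (pfn << 12));
	VMGEXIT();

	val = sev_es_rd_ghcb_msr();

	/* The hypervisor must acknowledge with the same GFN. */
	if ((val & 0xfffULL) != GHCB_REGISTER_GPA_RESP ||
	    GHCB_REGISTER_GPA_RESP_VAL(val) != pfn)
		sev_es_terminate(GHCB_SEV_ES_REASON_GENERAL_REQUEST);

	sev_es_wr_ghcb_msr(old);
}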

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kernel/Makefile  |  3 ++
 arch/x86/kernel/sev-es.c  | 19 +
 arch/x86/kernel/sev-snp.c | 58 +++
 3 files changed, 80 insertions(+)
 create mode 100644 arch/x86/kernel/sev-snp.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 5eeb808eb024..2fb24c49d2e3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,6 +21,7 @@ CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 CFLAGS_REMOVE_sev-es.o = -pg
+CFLAGS_REMOVE_sev-snp.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o   := n
@@ -29,6 +30,7 @@ KASAN_SANITIZE_dumpstack_$(BITS).o:= n
 KASAN_SANITIZE_stacktrace.o:= n
 KASAN_SANITIZE_paravirt.o  := n
 KASAN_SANITIZE_sev-es.o:= n
+KASAN_SANITIZE_sev-snp.o   := n
 
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
@@ -151,6 +153,7 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER)+= 
unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)   += unwind_guess.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += sev-es.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT)  += sev-snp.o
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/sev-es.c b/arch/x86/kernel/sev-es.c
index 0bd1a0fc587e..004bf1102dc1 100644
--- a/arch/x86/kernel/sev-es.c
+++ b/arch/x86/kernel/sev-es.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -88,6 +89,13 @@ struct sev_es_runtime_data {
 * is currently unsupported in SEV-ES guests.
 */
unsigned long dr7;
+
+   /*
+* SEV-SNP requires that the GHCB must be registered before using it.
+* The flag below indicates whether the GHCB is registered; if it is
+* not registered then sev_es_get_ghcb() will perform the registration.
+*/
+   bool ghcb_registered;
 };
 
 struct ghcb_state {
@@ -196,6 +204,12 @@ static __always_inline struct ghcb *sev_es_get_ghcb(struct 
ghcb_state *state)
data->ghcb_active = true;
}
 
+   /* SEV-SNP guest requires that GHCB must be registered before using it. 
*/
+   if (sev_snp_active() && !data->ghcb_registered) {
+   sev_snp_register_ghcb(__pa(ghcb));
+   data->ghcb_registered = true;
+   }
+
return ghcb;
 }
 
@@ -569,6 +583,10 @@ static bool __init sev_es_setup_ghcb(void)
/* Alright - Make the boot-ghcb public */
	boot_ghcb = &boot_ghcb_page;
 
+   /* SEV-SNP guest requires that GHCB GPA must be registered */
+   if (sev_snp_active())
+   sev_snp_register_ghcb(__pa(&boot_ghcb_page));
+
return true;
 }
 
@@ -658,6 +676,7 @@ static void __init init_ghcb(int cpu)
 
data->ghcb_active = false;
data->backup_ghcb_active = false;
+   data->ghcb_registered = false;
 }
 
 void __init sev_es_init_vc_handling(void)
diff --git a/arch/x86/kernel/sev-snp.c b/arch/x86/kernel/sev-snp.c
new file mode 100644
index ..d32225c2b653
--- /dev/null
+++ b/arch/x86/kernel/sev-snp.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2021 Advanced Micro Devices
+ *
+ * Author: Brijesh Singh 
+ */
+
+#define pr_fmt(fmt)"SEV-SNP: " fmt
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+static inline u64 sev_es_rd_ghcb_msr(void)
+{
+   return __rdmsr(MSR_AMD64_SEV_ES_GHCB);
+}
+
+static inline void sev_es_wr_ghcb_msr(u64 val)
+{
+   u32 low, high;
+
+   low  = (u32)(val);
+   high = (u32)(val >> 32);
+
+   native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
+}
+
+/* Provides sev_es_terminate() */
+#include "sev-common-shared.c"
+
+void sev_snp_register_ghcb(unsigned long paddr)
+{
+   u64 pfn = paddr >> PAGE_SHIFT;
+   u64 old, val;
+
+   /* s

[RFC Part1 PATCH 03/13] x86: add a helper routine for the PVALIDATE instruction

2021-03-24 Thread Brijesh Singh
An SNP-active guest uses the PVALIDATE instruction to validate or
rescind the validation of a guest page’s RMP entry. Upon completion,
a return code is stored in EAX and rFLAGS bits are set based on the
return code. If the instruction completed successfully, the CF flag
indicates whether the contents of the RMP entry were changed or not.

See AMD APM Volume 3 for additional details.
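
For illustration, a caller of the new helper might use the return code and
the CF flag like this (a minimal sketch; the function name is made up and
does not appear in the patch):

static int example_validate_4k_page(unsigned long vaddr)
{
	unsigned long rflags;
	int rc;

	rc = __pvalidate(vaddr, RMP_PG_SIZE_4K, 1, &rflags);
	if (rc)
		return rc;	/* e.g. PVALIDATE_FAIL_SIZEMISMATCH */

	/* CF=1 means the RMP entry was already in the requested state. */
	if (rflags & X86_EFLAGS_CF)
		pr_warn("page 0x%lx was already validated\n", vaddr);

	return PVALIDATE_SUCCESS;
}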

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/sev-snp.h | 52 ++
 1 file changed, 52 insertions(+)
 create mode 100644 arch/x86/include/asm/sev-snp.h

diff --git a/arch/x86/include/asm/sev-snp.h b/arch/x86/include/asm/sev-snp.h
new file mode 100644
index ..5a6d1367cab7
--- /dev/null
+++ b/arch/x86/include/asm/sev-snp.h
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD SEV Secure Nested Paging Support
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh 
+ */
+
+#ifndef __ASM_SECURE_NESTED_PAGING_H
+#define __ASM_SECURE_NESTED_PAGING_H
+
+#ifndef __ASSEMBLY__
+#include  /* native_save_fl() */
+
+/* Return code of __pvalidate */
+#define PVALIDATE_SUCCESS  0
+#define PVALIDATE_FAIL_INPUT   1
+#define PVALIDATE_FAIL_SIZEMISMATCH6
+
+/* RMP page size */
+#define RMP_PG_SIZE_2M 1
+#define RMP_PG_SIZE_4K 0
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+static inline int __pvalidate(unsigned long vaddr, int rmp_psize, int validate,
+ unsigned long *rflags)
+{
+   unsigned long flags;
+   int rc;
+
+   asm volatile(".byte 0xF2, 0x0F, 0x01, 0xFF\n\t"
+"pushf; pop %0\n\t"
+: "=rm"(flags), "=a"(rc)
+: "a"(vaddr), "c"(rmp_psize), "d"(validate)
+: "memory", "cc");
+
+   *rflags = flags;
+   return rc;
+}
+
+#else  /* !CONFIG_AMD_MEM_ENCRYPT */
+
+static inline int __pvalidate(unsigned long vaddr, int psize, int validate, 
unsigned long *eflags)
+{
+   return 0;
+}
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif /* __ASSEMBLY__ */
+#endif  /* __ASM_SECURE_NESTED_PAGING_H */
-- 
2.17.1



[RFC Part1 PATCH 02/13] x86/mm: add sev_snp_active() helper

2021-03-24 Thread Brijesh Singh
The sev_snp_active() helper can be used by the guest to query whether the
SEV-SNP (Secure Nested Paging) feature is active.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/mem_encrypt.h | 2 ++
 arch/x86/include/asm/msr-index.h   | 2 ++
 arch/x86/mm/mem_encrypt.c  | 9 +
 3 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 31c4df123aa0..d99aa260d328 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -54,6 +54,7 @@ void __init sev_es_init_vc_handling(void);
 bool sme_active(void);
 bool sev_active(void);
 bool sev_es_active(void);
+bool sev_snp_active(void);
 
 #define __bss_decrypted __section(".bss..decrypted")
 
@@ -79,6 +80,7 @@ static inline void sev_es_init_vc_handling(void) { }
 static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
 static inline bool sev_es_active(void) { return false; }
+static inline bool sev_snp_active(void) { return false; }
 
 static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 
0; }
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 546d6ecf0a35..b03694e116fe 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -477,8 +477,10 @@
 #define MSR_AMD64_SEV  0xc0010131
 #define MSR_AMD64_SEV_ENABLED_BIT  0
 #define MSR_AMD64_SEV_ES_ENABLED_BIT   1
+#define MSR_AMD64_SEV_SNP_ENABLED_BIT  2
 #define MSR_AMD64_SEV_ENABLED  BIT_ULL(MSR_AMD64_SEV_ENABLED_BIT)
 #define MSR_AMD64_SEV_ES_ENABLED   BIT_ULL(MSR_AMD64_SEV_ES_ENABLED_BIT)
+#define MSR_AMD64_SEV_SNP_ENABLED  BIT_ULL(MSR_AMD64_SEV_SNP_ENABLED_BIT)
 
 #define MSR_AMD64_VIRT_SPEC_CTRL   0xc001011f
 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index c3d5f0236f35..5bd50008fc9a 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -390,6 +390,11 @@ bool noinstr sev_es_active(void)
return sev_status & MSR_AMD64_SEV_ES_ENABLED;
 }
 
+bool sev_snp_active(void)
+{
+   return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+}
+
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 bool force_dma_unencrypted(struct device *dev)
 {
@@ -462,6 +467,10 @@ static void print_mem_encrypt_feature_info(void)
if (sev_es_active())
pr_cont(" SEV-ES");
 
+   /* Secure Nested Paging */
+   if (sev_snp_active())
+   pr_cont(" SEV-SNP");
+
pr_cont("\n");
 }
 
-- 
2.17.1



[RFC Part1 PATCH 01/13] x86/cpufeatures: Add SEV-SNP CPU feature

2021-03-24 Thread Brijesh Singh
Add CPU feature detection for Secure Encrypted Virtualization with
Secure Nested Paging. This feature adds a strong memory integrity
protection to help prevent malicious hypervisor-based attacks like
data replay, memory re-mapping, and more.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/amd.c  | 3 ++-
 arch/x86/kernel/cpu/scattered.c| 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..a5b369f10bcd 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -238,6 +238,7 @@
 #define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers 
VMMCALL hypercall instruction */
 #define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted 
Virtualization - Encrypted State */
 #define X86_FEATURE_VM_PAGE_FLUSH  ( 8*32+21) /* "" VM Page Flush MSR is 
supported */
+#define X86_FEATURE_SEV_SNP( 8*32+22) /* AMD Secure Encrypted 
Virtualization - Secure Nested Paging */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE   ( 9*32+ 0) /* RDFSBASE, WRFSBASE, 
RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f8ca66f3d861..39f7a4b5b04c 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -586,7 +586,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 *If BIOS has not enabled SME then don't advertise the
 *SME feature (set in scattered.c).
 *   For SEV: If BIOS has not enabled SEV then don't advertise the
-*SEV and SEV_ES feature (set in scattered.c).
+*SEV, SEV_ES and SEV_SNP feature (set in scattered.c).
 *
 *   In all cases, since support for SME and SEV requires long mode,
 *   don't advertise the feature under CONFIG_X86_32.
@@ -618,6 +618,7 @@ static void early_detect_mem_encrypt(struct cpuinfo_x86 *c)
 clear_sev:
setup_clear_cpu_cap(X86_FEATURE_SEV);
setup_clear_cpu_cap(X86_FEATURE_SEV_ES);
+   setup_clear_cpu_cap(X86_FEATURE_SEV_SNP);
}
 }
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 236924930bf0..eaec1278dc2e 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -45,6 +45,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_SEV_ES,   CPUID_EAX,  3, 0x801f, 0 },
{ X86_FEATURE_SME_COHERENT, CPUID_EAX, 10, 0x801f, 0 },
{ X86_FEATURE_VM_PAGE_FLUSH,CPUID_EAX,  2, 0x801f, 0 },
+   { X86_FEATURE_SEV_SNP,  CPUID_EAX,  4, 0x801f, 0 },
{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.17.1



[RFC Part1 PATCH 00/13] Add AMD Secure Nested Paging (SEV-SNP) Guest Support

2021-03-24 Thread Brijesh Singh
This part of the Secure Encrypted Virtualization - Secure Nested Paging
(SEV-SNP) series focuses on the changes required in a guest OS for SEV-SNP
support.

SEV-SNP builds upon existing SEV and SEV-ES functionality while adding
new hardware-based memory protections. SEV-SNP adds strong memory integrity
protection to help prevent malicious hypervisor-based attacks like data
replay, memory re-mapping and more in order to create an isolated memory
encryption environment.
 
This series provides the basic building blocks to support booting the SEV-SNP
VMs, it does not cover all the security enhancement introduced by the SEV-SNP
such as interrupt protection.

Many of the integrity guarantees of SEV-SNP are enforced through a new
structure called the Reverse Map Table (RMP). Adding a new page to an SEV-SNP
VM requires a 2-step process. First, the hypervisor assigns a page to the
guest using the new RMPUPDATE instruction. This transitions the page to
guest-invalid. Second, the guest validates the page using the new PVALIDATE
instruction. SEV-SNP VMs can use the new "Page State Change Request NAE"
event defined in the GHCB specification to ask the hypervisor to add or
remove a page from the RMP table.
 
Each page assigned to the SEV-SNP VM can either be validated or unvalidated,
as indicated by the Validated flag in the page's RMP entry. There are two
approaches that can be taken for the page validation: Pre-validation and
Lazy Validation.
  
Under pre-validation, pages are validated prior to first use. Under lazy
validation, pages are validated when first accessed. An access to an
unvalidated page results in a #VC exception, at which time the exception
handler may validate the page. Lazy validation requires careful tracking of
the validated pages to avoid validating the same GPA more than once. The
recently introduced "Unaccepted" memory type can be used to communicate the
unvalidated memory ranges to the guest OS.

At this time we only support pre-validation: the OVMF guest BIOS
validates the entire RAM before control is handed over to the guest kernel.
The early_set_memory_{encrypt,decrypt} and set_memory_{encrypt,decrypt}
helpers are enlightened to perform the page validation or invalidation while
setting or clearing the encryption attribute in the page table.
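
For illustration, the private->shared direction must follow a specific order;
a rough pseudo-code sketch of that sequence (the helper name is made up and
does not exist in the series) is:

/*
 * Pseudo-code sketch of the private -> shared conversion order described
 * above: the page must be invalidated and changed to shared in the RMP
 * table *before* the C-bit is cleared from the guest page table.
 */
static int example_make_page_shared(unsigned long vaddr)
{
	unsigned long rflags;
	int rc;

	/* 1. Rescind the validation of the page. */
	rc = __pvalidate(vaddr, RMP_PG_SIZE_4K, 0, &rflags);
	if (rc)
		return rc;

	/* 2. Ask the hypervisor to mark the page shared in the RMP table   */
	/*    (page state change VMGEXIT with SNP_PAGE_STATE_SHARED).       */

	/* 3. Only now clear the encryption attribute in the page table     */
	/*    (the set_memory_decrypted() path).                            */

	return 0;
}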

This series does not provide support for the following SEV-SNP features yet:

* CPUID filtering
* Driver to query attestation report
* AP bring up using the new SEV-SNP NAE
* Lazy validation
* Interrupt security

The series is based on kvm/master commit:
  87aa9ec939ec KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs

The complete source is available at
 https://github.com/AMDESE/linux/tree/sev-snp-part-1-rfc1

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Joerg Roedel 
Cc: "H. Peter Anvin" 
Cc: Tony Luck 
Cc: Dave Hansen 
Cc: "Peter Zijlstra (Intel)" 
Cc: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org

Additional resources
-
SEV-SNP whitepaper
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
 
APM 2: https://www.amd.com/system/files/TechDocs/24593.pdf
(section 15.36)

GHCB spec v2:
  The draft specification is posted on AMD-SEV-SNP mailing list:
   https://lists.suse.com/mailman/private/amd-sev-snp/

  Copy of draft spec is also available at 
  
https://github.com/AMDESE/AMDSEV/blob/sev-snp-devel/docs/56421-Guest_Hypervisor_Communication_Block_Standardization.pdf

GHCB spec v1:
SEV-SNP firmware specification:
 https://developer.amd.com/sev/

Brijesh Singh (13):
  x86/cpufeatures: Add SEV-SNP CPU feature
  x86/mm: add sev_snp_active() helper
  x86: add a helper routine for the PVALIDATE instruction
  x86/sev-snp: define page state change VMGEXIT structure
  X86/sev-es: move few helper functions in common file
  x86/compressed: rescinds and validate the memory used for the GHCB
  x86/compressed: register GHCB memory when SNP is active
  x86/sev-es: register GHCB memory when SEV-SNP is active
  x86/kernel: add support to validate memory in early enc attribute
change
  X86: kernel: make the bss.decrypted section shared in RMP table
  x86/kernel: validate rom memory before accessing when SEV-SNP is
active
  x86/sev-es: make GHCB get and put helper accessible outside
  x86/kernel: add support to validate memory when changing C-bit

 arch/x86/boot/compressed/Makefile   |   1 +
 arch/x86/boot/compressed/ident_map_64.c |  18 ++
 arch/x86/boot/compressed/sev-common.c   |  32 +++
 arch/x86/boot/compressed/sev-es.c   |  26 +--
 arch/x86/boot/compressed/sev-snp.c  | 141 +
 arch/x86/boot/compressed/sev-snp.h  |  25 +++
 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/include/asm/mem_encrypt.h  |   2 +
 arch/x86/include/asm/msr-index.h|   2 +
 arch/x86/include/asm/sev-es.h   |  11 +
 arch/x86/include/asm/sev-snp.h  

Re: [PATCH v10 10/16] KVM: x86: Introduce KVM_GET_SHARED_PAGES_LIST ioctl

2021-03-08 Thread Brijesh Singh


On 3/8/21 1:51 PM, Sean Christopherson wrote:
> On Mon, Mar 08, 2021, Ashish Kalra wrote:
>> On Fri, Feb 26, 2021 at 09:44:41AM -0800, Sean Christopherson wrote:
>>> +Will and Quentin (arm64)
>>>
>>> Moving the non-KVM x86 folks to bcc, I don't they care about KVM details at 
>>> this
>>> point.
>>>
>>> On Fri, Feb 26, 2021, Ashish Kalra wrote:
 On Thu, Feb 25, 2021 at 02:59:27PM -0800, Steve Rutherford wrote:
> On Thu, Feb 25, 2021 at 12:20 PM Ashish Kalra  
> wrote:
> Thanks for grabbing the data!
>
> I am fine with both paths. Sean has stated an explicit desire for
> hypercall exiting, so I think that would be the current consensus.
>>> Yep, though it'd be good to get Paolo's input, too.
>>>
> If we want to do hypercall exiting, this should be in a follow-up
> series where we implement something more generic, e.g. a hypercall
> exiting bitmap or hypercall exit list. If we are taking the hypercall
> exit route, we can drop the kvm side of the hypercall.
>>> I don't think this is a good candidate for arbitrary hypercall 
>>> interception.  Or
>>> rather, I think hypercall interception should be an orthogonal 
>>> implementation.
>>>
>>> The guest, including guest firmware, needs to be aware that the hypercall is
>>> supported, and the ABI needs to be well-defined.  Relying on userspace VMMs 
>>> to
>>> implement a common ABI is an unnecessary risk.
>>>
>>> We could make KVM's default behavior be a nop, i.e. have KVM enforce the 
>>> ABI but
>>> require further VMM intervention.  But, I just don't see the point, it would
>>> save only a few lines of code.  It would also limit what KVM could do in the
>>> future, e.g. if KVM wanted to do its own bookkeeping _and_ exit to 
>>> userspace,
>>> then mandatory interception would essentially make it impossible for KVM to 
>>> do
>>> bookkeeping while still honoring the interception request.
>>>
>>> However, I do think it would make sense to have the userspace exit be a 
>>> generic
>>> exit type.  But hey, we already have the necessary ABI defined for that!  
>>> It's
>>> just not used anywhere.
>>>
>>> /* KVM_EXIT_HYPERCALL */
>>> struct {
>>> __u64 nr;
>>> __u64 args[6];
>>> __u64 ret;
>>> __u32 longmode;
>>> __u32 pad;
>>> } hypercall;
>>>
>>>
> Userspace could also handle the MSR using MSR filters (would need to
> confirm that).  Then userspace could also be in control of the cpuid bit.
>>> An MSR is not a great fit; it's x86 specific and limited to 64 bits of data.
>>> The data limitation could be fudged by shoving data into non-standard GPRs, 
>>> but
>>> that will result in truly heinous guest code, and extensibility issues.
>>>
>>> The data limitation is a moot point, because the x86-only thing is a deal
>>> breaker.  arm64's pKVM work has a near-identical use case for a guest to 
>>> share
>>> memory with a host.  I can't think of a clever way to avoid having to 
>>> support
>>> TDX's and SNP's hypervisor-agnostic variants, but we can at least not have
>>> multiple KVM variants.
>>>
>> Potentially, there is another reason for in-kernel hypercall handling
>> considering SEV-SNP. In case of SEV-SNP the RMP table tracks the state
>> of each guest page, for instance pages in hypervisor state, i.e., pages
>> with C=0 and pages in guest valid state with C=1.
>>
>> Now, there shouldn't be a need for page encryption status hypercalls on 
>> SEV-SNP as KVM can track & reference guest page status directly using 
>> the RMP table.
> Relying on the RMP table itself would require locking the RMP table for an
> extended duration, and walking the entire RMP to find shared pages would be
> very inefficient.
>
>> As KVM maintains the RMP table, therefore we will need SET/GET type of
>> interfaces to provide the guest page encryption status to userspace.
> Hrm, somehow I temporarily forgot about SNP and TDX adding their own 
> hypercalls
> for converting between shared and private.  And in the case of TDX, the 
> hypercall
> can't be trusted, i.e. is just a hint, otherwise the guest could induce a #MC 
> in
> the host.
>
> But, the different guest behavior doesn't require KVM to maintain a list/tree,
> e.g. adding a dedicated KVM_EXIT_* for notifying userspace of page encryption
> status changes would also suffice.  
>
> Actually, that made me think of another argument against maintaining a list in
> KVM: there's no way to notify userspace that a page's status has changed.
> Userspace would need to query KVM to do GET_LIST after every GET_DIRTY.
> Obviously not a huge issue, but it does make migration slightly less 
> efficient.
>
> On a related topic, there are fatal race conditions that will require careful
> coordination between guest and host, and will effectively be wired into the 
> ABI.
> SNP and TDX don't suffer these issues because host awareness of status is 
> atomic
> with respect to the guest actually writing the page with the new 

Re: [PATCH] crypto: ccp - Don't initialize SEV support without the SEV feature

2021-03-04 Thread Brijesh Singh


On 3/3/21 4:31 PM, Tom Lendacky wrote:
> From: Tom Lendacky 
>
> If SEV has been disabled (e.g. through BIOS), the driver probe will still
> issue SEV firmware commands. The SEV INIT firmware command will return an
> error in this situation, but the error code is a general error code that
> doesn't highlight the exact reason.
>
> Add a check for X86_FEATURE_SEV in sev_dev_init() and emit a meaningful
> message and skip attempting to initialize the SEV firmware if the feature
> is not enabled. Since building the SEV code is dependent on X86_64, adding
> the check won't cause any build problems.
>
> Cc: John Allen 
> Cc: Brijesh Singh 
> Signed-off-by: Tom Lendacky 


Reviewed-By: Brijesh Singh 

> ---
>  drivers/crypto/ccp/sev-dev.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
> index 476113e12489..b9fc8d7aca73 100644
> --- a/drivers/crypto/ccp/sev-dev.c
> +++ b/drivers/crypto/ccp/sev-dev.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -971,6 +972,11 @@ int sev_dev_init(struct psp_device *psp)
>   struct sev_device *sev;
>   int ret = -ENOMEM;
>  
> + if (!boot_cpu_has(X86_FEATURE_SEV)) {
> + dev_info_once(dev, "SEV: memory encryption not enabled by 
> BIOS\n");
> + return 0;
> + }
> +
>   sev = devm_kzalloc(dev, sizeof(*sev), GFP_KERNEL);
>   if (!sev)
>   goto e_err;


Re: [PATCH v2] KVM/SVM: add support for SEV attestation command

2021-01-22 Thread Brijesh Singh
Hi Paolo,

Do you have any feedback on this ? It will be great if we can queue this
for 5.11.

-Brijesh

On 1/4/21 9:17 AM, Brijesh Singh wrote:
> The SEV FW version >= 0.23 added a new command that can be used to query
> the attestation report containing the SHA-256 digest of the guest memory
> encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands and
> sign the report with the Platform Endorsement Key (PEK).
>
> See the SEV FW API spec section 6.8 for more details.
>
> Note there already exist a command (KVM_SEV_LAUNCH_MEASURE) that can be
> used to get the SHA-256 digest. The main difference between the
> KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the latter
> can be called while the guest is running and the measurement value is
> signed with PEK.
>
> Cc: James Bottomley 
> Cc: Tom Lendacky 
> Cc: David Rientjes 
> Cc: Paolo Bonzini 
> Cc: Sean Christopherson 
> Cc: Borislav Petkov 
> Cc: John Allen 
> Cc: Herbert Xu 
> Cc: linux-cry...@vger.kernel.org
> Reviewed-by: Tom Lendacky 
> Acked-by: David Rientjes 
> Tested-by: James Bottomley 
> Signed-off-by: Brijesh Singh 
> ---
> v2:
>   * Fix documentation typo
>
>  .../virt/kvm/amd-memory-encryption.rst| 21 ++
>  arch/x86/kvm/svm/sev.c| 71 +++
>  drivers/crypto/ccp/sev-dev.c  |  1 +
>  include/linux/psp-sev.h   | 17 +
>  include/uapi/linux/kvm.h  |  8 +++
>  5 files changed, 118 insertions(+)
>
> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
> b/Documentation/virt/kvm/amd-memory-encryption.rst
> index 09a8f2a34e39..469a6308765b 100644
> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
> @@ -263,6 +263,27 @@ Returns: 0 on success, -negative on error
>  __u32 trans_len;
>  };
>  
> +10. KVM_SEV_GET_ATTESTATION_REPORT
> +--
> +
> +The KVM_SEV_GET_ATTESTATION_REPORT command can be used by the hypervisor to 
> query the attestation
> +report containing the SHA-256 digest of the guest memory and VMSA passed 
> through the KVM_SEV_LAUNCH
> +commands and signed with the PEK. The digest returned by the command should 
> match the digest
> +used by the guest owner with the KVM_SEV_LAUNCH_MEASURE.
> +
> +Parameters (in): struct kvm_sev_attestation
> +
> +Returns: 0 on success, -negative on error
> +
> +::
> +
> +struct kvm_sev_attestation_report {
> +__u8 mnonce[16];/* A random mnonce that will be 
> placed in the report */
> +
> +__u64 uaddr;/* userspace address where the 
> report should be copied */
> +__u32 len;
> +};
> +
>  References
>  ==
>  
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 566f4d18185b..c4d3ee6be362 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -927,6 +927,74 @@ static int sev_launch_secret(struct kvm *kvm, struct 
> kvm_sev_cmd *argp)
>   return ret;
>  }
>  
> +static int sev_get_attestation_report(struct kvm *kvm, struct kvm_sev_cmd 
> *argp)
> +{
> + void __user *report = (void __user *)(uintptr_t)argp->data;
> + struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
> + struct sev_data_attestation_report *data;
> + struct kvm_sev_attestation_report params;
> + void __user *p;
> + void *blob = NULL;
> + int ret;
> +
> + if (!sev_guest(kvm))
> + return -ENOTTY;
> +
> + if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, 
> sizeof(params)))
> + return -EFAULT;
> +
> + data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
> + if (!data)
> + return -ENOMEM;
> +
> + /* User wants to query the blob length */
> + if (!params.len)
> + goto cmd;
> +
> + p = (void __user *)(uintptr_t)params.uaddr;
> + if (p) {
> + if (params.len > SEV_FW_BLOB_MAX_SIZE) {
> + ret = -EINVAL;
> + goto e_free;
> + }
> +
> + ret = -ENOMEM;
> + blob = kmalloc(params.len, GFP_KERNEL);
> + if (!blob)
> + goto e_free;
> +
> + data->address = __psp_pa(blob);
> + data->len = params.len;
> + memcpy(data->mnonce, params.mnonce, sizeof(params.mnonce));
> + }
> +cmd:
> + data->handle = sev->handle;
> + ret = sev_issue_cmd(kvm, SEV_CMD_ATTESTATION_REPORT, data, 
> >error

Re: [PATCH v2 12/14] KVM: SVM: Drop redundant svm_sev_enabled() helper

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:37 PM, Sean Christopherson wrote:
> Replace calls to svm_sev_enabled() with direct checks on sev_enabled, or
> in the case of svm_mem_enc_op, simply drop the call to svm_sev_enabled().
> This effectively replaces checks against a valid max_sev_asid with checks
> against sev_enabled.  sev_enabled is forced off by sev_hardware_setup()
> if max_sev_asid is invalid, all call sites are guaranteed to run after
> sev_hardware_setup(), and all of the checks care about SEV being fully
> enabled (as opposed to intentionally handling the scenario where
> max_sev_asid is valid but SEV enabling fails due to OOM).
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/sev.c | 6 +++---
>  arch/x86/kvm/svm/svm.h | 5 -
>  2 files changed, 3 insertions(+), 8 deletions(-)


Thanks

Reviewed-by: Brijesh Singh 


> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a2c3e2d42a7f..7e14514dd083 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1057,7 +1057,7 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   struct kvm_sev_cmd sev_cmd;
>   int r;
>  
> - if (!svm_sev_enabled() || !sev_enabled)
> + if (!sev_enabled)
>   return -ENOTTY;
>  
>   if (!argp)
> @@ -1321,7 +1321,7 @@ void __init sev_hardware_setup(void)
>  
>  void sev_hardware_teardown(void)
>  {
> - if (!svm_sev_enabled())
> + if (!sev_enabled)
>   return;
>  
>   bitmap_free(sev_asid_bitmap);
> @@ -1332,7 +1332,7 @@ void sev_hardware_teardown(void)
>  
>  int sev_cpu_init(struct svm_cpu_data *sd)
>  {
> - if (!svm_sev_enabled())
> + if (!sev_enabled)
>   return 0;
>  
>   sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1, sizeof(void *),
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 4eb4bab0ca3e..8cb4395b58a0 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -569,11 +569,6 @@ void svm_vcpu_unblocking(struct kvm_vcpu *vcpu);
>  
>  extern unsigned int max_sev_asid;
>  
> -static inline bool svm_sev_enabled(void)
> -{
> - return IS_ENABLED(CONFIG_KVM_AMD_SEV) ? max_sev_asid : 0;
> -}
> -
>  void sev_vm_destroy(struct kvm *kvm);
>  int svm_mem_enc_op(struct kvm *kvm, void __user *argp);
>  int svm_register_enc_region(struct kvm *kvm,


Re: [PATCH v2 13/14] KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:37 PM, Sean Christopherson wrote:
> Remove the forward declaration of sev_flush_asids(), which is only a few
> lines above the function itself.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/sev.c | 1 -
>  1 file changed, 1 deletion(-)


Thanks

Reviewed-by: Brijesh Singh 


> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 7e14514dd083..23a4bead4a82 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -41,7 +41,6 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
>  #endif /* CONFIG_KVM_AMD_SEV */
>  
>  static u8 sev_enc_bit;
> -static int sev_flush_asids(void);
>  static DECLARE_RWSEM(sev_deactivate_lock);
>  static DEFINE_MUTEX(sev_bitmap_lock);
>  unsigned int max_sev_asid;


Re: [PATCH v2 11/14] KVM: SVM: Move SEV VMCB tracking allocation to sev.c

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:37 PM, Sean Christopherson wrote:
> Move the allocation of the SEV VMCB array to sev.c to help pave the way
> toward encapsulating SEV enabling wholly within sev.c.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/sev.c | 13 +
>  arch/x86/kvm/svm/svm.c | 17 -
>  arch/x86/kvm/svm/svm.h |  1 +
>  3 files changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 1a143340103e..a2c3e2d42a7f 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1330,6 +1330,19 @@ void sev_hardware_teardown(void)
>   sev_flush_asids();
>  }
>  
> +int sev_cpu_init(struct svm_cpu_data *sd)
> +{
> + if (!svm_sev_enabled())
> + return 0;
> +
> + sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1, sizeof(void *),
> +   GFP_KERNEL | __GFP_ZERO);


I saw Tom recommended to use kzalloc.. instead of __GFP_ZERO in previous
patch. With that fixed,

Reviewed-by: Brijesh Singh 


> + if (!sd->sev_vmcbs)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
>  /*
>   * Pages used by hardware to hold guest encrypted state must be flushed 
> before
>   * returning them to the system.
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index bb7b99743bea..89b95fb87a0c 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -552,23 +552,22 @@ static void svm_cpu_uninit(int cpu)
>  static int svm_cpu_init(int cpu)
>  {
>   struct svm_cpu_data *sd;
> + int ret;
>  
>   sd = kzalloc(sizeof(struct svm_cpu_data), GFP_KERNEL);
>   if (!sd)
>   return -ENOMEM;
>   sd->cpu = cpu;
>   sd->save_area = alloc_page(GFP_KERNEL);
> - if (!sd->save_area)
> + if (!sd->save_area) {
> + ret = -ENOMEM;
>   goto free_cpu_data;
> + }
>   clear_page(page_address(sd->save_area));
>  
> - if (svm_sev_enabled()) {
> - sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1,
> -   sizeof(void *),
> -   GFP_KERNEL | __GFP_ZERO);
> - if (!sd->sev_vmcbs)
> - goto free_save_area;
> - }
> + ret = sev_cpu_init(sd);
> + if (ret)
> + goto free_save_area;
>  
>   per_cpu(svm_data, cpu) = sd;
>  
> @@ -578,7 +577,7 @@ static int svm_cpu_init(int cpu)
>   __free_page(sd->save_area);
>  free_cpu_data:
>   kfree(sd);
> - return -ENOMEM;
> + return ret;
>  
>  }
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 8e169835f52a..4eb4bab0ca3e 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -583,6 +583,7 @@ int svm_unregister_enc_region(struct kvm *kvm,
>  void pre_sev_run(struct vcpu_svm *svm, int cpu);
>  void __init sev_hardware_setup(void);
>  void sev_hardware_teardown(void);
> +int sev_cpu_init(struct svm_cpu_data *sd);
>  void sev_free_vcpu(struct kvm_vcpu *vcpu);
>  int sev_handle_vmgexit(struct vcpu_svm *svm);
>  int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int 
> in);


Re: [PATCH v2 10/14] KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:37 PM, Sean Christopherson wrote:
> Query max_sev_asid directly after setting it instead of bouncing through
> its wrapper, svm_sev_enabled().  Using the wrapper is unnecessary
> obfuscation.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/sev.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

thanks

Reviewed-by: Brijesh Singh 

>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 02a66008e9b9..1a143340103e 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -1278,8 +1278,7 @@ void __init sev_hardware_setup(void)
>  
>   /* Maximum number of encrypted guests supported simultaneously */
>   max_sev_asid = ecx;
> -
> - if (!svm_sev_enabled())
> + if (!max_sev_asid)
>   goto out;
>  
>   /* Minimum ASID value that should be used for SEV guest */


Re: [PATCH v2 09/14] KVM: SVM: Unconditionally invoke sev_hardware_teardown()

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:37 PM, Sean Christopherson wrote:
> Remove the redundant svm_sev_enabled() check when calling
> sev_hardware_teardown(), the teardown helper itself does the check.
> Removing the check from svm.c will eventually allow dropping
> svm_sev_enabled() entirely.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/svm.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)


Reviewed-by: Brijesh Singh 


>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f89f702b2a58..bb7b99743bea 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -887,8 +887,7 @@ static void svm_hardware_teardown(void)
>  {
>   int cpu;
>  
> - if (svm_sev_enabled())
> - sev_hardware_teardown();
> + sev_hardware_teardown();
>  
>   for_each_possible_cpu(cpu)
>   svm_cpu_uninit(cpu);


Re: [PATCH v2 08/14] KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:37 PM, Sean Christopherson wrote:
> Define sev_enabled and sev_es_enabled as 'false' and explicitly #ifdef
> out all of sev_hardware_setup() if CONFIG_KVM_AMD_SEV=n.  This kills
> three birds at once:
>
>   - Makes sev_enabled and sev_es_enabled off by default if
> CONFIG_KVM_AMD_SEV=n.  Previously, they could be on by default if
> CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=y, regardless of KVM SEV
> support.
>
>   - Hides the sev and sev_es module params when CONFIG_KVM_AMD_SEV=n.
>
>   - Resolves a false positive -Wnonnull in __sev_recycle_asids() that is
> currently masked by the equivalent IS_ENABLED(CONFIG_KVM_AMD_SEV)
> check in svm_sev_enabled(), which will be dropped in a future patch.
>
> Cc: Tom Lendacky 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/sev.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)

thanks

Reviewed-by: Brijesh Singh 


>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a024edabaca5..02a66008e9b9 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -28,12 +28,17 @@
>  #define __ex(x) __kvm_handle_fault_on_reboot(x)
>  
>  /* enable/disable SEV support */
> +#ifdef CONFIG_KVM_AMD_SEV
>  static bool sev_enabled = 
> IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
>  module_param_named(sev, sev_enabled, bool, 0444);
>  
>  /* enable/disable SEV-ES support */
>  static bool sev_es_enabled = 
> IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
>  module_param_named(sev_es, sev_es_enabled, bool, 0444);
> +#else
> +#define sev_enabled false
> +#define sev_es_enabled false
> +#endif /* CONFIG_KVM_AMD_SEV */
>  
>  static u8 sev_enc_bit;
>  static int sev_flush_asids(void);
> @@ -1253,11 +1258,12 @@ void sev_vm_destroy(struct kvm *kvm)
>  
>  void __init sev_hardware_setup(void)
>  {
> +#ifdef CONFIG_KVM_AMD_SEV
>   unsigned int eax, ebx, ecx, edx;
>   bool sev_es_supported = false;
>   bool sev_supported = false;
>  
> - if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev_enabled)
> + if (!sev_enabled)
>   goto out;
>  
>   /* Does the CPU support SEV? */
> @@ -1311,6 +1317,7 @@ void __init sev_hardware_setup(void)
>  out:
>   sev_enabled = sev_supported;
>   sev_es_enabled = sev_es_supported;
> +#endif
>  }
>  
>  void sev_hardware_teardown(void)


Re: [PATCH v2 07/14] KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:37 PM, Sean Christopherson wrote:
> Rename sev and sev_es to sev_enabled and sev_es_enabled respectively to
> better align with other KVM terminology, and to avoid pseudo-shadowing
> when the variables are moved to sev.c in a future patch ('sev' is often
> used for local struct kvm_sev_info pointers.
>
> No functional change intended.
>
> Acked-by: Tom Lendacky 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/sev.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)

thanks

Reviewed-by: Brijesh Singh 


>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 8ba93b8fa435..a024edabaca5 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -28,12 +28,12 @@
>  #define __ex(x) __kvm_handle_fault_on_reboot(x)
>  
>  /* enable/disable SEV support */
> -static int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> -module_param(sev, int, 0444);
> +static bool sev_enabled = 
> IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> +module_param_named(sev, sev_enabled, bool, 0444);
>  
>  /* enable/disable SEV-ES support */
> -static int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> -module_param(sev_es, int, 0444);
> +static bool sev_es_enabled = 
> IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> +module_param_named(sev_es, sev_es_enabled, bool, 0444);
>  
>  static u8 sev_enc_bit;
>  static int sev_flush_asids(void);
> @@ -213,7 +213,7 @@ static int sev_guest_init(struct kvm *kvm, struct 
> kvm_sev_cmd *argp)
>  
>  static int sev_es_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
>  {
> - if (!sev_es)
> + if (!sev_es_enabled)
>   return -ENOTTY;
>  
>   to_kvm_svm(kvm)->sev_info.es_active = true;
> @@ -1052,7 +1052,7 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
>   struct kvm_sev_cmd sev_cmd;
>   int r;
>  
> - if (!svm_sev_enabled() || !sev)
> + if (!svm_sev_enabled() || !sev_enabled)
>   return -ENOTTY;
>  
>   if (!argp)
> @@ -1257,7 +1257,7 @@ void __init sev_hardware_setup(void)
>   bool sev_es_supported = false;
>   bool sev_supported = false;
>  
> - if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev)
> + if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev_enabled)
>   goto out;
>  
>   /* Does the CPU support SEV? */
> @@ -1294,7 +1294,7 @@ void __init sev_hardware_setup(void)
>   sev_supported = true;
>  
>   /* SEV-ES support requested? */
> - if (!sev_es)
> + if (!sev_es_enabled)
>   goto out;
>  
>   /* Does the CPU support SEV-ES? */
> @@ -1309,8 +1309,8 @@ void __init sev_hardware_setup(void)
>   sev_es_supported = true;
>  
>  out:
> - sev = sev_supported;
> - sev_es = sev_es_supported;
> + sev_enabled = sev_supported;
> + sev_es_enabled = sev_es_supported;
>  }
>  
>  void sev_hardware_teardown(void)


Re: [PATCH v2 06/14] x86/sev: Drop redundant and potentially misleading 'sev_enabled'

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:37 PM, Sean Christopherson wrote:
> Drop the sev_enabled flag and switch its one user over to sev_active().
> sev_enabled was made redundant with the introduction of sev_status in
> commit b57de6cd1639 ("x86/sev-es: Add SEV-ES Feature Detection").
> sev_enabled and sev_active() are guaranteed to be equivalent, as each is
> true iff 'sev_status & MSR_AMD64_SEV_ENABLED' is true, and are only ever
> written in tandem (ignoring compressed boot's version of sev_status).
>
> Removing sev_enabled avoids confusion over whether it refers to the guest
> or the host, and will also allow KVM to usurp "sev_enabled" for its own
> purposes.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/include/asm/mem_encrypt.h |  1 -
>  arch/x86/mm/mem_encrypt.c  | 12 +---
>  arch/x86/mm/mem_encrypt_identity.c |  1 -
>  3 files changed, 5 insertions(+), 9 deletions(-)

Thanks

Reviewed-by: Brijesh Singh 

> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index 2f62bbdd9d12..88d624499411 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -20,7 +20,6 @@
>  
>  extern u64 sme_me_mask;
>  extern u64 sev_status;
> -extern bool sev_enabled;
>  
>  void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
>unsigned long decrypted_kernel_vaddr,
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index bc0833713be9..b89bc03c63a2 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -44,8 +44,6 @@ EXPORT_SYMBOL(sme_me_mask);
>  DEFINE_STATIC_KEY_FALSE(sev_enable_key);
>  EXPORT_SYMBOL_GPL(sev_enable_key);
>  
> -bool sev_enabled __section(".data");
> -
>  /* Buffer used for early in-place encryption by BSP, no locking needed */
>  static char sme_early_buffer[PAGE_SIZE] __initdata __aligned(PAGE_SIZE);
>  
> @@ -342,16 +340,16 @@ int __init early_set_memory_encrypted(unsigned long 
> vaddr, unsigned long size)
>   * up under SME the trampoline area cannot be encrypted, whereas under SEV
>   * the trampoline area must be encrypted.
>   */
> -bool sme_active(void)
> -{
> - return sme_me_mask && !sev_enabled;
> -}
> -
>  bool sev_active(void)
>  {
>   return sev_status & MSR_AMD64_SEV_ENABLED;
>  }
>  
> +bool sme_active(void)
> +{
> + return sme_me_mask && !sev_active();
> +}
> +
>  /* Needs to be called from non-instrumentable code */
>  bool noinstr sev_es_active(void)
>  {
> diff --git a/arch/x86/mm/mem_encrypt_identity.c 
> b/arch/x86/mm/mem_encrypt_identity.c
> index 6c5eb6f3f14f..0c2759b7f03a 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -545,7 +545,6 @@ void __init sme_enable(struct boot_params *bp)
>  
>   /* SEV state cannot be controlled by a command line option */
>   sme_me_mask = me_mask;
> - sev_enabled = true;
>   physical_mask &= ~sme_me_mask;
>   return;
>   }


Re: [PATCH v2 05/14] KVM: x86: Override reported SME/SEV feature flags with host mask

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:36 PM, Sean Christopherson wrote:
> Add a reverse-CPUID entry for the memory encryption word, 0x801F.EAX,
> and use it to override the supported CPUID flags reported to userspace.
> Masking the reported CPUID flags avoids over-reporting KVM support, e.g.
> without the mask a SEV-SNP capable CPU may incorrectly advertise SNP
> support to userspace.
>
> Cc: Brijesh Singh 
> Cc: Tom Lendacky 
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/cpuid.c | 2 ++
>  arch/x86/kvm/cpuid.h | 1 +
>  2 files changed, 3 insertions(+)

thanks

Reviewed-by: Brijesh Singh 

>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 13036cf0b912..b7618cdd06b5 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -855,6 +855,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array 
> *array, u32 function)
>   case 0x801F:
>   if (!boot_cpu_has(X86_FEATURE_SEV))
>   entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
> + else
> + cpuid_entry_override(entry, CPUID_8000_001F_EAX);
>   break;
>   /*Add support for Centaur's CPUID instruction*/
>   case 0xC000:
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dc921d76e42e..8b6fc9bde248 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -63,6 +63,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
>   [CPUID_8000_0007_EBX] = {0x8007, 0, CPUID_EBX},
>   [CPUID_7_EDX] = { 7, 0, CPUID_EDX},
>   [CPUID_7_1_EAX]   = { 7, 1, CPUID_EAX},
> + [CPUID_8000_001F_EAX] = {0x801f, 1, CPUID_EAX},
>  };
>  
>  /*


Re: [PATCH v2 04/14] x86/cpufeatures: Assign dedicated feature word for AMD mem encryption

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:36 PM, Sean Christopherson wrote:
> Collect the scattered SME/SEV related feature flags into a dedicated
> word.  There are now five recognized features in CPUID.0x801F.EAX,
> with at least one more on the horizon (SEV-SNP).  Using a dedicated word
> allows KVM to use its automagic CPUID adjustment logic when reporting
> the set of supported features to userspace.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/include/asm/cpufeature.h  |  7 +--
>  arch/x86/include/asm/cpufeatures.h | 17 +++--
>  arch/x86/include/asm/disabled-features.h   |  3 ++-
>  arch/x86/include/asm/required-features.h   |  3 ++-
>  arch/x86/kernel/cpu/common.c   |  3 +++
>  arch/x86/kernel/cpu/scattered.c|  5 -
>  tools/arch/x86/include/asm/disabled-features.h |  3 ++-
>  tools/arch/x86/include/asm/required-features.h |  3 ++-
>  8 files changed, 27 insertions(+), 17 deletions(-)

Thanks

Reviewed-by: Brijesh Singh 

>
> diff --git a/arch/x86/include/asm/cpufeature.h 
> b/arch/x86/include/asm/cpufeature.h
> index 59bf91c57aa8..1728d4ce5730 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -30,6 +30,7 @@ enum cpuid_leafs
>   CPUID_7_ECX,
>   CPUID_8000_0007_EBX,
>   CPUID_7_EDX,
> + CPUID_8000_001F_EAX,
>  };
>  
>  #ifdef CONFIG_X86_FEATURE_NAMES
> @@ -88,8 +89,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
>  CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 16, feature_bit) ||\
>  CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 17, feature_bit) ||\
>  CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 18, feature_bit) ||\
> +CHECK_BIT_IN_MASK_WORD(REQUIRED_MASK, 19, feature_bit) ||\
>  REQUIRED_MASK_CHECK||\
> -BUILD_BUG_ON_ZERO(NCAPINTS != 19))
> +BUILD_BUG_ON_ZERO(NCAPINTS != 20))
>  
>  #define DISABLED_MASK_BIT_SET(feature_bit)   \
>( CHECK_BIT_IN_MASK_WORD(DISABLED_MASK,  0, feature_bit) ||\
> @@ -111,8 +113,9 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
>  CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 16, feature_bit) ||\
>  CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 17, feature_bit) ||\
>  CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 18, feature_bit) ||\
> +CHECK_BIT_IN_MASK_WORD(DISABLED_MASK, 19, feature_bit) ||\
>  DISABLED_MASK_CHECK||\
> -BUILD_BUG_ON_ZERO(NCAPINTS != 19))
> +BUILD_BUG_ON_ZERO(NCAPINTS != 20))
>  
>  #define cpu_has(c, bit)  
> \
>   (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 :  \
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 9f9e9511f7cd..7c0bb1a20050 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -13,7 +13,7 @@
>  /*
>   * Defines x86 CPU feature bits
>   */
> -#define NCAPINTS 19 /* N 32-bit words worth of 
> info */
> +#define NCAPINTS 20 /* N 32-bit words worth of 
> info */
>  #define NBUGINTS 1  /* N 32-bit bug flags */
>  
>  /*
> @@ -96,7 +96,7 @@
>  #define X86_FEATURE_SYSCALL32( 3*32+14) /* "" syscall in 
> IA32 userspace */
>  #define X86_FEATURE_SYSENTER32   ( 3*32+15) /* "" sysenter in 
> IA32 userspace */
>  #define X86_FEATURE_REP_GOOD ( 3*32+16) /* REP microcode works well 
> */
> -#define X86_FEATURE_SME_COHERENT ( 3*32+17) /* "" AMD hardware-enforced 
> cache coherency */
> +/* FREE!( 3*32+17) */
>  #define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" LFENCE synchronizes 
> RDTSC */
>  #define X86_FEATURE_ACC_POWER( 3*32+19) /* AMD Accumulated 
> Power Mechanism */
>  #define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) 
> instructions */
> @@ -201,7 +201,7 @@
>  #define X86_FEATURE_INVPCID_SINGLE   ( 7*32+ 7) /* Effectively INVPCID && 
> CR4.PCIDE=1 */
>  #define X86_FEATURE_HW_PSTATE( 7*32+ 8) /* AMD HW-PState */
>  #define X86_FEATURE_PROC_FEEDBACK( 7*32+ 9) /* AMD ProcFeedbackInterface 
> */
> -#define X86_FEATURE_SME  ( 7*32+10) /* AMD Secure Memory 
> Encryption */
> +/* FREE!( 7*32+10) */
>  #define X86_FEATURE_PTI  ( 7*32+11) /* Kernel Page Table 
> 

Re: [PATCH v2 03/14] KVM: SVM: Move SEV module params/variables to sev.c

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:36 PM, Sean Christopherson wrote:
> Unconditionally invoke sev_hardware_setup() when configuring SVM and
> handle clearing the module params/variable 'sev' and 'sev_es' in
> sev_hardware_setup().  This allows making said variables static within
> sev.c and reduces the odds of a collision with guest code, e.g. the guest
> side of things has already laid claim to 'sev_enabled'.
>
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/sev.c | 11 +++
>  arch/x86/kvm/svm/svm.c | 15 +--
>  arch/x86/kvm/svm/svm.h |  2 --
>  3 files changed, 12 insertions(+), 16 deletions(-)


Reviewed-by: Brijesh Singh 


>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 0eeb6e1b803d..8ba93b8fa435 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -27,6 +27,14 @@
>  
>  #define __ex(x) __kvm_handle_fault_on_reboot(x)
>  
> +/* enable/disable SEV support */
> +static int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> +module_param(sev, int, 0444);
> +
> +/* enable/disable SEV-ES support */
> +static int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> +module_param(sev_es, int, 0444);
> +
>  static u8 sev_enc_bit;
>  static int sev_flush_asids(void);
>  static DECLARE_RWSEM(sev_deactivate_lock);
> @@ -1249,6 +1257,9 @@ void __init sev_hardware_setup(void)
>   bool sev_es_supported = false;
>   bool sev_supported = false;
>  
> + if (!IS_ENABLED(CONFIG_KVM_AMD_SEV) || !sev)
> + goto out;
> +
>   /* Does the CPU support SEV? */
>   if (!boot_cpu_has(X86_FEATURE_SEV))
>   goto out;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index ccf52c5531fb..f89f702b2a58 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -189,14 +189,6 @@ module_param(vls, int, 0444);
>  static int vgif = true;
>  module_param(vgif, int, 0444);
>  
> -/* enable/disable SEV support */
> -int sev = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> -module_param(sev, int, 0444);
> -
> -/* enable/disable SEV-ES support */
> -int sev_es = IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT);
> -module_param(sev_es, int, 0444);
> -
>  bool __read_mostly dump_invalid_vmcb;
>  module_param(dump_invalid_vmcb, bool, 0644);
>  
> @@ -976,12 +968,7 @@ static __init int svm_hardware_setup(void)
>   kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
>   }
>  
> - if (IS_ENABLED(CONFIG_KVM_AMD_SEV) && sev) {
> - sev_hardware_setup();
> - } else {
> - sev = false;
> - sev_es = false;
> - }
> + sev_hardware_setup();
>  
>   svm_adjust_mmio_mask();
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0fe874ae5498..8e169835f52a 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -408,8 +408,6 @@ static inline bool gif_set(struct vcpu_svm *svm)
>  #define MSR_CR3_LONG_MBZ_MASK		0xfff0000000000000U
>  #define MSR_INVALID			0xffffffffU
>  
> -extern int sev;
> -extern int sev_es;
>  extern bool dump_invalid_vmcb;
>  
>  u32 svm_msrpm_offset(u32 msr);


Re: [PATCH v2 01/14] KVM: SVM: Zero out the VMCB array used to track SEV ASID association

2021-01-14 Thread Brijesh Singh


On 1/13/21 6:36 PM, Sean Christopherson wrote:
> Zero out the array of VMCB pointers so that pre_sev_run() won't see
> garbage when querying the array to detect when an SEV ASID is being
> associated with a new VMCB.  In practice, reading random values is all
> but guaranteed to be benign as a false negative (which is extremely
> unlikely on its own) can only happen on CPU0 on the first VMRUN and would
> only cause KVM to skip the ASID flush.  For anything bad to happen, a
> previous instance of KVM would have to exit without flushing the ASID,
> _and_ KVM would have to not flush the ASID at any time while building the
> new SEV guest.
>
> Cc: Borislav Petkov 
> Cc: Tom Lendacky 
> Cc: Brijesh Singh 
> Fixes: 70cd94e60c73 ("KVM: SVM: VMRUN should use associated ASID when SEV is 
> enabled")
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kvm/svm/svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7ef171790d02..ccf52c5531fb 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -573,7 +573,7 @@ static int svm_cpu_init(int cpu)
>   if (svm_sev_enabled()) {
>   sd->sev_vmcbs = kmalloc_array(max_sev_asid + 1,
> sizeof(void *),
> -   GFP_KERNEL);
> +   GFP_KERNEL | __GFP_ZERO);
>       if (!sd->sev_vmcbs)
>   goto free_save_area;
>   }


Reviewed-by: Brijesh Singh 




Re: [Patch v4 1/2] cgroup: svm: Add Encryption ID controller

2021-01-13 Thread Brijesh Singh



On 1/7/2021 7:28 PM, Vipin Sharma wrote:

Hardware memory encryption is available on multiple generic CPUs. For
example AMD has Secure Encrypted Virtualization (SEV) and SEV -
Encrypted State (SEV-ES).

These memory encryptions are useful in creating encrypted virtual
machines (VMs) and user space programs.

There are limited number of encryption IDs that can be used
simultaneously on a machine for encryption. This generates a need for
the system admin to track, limit, allocate resources, and optimally
schedule VMs and user workloads in the cloud infrastructure. Some
malicious programs can exhaust all of these resources on a host causing
starvation of other workloads.

Encryption ID controller allows control of these resources using
Cgroups.

Controller is enabled by CGROUP_ENCRYPTION_IDS config option.
Encryption controller provide 3 interface files for each encryption ID
type. For example, in SEV:

1. encrpytion_ids.sev.max
Sets the maximum usage of SEV IDs in the cgroup.
2. encryption_ids.sev.current
Current usage of SEV IDs in the cgroup and its children.
3. encryption_ids.sev.stat
Shown only at the root cgroup. Displays total SEV IDs available
on the platform and current usage count.

Other ID types can be easily added in the controller in the same way.

Signed-off-by: Vipin Sharma 
Reviewed-by: David Rientjes 
Reviewed-by: Dionna Glaze 



Acked-by: Brijesh Singh 


---
  arch/x86/kvm/svm/sev.c|  52 +++-
  include/linux/cgroup_subsys.h |   4 +
  include/linux/encryption_ids_cgroup.h |  72 +
  include/linux/kvm_host.h  |   4 +
  init/Kconfig  |  14 +
  kernel/cgroup/Makefile|   1 +
  kernel/cgroup/encryption_ids.c| 422 ++
  7 files changed, 557 insertions(+), 12 deletions(-)
  create mode 100644 include/linux/encryption_ids_cgroup.h
  create mode 100644 kernel/cgroup/encryption_ids.c

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 9858d5ae9ddd..1924ab2eaf11 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -14,6 +14,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -86,10 +87,18 @@ static bool __sev_recycle_asids(int min_asid, int max_asid)
return true;
  }
  
-static int sev_asid_new(struct kvm_sev_info *sev)

+static int sev_asid_new(struct kvm *kvm)
  {
-   int pos, min_asid, max_asid;
+   int pos, min_asid, max_asid, ret;
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
bool retry = true;
+   enum encryption_id_type type;
+
+   type = sev->es_active ? ENCRYPTION_ID_SEV_ES : ENCRYPTION_ID_SEV;
+
+   ret = enc_id_cg_try_charge(kvm, type, 1);
+   if (ret)
+   return ret;
  
  	mutex_lock(&sev_bitmap_lock);
  
@@ -107,7 +116,8 @@ static int sev_asid_new(struct kvm_sev_info *sev)

goto again;
}
	mutex_unlock(&sev_bitmap_lock);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto e_uncharge;
}
  
  	__set_bit(pos, sev_asid_bitmap);

@@ -115,6 +125,9 @@ static int sev_asid_new(struct kvm_sev_info *sev)
	mutex_unlock(&sev_bitmap_lock);
  
  	return pos + 1;

+e_uncharge:
+   enc_id_cg_uncharge(kvm, type, 1);
+   return ret;
  }
  
  static int sev_get_asid(struct kvm *kvm)

@@ -124,14 +137,16 @@ static int sev_get_asid(struct kvm *kvm)
return sev->asid;
  }
  
-static void sev_asid_free(int asid)

+static void sev_asid_free(struct kvm *kvm)
  {
struct svm_cpu_data *sd;
int cpu, pos;
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+   enum encryption_id_type type;
  
  	mutex_lock(&sev_bitmap_lock);
  
-	pos = asid - 1;

+   pos = sev->asid - 1;
__set_bit(pos, sev_reclaim_asid_bitmap);
  
  	for_each_possible_cpu(cpu) {

@@ -140,6 +155,9 @@ static void sev_asid_free(int asid)
}
  
  	mutex_unlock(&sev_bitmap_lock);

+
+   type = sev->es_active ? ENCRYPTION_ID_SEV_ES : ENCRYPTION_ID_SEV;
+   enc_id_cg_uncharge(kvm, type, 1);
  }
  
  static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)

@@ -184,22 +202,22 @@ static int sev_guest_init(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
if (unlikely(sev->active))
return ret;
  
-	asid = sev_asid_new(sev);

+   asid = sev_asid_new(kvm);
if (asid < 0)
return ret;
+   sev->asid = asid;
  
  	ret = sev_platform_init(&argp->error);

if (ret)
goto e_free;
  
  	sev->active = true;

-   sev->asid = asid;
INIT_LIST_HEAD(>regions_list);
  
  	return 0;
  
  e_free:

-   sev_asid_free(asid);
+   sev_asid_free(kvm);
return ret;
  }
  
@@ -1240,12 +1258,12 @@ void sev_vm_destroy(struct kvm *kvm)

	mutex_unlock(&kvm->lock);
  
  	sev_unbind_asid(kvm, sev->handle);

-   sev_as
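
For illustration, the interface described above can be driven from a management
daemon roughly as follows. This is only a sketch: the helper name is made up,
and it assumes a cgroup v2 hierarchy mounted at the usual /sys/fs/cgroup with
the encryption IDs controller enabled.

/* Limit a cgroup to a given number of SEV ASIDs via encryption_ids.sev.max. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int set_sev_id_limit(const char *cgroup_dir, unsigned int max_ids)
{
	char path[256], val[32];
	int fd, n, ret = 0;

	snprintf(path, sizeof(path), "%s/encryption_ids.sev.max", cgroup_dir);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = snprintf(val, sizeof(val), "%u\n", max_ids);
	if (write(fd, val, n) != n)
		ret = -1;

	close(fd);
	return ret;
}

Reading encryption_ids.sev.current from the same directory (or
encryption_ids.sev.stat at the root) then shows the usage against that limit.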

[PATCH v2] KVM/SVM: add support for SEV attestation command

2021-01-04 Thread Brijesh Singh
The SEV FW version >= 0.23 added a new command that can be used to query
the attestation report containing the SHA-256 digest of the guest memory
encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands and
sign the report with the Platform Endorsement Key (PEK).

See the SEV FW API spec section 6.8 for more details.

Note there already exists a command (KVM_SEV_LAUNCH_MEASURE) that can be
used to get the SHA-256 digest. The main difference between the
KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the latter
can be called while the guest is running and the measurement value is
signed with PEK.

Cc: James Bottomley 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Paolo Bonzini 
Cc: Sean Christopherson 
Cc: Borislav Petkov 
Cc: John Allen 
Cc: Herbert Xu 
Cc: linux-cry...@vger.kernel.org
Reviewed-by: Tom Lendacky 
Acked-by: David Rientjes 
Tested-by: James Bottomley 
Signed-off-by: Brijesh Singh 
---
v2:
  * Fix documentation typo

 .../virt/kvm/amd-memory-encryption.rst| 21 ++
 arch/x86/kvm/svm/sev.c| 71 +++
 drivers/crypto/ccp/sev-dev.c  |  1 +
 include/linux/psp-sev.h   | 17 +
 include/uapi/linux/kvm.h  |  8 +++
 5 files changed, 118 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
b/Documentation/virt/kvm/amd-memory-encryption.rst
index 09a8f2a34e39..469a6308765b 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -263,6 +263,27 @@ Returns: 0 on success, -negative on error
 __u32 trans_len;
 };
 
+10. KVM_SEV_GET_ATTESTATION_REPORT
+--
+
+The KVM_SEV_GET_ATTESTATION_REPORT command can be used by the hypervisor to 
query the attestation
+report containing the SHA-256 digest of the guest memory and VMSA passed 
through the KVM_SEV_LAUNCH
+commands and signed with the PEK. The digest returned by the command should 
match the digest
+used by the guest owner with the KVM_SEV_LAUNCH_MEASURE.
+
+Parameters (in): struct kvm_sev_attestation
+
+Returns: 0 on success, -negative on error
+
+::
+
+struct kvm_sev_attestation_report {
+__u8 mnonce[16];/* A random mnonce that will be placed 
in the report */
+
+__u64 uaddr;/* userspace address where the report 
should be copied */
+__u32 len;
+};
+
 References
 ==
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 566f4d18185b..c4d3ee6be362 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -927,6 +927,74 @@ static int sev_launch_secret(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_get_attestation_report(struct kvm *kvm, struct kvm_sev_cmd 
*argp)
+{
+   void __user *report = (void __user *)(uintptr_t)argp->data;
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+   struct sev_data_attestation_report *data;
+   struct kvm_sev_attestation_report params;
+   void __user *p;
+   void *blob = NULL;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, 
sizeof(params)))
+   return -EFAULT;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return -ENOMEM;
+
+   /* User wants to query the blob length */
+   if (!params.len)
+   goto cmd;
+
+   p = (void __user *)(uintptr_t)params.uaddr;
+   if (p) {
+   if (params.len > SEV_FW_BLOB_MAX_SIZE) {
+   ret = -EINVAL;
+   goto e_free;
+   }
+
+   ret = -ENOMEM;
+   blob = kmalloc(params.len, GFP_KERNEL);
+   if (!blob)
+   goto e_free;
+
+   data->address = __psp_pa(blob);
+   data->len = params.len;
+   memcpy(data->mnonce, params.mnonce, sizeof(params.mnonce));
+   }
+cmd:
+   data->handle = sev->handle;
+   ret = sev_issue_cmd(kvm, SEV_CMD_ATTESTATION_REPORT, data, 
&argp->error);
+   /*
+* If we query the session length, FW responded with expected data.
+*/
+   if (!params.len)
+   goto done;
+
+   if (ret)
+   goto e_free_blob;
+
+   if (blob) {
+   if (copy_to_user(p, blob, params.len))
+   ret = -EFAULT;
+   }
+
+done:
+   params.len = data->len;
+   if (copy_to_user(report, &params, sizeof(params)))
+   ret = -EFAULT;
+e_free_blob:
+   kfree(blob);
+e_free:
+   kfree(data);
+   return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -971,6 +1039,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp
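
For context, a minimal user-space sketch of driving this command follows. It is
illustrative only (the helper name and error handling are made up) and assumes
the KVM_MEMORY_ENCRYPT_OP VM ioctl together with the uapi definitions added by
this patch.

/* Query the report length first (len == 0), then fetch the signed report. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int get_attestation_report(int vm_fd, int sev_fd,
				  const uint8_t mnonce[16],
				  void **blob, uint32_t *blob_len)
{
	struct kvm_sev_attestation_report rep = {};
	struct kvm_sev_cmd cmd = {
		.id = KVM_SEV_GET_ATTESTATION_REPORT,
		.data = (uint64_t)(unsigned long)&rep,
		.sev_fd = (uint32_t)sev_fd,
	};

	memcpy(rep.mnonce, mnonce, sizeof(rep.mnonce));

	/* First pass with len == 0: the firmware reports the required length. */
	ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
	if (!rep.len)
		return -1;

	*blob = malloc(rep.len);
	if (!*blob)
		return -1;

	/* Second pass with a buffer large enough for the report. */
	rep.uaddr = (uint64_t)(unsigned long)*blob;
	if (ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd) < 0) {
		free(*blob);
		*blob = NULL;
		return -1;
	}

	*blob_len = rep.len;
	return 0;
}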

Re: [PATCH] KVM/SVM: add support for SEV attestation command

2020-12-09 Thread Brijesh Singh


On 12/9/20 1:51 AM, Ard Biesheuvel wrote:
> On Fri, 4 Dec 2020 at 22:30, Brijesh Singh  wrote:
>> The SEV FW version >= 0.23 added a new command that can be used to query
>> the attestation report containing the SHA-256 digest of the guest memory
>> encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands and
>> sign the report with the Platform Endorsement Key (PEK).
>>
>> See the SEV FW API spec section 6.8 for more details.
>>
>> Note there already exist a command (KVM_SEV_LAUNCH_MEASURE) that can be
>> used to get the SHA-256 digest. The main difference between the
>> KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the later
> latter
>
>> can be called while the guest is running and the measurement value is
>> signed with PEK.
>>
>> Cc: James Bottomley 
>> Cc: Tom Lendacky 
>> Cc: David Rientjes 
>> Cc: Paolo Bonzini 
>> Cc: Sean Christopherson 
>> Cc: Borislav Petkov 
>> Cc: John Allen 
>> Cc: Herbert Xu 
>> Cc: linux-cry...@vger.kernel.org
>> Signed-off-by: Brijesh Singh 
>> ---
>>  .../virt/kvm/amd-memory-encryption.rst| 21 ++
>>  arch/x86/kvm/svm/sev.c| 71 +++
>>  drivers/crypto/ccp/sev-dev.c  |  1 +
>>  include/linux/psp-sev.h   | 17 +
>>  include/uapi/linux/kvm.h  |  8 +++
>>  5 files changed, 118 insertions(+)
>>
>> diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
>> b/Documentation/virt/kvm/amd-memory-encryption.rst
>> index 09a8f2a34e39..4c6685d0fddd 100644
>> --- a/Documentation/virt/kvm/amd-memory-encryption.rst
>> +++ b/Documentation/virt/kvm/amd-memory-encryption.rst
>> @@ -263,6 +263,27 @@ Returns: 0 on success, -negative on error
>>  __u32 trans_len;
>>  };
>>
>> +10. KVM_SEV_GET_ATTESATION_REPORT
> KVM_SEV_GET_ATTESTATION_REPORT
>
>> +-
>> +
>> +The KVM_SEV_GET_ATTESATION_REPORT command can be used by the hypervisor to 
>> query the attestation
> KVM_SEV_GET_ATTESTATION_REPORT


Noted, I will send v2 with these fixed.



Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

2020-12-08 Thread Brijesh Singh


On 12/7/20 9:09 PM, Steve Rutherford wrote:
> On Mon, Dec 7, 2020 at 12:42 PM Sean Christopherson  wrote:
>> On Sun, Dec 06, 2020, Paolo Bonzini wrote:
>>> On 03/12/20 01:34, Sean Christopherson wrote:
>>>> On Tue, Dec 01, 2020, Ashish Kalra wrote:
>>>>> From: Brijesh Singh 
>>>>>
>>>>> KVM hypercall framework relies on alternative framework to patch the
>>>>> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
>>>>> apply_alternative() is called then it defaults to VMCALL. The approach
>>>>> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
>>>>> will be able to decode the instruction and do the right things. But
>>>>> when SEV is active, guest memory is encrypted with guest key and
>>>>> hypervisor will not be able to decode the instruction bytes.
>>>>>
>>>>> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The 
>>>>> hypercall
>>>>> will be used by the SEV guest to notify encrypted pages to the hypervisor.
>>>> What if we invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to 
>>>> VMMCALL
>>>> and opt into VMCALL?  It's a synthetic feature flag either way, and I don't
>>>> think there are any existing KVM hypercalls that happen before 
>>>> alternatives are
>>>> patched, i.e. it'll be a nop for sane kernel builds.
>>>>
>>>> I'm also skeptical that a KVM specific hypercall is the right approach for 
>>>> the
>>>> encryption behavior, but I'll take that up in the patches later in the 
>>>> series.
>>> Do you think that it's the guest that should "donate" memory for the bitmap
>>> instead?
>> No.  Two things I'd like to explore:
>>
>>   1. Making the hypercall to announce/request private vs. shared common 
>> across
>>  hypervisors (KVM, Hyper-V, VMware, etc...) and technologies (SEV-* and 
>> TDX).
>>  I'm concerned that we'll end up with multiple hypercalls that do more or
>>  less the same thing, e.g. KVM+SEV, Hyper-V+SEV, TDX, etc...  Maybe it's 
>> a
>>  pipe dream, but I'd like to at least explore options before shoving in 
>> KVM-
>>  only hypercalls.
>>
>>
>>   2. Tracking shared memory via a list of ranges instead of a using bitmap to
>>  track all of guest memory.  For most use cases, the vast majority of 
>> guest
>>  memory will be private, most ranges will be 2mb+, and conversions 
>> between
>>  private and shared will be uncommon events, i.e. the overhead to walk 
>> and
>>  split/merge list entries is hopefully not a big concern.  I suspect a 
>> list
>>  would consume far less memory, hopefully without impacting performance.
> For a fancier data structure, I'd suggest an interval tree. Linux
> already has an rbtree-based interval tree implementation, which would
> likely work, and would probably assuage any performance concerns.
>
> Something like this would not be worth doing unless most of the shared
> pages were physically contiguous. A sample Ubuntu 20.04 VM on GCP had
> 60ish discontiguous shared regions. This is by no means a thorough
> search, but it's suggestive. If this is typical, then the bitmap would
> be far less efficient than most any interval-based data structure.
>
> You'd have to allow userspace to upper bound the number of intervals
> (similar to the maximum bitmap size), to prevent host OOMs due to
> malicious guests. There's something nice about the guest donating
> memory for this, since that would eliminate the OOM risk.


Tracking a list of ranges may not be a bad idea, especially if we use
some kind of rbtree-based data structure to update the ranges. It will
certainly be better than a bitmap, which grows with the guest memory
size, and as you have seen in practice most of the pages will be guest
private. I am not sure that guest-donated memory will cover all the
cases, though; e.g. if we do a memory hotplug (increase the guest RAM
from 2GB to 64GB), will the donated memory range be enough to store the
metadata?
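
As a concrete illustration of the interval-tree idea, a sketch using the
kernel's generic interval tree is below. The names are hypothetical and the
merge/split handling is elided; it only shows the shape of the data structure,
not a proposed implementation.

/* Track shared GPA ranges in an interval tree instead of a bitmap. */
#include <linux/interval_tree.h>
#include <linux/slab.h>

static struct rb_root_cached shared_gpa_ranges = RB_ROOT_CACHED;

static int note_shared_range(unsigned long gfn_start, unsigned long gfn_last)
{
	struct interval_tree_node *node;

	/* Already (partially) tracked? Real code would merge/split here. */
	if (interval_tree_iter_first(&shared_gpa_ranges, gfn_start, gfn_last))
		return 0;

	node = kzalloc(sizeof(*node), GFP_KERNEL_ACCOUNT);
	if (!node)
		return -ENOMEM;

	node->start = gfn_start;
	node->last  = gfn_last;
	interval_tree_insert(node, &shared_gpa_ranges);
	return 0;
}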




[PATCH] KVM/SVM: add support for SEV attestation command

2020-12-04 Thread Brijesh Singh
The SEV FW version >= 0.23 added a new command that can be used to query
the attestation report containing the SHA-256 digest of the guest memory
encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands and
sign the report with the Platform Endorsement Key (PEK).

See the SEV FW API spec section 6.8 for more details.

Note there already exist a command (KVM_SEV_LAUNCH_MEASURE) that can be
used to get the SHA-256 digest. The main difference between the
KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the later
can be called while the guest is running and the measurement value is
signed with PEK.

Cc: James Bottomley 
Cc: Tom Lendacky 
Cc: David Rientjes 
Cc: Paolo Bonzini 
Cc: Sean Christopherson 
Cc: Borislav Petkov 
Cc: John Allen 
Cc: Herbert Xu 
Cc: linux-cry...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 .../virt/kvm/amd-memory-encryption.rst| 21 ++
 arch/x86/kvm/svm/sev.c| 71 +++
 drivers/crypto/ccp/sev-dev.c  |  1 +
 include/linux/psp-sev.h   | 17 +
 include/uapi/linux/kvm.h  |  8 +++
 5 files changed, 118 insertions(+)

diff --git a/Documentation/virt/kvm/amd-memory-encryption.rst 
b/Documentation/virt/kvm/amd-memory-encryption.rst
index 09a8f2a34e39..4c6685d0fddd 100644
--- a/Documentation/virt/kvm/amd-memory-encryption.rst
+++ b/Documentation/virt/kvm/amd-memory-encryption.rst
@@ -263,6 +263,27 @@ Returns: 0 on success, -negative on error
 __u32 trans_len;
 };
 
+10. KVM_SEV_GET_ATTESATION_REPORT
+-
+
+The KVM_SEV_GET_ATTESATION_REPORT command can be used by the hypervisor to 
query the attestation
+report containing the SHA-256 digest of the guest memory and VMSA passed 
through the KVM_SEV_LAUNCH
+commands and signed with the PEK. The digest returned by the command should 
match the digest
+used by the guest owner with the KVM_SEV_LAUNCH_MEASURE.
+
+Parameters (in): struct kvm_sev_attestation
+
+Returns: 0 on success, -negative on error
+
+::
+
+struct kvm_sev_attestation_report {
+__u8 mnonce[16];/* A random mnonce that will be placed 
in the report */
+
+__u64 uaddr;/* userspace address where the report 
should be copied */
+__u32 len;
+};
+
 References
 ==
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 566f4d18185b..c4d3ee6be362 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -927,6 +927,74 @@ static int sev_launch_secret(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_get_attestation_report(struct kvm *kvm, struct kvm_sev_cmd 
*argp)
+{
+   void __user *report = (void __user *)(uintptr_t)argp->data;
+   struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
+   struct sev_data_attestation_report *data;
+   struct kvm_sev_attestation_report params;
+   void __user *p;
+   void *blob = NULL;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (copy_from_user(&params, (void __user *)(uintptr_t)argp->data, 
sizeof(params)))
+   return -EFAULT;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL_ACCOUNT);
+   if (!data)
+   return -ENOMEM;
+
+   /* User wants to query the blob length */
+   if (!params.len)
+   goto cmd;
+
+   p = (void __user *)(uintptr_t)params.uaddr;
+   if (p) {
+   if (params.len > SEV_FW_BLOB_MAX_SIZE) {
+   ret = -EINVAL;
+   goto e_free;
+   }
+
+   ret = -ENOMEM;
+   blob = kmalloc(params.len, GFP_KERNEL);
+   if (!blob)
+   goto e_free;
+
+   data->address = __psp_pa(blob);
+   data->len = params.len;
+   memcpy(data->mnonce, params.mnonce, sizeof(params.mnonce));
+   }
+cmd:
+   data->handle = sev->handle;
+   ret = sev_issue_cmd(kvm, SEV_CMD_ATTESTATION_REPORT, data, 
&argp->error);
+   /*
+* If we query the session length, FW responded with expected data.
+*/
+   if (!params.len)
+   goto done;
+
+   if (ret)
+   goto e_free_blob;
+
+   if (blob) {
+   if (copy_to_user(p, blob, params.len))
+   ret = -EFAULT;
+   }
+
+done:
+   params.len = data->len;
+   if (copy_to_user(report, &params, sizeof(params)))
+   ret = -EFAULT;
+e_free_blob:
+   kfree(blob);
+e_free:
+   kfree(data);
+   return ret;
+}
+
 int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -971,6 +1039,9 @@ int svm_mem_enc_op(struct kvm *kvm, void __user *argp)
case KVM_SEV_LAUNCH_SECRET:
r = sev_launch_secret(kvm, _cmd);
break;
+  

Re: [PATCH v2 1/9] KVM: x86: Add AMD SEV specific Hypercall3

2020-12-04 Thread Brijesh Singh


On 12/2/20 6:34 PM, Sean Christopherson wrote:
> On Tue, Dec 01, 2020, Ashish Kalra wrote:
>> From: Brijesh Singh 
>>
>> KVM hypercall framework relies on alternative framework to patch the
>> VMCALL -> VMMCALL on AMD platform. If a hypercall is made before
>> apply_alternative() is called then it defaults to VMCALL. The approach
>> works fine on non SEV guest. A VMCALL would causes #UD, and hypervisor
>> will be able to decode the instruction and do the right things. But
>> when SEV is active, guest memory is encrypted with guest key and
>> hypervisor will not be able to decode the instruction bytes.
>>
>> Add SEV specific hypercall3, it unconditionally uses VMMCALL. The hypercall
>> will be used by the SEV guest to notify encrypted pages to the hypervisor.
> What if we invert KVM_HYPERCALL and X86_FEATURE_VMMCALL to default to VMMCALL
> and opt into VMCALL?  It's a synthetic feature flag either way, and I don't
> think there are any existing KVM hypercalls that happen before alternatives 
> are
> patched, i.e. it'll be a nop for sane kernel builds.


If we invert X86_FEATURE_VMMCALL to default to VMMCALL, then it should
work fine without this patch. So far no hypercall has been made before
the alternative patching takes place. Since the page state change can
occur much earlier than the alternative patching, we need to default to
VMMCALL when SEV is active.


> I'm also skeptical that a KVM specific hypercall is the right approach for the
> encryption behavior, but I'll take that up in the patches later in the series.


Great, I am open to exploring other alternative approaches.


>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: "H. Peter Anvin" 
>> Cc: Paolo Bonzini 
>> Cc: "Radim Krčmář" 
>> Cc: Joerg Roedel 
>> Cc: Borislav Petkov 
>> Cc: Tom Lendacky 
>> Cc: x...@kernel.org
>> Cc: k...@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Reviewed-by: Steve Rutherford 
>> Reviewed-by: Venu Busireddy 
>> Signed-off-by: Brijesh Singh 
>> Signed-off-by: Ashish Kalra 
>> ---
>>  arch/x86/include/asm/kvm_para.h | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_para.h 
>> b/arch/x86/include/asm/kvm_para.h
>> index 338119852512..bc1b11d057fc 100644
>> --- a/arch/x86/include/asm/kvm_para.h
>> +++ b/arch/x86/include/asm/kvm_para.h
>> @@ -85,6 +85,18 @@ static inline long kvm_hypercall4(unsigned int nr, 
>> unsigned long p1,
>>  return ret;
>>  }
>>  
>> +static inline long kvm_sev_hypercall3(unsigned int nr, unsigned long p1,
>> +  unsigned long p2, unsigned long p3)
>> +{
>> +long ret;
>> +
>> +asm volatile("vmmcall"
>> + : "=a"(ret)
>> + : "a"(nr), "b"(p1), "c"(p2), "d"(p3)
>> + : "memory");
>> +return ret;
>> +}
>> +
>>  #ifdef CONFIG_KVM_GUEST
>>  bool kvm_para_available(void);
>>  unsigned int kvm_arch_para_features(void);
>> -- 
>> 2.17.1
>>
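
To show how this helper would be used, here is a guest-side sketch for
notifying the hypervisor about a page state change before alternatives have
been applied. The hypercall number and argument layout are placeholders (the
real ABI comes from later patches in this series), so treat this purely as an
illustration of the calling convention.

/* Placeholder hypercall number; not the real ABI from this series. */
#include <asm/kvm_para.h>

#define KVM_HC_PAGE_ENC_STATUS_EXAMPLE	12

static void notify_page_enc_status(unsigned long gfn, unsigned long npages,
				   unsigned long enc)
{
	/* Always VMMCALL, so this is safe even before apply_alternatives(). */
	kvm_sev_hypercall3(KVM_HC_PAGE_ENC_STATUS_EXAMPLE, gfn, npages, enc);
}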


Re: [PATCH v2] iommu/amd: Enforce 4k mapping for certain IOMMU data structures

2020-11-19 Thread Brijesh Singh


On 11/19/20 8:30 PM, Suravee Suthikulpanit wrote:
> Will,
>
> To answer your questions from v1 thread.
>
> On 11/18/20 5:57 AM, Will Deacon wrote:
> > On 11/5/20 9:58 PM, Suravee Suthikulpanit wrote:
> >> AMD IOMMU requires 4k-aligned pages for the event log, the PPR log,
> >> and the completion wait write-back regions. However, when allocating
> >> the pages, they could be part of large mapping (e.g. 2M) page.
> >> This causes #PF due to the SNP RMP hardware enforces the check based
> >> on the page level for these data structures.
> >
> > Please could you include an example backtrace here?
>
> Unfortunately, we don't actually have the backtrace available here.
> This information is based on the SEV-SNP specification.
>
> >> So, fix by calling set_memory_4k() on the allocated pages.
> >
> > I think I'm missing something here. set_memory_4k() will break the
> kernel
> > linear mapping up into page granular mappings, but the IOMMU isn't
> using
> > that mapping, right?
>
> That's correct. This does not affect the IOMMU, but it affects the PSP
> FW.
>
> > It's just using the physical address returned by
> iommu_virt_to_phys(), so why does it matter?
> >
> > Just be nice to capture some of this rationale in the log,
> especially as
> > I'm not familiar with this device.
>
> According to the AMD SEV-SNP white paper
> (https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf),
> the Reverse Map Table (RMP) contains one entry for every 4K page of
> DRAM that may be used by the VM. In this case, the pages allocated by
> the IOMMU driver are added as 4K entries in the RMP table by the
> SEV-SNP FW.
>
> During the page table walk, the RMP checks if the page is owned by the
> hypervisor. Without calling set_memory_4k() to break the mapping up
> into 4K pages, pages could end up being part of large mapping (e.g. 2M
> page), in which the page access would be denied and result in #PF.


Since the page is added as a 4K page in the RMP table by the SEV-SNP FW,
we need to split the physmap to ensure that this page will be accessed
with a 4K mapping from the x86 side. If the page is part of a large page,
then a write access will cause an RMP violation (i.e. #PF), because the
SNP hardware enforces that the CPU page-table walk level must match the
page level programmed in the RMP table.


>
> >> Fixes: commit c69d89aff393 ("iommu/amd: Use 4K page for completion
> wait write-back semaphore")
> >
> > I couldn't figure out how that commit could cause this problem.
> Please can
> > you explain that to me?
>
> Hope this helps clarify. If so, I'll update the commit log and send
> out V3.
>
> Thanks,
> Suravee
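
To make the pattern being discussed concrete: any buffer that the PSP/IOMMU
firmware will access must be backed by 4K mappings in the kernel direct map,
so that the CPU page level matches the 4K RMP entries created for it. A rough
sketch (the helper name is made up; the actual patch does this in the AMD
IOMMU allocation paths) looks like this:

/* Allocate a firmware-visible buffer and force its linear mapping to 4K. */
#include <linux/gfp.h>
#include <asm/set_memory.h>

static void *alloc_fw_buffer_4k(gfp_t gfp, int order)
{
	void *buf = (void *)__get_free_pages(gfp, order);

	if (buf && set_memory_4k((unsigned long)buf, 1 << order)) {
		/* Could not split the mapping; do not hand the buffer to FW. */
		free_pages((unsigned long)buf, order);
		buf = NULL;
	}
	return buf;
}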

