Re: [Part1 PATCH v4 15/17] percpu: introduce DEFINE_PER_CPU_UNENCRYPTED

2017-09-19 Thread Brijesh Singh

Hi Boris,

On 09/19/2017 05:39 AM, Borislav Petkov wrote:
...


@@ -815,6 +825,7 @@
. = ALIGN(cacheline);   \
*(.data..percpu)\
*(.data..percpu..shared_aligned)\
+   PERCPU_UNENCRYPTED_SECTION  \
VMLINUX_SYMBOL(__per_cpu_end) = .;


So looking at this more: I'm wondering if we can simply reuse the
PER_CPU_SHARED_ALIGNED_SECTION definition, which is for shared per-CPU
sections, instead of introducing a special section which, practically,
is going to be used only by SEV.

Because "shared" also kinda implies that it is shared by multiple agents
and those agents can just as well be guest and hypervisor. And then that
patch is gone too.

Hmmm...?



"..shared_aligned" section does not start and end with page-size alignment.
Since the C-bit works on PAGE_SIZE alignment hence the "..unencrypted" section
starts and ends with page-size alignment. The closest I can find is
"..page_aligned" but again it does not end with page-size alignment.

Additionally, since we clear the C-bit from unencrypted section hence we
should avoid overloading the existing section -- we don't want to expose more
than we wish.
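
As a rough sketch (not from the patch), the difference between the two output
sections comes down to their alignment guarantees, since the C-bit can only be
toggled per 4K page:

   /* existing shared_aligned section: only cacheline alignment, no
    * trailing pad -- other per-cpu data may share its last page */
   . = ALIGN(cacheline);
   *(.data..percpu..shared_aligned)

   /* needed for SEV: pad both ends to PAGE_SIZE so that clearing the
    * C-bit never exposes neighboring (still encrypted) per-cpu data */
   . = ALIGN(PAGE_SIZE);
   *(.data..percpu..unencrypted)
   . = ALIGN(PAGE_SIZE);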



[Part1 PATCH v4 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) description

2017-09-16 Thread Brijesh Singh
Update the AMD memory encryption document to describe the Secure Encrypted
Virtualization (SEV) feature.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Jonathan Corbet 
Cc: Borislav Petkov 
Cc: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 Documentation/x86/amd-memory-encryption.txt | 30 +
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/amd-memory-encryption.txt 
b/Documentation/x86/amd-memory-encryption.txt
index f512ab718541..afc41f544dab 100644
--- a/Documentation/x86/amd-memory-encryption.txt
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -1,4 +1,5 @@
-Secure Memory Encryption (SME) is a feature found on AMD processors.
+Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
+features found on AMD processors.
 
 SME provides the ability to mark individual pages of memory as encrypted using
 the standard x86 page tables.  A page that is marked encrypted will be
@@ -6,24 +7,38 @@ automatically decrypted when read from DRAM and encrypted 
when written to
 DRAM.  SME can therefore be used to protect the contents of DRAM from physical
 attacks on the system.
 
+SEV enables running encrypted virtual machines (VMs) in which the code and data
+of the guest VM are secured so that a decrypted version is available only
+within the VM itself. SEV guest VMs have the concept of private and shared
+memory. Private memory is encrypted with the guest-specific key, while shared
+memory may be encrypted with hypervisor key. When SME is enabled, the 
hypervisor
+key is the same key which is used in SME.
+
 A page is encrypted when a page table entry has the encryption bit set (see
 below on how to determine its position).  The encryption bit can also be
 specified in the cr3 register, allowing the PGD table to be encrypted. Each
 successive level of page tables can also be encrypted by setting the encryption
 bit in the page table entry that points to the next table. This allows the full
 page table hierarchy to be encrypted. Note, this means that just because the
-encryption bit is set in cr3, doesn't imply the full hierarchy is encyrpted.
+encryption bit is set in cr3, doesn't imply the full hierarchy is encrypted.
 Each page table entry in the hierarchy needs to have the encryption bit set to
 achieve that. So, theoretically, you could have the encryption bit set in cr3
 so that the PGD is encrypted, but not set the encryption bit in the PGD entry
 for a PUD which results in the PUD pointed to by that entry to not be
 encrypted.
 
-Support for SME can be determined through the CPUID instruction. The CPUID
-function 0x8000001f reports information related to SME:
+When SEV is enabled, instruction pages and guest page tables are always treated
+as private. All the DMA operations inside the guest must be performed on shared
+memory. The memory encryption bit is controlled by the guest OS only when it
+is operating in 64-bit or 32-bit PAE mode; in all other modes the SEV hardware
+forces the memory encryption bit to 1.
+
+Support for SME and SEV can be determined through the CPUID instruction. The
+CPUID function 0x8000001f reports information related to SME:
 
   0x8000001f[eax]:
Bit[0] indicates support for SME
+   Bit[1] indicates support for SEV
   0x8000001f[ebx]:
Bits[5:0]  pagetable bit number used to activate memory
   encryption
@@ -39,6 +54,13 @@ determine if SME is enabled and/or to enable memory 
encryption:
Bit[23]   0 = memory encryption features are disabled
  1 = memory encryption features are enabled
 
+If SEV is supported, MSR 0xc0010131 (MSR_AMD64_SEV) can be used to determine if
+SEV is active:
+
+   0xc0010131:
+   Bit[0]0 = memory encryption is not active
+ 1 = memory encryption is active
+
 Linux relies on BIOS to set this bit if BIOS has determined that the reduction
 in the physical address space as a result of enabling memory encryption (see
 CPUID information above) will not conflict with the address space resource
-- 
2.9.5
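
As a minimal sketch of how a guest might consume the CPUID/MSR interface
described above (illustrative only; cpuid() is the asm/processor.h helper,
while rdmsr_val() stands in for a hypothetical MSR-read helper -- the kernel's
real detection lives in arch/x86/mm/mem_encrypt.c):

   #define MSR_AMD64_SEV		0xc0010131
   #define MSR_AMD64_SEV_ENABLED	BIT_ULL(0)

   /* Sketch only: detect SEV as the document describes. */
   static bool check_sev_active(void)
   {
   	unsigned int eax, ebx, ecx, edx;

   	cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
   	if (!(eax & BIT(1)))			/* Bit[1]: SEV supported */
   		return false;

   	/* Bit[0] of the SEV MSR: memory encryption is active */
   	return rdmsr_val(MSR_AMD64_SEV) & MSR_AMD64_SEV_ENABLED;
   }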



[Part1 PATCH v4 02/17] x86/mm: Add Secure Encrypted Virtualization (SEV) support

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

Provide support for Secure Encrypted Virtualization (SEV). This initial
support defines a flag that is used by the kernel to determine if it is
running with SEV active.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/mem_encrypt.h |  6 ++
 arch/x86/mm/mem_encrypt.c  | 26 ++
 include/linux/mem_encrypt.h| 12 
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 6a77c63540f7..2b024741bce9 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -47,6 +47,9 @@ void __init mem_encrypt_init(void);
 
 void swiotlb_set_mem_attributes(void *vaddr, unsigned long size);
 
+bool sme_active(void);
+bool sev_active(void);
+
 #else  /* !CONFIG_AMD_MEM_ENCRYPT */
 
 #define sme_me_mask0ULL
@@ -64,6 +67,9 @@ static inline void __init sme_early_init(void) { }
 static inline void __init sme_encrypt_kernel(void) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
+static inline bool sme_active(void) { return false; }
+static inline bool sev_active(void) { return false; }
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 /*
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 3fcc8e01683b..4e6dcabe52fc 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -40,6 +40,8 @@ static char sme_cmdline_off[] __initdata = "off";
 u64 sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
 
+unsigned int sev_enabled __section(.data) = 0;
+
 /* Buffer used for early in-place encryption by BSP, no locking needed */
 static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
 
@@ -190,6 +192,30 @@ void __init sme_early_init(void)
protection_map[i] = pgprot_encrypted(protection_map[i]);
 }
 
+/*
+ * SME and SEV are very similar but they are not the same, so there are
+ * times that the kernel will need to distinguish between SME and SEV. The
+ * sme_active() and sev_active() functions are used for this.  When a
+ * distinction isn't needed, the mem_encrypt_active() function can be used.
+ *
+ * The trampoline code is a good example for this requirement.  Before
+ * paging is activated, SME will access all memory as decrypted, but SEV
+ * will access all memory as encrypted.  So, when APs are being brought
+ * up under SME the trampoline area cannot be encrypted, whereas under SEV
+ * the trampoline area must be encrypted.
+ */
+bool sme_active(void)
+{
+   return sme_me_mask && !sev_enabled;
+}
+EXPORT_SYMBOL_GPL(sme_active);
+
+bool sev_active(void)
+{
+   return sme_me_mask && sev_enabled;
+}
+EXPORT_SYMBOL_GPL(sev_active);
+
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void)
 {
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 265a9cd21cb4..b55ba30a60a0 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -22,17 +22,21 @@
 #else  /* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
 
 #define sme_me_mask0ULL
+#define sev_enabled0
+
+static inline bool sme_active(void) { return false; }
+static inline bool sev_active(void) { return false; }
 
 #endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
 
-static inline bool sme_active(void)
+static inline unsigned long sme_get_me_mask(void)
 {
-   return !!sme_me_mask;
+   return sme_me_mask;
 }
 
-static inline u64 sme_get_me_mask(void)
+static inline bool mem_encrypt_active(void)
 {
-   return sme_me_mask;
+   return !!sme_me_mask;
 }
 
 #ifdef CONFIG_AMD_MEM_ENCRYPT
-- 
2.9.5
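
A short usage sketch (not part of the patch, example_prepare_area() is
hypothetical) of how callers are expected to choose between the new helpers;
the trampoline handling in patch 04 follows exactly this pattern:

   /* Callers that only need to know whether any memory encryption is in
    * effect use mem_encrypt_active(); the SME/SEV-specific helpers are
    * for places where the behavior differs. */
   static void __init example_prepare_area(void *base, unsigned long size)
   {
   	if (!mem_encrypt_active())
   		return;				/* neither SME nor SEV */

   	if (sme_active())			/* SME: must be shared with the APs */
   		set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
   	/* sev_active(): keep the area encrypted */
   }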



[Part1 PATCH v4 04/17] x86/realmode: Don't decrypt trampoline area under SEV

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

When SEV is active, the trampoline area needs to remain in encrypted
memory, so only mark the area decrypted if SME is active.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Laura Abbott 
Cc: "Kirill A. Shutemov" 
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/realmode/init.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 1f71980fc5e0..d03125c2b73b 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -63,9 +63,10 @@ static void __init setup_real_mode(void)
/*
 * If SME is active, the trampoline area will need to be in
 * decrypted memory in order to bring up other processors
-* successfully.
+* successfully. This is not needed for SEV.
 */
-   set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+   if (sme_active())
+   set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
 
memcpy(base, real_mode_blob, size);
 
-- 
2.9.5



[Part1 PATCH v4 05/17] x86/mm: Use encrypted access of boot related data with SEV

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

When Secure Encrypted Virtualization (SEV) is active, boot data (such as
EFI related data, setup data) is encrypted and needs to be accessed as
such when mapped. Update the architecture override in early_memremap to
keep the encryption attribute when mapping this data.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Laura Abbott 
Cc: "Kirill A. Shutemov" 
Cc: Matt Fleming 
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/ioremap.c | 44 ++--
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 34f0e1847dd6..52cc0f4ed494 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -422,6 +422,9 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
  * areas should be mapped decrypted. And since the encryption key can
  * change across reboots, persistent memory should also be mapped
  * decrypted.
+ *
+ * If SEV is active, that implies that BIOS/UEFI also ran encrypted so
+ * only persistent memory should be mapped decrypted.
  */
 static bool memremap_should_map_decrypted(resource_size_t phys_addr,
  unsigned long size)
@@ -458,6 +461,11 @@ static bool memremap_should_map_decrypted(resource_size_t 
phys_addr,
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+   /* For SEV, these areas are encrypted */
+   if (sev_active())
+   break;
+   /* Fallthrough */
+
case E820_TYPE_PRAM:
return true;
default:
@@ -581,7 +589,7 @@ static bool __init 
early_memremap_is_setup_data(resource_size_t phys_addr,
 bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
 unsigned long flags)
 {
-   if (!sme_active())
+   if (!mem_encrypt_active())
return true;
 
if (flags & MEMREMAP_ENC)
@@ -590,12 +598,13 @@ bool arch_memremap_can_ram_remap(resource_size_t 
phys_addr, unsigned long size,
if (flags & MEMREMAP_DEC)
return false;
 
-   if (memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size) ||
-   memremap_should_map_decrypted(phys_addr, size))
-   return false;
+   if (sme_active()) {
+   if (memremap_is_setup_data(phys_addr, size) ||
+   memremap_is_efi_data(phys_addr, size))
+   return false;
+   }
 
-   return true;
+   return !memremap_should_map_decrypted(phys_addr, size);
 }
 
 /*
@@ -608,17 +617,24 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
 unsigned long size,
 pgprot_t prot)
 {
-   if (!sme_active())
+   bool encrypted_prot;
+
+   if (!mem_encrypt_active())
return prot;
 
-   if (early_memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size) ||
-   memremap_should_map_decrypted(phys_addr, size))
-   prot = pgprot_decrypted(prot);
-   else
-   prot = pgprot_encrypted(prot);
+   encrypted_prot = true;
+
+   if (sme_active()) {
+   if (early_memremap_is_setup_data(phys_addr, size) ||
+   memremap_is_efi_data(phys_addr, size))
+   encrypted_prot = false;
+   }
+
+   if (encrypted_prot && memremap_should_map_decrypted(phys_addr, size))
+   encrypted_prot = false;
 
-   return prot;
+   return encrypted_prot ? pgprot_encrypted(prot)
+ : pgprot_decrypted(prot);
 }
 
 bool phys_mem_access_encrypted(unsigned long phys_addr, unsigned long size)
-- 
2.9.5



[Part1 PATCH v4 07/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

EFI data is encrypted when the kernel is run under SEV. Update the
page table references to be sure the EFI memory areas are accessed
encrypted.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
Cc: linux-...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/platform/efi/efi_64.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 12e83888e5b9..5469c9319f43 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -369,7 +370,11 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 * as trim_bios_range() will reserve the first page and isolate it away
 * from memory allocators anyway.
 */
-   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
+   pf = _PAGE_RW;
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
+   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
}
@@ -412,6 +417,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 
va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
 
+   if (sev_active())
+   flags |= _PAGE_ENC;
+
pfn = md->phys_addr >> PAGE_SHIFT;
if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
@@ -538,6 +546,9 @@ static int __init efi_update_mem_attr(struct mm_struct *mm, 
efi_memory_desc_t *m
if (!(md->attribute & EFI_MEMORY_RO))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
return efi_update_mappings(md, pf);
 }
 
@@ -589,6 +600,9 @@ void __init efi_runtime_update_mappings(void)
(md->type != EFI_RUNTIME_SERVICES_CODE))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
efi_update_mappings(md, pf);
}
 }
-- 
2.9.5



[Part1 PATCH v4 09/17] resource: Provide resource struct in resource walk callback

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

In prep for a new function that will need additional resource information
during the resource walk, update the resource walk callback to pass the
resource structure.  Since the current callback start and end arguments
are pulled from the resource structure, the callback functions can obtain
them from the resource structure directly.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: Benjamin Herrenschmidt 
Reviewed-by: Kees Cook 
Reviewed-by: Borislav Petkov 
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/powerpc/kernel/machine_kexec_file_64.c | 12 +---
 arch/x86/kernel/crash.c | 18 +-
 arch/x86/kernel/pmem.c  |  2 +-
 include/linux/ioport.h  |  4 ++--
 include/linux/kexec.h   |  2 +-
 kernel/kexec_file.c |  5 +++--
 kernel/resource.c   |  9 +
 7 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_file_64.c 
b/arch/powerpc/kernel/machine_kexec_file_64.c
index 992c0d258e5d..e4395f937d63 100644
--- a/arch/powerpc/kernel/machine_kexec_file_64.c
+++ b/arch/powerpc/kernel/machine_kexec_file_64.c
@@ -91,11 +91,13 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
  * and that value will be returned. If all free regions are visited without
  * func returning non-zero, then zero will be returned.
  */
-int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
+int arch_kexec_walk_mem(struct kexec_buf *kbuf,
+   int (*func)(struct resource *, void *))
 {
int ret = 0;
u64 i;
phys_addr_t mstart, mend;
+   struct resource res = { };
 
if (kbuf->top_down) {
for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0,
@@ -105,7 +107,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int 
(*func)(u64, u64, void *))
 * range while in kexec, end points to the last byte
 * in the range.
 */
-   ret = func(mstart, mend - 1, kbuf);
+   res.start = mstart;
+   res.end = mend - 1;
+   ret = func(&res, kbuf);
if (ret)
break;
}
@@ -117,7 +121,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int 
(*func)(u64, u64, void *))
 * range while in kexec, end points to the last byte
 * in the range.
 */
-   ret = func(mstart, mend - 1, kbuf);
+   res.start = mstart;
+   res.end = mend - 1;
+   ret = func(&res, kbuf);
if (ret)
break;
}
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 44404e2307bb..815008c9ca18 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -209,7 +209,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_KEXEC_FILE
-static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
+static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
 
@@ -342,7 +342,7 @@ static int elf_header_exclude_ranges(struct crash_elf_data 
*ced,
return ret;
 }
 
-static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
+static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 {
struct crash_elf_data *ced = arg;
Elf64_Ehdr *ehdr;
@@ -355,7 +355,7 @@ static int prepare_elf64_ram_headers_callback(u64 start, 
u64 end, void *arg)
ehdr = ced->ehdr;
 
/* Exclude unwanted mem ranges */
-   ret = elf_header_exclude_ranges(ced, start, end);
+   ret = elf_header_exclude_ranges(ced, res->start, res->end);
if (ret)
return ret;
 
@@ -518,14 +518,14 @@ static int add_e820_entry(struct boot_params *params, 
struct e820_entry *entry)
return 0;
 }
 
-static int memmap_entry_callback(u64 start, u64 end, void *arg)
+static int memmap_entry_callback(struct resource *res, void *arg)
 {
struct crash_memmap_data *cmd = arg;
struct boot_params *params = cmd->params;
struct e820_entry ei;
 
-   ei.addr = start;
-   ei.size = end - start + 1;
+   ei.addr = res->start;
+   ei.size = res->end - res->start + 1;
ei.type = cmd->type;
add_e820_entry(params, &ei);
 
@@ -619,12 +619,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
return ret;
 }
 
-static int determine_backup_

[Part1 PATCH v4 06/17] x86/mm: Include SEV for encryption memory attribute changes

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

The current code checks only for sme_active() when determining whether
to perform the encryption attribute change.  Include sev_active() in this
check so that memory attribute changes can occur under SME and SEV.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: John Ogness 
Cc: Matt Fleming 
Cc: Laura Abbott 
Cc: Dan Williams 
Cc: "Kirill A. Shutemov" 
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/pageattr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index dfb7d657cf43..3fe68483463c 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int 
numpages, bool enc)
unsigned long start;
int ret;
 
-   /* Nothing to do if the SME is not active */
-   if (!sme_active())
+   /* Nothing to do if memory encryption is not active */
+   if (!mem_encrypt_active())
return 0;
 
/* Should not be working on unaligned addresses */
-- 
2.9.5



[Part1 PATCH v4 13/17] x86/io: Unroll string I/O when SEV is active

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

Secure Encrypted Virtualization (SEV) does not support string I/O, so
unroll the string I/O operation into a loop operating on one element at
a time.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Shevchenko 
Cc: David Laight 
Cc: Arnd Bergmann 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/io.h | 42 ++
 arch/x86/mm/mem_encrypt.c |  8 
 2 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index c40a95c33bb8..07c28ee398d9 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -265,6 +265,20 @@ static inline void slow_down_io(void)
 
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern struct static_key_false __sev;
+static inline bool __sev_active(void)
+{
+   return static_branch_unlikely(&__sev);
+}
+
+#else /* !CONFIG_AMD_MEM_ENCRYPT */
+
+static inline bool __sev_active(void) { return false; }
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
 #define BUILDIO(bwl, bw, type) \
 static inline void out##bwl(unsigned type value, int port) \
 {  \
@@ -295,14 +309,34 @@ static inline unsigned type in##bwl##_p(int port) 
\
\
 static inline void outs##bwl(int port, const void *addr, unsigned long count) \
 {  \
-   asm volatile("rep; outs" #bwl   \
-: "+S"(addr), "+c"(count) : "d"(port) : "memory"); \
+   if (__sev_active()) {   \
+   unsigned type *value = (unsigned type *)addr;   \
+   while (count) { \
+   out##bwl(*value, port); \
+   value++;\
+   count--;\
+   }   \
+   } else {\
+   asm volatile("rep; outs" #bwl   \
+: "+S"(addr), "+c"(count)  \
+: "d"(port) : "memory");   \
+   }   \
 }  \
\
 static inline void ins##bwl(int port, void *addr, unsigned long count) \
 {  \
-   asm volatile("rep; ins" #bwl\
-: "+D"(addr), "+c"(count) : "d"(port) : "memory"); \
+   if (__sev_active()) {   \
+   unsigned type *value = (unsigned type *)addr;   \
+   while (count) { \
+   *value = in##bwl(port); \
+   value++;\
+   count--;\
+   }   \
+   } else {\
+   asm volatile("rep; ins" #bwl\
+: "+D"(addr), "+c"(count)  \
+: "d"(port) : "memory");   \
+   }   \
 }
 
 BUILDIO(b, b, char)
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 9b0c921c0597..b361fabde4c8 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -39,6 +39,8 @@ static char sme_cmdline_off[] __initdata = "off";
  */
 u64 sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
+DEFINE_STATIC_KEY_FALSE(__sev);
+EXPORT_SYMBOL_GPL(__sev);
 
 unsigned int sev_enabled __section(.data) = 0;
 
@@ -311,6 +313,12 @@ void __init mem_encrypt_init(void)
if (sev_active())
dma_ops = &sev_dma_ops;
 
+   /*
+* With SEV, we need to unroll the rep string I/O instructions.
+*/
+   if (sev_active())
+   static_branch_enable(&__sev);
+
pr_info("AMD %s active\n",
sev_active() ? "Secure Encrypted Virtualization (SEV)"
 : "Secure Memory Encryption (SME)");
-- 
2.9.5
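
Conceptually (a sketch, not the generated macro above), the unrolled path
replaces a single REP OUTSW with a loop of individual port writes, since SEV
cannot handle the string variants; outsw_unrolled() is an illustrative name:

   /* What outsw() effectively does when __sev_active() is true:
    * one OUT per element instead of a single REP OUTSW. */
   static inline void outsw_unrolled(int port, const void *addr,
   				  unsigned long count)
   {
   	const unsigned short *value = addr;

   	while (count--)
   		outw(*value++, port);
   }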



[Part1 PATCH v4 12/17] x86/boot: Add early boot support when running with SEV active

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

Early in the boot process, add checks to determine if the kernel is
running with Secure Encrypted Virtualization (SEV) active.

Checking for SEV requires checking that the kernel is running under a
hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
(CPUID 0x8000001f, bit 1) and then checking a non-interceptable SEV MSR
(0xc0010131, bit 0).

This check is required so that during early compressed kernel booting the
pagetables (both the boot pagetables and KASLR pagetables, if enabled) are
updated to include the encryption mask so that when the kernel is
decompressed into encrypted memory, it can boot properly.

After the kernel is decompressed and continues booting the same logic is
used to check if SEV is active and set a flag indicating so.  This allows
us to distinguish between SME and SEV, each of which have unique
differences in how certain things are handled: e.g. DMA (always bounce
buffered with SEV) or EFI tables (always access decrypted with SME).

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Konrad Rzeszutek Wilk 
Cc: "Kirill A. Shutemov" 
Cc: Laura Abbott 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/boot/compressed/Makefile  |   1 +
 arch/x86/boot/compressed/head_64.S |  16 +
 arch/x86/boot/compressed/mem_encrypt.S | 115 +
 arch/x86/boot/compressed/misc.h|   2 +
 arch/x86/boot/compressed/pagetable.c   |   8 ++-
 arch/x86/include/asm/msr-index.h   |   3 +
 arch/x86/include/uapi/asm/kvm_para.h   |   1 -
 arch/x86/mm/mem_encrypt.c  |  50 ++
 8 files changed, 181 insertions(+), 15 deletions(-)
 create mode 100644 arch/x86/boot/compressed/mem_encrypt.S

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 8a958274b54c..7fc5b7168e4f 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -77,6 +77,7 @@ vmlinux-objs-$(CONFIG_EARLY_PRINTK) += 
$(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
 ifdef CONFIG_X86_64
vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o
+   vmlinux-objs-y += $(obj)/mem_encrypt.o
 endif
 
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index b4a5d284391c..3dfad60720d0 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -130,6 +130,19 @@ ENTRY(startup_32)
  /*
   * Build early 4G boot pagetable
   */
+   /*
+* If SEV is active then set the encryption mask in the page tables.
+* This will ensure that when the kernel is copied and decompressed
+* it will be done so encrypted.
+*/
+   call    get_sev_encryption_bit
+   xorl    %edx, %edx
+   testl   %eax, %eax
+   jz      1f
+   subl    $32, %eax   /* Encryption bit is always above bit 31 */
+   bts     %eax, %edx  /* Set encryption mask for page tables */
+1:
+
/* Initialize Page tables to 0 */
leal    pgtable(%ebx), %edi
xorl    %eax, %eax
@@ -140,12 +153,14 @@ ENTRY(startup_32)
leal    pgtable + 0(%ebx), %edi
leal    0x1007 (%edi), %eax
movl    %eax, 0(%edi)
+   addl    %edx, 4(%edi)
 
/* Build Level 3 */
leal    pgtable + 0x1000(%ebx), %edi
leal    0x1007(%edi), %eax
movl    $4, %ecx
1: movl    %eax, 0x00(%edi)
+   addl    %edx, 0x04(%edi)
addl    $0x1000, %eax
addl    $8, %edi
decl    %ecx
@@ -156,6 +171,7 @@ ENTRY(startup_32)
movl    $0x0183, %eax
movl    $2048, %ecx
1: movl    %eax, 0(%edi)
+   addl    %edx, 4(%edi)
addl    $0x0020, %eax
addl    $8, %edi
decl    %ecx
diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
new file mode 100644
index ..03149c77c749
--- /dev/null
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -0,0 +1,115 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2017 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+
+   .text
+   .code32
+ENTRY(get_sev_encryption_bit)
+   xor %eax, %eax
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   push    %ebx
+   push    %ecx
+   push    %edx
+   push    %edi
+
+   call    1f
+1: popl    %edi
+   subl    $1b, %edi
+
+   movl    enc_bit(%edi), %eax
+ 
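
As a rough C sketch (pseudocode of the assembly flow above, not the patch
itself), the three-step detection from the commit message looks like this;
cpuid() is the asm/processor.h helper and rdmsr_val() is a hypothetical
MSR-read helper:

   static unsigned long sketch_get_sev_encryption_mask(void)
   {
   	unsigned int eax, ebx, ecx, edx;

   	/* 1. Running under a hypervisor? (CPUID 0x00000001, ECX bit 31) */
   	cpuid(0x00000001, &eax, &ebx, &ecx, &edx);
   	if (!(ecx & BIT(31)))
   		return 0;

   	/* 2. SEV supported? (CPUID 0x8000001f, EAX bit 1) */
   	cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
   	if (!(eax & BIT(1)))
   		return 0;

   	/* 3. SEV active? (MSR 0xc0010131, bit 0) */
   	if (!(rdmsr_val(0xc0010131) & 1))
   		return 0;

   	/* EBX[5:0] holds the C-bit position used for the page-table mask */
   	return 1UL << (ebx & 0x3f);
   }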

[Part1 PATCH v4 11/17] x86/mm: DMA support for SEV memory encryption

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

DMA access to encrypted memory cannot be performed when SEV is active.
In order for DMA to properly work when SEV is active, the SWIOTLB bounce
buffers must be used.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Konrad Rzeszutek Wilk 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/mem_encrypt.c | 86 +++
 lib/swiotlb.c |  5 +--
 2 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 4e6dcabe52fc..967f116ec65e 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -190,6 +190,70 @@ void __init sme_early_init(void)
/* Update the protection map with memory encryption mask */
for (i = 0; i < ARRAY_SIZE(protection_map); i++)
protection_map[i] = pgprot_encrypted(protection_map[i]);
+
+   if (sev_active())
+   swiotlb_force = SWIOTLB_FORCE;
+}
+
+static void *sme_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+  gfp_t gfp, unsigned long attrs)
+{
+   unsigned long dma_mask;
+   unsigned int order;
+   struct page *page;
+   void *vaddr = NULL;
+
+   dma_mask = dma_alloc_coherent_mask(dev, gfp);
+   order = get_order(size);
+
+   /*
+* Memory will be memset to zero after marking decrypted, so don't
+* bother clearing it before.
+*/
+   gfp &= ~__GFP_ZERO;
+
+   page = alloc_pages_node(dev_to_node(dev), gfp, order);
+   if (page) {
+   dma_addr_t addr;
+
+   /*
+* Since we will be clearing the encryption bit, check the
+* mask with it already cleared.
+*/
+   addr = __sme_clr(phys_to_dma(dev, page_to_phys(page)));
+   if ((addr + size) > dma_mask) {
+   __free_pages(page, get_order(size));
+   } else {
+   vaddr = page_address(page);
+   *dma_handle = addr;
+   }
+   }
+
+   if (!vaddr)
+   vaddr = swiotlb_alloc_coherent(dev, size, dma_handle, gfp);
+
+   if (!vaddr)
+   return NULL;
+
+   /* Clear the SME encryption bit for DMA use if not swiotlb area */
+   if (!is_swiotlb_buffer(dma_to_phys(dev, *dma_handle))) {
+   set_memory_decrypted((unsigned long)vaddr, 1 << order);
+   memset(vaddr, 0, PAGE_SIZE << order);
+   *dma_handle = __sme_clr(*dma_handle);
+   }
+
+   return vaddr;
+}
+
+static void sme_free(struct device *dev, size_t size, void *vaddr,
+dma_addr_t dma_handle, unsigned long attrs)
+{
+   /* Set the SME encryption bit for re-use if not swiotlb area */
+   if (!is_swiotlb_buffer(dma_to_phys(dev, dma_handle)))
+   set_memory_encrypted((unsigned long)vaddr,
+1 << get_order(size));
+
+   swiotlb_free_coherent(dev, size, vaddr, dma_handle);
 }
 
 /*
@@ -216,6 +280,20 @@ bool sev_active(void)
 }
 EXPORT_SYMBOL_GPL(sev_active);
 
+static const struct dma_map_ops sev_dma_ops = {
+   .alloc  = sme_alloc,
+   .free   = sme_free,
+   .map_page   = swiotlb_map_page,
+   .unmap_page = swiotlb_unmap_page,
+   .map_sg = swiotlb_map_sg_attrs,
+   .unmap_sg   = swiotlb_unmap_sg_attrs,
+   .sync_single_for_cpu= swiotlb_sync_single_for_cpu,
+   .sync_single_for_device = swiotlb_sync_single_for_device,
+   .sync_sg_for_cpu= swiotlb_sync_sg_for_cpu,
+   .sync_sg_for_device = swiotlb_sync_sg_for_device,
+   .mapping_error  = swiotlb_dma_mapping_error,
+};
+
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void)
 {
@@ -225,6 +303,14 @@ void __init mem_encrypt_init(void)
/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
swiotlb_update_mem_attributes();
 
+   /*
+* With SEV, DMA operations cannot use encryption. New DMA ops
+* are required in order to mark the DMA areas as decrypted or
+* to use bounce buffers.
+*/
+   if (sev_active())
+   dma_ops = &sev_dma_ops;
+
pr_info("AMD Secure Memory Encryption (SME) active\n");
 }
 
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 8c6c83ef57a4..cea19aaf303c 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -507,8 +507,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier and can't now 
provide you with the DMA bounce buffer");
 
-   if (sme_active())
-   pr_warn_onc
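
From a driver's point of view nothing changes (a sketch under that assumption,
with hypothetical wrapper names): dma_alloc_coherent() is routed through
sev_dma_ops, which either clears the C-bit on the allocation or falls back to
the already-decrypted SWIOTLB pool, and streaming mappings are bounce-buffered.

   static void *example_alloc_shared(struct device *dev, size_t size,
   				  dma_addr_t *handle)
   {
   	/* lands in sev_dma_ops->alloc (sme_alloc) when SEV is active */
   	return dma_alloc_coherent(dev, size, handle, GFP_KERNEL);
   }

   static void example_free_shared(struct device *dev, size_t size,
   				void *vaddr, dma_addr_t handle)
   {
   	/* re-encrypts non-SWIOTLB pages via sme_free before freeing */
   	dma_free_coherent(dev, size, vaddr, handle);
   }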

[Part1 PATCH v4 14/17] x86: Add support for changing memory encryption attribute in early boot

2017-09-16 Thread Brijesh Singh
In early boot, the guest shares certain guest physical addresses with the
hypervisor through KVM-specific custom MSRs. When SEV is active, those shared
pages must be mapped with the memory encryption attribute cleared so that both
the hypervisor and the guest can access the data.

Add APIs to change the memory encryption attribute in early boot code.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/mem_encrypt.h |  17 ++
 arch/x86/mm/mem_encrypt.c  | 121 +
 2 files changed, 138 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 2b024741bce9..21b9d8fc8293 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -42,6 +42,11 @@ void __init sme_early_init(void);
 void __init sme_encrypt_kernel(void);
 void __init sme_enable(struct boot_params *bp);
 
+int __init early_set_memory_decrypted(resource_size_t paddr,
+ unsigned long size);
+int __init early_set_memory_encrypted(resource_size_t paddr,
+ unsigned long size);
+
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void);
 
@@ -70,6 +75,18 @@ static inline void __init sme_enable(struct boot_params *bp) 
{ }
 static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
 
+static inline int __init early_set_memory_decrypted(resource_size_t paddr,
+   unsigned long size)
+{
+   return 0;
+}
+
+static inline int __init early_set_memory_encrypted(resource_size_t paddr,
+   unsigned long size)
+{
+   return 0;
+}
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 /*
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index b361fabde4c8..cecdf52f3c70 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -28,6 +28,8 @@
 #include 
 #include 
 
+#include "mm_internal.h"
+
 static char sme_cmdline_arg[] __initdata = "mem_encrypt";
 static char sme_cmdline_on[]  __initdata = "on";
 static char sme_cmdline_off[] __initdata = "off";
@@ -258,6 +260,125 @@ static void sme_free(struct device *dev, size_t size, 
void *vaddr,
swiotlb_free_coherent(dev, size, vaddr, dma_handle);
 }
 
+static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
+{
+   pgprot_t old_prot, new_prot;
+   unsigned long pfn;
+   pte_t new_pte;
+
+   switch (level) {
+   case PG_LEVEL_4K:
+   pfn = pte_pfn(*kpte);
+   old_prot = pte_pgprot(*kpte);
+   break;
+   case PG_LEVEL_2M:
+   pfn = pmd_pfn(*(pmd_t *)kpte);
+   old_prot = pmd_pgprot(*(pmd_t *)kpte);
+   break;
+   case PG_LEVEL_1G:
+   pfn = pud_pfn(*(pud_t *)kpte);
+   old_prot = pud_pgprot(*(pud_t *)kpte);
+   break;
+   default:
+   return;
+   }
+
+   new_prot = old_prot;
+   if (enc)
+   pgprot_val(new_prot) |= _PAGE_ENC;
+   else
+   pgprot_val(new_prot) &= ~_PAGE_ENC;
+
+   /* if prot is same then do nothing */
+   if (pgprot_val(old_prot) == pgprot_val(new_prot))
+   return;
+
+   new_pte = pfn_pte(pfn, new_prot);
+   set_pte_atomic(kpte, new_pte);
+}
+
+static int __init early_set_memory_enc_dec(resource_size_t paddr,
+  unsigned long size, bool enc)
+{
+   unsigned long vaddr, vaddr_end, vaddr_next;
+   unsigned long psize, pmask;
+   int split_page_size_mask;
+   pte_t *kpte;
+   int level, ret;
+
+   vaddr = (unsigned long)__va(paddr);
+   vaddr_next = vaddr;
+   vaddr_end = vaddr + size;
+
+   /*
+* We are going to change the physical page attribute from C=1 to C=0
+* or vice versa. Flush the caches to ensure that data is written into
+* memory with correct C-bit before we change attribute.
+*/
+   clflush_cache_range(__va(paddr), size);
+
+   for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+   kpte = lookup_address(vaddr, &level);
+   if (!kpte || pte_none(*kpte)) {
+   ret = 1;
+   goto out;
+   }
+
+   if (level == PG_LEVEL_4K) {
+   __set_clr_pte_enc(kpte, level, enc);
+   vaddr_next = (vaddr & PAGE_MASK) + PAGE_SIZE;
+   continue;
+   }
+
+   psize = page_level_size(level);
+   pmask = page_level_mask(level);
+
+   /*
+* Check, whether we can change the larg
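
A usage sketch of the new early API (patches 16 and 17 use essentially this
pattern for the KVM shared pages); example_share_with_host() and `shared_page'
are hypothetical, page-aligned illustrations:

   static void __init example_share_with_host(void *shared_page)
   {
   	unsigned long pa = slow_virt_to_phys(shared_page);

   	if (!sev_active())
   		return;

   	/* decrypt the contents in place ... */
   	sme_early_decrypt(pa, PAGE_SIZE);
   	/* ... then clear the C-bit in the page table entry */
   	early_set_memory_decrypted(pa, PAGE_SIZE);
   }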

[Part1 PATCH v4 15/17] percpu: introduce DEFINE_PER_CPU_UNENCRYPTED

2017-09-16 Thread Brijesh Singh
When SEV is active, memory is encrypted with a guest-specific key, and if
the guest OS wants to share a memory region with the hypervisor then it must
clear the C-bit (i.e. mark it unencrypted) before sharing it.

DEFINE_PER_CPU_UNENCRYPTED can be used to define per-cpu variables that
will be shared between the guest and the hypervisor. Currently, KVM has
three such variables (steal_time, apf_reason, and kvm_apic_eoi) that are
shared with the hypervisor.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Arnd Bergmann 
Cc: Tejun Heo 
Cc: Christoph Lameter 
Cc: linux-a...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 include/asm-generic/vmlinux.lds.h | 11 +++
 include/linux/percpu-defs.h   | 15 +++
 2 files changed, 26 insertions(+)

diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index 8acfc1e099e1..363858f43cbc 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -777,6 +777,16 @@
 #define INIT_RAM_FS
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define PERCPU_UNENCRYPTED_SECTION \
+   . = ALIGN(PAGE_SIZE);   \
+   *(.data..percpu..unencrypted)   \
+   . = ALIGN(PAGE_SIZE);
+#else
+#define PERCPU_UNENCRYPTED_SECTION
+#endif
+
+
 /*
  * Default discarded sections.
  *
@@ -815,6 +825,7 @@
. = ALIGN(cacheline);   \
*(.data..percpu)\
*(.data..percpu..shared_aligned)\
+   PERCPU_UNENCRYPTED_SECTION  \
VMLINUX_SYMBOL(__per_cpu_end) = .;
 
 /**
diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h
index 8f16299ca068..b2b99ad4b31d 100644
--- a/include/linux/percpu-defs.h
+++ b/include/linux/percpu-defs.h
@@ -173,6 +173,21 @@
DEFINE_PER_CPU_SECTION(type, name, "..read_mostly")
 
 /*
+ * Declaration/definition used for per-CPU variables that should be accessed
+ * as unencrypted when memory encryption is enabled in the guest.
+ */
+#if defined(CONFIG_VIRTUALIZATION) && defined(CONFIG_AMD_MEM_ENCRYPT)
+
+#define DECLARE_PER_CPU_UNENCRYPTED(type, name)
\
+   DECLARE_PER_CPU_SECTION(type, name, "..unencrypted")
+
+#define DEFINE_PER_CPU_UNENCRYPTED(type, name) \
+   DEFINE_PER_CPU_SECTION(type, name, "..unencrypted")
+#else
+#define DEFINE_PER_CPU_UNENCRYPTED(type, name) DEFINE_PER_CPU(type, name)
+#endif
+
+/*
  * Intermodule exports for per-CPU variables.  sparse forgets about
  * address space across EXPORT_SYMBOL(), change EXPORT_SYMBOL() to
  * noop if __CHECKER__.
-- 
2.9.5



[Part1 PATCH v4 17/17] X86/KVM: Clear encryption attribute when SEV is active

2017-09-16 Thread Brijesh Singh
The guest physical memory areas holding struct pvclock_wall_clock and
struct pvclock_vcpu_time_info are shared with the hypervisor, which
periodically updates their contents. When SEV is active, we must clear the
encryption attribute from these shared memory pages so that both the
hypervisor and the guest can access the data.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Tom Lendacky 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/entry/vdso/vma.c  |  5 ++--
 arch/x86/kernel/kvmclock.c | 65 ++
 2 files changed, 57 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 1911310959f8..d63053142b16 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -114,10 +114,11 @@ static int vvar_fault(const struct vm_special_mapping *sm,
struct pvclock_vsyscall_time_info *pvti =
pvclock_pvti_cpu0_va();
if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
-   ret = vm_insert_pfn(
+   ret = vm_insert_pfn_prot(
vma,
vmf->address,
-   __pa(pvti) >> PAGE_SHIFT);
+   __pa(pvti) >> PAGE_SHIFT,
+   pgprot_decrypted(vma->vm_page_prot));
}
} else if (sym_offset == image->sym_hvclock_page) {
struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d88967659098..3de184be0887 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -45,7 +46,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
 
 /* The hypervisor will put information about time periodically here */
 static struct pvclock_vsyscall_time_info *hv_clock;
-static struct pvclock_wall_clock wall_clock;
+static struct pvclock_wall_clock *wall_clock;
 
 struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
 {
@@ -64,15 +65,15 @@ static void kvm_get_wallclock(struct timespec *now)
int low, high;
int cpu;
 
-   low = (int)__pa_symbol(&wall_clock);
-   high = ((u64)__pa_symbol(&wall_clock) >> 32);
+   low = (int)slow_virt_to_phys(wall_clock);
+   high = ((u64)slow_virt_to_phys(wall_clock) >> 32);
 
native_write_msr(msr_kvm_wall_clock, low, high);
 
cpu = get_cpu();
 
vcpu_time = &hv_clock[cpu].pvti;
-   pvclock_read_wallclock(&wall_clock, vcpu_time, now);
+   pvclock_read_wallclock(wall_clock, vcpu_time, now);
 
put_cpu();
 }
@@ -249,11 +250,39 @@ static void kvm_shutdown(void)
native_machine_shutdown();
 }
 
+static phys_addr_t __init kvm_memblock_alloc(phys_addr_t size,
+phys_addr_t align)
+{
+   phys_addr_t mem;
+
+   mem = memblock_alloc(size, align);
+   if (!mem)
+   return 0;
+
+   if (sev_active()) {
+   if (early_set_memory_decrypted(mem, size))
+   goto e_free;
+   }
+
+   return mem;
+e_free:
+   memblock_free(mem, size);
+   return 0;
+}
+
+static void __init kvm_memblock_free(phys_addr_t addr, phys_addr_t size)
+{
+   if (sev_active())
+   early_set_memory_encrypted(addr, size);
+
+   memblock_free(addr, size);
+}
+
 void __init kvmclock_init(void)
 {
struct pvclock_vcpu_time_info *vcpu_time;
-   unsigned long mem;
-   int size, cpu;
+   unsigned long mem, mem_wall_clock;
+   int size, cpu, wall_clock_size;
u8 flags;
 
size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
@@ -267,21 +296,35 @@ void __init kvmclock_init(void)
} else if (!(kvmclock && kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
return;
 
-   printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
-   msr_kvm_system_time, msr_kvm_wall_clock);
+   wall_clock_size = PAGE_ALIGN(sizeof(struct pvclock_wall_clock));
+   mem_wall_clock = kvm_memblock_alloc(wall_clock_size, PAGE_SIZE);
+   if (!mem_wall_clock)
+   return;
 
-   mem = memblock_alloc(size, PAGE_SIZE);
-   if (!mem)
+   wall_clock = __va(mem_wall_clock);
+   memset(wall_clock, 0, wall_clock_size);
+
+   mem = kvm_memblock_alloc(size, PAGE_SIZE);
+   if (!mem) {
+   kvm_memblock_free(mem_wall_clock, wall_clock_size);
+   wall_clock = NULL;
return;
+   }
+
hv_clock = __va(mem);
memset(hv_

[Part1 PATCH v4 08/17] resource: Consolidate resource walking code

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range()
functions each have much of the same code.  Create a new function that
consolidates the common code from these functions in one place to reduce
the amount of duplicated code.

Cc: Borislav Petkov 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 kernel/resource.c | 52 +---
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 9b5f04404152..7323c1b636cd 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -400,6 +400,26 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
return 0;
 }
 
+static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
+bool first_level_children_only,
+void *arg, int (*func)(u64, u64, void *))
+{
+   u64 orig_end = res->end;
+   int ret = -1;
+
+   while ((res->start < res->end) &&
+  !find_next_iomem_res(res, desc, first_level_children_only)) {
+   ret = (*func)(res->start, res->end, arg);
+   if (ret)
+   break;
+
+   res->start = res->end + 1;
+   res->end = orig_end;
+   }
+
+   return ret;
+}
+
 /*
  * Walks through iomem resources and calls func() with matching resource
  * ranges. This walks through whole tree and not just first level children.
@@ -418,26 +438,12 @@ int walk_iomem_res_desc(unsigned long desc, unsigned long 
flags, u64 start,
u64 end, void *arg, int (*func)(u64, u64, void *))
 {
struct resource res;
-   u64 orig_end;
-   int ret = -1;
 
res.start = start;
res.end = end;
res.flags = flags;
-   orig_end = res.end;
-
-   while ((res.start < res.end) &&
-   (!find_next_iomem_res(&res, desc, false))) {
-
-   ret = (*func)(res.start, res.end, arg);
-   if (ret)
-   break;
-
-   res.start = res.end + 1;
-   res.end = orig_end;
-   }
 
-   return ret;
+   return __walk_iomem_res_desc(&res, desc, false, arg, func);
 }
 
 /*
@@ -451,22 +457,13 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(u64, u64, void *))
 {
struct resource res;
-   u64 orig_end;
-   int ret = -1;
 
res.start = start;
res.end = end;
res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-   orig_end = res.end;
-   while ((res.start < res.end) &&
-   (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
-   ret = (*func)(res.start, res.end, arg);
-   if (ret)
-   break;
-   res.start = res.end + 1;
-   res.end = orig_end;
-   }
-   return ret;
+
+   return __walk_iomem_res_desc(&res, IORES_DESC_NONE, true,
+arg, func);
 }
 
 #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
@@ -508,6 +505,7 @@ static int __is_ram(unsigned long pfn, unsigned long 
nr_pages, void *arg)
 {
return 1;
 }
+
 /*
  * This generic page_is_ram() returns true if specified address is
  * registered as System RAM in iomem_resource list.
-- 
2.9.5



[Part1 PATCH v4 16/17] X86/KVM: Unencrypt shared per-cpu variables when SEV is active

2017-09-16 Thread Brijesh Singh
When SEV is active, guest memory is encrypted with a guest-specific key; a
guest memory region shared with the hypervisor must be mapped as unencrypted
before we share it.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Tom Lendacky 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: k...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 arch/x86/kernel/kvm.c | 46 +++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 874827b0d7ca..9ccb48b027e4 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
 
 early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
 
-static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
-static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
+static DEFINE_PER_CPU_UNENCRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) 
__aligned(64);
+static DEFINE_PER_CPU_UNENCRYPTED(struct kvm_steal_time, steal_time) 
__aligned(64);
 static int has_steal_clock = 0;
 
 /*
@@ -305,7 +305,7 @@ static void kvm_register_steal_time(void)
cpu, (unsigned long long) slow_virt_to_phys(st));
 }
 
-static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
+static DEFINE_PER_CPU_UNENCRYPTED(unsigned long, kvm_apic_eoi) = 
KVM_PV_EOI_DISABLED;
 
 static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 {
@@ -419,9 +419,46 @@ void kvm_disable_steal_time(void)
wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+static inline void __init __set_percpu_var_unencrypted(
+   void *var, int size)
+{
+   unsigned long pa = slow_virt_to_phys(var);
+
+   /* decrypt the memory in-place */
+   sme_early_decrypt(pa, size);
+
+   /* clear the C-bit from the page table */
+   early_set_memory_decrypted(pa, size);
+}
+
+/*
+ * Iterate through all possible CPUs and map the memory region pointed
+ * by apf_reason, steal_time and kvm_apic_eoi as unencrypted at once.
+ *
+ * Note: we iterate through all possible CPUs to ensure that CPUs
+ * hotplugged will have their per-cpu variable already mapped as
+ * unencrypted.
+ */
+static void __init set_percpu_unencrypted(void)
+{
+   int cpu;
+
+   for_each_possible_cpu(cpu) {
+   __set_percpu_var_unencrypted(&per_cpu(apf_reason, cpu),
+   sizeof(struct kvm_vcpu_pv_apf_data));
+   __set_percpu_var_unencrypted(&per_cpu(steal_time, cpu),
+   sizeof(struct kvm_steal_time));
+   __set_percpu_var_unencrypted(&per_cpu(kvm_apic_eoi, cpu),
+   sizeof(unsigned long));
+   }
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
+   if (sev_active())
+   set_percpu_unencrypted();
+
kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
kvm_spinlock_init();
@@ -489,6 +526,9 @@ void __init kvm_guest_init(void)
  kvm_cpu_online, kvm_cpu_down_prepare) < 0)
pr_err("kvm_guest: Failed to install cpu hotplug callbacks\n");
 #else
+   if (sev_active())
+   set_percpu_unencrypted();
+
kvm_guest_cpu_init();
 #endif
 
-- 
2.9.5



[Part1 PATCH v4 10/17] x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

In order for memory pages to be properly mapped when SEV is active, we
need to use the PAGE_KERNEL protection attribute as the base protection.
This will ensure that memory mappings of, e.g., ACPI tables receive the
proper mapping attributes.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: "Kirill A. Shutemov" 
Cc: Laura Abbott 
Cc: Andy Lutomirski 
Cc: "Jérôme Glisse" 
Cc: Andrew Morton 
Cc: Dan Williams 
Cc: Kees Cook 
Cc: x...@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/ioremap.c  | 77 ++
 include/linux/ioport.h |  3 ++
 kernel/resource.c  | 19 +
 3 files changed, 88 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 52cc0f4ed494..812b8a8066ba 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -27,6 +27,11 @@
 
 #include "physaddr.h"
 
+struct ioremap_mem_flags {
+   bool system_ram;
+   bool desc_other;
+};
+
 /*
  * Fix up the linear direct mapping of the kernel to avoid cache attribute
  * conflicts.
@@ -56,19 +61,61 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long 
size,
return err;
 }
 
-static int __ioremap_check_ram(unsigned long start_pfn, unsigned long nr_pages,
-  void *arg)
+static int __ioremap_check_ram(struct resource *res)
 {
+   unsigned long start_pfn, stop_pfn;
unsigned long i;
 
-   for (i = 0; i < nr_pages; ++i)
-   if (pfn_valid(start_pfn + i) &&
-   !PageReserved(pfn_to_page(start_pfn + i)))
-   return 1;
+   if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
+   return 0;
+
+   start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT;
+   stop_pfn = (res->end + 1) >> PAGE_SHIFT;
+   if (stop_pfn > start_pfn) {
+   for (i = 0; i < (stop_pfn - start_pfn); ++i)
+   if (pfn_valid(start_pfn + i) &&
+   !PageReserved(pfn_to_page(start_pfn + i)))
+   return 1;
+   }
 
return 0;
 }
 
+static int __ioremap_check_desc_other(struct resource *res)
+{
+   return (res->desc != IORES_DESC_NONE);
+}
+
+static int __ioremap_res_check(struct resource *res, void *arg)
+{
+   struct ioremap_mem_flags *flags = arg;
+
+   if (!flags->system_ram)
+   flags->system_ram = __ioremap_check_ram(res);
+
+   if (!flags->desc_other)
+   flags->desc_other = __ioremap_check_desc_other(res);
+
+   return flags->system_ram && flags->desc_other;
+}
+
+/*
+ * To avoid multiple resource walks, this function walks resources marked as
+ * IORESOURCE_MEM and IORESOURCE_BUSY and looking for system RAM and/or a
+ * resource described not as IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
+ */
+static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
+   struct ioremap_mem_flags *flags)
+{
+   u64 start, end;
+
+   start = (u64)addr;
+   end = start + size - 1;
+   memset(flags, 0, sizeof(*flags));
+
+   walk_mem_res(start, end, flags, __ioremap_res_check);
+}
+
 /*
  * Remap an arbitrary physical address space into the kernel virtual
  * address space. It transparently creates kernel huge I/O mapping when
@@ -87,9 +134,10 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
unsigned long size, enum page_cache_mode pcm, void *caller)
 {
unsigned long offset, vaddr;
-   resource_size_t pfn, last_pfn, last_addr;
+   resource_size_t last_addr;
const resource_size_t unaligned_phys_addr = phys_addr;
const unsigned long unaligned_size = size;
+   struct ioremap_mem_flags mem_flags;
struct vm_struct *area;
enum page_cache_mode new_pcm;
pgprot_t prot;
@@ -108,13 +156,12 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
return NULL;
}
 
+   __ioremap_check_mem(phys_addr, size, &mem_flags);
+
/*
 * Don't allow anybody to remap normal RAM that we're using..
 */
-   pfn  = phys_addr >> PAGE_SHIFT;
-   last_pfn = last_addr >> PAGE_SHIFT;
-   if (walk_system_ram_range(pfn, last_pfn - pfn + 1, NULL,
- __ioremap_check_ram) == 1) {
+   if (mem_flags.system_ram) {
WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n",
  &phys_addr, &last_addr);
return NULL;
@@ -146,7 +193,15 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
pcm = new_pcm;
}
 
+   /*

[Part1 PATCH v4 03/17] x86/mm: Don't attempt to encrypt initrd under SEV

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

When SEV is active, the initrd/initramfs will already have been placed in
memory as encrypted, so do not try to encrypt it.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/kernel/setup.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 82559867e0a9..967155e63afe 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -368,9 +368,11 @@ static void __init reserve_initrd(void)
 * If SME is active, this memory will be marked encrypted by the
 * kernel when it is accessed (including relocation). However, the
 * ramdisk image was loaded decrypted by the bootloader, so make
-* sure that it is encrypted before accessing it.
+* sure that it is encrypted before accessing it. For SEV the
+* ramdisk will already be encrypted, so only do this for SME.
 */
-   sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
+   if (sme_active())
+   sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
 
initrd_start = 0;
 
-- 
2.9.5



[Part1 PATCH v4 00/17] x86: Secure Encrypted Virtualization (AMD)

2017-09-16 Thread Brijesh Singh
This part of the Secure Encrypted Virtualization (SEV) series focuses on the
changes required in a guest OS for SEV support.

When SEV is active, the memory content of the guest OS will be transparently
encrypted with a key unique to the guest VM.

SEV guests have the concept of private and shared memory. Private memory is
encrypted with the guest-specific key, while shared memory may be encrypted
with the hypervisor key. Certain types of memory (namely instruction pages and
guest page tables) are always treated as private. For security reasons, all DMA
operations inside the guest must be performed on shared memory.

The SEV feature is enabled by the hypervisor, and the guest can identify it
through the CPUID function and the 0xc0010131 (F17H_SEV) MSR. When enabled,
page table entries will determine how memory is accessed. If a page table
entry has the memory encryption mask set, then that memory will be accessed
using the guest-specific key. Certain memory (instruction pages, page tables)
will always be accessed using the guest-specific key.

This patch series builds upon the Secure Memory Encryption (SME) feature. Unlike
SME, when SEV is enabled, all the data (e.g. EFI, kernel, initrd, etc.) will
have been placed into memory as encrypted by the guest BIOS.

The approach that this patch series takes is to encrypt everything possible
starting early in the boot. Since DMA operations inside the guest must be
performed on shared memory, the guest uses SW-IOTLB to complete the DMA
operations.
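As an illustration only (this is not the exact patch in the series -- the hook
chosen and the calls below are assumptions on my part), the idea is roughly:

void __init mem_encrypt_init(void)
{
        if (!sme_me_mask)
                return;

        /* An SEV guest must bounce DMA through unencrypted SWIOTLB memory */
        if (sev_active())
                swiotlb_force = SWIOTLB_FORCE;

        /* Make the SWIOTLB bounce buffers themselves shared with the HV */
        swiotlb_update_mem_attributes();
}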

The following links provide additional details:

AMD Memory Encryption whitepaper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf


AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34

Secure Encrypted Virtualization Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf

KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf


SEV Guest BIOS support:
  SEV support has been accepted into EDKII/OVMF BIOS
  https://github.com/tianocore/edk2/commits/master

---
This series is based on tip/master commit e3b4bfd351fa (Merge branch 'WIP.x86/apic').
Complete git tree is available: https://github.com/codomania/tip/tree/sev-v4-p1

Changes since v3:
 * use static key to branch the unrolling of rep ins/outs when SEV is active
 * simplify the memory encryption detection logic
 * rename per-cpu define to DEFINE_PER_CPU_UNENCRYPTED
 * simplify the logic to map per-cpu as unencrypted
 * changes to address v3 feedback

Changes since v2:
 * add documentation
 * update early_set_memory_* to use kernel_physical_mapping_init()
   to split larger pages into smaller ones (recommended by Boris)
 * changes to address v2 feedback
 * drop hypervisor specific patches, those patches will be included in part 2

Brijesh Singh (5):
  Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV)
description
  x86: Add support for changing memory encryption attribute in early
boot
  percpu: introduce DEFINE_PER_CPU_UNENCRYPTED
  X86/KVM: Unencrypt shared per-cpu variables when SEV is active
  X86/KVM: Clear encryption attribute when SEV is active

Tom Lendacky (12):
  x86/mm: Add Secure Encrypted Virtualization (SEV) support
  x86/mm: Don't attempt to encrypt initrd under SEV
  x86/realmode: Don't decrypt trampoline area under SEV
  x86/mm: Use encrypted access of boot related data with SEV
  x86/mm: Include SEV for encryption memory attribute changes
  x86/efi: Access EFI data as encrypted when SEV is active
  resource: Consolidate resource walking code
  resource: Provide resource struct in resource walk callback
  x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory
pages
  x86/mm: DMA support for SEV memory encryption
  x86/boot: Add early boot support when running with SEV active
  x86/io: Unroll string I/O when SEV is active

 Documentation/x86/amd-memory-encryption.txt |  30 ++-
 arch/powerpc/kernel/machine_kexec_file_64.c |  12 +-
 arch/x86/boot/compressed/Makefile   |   1 +
 arch/x86/boot/compressed/head_64.S  |  16 ++
 arch/x86/boot/compressed/mem_encrypt.S  | 115 +++
 arch/x86/boot/compressed/misc.h |   2 +
 arch/x86/boot/compressed/pagetable.c|   8 +-
 arch/x86/entry/vdso/vma.c   |   5 +-
 arch/x86/include/asm/io.h   |  42 +++-
 arch/x86/include/asm/mem_encrypt.h  |  23 +++
 arch/x86/include/asm/msr-index.h|   3 +
 arch/x86/include/uapi/asm/kvm_para.h|   1 -
 arch/x86/kernel/crash.c |  18 +-
 arch/x86/kernel/kvm.c   |  46 -
 arch/x86/kernel/kvmclock.c  |  65 +--
 arch/x86/kernel/pmem.c  |   2 +-
 arch/x86/kernel/setup.c |   6 +-
 arch/x86/mm/ioremap.c   

Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-09-15 Thread Brijesh Singh



On 09/15/2017 11:22 AM, Borislav Petkov wrote:

mem_encrypt_init() where everything should be set up already.



Yep, it's safe to dereference the static key in mem_encrypt_init(). I've
tried the approach and it seems to work fine. I will include the
required changes in the next rev. Thanks.



Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-09-15 Thread Brijesh Singh



On 09/15/2017 09:40 AM, Borislav Petkov wrote:

I need to figure out the include hell first.


I am working with a slightly newer patch set -- in that patch Tom has
moved the sev_active() definition into arch/x86/mm/mem_encrypt.c, and I
have no issue using your recommended approach (since I no longer need the
include path changes).

But in my quick run I did find a runtime issue: it seems enabling the static
key in sme_enable() is too early. The guest reboots as soon as it tries to
enable the key.

I see a similar issue with a non-SEV guest with my simple patch below.
The guest reboots as soon as it tries to enable the key.

--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,8 @@ pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #define __head __section(.head.text)
 
+DEFINE_STATIC_KEY_FALSE(__testme);
+
 static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 {
return ptr - (void *)_text + (void *)physaddr;
@@ -71,6 +73,8 @@ unsigned long __head __startup_64(unsigned long physaddr,
if (load_delta & ~PMD_PAGE_MASK)
for (;;);
 
+   static_branch_enable(&__testme);
+
/* Activate Secure Memory Encryption (SME) if supported and enabled */
sme_enable(bp);



Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-09-15 Thread Brijesh Singh



On 09/15/2017 07:24 AM, Borislav Petkov wrote:

On Tue, Aug 22, 2017 at 06:52:48PM +0200, Borislav Petkov wrote:

As always, the devil is in the detail.


Ok, actually we can make this much simpler by using a static key. A
conceptual patch below - I only need to fix that crazy include hell I'm
stepping into with this.

In any case, we were talking about having a static branch already so
this fits the whole strategy.



Thanks for the suggestion, Boris; it will make the patch much simpler.
I will try this out.

-Brijesh


Re: [RFC Part2 PATCH v3 19/26] KVM: svm: Add support for SEV GUEST_STATUS command

2017-09-14 Thread Brijesh Singh


On 9/14/17 5:35 AM, Borislav Petkov wrote:
...

> +
>> +if (copy_from_user(&params, (void *) argp->data,
>> +sizeof(struct kvm_sev_guest_status)))
> Let me try to understand what's going on here. You copy user data into
> params...

This is wrong -- since all the parameters in GET_STATUS are "OUT", we
don't need to perform the copy_from_user(). I will fix it. Thanks.
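Roughly, the fixed flow would look like this (sketch only -- the firmware
structure name and error handling are abbreviated; helper names follow the
quoted patch):

static int sev_guest_status(struct kvm *kvm, struct kvm_sev_cmd *argp)
{
        struct kvm_sev_guest_status params;
        struct sev_data_guest_status data;
        int ret;

        memset(&data, 0, sizeof(data));
        data.handle = sev_get_handle(kvm);

        ret = sev_issue_cmd(kvm, SEV_CMD_GUEST_STATUS, &data, &argp->error);
        if (ret)
                return ret;

        /* GET_STATUS only has "out" parameters -- no copy_from_user() needed */
        params.policy = data.policy;
        params.state  = data.state;
        params.handle = data.handle;

        if (copy_to_user((void __user *)argp->data, &params, sizeof(params)))
                ret = -EFAULT;

        return ret;
}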

>
>> +return -EFAULT;
>> +
>> +data = kzalloc(sizeof(*data), GFP_KERNEL);
>> +if (!data)
>> +return -ENOMEM;
>> +
>> +data->handle = sev_get_handle(kvm);
>> +ret = sev_issue_cmd(kvm, SEV_CMD_GUEST_STATUS, data, &argp->error);
>> +if (ret)
>> +goto e_free;
>> +
>> +params.policy = data->policy;
>> +params.state = data->state;
>> +params.handle = data->handle;
> ... *overwrite* the copied data which means, the copy meant *absolutely*
> *nothing* at all! ...
>
> Also, why does userspace need to know the firmware ->handle?


SEV firmware supports key-sharing: if the guest policy allows sharing the
key between VMs, then we need the firmware->handle. If the key-sharing
feature is used, the firmware->handle of the 1st VM will be passed into
the LAUNCH_START of the 2nd VM. I still have not coded up anything in qemu
for key-sharing, and I am not yet using the GET_STATUS command in qemu, but
I wanted to make sure that if we decide to add an "info sev-status" command
to the qemu monitor to retrieve the SEV state information, all the
information is available to us.






Re: [RFC Part2 PATCH v3 16/26] KVM: SVM: Add support for SEV LAUNCH_UPDATE_DATA command

2017-09-13 Thread Brijesh Singh



On 09/13/2017 12:55 PM, Borislav Petkov wrote:
...


+
+   /* pin the user virtual address */
+   pinned = get_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0,
+   pages);


Let it stick out.



Will do.

...



+   vaddr = params.address;
+   size = params.length;
+   vaddr_end = vaddr + size;
+
+   /* lock the user memory */
+   inpages = sev_pin_memory(vaddr, size, &npages, 1);


This way user basically controls how many pages to pin and you need to
limit that on the upper end.



Actually, I don't know what a sane upper bound would be in this case --
typically we encrypt the guest BIOS using the LAUNCH_UPDATE_DATA command.
I have heard that some users may want to create a pre-encrypted image
(which may contain guest BIOS + kernel + initrd) -- this can be huge.

For an SEV guest we need to pin the memory anyway, so how about limiting
the number of pages to pin with an rlimit, as sketched below? The rlimit
check can also include the guest RAM pinning.
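Something along these lines (rough sketch; the function name and the exact
accounting field are illustrative, not the final patch):

static int sev_check_pin_limit(unsigned long npages)
{
        unsigned long locked, lock_limit;

        /* refuse to pin more pages than RLIMIT_MEMLOCK allows */
        locked = current->mm->pinned_vm + npages;
        lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
        if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
                pr_err("SEV: %lu locked pages exceed the lock limit of %lu\n",
                       locked, lock_limit);
                return -ENOMEM;
        }

        return 0;
}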



+   if (!inpages) {
+   ret = -ENOMEM;
+   goto e_free;
+   }
+
+   /*
+* invalidate the cache to ensure that DRAM has recent content before


recent content?



Cache accesses from the PSP are coherent with x86, but not the other way
around -- I will update the comment to reflect the true meaning.

...



Yah, let it stick out.



Okay.


Re: [RFC Part2 PATCH v3 15/26] KVM: SVM: Add support for SEV LAUNCH_START command

2017-09-13 Thread Brijesh Singh



On 09/13/2017 01:37 PM, Borislav Petkov wrote:

On Wed, Sep 13, 2017 at 01:23:08PM -0500, Brijesh Singh wrote:

Yes, I will add some upper bound check on the length field and add the
sanity-check just after copying the parameters from userspace


Also, you could either fail the command if some of the reserved fields
are set - picky - or zero them out - less picky :)




Actually, the reserved fields are not exposed in the userspace structure.

E.g. the LAUNCH_UPDATE_DATA userspace structure looks like this:

struct kvm_sev_launch_update_data {
__u64 address;   /* userspace address of memory region to encrypt */
__u32 length;/* length of memory region to encrypt */
};

But the SEV firmware command structure is slightly different (mainly, it
contains the reserved field, the firmware handle, etc.).

/**
  * struct sev_data_launch_update_data - LAUNCH_UPDATE_DATA command parameter
  *
  * @handle: firmware handle to use
  * @length: length of memory to be encrypted
  * @address: physical address of memory region to encrypt
  */
 struct sev_data_launch_update_data {
 u32 handle; /* In */
 u32 reserved;
 u64 address;/* In */
 u32 length; /* In */
 };


Please note that some commands require us to pass the VM ASID etc. --
userspace does not have the VM ASID information.

The current approach is: while handling the command we copy the values
from the userspace structure into a FW-compatible structure and also
populate the missing fields which are not known to userspace (e.g. the
firmware handle, VM ASID, system physical addresses, etc.), roughly as
sketched below.
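For example (illustrative fragment from inside the command handler; "inpages"
stands for the pages pinned earlier via sev_pin_memory(), and the
physical-address conversion shown is just one possible way to do it):

        struct kvm_sev_launch_update_data params;
        struct sev_data_launch_update_data data;

        if (copy_from_user(&params, (void __user *)argp->data, sizeof(params)))
                return -EFAULT;

        memset(&data, 0, sizeof(data));

        /* populate the fields userspace cannot know */
        data.handle  = sev_get_handle(kvm);
        data.length  = params.length;
        /* the firmware wants a system physical address, not the guest vaddr */
        data.address = page_to_phys(inpages[0]) + offset_in_page(params.address);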


Re: [RFC Part2 PATCH v3 15/26] KVM: SVM: Add support for SEV LAUNCH_START command

2017-09-13 Thread Brijesh Singh



On 09/13/2017 12:25 PM, Borislav Petkov wrote:
...


+static void sev_deactivate_handle(struct kvm *kvm, int *error);
+static void sev_decommission_handle(struct kvm *kvm, int *error);


Please move code in a way that you don't need those forward
declarations. Also, I'm wondering if having all the SEV-related code
could live in sev.c or so - svm.c is humongous.




Yes, svm.c is humongous.


...


+
+static void sev_decommission_handle(struct kvm *kvm, int *error)
+{
+   struct sev_data_decommission *data;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);


Also, better on stack. Please do that for the other functions below too.



Yes, some structures are small and I don't expect them to grow in newer API
specs. We should be able to move them onto the stack. I will audit the code
and make the necessary changes.





+   ret = -EFAULT;
+   if (copy_from_user(&params, (void *)argp->data,
+   sizeof(struct kvm_sev_launch_start)))


Sanity-check params. This is especially important if later we start
using reserved fields.



Yes, I will add an upper-bound check on the length field and add the
sanity check just after copying the parameters from userspace.


...


+   goto e_free;
+
+   ret = -ENOMEM;
+   start = kzalloc(sizeof(*start), GFP_KERNEL);
+   if (!start)
+   goto e_free;
+
+   /* Bit 15:6 reserved, must be 0 */
+   start->policy = params.policy & ~0xffc0;
+
+   if (params.dh_cert_length && params.dh_cert_address) {


Yeah, we talked about this already: sanity-checking needed. But you get
the idea.



Will do

...



 if (copy_from_user(session_addr,
   (void *)params.session_address,
   params.session_length))

reads better to me. Better yet if you shorten those member names into
s_addr and s_len and so on...




Will use your recommendation.

thanks


Re: [RFC Part2 PATCH v3 13/26] KVM: SVM: Add KVM_SEV_INIT command

2017-09-13 Thread Brijesh Singh

Hi Boris,

thanks for the detail review.

On 09/13/2017 10:06 AM, Borislav Petkov wrote:
...


+static int sev_platform_get_state(int *state, int *error)
+{
+   int ret;
+   struct sev_data_status *data;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);


It's a bit silly to do the allocation only for the duration of
sev_platform_status() - just allocate "data" on the stack.



I am okay with moving it onto the stack, but just to give some context on
why I went this way: the physical address of "data" is given to the
device (in this case the SEV FW), and I was not sure if it is okay to pass
a stack address to the device. Additionally, the FW spec requires us to
zero all the fields -- so we need a memset() if we allocate it on the
stack, roughly as in the sketch below.
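E.g. the on-stack variant would look roughly like this (the name of the
command-issuing helper below is a placeholder, not the real one):

static int sev_platform_get_state(int *state, int *error)
{
        struct sev_data_status data;
        int ret;

        /* the FW spec requires all fields to be zeroed */
        memset(&data, 0, sizeof(data));

        ret = sev_do_cmd(SEV_CMD_PLATFORM_STATUS, &data, error);
        if (!ret)
                *state = data.state;

        return ret;
}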



Re: [RFC Part2 PATCH v3 03/26] crypto: ccp: Add Secure Encrypted Virtualization (SEV) device support

2017-09-13 Thread Brijesh Singh



On 09/13/2017 09:17 AM, Borislav Petkov wrote:
...


+
+unlock:
+   mutex_unlock(&sev_cmd_mutex);
+   print_hex_dump_debug("(out): ", DUMP_PREFIX_OFFSET, 16, 2, data,
+   sev_cmd_buffer_len(cmd), false);
+   return ret;


... and here you return psp_ret == 0 even though something failed.

What I think you should do is not touch @psp_ret when you return before
the SEV command executes and *when* you return, set @psp_ret accordingly
to denote the status of the command execution.

Or if you're touching it before you execute the SEV
command and you return early, it should say something like
PSP_CMDRESP_COMMAND_DIDNT_EXECUTE or so, to tell the caller exactly what
happened.



Agreed, very good catch, thank you. I will fix it.

-Brijesh


Re: [RFC Part2 PATCH v3 11/26] KVM: X86: Extend struct kvm_arch to include SEV information

2017-09-13 Thread Brijesh Singh



On 09/13/2017 08:37 AM, Borislav Petkov wrote:
...


+   return &kvm->arch.sev_info;
+}
+
+static inline void sev_set_active(struct kvm *kvm)
+{
+   to_sev_info(kvm)->active = true;
+}


Is this the accepted way to do this in KVM land or can you simply access
all members directly:

kvm->arch.sev_info.

Because I see stuff like that:



Actually, I see both approaches used in svm.c, but I am flexible to go
either way. Let's wait for Paolo's and Radim's comments.

-Brijesh



Re: [RFC Part2 PATCH v3 10/26] KVM: Introduce KVM_MEMORY_ENCRYPT_REGISTER/UNREGISTER_RAM ioctl

2017-09-12 Thread Brijesh Singh



On 09/12/2017 03:29 PM, Borislav Petkov wrote:

...


+   int (*memory_encryption_unregister_ram)(struct kvm *kvm,
+   struct kvm_memory_encrypt_ram *ram);
  };


You can shorten those prefixes to "mem_enc" or so and struct
kvm_memory_encrypt_ram to struct enc_region - which is exactly what it
is - an encrypted memory region descriptor - and then fit each function
on a single line.



Sure, I can do that. In one piece of feedback Paolo recommended the
KVM_MEMORY_ENCRYPT_* ioctl name, hence I tried to stick with the same name
for the structure. I am flexible about using 'struct enc_region', but I
personally prefer to keep "mem" somewhere in the structure name to indicate
it is for *memory* encryption -- maybe struct kvm_mem_enc_region.


...

+   struct kvm_memory_encrypt_ram)


As with KVM_MEMORY_ENCRYPT_OP, those two need to be in the KVM API document.



Yes, I missed updating the Documentation/virtual/kvm/api.txt for these new
ioctls. I will update it.


Re: [RFC Part2 PATCH v3 05/26] KVM: SVM: Reserve ASID range for SEV guest

2017-09-12 Thread Brijesh Singh



On 09/12/2017 03:04 PM, Borislav Petkov wrote:
...


SEV-enabled guest is from 1 to a maximum value defined via CPUID
Fn8000_001f[ECX].


I'd rewrite that to:

"The range of allowed SEV guest ASIDs is [1 - CPUID_8000_001F[ECX][31:0]]".



thanks, will do.

...

  
+/* Secure Encrypted Virtualization */


If anything, this comment should explain what that variable is.
Basically the comment you have in sev_hardware_setup() now.



Will add more comments.

...



max_sev_asid is static and it is already initialized to 0 and thus this
function can be simplified to:

/*
  * Get maximum number of encrypted guests supported: Fn8000_001F[ECX].
  *   [31:0]: Number of supported guests.
  */
static __init void sev_hardware_setup(void)
{
 max_sev_asid = cpuid_ecx(0x8000001F);
}



Agreed, I will improve it.

thanks


Re: [RFC Part2 PATCH v3 03/26] crypto: ccp: Add Secure Encrypted Virtualization (SEV) device support

2017-09-12 Thread Brijesh Singh

Hi Boris,

I will address all your feedback in next rev.


On 09/12/2017 09:02 AM, Borislav Petkov wrote:
...




You could make that more tabular like this:

 case SEV_CMD_INIT:  return sizeof(struct sev_data_init);
 case SEV_CMD_PLATFORM_STATUS:   return sizeof(struct sev_data_status);
 case SEV_CMD_PEK_CSR:   return sizeof(struct sev_data_pek_csr);
...

which should make it more readable.

But looking at this more, this is a static mapping between the commands
and the corresponding struct sizes and you use it in

 print_hex_dump_debug("(in):  ", DUMP_PREFIX_OFFSET, 16, 2, data,
 sev_cmd_buffer_len(cmd), false);

But then, I don't see what that brings you because you're not dumping
the actual @data length but the *expected* data length based on the
command type.

And *that* you can look up in the manual and do not need it in code,
enlarging the driver unnecessarily.

...



The debug statement is very helpful during development; it gives me a full
view of which command we send to the PSP and a data dump of the command buffer
before and after the request completes. E.g. when dyndbg is enabled the output
looks like this:

[392035.621308] ccp :02:00.2: sev command id 0x4 buffer 0x80146d232820
[392035.621312] (in):  :      
[392035.624725] (out): : 0e00   0b00  

The first debug line prints the command ID, the second line prints a memory
dump of the command structure, and the third line prints a memory dump of the
command structure after the PSP has processed the command.

The caller will use sev_issue_cmd() to issue a PSP command. At this point we
know the command id and an opaque pointer (which points to the command
structure for that command id). The caller does not give us the length of the
command structure, hence we need to derive it from the command id using
sev_cmd_buffer_len(). The command structure length is fixed for a given
command id.
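If we keep it, I can at least switch to the tabular form you suggested, e.g.
(sketch with only a few entries shown; the real list would cover every SEV
command):

static int sev_cmd_buffer_len(int cmd)
{
        switch (cmd) {
        case SEV_CMD_INIT:              return sizeof(struct sev_data_init);
        case SEV_CMD_PLATFORM_STATUS:   return sizeof(struct sev_data_status);
        case SEV_CMD_PEK_CSR:           return sizeof(struct sev_data_pek_csr);
        /* ... one entry per SEV command ... */
        default:                        return 0;
        }
}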


Thanks
Brijesh


Re: [RFC Part2 PATCH v3 02/26] crypto: ccp: Add Platform Security Processor (PSP) device support

2017-09-08 Thread Brijesh Singh



On 09/08/2017 03:40 AM, Borislav Petkov wrote:

On Thu, Sep 07, 2017 at 05:19:32PM -0500, Brijesh Singh wrote:

At high level, AMD-SP (AMD Secure Processor) (i.e CCP driver) will provide the
support for CCP, SEV and TEE FW commands.


  +--- CCP
  |
AMD-SP --|
  |+--- SEV
  ||
  + PSP ---*
   |
   + TEE


I still don't see the need for such finegrained separation, though.
There's no "this is a separate compilation unit because... ". At least
the PSP branch could be a single driver without the interface.

For example, psp_request_sev_irq() is called only by sev_dev_init(). So
why is sev-dev a separate compilation unit? Is anything else going to
use the PSP interface?



I don't know anything about the TEE support, hence I don't have a very strong
reason for fine-grained separation -- I just wanted to ensure that the SEV
enablement does not interfere with TEE support in the future.




If not, just put it all in a psp-dev file and that's it. We have a
gazillion config options and having two more just because, is not a good
reason. You can always carve it out later if there's real need. But if
the SEV thing can't function without the PSP thing, then you can just as
well put it inside it.

This way you can save yourself a bunch of exported functions and the
like.

Another example for not optimal design is psp_request_tee_irq() - it
doesn't really request an irq by calling into the IRQ core but simply
assigns a handler. Which looks to me like you're simulating an interface
where one is not really needed. Ditto for the sev_irq version, btw.




It's possible that both TEE and SEV share the same interrupt, but their
interrupt handling approaches could be totally different, hence I tried to
abstract it.

I am making several assumptions on the TEE side without knowing the details ;)

I can go with your recommendation -- we can always carve it out later once
the TEE support is visible.

-Brijesh


Re: [RFC Part2 PATCH v3 02/26] crypto: ccp: Add Platform Security Processor (PSP) device support

2017-09-07 Thread Brijesh Singh

Hi Boris,

On 09/07/2017 09:27 AM, Borislav Petkov wrote:

...



The commit message above reads better to me as the help text than what
you have here.

Also, in order to make it easier for the user, I think we'll need a
CONFIG_AMD_MEM_ENCRYPT_SEV or so and make that depend on CONFIG_KVM_AMD,
this above and all the other pieces that are needed. Just so that when
the user builds such a kernel, all is enabled and not her having to go
look for what else is needed.

And then put the sev code behind that config option. Depending on how
ugly it gets...



I will add more detail in the help text. I will look into adding some
depends.

...


+
+void psp_add_device(struct psp_device *psp)


That function is needlessly global and should be static, AFAICT.

Better yet, it is called only once and its body is trivial so you can
completely get rid of it and meld it into the callsite.



Agreed, will do.

.


+
+static struct psp_device *psp_alloc_struct(struct sp_device *sp)


"psp_alloc()" is enough I guess.



I was trying to adhere to the existing ccp-dev.c function naming
convention.





static.

Please audit all your functions in the psp pile and make them static if
not needed outside of their compilation unit.



Will do.


+{
+   unsigned int status;
+   irqreturn_t ret = IRQ_HANDLED;
+   struct psp_device *psp = data;


Please sort function local variables declaration in a reverse christmas
tree order:

 longest_variable_name;
 shorter_var_name;
 even_shorter;
 i;



Got it, will do



+
+   /* read the interrupt status */
+   status = ioread32(psp->io_regs + PSP_P2CMSG_INTSTS);
+
+   /* invoke subdevice interrupt handlers */
+   if (status) {
+   if (psp->sev_irq_handler)
+   ret = psp->sev_irq_handler(irq, psp->sev_irq_data);
+   if (psp->tee_irq_handler)
+   ret = psp->tee_irq_handler(irq, psp->tee_irq_data);
+   }
+
+   /* clear the interrupt status */
+   iowrite32(status, psp->io_regs + PSP_P2CMSG_INTSTS);


We're clearing the status by writing the same value back?!? Shouldn't
that be:

iowrite32(0, psp->io_regs + PSP_P2CMSG_INTSTS);



Actually, the SW should write "1" to clear the bit. To make it clear, I
can use the value 1 explicitly and add a comment.
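A small sketch of what I mean (the name for bit 0 is my invention, not from
the spec text quoted here):

#define PSP_CMDRESP_IRQ         BIT(0)  /* command-response interrupt */

        /* PSP_P2CMSG_INTSTS is write-1-to-clear: write the bit back to ack it */
        iowrite32(PSP_CMDRESP_IRQ, psp->io_regs + PSP_P2CMSG_INTSTS);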




Below I see

iowrite32(0x, psp->io_regs + PSP_P2CMSG_INTSTS);

which is supposed to clear IRQs. Btw, you can write that:

iowrite32(-1, psp->io_regs + PSP_P2CMSG_INTSTS);



Sure, I will do that

...

...


+
+   sp_set_psp_master(sp);


So this function is called only once and declared somewhere else. You
could simply do here:

 if (sp->set_psp_master_device)
 sp->set_psp_master_device(sp);

and get rid of one more global function.



Sure I can do that.




+   /* Enable interrupt */
+   dev_dbg(dev, "Enabling interrupts ...\n");
+   iowrite32(7, psp->io_regs + PSP_P2CMSG_INTEN);


Uh, a magic "7"! Exciting!

I wonder what that means and whether it could be a define with an
explanatory name instead. Ditto for the other values...




I will try to define some macros instead of hard-coded values.




+
+int psp_dev_resume(struct sp_device *sp)
+{
+   return 0;
+}
+
+int psp_dev_suspend(struct sp_device *sp, pm_message_t state)
+{
+   return 0;
+}


Those last two are completely useless. Delete them pls.



We don't have any PM support; I agree, I will delete them.

...


+int psp_request_sev_irq(struct psp_device *psp, irq_handler_t handler,
+   void *data)
+{
+   psp->sev_irq_data = data;
+   psp->sev_irq_handler = handler;
+
+   return 0;
+}
+
+int psp_free_sev_irq(struct psp_device *psp, void *data)
+{
+   if (psp->sev_irq_handler) {
+   psp->sev_irq_data = NULL;
+   psp->sev_irq_handler = NULL;
+   }
+
+   return 0;
+}


Both void. Please do not return values from functions which are simply
void functions by design.



thanks, will fix it.
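I.e. something like this (sketch; same bodies as the quoted code, just made
void):

void psp_request_sev_irq(struct psp_device *psp, irq_handler_t handler,
                         void *data)
{
        psp->sev_irq_data = data;
        psp->sev_irq_handler = handler;
}

void psp_free_sev_irq(struct psp_device *psp, void *data)
{
        psp->sev_irq_data = NULL;
        psp->sev_irq_handler = NULL;
}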

...


+int psp_request_sev_irq(struct psp_device *psp, irq_handler_t handler,
+   void *data);
+int psp_free_sev_irq(struct psp_device *psp, void *data);
+
+int psp_request_tee_irq(struct psp_device *psp, irq_handler_t handler,
+   void *data);


Let them stick out.


okay

...




+int psp_free_tee_irq(struct psp_device *psp, void *data);
+
+struct psp_device *psp_get_master_device(void);
+
+extern const struct psp_vdata psp_entry;
+
+#endif /* __PSP_DEV_H */
diff --git a/drivers/crypto/ccp/sp-dev.c b/drivers/crypto/ccp/sp-dev.c


So this file is called sp-dev and the other psp-dev. Confusing.

And in general, why isn't the whole thing a single psp-dev and you can
save yourself all the registering blabla and have a single driver for
the whole PSP functionality?

Distros will have to enable everything anyway and the whole CCP/PSP code
is only a couple of KBs so you can just as well put it all into 

Re: SME/32-bit regression

2017-09-06 Thread Brijesh Singh



On 09/06/2017 04:03 PM, Boris Ostrovsky wrote:

On 09/06/2017 02:19 PM, Borislav Petkov wrote:

On Wed, Sep 06, 2017 at 01:06:50PM -0500, Brijesh Singh wrote:

I did the following quick run with your patch and everything seems to be
working okay

64-bit build:
---
1) Baremetal SME *enabled* - System boots fine
  a) 32-bit guest launch : successful (KVM HV)
  b) 64-bit guest launch : successful (KVM HV)
  c) 64-bit SEV guest launch : successful (KVM HV)

2) Baremetal SME *disabled* - System boots fine
  a) 32-bit guest launch : successful (KVM HV)
  b) 64-bit guest launch : successful (KVM HV)
  c) 64-bit SEV guest launch : successful (KVM HV)

32-bit build
--
I am installing 32-bit distro to verify 32-bit baremetal boot and will
report my findings soon.

Thanks Brijesh, that's awesome!

I'll add your Tested-by once you're done testing successfully.




32-bit seems to be working well - thanks

-Brijesh



You can have my Tested-by (mostly Xen but some baremetal too).

-boris



Re: [RFC Part2 PATCH v3 01/26] Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization (SEV)

2017-09-06 Thread Brijesh Singh



On 09/06/2017 11:41 AM, Borislav Petkov wrote:

On Tue, Sep 05, 2017 at 04:39:14PM -0500, Brijesh Singh wrote:

Not sure if we need to document the complete measurement flow in the
driver doc.


No, not the whole thing - only summarized in a couple of sentences with
the link to the doc.



Will do.



I was trying to keep everything to 80 column limit but if that is
not an issue for documentation then I like your recommendation.


That rule is not a hard one - rather, it is to human discretion what
is better - readability or fitting on some small screen, no one uses
anymore.



I will follow your recommendation



The command does not require explicit parameter to differentiate between
live migration vs snapshot. All it needs is a destination platform
PDH key. If its live migration case then VM management stack will probably
communicate with remote platform and get its PDH keys before calling us.
The KVM driver simply acts upon the request from the userspace. SEV firmware
spec Appendix A [1] provides complete flow diagram which need to be implemented
in userspace. The driver simply act upon when it asked to create SEND_START
context.


Ok, so that only creates the context after sending the PDH cert into the
firmware. So please state that first and then what the command can be
used for. The way it is written now, it reads like it does the sending
of the guest.



Will clarify it in documentation.



Re: [RFC Part2 PATCH v3 02/26] crypto: ccp: Add Platform Security Processor (PSP) device support

2017-09-06 Thread Brijesh Singh

Hi Boris,


On 09/06/2017 12:00 PM, Borislav Petkov wrote:

...


--
|diff --git a/drivers/crypto/ccp/sp-dev.c b/drivers/crypto/ccp/sp-dev.c
|index a017233..d263ba4 100644
|--- a/drivers/crypto/ccp/sp-dev.c
|+++ b/drivers/crypto/ccp/sp-dev.c
--

What tree is that against? In any case, it doesn't apply here.


This RFC is based on tip/master commit 22db3de (Merge branch 'x86/mm').




This is a bit of my struggle -- tip/master is not in sync with cryptodev-2.6 [1].
In order to expand the CCP driver we need the following commits from
cryptodev-2.6:

57de3aefb73f crypto: ccp - remove ccp_present() check from device initialize
d0ebbc0c407a crypto: ccp - rename ccp driver initialize files as sp device
f4d18d656f88 crypto: ccp - Abstract interrupt registeration
720419f01832 crypto: ccp - Introduce the AMD Secure Processor device
970e8303cb8d crypto: ccp - Use devres interface to allocate PCI/iomap and cleanup

I cherry-picked these patches into tip/master before starting the SEV work.

Since these patches were already reviewed and accepted, I did not include them
in my RFC series. I am not sure what the best way to handle this is. Should I
include these patches in the series, or just mention them in the cover letter?
I am looking for suggestions on how best to communicate it. Thanks.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/

My staging tree on github contains these precursor patches.



$ git show 22db3de
fatal: ambiguous argument '22db3de': unknown revision or path not in the 
working tree.

Do you have updated version of the series which you can send out?


@@ -67,6 +74,10 @@ struct sp_device {
/* DMA caching attribute support */
unsigned int axcache;
  
+	/* get and set master device */

+   struct sp_device*(*get_psp_master_device)(void);
+   void(*set_psp_master_device)(struct sp_device *);


WARNING: missing space after return type
#502: FILE: drivers/crypto/ccp/sp-dev.h:79:
+   void(*set_psp_master_device)(struct sp_device *);

Don't forget to run all patches through checkpatch. Some of the warnings
make sense.

Thx.



Re: SME/32-bit regression

2017-09-06 Thread Brijesh Singh



On 09/06/2017 11:44 AM, Borislav Petkov wrote:

On Wed, Sep 06, 2017 at 04:30:23PM +, Lendacky, Thomas wrote:

Sorry for the top post, I'm on holiday and don't have access to a good
email client... I went with unsigned long to match all the page table
related declarations. If changing to u64 doesn't generate any warnings
or other issues them I'm good with that.


Ok, no worries. Lemme run the smoke-tests on it and test it to see
everything else still works.



I did the following quick run with your patch and everything seems to be
working okay

64-bit build:
---
1) Baremetal SME *enabled* - System boots fine
 a) 32-bit guest launch : successful (KVM HV)
 b) 64-bit guest launch : successful (KVM HV)
 c) 64-bit SEV guest launch : successful (KVM HV)

2) Baremetal SME *disabled* - System boots fine
 a) 32-bit guest launch : successful (KVM HV)
 b) 64-bit guest launch : successful (KVM HV)
 c) 64-bit SEV guest launch : successful (KVM HV)

32-bit build
--
I am installing 32-bit distro to verify 32-bit baremetal boot and will
report my findings soon.

-Brijesh


Re: [RFC Part2 PATCH v3 01/26] Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization (SEV)

2017-09-05 Thread Brijesh Singh

Hi Boris,

Thanks for detail review, I have incorporate the spell check
in my work flow and will be fixing all those spell check errors
innext rev.


On 09/05/2017 12:21 PM, Borislav Petkov wrote:

[...]



+3. KVM_SEV_LAUNCH_MEASURE
+
+Parameters (in): struct  kvm_sev_launch_measure
+Returns: 0 on success, -negative on error
+
+LAUNCH_MEASURE returns the measurement of the memory region encrypted with
+LAUNCH_UPDATE_DATA. The measurement is keyed with the TIK so that the guest
+owner can use the measurement to verify the guest was properly launched without
+tempering.


So this could use a bit more text as it is such an important aspect of
the whole verification of the guest.


+
+struct kvm_sev_launch_measure {
+   /* where to copy the measurement blob */
+   __u64 address;
+
+   /* length of memory region containing measurement */
+   __u32 length;
+};
+
+If measurement length is too small, the required length is returned in the
+length field.
+
+On success, the measurement is copied to the address.


And how is success signalled to the caller?



The measurement verification is performed outside of KVM/QEMU.

From the driver's point of view, all we have to do is issue the LAUNCH_MEASURE
command when userspace asks for the measurement. I can see that the command
name is confusing - I am thinking of renaming it to
"KVM_SEV_GET_LAUNCH_MEASUREMENT".

The complete flow is listed in Appendix A of the SEV firmware spec [1].

I will update the doc to give SEV spec section references for the details.

Not sure if we need to document the complete measurement flow in the
driver doc.

[...]


+
+4. KVM_SEV_LAUNCH_FINISH
+
+Returns: 0 on success, -negative on error
+
+LAUNCH_FINISH command finalize the SEV guest launch process.


"The KVM_SEV_LAUNCH_FINISH command..."


+
+5. KVM_SEV_GUEST_STATUS
+
+Parameters (out): struct kvm_sev_guest_status


This is an "out" command, so it should be called
KVM_SEV_GET_GUEST_STATUS. Or is it too late for that?



I was trying to map to the SEV firmware spec command names, but I see your
point and will call it "KVM_SEV_GET_GUEST_STATUS".



+
+enum {
+   /* guest state is not known */
+   SEV_STATE_INVALID = 0;


not known or invalid?



Again, I was trying to follow the spec naming convention, but I can go
with UNKNOWN.



Btw, side-comments will make this much more readable:

enum {
 SEV_STATE_INVALID = 0,
 SEV_STATE_LAUNCHING,
 SEV_STATE_SECRET,   /* guest is being launched and ready to accept 
the ciphertext data */
 SEV_STATE_RUNNING,  /* guest is fully launched and running */
 SEV_STATE_RECEIVING,/* guest is being migrated in from another SEV 
machine */
 SEV_STATE_SENDING,  /* guest is getting migrated out to another 
SEV machine */
};




I was trying to keep everything within the 80-column limit, but if that is
not an issue for documentation then I like your recommendation.


[...]


+8. KVM_SEV_SEND_START
+
+Parameters (in): struct kvm_sev_send_start
+Returns: 0 on success, -negative on error
+
+SEND_START command is used to export a SEV guest from one platform to another.


Export or migrate?


+It can be used for saving a guest to disk to be resumed later, or it can be
+used to migrate a guest across the network to a receiving platform.


And how do I specify which of those actions needs to happen?



The command does not require an explicit parameter to differentiate between
the live migration and snapshot cases. All it needs is the destination
platform's PDH key. In the live migration case the VM management stack will
probably communicate with the remote platform and get its PDH key before
calling us. The KVM driver simply acts upon the request from userspace. SEV
firmware spec Appendix A [1] provides the complete flow diagram which needs
to be implemented in userspace; the driver simply acts when it is asked to
create the SEND_START context.
 
[1] http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf




Phew, that took long.



Thank you for the detailed review.


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-09-04 Thread Brijesh Singh

On 9/4/17 12:05 PM, Borislav Petkov wrote:
> On Fri, Sep 01, 2017 at 05:52:13PM -0500, Brijesh Singh wrote:
>>  So far, we have not seen the need for having such functions except
>> this cases. The approach we have right now works just fine and not
>> sure if its worth adding new functions.
> Then put the call to kvm_map_hv_shared_decrypted() into
> kvm_smp_prepare_boot_cpu() to denote that you're executing this whole
> stuff only once during guest init.
>
> Now you're doing additional jumping-through-hoops with that once static
> var just so you can force something which needs to execute only once but
> gets called in a per-CPU path.
>
> See what I mean?

Yes, I see your point. I will address this issue in the next rev.


-Brijesh


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-09-02 Thread Brijesh Singh


On 9/1/17 10:21 PM, Andy Lutomirski wrote:
> On Fri, Sep 1, 2017 at 3:52 PM, Brijesh Singh  wrote:
>> Hi Boris,
>>
>> On 08/30/2017 12:46 PM, Borislav Petkov wrote:
>>> On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
>>>> I was trying to avoid mixing early and no-early set_memory_decrypted()
>>>> but if
>>>> feedback is: use early_set_memory_decrypted() only if its required
>>>> otherwise
>>>> use set_memory_decrypted() then I can improve the logic in next rev.
>>>> thanks
>>>
>>> Yes, I think you should use the early versions when you're, well,
>>> *early* :-) But get rid of that for_each_possible_cpu() and do it only
>>> on the current CPU, as this is a per-CPU path anyway. If you need to
>>> do it on *every* CPU and very early, then you need a separate function
>>> which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.
>>>
>> I am trying to implement your feedback and now remember why I choose to
>> use early_set_memory_decrypted() and for_each_possible_cpu loop. These
>> percpu variables are static. Hence before clearing the C-bit we must
>> perform the in-place decryption so that original assignment is preserved
>> after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
>> -- which can be used to perform the in-place decryption but we do not have
>> similar routine for non-early cases. In order to address your feedback,
>> we have to add similar functions. So far, we have not seen the need for
>> having such functions except this cases. The approach we have right now
>> works just fine and not sure if its worth adding new functions.
>>
>> Thoughts ?
>>
>> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of
>> memory
> Shouldn't this be called DEFINE_PER_CPU_UNENCRYPTED?  ISTM the "HV
> shared" bit is incidental.

Thanks for the suggestion, we could call it DEFINE_PER_CPU_UNENCRYPTED.
I will use it in the next rev.

-Brijesh



Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-09-01 Thread Brijesh Singh

Hi Boris,

On 08/30/2017 12:46 PM, Borislav Petkov wrote:

On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:

I was trying to avoid mixing early and no-early set_memory_decrypted() but if
feedback is: use early_set_memory_decrypted() only if its required otherwise
use set_memory_decrypted() then I can improve the logic in next rev. thanks


Yes, I think you should use the early versions when you're, well,
*early* :-) But get rid of that for_each_possible_cpu() and do it only
on the current CPU, as this is a per-CPU path anyway. If you need to
do it on *every* CPU and very early, then you need a separate function
which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.



I am trying to implement your feedback and now remember why I chose to
use early_set_memory_decrypted() and the for_each_possible_cpu() loop. These
percpu variables are static, hence before clearing the C-bit we must
perform the in-place decryption so that the original assignment is preserved
after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
-- which can be used to perform the in-place decryption -- but we do not have
a similar routine for the non-early cases. In order to address your feedback,
we would have to add similar functions. So far, we have not seen the need for
such functions except in this case. The approach we have right now
works just fine, and I am not sure it is worth adding new functions.

Thoughts?

[1] Commit 7f8b7e7 ("x86/mm: Add support for early encryption/decryption of memory")
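For reference, the approach currently looks roughly like this (sketch only;
the function and variable names follow this thread, and the
early_set_memory_decrypted() signature is the paddr-based one from the v3
patch):

static void __init kvm_map_hv_shared_decrypted(void)
{
        int cpu;

        if (!sev_active())
                return;

        for_each_possible_cpu(cpu) {
                struct kvm_steal_time *st = &per_cpu(steal_time, cpu);

                /* preserve the static initializer across the C-bit change */
                sme_early_decrypt(__pa(st), sizeof(*st));
                early_set_memory_decrypted(__pa(st), sizeof(*st));
        }
}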


-Brijesh


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-08-30 Thread Brijesh Singh

Hi Boris,

On 08/29/2017 05:22 AM, Borislav Petkov wrote:

[...]


On Mon, Jul 24, 2017 at 02:07:56PM -0500, Brijesh Singh wrote:

Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU


   MSRs


variable at compile time and share its physical address with hypervisor.


That sentence needs changing - the MSRs don't allocate - for them gets
allocated.


It presents a challege when SEV is active in guest OS, when SEV is active,
the guest memory is encrypted with guest key hence hypervisor will not
able to modify the guest memory. When SEV is active, we need to clear the
encryption attribute (aka C-bit) of shared physical addresses so that both
guest and hypervisor can access the data.


This whole paragraph needs rewriting.



I will improve the commit message in next rev.

[...]


+/* NOTE: function is marked as __ref because it is used by __init functions */


No need for that comment.

What should you look into is why do you need to call the early versions:

" * producing a warning (of course, no warning does not mean code is
  * correct, so optimally document why the __ref is needed and why it's OK)."

And we do have the normal set_memory_decrypted() etc helpers so why
aren't we using those?



Since kvm_guest_init() is called early in the boot process, we will not be
able to use the set_memory_decrypted() function. IIRC, if we try calling
set_memory_decrypted() early then we will hit a BUG_ON() [1] -- mainly when
it tries to flush the caches.

[1] 
http://elixir.free-electrons.com/linux/latest/source/arch/x86/mm/pageattr.c#L167




If you need to use the early ones too, then you probably need to
differentiate this in the callers by passing a "bool early", which calls
the proper flavor.



Sure, I can rearrange the code to make it more readable and use a "bool early"
parameter to differentiate it.
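I.e. something like this (sketch; the wrapper name is mine, and the early_*
signature follows the v3 patch which takes a physical address):

static void kvm_map_percpu_decrypted(void *ptr, size_t size, bool early)
{
        if (early)
                early_set_memory_decrypted(__pa(ptr), size);
        else
                set_memory_decrypted((unsigned long)ptr,
                                     PAGE_ALIGN(size) >> PAGE_SHIFT);
}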



+static int __ref kvm_map_hv_shared_decrypted(void)
+{
+   static int once, ret;
+   int cpu;
+
+   if (once)
+   return ret;


So this function gets called per-CPU but you need to do this ugly "once"
thing - i.e., global function called in a per-CPU context.

Why can't you do that mapping only on the current CPU and then
when that function is called on the next CPU, it will do the same thing
on that next CPU?




Yes, it can be done, but I remember running into issues during CPU hotplug.
The patch uses early_set_memory_decrypted() -- which calls
kernel_physical_mapping_init() to split the large pages into smaller ones.
IIRC, that API did not work after the system had successfully booted; after
the system is booted we must use set_memory_decrypted().

I was trying to avoid mixing the early and non-early set_memory_decrypted(),
but if the feedback is: use early_set_memory_decrypted() only where it is
required, otherwise use set_memory_decrypted(), then I can improve the logic
in the next rev. Thanks.


[...]


diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index da0be9a..52854cf 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -783,6 +783,9 @@
. = ALIGN(cacheline);   \
*(.data..percpu)\
*(.data..percpu..shared_aligned)\
+   . = ALIGN(PAGE_SIZE);   \
+   *(.data..percpu..hv_shared) \
+   . = ALIGN(PAGE_SIZE);   \
VMLINUX_SYMBOL(__per_cpu_end) = .;


Yeah, no, you can't do that. That's adding this section unconditionally
on *every* arch. You need to do some ifdeffery like it is done at the
beginning of that file and have this only on the arch which supports SEV.




Will do, thanks.

-Brijesh


Re: [RFC Part1 PATCH v3 15/17] x86: Add support for changing memory encryption attribute in early boot

2017-08-28 Thread Brijesh Singh
Hi Boris,


On 8/28/17 5:51 AM, Borislav Petkov wrote:

[..]

> +static int __init early_set_memory_enc_dec(resource_size_t paddr,
>> +   unsigned long size, bool enc)
>> +{
>> +unsigned long vaddr, vaddr_end, vaddr_next;
>> +unsigned long psize, pmask;
>> +int split_page_size_mask;
>> +pte_t *kpte;
>> +int level;
>> +
>> +vaddr = (unsigned long)__va(paddr);
>> +vaddr_next = vaddr;
>> +vaddr_end = vaddr + size;
>> +
>> +/*
>> + * We are going to change the physical page attribute from C=1 to C=0
>> + * or vice versa. Flush the caches to ensure that data is written into
>> + * memory with correct C-bit before we change attribute.
>> + */
>> +clflush_cache_range(__va(paddr), size);
>> +
>> +for (; vaddr < vaddr_end; vaddr = vaddr_next) {
>> +kpte = lookup_address(vaddr, &level);
>> +if (!kpte || pte_none(*kpte))
>> +return 1;
> Return before flushing TLBs? Perhaps you mean
>
>   ret = 1;
>   goto out;
>
> here and out does
>
>   __flush_tlb_all();
>   return ret;

Thanks, good catch. I will fix it in the next rev.
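Concretely, something like this (sketch of the fixed loop tail only, so the
TLB flush happens on every exit path):

        int ret = 0;

        for (; vaddr < vaddr_end; vaddr = vaddr_next) {
                kpte = lookup_address(vaddr, &level);
                if (!kpte || pte_none(*kpte)) {
                        ret = 1;
                        goto out;
                }
                /* ... change the C-bit on this mapping ... */
        }

out:
        __flush_tlb_all();
        return ret;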

-Brijesh


[tip:x86/mm] kvm/x86: Avoid clearing the C-bit in rsvd_bits()

2017-08-26 Thread tip-bot for Brijesh Singh
Commit-ID:  ea2800ddb20d6e66042051a61f66e6bea4fa0db7
Gitweb: http://git.kernel.org/tip/ea2800ddb20d6e66042051a61f66e6bea4fa0db7
Author: Brijesh Singh 
AuthorDate: Fri, 25 Aug 2017 15:55:40 -0500
Committer:  Ingo Molnar 
CommitDate: Sat, 26 Aug 2017 09:23:00 +0200

kvm/x86: Avoid clearing the C-bit in rsvd_bits()

The following commit:

  d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")

uses __sme_clr() to remove the C-bit in rsvd_bits(). rsvd_bits() is
just a simple function to return some 1 bits. Applying a mask based
on properties of the host MMU is incorrect. Additionally, the masks
computed by __reset_rsvds_bits_mask also apply to guest page tables,
where the C bit is reserved since we don't emulate SME.

The fix is to clear the C-bit from rsvd_bits_mask array after it has been
populated from __reset_rsvds_bits_mask()

Suggested-by: Paolo Bonzini 
Signed-off-by: Brijesh Singh 
Acked-by: Paolo Bonzini 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Radim Krčmář 
Cc: Stephen Rothwell 
Cc: Thomas Gleixner 
Cc: Tom Lendacky 
Cc: k...@vger.kernel.org
Cc: paolo.bonz...@gmail.com
Fixes: d0ec49d ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Link: http://lkml.kernel.org/r/20170825205540.123531-1-brijesh.si...@amd.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/kvm/mmu.c | 30 +++---
 arch/x86/kvm/mmu.h |  2 +-
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ccb70b8..04d7508 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4109,16 +4109,28 @@ void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
bool uses_nx = context->nx || context->base_role.smep_andnot_wp;
+   struct rsvd_bits_validate *shadow_zero_check;
+   int i;
 
/*
 * Passing "true" to the last argument is okay; it adds a check
 * on bit 8 of the SPTEs which KVM doesn't use anyway.
 */
-   __reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
+   shadow_zero_check = &context->shadow_zero_check;
+   __reset_rsvds_bits_mask(vcpu, shadow_zero_check,
boot_cpu_data.x86_phys_bits,
context->shadow_root_level, uses_nx,
guest_cpuid_has_gbpages(vcpu), is_pse(vcpu),
true);
+
+   if (!shadow_me_mask)
+   return;
+
+   for (i = context->shadow_root_level; --i >= 0;) {
+   shadow_zero_check->rsvd_bits_mask[0][i] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[1][i] &= ~shadow_me_mask;
+   }
+
 }
 EXPORT_SYMBOL_GPL(reset_shadow_zero_bits_mask);
 
@@ -4136,17 +4148,29 @@ static void
 reset_tdp_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
 {
+   struct rsvd_bits_validate *shadow_zero_check;
+   int i;
+
+   shadow_zero_check = &context->shadow_zero_check;
+
if (boot_cpu_is_amd())
-   __reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
+   __reset_rsvds_bits_mask(vcpu, shadow_zero_check,
boot_cpu_data.x86_phys_bits,
context->shadow_root_level, false,
boot_cpu_has(X86_FEATURE_GBPAGES),
true, true);
else
-   __reset_rsvds_bits_mask_ept(&context->shadow_zero_check,
+   __reset_rsvds_bits_mask_ept(shadow_zero_check,
boot_cpu_data.x86_phys_bits,
false);
 
+   if (!shadow_me_mask)
+   return;
+
+   for (i = context->shadow_root_level; --i >= 0;) {
+   shadow_zero_check->rsvd_bits_mask[0][i] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[1][i] &= ~shadow_me_mask;
+   }
 }
 
 /*
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 3cc7255..d7d248a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,7 @@
 
 static inline u64 rsvd_bits(int s, int e)
 {
-   return __sme_clr(((1ULL << (e - s + 1)) - 1) << s);
+   return ((1ULL << (e - s + 1)) - 1) << s;
 }
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value);


[PATCH] kvm/x86: Avoid clearing the C-bit in rsvd_bits()

2017-08-25 Thread Brijesh Singh
d0ec49d ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
uses __sme_clr() to remove the C-bit in rsvd_bits(). rsvd_bits() is
just a simple function to return some 1 bits. Applying a mask based
on properties of the host MMU is incorrect. Additionally, the masks
computed by __reset_rsvds_bits_mask also apply to guest page tables,
where the C bit is reserved since we don't emulate SME.

The fix is to clear the C-bit from rsvd_bits_mask array after it has been
populated from __reset_rsvds_bits_mask()

Acked-by: Paolo Bonzini 
Cc: Tom Lendacky 
Cc: Stephen Rothwell 
Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Thomas Gleixner 
Cc: H. Peter Anvin 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Suggested-by: Paolo Bonzini 
Fixes: d0ec49d ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/mmu.c | 30 +++---
 arch/x86/kvm/mmu.h |  2 +-
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ccb70b8..04d7508 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4109,16 +4109,28 @@ void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
bool uses_nx = context->nx || context->base_role.smep_andnot_wp;
+   struct rsvd_bits_validate *shadow_zero_check;
+   int i;
 
/*
 * Passing "true" to the last argument is okay; it adds a check
 * on bit 8 of the SPTEs which KVM doesn't use anyway.
 */
-   __reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
+   shadow_zero_check = &context->shadow_zero_check;
+   __reset_rsvds_bits_mask(vcpu, shadow_zero_check,
boot_cpu_data.x86_phys_bits,
context->shadow_root_level, uses_nx,
guest_cpuid_has_gbpages(vcpu), is_pse(vcpu),
true);
+
+   if (!shadow_me_mask)
+   return;
+
+   for (i = context->shadow_root_level; --i >= 0;) {
+   shadow_zero_check->rsvd_bits_mask[0][i] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[1][i] &= ~shadow_me_mask;
+   }
+
 }
 EXPORT_SYMBOL_GPL(reset_shadow_zero_bits_mask);
 
@@ -4136,17 +4148,29 @@ static void
 reset_tdp_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
 {
+   struct rsvd_bits_validate *shadow_zero_check;
+   int i;
+
+   shadow_zero_check = &context->shadow_zero_check;
+
if (boot_cpu_is_amd())
-   __reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
+   __reset_rsvds_bits_mask(vcpu, shadow_zero_check,
boot_cpu_data.x86_phys_bits,
context->shadow_root_level, false,
boot_cpu_has(X86_FEATURE_GBPAGES),
true, true);
else
-   __reset_rsvds_bits_mask_ept(&context->shadow_zero_check,
+   __reset_rsvds_bits_mask_ept(shadow_zero_check,
boot_cpu_data.x86_phys_bits,
false);
 
+   if (!shadow_me_mask)
+   return;
+
+   for (i = context->shadow_root_level; --i >= 0;) {
+   shadow_zero_check->rsvd_bits_mask[0][i] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[1][i] &= ~shadow_me_mask;
+   }
 }
 
 /*
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 3cc7255..d7d248a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,7 @@
 
 static inline u64 rsvd_bits(int s, int e)
 {
-   return __sme_clr(((1ULL << (e - s + 1)) - 1) << s);
+   return ((1ULL << (e - s + 1)) - 1) << s;
 }
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value);
-- 
2.9.4



Re: linux-next: manual merge of the kvm tree with the tip tree

2017-08-25 Thread Brijesh Singh



On 08/25/2017 03:05 PM, Paolo Bonzini wrote:

On 25/08/2017 18:53, Brijesh Singh wrote:





Neither my version nor yours is correct. :)  The right one has [0][i]
and [1][i] (I inverted the indices by mistake).

With that change, you can include my

Acked-by: Paolo Bonzini 



Ingo,

I am assuming that this patch should be sent through the tip tree since SME
support came from tip. I will be submitting the patch very soon.

-Brijesh


Re: linux-next: manual merge of the kvm tree with the tip tree

2017-08-25 Thread Brijesh Singh

Hi Paolo,


On 08/25/2017 08:57 AM, Tom Lendacky wrote:

On 8/25/2017 1:39 AM, Paolo Bonzini wrote:

On 25/08/2017 06:39, Stephen Rothwell wrote:



First, rsvd_bits is just a simple function to return some 1 bits.  Applying
a mask based on properties of the host MMU is incorrect.

Second, the masks computed by __reset_rsvds_bits_mask also apply to
guest page tables, where the C bit is reserved since we don't emulate
SME.

Something like this:




Thanks for the tip, I have expanded the patch to cover the tdp cases and have
verified that it works fine with SME-enabled KVM. If you are okay with this
then I can send the patch.

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ccb70b8..7a8edc0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4109,16 +4109,30 @@ void
 reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
 {
bool uses_nx = context->nx || context->base_role.smep_andnot_wp;
+   struct rsvd_bits_validate *shadow_zero_check;
+   int i;
 
/*

 * Passing "true" to the last argument is okay; it adds a check
 * on bit 8 of the SPTEs which KVM doesn't use anyway.
 */
-   __reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
+   shadow_zero_check = &context->shadow_zero_check;
+   __reset_rsvds_bits_mask(vcpu, shadow_zero_check,
boot_cpu_data.x86_phys_bits,
context->shadow_root_level, uses_nx,
guest_cpuid_has_gbpages(vcpu), is_pse(vcpu),
true);
+
+   if (!shadow_me_mask)
+   return;
+
+   for (i = context->shadow_root_level; --i >= 0;) {
+   shadow_zero_check->rsvd_bits_mask[i][0] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[i][1] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[i][2] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[i][3] &= ~shadow_me_mask;
+   }
+
 }
 EXPORT_SYMBOL_GPL(reset_shadow_zero_bits_mask);
 
@@ -4136,8 +4150,13 @@ static void

 reset_tdp_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
 {
+   struct rsvd_bits_validate *shadow_zero_check;
+   int i;
+
+   shadow_zero_check = &context->shadow_zero_check;
+
if (boot_cpu_is_amd())
-   __reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
+   __reset_rsvds_bits_mask(vcpu, shadow_zero_check,
boot_cpu_data.x86_phys_bits,
context->shadow_root_level, false,
boot_cpu_has(X86_FEATURE_GBPAGES),
@@ -4147,6 +4166,15 @@ reset_tdp_shadow_zero_bits_mask(struct kvm_vcpu *vcpu,
boot_cpu_data.x86_phys_bits,
false);
 
+   if (!shadow_me_mask)

+   return;
+
+   for (i = context->shadow_root_level; --i >= 0;) {
+   shadow_zero_check->rsvd_bits_mask[i][0] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[i][1] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[i][2] &= ~shadow_me_mask;
+   shadow_zero_check->rsvd_bits_mask[i][3] &= ~shadow_me_mask;
+   }
 }
 
 /*

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 3cc7255..d7d248a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,7 @@
 
 static inline u64 rsvd_bits(int s, int e)

 {
-   return __sme_clr(((1ULL << (e - s + 1)) - 1) << s);
+   return ((1ULL << (e - s + 1)) - 1) << s;
 }
 
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask, u64 mmio_value);







Thanks Paolo, Brijesh and I will test this and make sure everything works
properly with this patch.

Thanks,
Tom



diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2dafd36368cc..e0597d703d72 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4142,16 +4142,24 @@ void
  reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
  {
  bool uses_nx = context->nx || context->base_role.smep_andnot_wp;
+struct rsvd_bits_validate *shadow_zero_check;
+int i;
  /*
   * Passing "true" to the last argument is okay; it adds a check
   * on bit 8 of the SPTEs which KVM doesn't use anyway.
   */
-__reset_rsvds_bits_mask(vcpu, &context->shadow_zero_check,
+shadow_zero_check = &context->shadow_zero_check;
+__reset_rsvds_bits_mask(vcpu, shad

Re: [PATCH v2] KVM: x86: Avoid guest page table walk when gpa_available is set

2017-08-08 Thread Brijesh Singh

Hi Radim,


On 07/20/2017 02:43 AM, Radim Krčmář wrote:

2017-07-19 08:35-0500, Brijesh Singh:

On 07/19/2017 06:19 AM, Radim Krčmář wrote:

2017-07-17 16:32-0500, Brijesh Singh:

Hi Paolo and Radim

Any comments on this patch, I could not find it in 4.13-2 branch.

Please let me know if you want to fix something, or want me to
refresh and resend the patch.


Sorry, I tried it during the merge window, but it didn't pass tests on
VMX and I got distracted by other bugs before looking into the cause.

Can you reproduce the fail?



No worries, thanks.

I can try to reproduce it. Are you running kvm-unit-tests or something different?


I noticed that a linux guest hung in early boot, but at least (io)apic
kvm-unit-tests failed as well, IIRC.


IIRC, VMX does not set the gpa_available flag, hence I am wondering what I
missed in the patch to trigger the failure. I will debug it and let you know.


It does now, in ept_violation and ept_misconfig,



I am able to reproduce the issue on VMX. Sorry it took a bit longer to verify
it.

I was not aware that VMX also makes use of the gpa_available flag, hence I missed
updating vmx.c to set gpa_val. After applying the small patch below I am
able to boot the guest on an Intel Xeon E5-2665.

Additionally, there was one issue in the current patch pointed out by Paolo [1].
If the patch used the vcpu->arch.gpa_val check as Paolo pointed out, then on VMX
we would silently fall back to the guest page table walk (even when
gpa_available is set). I guess since I had been testing my code on an SVM
platform I never caught the error. I will soon send an updated patch.

[1] http://marc.info/?l=kvm&m=150116338725964&w=2

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b5e0b02..9309fbb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6309,6 +6309,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
  ? PFERR_PRESENT_MASK : 0;
 
vcpu->arch.gpa_available = true;

+   vcpu->arch.gpa_val = gpa;
vcpu->arch.exit_qualification = exit_qualification;
 
return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);

@@ -6326,6 +6327,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
}
 
ret = handle_mmio_page_fault(vcpu, gpa, true);

+   vcpu->arch.gpa_val = gpa;
vcpu->arch.gpa_available = true;
if (likely(ret == RET_MMIO_PF_EMULATE))
return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==



[PATCH] KVM: SVM: Limit PFERR_NESTED_GUEST_PAGE error_code check to L1 guest

2017-08-07 Thread Brijesh Singh
Commit 1472775 ("kvm: svm: Add support for additional SVM NPF error codes")
added a new error code to aid nested page fault handling. The commit
unprotects (kvm_mmu_unprotect_page) the page when we get an NPF due to a
guest page table walk where the page was marked RO.

Paolo highlighted a use case, where an L0->L2 shadow nested page table
is marked read-only, in particular when a page is read only in L1's nested
page table. If such a page is accessed by L2 while walking page tables
it can cause a nested page fault (page table walks are write accesses).
However, after kvm_mmu_unprotect_page we may get another page fault, and
again in an endless stream.

To cover this use case, we qualify the new error_code check with
vcpu->arch.mmu.direct_map so that the error_code check runs only for the L1
guest, and not the L2 guest. This restricts the unprotect-and-retry path and
avoids hitting the above use case.

Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Thomas Lendacky 
Signed-off-by: Brijesh Singh 
---

See http://marc.info/?l=kvm&m=150153155519373&w=2 for detail discussion on the 
use case and code flow.

 arch/x86/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9b1dd11..4aaa4aa 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4839,7 +4839,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, 
u64 error_code,
 * Note: AMD only (since it supports the PFERR_GUEST_PAGE_MASK used
 *   in PFERR_NEXT_GUEST_PAGE)
 */
-   if (error_code == PFERR_NESTED_GUEST_PAGE) {
+   if (vcpu->arch.mmu.direct_map &&
+   (error_code == PFERR_NESTED_GUEST_PAGE)) {
kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
return 1;
}
-- 
2.9.4



Re: [PATCH v2 1/3] kvm: svm: Add support for additional SVM NPF error codes

2017-08-04 Thread Brijesh Singh

Hi Paolo,

On 08/04/2017 09:05 AM, Paolo Bonzini wrote:

On 04/08/2017 02:30, Brijesh Singh wrote:



On 8/2/17 5:42 AM, Paolo Bonzini wrote:

On 01/08/2017 15:36, Brijesh Singh wrote:

The flow is:

hardware walks page table; L2 page table points to read only memory
-> pf_interception (code =
-> kvm_handle_page_fault (need_unprotect = false)
-> kvm_mmu_page_fault
-> paging64_page_fault (for example)
   -> try_async_pf
  map_writable set to false
   -> paging64_fetch(write_fault = true, map_writable = false,
prefault = false)
  -> mmu_set_spte(speculative = false, host_writable = false,
write_fault = true)
 -> set_spte
mmu_need_write_protect returns true
return true
 write_fault == true -> set emulate = true
 return true
  return true
   return true
emulate

Without this patch, emulation would have called

..._gva_to_gpa_nested
-> translate_nested_gpa
-> paging64_gva_to_gpa
-> paging64_walk_addr
-> paging64_walk_addr_generic
   set fault (nested_page_fault=true)

and then:

 kvm_propagate_fault
 -> nested_svm_inject_npf_exit


maybe then safer thing would be to qualify the new error_code check with
!mmu_is_nested(vcpu) or something like that. So that way it would run on
L1 guest, and not the L2 guest. I believe that would restrict it avoid
hitting this case. Are you okay with this change ?

Or check "vcpu->arch.mmu.direct_map"?  That would be true when not using
shadow pages.


Yes that can be used.


Are you going to send a patch for this?



Yes. I should be posting it by Monday or Tuesday - need some time to verify it.

-Brijesh


Re: [PATCH v2 1/3] kvm: svm: Add support for additional SVM NPF error codes

2017-08-03 Thread Brijesh Singh


On 8/2/17 5:42 AM, Paolo Bonzini wrote:
> On 01/08/2017 15:36, Brijesh Singh wrote:
>>> The flow is:
>>>
>>>hardware walks page table; L2 page table points to read only memory
>>>-> pf_interception (code =
>>>-> kvm_handle_page_fault (need_unprotect = false)
>>>-> kvm_mmu_page_fault
>>>-> paging64_page_fault (for example)
>>>   -> try_async_pf
>>>  map_writable set to false
>>>   -> paging64_fetch(write_fault = true, map_writable = false,
>>> prefault = false)
>>>  -> mmu_set_spte(speculative = false, host_writable = false,
>>> write_fault = true)
>>> -> set_spte
>>>mmu_need_write_protect returns true
>>>return true
>>> write_fault == true -> set emulate = true
>>> return true
>>>  return true
>>>   return true
>>>emulate
>>>
>>> Without this patch, emulation would have called
>>>
>>>..._gva_to_gpa_nested
>>>-> translate_nested_gpa
>>>-> paging64_gva_to_gpa
>>>-> paging64_walk_addr
>>>-> paging64_walk_addr_generic
>>>   set fault (nested_page_fault=true)
>>>
>>> and then:
>>>
>>> kvm_propagate_fault
>>> -> nested_svm_inject_npf_exit
>>>
>> maybe then safer thing would be to qualify the new error_code check with
>> !mmu_is_nested(vcpu) or something like that. So that way it would run on
>> L1 guest, and not the L2 guest. I believe that would restrict it avoid
>> hitting this case. Are you okay with this change ?
> Or check "vcpu->arch.mmu.direct_map"?  That would be true when not using
> shadow pages.

Yes that can be used.

>> IIRC, the main place where this check was valuable was when L1 guest had
>> a fault (when coming out of the L2 guest) and emulation was not needed.
> How do I measure the effect?  I tried counting the number of emulations,
> and any difference from the patch was lost in noise.

I think this patch is necessary for functional reasons (not just perf), because
we added the other patch to look at the GPA and stop walking the guest page
tables on an NPF.

The issue, I think, was that hardware had taken an NPF because the page table was
marked RO, and it saved the GPA in the VMCB.  KVM then went on to emulate
the instruction and saw that a GPA was available.  But that GPA was not the
GPA of the instruction it was emulating, since it was the GPA of the tablewalk
page that had the fault. It was debugged at the time and we realized that
emulating the instruction was unnecessary, so we added this new code, which
fixed the functional issue and helps perf.

I don't have any data on how much it helps perf; as I recall it was most
effective when the L1 guest page tables and L2 nested page tables were exactly
the same.  In that case, it avoided emulations for code that L1 executes, which
I think could be as much as one emulation per 4KB code page.
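
As a stand-alone sketch of the point above (illustrative only: the bit
positions below are made up, while the real masks -- e.g. the
PFERR_GUEST_PAGE_MASK referenced in the patch -- live in
arch/x86/include/asm/kvm_host.h), the error code delivered with the NPF is
what tells the handler whether the GPA saved by hardware belongs to the final
translation or only to a guest page-table page touched during the walk:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative bit layout only -- not the real PFERR_* values. */
#define NPF_GUEST_FINAL_PAGE	(1ULL << 32)	/* fault on the final translation */
#define NPF_GUEST_PAGE_TABLE	(1ULL << 33)	/* fault while walking guest tables */

/*
 * The GPA recorded in the VMCB identifies the faulting access only when the
 * fault hit the final translation.  If the fault was taken on a guest
 * page-table page, the GPA points at that table page, so it must not be fed
 * to the instruction emulator as the GPA of the faulting instruction.
 */
static bool npf_gpa_usable_for_emulation(uint64_t error_code)
{
	return (error_code & NPF_GUEST_FINAL_PAGE) &&
	       !(error_code & NPF_GUEST_PAGE_TABLE);
}

int main(void)
{
	printf("final translation fault: gpa usable = %d\n",
	       npf_gpa_usable_for_emulation(NPF_GUEST_FINAL_PAGE));
	printf("page-table walk fault:   gpa usable = %d\n",
	       npf_gpa_usable_for_emulation(NPF_GUEST_PAGE_TABLE));
	return 0;
}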



Re: [PATCH v2 1/3] kvm: svm: Add support for additional SVM NPF error codes

2017-08-01 Thread Brijesh Singh



On 07/31/2017 03:05 PM, Paolo Bonzini wrote:



There can be different cases where an L0->L2 shadow nested page table is
marked read only, in particular when a page is read only in L1's nested
page tables.  If such a page is accessed by L2 while walking page tables
it will cause a nested page fault (page table walks are write accesses).
   However, after kvm_mmu_unprotect_page you will get another page fault,
and again in an endless stream.

Instead, emulation would have caused a nested page fault vmexit, I think.


If possible could you please give me some pointer on how to create this use
case so that we can get definitive answer.

Looking at the code path is giving me indication that the new code
(the kvm_mmu_unprotect_page call) only happens if vcpu->arch.mmu_page_fault()
returns an indication that the instruction should be emulated. I would not
expect that to be the case scenario you described above since L1 making a page
read-only (this is a page table for L2) is an error and should result in #NPF
being injected into L1.


The flow is:

   hardware walks page table; L2 page table points to read only memory
   -> pf_interception (code =
   -> kvm_handle_page_fault (need_unprotect = false)
   -> kvm_mmu_page_fault
   -> paging64_page_fault (for example)
  -> try_async_pf
 map_writable set to false
  -> paging64_fetch(write_fault = true, map_writable = false, prefault = 
false)
 -> mmu_set_spte(speculative = false, host_writable = false, 
write_fault = true)
-> set_spte
   mmu_need_write_protect returns true
   return true
write_fault == true -> set emulate = true
return true
 return true
  return true
   emulate

Without this patch, emulation would have called

   ..._gva_to_gpa_nested
   -> translate_nested_gpa
   -> paging64_gva_to_gpa
   -> paging64_walk_addr
   -> paging64_walk_addr_generic
  set fault (nested_page_fault=true)

and then:

kvm_propagate_fault
-> nested_svm_inject_npf_exit



maybe then safer thing would be to qualify the new error_code check with
!mmu_is_nested(vcpu) or something like that. So that way it would run on
L1 guest, and not the L2 guest. I believe that would restrict it avoid
hitting this case. Are you okay with this change ?

IIRC, the main place where this check was valuable was when L1 guest had
a fault (when coming out of the L2 guest) and emulation was not needed.

-Brijesh


Re: [PATCH v2 1/3] kvm: svm: Add support for additional SVM NPF error codes

2017-07-31 Thread Brijesh Singh


On 07/31/2017 10:44 AM, Paolo Bonzini wrote:

On 31/07/2017 15:30, Brijesh Singh wrote:

Hi Paolo,

On 07/27/2017 11:27 AM, Paolo Bonzini wrote:

On 23/11/2016 18:01, Brijesh Singh wrote:

   +/*
+ * Before emulating the instruction, check if the error code
+ * was due to a RO violation while translating the guest page.
+ * This can occur when using nested virtualization with nested
+ * paging in both guests. If true, we simply unprotect the page
+ * and resume the guest.
+ *
+ * Note: AMD only (since it supports the PFERR_GUEST_PAGE_MASK used
+ *   in PFERR_NEXT_GUEST_PAGE)
+ */
+if (error_code == PFERR_NESTED_GUEST_PAGE) {
+kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
+return 1;
+}



What happens if L1 is mapping some memory that is read only in L0?  That
is, the L1 nested page tables make it read-write, but the L0 shadow
nested page tables make it read-only.

Accessing it would cause an NPF, and then my guess is that the L1 guest
would loop on the failing instruction instead of just dropping the write.




Not sure if I am able to follow your use case. Could you please explain me
in bit detail.

The purpose of the code above was really for when we resume from the L2 guest
back to the L1 guest. The L1 page tables are marked RO when in the L2 guest
(for shadow paging) as I recall, so when we come back to the L1 guest, it can
get a fault since its page tables are not marked writeable at L0 as they
need to be.


There can be different cases where an L0->L2 shadow nested page table is
marked read only, in particular when a page is read only in L1's nested
page tables.  If such a page is accessed by L2 while walking page tables
it will cause a nested page fault (page table walks are write accesses).
  However, after kvm_mmu_unprotect_page you will get another page fault,
and again in an endless stream.

Instead, emulation would have caused a nested page fault vmexit, I think.



If possible could you please give me some pointer on how to create this use
case so that we can get definitive answer.

Looking at the code path is giving me indication that the new code
(the kvm_mmu_unprotect_page call) only happens if vcpu->arch.mmu_page_fault()
returns an indication that the instruction should be emulated. I would not 
expect
that to be the case scenario you described above since L1 making a page 
read-only
(this is a page table for L2) is an error and should result in #NPF being 
injected
into L1. It's bit hard for me to visualize the code flow and figure out exactly
how that would happen, but I just tried booting nested virtualization and it 
seem
to be working okay.

Is there a kvm-unit-test which I can run to trigger this scenario ? thanks

-Brijesh


Re: [PATCH v2 1/3] kvm: svm: Add support for additional SVM NPF error codes

2017-07-31 Thread Brijesh Singh

Hi Paolo,

On 07/27/2017 11:27 AM, Paolo Bonzini wrote:

On 23/11/2016 18:01, Brijesh Singh wrote:
  
+	/*

+* Before emulating the instruction, check if the error code
+* was due to a RO violation while translating the guest page.
+* This can occur when using nested virtualization with nested
+* paging in both guests. If true, we simply unprotect the page
+* and resume the guest.
+*
+* Note: AMD only (since it supports the PFERR_GUEST_PAGE_MASK used
+*   in PFERR_NEXT_GUEST_PAGE)
+*/
+   if (error_code == PFERR_NESTED_GUEST_PAGE) {
+   kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2));
+   return 1;
+   }



What happens if L1 is mapping some memory that is read only in L0?  That
is, the L1 nested page tables make it read-write, but the L0 shadow
nested page tables make it read-only.

Accessing it would cause an NPF, and then my guess is that the L1 guest
would loop on the failing instruction instead of just dropping the write.




Not sure if I am able to follow your use case. Could you please explain me
in bit detail.

The purpose of the code above was really for when we resume from the L2 guest
back to the L1 guest. The L1 page tables are marked RO when in the L2 guest
(for shadow paging) as I recall, so when we come back to the L1 guest, it can
get a fault since its page tables are not marked writeable at L0 as they need 
to be.

-Brijesh


Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-26 Thread Brijesh Singh



On 07/26/2017 02:26 PM, H. Peter Anvin wrote:


   \

   static inline void outs##bwl(int port, const void *addr, unsigned

long count) \

   {


This will clash with a fix I did to add a "memory" clobber
for the traditional implementation, see
https://patchwork.kernel.org/patch/9854573/


Is it even worth leaving these as inline functions?
Given the speed of IO cycles it is unlikely that the cost of calling

a real

function will be significant.
The code bloat reduction will be significant.


I think the smallest code would be the original "rep insb" etc, which
should be smaller than a function call, unlike the loop. Then again,
there is a rather small number of affected device drivers, almost all
of them for ancient hardware that you won't even build in a 64-bit
x86 kernel, see the list below. The only user I found that is

actually

still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the

early

console.



There are some indirect users of the string I/O functions. The following
functions defined in lib/iomap.c call the rep versions of ins and outs.

- ioread8_rep, ioread16_rep, ioread32_rep
- iowrite8_rep, iowrite16_rep, iowrite32_rep

I found that several drivers use the above functions.

Here is one approach to convert them into non-inline functions. In this
approach, I have added a new file, arch/x86/kernel/io.c, which provides non-rep
versions of the string I/O routines. The file gets built and used only when
AMD_MEM_ENCRYPT is enabled. On the positive side, if we don't build the kernel
with AMD_MEM_ENCRYPT support then we use the inline routines; when
AMD_MEM_ENCRYPT is built we make a function call instead. Inside the function
we unroll only when SEV is active.

Do you see any issue with this approach? Thanks

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..104927d 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)
\
  unsigned type value = in##bwl(port);\
  slow_down_io(); \
  return value;   \
-}
\
-
\
+}
+
+#define BUILDIO_REP(bwl, bw, type)
\
static inline void outs##bwl(int port, const void *addr, unsigned long
count) \
{
\
  asm volatile("rep; outs" #bwl   \
@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr,
unsigned long count)\
{
\
  asm volatile("rep; ins" #bwl\
   : "+D"(addr), "+c"(count) : "d"(port));\
-}
+}
\
  
  BUILDIO(b, b, char)

  BUILDIO(w, w, short)
  BUILDIO(l, , int)
  
+#ifdef CONFIG_AMD_MEM_ENCRYPT

+extern void outsb_try_rep(int port, const void *addr, unsigned long
count);
+extern void insb_try_rep(int port, void *addr, unsigned long count);
+extern void outsw_try_rep(int port, const void *addr, unsigned long
count);
+extern void insw_try_rep(int port, void *addr, unsigned long count);
+extern void outsl_try_rep(int port, const void *addr, unsigned long
count);
+extern void insl_try_rep(int port, void *addr, unsigned long count);
+#define outsb  outsb_try_rep
+#define insb   insb_try_rep
+#define outsw  outsw_try_rep
+#define insw   insw_try_rep
+#define outsl  outsl_try_rep
+#define insl   insl_try_rep
+#else
+BUILDIO_REP(b, b, char)
+BUILDIO_REP(w, w, short)
+BUILDIO_REP(l, , int)
+#endif
+
  extern void *xlate_dev_mem_ptr(phys_addr_t phys);
  extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a01892b..3b6e2a3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -42,6 +42,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
  
  obj-y  := process_$(BITS).o signal.o

  obj-$(CONFIG_COMPAT)   += signal_compat.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT) += io.o
obj-y  += traps.o irq.o irq_$(BITS).o
dumpstack_$(BITS).o
  obj-y  += time.o ioport.o dumpstack.o nmi.o
  obj-$(CONFIG_MODIFY_LDT_SYSCALL)   += ldt.o
diff --git a/arch/x86/kernel/io.c b/arch/x86/kernel/io.c
new file mode 100644
index 000..f58afa9
--- /dev/null
+++ b/arch/x86/kernel/io.c
@@ -0,0 +1,87 @@
+#include 
+#include 
+#include 
+
+void outsb_try_rep(int port, const void *addr, unsigned long count)
+{
+   if (sev_active()) {
+   unsigned char *value = (unsigned char *)addr;
+   while (count) {
+   outb(*value, port);
+   value++;
+   count--;
+   }
+   } else {
+   asm volatile("rep; outsb" : "+S"(addr), "+c"(count) :
"d"(port));
+   }
+}
+
+void insb_try_rep(int port, void *addr, unsigned long count)
+{
+   if (sev_active()) {
+   unsigned char *value = (unsigned char *)addr;
+  

Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-26 Thread Brijesh Singh


Hi Arnd and David,

On 07/26/2017 05:45 AM, Arnd Bergmann wrote:

On Tue, Jul 25, 2017 at 11:51 AM, David Laight  wrote:

From: Brijesh Singh

Sent: 24 July 2017 20:08
From: Tom Lendacky 

Secure Encrypted Virtualization (SEV) does not support string I/O, so
unroll the string I/O operation into a loop operating on one element at
a time.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/include/asm/io.h | 26 ++
  1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..2f3c002 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port) 
  \
   \
  static inline void outs##bwl(int port, const void *addr, unsigned long count) 
\
  {


This will clash with a fix I did to add a "memory" clobber
for the traditional implementation, see
https://patchwork.kernel.org/patch/9854573/


Is it even worth leaving these as inline functions?
Given the speed of IO cycles it is unlikely that the cost of calling a real
function will be significant.
The code bloat reduction will be significant.


I think the smallest code would be the original "rep insb" etc, which
should be smaller than a function call, unlike the loop. Then again,
there is a rather small number of affected device drivers, almost all
of them for ancient hardware that you won't even build in a 64-bit
x86 kernel, see the list below. The only user I found that is actually
still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the early
console.



There are some indirect users of the string I/O functions. The following
functions defined in lib/iomap.c call the rep versions of ins and outs.

- ioread8_rep, ioread16_rep, ioread32_rep
- iowrite8_rep, iowrite16_rep, iowrite32_rep

I found that several drivers use the above functions.

Here is one approach to convert them into non-inline functions. In this
approach, I have added a new file, arch/x86/kernel/io.c, which provides non-rep
versions of the string I/O routines. The file gets built and used only when
AMD_MEM_ENCRYPT is enabled. On the positive side, if we don't build the kernel
with AMD_MEM_ENCRYPT support then we use the inline routines; when
AMD_MEM_ENCRYPT is built we make a function call instead. Inside the function
we unroll only when SEV is active.

Do you see any issue with this approach? Thanks

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..104927d 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)   
\
unsigned type value = in##bwl(port);\
slow_down_io(); \
return value;   \
-}  \
-   \
+}
+
+#define BUILDIO_REP(bwl, bw, type) \
 static inline void outs##bwl(int port, const void *addr, unsigned long count) \
 {  \
asm volatile("rep; outs" #bwl   \
@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr, 
unsigned long count)\
 {  \
asm volatile("rep; ins" #bwl\
 : "+D"(addr), "+c"(count) : "d"(port));\
-}
+}  \
 
 BUILDIO(b, b, char)

 BUILDIO(w, w, short)
 BUILDIO(l, , int)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT

+extern void outsb_try_rep(int port, const void *addr, unsigned long count);
+extern void insb_try_rep(int port, void *addr, unsigned long count);
+extern void outsw_try_rep(int port, const void *addr, unsigned long count);
+extern void insw_try_rep(int port, void *addr, unsigned long count);
+extern void outsl_try_rep(int port, const void *addr, unsigned long count);
+extern void insl_try_rep(int port, void *addr, unsigned long count);
+#define outsb  outsb_try_rep
+#define insb   insb_try_rep
+#define outsw  outsw_try_rep
+#define insw   insw_try_rep
+#define outsl  outsl_try_rep
+#define insl   insl_try_rep
+#else
+BUILDIO_REP(b, b, char)
+BUILDIO_REP(w, w, short)
+BUILDIO_REP(l, , int)
+#endif
+
 extern void *xlate_dev_mem_ptr(phys_addr_t phys);
 extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a01892b..3b6e2a3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/M

Re: [RFC Part2 PATCH v3 02/26] crypto: ccp: Add Platform Security Processor (PSP) device support

2017-07-25 Thread Brijesh Singh


On 07/25/2017 03:29 AM, Kamil Konieczny wrote:

Hi,

minor misspelling,

On 24.07.2017 22:02, Brijesh Singh wrote:

Platform Security Processor (PSP) is part of AMD Secure Processor (AMD-SP),
PSP is a dedicated processor that provides the support for key management
commands in a Secure Encrypted Virtualiztion (SEV) mode, along with
software-based Tursted Executation Environment (TEE) to enable the

- ^ Trusted

third-party tursted applications.

-- ^ trusted
[...]



Noted. thanks

-Brijesh


Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) descrption

2017-07-25 Thread Brijesh Singh



On 07/25/2017 12:45 AM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:41PM -0500, Brijesh Singh wrote:

Subject: Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure 
Encrypted Virtualization (SEV) descrption

 ^^

Please introduce a spellchecker into your workflow.


Update amd-memory-encryption document describing the AMD Secure Encrypted


"Update the AMD memory encryption document...

The patch has the proper URL already.


Virtualization (SEV) feature.

Signed-off-by: Brijesh Singh 
---
  Documentation/x86/amd-memory-encryption.txt | 29 ++---
  1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/amd-memory-encryption.txt 
b/Documentation/x86/amd-memory-encryption.txt
index f512ab7..747df07 100644
--- a/Documentation/x86/amd-memory-encryption.txt
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -1,4 +1,5 @@
-Secure Memory Encryption (SME) is a feature found on AMD processors.
+Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
+features found on AMD processors.
  
  SME provides the ability to mark individual pages of memory as encrypted using

  the standard x86 page tables.  A page that is marked encrypted will be
@@ -6,6 +7,12 @@ automatically decrypted when read from DRAM and encrypted when 
written to
  DRAM.  SME can therefore be used to protect the contents of DRAM from physical
  attacks on the system.
  
+SEV enables running encrypted virtual machine (VMs) in which the code and data


 machines


+of the virtual machine are secured so that decrypted version is available only


... of the guest VM ...   ... so that a decrypted ...


+within the VM itself. SEV guest VMs have concept of private and shared memory.


have *the* concept - you need to use
definite and indefinite articles in your
text.


+Private memory is encrypted with the guest-specific key, while shared memory
+may be encrypted with hypervisor key.


And here you explain that the hypervisor key is the same key which we
use in SME. So that people can make the connection.


+
  A page is encrypted when a page table entry has the encryption bit set (see
  below on how to determine its position).  The encryption bit can also be
  specified in the cr3 register, allowing the PGD table to be encrypted. Each
@@ -19,11 +26,20 @@ so that the PGD is encrypted, but not set the encryption 
bit in the PGD entry
  for a PUD which results in the PUD pointed to by that entry to not be
  encrypted.
  
-Support for SME can be determined through the CPUID instruction. The CPUID

-function 0x801f reports information related to SME:
+When SEV is enabled, certain type of memory (namely insruction pages and guest


When SEV is enabled, instruction pages and guest page tables are ...


+page tables) are always treated as private. Due to security reasons all DMA


security reasons??


+operations inside the guest must be performed on shared memory. Since the
+memory encryption bit is only controllable by the guest OS when it is operating


 ... is controlled ...


+in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware forces memory


... forces the 
memory ...


+encryption bit to 1.
+
+Support for SME and SEV can be determined through the CPUID instruction. The
+CPUID function 0x801f reports information related to SME:
  
  	0x801f[eax]:

Bit[0] indicates support for SME
+   0x81f[eax]:


There's a 0 missing and you don't really need it as it is already above.


+   Bit[1] indicates support for SEV
0x801f[ebx]:
Bits[5:0]  pagetable bit number used to activate memory
   encryption
@@ -39,6 +55,13 @@ determine if SME is enabled and/or to enable memory 
encryption:
Bit[23]   0 = memory encryption features are disabled
  1 = memory encryption features are enabled
  
+If SEV is supported, MSR 0xc0010131 (MSR_F17H_SEV) can be used to determine if


If this MSR is going to be part of the architecture - and I really think
it is - then call it MSR_AMD64_SEV.



Thanks Boris, I'll update the doc per your feedback, and will rename the MSR to
MSR_AMD64_SEV.
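
For reference, a minimal user-space sketch of the capability check discussed
above, assuming only what the document states (CPUID leaf 0x8000001f: EAX
bit 0 = SME, EAX bit 1 = SEV, EBX bits 5:0 = C-bit position). Whether the
features are actually enabled is reported separately, e.g. via the SEV MSR
mentioned above:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx)) {
		printf("CPUID leaf 0x8000001f not available\n");
		return 1;
	}

	printf("SME supported : %u\n", eax & 1);		/* EAX bit 0 */
	printf("SEV supported : %u\n", (eax >> 1) & 1);		/* EAX bit 1 */
	printf("C-bit position: %u\n", ebx & 0x3f);		/* EBX bits 5:0 */
	return 0;
}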

-Brijesh


[RFC Part2 PATCH v3 25/26] KVM: SVM: Do not install #UD intercept when SEV is enabled

2017-07-24 Thread Brijesh Singh
On #UD, x86_emulate_instruction() fetches the data from guest memory and
decodes the instruction bytes to assist further. When SEV is enabled, the
instruction bytes will be encrypted using the guest-specific key, and the
hypervisor will no longer be able to fetch the instruction bytes to assist
#UD handling. By not installing the intercept we let the guest receive and
handle the #UD.

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 64b9f60..4581d03 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1432,8 +1432,10 @@ static void init_vmcb(struct vcpu_svm *svm)
svm->vmcb->control.virt_ext |= 
VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
}
 
-   if (sev_guest(svm->vcpu.kvm))
+   if (sev_guest(svm->vcpu.kvm)) {
svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
+   clr_exception_intercept(svm, UD_VECTOR);
+   }
 
mark_all_dirty(svm->vmcb);
 
-- 
2.9.4



[RFC Part2 PATCH v3 26/26] KVM: X86: Restart the guest when insn_len is zero and SEV is enabled

2017-07-24 Thread Brijesh Singh
On AMD platforms, under certain conditions insn_len may be zero on #NPF.
This can happen if the guest gets a page fault on a data access but the HW
table walker is not able to read the instruction page (e.g. the instruction
page is not present in memory).

Typically, when insn_len is zero, x86_emulate_instruction() walks the
guest page table and fetches the instruction bytes from guest memory.
When SEV is enabled, the guest memory is encrypted with a guest-specific
key, hence the hypervisor will not be able to fetch the instruction bytes.
In those cases we simply restart the guest.

I have encountered this issue when running kernbench inside the guest.

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/mmu.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ccb70b8..be41ad0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4850,6 +4850,23 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, 
u64 error_code,
if (mmio_info_in_cache(vcpu, cr2, direct))
emulation_type = 0;
 emulate:
+	/*
+	 * On AMD platforms, under certain conditions insn_len may be zero on #NPF.
+	 * This can happen if the guest gets a page fault on data access, but the HW
+	 * table walker is not able to read the instruction page (e.g. instruction
+	 * page is not present).
+	 *
+	 * Typically, when insn_len is zero, x86_emulate_instruction() walks the
+	 * guest page table and fetches the instruction bytes. When SEV is active,
+	 * the guest memory is encrypted with the guest key, hence we will not be
+	 * able to fetch the instruction bytes. In those cases we simply restart the guest.
+	 */
+   if (unlikely(!insn_len)) {
+   if (kvm_x86_ops->memory_encryption_enabled &&
+   kvm_x86_ops->memory_encryption_enabled(vcpu))
+   return 1;
+   }
+   if (unlikely(!insn_len)) {
+   if (kvm_x86_ops->memory_encryption_enabled &&
+   kvm_x86_ops->memory_encryption_enabled(vcpu))
+   return 1;
+   }
+
er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len);
 
switch (er) {
-- 
2.9.4



[RFC Part2 PATCH v3 20/26] KVM: SVM: Add support for SEV DEBUG_DECRYPT command

2017-07-24 Thread Brijesh Singh
The command is used for decrypting a guest memory region for debug
purposes.

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 160 +
 1 file changed, 160 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 21f85e1..933384a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -6058,6 +6058,162 @@ static int sev_guest_status(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static int __sev_dbg_enc_dec(struct kvm *kvm, unsigned long src,
+unsigned long dst, int size, int *error, bool enc)
+{
+   struct sev_data_dbg *data;
+   int ret;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   data->handle = sev_get_handle(kvm);
+   data->dst_addr = dst;
+   data->src_addr = src;
+   data->length = size;
+
+   ret = sev_issue_cmd(kvm,
+   enc ? SEV_CMD_DBG_ENCRYPT : SEV_CMD_DBG_DECRYPT,
+   data, error);
+   kfree(data);
+   return ret;
+}
+
+/*
+ * Decrypt source memory into userspace or kernel buffer. If destination buffer
+ * or len is not aligned to 16-byte boundary then it uses intermediate buffer.
+ */
+static int __sev_dbg_dec(struct kvm *kvm, unsigned long paddr,
+unsigned long __user dst_uaddr,
+unsigned long dst_kaddr, unsigned long dst_paddr,
+int size, int *error)
+{
+   int ret, offset, len = size;
+   struct page *tpage = NULL;
+
+   /*
+* Debug command works with 16-byte aligned inputs, check if all inputs
+* (src, dst and len) are 16-byte aligned. If one of the input is not
+* aligned then we decrypt more than requested into a temporary buffer
+* and copy the porition of data into destination buffer.
+*/
+   if (!IS_ALIGNED(paddr, 16) || !IS_ALIGNED(dst_paddr, 16) ||
+   !IS_ALIGNED(size, 16)) {
+   tpage = (void *)alloc_page(GFP_KERNEL);
+   if (!tpage)
+   return -ENOMEM;
+
+   dst_paddr = __sme_page_pa(tpage);
+
+   /*
+* if source buffer is not aligned then offset will be used
+* when copying the data from the temporary buffer into
+* destination buffer.
+*/
+   offset = paddr & 15;
+
+   /* its safe to read more than requested size. */
+   len = round_up(size + offset, 16);
+
+   paddr = round_down(paddr, 16);
+   }
+
+   ret = __sev_dbg_enc_dec(kvm, paddr, dst_paddr, len, error, false);
+   /*
+* If temporary buffer is used then copy the data from temporary buffer
+* into destination buffer.
+*/
+   if (tpage) {
+
+   /*
+* If destination buffer is a userspace buffer then use
+* copy_to_user otherwise memcpy.
+*/
+   if (dst_uaddr) {
+   if (copy_to_user((uint8_t *)dst_uaddr,
+   page_address(tpage) + offset, size))
+   ret = -EFAULT;
+   } else {
+   memcpy((void *)dst_kaddr,
+   page_address(tpage) + offset, size);
+   }
+
+   __free_page(tpage);
+   }
+
+   return ret;
+}
+
+static int sev_dbg_decrypt(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   unsigned long vaddr, vaddr_end, next_vaddr;
+   unsigned long dst_vaddr, dst_vaddr_end;
+   struct page **srcpage, **dstpage;
+   struct kvm_sev_dbg debug;
+   unsigned long n;
+   int ret, size;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (copy_from_user(&debug, (void *)argp->data,
+   sizeof(struct kvm_sev_dbg)))
+   return -EFAULT;
+
+   vaddr = debug.src_addr;
+   size = debug.length;
+   vaddr_end = vaddr + size;
+   dst_vaddr = debug.dst_addr;
+   dst_vaddr_end = dst_vaddr + size;
+
+   for (; vaddr < vaddr_end; vaddr = next_vaddr) {
+   int len, s_off, d_off;
+
+   /* lock userspace source and destination page */
+   srcpage = sev_pin_memory(vaddr & PAGE_MASK, PAGE_SIZE, &n, 0);
+   if (!srcpage)
+   return -EFAULT;
+
+   dstpage = sev_pin_memory(dst_vaddr & PAGE_MASK, PAGE_SIZE,
+   &n, 1);
+   if (!dstpage) {
+   sev_unpin_memory(srcpage, n);
+   return -EFAULT;
+   }
+
+   /* flush the caches to ensure that DRAM has recent contents */
+   sev_clflush_pages(srcpage, 1);
+ 

[RFC Part2 PATCH v3 21/26] KVM: SVM: Add support for SEV DEBUG_ENCRYPT command

2017-07-24 Thread Brijesh Singh
The command copies plaintext into guest memory and encrypts it using
the VM encryption key. The command will be used for debug purposes
(e.g. setting a breakpoint through gdbserver).

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 174 +
 1 file changed, 174 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 933384a..75dcaa9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -6214,6 +6214,176 @@ static int sev_dbg_decrypt(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static int __sev_dbg_enc(struct kvm *kvm, unsigned long __user vaddr,
+unsigned long paddr, unsigned long __user dst_vaddr,
+unsigned long dst_paddr, int size, int *error)
+{
+   struct page *src_tpage = NULL;
+   struct page *dst_tpage = NULL;
+   int ret, len = size;
+
+   /*
+* Debug encrypt command works with 16-byte aligned inputs. Function
+* handles the alingment issue as below:
+*
+* case 1
+*  If source buffer is not 16-byte aligned then we copy the data from
+*  source buffer into a PAGE aligned intermediate (src_tpage) buffer
+*  and use this intermediate buffer as source buffer
+*
+* case 2
+*  If destination buffer or length is not 16-byte aligned then:
+*   - decrypt portion of destination buffer into intermediate buffer
+* (dst_tpage)
+*   - copy the source data into intermediate buffer
+*   - use the intermediate buffer as source buffer
+*/
+
+   /* If source is not aligned  (case 1) */
+   if (!IS_ALIGNED(vaddr, 16)) {
+   src_tpage = alloc_page(GFP_KERNEL);
+   if (!src_tpage)
+   return -ENOMEM;
+
+   if (copy_from_user(page_address(src_tpage),
+   (uint8_t *)vaddr, size)) {
+   __free_page(src_tpage);
+   return -EFAULT;
+   }
+   paddr = __sme_page_pa(src_tpage);
+
+   /* flush the caches to ensure that DRAM has recent contents */
+   clflush_cache_range(page_address(src_tpage), PAGE_SIZE);
+   }
+
+   /* If destination buffer or length is not aligned (case 2) */
+   if (!IS_ALIGNED(dst_vaddr, 16) || !IS_ALIGNED(size, 16)) {
+   int dst_offset;
+
+   dst_tpage = alloc_page(GFP_KERNEL);
+   if (!dst_tpage) {
+   ret = -ENOMEM;
+   goto e_free;
+   }
+
+   /* decrypt destination buffer into intermediate buffer */
+   ret = __sev_dbg_dec(kvm,
+   round_down(dst_paddr, 16),
+   0,
+   (unsigned long)page_address(dst_tpage),
+   __sme_page_pa(dst_tpage),
+   round_up(size, 16),
+   error);
+   if (ret)
+   goto e_free;
+
+   dst_offset = dst_paddr & 15;
+
+   /*
+* modify the intermediate buffer with data from source
+* buffer.
+*/
+   if (src_tpage)
+   memcpy(page_address(dst_tpage) + dst_offset,
+   page_address(src_tpage), size);
+   else {
+   if (copy_from_user(page_address(dst_tpage) + dst_offset,
+   (void *) vaddr, size)) {
+   ret = -EFAULT;
+   goto e_free;
+   }
+   }
+
+
+   /* use intermediate buffer as source */
+   paddr = __sme_page_pa(dst_tpage);
+
+   /* flush the caches to ensure that DRAM gets recent updates */
+   clflush_cache_range(page_address(dst_tpage), PAGE_SIZE);
+
+   /* now we have length and destination buffer aligned */
+   dst_paddr = round_down(dst_paddr, 16);
+   len = round_up(size, 16);
+   }
+
+   ret = __sev_dbg_enc_dec(kvm, paddr, dst_paddr, len, error, true);
+e_free:
+   if (src_tpage)
+   __free_page(src_tpage);
+   if (dst_tpage)
+   __free_page(dst_tpage);
+   return ret;
+}
+
+static int sev_dbg_encrypt(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   unsigned long vaddr, vaddr_end, dst_vaddr, next_vaddr;
+   struct kvm_sev_dbg debug;
+   int ret, size;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (copy_from_user(&debug, (void *)argp->data,
+   sizeof(struct kvm_sev_dbg)))
+   return -EFAULT;
+
+   size = debug.length;
+   vaddr = debug.src_addr;
+   vaddr_

[RFC Part2 PATCH v3 22/26] KVM: SVM: Pin guest memory when SEV is active

2017-07-24 Thread Brijesh Singh
The SEV memory encryption engine uses a tweak such that two identical
plaintexts at different locations will have different ciphertexts. So
swapping or moving the ciphertexts of two pages will not result in the
plaintexts being swapped. Relocating (or migrating) the physical backing
pages of an SEV guest will therefore require some additional steps. The
current SEV key management spec does not provide commands to swap or migrate
(move) ciphertexts. For now, we pin the guest memory registered through the
KVM_MEMORY_ENCRYPT_REGISTER_RAM ioctl.
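
A toy model of the tweak property described above (this is not the real SEV
cipher -- SEV uses AES with a hardware-derived, physical-address-based tweak;
the tweak function below is made up purely to show why a ciphertext cannot
simply be relocated to a different physical address):

#include <stdint.h>
#include <stdio.h>

/* toy tweak: any function of the physical address will do for the demo */
static uint8_t tweak(uint64_t pa) { return (uint8_t)((pa >> 12) * 0x9e); }

static uint8_t enc(uint8_t pt, uint64_t pa) { return pt ^ tweak(pa); }
static uint8_t dec(uint8_t ct, uint64_t pa) { return ct ^ tweak(pa); }

int main(void)
{
	uint8_t pt = 0x42;
	uint64_t pa1 = 0x1000, pa2 = 0x2000;
	uint8_t ct = enc(pt, pa1);	/* encrypted while resident at pa1 */

	printf("decrypt in place  : %#x\n", dec(ct, pa1));	/* 0x42 again */
	printf("decrypt after move: %#x\n", dec(ct, pa2));	/* scrambled */
	return 0;
}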

Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/kvm/svm.c  | 113 
 2 files changed, 114 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 150177e..a91aadf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -747,6 +747,7 @@ struct kvm_sev_info {
unsigned int handle;/* firmware handle */
unsigned int asid;  /* asid for this guest */
int sev_fd; /* SEV device fd */
+   struct list_head ram_list; /* list of registered ram */
 };
 
 struct kvm_arch {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 75dcaa9..cdb1cf3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -333,8 +333,19 @@ static int sev_asid_new(void);
 static void sev_asid_free(int asid);
 static void sev_deactivate_handle(struct kvm *kvm, int *error);
 static void sev_decommission_handle(struct kvm *kvm, int *error);
+static void sev_unpin_memory(struct page **pages, unsigned long npages);
+
 #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
 
+struct kvm_sev_pin_ram {
+   struct list_head list;
+   unsigned long npages;
+   struct page **pages;
+   struct kvm_memory_encrypt_ram userspace;
+};
+
+static void __mem_encrypt_unregister_ram(struct kvm_sev_pin_ram *ram);
+
 static bool svm_sev_enabled(void)
 {
return !!max_sev_asid;
@@ -385,6 +396,11 @@ static inline void sev_set_fd(struct kvm *kvm, int fd)
to_sev_info(kvm)->sev_fd = fd;
 }
 
+static inline struct list_head *sev_get_ram_list(struct kvm *kvm)
+{
+   return &to_sev_info(kvm)->ram_list;
+}
+
 static inline void mark_all_dirty(struct vmcb *vmcb)
 {
vmcb->control.clean = 0;
@@ -1566,10 +1582,24 @@ static void sev_firmware_uninit(void)
 static void sev_vm_destroy(struct kvm *kvm)
 {
int state, error;
+   struct list_head *pos, *q;
+   struct kvm_sev_pin_ram *ram;
+   struct list_head *head = sev_get_ram_list(kvm);
 
if (!sev_guest(kvm))
return;
 
+   /*
+* if userspace was terminated before unregistering the memory region
+* then lets unpin all the registered memory.
+*/
+   if (!list_empty(head)) {
+   list_for_each_safe(pos, q, head) {
+   ram = list_entry(pos, struct kvm_sev_pin_ram, list);
+   __mem_encrypt_unregister_ram(ram);
+   }
+   }
+
/* release the firmware resources for this guest */
if (sev_get_handle(kvm)) {
sev_deactivate_handle(kvm, &error);
@@ -5640,6 +5670,7 @@ static int sev_guest_init(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
sev_set_active(kvm);
sev_set_asid(kvm, asid);
sev_set_fd(kvm, argp->sev_fd);
+   INIT_LIST_HEAD(sev_get_ram_list(kvm));
ret = 0;
 e_err:
fdput(f);
@@ -6437,6 +6468,86 @@ static int svm_memory_encryption_op(struct kvm *kvm, 
void __user *argp)
return r;
 }
 
+static int mem_encrypt_register_ram(struct kvm *kvm,
+   struct kvm_memory_encrypt_ram *ram)
+{
+   struct list_head *head = sev_get_ram_list(kvm);
+   struct kvm_sev_pin_ram *pin_ram;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   pin_ram = kzalloc(sizeof(*pin_ram), GFP_KERNEL);
+   if (!pin_ram)
+   return -ENOMEM;
+
+   pin_ram->pages = sev_pin_memory(ram->address, ram->size,
+   &pin_ram->npages, 1);
+   if (!pin_ram->pages)
+   goto e_free;
+
+   /*
+* Guest may change the memory encryption attribute from C=0 -> C=1
+* for this memory range. Lets make sure caches are flushed to ensure
+* that guest data gets written into memory with correct C-bit.
+*/
+   sev_clflush_pages(pin_ram->pages, pin_ram->npages);
+
+   pin_ram->userspace.address = ram->address;
+   pin_ram->userspace.size = ram->size;
+   list_add_tail(&pin_ram->list, head);
+   return 0;
+e_free:
+   kfree(pin_ram);
+   return 1;
+}
+
+static struct kvm_sev_pin_ram *sev_find_pinned_ram(struct kvm *kvm,
+   struct kvm_memory_encrypt_ram *ram)
+{
+   struct list_head *head = sev_get_ram_li

[RFC Part2 PATCH v3 23/26] KVM: X86: Add memory encryption enabled ops

2017-07-24 Thread Brijesh Singh
Extend kvm_x86_ops to add a memory_encryption_enabled() op. It returns a
boolean indicating whether memory encryption is enabled on the VCPU.

Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/svm.c  | 8 
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a91aadf..a14d4dd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1073,6 +1073,7 @@ struct kvm_x86_ops {
struct kvm_memory_encrypt_ram *ram);
int (*memory_encryption_unregister_ram)(struct kvm *kvm,
struct kvm_memory_encrypt_ram *ram);
+   bool (*memory_encryption_enabled)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index cdb1cf3..0bbd050 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -6548,6 +6548,12 @@ static int mem_encrypt_unregister_ram(struct kvm *kvm,
return 0;
 }
 
+static bool mem_encrypt_enabled(struct kvm_vcpu *vcpu)
+{
+   return !!(to_svm(vcpu)->vmcb->control.nested_ctl &
+   SVM_NESTED_CTL_SEV_ENABLE);
+}
+
 static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -6664,6 +6670,8 @@ static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.memory_encryption_op = svm_memory_encryption_op,
.memory_encryption_register_ram = mem_encrypt_register_ram,
.memory_encryption_unregister_ram = mem_encrypt_unregister_ram,
+   .memory_encryption_enabled = mem_encrypt_enabled,
+
 };
 
 static int __init svm_init(void)
-- 
2.9.4



[RFC Part2 PATCH v3 24/26] KVM: SVM: Clear C-bit from the page fault address

2017-07-24 Thread Brijesh Singh
When SEV is active, on #NPF the page fault address will contain the C-bit.
We must clear the C-bit before handling the fault.
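
(__sme_clr() here simply masks off the C-bit from the address; a stand-alone
illustration follows, where the bit position 47 is only an example -- the real
position is reported by CPUID 0x8000001f[EBX] bits 5:0.)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	unsigned int c_bit = 47;			/* example position only */
	uint64_t sme_mask = 1ULL << c_bit;
	uint64_t exit_info_2 = 0x1234000ULL | sme_mask;	/* fault address with C-bit set */

	/*
	 * Equivalent of __sme_clr(): drop the encryption bit so the rest of
	 * the fault handling sees a plain guest physical address.
	 */
	uint64_t fault_address = exit_info_2 & ~sme_mask;

	printf("raw exit_info_2 = %#llx\n", (unsigned long long)exit_info_2);
	printf("fault address   = %#llx\n", (unsigned long long)fault_address);
	return 0;
}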

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0bbd050..64b9f60 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2321,7 +2321,7 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned 
long value)
 
 static int pf_interception(struct vcpu_svm *svm)
 {
-   u64 fault_address = svm->vmcb->control.exit_info_2;
+   u64 fault_address = __sme_clr(svm->vmcb->control.exit_info_2);
u64 error_code = svm->vmcb->control.exit_info_1;
 
return kvm_handle_page_fault(&svm->vcpu, error_code, fault_address,
-- 
2.9.4



[RFC Part2 PATCH v3 17/26] KVM: SVM: Add support for SEV LAUNCH_MEASURE command

2017-07-24 Thread Brijesh Singh
The command is used to retrieve the measurement of memory encrypted
through the LAUNCH_UPDATE_DATA command. This measurement can be used
for attestation purposes.

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 91b070f..9b672eb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5957,6 +5957,54 @@ static int sev_launch_update_data(struct kvm *kvm, 
struct kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_launch_measure(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct sev_data_launch_measure *data = NULL;
+   struct kvm_sev_launch_measure params;
+   void *addr = NULL;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (copy_from_user(¶ms, (void *)argp->data,
+   sizeof(struct kvm_sev_launch_measure)))
+   return -EFAULT;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   if (params.address && params.length) {
+   ret = -EFAULT;
+   addr = kzalloc(params.length, GFP_KERNEL);
+   if (!addr)
+   goto e_free;
+   data->address = __psp_pa(addr);
+   data->length = params.length;
+   }
+
+   data->handle = sev_get_handle(kvm);
+   ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_MEASURE, data, &argp->error);
+
+   /* copy the measurement to userspace */
+   if (addr &&
+   copy_to_user((void *)params.address, addr, params.length)) {
+   ret = -EFAULT;
+   goto e_free;
+   }
+
+   params.length = data->length;
+   if (copy_to_user((void *)argp->data, ¶ms,
+   sizeof(struct kvm_sev_launch_measure)))
+   ret = -EFAULT;
+
+e_free:
+   kfree(addr);
+   kfree(data);
+   return ret;
+}
+
 static int svm_memory_encryption_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -5980,6 +6028,10 @@ static int svm_memory_encryption_op(struct kvm *kvm, 
void __user *argp)
r = sev_launch_update_data(kvm, &sev_cmd);
break;
}
+   case KVM_SEV_LAUNCH_MEASURE: {
+   r = sev_launch_measure(kvm, &sev_cmd);
+   break;
+   }
default:
break;
}
-- 
2.9.4



[RFC Part2 PATCH v3 19/26] KVM: svm: Add support for SEV GUEST_STATUS command

2017-07-24 Thread Brijesh Singh
The command is used for querying the SEV guest status.

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7a77197..21f85e1 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -6024,6 +6024,40 @@ static int sev_launch_finish(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_guest_status(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct kvm_sev_guest_status params;
+   struct sev_data_guest_status *data;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (copy_from_user(¶ms, (void *) argp->data,
+   sizeof(struct kvm_sev_guest_status)))
+   return -EFAULT;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   data->handle = sev_get_handle(kvm);
+   ret = sev_issue_cmd(kvm, SEV_CMD_GUEST_STATUS, data, &argp->error);
+   if (ret)
+   goto e_free;
+
+   params.policy = data->policy;
+   params.state = data->state;
+   params.handle = data->handle;
+
+   if (copy_to_user((void *) argp->data, ¶ms,
+   sizeof(struct kvm_sev_guest_status)))
+   ret = -EFAULT;
+e_free:
+   kfree(data);
+   return ret;
+}
+
 static int svm_memory_encryption_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -6055,6 +6089,10 @@ static int svm_memory_encryption_op(struct kvm *kvm, 
void __user *argp)
r = sev_launch_finish(kvm, &sev_cmd);
break;
}
+   case KVM_SEV_GUEST_STATUS: {
+   r = sev_guest_status(kvm, &sev_cmd);
+   break;
+   }
default:
break;
}
-- 
2.9.4



[RFC Part2 PATCH v3 18/26] KVM: SVM: Add support for SEV LAUNCH_FINISH command

2017-07-24 Thread Brijesh Singh
The command is used for finalizing the SEV guest launch process.

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9b672eb..7a77197 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -6005,6 +6005,25 @@ static int sev_launch_measure(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_launch_finish(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct sev_data_launch_finish *data;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   data->handle = sev_get_handle(kvm);
+   ret = sev_issue_cmd(kvm, SEV_CMD_LAUNCH_FINISH, data, &argp->error);
+
+   kfree(data);
+   return ret;
+}
+
 static int svm_memory_encryption_op(struct kvm *kvm, void __user *argp)
 {
struct kvm_sev_cmd sev_cmd;
@@ -6032,6 +6051,10 @@ static int svm_memory_encryption_op(struct kvm *kvm, 
void __user *argp)
r = sev_launch_measure(kvm, &sev_cmd);
break;
}
+   case KVM_SEV_LAUNCH_FINISH: {
+   r = sev_launch_finish(kvm, &sev_cmd);
+   break;
+   }
default:
break;
}
-- 
2.9.4



[RFC Part2 PATCH v3 15/26] KVM: SVM: Add support for SEV LAUNCH_START command

2017-07-24 Thread Brijesh Singh
The command is used to bootstrap an SEV guest from unencrypted boot images.
The command creates a new VM encryption key (VEK) using the guest owner's
policy, public DH certificates, and session information.

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 165 +
 1 file changed, 165 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 72f7c27..3e325578 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -329,6 +329,8 @@ static unsigned int max_sev_asid;
 static unsigned long *sev_asid_bitmap;
 static int sev_asid_new(void);
 static void sev_asid_free(int asid);
+static void sev_deactivate_handle(struct kvm *kvm, int *error);
+static void sev_decommission_handle(struct kvm *kvm, int *error);
 
 static bool svm_sev_enabled(void)
 {
@@ -1565,6 +1567,12 @@ static void sev_vm_destroy(struct kvm *kvm)
if (!sev_guest(kvm))
return;
 
+   /* release the firmware resources for this guest */
+   if (sev_get_handle(kvm)) {
+   sev_deactivate_handle(kvm, &error);
+   sev_decommission_handle(kvm, &error);
+   }
+
sev_asid_free(sev_get_asid(kvm));
sev_firmware_uninit();
 }
@@ -5635,6 +5643,159 @@ static int sev_guest_init(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static int sev_issue_cmd(struct kvm *kvm, int id, void *data, int *error)
+{
+   int fd = sev_get_fd(kvm);
+   struct fd f;
+   int ret;
+
+   f = fdget(fd);
+   if (!f.file)
+   return -EBADF;
+
+   ret = sev_issue_cmd_external_user(f.file, id, data, error);
+   fdput(f);
+
+   return ret;
+}
+
+static void sev_decommission_handle(struct kvm *kvm, int *error)
+{
+   struct sev_data_decommission *data;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return;
+
+   data->handle = sev_get_handle(kvm);
+   sev_guest_decommission(data, error);
+   kfree(data);
+}
+
+static void sev_deactivate_handle(struct kvm *kvm, int *error)
+{
+   struct sev_data_deactivate *data;
+   int ret;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return;
+
+   data->handle = sev_get_handle(kvm);
+   ret = sev_guest_deactivate(data, error);
+   if (ret)
+   goto e_free;
+
+   wbinvd_on_all_cpus();
+
+   sev_guest_df_flush(error);
+e_free:
+   kfree(data);
+}
+
+static int sev_activate_asid(struct kvm *kvm, unsigned int handle, int *error)
+{
+   struct sev_data_activate *data;
+   int asid = sev_get_asid(kvm);
+   int ret;
+
+   wbinvd_on_all_cpus();
+
+   ret = sev_guest_df_flush(error);
+   if (ret)
+   return ret;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   data->handle = handle;
+   data->asid   = asid;
+   ret = sev_guest_activate(data, error);
+   if (ret)
+   goto e_err;
+
+   sev_set_handle(kvm, handle);
+e_err:
+   kfree(data);
+   return ret;
+}
+
+static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   struct sev_data_launch_start *start = NULL;
+   struct kvm_sev_launch_start params;
+   void *dh_cert_addr = NULL;
+   void *session_addr = NULL;
+   int *error = &argp->error;
+   int ret;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   ret = -EFAULT;
+   if (copy_from_user(¶ms, (void *)argp->data,
+   sizeof(struct kvm_sev_launch_start)))
+   goto e_free;
+
+   ret = -ENOMEM;
+   start = kzalloc(sizeof(*start), GFP_KERNEL);
+   if (!start)
+   goto e_free;
+
+   /* Bit 15:6 reserved, must be 0 */
+   start->policy = params.policy & ~0xffc0;
+
+   if (params.dh_cert_length && params.dh_cert_address) {
+   ret = -ENOMEM;
+   dh_cert_addr = kmalloc(params.dh_cert_length, GFP_KERNEL);
+   if (!dh_cert_addr)
+   goto e_free;
+
+   ret = -EFAULT;
+   if (copy_from_user(dh_cert_addr, (void *)params.dh_cert_address,
+   params.dh_cert_length))
+   goto e_free;
+
+   start->dh_cert_address = __sme_set(__pa(dh_cert_addr));
+   start->dh_cert_length = params.dh_cert_length;
+   }
+
+   if (params.session_length && params.session_address) {
+   ret = -ENOMEM;
+   session_addr = kmalloc(params.session_length, GFP_KERNEL);
+   if (!session_addr)
+   goto e_free;
+
+   ret = -EFAULT;
+   if (copy_from_user(session_addr, (void *)params.session_address,
+   params.session_length))
+   goto e

[RFC Part2 PATCH v3 16/26] KVM: SVM: Add support for SEV LAUNCH_UPDATE_DATA command

2017-07-24 Thread Brijesh Singh
The command is used for encrypting the guest memory region using the VM
encryption key (VEK) created during LAUNCH_START.
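
A minimal userspace sketch of the corresponding call, assuming KVM_SEV_LAUNCH_START
has already succeeded; the helper and variable names are illustrative only:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>   /* assumes headers with this series applied */

static int example_sev_launch_update_data(int vm_fd, int sev_fd,
                                          void *hva, uint32_t len)
{
    struct kvm_sev_launch_update_data update;
    struct kvm_sev_cmd cmd;

    memset(&update, 0, sizeof(update));
    update.address = (uintptr_t)hva;   /* userspace address backing guest RAM */
    update.length = len;               /* 16-byte aligned, per the spec */

    memset(&cmd, 0, sizeof(cmd));
    cmd.id = KVM_SEV_LAUNCH_UPDATE_DATA;
    cmd.data = (uintptr_t)&update;
    cmd.sev_fd = sev_fd;

    return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}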

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 165 +
 1 file changed, 165 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3e325578..91b070f 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -39,6 +39,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -331,6 +333,7 @@ static int sev_asid_new(void);
 static void sev_asid_free(int asid);
 static void sev_deactivate_handle(struct kvm *kvm, int *error);
 static void sev_decommission_handle(struct kvm *kvm, int *error);
+#define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
 
 static bool svm_sev_enabled(void)
 {
@@ -5796,6 +5799,164 @@ static int sev_launch_start(struct kvm *kvm, struct 
kvm_sev_cmd *argp)
return ret;
 }
 
+static struct page **sev_pin_memory(unsigned long uaddr, unsigned long ulen,
+   unsigned long *n, int write)
+{
+   unsigned long npages, pinned, size;
+   struct page **pages;
+   int first, last;
+
+   /* Get number of pages */
+   first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
+   last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
+   npages = (last - first + 1);
+
+   /* Avoid using vmalloc for smaller buffer */
+   size = npages * sizeof(struct page *);
+   if (size > PAGE_SIZE)
+   pages = vmalloc(size);
+   else
+   pages = kmalloc(size, GFP_KERNEL);
+
+   if (!pages)
+   return NULL;
+
+   /* pin the user virtual address */
+   pinned = get_user_pages_fast(uaddr, npages, write ? FOLL_WRITE : 0,
+   pages);
+   if (pinned != npages) {
+   pr_err("failed to pin %ld pages (got %ld)\n", npages, pinned);
+   goto err;
+   }
+
+   *n = npages;
+   return pages;
+err:
+   if (pinned > 0)
+   release_pages(pages, pinned, 0);
+   kvfree(pages);
+
+   return NULL;
+}
+
+static void sev_unpin_memory(struct page **pages, unsigned long npages)
+{
+   release_pages(pages, npages, 0);
+   kvfree(pages);
+}
+
+static void sev_clflush_pages(struct page *pages[], unsigned long npages)
+{
+   uint8_t *page_virtual;
+   unsigned long i;
+
+   if (npages == 0 || pages == NULL)
+   return;
+
+   for (i = 0; i < npages; i++) {
+   page_virtual = kmap_atomic(pages[i]);
+   clflush_cache_range(page_virtual, PAGE_SIZE);
+   kunmap_atomic(page_virtual);
+   }
+}
+
+static int get_num_contig_pages(int idx, struct page **inpages,
+   unsigned long npages)
+{
+   int i = idx + 1, pages = 1;
+   unsigned long paddr, next_paddr;
+
+   /* find the number of contiguous pages starting from idx */
+   paddr = __sme_page_pa(inpages[idx]);
+   while (i < npages) {
+   next_paddr = __sme_page_pa(inpages[i++]);
+   if ((paddr + PAGE_SIZE) == next_paddr) {
+   pages++;
+   paddr = next_paddr;
+   continue;
+   }
+   break;
+   }
+
+   return pages;
+}
+
+static int sev_launch_update_data(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   unsigned long vaddr, vaddr_end, next_vaddr, npages, size;
+   struct kvm_sev_launch_update_data params;
+   struct sev_data_launch_update_data *data;
+   struct page **inpages;
+   int i, ret, pages;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (copy_from_user(¶ms, (void *)argp->data,
+   sizeof(struct kvm_sev_launch_update_data)))
+   return -EFAULT;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   vaddr = params.address;
+   size = params.length;
+   vaddr_end = vaddr + size;
+
+   /* lock the user memory */
+   inpages = sev_pin_memory(vaddr, size, &npages, 1);
+   if (!inpages) {
+   ret = -ENOMEM;
+   goto e_free;
+   }
+
+   /*
+* invalidate the cache to ensure that DRAM has recent content before
+* calling the SEV commands.
+*/
+   sev_clflush_pages(inpages, npages);
+
+   for (i = 0; vaddr < vaddr_end; vaddr = next_vaddr, i += pages) {
+   int offset, len;
+
+   /*
+* since user buffer may not be page aligned, calculate the
+* offset within the page.
+*/
+   offset = vaddr & (PAGE_SIZE - 1);
+
+   /*
+* calculate the number of pages that can be encrypted in one go
+*/
+   pages = g

[RFC Part2 PATCH v3 13/26] KVM: SVM: Add KVM_SEV_INIT command

2017-07-24 Thread Brijesh Singh
The command initializes the SEV firmware and allocates a new ASID for
this guest from the SEV ASID pool. The firmware must be initialized before
we issue the guest launch command to create a new encryption context.
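
As a rough illustration (not part of this patch), userspace would issue this
command once per VM before any launch command; the /dev/sev node comes from the
CCP/PSP driver patches in this series, and the helper name is made up:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>   /* assumes headers with this series applied */

static int example_sev_guest_init(int vm_fd)
{
    struct kvm_sev_cmd cmd;
    int sev_fd = open("/dev/sev", O_RDWR);

    if (sev_fd < 0)
        return -1;

    memset(&cmd, 0, sizeof(cmd));
    cmd.id = KVM_SEV_INIT;   /* initialize the firmware, allocate an ASID */
    cmd.sev_fd = sev_fd;     /* passed to KVM so it can talk to the firmware */

    return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}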

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 188 -
 1 file changed, 187 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2a5a03a..e99a572 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -37,6 +37,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -321,6 +323,14 @@ enum {
 
 /* Secure Encrypted Virtualization */
 static unsigned int max_sev_asid;
+static unsigned long *sev_asid_bitmap;
+static int sev_asid_new(void);
+static void sev_asid_free(int asid);
+
+static bool svm_sev_enabled(void)
+{
+   return !!max_sev_asid;
+}
 
 static inline struct kvm_sev_info *to_sev_info(struct kvm *kvm)
 {
@@ -1093,6 +1103,12 @@ static __init void sev_hardware_setup(void)
if (!nguests)
return;
 
+   /* Initialize SEV ASID bitmap */
+   sev_asid_bitmap = kcalloc(BITS_TO_LONGS(nguests),
+ sizeof(unsigned long), GFP_KERNEL);
+   if (IS_ERR(sev_asid_bitmap))
+   return;
+
max_sev_asid = nguests;
 }
 
@@ -1184,10 +1200,18 @@ static __init int svm_hardware_setup(void)
return r;
 }
 
+static __exit void sev_hardware_unsetup(void)
+{
+   kfree(sev_asid_bitmap);
+}
+
 static __exit void svm_hardware_unsetup(void)
 {
int cpu;
 
+   if (svm_sev_enabled())
+   sev_hardware_unsetup();
+
for_each_possible_cpu(cpu)
svm_cpu_uninit(cpu);
 
@@ -1373,6 +1397,9 @@ static void init_vmcb(struct vcpu_svm *svm)
svm->vmcb->control.virt_ext |= 
VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
}
 
+   if (sev_guest(svm->vcpu.kvm))
+   svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ENABLE;
+
mark_all_dirty(svm->vmcb);
 
enable_gif(svm);
@@ -1483,6 +1510,51 @@ static inline int avic_free_vm_id(int id)
return 0;
 }
 
+static int sev_platform_get_state(int *state, int *error)
+{
+   int ret;
+   struct sev_data_status *data;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   ret = sev_platform_status(data, error);
+   if (!ret)
+   *state = data->state;
+
+   kfree(data);
+   return ret;
+}
+
+static void sev_firmware_uninit(void)
+{
+   int rc, state, error;
+
+   rc = sev_platform_get_state(&state, &error);
+   if (rc) {
+   pr_err("SEV: failed to get firmware state (%#x)\n",
+   error);
+   return;
+   }
+
+   /* If we are in initialized state then uninitialize it */
+   if (state == SEV_STATE_INIT)
+   sev_platform_shutdown(&error);
+
+}
+
+static void sev_vm_destroy(struct kvm *kvm)
+{
+   int state, error;
+
+   if (!sev_guest(kvm))
+   return;
+
+   sev_asid_free(sev_get_asid(kvm));
+   sev_firmware_uninit();
+}
+
 static void avic_vm_destroy(struct kvm *kvm)
 {
unsigned long flags;
@@ -1503,6 +1575,12 @@ static void avic_vm_destroy(struct kvm *kvm)
spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
 }
 
+static void svm_vm_destroy(struct kvm *kvm)
+{
+   avic_vm_destroy(kvm);
+   sev_vm_destroy(kvm);
+}
+
 static int avic_vm_init(struct kvm *kvm)
 {
unsigned long flags;
@@ -5428,6 +5506,112 @@ static void svm_setup_mce(struct kvm_vcpu *vcpu)
vcpu->arch.mcg_cap &= 0x1ff;
 }
 
+static int sev_asid_new(void)
+{
+   int pos;
+
+   if (!max_sev_asid)
+   return -EINVAL;
+
+   pos = find_first_zero_bit(sev_asid_bitmap, max_sev_asid);
+   if (pos >= max_sev_asid)
+   return -EBUSY;
+
+   set_bit(pos, sev_asid_bitmap);
+   return pos + 1;
+}
+
+static void sev_asid_free(int asid)
+{
+   int pos;
+
+   pos = asid - 1;
+   clear_bit(pos, sev_asid_bitmap);
+}
+
+static int sev_firmware_init(int *error)
+{
+   int ret, state;
+
+   ret = sev_platform_get_state(&state, error);
+   if (ret)
+   return ret;
+
+   /*
+* If SEV firmware is in uninitialized state, lets initialize the
+* firmware before issuing guest commands.
+*/
+   if (state == SEV_STATE_UNINIT) {
+   struct sev_data_init *data;
+
+   data = kzalloc(sizeof(*data), GFP_KERNEL);
+   if (!data)
+   return -ENOMEM;
+
+   ret = sev_platform_init(data, error);
+   kfree(data);
+   }
+
+   return ret;
+}
+
+static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   int asid, ret;
+   struct

[RFC Part2 PATCH v3 14/26] KVM: SVM: VMRUN should use associated ASID when SEV is enabled

2017-07-24 Thread Brijesh Singh
SEV hardware uses ASIDs to associate a memory encryption key with a
guest VM. At guest creation time, we use the SEV_CMD_ACTIVATE
command to bind a particular ASID to the guest. Let's make sure that
the VMCB is programmed with the bound ASID before a VMRUN.

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 50 +-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e99a572..72f7c27 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -213,6 +213,9 @@ struct vcpu_svm {
 */
struct list_head ir_list;
spinlock_t ir_list_lock;
+
+   /* which host cpu was used for running this vcpu */
+   unsigned int last_cpuid;
 };
 
 /*
@@ -573,6 +576,8 @@ struct svm_cpu_data {
struct kvm_ldttss_desc *tss_desc;
 
struct page *save_area;
+
+   struct vmcb **sev_vmcbs;  /* index = sev_asid, value = vmcb pointer */
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -886,6 +891,7 @@ static void svm_cpu_uninit(int cpu)
return;
 
per_cpu(svm_data, raw_smp_processor_id()) = NULL;
+   kfree(sd->sev_vmcbs);
__free_page(sd->save_area);
kfree(sd);
 }
@@ -904,6 +910,14 @@ static int svm_cpu_init(int cpu)
if (!sd->save_area)
goto err_1;
 
+   if (svm_sev_enabled()) {
+   sd->sev_vmcbs = kmalloc((max_sev_asid + 1) * sizeof(void *),
+   GFP_KERNEL);
+   r = -ENOMEM;
+   if (!sd->sev_vmcbs)
+   goto err_1;
+   }
+
per_cpu(svm_data, cpu) = sd;
 
return 0;
@@ -4442,12 +4456,40 @@ static void reload_tss(struct kvm_vcpu *vcpu)
load_TR_desc();
 }
 
+static void pre_sev_run(struct vcpu_svm *svm)
+{
+   int cpu = raw_smp_processor_id();
+   int asid = sev_get_asid(svm->vcpu.kvm);
+   struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
+
+   /* Assign the asid allocated with this SEV guest */
+   svm->vmcb->control.asid = asid;
+
+   /*
+* Flush guest TLB:
+*
+* 1) when different VMCB for the same ASID is to be run on the same 
host CPU.
+* 2) or this VMCB was executed on different host cpu in previous 
VMRUNs.
+*/
+   if (sd->sev_vmcbs[asid] == svm->vmcb &&
+   svm->last_cpuid == cpu)
+   return;
+
+   svm->last_cpuid = cpu;
+   sd->sev_vmcbs[asid] = svm->vmcb;
+   svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ASID;
+   mark_dirty(svm->vmcb, VMCB_ASID);
+}
+
 static void pre_svm_run(struct vcpu_svm *svm)
 {
int cpu = raw_smp_processor_id();
 
struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 
+   if (sev_guest(svm->vcpu.kvm))
+   return pre_sev_run(svm);
+
/* FIXME: handle wraparound of asid_generation */
if (svm->asid_generation != sd->asid_generation)
new_asid(svm, sd);
@@ -5523,10 +5565,16 @@ static int sev_asid_new(void)
 
 static void sev_asid_free(int asid)
 {
-   int pos;
+   struct svm_cpu_data *sd;
+   int pos, cpu;
 
pos = asid - 1;
clear_bit(pos, sev_asid_bitmap);
+
+   for_each_possible_cpu(cpu) {
+   sd = per_cpu(svm_data, cpu);
+   sd->sev_vmcbs[pos] = NULL;
+   }
 }
 
 static int sev_firmware_init(int *error)
-- 
2.9.4



[RFC Part2 PATCH v3 11/26] KVM: X86: Extend struct kvm_arch to include SEV information

2017-07-24 Thread Brijesh Singh
The patch adds a new member (sev_info) in 'struct kvm_arch', and
setter/getter functions for the sev_info field.

Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/kvm_host.h |  9 +
 arch/x86/kvm/svm.c  | 45 +
 2 files changed, 54 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4295f82..150177e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -742,6 +742,13 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT,/* created with KVM_CAP_SPLIT_IRQCHIP */
 };
 
+struct kvm_sev_info {
+   bool active;/* SEV enabled guest */
+   unsigned int handle;/* firmware handle */
+   unsigned int asid;  /* asid for this guest */
+   int sev_fd; /* SEV device fd */
+};
+
 struct kvm_arch {
unsigned int n_used_mmu_pages;
unsigned int n_requested_mmu_pages;
@@ -829,6 +836,8 @@ struct kvm_arch {
 
bool x2apic_format;
bool x2apic_broadcast_quirk_disabled;
+
+   struct kvm_sev_info sev_info;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 256c9df..2a5a03a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -322,6 +322,51 @@ enum {
 /* Secure Encrypted Virtualization */
 static unsigned int max_sev_asid;
 
+static inline struct kvm_sev_info *to_sev_info(struct kvm *kvm)
+{
+   return &kvm->arch.sev_info;
+}
+
+static inline void sev_set_active(struct kvm *kvm)
+{
+   to_sev_info(kvm)->active = true;
+}
+
+static inline unsigned int sev_get_handle(struct kvm *kvm)
+{
+   return to_sev_info(kvm)->handle;
+}
+
+static inline bool sev_guest(struct kvm *kvm)
+{
+   return to_sev_info(kvm)->active;
+}
+
+static inline int sev_get_asid(struct kvm *kvm)
+{
+   return to_sev_info(kvm)->asid;
+}
+
+static inline int sev_get_fd(struct kvm *kvm)
+{
+   return to_sev_info(kvm)->sev_fd;
+}
+
+static inline void sev_set_asid(struct kvm *kvm, int asid)
+{
+   to_sev_info(kvm)->asid = asid;
+}
+
+static inline void sev_set_handle(struct kvm *kvm, unsigned int handle)
+{
+   to_sev_info(kvm)->handle = handle;
+}
+
+static inline void sev_set_fd(struct kvm *kvm, int fd)
+{
+   to_sev_info(kvm)->sev_fd = fd;
+}
+
 static inline void mark_all_dirty(struct vmcb *vmcb)
 {
vmcb->control.clean = 0;
-- 
2.9.4



[RFC Part2 PATCH v3 12/26] KVM: Define SEV key management command id

2017-07-24 Thread Brijesh Singh
Define the Secure Encrypted Virtualization (SEV) key management command ids
and structures. The command definitions are available in the SEV KM spec
0.14 [1] and in Documentation/virtual/kvm/amd-memory-encryption.txt

[1] http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf

Signed-off-by: Brijesh Singh 
---
 include/uapi/linux/kvm.h | 148 +++
 1 file changed, 148 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6074065..8decc88 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1367,6 +1367,154 @@ struct kvm_memory_encrypt_ram {
__u64 size;
 };
 
+/* Secure Encrypted Virtualization command */
+enum sev_cmd_id {
+   /* Guest initialization commands */
+   KVM_SEV_INIT = 0,
+   KVM_SEV_ES_INIT,
+   /* Guest launch commands */
+   KVM_SEV_LAUNCH_START,
+   KVM_SEV_LAUNCH_UPDATE_DATA,
+   KVM_SEV_LAUNCH_UPDATE_VMSA,
+   KVM_SEV_LAUNCH_SECRET,
+   KVM_SEV_LAUNCH_MEASURE,
+   KVM_SEV_LAUNCH_FINISH,
+   /* Guest migration commands (outgoing) */
+   KVM_SEV_SEND_START,
+   KVM_SEV_SEND_UPDATE_DATA,
+   KVM_SEV_SEND_UPDATE_VMSA,
+   KVM_SEV_SEND_FINISH,
+   /* Guest migration commands (incoming) */
+   KVM_SEV_RECEIVE_START,
+   KVM_SEV_RECEIVE_UPDATE_DATA,
+   KVM_SEV_RECEIVE_UPDATE_VMSA,
+   KVM_SEV_RECEIVE_FINISH,
+   /* Guest status and debug commands */
+   KVM_SEV_GUEST_STATUS,
+   KVM_SEV_DBG_DECRYPT,
+   KVM_SEV_DBG_ENCRYPT,
+   /* Guest certificates commands */
+   KVM_SEV_CERT_EXPORT,
+
+   KVM_SEV_NR_MAX,
+};
+
+struct kvm_sev_cmd {
+   __u32 id;
+   __u64 data;
+   __u32 error;
+   __u32 sev_fd;
+};
+
+struct kvm_sev_launch_start {
+   __u32 handle;
+   __u32 policy;
+   __u64 dh_cert_address;
+   __u32 dh_cert_length;
+   __u64 session_address;
+   __u32 session_length;
+};
+
+struct kvm_sev_launch_update_data {
+   __u64 address;
+   __u32 length;
+};
+
+struct kvm_sev_launch_update_vmsa {
+   __u64 address;
+   __u32 length;
+};
+
+struct kvm_sev_launch_secret {
+   __u64 hdr_address;
+   __u32 hdr_length;
+   __u64 guest_address;
+   __u32 guest_length;
+   __u64 trans_address;
+   __u32 trans_length;
+};
+
+struct kvm_sev_launch_measure {
+   __u64 address;
+   __u32 length;
+};
+
+struct kvm_sev_send_start {
+   __u32 policy;
+   __u64 pdh_cert_address;
+   __u32 pdh_cert_length;
+   __u64 plat_cert_address;
+   __u32 plat_cert_length;
+   __u64 amd_cert_address;
+   __u32 amd_cert_length;
+   __u64 session_address;
+   __u32 session_length;
+};
+
+struct kvm_sev_send_update_data {
+   __u64 hdr_address;
+   __u32 hdr_length;
+   __u64 guest_address;
+   __u32 guest_length;
+   __u64 trans_address;
+   __u32 trans_length;
+};
+
+struct kvm_sev_send_update_vmsa {
+   __u64 hdr_address;
+   __u32 hdr_length;
+   __u64 guest_address;
+   __u32 guest_length;
+   __u64 trans_address;
+   __u32 trans_length;
+};
+
+struct kvm_sev_receive_start {
+   __u32 handle;
+   __u32 policy;
+   __u64 pdh_cert_address;
+   __u32 pdh_cert_length;
+   __u64 session_address;
+   __u32 session_length;
+};
+
+struct kvm_sev_receive_update_data {
+   __u64 hdr_address;
+   __u32 hdr_length;
+   __u64 guest_address;
+   __u32 guest_length;
+   __u64 trans_address;
+   __u32 trans_length;
+};
+
+struct kvm_sev_receive_update_vmsa {
+   __u64 hdr_address;
+   __u32 hdr_length;
+   __u64 guest_address;
+   __u32 guest_length;
+   __u64 trans_address;
+   __u32 trans_length;
+};
+
+struct kvm_sev_guest_status {
+   __u32 handle;
+   __u32 policy;
+   __u32 state;
+};
+
+struct kvm_sev_dbg {
+   __u64 src_addr;
+   __u64 dst_addr;
+   __u32 length;
+};
+
+struct kvm_sev_cert_export {
+   __u64 pdh_cert_address;
+   __u32 pdh_cert_length;
+   __u64 cert_chain_address;
+   __u32 cert_chain_length;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX   (1 << 2)
-- 
2.9.4



[RFC Part2 PATCH v3 10/26] KVM: Introduce KVM_MEMORY_ENCRYPT_REGISTER/UNREGISTER_RAM ioctl

2017-07-24 Thread Brijesh Singh
If the hardware supports memory encryption, then the KVM_MEMORY_ENCRYPT_REGISTER_RAM
and KVM_MEMORY_ENCRYPT_UNREGISTER_RAM ioctls can be used by userspace to register/
unregister the guest memory regions which may contain encrypted
data (e.g. guest RAM, PCI BAR, SMRAM etc.).
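
A hedged userspace sketch of registering one such region; the helper and buffer
names are made up for illustration and are not part of this patch:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>   /* assumes headers with this series applied */

static int example_register_encrypted_ram(int vm_fd, void *hva, uint64_t size)
{
    struct kvm_memory_encrypt_ram ram;

    memset(&ram, 0, sizeof(ram));
    ram.address = (uintptr_t)hva;   /* userspace address of the region */
    ram.size = size;                /* size of the region in bytes */

    return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_REGISTER_RAM, &ram);
}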

Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/kvm_host.h |  4 
 arch/x86/kvm/x86.c  | 36 
 include/uapi/linux/kvm.h|  9 +
 3 files changed, 49 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 99a0e11..4295f82 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1059,6 +1059,10 @@ struct kvm_x86_ops {
void (*setup_mce)(struct kvm_vcpu *vcpu);
 
int (*memory_encryption_op)(struct kvm *kvm, void __user *argp);
+   int (*memory_encryption_register_ram)(struct kvm *kvm,
+   struct kvm_memory_encrypt_ram *ram);
+   int (*memory_encryption_unregister_ram)(struct kvm *kvm,
+   struct kvm_memory_encrypt_ram *ram);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c9d3ff5..8febdb5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3982,6 +3982,24 @@ static int kvm_vm_ioctl_memory_encryption_op(struct kvm 
*kvm, void __user *argp)
return -ENOTTY;
 }
 
+static int kvm_vm_ioctl_mem_encrypt_register_ram(struct kvm *kvm,
+   struct kvm_memory_encrypt_ram *ram)
+{
+   if (kvm_x86_ops->memory_encryption_register_ram)
+   return kvm_x86_ops->memory_encryption_register_ram(kvm, ram);
+
+   return -ENOTTY;
+}
+
+static int kvm_vm_ioctl_mem_encrypt_unregister_ram(struct kvm *kvm,
+   struct kvm_memory_encrypt_ram *ram)
+{
+   if (kvm_x86_ops->memory_encryption_unregister_ram)
+   return kvm_x86_ops->memory_encryption_unregister_ram(kvm, ram);
+
+   return -ENOTTY;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -4246,6 +4264,24 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_vm_ioctl_memory_encryption_op(kvm, argp);
break;
}
+   case KVM_MEMORY_ENCRYPT_REGISTER_RAM: {
+   struct kvm_memory_encrypt_ram ram;
+
+   r = -EFAULT;
+   if (copy_from_user(&ram, argp, sizeof(ram)))
+   goto out;
+   r = kvm_vm_ioctl_mem_encrypt_register_ram(kvm, &ram);
+   break;
+   }
+   case KVM_MEMORY_ENCRYPT_UNREGISTER_RAM: {
+   struct kvm_memory_encrypt_ram ram;
+
+   r = -EFAULT;
+   if (copy_from_user(&ram, argp, sizeof(ram)))
+   goto out;
+   r = kvm_vm_ioctl_mem_encrypt_unregister_ram(kvm, &ram);
+   break;
+   }
default:
r = -ENOTTY;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ab3b711..6074065 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1357,6 +1357,15 @@ struct kvm_s390_ucas_mapping {
 #define KVM_S390_SET_CMMA_BITS  _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
 /* Memory Encryption Commands */
 #define KVM_MEMORY_ENCRYPT_OP_IOWR(KVMIO, 0xba, unsigned long)
+#define KVM_MEMORY_ENCRYPT_REGISTER_RAM   _IOR(KVMIO, 0xbb, \
+   struct kvm_memory_encrypt_ram)
+#define KVM_MEMORY_ENCRYPT_UNREGISTER_RAM  _IOR(KVMIO, 0xbc, \
+   struct kvm_memory_encrypt_ram)
+
+struct kvm_memory_encrypt_ram {
+   __u64 address;
+   __u64 size;
+};
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
-- 
2.9.4



[RFC Part2 PATCH v3 08/26] KVM: X86: Extend CPUID range to include new leaf

2017-07-24 Thread Brijesh Singh
This CPUID leaf provides the memory encryption support information on
AMD platforms. The complete description of the CPUID leaf is available
in APM volume 2, Section 15.34.
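
For reference, a small userspace sketch (not part of this patch) that dumps the
same leaf; the bit layout follows APM volume 2, Section 15.34:

#include <stdio.h>
#include <cpuid.h>   /* GCC/Clang __get_cpuid() helper */

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Fn8000_001F: AMD memory encryption capabilities */
    if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx))
        return 1;

    printf("SME supported:   %u\n", eax & 1);          /* EAX bit 0 */
    printf("SEV supported:   %u\n", (eax >> 1) & 1);   /* EAX bit 1 */
    printf("C-bit position:  %u\n", ebx & 0x3f);       /* EBX bits 5:0 */
    printf("SEV guests max:  %u\n", ecx);              /* ECX bits 31:0 */
    return 0;
}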

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/cpuid.c | 2 +-
 arch/x86/kvm/svm.c   | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 59ca2ee..372e969 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -599,7 +599,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
entry->edx = 0;
break;
case 0x80000000:
-   entry->eax = min(entry->eax, 0x8000001a);
+   entry->eax = min(entry->eax, 0x8000001f);
break;
case 0x8001:
entry->edx &= kvm_cpuid_8000_0001_edx_x86_features;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1cd7c78..256c9df 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5131,6 +5131,12 @@ static void svm_set_supported_cpuid(u32 func, struct 
kvm_cpuid_entry2 *entry)
entry->edx |= SVM_FEATURE_NPT;
 
break;
+   case 0x8000001F:
+   /* Support memory encryption cpuid if host supports it */
+   if (boot_cpu_has(X86_FEATURE_SEV))
+   cpuid(0x8000001f, &entry->eax, &entry->ebx,
+   &entry->ecx, &entry->edx);
+
}
 }
 
-- 
2.9.4



[RFC Part2 PATCH v3 07/26] KVM: SVM: Add SEV feature definitions to KVM

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

Define the SEV enable bit for the VMCB control structure. The hypervisor
will use this bit to enable SEV in the guest.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/svm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index a3d9e0b..0be01f9 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -140,6 +140,7 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define SVM_VM_CR_SVM_DIS_MASK  0x0010ULL
 
 #define SVM_NESTED_CTL_NP_ENABLE   BIT(0)
+#define SVM_NESTED_CTL_SEV_ENABLE  BIT(1)
 
 struct __attribute__ ((__packed__)) vmcb_seg {
u16 selector;
-- 
2.9.4



[RFC Part2 PATCH v3 09/26] KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl

2017-07-24 Thread Brijesh Singh
If the hardware supports memory encryption, then the KVM_MEMORY_ENCRYPT_OP ioctl can
be used by qemu to issue platform-specific memory encryption commands.
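
A minimal sketch of how qemu-side code might wrap the new ioctl, assuming the
command ids and struct kvm_sev_cmd from the other patches in this series; the
wrapper name and the firmware error handling are assumptions:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>   /* assumes headers with this series applied */

static int example_sev_ioctl(int vm_fd, int sev_fd, uint32_t id, void *data,
                             uint32_t *fw_error)
{
    struct kvm_sev_cmd cmd;
    int ret;

    memset(&cmd, 0, sizeof(cmd));
    cmd.id = id;                  /* e.g. KVM_SEV_INIT, KVM_SEV_LAUNCH_START */
    cmd.data = (uintptr_t)data;   /* command-specific structure, if any */
    cmd.sev_fd = sev_fd;          /* /dev/sev descriptor */

    ret = ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
    if (fw_error)
        *fw_error = cmd.error;    /* firmware status filled in by KVM */
    return ret;
}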

Signed-off-by: Brijesh Singh 
Reviewed-by: Paolo Bonzini 
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c  | 12 
 include/uapi/linux/kvm.h|  2 ++
 3 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7cbaab5..99a0e11 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1057,6 +1057,8 @@ struct kvm_x86_ops {
void (*cancel_hv_timer)(struct kvm_vcpu *vcpu);
 
void (*setup_mce)(struct kvm_vcpu *vcpu);
+
+   int (*memory_encryption_op)(struct kvm *kvm, void __user *argp);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 88be1aa..c9d3ff5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3974,6 +3974,14 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
return r;
 }
 
+static int kvm_vm_ioctl_memory_encryption_op(struct kvm *kvm, void __user 
*argp)
+{
+   if (kvm_x86_ops->memory_encryption_op)
+   return kvm_x86_ops->memory_encryption_op(kvm, argp);
+
+   return -ENOTTY;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
@@ -4234,6 +4242,10 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_vm_ioctl_enable_cap(kvm, &cap);
break;
}
+   case KVM_MEMORY_ENCRYPT_OP: {
+   r = kvm_vm_ioctl_memory_encryption_op(kvm, argp);
+   break;
+   }
default:
r = -ENOTTY;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6cd63c1..ab3b711 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1355,6 +1355,8 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_S390_CMMA_MIGRATION */
 #define KVM_S390_GET_CMMA_BITS  _IOWR(KVMIO, 0xb8, struct 
kvm_s390_cmma_log)
 #define KVM_S390_SET_CMMA_BITS  _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
+/* Memory Encryption Commands */
+#define KVM_MEMORY_ENCRYPT_OP_IOWR(KVMIO, 0xba, unsigned long)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
-- 
2.9.4



[RFC Part2 PATCH v3 04/26] KVM: SVM: Prepare to reserve asid for SEV guest

2017-07-24 Thread Brijesh Singh
In the current implementation, ASID allocation starts from 1. This patch
adds a min_asid variable to the svm_cpu_data structure to allow the starting
ASID to be something other than 1.

Signed-off-by: Brijesh Singh 
Reviewed-by: Paolo Bonzini 
---
 arch/x86/kvm/svm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6af04dd..46f41bb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -511,6 +511,7 @@ struct svm_cpu_data {
u64 asid_generation;
u32 max_asid;
u32 next_asid;
+   u32 min_asid;
struct kvm_ldttss_desc *tss_desc;
 
struct page *save_area;
@@ -768,6 +769,7 @@ static int svm_hardware_enable(void)
sd->asid_generation = 1;
sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
sd->next_asid = sd->max_asid + 1;
+   sd->min_asid = 1;
 
gdt = get_current_gdt_rw();
sd->tss_desc = (struct kvm_ldttss_desc *)(gdt + GDT_ENTRY_TSS);
@@ -2072,7 +2074,7 @@ static void new_asid(struct vcpu_svm *svm, struct 
svm_cpu_data *sd)
 {
if (sd->next_asid > sd->max_asid) {
++sd->asid_generation;
-   sd->next_asid = 1;
+   sd->next_asid = sd->min_asid;
svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
}
 
-- 
2.9.4



[RFC Part2 PATCH v3 06/26] KVM: SVM: Prepare for new bit definition in nested_ctl

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

Currently the nested_ctl variable in the vmcb_control_area structure is
used to indicate nested paging support. The nested paging support field
is actually defined as bit 0 of the field. In order to support a new
feature flag the usage of the nested_ctl and nested paging support must
be converted to operate on a single bit.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/svm.h | 2 ++
 arch/x86/kvm/svm.c | 7 ---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 58fffe7..a3d9e0b 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -139,6 +139,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define SVM_VM_CR_SVM_LOCK_MASK 0x0008ULL
 #define SVM_VM_CR_SVM_DIS_MASK  0x0010ULL
 
+#define SVM_NESTED_CTL_NP_ENABLE   BIT(0)
+
 struct __attribute__ ((__packed__)) vmcb_seg {
u16 selector;
u16 attrib;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 06bd902..1cd7c78 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1296,7 +1296,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 
if (npt_enabled) {
/* Setup VMCB for Nested Paging */
-   control->nested_ctl = 1;
+   control->nested_ctl |= SVM_NESTED_CTL_NP_ENABLE;
clr_intercept(svm, INTERCEPT_INVLPG);
clr_exception_intercept(svm, PF_VECTOR);
clr_cr_intercept(svm, INTERCEPT_CR3_READ);
@@ -2904,7 +2904,8 @@ static bool nested_vmcb_checks(struct vmcb *vmcb)
if (vmcb->control.asid == 0)
return false;
 
-   if (vmcb->control.nested_ctl && !npt_enabled)
+   if ((vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE) &&
+   !npt_enabled)
return false;
 
return true;
@@ -2979,7 +2980,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
else
svm->vcpu.arch.hflags &= ~HF_HIF_MASK;
 
-   if (nested_vmcb->control.nested_ctl) {
+   if (nested_vmcb->control.nested_ctl & SVM_NESTED_CTL_NP_ENABLE) {
kvm_mmu_unload(&svm->vcpu);
svm->nested.nested_cr3 = nested_vmcb->control.nested_cr3;
nested_svm_init_mmu_context(&svm->vcpu);
-- 
2.9.4



[RFC Part2 PATCH v3 03/26] crypto: ccp: Add Secure Encrypted Virtualization (SEV) device support

2017-07-24 Thread Brijesh Singh
AMD's new Secure Encrypted Virtualization (SEV) feature allows the memory
contents of a virtual machine to be transparently encrypted with a key
unique to the guest VM. The programming and management of the encryption
keys are handled by the AMD Secure Processor (AMD-SP), which exposes the
commands for these tasks. The complete spec for the various commands is
available at:
http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf

This patch extends the AMD-SP driver to provide:

 - in-kernel APIs to communicate with the SEV device. The APIs can be used
   by the hypervisor to create an encryption context for the SEV guests
   (a usage sketch follows below).

 - a userspace IOCTL to manage the platform certificates etc.
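
A hedged sketch of how a hypervisor-side consumer might use the in-kernel API;
it mirrors what the KVM patches elsewhere in this series do, and the helper name
is illustrative only:

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/psp-sev.h>   /* in-kernel API added by this patch */

static int example_sev_firmware_init(int *psp_error)
{
    struct sev_data_status *status;
    int ret;

    status = kzalloc(sizeof(*status), GFP_KERNEL);
    if (!status)
        return -ENOMEM;

    /* query the current firmware state */
    ret = sev_platform_status(status, psp_error);
    if (!ret && status->state == SEV_STATE_UNINIT) {
        struct sev_data_init *init = kzalloc(sizeof(*init), GFP_KERNEL);

        if (!init) {
            ret = -ENOMEM;
        } else {
            /* move the firmware to the INIT state before guest commands */
            ret = sev_platform_init(init, psp_error);
            kfree(init);
        }
    }

    kfree(status);
    return ret;
}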

Cc: Herbert Xu 
Cc: David S. Miller 
Cc: Gary Hook 
Cc: linux-cry...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 drivers/crypto/ccp/Kconfig   |  10 +
 drivers/crypto/ccp/Makefile  |   1 +
 drivers/crypto/ccp/psp-dev.c |   4 +
 drivers/crypto/ccp/psp-dev.h |  27 ++
 drivers/crypto/ccp/sev-dev.c | 416 ++
 drivers/crypto/ccp/sev-dev.h |  67 +
 drivers/crypto/ccp/sev-ops.c | 457 +
 drivers/crypto/ccp/sp-pci.c  |   2 +-
 include/linux/psp-sev.h  | 683 +++
 include/uapi/linux/psp-sev.h | 110 +++
 10 files changed, 1776 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/ccp/sev-dev.c
 create mode 100644 drivers/crypto/ccp/sev-dev.h
 create mode 100644 drivers/crypto/ccp/sev-ops.c
 create mode 100644 include/linux/psp-sev.h
 create mode 100644 include/uapi/linux/psp-sev.h

diff --git a/drivers/crypto/ccp/Kconfig b/drivers/crypto/ccp/Kconfig
index 41c0ff5..ae0ff1c 100644
--- a/drivers/crypto/ccp/Kconfig
+++ b/drivers/crypto/ccp/Kconfig
@@ -40,3 +40,13 @@ config CRYPTO_DEV_SP_PSP
 Provide the support for AMD Platform Security Processor (PSP) device
 which can be used for communicating with Secure Encryption 
Virtualization
 (SEV) firmware.
+
+config CRYPTO_DEV_PSP_SEV
+   bool "Secure Encrypted Virtualization (SEV) interface"
+   default y
+   depends on CRYPTO_DEV_CCP_DD
+   depends on CRYPTO_DEV_SP_PSP
+   help
+Provide the kernel and userspace (/dev/sev) interface to communicate 
with
+Secure Encrypted Virtualization (SEV) firmware running inside AMD 
Platform
+Security Processor (PSP)
diff --git a/drivers/crypto/ccp/Makefile b/drivers/crypto/ccp/Makefile
index 8aae4ff..94ca748 100644
--- a/drivers/crypto/ccp/Makefile
+++ b/drivers/crypto/ccp/Makefile
@@ -8,6 +8,7 @@ ccp-$(CONFIG_CRYPTO_DEV_SP_CCP) += ccp-dev.o \
ccp-debugfs.o
 ccp-$(CONFIG_PCI) += sp-pci.o
 ccp-$(CONFIG_CRYPTO_DEV_SP_PSP) += psp-dev.o
+ccp-$(CONFIG_CRYPTO_DEV_PSP_SEV) += sev-dev.o sev-ops.o
 
 obj-$(CONFIG_CRYPTO_DEV_CCP_CRYPTO) += ccp-crypto.o
 ccp-crypto-objs := ccp-crypto-main.o \
diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
index bb0ea9a..0c9d25c 100644
--- a/drivers/crypto/ccp/psp-dev.c
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -97,6 +97,7 @@ irqreturn_t psp_irq_handler(int irq, void *data)
 static int psp_init(struct psp_device *psp)
 {
psp_add_device(psp);
+   sev_dev_init(psp);
 
return 0;
 }
@@ -166,17 +167,20 @@ void psp_dev_destroy(struct sp_device *sp)
struct psp_device *psp = sp->psp_data;
 
sp_free_psp_irq(sp, psp);
+   sev_dev_destroy(psp);
 
psp_del_device(psp);
 }
 
 int psp_dev_resume(struct sp_device *sp)
 {
+   sev_dev_resume(sp->psp_data);
return 0;
 }
 
 int psp_dev_suspend(struct sp_device *sp, pm_message_t state)
 {
+   sev_dev_suspend(sp->psp_data, state);
return 0;
 }
 
diff --git a/drivers/crypto/ccp/psp-dev.h b/drivers/crypto/ccp/psp-dev.h
index 6e167b8..9334d87 100644
--- a/drivers/crypto/ccp/psp-dev.h
+++ b/drivers/crypto/ccp/psp-dev.h
@@ -78,5 +78,32 @@ int psp_free_tee_irq(struct psp_device *psp, void *data);
 struct psp_device *psp_get_master_device(void);
 
 extern const struct psp_vdata psp_entry;
+#ifdef CONFIG_CRYPTO_DEV_PSP_SEV
+
+int sev_dev_init(struct psp_device *psp);
+void sev_dev_destroy(struct psp_device *psp);
+int sev_dev_resume(struct psp_device *psp);
+int sev_dev_suspend(struct psp_device *psp, pm_message_t state);
+
+#else /* !CONFIG_CRYPTO_DEV_PSP_SEV */
+
+static inline int sev_dev_init(struct psp_device *psp)
+{
+   return -ENODEV;
+}
+
+static inline void sev_dev_destroy(struct psp_device *psp) { }
+
+static inline int sev_dev_resume(struct psp_device *psp)
+{
+   return -ENODEV;
+}
+
+static inline int sev_dev_suspend(struct psp_device *psp, pm_message_t state)
+{
+   return -ENODEV;
+}
+
+#endif /* CONFIG_CRYPTO_DEV_PSP_SEV */
 
 #endif /* __PSP_DEV_H */
diff --git a/drivers/crypto/ccp/sev-dev.c b/drivers/crypto/ccp/sev-dev.c
new file mode 100644
index 000..a2b41dd
--- /dev/null
+++ b/drivers/crypto/ccp/sev-dev.c
@@ -0,0 +1,416 @@
+/*
+ * AMD Secure Encrypted Virtualizat

[RFC Part2 PATCH v3 05/26] KVM: SVM: Reserve ASID range for SEV guest

2017-07-24 Thread Brijesh Singh
An SEV-enabled guest must use ASIDs from the defined subset, while non-SEV
guests can use the remaining ASID range. The range of ASIDs allowed for
SEV-enabled guests is from 1 to a maximum value defined via CPUID
Fn8000_001F[ECX].

Signed-off-by: Brijesh Singh 
---
 arch/x86/kvm/svm.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 46f41bb..06bd902 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -319,6 +319,9 @@ enum {
 
 #define VMCB_AVIC_APIC_BAR_MASK0xFF000ULL
 
+/* Secure Encrypted Virtualization */
+static unsigned int max_sev_asid;
+
 static inline void mark_all_dirty(struct vmcb *vmcb)
 {
vmcb->control.clean = 0;
@@ -769,7 +772,7 @@ static int svm_hardware_enable(void)
sd->asid_generation = 1;
sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
sd->next_asid = sd->max_asid + 1;
-   sd->min_asid = 1;
+   sd->min_asid = max_sev_asid + 1;
 
gdt = get_current_gdt_rw();
sd->tss_desc = (struct kvm_ldttss_desc *)(gdt + GDT_ENTRY_TSS);
@@ -1033,6 +1036,21 @@ static int avic_ga_log_notifier(u32 ga_tag)
return 0;
 }
 
+static __init void sev_hardware_setup(void)
+{
+   int nguests;
+
+   /*
+* Get maximum number of encrypted guests supported: Fn8000_001F[ECX]
+* Bit 31:0: Number of supported guest
+*/
+   nguests = cpuid_ecx(0x8000001F);
+   if (!nguests)
+   return;
+
+   max_sev_asid = nguests;
+}
+
 static __init int svm_hardware_setup(void)
 {
int cpu;
@@ -1063,6 +1081,9 @@ static __init int svm_hardware_setup(void)
kvm_tsc_scaling_ratio_frac_bits = 32;
}
 
+   if (boot_cpu_has(X86_FEATURE_SEV))
+   sev_hardware_setup();
+
if (nested) {
printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
-- 
2.9.4



[RFC Part2 PATCH v3 02/26] crypto: ccp: Add Platform Security Processor (PSP) device support

2017-07-24 Thread Brijesh Singh
The Platform Security Processor (PSP) is part of the AMD Secure Processor
(AMD-SP). The PSP is a dedicated processor that provides support for the key
management commands in Secure Encrypted Virtualization (SEV) mode, along with
a software-based Trusted Execution Environment (TEE) to enable third-party
trusted applications.

Cc: Herbert Xu 
Cc: David S. Miller 
Cc: Gary Hook 
Cc: linux-cry...@vger.kernel.org
Signed-off-by: Brijesh Singh 
---
 drivers/crypto/ccp/Kconfig   |   9 ++
 drivers/crypto/ccp/Makefile  |   1 +
 drivers/crypto/ccp/psp-dev.c | 226 +++
 drivers/crypto/ccp/psp-dev.h |  82 
 drivers/crypto/ccp/sp-dev.c  |  43 
 drivers/crypto/ccp/sp-dev.h  |  41 +++-
 drivers/crypto/ccp/sp-pci.c  |  46 +
 7 files changed, 447 insertions(+), 1 deletion(-)
 create mode 100644 drivers/crypto/ccp/psp-dev.c
 create mode 100644 drivers/crypto/ccp/psp-dev.h

diff --git a/drivers/crypto/ccp/Kconfig b/drivers/crypto/ccp/Kconfig
index 15b63fd..41c0ff5 100644
--- a/drivers/crypto/ccp/Kconfig
+++ b/drivers/crypto/ccp/Kconfig
@@ -31,3 +31,12 @@ config CRYPTO_DEV_CCP_CRYPTO
  Support for using the cryptographic API with the AMD Cryptographic
  Coprocessor. This module supports offload of SHA and AES algorithms.
  If you choose 'M' here, this module will be called ccp_crypto.
+
+config CRYPTO_DEV_SP_PSP
+   bool "Platform Security Processor device"
+   default y
+   depends on CRYPTO_DEV_CCP_DD
+   help
+Provide the support for AMD Platform Security Processor (PSP) device
+which can be used for communicating with Secure Encryption 
Virtualization
+(SEV) firmware.
diff --git a/drivers/crypto/ccp/Makefile b/drivers/crypto/ccp/Makefile
index 5f2adc5..8aae4ff 100644
--- a/drivers/crypto/ccp/Makefile
+++ b/drivers/crypto/ccp/Makefile
@@ -7,6 +7,7 @@ ccp-$(CONFIG_CRYPTO_DEV_SP_CCP) += ccp-dev.o \
ccp-dmaengine.o \
ccp-debugfs.o
 ccp-$(CONFIG_PCI) += sp-pci.o
+ccp-$(CONFIG_CRYPTO_DEV_SP_PSP) += psp-dev.o
 
 obj-$(CONFIG_CRYPTO_DEV_CCP_CRYPTO) += ccp-crypto.o
 ccp-crypto-objs := ccp-crypto-main.o \
diff --git a/drivers/crypto/ccp/psp-dev.c b/drivers/crypto/ccp/psp-dev.c
new file mode 100644
index 000..bb0ea9a
--- /dev/null
+++ b/drivers/crypto/ccp/psp-dev.c
@@ -0,0 +1,226 @@
+/*
+ * AMD Platform Security Processor (PSP) interface
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Brijesh Singh 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "sp-dev.h"
+#include "psp-dev.h"
+
+static LIST_HEAD(psp_devs);
+static DEFINE_SPINLOCK(psp_devs_lock);
+
+const struct psp_vdata psp_entry = {
+   .offset = 0x10500,
+};
+
+void psp_add_device(struct psp_device *psp)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&psp_devs_lock, flags);
+
+   list_add_tail(&psp->entry, &psp_devs);
+
+   spin_unlock_irqrestore(&psp_devs_lock, flags);
+}
+
+void psp_del_device(struct psp_device *psp)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&psp_devs_lock, flags);
+
+   list_del(&psp->entry);
+   spin_unlock_irqrestore(&psp_devs_lock, flags);
+}
+
+static struct psp_device *psp_alloc_struct(struct sp_device *sp)
+{
+   struct device *dev = sp->dev;
+   struct psp_device *psp;
+
+   psp = devm_kzalloc(dev, sizeof(*psp), GFP_KERNEL);
+   if (!psp)
+   return NULL;
+
+   psp->dev = dev;
+   psp->sp = sp;
+
+   snprintf(psp->name, sizeof(psp->name), "psp-%u", sp->ord);
+
+   return psp;
+}
+
+irqreturn_t psp_irq_handler(int irq, void *data)
+{
+   unsigned int status;
+   irqreturn_t ret = IRQ_HANDLED;
+   struct psp_device *psp = data;
+
+   /* read the interrupt status */
+   status = ioread32(psp->io_regs + PSP_P2CMSG_INTSTS);
+
+   /* invoke subdevice interrupt handlers */
+   if (status) {
+   if (psp->sev_irq_handler)
+   ret = psp->sev_irq_handler(irq, psp->sev_irq_data);
+   if (psp->tee_irq_handler)
+   ret = psp->tee_irq_handler(irq, psp->tee_irq_data);
+   }
+
+   /* clear the interrupt status */
+   iowrite32(status, psp->io_regs + PSP_P2CMSG_INTSTS);
+
+   return ret;
+}
+
+static int psp_init(struct psp_device *psp)
+{
+   psp_add_device(psp);
+
+   return 0;
+}
+
+int psp_dev_init(struct sp_device *sp)
+{
+   struct device *dev = sp->dev;
+   struct psp_device *psp;
+   int ret;
+
+   ret = -ENOMEM;
+   psp = psp_alloc_st

[RFC Part2 PATCH v3 01/26] Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization (SEV)

2017-07-24 Thread Brijesh Singh
Create a Documentation entry to describe the AMD Secure Encrypted
Virtualization (SEV) feature.

Signed-off-by: Brijesh Singh 
---
 .../virtual/kvm/amd-memory-encryption.txt  | 328 +
 1 file changed, 328 insertions(+)
 create mode 100644 Documentation/virtual/kvm/amd-memory-encryption.txt

diff --git a/Documentation/virtual/kvm/amd-memory-encryption.txt 
b/Documentation/virtual/kvm/amd-memory-encryption.txt
new file mode 100644
index 000..cffed2d
--- /dev/null
+++ b/Documentation/virtual/kvm/amd-memory-encryption.txt
@@ -0,0 +1,328 @@
+Secure Encrypted Virtualization (SEV) is a feature found on AMD processors.
+
+SEV is an extension to the AMD-V architecture which supports running virtual
+machine (VMs) under the control of a hypervisor. When enabled, the memory
+contents of VM will be transparently encrypted with a key unique to the VM.
+
+The hypervisor can determine SEV support through the CPUID instruction. The CPUID
+function 0x8000001f reports information related to SEV:
+
+   0x8000001f[eax]:
+   Bit[1]  indicates support for SEV
+
+   0x8000001f[ecx]:
+   Bits[31:0]  Number of encrypted guests supported simultaneously
+
+If support for SEV is present, MSR 0xc0010010 (MSR_K8_SYSCFG) and MSR
+0xc0010015 (MSR_K7_HWCR) can be used to determine if it can be enabled:
+
+   0xc0010010:
+   Bit[23]    1 = memory encryption can be enabled
+              0 = memory encryption can not be enabled
+
+   0xc0010015:
+   Bit[0]     0 = memory encryption can not be enabled
+              1 = memory encryption can be enabled
+
+When SEV support is available, it can be enabled on specific VM during the 
VMRUN
+instruction by setting SEV bit in VMCB offset 090h:
+
+   VMCB offset 090h:
+   Bit[1]  1 = Enable SEV
+
+SEV hardware uses ASIDs to associate memory encryption key with the guest VMs.
+Hence the ASID for the SEV-enabled guests must be from 1 to a maximum value
+defined through the CPUID function 0x8000001f[ECX].
+
+
+SEV Key Management
+--
+
+The Key management for the SEV guest is handled by a separate processor known 
as
+the AMD Secure Processor (AMD-SP). Firmware running inside the AMD-SP provides 
a
+secure key management interface to perform common hypervisor activities such as
+encrypting bootstrap code, snapshotting, migrating and debugging the guest. For
+more information, see the SEV Key Management spec:
+
+http://support.amd.com/TechDocs/55766_SEV-KM%20API_Specification.pdf
+
+1. KVM_SEV_LAUNCH_START
+
+Parameters: struct  kvm_sev_launch_start (in/out)
+Returns: 0 on success, -negative on error
+
+LAUNCH_START command is used to bootstrap a guest by encrypting its memory with
+a new VM Encryption Key (VEK). In order to create guest context, hypervisor 
should
+provide guest policy, owner's public Diffie-Hellman (PDH) key and session 
parameters.
+
+The guest policy constrains the use and features activated for the lifetime of 
the
+launched guest, such as disallowing debugging, enabling key sharing, or 
turning on
+other SEV related features.
+
+The guest owner's PDH allows the firmware to establish a cryptographic session 
with
+the guest owner to negotiate keys used for attestation.
+
+The session parameters contain information such as guest policy MAC, 
transport
+integrity key (TIK), transport encryption key (TEK) etc.
+
+struct kvm_sev_launch_start {
+
+   /* Guest Handle, if zero then FW creates a new handle */
+   __u32 handle;
+
+   /* Guest policy */
+   __u32 policy;
+
+   /* Address which contains guest owner's PDH certificate blob */
+   __u64 dh_cert_address;
+   __u32 dh_cert_length;
+
+   /* Address which contains guest session information blob */
+   __u64 session_address;
+   __u32 session_length;
+};
+
+On success, the 'handle' field contains a new handle.
+
+2. KVM_SEV_LAUNCH_UPDATE_DATA
+
+Parameters (in): struct  kvm_sev_launch_update
+Returns: 0 on success, -negative on error
+
+LAUNCH_UPDATE_DATA encrypts the memory region using the VEK created during
+LAUNCH_START. It also calculates a measurement of the memory region. This
+measurement can be used as a signature of the memory contents.
+
+struct kvm_sev_launch_update {
+   /* address of the data to be encrypted (must be 16-byte aligned) */
+   __u64 address;
+
+   /* length of the data to be encrypted (must be 16-byte aligned) */
+   __u32 length;
+};
+
+3. KVM_SEV_LAUNCH_MEASURE
+
+Parameters (in): struct  kvm_sev_launch_measure
+Returns: 0 on success, -negative on error
+
+LAUNCH_MEASURE returns the measurement of the memory region encrypted with
+LAUNCH_UPDATE_DATA. The measurement is keyed with the TIK so that the guest
+owner can use the measurement to verify the guest was properly launched without
+tampering.
+
+struct kvm_sev_launch_measure {
+   /* where to copy the m

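Pulling the commands documented above together, a hedged userspace sketch of the
basic launch flow (LAUNCH_MEASURE and all error/cleanup paths omitted); the
descriptor handling and helper names are illustrative, not a definitive
implementation:

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>   /* assumes headers with this series applied */

static int example_sev_cmd(int vm_fd, int sev_fd, uint32_t id, void *data)
{
    struct kvm_sev_cmd cmd;

    memset(&cmd, 0, sizeof(cmd));
    cmd.id = id;
    cmd.data = (uintptr_t)data;
    cmd.sev_fd = sev_fd;
    return ioctl(vm_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}

static int example_sev_launch(int vm_fd, void *guest_mem, uint32_t mem_len)
{
    struct kvm_sev_launch_start start = { .policy = 0 };
    struct kvm_sev_launch_update_data update = {
        .address = (uintptr_t)guest_mem,
        .length  = mem_len,
    };
    int sev_fd = open("/dev/sev", O_RDWR);

    if (sev_fd < 0)
        return -1;

    /* INIT -> LAUNCH_START -> LAUNCH_UPDATE_DATA -> LAUNCH_FINISH */
    if (example_sev_cmd(vm_fd, sev_fd, KVM_SEV_INIT, NULL) ||
        example_sev_cmd(vm_fd, sev_fd, KVM_SEV_LAUNCH_START, &start) ||
        example_sev_cmd(vm_fd, sev_fd, KVM_SEV_LAUNCH_UPDATE_DATA, &update) ||
        example_sev_cmd(vm_fd, sev_fd, KVM_SEV_LAUNCH_FINISH, NULL))
        return -1;

    return 0;
}
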
[RFC Part2 PATCH v3 00/26] x86: Secure Encrypted Virtualization (AMD)

2017-07-24 Thread Brijesh Singh
This part of the Secure Encrypted Virtualization (SEV) patch series focuses on the
KVM changes required to create and manage SEV guests.

SEV is an extension to the AMD-V architecture which supports running encrypted
virtual machines (VMs) under the control of a hypervisor. Encrypted VMs have their
pages (code and data) secured such that only the guest itself has access to an
unencrypted version. Each encrypted VM is associated with a unique encryption key;
if its data is accessed by a different entity using a different key, the encrypted
guest's data will be incorrectly decrypted, leading to unintelligible data.
This security model ensures that the hypervisor is no longer able to inspect or
alter any guest code or data.

The key management of this feature is handled by a separate processor known as
the AMD Secure Processor (AMD-SP) which is present on AMD SOCs. The SEV Key
Management Specification (see below) provides a set of commands which can be
used by hypervisor to load virtual machine keys through the AMD-SP driver.

The patch series adds a new ioctl to the KVM driver (KVM_MEMORY_ENCRYPT_OP). The
ioctl will be used by qemu to issue the SEV guest-specific commands defined in the
Key Management Specification.

The following links provide additional details:

AMD Memory Encryption whitepaper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34

Secure Encrypted Virtualization Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf

KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf

SEV Guest BIOS support:
  SEV support has been integrated into EDKII/OVMF BIOS
  https://github.com/tianocore/edk2

RFC part1:
http://marc.info/?l=kvm&m=150092330804060&w=2

---
This RFC is based on tip/master commit : 22db3de (Merge branch 'x86/mm').
Complete git tree is available: https://github.com/codomania/tip/tree/sev-rfc-3

TODO:
 * Add SEV guest migration command support

Cc: Herbert Xu 
Cc: David S. Miller 
Cc: Gary Hook 
Cc: linux-cry...@vger.kernel.org

Changes since v2:
 * Add KVM_MEMORY_ENCRYPT_REGISTER/UNREGISTER_RAM ioctl to register encrypted
   memory ranges (recommended by Paolo)
 * Extend kvm_x86_ops to provide new memory_encryption_enabled ops
 * Enhance DEBUG DECRYPT/ENCRYPT commands to work with more than one page 
(recommended by Paolo)
 * Optimize LAUNCH_UPDATE command to reduce the number of calls to AMD-SP driver
 * Changes to address v2 feedbacks

Brijesh Singh (24):
  Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization
(SEV)
  crypto: ccp: Add Platform Security Processor (PSP) device support
  crypto: ccp: Add Secure Encrypted Virtualization (SEV) device support
  KVM: SVM: Prepare to reserve asid for SEV guest
  KVM: SVM: Reserve ASID range for SEV guest
  KVM: X86: Extend CPUID range to include new leaf
  KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctl
  KVM: Introduce KVM_MEMORY_ENCRYPT_REGISTER/UNREGISTER_RAM ioctl
  KVM: X86: Extend struct kvm_arch to include SEV information
  KVM: Define SEV key management command id
  KVM: SVM: Add KVM_SEV_INIT command
  KVM: SVM: VMRUN should use associated ASID when SEV is enabled
  KVM: SVM: Add support for SEV LAUNCH_START command
  KVM: SVM: Add support for SEV LAUNCH_UPDATE_DATA command
  KVM: SVM: Add support for SEV LAUNCH_MEASURE command
  KVM: SVM: Add support for SEV LAUNCH_FINISH command
  KVM: svm: Add support for SEV GUEST_STATUS command
  KVM: SVM: Add support for SEV DEBUG_DECRYPT command
  KVM: SVM: Add support for SEV DEBUG_ENCRYPT command
  KVM: SVM: Pin guest memory when SEV is active
  KVM: X86: Add memory encryption enabled ops
  KVM: SVM: Clear C-bit from the page fault address
  KVM: SVM: Do not install #UD intercept when SEV is enabled
  KVM: X86: Restart the guest when insn_len is zero and SEV is enabled

Tom Lendacky (2):
  KVM: SVM: Prepare for new bit definition in nested_ctl
  KVM: SVM: Add SEV feature definitions to KVM

 .../virtual/kvm/amd-memory-encryption.txt  |  328 ++
 arch/x86/include/asm/kvm_host.h|   17 +
 arch/x86/include/asm/svm.h |3 +
 arch/x86/kvm/cpuid.c   |2 +-
 arch/x86/kvm/mmu.c |   17 +
 arch/x86/kvm/svm.c | 1221 +++-
 arch/x86/kvm/x86.c |   48 +
 drivers/crypto/ccp/Kconfig |   19 +
 drivers/crypto/ccp/Makefile|2 +
 drivers/crypto/ccp/psp-dev.c   |  230 
 drivers/crypto/ccp/psp-dev.h   |  109 ++
 drivers/crypto/ccp/sev-dev.c   |  416 +++
 drivers/crypto/ccp/sev-dev.h 

[RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

Secure Encrypted Virtualization (SEV) does not support string I/O, so
unroll the string I/O operation into a loop operating on one element at
a time.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/io.h | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..2f3c002 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port) 
\
\
 static inline void outs##bwl(int port, const void *addr, unsigned long count) \
 {  \
-   asm volatile("rep; outs" #bwl   \
-: "+S"(addr), "+c"(count) : "d"(port));\
+   if (sev_active()) { \
+   unsigned type *value = (unsigned type *)addr;   \
+   while (count) { \
+   out##bwl(*value, port); \
+   value++;\
+   count--;\
+   }   \
+   } else {\
+   asm volatile("rep; outs" #bwl   \
+: "+S"(addr), "+c"(count) : "d"(port));\
+   }   \
 }  \
\
 static inline void ins##bwl(int port, void *addr, unsigned long count) \
 {  \
-   asm volatile("rep; ins" #bwl\
-: "+D"(addr), "+c"(count) : "d"(port));\
+   if (sev_active()) { \
+   unsigned type *value = (unsigned type *)addr;   \
+   while (count) { \
+   *value = in##bwl(port); \
+   value++;\
+   count--;\
+   }   \
+   } else {\
+   asm volatile("rep; ins" #bwl\
+: "+D"(addr), "+c"(count) : "d"(port));\
+   }   \
 }
 
 BUILDIO(b, b, char)
-- 
2.9.4



[RFC Part1 PATCH v3 14/17] x86/boot: Add early boot support when running with SEV active

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

Early in the boot process, add checks to determine if the kernel is
running with Secure Encrypted Virtualization (SEV) active.

Checking for SEV requires checking that the kernel is running under a
hypervisor (CPUID 0x0001, bit 31), that the SEV feature is available
(CPUID 0x801f, bit 1) and then check a non-interceptable SEV MSR
(0xc0010131, bit 0).

This check is required so that, during early compressed kernel booting, the
pagetables (both the boot pagetables and the KASLR pagetables, if enabled) are
updated to include the encryption mask so that the kernel is
decompressed into encrypted memory.

After the kernel is decompressed and continues booting, the same logic is
used to check if SEV is active and set a flag indicating so.  This allows
us to distinguish between SME and SEV, each of which has unique
differences in how certain things are handled: e.g. DMA (always bounce
buffered with SEV) or EFI tables (always accessed decrypted with SME).
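
A hedged C sketch of the detection sequence described above (the real code does
this in assembly early in boot and also validates the maximum extended CPUID
leaf); the helper name and the open-coded MSR number are illustrative:

#include <linux/types.h>
#include <asm/processor.h>   /* cpuid_eax()/cpuid_ecx() */
#include <asm/msr.h>         /* rdmsrl() */

static bool example_sev_check(void)
{
    unsigned long long status;

    /* 1) must be running under a hypervisor: CPUID 0x00000001, ECX bit 31 */
    if (!(cpuid_ecx(0x00000001) & (1u << 31)))
        return false;

    /* 2) SEV must be advertised: CPUID 0x8000001f, EAX bit 1 */
    if (!(cpuid_eax(0x8000001f) & (1u << 1)))
        return false;

    /* 3) SEV must be enabled for this guest: MSR 0xc0010131, bit 0 */
    rdmsrl(0xc0010131, status);
    return status & 1;
}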

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/boot/compressed/Makefile  |   2 +
 arch/x86/boot/compressed/head_64.S |  16 +
 arch/x86/boot/compressed/mem_encrypt.S | 103 +
 arch/x86/boot/compressed/misc.h|   2 +
 arch/x86/boot/compressed/pagetable.c   |   8 ++-
 arch/x86/include/asm/mem_encrypt.h |   3 +
 arch/x86/include/asm/msr-index.h   |   3 +
 arch/x86/include/uapi/asm/kvm_para.h   |   1 -
 arch/x86/mm/mem_encrypt.c  |  42 +++---
 9 files changed, 169 insertions(+), 11 deletions(-)
 create mode 100644 arch/x86/boot/compressed/mem_encrypt.S

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 2c860ad..d2fe901 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -72,6 +72,8 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o 
$(obj)/misc.o \
$(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
$(obj)/piggy.o $(obj)/cpuflags.o
 
+vmlinux-objs-$(CONFIG_X86_64) += $(obj)/mem_encrypt.o
+
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
 ifdef CONFIG_X86_64
diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index fbf4c32..6179d43 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -130,6 +130,19 @@ ENTRY(startup_32)
  /*
   * Build early 4G boot pagetable
   */
+   /*
+    * If SEV is active then set the encryption mask in the page tables.
+    * This will insure that when the kernel is copied and decompressed
+    * it will be done so encrypted.
+    */
+   call    get_sev_encryption_bit
+   xorl    %edx, %edx
+   testl   %eax, %eax
+   jz      1f
+   subl    $32, %eax   /* Encryption bit is always above bit 31 */
+   bts     %eax, %edx  /* Set encryption mask for page tables */
+1:
+
/* Initialize Page tables to 0 */
    leal    pgtable(%ebx), %edi
    xorl    %eax, %eax
@@ -140,12 +153,14 @@ ENTRY(startup_32)
    leal    pgtable + 0(%ebx), %edi
    leal    0x1007 (%edi), %eax
    movl    %eax, 0(%edi)
+   addl    %edx, 4(%edi)
 
/* Build Level 3 */
    leal    pgtable + 0x1000(%ebx), %edi
    leal    0x1007(%edi), %eax
    movl    $4, %ecx
 1: movl    %eax, 0x00(%edi)
+   addl    %edx, 0x04(%edi)
    addl    $0x1000, %eax
    addl    $8, %edi
    decl    %ecx
@@ -156,6 +171,7 @@ ENTRY(startup_32)
    movl    $0x00000183, %eax
    movl    $2048, %ecx
 1: movl    %eax, 0(%edi)
+   addl    %edx, 4(%edi)
    addl    $0x00200000, %eax
    addl    $8, %edi
    decl    %ecx
diff --git a/arch/x86/boot/compressed/mem_encrypt.S 
b/arch/x86/boot/compressed/mem_encrypt.S
new file mode 100644
index 000..696716e
--- /dev/null
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -0,0 +1,103 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2017 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+
+   .text
+   .code32
+ENTRY(get_sev_encryption_bit)
+   xor %eax, %eax
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   push    %ebx
+   push    %ecx
+   push    %edx
+
+   /* Check if running under a hypervisor */
+   movl    $1, %eax
+   cpuid
+   bt      $31, %ecx   /* Check the hypervisor bit */
+   jnc     .Lno_sev
+
+   movl    $0x80000000, %eax   /* CPUID to check the highest leaf */
+   cpuid
+   cmpl    $0x8000001f, %eax   /* See if 0x8000001f is available */
+   jb  .Lno_sev
+
+   /*
+* Check

[RFC Part1 PATCH v3 15/17] x86: Add support for changing memory encryption attribute in early boot

2017-07-24 Thread Brijesh Singh
Some KVM-specific custom MSRs share a guest physical address with the
hypervisor. When SEV is active, the shared physical address must be mapped
with the encryption attribute cleared so that both hypervisor and guest can
access the data.

Add APIs to change memory encryption attribute in early boot code.
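
As a usage sketch only (hypothetical, not part of this patch): a caller that
wants to share a page-sized, page-aligned buffer with the hypervisor would
clear the C-bit on it during early boot roughly like this, where 'buf' is a
made-up name:

static int __init share_page_with_hv(void *buf)
{
        phys_addr_t pa = slow_virt_to_phys(buf);
        int ret;

        /* Clear the C-bit so the hypervisor can read/write the page */
        ret = early_set_memory_decrypted(pa, PAGE_SIZE);
        if (ret)
                return ret;

        /* ... hand 'pa' to the hypervisor, e.g. via an MSR write ... */

        return 0;
}

Patch 16/17 in this series follows exactly this pattern for the KVM shared
per-CPU variables.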

Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/mem_encrypt.h |  17 ++
 arch/x86/mm/mem_encrypt.c  | 117 +
 2 files changed, 134 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 9cb6472..30b539e 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -46,6 +46,11 @@ void __init sme_early_init(void);
 void __init sme_encrypt_kernel(void);
 void __init sme_enable(struct boot_params *bp);
 
+int __init early_set_memory_decrypted(resource_size_t paddr,
+ unsigned long size);
+int __init early_set_memory_encrypted(resource_size_t paddr,
+ unsigned long size);
+
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void);
 
@@ -69,6 +74,18 @@ static inline void __init sme_early_init(void) { }
 static inline void __init sme_encrypt_kernel(void) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
+static inline int __init early_set_memory_decrypted(resource_size_t paddr,
+   unsigned long size)
+{
+   return 0;
+}
+
+static inline int __init early_set_memory_encrypted(resource_size_t paddr,
+   unsigned long size)
+{
+   return 0;
+}
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 /*
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ed8780e..d174b1c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -28,6 +28,8 @@
 #include 
 #include 
 
+#include "mm_internal.h"
+
 static char sme_cmdline_arg[] __initdata = "mem_encrypt";
 static char sme_cmdline_on[]  __initdata = "on";
 static char sme_cmdline_off[] __initdata = "off";
@@ -257,6 +259,121 @@ static void sme_free(struct device *dev, size_t size, 
void *vaddr,
swiotlb_free_coherent(dev, size, vaddr, dma_handle);
 }
 
+static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
+{
+   pgprot_t old_prot, new_prot;
+   unsigned long pfn;
+   pte_t new_pte;
+
+   switch (level) {
+   case PG_LEVEL_4K:
+   pfn = pte_pfn(*kpte);
+   old_prot = pte_pgprot(*kpte);
+   break;
+   case PG_LEVEL_2M:
+   pfn = pmd_pfn(*(pmd_t *)kpte);
+   old_prot = pmd_pgprot(*(pmd_t *)kpte);
+   break;
+   case PG_LEVEL_1G:
+   pfn = pud_pfn(*(pud_t *)kpte);
+   old_prot = pud_pgprot(*(pud_t *)kpte);
+   break;
+   default:
+   return;
+   }
+
+   new_prot = old_prot;
+   if (enc)
+   pgprot_val(new_prot) |= _PAGE_ENC;
+   else
+   pgprot_val(new_prot) &= ~_PAGE_ENC;
+
+   /* if prot is same then do nothing */
+   if (pgprot_val(old_prot) == pgprot_val(new_prot))
+   return;
+
+   new_pte = pfn_pte(pfn, new_prot);
+   set_pte_atomic(kpte, new_pte);
+}
+
+static int __init early_set_memory_enc_dec(resource_size_t paddr,
+  unsigned long size, bool enc)
+{
+   unsigned long vaddr, vaddr_end, vaddr_next;
+   unsigned long psize, pmask;
+   int split_page_size_mask;
+   pte_t *kpte;
+   int level;
+
+   vaddr = (unsigned long)__va(paddr);
+   vaddr_next = vaddr;
+   vaddr_end = vaddr + size;
+
+   /*
+* We are going to change the physical page attribute from C=1 to C=0
+* or vice versa. Flush the caches to ensure that data is written into
+* memory with correct C-bit before we change attribute.
+*/
+   clflush_cache_range(__va(paddr), size);
+
+   for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+   kpte = lookup_address(vaddr, &level);
+   if (!kpte || pte_none(*kpte))
+   return 1;
+
+   if (level == PG_LEVEL_4K) {
+   __set_clr_pte_enc(kpte, level, enc);
+   vaddr_next = (vaddr & PAGE_MASK) + PAGE_SIZE;
+   continue;
+   }
+
+   psize = page_level_size(level);
+   pmask = page_level_mask(level);
+
+   /*
+* Check, whether we can change the large page in one go.
+* We request a split, when the address is not aligned and
+* the number of pages to set/clear encryption bit is smaller
+* than the number of pages in the large page.
+   

[RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-07-24 Thread Brijesh Singh
Some KVM-specific MSRs (steal-time, asyncpf, apic_eoi) use per-CPU variables
that are allocated at compile time and whose physical addresses are shared
with the hypervisor. This presents a challenge when SEV is active in the
guest OS: the guest memory is encrypted with the guest key, so the hypervisor
is not able to modify it. When SEV is active, we need to clear the encryption
attribute (aka C-bit) of these shared physical addresses so that both guest
and hypervisor can access the data.

To solve this problem, I have tried these three options:

1) Convert the static per-CPU allocations to dynamic per-CPU allocations and,
when SEV is detected, clear the C-bit from the page table. But while doing so
I found that the per-CPU dynamic allocator was not ready when
kvm_guest_cpu_init was called.

2) Since the C-bit works at PAGE_SIZE granularity, add some extra padding to
'struct kvm_steal_time' to make it PAGE_SIZE and then, at runtime, clear the
encryption attribute of the full page. The downside of this is that we need
to modify the structure, which may break compatibility.

3) Define a new per-CPU section (.data..percpu..hv_shared) which will be
used to hold the compile-time shared per-CPU variables. When SEV is
detected we map this section without the C-bit.

This patch implements #3. It introduces a new DEFINE_PER_CPU_HV_SHARED
macro to create a compile-time per-CPU variable. When SEV is detected we
clear the C-bit from the shared per-CPU variables.
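
The percpu-defs.h hunk is not visible in this excerpt; a plausible sketch of
the new macros, following the existing DEFINE_PER_CPU_SECTION() pattern and
matching the .data..percpu..hv_shared section name used in the linker script
hunk below, would be (the exact definition in the posted patch may differ):

#define PER_CPU_HV_SHARED_SECTION   "..hv_shared"

#define DECLARE_PER_CPU_HV_SHARED(type, name)                           \
        DECLARE_PER_CPU_SECTION(type, name, PER_CPU_HV_SHARED_SECTION)

#define DEFINE_PER_CPU_HV_SHARED(type, name)                            \
        DEFINE_PER_CPU_SECTION(type, name, PER_CPU_HV_SHARED_SECTION)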

Signed-off-by: Brijesh Singh 
---
 arch/x86/kernel/kvm.c | 46 ---
 include/asm-generic/vmlinux.lds.h |  3 +++
 include/linux/percpu-defs.h   | 12 ++
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 71c17a5..1f6fec8 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
 
 early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
 
-static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
-static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
+static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) 
__aligned(64);
+static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) 
__aligned(64);
 static int has_steal_clock = 0;
 
 /*
@@ -303,7 +303,7 @@ static void kvm_register_steal_time(void)
cpu, (unsigned long long) slow_virt_to_phys(st));
 }
 
-static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
+static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = 
KVM_PV_EOI_DISABLED;
 
 static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 {
@@ -319,11 +319,51 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 
val)
apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
 }
 
+/* NOTE: function is marked as __ref because it is used by __init functions */
+static int __ref kvm_map_hv_shared_decrypted(void)
+{
+   static int once, ret;
+   int cpu;
+
+   if (once)
+   return ret;
+
+   /*
+* Iterate through all possible CPU's and clear the C-bit from
+* percpu variables.
+*/
+   for_each_possible_cpu(cpu) {
+   struct kvm_vcpu_pv_apf_data *apf;
+   unsigned long pa;
+
+   apf = &per_cpu(apf_reason, cpu);
+   pa = slow_virt_to_phys(apf);
+   sme_early_decrypt(pa & PAGE_MASK, PAGE_SIZE);
+   ret = early_set_memory_decrypted(pa, PAGE_SIZE);
+   if (ret)
+   break;
+   }
+
+   once = 1;
+   return ret;
+}
+
 static void kvm_guest_cpu_init(void)
 {
if (!kvm_para_available())
return;
 
+   /*
+* When SEV is active, map the shared percpu as unencrypted so that
+* both guest and hypervsior can access the data.
+*/
+   if (sev_active()) {
+   if (kvm_map_hv_shared_decrypted()) {
+   printk(KERN_ERR "Failed to map percpu as 
unencrypted\n");
+   return;
+   }
+   }
+
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
 
diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index da0be9a..52854cf 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -783,6 +783,9 @@
. = ALIGN(cacheline);   \
*(.data..percpu)\
*(.data..percpu..shared_aligned)\
+   . = ALIGN(PAGE_SIZE);   \
+   *(.data..percpu..hv_shared) \
+   . = ALIGN(PAGE_SIZE);  

[RFC Part1 PATCH v3 17/17] X86/KVM: Clear encryption attribute when SEV is active

2017-07-24 Thread Brijesh Singh
The guest physical memory areas holding struct pvclock_wall_clock and
struct pvclock_vcpu_time_info are shared with the hypervisor, which
periodically updates their contents. When SEV is active, we must clear the
encryption attribute from the shared memory pages so that both hypervisor
and guest can access the data.

Signed-off-by: Brijesh Singh 
---
 arch/x86/entry/vdso/vma.c  |  5 ++--
 arch/x86/kernel/kvmclock.c | 64 +++---
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 726355c..ff50251 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -114,10 +114,11 @@ static int vvar_fault(const struct vm_special_mapping *sm,
struct pvclock_vsyscall_time_info *pvti =
pvclock_pvti_cpu0_va();
if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
-   ret = vm_insert_pfn(
+   ret = vm_insert_pfn_prot(
vma,
vmf->address,
-   __pa(pvti) >> PAGE_SHIFT);
+   __pa(pvti) >> PAGE_SHIFT,
+   pgprot_decrypted(vma->vm_page_prot));
}
} else if (sym_offset == image->sym_hvclock_page) {
struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d889676..f3a8101 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -45,7 +46,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
 
 /* The hypervisor will put information about time periodically here */
 static struct pvclock_vsyscall_time_info *hv_clock;
-static struct pvclock_wall_clock wall_clock;
+static struct pvclock_wall_clock *wall_clock;
 
 struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
 {
@@ -64,15 +65,18 @@ static void kvm_get_wallclock(struct timespec *now)
int low, high;
int cpu;
 
-   low = (int)__pa_symbol(&wall_clock);
-   high = ((u64)__pa_symbol(&wall_clock) >> 32);
+   if (!wall_clock)
+   return;
+
+   low = (int)slow_virt_to_phys(wall_clock);
+   high = ((u64)slow_virt_to_phys(wall_clock) >> 32);
 
native_write_msr(msr_kvm_wall_clock, low, high);
 
cpu = get_cpu();
 
vcpu_time = &hv_clock[cpu].pvti;
-   pvclock_read_wallclock(&wall_clock, vcpu_time, now);
+   pvclock_read_wallclock(wall_clock, vcpu_time, now);
 
put_cpu();
 }
@@ -249,11 +253,39 @@ static void kvm_shutdown(void)
native_machine_shutdown();
 }
 
+static phys_addr_t __init kvm_memblock_alloc(phys_addr_t size,
+phys_addr_t align)
+{
+   phys_addr_t mem;
+
+   mem = memblock_alloc(size, align);
+   if (!mem)
+   return 0;
+
+   if (sev_active()) {
+   if (early_set_memory_decrypted(mem, size))
+   goto e_free;
+   }
+
+   return mem;
+e_free:
+   memblock_free(mem, size);
+   return 0;
+}
+
+static void __init kvm_memblock_free(phys_addr_t addr, phys_addr_t size)
+{
+   if (sev_active())
+   early_set_memory_encrypted(addr, size);
+
+   memblock_free(addr, size);
+}
+
 void __init kvmclock_init(void)
 {
struct pvclock_vcpu_time_info *vcpu_time;
-   unsigned long mem;
-   int size, cpu;
+   unsigned long mem, mem_wall_clock;
+   int size, cpu, wall_clock_size;
u8 flags;
 
size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
@@ -270,15 +302,29 @@ void __init kvmclock_init(void)
printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
msr_kvm_system_time, msr_kvm_wall_clock);
 
-   mem = memblock_alloc(size, PAGE_SIZE);
-   if (!mem)
+   wall_clock_size = PAGE_ALIGN(sizeof(struct pvclock_wall_clock));
+   mem_wall_clock = kvm_memblock_alloc(wall_clock_size, PAGE_SIZE);
+   if (!mem_wall_clock)
return;
+
+   wall_clock = __va(mem_wall_clock);
+   memset(wall_clock, 0, wall_clock_size);
+
+   mem = kvm_memblock_alloc(size, PAGE_SIZE);
+   if (!mem) {
+   kvm_memblock_free(mem_wall_clock, wall_clock_size);
+   wall_clock = NULL;
+   return;
+   }
+
hv_clock = __va(mem);
memset(hv_clock, 0, size);
 
if (kvm_register_clock("primary cpu clock")) {
hv_clock = NULL;
-   memblock_free(mem, size);
+   kvm_memblock_free(mem, size);
+   kvm_memblock_free(mem_wall_clock, wall_clock_size);
+   wall_clock = NULL;
return;
}
 
-- 
2.9.4



[RFC Part1 PATCH v3 11/17] x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

In order for memory pages to be properly mapped when SEV is active, we
need to use the PAGE_KERNEL protection attribute as the base protection.
This will ensure that the memory mapping of, e.g., ACPI tables receives the
proper mapping attributes.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/ioremap.c  | 28 
 include/linux/ioport.h |  3 +++
 kernel/resource.c  | 17 +
 3 files changed, 48 insertions(+)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c0be7cf..7b27332 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -69,6 +69,26 @@ static int __ioremap_check_ram(unsigned long start_pfn, 
unsigned long nr_pages,
return 0;
 }
 
+static int __ioremap_res_desc_other(struct resource *res, void *arg)
+{
+   return (res->desc != IORES_DESC_NONE);
+}
+
+/*
+ * This function returns true if the target memory is marked as
+ * IORESOURCE_MEM and IORESOURCE_BUSY and described as other than
+ * IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
+ */
+static bool __ioremap_check_if_mem(resource_size_t addr, unsigned long size)
+{
+   u64 start, end;
+
+   start = (u64)addr;
+   end = start + size - 1;
+
+   return (walk_mem_res(start, end, NULL, __ioremap_res_desc_other) == 1);
+}
+
 /*
  * Remap an arbitrary physical address space into the kernel virtual
  * address space. It transparently creates kernel huge I/O mapping when
@@ -146,7 +166,15 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
pcm = new_pcm;
}
 
+   /*
+* If the page being mapped is in memory and SEV is active then
+* make sure the memory encryption attribute is enabled in the
+* resulting mapping.
+*/
prot = PAGE_KERNEL_IO;
+   if (sev_active() && __ioremap_check_if_mem(phys_addr, size))
+   prot = pgprot_encrypted(prot);
+
switch (pcm) {
case _PAGE_CACHE_MODE_UC:
default:
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 1c66b9c..297f5b8 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -268,6 +268,9 @@ extern int
 walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
void *arg, int (*func)(unsigned long, unsigned long, void *));
 extern int
+walk_mem_res(u64 start, u64 end, void *arg,
+int (*func)(struct resource *, void *));
+extern int
 walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(struct resource *, void *));
 extern int
diff --git a/kernel/resource.c b/kernel/resource.c
index 5f9ee7bb0..ec3fa0c 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -468,6 +468,23 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 arg, func);
 }
 
+/*
+ * This function calls the @func callback against all memory ranges, which
+ * are ranges marked as IORESOURCE_MEM and IORESOUCE_BUSY.
+ */
+int walk_mem_res(u64 start, u64 end, void *arg,
+int (*func)(struct resource *, void *))
+{
+   struct resource res;
+
+   res.start = start;
+   res.end = end;
+   res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+
+   return __walk_iomem_res_desc(&res, IORES_DESC_NONE, true,
+arg, func);
+}
+
 #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
 
 /*
-- 
2.9.4



[RFC Part1 PATCH v3 12/17] x86/mm: DMA support for SEV memory encryption

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

When SEV is active, memory that is mapped encrypted cannot be transparently
encrypted during a device write or decrypted during a device read, since the
device does not have access to the guest encryption key. In order for DMA to
work properly when SEV is active, the SWIOTLB bounce buffers must be used.
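
Illustratively (not part of the patch), drivers themselves do not change: a
plain coherent allocation like the one below ends up in the SEV-aware
sme_alloc() added here, which either clears the C-bit on the allocated pages
or falls back to the (decrypted) SWIOTLB pool, so the device sees usable
data. The function and its arguments are made up for this sketch:

static int example_dma_setup(struct device *dev, size_t size)
{
        dma_addr_t dma_handle;
        void *cpu_addr;

        /* Routed through sme_dma_ops when sev_active() */
        cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
        if (!cpu_addr)
                return -ENOMEM;

        /* ... program dma_handle into the device and do the I/O ... */

        dma_free_coherent(dev, size, cpu_addr, dma_handle);
        return 0;
}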

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/mem_encrypt.c | 86 +++
 lib/swiotlb.c |  5 +--
 2 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 1e4643e..5e5d460 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -191,8 +191,86 @@ void __init sme_early_init(void)
/* Update the protection map with memory encryption mask */
for (i = 0; i < ARRAY_SIZE(protection_map); i++)
protection_map[i] = pgprot_encrypted(protection_map[i]);
+
+   if (sev_active())
+   swiotlb_force = SWIOTLB_FORCE;
+}
+
+static void *sme_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+  gfp_t gfp, unsigned long attrs)
+{
+   unsigned long dma_mask;
+   unsigned int order;
+   struct page *page;
+   void *vaddr = NULL;
+
+   dma_mask = dma_alloc_coherent_mask(dev, gfp);
+   order = get_order(size);
+
+   /*
+* Memory will be memset to zero after marking decrypted, so don't
+* bother clearing it before.
+*/
+   gfp &= ~__GFP_ZERO;
+
+   page = alloc_pages_node(dev_to_node(dev), gfp, order);
+   if (page) {
+   dma_addr_t addr;
+
+   /*
+* Since we will be clearing the encryption bit, check the
+* mask with it already cleared.
+*/
+   addr = __sme_clr(phys_to_dma(dev, page_to_phys(page)));
+   if ((addr + size) > dma_mask) {
+   __free_pages(page, get_order(size));
+   } else {
+   vaddr = page_address(page);
+   *dma_handle = addr;
+   }
+   }
+
+   if (!vaddr)
+   vaddr = swiotlb_alloc_coherent(dev, size, dma_handle, gfp);
+
+   if (!vaddr)
+   return NULL;
+
+   /* Clear the SME encryption bit for DMA use if not swiotlb area */
+   if (!is_swiotlb_buffer(dma_to_phys(dev, *dma_handle))) {
+   set_memory_decrypted((unsigned long)vaddr, 1 << order);
+   memset(vaddr, 0, PAGE_SIZE << order);
+   *dma_handle = __sme_clr(*dma_handle);
+   }
+
+   return vaddr;
+}
+
+static void sme_free(struct device *dev, size_t size, void *vaddr,
+dma_addr_t dma_handle, unsigned long attrs)
+{
+   /* Set the SME encryption bit for re-use if not swiotlb area */
+   if (!is_swiotlb_buffer(dma_to_phys(dev, dma_handle)))
+   set_memory_encrypted((unsigned long)vaddr,
+1 << get_order(size));
+
+   swiotlb_free_coherent(dev, size, vaddr, dma_handle);
 }
 
+static const struct dma_map_ops sme_dma_ops = {
+   .alloc  = sme_alloc,
+   .free   = sme_free,
+   .map_page   = swiotlb_map_page,
+   .unmap_page = swiotlb_unmap_page,
+   .map_sg = swiotlb_map_sg_attrs,
+   .unmap_sg   = swiotlb_unmap_sg_attrs,
+   .sync_single_for_cpu= swiotlb_sync_single_for_cpu,
+   .sync_single_for_device = swiotlb_sync_single_for_device,
+   .sync_sg_for_cpu= swiotlb_sync_sg_for_cpu,
+   .sync_sg_for_device = swiotlb_sync_sg_for_device,
+   .mapping_error  = swiotlb_dma_mapping_error,
+};
+
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void)
 {
@@ -202,6 +280,14 @@ void __init mem_encrypt_init(void)
/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
swiotlb_update_mem_attributes();
 
+   /*
+* With SEV, DMA operations cannot use encryption. New DMA ops
+* are required in order to mark the DMA areas as decrypted or
+* to use bounce buffers.
+*/
+   if (sev_active())
+   dma_ops = &sme_dma_ops;
+
pr_info("AMD Secure Memory Encryption (SME) active\n");
 }
 
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 8c6c83e..85fed2f 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -507,8 +507,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier and can't now 
provide you with the DMA bounce buffer");
 
-   if (sme_active())
-   pr_warn_once("SME is active and system is using DMA bounce 
buffers\n");
+   if (sme_active() || sev_active())
+   pr_warn_once("%s is active and system is using DMA bounce 
buffers\n

[RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

In preparation for a new function that will need additional resource
information during the resource walk, update the resource walk callback to
pass the resource structure.  Since the current callback's start and end
arguments are pulled from the resource structure, the callback functions can
obtain them from the resource structure directly.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/powerpc/kernel/machine_kexec_file_64.c | 12 +---
 arch/x86/kernel/crash.c | 18 +-
 arch/x86/kernel/pmem.c  |  2 +-
 include/linux/ioport.h  |  4 ++--
 include/linux/kexec.h   |  2 +-
 kernel/kexec_file.c |  5 +++--
 kernel/resource.c   |  9 +
 7 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_file_64.c 
b/arch/powerpc/kernel/machine_kexec_file_64.c
index 992c0d2..e4395f9 100644
--- a/arch/powerpc/kernel/machine_kexec_file_64.c
+++ b/arch/powerpc/kernel/machine_kexec_file_64.c
@@ -91,11 +91,13 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
  * and that value will be returned. If all free regions are visited without
  * func returning non-zero, then zero will be returned.
  */
-int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
+int arch_kexec_walk_mem(struct kexec_buf *kbuf,
+   int (*func)(struct resource *, void *))
 {
int ret = 0;
u64 i;
phys_addr_t mstart, mend;
+   struct resource res = { };
 
if (kbuf->top_down) {
for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0,
@@ -105,7 +107,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int 
(*func)(u64, u64, void *))
 * range while in kexec, end points to the last byte
 * in the range.
 */
-   ret = func(mstart, mend - 1, kbuf);
+   res.start = mstart;
+   res.end = mend - 1;
+   ret = func(&res, kbuf);
if (ret)
break;
}
@@ -117,7 +121,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int 
(*func)(u64, u64, void *))
 * range while in kexec, end points to the last byte
 * in the range.
 */
-   ret = func(mstart, mend - 1, kbuf);
+   res.start = mstart;
+   res.end = mend - 1;
+   ret = func(&res, kbuf);
if (ret)
break;
}
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 44404e2..815008c 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -209,7 +209,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_KEXEC_FILE
-static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
+static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
 
@@ -342,7 +342,7 @@ static int elf_header_exclude_ranges(struct crash_elf_data 
*ced,
return ret;
 }
 
-static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
+static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 {
struct crash_elf_data *ced = arg;
Elf64_Ehdr *ehdr;
@@ -355,7 +355,7 @@ static int prepare_elf64_ram_headers_callback(u64 start, 
u64 end, void *arg)
ehdr = ced->ehdr;
 
/* Exclude unwanted mem ranges */
-   ret = elf_header_exclude_ranges(ced, start, end);
+   ret = elf_header_exclude_ranges(ced, res->start, res->end);
if (ret)
return ret;
 
@@ -518,14 +518,14 @@ static int add_e820_entry(struct boot_params *params, 
struct e820_entry *entry)
return 0;
 }
 
-static int memmap_entry_callback(u64 start, u64 end, void *arg)
+static int memmap_entry_callback(struct resource *res, void *arg)
 {
struct crash_memmap_data *cmd = arg;
struct boot_params *params = cmd->params;
struct e820_entry ei;
 
-   ei.addr = start;
-   ei.size = end - start + 1;
+   ei.addr = res->start;
+   ei.size = res->end - res->start + 1;
ei.type = cmd->type;
add_e820_entry(params, &ei);
 
@@ -619,12 +619,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
return ret;
 }
 
-static int determine_backup_region(u64 start, u64 end, void *arg)
+static int determine_backup_region(struct resource *res, void *arg)
 {
struct kimage *image = arg;
 
-   image->arch.backup_src_start = start;
-   image->arch.backup_src_sz = end - start + 1;
+   image->a

[RFC Part1 PATCH v3 09/17] resource: Consolidate resource walking code

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range()
functions each have much of the same code.  Create a new function that
consolidates the common code from these functions in one place to reduce
the amount of duplicated code.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 kernel/resource.c | 53 ++---
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 9b5f044..7b20b3e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -397,9 +397,30 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
res->start = p->start;
if (res->end > p->end)
res->end = p->end;
+   res->desc = p->desc;
return 0;
 }
 
+static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
+bool first_level_children_only,
+void *arg, int (*func)(u64, u64, void *))
+{
+   u64 orig_end = res->end;
+   int ret = -1;
+
+   while ((res->start < res->end) &&
+  !find_next_iomem_res(res, desc, first_level_children_only)) {
+   ret = (*func)(res->start, res->end, arg);
+   if (ret)
+   break;
+
+   res->start = res->end + 1;
+   res->end = orig_end;
+   }
+
+   return ret;
+}
+
 /*
  * Walks through iomem resources and calls func() with matching resource
  * ranges. This walks through whole tree and not just first level children.
@@ -418,26 +439,12 @@ int walk_iomem_res_desc(unsigned long desc, unsigned long 
flags, u64 start,
u64 end, void *arg, int (*func)(u64, u64, void *))
 {
struct resource res;
-   u64 orig_end;
-   int ret = -1;
 
res.start = start;
res.end = end;
res.flags = flags;
-   orig_end = res.end;
-
-   while ((res.start < res.end) &&
-   (!find_next_iomem_res(&res, desc, false))) {
-
-   ret = (*func)(res.start, res.end, arg);
-   if (ret)
-   break;
-
-   res.start = res.end + 1;
-   res.end = orig_end;
-   }
 
-   return ret;
+   return __walk_iomem_res_desc(&res, desc, false, arg, func);
 }
 
 /*
@@ -451,22 +458,13 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(u64, u64, void *))
 {
struct resource res;
-   u64 orig_end;
-   int ret = -1;
 
res.start = start;
res.end = end;
res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-   orig_end = res.end;
-   while ((res.start < res.end) &&
-   (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
-   ret = (*func)(res.start, res.end, arg);
-   if (ret)
-   break;
-   res.start = res.end + 1;
-   res.end = orig_end;
-   }
-   return ret;
+
+   return __walk_iomem_res_desc(&res, IORES_DESC_NONE, true,
+arg, func);
 }
 
 #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
@@ -508,6 +506,7 @@ static int __is_ram(unsigned long pfn, unsigned long 
nr_pages, void *arg)
 {
return 1;
 }
+
 /*
  * This generic page_is_ram() returns true if specified address is
  * registered as System RAM in iomem_resource list.
-- 
2.9.4



[RFC Part1 PATCH v3 05/17] x86, realmode: Don't decrypt trampoline area under SEV

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

When SEV is active the trampoline area will need to be in encrypted
memory, so only mark the area decrypted if SME is active.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/realmode/init.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 1f71980..c7eeca7 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -63,9 +63,11 @@ static void __init setup_real_mode(void)
/*
 * If SME is active, the trampoline area will need to be in
 * decrypted memory in order to bring up other processors
-* successfully.
+* successfully. For SEV the trampoline area needs to be in
+* encrypted memory, so only do this for SME.
 */
-   set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+   if (sme_active())
+   set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
 
memcpy(base, real_mode_blob, size);
 
-- 
2.9.4



[RFC Part1 PATCH v3 06/17] x86/mm: Use encrypted access of boot related data with SEV

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

When Secure Encrypted Virtualization (SEV) is active, boot data (such as
EFI related data, setup data) is encrypted and needs to be accessed as
such when mapped. Update the architecture override in early_memremap to
keep the encryption attribute when mapping this data.
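
For callers nothing changes; as an illustration only (the names and the loop
are just a sketch of the typical setup_data walker), an early_memremap() user
like this transparently gets an encrypted mapping under SEV because the
override below selects pgprot_encrypted() for such data:

static void __init walk_setup_data(u64 pa_data)
{
        struct setup_data *data;

        while (pa_data) {
                /* Attributes chosen by early_memremap_pgprot_adjust() */
                data = early_memremap(pa_data, sizeof(*data));
                if (!data)
                        break;

                /* ... consume data->type / data->len here ... */

                pa_data = data->next;
                early_memunmap(data, sizeof(*data));
        }
}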

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/ioremap.c | 44 
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 34f0e18..c0be7cf 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -422,6 +422,9 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
  * areas should be mapped decrypted. And since the encryption key can
  * change across reboots, persistent memory should also be mapped
  * decrypted.
+ *
+ * If SEV is active, that implies that BIOS/UEFI also ran encrypted so
+ * only persistent memory should be mapped decrypted.
  */
 static bool memremap_should_map_decrypted(resource_size_t phys_addr,
  unsigned long size)
@@ -458,6 +461,11 @@ static bool memremap_should_map_decrypted(resource_size_t 
phys_addr,
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+   /* For SEV, these areas are encrypted */
+   if (sev_active())
+   break;
+   /* Fallthrough */
+
case E820_TYPE_PRAM:
return true;
default:
@@ -581,7 +589,7 @@ static bool __init 
early_memremap_is_setup_data(resource_size_t phys_addr,
 bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
 unsigned long flags)
 {
-   if (!sme_active())
+   if (!sme_active() && !sev_active())
return true;
 
if (flags & MEMREMAP_ENC)
@@ -590,10 +598,15 @@ bool arch_memremap_can_ram_remap(resource_size_t 
phys_addr, unsigned long size,
if (flags & MEMREMAP_DEC)
return false;
 
-   if (memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size) ||
-   memremap_should_map_decrypted(phys_addr, size))
-   return false;
+   if (sme_active()) {
+   if (memremap_is_setup_data(phys_addr, size) ||
+   memremap_is_efi_data(phys_addr, size) ||
+   memremap_should_map_decrypted(phys_addr, size))
+   return false;
+   } else if (sev_active()) {
+   if (memremap_should_map_decrypted(phys_addr, size))
+   return false;
+   }
 
return true;
 }
@@ -608,15 +621,22 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
 unsigned long size,
 pgprot_t prot)
 {
-   if (!sme_active())
+   if (!sme_active() && !sev_active())
return prot;
 
-   if (early_memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size) ||
-   memremap_should_map_decrypted(phys_addr, size))
-   prot = pgprot_decrypted(prot);
-   else
-   prot = pgprot_encrypted(prot);
+   if (sme_active()) {
+   if (early_memremap_is_setup_data(phys_addr, size) ||
+   memremap_is_efi_data(phys_addr, size) ||
+   memremap_should_map_decrypted(phys_addr, size))
+   prot = pgprot_decrypted(prot);
+   else
+   prot = pgprot_encrypted(prot);
+   } else if (sev_active()) {
+   if (memremap_should_map_decrypted(phys_addr, size))
+   prot = pgprot_decrypted(prot);
+   else
+   prot = pgprot_encrypted(prot);
+   }
 
return prot;
 }
-- 
2.9.4



[RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

The current code checks only for sme_active() when determining whether
to perform the encryption attribute change.  Include sev_active() in this
check so that memory attribute changes can occur under SME and SEV.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/pageattr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index dfb7d65..b726b23 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int 
numpages, bool enc)
unsigned long start;
int ret;
 
-   /* Nothing to do if the SME is not active */
-   if (!sme_active())
+   /* Nothing to do if SME and SEV are not active */
+   if (!sme_active() && !sev_active())
return 0;
 
/* Should not be working on unaligned addresses */
-- 
2.9.4



[RFC Part1 PATCH v3 08/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

EFI data is encrypted when the kernel is run under SEV. Update the
page table references to be sure the EFI memory areas are accessed
encrypted.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/platform/efi/efi_64.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 12e8388..1ecb3f6 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -369,7 +370,10 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 * as trim_bios_range() will reserve the first page and isolate it away
 * from memory allocators anyway.
 */
-   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
+   pf = _PAGE_RW;
+   if (sev_active())
+   pf |= _PAGE_ENC;
+   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
}
@@ -412,6 +416,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 
va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
 
+   if (sev_active())
+   flags |= _PAGE_ENC;
+
pfn = md->phys_addr >> PAGE_SHIFT;
if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
@@ -511,6 +518,9 @@ static int __init efi_update_mappings(efi_memory_desc_t 
*md, unsigned long pf)
pgd_t *pgd = efi_pgd;
int err1, err2;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
/* Update the 1:1 mapping */
pfn = md->phys_addr >> PAGE_SHIFT;
err1 = kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages, 
pf);
@@ -589,6 +599,9 @@ void __init efi_runtime_update_mappings(void)
(md->type != EFI_RUNTIME_SERVICES_CODE))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
efi_update_mappings(md, pf);
}
 }
-- 
2.9.4


