Re: [PATCH] x86/efi: Access EFI MMIO data as unencrypted when SEV is active

2018-07-16 Thread Brijesh Singh

Hi Ard,


On 07/11/2018 05:00 AM, Ard Biesheuvel wrote:

On 3 July 2018 at 15:32, Brijesh Singh  wrote:

SEV guest fails to update the UEFI runtime variables stored in the
flash. commit 1379edd59673 ("x86/efi: Access EFI data as encrypted
when SEV is active") unconditionally maps all the UEFI runtime data
as 'encrypted' (C=1). When SEV is active the UEFI runtime data marked
as EFI_MEMORY_MAPPED_IO should be mapped as 'unencrypted' so that both
guest and hypervisor can access the data.

Fixes: 1379edd59673 (x86/efi: Access EFI data as encrypted ...)
Cc: Tom Lendacky 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: linux-efi@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: Ard Biesheuvel 
Cc: Matt Fleming 
Cc: Andy Lutomirski 
Cc:  # 4.15.x
Signed-off-by: Brijesh Singh 
---
  arch/x86/platform/efi/efi_64.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 77873ce..5f2eb32 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -417,7 +417,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
 if (!(md->attribute & EFI_MEMORY_WB))
 flags |= _PAGE_PCD;

-   if (sev_active())
+   if (sev_active() && md->type != EFI_MEMORY_MAPPED_IO)
 flags |= _PAGE_ENC;

 pfn = md->phys_addr >> PAGE_SHIFT;


Is it safe to only update this occurrence and not the one in
efi_runtime_update_mappings() ?




It's safe to update only this occurrence. SEV support was added in recent
EDK2 firmware, and those firmware versions provide the
EFI_MEMORY_ATTRIBUTE_TABLE. Hence the efi_enabled(EFI_MEM_ATTR) check in
efi_runtime_update_mappings() will always be true. When EFI_MEM_ATTR is set,
the code updates the mappings and returns (see below):

void __init efi_runtime_update_mappings(void)
{
	...

	/*
	 * Use the EFI Memory Attribute Table for mapping permissions if it
	 * exists, since it is intended to supersede EFI_PROPERTIES_TABLE.
	 */
	if (efi_enabled(EFI_MEM_ATTR)) {
		efi_memattr_apply_permissions(NULL, efi_update_mem_attr);
		return;
	}

	...
}

The EFI_MEMORY_ATTRIBUTE_TABLE does not include MMIO regions; it describes
the memory protections for the EFI runtime code and data regions only. Both
EFI runtime code and data should be mapped as encrypted, hence I skipped
updating efi_runtime_update_mappings().


thanks


Re: [PATCH] x86/efi: Access EFI MMIO data as unencrypted when SEV is active

2018-07-03 Thread Brijesh Singh



On 7/3/18 10:44 AM, Borislav Petkov wrote:
> (dropping stable@ as this is not how you send patches to stable).
>
> On Tue, Jul 03, 2018 at 05:37:18PM +0200, Ard Biesheuvel wrote:
>> On 3 July 2018 at 15:32, Brijesh Singh  wrote:
>>> SEV guest fails to update the UEFI runtime variables stored in the
>>> flash. commit 1379edd59673 ("x86/efi: Access EFI data as encrypted
>>> when SEV is active") unconditionally maps all the UEFI runtime data
>>> as 'encrypted' (C=1). When SEV is active the UEFI runtime data marked
>>> as EFI_MEMORY_MAPPED_IO should be mapped as 'unencrypted' so that both
>>> guest and hypervisor can access the data.
>>>
>> I'm uncomfortable having to carry these heuristics in the kernel. The
>> UEFI memory map should be the definitive source of information
>> regarding how the OS should map the regions it describes, and if we
>> need to guess the encryption status, we are likely to get it wrong at
>> least some of the times.

I agree with Ard; it may be a good idea to extend the UEFI spec to include
encryption information. Having this information may be helpful in some cases,
e.g. if we ever need to map a specific non-IO memory region as unencrypted.
So far we have not seen the need for it, but I will ask the AMD folks working
closely with the UEFI committee to float this and submit it as an enhancement
in the Tianocore BZ.

> I think the problem here is that IO memory can't be encrypted, at least
> at the moment. Thus this patch. I believe future versions will be able
> to handle encrypted IO but that's something Brijesh can correct me on.

Yes, you are right: IO memory can't be encrypted. We map all IO memory
ranges as unencrypted everywhere else in the kernel, and the
EFI_MEMORY_MAPPED_IO type should be mapped as unencrypted as well.
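
As a rough illustration of that convention (a sketch only, not a quote of any
particular kernel function), an MMIO mapping path under SEV builds its page
protections without the encryption bit, which is what this patch mirrors for
EFI_MEMORY_MAPPED_IO:

/*
 * Sketch: strip the C-bit from an MMIO mapping's protections when SEV is
 * active. The helper name is hypothetical; sev_active() and _PAGE_ENC are
 * the symbols already used in this thread.
 */
static pgprot_t mmio_pgprot(pgprot_t prot)
{
	if (sev_active())
		pgprot_val(prot) &= ~_PAGE_ENC;

	return prot;
}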


> So it is not really about the UEFI spec but about what the hardware
> does/supports currently.
>
> And I don't think that change matters on anything else besides AMD with
> SEV enabled...
>
> Thx.
>



[PATCH] x86/efi: Access EFI MMIO data as unencrypted when SEV is active

2018-07-03 Thread Brijesh Singh
SEV guest fails to update the UEFI runtime variables stored in the
flash. commit 1379edd59673 ("x86/efi: Access EFI data as encrypted
when SEV is active") unconditionally maps all the UEFI runtime data
as 'encrypted' (C=1). When SEV is active the UEFI runtime data marked
as EFI_MEMORY_MAPPED_IO should be mapped as 'unencrypted' so that both
guest and hypervisor can access the data.

Fixes: 1379edd59673 (x86/efi: Access EFI data as encrypted ...)
Cc: Tom Lendacky 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: linux-efi@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: Ard Biesheuvel 
Cc: Matt Fleming 
Cc: Andy Lutomirski 
Cc:  # 4.15.x
Signed-off-by: Brijesh Singh 
---
 arch/x86/platform/efi/efi_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 77873ce..5f2eb32 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -417,7 +417,7 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
 
-   if (sev_active())
+   if (sev_active() && md->type != EFI_MEMORY_MAPPED_IO)
flags |= _PAGE_ENC;
 
pfn = md->phys_addr >> PAGE_SHIFT;
-- 
2.7.4



[Part1 PATCH v7 07/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-10-20 Thread Brijesh Singh
From: Tom Lendacky 

EFI data is encrypted when the kernel is run under SEV. Update the
page table references to be sure the EFI memory areas are accessed
encrypted.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
Reviewed-by: Borislav Petkov 
Tested-by: Borislav Petkov 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
Cc: linux-efi@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: x...@kernel.org
---
 arch/x86/platform/efi/efi_64.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 12e83888e5b9..5469c9319f43 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -369,7 +370,11 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 * as trim_bios_range() will reserve the first page and isolate it away
 * from memory allocators anyway.
 */
-   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
+   pf = _PAGE_RW;
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
+   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
}
@@ -412,6 +417,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
 
+   if (sev_active())
+   flags |= _PAGE_ENC;
+
pfn = md->phys_addr >> PAGE_SHIFT;
if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
@@ -538,6 +546,9 @@ static int __init efi_update_mem_attr(struct mm_struct *mm, efi_memory_desc_t *m
if (!(md->attribute & EFI_MEMORY_RO))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
return efi_update_mappings(md, pf);
 }
 
@@ -589,6 +600,9 @@ void __init efi_runtime_update_mappings(void)
(md->type != EFI_RUNTIME_SERVICES_CODE))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
efi_update_mappings(md, pf);
}
 }
-- 
2.9.5



[Part1 PATCH v6 07/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-10-16 Thread Brijesh Singh
From: Tom Lendacky 

EFI data is encrypted when the kernel is run under SEV. Update the
page table references to be sure the EFI memory areas are accessed
encrypted.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
Cc: linux-efi@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
Reviewed-by: Borislav Petkov 
---
 arch/x86/platform/efi/efi_64.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 12e83888e5b9..5469c9319f43 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -369,7 +370,11 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 * as trim_bios_range() will reserve the first page and isolate it away
 * from memory allocators anyway.
 */
-   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
+   pf = _PAGE_RW;
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
+   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
}
@@ -412,6 +417,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
 
+   if (sev_active())
+   flags |= _PAGE_ENC;
+
pfn = md->phys_addr >> PAGE_SHIFT;
if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
@@ -538,6 +546,9 @@ static int __init efi_update_mem_attr(struct mm_struct *mm, efi_memory_desc_t *m
if (!(md->attribute & EFI_MEMORY_RO))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
return efi_update_mappings(md, pf);
 }
 
@@ -589,6 +600,9 @@ void __init efi_runtime_update_mappings(void)
(md->type != EFI_RUNTIME_SERVICES_CODE))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
efi_update_mappings(md, pf);
}
 }
-- 
2.9.5



[Part1 PATCH v5 07/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-09-27 Thread Brijesh Singh
From: Tom Lendacky 

EFI data is encrypted when the kernel is run under SEV. Update the
page table references to be sure the EFI memory areas are accessed
encrypted.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
Cc: linux-efi@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
Reviewed-by: Borislav Petkov 
---
 arch/x86/platform/efi/efi_64.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 12e83888e5b9..5469c9319f43 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -369,7 +370,11 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 * as trim_bios_range() will reserve the first page and isolate it away
 * from memory allocators anyway.
 */
-   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
+   pf = _PAGE_RW;
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
+   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
}
@@ -412,6 +417,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
 
+   if (sev_active())
+   flags |= _PAGE_ENC;
+
pfn = md->phys_addr >> PAGE_SHIFT;
if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
@@ -538,6 +546,9 @@ static int __init efi_update_mem_attr(struct mm_struct *mm, efi_memory_desc_t *m
if (!(md->attribute & EFI_MEMORY_RO))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
return efi_update_mappings(md, pf);
 }
 
@@ -589,6 +600,9 @@ void __init efi_runtime_update_mappings(void)
(md->type != EFI_RUNTIME_SERVICES_CODE))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
efi_update_mappings(md, pf);
}
 }
-- 
2.9.5



[Part1 PATCH v4 07/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-09-16 Thread Brijesh Singh
From: Tom Lendacky 

EFI data is encrypted when the kernel is run under SEV. Update the
page table references to be sure the EFI memory areas are accessed
encrypted.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Borislav Petkov 
Cc: Andy Lutomirski 
Cc: Matt Fleming 
Cc: Ard Biesheuvel 
Cc: linux-efi@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: x...@kernel.org
Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/platform/efi/efi_64.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 12e83888e5b9..5469c9319f43 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -369,7 +370,11 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
 * as trim_bios_range() will reserve the first page and isolate it away
 * from memory allocators anyway.
 */
-   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
+   pf = _PAGE_RW;
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
+   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
}
@@ -412,6 +417,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
 
+   if (sev_active())
+   flags |= _PAGE_ENC;
+
pfn = md->phys_addr >> PAGE_SHIFT;
if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
@@ -538,6 +546,9 @@ static int __init efi_update_mem_attr(struct mm_struct *mm, efi_memory_desc_t *m
if (!(md->attribute & EFI_MEMORY_RO))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
return efi_update_mappings(md, pf);
 }
 
@@ -589,6 +600,9 @@ void __init efi_runtime_update_mappings(void)
(md->type != EFI_RUNTIME_SERVICES_CODE))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
efi_update_mappings(md, pf);
}
 }
-- 
2.9.5



Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-09-15 Thread Brijesh Singh



On 09/15/2017 11:22 AM, Borislav Petkov wrote:

mem_encrypt_init() where everything should be set up already.



Yep, it's safe to dereference the static key in mem_encrypt_init(). I've
tried the approach and it seems to work fine. I will include the required
changes in the next rev. Thanks.
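
For reference, a minimal sketch of what that could look like (the key name
sev_enable_key is an assumption here, not taken from the quoted patches):

/* arch/x86/mm/mem_encrypt.c -- sketch only */
DEFINE_STATIC_KEY_FALSE(sev_enable_key);

void __init mem_encrypt_init(void)
{
	if (!sme_me_mask)
		return;

	/* ... swiotlb / DMA setup for SME/SEV happens here ... */

	/*
	 * Jump labels are initialized by this point, so it is safe to flip
	 * the key that the fast-path sev_active() callers will test.
	 */
	if (sev_active())
		static_branch_enable(&sev_enable_key);
}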



Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-09-15 Thread Brijesh Singh



On 09/15/2017 09:40 AM, Borislav Petkov wrote:

I need to figure out the include hell first.


I am working with a slightly newer patch set -- in that patch Tom has moved
the sev_active() definition into arch/x86/mm/mem_encrypt.c, and I have no
issue using your recommended approach (since I no longer need the include
path changes).

But in my quick run I did find a runtime issue: it seems enabling the static
key in sme_enable() is too early. The guest reboots as soon as it tries to
enable the key.

I see a similar issue with a non-SEV guest with my simple patch below; the
guest reboots as soon as it tries to enable the key.

--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -40,6 +40,8 @@ pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & 
~(_PAGE_GLOBAL | _PAGE_NX);
 
 #define __head __section(.head.text)
 
+DEFINE_STATIC_KEY_FALSE(__testme);
+
 static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
 {
return ptr - (void *)_text + (void *)physaddr;
@@ -71,6 +73,8 @@ unsigned long __head __startup_64(unsigned long physaddr,
if (load_delta & ~PMD_PAGE_MASK)
for (;;);
 
+   static_branch_enable(&__testme);
+
/* Activate Secure Memory Encryption (SME) if supported and enabled */
sme_enable(bp);



Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-09-15 Thread Brijesh Singh



On 09/15/2017 07:24 AM, Borislav Petkov wrote:

On Tue, Aug 22, 2017 at 06:52:48PM +0200, Borislav Petkov wrote:

As always, the devil is in the detail.


Ok, actually we can make this much simpler by using a static key. A
conceptual patch below - I only need to fix that crazy include hell I'm
stepping into with this.

In any case, we were talking about having a static branch already so
this fits the whole strategy.



Thanks for the suggestion, Boris; it will make the patch much simpler.
I will try this out.

-Brijesh


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-09-04 Thread Brijesh Singh

On 9/4/17 12:05 PM, Borislav Petkov wrote:
> On Fri, Sep 01, 2017 at 05:52:13PM -0500, Brijesh Singh wrote:
>>  So far, we have not seen the need for having such functions except
>> this cases. The approach we have right now works just fine and not
>> sure if its worth adding new functions.
> Then put the call to kvm_map_hv_shared_decrypted() into
> kvm_smp_prepare_boot_cpu() to denote that you're executing this whole
> stuff only once during guest init.
>
> Now you're doing additional jumping-through-hoops with that once static
> var just so you can force something which needs to execute only once but
> gets called in a per-CPU path.
>
> See what I mean?

Yes, I see your point. I will address this issue in next rev.


-Brijesh


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-09-02 Thread Brijesh Singh


On 9/1/17 10:21 PM, Andy Lutomirski wrote:
> On Fri, Sep 1, 2017 at 3:52 PM, Brijesh Singh  wrote:
>> Hi Boris,
>>
>> On 08/30/2017 12:46 PM, Borislav Petkov wrote:
>>> On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:
>>>> I was trying to avoid mixing early and no-early set_memory_decrypted()
>>>> but if
>>>> feedback is: use early_set_memory_decrypted() only if its required
>>>> otherwise
>>>> use set_memory_decrypted() then I can improve the logic in next rev.
>>>> thanks
>>>
>>> Yes, I think you should use the early versions when you're, well,
>>> *early* :-) But get rid of that for_each_possible_cpu() and do it only
>>> on the current CPU, as this is a per-CPU path anyway. If you need to
>>> do it on *every* CPU and very early, then you need a separate function
>>> which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.
>>>
>> I am trying to implement your feedback and now remember why I choose to
>> use early_set_memory_decrypted() and for_each_possible_cpu loop. These
>> percpu variables are static. Hence before clearing the C-bit we must
>> perform the in-place decryption so that original assignment is preserved
>> after we change the C-bit. Tom's SME patch [1] added sme_early_decrypt()
>> -- which can be used to perform the in-place decryption but we do not have
>> similar routine for non-early cases. In order to address your feedback,
>> we have to add similar functions. So far, we have not seen the need for
>> having such functions except this cases. The approach we have right now
>> works just fine and not sure if its worth adding new functions.
>>
>> Thoughts ?
>>
>> [1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of
>> memory
> Shouldn't this be called DEFINE_PER_CPU_UNENCRYPTED?  ISTM the "HV
> shared" bit is incidental.

Thanks for the suggestion, we could call it DEFINE_PER_CPU_UNENCRYPTED.
I will use it in next rev.

-Brijesh



Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-09-01 Thread Brijesh Singh

Hi Boris,

On 08/30/2017 12:46 PM, Borislav Petkov wrote:

On Wed, Aug 30, 2017 at 11:18:42AM -0500, Brijesh Singh wrote:

I was trying to avoid mixing early and no-early set_memory_decrypted() but if
feedback is: use early_set_memory_decrypted() only if its required otherwise
use set_memory_decrypted() then I can improve the logic in next rev. thanks


Yes, I think you should use the early versions when you're, well,
*early* :-) But get rid of that for_each_possible_cpu() and do it only
on the current CPU, as this is a per-CPU path anyway. If you need to
do it on *every* CPU and very early, then you need a separate function
which is called in kvm_smp_prepare_boot_cpu() as there you're pre-SMP.



I am trying to implement your feedback and now remember why I chose to use
early_set_memory_decrypted() and the for_each_possible_cpu() loop. These
percpu variables are static, hence before clearing the C-bit we must perform
the in-place decryption so that the original assignment is preserved after we
change the C-bit. Tom's SME patch [1] added sme_early_decrypt(), which can be
used to perform the in-place decryption, but we do not have a similar routine
for the non-early cases. In order to address your feedback we would have to
add such functions. So far we have not seen the need for them except in this
case. The approach we have right now works just fine, and I am not sure it is
worth adding new functions.

Thoughts ?

[1] Commit :7f8b7e7 x86/mm: Add support for early encryption/decryption of memory
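
To make the constraint concrete, here is a sketch of the pattern under
discussion (illustrative only; it reuses sme_early_decrypt() from Tom's SME
series and early_set_memory_decrypted() from this series, applied to the
steal_time per-CPU variable):

/*
 * Sketch: decrypt a statically allocated per-CPU area in place, then clear
 * the C-bit in its mapping for every possible CPU, early in boot (pre-SMP).
 * Not the actual patch.
 */
static int __init map_steal_time_decrypted(void)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		struct kvm_steal_time *st = &per_cpu(steal_time, cpu);
		resource_size_t pa = slow_virt_to_phys(st);

		/* preserve the current contents across the C-bit flip */
		sme_early_decrypt(pa, sizeof(*st));

		if (early_set_memory_decrypted(pa, sizeof(*st)))
			return 1;
	}

	return 0;
}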


-Brijesh


Re: [RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-08-30 Thread Brijesh Singh

Hi Boris,

On 08/29/2017 05:22 AM, Borislav Petkov wrote:

[...]


On Mon, Jul 24, 2017 at 02:07:56PM -0500, Brijesh Singh wrote:

Some KVM specific MSR's (steal-time, asyncpf, avic_eio) allocates per-CPU


   MSRs


variable at compile time and share its physical address with hypervisor.


That sentence needs changing - the MSRs don't allocate - for them gets
allocated.


It presents a challege when SEV is active in guest OS, when SEV is active,
the guest memory is encrypted with guest key hence hypervisor will not
able to modify the guest memory. When SEV is active, we need to clear the
encryption attribute (aka C-bit) of shared physical addresses so that both
guest and hypervisor can access the data.


This whole paragraph needs rewriting.



I will improve the commit message in next rev.

[...]


+/* NOTE: function is marked as __ref because it is used by __init functions */


No need for that comment.

What should you look into is why do you need to call the early versions:

" * producing a warning (of course, no warning does not mean code is
  * correct, so optimally document why the __ref is needed and why it's OK)."

And we do have the normal set_memory_decrypted() etc helpers so why
aren't we using those?



Since kvm_guest_init() is called early in the boot process, we will not be
able to use the set_memory_decrypted() function. IIRC, if we try calling
set_memory_decrypted() early we will hit a BUG_ON [1] -- mainly when it tries
to flush the caches.

[1] http://elixir.free-electrons.com/linux/latest/source/arch/x86/mm/pageattr.c#L167
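
(For context, the guard being referenced is roughly of this shape; this is a
paraphrase of the CPA flush path, not an exact quote of pageattr.c:)

/*
 * Paraphrased: the CPA flush helpers insist on running with interrupts
 * enabled, which is not yet the case when kvm_guest_init() runs -- hence
 * the need for the early_* variants.
 */
static void cpa_flush_range(unsigned long start, int numpages, int cache)
{
	BUG_ON(irqs_disabled());
	/* ... flush TLBs and, when requested, cache lines ... */
}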




If you need to use the early ones too, then you probably need to
differentiate this in the callers by passing a "bool early", which calls
the proper flavor.



Sure, I can rearrange the code to make it more readable and use a "bool
early" parameter to differentiate them.
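
A minimal sketch of that shape (the wrapper name is hypothetical):

/* Sketch: pick the early or regular helper based on the boot stage. */
static int kvm_map_shared_decrypted(unsigned long vaddr, size_t size, bool early)
{
	unsigned int npages = PAGE_ALIGN(size) >> PAGE_SHIFT;

	if (early)
		return early_set_memory_decrypted(slow_virt_to_phys((void *)vaddr),
						  size);

	return set_memory_decrypted(vaddr, npages);
}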



+static int __ref kvm_map_hv_shared_decrypted(void)
+{
+   static int once, ret;
+   int cpu;
+
+   if (once)
+   return ret;


So this function gets called per-CPU but you need to do this ugly "once"
thing - i.e., global function called in a per-CPU context.

Why can't you do that mapping only on the current CPU and then
when that function is called on the next CPU, it will do the same thing
on that next CPU?




Yes, it can be done, but I remember running into issues during CPU hotplug.
The patch uses early_set_memory_decrypted(), which calls
kernel_physical_mapping_init() to split the large pages into smaller ones.
IIRC, that API did not work after the system had successfully booted; after
the system is booted we must use set_memory_decrypted().

I was trying to avoid mixing the early and non-early set_memory_decrypted()
variants, but if the feedback is to use early_set_memory_decrypted() only
where it is required and set_memory_decrypted() otherwise, then I can improve
the logic in the next rev. Thanks.


[...]


diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index da0be9a..52854cf 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -783,6 +783,9 @@
. = ALIGN(cacheline);   \
*(.data..percpu)\
*(.data..percpu..shared_aligned)\
+   . = ALIGN(PAGE_SIZE);   \
+   *(.data..percpu..hv_shared) \
+   . = ALIGN(PAGE_SIZE);   \
VMLINUX_SYMBOL(__per_cpu_end) = .;


Yeah, no, you can't do that. That's adding this section unconditionally
on *every* arch. You need to do some ifdeffery like it is done at the
beginning of that file and have this only on the arch which supports SEV.




Will do. Thanks.

-Brijesh


Re: [RFC Part1 PATCH v3 15/17] x86: Add support for changing memory encryption attribute in early boot

2017-08-28 Thread Brijesh Singh
Hi Boris,


On 8/28/17 5:51 AM, Borislav Petkov wrote:

[..]

> +static int __init early_set_memory_enc_dec(resource_size_t paddr,
>> +   unsigned long size, bool enc)
>> +{
>> +unsigned long vaddr, vaddr_end, vaddr_next;
>> +unsigned long psize, pmask;
>> +int split_page_size_mask;
>> +pte_t *kpte;
>> +int level;
>> +
>> +vaddr = (unsigned long)__va(paddr);
>> +vaddr_next = vaddr;
>> +vaddr_end = vaddr + size;
>> +
>> +/*
>> + * We are going to change the physical page attribute from C=1 to C=0
>> + * or vice versa. Flush the caches to ensure that data is written into
>> + * memory with correct C-bit before we change attribute.
>> + */
>> +clflush_cache_range(__va(paddr), size);
>> +
>> +for (; vaddr < vaddr_end; vaddr = vaddr_next) {
>> +kpte = lookup_address(vaddr, &level);
>> +if (!kpte || pte_none(*kpte))
>> +return 1;
> Return before flushing TLBs? Perhaps you mean
>
>   ret = 1;
>   goto out;
>
> here and out does
>
>   __flush_tlb_all();
>   return ret;

Thanks, good catch. I will fix it in the next rev.

-Brijesh


Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-26 Thread Brijesh Singh



On 07/26/2017 02:26 PM, H. Peter Anvin wrote:


   \
   static inline void outs##bwl(int port, const void *addr, unsigned long count) \
   {

This will clash with a fix I did to add a "memory" clobber
for the traditional implementation, see
https://patchwork.kernel.org/patch/9854573/

Is it even worth leaving these as inline functions?
Given the speed of IO cycles it is unlikely that the cost of calling a real
function will be significant.
The code bloat reduction will be significant.

I think the smallest code would be the original "rep insb" etc, which
should be smaller than a function call, unlike the loop. Then again,
there is a rather small number of affected device drivers, almost all
of them for ancient hardware that you won't even build in a 64-bit
x86 kernel, see the list below. The only user I found that is actually
still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the early
console.



There are some indirect users of the string I/O functions. The following
functions defined in lib/iomap.c call the rep versions of ins and outs:

- ioread8_rep, ioread16_rep, ioread32_rep
- iowrite8_rep, iowrite16_rep, iowrite32_rep

I found that several drivers use the above functions.

Here is one approach to convert them into non-inline functions. In this
approach I have added a new file, arch/x86/kernel/io.c, which provides
non-rep versions of the string I/O routines. The file gets built and used
only when AMD_MEM_ENCRYPT is enabled. On the positive side, if we don't build
the kernel with AMD_MEM_ENCRYPT support we use the inline routines; when
AMD_MEM_ENCRYPT is built we make a function call, and inside the function we
unroll only when SEV is active.

Do you see any issue with this approach? Thanks.

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..104927d 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)	\
  unsigned type value = in##bwl(port);\
  slow_down_io(); \
  return value;   \
-}	\
-	\
+}
+
+#define BUILDIO_REP(bwl, bw, type)	\
static inline void outs##bwl(int port, const void *addr, unsigned long count) \
{	\
  asm volatile("rep; outs" #bwl   \
@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr, unsigned long count)\
{	\
  asm volatile("rep; ins" #bwl\
   : "+D"(addr), "+c"(count) : "d"(port));\
-}
+}	\
  
  BUILDIO(b, b, char)

  BUILDIO(w, w, short)
  BUILDIO(l, , int)
  
+#ifdef CONFIG_AMD_MEM_ENCRYPT

+extern void outsb_try_rep(int port, const void *addr, unsigned long count);
+extern void insb_try_rep(int port, void *addr, unsigned long count);
+extern void outsw_try_rep(int port, const void *addr, unsigned long count);
+extern void insw_try_rep(int port, void *addr, unsigned long count);
+extern void outsl_try_rep(int port, const void *addr, unsigned long count);
+extern void insl_try_rep(int port, void *addr, unsigned long count);
+#define outsb  outsb_try_rep
+#define insb   insb_try_rep
+#define outsw  outsw_try_rep
+#define insw   insw_try_rep
+#define outsl  outsl_try_rep
+#define insl   insl_try_rep
+#else
+BUILDIO_REP(b, b, char)
+BUILDIO_REP(w, w, short)
+BUILDIO_REP(l, , int)
+#endif
+
  extern void *xlate_dev_mem_ptr(phys_addr_t phys);
  extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a01892b..3b6e2a3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -42,6 +42,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
  
  obj-y  := process_$(BITS).o signal.o

  obj-$(CONFIG_COMPAT)   += signal_compat.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT) += io.o
 obj-y  += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
  obj-y  += time.o ioport.o dumpstack.o nmi.o
  obj-$(CONFIG_MODIFY_LDT_SYSCALL)   += ldt.o
diff --git a/arch/x86/kernel/io.c b/arch/x86/kernel/io.c
new file mode 100644
index 000..f58afa9
--- /dev/null
+++ b/arch/x86/kernel/io.c
@@ -0,0 +1,87 @@
+#include 
+#include 
+#include 
+
+void outsb_try_rep(int port, const void *addr, unsigned long count)
+{
+   if (sev_active()) {
+   unsigned char *value = (unsigned char *)addr;
+   while (count) {
+   outb(*value, port);
+   value++;
+   count--;
+   }
+   } else {
+   asm volatile("rep; outsb" : "+S"(addr), "+c"(count) : "d"(port));
+   }
+}
+
+void insb_try_rep(int port, void *addr, unsigned long count)
+{
+   if (sev_active()) {
+   unsigned char *value = (unsigned char *)addr;
+  

Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-26 Thread Brijesh Singh


Hi Arnd and David,

On 07/26/2017 05:45 AM, Arnd Bergmann wrote:

On Tue, Jul 25, 2017 at 11:51 AM, David Laight  wrote:

From: Brijesh Singh

Sent: 24 July 2017 20:08
From: Tom Lendacky 

Secure Encrypted Virtualization (SEV) does not support string I/O, so
unroll the string I/O operation into a loop operating on one element at
a time.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/include/asm/io.h | 26 ++
  1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..2f3c002 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port) \
   \
  static inline void outs##bwl(int port, const void *addr, unsigned long count) \
  {


This will clash with a fix I did to add a "memory" clobber
for the traditional implementation, see
https://patchwork.kernel.org/patch/9854573/


Is it even worth leaving these as inline functions?
Given the speed of IO cycles it is unlikely that the cost of calling a real
function will be significant.
The code bloat reduction will be significant.


I think the smallest code would be the original "rep insb" etc, which
should be smaller than a function call, unlike the loop. Then again,
there is a rather small number of affected device drivers, almost all
of them for ancient hardware that you won't even build in a 64-bit
x86 kernel, see the list below. The only user I found that is actually
still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the early
console.



There are some indirect users of the string I/O functions. The following
functions defined in lib/iomap.c call the rep versions of ins and outs:

- ioread8_rep, ioread16_rep, ioread32_rep
- iowrite8_rep, iowrite16_rep, iowrite32_rep

I found that several drivers use the above functions.

Here is one approach to convert them into non-inline functions. In this
approach I have added a new file, arch/x86/kernel/io.c, which provides
non-rep versions of the string I/O routines. The file gets built and used
only when AMD_MEM_ENCRYPT is enabled. On the positive side, if we don't build
the kernel with AMD_MEM_ENCRYPT support we use the inline routines; when
AMD_MEM_ENCRYPT is built we make a function call, and inside the function we
unroll only when SEV is active.

Do you see any issue with this approach? Thanks.

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..104927d 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)   \
unsigned type value = in##bwl(port);\
slow_down_io(); \
return value;   \
-}  \
-   \
+}
+
+#define BUILDIO_REP(bwl, bw, type) \
 static inline void outs##bwl(int port, const void *addr, unsigned long count) \
 {  \
asm volatile("rep; outs" #bwl   \
@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr, unsigned long count)\
 {  \
asm volatile("rep; ins" #bwl\
 : "+D"(addr), "+c"(count) : "d"(port));\
-}
+}  \
 
 BUILDIO(b, b, char)

 BUILDIO(w, w, short)
 BUILDIO(l, , int)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT

+extern void outsb_try_rep(int port, const void *addr, unsigned long count);
+extern void insb_try_rep(int port, void *addr, unsigned long count);
+extern void outsw_try_rep(int port, const void *addr, unsigned long count);
+extern void insw_try_rep(int port, void *addr, unsigned long count);
+extern void outsl_try_rep(int port, const void *addr, unsigned long count);
+extern void insl_try_rep(int port, void *addr, unsigned long count);
+#define outsb  outsb_try_rep
+#define insb   insb_try_rep
+#define outsw  outsw_try_rep
+#define insw   insw_try_rep
+#define outsl  outsl_try_rep
+#define insl   insl_try_rep
+#else
+BUILDIO_REP(b, b, char)
+BUILDIO_REP(w, w, short)
+BUILDIO_REP(l, , int)
+#endif
+
 extern void *xlate_dev_mem_ptr(phys_addr_t phys);
 extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a01892b..3b6e2a3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/M

Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) descrption

2017-07-25 Thread Brijesh Singh



On 07/25/2017 12:45 AM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:41PM -0500, Brijesh Singh wrote:

Subject: Re: [RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure 
Encrypted Virtualization (SEV) descrption

 ^^

Please introduce a spellchecker into your workflow.


Update amd-memory-encryption document describing the AMD Secure Encrypted


"Update the AMD memory encryption document...

The patch has the proper URL already.


Virtualization (SEV) feature.

Signed-off-by: Brijesh Singh 
---
  Documentation/x86/amd-memory-encryption.txt | 29 ++---
  1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/amd-memory-encryption.txt 
b/Documentation/x86/amd-memory-encryption.txt
index f512ab7..747df07 100644
--- a/Documentation/x86/amd-memory-encryption.txt
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -1,4 +1,5 @@
-Secure Memory Encryption (SME) is a feature found on AMD processors.
+Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
+features found on AMD processors.
  
  SME provides the ability to mark individual pages of memory as encrypted using

  the standard x86 page tables.  A page that is marked encrypted will be
@@ -6,6 +7,12 @@ automatically decrypted when read from DRAM and encrypted when 
written to
  DRAM.  SME can therefore be used to protect the contents of DRAM from physical
  attacks on the system.
  
+SEV enables running encrypted virtual machine (VMs) in which the code and data


 machines


+of the virtual machine are secured so that decrypted version is available only


... of the guest VM ...   ... so that a decrypted ...


+within the VM itself. SEV guest VMs have concept of private and shared memory.


have *the* concept - you need to use
definite and indefinite articles in your
text.


+Private memory is encrypted with the guest-specific key, while shared memory
+may be encrypted with hypervisor key.


And here you explain that the hypervisor key is the same key which we
use in SME. So that people can make the connection.


+
  A page is encrypted when a page table entry has the encryption bit set (see
  below on how to determine its position).  The encryption bit can also be
  specified in the cr3 register, allowing the PGD table to be encrypted. Each
@@ -19,11 +26,20 @@ so that the PGD is encrypted, but not set the encryption 
bit in the PGD entry
  for a PUD which results in the PUD pointed to by that entry to not be
  encrypted.
  
-Support for SME can be determined through the CPUID instruction. The CPUID

-function 0x801f reports information related to SME:
+When SEV is enabled, certain type of memory (namely insruction pages and guest


When SEV is enabled, instruction pages and guest page tables are ...


+page tables) are always treated as private. Due to security reasons all DMA


security reasons??


+operations inside the guest must be performed on shared memory. Since the
+memory encryption bit is only controllable by the guest OS when it is operating


 ... is controlled ...


+in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware forces memory


... forces the 
memory ...


+encryption bit to 1.
+
+Support for SME and SEV can be determined through the CPUID instruction. The
+CPUID function 0x801f reports information related to SME:
  
  	0x801f[eax]:

Bit[0] indicates support for SME
+   0x81f[eax]:


There's a 0 missing and you don't really need it as it is already above.


+   Bit[1] indicates support for SEV
0x801f[ebx]:
Bits[5:0]  pagetable bit number used to activate memory
   encryption
@@ -39,6 +55,13 @@ determine if SME is enabled and/or to enable memory 
encryption:
Bit[23]   0 = memory encryption features are disabled
  1 = memory encryption features are enabled
  
+If SEV is supported, MSR 0xc0010131 (MSR_F17H_SEV) can be used to determine if


If this MSR is going to be part of the architecture - and I really think
it is - then call it MSR_AMD64_SEV.



Thanks Boris, I'll update the doc per your feedback and will rename the MSR
to MSR_AMD64_SEV.

-Brijesh


[RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

Secure Encrypted Virtualization (SEV) does not support string I/O, so
unroll the string I/O operation into a loop operating on one element at
a time.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/io.h | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..2f3c002 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port) \
\
 static inline void outs##bwl(int port, const void *addr, unsigned long count) \
 {  \
-   asm volatile("rep; outs" #bwl   \
-: "+S"(addr), "+c"(count) : "d"(port));\
+   if (sev_active()) { \
+   unsigned type *value = (unsigned type *)addr;   \
+   while (count) { \
+   out##bwl(*value, port); \
+   value++;\
+   count--;\
+   }   \
+   } else {\
+   asm volatile("rep; outs" #bwl   \
+: "+S"(addr), "+c"(count) : "d"(port));\
+   }   \
 }  \
\
 static inline void ins##bwl(int port, void *addr, unsigned long count) \
 {  \
-   asm volatile("rep; ins" #bwl\
-: "+D"(addr), "+c"(count) : "d"(port));\
+   if (sev_active()) { \
+   unsigned type *value = (unsigned type *)addr;   \
+   while (count) { \
+   *value = in##bwl(port); \
+   value++;\
+   count--;\
+   }   \
+   } else {\
+   asm volatile("rep; ins" #bwl\
+: "+D"(addr), "+c"(count) : "d"(port));\
+   }   \
 }
 
 BUILDIO(b, b, char)
-- 
2.9.4



[RFC Part1 PATCH v3 14/17] x86/boot: Add early boot support when running with SEV active

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

Early in the boot process, add checks to determine if the kernel is
running with Secure Encrypted Virtualization (SEV) active.

Checking for SEV requires checking that the kernel is running under a
hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
(CPUID 0x8000001f, bit 1) and then checking a non-interceptable SEV MSR
(0xc0010131, bit 0).

This check is required so that during early compressed kernel booting the
pagetables (both the boot pagetables and the KASLR pagetables, if enabled)
are updated to include the encryption mask, so that the kernel is
decompressed into encrypted memory.

After the kernel is decompressed and continues booting, the same logic is
used to check if SEV is active and to set a flag indicating so. This allows
us to distinguish between SME and SEV, each of which has unique differences
in how certain things are handled: e.g. DMA (always bounce buffered with SEV)
or EFI tables (always accessed decrypted with SME).
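
For readers following along, here is a C-level sketch of the detection
sequence described above (the assembly added below implements the same logic
for the compressed boot stage; the helper name is hypothetical):

/* Sketch of the SEV detection steps described in this commit message. */
static bool sev_guest_detected(void)
{
	unsigned int eax, ebx, ecx, edx;
	unsigned long long msr;

	/* CPUID 0x00000001: ECX bit 31 set => running under a hypervisor */
	eax = 0x00000001;
	ecx = 0;
	native_cpuid(&eax, &ebx, &ecx, &edx);
	if (!(ecx & BIT(31)))
		return false;

	/* CPUID 0x8000001f: EAX bit 1 set => SEV is supported */
	eax = 0x8000001f;
	ecx = 0;
	native_cpuid(&eax, &ebx, &ecx, &edx);
	if (!(eax & BIT(1)))
		return false;

	/* MSR 0xc0010131: bit 0 set => SEV is active in this guest */
	rdmsrl(0xc0010131, msr);
	return msr & 1;
}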

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/boot/compressed/Makefile  |   2 +
 arch/x86/boot/compressed/head_64.S |  16 +
 arch/x86/boot/compressed/mem_encrypt.S | 103 +
 arch/x86/boot/compressed/misc.h|   2 +
 arch/x86/boot/compressed/pagetable.c   |   8 ++-
 arch/x86/include/asm/mem_encrypt.h |   3 +
 arch/x86/include/asm/msr-index.h   |   3 +
 arch/x86/include/uapi/asm/kvm_para.h   |   1 -
 arch/x86/mm/mem_encrypt.c  |  42 +++---
 9 files changed, 169 insertions(+), 11 deletions(-)
 create mode 100644 arch/x86/boot/compressed/mem_encrypt.S

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 2c860ad..d2fe901 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -72,6 +72,8 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o 
$(obj)/misc.o \
$(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
$(obj)/piggy.o $(obj)/cpuflags.o
 
+vmlinux-objs-$(CONFIG_X86_64) += $(obj)/mem_encrypt.o
+
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
 ifdef CONFIG_X86_64
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index fbf4c32..6179d43 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -130,6 +130,19 @@ ENTRY(startup_32)
  /*
   * Build early 4G boot pagetable
   */
+   /*
+* If SEV is active then set the encryption mask in the page tables.
+* This will insure that when the kernel is copied and decompressed
+* it will be done so encrypted.
+*/
+   callget_sev_encryption_bit
+   xorl%edx, %edx
+   testl   %eax, %eax
+   jz  1f
+   subl$32, %eax   /* Encryption bit is always above bit 31 */
+   bts %eax, %edx  /* Set encryption mask for page tables */
+1:
+
/* Initialize Page tables to 0 */
lealpgtable(%ebx), %edi
xorl%eax, %eax
@@ -140,12 +153,14 @@ ENTRY(startup_32)
lealpgtable + 0(%ebx), %edi
leal0x1007 (%edi), %eax
movl%eax, 0(%edi)
+   addl%edx, 4(%edi)
 
/* Build Level 3 */
lealpgtable + 0x1000(%ebx), %edi
leal0x1007(%edi), %eax
movl$4, %ecx
 1: movl%eax, 0x00(%edi)
+   addl%edx, 0x04(%edi)
addl$0x1000, %eax
addl$8, %edi
decl%ecx
@@ -156,6 +171,7 @@ ENTRY(startup_32)
movl$0x0183, %eax
movl$2048, %ecx
 1: movl%eax, 0(%edi)
+   addl%edx, 4(%edi)
addl$0x0020, %eax
addl$8, %edi
decl%ecx
diff --git a/arch/x86/boot/compressed/mem_encrypt.S b/arch/x86/boot/compressed/mem_encrypt.S
new file mode 100644
index 000..696716e
--- /dev/null
+++ b/arch/x86/boot/compressed/mem_encrypt.S
@@ -0,0 +1,103 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2017 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+
+   .text
+   .code32
+ENTRY(get_sev_encryption_bit)
+   xor %eax, %eax
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   push%ebx
+   push%ecx
+   push%edx
+
+   /* Check if running under a hypervisor */
+   movl    $1, %eax
+   cpuid
+   bt      $31, %ecx       /* Check the hypervisor bit */
+   jnc     .Lno_sev
+
+   movl    $0x80000000, %eax       /* CPUID to check the highest leaf */
+   cpuid
+   cmpl    $0x8000001f, %eax       /* See if 0x8000001f is available */
+   jb      .Lno_sev
+
+   /*
+* Check

[RFC Part1 PATCH v3 17/17] X86/KVM: Clear encryption attribute when SEV is active

2017-07-24 Thread Brijesh Singh
The guest physical memory areas holding the struct pvclock_wall_clock and
struct pvclock_vcpu_time_info are shared with the hypervisor, which
periodically updates their contents. When SEV is active, we must clear the
encryption attribute from the shared memory pages so that both the hypervisor
and the guest can access the data.

Signed-off-by: Brijesh Singh 
---
 arch/x86/entry/vdso/vma.c  |  5 ++--
 arch/x86/kernel/kvmclock.c | 64 +++---
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 726355c..ff50251 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -114,10 +114,11 @@ static int vvar_fault(const struct vm_special_mapping *sm,
struct pvclock_vsyscall_time_info *pvti =
pvclock_pvti_cpu0_va();
if (pvti && vclock_was_used(VCLOCK_PVCLOCK)) {
-   ret = vm_insert_pfn(
+   ret = vm_insert_pfn_prot(
vma,
vmf->address,
-   __pa(pvti) >> PAGE_SHIFT);
+   __pa(pvti) >> PAGE_SHIFT,
+   pgprot_decrypted(vma->vm_page_prot));
}
} else if (sym_offset == image->sym_hvclock_page) {
struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page();
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d889676..f3a8101 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -45,7 +46,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
 
 /* The hypervisor will put information about time periodically here */
 static struct pvclock_vsyscall_time_info *hv_clock;
-static struct pvclock_wall_clock wall_clock;
+static struct pvclock_wall_clock *wall_clock;
 
 struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
 {
@@ -64,15 +65,18 @@ static void kvm_get_wallclock(struct timespec *now)
int low, high;
int cpu;
 
-   low = (int)__pa_symbol(&wall_clock);
-   high = ((u64)__pa_symbol(&wall_clock) >> 32);
+   if (!wall_clock)
+   return;
+
+   low = (int)slow_virt_to_phys(wall_clock);
+   high = ((u64)slow_virt_to_phys(wall_clock) >> 32);
 
native_write_msr(msr_kvm_wall_clock, low, high);
 
cpu = get_cpu();
 
vcpu_time = &hv_clock[cpu].pvti;
-   pvclock_read_wallclock(&wall_clock, vcpu_time, now);
+   pvclock_read_wallclock(wall_clock, vcpu_time, now);
 
put_cpu();
 }
@@ -249,11 +253,39 @@ static void kvm_shutdown(void)
native_machine_shutdown();
 }
 
+static phys_addr_t __init kvm_memblock_alloc(phys_addr_t size,
+phys_addr_t align)
+{
+   phys_addr_t mem;
+
+   mem = memblock_alloc(size, align);
+   if (!mem)
+   return 0;
+
+   if (sev_active()) {
+   if (early_set_memory_decrypted(mem, size))
+   goto e_free;
+   }
+
+   return mem;
+e_free:
+   memblock_free(mem, size);
+   return 0;
+}
+
+static void __init kvm_memblock_free(phys_addr_t addr, phys_addr_t size)
+{
+   if (sev_active())
+   early_set_memory_encrypted(addr, size);
+
+   memblock_free(addr, size);
+}
+
 void __init kvmclock_init(void)
 {
struct pvclock_vcpu_time_info *vcpu_time;
-   unsigned long mem;
-   int size, cpu;
+   unsigned long mem, mem_wall_clock;
+   int size, cpu, wall_clock_size;
u8 flags;
 
size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
@@ -270,15 +302,29 @@ void __init kvmclock_init(void)
printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
msr_kvm_system_time, msr_kvm_wall_clock);
 
-   mem = memblock_alloc(size, PAGE_SIZE);
-   if (!mem)
+   wall_clock_size = PAGE_ALIGN(sizeof(struct pvclock_wall_clock));
+   mem_wall_clock = kvm_memblock_alloc(wall_clock_size, PAGE_SIZE);
+   if (!mem_wall_clock)
return;
+
+   wall_clock = __va(mem_wall_clock);
+   memset(wall_clock, 0, wall_clock_size);
+
+   mem = kvm_memblock_alloc(size, PAGE_SIZE);
+   if (!mem) {
+   kvm_memblock_free(mem_wall_clock, wall_clock_size);
+   wall_clock = NULL;
+   return;
+   }
+
hv_clock = __va(mem);
memset(hv_clock, 0, size);
 
if (kvm_register_clock("primary cpu clock")) {
hv_clock = NULL;
-   memblock_free(mem, size);
+   kvm_memblock_free(mem, size);
+   kvm_memblock_free(mem_wall_clock, wall_clock_size);
+   wall_clock = NULL;
retu

[RFC Part1 PATCH v3 15/17] x86: Add support for changing memory encryption attribute in early boot

2017-07-24 Thread Brijesh Singh
Some KVM-specific custom MSRs share a guest physical address with the
hypervisor. When SEV is active, the shared physical address must be mapped
with the encryption attribute cleared so that both the hypervisor and the
guest can access the data.

Add APIs to change the memory encryption attribute in early boot code.
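
A hedged usage example of the new APIs (the caller shown here is hypothetical;
the kvmclock patch later in this series is the real user):

/* Sketch: share a page-aligned, memblock-backed buffer with the hypervisor
 * early in boot, before set_memory_decrypted() is usable. */
static int __init share_buffer_with_hv(void *buf, unsigned long size)
{
	int ret;

	ret = early_set_memory_decrypted(__pa(buf), size);
	if (ret)
		return ret;

	/* contents are stale after the C-bit flip, so clear the buffer */
	memset(buf, 0, size);
	return 0;
}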

Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/mem_encrypt.h |  17 ++
 arch/x86/mm/mem_encrypt.c  | 117 +
 2 files changed, 134 insertions(+)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 9cb6472..30b539e 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -46,6 +46,11 @@ void __init sme_early_init(void);
 void __init sme_encrypt_kernel(void);
 void __init sme_enable(struct boot_params *bp);
 
+int __init early_set_memory_decrypted(resource_size_t paddr,
+ unsigned long size);
+int __init early_set_memory_encrypted(resource_size_t paddr,
+ unsigned long size);
+
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void);
 
@@ -69,6 +74,18 @@ static inline void __init sme_early_init(void) { }
 static inline void __init sme_encrypt_kernel(void) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
+static inline int __init early_set_memory_decrypted(resource_size_t paddr,
+   unsigned long size)
+{
+   return 0;
+}
+
+static inline int __init early_set_memory_encrypted(resource_size_t paddr,
+   unsigned long size)
+{
+   return 0;
+}
+
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 /*
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ed8780e..d174b1c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -28,6 +28,8 @@
 #include 
 #include 
 
+#include "mm_internal.h"
+
 static char sme_cmdline_arg[] __initdata = "mem_encrypt";
 static char sme_cmdline_on[]  __initdata = "on";
 static char sme_cmdline_off[] __initdata = "off";
@@ -257,6 +259,121 @@ static void sme_free(struct device *dev, size_t size, void *vaddr,
swiotlb_free_coherent(dev, size, vaddr, dma_handle);
 }
 
+static void __init __set_clr_pte_enc(pte_t *kpte, int level, bool enc)
+{
+   pgprot_t old_prot, new_prot;
+   unsigned long pfn;
+   pte_t new_pte;
+
+   switch (level) {
+   case PG_LEVEL_4K:
+   pfn = pte_pfn(*kpte);
+   old_prot = pte_pgprot(*kpte);
+   break;
+   case PG_LEVEL_2M:
+   pfn = pmd_pfn(*(pmd_t *)kpte);
+   old_prot = pmd_pgprot(*(pmd_t *)kpte);
+   break;
+   case PG_LEVEL_1G:
+   pfn = pud_pfn(*(pud_t *)kpte);
+   old_prot = pud_pgprot(*(pud_t *)kpte);
+   break;
+   default:
+   return;
+   }
+
+   new_prot = old_prot;
+   if (enc)
+   pgprot_val(new_prot) |= _PAGE_ENC;
+   else
+   pgprot_val(new_prot) &= ~_PAGE_ENC;
+
+   /* if prot is same then do nothing */
+   if (pgprot_val(old_prot) == pgprot_val(new_prot))
+   return;
+
+   new_pte = pfn_pte(pfn, new_prot);
+   set_pte_atomic(kpte, new_pte);
+}
+
+static int __init early_set_memory_enc_dec(resource_size_t paddr,
+  unsigned long size, bool enc)
+{
+   unsigned long vaddr, vaddr_end, vaddr_next;
+   unsigned long psize, pmask;
+   int split_page_size_mask;
+   pte_t *kpte;
+   int level;
+
+   vaddr = (unsigned long)__va(paddr);
+   vaddr_next = vaddr;
+   vaddr_end = vaddr + size;
+
+   /*
+* We are going to change the physical page attribute from C=1 to C=0
+* or vice versa. Flush the caches to ensure that data is written into
+* memory with correct C-bit before we change attribute.
+*/
+   clflush_cache_range(__va(paddr), size);
+
+   for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+   kpte = lookup_address(vaddr, &level);
+   if (!kpte || pte_none(*kpte))
+   return 1;
+
+   if (level == PG_LEVEL_4K) {
+   __set_clr_pte_enc(kpte, level, enc);
+   vaddr_next = (vaddr & PAGE_MASK) + PAGE_SIZE;
+   continue;
+   }
+
+   psize = page_level_size(level);
+   pmask = page_level_mask(level);
+
+   /*
+* Check, whether we can change the large page in one go.
+* We request a split, when the address is not aligned and
+* the number of pages to set/clear encryption bit is smaller
+* than the number of pages in the large page.
+   

[RFC Part1 PATCH v3 16/17] X86/KVM: Provide support to create Guest and HV shared per-CPU variables

2017-07-24 Thread Brijesh Singh
Some KVM-specific MSRs (steal-time, asyncpf, avic_eio) are backed by per-CPU
variables allocated at compile time, whose physical addresses are shared with
the hypervisor. This presents a challenge when SEV is active in the guest OS:
the guest memory is encrypted with the guest key, so the hypervisor is not
able to modify the guest memory. When SEV is active, we therefore need to
clear the encryption attribute (aka the C-bit) of these shared physical
addresses so that both guest and hypervisor can access the data.

To solve this problem, I have tried these three options:

1) Convert the static per-CPU to dynamic per-CPU allocation and when SEV
is detected clear the C-bit from the page table. But while doing so I
found that per-CPU dynamic allocator was not ready when kvm_guest_cpu_init
was called.

2) Since the C-bit works on PAGE_SIZE hence add some extra padding to
'struct kvm-steal-time' to make it PAGE_SIZE and then at runtime
clear the encryption attribute of the full PAGE. The downside of this -
we need to modify structure which may break the compatibility.

3) Define a new per-CPU section (.data..percpu.hv_shared) which will be
used to hold the compile time shared per-CPU variables. When SEV is
detected we map this section without C-bit.

This patch implements #3. It introduces a new DEFINE_PER_CPU_HV_SHARED
macro to create a compile-time per-CPU variable. When SEV is detected we
clear the C-bit from the shared per-CPU variable.
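
Roughly, the new macros are thin wrappers that place the variable in the new
.data..percpu..hv_shared section; a sketch only, assuming they follow the
existing DEFINE_PER_CPU_SECTION pattern:

	#define DECLARE_PER_CPU_HV_SHARED(type, name)			\
		DECLARE_PER_CPU_SECTION(type, name, "..hv_shared")

	#define DEFINE_PER_CPU_HV_SHARED(type, name)			\
		DEFINE_PER_CPU_SECTION(type, name, "..hv_shared")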

Signed-off-by: Brijesh Singh 
---
 arch/x86/kernel/kvm.c | 46 ---
 include/asm-generic/vmlinux.lds.h |  3 +++
 include/linux/percpu-defs.h   | 12 ++
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 71c17a5..1f6fec8 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,8 +75,8 @@ static int parse_no_kvmclock_vsyscall(char *arg)
 
 early_param("no-kvmclock-vsyscall", parse_no_kvmclock_vsyscall);
 
-static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
-static DEFINE_PER_CPU(struct kvm_steal_time, steal_time) __aligned(64);
+static DEFINE_PER_CPU_HV_SHARED(struct kvm_vcpu_pv_apf_data, apf_reason) 
__aligned(64);
+static DEFINE_PER_CPU_HV_SHARED(struct kvm_steal_time, steal_time) 
__aligned(64);
 static int has_steal_clock = 0;
 
 /*
@@ -303,7 +303,7 @@ static void kvm_register_steal_time(void)
cpu, (unsigned long long) slow_virt_to_phys(st));
 }
 
-static DEFINE_PER_CPU(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
+static DEFINE_PER_CPU_HV_SHARED(unsigned long, kvm_apic_eoi) = 
KVM_PV_EOI_DISABLED;
 
 static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 {
@@ -319,11 +319,51 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 
val)
apic->native_eoi_write(APIC_EOI, APIC_EOI_ACK);
 }
 
+/* NOTE: function is marked as __ref because it is used by __init functions */
+static int __ref kvm_map_hv_shared_decrypted(void)
+{
+   static int once, ret;
+   int cpu;
+
+   if (once)
+   return ret;
+
+   /*
+* Iterate through all possible CPU's and clear the C-bit from
+* percpu variables.
+*/
+   for_each_possible_cpu(cpu) {
+   struct kvm_vcpu_pv_apf_data *apf;
+   unsigned long pa;
+
+   apf = &per_cpu(apf_reason, cpu);
+   pa = slow_virt_to_phys(apf);
+   sme_early_decrypt(pa & PAGE_MASK, PAGE_SIZE);
+   ret = early_set_memory_decrypted(pa, PAGE_SIZE);
+   if (ret)
+   break;
+   }
+
+   once = 1;
+   return ret;
+}
+
 static void kvm_guest_cpu_init(void)
 {
if (!kvm_para_available())
return;
 
+   /*
+* When SEV is active, map the shared percpu as unencrypted so that
+* both guest and hypervisor can access the data.
+*/
+   if (sev_active()) {
+   if (kvm_map_hv_shared_decrypted()) {
+   printk(KERN_ERR "Failed to map percpu as 
unencrypted\n");
+   return;
+   }
+   }
+
if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
 
diff --git a/include/asm-generic/vmlinux.lds.h 
b/include/asm-generic/vmlinux.lds.h
index da0be9a..52854cf 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -783,6 +783,9 @@
. = ALIGN(cacheline);   \
*(.data..percpu)\
*(.data..percpu..shared_aligned)\
+   . = ALIGN(PAGE_SIZE);   \
+   *(.data..percpu..hv_shared) \
+   . = ALIGN(PAGE_SIZE);  

[RFC Part1 PATCH v3 11/17] x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

In order for memory pages to be properly mapped when SEV is active, we
need to use the PAGE_KERNEL protection attribute as the base protection.
This will ensure that memory mapping of, e.g. ACPI tables, receives the
proper mapping attributes.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/ioremap.c  | 28 
 include/linux/ioport.h |  3 +++
 kernel/resource.c  | 17 +
 3 files changed, 48 insertions(+)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c0be7cf..7b27332 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -69,6 +69,26 @@ static int __ioremap_check_ram(unsigned long start_pfn, 
unsigned long nr_pages,
return 0;
 }
 
+static int __ioremap_res_desc_other(struct resource *res, void *arg)
+{
+   return (res->desc != IORES_DESC_NONE);
+}
+
+/*
+ * This function returns true if the target memory is marked as
+ * IORESOURCE_MEM and IORESOURCE_BUSY and described as other than
+ * IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
+ */
+static bool __ioremap_check_if_mem(resource_size_t addr, unsigned long size)
+{
+   u64 start, end;
+
+   start = (u64)addr;
+   end = start + size - 1;
+
+   return (walk_mem_res(start, end, NULL, __ioremap_res_desc_other) == 1);
+}
+
 /*
  * Remap an arbitrary physical address space into the kernel virtual
  * address space. It transparently creates kernel huge I/O mapping when
@@ -146,7 +166,15 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
pcm = new_pcm;
}
 
+   /*
+* If the page being mapped is in memory and SEV is active then
+* make sure the memory encryption attribute is enabled in the
+* resulting mapping.
+*/
prot = PAGE_KERNEL_IO;
+   if (sev_active() && __ioremap_check_if_mem(phys_addr, size))
+   prot = pgprot_encrypted(prot);
+
switch (pcm) {
case _PAGE_CACHE_MODE_UC:
default:
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 1c66b9c..297f5b8 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -268,6 +268,9 @@ extern int
 walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
void *arg, int (*func)(unsigned long, unsigned long, void *));
 extern int
+walk_mem_res(u64 start, u64 end, void *arg,
+int (*func)(struct resource *, void *));
+extern int
 walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(struct resource *, void *));
 extern int
diff --git a/kernel/resource.c b/kernel/resource.c
index 5f9ee7bb0..ec3fa0c 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -468,6 +468,23 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
 arg, func);
 }
 
+/*
+ * This function calls the @func callback against all memory ranges, which
+ * are ranges marked as IORESOURCE_MEM and IORESOURCE_BUSY.
+ */
+int walk_mem_res(u64 start, u64 end, void *arg,
+int (*func)(struct resource *, void *))
+{
+   struct resource res;
+
+   res.start = start;
+   res.end = end;
+   res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+
+   return __walk_iomem_res_desc(&res, IORES_DESC_NONE, true,
+arg, func);
+}
+
 #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
 
 /*
-- 
2.9.4



[RFC Part1 PATCH v3 12/17] x86/mm: DMA support for SEV memory encryption

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

DMA access to memory mapped as encrypted while SEV is active cannot be
encrypted during a device write or decrypted during a device read. In order
for DMA to work properly when SEV is active, the SWIOTLB bounce buffers
must be used.
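
From a driver's point of view nothing changes; a coherent allocation is
simply routed through the new DMA ops when SEV is active. An illustrative
sketch (not part of this patch):

	/*
	 * Hypothetical driver snippet: with SEV active the global dma_ops is
	 * switched to sme_dma_ops, so this call ends up in sme_alloc() below
	 * and the returned buffer has the C-bit cleared so the device can
	 * access it.
	 */
	static void *example_alloc_dma_buffer(struct device *dev, size_t size,
					      dma_addr_t *dma_handle)
	{
		return dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
	}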

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/mem_encrypt.c | 86 +++
 lib/swiotlb.c |  5 +--
 2 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 1e4643e..5e5d460 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -191,8 +191,86 @@ void __init sme_early_init(void)
/* Update the protection map with memory encryption mask */
for (i = 0; i < ARRAY_SIZE(protection_map); i++)
protection_map[i] = pgprot_encrypted(protection_map[i]);
+
+   if (sev_active())
+   swiotlb_force = SWIOTLB_FORCE;
+}
+
+static void *sme_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
+  gfp_t gfp, unsigned long attrs)
+{
+   unsigned long dma_mask;
+   unsigned int order;
+   struct page *page;
+   void *vaddr = NULL;
+
+   dma_mask = dma_alloc_coherent_mask(dev, gfp);
+   order = get_order(size);
+
+   /*
+* Memory will be memset to zero after marking decrypted, so don't
+* bother clearing it before.
+*/
+   gfp &= ~__GFP_ZERO;
+
+   page = alloc_pages_node(dev_to_node(dev), gfp, order);
+   if (page) {
+   dma_addr_t addr;
+
+   /*
+* Since we will be clearing the encryption bit, check the
+* mask with it already cleared.
+*/
+   addr = __sme_clr(phys_to_dma(dev, page_to_phys(page)));
+   if ((addr + size) > dma_mask) {
+   __free_pages(page, get_order(size));
+   } else {
+   vaddr = page_address(page);
+   *dma_handle = addr;
+   }
+   }
+
+   if (!vaddr)
+   vaddr = swiotlb_alloc_coherent(dev, size, dma_handle, gfp);
+
+   if (!vaddr)
+   return NULL;
+
+   /* Clear the SME encryption bit for DMA use if not swiotlb area */
+   if (!is_swiotlb_buffer(dma_to_phys(dev, *dma_handle))) {
+   set_memory_decrypted((unsigned long)vaddr, 1 << order);
+   memset(vaddr, 0, PAGE_SIZE << order);
+   *dma_handle = __sme_clr(*dma_handle);
+   }
+
+   return vaddr;
+}
+
+static void sme_free(struct device *dev, size_t size, void *vaddr,
+dma_addr_t dma_handle, unsigned long attrs)
+{
+   /* Set the SME encryption bit for re-use if not swiotlb area */
+   if (!is_swiotlb_buffer(dma_to_phys(dev, dma_handle)))
+   set_memory_encrypted((unsigned long)vaddr,
+1 << get_order(size));
+
+   swiotlb_free_coherent(dev, size, vaddr, dma_handle);
 }
 
+static const struct dma_map_ops sme_dma_ops = {
+   .alloc  = sme_alloc,
+   .free   = sme_free,
+   .map_page   = swiotlb_map_page,
+   .unmap_page = swiotlb_unmap_page,
+   .map_sg = swiotlb_map_sg_attrs,
+   .unmap_sg   = swiotlb_unmap_sg_attrs,
+   .sync_single_for_cpu= swiotlb_sync_single_for_cpu,
+   .sync_single_for_device = swiotlb_sync_single_for_device,
+   .sync_sg_for_cpu= swiotlb_sync_sg_for_cpu,
+   .sync_sg_for_device = swiotlb_sync_sg_for_device,
+   .mapping_error  = swiotlb_dma_mapping_error,
+};
+
 /* Architecture __weak replacement functions */
 void __init mem_encrypt_init(void)
 {
@@ -202,6 +280,14 @@ void __init mem_encrypt_init(void)
/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
swiotlb_update_mem_attributes();
 
+   /*
+* With SEV, DMA operations cannot use encryption. New DMA ops
+* are required in order to mark the DMA areas as decrypted or
+* to use bounce buffers.
+*/
+   if (sev_active())
+   dma_ops = &sme_dma_ops;
+
pr_info("AMD Secure Memory Encryption (SME) active\n");
 }
 
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 8c6c83e..85fed2f 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -507,8 +507,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
if (no_iotlb_memory)
panic("Can not allocate SWIOTLB buffer earlier and can't now 
provide you with the DMA bounce buffer");
 
-   if (sme_active())
-   pr_warn_once("SME is active and system is using DMA bounce 
buffers\n");
+   if (sme_active() || sev_active())
+   pr_warn_once("%s is active and system is using DMA bounce 
buffers\n

[RFC Part1 PATCH v3 10/17] resource: Provide resource struct in resource walk callback

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

In preparation for a new function that will need additional resource information
during the resource walk, update the resource walk callback to pass the
resource structure.  Since the current callback start and end arguments
are pulled from the resource structure, the callback functions can obtain
them from the resource structure directly.
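
A callback written against the new prototype only has to read the range out
of the resource it is handed, for example (hypothetical callback, not part
of this patch):

	static int sum_range_sizes(struct resource *res, void *arg)
	{
		u64 *total = arg;

		*total += resource_size(res);	/* res->end - res->start + 1 */
		return 0;			/* non-zero stops the walk   */
	}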

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/powerpc/kernel/machine_kexec_file_64.c | 12 +---
 arch/x86/kernel/crash.c | 18 +-
 arch/x86/kernel/pmem.c  |  2 +-
 include/linux/ioport.h  |  4 ++--
 include/linux/kexec.h   |  2 +-
 kernel/kexec_file.c |  5 +++--
 kernel/resource.c   |  9 +
 7 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec_file_64.c 
b/arch/powerpc/kernel/machine_kexec_file_64.c
index 992c0d2..e4395f9 100644
--- a/arch/powerpc/kernel/machine_kexec_file_64.c
+++ b/arch/powerpc/kernel/machine_kexec_file_64.c
@@ -91,11 +91,13 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
  * and that value will be returned. If all free regions are visited without
  * func returning non-zero, then zero will be returned.
  */
-int arch_kexec_walk_mem(struct kexec_buf *kbuf, int (*func)(u64, u64, void *))
+int arch_kexec_walk_mem(struct kexec_buf *kbuf,
+   int (*func)(struct resource *, void *))
 {
int ret = 0;
u64 i;
phys_addr_t mstart, mend;
+   struct resource res = { };
 
if (kbuf->top_down) {
for_each_free_mem_range_reverse(i, NUMA_NO_NODE, 0,
@@ -105,7 +107,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int 
(*func)(u64, u64, void *))
 * range while in kexec, end points to the last byte
 * in the range.
 */
-   ret = func(mstart, mend - 1, kbuf);
+   res.start = mstart;
+   res.end = mend - 1;
+   ret = func(&res, kbuf);
if (ret)
break;
}
@@ -117,7 +121,9 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf, int 
(*func)(u64, u64, void *))
 * range while in kexec, end points to the last byte
 * in the range.
 */
-   ret = func(mstart, mend - 1, kbuf);
+   res.start = mstart;
+   res.end = mend - 1;
+   ret = func(&res, kbuf);
if (ret)
break;
}
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 44404e2..815008c 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -209,7 +209,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 }
 
 #ifdef CONFIG_KEXEC_FILE
-static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
+static int get_nr_ram_ranges_callback(struct resource *res, void *arg)
 {
unsigned int *nr_ranges = arg;
 
@@ -342,7 +342,7 @@ static int elf_header_exclude_ranges(struct crash_elf_data 
*ced,
return ret;
 }
 
-static int prepare_elf64_ram_headers_callback(u64 start, u64 end, void *arg)
+static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
 {
struct crash_elf_data *ced = arg;
Elf64_Ehdr *ehdr;
@@ -355,7 +355,7 @@ static int prepare_elf64_ram_headers_callback(u64 start, 
u64 end, void *arg)
ehdr = ced->ehdr;
 
/* Exclude unwanted mem ranges */
-   ret = elf_header_exclude_ranges(ced, start, end);
+   ret = elf_header_exclude_ranges(ced, res->start, res->end);
if (ret)
return ret;
 
@@ -518,14 +518,14 @@ static int add_e820_entry(struct boot_params *params, 
struct e820_entry *entry)
return 0;
 }
 
-static int memmap_entry_callback(u64 start, u64 end, void *arg)
+static int memmap_entry_callback(struct resource *res, void *arg)
 {
struct crash_memmap_data *cmd = arg;
struct boot_params *params = cmd->params;
struct e820_entry ei;
 
-   ei.addr = start;
-   ei.size = end - start + 1;
+   ei.addr = res->start;
+   ei.size = res->end - res->start + 1;
ei.type = cmd->type;
add_e820_entry(params, &ei);
 
@@ -619,12 +619,12 @@ int crash_setup_memmap_entries(struct kimage *image, 
struct boot_params *params)
return ret;
 }
 
-static int determine_backup_region(u64 start, u64 end, void *arg)
+static int determine_backup_region(struct resource *res, void *arg)
 {
struct kimage *image = arg;
 
-   image->arch.backup_src_start = start;
-   image->arch.backup_src_sz = end - start + 1;
+   image->a

[RFC Part1 PATCH v3 09/17] resource: Consolidate resource walking code

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range()
functions each have much of the same code.  Create a new function that
consolidates the common code from these functions in one place to reduce
the amount of duplicated code.
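
The caller-visible interfaces are unchanged; for example, a walk over System
RAM still looks like this (illustrative usage, not part of this patch):

	static int note_ram_range(u64 start, u64 end, void *arg)
	{
		pr_info("System RAM: %#llx-%#llx\n", start, end);
		return 0;	/* returning non-zero stops the walk */
	}

	static void dump_system_ram(void)
	{
		walk_system_ram_res(0, -1, NULL, note_ram_range);
	}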

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 kernel/resource.c | 53 ++---
 1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 9b5f044..7b20b3e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -397,9 +397,30 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
res->start = p->start;
if (res->end > p->end)
res->end = p->end;
+   res->desc = p->desc;
return 0;
 }
 
+static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
+bool first_level_children_only,
+void *arg, int (*func)(u64, u64, void *))
+{
+   u64 orig_end = res->end;
+   int ret = -1;
+
+   while ((res->start < res->end) &&
+  !find_next_iomem_res(res, desc, first_level_children_only)) {
+   ret = (*func)(res->start, res->end, arg);
+   if (ret)
+   break;
+
+   res->start = res->end + 1;
+   res->end = orig_end;
+   }
+
+   return ret;
+}
+
 /*
  * Walks through iomem resources and calls func() with matching resource
  * ranges. This walks through whole tree and not just first level children.
@@ -418,26 +439,12 @@ int walk_iomem_res_desc(unsigned long desc, unsigned long 
flags, u64 start,
u64 end, void *arg, int (*func)(u64, u64, void *))
 {
struct resource res;
-   u64 orig_end;
-   int ret = -1;
 
res.start = start;
res.end = end;
res.flags = flags;
-   orig_end = res.end;
-
-   while ((res.start < res.end) &&
-   (!find_next_iomem_res(&res, desc, false))) {
-
-   ret = (*func)(res.start, res.end, arg);
-   if (ret)
-   break;
-
-   res.start = res.end + 1;
-   res.end = orig_end;
-   }
 
-   return ret;
+   return __walk_iomem_res_desc(&res, desc, false, arg, func);
 }
 
 /*
@@ -451,22 +458,13 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(u64, u64, void *))
 {
struct resource res;
-   u64 orig_end;
-   int ret = -1;
 
res.start = start;
res.end = end;
res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-   orig_end = res.end;
-   while ((res.start < res.end) &&
-   (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
-   ret = (*func)(res.start, res.end, arg);
-   if (ret)
-   break;
-   res.start = res.end + 1;
-   res.end = orig_end;
-   }
-   return ret;
+
+   return __walk_iomem_res_desc(&res, IORES_DESC_NONE, true,
+arg, func);
 }
 
 #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
@@ -508,6 +506,7 @@ static int __is_ram(unsigned long pfn, unsigned long 
nr_pages, void *arg)
 {
return 1;
 }
+
 /*
  * This generic page_is_ram() returns true if specified address is
  * registered as System RAM in iomem_resource list.
-- 
2.9.4



[RFC Part1 PATCH v3 05/17] x86, realmode: Don't decrypt trampoline area under SEV

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

When SEV is active the trampoline area will need to be in encrypted
memory so only mark the area decrypted if SME is active.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/realmode/init.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 1f71980..c7eeca7 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -63,9 +63,11 @@ static void __init setup_real_mode(void)
/*
 * If SME is active, the trampoline area will need to be in
 * decrypted memory in order to bring up other processors
-* successfully.
+* successfully. For SEV the trampoline area needs to be in
+* encrypted memory, so only do this for SME.
 */
-   set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
+   if (sme_active())
+   set_memory_decrypted((unsigned long)base, size >> PAGE_SHIFT);
 
memcpy(base, real_mode_blob, size);
 
-- 
2.9.4



[RFC Part1 PATCH v3 06/17] x86/mm: Use encrypted access of boot related data with SEV

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

When Secure Encrypted Virtualization (SEV) is active, boot data (such as
EFI related data, setup data) is encrypted and needs to be accessed as
such when mapped. Update the architecture override in early_memremap to
keep the encryption attribute when mapping this data.
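
A typical early boot consumer is unchanged; with this patch the mapping it
gets back simply carries the right encryption attribute. An illustrative
sketch (not part of this patch; 'pa_data' stands for any setup_data address):

	static void __init example_read_setup_data(u64 pa_data)
	{
		struct setup_data *data;

		data = early_memremap(pa_data, sizeof(*data));
		if (!data)
			return;
		/*
		 * Under SEV this mapping is created encrypted, matching how
		 * the guest BIOS placed the data; under SME it is created
		 * decrypted because the bootloader wrote it unencrypted.
		 */
		early_memunmap(data, sizeof(*data));
	}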

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/ioremap.c | 44 
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 34f0e18..c0be7cf 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -422,6 +422,9 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
  * areas should be mapped decrypted. And since the encryption key can
  * change across reboots, persistent memory should also be mapped
  * decrypted.
+ *
+ * If SEV is active, that implies that BIOS/UEFI also ran encrypted so
+ * only persistent memory should be mapped decrypted.
  */
 static bool memremap_should_map_decrypted(resource_size_t phys_addr,
  unsigned long size)
@@ -458,6 +461,11 @@ static bool memremap_should_map_decrypted(resource_size_t 
phys_addr,
case E820_TYPE_ACPI:
case E820_TYPE_NVS:
case E820_TYPE_UNUSABLE:
+   /* For SEV, these areas are encrypted */
+   if (sev_active())
+   break;
+   /* Fallthrough */
+
case E820_TYPE_PRAM:
return true;
default:
@@ -581,7 +589,7 @@ static bool __init 
early_memremap_is_setup_data(resource_size_t phys_addr,
 bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
 unsigned long flags)
 {
-   if (!sme_active())
+   if (!sme_active() && !sev_active())
return true;
 
if (flags & MEMREMAP_ENC)
@@ -590,10 +598,15 @@ bool arch_memremap_can_ram_remap(resource_size_t 
phys_addr, unsigned long size,
if (flags & MEMREMAP_DEC)
return false;
 
-   if (memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size) ||
-   memremap_should_map_decrypted(phys_addr, size))
-   return false;
+   if (sme_active()) {
+   if (memremap_is_setup_data(phys_addr, size) ||
+   memremap_is_efi_data(phys_addr, size) ||
+   memremap_should_map_decrypted(phys_addr, size))
+   return false;
+   } else if (sev_active()) {
+   if (memremap_should_map_decrypted(phys_addr, size))
+   return false;
+   }
 
return true;
 }
@@ -608,15 +621,22 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
 unsigned long size,
 pgprot_t prot)
 {
-   if (!sme_active())
+   if (!sme_active() && !sev_active())
return prot;
 
-   if (early_memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size) ||
-   memremap_should_map_decrypted(phys_addr, size))
-   prot = pgprot_decrypted(prot);
-   else
-   prot = pgprot_encrypted(prot);
+   if (sme_active()) {
+   if (early_memremap_is_setup_data(phys_addr, size) ||
+   memremap_is_efi_data(phys_addr, size) ||
+   memremap_should_map_decrypted(phys_addr, size))
+   prot = pgprot_decrypted(prot);
+   else
+   prot = pgprot_encrypted(prot);
+   } else if (sev_active()) {
+   if (memremap_should_map_decrypted(phys_addr, size))
+   prot = pgprot_decrypted(prot);
+   else
+   prot = pgprot_encrypted(prot);
+   }
 
return prot;
 }
-- 
2.9.4



[RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

The current code checks only for sme_active() when determining whether
to perform the encryption attribute change.  Include sev_active() in this
check so that memory attribute changes can occur under SME and SEV.
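
Callers of the attribute-change helpers are unaffected; the same sequence
now works in an SEV guest as well. An illustrative sketch (not part of this
patch):

	/* Hypothetical example: temporarily share one page with the host. */
	static void example_share_page(struct page *page)
	{
		unsigned long vaddr = (unsigned long)page_address(page);

		set_memory_decrypted(vaddr, 1);	/* clear the C-bit */
		/* ... let the hypervisor or a device touch the page ... */
		set_memory_encrypted(vaddr, 1);	/* make it private again */
	}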

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/pageattr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index dfb7d65..b726b23 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int 
numpages, bool enc)
unsigned long start;
int ret;
 
-   /* Nothing to do if the SME is not active */
-   if (!sme_active())
+   /* Nothing to do if SME and SEV are not active */
+   if (!sme_active() && !sev_active())
return 0;
 
/* Should not be working on unaligned addresses */
-- 
2.9.4



[RFC Part1 PATCH v3 08/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

EFI data is encrypted when the kernel is run under SEV. Update the
page table references to be sure the EFI memory areas are accessed
encrypted.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/platform/efi/efi_64.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 12e8388..1ecb3f6 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -369,7 +370,10 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 * as trim_bios_range() will reserve the first page and isolate it away
 * from memory allocators anyway.
 */
-   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
+   pf = _PAGE_RW;
+   if (sev_active())
+   pf |= _PAGE_ENC;
+   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
}
@@ -412,6 +416,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 
va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
 
+   if (sev_active())
+   flags |= _PAGE_ENC;
+
pfn = md->phys_addr >> PAGE_SHIFT;
if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
@@ -511,6 +518,9 @@ static int __init efi_update_mappings(efi_memory_desc_t 
*md, unsigned long pf)
pgd_t *pgd = efi_pgd;
int err1, err2;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
/* Update the 1:1 mapping */
pfn = md->phys_addr >> PAGE_SHIFT;
err1 = kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages, 
pf);
@@ -589,6 +599,9 @@ void __init efi_runtime_update_mappings(void)
(md->type != EFI_RUNTIME_SERVICES_CODE))
pf |= _PAGE_RW;
 
+   if (sev_active())
+   pf |= _PAGE_ENC;
+
efi_update_mappings(md, pf);
}
 }
-- 
2.9.4



[RFC Part1 PATCH v3 02/17] x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

Update the CPU features to include identifying and reporting on the
Secure Encrypted Virtualization (SEV) feature.  SEV is identified by
CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR).  Only show the SEV feature
as available if reported by CPUID and enabled by BIOS.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/msr-index.h   |  2 ++
 arch/x86/kernel/cpu/amd.c  | 30 +-
 arch/x86/kernel/cpu/scattered.c|  1 +
 4 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 14f0f29..b6ae647 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -197,6 +197,7 @@
 #define X86_FEATURE_HW_PSTATE  ( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
 #define X86_FEATURE_SME		( 7*32+10) /* AMD Secure Memory Encryption */
+#define X86_FEATURE_SEV		( 7*32+11) /* AMD Secure Encrypted Virtualization */
 
 #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number 
*/
 #define X86_FEATURE_INTEL_PT   ( 7*32+15) /* Intel Processor Trace */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 17f5c12..e399d68 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -378,6 +378,8 @@
 #define MSR_K7_PERFCTR3		0xc0010007
 #define MSR_K7_CLK_CTL			0xc001001b
 #define MSR_K7_HWCR			0xc0010015
+#define MSR_K7_HWCR_SMMLOCK_BIT	0
+#define MSR_K7_HWCR_SMMLOCK		BIT_ULL(MSR_K7_HWCR_SMMLOCK_BIT)
 #define MSR_K7_FID_VID_CTL 0xc0010041
 #define MSR_K7_FID_VID_STATUS  0xc0010042
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 110ca5d..c413f04 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -618,11 +618,16 @@ static void early_init_amd(struct cpuinfo_x86 *c)
set_cpu_bug(c, X86_BUG_AMD_E400);
 
/*
-* BIOS support is required for SME. If BIOS has enabled SME then
-* adjust x86_phys_bits by the SME physical address space reduction
-* value. If BIOS has not enabled SME then don't advertise the
-* feature (set in scattered.c). Also, since the SME support requires
-* long mode, don't advertise the feature under CONFIG_X86_32.
+* BIOS support is required for SME and SEV.
+*   For SME: If BIOS has enabled SME then adjust x86_phys_bits by
+*the SME physical address space reduction value.
+*If BIOS has not enabled SME then don't advertise the
+*SME feature (set in scattered.c).
+*   For SEV: If BIOS has not enabled SEV then don't advertise the
+*SEV feature (set in scattered.c).
+*
+*   In all cases, since support for SME and SEV requires long mode,
+*   don't advertise the feature under CONFIG_X86_32.
 */
if (cpu_has(c, X86_FEATURE_SME)) {
u64 msr;
@@ -637,6 +642,21 @@ static void early_init_amd(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_SME);
}
}
+
+   if (cpu_has(c, X86_FEATURE_SEV)) {
+   if (IS_ENABLED(CONFIG_X86_32)) {
+   clear_cpu_cap(c, X86_FEATURE_SEV);
+   } else {
+   u64 syscfg, hwcr;
+
+   /* Check if SEV is enabled */
+   rdmsrl(MSR_K8_SYSCFG, syscfg);
+   rdmsrl(MSR_K7_HWCR, hwcr);
+   if (!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT) ||
+   !(hwcr & MSR_K7_HWCR_SMMLOCK))
+   clear_cpu_cap(c, X86_FEATURE_SEV);
+   }
+   }
 }
 
 static void init_amd_k8(struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 05459ad..63a78d5 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -32,6 +32,7 @@ static const struct cpuid_bit cpuid_bits[] = {
	{ X86_FEATURE_CPB,		CPUID_EDX,  9, 0x80000007, 0 },
	{ X86_FEATURE_PROC_FEEDBACK,	CPUID_EDX, 11, 0x80000007, 0 },
	{ X86_FEATURE_SME,		CPUID_EAX,  0, 0x8000001f, 0 },
+	{ X86_FEATURE_SEV,		CPUID_EAX,  1, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.9.4



[RFC Part1 PATCH v3 04/17] x86/mm: Don't attempt to encrypt initrd under SEV

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

When SEV is active the initrd/initramfs will already have been placed in
memory encrypted, so do not try to encrypt it.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/kernel/setup.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 0bfe0c1..01d56a1 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -379,9 +379,11 @@ static void __init reserve_initrd(void)
 * If SME is active, this memory will be marked encrypted by the
 * kernel when it is accessed (including relocation). However, the
 * ramdisk image was loaded decrypted by the bootloader, so make
-* sure that it is encrypted before accessing it.
+* sure that it is encrypted before accessing it. For SEV the
+* ramdisk will already be encrypted, so only do this for SME.
 */
-   sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
+   if (sme_active())
+   sme_early_encrypt(ramdisk_image, ramdisk_end - ramdisk_image);
 
initrd_start = 0;
 
-- 
2.9.4



[RFC Part1 PATCH v3 03/17] x86/mm: Secure Encrypted Virtualization (SEV) support

2017-07-24 Thread Brijesh Singh
From: Tom Lendacky 

Provide support for Secure Encrypted Virtualization (SEV). This initial
support defines a flag that is used by the kernel to determine if it is
running with SEV active.
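
Code elsewhere in the series is expected to distinguish the two cases roughly
like this (illustrative sketch, not part of this patch):

	static void example_report_mem_encryption(void)
	{
		if (sev_active())
			pr_info("SEV guest: memory encrypted with the guest key\n");
		else if (sme_active())
			pr_info("SME host: kernel controls the C-bit itself\n");
		else
			pr_info("No memory encryption active\n");
	}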

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
 arch/x86/include/asm/mem_encrypt.h | 2 ++
 arch/x86/mm/mem_encrypt.c  | 3 +++
 include/linux/mem_encrypt.h| 8 +++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 8e618fc..9274ec7 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -22,6 +22,7 @@
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 
 extern unsigned long sme_me_mask;
+extern unsigned int sev_enabled;
 
 void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
 unsigned long decrypted_kernel_vaddr,
@@ -50,6 +51,7 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned long 
size);
 #else  /* !CONFIG_AMD_MEM_ENCRYPT */
 
 #define sme_me_mask0UL
+#define sev_enabled0
 
 static inline void __init sme_early_encrypt(resource_size_t paddr,
unsigned long size) { }
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 0fbd092..1e4643e 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -40,6 +40,9 @@ static char sme_cmdline_off[] __initdata = "off";
 unsigned long sme_me_mask __section(.data) = 0;
 EXPORT_SYMBOL_GPL(sme_me_mask);
 
+unsigned int sev_enabled __section(.data) = 0;
+EXPORT_SYMBOL_GPL(sev_enabled);
+
 /* Buffer used for early in-place encryption by BSP, no locking needed */
 static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
 
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
index 1255f09..ea0831a 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -22,12 +22,18 @@
 #else  /* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
 
 #define sme_me_mask0UL
+#define sev_enabled0
 
 #endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */
 
 static inline bool sme_active(void)
 {
-   return !!sme_me_mask;
+   return (sme_me_mask && !sev_enabled);
+}
+
+static inline bool sev_active(void)
+{
+   return (sme_me_mask && sev_enabled);
 }
 
 static inline unsigned long sme_get_me_mask(void)
-- 
2.9.4



[RFC Part1 PATCH v3 01/17] Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV) description

2017-07-24 Thread Brijesh Singh
Update amd-memory-encryption document describing the AMD Secure Encrypted
Virtualization (SEV) feature.

Signed-off-by: Brijesh Singh 
---
 Documentation/x86/amd-memory-encryption.txt | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/Documentation/x86/amd-memory-encryption.txt 
b/Documentation/x86/amd-memory-encryption.txt
index f512ab7..747df07 100644
--- a/Documentation/x86/amd-memory-encryption.txt
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -1,4 +1,5 @@
-Secure Memory Encryption (SME) is a feature found on AMD processors.
+Secure Memory Encryption (SME) and Secure Encrypted Virtualization (SEV) are
+features found on AMD processors.
 
 SME provides the ability to mark individual pages of memory as encrypted using
 the standard x86 page tables.  A page that is marked encrypted will be
@@ -6,6 +7,12 @@ automatically decrypted when read from DRAM and encrypted when 
written to
 DRAM.  SME can therefore be used to protect the contents of DRAM from physical
 attacks on the system.
 
+SEV enables running encrypted virtual machines (VMs) in which the code and data
+of the virtual machine are secured so that a decrypted version is available only
+within the VM itself. SEV guest VMs have the concept of private and shared memory.
+Private memory is encrypted with the guest-specific key, while shared memory may
+be encrypted with the hypervisor key.
+
 A page is encrypted when a page table entry has the encryption bit set (see
 below on how to determine its position).  The encryption bit can also be
 specified in the cr3 register, allowing the PGD table to be encrypted. Each
@@ -19,11 +26,20 @@ so that the PGD is encrypted, but not set the encryption 
bit in the PGD entry
 for a PUD which results in the PUD pointed to by that entry to not be
 encrypted.
 
-Support for SME can be determined through the CPUID instruction. The CPUID
-function 0x8000001f reports information related to SME:
+When SEV is enabled, certain types of memory (namely instruction pages and guest
+page tables) are always treated as private. For security reasons all DMA
+operations inside the guest must be performed on shared memory. Since the
+memory encryption bit is only controllable by the guest OS when it is operating
+in 64-bit or 32-bit PAE mode, in all other modes the SEV hardware forces the
+memory encryption bit to 1.
+
+Support for SME and SEV can be determined through the CPUID instruction. The
+CPUID function 0x8000001f reports information related to SME:

	0x8000001f[eax]:
		Bit[0] indicates support for SME
+	0x8000001f[eax]:
+		Bit[1] indicates support for SEV
	0x8000001f[ebx]:
Bits[5:0]  pagetable bit number used to activate memory
   encryption
@@ -39,6 +55,13 @@ determine if SME is enabled and/or to enable memory 
encryption:
Bit[23]   0 = memory encryption features are disabled
  1 = memory encryption features are enabled
 
+If SEV is supported, MSR 0xc0010131 (MSR_F17H_SEV) can be used to determine if
+SEV is active:
+
+   0xc0010131:
+   Bit[0]0 = memory encryption is not active
+ 1 = memory encryption is active
+
 Linux relies on BIOS to set this bit if BIOS has determined that the reduction
 in the physical address space as a result of enabling memory encryption (see
 CPUID information above) will not conflict with the address space resource
-- 
2.9.4



[RFC Part1 PATCH v3 00/17] x86: Secure Encrypted Virtualization (AMD)

2017-07-24 Thread Brijesh Singh
This part of the Secure Encrypted Virtualization (SEV) series focuses on the
changes required in a guest OS for SEV support.

When SEV is active, the memory content of the guest OS will be transparently
encrypted with a key unique to the guest VM.

SEV guests have the concept of private and shared memory. Private memory is
encrypted with the guest-specific key, while shared memory may be encrypted
with the hypervisor key. Certain types of memory (namely instruction pages and
guest page tables) are always treated as private. For security reasons, all DMA
operations inside the guest must be performed on shared memory.

The SEV feature is enabled by the hypervisor, and the guest can identify it
through a CPUID function and the 0xc0010131 (F17H_SEV) MSR. When enabled, page
table entries will determine how memory is accessed. If a page table entry has
the memory encryption mask set, then that memory will be accessed using the
guest-specific key. Certain memory (instruction pages, page tables) will always
be accessed using the guest-specific key.

This patch series builds upon the Secure Memory Encryption (SME) feature. Unlike
SME, when SEV is enabled, all the data (e.g. EFI, kernel, initrd, etc.) will have
been placed into memory as encrypted by the guest BIOS.

The approach that this patch series takes is to encrypt everything possible
starting early in the boot. Since DMA operations inside the guest must be
performed on shared memory, the series uses SWIOTLB to complete the DMA
operations.

The following links provide additional details:

AMD Memory Encryption whitepaper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34

Secure Encrypted Virtualization Key Management:
http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf

KVM Forum Presentation:
http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf

SEV Guest BIOS support:
  SEV support has been accepted into EDKII/OVMF BIOS
  https://github.com/tianocore/edk2/commits/master

---
This RFC is based on tip/master commit : 22db3de (Merge branch 'x86/mm').
Complete git tree is available: https://github.com/codomania/tip/tree/sev-rfc-3

Changes since v2:
 * add documentation
 * update early_set_memory_* to use kernel_physical_mapping_init()
   to split larger page into smaller (recommended by Boris)
 * changes to address v2 feedback

Brijesh Singh (4):
  Documentation/x86: Add AMD Secure Encrypted Virtualization (SEV)
description
  x86: Add support for changing memory encryption attribute in early
boot
  X86/KVM: Provide support to create Guest and HV shared per-CPU
variables
  X86/KVM: Clear encryption attribute when SEV is active

Tom Lendacky (13):
  x86/CPU/AMD: Add the Secure Encrypted Virtualization CPU feature
  x86/mm: Secure Encrypted Virtualization (SEV) support
  x86/mm: Don't attempt to encrypt initrd under SEV
  x86, realmode: Don't decrypt trampoline area under SEV
  x86/mm: Use encrypted access of boot related data with SEV
  x86/mm: Include SEV for encryption memory attribute changes
  x86/efi: Access EFI data as encrypted when SEV is active
  resource: Consolidate resource walking code
  resource: Provide resource struct in resource walk callback
  x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory
pages
  x86/mm: DMA support for SEV memory encryption
  x86/io: Unroll string I/O when SEV is active
  x86/boot: Add early boot support when running with SEV active

 Documentation/x86/amd-memory-encryption.txt |  29 +++-
 arch/powerpc/kernel/machine_kexec_file_64.c |  12 +-
 arch/x86/boot/compressed/Makefile   |   2 +
 arch/x86/boot/compressed/head_64.S  |  16 ++
 arch/x86/boot/compressed/mem_encrypt.S  | 103 
 arch/x86/boot/compressed/misc.h |   2 +
 arch/x86/boot/compressed/pagetable.c|   8 +-
 arch/x86/entry/vdso/vma.c   |   5 +-
 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/include/asm/io.h   |  26 ++-
 arch/x86/include/asm/mem_encrypt.h  |  22 +++
 arch/x86/include/asm/msr-index.h|   5 +
 arch/x86/include/uapi/asm/kvm_para.h|   1 -
 arch/x86/kernel/cpu/amd.c   |  30 +++-
 arch/x86/kernel/cpu/scattered.c |   1 +
 arch/x86/kernel/crash.c |  18 +-
 arch/x86/kernel/kvm.c   |  46 +-
 arch/x86/kernel/kvmclock.c  |  64 ++-
 arch/x86/kernel/pmem.c  |   2 +-
 arch/x86/kernel/setup.c |   6 +-
 arch/x86/mm/ioremap.c   |  72 ++--
 arch/x86/mm/mem_encrypt.c   | 248 +++-
 arch/x86/mm/pageattr.c  |   4 +-
 arch/x86/platform/efi/ef

Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages

2017-04-07 Thread Brijesh Singh



On 04/07/2017 06:33 AM, Borislav Petkov wrote:

On Thu, Apr 06, 2017 at 01:37:41PM -0500, Brijesh Singh wrote:

I did think about the prot idea but ran into another corner case which may
require us to change the signature of phys_pud_init and phys_pmd_init. The
paddr_start and paddr_end args into kernel_physical_mapping_init() should be
aligned down to PMD level (see comment [1]). So we may encounter a case where
our address range is part of a large page but we need to clear only one entry
(i.e. we are asked to clear just one page inside a 2M region). In that case we
now need to pass additional arguments into kernel_physical_mapping_init,
phys_pud_init and phys_pmd_init to hint to the splitting code that it should
use our prot for the specific entries while all other entries will use the
old_prot.


Ok, but your !4K case:

+   /*
+* virtual address is part of large page, create the page
+* table mapping to use smaller pages (4K). The virtual and
+* physical address must be aligned to PMD level.
+*/
+   kernel_physical_mapping_init(__pa(vaddr & PMD_MASK),
+__pa((vaddr_end & PMD_MASK) + 
PMD_SIZE),
+0);


would map a 2M page as encrypted by default. What if we want to map a 2M page
frame as ~_PAGE_ENC?



Thanks for the feedback, I will make sure that we cover all the other cases in
the final patch. Untested, but something like this can be used to check whether
we can change the large page in one go or request the splitting.

+   psize = page_level_size(level);
+   pmask = page_level_mask(level);
+
+   /*
+* Check, whether we can change the large page in one go.
+* We request a split, when the address is not aligned and
+* the number of pages to set or clear encryption bit is smaller
+* than the number of pages in the large page.
+*/
+   if (vaddr == (vaddr & pmask) && ((vaddr_end - vaddr) >= psize)) 
{
+   /* UPDATE PMD HERE */
+   vaddr_next = (vaddr & pmask) + psize;
+   continue;
+   }
+



Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages

2017-04-06 Thread Brijesh Singh



On 04/06/2017 12:25 PM, Borislav Petkov wrote:

Hi Brijesh,

On Thu, Apr 06, 2017 at 09:05:03AM -0500, Brijesh Singh wrote:

I looked into arch/x86/mm/init_{32,64}.c and, as you pointed out, the file contains
routines to do basic page splitting. I think it is sufficient for our usage.


Good :)


I should be able to drop the memblock patch from the series and update
Patch 15 [1] to use kernel_physical_mapping_init().

kernel_physical_mapping_init() creates the page table mapping using the
default KERNEL_PAGE attributes. I tried to extend the function by passing a
'bool enc' flag to hint whether to clear or set _PAGE_ENC when splitting the
pages. The code did not look clean, hence I dropped that idea.


Or, you could have a

__kernel_physical_mapping_init_prot(..., prot)

helper which gets a protection argument and hands it down. The lower
levels already hand down prot which is good.



I did think about the prot idea but ran into another corner case which may
require us to change the signature of phys_pud_init and phys_pmd_init. The
paddr_start and paddr_end args into kernel_physical_mapping_init() should be
aligned down to PMD level (see comment [1]). So we may encounter a case where
our address range is part of a large page but we need to clear only one entry
(i.e. we are asked to clear just one page inside a 2M region). In that case we
now need to pass additional arguments into kernel_physical_mapping_init,
phys_pud_init and phys_pmd_init to hint to the splitting code that it should
use our prot for the specific entries while all other entries will use the
old_prot.

[1] http://lxr.free-electrons.com/source/arch/x86/mm/init_64.c#L546



The interface kernel_physical_mapping_init() will then itself call:

__kernel_physical_mapping_init_prot(..., PAGE_KERNEL);

for the normal cases.

That in a pre-patch of course.

How does that sound?




Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages

2017-04-06 Thread Brijesh Singh

Hi Boris,

On 03/17/2017 05:17 AM, Borislav Petkov wrote:

On Thu, Mar 16, 2017 at 11:25:36PM +0100, Paolo Bonzini wrote:

The kvmclock memory is initially zero so there is no need for the
hypervisor to allocate anything; the point of these patches is just to
access the data in a natural way from Linux source code.


I realize that.


I also don't really like the patch as is (plus it fails modpost), but
IMO reusing __change_page_attr and __split_large_page is the right thing
to do.


Right, so teaching pageattr.c about memblock could theoretically come
around and bite us later when a page allocated with memblock gets freed
with free_page().

And looking at this more, we have all this kernel pagetable preparation
code down the init_mem_mapping() call and the pagetable setup in
arch/x86/mm/init_{32,64}.c

And that code even does some basic page splitting. Oh and it uses
alloc_low_pages() which knows whether to do memblock reservation or the
common __get_free_pages() when slabs are up.



I looked into arch/x86/mm/init_{32,64}.c and, as you pointed out, the file contains
routines to do basic page splitting. I think it is sufficient for our usage.

I should be able to drop the memblock patch from the series and update
Patch 15 [1] to use kernel_physical_mapping_init().

kernel_physical_mapping_init() creates the page table mapping using the
default KERNEL_PAGE attributes. I tried to extend the function by passing a
'bool enc' flag to hint whether to clear or set _PAGE_ENC when splitting the
pages. The code did not look clean, hence I dropped that idea. Instead, I took
the approach below. I did some runtime tests and it seems to be working okay.

[1] http://marc.info/?l=linux-mm&m=148846773731212&w=2

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 7df5f4c..de16ef4 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 

 #include 
@@ -22,6 +23,8 @@
 #include 
 #include 
 
+#include "mm_internal.h"

+
 extern pmdval_t early_pmd_flags;
 int __init __early_make_pgtable(unsigned long, pmdval_t);
 void __init __early_pgtable_flush(void);
@@ -258,6 +261,72 @@ static void sme_free(struct device *dev, size_t size, void 
*vaddr,
swiotlb_free_coherent(dev, size, vaddr, dma_handle);
 }
 
+static int __init early_set_memory_enc_dec(resource_size_t paddr,

+  unsigned long size, bool enc)
+{
+   pte_t *kpte;
+   int level;
+   unsigned long vaddr, vaddr_end, vaddr_next;
+
+   vaddr = (unsigned long)__va(paddr);
+   vaddr_next = vaddr;
+   vaddr_end = vaddr + size;
+
+   /*
+* We are going to change the physical page attribute from C=1 to C=0.
+* Flush the caches to ensure that all the data with C=1 is flushed to
+* memory. Any caching of the vaddr after function returns will
+* use C=0.
+*/
+   clflush_cache_range(__va(paddr), size);
+
+   for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+   kpte = lookup_address(vaddr, &level);
+   if (!kpte || pte_none(*kpte) )
+   return 1;
+
+   if (level == PG_LEVEL_4K) {
+   pte_t new_pte;
+   unsigned long pfn = pte_pfn(*kpte);
+   pgprot_t new_prot = pte_pgprot(*kpte);
+
+   if (enc)
+   pgprot_val(new_prot) |= _PAGE_ENC;
+   else
+   pgprot_val(new_prot) &= ~_PAGE_ENC;
+
+   new_pte = pfn_pte(pfn, canon_pgprot(new_prot));
+   pr_info("  pte %016lx -> 0x%016lx\n", pte_val(*kpte),
+   pte_val(new_pte));
+   set_pte_atomic(kpte, new_pte);
+   vaddr_next = (vaddr & PAGE_MASK) + PAGE_SIZE;
+   continue;
+   }
+
+   /*
+* virtual address is part of large page, create the page
+* table mapping to use smaller pages (4K). The virtual and
+* physical address must be aligned to PMD level.
+*/
+   kernel_physical_mapping_init(__pa(vaddr & PMD_MASK),
+__pa((vaddr_end & PMD_MASK) + 
PMD_SIZE),
+0);
+   }
+
+   __flush_tlb_all();
+   return 0;
+}
+
+int __init early_set_memory_decrypted(resource_size_t paddr, unsigned long 
size)
+{
+   return early_set_memory_enc_dec(paddr, size, false);
+}
+
+int __init early_set_memory_encrypted(resource_size_t paddr, unsigned long 
size)
+{
+   return early_set_memory_enc_dec(paddr, size, true);
+}
+


So what would be much cleaner, IMHO, is if one would reuse that code to
change init_mm.pgd mappings early without copying pageattr.c.

init_mem_mapping() gets called before kvm_guest_init() i

Re: [RFC PATCH v2 18/32] kvm: svm: Use the hardware provided GPA instead of page walk

2017-03-29 Thread Brijesh Singh

Hi Boris,

On 03/29/2017 10:14 AM, Borislav Petkov wrote:

On Thu, Mar 02, 2017 at 10:16:05AM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

When a guest causes a NPF which requires emulation, KVM sometimes walks
the guest page tables to translate the GVA to a GPA. This is unnecessary
most of the time on AMD hardware since the hardware provides the GPA in
EXITINFO2.

The only exception cases involve string operations involving rep or
operations that use two memory locations. With rep, the GPA will only be
the value of the initial NPF and with dual memory locations we won't know
which memory address was translated into EXITINFO2.

Signed-off-by: Tom Lendacky 
Reviewed-by: Borislav Petkov 


I think I already asked you to remove Revewed-by tags when you have to
change an already reviewed patch in non-trivial manner. Why does this
one still have my Reviewed-by tag?



Actually this patch is included in the RFCv2 series for completeness.

The patch has already been reviewed and accepted in the KVM upstream tree, but
it was not present in the tip branch, hence I cherry-picked it into the RFC so
that we do not break the build. The SEV runtime behavior needs this patch. I
have tried to highlight this in the cover letter. It was my bad that I missed
fixing the Reviewed-by tag during cherry-picking. Sorry about that, and I will
be extra careful next time around. Thanks


~ Brijesh


Re: [RFC PATCH v2 15/32] x86: Add support for changing memory encryption attribute in early boot

2017-03-27 Thread Brijesh Singh

Hi Boris,

On 03/24/2017 12:12 PM, Borislav Petkov wrote:

 }

+static inline int __init early_set_memory_decrypted(void *addr,
+   unsigned long size)
+{
+   return 1;



return 1 when !CONFIG_AMD_MEM_ENCRYPT ?

The non-early variants return 0.



I will fix it and use the same return value.


+}
+
+static inline int __init early_set_memory_encrypted(void *addr,
+   unsigned long size)
+{
+   return 1;
+}
+
 #define __sme_pa   __pa



+   unsigned long pfn, npages;
+   unsigned long addr = (unsigned long)vaddr & PAGE_MASK;
+
+   /* We are going to change the physical page attribute from C=1 to C=0.
+* Flush the caches to ensure that all the data with C=1 is flushed to
+* memory. Any caching of the vaddr after function returns will
+* use C=0.
+*/


Kernel comments style is:

/*
 * A sentence ending with a full-stop.
 * Another sentence. ...
 * More sentences. ...
 */



I will update to use kernel comment style.



+   clflush_cache_range(vaddr, size);
+
+   npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   pfn = slow_virt_to_phys((void *)addr) >> PAGE_SHIFT;
+
+   return kernel_map_pages_in_pgd(init_mm.pgd, pfn, addr, npages,
+   flags & ~sme_me_mask);
+
+}
+
+int __init early_set_memory_decrypted(void *vaddr, unsigned long size)
+{
+   unsigned long flags = get_pte_flags((unsigned long)vaddr);


So this does lookup_address()...


+   return early_set_memory_enc_dec(vaddr, size, flags & ~sme_me_mask);


... and this does it too in slow_virt_to_phys(). So you do it twice per
vaddr.

So why don't you define a __slow_virt_to_phys() helper - notice
the "__" - which returns flags in its second parameter and which
slow_virt_to_phys() calls with a NULL second parameter in the other
cases?



I will look into creating a helper function. thanks
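
Something along these lines perhaps (untested sketch, just to capture the idea
of doing the lookup_address() walk once and handing the flags back through an
optional second parameter):

static phys_addr_t __slow_virt_to_phys(void *__virt_addr, pgprot_t *prot)
{
        unsigned long virt_addr = (unsigned long)__virt_addr;
        unsigned int level;
        pte_t *pte;

        pte = lookup_address(virt_addr, &level);
        BUG_ON(!pte);

        /* hand the PTE flags back to callers that want them */
        if (prot)
                *prot = pte_pgprot(*pte);

        /*
         * 4K case shown for brevity; the 2M/1G levels would use
         * pmd_pfn()/pud_pfn() exactly like the current function does.
         */
        return ((phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT) |
               (virt_addr & ~PAGE_MASK);
}

phys_addr_t slow_virt_to_phys(void *__virt_addr)
{
        return __slow_virt_to_phys(__virt_addr, NULL);
}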

-Brijesh


Re: [RFC PATCH v2 29/32] kvm: svm: Add support for SEV DEBUG_DECRYPT command

2017-03-16 Thread Brijesh Singh



On 03/16/2017 05:54 AM, Paolo Bonzini wrote:



On 02/03/2017 16:18, Brijesh Singh wrote:

+static int __sev_dbg_decrypt_page(struct kvm *kvm, unsigned long src,
+   void *dst, int *error)
+{
+   inpages = sev_pin_memory(src, PAGE_SIZE, &npages);
+   if (!inpages) {
+   ret = -ENOMEM;
+   goto err_1;
+   }
+
+   data->handle = sev_get_handle(kvm);
+   data->dst_addr = __psp_pa(dst);
+   data->src_addr = __sev_page_pa(inpages[0]);
+   data->length = PAGE_SIZE;
+
+   ret = sev_issue_cmd(kvm, SEV_CMD_DBG_DECRYPT, data, error);
+   if (ret)
+   printk(KERN_ERR "SEV: DEBUG_DECRYPT %d (%#010x)\n",
+   ret, *error);
+   sev_unpin_memory(inpages, npages);
+err_1:
+   kfree(data);
+   return ret;
+}
+
+static int sev_dbg_decrypt(struct kvm *kvm, struct kvm_sev_cmd *argp)
+{
+   void *data;
+   int ret, offset, len;
+   struct kvm_sev_dbg debug;
+
+   if (!sev_guest(kvm))
+   return -ENOTTY;
+
+   if (copy_from_user(&debug, (void *)argp->data,
+   sizeof(struct kvm_sev_dbg)))
+   return -EFAULT;
+   /*
+* TODO: add support for decrypting length which crosses the
+* page boundary.
+*/
+   offset = debug.src_addr & (PAGE_SIZE - 1);
+   if (offset + debug.length > PAGE_SIZE)
+   return -EINVAL;
+


Please do add it, it doesn't seem very different from what you're doing
in LAUNCH_UPDATE_DATA.  There's no need for a separate
__sev_dbg_decrypt_page function, you can just pin/unpin here and do a
per-page loop as in LAUNCH_UPDATE_DATA.



I can certainly add support to handle the cases where the length crosses a page
boundary. Should we limit the size to prevent a user from passing an arbitrarily
long length that keeps us looping inside the kernel? I was thinking of limiting
it to PAGE_SIZE.
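
For reference, something like the below per-page loop is what I have in mind
(rough, untested sketch; __sev_dbg_decrypt_one() is just a placeholder for the
per-chunk helper that pins the page and issues the DBG_DECRYPT command):

static int sev_dbg_decrypt_range(struct kvm *kvm, unsigned long src,
                                 void __user *dst, unsigned long len,
                                 int *error)
{
        int ret = 0;

        while (len) {
                /* only the first chunk may start at an unaligned offset */
                unsigned long offset = src & (PAGE_SIZE - 1);
                unsigned long sz = min_t(unsigned long,
                                         PAGE_SIZE - offset, len);

                ret = __sev_dbg_decrypt_one(kvm, src, dst, sz, error);
                if (ret)
                        break;

                src += sz;
                dst += sz;
                len -= sz;
        }

        return ret;
}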

~ Brijesh


Re: [RFC PATCH v2 30/32] kvm: svm: Add support for SEV DEBUG_ENCRYPT command

2017-03-16 Thread Brijesh Singh



On 03/16/2017 06:03 AM, Paolo Bonzini wrote:



On 02/03/2017 16:18, Brijesh Singh wrote:

+   data = (void *) get_zeroed_page(GFP_KERNEL);


The page does not need to be zeroed, does it?



No, we don't have to zero it. I will fix it.
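
I.e. just:

        /* scratch page only, no need for get_zeroed_page() */
        data = (void *)__get_free_page(GFP_KERNEL);
        if (!data)
                return -ENOMEM;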


+
+   if ((len & 15) || (dst_addr & 15)) {
+   /* if destination address and length are not 16-byte
+* aligned then:
+* a) decrypt destination page into temporary buffer
+* b) copy source data into temporary buffer at correct offset
+* c) encrypt temporary buffer
+*/
+   ret = __sev_dbg_decrypt_page(kvm, dst_addr, data, &argp->error);


Ah, I see now you're using this function here for read-modify-write.
data is already pinned here, so even if you keep the function it makes
sense to push pinning out of __sev_dbg_decrypt_page and into
sev_dbg_decrypt.


I can push the pinning part out of __sev_dbg_decrypt_page.




+   if (ret)
+   goto err_3;
+   d_off = dst_addr & (PAGE_SIZE - 1);
+
+   if (copy_from_user(data + d_off,
+   (uint8_t *)debug.src_addr, len)) {
+   ret = -EFAULT;
+   goto err_3;
+   }
+
+   encrypt->length = PAGE_SIZE;


Why decrypt/re-encrypt all the page instead of just the 16 byte area
around the [dst_addr, dst_addr+len) range?



Good catch, I should be fine just decrypting a 16-byte area. Will fix in the
next rev.
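
I.e. roughly (sketch, reusing the variables from the patch; 'start' is a new
local for the 16-byte aligned window covering [dst_addr, dst_addr + len)):

        d_off = dst_addr & (PAGE_SIZE - 1);
        start = round_down(d_off, 16);

        /* re-encrypt only the window we actually touched, not the full page */
        encrypt->length   = round_up(d_off + len, 16) - start;
        encrypt->src_addr = __psp_pa(data + start);
        encrypt->dst_addr = __sev_page_pa(inpages[0]) + start;

The decrypt step (a) can be restricted to the same window.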


+   encrypt->src_addr = __psp_pa(data);
+   encrypt->dst_addr =  __sev_page_pa(inpages[0]);
+   } else {
+   if (copy_from_user(data, (uint8_t *)debug.src_addr, len)) {
+   ret = -EFAULT;
+   goto err_3;
+   }


Do you need copy_from_user, or can you just pin/unpin memory as for
DEBUG_DECRYPT?



We can work either with pin/unpin or copy_from_user. I think I chose
copy_from_user because most of the time the ENCRYPT path was used when I set a
breakpoint through gdb, which basically requires copying a pretty small amount
of data into guest memory. It is very much possible that someone will try to
copy a lot more data, in which case pin/unpin could speed things up.

-Brijesh


Re: [RFC PATCH v2 26/32] kvm: svm: Add support for SEV LAUNCH_UPDATE_DATA command

2017-03-16 Thread Brijesh Singh


On 03/16/2017 05:48 AM, Paolo Bonzini wrote:



On 02/03/2017 16:17, Brijesh Singh wrote:

+static struct page **sev_pin_memory(unsigned long uaddr, unsigned long ulen,
+   unsigned long *n)
+{
+   struct page **pages;
+   int first, last;
+   unsigned long npages, pinned;
+
+   /* Get number of pages */
+   first = (uaddr & PAGE_MASK) >> PAGE_SHIFT;
+   last = ((uaddr + ulen - 1) & PAGE_MASK) >> PAGE_SHIFT;
+   npages = (last - first + 1);
+
+   pages = kzalloc(npages * sizeof(struct page *), GFP_KERNEL);
+   if (!pages)
+   return NULL;
+
+   /* pin the user virtual address */
+   down_read(&current->mm->mmap_sem);
+   pinned = get_user_pages_fast(uaddr, npages, 1, pages);
+   up_read(&current->mm->mmap_sem);


get_user_pages_fast, like get_user_pages_unlocked, must be called
without mmap_sem held.


Sure.
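
I.e. simply:

        /* get_user_pages_fast() takes care of the locking itself */
        pinned = get_user_pages_fast(uaddr, npages, 1, pages);
        if (pinned != npages)
                goto err;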




+   if (pinned != npages) {
+   printk(KERN_ERR "SEV: failed to pin  %ld pages (got %ld)\n",
+   npages, pinned);
+   goto err;
+   }
+
+   *n = npages;
+   return pages;
+err:
+   if (pinned > 0)
+   release_pages(pages, pinned, 0);
+   kfree(pages);
+
+   return NULL;
+}

+   /* the array of pages returned by get_user_pages() is a page-aligned
+* memory. Since the user buffer is probably not page-aligned, we need
+* to calculate the offset within a page for first update entry.
+*/
+   offset = uaddr & (PAGE_SIZE - 1);
+   len = min_t(size_t, (PAGE_SIZE - offset), ulen);
+   ulen -= len;
+
+   /* update first page -
+* special care need to be taken for the first page because we might
+* be dealing with offset within the page
+*/


No need to special case the first page; just set "offset = 0" inside the
loop after the first iteration.



Will do.
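
I.e. a single loop along these lines (sketch; sev_issue_launch_update() stands
in for the per-chunk LAUNCH_UPDATE_DATA command, inpages[] is the pinned page
array):

        /* only the first chunk may start at a non page-aligned offset */
        offset = uaddr & (PAGE_SIZE - 1);

        for (i = 0; i < npages && ulen; i++) {
                len = min_t(size_t, PAGE_SIZE - offset, ulen);

                ret = sev_issue_launch_update(kvm, inpages[i], offset, len,
                                              &argp->error);
                if (ret)
                        break;

                ulen  -= len;
                offset = 0;     /* every later chunk starts on a page boundary */
        }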

-Brijesh


Re: [RFC PATCH v2 32/32] x86: kvm: Pin the guest memory when SEV is active

2017-03-16 Thread Brijesh Singh



On 03/16/2017 05:38 AM, Paolo Bonzini wrote:



On 02/03/2017 16:18, Brijesh Singh wrote:

The SEV memory encryption engine uses a tweak such that two identical
plaintexts at different locations will have different ciphertexts.
So swapping or moving ciphertexts of two pages will not result in
plaintexts being swapped. Relocating (or migrating) a physical backing page
for an SEV guest will require some additional steps. The current SEV key
management spec [1] does not provide commands to swap or migrate (move)
ciphertexts. For now we pin the memory allocated for the SEV guest. In
future, when the SEV key management spec provides the commands to support
page migration, we can update the KVM code to remove the pinning logic
without making any changes in userspace (qemu).

The patch pins userspace memory when a new slot is created and unpins the
memory when the slot is removed.

[1] http://support.amd.com/TechDocs/55766_SEV-KM%20API_Spec.pdf


This is not enough, because memory can be hidden temporarily from the
guest and remapped later.  Think of a PCI BAR that is backed by RAM, or
also SMRAM.  The pinning must be kept even in that case.

You need to add a pair of KVM_MEMORY_ENCRYPT_OPs (one that doesn't map
to a PSP operation), such as KVM_REGISTER/UNREGISTER_ENCRYPTED_RAM.  In
QEMU you can use a RAMBlockNotifier to invoke the ioctls.



I was hoping to avoid adding a new ioctl, but I see your point. Will add a pair
of ioctls and use a RAMBlockNotifier to invoke them.
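
On the QEMU side I am thinking of something roughly like this (sketch only;
the ioctl names and the region struct are placeholders until the new KVM
ioctls are actually defined):

static void sev_ram_block_added(RAMBlockNotifier *n, void *host, size_t size)
{
    struct kvm_enc_region range = { .addr = (uintptr_t)host, .size = size };

    /* placeholder command: ask KVM to pin/track this guest RAM region */
    kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_REG_REGION, &range);
}

static void sev_ram_block_removed(RAMBlockNotifier *n, void *host, size_t size)
{
    struct kvm_enc_region range = { .addr = (uintptr_t)host, .size = size };

    kvm_vm_ioctl(kvm_state, KVM_MEMORY_ENCRYPT_UNREG_REGION, &range);
}

static RAMBlockNotifier sev_ram_notifier = {
    .ram_block_added   = sev_ram_block_added,
    .ram_block_removed = sev_ram_block_removed,
};

/* registered once when the SEV guest is set up */
ram_block_notifier_add(&sev_ram_notifier);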

-Brijesh


Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock when spliting large pages

2017-03-10 Thread Brijesh Singh

Hi Boris,

On 03/10/2017 05:06 AM, Borislav Petkov wrote:

On Thu, Mar 02, 2017 at 10:15:15AM -0500, Brijesh Singh wrote:

If kernel_maps_pages_in_pgd is called early in boot process to change the


kernel_map_pages_in_pgd()


memory attributes then it fails to allocate memory when spliting large
pages. The patch extends the cpa_data to provide the support to use
memblock_alloc when slab allocator is not available.

The feature will be used in Secure Encrypted Virtualization (SEV) mode,
where we may need to change the memory region attributes in early boot
process.

Signed-off-by: Brijesh Singh 
---
 arch/x86/mm/pageattr.c |   51 
 1 file changed, 42 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 46cc89d..9e4ab3b 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -37,6 +38,7 @@ struct cpa_data {
int flags;
unsigned long   pfn;
unsignedforce_split : 1;
+   unsignedforce_memblock :1;
int curpage;
struct page **pages;
 };
@@ -627,9 +629,8 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,

 static int
 __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
-  struct page *base)
+ pte_t *pbase, unsigned long new_pfn)
 {
-   pte_t *pbase = (pte_t *)page_address(base);
unsigned long ref_pfn, pfn, pfninc = 1;
unsigned int i, level;
pte_t *tmp;
@@ -646,7 +647,7 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, 
unsigned long address,
return 1;
}

-   paravirt_alloc_pte(&init_mm, page_to_pfn(base));
+   paravirt_alloc_pte(&init_mm, new_pfn);

switch (level) {
case PG_LEVEL_2M:
@@ -707,7 +708,8 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, 
unsigned long address,
 * pagetable protections, the actual ptes set above control the
 * primary protection behavior:
 */
-   __set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));
+   __set_pmd_pte(kpte, address,
+   native_make_pte((new_pfn << PAGE_SHIFT) + _KERNPG_TABLE));

/*
 * Intel Atom errata AAH41 workaround.
@@ -723,21 +725,50 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, 
unsigned long address,
return 0;
 }

+static pte_t *try_alloc_pte(struct cpa_data *cpa, unsigned long *pfn)
+{
+   unsigned long phys;
+   struct page *base;
+
+   if (cpa->force_memblock) {
+   phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);


Maybe there's a reason this fires:

WARNING: modpost: Found 2 section mismatch(es).
To see full details build your kernel with:
'make CONFIG_DEBUG_SECTION_MISMATCH=y'

WARNING: vmlinux.o(.text+0x48edc): Section mismatch in reference from the 
function __change_page_attr() to the function .init.text:memblock_alloc()
The function __change_page_attr() references
the function __init memblock_alloc().
This is often because __change_page_attr lacks a __init
annotation or the annotation of memblock_alloc is wrong.

WARNING: vmlinux.o(.text+0x491d1): Section mismatch in reference from the 
function __change_page_attr() to the function .meminit.text:memblock_free()
The function __change_page_attr() references
the function __meminit memblock_free().
This is often because __change_page_attr lacks a __meminit
annotation or the annotation of memblock_free is wrong.



I can take a look at fixing those warnings. My initial attempt was to create
a new function to clear the encryption bit, but it ended up looking very similar
to __change_page_attr_set_clr(), hence I decided to extend the existing function
to use memblock_alloc().
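
One way out of the mismatch could be to keep the memblock path in a __ref
helper, so modpost knows the __init reference is intentional and only taken
before the slab allocator is up (untested sketch):

static pte_t * __ref try_alloc_pte(struct cpa_data *cpa, unsigned long *pfn)
{
        if (cpa->force_memblock) {
                /* early boot: slab is not up yet, fall back to memblock */
                phys_addr_t phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);

                if (!phys)
                        return NULL;

                *pfn = phys >> PAGE_SHIFT;
                return (pte_t *)__va(phys);
        } else {
                struct page *base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);

                if (!base)
                        return NULL;

                *pfn = page_to_pfn(base);
                return (pte_t *)page_address(base);
        }
}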



Why do we need this whole early mapping? For the guest? I don't like
that memblock thing at all.


Early in the boot process, the guest kernel allocates some structures (either
statically allocated or dynamically allocated via memblock_alloc) and shares the
physical address of these structures with the hypervisor. Since the entire guest
memory area is mapped as encrypted, those structures get mapped as an encrypted
memory range. We need a method to clear the encryption bit for them. Sometimes
these structures may be part of 2M pages, which then need to be split into
smaller pages.



So I think the approach with the .data..percpu..hv_shared section is
fine and we should consider SEV-ES

http://support.amd.com/TechDocs/Protecting%20VM%20Register%20State%20with%20SEV-ES.pdf

and do this right from the get-go so that when SEV-ES comes along, we
should simply be ready and extend that mechanism to put the whole Guest
Hypervisor Communication Block in there.




But then the fact that you're mapping those decrypted in init_mm.pgd
makes me think you don't need that early mapping t

Re: [RFC PATCH v2 12/32] x86: Add early boot support when running with SEV active

2017-03-10 Thread Brijesh Singh

Hi Boris and Paolo,

On 03/09/2017 10:29 AM, Borislav Petkov wrote:

On Thu, Mar 09, 2017 at 05:13:33PM +0100, Paolo Bonzini wrote:

This is not how you check if running under a hypervisor; you should
check the HYPERVISOR bit, i.e. bit 31 of cpuid(1).ecx.  This in turn
tells you if leaf 0x40000000 is valid.


Ah, good point, I already do that in the microcode loader :)

/*
 * CPUID(1).ECX[31]: reserved for hypervisor use. This is still not
 * completely accurate as xen pv guests don't see that CPUID bit set but
 * that's good enough as they don't land on the BSP path anyway.
 */
if (native_cpuid_ecx(1) & BIT(31))
return *res;


That said, the main issue with this function is that it hardcodes the
behavior for KVM.  It is possible that another hypervisor defines its
0x40000001 leaf in such a way that KVM_FEATURE_SEV has a different meaning.

Instead, AMD should define a "well-known" bit in its own space (i.e.
0x800000xx) that is only used by hypervisors that support SEV.  This is
similar to how Intel defined one bit in leaf 1 to say "is leaf
0x40000000 valid".


+   if (eax > 0x40000000) {
+   eax = 0x40000001;
+   ecx = 0;
+   native_cpuid(&eax, &ebx, &ecx, &edx);
+   if (!(eax & BIT(KVM_FEATURE_SEV)))
+   goto out;
+
+   eax = 0x8000001f;
+   ecx = 0;
+   native_cpuid(&eax, &ebx, &ecx, &edx);
+   if (!(eax & 1))


Right, so this is testing CPUID_0x8000001f_ECX(0)[0], SME. Why not
simply set that bit for the guest too, in kvm?



CPUID_0x8000001F[EAX] indicates whether the feature is supported:
 * Bit 0 - SME supported
 * Bit 1 - SEV supported
 * Bit 3 - SEV-ES supported

We can use MSR_K8_SYSCFG[MemEncryptionModeEnc] to check if memory encryption is
enabled. Currently, KVM returns zero when the guest OS reads MSR_K8_SYSCFG. I can
update my patch set to set this bit for SEV-enabled guests.

We could update this patch to use the below logic:

 * CPUID(0) - Check for AuthenticAMD
 * CPUID(1) - Check if running under a hypervisor
 * CPUID(0x80000000) - Check for the highest supported extended leaf
 * CPUID(0x8000001F).EAX - Check for SME and SEV support
 * rdmsr(MSR_K8_SYSCFG)[MemEncryptionModeEnc] - Check if SMEE is set
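
A rough sketch of that sequence (untested; the vendor check is left out
pending the question below, and MemEncryptionModeEnc is assumed to be bit 23
of MSR_K8_SYSCFG):

static bool __init mem_encrypt_enabled_under_hv(void)
{
        unsigned int eax, ebx, ecx, edx;

        /* CPUID(1).ECX[31]: are we running under a hypervisor at all? */
        if (!(native_cpuid_ecx(1) & BIT(31)))
                return false;

        /* the highest extended leaf must cover 0x8000001f */
        if (native_cpuid_eax(0x80000000) < 0x8000001f)
                return false;

        /* CPUID 0x8000001f EAX: bit 0 = SME, bit 1 = SEV */
        eax = 0x8000001f;
        ecx = 0;
        native_cpuid(&eax, &ebx, &ecx, &edx);
        if (!(eax & (BIT(0) | BIT(1))))
                return false;

        /* finally, has memory encryption actually been enabled? */
        return !!(native_read_msr(MSR_K8_SYSCFG) & BIT_ULL(23));
}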


Paolo,

One question: do we need the "AuthenticAMD" check when we are running under a
hypervisor? I was looking at the qemu code and found that qemu exposes
parameters to change the CPU vendor id. The above check will fail if the user
changes the vendor id while launching the SEV guest.

-Brijesh



Re: [RFC PATCH v2 01/32] x86: Add the Secure Encrypted Virtualization CPU feature

2017-03-06 Thread Brijesh Singh
On 03/04/2017 04:11 AM, Borislav Petkov wrote:
> On Fri, Mar 03, 2017 at 03:01:23PM -0600, Brijesh Singh wrote:
> 
> This looks like a wraparound...
> 
> $ test-apply.sh /tmp/brijesh.singh.delta
> checking file Documentation/admin-guide/kernel-parameters.txt
> Hunk #1 succeeded at 2144 (offset -9 lines).
> checking file Documentation/x86/amd-memory-encryption.txt
> patch:  malformed patch at line 23: DRAM from physical
> 
> Yap.
> 
> Looks like exchange or your mail client decided to do some patch editing
> on its own.
> 
> Please send it to yourself first and try applying.
> 

Sending it through stg mail to avoid line wrapping. Please let me know if 
something
is still messed up. I have tried applying it and it seems to apply okay.

---
 Documentation/admin-guide/kernel-parameters.txt |4 +--
 Documentation/x86/amd-memory-encryption.txt |   33 +--
 arch/x86/include/asm/cpufeature.h   |7 +
 arch/x86/include/asm/cpufeatures.h  |6 +---
 arch/x86/include/asm/disabled-features.h|3 +-
 arch/x86/include/asm/required-features.h|3 +-
 arch/x86/kernel/cpu/amd.c   |   23 
 arch/x86/kernel/cpu/common.c|   23 
 arch/x86/kernel/cpu/scattered.c |1 +
 9 files changed, 50 insertions(+), 53 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 91c40fa..b91e2495 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2153,8 +2153,8 @@
mem_encrypt=on: Activate SME
mem_encrypt=off:Do not activate SME
 
-   Refer to the SME documentation for details on when
-   memory encryption can be activated.
+   Refer to Documentation/x86/amd-memory-encryption.txt
+   for details on when memory encryption can be activated.
 
mem_sleep_default=  [SUSPEND] Default system suspend mode:
s2idle  - Suspend-To-Idle
diff --git a/Documentation/x86/amd-memory-encryption.txt 
b/Documentation/x86/amd-memory-encryption.txt
index 0938e89..0b72ff2 100644
--- a/Documentation/x86/amd-memory-encryption.txt
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -7,9 +7,9 @@ DRAM.  SME can therefore be used to protect the contents of 
DRAM from physical
 attacks on the system.
 
 A page is encrypted when a page table entry has the encryption bit set (see
-below how to determine the position of the bit).  The encryption bit can be
-specified in the cr3 register, allowing the PGD table to be encrypted. Each
-successive level of page tables can also be encrypted.
+below on how to determine its position).  The encryption bit can be specified
+in the cr3 register, allowing the PGD table to be encrypted. Each successive
+level of page tables can also be encrypted.
 
 Support for SME can be determined through the CPUID instruction. The CPUID
 function 0x8000001f reports information related to SME:
@@ -17,13 +17,14 @@ function 0x8000001f reports information related to SME:
0x8000001f[eax]:
Bit[0] indicates support for SME
0x8000001f[ebx]:
-   Bit[5:0]  pagetable bit number used to activate memory
- encryption
-   Bit[11:6] reduction in physical address space, in bits, when
- memory encryption is enabled (this only affects system
- physical addresses, not guest physical addresses)
-
-If support for SME is present, MSR 0xc00100010 (SYS_CFG) can be used to
+   Bits[5:0]  pagetable bit number used to activate memory
+  encryption
+   Bits[11:6] reduction in physical address space, in bits, when
+  memory encryption is enabled (this only affects
+  system physical addresses, not guest physical
+  addresses)
+
+If support for SME is present, MSR 0xc00100010 (MSR_K8_SYSCFG) can be used to
 determine if SME is enabled and/or to enable memory encryption:
 
0xc0010010:
@@ -41,7 +42,7 @@ The state of SME in the Linux kernel can be documented as 
follows:
  The CPU supports SME (determined through CPUID instruction).
 
- Enabled:
- Supported and bit 23 of the SYS_CFG MSR is set.
+ Supported and bit 23 of MSR_K8_SYSCFG is set.
 
- Active:
  Supported, Enabled and the Linux kernel is actively applying
@@ -51,7 +52,9 @@ The state of SME in the Linux kernel can be documented as 
follows:
 SME can also be enabled and activated in the BIOS. If SME is enabled and
 activated in the BIOS, then all memory accesses will be encrypted and it 

Re: [RFC PATCH v2 00/32] x86: Secure Encrypted Virtualization (AMD)

2017-03-03 Thread Brijesh Singh

Hi Bjorn,

On 03/03/2017 02:33 PM, Bjorn Helgaas wrote:

On Thu, Mar 02, 2017 at 10:12:01AM -0500, Brijesh Singh wrote:

This RFC series provides support for AMD's new Secure Encrypted Virtualization
(SEV) feature. This RFC is build upon Secure Memory Encryption (SME) RFCv4 [1].


What kernel version is this series based on?



This patch series is based off of the master branch of tip.
  Commit a27cb9e1b2b4 ("Merge branch 'WIP.sched/core'")
  Tom's RFC v4 patches (http://marc.info/?l=linux-mm&m=148725973013686&w=2)

Accidentally, I ended up rebasing the SEV RFCv2 patches on the updated SME v4
instead of the original SME v4, so you may need to apply patch [1].


[1] http://marc.info/?l=linux-mm&m=148857523132253&w=2

Optionally, I have posted the full git tree here [2]

[2] https://github.com/codomania/tip/branches



Re: [RFC PATCH v2 01/32] x86: Add the Secure Encrypted Virtualization CPU feature

2017-03-03 Thread Brijesh Singh

Hi Boris,

On 03/03/2017 10:59 AM, Borislav Petkov wrote:

On Thu, Mar 02, 2017 at 10:12:09AM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

Update the CPU features to include identifying and reporting on the
Secure Encrypted Virtualization (SEV) feature.  SME is identified by
CPUID 0x8000001f, but requires BIOS support to enable it (set bit 23 of
MSR_K8_SYSCFG and set bit 0 of MSR_K7_HWCR).  Only show the SEV feature
as available if reported by CPUID and enabled by BIOS.

Signed-off-by: Tom Lendacky 
---
 arch/x86/include/asm/cpufeatures.h |1 +
 arch/x86/include/asm/msr-index.h   |2 ++
 arch/x86/kernel/cpu/amd.c  |   22 ++
 arch/x86/kernel/cpu/scattered.c|1 +
 4 files changed, 22 insertions(+), 4 deletions(-)


So this patchset is not really ontop of Tom's patchset because this
patch doesn't apply. The reason is, Tom did the SME bit this way:

https://lkml.kernel.org/r/20170216154236.19244.7580.st...@tlendack-t1.amdoffice.net

but it should've been in scattered.c.


diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index cabda87..c3f58d9 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -31,6 +31,7 @@ static const struct cpuid_bit cpuid_bits[] = {
{ X86_FEATURE_CPB,  CPUID_EDX,  9, 0x80000007, 0 },
{ X86_FEATURE_PROC_FEEDBACK,CPUID_EDX, 11, 0x80000007, 0 },
{ X86_FEATURE_SME,  CPUID_EAX,  0, 0x8000001f, 0 },
+   { X86_FEATURE_SEV,  CPUID_EAX,  1, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }


... and here it is in scattered.c, as it should be. So you've used an
older version of the patch, it seems.

Please sync with Tom to see whether he's reworked the v4 version of that
patch already. If yes, then you could send only the SME and SEV adding
patches as a reply to this message so that I can continue reviewing in
the meantime.



Just realized my error: I actually ended up using Tom's recent updates to
v4 instead of the original v4. Here is the diff. If you have Tom's v4
applied, then apply this diff before applying the SEV v2 series. Sorry about
that.


Optionally, you can also pull the complete tree from github [1].

[1] https://github.com/codomania/tip/tree/sev-rfc-v2


diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt

index 91c40fa..b91e2495 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2153,8 +2153,8 @@
mem_encrypt=on: Activate SME
mem_encrypt=off:Do not activate SME

-   Refer to the SME documentation for details on when
-   memory encryption can be activated.
+   Refer to Documentation/x86/amd-memory-encryption.txt
+   for details on when memory encryption can be activated.

mem_sleep_default=  [SUSPEND] Default system suspend mode:
s2idle  - Suspend-To-Idle
diff --git a/Documentation/x86/amd-memory-encryption.txt 
b/Documentation/x86/amd-memory-encryption.txt

index 0938e89..0b72ff2 100644
--- a/Documentation/x86/amd-memory-encryption.txt
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -7,9 +7,9 @@ DRAM.  SME can therefore be used to protect the contents 
of DRAM from physical

 attacks on the system.

 A page is encrypted when a page table entry has the encryption bit set 
(see

-below how to determine the position of the bit).  The encryption bit can be
-specified in the cr3 register, allowing the PGD table to be encrypted. Each
-successive level of page tables can also be encrypted.
+below on how to determine its position).  The encryption bit can be 
specified
+in the cr3 register, allowing the PGD table to be encrypted. Each 
successive

+level of page tables can also be encrypted.

 Support for SME can be determined through the CPUID instruction. The CPUID
 function 0x8000001f reports information related to SME:
@@ -17,13 +17,14 @@ function 0x8000001f reports information related to SME:
0x8000001f[eax]:
Bit[0] indicates support for SME
0x8000001f[ebx]:
-   Bit[5:0]  pagetable bit number used to activate memory
- encryption
-   Bit[11:6] reduction in physical address space, in bits, when
- memory encryption is enabled (this only affects system
- physical addresses, not guest physical addresses)
-
-If support for SME is present, MSR 0xc00100010 (SYS_CFG) can be used to
+   Bits[5:0]  pagetable bit number used to activate memory
+  encryption
+   Bits[11:6] reduction in physical address space, in bits, when
+  memory encryption is enabled (this only affects
+  s

[RFC PATCH v2 17/32] x86: kvmclock: Clear encryption attribute when SEV is active

2017-03-02 Thread Brijesh Singh
The guest physical memory areas holding the struct pvclock_wall_clock and
struct pvclock_vcpu_time_info are shared with the hypervisor. The hypervisor
periodically updates the contents of this memory. When SEV is active we must
clear the encryption attributes of the shared memory pages so that both the
hypervisor and the guest can access the data.

Signed-off-by: Brijesh Singh 
---
 arch/x86/kernel/kvmclock.c |   65 ++--
 1 file changed, 56 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 278de4f..3b38b3d 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
@@ -44,7 +45,7 @@ early_param("no-kvmclock", parse_no_kvmclock);
 
 /* The hypervisor will put information about time periodically here */
 static struct pvclock_vsyscall_time_info *hv_clock;
-static struct pvclock_wall_clock wall_clock;
+static struct pvclock_wall_clock *wall_clock;
 
 struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
 {
@@ -62,15 +63,18 @@ static void kvm_get_wallclock(struct timespec *now)
int low, high;
int cpu;
 
-   low = (int)__pa_symbol(&wall_clock);
-   high = ((u64)__pa_symbol(&wall_clock) >> 32);
+   if (!wall_clock)
+   return;
+
+   low = (int)slow_virt_to_phys(wall_clock);
+   high = ((u64)slow_virt_to_phys(wall_clock) >> 32);
 
native_write_msr(msr_kvm_wall_clock, low, high);
 
cpu = get_cpu();
 
vcpu_time = &hv_clock[cpu].pvti;
-   pvclock_read_wallclock(&wall_clock, vcpu_time, now);
+   pvclock_read_wallclock(wall_clock, vcpu_time, now);
 
put_cpu();
 }
@@ -246,11 +250,40 @@ static void kvm_shutdown(void)
native_machine_shutdown();
 }
 
+static phys_addr_t kvm_memblock_alloc(phys_addr_t size, phys_addr_t align)
+{
+   phys_addr_t mem;
+
+   mem = memblock_alloc(size, align);
+   if (!mem)
+   return 0;
+
+   /* When SEV is active clear the encryption attributes of the pages */
+   if (sev_active()) {
+   if (early_set_memory_decrypted(__va(mem), size))
+   goto e_free;
+   }
+
+   return mem;
+e_free:
+   memblock_free(mem, size);
+   return 0;
+}
+
+static void kvm_memblock_free(phys_addr_t addr, phys_addr_t size)
+{
+   /* When SEV is active restore the encryption attributes of the pages */
+   if (sev_active())
+   early_set_memory_encrypted(__va(addr), size);
+
+   memblock_free(addr, size);
+}
+
 void __init kvmclock_init(void)
 {
struct pvclock_vcpu_time_info *vcpu_time;
-   unsigned long mem;
-   int size, cpu;
+   unsigned long mem, mem_wall_clock;
+   int size, cpu, wall_clock_size;
u8 flags;
 
size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
@@ -267,15 +300,29 @@ void __init kvmclock_init(void)
printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
msr_kvm_system_time, msr_kvm_wall_clock);
 
-   mem = memblock_alloc(size, PAGE_SIZE);
-   if (!mem)
+   wall_clock_size = PAGE_ALIGN(sizeof(struct pvclock_wall_clock));
+   mem_wall_clock = kvm_memblock_alloc(wall_clock_size, PAGE_SIZE);
+   if (!mem_wall_clock)
return;
+
+   wall_clock = __va(mem_wall_clock);
+   memset(wall_clock, 0, wall_clock_size);
+
+   mem = kvm_memblock_alloc(size, PAGE_SIZE);
+   if (!mem) {
+   kvm_memblock_free(mem_wall_clock, wall_clock_size);
+   wall_clock = NULL;
+   return;
+   }
+
hv_clock = __va(mem);
memset(hv_clock, 0, size);
 
if (kvm_register_clock("primary cpu clock")) {
hv_clock = NULL;
-   memblock_free(mem, size);
+   kvm_memblock_free(mem, size);
+   kvm_memblock_free(mem_wall_clock, wall_clock_size);
+   wall_clock = NULL;
return;
}
 
