Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-09 Thread Tom Lendacky
On 7/30/21 5:34 PM, Sean Christopherson wrote:
> On Tue, Jul 27, 2021, Tom Lendacky wrote:
>> @@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>>   * The unused memory range was mapped decrypted, change the encryption
>>   * attribute from decrypted to encrypted before freeing it.
>>   */
>> -if (mem_encrypt_active()) {
>> +if (sme_me_mask) {
> 
> Any reason this uses sme_me_mask?  The helper it calls, 
> __set_memory_enc_dec(),
> uses prot_guest_has(PATTR_MEM_ENCRYPT) so I assume it's available?

Probably just a slip on my part. I was debating at one point calling the
helper vs. referencing the variables/functions directly in the
mem_encrypt.c file.
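
For reference, the helper-based form of that hunk would presumably be (a
minimal sketch, assuming prot_guest_has() is usable at this point, as it is
in __set_memory_enc_dec()):

	/* Sketch: use the generic helper instead of sme_me_mask directly. */
	if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
		r = set_memory_encrypted(vaddr, npages);
		if (r)
			pr_warn("failed to free unused decrypted pages\n");
	}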

Thanks,
Tom

> 
>>  r = set_memory_encrypted(vaddr, npages);
>>  if (r) {
>>  pr_warn("failed to free unused decrypted pages\n");
> 



Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-09 Thread Tom Lendacky
On 8/2/21 5:45 AM, Joerg Roedel wrote:
> On Tue, Jul 27, 2021 at 05:26:09PM -0500, Tom Lendacky wrote:
>> @@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct trampoline_header *th)
>>  if (prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
>>  th->flags |= TH_FLAGS_SME_ACTIVE;
>>  
>> -if (sev_es_active()) {
>> +if (prot_guest_has(PATTR_GUEST_PROT_STATE)) {
>>  /*
>>   * Skip the call to verify_cpu() in secondary_startup_64 as it
>>   * will cause #VC exceptions when the AP can't handle them yet.
> 
> Not sure how TDX will handle AP booting; are you sure it needs this
> special setup as well? Otherwise a check for SEV-ES would be better
> instead of the generic PATTR_GUEST_PROT_STATE.

Yes, I'm not sure either. I figure that change can be made, if needed, as
part of the TDX support.
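
If a SEV-ES-only check does turn out to be the better fit, the hunk would
presumably look something like this (a sketch; PATTR_SEV_ES is a hypothetical
attribute, not one defined by this series):

	/* Sketch: restrict the AP startup quirk to SEV-ES guests only.
	 * PATTR_SEV_ES is hypothetical here.
	 */
	if (prot_guest_has(PATTR_SEV_ES)) {
		/*
		 * Skip the call to verify_cpu() in secondary_startup_64 as it
		 * will cause #VC exceptions when the AP can't handle them yet.
		 */
		...
	}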

Thanks,
Tom

> 
> Regards,
> 
> Joerg
> 



Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-09 Thread Tom Lendacky
On 8/2/21 7:42 AM, Christophe Leroy wrote:
> 
> 
> Le 28/07/2021 à 00:26, Tom Lendacky a écrit :
>> Replace occurrences of mem_encrypt_active() with calls to prot_guest_has()
>> with the PATTR_MEM_ENCRYPT attribute.
> 
> 
> What about
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210730114231.23445-1-will@kernel.org/ ?

Ah, looks like that just went into the PPC tree and isn't part of the tip
tree. I'll have to look into how to handle that one.

Thanks,
Tom

> 
> Christophe
> 
> 
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: Dave Hansen 
>> Cc: Andy Lutomirski 
>> Cc: Peter Zijlstra 
>> Cc: David Airlie 
>> Cc: Daniel Vetter 
>> Cc: Maarten Lankhorst 
>> Cc: Maxime Ripard 
>> Cc: Thomas Zimmermann 
>> Cc: VMware Graphics 
>> Cc: Joerg Roedel 
>> Cc: Will Deacon 
>> Cc: Dave Young 
>> Cc: Baoquan He 
>> Signed-off-by: Tom Lendacky 
>> ---
>>   arch/x86/kernel/head64.c    | 4 ++--
>>   arch/x86/mm/ioremap.c   | 4 ++--
>>   arch/x86/mm/mem_encrypt.c   | 5 ++---
>>   arch/x86/mm/pat/set_memory.c    | 3 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++-
>>   drivers/gpu/drm/drm_cache.c | 4 ++--
>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++--
>>   drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +++---
>>   drivers/iommu/amd/iommu.c   | 3 ++-
>>   drivers/iommu/amd/iommu_v2.c    | 3 ++-
>>   drivers/iommu/iommu.c   | 3 ++-
>>   fs/proc/vmcore.c    | 6 +++---
>>   kernel/dma/swiotlb.c    | 4 ++--
>>   13 files changed, 29 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> index de01903c3735..cafed6456d45 100644
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -19,7 +19,7 @@
>>   #include 
>>   #include 
>>   #include 
>> -#include 
>> +#include 
>>   #include 
>>     #include 
>> @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>>    * there is no need to zero it after changing the memory encryption
>>    * attribute.
>>    */
>> -    if (mem_encrypt_active()) {
>> +    if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
>>   vaddr = (unsigned long)__start_bss_decrypted;
>>   vaddr_end = (unsigned long)__end_bss_decrypted;
>>   for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 0f2d5ace5986..5e1c1f5cbbe8 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -693,7 +693,7 @@ static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
>>   bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned long size,
>>    unsigned long flags)
>>   {
>> -    if (!mem_encrypt_active())
>> +    if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>>   return true;
>>     if (flags & MEMREMAP_ENC)
>> @@ -723,7 +723,7 @@ pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
>>   {
>>   bool encrypted_prot;
>>   -    if (!mem_encrypt_active())
>> +    if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>>   return prot;
>>     encrypted_prot = true;
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 451de8e84fce..0f1533dbe81c 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -364,8 +364,7 @@ int __init early_set_memory_encrypted(unsigned long vaddr, unsigned long size)
>>   /*
>>    * SME and SEV are very similar but they are not the same, so there are
>>    * times that the kernel will need to distinguish between SME and SEV.  The
>> - * sme_active() and sev_active() functions are used for this.  When a
>> - * distinction isn't needed, the mem_encrypt_active() function can be used.
>> + * sme_active() and sev_active() functions are used for this.
>>    *
>>    * The trampoline code is a good example for this requirement.  Before
>>    * paging is activated, SME will access all memory as decrypted, but SEV
>> @@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>>    * The unused memory range was mapped decrypted, change the encryption
>>    * attribute from decrypted to encrypted before freeing it.
>>    */
>> -    if (mem_encrypt_active()) {
>> +    if (sme_me_mask) {
>>   r = set_memory_encrypted(vaddr, npages);
>>   if (r) {
>>   pr_warn("failed to free unused decrypted pages\n");
>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c

Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-09 Thread Kuppuswamy, Sathyanarayanan

On 8/9/21 2:59 PM, Tom Lendacky wrote:
>> Not sure how TDX will handle AP booting; are you sure it needs this
>> special setup as well? Otherwise a check for SEV-ES would be better
>> instead of the generic PATTR_GUEST_PROT_STATE.
>
> Yes, I'm not sure either. I figure that change can be made, if needed, as
> part of the TDX support.

We don't plan to set PROT_STATE, so it does not affect TDX.
For SMP, we use the MADT ACPI table for AP booting.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer



Re: [PATCH 00/11] Implement generic prot_guest_has() helper function

2021-08-09 Thread Tom Lendacky
On 8/8/21 8:41 PM, Kuppuswamy, Sathyanarayanan wrote:
> Hi Tom,
> 
> On 7/27/21 3:26 PM, Tom Lendacky wrote:
>> This patch series provides a generic helper function, prot_guest_has(),
>> to replace the sme_active(), sev_active(), sev_es_active() and
>> mem_encrypt_active() functions.
>>
>> It is expected that as new protected virtualization technologies are
>> added to the kernel, they can all be covered by a single function call
>> instead of a collection of specific function calls all called from the
>> same locations.
>>
>> The powerpc and s390 patches have been compile-tested only. Can the
>> folks copied on this series verify that nothing breaks for them?
> 
> With this patch set, selecting ARCH_HAS_PROTECTED_GUEST and setting
> CONFIG_AMD_MEM_ENCRYPT=n creates the following error:
> 
> ld: arch/x86/mm/ioremap.o: in function `early_memremap_is_setup_data':
> arch/x86/mm/ioremap.c:672: undefined reference to `early_memremap_decrypted'
> 
> It looks like early_memremap_is_setup_data() is not protected with the
> appropriate config.

Ok, thanks for finding that. I'll fix that.
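
Presumably something along these lines (a sketch only; the actual fix may
differ), so the reference to early_memremap_decrypted() is compiled out when
CONFIG_AMD_MEM_ENCRYPT=n:

	#ifdef CONFIG_AMD_MEM_ENCRYPT
	/* existing early_memremap_is_setup_data(), which calls
	 * early_memremap_decrypted()
	 */
	#else
	static bool __init early_memremap_is_setup_data(resource_size_t phys_addr,
							unsigned long size)
	{
		return false;	/* no decrypted setup_data mapping without SME/SEV */
	}
	#endif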

Thanks,
Tom

> 
> 
>>
>> Cc: Andi Kleen 
>> Cc: Andy Lutomirski 
>> Cc: Ard Biesheuvel 
>> Cc: Baoquan He 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Borislav Petkov 
>> Cc: Christian Borntraeger 
>> Cc: Daniel Vetter 
>> Cc: Dave Hansen 
>> Cc: Dave Young 
>> Cc: David Airlie 
>> Cc: Heiko Carstens 
>> Cc: Ingo Molnar 
>> Cc: Joerg Roedel 
>> Cc: Maarten Lankhorst 
>> Cc: Maxime Ripard 
>> Cc: Michael Ellerman 
>> Cc: Paul Mackerras 
>> Cc: Peter Zijlstra 
>> Cc: Thomas Gleixner 
>> Cc: Thomas Zimmermann 
>> Cc: Vasily Gorbik 
>> Cc: VMware Graphics 
>> Cc: Will Deacon 
>>
>> ---
>>
>> Patches based on:
>>   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
>>    commit 79e920060fa7 ("Merge branch 'WIP/fixes'")
>>
>> Tom Lendacky (11):
>>    mm: Introduce a function to check for virtualization protection
>>  features
>>    x86/sev: Add an x86 version of prot_guest_has()
>>    powerpc/pseries/svm: Add a powerpc version of prot_guest_has()
>>    x86/sme: Replace occurrences of sme_active() with prot_guest_has()
>>    x86/sev: Replace occurrences of sev_active() with prot_guest_has()
>>    x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
>>    treewide: Replace the use of mem_encrypt_active() with
>>  prot_guest_has()
>>    mm: Remove the now unused mem_encrypt_active() function
>>    x86/sev: Remove the now unused mem_encrypt_active() function
>>    powerpc/pseries/svm: Remove the now unused mem_encrypt_active()
>>  function
>>    s390/mm: Remove the now unused mem_encrypt_active() function
>>
>>   arch/Kconfig   |  3 ++
>>   arch/powerpc/include/asm/mem_encrypt.h |  5 --
>>   arch/powerpc/include/asm/protected_guest.h | 30 +++
>>   arch/powerpc/platforms/pseries/Kconfig |  1 +
>>   arch/s390/include/asm/mem_encrypt.h    |  2 -
>>   arch/x86/Kconfig   |  1 +
>>   arch/x86/include/asm/kexec.h   |  2 +-
>>   arch/x86/include/asm/mem_encrypt.h | 13 +
>>   arch/x86/include/asm/protected_guest.h | 27 ++
>>   arch/x86/kernel/crash_dump_64.c    |  4 +-
>>   arch/x86/kernel/head64.c   |  4 +-
>>   arch/x86/kernel/kvm.c  |  3 +-
>>   arch/x86/kernel/kvmclock.c |  4 +-
>>   arch/x86/kernel/machine_kexec_64.c | 19 +++
>>   arch/x86/kernel/pci-swiotlb.c  |  9 ++--
>>   arch/x86/kernel/relocate_kernel_64.S   |  2 +-
>>   arch/x86/kernel/sev.c  |  6 +--
>>   arch/x86/kvm/svm/svm.c |  3 +-
>>   arch/x86/mm/ioremap.c  | 16 +++---
>>   arch/x86/mm/mem_encrypt.c  | 60 +++---
>>   arch/x86/mm/mem_encrypt_identity.c |  3 +-
>>   arch/x86/mm/pat/set_memory.c   |  3 +-
>>   arch/x86/platform/efi/efi_64.c |  9 ++--
>>   arch/x86/realmode/init.c   |  8 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 +-
>>   drivers/gpu/drm/drm_cache.c    |  4 +-
>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c    |  4 +-
>>   drivers/gpu/drm/vmwgfx/vmwgfx_msg.c    |  6 +--
>>   drivers/iommu/amd/init.c   |  7 +--
>>   drivers/iommu/amd/iommu.c  |  3 +-
>>   drivers/iommu/amd/iommu_v2.c   |  3 +-
>>   drivers/iommu/iommu.c  |  3 +-
>>   fs/proc/vmcore.c   |  6 +--
>>   include/linux/mem_encrypt.h    |  4 --
>>   incl

Re: [PATCH] x86: allow i386 kexec to load x86_64 bzImage anywhere

2021-08-09 Thread Kevin Mitchell
On Mon, Aug 09, 2021 at 10:13:16AM +0800, Baoquan He wrote:
> On 08/06/21 at 07:21pm, Kevin Mitchell wrote:
> > In linux-5.2, the following commit allows the kdump kernel to be loaded
> > at a higher address than 896M
> > 
> > 9ca5c8e632ce ("x86/kdump: Have crashkernel=X reserve under 4G by
> > default")
> > 
> > While this limit does indeed seem unnecessary for x86_64 kernels, it
> > still is required to boot to or from i386 kernels. Therefore,
> > kexec-tools continues to enforce it when using the i386 bzImage loader.
> > 
> > However, the i386 bzImage loader may also be used to load an x86_64
> > kernel from i386 user space to be kexeced by an x86_64 kernel. In this
> > case, the limit was incorrectly enforced.
> 
> Are you doing kexec/kdump switching to an x86_64 kernel on an i386 system?

We run an x86_64 kernel, but the initrd userspace from which we load the x86_64
kdump kernel is i386 to conserve space, as many of our smaller devices are
memory constrained. We also need to fit this initrd into our coreboot SPI flash
image, which is limited to a few megabytes.

> Could you tell more about your testing or product environment so that we
> know why we need to do that?

Prior to 9ca5c8e632ce, this worked without issue because the crashkernel area
was always reserved in a location that satisfied the limits defined in
kexec-bzImage.c, even on an x86_64 kernel. Once we switched to linux-5.10, we
started seeing

Aboot# kexec --load-panic --initrd=initrd-i386-kdump
--command-line="$crash_cmd_line" linux-x86_64-kdump
Could not find a free area of memory of 0x8000 bytes...
locate_hole failed

which is the result of hitting the kexec-bzImage.c limits. However, these
limits appear not to apply when an x86_64 kernel is loaded from an x86_64
kernel, even when using an i386 kexec binary.

With this patch, I am able to load the kdump kernel into the new default
crashkernel location assigned by the linux-5.10 kernel (e.g., 1968MB, 3264MB)
and have it successfully kexeced when triggering a panic. This was tested on
all our current CPU platforms, including both AMD (eKabini, Steppe Eagle,
Crowned Eagle, Merlin Falcon) and Intel (SandyBridge, Broadwell-DE) CPUs,
variously running with between 4 GB and 64 GB of RAM.

Conversely, I tried unconditionally removing the limits in kexec-bzImage.c, but
found that if either or both of the running and kdump kernels were i386 and the
crashkernel reservation was above the 896M limit (I had to force this, since
the default location selected by the kernel is below it), the kexec would hang
indefinitely. Therefore, I have kept those limits in place when either kernel
is i386.
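
The resulting address-limit policy can be summarized as follows (a sketch; the
constant names and values here are illustrative, not the actual kexec-tools
identifiers):

	/*
	 * Sketch: lift the 896M limit to 4G only when both the running
	 * kernel and the bzImage being loaded are x86_64; keep the old
	 * limit whenever either side is i386.
	 */
	unsigned long addr_limit = kernel64 ? 0xFFFFFFFFUL  /* 4G max  */
					    : 0x37FFFFFFUL; /* 896M max */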

> AFAIK, we rarely kexec/kdump switch to an
> x86_64 kernel from an i386 kernel.
> 
> Thanks
> Baoquan
> 
> > 
> > This commit adds an additional check for an x86_64 image kexeced by an
> > x86_64 kernel in the i386 loader and bumps the limit to the maximum
> > addressable 4G in that case.
> > 
> > Signed-off-by: Kevin Mitchell 
> > ---
> >  kexec/arch/i386/kexec-bzImage.c | 41 ++---
> >  1 file changed, 28 insertions(+), 13 deletions(-)
> > 
> > diff --git a/kexec/arch/i386/kexec-bzImage.c b/kexec/arch/i386/kexec-bzImage.c
> > index df8985d..7b8e36e 100644
> > --- a/kexec/arch/i386/kexec-bzImage.c
> > +++ b/kexec/arch/i386/kexec-bzImage.c
> > @@ -22,6 +22,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -114,6 +115,7 @@ int do_bzImage_load(struct kexec_info *info,
> > struct entry32_regs regs32;
> > struct entry16_regs regs16;
> > unsigned int relocatable_kernel = 0;
> > +   unsigned int kernel64 = 0;
> > unsigned long kernel32_load_addr;
> > char *modified_cmdline;
> > unsigned long cmdline_end;
> > @@ -155,6 +157,13 @@ int do_bzImage_load(struct kexec_info *info,
> > dbgprintf("bzImage is relocatable\n");
> > }
> >  
> > +   if ((setup_header.protocol_version >= 0x020C) &&
> > +   (info->kexec_flags & KEXEC_ARCH_X86_64) &&
> > +   (setup_header.xloadflags & 1)) {
> > +   kernel64 = 1;
> > +   dbgprintf("loading x86_64 bzImage from an x86_64 kernel\n");
> > +   }
> > +
> > /* Can't use bzImage for crash dump purposes with real mode entry */
> > if((info->kexec_flags & KEXEC_ON_CRASH) && real_mode_entry) {
> > fprintf(stderr, "Can't use bzImage for crash dump purposes"
> > @@ -197,17 +206,17 @@ int do_bzImage_load(struct kexec_info *info,
> > /* Load the trampoline.  This must load at a higher address
> >  * than the argument/parameter segment or the kernel will stomp
> >  * it's gdt.
> > -*
> > -* x86_64 purgatory code has got relocations type R_X86_64_32S
> > -* that means purgatory got to be loaded within first 2G otherwise
> > -* overflow takes place while applying relocations.
> >  */
> > -   if (!real_mode_entry && relocatable_kernel)
> > +   if (!real_mode_entry && relocatable_kernel) {
> > +   /* x86_64 purga