Re: [PATCH v1 1/8] x86/xen: update xen_oldmem_pfn_is_ram() documentation

2021-09-28 Thread Boris Ostrovsky


On 9/28/21 2:22 PM, David Hildenbrand wrote:
> The callback is only used for the vmcore nowadays.
>
> Signed-off-by: David Hildenbrand 


Reviewed-by: Boris Ostrovsky 



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 2/8] x86/xen: simplify xen_oldmem_pfn_is_ram()

2021-09-28 Thread Boris Ostrovsky

On 9/28/21 2:22 PM, David Hildenbrand wrote:
> Let's simplify return handling.
>
> Signed-off-by: David Hildenbrand 
> ---
>  arch/x86/xen/mmu_hvm.c | 11 ++-
>  1 file changed, 2 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
> index b242d1f4b426..eb61622df75b 100644
> --- a/arch/x86/xen/mmu_hvm.c
> +++ b/arch/x86/xen/mmu_hvm.c
> @@ -21,23 +21,16 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn)
>   .domid = DOMID_SELF,
>   .pfn = pfn,
>   };
> - int ram;
>  
>   if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, ))
>   return -ENXIO;
>  
>   switch (a.mem_type) {
>   case HVMMEM_mmio_dm:
> - ram = 0;
> - break;
> - case HVMMEM_ram_rw:
> - case HVMMEM_ram_ro:
> + return 0;
>   default:
> - ram = 1;
> - break;
> + return 1;
>   }
> -
> - return ram;
>  }
>  #endif
>  


How about

    return a.mem_type != HVMMEM_mmio_dm;


Result should be promoted to int and this has added benefit of not requiring 
changes in patch 4.


-boris


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 02:01:57PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Yes. But, since the check is related to TDX, I just want to confirm whether
> you are fine with naming the function as intel_*().

Why is this such a big of a deal?!

There's amd_cc_platform_has() and intel_cc_platform_has() will be the
corresponding Intel version.

> Since this patch is going to have dependency on TDX code, I will include
> this patch in TDX patch set.

Ok.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Kuppuswamy, Sathyanarayanan




On 9/28/21 1:58 PM, Borislav Petkov wrote:

On Tue, Sep 28, 2021 at 01:48:46PM -0700, Kuppuswamy, Sathyanarayanan wrote:

Just read it. If you want to use cpuid_has_tdx_guest() directly in
cc_platform_has(), then you want to rename intel_cc_platform_has() to
tdx_cc_platform_has()?


Why?

You simply do:

if (cpuid_has_tdx_guest())
intel_cc_platform_has(...);

and lemme paste from that mail: " ...you should use
cpuid_has_tdx_guest() instead but cache its result so that you don't
call CPUID each time the kernel executes cc_platform_has()."

Makes sense?


Yes. But, since the check is related to TDX, I just want to confirm whether
you are fine with naming the function as intel_*().

Since this patch is going to have dependency on TDX code, I will include
this patch in TDX patch set.





--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 01:48:46PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Just read it. If you want to use cpuid_has_tdx_guest() directly in
> cc_platform_has(), then you want to rename intel_cc_platform_has() to
> tdx_cc_platform_has()?

Why?

You simply do:

if (cpuid_has_tdx_guest())
intel_cc_platform_has(...);

and lemme paste from that mail: " ...you should use
cpuid_has_tdx_guest() instead but cache its result so that you don't
call CPUID each time the kernel executes cc_platform_has()."

Makes sense?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Kuppuswamy, Sathyanarayanan




On 9/28/21 1:31 PM, Borislav Petkov wrote:

On Tue, Sep 28, 2021 at 12:19:49PM -0700, Kuppuswamy, Sathyanarayanan wrote:

Intel CC support patch is not included in this series. You want me
to address the issue raised by Joerg before merging it?


Did you not see my email to you today:

https://lkml.kernel.org/r/yvl4zughfsh1q...@zn.tnic


Just read it. If you want to use cpuid_has_tdx_guest() directly in
cc_platform_has(), then you want to rename intel_cc_platform_has() to
tdx_cc_platform_has()?



?



--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
On Tue, Sep 28, 2021 at 12:19:49PM -0700, Kuppuswamy, Sathyanarayanan wrote:
> Intel CC support patch is not included in this series. You want me
> to address the issue raised by Joerg before merging it?

Did you not see my email to you today:

https://lkml.kernel.org/r/yvl4zughfsh1q...@zn.tnic

?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 4/8] powerpc/pseries/svm: Add a powerpc version of cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Introduce a powerpc version of the cc_platform_has() function. This will
be used to replace the powerpc mem_encrypt_active() implementation, so
the implementation will initially only support the CC_ATTR_MEM_ENCRYPT
attribute.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
Acked-by: Michael Ellerman 
---
 arch/powerpc/platforms/pseries/Kconfig   |  1 +
 arch/powerpc/platforms/pseries/Makefile  |  2 ++
 arch/powerpc/platforms/pseries/cc_platform.c | 26 
 3 files changed, 29 insertions(+)
 create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 5e037df2a3a1..2e57391e0778 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -159,6 +159,7 @@ config PPC_SVM
select SWIOTLB
select ARCH_HAS_MEM_ENCRYPT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+   select ARCH_HAS_CC_PLATFORM
help
 There are certain POWER platforms which support secure guests using
 the Protected Execution Facility, with the help of an Ultravisor
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index 4cda0ef87be0..41d8aee98da4 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -31,3 +31,5 @@ obj-$(CONFIG_FA_DUMP) += rtas-fadump.o
 
 obj-$(CONFIG_SUSPEND)  += suspend.o
 obj-$(CONFIG_PPC_VAS)  += vas.o
+
+obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += cc_platform.o
diff --git a/arch/powerpc/platforms/pseries/cc_platform.c 
b/arch/powerpc/platforms/pseries/cc_platform.c
new file mode 100644
index ..e8021af83a19
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/cc_platform.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Confidential Computing Platform Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ */
+
+#include 
+#include 
+
+#include 
+#include 
+
+bool cc_platform_has(enum cc_attr attr)
+{
+   switch (attr) {
+   case CC_ATTR_MEM_ENCRYPT:
+   return is_secure_guest();
+
+   default:
+   return false;
+   }
+}
+EXPORT_SYMBOL_GPL(cc_platform_has);
-- 
2.29.2


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Borislav Petkov
From: Borislav Petkov 

Hi all,

here's v4 of the cc_platform_has() patchset with feedback incorporated.

I'm going to route this through tip if there are no objections.

Thx.

Tom Lendacky (8):
  x86/ioremap: Selectively build arch override encryption functions
  arch/cc: Introduce a function to check for confidential computing
features
  x86/sev: Add an x86 version of cc_platform_has()
  powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
  x86/sme: Replace occurrences of sme_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_active() with cc_platform_has()
  x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
  treewide: Replace the use of mem_encrypt_active() with
cc_platform_has()

 arch/Kconfig |  3 +
 arch/powerpc/include/asm/mem_encrypt.h   |  5 --
 arch/powerpc/platforms/pseries/Kconfig   |  1 +
 arch/powerpc/platforms/pseries/Makefile  |  2 +
 arch/powerpc/platforms/pseries/cc_platform.c | 26 ++
 arch/powerpc/platforms/pseries/svm.c |  5 +-
 arch/s390/include/asm/mem_encrypt.h  |  2 -
 arch/x86/Kconfig |  1 +
 arch/x86/include/asm/io.h|  8 ++
 arch/x86/include/asm/kexec.h |  2 +-
 arch/x86/include/asm/mem_encrypt.h   | 12 +--
 arch/x86/kernel/Makefile |  6 ++
 arch/x86/kernel/cc_platform.c| 69 +++
 arch/x86/kernel/crash_dump_64.c  |  4 +-
 arch/x86/kernel/head64.c |  9 +-
 arch/x86/kernel/kvm.c|  3 +-
 arch/x86/kernel/kvmclock.c   |  4 +-
 arch/x86/kernel/machine_kexec_64.c   | 19 +++--
 arch/x86/kernel/pci-swiotlb.c|  9 +-
 arch/x86/kernel/relocate_kernel_64.S |  2 +-
 arch/x86/kernel/sev.c|  6 +-
 arch/x86/kvm/svm/svm.c   |  3 +-
 arch/x86/mm/ioremap.c| 18 ++--
 arch/x86/mm/mem_encrypt.c| 55 
 arch/x86/mm/mem_encrypt_identity.c   |  9 +-
 arch/x86/mm/pat/set_memory.c |  3 +-
 arch/x86/platform/efi/efi_64.c   |  9 +-
 arch/x86/realmode/init.c |  8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |  4 +-
 drivers/gpu/drm/drm_cache.c  |  4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |  4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c  |  6 +-
 drivers/iommu/amd/init.c |  7 +-
 drivers/iommu/amd/iommu.c|  3 +-
 drivers/iommu/amd/iommu_v2.c |  3 +-
 drivers/iommu/iommu.c|  3 +-
 fs/proc/vmcore.c |  6 +-
 include/linux/cc_platform.h  | 88 
 include/linux/mem_encrypt.h  |  4 -
 kernel/dma/swiotlb.c |  4 +-
 40 files changed, 310 insertions(+), 129 deletions(-)
 create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c
 create mode 100644 arch/x86/kernel/cc_platform.c
 create mode 100644 include/linux/cc_platform.h

-- 
2.29.2


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 2/8] arch/cc: Introduce a function to check for confidential computing features

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

In preparation for other confidential computing technologies, introduce
a generic helper function, cc_platform_has(), that can be used to
check for specific active confidential computing attributes, like
memory encryption. This is intended to eliminate having to add multiple
technology-specific checks to the code (e.g. if (sev_active() ||
tdx_active() || ... ).

 [ bp: s/_CC_PLATFORM_H/_LINUX_CC_PLATFORM_H/g ]

Co-developed-by: Andi Kleen 
Signed-off-by: Andi Kleen 
Co-developed-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: Kuppuswamy Sathyanarayanan 

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/Kconfig|  3 ++
 include/linux/cc_platform.h | 88 +
 2 files changed, 91 insertions(+)
 create mode 100644 include/linux/cc_platform.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 8df1c7102643..d1e69d6e8498 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1234,6 +1234,9 @@ config RELR
 config ARCH_HAS_MEM_ENCRYPT
bool
 
+config ARCH_HAS_CC_PLATFORM
+   bool
+
 config HAVE_SPARSE_SYSCALL_NR
bool
help
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
new file mode 100644
index ..a075b70b9a70
--- /dev/null
+++ b/include/linux/cc_platform.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Confidential Computing Platform Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ */
+
+#ifndef _LINUX_CC_PLATFORM_H
+#define _LINUX_CC_PLATFORM_H
+
+#include 
+#include 
+
+/**
+ * enum cc_attr - Confidential computing attributes
+ *
+ * These attributes represent confidential computing features that are
+ * currently active.
+ */
+enum cc_attr {
+   /**
+* @CC_ATTR_MEM_ENCRYPT: Memory encryption is active
+*
+* The platform/OS is running with active memory encryption. This
+* includes running either as a bare-metal system or a hypervisor
+* and actively using memory encryption or as a guest/virtual machine
+* and actively using memory encryption.
+*
+* Examples include SME, SEV and SEV-ES.
+*/
+   CC_ATTR_MEM_ENCRYPT,
+
+   /**
+* @CC_ATTR_HOST_MEM_ENCRYPT: Host memory encryption is active
+*
+* The platform/OS is running as a bare-metal system or a hypervisor
+* and actively using memory encryption.
+*
+* Examples include SME.
+*/
+   CC_ATTR_HOST_MEM_ENCRYPT,
+
+   /**
+* @CC_ATTR_GUEST_MEM_ENCRYPT: Guest memory encryption is active
+*
+* The platform/OS is running as a guest/virtual machine and actively
+* using memory encryption.
+*
+* Examples include SEV and SEV-ES.
+*/
+   CC_ATTR_GUEST_MEM_ENCRYPT,
+
+   /**
+* @CC_ATTR_GUEST_STATE_ENCRYPT: Guest state encryption is active
+*
+* The platform/OS is running as a guest/virtual machine and actively
+* using memory encryption and register state encryption.
+*
+* Examples include SEV-ES.
+*/
+   CC_ATTR_GUEST_STATE_ENCRYPT,
+};
+
+#ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+
+/**
+ * cc_platform_has() - Checks if the specified cc_attr attribute is active
+ * @attr: Confidential computing attribute to check
+ *
+ * The cc_platform_has() function will return an indicator as to whether the
+ * specified Confidential Computing attribute is currently active.
+ *
+ * Context: Any context
+ * Return:
+ * * TRUE  - Specified Confidential Computing attribute is active
+ * * FALSE - Specified Confidential Computing attribute is not active
+ */
+bool cc_platform_has(enum cc_attr attr);
+
+#else  /* !CONFIG_ARCH_HAS_CC_PLATFORM */
+
+static inline bool cc_platform_has(enum cc_attr attr) { return false; }
+
+#endif /* CONFIG_ARCH_HAS_CC_PLATFORM */
+
+#endif /* _LINUX_CC_PLATFORM_H */
-- 
2.29.2


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 1/8] x86/ioremap: Selectively build arch override encryption functions

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

In preparation for other uses of the cc_platform_has() function
besides AMD's memory encryption support, selectively build the
AMD memory encryption architecture override functions only when
CONFIG_AMD_MEM_ENCRYPT=y. These functions are:

- early_memremap_pgprot_adjust()
- arch_memremap_can_ram_remap()

Additionally, routines that are only invoked by these architecture
override functions can also be conditionally built. These functions are:

- memremap_should_map_decrypted()
- memremap_is_efi_data()
- memremap_is_setup_data()
- early_memremap_is_setup_data()

And finally, phys_mem_access_encrypted() is conditionally built as well,
but requires a static inline version of it when CONFIG_AMD_MEM_ENCRYPT is
not set.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/io.h | 8 
 arch/x86/mm/ioremap.c | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index 841a5d104afa..5c6a4af0b911 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -391,6 +391,7 @@ extern void arch_io_free_memtype_wc(resource_size_t start, 
resource_size_t size)
 #define arch_io_reserve_memtype_wc arch_io_reserve_memtype_wc
 #endif
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 extern bool arch_memremap_can_ram_remap(resource_size_t offset,
unsigned long size,
unsigned long flags);
@@ -398,6 +399,13 @@ extern bool arch_memremap_can_ram_remap(resource_size_t 
offset,
 
 extern bool phys_mem_access_encrypted(unsigned long phys_addr,
  unsigned long size);
+#else
+static inline bool phys_mem_access_encrypted(unsigned long phys_addr,
+unsigned long size)
+{
+   return true;
+}
+#endif
 
 /**
  * iosubmit_cmds512 - copy data to single MMIO location, in 512-bit units
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 60ade7dd71bd..ccff76cedd8f 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -508,6 +508,7 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
memunmap((void *)((unsigned long)addr & PAGE_MASK));
 }
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT
 /*
  * Examine the physical address to determine if it is an area of memory
  * that should be mapped decrypted.  If the memory is not part of the
@@ -746,7 +747,6 @@ bool phys_mem_access_encrypted(unsigned long phys_addr, 
unsigned long size)
return arch_memremap_can_ram_remap(phys_addr, size, 0);
 }
 
-#ifdef CONFIG_AMD_MEM_ENCRYPT
 /* Remap memory with encryption */
 void __init *early_memremap_encrypted(resource_size_t phys_addr,
  unsigned long size)
-- 
2.29.2


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 3/8] x86/sev: Add an x86 version of cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Introduce an x86 version of the cc_platform_has() function. This will be
used to replace vendor specific calls like sme_active(), sev_active(),
etc.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/Kconfig   |  1 +
 arch/x86/include/asm/mem_encrypt.h |  1 +
 arch/x86/kernel/Makefile   |  6 +++
 arch/x86/kernel/cc_platform.c  | 69 ++
 arch/x86/mm/mem_encrypt.c  |  1 +
 5 files changed, 78 insertions(+)
 create mode 100644 arch/x86/kernel/cc_platform.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab83c22d274e..9f190ec4f953 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1518,6 +1518,7 @@ config AMD_MEM_ENCRYPT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
select INSTRUCTION_DECODER
select ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
+   select ARCH_HAS_CC_PLATFORM
help
  Say yes to enable support for the encryption of system memory.
  This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 9c80c68d75b5..3fb9f5ebefa4 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -13,6 +13,7 @@
 #ifndef __ASSEMBLY__
 
 #include 
+#include 
 
 #include 
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 8f4e8fa6ed75..2ff3e600f426 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -21,6 +21,7 @@ CFLAGS_REMOVE_ftrace.o = -pg
 CFLAGS_REMOVE_early_printk.o = -pg
 CFLAGS_REMOVE_head64.o = -pg
 CFLAGS_REMOVE_sev.o = -pg
+CFLAGS_REMOVE_cc_platform.o = -pg
 endif
 
 KASAN_SANITIZE_head$(BITS).o   := n
@@ -29,6 +30,7 @@ KASAN_SANITIZE_dumpstack_$(BITS).o:= n
 KASAN_SANITIZE_stacktrace.o:= n
 KASAN_SANITIZE_paravirt.o  := n
 KASAN_SANITIZE_sev.o   := n
+KASAN_SANITIZE_cc_platform.o   := n
 
 # With some compiler versions the generated code results in boot hangs, caused
 # by several compilation units. To be safe, disable all instrumentation.
@@ -47,6 +49,7 @@ endif
 KCOV_INSTRUMENT:= n
 
 CFLAGS_head$(BITS).o   += -fno-stack-protector
+CFLAGS_cc_platform.o   += -fno-stack-protector
 
 CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
 
@@ -147,6 +150,9 @@ obj-$(CONFIG_UNWINDER_FRAME_POINTER)+= 
unwind_frame.o
 obj-$(CONFIG_UNWINDER_GUESS)   += unwind_guess.o
 
 obj-$(CONFIG_AMD_MEM_ENCRYPT)  += sev.o
+
+obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += cc_platform.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
new file mode 100644
index ..03bb2f343ddb
--- /dev/null
+++ b/arch/x86/kernel/cc_platform.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Confidential Computing Platform Capability checks
+ *
+ * Copyright (C) 2021 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky 
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_INTEL_TDX_GUEST
+   return false;
+#else
+   return false;
+#endif
+}
+
+/*
+ * SME and SEV are very similar but they are not the same, so there are
+ * times that the kernel will need to distinguish between SME and SEV. The
+ * cc_platform_has() function is used for this.  When a distinction isn't
+ * needed, the CC_ATTR_MEM_ENCRYPT attribute can be used.
+ *
+ * The trampoline code is a good example for this requirement.  Before
+ * paging is activated, SME will access all memory as decrypted, but SEV
+ * will access all memory as encrypted.  So, when APs are being brought
+ * up under SME the trampoline area cannot be encrypted, whereas under SEV
+ * the trampoline area must be encrypted.
+ */
+static bool amd_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+   switch (attr) {
+   case CC_ATTR_MEM_ENCRYPT:
+   return sme_me_mask;
+
+   case CC_ATTR_HOST_MEM_ENCRYPT:
+   return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
+
+   case CC_ATTR_GUEST_MEM_ENCRYPT:
+   return sev_status & MSR_AMD64_SEV_ENABLED;
+
+   case CC_ATTR_GUEST_STATE_ENCRYPT:
+   return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+
+   default:
+   return false;
+   }
+#else
+   return false;
+#endif
+}
+
+
+bool cc_platform_has(enum cc_attr attr)
+{
+   if (sme_me_mask)
+   return amd_cc_platform_has(attr);
+
+   return false;
+}
+EXPORT_SYMBOL_GPL(cc_platform_has);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index ff08dc463634..e29b1418d00c 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -20,6 +20,7 

Re: [PATCH v4 0/8] Implement generic cc_platform_has() helper function

2021-09-28 Thread Kuppuswamy, Sathyanarayanan




On 9/28/21 12:10 PM, Borislav Petkov wrote:

From: Borislav Petkov 

Hi all,

here's v4 of the cc_platform_has() patchset with feedback incorporated.

I'm going to route this through tip if there are no objections.


Intel CC support patch is not included in this series. You want me
to address the issue raised by Joerg before merging it?



Thx.

Tom Lendacky (8):
   x86/ioremap: Selectively build arch override encryption functions
   arch/cc: Introduce a function to check for confidential computing
 features
   x86/sev: Add an x86 version of cc_platform_has()
   powerpc/pseries/svm: Add a powerpc version of cc_platform_has()
   x86/sme: Replace occurrences of sme_active() with cc_platform_has()
   x86/sev: Replace occurrences of sev_active() with cc_platform_has()
   x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()
   treewide: Replace the use of mem_encrypt_active() with
 cc_platform_has()

  arch/Kconfig |  3 +
  arch/powerpc/include/asm/mem_encrypt.h   |  5 --
  arch/powerpc/platforms/pseries/Kconfig   |  1 +
  arch/powerpc/platforms/pseries/Makefile  |  2 +
  arch/powerpc/platforms/pseries/cc_platform.c | 26 ++
  arch/powerpc/platforms/pseries/svm.c |  5 +-
  arch/s390/include/asm/mem_encrypt.h  |  2 -
  arch/x86/Kconfig |  1 +
  arch/x86/include/asm/io.h|  8 ++
  arch/x86/include/asm/kexec.h |  2 +-
  arch/x86/include/asm/mem_encrypt.h   | 12 +--
  arch/x86/kernel/Makefile |  6 ++
  arch/x86/kernel/cc_platform.c| 69 +++
  arch/x86/kernel/crash_dump_64.c  |  4 +-
  arch/x86/kernel/head64.c |  9 +-
  arch/x86/kernel/kvm.c|  3 +-
  arch/x86/kernel/kvmclock.c   |  4 +-
  arch/x86/kernel/machine_kexec_64.c   | 19 +++--
  arch/x86/kernel/pci-swiotlb.c|  9 +-
  arch/x86/kernel/relocate_kernel_64.S |  2 +-
  arch/x86/kernel/sev.c|  6 +-
  arch/x86/kvm/svm/svm.c   |  3 +-
  arch/x86/mm/ioremap.c| 18 ++--
  arch/x86/mm/mem_encrypt.c| 55 
  arch/x86/mm/mem_encrypt_identity.c   |  9 +-
  arch/x86/mm/pat/set_memory.c |  3 +-
  arch/x86/platform/efi/efi_64.c   |  9 +-
  arch/x86/realmode/init.c |  8 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  |  4 +-
  drivers/gpu/drm/drm_cache.c  |  4 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |  4 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_msg.c  |  6 +-
  drivers/iommu/amd/init.c |  7 +-
  drivers/iommu/amd/iommu.c|  3 +-
  drivers/iommu/amd/iommu_v2.c |  3 +-
  drivers/iommu/iommu.c|  3 +-
  fs/proc/vmcore.c |  6 +-
  include/linux/cc_platform.h  | 88 
  include/linux/mem_encrypt.h  |  4 -
  kernel/dma/swiotlb.c |  4 +-
  40 files changed, 310 insertions(+), 129 deletions(-)
  create mode 100644 arch/powerpc/platforms/pseries/cc_platform.c
  create mode 100644 arch/x86/kernel/cc_platform.c
  create mode 100644 include/linux/cc_platform.h



--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Replace uses of sme_active() with the more generic cc_platform_has()
using CC_ATTR_HOST_MEM_ENCRYPT. If future support is added for other
memory encryption technologies, the use of CC_ATTR_HOST_MEM_ENCRYPT
can be updated, as required.

This also replaces two usages of sev_active() that are really geared
towards detecting if SME is active.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/kexec.h |  2 +-
 arch/x86/include/asm/mem_encrypt.h   |  2 --
 arch/x86/kernel/machine_kexec_64.c   | 15 ---
 arch/x86/kernel/pci-swiotlb.c|  9 -
 arch/x86/kernel/relocate_kernel_64.S |  2 +-
 arch/x86/mm/ioremap.c|  6 +++---
 arch/x86/mm/mem_encrypt.c| 13 -
 arch/x86/mm/mem_encrypt_identity.c   |  9 -
 arch/x86/realmode/init.c |  5 +++--
 drivers/iommu/amd/init.c |  7 ---
 10 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 0a6e34b07017..11b7c06e2828 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -129,7 +129,7 @@ relocate_kernel(unsigned long indirection_page,
unsigned long page_list,
unsigned long start_address,
unsigned int preserve_context,
-   unsigned int sme_active);
+   unsigned int host_mem_enc_active);
 #endif
 
 #define ARCH_HAS_KIMAGE_ARCH
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 3fb9f5ebefa4..63c5b99ccae5 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void);
 void __init mem_encrypt_init(void);
 
 void __init sev_es_init_vc_handling(void);
-bool sme_active(void);
 bool sev_active(void);
 bool sev_es_active(void);
 
@@ -76,7 +75,6 @@ static inline void __init sme_encrypt_kernel(struct 
boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
-static inline bool sme_active(void) { return false; }
 static inline bool sev_active(void) { return false; }
 static inline bool sev_es_active(void) { return false; }
 
diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 131f30fdcfbd..7040c0fa921c 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -358,7 +359,7 @@ void machine_kexec(struct kimage *image)
   (unsigned long)page_list,
   image->start,
   image->preserve_context,
-  sme_active());
+  
cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT));
 
 #ifdef CONFIG_KEXEC_JUMP
if (image->preserve_context)
@@ -569,12 +570,12 @@ void arch_kexec_unprotect_crashkres(void)
  */
 int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, gfp_t gfp)
 {
-   if (sev_active())
+   if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
return 0;
 
/*
-* If SME is active we need to be sure that kexec pages are
-* not encrypted because when we boot to the new kernel the
+* If host memory encryption is active we need to be sure that kexec
+* pages are not encrypted because when we boot to the new kernel the
 * pages won't be accessed encrypted (initially).
 */
return set_memory_decrypted((unsigned long)vaddr, pages);
@@ -582,12 +583,12 @@ int arch_kexec_post_alloc_pages(void *vaddr, unsigned int 
pages, gfp_t gfp)
 
 void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages)
 {
-   if (sev_active())
+   if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
return;
 
/*
-* If SME is active we need to reset the pages back to being
-* an encrypted mapping before freeing them.
+* If host memory encryption is active we need to reset the pages back
+* to being an encrypted mapping before freeing them.
 */
set_memory_encrypted((unsigned long)vaddr, pages);
 }
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index c2cfa5e7c152..814ab46a0dad 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -6,7 +6,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include 
 #include 
@@ -45,11 +45,10 @@ int __init pci_swiotlb_detect_4gb(void)
swiotlb = 1;
 
/*
-* If SME is active then swiotlb will be set to 1 so that bounce
-* buffers are allocated and used for devices that do not support
-* the addressing range required for the encryption mask.
+* 

[PATCH 8/8] treewide: Replace the use of mem_encrypt_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Replace uses of mem_encrypt_active() with calls to cc_platform_has() with
the CC_ATTR_MEM_ENCRYPT attribute.

Remove the implementation of mem_encrypt_active() across all arches.

For s390, since the default implementation of the cc_platform_has()
matches the s390 implementation of mem_encrypt_active(), cc_platform_has()
does not need to be implemented in s390 (the config option
ARCH_HAS_CC_PLATFORM is not set).

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/powerpc/include/asm/mem_encrypt.h  | 5 -
 arch/powerpc/platforms/pseries/svm.c| 5 +++--
 arch/s390/include/asm/mem_encrypt.h | 2 --
 arch/x86/include/asm/mem_encrypt.h  | 5 -
 arch/x86/kernel/head64.c| 9 +++--
 arch/x86/mm/ioremap.c   | 4 ++--
 arch/x86/mm/mem_encrypt.c   | 2 +-
 arch/x86/mm/pat/set_memory.c| 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++-
 drivers/gpu/drm/drm_cache.c | 4 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++--
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +++---
 drivers/iommu/amd/iommu.c   | 3 ++-
 drivers/iommu/amd/iommu_v2.c| 3 ++-
 drivers/iommu/iommu.c   | 3 ++-
 fs/proc/vmcore.c| 6 +++---
 include/linux/mem_encrypt.h | 4 
 kernel/dma/swiotlb.c| 4 ++--
 18 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/include/asm/mem_encrypt.h 
b/arch/powerpc/include/asm/mem_encrypt.h
index ba9dab07c1be..2f26b8fc8d29 100644
--- a/arch/powerpc/include/asm/mem_encrypt.h
+++ b/arch/powerpc/include/asm/mem_encrypt.h
@@ -10,11 +10,6 @@
 
 #include 
 
-static inline bool mem_encrypt_active(void)
-{
-   return is_secure_guest();
-}
-
 static inline bool force_dma_unencrypted(struct device *dev)
 {
return is_secure_guest();
diff --git a/arch/powerpc/platforms/pseries/svm.c 
b/arch/powerpc/platforms/pseries/svm.c
index 87f001b4c4e4..c083ecbbae4d 100644
--- a/arch/powerpc/platforms/pseries/svm.c
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -63,7 +64,7 @@ void __init svm_swiotlb_init(void)
 
 int set_memory_encrypted(unsigned long addr, int numpages)
 {
-   if (!mem_encrypt_active())
+   if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT))
return 0;
 
if (!PAGE_ALIGNED(addr))
@@ -76,7 +77,7 @@ int set_memory_encrypted(unsigned long addr, int numpages)
 
 int set_memory_decrypted(unsigned long addr, int numpages)
 {
-   if (!mem_encrypt_active())
+   if (!cc_platform_has(CC_ATTR_MEM_ENCRYPT))
return 0;
 
if (!PAGE_ALIGNED(addr))
diff --git a/arch/s390/include/asm/mem_encrypt.h 
b/arch/s390/include/asm/mem_encrypt.h
index 2542cbf7e2d1..08a8b96606d7 100644
--- a/arch/s390/include/asm/mem_encrypt.h
+++ b/arch/s390/include/asm/mem_encrypt.h
@@ -4,8 +4,6 @@
 
 #ifndef __ASSEMBLY__
 
-static inline bool mem_encrypt_active(void) { return false; }
-
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
 
diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index da14ede311aa..2d4f5c17d79c 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -96,11 +96,6 @@ static inline void mem_encrypt_free_decrypted_mem(void) { }
 
 extern char __start_bss_decrypted[], __end_bss_decrypted[], 
__start_bss_decrypted_unused[];
 
-static inline bool mem_encrypt_active(void)
-{
-   return sme_me_mask;
-}
-
 static inline u64 sme_get_me_mask(void)
 {
return sme_me_mask;
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index de01903c3735..fc5371a7e9d1 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -19,7 +19,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 
 #include 
@@ -284,8 +284,13 @@ unsigned long __head __startup_64(unsigned long physaddr,
 * The bss section will be memset to zero later in the initialization so
 * there is no need to zero it after changing the memory encryption
 * attribute.
+*
+* This is early code, use an open coded check for SME instead of
+* using cc_platform_has(). This eliminates worries about removing
+* instrumentation or checking boot_cpu_data in the cc_platform_has()
+* function.
 */
-   if (mem_encrypt_active()) {
+   if (sme_get_me_mask()) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index b59a5cbc6bc5..026031b3b782 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -694,7 +694,7 @@ static bool __init 

[PATCH 7/8] x86/sev: Replace occurrences of sev_es_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Replace uses of sev_es_active() with the more generic cc_platform_has()
using CC_ATTR_GUEST_STATE_ENCRYPT. If future support is added for other
memory encyrption techonologies, the use of CC_ATTR_GUEST_STATE_ENCRYPT
can be updated, as required.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/mem_encrypt.h |  2 --
 arch/x86/kernel/sev.c  |  6 +++---
 arch/x86/mm/mem_encrypt.c  | 24 +++-
 arch/x86/realmode/init.c   |  3 +--
 4 files changed, 7 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index a5a58ccd1ee3..da14ede311aa 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void);
 void __init mem_encrypt_init(void);
 
 void __init sev_es_init_vc_handling(void);
-bool sev_es_active(void);
 
 #define __bss_decrypted __section(".bss..decrypted")
 
@@ -74,7 +73,6 @@ static inline void __init sme_encrypt_kernel(struct 
boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
-static inline bool sev_es_active(void) { return false; }
 
 static inline int __init
 early_set_memory_decrypted(unsigned long vaddr, unsigned long size) { return 
0; }
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a6895e440bc3..53a6837d354b 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -11,7 +11,7 @@
 
 #include  /* For show_regs() */
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -615,7 +615,7 @@ int __init sev_es_efi_map_ghcbs(pgd_t *pgd)
int cpu;
u64 pfn;
 
-   if (!sev_es_active())
+   if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
return 0;
 
pflags = _PAGE_NX | _PAGE_RW;
@@ -774,7 +774,7 @@ void __init sev_es_init_vc_handling(void)
 
BUILD_BUG_ON(offsetof(struct sev_es_runtime_data, ghcb_page) % 
PAGE_SIZE);
 
-   if (!sev_es_active())
+   if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
return;
 
if (!sev_es_check_cpu_features())
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 932007a6913b..2d04c39bea1d 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -361,25 +361,6 @@ int __init early_set_memory_encrypted(unsigned long vaddr, 
unsigned long size)
return early_set_memory_enc_dec(vaddr, size, true);
 }
 
-/*
- * SME and SEV are very similar but they are not the same, so there are
- * times that the kernel will need to distinguish between SME and SEV. The
- * cc_platform_has() function is used for this.  When a distinction isn't
- * needed, the CC_ATTR_MEM_ENCRYPT attribute can be used.
- *
- * The trampoline code is a good example for this requirement.  Before
- * paging is activated, SME will access all memory as decrypted, but SEV
- * will access all memory as encrypted.  So, when APs are being brought
- * up under SME the trampoline area cannot be encrypted, whereas under SEV
- * the trampoline area must be encrypted.
- */
-
-/* Needs to be called from non-instrumentable code */
-bool noinstr sev_es_active(void)
-{
-   return sev_status & MSR_AMD64_SEV_ES_ENABLED;
-}
-
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 bool force_dma_unencrypted(struct device *dev)
 {
@@ -449,7 +430,7 @@ static void print_mem_encrypt_feature_info(void)
pr_cont(" SEV");
 
/* Encrypted Register State */
-   if (sev_es_active())
+   if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
pr_cont(" SEV-ES");
 
pr_cont("\n");
@@ -468,7 +449,8 @@ void __init mem_encrypt_init(void)
 * With SEV, we need to unroll the rep string I/O instructions,
 * but SEV-ES supports them through the #VC handler.
 */
-   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && !sev_es_active())
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) &&
+   !cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
static_branch_enable(_enable_key);
 
print_mem_encrypt_feature_info();
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index c878c5ee5a4c..4a3da7592b99 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -2,7 +2,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 
@@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct trampoline_header 
*th)
if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
th->flags |= TH_FLAGS_SME_ACTIVE;
 
-   if (sev_es_active()) {
+   if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT)) {
/*
 * Skip the call to verify_cpu() in secondary_startup_64 as it
 * will cause #VC exceptions when the AP can't handle them yet.

[PATCH 6/8] x86/sev: Replace occurrences of sev_active() with cc_platform_has()

2021-09-28 Thread Borislav Petkov
From: Tom Lendacky 

Replace uses of sev_active() with the more generic cc_platform_has()
using CC_ATTR_GUEST_MEM_ENCRYPT. If future support is added for other
memory encryption technologies, the use of CC_ATTR_GUEST_MEM_ENCRYPT
can be updated, as required.

Signed-off-by: Tom Lendacky 
Signed-off-by: Borislav Petkov 
---
 arch/x86/include/asm/mem_encrypt.h |  2 --
 arch/x86/kernel/crash_dump_64.c|  4 +++-
 arch/x86/kernel/kvm.c  |  3 ++-
 arch/x86/kernel/kvmclock.c |  4 ++--
 arch/x86/kernel/machine_kexec_64.c |  4 ++--
 arch/x86/kvm/svm/svm.c |  3 ++-
 arch/x86/mm/ioremap.c  |  6 +++---
 arch/x86/mm/mem_encrypt.c  | 21 -
 arch/x86/platform/efi/efi_64.c |  9 +
 9 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h 
b/arch/x86/include/asm/mem_encrypt.h
index 63c5b99ccae5..a5a58ccd1ee3 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -51,7 +51,6 @@ void __init mem_encrypt_free_decrypted_mem(void);
 void __init mem_encrypt_init(void);
 
 void __init sev_es_init_vc_handling(void);
-bool sev_active(void);
 bool sev_es_active(void);
 
 #define __bss_decrypted __section(".bss..decrypted")
@@ -75,7 +74,6 @@ static inline void __init sme_encrypt_kernel(struct 
boot_params *bp) { }
 static inline void __init sme_enable(struct boot_params *bp) { }
 
 static inline void sev_es_init_vc_handling(void) { }
-static inline bool sev_active(void) { return false; }
 static inline bool sev_es_active(void) { return false; }
 
 static inline int __init
diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index 045e82e8945b..a7f617a3981d 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
  unsigned long offset, int userbuf,
@@ -73,5 +74,6 @@ ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char 
*buf, size_t csize,
 
 ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
 {
-   return read_from_oldmem(buf, count, ppos, 0, sev_active());
+   return read_from_oldmem(buf, count, ppos, 0,
+   cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT));
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b656456c3a94..8863d1941f1b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -418,7 +419,7 @@ static void __init sev_map_percpu_data(void)
 {
int cpu;
 
-   if (!sev_active())
+   if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
return;
 
for_each_possible_cpu(cpu) {
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index ad273e5861c1..fc3930c5db1b 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -16,9 +16,9 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
-#include 
 #include 
 #include 
 
@@ -232,7 +232,7 @@ static void __init kvmclock_init_mem(void)
 * hvclock is shared between the guest and the hypervisor, must
 * be mapped decrypted.
 */
-   if (sev_active()) {
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
r = set_memory_decrypted((unsigned long) hvclock_mem,
 1UL << order);
if (r) {
diff --git a/arch/x86/kernel/machine_kexec_64.c 
b/arch/x86/kernel/machine_kexec_64.c
index 7040c0fa921c..f5da4a18070a 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -167,7 +167,7 @@ static int init_transition_pgtable(struct kimage *image, 
pgd_t *pgd)
}
pte = pte_offset_kernel(pmd, vaddr);
 
-   if (sev_active())
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
prot = PAGE_KERNEL_EXEC;
 
set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, prot));
@@ -207,7 +207,7 @@ static int init_pgtable(struct kimage *image, unsigned long 
start_pgtable)
level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
 
-   if (sev_active()) {
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
info.page_flag   |= _PAGE_ENC;
info.kernpg_flag |= _PAGE_ENC;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 05e8d4d27969..a2f78a8cfdaa 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -455,7 +456,7 @@ static int has_svm(void)
return 0;
}
 
-   if (sev_active()) {
+   if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
pr_info("KVM is unsupported when running as an SEV guest\n");
   

[PATCH v1 5/8] virtio-mem: factor out hotplug specifics from virtio_mem_init() into virtio_mem_init_hotplug()

2021-09-28 Thread David Hildenbrand
Let's prepare for a new virtio-mem kdump mode in which we don't actually
hot(un)plug any memory but only observe the state of device blocks.

Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio_mem.c | 81 -
 1 file changed, 44 insertions(+), 37 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index bef8ad6bf466..2ba7e8d6ba8d 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2392,41 +2392,10 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
return 0;
 }
 
-static int virtio_mem_init(struct virtio_mem *vm)
+static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 {
const struct range pluggable_range = mhp_get_pluggable_range(true);
uint64_t sb_size, addr;
-   uint16_t node_id;
-
-   if (!vm->vdev->config->get) {
-   dev_err(>vdev->dev, "config access disabled\n");
-   return -EINVAL;
-   }
-
-   /*
-* We don't want to (un)plug or reuse any memory when in kdump. The
-* memory is still accessible (but not mapped).
-*/
-   if (is_kdump_kernel()) {
-   dev_warn(>vdev->dev, "disabled in kdump kernel\n");
-   return -EBUSY;
-   }
-
-   /* Fetch all properties that can't change. */
-   virtio_cread_le(vm->vdev, struct virtio_mem_config, plugged_size,
-   >plugged_size);
-   virtio_cread_le(vm->vdev, struct virtio_mem_config, block_size,
-   >device_block_size);
-   virtio_cread_le(vm->vdev, struct virtio_mem_config, node_id,
-   _id);
-   vm->nid = virtio_mem_translate_node_id(vm, node_id);
-   virtio_cread_le(vm->vdev, struct virtio_mem_config, addr, >addr);
-   virtio_cread_le(vm->vdev, struct virtio_mem_config, region_size,
-   >region_size);
-
-   /* Determine the nid for the device based on the lowest address. */
-   if (vm->nid == NUMA_NO_NODE)
-   vm->nid = memory_add_physaddr_to_nid(vm->addr);
 
/* bad device setup - warn only */
if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
@@ -2496,10 +2465,6 @@ static int virtio_mem_init(struct virtio_mem *vm)
  vm->offline_threshold);
}
 
-   dev_info(>vdev->dev, "start address: 0x%llx", vm->addr);
-   dev_info(>vdev->dev, "region size: 0x%llx", vm->region_size);
-   dev_info(>vdev->dev, "device block size: 0x%llx",
-(unsigned long long)vm->device_block_size);
dev_info(>vdev->dev, "memory block size: 0x%lx",
 memory_block_size_bytes());
if (vm->in_sbm)
@@ -2508,10 +2473,52 @@ static int virtio_mem_init(struct virtio_mem *vm)
else
dev_info(>vdev->dev, "big block size: 0x%llx",
 (unsigned long long)vm->bbm.bb_size);
+
+   return 0;
+}
+
+static int virtio_mem_init(struct virtio_mem *vm)
+{
+   uint16_t node_id;
+
+   if (!vm->vdev->config->get) {
+   dev_err(>vdev->dev, "config access disabled\n");
+   return -EINVAL;
+   }
+
+   /*
+* We don't want to (un)plug or reuse any memory when in kdump. The
+* memory is still accessible (but not mapped).
+*/
+   if (is_kdump_kernel()) {
+   dev_warn(>vdev->dev, "disabled in kdump kernel\n");
+   return -EBUSY;
+   }
+
+   /* Fetch all properties that can't change. */
+   virtio_cread_le(vm->vdev, struct virtio_mem_config, plugged_size,
+   >plugged_size);
+   virtio_cread_le(vm->vdev, struct virtio_mem_config, block_size,
+   >device_block_size);
+   virtio_cread_le(vm->vdev, struct virtio_mem_config, node_id,
+   _id);
+   vm->nid = virtio_mem_translate_node_id(vm, node_id);
+   virtio_cread_le(vm->vdev, struct virtio_mem_config, addr, >addr);
+   virtio_cread_le(vm->vdev, struct virtio_mem_config, region_size,
+   >region_size);
+
+   /* Determine the nid for the device based on the lowest address. */
+   if (vm->nid == NUMA_NO_NODE)
+   vm->nid = memory_add_physaddr_to_nid(vm->addr);
+
+   dev_info(>vdev->dev, "start address: 0x%llx", vm->addr);
+   dev_info(>vdev->dev, "region size: 0x%llx", vm->region_size);
+   dev_info(>vdev->dev, "device block size: 0x%llx",
+(unsigned long long)vm->device_block_size);
if (vm->nid != NUMA_NO_NODE && IS_ENABLED(CONFIG_NUMA))
dev_info(>vdev->dev, "nid: %d", vm->nid);
 
-   return 0;
+   return virtio_mem_init_hotplug(vm);
 }
 
 static int virtio_mem_create_resource(struct virtio_mem *vm)
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1 6/8] virtio-mem: factor out hotplug specifics from virtio_mem_probe() into virtio_mem_init_hotplug()

2021-09-28 Thread David Hildenbrand
Let's prepare for a new virtio-mem kdump mode in which we don't actually
hot(un)plug any memory but only observe the state of device blocks.

Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio_mem.c | 87 +++--
 1 file changed, 45 insertions(+), 42 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 2ba7e8d6ba8d..1be3ee7f684d 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -260,6 +260,8 @@ static void virtio_mem_fake_offline_going_offline(unsigned 
long pfn,
 static void virtio_mem_fake_offline_cancel_offline(unsigned long pfn,
   unsigned long nr_pages);
 static void virtio_mem_retry(struct virtio_mem *vm);
+static int virtio_mem_create_resource(struct virtio_mem *vm);
+static void virtio_mem_delete_resource(struct virtio_mem *vm);
 
 /*
  * Register a virtio-mem device so it will be considered for the online_page
@@ -2395,7 +2397,8 @@ static int virtio_mem_init_vq(struct virtio_mem *vm)
 static int virtio_mem_init_hotplug(struct virtio_mem *vm)
 {
const struct range pluggable_range = mhp_get_pluggable_range(true);
-   uint64_t sb_size, addr;
+   uint64_t unit_pages, sb_size, addr;
+   int rc;
 
/* bad device setup - warn only */
if (!IS_ALIGNED(vm->addr, memory_block_size_bytes()))
@@ -2474,7 +2477,48 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
dev_info(>vdev->dev, "big block size: 0x%llx",
 (unsigned long long)vm->bbm.bb_size);
 
+   /* create the parent resource for all memory */
+   rc = virtio_mem_create_resource(vm);
+   if (rc)
+   return rc;
+
+   /* use a single dynamic memory group to cover the whole memory device */
+   if (vm->in_sbm)
+   unit_pages = PHYS_PFN(memory_block_size_bytes());
+   else
+   unit_pages = PHYS_PFN(vm->bbm.bb_size);
+   rc = memory_group_register_dynamic(vm->nid, unit_pages);
+   if (rc < 0)
+   goto out_del_resource;
+   vm->mgid = rc;
+
+   /*
+* If we still have memory plugged, we have to unplug all memory first.
+* Registering our parent resource makes sure that this memory isn't
+* actually in use (e.g., trying to reload the driver).
+*/
+   if (vm->plugged_size) {
+   vm->unplug_all_required = true;
+   dev_info(>vdev->dev, "unplugging all memory is required\n");
+   }
+
+   /* register callbacks */
+   vm->memory_notifier.notifier_call = virtio_mem_memory_notifier_cb;
+   rc = register_memory_notifier(>memory_notifier);
+   if (rc)
+   goto out_unreg_group;
+   rc = register_virtio_mem_device(vm);
+   if (rc)
+   goto out_unreg_mem;
+
return 0;
+out_unreg_mem:
+   unregister_memory_notifier(>memory_notifier);
+out_unreg_group:
+   memory_group_unregister(vm->mgid);
+out_del_resource:
+   virtio_mem_delete_resource(vm);
+   return rc;
 }
 
 static int virtio_mem_init(struct virtio_mem *vm)
@@ -2578,7 +2622,6 @@ static bool virtio_mem_has_memory_added(struct virtio_mem 
*vm)
 static int virtio_mem_probe(struct virtio_device *vdev)
 {
struct virtio_mem *vm;
-   uint64_t unit_pages;
int rc;
 
BUILD_BUG_ON(sizeof(struct virtio_mem_req) != 24);
@@ -2608,40 +2651,6 @@ static int virtio_mem_probe(struct virtio_device *vdev)
if (rc)
goto out_del_vq;
 
-   /* create the parent resource for all memory */
-   rc = virtio_mem_create_resource(vm);
-   if (rc)
-   goto out_del_vq;
-
-   /* use a single dynamic memory group to cover the whole memory device */
-   if (vm->in_sbm)
-   unit_pages = PHYS_PFN(memory_block_size_bytes());
-   else
-   unit_pages = PHYS_PFN(vm->bbm.bb_size);
-   rc = memory_group_register_dynamic(vm->nid, unit_pages);
-   if (rc < 0)
-   goto out_del_resource;
-   vm->mgid = rc;
-
-   /*
-* If we still have memory plugged, we have to unplug all memory first.
-* Registering our parent resource makes sure that this memory isn't
-* actually in use (e.g., trying to reload the driver).
-*/
-   if (vm->plugged_size) {
-   vm->unplug_all_required = true;
-   dev_info(>vdev->dev, "unplugging all memory is required\n");
-   }
-
-   /* register callbacks */
-   vm->memory_notifier.notifier_call = virtio_mem_memory_notifier_cb;
-   rc = register_memory_notifier(>memory_notifier);
-   if (rc)
-   goto out_unreg_group;
-   rc = register_virtio_mem_device(vm);
-   if (rc)
-   goto out_unreg_mem;
-
virtio_device_ready(vdev);
 
/* trigger a config update to start processing the requested_size */
@@ -2649,12 +2658,6 @@ static int 

[PATCH v1 8/8] virtio-mem: kdump mode to sanitize /proc/vmcore access

2021-09-28 Thread David Hildenbrand
Although virtio-mem currently supports reading unplugged memory in the
hypervisor, this will change in the future, indicated to the device via
a new feature flag. We similarly sanitized /proc/kcore access recently. [1]

Let's register a vmcore callback, to allow vmcore code to check if a PFN
belonging to a virtio-mem device is either currently plugged and should
be dumped or is currently unplugged and should not be accessed, instead
mapping the shared zeropage or returning zeroes when reading.

This is important when not capturing /proc/vmcore via tools like
"makedumpfile" that can identify logically unplugged virtio-mem memory via
PG_offline in the memmap, but simply by e.g., copying the file.

Distributions that support virtio-mem+kdump have to make sure that the
virtio_mem module will be part of the kdump kernel or the kdump initrd;
dracut was recently [2] extended to include virtio-mem in the generated
initrd. As long as no special kdump kernels are used, this will
automatically make sure that virtio-mem will be around in the kdump initrd
and sanitize /proc/vmcore access -- with dracut.

With this series, we'll send one virtio-mem state request for every
~2 MiB chunk of virtio-mem memory indicated in the vmcore that we intend
to read/map.

In the future, we might want to allow building virtio-mem for kdump
mode only, even without CONFIG_MEMORY_HOTPLUG and friends: this way,
we could support special stripped-down kdump kernels that have many
other config options disabled; we'll tackle that once required. Further,
we might want to try sensing bigger blocks (e.g., memory sections)
first before falling back to device blocks on demand.

Tested with Fedora rawhide, which contains a recent kexec-tools version
(considering "System RAM (virtio_mem)" when creating the vmcore header) and
a recent dracut version (including the virtio_mem module in the kdump
initrd).

[1] https://lkml.kernel.org/r/20210526093041.8800-1-da...@redhat.com
[2] https://github.com/dracutdevs/dracut/pull/1157

Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio_mem.c | 136 
 1 file changed, 124 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 76d8aef3cfd2..ec0b2ab37acb 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -223,6 +223,9 @@ struct virtio_mem {
 * When this lock is held the pointers can't change, ONLINE and
 * OFFLINE blocks can't change the state and no subblocks will get
 * plugged/unplugged.
+*
+* In kdump mode, used to serialize requests, last_block_addr and
+* last_block_plugged.
 */
struct mutex hotplug_mutex;
bool hotplug_active;
@@ -230,6 +233,9 @@ struct virtio_mem {
/* An error occurred we cannot handle - stop processing requests. */
bool broken;
 
+   /* Cached valued of is_kdump_kernel() when the device was probed. */
+   bool in_kdump;
+
/* The driver is being removed. */
spinlock_t removal_lock;
bool removing;
@@ -243,6 +249,13 @@ struct virtio_mem {
/* Memory notifier (online/offline events). */
struct notifier_block memory_notifier;
 
+#ifdef CONFIG_PROC_VMCORE
+   /* vmcore callback for /proc/vmcore handling in kdump mode */
+   struct vmcore_cb vmcore_cb;
+   uint64_t last_block_addr;
+   bool last_block_plugged;
+#endif /* CONFIG_PROC_VMCORE */
+
/* Next device in the list of virtio-mem devices. */
struct list_head next;
 };
@@ -2293,6 +2306,12 @@ static void virtio_mem_run_wq(struct work_struct *work)
uint64_t diff;
int rc;
 
+   if (unlikely(vm->in_kdump)) {
+   dev_warn_once(>vdev->dev,
+"unexpected workqueue run in kdump kernel\n");
+   return;
+   }
+
hrtimer_cancel(>retry_timer);
 
if (vm->broken)
@@ -2521,6 +2540,86 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
return rc;
 }
 
+#ifdef CONFIG_PROC_VMCORE
+static int virtio_mem_send_state_request(struct virtio_mem *vm, uint64_t addr,
+uint64_t size)
+{
+   const uint64_t nb_vm_blocks = size / vm->device_block_size;
+   const struct virtio_mem_req req = {
+   .type = cpu_to_virtio16(vm->vdev, VIRTIO_MEM_REQ_STATE),
+   .u.state.addr = cpu_to_virtio64(vm->vdev, addr),
+   .u.state.nb_blocks = cpu_to_virtio16(vm->vdev, nb_vm_blocks),
+   };
+   int rc = -ENOMEM;
+
+   dev_dbg(>vdev->dev, "requesting state: 0x%llx - 0x%llx\n", addr,
+   addr + size - 1);
+
+   switch (virtio_mem_send_request(vm, )) {
+   case VIRTIO_MEM_RESP_ACK:
+   return virtio16_to_cpu(vm->vdev, vm->resp.u.state.state);
+   case VIRTIO_MEM_RESP_ERROR:
+   rc = -EINVAL;
+   break;
+   default:
+   break;
+   

[PATCH v1 7/8] virtio-mem: factor out hotplug specifics from virtio_mem_remove() into virtio_mem_deinit_hotplug()

2021-09-28 Thread David Hildenbrand
Let's prepare for a new virtio-mem kdump mode in which we don't actually
hot(un)plug any memory but only observe the state of device blocks.

Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio_mem.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 1be3ee7f684d..76d8aef3cfd2 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2667,9 +2667,8 @@ static int virtio_mem_probe(struct virtio_device *vdev)
return rc;
 }
 
-static void virtio_mem_remove(struct virtio_device *vdev)
+static void virtio_mem_deinit_hotplug(struct virtio_mem *vm)
 {
-   struct virtio_mem *vm = vdev->priv;
unsigned long mb_id;
int rc;
 
@@ -2716,7 +2715,8 @@ static void virtio_mem_remove(struct virtio_device *vdev)
 * away. Warn at least.
 */
if (virtio_mem_has_memory_added(vm)) {
-   dev_warn(>dev, "device still has system memory added\n");
+   dev_warn(>vdev->dev,
+"device still has system memory added\n");
} else {
virtio_mem_delete_resource(vm);
kfree_const(vm->resource_name);
@@ -2730,6 +2730,13 @@ static void virtio_mem_remove(struct virtio_device *vdev)
} else {
vfree(vm->bbm.bb_states);
}
+}
+
+static void virtio_mem_remove(struct virtio_device *vdev)
+{
+   struct virtio_mem *vm = vdev->priv;
+
+   virtio_mem_deinit_hotplug(vm);
 
/* reset the device and cleanup the queues */
vdev->config->reset(vdev);
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1 2/8] x86/xen: simplify xen_oldmem_pfn_is_ram()

2021-09-28 Thread David Hildenbrand
Let's simplify return handling.

Signed-off-by: David Hildenbrand 
---
 arch/x86/xen/mmu_hvm.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index b242d1f4b426..eb61622df75b 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -21,23 +21,16 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn)
.domid = DOMID_SELF,
.pfn = pfn,
};
-   int ram;
 
if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, ))
return -ENXIO;
 
switch (a.mem_type) {
case HVMMEM_mmio_dm:
-   ram = 0;
-   break;
-   case HVMMEM_ram_rw:
-   case HVMMEM_ram_ro:
+   return 0;
default:
-   ram = 1;
-   break;
+   return 1;
}
-
-   return ram;
 }
 #endif
 
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1 1/8] x86/xen: update xen_oldmem_pfn_is_ram() documentation

2021-09-28 Thread David Hildenbrand
The callback is only used for the vmcore nowadays.

Signed-off-by: David Hildenbrand 
---
 arch/x86/xen/mmu_hvm.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index 57409373750f..b242d1f4b426 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -9,12 +9,9 @@
 
 #ifdef CONFIG_PROC_VMCORE
 /*
- * This function is used in two contexts:
- * - the kdump kernel has to check whether a pfn of the crashed kernel
- *   was a ballooned page. vmcore is using this function to decide
- *   whether to access a pfn of the crashed kernel.
- * - the kexec kernel has to check whether a pfn was ballooned by the
- *   previous kernel. If the pfn is ballooned, handle it properly.
+ * The kdump kernel has to check whether a pfn of the crashed kernel
+ * was a ballooned page. vmcore is using this function to decide
+ * whether to access a pfn of the crashed kernel.
  * Returns 0 if the pfn is not backed by a RAM page, the caller may
  * handle the pfn special in this case.
  */
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1 4/8] proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks

2021-09-28 Thread David Hildenbrand
Let's support multiple registered callbacks, making sure that
registering vmcore callbacks cannot fail. Make the callback return a
bool instead of an int, handling how to deal with errors internally.
Drop unused HAVE_OLDMEM_PFN_IS_RAM.

We soon want to make use of this infrastructure from other drivers:
virtio-mem, registering one callback for each virtio-mem device, to
prevent reading unplugged virtio-mem memory.

Handle it via a generic vmcore_cb structure, prepared for future
extensions: for example, once we support virtio-mem on s390x where the
vmcore is completely constructed in the second kernel, we want to detect
and add plugged virtio-mem memory ranges to the vmcore in order for them
to get dumped properly.

Handle corner cases that are unexpected and shouldn't happen in sane
setups: registering a callback after the vmcore has already been opened
(warn only) and unregistering a callback after the vmcore has already been
opened (warn and essentially read only zeroes from that point on).

Signed-off-by: David Hildenbrand 
---
 arch/x86/kernel/aperture_64.c | 13 -
 arch/x86/xen/mmu_hvm.c| 15 +++---
 fs/proc/vmcore.c  | 99 ---
 include/linux/crash_dump.h| 26 +++--
 4 files changed, 113 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 10562885f5fc..af3ba08b684b 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -73,12 +73,23 @@ static int gart_mem_pfn_is_ram(unsigned long pfn)
  (pfn >= aperture_pfn_start + aperture_page_count));
 }
 
+#ifdef CONFIG_PROC_VMCORE
+static bool gart_oldmem_pfn_is_ram(struct vmcore_cb *cb, unsigned long pfn)
+{
+   return !!gart_mem_pfn_is_ram(pfn);
+}
+
+static struct vmcore_cb gart_vmcore_cb = {
+   .pfn_is_ram = gart_oldmem_pfn_is_ram,
+};
+#endif
+
 static void __init exclude_from_core(u64 aper_base, u32 aper_order)
 {
aperture_pfn_start = aper_base >> PAGE_SHIFT;
aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
 #ifdef CONFIG_PROC_VMCORE
-   WARN_ON(register_oldmem_pfn_is_ram(_mem_pfn_is_ram));
+   register_vmcore_cb(_vmcore_cb);
 #endif
 #ifdef CONFIG_PROC_KCORE
WARN_ON(register_mem_pfn_is_ram(_mem_pfn_is_ram));
diff --git a/arch/x86/xen/mmu_hvm.c b/arch/x86/xen/mmu_hvm.c
index eb61622df75b..49bd4a6a5858 100644
--- a/arch/x86/xen/mmu_hvm.c
+++ b/arch/x86/xen/mmu_hvm.c
@@ -12,10 +12,10 @@
  * The kdump kernel has to check whether a pfn of the crashed kernel
  * was a ballooned page. vmcore is using this function to decide
  * whether to access a pfn of the crashed kernel.
- * Returns 0 if the pfn is not backed by a RAM page, the caller may
+ * Returns "false" if the pfn is not backed by a RAM page, the caller may
  * handle the pfn special in this case.
  */
-static int xen_oldmem_pfn_is_ram(unsigned long pfn)
+static bool xen_vmcore_pfn_is_ram(struct vmcore_cb *cb, unsigned long pfn)
 {
struct xen_hvm_get_mem_type a = {
.domid = DOMID_SELF,
@@ -23,15 +23,18 @@ static int xen_oldmem_pfn_is_ram(unsigned long pfn)
};
 
if (HYPERVISOR_hvm_op(HVMOP_get_mem_type, ))
-   return -ENXIO;
+   return true;
 
switch (a.mem_type) {
case HVMMEM_mmio_dm:
-   return 0;
+   return false;
default:
-   return 1;
+   return true;
}
 }
+static struct vmcore_cb xen_vmcore_cb = {
+   .pfn_is_ram = xen_vmcore_pfn_is_ram,
+};
 #endif
 
 static void xen_hvm_exit_mmap(struct mm_struct *mm)
@@ -65,6 +68,6 @@ void __init xen_hvm_init_mmu_ops(void)
if (is_pagetable_dying_supported())
pv_ops.mmu.exit_mmap = xen_hvm_exit_mmap;
 #ifdef CONFIG_PROC_VMCORE
-   WARN_ON(register_oldmem_pfn_is_ram(_oldmem_pfn_is_ram));
+   register_vmcore_cb(_vmcore_cb);
 #endif
 }
diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index a9bd80ab670e..7a04b2eca287 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -62,46 +62,75 @@ core_param(novmcoredd, vmcoredd_disabled, bool, 0);
 /* Device Dump Size */
 static size_t vmcoredd_orig_sz;
 
-/*
- * Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
- * The called function has to take care of module refcounting.
- */
-static int (*oldmem_pfn_is_ram)(unsigned long pfn);
-
-int register_oldmem_pfn_is_ram(int (*fn)(unsigned long pfn))
+static DECLARE_RWSEM(vmcore_cb_rwsem);
+/* List of registered vmcore callbacks. */
+static LIST_HEAD(vmcore_cb_list);
+/* Whether we had a surprise unregistration of a callback. */
+static bool vmcore_cb_unstable;
+/* Whether the vmcore has been opened once. */
+static bool vmcore_opened;
+
+void register_vmcore_cb(struct vmcore_cb *cb)
 {
-   if (oldmem_pfn_is_ram)
-   return -EBUSY;
-   oldmem_pfn_is_ram = fn;
-   return 0;
+   down_write(_cb_rwsem);
+   INIT_LIST_HEAD(>next);
+   

[PATCH v1 0/8] proc/vmcore: sanitize access to virtio-mem memory

2021-09-28 Thread David Hildenbrand
As so often with virtio-mem changes that mess with common MM
infrastructure, this might be a good candiate to go via Andrew's tree.

--

After removing /dev/kmem, sanitizing /proc/kcore and handling /dev/mem,
this series tackles the last sane way how a VM could accidentially access
logically unplugged memory managed by a virtio-mem device: /proc/vmcore

When dumping memory via "makedumpfile", PG_offline pages, used by
virtio-mem to flag logically unplugged memory, are already properly
excluded; however, especially when accessing/copying /proc/vmcore "the
usual way", we can still end up reading logically unplugged memory part of
a virtio-mem device.

Patch #1-#3 are cleanups. Patch #4 extends the existing oldmem_pfn_is_ram
mechanism. Patch #5-#7 are virtio-mem refactorings for patch #8, which
implements the virtio-mem logic to query the state of device blocks.

Patch #8:

"
Although virtio-mem currently supports reading unplugged memory in the
hypervisor, this will change in the future, indicated to the device via
a new feature flag. We similarly sanitized /proc/kcore access recently.
[...]
Distributions that support virtio-mem+kdump have to make sure that the
virtio_mem module will be part of the kdump kernel or the kdump initrd;
dracut was recently [2] extended to include virtio-mem in the generated
initrd. As long as no special kdump kernels are used, this will
automatically make sure that virtio-mem will be around in the kdump initrd
and sanitize /proc/vmcore access -- with dracut.
"

This is the last remaining bit to support
VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE [3] in the Linux implementation of
virtio-mem.

Note: this is best-effort. We'll never be able to control what runs inside
the second kernel, really, but we also don't have to care: we only care
about sane setups where we don't want our VM getting zapped once we
touch the wrong memory location while dumping. While we usually expect sane
setups to use "makedumfile", nothing really speaks against just copying
/proc/vmcore, especially in environments where HWpoisioning isn't typically
expected. Also, we really don't want to put all our trust completely on the
memmap, so sanitizing also makes sense when just using "makedumpfile".

[1] https://lkml.kernel.org/r/20210526093041.8800-1-da...@redhat.com
[2] https://github.com/dracutdevs/dracut/pull/1157
[3] https://lists.oasis-open.org/archives/virtio-comment/202109/msg00021.html

Cc: Andrew Morton 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Dave Young 
Cc: Baoquan He 
Cc: Vivek Goyal 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Mike Rapoport 
Cc: "Rafael J. Wysocki" 
Cc: x...@kernel.org
Cc: xen-de...@lists.xenproject.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: kexec@lists.infradead.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux...@kvack.org

David Hildenbrand (8):
  x86/xen: update xen_oldmem_pfn_is_ram() documentation
  x86/xen: simplify xen_oldmem_pfn_is_ram()
  proc/vmcore: let pfn_is_ram() return a bool
  proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore
callbacks
  virtio-mem: factor out hotplug specifics from virtio_mem_init() into
virtio_mem_init_hotplug()
  virtio-mem: factor out hotplug specifics from virtio_mem_probe() into
virtio_mem_init_hotplug()
  virtio-mem: factor out hotplug specifics from virtio_mem_remove() into
virtio_mem_deinit_hotplug()
  virtio-mem: kdump mode to sanitize /proc/vmcore access

 arch/x86/kernel/aperture_64.c |  13 +-
 arch/x86/xen/mmu_hvm.c|  31 ++--
 drivers/virtio/virtio_mem.c   | 297 --
 fs/proc/vmcore.c  | 105 
 include/linux/crash_dump.h|  26 ++-
 5 files changed, 332 insertions(+), 140 deletions(-)


base-commit: 5816b3e6577eaa676ceb00a848f0fd65fe2adc29
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1 3/8] proc/vmcore: let pfn_is_ram() return a bool

2021-09-28 Thread David Hildenbrand
The callback should deal with errors internally, it doesn't make sense to
expose these via pfn_is_ram(). We'll rework the callbacks next. Right now
we consider errors as if "it's RAM"; no functional change.

Signed-off-by: David Hildenbrand 
---
 fs/proc/vmcore.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 9a15334da208..a9bd80ab670e 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -84,11 +84,11 @@ void unregister_oldmem_pfn_is_ram(void)
 }
 EXPORT_SYMBOL_GPL(unregister_oldmem_pfn_is_ram);
 
-static int pfn_is_ram(unsigned long pfn)
+static bool pfn_is_ram(unsigned long pfn)
 {
int (*fn)(unsigned long pfn);
/* pfn is ram unless fn() checks pagetype */
-   int ret = 1;
+   bool ret = true;
 
/*
 * Ask hypervisor if the pfn is really ram.
@@ -97,7 +97,7 @@ static int pfn_is_ram(unsigned long pfn)
 */
fn = oldmem_pfn_is_ram;
if (fn)
-   ret = fn(pfn);
+   ret = !!fn(pfn);
 
return ret;
 }
@@ -124,7 +124,7 @@ ssize_t read_from_oldmem(char *buf, size_t count,
nr_bytes = count;
 
/* If pfn is not ram, return zeros for sparse dump files */
-   if (pfn_is_ram(pfn) == 0)
+   if (!pfn_is_ram(pfn))
memset(buf, 0, nr_bytes);
else {
if (encrypted)
-- 
2.31.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v1 2/4] memblock: allow to specify flags with memblock_add_node()

2021-09-28 Thread Heiko Carstens
On Mon, Sep 27, 2021 at 05:05:16PM +0200, David Hildenbrand wrote:
> We want to specify flags when hotplugging memory. Let's prepare to pass
> flags to memblock_add_node() by adjusting all existing users.
> 
> Note that when hotplugging memory the system is already up and running
> and we don't want to add the memory first and apply flags later: it
> should happen within one memblock call.
> 
> Signed-off-by: David Hildenbrand 
> ---
...
>  arch/s390/kernel/setup.c | 3 ++-

For s390
Acked-by: Heiko Carstens 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec