Re: [PATCHv11 18/19] x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method

2024-06-10 Thread Kirill A. Shutemov
On Mon, Jun 10, 2024 at 03:40:20PM +0200, Borislav Petkov wrote:
> On Fri, Jun 07, 2024 at 06:14:28PM +0300, Kirill A. Shutemov wrote:
> >   I was able to address this issue by switching cpa_lock to a mutex.
> >   However, this solution will only work if the callers for set_memory
> >   interfaces are not called from an atomic context. I need to verify if
> >   this is the case.
> 
> Dunno, I'd be nervous about this. Although from looking at
> 
>    ad5ca55f6bdb ("x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa")
> 
> I don't see how "So that we don't allow any other cpu" can't be done
> with a mutex. Perhaps the set_memory* interfaces should be usable in as
> many contexts as possible.
> 
> Have you run this with lockdep enabled?

Yes, it booted to the shell just fine. However, that doesn't prove
anything. The set_memory_*() functions have many obscure corner cases.

> > - The function __flush_tlb_all() in kernel_(un)map_pages_in_pgd() must be
> >   called with preemption disabled. Once again, I am unsure why this has
> >   not caused issues in the EFI case.
> 
> It could be because EFI does all that setup on the BSP only before the
> others have arrived but I don't remember anymore... It is more than
> a decade ago when I did this...

Are you okay with this? Disabling preemption looks strange, but I don't
see a better option.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCHv11 18/19] x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method

2024-06-10 Thread Borislav Petkov
On Fri, Jun 07, 2024 at 06:14:28PM +0300, Kirill A. Shutemov wrote:
>   I was able to address this issue by switching cpa_lock to a mutex.
>   However, this solution will only work if the callers for set_memory
>   interfaces are not called from an atomic context. I need to verify if
>   this is the case.

Dunno, I'd be nervous about this. Although from looking at

   ad5ca55f6bdb ("x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa")

I don't see how "So that we don't allow any other cpu" can't be done
with a mutex. Perhaps the set_memory* interfaces should be usable in as
many contexts as possible.

Have you run this with lockdep enabled?

> - The function __flush_tlb_all() in kernel_(un)map_pages_in_pgd() must be
>   called with preemption disabled. Once again, I am unsure why this has
>   not caused issues in the EFI case.

It could be because EFI does all that setup on the BSP only before the
others have arrived but I don't remember anymore... It is more than
a decade ago when I did this...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 3/3] doc/hotplug: update man and --help

2024-06-10 Thread Sourabh Jain

Hello Hari,

On 10/06/24 14:49, Hari Bathini wrote:



On 22/05/24 6:43 pm, Sourabh Jain wrote:

Update the man page and --help option to make the description of the
--hotplug option easier to understand.

Cc: Aditya Gupta 
Cc: Baoquan He 
Cc: Coiby Xu 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Signed-off-by: Sourabh Jain 
---
  kexec/kexec.8 | 8 
  kexec/kexec.c | 3 ++-
  2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kexec/kexec.8 b/kexec/kexec.8
index 9e995fe..7dddae9 100644
--- a/kexec/kexec.8
+++ b/kexec/kexec.8
@@ -140,10 +140,10 @@ Open a help file for
  .BR kexec .
  .TP
  .B \-\-hotplug


Can we have the description changed like:


-Setup for kernel modification of the elfcorehdr. This option performs
-the steps needed to support kernel updates to the elfcorehdr in the
-presence of hot un/plug and/or on/offline events. This option only
-useful for KEXEC_LOAD syscall.
+Helps avoid kdump kernel reload on CPU/Memory hotplug or on/offline events.
+If this option is enabled, the kexec segments will be set up in a way that
+the kernel can safely update them on CPU/memory hotplug and/or on/offline
+events. This option is only useful for the KEXEC_LOAD syscall.


"Setup kexec segments such that kernel can safely update them on 
CPU/Memory hot add/remove events. If this option is enabled, kernel does
in-kernel update of kexec segments on CPU/Memory hot add/remove events.
Helps avoid the need to reload kdump kernel."



  .TP
  .B \-i\ (\-\-no-checks)
  Fast reboot, no memory integrity checks.
diff --git a/kexec/kexec.c b/kexec/kexec.c
index 034cea6..2b06438 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -1093,7 +1093,8 @@ void usage(void)
 "  back to the compatibility syscall when file based\n"
 "  syscall is not supported or the kernel did not\n"
 "  understand the image (default)\n"
-   " --hotplug    Setup for kernel modification of elfcorehdr.\n"
+   " --hotplug    Helps avoid kdump kernel reload on CPU/Memory hotplug\n"
+   "  or on/offline events.\n"


"Do in-kernel update of kexec segments on CPU/Memory hot add/remove 
events. This avoids the need to reload kdump kernel."


The suggested descriptions look good to me. I will update them in the 
next version.


Thanks,
Sourabh Jain




Re: [PATCH 2/3] powerpc/kexec_load: add hotplug support

2024-06-10 Thread Sourabh Jain

Hello Hari,

On 10/06/24 15:08, Hari Bathini wrote:



On 22/05/24 6:43 pm, Sourabh Jain wrote:

Kernel commits b741092d5976 ("powerpc/crash: add crash CPU hotplug
support") and 849599b702ef ("powerpc/crash: add crash memory hotplug
support") added crash CPU/Memory hotplug support on PowerPC. This patch
extends that support for the kexec_load syscall.

During CPU/Memory hotplug events on PowerPC, two kexec segments,
elfcorehdr, and FDT, get updated by the kernel. To ensure the kernel
can safely update these two kexec segments for the kdump image loaded
using the kexec_load system call, the following changes are made:

1. Extra size is allocated for both elfcorehdr and FDT to accommodate
    additional resources in the future. For the elfcorehdr, the size hint
    is taken from /sys/kernel/crash_elfcorehdr_size sysfs, while for FDT,
    extra size is allocated to hold possible CPU nodes.

2. Both elfcorehdr and FDT are skipped from SHA calculation.
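
As a rough illustration of step 1, the sizing rule for the elfcorehdr can be sketched in plain C. This is only a sketch under assumptions: `elfcorehdr_memsz()` and `ALIGN_UP()` are hypothetical names that do not appear in the patch, and `hint` stands in for the value read from /sys/kernel/crash_elfcorehdr_size.

```c
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/*
 * Hypothetical helper (not part of kexec-tools): pick the memory size to
 * reserve for the elfcorehdr segment.
 *   sz         - size the header needs right now
 *   hint       - assumed to be the value of /sys/kernel/crash_elfcorehdr_size
 *   align      - required alignment (power of two)
 *   do_hotplug - non-zero when --hotplug was requested
 */
static size_t elfcorehdr_memsz(size_t sz, size_t hint, size_t align,
                               int do_hotplug)
{
    size_t memsz = sz;

    /* Only ever grow the reservation; never go below what is needed now. */
    if (do_hotplug && hint > sz)
        memsz = ALIGN_UP(hint, align);

    return memsz;
}
```

With a 4096-byte header, a 10000-byte hint and 1024-byte alignment, the reservation grows to 10240 bytes; without --hotplug it stays at 4096.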

Cc: Aditya Gupta 
Cc: Baoquan He 
Cc: Coiby Xu 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Signed-off-by: Sourabh Jain 
---
  kexec/arch/ppc64/crashdump-ppc64.c  |  16 ++-
  kexec/arch/ppc64/fdt.c  | 200 +++-
  kexec/arch/ppc64/include/arch/fdt.h |   2 +-
  kexec/arch/ppc64/kexec-elf-ppc64.c  |   2 +-
  kexec/arch/ppc64/kexec-ppc64.c  |  12 +-
  5 files changed, 225 insertions(+), 7 deletions(-)

diff --git a/kexec/arch/ppc64/crashdump-ppc64.c b/kexec/arch/ppc64/crashdump-ppc64.c
index 6d47898..c14b593 100644
--- a/kexec/arch/ppc64/crashdump-ppc64.c
+++ b/kexec/arch/ppc64/crashdump-ppc64.c
@@ -476,7 +476,7 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
  uint64_t max_addr, unsigned long min_base)
  {
  void *tmp;
-    unsigned long sz;
+    unsigned long sz, memsz;
  uint64_t elfcorehdr;
  int nr_ranges, align = 1024, i;
  unsigned long long end;
@@ -531,8 +531,18 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
  }
  }
-    elfcorehdr = add_buffer(info, tmp, sz, sz, align, min_base,
-    max_addr, 1);
+    memsz = sz;
+    /* To support --hotplug, replace the calculated minimum size with the
+     * value from /sys/kernel/crash_elfcorehdr_size and align it correctly.
+     */
+    if (do_hotplug) {
+        if (elfcorehdrsz > sz)
+            memsz = _ALIGN(elfcorehdrsz, align);
+    }
+
+    /* Record the location of the elfcorehdr for hotplug handling */
+    info->elfcorehdr = elfcorehdr = add_buffer(info, tmp, sz, memsz, align,
+                                               min_base, max_addr, 1);
  reserve(elfcorehdr, sz);
  /* modify and store the cmdline in a global array. This is later
   * read by flatten_device_tree and modified if required
diff --git a/kexec/arch/ppc64/fdt.c b/kexec/arch/ppc64/fdt.c
index 8bc6d2d..10abc29 100644
--- a/kexec/arch/ppc64/fdt.c
+++ b/kexec/arch/ppc64/fdt.c
@@ -17,6 +17,13 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../../kexec.h"
+#include "../../kexec-syscall.h"
    /*
   * Let the kernel know it booted from kexec, as some things (e.g.
@@ -46,17 +53,208 @@ static int fixup_kexec_prop(void *fdt)
  return 0;
  }
  +static inline bool is_dot_dir(char * d_path)
+{
+    return d_path[0] == '.';
+}
+
+/*
+ * Returns size of files including file name size under the given
+ * @cpu_node_path.
+ */
+static unsigned int get_cpu_node_size(char *cpu_node_path)
+{
+    DIR *d;
+    struct dirent *de;
+    struct stat statbuf;
+    unsigned int cpu_node_size = 0;
+    char cpu_prop_path[2 * PATH_MAX];
+
+    d = opendir(cpu_node_path);
+    if (!d)
+    return 0;
+
+    while ((de = readdir(d)) != NULL) {
+    if (de->d_type != DT_REG)
+    continue;
+
+    memset(cpu_prop_path, '\0', PATH_MAX);
+    snprintf(cpu_prop_path, 2 * PATH_MAX, "%s/%s", 
cpu_node_path, de->d_name);

+
+    if (stat(cpu_prop_path, &statbuf))
+    continue;
+
+    cpu_node_size += statbuf.st_size;
+    cpu_node_size += strlen(de->d_name);
+    }
+
+    return cpu_node_size;
+}
+
+/*
+ * Checks if the node specified by the given @path represents a CPU 
node.

+ *
+ * Returns true if the @path has a "device_type" file containing "cpu";
+ * otherwise, returns false.
+ */
+static bool is_cpu_node(char *path)
+{
+    FILE *file;
+    bool ret = false;
+    char device_type[4];
+
+    file = fopen(path, "r");
+    if (!file)
+    return false;
+
+    memset(device_type, '\0', 4);
+    if (fread(device_type, 1, 3, file) < 3)
+    goto out;
+
+    if (strcmp(device_type, "cpu"))
+    goto out;
+
+    ret = true;
+
+out:
+    fclose(file);
+    return ret;
+}
+
+static unsigned int get_threads_per_cpu(char *path)
+{
+    struct stat statbuf;
+    if (stat(path, &statbuf))
+    return 0;
+
+    return statbuf.st_size / 4;
+}
+
+/*
+ * Finds the following CPU attributes:
+ *
+ * cpus_in_system: Current

Re: [PATCH 1/3] kexec_load: Use new kexec flag for hotplug support

2024-06-10 Thread Sourabh Jain

Hello Hari,

On 10/06/24 14:22, Hari Bathini wrote:



On 22/05/24 6:43 pm, Sourabh Jain wrote:

Kernel commit 79365026f869 ("crash: add a new kexec flag for hotplug
support") introduced a new kexec flag to generalize hotplug support.
The new flag allows architectures to exclude all the required kexec
segments from SHA calculation so that the kernel can update them on
hotplug events. This was not possible earlier with the
KEXEC_UPDATE_ELFCOREHDR kexec flag, since it was added only for the
elfcorehdr segment.

To enable architectures to control the list of kexec segments to exclude
when hotplug support is enabled, add a new architecture-specific
function named arch_do_exclude_segment. During the SHA calculation, this
function gets called to let the architecture decide whether a specific
kexec segment should be considered for SHA calculation or not.
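
The effect of such a hook on the digest loop can be sketched as follows. This is a simplified illustration, not the kexec-tools code: `struct segment`, `struct load_info`, `exclude_segment()` and `count_digest_segments()` are hypothetical stand-ins, and gating the hook on a `do_hotplug` flag is an assumption about how the caller uses it.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kexec-tools structures. */
struct segment {
    unsigned long mem;   /* load address of the segment */
    size_t memsz;        /* size reserved in memory */
};

struct load_info {
    unsigned long elfcorehdr;  /* address recorded when the segment was added */
    int do_hotplug;            /* non-zero when --hotplug was requested */
};

/* Architecture hook: return 1 to keep a segment out of the digest. */
static int exclude_segment(const struct segment *seg,
                           const struct load_info *info)
{
    return info->elfcorehdr == seg->mem;
}

/* Count the segments that would still be fed to the hash. */
static size_t count_digest_segments(const struct segment *segs, size_t n,
                                    const struct load_info *info)
{
    size_t i, hashed = 0;

    for (i = 0; i < n; i++) {
        /* Segments the kernel may rewrite later must not be hashed. */
        if (info->do_hotplug && exclude_segment(&segs[i], info))
            continue;
        hashed++;
    }
    return hashed;
}
```

An architecture that updates more than the elfcorehdr (for example the FDT on PowerPC) would compare against additional recorded addresses inside its own hook.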

Given that the KEXEC_UPDATE_ELFCOREHDR is no longer required and was
colliding with the KEXEC_LIVE_UPDATE update flag, it is removed.

Cc: Aditya Gupta 
Cc: Baoquan He 
Cc: Coiby Xu 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Signed-off-by: Sourabh Jain 
---
  kexec/arch/arm/kexec-arm.c |  5 +
  kexec/arch/arm64/kexec-arm64.c |  4 
  kexec/arch/cris/kexec-cris.c   |  4 
  kexec/arch/hppa/kexec-hppa.c   |  5 +
  kexec/arch/i386/kexec-x86.c    |  8 
  kexec/arch/ia64/kexec-ia64.c   |  4 
  kexec/arch/loongarch/kexec-loongarch.c |  5 +
  kexec/arch/m68k/kexec-m68k.c   |  5 +
  kexec/arch/mips/kexec-mips.c   |  4 
  kexec/arch/ppc/kexec-ppc.c |  4 
  kexec/arch/ppc64/kexec-ppc64.c |  5 +
  kexec/arch/s390/kexec-s390.c   |  5 +
  kexec/arch/sh/kexec-sh.c   |  5 +
  kexec/arch/x86_64/kexec-x86_64.c   |  5 +
  kexec/kexec-syscall.h  |  2 +-
  kexec/kexec.c  | 14 --
  kexec/kexec.h  |  2 ++
  17 files changed, 79 insertions(+), 7 deletions(-)

diff --git a/kexec/arch/arm/kexec-arm.c b/kexec/arch/arm/kexec-arm.c
index 49f35b1..34531f9 100644
--- a/kexec/arch/arm/kexec-arm.c
+++ b/kexec/arch/arm/kexec-arm.c
@@ -148,3 +148,8 @@ int have_sysfs_fdt(void)
  {
  return !access(SYSFS_FDT, F_OK);
  }
+
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+    return 0;
+}
diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index 4a67b0d..9d052b0 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -1363,3 +1363,7 @@ void arch_reuse_initrd(void)
  void arch_update_purgatory(struct kexec_info *UNUSED(info))
  {
  }
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+    return 0;
+}
diff --git a/kexec/arch/cris/kexec-cris.c b/kexec/arch/cris/kexec-cris.c
index 3b69709..7f09121 100644
--- a/kexec/arch/cris/kexec-cris.c
+++ b/kexec/arch/cris/kexec-cris.c
@@ -109,3 +109,7 @@ unsigned long add_buffer(struct kexec_info *info, const void *buf,
  buf_min, buf_max, buf_end, 1);
  }
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+    return 0;
+}
diff --git a/kexec/arch/hppa/kexec-hppa.c b/kexec/arch/hppa/kexec-hppa.c
index 77c9739..a64dc3d 100644
--- a/kexec/arch/hppa/kexec-hppa.c
+++ b/kexec/arch/hppa/kexec-hppa.c
@@ -146,3 +146,8 @@ unsigned long virt_to_phys(unsigned long addr)
  {
  return addr - phys_offset;
  }
+
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+    return 0;
+}
diff --git a/kexec/arch/i386/kexec-x86.c b/kexec/arch/i386/kexec-x86.c
index 444cb69..b4947a0 100644
--- a/kexec/arch/i386/kexec-x86.c
+++ b/kexec/arch/i386/kexec-x86.c
@@ -208,3 +208,11 @@ void arch_update_purgatory(struct kexec_info *info)
  elf_rel_set_symbol(&info->rhdr, "panic_kernel",
  &panic_kernel, sizeof(panic_kernel));
  }
+
+int arch_do_exclude_segment(struct kexec_segment *seg_ptr, struct kexec_info *info)
+{
+    if (info->elfcorehdr == (unsigned long) seg_ptr->mem)
+    return 1;
+
+    return 0;
+}
diff --git a/kexec/arch/ia64/kexec-ia64.c b/kexec/arch/ia64/kexec-ia64.c
index 418d997..8d9c1f3 100644
--- a/kexec/arch/ia64/kexec-ia64.c
+++ b/kexec/arch/ia64/kexec-ia64.c
@@ -245,3 +245,7 @@ void arch_update_purgatory(struct kexec_info *UNUSED(info))
  {
  }
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+    return 0;
+}
diff --git a/kexec/arch/loongarch/kexec-loongarch.c b/kexec/arch/loongarch/kexec-loongarch.c
index 32a42d2..9a50ff6 100644
--- a/kexec/arch/loongarch/kexec-loongarch.c
+++ b/kexec/arch/loongarch/kexec-loongarch.c
@@ -378,3 +378,8 @@ unsigned long add_buffer(struct kexec_info *info, const void *buf,

Re: [PATCH v6 00/10] x86/sev: KEXEC/KDUMP support for SEV-ES guests

2024-06-10 Thread Greg KH
On Mon, Jun 10, 2024 at 12:21:03PM +0200, vsnt...@gmail.com wrote:
> From: Vasant Karasulli 
> 
> Hi,
> 
> here are changes to enable kexec/kdump in SEV-ES guests. The biggest
> problem for supporting kexec/kdump under SEV-ES is to find a way to
> hand the non-boot CPUs (APs) from one kernel to another.
> 
> Without SEV-ES the first kernel parks the CPUs in a HLT loop until
> they get reset by the kexec'ed kernel via an INIT-SIPI-SIPI sequence.
> For virtual machines the CPU reset is emulated by the hypervisor,
> which sets the vCPU registers back to reset state.
> 
> This does not work under SEV-ES, because the hypervisor has no access
> to the vCPU registers and can't make modifications to them. So an
> SEV-ES guest needs to reset the vCPU itself and park it using the
> AP-reset-hold protocol. Upon wakeup the guest needs to jump to
> real-mode and to the reset-vector configured in the AP-Jump-Table.
> 
> The code to do this is the main part of this patch-set. It works by
> placing code on the AP Jump-Table page itself to park the vCPU and for
> jumping to the reset vector upon wakeup. The code on the AP Jump Table
> runs in 16-bit protected mode with segment base set to the beginning
> of the page. The AP Jump-Table is usually not within the first 1MB of
> memory, so the code can't run in real-mode.
> 
> The AP Jump-Table is the best place to put the parking code, because
> the memory is owned, but read-only by the firmware and writeable by
> the OS. Only the first 4 bytes are used for the reset-vector, leaving
> the rest of the page for code/data/stack to park a vCPU. The code
> can't be in kernel memory because by the time the vCPU wakes up the
> memory will be owned by the new kernel, which might have overwritten it
> already.
> 
> The other patches add initial GHCB Version 2 protocol support, because
> kexec/kdump need the MSR-based (without a GHCB) AP-reset-hold VMGEXIT,
> which is a GHCB protocol version 2 feature.
> 
> The kexec'ed kernel is also entered via the decompressor and needs
> MMIO support there, so this patch-set also adds MMIO #VC support to
> the decompressor and support for handling CLFLUSH instructions.
> 
> Finally there is also code to disable kexec/kdump support at runtime
> when the environment does not support it (e.g. no GHCB protocol
> version 2 support or AP Jump Table over 4GB).
> 
> The diffstat looks big, but most of it is moving code for MMIO #VC
> support around to make it available to the decompressor.
> 
> The previous version of this patch-set can be found here:
> 
>   https://lore.kernel.org/kvm/20240408074049.7049-1-vsnt...@gmail.com/
> 
> Please review.
> 
> Thanks,
>Vasant
> 
> Changes v5->v6:
>   - Rebased to v6.10-rc3 kernel
>
> Changes v4->v5:
>   - Rebased to v6.9-rc2 kernel
>   - Applied review comments by Tom Lendacky
>   - Exclude the AP jump table related code for SEV-SNP guests
> 
> Changes v3->v4:
>   - Rebased to v6.8 kernel
>   - Applied review comments by Sean Christopherson
>   - Combined sev_es_setup_ap_jump_table() and sev_setup_ap_jump_table()
>     into a single function which makes caching jump table address
>     unnecessary
>   - Annotated struct sev_ap_jump_table_header with __packed attribute
>   - Added code to set up real mode data segment at boot time instead of
>     hardcoding the value.
> 
> Joerg Roedel (9):
>   x86/kexec/64: Disable kexec when SEV-ES is active
>   x86/sev: Save and print negotiated GHCB protocol version
>   x86/sev: Set GHCB data structure version
>   x86/sev: Setup code to park APs in the AP Jump Table
>   x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
>   x86/sev: Use AP Jump Table blob to stop CPU
>   x86/sev: Add MMIO handling support to boot/compressed/ code
>   x86/sev: Handle CLFLUSH MMIO events
>   x86/kexec/64: Support kexec under SEV-ES with AP Jump Table Blob
> 
> Vasant Karasulli (1):
>   x86/sev: Exclude AP jump table related code for SEV-SNP guests
> 
>  arch/x86/boot/compressed/sev.c  |  45 +-
>  arch/x86/include/asm/insn-eval.h|   1 +
>  arch/x86/include/asm/realmode.h |   5 +
>  arch/x86/include/asm/sev-ap-jumptable.h |  30 +
>  arch/x86/include/asm/sev.h  |   7 +
>  arch/x86/kernel/machine_kexec_64.c  |  12 +
>  arch/x86/kernel/process.c   |   8 +
>  arch/x86/kernel/sev-shared.c| 234 +-
>  arch/x86/kernel/sev.c   | 376 +-
>  arch/x86/lib/insn-eval-shared.c | 921 
>  arch/x86/lib/insn-eval.c| 911 +--
>  arch/x86/realmode/Makefile  |   9 +-
>  arch/x86/realmode/init.c|   5 +-
>  arch/x86/realmode/rm/Makefile   |  11 +-
>  arch/x86/realmode/rm/header.S   |   3 +
>  arch/x86/realmode/rm/sev.S  |  85 +++
>  arch/x86/realmode/rmpiggy.S |   6 +
>  arch/x86/realmode/sev/Makefile  |  

[PATCH v6 07/10] x86/sev: Add MMIO handling support to boot/compressed/ code

2024-06-10 Thread vsntk18
From: Joerg Roedel 

Move the code for MMIO handling in the #VC handler to sev-shared.c so
that it can be used in the decompressor code. The decompressor needs
to handle MMIO events for writing to the VGA framebuffer.

When the kernel is booted via UEFI the VGA console is not enabled that
early, but a kexec boot will enable it and the decompressor needs MMIO
support to write to the frame buffer.

This also requires sharing some code from lib/insn-eval.c. Since
insn-eval.c can't be included into the decompressor code directly,
move the relevant parts into lib/insn-eval-shared.c and include that
file.

Signed-off-by: Joerg Roedel 
Signed-off-by: Vasant Karasulli 
---
 arch/x86/boot/compressed/sev.c  |  45 +-
 arch/x86/kernel/sev-shared.c| 196 +++
 arch/x86/kernel/sev.c   | 195 ---
 arch/x86/lib/insn-eval-shared.c | 914 
 arch/x86/lib/insn-eval.c| 911 +--
 5 files changed, 1140 insertions(+), 1121 deletions(-)
 create mode 100644 arch/x86/lib/insn-eval-shared.c

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 0457a9d7e515..be930fb9f7b8 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -29,25 +29,6 @@
 static struct ghcb boot_ghcb_page __aligned(PAGE_SIZE);
 struct ghcb *boot_ghcb;
 
-/*
- * Copy a version of this function here - insn-eval.c can't be used in
- * pre-decompression code.
- */
-static bool insn_has_rep_prefix(struct insn *insn)
-{
-   insn_byte_t p;
-   int i;
-
-   insn_get_prefixes(insn);
-
-   for_each_insn_prefix(insn, i, p) {
-   if (p == 0xf2 || p == 0xf3)
-   return true;
-   }
-
-   return false;
-}
-
 /*
  * Only a dummy for insn_get_seg_base() - Early boot-code is 64bit only and
  * doesn't use segments.
@@ -57,6 +38,16 @@ static unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx)
return 0UL;
 }
 
+static int get_seg_base_limit(struct insn *insn, struct pt_regs *regs,
+ int regoff, unsigned long *base,
+ unsigned long *limit)
+{
+   if (base)
+   *base = 0ULL;
+   if (limit)
+   *limit = ~0ULL;
+}
+
 static inline u64 sev_es_rd_ghcb_msr(void)
 {
struct msr m;
@@ -104,6 +95,14 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
return ES_OK;
 }
 
+static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+  unsigned long vaddr, phys_addr_t *paddr)
+{
+   *paddr = (phys_addr_t)vaddr;
+
+   return ES_OK;
+}
+
 static enum es_result vc_ioio_check(struct es_em_ctxt *ctxt, u16 port, size_t size)
 {
return ES_OK;
@@ -122,9 +121,14 @@ static bool fault_in_kernel_space(unsigned long address)
 
 #define __BOOT_COMPRESSED
 
+#undef WARN_ONCE
+#define WARN_ONCE(condition, format...)
+
 /* Basic instruction decoding support needed */
+#include 
 #include "../../lib/inat.c"
 #include "../../lib/insn.c"
+#include "../../lib/insn-eval-shared.c"
 
 /* Include code for early handlers */
 #include "../../kernel/sev-shared.c"
@@ -323,6 +327,9 @@ void do_boot_stage2_vc(struct pt_regs *regs, unsigned long exit_code)
case SVM_EXIT_CPUID:
result = vc_handle_cpuid(boot_ghcb, &ctxt);
break;
+   case SVM_EXIT_NPF:
+   result = vc_handle_mmio(boot_ghcb, &ctxt);
+   break;
default:
result = ES_UNSUPPORTED;
break;
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index f63262a9c2a5..1b25a6cacec7 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -1043,6 +1043,202 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
return ES_OK;
 }
 
+static long *vc_insn_get_rm(struct es_em_ctxt *ctxt)
+{
+   long *reg_array;
+   int offset;
+
+   reg_array = (long *)ctxt->regs;
+   offset= insn_get_modrm_rm_off(&ctxt->insn, ctxt->regs);
+
+   if (offset < 0)
+   return NULL;
+
+   offset /= sizeof(long);
+
+   return reg_array + offset;
+}
+
+static enum es_result vc_do_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
+unsigned int bytes, bool read)
+{
+   u64 exit_code, exit_info_1, exit_info_2;
+   unsigned long ghcb_pa = __pa(ghcb);
+   enum es_result res;
+   phys_addr_t paddr;
+   void __user *ref;
+
+   ref = insn_get_addr_ref(&ctxt->insn, ctxt->regs);
+   if (ref == (void __user *)-1L)
+   return ES_UNSUPPORTED;
+
+   exit_code = read ? SVM_VMGEXIT_MMIO_READ : SVM_VMGEXIT_MMIO_WRITE;
+
+   res = vc_slow_virt_to_phys(ghcb, ctxt, (unsigned long)ref, &paddr);
+   if (res != ES_OK) {
+   if (res == ES_EXCEPTION && !read)
+   ctxt->fi.error_code |= X86_PF_WRITE;
+

[PATCH v6 10/10] x86/sev: Exclude AP jump table related code for SEV-SNP guests

2024-06-10 Thread vsntk18
From: Vasant Karasulli 

Unlike SEV-ES, SEV-SNP does not use the AP jump table technique
when transitioning from one layer of code to another
(e.g. when going from UEFI to the OS).

Signed-off-by: Vasant Karasulli 
---
 arch/x86/kernel/sev.c| 6 +-
 arch/x86/realmode/init.c | 5 +++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e64320507da2..a9cf74512269 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1392,7 +1392,8 @@ STACK_FRAME_NON_STANDARD(sev_jumptable_ap_park);
 void sev_es_stop_this_cpu(void)
 {
if (!(cc_vendor == CC_VENDOR_AMD) ||
-   !cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
+   !cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT) ||
+cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
return;
 
/* Only park in the AP jump table when the code has been installed */
@@ -1468,6 +1469,9 @@ bool sev_kexec_supported(void)
if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
return true;
 
+   if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+   return false;
+
/*
 * KEXEC with SEV-ES and more than one CPU is only supported
 * when the AP jump table is installed.
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index f9bc444a3064..ed798939be5d 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -80,8 +80,9 @@ static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
 */
th->start = (u64) secondary_startup_64_no_verify;
 
-   if (sev_es_setup_ap_jump_table(real_mode_header))
-   panic("Failed to get/update SEV-ES AP Jump Table");
+   if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+   if (sev_es_setup_ap_jump_table(real_mode_header))
+   panic("Failed to get/update SEV-ES AP Jump Table");
}
 #endif
 }
-- 
2.34.1




[PATCH v6 09/10] x86/kexec/64: Support kexec under SEV-ES with AP Jump Table Blob

2024-06-10 Thread vsntk18
From: Joerg Roedel 

When the AP jump table blob is installed the kernel can hand over the
APs from the old to the new kernel. Enable kexec when the AP jump
table blob has been installed.
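
The decision this patch introduces can be modelled outside the kernel as a small truth table. The sketch below is an illustration only: `struct guest_state` and `kexec_supported()` are hypothetical names, and the three fields flatten what the kernel obtains from cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT), num_possible_cpus() and sev_ap_jumptable_blob_installed.

```c
#include <stdbool.h>

/* Hypothetical flattened view of the inputs the kernel check consults. */
struct guest_state {
    bool state_encrypted;           /* guest register state is encrypted (SEV-ES) */
    unsigned int possible_cpus;     /* number of possible CPUs */
    bool jumptable_blob_installed;  /* AP jump table blob is in place */
};

/* Mirrors the decision order described in the patch. */
static bool kexec_supported(const struct guest_state *g)
{
    /* Without encrypted register state the usual AP reset path works. */
    if (!g->state_encrypted)
        return true;

    /* With more than one CPU, the APs can only be handed over via the blob. */
    if (g->possible_cpus > 1)
        return g->jumptable_blob_installed;

    /* A single-CPU guest has no APs to hand over. */
    return true;
}
```

Only the combination of encrypted state, several possible CPUs and no installed blob yields false.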

Signed-off-by: Joerg Roedel 
Signed-off-by: Vasant Karasulli 
---
 arch/x86/include/asm/sev.h |  2 ++
 arch/x86/kernel/machine_kexec_64.c |  3 ++-
 arch/x86/kernel/sev.c  | 15 +++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 6f681ced6594..e557eadb0ec9 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -233,6 +233,7 @@ u64 snp_get_unsupported_features(u64 status);
 u64 sev_get_status(void);
 void sev_show_status(void);
 void sev_es_stop_this_cpu(void);
+bool sev_kexec_supported(void);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -263,6 +264,7 @@ static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
 static inline u64 sev_get_status(void) { return 0; }
 static inline void sev_show_status(void) { }
 static inline void sev_es_stop_this_cpu(void) { }
+static inline bool sev_kexec_supported(void) { return true; }
 #endif
 
 #ifdef CONFIG_KVM_AMD_SEV
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 1dfb47df5c01..43f5f7e48cbc 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_ACPI
 /*
@@ -269,7 +270,7 @@ static void load_segments(void)
 
 static bool machine_kexec_supported(void)
 {
-   if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
+   if (!sev_kexec_supported())
return false;
 
return true;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 30ede17b5a04..e64320507da2 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1463,6 +1463,21 @@ static void __init sev_es_setup_play_dead(void)
 static inline void sev_es_setup_play_dead(void) { }
 #endif
 
+bool sev_kexec_supported(void)
+{
+   if (!cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
+   return true;
+
+   /*
+* KEXEC with SEV-ES and more than one CPU is only supported
+* when the AP jump table is installed.
+*/
+   if (num_possible_cpus() > 1)
+   return sev_ap_jumptable_blob_installed;
+   else
+   return true;
+}
+
 static void __init alloc_runtime_data(int cpu)
 {
struct sev_es_runtime_data *data;
-- 
2.34.1




[PATCH v6 08/10] x86/sev: Handle CLFLUSH MMIO events

2024-06-10 Thread vsntk18
From: Joerg Roedel 

Handle CLFLUSH instruction to MMIO memory in the #VC handler. The
instruction is ignored by the handler, as the Hypervisor is
responsible for cache management of emulated MMIO memory.
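
A standalone sketch of the classification step may help here. It is not the kernel decoder: `enum mmio_type` and `classify_0f_opcode()` are hypothetical, and the extra ModRM check (memory operand with reg field 7, which is how 0F AE /7 encodes CLFLUSH) is an assumption that goes beyond the patch, which matches on the 0xae opcode byte alone.

```c
#include <stdint.h>

/* Hypothetical, reduced set of MMIO classifications. */
enum mmio_type {
    MMIO_DECODE_FAILED,
    MMIO_READ_ZERO_EXTEND,
    MMIO_READ_SIGN_EXTEND,
    MMIO_IGNORE,
};

/*
 * Classify a two-byte (0x0f-prefixed) opcode.
 *   op2   - second opcode byte
 *   modrm - ModRM byte following the opcode
 */
static enum mmio_type classify_0f_opcode(uint8_t op2, uint8_t modrm)
{
    uint8_t mod = modrm >> 6;         /* 3 means register operand */
    uint8_t reg = (modrm >> 3) & 7;   /* opcode extension for group 15 */

    switch (op2) {
    case 0xb6: /* MOVZX r, m8 */
    case 0xb7: /* MOVZX r, m16 */
        return MMIO_READ_ZERO_EXTEND;
    case 0xbe: /* MOVSX r, m8 */
    case 0xbf: /* MOVSX r, m16 */
        return MMIO_READ_SIGN_EXTEND;
    case 0xae:
        /* 0F AE /7 with a memory operand is CLFLUSH: nothing to emulate. */
        if (mod != 3 && reg == 7)
            return MMIO_IGNORE;
        return MMIO_DECODE_FAILED;
    default:
        return MMIO_DECODE_FAILED;
    }
}
```

A caller would treat MMIO_IGNORE as success and simply advance past the instruction, which is what returning ES_OK achieves in the handler.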

Signed-off-by: Joerg Roedel 
Signed-off-by: Vasant Karasulli 
---
 arch/x86/include/asm/insn-eval.h | 1 +
 arch/x86/kernel/sev-shared.c | 3 +++
 arch/x86/lib/insn-eval-shared.c  | 7 +++
 3 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/insn-eval.h b/arch/x86/include/asm/insn-eval.h
index 54368a43abf6..3bcea641913a 100644
--- a/arch/x86/include/asm/insn-eval.h
+++ b/arch/x86/include/asm/insn-eval.h
@@ -40,6 +40,7 @@ enum insn_mmio_type {
INSN_MMIO_READ_ZERO_EXTEND,
INSN_MMIO_READ_SIGN_EXTEND,
INSN_MMIO_MOVS,
+   INSN_MMIO_IGNORE,
 };
 
 enum insn_mmio_type insn_decode_mmio(struct insn *insn, int *bytes);
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index 1b25a6cacec7..2a963ad84f10 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -1171,6 +1171,9 @@ static enum es_result vc_handle_mmio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
if (mmio == INSN_MMIO_DECODE_FAILED)
return ES_DECODE_FAILED;
 
+   if (mmio == INSN_MMIO_IGNORE)
+   return ES_OK;
+
if (mmio != INSN_MMIO_WRITE_IMM && mmio != INSN_MMIO_MOVS) {
reg_data = insn_get_modrm_reg_ptr(insn, ctxt->regs);
if (!reg_data)
diff --git a/arch/x86/lib/insn-eval-shared.c b/arch/x86/lib/insn-eval-shared.c
index 02acdc2921ff..27fd347d84ae 100644
--- a/arch/x86/lib/insn-eval-shared.c
+++ b/arch/x86/lib/insn-eval-shared.c
@@ -906,6 +906,13 @@ enum insn_mmio_type insn_decode_mmio(struct insn *insn, int *bytes)
*bytes = 2;
type = INSN_MMIO_READ_SIGN_EXTEND;
break;
+   case 0xae: /* CLFLUSH */
+   /*
+* Ignore CLFLUSHes - those go to emulated MMIO anyway and the
+* hypervisor is responsible for cache management.
+*/
+   type = INSN_MMIO_IGNORE;
+   break;
}
break;
}
-- 
2.34.1




[PATCH v6 05/10] x86/sev: Park APs on AP Jump Table with GHCB protocol version 2

2024-06-10 Thread vsntk18
From: Joerg Roedel 

GHCB protocol version 2 adds the MSR-based AP-reset-hold VMGEXIT which
does not need a GHCB. Use that to park APs in 16-bit protected mode on
the AP jump table.

Signed-off-by: Joerg Roedel 
Signed-off-by: Vasant Karasulli 
---
 arch/x86/include/asm/realmode.h |  3 ++
 arch/x86/kernel/sev.c   | 55 ++---
 arch/x86/realmode/rm/Makefile   | 11 +++--
 arch/x86/realmode/rm/header.S   |  3 ++
 arch/x86/realmode/rm/sev.S  | 85 +
 5 files changed, 146 insertions(+), 11 deletions(-)
 create mode 100644 arch/x86/realmode/rm/sev.S

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index bd54a48fe077..b0a2aa9b8366 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -23,6 +23,9 @@ struct real_mode_header {
u32 trampoline_header;
 #ifdef CONFIG_AMD_MEM_ENCRYPT
u32 sev_es_trampoline_start;
+   u32 sev_ap_park;
+   u32 sev_ap_park_seg;
+   u32 sev_ap_park_gdt;
 #endif
 #ifdef CONFIG_X86_64
u32 trampoline_start64;
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index c15d3568cab9..84b79630f065 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1147,8 +1148,9 @@ void __init snp_set_wakeup_secondary_cpu(void)
 void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa)
 {
struct sev_ap_jump_table_header *header;
+   u64 *ap_jumptable_gdt, *sev_ap_park_gdt;
struct desc_ptr *gdt_descr;
-   u64 *ap_jumptable_gdt;
+   int idx;
 
header = base;
 
@@ -1158,9 +1160,16 @@ void __init sev_es_setup_ap_jump_table_data(void *base, u32 pa)
 * real-mode.
 */
ap_jumptable_gdt = (u64 *)(base + header->ap_jumptable_gdt);
-   ap_jumptable_gdt[SEV_APJT_CS16 / 8] = GDT_ENTRY(0x9b, pa, 0x);
-   ap_jumptable_gdt[SEV_APJT_DS16 / 8] = GDT_ENTRY(0x93, pa, 0x);
-   ap_jumptable_gdt[SEV_RM_DS / 8] = GDT_ENTRY(0x93, 0, 0x);
+   sev_ap_park_gdt  = __va(real_mode_header->sev_ap_park_gdt);
+
+   idx = SEV_APJT_CS16 / 8;
+   ap_jumptable_gdt[idx] = sev_ap_park_gdt[idx] = GDT_ENTRY(0x9b, pa, 0x);
+
+   idx = SEV_APJT_DS16 / 8;
+   ap_jumptable_gdt[idx] = sev_ap_park_gdt[idx] = GDT_ENTRY(0x93, pa, 0x);
+
+   idx = SEV_RM_DS / 8;
+   ap_jumptable_gdt[idx] = GDT_ENTRY(0x93, 0x0, 0x);
 
/* Write correct GDT base address into GDT descriptor */
gdt_descr = (struct desc_ptr *)(base + header->ap_jumptable_gdt);
@@ -1349,6 +1358,38 @@ void setup_ghcb(void)
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
+void __noreturn sev_jumptable_ap_park(void)
+{
+   local_irq_disable();
+
+   write_cr3(real_mode_header->trampoline_pgd);
+
+   /* Exiting long mode will fail if CR4.PCIDE is set. */
+   if (cpu_feature_enabled(X86_FEATURE_PCID))
+   cr4_clear_bits(X86_CR4_PCIDE);
+
+   /*
+* Set all GPRs except EAX, EBX, ECX, and EDX to reset state to prepare
+* for software reset.
+*/
+   asm volatile("xorl  %%r15d, %%r15d\n"
+"xorl  %%r14d, %%r14d\n"
+"xorl  %%r13d, %%r13d\n"
+"xorl  %%r12d, %%r12d\n"
+"xorl  %%r11d, %%r11d\n"
+"xorl  %%r10d, %%r10d\n"
+"xorl  %%r9d,  %%r9d\n"
+"xorl  %%r8d,  %%r8d\n"
+"xorl  %%esi, %%esi\n"
+"xorl  %%edi, %%edi\n"
+"xorl  %%esp, %%esp\n"
+"xorl  %%ebp, %%ebp\n"
+"ljmpl *%0" : :
+"m" (real_mode_header->sev_ap_park));
+   unreachable();
+}
+STACK_FRAME_NON_STANDARD(sev_jumptable_ap_park);
+
 static void sev_es_ap_hlt_loop(void)
 {
struct ghcb_state state;
@@ -1385,8 +1426,10 @@ static void sev_es_play_dead(void)
play_dead_common();
 
/* IRQs now disabled */
-
-   sev_es_ap_hlt_loop();
+   if (sev_ap_jumptable_blob_installed)
+   sev_jumptable_ap_park();
+   else
+   sev_es_ap_hlt_loop();
 
/*
 * If we get here, the VCPU was woken up again. Jump to CPU
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index a0fb39abc5c8..9c5892219cb1 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -19,11 +19,12 @@ wakeup-objs += video-vga.o
 wakeup-objs+= video-vesa.o
 wakeup-objs+= video-bios.o
 
-realmode-y += header.o
-realmode-y += trampoline_$(BITS).o
-realmode-y += stack.o
-realmode-y += reboot.o
-realmode-$(CONFIG_ACPI_SLEEP)  += $(wakeup-objs)
+realmode-y += header.o
+realmode-y   

[PATCH v6 06/10] x86/sev: Use AP Jump Table blob to stop CPU

2024-06-10 Thread vsntk18
From: Joerg Roedel 

To support kexec under SEV-ES the APs can't be parked with HLT. Upon
wakeup the AP needs to find its way to execute at the reset vector set
by the new kernel and in real-mode.

This is what the AP jump table blob provides, so stop the APs the
SEV-ES way by calling the AP-reset-hold VMGEXIT from the AP jump
table.

Signed-off-by: Joerg Roedel 
Signed-off-by: Vasant Karasulli 
---
 arch/x86/include/asm/sev.h |  2 ++
 arch/x86/kernel/process.c  |  8 
 arch/x86/kernel/sev.c  | 15 ++-
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 963d51dcf0e6..6f681ced6594 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -232,6 +232,7 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end);
 u64 snp_get_unsupported_features(u64 status);
 u64 sev_get_status(void);
 void sev_show_status(void);
+void sev_es_stop_this_cpu(void);
 #else
 static inline void sev_es_ist_enter(struct pt_regs *regs) { }
 static inline void sev_es_ist_exit(void) { }
@@ -261,6 +262,7 @@ static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
 static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
 static inline u64 sev_get_status(void) { return 0; }
 static inline void sev_show_status(void) { }
+static inline void sev_es_stop_this_cpu(void) { }
 #endif
 
 #ifdef CONFIG_KVM_AMD_SEV
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b8441147eb5e..0bc615d69c0e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "process.h"
 
@@ -836,6 +837,13 @@ void __noreturn stop_this_cpu(void *dummy)
cpumask_clear_cpu(cpu, &cpus_stop_mask);
 
for (;;) {
+   /*
+* SEV-ES guests need a special stop routine to support
+* kexec. Try this first, if it fails the function will
+* return and native_halt() is used.
+*/
+   sev_es_stop_this_cpu();
+
/*
 * Use native_halt() so that memory contents don't change
 * (stack usage and variables) after possibly issuing the
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 84b79630f065..8d3cc5cd7e11 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1357,7 +1357,6 @@ void setup_ghcb(void)
snp_register_ghcb_early(__pa(&boot_ghcb_page));
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
 void __noreturn sev_jumptable_ap_park(void)
 {
local_irq_disable();
@@ -1390,6 +1389,20 @@ void __noreturn sev_jumptable_ap_park(void)
 }
 STACK_FRAME_NON_STANDARD(sev_jumptable_ap_park);
 
+void sev_es_stop_this_cpu(void)
+{
+   if (!(cc_vendor == CC_VENDOR_AMD) ||
+   !cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
+   return;
+
+   /* Only park in the AP jump table when the code has been installed */
+   if (!sev_ap_jumptable_blob_installed)
+   return;
+
+   sev_jumptable_ap_park();
+}
+
+#ifdef CONFIG_HOTPLUG_CPU
 static void sev_es_ap_hlt_loop(void)
 {
struct ghcb_state state;
-- 
2.34.1




[PATCH v6 04/10] x86/sev: Setup code to park APs in the AP Jump Table

2024-06-10 Thread vsntk18
From: Joerg Roedel 

The AP jump table under SEV-ES contains the reset vector where non-boot
CPUs start executing when coming out of reset. This means that a CPU
coming out of the AP-reset-hold VMGEXIT also needs to start executing at
the reset vector stored in the AP jump table.

The problem is to find a safe place to put the real-mode code which
executes the VMGEXIT and jumps to the reset vector. The code cannot be
in kernel memory, because after kexec that memory is owned by the new
kernel and the code might have been overwritten.

Fortunately the AP jump table itself is a safe place, because the
memory is not owned by the OS and will not be overwritten by a new
kernel started through kexec. The table is 4k in size and only the
first 4 bytes are used for the reset vector. This leaves enough space
for some 16-bit code to do the job and even a small stack.

The AP jump table must be 4K in size, in encrypted memory and it must
be 4K (page) aligned. There can only be one AP jump table and it
should reside in memory that has been marked as reserved by UEFI.

Install 16-bit code into the AP jump table under SEV-ES.
The code will do an AP-reset-hold VMGEXIT and jump to the
reset vector after being woken up.
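
To make the layout constraints concrete, here is a small user-space sketch of
the page header. The struct mirrors sev_ap_jump_table_header from this patch;
the two helper functions and the AP_JUMP_TABLE_SIZE name are hypothetical and
only illustrate the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AP_JUMP_TABLE_SIZE 4096u	/* one 4K page, per the text above */

/* Mirrors struct sev_ap_jump_table_header from this patch. */
struct ap_jump_table_header {
	uint16_t reset_ip;		/* offset 0: fixed by the GHCB spec */
	uint16_t reset_cs;		/* offset 2: fixed by the GHCB spec */
	uint16_t ap_jumptable_gdt;	/* offset of the GDT inside the page */
} __attribute__((packed));

/* The first 4 bytes must stay the reset vector. */
static_assert(offsetof(struct ap_jump_table_header, reset_ip) == 0,
	      "reset_ip moved");
static_assert(offsetof(struct ap_jump_table_header, reset_cs) == 2,
	      "reset_cs moved");

/* Real-mode linear address of the reset vector: CS * 16 + IP. */
static uint32_t reset_vector_linear(const struct ap_jump_table_header *h)
{
	return ((uint32_t)h->reset_cs << 4) + h->reset_ip;
}

/* Bytes left in the page for 16-bit code, GDT and stack (illustrative). */
static size_t blob_space_left(void)
{
	return AP_JUMP_TABLE_SIZE - sizeof(struct ap_jump_table_header);
}
```

With a 6-byte header, roughly 4090 bytes remain for the blob, which is plenty
for a short VMGEXIT loop plus a small stack.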

Signed-off-by: Joerg Roedel 
Signed-off-by: Vasant Karasulli 
---
 arch/x86/include/asm/realmode.h |   2 +
 arch/x86/include/asm/sev-ap-jumptable.h |  30 ++
 arch/x86/kernel/sev.c   |  94 ++---
 arch/x86/realmode/Makefile  |   9 +-
 arch/x86/realmode/rmpiggy.S |   6 ++
 arch/x86/realmode/sev/Makefile  |  33 ++
 arch/x86/realmode/sev/ap_jump_table.S   | 131 
 arch/x86/realmode/sev/ap_jump_table.lds |  24 +
 8 files changed, 316 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
 create mode 100644 arch/x86/realmode/sev/Makefile
 create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
 create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 87e5482acd0d..bd54a48fe077 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -63,6 +63,8 @@ extern unsigned long initial_code;
 extern unsigned long initial_stack;
 #ifdef CONFIG_AMD_MEM_ENCRYPT
 extern unsigned long initial_vc_handler;
+extern unsigned char rm_ap_jump_table_blob[];
+extern unsigned char rm_ap_jump_table_blob_end[];
 #endif
 
 extern u32 *trampoline_lock;
diff --git a/arch/x86/include/asm/sev-ap-jumptable.h b/arch/x86/include/asm/sev-ap-jumptable.h
new file mode 100644
index ..17b07fb19297
--- /dev/null
+++ b/arch/x86/include/asm/sev-ap-jumptable.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * AMD Encrypted Register State Support
+ *
+ * Author: Joerg Roedel 
+ */
+#ifndef __ASM_SEV_AP_JUMPTABLE_H
+#define __ASM_SEV_AP_JUMPTABLE_H
+
+#define SEV_APJT_CS16   0x8
+#define SEV_APJT_DS16   0x10
+#define SEV_RM_DS   0x18
+
+#define SEV_APJT_ENTRY 0x10
+
+#ifndef __ASSEMBLY__
+
+/*
+ * The reset_ip and reset_cs members are fixed and defined through the GHCB
+ * specification. Do not change or move them around.
+ */
+struct sev_ap_jump_table_header {
+   u16 reset_ip;
+   u16 reset_cs;
+   u16 ap_jumptable_gdt;
+} __packed;
+
+#endif /* !__ASSEMBLY__ */
+
+#endif /* __ASM_SEV_AP_JUMPTABLE_H */
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index f0d87549b1e1..c15d3568cab9 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -92,6 +93,9 @@ static struct ghcb *boot_ghcb __section(".data");
 /* Bitmap of SEV features supported by the hypervisor */
 static u64 sev_hv_features __ro_after_init;
 
+/* Whether the AP jump table blob was successfully installed */
+static bool sev_ap_jumptable_blob_installed __ro_after_init;
+
 /* #VC handler runtime per-CPU data */
 struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -670,12 +674,12 @@ static u64 __init get_snp_jump_table_addr(void)
return addr;
 }
 
-static u64 __init get_jump_table_addr(void)
+static phys_addr_t __init get_jump_table_addr(void)
 {
struct ghcb_state state;
unsigned long flags;
struct ghcb *ghcb;
-   u64 ret = 0;
+   phys_addr_t jump_table_pa = 0;
 
if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
return get_snp_jump_table_addr();
@@ -694,13 +698,13 @@ static u64 __init get_jump_table_addr(void)
 
if (ghcb_sw_exit_info_1_is_valid(ghcb) &&
ghcb_sw_exit_info_2_is_valid(ghcb))
-   ret = ghcb->save.sw_exit_info_2;
+   jump_table_pa = ghcb->save.sw_exit_info_2;
 
__sev_put_ghcb(&state);
 
local_irq_restore(flags);
 
-   return ret;
+   return jump_table_pa;
 }
 
 static void __head
@@ -1135,38 +1139,104 @@ void __init snp_s

[PATCH v6 03/10] x86/sev: Set GHCB data structure version

2024-06-10 Thread vsntk18
From: Joerg Roedel 

It turned out that the GHCB->protocol field does not declare the
version of the guest-hypervisor communication protocol, but rather the
version of the GHCB data structure. Reflect that in the define used to
set the protocol field.

Signed-off-by: Joerg Roedel 
Signed-off-by: Vasant Karasulli 
---
 arch/x86/include/asm/sev.h   | 3 +++
 arch/x86/kernel/sev-shared.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ca20cc4e5826..963d51dcf0e6 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -19,6 +19,9 @@
 #define GHCB_PROTOCOL_MAX  2ULL
 #define GHCB_DEFAULT_USAGE 0ULL
 
+/* Version of the GHCB data structure */
+#define GHCB_VERSION   1
+
 #define VMGEXIT()   { asm volatile("rep; vmmcall\n\r"); }
 
 struct boot_params;
diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index f5717eddf75b..f63262a9c2a5 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -264,7 +264,7 @@ static enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
  u64 exit_info_2)
 {
/* Fill in protocol and format specifiers */
-   ghcb->protocol_version = ghcb_version;
+   ghcb->protocol_version = GHCB_VERSION;
ghcb->ghcb_usage   = GHCB_DEFAULT_USAGE;
 
ghcb_set_sw_exit_code(ghcb, exit_code);
-- 
2.34.1




[PATCH v6 02/10] x86/sev: Save and print negotiated GHCB protocol version

2024-06-10 Thread vsntk18
From: Joerg Roedel 

Save the results of the GHCB protocol negotiation into a data structure
and print information about versions supported and used to the kernel
log.

This is useful for debugging kexec issues in SEV-ES guests down the
road to quickly spot whether kexec is supported on the given host.
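
For reference, the negotiation logic after this patch can be modelled in
user space as below. This is only a sketch: hv_min and hv_max stand in for the
values the kernel extracts from the hypervisor's response with
GHCB_MSR_PROTO_MIN()/GHCB_MSR_PROTO_MAX(), and negotiate_protocol() is a
hypothetical name.

```c
#include <stdbool.h>

#define GHCB_PROTOCOL_MAX 2u	/* highest version this guest speaks */

struct ghcb_info {
	unsigned int hv_proto_min;
	unsigned int hv_proto_max;
	unsigned int vm_proto;
};

/* Returns true and fills *info when a common protocol version exists. */
static bool negotiate_protocol(unsigned int hv_min, unsigned int hv_max,
			       struct ghcb_info *info)
{
	unsigned int version;

	/* Untrusted input: an inverted range makes no sense. */
	if (hv_min > hv_max)
		return false;

	/* Use the highest version both sides support. */
	version = hv_max < GHCB_PROTOCOL_MAX ? hv_max : GHCB_PROTOCOL_MAX;

	/* The hypervisor's minimum is newer than anything the guest speaks. */
	if (version < hv_min)
		return false;

	info->hv_proto_min = hv_min;
	info->hv_proto_max = hv_max;
	info->vm_proto = version;
	return true;
}
```

A hypervisor advertising min=1 max=1 would thus show up in the log as using
version 1, which is the case where the MSR-based AP-reset-hold is missing.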

Signed-off-by: Joerg Roedel 
Signed-off-by: Vasant Karasulli 
---
 arch/x86/kernel/sev-shared.c | 33 +++--
 arch/x86/kernel/sev.c|  8 
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
index b4f8fa0f722c..f5717eddf75b 100644
--- a/arch/x86/kernel/sev-shared.c
+++ b/arch/x86/kernel/sev-shared.c
@@ -23,6 +23,23 @@
 #define sev_printk_rtl(fmt, ...)
 #endif
 
+/*
+ * struct ghcb_info - Used to return GHCB protocol
+ *negotiation details.
+ *
+ * @hv_proto_min:  Minimum GHCB protocol version supported by Hypervisor
+ * @hv_proto_max:  Maximum GHCB protocol version supported by Hypervisor
+ * @vm_proto:  Protocol version the VM (this kernel) will use
+ */
+struct ghcb_info {
+   unsigned int hv_proto_min;
+   unsigned int hv_proto_max;
+   unsigned int vm_proto;
+};
+
+/* Negotiated GHCB protocol version */
+static struct ghcb_info ghcb_info __ro_after_init;
+
 /* I/O parameters for CPUID-related helpers */
 struct cpuid_leaf {
u32 fn;
@@ -159,12 +176,24 @@ static bool sev_es_negotiate_protocol(void)
if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
return false;
 
-   if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
-   GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
+   /* Sanity check untrusted input */
+   if (GHCB_MSR_PROTO_MIN(val) > GHCB_MSR_PROTO_MAX(val))
return false;
 
+   /* Use maximum supported protocol version */
 ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
 
+   /*
+* Hypervisor does not support any protocol version required for this
+* kernel.
+*/
+   if (ghcb_version < GHCB_MSR_PROTO_MIN(val))
+   return false;
+
+   ghcb_info.hv_proto_min = GHCB_MSR_PROTO_MIN(val);
+   ghcb_info.hv_proto_max = GHCB_MSR_PROTO_MAX(val);
+   ghcb_info.vm_proto = ghcb_version;
+
return true;
 }
 
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 3342ed58e168..f0d87549b1e1 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1399,6 +1399,14 @@ void __init sev_es_init_vc_handling(void)
 
/* Secondary CPUs use the runtime #VC handler */
initial_vc_handler = (unsigned long)kernel_exc_vmm_communication;
+
+   /*
+* Print information about supported and negotiated GHCB protocol
+* versions.
+*/
+   pr_info("Hypervisor GHCB protocol version support: min=%u max=%u\n",
+   ghcb_info.hv_proto_min, ghcb_info.hv_proto_max);
+   pr_info("Using GHCB protocol version %u\n", ghcb_info.vm_proto);
 }
 
 static void __init vc_early_forward_exception(struct es_em_ctxt *ctxt)
-- 
2.34.1




[PATCH v6 01/10] x86/kexec/64: Disable kexec when SEV-ES is active

2024-06-10 Thread vsntk18
From: Joerg Roedel 

SEV-ES needs special handling to support kexec. Disable it when SEV-ES
is active until support is implemented.

Cc: sta...@vger.kernel.org
Signed-off-by: Joerg Roedel 
Signed-off-by: Vasant Karasulli 
---
 arch/x86/kernel/machine_kexec_64.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index cc0f7f70b17b..1dfb47df5c01 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -267,11 +267,22 @@ static void load_segments(void)
);
 }
 
+static bool machine_kexec_supported(void)
+{
+   if (cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
+   return false;
+
+   return true;
+}
+
 int machine_kexec_prepare(struct kimage *image)
 {
unsigned long start_pgtable;
int result;
 
+   if (!machine_kexec_supported())
+   return -ENOSYS;
+
/* Calculate the offsets */
start_pgtable = page_to_pfn(image->control_code_page) << PAGE_SHIFT;
 
-- 
2.34.1




[PATCH v6 00/10] x86/sev: KEXEC/KDUMP support for SEV-ES guests

2024-06-10 Thread vsntk18
From: Vasant Karasulli 

Hi,

here are changes to enable kexec/kdump in SEV-ES guests. The biggest
problem for supporting kexec/kdump under SEV-ES is to find a way to
hand the non-boot CPUs (APs) from one kernel to another.

Without SEV-ES the first kernel parks the CPUs in a HLT loop until
they get reset by the kexec'ed kernel via an INIT-SIPI-SIPI sequence.
For virtual machines the CPU reset is emulated by the hypervisor,
which sets the vCPU registers back to reset state.

This does not work under SEV-ES, because the hypervisor has no access
to the vCPU registers and can't make modifications to them. So an
SEV-ES guest needs to reset the vCPU itself and park it using the
AP-reset-hold protocol. Upon wakeup the guest needs to jump to
real-mode and to the reset-vector configured in the AP-Jump-Table.

The code to do this is the main part of this patch-set. It works by
placing code on the AP Jump-Table page itself to park the vCPU and for
jumping to the reset vector upon wakeup. The code on the AP Jump Table
runs in 16-bit protected mode with segment base set to the beginning
of the page. The AP Jump-Table is usually not within the first 1MB of
memory, so the code can't run in real-mode.
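
To illustrate why a segment base is needed, here is a stand-alone sketch of the
descriptor encoding. gdt_entry() follows the bit layout of the kernel's
GDT_ENTRY() macro as I recall it; the other helpers are hypothetical, and the
0xffff limit used in examples is an assumption because the value is cut off in
the archived diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the kernel's GDT_ENTRY() macro, written as a function. */
static uint64_t gdt_entry(uint64_t flags, uint64_t base, uint64_t limit)
{
	return ((base  & 0xff000000ULL) << (56 - 24)) |
	       ((flags & 0x0000f0ffULL) << 40) |
	       ((limit & 0x000f0000ULL) << (48 - 16)) |
	       ((base  & 0x00ffffffULL) << 16) |
	       (limit & 0x0000ffffULL);
}

/* Recover the 32-bit segment base from a descriptor. */
static uint32_t gdt_entry_base(uint64_t desc)
{
	return (uint32_t)(((desc >> 16) & 0x00ffffffULL) |
			  ((desc >> 32) & 0xff000000ULL));
}

/* Real mode forms addresses as segment * 16 + offset, i.e. roughly below 1MB. */
static bool reachable_in_real_mode(uint64_t pa)
{
	return pa < 0x100000ULL;
}

/* A descriptor base is 32 bits wide, so the page has to sit below 4GB. */
static bool usable_as_segment_base(uint64_t pa)
{
	return pa <= 0xffffffffULL;
}
```

With the base set to the page address, 16-bit offsets inside the blob start at
zero regardless of where the firmware placed the table, as long as it is below
4GB.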

The AP Jump-Table is the best place to put the parking code, because
the memory is owned by the firmware, which only reads it, while the OS
may write to it. Only the first 4 bytes are used for the reset-vector,
leaving the rest of the page for code/data/stack to park a vCPU. The
code can't be in kernel memory because by the time the vCPU wakes up
the memory will be owned by the new kernel, which might have
overwritten it already.

The other patches add initial GHCB Version 2 protocol support, because
kexec/kdump need the MSR-based (without a GHCB) AP-reset-hold VMGEXIT,
which is a GHCB protocol version 2 feature.

The kexec'ed kernel is also entered via the decompressor and needs
MMIO support there, so this patch-set also adds MMIO #VC support to
the decompressor and support for handling CLFLUSH instructions.

Finally there is also code to disable kexec/kdump support at runtime
when the environment does not support it (e.g. no GHCB protocol
version 2 support or AP Jump Table over 4GB).

The diffstat looks big, but most of it is moving code for MMIO #VC
support around to make it available to the decompressor.

The previous version of this patch-set can be found here:

https://lore.kernel.org/kvm/20240408074049.7049-1-vsnt...@gmail.com/

Please review.

Thanks,
   Vasant

Changes v5->v6:
- Rebased to v6.10-rc3 kernel
   
Changes v4->v5:
- Rebased to v6.9-rc2 kernel
- Applied review comments by Tom Lendacky
  - Exclude the AP jump table related code for SEV-SNP guests

Changes v3->v4:
- Rebased to v6.8 kernel
- Applied review comments by Sean Christopherson
- Combined sev_es_setup_ap_jump_table() and sev_setup_ap_jump_table()
  into a single function which makes caching jump table address
  unnecessary
- Annotated struct sev_ap_jump_table_header with __packed attribute
- Added code to set up real mode data segment at boot time instead of
  hardcoding the value.

Joerg Roedel (9):
  x86/kexec/64: Disable kexec when SEV-ES is active
  x86/sev: Save and print negotiated GHCB protocol version
  x86/sev: Set GHCB data structure version
  x86/sev: Setup code to park APs in the AP Jump Table
  x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
  x86/sev: Use AP Jump Table blob to stop CPU
  x86/sev: Add MMIO handling support to boot/compressed/ code
  x86/sev: Handle CLFLUSH MMIO events
  x86/kexec/64: Support kexec under SEV-ES with AP Jump Table Blob

Vasant Karasulli (1):
  x86/sev: Exclude AP jump table related code for SEV-SNP guests

 arch/x86/boot/compressed/sev.c  |  45 +-
 arch/x86/include/asm/insn-eval.h|   1 +
 arch/x86/include/asm/realmode.h |   5 +
 arch/x86/include/asm/sev-ap-jumptable.h |  30 +
 arch/x86/include/asm/sev.h  |   7 +
 arch/x86/kernel/machine_kexec_64.c  |  12 +
 arch/x86/kernel/process.c   |   8 +
 arch/x86/kernel/sev-shared.c| 234 +-
 arch/x86/kernel/sev.c   | 376 +-
 arch/x86/lib/insn-eval-shared.c | 921 
 arch/x86/lib/insn-eval.c| 911 +--
 arch/x86/realmode/Makefile  |   9 +-
 arch/x86/realmode/init.c|   5 +-
 arch/x86/realmode/rm/Makefile   |  11 +-
 arch/x86/realmode/rm/header.S   |   3 +
 arch/x86/realmode/rm/sev.S  |  85 +++
 arch/x86/realmode/rmpiggy.S |   6 +
 arch/x86/realmode/sev/Makefile  |  33 +
 arch/x86/realmode/sev/ap_jump_table.S   | 131 
 arch/x86/realmode/sev/ap_jump_table.lds |  24 +
 20 files changed, 1711 insertions(+), 1146 deletions(-)
 create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
 create mode 100644 arch/x86/lib/insn-eval-share

Re: [PATCH 2/3] powerpc/kexec_load: add hotplug support

2024-06-10 Thread Hari Bathini




On 22/05/24 6:43 pm, Sourabh Jain wrote:

Kernel commits b741092d5976 ("powerpc/crash: add crash CPU hotplug
support") and 849599b702ef ("powerpc/crash: add crash memory hotplug
support") added crash CPU/Memory hotplug support on PowerPC. This patch
extends that support for the kexec_load syscall.

During CPU/Memory hotplug events on PowerPC, two kexec segments,
elfcorehdr and FDT, get updated by the kernel. To ensure the kernel
can safely update these two kexec segments for the kdump image loaded
using the kexec_load system call, the following changes are made:

1. Extra size is allocated for both elfcorehdr and FDT to accommodate
additional resources in the future. For the elfcorehdr, the size hint
is taken from /sys/kernel/crash_elfcorehdr_size sysfs, while for FDT,
extra size is allocated to hold possible CPU nodes.

2. Both elfcorehdr and FDT are skipped from SHA calculation.
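
The sizing rule from point 1 can be sketched as follows for the elfcorehdr. The
function names are hypothetical; the logic follows the memsz computation in the
crashdump-ppc64.c hunk of this patch, with hint standing in for the value read
from /sys/kernel/crash_elfcorehdr_size.

```c
#include <stdbool.h>

/* Round value up to a multiple of align; align must be a power of two. */
static unsigned long align_up(unsigned long value, unsigned long align)
{
	return (value + align - 1) & ~(align - 1);
}

/*
 * Memory size to reserve for the elfcorehdr segment.
 *   sz      - size of the header as generated right now
 *   hint    - size hint provided by the kernel
 *   hotplug - whether --hotplug was requested
 */
static unsigned long elfcorehdr_memsz(unsigned long sz, unsigned long hint,
				      unsigned long align, bool hotplug)
{
	/* Only grow the segment; never shrink below the current size. */
	if (hotplug && hint > sz)
		return align_up(hint, align);

	return sz;
}
```

The buffer size passed to add_buffer() stays sz; only the memory size grows, so
the kernel has room to regenerate a larger header later.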

Cc: Aditya Gupta 
Cc: Baoquan He 
Cc: Coiby Xu 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Signed-off-by: Sourabh Jain 
---
  kexec/arch/ppc64/crashdump-ppc64.c  |  16 ++-
  kexec/arch/ppc64/fdt.c  | 200 +++-
  kexec/arch/ppc64/include/arch/fdt.h |   2 +-
  kexec/arch/ppc64/kexec-elf-ppc64.c  |   2 +-
  kexec/arch/ppc64/kexec-ppc64.c  |  12 +-
  5 files changed, 225 insertions(+), 7 deletions(-)

diff --git a/kexec/arch/ppc64/crashdump-ppc64.c b/kexec/arch/ppc64/crashdump-ppc64.c
index 6d47898..c14b593 100644
--- a/kexec/arch/ppc64/crashdump-ppc64.c
+++ b/kexec/arch/ppc64/crashdump-ppc64.c
@@ -476,7 +476,7 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
uint64_t max_addr, unsigned long min_base)
  {
void *tmp;
-   unsigned long sz;
+   unsigned long sz, memsz;
uint64_t elfcorehdr;
int nr_ranges, align = 1024, i;
unsigned long long end;
@@ -531,8 +531,18 @@ int load_crashdump_segments(struct kexec_info *info, char* mod_cmdline,
}
}
  
-	elfcorehdr = add_buffer(info, tmp, sz, sz, align, min_base,
-   max_addr, 1);
+   memsz = sz;
+   /* To support --hotplug, replace the calculated minimum size with the
+* value from /sys/kernel/crash_elfcorehdr_size and align it correctly.
+*/
+   if (do_hotplug) {
+   if (elfcorehdrsz > sz)
+   memsz = _ALIGN(elfcorehdrsz, align);
+   }
+
+   /* Record the location of the elfcorehdr for hotplug handling */
+   info->elfcorehdr = elfcorehdr = add_buffer(info, tmp, sz, memsz, align,
+  min_base, max_addr, 1);
reserve(elfcorehdr, sz);
/* modify and store the cmdline in a global array. This is later
 * read by flatten_device_tree and modified if required
diff --git a/kexec/arch/ppc64/fdt.c b/kexec/arch/ppc64/fdt.c
index 8bc6d2d..10abc29 100644
--- a/kexec/arch/ppc64/fdt.c
+++ b/kexec/arch/ppc64/fdt.c
@@ -17,6 +17,13 @@
  #include 
  #include 
  #include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../../kexec.h"
+#include "../../kexec-syscall.h"
  
  /*
   * Let the kernel know it booted from kexec, as some things (e.g.
@@ -46,17 +53,208 @@ static int fixup_kexec_prop(void *fdt)
return 0;
  }
  
+static inline bool is_dot_dir(char * d_path)
+{
+   return d_path[0] == '.';
+}
+
+/*
+ * Returns size of files including file name size under the given
+ * @cpu_node_path.
+ */
+static unsigned int get_cpu_node_size(char *cpu_node_path)
+{
+   DIR *d;
+   struct dirent *de;
+   struct stat statbuf;
+   unsigned int cpu_node_size = 0;
+   char cpu_prop_path[2 * PATH_MAX];
+
+   d = opendir(cpu_node_path);
+   if (!d)
+   return 0;
+
+   while ((de = readdir(d)) != NULL) {
+   if (de->d_type != DT_REG)
+   continue;
+
+   memset(cpu_prop_path, '\0', PATH_MAX);
+   snprintf(cpu_prop_path, 2 * PATH_MAX, "%s/%s", cpu_node_path, de->d_name);
+
+   if (stat(cpu_prop_path, &statbuf))
+   continue;
+
+   cpu_node_size += statbuf.st_size;
+   cpu_node_size += strlen(de->d_name);
+   }
+
+   return cpu_node_size;
+}
+
+/*
+ * Checks if the node specified by the given @path represents a CPU node.
+ *
+ * Returns true if the @path has a "device_type" file containing "cpu";
+ * otherwise, returns false.
+ */
+static bool is_cpu_node(char *path)
+{
+   FILE *file;
+   bool ret = false;
+   char device_type[4];
+
+   file = fopen(path, "r");
+   if (!file)
+   return false;
+
+   memset(device_type, '\0', 4);
+   if (fread(device_type, 1, 3, file) < 3)
+   goto out;
+
+   if (strcmp(device_type, "cpu"))
+   goto out;
+
+   ret = true;
+
+out:
+   fclose(file);
+   return ret;
+}
+
+static unsigned int get_thr

Re: [PATCH 3/3] doc/hotplug: update man and --help

2024-06-10 Thread Hari Bathini




On 22/05/24 6:43 pm, Sourabh Jain wrote:

Update the man page and --help option to make the description of the
--hotplug option easier to understand.

Cc: Aditya Gupta 
Cc: Baoquan He 
Cc: Coiby Xu 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Signed-off-by: Sourabh Jain 
---
  kexec/kexec.8 | 8 
  kexec/kexec.c | 3 ++-
  2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/kexec/kexec.8 b/kexec/kexec.8
index 9e995fe..7dddae9 100644
--- a/kexec/kexec.8
+++ b/kexec/kexec.8
@@ -140,10 +140,10 @@ Open a help file for
  .BR kexec .
  .TP
  .B \-\-hotplug


Can we have the description changed like:


-Setup for kernel modification of the elfcorehdr. This option performs
-the steps needed to support kernel updates to the elfcorehdr in the
-presence of hot un/plug and/or on/offline events. This option only
-useful for KEXEC_LOAD syscall.
+Helps avoid kdump kernel reload on CPU/Memory hotplug or on/offline events.
+If this option is enabled, the kexec segments will be set up in a way that
+the kernel can safely update them on CPU/memory hotplug and/or on/offline
+events. This option is only useful for the KEXEC_LOAD syscall.


"Setup kexec segments such that kernel can safely update them on
CPU/Memory hot add/remove events. If this option is enabled, kernel does
in-kernel update of kexec segments on CPU/Memory hot add/remove events.
Helps avoid the need to reload kdump kernel."



  .TP
  .B \-i\ (\-\-no-checks)
  Fast reboot, no memory integrity checks.
diff --git a/kexec/kexec.c b/kexec/kexec.c
index 034cea6..2b06438 100644
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -1093,7 +1093,8 @@ void usage(void)
   "  back to the compatibility syscall when file based\n"
   "  syscall is not supported or the kernel did not\n"
   "  understand the image (default)\n"
-  " --hotplugSetup for kernel modification of elfcorehdr.\n"
+  " --hotplugHelps avoid kdump kernel reload on CPU/Memory hotplug\n"
+  " or on/offline events.\n"


"Do in-kernel update of kexec segments on CPU/Memory hot add/remove 
events. This avoids the need to reload kdump kernel."


- Hari



Re: [PATCH 1/3] kexec_load: Use new kexec flag for hotplug support

2024-06-10 Thread Hari Bathini




On 22/05/24 6:43 pm, Sourabh Jain wrote:

Kernel commit 79365026f869 ("crash: add a new kexec flag for hotplug
support") introduced a new kexec flag to generalize hotplug support.
The new flag allows architectures to exclude all the required kexec
segments from SHA calculation so that the kernel can update them on
hotplug events. This was not possible earlier with the
KEXEC_UPDATE_ELFCOREHDR flag, since it was added only for the
elfcorehdr segment.

To enable architectures to control the list of kexec segments to exclude
when hotplug support is enabled, add a new architecture-specific
function named arch_do_exclude_segment. During the SHA calculation, this
function gets called to let the architecture decide whether a specific
kexec segment should be considered for SHA calculation or not.
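
A minimal sketch of how such a hook plugs into the digest loop is shown below.
The structures are simplified stand-ins for the kexec-tools ones, the hook body
copies the i386 variant from this patch, and FNV-1a is used only as a toy
replacement for the real SHA-256 code. The actual patch additionally consults
the hook only when hotplug support was requested.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kexec-tools structures. */
struct segment {
	const unsigned char *buf;
	size_t bufsz;
	unsigned long mem;		/* load address of the segment */
};

struct image_info {
	struct segment *segments;
	int nr_segments;
	unsigned long elfcorehdr;	/* load address of the elfcorehdr */
};

/* Per-architecture hook: return 1 to leave the segment out of the digest. */
static int do_exclude_segment(const struct segment *seg,
			      const struct image_info *info)
{
	return info->elfcorehdr == seg->mem;
}

/* FNV-1a over all non-excluded segments; a toy replacement for SHA-256. */
static uint64_t image_digest(const struct image_info *info)
{
	uint64_t hash = 0xcbf29ce484222325ULL;

	for (int i = 0; i < info->nr_segments; i++) {
		const struct segment *seg = &info->segments[i];

		if (do_exclude_segment(seg, info))
			continue;

		for (size_t j = 0; j < seg->bufsz; j++) {
			hash ^= seg->buf[j];
			hash *= 0x100000001b3ULL;
		}
	}

	return hash;
}
```

Because the excluded segment never enters the digest, the kernel can rewrite it
on a hotplug event without invalidating the checksum verified by purgatory.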

Given that the KEXEC_UPDATE_ELFCOREHDR flag is no longer required and was
colliding with the KEXEC_LIVE_UPDATE flag, it is removed.

Cc: Aditya Gupta 
Cc: Baoquan He 
Cc: Coiby Xu 
Cc: Hari Bathini 
Cc: Mahesh Salgaonkar 
Signed-off-by: Sourabh Jain 
---
  kexec/arch/arm/kexec-arm.c |  5 +
  kexec/arch/arm64/kexec-arm64.c |  4 
  kexec/arch/cris/kexec-cris.c   |  4 
  kexec/arch/hppa/kexec-hppa.c   |  5 +
  kexec/arch/i386/kexec-x86.c|  8 
  kexec/arch/ia64/kexec-ia64.c   |  4 
  kexec/arch/loongarch/kexec-loongarch.c |  5 +
  kexec/arch/m68k/kexec-m68k.c   |  5 +
  kexec/arch/mips/kexec-mips.c   |  4 
  kexec/arch/ppc/kexec-ppc.c |  4 
  kexec/arch/ppc64/kexec-ppc64.c |  5 +
  kexec/arch/s390/kexec-s390.c   |  5 +
  kexec/arch/sh/kexec-sh.c   |  5 +
  kexec/arch/x86_64/kexec-x86_64.c   |  5 +
  kexec/kexec-syscall.h  |  2 +-
  kexec/kexec.c  | 14 --
  kexec/kexec.h  |  2 ++
  17 files changed, 79 insertions(+), 7 deletions(-)

diff --git a/kexec/arch/arm/kexec-arm.c b/kexec/arch/arm/kexec-arm.c
index 49f35b1..34531f9 100644
--- a/kexec/arch/arm/kexec-arm.c
+++ b/kexec/arch/arm/kexec-arm.c
@@ -148,3 +148,8 @@ int have_sysfs_fdt(void)
  {
return !access(SYSFS_FDT, F_OK);
  }
+
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+   return 0;
+}
diff --git a/kexec/arch/arm64/kexec-arm64.c b/kexec/arch/arm64/kexec-arm64.c
index 4a67b0d..9d052b0 100644
--- a/kexec/arch/arm64/kexec-arm64.c
+++ b/kexec/arch/arm64/kexec-arm64.c
@@ -1363,3 +1363,7 @@ void arch_reuse_initrd(void)
  void arch_update_purgatory(struct kexec_info *UNUSED(info))
  {
  }
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+   return 0;
+}
diff --git a/kexec/arch/cris/kexec-cris.c b/kexec/arch/cris/kexec-cris.c
index 3b69709..7f09121 100644
--- a/kexec/arch/cris/kexec-cris.c
+++ b/kexec/arch/cris/kexec-cris.c
@@ -109,3 +109,7 @@ unsigned long add_buffer(struct kexec_info *info, const void *buf,
  buf_min, buf_max, buf_end, 1);
  }
  
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+   return 0;
+}
diff --git a/kexec/arch/hppa/kexec-hppa.c b/kexec/arch/hppa/kexec-hppa.c
index 77c9739..a64dc3d 100644
--- a/kexec/arch/hppa/kexec-hppa.c
+++ b/kexec/arch/hppa/kexec-hppa.c
@@ -146,3 +146,8 @@ unsigned long virt_to_phys(unsigned long addr)
  {
return addr - phys_offset;
  }
+
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+   return 0;
+}
diff --git a/kexec/arch/i386/kexec-x86.c b/kexec/arch/i386/kexec-x86.c
index 444cb69..b4947a0 100644
--- a/kexec/arch/i386/kexec-x86.c
+++ b/kexec/arch/i386/kexec-x86.c
@@ -208,3 +208,11 @@ void arch_update_purgatory(struct kexec_info *info)
elf_rel_set_symbol(&info->rhdr, "panic_kernel",
&panic_kernel, sizeof(panic_kernel));
  }
+
+int arch_do_exclude_segment(struct kexec_segment *seg_ptr, struct kexec_info *info)
+{
+   if (info->elfcorehdr == (unsigned long) seg_ptr->mem)
+   return 1;
+
+   return 0;
+}
diff --git a/kexec/arch/ia64/kexec-ia64.c b/kexec/arch/ia64/kexec-ia64.c
index 418d997..8d9c1f3 100644
--- a/kexec/arch/ia64/kexec-ia64.c
+++ b/kexec/arch/ia64/kexec-ia64.c
@@ -245,3 +245,7 @@ void arch_update_purgatory(struct kexec_info *UNUSED(info))
  {
  }
  
+int arch_do_exclude_segment(struct kexec_segment *UNUSED(seg_ptr), struct kexec_info *UNUSED(info))
+{
+   return 0;
+}
diff --git a/kexec/arch/loongarch/kexec-loongarch.c b/kexec/arch/loongarch/kexec-loongarch.c
index 32a42d2..9a50ff6 100644
--- a/kexec/arch/loongarch/kexec-loongarch.c
+++ b/kexec/arch/loongarch/kexec-loongarch.c
@@ -378,3 +378,8 @@ unsigned long add_buffer(struct kexec_info *info, const void *buf,
return add_bu