boot: choose AP stack based on APIC ID

Krystian Hebel Tue, 12 Mar 2024 07:14:47 -0700

Hi,

On 26.01.2024 19:30, Julien Grall wrote:

Hi,

I am not too familiar with the x86 boot code, but I will try to review :).


On 14/11/2023 17:49, Krystian Hebel wrote:

This is made as first step of making parallel AP bring-up possible. It
should be enough for pre-C code.

Signed-off-by: Krystian Hebel <krystian.he...@3mdeb.com>
---
  xen/arch/x86/boot/trampoline.S | 20 ++++++++++++++++++++
  xen/arch/x86/boot/x86_64.S     | 28 +++++++++++++++++++++++++++-
  xen/arch/x86/setup.c           |  7 +++++++
  3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S

index b8ab0ffdcbb0..ec254125016d 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -72,6 +72,26 @@ trampoline_protmode_entry:
          mov     $X86_CR4_PAE,%ecx
          mov     %ecx,%cr4
  +        /*
+         * Get APIC ID while we're in non-paged mode. Start by checking if
+         * x2APIC is enabled.
+         */
+        mov     $MSR_APIC_BASE, %ecx
+        rdmsr
+        and     $APIC_BASE_EXTD, %eax
+        jnz     .Lx2apic
+
+        /* Not x2APIC, read from MMIO */
+        mov     0xfee00020, %esp
+        shr     $24, %esp
+        jmp     1f
+
+.Lx2apic:
+        mov     $(MSR_X2APIC_FIRST + (0x20 >> 4)), %ecx
+        rdmsr
+        mov     %eax, %esp
+1:
+
          /* Load pagetable base register. */
          mov     $sym_offs(idle_pg_table),%eax
          add     bootsym_rel(trampoline_xen_phys_start,4,%eax)
diff --git a/xen/arch/x86/boot/x86_64.S b/xen/arch/x86/boot/x86_64.S
index 04bb62ae8680..b85b47b5c1a0 100644
--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -15,7 +15,33 @@ ENTRY(__high_start)
          mov     $XEN_MINIMAL_CR4,%rcx
          mov     %rcx,%cr4
  -        mov     stack_start(%rip),%rsp
+        test    %ebx,%ebx
+        cmovz   stack_start(%rip), %rsp
+        jz      .L_stack_set
+
+        /* APs only: get stack base from APIC ID saved in %esp. */
+        mov     $-1, %rax
+        lea     x86_cpu_to_apicid(%rip), %rcx

I would consider moving this patch after #2 and #3, so the logic is not modified again. This would help the review.

I agree, maybe even after #4, once that patch is modified according to other comments.

+1:
+        add     $1, %rax
+        cmp     $NR_CPUS, %eax
+        jb      2f

I think we can get rid of this jump by reworking the loop so %eax is tested at the end of the loop. But this is boot code, so it is possibly not worth it. I will leave it to the x86 maintainers to comment.

Not sure if I understood "end of the loop" correctly, but if I did, it would result in an out-of-bounds read. Anyway, this is changed by further patches which I still have to reorder; I'll check if the final form can be improved.
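
For illustration only, the lookup can be sketched in C. This is not code from the series: `sketch_find_cpu` and `SKETCH_NR_CPUS` are hypothetical names, and the table size is an arbitrary assumption. The point it shows is that the index is compared against the bound before each read of the table, so an APIC ID with no match ends in the halt path rather than in a read past the last entry.

```c
#include <stdint.h>

/* Hypothetical table size for illustration; the patch uses NR_CPUS. */
#define SKETCH_NR_CPUS 4u

/*
 * Return the index whose table entry equals apic_id, or -1 if none does.
 * The bound is checked before every read, mirroring the cmp/jb pair that
 * precedes the compare against (%rcx, %rax, 4) in the patch.
 */
static int sketch_find_cpu(const uint32_t table[SKETCH_NR_CPUS],
                           uint32_t apic_id)
{
    unsigned int idx = 0;

    for ( ; ; )
    {
        if ( idx >= SKETCH_NR_CPUS )
            return -1;              /* the assembly executes hlt here */
        if ( table[idx] == apic_id )
            return (int)idx;
        idx++;
    }
}
```

Whether the check can move to the end of the loop without reading one entry too far depends on how the increment and the compare are ordered, which is the concern raised above.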

+        hlt
+2:
+        cmp     %esp, (%rcx, %rax, 4)
+        jne     1b
+
+        /* %eax is now Xen CPU index. */
+        lea     stack_base(%rip), %rcx
+        mov     (%rcx, %rax, 8), %rsp
+
+        test    %rsp,%rsp
+        jnz     1f
+        hlt
+1:

NIT: Can you use 3? This makes the code easier to read and less prone to error (you have two 1 labels very close together).

Ack

+        add     $(STACK_SIZE - CPUINFO_sizeof), %rsp
+
+.L_stack_set:
            /* Reset EFLAGS (subsumes CLI and CLD). */
          pushq   $0
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index a3d3f797bb1e..1285969901e0 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c

@@ -1951,6 +1951,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)

       */
      if ( !pv_shim )
      {
+        /* Separate loop to make parallel AP bringup possible. */

The loop split seems to be unrelated to this patch. Actually, I was expecting that only the assembly code would be modified.

Fair point, I originally left it here so the code can be bisected if needed, but the code has changed significantly since then. In its current form it should be safe to do this in the last commit.

          for_each_present_cpu ( i )
          {
              /* Set up cpu_to_node[]. */
@@ -1958,6 +1959,12 @@ void __init noreturn __start_xen(unsigned long mbi_p)
              /* Set up node_to_cpumask based on cpu_to_node[]. */
              numa_add_cpu(i);
  +            if ( stack_base[i] == NULL )
+                stack_base[i] = cpu_alloc_stack(i);

I don't quite understand this change, at least in the context of this patch. AFAICT the stack is currently allocated in cpu_smpboot_callback(), which is called while the CPU is prepared. So you should not need this allocation right now.

Looking at the rest of the series, it seems you allocate the stack earlier so you can start the CPU bring-up earlier. But the CPUs will still be held in assembly code until cpu_up() is called.

So effectively, part of the C part of the CPU bring-up is still serialized. Did I understand the logic correctly?

If so, I would suggest clarifying it in the series because this wasn't obvious to me (I was expecting start_secondary() would also run in parallel).

start_secondary() is started in parallel; CPUs are not held in assembly. Calling C (almost) always requires a stack, and most of this series comes down to making a stack available for all APs at once, just so APs can loop early in start_secondary().
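
As a rough illustration of the stack selection this relies on, here is a minimal C sketch of what the assembly in this patch does once the CPU index is known. It is not code from the series: `sketch_ap_stack_top` is a hypothetical name, and the two `SKETCH_` constants are assumed stand-ins for STACK_SIZE and CPUINFO_sizeof, whose real values come from the Xen headers.

```c
#include <stdint.h>

/* Assumed values for illustration only; the real ones come from Xen headers. */
#define SKETCH_STACK_SIZE     (8u * 4096u)
#define SKETCH_CPUINFO_SIZEOF 256u

/*
 * Compute the initial stack pointer for an AP from its per-CPU stack base.
 * Returns 0 when no stack was allocated, the case where the assembly
 * executes hlt instead of continuing.
 */
static uintptr_t sketch_ap_stack_top(uintptr_t base)
{
    if ( base == 0 )
        return 0;
    return base + SKETCH_STACK_SIZE - SKETCH_CPUINFO_SIZEOF;
}
```

The zero-base case is why every stack_base[] entry has to be populated before the APs are released.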

You are correct, most of the C part is serialized. I tried to make it parallel as well but quickly gave up. Some of the notifier callbacks resist any attempt at parallelization, and this set of patches already gave satisfactory improvements in boot time (and was enough to get through the peculiar SMP bring-up used by Intel TXT dynamic launch, which is the reason I had to do it in the first place).

Regarding the change in setup.c, I think it would make more sense in patch #9.

Cheers,

Best regards,

--
Krystian Hebel
Firmware Engineer
https://3mdeb.com | @3mdeb_com

Re: [XEN PATCH 1/9] x86/boot: choose AP stack based on APIC ID
