The imported boot.S places the boot stack inside the .bss segment:
        .bss
    .boot_stack:
        .space 4096
    .boot_stack_end:
c_boot_entry() is the first C function called from _start, with sp
already pointing at .boot_stack_end. Its first action is to call
zero_out_bss(), which memsets [__bss_start, __bss_end), i.e. the whole
.bss range, including the very boot stack the kernel is *currently
running on*. That wipes the saved x29/x30 and any locals the
compiler spilled on entry, so the next ret that reloads x30 from the
stack branches to 0 and the kernel hangs in EL1.
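The failure mode can be reproduced in a small host-side model. The
sketch below is illustrative only and is not kernel code: the image
array, the sizes, the fake return address and the helper names
(zero_range, saved_lr_after_clear) are all assumptions made up for the
example. It stores a fake saved link register at the top of a
simulated stack, clears a simulated BSS range, and reports what a
later ret would reload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_SIZE 256  /* hypothetical size of the simulated image */
#define STACK_SIZE 64   /* hypothetical boot stack size for the model */

/* Hypothetical stand-in for the kernel image's writable memory. */
static uint8_t image[IMAGE_SIZE];

/* Models zero_out_bss(): clear [bss_start, bss_end) and nothing else. */
static void zero_range(size_t bss_start, size_t bss_end)
{
    assert(bss_start <= bss_end && bss_end <= IMAGE_SIZE);
    memset(image + bss_start, 0, bss_end - bss_start);
}

/*
 * Put a fake saved link register in the top slot of a stack occupying
 * [stack_base, stack_base + STACK_SIZE), run the BSS clear, and return
 * the value a later ret would pick up from that slot.
 */
static uint64_t saved_lr_after_clear(size_t stack_base,
                                     size_t bss_start, size_t bss_end)
{
    const uint64_t lr = 0x40080123ULL; /* arbitrary nonzero address */
    size_t slot = stack_base + STACK_SIZE - sizeof lr;
    uint64_t out;

    assert(stack_base + STACK_SIZE <= IMAGE_SIZE);
    memset(image, 0xAA, sizeof image);      /* mark memory as "in use" */
    memcpy(image + slot, &lr, sizeof lr);   /* prologue spills x30 */
    zero_range(bss_start, bss_end);         /* the BSS clear runs */
    memcpy(&out, image + slot, sizeof out); /* epilogue reloads x30 */
    return out;
}
```

With the stack inside the cleared range, for example
saved_lr_after_clear(64, 0, 128), the reloaded value is 0. With the
stack starting at the end of the range, saved_lr_after_clear(128, 0,
128), the fake return address comes back intact.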
Move the boot stack into its own `.boot_stack` nobits section and
place that section after `__bss_end` in the linker script so
zero_out_bss() leaves it alone:
        .section .boot_stack, "aw", %nobits
    boot_stack:
        .space 4096
    .boot_stack_end:
When the kernel is brought up under qemu-system-aarch64 -M virt, the
bug fires immediately; wip-aarch64 likely never exercised the
zero_out_bss-from-_start path because it was tested via a different
boot route.
---
aarch64/aarch64/boot.S | 10 ++++++++--
aarch64/ldscript | 3 +++
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/aarch64/aarch64/boot.S b/aarch64/aarch64/boot.S
index 85d3b944..ab736489 100644
--- a/aarch64/aarch64/boot.S
+++ b/aarch64/aarch64/boot.S
@@ -92,8 +92,14 @@ ENTRY(_start)
b EXT(c_boot_entry)
END(_start)
- .bss
-.boot_stack:
+ /*
+ * Put the boot stack in its own nobits section so it lives outside
+ * [__bss_start, __bss_end). Otherwise c_boot_entry's call to
+ * zero_out_bss() (which memsets the whole BSS region) would clobber
+ * its own saved x29/x30, sending us to PC=0 on ret.
+ */
+ .section .boot_stack, "aw", %nobits
+boot_stack:
.space 4096
.boot_stack_end:
diff --git a/aarch64/ldscript b/aarch64/ldscript
index 236fc6f8..a5aec69d 100644
--- a/aarch64/ldscript
+++ b/aarch64/ldscript
@@ -27,6 +27,9 @@ SECTIONS
__bss_start = .;
*(.bss);
__bss_end = .;
+ /* Boot stack lives in its own nobits region after __bss_end so
+ it survives zero_out_bss() running from within itself. */
+ *(.boot_stack);
}
_image_end = .;
}
--
2.54.0