From: Waldemar Kozaczuk <jwkozac...@gmail.com>
Committer: Nadav Har'El <n...@scylladb.com>
Branch: master

aarch64: fix memory mapping initialization when booting with #vCPUs >= 2

As issue #1092 explains, OSv hangs most of the time when booting with 2 or more vCPUs. More specifically, it only happens in non-emulated mode, on QEMU or Firecracker with KVM acceleration, on a real hardware host like the Raspberry Pi 4. It is also worth mentioning that, very occasionally, OSv does boot successfully with more than one vCPU in such a setup.

Based on the results of a remote debugging session, OSv appears to hang on the primary vCPU in the `smp_launch()` routine while waiting for all secondary vCPUs to come up. The stack backtrace on the secondary CPU indicates an exception raised in the `init_mmu` function in `boot.S` at line 161 (https://github.com/cloudius-systems/osv/blob/0051eb5dd74b9aa2d429629b046f4d0dc58b7b3a/arch/aarch64/boot.S#L161), which is executed right after `init_secondary_pt`. This would mean that the secondary vCPU crashed.

On the secondary CPU, the registers `ttbr0_el1` and `ttbr1_el1`, which should hold the addresses of the translation table roots, are set to 0. This seems incorrect and most likely caused the crash in `init_mmu`. The values for both registers are prepared in `init_secondary_pt`, which copies them from the `smpboot_ttbr0` and `smpboot_ttbr1` global variables. Both variables are set in the loop in `smp_launch()` for every secondary vCPU (which seems wrong, as it should be done only once). It seems that when a secondary vCPU boots and enters `start_secondary_cpu`, it does not see the correct values of these global variables in memory, possibly due to some memory coherency side effects.

So this patch sets both `smpboot_ttbr0` and `smpboot_ttbr1` much earlier, at the point where the switch to the runtime translation tables is executed on the primary CPU in the `switch_to_runtime_page_tables()` function.

This change by itself makes OSv boot most of the time with 2 or more vCPUs, but it still hangs sometimes. This can most likely be explained by the fact that `start_secondary_cpu` directly uses the runtime translation tables, which reside in memory not visible by default. To counter this, the patch also changes `start_secondary_cpu` in `boot.S` to first enable the boot translation tables and only then switch to the runtime tables (see the added calls to `init_boot_pt` and `init_mmu`).

Based on empirical tests (more than 100 executions), OSv can now boot with 2 or more vCPUs with KVM acceleration on.

Fixes #1092

Signed-off-by: Waldemar Kozaczuk <jwkozac...@gmail.com>
Message-Id: <20201009210208.461186-1-jwkozac...@gmail.com>

---
diff --git a/arch/aarch64/boot.S b/arch/aarch64/boot.S
--- a/arch/aarch64/boot.S
+++ b/arch/aarch64/boot.S
@@ -157,6 +157,8 @@ start_secondary_cpu:
     isb
 
     bl init_cpu
+    bl init_boot_pt
+    bl init_mmu
     bl init_secondary_pt
     bl init_mmu
 
diff --git a/arch/aarch64/mmu.cc b/arch/aarch64/mmu.cc
--- a/arch/aarch64/mmu.cc
+++ b/arch/aarch64/mmu.cc
@@ -61,10 +61,15 @@ void flush_tlb_local() {
 static pt_element<4> page_table_root[2] __attribute__((init_priority((int)init_prio::pt_root)));
 u64 mem_addr;
 
+extern "C" { /* see boot.S */
+    extern u64 smpboot_ttbr0;
+    extern u64 smpboot_ttbr1;
+}
+
 void switch_to_runtime_page_tables()
 {
-    auto pt_ttbr0 = mmu::page_table_root[0].next_pt_addr();
-    auto pt_ttbr1 = mmu::page_table_root[1].next_pt_addr();
+    auto pt_ttbr0 = smpboot_ttbr0 = mmu::page_table_root[0].next_pt_addr();
+    auto pt_ttbr1 = smpboot_ttbr1 = mmu::page_table_root[1].next_pt_addr();
     asm volatile("msr ttbr0_el1, %0; isb;" ::"r" (pt_ttbr0));
     asm volatile("msr ttbr1_el1, %0; isb;" ::"r" (pt_ttbr1));
     mmu::flush_tlb_all();
diff --git a/arch/aarch64/smp.cc b/arch/aarch64/smp.cc
--- a/arch/aarch64/smp.cc
+++ b/arch/aarch64/smp.cc
@@ -17,8 +17,6 @@
 #include <alloca.h>
 
 extern "C" { /* see boot.S */
-    extern u64 smpboot_ttbr0;
-    extern u64 smpboot_ttbr1;
     extern init_stack *smp_stack_free;
     extern u64 start_secondary_cpu();
 }
@@ -85,8 +83,6 @@ void smp_launch()
         attr.stack(81920).pin(c).name(name);
         c->init_idle_thread();
         c->bringup_thread = new sched::thread([=] { secondary_bringup(c); }, attr, true);
-        smpboot_ttbr0 = processor::read_ttbr0();
-        smpboot_ttbr1 = processor::read_ttbr1();
         psci::_psci.cpu_on(c->arch.mpid, mmu::virt_to_phys(reinterpret_cast<void *>(start_secondary_cpu)));
     }
     while (smp_processors != sched::cpus.size())
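
The ordering this patch relies on (publish both translation table roots once, before any secondary vCPU is started, rather than inside the launch loop) can be illustrated with a small user-space analogy. This is only a sketch under stated assumptions and not OSv code: `publish_runtime_roots()` and `launch_secondaries()` are hypothetical names, the atomics merely stand in for the `smpboot_ttbr0` and `smpboot_ttbr1` globals that `boot.S` reads, and `std::thread` creation already provides synchronization that a real secondary CPU entering `start_secondary_cpu` does not get. It therefore shows the ordering only, not the coherency problem itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the globals read by boot.S (not the OSv definitions).
static std::atomic<std::uint64_t> smpboot_ttbr0{0};
static std::atomic<std::uint64_t> smpboot_ttbr1{0};

// Analogue of switch_to_runtime_page_tables(): publish both roots exactly once,
// at the moment the primary CPU switches to the runtime tables.
void publish_runtime_roots(std::uint64_t root0, std::uint64_t root1)
{
    // A zero root is the failure mode described in the commit message.
    assert(root0 != 0 && root1 != 0);
    smpboot_ttbr0.store(root0, std::memory_order_release);
    smpboot_ttbr1.store(root1, std::memory_order_release);
}

// Analogue of smp_launch(): start n simulated secondary CPUs and return how many
// of them observed non-zero values for both roots. Note that the loop no longer
// writes the globals; it only starts the secondaries.
unsigned launch_secondaries(unsigned n)
{
    std::atomic<unsigned> ok{0};
    std::vector<std::thread> cpus;
    for (unsigned i = 0; i < n; ++i) {
        cpus.emplace_back([&ok] {
            // Analogue of init_secondary_pt: copy the published roots.
            auto t0 = smpboot_ttbr0.load(std::memory_order_acquire);
            auto t1 = smpboot_ttbr1.load(std::memory_order_acquire);
            if (t0 != 0 && t1 != 0) {
                ok.fetch_add(1);
            }
        });
    }
    for (auto& t : cpus) {
        t.join();
    }
    return ok.load();
}
```

In this sketch, calling `launch_secondaries()` before `publish_runtime_roots()` leaves every simulated secondary with zero roots, which mirrors the `ttbr0_el1`/`ttbr1_el1` values seen in the debugging session. Calling it afterwards lets all of them see the published values. The addresses passed to `publish_runtime_roots()` are arbitrary illustrative numbers.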