Re: Page fault outside of application

Nadav Har'El Tue, 23 Jan 2018 04:17:55 -0800

On Tue, Jan 23, 2018 at 12:40 PM, Rick Payne <ri...@rossfell.co.uk> wrote:


>
> A few moving parts, so not sure what is causing this - but trying to start
> an erlang application I'm seeing this:
>

I don't have any bright ideas, just a few small comments below;
hopefully they will help.


> eth0: 192.168.122.61
> page fault outside application, addr: 0x000010000a60fe28
> [registers]
> RIP: 0x0000000000492dd1 <elf::object::arch_relocate_jump_slot(unsigned
> int, void*, long)+67>
> C
> (gdb) bt
> #0  processor::cli_hlt () at arch/x64/processor.hh:248
> #1  0x0000000000209ac4 in arch::halt_no_interrupts () at
> arch/x64/arch.hh:48
> #2  0x0000000000499033 in osv::halt () at arch/x64/power.cc:24
> #3  0x000000000022c65f in abort (fmt=0xa23855 "Aborted\n") at
> runtime.cc:132
> #4  0x000000000022c522 in abort () at runtime.cc:98
> #5  0x00000000003c4b26 in mmu::vm_sigsegv (addr=17592360173096,
>     ef=0xffff800104713068) at core/mmu.cc:1316
> #6  0x00000000003c4bc2 in mmu::vm_fault (addr=17592360173096,
>     ef=0xffff800104713068) at core/mmu.cc:1330
> #7  0x00000000004887fd in page_fault (ef=0xffff800104713068)
>     at arch/x64/mmu.cc:38
> #8  <signal handler called>
> #9  0x0000000000492dd1 in elf::object::arch_relocate_jump_slot (
>     this=0xffffa001042d9e00, sym=1, addr=0x10000a60fe28, addend=0)
>     at arch/x64/arch-elf.cc:109
>

This writes to "addr", which looks like a reasonable address (it doesn't
look like junk).
In object::resolve_pltgot() you can see that addr is _base + slot.r_offset.
Maybe you can print both and check, with "nm" or "readelf" on the object
being loaded, whether this offset makes sense (in the PLT section)?
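
As a rough sketch of that check (not OSv code): the helpers below are
hypothetical names of mine, and the load base is only a placeholder that
should be replaced with the real _base printed in gdb (e.g. with "p/x" in
the resolve_pltgot frame). The two addresses are copied from the trace
above. For a jump-slot relocation I would expect the resulting offset to
fall inside the GOT entries used by the PLT, as listed by "readelf -S" and
"readelf -r".

```cpp
#include <cstdint>

// Hypothetical helpers (not OSv APIs) for sanity-checking a faulting
// jump-slot address against the relation addr = _base + slot.r_offset.

// Recover r_offset from the faulting address and the object's load base.
constexpr std::uint64_t slot_offset(std::uint64_t addr, std::uint64_t base) {
    return addr - base;
}

// True if `offset` lies inside a section starting at `sec_addr` that is
// `sec_size` bytes long (the Address and Size columns of `readelf -S`,
// read as offsets from the load base of a position-independent object).
constexpr bool in_section(std::uint64_t offset, std::uint64_t sec_addr,
                          std::uint64_t sec_size) {
    return offset >= sec_addr && offset - sec_addr < sec_size;
}

// Values copied from the backtrace above.
constexpr std::uint64_t fault_addr  = 0x000010000a60fe28ULL; // "page fault ... addr:"
constexpr std::uint64_t frame5_addr = 17592360173096ULL;     // mmu::vm_sigsegv(addr=...)

// ASSUMPTION: placeholder load base, NOT taken from the trace.
// Substitute the value of _base for the object being relocated.
constexpr std::uint64_t assumed_base = 0x100000000000ULL;

// gdb prints the same address in decimal (frame #5) and in hex (frame #9).
static_assert(fault_addr == frame5_addr,
              "decimal and hex forms of the fault address should agree");

// Offset to look up in the readelf output, under the assumed base.
constexpr std::uint64_t offset_to_check = slot_offset(fault_addr, assumed_base);
```

With the real base substituted, offset_to_check is the number to compare
against the section ranges; in_section() just spells out that comparison.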

If the address is correct, maybe we have some sort of TLB flush problem or
something - we mapped the new area but some CPUs don't see it yet, e.g.,
from something like
https://github.com/cloudius-systems/osv/commit/7e38453390d6c0164a72e30b2616b0f3c3025349
Can you reproduce this bug? If you can, you can confirm (or rule out) this
wild guess by changing in
arch/x64/mmu.cc, flush_tlb_all(), the line

if (sched::thread::current()->is_app())

to if(false).

If the bug goes away, the two may be related. If it doesn't go away, then
it's not related.

But this is just a wild guess - probably wrong... I can't think of a better
explanation now.


> #10 0x00000000003fdfd7 in elf::object::resolve_pltgot (
>     this=0xffffa001042d9e00, index=0) at core/elf.cc:692
> #11 0x00000000004021ca in elf_resolve_pltgot (index=0,
> obj=0xffffa001042d9e00)
>     at core/elf.cc:1538
> #12 0x000000000048727d in __elf_resolve_pltgot () at arch/x64/elf-dl.S:47
> #13 0xffffa001042d9e00 in ?? ()
>

This is strange: it looks like it's running dynamically-generated code,
which calls getenv()?

> #14 0x00000000042d9e00 in ?? ()
> #15 0x0000000000000000 in ?? ()
>
> Any pointers as to how to debug this further? It seems to be trying to
> resolve symbols in 'erlexec' - specifically getenv.
>
> Cheers
> Rick
>
> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to osv-dev+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
