Hi,
On 23/01/18 20:16, Nadav Har'El wrote:
I don't have any bright ideas, just a few small comments below;
hopefully (?) they will help somewhat...
Appreciated...
This writes to "addr", which seems to be a reasonable address (it doesn't
look like junk).
In object::resolve_pltgot() you can see that addr is _base +
slot.r_offset. Maybe you can print both and check, with "nm"/"readelf"
on the object being loaded, whether this offset makes sense (i.e., falls
in the GOT area used by the PLT)?
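As a rough illustration of that check (a sketch, not OSv code): for a position-independent object, the slot written for a jump-slot relocation is the load base plus r_offset, and the link-time offset can be compared against a section's Address/Size range as listed by `readelf -S`. The function names, and the section bounds used in the test, are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// For a position-independent object loaded at `base`, the GOT slot patched
// for a jump-slot relocation lives at base + r_offset.
std::uintptr_t slot_address(std::uintptr_t base, std::uint64_t r_offset) {
    return base + r_offset;
}

// True if a link-time offset lies inside a section spanning
// [sh_addr, sh_addr + sh_size), e.g. values taken from `readelf -S`.
bool offset_in_section(std::uint64_t r_offset, std::uint64_t sh_addr,
                       std::uint64_t sh_size) {
    return r_offset >= sh_addr && r_offset - sh_addr < sh_size;
}
```

In practice you would plug in the real Address and Size columns of the relevant section of the binary being loaded.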
So that made sense as far as I can see:
(gdb)
#9 0x0000000000492c7b in elf::object::arch_relocate_jump_slot (
this=0xffffa0010327b400, sym=1, addr=0x10000aa0fe28, addend=0)
at arch/x64/arch-elf.cc:109
109 *static_cast<void**>(addr) = symsym.relocated_addr();
(gdb) p symsym.obj._base
$1 = (void *) 0x0
(gdb) up
#10 0x00000000003fdfd7 in elf::object::resolve_pltgot (
this=0xffffa0010327b400, index=0) at core/elf.cc:692
692 if (!arch_relocate_jump_slot(sym, addr, slot.r_addend)) {
(gdb) p slot.r_offset
$2 = 2162216
(gdb) p/x slot.r_offset
$3 = 0x20fe28
(gdb)
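A quick arithmetic check on the gdb values above, assuming addr == _base + r_offset as computed in resolve_pltgot(): subtracting r_offset from addr gives the implied load base of the object being relocated. The constant names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the gdb session above.
constexpr std::uint64_t addr = 0x10000aa0fe28;  // frame #9, addr=
constexpr std::uint64_t r_offset = 2162216;     // frame #10, p slot.r_offset

// Implied load base, assuming addr == _base + r_offset.
constexpr std::uint64_t implied_base = addr - r_offset;

// The decimal and hex forms printed by gdb agree.
static_assert(r_offset == 0x20fe28, "p and p/x output should match");
```

The implied base works out to 0x10000a800000, which is page aligned and so consistent with a genuine load address; printing `_base` in frame #10 should show the same value if the assumption holds.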
$ readelf -a _build/default/rel/dbgp_webapi/erts-9.0.5/bin/erlexec | grep 20fe28
00000020fe28 000100000007 R_X86_64_JUMP_SLO 0000000000000000 getenv@GLIBC_2.2.5 + 0
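The readelf line can also be decoded by hand. ELF64 stores the symbol index in the upper 32 bits of r_info and the relocation type in the lower 32 bits, which is what the ELF64_R_SYM and ELF64_R_TYPE macros in <elf.h> compute. The sketch below reimplements those two macros as functions; the helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Symbol index: upper 32 bits of r_info (like ELF64_R_SYM).
constexpr std::uint32_t r_sym(std::uint64_t info) {
    return static_cast<std::uint32_t>(info >> 32);
}

// Relocation type: lower 32 bits of r_info (like ELF64_R_TYPE).
constexpr std::uint32_t r_type(std::uint64_t info) {
    return static_cast<std::uint32_t>(info & 0xffffffffu);
}

constexpr std::uint64_t kInfo = 0x000100000007;  // Info column from readelf
constexpr std::uint32_t kJumpSlot = 7;           // R_X86_64_JUMP_SLOT (x86-64 psABI)
```

For 0x000100000007 this gives symbol index 1, matching sym=1 in frame #9, and type 7, which is R_X86_64_JUMP_SLOT (readelf truncates the name to R_X86_64_JUMP_SLO).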
If the address is correct, maybe we have some sort of TLB flush problem:
we mapped the new area, but some CPUs don't see the mapping yet, e.g.,
because of something like
https://github.com/cloudius-systems/osv/commit/7e38453390d6c0164a72e30b2616b0f3c3025349
Can you reproduce this bug? If so, you can confirm (or rule out) this
wild guess by changing the line
if (sched::thread::current()->is_app())
in flush_tlb_all() in arch/x64/mmu.cc to
if (false)
If the bug goes away, it may be related. If it doesn't go away, then
it's not related.
I tried that, same crash.
But this is just a wild guess - probably wrong... I can't think of a
better explanation now.
#10 0x00000000003fdfd7 in elf::object::resolve_pltgot (
this=0xffffa001042d9e00, index=0) at core/elf.cc:692
#11 0x00000000004021ca in elf_resolve_pltgot (index=0, obj=0xffffa001042d9e00)
at core/elf.cc:1538
#12 0x000000000048727d in __elf_resolve_pltgot () at arch/x64/elf-dl.S:47
#13 0xffffa001042d9e00 in ?? ()
This is strange: is it running dynamically generated code that calls
getenv()?
I don't believe so. I think this is right where erlexec is being
started. I'll work on verifying that now.
I have a start-otp.so which loads erlexec and starts a pthread to run
it, so my hypothesis is that this happens at the point where start-otp
is loading the erlexec library.
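For context on the frames above: __elf_resolve_pltgot and resolve_pltgot are the lazy-binding slow path, which runs the first time a PLT entry such as getenv's is called; it looks up the symbol and patches the corresponding GOT slot so later calls go straight to the target. The toy model below imitates that flow with plain C++ function pointers. It is a simplified sketch, not how OSv implements it, and every name in it is hypothetical. One further hedged observation: frame #13's "address" 0xffffa001042d9e00 is identical to obj in frame #11, which would fit the unwinder misreading the object pointer pushed by the PLT stub as a return address, rather than genuinely dynamically generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Every "imported" function in this toy model has this signature.
using fn_t = int (*)(int);

// Stand-in for a function exported by another object (think getenv()).
int exported_twice(int x) { return 2 * x; }

// Hypothetical relocation record: the symbol a GOT slot should point to.
struct jump_slot {
    const char* symbol;
};

constexpr std::size_t kSlots = 1;
const jump_slot relocs[kSlots] = {{"twice"}};
fn_t got[kSlots];        // toy GOT: one function pointer per jump slot
int resolver_calls = 0;  // counts how often the slow path ran

// Toy symbol lookup; a real dynamic linker searches the loaded objects.
fn_t lookup(const char* name) {
    if (std::strcmp(name, "twice") == 0) {
        return exported_twice;
    }
    return nullptr;
}

// Slow path, similar in spirit to resolve_pltgot(index): look up the
// symbol, patch the GOT slot, and hand back the real target.
fn_t resolve_slot(std::size_t index) {
    ++resolver_calls;
    fn_t target = lookup(relocs[index].symbol);
    if (target == nullptr) {
        std::fprintf(stderr, "unresolved symbol: %s\n", relocs[index].symbol);
        std::abort();
    }
    got[index] = target;  // the GOT store, analogous to arch_relocate_jump_slot()
    return target;
}

// Initial slot contents: a per-slot stub that knows its own index, much as
// each PLT entry pushes its relocation index before jumping to the shared
// resolver entry point.
template <std::size_t Index>
int lazy_stub(int x) {
    return resolve_slot(Index)(x);
}

// Point every slot at its stub, as a loader does before first use.
void init_got() {
    got[0] = lazy_stub<0>;
}
```

After init_got(), the first call through got[0] runs resolve_slot() once and rewrites the slot; the second call skips the resolver entirely.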
Cheers,
Rick
--
You received this message because you are subscribed to the Google Groups "OSv
Development" group.