Re: Page fault outside of application

2018-01-30 Thread Rick Payne
On Tue, 2018-01-30 at 11:47 +0200, Nadav Har'El wrote:
> I have a vague feeling that fix_permissions() cannot just work on the
> whole object it needs to know which of the PT_LOAD segments (see
> file::load_segment()) the RELRO falls in, but I'm hazy on the
> details. Maybe even file::load_segment() maps the segment with the
> wrong alignment? But unfortunately, I cannot even reproduce the
> problem you are seeing (even though I do have gcc 7.2.1), let alone
> fix it.

Thanks for the insights. I should be able to make our erlang/otp repo
public shortly, then you can try and reproduce it with the code base
I'm using.

I'll try and get to it this week...

Rick

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Page fault outside of application

2018-01-30 Thread Nadav Har'El
On Mon, Jan 29, 2018 at 3:51 PM, Nadav Har'El  wrote:

>
> On Mon, Jan 29, 2018 at 12:16 PM, Rick Payne  wrote:
>
>>
>> Maybe I'm not following. The GNU_RELRO sections look the same between
>> the 2 versions of erlexec. First one (-ubuntu17.10) fails, second one
>> is fine:
>>
>> rickp@mo:~$ readelf --headers /usr/local/packages/OTP-20.0.5-OSv-ubuntu17.10/erts-9.0.5/bin/erlexec | grep -2 RELRO
>>   GNU_STACK  0x 0x 0x
>>              0x 0x  RW 0x10
>>   GNU_RELRO  0xebe8 0x0020ebe8 0x0020ebe8
>>              0x0418 0x0418  R  0x1
>>
>> rickp@mo:~$ readelf --headers /usr/local/packages/OTP-20.0.5-OSv/erts-9.0.5/bin/erlexec | grep -2 RELRO
>>   GNU_STACK  0x 0x 0x
>>              0x 0x  RW 0x10
>>   GNU_RELRO  0xec08 0x0020ec08 0x0020ec08
>>              0x03f8 0x03f8  R  0x1
>>
>> > Only when "-z now" is used during linking (DT_BIND_NOW object flag)
>> > do we do all the function lookups on startup (see
>> > object::relocate_pltgot()) and then, it's ok that the .GOT.PLT is
>> > also marked RELRO and made read-only.
>> >
>> > I'm *guessing* (with no evidence) that one of the following happened:
>> > 1. Your compiler defaults to "full relro" (-Wl,-z,now -Wl,-z,relro)
>> > but for some reason object::relocate_pltgot() doesn't recognize the
>> > bind_now.
>>
>> So there is definitely a difference in the binaries. In the one that
>> fails, getenv is defined like this, in the .rela.plt section:
>>
>> 0020ee30  00010007 R_X86_64_JUMP_SLO  getenv@GLIBC_2.2.5 + 0
>>
>
> So this address,  0020ee30, is beyond the end of the GNU_RELRO
> section, 0x0020ebe8
> So it should NOT have been made read-only.
>
> Maybe we're making a mistake with page alignments... When these addresses
> are translated to memory addresses,
> they need to be on different pages. Maybe we're doing something wrong?
>

I have a feeling - though I am really rusty on the details - that we're
indeed doing something wrong here (and above we can
see the evidence that something is wrong - an address outside the GNU_RELRO
area was made read-only).
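
As a sanity check on the numbers quoted above, here is a minimal Python sketch that tests whether the getenv jump slot lies inside the PT_GNU_RELRO range of the failing binary. It assumes the readelf columns follow the usual program-header layout (Offset, VirtAddr, PhysAddr on the first line, FileSiz, MemSiz on the second), so that 0x20ebe8 is the start of the region and 0x418 its size. The names `in_relro`, `RELRO_VADDR` and so on are illustrative only and are not OSv code.

```python
# Values copied from the readelf output quoted in this thread (failing build).
# Assumption: 0x20ebe8 is VirtAddr (the START of the region), 0x418 is MemSiz.
RELRO_VADDR = 0x20EBE8
RELRO_MEMSZ = 0x418
GETENV_SLOT = 0x20EE30  # r_offset of the R_X86_64_JUMP_SLO entry for getenv


def in_relro(addr, vaddr=RELRO_VADDR, memsz=RELRO_MEMSZ):
    """Return True if addr falls in the half-open range [vaddr, vaddr + memsz)."""
    return vaddr <= addr < vaddr + memsz


relro_end = RELRO_VADDR + RELRO_MEMSZ
print(hex(relro_end))         # end of the RELRO region
print(in_relro(GETENV_SLOT))  # is the jump slot inside it?
```

Under that reading the region ends at 0x20f000, which is a page boundary, and 0x20ee30 falls inside it. If so, the slot being read-only would be consistent with the program header rather than with an alignment mistake. If the columns mean something else, this conclusion does not hold.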

https://lists.kernelnewbies.org/pipermail/kernelnewbies/2013-March/007647.html
explains (though I didn't fully understand...) how although the RELRO
addresses look like they are not page aligned, they actually are.

But if I add printouts in fix_permissions() I see information like start
0x216d68 (not page aligned!) sz 0x298, and then we "align" those to vstart
0x216000 memsz 0x1000 but when doing that, we make too much memory
read-only!
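
To make the rounding concrete, the following Python sketch reproduces the numbers from that printout. It is not the fix_permissions() code; the helper names and the assumption of 4 KiB pages are mine, and the rounding formula is simply the obvious one that yields the printed vstart and memsz.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, as on x86-64


def align_down(addr, align=PAGE_SIZE):
    """Round addr down to a multiple of align (align must be a power of two)."""
    return addr & ~(align - 1)


def align_up(addr, align=PAGE_SIZE):
    """Round addr up to a multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def protected_range(start, size):
    """Page-granular (vstart, memsz) covering the byte range [start, start + size)."""
    vstart = align_down(start)
    vend = align_up(start + size)
    return vstart, vend - vstart


start, sz = 0x216D68, 0x298  # values from the printout above
vstart, memsz = protected_range(start, sz)
extra_before = start - vstart  # bytes below the RELRO start that also become read-only
print(hex(vstart), hex(memsz), hex(extra_before))
```

With these inputs the end of the region (0x216d68 + 0x298 = 0x217000) is already page aligned, so all of the extra read-only memory comes from rounding the start down: the 0xd68 bytes between 0x216000 and 0x216d68.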

I have a vague feeling that fix_permissions() cannot just work on the whole
object; it needs to know which of the PT_LOAD segments (see
file::load_segment()) the RELRO falls in, but I'm hazy on the details.
Maybe even file::load_segment() maps the segment with the wrong alignment?
But unfortunately, I cannot even reproduce the problem you are seeing (even
though I do have gcc 7.2.1), let alone fix it.



>
> I don't remember now how these offsets are translated to memory addresses,
> we should review that code.
>
>



Re: Page fault outside of application

2018-01-29 Thread Rick Payne
On Mon, 2018-01-29 at 11:43 +0200, Nadav Har'El wrote:
> 1. Your compiler defaults to "full relro" (-Wl,-z,now -Wl,-z,relro)
> but for some reason object::relocate_pltgot() doesn't recognize the
> bind_now.

FWIW, on both working and non-working builds, I see '-pie -z now -z
relro' being passed to the linker stage for erlexec. I see very little
difference between the two :(

Rick



Re: Page fault outside of application

2018-01-29 Thread Rick Payne
On Mon, 2018-01-29 at 12:27 +0200, Nadav Har'El wrote:
> Both versions used "-pie", not "-shared"?

Should be, yes. It's exactly the same build setup and the Makefile shows
'-pie' for LDFLAGS.

I don't think gcc7.2 contains any of the -mindirect-branch changes, so
that's a red herring. I'll continue poking at this tomorrow (it's getting
late here).

Rick



Re: Page fault outside of application

2018-01-29 Thread Rick Payne
On Mon, 2018-01-29 at 11:43 +0200, Nadav Har'El wrote:
> 
> Hmm, I don't know, I wasn't aware anything like that changed.
> We usually change parts of the object marked by PT_GNU_RELRO to read-
> only in object::fix_permissions(), I'm guessing (but didn't check)
> this what caused the read-only page you're seeing.

I'll take a look.

> The compiler usually does NOT mark the .GOT.PLT section - for
> function lookup - as RELRO, because this needs to be modified after
> startup, every time a function is used for the first time;

Maybe I'm not following. The GNU_RELRO sections look the same between
the 2 versions of erlexec. First one (-ubuntu17.10) fails, second one
is fine:

rickp@mo:~$ readelf --headers /usr/local/packages/OTP-20.0.5-OSv-ubuntu17.10/erts-9.0.5/bin/erlexec | grep -2 RELRO
  GNU_STACK  0x 0x 0x
             0x 0x  RW 0x10
  GNU_RELRO  0xebe8 0x0020ebe8 0x0020ebe8
             0x0418 0x0418  R  0x1

rickp@mo:~$ readelf --headers /usr/local/packages/OTP-20.0.5-OSv/erts-9.0.5/bin/erlexec | grep -2 RELRO
  GNU_STACK  0x 0x 0x
             0x 0x  RW 0x10
  GNU_RELRO  0xec08 0x0020ec08 0x0020ec08
             0x03f8 0x03f8  R  0x1

> Only when "-z now" is used during linking (DT_BIND_NOW object flag)
> do we do all the function lookups on startup (see
> object::relocate_pltgot()) and then, it's ok that the .GOT.PLT is
> also marked RELRO and made read-only.
> 
> I'm *guessing* (with no evidence) that one of the following happened:
> 1. Your compiler defaults to "full relro" (-Wl,-z,now -Wl,-z,relro)
> but for some reason object::relocate_pltgot() doesn't recognize the
> bind_now.

So there is definitely a difference in the binaries. In the one that
fails, getenv is defined like this, in the .rela.plt section:

0020ee30  00010007 R_X86_64_JUMP_SLO  getenv@GLIBC_2.2.5 + 0

But in the one that works, it's like this, in the .rela.dyn section:

0020ee28  00010006 R_X86_64_GLOB_DAT  getenv@GLIBC_2.2.5 + 0

I see LDFLAGS being set to '-pie' so I don't really understand why the
first one is a jump slot, vs what I'd expect (GLOB_DAT).
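
For readers less familiar with these entries: the Info column packs a symbol index and a relocation type, and the two binaries differ only in the type. The Python sketch below shows the packing used by ELF64 (symbol index in the upper 32 bits, type in the lower 32). The info values are built from the symbol index 1 and the two relocation types rather than copied from the readelf lines, whose digits appear truncated in this archive; the function names are illustrative.

```python
# Relocation type numbers from the x86-64 psABI.
R_X86_64_GLOB_DAT = 6
R_X86_64_JUMP_SLOT = 7

TYPE_NAMES = {
    R_X86_64_GLOB_DAT: "R_X86_64_GLOB_DAT",
    R_X86_64_JUMP_SLOT: "R_X86_64_JUMP_SLOT",
}


def elf64_r_info(sym, rtype):
    """Pack a symbol index and relocation type the way ELF64_R_INFO does."""
    return (sym << 32) | rtype


def elf64_r_sym(info):
    """Symbol table index stored in the upper 32 bits of r_info."""
    return info >> 32


def elf64_r_type(info):
    """Relocation type stored in the lower 32 bits of r_info."""
    return info & 0xFFFFFFFF


# Constructed for illustration: symbol index 1 with each relocation type.
info_failing = elf64_r_info(1, R_X86_64_JUMP_SLOT)
info_working = elf64_r_info(1, R_X86_64_GLOB_DAT)

for info in (info_failing, info_working):
    print(elf64_r_sym(info), TYPE_NAMES[elf64_r_type(info)])
```

A JUMP_SLOT entry lives in .rela.plt and is the kind that can be resolved lazily through the PLT on first call, whereas a GLOB_DAT entry in .rela.dyn is filled in when the object is loaded. That is why only the first form can end up writing to the GOT after startup.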

> 2. Somehow the loop in object::relocate_pltgot() missed some of the
> functions - like getenv() 

I think it's suspicious that getenv() is the first thing to be fixed up,
so I suspect it's more fundamental.

> 3. Something in the new compiler changed the meaning of PT_GNU_RELRO
> or added other flags which confused object::fix_permissions() and
> caused it to make a page read-only when it shouldn't have.

Ok. I think I need to do some more reading on elf...

Rick



Re: Page fault outside of application

2018-01-29 Thread Nadav Har'El
On Mon, Jan 29, 2018 at 11:20 AM, Rick Payne  wrote:

> On Mon, 2018-01-29 at 10:54 +0200, Nadav Har'El wrote:
>
> This all seems reasonable.
> Maybe we somehow got the PLT becoming read-only, so we are getting a
> pagefault trying to write to it?
> Can you please try in gdb "osv mmap" and look at the mapping which
> includes the faulting address (0x1aa0fe28), is it read-write or
> read-only?
>
>
> New build, so a slightly different address, but in the same range (and it's
> the same crash). I think you've nailed it though:
>
> (gdb) up
> #6  0x003c451c in mmu::vm_fault (addr=17592355974704,
> ef=0x83d82068) at core/mmu.cc:1330
> 1330 vm_sigsegv(addr, ef);
> (gdb) p/x addr
> $1 = 0x1a20ee30
>
> 0x1a00 0x1a00f000 [60.0 kB]flags=fmF  perm=rx 
>   offset=0x
> path=/otp/erts-9.0.5/bin/erlexec
> 0x1a20e000 0x1a20f000 [4.0 kB] flags=fmF  perm=r  
>   offset=0xe000
> path=/otp/erts-9.0.5/bin/erlexec
> 0x1a20f000 0x1a21 [4.0 kB] flags=fmF  perm=rw 
>   offset=0xf000
> path=/otp/erts-9.0.5/bin/erlexec
>
> That address is in the second segment, and thus marked 'r'. Is gcc7 doing
> something different that's incompatible with the ELF loader in OSv? Related
> to the Intel fiasco?
>

Hmm, I don't know, I wasn't aware anything like that changed.
We usually change parts of the object marked by PT_GNU_RELRO to read-only
in object::fix_permissions(). I'm guessing (but didn't check) this is what
caused the read-only page you're seeing.
The compiler usually does NOT mark the .GOT.PLT section - for function
lookup - as RELRO, because this needs to be modified after startup, every
time a function is used for the first time;
Only when "-z now" is used during linking (DT_BIND_NOW object flag) do we
do all the function lookups on startup (see object::relocate_pltgot()) and
then, it's ok that the .GOT.PLT is also marked RELRO and made read-only.

I'm *guessing* (with no evidence) that one of the following happened:
1. Your compiler defaults to "full relro" (-Wl,-z,now -Wl,-z,relro) but for
some reason object::relocate_pltgot() doesn't recognize the bind_now.
2. Somehow the loop in object::relocate_pltgot() missed some of the
functions - like getenv()
3. Something in the new compiler changed the meaning of PT_GNU_RELRO or
added other flags which confused object::fix_permissions() and caused it to
make a page read-only when it shouldn't have.
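
To illustrate guess 1: a linker can request eager binding in more than one way in the .dynamic section, so a loader that looks at only one of them could miss it. The Python sketch below checks the three encodings defined by the ELF specification and the GNU extensions (the DT_BIND_NOW tag, the DF_BIND_NOW bit in DT_FLAGS, and the DF_1_NOW bit in DT_FLAGS_1). It is only an illustration with a hypothetical function name and hypothetical sample inputs; which of these object::relocate_pltgot() actually honours is not established in this thread.

```python
# Dynamic-section tags and flag bits (values from the ELF gABI / GNU extensions).
DT_BIND_NOW = 24
DT_FLAGS = 30
DT_FLAGS_1 = 0x6FFFFFFB
DF_BIND_NOW = 0x8
DF_1_NOW = 0x1


def wants_bind_now(dynamic):
    """Return True if any (d_tag, d_val) pair requests eager symbol binding.

    `dynamic` is an iterable of (tag, value) pairs as they would be read
    from an object's .dynamic section.
    """
    for tag, val in dynamic:
        if tag == DT_BIND_NOW:
            return True
        if tag == DT_FLAGS and val & DF_BIND_NOW:
            return True
        if tag == DT_FLAGS_1 and val & DF_1_NOW:
            return True
    return False


# Hypothetical dynamic sections, for illustration only.
print(wants_bind_now([(DT_BIND_NOW, 0)]))          # classic tag
print(wants_bind_now([(DT_FLAGS, DF_BIND_NOW)]))   # flag bit in DT_FLAGS
print(wants_bind_now([(DT_FLAGS_1, DF_1_NOW)]))    # flag bit in DT_FLAGS_1
print(wants_bind_now([(DT_FLAGS, 0x2)]))           # other flags only
```

Running `readelf -d` on the two erlexec binaries and comparing the BIND_NOW, FLAGS and FLAGS_1 entries would show which encoding each build uses.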

Good luck (and thanks) on figuring this out,
Nadav.



Re: Page fault outside of application

2018-01-29 Thread Rick Payne
On Mon, 2018-01-29 at 10:54 +0200, Nadav Har'El wrote:
> This all seems reasonable.
> Maybe we somehow got the PLT becoming read-only, so we are getting a
> pagefault trying to write to it?
> Can you please try in gdb "osv mmap" and look at the mapping which
> includes the faulting address (0x1aa0fe28), is it read-write or
> read-only?

New build, so a slightly different address, but in the same range (and
it's the same crash). I think you've nailed it though:

(gdb) up
#6  0x003c451c in mmu::vm_fault (addr=17592355974704,
ef=0x83d82068) at core/mmu.cc:1330
1330 vm_sigsegv(addr, ef);
(gdb) p/x addr
$1 = 0x1a20ee30

0x1a00 0x1a00f000 [60.0 kB] flags=fmF  perm=rx  offset=0x path=/otp/erts-9.0.5/bin/erlexec
0x1a20e000 0x1a20f000 [4.0 kB] flags=fmF  perm=r  offset=0xe000 path=/otp/erts-9.0.5/bin/erlexec
0x1a20f000 0x1a21 [4.0 kB] flags=fmF  perm=rw  offset=0xf000 path=/otp/erts-9.0.5/bin/erlexec

That address is in the second segment, and thus marked 'r'. Is gcc7
doing something different that's incompatible with the ELF loader in
OSv? Related to the Intel fiasco?

Cheers,
Rick



Re: Page fault outside of application

2018-01-29 Thread Nadav Har'El
On Wed, Jan 24, 2018 at 11:07 AM, Rick Payne  wrote:

> Hi,
>
> On 23/01/18 20:16, Nadav Har'El wrote:
>
>> I don't have any bright ideas, but just a few small comments below,
>> hopefully (?) they will help something...
>>
>
> Appreciated...
>
>> This writes in "addr", which seems a reasonable address (doesn't seem like
>> junk).
>> In object::resolve_pltgot() you can see the addr is _base + slot.r_offset
>> maybe you can print them and see with "nm"/"readelf" of the object being
>> loaded if this offset address makes sense (in the PLT section)?
>>
>
> So that made sense as far as I can see:
>
> (gdb)
> #9  0x00492c7b in elf::object::arch_relocate_jump_slot (
> this=0xa0010327b400, sym=1, addr=0x1aa0fe28, addend=0)
> at arch/x64/arch-elf.cc:109
> 109 *static_cast(addr) = symsym.relocated_addr();
> (gdb) p symsym.obj._base
> $1 = (void *) 0x0
> (gdb) up
> #10 0x003fdfd7 in elf::object::resolve_pltgot (
> this=0xa0010327b400, index=0) at core/elf.cc:692
> 692 if (!arch_relocate_jump_slot(sym, addr, slot.r_addend)) {
> (gdb) p slot.r_offset
> $2 = 2162216
> (gdb) p/x slot.r_offset
> $3 = 0x20fe28
> (gdb)
>
> $ readelf -a _build/default/rel/dbgp_webapi/erts-9.0.5/bin/erlexec | grep
> 20fe28
> 0020fe28  00010007 R_X86_64_JUMP_SLO 
> getenv@GLIBC_2.2.5 + 0
>

This all seems reasonable.
Maybe we somehow got the PLT becoming read-only, so we are getting a
pagefault trying to write to it?
Can you please try in gdb "osv mmap" and look at the mapping which includes
the faulting address (0x1aa0fe28), is it read-write or read-only?



Re: Page fault outside of application

2018-01-24 Thread Rick Payne

On 24/01/18 17:09, Rick Payne wrote:
> Hi Geraldo,
>
> On 23/01/18 19:58, Geraldo Netto wrote:
>> Hello Rick,
>>
>> Rick, could you please, provide the full output with the -V ?
>> eg: scripts/run.py  -V
>
> It's a custom build, I'm running it via qemu direct.

Here it is:

qemu-system-x86_64: -mon chardev=stdio,mode=readline,default: option 
'default' does nothing and is deprecated

OSv v0.24-497-gbb2c4e4e-rebar3
2 CPUs detected
Firmware vendor: SeaBIOS
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
eth0: ethernet address: 00:11:11:11:11:01
virtio-blk: Add blk device instances 0 as vblk0, devsize=524288000
random: virtio-rng registered as a source.
random: intel drng, rdrand registered as a source.
random:  initialized
VFS: unmounting /dev
VFS: mounting zfs at /zfs
zfs: mounting osv/zfs from device /dev/vblk0.1
random: device unblocked.
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
program zpool.so returned 1
BSD shrinker: event handler list found: 0xa0010092a300
BSD shrinker found: 1
BSD shrinker: unlocked, running
[I/32 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1711999488]
[I/32 dhcp]: Waiting for IP...
[I/32 dhcp]: Broadcasting DHCPDISCOVER message with xid: [507118272]
[I/32 dhcp]: Waiting for IP...
[I/32 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1416187925]
[I/32 dhcp]: Waiting for IP...
[I/212 dhcp]: Received DHCPOFFER message from DHCP server: 192.168.122.1 
regarding offerred IP address: 192.168.122.61
[I/212 dhcp]: Broadcasting DHCPREQUEST message with xid: [1416187925] to 
SELECT offered IP: 192.168.122.61

[I/212 dhcp]: DHCP received hostname: osv

[I/212 dhcp]: Received DHCPACK message from DHCP server: 192.168.122.1 
regarding offerred IP address: 192.168.122.61
[I/212 dhcp]: Server acknowledged IP 192.168.122.61 for interface eth0 
with time to lease in seconds: 3600

eth0: 192.168.122.61
[I/212 dhcp]: Configuring eth0: ip 192.168.122.61 subnet mask 
255.255.255.0 gateway 192.168.122.1 MTU 1500

[I/212 dhcp]: Set hostname to: osv
Running from /init/00-cmdline: /usr/mgmt/cloud-init.so;

Running from /init/30-auto-02: /libhttpserver-api.so &!
httpserver: loaded plugin from path: 
/usr/mgmt/plugins/libhttpserver-api_fs.so

page fault outside application, addr: 0x1a20fe28
[registers]
RIP: 0x00492c7b 

RFL: 0x00010206  CS:  0x0008  SS:  0x0010
RAX: 0x1a20fe28  RBX: 0xa00104529530  RCX: 0x65746567  RDX: 0x006c38bb
RSI: 0x1a0011d6  RDI: 0x00dd8290  RBP: 0x202fe690  R8:  0x0010
R9:  0x900100735000  R10: 0x202feb20  R11: 0x0001e200  R12: 0x0009
R13: 0x900100735000  R14: 0x202feb20  R15: 0x0001e200  RSP: 0x202fe660

Aborted


Cheers,
Rick



Re: Page fault outside of application

2018-01-24 Thread Rick Payne

Hi,

On 23/01/18 20:16, Nadav Har'El wrote:
> I don't have any bright ideas, but just a few small comments below,
> hopefully (?) they will help something...

Appreciated...

> This writes in "addr", which seems a reasonable address (doesn't seem
> like junk).
> In object::resolve_pltgot() you can see the addr is _base +
> slot.r_offset maybe you can print them and see with "nm"/"readelf" of
> the object being loaded if this offset address makes sense (in the PLT
> section)?


So that made sense as far as I can see:

(gdb)
#9  0x00492c7b in elf::object::arch_relocate_jump_slot (
this=0xa0010327b400, sym=1, addr=0x1aa0fe28, addend=0)
at arch/x64/arch-elf.cc:109
109 *static_cast(addr) = symsym.relocated_addr();
(gdb) p symsym.obj._base
$1 = (void *) 0x0
(gdb) up
#10 0x003fdfd7 in elf::object::resolve_pltgot (
this=0xa0010327b400, index=0) at core/elf.cc:692
692 if (!arch_relocate_jump_slot(sym, addr, slot.r_addend)) {
(gdb) p slot.r_offset
$2 = 2162216
(gdb) p/x slot.r_offset
$3 = 0x20fe28
(gdb)

$ readelf -a _build/default/rel/dbgp_webapi/erts-9.0.5/bin/erlexec | 
grep 20fe28
0020fe28  00010007 R_X86_64_JUMP_SLO  
getenv@GLIBC_2.2.5 + 0



> If the address is correct, maybe we have some sort of TLB flush problem or
> something - we mapped the new area but some CPUs don't see it yet, e.g.,
> from something like
> https://github.com/cloudius-systems/osv/commit/7e38453390d6c0164a72e30b2616b0f3c3025349
> Can you reproduce this bug? If you can, you can confirm (or rule out)
> this wild guess by changing in
> arch/x64/mmu.cc, flush_tlb_all(), the line
>
> if (sched::thread::current()->is_app())
>
> to if(false).
>
> If the bug goes away, it can be related. If it doesn't go away, then
> it's not related.

I tried that, same crash.

> But this is just a wild guess - probably wrong... I can't think of a
> better explanation now.
>
>> #10 0x003fdfd7 in elf::object::resolve_pltgot (
>>      this=0xa001042d9e00, index=0) at core/elf.cc:692
>> #11 0x004021ca in elf_resolve_pltgot (index=0,
>> obj=0xa001042d9e00)
>>      at core/elf.cc:1538
>> #12 0x0048727d in __elf_resolve_pltgot () at
>> arch/x64/elf-dl.S:47
>> #13 0xa001042d9e00 in ?? ()
>
> This is strange, it's running dynamically-generated code, which calls
> getenv()?


I don't believe so. I think this is right where erlexec is being 
started. I'll work on verifying that now.


I have a start-otp.so which loads the erlexec and sets off a pthread to 
run it, so my hypothesis is that this is at the point that start-otp is 
loading up the erlexec library.


Cheers,
Rick



Re: Page fault outside of application

2018-01-23 Thread Nadav Har'El
On Tue, Jan 23, 2018 at 12:40 PM, Rick Payne  wrote:

>
> A few moving parts, so not sure what is causing this - but trying to start
> an erlang application I'm seeing this:
>

I don't have any bright ideas, but just a few small comments below,
hopefully (?) they will help something...


> eth0: 192.168.122.61
> page fault outside application, addr: 0x1a60fe28
> [registers]
> RIP: 0x00492dd1  int, void*, long)+67>
> C
> (gdb) bt
> #0  processor::cli_hlt () at arch/x64/processor.hh:248
> #1  0x00209ac4 in arch::halt_no_interrupts () at
> arch/x64/arch.hh:48
> #2  0x00499033 in osv::halt () at arch/x64/power.cc:24
> #3  0x0022c65f in abort (fmt=0xa23855 "Aborted\n") at
> runtime.cc:132
> #4  0x0022c522 in abort () at runtime.cc:98
> #5  0x003c4b26 in mmu::vm_sigsegv (addr=17592360173096,
> ef=0x800104713068) at core/mmu.cc:1316
> #6  0x003c4bc2 in mmu::vm_fault (addr=17592360173096,
> ef=0x800104713068) at core/mmu.cc:1330
> #7  0x004887fd in page_fault (ef=0x800104713068)
> at arch/x64/mmu.cc:38
> #8  
> #9  0x00492dd1 in elf::object::arch_relocate_jump_slot (
> this=0xa001042d9e00, sym=1, addr=0x1a60fe28, addend=0)
> at arch/x64/arch-elf.cc:109
>

This writes in "addr", which seems a reasonable address (doesn't seem like
junk).
In object::resolve_pltgot() you can see the addr is _base + slot.r_offset
maybe you can print them and see with "nm"/"readelf" of the object being
loaded if this offset address makes sense (in the PLT section)?

If the address is correct, maybe we have some sort of TLB flush problem or
something - we mapped the new area but some CPUs don't see it yet, e.g.,
from something like
https://github.com/cloudius-systems/osv/commit/7e38453390d6c0164a72e30b2616b0f3c3025349
Can you reproduce this bug? If you can, you can confirm (or rule out) this
wild guess by changing in
arch/x64/mmu.cc, flush_tlb_all(), the line

if (sched::thread::current()->is_app())

to if(false).

If the bug goes away, it can be related. If it doesn't go away, then it's
not related.

But this is just a wild guess - probably wrong... I can't think of a better
explanation now.


> #10 0x003fdfd7 in elf::object::resolve_pltgot (
> this=0xa001042d9e00, index=0) at core/elf.cc:692
> #11 0x004021ca in elf_resolve_pltgot (index=0,
> obj=0xa001042d9e00)
> at core/elf.cc:1538
> #12 0x0048727d in __elf_resolve_pltgot () at arch/x64/elf-dl.S:47
> #13 0xa001042d9e00 in ?? ()
>

This is strange, it's running dynamically-generated code, which calls
getenv()?

> #14 0x042d9e00 in ?? ()
> #15 0x in ?? ()
>
> Any pointers as to how to debug this further? It seems to be trying to
> resolve symbols in 'erlexec' - specifically getenv.
>
> Cheers
> Rick
>
>



Re: Page fault outside of application

2018-01-23 Thread Geraldo Netto
Hello Rick,

Rick, could you please, provide the full output with the -V ?
eg: scripts/run.py  -V

I may be wrong, but erlexec may not work in OSv
because OSv does not provide fork(), execXX(), ...
Also, if I'm not mistaken, ELF support is incomplete, which means you
can only load native software in a dlopen() fashion.
Other friends from this list may provide more information/details and fix
any misinformation I might have given.


Kind Regards,

Geraldo Netto
Sapere Aude => Non dvcor, dvco
http://exdev.sf.net/

On 23 January 2018 at 08:40, Rick Payne  wrote:
>
> A few moving parts, so not sure what is causing this - but trying to start
> an erlang application I'm seeing this:
>
> eth0: 192.168.122.61
> page fault outside application, addr: 0x1a60fe28
> [registers]
> RIP: 0x00492dd1  void*, long)+67>
> C
> (gdb) bt
> #0  processor::cli_hlt () at arch/x64/processor.hh:248
> #1  0x00209ac4 in arch::halt_no_interrupts () at arch/x64/arch.hh:48
> #2  0x00499033 in osv::halt () at arch/x64/power.cc:24
> #3  0x0022c65f in abort (fmt=0xa23855 "Aborted\n") at runtime.cc:132
> #4  0x0022c522 in abort () at runtime.cc:98
> #5  0x003c4b26 in mmu::vm_sigsegv (addr=17592360173096,
> ef=0x800104713068) at core/mmu.cc:1316
> #6  0x003c4bc2 in mmu::vm_fault (addr=17592360173096,
> ef=0x800104713068) at core/mmu.cc:1330
> #7  0x004887fd in page_fault (ef=0x800104713068)
> at arch/x64/mmu.cc:38
> #8  
> #9  0x00492dd1 in elf::object::arch_relocate_jump_slot (
> this=0xa001042d9e00, sym=1, addr=0x1a60fe28, addend=0)
> at arch/x64/arch-elf.cc:109
> #10 0x003fdfd7 in elf::object::resolve_pltgot (
> this=0xa001042d9e00, index=0) at core/elf.cc:692
> #11 0x004021ca in elf_resolve_pltgot (index=0,
> obj=0xa001042d9e00)
> at core/elf.cc:1538
> #12 0x0048727d in __elf_resolve_pltgot () at arch/x64/elf-dl.S:47
> #13 0xa001042d9e00 in ?? ()
> #14 0x042d9e00 in ?? ()
> #15 0x in ?? ()
>
> Any pointers as to how to debug this further? It seems to be trying to
> resolve symbols in 'erlexec' - specifically getenv.
>
> Cheers
> Rick
>
