After you connect with gdb, can you run 'osv mmap' and send us the output?
Make sure you run 'osv syms' before it and dump the backtrace after. Please
see https://github.com/cloudius-systems/osv/wiki/Debugging-OSv for
details.
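For reference, a typical session (assuming the default release build paths and the gdb helpers loaded from scripts/loader.py as described on that wiki page) looks roughly like:

```
$ gdb build/release/loader.elf
(gdb) connect
(gdb) osv syms
(gdb) osv mmap
(gdb) bt
```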
BTW, can you build and run the OSv ZFS image on the host without Nix? As I
understand it, Nix is really just a layer on top of any Linux distribution, no?
I'm afraid I still don't quite understand what exactly Nix is.
On Monday, December 7, 2020 at 2:58:40 PM UTC-5 Matthew Kenigsberg wrote:
> (gdb) frame 18
> #18 0x000000004039c95a in elf::object::arch_relocate_jump_slot
> (this=this@entry=0xffffa0000110fa00, sym=...,
> addr=addr@entry=0x100000040ca8, addend=addend@entry=0) at
> arch/x64/arch-elf.cc:172
> 172 *static_cast<void**>(addr) = sym.relocated_addr();
> (gdb) print _pathname
> $14 = {static npos = 18446744073709551615,
> _M_dataplus = {<std::allocator<char>> =
> {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
> _M_p = 0xffffa0000110fa30 "/libzfs.so"}, _M_string_length = 10, {
> _M_local_buf = "/libzfs.so\000\000\000\000\000", _M_allocated_capacity
> = 3347131623889529903}}
>
> Also, I've been wondering if Nix using nonstandard paths is causing
> problems, like for libc:
> [nix-shell:~/osv/build/release]$ ldd libzfs.so
> linux-vdso.so.1 (0x00007ffcedbb9000)
> libuutil.so => not found
> libc.so.6 =>
> /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libc.so.6
> (0x00007f7594f38000)
> /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib64/ld-linux-x86-64.so.2 (0x00007f7595131000)
> On Sunday, December 6, 2020 at 8:43:10 AM UTC-7 [email protected] wrote:
>
>> It might be easier to simply print the '_pathname' value if you switch to the
>> right frame in gdb. It would be nice to confirm that the problem we have is
>> with zpool.so; that might lead to understanding why this crash happens.
>> Maybe there is something wrong with building zpool.so.
>>
>> BTW based on this fragment of the stacktrace:
>>
>> #6 0x000000004035cb07 in elf::program::<lambda(const
>> elf::program::modules_list&)>::operator() (
>> __closure=<synthetic pointer>, __closure=<synthetic pointer>, ml=...)
>> at core/elf.cc:1620
>> #7 elf::program::with_modules<elf::program::lookup_addr(void
>> const*)::<lambda(const elf::program::modules_list&)> >
>> (f=..., this=0xffffa00000097e70) at include/osv/elf.hh:702
>> #8 elf::program::lookup_addr (this=0xffffa00000097e70,
>> addr=addr@entry=0x1000000254ce) at core/elf.cc:1617
>> #9 0x00000000404357cc in osv::lookup_name_demangled
>> (addr=addr@entry=0x1000000254ce,
>> buf=buf@entry=0xffff8000012146d0 "???+19630095", len=len@entry=1024)
>> at core/demangle.cc:47
>> #10 0x000000004023c4e0 in print_backtrace () at runtime.cc:85
>>
>> It seems we have a bug (or need for improvement) in print_backtrace(): it
>> should NOT try to demangle names like "???+19630095", which causes a
>> follow-up fault.
>>
>> At the same time, it is strange that we crash at line 983, which seems to
>> indicate something goes wrong when processing zpool.so.
>>
>> 981 if (dynamic_exists(DT_HASH)) {
>> 982     auto hashtab = dynamic_ptr<Elf64_Word>(DT_HASH);
>> 983     return hashtab[1];
>> 984 }
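A side note on why line 983 reads hashtab[1] (background, not from the thread): in a SysV ELF hash section, word 0 is nbucket and word 1 is nchain, and nchain equals the number of symbol-table entries. A minimal sketch of that layout:

```python
import struct

# SysV ELF .hash section layout: [nbucket, nchain, bucket[nbucket], chain[nchain]].
# nchain equals the number of symbol-table entries, which is why
# symtab_len() can return hashtab[1] -- provided DT_HASH points at valid
# memory; a bogus DT_HASH pointer would fault on exactly that read.
section = struct.pack('<6I', 1, 3, 0, 0, 0, 0)  # nbucket=1, nchain=3
nbucket, nchain = struct.unpack_from('<2I', section)
print(nchain)  # 3 == number of symbols
```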
>>
>> On Sunday, December 6, 2020 at 10:06:21 AM UTC-5 Waldek Kozaczuk wrote:
>>
>>> Can you run the ROFS image you built? Also, as I understand it, Nix is a
>>> package manager, but what Linux distribution are you using?
>>>
>>> As far as ZFS goes could you enable ELF debugging - change this line:
>>>
>>> conf-debug_elf=0
>>>
>>> To
>>>
>>> conf-debug_elf=1
>>>
>>> In conf/base.mk, delete core/elf.o and force a rebuild of the kernel. I
>>> think you may also need to change the script upload_manifest.py to append
>>> '--verbose' to the command line with cpiod.so.
>>>
>>> It should show more info about ELF loading. It may still be necessary to
>>> add extra printouts to capture exactly which ELF it is crashing on in
>>> arch_relocate_jump_slot().
>>>
>>> In the worst case I would need a copy of your loader-stripped.elf and
>>> possibly all the other files, like cpiod.so and zfs.so, that go into the
>>> bootfs part of the image.
>>>
>>> Regards,
>>> Waldek
>>>
>>>
>>> On Sat, Dec 5, 2020 at 19:31 Matthew Kenigsberg <[email protected]>
>>> wrote:
>>>
>>>> After forcing it to use the right path for libz.so.1, it's working with
>>>> rofs, but still having the same issue when using zfs, even after I correct
>>>> the path for libz.
>>>>
>>>> On Saturday, December 5, 2020 at 5:18:37 PM UTC-7 Matthew Kenigsberg
>>>> wrote:
>>>>
>>>>> gcc version 9.3.0 (GCC)
>>>>> QEMU emulator version 5.1.0
>>>>>
>>>>> Running with fs=rofs I get the error:
>>>>> Traceback (most recent call last):
>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 369, in <module>
>>>>>     main()
>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 366, in main
>>>>>     gen_image(outfile, manifest)
>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 269, in gen_image
>>>>>     system_structure_block, bytes_written = write_fs(fp, manifest)
>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 246, in write_fs
>>>>>     count, directory_entries_index = write_dir(fp, manifest.get(''), '', manifest)
>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 207, in write_dir
>>>>>     count, directory_entries_index = write_dir(fp, val, dirpath + '/' + entry, manifest)
>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 207, in write_dir
>>>>>     count, directory_entries_index = write_dir(fp, val, dirpath + '/' + entry, manifest)
>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 222, in write_dir
>>>>>     inode.count = write_file(fp, val)
>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 164, in write_file
>>>>>     with open(path, 'rb') as f:
>>>>> FileNotFoundError: [Errno 2] No such file or directory: 'libz.so.1'
>>>>>
>>>>> I think that's from this line in usr.manifest?
>>>>> /usr/lib/libz.so.1: libz.so.1
>>>>>
>>>>> I don't have zlib in the manifest without fs=rofs, and I think zpool
>>>>> uses it?
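That matches the traceback above: a relative host path in usr.manifest is opened as-is from the directory the script runs in, so a bare 'libz.so.1' fails if the file was never copied there. A tiny illustration (hypothetical paths, not OSv code):

```python
import os
import tempfile

# A manifest entry maps a guest path to a host path; with a relative host
# path like 'libz.so.1', open() only finds the file if it sits in the
# directory the image-generation script resolves against.
manifest_entry = ('/usr/lib/libz.so.1', 'libz.so.1')
with tempfile.TemporaryDirectory() as build_dir:  # stand-in for build/release
    host_path = os.path.join(build_dir, manifest_entry[1])
    try:
        open(host_path, 'rb')
    except FileNotFoundError as e:
        print(e.errno)  # 2 -> "[Errno 2] No such file or directory"
```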
>>>>>
>>>>> Looking into it...
>>>>> On Saturday, December 5, 2020 at 4:36:20 PM UTC-7 [email protected]
>>>>> wrote:
>>>>>
>>>>>> I cannot reproduce it on Ubuntu 20.20 or Fedora 33. Here is the
>>>>>> code fragment where it happens:
>>>>>>
>>>>>> 169 bool object::arch_relocate_jump_slot(symbol_module& sym, void *addr, Elf64_Sxword addend)
>>>>>> 170 {
>>>>>> 171     if (sym.symbol) {
>>>>>> 172         *static_cast<void**>(addr) = sym.relocated_addr();
>>>>>> 173         return true;
>>>>>> 174     } else {
>>>>>> 175         return false;
>>>>>> 176     }
>>>>>> 177 }
>>>>>> It looks like writing at the addr 0x100000040ca8 in line 172 caused
>>>>>> the fault. Why?
>>>>>>
>>>>>> And then there is a 2nd page fault in the gdb backtrace, hit while the
>>>>>> 1st one was being handled (not sure if that is a bug or just a side
>>>>>> effect of the program-loading state).
>>>>>>
>>>>>> 981 if (dynamic_exists(DT_HASH)) {
>>>>>> 982     auto hashtab = dynamic_ptr<Elf64_Word>(DT_HASH);
>>>>>> 983     return hashtab[1];
>>>>>> 984 }
>>>>>> Is something wrong with the ELF files cpiod.so, mkfs.so, or zfs.so?
>>>>>>
>>>>>> Can you try to do the same with ROFS?
>>>>>>
>>>>>> fs=rofs
>>>>>> On Saturday, December 5, 2020 at 5:44:12 PM UTC-5 Matthew Kenigsberg
>>>>>> wrote:
>>>>>>
>>>>>>> Struggling to get scripts/build to run on NixOS because I'm getting
>>>>>>> a page fault. NixOS does keep shared libraries in nonstandard
>>>>>>> locations; not sure if that's breaking something. More details below,
>>>>>>> but any ideas?
>>>>>>>
>>>>>>> As far as I can tell, the error is caused by tools/mkfs/mkfs.cc:71:
>>>>>>> run_cmd("/zpool.so", zpool_args);
>>>>>>>
>>>>>>> The error from scripts/build:
>>>>>>>
>>>>>>> OSv v0.55.0-145-g97f17a7a
>>>>>>> eth0: 192.168.122.15
>>>>>>> Booted up in 154.38 ms
>>>>>>> Cmdline: /tools/mkfs.so; /tools/cpiod.so --prefix /zfs/zfs/; /zfs.so
>>>>>>> set compression=off osv
>>>>>>> Running mkfs...
>>>>>>> page fault outside application, addr: 0x0000100000040ca8
>>>>>>> [registers]
>>>>>>> RIP: 0x000000004039c25a
>>>>>>> <elf::object::arch_relocate_jump_slot(elf::symbol_module&, void*,
>>>>>>> long)+26>
>>>>>>> RFL: 0x0000000000010202 CS: 0x0000000000000008 SS:
>>>>>>> 0x0000000000000010
>>>>>>> RAX: 0x000010000007a340 RBX: 0x0000100000040ca8 RCX:
>>>>>>> 0x000010000006abb0 RDX: 0x0000000000000002
>>>>>>> RSI: 0x00002000001f6f70 RDI: 0xffffa00001058c00 RBP:
>>>>>>> 0x00002000001f6f30 R8: 0xffffa00000a68460
>>>>>>> R9: 0xffffa00000f18da0 R10: 0x0000000000000000 R11:
>>>>>>> 0x00000000409dd380 R12: 0xffffa00000f18c00
>>>>>>> R13: 0xffffa00000f18da0 R14: 0x0000000000000000 R15:
>>>>>>> 0x00000000409dd380 RSP: 0x00002000001f6f20
>>>>>>> Aborted
>>>>>>>
>>>>>>> [backtrace]
>>>>>>> 0x00000000403458d3 <???+1077172435>
>>>>>>> 0x00000000403477ce <mmu::vm_fault(unsigned long,
>>>>>>> exception_frame*)+350>
>>>>>>> 0x0000000040398ba2 <page_fault+162>
>>>>>>> 0x0000000040397a16 <???+1077508630>
>>>>>>> 0x0000000040360a13 <elf::object::resolve_pltgot(unsigned int)+387>
>>>>>>> 0x0000000040360c38 <elf_resolve_pltgot+56>
>>>>>>> 0x000000004039764f <???+1077507663>
>>>>>>> 0xffffa000012b880f <???+19630095>
>>>>>>>
>>>>>>> Trying to get a backtrace after connecting with gdb:
>>>>>>> (gdb) bt
>>>>>>> #0 abort (fmt=fmt@entry=0x40644b90 "Assertion failed: %s (%s: %s:
>>>>>>> %d)\n") at runtime.cc:105
>>>>>>> #1 0x000000004023c6fb in __assert_fail (expr=expr@entry=0x40672cf8
>>>>>>> "ef->rflags & processor::rflags_if",
>>>>>>> file=file@entry=0x40672d25 "arch/x64/mmu.cc",
>>>>>>> line=line@entry=38, func=func@entry=0x40672d1a "page_fault")
>>>>>>> at runtime.cc:139
>>>>>>> #2 0x0000000040398c05 in page_fault (ef=0xffff800000015048) at
>>>>>>> arch/x64/arch-cpu.hh:107
>>>>>>> #3 <signal handler called>
>>>>>>> #4 0x000000004035c879 in elf::object::symtab_len
>>>>>>> (this=0xffffa00000f18c00) at core/elf.cc:983
>>>>>>> #5 0x000000004035c938 in elf::object::lookup_addr
>>>>>>> (this=0xffffa00000f18c00, addr=addr@entry=0x1000000254ce)
>>>>>>> at core/elf.cc:1015
>>>>>>> #6 0x000000004035cb07 in elf::program::<lambda(const
>>>>>>> elf::program::modules_list&)>::operator() (
>>>>>>> __closure=<synthetic pointer>, __closure=<synthetic pointer>,
>>>>>>> ml=...) at core/elf.cc:1620
>>>>>>> #7 elf::program::with_modules<elf::program::lookup_addr(void
>>>>>>> const*)::<lambda(const elf::program::modules_list&)> >
>>>>>>> (f=..., this=0xffffa00000097e70) at include/osv/elf.hh:702
>>>>>>> #8 elf::program::lookup_addr (this=0xffffa00000097e70,
>>>>>>> addr=addr@entry=0x1000000254ce) at core/elf.cc:1617
>>>>>>> #9 0x00000000404357cc in osv::lookup_name_demangled
>>>>>>> (addr=addr@entry=0x1000000254ce,
>>>>>>> buf=buf@entry=0xffff8000012146d0 "???+19630095",
>>>>>>> len=len@entry=1024) at core/demangle.cc:47
>>>>>>> #10 0x000000004023c4e0 in print_backtrace () at runtime.cc:85
>>>>>>> #11 0x000000004023c6b4 in abort (fmt=fmt@entry=0x40644a9f
>>>>>>> "Aborted\n") at runtime.cc:121
>>>>>>> #12 0x0000000040202989 in abort () at runtime.cc:98
>>>>>>> #13 0x00000000403458d4 in mmu::vm_sigsegv (ef=0xffff800001215068,
>>>>>>> addr=<optimized out>) at core/mmu.cc:1314
>>>>>>> #14 mmu::vm_sigsegv (addr=<optimized out>, ef=0xffff800001215068) at
>>>>>>> core/mmu.cc:1308
>>>>>>> #15 0x00000000403477cf in mmu::vm_fault
>>>>>>> (addr=addr@entry=17592186309800, ef=ef@entry=0xffff800001215068)
>>>>>>> at core/mmu.cc:1328
>>>>>>> #16 0x0000000040398ba3 in page_fault (ef=0xffff800001215068) at
>>>>>>> arch/x64/mmu.cc:42
>>>>>>> #17 <signal handler called>
>>>>>>> #18 0x000000004039c25a in elf::object::arch_relocate_jump_slot
>>>>>>> (this=this@entry=0xffffa00000f18c00, sym=...,
>>>>>>> addr=addr@entry=0x100000040ca8, addend=addend@entry=0) at
>>>>>>> arch/x64/arch-elf.cc:172
>>>>>>> #19 0x0000000040360a14 in elf::object::resolve_pltgot
>>>>>>> (this=0xffffa00000f18c00, index=<optimized out>)
>>>>>>> at core/elf.cc:843
>>>>>>> #20 0x0000000040360c39 in elf_resolve_pltgot (index=308,
>>>>>>> obj=0xffffa00000f18c00) at core/elf.cc:1860
>>>>>>> #21 0x0000000040397650 in __elf_resolve_pltgot () at
>>>>>>> arch/x64/elf-dl.S:47
>>>>>>> #22 0x00001000000254cf in ?? ()
>>>>>>> #23 0xffffa000012b8800 in ?? ()
>>>>>>> #24 0x00002000001f74a0 in ?? ()
>>>>>>> #25 0x00001000000254cf in ?? ()
>>>>>>> #26 0x00002000001f7480 in ?? ()
>>>>>>> #27 0x00000000403f241c in calloc (nmemb=<optimized out>,
>>>>>>> size=<optimized out>) at core/mempool.cc:1811
>>>>>>> #28 0xffff900000a98000 in ?? ()
>>>>>>> #29 0x0000000000000000 in ?? ()
>>>>>>> On Saturday, November 28, 2020 at 1:39:46 PM UTC-7 Matthew
>>>>>>> Kenigsberg wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'll send something, might take a bit before I find time to work on
>>>>>>>> it though.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Matthew
>>>>>>>>
>>>>>>>> On Saturday, November 28, 2020 at 1:11:11 PM UTC-7 Roman Shaposhnik
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Nov 24, 2020 at 8:03 AM Waldek Kozaczuk <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> > Hey,
>>>>>>>>> >
>>>>>>>>> > Send a patch with a new app that could demonstrate it, please,
>>>>>>>>> if you can. I would like to see it. Sounds like a nice improvement.
>>>>>>>>>
>>>>>>>>> FWIW: I'd love to see it too -- been meaning to play with Nix and
>>>>>>>>> this
>>>>>>>>> gives me a perfect excuse ;-)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Roman.
>>>>>>>>>
>>>>>>>> --
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "OSv Development" group.
>>>> To unsubscribe from this topic, visit
>>>> https://groups.google.com/d/topic/osv-dev/rhjHPr7OBEw/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/osv-dev/7913b79b-6c06-4f2a-95d3-9dc44e45eb45n%40googlegroups.com
>>>>
>>>> <https://groups.google.com/d/msgid/osv-dev/7913b79b-6c06-4f2a-95d3-9dc44e45eb45n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>