It would also be nice to understand whether we are crashing on the 1st 
arch_relocate_jump_slot() call for libzfs.so, or whether it is a specific 
JUMP_SLOT that causes this crash.
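
Two things can already be read off the gdb backtrace quoted further down, and the small Python sketch here only double-checks them: the address passed to mmu::vm_fault (printed in decimal) is the same one arch_relocate_jump_slot() was writing to (printed in hex), and elf_resolve_pltgot() was called with index=308. The three backtrace lines are copied from the quoted mail with the line wrapping undone; frame_args() is a helper made up for this mail, not something from the OSv tree.

```python
import re

# Frames 15, 18 and 20 copied from the gdb backtrace quoted below (unwrapped).
BACKTRACE = """\
#15 0x00000000403477cf in mmu::vm_fault (addr=addr@entry=17592186309800, ef=ef@entry=0xffff800001215068)
#18 0x000000004039c25a in elf::object::arch_relocate_jump_slot (this=this@entry=0xffffa00000f18c00, sym=..., addr=addr@entry=0x100000040ca8, addend=addend@entry=0)
#20 0x0000000040360c39 in elf_resolve_pltgot (index=308, obj=0xffffa00000f18c00)
"""


def frame_args(text, frame_no):
    """Return the name=value arguments of one backtrace frame as a dict.

    Hypothetical helper: handles both "name=value" and
    "name=name@entry=value" and keeps only the final value.
    """
    line = next(l for l in text.splitlines() if l.startswith(f"#{frame_no} "))
    return dict(re.findall(r"(\w+)=(?:\w+@entry=)?([^,\s)]+)", line))


fault = int(frame_args(BACKTRACE, 15)["addr"])      # decimal in frame 15
slot = int(frame_args(BACKTRACE, 18)["addr"], 16)   # hex in frame 18
index = int(frame_args(BACKTRACE, 20)["index"])     # PLT/GOT slot index

print(f"fault={fault:#x} slot={slot:#x} same={fault == slot} index={index}")
```

As far as I understand, the fault came in through __elf_resolve_pltgot, i.e. lazy resolution on the first call of a function, so index=308 alone does not tell us whether this was the 1st slot resolved. It only tells us which slot it was.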

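On the print_backtrace() point quoted below, one hedged aside: in the [backtrace] section printed by OSv itself, the number after "???+" appears to be just the low 32 bits of the frame address, which would be consistent with no symbol being found at all for those frames. The pairs below are copied from that output; the interpretation is an assumption drawn from the numbers, not something checked against core/demangle.cc.

```python
# (frame address, number printed after "???+") pairs copied from the
# [backtrace] section of the scripts/build output quoted below.
UNRESOLVED = [
    (0x00000000403458d3, 1077172435),
    (0x0000000040397a16, 1077508630),
    (0x000000004039764f, 1077507663),
    (0xffffa000012b880f, 19630095),
]


def low32(addr):
    """Keep only the low 32 bits of a 64-bit address."""
    return addr & 0xFFFFFFFF


for addr, printed in UNRESOLVED:
    print(f"{addr:#018x}: low32={low32(addr)} printed={printed} "
          f"match={low32(addr) == printed}")
```
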
On Tuesday, December 8, 2020 at 10:39:06 AM UTC-5 Waldek Kozaczuk wrote:

> After you connect with gdb, can you run 'osv mmap' and send us the output? 
> Make sure you run 'osv syms' before it and dump a backtrace after. Please see 
> https://github.com/cloudius-systems/osv/wiki/Debugging-OSv for 
> details.
>
> BTW can you build and run the OSv ZFS image on the host without Nix? As I 
> understand it, Nix is really just a layer on top of any Linux distribution, no? 
> I am afraid I still do not understand what exactly Nix is.
>
>
> On Monday, December 7, 2020 at 2:58:40 PM UTC-5 Matthew Kenigsberg wrote:
>
>> (gdb) frame 18
>> #18 0x000000004039c95a in elf::object::arch_relocate_jump_slot 
>> (this=this@entry=0xffffa0000110fa00, sym=..., 
>>     addr=addr@entry=0x100000040ca8, addend=addend@entry=0) at 
>> arch/x64/arch-elf.cc:172
>> 172            *static_cast<void**>(addr) = sym.relocated_addr();
>> (gdb) print _pathname
>> $14 = {static npos = 18446744073709551615, 
>>   _M_dataplus = {<std::allocator<char>> = 
>> {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
>>     _M_p = 0xffffa0000110fa30 "/libzfs.so"}, _M_string_length = 10, {
>>     _M_local_buf = "/libzfs.so\000\000\000\000\000", 
>> _M_allocated_capacity = 3347131623889529903}}
>>
>> Also been wondering if nix using nonstandard paths is causing problems, 
>> like for libc:
>> [nix-shell:~/osv/build/release]$ ldd libzfs.so 
>>     linux-vdso.so.1 (0x00007ffcedbb9000)
>>     libuutil.so => not found
>>     libc.so.6 => /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libc.so.6 (0x00007f7594f38000)
>>     /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib64/ld-linux-x86-64.so.2 (0x00007f7595131000)
>> On Sunday, December 6, 2020 at 8:43:10 AM UTC-7 [email protected] wrote:
>>
>>> It might be easier to simply print the '_pathname' value once you switch to 
>>> the right frame in gdb. It would be nice to confirm that the problem we 
>>> have is with zpool.so, as that might lead to understanding why this crash 
>>> happens. Maybe there is something wrong with how zpool.so is built.
>>>
>>> BTW based on this fragment of the stacktrace:
>>>
>>> #6  0x000000004035cb07 in elf::program::<lambda(const 
>>> elf::program::modules_list&)>::operator() (
>>>     __closure=<synthetic pointer>, __closure=<synthetic pointer>, 
>>> ml=...) at core/elf.cc:1620
>>> #7  elf::program::with_modules<elf::program::lookup_addr(void 
>>> const*)::<lambda(const elf::program::modules_list&)> >
>>>     (f=..., this=0xffffa00000097e70) at include/osv/elf.hh:702
>>> #8  elf::program::lookup_addr (this=0xffffa00000097e70, 
>>> addr=addr@entry=0x1000000254ce) at core/elf.cc:1617
>>> #9  0x00000000404357cc in osv::lookup_name_demangled 
>>> (addr=addr@entry=0x1000000254ce,
>>>     buf=buf@entry=0xffff8000012146d0 "???+19630095", len=len@entry=1024) 
>>> at core/demangle.cc:47
>>> #10 0x000000004023c4e0 in print_backtrace () at runtime.cc:85
>>>
>>> It seems we have a bug (or room for improvement) in print_backtrace(): it 
>>> should NOT try to demangle names like "???+19630095", which causes the 
>>> follow-up fault.
>>>
>>> At the same time, it is strange that we crash at line 983 which seems to 
>>> indicate something goes wrong when processing zpool.so.
>>>
>>>  981     if (dynamic_exists(DT_HASH)) {
>>>  982         auto hashtab = dynamic_ptr<Elf64_Word>(DT_HASH);
>>>  983         return hashtab[1];
>>>  984     }
>>> On Sunday, December 6, 2020 at 10:06:21 AM UTC-5 Waldek Kozaczuk wrote:
>>>
>>>> Can you run the ROFS image you built? Also, as I understand it, Nix is a 
>>>> package manager, but what Linux distribution are you using?
>>>>
>>>> As far as ZFS goes could you enable ELF debugging - change this line:
>>>>
>>>> conf-debug_elf=0
>>>>
>>>> To
>>>>
>>>> conf-debug_elf=1
>>>>
>>>> In conf/base.mk, then delete core/elf.o and force a rebuild of the kernel. I 
>>>> think you may also need to change the script upload_manifest.py to prepend 
>>>> '--verbose' to the command line with cpiod.so.
>>>>
>>>> It should show more info about ELF loading. It may still be necessary 
>>>> to add extra printouts to capture which exact ELF file it is crashing on in 
>>>> arch_relocate_jump_slot(). 
>>>>
>>>> In the worst case I would need a copy of your loader-stripped.elf and 
>>>> possibly all the other files like cpiod.so, zfs.so that go into the bootfs 
>>>> part of the image. 
>>>>
>>>> Regards,
>>>> Waldek
>>>>
>>>>
>>>> On Sat, Dec 5, 2020 at 19:31 Matthew Kenigsberg <[email protected]> 
>>>> wrote:
>>>>
>>>>> After forcing it to use the right path for libz.so.1, it's working 
>>>>> with rofs, but still having the same issue when using zfs, even after I 
>>>>> correct the path for libz.
>>>>>
>>>>> On Saturday, December 5, 2020 at 5:18:37 PM UTC-7 Matthew Kenigsberg 
>>>>> wrote:
>>>>>
>>>>>> gcc version 9.3.0 (GCC)
>>>>>> QEMU emulator version 5.1.0
>>>>>>
>>>>>> Running with fs=rofs I get the error:
>>>>>> Traceback (most recent call last):
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 369, in <module>
>>>>>>     main()
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 366, in main
>>>>>>     gen_image(outfile, manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 269, in gen_image
>>>>>>     system_structure_block, bytes_written = write_fs(fp, manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 246, in write_fs
>>>>>>     count, directory_entries_index = write_dir(fp, manifest.get(''), '', manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 207, in write_dir
>>>>>>     count, directory_entries_index = write_dir(fp, val, dirpath + '/' + entry, manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 207, in write_dir
>>>>>>     count, directory_entries_index = write_dir(fp, val, dirpath + '/' + entry, manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 222, in write_dir
>>>>>>     inode.count = write_file(fp, val)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 164, in write_file
>>>>>>     with open(path, 'rb') as f:
>>>>>> FileNotFoundError: [Errno 2] No such file or directory: 'libz.so.1'
>>>>>>
>>>>>> I think that's from this line in usr.manifest?
>>>>>> /usr/lib/libz.so.1: libz.so.1
>>>>>>
>>>>>> Don't have zlib in the manifest without fs=rofs, and I think zpool 
>>>>>> uses it?
>>>>>>
>>>>>> Looking into it...
>>>>>> On Saturday, December 5, 2020 at 4:36:20 PM UTC-7 [email protected] 
>>>>>> wrote:
>>>>>>
>>>>>>> I cannot reproduce it on either Ubuntu 20.20 or Fedora 33. Here is 
>>>>>>> the code fragment where it happens:
>>>>>>>
>>>>>>> 169 bool object::arch_relocate_jump_slot(symbol_module& sym, void *addr, Elf64_Sxword addend)
>>>>>>> 170 {
>>>>>>> 171     if (sym.symbol) {
>>>>>>> 172         *static_cast<void**>(addr) = sym.relocated_addr();
>>>>>>> 173         return true;
>>>>>>> 174     } else {
>>>>>>> 175         return false;
>>>>>>> 176     }
>>>>>>> 177 }
>>>>>>>
>>>>>>> It looks like the write to addr 0x100000040ca8 at line 172 caused 
>>>>>>> the fault. Why?
>>>>>>>
>>>>>>> And then there is a 2nd page fault in the gdb backtrace, raised while 
>>>>>>> the 1st one was being handled (not sure if that is a bug or just the 
>>>>>>> state of the program while it is loading).
>>>>>>>
>>>>>>>  981     if (dynamic_exists(DT_HASH)) {
>>>>>>>  982         auto hashtab = dynamic_ptr<Elf64_Word>(DT_HASH);
>>>>>>>  983         return hashtab[1];
>>>>>>>  984     }
>>>>>>>
>>>>>>> Is something wrong with the ELF files cpiod.so, mkfs.so or zfs.so?
>>>>>>>
>>>>>>> Can you try to do the same with ROFS?
>>>>>>>
>>>>>>> fs=rofs
>>>>>>> On Saturday, December 5, 2020 at 5:44:12 PM UTC-5 Matthew Kenigsberg 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Struggling to get scripts/build to run on NixOS because I'm getting 
>>>>>>>> a page fault. NixOS does keep shared libraries in nonstandard locations, 
>>>>>>>> not sure if that's breaking something. More details below, but any 
>>>>>>>> ideas?
>>>>>>>>
>>>>>>>> As far as I can tell, the error is caused by tools/mkfs/mkfs.cc:71:
>>>>>>>>     run_cmd("/zpool.so", zpool_args);
>>>>>>>>
>>>>>>>> The error from scripts/build:
>>>>>>>>
>>>>>>>> OSv v0.55.0-145-g97f17a7a
>>>>>>>> eth0: 192.168.122.15
>>>>>>>> Booted up in 154.38 ms
>>>>>>>> Cmdline: /tools/mkfs.so; /tools/cpiod.so --prefix /zfs/zfs/; 
>>>>>>>> /zfs.so set compression=off osv
>>>>>>>> Running mkfs...
>>>>>>>> page fault outside application, addr: 0x0000100000040ca8
>>>>>>>> [registers]
>>>>>>>> RIP: 0x000000004039c25a <elf::object::arch_relocate_jump_slot(elf::symbol_module&, void*, long)+26>
>>>>>>>> RFL: 0x0000000000010202  CS:  0x0000000000000008  SS:  0x0000000000000010
>>>>>>>> RAX: 0x000010000007a340  RBX: 0x0000100000040ca8  RCX: 0x000010000006abb0  RDX: 0x0000000000000002
>>>>>>>> RSI: 0x00002000001f6f70  RDI: 0xffffa00001058c00  RBP: 0x00002000001f6f30  R8:  0xffffa00000a68460
>>>>>>>> R9:  0xffffa00000f18da0  R10: 0x0000000000000000  R11: 0x00000000409dd380  R12: 0xffffa00000f18c00
>>>>>>>> R13: 0xffffa00000f18da0  R14: 0x0000000000000000  R15: 0x00000000409dd380  RSP: 0x00002000001f6f20
>>>>>>>> Aborted
>>>>>>>>
>>>>>>>> [backtrace]
>>>>>>>> 0x00000000403458d3 <???+1077172435>
>>>>>>>> 0x00000000403477ce <mmu::vm_fault(unsigned long, exception_frame*)+350>
>>>>>>>> 0x0000000040398ba2 <page_fault+162>
>>>>>>>> 0x0000000040397a16 <???+1077508630>
>>>>>>>> 0x0000000040360a13 <elf::object::resolve_pltgot(unsigned int)+387>
>>>>>>>> 0x0000000040360c38 <elf_resolve_pltgot+56>
>>>>>>>> 0x000000004039764f <???+1077507663>
>>>>>>>> 0xffffa000012b880f <???+19630095>
>>>>>>>>
>>>>>>>> Trying to get a backtrace after connecting with gdb:
>>>>>>>> (gdb) bt
>>>>>>>> #0  abort (fmt=fmt@entry=0x40644b90 "Assertion failed: %s (%s: %s: 
>>>>>>>> %d)\n") at runtime.cc:105
>>>>>>>> #1  0x000000004023c6fb in __assert_fail (expr=expr@entry=0x40672cf8 
>>>>>>>> "ef->rflags & processor::rflags_if", 
>>>>>>>>     file=file@entry=0x40672d25 "arch/x64/mmu.cc", 
>>>>>>>> line=line@entry=38, func=func@entry=0x40672d1a "page_fault")
>>>>>>>>     at runtime.cc:139
>>>>>>>> #2  0x0000000040398c05 in page_fault (ef=0xffff800000015048) at 
>>>>>>>> arch/x64/arch-cpu.hh:107
>>>>>>>> #3  <signal handler called>
>>>>>>>> #4  0x000000004035c879 in elf::object::symtab_len 
>>>>>>>> (this=0xffffa00000f18c00) at core/elf.cc:983
>>>>>>>> #5  0x000000004035c938 in elf::object::lookup_addr 
>>>>>>>> (this=0xffffa00000f18c00, addr=addr@entry=0x1000000254ce)
>>>>>>>>     at core/elf.cc:1015
>>>>>>>> #6  0x000000004035cb07 in elf::program::<lambda(const 
>>>>>>>> elf::program::modules_list&)>::operator() (
>>>>>>>>     __closure=<synthetic pointer>, __closure=<synthetic pointer>, 
>>>>>>>> ml=...) at core/elf.cc:1620
>>>>>>>> #7  elf::program::with_modules<elf::program::lookup_addr(void 
>>>>>>>> const*)::<lambda(const elf::program::modules_list&)> >
>>>>>>>>     (f=..., this=0xffffa00000097e70) at include/osv/elf.hh:702
>>>>>>>> #8  elf::program::lookup_addr (this=0xffffa00000097e70, 
>>>>>>>> addr=addr@entry=0x1000000254ce) at core/elf.cc:1617
>>>>>>>> #9  0x00000000404357cc in osv::lookup_name_demangled 
>>>>>>>> (addr=addr@entry=0x1000000254ce, 
>>>>>>>>     buf=buf@entry=0xffff8000012146d0 "???+19630095", 
>>>>>>>> len=len@entry=1024) at core/demangle.cc:47
>>>>>>>> #10 0x000000004023c4e0 in print_backtrace () at runtime.cc:85
>>>>>>>> #11 0x000000004023c6b4 in abort (fmt=fmt@entry=0x40644a9f 
>>>>>>>> "Aborted\n") at runtime.cc:121
>>>>>>>> #12 0x0000000040202989 in abort () at runtime.cc:98
>>>>>>>> #13 0x00000000403458d4 in mmu::vm_sigsegv (ef=0xffff800001215068, 
>>>>>>>> addr=<optimized out>) at core/mmu.cc:1314
>>>>>>>> #14 mmu::vm_sigsegv (addr=<optimized out>, ef=0xffff800001215068) 
>>>>>>>> at core/mmu.cc:1308
>>>>>>>> #15 0x00000000403477cf in mmu::vm_fault 
>>>>>>>> (addr=addr@entry=17592186309800, ef=ef@entry=0xffff800001215068)
>>>>>>>>     at core/mmu.cc:1328
>>>>>>>> #16 0x0000000040398ba3 in page_fault (ef=0xffff800001215068) at 
>>>>>>>> arch/x64/mmu.cc:42
>>>>>>>> #17 <signal handler called>
>>>>>>>> #18 0x000000004039c25a in elf::object::arch_relocate_jump_slot 
>>>>>>>> (this=this@entry=0xffffa00000f18c00, sym=..., 
>>>>>>>>     addr=addr@entry=0x100000040ca8, addend=addend@entry=0) at 
>>>>>>>> arch/x64/arch-elf.cc:172
>>>>>>>> #19 0x0000000040360a14 in elf::object::resolve_pltgot 
>>>>>>>> (this=0xffffa00000f18c00, index=<optimized out>)
>>>>>>>>     at core/elf.cc:843
>>>>>>>> #20 0x0000000040360c39 in elf_resolve_pltgot (index=308, 
>>>>>>>> obj=0xffffa00000f18c00) at core/elf.cc:1860
>>>>>>>> #21 0x0000000040397650 in __elf_resolve_pltgot () at 
>>>>>>>> arch/x64/elf-dl.S:47
>>>>>>>> #22 0x00001000000254cf in ?? ()
>>>>>>>> #23 0xffffa000012b8800 in ?? ()
>>>>>>>> #24 0x00002000001f74a0 in ?? ()
>>>>>>>> #25 0x00001000000254cf in ?? ()
>>>>>>>> #26 0x00002000001f7480 in ?? ()
>>>>>>>> #27 0x00000000403f241c in calloc (nmemb=<optimized out>, 
>>>>>>>> size=<optimized out>) at core/mempool.cc:1811
>>>>>>>> #28 0xffff900000a98000 in ?? ()
>>>>>>>> #29 0x0000000000000000 in ?? ()
>>>>>>>> On Saturday, November 28, 2020 at 1:39:46 PM UTC-7 Matthew 
>>>>>>>> Kenigsberg wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'll send something, might take a bit before I find time to work 
>>>>>>>>> on it though.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Matthew
>>>>>>>>>
>>>>>>>>> On Saturday, November 28, 2020 at 1:11:11 PM UTC-7 Roman 
>>>>>>>>> Shaposhnik wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Nov 24, 2020 at 8:03 AM Waldek Kozaczuk <
>>>>>>>>>> [email protected]> wrote: 
>>>>>>>>>> > 
>>>>>>>>>> > Hey, 
>>>>>>>>>> > 
>>>>>>>>>> > Send a patch with a new app that could demonstrate it, please, 
>>>>>>>>>> if you can. I would like to see it. Sounds like a nice improvement. 
>>>>>>>>>>
>>>>>>>>>> FWIW: I'd love to see it too -- been meaning to play with Nix and 
>>>>>>>>>> this 
>>>>>>>>>> gives me a perfect excuse ;-) 
>>>>>>>>>>
>>>>>>>>>> Thanks, 
>>>>>>>>>> Roman. 
>>>>>>>>>>
>>>>>>>>> -- 
>>>>> You received this message because you are subscribed to a topic in the 
>>>>> Google Groups "OSv Development" group.
>>>>> To unsubscribe from this topic, visit 
>>>>> https://groups.google.com/d/topic/osv-dev/rhjHPr7OBEw/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to 
>>>>> [email protected].
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/osv-dev/7913b79b-6c06-4f2a-95d3-9dc44e45eb45n%40googlegroups.com.
>>>>>
>>>>

