It would also be nice to understand: are we crashing on the 1st arch_relocate_jump_slot() call for libzfs.so, or is it a specific JUMP_SLOT that causes this crash?
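
One way to answer that question would be to number each jump-slot relocation per ELF object and log it before the write happens, so that the last line printed before the fault shows which slot was involved. The following is only a minimal, standalone sketch of that idea: `jump_slot_trace`, `record()` and `calls()` are hypothetical names that do not exist in OSv, and how the resulting line would actually be printed from inside the kernel is left open.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical tracing helper (not part of OSv): numbers each jump-slot
// relocation per ELF object, so the last line logged before a fault shows
// whether the very first slot or a later, specific one was being written.
class jump_slot_trace {
public:
    // Record one relocation attempt and return a log line describing it.
    std::string record(const std::string& pathname, std::uintptr_t addr) {
        unsigned n = ++_calls[pathname];
        char buf[256];
        std::snprintf(buf, sizeof(buf), "%s: JUMP_SLOT #%u at 0x%llx",
                      pathname.c_str(), n,
                      static_cast<unsigned long long>(addr));
        return buf;
    }

    // Number of relocation attempts seen so far for the given object.
    unsigned calls(const std::string& pathname) const {
        auto it = _calls.find(pathname);
        return it == _calls.end() ? 0 : it->second;
    }

private:
    std::map<std::string, unsigned> _calls;
};
```

If something like `record(_pathname, addr)` were called at the top of arch_relocate_jump_slot() and its result printed, a crash right after "#1" would point at the first slot, while a higher number would point at a specific one. Frame #20 of the quoted backtrace already shows elf_resolve_pltgot(index=308), which may hint at the latter, but it does not prove that earlier slots were relocated successfully.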

On Tuesday, December 8, 2020 at 10:39:06 AM UTC-5 Waldek Kozaczuk wrote:

> After you connect with gdb, can you run 'osv mmap' and send us the output?
> Make sure you run 'osv syms' before it and dump the backtrace after. Please see
> https://github.com/cloudius-systems/osv/wiki/Debugging-OSv for any
> details.
>
> BTW can you build and run the OSv ZFS image on the host without NIX? As I
> understand it, NIX is really just a layer on top of any Linux distribution, no?
> I am afraid I still do not understand what exactly Nix is, I guess.
>
>
> On Monday, December 7, 2020 at 2:58:40 PM UTC-5 Matthew Kenigsberg wrote:
>
>> (gdb) frame 18
>> #18 0x000000004039c95a in elf::object::arch_relocate_jump_slot (this=this@entry=0xffffa0000110fa00, sym=..., addr=addr@entry=0x100000040ca8, addend=addend@entry=0) at arch/x64/arch-elf.cc:172
>> 172 *static_cast<void**>(addr) = sym.relocated_addr();
>> (gdb) print _pathname
>> $14 = {static npos = 18446744073709551615,
>>   _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
>>     _M_p = 0xffffa0000110fa30 "/libzfs.so"}, _M_string_length = 10, {
>>     _M_local_buf = "/libzfs.so\000\000\000\000\000",
>>     _M_allocated_capacity = 3347131623889529903}}
>>
>> Also been wondering if nix using nonstandard paths is causing problems,
>> like for libc:
>> [nix-shell:~/osv/build/release]$ ldd libzfs.so
>>   linux-vdso.so.1 (0x00007ffcedbb9000)
>>   libuutil.so => not found
>>   libc.so.6 => /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libc.so.6 (0x00007f7594f38000)
>>   /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib64/ld-linux-x86-64.so.2 (0x00007f7595131000)
>>
>> On Sunday, December 6, 2020 at 8:43:10 AM UTC-7 [email protected] wrote:
>>
>>> It might be easier to simply print the '_pathname' value if you switch to
>>> the right frame in gdb.
>>> It would be nice to confirm that the problem we
>>> have is with zpool.so and that might lead to understanding why this crash
>>> happens. Maybe there is something wrong with building zpool.so.
>>>
>>> BTW based on this fragment of the stacktrace:
>>>
>>> #6 0x000000004035cb07 in elf::program::<lambda(const elf::program::modules_list&)>::operator() (__closure=<synthetic pointer>, __closure=<synthetic pointer>, ml=...) at core/elf.cc:1620
>>> #7 elf::program::with_modules<elf::program::lookup_addr(void const*)::<lambda(const elf::program::modules_list&)> > (f=..., this=0xffffa00000097e70) at include/osv/elf.hh:702
>>> #8 elf::program::lookup_addr (this=0xffffa00000097e70, addr=addr@entry=0x1000000254ce) at core/elf.cc:1617
>>> #9 0x00000000404357cc in osv::lookup_name_demangled (addr=addr@entry=0x1000000254ce, buf=buf@entry=0xffff8000012146d0 "???+19630095", len=len@entry=1024) at core/demangle.cc:47
>>> #10 0x000000004023c4e0 in print_backtrace () at runtime.cc:85
>>>
>>> It seems we have a bug (or need of improvement) in print_backtrace() to
>>> make it NOT try to demangle names like "???+19630095", which causes the
>>> follow-up fault.
>>>
>>> At the same time, it is strange that we crash at line 983, which seems to
>>> indicate something goes wrong when processing zpool.so.
>>>
>>> 981 if (dynamic_exists(DT_HASH)) {
>>> 982     auto hashtab = dynamic_ptr<Elf64_Word>(DT_HASH);
>>> *983     return hashtab[1];*
>>> 984 }
>>>
>>> On Sunday, December 6, 2020 at 10:06:21 AM UTC-5 Waldek Kozaczuk wrote:
>>>
>>>> Can you run the ROFS image you built? Also as I understand it NIX is a
>>>> package manager, but what Linux distribution are you using?
>>>>
>>>> As far as ZFS goes, could you enable ELF debugging - change this line:
>>>>
>>>> conf-debug_elf=0
>>>>
>>>> to
>>>>
>>>> conf-debug_elf=1
>>>>
>>>> in conf/base.mk, delete core/elf.o and force rebuild the kernel.
>>>> I think you may also need to change the script upload_manifest.py to prepend
>>>> '--verbose' to the command line with cpiod.so.
>>>>
>>>> It should show more info about ELF loading. It may still be necessary
>>>> to add extra printouts to capture which exact ELF it is crashing on in
>>>> arch_relocate_jump_slot().
>>>>
>>>> In the worst case I would need a copy of your loader-stripped.elf and
>>>> possibly all the other files like cpiod.so, zfs.so that go into the bootfs
>>>> part of the image.
>>>>
>>>> Regards,
>>>> Waldek
>>>>
>>>>
>>>> On Sat, Dec 5, 2020 at 19:31 Matthew Kenigsberg <[email protected]> wrote:
>>>>
>>>>> After forcing it to use the right path for libz.so.1, it's working
>>>>> with rofs, but still having the same issue when using zfs, even after I
>>>>> correct the path for libz.
>>>>>
>>>>> On Saturday, December 5, 2020 at 5:18:37 PM UTC-7 Matthew Kenigsberg wrote:
>>>>>
>>>>>> gcc version 9.3.0 (GCC)
>>>>>> QEMU emulator version 5.1.0
>>>>>>
>>>>>> Running with fs=rofs I get the error:
>>>>>> Traceback (most recent call last):
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 369, in <module>
>>>>>>     main()
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 366, in main
>>>>>>     gen_image(outfile, manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 269, in gen_image
>>>>>>     system_structure_block, bytes_written = write_fs(fp, manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 246, in write_fs
>>>>>>     count, directory_entries_index = write_dir(fp, manifest.get(''), '', manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 207, in write_dir
>>>>>>     count, directory_entries_index = write_dir(fp, val, dirpath + '/' + entry, manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 207, in write_dir
>>>>>>     count, directory_entries_index = write_dir(fp, val, dirpath + '/' + entry, manifest)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 222, in write_dir
>>>>>>     inode.count = write_file(fp, val)
>>>>>>   File "/home/matthew/osv/scripts/gen-rofs-img.py", line 164, in write_file
>>>>>>     with open(path, 'rb') as f:
>>>>>> FileNotFoundError: [Errno 2] No such file or directory: 'libz.so.1'
>>>>>>
>>>>>> I think that's from this line in usr.manifest?
>>>>>> /usr/lib/libz.so.1: libz.so.1
>>>>>>
>>>>>> Don't have zlib in the manifest without fs=rofs, and I think zpool
>>>>>> uses it?
>>>>>>
>>>>>> Looking into it...
>>>>>>
>>>>>> On Saturday, December 5, 2020 at 4:36:20 PM UTC-7 [email protected] wrote:
>>>>>>
>>>>>>> I cannot reproduce it on Ubuntu 20.20 or Fedora 33. Here is
>>>>>>> the code fragment where it happens:
>>>>>>>
>>>>>>> 169 bool object::arch_relocate_jump_slot(symbol_module& sym, void *addr, Elf64_Sxword addend)
>>>>>>> 170 {
>>>>>>> 171     if (sym.symbol) {
>>>>>>> 172         *static_cast<void**>(addr) = sym.relocated_addr();
>>>>>>> 173         return true;
>>>>>>> 174     } else {
>>>>>>> 175         return false;
>>>>>>> 176     }
>>>>>>> 177 }
>>>>>>>
>>>>>>> It looks like writing at the addr 0x100000040ca8 in line 172 caused
>>>>>>> the fault. Why?
>>>>>>>
>>>>>>> And then the 2nd page fault in the gdb backtrace happened as the 1st one
>>>>>>> was being handled (not sure if that is a bug or just a state of loading
>>>>>>> of a program).
>>>>>>>
>>>>>>> 981 if (dynamic_exists(DT_HASH)) {
>>>>>>> 982     auto hashtab = dynamic_ptr<Elf64_Word>(DT_HASH);
>>>>>>> 983     return hashtab[1];
>>>>>>> 984 }
>>>>>>>
>>>>>>> Is something wrong with the elf files cpiod.so, mkfs.so or zfs.so or
>>>>>>> something?
>>>>>>>
>>>>>>> Can you try to do the same with ROFS?
>>>>>>>
>>>>>>> fs=rofs
>>>>>>>
>>>>>>> On Saturday, December 5, 2020 at 5:44:12 PM UTC-5 Matthew Kenigsberg wrote:
>>>>>>>
>>>>>>>> Struggling to get scripts/build to run on NixOS because I'm getting
>>>>>>>> a page fault.
>>>>>>>> NixOS does keep shared libraries in nonstandard locations,
>>>>>>>> not sure if that's breaking something. More details below, but any
>>>>>>>> ideas?
>>>>>>>>
>>>>>>>> As far as I can tell, the error is caused by tools/mkfs/mkfs.cc:71:
>>>>>>>> run_cmd("/zpool.so", zpool_args);
>>>>>>>>
>>>>>>>> The error from scripts/build:
>>>>>>>>
>>>>>>>> OSv v0.55.0-145-g97f17a7a
>>>>>>>> eth0: 192.168.122.15
>>>>>>>> Booted up in 154.38 ms
>>>>>>>> Cmdline: /tools/mkfs.so; /tools/cpiod.so --prefix /zfs/zfs/; /zfs.so set compression=off osv
>>>>>>>> Running mkfs...
>>>>>>>> page fault outside application, addr: 0x0000100000040ca8
>>>>>>>> [registers]
>>>>>>>> RIP: 0x000000004039c25a <elf::object::arch_relocate_jump_slot(elf::symbol_module&, void*, long)+26>
>>>>>>>> RFL: 0x0000000000010202 CS: 0x0000000000000008 SS: 0x0000000000000010
>>>>>>>> RAX: 0x000010000007a340 RBX: 0x0000100000040ca8 RCX: 0x000010000006abb0 RDX: 0x0000000000000002
>>>>>>>> RSI: 0x00002000001f6f70 RDI: 0xffffa00001058c00 RBP: 0x00002000001f6f30 R8: 0xffffa00000a68460
>>>>>>>> R9: 0xffffa00000f18da0 R10: 0x0000000000000000 R11: 0x00000000409dd380 R12: 0xffffa00000f18c00
>>>>>>>> R13: 0xffffa00000f18da0 R14: 0x0000000000000000 R15: 0x00000000409dd380 RSP: 0x00002000001f6f20
>>>>>>>> Aborted
>>>>>>>>
>>>>>>>> [backtrace]
>>>>>>>> 0x00000000403458d3 <???+1077172435>
>>>>>>>> 0x00000000403477ce <mmu::vm_fault(unsigned long, exception_frame*)+350>
>>>>>>>> 0x0000000040398ba2 <page_fault+162>
>>>>>>>> 0x0000000040397a16 <???+1077508630>
>>>>>>>> 0x0000000040360a13 <elf::object::resolve_pltgot(unsigned int)+387>
>>>>>>>> 0x0000000040360c38 <elf_resolve_pltgot+56>
>>>>>>>> 0x000000004039764f <???+1077507663>
>>>>>>>> 0xffffa000012b880f <???+19630095>
>>>>>>>>
>>>>>>>> Trying to get a backtrace after connecting with gdb:
>>>>>>>> (gdb) bt
>>>>>>>> #0 abort (fmt=fmt@entry=0x40644b90 "Assertion failed: %s (%s: %s: %d)\n") at runtime.cc:105
>>>>>>>> #1 0x000000004023c6fb in __assert_fail (expr=expr@entry=0x40672cf8 "ef->rflags & processor::rflags_if", file=file@entry=0x40672d25 "arch/x64/mmu.cc", line=line@entry=38, func=func@entry=0x40672d1a "page_fault") at runtime.cc:139
>>>>>>>> #2 0x0000000040398c05 in page_fault (ef=0xffff800000015048) at arch/x64/arch-cpu.hh:107
>>>>>>>> #3 <signal handler called>
>>>>>>>> #4 0x000000004035c879 in elf::object::symtab_len (this=0xffffa00000f18c00) at core/elf.cc:983
>>>>>>>> #5 0x000000004035c938 in elf::object::lookup_addr (this=0xffffa00000f18c00, addr=addr@entry=0x1000000254ce) at core/elf.cc:1015
>>>>>>>> #6 0x000000004035cb07 in elf::program::<lambda(const elf::program::modules_list&)>::operator() (__closure=<synthetic pointer>, __closure=<synthetic pointer>, ml=...) at core/elf.cc:1620
>>>>>>>> #7 elf::program::with_modules<elf::program::lookup_addr(void const*)::<lambda(const elf::program::modules_list&)> > (f=..., this=0xffffa00000097e70) at include/osv/elf.hh:702
>>>>>>>> #8 elf::program::lookup_addr (this=0xffffa00000097e70, addr=addr@entry=0x1000000254ce) at core/elf.cc:1617
>>>>>>>> #9 0x00000000404357cc in osv::lookup_name_demangled (addr=addr@entry=0x1000000254ce, buf=buf@entry=0xffff8000012146d0 "???+19630095", len=len@entry=1024) at core/demangle.cc:47
>>>>>>>> #10 0x000000004023c4e0 in print_backtrace () at runtime.cc:85
>>>>>>>> #11 0x000000004023c6b4 in abort (fmt=fmt@entry=0x40644a9f "Aborted\n") at runtime.cc:121
>>>>>>>> #12 0x0000000040202989 in abort () at runtime.cc:98
>>>>>>>> #13 0x00000000403458d4 in mmu::vm_sigsegv (ef=0xffff800001215068, addr=<optimized out>) at core/mmu.cc:1314
>>>>>>>> #14 mmu::vm_sigsegv (addr=<optimized out>, ef=0xffff800001215068) at core/mmu.cc:1308
>>>>>>>> #15 0x00000000403477cf in mmu::vm_fault (addr=addr@entry=17592186309800, ef=ef@entry=0xffff800001215068) at core/mmu.cc:1328
>>>>>>>> #16 0x0000000040398ba3 in page_fault (ef=0xffff800001215068) at arch/x64/mmu.cc:42
>>>>>>>> #17 <signal handler called>
>>>>>>>> #18 0x000000004039c25a in elf::object::arch_relocate_jump_slot (this=this@entry=0xffffa00000f18c00, sym=..., addr=addr@entry=0x100000040ca8, addend=addend@entry=0) at arch/x64/arch-elf.cc:172
>>>>>>>> #19 0x0000000040360a14 in elf::object::resolve_pltgot (this=0xffffa00000f18c00, index=<optimized out>) at core/elf.cc:843
>>>>>>>> #20 0x0000000040360c39 in elf_resolve_pltgot (index=308, obj=0xffffa00000f18c00) at core/elf.cc:1860
>>>>>>>> #21 0x0000000040397650 in __elf_resolve_pltgot () at arch/x64/elf-dl.S:47
>>>>>>>> #22 0x00001000000254cf in ?? ()
>>>>>>>> #23 0xffffa000012b8800 in ?? ()
>>>>>>>> #24 0x00002000001f74a0 in ?? ()
>>>>>>>> #25 0x00001000000254cf in ?? ()
>>>>>>>> #26 0x00002000001f7480 in ?? ()
>>>>>>>> #27 0x00000000403f241c in calloc (nmemb=<optimized out>, size=<optimized out>) at core/mempool.cc:1811
>>>>>>>> #28 0xffff900000a98000 in ?? ()
>>>>>>>> #29 0x0000000000000000 in ?? ()
>>>>>>>>
>>>>>>>> On Saturday, November 28, 2020 at 1:39:46 PM UTC-7 Matthew Kenigsberg wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'll send something, might take a bit before I find time to work
>>>>>>>>> on it though.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Matthew
>>>>>>>>>
>>>>>>>>> On Saturday, November 28, 2020 at 1:11:11 PM UTC-7 Roman Shaposhnik wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Nov 24, 2020 at 8:03 AM Waldek Kozaczuk <[email protected]> wrote:
>>>>>>>>>> >
>>>>>>>>>> > Hey,
>>>>>>>>>> >
>>>>>>>>>> > Send a patch with a new app that could demonstrate it, please,
>>>>>>>>>> > if you can. I would like to see it. Sounds like a nice improvement.
>>>>>>>>>>
>>>>>>>>>> FWIW: I'd love to see it too -- been meaning to play with Nix and this
>>>>>>>>>> gives me a perfect excuse ;-)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Roman.

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/073bcc64-150e-4a3d-9c93-05f833a95eebn%40googlegroups.com.
