Hi, thanks for the prompt replies. I was typing up a long response (and had ended up suspecting similar culprits) when I noticed there were two new replies. Thanks for looking into it. I'm guessing this means it is more or less impossible to use pxz on OSv unless I rewrite the source so that it no longer forks the xz executable?
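For reference, here is how I would guard the allocation quoted below if I end up patching pxz. This is only a rough sketch and not code from pxz or OSv: clamp_arg_max(), alloc_cmd_buffer() and the two limits are names and values I made up for illustration. It assumes the stubbed sysconf(_SC_ARG_MAX) is what produced the malloc(size=9223372036854775807) in the backtrace (that value is LONG_MAX on a 64-bit build), which I have not verified.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical limits chosen for this sketch; neither comes from pxz or OSv. */
#define FALLBACK_ARG_MAX 4096L
#define SANE_ARG_MAX_CAP (8L * 1024L * 1024L)

/* Map whatever sysconf(_SC_ARG_MAX) reported onto a size that is safe to malloc(). */
static long clamp_arg_max(long reported)
{
    if (reported <= 0)               /* -1 means error or "no definite limit" */
        return FALLBACK_ARG_MAX;
    if (reported > SANE_ARG_MAX_CAP) /* e.g. a stub that reports a huge value */
        return SANE_ARG_MAX_CAP;
    return reported;
}

/* Allocate the command buffer; returns NULL on failure and stores the size used. */
static char *alloc_cmd_buffer(size_t *out_size)
{
    assert(out_size != NULL);
    long n = clamp_arg_max(sysconf(_SC_ARG_MAX));
    *out_size = (size_t)n;
    return malloc(*out_size);
}
```

In pxz this would stand in for the plain xzcmd = malloc(xzcmd_max) call. It would only avoid the huge allocation; the execve() of xz would still need to be replaced.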

On Thursday, 21 May 2020 21:20:47 UTC+2, Waldek Kozaczuk wrote:
>
> I think this code in the app might explain this huge malloc:
>
> lzma_options_lzma lzma_options;
> xzcmd_max = sysconf(_SC_ARG_MAX);
> page_size = sysconf(_SC_PAGE_SIZE);
> xzcmd = malloc(xzcmd_max);
>
>
> On Thursday, May 21, 2020 at 3:16:29 PM UTC-4, Waldek Kozaczuk wrote:
>>
>> I connected with gdb and here is the stacktrace I got for the main app thread:
>>
>> #0 sched::thread::switch_to (this=this@entry=0xffff8000001d1040) at arch/x64/arch-switch.hh:108
>> #1 0x000000004040dace in sched::cpu::reschedule_from_interrupt (this=0xffff80000001e040, called_from_yield=called_from_yield@entry=false, preempt_after=..., preempt_after@entry=...) at core/sched.cc:339
>> #2 0x000000004040e800 in sched::cpu::schedule () at include/osv/sched.hh:1315
>> #3 0x000000004040e8e6 in sched::thread::wait (this=this@entry=0xffff800000f0a040) at core/sched.cc:1216
>> #4 0x000000004043ca86 in sched::thread::do_wait_for<lockfree::mutex, sched::wait_object<waitqueue> > (mtx=...) at include/osv/mutex.h:41
>> #5 sched::thread::wait_for<waitqueue&> (mtx=...) at include/osv/sched.hh:1225
>> #6 waitqueue::wait (this=this@entry=0x408fa650 <mmu::vma_list_mutex+48>, mtx=...) at core/waitqueue.cc:56
>> #7 0x00000000403eb27b in rwlock::reader_wait_lockable (this=<optimized out>) at core/rwlock.cc:174
>> #8 rwlock::rlock (this=this@entry=0x408fa620 <mmu::vma_list_mutex>) at core/rwlock.cc:29
>> #9 0x000000004034b88c in rwlock_for_read::lock (this=0x408fa620 <mmu::vma_list_mutex>) at include/osv/rwlock.h:113
>> #10 std::lock_guard<rwlock_for_read&>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/9/bits/std_mutex.h:159
>> #11 lock_guard_for_with_lock<rwlock_for_read&>::lock_guard_for_with_lock (lock=..., this=<synthetic pointer>) at include/osv/mutex.h:89
>> #12 mmu::vm_fault (addr=17592186081280, addr@entry=17592186083096, ef=ef@entry=0xffff800000f0f068) at core/mmu.cc:1333
>> #13 0x00000000403adf7c in page_fault (ef=0xffff800000f0f068) at arch/x64/mmu.cc:42
>> #14 <signal handler called>
>> #15 0x00000000405bf0cd in _Unwind_IteratePhdrCallback ()
>> #16 0x000000004047fd37 in <lambda(const elf::program::modules_list&)>::operator() (ml=..., __closure=<synthetic pointer>) at libc/dlfcn.cc:118
>> #17 elf::program::with_modules<dl_iterate_phdr(int (*)(dl_phdr_info*, size_t, void*), void*)::<lambda(const elf::program::modules_list&)> > (f=..., this=0xffffa0000009cbb0) at include/osv/elf.hh:698
>> #18 dl_iterate_phdr (callback=0x405befa0 <_Unwind_IteratePhdrCallback>, data=0x200000700520) at libc/dlfcn.cc:99
>> #19 0x00000000405c0255 in _Unwind_Find_FDE ()
>> #20 0x00000000405bc693 in uw_frame_state_for ()
>> #21 0x00000000405be1da in _Unwind_RaiseException ()
>> #22 0x00000000404c4d1c in __cxa_throw ()
>> #23 0x0000000040205229 in mmu::find_hole (start=<optimized out>, size=<optimized out>) at include/osv/error.h:36
>> #24 0x000000004034ecea in mmu::allocate (v=v@entry=0xffffa00000cf2b80, start=35184372088832, start@entry=0, size=size@entry=9223372036854779904, search=search@entry=true) at core/mmu.cc:1113
>> #25 0x000000004034fa97 in mmu::map_anon (addr=addr@entry=0x0, size=size@entry=9223372036854779904, flags=flags@entry=2, perm=perm@entry=3) at core/mmu.cc:1219
>> #26 0x00000000403f89a0 in memory::mapped_malloc_large (offset=64, size=9223372036854779904) at core/mempool.cc:919
>> #27 memory::malloc_large (size=9223372036854779904, alignment=16, block=true, contiguous=false) at core/mempool.cc:919
>> #28 0x00000000403fa272 in std_malloc (size=9223372036854775807, alignment=16) at core/mempool.cc:1795
>> #29 0x00000000403fa63b in malloc (size=9223372036854775807) at core/mempool.cc:2001
>> #30 0x00001000000075d5 in main ()
>> #31 0x0000000040444c11 in osv::application::run_main (this=0xffffa0007ffb4210) at /usr/include/c++/9/bits/stl_vector.h:915
>> #32 0x0000000040444d65 in __libc_start_main (main=0x100000007560 <main>) at core/app.cc:37
>> #33 0x000010000000801e in _start ()
>>
>> It is trying to allocate tons of memory and it looks like we crash in
>> find_hole(), probably with throw make_error(ENOMEM);
>>
>> I wonder if it is the app (https://github.com/jnovy/pxz/blob/master/pxz.c)
>> passing such a memory size or is there some bug on our side?
>>
>> (BTW osv info threads fails like this - would be nice to fix it:
>>
>> (gdb) osv info threads
>> 1 (0xffff800000017040) reclaimer cpu0 status::waiting condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:43 vruntime 6.07461e-25
>> Python Exception <class 'Exception'> Class does not extend list_base_hook: sched::timer_base:
>> Error occurred in Python: Class does not extend list_base_hook: sched::timer_base
>> )
>>
>> When I examined pxz.c it eventually calls execvpe() which will definitely
>> NOT work in OSv (OSv does not support processes so forking does not work ->
>> there is some research fork that does that which I sent a paper about
>> recently).
>>
>> void __attribute__((noreturn)) run_xz( char **argv, char **envp ) {
>>         execve(XZ_BINARY, argv, envp);
>>         error(0, errno, "execution of "XZ_BINARY" binary failed");
>>         exit(EXIT_FAILURE);
>> }
>>
>> xz seems to work fine (at least --help):
>>
>> ./scripts/manifest_from_host.sh -w xz && ./scripts/build --append-manifest fs=rofs
>> ./scripts/firecracker.py
>> OSv v0.55.0-9-gc13529d9
>> Booted up in 7.42 ms
>> Cmdline: /xz --help
>> Usage: /xz [OPTION]... [FILE]...
>> Compress or decompress FILEs in the .xz format.
>>
>>   -z, --compress      force compression
>>   -d, --decompress    force decompression
>>   -t, --test          test compressed file integrity
>>   -l, --list          list information about .xz files
>>   -k, --keep          keep (don't delete) input files
>>   -f, --force         force overwrite of output file and (de)compress links
>>   -c, --stdout        write to standard output and don't delete input files
>>   -0 ... -9           compression preset; default is 6; take compressor *and*
>>                       decompressor memory usage into account before using 7-9!
>>   -e, --extreme       try to improve compression ratio by using more CPU time;
>>                       does not affect decompressor memory requirements
>>   -T, --threads=NUM   use at most NUM threads; the default is 1; set to 0
>>                       to use as many threads as there are processor cores
>>   -q, --quiet         suppress warnings; specify twice to suppress errors too
>>   -v, --verbose       be verbose; specify twice for even more verbose
>>   -h, --help          display this short help and exit
>>   -H, --long-help     display the long help (lists also the advanced options)
>>   -V, --version       display the version number and exit
>>
>> With no FILE, or when FILE is -, read standard input.
>>
>> Report bugs to <lasse...@tukaani.org> (in English or Finnish).
>> XZ Utils home page: <https://tukaani.org/xz/>
>>
>> Waldek
>>
>> On Thursday, May 21, 2020 at 6:59:07 AM UTC-4, Nadav Har'El wrote:
>>>
>>> On Thu, May 21, 2020 at 12:46 PM De Vries <f1r3fl...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Sorry if this is a bit of a newbie question. I'm trying to run a pretty
>>>> simple application on OSv: pxz <https://github.com/jnovy/pxz>. I'm
>>>> able to run other apps like mysql for example without any problem.
>>>> I have tried this the following way. First, I compiled the pxz
>>>> executable with the -fPIE flag on the host machine, then put it in a new
>>>> folder at osv/apps/pxz. I then ran the following:
>>>> ./scripts/manifest_from_host.sh -r ~/osv/apps/pxz/pxz > ./apps/pxz/usr.manifest
>>>> ./scripts/build image=pxz
>>>>
>>>> It generates the following usr.manifest
>>>> # (PIE) Position Independent Executable
>>>> /pxz: /home/user1/osv/apps/pxz/pxz
>>>> # --------------------
>>>> # Dependencies
>>>> # --------------------
>>>> /usr/lib/libgomp.so.1: /usr/lib/x86_64-linux-gnu/libgomp.so.1
>>>> /usr/lib/liblzma.so.5: /lib/x86_64-linux-gnu/liblzma.so.5
>>>> # --------------------
>>>>
>>>> Running it with
>>>> ./scripts/run.py -e "pxz --version"
>>>>
>>>> Results in
>>>> OSv v0.55.0-6-g557251e1
>>>> eth0: 192.168.122.15
>>>> Booted up in 407.56 ms
>>>> Cmdline: pxz --version
>>>>
>>>> But it just hangs. No errors, but also no output. I have tried actually
>>>> using pxz (not just --version) to compress a file but that also hangs
>>>> indefinitely (while this works fine on the host machine).
>>>>
>>>
>>> It's hard to say. It seems like you did everything right. I assume that
>>> if you run "pxz --version" on the host it works properly - prints a version
>>> number and exits - right?
>>> During the "hang", does OSv do some busy loop ("top" will show you the
>>> OSv vm taking 100% CPU) or wait for something?
>>>
>>> One thing you can do to figure out what is going on is to attach gdb to
>>> the running VM, and inquire from it what threads are running, and what they
>>> are waiting for.
>>> It's not trivial to do, but not particularly difficult either, and
>>> explained well (I hope) here:
>>> https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb
>>> Note that you don't need to rebuild OSv specially for debugging to debug
>>> it this way.
>>>
>>>
>>>>
>>>> Running ./scripts/run.py with the -V flag looks completely fine except
>>>> maybe for the last line that is printed (after it prints Cmdline: pxz
>>>> --version):
>>>> sysconf(): stubbed for parameter 0
>>>>
>>>>
>>> This is a _SC_ARG_MAX parameter to sysconf(), it is indeed not
>>> implemented (and can be trivially implemented) but I doubt that this is the
>>> problem causing the hang (I also wonder why this program would need to
>>> check _SC_ARG_MAX if it's just planning to print the version number, not
>>> exec() anything - you can look at this software's source code to see what
>>> it does with _SC_ARG_MAX).
>>>
>>>
>>>
>>>> I have also tried to run pxz using the way it's done in the
>>>> native-example application, but that also results in it hanging
>>>> indefinitely.
>>>> What could be the issue here?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "OSv Development" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to osv...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/osv-dev/9ce2c259-c6e9-475d-aa73-e7e6d71cd722%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/osv-dev/9ce2c259-c6e9-475d-aa73-e7e6d71cd722%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>>
>>>