Denys Vlasenko wrote on Wed, Nov 08, 2023 at 04:50:12PM +0100: > On Wed, Nov 8, 2023 at 3:58 AM Dominique Martinet > <asmad...@codewreck.org> wrote: > > > > From: Dominique Martinet <dominique.marti...@atmark-techno.com> > > > > On alpine linux, calling busybox time with a nonexisting command would > > hang forever: > > ``` > > time: can't execute 'fdsa': No such file or directory > > Command exited with non-zero status 127 > > real 0m 0.00s > > user 0m 0.00s > > sys 0m 0.00s > > <hang> > > ``` > > > > This is because time uses vfork then BB_EXECVP_or_die, which ultimately > > calls exit() after execvp fails. exit() is explicitly listed as unsafe > > to call after vfork as the child process shares its memory with the > > parent; in particular on musl exit will take the init_fini_lock() and > > calling it twice in the same address space will just hang. > > _exit() on the other hand is safe to call, so replace die_func with it. > > Doesn't happen to me... maybe I have an older musl.
I've just checked old alpines -- I can reproduce this all the way from alpine 3.10 (musl 1.1.22, busybox 1.30.1) to current edge (musl 1.2.4). alpine 3.9 (musl 1.1.20, busybox 1.29.3) didn't hang; I didn't check all the way but time.c didn't seem to change in that time frame so it's probably a "new" (~2019) bug introduced by the libc change, yes. > > For this particular problem die_func could just be overriden in > > time.c's run_command after vfork, but this problem actually came up > > before in shell/hush.c as per the fflush_and__exit comment so it seems > > more appropriate to do this globally in the xvfork macro and get rid of > > this whole class of bugs forever. > > Maybe we can always use _exit(), > or fflush_all(); _exit() in xfunc_die()? hmm, exit does a bunch of stuff: - we apparently use atexit() in a few applets, so these would need the real exit() to get invoked - any lib destructor would also be called, I'm not sure we rely on any so that's probably fine. Same with on_exit(), not used directly. - The man page for exit(3) on my system also says files created by tmpfile(3) are removed; we don't seem to use it either. - flushing and closinig all FILEs; I agree close itself isn't needed, fflush_all() as you've suggested is probably enough... And hush.c has been calling it after vfork so it's probably fine? because while the address space is shared, that takes and releases locks when it's done, so while it's not explicitely safe it's probably ok. And fflush is definitely needed in the normal case. I'm not sure there's anything else that needs to be done on exit? So that leaves atexit hooks which we need in some applets (mount, sed -- the one in libpwdgrp is just a free "to make valgrind happy" so can be skipped); if we update these two to override die_func instead (don't even need to use exit there, they can just register their hook as die_func instead of atexit), then I guess it would be ok to do `fflush_all(); _exit()` in xfunc_die(), but I'm not 100% comfortable with that so that'll require a bunch of testing. I can update the patch if you really prefer this, but I won't be able to test much more than running the test suite on x86_64/debian and alpines (I guess it's a good excuse to add a new test item for 'time notavalidcommand' at the same time), it'll probably require a bit more validiation. -- Dominique Martinet | Asmadeus _______________________________________________ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox