This is possibly one of the longest emails I have ever written. So bear with me ...
After all the recent patches I have submitted, which replaced many source files under libc/ (~70) and headers under include/api/ (~25) with their identical copies under the musl/ folder, there is still a lot left to deal with. This is not going to be an easy walk, but after some substantial research it seems to be doable.

In general, in order to ease the upgrade to a newer version of musl (probably 1.1.24), we should first prepare (clean, prune) the existing code under include/api and libc/ and at the same time identify files that do not need any change (should not be affected by the upgrade).

Before we agree on the exact steps I am proposing later (with some outstanding questions), let me first do an inventory of what is left under include/api and libc/. So at this point, we have:

- 93 header files under include/api (find include/api/ -type f -name \*.h | wc -l)
  - I think most of those originate from musl but were manually changed. I hope most of them can, with some effort, be replaced with the current or new version of musl - any other ideas?
  - Out of those, 19 files under include/api/aarch64 for sure come from a newer version of musl, so until the upgrade they should stay as is
- under libc/ (total of 215 files)
  - 15 C header files
  - 153 C source files
  - 6 C++ header files
  - 41 C++ source files

Now let me break those down into the following buckets:

- The files I believe *do NOT need to change* with the musl upgrade because they originate from somewhere else or were written from scratch (mostly in C++); see some of my comments/questions next to each one [total of 56 files]:
  - misc/realpath.c
  - vdso/vdso.c // Somehow get rid of it
  - process/execle.c
  - process/execve.cc
  - process/waitpid.cc
  - unistd/getppid.c
  - unistd/getsid.c
  - unistd/getpgid.c
  - unistd/getpgrp.c
  - unistd/sethostname.c
  - math/finite.c
  - math/finitef.c
  - math/finitel.c
  - math/aliases.c // SEE if there is a way to somehow use aliases without modifying musl, or whether the new musl has those aliases
  - misc/uname.c
  - linux/makedev.c // some relationship to musl macros
  - errno/strerror.c // small file originally taken from musl but does not look like the one in musl anymore
  - libc/libc.hh // depends on "internal/libc.h"
  - libc/pthread.hh
  - libc/pipe_buffer.hh
  - libc/network/__dns.hh
  - libc/signal.hh
  - libc/arch/x64/ucontext/ucontext.cc
  - libc/misc/backtrace.cc
  - libc/misc/mntent.cc
  - libc/misc/mntent.cc
  - libc/network/__dns.cc // Where did it come from? Should anything in it be upgraded from musl or some other place?
  - libc/process/execve.cc
  - libc/process/waitpid.cc
  - libc/stdio/__stdout_write.cc
  - libc/stdio/printf-hooks.cc
  - libc/unistd/setpgid.cc - just a stub
  - libc/unistd/setsid.cc - just a stub
  - libc/unistd/sync.cc - just a stub
  - libc/af_local.cc
  - libc/cxa_thread_atexit.cc
  - libc/cpu_set.cc
  - libc/dlfcn.cc
  - libc/eventfd.cc
  - libc/malloc_hooks.cc
  - libc/mallopt.cc
  - libc/mman.cc
  - libc/mount.cc
  - libc/notify.cc
  - libc/pipe.cc
  - libc/pipe_buffer.cc
  - libc/pthread.cc
  - libc/pthread_barrier.cc
  - libc/resource.cc
  - libc/sem.cc
  - libc/shm.cc
  - libc/signal.cc
  - libc/time.cc
  - libc/timerfd.cc
  - libc/user.cc
  - libc/arch/x64/setjmp/sigrtmin.c -> musl/src/signal/sigrtmin.c // there does not seem to be an aarch64 version in musl, but OSv has one

- The family of *_chk() functions that provide a small layer of glibc compatibility and *should NOT need to change*. Should we move them to a libc/glibc-compat directory (there is already an include/glibc-compat directory)? Total of 17:
  - __read_chk.c
  - stdio/__fprintf_chk.c
  - stdio/__fread_chk.c
  - stdio/__vfprintf_chk.c
  - string/__stpcpy_chk.c
  - string/__strcpy_chk.c
  - string/__wcscpy_chk.c
  - string/__memcpy_chk.c
  - string/__strcat_chk.c
  - string/__memset_chk.c
  - string/__strncat_chk.c
  - string/__explicit_bzero_chk.c
  - string/__memmove_chk.c
  - string/__strncpy_chk.c
  - __pread64_chk.cc -> why is it a C++ function?
  - internal/_chk_fail.cc
  - misc/__longjmp_chk.cc -> why is it a C++ function?
  - Also, consider moving __realpath_chk() out of misc/realpath.c into its own file.

- Files that differ from their current musl equivalents but should most likely be replaced in the Makefile with the musl ones (see the point_to_musl_as_is.diff attachment):
  - string/strsignal.c - the current musl one seems to be newer and more up to date - no significant differences, it seems
  - internal/floatscan.h
  - internal/floatscan.c
  - internal/intscan.h
  - internal/intscan.c
  - internal/shgetc.c - possibly can be replaced with musl ones
  - prng/random.c
  - network/gai_strerror.cc -> ../musl/src/network/gai_strerror.c
  - misc/lockf.cc -> ../musl/src/misc/lockf.c // why is it C++? seems to have minimal changes

- Files under arch (21 files) - I have no idea whether they are subject to upgrade:
  - libc/arch/aarch64/setjmp/sigsetjmp.s
  - libc/arch/aarch64/setjmp/sigrtmin.c
  - libc/arch/aarch64/setjmp/block.c
  - libc/arch/aarch64/setjmp/longjmp.s
  - libc/arch/aarch64/setjmp/siglongjmp.c
  - libc/arch/aarch64/setjmp/sigrtmax.c
  - libc/arch/aarch64/setjmp/setjmp.s
  - libc/arch/aarch64/atomic.h
  - libc/arch/x64/ucontext/getcontext.s
  - libc/arch/x64/ucontext/ucontext.cc
  - libc/arch/x64/ucontext/start_context.s
  - libc/arch/x64/ucontext/setcontext.s
  - libc/arch/x64/setjmp/sigsetjmp.s
  - libc/arch/x64/setjmp/sigrtmin.c
  - libc/arch/x64/setjmp/block.c
  - libc/arch/x64/setjmp/longjmp.s
  - libc/arch/x64/setjmp/siglongjmp.c
  - libc/arch/x64/setjmp/sigrtmax.c
  - libc/arch/x64/setjmp/setjmp.s
  - libc/arch/x64/atomic.h
  - libc/arch/arm/src/__aeabi_atexit.c

- locale/ (26 files) - I think most of those come from an older musl and can be replaced with their newer copies in the current musl. I think libc/locale has two files for each family of functions, for example libc/locale/strcoll.c and libc/locale/strcoll_l.c vs a single musl/src/locale/strcoll.c; see the locale.diff attachment for details and this musl commit as an example - https://git.musl-libc.org/cgit/musl/diff/?id=4b0306c83c8c3614afbaf18a18e22d24f335ea04
  - locale/strcoll_l.c
  - locale/wcsftime_l.c
  - locale/iswctype_l.c
  - locale/freelocale.c
  - locale/wcsxfrm_l.c
  - locale/uselocale.c
  - locale/setlocale.c
  - locale/strtold_l.c
  - locale/wcsxfrm.c
  - locale/strfmon.c
  - locale/strcoll.c
  - locale/wcscoll.c
  - locale/strxfrm.c
  - locale/strtof_l.c
  - locale/towlower_l.c
  - locale/towupper_l.c
  - locale/toupper_l.c
  - locale/nl_langinfo_l.c
  - locale/langinfo.c
  - locale/strtod_l.c
  - locale/strftime_l.c
  - locale/wctype_l.c
  - locale/wcscoll_l.c
  - locale/duplocale.c
  - locale/strxfrm_l.c

- time/ - time functions that I think come from an older musl but were not changed afterwards; with little effort they can be replaced with (pointed to) the musl originals; see the time.diff attachment and this musl commit - https://git.musl-libc.org/cgit/musl/diff/?id=1cc81f5cb0df2b66a795ff0c26d7bbc4d16e13c6
  - time/__tm_to_time.c
  - time/__time_to_tm.c
  - time/timegm.c
  - time/__asctime.c
  - time/__time.h
  - time/mktime.c
  - time/strftime.c
  - time/tzset.c
  - time/wcsftime.c
  - time/localtime_r.c
  - time/gmtime_r.c
  - time/localtime.c
  - time/gmtime.c
  - time/ftime.c

- network/ - the diffs to musl:
  - network/if_indextoname.c - SYS_close replaced with close() and socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) replaced with socket(AF_UNIX, SOCK_DGRAM, 0)
  - network/gai_strerror.cc - ?
  - network/__ipparse.c - includes __dns.hh instead of __dns.h
  - network/if_nameindex.c - socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) replaced with socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC, 0)
  - network/gethostbyname_r.c - I think it originates from musl and is identical except for an extra gethostbyname() that can probably be moved to a separate file?
  - network/inet_addr.c - does NOT exist in musl
  - network/getifaddrs.c - completely different from the musl copy
  - network/getnameinfo.c - seems to originate from musl but has extra code and maybe has to stay as is, just like getaddrinfo()
  - network/res_init.c - only diff: weak_alias(res_init, __res_init)
  - network/if_nametoindex.c - similar to if_indextoname (see above)
  - network/__dns.hh/__dns.cc - should stay as is
  - network/getaddrinfo.c - seems to originate from musl but is quite different from the current musl version, so maybe should stay as is
  - network/inet_ntop.c - originates from musl and has not changed since, but is different from the current musl
  - network/inet_aton.c - does not exist in musl, but the only commit says it was taken from musl-libc

- unistd/ - only files that are possibly subject to upgrade:
  - ttyname_r.c - implemented by Rean for node js and I do not think it comes from musl
  - ttyname.c - almost identical to an existing file in musl (see attached diff) and delegates to ttyname_r(); maybe we should point to the musl one - I think we have unit tests for both

- stdlib/
  - wcstol.c - besides including different headers, it replaces f.lock = -1 with f.no_locking = true (what is this exactly about?)
  - strtol.c - similar to wcstol.c
  - strtod.c - similar to wcstol.c
  - qsort_r.c - does not exist in current musl (I checked in 1.1.24), so maybe should stay as is

- string/ (_chk.c files and others not subject to upgrade skipped)
  - string/memset.c - seems identical but memset_base() vs memset() - what is it about?
  - string/memmove.c - this is OSv specific so probably should stay as is
  - string/strtok_r.c - seems identical except for '__restrict' vs 'restrict' and an #undef
  - string/stresep.c - per the commit, seems to come from NetBSD (ftp.tku.edu.tw/NetBSD/NetBSD-current/src/lib/libc/string/stresep.c) -> should stay as is
  - string/__strndup.c - not in musl, seems to come from somewhere else
  - string/rawmemchr.c - not from musl, where does it come from?
  - string/strerror_r.c - originates from musl but seems to be completely different from the current musl copy; modified to be both POSIX and glibc compliant
  - string/memcpy.c - implementation seems to be OSv specific, so maybe is not subject to musl upgrade
  - string/strlcat.c - slightly different from current musl but has not been modified since the original import -> probably we should use the musl version
  - string/explicit_bzero.c - not from musl it seems; glibc?

- stdio/ - this one is the largest in terms of changes (but maybe not) and the biggest mess:
  - fmemopen.c - has an extra 'if (!libc.threaded) f->lock = -1;' to deal with some thread-related issue - see commit 3f053b8f950be4b89fcb928f956ddc31ba8ba68a
  - ftrylockfile.c - pthread_self()->tid vs mutex_owned(&f->mutex)
  - __stdio_read.c - SYS_readv vs readv() mostly
  - open_wmemstream.c - extra 'if (!libc.threaded) f->lock = -1;' just like fmemopen.c
  - __fdopen.c - SYS_fcntl vs fcntl() and SYS_ioctl vs ioctl() mostly
  - __stdio_close.c - SYS_close vs close()
  - vfwscanf.c - quite different from current musl but seems not to have been modified for OSv purposes -> so maybe it just needs to be repointed in the Makefile to the musl copy?
  - vdprintf.c - .lock = -1 vs .no_locking = true (what is this about?)
  - vfprintf.c - the only difference is the handling of the L prefix for long long int
  - __stdio_seek.c - SYS_lseek vs lseek()
  - tmpnam.c - SYS_access vs access() and SYS_clock_gettime vs clock_gettime()
  - getc.c - 'if (f->lock < 0 || !__lockfile(f))' vs 'if (f->no_locking || !__lockfile(f))'
  - fopen.c - SYS_open vs open() and SYS_close vs close()
  - vsnprintf.c - .lock = -1 vs .no_locking = true plus extra handling of some overflow condition
  - flockfile.c - seems to have been rewritten for OSv, ftrylockfile vs mutex_owned() etc.
  - vfscanf.c - seems to originate from musl, but per a couple of old commits from 2013 certain features have been disabled; should we repoint it to current musl?
  - vswscanf.c - '.lock = -1' vs '.no_locking = true'
  - stdin.c - modified for all kinds of reasons
  - freopen.c - SYS_fcntl vs fcntl() and __dup3() vs dup3()
  - sscanf.c - extra weak_alias for __isoc99_sscanf
  - tmpfile.c - SYS_open vs open() and SYS_close vs unlink()
  - stdout.c - extra '.lock = -1' - what is this about?
  - open_memstream.c - extra 'if (!libc.threaded) f->lock = -1;'
  - putc.c - 'if (f->lock < 0 || !__lockfile(f))' vs 'if (f->no_locking || !__lockfile(f))'
  - __lockfile.c - seems to have been heavily adapted for OSv purposes
  - remove.c - SYS_unlink vs unlink() plus some POSIX compliance changes
  - fputc.c - 'if (f->lock < 0 || !__lockfile(f))' vs 'if (f->no_locking || !__lockfile(f))'
  - __stdio_write.c - SYS_writev vs writev()
  - stderr.c - extra '.lock = -1,'
  - funlockfile.c - includes "pthread_impl.h"
  - fgetc.c - 'if (f->lock < 0 || !__lockfile(f))' vs 'if (f->no_locking || !__lockfile(f))'
  - vsscanf.c - '.read = do_read, .lock = -1' vs '.read = do_read, .no_locking = true,' plus weak_alias for __isoc99_vsscanf
  - vswprintf.c - 'f.lock = -1;' vs 'f.no_locking = true;'
  - __fopen_rb_ca.c - SYS_open vs open() plus 'f->lock = -1;' (what is this about?)

From the above, it seems there are 3 categories of files under libc/:

1. Brand new implementations of libc functionality, mostly in C++, that will not need to change as part of the musl upgrade
2. Glibc compatibility code or implementations not from musl; like the above, they will not need to change as part of the musl upgrade
3. Files that originate from musl but need to be adapted for OSv-specific reasons

To help us with the upgrade we should add more unit tests. To that end, I have found the musl unit tests - https://wiki.musl-libc.org/libc-test.html - and possibly some unit tests from bionic (https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/stdio_test.cpp or https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/stdlib_test.cpp).
The bionic ones would need to be adapted but the musl ones could be used as-is.

To minimize the effort for future upgrades, I was wondering if the following ideas could help us:

- use some preprocessor macro "magic" to replace syscall() calls with regular function calls - it seems like many files under libc/ are original musl source except for this type of change
- in cases where we are forced to modify musl originals, maybe instead of storing modified files under libc/ like we do now, we should store *.patch files and apply them as part of a new build process step; would it be better?

So I think the upgrade plan should be this:

1. Add more unit tests
2. Prune/clean libc/ to minimize the set of files that are part of category 3 (see above)
3. Do the upgrade (per Nadav's original plan):
   - Bring the latest musl version into a "musl-1.1.24" subdirectory in OSv.
   - Leave the old "musl/" and "libc/" directories as well.
   - In the Makefile, make a new list of objects (say, "nmusl") which will take code from musl-1.1.24/ instead of musl/.
   - Start to switch individual files and directories from "musl += ..." to "nmusl += ...". We can start with the math functions needed for aarch64. Eventually, everything can be converted to nmusl and hopefully nothing, or little, will break and need to be fixed.
   - Replace our include/api with musl-1.1.24/include. It would be even better to drop include/api and just use musl-1.1.24/include directly if we can. I think, though, this is low priority, and might take quite a bit of work ("git log include/api" shows we modified this quite a bit since we took it from musl). I'm hoping that the old header files will work correctly also for the newer musl.
     - I think this might be a tricky one at this point

What do you think?

Waldek
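
A postscript on step 1 of the plan: to give an idea of the kind of small, self-checking test I have in mind, here is a minimal sketch in C for two of the functions listed above (strtol and strtok_r). The helper names check_strtol() and check_strtok_r() are made up for illustration and are not part of OSv, musl or libc-test; the checks only rely on behavior required by the C and POSIX standards, so they should hold no matter which copy of the source we end up building.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if strtol behaves as the C standard requires. */
static int check_strtol(void)
{
    char *end;

    /* Leading whitespace and sign are accepted; parsing stops at 'x'. */
    if (strtol("  -42xyz", &end, 10) != -42 || strcmp(end, "xyz") != 0)
        return 0;

    /* Base 0 auto-detects the 0x prefix. */
    if (strtol("0x1f", &end, 0) != 31 || *end != '\0')
        return 0;

    /* Overflow must saturate at LONG_MAX and set errno to ERANGE. */
    errno = 0;
    if (strtol("999999999999999999999999", &end, 10) != LONG_MAX || errno != ERANGE)
        return 0;

    return 1;
}

/* Hypothetical helper: returns 1 if strtok_r splits "a,b,,c" into a, b, c. */
static int check_strtok_r(void)
{
    char buf[] = "a,b,,c";
    char *save;
    char *tok;

    tok = strtok_r(buf, ",", &save);
    if (!tok || strcmp(tok, "a") != 0)
        return 0;
    tok = strtok_r(NULL, ",", &save);
    if (!tok || strcmp(tok, "b") != 0)
        return 0;
    /* Consecutive delimiters are skipped, so no empty token is returned. */
    tok = strtok_r(NULL, ",", &save);
    if (!tok || strcmp(tok, "c") != 0)
        return 0;
    return strtok_r(NULL, ",", &save) == NULL;
}
```

A test program would simply assert that both helpers return 1; running the same binary before and after a file is repointed to the musl copy would tell us whether anything visible changed.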
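
A postscript on the preprocessor macro idea: below is a rough sketch in C of what I mean, not a worked-out design. The macro name osv_syscall and the osv_sys_ prefix are made up for illustration; in the real tree the macro would have to take the place of whatever syscall wrapper the unmodified musl sources call, and return types (long vs int) would need more care. The trick is token pasting: the syscall name is glued onto a prefix, which selects a per-syscall macro that forwards to the regular function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* One forwarding macro per syscall we want to redirect (hypothetical names). */
#define osv_sys_SYS_close(fd)          close(fd)
#define osv_sys_SYS_access(path, mode) access(path, mode)
#define osv_sys_SYS_unlink(path)       unlink(path)

/*
 * Because 'nr' sits next to ##, it is pasted as written and is NOT expanded
 * to its numeric value first, even though <sys/syscall.h> may define SYS_close
 * as a macro. So osv_syscall(SYS_close, fd) becomes osv_sys_SYS_close(fd),
 * which in turn becomes close(fd).
 */
#define osv_syscall(nr, ...) osv_sys_##nr(__VA_ARGS__)

/* Demo: open a pipe and close both ends through the macro. Returns 0 on success. */
static int demo_close_pair(void)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;

    int r1 = osv_syscall(SYS_close, fds[0]);
    int r2 = osv_syscall(SYS_close, fds[1]);

    return (r1 == 0 && r2 == 0) ? 0 : -1;
}
```

With something along these lines, a file whose only OSv-specific change is "SYS_close vs close()" could in principle be built from the unmodified musl copy. A syscall that has no forwarding macro would fail at compile time with an undeclared identifier, which is probably what we want.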