This is possibly one of the longest emails I have ever written. So bear 
with me ...

Even after the recent patches I submitted, which replaced many source 
files under libc/ (~70) and headers under include/api/ (~25) with their 
identical copies under the musl/ folder, there is still a lot left to deal 
with. This is not going to be an easy walk, but after some substantial 
research it seems to be doable. In general, to ease the upgrade to a newer 
version of musl (probably 1.1.24), we should first prepare (clean, prune) 
the existing code under include/api and libc/ and, at the same time, 
identify the files that do not need any change (should not be affected by 
the upgrade).

Before we agree on the exact steps I am proposing later (with some 
outstanding questions), let me first do an inventory of what is left under 
include/api and libc/.

So at this point, we have:

   - 93 header files under include/api - (find include/api/ -type f -name 
   \*.h | wc -l)
      - I think most of those originate from musl but were manually 
      changed. I hope most of them with some effort can be replaced with the 
      current or new version of musl - any other ideas?
      - Out of those, 19 files under include/api/aarch64 for sure come from 
      a newer version of musl so until the upgrade they should stay as is
   - under libc/ (total of 215 files)
      - 15 C header files
      - 153 C source files 
      - 6 C++ header files
      - 41 C++ source files
   
Now let me break those down into the following buckets:

   - The files I believe *do NOT need to change* with the musl upgrade 
   because they originate from somewhere else or were written from scratch 
   (mostly in C++); see some of my comments/questions next to each one [total 
   of 56 files]
      - misc/realpath.c
      - vdso/vdso.c // Somehow get rid of it
      - process/execle.c
      - process/execve.cc
      - process/waitpid.cc
      - unistd/getppid.c
      - unistd/getsid.c
      - unistd/getpgid.c
      - unistd/getpgrp.c
      - unistd/sethostname.c
      - math/finite.c
      - math/finitef.c
      - math/finitel.c
      - math/aliases.c // see if there is a way to somehow use aliases 
      without modifying musl, or whether new musl has those aliases
      - misc/uname.c
      - linux/makedev.c // some relationship to musl macros
      - errno/strerror.c // small file originally taken from musl but does 
      not look like the one in musl anymore
      - libc/libc.hh // depends on "internal/libc.h"
      - libc/pthread.hh
      - libc/pipe_buffer.hh
      - libc/network/__dns.hh
      - libc/signal.hh
      - libc/arch/x64/ucontext/ucontext.cc
      - libc/misc/backtrace.cc
      - libc/misc/mntent.cc
      - libc/misc/mntent.cc
      - libc/network/__dns.cc // Where did it come from? Does anything 
      have/should come/get upgraded from musl or other place?
      - libc/process/execve.cc
      - libc/process/waitpid.cc
      - libc/stdio/__stdout_write.cc
      - libc/stdio/printf-hooks.cc
      - libc/unistd/setpgid.cc - just a stub
      - libc/unistd/setsid.cc - just a stub
      - libc/unistd/sync.cc - just a stub
      - libc/af_local.cc
      - libc/cxa_thread_atexit.cc
      - libc/cpu_set.cc
      - libc/dlfcn.cc
      - libc/eventfd.cc
      - libc/malloc_hooks.cc
      - libc/mallopt.cc
      - libc/mman.cc
      - libc/mount.cc
      - libc/notify.cc
      - libc/pipe.cc
      - libc/pipe_buffer.cc
      - libc/pthread.cc
      - libc/pthread_barrier.cc
      - libc/resource.cc
      - libc/sem.cc
      - libc/shm.cc
      - libc/signal.cc
      - libc/time.cc
      - libc/timerfd.cc
      - libc/user.cc
      - libc/arch/x64/setjmp/sigrtmin.c -> musl/src/signal/sigrtmin.c // 
      there does not seem to be an aarch64 version, but OSv has one
   - The family of *_chk() functions that provide a small layer of glibc 
   compatibility and *should NOT need to change*. Should we move them to a 
   libc/glibc-compat dir (there is already an include/glibc-compat 
   directory)? Total of 17:
      - __read_chk.c
      - stdio/__fprintf_chk.c
      - stdio/__fread_chk.c
      - stdio/__vfprintf_chk.c
      - string/__stpcpy_chk.c
      - string/__strcpy_chk.c
      - string/__wcscpy_chk.c
      - string/__memcpy_chk.c
      - string/__strcat_chk.c
      - string/__memset_chk.c
      - string/__strncat_chk.c
      - string/__explicit_bzero_chk.c
      - string/__memmove_chk.c
      - string/__strncpy_chk.c
      - __pread64_chk.cc -> why is it a C++ function?
      - internal/_chk_fail.cc
      - misc/__longjmp_chk.cc -> why is it a C++ function?
      - Also, consider moving __realpath_chk() out of misc/realpath.c 
      into its own file.
   - Files that are different from the current musl equivalents but 
   should most likely be replaced in the Makefile with the musl ones (see the 
   point_to_musl_as_is.diff attachment):
      - string/strsignal.c - the current musl one seems to be newer and 
      more up to date
      - no significant differences, it seems
         - internal/floatscan.h
         - internal/floatscan.c
         - internal/intscan.h
         - internal/intscan.c
         - internal/shgetc.c
      - possibly can be replaced with musl ones
         - prng/random.c
         - network/gai_strerror.cc -> ../musl/src/network/gai_strerror.c
         - misc/lockf.cc -> ../musl/src/misc/lockf.c // why is it C++? seems 
         to have minimal changes
   - Files under arch (21 files); I have no idea if they are subject to 
   upgrade
      - libc/arch/aarch64/setjmp/sigsetjmp.s
      - libc/arch/aarch64/setjmp/sigrtmin.c
      - libc/arch/aarch64/setjmp/block.c
      - libc/arch/aarch64/setjmp/longjmp.s
      - libc/arch/aarch64/setjmp/siglongjmp.c
      - libc/arch/aarch64/setjmp/sigrtmax.c
      - libc/arch/aarch64/setjmp/setjmp.s
      - libc/arch/aarch64/atomic.h
      - libc/arch/x64/ucontext/getcontext.s
      - libc/arch/x64/ucontext/ucontext.cc
      - libc/arch/x64/ucontext/start_context.s
      - libc/arch/x64/ucontext/setcontext.s
      - libc/arch/x64/setjmp/sigsetjmp.s
      - libc/arch/x64/setjmp/sigrtmin.c
      - libc/arch/x64/setjmp/block.c
      - libc/arch/x64/setjmp/longjmp.s
      - libc/arch/x64/setjmp/siglongjmp.c
      - libc/arch/x64/setjmp/sigrtmax.c
      - libc/arch/x64/setjmp/setjmp.s
      - libc/arch/x64/atomic.h
      - libc/arch/arm/src/__aeabi_atexit.c
   - locale/ (26 files) - I think most of those come from older musl and 
   can be replaced with their newer copies in current musl; I think 
   libc/locale has two files for each family of functions - for example 
   libc/locale/strcoll.c and libc/locale/strcoll_l.c vs a single 
   musl/src/locale/strcoll.c; see the locale.diff attachment for details and 
   this musl commit as an example - 
   https://git.musl-libc.org/cgit/musl/diff/?id=4b0306c83c8c3614afbaf18a18e22d24f335ea04
      - locale/strcoll_l.c
      - locale/wcsftime_l.c
      - locale/iswctype_l.c
      - locale/freelocale.c
      - locale/wcsxfrm_l.c
      - locale/uselocale.c
      - locale/setlocale.c
      - locale/strtold_l.c
      - locale/wcsxfrm.c
      - locale/strfmon.c
      - locale/strcoll.c
      - locale/wcscoll.c
      - locale/strxfrm.c
      - locale/strtof_l.c
      - locale/towlower_l.c
      - locale/towupper_l.c
      - locale/toupper_l.c
      - locale/nl_langinfo_l.c
      - locale/langinfo.c
      - locale/strtod_l.c
      - locale/strftime_l.c
      - locale/wctype_l.c
      - locale/wcscoll_l.c
      - locale/duplocale.c
      - locale/strxfrm_l.c
   - time/ - time functions that I think come from older musl but were not 
   changed afterwards; with little effort they can be replaced with (pointed 
   to) the musl originals; see the time.diff attachment and this musl commit - 
   https://git.musl-libc.org/cgit/musl/diff/?id=1cc81f5cb0df2b66a795ff0c26d7bbc4d16e13c6
      - time/__tm_to_time.c
      - time/__time_to_tm.c
      - time/timegm.c
      - time/__asctime.c
      - time/__time.h
      - time/mktime.c
      - time/strftime.c
      - time/tzset.c
      - time/wcsftime.c
      - time/localtime_r.c
      - time/gmtime_r.c
      - time/localtime.c
      - time/gmtime.c
      - time/ftime.c
   - network/ - the diffs to musl
      - network/if_indextoname.c - SYS_close replaced with close() and 
      socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) replaced with socket(AF_UNIX, 
      SOCK_DGRAM, 0)
      - network/gai_strerror.cc - ?
      - network/__ipparse.c - includes __dns.hh instead of __dns.h
      - network/if_nameindex.c - socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 
      0) replaced with socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC, 0);
      - network/gethostbyname_r.c - I think it originates from musl and is 
      identical except for an extra gethostbyname() that can probably be moved 
      to a separate file?
      - network/inet_addr.c - does NOT exist in musl
      - network/getifaddrs.c - completely different from musl copy
      - network/getnameinfo.c - seems to originate from musl but has extra 
      code and maybe has to stay as is just like getaddrinfo()
      - network/res_init.c - only diff: weak_alias(res_init, __res_init)
      - network/if_nametoindex.c - similar to if_indextoname (see above)
      - network/__dns.hh/__dns.cc - should stay as is 
      - network/getaddrinfo.c - seems to originate from musl but is quite 
      different from the current musl version so maybe should stay as is
      - network/inet_ntop.c - originates from musl and has not changed 
      since; but different from current musl
      - network/inet_aton.c - does not exist in musl but the only commit 
      says it was taken from musl-libc
   - unistd/ - only files that possibly are subject to upgrade
      - ttyname_r.c - implemented by Rean for node js and I do not think it 
      comes from musl
      - ttyname.c - almost identical to an existing file in musl (see 
      attached diff) and delegates to ttyname_r(); maybe we should point to the 
      musl one
      - I think we have unit tests for both
   - stdlib/
      - wcstol.c - besides including different headers it replaces f.lock = 
      -1 with f.no_locking = true (what is this exactly about?)
      - strtol.c - similar to wcstol.c
      - strtod.c - similar to wcstol.c
      - qsort_r.c - does not exist in current musl (I checked in 1.1.24) so 
      maybe should stay as is
   - string/ - (_chk.c files and others not subject to upgrade skipped)
      - string/memset.c - seems identical but memset_base() vs memset() - 
      what is it about?
      - string/memmove.c - this is OSv specific so probably should stay as 
      is
      - string/strtok_r.c - seems identical except for '__restrict' vs 
      'restrict' and #undef
      - string/stresep.c - per commit seems to come from NetBSD 
      (ftp.tku.edu.tw/NetBSD/NetBSD-current/src/lib/libc/string/stresep.c) -> 
      should stay as is
      - string/__strndup.c - not in musl, seems to come from somewhere 
      else
      - string/rawmemchr.c - not from musl, where does it come from?
      - string/strerror_r.c - originates from musl but seems to be 
      completely different from the current musl copy; modified to be both 
      POSIX and glibc compliant
      - string/memcpy.c - implementation seems to be OSv specific, so maybe 
      is not subject to musl upgrade
      - string/strlcat.c - slightly different from current musl but has not 
      been modified since original import -> probably we should use musl version
      - string/explicit_bzero.c - not from musl it seems; glibc?
   - stdio/ - this one is the largest in terms of changes (but maybe not) 
   and the biggest mess
      - fmemopen.c - has extra 'if (!libc.threaded) f->lock = -1;' to deal 
      with some thread related issue - see commit 
      3f053b8f950be4b89fcb928f956ddc31ba8ba68a
      - ftrylockfile.c - pthread_self()->tid vs mutex_owned(&f->mutex)
      - __stdio_read.c - SYS_readv vs readv() mostly
      - open_wmemstream.c - extra 'if (!libc.threaded) f->lock = -1;' just 
      like fmemopen.c
      - __fdopen.c - SYS_fcntl vs fcntl() and SYS_ioctl vs ioctl() mostly
      - __stdio_close.c - SYS_close vs close()
      - vfwscanf.c - quite different from current musl but seems not to 
      have been modified for OSv purposes -> so maybe it just needs to be 
      repointed in the makefile to the musl copy?
      - vdprintf.c - .lock = -1 vs .no_locking = true (what is this about?)
      - vfprintf.c - the only difference is handling of L prefix for long 
      long int
      - __stdio_seek.c - SYS_lseek vs lseek()
      - tmpnam.c - SYS_access vs access() and SYS_clock_gettime vs 
      clock_gettime()
      - getc.c - 'if (f->lock < 0 || !__lockfile(f))' vs 'if (f->no_locking 
      || !__lockfile(f))'
      - fopen.c - SYS_open vs open() and SYS_close vs close()
      - vsnprintf.c - .lock = -1 vs .no_locking = true plus extra handling 
      of some overflow condition
      - flockfile.c - seems to have been rewritten for OSv, ftrylockfile vs 
      mutex_owned() etc
      - vfscanf.c - seems to originate from musl but per a couple of old 
      commits from 2013 certain features have been disabled; should we repoint 
      it to current musl?
      - vswscanf.c - '.lock = -1' vs '.no_locking = true'
      - stdin.c - modified for all kinds of reasons
      - freopen.c - SYS_fcntl vs fcntl() and __dup3() vs dup3()
      - sscanf.c - extra weak_alias for __isoc99_sscanf
      - tmpfile.c - SYS_open vs open() and SYS_close vs unlink()
      - stdout.c - extra '.lock = -1' - what is this about?
      - open_memstream.c - extra 'if (!libc.threaded) f->lock = -1;'
      - putc.c - 'if (f->lock < 0 || !__lockfile(f))' vs 'if (f->no_locking 
      || !__lockfile(f))'
      - __lockfile.c - seems to have been heavily adapted for OSv purposes
      - remove.c - SYS_unlink vs unlink() plus some POSIX compliance changes
      - fputc.c -  'if (f->lock < 0 || !__lockfile(f))' vs 'if 
      (f->no_locking || !__lockfile(f))'
      - __stdio_write.c - SYS_writev vs writev()
      - stderr.c - extra '.lock = -1,'
      - funlockfile.c - includes "pthread_impl.h"
      - fgetc.c -  'if (f->lock < 0 || !__lockfile(f))' vs 'if 
      (f->no_locking || !__lockfile(f))'
      - vsscanf.c - '.read = do_read, .lock = -1' vs '.read = do_read, 
      .no_locking = true,' plus weak_alias for __isoc99_vsscanf
      - vswprintf.c  - 'f.lock = -1;' vs 'f.no_locking = true;'
      - __fopen_rb_ca.c - SYS_open vs open() plus 'f->lock = -1;' (what is 
      this about?)
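
Many of the stdio diffs above come down to the same pair: 'lock < 0' on the 
musl side versus 'no_locking' on the OSv side. My guess, still to be 
confirmed against the real FILE definitions, is that both simply mark a FILE 
that must never be locked. A minimal sketch of that reading, using toy 
structs instead of the real FILE (all struct and function names below are 
hypothetical; only the field names 'lock' and 'no_locking' come from the 
diffs):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the two FILE layouts. Only the field names `lock` and
 * `no_locking` are taken from the diffs; everything else is hypothetical. */
struct musl_style_file { volatile int lock; };  /* lock < 0 => skip locking */
struct osv_style_file  { bool no_locking; };    /* true     => skip locking */

/* Hypothetical accessors that hide which layout is in use. */
static bool musl_style_skips_lock(const struct musl_style_file *f)
{
    return f->lock < 0;
}

static bool osv_style_skips_lock(const struct osv_style_file *f)
{
    return f->no_locking;
}

/* Hypothetical initializers mirroring `.lock = -1` vs `.no_locking = true`. */
static struct musl_style_file musl_style_unlocked(void)
{
    struct musl_style_file f = { .lock = -1 };
    return f;
}

static struct osv_style_file osv_style_unlocked(void)
{
    struct osv_style_file f = { .no_locking = true };
    return f;
}
```

If that reading holds, a single accessor of this kind in a shared internal 
header might let files such as getc.c, putc.c, fgetc.c and fputc.c be taken 
from musl with fewer local edits.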
   
From the above, it seems there are 3 categories of files under libc/:

   1. Brand new implementations of libc functionality mostly in C++ that 
   will not need to change as part of musl upgrade
   2. Glibc compatibility or implementations not from musl; like above they 
   will not need to change as part of musl upgrade
   3. Files that originate from musl but need to be adapted for OSv 
   specific reasons


To help us with the upgrade we should add more unit tests. To that end, I 
have found the musl unit tests - https://wiki.musl-libc.org/libc-test.html - 
and possibly some unit tests from bionic 
(https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/stdio_test.cpp 
or 
https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/stdlib_test.cpp). 
The bionic ones would need to be adapted but the musl ones could be used 
as-is.

To minimize the effort for future upgrades, I was wondering if the following 
ideas could help us:
- use some preprocessor macro "magic" to replace syscall() calls with 
regular functions - it seems like many files under libc/ are original musl 
sources except for this type of change
- in cases where we are forced to modify musl originals, maybe instead of 
storing modified files under libc/ like we do now, we should store 
*.patch files instead and apply them as part of a new build process step; 
would that be better?
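
To make the first idea more concrete, here is a minimal sketch of how the 
macro "magic" could look. It assumes one wrapper macro per SYS_* name and 
relies on token pasting happening before the SYS_* name is expanded. The 
OSV_FN_* names and the create_and_remove() helper are made up for 
illustration, and I have not checked how this would interact with musl's 
own internal syscall macros:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical wrappers: one macro per SYS_* name that appears in the musl
 * sources. Each forwards to the regular libc function.
 */
#define OSV_FN_SYS_open(path, flags, mode) open((path), (flags), (mode))
#define OSV_FN_SYS_close(fd)               close(fd)
#define OSV_FN_SYS_access(path, amode)     access((path), (amode))
#define OSV_FN_SYS_unlink(path)            unlink(path)

/*
 * A macro parameter next to ## is pasted without being expanded first, so
 * the syscall *name* (not its number) selects the wrapper at compile time.
 */
#undef syscall
#define syscall(nr, ...) OSV_FN_##nr(__VA_ARGS__)

/* Code written in the musl style; it ends up calling plain libc functions. */
static int create_and_remove(const char *path)
{
    int fd = syscall(SYS_open, path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (syscall(SYS_close, fd) != 0)
        return -1;
    if (syscall(SYS_access, path, F_OK) != 0)
        return -1;
    return syscall(SYS_unlink, path);
}
```

A syscall name without a matching wrapper would fail at compile time with 
an undeclared OSV_FN_SYS_* identifier, which at least points us at what is 
still missing.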

So I think the upgrade plan should be this: 
1. Add more unit tests
2. Prune/clean libc to minimize the set of files that are part of category 
3 (see above)
3. Do the upgrade (per Nadav's original plan):

   - Bring the latest Musl version into a "musl-1.1.24" subdirectory in OSv.
   - Leave the old "musl/" and "libc/" directories as well.
   - In Makefile, make a new list of objects (say, "nmusl") which will take 
   code from musl-1.1.24/ instead of musl/.
   - Start to switch individual files and directories from "musl += ..." to 
   "nmusl += ...".  We can start with the math functions needed for aarch64. 
   Eventually, everything can be converted to nmusl and hopefully, nothing or 
   little will break and need to be fixed.
   - Replace our include/api with musl-1.1.24/include. Would be even better 
   to drop include/api, and just use musl-1.1.24/include directly if we can. I 
   think, though, this is low priority, and might take quite a bit of work 
   ("git log include/api" shows we modified this quite a bit since we took it 
   from musl). I'm hoping that the old header files will work correctly also 
   for the newer musl.
      - I think this might be a tricky one at this point
   

What do you think?

Waldek
