[Correction to a rather misleading wording in my description of the failure sequencing.]
On Jul 21, 2024, at 03:36, Mark Millard <mark...@yahoo.com> wrote:

> On Jul 20, 2024, at 16:42, Mark Millard <mark...@yahoo.com> wrote:
>
>> On Jul 20, 2024, at 01:57, Konstantin Belousov <kostik...@gmail.com> wrote:
>>
>>> [Everything and everybody in Cc: are stripped for good].
>>>
>>> On Fri, Jul 19, 2024 at 10:38:36PM -0700, Mark Millard wrote:
>>>> 0x201375c0 - 0x2014092c is .bss in /lib/libthr.so.3
>>>>
>>>> (gdb) bt
>>>> #0 0x201aeec0 in __pthread_map_stacks_exec () from /lib/libc.so.7
>>>> #1 0x2005d1e4 in ?? () from /libexec/ld-elf.so.1
>>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>> (gdb) disass
>>>> Dump of assembler code for function __pthread_map_stacks_exec:
>>>> => 0x201aeec0 <+0>: ldr r0, [pc, #8] @ 0x201aeed0
>>>> <__pthread_map_stacks_exec+16>
>>>> 0x201aeec4 <+4>: add r0, pc, r0
>>>> 0x201aeec8 <+8>: ldr r0, [r0, #156] @ 0x9c
>>>> 0x201aeecc <+12>: bx r0
>>>> 0x201aeed0 <+16>: andseq r6, r7, r4, lsr #12
>>>> End of assembler dump.
>>>>
>>>
>>> Do the following:
>>> 1. Rebuild rtld/libc/libthr with the debugging info and no optimization,
>>> i.e. ensure that flags are "-O0 -g" or "-Og -g" and not -O2. See
>>> the first comment in libexec/rtld-elf/Makefile for the hint how to
>>> do it.
>>
>> I did a full buildworld with "-Og -g" via temporary
>> use of:
>>
>> diff --git a/share/mk/sys.mk b/share/mk/sys.mk
>> index 44db9266784f..9c6c7ce575a4 100644
>> --- a/share/mk/sys.mk
>> +++ b/share/mk/sys.mk
>> @@ -145,7 +145,8 @@ CC ?= c89
>> CFLAGS ?= -O
>> .else
>> CC ?= cc
>> -CFLAGS ?= -O2 -pipe
>> +#CFLAGS ?= -O2 -pipe
>> +CFLAGS ?= -Og -g -pipe
>> .if defined(NO_STRICT_ALIASING)
>> CFLAGS += -fno-strict-aliasing
>> .endif
>>
>> I installed the result armv7 world into a
>> directory tree and installed pkg and cairo.
>>
>>> 2. Reproduce the issue
>>
>> The dlopen_test.c based case does not fail under the world
>> built with "-Og -g":
>>
>> # cc -g -std=c11 -pedantic -Wall -pthread dlopen_test.c ; ./a.out
>> #
>>
>>> under gdb
>>
>> (gdb) run
>> Starting program: /root/a.out [Inferior 1 (process 36680) exited normally]
>> (gdb)
>>
>> So it does not reproduce in gdb when buildworld was based
>> on "-Og -g".
>
> I found another context that has useful debugger information
> and also fails. It avoids graphviz being involved:
>
> ) a pkgbase install that I had around (pkgbase has debug information)
> ) also set up /home/pkgbuild/worktrees/main/ to refer to the /usr/src/ that
> pkgbase put in place
> ) pkg install cairo
> ) use of my simple dlopen program
>
> (gdb) run
> Starting program: /root/a.out
> Catchpoint 7
> Inferior loaded /lib/libgcc_s.so.1
> /lib/libthr.so.3
> /lib/libc.so.7
> /lib/libsys.so.7
> r_debug_state (rd=<optimized out>, m=<optimized out>) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4485
> 4485 }
> (gdb) c
> Continuing.
>
> Breakpoint 3, get_program_var_addr (name=0x20042f2a "__progname",
> lockstate=0x0) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
> 4523 symlook_init(&req, name);
> (gdb) c
> Continuing.
>
> Breakpoint 3, get_program_var_addr (name=0x20043c97 "environ", lockstate=0x0)
> at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
> 4523 symlook_init(&req, name);
> (gdb) c
> Continuing.
>
> Breakpoint 3, get_program_var_addr (name=0x20043c9f "__elf_aux_vector",
> lockstate=0x0) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
> 4523 symlook_init(&req, name);
> (gdb) c
> Continuing.
>
> Breakpoint 3, get_program_var_addr (name=0x200442e8 "__libc_atexit",
> lockstate=lockstate@entry=0xffffd668) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
> 4523 symlook_init(&req, name);
> (gdb) c
> Continuing.
>
> Catchpoint 7
> Inferior loaded /usr/local/lib/libcairo.so.2
> /usr/local/lib/libpixman-1.so.0
> /usr/local/lib/libfontconfig.so.1
> /usr/local/lib/libfreetype.so.6
> /usr/local/lib/libEGL.so.1
> /usr/lib/libdl.so.1
> /usr/local/lib/libpng16.so.16
> /usr/local/lib/libxcb-shm.so.0
> /usr/local/lib/libxcb.so.1
> /usr/local/lib/libxcb-render.so.0
> /usr/local/lib/libXrender.so.1
> /usr/local/lib/libX11.so.6
> /usr/local/lib/libXext.so.6
> /lib/libz.so.6
> /usr/local/lib/libGL.so.1
> /lib/libm.so.5
> /usr/local/lib/libexpat.so.1
> /usr/lib/libbz2.so.4
> /usr/local/lib/libbrotlidec.so.1
> /usr/local/lib/libGLdispatch.so.0
> /usr/local/lib/libXau.so.6
> /usr/local/lib/libXdmcp.so.6
> /usr/local/lib/libGLX.so.0
> /usr/local/lib/libbrotlicommon.so.1
> r_debug_state (rd=<optimized out>, m=<optimized out>) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4485
> 4485 }
> (gdb) c
> Continuing.
>
> Breakpoint 3, get_program_var_addr (name=0x200435bf
> "__pthread_map_stacks_exec", lockstate=0xffffd290) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:4523
> 4523 symlook_init(&req, name);
> (gdb) c
> Continuing.
>
> Breakpoint 8.3, _thr_stack_fix_protection (thrd=0x20070000) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
> 140 round_up(thrd->attr.guardsize_attr),
> (gdb) bt
> #0 _thr_stack_fix_protection (thrd=0x20070000) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
> #1 __thr_map_stacks_exec () at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
> #2 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
> #3 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2",
> fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized
> out>, mode=1, lockstate=0xffffd290)
> at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
> #4 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2",
> fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
> #5 0x00020510 in main () at dlopen_test.c:14
> (gdb) s
> 139 mprotect((char *)thrd->attr.stackaddr_attr +
> (gdb) s
> 141 round_up(thrd->attr.stacksize_attr),
> (gdb) s
> 140 round_up(thrd->attr.guardsize_attr),
> (gdb) s
> round_up (size=4096) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:129
> 129 if (size % _thr_page_size != 0)
> (gdb) s
> 130 size = ((size / _thr_page_size) + 1) *
> (gdb) bt
> #0 round_up (size=4096) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:130
> #1 _thr_stack_fix_protection (thrd=0x20070000) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
> #2 __thr_map_stacks_exec () at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
> #3 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
> #4 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2",
> fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized
> out>, mode=1, lockstate=0xffffd290)
> at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
> #5 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2",
> fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
> #6 0x00020510 in main () at dlopen_test.c:14
> (gdb) si
> 129 if (size % _thr_page_size != 0)
> (gdb) 130 size = ((size / _thr_page_size) + 1) *
> (gdb) bt
> #0 round_up (size=4096) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:130
> #1 _thr_stack_fix_protection (thrd=0x20070000) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
> #2 __thr_map_stacks_exec () at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
> #3 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
> #4 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2",
> fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized
> out>, mode=1, lockstate=0xffffd290)
> at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
> #5 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2",
> fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
> #6 0x00020510 in main () at dlopen_test.c:14
> (gdb) disass /s
> Dump of assembler code for function __thr_map_stacks_exec:
> . . .
> 130 size = ((size / _thr_page_size) + 1) *
> 0x20112eec <+340>: mov r0, r6
>
> 129 if (size % _thr_page_size != 0)
> 0x20112ef0 <+344>: ldr r4, [pc, r4]
>
> 130 size = ((size / _thr_page_size) + 1) *
> => 0x20112ef4 <+348>: mov r1, r4
> 0x20112ef8 <+352>: bl 0x20116b60
>
> NOTE: 0x20116760 - 0x20116f30 is .plt in /lib/libthr.so.3
>
> --Type <RET> for more, q to quit, c to continue without paging--
> 0x20112efc <+356>: mov r9, r0
> 0x20112f00 <+360>: mov r0, r5
> 0x20112f04 <+364>: mov r1, r4
> 0x20112f08 <+368>: bl 0x20116b60
>
> NOTE: 0x20116760 - 0x20116f30 is .plt in /lib/libthr.so.3
>
> 0x20112f0c <+372>: mls r1, r0, r4, r5
> . . .
> (gdb) si
> 0x20112ef8 130 size = ((size / _thr_page_size) + 1) *
> (gdb) 0x20116b60 in ?? () from /lib/libthr.so.3
> (gdb) bt
> #0 0x20116b60 in ?? () from /lib/libthr.so.3
> #1 0x20112efc in round_up (size=4096) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:130
> #2 _thr_stack_fix_protection (thrd=0x20070000) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:140
> #3 __thr_map_stacks_exec () at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_stack.c:178
> #4 0x2005d1e4 in map_stacks_exec (lockstate=0xffffd290) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:5946
> #5 dlopen_object (name=name@entry=0x1042d "/usr/local/lib/libcairo.so.2",
> fd=<optimized out>, fd@entry=-1, refobj=<optimized out>, lo_flags=<optimized
> out>, mode=1, lockstate=0xffffd290)
> at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3872
> #6 0x20059e4c in rtld_dlopen (name=0x1042d "/usr/local/lib/libcairo.so.2",
> fd=-1, mode=1) at /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:3751
> #7 0x00020510 in main () at dlopen_test.c:14
> (gdb) si
> 0x20116b64 in ?? () from /lib/libthr.so.3
> (gdb) si
> 0x20116b68 in ?? () from /lib/libthr.so.3
> (gdb) si
> 0x20116760 in ?? () from /lib/libthr.so.3
> (gdb) si
> 0x20116764 in ?? () from /lib/libthr.so.3
> (gdb) si
> 0x20116768 in ?? () from /lib/libthr.so.3
> (gdb) si
> 0x2011676c in ?? () from /lib/libthr.so.3
> (gdb) si
> _rtld_bind_start () at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:78
> 78 stmdb sp!,{r0-r5,sl,fp}
> (gdb) bt
> #0 _rtld_bind_start () at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:78
> #1 0x201373b0 in ?? () from /lib/libthr.so.3
>
> NOTE: 0x201373a8 - 0x201375a0 is .got.plt in /lib/libthr.so.3
>
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> Turns out that _thr_rtld_rlock_acquire is looping when the
> process is stuck:
>
> . . .
> (gdb) bt
> #0 _thr_rtld_rlock_acquire (lock=0x20137c40) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121
> #1 0x20060788 in rlock_acquire (lock=0x2008af10 <rtld_locks>,
> lockstate=lockstate@entry=0xffffd0ec) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld_lock.c:259
> #2 0x20059098 in _rtld_bind (obj=0x2008f404, reloff=496) at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/rtld.c:1035
> #3 0x2005483c in _rtld_bind_start () at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
> #4 0x2005483c in _rtld_bind_start () at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
> #5 0x2005483c in _rtld_bind_start () at
> /home/pkgbuild/worktrees/main/libexec/rtld-elf/arm/rtld_start.S:89
> . . .
>
> (gdb) info threads
> Id Target Id Frame * 1 LWP 100174 of process 97711
> _thr_rtld_rlock_acquire (lock=0x20137c40) at
> /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121
>
> So: Only the one main thread.
>
> It is repeating the _thr_rwlock_rdlock loop (lines 121/122):

The above wording is greatly misleading for what I was trying to
refer to, because it is greatly incomplete. Correcting via a
different description . . .

Starting from the following (reached via a breakpoint stopping there):

#0 _umtx_op () at _umtx_op.S:4
#1 0x2036845c in _umtx_op_err (obj=0x20137c40, op=12, val=0, uaddr=0x0, uaddr2=0x0) at /home/pkgbuild/worktrees/main/lib/libsys/_umtx_op_err.c:36
#2 0x20115da8 in __thr_rwlock_rdlock (rwlock=rwlock@entry=0x20137c40, flags=0, tsp=<optimized out>) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.c:294
#3 0x2010ebf4 in _thr_rwlock_rdlock (rwlock=0x20137c40, flags=0, tsp=0x0) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.h:229
#4 _thr_rtld_rlock_acquire (lock=0x20137c40) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121
. . .

I get the sequence below, which involves one ^C to get out of the
hung-up status for some activity (that eventually hangs up again).
It is true that _thr_rtld_rlock_acquire does not return but loops
for this kind of sequence. It is just an insufficient overall
description.

(gdb) finish
Run till exit from #0 _umtx_op () at _umtx_op.S:4
^C
Program received signal SIGINT, Interrupt.
Sent by kernel.
_umtx_op () at _umtx_op.S:4
4 in _umtx_op.S
(gdb) finish
Run till exit from #0 _umtx_op () at _umtx_op.S:4
0x2036845c in _umtx_op_err (obj=<optimized out>, op=<optimized out>, val=<optimized out>, uaddr=<optimized out>, uaddr2=0x0) at /home/pkgbuild/worktrees/main/lib/libsys/_umtx_op_err.c:36
36 if (_umtx_op(obj, op, val, uaddr, uaddr2) == -1)
(gdb) finish
Run till exit from #0 0x2036845c in _umtx_op_err (obj=<optimized out>, op=<optimized out>, val=<optimized out>, uaddr=<optimized out>, uaddr2=0x0) at /home/pkgbuild/worktrees/main/lib/libsys/_umtx_op_err.c:36
0x20115da8 in __thr_rwlock_rdlock (rwlock=rwlock@entry=0x20137c40, flags=<optimized out>, tsp=<optimized out>) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.c:294
294 return (_umtx_op_err(rwlock, UMTX_OP_RW_RDLOCK, flags,
Value returned is $3 = 4
(gdb) finish
Run till exit from #0 0x20115da8 in __thr_rwlock_rdlock (rwlock=rwlock@entry=0x20137c40, flags=<optimized out>, tsp=<optimized out>) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.c:294
_thr_rtld_rlock_acquire (lock=0x20137c40) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121
121 while (_thr_rwlock_rdlock(&l->lock, 0, NULL) != 0)
Value returned is $4 = 4
(gdb) finish
Run till exit from #0 _thr_rtld_rlock_acquire (lock=0x20137c40) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_rtld.c:121

Breakpoint 1, _thr_rwlock_rdlock (rwlock=0x20137c40, flags=0, tsp=0x0) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.h:229
229 return (__thr_rwlock_rdlock(rwlock, flags, tsp));
(gdb) finish
Run till exit from #0 _thr_rwlock_rdlock (rwlock=0x20137c40, flags=0, tsp=0x0) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.h:229

Breakpoint 2.1, __thr_rwlock_rdlock (rwlock=rwlock@entry=0x20137c40, flags=0, tsp=0x0) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.c:284
284 if (tsp == NULL) {
(gdb) finish
Run till exit from #0 __thr_rwlock_rdlock (rwlock=rwlock@entry=0x20137c40, flags=0, tsp=0x0) at /home/pkgbuild/worktrees/main/lib/libthr/thread/thr_umtx.c:284

Breakpoint 4.2, _umtx_op_err (obj=0x20137c40, op=12, val=0, uaddr=0x0, uaddr2=0x0) at /home/pkgbuild/worktrees/main/lib/libsys/_umtx_op_err.c:36
36 if (_umtx_op(obj, op, val, uaddr, uaddr2) == -1)
(gdb) finish
Run till exit from #0 _umtx_op_err (obj=0x20137c40, op=12, val=0, uaddr=0x0, uaddr2=0x0) at /home/pkgbuild/worktrees/main/lib/libsys/_umtx_op_err.c:36

Breakpoint 5.2, _umtx_op () at _umtx_op.S:4
warning: 4 _umtx_op.S: No such file or directory
(gdb)

> (gdb) list 115
> 110 _thr_rtld_rlock_acquire(void *lock)
> 111 {
> 112 struct pthread *curthread;
> 113 struct rtld_lock *l;
> 114 int errsave;
> 115
> 116 curthread = _get_curthread();
> 117 SAVE_ERRNO();
> 118 l = (struct rtld_lock *)lock;
> 119
> (gdb)
> 120 THR_CRITICAL_ENTER(curthread);
> 121 while (_thr_rwlock_rdlock(&l->lock, 0, NULL) != 0)
> 122 ;
> 123 curthread->rdlock_count++;
> 124 RESTORE_ERRNO();
> 125 }
>
>
>>> , and backtrace all threads from userspace.
>>> I only need userspace backtrace, not either kernel-side stacks nor
>>> the syscall history.
>>>
>>> Are you sure that the issue is specific to armv7, might be it takes more
>>> efforts to reproduce on host native?
>
>

I'll note that my personal builds of armv7 are set up to use
-mcpu=cortex-a7 . In my experiments, that setting appears to
generally avoid the hangups. It may be why my initial attempts to
reproduce the problem months back did not manage to reproduce it.
It may also contribute to the "-Og -g" test not getting a hangup
failure.

My reproductions are mostly based on having installed and used
official pkgbase builds, plus a little activity based on official
snapshots (not updated via pkgbase).
This avoids having such personal oddities as -mcpu=cortex-a7 use
involved in my builds.

===
Mark Millard
marklmi at yahoo.com