On Tue, Jul 23, 2024 at 2:11 PM John F Carr <j...@mit.edu> wrote:

> On Jul 23, 2024, at 13:46, Michal Meloun <meloun.mic...@gmail.com> wrote:
> >
> > On 23.07.2024 11:36, Konstantin Belousov wrote:
> >> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
> >>> The good news is that I'm finally able to generate a working/locking
> >>> test case. The culprit (at least for me) is if "-mcpu" is used when
> >>> compiling libthr (e.g. indirectly injected via CPUTYPE in
> >>> /etc/make.conf). If it is not used, libthr is broken (regardless of
> >>> -O level or debug/normal build), but -mcpu=cortex-a15 will always
> >>> produce a working libthr.
> >> I think this is very significant progress.
> >> Do you plan to drill down more to see what is going on?
> >
> > So the problem is now clear, and I fear it may apply to other
> > architectures as well.
> > dlopen_object() (from rtld_elf),
> > https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> > holds the rtld_bind_lock write lock for almost the entire time a new
> > library is loaded.
> > If the code used to load the library references a not-yet-resolved
> > symbol, the _rtld_bind() function attempts to get a read lock on
> > rtld_bind_lock and a deadlock occurs.
> >
> > In this case, it is round_up() in _thr_stack_fix_protection,
> > https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
> > The call is to __aeabi_uidiv (since not all armv7 processors support
> > HW divide).
> >
> > Unfortunately, I'm not sure how to fix it. The compiler can emit
> > __aeabi_<> in any place, and I'm not sure if it can resolve all the
> > symbols used by rtld_elf and libthr beforehand.
> >
> > Michal
> >
>
> In this case (but not for all __aeabi_ functions) we can avoid division
> as long as page size is a power of 2.
>
> The function is
>
> static inline size_t
> round_up(size_t size)
> {
>         if (size % _thr_page_size != 0)
>                 size = ((size / _thr_page_size) + 1) *
>                     _thr_page_size;
>         return size;
> }
>
> The body can be condensed to
>
>         return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
>
> This is shorter in both lines of code and instruction bytes.
>
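
For reference, a minimal sketch of how the two forms of round_up() quoted above could be checked against each other. It assumes the page size is a nonzero power of 2, which the mask form requires. Passing the page size as a parameter (instead of reading the global _thr_page_size) and the helper forms_agree() are illustrative choices here, not part of libthr.

```c
#include <stddef.h>

/* Division-based form, as quoted above; page_size is passed in for testing. */
static size_t
round_up_div(size_t size, size_t page_size)
{
	if (size % page_size != 0)
		size = ((size / page_size) + 1) * page_size;
	return (size);
}

/* Mask-based form; only valid when page_size is a nonzero power of 2. */
static size_t
round_up_mask(size_t size, size_t page_size)
{
	return ((size + page_size - 1) & ~(page_size - 1));
}

/*
 * Hypothetical helper: returns 1 if both forms agree for every size in
 * 0..limit, and 0 if they differ or page_size is not a power of 2.
 */
static int
forms_agree(size_t page_size, size_t limit)
{
	size_t size;

	if (page_size == 0 || (page_size & (page_size - 1)) != 0)
		return (0);
	for (size = 0; size <= limit; size++) {
		if (round_up_div(size, page_size) !=
		    round_up_mask(size, page_size))
			return (0);
	}
	return (1);
}
```

Calling forms_agree(4096, 4 * 4096) walks every size across several pages. The mask form compiles to an add and an AND, so on a target without hardware divide it should not need a call to __aeabi_uidiv.
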
I like this change... But we do need to fix the deadlocks... They seem to be
more likely when building in bsd-user emulation...

Warner