On 24.07.2024 12:24, Konstantin Belousov wrote:
On Tue, Jul 23, 2024 at 08:11:13PM +0000, John F Carr wrote:
On Jul 23, 2024, at 13:46, Michal Meloun <meloun.mic...@gmail.com> wrote:

On 23.07.2024 11:36, Konstantin Belousov wrote:
On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
The good news is that I'm finally able to generate a working/locking
test case.  The culprit (at least for me) is if "-mcpu" is used when
compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).
If it is not used, libthr is broken (regardless of -O level or debug/normal
build), but -mcpu=cortex-a15 will always produce a working libthr.
I think this is very significant progress.
Do you plan to drill down more to see what is going on?

So the problem is now clear, and I fear it may apply to other architectures as 
well.
dlopen_object() (from rtld_elf),
https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
holds the rtld_bind_lock write lock for almost the entire time a new library is 
loaded.
If the code uses a yet unresolved symbol to load the library, the rtl_bind() 
function attempts to get read lock of  rtld_bind_lock and a deadlock occurs.

In this case, it round_up() in _thr_stack_fix_protection,
https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
Issued by __aeabi_uidiv (since not all armv7 processors support HW divide).

Unfortunately, I'm not sure how to fix it.  The compiler can emit __aeabi_<> in 
any place, and I'm not sure if it can resolve all the symbols used by rtld_eld and 
libthr beforehand.


Michal


In this case (but not for all _aeabi_ functions) we can avoid division
as long as page size is a power of 2.

The function is

   static inline size_t
   round_up(size_t size)
   {
        if (size % _thr_page_size != 0)
                size = ((size / _thr_page_size) + 1) *
                    _thr_page_size;
        return size;
   }

The body can be condensed to

   return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);

This is shorter in both lines of code and instruction bytes.

Lets not allow this to be lost.  Could anybody confirm that the patch
below fixes the issue?

commit d560f4f6690a48476565278fd07ca131bf4eeb3c
Author: Konstantin Belousov <k...@freebsd.org>
Date:   Wed Jul 24 13:17:55 2024 +0300

     rtld: avoid division in __thr_map_stacks_exec()
The function is called by rtld with the rtld bind lock write-locked,
     when fixing the stack permission during dso load.  Not every ARMv7 CPU
     supports the div, which causes the recursive entry into rtld to resolve
     the  __aeabi_uidiv symbol, causing self-lock.
Workaround the problem by using roundup2() instead of open-coding less
     efficient formula.
Diagnosed by: mmel
     Based on submission by: John F Carr <j...@mit.edu>
     Sponsored by:   The FreeBSD Foundation
     MFC after:      1 week





For final resolving of deadlocks, after a full day of digging, I'm very much incline of adding -znow to the linker flags for libthr.so (and maybe also for ld-elf.so). The runtime cost of resolving all symbols at startup is very low. Direct pre-solving in _thr_rtld_init() is problematic for the _aeabi_* symbols, since they don't have an official C prototypes, and some are not compatible with C calling conventions.

Warner, Konstantin, could you please comment on this?


Michal

Reply via email to