On 31/03/2019 at 15:19, Aurelien Jarno wrote:
> This bug is very likely a bug present in old glibc versions. It has been
> brought to light when enabling TLS support in openblas and not by a new
> glibc version.
>
> Right now the bug has been workarounded by disabling TLS support in
> openblas. The way to handle this bug is to write a small testcase that
> can be forwarded upstream. It's not an easy task though.

Hi,

I've made a test case here [0]. I have not tested it against the latest glibc commit, but it does reproduce the deadlock with glibc 2.28 on Linux.

To run the test case, do this:
```
gcc test_compiler_tls.c -o test_compiler_tls -ldl -g -pthread
gcc test_compiler_tls_lib.c -shared -o test_compiler_tls_lib.so \
    -g -pthread -fPIC
./test_compiler_tls ./test_compiler_tls_lib &
gdb --pid $! -ex 'thr a a bt'
```

This reproduces the deadlock that I found in openblas:
1- The test_thread opens the library, which calls its constructor
2- The library's constructor creates a thread, `thread_that_use_tls_after_sleep`
3- The thread `thread_that_use_tls_after_sleep` sleeps for 100ms (this needs to be long enough that dlclose is called before the sleep ends)
4- The test_thread closes the library with dlclose
5- dlclose locks `dl_load_lock` and calls the library's destructor
6- The library's destructor waits for `thread_that_use_tls_after_sleep` to finish
7- The `thread_that_use_tls_after_sleep` thread tries to read the TLS variable, which causes a call to `__tls_get_addr`
8- `__tls_get_addr` deadlocks in `tls_get_addr_tail`, which tries to take the same `dl_load_lock` that dlclose already holds
9- Nothing happens after that: the dlclose thread holds the lock while it waits for `thread_that_use_tls_after_sleep` to finish, and that thread is blocked on the same lock, so it never exits.

See [1] for the stack trace.
Thread 3 is the library's thread, created in its constructor and joined in its destructor.
Thread 2 is the thread that calls dlopen and dlclose.

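For readers who want to see the lock cycle in isolation, here is a minimal sketch that models it with plain pthreads and no dynamic loading at all. It is not the test case from [0]: `load_lock`, `worker` and `demo_lock_cycle` are hypothetical names, the recursive mutex only stands in for `dl_load_lock`, and the worker uses `pthread_mutex_timedlock` with a 1 s timeout (an assumption made so that the demo terminates, whereas the real code blocks forever).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the loader's dl_load_lock. */
static pthread_mutex_t load_lock;

/* Result of the worker's lock attempt: 0 or an errno value. */
static int worker_lock_result;

/* Plays the role of thread_that_use_tls_after_sleep. */
static void *worker(void *arg)
{
    (void)arg;

    /* Sleep 100 ms so the closing thread takes the lock first. */
    struct timespec nap = { 0, 100 * 1000 * 1000 };
    nanosleep(&nap, NULL);

    /* Stand-in for the TLS access that ends up taking the loader lock.
       A 1 s timeout replaces the indefinite wait of the real code. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 1;

    worker_lock_result = pthread_mutex_timedlock(&load_lock, &deadline);
    if (worker_lock_result == 0)
        pthread_mutex_unlock(&load_lock);
    return NULL;
}

/* Plays the role of the thread that closes the library.
   Returns the worker's lock result; ETIMEDOUT means the cycle was hit. */
int demo_lock_cycle(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&load_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    /* "Constructor": start the worker thread. */
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);
    assert(rc == 0);

    pthread_mutex_lock(&load_lock);    /* closing thread takes the lock */
    pthread_join(tid, NULL);           /* "destructor" waits for the worker */
    pthread_mutex_unlock(&load_lock);

    pthread_mutex_destroy(&load_lock);
    return worker_lock_result;
}
```

Calling `demo_lock_cycle()` from a `main` and building with `gcc -pthread` should return `ETIMEDOUT`: the worker can only give up, because the lock owner is itself waiting in `pthread_join` for the worker to finish. Without the timeout, neither thread would ever make progress.
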
Thread 1 in the stack trace [1] is a "monitoring" thread that implements a 10 s timeout (useful if this test needs to run on a CI system).

Where dlclose locks `dl_load_lock`: [2]
Where tls_get_addr_tail locks `dl_load_lock`: [3]

[0]: https://gist.github.com/amurzeau/26f045bdfea407528dd7de3102fb4be7
[1]: https://gist.github.com/amurzeau/26f045bdfea407528dd7de3102fb4be7#file-gdb_stacktrace-txt
[2]: https://github.com/bminor/glibc/blob/glibc-2.28/elf/dl-close.c#L812
[3]: https://github.com/bminor/glibc/blob/glibc-2.28/elf/dl-tls.c#L761

-- 
Alexis Murzeau
PGP: B7E6 0EBB 9293 7B06 BDBC 2787 E7BD 1904 F480 937F