[PATCH] ldso: fix dlsym hang when reloading DSOs
It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). In these cases we must not re-add the DSO to the global symbol scope as it is already there. Or we end up corrupting the linked list and further dlsym lookups will hang. Signed-off-by: Timo Teräs timo.te...@iki.fi --- ldso/libdl/libdl.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/ldso/libdl/libdl.c b/ldso/libdl/libdl.c index c451b63..c901512 100644 --- a/ldso/libdl/libdl.c +++ b/ldso/libdl/libdl.c @@ -543,15 +543,23 @@ static void *do_dlopen(const char *libname, int flag, ElfW(Addr) from) #ifdef SHARED /* -* Get the tail of the list. * In the static case doesn't need to extend the global scope, it is * ready to be used as it is, because _dl_loaded_modules already points * to the dlopened library. +* +* Extend the global scope by adding the local scope of the dlopened DSO. +* But only if it's not there. It can happen under certain cases that the +* DSO had refcount = 0, but was already loaded. (NODELETE flag is set, or +* it is pulled in via both NEEDED dependency and explicit dlopen()). */ - for (ls = _dl_loaded_modules-symbol_scope; ls ls-next; ls = ls-next); - - /* Extend the global scope by adding the local scope of the dlopened DSO. */ - ls-next = dyn_chain-dyn-symbol_scope; + for (ls = _dl_loaded_modules-symbol_scope; ; ls = ls-next) { + if (ls == dyn_chain-dyn-symbol_scope) + break; + if (ls-next == NULL) { + ls-next = dyn_chain-dyn-symbol_scope; + break; + } + } #endif #ifdef __mips__ /* -- 1.8.3.1 ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc
Re: [PATCH] ldso: fix dlsym hang when reloading DSOs
On 27/06/2013 8.34, Timo Teräs wrote: It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). In these cases we must not re-add the DSO to the global symbol scope as it is already there. Or we end up corrupting the linked list and further dlsym lookups will hang. Signed-off-by: Timo Teräs timo.te...@iki.fi Acked-by: Carmelo Amoroso carmelo.amor...@st.com Cheers, Carmelo --- ldso/libdl/libdl.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/ldso/libdl/libdl.c b/ldso/libdl/libdl.c index c451b63..c901512 100644 --- a/ldso/libdl/libdl.c +++ b/ldso/libdl/libdl.c @@ -543,15 +543,23 @@ static void *do_dlopen(const char *libname, int flag, ElfW(Addr) from) #ifdef SHARED /* - * Get the tail of the list. * In the static case doesn't need to extend the global scope, it is * ready to be used as it is, because _dl_loaded_modules already points * to the dlopened library. + * + * Extend the global scope by adding the local scope of the dlopened DSO. + * But only if it's not there. It can happen under certain cases that the + * DSO had refcount = 0, but was already loaded. (NODELETE flag is set, or + * it is pulled in via both NEEDED dependency and explicit dlopen()). */ - for (ls = _dl_loaded_modules-symbol_scope; ls ls-next; ls = ls-next); - - /* Extend the global scope by adding the local scope of the dlopened DSO. */ - ls-next = dyn_chain-dyn-symbol_scope; + for (ls = _dl_loaded_modules-symbol_scope; ; ls = ls-next) { + if (ls == dyn_chain-dyn-symbol_scope) + break; + if (ls-next == NULL) { + ls-next = dyn_chain-dyn-symbol_scope; + break; + } + } #endif #ifdef __mips__ /* ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc
Re: [PATCH] ldso: fix dlsym hang when reloading DSOs
On Thu, Jun 27, 2013 at 09:34:32AM +0300, Timo Teräs wrote: It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). Wouldn't it be more logical to prevent this from happening by not using a refcount of zero? For example, NODELETE could just perform an extra increment on the refcnt so that it never reaches 0. In essence, the NODELETE flag is a permanent reference to the DSO. Rich ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc
Re: [PATCH] ldso: fix dlsym hang when reloading DSOs
On 27 June 2013 20:16:26 Rich Felker dal...@aerifal.cx wrote: On Thu, Jun 27, 2013 at 09:34:32AM +0300, Timo Teräs wrote: It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). Wouldn't it be more logical to prevent this from happening by not using a refcount of zero? For example, NODELETE could just perform an extra increment on the refcnt so that it never reaches 0. In essence, the NODELETE flag is a permanent reference to the DSO. I was thinking the same, without having looked yet. This does not sound like the correct fix. Sent with AquaMail for Android http://www.aqua-mail.com ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc
Re: [PATCH] ldso: fix dlsym hang when reloading DSOs
On Thu, 27 Jun 2013 14:16:26 -0400 Rich Felker dal...@aerifal.cx wrote: On Thu, Jun 27, 2013 at 09:34:32AM +0300, Timo Teräs wrote: It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). Wouldn't it be more logical to prevent this from happening by not using a refcount of zero? For example, NODELETE could just perform an extra increment on the refcnt so that it never reaches 0. In essence, the NODELETE flag is a permanent reference to the DSO. It seems this is not the only case. As mentioned, NEEDED dependencies together with dlopen()/dlclose() can result in this too in some cases. Apparently usage_count refers to active handles from dlopen() - and can be zero if the lib is pulled in via NEEDED dependency only. Thus, this is the least intrusive fix. While restructuring dlopen() and dlclose() could fix this better, I did not have the time or interest to rewrite most of the libdl code. - Timo ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc
Re: [PATCH] ldso: fix dlsym hang when reloading DSOs
On Thu, 27 Jun 2013 21:36:14 +0300 Timo Teras timo.te...@iki.fi wrote: On Thu, 27 Jun 2013 14:16:26 -0400 Rich Felker dal...@aerifal.cx wrote: On Thu, Jun 27, 2013 at 09:34:32AM +0300, Timo Teräs wrote: It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). Wouldn't it be more logical to prevent this from happening by not using a refcount of zero? For example, NODELETE could just perform an extra increment on the refcnt so that it never reaches 0. In essence, the NODELETE flag is a permanent reference to the DSO. It seems this is not the only case. As mentioned, NEEDED dependencies together with dlopen()/dlclose() can result in this too in some cases. Apparently usage_count refers to active handles from dlopen() - and can be zero if the lib is pulled in via NEEDED dependency only. Thus, this is the least intrusive fix. While restructuring dlopen() and dlclose() could fix this better, I did not have the time or interest to rewrite most of the libdl code. Hum. Actually it might be that this happens with NODELETE flag only. Seems my test case had that instead of the weird combo. I'll test with NODELETE flag just adding usage_count by extra reference. Will post new patch if it works (or report back if it didn't). - Timo ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc
Re: [PATCH] ldso: fix dlsym hang when reloading DSOs
On Thu, Jun 27, 2013 at 09:36:14PM +0300, Timo Teras wrote: On Thu, 27 Jun 2013 14:16:26 -0400 Rich Felker dal...@aerifal.cx wrote: On Thu, Jun 27, 2013 at 09:34:32AM +0300, Timo Teräs wrote: It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). Wouldn't it be more logical to prevent this from happening by not using a refcount of zero? For example, NODELETE could just perform an extra increment on the refcnt so that it never reaches 0. In essence, the NODELETE flag is a permanent reference to the DSO. It seems this is not the only case. As mentioned, NEEDED dependencies together with dlopen()/dlclose() can result in this too in some cases. Apparently usage_count refers to active handles from dlopen() - and can be zero if the lib is pulled in via NEEDED dependency only. This is also a bug in the reference counting. DT_NEEDED and dlopen should both be considered references for counting purposes. Otherwise you can end up with a reference count of zero when the library is still in use. Rich ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc
Re: [PATCH] ldso: fix dlsym hang when reloading DSOs
On Thu, 27 Jun 2013 20:36:44 +0200 Bernhard Reutner-Fischer rep.dot@gmail.com wrote: On 27 June 2013 20:16:26 Rich Felker dal...@aerifal.cx wrote: On Thu, Jun 27, 2013 at 09:34:32AM +0300, Timo Teräs wrote: It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). Wouldn't it be more logical to prevent this from happening by not using a refcount of zero? For example, NODELETE could just perform an extra increment on the refcnt so that it never reaches 0. In essence, the NODELETE flag is a permanent reference to the DSO. I was thinking the same, without having looked yet. This does not sound like the correct fix. This would be the alternate patch - but somehow I lost my original test case. And qemu seems to now depend on GTK making the other known issue also go away. Paolo, could you test if this fixes your problem also? diff --git a/ldso/ldso/dl-elf.c b/ldso/ldso/dl-elf.c index 96459f3..71133dd 100644 --- a/ldso/ldso/dl-elf.c +++ b/ldso/ldso/dl-elf.c @@ -831,6 +831,8 @@ struct elf_resolve *_dl_load_elf_shared_library(unsigned rflags, #endif (*rpnt)-dyn = tpnt; tpnt-usage_count++; + if (rtld_flags RTLD_NODELETE) + tpnt-usage_count++; #ifdef __LDSO_STANDALONE_SUPPORT__ tpnt-libtype = (epnt-e_type == ET_DYN) ? elf_lib : elf_executable; #else diff --git a/ldso/libdl/libdl.c b/ldso/libdl/libdl.c index c451b63..035b56d 100644 --- a/ldso/libdl/libdl.c +++ b/ldso/libdl/libdl.c @@ -822,7 +822,7 @@ static int do_dlclose(void *vhandle, int need_fini) _dl_handles = rpnt-next_handle; _dl_if_debug_print(%s: usage count: %d\n, handle-dyn-libname, handle-dyn-usage_count); - if (handle-dyn-usage_count != 1 || (handle-dyn-rtld_flags RTLD_NODELETE)) { + if (handle-dyn-usage_count != 1) { handle-dyn-usage_count--; free(handle); return 0; @@ -844,7 +844,7 @@ static int do_dlclose(void *vhandle, int need_fini) for (j = 0; j handle-init_fini.nlist; ++j) { tpnt = handle-init_fini.init_fini[j]; tpnt-usage_count--; - if (tpnt-usage_count == 0 !(tpnt-rtld_flags RTLD_NODELETE)) { + if (tpnt-usage_count == 0) { if ((tpnt-dynamic_info[DT_FINI] || tpnt-dynamic_info[DT_FINI_ARRAY]) need_fini ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc
Re: [PATCH] ldso: fix dlsym hang when reloading DSOs
On 27 June 2013 21:07:38 Timo Teras timo.te...@iki.fi wrote: On Thu, 27 Jun 2013 20:36:44 +0200 Bernhard Reutner-Fischer rep.dot@gmail.com wrote: On 27 June 2013 20:16:26 Rich Felker dal...@aerifal.cx wrote: On Thu, Jun 27, 2013 at 09:34:32AM +0300, Timo Teräs wrote: It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). Wouldn't it be more logical to prevent this from happening by not using a refcount of zero? For example, NODELETE could just perform an extra increment on the refcnt so that it never reaches 0. In essence, the NODELETE flag is a permanent reference to the DSO. I was thinking the same, without having looked yet. This does not sound like the correct fix. This would be the alternate patch - but somehow I lost my original test case. And qemu seems to now depend on GTK making the other known issue also go away. Paolo, could you test if this fixes your problem also? diff --git a/ldso/ldso/dl-elf.c b/ldso/ldso/dl-elf.c index 96459f3..71133dd 100644 --- a/ldso/ldso/dl-elf.c +++ b/ldso/ldso/dl-elf.c @@ -831,6 +831,8 @@ struct elf_resolve *_dl_load_elf_shared_library(unsigned rflags, #endif (*rpnt)-dyn = tpnt; tpnt-usage_count++; + if (rtld_flags RTLD_NODELETE) + tpnt-usage_count++; #ifdef __LDSO_STANDALONE_SUPPORT__ tpnt-libtype = (epnt-e_type == ET_DYN) ? elf_lib : elf_executable; #else diff --git a/ldso/libdl/libdl.c b/ldso/libdl/libdl.c index c451b63..035b56d 100644 --- a/ldso/libdl/libdl.c +++ b/ldso/libdl/libdl.c @@ -822,7 +822,7 @@ static int do_dlclose(void *vhandle, int need_fini) _dl_handles = rpnt-next_handle; _dl_if_debug_print(%s: usage count: %d\n, handle-dyn-libname, handle-dyn-usage_count); - if (handle-dyn-usage_count != 1 || (handle-dyn-rtld_flags RTLD_NODELETE)) { + if (handle-dyn-usage_count != 1) { handle-dyn-usage_count--; free(handle); return 0; @@ -844,7 +844,7 @@ static int do_dlclose(void *vhandle, int need_fini) for (j = 0; j handle-init_fini.nlist; ++j) { tpnt = handle-init_fini.init_fini[j]; tpnt-usage_count--; We want to keep usage_count at at least 1 if NODELETE here in dlclose: tpnt-usage_count -= !(tpnt-rtld_flags RTLD_NODELETE); Instead, or, better -- and then usage_count = MAX(usage_count, usage_count RTLD_NODELETE); or something to that effect. Thanks, - if (tpnt-usage_count == 0 !(tpnt-rtld_flags RTLD_NODELETE)) { + if (tpnt-usage_count == 0) { if ((tpnt-dynamic_info[DT_FINI] || tpnt-dynamic_info[DT_FINI_ARRAY]) need_fini Sent with AquaMail for Android http://www.aqua-mail.com ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc
Re: [PATCH] ldso: fix dlsym hang when reloading DSOs
On Thu, 27 Jun 2013 21:25:12 +0200 Bernhard Reutner-Fischer rep.dot@gmail.com wrote: On 27 June 2013 21:07:38 Timo Teras timo.te...@iki.fi wrote: On Thu, 27 Jun 2013 20:36:44 +0200 Bernhard Reutner-Fischer rep.dot@gmail.com wrote: On 27 June 2013 20:16:26 Rich Felker dal...@aerifal.cx wrote: On Thu, Jun 27, 2013 at 09:34:32AM +0300, Timo Teräs wrote: It can happen under certain cases that the DSO had refcount 0, but was already loaded. (NODELETE flag is set, or it is pulled in via both NEEDED dependency and explicit dlopen()). Wouldn't it be more logical to prevent this from happening by not using a refcount of zero? For example, NODELETE could just perform an extra increment on the refcnt so that it never reaches 0. In essence, the NODELETE flag is a permanent reference to the DSO. I was thinking the same, without having looked yet. This does not sound like the correct fix. This would be the alternate patch - but somehow I lost my original test case. And qemu seems to now depend on GTK making the other known issue also go away. Paolo, could you test if this fixes your problem also? diff --git a/ldso/ldso/dl-elf.c b/ldso/ldso/dl-elf.c index 96459f3..71133dd 100644 --- a/ldso/ldso/dl-elf.c +++ b/ldso/ldso/dl-elf.c @@ -831,6 +831,8 @@ struct elf_resolve *_dl_load_elf_shared_library(unsigned rflags, #endif (*rpnt)-dyn = tpnt; tpnt-usage_count++; + if (rtld_flags RTLD_NODELETE) + tpnt-usage_count++; #ifdef __LDSO_STANDALONE_SUPPORT__ tpnt-libtype = (epnt-e_type == ET_DYN) ? elf_lib : elf_executable; #else diff --git a/ldso/libdl/libdl.c b/ldso/libdl/libdl.c index c451b63..035b56d 100644 --- a/ldso/libdl/libdl.c +++ b/ldso/libdl/libdl.c @@ -822,7 +822,7 @@ static int do_dlclose(void *vhandle, int need_fini) _dl_handles = rpnt-next_handle; _dl_if_debug_print(%s: usage count: %d\n, handle-dyn-libname, handle-dyn-usage_count); - if (handle-dyn-usage_count != 1 || (handle-dyn-rtld_flags RTLD_NODELETE)) { + if (handle-dyn-usage_count != 1) { handle-dyn-usage_count--; free(handle); return 0; @@ -844,7 +844,7 @@ static int do_dlclose(void *vhandle, int need_fini) for (j = 0; j handle-init_fini.nlist; ++j) { tpnt = handle-init_fini.init_fini[j]; tpnt-usage_count--; We want to keep usage_count at at least 1 if NODELETE here in dlclose: tpnt-usage_count -= !(tpnt-rtld_flags RTLD_NODELETE); Instead, or, better -- and then usage_count = MAX(usage_count, usage_count RTLD_NODELETE); or something to that effect. The first chunk's: + if (rtld_flags RTLD_NODELETE) + tpnt-usage_count++; Will ensure that the reference count never drops below one. There's no way this extra reference gets deleted. However, it seems that do_dlopen() has: if (tpnt-usage_count 1) { // assume already dlopened } So the extra ref count actually messes up this. This should be simple enough to replace with test like: if (tpnt-init_flag DL_OPENED) But then I believe the the following search for previous dlopen handle will fail. Perhaps it is not a problem since the same happens with .so files that where NEEDED for the application and get later dlopened. - Timo ___ uClibc mailing list uClibc@uclibc.org http://lists.busybox.net/mailman/listinfo/uclibc