Re: [lldb-dev] Deadlock loading DWARF symbols

2020-10-14 Thread Frédéric Riss via lldb-dev
[I thought I had already sent this out weeks ago…] 

> On Oct 2, 2020, at 2:13 PM, Greg Clayton via lldb-dev 
>  wrote:
> 
> Yes this is bad, and GetDescription() is used as a convenience to print out 
> the module path (which might be a .o file within a .a file) and optionally 
> architecture of the module. It probably shouldn't be taking the module lock 
> as the only member variables that that GetDescription accesses are:
> 
> Module::m_arch
> Module::m_file
> Module::m_object_name
> 
> I would almost vote to take out the mutex lock in GetDescription() as the 
> arch, file and name don't change after the module has been created. I am 
> going to CC a few extra folks for discussion.
> 
> Anyone else have any objections to removing the mutex in GetDescription? 
> Seems like this deadlock is easy to trigger if you have DWARF with errors or 
> warnings inside of it.

I remember having a discussion with Jim about a very similar issue and IIRC, he 
told me that the arch could change after a module is created. I don’t remember 
the reason off the top of my head.

I think we are hitting a very similar deadlock in the Swift REPL for slightly 
different reasons (same lock though).

Fred 

> Greg
> 
> 
>> On Oct 2, 2020, at 6:50 AM, Dmitry Antipov via lldb-dev 
>>  wrote:
>> 
>> I'm observing the following deadlock:
>> 
>> One thread calls Module::PreloadSymbols() which takes m_mutex of this 
>> Module. Module::PreloadSymbols()
>> calls ManualDWARFIndex::Index(), which, in turn, creates thread pool and 
>> waits for all threads completion:
>> 
>> (gdb)
>> #0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7f67f176914c) 
>> at ../sysdeps/nptl/futex-internal.h:183
>> #1  __pthread_cond_wait_common (abstime=0x0, clockid=0, 
>> mutex=0x7f67f17690c8, cond=0x7f67f1769120) at pthread_cond_wait.c:508
>> #2  __pthread_cond_wait (cond=0x7f67f1769120, mutex=0x7f67f17690c8) at 
>> pthread_cond_wait.c:638
>> #3  0x7f67f3974890 in 
>> std::condition_variable::wait(std::unique_lock&) () from 
>> /lib64/libstdc++.so.6
>> #4  0x7f67f4440c4b in 
>> std::condition_variable::wait > 
>> (__p=..., __lock=..., this=0x7f67f1769120)
>>   at /usr/include/c++/10/condition_variable:108
>> #5  llvm::ThreadPool::wait (this=this@entry=0x7f67f1769060) at 
>> source/llvm/lib/Support/ThreadPool.cpp:72
>> #6  0x7f67fc6fa3a6 in lldb_private::ManualDWARFIndex::Index 
>> (this=0x7f66fe87e950)
>>   at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:94
>> #7  0x7f67fc6b3825 in SymbolFileDWARF::PreloadSymbols 
>> (this=0x7f67de7af6f0) at /usr/include/c++/10/bits/unique_ptr.h:421
>> #8  0x7f67fc1ee488 in lldb_private::Module::PreloadSymbols 
>> (this=0x7f67de79b620) at source/lldb/source/Core/Module.cpp:1397
>> #9  0x7f67fc397a37 in lldb_private::Target::GetOrCreateModule 
>> (this=this@entry=0x96c7a0, module_spec=..., notify=notify@entry=true, 
>> error_ptr=error_ptr@entry=0x0)
>>   at /usr/include/c++/10/bits/shared_ptr_base.h:1324
>> ...
>> 
>> OTOH one of pool threads makes an attempt to lock Module's mutex:
>> 
>> (gdb) bt
>> #0  __lll_lock_wait (futex=futex@entry=0x7f67de79b638, private=0) at 
>> lowlevellock.c:52
>> #1  0x7f67fcd907f1 in __GI___pthread_mutex_lock (mutex=0x7f67de79b638) 
>> at ../nptl/pthread_mutex_lock.c:115
>> #2  0x7f67fc1ed922 in __gthread_mutex_lock (__mutex=0x7f67de79b638) at 
>> /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:749
>> #3  __gthread_recursive_mutex_lock (__mutex=0x7f67de79b638) at 
>> /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:811
>> #4  std::recursive_mutex::lock (this=0x7f67de79b638) at 
>> /usr/include/c++/10/mutex:106
>> #5  std::lock_guard::lock_guard (__m=..., 
>> this=) at /usr/include/c++/10/bits/std_mutex.h:159
>> #6  lldb_private::Module::GetDescription (this=this@entry=0x7f67de79b620, 
>> s=..., level=level@entry=lldb::eDescriptionLevelBrief)
>>   at source/lldb/source/Core/Module.cpp:1083
>> #7  0x7f67fc1f2070 in lldb_private::Module::ReportError 
>> (this=0x7f67de79b620, format=0x7f67fca03660 "DW_FORM_ref* DIE reference 
>> 0x%lx is outside of its CU")
>>   at source/lldb/include/lldb/Utility/Stream.h:358
>> #8  0x7f67fc6adfb4 in DWARFFormValue::Reference 
>> (this=this@entry=0x7f66f5ff29c0) at 
>> /usr/include/c++/10/bits/shared_ptr_base.h:1324
>> #9  0x7f67fc6aaa77 in DWARFDebugInfoEntry::GetAttributes 
>> (this=this@entry=0x7f662e3580e0, cu=cu@entry=0x7f66ff6ebad0, attributes=...,
>>   recurse=recurse@entry=DWARFBaseDIE::Recurse::yes, 
>> curr_depth=curr_depth@entry=0)
>>   at source/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:439
>> #10 0x7f67fc6f8f8f in DWARFDebugInfoEntry::GetAttributes 
>> (recurse=DWARFBaseDIE::Recurse::yes, attrs=..., cu=0x7f66ff6ebad0, 
>> this=0x7f662e3580e0)
>>   at source/lldb/source/./Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h:54
>> #11 lldb_private::ManualDWARFIndex::IndexUnitImpl (unit=..., 
>> cu_language=cu_language@entry=lldb::eLanguageTyp

Re: [lldb-dev] Deadlock loading DWARF symbols

2020-10-05 Thread Pavel Labath via lldb-dev

On 02/10/2020 23:13, Greg Clayton wrote:

Yes this is bad, and GetDescription() is used as a convenience to print out the 
module path (which might be a .o file within a .a file) and optionally 
architecture of the module. It probably shouldn't be taking the module lock as 
the only member variables that that GetDescription accesses are:

Module::m_arch
Module::m_file
Module::m_object_name

I would almost vote to take out the mutex lock in GetDescription() as the arch, 
file and name don't change after the module has been created. I am going to CC 
a few extra folks for discussion.

Anyone else have any objections to removing the mutex in GetDescription? Seems 
like this deadlock is easy to trigger if you have DWARF with errors or warnings 
inside of it.



That sounds reasonable to me. All of the above fields can change during 
the early stages of Module construction (while the ObjectFile is being 
parsed and such), but I would certainly hope they remain stable after 
that. And this early construction process should be single-threaded.


So, I am fine with saying any subsequent modification is a bug.

pl
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] Deadlock loading DWARF symbols

2020-10-02 Thread Greg Clayton via lldb-dev
Yes this is bad, and GetDescription() is used as a convenience to print out the 
module path (which might be a .o file within a .a file) and optionally 
architecture of the module. It probably shouldn't be taking the module lock as 
the only member variables that that GetDescription accesses are:

Module::m_arch
Module::m_file
Module::m_object_name

I would almost vote to take out the mutex lock in GetDescription() as the arch, 
file and name don't change after the module has been created. I am going to CC 
a few extra folks for discussion.

Anyone else have any objections to removing the mutex in GetDescription? Seems 
like this deadlock is easy to trigger if you have DWARF with errors or warnings 
inside of it.

Greg


> On Oct 2, 2020, at 6:50 AM, Dmitry Antipov via lldb-dev 
>  wrote:
> 
> I'm observing the following deadlock:
> 
> One thread calls Module::PreloadSymbols() which takes m_mutex of this Module. 
> Module::PreloadSymbols()
> calls ManualDWARFIndex::Index(), which, in turn, creates thread pool and 
> waits for all threads completion:
> 
> (gdb)
> #0  futex_wait_cancelable (private=0, expected=0, futex_word=0x7f67f176914c) 
> at ../sysdeps/nptl/futex-internal.h:183
> #1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x7f67f17690c8, 
> cond=0x7f67f1769120) at pthread_cond_wait.c:508
> #2  __pthread_cond_wait (cond=0x7f67f1769120, mutex=0x7f67f17690c8) at 
> pthread_cond_wait.c:638
> #3  0x7f67f3974890 in 
> std::condition_variable::wait(std::unique_lock&) () from 
> /lib64/libstdc++.so.6
> #4  0x7f67f4440c4b in 
> std::condition_variable::wait > 
> (__p=..., __lock=..., this=0x7f67f1769120)
>at /usr/include/c++/10/condition_variable:108
> #5  llvm::ThreadPool::wait (this=this@entry=0x7f67f1769060) at 
> source/llvm/lib/Support/ThreadPool.cpp:72
> #6  0x7f67fc6fa3a6 in lldb_private::ManualDWARFIndex::Index 
> (this=0x7f66fe87e950)
>at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:94
> #7  0x7f67fc6b3825 in SymbolFileDWARF::PreloadSymbols 
> (this=0x7f67de7af6f0) at /usr/include/c++/10/bits/unique_ptr.h:421
> #8  0x7f67fc1ee488 in lldb_private::Module::PreloadSymbols 
> (this=0x7f67de79b620) at source/lldb/source/Core/Module.cpp:1397
> #9  0x7f67fc397a37 in lldb_private::Target::GetOrCreateModule 
> (this=this@entry=0x96c7a0, module_spec=..., notify=notify@entry=true, 
> error_ptr=error_ptr@entry=0x0)
>at /usr/include/c++/10/bits/shared_ptr_base.h:1324
> ...
> 
> OTOH one of pool threads makes an attempt to lock Module's mutex:
> 
> (gdb) bt
> #0  __lll_lock_wait (futex=futex@entry=0x7f67de79b638, private=0) at 
> lowlevellock.c:52
> #1  0x7f67fcd907f1 in __GI___pthread_mutex_lock (mutex=0x7f67de79b638) at 
> ../nptl/pthread_mutex_lock.c:115
> #2  0x7f67fc1ed922 in __gthread_mutex_lock (__mutex=0x7f67de79b638) at 
> /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:749
> #3  __gthread_recursive_mutex_lock (__mutex=0x7f67de79b638) at 
> /usr/include/c++/10/x86_64-redhat-linux/bits/gthr-default.h:811
> #4  std::recursive_mutex::lock (this=0x7f67de79b638) at 
> /usr/include/c++/10/mutex:106
> #5  std::lock_guard::lock_guard (__m=..., 
> this=) at /usr/include/c++/10/bits/std_mutex.h:159
> #6  lldb_private::Module::GetDescription (this=this@entry=0x7f67de79b620, 
> s=..., level=level@entry=lldb::eDescriptionLevelBrief)
>at source/lldb/source/Core/Module.cpp:1083
> #7  0x7f67fc1f2070 in lldb_private::Module::ReportError 
> (this=0x7f67de79b620, format=0x7f67fca03660 "DW_FORM_ref* DIE reference 0x%lx 
> is outside of its CU")
>at source/lldb/include/lldb/Utility/Stream.h:358
> #8  0x7f67fc6adfb4 in DWARFFormValue::Reference 
> (this=this@entry=0x7f66f5ff29c0) at 
> /usr/include/c++/10/bits/shared_ptr_base.h:1324
> #9  0x7f67fc6aaa77 in DWARFDebugInfoEntry::GetAttributes 
> (this=this@entry=0x7f662e3580e0, cu=cu@entry=0x7f66ff6ebad0, attributes=...,
>recurse=recurse@entry=DWARFBaseDIE::Recurse::yes, 
> curr_depth=curr_depth@entry=0)
>at source/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.cpp:439
> #10 0x7f67fc6f8f8f in DWARFDebugInfoEntry::GetAttributes 
> (recurse=DWARFBaseDIE::Recurse::yes, attrs=..., cu=0x7f66ff6ebad0, 
> this=0x7f662e3580e0)
>at source/lldb/source/./Plugins/SymbolFile/DWARF/DWARFDebugInfoEntry.h:54
> #11 lldb_private::ManualDWARFIndex::IndexUnitImpl (unit=..., 
> cu_language=cu_language@entry=lldb::eLanguageTypeRust, set=...)
>at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:180
> #12 0x7f67fc6f96b7 in lldb_private::ManualDWARFIndex::IndexUnit 
> (this=, unit=..., dwp=0x0, set=...)
>at source/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp:126
> ...
> 
> So this is a deadlock because thread pool is created with module lock held, 
> and one (or more,
> I'm observing two) pool thread(s) might want to grab the same lock to issue 
> an error message.
> 
> Commenting out the whole body of Module::GetDescription() m