This is an automated email from the ASF dual-hosted git repository.
lide pushed a commit to branch branch-1.2-lts
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/branch-1.2-lts by this push:
new 353cb94dbd3 [fix](be deadlock) avoid be deadlock because of MemTracker (#51321)
353cb94dbd3 is described below
commit 353cb94dbd32c3928a9aaf3e6716ac29e1824e38
Author: camby
AuthorDate: Wed May 28 15:31:08 2025 +0800
[fix](be deadlock) avoid be deadlock because of MemTracker (#51321)
### What problem does this PR solve?
In branch-2.0, this code has already been refactored in PR:
https://github.com/apache/doris/pull/18590
Deadlock stack:
1. During load, NodeChannel creates a MemTracker, whose constructor locks TrackerGroup.group_lock in bind_parent:
```
NodeChannel::NodeChannel
  _node_channel_tracker = std::make_shared<MemTracker>
    MemTracker::bind_parent
      std::lock_guard<std::mutex> l(mem_tracker_pool[_parent_group_num].group_lock);
      _tracker_group_it = mem_tracker_pool[_parent_group_num].trackers.insert(
              mem_tracker_pool[_parent_group_num].trackers.end(), this);
```
2. While that lock is held, std::list::insert must allocate a std::_List_node. The allocation triggers new_hook (in tcmalloc_hook.h), which reports the exceeded memory limit through print_log_usage and tries to lock the same TrackerGroup.group_lock. std::mutex is not recursive, so the thread blocks on a lock it already owns and deadlocks:
```
new_hook
  doris::ThreadMemTrackerMgr::consume
    doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true>
      doris::ThreadMemTrackerMgr::exceeded
        doris::MemTrackerLimiter::print_log_usage
          doris::MemTracker::make_group_snapshot
            std::lock_guard<std::mutex> l(mem_tracker_pool[group_num].group_lock);
```
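The two call chains above reduce to one pattern: code that holds a non-recursive mutex performs an allocation, and the allocation hook re-enters code that wants the same mutex on the same thread. A minimal C++ sketch of that path with the patched behaviour, assuming a simplified model: `HookedAllocator`, `TrackerGroup`, `g_group`, `g_hook_calls`, `bind_parent`, `new_hook` and `exceeded` here are hypothetical stand-ins named after the frames above, not Doris's real classes. A custom allocator plays the role of tcmalloc's MallocHook so the hook fires exactly where `std::list::insert` allocates its node.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>

// Hypothetical stand-in for tcmalloc's new_hook (tcmalloc_hook.h).
void new_hook(std::size_t size);

// Allocator that invokes the hook on every allocation, the way MallocHook
// fires on operator new (frames #13-#14 in the full stack).
template <typename T>
struct HookedAllocator {
    using value_type = T;
    HookedAllocator() = default;
    template <typename U>
    HookedAllocator(const HookedAllocator<U>&) {}
    T* allocate(std::size_t n) {
        new_hook(n * sizeof(T));  // runs while the caller may still hold group_lock
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <typename T, typename U>
bool operator==(const HookedAllocator<T>&, const HookedAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const HookedAllocator<T>&, const HookedAllocator<U>&) { return false; }

struct TrackerGroup {
    std::mutex group_lock;  // non-recursive, like TrackerGroup.group_lock
    std::list<int, HookedAllocator<int>> trackers;
};
TrackerGroup g_group;
int g_hook_calls = 0;

// Patched exceeded(): print_log_usage() is no longer called, so nothing on
// this path locks group_lock. Locking g_group.group_lock here instead would
// re-lock a mutex this thread already owns: undefined behaviour, and in
// practice the hang in __lll_lock_wait shown in frame #0.
void exceeded(std::size_t /*size*/) {}

void new_hook(std::size_t size) {
    ++g_hook_calls;
    exceeded(size);  // for illustration, every allocation counts as over the limit
}

// Mirrors MemTracker::bind_parent: the list insert allocates a node while
// group_lock is held, which fires new_hook on the same thread.
void bind_parent(int tracker_id) {
    std::lock_guard<std::mutex> l(g_group.group_lock);
    g_group.trackers.insert(g_group.trackers.end(), tracker_id);
}
```

With the patched `exceeded()`, calling `bind_parent(1)` and then `bind_parent(2)` returns normally, leaving two entries in `g_group.trackers` and at least two recorded hook calls. Adding a `std::lock_guard` on `g_group.group_lock` inside `exceeded()` reproduces the self-deadlock.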
Full stack info:
```
(gdb) bt
#0 0x00007f219772454d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f219771fe9b in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f219771fd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000563ca7a8a824 in __gthread_mutex_lock (__mutex=0x563cb18cbc58)
at
/var/local/ldb-toolchain/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749
#4 std::mutex::lock (this=0x563cb18cbc58) at
/var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:100
#5 std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic
pointer>) at /var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:229
#6 doris::MemTracker::make_group_snapshot (snapshots=0x7f1f7fe06ea0,
group_num=<optimized out>, parent_label=...)
at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:115
#7 0x0000563ca7a7eeb4 in doris::MemTrackerLimiter::print_log_usage
(this=0x5640ab7956c0, msg=...)
at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker_limiter.cpp:198
#8 0x0000563ca7a8d1e4 in doris::ThreadMemTrackerMgr::exceeded
(this=this@entry=0x563ce6a2c820, size=1048584)
at
/data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.cpp:59
#9 0x0000563ca78cfeb4 in
doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true>
(this=0x563ce6a2c820)
at
/data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:223
#10 doris::ThreadMemTrackerMgr::consume (size=<optimized out>,
this=0x563ce6a2c820)
at
/data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:188
#11 doris::ThreadMemTrackerMgr::consume (size=<optimized out>,
this=0x563ce6a2c820)
at
/data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:178
#12 new_hook (ptr=<optimized out>, size=24) at
/data/TCHouse-D-1.2/be/src/runtime/memory/tcmalloc_hook.h:39
#13 0x0000563caf3dfa78 in MallocHook::InvokeNewHookSlow
(p=p@entry=0x56422d713b40, s=s@entry=24) at src/malloc_hook.cc:498
#14 0x0000563caf55f2c1 in MallocHook::InvokeNewHook (s=24,
p=0x56422d713b40) at src/malloc_hook-inl.h:127
#15 tcmalloc::do_allocate_full<tcmalloc::cpp_throw_oom>
(size=size@entry=24) at src/tcmalloc.cc:1805
#16 tcmalloc::allocate_full_cpp_throw_oom (size=size@entry=24) at
src/tcmalloc.cc:1815
#17 0x0000563caf55f429 in
tcmalloc::dispatch_allocate_full<tcmalloc::cpp_throw_oom> (size=24) at
src/tcmalloc.cc:1822
#18 0x0000563ca7a8ab9a in
__gnu_cxx::new_allocator<std::_List_node<doris::MemTracker*> >::allocate
(__n=1, this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:103
#19
std::allocator_traits<std::allocator<std::_List_node<doris::MemTracker*> >
>::allocate (__n=1, __a=...)
at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:460
#20 std::__cxx11::_List_base<doris::MemTracker*,
std::allocator<doris::MemTracker*> >::_M_get_node (this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:442
#21 std::__cxx11::list<doris::MemTracker*,
std::allocator<doris::MemTracker*> >::_M_create_node<doris::MemTracker*>
(this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:634
#22 std::__cxx11::list<doris::MemTracker*,
std::allocator<doris::MemTracker*> >::emplace<doris::MemTracker*>
(__position=..., this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/bits/list.tcc:92
#23 std::__cxx11::list<doris::MemTracker*,
std::allocator<doris::MemTracker*> >::insert (__x=<optimized out>,
__position=..., this=0x563cb18cbc40)
at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:1309
#24 doris::MemTracker::bind_parent (this=0x563eff46ad10, parent=<optimized
out>) at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:79
#25 0x0000563ca7a8b608 in doris::MemTracker::MemTracker
(this=this@entry=0x563eff46ad10, label=..., parent=parent@entry=0x0)
at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:66
#26 0x0000563ca7967467 in
__gnu_cxx::new_allocator<doris::MemTracker>::construct<doris::MemTracker,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
> (__p=0x563eff46ad10, this=<optimized out>) at
/var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
#27 std::allocator_traits<std::allocator<doris::MemTracker>
>::construct<doris::MemTracker, std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > > (__p=0x563eff46ad10, __a=...)
at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
#28 std::_Sp_counted_ptr_inplace<doris::MemTracker,
std::allocator<doris::MemTracker>,
(__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > > (__a=..., this=0x563eff46ad00)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
#29
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::MemTracker,
std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > > (__a=..., __p=<optimized out>,
this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
#30 std::__shared_ptr<doris::MemTracker,
(__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::MemTracker>,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
> (__tag=..., this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
#31
std::shared_ptr<doris::MemTracker>::shared_ptr<std::allocator<doris::MemTracker>,
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>
> > (__tag=..., this=<optimized out>) at
/var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
#32 std::allocate_shared<doris::MemTracker,
std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > > (__a=...) at
/var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
#33 std::make_shared<doris::MemTracker, std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > > ()
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
#34 doris::stream_load::NodeChannel::NodeChannel
(this=this@entry=0x563d0ca7e110, parent=<optimized out>,
index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized
out>) at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:49
#35 0x0000563cac74a82d in doris::stream_load::VNodeChannel::VNodeChannel
(this=this@entry=0x563d0ca7e110, parent=<optimized out>,
index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized
out>) at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:37
#36 0x0000563ca7967eaa in
__gnu_cxx::new_allocator<doris::stream_load::VNodeChannel>::construct<doris::stream_load::VNodeChannel,
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&>
(__p=0x563d0ca7e110, this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
#37 std::allocator_traits<std::allocator<doris::stream_load::VNodeChannel>
>::construct<doris::stream_load::VNodeChannel,
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&>
(__p=0x563d0ca7e110, __a=...) at
/var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
#38 std::_Sp_counted_ptr_inplace<doris::stream_load::VNodeChannel,
std::allocator<doris::stream_load::VNodeChannel>,
(__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<doris::stream_load::OlapTableSink*&,
doris::stream_load::IndexChannel*, long&> (__a=..., this=0x563d0ca7e100)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
#39
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::stream_load::VNodeChannel,
std::allocator<doris::stream_load::VNodeChannel>,
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&>
(__a=..., __p=<optimized out>, this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
#40 std::__shared_ptr<doris::stream_load::VNodeChannel,
(__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::stream_load::VNodeChannel>,
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&>
(__tag=..., this=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
#41
std::shared_ptr<doris::stream_load::VNodeChannel>::shared_ptr<std::allocator<doris::stream_load::VNodeChannel>,
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&>
(__tag=..., this=<optimized out>) at
/var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
#42 std::allocate_shared<doris::stream_load::VNodeChannel,
std::allocator<doris::stream_load::VNodeChannel>,
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&>
(__a=...) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
#43 std::make_shared<doris::stream_load::VNodeChannel,
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&>
()
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
#44 doris::stream_load::IndexChannel::init (this=this@entry=0x563d4b43f200,
state=state@entry=0x563e027a3500, tablets=...)
at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:705
#45 0x0000563ca79698d8 in doris::stream_load::OlapTableSink::prepare
(this=this@entry=0x5644b9efe880, state=state@entry=0x563e027a3500)
at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1290
#46 0x0000563cac74db65 in doris::stream_load::VOlapTableSink::prepare
(this=0x5644b9efe880, state=0x563e027a3500)
at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:450
#47 0x0000563ca7946c21 in doris::PlanFragmentExecutor::prepare
(this=this@entry=0x563f55efd280, request=..., fragments_ctx=<optimized out>)
at /var/local/ldb-toolchain/include/c++/11/bits/unique_ptr.h:173
#48 0x0000563ca791f17c in doris::FragmentExecState::prepare
(this=this@entry=0x563f55efd200, params=...)
at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:238
#49 0x0000563ca7926629 in
doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&,
std::function<void (doris::PlanFragmentExecutor*)>)
(this=this@entry=0x563cb6e93400, params=..., cb=...) at
/data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:720
#50 0x0000563ca7928dab in doris::FragmentMgr::exec_plan_fragment
(this=0x563cb6e93400, params=...)
at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:564
#51 0x0000563ca7aaf507 in
doris::PInternalServiceImpl::_exec_plan_fragment_impl
(this=this@entry=0x563cb577b880, ser_request=...,
version=<optimized out>, compact=<optimized out>) at
/data/TCHouse-D-1.2/be/src/service/internal_service.cpp:480
#52 0x0000563ca7aaf703 in
doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread
(this=0x563cb577b880, controller=<optimized out>,
request=0x563d6c5bb9b0, response=0x564203b92ce0, done=0x563d885a60c0)
at /data/TCHouse-D-1.2/be/src/service/internal_service.cpp:254
#53 0x0000563ca78d6dbd in std::function<void ()>::operator()() const
(this=0x7f1f7fe090c8)
at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#54 doris::PriorityThreadPool::work_thread (this=0x563cb577ba38,
thread_id=<optimized out>)
at /data/TCHouse-D-1.2/be/src/util/priority_thread_pool.hpp:145
#55 0x0000563caf526b00 in std::execute_native_thread_routine
(__p=0x563cbfc81d10) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#56 0x00007f219771dea5 in start_thread () from /lib64/libpthread.so.0
#57 0x00007f2197a309fd in clone () from /lib64/libc.so.6
```
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
---
be/src/runtime/memory/thread_mem_tracker_mgr.cpp | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/be/src/runtime/memory/thread_mem_tracker_mgr.cpp b/be/src/runtime/memory/thread_mem_tracker_mgr.cpp
index d45d6b8cb5f..5ef60b9c13c 100644
--- a/be/src/runtime/memory/thread_mem_tracker_mgr.cpp
+++ b/be/src/runtime/memory/thread_mem_tracker_mgr.cpp
@@ -56,7 +56,9 @@ void ThreadMemTrackerMgr::exceeded(int64_t size) {
if (_cb_func != nullptr) {
_cb_func();
}
- _limiter_tracker_raw->print_log_usage(_exceed_mem_limit_msg);
+
+ // avoid deadlock, do not print log here:
+ // _limiter_tracker_raw->print_log_usage(_exceed_mem_limit_msg);
if (is_attach_query()) {
if (_is_process_exceed && _wait_gc) {
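
The patch removes the diagnostic entirely because exceeded() runs inside new_hook, and therefore inside arbitrary allocations, including ones made while a group_lock is held. One possible alternative, which is not what this commit does, is to keep the log but skip it whenever the current thread already holds a tracker group lock. A minimal C++ sketch under that assumption: `GroupLockScope`, `t_group_lock_depth`, `g_group_lock`, `g_snapshots`, `print_log_usage` and `exceeded` are hypothetical, simplified stand-ins, not Doris APIs.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical alternative, not the Doris fix: count how many tracker group
// locks the current thread holds, and skip lock-taking diagnostics while > 0.
thread_local int t_group_lock_depth = 0;

// RAII wrapper that every group_lock acquisition would have to use.
class GroupLockScope {
public:
    explicit GroupLockScope(std::mutex& m) : _guard(m) { ++t_group_lock_depth; }
    ~GroupLockScope() { --t_group_lock_depth; }
    GroupLockScope(const GroupLockScope&) = delete;
    GroupLockScope& operator=(const GroupLockScope&) = delete;

private:
    std::lock_guard<std::mutex> _guard;
};

std::mutex g_group_lock;  // stand-in for mem_tracker_pool[n].group_lock
int g_snapshots = 0;

// Stand-in for print_log_usage() -> make_group_snapshot(), which locks the group.
void print_log_usage() {
    GroupLockScope scope(g_group_lock);
    ++g_snapshots;
}

// Stand-in for ThreadMemTrackerMgr::exceeded(), reached from the allocation hook.
void exceeded() {
    if (t_group_lock_depth == 0) {
        print_log_usage();  // this thread holds no group lock, so it cannot self-deadlock
    }
}
```

The trade-off: the log survives in the common case, but every site that takes a group_lock must go through the wrapper, and a single missed site reintroduces the hang. Dropping the call, as this patch does, is arguably the smaller and safer change for a maintenance branch.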