This is an automated email from the ASF dual-hosted git repository.

lide pushed a commit to branch branch-1.2-lts
in repository https://gitbox.apache.org/repos/asf/doris.git


The following commit(s) were added to refs/heads/branch-1.2-lts by this push:
     new 353cb94dbd3 [fix](be deadlock) avoid be deadlock because of MemTracker (#51321)
353cb94dbd3 is described below

commit 353cb94dbd32c3928a9aaf3e6716ac29e1824e38
Author: camby <[email protected]>
AuthorDate: Wed May 28 15:31:08 2025 +0800

    [fix](be deadlock) avoid be deadlock because of MemTracker (#51321)
    
    ### What problem does this PR solve?
    
    In branch-2.0, this code has already been refactored in PR:
    https://github.com/apache/doris/pull/18590
    
    Deadlock stack:
    1. During a load, we allocate a MemTracker, which requires locking TrackerGroup.group_lock:
    ```
       NodeChannel::NodeChannel
         _node_channel_tracker = std::make_shared<MemTracker>
           MemTracker::bind_parent
             std::lock_guard<std::mutex> l(mem_tracker_pool[_parent_group_num].group_lock);
             _tracker_group_it = mem_tracker_pool[_parent_group_num].trackers.insert(
                    mem_tracker_pool[_parent_group_num].trackers.end(), this);
    ```
    
    2. However, std::list::insert needs to allocate a std::_List_node.
    This triggers new_hook (in tcmalloc_hook.h), which then tries to lock
    the same TrackerGroup.group_lock on the same thread, causing a deadlock:
    ```
    new_hook
    doris::ThreadMemTrackerMgr::consume
      doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true>
        doris::ThreadMemTrackerMgr::exceeded
          doris::MemTrackerLimiter::print_log_usage
             doris::MemTracker::make_group_snapshot
               std::lock_guard<std::mutex> l(mem_tracker_pool[group_num].group_lock);
    ```
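    
    The two steps above can be sketched in isolation. This is a minimal
    illustration, not Doris code: TrackerGroup, on_alloc_hook, bind_parent
    and the thread-local flag are hypothetical stand-ins, and the hook is
    called by hand instead of being installed in tcmalloc. It also shows a
    re-entrancy guard, which is a different approach from the change in
    this commit (the commit simply stops calling print_log_usage from
    ThreadMemTrackerMgr::exceeded).
    
    ```cpp
    #include <cassert>
    #include <list>
    #include <mutex>
    
    // Hypothetical stand-in for the tracker group; not the Doris type.
    struct TrackerGroup {
        std::mutex group_lock;    // non-recursive, like the lock in the stack above
        std::list<int> trackers;  // insert() allocates one list node per element
    };
    
    static TrackerGroup g_group;
    // True while the current thread holds g_group.group_lock.
    static thread_local bool t_holds_group_lock = false;
    static int g_snapshots = 0;          // snapshots taken by the hook
    static int g_skipped_snapshots = 0;  // snapshots skipped to avoid re-locking
    
    // RAII helper so the flag is cleared even if insert() throws.
    struct HoldFlag {
        HoldFlag() { t_holds_group_lock = true; }
        ~HoldFlag() { t_holds_group_lock = false; }
    };
    
    // Stand-in for the path new_hook -> exceeded -> make_group_snapshot.
    void on_alloc_hook() {
        if (t_holds_group_lock) {
            // Locking here would block forever: this thread already owns
            // the non-recursive mutex. Skip the snapshot instead.
            ++g_skipped_snapshots;
            return;
        }
        std::lock_guard<std::mutex> l(g_group.group_lock);
        ++g_snapshots;
    }
    
    // Stand-in for MemTracker::bind_parent.
    void bind_parent(int id) {
        assert(!t_holds_group_lock);  // bind_parent itself must not be re-entered
        std::lock_guard<std::mutex> l(g_group.group_lock);
        HoldFlag flag;
        // In the real stack the hook fires inside list::insert when the
        // node is allocated; here it is invoked manually to simulate that.
        on_alloc_hook();
        g_group.trackers.insert(g_group.trackers.end(), id);
    }
    ```
    
    Without the flag check, the call to on_alloc_hook inside bind_parent
    would try to lock group_lock a second time on the same thread, which
    is the hang shown in the backtrace below.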
    
    Full stack info:
    ```
    (gdb) bt
    #0  0x00007f219772454d in __lll_lock_wait () from /lib64/libpthread.so.0
    #1  0x00007f219771fe9b in _L_lock_883 () from /lib64/libpthread.so.0
    #2  0x00007f219771fd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #3  0x0000563ca7a8a824 in __gthread_mutex_lock (__mutex=0x563cb18cbc58)
        at 
/var/local/ldb-toolchain/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749
    #4  std::mutex::lock (this=0x563cb18cbc58) at 
/var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:100
    #5  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic 
pointer>) at /var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:229
    #6  doris::MemTracker::make_group_snapshot (snapshots=0x7f1f7fe06ea0, 
group_num=<optimized out>, parent_label=...)
        at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:115
    #7  0x0000563ca7a7eeb4 in doris::MemTrackerLimiter::print_log_usage 
(this=0x5640ab7956c0, msg=...)
        at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker_limiter.cpp:198
    #8  0x0000563ca7a8d1e4 in doris::ThreadMemTrackerMgr::exceeded 
(this=this@entry=0x563ce6a2c820, size=1048584)
        at 
/data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.cpp:59
    #9  0x0000563ca78cfeb4 in 
doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true> 
(this=0x563ce6a2c820)
        at 
/data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:223
    #10 doris::ThreadMemTrackerMgr::consume (size=<optimized out>, 
this=0x563ce6a2c820)
        at 
/data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:188
    #11 doris::ThreadMemTrackerMgr::consume (size=<optimized out>, 
this=0x563ce6a2c820)
        at 
/data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:178
    #12 new_hook (ptr=<optimized out>, size=24) at 
/data/TCHouse-D-1.2/be/src/runtime/memory/tcmalloc_hook.h:39
    #13 0x0000563caf3dfa78 in MallocHook::InvokeNewHookSlow 
(p=p@entry=0x56422d713b40, s=s@entry=24) at src/malloc_hook.cc:498
    #14 0x0000563caf55f2c1 in MallocHook::InvokeNewHook (s=24, 
p=0x56422d713b40) at src/malloc_hook-inl.h:127
    #15 tcmalloc::do_allocate_full<tcmalloc::cpp_throw_oom> 
(size=size@entry=24) at src/tcmalloc.cc:1805
    #16 tcmalloc::allocate_full_cpp_throw_oom (size=size@entry=24) at 
src/tcmalloc.cc:1815
    #17 0x0000563caf55f429 in 
tcmalloc::dispatch_allocate_full<tcmalloc::cpp_throw_oom> (size=24) at 
src/tcmalloc.cc:1822
    #18 0x0000563ca7a8ab9a in 
__gnu_cxx::new_allocator<std::_List_node<doris::MemTracker*> >::allocate 
(__n=1, this=0x563cb18cbc40)
        at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:103
    #19 
std::allocator_traits<std::allocator<std::_List_node<doris::MemTracker*> > 
>::allocate (__n=1, __a=...)
        at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:460
    #20 std::__cxx11::_List_base<doris::MemTracker*, 
std::allocator<doris::MemTracker*> >::_M_get_node (this=0x563cb18cbc40)
        at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:442
    #21 std::__cxx11::list<doris::MemTracker*, 
std::allocator<doris::MemTracker*> >::_M_create_node<doris::MemTracker*> 
(this=0x563cb18cbc40)
        at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:634
    #22 std::__cxx11::list<doris::MemTracker*, 
std::allocator<doris::MemTracker*> >::emplace<doris::MemTracker*> 
(__position=..., this=0x563cb18cbc40)
        at /var/local/ldb-toolchain/include/c++/11/bits/list.tcc:92
    #23 std::__cxx11::list<doris::MemTracker*, 
std::allocator<doris::MemTracker*> >::insert (__x=<optimized out>, 
__position=..., this=0x563cb18cbc40)
        at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:1309
    #24 doris::MemTracker::bind_parent (this=0x563eff46ad10, parent=<optimized 
out>) at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:79
    #25 0x0000563ca7a8b608 in doris::MemTracker::MemTracker 
(this=this@entry=0x563eff46ad10, label=..., parent=parent@entry=0x0)
        at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:66
    #26 0x0000563ca7967467 in 
__gnu_cxx::new_allocator<doris::MemTracker>::construct<doris::MemTracker, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
> (__p=0x563eff46ad10, this=<optimized out>) at 
/var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
    #27 std::allocator_traits<std::allocator<doris::MemTracker> 
>::construct<doris::MemTracker, std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > (__p=0x563eff46ad10, __a=...) 
at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
    #28 std::_Sp_counted_ptr_inplace<doris::MemTracker, 
std::allocator<doris::MemTracker>, 
(__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::__cxx11::basic_string<char,
 std::char_traits<char>, std::allocator<char> > > (__a=..., this=0x563eff46ad00)
        at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
    #29 
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::MemTracker,
 std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > (__a=..., __p=<optimized out>, 
this=<optimized out>)
        at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
    #30 std::__shared_ptr<doris::MemTracker, 
(__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::MemTracker>, 
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
> (__tag=..., this=<optimized out>)
        at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
    #31 
std::shared_ptr<doris::MemTracker>::shared_ptr<std::allocator<doris::MemTracker>,
 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> 
> > (__tag=..., this=<optimized out>) at 
/var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
    #32 std::allocate_shared<doris::MemTracker, 
std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > (__a=...) at 
/var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
    #33 std::make_shared<doris::MemTracker, std::__cxx11::basic_string<char, 
std::char_traits<char>, std::allocator<char> > > ()
        at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
    #34 doris::stream_load::NodeChannel::NodeChannel 
(this=this@entry=0x563d0ca7e110, parent=<optimized out>,
        index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized 
out>) at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:49
    #35 0x0000563cac74a82d in doris::stream_load::VNodeChannel::VNodeChannel 
(this=this@entry=0x563d0ca7e110, parent=<optimized out>,
        index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized 
out>) at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:37
    #36 0x0000563ca7967eaa in 
__gnu_cxx::new_allocator<doris::stream_load::VNodeChannel>::construct<doris::stream_load::VNodeChannel,
 doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> 
(__p=0x563d0ca7e110, this=<optimized out>)
        at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
    #37 std::allocator_traits<std::allocator<doris::stream_load::VNodeChannel> 
>::construct<doris::stream_load::VNodeChannel, 
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> 
(__p=0x563d0ca7e110, __a=...) at 
/var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
    #38 std::_Sp_counted_ptr_inplace<doris::stream_load::VNodeChannel, 
std::allocator<doris::stream_load::VNodeChannel>, 
(__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<doris::stream_load::OlapTableSink*&,
 doris::stream_load::IndexChannel*, long&> (__a=..., this=0x563d0ca7e100)
        at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
    #39 
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::stream_load::VNodeChannel,
 std::allocator<doris::stream_load::VNodeChannel>, 
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> 
(__a=..., __p=<optimized out>, this=<optimized out>)
        at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
    #40 std::__shared_ptr<doris::stream_load::VNodeChannel, 
(__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::stream_load::VNodeChannel>,
 doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> 
(__tag=..., this=<optimized out>)
        at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
    #41 
std::shared_ptr<doris::stream_load::VNodeChannel>::shared_ptr<std::allocator<doris::stream_load::VNodeChannel>,
 doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> 
(__tag=..., this=<optimized out>) at 
/var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
    #42 std::allocate_shared<doris::stream_load::VNodeChannel, 
std::allocator<doris::stream_load::VNodeChannel>, 
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> 
(__a=...) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
    #43 std::make_shared<doris::stream_load::VNodeChannel, 
doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> 
()
        at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
    #44 doris::stream_load::IndexChannel::init (this=this@entry=0x563d4b43f200, 
state=state@entry=0x563e027a3500, tablets=...)
        at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:705
    #45 0x0000563ca79698d8 in doris::stream_load::OlapTableSink::prepare 
(this=this@entry=0x5644b9efe880, state=state@entry=0x563e027a3500)
        at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1290
    #46 0x0000563cac74db65 in doris::stream_load::VOlapTableSink::prepare 
(this=0x5644b9efe880, state=0x563e027a3500)
        at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:450
    #47 0x0000563ca7946c21 in doris::PlanFragmentExecutor::prepare 
(this=this@entry=0x563f55efd280, request=..., fragments_ctx=<optimized out>)
        at /var/local/ldb-toolchain/include/c++/11/bits/unique_ptr.h:173
    #48 0x0000563ca791f17c in doris::FragmentExecState::prepare 
(this=this@entry=0x563f55efd200, params=...)
        at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:238
    #49 0x0000563ca7926629 in 
doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, 
std::function<void (doris::PlanFragmentExecutor*)>) 
(this=this@entry=0x563cb6e93400, params=..., cb=...) at 
/data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:720
    #50 0x0000563ca7928dab in doris::FragmentMgr::exec_plan_fragment 
(this=0x563cb6e93400, params=...)
        at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:564
    #51 0x0000563ca7aaf507 in 
doris::PInternalServiceImpl::_exec_plan_fragment_impl 
(this=this@entry=0x563cb577b880, ser_request=...,
        version=<optimized out>, compact=<optimized out>) at 
/data/TCHouse-D-1.2/be/src/service/internal_service.cpp:480
    #52 0x0000563ca7aaf703 in 
doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread 
(this=0x563cb577b880, controller=<optimized out>,
        request=0x563d6c5bb9b0, response=0x564203b92ce0, done=0x563d885a60c0) 
at /data/TCHouse-D-1.2/be/src/service/internal_service.cpp:254
    #53 0x0000563ca78d6dbd in std::function<void ()>::operator()() const 
(this=0x7f1f7fe090c8)
        at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
    #54 doris::PriorityThreadPool::work_thread (this=0x563cb577ba38, 
thread_id=<optimized out>)
        at /data/TCHouse-D-1.2/be/src/util/priority_thread_pool.hpp:145
    #55 0x0000563caf526b00 in std::execute_native_thread_routine 
(__p=0x563cbfc81d10) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
    #56 0x00007f219771dea5 in start_thread () from /lib64/libpthread.so.0
    #57 0x00007f2197a309fd in clone () from /lib64/libc.so.6
    ```
    
    ### Release note
    
    None
    
    ### Check List (For Author)
    
    - Test <!-- At least one of them must be included. -->
        - [ ] Regression test
        - [ ] Unit Test
        - [ ] Manual test (add detailed scripts or steps below)
        - [ ] No need to test or manual test. Explain why:
            - [ ] This is a refactor/code format and no logic has been changed.
            - [ ] Previous test can cover this change.
            - [ ] No code files have been changed.
            - [ ] Other reason <!-- Add your reason?  -->
    
    - Behavior changed:
        - [ ] No.
        - [ ] Yes. <!-- Explain the behavior change -->
    
    - Does this need documentation?
        - [ ] No.
        - [ ] Yes. <!-- Add document PR link here. eg: https://github.com/apache/doris-website/pull/1214 -->
    
    ### Check List (For Reviewer who merge this PR)
    
    - [ ] Confirm the release note
    - [ ] Confirm test cases
    - [ ] Confirm document
    - [ ] Add branch pick label <!-- Add branch pick label that this PR
    should merge into -->
---
 be/src/runtime/memory/thread_mem_tracker_mgr.cpp | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/be/src/runtime/memory/thread_mem_tracker_mgr.cpp b/be/src/runtime/memory/thread_mem_tracker_mgr.cpp
index d45d6b8cb5f..5ef60b9c13c 100644
--- a/be/src/runtime/memory/thread_mem_tracker_mgr.cpp
+++ b/be/src/runtime/memory/thread_mem_tracker_mgr.cpp
@@ -56,7 +56,9 @@ void ThreadMemTrackerMgr::exceeded(int64_t size) {
     if (_cb_func != nullptr) {
         _cb_func();
     }
-    _limiter_tracker_raw->print_log_usage(_exceed_mem_limit_msg);
+
+    // avoid deadlock, do not print log here:
+    // _limiter_tracker_raw->print_log_usage(_exceed_mem_limit_msg);
 
     if (is_attach_query()) {
         if (_is_process_exceed && _wait_gc) {


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
