felixwluo opened a new pull request, #45747:
URL: https://github.com/apache/doris/pull/45747
### What problem does this PR solve?
Core Dump
```
(gdb) bt
#0 0x000055f476bcda1d in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f1187acbb00)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:168
#1 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f12bbeaac98)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:702
#2 std::__shared_ptr<doris::MetricEntity, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f12bbeaac90)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1149
#3 doris::BaseTablet::~BaseTablet (this=0x7f12bbeaac10)
    at /root/be/src/olap/base_tablet.cpp:53
#4 0x000055f476beabbb in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f12bbeaac00)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:168
#5 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f11b8d046c8)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:702
#6 std::__shared_ptr<doris::Tablet, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f11b8d046c0)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1149
#7 std::destroy_at<std::shared_ptr<doris::Tablet> > (__location=0x7f11b8d046c0)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:88
#8 std::_Destroy<std::shared_ptr<doris::Tablet> > (__pointer=0x7f11b8d046c0)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:138
#9 std::_Destroy_aux<false>::__destroy<std::shared_ptr<doris::Tablet>*> (__first=0x7f11b8d046c0, __last=0x7f11b8d04c80)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:152
#10 std::_Destroy<std::shared_ptr<doris::Tablet>*> (__first=<optimized out>, __last=0x7f11b8d04c80)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:184
#11 std::_Destroy<std::shared_ptr<doris::Tablet>*, std::shared_ptr<doris::Tablet> > (__first=<optimized out>, __last=0x7f11b8d04c80)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:746
#12 std::vector<std::shared_ptr<doris::Tablet>, std::allocator<std::shared_ptr<doris::Tablet> > >::~vector (this=<optimized out>)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:680
#13 doris::TabletManager::start_trash_sweep()::$_2::operator()() const (this=<optimized out>)
    at /root/be/src/olap/tablet_manager.cpp:1105
#14 doris::TabletManager::start_trash_sweep (this=0x7f17fc2d1d00)
    at /root/be/src/olap/tablet_manager.cpp:1110
#15 0x000055f4761ac0c6 in doris::StorageEngine::start_trash_sweep (this=0x7f17fbef7000, usage=0x7f150f1bf3d0, ignore_guard=<optimized out>)
    at /root/be/src/olap/storage_engine.cpp:803
#16 0x000055f476a355e6 in doris::StorageEngine::_garbage_sweeper_thread_callback (this=0x7f17fbef7000)
    at /root/be/src/olap/olap_server.cpp:300
#17 0x000055f47707da51 in std::function<void ()>::operator()() const (this=0x7f1187acbb00)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560
#18 doris::Thread::supervise_thread (arg=0x7f17fbf7da40)
    at /root/be/src/util/thread.cpp:498
#19 0x00007f182d17fea5 in start_thread () from /lib64/libpthread.so.0
#20 0x00007f182dbae9fd in clone () from /lib64/libc.so.6
```
Cause of occurrence
`The crash occurred while destroying _metric_entity in the BaseTablet
destructor. The memory dump shows that the use count of _metric_entity is
already 0 while one weak reference remains. In a multithreaded environment, a
race condition may occur between deregister_entity and reset_metric_entity.`
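One general way to rule out this kind of race is to route every modification of the shared pointer through a single mutex. The sketch below only illustrates that idea and is not the change made in this PR: `Entity`, `Owner`, `release_entity` and `release_concurrently` are hypothetical names, and the assumption that the two Doris functions touch the same `shared_ptr` object without synchronization is mine, not something confirmed by the source.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Counts how many times the hypothetical Entity destructor has run.
std::atomic<int> g_destroyed{0};

// Hypothetical stand-in for a metric entity (not doris::MetricEntity).
struct Entity {
    ~Entity() { g_destroyed.fetch_add(1); }
};

// Hypothetical owner: every access to entity_ goes through mutex_.
class Owner {
public:
    Owner() : entity_(std::make_shared<Entity>()) {}

    // Safe to call from several threads: only one caller swaps out a
    // non-null pointer, every other caller swaps out an empty one.
    void release_entity() {
        std::shared_ptr<Entity> local;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            local.swap(entity_);  // entity_ becomes empty under the lock
        }
        // local is destroyed when this function returns, so the Entity
        // destructor runs after the lock has been released.
    }

    bool has_entity() {
        std::lock_guard<std::mutex> guard(mutex_);
        return entity_ != nullptr;
    }

private:
    std::mutex mutex_;
    std::shared_ptr<Entity> entity_;
};

// Releases the entity from several threads at once and returns how many
// Entity destructors ran in total.
int release_concurrently(int num_threads) {
    g_destroyed = 0;
    Owner owner;
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&owner] { owner.release_entity(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    assert(!owner.has_entity());
    return g_destroyed.load();
}
```

Calling `release_concurrently(8)` should report exactly one destructor run. Without the lock, concurrent non-const calls on the same `shared_ptr` object are a data race under the C++ standard, which can leave the control block in an inconsistent state.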
GDB
```
(gdb) f 0
#0 0x000055f476bcda1d in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f1187acbb00)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:168
168 in /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h
(gdb) p *this
$18 = {<std::_Mutex_base<(__gnu_cxx::_Lock_policy)2>> = {<No data fields>}, _vptr$_Sp_counted_base = 0x55f46f61696a, _M_use_count = 0, _M_weak_count = 1}
```
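For readers less familiar with the control block counters, the following minimal sketch reproduces, through the public API only, the state in which no owning `shared_ptr` remains but a `weak_ptr` still refers to the control block. `Entity`, `ControlBlockView` and `observe_after_last_owner_released` are hypothetical names. Mapping this state onto the `_M_use_count = 0, _M_weak_count = 1` values printed above relies on my understanding of libstdc++ internals and should be read as an assumption.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the managed object (not doris::MetricEntity).
struct Entity {
    int id = 0;
};

// What the public weak_ptr API reports about the control block.
struct ControlBlockView {
    long use_count;     // number of owning shared_ptr instances
    bool expired;       // true once the managed object has been destroyed
    bool lock_is_null;  // true if lock() can no longer produce an owner
};

ControlBlockView observe_after_last_owner_released() {
    auto owner = std::make_shared<Entity>();
    std::weak_ptr<Entity> observer = owner;
    assert(observer.use_count() == 1);

    // Drop the last owner: the Entity is destroyed, but the control block
    // stays allocated because observer still refers to it.
    owner.reset();

    return {observer.use_count(), observer.expired(),
            observer.lock() == nullptr};
}
```

In this state a correct program can only observe that the object is gone. Releasing one more owning reference against such a control block, as frame #0 appears to be doing, would indicate that some `shared_ptr` was released more than once or was modified concurrently.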
--
This is an automated message from the Apache Git Service.