[
https://issues.apache.org/jira/browse/KUDU-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925013#comment-17925013
]
ASF subversion and git services commented on KUDU-3633:
-------------------------------------------------------
Commit 24d93d4ad7eb0471284f711efb133066e1736a8c in kudu's branch
refs/heads/branch-1.18.x from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=24d93d4ad ]
KUDU-3633 shutdown DnsResolver in ServerBase::ShutdownImpl()
The thread pool of the DNS resolver should be shut down along with the
messenger in ServerBase to prevent retrying RPCs that failed as a
side effect of the shutdown in progress. Those RPCs might otherwise be
retried by invoking rpc::Proxy::RefreshDnsAndEnqueueRequest(), etc.
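To illustrate why shutting down the resolver's pool matters, here is a
minimal single-threaded sketch. SketchPool and CountAcceptedRetries are
hypothetical names made up for this illustration and are not Kudu APIs;
the sketch also assumes the resolver pool is already shut down by the
time the failure callbacks run. It only shows that a pool which rejects
submissions after shutdown leaves nowhere for a retry to be scheduled.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the resolver's thread pool; not Kudu's ThreadPool.
// Single-threaded on purpose, so no locking is shown here.
class SketchPool {
 public:
  // Accepts a task unless the pool has already been shut down.
  bool Submit(std::function<void()> task) {
    if (shutdown_) return false;
    queued_.push_back(std::move(task));
    return true;
  }

  // After this call every Submit() is rejected and queued work is dropped.
  void Shutdown() {
    shutdown_ = true;
    queued_.clear();
  }

 private:
  bool shutdown_ = false;
  std::vector<std::function<void()>> queued_;
};

// Simulates a messenger shutdown that fails `num_calls` outstanding RPCs.
// Each failure callback tries to schedule a retry on the resolver pool.
// Returns how many retries the pool accepted.
int CountAcceptedRetries(bool resolver_shut_down, int num_calls) {
  assert(num_calls >= 0);
  SketchPool resolver_pool;
  if (resolver_shut_down) resolver_pool.Shutdown();

  int accepted = 0;
  for (int i = 0; i < num_calls; ++i) {
    // Failure callback of an RPC aborted by the shutdown in progress.
    if (resolver_pool.Submit([] { /* would re-resolve and re-send */ })) {
      ++accepted;
    }
  }
  return accepted;
}
```

With the pool left running, every aborted call in this sketch gets a
retry queued; with the pool shut down, none does.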
On a related note, I also added a guard to protect ThreadPool::tokens_
in the destructor of the ThreadPool class, as is done elsewhere. I also
snuck in an update to call DCHECK() in a loop only when the
DCHECK_IS_ON() macro evaluates to 'true'.
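A rough sketch of those two ideas, under stated assumptions:
TokenRegistry and SKETCH_DCHECK_IS_ON() are hypothetical names invented
here, std::mutex stands in for whatever lock the real class uses, and
assert() stands in for DCHECK(). This is not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_set>

// Hypothetical analogue of DCHECK_IS_ON(): debug checks follow NDEBUG.
#ifdef NDEBUG
#define SKETCH_DCHECK_IS_ON() 0
#else
#define SKETCH_DCHECK_IS_ON() 1
#endif

// Hypothetical registry of live tokens; not Kudu's ThreadPool.
class TokenRegistry {
 public:
  ~TokenRegistry() {
    // Take the lock even in the destructor: another thread may still be
    // releasing a token while this object is being torn down.
    std::lock_guard<std::mutex> guard(lock_);
    assert(tokens_.empty());
  }

  void Register(const void* token) {
    std::lock_guard<std::mutex> guard(lock_);
    tokens_.insert(token);
  }

  void Release(const void* token) {
    std::lock_guard<std::mutex> guard(lock_);
    tokens_.erase(token);
  }

  std::size_t live_tokens() const {
    std::lock_guard<std::mutex> guard(lock_);
    return tokens_.size();
  }

  // The loop exists only when debug checks are compiled in, so release
  // builds do not pay for iterating over the set.
  void DebugCheckTokens() const {
#if SKETCH_DCHECK_IS_ON()
    std::lock_guard<std::mutex> guard(lock_);
    for (const void* token : tokens_) {
      assert(token != nullptr);
    }
#endif
  }

 private:
  mutable std::mutex lock_;
  std::unordered_set<const void*> tokens_;
};
```

The guard makes the read of the set in the destructor synchronize with
concurrent releases; it does not by itself guarantee that all tokens are
gone by then, which is what the shutdown ordering is for.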
This addresses flakiness reported in at least one of the RemoteKsckTest
scenarios (e.g., TestFilterOnNotabletTable in [1]). One of the related
TSAN reports looked like this:
RemoteKsckTest.TestFilterOnNotabletTable: WARNING: ThreadSanitizer: data race
Read of size 8 at 0x7b54001e5118 by main thread:
#0 std::__1::__hash_table<kudu::ThreadPoolToken*, ...>::size() const
#1 std::__1::unordered_set<kudu::ThreadPoolToken*, ...>::size() const
#2 kudu::ThreadPool::~ThreadPool()
...
#6 kudu::kserver::KuduServer::~KuduServer()
#7 kudu::tserver::TabletServer::~TabletServer()
...
Previous write of size 8 at 0x7b54001e5118 by thread T262 ...:
#0 std::__1::__hash_table<kudu::ThreadPoolToken*, ...>::remove(...)
...
#4 kudu::ThreadPool::ReleaseToken(...)
#5 kudu::ThreadPoolToken::~ThreadPoolToken()
...
#24 kudu::consensus::LeaderElection::~LeaderElection()
...
#35 kudu::rpc::Proxy::RefreshDnsAndEnqueueRequest(...)
...
#41 kudu::DnsResolver::RefreshAddressesAsync()
...
Thread T262 'dns-resolver [w' (tid=29102, running) created by thread T182 at:
#0 pthread_create
#1 kudu::Thread::StartThread(...)
#2 kudu::Thread::Create(...)
#3 kudu::ThreadPool::CreateThread()
#4 kudu::ThreadPool::DoSubmit(..., kudu::ThreadPoolToken*)
#5 kudu::ThreadPool::Submit(...)
#6 kudu::DnsResolver::RefreshAddressesAsync(...)
#7 kudu::rpc::Proxy::RefreshDnsAndEnqueueRequest(...)
#8 kudu::rpc::Proxy::AsyncRequest(...)
...
#15 kudu::rpc::OutboundCall::CallCallback()
#16 kudu::rpc::OutboundCall::SetFailed()
#17 kudu::rpc::Connection::Shutdown()
#18 kudu::rpc::ReactorThread::ShutdownInternal()
...
#25 kudu::rpc::ReactorThread::RunThread()
...
[1] http://dist-test.cloudera.org:8080/test_drilldown?test_name=ksck_remote-test
Change-Id: I525f1078a349dbd2926938bb4fcc3e80888dfbb4
Reviewed-on: http://gerrit.cloudera.org:8080/22434
Tested-by: Alexey Serbin <[email protected]>
Reviewed-by: Abhishek Chennaka <[email protected]>
(cherry picked from commit fc40fcda30a93baabf50299a68af6023a44b369d)
Reviewed-on: http://gerrit.cloudera.org:8080/22449
> Threadpool check flakiness in ksck_remote-test during MiniMaster shutdown
> -------------------------------------------------------------------------
>
> Key: KUDU-3633
> URL: https://issues.apache.org/jira/browse/KUDU-3633
> Project: Kudu
> Issue Type: Sub-task
> Reporter: Bakai Ádám
> Assignee: Bakai Ádám
> Priority: Major
>
> {code:java}
> F20241204 12:57:40.147302 16123 threadpool.cc:391] Check failed: 1 ==
> tokens_.size() (1 vs. 3) Threadpool raft destroyed with 3 allocated tokens
> *** Check failure stack trace: ***{code}
> {code:java}
> @ 0x7f6b96b2cd64 google::LogMessage::SendToLog() at ??:0
> @ 0x7f6b96b2d910 google::LogMessage::Flush() at ??:0
> @ 0x7f6b96b32a4b google::LogMessageFatal::~LogMessageFatal() at ??:0
> @ 0x7f6b974a777d kudu::ThreadPool::~ThreadPool() at ??:0
> I20241204 12:57:40.556027 23288 raft_consensus.cc:1270] T
> df574f38d0a746d1929d9494d82da991 P c273df5d41694d4da3bc1b5bc5e81b84 [term 2
> FOLLOWER]: Refusing update from remote peer 2e54eeefd5f947279415fb606d3fe035:
> Log matching property violated. Preceding OpId in replica: term: 1 index: 1.
> Preceding OpId from leader: term: 2 index: 2. (index mismatch)
> I20241204 12:57:40.558073 23666 consensus_queue.cc:1035] T
> df574f38d0a746d1929d9494d82da991 P 2e54eeefd5f947279415fb606d3fe035 [LEADER]:
> Connected to new peer: Peer: permanent_uuid:
> "c273df5d41694d4da3bc1b5bc5e81b84" member_type: VOTER last_known_addr { host:
> "127.15.190.193" port: 33967 }, Status: LMP_MISMATCH, Last received: 0.0,
> Next index: 2, Last known committed idx: 1, Time since last communication:
> 0.000s
> @ 0x7f6b9ff4f6bf std::__1::default_delete<>::operator()() at ??:0
> I20241204 12:57:40.605798 23460 raft_consensus.cc:1270] T
> df574f38d0a746d1929d9494d82da991 P 87f06d0d674a4791871f81a7af62b7be [term 2
> FOLLOWER]: Refusing update from remote peer 2e54eeefd5f947279415fb606d3fe035:
> Log matching property violated. Preceding OpId in replica: term: 1 index: 1.
> Preceding OpId from leader: term: 2 index: 2. (index mismatch)
> I20241204 12:57:40.611544 23707 consensus_queue.cc:1035] T
> df574f38d0a746d1929d9494d82da991 P 2e54eeefd5f947279415fb606d3fe035 [LEADER]:
> Connected to new peer: Peer: permanent_uuid:
> "87f06d0d674a4791871f81a7af62b7be" member_type: VOTER last_known_addr { host:
> "127.15.190.195" port: 35365 }, Status: LMP_MISMATCH, Last received: 0.0,
> Next index: 2, Last known committed idx: 1, Time since last communication:
> 0.000s
> @ 0x7f6b9ff4f62e std::__1::unique_ptr<>::reset() at ??:0
> @ 0x7f6b9ff0e2cc std::__1::unique_ptr<>::~unique_ptr() at ??:0
> @ 0x7f6b9ffb65b4 kudu::kserver::KuduServer::~KuduServer() at ??:0
> @ 0x7f6b9ffac863 kudu::master::Master::~Master() at ??:0
> @ 0x7f6b9ffacb5a kudu::master::Master::~Master() at ??:0
> @ 0x7f6b9ffea408 std::__1::default_delete<>::operator()() at ??:0
> @ 0x7f6b9ffe34ce std::__1::unique_ptr<>::reset() at ??:0
> @ 0x7f6ba00773c3 kudu::master::MiniMaster::Shutdown() at ??:0
> @ 0x354ea9
> kudu::tools::RemoteKsckTest_TestClusterWithLocation_Test::TestBody() at
> /root/tmp/test123/kudu/src/kudu/tools/ksck_remote-test.cc:607
> @ 0x7f6ba045adc0
> testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0
> @ 0x7f6ba04389c2 testing::Test::Run() at ??:0
> @ 0x7f6ba0439cd9 testing::TestInfo::Run() at ??:0
> @ 0x7f6ba043acb5 testing::TestSuite::Run() at ??:0
> @ 0x7f6ba044f7a5 testing::internal::UnitTestImpl::RunAllTests() at
> ??:0
> @ 0x7f6ba045bc80
> testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0
> @ 0x7f6ba044ed5d testing::UnitTest::Run() at ??:0
> @ 0x3801bc RUN_ALL_TESTS() at
> /root/tmp/test123/kudu/thirdparty/installed/tsan/include/gtest/gtest.h:?
> @ 0x37f0bd main at
> /root/tmp/test123/kudu/src/kudu/util/test_main.cc:?
> @ 0x7f6b93f58bf7 __libc_start_main at ??:0
> @ 0x298ada _start at ??:? {code}