[ https://issues.apache.org/jira/browse/KUDU-3651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935620#comment-17935620 ]
ASF subversion and git services commented on KUDU-3651:
-------------------------------------------------------
Commit e9b3dae78edd84b4630e77601f285476ecae2f38 in kudu's branch
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=e9b3dae78 ]
KUDU-3651 fix race condition in TabletReplica::Stop()

This patch addresses a race condition in TabletReplica::Stop(). Before
this patch, a tablet replica might accept new operations right after
the call to OpTracker::WaitForAllToFinish() and before the shutdown of
the replica's prepare pool token completed.

The race has manifested itself at least as flakiness in various
test scenarios in txn_participant-test [1]. In one particular instance,
the following TSAN warnings were issued while running the
TxnParticipantTest.TestBeginCommitAnchorsOnFlush scenario:
  WARNING: ThreadSanitizer: data race (pid=4116)
    Write of size 8 at 0x7b4400027688 by main thread:
      #0 std::__1::__vector_base<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::__destruct_at_end(kudu::MemTracker**)
      ...
      #3 std::__1::vector<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::~vector()
      #4 kudu::MemTracker::~MemTracker() mem_tracker.cc:83:1
      ...
      #9 kudu::tablet::OpTracker::~OpTracker()
      #10 kudu::tablet::TabletReplica::~TabletReplica()
      ...
      #16 scoped_refptr<kudu::tablet::TabletReplica>::reset(kudu::tablet::TabletReplica*)
      #17 kudu::tablet::TabletReplicaTestBase::RestartReplica(bool)

    Previous read of size 8 at 0x7b4400027688 by thread T20 (mutexes: write M1047222376632167904):
      #0 std::__1::vector<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::end()
      #1 kudu::MemTracker::Release(long)
      #2 kudu::tablet::OpTracker::Release(kudu::tablet::OpDriver*)
      #3 kudu::tablet::OpDriver::Finalize()
      #4 kudu::tablet::OpDriver::ApplyTask()
      #5 kudu::tablet::OpDriver::ApplyAsync()::$_2::operator()()
      ...
[1] http://dist-test.cloudera.org:8080/test_drilldown?test_name=txn_participant-test
Change-Id: I993015bf73ad8fe84a864b8b3c030e1be00e26e0
Reviewed-on: http://gerrit.cloudera.org:8080/22612
Reviewed-by: Abhishek Chennaka <[email protected]>
Reviewed-by: Marton Greber <[email protected]>
Tested-by: Marton Greber <[email protected]>
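
For readers unfamiliar with this class of shutdown race, the window described in the commit message (operations accepted after the wait for in-flight operations but before the submission path is shut down) can be illustrated with a minimal sketch. Everything in it is hypothetical: ToyOpTracker, TryAdd(), Release(), CloseAndWait(), and RunShutdownDemo() are illustrative names, not Kudu classes or methods. The sketch shows one generic way to close such a window, namely rejecting new operations under the same lock the drain waits on. It is not necessarily what the patch does.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical toy tracker for illustration only; this is not Kudu's OpTracker.
class ToyOpTracker {
 public:
  // Registers a new op. Returns false once the tracker has been closed.
  bool TryAdd() {
    std::lock_guard<std::mutex> l(mu_);
    if (closed_) return false;
    ++in_flight_;
    return true;
  }

  // Marks a previously added op as finished.
  void Release() {
    std::lock_guard<std::mutex> l(mu_);
    if (--in_flight_ == 0) cv_.notify_all();
  }

  // Closes the tracker to new ops first, then waits for in-flight ops to
  // drain. Both steps happen under one mutex, so no op can be added between
  // "the wait is over" and "new ops are rejected".
  void CloseAndWait() {
    std::unique_lock<std::mutex> l(mu_);
    closed_ = true;
    cv_.wait(l, [this] { return in_flight_ == 0; });
  }

  int in_flight() const {
    std::lock_guard<std::mutex> l(mu_);
    return in_flight_;
  }

 private:
  mutable std::mutex mu_;
  std::condition_variable cv_;
  bool closed_ = false;
  int in_flight_ = 0;
};

// Runs several submitter threads against a tracker that is being shut down.
// Returns the number of ops still tracked after shutdown, or -1 if an op was
// accepted after the tracker was closed.
int RunShutdownDemo() {
  ToyOpTracker tracker;
  std::vector<std::thread> submitters;
  for (int i = 0; i < 4; ++i) {
    submitters.emplace_back([&tracker] {
      // Keep submitting until the tracker rejects new ops.
      while (tracker.TryAdd()) {
        tracker.Release();
      }
    });
  }

  tracker.CloseAndWait();

  // From here on, nothing is in flight and nothing new can be added, so it
  // would be safe to destroy state that the ops touch when they finish.
  const bool accepted_after_close = tracker.TryAdd();
  const int remaining = tracker.in_flight();

  for (auto& t : submitters) t.join();
  return accepted_after_close ? -1 : remaining;
}
```

Calling RunShutdownDemo() from a test or from main() is expected to return 0. If the closed_ check were done separately from the wait (wait first, close later), a submitter could slip in between the two steps, which is the shape of the window the commit message describes.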
> Race condition in TabletReplica::Stop()
> ---------------------------------------
>
> Key: KUDU-3651
> URL: https://issues.apache.org/jira/browse/KUDU-3651
> Project: Kudu
> Issue Type: Bug
> Components: master, tablet, tserver
> Affects Versions: 0.7.0, 0.7.1, 0.8.0, 0.9.0, 0.9.1, 0.10.0, 1.0.0, 1.0.1,
> 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0,
> 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0, 1.15.0, 1.16.0,
> 1.17.0, 1.17.1
> Reporter: Alexey Serbin
> Priority: Major
> Attachments: txn_participant-test.txt.xz
>
>
> There is a race condition in {{TabletReplica::Stop()}}: new operations might
> be accepted by a tablet replica right after calling
> {{OpTracker::WaitForAllToFinish()}} and before the replica's prepare pool
> token is shut down.
> The race manifests itself at least as a TSAN warning in various scenarios of
> {{txn_participant-test}}. See the attached log for a report on one
> particular instance of the race. One excerpt is below:
> {noformat}
> WARNING: ThreadSanitizer: data race (pid=4116)
>   Write of size 8 at 0x7b4400027688 by main thread:
>     #0 std::__1::__vector_base<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::__destruct_at_end(kudu::MemTracker**)
>     ...
>     #3 std::__1::vector<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::~vector()
>     #4 kudu::MemTracker::~MemTracker() mem_tracker.cc:83:1
>     ...
>     #9 kudu::tablet::OpTracker::~OpTracker()
>     #10 kudu::tablet::TabletReplica::~TabletReplica()
>     ...
>     #16 scoped_refptr<kudu::tablet::TabletReplica>::reset(kudu::tablet::TabletReplica*)
>     #17 kudu::tablet::TabletReplicaTestBase::RestartReplica(bool)
>
>   Previous read of size 8 at 0x7b4400027688 by thread T20 (mutexes: write M1047222376632167904):
>     #0 std::__1::vector<kudu::MemTracker*, std::__1::allocator<kudu::MemTracker*> >::end()
>     #1 kudu::MemTracker::Release(long)
>     #2 kudu::tablet::OpTracker::Release(kudu::tablet::OpDriver*)
>     #3 kudu::tablet::OpDriver::Finalize()
>     #4 kudu::tablet::OpDriver::ApplyTask()
>     #5 kudu::tablet::OpDriver::ApplyAsync()::$_2::operator()()
>     ...
> {noformat}