Hello Tidy Bot, Kudu Jenkins, helifu, Adar Dembo, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14061 to look at the new patch set (#4).

Change subject: [tablet] Fixed the bug of DeltaTracker::CountDeletedRows
......................................................................

[tablet] Fixed the bug of DeltaTracker::CountDeletedRows

When Tablet::CountLiveRows() is called concurrently from multiple threads,
it can fail with the following check failure:

User stack:
F0814 12:05:51.975797 96375 diskrowset.cc:759] Check failed: *count >= 0 (-3 vs. 0)
*** Check failure stack trace: ***
*** Aborted at 1565755551 (unix time) try "date -d @1565755551" if you are using GNU date ***
PC: @     0x7f9bd20425f7 __GI_raise
*** SIGABRT (@0x70900017872) received by PID 96370 (TID 0x7f9bce2d7700) from PID 96370; stack trace: ***
    @     0x7f9bdaff6100 (unknown)
    @     0x7f9bd20425f7 __GI_raise
    @     0x7f9bd2043ce8 __GI_abort
    @     0x7f9bd4540c99 google::logging_fail()
    @     0x7f9bd454246d google::LogMessage::Fail()
    @     0x7f9bd45443c3 google::LogMessage::SendToLog()
    @     0x7f9bd4541fc9 google::LogMessage::Flush()
    @     0x7f9bd4544d4f google::LogMessageFatal::~LogMessageFatal()
    @     0x7f9bddc9aabe kudu::tablet::DiskRowSet::CountLiveRows()
    @     0x7f9bddbdeb79 kudu::tablet::Tablet::CountLiveRows()
    @           0x49891f kudu::tablet::MultiThreadedTabletTest<>::CollectStatisticsThread()
    @           0x4ae34b boost::_mfi::mf1<>::operator()()
    @           0x4add25 boost::_bi::list2<>::operator()<>()
    @           0x4acfe9 boost::_bi::bind_t<>::operator()()
    @           0x4ac8a6 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @     0x7f9bd7116492 boost::function0<>::operator()()
    @     0x7f9bd62e5324 kudu::Thread::SuperviseThread()
    @     0x7f9bdafeedc5 start_thread
    @     0x7f9bd2103ced __clone

This happens because DeltaTracker does not hold a lock across two related
updates: subtracting deleted_row_count_ from the live row count stored in
rowset_metadata_, and resetting deleted_row_count_ to 0. A concurrent reader
that runs between these two steps sees a metadata count that already excludes
the deleted rows and then subtracts deleted_row_count_ again, so the deleted
rows are counted twice when computing the number of live rows of the
DiskRowSet (DRS).
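To make the race concrete, here is a minimal C++17 sketch. RowSetMetadataSketch, DeltaTrackerSketch, RecordDelete, FlushBuggyPhase1/FlushBuggyPhase2 and FlushFixed are hypothetical, heavily simplified stand-ins, not Kudu's real classes and not the code in this patch. The buggy flush is split into two calls so that a reader (the T2 role) can be interleaved deterministically between them; the fixed variant shows one possible way to close the window, by updating the metadata and resetting the counter under a single exclusive hold of component_lock_.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for RowSetMetadata: only the live row count.
struct RowSetMetadataSketch {
  int64_t live_row_count = 0;
  void IncrementLiveRows(int64_t delta) { live_row_count += delta; }
};

// Hypothetical stand-in for DeltaTracker; not Kudu's actual class.
class DeltaTrackerSketch {
 public:
  explicit DeltaTrackerSketch(RowSetMetadataSketch* rsmd) : rsmd_(rsmd) {
    assert(rsmd_ != nullptr);
  }

  // Write path: one more row deleted in the not-yet-flushed deltas.
  void RecordDelete() {
    std::lock_guard<std::shared_mutex> l(component_lock_);
    ++deleted_row_count_;
  }

  // Reader path (the DRS::CountLiveRows role): metadata count minus the
  // deletes that have not been folded into the metadata yet.
  int64_t CountLiveRows() const {
    std::shared_lock<std::shared_mutex> l(component_lock_);
    return rsmd_->live_row_count - deleted_row_count_;
  }

  // BUGGY flush, phase 1: fold the deletes into the metadata, but release
  // component_lock_ before resetting deleted_row_count_.
  void FlushBuggyPhase1() {
    int64_t snapshot;
    {
      std::lock_guard<std::shared_mutex> l(component_lock_);
      snapshot = deleted_row_count_;
    }
    // Deliberately unsafe: the metadata is updated outside component_lock_
    // while deleted_row_count_ still holds the same deletes.
    rsmd_->IncrementLiveRows(-snapshot);
  }

  // BUGGY flush, phase 2: reset the counter in a separate critical section.
  void FlushBuggyPhase2() {
    std::lock_guard<std::shared_mutex> l(component_lock_);
    deleted_row_count_ = 0;
  }

  // FIXED flush: both updates happen under one exclusive lock, so a reader
  // never observes "already subtracted but not yet reset".
  void FlushFixed() {
    std::lock_guard<std::shared_mutex> l(component_lock_);
    rsmd_->IncrementLiveRows(-deleted_row_count_);
    deleted_row_count_ = 0;
  }

 private:
  RowSetMetadataSketch* rsmd_;
  mutable std::shared_mutex component_lock_;
  int64_t deleted_row_count_ = 0;
};
```

With 10 rows in the metadata and 3 buffered deletes, a CountLiveRows() call between FlushBuggyPhase1() and FlushBuggyPhase2() returns 4 instead of 7, because the 3 deletes are subtracted twice. With 3 rows that were all deleted, the same interleaving would produce a negative count such as the -3 in the check failure above.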
Consider the following sequence (DT = DeltaTracker, DMS = DeltaMemStore,
DRS = DiskRowSet, RSMD = RowSetMetadata):

| T1                                               | T2
|--------------------------------------------------|------------------------------
|+ In DT::Flush                                    |
|  Take compact_flush_lock_ (excl)                 |
|  Take component_lock_ (excl)                     |
|  deleted_row_count_ = ...                        |
|  Release component_lock_                         |
|  + In DT::FlushDMS                               |
|    Call RSMD::IncrementLiveRows                  |
|    --> RSMD::live_row_count - deleted_row_count_ |
|                                                  |+ In DRS::CountLiveRows
|                                                  |  Take component_lock_ (shared)
|                                                  |  Call RSMD::live_row_count - DT::CountDeletedRows
|                                                  |  --> RSMD::live_row_count - deleted_row_count_
|                                                  |  --> we double counted deleted_row_count_ !!!
|  Take component_lock_ (excl)                     |
|  deleted_row_count_ = 0                          |
|  Release component_lock_                         |
|  Release compact_flush_lock_                     |

Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/metadata-test.cc
M src/kudu/tablet/mt-tablet-test.cc
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
10 files changed, 145 insertions(+), 76 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/14061/4
--
To view, visit http://gerrit.cloudera.org:8080/14061

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9bb4456123087778c9dc799777c5990938a84fdf
Gerrit-Change-Number: 14061
Gerrit-PatchSet: 4
Gerrit-Owner: Yao Xu <ocla...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yao Xu <ocla...@gmail.com>
Gerrit-Reviewer: helifu <hzhel...@corp.netease.com>