Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )
Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only ...................................................................... Patch Set 3: (13 comments) Mostly nits, but there are a few places that seem a little suspicious w.r.t locking. http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/master/txn_manager-test.cc File src/kudu/master/txn_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/master/txn_manager-test.cc@137 PS3, Line 137: // Find the replica. : tablet_replica_ = LookupTabletReplica(); nit: spacing Also why is this necessary? Doesn't the master hold a reference to this replica? Is this meant to protect against a race? http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@285 PS3, Line 285: { Do we need to take the state change lock here? Is it OK if we proceed while racing with shutdown? http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@286 PS3, Line 286: std::unique_lock<simple_spinlock> lock(lock_); nit: since we're not using 'lock', we can just use std::lock_guard<> here http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@312 PS3, Line 312: // Submit to pool : if (shutdown_latch_->count() == 0) { : return; : } Do we still need this if we take the state change lock and hold it for the duration of this function? http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@877 PS3, Line 877: RegisterTxnCoordinator nit: this doesn't seem to be "registering" anything per se. Maybe call this MarkTxnCoordinatorRegistered() instead? http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@887 PS3, Line 887: tablet_ nit: tablet_id()? http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@887 PS3, Line 887: "Not registering transaction coordinator for " << tablet_ : << ": tablet not in RUNNING state"; nit: use Substitute() for better readability. Also since this isn't _not_ registering, so maybe instead say "Registering transaction coordinator while tablet is not RUNNING" or somesuch. http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.h@110 PS3, Line 110: specified nit: I might've missed this; where is this specified? http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.cc@204 PS3, Line 204: nit: drop space http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.cc@356 PS3, Line 356: auto cleanup = MakeScopedCleanup([&]() { : status_tablet_.tablet_replica_->UpdateTxnCoordinatorReloadTask(false); : }); nit: if we don't intend on cancelling this, we don't need to define a variable for it, and we can just use the SCOPED_CLEANUP macro instead http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.cc@434 PS3, Line 434: if (!check([this]() { return this->LoadFromTabletUnlocked(); }, : *consensus, term, kLoadMetaOpDescription).ok()) { : return; : } nit: Why bother putting this in a lambda? IMO it makes it more difficult to read, since readers have to reason about a couple layers of function calls here. http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tserver/ts_tablet_manager.cc@1426 PS3, Line 1426: if (r->txn_coordinator_registered()) { It seems like there's still room for a TOCTOU race here if right after this check, we deleted the tablet. http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tserver/ts_tablet_manager.cc@1447 PS3, Line 1447: auto cleanup = MakeScopedCleanup([&]() { : r->UpdateTxnCoordinatorStalenessTask(false); : }); nit: SCOPED_CLEANUP here too -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 05 Jan 2021 23:42:19 +0000 Gerrit-HasComments: Yes