Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )
Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only ...................................................................... Patch Set 3: (17 comments) http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@154 PS1, Line 154: *meta_->table_type() == TableTypePB::TXN_STATUS_TABLE ? > Do we also need to call cb() here? Yeah, good point, I don't see reasons not to, added it. http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@147 PS1, Line 147: cmeta_manager_(DCHECK_NOTNULL(std::move(cmeta_manager))), : local_peer_pb_(std::move(local_peer_pb)), : log_anchor_registry_(new LogAnchorRegistry()), : apply_pool_(apply_pool), : reload_txn_status_tablet_pool_(reload_txn_status_tablet_pool), : shutdown_latch_(shutdown_latch), : txn_coordinator_(meta_->table_type() && : *meta_->table_type() == TableTypePB::TXN_STATUS_TABLE ? : DCHECK_NOTNULL(txn_ > FWIW I think this would be more plumbing that it's worth. Agree with Andrew on this. Alexey, let me know if you think otherwise http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@302 PS1, Line 302: PREFIX(WA > nit: why does this need to be a lambda? Just want to be clear this is for leadership change cb, but I guess it is a bit redundant. Removed. http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@302 PS1, Line 302: LOG_WITH_PREFIX(WARNIN > Just curious: is this going to be called if leadership status hasn't change Right, this is going to be called for any leadership status change, even though the former leader can wins the latest election again. As we discussed offline, it is not clear what the performance implication will be, so adding a TODO in case we think optimization of not reloading the metadata for this case. http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.h File src/kudu/transactions/txn_status_manager.h: http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.h@95 PS1, Line 95: explicit ScopedLeaderSharedLock(TxnStatusManager* txn_status); > warning: invalid case style for parameter 'txn_status_' [readability-identi Done http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@62 PS1, Line 62: DEFINE_int32(txn_status_manager_inject_latency_load_from_tablet_ms, 0, > warning: using decl 'CoordinateTransactionResponsePB' is unused [misc-unuse Done http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@83 PS1, Line 83: > nit: construction Done http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@189 PS1, Line 189: > nit: remove extra slashes? Done http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@300 PS1, Line 300: if (consensus->role() != RaftPeerPB::LEADER) { > warning: missing username/bug in TODO [google-readability-todo] Done http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@324 PS1, Line 324: RETURN_NOT_OK(status_tablet_.tablet_replica_->op_tracker()->WaitForAllToFinish(timeout)); > warning: the parameter 'func' is copied for each invocation but only used a Done http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@347 PS1, Line 347: return Status::NotAuthoriz > nit: you can use __builtin_unreachable() Done http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@357 PS1, Line 357: table > nit: This gets used once -- why does it need to be a lambda? Hmm, since 'check' expects 'std::function<Status()>', what do you suggest instead of using lambda? http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc File src/kudu/transactions/txn_status_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc@202 PS2, Line 202: return; > What happens if an instance of this lock is being constructed and the table Right, updated the code to ensure the lock is only hold for tablet replica that is fully started as Andrew suggested. http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@386 PS1, Line 386: disks > nit: data directories ? Done http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@389 PS1, Line 389: RETURN_NOT_OK(ThreadPoolBuilder("tablet-delete") : .set_max_threads(max_delete_threads) : .Build(&delete_tablet_pool_)); > Is it necessary to gracefully shutdown this pool along with other tablet po Yeah, added the shutdown along with the other pools. http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@1238 PS1, Line 1238: std::lock_guard<RWMutex> loc > We should to shut down our new pool here. Done http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc@1420 PS2, Line 1420: vector<scoped_refptr<TabletReplica>> replicas; : { : shared_lock<RWMutex> l(lock_); : for (const auto& elem : tablet_map_) { : auto r = elem.second; : // Find txn status tablet replicas. : if (r->txn_coordinator_registered()) { : replicas.emplace_back(std::move(r)); : } : } : } : for (auto& r : replicas) { : if (shutdown_latch_.count() == 0) { : return; : } > It may be worth considering updating this to look more like how we schedule Makes sense, updated the code accordingly. Now txn_status_tablet-itest seems to be longer flaky, http://dist-test.cloudera.org/job?job_id=hao.hao.1609829456.46540, however, txn_status_manager-itest is still flaky, investigating it. -- To view, visit http://gerrit.cloudera.org:8080/16648 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a Gerrit-Change-Number: 16648 Gerrit-PatchSet: 3 Gerrit-Owner: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 05 Jan 2021 07:01:47 +0000 Gerrit-HasComments: Yes