Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by 
the leader only
......................................................................


Patch Set 3:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@154
PS1, Line 154:                        *meta_->table_type() == 
TableTypePB::TXN_STATUS_TABLE ?
> Do we also need to call cb() here?
Yeah, good point, I don't see reasons not to,  added it.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@147
PS1, Line 147:       cmeta_manager_(DCHECK_NOTNULL(std::move(cmeta_manager))),
             :       local_peer_pb_(std::move(local_peer_pb)),
             :       log_anchor_registry_(new LogAnchorRegistry()),
             :       apply_pool_(apply_pool),
             :       
reload_txn_status_tablet_pool_(reload_txn_status_tablet_pool),
             :       shutdown_latch_(shutdown_latch),
             :       txn_coordinator_(meta_->table_type() &&
             :                        *meta_->table_type() == 
TableTypePB::TXN_STATUS_TABLE ?
             :                        DCHECK_NOTNULL(txn_
> FWIW I think this would be more plumbing that it's worth.
Agree with Andrew on this. Alexey, let me know if you think otherwise


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@302
PS1, Line 302: PREFIX(WA
> nit: why does this need to be a lambda?
Just want to be clear this is for leadership change cb, but I guess it is a bit 
redundant. Removed.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@302
PS1, Line 302:     LOG_WITH_PREFIX(WARNIN
> Just curious: is this going to be called if leadership status hasn't change
Right, this is going to be called for any leadership status change, even though 
the former leader can wins the latest election again. As we discussed offline, 
it is not clear what the performance implication will be, so adding a TODO in 
case we think optimization of not reloading the metadata for this case.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.h@95
PS1, Line 95:     explicit ScopedLeaderSharedLock(TxnStatusManager* txn_status);
> warning: invalid case style for parameter 'txn_status_' [readability-identi
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@62
PS1, Line 62: 
DEFINE_int32(txn_status_manager_inject_latency_load_from_tablet_ms, 0,
> warning: using decl 'CoordinateTransactionResponsePB' is unused [misc-unuse
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@83
PS1, Line 83:
> nit: construction
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@189
PS1, Line 189:
> nit: remove extra slashes?
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@300
PS1, Line 300:   if (consensus->role() != RaftPeerPB::LEADER) {
> warning: missing username/bug in TODO [google-readability-todo]
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@324
PS1, Line 324:   
RETURN_NOT_OK(status_tablet_.tablet_replica_->op_tracker()->WaitForAllToFinish(timeout));
> warning: the parameter 'func' is copied for each invocation but only used a
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@347
PS1, Line 347:     return Status::NotAuthoriz
> nit: you can use __builtin_unreachable()
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@357
PS1, Line 357: table
> nit: This gets used once -- why does it need to be a lambda?
Hmm, since 'check' expects 'std::function<Status()>', what do you suggest 
instead of using lambda?


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc@202
PS2, Line 202:     return;
> What happens if an instance of this lock is being constructed and the table
Right, updated the code to ensure the lock is only hold for tablet replica that 
is fully started as Andrew suggested.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@386
PS1, Line 386: disks
> nit: data directories ?
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@389
PS1, Line 389:   RETURN_NOT_OK(ThreadPoolBuilder("tablet-delete")
             :                 .set_max_threads(max_delete_threads)
             :                 .Build(&delete_tablet_pool_));
> Is it necessary to gracefully shutdown this pool along with other tablet po
Yeah, added the shutdown along with the other pools.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@1238
PS1, Line 1238:     std::lock_guard<RWMutex> loc
> We should to shut down our new pool here.
Done


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc@1420
PS2, Line 1420: vector<scoped_refptr<TabletReplica>> replicas;
              :     {
              :       shared_lock<RWMutex> l(lock_);
              :       for (const auto& elem : tablet_map_) {
              :         auto r = elem.second;
              :         // Find txn status tablet replicas.
              :         if (r->txn_coordinator_registered()) {
              :           replicas.emplace_back(std::move(r));
              :         }
              :       }
              :     }
              :     for (auto& r : replicas) {
              :       if (shutdown_latch_.count() == 0) {
              :         return;
              :       }
> It may be worth considering updating this to look more like how we schedule
Makes sense, updated the code accordingly. Now txn_status_tablet-itest seems to 
be longer flaky, 
http://dist-test.cloudera.org/job?job_id=hao.hao.1609829456.46540, however, 
txn_status_manager-itest is still flaky, investigating it.



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 05 Jan 2021 07:01:47 +0000
Gerrit-HasComments: Yes

Reply via email to