Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the 
leader only
......................................................................


Patch Set 6:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@744
PS6, Line 744: ;
> nit: drop extra semicolon
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@751
PS6, Line 751: threads.emplace_back([&] {
> What does the multithreading add here? Seems like it complicates things sin
The point is to ensure concurrent txn reads/writes have no failures with 
frequent leader elections. Add the comment to be clear.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@758
PS6, Line 758:         if (s.ok()) {
> What failures are we expecting?
Add a comment


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@748
PS6, Line 748:   std::atomic<int64_t> next_thread_id = 0;
             :   vector<int> txn_id_index = { 1, 6, 11, 16, 21};
             :   for (int t = 0; t < 5; t++) {
             :     threads.emplace_back([&] {
             :       int thread_id = next_thread_id++;
             :       CHECK_LT(thread_id, txn_id_index.size());
             :       int start_txn_id = txn_id_index[thread_id];
             :       for (int i = start_txn_id; i < start_txn_id + 5; i++) {
             :         TxnStatusEntryPB txn_status;
             :         Status s = txn_sys_client_->BeginTransaction(i, kUser);
             :         if (s.ok()) {
             :           // Make sure a re-election happens before the 
following read.
             :           SleepFor(MonoDelta::FromMilliseconds(3 * 
FLAGS_raft_heartbeat_interval_ms));
             :           s = txn_sys_client_->GetTransactionStatus(i, kUser, 
&txn_status);
             :           CHECK_OK(s);
             :           CHECK(txn_status.has_user());
             :           CHECK_STREQ(kUser, txn_status.user().c_str());
             :           CHECK(txn_status.has_state());
             :           CHECK_EQ(TxnStatePB::OPEN, txn_status.state());
             :         }
             :       }
             :     });
             :   }
> nit: would be easier to read as:
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@403
PS6, Line 403: <<
> nit: maybe add a message here too indicating what's going on
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@472
PS6, Line 472: LOG(WARNING) << Substitute("The tablet is already shutting down 
or shutdown. State: $0",
             :                                TabletStatePB_Name(state_));
> nit: might be cleaner to let the callers of these otherwise simple function
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@979
PS6, Line 979:   if (!txn_coordinator_) {
             :     return false;
             :   }
> nit: maybe only call this if txn_coordinator_ is set and change this to a D
I updated the name to reflect the functionality which is 
ShouldRunTxnCoordinatorStalenessTask(). So keep it as it is here.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@997
PS6, Line 997: state_ == STOPPING || state_ == STOPPED) {
> nit: I'm curious, why not reuse the != RUNNING condition from above? A comm
Added a comment.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@109
PS6, Line 109: but some operations
             :     // may still be legal.
> I might be missing something but is this fact used anywhere? What operation
Sorry this is not used, removed it.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@106
PS6, Line 106:     const Status& replica_status() const { return 
replica_status_; }
             :
             :     // Leadership status of the transaction status manager. If 
not OK, the
             :     // transaction status manager is not the leader, but some 
operations
             :     // may still be legal.
             :     const Status& leader_status() const {
             :       return leader_status_;
             :     }
> nit: Are these ever used?
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@118
PS6, Line 118: if (!replica_status_.ok()) {
             :         return replica_status_;
             :       }
             :       return leader_status_;
> nit: a bit simpler as
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83
PS6, Line 83: If this time is exceeded, the "
            :              "node crashes."
> If the consequence is so severe, maybe we should set this to be something l
After thinking more, I decide to remove the crash logic and just return, since 
node crash seems to be too severe for timeout on 
OpTracker::WaitForAllToFinish(). Now, if timeout, it will trigger the client to 
get a ServiceUnavailable error and they can retry until eventually succeed (or 
fail). Let me know if you think otherwise.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@217
PS6, Line 217: leader_ready_term != initial_term_ ||
             :                     !leader_shared_lock_.owns_lock())
> nit: Curious, doesn't this mean that we've changed leaders? The error messa
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@235
PS6, Line 235: NOT_THE_LEADER
> Should this only be set if leader_status_ is set?
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@320
PS6, Line 320: RETURN_NOT_OK
> nit: just return this?
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@369
PS6, Line 369:   }
> nit: maybe log something about beginning to wait?
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@414
PS6, Line 414: failed
> nit: "interrupted" would be less alarming, given this is somewhat normal
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@430
PS6, Line 430: Substitute("Tablet")
> nit: remove
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tserver/ts_tablet_manager.cc@1428
PS6, Line 1428: RunTx
> nit: maybe name these ShouldRun...
Done



--
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 6
Gerrit-Owner: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 07:31:25 +0000
Gerrit-HasComments: Yes

Reply via email to