Todd Lipcon has posted comments on this change.

Change subject: Allow tablet shutdown without completing txs
......................................................................


Patch Set 11:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/7439/11/src/kudu/tablet/mvcc.h
File src/kudu/tablet/mvcc.h:

PS11, Line 270: transactions must not durably complete operations
is this precisely true? or is it that transactions are _allowed_ to abort? 
there is no CHECK ensuring that, once shutting down, nothing commits, and I 
think adding such a restriction could be tricky.

I think it is reasonable to disallow starting of any new timestamps once 
shutting down, but I dont think it's doing that yet either, right?


http://gerrit.cloudera.org:8080/#/c/7439/11/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

PS11, Line 764: message
use ToString so we retain the original code?


Line 765:         LOG(ERROR) << s.ToString();
add context of the tablet id, maybe the opid which should be available?

I think WARN is probably more appropriate because things are recovering without 
data loss


PS11, Line 800: exitted
typo? exited


PS11, Line 799:   if (PREDICT_FALSE(mvcc_.IsShuttingDown())) {
              :     return Status::Aborted("Transaction exitted early because 
tablet is shutting down");
              :   }
mind making a short function like Tablet::CheckNotShuttingDown with this code 
in it? seems it's called from a few places


Line 852:         LOG(ERROR) << "Failed to check if row is present; ending 
transaction early";
we should be able to CHECK(IsShuttingDown()) or something here, right?

I think we basically want an invariant that operations can only abort if the 
tablet is shutting down. Even if you are the one who hit the error, you need to 
ensure that it's marked shutting down before you complete the operation. 
Otherwise you could release the lock and allow some other operation to proceed 
and commit out-of-order.


Line 853:         CHECK(s.IsDiskFailure());
I think this is a bit too restrictive. For example if there is some local 
heisenbug it could cause a tablet to become corrupt in some way that is not a 
disk failure error, and I think the correct thing to do in that circumstance is 
also to mark the tablet as failed (without marking all other tablets on that 
disk as failed)


http://gerrit.cloudera.org:8080/#/c/7439/11/src/kudu/tablet/transactions/transaction_driver.cc
File src/kudu/tablet/transactions/transaction_driver.cc:

Line 347:       if 
(!state()->tablet_replica()->tablet()->mvcc_manager()->IsShuttingDown()) {
I think reaching into MvccManager is a bit strange here. Even though there is a 
1:1 of tablet and mvccmanager I think it makes more sense to have the 
additional state enum in Tablet which says "shutting down". It happens that 
when the tablet shuts down, it also shuts down its MvccManager (thus allowing 
writes to abort without firing a CHECK) but conceptually the Tablet is the 
thing that is shutting down.

Also don't we already have such an enum in the TabletReplica indicating it's 
shutting down? it would be ideal if this code can avoid calling specifically 
into Tablet much.


Line 352:     }
can you add FALLTHROUGH_INTENDED here?


PS11, Line 356:       VLOG_WITH_PREFIX(1) << "Transaction " << ToString() << " 
failed prior to "
              :           "replication success: " << s.ToString();
this is no longer accurate in all cases


http://gerrit.cloudera.org:8080/#/c/7439/11/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

PS11, Line 640:   if (replica->tablet()) {
              :     // If we're deleting the tablet, there's no point in 
ensuring that applies
              :     // complete. Currently-running Applies will henceforth 
return early.
              :     replica->tablet()->mvcc_manager()->SetShuttingDown();
              :   }
              :  
why not put this in TabletReplica::Shutdown() implementation itself so as to 
avoid spilling as many implementation details outward?


-- 
To view, visit http://gerrit.cloudera.org:8080/7439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to