Andrew Wong has posted comments on this change. Change subject: Allow tablet shutdown without completing txs ......................................................................
Patch Set 8: (7 comments) Also as per our slack discussion, rather than making use of tablet::FAILED, which in https://gerrit.cloudera.org/#/c/7440/ I'm making a "final" state, I went with your suggestion of shutting down the MvccManager. There's still a separation of replica and tablet, but I think this at least makes sense w.r.t. transactions. Also note in on of my review comments that the MvccManager::SetShuttingDown() isn't really used here, but will be in the ErrorNotificationCb. http://gerrit.cloudera.org:8080/#/c/7439/8/src/kudu/tablet/mvcc.h File src/kudu/tablet/mvcc.h: Line 404: void Cancel(); > here's an idea: how about rather than specifically cancelling each transact Done http://gerrit.cloudera.org:8080/#/c/7439/8/src/kudu/tablet/transactions/transaction.h File src/kudu/tablet/transactions/transaction.h: Line 20: #include <mutex> > warning: #includes are not sorted properly [llvm-include-order] Done http://gerrit.cloudera.org:8080/#/c/7439/8/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: Line 481: CHECK(s.IsDiskFailure()); > per conversation on Slack, I think generalizing this to something like the I updated HandleFailure to allow replicating/replicated failures to go through if the tablet is shutting down. The drive is able to get the necessary info (i.e. whether mvcc is shut down) through the TransactionState. http://gerrit.cloudera.org:8080/#/c/7439/8/src/kudu/tablet/transactions/write_transaction.cc File src/kudu/tablet/transactions/write_transaction.cc: Line 134: if (PREDICT_FALSE(tablet->IsInFailedDir())) { > making ApplyRowOperations return a Status would allow centralizing this log Done Line 139: tablet->ApplyRowOperations(state()); > here if Tablet indicated that it has shut down then you can early-return wi Done http://gerrit.cloudera.org:8080/#/c/7439/8/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 1076: tablet->SetInFailedDir(); > again maybe generalize to just 'MarkFailed' since the handling isn't partic Since this codepath is, in this patch, still fatal, I've reverted these changes. The code here *would* be: >tablet->mvcc_manager()->SetShuttingDown(); Line 1078: LOG(FATAL) << Substitute("Data directory $0 failed. Disk failure handling not implemented", > now implemented? Not quite, I moved other parts of failure handling (i.e. submission of Shutdown() call, permissiveness failed Flushes) to another patch to be "easier to review," and kept this fatal. Only updated here to show where the new code would be used. -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes