Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7439 )
Change subject: mvcc: allow tablet shutdown without completing txs ...................................................................... Patch Set 21: (26 comments) http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc File src/kudu/integration-tests/ts_tablet_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@77 PS20, Line 77: using itest::SimpleIntKeyKuduSchema; > warning: using decl 'InternalMiniCluster' is unused [misc-unused-using-decl Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@130 PS20, Line 130: ASSERT_OK(cluster->Start()); > this should probably be less specific to implementation details Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@179 PS20, Line 179: FLAGS_enable_leader_failure_detection = false; > I'm not sure this test suite is the best place for these tests, since they I moved to stop_tablet-itest.cc http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@183 PS20, Line 183: FLAGS_allow_unsafe_replication_factor = true; > here you're stopping a random server, not necessarily the one that the scan Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@187 PS20, Line 187: > are the readers using fault-tolerant scans by default here? Yeah they are (see test_workload.cc:219) http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@197 PS20, Line 197: > Somewhere we have a utility function like AssertNoCrashes(). Otherwise mayb Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@205 PS20, Line 205: ValueDeleter deleter(&ts_map); > in these tests it might be beneficial to configure the MM such that it cons Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@235 PS20, Line 235: > is it possible in this test to re-start the tserver that has the stopped ta I think it'd be a bit tricky in this patch since this makes use of IMC instead of EMC, unless there's a ClusterVerifier equivalent for IMCs that I'm not aware of. Good point though, in EMC tests I've been verifying with ClusterVerifier. Trickier still, this actually makes use of the fact that we're using IMC by stopping the tablet directly by reaching into the server, so I'd prefer to keep this test IMC. http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@238 PS20, Line 238: // The MarkDirty() callback is on an async thread so it might take the > can you combine this test with the above one, using a utility function that Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc-test.cc File src/kudu/tablet/mvcc-test.cc: http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc-test.cc@498 PS20, Line 498: waiting_thread.join(); : ASSERT_OK(s); > don't you want to join before assert here? Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc-test.cc@598 PS20, Line 598: waiting_thread.join(); : ASSERT_STR_CONTAINS(s.ToString(), "closed"); : ASSERT_TRUE(s.IsAborted()); : : // New waiters should abort immediately. : s = mgr.WaitForApplyingTransactionsToCommit(); : > if you reordered the join and the ASSERT then you could avoid the complexit Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h File src/kudu/tablet/mvcc.h: http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h@253 PS20, Line 253: Status:: > specify the type of status Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h@254 PS20, Line 254: Status WaitForApplyingTransactionsToCommit() const WARN_UNUSED_RESULT; > worth adding a WARN_UNUSED_RESULT here Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h@275 PS20, Line 275: not > not? Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h@280 PS20, Line 280: e bool i > rename to is_closed since it's inlinable short method Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@160 PS20, Line 160: // > mind adding a WARN_UNUSED_RESULT to these functions that you're changing fr Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@405 PS20, Line 405: > nit: "should" is kinda fuzzy. Do you mean "will"? Nicer for callers to thin Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@405 PS20, Line 405: : // Return handle to the metric entity of this tablet. : const scoped > Would be good to move this declaration up next to 'Shutdown()' and clarify Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@465 PS20, Line 465: // into an internal UPDATE operation and performs it. > add doc for what bad status means. Same above with the Apply calls. Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@537 PS20, Line 537: // The 'old_ms' pointer will be set to the current MemRowSet set before the replacement. : // If the MemRowSet is not empty it will be added to the 'compaction' input : // and the MemRowSet compaction > can't you just use the corresponding method in mvcc? You're right, although sometimes it's convenient to condition on a the status, and sometimes it's convenient to condition a bool. E.g. if we need to return an error in multiple places if the tablet has been Stopped(), this is more convenient than repeating: if (tablet->Stopped()) { return Status::Aborted("aborted message"); } http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc@806 PS20, Line 806: // In order for failures here to be safe, the tablet must have been > I think it's worth a comment here explaining why we expect it to already ha Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc@827 PS20, Line 827: p(); > nit: how about "Error while checking roe presence" Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc@839 PS20, Line 839: w_ops_bas > PREDICT_FALSE? Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc@900 PS20, Line 900: ol present = false; > can't you use CheckNotStopped here Done http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet_replica.cc@263 PS20, Line 263: > is it ok to access tablet_ outside of state_change_lock_? can't quite recal >From the comment around state_change_lock_, it seems it's only guarding >long-running lifecycle operations, which this isn't. lock_ seems necessary >though, since it accesses tablet_. I think it was safe before just because we had the lock above serializing and deduplicating Stop() calls, but I'll add it up there. http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tserver/tablet_service.cc@1731 PS20, Line 1731: if (!s.ok()) { : tmp_error_code = TabletServerErrorPB::INVALID_SNAPSHOT; > this sentence seems incomplete, so what do we do? Done, also moved this a bit lower and conditioned the error code on Stopped() instead. -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-Change-Number: 7439 Gerrit-PatchSet: 21 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Comment-Date: Tue, 31 Oct 2017 00:38:24 +0000 Gerrit-HasComments: Yes