Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11554 )
Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets ...................................................................... Patch Set 9: (18 comments) Skimmed trough for a first look, will take another look later today. Overall looks pretty good, as of now mostly nits. http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@15 PS9, Line 15: before nit: before --> behind ? http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@60 PS9, Line 60: all peers BTW, what happens if _some_ of the peers are unavailable? Do we need to add a test for that or there is nothing special about that? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h@353 PS9, Line 353: void set_timestamp(uint64_t timestamp) { : timestamp_ = timestamp; : } Is it used at all? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h File src/kudu/tools/ksck_checksum.h: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h@256 PS9, Line 256: AtomicInt<int64_t> rows_summed_; : AtomicInt<int64_t> disk_bytes_summed_; nit: why not to move into std::atomic with those as well? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@66 PS9, Line 66: DEFINE_int32(max_progress_report_wait_ms, 5000, : "Maximum number of milliseconds to wait between progress reports. " : "Used to speed up tests."); : TAG_FLAG(max_progress_report_wait_ms, hidden); Could this be also useful while using the tool in the field? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@70 PS9, Line 70: timestamp_update_period_ms Does it make sense to keep it in milliseconds? Maybe, a second-grain resolution would be enough? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@86 PS9, Line 86: when all the checksums for its replicas begin nit: maybe, change to 'when the checksumming for all its replicas begins' http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@101 PS9, Line 101: listed in 'tservers' What about the reverse: is it OK to have a tablet server in 'tservers' that is not referenced by any tablet from the 'tablet_infos'? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@106 PS9, Line 106: CHECK(num_replicas); nit: maybe, DCHECK is enough, since it would crash anyway further down if it were null. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@110 PS9, Line 110: tserver_uuid_set Consider using helper functions from map-util.h: those present more expressive semantics. E.g., here it's supposed to be InsertOrDie(&tserver_uuid_set, tserver->uuid()), right? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@187 PS9, Line 187: bool KsckChecksumManager::HasOpenTsSlotsUnlocked() { nit: add const specifier? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@383 PS9, Line 383: replica_uuids.insert nit: use helper functions from map-util.h http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc File src/kudu/tools/ksck_remote-test.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@416 PS9, Line 416: TestChecksumSnapshotLastingLongerThanAHM Did you try to run this with --stress-cpu-threads=8 and many iterations using dist-test? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@429 PS9, Line 429: or and http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@428 PS9, Line 428: #if defined(THREAD_SANITIZER) || defined(ADDRESS_SANITIZER) : LOG(WARNING) << "test is skipped in TSAN or ASAN builds"; : return; : #endif nit: shift this left one indent http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@476 PS9, Line 476: MonoDelta::FromSeconds(timeout_sec), : MonoDelta::FromSeconds(timeout_sec), : scan_concurrency, : use_snapshot, : KsckChecksumOptions::kCurrentTimestamp))); nit: indent should be 4 spaces from the outer scope's indent. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote.h File src/kudu/tools/ksck_remote.h: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote.h@130 PS9, Line 130: style nit: indent should be 4 spaces from the outer scope http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote.cc File src/kudu/tools/ksck_remote.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote.cc@321 PS9, Line 321: const Schema& schema, nit: maybe, use the move semantics while passing this parameter as well? -- To view, visit http://gerrit.cloudera.org:8080/11554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 Gerrit-Change-Number: 11554 Gerrit-PatchSet: 9 Gerrit-Owner: Will Berkeley <wdberke...@gmail.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Attila Bukor <abu...@apache.org> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Thu, 04 Oct 2018 17:28:51 +0000 Gerrit-HasComments: Yes