Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9528 )
Change subject: WIP KUDU-2129 make ksck less scary when copying ...................................................................... Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/9528/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9528/1//COMMIT_MSG@25 PS1, Line 25: Report "non-HEALTHY" instead of "bad" in the above log This is the one change I have an issue with because it makes it less clear what ksck is saying. It tells you what the thing is not instead of what it is, which makes it harder to know how to respond. Admittedly "bad" isn't great either, since "bad" tables may actually be self-healing. Maybe we need to break down the results further into HEALTHY, RECOVERING, NEEDS_INTERVENTION, or something? http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc File src/kudu/tools/ksck-test.cc: http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc@322 PS1, Line 322: with non-HEALTHY statuses Howabout "are not healthy"? http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc@402 PS1, Line 402: "Table Summary\n" : " Name | Status | Total Tablets | Healthy | Recovering | Under-replicated | Unavailable\n" : "------+---------+---------------+---------+------------+------------------+-------------\n" : " test | HEALTHY | 3 | 3 | 0 | 0 | 0"); Are these the long lines you're complaining about? Maybe we can add a helper like ExpectedKsckTableSummary() that takes a container of expected CheckResults or just an expected count of each kind, and returns the table as a string. http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck-test.cc@420 PS1, Line 420: "The consensus matrix is:\n" : " Config source | Replicas | Current term | Config index | Committed?\n" : "---------------+------------------+--------------+--------------+------------\n" : " master | A* B C | | | Yes\n" : " A | A* B C D | 0 | | Yes\n" : " B | A* B C | 0 | | Yes\n" : " C | A* B C | 0 | | Yes"); On the other hand I think the consensus matrix literals are kind of a feature, since as we test for detecting more situations they serve as a reference for how certain situations look in ksck. Also it'd be harder to write a helper. http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck.h@461 PS1, Line 461: // There was a discrepancy among the tablets' consensus configs and the master's. Can you add a blurb like this for all non-OK statuses? Maybe like // The tablet has less replicas than its table's replication factor. UNDER_REPLICATED // The tablet is missing a majority of its replicas and is unavailable for writes. If a majority cannot be brought back online then the tablet requires manual intervention to recover. UNAVAILABLE // The tablet is missing replicas but is in the process of self-healing. RECOVERING I also have weak preferences for: 1. Renaming OK to HEALTHY. 2. Reordering the enum values to go from better to worse. IIRC it's HEALTHY/OK RECOVERING UNDER_REPLICATED CONSENSUS_MISMATCH UNAVAILABLE http://gerrit.cloudera.org:8080/#/c/9528/1/src/kudu/tools/ksck.h@489 PS1, Line 489: if (recovering_tablets > 0) { : return CheckResult::RECOVERING; : } : if (underreplicated_tablets > 0) { : return CheckResult::UNDER_REPLICATED; : } Isn't under-replicated worse than recovering? -- To view, visit http://gerrit.cloudera.org:8080/9528 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iba0c1f5b5a7083cbef99d3674dfebe1457075ce0 Gerrit-Change-Number: 9528 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Wed, 07 Mar 2018 18:04:32 +0000 Gerrit-HasComments: Yes