Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )
Change subject: KUDU-2290: Tool to re-create a tablet ...................................................................... Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@9 PS2, Line 9: replace_tablet > nit: how about unsafe_replace_tablet, given we're guaranteed "data loss" Done http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@13 PS2, Line 13: Before, the table would : have to be recreated to recover from a permanently lost tablet > Maybe we should add a check somewhere that we don't actually have a live ta I looked into this a bit and the master doesn't have that much information about how healthy a tablet is, and isn't the authority on it, especially if the tablet is unavailable but not unrecoverable. So I think having the scary "unsafe" in front of the name suffices here; adding a --force flag would mean in practice we'd always have to specify --force and it wouldn't be meaningful. http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/client/client-internal.cc@97 PS2, Line 97: using master::ReplaceTabletRequestPB; > warning: using decl 'ReplaceTabletRequestPB' is unused [misc-unused-using-d Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/client/client-internal.cc@98 PS2, Line 98: using master::ReplaceTabletResponsePB; > warning: using decl 'ReplaceTabletResponsePB' is unused [misc-unused-using- Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h@577 PS2, Line 577: , causing all of its data : // to be permanently lost. > Would it be feasible to turn the replaced tablet into a new single-tablet t Done, but it involved making an exception in the MetaCache lookup code because having a tablet switch tables violated some assumptions. http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc@4249 PS2, Line 4249: replaced_tablet->mutable_metadata()->mutable_dirty()->set_state(SysTabletsEntryPB::DELETED, > if we make it DELETED it will also get removed off of the tablet servers. w Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@712 PS2, Line 712: there an e > nit: there is an Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@792 PS2, Line 792: option (kudu.rpc.authz_method) = "AuthorizeService"; > we don't want AuthorizeSuperUser here? Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc@517 PS2, Line 517: LOG(INFO) << "ReplaceTablet: received request to replace tablet " << req->tablet_id(); > worth logging the requestor info too Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2371 PS2, Line 2371: const int kNumTservers = 3; : const int kNumTablets = 3; > Maybe test with multiple masters too? This would still work, right? It's a bit of work to change ToolTest to support that and there's nothing really different about multimaster, so I hope you don't mind if I pass on it. (Also my manual testing was with multimaster). http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2392 PS2, Line 2392: // Replace the tablet. This isn't the use case for the tool, but it should work. > Should it? Maybe we should safeguard against users shooting themselves in t The master can think a tablet is healthy when it's busted, as we've seen, and it's not the authority on the health of a tablet, so I think we ought to trust the users to know what they are doing or be scared of the "unsafe" at the beginning of the action name. http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2490 PS2, Line 2490: workload.StopAndJoin(); > Another thought: maybe use the cluster verifier after this to make sure thi Done -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley <wdberke...@gmail.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Sun, 11 Mar 2018 19:26:10 +0000 Gerrit-HasComments: Yes