Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9490 )

Change subject: [tools] Add a tool to recover master data from tablet servers
......................................................................


Patch Set 4:

I'm looking at this again because I actually ran into a case where this was 
useful. Let me jot down my thoughts right now about Adar's original objections 
to how this tool works:

1) By calling Master::Init(), do we bind to some ports? Will we start 
responding to incoming RPCs? A CLI tool shouldn't do either.

Yes, we do. This should be fixed.
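
To illustrate why the bound port matters even if the tool never services a 
request: once a process has called bind() and listen(), the kernel completes 
TCP handshakes on its behalf, so a client can connect to the tool as if it 
were a master. Below is a minimal sketch using plain POSIX sockets. It is not 
Kudu code, and the function names are made up for the example.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

// Hypothetical helper (not a Kudu function): binds a TCP socket to an
// ephemeral loopback port and starts listening. Stores the descriptor in
// *listen_fd and returns the chosen port, or 0 on failure.
uint16_t BindAndListenOnLoopback(int* listen_fd) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return 0;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel pick a free port
  socklen_t len = sizeof(addr);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
      listen(fd, 1) != 0 ||
      getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    close(fd);
    return 0;
  }
  *listen_fd = fd;
  return ntohs(addr.sin_port);
}

// Hypothetical helper: true if a client can complete a TCP connection to
// `port` on loopback. Note that the listener never has to call accept().
bool CanConnectToLoopback(uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return false;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);
  bool ok =
      connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
  close(fd);
  return ok;
}
```

Calling BindAndListenOnLoopback() and then CanConnectToLoopback() on the 
returned port should report a successful connection even though accept() is 
never called, which is why a CLI tool is better off not binding at all.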

2) By calling SysCatalogTable::CreateNew, you end up baking particulars about 
_this_ process into the on-disk data. For example (maybe the only example), a 
cmeta file will be generated with the first RPC address of _this process_ 
(well, of Master::Init I guess) in it. That seems like a bad idea since the CLI 
tool and the actual master are likely to run differently (i.e. different UNIX 
users, maybe different machines too if the only goal here is to generate some 
on-disk data).

This is fine. The idea of the tool is that it generates the data in place, and 
some cases will definitely require additional recovery work afterwards, e.g. 
recovering from a failure of all masters of a multi-master setup. The tool 
should be run as the kudu user; otherwise the files it produces will have to 
be modified before a regular master process can use them anyway. It shouldn't 
be run remotely because it uses the WAL and data dir structure that the new 
master will have. It should reassemble, in place, a starting point for a new 
master.
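
One way to guard against running as the wrong user would be a pre-flight 
ownership check on the WAL and data dirs. Below is a rough sketch using only 
POSIX stat() and geteuid(). The helper name is hypothetical and nothing like 
it is claimed to exist in the patch.

```cpp
#include <sys/stat.h>
#include <unistd.h>

#include <string>

// Hypothetical helper (not part of Kudu): true if `dir` exists, is a
// directory, and is owned by the effective user of the current process.
bool DirOwnedByCurrentUser(const std::string& dir) {
  struct stat st;
  if (stat(dir.c_str(), &st) != 0) return false;  // missing or unreadable
  return S_ISDIR(st.st_mode) && st.st_uid == geteuid();
}
```

A tool could refuse to proceed when this returns false for a configured WAL or 
data dir, rather than writing files that later need their ownership fixed.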

Experience has shown it does work fine this way in a real cluster.

So here are some avenues to explore:
1) Continuing the thread of "reconstruction via generating physical on-disk 
data directly", could we instantiate a Tablet, load the master's schema, 
perform tablet writes directly to the Tablet, then Flush() at the end? Then 
there's no TabletReplica, no cmeta, no WAL, etc. TabletHarness is a test-only 
class that you may be able to reuse. The big question is whether a master could 
load such a tablet afterwards.

For one, we would have to create at least a WAL for the tablet replica to be 
able to start later. I think this is a lot of work for no gain compared to 
modifying the current approach so that it neither binds to ports nor processes 
RPCs.

2) Or, if we go in the other direction, could we start an empty master and 
"import" reconstructed metadata into it via RPC?

This is what this tool is doing, basically. The master process just lives 
inside the tool. The benefit is we don't need to introduce new RPC endpoints to 
the master to shovel the data tservers -> tool -> master.


--
To view, visit http://gerrit.cloudera.org:8080/9490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6
Gerrit-Change-Number: 9490
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <abu...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-Comment-Date: Tue, 27 Nov 2018 23:25:30 +0000
Gerrit-HasComments: No
