Dinesh Bhat has posted comments on this change.

Change subject: [tools] Manual recovery tools (part 1)
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4834/1/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

PS1, Line 974: er_->tablet_ser
> Well, even a tablet with no data still has metadata that needs to be copied
Sorry, I think I didn't understand your question clearly before.  Test doesn't 
fail without those calls,  but how can we confirm that destination replica 
caught up with source replica after tablet-copy ? One way could be to check the 
OpIds on the Log has caught upto workload.batches_completed(). If we didn't 
want to rely on workload.batches_completed(), then we could ask the source peer 
for it's lastOpId and check if all servers reached consensus on that OpId. In 
the absence of any workload, that logIndex could be -1 or 0. You are right that 
we don't need the 'data blocks' on disk for the tablet copy to take effect, but 
here the data was used more to verify Log replication than for the presence of 
data on the disk. I updated the comment to reflect this logic and also reduced 
the rows to 10 (seemed sufficient)

PS: I took some of these ideas from an existing test (tablet_copy-itest IIRC).


http://gerrit.cloudera.org:8080/#/c/4834/2/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

PS2, Line 936: }
             : 
             : // Test 'kudu tablet copy' tool when the destination tablet 
server is online.
             : // 1. Test the copy tool when the destination replica is healthy
             : // 2. Test the copy tool when the 
> Need to update the comments I highlighted too. At least the very first line
Done


http://gerrit.cloudera.org:8080/#/c/4834/5/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

PS5, Line 1004:   string stdout;
              :   string stderr;
> Can remove these. Check the rest too.
Done


PS5, Line 1052:  Explicit RPC to
              :   // delete-with-retries
> What does this mean? DeleteTabletWithRetries() is a function; what does "ex
Please ignore this since this didn't work as I expected, I need to fix a race 
between catalog_manager issuing a DELETE_TOMBSTONED in the change_config 
eviction path and this test issuing the tablet-copy on DELETE_TOMBSTONED 
tablet. The race is caught here : 
http://104.196.14.100/job/kudu-gerrit/4202/BUILD_TYPE=RELEASE/testReport/junit/(root)/ToolTest/TestRemoteReplicaCopy/

I wonder if it's possible to consult master catalog manager to see if this 
replica has been evicted successfully from the config so that this test can 
safely issue a tablet copy to destination tserver which has the tablet 
tombstoned. As it is now, it turns out, it is not sufficient just to verify the 
state of the tablet on destination tserver @L1045.


PS5, Line 1101: replica data size to be visible.
> What does this mean?
bad wording I guess - changed it to "Generate some workload followed by a flush 
to grow the size of the tablet on disk."


http://gerrit.cloudera.org:8080/#/c/4834/2/src/kudu/tools/tool_action_remote_replica.cc
File src/kudu/tools/tool_action_remote_replica.cc:

Line 312:   req.set_caller_term(std::numeric_limits<int64_t>::max());
> Yeah, I think that may be good. This would be -1 or 0 normally, and max whe
Thanks, following it up in the coming patch along with test-flakiness fix.


-- 
To view, visit http://gerrit.cloudera.org:8080/4834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I113a25e9b6c14f7c3814140917b61e35030b58d0
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to