Dinesh Bhat has posted comments on this change. Change subject: [tools] Manual recovery tools (part 1) ......................................................................
Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/4834/1/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: PS1, Line 974: er_->tablet_ser > Well, even a tablet with no data still has metadata that needs to be copied Sorry, I think I didn't understand your question clearly before. Test doesn't fail without those calls, but how can we confirm that destination replica caught up with source replica after tablet-copy ? One way could be to check the OpIds on the Log has caught upto workload.batches_completed(). If we didn't want to rely on workload.batches_completed(), then we could ask the source peer for it's lastOpId and check if all servers reached consensus on that OpId. In the absence of any workload, that logIndex could be -1 or 0. You are right that we don't need the 'data blocks' on disk for the tablet copy to take effect, but here the data was used more to verify Log replication than for the presence of data on the disk. I updated the comment to reflect this logic and also reduced the rows to 10 (seemed sufficient) PS: I took some of these ideas from an existing test (tablet_copy-itest IIRC). http://gerrit.cloudera.org:8080/#/c/4834/2/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: PS2, Line 936: } : : // Test 'kudu tablet copy' tool when the destination tablet server is online. : // 1. Test the copy tool when the destination replica is healthy : // 2. Test the copy tool when the > Need to update the comments I highlighted too. At least the very first line Done http://gerrit.cloudera.org:8080/#/c/4834/5/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: PS5, Line 1004: string stdout; : string stderr; > Can remove these. Check the rest too. Done PS5, Line 1052: Explicit RPC to : // delete-with-retries > What does this mean? DeleteTabletWithRetries() is a function; what does "ex Please ignore this since this didn't work as I expected, I need to fix a race between catalog_manager issuing a DELETE_TOMBSTONED in the change_config eviction path and this test issuing the tablet-copy on DELETE_TOMBSTONED tablet. The race is caught here : http://104.196.14.100/job/kudu-gerrit/4202/BUILD_TYPE=RELEASE/testReport/junit/(root)/ToolTest/TestRemoteReplicaCopy/ I wonder if it's possible to consult master catalog manager to see if this replica has been evicted successfully from the config so that this test can safely issue a tablet copy to destination tserver which has the tablet tombstoned. As it is now, it turns out, it is not sufficient just to verify the state of the tablet on destination tserver @L1045. PS5, Line 1101: replica data size to be visible. > What does this mean? bad wording I guess - changed it to "Generate some workload followed by a flush to grow the size of the tablet on disk." http://gerrit.cloudera.org:8080/#/c/4834/2/src/kudu/tools/tool_action_remote_replica.cc File src/kudu/tools/tool_action_remote_replica.cc: Line 312: req.set_caller_term(std::numeric_limits<int64_t>::max()); > Yeah, I think that may be good. This would be -1 or 0 normally, and max whe Thanks, following it up in the coming patch along with test-flakiness fix. -- To view, visit http://gerrit.cloudera.org:8080/4834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I113a25e9b6c14f7c3814140917b61e35030b58d0 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh Bhat <din...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Dinesh Bhat <din...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes