Alexey Serbin has posted comments on this change. Change subject: KUDU-2027 retry scan RPC if negotiation times out ......................................................................
Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/client/scanner-internal.cc File src/kudu/client/scanner-internal.cc: PS3, Line 423: CHECK(ts->proxy()); > on the 'Open()' RPC, we should be able to fail-over the scan even if it's n Yep, I had some doubts about this change (too restrictive, but safe). Thank you for the clarification. I'll update this and other places correspondingly. http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/integration-tests/client-negotiation-failover-itest.cc File src/kudu/integration-tests/client-negotiation-failover-itest.cc: PS3, Line 152: hopefully > hrm, I think you are being pretty optimistic here. If you assume hashing pr Well, I assumed the fact that the numbers are sequential adds some bias to the distribution of the results across partitions. >From the other side, during the test, the leader tablets tend to switch from >one server to another due to re-elections. If they flock to the same tablet >server, that get us into the situation where even all operations are put into >different partitions, the behaviour of the test is the same as if all >operations are put into the same tablet. If all the values are put into the same partition (i.e. the same tablet) and that particular tablet is not paused, the test does not exercise the expected negotiation timeout behavior for that particular run. So, assuming there is an issue with the negotiation timeouts, the test would give false negative in that case. However, the test does not produce false positives (I assume by 'correctness' you meant false positives). Since the test runs multiple iterations and it's run multiple times itself, I feel confident it gives us the necessary coverage. Of course, it's possible to do add some code to make sure the writes go to different partitions and do tablet 'un-herding', but I think it's better to keep the code simpler and rely on high number or runs. Line 191: ASSERT_OK(ScanToStrings(&scanner, &row_strings)); > shouldn't this still be assertable like the above, given we set READ_AT_SNA In the first revision it was exactly like that (I assumed it should be like that). However, I found that it fails sometimes: i.e. CLOSEST_REPLICA and READ_AT_SNAPSHOT does not guarantee RYW behaviour. It might be a bug -- I asked David about that on the HipChat Kudu-dev channel, but I haven't gotten response. I'll ping him regarding this. Line 196: SCOPED_TRACE(Substitute("read-latest at closest replica, iteration $0", i)); > this should say "read-latest" not "snapshot" right? good catch -- yes, it should. -- To view, visit http://gerrit.cloudera.org:8080/7037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes