Alexey Serbin has posted comments on this change.

Change subject: KUDU-2027 retry scan RPC if negotiation times out
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/client/scanner-internal.cc
File src/kudu/client/scanner-internal.cc:

PS3, Line 423:     CHECK(ts->proxy());
> on the 'Open()' RPC, we should be able to fail-over the scan even if it's n
Yep, I had some doubts about this change (too restrictive, but safe).  Thank 
you for the clarification.  I'll update this and other places correspondingly.


http://gerrit.cloudera.org:8080/#/c/7037/3/src/kudu/integration-tests/client-negotiation-failover-itest.cc
File src/kudu/integration-tests/client-negotiation-failover-itest.cc:

PS3, Line 152: hopefully
> hrm, I think you are being pretty optimistic here. If you assume hashing pr
Well, I assumed the fact that the numbers are sequential adds some bias to the 
distribution of the results across partitions.

>From the other side, during the test, the leader tablets tend to switch from 
>one server to another due to re-elections.  If they flock to the same tablet 
>server, that get us into the situation where even all operations are put into 
>different partitions, the behaviour of the test is the same as if all 
>operations are put into the same tablet. 

If all the values are put into the same partition (i.e. the same tablet) and 
that particular tablet is not paused, the test does not exercise the expected 
negotiation timeout behavior for that particular run.  So, assuming there is an 
issue with the negotiation timeouts, the test would give false negative in that 
case.  However, the test does not produce false positives (I assume by 
'correctness' you meant false positives).

Since the test runs multiple iterations and it's run multiple times itself, I 
feel confident it gives us the necessary coverage.

Of course, it's possible to do add some code to make sure the writes go to 
different partitions and do tablet 'un-herding', but I think it's better to 
keep the code simpler and rely on high number or runs.


Line 191:     ASSERT_OK(ScanToStrings(&scanner, &row_strings));
> shouldn't this still be assertable like the above, given we set READ_AT_SNA
In the first revision it was exactly like that (I assumed it should be like 
that).  However, I found that it fails sometimes: i.e. CLOSEST_REPLICA and 
READ_AT_SNAPSHOT does not guarantee RYW behaviour.  It might be a bug -- I 
asked David about that on the HipChat Kudu-dev channel, but I haven't gotten 
response.  I'll ping him regarding this.


Line 196:     SCOPED_TRACE(Substitute("read-latest at closest replica, 
iteration $0", i));
> this should say "read-latest" not "snapshot" right?
good catch -- yes, it should.


-- 
To view, visit http://gerrit.cloudera.org:8080/7037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I05a3c38cc07c9b6edd1ae773d9850e92645f3b14
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to