[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 2: Verified+1

Unrelated flake


--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 23:42:44 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..

KUDU-2343. java: properly reconnect to new leader master after failover

This fixes the "fake" location information returned in response to a
ConnectToMaster RPC to include a distinct "fake UUID" for each master.
Previously, we were using an empty string for the UUID of the masters.
This caused collisions in the ConnectionCache, which is keyed by server
UUIDs.

The fake UUID added by this patch matches the fake UUID already in use
by AsyncKuduClient.newMasterRpcProxy. This should allow us to share the
RPC connection between the ConnectToMaster RPCs and the subsequent
GetTableLocation RPCs, which is also a benefit for latency after a
failover or on a fresh client.

Additionally, this will help with various log messages that previously
would print an empty UUID string.

A prior version of this patch solved the problem by changing the key for
the ConnectionCache to be based on IP address, which has other benefits
in terms of future support for servers changing their DNS resolution at
runtime. However, since this patch is intended for backport into prior
releases, this simpler approach is taken for now. A TODO is added for
the longer-term idea.

An existing test which tested killing a master now runs in a second mode
which restarts the master. This reproduced the bug prior to the fix.
This patch also cleans up that test somewhat - it was doing some buggy
logic to attempt to kill more than one tablet server, but in fact just
called "killTabletServer" three times on the same one. Killing three
tablet servers never made sense, either, since the table in the test
only had three replicas. Neither did it make sense to start six tablet
servers for the test.

Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Reviewed-on: http://gerrit.cloudera.org:8080/9612
Reviewed-by: Alexey Serbin 
Tested-by: Todd Lipcon 
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M 
java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java
M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java
M 
java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java
6 files changed, 65 insertions(+), 19 deletions(-)

Approvals:
  Alexey Serbin: Looks good to me, approved
  Todd Lipcon: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java:

http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java@a73
PS2, Line 73:
> Was this the culprit of the problem?
yea, at least this plus the fact that we cached based on UUID.


http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java:

http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java@a65
PS2, Line 65:
> Oh, it seems I misunderstood what those entries were, indeed.  I should hav
yea I didn't much understand what was going on either til I looked with the 
debugger



--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 23:42:08 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 2: Code-Review+2

(3 comments)

Thank you for fixing this issue.  I didn't expect we had no test that would 
cover the situation described in KUDU-2343

http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java:

http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java@a73
PS2, Line 73:
Was this the culprit of the problem?


http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java
File java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java:

http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java@72
PS2, Line 72: TODO(todd) it would make more sense to key this by IP address 
rather than by UUID in
:* case a server actually changes address and re-registers to 
the cluster.
Yup, that would be closer to the 'nature of the things', but I think for 1.7 
the current fix is adequate enough.


http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java:

http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java@a65
PS2, Line 65:
> this old comment was just wrong. The extra 1 was from the fake master ID ""
Oh, it seems I misunderstood what those entries were, indeed.  I should have 
printed them out.



--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 23:32:42 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java:

http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java@a65
PS2, Line 65:
this old comment was just wrong. The extra 1 was from the fake master ID "" 
being distinct from the three masters which used master- UUIDs.



--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 22:08:05 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 2:

> Patch Set 1:
>
> > Example alternate patch which also seems to fix the issue:
>
> Big fan.

OK, I put up the alternate simpler patch for now. Also removed the new 
assertions and toString changes to make this an easier backport.


--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 22:07:07 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin, Jean-Daniel Cryans, Kudu Jenkins, Adar Dembo, Anonymous 
Coward #314,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9612

to look at the new patch set (#2).

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..

KUDU-2343. java: properly reconnect to new leader master after failover

This fixes the "fake" location information returned in response to a
ConnectToMaster RPC to include a distinct "fake UUID" for each master.
Previously, we were using an empty string for the UUID of the masters.
This caused collisions in the ConnectionCache, which is keyed by server
UUIDs.

The fake UUID added by this patch matches the fake UUID already in use
by AsyncKuduClient.newMasterRpcProxy. This should allow us to share the
RPC connection between the ConnectToMaster RPCs and the subsequent
GetTableLocation RPCs, which is also a benefit for latency after a
failover or on a fresh client.

Additionally, this will help with various log messages that previously
would print an empty UUID string.

A prior version of this patch solved the problem by changing the key for
the ConnectionCache to be based on IP address, which has other benefits
in terms of future support for servers changing their DNS resolution at
runtime. However, since this patch is intended for backport into prior
releases, this simpler approach is taken for now. A TODO is added for
the longer-term idea.

An existing test which tested killing a master now runs in a second mode
which restarts the master. This reproduced the bug prior to the fix.
This patch also cleans up that test somewhat - it was doing some buggy
logic to attempt to kill more than one tablet server, but in fact just
called "killTabletServer" three times on the same one. Killing three
tablet servers never made sense, either, since the table in the test
only had three replicas. Neither did it make sense to start six tablet
servers for the test.

Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M 
java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java
M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java
M 
java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java
6 files changed, 65 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/9612/2
--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9612/1/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java
File java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java:

http://gerrit.cloudera.org:8080/#/c/9612/1/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java@143
PS1, Line 143: AsyncKuduClient.LOG.info("getConnection({}) -> {}", 
serverInfo.toString(), result == null ? "null" : result.getLogPrefix());
oops, didn't mean to leave this log in



--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 21:19:48 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 1:

> Example alternate patch which also seems to fix the issue:

Big fan.


--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 21:16:41 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG@10
PS1, Line 10: We don't currently use the
: master UUIDs in the Java client, so the connections to all 
masters were
: getting conflated in the cache.
> Also I just double checked, and ConnectToMasterResponsePB doesn't include t
Example alternate patch which also seems to fix the issue:
https://gist.github.com/8f0f8e38b04fb567f014e248e6f5d79b

(maybe this is better for a 1.7-targeted and backportable fix?)



--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 21:15:26 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG@10
PS1, Line 10: We don't currently use the
: master UUIDs in the Java client, so the connections to all 
masters were
: getting conflated in the cache.
> In typical java-client form, we have multiple ways where these rpc proxies
Also I just double checked, and ConnectToMasterResponsePB doesn't include the 
master's UUID. So, the client is never aware of master UUIDs even though the 
masters have them.

So, we could have fixed this bug by filling in a pseudo-ID like 
hostport.toString() there so as to match the UUID used in newMasterRpcProxy. Do 
you think that's better?



--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 21:10:47 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG@10
PS1, Line 10: We don't currently use the
: master UUIDs in the Java client, so the connections to all 
masters were
: getting conflated in the cache.
> Just passing through, but can you show me where this happens? I dug around
In typical java-client form, we have multiple ways where these rpc proxies get 
created. newMasterRpcProxy is only used in the 'connectToMasters' function, 
which is only used when we first connect to the cluster.

After we locate a master leader, we make a 'fake' GetTableLocationsResponsePB 
in ConnectToClusterResponse.getAsTableLocations(). We don't fill in the UUID 
field there.

I suppose filling in that with the master's UUID would have been an alternate 
way to fix this, rather than caching based on IP/port. However I think the 
ip/port cache makes more sense anyway in the case that a server re-registers on 
a different IP/port. (currently we have some server-side deficiencies that 
prevent that, but we shouldn't compound it with client-side issues)



--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 21:07:18 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9612 )

Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG@10
PS1, Line 10: We don't currently use the
: master UUIDs in the Java client, so the connections to all 
masters were
: getting conflated in the cache.
Just passing through, but can you show me where this happens? I dug around and 
found AsyncKuduClient.newMasterRpcProxy, which creates a ServerInfo with UUID 
"master-" and that ought to uniquely identify every master in a 
multi-master cluster.



--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Tue, 13 Mar 2018 20:49:02 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover

2018-03-13 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin, Anonymous Coward #314,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/9612

to review the following change.


Change subject: KUDU-2343. java: properly reconnect to new leader master after 
failover
..

KUDU-2343. java: properly reconnect to new leader master after failover

This fixes the connection cache in the java client to be based on IP
address and port rather than based on UUID. We don't currently use the
master UUIDs in the Java client, so the connections to all masters were
getting conflated in the cache. This would cause the client to reconnect
back to a follower master even if the leader had changed and even if it
had properly identified the address of the new leader.

An existing test which tested killing a master now runs in a second mode
which restarts the master. This reproduced the bug prior to the fix.
This patch also cleans up that test somewhat - it was doing some buggy
logic to attempt to kill more than one tablet server, but in fact just
called "killTabletServer" three times on the same one. Killing three
tablet servers never made sense, either, since the table in the test
only had three replicas. Neither did it make sense to start six tablet
servers for the test.

Along the way, this patch also improves toString() output for a number
of Java client structures. This makes it easier to debug
Connection-level issues.

Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java
M java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java
M java/kudu-client/src/main/java/org/apache/kudu/client/TableLocationsCache.java
M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java
M 
java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestRemoteTablet.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestServerInfo.java
9 files changed, 105 insertions(+), 37 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/9612/1
--
To view, visit http://gerrit.cloudera.org:8080/9612
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04
Gerrit-Change-Number: 9612
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward #314