[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 2: Verified+1 Unrelated flake -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 23:42:44 + Gerrit-HasComments: No
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. KUDU-2343. java: properly reconnect to new leader master after failover This fixes the "fake" location information returned in response to a ConnectToMaster RPC to include a distinct "fake UUID" for each master. Previously, we were using an empty string for the UUID of the masters. This caused collisions in the ConnectionCache, which is keyed by server UUIDs. The fake UUID added by this patch matches the fake UUID already in use by AsyncKuduClient.newMasterRpcProxy. This should allow us to share the RPC connection between the ConnectToMaster RPCs and the subsequent GetTableLocation RPCs, which is also a benefit for latency after a failover or on a fresh client. Additionally, this will help with various log messages that previously would print an empty UUID string. A prior version of this patch solved the problem by changing the key for the ConnectionCache to be based on IP address, which has other benefits in terms of future support for servers changing their DNS resolution at runtime. However, since this patch is intended for backport into prior releases, this simpler approach is taken for now. A TODO is added for the longer-term idea. An existing test which tested killing a master now runs in a second mode which restarts the master. This reproduced the bug prior to the fix. This patch also cleans up that test somewhat - it was doing some buggy logic to attempt to kill more than one tablet server, but in fact just called "killTabletServer" three times on the same one. Killing three tablet servers never made sense, either, since the table in the test only had three replicas. Neither did it make sense to start six tablet servers for the test. Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Reviewed-on: http://gerrit.cloudera.org:8080/9612 Reviewed-by: Alexey Serbin Tested-by: Todd Lipcon --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java 6 files changed, 65 insertions(+), 19 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Todd Lipcon: Verified -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has removed a vote on this change. Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java File java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java: http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java@a73 PS2, Line 73: > Was this the culprit of the problem? yea, at least this plus the fact that we cached based on UUID. http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java: http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java@a65 PS2, Line 65: > Oh, it seems I misunderstood what those entries were, indeed. I should hav yea I didn't much understand what was going on either til I looked with the debugger -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 23:42:08 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 2: Code-Review+2 (3 comments) Thank you for fixing this issue. I didn't expect we had no test that would cover the situation described in KUDU-2343 http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java File java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java: http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java@a73 PS2, Line 73: Was this the culprit of the problem? http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java File java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java: http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java@72 PS2, Line 72: TODO(todd) it would make more sense to key this by IP address rather than by UUID in :* case a server actually changes address and re-registers to the cluster. Yup, that would be closer to the 'nature of the things', but I think for 1.7 the current fix is adequate enough. http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java: http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java@a65 PS2, Line 65: > this old comment was just wrong. The extra 1 was from the fake master ID "" Oh, it seems I misunderstood what those entries were, indeed. I should have printed them out. -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 23:32:42 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java: http://gerrit.cloudera.org:8080/#/c/9612/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java@a65 PS2, Line 65: this old comment was just wrong. The extra 1 was from the fake master ID "" being distinct from the three masters which used master- UUIDs. -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 22:08:05 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 2: > Patch Set 1: > > > Example alternate patch which also seems to fix the issue: > > Big fan. OK, I put up the alternate simpler patch for now. Also removed the new assertions and toString changes to make this an easier backport. -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 22:07:07 + Gerrit-HasComments: No
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Hello Alexey Serbin, Jean-Daniel Cryans, Kudu Jenkins, Adar Dembo, Anonymous Coward #314, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9612 to look at the new patch set (#2). Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. KUDU-2343. java: properly reconnect to new leader master after failover This fixes the "fake" location information returned in response to a ConnectToMaster RPC to include a distinct "fake UUID" for each master. Previously, we were using an empty string for the UUID of the masters. This caused collisions in the ConnectionCache, which is keyed by server UUIDs. The fake UUID added by this patch matches the fake UUID already in use by AsyncKuduClient.newMasterRpcProxy. This should allow us to share the RPC connection between the ConnectToMaster RPCs and the subsequent GetTableLocation RPCs, which is also a benefit for latency after a failover or on a fresh client. Additionally, this will help with various log messages that previously would print an empty UUID string. A prior version of this patch solved the problem by changing the key for the ConnectionCache to be based on IP address, which has other benefits in terms of future support for servers changing their DNS resolution at runtime. However, since this patch is intended for backport into prior releases, this simpler approach is taken for now. A TODO is added for the longer-term idea. An existing test which tested killing a master now runs in a second mode which restarts the master. This reproduced the bug prior to the fix. This patch also cleans up that test somewhat - it was doing some buggy logic to attempt to kill more than one tablet server, but in fact just called "killTabletServer" three times on the same one. Killing three tablet servers never made sense, either, since the table in the test only had three replicas. Neither did it make sense to start six tablet servers for the test. Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectToClusterResponse.java M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestConnectionCache.java 6 files changed, 65 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/9612/2 -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 2 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9612/1/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java File java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java: http://gerrit.cloudera.org:8080/#/c/9612/1/java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java@143 PS1, Line 143: AsyncKuduClient.LOG.info("getConnection({}) -> {}", serverInfo.toString(), result == null ? "null" : result.getLogPrefix()); oops, didn't mean to leave this log in -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 21:19:48 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Jean-Daniel Cryans has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 1: > Example alternate patch which also seems to fix the issue: Big fan. -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 21:16:41 + Gerrit-HasComments: No
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG@10 PS1, Line 10: We don't currently use the : master UUIDs in the Java client, so the connections to all masters were : getting conflated in the cache. > Also I just double checked, and ConnectToMasterResponsePB doesn't include t Example alternate patch which also seems to fix the issue: https://gist.github.com/8f0f8e38b04fb567f014e248e6f5d79b (maybe this is better for a 1.7-targeted and backportable fix?) -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 21:15:26 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG@10 PS1, Line 10: We don't currently use the : master UUIDs in the Java client, so the connections to all masters were : getting conflated in the cache. > In typical java-client form, we have multiple ways where these rpc proxies Also I just double checked, and ConnectToMasterResponsePB doesn't include the master's UUID. So, the client is never aware of master UUIDs even though the masters have them. So, we could have fixed this bug by filling in a pseudo-ID like hostport.toString() there so as to match the UUID used in newMasterRpcProxy. Do you think that's better? -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 21:10:47 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG@10 PS1, Line 10: We don't currently use the : master UUIDs in the Java client, so the connections to all masters were : getting conflated in the cache. > Just passing through, but can you show me where this happens? I dug around In typical java-client form, we have multiple ways where these rpc proxies get created. newMasterRpcProxy is only used in the 'connectToMasters' function, which is only used when we first connect to the cluster. After we locate a master leader, we make a 'fake' GetTableLocationsResponsePB in ConnectToClusterResponse.getAsTableLocations(). We don't fill in the UUID field there. I suppose filling in that with the master's UUID would have been an alternate way to fix this, rather than caching based on IP/port. However I think the ip/port cache makes more sense anyway in the case that a server re-registers on a different IP/port. (currently we have some server-side deficiencies that prevent that, but we shouldn't compound it with client-side issues) -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 13 Mar 2018 21:07:18 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9612 ) Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9612/1//COMMIT_MSG@10 PS1, Line 10: We don't currently use the : master UUIDs in the Java client, so the connections to all masters were : getting conflated in the cache. Just passing through, but can you show me where this happens? I dug around and found AsyncKuduClient.newMasterRpcProxy, which creates a ServerInfo with UUID "master-" and that ought to uniquely identify every master in a multi-master cluster. -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314 Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Tue, 13 Mar 2018 20:49:02 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2343. java: properly reconnect to new leader master after failover
Hello Alexey Serbin, Anonymous Coward #314, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/9612 to review the following change. Change subject: KUDU-2343. java: properly reconnect to new leader master after failover .. KUDU-2343. java: properly reconnect to new leader master after failover This fixes the connection cache in the java client to be based on IP address and port rather than based on UUID. We don't currently use the master UUIDs in the Java client, so the connections to all masters were getting conflated in the cache. This would cause the client to reconnect back to a follower master even if the leader had changed and even if it had properly identified the address of the new leader. An existing test which tested killing a master now runs in a second mode which restarts the master. This reproduced the bug prior to the fix. This patch also cleans up that test somewhat - it was doing some buggy logic to attempt to kill more than one tablet server, but in fact just called "killTabletServer" three times on the same one. Killing three tablet servers never made sense, either, since the table in the test only had three replicas. Neither did it make sense to start six tablet servers for the test. Along the way, this patch also improves toString() output for a number of Java client structures. This makes it easier to debug Connection-level issues. Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java M java/kudu-client/src/main/java/org/apache/kudu/client/ConnectionCache.java M java/kudu-client/src/main/java/org/apache/kudu/client/RemoteTablet.java M java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java M java/kudu-client/src/main/java/org/apache/kudu/client/TableLocationsCache.java M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestRemoteTablet.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestServerInfo.java 9 files changed, 105 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/9612/1 -- To view, visit http://gerrit.cloudera.org:8080/9612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I36f96c6712800e398ed46887d97d4b09fd993b04 Gerrit-Change-Number: 9612 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #314