[kudu-CR] KUDU-1034 client does not failover due to timeout

2017-05-21 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6924

to look at the new patch set (#2).

Change subject: KUDU-1034 client does not failover due to timeout
..

KUDU-1034 client does not failover due to timeout

This patch fixes the issue described by KUDU-1034: when an RPC times out,
the client does not mark the unresponsive tablet server as failed, and it
keeps sending subsequent requests to that server over and over again,
even if other tablet replicas are available.
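A minimal C++ sketch of the behavior the fix aims for. This is not the actual code in src/kudu/rpc/retriable_rpc.h: `RpcOutcome`, `ReplicaPicker`, `SendWithFailover`, and `DemoFailover` are hypothetical names invented for illustration. What it shows is that a replica that timed out is marked failed just like one that returned a network error, so the next attempt goes to a different replica.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical outcome of a single RPC attempt (not Kudu's Status type).
enum class RpcOutcome { kOk, kTimedOut, kNetworkError };

// Hypothetical stand-in for a client-side replica picker: it returns the
// first replica that has not been marked failed yet.
class ReplicaPicker {
 public:
  explicit ReplicaPicker(std::vector<std::string> replicas)
      : replicas_(std::move(replicas)) {}

  std::optional<std::string> Pick() const {
    for (const auto& r : replicas_) {
      if (failed_.count(r) == 0) return r;
    }
    return std::nullopt;  // every replica has been marked failed
  }

  void MarkFailed(const std::string& replica) { failed_.insert(replica); }

 private:
  std::vector<std::string> replicas_;
  std::unordered_set<std::string> failed_;
};

// Sends via `send` until one replica succeeds or all are exhausted.
// Any non-OK outcome, including kTimedOut, marks the replica as failed,
// so the following attempt is routed to a different replica.
template <typename SendFn>
std::optional<std::string> SendWithFailover(ReplicaPicker* picker,
                                            SendFn send) {
  assert(picker != nullptr);
  while (auto replica = picker->Pick()) {
    if (send(*replica) == RpcOutcome::kOk) return replica;
    picker->MarkFailed(*replica);
  }
  return std::nullopt;
}

// Hypothetical scenario: "ts-1" always times out.
std::optional<std::string> DemoFailover() {
  ReplicaPicker picker({"ts-1", "ts-2", "ts-3"});
  return SendWithFailover(&picker, [](const std::string& r) {
    return r == "ts-1" ? RpcOutcome::kTimedOut : RpcOutcome::kOk;
  });
}
```

In `DemoFailover` the second attempt lands on `ts-2`. The bug described above corresponds to skipping the `MarkFailed` call on a timeout: every attempt would then go back to `ts-1` (in this sketch forever, since it models no deadline).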

Besides the actual fix, this patch incorporates an integration test
(RaftConsensusITest.TestClientFailoverOnLeaderTimeout) written by Mike.

Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/rpc/retriable_rpc.h
3 files changed, 58 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/6924/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6924
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icfcece485e4053d921ffdc865612b3e7b9a992a3
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2021 retry master RPC if negotiation times out

2017-05-21 Thread Alexey Serbin (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6927

to look at the new patch set (#2).

Change subject: KUDU-2021 retry master RPC if negotiation times out
..

KUDU-2021 retry master RPC if negotiation times out

This patch fixes KUDU-2021: with this patch, in a multi-master Kudu
cluster, the Kudu C++ client retries an RPC with another master if
connection negotiation with the leader master times out.
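A similar sketch for the master side, again hypothetical: `MasterRpcError`, `MasterRpcResult`, and `CallMasterWithRetry` are not Kudu client APIs, and the real logic in src/kudu/client/client-internal.cc is not reproduced here (deadlines and leader changes, for instance, are not modeled). It assumes the client holds a list of master addresses plus the index of the presumed leader. A negotiation timeout moves on to the next master, while any other error is returned without a retry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical classification of one master RPC attempt.
enum class MasterRpcError { kNone, kNegotiationTimedOut, kNotAuthorized };

struct MasterRpcResult {
  MasterRpcError error;
  std::string served_by;  // master contacted on the last attempt
};

// Tries the presumed leader first. If connection negotiation with it times
// out, tries the next master (wrapping around) instead of failing the call.
// Any other outcome, success or a different error, stops the loop.
template <typename CallFn>
MasterRpcResult CallMasterWithRetry(const std::vector<std::string>& masters,
                                    std::size_t leader_idx, CallFn call) {
  assert(!masters.empty() && leader_idx < masters.size());
  MasterRpcResult result{MasterRpcError::kNegotiationTimedOut, ""};
  for (std::size_t i = 0; i < masters.size(); ++i) {
    const std::string& master = masters[(leader_idx + i) % masters.size()];
    result = {call(master), master};
    if (result.error != MasterRpcError::kNegotiationTimedOut) break;
  }
  return result;
}
```

For example, with masters `{"m1", "m2", "m3"}`, leader index 0, and `m1` timing out during negotiation, the call is served by `m2`; if `m1` instead returns `kNotAuthorized`, that error is surfaced immediately without contacting the other masters.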

Added a new integration test to cover the updated client behavior.

Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
---
M src/kudu/client/client-internal.cc
M src/kudu/integration-tests/client_failover-itest.cc
M src/kudu/integration-tests/master_migration-itest.cc
3 files changed, 110 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/6927/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Fix flaky test TestRecoverFromOpIdOverflow (again)

2017-05-21 Thread Mike Percy (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6943

to look at the new patch set (#2).

Change subject: Fix flaky test TestRecoverFromOpIdOverflow (again)
..

Fix flaky test TestRecoverFromOpIdOverflow (again)

The previous attempt to fix this in commit
f0580499dc50e8a47ff6251301cdc15b9b79edcb had a flaw, but this patch
really does fix the primary source of the flakiness. What appears to
have happened with the previous attempt is that the dist-test run passed,
and then I made a couple of additional tweaks before committing it that
actually broke it again.

The only "real" code change (the aforementioned fix) is on lines
L367-L371; however, while I was in this test, I also "modernized" it a bit
by making it inherit from ExternalMiniClusterITestBase, which resulted in
a net-negative line count in this patch.

I ran the current version of this patch on dist-test in DEBUG mode with
8 CPU stress threads, and 199/200 runs passed (without this fix, the
failure rate with 8 stress threads is nearly 50%). The one run that failed
actually timed out (with no logs, so I have no idea what went wrong),
but it is likely caused by some unrelated (infrastructure?) issue.
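To put the 199/200 result in perspective, here is a rough binomial check. It assumes the 200 runs are independent and that the unfixed failure rate is exactly 50% (the message says "nearly 50%"), and it conservatively counts the one timed-out run as a failure. Under those assumptions, the chance of seeing at most one failure in 200 runs without the fix is:

```latex
P(X \le 1 \mid n = 200,\ p = 0.5)
  = \binom{200}{0} 0.5^{200} + \binom{200}{1} 0.5^{200}
  = \frac{201}{2^{200}}
  \approx 1.3 \times 10^{-58}
```

A somewhat lower unfixed failure rate still gives a vanishingly small value, so the result is consistent with the patch removing the main source of flakiness rather than with a lucky streak.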

This was the dist-test job:

http://dist-test.cloudera.org/job?job_id=mpercy.1495321732.3266

Change-Id: I1f7326136479311ba2a84b384327e07d280df7c3
---
M src/kudu/integration-tests/ts_recovery-itest.cc
1 file changed, 30 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/43/6943/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6943
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f7326136479311ba2a84b384327e07d280df7c3
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins