[kudu-CR] [tests] fixed flake in consensus peer health status

Alexey Serbin (Code Review) Mon, 30 Apr 2018 11:27:22 -0700

Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit


    http://gerrit.cloudera.org:8080/10237

to look at the new patch set (#2).

Change subject: [tests] fixed flake in consensus_peer_health_status
......................................................................

[tests] fixed flake in consensus_peer_health_status

Fixed a flake in the TestPeerHealthStatusTransitions scenario of the
ConsensusPeerHealthStatusITest test.  Prior to the fix, the flakiness
happened when the target tablet server was shut down during an ongoing
tablet copy initiated by an AddServer() call while the mini-cluster was
being prepared for the peer health sequence of
"HEALTHY -> UNKNOWN -> FAILED -> FAILED_UNRECOVERABLE".
In that situation, the source tablet server kept the corresponding WAL
segments anchored for the tablet copy session, so they could not be
GCed.  As a result, the tablet replica would not get the
FAILED_UNRECOVERABLE health status within 30 seconds, because
--tablet_copy_idle_timeout_sec is set to 600 seconds by default.

Test results from dist-test (ASAN), before and after the fix:

before:
  http://dist-test.cloudera.org/job?job_id=aserbin.1525111507.120645

after:
  http://dist-test.cloudera.org/job?job_id=aserbin.1525099526.63998

Change-Id: I8eb640604e98361029aa3342ffa3050e922b6629
---
M src/kudu/integration-tests/consensus_peer_health_status-itest.cc
1 file changed, 6 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/37/10237/2
--

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8eb640604e98361029aa3342ffa3050e922b6629
Gerrit-Change-Number: 10237
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
