Alexey Serbin has posted comments on this change.

Change subject: KUDU-2021 test for negotiation timeout on Master RPCs
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6927/6//COMMIT_MSG
Commit Message:

Line 7: KUDU-2021 retry master RPC if negotiation times out
> How about:
That looks better!

I would shorted it a bit, though:

KUDU-2021 test for negotiation timeout on Master RPCs


http://gerrit.cloudera.org:8080/#/c/6927/6/src/kudu/integration-tests/client_failover-itest.cc
File src/kudu/integration-tests/client_failover-itest.cc:

Line 484:   static const int kTimeoutMs = 1 * 60 * 1000;
> nit: how about const MonoDelta kTimeout = MonoDelta::FromSeconds(60)
yep, that's much better.


Line 488:   cluster_opts_.master_rpc_ports = { 37030, 37031, 37032 };
> huh. do we have to pick these manually?
Yep, that's the ugly piece of all multi-master tests.  The crux is that all 
masters should know other master's end-points  when starting up.

The fix would be: make probing on ports, somehow reserve them, and then start 
the masters with the reserved ports.


Line 540:   ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms", 
"500"));
> It's not obvious to me why a 500ms negotiation delay is sufficient to cause
It seems this went this state after some iterations.

It supposed to be something like the following:

  ASSERT_OK(cluster_->SetFlag(m, "rpc_negotiation_inject_delay_ms",
                              Substitute("$0", 
FLAGS_rpc_negotiation_timeout_ms)));
  thread pause_thread([&]() {
      SleepFor(MonoDelta::FromMilliseconds(FLAGS_rpc_negotiation_timeout_ms / 
2));
      CHECK_OK(m->Pause())
    });


Line 542:       SleepFor(MonoDelta::FromSeconds(3));
> What's the purpose of this sleep? Is it so this executes after ListTables()
Yep, it seem that was a typo.  I added corresponding comment into the code 
regarding purpose of this sleep.  Basically, we want the former leader master 
to lose its leadership role, but only after the client has connected and 
started connection negotiation.


Line 548:       ScopedResumeExternalDaemon resumer(m);
> CHECK_OK(m->Resume()); ?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6927
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib62126c9d8c6c65f447c5d03a0377eaff823393c
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to