-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48744/
-----------------------------------------------------------

(Updated June 15, 2016, 11:20 p.m.)


Review request for mesos, Adam B and Vinod Kone.


Bugs: MESOS-2043
    https://issues.apache.org/jira/browse/MESOS-2043


Repository: mesos


Description
-------

The master, agent and scheduler all use the same timeout for
authentication attempts. As a result, an attempt can time out on the
master and on, e.g., an agent simultaneously.

If the agent then starts another authentication attempt before the
master has finished cleaning up the previous one, the master queues
the new attempt behind the existing one and subsequently notifies the
agent that the former attempt timed out. The agent, however, has
already timed out that attempt and is waiting for the new one to make
progress.

Once the master and, e.g., an agent have entered this state they will
likely move in lockstep, and it becomes highly unlikely for the agent
to successfully authenticate.

Here we change the timeout used in the agent and scheduler to avoid
this lockstep behavior. We allow for slightly more time on the
agent/scheduler side before an attempt times out. We also use a value
that makes sure that cycles of authentication attempt and timeout have
very different periods on master and agent/scheduler.
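
To illustrate why differing periods help, here is a minimal sketch
that counts how often both sides time out at the same instant. It
assumes both sides start an attempt at t = 0 and retry immediately
after each timeout, which simplifies the real retry behavior. The
function name `coincidentTimeouts` and the values used below are
hypothetical and are not taken from this patch.

```cpp
#include <cassert>
#include <cstdint>

// Counts the instants in (0, horizonMs] at which the master and the
// agent/scheduler time out simultaneously. Assumption: both sides
// start at t = 0 and retry immediately after every timeout.
std::int64_t coincidentTimeouts(
    std::int64_t masterMs,
    std::int64_t clientMs,
    std::int64_t horizonMs)
{
  assert(masterMs > 0 && clientMs > 0 && horizonMs >= 0);

  std::int64_t count = 0;

  // Walk over every agent/scheduler timeout and check whether the
  // master times out at the same instant.
  for (std::int64_t t = clientMs; t <= horizonMs; t += clientMs) {
    if (t % masterMs == 0) {
      ++count;
    }
  }

  return count;
}
```

With these illustrative numbers, equal 5 s timeouts coincide at every
expiry within one minute (12 times), whereas 5 s on the master and 7 s
on the agent/scheduler coincide only once, at 35 s.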


Diffs
-----

  src/sched/sched.cpp 9f561d73a2e591afdc3ba4adb35a11763dced402 
  src/slave/slave.cpp 0af04d6fe53f92e03905fb7b3bec72b09d5e8e57 

Diff: https://reviews.apache.org/r/48744/diff/


Testing (updated)
-------

Tested on internal CI on a collection of Linux setups.


Thanks,

Benjamin Bannier
