[ https://issues.apache.org/jira/browse/MESOS-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303633#comment-15303633 ]
Adam B commented on MESOS-2043:
-------------------------------

Unfortunately no time/assignee for this in 0.29, so I'm untargeting it for now. I am quite confident that we need to revisit our retry-on-failure logic for authn. Hopefully we'll get time+resources for this next release (or in a patch release).

> framework auth fail with timeout error and never get authenticated
> ------------------------------------------------------------------
>
>                 Key: MESOS-2043
>                 URL: https://issues.apache.org/jira/browse/MESOS-2043
>             Project: Mesos
>          Issue Type: Bug
>          Components: master, scheduler driver, security, slave
>    Affects Versions: 0.21.0
>            Reporter: Bhuvan Arumugam
>            Priority: Critical
>              Labels: mesosphere, security
>         Attachments: aurora-scheduler.20141104-1606-1706.log, master.log, mesos-master.20141104-1606-1706.log, slave.log
>
>
> I'm facing this issue in master as of https://github.com/apache/mesos/commit/74ea59e144d131814c66972fb0cc14784d3503d4
> As [~adam-mesos] mentioned in IRC, this sounds similar to MESOS-1866. I'm running 1 master and 1 scheduler (aurora).
> The framework authentication fails due to a timeout.
> Error on the mesos master:
> {code}
> I1104 19:37:17.741449 8329 master.cpp:3874] Authenticating scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083
> I1104 19:37:17.741585 8329 master.cpp:3885] Using default CRAM-MD5 authenticator
> I1104 19:37:17.742106 8336 authenticator.hpp:169] Creating new server SASL connection
> W1104 19:37:22.742959 8329 master.cpp:3953] Authentication timed out
> W1104 19:37:22.743548 8329 master.cpp:3930] Failed to authenticate scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083: Authentication discarded
> {code}
> Scheduler error:
> {code}
> I1104 19:38:57.885486 49012 sched.cpp:283] Authenticating with master master@MASTER_IP:PORT
> I1104 19:38:57.885928 49002 authenticatee.hpp:133] Creating new client SASL connection
> I1104 19:38:57.890581 49007 authenticatee.hpp:224] Received SASL authentication mechanisms: CRAM-MD5
> I1104 19:38:57.890656 49007 authenticatee.hpp:250] Attempting to authenticate with mechanism 'CRAM-MD5'
> W1104 19:39:02.891196 49005 sched.cpp:378] Authentication timed out
> I1104 19:39:02.891850 49018 sched.cpp:338] Failed to authenticate with master master@MASTER_IP:PORT: Authentication discarded
> {code}
> It looks like 2 instances, {{scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94}} and {{scheduler-d2d4437b-d375-4467-a583-362152fe065a}}, of the same framework are trying to authenticate and failing.
> {code}
> W1104 19:36:30.769420 8319 master.cpp:3930] Failed to authenticate scheduler-20f88a53-5945-4977-b5af-28f6c52d3c94@SCHEDULER_IP:8083: Failed to communicate with authenticatee
> I1104 19:36:42.701441 8328 master.cpp:3860] Queuing up authentication request from scheduler-d2d4437b-d375-4467-a583-362152fe065a@SCHEDULER_IP:8083 because authentication is still in progress
> {code}
> Restarting the master and scheduler didn't fix it.
> This particular issue happens with 1 master and 1 scheduler after MESOS-1866 is fixed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)