[ https://issues.apache.org/jira/browse/MESOS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Klues updated MESOS-6948:
-------------------------------
    Priority: Critical  (was: Blocker)

I'm moving this to a Critical bug rather than a blocker because:

1) It only happens very rarely ([~greggomann] can get it to trigger periodically inside his CentOS Vagrant image, but nowhere else).
2) We haven't seen it manifest in practice with the CLI tool we built around these APIs (e.g. I have no problem doing a quick `dcos task exec <id> printf output` and getting the output back).
3) Even if there is an error in the wild, it's very rare and only happens at connection time. After the connection is established, things should run smoothly.

> AgentAPITest.LaunchNestedContainerSession is flaky
> --------------------------------------------------
>
>                 Key: MESOS-6948
>                 URL: https://issues.apache.org/jira/browse/MESOS-6948
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>         Environment: CentOS 7 VM, libevent and SSL enabled
>            Reporter: Greg Mann
>            Assignee: Kevin Klues
>            Priority: Critical
>              Labels: debugging, tests
>         Attachments: AgentAPITest.LaunchNestedContainerSession.txt
>
>
> This was observed in a CentOS 7 VM, with libevent and SSL enabled:
> {code}
> I0118 22:17:23.528846 2887 http.cpp:464] Processing call LAUNCH_NESTED_CONTAINER_SESSION
> I0118 22:17:23.530452 2887 containerizer.cpp:1807] Starting nested container 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.532265 2887 containerizer.cpp:1831] Trying to chown '/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-0000/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e' to user 'vagrant'
> I0118 22:17:23.535213 2887 switchboard.cpp:570] Launching 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" --help="false" --socket_address="/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430" --stderr_from_fd="15" --stderr_to_fd="2" --stdin_to_fd="12" --stdout_from_fd="13" --stdout_to_fd="1" --tty="false" --wait_for_connection="true"' for container 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.537210 2887 switchboard.cpp:600] Created I/O switchboard server (pid: 3335) listening on socket file '/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430' for container 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.543665 2887 containerizer.cpp:1540] Launching 'mesos-containerizer' with flags '--help="false" --launch_info="{"command":{"shell":true,"value":"printf output && printf error 1>&2"},"environment":{},"err":{"fd":16,"type":"FD"},"in":{"fd":11,"type":"FD"},"out":{"fd":14,"type":"FD"},"user":"vagrant"}" --pipe_read="12" --pipe_write="13" --runtime_directory="/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_QVZGrY/containers/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e" --unshare_namespace_mnt="false"'
> I0118 22:17:23.556032 2887 launcher.cpp:133] Forked child with pid '3337' for container '492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e'
> I0118 22:17:23.563900 2887 fetcher.cpp:349] Starting to fetch URIs for container: 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e, directory: /tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-0000/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.962441 2887 containerizer.cpp:2481] Container 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e has exited
> I0118 22:17:23.962484 2887 containerizer.cpp:2118] Destroying container 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e in RUNNING state
> I0118 22:17:23.962715 2887 launcher.cpp:149] Asked to destroy container 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.977562 2887 process.cpp:3733] Failed to process request for '/slave(69)/api/v1': Container has or is being destroyed
> W0118 22:17:23.978216 2887 http.cpp:2734] Failed to attach to nested container 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e: Container has or is being destroyed
> I0118 22:17:23.978330 2887 process.cpp:1435] Returning '500 Internal Server Error' for '/slave(69)/api/v1' (Container has or is being destroyed)
> ../../src/tests/api_tests.cpp:3960: Failure
> Value of: (response).get().status
> Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> {code}
> Find attached the full log from a failed run.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)