[ 
https://issues.apache.org/jira/browse/MESOS-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Klues updated MESOS-6948:
-------------------------------
    Priority: Critical  (was: Blocker)

I'm downgrading this from Blocker to Critical because:

1) It only happens very rarely ([~greggomann] can get it to trigger 
periodically inside his CentOS Vagrant image, but nowhere else).

2) We haven't seen it manifest in practice with the CLI tool we built around 
these APIs (e.g. I have no problem doing a quick `dcos task exec <id> printf 
output` and getting the output back).

3) Even if the error does occur in the wild, it's very rare and only happens at 
connection time. After the connection is established, things should run smoothly.

> AgentAPITest.LaunchNestedContainerSession is flaky
> --------------------------------------------------
>
>                 Key: MESOS-6948
>                 URL: https://issues.apache.org/jira/browse/MESOS-6948
>             Project: Mesos
>          Issue Type: Bug
>          Components: tests
>         Environment: CentOS 7 VM, libevent and SSL enabled
>            Reporter: Greg Mann
>            Assignee: Kevin Klues
>            Priority: Critical
>              Labels: debugging, tests
>         Attachments: AgentAPITest.LaunchNestedContainerSession.txt
>
>
> This was observed in a CentOS 7 VM, with libevent and SSL enabled:
> {code}
> I0118 22:17:23.528846  2887 http.cpp:464] Processing call 
> LAUNCH_NESTED_CONTAINER_SESSION
> I0118 22:17:23.530452  2887 containerizer.cpp:1807] Starting nested container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.532265  2887 containerizer.cpp:1831] Trying to chown 
> '/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-0000/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e'
>  to user 'vagrant'
> I0118 22:17:23.535213  2887 switchboard.cpp:570] Launching 
> 'mesos-io-switchboard' with flags '--heartbeat_interval="30secs" 
> --help="false" 
> --socket_address="/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430"
>  --stderr_from_fd="15" --stderr_to_fd="2" --stdin_to_fd="12" 
> --stdout_from_fd="13" --stdout_to_fd="1" --tty="false" 
> --wait_for_connection="true"' for container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.537210  2887 switchboard.cpp:600] Created I/O switchboard 
> server (pid: 3335) listening on socket file 
> '/tmp/mesos-io-switchboard-5a08fbd5-0d70-411e-8389-ac115a5f6430' for 
> container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.543665  2887 containerizer.cpp:1540] Launching 
> 'mesos-containerizer' with flags '--help="false" 
> --launch_info="{"command":{"shell":true,"value":"printf output && printf 
> error 
> 1>&2"},"environment":{},"err":{"fd":16,"type":"FD"},"in":{"fd":11,"type":"FD"},"out":{"fd":14,"type":"FD"},"user":"vagrant"}"
>  --pipe_read="12" --pipe_write="13" 
> --runtime_directory="/tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_QVZGrY/containers/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e"
>  --unshare_namespace_mnt="false"'
> I0118 22:17:23.556032  2887 launcher.cpp:133] Forked child with pid '3337' 
> for container 
> '492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e'
> I0118 22:17:23.563900  2887 fetcher.cpp:349] Starting to fetch URIs for 
> container: 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e, 
> directory: 
> /tmp/ContentType_AgentAPITest_LaunchNestedContainerSession_0_ykIax9/slaves/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-S0/frameworks/707fd1a2-1a93-4e9f-a9b2-5453a207b4c5-0000/executors/14a26e2a-58b7-4166-909c-c90787d84fcb/runs/492a5d0a-0060-416c-ad80-dd0441f558dc/containers/62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.962441  2887 containerizer.cpp:2481] Container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e has 
> exited
> I0118 22:17:23.962484  2887 containerizer.cpp:2118] Destroying container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e in 
> RUNNING state
> I0118 22:17:23.962715  2887 launcher.cpp:149] Asked to destroy container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e
> I0118 22:17:23.977562  2887 process.cpp:3733] Failed to process request for 
> '/slave(69)/api/v1': Container has or is being destroyed
> W0118 22:17:23.978216  2887 http.cpp:2734] Failed to attach to nested 
> container 
> 492a5d0a-0060-416c-ad80-dd0441f558dc.62c170bb-7298-4209-b797-80d7ca73353e: 
> Container has or is being destroyed
> I0118 22:17:23.978330  2887 process.cpp:1435] Returning '500 Internal Server 
> Error' for '/slave(69)/api/v1' (Container has or is being destroyed)
> ../../src/tests/api_tests.cpp:3960: Failure
> Value of: (response).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
> {code}
> Find attached the full log from a failed run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
