[ https://issues.apache.org/jira/browse/MESOS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318666#comment-16318666 ]

Andrei Budnik commented on MESOS-7742:
--------------------------------------

Since the test launches the 
[cat|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/src/tests/api_tests.cpp#L6529]
 command as a nested container, the related ioswitchboard process is in the 
same process group. Whenever the process group leader ({{cat}}) terminates, 
all processes in the process group are killed, including ioswitchboard.
ioswitchboard handles HTTP requests from the agent, e.g. the 
{{ATTACH_CONTAINER_INPUT}} request in this test.
Usually, after reading all of the client's data, {{Http::_attachContainerInput()}} 
invokes a callback which calls 
[writer.close()|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/src/slave/http.cpp#L3223].
[Closing the writer|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/3rdparty/libprocess/src/http.cpp#L561]
 results in sending 
[\r\n\r\n|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/3rdparty/libprocess/src/http.cpp#L1045]
 to the ioswitchboard process.
ioswitchboard then returns a 
[200 OK|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/src/slave/containerizer/mesos/io/switchboard.cpp#L1572]
 response, so the agent returns {{200 OK}} for the {{ATTACH_CONTAINER_INPUT}} 
request, as expected.
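
Assuming the streamed {{ATTACH_CONTAINER_INPUT}} request body uses HTTP/1.1 chunked transfer coding, the {{\r\n\r\n}} sent on {{writer.close()}} is the tail of the last (zero-size) chunk. The helpers below are a hypothetical Python sketch of that framing, not libprocess APIs: {{encode_stream}} appends the last chunk that closing the writer stands for, and {{body_complete}} checks whether a receiver has seen the whole body, terminator included.

```python
from typing import Iterable


def encode_chunk(data: bytes) -> bytes:
    # One HTTP/1.1 chunk: payload size in hex, CRLF, payload, CRLF.
    return b"%x\r\n" % len(data) + data + b"\r\n"


def encode_stream(chunks: Iterable[bytes]) -> bytes:
    # Empty pieces are skipped: a zero-size chunk would end the body early.
    body = b"".join(encode_chunk(c) for c in chunks if c)
    # Last chunk ("0" CRLF) plus an empty trailer (CRLF): what close() stands for here.
    return body + b"0\r\n\r\n"


def body_complete(raw: bytes) -> bool:
    # True only if `raw` holds a full chunked body, last chunk included.
    # Assumption: no trailer fields; chunk extensions after ';' are ignored.
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol < 0:
            return False  # size line not fully received
        size = int(raw[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            return raw.startswith(b"\r\n", pos)  # final CRLF of the last chunk
        pos += size + 2  # skip the payload and its trailing CRLF
        if pos > len(raw):
            return False  # payload cut off mid-chunk
```

For example, {{body_complete(b"5\r\nhello\r\n")}} is false: the data arrived, but the terminator did not, which is the window in which ioswitchboard can die.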

However, if ioswitchboard terminates before it receives {{\r\n\r\n}}, or before 
the agent receives the {{200 OK}} response from ioswitchboard, the connection 
(via a unix socket) might be closed. The corresponding {{ConnectionProcess}} then 
handles this as an unexpected 
[EOF|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/3rdparty/libprocess/src/http.cpp#L1293]
 during the 
[read|https://github.com/apache/mesos/blob/3290b401d20f2db2933294470ea8a2356a47c305/3rdparty/libprocess/src/http.cpp#L1216]
 of a response, which leads to a {{500 Internal Server Error}} response from 
the agent.
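
The EOF case can be sketched as a small hypothetical Python model (names are illustrative, not libprocess APIs). An empty chunk stands for EOF on the unix socket; if it arrives before the response headers are complete, the reader fails with {{Disconnected}} (mirroring the body in Flavour 2 below), and the agent side maps that to a 500 instead of forwarding ioswitchboard's status.

```python
from typing import Iterable, Tuple


class Disconnected(Exception):
    """The peer closed the connection before a full response arrived."""


def read_status(chunks: Iterable[bytes]) -> str:
    # Accumulate bytes until the blank line that ends the response headers.
    # An empty chunk models EOF (ioswitchboard gone, socket closed).
    # Simplification: the response body, if any, is not read.
    buf = b""
    for chunk in chunks:
        if not chunk:
            break
        buf += chunk
        if b"\r\n\r\n" in buf:
            status_line = buf.split(b"\r\n", 1)[0].decode()
            return status_line.split(" ", 1)[1]  # "HTTP/1.1 200 OK" -> "200 OK"
    raise Disconnected("Disconnected")


def agent_reply(chunks: Iterable[bytes]) -> Tuple[str, str]:
    # Hypothetical agent side: forward the upstream status, or map EOF to a 500.
    try:
        return read_status(chunks), ""
    except Disconnected as e:
        return "500 Internal Server Error", str(e)
```

Feeding it {{[b"HTTP/1.1 200", b""]}} yields {{("500 Internal Server Error", "Disconnected")}}: ioswitchboard had started to answer, but the socket closed before the response was complete.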

> ContentType/AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-7742
>                 URL: https://issues.apache.org/jira/browse/MESOS-7742
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Vinod Kone
>            Assignee: Andrei Budnik
>              Labels: flaky-test, mesosphere-oncall
>         Attachments: AgentAPITest.LaunchNestedContainerSession-badrun.txt, 
> LaunchNestedContainerSessionDisconnected-badrun.txt
>
>
> Observed this on ASF CI and internal Mesosphere CI. Affected tests:
> {noformat}
> AgentAPIStreamingTest.AttachInputToNestedContainerSession
> AgentAPITest.LaunchNestedContainerSession
> AgentAPITest.AttachContainerInputAuthorization/0
> AgentAPITest.LaunchNestedContainerSessionWithTTY/0
> AgentAPITest.LaunchNestedContainerSessionDisconnected/1
> {noformat}
> This issue comes in at least three different flavours. Take 
> {{AgentAPIStreamingTest.AttachInputToNestedContainerSession}} as an example.
> h5. Flavour 1
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "503 Service Unavailable"
> Expected: http::OK().status
> Which is: "200 OK"
>     Body: ""
> {noformat}
> h5. Flavour 2
> {noformat}
> ../../src/tests/api_tests.cpp:6473
> Value of: (response).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
>     Body: "Disconnected"
> {noformat}
> h5. Flavour 3
> {noformat}
> /home/ubuntu/workspace/mesos/Mesos_CI-build/FLAG/CMake/label/mesos-ec2-ubuntu-16.04/mesos/src/tests/api_tests.cpp:6367
> Value of: (sessionResponse).get().status
>   Actual: "500 Internal Server Error"
> Expected: http::OK().status
> Which is: "200 OK"
>     Body: ""
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
