[jira] [Comment Edited] (MESOS-9334) Container stuck at ISOLATING state due to libevent poll never returns

2018-10-24 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661975#comment-16661975
 ] 

Qian Zhang edited comment on MESOS-9334 at 10/24/18 9:17 AM:
-

After reading some libevent code and our code to call libevent, I think the 
root cause of this issue is, after we call libevent to poll an fd, that fd is 
disabled inside libevent due to a race. Here is the flow:
 # Container1 is launched and cgroups memory subsystem calls the function 
`cgroups::memory::oom::listen()` to listen OOM event for this container, and 
that function will internally open an fd, call libevent to poll it, and return 
a future to cgroups memory subsystem.
 # Container1 exits and when we destroy it, the cleanup method of cgroups 
memory subsystem will discard the future got in #1. As the result, 
`Listener::finalize()` will be called (see [this 
code|https://github.com/apache/mesos/blob/1.7.0/src/linux/cgroups.cpp#L1069:L1087]
 for details), and it will
 ** Discard the future returned by libevent poll which will cause 
`pollDiscard()` called and that will trigger `pollCallback` to be executed 
*asynchronously* (see [this 
code|https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp#L66:L70]
 for details).
 ** Close the fd opened in #1 *immediately* which means the fd can be reused 
now.
 # Container2 is launched, and CNI isolator calls `io::read` to read the 
stdout/stderr of CNI plugin for this container. Internally `io::read` *reuses* 
the fd closed in #2 and call libevent to poll it.
 # Now the function `pollCallback` for container1 is executed, and it will 
delete the poll object which will trigger `event_free` to deallocate the event 
for this container (see [this 
code|https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp#L50:L52]
 for details). Internally `event_free` will call `event_del` => 
`event_del_internal` => `evmap_io_del` => `evsel->del` to *disable* the fd (see 
[this 
code|https://github.com/libevent/libevent/blob/release-2.0.22-stable/event-internal.h#L78:L79]
 for details), but that fd is now used to read stdout/stderr for container2 in 
#3. Since the fd is disabled inside libevent, the `io::read` we do in #3 will 
never return so the container2 will be stuck at `ISOLATING` state.


was (Author: qianzhang):
After reading some libevent code and our code to call libevent, I think the 
root cause of this issue is, after we call libevent to poll an fd, that fd is 
disabled inside libevent due to a race. Here is the flow:
 # Container1 is launched and cgroups memory subsystem calls the function 
`cgroups::memory::oom::listen()` to listen OOM event for this container, and 
that function will internally open an fd, call libevent to poll it, and return 
a future to cgroups memory subsystem.
 # Container1 exits and when we destroy it, the cleanup method of cgroups 
memory subsystem will discard the future got in #1. As the result, 
`Listener::finalize()` will be called (see [this 
code|https://github.com/apache/mesos/blob/1.7.0/src/linux/cgroups.cpp#L1069:L1087]
 for details), and it will
 ** Discard the future returned by libevent poll which will cause 
`pollDiscard()` called and that will trigger `pollCallback` to be executed 
*asynchronously* (see [this 
code|https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp#L66:L70]
 for details).
 ** Close the fd opened in #1 *immediately* which means the fd can be reused 
now.
 # Container2 is launched, and CNI isolator calls `io::read` to read the 
stdout/stderr of CNI plugin for this container. Internally `io::read` *reuses* 
the fd closed in #2 and call libevent to poll it.
 # Now the function `pollCallback` for container1 is executed, and it will 
delete the poll object which will trigger `event_free` to deallocate the event 
for this container (see [this 
code|https://github.com/apache/mesos/blob/1.7.0/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp#L50:L52]
 for details). Internally `event_free` will call `event_del` -> 
`event_del_internal` -> `evmap_io_del` -> `evsel->del` to *disable* the fd (see 
[this 
code|https://github.com/libevent/libevent/blob/release-2.0.22-stable/event-internal.h#L78:L79]
 for details), but that fd is now used to read stdout/stderr for container2 in 
#3. Since the fd is disabled inside libevent, the `io::read` we do in #3 will 
never return so the container2 will be stuck at `ISOLATING` state.

> Container stuck at ISOLATING state due to libevent poll never returns
> -
>
> Key: MESOS-9334
> URL: https://issues.apache.org/jira/browse/MESOS-9334
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Qian Zhang
>   

[jira] [Comment Edited] (MESOS-9334) Container stuck at ISOLATING state due to libevent poll never returns

2018-10-22 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658795#comment-16658795
 ] 

Qian Zhang edited comment on MESOS-9334 at 10/22/18 2:14 PM:
-

I added some logs into `libevent_poll.cpp` (see the diff below for details) and 
reproduced this issue a couple of times.
{code:java}
--- a/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp
+++ b/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp
@@ -32,11 +34,17 @@ struct Poll
 };
 
 
-void pollCallback(evutil_socket_t, short what, void* arg)
+void pollCallback(evutil_socket_t fd, short what, void* arg)
 {
   Poll* poll = reinterpret_cast(arg);
 
+  LOG(INFO) << "==pollCallback starts with fd " << fd
+<< " and with poll " << poll << "==";
+
   if (poll->promise.future().hasDiscard()) {
+LOG(INFO) << "==pollCallback discards with fd "
+  << fd << "==";
+
 poll->promise.discard();
   } else {
 // Convert libevent specific EV_READ / EV_WRITE to io::* specific
@@ -44,17 +52,24 @@ void pollCallback(evutil_socket_t, short what, void* arg)
 short events =
   ((what & EV_READ) ? io::READ : 0) | ((what & EV_WRITE) ? io::WRITE : 0);
 
+LOG(INFO) << "==pollCallback sets promise with fd " << fd
+  << " and with events " << events << "==";
+
 poll->promise.set(events);
   }
 
   // Deleting the `poll` also destructs `ev` and hence triggers `event_free`,
   // which makes the event non-pending.
   delete poll;
+
+  LOG(INFO) << "==pollCallback ends with fd " << fd << "==";
 }
 
 
 void pollDiscard(const std::weak_ptr& ev, short events)
 {
+  LOG(INFO) << "==pollDiscard is called==";
+
   // Discarding inside the event loop prevents `pollCallback()` from being
   // called twice if the future is discarded.
   run_in_event_loop([=]() {
@@ -78,6 +93,9 @@ Future poll(int_fd fd, short events)
 
   internal::Poll* poll = new internal::Poll();
 
+  LOG(INFO) << "==libevent starts polling with fd " << fd
+<< " and with poll " << poll << "==";
+
   Future future = poll->promise.future();
 
   // Convert io::READ / io::WRITE to libevent specific values of these
{code}
Here is what I found in the agent log when this issue occurred (fd 48 is the 
stderr file descriptor of `NetworkCniIsolatorSetup`):
{code:java}
I1021 15:57:45.00 2116 libevent_poll.cpp:96] ==libevent starts 
polling with fd 48 and with poll 0x7f60df029eb0==
I1021 15:57:45.00 2117 libevent_poll.cpp:41] ==pollCallback starts 
with fd 48 and with poll 0x7f60e6e56c70==
I1021 15:57:45.00 2117 libevent_poll.cpp:45] ==pollCallback 
discards with fd 48==
I1021 15:57:45.00 2117 libevent_poll.cpp:65] ==pollCallback ends 
with fd 48==
{code}
We can see libevent started to poll fd 48 with the poll object 0x7f60df029eb0, 
but when `pollCallback` was called for fd 48, the poll object is different 
(0x7f60e6e56c70) which has been discarded (see the third log line) !!! 

And when I searched 0x7f60e6e56c70 in the agent log, I found:
{code:java}
I1021 15:57:22.00 2115 memory.cpp:478] Started listening for OOM events for 
container 4753a5ef-eccd-4373-b8f0-a4b40abf0fb5
I1021 15:57:22.00 2115 libevent_poll.cpp:96] ==libevent starts 
polling with fd 48 and with poll 0x7f60e6e56c70==
{code}
So the poll object 0x7f60e6e56c70 was created 23 seconds ago with the same file 
descriptor (fd 48) which is used to listen OOM events for another container 
(4753a5ef-eccd-4373-b8f0-a4b40abf0fb5), and that container was destroyed at 
15:57:44 right before agent started to wait the stderr of 
`NetworkCniIsolatorSetup` (15:57:45).
{code:java}
I1021 15:57:44.00  2114 containerizer.cpp:2459] Destroying container 
4753a5ef-eccd-4373-b8f0-a4b40abf0fb5 in RUNNING state
{code}
I reproduced this issue a couple of times, the observations I had from agent 
log are always same as the above.


was (Author: qianzhang):
I added some logs into `libevent_poll.cpp` (see the diff below for details) and 
reproduced this issue a couple of times.
{code:java}
--- a/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp
+++ b/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp
@@ -32,11 +34,17 @@ struct Poll
 };
 
 
-void pollCallback(evutil_socket_t, short what, void* arg)
+void pollCallback(evutil_socket_t fd, short what, void* arg)
 {
   Poll* poll = reinterpret_cast(arg);
 
+  LOG(INFO) << "==pollCallback starts with fd " << fd
+<< " and with poll " << poll << "==";
+
   if (poll->promise.future().hasDiscard()) {
+LOG(INFO) << "==pollCallback discards with fd "
+  << fd << "==";
+
 poll->promise.discard();
   } else {
 // Convert libevent specific EV_READ / EV_WRITE to io::* 

[jira] [Comment Edited] (MESOS-9334) Container stuck at ISOLATING state due to libevent poll never returns

2018-10-22 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658795#comment-16658795
 ] 

Qian Zhang edited comment on MESOS-9334 at 10/22/18 9:55 AM:
-

I added some logs into `libevent_poll.cpp` (see the diff below for details) and 
reproduced this issue a couple of times.
{code:java}
--- a/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp
+++ b/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp
@@ -32,11 +34,17 @@ struct Poll
 };
 
 
-void pollCallback(evutil_socket_t, short what, void* arg)
+void pollCallback(evutil_socket_t fd, short what, void* arg)
 {
   Poll* poll = reinterpret_cast(arg);
 
+  LOG(INFO) << "==pollCallback starts with fd " << fd
+<< " and with poll " << poll << "==";
+
   if (poll->promise.future().hasDiscard()) {
+LOG(INFO) << "==pollCallback discards with fd "
+  << fd << "==";
+
 poll->promise.discard();
   } else {
 // Convert libevent specific EV_READ / EV_WRITE to io::* specific
@@ -44,17 +52,24 @@ void pollCallback(evutil_socket_t, short what, void* arg)
 short events =
   ((what & EV_READ) ? io::READ : 0) | ((what & EV_WRITE) ? io::WRITE : 0);
 
+LOG(INFO) << "==pollCallback sets promise with fd " << fd
+  << " and with events " << events << "==";
+
 poll->promise.set(events);
   }
 
   // Deleting the `poll` also destructs `ev` and hence triggers `event_free`,
   // which makes the event non-pending.
   delete poll;
+
+  LOG(INFO) << "==pollCallback ends with fd " << fd << "==";
 }
 
 
 void pollDiscard(const std::weak_ptr& ev, short events)
 {
+  LOG(INFO) << "==pollDiscard is called==";
+
   // Discarding inside the event loop prevents `pollCallback()` from being
   // called twice if the future is discarded.
   run_in_event_loop([=]() {
@@ -78,6 +93,9 @@ Future poll(int_fd fd, short events)
 
   internal::Poll* poll = new internal::Poll();
 
+  LOG(INFO) << "==libevent starts polling with fd " << fd
+<< " and with poll " << poll << "==";
+
   Future future = poll->promise.future();
 
   // Convert io::READ / io::WRITE to libevent specific values of these
{code}
Here is what I found in the agent log when this issue occurred (fd 48 is the 
stderr file descriptor of `NetworkCniIsolatorSetup`):
{code:java}
I1021 15:57:45.00 2116 libevent_poll.cpp:96] ==libevent starts 
polling with fd 48 and with poll 0x7f60df029eb0==
I1021 15:57:45.00 2117 libevent_poll.cpp:41] ==pollCallback starts 
with fd 48 and with poll 0x7f60e6e56c70==
I1021 15:57:45.00 2117 libevent_poll.cpp:45] ==pollCallback 
discards with fd 48==
I1021 15:57:45.00 2117 libevent_poll.cpp:65] ==pollCallback ends 
with fd 48==
{code}
We can see libevent started to poll fd 48 with the poll object whose address is 
0x7f60df029eb0, but when `pollCallback` was called for fd 48, the address of 
poll object is different (0x7f60e6e56c70) which has been discarded (see the 
third log line) !!! 

And when I searched 0x7f60e6e56c70 in the agent log, I found:
{code:java}
I1021 15:57:22.00 2115 memory.cpp:478] Started listening for OOM events for 
container 4753a5ef-eccd-4373-b8f0-a4b40abf0fb5
I1021 15:57:22.00 2115 libevent_poll.cpp:96] ==libevent starts 
polling with fd 48 and with poll 0x7f60e6e56c70==
{code}
So the poll object 0x7f60e6e56c70 was created 23 seconds ago with the same file 
descriptor (fd 48) which is used to listen OOM events for another container 
(4753a5ef-eccd-4373-b8f0-a4b40abf0fb5), and that container was destroyed at 
15:57:44 right before agent started to wait the stderr of 
`NetworkCniIsolatorSetup` (15:57:45).
{code:java}
I1021 15:57:44.00  2114 containerizer.cpp:2459] Destroying container 
4753a5ef-eccd-4373-b8f0-a4b40abf0fb5 in RUNNING state
{code}
I reproduced this issue a couple of times, the observations I had from agent 
log are same as the above.


was (Author: qianzhang):
I added some logs into `libevent_poll.cpp` (see the diff below for details) and 
reproduced this issue a couple of times.
{code:java}
--- a/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp
+++ b/3rdparty/libprocess/src/posix/libevent/libevent_poll.cpp
@@ -32,11 +34,17 @@ struct Poll
 };
 
 
-void pollCallback(evutil_socket_t, short what, void* arg)
+void pollCallback(evutil_socket_t fd, short what, void* arg)
 {
   Poll* poll = reinterpret_cast(arg);
 
+  LOG(INFO) << "==pollCallback starts with fd " << fd
+<< " and with poll " << poll << "==";
+
   if (poll->promise.future().hasDiscard()) {
+LOG(INFO) << "==pollCallback discards with fd "
+  << fd << "==";
+
 poll->promise.discard();
   } else {
 // Convert libevent specific EV_READ /