[jira] [Comment Edited] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-17 Benjamin Mahler (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872263#comment-15872263 ]

Benjamin Mahler edited comment on MESOS-7122 at 2/17/17 6:31 PM:
-

{quote}
I don't think I understood you correctly here. It sounds like you are saying 
that every API that returns a Future ought to be a separate thread? Since 
basically everything returns a Future, that doesn't seem practical.
{quote}

Right, that was the point :). The reasoning seems flawed given it can be 
applied everywhere but something about the reaper is special.


was (Author: bmahler):
{code}
I don't think I understood you correctly here. It sounds like you are saying 
that every API that returns a Future ought to be a separate thread? Since 
basically everything returns a Future, that doesn't seem practical.
{code}

Right, that was the point :). The reasoning seems flawed given it can be 
applied everywhere but something about the reaper is special.

> Process reaper should have a dedicated thread to avoid deadlock.
> 
>
> Key: MESOS-7122
> URL: https://issues.apache.org/jira/browse/MESOS-7122
> Project: Mesos
> Issue Type: Bug
> Components: libprocess
> Reporter: James Peach
>
> In a test environment, we saw that libprocess can deadlock when the process 
> reaper is unable to run. 
> This happens in the Mesos HDFS client, which synchronously runs a {{hadoop}} 
> subprocess. If this happens too many times, the {{ReaperProcess}} is never 
> scheduled to reap the subprocess statuses. Since the HDFS {{Future}} never 
> completes, we deadlock with all the threads in the call stack below. If there 
> were a dedicated thread for the {{ReaperProcess}} to run on, or some other way 
> to ensure that it is scheduled, we could avoid the deadlock.
> {noformat}
> #0  0x7f67b6ffc68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x7f67b6da12fc in std::condition_variable::wait(std::unique_lock&) () from /usr/lib64/libstdc++.so.6
> #2  0x7f67b8b864f6 in process::ProcessManager::wait(process::UPID const&) () from /usr/lib64/libmesos-1.2.0.so
> #3  0x7f67b8b8d347 in process::wait(process::UPID const&, Duration const&) () from /usr/lib64/libmesos-1.2.0.so
> #4  0x7f67b8b51a85 in process::Latch::await(Duration const&) () from /usr/lib64/libmesos-1.2.0.so
> #5  0x7f67b834fc9f in process::Future::await(Duration const&) const () from /usr/lib64/libmesos-1.2.0.so
> #6  0x7f67b833d700 in 
> mesos::internal::slave::fetchSize(std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&) () from /usr/lib64/libmesos-1.2.0.so
> #7  0x7f67b833df5e in 
> std::result_of  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()>::type 
> process::AsyncExecutorProcess::execute  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2}>(std::result_of const&, 
> boost::disable_if const&::is_void  const&, mesos::CommandInfo const&, std::basic_string std::char_traits, std::allocator > const&, 
> Option 
> > const&, mesos::SlaveID const&, mesos::internal::slave::Flags 
> const&)::{lambda()#2} ()()> >, void>::type*) () from 
> /usr/lib64/libmesos-1.2.0.so
> #8  0x7f67b833a3d5 in std::_Function_handler ()(process::ProcessBase*), process::Future > 
> process::dispatch, process::AsyncExecutorProcess, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> const&, mesos::internal::slave::Flags const&)::{lambda()#2} const&, void*, 
> {lambda()#2}, 
> mesos::internal::slave::FetcherProcess::fetch(mesos::ContainerID const&, 
> mesos::CommandInfo const&, std::basic_string std::allocator > const&, Option std::char_traits, std::allocator > > const&, mesos::SlaveID 
> 

[jira] [Comment Edited] (MESOS-7122) Process reaper should have a dedicated thread to avoid deadlock.

2017-02-16 Benjamin Mahler (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870986#comment-15870986 ]

Benjamin Mahler edited comment on MESOS-7122 at 2/17/17 1:13 AM:
-

[~xujyan] I'm not so sure, since this ticket to me seems to just be a specific 
case of having blocking in actors and an insufficient number of worker threads. 
Generalizing the suggestion in this ticket seems to imply having extraneous 
threads for more than just the reaper?

{quote}
This happens in the Mesos HDFS client, which synchronously runs a hadoop 
subprocess.
{quote}

Does this mean that there is blocking in the hdfs client? Can we remove the 
blocking?


was (Author: bmahler):
[~xujyan] I'm not so sure, since this ticket to me seems to just be a specific 
case of having blocking in actors and an insufficient number of worker threads. 
Generalizing the suggestion in this ticket seems to imply having extraneous 
threads for more than just the reaper?
