[
https://issues.apache.org/jira/browse/MESOS-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005501#comment-14005501
]
Jie Yu commented on MESOS-1404:
-------------------------------
Consider using "posix_spawn"?
> 'execute()' in mesos_containerizer.cpp is not async signal safe
> ---------------------------------------------------------------
>
> Key: MESOS-1404
> URL: https://issues.apache.org/jira/browse/MESOS-1404
> Project: Mesos
> Issue Type: Bug
> Reporter: Jie Yu
>
> This is because 'fork()' is not implemented as async-signal-safe in glibc,
> although according to POSIX it should be. When the child tries to execute
> the commands returned by the isolators' prepare(), it uses os::system,
> which calls 'fork()'.
> I observed this stack trace while debugging a deadlock:
> {noformat}
> (gdb) bt
> #0 0x00007f8fb2d5d2ce in __lll_lock_wait_private () from /lib64/libc.so.6
> #1 0x00007f8fb2ce1d8e in _L_lock_44 () from /lib64/libc.so.6
> #2 0x00007f8fb2cdab4c in ptmalloc_lock_all () from /lib64/libc.so.6
> #3 0x00007f8fb2d11d65 in fork () from /lib64/libc.so.6
> #4 0x00007f8fb4e898de in system (command=..., directory=<value optimized out>, envp=..., uid=0, gid=0, redirectIO=<value optimized out>, pipeRead=29, pipeWrite=30, commands=std::list = {...})
>     at ../../../mesos/3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp:558
> #5 mesos::internal::slave::execute (command=..., directory=<value optimized out>, envp=..., uid=0, gid=0, redirectIO=<value optimized out>, pipeRead=29, pipeWrite=30, commands=std::list = {...})
>     at ../../../mesos/src/slave/containerizer/mesos_containerizer.cpp:483
> #6 0x00007f8fb4e97bab in __call<, 0, 1, 2, 3, 4, 5, 6, 7, 8> (__functor=<value optimized out>)
>     at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/functional:1137
> #7 operator()<> (__functor=<value optimized out>)
>     at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/functional:1191
> #8 std::tr1::_Function_handler<int(), std::tr1::_Bind<int (*(mesos::CommandInfo, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, os::ExecEnv, unsigned int, unsigned int, bool, int, int, std::list<Option<mesos::CommandInfo>, std::allocator<Option<mesos::CommandInfo> > >))(const mesos::CommandInfo&, const std::string&, const os::ExecEnv&, uid_t, gid_t, bool, int, int, const std::list<Option<mesos::CommandInfo>, std::allocator<Option<mesos::CommandInfo> > >&)> >::_M_invoke(const std::tr1::_Any_data &) (__functor=<value optimized out>)
>     at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/functional:1654
> #9 0x00007f8fb4fcaebe in mesos::internal::slave::_childMain(const std::tr1::function<int()> &, int *) (childFunction=..., pipes=0x7f8fad4f0040)
>     at ../../../mesos/src/slave/containerizer/linux_launcher.cpp:193
> #10 0x00007f8fb2d4db6d in clone () from /lib64/libc.so.6
> (gdb) info thread
> * 1 Thread 0x7f8fad4f1700 (LWP 62980) 0x00007f8fb2d5d2ce in __lll_lock_wait_private () from /lib64/libc.so.6
> {noformat}
> This stack trace matches the one discussed in the glibc issue tracker:
> https://sourceware.org/bugzilla/show_bug.cgi?id=4737
> That bug was closed as "WONTFIX". Here is some of the discussion:
> {noformat}
> The Austin group met yesterday and retained the decision to interpret fork as
> async-signal-unsafe with future specifications mandating that posix_spawn be
> made async-signal-safe to fill the functionality gap. Minutes of the meeting
> are available at https://www.opengroup.org/austin/docs/austin_446.txt.
> I think this bug can now be closed as "WONTFIX"
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)