[ https://issues.apache.org/jira/browse/MESOS-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606180#comment-14606180 ]

Benjamin Mahler commented on MESOS-2768:
----------------------------------------

Sounds like we're accidentally closing libev's pipe? We likely need an audit of 
all {{close}} and {{os::close}} calls.
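For context on why a closed pipe would show up this way: under POSIX, a write() to a pipe that has no open read end raises SIGPIPE in the writing thread, and if SIGPIPE is ignored the write fails with EPIPE instead. The trace above dies inside {{evpipe_write}} -> {{write}}, and the Mesos logging handler escalates SIGPIPE to SIGABRT. The following is a minimal standalone sketch of that mechanism, not libev or Mesos code; the function name {{writeAfterReadEndClosed}} is hypothetical, and SIGPIPE is ignored only so the failure is observable as an errno value:

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Hypothetical demo: returns the errno seen when writing to a pipe whose
// read end has already been closed (expected: EPIPE).
int writeAfterReadEndClosed()
{
  // Ignore SIGPIPE so the failed write returns -1/EPIPE instead of
  // terminating the process (or, with Mesos' handler, escalating to SIGABRT).
  std::signal(SIGPIPE, SIG_IGN);

  int fds[2];
  if (::pipe(fds) != 0) {
    return errno;
  }

  // Simulate the suspected bug: unrelated code closes the pipe's read end.
  ::close(fds[0]);

  // Analogous to a wake-up write like the one evpipe_write performs.
  const char byte = 0;
  ssize_t n = ::write(fds[1], &byte, 1);
  int error = (n == -1) ? errno : 0;

  ::close(fds[1]);
  return error;
}
```

With the default SIGPIPE disposition the same write would kill the process outright, which is consistent with the abort in the log.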

One suggestion: if everything goes through {{os::close}}, we can provide 
patches for folks to deploy that expose libev's pipe fds, and if {{os::close}} 
is called on one of these fds, it can dump a stack trace so we can find 
the call site in the logs. Thoughts?
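A rough sketch of what such a debugging patch could look like, assuming a hook exists to register libev's pipe fds (libev does not expose them this way; that hook is an assumption). Everything here is hypothetical: {{debug::protectFd}}, {{debug::isProtected}} and {{debug::checkedClose}} are illustrative stand-ins, not the real {{os::close}}. It uses glibc's {{backtrace}} / {{backtrace_symbols_fd}} from <execinfo.h> to print the offending call site to stderr and then closes the fd anyway, so the patch only adds logging without changing behavior:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <execinfo.h>
#include <mutex>
#include <set>
#include <unistd.h>

namespace debug {

// Hypothetical registry of fds that should never be closed through the
// generic close path (e.g. libev's wake-up pipe, once a patch exposes it).
std::mutex mutex;
std::set<int> protectedFds;

void protectFd(int fd)
{
  assert(fd >= 0);
  std::lock_guard<std::mutex> lock(mutex);
  protectedFds.insert(fd);
}

bool isProtected(int fd)
{
  std::lock_guard<std::mutex> lock(mutex);
  return protectedFds.count(fd) > 0;
}

// Hypothetical stand-in for os::close(): if the fd is protected, log a
// stack trace so the call site shows up in the logs, then close it anyway.
int checkedClose(int fd)
{
  bool flagged = false;
  {
    std::lock_guard<std::mutex> lock(mutex);
    if (protectedFds.erase(fd) > 0) {
      // Unregister, since the fd number may be reused by unrelated code.
      flagged = true;
    }
  }

  if (flagged) {
    std::fprintf(stderr, "checkedClose: closing protected fd %d from:\n", fd);
    void* frames[64];
    int count = ::backtrace(frames, 64);
    ::backtrace_symbols_fd(frames, count, STDERR_FILENO);
  }

  return ::close(fd);
}

} // namespace debug
```

Symbolized frame names in the trace generally require linking with -rdynamic; otherwise only addresses are printed.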

> SIGPIPE in process::run_in_event_loop()
> ---------------------------------------
>
>                 Key: MESOS-2768
>                 URL: https://issues.apache.org/jira/browse/MESOS-2768
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Yan Xu
>            Priority: Critical
>
> Observed in production.
> {noformat:title=slave log}
> I0526 12:17:48.027257 51633 slave.cpp:4077] Received a new estimation of the oversubscribable resources
> W0526 12:17:48.027257 51636 logging.cpp:91] RAW: Received signal SIGPIPE; escalating to SIGABRT
> *** Aborted at 1432642668 (unix time) try "date -d @1432642668" if you are using GNU date ***
> PC: @     0x7fa58c23eb6d raise
> *** SIGABRT (@0xc9a5) received by PID 51621 (TID 0x7fa58224c940) from PID 51621; stack trace: ***
>     @     0x7fa58c23eca0 (unknown)
>     @     0x7fa58c23eb6d raise
>     @     0x7fa58cc19ba7 mesos::internal::logging::handler()
>     @     0x7fa58c23eca0 (unknown)
>     @     0x7fa58c23da2b __libc_write
>     @     0x7fa58cb57b6f evpipe_write.part.5
>     @     0x7fa58d245070 process::run_in_event_loop<>()
>     @     0x7fa58d2441ba process::EventLoop::delay()
>     @     0x7fa58d1c3c9c process::clock::scheduleTick()
>     @     0x7fa58d1c65b1 process::Clock::timer()
>     @     0x7fa58d23915a process::delay<>()
>     @     0x7fa58d23a740 process::ReaperProcess::wait()
>     @     0x7fa58d21261a process::ProcessManager::resume()
>     @     0x7fa58d2128dc process::schedule()
>     @     0x7fa58c23683d start_thread
>     @     0x7fa58ba28fcd clone
> {noformat}
> {noformat:title=gdb}
> (gdb) bt
> #0  0x00007fa58c23eb6d in raise () from /lib64/libpthread.so.0
> #1  0x00007fa58cc19ba7 in mesos::internal::logging::handler (signal=Unhandled dwarf expression opcode 0xf3
> ) at logging/logging.cpp:92
> #2  <signal handler called>
> #3  0x00007fa58c23da2b in write () from /lib64/libpthread.so.0
> #4  0x00007fa58cb57b6f in evpipe_write (loop=0x7fa58e1e79c0, flag=Unhandled dwarf expression opcode 0xfa
> ) at ev.c:2172
> #5  0x00007fa58d245070 in process::run_in_event_loop<Nothing>(const std::function<process::Future<Nothing>()> &) (f=Unhandled dwarf expression opcode 0xf3
> ) at src/libev.hpp:80
> #6  0x00007fa58d2441ba in process::EventLoop::delay(const Duration &, const std::function<void()> &) (duration=Unhandled dwarf expression opcode 0xf3
> ) at src/libev.cpp:106
> #7  0x00007fa58d1c3c9c in process::clock::scheduleTick (timers=Unhandled dwarf expression opcode 0xf3
> ) at src/clock.cpp:119
> #8  0x00007fa58d1c65b1 in process::Clock::timer(const Duration &, const std::function<void()> &) (duration=Unhandled dwarf expression opcode 0xf3
> ) at src/clock.cpp:254
> #9  0x00007fa58d23915a in process::delay<process::ReaperProcess> (duration=..., pid=Unhandled dwarf expression opcode 0xf3
> ) at ./include/process/delay.hpp:25
> #10 0x00007fa58d23a740 in process::ReaperProcess::wait (this=0x2056920) at src/reap.cpp:93
> #11 0x00007fa58d21261a in process::ProcessManager::resume (this=0x1db8d20, process=0x2056958) at src/process.cpp:2172
> #12 0x00007fa58d2128dc in process::schedule (arg=Unhandled dwarf expression opcode 0xf3
> ) at src/process.cpp:602
> #13 0x00007fa58c23683d in start_thread () from /lib64/libpthread.so.0
> #14 0x00007fa58ba28fcd in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
