[ https://issues.apache.org/jira/browse/MESOS-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898376#comment-13898376 ]
Ian Downes commented on MESOS-912:
----------------------------------

No master was running when I did this.

{noformat}
[1357][idownes:build]$ ./bin/gdb-mesos-slave.sh --master=127.0.0.1:5050 [67/67]
GNU gdb 6.3.50-20050815 (Apple version gdb-1824) (Wed Feb 6 22:51:23 UTC 2013)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries ........ done

(gdb) run
Starting program: /Users/idownes/projects/mesos/build/src/.libs/mesos-slave --master=127.0.0.1:5050
Reading symbols for shared libraries +++++++....................................................................... done
Reading symbols for shared libraries ... done
I0211 13:58:05.130982 1926033792 main.cpp:112] Build: 2014-02-06 17:25:45 by idownes
I0211 13:58:05.131191 1926033792 main.cpp:114] Version: 0.18.0
I0211 13:58:05.131199 1926033792 main.cpp:121] Git SHA: a9224441ef4e4868f18b92332e9c281f2ed0a411
I0211 13:58:05.131211 1926033792 containerizer.cpp:180] Using isolation: posix/cpu,posix/mem
I0211 13:58:05.138918 1926033792 main.cpp:135] Starting Mesos slave
I0211 13:58:05.139505 36204544 slave.cpp:112] Slave started on 1)@172.25.142.203:5051
I0211 13:58:05.139729 36204544 slave.cpp:122] Slave resources: cpus(*):4; mem(*):7168; disk(*):233112; ports(*):[31000-32000]
I0211 13:58:05.140068 36204544 slave.cpp:150] Slave hostname: 172.25.142.203
I0211 13:58:05.140079 36204544 slave.cpp:151] Slave checkpoint: true
I0211 13:58:05.141855 37277696 state.cpp:33] Recovering state from '/tmp/mesos/meta'
I0211 13:58:05.142027 37277696 state.cpp:62] Failed to find the latest slave from '/tmp/mesos/meta'
I0211 13:58:05.142153 23818240 status_update_manager.cpp:188] Recovering status update manager
I0211 13:58:05.142304 23818240 mesos_containerizer.cpp:137] Recovering containerizer
I0211 13:58:05.142681 37277696 slave.cpp:2670] Finished recovery
I0211 13:58:05.142953 37277696 slave.cpp:2702] Garbage collecting old slave 201402051155-16777343-5050-11632-0
I0211 13:58:05.143077 37277696 slave.cpp:2702] Garbage collecting old slave 201402051155-16777343-5050-11632-1
I0211 13:58:05.143153 37277696 slave.cpp:2702] Garbage collecting old slave 201402051155-16777343-5050-11632-2
I0211 13:58:05.143225 37277696 slave.cpp:2702] Garbage collecting old slave 201402061801-16777343-5050-84709-0
I0211 13:58:05.143316 37277696 slave.cpp:2702] Garbage collecting old slave 201402071755-16777343-5050-36065-0
I0211 13:58:05.143405 37277696 slave.cpp:2702] Garbage collecting old slave 201402101457-16777343-5050-55200-0
I0211 13:58:05.143278 36741120 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201402051155-16777343-5050-11632-0' for gc 6.99999834488296days in the future
I0211 13:58:05.143486 36741120 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/201402051155-16777343-5050-11632-0' for gc 6.99999834421333days in the future
I0211 13:58:05.143725 36741120 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201402051155-16777343-5050-11632-1' for gc 6.99999834369185days in the future
I0211 13:58:05.143761 36741120 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/201402051155-16777343-5050-11632-1' for gc 6.99999834330963days in the future
I0211 13:58:05.143786 37277696 slave.cpp:397] New master detected at master@127.0.0.1:5050
I0211 13:58:05.143791 36741120 gc.cpp:56] Scheduling '/tmp/mesos/slaves/201402051155-16777343-5050-11632-2' for gc 6.99999834285926days in the future
I0211 13:58:05.143954 37277696 slave.cpp:422] Detecting new master
I0211 13:58:05.143992 36741120 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/201402051155-16777343-5050-11632-2' for gc 6.99999834248889days in the future
I0211 13:58:05.144011 35667968 status_update_manager.cpp:162] New master detected at master@127.0.0.1:5050

Program received signal SIGPIPE, Broken pipe.
0x00007fff8c7980fa in __psynch_cvwait ()
(gdb) bt
#0 0x00007fff8c7980fa in __psynch_cvwait ()
#1 0x00007fff8dda2fb9 in _pthread_cond_wait ()
#2 0x00000001003dc87b in Gate::arrive () at /Users/idownes/projects/mesos/3rdparty/libprocess/src/gate.hpp:73
#3 0x00000001003dc87b in process::ProcessManager::wait (this=0x1017201d0, pid=@0x1017201d0) at gate.hpp:2851
#4 0x00000001003e55dd in process::wait (pid=@0x7fff5fbfef00, duration=@0x7fff5fbfeef8) at process.cpp:3389
#5 0x0000000100002397 in main (argc=1606410488, argv=0x103800000) at main.cpp:141
{noformat}

for all threads:

{noformat}
(gdb) thread apply all bt

Thread 10 (process 9436):
#0 0x00007fff8c798322 in select$DARWIN_EXTSN ()
#1 0x00000001004ececb in select_poll (loop=0x19, timeout=2.1424611915836674e-314) at ev_select.c:170
#2 0x00000001004f1706 in ev_run (loop=0x19, flags=41422592) at ev.c:3360
#3 0x00000001003d903b in ev_loop [inlined] () at /Users/idownes/projects/mesos/build/3rdparty/libprocess/3rdparty/libev-4.15/ev.h:1287
#4 0x00000001003d903b in process::serve (arg=0x19) at process.cpp:1287
#5 0x00007fff8dd9e772 in _pthread_start ()
#6 0x00007fff8dd8b1a1 in thread_start ()

Thread 9 (process 9436):
#0 0x00007fff8c7980fa in __psynch_cvwait ()
#1 0x00007fff8dda2fb9 in _pthread_cond_wait ()
#2 0x00000001003dc9eb in Gate::arrive () at /Users/idownes/projects/mesos/3rdparty/libprocess/src/gate.hpp:73
#3 0x00000001003dc9eb in process::schedule (arg=0x101711ef0) at gate.hpp:1301
#4 0x00007fff8dd9e772 in _pthread_start ()
#5 0x00007fff8dd8b1a1 in thread_start ()

Thread 8 (process 9436):
#0 0x00007fff8c7980fa in __psynch_cvwait ()
#1 0x00007fff8dda2fb9 in _pthread_cond_wait ()
#2 0x00000001003dc9eb in Gate::arrive () at /Users/idownes/projects/mesos/3rdparty/libprocess/src/gate.hpp:73
#3 0x00000001003dc9eb in process::schedule (arg=0x101711ef0) at gate.hpp:1301
#4 0x00007fff8dd9e772 in _pthread_start ()
#5 0x00007fff8dd8b1a1 in thread_start ()

Thread 7 (process 9436):
#0 0x00007fff8c7981ae in __psynch_rw_wrlock ()
#1 0x00007fff8dda4e76 in pthread_rwlock_wrlock ()
#2 0x00000001004b1cd1 in glog_internal_namespace_::MutexLock::MutexLock () at /Users/idownes/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/base/mutex.h:248
#3 0x00000001004b1cd1 in google::LogMessage::Flush (this=0x100871260) at mutex.h:1280
#4 0x00000001004b2c62 in google::LogMessage::~LogMessage (this=0x10238cdb8) at logging.cc:1240
#5 0x0000000100197f78 in Option<process::UPID>::isNone () at /Users/idownes/projects/mesos/3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp:1844
#6 0x0000000100197f78 in __gnu_cxx::new_allocator<std::pair<std::string const, Option<std::string> > >::destroy () at /usr/include/c++/4.2.1/ext/new_allocator.h:1846
#7 0x0000000100197f78 in mesos::internal::slave::Slave::exited (this=0x103800000, pid=@0x104005058) at slave.cpp:402
#8 0x00000001003dc070 in process::ProcessManager::resume (this=0x10238cee0, process=0x103800338) at process.cpp:2614
#9 0x00000001003dca78 in process::schedule (arg=0x101713990) at process.cpp:1307
#10 0x00007fff8dd9e772 in _pthread_start ()
#11 0x00007fff8dd8b1a1 in thread_start ()

Thread 6 (process 9436):
#0 0x00007fff8c7981ae in __psynch_rw_wrlock ()
#1 0x00007fff8dda4e76 in pthread_rwlock_wrlock ()
#2 0x00000001004b1cd1 in glog_internal_namespace_::MutexLock::MutexLock () at /Users/idownes/projects/mesos/build/3rdparty/libprocess/3rdparty/glog-0.3.3/src/base/mutex.h:248
#3 0x00000001004b1cd1 in google::LogMessage::Flush (this=0x100871260) at mutex.h:1280
#4 0x00000001004b2c62 in google::LogMessage::~LogMessage (this=0x102309ce8) at logging.cc:1240
#5 0x00000001001586e5 in __gnu_cxx::new_allocator<std::pair<std::string const, Option<std::string> > >::destroy () at /usr/include/c++/4.2.1/ext/new_allocator.h:56
#6 0x00000001001586e5 in mesos::internal::slave::GarbageCollectorProcess::schedule (this=0x104001f60, d=@0x102309d50, path=@0x1040082f0) at gc.cpp:402
#7 0x000000010015a1b6 in std::tr1::_Function_base::_Base_manager<std::tr1::_Bind<std::tr1::_Mem_fn<process::Future<Nothing> (mesos::internal::slave::GarbageCollectorProcess::*)(Duration const&, std::string const&)> ()(std::tr1::_Placeholder<1>, Duration, std::string)> >::_M_get_pointer () at /usr/include/c++/4.2.1/tr1/functional:864
#8 0x000000010015a1b6 in __gnu_cxx::new_allocator<std::pair<std::string const, Option<std::string> > >::destroy () at /usr/include/c++/4.2.1/ext/new_allocator.h:488
#9 0x000000010015a1b6 in std::tr1::_Function_handler<process::Future<Nothing> ()(mesos::internal::slave::GarbageCollectorProcess*), std::tr1::_Bind<std::tr1::_Mem_fn<process::Future<Nothing> (mesos::internal::slave::GarbageCollectorProcess::*)(Duration const&, std::string const&)> ()(std::tr1::_Placeholder<1>, Duration, std::string)> >::_M_invoke (__functor=@0x1f05, __a1=0x1040082e8) at functional:402
#10 0x000000010015e4f3 in std::tr1::__shared_ptr<std::tr1::function<process::Future<Nothing> ()(mesos::internal::slave::GarbageCollectorProcess*)>, (__gnu_cxx::_Lock_policy)2>::operator* () at /usr/include/c++/4.2.1/tr1/boost_shared_ptr.h:865
#11 __gnu_cxx::new_allocator<std::pair<std::string const, Option<std::string> > >::destroy () at /usr/include/c++/4.2.1/ext/new_allocator.h:88
#12 0x000000010015e4f3 in process::internal::pdispatcher<Nothing, mesos::internal::slave::GarbageCollectorProcess> (process=0x102309d80, thunk=@0x102309d80, promise=@0x102309db0) at boost_shared_ptr.h:402
#13 0x000000010015adeb in std::tr1::__shared_ptr<std::tr1::function<process::Future<Nothing> ()(mesos::internal::slave::GarbageCollectorProcess*)>, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr () at bind_iterate.h:45
#14 0x000000010015adeb in std::tr1::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count () at /usr/include/c++/4.2.1/tr1/boost_shared_ptr.h:974
#15 0x000000010015adeb in __gnu_cxx::new_allocator<std::pair<std::string const, Option<std::string> > >::destroy () at /usr/include/c++/4.2.1/ext/new_allocator.h:504
#16 0x000000010015adeb in std::tr1::_Bind<void (*()(std::tr1::_Placeholder<1>, std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing> ()(mesos::internal::slave::GarbageCollectorProcess*)> >, std::tr1::shared_ptr<process::Promise<Nothing> >))(process::ProcessBase*, std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing> ()(mesos::internal::slave::GarbageCollectorProcess*)> >, std::tr1::shared_ptr<process::Promise<Nothing> >)>::operator()<process::ProcessBase*> (this=0x102309d80, __u1=@0x1f05) at bind_iterate.h:402
#17 0x000000010015af28 in __gnu_cxx::new_allocator<std::pair<std::string const, Option<std::string> > >::destroy () at /usr/include/c++/4.2.1/ext/new_allocator.h:502
#18 0x000000010015af28 in std::tr1::_Function_handler<void ()(process::ProcessBase*), std::tr1::_Bind<void (*()(std::tr1::_Placeholder<1>, std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing> ()(mesos::internal::slave::GarbageCollectorProcess*)> >, std::tr1::shared_ptr<process::Promise<Nothing> >))(process::ProcessBase*, std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing> ()(mesos::internal::slave::GarbageCollectorProcess*)> >, std::tr1::shared_ptr<process::Promise<Nothing> >)> >::_M_invoke (__a1=0x104002008, __functor=@0x102309d80) at functional_iterate.h:402
#19 0x00000001003dc070 in process::ProcessManager::resume (this=0x102309ee0, process=0x104002008) at process.cpp:2614
#20 0x00000001003dca78 in process::schedule (arg=0x101713990) at process.cpp:1307
#21 0x00007fff8dd9e772 in _pthread_start ()
#22 0x00007fff8dd8b1a1 in thread_start ()

Thread 5 (process 9436):
#0 0x00007fff8c7980fa in __psynch_cvwait ()
#1 0x00007fff8dda2fb9 in _pthread_cond_wait ()
#2 0x00000001003dc9eb in Gate::arrive () at /Users/idownes/projects/mesos/3rdparty/libprocess/src/gate.hpp:73
#3 0x00000001003dc9eb in process::schedule (arg=0x101711ef0) at gate.hpp:1301
#4 0x00007fff8dd9e772 in _pthread_start ()
#5 0x00007fff8dd8b1a1 in thread_start ()

Thread 4 (process 9436):
#0 0x00007fff8c7980fa in __psynch_cvwait ()
#1 0x00007fff8dda2fb9 in _pthread_cond_wait ()
#2 0x00000001003dc9eb in Gate::arrive () at /Users/idownes/projects/mesos/3rdparty/libprocess/src/gate.hpp:73
#3 0x00000001003dc9eb in process::schedule (arg=0x101711ef0) at gate.hpp:1301
#4 0x00007fff8dd9e772 in _pthread_start ()
#5 0x00007fff8dd8b1a1 in thread_start ()

Thread 3 (process 9436):
#0 0x00007fff8c7980fa in __psynch_cvwait ()
#1 0x00007fff8dda2fb9 in _pthread_cond_wait ()
#2 0x00000001003dc9eb in Gate::arrive () at /Users/idownes/projects/mesos/3rdparty/libprocess/src/gate.hpp:73
#3 0x00000001003dc9eb in process::schedule (arg=0x101711ef0) at gate.hpp:1301
#4 0x00007fff8dd9e772 in _pthread_start ()
#5 0x00007fff8dd8b1a1 in thread_start ()

Thread 2 (process 9436):
#0 0x00007fff8c7980fa in __psynch_cvwait ()
#1 0x00007fff8dda2fb9 in _pthread_cond_wait ()
#2 0x00000001003dc9eb in Gate::arrive () at /Users/idownes/projects/mesos/3rdparty/libprocess/src/gate.hpp:73
#3 0x00000001003dc9eb in process::schedule (arg=0x101711ef0) at gate.hpp:1301
#4 0x00007fff8dd9e772 in _pthread_start ()
#5 0x00007fff8dd8b1a1 in thread_start ()

Thread 1 (process 9436):
#0 0x00007fff8c7980fa in __psynch_cvwait ()
#1 0x00007fff8dda2fb9 in _pthread_cond_wait ()
#2 0x00000001003dc87b in Gate::arrive () at /Users/idownes/projects/mesos/3rdparty/libprocess/src/gate.hpp:73
#3 0x00000001003dc87b in process::ProcessManager::wait (this=0x1017201d0, pid=@0x1017201d0) at gate.hpp:2851
#4 0x00000001003e55dd in process::wait (pid=@0x7fff5fbfef00, duration=@0x7fff5fbfeef8) at process.cpp:3389
#5 0x0000000100002397 in main (argc=1606410488, argv=0x103800000) at main.cpp:141
{noformat}

> Slave sometimes crashes with SIGPIPE
> ------------------------------------
>
> Key: MESOS-912
> URL: https://issues.apache.org/jira/browse/MESOS-912
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.17.0
> Environment: OSX 10.8.5
> Reporter: Vinod Kone
> Fix For: 0.18.0
>
>
> ➜ build git:(vinod/vote) ✗ ./bin/mesos-slave.sh --master=127.0.0.1:5055
> I0115 12:15:19.846664 2096390528 main.cpp:118] Build: 2014-01-14 16:52:48 by
> vinod
> I0115 12:15:19.847189 2096390528 main.cpp:120] Creating "process" isolator
> I0115 12:15:19.847462 2096390528 main.cpp:132] Starting Mesos slave
> I0115 12:15:19.847807 2096390528 slave.cpp:111] Slave started on
> 1)@172.25.27.97:5051
> I0115 12:15:19.848068 2096390528 slave.cpp:211] Slave resources: cpus(*):4;
> mem(*):7168; disk(*):481998; ports(*):[31000-32000]
> I0115 12:15:19.852408 175071232 state.cpp:33] Recovering state from
> '/tmp/mesos/meta'
> I0115 12:15:19.853726 175071232 status_update_manager.cpp:188] Recovering
> status update manager
> I0115 12:15:19.853798 175071232 process_isolator.cpp:317] Recovering isolator
> I0115 12:15:19.853883 175071232 slave.cpp:2769] Finished recovery
> I0115 12:15:19.854004 173998080 slave.cpp:500] New master detected at
> master@127.0.0.1:5055
> I0115 12:15:19.854161 175607808 status_update_manager.cpp:162] New master
> detected at master@127.0.0.1:5055
> I0115 12:15:19.854220 173998080 slave.cpp:525] Detecting new master
> I0115 12:15:19.854409 175607808 slave.cpp:1966] master@127.0.0.1:5055 exited
> W0115 12:15:19.854440 175607808 slave.cpp:1969] Master disconnected! Waiting
> for a new master to be elected
> W0115 12:15:19.854440 2096390528 logging.cpp:69] RAW: Received signal
> SIGPIPE; escalating to SIGABRT
> *** Aborted at 1389816919 (unix time) try "date -d @1389816919" if you are
> using GNU date ***
> PC: @ 0x7fff98586d46 __kill
> *** SIGABRT (@0x7fff98586d46) received by PID 21391 (TID 0x7fff7cf46180)
> stack trace: ***
> @ 0x7fff960b190a _sigtramp
> @ 0x7fff7bf03588 std::string::_Rep::_S_empty_rep_storage
> @ 0x7fff960b190a _sigtramp
> @ 0x0 (unknown)
> @ 0x10956046b process::ProcessManager::wait()
> @ 0x109566e7d process::wait()
> @ 0x10924760a main
> @ 0x7fff947cc7e1 start
> @ 0x2 (unknown)
> [1] 21391 abort ./bin/mesos-slave.sh --master=127.0.0.1:5055