Fix is available (it includes a regression test): https://github.com/cruppstahl/hypertable/commit/614a7d5e34c254ffa77f8d29b456866e8d71bbec
Thanks
Christoph

2012/8/10 Christoph Rupp <[email protected]>

> OK, I can reproduce it... I will work on a fix by next Tuesday/Wednesday.
>
> Thanks
> Christoph
>
>
> 2012/8/9 BigQiao <[email protected]>
>
>> This deadlock still exists in 0.9.6.0 when deleting a TableScanner.
>>
>> The TableScanner destructor locks IndexScannerCallback, then TableScannerAsync.
>> A database worker thread locks TableScannerAsync, then IndexScannerCallback.
>>
>> Thread 14 (Thread 0x7fffee266700 (LWP 10936)): // database worker thread
>> #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
>> #1 0x00007ffff79c8179 in _L_lock_953 () from /lib/libpthread.so.0
>> #2 0x00007ffff79c7f9b in __pthread_mutex_lock (mutex=0xc08630) at pthread_mutex_lock.c:61
>> #3 0x0000000000477886 in boost::mutex::lock (this=0xc08630) at /usr/include/boost/thread/pthread/mutex.hpp:50
>> #4 0x000000000047e790 in boost::unique_lock<boost::mutex>::lock (this=0x7fffee2638e0) at /usr/include/boost/thread/locks.hpp:349
>> #5 0x000000000047d51d in unique_lock (this=0x7fffee2638e0, m_=...) at /usr/include/boost/thread/locks.hpp:227
>> #6 0x00000000005f0a07 in Hypertable::IndexScannerCallback::scan_ok(Hypertable::TableScannerAsync*, boost::intrusive_ptr<Hypertable::ScanCells>&) ()
>> #7 0x00000000005ed180 in Hypertable::TableScannerAsync::maybe_callback_ok (this=0x10e7b50, scanner_id=1, next=true, do_callback=true, cells=...)
>>     at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:522
>> #8 0x00000000005ec5cc in Hypertable::TableScannerAsync::handle_result (this=0x10e7b50, scanner_id=1, event=..., is_create=true)
>>     at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:459
>> #9 0x00000000006286d2 in Hypertable::TableScannerHandler::run (this=0x7fffe8049e30)
>>     at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerHandler.cc:40
>> #10 0x000000000047b625 in Hypertable::ApplicationQueue::Worker::operator()() ()
>> #11 0x000000000048dbd2 in boost::detail::thread_data<Hypertable::ApplicationQueue::Worker>::run() ()
>> #12 0x00007ffff77b5200 in thread_proxy () from /usr/lib/libboost_thread.so.1.42.0
>> #13 0x00007ffff79c58ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
>> #14 0x00007ffff4978b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #15 0x0000000000000000 in ?? ()
>>
>>
>> Thread 27 (Thread 0x7fffe33ee700 (LWP 10949)): //TableScanner Destructor Thread
>> #0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>> #1 0x000000000047d87f in boost::condition_variable_any::wait<boost::unique_lock<boost::mutex> > (this=0x10e7c18, m=...)
>>     at /usr/include/boost/thread/pthread/condition_variable.hpp:84
>> #2 0x00000000005ed224 in Hypertable::TableScannerAsync::wait_for_completion (this=0x10e7b50)
>>     at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:535
>> #3 0x00000000005eb370 in ~TableScannerAsync (this=0x10e7b50, __in_chrg=<value optimized out>)
>>     at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:318
>> #4 0x00000000005f01d3 in Hypertable::IndexScannerCallback::~IndexScannerCallback() ()
>> #5 0x00000000005eb579 in ~TableScannerAsync (this=0xc04ca0, __in_chrg=<value optimized out>)
>>     at /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:324
>> #6 0x0000000000444847 in Hypertable::intrusive_ptr_release (rc=0xc04ca0)
>>     at /opt/hypertable/0.9.6.0/include/Common/ReferenceCount.h:73
>> #7 0x00000000005e70e3 in boost::intrusive_ptr<Hypertable::TableScannerAsync>::~intrusive_ptr() ()
>> #8 0x00000000005e6cf3 in Hypertable::TableScanner::~TableScanner() ()
>> #9 0x000000000043c943 in DBRecycled::run (this=0xa95c60)
>>     at /home/qiao/Project/Bingo/DistributedSpider/DBRecycled.cpp:48
>> #10 0x000000000046eed7 in thread_proc (param=0x7fffe805ae00)
>>     at /home/qiao/Project/Bingo/DistributedSpider/shared/Threading/ThreadPool.cpp:331
>> #11 0x00007ffff79c58ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
>> #12 0x00007ffff4978b6d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>> #13 0x0000000000000000 in ?? ()
>>
>>
>>> Sorry for the delay - I was finally able to reproduce it and I also fixed it.
>>>
>>> The commit is a bit larger than my first try.
>>>
>>> https://github.com/cruppstahl/hypertable/commits/v0.9.5
>>>
>>> commit b45ba15b701373c3a1f689f8997f31bde8ff5165
>>> Author: Christoph Rupp <[email protected]>
>>> Date: Wed Apr 25 18:54:11 2012 +0200
>>>
>>>     issue 827: fixed deadlock when scanning secondary indices
>>>
>>> Thanks again for your great help!
>>>
>>> Best regards
>>> Christoph
>>>
>>> 2012/4/26 gcc.lua <[email protected]>
>>>
>>>> Hi,
>>>>
>>>> Thanks for the quick reply, but the commit just removes the m_mutex lock
>>>> inside virtual ~IndexScannerCallback(). I tried it, and a new problem
>>>> occurred (see the end of this report). Here is some additional information
>>>> on how to reproduce the issue as it was before your commit:
>>>>
>>>> void run()
>>>> {
>>>>     TableScannerPtr aScanner = tbSourcelist->create_scanner( specbuilder.get(), 5000 );
>>>>
>>>>     while( aScanner->next( gotCell ) )
>>>>     {
>>>>         ....
>>>>         if(condition)
>>>>             break; // more results remain, but we break here while the internal scanner thread is still running
>>>>         ....
>>>>     }
>>>>     return; // triggers the TableScanner destructor; for what happens next, please see my first post
>>>> }
>>>>
>>>> ////////////////////////////////////////////////////////////
>>>>
>>>>
>>>> pure virtual method called
>>>> terminate called without an active exception
>>>>
>>>> Program received signal SIGABRT, Aborted.
>>>> [Switching to Thread 0x7fffe6ff5700 (LWP 23887)]
>>>> 0x00007ffff48db1b5 in raise () from /lib/libc.so.6
>>>>
>>>>
>>>> (gdb) where
>>>> #0 0x00007ffff48db1b5 in raise () from /lib/libc.so.6
>>>> #1 0x00007ffff48ddfc0 in abort () from /lib/libc.so.6
>>>> #2 0x00007ffff516fdc5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
>>>> #3 0x00007ffff516e166 in ?? () from /usr/lib/libstdc++.so.6
>>>> #4 0x00007ffff516e193 in std::terminate() () from /usr/lib/libstdc++.so.6
>>>> #5 0x00007ffff516ea6f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
>>>> #6 0x00000000005c43c6 in Hypertable::TableScannerAsync::maybe_callback_ok (this=0x7fffb432ecd0, scanner_id=19373, next=true, do_callback=true, cells=...)
>>>>     at /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerAsync.cc:520
>>>> #7 0x00000000005c393f in Hypertable::TableScannerAsync::handle_result (this=0x7fffb432ecd0, scanner_id=19373, event=..., is_create=true)
>>>>     at /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerAsync.cc:464
>>>> #8 0x00000000005fdc5e in Hypertable::TableScannerHandler::run (this=0x7fff99915850)
>>>>     at /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerHandler.cc:40
>>>> #9 0x000000000045f2c5 in Hypertable::ApplicationQueue::Worker::operator() (this=0xaaa120)
>>>>     at /root/qiao/Project/hypertable-0.9.5.6/src/cc/AsyncComm/ApplicationQueue.h:173
>>>> #10 0x0000000000470f04 in boost::detail::thread_data<Hypertable::ApplicationQueue::Worker>::run (this=0xaa9ff0)
>>>>     at /usr/include/boost/thread/detail/thread.hpp:56
>>>> #11 0x00007ffff77b5200 in thread_proxy () from /usr/lib/libboost_thread.so.1.42.0
>>>> #12 0x00007ffff79c58ca in start_thread () from /lib/libpthread.so.0
>>>> #13 0x00007ffff497892d in clone () from /lib/libc.so.6
>>>> #14 0x0000000000000000 in ?? ()
>>>>
>>>> On April 26, 12:56 AM, Christoph Rupp <[email protected]> wrote:
>>>> > Hi,
>>>> >
>>>> > thanks for the great bug report.
>>>> >
>>>> > I am not able to reproduce this issue, but I think I came up with a fix.
>>>> > If you want to check out the sources then you can get them here:
>>>> > https://github.com/cruppstahl/hypertable (branch "v0.9.5")
>>>> >
>>>> > This is the commit:
>>>> > commit 2572b5dcb524e1c36dc23307c37784fd34c1bdde
>>>> > Author: Christoph Rupp <[email protected]>
>>>> > Date: Wed Apr 25 18:54:11 2012 +0200
>>>> >
>>>> >     issue 827: fixed deadlock when scanning secondary indices
>>>> >
>>>> > And here's the diff:
>>>> >
>>>> > diff --git a/src/cc/Hypertable/Lib/IndexScannerCallback.h b/src/cc/Hypertable/Li
>>>> > index 70ffda7..1b37127 100644
>>>> > --- a/src/cc/Hypertable/Lib/IndexScannerCallback.h
>>>> > +++ b/src/cc/Hypertable/Lib/IndexScannerCallback.h
>>>> > @@ -118,13 +118,12 @@ static String last;
>>>> >      }
>>>> >
>>>> >      virtual ~IndexScannerCallback() {
>>>> > -      ScopedLock lock(m_mutex);
>>>> > -      if (m_mutator)
>>>> > -        delete m_mutator;
>>>> >        foreach (TableScannerAsync *s, m_scanners)
>>>> >          delete s;
>>>> >        m_scanners.clear();
>>>> >        sspecs_clear();
>>>> > +      if (m_mutator)
>>>> > +        delete m_mutator;
>>>> >
>>>> > Can you please give it a try and see if this helps?
>>>> >
>>>> > Thanks
>>>> > Christoph
>>>> >
>>>> > 2012/4/24 gcc.lua <[email protected]>
>>>> >
>>>> > > The user thread logic looks like this:
>>>> > > TableScannerPtr aScanner = tbSourcelist->create_scanner( specbuilder.get(), 5000 );
>>>> > > while( aScanner->next( gotCell ) )
>>>> > > {
>>>> > >     .....
>>>> > > }
>>>> > >
>>>> > > Deadlock between the user thread and the scanner thread:
>>>> > >
>>>> > > 1. user thread (TableScanner)
>>>> > >
>>>> > > TableScannerAsync::~TableScannerAsync() {
>>>> > >   try {
>>>> > >     cancel();
>>>> > >     wait_for_completion();
>>>> > >   }
>>>> > >   catch (Exception &e) {
>>>> > >     HT_ERROR_OUT << e << HT_END;
>>>> > >   }
>>>> > >   if (m_use_index) {
>>>> > >     delete m_cb; // <== deadlock entry
>>>> > >     m_cb = 0;
>>>> > >   }
>>>> > > }
>>>> > > /////////////////////////////////////////
>>>> > > virtual ~IndexScannerCallback() {
>>>> > >   ScopedLock lock(m_mutex); // <== user thread now holds IndexScannerCallback::m_mutex
>>>> > >   if (m_mutator)
>>>> > >     delete m_mutator;
>>>> > >
>>>> > >   foreach (TableScannerAsync *s, m_scanners)
>>>> > >     delete s; // deadlock 1 <== user thread waits for TableScannerAsync::m_mutex
>>>> > >
>>>> > > 2. scanner thread
>>>> > >
>>>> > > void TableScannerAsync::handle_result(int scanner_id, EventPtr &event, bool is_create) {
>>>> > >
>>>> > >   bool cancelled = is_cancelled();
>>>> > >   ScopedLock lock(m_mutex); // <== scanner thread now holds TableScannerAsync::m_mutex
>>>> > >   ScanCellsPtr cells;
>>>> > >
>>>> > >   . . . . . .
>>>> > >   maybe_callback_ok(); // <== calls m_cb->scan_ok(this, cells);
>>>> > >
>>>> > > }
>>>> > > //////////////////////////////
>>>> > > class IndexScannerCallback : public ResultCallback {
>>>> > >
>>>> > >   virtual void scan_ok(TableScannerAsync *scanner, ScanCellsPtr &scancells) {
>>>> > >     bool is_eos = scancells->get_eos();
>>>> > >     String table_name = scanner->get_table_name();
>>>> > >
>>>> > >     ScopedLock lock(m_mutex); // deadlock 2 <== scanner thread waits for IndexScannerCallback::m_mutex
>>>> > >
>>>> > > --
>>>> > > You received this message because you are subscribed to the Google Groups
>>>> > > "Hypertable Development" group.
>>>> > > To post to this group, send email to hyperta...@googlegroups.com.
>>>> > > To unsubscribe from this group, send email to hypertable-de...@googlegroups.com.
>>>> > > For more options, visit this group at
>>>> > > http://groups.google.com/group/hypertable-dev?hl=en.
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msg/hypertable-dev/-/sERE6hok0i0J.
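
For readers following the thread: the original report describes a lock-order inversion, where one thread takes IndexScannerCallback::m_mutex and then needs TableScannerAsync::m_mutex, while another thread takes them in the opposite order. The sketch below is not Hypertable code and is not the change made in the commits linked in the thread. It is a minimal illustration of the pattern, assuming C++11 and using hypothetical names (SharedState, worker_path, destructor_path, run_demo). It shows one general remedy: acquiring both mutexes together with std::lock, which avoids deadlock no matter in which order the two paths name them.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two mutexes named in the report.
struct SharedState {
  std::mutex scanner_mutex;   // plays the role of TableScannerAsync::m_mutex
  std::mutex callback_mutex;  // plays the role of IndexScannerCallback::m_mutex
  long counter = 0;           // only touched while both mutexes are held
};

// Worker-thread path: names the scanner mutex first, then the callback mutex.
void worker_path(SharedState &s, int iterations) {
  for (int i = 0; i < iterations; ++i) {
    // std::lock acquires both mutexes with a deadlock-avoidance algorithm.
    std::lock(s.scanner_mutex, s.callback_mutex);
    // adopt_lock: the guards take ownership of already-locked mutexes.
    std::lock_guard<std::mutex> a(s.scanner_mutex, std::adopt_lock);
    std::lock_guard<std::mutex> b(s.callback_mutex, std::adopt_lock);
    ++s.counter;
  }
}

// Destructor-thread path: names the same mutexes in the opposite order.
void destructor_path(SharedState &s, int iterations) {
  for (int i = 0; i < iterations; ++i) {
    std::lock(s.callback_mutex, s.scanner_mutex);
    std::lock_guard<std::mutex> a(s.callback_mutex, std::adopt_lock);
    std::lock_guard<std::mutex> b(s.scanner_mutex, std::adopt_lock);
    ++s.counter;
  }
}

// Runs both paths concurrently and returns the final counter value.
long run_demo(int iterations) {
  SharedState s;
  std::thread t1(worker_path, std::ref(s), iterations);
  std::thread t2(destructor_path, std::ref(s), iterations);
  t1.join();
  t2.join();
  return s.counter;  // safe to read unlocked: both threads have finished
}
```

Calling run_demo(1000) returns 2000, since each path increments the counter 1000 times under mutual exclusion. If the std::lock calls were replaced by two separate lock() calls in the opposite orders shown, the two threads could each hold one mutex and wait forever for the other, which is the shape of the hang reported in the thread.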
