Hi,

I am looking into this...

As far as I remember, the original problem from last April was caused by a
scanner which was scanning a large dataset and was then deleted while
results were still outstanding. Is this still the same problem?

Thanks
Christoph

2012/8/9 BigQiao <[email protected]>

> This deadlock still exists in 0.9.6.0 when deleting a TableScanner.
>
> The TableScanner destructor locks IndexScannerCallback and then TableScannerAsync;
> a database working thread locks TableScannerAsync and then IndexScannerCallback.
>
> Thread 14 (Thread 0x7fffee266700 (LWP 10936)):      // Database Working Thread
> #0  __lll_lock_wait () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
> #1  0x00007ffff79c8179 in _L_lock_953 () from /lib/libpthread.so.0
> #2  0x00007ffff79c7f9b in __pthread_mutex_lock (mutex=0xc08630) at
> pthread_mutex_lock.c:61
> #3  0x0000000000477886 in boost::mutex::lock (this=0xc08630) at
> /usr/include/boost/thread/pthread/mutex.hpp:50
> #4  0x000000000047e790 in boost::unique_lock<boost::mutex>::lock
> (this=0x7fffee2638e0) at /usr/include/boost/thread/locks.hpp:349
> #5  0x000000000047d51d in unique_lock (this=0x7fffee2638e0, m_=...) at
> /usr/include/boost/thread/locks.hpp:227
> #6  0x00000000005f0a07 in
> Hypertable::IndexScannerCallback::scan_ok(Hypertable::TableScannerAsync*,
> boost::intrusive_ptr<Hypertable::ScanCells>&) ()
> #7  0x00000000005ed180 in Hypertable::TableScannerAsync::maybe_callback_ok
> (this=0x10e7b50, scanner_id=1, next=true, do_callback=true, cells=...)
>     at
> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:522
> #8  0x00000000005ec5cc in Hypertable::TableScannerAsync::handle_result
> (this=0x10e7b50, scanner_id=1, event=..., is_create=true)
>     at
> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:459
> #9  0x00000000006286d2 in Hypertable::TableScannerHandler::run
> (this=0x7fffe8049e30) at
> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerHandler.cc:40
> #10 0x000000000047b625 in
> Hypertable::ApplicationQueue::Worker::operator()() ()
> #11 0x000000000048dbd2 in
> boost::detail::thread_data<Hypertable::ApplicationQueue::Worker>::run() ()
> #12 0x00007ffff77b5200 in thread_proxy () from
> /usr/lib/libboost_thread.so.1.42.0
> #13 0x00007ffff79c58ca in start_thread (arg=<value optimized out>) at
> pthread_create.c:300
> #14 0x00007ffff4978b6d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #15 0x0000000000000000 in ?? ()
>
>
> Thread 27 (Thread 0x7fffe33ee700 (LWP 10949)):         // TableScanner Destructor Thread
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
> #1  0x000000000047d87f in
> boost::condition_variable_any::wait<boost::unique_lock<boost::mutex> >
> (this=0x10e7c18, m=...)
>     at /usr/include/boost/thread/pthread/condition_variable.hpp:84
> #2  0x00000000005ed224 in
> Hypertable::TableScannerAsync::wait_for_completion (this=0x10e7b50)
>     at
> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:535
> #3  0x00000000005eb370 in ~TableScannerAsync (this=0x10e7b50,
> __in_chrg=<value optimized out>)
>     at
> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:318
> #4  0x00000000005f01d3 in
> Hypertable::IndexScannerCallback::~IndexScannerCallback() ()
> #5  0x00000000005eb579 in ~TableScannerAsync (this=0xc04ca0,
> __in_chrg=<value optimized out>)
>     at
> /home/hadoop/temp/hypertable-0.9.6.0/src/cc/Hypertable/Lib/TableScannerAsync.cc:324
> #6  0x0000000000444847 in Hypertable::intrusive_ptr_release (rc=0xc04ca0)
> at /opt/hypertable/0.9.6.0/include/Common/ReferenceCount.h:73
> #7  0x00000000005e70e3 in
> boost::intrusive_ptr<Hypertable::TableScannerAsync>::~intrusive_ptr() ()
> #8  0x00000000005e6cf3 in Hypertable::TableScanner::~TableScanner() ()
> #9  0x000000000043c943 in DBRecycled::run (this=0xa95c60) at
> /home/qiao/Project/Bingo/DistributedSpider/DBRecycled.cpp:48
> #10 0x000000000046eed7 in thread_proc (param=0x7fffe805ae00) at
> /home/qiao/Project/Bingo/DistributedSpider/shared/Threading/ThreadPool.cpp:331
> #11 0x00007ffff79c58ca in start_thread (arg=<value optimized out>) at
> pthread_create.c:300
> #12 0x00007ffff4978b6d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
> #13 0x0000000000000000 in ?? ()
>
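> To make the two lock orders concrete, here is a minimal, standalone sketch
> of the inversion shown in the traces above. It uses plain C++11 std::mutex
> and illustrative names rather than the actual Hypertable classes; running
> it will normally just hang, like the two threads above.
>
> #include <chrono>
> #include <mutex>
> #include <thread>
>
> std::mutex callback_mutex;   // stands in for IndexScannerCallback::m_mutex
> std::mutex scanner_mutex;    // stands in for TableScannerAsync::m_mutex
>
> // TableScanner destructor thread: callback mutex first, then scanner mutex
> // (~IndexScannerCallback() -> delete scanner -> ~TableScannerAsync()).
> void destructor_thread() {
>   std::lock_guard<std::mutex> cb(callback_mutex);
>   std::this_thread::sleep_for(std::chrono::milliseconds(100));
>   std::lock_guard<std::mutex> sc(scanner_mutex);    // blocks here
> }
>
> // Database working thread: scanner mutex first, then callback mutex
> // (handle_result() -> maybe_callback_ok() -> scan_ok()).
> void working_thread() {
>   std::lock_guard<std::mutex> sc(scanner_mutex);
>   std::this_thread::sleep_for(std::chrono::milliseconds(100));
>   std::lock_guard<std::mutex> cb(callback_mutex);   // blocks here
> }
>
> int main() {
>   std::thread t1(destructor_thread), t2(working_thread);
>   t1.join();
>   t2.join();
>   return 0;
> }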
>
>> Sorry for the delay - I was finally able to reproduce it and I also fixed
>> it.
>>
>> The commit is a bit larger than my first try.
>>
>> https://github.com/cruppstahl/hypertable/commits/v0.9.5
>>
>> commit b45ba15b701373c3a1f689f8997f31bde8ff5165
>> Author: Christoph Rupp <[email protected]>
>> Date:   Wed Apr 25 18:54:11 2012 +0200
>>
>>     issue 827: fixed deadlock when scanning secondary indices
>>
>> Thanks again for your great help!
>>
>> Best regards
>> Christoph
>>
>> 2012/4/26 gcc.lua <[email protected]>
>>
>>> Hi,
>>>
>>> thanks for the quick reply, but the commit just removes m_mutex inside
>>> virtual ~IndexScannerCallback().
>>> I tried it, and a new problem occurred; see the end of this report.
>>> Here is some additional info about reproducing this issue before your commit:
>>>
>>> void run()
>>> {
>>>   TableScannerPtr aScanner = tbSourcelist->create_scanner( specbuilder.get(), 5000 );
>>>
>>>   while( aScanner->next( gotCell ) )
>>>   {
>>>     ....
>>>     if (condition)
>>>       break;   // break while more results are pending; the internal
>>>                // scanner thread keeps running
>>>     ....
>>>   }
>>>   return;   // triggers the TableScanner destructor; see my first post
>>>             // for what happens next
>>> }
>>>
>>> //////////////////////////////////////////////////////////////////////////////////
>>>
>>>
>>> pure virtual method called
>>> terminate called without an active exception
>>>
>>> Program received signal SIGABRT, Aborted.
>>> [Switching to Thread 0x7fffe6ff5700 (LWP 23887)]
>>> 0x00007ffff48db1b5 in raise () from /lib/libc.so.6
>>>
>>>
>>> (gdb) where
>>> #0  0x00007ffff48db1b5 in raise () from /lib/libc.so.6
>>> #1  0x00007ffff48ddfc0 in abort () from /lib/libc.so.6
>>> #2  0x00007ffff516fdc5 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
>>> #3  0x00007ffff516e166 in ?? () from /usr/lib/libstdc++.so.6
>>> #4  0x00007ffff516e193 in std::terminate() () from /usr/lib/libstdc++.so.6
>>> #5  0x00007ffff516ea6f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
>>> #6  0x00000000005c43c6 in Hypertable::TableScannerAsync::maybe_callback_ok
>>> (this=0x7fffb432ecd0, scanner_id=19373, next=true, do_callback=true, cells=...)
>>>     at /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerAsync.cc:520
>>> #7  0x00000000005c393f in Hypertable::TableScannerAsync::handle_result
>>> (this=0x7fffb432ecd0, scanner_id=19373, event=..., is_create=true)
>>>     at /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerAsync.cc:464
>>> #8  0x00000000005fdc5e in Hypertable::TableScannerHandler::run (this=0x7fff99915850)
>>>     at /root/qiao/Project/hypertable-0.9.5.6/src/cc/Hypertable/Lib/TableScannerHandler.cc:40
>>> #9  0x000000000045f2c5 in Hypertable::ApplicationQueue::Worker::operator() (this=0xaaa120)
>>>     at /root/qiao/Project/hypertable-0.9.5.6/src/cc/AsyncComm/ApplicationQueue.h:173
>>> #10 0x0000000000470f04 in boost::detail::thread_data<Hypertable::ApplicationQueue::Worker>::run
>>> (this=0xaa9ff0) at /usr/include/boost/thread/detail/thread.hpp:56
>>> #11 0x00007ffff77b5200 in thread_proxy () from /usr/lib/libboost_thread.so.1.42.0
>>> #12 0x00007ffff79c58ca in start_thread () from /lib/libpthread.so.0
>>> #13 0x00007ffff497892d in clone () from /lib/libc.so.6
>>> #14 0x0000000000000000 in ?? ()
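>>>
>>> The "pure virtual method called" abort above suggests that the working
>>> thread is invoking m_cb->scan_ok() through the ResultCallback base while
>>> the IndexScannerCallback part of the callback object is already being
>>> destroyed, so virtual dispatch lands on __cxa_pure_virtual. Here is a tiny
>>> standalone sketch of that effect (illustrative names only, not the
>>> Hypertable classes; it forces the same state deterministically by making
>>> the call from the base destructor, where the dynamic type has already
>>> reverted to the base):
>>>
>>> #include <iostream>
>>>
>>> struct Callback;
>>> void notify(Callback *cb);   // stands in for the working thread's m_cb->scan_ok() call
>>>
>>> // Stand-in for ResultCallback: scan_ok() is pure virtual here.
>>> struct Callback {
>>>   virtual ~Callback() { notify(this); }   // by now only the base part is left
>>>   virtual void scan_ok() = 0;
>>> };
>>>
>>> // Stand-in for IndexScannerCallback.
>>> struct IndexCallback : Callback {
>>>   void scan_ok() override { std::cout << "scan_ok\n"; }
>>> };
>>>
>>> void notify(Callback *cb) { cb->scan_ok(); }   // virtual dispatch through the base
>>>
>>> int main() {
>>>   Callback *cb = new IndexCallback;
>>>   notify(cb);   // fine: the derived override runs
>>>   delete cb;    // aborts with "pure virtual method called"
>>>   return 0;
>>> }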
>>>
>>> On Apr 26, 12:56 am, Christoph Rupp <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > thanks for the great bug report.
>>> >
>>> > I am not able to reproduce this issue, but I think I came up with a
>>> > fix. If you want to check out the sources then you can get them here:
>>> > https://github.com/cruppstahl/hypertable, branch "v0.9.5"
>>> >
>>> > This is the commit:
>>> > commit 2572b5dcb524e1c36dc23307c37784fd34c1bdde
>>> > Author: Christoph Rupp <[email protected]>
>>> > Date:   Wed Apr 25 18:54:11 2012 +0200
>>> >
>>> >     issue 827: fixed deadlock when scanning secondary indices
>>> >
>>> > And here's the diff:
>>> >
>>> > diff --git a/src/cc/Hypertable/Lib/IndexScannerCallback.h
>>> > b/src/cc/Hypertable/Li
>>> > index 70ffda7..1b37127 100644
>>> > --- a/src/cc/Hypertable/Lib/IndexScannerCallback.h
>>> > +++ b/src/cc/Hypertable/Lib/IndexScannerCallback.h
>>> > @@ -118,13 +118,12 @@ static String last;
>>> >      }
>>> >
>>> >      virtual ~IndexScannerCallback() {
>>> > -      ScopedLock lock(m_mutex);
>>> > -      if (m_mutator)
>>> > -        delete m_mutator;
>>> >        foreach (TableScannerAsync *s, m_scanners)
>>> >          delete s;
>>> >        m_scanners.clear();
>>> >        sspecs_clear();
>>> > +      if (m_mutator)
>>> > +        delete m_mutator;
>>> >
>>> > Can you please give it a try and see if this helps?
>>> >
>>> > Thanks
>>> > Christoph
>>> >
>>> > 2012/4/24 gcc.lua <[email protected]>
>>> >
>>> > > The user thread logic is like the following:
>>> > > TableScannerPtr aScanner = tbSourcelist->create_scanner( specbuilder.get(), 5000 );
>>> > >  while( aScanner->next( gotCell ) )
>>> > >  {
>>> > >         .....
>>> > >  }
>>> >
>>> > > Deadlock between the user thread and the scanner thread:
>>> >
>>> > > 1. user thread: TableScanner destructor
>>> >
>>> > >    TableScannerAsync::~TableScannerAsync() {
>>> > >  try {
>>> > >    cancel();
>>> > >    wait_for_completion();
>>> > >  }
>>> > >  catch (Exception &e) {
>>> > >    HT_ERROR_OUT << e << HT_END;
>>> > >  }
>>> > >  if (m_use_index) {
>>> > >    delete m_cb;//<=========================dead lock entry
>>> > >    m_cb = 0;
>>> > >  }
>>> > > }
>>> > > /////////////////////////////////////////
>>> > >   virtual ~IndexScannerCallback() {
>>> > >  ScopedLock lock(m_mutex);//<=========  user thread got this
>>> > > IndexScannerCallback::m_mutex
>>> > >      if (m_mutator)
>>> > >        delete m_mutator;
>>> >
>>> > >      foreach (TableScannerAsync *s, m_scanners)
>>> > >        delete s;//dead lock 1<=============user thread wait
>>> > > TableScannerAsync::m_mutex
>>> >
>>> > > 2. scanner thread
>>> >
>>> > >  void TableScannerAsync::handle_result(int scanner_id, EventPtr
>>> > > &event, bool is_create) {
>>> >
>>> > >  bool cancelled = is_cancelled();
>>> > >  ScopedLock lock(m_mutex);<============scanner thread got
>>> > > TableScannerAsync::m_mutex
>>> > >  ScanCellsPtr cells;
>>> >
>>> > >    . . . . . .
>>> > >  maybe_callback_ok();<================call  m_cb->scan_ok(this,
>>> > > cells);
>>> >
>>> > > }
>>> > > //////////////////////////////
>>> > >  class IndexScannerCallback : public ResultCallback {
>>> >
>>> > >    virtual void scan_ok(TableScannerAsync *scanner, ScanCellsPtr
>>> > > &scancells) {
>>> > >      bool is_eos = scancells->get_eos();
>>> > >      String table_name = scanner->get_table_name();
>>> >
>>> > >      ScopedLock lock(m_mutex);//dead lock 2<============scanner
>>> > > thread wait IndexScannerCallback::m_mutex
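>>> >
>>> > > As a side note, the usual way to break this kind of inversion, if both
>>> > > mutexes really have to be held at the same time, is to acquire them
>>> > > together so every thread ends up with the same order. A minimal
>>> > > standalone C++11 sketch (std::lock and illustrative names, not the
>>> > > Hypertable code):
>>> >
>>> > > #include <mutex>
>>> > > #include <thread>
>>> >
>>> > > std::mutex callback_mutex;   // IndexScannerCallback::m_mutex
>>> > > std::mutex scanner_mutex;    // TableScannerAsync::m_mutex
>>> >
>>> > > // Both threads take the two mutexes atomically instead of nesting them
>>> > > // in opposite orders; std::lock avoids deadlock regardless of who runs first.
>>> > > void take_both() {
>>> > >   std::unique_lock<std::mutex> cb(callback_mutex, std::defer_lock);
>>> > >   std::unique_lock<std::mutex> sc(scanner_mutex, std::defer_lock);
>>> > >   std::lock(cb, sc);
>>> > >   // ... critical section ...
>>> > > }
>>> >
>>> > > int main() {
>>> > >   std::thread t1(take_both), t2(take_both);
>>> > >   t1.join();
>>> > >   t2.join();
>>> > >   return 0;
>>> > > }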
>>> >
