Hi Mateusz,

I assume that you're seeing CPU utilization drop to zero, followed by a delay
and then this timeout.  Correct?  If so, could you get a stack trace from the
Hypertable.RangeServer and Hypertable.Master processes at the time of the
deadlock?  That would be most helpful in chasing this one down.  Thanks!
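
In case it's useful, here is a rough sketch of one way to collect those
traces, not a definitive procedure.  It assumes gdb and pgrep are installed,
that the two process names above appear in the servers' command lines, and
that you have permission to attach to them (you may need to run it as root).
The helper names and the output file naming are made up for illustration.

```python
import os
import subprocess

# Process names taken from this thread; adjust if your command lines differ.
PROCESS_PATTERNS = ["Hypertable.RangeServer", "Hypertable.Master"]


def parse_pids(pgrep_output):
    """Turn pgrep's whitespace-separated output into a list of integer PIDs."""
    return [int(token) for token in pgrep_output.split()]


def find_pids(pattern):
    """Look up PIDs whose full command line matches pattern (pgrep -f)."""
    result = subprocess.run(["pgrep", "-f", pattern],
                            capture_output=True, text=True)
    return parse_pids(result.stdout)


def gdb_backtrace_command(pid):
    """Build a gdb call that attaches, prints every thread's stack, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_stacks(pattern, out_dir="."):
    """Write one stack dump file per matching process; return the file paths."""
    written = []
    for pid in find_pids(pattern):
        result = subprocess.run(gdb_backtrace_command(pid),
                                capture_output=True, text=True)
        # Hypothetical naming scheme: <process name>.<pid>.stack.txt
        path = os.path.join(out_dir, f"{pattern}.{pid}.stack.txt")
        with open(path, "w") as handle:
            handle.write(result.stdout)
            handle.write(result.stderr)  # keep gdb's own errors for context
        written.append(path)
    return written
```

Calling dump_stacks(name) for each entry in PROCESS_PATTERNS while the client
is hung should leave one file per process.  Adding the client's process name
to the list would capture its threads the same way.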

- Doug

On Sun, Mar 29, 2009 at 8:21 AM, Mateusz Berezecki <[email protected]> wrote:

>
> Hi Doug,
>
> I've been running another stress test on a single machine for the 0.9.2.3
> release, and it seems that the deadlock problem has resurfaced. The patch
> I applied did fix the original issue, but the problem has now moved to a
> different place in the code. Last time the deadlock occurred, it
> locked up the RangeServer and the error messages appeared in
> the RangeServer logs. This time the problem appears on the
> application side:
>
> 1238339511 WARN mergesort_splits :
> (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutatorDispatchHandler.cc:85)
> Event: type=ERROR "HYPERTABLE request timeout" from=172.16.0.19:38060,
> will retry ...
> 1238339511 WARN mergesort_splits :
> (/home/mateusz/hypertable/src/cc/AsyncComm/IOHandlerData.cc:348)
> Received response for non-pending event
> (id=672,version=1,total_len=40)
> 1238339550 ERROR mergesort_splits : handle_exceptions
> (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:53):
> Hypertable::Exception: auto flushing - HYPERTABLE request timeout
>        at void
> Hypertable::TableMutator::auto_flush(Hypertable::Timer&)
> (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:215)
>        at void
> Hypertable::TableMutator::wait_for_previous_buffer(Hypertable::Timer&)
> (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutator.cc:303):
> waiting for previous buffer
>        at bool
> Hypertable::TableMutatorCompletionCounter::wait_for_completion(Hypertable::Timer&)
> (/home/mateusz/hypertable/src/cc/Hypertable/Lib/TableMutatorCompletionCounter.h:71):
> terminate called after throwing an instance of 'Hypertable::Exception'
>  what():  auto flushing
> Aborted
>
> I have hit this deadlock in the 2 scenarios outlined below:
> 1. While running a long-lasting insertion process, I tried selecting in
> the CLI from the same table I was inserting into. This resulted in a
> deadlock, but it might have been just a coincidence, as the deadlock
> might already have been triggered.
>
> 2. Running a long-lasting insertion process. The application
> basically does an external mergesort on approx. 40 GB of data and inserts
> the data in sorted order into the index table. This triggers the
> error I pasted verbatim above.
>
> Should I grab the stack trace at the time the deadlock occurs
> from the application, the RangeServer, or both? Which one would be more
> helpful in investigating the bug?
>
> Mateusz
>

