Hi All,

I've run a stress test against Hypertable using Hadoop MapReduce, and
found several errors:

1. Querying an existing row key sometimes returns no result. This is a
transient error that occurs at a low frequency (< 0.1%); I get the
correct result if I manually query the failed row key again.
Nothing is logged on either the client or the server side.

2. The querying processes sometimes hang indefinitely when they use
the same row key.

3. Under high stress, the range server sometimes dies with a core dump:
#0  0x000000000051c54f in Hypertable::intrusive_ptr_release
(rc=0x1ca45a0)
    at /home/yd/src/hypertable-0.9.0.10-yd/src/cc/Common/
ReferenceCount.h:73
73            delete rc;
(gdb) where
#0  0x000000000051c54f in Hypertable::intrusive_ptr_release
(rc=0x1ca45a0)
    at /home/yd/src/hypertable-0.9.0.10-yd/src/cc/Common/
ReferenceCount.h:73
#1  0x000000000061b96a in ~intrusive_ptr (this=0x41400360) at /usr/
local/include/boost-1_34_1/boost/intrusive_ptr.hpp:83
#2  0x000000000062df85 in Hypertable::HandlerMap::purge_handler
(this=0xa61f00, handler=0x1ca45a0)
    at /home/yd/src/hypertable-0.9.0.10-yd/src/cc/AsyncComm/
HandlerMap.h:142
#3  0x000000000062dcc6 in
Hypertable::ReactorRunner::cleanup_and_remove_handlers
(this=0x414011a8, [EMAIL PROTECTED])
    at /home/yd/src/hypertable-0.9.0.10-yd/src/cc/AsyncComm/
ReactorRunner.cc:129
#4  0x000000000062d9c7 in Hypertable::ReactorRunner::operator()
(this=0x414011a8)
    at /home/yd/src/hypertable-0.9.0.10-yd/src/cc/AsyncComm/
ReactorRunner.cc:74
#5  0x000000000062be53 in
boost::detail::function::void_function_obj_invoker0<Hypertable::ReactorRunner,
void>::invoke (
    [EMAIL PROTECTED])
    at /home/yd/src/hypertable-0.9.0.10-yd/src/cc/boost-1_34-fix/boost/
function/function_template.hpp:158
#6  0x0000002a95cb0dc7 in boost::function0<void,
std::allocator<boost::function_base> >::operator() ()
   from /usr/local/lib/libboost_thread-gcc34-mt-1_34_1.so.1.34.1
#7  0x0000002a95cb0407 in boost::thread_group::join_all () from /usr/
local/lib/libboost_thread-gcc34-mt-1_34_1.so.1.34.1
#8  0x000000302b80610a in start_thread () from /lib64/tls/
libpthread.so.0
#9  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#10 0x0000000000000000 in ?? ()

4. Even when there is a very serious problem, the client API keeps
retrying many times and ignores the Hypertable.Request.Timeout=10
setting in hypertable.cfg. For example:
1222178929 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222178937 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222178945 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222178953 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222178961 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222178969 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222178977 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222178985 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222178993 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222179001 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222179009 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222179017 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222179025 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222179033 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222179041 INFO unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/ConnectionManager.cc:264) Event: type=DISCONNECT "COMM
connect error" from=10.65.25.148:38060; Problem connecting to Root
RangeServer, will retry in 8 seconds...
1222179049 ERROR unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
Hypertable/Lib/RangeLocator.cc:542) Timeout (20s) waiting for root
RangeServer connection - 10.65.25.148:38060
1222179049 ERROR unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
AsyncComm/Comm.cc:204) No connection for 10.65.25.148:38060
1222179049 WARN unknown : (/home/yd/src/hypertable-0.9.0.10-yd/src/cc/
Hypertable/Lib/RangeServerClient.cc:229) Comm::send_request to
10.65.25.148:38060 failed - COMM not connected
1222179049 ERROR unknown : dump_error_history (/home/yd/src/
hypertable-0.9.0.10-yd/src/cc/Hypertable/Lib/RangeLocator.h:144):
Hypertable::Exception: Problem creating scanner for start row
'0:16:00000000' on METADATA[..??] - COMM not connected
        at int
Hypertable::RangeLocator::find(Hypertable::TableIdentifier*, const
char*, Hypertable::RangeLocationInfo*, Hypertable::Timer&, bool) (/
home/yd/src/hypertable-0.9.0.10-yd/src/cc/Hypertable/Lib/
RangeLocator.cc:289)
        at void
Hypertable::RangeServerClient::send_message(sockaddr_in&,
Hypertable::CommBufPtr&, Hypertable::DispatchHandler*) (/home/yd/src/
hypertable-0.9.0.10-yd/src/cc/Hypertable/Lib/RangeServerClient.cc:
230): Comm::send_request to 10.65.25.148:38060 failed
Exception: Locating range for row = '00000000'
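
To make the behavior expected in item 4 concrete, here is a minimal
sketch of a retry loop that respects one overall deadline. It is not
Hypertable code: connect_with_deadline and its parameters are
hypothetical names used only for illustration, and the connection
attempt is passed in as a plain callback.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of the Hypertable API).
// Calls try_connect repeatedly until it succeeds or until `timeout`
// has elapsed in total, waiting `retry_interval` between attempts.
// Returns true on success, false once the deadline has passed.
inline bool connect_with_deadline(const std::function<bool()>& try_connect,
                                  std::chrono::milliseconds timeout,
                                  std::chrono::milliseconds retry_interval,
                                  int* attempts_out = nullptr) {
    using clock = std::chrono::steady_clock;
    const clock::time_point deadline = clock::now() + timeout;
    int attempts = 0;
    bool connected = false;

    for (;;) {
        ++attempts;
        if (try_connect()) {
            connected = true;
            break;
        }
        const clock::time_point now = clock::now();
        if (now >= deadline) {
            break;  // overall timeout reached: give up instead of retrying
        }
        // Never sleep past the deadline, even if the retry interval is longer.
        const clock::duration remaining = deadline - now;
        std::this_thread::sleep_for(
            std::min<clock::duration>(retry_interval, remaining));
    }

    if (attempts_out != nullptr) {
        *attempts_out = attempts;
    }
    return connected;
}
```

With a 10 second timeout and the 8 second retry interval seen in the
log, a loop like this would stop about 10 seconds after the first
failed attempt rather than retrying for two minutes.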

Donald