Hi all, here is the content of the file that causes the BloomFilter assertion failure:
/hypertable/tables/METADATA/logging/AB2A0D28DE6B77FFDD6C72AF/cs0

$ hexdump -C cs0
00000000  49 64 78 46 69 78 2d 2d  2d 2d 1a 00 ff ff ff ff  |IdxFix----......|
00000010  00 00 00 00 00 00 00 00  7d 9f 49 64 78 56 61 72  |........}.IdxVar|
00000020  2d 2d 2d 2d 1a 00 ff ff  ff ff 00 00 00 00 00 00  |----............|
00000030  00 00 87 97                                       |....|
00000034

FYI

-- kuer

On Jul 22, 1:03 PM, Sanjit Jhala <[email protected]> wrote:
> Recovering ranges from crashed RangeServers is one of the high
> priority items Doug is working on.
>
> -Sanjit
>
> On Jul 21, 2009, at 7:59 PM, kuer wrote:
>
> > Hi, all,
>
> > Another question: since one of the range-servers coredumps when
> > replaying the commit log, I just stopped rebooting it. But this time,
> > the whole HT system seems to stop working, too.
>
> > Client program complains socket.timeout,
>
> > hyperspace shell hangs :
> > hypertable> show tables;
> > METADATA
> > kvcache
> > storage_se
>
> > Elapsed time: 0.00 s
> > hypertable> show create table storage_se;
> > ^^^^^ waiting for .... ????
>
> > Logging messages from Hypertable.Master :
>
> > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR] (AsyncComm/Comm.cc:212) No connection for 221.194.134.173:31060
> > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [WARN] (Lib/RangeServerClient.cc:312) Comm::send_request to 221.194.134.173:31060 failed - COMM not connected
> > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR] find_range_and_start_scan (Lib/IntervalScanner.cc:408): Hypertable::Exception: Comm::send_request to 221.194.134.173:31060 failed - COMM not connected
> >     at void Hypertable::RangeServerClient::send_message(const sockaddr_in&, Hypertable::CommBufPtr&, Hypertable::DispatchHandler*) (Lib/RangeServerClient.cc:314)
> > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR] (Master/MasterGc.cc:239) Error: caught exception while gc'ing: Problem creating scanner on METADATA[..0: ]
>
> > NOTES: 221.194.134.173 is the IP of the box where the RangeServer went wrong.
>
> > My question is :
> > since all information is shared by all rangeservers, why doesn't
> > hypertable.master reassign the ranges to other rangeservers when some
> > of the rangeservers go out of work ???
>
> > thanks
>
> > -- kuer
>
> > On Jul 22, 10:43 AM, kuer <[email protected]> wrote:
> >> Hi, Sanjit,
>
> >> I just uploaded the second part of range.log, range.20090722.log.2.gz.
>
> >> The first part, range.20090722.log.1.gz, is about 18MB; it exceeds
> >> the upload size limit.
>
> >>http://hypertable-dev.googlegroups.com/web/range.20090722.log.2.gz?
> >> gd...
>
> >> If it is necessary, I will split the first log file and upload the parts.
>
> >> Thanks
>
> >> -- kuer
>
> >> On Jul 22, 10:15 AM, Sanjit Jhala <[email protected]>
> >> wrote:
>
> >>> Hi Kuer,
>
> >>> You can gzip the RangeServer log and post them to the File Upload
> >>> Page. Thanks for reporting this issue.
>
> >>> -Sanjit
>
> >>> On Jul 21, 2009, at 6:44 PM, kuer wrote:
>
> >>>> Hi, Sanjit,
>
> >>>> with the --debug option, I get some logging messages, but the file
> >>>> is big; how do I share it with you?
>
> >>>> gdb backtrace of core files
>
> >>>> (gdb) bt
> >>>> #0 0x0000000000538272 in Hypertable::BasicBloomFilter<Hypertable::MurmurHash2>::BasicBloomFilter ()
> >>>> #1 0x000000000053d3be in Hypertable::CellStoreV1::create_bloom_filter ()
> >>>> #2 0x000000000053e10e in Hypertable::CellStoreV1::finalize ()
> >>>> #3 0x000000000051f112 in Hypertable::AccessGroup::run_compaction ()
> >>>> #4 0x0000000000504e45 in Hypertable::Range::split_compact_and_shrink ()
> >>>> #5 0x0000000000509310 in Hypertable::Range::split ()
> >>>> #6 0x00000000004ec693 in Hypertable::MaintenanceQueue::Worker::operator() ()
> >>>> #7 0x00000000006a5c40 in thread_proxy ()
> >>>> #8 0x00000038ae406367 in start_thread () from /lib64/libpthread.so.0
> >>>> #9 0x00000038ad8d2f7d in clone () from /lib64/libc.so.6
>
> >>>> -- kuer
>
> >>>> On Jul 22, 9:07 AM, Sanjit Jhala <[email protected]>
> >>>> wrote:
> >>>>> Hi Kuer,
>
> >>>>> This looks like a bug in the RangeServer code. The RangeServer is
> >>>>> trying to create a CellStore file and while creating the
> >>>>> CellStore's BloomFilter it's hitting an error condition.
>
> >>>>> Can you try a couple of things to help debug this issue?
>
> >>>>> Firstly turn on the RangeServer debug logging and report
> >>>>> RangeServer logs. You can do this by adding the global option
> >>>>> --debug to your start-all-servers.sh command line. Example:
> >>>>> <$HYPERTABLE_INSTALL_DIR>/bin/start-all-servers.sh kfs --debug
>
> >>>>> Secondly, if you could compile a debug build and send the stack
> >>>>> trace that would be helpful.
> >>>>> To do this, from your hypertable build directory run
> >>>>> ccmake <$HYPERTABLE_SRC_DIR> and make sure CMAKE_BUILD_TYPE is
> >>>>> set to Debug and install the new build. After you try to bring up
> >>>>> the RangeServer and it dumps core, you can load the core file in
> >>>>> gdb (Eg:
> >>>>> gdb <$HYPERTABLE_INSTALL_DIR>/bin/Hypertable.RangeServer <$CORE_FILE>).
> >>>>> You can run bt (backtrace) in gdb to get the stack trace.
>
> >>>>> -Sanjit
>
> >>>>> On Jul 21, 2009, at 5:36 PM, kuer wrote:
>
> >>>>>> Hi, all,
>
> >>>>>> one of the RangeServers hangs after a coredump and restart. Here
> >>>>>> are the messages in the rangeserver's log :
>
> >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN] (Lib/CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because 1246607682171649001 >= 1246607682128108001 (file='/hypertable/servers/221.194.134.173_31060/log/root/0')
> >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN] (Lib/CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because 1248187695757932563 >= 1247819802453791364 (file='/hypertable/servers/221.194.134.173_31060/log/metadata/2')
> >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN] (Lib/CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because 1248193806824860161 >= 1248189458336849002 (file='/hypertable/servers/221.194.134.173_31060/log/user/401')
> >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [INFO] (RangeServer/MaintenancePrioritizerLogCleanup.cc:103) Adding maintenance for range METADATA[0: .. ] because mid-split(1)
> >>>>>> 2009-07-22 08:23:41,449 1295067456 Hypertable.RangeServer [INFO] (RangeServer/RangeServer.cc:2032) Memory Usage: 312320288 bytes
> >>>>>> 2009-07-22 08:23:41,449 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:379) Starting Major Compaction of METADATA[0: .. ](default)
> >>>>>> 2009-07-22 08:23:41,529 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:533) Finished Compaction of METADATA[0: .. ](default)
> >>>>>> 2009-07-22 08:23:41,530 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:372) Starting InMemory Compaction of METADATA[0: .. ](location)
> >>>>>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:533) Finished Compaction of METADATA[0: .. ](location)
> >>>>>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO] (RangeServer/AccessGroup.cc:379) Starting Major Compaction of METADATA[0: .. ](logging)
> >>>>>> 2009-07-22 08:23:41,552 1378986304 Hypertable.RangeServer [FATAL] (Common/BloomFilter.h:47) failed expectation: m_num_bits != 0
>
> >>>>>> It seems that the RangeServer cannot recover by replaying the log.
>
> >>>>>> What's the problem? How can I fix it?
>
> >>>>>> Thanks
>
> >>>>>> -- kuer
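
For reference, the FATAL line at the end of the log above (failed expectation: m_num_bits != 0) is the sort of check that can trip when a Bloom filter is sized for zero expected items. The following is only a minimal sketch of that arithmetic, assuming the textbook sizing formula m = -n * ln(p) / (ln 2)^2. The names bloom_bits and should_build_filter are hypothetical and are not Hypertable's actual code; whether this is what really happens inside CellStoreV1::create_bloom_filter is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not Hypertable's API): number of bits for a Bloom
// filter expected to hold `items` keys at false-positive rate `fp_rate`,
// using the textbook formula m = -n * ln(p) / (ln 2)^2.
std::size_t bloom_bits(std::size_t items, double fp_rate) {
  const double ln2 = std::log(2.0);
  const double bits =
      -static_cast<double>(items) * std::log(fp_rate) / (ln2 * ln2);
  return static_cast<std::size_t>(std::ceil(bits));
}

// Hypothetical guard: only build a filter when the computed size is
// non-zero, so that an empty store skips the filter instead of hitting
// an assertion such as "m_num_bits != 0".
bool should_build_filter(std::size_t items, double fp_rate) {
  return bloom_bits(items, fp_rate) != 0;
}
```

With zero items the formula yields zero bits, so should_build_filter(0, 0.01) returns false, while any positive item count gives a non-zero size.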
