Hi Kuer,

Can you grep through your logs (on all machines) for "ERROR" and
"Exception"?  Post the grep output and we'll take a look.
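
In case it helps, here is a minimal sketch of that search as a shell
function. The name scan_logs and the log directory argument are
placeholders for illustration, not something Hypertable ships; point it
at wherever your servers write their logs.

```shell
#!/bin/sh
# Hypothetical helper: print every line that mentions ERROR or Exception
# in the files under a log directory. The directory is passed in as an
# argument because the log location depends on your install.
scan_logs() {
    log_dir="$1"
    # -r: recurse into subdirectories
    # -n: prefix each match with its line number
    # -E: extended regex, so '|' means "either pattern"
    grep -rnE 'ERROR|Exception' "$log_dir"
}
```

Run it on each machine, for example
scan_logs /path/to/your/logs > errors-$(hostname).txt
and post the resulting files.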

- Doug

On Wed, Jul 22, 2009 at 2:05 AM, kuer <[email protected]> wrote:

>
> Hi, all,
>
> Here is the content of the file that causes the BloomFilter assertion failure:
>
> /hypertable/tables/METADATA/logging/AB2A0D28DE6B77FFDD6C72AF/cs0
>
> $ hexdump -C cs0
> 00000000  49 64 78 46 69 78 2d 2d  2d 2d 1a 00 ff ff ff ff  |IdxFix----......|
> 00000010  00 00 00 00 00 00 00 00  7d 9f 49 64 78 56 61 72  |........}.IdxVar|
> 00000020  2d 2d 2d 2d 1a 00 ff ff  ff ff 00 00 00 00 00 00  |----............|
> 00000030  00 00 87 97                                       |....|
> 00000034
>
>  FYI
>
>   -- kuer
>
>
> On Jul 22, 1:03 PM, Sanjit Jhala <[email protected]> wrote:
> >   Recovering ranges from crashed RangeServers is one of the high
> > priority items Doug is working on.
> >
> > -Sanjit
> >
> > On Jul 21, 2009, at 7:59 PM, kuer wrote:
> >
> >
> >
> > > Hi, all,
> >
> > > > Another question: one of the RangeServers dumps core while
> > > > replaying its commit log, so I stopped restarting it. But now the
> > > > whole Hypertable system seems to have stopped working, too.
> >
> > > > Client programs complain of socket.timeout,
> >
> > > hyperspace shell hangs :
> > > hypertable> show tables;
> > > METADATA
> > > kvcache
> > > storage_se
> >
> > >  Elapsed time:  0.00 s
> > > hypertable> show create table storage_se;
> > > ^^^^^ waiting for .... ????
> >
> > > Logging messages from Hypertable.Master :
> >
> > > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR]
> > > (AsyncComm/Comm.cc:212) No connection for 221.194.134.173:31060
> > > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [WARN] (Lib/
> > > RangeServerClient.cc:312) Comm::send_request to 221.194.134.173:31060
> > > failed - COMM not connected
> > > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR]
> > > find_range_and_start_scan (Lib/IntervalScanner.cc:408):
> > > Hypertable::Exception: Comm::send_request to 221.194.134.173:31060
> > > failed - COMM not connected
> > >    at void Hypertable::RangeServerClient::send_message(const
> > > sockaddr_in&, Hypertable::CommBufPtr&, Hypertable::DispatchHandler*)
> > > (Lib/RangeServerClient.cc:314)
> > > 2009-07-22 10:45:45,276 1350199616 Hypertable.Master [ERROR] (Master/
> > > MasterGc.cc:239) Error: caught exception while gc'ing: Problem
> > > creating scanner on METADATA[..0: ]
> >
> > > > NOTE: 221.194.134.173 is the IP of the box where the RangeServer failed.
> >
> > > > My question is:
> > > > since all information is shared by all RangeServers, why doesn't
> > > > Hypertable.Master reassign the ranges to other RangeServers when
> > > > some of the RangeServers go down?
> >
> > > thanks
> >
> > >   -- kuer
> >
> > > > On Jul 22, 10:43 AM, kuer <[email protected]> wrote:
> > >> Hi, Sanjit,
> >
> > > >> I just uploaded the second part of the RangeServer log,
> > > >> range.20090722.log.2.gz.
> >
> > > >> The first part, range.20090722.log.1.gz, is about 18MB, which
> > > >> exceeds the file upload limit.
> >
> > >>http://hypertable-dev.googlegroups.com/web/range.20090722.log.2.gz?
> > >> gd...
> >
> > > >> If necessary, I will split the first log file and upload the pieces.
> >
> > >> Thanks
> >
> > >>   -- kuer
> >
> > > >> On Jul 22, 10:15 AM, Sanjit Jhala <[email protected]>
> > >> wrote:
> >
> > >>> Hi Kuer,
> >
> > >>> You can gzip the RangeServer log and post them to the File Upload
> > >>> Page. Thanks for reporting this issue.
> >
> > >>> -Sanjit
> >
> > >>> On Jul 21, 2009, at 6:44 PM, kuer wrote:
> >
> > >>>> Hi, Sanjit,
> >
> > > >>>> With the --debug option I get some log messages, but the file is
> > > >>>> big. How can I share it with you?
> >
> > >>>> gdb backtrace of core files
> >
> > >>>> (gdb) bt
> > > >>>> #0  0x0000000000538272 in Hypertable::BasicBloomFilter<Hypertable::MurmurHash2>::BasicBloomFilter ()
> > > >>>> #1  0x000000000053d3be in Hypertable::CellStoreV1::create_bloom_filter ()
> > > >>>> #2  0x000000000053e10e in Hypertable::CellStoreV1::finalize ()
> > > >>>> #3  0x000000000051f112 in Hypertable::AccessGroup::run_compaction ()
> > > >>>> #4  0x0000000000504e45 in Hypertable::Range::split_compact_and_shrink ()
> > > >>>> #5  0x0000000000509310 in Hypertable::Range::split ()
> > > >>>> #6  0x00000000004ec693 in Hypertable::MaintenanceQueue::Worker::operator() ()
> > > >>>> #7  0x00000000006a5c40 in thread_proxy ()
> > > >>>> #8  0x00000038ae406367 in start_thread () from /lib64/libpthread.so.0
> > > >>>> #9  0x00000038ad8d2f7d in clone () from /lib64/libc.so.6
> >
> > >>>> -- kuer
> >
> > > >>>> On Jul 22, 9:07 AM, Sanjit Jhala <[email protected]>
> > >>>> wrote:
> > >>>>> Hi Kuer,
> >
> > > >>>>> This looks like a bug in the RangeServer code. The RangeServer is
> > > >>>>> trying to create a CellStore file, and while creating the
> > > >>>>> CellStore's BloomFilter it's hitting an error condition.
> >
> > >>>>> Can you try a couple of things to help debug this issue?
> >
> > > >>>>> Firstly, turn on RangeServer debug logging and report the
> > > >>>>> RangeServer logs. You can do this by adding the global option
> > > >>>>> --debug to your start-all-servers.sh command line. Example:
> > > >>>>> <$HYPERTABLE_INSTALL_DIR>/bin/start-all-servers.sh kfs --debug
> >
> > > >>>>> Secondly, if you could compile a debug build and send the stack
> > > >>>>> trace, that would be helpful. To do this, run
> > > >>>>> ccmake <$HYPERTABLE_SRC_DIR> from your hypertable build directory,
> > > >>>>> make sure CMAKE_BUILD_TYPE is set to Debug, and install the new
> > > >>>>> build. After you try to bring up the RangeServer and it dumps
> > > >>>>> core, you can load the core file in gdb (e.g.
> > > >>>>> gdb <$HYPERTABLE_INSTALL_DIR>/bin/Hypertable.RangeServer <$CORE_FILE>).
> > > >>>>> You can run bt (backtrace) in gdb to get the stack trace.
> >
> > >>>>> -Sanjit
> >
> > >>>>> On Jul 21, 2009, at 5:36 PM, kuer wrote:
> >
> > >>>>>> Hi, all,
> >
> > > >>>>>> One of the RangeServers hangs after a core dump and restart. Here
> > > >>>>>> are the messages in the RangeServer's log:
> >
> > >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN]
> > >>>>>> (Lib/
> > >>>>>> CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because
> > >>>>>> 1246607682171649001 >= 1246607682128108001 (file='/hypertable/
> > >>>>>> servers/
> > >>>>>> 221.194.134.173_31060/log/root/0')
> > >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN]
> > >>>>>> (Lib/
> > >>>>>> CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because
> > >>>>>> 1248187695757932563 >= 1247819802453791364 (file='/hypertable/
> > >>>>>> servers/
> > >>>>>> 221.194.134.173_31060/log/metadata/2')
> > >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [WARN]
> > >>>>>> (Lib/
> > >>>>>> CommitLog.cc:250) clgc LOG FRAGMENT PURGE breaking because
> > >>>>>> 1248193806824860161 >= 1248189458336849002 (file='/hypertable/
> > >>>>>> servers/
> > >>>>>> 221.194.134.173_31060/log/user/401')
> > >>>>>> 2009-07-22 08:23:41,448 1295067456 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/MaintenancePrioritizerLogCleanup.cc:103) Adding
> > >>>>>> maintenance for range METADATA[0: .. ] because mid-split(1)
> > >>>>>> 2009-07-22 08:23:41,449 1295067456 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/RangeServer.cc:2032) Memory Usage: 312320288 bytes
> > >>>>>> 2009-07-22 08:23:41,449 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
> > >>>>>> METADATA
> > >>>>>> [0: .. ](default)
> > >>>>>> 2009-07-22 08:23:41,529 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:533) Finished Compaction of METADATA
> > >>>>>> [0: .. ](default)
> > >>>>>> 2009-07-22 08:23:41,530 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:372) Starting InMemory Compaction of
> > >>>>>> METADATA[0: .. ](location)
> > >>>>>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:533) Finished Compaction of METADATA
> > >>>>>> [0: .. ](location)
> > >>>>>> 2009-07-22 08:23:41,549 1378986304 Hypertable.RangeServer [INFO]
> > >>>>>> (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
> > >>>>>> METADATA
> > >>>>>> [0: .. ](logging)
> > >>>>>> 2009-07-22 08:23:41,552 1378986304 Hypertable.RangeServer [FATAL]
> > >>>>>> (Common/BloomFilter.h:47) failed expectation: m_num_bits != 0
> >
> > > >>>>>> It seems the RangeServer cannot recover by replaying its log.
> >
> > > >>>>>> What's the problem? How can I fix it?
> >
> > >>>>>> Thanks
> >
> > >>>>>>   -- kuer
> >
>
