Hi Kuer,

As I mentioned before, the real issue here is why the "logging" access group in the METADATA table is being compacted at all. That seems to be the root of any corruption.

As for commit log replay, the upcoming 0.9.2.5 release includes a couple of log replay fixes. You can apply the attached patches to see whether they solve the problem.
-Sanjit

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Hypertable Development" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en
-~----------~----~----~----~------~----~------~--~---
0001-Fixed-bug-in-CommitLogReader-caused-by-fragment-queu.patch
0002-Fixed-bug-in-CommitLog-that-was-causing-some-fragmen.patch
On Jul 22, 2009, at 6:20 PM, kuer wrote:

>
> Hi, Sanjit,
>
> I have patched the HT_INFOF() call in AccessGroup.cc.
>
> I also modified CellStoreV1.cc to make sure that
> m_trailer.num_filter_items > 0 when creating the BloomFilter. I think
> this value is something like the capacity of the Bloom filter, so
> enlarging it should not cause much trouble.
>
> After making the change and restarting, the RangeServer cannot finish
> log replay. It seems that the modified RangeServer has destroyed
> something. This time the error is "RANGE SERVER range not found".
>
> I posted it in another thread:
> http://groups.google.com/group/hypertable-dev/browse_thread/thread/fd137bfa8e98281a
>
> Thanks
>
> -- kuer
>
>
> On Jul 23, 7:38 AM, Sanjit Jhala <[email protected]> wrote:
>> Hi Kuer,
>>
>> I suspect the BloomFilter code is fine. Looking at your logs, it
>> appears that the METADATA table is going through a split and the
>> AccessGroup "logging" is undergoing a major compaction. However, since
>> it is empty, nothing gets inserted into BloomFilterItems and hence the
>> assert gets hit.
>> (From the log:
>> 2009-07-22 09:39:01,415 1351514432 Hypertable.RangeServer [INFO]
>> (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
>> METADATA[0:<FF> <FF>..<FF><FF>](logging))
>>
>> This AccessGroup is currently not used by the system, so it should
>> be empty and should not undergo a compaction. Can you make the change
>> (below) to AccessGroup.cc so we can get a better idea of why it's
>> compacting? I suspect it might be a memory corruption issue (that we
>> have a fix for in the upcoming release).
>>
>> It would be great if you ran the RangeServer with valgrind turned on
>> to see whether the valgrind log reveals anything further (to do this, add
>> the option --valgrind-rangeserver when calling the start-all-servers.sh
>> script, e.g. <$HYPERTABLE_INSTALL_DIR/bin/start-all-servers.sh --valgrind-rangeserver>)
>>
>> -Sanjit
>>
>> --- a/src/cc/Hypertable/RangeServer/AccessGroup.cc
>> +++ b/src/cc/Hypertable/RangeServer/AccessGroup.cc
>> @@ -375,8 +375,8 @@ void AccessGroup::run_compaction(bool major) {
>>        if (m_immutable_cache->memory_used()==0 && m_stores.size() <= (size_t)1)
>>          HT_THROW(Error::OK, "");
>>        tableidx = 0;
>> -      HT_INFOF("Starting Major Compaction of %s(%s)",
>> -               m_range_name.c_str(), m_name.c_str());
>> +      HT_INFOF("Starting Major Compaction of %s(%s) immutable cache mem=%llu, num cell stores=%d",
>> +               m_range_name.c_str(), m_name.c_str(), m_immutable_cache->memory_used(), m_stores.size());
>>      }
>>      else {
>>        if (m_stores.size() > (size_t)Global::access_group_max_files) {
>>
>> On Jul 22, 2009, at 4:11 AM, kuer wrote:
>>
>>> Hi, all,
>>
>>> I found something interesting in cc/Hypertable/RangeServer/CellStoreV1.cc:
>>
>>>   if (m_bloom_filter_mode != BLOOM_FILTER_DISABLED) {
>>>     m_bloom_filter_items = new BloomFilterItems(); // approximator items
>>>   }
>>
>>>
>>>   // if bloom_items haven't been spilled to create a bloom filter yet, do it
>>>   if (m_bloom_filter_mode != BLOOM_FILTER_DISABLED) {
>>>     if (m_bloom_filter_items) {
>>>         ^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>         I think this cannot guarantee m_bloom_filter_items->size() > 0
>>
>>>       m_trailer.num_filter_items = m_bloom_filter_items->size();
>>>       ^^^^^ How about adding the following lines?
>>> +
>>> +     if (m_trailer.num_filter_items < 1) {
>>> +       m_trailer.num_filter_items = m_max_entries;
>>> +     }
>>> +     if (m_trailer.num_filter_items < 1) {
>>> +       m_trailer.num_filter_items = 1;
>>> +     }
>>> +
>>>       create_bloom_filter();
>>>     }
>>>     assert(!m_bloom_filter_items && m_bloom_filter);
>>>
>>>     m_bloom_filter->serialize(send_buf);
>>>     m_filesys->append(m_fd, send_buf, 0, &m_sync_handler);
>>>
>>>     m_outstanding_appends++;
>>>     m_offset += m_bloom_filter->size();
>>>   }
>>>
>>
>>> thanks
>>
>>> -- kuer
>>
>>> On Jul 22, 5:05 PM, kuer <[email protected]> wrote:
>>>> Hi, all,
>>
>>>> Here is the content of the file that causes the BloomFilter
>>>> assertion failure:
>>
>>>> /hypertable/tables/METADATA/logging/AB2A0D28DE6B77FFDD6C72AF/cs0
>>
>>>> $ hexdump -C cs0
>>>> 00000000  49 64 78 46 69 78 2d 2d  2d 2d 1a 00 ff ff ff ff  |IdxFix----......|
>>>> 00000010  00 00 00 00 00 00 00 00  7d 9f 49 64 78 56 61 72  |........}.IdxVar|
>>>> 00000020  2d 2d 2d 2d 1a 00 ff ff  ff ff 00 00 00 00 00 00  |----............|
>>>> 00000030  00 00 87 97                                       |....|
>>>> 00000034
>>
>>>> FYI
>>
>>>> -- kuer
>>
>>>> On Jul 22, 1:03 PM, Sanjit Jhala <[email protected]> wrote:
>>
>>>>> Recovering ranges from crashed RangeServers is one of the high
>>>>> priority items Doug is working on.
>>
>>>>> -Sanjit
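
The empty-filter case discussed in the quoted thread can be looked at in isolation. What follows is a minimal standalone sketch, not Hypertable code: choose_filter_capacity is a hypothetical helper, and it assumes (as kuer does) that num_filter_items acts as the Bloom filter's capacity and that m_max_entries is a configured upper bound. It mirrors the fallback order of kuer's proposed lines: use the observed item count, else the configured maximum, else 1.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Hypertable): returns a non-zero
// capacity for a Bloom filter even when no items were collected.
std::size_t choose_filter_capacity(std::size_t observed_items,
                                   std::size_t max_entries) {
  std::size_t capacity = observed_items;  // normal case: size to the items seen
  if (capacity < 1)
    capacity = max_entries;               // empty store: fall back to the configured bound
  if (capacity < 1)
    capacity = 1;                         // last resort: never hand zero to the filter
  assert(capacity >= 1);                  // postcondition the filter constructor relies on
  return capacity;
}
```

For example, choose_filter_capacity(0, 1000) yields 1000 and choose_filter_capacity(0, 0) yields 1. A guard like this would only avoid the assert; it does not explain why an empty access group is being compacted in the first place, which is the question raised at the top of this message.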
