Hi Kuer,

Thanks for catching that!

-Sanjit
On Fri, Jul 24, 2009 at 12:20 AM, kuer <[email protected]> wrote:
>
> Hi, Sanjit,
>
> There is some code in the patch:
>
> diff --git a/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc b/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc
> index 4c408e3..6c55cde 100644
> --- a/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc
> +++ b/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc
> @@ -86,9 +86,16 @@ void MaintenanceScheduler::schedule() {
>      * Purge commit log fragments
>      */
>     {
> -     int64_t revision_root = TIMESTAMP_MAX;
> -     int64_t revision_metadata = TIMESTAMP_MAX;
> -     int64_t revision_user = TIMESTAMP_MAX;
> +     int64_t revision_user;
> +     int64_t revision_metadata;
> +     int64_t revision_root;
> +
> +     (Global::user_log != 0) ?
> +       revision_user = Global::user_log->get_latest_revision() : TIMESTAMP_MIN;
> +     (Global::metadata_log != 0) ?
> +       revision_metadata = Global::metadata_log->get_latest_revision() : TIMESTAMP_MIN;
> +     (Global::root_log != 0) ?
> +       revision_root = Global::root_log->get_latest_revision() : TIMESTAMP_MIN;
>
> I think these lines are wrong. They should be:
>
> +     revision_user = (Global::user_log != 0) ?
> +       Global::user_log->get_latest_revision() : TIMESTAMP_MIN;
> +     revision_metadata = (Global::metadata_log != 0) ?
> +       Global::metadata_log->get_latest_revision() : TIMESTAMP_MIN;
> +     revision_root = (Global::root_log != 0) ?
> +       Global::root_log->get_latest_revision() : TIMESTAMP_MIN;
>
> Is that right?
>
> -- kuer
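kuer's reading is correct: in the version from the patch, the conditional operator's true branch is the assignment and its false branch is just the value TIMESTAMP_MIN, which is evaluated and discarded, so whenever the corresponding log pointer is null the revision variable is left uninitialized. A minimal standalone sketch of both forms (the Log type here is a hypothetical stand-in for the real commit-log class, not the Hypertable headers):

    #include <cstdint>
    #include <iostream>

    const int64_t TIMESTAMP_MIN = INT64_MIN;

    // Hypothetical stand-in for Hypertable's commit-log class.
    struct Log {
      int64_t get_latest_revision() const { return 42; }
    };

    int main() {
      Log *user_log = 0;            // simulate a log that was never created
      int64_t revision_user = -999; // sentinel standing in for "uninitialized"

      // Buggy form: this parses as
      //   cond ? (revision_user = ...) : TIMESTAMP_MIN;
      // When user_log is null, the false branch merely evaluates and
      // discards TIMESTAMP_MIN, so revision_user is never assigned.
      (user_log != 0) ?
          revision_user = user_log->get_latest_revision() : TIMESTAMP_MIN;
      std::cout << "buggy:     " << revision_user << "\n"; // still -999

      // Corrected form (kuer's version): assign the ternary's result.
      revision_user = (user_log != 0) ?
          user_log->get_latest_revision() : TIMESTAMP_MIN;
      std::cout << "corrected: " << revision_user << "\n"; // TIMESTAMP_MIN
      return 0;
    }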
> On Jul 23, 10:18 PM, Sanjit Jhala <[email protected]> wrote:
> > Hi Kuer,
> >
> > As I mentioned before, the real issue here is why the "logging" access
> > group in the METADATA table is being compacted. That seems to be the
> > root of any corruption. Coming to the issue of commit log replay, we
> > have a couple of log replay fixes in the soon-to-be-released 0.9.2.5
> > release. You can apply the attached patches to see if that solves the
> > problem.
> >
> > -Sanjit
> >
> > Attachments:
> > 0001-Fixed-bug-in-CommitLogReader-caused-by-fragment-queu.patch (6K)
> > 0002-Fixed-bug-in-CommitLog-that-was-causing-some-fragmen.patch (5K)
> >
> > On Jul 22, 2009, at 6:20 PM, kuer wrote:
> > >
> > > Hi, Sanjit,
> > >
> > > I have applied the HT_INFOF() patch to AccessGroup.cc.
> > >
> > > But I also modified CellStoreV1.cc to make sure m_trailer.num_filter_items > 0
> > > when creating the BloomFilter. I think this value is something like the
> > > capacity of the bloom filter, so enlarging it should not cause much trouble.
> > >
> > > After modifying and relaunching, the RangeServer cannot finish log
> > > replay. It seems the modified RangeServer has broken something; this
> > > time the error is "RANGE SERVER range not found".
> > >
> > > I posted it in another thread:
> > > http://groups.google.com/group/hypertable-dev/browse_thread/thread/fd...
> > >
> > > Thanks
> > >
> > > -- kuer
> > >
> > > On Jul 23, 7:38 AM, Sanjit Jhala <[email protected]> wrote:
> > > >
> > > > Hi Kuer,
> > > >
> > > > I suspect the BloomFilter code is fine. Taking a look at your logs,
> > > > it looks like the METADATA table is going through a split and the
> > > > AccessGroup "logging" is undergoing a major compaction. However,
> > > > since it is empty, nothing gets inserted into BloomFilterItems and
> > > > hence the assert gets hit.
> > > >
> > > > (From the log:
> > > > 2009-07-22 09:39:01,415 1351514432 Hypertable.RangeServer [INFO]
> > > > (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
> > > > METADATA[0:<FF> <FF>..<FF><FF>](logging))
> > > >
> > > > This AccessGroup is currently not used by the system, so it should be
> > > > empty and should not undergo a compaction. Can you make the change
> > > > (below) to AccessGroup.cc so we can get a better idea of why it is
> > > > compacting? I suspect it might be a memory corruption issue (which we
> > > > have a fix for in the upcoming release).
> > > >
> > > > It would also be great if you ran the RangeServer under valgrind to
> > > > see if the valgrind log reveals anything further. To do this, add the
> > > > option --valgrind-rangeserver when calling the start-all-servers.sh
> > > > script, e.g.:
> > > > $HYPERTABLE_INSTALL_DIR/bin/start-all-servers.sh --valgrind-rangeserver
> > > >
> > > > -Sanjit
> > > >
> > > > --- a/src/cc/Hypertable/RangeServer/AccessGroup.cc
> > > > +++ b/src/cc/Hypertable/RangeServer/AccessGroup.cc
> > > > @@ -375,8 +375,8 @@ void AccessGroup::run_compaction(bool major) {
> > > >          if (m_immutable_cache->memory_used() == 0 && m_stores.size() <= (size_t)1)
> > > >            HT_THROW(Error::OK, "");
> > > >          tableidx = 0;
> > > > -        HT_INFOF("Starting Major Compaction of %s(%s)",
> > > > -                 m_range_name.c_str(), m_name.c_str());
> > > > +        HT_INFOF("Starting Major Compaction of %s(%s) immutable cache mem=%llu, num cell stores=%d",
> > > > +                 m_range_name.c_str(), m_name.c_str(), m_immutable_cache->memory_used(), m_stores.size());
> > > >        }
> > > >        else {
> > > >          if (m_stores.size() > (size_t)Global::access_group_max_files) {
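Sanjit's diagnosis also explains the assertion mechanically: a bloom filter's size is derived from its expected item count, so an estimate of zero items yields a zero-bit filter and a construction-time guard fires. A toy sketch of that failure mode, using the standard sizing formula (the real Hypertable BloomFilter differs in detail):

    #include <cassert>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Toy bloom filter using the standard sizing formula
    //   m = -n * ln(p) / (ln 2)^2  bits for n items at false-positive rate p.
    class ToyBloomFilter {
    public:
      ToyBloomFilter(size_t items_estimate, double false_positive_rate) {
        // With items_estimate == 0 the formula yields zero bits, so a
        // guard like this fires -- the failure kuer is hitting when an
        // empty access group gets compacted.
        assert(items_estimate > 0);
        size_t num_bits = (size_t)std::ceil(
            -(double)items_estimate * std::log(false_positive_rate) /
            (std::log(2.0) * std::log(2.0)));
        m_bits.resize((num_bits + 7) / 8);
      }
      size_t size_bytes() const { return m_bits.size(); }
    private:
      std::vector<unsigned char> m_bits;
    };

    int main() {
      ToyBloomFilter ok(1000, 0.01); // ~1.2 KB filter, fine
      ToyBloomFilter bad(0, 0.01);   // assert fires: zero-item estimate
      return 0;
    }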
> > > > On Jul 22, 2009, at 4:11 AM, kuer wrote:
> > > > >
> > > > > Hi, all,
> > > > >
> > > > > I found something interesting in cc/Hypertable/RangeServer/CellStoreV1.cc:
> > > > >
> > > > > 168   if (m_bloom_filter_mode != BLOOM_FILTER_DISABLED) {
> > > > > 169     m_bloom_filter_items = new BloomFilterItems(); // aproximator items
> > > > > 170   }
> > > > >
> > > > > 367
> > > > > 368   // if bloom_items haven't been spilled to create a bloom filter yet, do it
> > > > > 369   if (m_bloom_filter_mode != BLOOM_FILTER_DISABLED) {
> > > > > 370     if (m_bloom_filter_items) {
> > > > >         ^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > >         I think this does not guarantee m_bloom_filter_items->size() > 0
> > > > >
> > > > > 371       m_trailer.num_filter_items = m_bloom_filter_items->size();
> > > > >
> > > > > How about adding the following lines after line 371?
> > > > >
> > > > > +       if (m_trailer.num_filter_items < 1) {
> > > > > +         m_trailer.num_filter_items = m_max_entries;
> > > > > +       }
> > > > > +       if (m_trailer.num_filter_items < 1) {
> > > > > +         m_trailer.num_filter_items = 1;
> > > > > +       }
> > > > >
> > > > > 372       create_bloom_filter();
> > > > > 373     }
> > > > > 374     assert(!m_bloom_filter_items && m_bloom_filter);
> > > > > 375
> > > > > 376     m_bloom_filter->serialize(send_buf);
> > > > > 377     m_filesys->append(m_fd, send_buf, 0, &m_sync_handler);
> > > > > 378
> > > > > 379     m_outstanding_appends++;
> > > > > 380     m_offset += m_bloom_filter->size();
> > > > > 381   }
> > > > > 382
> > > > >
> > > > > Thanks
> > > > >
> > > > > -- kuer
> > > > >
> > > > > On Jul 22, 5:05 PM, kuer <[email protected]> wrote:
> > > > > >
> > > > > > Hi, all,
> > > > > >
> > > > > > Here is the content of the file that causes the assertion failure
> > > > > > in BloomFilter:
> > > > > >
> > > > > > /hypertable/tables/METADATA/logging/AB2A0D28DE6B77FFDD6C72AF/cs0
> > > > > >
> > > > > > $ hexdump -C cs0
> > > > > > 00000000  49 64 78 46 69 78 2d 2d  2d 2d 1a 00 ff ff ff ff  |IdxFix----......|
> > > > > > 00000010  00 00 00 00 00 00 00 00  7d 9f 49 64 78 56 61 72  |........}.IdxVar|
> > > > > > 00000020  2d 2d 2d 2d 1a 00 ff ff  ff ff 00 00 00 00 00 00  |----............|
> > > > > > 00000030  00 00 87 97                                       |....|
> > > > > > 00000034
> > > > > >
> > > > > > FYI
> > > > > >
> > > > > > -- kuer
> > > > > >
> > > > > > On Jul 22, 1:03 PM, Sanjit Jhala <[email protected]> wrote:
> > > > > > >
> > > > > > > Recovering ranges from crashed RangeServers is one of the
> > > > > > > high-priority items Doug is working on.
> > > > > > >
> > > > > > > -Sanjit
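The hexdump above shows a cell store containing only the fixed and variable index blocks ("IdxFix----", "IdxVar----") and no data, which is consistent with the empty access group. For completeness, kuer's proposed workaround from the CellStoreV1.cc message amounts to clamping the item estimate before sizing the filter. A standalone restatement follows (the function name is mine; m_max_entries in the original is the store's configured capacity). Note the caveat from later in the thread: this only papers over the assert, and Sanjit's point stands that the real bug is the empty access group being compacted at all.

    #include <cstdint>
    #include <iostream>

    // Standalone restatement of kuer's clamp: never size a bloom filter
    // for fewer than one item.  Falls back to the store's configured
    // capacity first, then to 1 as a last resort.
    int64_t safe_num_filter_items(int64_t actual_items, int64_t max_entries) {
      int64_t n = actual_items;
      if (n < 1)
        n = max_entries;
      if (n < 1)
        n = 1;
      return n;
    }

    int main() {
      std::cout << safe_num_filter_items(5000, 100000) << "\n"; // 5000
      std::cout << safe_num_filter_items(0, 100000) << "\n";    // 100000
      std::cout << safe_num_filter_items(0, 0) << "\n";         // 1
      return 0;
    }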
