Hi Kuer,

Thanks for catching that!

-Sanjit
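
For readers following the thread, the difference between the two forms quoted below can be sketched in isolation. This is only an illustration under stated assumptions, not Hypertable code: FakeLog, the sentinel parameter, and the locally defined TIMESTAMP_MIN are hypothetical stand-ins for the real commit log classes and constant.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for Hypertable's TIMESTAMP_MIN (assumption).
const int64_t TIMESTAMP_MIN = std::numeric_limits<int64_t>::min();

// Hypothetical stand-in for a commit log object (assumption).
struct FakeLog {
  int64_t latest;
  int64_t get_latest_revision() const { return latest; }
};

// Form used in the patch: the assignment sits inside the "true" branch.
// When the pointer is null, the "false" branch only evaluates
// TIMESTAMP_MIN and discards it, so the variable is never assigned.
// The sentinel exists only to make that observable here; in the patch
// the variable would simply be left uninitialized.
int64_t revision_as_patched(const FakeLog *log_ptr, int64_t sentinel) {
  int64_t revision = sentinel;
  (log_ptr != 0) ? revision = log_ptr->get_latest_revision() : TIMESTAMP_MIN;
  return revision;
}

// Form suggested by kuer: the whole conditional expression is the
// right-hand side, so both branches produce the assigned value.
int64_t revision_as_fixed(const FakeLog *log_ptr) {
  int64_t revision =
      (log_ptr != 0) ? log_ptr->get_latest_revision() : TIMESTAMP_MIN;
  return revision;
}
```

With a null pointer, revision_as_patched returns whatever value the variable held before the statement, while revision_as_fixed returns TIMESTAMP_MIN. With a valid pointer both return the log's latest revision.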

On Fri, Jul 24, 2009 at 12:20 AM, kuer <[email protected]> wrote:

>
> Hi, Sanjit,
>
> There is some code in the patch:
>
> diff --git a/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc b/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc
> index 4c408e3..6c55cde 100644
> --- a/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc
> +++ b/src/cc/Hypertable/RangeServer/MaintenanceScheduler.cc
> @@ -86,9 +86,16 @@ void MaintenanceScheduler::schedule() {
>    * Purge commit log fragments
>    */
>   {
> -    int64_t revision_root     = TIMESTAMP_MAX;
> -    int64_t revision_metadata = TIMESTAMP_MAX;
> -    int64_t revision_user     = TIMESTAMP_MAX;
> +    int64_t revision_user;
> +    int64_t revision_metadata;
> +    int64_t revision_root;
> +
> +    (Global::user_log !=0) ?
> +        revision_user = Global::user_log->get_latest_revision() : TIMESTAMP_MIN;
> +    (Global::metadata_log !=0) ?
> +        revision_metadata = Global::metadata_log->get_latest_revision() : TIMESTAMP_MIN;
> +    (Global::root_log !=0) ?
> +        revision_root = Global::root_log->get_latest_revision() : TIMESTAMP_MIN;
>
> I think these lines are wrong. They should be:
>
> +    revision_user =  (Global::user_log !=0) ?
> +        Global::user_log->get_latest_revision() : TIMESTAMP_MIN;
> +    revision_metadata = (Global::metadata_log !=0) ?
> +        Global::metadata_log->get_latest_revision() : TIMESTAMP_MIN;
> +    revision_root = (Global::root_log !=0) ?
> +        Global::root_log->get_latest_revision() : TIMESTAMP_MIN;
>
> Is that right?
>
>  -- kuer
>
>
> On Jul 23, 10:18 PM, Sanjit Jhala <[email protected]> wrote:
> > Hi Kuer,
> >
> > As I mentioned before, the real issue here is why the logging access
> > group in the METADATA table is being compacted. That seems to be the
> > root of any corruption. As for the commit log replay issue, we
> > have a couple of log replay fixes in the upcoming 0.9.2.5 release.
> > You can apply the attached patches to see if they solve the
> > problem.
> >
> > -Sanjit
> >
> >  0001-Fixed-bug-in-CommitLogReader-caused-by-fragment-queu.patch
> >
> >
> >
> >  0002-Fixed-bug-in-CommitLog-that-was-causing-some-fragmen.patch
> >
> >
> >
> > On Jul 22, 2009, at 6:20 PM, kuer wrote:
> >
> >
> >
> > > Hi, Sanjit,
> >
> > > I have patched the HT_INFOF() in AccessGroup.cc.
> >
> > > However, I also modified CellStoreV1.cc to make sure that
> > > m_trailer.num_filter_items is greater than 0 when creating the
> > > BloomFilter. I think this value is something like the capacity of
> > > the Bloom filter, so enlarging it should not cause much trouble.
> >
> > > After making this change and restarting, the RangeServer cannot
> > > finish log replay. It seems that the modified RangeServer has
> > > destroyed something. This time the error is "RANGE SERVER range not
> > > found".
> >
> > > I posted it in another thread:
> > > http://groups.google.com/group/hypertable-dev/browse_thread/thread/fd...
> >
> > > Thanks
> >
> > >   -- kuer
> >
> > > On Jul 23, 7:38 AM, Sanjit Jhala <[email protected]> wrote:
> > >> Hi Kuer,
> >
> > >> I suspect the BloomFilter code is fine. Taking a look at your logs it
> > >> looks like the METADATA table is going through a split and the
> > >> AccessGroup "logging" is undergoing a major compaction, however since
> > >> it is empty, nothing gets inserted in BloomFilterItems and hence the
> > >> assert gets hit.
> > >> (From the log:
> > >> 2009-07-22 09:39:01,415 1351514432 Hypertable.RangeServer [INFO]
> > >> (RangeServer/AccessGroup.cc:379) Starting Major Compaction of
> > >> METADATA[0:<FF> <FF>..<FF><FF>](logging))
> >
> > >> This AccessGroup is currently not used by the system, so it should
> > >> be empty and should not undergo a compaction. Can you make the change
> > >> (below) to AccessGroup.cc so we can get a better idea of why it's
> > >> compacting? I suspect it might be a memory corruption issue (that we
> > >> have a fix for in the upcoming release).
> >
> > >> It would be great if you could run the RangeServer under valgrind
> > >> and see if the valgrind log reveals anything further. To do this, add
> > >> the option --valgrind-rangeserver when calling the
> > >> start-all-servers.sh script, e.g.:
> > >> $HYPERTABLE_INSTALL_DIR/bin/start-all-servers.sh --valgrind-rangeserver
> >
> > >> -Sanjit
> >
> > >> --- a/src/cc/Hypertable/RangeServer/AccessGroup.cc
> > >> +++ b/src/cc/Hypertable/RangeServer/AccessGroup.cc
> > >> @@ -375,8 +375,8 @@ void AccessGroup::run_compaction(bool major) {
> > >>           if (m_immutable_cache->memory_used()==0 && m_stores.size() <= (size_t)1)
> > >>             HT_THROW(Error::OK, "");
> > >>           tableidx = 0;
> > >> -        HT_INFOF("Starting Major Compaction of %s(%s)",
> > >> -                 m_range_name.c_str(), m_name.c_str());
> > >> +        HT_INFOF("Starting Major Compaction of %s(%s) immutable cache mem=%llu, num cell stores=%d",
> > >> +                 m_range_name.c_str(), m_name.c_str(), m_immutable_cache->memory_used(), m_stores.size());
> > >>         }
> > >>         else {
> > >>           if (m_stores.size() > (size_t)Global::access_group_max_files) {
> >
> > >> On Jul 22, 2009, at 4:11 AM, kuer wrote:
> >
> > >>> Hi, all,
> >
> > >>> I found something interesting in
> > >>> cc/Hypertable/RangeServer/CellStoreV1.cc:
> >
> > >>>   if (m_bloom_filter_mode != BLOOM_FILTER_DISABLED) {
> > >>>     m_bloom_filter_items = new BloomFilterItems(); // approximator items
> > >>>   }
> >
> > >>>
> > >>>   // if bloom_items haven't been spilled to create a bloom filter yet, do it
> > >>>   if (m_bloom_filter_mode != BLOOM_FILTER_DISABLED) {
> > >>>     if (m_bloom_filter_items) {
> > >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >>> I think this check cannot guarantee m_bloom_filter_items->size() > 0
> >
> > >>>       m_trailer.num_filter_items = m_bloom_filter_items->size();
> > >>> ^^^^^ How about adding the following lines?
> > >>> +
> > >>> +           if (m_trailer.num_filter_items <  1 ) {
> > >>> +               m_trailer.num_filter_items = m_max_entries;
> > >>> +           }
> > >>> +           if (m_trailer.num_filter_items < 1) {
> > >>> +               m_trailer.num_filter_items = 1;
> > >>> +           }
> > >>> +
> > >>>       create_bloom_filter();
> > >>>     }
> > >>>     assert(!m_bloom_filter_items && m_bloom_filter);
> > >>>
> > >>>     m_bloom_filter->serialize(send_buf);
> > >>>     m_filesys->append(m_fd, send_buf, 0, &m_sync_handler);
> > >>>
> > >>>     m_outstanding_appends++;
> > >>>     m_offset += m_bloom_filter->size();
> > >>>   }
> > >>>
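
The two-step fallback proposed in the lines marked "+" above can be sketched as a standalone function. This is a minimal illustration, assuming the counts are plain unsigned sizes; filter_item_count and its parameter names are hypothetical and do not exist in CellStoreV1.cc. As noted elsewhere in the thread, an empty access group should not be compacted in the first place, so a clamp like this would only hide that symptom.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Hypertable): choose the item count
// used to size the Bloom filter so that it is never zero.
std::size_t filter_item_count(std::size_t items_seen,
                              std::size_t max_entries) {
  std::size_t n = items_seen;
  if (n < 1)
    n = max_entries;  // first fallback: the configured maximum
  if (n < 1)
    n = 1;            // last resort: at least one item
  return n;
}
```

When at least one item has been collected, the count is returned unchanged. Otherwise the configured maximum is used, and if that is also zero the result is 1.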
> >
> > >>> thanks
> >
> > >>> -- kuer
> >
> > >>> On Jul 22, 5:05 PM, kuer <[email protected]> wrote:
> > >>>> Hi, all,
> >
> > >>>> Here is the content of the file that causes the BloomFilter
> > >>>> assertion failure:
> >
> > >>>> /hypertable/tables/METADATA/logging/AB2A0D28DE6B77FFDD6C72AF/cs0
> >
> > >>>> $ hexdump -C cs0
> > >>>> 00000000  49 64 78 46 69 78 2d 2d  2d 2d 1a 00 ff ff ff ff  |IdxFix----......|
> > >>>> 00000010  00 00 00 00 00 00 00 00  7d 9f 49 64 78 56 61 72  |........}.IdxVar|
> > >>>> 00000020  2d 2d 2d 2d 1a 00 ff ff  ff ff 00 00 00 00 00 00  |----............|
> > >>>> 00000030  00 00 87 97                                       |....|
> > >>>> 00000034
> >
> > >>>>  FYI
> >
> > >>>>    -- kuer
> >
> > >>>> On Jul 22, 1:03 PM, Sanjit Jhala <[email protected]> wrote:
> >
> > >>>>>   Recovering ranges from crashed RangeServers is one of the high
> > >>>>> priority items Doug is working on.
> >
> > >>>>> -Sanjit
> > >
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Hypertable Development" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/hypertable-dev?hl=en
-~----------~----~----~----~------~----~------~--~---
