Hi Doug, thanks for the reply. I'm glad to hear of your upcoming update.
I'll put my investigation on hold for a while then (well, I have to anyway
since I'm going on vacation).
Our current strategy is:
1. Store things in Hypertable until something crashes
2. Wipe out all the commit logs (losing some data)
On Mon, Sep 15, 2008 at 9:32 PM, Doug Judd <[EMAIL PROTECTED]> wrote:
> Hi Josh,
>
> Thanks for investigating! Comments inline ...
>
> On Mon, Sep 15, 2008 at 4:41 PM, Joshua Taylor <[EMAIL PROTECTED]>wrote:
>
>> I've managed to load about 2-3 TB into a Hypertable cluster with 15
>> RangeServers. I'm still having problems with the RangeServers failing to
>> clean up their old logs.
>>
>> Server User commit log in MB
>> 10.10.1.194_38060 51512.8
>> 10.10.1.200_38060 48050.6
>> 10.10.1.199_38060 12464.7
>> 10.10.1.190_38060 90.8244
>> 10.10.1.193_38060 86.9587
>> 10.10.1.188_38060 70.4194
>> 10.10.1.191_38060 58.9111
>> 10.10.1.198_38060 48.7823
>> 10.10.1.192_38060 41.6324
>> 10.10.1.202_38060 34.0847
>> 10.10.1.189_38060 28.5852
>> 10.10.1.197_38060 22.3202
>> 10.10.1.196_38060 17.3054
>> 10.10.1.201_38060 14.2905
>> 10.10.1.195_38060 0
>>
>> No new updates have been added to this cluster in about 12 hours. Most of
>> the log directories look fine but a few have gigabytes of logs that should
>> be cleaned up.
>>
>> I've added some debug logging to try and isolate what's happening. For
>> each call to CommitLog::purge(), I'm logging the cutoff timestamp used.
>> I've also added messages to print out each of the remaining log fragments
>> after the purge breaks off:
>>
>> diff --git a/src/cc/Hypertable/Lib/CommitLog.cc
>> b/src/cc/Hypertable/Lib/CommitLog.cc
>> index a6b8920..ea0f894 100644
>> --- a/src/cc/Hypertable/Lib/CommitLog.cc
>> +++ b/src/cc/Hypertable/Lib/CommitLog.cc
>> @@ -232,6 +232,8 @@ int CommitLog::purge(uint64_t timestamp) {
>> CommitLogFileInfo file_info;
>> String fname;
>>
>> + HT_DEBUGF("purge: %llu", timestamp);
>> +
>> try {
>>
>> while (!m_fragment_queue.empty()) {
>> @@ -243,7 +245,16 @@ int CommitLog::purge(uint64_t timestamp) {
>> HT_INFOF("Removed log fragment file='%s' timestamp=%lld",
>> fname.c_str(), file_info.ti
>> }
>> else {
>> - //HT_INFOF("LOG FRAGMENT PURGE breaking because %lld >= %lld",
>> file_info.timestamp, t
>> + HT_DEBUGF("LOG FRAGMENT PURGE breaking because %lld >= %lld,
>> queue size = %llu",
>> + file_info.timestamp, timestamp,
>> m_fragment_queue.size());
>> + size_t i = 0;
>> + foreach(CommitLogFileInfo const & info, m_fragment_queue)
>> + {
>> + ++i;
>> + HT_DEBUGF(" Fragment %llu: time=%lld size=%llu name=%s%u",
>> + i, info.timestamp, info.size, info.log_dir.c_str(),
>> + info.num);
>> + }
>> break;
>> }
>> }
>>
>> What I find is that the purge() function is still being called regularly,
>> but that the timestamp that gets passed in becomes stuck before the oldest
>> fragment and never changes after in later calls.
>>
>> Sometimes the timestamp threshold is just a few minutes earlier than the
>> oldest log's timestamp. Sometimes the passed in threshold value is 1, which
>> seems like a suspicious number. Here's the first few lines of a recent
>> purge from the 52 GB server:
>>
>> 1221520163 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:235) purge: 1
>> 1221520163 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:249) LOG FRAGMENT
>> PURGE breaking because 1221271063903383000 >= 1, queue size = 831
>> 1221520163 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256) Fragment 1:
>> time=1221271063903383000 size=67436344
>> name=/hypertable/servers/10.10.1.194_38060/log/user/3
>> 1221520163 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256) Fragment 2:
>> time=1221271046265865000 size=896929
>> name=/hypertable/servers/10.10.1.193_38060/log/9F4D8D5249B24A2C52F06D2B/0
>> 1221520163 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256) Fragment 3:
>> time=1221271374609078000 size=100133731
>> name=/hypertable/servers/10.10.1.194_38060/log/user/4
>>
>> Yes, the log is currently holding 831 fragments. Also I see compactions
>> are still happening on this server, but they start and finish in the same
>> second.
>>
>> Here's a purge from the 48 GB server:
>>
>> 1221520459 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:235) purge:
>> 1221327270724907001
>> 1221520459 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:249) LOG FRAGMENT
>> PURGE breaking because 1221327332648604000 >= 1221327270724907001, queue
>> size = 622
>> 1221520459 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256) Fragment 1:
>> time=1221327332648604000 size=100006176
>> name=/hypertable/servers/10.10.1.200_38060/log/user/916
>> 1221520459 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256) Fragment 2:
>> time=1221327395726285000 size=100147244
>> name=/hypertable/servers/10.10.1.200_38060/log/user/917
>> 1221520459 DEBUG Hypertable.RangeServer :
>> (/home/josh/hypertable/src/cc/Hypertable/Lib/CommitLog.cc:256) Fragment 3:
>> time=1221327503701781000 size=100179860
>> name=/hypertable/servers/10.10.1.200_38060/log/user/918
>>
>> This one has 622 fragments. The last logged compaction on this server was
>> 25 hours ago, though it apparently did some non-trivial work in that time.
>> At least, the begin and end compaction log messages have some time between
>> them.
>>
>
> This definitely sheds light on the problem. I think I've figured out how
> the purge timestamp can get set to 1. It's set by the following line in
> AccessGroup::run_compaction() :
>
> m_oldest_cached_timestamp = (m_cell_cache_ptr->size() > 0) ?
> timestamp.real + 1 : 0;
>
> If after a split, the split-off portion has a few updates in the split log,
> but never receives any updates therafter, then the range could get compacted
> with timestamp.real == 0 causing the above line to set
> "m_oldest_cached_timestamp" to 1. I'll need to investigate a little further
> to figure out the other case you point out.
>
>
>> The log fragments appear to be in increasing time order, though a lot of
>> the fragments seem to be mostly empty. For example, on the 70 MB server,
>> the first fragment contains 70 MB of data followed by 353 fragments with
>> size=101.
>>
>
> The issue here is that when a range splits, the first thing that happens is
> a split log gets installed. The split log is for capturing any updates to
> the portion of the range that is getting split off. When the split finally
> completes, the RangeServer that receives the newly split off portion will be
> notified with the name of the split log and will re-play it after it loads
> the range. After it finishes replaying the split log, it will link the
> split log into the range's primary commit log. This is just a commit log
> entry that contains the name of the log that is to be linked in. When it
> does this, it rolls the log. So, in this "degenerate" primary-key order
> insert, you have a bunch of split logs that basically contain nothing being
> linked into commit logs that are not receiving any commits. That's why you
> have a bunch of tiny log fragments, then just contain a single link entry.
> There's a program called 'dumplog' that you can use to dump out a commit
> log. Point it to the log directory and it should print out a summary of the
> entries.
>
>
>> I've looked around a bit at where these purge thresholds come from. When
>> the purge starts, the code tries to get the oldest cached timestamp from
>> each access group, then initiates a purge of logs older than that. This
>> timestamp seems to be the thing getting stuck (or set to 1), so tried to
>> find out where that gets set. I think it's set in each access group after a
>> compaction to the timestamp of the compaction request (plus 1). Where the
>> compaction timestamp comes from is a bit of a mystery to me. The range
>> seems to schedule the compactions either from m_state.timestamp or
>> m_timestamp, depending on whether the compaction is initiated in
>> Range::run_compaction or Range::split_compact_and_shrink. Also, I don't
>> understand what's going on with Timestamp.real vs. Timestamp.logical. The
>> flow of this code is just a bit hard for me to track. ;)
>>
>
> This code is definitely questionable and obviously broken. I've spent the
> past 1 1/2 weeks overhauling all of this logic based on two of your
> suggestions:
>
> 1. Freeze and copy the CellCache during a compaction
> 2. Separate the timestamp from the revision number
>
> I'm nearly finished with this overhaul, but have a few more systems tests
> that I need to get passing. I hope to have this checked in within two or
> three days. These changes have cleaned up the code tremendously. Now, the
> m_oldest_cached_timestamp variable gets set with each call to
> AccessGroup::add() and does not get set with the line above. Plus, I've
> combined add() and replay_add() into just add().
>
> Anyway, that's as far as I've managed to go on this problem so far. I was
>> wondering if anyone on the list who understands this code better might be
>> able to look at the problem description and say, "A-ha! I know what's going
>> wrong here!" (And hopefully issue a patch.) Otherwise, I'll continue
>> unravelling a bit at a time.
>>
>
> I should have the new code checked in within a few days, which should take
> care of these problems, so I wouldn't bother spending too much time on
> this. Thanks for all of your help.
>
> - Doug
>
>
> >
>
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Hypertable Development" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/hypertable-dev?hl=en
-~----------~----~----~----~------~----~------~--~---