Re: hlogs do not get cleared

stack Thu, 17 Dec 2009 10:54:35 -0800

Thanks for the below, Kevin.  It seems the mechanism in HLog that forces
flushes when there are too many outstanding WAL logs is being overrun in
your case; it can't keep up with your rate of log rolling.  Thanks for
posting the log snippet.  May I see the full regionserver log?  Or better,
stick a reference to it into this new issue on overrunning the log
upper bound: https://issues.apache.org/jira/browse/HBASE-2053.  Let's fix
for 0.20.3.


St.Ack

On Tue, Dec 15, 2009 at 3:17 PM, Kevin Peterson <kevin...@gmail.com> wrote:

> This makes some sense now. I currently have 2200 regions across 3 tables.
> My largest table accounts for about 1600 of those regions and is mostly
> active at one end of the keyspace -- our key is based on date, but data
> only roughly arrives in order. I also write to two secondary indexes,
> which have no pattern to the key at all. One of these secondary tables has
> 488 regions and the other has 96 regions.
>
> We write about 10M items per day to the main table (articles). All of these
> get written to one of the secondary indexes (article-ids). About a third
> get written to the other secondary index. Total volume of data is about
> 10GB / day written.
>
> I think the key is, as you say, that the regions aren't filled enough to
> flush. The articles table gets mostly written to near one end and I see
> splits happening regularly. The index tables have no pattern, so the 10
> million writes get scattered across the different regions. I've looked
> more closely at a log file (linked below), and if I forget about my main
> table (which would tend to get flushed), and look only at the indexes,
> this seems to be what's happening:
>
> 1. Up to maxLogs HLogs, it doesn't do any flushes.
> 2. Once it gets above maxLogs, it will start flushing one region each time
> it creates a new HLog.
> 3. If the first HLog had edits for say 50 regions, it will need to flush
> the region with oldest edits 50 times before the HLog can be removed.
>
> If N is the number of regions getting written to, but not getting enough
> writes to flush on their own, then I think this converges to maxLogs + N
> logs on average. If I think of maxLogs as "number of logs to start flushing
> regions at" this makes sense.
>
>
> http://kdpeterson.net/paste/hbase-hadoop-regionserver-mi-prod-app35.ec2.biz360.com.log.2009-12-14
>
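
The three steps Kevin describes above can be tried out with a small toy
model. This is only a sketch under stated assumptions, not HBase's HLog
code: the function name simulate_log_rolls and its parameters are made up
for illustration, every rolled log is assumed to hold edits for every
"cold" region (scattered index writes), and exactly one region is flushed
per roll once the log count exceeds max_logs.

```python
from collections import deque


def simulate_log_rolls(num_regions, max_logs, num_rolls):
    """Toy model: return the number of retained logs after each roll.

    Hypothetical helper for illustration only; it does not call HBase.
    """
    # Each entry is the set of regions that still have unflushed edits
    # in that log. The oldest log sits at the left end of the deque.
    logs = deque()
    history = []
    for _ in range(num_rolls):
        # Assumption: writes are scattered, so the log being rolled
        # contains edits for every cold region.
        logs.append(set(range(num_regions)))

        # Steps 1 and 2: no forced flush until the count exceeds
        # max_logs, then one region is flushed per roll.
        if len(logs) > max_logs and logs[0]:
            # Flush a region that still has edits in the oldest log.
            region = min(logs[0])
            for pending in logs:
                pending.discard(region)

        # Step 3: a log can be removed only once none of its edits
        # are still unflushed.
        while logs and not logs[0]:
            logs.popleft()

        history.append(len(logs))
    return history


if __name__ == "__main__":
    counts = simulate_log_rolls(num_regions=50, max_logs=32, num_rolls=500)
    print("peak retained logs:", max(counts))
    print("retained logs after last roll:", counts[-1])
```

Changing num_regions and max_logs shows how the retained log count reacts
in this simplified model, which can then be compared against the
maxLogs + N estimate and against the counts in a real regionserver log.
How a real server behaves also depends on which regions each log actually
contains and on when the requested flushes complete, neither of which is
modelled here.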
