Thanks for the below, Kevin. It seems the mechanism in HLog that forces flushes when there are too many outstanding WAL logs is being overrun in your case; it can't keep up with your rate of log rolling. Thanks for posting the log snippet. May I see the full regionserver log? Or better, attach a reference to it to this new issue on overrunning the log upper bound: https://issues.apache.org/jira/browse/HBASE-2053. Let's fix this for 0.20.3.
St.Ack

On Tue, Dec 15, 2009 at 3:17 PM, Kevin Peterson <kevin...@gmail.com> wrote:

> This makes some sense now. I currently have 2200 regions across 3 tables.
> My largest table accounts for about 1600 of those regions and is mostly
> active at one end of the keyspace -- our key is based on date, but data
> only roughly arrives in order. I also write to two secondary indexes,
> which have no pattern to the key at all. One of these secondary tables
> has 488 regions and the other has 96 regions.
>
> We write about 10M items per day to the main table (articles). All of
> these get written to one of the secondary indexes (article-ids). About a
> third get written to the other secondary index. Total volume of data is
> about 10GB / day written.
>
> I think the key is as you say that the regions aren't filled enough to
> flush. The articles table gets mostly written to near one end and I see
> splits happening regularly. The index tables have no pattern so the 10
> million writes get scattered across the different regions. I've looked
> more closely at a log file (linked below), and if I forget about my main
> table (which would tend to get flushed), and look only at the indexes,
> this seems to be what's happening:
>
> 1. Up to maxLogs HLogs, it doesn't do any flushes.
> 2. Once it gets above maxLogs, it will start flushing one region each
>    time it creates a new HLog.
> 3. If the first HLog had edits for say 50 regions, it will need to flush
>    the region with oldest edits 50 times before the HLog can be removed.
>
> If N is the number of regions getting written to, but not getting enough
> writes to flush on their own, then I think this converges to maxLogs + N
> logs on average. If I think of maxLogs as "number of logs to start
> flushing regions at" this makes sense.
>
> http://kdpeterson.net/paste/hbase-hadoop-regionserver-mi-prod-app35.ec2.biz360.com.log.2009-12-14
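
The three steps Kevin describes can be explored with a small toy simulation. This is only a sketch, not HLog's actual code: the function name and parameters are made up for illustration, and it assumes that every one of the N lightly written regions has edits in every log, that none of them ever flushes on its own, and that a flushed region's next edits land in the log that was just opened. Under those assumptions it tracks how many logs remain outstanding after each roll, so the effect of maxLogs and N on the backlog can be checked by hand.

```python
def simulate_log_counts(max_logs, num_regions, num_rolls):
    """Toy model of the roll/flush behaviour described in the quoted steps.

    Assumptions (not taken from HBase source): every region has edits in
    every log, and regions never flush on their own.
    Returns the number of outstanding logs after each roll.
    """
    # oldest[r] = index of the log holding region r's oldest unflushed edit
    oldest = [0] * num_regions
    current = 0  # index of the log currently being written
    counts = []
    for _ in range(num_rolls):
        current += 1  # roll: a new log is created
        # Logs from min(oldest) up to current must still be kept.
        outstanding = current - min(oldest) + 1
        if outstanding > max_logs:
            # Step 2: above maxLogs, flush the one region with the oldest edits.
            r = oldest.index(min(oldest))
            oldest[r] = current  # its next unflushed edits are in the new log
        # Step 3: a log is removable only once no region has edits in it.
        counts.append(current - min(oldest) + 1)
    return counts


if __name__ == "__main__":
    # Small example: maxLogs = 3, N = 4 regions, 10 log rolls.
    print(simulate_log_counts(max_logs=3, num_regions=4, num_rolls=10))
```

Changing the write pattern (for example, letting only some regions write to each log) would change the numbers, so treat the output as a way to reason about the mechanism rather than a prediction for a real cluster.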