We're running a 13-node HBase cluster. A week ago we had problems with it being overloaded, along with errors about not being able to find a block on HDFS, but adding four more nodes and increasing max heap from 3GB to 4.5GB on all nodes fixed those problems.
Looking at the logs now, though, we see that HLogs are not getting removed:

2009-12-15 01:45:48,426 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260867136036, entries=210524, calcsize=63757422, filesize=41073798. New hlog /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260870348421
2009-12-15 01:45:48,427 INFO org.apache.hadoop.hbase.regionserver.HLog: Too many hlogs: logs=130, maxlogs=96; forcing flush of region with oldest edits: articles-article-id,f15489ea-38a4-4127-9179-1b2dc5f3b5d4,1260083783909
2009-12-15 01:57:14,188 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region articles,\x00\x00\x01\x25\x8C\x0F\xCB\x18\xB5U\xF7\xC6\x5DoH\xB8\x98\xEBH,E\x7C\x07\x14,1260830133341
2009-12-15 01:57:17,519 INFO org.apache.hadoop.hbase.regionserver.HLog: Roll /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260870348421, entries=92795, calcsize=63908073, filesize=54042783. New hlog /hbase/.logs/mi-prod-app33,60020,1260495617070/hlog.dat.1260871037510
2009-12-15 01:57:17,519 INFO org.apache.hadoop.hbase.regionserver.HLog: Too many hlogs: logs=131, maxlogs=96; forcing flush of region with oldest edits: articles-article-id,f1cd1b02-3d1b-453c-b44f-94ec5a1e3a46,1260007536878

From reading the log message, I interpret this as saying that every time it rolls an hlog, if there are more than maxlogs logs, it will flush one region. I'm assuming that a log could have edits for multiple regions, so this seems to mean that if we have 100 regions and maxlogs set to 96, if it flushes one region each time it rolls a log, it will create 100 logs before it flushes all regions and is able to delete the log, so it will reach steady state at 196 hlogs. Is this correct?

We're concerned because when we had problems last week, we saw lots of log messages related to "Too many hlogs" and had assumed they were related to the problems. Is this anything to worry about?