My suggestions: once you have a cluster up and running, don't log more verbosely than INFO (for example, at DEBUG), for performance reasons.
Rather than relying on the DataNode (DN) logs, export HBase and HDFS metrics via Ganglia:

http://wiki.apache.org/hadoop/GangliaMetrics
http://hadoop.apache.org/hbase/docs/current/metrics.html

   - Andy

> On Thu, Apr 8, 2010 at 2:51 AM, steven zhuang
> <steven.zhuang.1...@gmail.com> wrote:
>
> >...
> > At present, my idea is calculating the data
> > IO quantity of both HDFS and HBase for a given day, and
> > with the result we can have a rough estimate
> > of the situation.
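For the Ganglia export suggested above, here is a minimal sketch. It assumes the Hadoop 0.20-era metrics framework, which is configured through `conf/hadoop-metrics.properties` in both the Hadoop and the HBase conf directories. `gmond-host` is a placeholder for your Ganglia collector. If you run Ganglia 3.1 or later, use `GangliaContext31` instead of `GangliaContext`. Check the linked pages for the context names your versions actually support.

```properties
# Hadoop: conf/hadoop-metrics.properties
# Send HDFS ("dfs") metrics to Ganglia every 10 seconds.
# gmond-host is a placeholder; 8649 is the default gmond port.
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext
dfs.period=10
dfs.servers=gmond-host:8649

# HBase: conf/hadoop-metrics.properties
# Send HBase master/region server metrics to the same collector.
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
hbase.period=10
hbase.servers=gmond-host:8649
```

Restart the daemons after you edit these files so that the new metrics context is loaded.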