Hi,

We are using the latest releases: Hadoop 0.20.2 and HBase 0.20.3.

Thank you for your help.
Tuan Nguyen.

On Sat, Mar 20, 2010 at 8:44 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> What hbase version are you using?
>
> On Saturday, March 20, 2010, Tuan Nguyen <tua...@gmail.com> wrote:
> > Hi,
> >
> > We are running a stress test to evaluate HBase. The test runs fine and
> > completes, but we have a problem with one node. Here are our
> > configuration and the problem:
> >
> > 1. We have 1 master and 4 slaves. The master is used as both the
> >    namenode and the HBase master server. The slaves are used as both
> >    datanodes and region servers.
> > 2. We have set the xceiver limit to 8192 and enabled LZO compression.
> > 3. From another machine, we create 8 threads to write data into the
> >    cluster; each record is about 5 KB to 100 KB.
> > 4. The test runs fine for the first 2 to 3 hours, but then one of the
> >    nodes gets the following warning:
> >
> > 2010-03-19 20:26:22,814 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_9088042710721149043_145344 file /mnt/moom/hadoop/0.20.1/dfs/data/current/subdir6/subdir33/blk_9088042710721149043
> > 2010-03-19 20:26:22,846 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
> > java.io.IOException: Error in deleting blocks.
> >     at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:1361)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:868)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:830)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:710)
> >     at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1186)
> >     at java.lang.Thread.run(Thread.java:619)
> >
> > 5. After the warning, I no longer see the INFO "Deleting block
> >    blk_xxxxxxxxxxxxxxxxxxxxxxxxx" message on this node, and we lose
> >    disk space very fast on this datanode. I guess this is because HBase
> >    compacts the regions and deletes the old region files, but the
> >    datanode is unable to reclaim the freed blocks.
> > 6. After 5 to 6 hours, the datanode completely runs out of space, but
> >    the test continues running at a slower insert rate.
> > 7. The entire test finishes after 14 hours.
> > 8. Right after the test finishes, this datanode resumes reclaiming the
> >    deleted blocks.
> >
> > We ran the test twice and the same problem occurred on the same node.
> > I am wondering what could cause this problem and whether there is any
> > configuration parameter we can tune to fix it.
> >
> > Thanks for your help!
> > Tuan Nguyen.
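
To narrow down when block deletion stopped on the affected datanode, it may help to scan its log for the last "Deleting block" entry and the first "Error in deleting blocks" exception. The following is only a minimal Python sketch, assuming the log lines follow the layout of the excerpt quoted above (timestamp, level, logger class, message). The function name `summarize_deletions` and the sample lines are illustrative and not part of Hadoop.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt in this thread:
# "2010-03-19 20:26:22,814 INFO <logger class>: <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)\s+\S+: (.*)$"
)


def summarize_deletions(lines):
    """Count 'Deleting block' entries and find the first deletion error.

    Stack-trace lines carry no timestamp, so the error is attributed to
    the most recent timestamped line seen before it.
    """
    current_ts = None
    deletes = 0
    last_delete = None
    first_error = None
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            current_ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            if match.group(2) == "INFO" and match.group(3).startswith("Deleting block "):
                deletes += 1
                last_delete = current_ts
        if first_error is None and "Error in deleting blocks" in line:
            first_error = current_ts
    return {"deletes": deletes, "last_delete": last_delete, "first_error": first_error}


# Illustrative input modeled on the excerpt above (block path shortened).
sample = [
    "2010-03-19 20:26:22,814 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "Deleting block blk_9088042710721149043_145344 file /tmp/example/blk_9088042710721149043",
    "2010-03-19 20:26:22,846 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: "
    "Error processing datanode Command",
    "java.io.IOException: Error in deleting blocks.",
]
print(summarize_deletions(sample))
```

Running the same function over the full datanode log (for example by passing an open file object instead of `sample`) would show whether any "Deleting block" entries appear after the first error, which is the behavior described in item 5.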