Restarting the master can help. Some of these bugs were fixed in 0.20.3, which was just released. Upgrade if you can!
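For what it's worth, you can bounce just the master (rather than the whole cluster) with the standard daemon script, roughly like this, run on the master host ($HBASE_HOME here just stands for wherever your HBase install lives):

    $HBASE_HOME/bin/hbase-daemon.sh stop master
    $HBASE_HOME/bin/hbase-daemon.sh start master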
On Jan 26, 2010 9:03 PM, "James Baldassari" <[email protected]> wrote:

Hi,

I'm using the Cloudera distribution of HBase, version 0.20.0~1-1.cloudera, in a fully-distributed cluster of 10 nodes. I'm using all default config options except for hbase.zookeeper.quorum, hbase.rootdir, hbase.cluster.distributed, and an updated regionservers file containing all our region servers.

After running a map/reduce job which inserted around 180,000 rows into HBase, HBase appeared to be fine. We could do a count on our table, and no errors were reported. We then tried to truncate the table in preparation for another test but were unable to do so because the region became stuck in a transition state. I restarted each region server individually, but it did not fix the problem. I tried the disable_region and close_region commands from the hbase shell, but that didn't work either. After doing all of that, a status 'detailed' showed this:

    1 regionsInTransition
        name=retargeting,,1264546222144, unassigned=false, pendingOpen=false, open=false, closing=true, pendingClose=false, closed=false, offlined=false

Then I restarted the master and all region servers, and it looked like this:

    1 regionsInTransition
        name=retargeting,,1264546222144, unassigned=false, pendingOpen=true, open=false, closing=false, pendingClose=false, closed=false, offlined=false

I noticed messages in some of the region server logs indicating that their zookeeper sessions had expired. I'm not sure if this has anything to do with the problem. I should mention that this scenario is quite repeatable, and the last few times it has happened we had to shut down HBase and manually remove the /hbase root from HDFS, then start HBase and recreate the table.

Any ideas what could put the region into this state or what to do to fix it? How can I prevent this from happening in the future?

I was also wondering whether it was normal for there to be only one region with 180,000+ rows. Shouldn't this region be split into several regions and distributed among the region servers? I'm new to HBase, so maybe my understanding of how it's supposed to work is wrong.

Thanks,
James
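For anyone hitting the same problem later, the wipe-and-recreate workaround James describes amounts to roughly the following. Note that it destroys all HBase data under the root dir, not just the stuck table, and the column family name below is just a placeholder:

    $HBASE_HOME/bin/stop-hbase.sh
    hadoop fs -rmr /hbase     # path must match whatever hbase.rootdir points at
    $HBASE_HOME/bin/start-hbase.sh
    echo "create 'retargeting', 'data'" | $HBASE_HOME/bin/hbase shell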
