On Tue, Jan 26, 2010 at 9:03 PM, James Baldassari <[email protected]> wrote:
>
> After running a map/reduce job which inserted around 180,000 rows into
> HBase, HBase appeared to be fine.  We could do a count on our table, and
> no errors were reported.  We then tried to truncate the table in
> preparation for another test but were unable to do so because the region
> became stuck in a transition state.
Yes.  In older hbase, truncate of small tables was flakey.  It's better in
0.20.3 (I wrote our brothers over at Cloudera about updating the version
they bundle, especially since 0.20.3 just went out).

> I restarted each region server individually, but it did not fix the
> problem.  I tried the disable_region and close_region commands from the
> hbase shell, but that didn't work either.  After doing all of that, a
> status 'detailed' showed this:
>
> 1 regionsInTransition
> name=retargeting,,1264546222144, unassigned=false, pendingOpen=false,
> open=false, closing=true, pendingClose=false, closed=false, offlined=false
>
> Then I restarted the master and all region servers, and it looked like this:
>
> 1 regionsInTransition
> name=retargeting,,1264546222144, unassigned=false, pendingOpen=true,
> open=false, closing=false, pendingClose=false, closed=false, offlined=false

Even after a master restart?  The above is a dump of a master-internal data
structure that is kept in memory.  Strange that it would pick up the exact
same state on restart (as Ryan says, a restart of the master alone is
usually a radical but sufficient fix).  I was going to suggest you try
onlining the individual region in the shell, but I don't think that'll work
either, not unless you update to 0.20.3-era hbase.

> I noticed messages in some of the region server logs indicating that
> their zookeeper sessions had expired.  I'm not sure if this has anything
> to do with the problem.

It could.  The regionservers will restart if their session w/ zk expires.
What's your hbase schema like?  How are you doing your upload?

> I should mention that this scenario is quite repeatable, and the last
> few times it has happened we had to shut down HBase and manually remove
> the /hbase root from HDFS, then start HBase and recreate the table.

For sure you've upped file descriptors and xceiver params as per the
Getting Started?

> I was also wondering whether it was normal for there to be only one
> region with 180,000+ rows.  Shouldn't this region be split into several
> regions and distributed among the region servers?  I'm new to HBase, so
> maybe my understanding of how it's supposed to work is wrong.

Get the region's size on the filesystem:
./bin/hadoop fs -dus /hbase/table/regionname.  A region splits when it's
above a size threshold, 256M usually.

St.Ack

> Thanks,
> James
>
>
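P.S. On the file descriptor / xceiver question, a rough sketch of what the
Getting Started doc is getting at (the numbers below are only illustrative
assumptions; pick values to suit your own cluster):

  # Up the open-file limit for the user running hadoop/hbase on every node,
  # e.g. via /etc/security/limits.conf, then re-login so it takes effect:
  ulimit -n 32768

  <!-- In hdfs-site.xml on the datanodes (yes, the property name really is
       spelled "xcievers"), then restart the datanodes: -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2047</value>
  </property>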

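P.P.S. To make the region-size check concrete with your table (the region
directory name below is just a placeholder; substitute the encoded name the
master UI shows, and this assumes the default /hbase root):

  ./bin/hadoop fs -dus /hbase/retargeting/<encoded-region-name>

The threshold that size is compared against is hbase.hregion.max.filesize,
which defaults to 256MB:

  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>268435456</value>
  </property>

With only ~180,000 rows the region has likely just not reached that size
yet, which would explain why you see a single region.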