I'm seeing a nice variety of Exceptions from HBase and could use some
pointers about what to do next.

This is a new map/reduce program, updating about 550k rows with around a
dozen columns on a very small cluster (only 4 nodes... as we're still
testing and it doesn't have to support production yet).  HBase version
0.19.1.

I ran the job and it seems to make some progress, and then dies after
several hours, reporting "NoServerForRegionException: No server address
listed in .META. for region TABLEX,,1250526695078".  I retried it a few
times with the same result.  I also noticed that the load is not well
balanced; all requests seemed to be going to one node.  I adjusted
hadoop-site.xml with the addition of these two entries:

    <name>hbase.hregion.max.filesize</name>
    <value>33554432</value>

    <name>hbase.client.retries.number</name>
    <value>5</value>

And restarted hbase (and hadoop to be safe).  Re-ran and got the same error
in the M/R job.

I thought I'd try dropping the table, since it's a new table and I can
recreate it.  But that gives another exception:

hbase(main):002:0> disable 'TABLEX'
NativeException: org.apache.hadoop.hbase.TableNotFoundException: org.apache.hadoop.hbase.TableNotFoundException: TABLEX
    at org.apache.hadoop.hbase.master.TableOperation$ProcessTableOperation.call(TableOperation.java:129)
    at org.apache.hadoop.hbase.master.TableOperation$ProcessTableOperation.call(TableOperation.java:70)
    at org.apache.hadoop.hbase.master.RetryableMetaOperation.doWithRetries(RetryableMetaOperation.java:64)
    at org.apache.hadoop.hbase.master.TableOperation.process(TableOperation.java:143)
    at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:691)
...


And now I see this exception in the HBase logs:

org.apache.hadoop.hbase.regionserver.WrongRegionException: org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for HRegion .META.,,1250280235390, startKey='', getEndKey()='TABLEX,,1250219949252', row='TABLEX,840.56098.0544,1250526661861'
    at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1788)
    at org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1844)
    at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1912)
    at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1244)
    at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1216)
...
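
The message above suggests the region server checks that a row falls between the region's start key and end key. Below is a minimal sketch of that kind of check, assuming a plain byte-wise comparison of the keys. The names MetaRangeCheck, compareBytes and inRange are made up for illustration; this is not the HBase source, which may compare .META. keys differently.

```java
import java.nio.charset.StandardCharsets;

public class MetaRangeCheck {
    // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
    // ASSUMPTION: plain byte order; HBase may use a different comparator for .META.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // A row is in range when startKey <= row < endKey.
    // An empty start or end key is treated as unbounded on that side.
    static boolean inRange(String startKey, String endKey, String row) {
        byte[] s = startKey.getBytes(StandardCharsets.UTF_8);
        byte[] e = endKey.getBytes(StandardCharsets.UTF_8);
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        boolean afterStart = s.length == 0 || compareBytes(s, r) <= 0;
        boolean beforeEnd = e.length == 0 || compareBytes(r, e) < 0;
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        // Keys copied from the WrongRegionException message.
        String endKey = "TABLEX,,1250219949252";
        String row = "TABLEX,840.56098.0544,1250526661861";
        System.out.println("row in range: " + inRange("", endKey, row));
    }
}
```

Under that assumption the row sorts after the end key (the byte for '8' is larger than the byte for ','), so the check fails, which is consistent with the exception.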


As a test, I tried a "count"...

hbase(main):007:0* count 'TABLEX'
NativeException: org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region TABLEX,,1250526695078
    from org/apache/hadoop/hbase/client/HConnectionManager.java:548:in `locateRegionInMeta'
    from org/apache/hadoop/hbase/client/HConnectionManager.java:478:in `locateRegion'
    from org/apache/hadoop/hbase/client/HConnectionManager.java:440:in `locateRegion'
    from org/apache/hadoop/hbase/client/HTable.java:114:in `<init>'
    from org/apache/hadoop/hbase/client/HTable.java:97:in `<init>'
    from sun/reflect/NativeConstructorAccessorImpl.java:-2:in `newInstance0'
...
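
The region names in these messages look like they follow a table,startKey,regionId layout, with the regionId resembling a creation time in milliseconds. Here is a small sketch for pulling such a name apart, assuming that layout holds. The class RegionNameParts and its parse method are hypothetical helpers, not part of the HBase client API.

```java
import java.time.Instant;

public class RegionNameParts {
    final String table;
    final String startKey;
    final long regionId;

    RegionNameParts(String table, String startKey, long regionId) {
        this.table = table;
        this.startKey = startKey;
        this.regionId = regionId;
    }

    // Split "table,startKey,regionId".
    // ASSUMPTION: the table name ends at the first comma and the id starts
    // after the last comma, so the start key itself may contain commas.
    static RegionNameParts parse(String regionName) {
        int first = regionName.indexOf(',');
        int last = regionName.lastIndexOf(',');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("not a region name: " + regionName);
        }
        return new RegionNameParts(
                regionName.substring(0, first),
                regionName.substring(first + 1, last),
                Long.parseLong(regionName.substring(last + 1)));
    }

    public static void main(String[] args) {
        // Region name copied from the NoServerForRegionException message.
        RegionNameParts p = parse("TABLEX,,1250526695078");
        System.out.println("table=" + p.table
                + " startKey='" + p.startKey + "'"
                + " regionId as time=" + Instant.ofEpochMilli(p.regionId));
    }
}
```

An empty start key, as in TABLEX,,1250526695078, would mean the first region of the table, and reading the ids as timestamps makes it easier to see which of the region names in the logs is the newer one.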


I also saw a thread somewhere that suggested doing a major compaction, so I
did that.  It returns almost immediately.  I'm not sure whether that's
normal, but it had no perceivable impact.

hbase(main):013:0> major_compact '.META.'
0 row(s) in 0.0220 seconds
hbase(main):014:0>

I'm not sure what else to try.  Is there a way to force removal of the table
in question?  Is there something else I should be looking at?

Marc
