[ https://issues.apache.org/jira/browse/HBASE-12255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171047#comment-14171047 ]
Andrew Purtell commented on HBASE-12255: ---------------------------------------- This looks like an HBase bug, so I moved it to the HBase project. By any chance is a complete stacktrace for those NullPointerExceptions logged? Can you provide it here? > hbase table is unavailable after restart hdfs without stop hbase > ---------------------------------------------------------------- > > Key: HBASE-12255 > URL: https://issues.apache.org/jira/browse/HBASE-12255 > Project: HBase > Issue Type: Bug > Environment: hadoop-2.5.1 > hbase-0.98 > phonenix-4.1.0 > Reporter: Junyong Li > > I have a phoenix table 'EVENT', and the table have a index 'IDX_DATE_HOUR_X'. > I restarted hdfs without stop hbase, after then, the hbase table cann't be > scaned. > I try to restart hbase, the all hbase table still cann't be scaned. > the regionserver log have many exception like this: > 2014-10-13 19:33:05,287 INFO > [A01101303447-V1,60020,1413199890407-recovery-writer--pool4-t3] > client.AsyncProcess: #4, waiting for some tasks to finish. Expected max=0, > tasksSent=9, tasksDone=8, currentTasksDone=8, retries=8 hasError=fal > se, tableName=IDX_DATE_HOUR_X > 2014-10-13 19:33:05,298 INFO > [A01101303447-V1,60020,1413199890407-recovery-writer--pool4-t2] > client.AsyncProcess: #5, waiting for some tasks to finish. Expected max=0, > tasksSent=9, tasksDone=8, currentTasksDone=8, retries=8 hasError=fal > se, tableName=IDX_DATE_HOUR_X > 2014-10-13 19:33:05,311 INFO > [A01101303447-V1,60020,1413199890407-recovery-writer--pool4-t1] > client.AsyncProcess: #6, waiting for some tasks to finish. Expected max=0, > tasksSent=9, tasksDone=8, currentTasksDone=8, retries=8 hasError=fal > se, tableName=IDX_DATE_HOUR_X > 2014-10-13 19:33:06,452 INFO [ReplicationExecutor-0] > replication.ReplicationQueuesZKImpl: Moving > A01101303447-V1,60020,1413199414409's hlogs to my queue > 2014-10-13 19:33:15,325 INFO > [A01101303447-V1,60020,1413199890407-recovery-writer--pool4-t1] > client.AsyncProcess: #6, waiting for some tasks to finish. Expected max=0, > tasksSent=10, tasksDone=9, currentTasksDone=9, retries=9 hasError=fa > lse, tableName=IDX_DATE_HOUR_X > 2014-10-13 19:33:15,333 INFO [htable-pool6-t2] client.AsyncProcess: #6, > table=IDX_DATE_HOUR_X, attempt=10/350 failed 12 ops, last exception: > org.apache.hadoop.hbase.exceptions.RegionOpeningException: > org.apache.hadoop.hbase.exceptions.R > egionOpeningException: Region > IDX_DATE_HOUR_X,t\x00\x00\x00\x00\x00,1413186874829.9a92abb84768b129df3faedb877f7bea. > is opening on A01101303447-V1,60020,1413199890407 > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2759) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4213) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3437) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29593) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) > at > org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) > at java.lang.Thread.run(Thread.java:744) > ------------------------------- > After two days's try, i found that: > If idisable 'EVENT', other tables can be scanned, then i enable 'EVENT' > manually, the region log show that NullPointExceptin has occur then replaying > WAL, the following is log: > 2014-10-13 19:25:21,043 INFO [RS_OPEN_REGION-A01101303447-V1:60020-1] > regionserver.HRegion: Replaying edits from > hdfs://localhost/hbase-0.98/data/default/EVENT/def4a581d4ad963cbb8cad32cbfbab2e/recovered.edits/0000000000000000002 > 2014-10-13 19:25:21,048 INFO > [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t2-SendThread(localhost:2182)] > zookeeper.ClientCnxn: Opening socket connection to server > localhost/0:0:0:0:0:0:0:1:2182. Will not attempt to authe > nticate using SASL (unknown error) > 2014-10-13 19:25:21,049 INFO > [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t2-SendThread(localhost:2182)] > zookeeper.ClientCnxn: Socket connection established to > localhost/0:0:0:0:0:0:0:1:2182, initiating session > 2014-10-13 19:25:21,051 INFO > [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3] > zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x199c8484 > connecting to ZooKeeper ensemble=localhost:2182 > 2014-10-13 19:25:21,051 INFO > [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3] > zookeeper.ZooKeeper: Initiating client connection, > connectString=localhost:2182 sessionTimeout=90000 > watcher=hconnection-0x199c8484, quorum=lo > calhost:2182, baseZNode=/hbase > 2014-10-13 19:25:21,052 INFO > [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3-SendThread(localhost:2182)] > zookeeper.ClientCnxn: Opening socket connection to server > localhost/0:0:0:0:0:0:0:1:2182. Will not attempt to authe > nticate using SASL (unknown error) > 2014-10-13 19:25:21,052 INFO > [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3-SendThread(localhost:2182)] > zookeeper.ClientCnxn: Socket connection established to > localhost/0:0:0:0:0:0:0:1:2182, initiating session > 2014-10-13 19:25:21,053 INFO > [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t2-SendThread(localhost:2182)] > zookeeper.ClientCnxn: Session establishment complete on server > localhost/0:0:0:0:0:0:0:1:2182, sessionid = 0x149093 > fc1c80203, negotiated timeout = 90000 > 2014-10-13 19:25:21,055 INFO > [A01101303447-V1,60020,1413199414409-recovery-writer--pool17-t3-SendThread(localhost:2182)] > zookeeper.ClientCnxn: Session establishment complete on server > localhost/0:0:0:0:0:0:0:1:2182, sessionid = 0x149093 > fc1c80204, negotiated timeout = 90000 > 2014-10-13 19:25:21,103 ERROR [RS_OPEN_REGION-A01101303447-V1:60020-1] > handler.OpenRegionHandler: Failed open of > region=EVENT,\x18,1413186838852.def4a581d4ad963cbb8cad32cbfbab2e., starting > to roll back the global memstore size. > java.lang.NullPointerException > 2014-10-13 19:25:21,104 INFO [RS_OPEN_REGION-A01101303447-V1:60020-1] > handler.OpenRegionHandler: Opening of region {ENCODED => > def4a581d4ad963cbb8cad32cbfbab2e, NAME => > 'EVENT,\x18,1413186838852.def4a581d4ad963cbb8cad32cbfbab2e.', START > KEY => '\x18', ENDKEY => '\x19'} failed, transitioning from OPENING to > FAILED_OPEN in ZK, expecting version 28 > 2014-10-13 19:25:21,104 DEBUG [RS_OPEN_REGION-A01101303447-V1:60020-1] > zookeeper.ZKAssign: regionserver:60020-0x149093fc1c80001, > quorum=localhost:2182, baseZNode=/hbase Transitioning > def4a581d4ad963cbb8cad32cbfbab2e from RS_ZK_REGION_OPE > NING to RS_ZK_REGION_FAILED_OPEN > 2014-10-13 19:25:21,107 DEBUG [RS_OPEN_REGION-A01101303447-V1:60020-1] > zookeeper.ZKAssign: regionserver:60020-0x149093fc1c80001, > quorum=localhost:2182, baseZNode=/hbase Transitioned node > def4a581d4ad963cbb8cad32cbfbab2e from RS_ZK_REGION > _OPENING to RS_ZK_REGION_FAILED_OPEN > 2014-10-13 19:25:21,108 ERROR [RS_OPEN_REGION-A01101303447-V1:60020-2] > handler.OpenRegionHandler: Failed open of > region=EVENT,\x15,1413186838852.4592fbc9db5e6eb05c812dcd81f5fa4d., starting > to roll back the global memstore size. > java.lang.NullPointerException > 2014-10-13 19:25:21,108 INFO [RS_OPEN_REGION-A01101303447-V1:60020-2] > handler.OpenRegionHandler: Opening of region {ENCODED => > 4592fbc9db5e6eb05c812dcd81f5fa4d, NAME => > 'EVENT,\x15,1413186838852.4592fbc9db5e6eb05c812dcd81f5fa4d.', START > KEY => '\x15', ENDKEY => '\x16'} failed, transitioning from OPENING to > FAILED_OPEN in ZK, expecting version 28 > 2014-10-13 19:25:21,108 DEBUG [RS_OPEN_REGION-A01101303447-V1:60020-2] > zookeeper.ZKAssign: regionserver:60020-0x149093fc1c80001, > quorum=localhost:2182, baseZNode=/hbase Transitioning > 4592fbc9db5e6eb05c812dcd81f5fa4d from RS_ZK_REGION_OPE > NING to RS_ZK_REGION_FAILED_OPEN > 2014-10-13 19:25:21,111 DEBUG [RS_OPEN_REGION-A01101303447-V1:60020-2] > zookeeper.ZKAssign: regionserver:60020-0x149093fc1c80001, > quorum=localhost:2182, baseZNode=/hbase Transitioned node > 4592fbc9db5e6eb05c812dcd81f5fa4d from RS_ZK_REGION > _OPENING to RS_ZK_REGION_FAILED_OPEN > 2014-10-13 19:25:21,113 ERROR [RS_OPEN_REGION-A01101303447-V1:60020-0] > handler.OpenRegionHandler: Failed open of > region=EVENT,\x1B,1413186838852.b45c35d23b23fda5643ec5d79083488e., starting > to roll back the global memstore size. > java.lang.NullPointerException > After deleteing the 'recovered.edits' in the 'EVENT' table's region, the > hbase table can be scanned. so i think the reason maybe that: > after restart hbase, the regionserver begin opening the region and replaying > WAL, when replaying the EVENT's WAL, because EVENT has a index table > IDX_DATE_HOUR_X, so the replay process should operate IDX_DATE_HOUR_X, but at > this moment, the IDX_DATE_HOUR_X table's region is in OPENING stats, it is't > unavaiabled, so the EVENT replaying process Time and time again to retry and > throw Exception aging and again. > if I disable EVENT first, others hbase table recory successfully, but when i > enable EVENT, the region log occur java.lang.NullPointerException, this make > the region transition to OPEN_FAILED. -- This message was sent by Atlassian JIRA (v6.3.4#6332)