Can you collect the region server log from sjc1-eng-perf-g1-grid03.carrieriq.com?
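In case it helps, here is a rough Python sketch for keeping only the log lines that mention the region's encoded name and masking hostnames before pasting. The function name, the hostname pattern, and the host1.example.com style placeholders are my own assumptions for illustration; nothing here is an HBase tool.

```python
import re

# Encoded region name taken from the master's FATAL message in this thread.
REGION = "73e01b52a570febc16833d7cc4f7ca48"

# Assumption: the hostnames to hide all look like <name>.carrieriq.com.
HOST_RE = re.compile(r"\b[\w-]+\.carrieriq\.com\b")


def extract_region_lines(lines, region=REGION):
    """Return the lines that mention `region`, with hostnames masked.

    Each distinct hostname maps to a stable placeholder (host1, host2, ...),
    so the relationships between servers stay readable after masking.
    """
    aliases = {}

    def mask(match):
        # Reuse the existing placeholder for a known host, else assign the next one.
        return aliases.setdefault(
            match.group(0), "host%d.example.com" % (len(aliases) + 1)
        )

    selected = []
    for line in lines:
        if region in line:
            selected.append(HOST_RE.sub(mask, line.rstrip("\n")))
    return selected
```

Passing an open file object for the region server log as `lines` should work, since the function only iterates over it. Continuation lines of a stack trace that do not repeat the region name are dropped by this filter, so it is worth checking the result by hand before posting.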
You can pastebin the portion of the region server log related to
usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. after
anonymization.

Cheers

On Sat, Jul 27, 2013 at 5:47 PM, Vladimir Rodionov <vrodio...@carrieriq.com> wrote:

> Nope. this seems to be very serious issue
>
> When I tried to recreate 'usertable' I got the same issue again:
>
>
> 2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> master:60000-0x54022944d180000 Creating (or updating) unassigned node for
> a386becc8860c810e33bb9c9d81482bc with OFFLINE state
> 2013-07-28 00:35:40,747 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> IPC Server Responder
> 2013-07-28 00:35:40,747 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan
> for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67.
> destination server is sjc1-eng-perf-g1-grid04.carrieriq.com
> ,60020,1374969681440
> 2013-07-28 00:35:40,748 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: No previous transition
> plan was found (or we are ignoring an existing plan) for
> usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. so generated
> a random one;
> hri=usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67., src=,
> dest=sjc1-eng-perf-g1-grid19.carrieriq.com,60020,1374969681450; 20
> (online=20, available=19) available servers
> 2013-07-28 00:35:40,748 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:60010
> 2013-07-28 00:35:40,749 DEBUG
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED
> event for 16938dcb9c3bb52a46ffb7b10fab3c57
> 2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster:
> Master server abort: loaded coprocessors are: []
> 2013-07-28 00:35:40,749 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE;
> was=usertable,user7,1374971740436.16938dcb9c3bb52a46ffb7b10fab3c57.
> state=CLOSED, ts=1374971740713, server=
> sjc1-eng-perf-g1-grid01.carrieriq.com,60020,1374969681434
> 2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> master:60000-0x54022944d180000 Creating (or updating) unassigned node for
> 16938dcb9c3bb52a46ffb7b10fab3c57 with OFFLINE state
> 2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster:
> Unexpected state :
> usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48.
> state=PENDING_OPEN, ts=1374971740749, server=
> sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot
> transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state :
> usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48.
> state=PENDING_OPEN, ts=1374971740749, server=
> sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot
> transit it to OFFLINE.
> at
> org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
> at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
> at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
> at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
> at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
> at
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
> at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2013-07-28 00:35:40,749 INFO org.apache.hadoop.hbase.master.HMaster:
> Aborting
>
>
> Master aborted.
>
> This is what I ran:
>
> create 'usertable', { NAME=>'cf', VERSIONS=> 1, COMPRESSION => 'SNAPPY',
> BLOCKCACHE => true}, { SPLITS => ['user', 'user05',
> 'user1','user15','user2','user25','user3','user35','user4','user45','user5','user55','user6','user65','user7','user75','user8','user85','user9','user95'
> ]}
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> ________________________________________
> From: Vladimir Rodionov
> Sent: Saturday, July 27, 2013 5:08 PM
> To: dev@hbase.apache.org
> Subject: RE: Master aborts on start up - URGENT
>
> OK, I managed to fix the issue and minimize the damage.
>
> The reason why OfflineMetaRepair failed to fix .META. was because there
> were inconsistencies in one of the tables
> and the tool refused to do META repair. I had to physically remove this
> table in HDFS and then I re-ran the tool
> and successfully repaired META.
>
>
>
> table and
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> ________________________________________
> From: Vladimir Rodionov
> Sent: Saturday, July 27, 2013 4:21 PM
> To: dev@hbase.apache.org
> Subject: Master aborts on start up - URGENT
>
> This may be related to :
>
> https://issues.apache.org/jira/browse/HBASE-8912
>
>
> It has started when I tried to install and run YCSB. I have created
> 'usertable' and then tried to modify it couple times (added COMPRESSION),
> HBase (0.94.6) stopped working (Master could not finish initialization)
>
> I stopped the cluster and physically removed /hbase/usertable directory as
> well as all ZK local stores. Restarted. No success.
>
> I manually ran OfflineMetaRepair. Restarted. No success. This is FATAL
> error in Master's log file.
>
> For some reason, OfflineMetaRepair did not fix missing 'usertable'.
>
> Please, advise. This is a development cluster with a large volume of data.
>
>
>
> 2013-07-27 23:08:56,504 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region
> TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca.
> has been deleted.
> 2013-07-27 23:08:56,504 INFO
> org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the
> region
> TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca.
> that was online on sjc1-eng-perf-g1-grid06.carrieriq.com
> ,60020,1374966494222
> 2013-07-27 23:08:56,504 FATAL org.apache.hadoop.hbase.master.HMaster:
> Unexpected state :
> usertable,,1374962208806.249881162b6ad6d084b30507283f98b8.
> state=PENDING_OPEN, ts=1374966536502, server=
> sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 .. Cannot
> transit it to OFFLINE.
> java.lang.IllegalStateException: Unexpected state :
> usertable,,1374962208806.249881162b6ad6d084b30507283f98b8.
> state=PENDING_OPEN, ts=1374966536502, server=
> sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 .. Cannot
> transit it to OFFLINE.
> at
> org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
> at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
> at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
> at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
> at
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
> at
> org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
> at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> 2013-07-27 23:08:56,504 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: The znode of region
> TMO_NOV_INDEX-UPLOADS,46,1360181215846.6f2a2eb3924ba5cb6ed22f966e6356e8.
> has been deleted.
> 2013-07-27 23:08:56,505 INFO org.apache.hadoop.hbase.master.HMaster:
> Aborting
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodio...@carrieriq.com
>
> ________________________________________
> From: stack (JIRA) [j...@apache.org]
> Sent: Saturday, July 27, 2013 3:21 PM
> To: dev@hbase.apache.org
> Subject: [jira] [Created] (HBASE-9063)
> TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState
> fails
>
> stack created HBASE-9063:
> ----------------------------
>
> Summary:
> TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState
> fails
> Key: HBASE-9063
> URL: https://issues.apache.org/jira/browse/HBASE-9063
> Project: HBase
> Issue Type: Bug
> Components: test
> Reporter: stack
> Assignee: Jimmy Xiang
>
>
>
> https://builds.apache.org/job/hbase-0.95-on-hadoop2/200/testReport/org.apache.hadoop.hbase.master/TestAssignmentManagerOnCluster/testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState/
>
> {code}java.lang.NullPointerException
> at
> org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1314)
> at
> org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState(TestAssignmentManagerOnCluster.java:482)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){code}
>
> Hope you don't mind my assigning it to you Jimmy. Thought you might be
> interested.