Forgot to add: from the master UI's perspective, here's where it is stuck:
$ curl http://master:60010/master-status?format=json
[{"statustimems":-1,"status":"Waiting for distributed tasks to finish. scheduled=5 done=0 error=0","starttimems":1320731070095,"description":"Doing distributed log split in [hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/.logs/ip-10-114-225-185.ec2.internal,60020,1320726988138-splitting]","state":"RUNNING","statetimems":-1}]

The regionserver finally dies, and if I restart it manually the split seems to finish up as intended.

Hope this helps.

Thanks,
Roman.

On Mon, Nov 7, 2011 at 10:16 PM, Roman Shaposhnik <[email protected]> wrote:
> With HBASE-4754 fix in place I can get further in my testing,
> but it still fails :-(
>
> Here's how it does it this time. It loads OK, but then when it
> needs to split here's what happens:
>
> 11/11/08 00:44:30 INFO handler.ServerShutdownHandler: Splitting logs
> for ip-10-114-225-185.ec2.internal,60020,1320726988138
> 11/11/08 00:44:30 INFO master.SplitLogManager: dead splitlog worker
> ip-10-114-225-185.ec2.internal,60020,1320726988138
> 11/11/08 00:44:30 INFO master.SplitLogManager: started splitting logs
> in
> [hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/.logs/ip-10-114-225-185.ec2.internal,60020,1320726988138-splitting]
> 11/11/08 00:44:31 ERROR master.HMaster: Region server
> ^@^@ip-10-114-225-185.ec2.internal,60020,1320726988138 reported a
> fatal error:
> ABORTING region server
> ip-10-114-225-185.ec2.internal,60020,1320726988138: Unhandled
> exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT
> rejected; currently processing
> ip-10-114-225-185.ec2.internal,60020,1320726988138 as dead server
>        at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:222)
>        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:148)
>        at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:750)
>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1306)
>
> That's on the master side, on the regionserver side, it looks really
> weird. It basically hums along
> doing the split and then at some point, there's this:
>
> 11/11/08 00:43:40 INFO regionserver.Store: Added
> hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/TestLoadAndVerify_1320729464658/8bd8387431feec2b09983693dfac950b/f1/4fc67a93e580402190b5c8a72820f665,
> entries=82049, sequenceid=142942, memsize=18.1m, filesize=4.4m
> 11/11/08 00:43:40 INFO regionserver.HRegion: Finished memstore flush
> of ~18.4m for region
> TestLoadAndVerify_1320729464658,<\xA1\xAF(k\xCA\x1A\xEA,1320729465485.8bd8387431feec2b09983693dfac950b.
> in 829ms, sequenceid=142942, compaction requested=false
> 11/11/08 00:44:31 INFO zookeeper.ClientCnxn: Unable to read additional
> data from server sessionid 0x133817270190001, likely server has closed
> socket, closing socket connection and attempting reconnect
> 11/11/08 00:44:31 INFO zookeeper.ClientCnxn: Unable to read additional
> data from server sessionid 0x133817270190004, likely server has closed
> socket, closing socket connection and attempting reconnect
> 11/11/08 00:44:31 WARN util.Sleeper: We slept 38891ms instead of
> 3000ms, this is likely due to a long garbage collecting pause and it's
> usually bad, see
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9
> 11/11/08 00:44:31 FATAL regionserver.HRegionServer: ABORTING region
> server ip-10-114-225-185.ec2.internal,60020,1320726988138: Unhandled
> exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT
> rejected; currently processing
> ip-10-114-225-185.ec2.internal,60020,1320726988138 as dead server
>        at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:222)
>        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:148)
>        at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:750)
>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1306)
>
>
> Thanks,
> Roman.
>
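
For anyone who wants to watch the stuck split without eyeballing the JSON, here is a minimal sketch that pulls the scheduled/done/error counters out of a master-status payload. It assumes the payload has the same shape as the curl output in this message; the helper name split_progress and the regular expression are hypothetical, not part of HBase, and the sample string is copied from the output quoted here rather than fetched live.

```python
import json
import re

# Sample payload copied from the master-status output in this message.
SAMPLE = (
    '[{"statustimems":-1,'
    '"status":"Waiting for distributed tasks to finish. scheduled=5 done=0 error=0",'
    '"starttimems":1320731070095,'
    '"description":"Doing distributed log split in '
    '[hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/.logs/'
    'ip-10-114-225-185.ec2.internal,60020,1320726988138-splitting]",'
    '"state":"RUNNING","statetimems":-1}]'
)

# Hypothetical pattern: matches the "name=number" counters seen in the status text.
COUNTERS = re.compile(r"(scheduled|done|error)=(\d+)")


def split_progress(payload):
    """Return one dict of counters per RUNNING task whose status reports them."""
    progress = []
    for task in json.loads(payload):
        if task.get("state") != "RUNNING":
            continue
        counters = {name: int(value)
                    for name, value in COUNTERS.findall(task.get("status", ""))}
        if counters:
            progress.append(counters)
    return progress


print(split_progress(SAMPLE))  # [{'scheduled': 5, 'done': 0, 'error': 0}]
```

Run in a loop against the live endpoint, a "done" count that stays at 0 while "scheduled" stays at 5 would match the hang described above.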
