I forgot to add that, from the master UI's perspective, this is where it is
stuck:

$ curl http://master:60010/master-status?format=json
[{"statustimems":-1,
  "status":"Waiting for distributed tasks to finish. scheduled=5 done=0 error=0",
  "starttimems":1320731070095,
  "description":"Doing distributed log split in [hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/.logs/ip-10-114-225-185.ec2.internal,60020,1320726988138-splitting]",
  "state":"RUNNING",
  "statetimems":-1}]
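If you want to poll this status from a script rather than eyeballing it, a minimal sketch could parse the JSON array and pull the scheduled/done/error counts out of the free-form "status" text. It assumes the status string keeps the `key=value` shape shown above. `split_task_counts` is a hypothetical helper, not part of HBase, and the sample payload is abbreviated from the output above (shortened description).

```python
import json
import re

# Abbreviated from the master-status output above (description shortened).
SAMPLE_STATUS_JSON = (
    '[{"statustimems":-1,'
    '"status":"Waiting for distributed tasks to finish. scheduled=5 done=0 error=0",'
    '"starttimems":1320731070095,'
    '"description":"Doing distributed log split",'
    '"state":"RUNNING","statetimems":-1}]'
)


def split_task_counts(status_json):
    """Return one dict per monitored task with its state and numeric counters.

    Assumes counters appear in the "status" text as key=value pairs,
    e.g. "scheduled=5 done=0 error=0".
    """
    results = []
    for task in json.loads(status_json):
        # Collect every word=number pair from the free-form status text.
        counts = {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", task.get("status", ""))}
        results.append({"state": task.get("state"),
                        "description": task.get("description", ""),
                        **counts})
    return results


if __name__ == "__main__":
    for task in split_task_counts(SAMPLE_STATUS_JSON):
        # Prints: RUNNING scheduled=5 done=0 error=0
        print(task["state"],
              f"scheduled={task['scheduled']} done={task['done']} error={task['error']}")
```

The input string could come straight from the `curl` command above. A task that stays RUNNING with `done=0` while `scheduled` is non-zero, as here, presumably means no split worker has completed any of its tasks yet.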

The regionserver eventually dies, and if I restart it manually, the split
appears to finish as intended.

Hope this helps.

Thanks,
Roman.

On Mon, Nov 7, 2011 at 10:16 PM, Roman Shaposhnik <[email protected]> wrote:
> With the HBASE-4754 fix in place I get further in my testing,
> but it still fails :-(
>
> Here's how it fails this time. It loads OK, but then, when it
> needs to split, this is what happens:
>
> 11/11/08 00:44:30 INFO handler.ServerShutdownHandler: Splitting logs
> for ip-10-114-225-185.ec2.internal,60020,1320726988138
> 11/11/08 00:44:30 INFO master.SplitLogManager: dead splitlog worker
> ip-10-114-225-185.ec2.internal,60020,1320726988138
> 11/11/08 00:44:30 INFO master.SplitLogManager: started splitting logs
> in 
> [hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/.logs/ip-10-114-225-185.ec2.internal,60020,1320726988138-splitting]
> 11/11/08 00:44:31 ERROR master.HMaster: Region server
> ^@^@ip-10-114-225-185.ec2.internal,60020,1320726988138 reported a
> fatal error:
> ABORTING region server
> ip-10-114-225-185.ec2.internal,60020,1320726988138: Unhandled
> exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT
> rejected; currently processing
> ip-10-114-225-185.ec2.internal,60020,1320726988138 as dead server
>        at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:222)
>        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:148)
>        at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:750)
>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1306)
>
> That's on the master side. On the regionserver side it looks really
> weird: it basically hums along doing the split, and then at some
> point there's this:
>
> 11/11/08 00:43:40 INFO regionserver.Store: Added
> hdfs://ip-10-84-202-94.ec2.internal:17020/hbase/TestLoadAndVerify_1320729464658/8bd8387431feec2b09983693dfac950b/f1/4fc67a93e580402190b5c8a72820f665,
> entries=82049, sequenceid=142942, memsize=18.1m, filesize=4.4m
> 11/11/08 00:43:40 INFO regionserver.HRegion: Finished memstore flush
> of ~18.4m for region
> TestLoadAndVerify_1320729464658,<\xA1\xAF(k\xCA\x1A\xEA,1320729465485.8bd8387431feec2b09983693dfac950b.
> in 829ms, sequenceid=142942, compaction requested=false
> 11/11/08 00:44:31 INFO zookeeper.ClientCnxn: Unable to read additional
> data from server sessionid 0x133817270190001, likely server has closed
> socket, closing socket connection and attempting reconnect
> 11/11/08 00:44:31 INFO zookeeper.ClientCnxn: Unable to read additional
> data from server sessionid 0x133817270190004, likely server has closed
> socket, closing socket connection and attempting reconnect
> 11/11/08 00:44:31 WARN util.Sleeper: We slept 38891ms instead of
> 3000ms, this is likely due to a long garbage collecting pause and it's
> usually bad, see
> http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A9
> 11/11/08 00:44:31 FATAL regionserver.HRegionServer: ABORTING region
> server ip-10-114-225-185.ec2.internal,60020,1320726988138: Unhandled
> exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT
> rejected; currently processing
> ip-10-114-225-185.ec2.internal,60020,1320726988138 as dead server
>        at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:222)
>        at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:148)
>        at org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:750)
>        at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1306)
>
>
> Thanks,
> Roman.
>
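The quoted regionserver log suggests a sequence: a roughly 39 s pause (util.Sleeper blames a likely GC pause), the ZooKeeper connections drop, the master starts processing the server as dead, and the server's next report is rejected with YouAreDeadException. The toy Python model below sketches that sequence only; it is not HBase code. The 10000 ms logging threshold, the 30000 ms session timeout used in the demo, and all names (`overrun_warning`, `ToyMaster`, `YouAreDead`) are assumptions chosen for illustration.

```python
MIN_DELTA_FOR_LOGGING_MS = 10_000  # assumed threshold, illustrative only


def overrun_warning(period_ms, slept_ms, min_delta_ms=MIN_DELTA_FOR_LOGGING_MS):
    """Return a Sleeper-style warning if a sleep overran its period by more than min_delta_ms."""
    if slept_ms - period_ms > min_delta_ms:
        return f"We slept {slept_ms}ms instead of {period_ms}ms"
    return None


class YouAreDead(Exception):
    """Stand-in for org.apache.hadoop.hbase.YouAreDeadException."""


class ToyMaster:
    """Hypothetical model: a server silent longer than the session timeout is processed as dead."""

    def __init__(self, session_timeout_ms):
        self.session_timeout_ms = session_timeout_ms
        self.dead = set()

    def on_silence(self, server, silent_ms):
        # A pause longer than the session timeout expires the server's session.
        if silent_ms > self.session_timeout_ms:
            self.dead.add(server)

    def region_server_report(self, server):
        # Mirrors the rejection seen in the stack traces above.
        if server in self.dead:
            raise YouAreDead(
                f"Server REPORT rejected; currently processing {server} as dead server")


if __name__ == "__main__":
    rs = "ip-10-114-225-185.ec2.internal,60020,1320726988138"
    print(overrun_warning(3000, 38891))  # We slept 38891ms instead of 3000ms
    master = ToyMaster(session_timeout_ms=30_000)  # hypothetical timeout
    master.on_silence(rs, 38891)
    try:
        master.region_server_report(rs)
    except YouAreDead as e:
        print(e)
```

In this model the regionserver cannot recover once it has been marked dead; it can only abort, which matches the FATAL line in the log. Whether the real pause exceeded the cluster's actual session timeout is not shown in the logs above.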

Reply via email to