[ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549793 ]

Jim Kellerman commented on HADOOP-2283:
---------------------------------------

The only time that a meta scanner should try to recover a log is when the 
master is starting. 
Since we now have the notion of not assigning any regions other than meta 
regions (including the root region) until they are all on-line, perhaps we 
could key off that and disable log recovery in the meta scanner once we 
start to assign user regions.
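
As an illustration only (the class, field, and method names below are 
hypothetical, not the actual HMaster/meta scanner code), keying off that 
state could look roughly like this:

    // Minimal sketch: gate meta-scanner log recovery on whether the master
    // has started assigning user regions. All names here are hypothetical.
    public class MasterSketch {

      // Hypothetical flag, set once all meta regions (including ROOT) are
      // on-line and the master begins handing out user regions.
      private volatile boolean assigningUserRegions = false;

      // Called when the last meta region comes on-line.
      public void onAllMetaRegionsOnline() {
        assigningUserRegions = true;
      }

      // One meta scanner pass over a meta region.
      public void metaScan(String metaRegionName) {
        // ... scan the rows of the meta region ...
        if (!assigningUserRegions) {
          // Only while the master is still starting: recover any leftover
          // region server log found for this region.
          recoverLog(metaRegionName);
        }
        // Once user regions are being assigned, skip recovery here and let
        // the server-shutdown processing own log splitting.
      }

      private void recoverLog(String metaRegionName) {
        // placeholder for the existing log-splitting code
      }
    }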

> [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2283
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2283
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: compaction.patch, OP_READ.patch
>
>
> Looking at the master for a cluster of ~90 regionservers, the regionserver 
> carrying the ROOT region went down (because it hadn't talked to the master 
> in 30 seconds).
> The master notices the downed regionserver because its lease times out. It 
> then runs the server shutdown sequence, but while splitting the 
> regionserver's edit log it gets stuck trying to split the second of three 
> log files. Eventually, after ~5 minutes, the second log split throws:
> 34974 2007-11-26 01:21:23,999 WARN  hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
>   34975 org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
>   34976     at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
>   34977     at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
>   34978     at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>   34979     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   34980     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   34981     at java.lang.reflect.Method.invoke(Method.java:597)
>   34982     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>   34983     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>   34984 
>   34985     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   34986     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   34987     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   34988     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   34989     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>   34990     at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
> And so on every 5 minutes.
> Because the regionserver that went down was carrying the ROOT region, and 
> because we are stuck in this eternal loop, ROOT never gets reallocated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
