[ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546003 ]
stack commented on HADOOP-2283:
-------------------------------
Also seeing this while compacting:
{code}
2007-11-27 04:11:04,173 DEBUG hbase.HStore - started compaction of 4 files in /hbase/compaction.dir/hregion_-1572125711/cookie
2007-11-27 04:11:04,193 DEBUG fs.DFSClient - Failed to connect to /38.99.76.30:50010:java.io.IOException: Got error in response to OP_READ_BLOCK
    at org.apache.hadoop.dfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:753)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:979)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1075)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1027)
    at java.io.FilterInputStream.read(FilterInputStream.java:66)
    at java.io.DataInputStream.readByte(DataInputStream.java:248)
    at org.apache.hadoop.hbase.HStoreFile.loadInfo(HStoreFile.java:590)
    at org.apache.hadoop.hbase.HStore.compact(HStore.java:1004)
    at org.apache.hadoop.hbase.HRegion.compactStores(HRegion.java:745)
    at org.apache.hadoop.hbase.HRegion.compactIfNeeded(HRegion.java:704)
    at org.apache.hadoop.hbase.HRegionServer$Compactor.run(HRegionServer.java:378)
{code}
Nothing in the namenode log about the OP_READ_BLOCK complaint, nor any errors other than a few of these:
{code}
2007-11-27 01:29:00,226 WARN dfs.FSNamesystem - java.io.IOException: Namenode is not expecting an new image UPLOAD_START
2007-11-27 01:29:00,496 WARN dfs.FSNamesystem - java.io.IOException: Namenode is not expecting an new image UPLOAD_START
{code}
> [hbase] Stuck replay of failed regionserver edits
> -------------------------------------------------
>
> Key: HADOOP-2283
> URL: https://issues.apache.org/jira/browse/HADOOP-2283
> Project: Hadoop
> Issue Type: Bug
> Components: contrib/hbase
> Reporter: stack
> Assignee: stack
>
> Looking at the master log for a cluster of ~90 regionservers: the regionserver
> carrying the ROOT region went down (because it hadn't talked to the master in
> 30 seconds).
> The master notices the downed regionserver because its lease times out. It then
> runs the server shutdown sequence, but while splitting the regionserver's edit
> log it gets stuck trying to split the second of three log files.
> Eventually, after ~5 minutes, the second log split throws:
> 2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
>     at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
>     at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
>     at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>     at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
> And so on, every 5 minutes.
> Because the regionserver that went down was carrying the ROOT region, and
> because we are stuck in this endless loop, ROOT never gets reassigned.
--
This message is automatically generated by JIRA.