[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
     [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed (the failure was in the unrelated TSRE). Resolving.

> [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed
> regionserver edits)
> ------------------------------------------------------------------
>
>                 Key: HADOOP-2283
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2283
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: 2283.patch, 2283.patch, compaction.patch, OP_READ.patch
>
>
> Looking in the master log for a cluster of ~90 regionservers, the
> regionserver carrying the ROOT region went down (because it hadn't talked
> to the master in 30 seconds).
> The master notices the downed regionserver when its lease times out. It
> then runs the server-shutdown sequence, but while splitting the
> regionserver's edit log it gets stuck on the second of three log files.
> Eventually, after ~5 minutes, the second log split throws:
>
> 34974 2007-11-26 01:21:23,999 WARN hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
> 34975 org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
> 34976     at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
> 34977     at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
> 34978     at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
> 34979     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
> 34980     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 34981     at java.lang.reflect.Method.invoke(Method.java:597)
> 34982     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
> 34983     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
> 34984
> 34985     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 34986     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 34987     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 34988     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 34989     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
> 34990     at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
>
> And so on every 5 minutes.
> Because the regionserver that went down held the ROOT region, and because
> we are stuck in this eternal loop, ROOT never gets reallocated.
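For context on the DFS semantics in the trace above, here is a hedged sketch of the loop (the path comes from the log; the class name and the rest are illustrative, and the exact lease behavior of a given Hadoop release may differ): the first split attempt created oldlogfile.log and still holds the file's namenode lease, so each retry of ProcessServerShutdown asks to create the same path again and is refused while that lease is live, which is why the shutdown (and ROOT reassignment) never completes.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StuckSplitSketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path oldLog = new Path("/hbase/hregion_-1194436719/oldlogfile.log");

            // The stuck split attempt: create() grants this client the lease.
            FSDataOutputStream stuck = fs.create(oldLog);
            try {
                // The retry: re-creating the path while the lease is live is
                // what the namenode rejects with AlreadyBeingCreatedException.
                fs.create(oldLog);
            } catch (Exception e) {
                System.out.println("retry refused: " + e.getMessage());
            } finally {
                stuck.close(); // the failed master never got this far
            }
        }
    }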
[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
     [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--------------------------
        Status: Patch Available  (was: In Progress)

Passes tests locally. Handing it to Hudson. Also opened an issue to address the point Jim raised above: the meta scanner should only do log splitting on startup.
[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
     [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--------------------------
    Attachment: 2283.patch
[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
     [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--------------------------
        Status: In Progress  (was: Patch Available)
[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
     [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--------------------------
    Attachment: 2283.patch

I wrote a unit test; that found what the problem was.

M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HLog.java
    (splitLog): The key we were using for logWriters was being changed out
    from under us by the call to HStoreKey.getRegionName; use a copy of the
    region name as the logWriters key instead.
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestHLog.java
    Refactor. Moved the setup and teardown code from testAppend into the
    JUnit setUp and tearDown methods.
    (testSplit): Run a split with multiple log files.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreFile.java
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java
    Add try/finally around the open of a Reader so we close it if there is a
    problem.
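To make the splitLog fix concrete, here is a minimal, self-contained sketch of the aliasing bug; MutableKey and the region names are illustrative stand-ins for the reused key object, not the actual HBase types:

    import java.util.HashMap;
    import java.util.Map;

    // Stand-in for a mutable key that a log reader reuses across next()
    // calls, the way the object behind HStoreKey.getRegionName was reused.
    final class MutableKey {
        private String value;
        MutableKey(String v) { value = v; }
        void set(String v) { value = v; }       // the reader mutates in place
        MutableKey copy() { return new MutableKey(value); }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).value.equals(value);
        }
        @Override public int hashCode() { return value.hashCode(); }
    }

    public class LogWritersKeyBug {
        public static void main(String[] args) {
            Map<MutableKey, String> logWriters = new HashMap<MutableKey, String>();

            // BUG: key the map on the very object the reader keeps mutating.
            MutableKey regionName = new MutableKey("region-A");
            logWriters.put(regionName, "writer for region-A");
            regionName.set("region-B");         // next edit read from the log

            // The stored entry is now unfindable, so a split would open a
            // second writer for region-A and try to re-create its
            // oldlogfile.log -> AlreadyBeingCreatedException.
            System.out.println(logWriters.get(new MutableKey("region-A"))); // null

            // FIX (what the patch does): key the map on a copy instead.
            logWriters.clear();
            regionName = new MutableKey("region-A");
            logWriters.put(regionName.copy(), "writer for region-A");
            regionName.set("region-B");
            System.out.println(logWriters.get(new MutableKey("region-A"))); // found
        }
    }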
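And a sketch of the HStoreFile/HStore hardening, assuming a SequenceFile-style reader (the actual classes and setup steps in the patch may differ): guard the window between opening a reader and handing it back, so a failure during post-open setup does not leak the handle.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;

    public class SafeReaderOpen {
        // Open a reader and run setup that may throw; on failure, close the
        // reader before propagating so the file handle does not leak.
        static SequenceFile.Reader openChecked(FileSystem fs, Path path,
                Configuration conf) throws IOException {
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            boolean ok = false;
            try {
                // ... post-open validation that may throw ...
                ok = true;
                return reader;
            } finally {
                if (!ok) {
                    reader.close();
                }
            }
        }
    }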
[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
     [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--------------------------
    Priority: Minor  (was: Major)
     Summary: [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)  (was: [hbase] Stuck replay of failed regionserver edits)

AlreadyBeingCreatedException was seen last night in a Bryan Duxbury upload (added ABCE to the title). Committed compaction.patch as part of HADOOP-2357.