[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)

2007-12-10 Thread stack (JIRA)

 [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed (Failure was in the unrelated TSRE).  Resolving.

> [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed 
> regionserver edits)
> -
>
> Key: HADOOP-2283
> URL: https://issues.apache.org/jira/browse/HADOOP-2283
> Project: Hadoop
>  Issue Type: Bug
>  Components: contrib/hbase
>Reporter: stack
>Assignee: stack
>Priority: Minor
> Fix For: 0.16.0
>
> Attachments: 2283.patch, 2283.patch, compaction.patch, OP_READ.patch
>
>
> Looking at the master log for a cluster of ~90 regionservers, the regionserver 
> carrying the ROOT region went down (because it hadn't talked to the master in 
> 30 seconds).
> The master notices the downed regionserver because its lease times out. It then 
> runs the server-shutdown sequence, but while splitting the regionserver's edit 
> log it gets stuck trying to split the second of three log files. Eventually, 
> after ~5 minutes, the second log split throws:
> 2007-11-26 01:21:23,999 WARN  hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
>   at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
>   at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
>   at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>   at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>   at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
> The same exception then recurs every 5 minutes.
> Because the regionserver that went down was carrying the ROOT region, and 
> because we are stuck in this eternal loop, ROOT never gets reassigned.
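
For context on the exception itself: the namenode check behind "current leaseholder is trying to recreate file" fires when the very same DFS client asks to create a path it already has open for write. Below is a minimal sketch of that trigger, assuming a stock Hadoop FileSystem client of that era pointed at HDFS; the path and the deliberately missing close are illustrative, not taken from the cluster log above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecreateWhileOpen {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // One FileSystem instance means one DFSClient, and so one lease-holder name.
    // Assumes fs.default.name points at an HDFS cluster; leases are HDFS-specific.
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/oldlogfile.log");   // illustrative path

    FSDataOutputStream first = fs.create(p);    // acquires the write lease on p
    first.writeBytes("edit 1\n");

    // A second create() on the same path from the same client, while the first
    // stream is still open, is the pattern the namenode rejects with
    // AlreadyBeingCreatedException ("current leaseholder is trying to recreate file").
    FSDataOutputStream second = fs.create(p);

    second.close();
    first.close();
  }
}

In the log-split loop above, the retried split keeps asking the master's own, still-leaseholding DFS client to recreate oldlogfile.log, which would explain the exception recurring every 5 minutes rather than ever clearing.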

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)

2007-12-10 Thread stack (JIRA)

 [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--

Status: Patch Available  (was: In Progress)

Passes tests locally.  Giving to Hudson.

Also opened an issue to address the point Jim raised above: the meta scanner 
should only do log splitting on startup.


[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)

2007-12-10 Thread stack (JIRA)

 [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--

Attachment: 2283.patch


[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)

2007-12-10 Thread stack (JIRA)

 [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--

Status: In Progress  (was: Patch Available)


[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)

2007-12-10 Thread stack (JIRA)

 [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--

Attachment: 2283.patch

I wrote a unit test; it found the problem.

M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HLog.java
(splitLog): The key we were using for logWriters was being changed out from
under us by the call to HStoreKey.getRegionName; use a copy of the region
name as the logWriters key instead (see the sketch after this list).
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestHLog.java
Refactor.  Moved the setup and teardown code from testAppend into the JUnit
setUp and tearDown methods.
(testSplit): Run a split with multiple log files.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStoreFile.java
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java
Add try/finally around the open of a Reader so we close it if there is a problem.
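
The HLog.java change is in essence a mutable-map-key fix: the Text handed back by HStoreKey.getRegionName was reused and mutated as the log was read, so the logWriters entry keyed on it became unfindable and splitLog would open a second writer against the same oldlogfile.log, which is presumably what tripped the lease check above. Below is a minimal sketch of the hazard and of the copy-the-key fix; the map values are plain strings standing in for the real SequenceFile writers, so only the keying pattern is taken from the commit note.

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.Text;

public class MutableKeyHazard {
  public static void main(String[] args) {
    Map<Text, String> logWriters = new HashMap<Text, String>();

    // Stands in for the single Text instance a log reader keeps reusing.
    Text regionName = new Text("hregion_-1194436719");

    // Buggy pattern: key the map with the reused instance itself.
    logWriters.put(regionName, "writer for hregion_-1194436719");
    regionName.set("hregion_42");  // reading the next edit mutates the same instance
    // The original entry can no longer be looked up, so a second writer
    // would get opened for hregion_-1194436719:
    System.out.println(logWriters.get(new Text("hregion_-1194436719")));  // null

    // Fixed pattern: key the map with a defensive copy, so later mutation
    // of the reused instance cannot disturb the map.
    logWriters.clear();
    regionName.set("hregion_-1194436719");
    logWriters.put(new Text(regionName), "writer for hregion_-1194436719");
    regionName.set("hregion_42");
    System.out.println(logWriters.get(new Text("hregion_-1194436719")));  // found
  }
}

The HStore/HStoreFile change is the usual companion hardening: wrapping the reader open in try/finally so a failure partway through does not leak the half-opened file.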


[jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)

2007-12-05 Thread stack (JIRA)

 [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--

Priority: Minor  (was: Major)
 Summary: [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed 
regionserver edits)  (was: [hbase] Stuck replay of failed regionserver edits)

An AlreadyBeingCreatedException was seen last night in a Bryan Duxbury upload, 
so I added ABCE to the title.

Committed compaction.patch as part of HADOOP-2357.
