[ https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255714#comment-13255714 ]
Lars Hofhansl commented on HBASE-5545:
--------------------------------------

If this gets done before HBASE-5782 I'll pull it in; otherwise I don't think I should hold up the release for this. Sounds good?

> Region can't be opened for a long time because creating the file failed.
> -------------------------------------------------------------------------
>
>                 Key: HBASE-5545
>                 URL: https://issues.apache.org/jira/browse/HBASE-5545
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.6
>            Reporter: gaojinchao
>            Assignee: gaojinchao
>             Fix For: 0.90.7, 0.92.2, 0.94.1
>
>
> Scenario:
> ------------
> 1. A file is created.
> 2. While data is being written, all datanodes may have crashed, so the write fails.
> 3. Even though close() is called in the finally block, it also fails and throws an exception because the write failed.
> 4. If the RS then tries to create the same file again, an AlreadyBeingCreatedException is thrown.
>
> Suggestion to handle this scenario:
> ---------------------------
> 1. Check for the existence of the file; if it exists, delete it and create a new file (a sketch of this follows below the quoted logs).
> The delete call does not check whether the file is open or closed.
>
> Overwrite option:
> ----------------
> 1. The overwrite option only applies when overwriting a closed file.
> 2. If the file is not closed, the same AlreadyBeingCreatedException is thrown even with the overwrite option.
> This is the expected behaviour, to avoid multiple clients writing to the same file.
>
> Region server logs:
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo for DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 on client 158.1.132.19 because current leaseholder is trying to recreate file.
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
> at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
> at org.apache.hadoop.ipc.Client.call(Client.java:961)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
> at $Proxy6.create(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at $Proxy6.create(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3643)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
> at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
> at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
> at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
> at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
> at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> [2012-03-07 20:51:45,858] [WARN ] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the method call: public abstract void org.apache.hadoop.hdfs.protocol.ClientProtocol.create(java.lang.String,org.apache.hadoop.fs.permission.FsPermission,java.lang.String,boolean,boolean,short,long) throws java.io.IOException with arguments of length: 7.
> The exisiting ActiveServerConnection is: ActiveServerConnectionInfo: Metadata:158-1-131-48/158.1.132.19:9000 Version:145720623220907
> [2012-03-07 20:51:45,872] [DEBUG] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-20] [org.apache.hadoop.hbase.zookeeper.ZKAssign 849] regionserver:20020-0x135ec32b39e0002-0x135ec32b39e0002 Successfully transitioned node 91bf3e6f8adb2e4b335f061036353126 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> [2012-03-07 20:51:45,873] [DEBUG] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-20] [org.apache.hadoop.hbase.regionserver.HRegion 2649] Opening region: REGION => {NAME => 'test1,00088613810,1331112369360.91bf3e6f8adb2e4b335f061036353126.', STARTKEY => '00088613810', ENDKEY => '00088613815', ENCODED => 91bf3e6f8adb2e4b335f061036353126, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'value', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'GZ', TTL => '86400', BLOCKSIZE => '655360', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> [2012-03-07 20:51:45,873] [DEBUG] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-20] [org.apache.hadoop.hbase.regionserver.HRegion 316] Instantiated test1,00088613810,1331112369360.91bf3e6f8adb2e4b335f061036353126.
> [2012-03-07 20:51:45,874] [ERROR] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-20] [
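For illustration, here is a minimal sketch of the delete-then-create handling suggested in the description, assuming the write happens somewhere like HRegion.checkRegioninfoOnFilesystem(); the class, method, and variable names below are placeholders for discussion, not the actual patch.

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: illustrates the suggested handling, not committed HBase code.
public class RegioninfoCreateSketch {

  /**
   * Create the .tmp/.regioninfo file, first deleting any leftover file from a
   * previous failed write. HDFS delete() succeeds regardless of whether the
   * old file is still "open" under a stale lease, so the subsequent create()
   * no longer hits AlreadyBeingCreatedException.
   */
  static FSDataOutputStream createRegioninfoFile(FileSystem fs, Path regioninfoPath)
      throws IOException {
    if (fs.exists(regioninfoPath)) {
      // Leftover from a crashed write; remove it unconditionally (non-recursive).
      fs.delete(regioninfoPath, false);
    }
    return fs.create(regioninfoPath);
  }
}
{code}

Delete-then-create is sketched here rather than create(path, true) because, as the description notes, the overwrite flag only helps once the old file is closed; while it is still open under a stale lease, the same AlreadyBeingCreatedException would be thrown.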