Hi All, I am running a process to extract feature vectors from images and write them as SequenceFiles on HDFS. My dataset of images is very large (~46K images).
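For context, the writer is essentially a loop over image ids, one output file per id (the real code writes SequenceFiles via SequenceFile.createWriter; the sketch below uses plain local files and illustrative names). It also shows the kind of exists() check that would let a rerun skip ids finished in an earlier pass — on HDFS that check would be FileSystem.exists(new Path(dir, id)):

```java
import java.io.IOException;
import java.nio.file.*;

// Simplified sketch of the per-image write loop, using the local filesystem
// as a stand-in for HDFS (on HDFS: FileSystem.exists / SequenceFile.createWriter).
// The exists() check makes the loop resumable: a rerun after a crash skips
// ids that were already written and only redoes the remainder.
public class FeatureWriter {
    static int writeAll(Path outDir, String[] imageIds, byte[] features) throws IOException {
        Files.createDirectories(outDir);
        int written = 0;
        for (String id : imageIds) {
            Path out = outDir.resolve(id);
            if (Files.exists(out)) {
                continue; // finished in an earlier run, skip it
            }
            // Write to a temp name first, then rename, so a crash never
            // leaves a truncated file that the exists() check would trust.
            Path tmp = outDir.resolve(id + ".tmp");
            Files.write(tmp, features);
            Files.move(tmp, out, StandardCopyOption.ATOMIC_MOVE);
            written++;
        }
        return written;
    }
}
```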
The writing process worked fine for the first half of the dataset, but then the following problem suddenly occurred:

org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /mnt/tmp/sirs-dataset-k10000/feature-repo/features/109817 for DFSClient_148861898 on client 10.118.177.84 because current leaseholder is trying to recreate file.

On investigating, I found that these errors started appearing after a LeaseExpiredException:

org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /mnt/tmp/sirs-dataset-k10000/feature-repo/features/109817 File is not open for writing. [Lease. Holder: DFSClient_148861898, pendingcreates: 1]

The process has already taken 18-19 hrs, and it would be very tough for me to restart it from scratch. Is there anything that can be done to fix this at runtime? (Perhaps force-deleting the concerned file '/mnt/tmp/sirs-dataset-k10000/feature-repo/features/109817' on HDFS?)

Regards,
Lokendra

*Detailed Log:*

2011-05-25 04:03:32,160 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310, call addBlock(/mnt/tmp/sirs-dataset-k10000/feature-repo/features/109817, DFSClient_148861898) from 10.118.177.84:48372: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /mnt/tmp/sirs-dataset-k10000/feature-repo/features/109817 File is not open for writing. [Lease. Holder: DFSClient_148861898, pendingcreates: 1]
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /mnt/tmp/sirs-dataset-k10000/feature-repo/features/109817 File is not open for writing. [Lease.
Holder: DFSClient_148861898, pendingcreates: 1]
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1340)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1323)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1251)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
	at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2011-05-25 04:03:32,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_-4965605132591592561 is added to invalidSet of 10.118.177.84:50010
2011-05-25 04:03:32,207 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=johndoe,johndoe ip=/10.118.177.84 cmd=delete src=/mnt/tmp/sirs-dataset-k10000/feature-repo/imageList dst=null perm=null
2011-05-25 04:03:32,212 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=johndoe,johndoe ip=/10.118.177.84 cmd=create src=/mnt/tmp/sirs-dataset-k10000/feature-repo/imageList dst=null perm=johndoe:supergroup:rw-r--r--
2011-05-25 04:03:32,215 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /mnt/tmp/sirs-dataset-k10000/feature-repo/imageList. blk_6557263107434203565_332695
2011-05-25 04:03:32,695 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.118.177.84:50010 storage DS-199406591-10.118.177.84-50010-1306165949296
2011-05-25 04:03:32,696 INFO org.apache.hadoop.net.NetworkTopology: Removing a node: /default-rack/10.118.177.84:50010
2011-05-25 04:03:32,696 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.118.177.84:50010
2011-05-25 04:03:33,045 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.100.245.5:50010 is added to blk_6557263107434203565_332695 size 11746349
2011-05-25 04:03:33,045 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /mnt/tmp/sirs-dataset-k10000/feature-repo/imageList is closed by DFSClient_148861898
2011-05-25 04:03:33,404 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=johndoe,johndoe ip=/10.118.177.84 cmd=delete src=/mnt/tmp/sirs-dataset-k10000/feature-repo/features/109817 dst=null perm=null
2011-05-25 04:03:33,405 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=johndoe,johndoe ip=/10.118.177.84 cmd=create src=/mnt/tmp/sirs-dataset-k10000/feature-repo/features/109817 dst=null perm=johndoe:supergroup:rw-r--r--
2011-05-25 04:03:33,468 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: failed to create file /mnt/tmp/sirs-dataset-k10000/feature-repo/metadata/109817 for DFSClient_148861898 on client 10.118.177.84 because current leaseholder is trying to recreate file.
2011-05-25 04:03:33,469 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310, call create(/mnt/tmp/sirs-dataset-k10000/feature-repo/metadata/109817, rwxr-xr-x, DFSClient_148861898, true, 1, 67108864) from 10.118.177.84:48372: error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /mnt/tmp/sirs-dataset-k10000/feature-repo/metadata/109817 for DFSClient_148861898 on client 10.118.177.84 because current leaseholder is trying to recreate file.
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /mnt/tmp/sirs-dataset-k10000/feature-repo/metadata/109817 for DFSClient_148861898 on client 10.118.177.84 because current leaseholder is trying to recreate file.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1045)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
	at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:416)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
2011-05-25 04:03:33,709 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage from 10.118.177.84
2011-05-25 04:03:33,709 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 10 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 7 SyncTimes(ms): 220
2011-05-25 04:03:35,165 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.118.177.84:50010 to delete blk_-4965605132591592561_332692
2011-05-25 04:04:33,481 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: failed to create file /mnt/tmp/sirs-dataset-k10000/feature-repo/metadata/109817 for DFSClient_148861898 on client 10.118.177.84 because current leaseholder is trying to recreate file.
2011-05-25 04:04:33,481 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310, call create(/mnt/tmp/sirs-dataset-k10000/feature-repo/metadata/109817, rwxr-xr-x, DFSClient_148861898, true, 1, 67108864) from 10.118.177.84:48372: error: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /mnt/tmp/sirs-dataset-k10000/feature-repo/metadata/109817 for DFSClient_148861898 on client 10.118.177.84 because current leaseholder is trying to recreate file.
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /mnt/tmp/sirs-dataset-k10000/feature-repo/metadata/109817 for DFSClient_148861898 on client 10.118.177.84 because current leaseholder is trying to recreate file.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1045)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
	at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)

Regards,
Lokendra
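P.S. To be concrete about the force-delete idea: what I have in mind is roughly the snippet below (just a sketch using the standard FileSystem API, not yet tried against my cluster; `hadoop fs -rm <path>` from the shell should be equivalent). I am not sure whether the NameNode will accept a new create() for the path while it still considers the old lease holder to be recreating the file, hence the question.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ForceDelete {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path stuck = new Path("/mnt/tmp/sirs-dataset-k10000/feature-repo/features/109817");
        // Non-recursive delete of the half-written file; the writer would
        // then try to recreate it on its next attempt.
        boolean deleted = fs.delete(stuck, false);
        System.out.println("deleted=" + deleted);
    }
}
```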