[ https://issues.apache.org/jira/browse/HDFS-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J resolved HDFS-64.
-------------------------
    Resolution: Not A Problem

This has gone stale, and given that we have not seen it recently at all, it looks like it may have been fixed inadvertently along the way.

> delete on dfs hung
> ------------------
>
>                 Key: HDFS-64
>                 URL: https://issues.apache.org/jira/browse/HDFS-64
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Devaraj Das
>
> I had a case where the JobTracker was trying to delete some files in a DFS directory, as part of garbage collection for a job. The thread hung; this is the trace:
>
> Thread 19 (IPC Server handler 5 on 57344):
>   State: WAITING
>   Blocked count: 137022
>   Waited count: 336004
>   Waiting on org.apache.hadoop.ipc.Client$Call@eb6238
>   Stack:
>     java.lang.Object.wait(Native Method)
>     java.lang.Object.wait(Object.java:485)
>     org.apache.hadoop.ipc.Client.call(Client.java:683)
>     org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>     org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
>     sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     java.lang.reflect.Method.invoke(Method.java:597)
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
>     org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:515)
>     org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:170)
>     org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:118)
>     org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:114)
>     org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1635)
>     org.apache.hadoop.mapred.JobInProgress.isJobComplete(JobInProgress.java:1387)
>     org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:1348)
>     org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:565)
>     org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:2032)
>
> It hung for an enormously long time, roughly an hour.
> I am not sure whether these details will help:
> I saw this message in the NameNode log around the time the JobTracker issued the delete:
>
> 2008-05-07 09:55:57,375 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /mapredsystem/ddas/mapredsystem/10091.{running.machine.com}/job_200805070458_0004 because it does not exist
>
> I also checked that the directory in question was actually there (and the job could not have run without it).
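A note for anyone who hits similar symptoms: the trace shows the handler thread parked in Object.wait() inside Client.call, waiting on the Call object. As far as I can tell from clients of that era, that wait has no RPC-level timeout, so a response that is lost or never sent can pin the caller indefinitely. Below is a minimal sketch of that wait-without-timeout shape; the class and member names are hypothetical illustrations, not the actual org.apache.hadoop.ipc.Client source.

{code:java}
// Sketch of a wait-without-timeout call object, similar in shape to what
// the stack trace shows. Names here are made up for illustration.
public class BlockingCall {
    private Object response;   // set by the connection reader thread on reply
    private boolean done;

    // Reader thread invokes this when (and only if) the reply arrives.
    public synchronized void complete(Object value) {
        response = value;
        done = true;
        notifyAll();
    }

    // Caller thread: if the reply is lost, this waits forever, which is
    // consistent with the long WAITING state reported above.
    public synchronized Object get() throws InterruptedException {
        while (!done) {
            wait();            // no timeout: nothing bounds the wait
        }
        return response;
    }
}
{code}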
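The "failed to remove ... because it does not exist" warning is also consistent with a retry: RetryInvocationHandler appears in the stack, so one plausible (unconfirmed) reading is that a first attempt removed the directory on the NameNode, its response was lost, and the retried delete then found nothing left to remove. Independent of the root cause, a client-side timeout keeps a wedged delete from pinning a handler thread for an hour. Here is a sketch wrapping the real FileSystem.delete(Path, boolean) API in a Future; BoundedDelete and deleteWithTimeout are names I made up, and it uses modern java.util.concurrent for brevity.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: bound a potentially hanging DFS delete with a client-side timeout,
// so a lost RPC response surfaces as an error instead of a stuck thread.
// This is a workaround pattern, not what the JobTracker actually did.
public class BoundedDelete {
    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    public static boolean deleteWithTimeout(FileSystem fs, Path dir,
                                            long timeout, TimeUnit unit)
            throws Exception {
        Future<Boolean> result = POOL.submit(() -> fs.delete(dir, true));
        try {
            return result.get(timeout, unit);
        } catch (TimeoutException e) {
            result.cancel(true);   // interrupt the stuck waiter thread
            throw e;               // report the hang rather than block forever
        }
    }
}
{code}

Cancelling the Future interrupts the worker stuck in Object.wait(), so the pool thread is reclaimed; the caller can then log the timeout and reschedule the cleanup rather than holding an IPC handler hostage.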