Hey all.

We've been running into a very annoying problem pretty frequently lately. We'll be running some job, for instance a distcp, and it'll be moving along quite nicely until, all of a sudden, it freezes up. After a while, we get an error like this one:
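(For reference, the distcp invocation itself is nothing unusual; the hostnames and paths below are just placeholders, but it's roughly of this form:

    hadoop distcp hdfs://namenode:8020/path/to/source hdfs://namenode:8020/path/to/destination

so a plain HDFS-to-HDFS copy with default settings.)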

attempt_200809261607_0003_m_000002_0: Exception closing file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile
attempt_200809261607_0003_m_000002_0: java.io.IOException: Could not get block locations. Aborting...
attempt_200809261607_0003_m_000002_0:         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
attempt_200809261607_0003_m_000002_0:         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
attempt_200809261607_0003_m_000002_0:         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

At approximately the same time, we start seeing lots of these errors in the namenode log:

2008-09-26 16:19:26,502 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.startFile: failed to create file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile for DFSClient_attempt_200809261607_0003_m_000002_1 on client 10.100.11.83 because current leaseholder is trying to recreate file.
2008-09-26 16:19:26,502 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 7276, call create(/tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile, rwxr-xr-x, DFSClient_attempt_200809261607_0003_m_000002_1, true, 3, 67108864) from 10.100.11.83:60056: error: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile for DFSClient_attempt_200809261607_0003_m_000002_1 on client 10.100.11.83 because current leaseholder is trying to recreate file.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile for DFSClient_attempt_200809261607_0003_m_000002_1 on client 10.100.11.83 because current leaseholder is trying to recreate file.
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:952)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:903)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:284)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

Eventually, the job fails because of these errors, and subsequent job runs hit the same problem and fail as well. The only way we've been able to recover is to restart the DFS. It doesn't happen every time, but it does happen often enough that I'm worried.

Does anyone have any ideas as to why this might be happening? I thought https://issues.apache.org/jira/browse/HADOOP-2669 might be the culprit, but today we upgraded to Hadoop 0.18.1 and the problem still happens.

Thanks,

Bryan
