If restarting the entire dfs helped, then you might be hitting
http://issues.apache.org/jira/browse/HADOOP-3633

When we were running 0.17.1, I had to grep for OutOfMemory in the
datanode ".out" files at least every day and restart those zombie
datanodes. Once a datanode gets into this state, as Konstantin
mentioned in the Jira, "it appears to happily sending heartbeats, but
in fact cannot do any data processing because the server thread is
dead."

Koji

-----Original Message-----
From: Piotr Kozikowski [mailto:[EMAIL PROTECTED]
Sent: Friday, August 08, 2008 5:42 PM
To: core-user@hadoop.apache.org
Subject: Re: java.io.IOException: Could not get block locations. Aborting...

Thank you for the reply. Apparently whatever it was is now gone after a
hadoop restart, but I'll keep that in mind should it happen again.

Piotr

On Fri, 2008-08-08 at 17:31 -0700, Dhruba Borthakur wrote:
> It is possible that your namenode is overloaded and is not able to
> respond to RPC requests from clients. Please check the namenode logs
> to see if you see lines of the form "discarding calls...".
>
> dhruba
>
> On Fri, Aug 8, 2008 at 3:41 AM, Alexander Aristov
> <[EMAIL PROTECTED]> wrote:
> > I came across the same issue, also with hadoop 0.17.1.
> >
> > It would be interesting if someone could say what causes the issue.
> >
> > Alex
> >
> > 2008/8/8 Steve Loughran <[EMAIL PROTECTED]>
> >
> >> Piotr Kozikowski wrote:
> >>
> >>> Hi there:
> >>>
> >>> We would like to know the most likely causes of this sort of
> >>> error:
> >>>
> >>> Exception closing file
> >>> /data1/hdfs/tmp/person_url_pipe_59984_3405334/_temporary/_task_200807311534_0055_m_000022_0/part-00022
> >>> java.io.IOException: Could not get block locations. Aborting...
> >>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
> >>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
> >>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
> >>>
> >>> Our map-reduce job does not fail completely, but over 50% of the map
> >>> tasks fail with this same error.
> >>> We recently migrated our cluster from 0.16.4 to 0.17.1; previously we
> >>> didn't have this problem using the same input data in a similar
> >>> map-reduce job.
> >>>
> >>> Thank you,
> >>>
> >>> Piotr
> >>>
> >> When I see this, it's because the filesystem isn't completely up: there
> >> are no locations for a specific file, meaning the client isn't getting
> >> back the names of any datanodes holding the data from the name nodes.
> >>
> >> I've got a patch in JIRA that prints out the name of the file in
> >> question, as that could be useful.
> >
> > --
> > Best Regards
> > Alexander Aristov
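The two operational checks suggested in this thread (grepping the datanode ".out" files for OutOfMemory, and the namenode logs for "discarding calls...") could be sketched as a small script. The log directory and file-name patterns below are assumptions, not Hadoop defaults; adjust them for your installation:

```shell
#!/bin/sh
# Sketch only: HADOOP_LOG_DIR and the *datanode*/*namenode* file-name
# patterns are assumptions -- adapt them to your cluster's layout.
LOG_DIR=${HADOOP_LOG_DIR:-/var/log/hadoop}

# Koji's check: a datanode whose ".out" file mentions OutOfMemory is
# likely a zombie (HADOOP-3633) -- still sending heartbeats, but with a
# dead server thread. List the matching files so those nodes can be
# restarted.
zombies=$(grep -l OutOfMemory "$LOG_DIR"/*datanode*.out 2>/dev/null)
[ -n "$zombies" ] && printf 'restart datanodes for:\n%s\n' "$zombies"

# Dhruba's check: an overloaded namenode logs lines of the form
# "discarding calls...".
grep -n 'discarding calls' "$LOG_DIR"/*namenode*.log 2>/dev/null

exit 0
```

Running this from cron (or before retrying a failed job) would surface the zombie-datanode case without restarting the entire dfs.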