If restarting the entire dfs helped, then you might be hitting 
http://issues.apache.org/jira/browse/HADOOP-3633

When we were running 0.17.1, I had to grep for OutOfMemory in the
datanode ".out" files at least every day and restart those zombie
datanodes.
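For reference, a minimal sketch of what that daily check could look like
(the log directory and file pattern below are assumptions, not something
from this thread; adjust them to your HADOOP_LOG_DIR layout):

#!/usr/bin/env python
# Sketch: scan datanode ".out" files for OutOfMemoryError and report
# which datanodes look like zombies that need a restart.
import glob
import sys

LOG_DIR = "/var/log/hadoop"              # assumption: your HADOOP_LOG_DIR
PATTERN = LOG_DIR + "/*datanode*.out"    # assumption: .out naming scheme

def find_oom_files(pattern):
    """Return the .out files that mention an OutOfMemoryError."""
    hits = []
    for path in glob.glob(pattern):
        with open(path, "r") as f:
            if "OutOfMemoryError" in f.read():
                hits.append(path)
    return hits

if __name__ == "__main__":
    oom = find_oom_files(PATTERN)
    for path in oom:
        print("possible zombie datanode, restart needed: %s" % path)
    sys.exit(1 if oom else 0)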

Once a datanode gets to this state, as Konstantin mentioned in the Jira,

"it appears to be happily sending heartbeats, but in fact cannot
do any data processing because the server thread is dead."

Koji

-----Original Message-----
From: Piotr Kozikowski [mailto:[EMAIL PROTECTED] 
Sent: Friday, August 08, 2008 5:42 PM
To: core-user@hadoop.apache.org
Subject: Re: java.io.IOException: Could not get block locations. Aborting...

Thank you for the reply. Apparently whatever it was is now gone after a
hadoop restart, but I'll keep that in mind should it happen again.

Piotr

On Fri, 2008-08-08 at 17:31 -0700, Dhruba Borthakur wrote:
> It is possible that your namenode is overloaded and is not able to
> respond to RPC requests from clients. Please check the namenode logs
> to see if you see lines of the form "discarding calls...".
> 
> dhruba
> 
> On Fri, Aug 8, 2008 at 3:41 AM, Alexander Aristov
> <[EMAIL PROTECTED]> wrote:
> > I came across the same issue, also with hadoop 0.17.1.
> >
> > It would be interesting if someone could say what causes the issue.
> >
> > Alex
> >
> > 2008/8/8 Steve Loughran <[EMAIL PROTECTED]>
> >
> >> Piotr Kozikowski wrote:
> >>
> >>> Hi there:
> >>>
> >>> We would like to know what are the most likely causes of this sort
> >>> of error:
> >>>
> >>> Exception closing file
> >>> /data1/hdfs/tmp/person_url_pipe_59984_3405334/_temporary/_task_200807311534_0055_m_000022_0/part-00022
> >>> java.io.IOException: Could not get block locations. Aborting...
> >>>        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
> >>>        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
> >>>        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
> >>>
> >>> Our map-reduce job does not fail completely, but over 50% of the
> >>> map tasks fail with this same error.
> >>> We recently migrated our cluster from 0.16.4 to 0.17.1; previously
> >>> we didn't have this problem using the same input data in a similar
> >>> map-reduce job.
> >>>
> >>> Thank you,
> >>>
> >>> Piotr
> >>>
> >>>
> >> When I see this, it's because the filesystem isn't completely up:
> >> there are no locations for a specific file, meaning the client isn't
> >> getting back from the namenode the names of any datanodes holding
> >> the data.
> >>
> >> I've got a patch in JIRA that prints out the name of the file in
> >> question, as that could be useful.
> >>
> >
> >
> >
> > --
> > Best Regards
> > Alexander Aristov
> >
