Hi again,

The "Could not get block locations" exception was gone after a Hadoop
restart, but further down the road our job failed again. I checked the
logs for "discarding calls" and found a bunch of them, plus the namenode
appeared to have a load spike at that time, so it seems it is getting
overloaded. Do you know how can we prevent this? Currently the namenode
machine is not running anything but the namenode and the secondary
namenode, and the cluster only has 16 machines.
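
One thing we are wondering about, on the assumption that the namenode
is simply running out of RPC handler threads, is raising
dfs.namenode.handler.count (default 10) in hadoop-site.xml, along the
lines of:

  <property>
    <name>dfs.namenode.handler.count</name>
    <!-- just a guess at a value for a 16-node cluster; needs tuning -->
    <value>20</value>
  </property>

Would that be the right knob, or is there something else we should look
at first?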

Thank you

Piotr

On Fri, 2008-08-08 at 17:31 -0700, Dhruba Borthakur wrote:
> It is possible that your namenode is overloaded and is not able to
> respond to RPC requests from clients. Please check the namenode logs
> to see if you see lines of the form "discarding calls...".
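> Something like the following should find them, assuming the default
> log location and file naming:
> 
>   grep "discarding calls" $HADOOP_HOME/logs/hadoop-*-namenode-*.log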
> 
> dhruba
> 
> On Fri, Aug 8, 2008 at 3:41 AM, Alexander Aristov
> <[EMAIL PROTECTED]> wrote:
> > I came across the same issue, also with Hadoop 0.17.1.
> >
> > It would be interesting if someone could explain the cause of the issue.
> >
> > Alex
> >
> > 2008/8/8 Steve Loughran <[EMAIL PROTECTED]>
> >
> >> Piotr Kozikowski wrote:
> >>
> >>> Hi there:
> >>>
> >>> We would like to know the most likely causes of this sort of
> >>> error:
> >>>
> >>> Exception closing
> >>> file
> >>> /data1/hdfs/tmp/person_url_pipe_59984_3405334/_temporary/_task_200807311534_0055_m_000022_0/part-00022
> >>> java.io.IOException: Could not get block locations. Aborting...
> >>>        at
> >>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
> >>>        at
> >>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
> >>>        at
> >>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
> >>>
> >>> Our map-reduce job does not fail completely, but over 50% of the map
> >>> tasks fail with this same error. We recently migrated our cluster from
> >>> 0.16.4 to 0.17.1; previously we didn't have this problem using the same
> >>> input data in a similar map-reduce job.
> >>>
> >>> Thank you,
> >>>
> >>> Piotr
> >>>
> >>>
> >> When I see this, it's because the filesystem isn't completely up: there
> >> are no locations for a specific file, meaning the client isn't getting
> >> back from the namenode the names of any datanodes holding the data.
> >>
> >> I've got a patch in JIRA that prints out the name of the file in question,
> >> as that could be useful.
> >>
> >
> >
> >
> > --
> > Best Regards
> > Alexander Aristov
> >
