Re: Error reading task output
Aaron Kimball wrote:
> Cam, This isn't Hadoop-specific; it's how Linux treats its network configuration. If you look at /etc/host.conf, you'll probably see a line that says "order hosts, bind" -- this tells Linux's DNS resolution library to first read your /etc/hosts file, then check an external DNS server. You could probably disable local hostfile checking, but that means that every time a program on your system queries the authoritative hostname for "localhost", it'll go out to the network. You'll probably see a big performance hit. The better solution, I think, is to get your nodes' /etc/hosts files squared away.

I agree.

> You only need to do so once :)

No, you need to detect whenever the Linux networking stack has decided to add new entries to resolv.conf or /etc/hosts, and detect when they are inappropriate. Which is a tricky thing to do, as there are some cases where you may actually be grateful that someone in the Debian codebase decided that adding the local hostname as 127.0.0.1 is a feature.

I ended up writing a new SmartFrog component that can be configured to fail to start if the network is a mess, which is something worth pushing out. As part of Hadoop diagnostics, this test would be one of the things to deal with, and at least warn on: "your hostname is local, you will not be visible over the network".

-steve
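The SmartFrog component itself isn't shown in the thread, but the core of the check Steve describes can be sketched in a few lines of Java. This is an illustrative assumption of how such a pre-flight test might look, not the actual component; the class and message wording are invented:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of the diagnostic described above: a node whose
// hostname resolves to a loopback address (127.0.0.0/8) will register
// itself with an address other machines cannot reach, so fail fast
// rather than join the cluster silently broken.
public class HostnameSanityCheck {

    // True if this address could plausibly be reached by other machines.
    static boolean visibleOverNetwork(InetAddress addr) {
        return !addr.isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        if (!visibleOverNetwork(local)) {
            System.err.println("your hostname " + local.getHostName()
                    + " resolves to " + local.getHostAddress()
                    + "; you will not be visible over the network");
            System.exit(1);
        }
    }
}
```

Running this on a machine with the bad /etc/hosts entry discussed in the thread would trip the check, because InetAddress.getLocalHost() consults the same resolver chain (hosts file first, then DNS) that Aaron describes.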
Re: Error reading task output
Cam Macdonell wrote:
> Well, for future googlers, I'll answer my own post. Watch out for the hostname at the end of "localhost" lines on slaves. One of my slaves was registering itself as "localhost.localdomain" with the jobtracker.
>
> Is there a way that Hadoop could be made to not be so dependent on /etc/hosts, but on more dynamic hostname resolution?

DNS is trouble in Java; there are some (outstanding) bug reports/Hadoop patches on the topic, mostly showing up on a machine of mine with a bad hosts entry. I also encountered some fun last month with Ubuntu Linux adding the local hostname to /etc/hosts alongside the 127.0.0.1 entry, which is precisely what you don't want for a cluster of VMs with no DNS at all. This sounds like your problem too, in which case I have shared your pain:

http://www.1060.org/blogxter/entry?publicid=121ED68BB21DB8C060FE88607222EB52
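For future googlers, the problematic pattern being described looks roughly like this in /etc/hosts (the hostname "node1" is a made-up example, not from the thread):

```
# Distro-added line of the kind described above -- the trailing hostname is the problem:
127.0.0.1   localhost.localdomain localhost node1
```

With an entry like this, a daemon on node1 that asks the resolver for its own address gets 127.0.0.1 back, and reports itself to the jobtracker under a name ("localhost.localdomain") that every other machine resolves to itself.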
Re: Error reading task output
Cam, This isn't Hadoop-specific; it's how Linux treats its network configuration. If you look at /etc/host.conf, you'll probably see a line that says "order hosts, bind" -- this tells Linux's DNS resolution library to first read your /etc/hosts file, then check an external DNS server. You could probably disable local hostfile checking, but that means that every time a program on your system queries the authoritative hostname for "localhost", it'll go out to the network. You'll probably see a big performance hit.

The better solution, I think, is to get your nodes' /etc/hosts files squared away. You only need to do so once :)

-- Aaron

On Thu, Apr 16, 2009 at 11:31 AM, Cam Macdonell wrote:
> Cam Macdonell wrote:
>> Hi,
>>
>> I'm getting the following warning when running the simple wordcount and grep examples.
>>
>> 09/04/15 16:54:16 INFO mapred.JobClient: Task Id : attempt_200904151649_0001_m_19_0, Status : FAILED
>> Too many fetch-failures
>> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task output http://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_19_0&filter=stdout
>> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task output http://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_19_0&filter=stderr
>>
>> The only advice I could find from other posts with similar errors is to set up /etc/hosts with all slaves and the host IPs. I did this, but I still get the warning above. The output seems to come out all right, however (I guess that's why it is a warning).
>>
>> I tried running a wget on the http:// address in the warning message, and I get the following back:
>>
>> 2009-04-15 16:53:46 ERROR 400: Argument taskid is required.
>>
>> So perhaps the wrong task ID is being passed to the http request. Any ideas on what can get rid of these warnings?
>>
>> Thanks,
>> Cam
>
> Well, for future googlers, I'll answer my own post. Watch out for the hostname at the end of "localhost" lines on slaves. One of my slaves was registering itself as "localhost.localdomain" with the jobtracker.
>
> Is there a way that Hadoop could be made to not be so dependent on /etc/hosts, but on more dynamic hostname resolution?
>
> Cam
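A "squared away" /etc/hosts of the kind Aaron recommends might look like the following, identical on every node. The hostnames and private addresses below are invented for illustration, not taken from Cam's cluster:

```
# /etc/hosts -- same file on every node
127.0.0.1     localhost localhost.localdomain   # loopback only; no node hostname here
192.168.0.10  master
192.168.0.11  slave1
192.168.0.12  slave2
```

The key points are that each node's own hostname maps to its real, routable address, and that nothing but "localhost" names appear on the 127.0.0.1 line.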
Re: Error reading task output
Cam Macdonell wrote:

Hi,

I'm getting the following warning when running the simple wordcount and grep examples.

09/04/15 16:54:16 INFO mapred.JobClient: Task Id : attempt_200904151649_0001_m_19_0, Status : FAILED
Too many fetch-failures
09/04/15 16:54:16 WARN mapred.JobClient: Error reading task output http://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_19_0&filter=stdout
09/04/15 16:54:16 WARN mapred.JobClient: Error reading task output http://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_19_0&filter=stderr

The only advice I could find from other posts with similar errors is to set up /etc/hosts with all slaves and the host IPs. I did this, but I still get the warning above. The output seems to come out all right, however (I guess that's why it is a warning).

I tried running a wget on the http:// address in the warning message, and I get the following back:

2009-04-15 16:53:46 ERROR 400: Argument taskid is required.

So perhaps the wrong task ID is being passed to the http request. Any ideas on what can get rid of these warnings?

Thanks,
Cam

Well, for future googlers, I'll answer my own post. Watch out for the hostname at the end of "localhost" lines on slaves. One of my slaves was registering itself as "localhost.localdomain" with the jobtracker.

Is there a way that Hadoop could be made to not be so dependent on /etc/hosts, but on more dynamic hostname resolution?

Cam