Too many fetch-failures ERROR
Hi all. The error I am encountering seems to be a very common one, but after two weeks of searching and trying every solution I could find, I am still stuck on it. So, I hope that someone can help me overcome this issue :)

First, I use Ubuntu 9.04 x86_64 and hadoop-0.20.2. I successfully set up a single-node installation based on the instructions of Michael G. Noll. Second, I set up Hadoop for multiple nodes, again following Noll's instructions, and ran into the error. These are my config files:

/etc/hosts

127.0.0.1       localhost
127.0.1.1       thailong-desktop
#192.168.1.2    localhost
#192.168.1.2    thailong-desktop

# The following lines are desirable for IPv6 capable hosts
#::1     localhost ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
#ff02::3 ip6-allhosts

192.168.1.4     node1
192.168.1.2     master

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-datastore/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
    <description>The name of the default file system. A URI whose scheme
    and authority determine the FileSystem implementation. The uri's scheme
    determines the config property (fs.SCHEME.impl) naming the FileSystem
    implementation class. The uri's authority is used to determine the host,
    port, etc. for a filesystem.</description>
  </property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- In: conf/mapred-site.xml -->
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
    <description>The host and port that the MapReduce job tracker runs at.
    If "local", then jobs are run in-process as a single map and reduce task.
    </description>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- In: conf/hdfs-site.xml -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications
    can be specified when the file is created. The default is used if
    replication is not specified in create time.
    </description>
  </property>
</configuration>

I tried setting up Hadoop on a single node again, but this time, instead of using localhost, I set all the values to master, which is the hostname of the local machine, and the error is still there. It seems that there is a problem in mapred-site.xml: if I change mapred.job.tracker to localhost, or change the IP address of master in /etc/hosts to 127.0.1.1, the system runs without error. Is there something that I missed? This problem has haunted me for weeks; any help from you is precious to me.

Regards
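A minimal sketch of an alternative /etc/hosts for this two-node setup, assuming the fetch failures are caused by the machine's own hostname resolving to a loopback address (a commonly reported cause of "Too many fetch-failures"); the hostnames and addresses below are the ones from the message above, not a verified fix:

# /etc/hosts (sketch) - same on master (192.168.1.2) and node1 (192.168.1.4)
127.0.0.1       localhost
# Avoid also mapping the machine's own hostname (thailong-desktop / master)
# to 127.0.1.1: a TaskTracker whose hostname resolves to loopback can report
# an address that other nodes cannot fetch map output from.
192.168.1.2     master
192.168.1.4     node1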
Re: cluster under-utilization with Hadoop Fair Scheduler
Hi Abhishek,

This behavior is improved by MAPREDUCE-706, I believe (I'm not certain that that's the JIRA, but I know it's fixed in the trunk fair scheduler). These patches are included in CDH3 (currently in beta): http://archive.cloudera.com/cdh/3/

In general, though, map tasks that are so short are not going to be very efficient - even with fast assignment there is some constant overhead per task.

Thanks
-Todd

On Sun, Apr 11, 2010 at 11:42 AM, abhishek sharma absha...@usc.edu wrote:

Hi all,

I have been using the Hadoop Fair Scheduler for some experiments on a 100-node cluster with 2 map slots per node (hence, a total of 200 map slots). In one of my experiments, all the map tasks finish within a heartbeat interval of 3 seconds. I noticed that the maximum number of concurrently active map slots on my cluster never exceeds 100, and hence the cluster utilization during my experiments never exceeds 50%, even when large jobs with more than 1000 maps are being executed.

A look at the Fair Scheduler code (in particular, the assignTasks function) revealed the reason. As per my understanding, with the implementation in Hadoop 0.20.0, a TaskTracker is not assigned more than 1 map and 1 reduce task per heartbeat. In my experiments, in every heartbeat, each TT has 2 free map slots but is assigned only 1 map task, and hence the utilization never goes beyond 50%.

Of course, this (degenerate) case does not arise when map tasks take more than one heartbeat interval to finish. For example, I repeated the experiments with map tasks taking close to 15 s to finish and noticed close to 100% utilization when large jobs were executing.

Why does the Fair Scheduler not assign more than one map task to a TT per heartbeat? Is this done to spread the load uniformly across the cluster? I looked at the assignTasks function in the default Hadoop scheduler (JobQueueTaskScheduler.java), and it does assign more than 1 map task per heartbeat to a TT.

It will be easy to change the Fair Scheduler to assign more than 1 map task to a TT per heartbeat (I did that and achieved 100% utilization even with small map tasks). But I am wondering if doing so will violate some fairness properties.

Thanks,
Abhishek

--
Todd Lipcon
Software Engineer, Cloudera
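A minimal, self-contained sketch of the difference being discussed, using invented class and method names (this is not the real FairScheduler or JobQueueTaskScheduler API): one policy hands out at most one map task per heartbeat, the other keeps assigning until the TaskTracker's free map slots are used up.

// Illustrative only: models the two per-heartbeat assignment policies with
// made-up names; it does not use the real Hadoop scheduler classes.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeartbeatAssignmentSketch {

    // One-map-per-heartbeat policy (the 0.20 fair scheduler behavior described above).
    static List<String> assignOnePerHeartbeat(int freeMapSlots, List<String> pendingMaps) {
        List<String> assigned = new ArrayList<>();
        if (freeMapSlots > 0 && !pendingMaps.isEmpty()) {
            assigned.add(pendingMaps.remove(0)); // at most one task, even with 2 free slots
        }
        return assigned;
    }

    // Fill-every-free-slot policy (what JobQueueTaskScheduler does, per the thread,
    // and what the local modification to the Fair Scheduler achieved).
    static List<String> assignUpToFreeSlots(int freeMapSlots, List<String> pendingMaps) {
        List<String> assigned = new ArrayList<>();
        while (assigned.size() < freeMapSlots && !pendingMaps.isEmpty()) {
            assigned.add(pendingMaps.remove(0)); // keep going until slots or pending tasks run out
        }
        return assigned;
    }

    public static void main(String[] args) {
        // A TaskTracker with 2 free map slots, as in the experiment above.
        List<String> pending = Arrays.asList("map-0", "map-1", "map-2", "map-3");
        System.out.println(assignOnePerHeartbeat(2, new ArrayList<>(pending))); // [map-0]        -> 1 of 2 slots used
        System.out.println(assignUpToFreeSlots(2, new ArrayList<>(pending)));   // [map-0, map-1] -> both slots used
    }
}

With map tasks that finish inside one heartbeat, the first policy leaves half of the slots idle each round, which matches the 50% utilization observed.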
Re: Too many fetch-failures ERROR
Hi,

I followed Michael G. Noll's blog post to set up a single-node installation on my laptop. I sometimes encountered this error too. I just used to restart Hadoop, and that would fix it, but I don't know the exact reason behind it.

Regards,
Raghava.

On Sun, Apr 11, 2010 at 6:05 AM, long thai thaithanhlong2...@gmail.com wrote:
Re: cluster under-utilization with Hadoop Fair Scheduler
Reading assignTasks() in 0.20.2 reveals that the number of map tasks assigned is not limited to 1 per heartbeat.

Cheers

On Sun, Apr 11, 2010 at 12:30 PM, Todd Lipcon t...@cloudera.com wrote:
Re: Announce: Karmasphere Studio for Hadoop 1.2.0
On Apr 10, 2010, at 7:10 PM, Shevek wrote:

* Full cross-platform support - Job submission, HDFS and S3 browsing from Windows, MacOS or Linux.

If you list three OSes, that isn't cross platform. :)
Re: Hadoop 0.20.3 hangs
On Apr 10, 2010, at 5:36 PM, rishi kapoor wrote:

With Hadoop 0.20.3 the command for accessing dfs just hangs (bin/hadoop dfs -ls, bin/hadoop dfs -put).

Hadoop 0.20.3 hasn't been released, so you'll need to be more explicit about what you are actually running.
How many nodes are there in the largest hadoop cluster worldwide?
Hi all,

I'm writing an article related to Hadoop and want to know how many nodes there are in the largest Hadoop cluster worldwide.

Regards
Re: Too many fetch-failures ERROR
Hi. For a single-node installation using localhost in the config files, Hadoop runs very well. However, if I change localhost to the hostname that is assigned to the local machine in the /etc/hosts file (in my case it is master), I receive the "Too many fetch-failures" error. I think there is a problem with transferring data to the mapred process. Am I right? Is there any way to solve it?

Regards.

On Mon, Apr 12, 2010 at 2:40 AM, Raghava Mutharaju m.vijayaragh...@gmail.com wrote:
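One generic way to check what the hostname used in mapred.job.tracker actually resolves to on each machine is a quick lookup like the sketch below. This is an illustrative helper, not something from the thread; a hostname that resolves to a loopback address such as 127.0.1.1 on one node, while the other node expects 192.168.1.2, is a commonly reported cause of fetch failures.

// ResolveCheck.java - prints what a hostname resolves to on this machine.
// Hypothetical helper for illustration; run it with the hostname used in
// mapred.job.tracker (e.g. "master") on both master and node1.
import java.net.InetAddress;

public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "master"; // hostname to look up
        InetAddress addr = InetAddress.getByName(host);
        // If this prints 127.0.1.1 or 127.0.0.1 while other machines expect the
        // LAN address (192.168.1.2 here), reducers on those machines cannot
        // fetch map output from this TaskTracker.
        System.out.println(host + " -> " + addr.getHostAddress());
    }
}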