Re: Too many fetch failures AND Shuffle error
Yeah. With 2 nodes the reducers go up to 16% because each reducer is able to fetch map outputs from the same machine (locally) but fails to copy them from the remote machine. A common cause in such cases is *restricted machine access* (a firewall, etc.). The web server on a machine/node hosts the map outputs, and the reducers on the other machine are not able to access it. There is a URL associated with each map output that the reducer tries to fetch (check the reducer logs for this URL). Just try accessing it manually from the reducer's machine/node; most likely this experiment will also fail. Let us know if that is not the case.

Amar

Sayali Kulkarni wrote:
> [quoted configuration and job output trimmed; they appear in full in the original message further down this thread]
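Amar's suggestion can be sanity-checked without a browser. A minimal sketch, assuming the remote tasktracker serves map outputs over HTTP on the default port 50060; the host name is a placeholder, and the exact fetch URL should always be copied verbatim from the reducer log:

```python
# Run on the reducer's node: probe whether the remote tasktracker's
# HTTP server is reachable at all. A refused or timed-out connection
# here points at a firewall or host-resolution problem, not at Hadoop.
import socket

def can_reach(host, port=50060, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # "slave-node" is a placeholder for the host in the failing fetch URL.
    print(can_reach("slave-node"))
```

If this probe fails from the reducer's node, the shuffle fetch will fail too, independent of anything in hadoop-site.xml.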
Re: Too many fetch failures AND Shuffle error
> Can you post the reducer logs. How many nodes are there in the cluster?

There are 6 nodes in the cluster - 1 master and 5 slaves. I tried reducing the number of nodes, and found that the problem goes away only when there is a single node in the cluster. So I can deduce that the problem lies in some configuration.

Configuration file:

  hadoop.tmp.dir = /extra/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-${user.name}
      A base for other temporary directories.

  fs.default.name = hdfs://10.105.41.25:54310
      The name of the default file system. A URI whose scheme and authority
      determine the FileSystem implementation. The URI's scheme determines
      the config property (fs.SCHEME.impl) naming the FileSystem
      implementation class. The URI's authority is used to determine the
      host, port, etc. for a filesystem.

  mapred.job.tracker = 10.105.41.25:54311
      The host and port that the MapReduce job tracker runs at. If "local",
      then jobs are run in-process as a single map and reduce task.

  dfs.replication = 2
      Default block replication. The actual number of replications can be
      specified when the file is created. The default is used if replication
      is not specified at create time.

  mapred.child.java.opts = -Xmx1048M

  mapred.local.dir = /extra/HADOOP/hadoop-0.16.3/tmp/mapred

  mapred.map.tasks = 53
      The default number of map tasks per job. Typically set to a prime
      several times greater than the number of available hosts. Ignored
      when mapred.job.tracker is "local".

  mapred.reduce.tasks = 7
      The default number of reduce tasks per job. Typically set to a prime
      close to the number of available hosts. Ignored when
      mapred.job.tracker is "local".
This is the output that I get when running the tasks with 2 nodes in the cluster:

  08/06/20 11:07:45 INFO mapred.FileInputFormat: Total input paths to process : 1
  08/06/20 11:07:45 INFO mapred.JobClient: Running job: job_200806201106_0001
  08/06/20 11:07:46 INFO mapred.JobClient: map 0% reduce 0%
  08/06/20 11:07:53 INFO mapred.JobClient: map 8% reduce 0%
  08/06/20 11:07:55 INFO mapred.JobClient: map 17% reduce 0%
  08/06/20 11:07:57 INFO mapred.JobClient: map 26% reduce 0%
  08/06/20 11:08:00 INFO mapred.JobClient: map 34% reduce 0%
  08/06/20 11:08:01 INFO mapred.JobClient: map 43% reduce 0%
  08/06/20 11:08:04 INFO mapred.JobClient: map 47% reduce 0%
  08/06/20 11:08:05 INFO mapred.JobClient: map 52% reduce 0%
  08/06/20 11:08:08 INFO mapred.JobClient: map 60% reduce 0%
  08/06/20 11:08:09 INFO mapred.JobClient: map 69% reduce 0%
  08/06/20 11:08:10 INFO mapred.JobClient: map 73% reduce 0%
  08/06/20 11:08:12 INFO mapred.JobClient: map 78% reduce 0%
  08/06/20 11:08:13 INFO mapred.JobClient: map 82% reduce 0%
  08/06/20 11:08:15 INFO mapred.JobClient: map 91% reduce 1%
  08/06/20 11:08:16 INFO mapred.JobClient: map 95% reduce 1%
  08/06/20 11:08:18 INFO mapred.JobClient: map 99% reduce 3%
  08/06/20 11:08:23 INFO mapred.JobClient: map 100% reduce 3%
  08/06/20 11:08:25 INFO mapred.JobClient: map 100% reduce 7%
  08/06/20 11:08:28 INFO mapred.JobClient: map 100% reduce 10%
  08/06/20 11:08:30 INFO mapred.JobClient: map 100% reduce 11%
  08/06/20 11:08:33 INFO mapred.JobClient: map 100% reduce 12%
  08/06/20 11:08:35 INFO mapred.JobClient: map 100% reduce 14%
  08/06/20 11:08:38 INFO mapred.JobClient: map 100% reduce 15%
  08/06/20 11:09:54 INFO mapred.JobClient: map 100% reduce 13%
  08/06/20 11:09:54 INFO mapred.JobClient: Task Id : task_200806201106_0001_r_02_0, Status : FAILED
  Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
  08/06/20 11:09:56 INFO mapred.JobClient: map 100% reduce 9%
  08/06/20 11:09:56 INFO mapred.JobClient: Task Id : task_200806201106_0001_r_03_0, Status : FAILED
  Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
  08/06/20 11:09:56 INFO mapred.JobClient: Task Id : task_200806201106_0001_m_11_0, Status : FAILED
  Too many fetch-failures
  08/06/20 11:09:57 INFO mapred.JobClient: map 95% reduce 9%
  08/06/20 11:09:59 INFO mapred.JobClient: map 100% reduce 9%
  08/06/20 11:10:04 INFO mapred.JobClient: map 100% reduce 10%
  08/06/20 11:10:07 INFO mapred.JobClient: map 100% reduce 11%
  08/06/20 11:10:09 INFO mapred.JobClient: map 100% reduce 13%
  08/06/20 11:10:12 INFO mapred.JobClient: map 100% reduce 14%
  08/06/20 11:10:14 INFO mapred.JobClient: map 100% reduce 15%
  08/06/20 11:10:17 INFO mapred.JobClient: map 100% reduce 16%
  08/06/20 11:10:24 INFO mapred.JobClient: map 100% reduce 13%
  08/06/20 11:10:24 INFO mapred.JobClient: Task Id : task_200806201106_0001_r_00_0, Status : FAILED
  Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
  08/06/20 11:10:29 INFO mapred.JobClient: map 100% reduce 11%
  08/06/20 11:10:29 INFO mapred.JobClient: Task Id : task_200806201106_0001_r_01_0, Status : FAILED
  Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
  08/06/20 11:10:29 INFO mapred.JobClient: Task Id : task_200806201106_0001_m_
Re: Too many fetch failures AND Shuffle error
Sayali Kulkarni wrote:
> Hello, I have been getting "Too many fetch failures" (in the map
> operation) and a shuffle error (in the reduce operation)

Can you post the reducer logs. How many nodes are there in the cluster? Are you seeing this for all the maps and reducers? Are the reducers progressing at all? Are all the maps that the reducer fails to fetch from on a remote machine? Are all the failed maps/reducers from the same machine? Can you provide some more details.

Amar

> and am unable to complete any job on the cluster. I have 5 slaves in
> the cluster. So I have the following values in the hadoop-site.xml file:
>
>   mapred.map.tasks = 53     // 53 = nearest prime to 5*10
>   mapred.reduce.tasks = 7   // 7 = nearest prime to 5
>
> Please let me know what the suggested fix for this would be. The Hadoop
> version I am using is hadoop-0.16.3, and it is installed on Ubuntu.
>
> Thanks!
> --Sayali
Re: java.io.IOException: All datanodes are bad. Aborting...
Hi Mori Bellamy,

I did this twice, and the same problem still persists. I don't know how to solve this issue. If anyone knows the answer, please let me know.

Thanks

Mori Bellamy wrote:
> [quoted reply and original message trimmed; both appear in full elsewhere in this thread]

--
View this message in context: http://www.nabble.com/java.io.IOException%3A-All-datanodes-are-bad.-Aborting...-tp18006296p18022330.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
unit testing of Jython mappers/reducers
Does anyone have an example of a unit test setup for Jython jobs? I'm unable to run my methods outside of the context of Hadoop. This may be a general Jython issue. Here is my attempt. As mentioned in the comment, I am able to resolve "self.mapper.map", but I get an AttributeError when I attempt to call it. Is this a Java polymorphism issue - maybe I'm not passing the right types, and the base class doesn't have a method definition with the right types? Or do the JobConf methods that state input/output types, which a normal Hadoop run calls, have something to do with it?

  # import style may matter
  from org.apache import hadoop
  from org.apache.hadoop.examples.kcluster import KMeansMapper
  import unittest

  class TestFoo(unittest.TestCase):
      def setUp(self):
          self.mapper = KMeansMapper()

      def testbar(self):
          # can do this:
          #   print self.mapper.map   => resolves the method
          # but this raises AttributeError: abstract method "map" not implemented
          self.mapper.map(hadoop.io.LongWritable(0),
                          hadoop.io.Text("10 1 0"),
                          hadoop.mapred.OutputCollector(),
                          hadoop.mapred.Reporter.NULL)

  if __name__ == "__main__":
      unittest.main()
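One way to sidestep the interop problem entirely, a minimal sketch under the assumption that the clustering logic can be factored out of the Hadoop mapper class: test the map logic as a plain function against an in-memory fake collector, so no Hadoop types are needed. `map_point` and the "x y cluster" line format below are hypothetical stand-ins for whatever KMeansMapper actually does.

```python
class FakeCollector:
    """Collects (key, value) pairs in memory instead of writing to Hadoop."""
    def __init__(self):
        self.pairs = []

    def collect(self, key, value):
        self.pairs.append((key, value))

def map_point(line, collector):
    """Hypothetical stand-in for the mapper body: emit (cluster_id, point)
    for an input line formatted as 'x y cluster'."""
    x, y, cluster = line.split()
    collector.collect(cluster, (float(x), float(y)))

collector = FakeCollector()
map_point("10 1 0", collector)
print(collector.pairs)  # [('0', (10.0, 1.0))]
```

The real mapper can then be a thin wrapper that delegates to the plain function, keeping the Hadoop-facing class untested but trivial.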
Re: hadoop file system error
Might it be a synchronization problem? I don't know if Hadoop's DFS magically takes care of that, but if it doesn't, then you might have a problem because of multiple processes trying to write to the same file. Perhaps as a control experiment you could run your process on some small input, making sure that each reduce task outputs to a different filename (I just use Math.random()*Integer.MAX_VALUE and cross my fingers).

On Jun 18, 2008, at 6:01 PM, Guangfeng Jin (晋光峰) wrote:

> I'm sure I close all the files in the reduce step. Are there any other
> reasons that could cause this problem?
>
> 2008/6/18 Konstantin Shvachko <[EMAIL PROTECTED]>:
>
>> Did you close those files? If not, they may be empty.
>>
>> Guangfeng Jin wrote:
>>
>>> Dears,
>>> I use hadoop-0.16.4 to do some work and found an error whose cause I
>>> can't determine. The scenario is like this: in the reduce step,
>>> instead of using OutputCollector to write results, I use
>>> FSDataOutputStream to write results to files on HDFS (because I want
>>> to split the results by some rules). After the job finished, I found
>>> that *some* files (but not all) are empty on HDFS. But I'm sure that
>>> in the reduce step the files were not empty, since I added some logs
>>> to read the generated files. It seems that some files' contents are
>>> lost after the reduce step. Has anyone happened to face such errors,
>>> or is it a Hadoop bug? Please help me find the reason if you know.
>>>
>>> Thanks & Regards
>>> Guangfeng
>
> --
> Guangfeng Jin
> Software Engineer
> iZENEsoft (Shanghai) Co., Ltd
> Room 601 Marine Tower, No. 1 Pudong Ave.
> Tel: 86-21-68860698  Fax: 86-21-68860699
> Mobile: 86-13621906422
> Company Website: www.izenesoft.com
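The "different filename per reduce task" advice can also be made deterministic instead of random. A hedged sketch (`side_file_name` is a hypothetical helper, not a Hadoop API; a real job would derive the id from the task attempt rather than take it as an argument):

```python
def side_file_name(base, task_id):
    """Build a per-task output path so concurrent reducers never collide
    on the same HDFS file."""
    return "%s/part-%05d" % (base, task_id)

# Each reducer writes only to its own file:
print(side_file_name("/user/guangfeng/out", 3))  # /user/guangfeng/out/part-00003
```

With one writer per file there is nothing for DFS to "synchronize", which removes the suspected cause outright.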
Re: java.io.IOException: All datanodes are bad. Aborting...
That's bizarre. I'm not sure why your DFS would have magically gotten full. Whenever Hadoop gives me trouble, I try the following sequence of commands:

  stop-all.sh
  rm -Rf /path/to/my/hadoop/dfs/data
  hadoop namenode -format
  start-all.sh

Maybe you would get some luck if you ran that on all of the machines? (Of course, don't run it if you don't want to lose all of that "data".)

On Jun 19, 2008, at 4:32 AM, novice user wrote:

> [original message trimmed; it appears in full elsewhere in this thread]
Re: Release Date of Hadoop 0.17.1
I strongly suggest you download the candidate release and make sure it solves your problem. Then provide a +1 or -1 reply to the vote thread:
http://www.nabble.com/-VOTE--Release-Hadoop-0.17.1-%28candidate-0%29-tt17995523.html

Cheers,
Nige

On Jun 19, 2008, at 4:18 AM, Joman Chu wrote:

> [original message trimmed; it appears in full elsewhere in this thread]
Too many fetch failures AND Shuffle error
Hello,

I have been getting "Too many fetch failures" (in the map operation) and a shuffle error (in the reduce operation), and am unable to complete any job on the cluster. I have 5 slaves in the cluster, so I have the following values in the hadoop-site.xml file:

  mapred.map.tasks = 53     // 53 = nearest prime to 5*10
  mapred.reduce.tasks = 7   // 7 = nearest prime to 5

Please let me know what the suggested fix for this would be. The Hadoop version I am using is hadoop-0.16.3, and it is installed on Ubuntu.

Thanks!
--Sayali
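The "nearest prime" rules of thumb above (map tasks: a prime several times the number of hosts; reduce tasks: a prime close to the number of hosts) are easy to compute. A sketch; the tie-breaking toward the larger prime is my own choice, not anything Hadoop prescribes:

```python
def is_prime(n):
    """Trial-division primality test; fine for small task counts."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def nearest_prime(n):
    """Prime with the smallest distance to n (ties resolved upward)."""
    offset = 0
    while True:
        if is_prime(n + offset):
            return n + offset
        if n - offset >= 2 and is_prime(n - offset):
            return n - offset
        offset += 1

print(nearest_prime(5 * 10))  # 53, the mapred.map.tasks value chosen above
```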
Re: from raja
For research and development purposes, a single node with all daemons running will suffice for testing your MapReduce code. While it may be valid to test separate instances and the communication between them, your returns will quickly diminish.

-Daniel

On Thu, Jun 19, 2008 at 6:38 AM, ra ja <[EMAIL PROTECTED]> wrote:

> hi sir/madam,
>
> how to integrate virtualization (xen) with hadoop tools?
> give me an idea?
> will it be done using c++?
> please give me a response.
>
> with regards
> raja.p
Re: dfs put fails
I ran into some similar issues with firewalls and ended up turning them off completely. That took care of some of the problems, and also let me figure out that if DNS / hosts files aren't configured correctly, weird things will happen in the communication between daemons. I have a small cluster and configured a hosts file that I copied everywhere, including the workstation I use for HDFS browsing. This made things run much smoother. Hope that helps.

-Daniel

On Wed, Jun 18, 2008 at 12:53 PM, Alexander Arimond <[EMAIL PROTECTED]> wrote:

> Got a similar error when doing a mapreduce job on the master machine.
> The map phase is ok, and in the end the right results are in my output
> folder, but the reduce hangs at 17% for a very long time. Found this in
> one of the task logs a few times:
>
> ...
> 2008-06-18 17:31:02,297 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-06-18 17:31:02,297 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Got 0 known map output location(s);
> scheduling...
> 2008-06-18 17:31:02,297 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-06-18 17:31:03,276 WARN org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 copy failed:
> task_200806181716_0001_m_01_0 from koeln
> 2008-06-18 17:31:03,276 WARN org.apache.hadoop.mapred.ReduceTask:
> java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>     at java.net.Socket.connect(Socket.java:519)
>     at sun.net.NetworkClient.doConnect(NetworkClient.java:152)
>     at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
>     at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
>     at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
>     at sun.net.www.http.HttpClient.New(HttpClient.java:306)
>     at sun.net.www.http.HttpClient.New(HttpClient.java:323)
>     at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
>     at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
>     at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
>     at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:139)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:815)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:764)
>
> 2008-06-18 17:31:03,276 INFO org.apache.hadoop.mapred.ReduceTask: Task
> task_200806181716_0001_r_00_0: Failed fetch #7 from
> task_200806181716_0001_m_01_0
> 2008-06-18 17:31:03,276 INFO org.apache.hadoop.mapred.ReduceTask: Failed
> to fetch map-output from task_200806181716_0001_m_01_0 even after
> MAX_FETCH_RETRIES_PER_MAP retries... reporting to the JobTracker
> 2008-06-18 17:31:03,276 WARN org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 adding host koeln to penalty box, next
> contact in 150 seconds
> 2008-06-18 17:31:03,277 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Need 1 map output(s)
> 2008-06-18 17:31:03,317 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 1 map-outputs from previous failures
> 2008-06-18 17:31:03,317 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Got 1 known map output location(s);
> scheduling...
> 2008-06-18 17:31:03,317 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Scheduled 0 of 1 known outputs (1 slow
> hosts and 0 dup hosts)
> 2008-06-18 17:31:08,336 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Need 1 map output(s)
> 2008-06-18 17:31:08,337 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-06-18 17:31:08,337 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Got 1 known map output location(s);
> scheduling...
> 2008-06-18 17:31:08,337 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Scheduled 0 of 1 known outputs (1 slow
> hosts and 0 dup hosts)
> 2008-06-18 17:31:13,356 INFO org.apache.hadoop.ma
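Daniel's DNS / hosts-file point can be sanity-checked quickly. A hedged sketch: it only verifies that each cluster host name resolves and that the reverse lookup maps back to the same name; the host list is a placeholder for whatever is in conf/slaves.

```python
import socket

def dns_consistent(hostname):
    """True if hostname resolves and its address reverse-maps back to
    the same name (fully qualified or short)."""
    try:
        addr = socket.gethostbyname(hostname)
        reverse, _aliases, _addrs = socket.gethostbyaddr(addr)
    except OSError:
        return False
    return hostname in (reverse, reverse.split(".")[0])

if __name__ == "__main__":
    # Placeholder host names; substitute the entries from conf/slaves.
    for host in ["master", "koeln"]:
        print(host, dns_consistent(host))
```

A name that fails this check on any node is a likely source of the "Connection refused" fetches in the log above.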
java.io.IOException: All datanodes are bad. Aborting...
Hi everyone,

I am running a simple map-red application similar to k-means. When I ran it on a single machine, it went fine without any issues, but when I ran the same on a Hadoop cluster of 9 machines, it fails saying:

  java.io.IOException: All datanodes are bad. Aborting...

Here is more explanation of the problem: I tried to upgrade my Hadoop cluster to hadoop-17. During this process I made the mistake of not installing Hadoop on all machines, so the upgrade failed, nor was I able to roll back. So I re-formatted the name node afresh, and then the Hadoop installation was successful.

Later, when I ran my map-reduce job, it ran successfully, but the same job with zero reduce tasks is failing with the error:

  java.io.IOException: All datanodes are bad. Aborting...

When I looked into the data nodes, I figured out that the file system is 100% full, with many directories named "subdir" in the hadoop-username/dfs/data/current directory. I am wondering where I went wrong. Can someone please help me with this? The same job went fine on a single machine with the same amount of input data.

Thanks

--
View this message in context: http://www.nabble.com/java.io.IOException%3A-All-datanodes-are-bad.-Aborting...-tp18006296p18006296.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
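When a datanode disk silently fills up like this, checking usage on every node before re-running the job helps narrow things down. A small sketch using only the standard library; the path is a placeholder for the directory configured as dfs.data.dir:

```python
import shutil

def percent_used(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

if __name__ == "__main__":
    # Placeholder path; point this at dfs.data.dir on each datanode.
    print(round(percent_used("/"), 1))
```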
RE: Release Date of Hadoop 0.17.1
It should be out within a couple of days. As of now, voting is on and will end on the 23rd.

> -----Original Message-----
> From: Joman Chu [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, June 19, 2008 4:48 PM
> To: core-user@hadoop.apache.org
> Subject: Release Date of Hadoop 0.17.1
>
> Hello, I was wondering when Hadoop 0.17.1 was going to be released.
> I'm being affected by the QuickSort unbounded recursion bug (I think
> Hadoop-3442), and I want to know if I should apply the patch myself
> and push it out to my cluster or wait for Hadoop 0.17.1 to be
> released. I'd rather not duplicate the amount of work I need to do in
> order to fix the cluster or kill people's jobs unnecessarily.
>
> Thanks,
> Joman Chu
Re: Too many Task Manager children...
C G wrote:
> Hi All: I have mapred.tasktracker.tasks.maximum set to 4 in our
> conf/hadoop-site.xml, yet I frequently see 5-6 instances of
> org.apache.hadoop.mapred.TaskTracker$Child running on the slave nodes.
> Is there another setting I need to tweak in order to dial back the
> number of children running? The effect of running this many children
> is that our boxes have extremely high load factors, and eventually
> mapred tasks start timing out and failing.

If mapred.tasktracker.tasks.maximum is set to 4, the tasktracker has 4 map slots and 4 reduce slots, summing to 8 slots. So seeing 5-6 instances of org.apache.hadoop.mapred.TaskTracker$Child is expected. If you want only 4 instances of it, mapred.tasktracker.tasks.maximum should be 2, making 2 map slots and 2 reduce slots. And as far as I know, there is no other config variable for tweaking the number of children.

> Note that the number of instances is for a single job. I see far more
> if I run multiple jobs simultaneously (something we do not typically
> do). This is on Hadoop 0.15.0; upgrading is not an option at the
> moment. Any help appreciated...
>
> Thanks,
> C G

Thanks
Amareshwari
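Amareshwari's slot arithmetic, as a tiny sketch (this assumes, per the reply above, that the 0.15-era mapred.tasktracker.tasks.maximum applies separately to map and reduce slots):

```python
def max_children(tasks_maximum):
    """Upper bound on concurrent TaskTracker child JVMs for one job:
    map slots plus reduce slots."""
    map_slots = tasks_maximum
    reduce_slots = tasks_maximum
    return map_slots + reduce_slots

print(max_children(4))  # 8, which is why seeing 5-6 children is expected
print(max_children(2))  # 4, the ceiling C G actually wants
```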
Release Date of Hadoop 0.17.1
Hello, I was wondering when Hadoop 0.17.1 was going to be released. I'm being affected by the QuickSort unbounded recursion bug (I think Hadoop-3442), and I want to know if I should apply the patch myself and push it out to my cluster or wait for Hadoop 0.17.1 to be released. I'd rather not duplicate the amount of work I need to do in order to fix the cluster or kill people's jobs unnecessarily. Thanks, Joman Chu
Too many Task Manager children...
Hi All: I have mapred.tasktracker.tasks.maximum set to 4 in our conf/hadoop-site.xml, yet I frequently see 5-6 instances of org.apache.hadoop.mapred.TaskTracker$Child running on the slave nodes. Is there another setting I need to tweak in order to dial back the number of children running? The effect of running this many children is that our boxes have extremely high load factors, and eventually mapred tasks start timing out and failing. Note that the number of instances is for a single job; I see far more if I run multiple jobs simultaneously (something we do not typically do). This is on Hadoop 0.15.0; upgrading is not an option at the moment. Any help appreciated...

Thanks,
C G
from raja
Hi sir/madam,

How can I integrate virtualization (Xen) with Hadoop tools? Could you give me an idea? Can it be done using C++? Please give me a response.

With regards,
raja.p