Re: Too many fetch failures AND Shuffle error

2008-06-19 Thread Amar Kamat
Yeah. With 2 nodes the reducers will go up to 16% because the reducers 
are able to fetch map outputs from the same machine (locally) but fail to 
copy them from the remote machine. A common reason in such cases is 
*restricted machine access* (a firewall etc.). The web server on a 
machine/node hosts the map outputs, which the reducers on the other machine 
are not able to access. There will be a URL associated with each map that 
the reducer tries to fetch (check the reducer logs for this URL). Just try 
accessing it manually from the reducer's machine/node. Most likely this 
experiment will also fail. Let us know if this is not the case.
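
For example, you can copy the map-output URL out of the reducer log and try 
fetching it by hand from the reducer's node. A rough sketch (the host, port 
and query string below are made-up placeholders; use the exact URL from 
your log):

import urllib2

# placeholder URL -- substitute the exact map-output URL from the reducer log
url = "http://slave1:50060/mapOutput?map=task_200806201106_0001_m_000011_0&reduce=2"

try:
    data = urllib2.urlopen(url).read()
    print "fetched %d bytes OK" % len(data)
except Exception, e:
    # connection refused / timeout here points to a firewall or DNS problem
    print "fetch failed:", e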

Amar
Sayali Kulkarni wrote:
> > Can you post the reducer logs. How many nodes are there in the cluster?
>
> There are 6 nodes in the cluster - 1 master and 5 slaves. I tried to reduce
> the number of nodes, and found that the problem is solved only if there is
> a single node in the cluster. So I can deduce that the problem is in some
> configuration.
>
> [snip - configuration file and job output; quoted in full in the next
> message]

Re: Too many fetch failures AND Shuffle error

2008-06-19 Thread Sayali Kulkarni

> Can you post the reducer logs. How many nodes are there in the cluster? 
There are 6 nodes in the cluster - 1 master and 5 slaves.
I tried to reduce the number of nodes, and found that the problem is solved 
only if there is a single node in the cluster. So I can deduce that the 
problem is in some configuration.

Configuration file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/extra/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://10.105.41.25:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>10.105.41.25:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1048M</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/extra/HADOOP/hadoop-0.16.3/tmp/mapred</value>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>53</value>
  <description>The default number of map tasks per job.  Typically set
  to a prime several times greater than number of available hosts.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>7</value>
  <description>The default number of reduce tasks per job.  Typically set
  to a prime close to the number of available hosts.  Ignored when
  mapred.job.tracker is "local".
  </description>
</property>

</configuration>

This is the output that I get when running the tasks with 2 nodes in the 
cluster:

08/06/20 11:07:45 INFO mapred.FileInputFormat: Total input paths to process : 1
08/06/20 11:07:45 INFO mapred.JobClient: Running job: job_200806201106_0001
08/06/20 11:07:46 INFO mapred.JobClient:  map 0% reduce 0%
08/06/20 11:07:53 INFO mapred.JobClient:  map 8% reduce 0%
08/06/20 11:07:55 INFO mapred.JobClient:  map 17% reduce 0%
08/06/20 11:07:57 INFO mapred.JobClient:  map 26% reduce 0%
08/06/20 11:08:00 INFO mapred.JobClient:  map 34% reduce 0%
08/06/20 11:08:01 INFO mapred.JobClient:  map 43% reduce 0%
08/06/20 11:08:04 INFO mapred.JobClient:  map 47% reduce 0%
08/06/20 11:08:05 INFO mapred.JobClient:  map 52% reduce 0%
08/06/20 11:08:08 INFO mapred.JobClient:  map 60% reduce 0%
08/06/20 11:08:09 INFO mapred.JobClient:  map 69% reduce 0%
08/06/20 11:08:10 INFO mapred.JobClient:  map 73% reduce 0%
08/06/20 11:08:12 INFO mapred.JobClient:  map 78% reduce 0%
08/06/20 11:08:13 INFO mapred.JobClient:  map 82% reduce 0%
08/06/20 11:08:15 INFO mapred.JobClient:  map 91% reduce 1%
08/06/20 11:08:16 INFO mapred.JobClient:  map 95% reduce 1%
08/06/20 11:08:18 INFO mapred.JobClient:  map 99% reduce 3%
08/06/20 11:08:23 INFO mapred.JobClient:  map 100% reduce 3%
08/06/20 11:08:25 INFO mapred.JobClient:  map 100% reduce 7%
08/06/20 11:08:28 INFO mapred.JobClient:  map 100% reduce 10%
08/06/20 11:08:30 INFO mapred.JobClient:  map 100% reduce 11%
08/06/20 11:08:33 INFO mapred.JobClient:  map 100% reduce 12%
08/06/20 11:08:35 INFO mapred.JobClient:  map 100% reduce 14%
08/06/20 11:08:38 INFO mapred.JobClient:  map 100% reduce 15%
08/06/20 11:09:54 INFO mapred.JobClient:  map 100% reduce 13%
08/06/20 11:09:54 INFO mapred.JobClient: Task Id : 
task_200806201106_0001_r_02_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
08/06/20 11:09:56 INFO mapred.JobClient:  map 100% reduce 9%
08/06/20 11:09:56 INFO mapred.JobClient: Task Id : 
task_200806201106_0001_r_03_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
08/06/20 11:09:56 INFO mapred.JobClient: Task Id : 
task_200806201106_0001_m_11_0, Status : FAILED
Too many fetch-failures
08/06/20 11:09:57 INFO mapred.JobClient:  map 95% reduce 9%
08/06/20 11:09:59 INFO mapred.JobClient:  map 100% reduce 9%
08/06/20 11:10:04 INFO mapred.JobClient:  map 100% reduce 10%
08/06/20 11:10:07 INFO mapred.JobClient:  map 100% reduce 11%
08/06/20 11:10:09 INFO mapred.JobClient:  map 100% reduce 13%
08/06/20 11:10:12 INFO mapred.JobClient:  map 100% reduce 14%
08/06/20 11:10:14 INFO mapred.JobClient:  map 100% reduce 15%
08/06/20 11:10:17 INFO mapred.JobClient:  map 100% reduce 16%
08/06/20 11:10:24 INFO mapred.JobClient:  map 100% reduce 13%
08/06/20 11:10:24 INFO mapred.JobClient: Task Id : 
task_200806201106_0001_r_00_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
08/06/20 11:10:29 INFO mapred.JobClient:  map 100% reduce 11%
08/06/20 11:10:29 INFO mapred.JobClient: Task Id : 
task_200806201106_0001_r_01_0, Status : FAILED
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
08/06/20 11:10:29 INFO mapred.JobClient: Task Id : 
task_200806201106_0001_m_

Re: Too many fetch failures AND Shuffle error

2008-06-19 Thread Amar Kamat

Sayali Kulkarni wrote:
> Hello,
> I have been getting
> Too many fetch failures (in the map operation)
> and
> shuffle error (in the reduce operation)

Can you post the reducer logs? How many nodes are there in the cluster? 
Are you seeing this for all the maps and reducers? Are the reducers 
progressing at all? Are all the maps that the reducer is failing to fetch 
from on a remote machine? Are all the failed maps/reducers from the same 
machine? Can you provide some more details?

Amar

> and am unable to complete any job on the cluster.
>
> I have 5 slaves in the cluster. So I have the following values in the
> hadoop-site.xml file:
>   mapred.map.tasks
>   53
>   // 53 = nearest prime to 5*10
>
>   mapred.reduce.tasks
>   7
>   // 7 = nearest prime to 5
>
> Please let me know what would be the suggested fix for this.
>
> Hadoop version I am using is hadoop-0.16.3 and it is installed on Ubuntu.
>
> Thanks!
> --Sayali


   
  




Re: java.io.IOException: All datanodes are bad. Aborting...

2008-06-19 Thread novice user

Hi Mori Bellamy,
 I did this twice, and still the same problem persists. I don't know
how to solve this issue. If anyone knows the answer, please let me know.

Thanks

Mori Bellamy wrote:
> 
> That's bizarre. I'm not sure why your DFS would have magically gotten  
> full. Whenever hadoop gives me trouble, i try the following sequence  
> of commands
> 
> stop-all.sh
> rm -Rf /path/to/my/hadoop/dfs/data
> hadoop namenode -format
> start-all.sh
> 
> maybe you would get some luck if you ran that on all of the machines?  
> (of course, don't run it if you don't want to lose all of that "data")
> On Jun 19, 2008, at 4:32 AM, novice user wrote:
> 
>> [snip]




unit testing of Jython mappers/reducers

2008-06-19 Thread Karl Anderson
Does anyone have an example of a unit test setup for Jython jobs?  I'm  
unable to run my methods outside of the context of Hadoop.  This may  
be a general Jython issue.


Here is my attempt.  As mentioned in the comment, I am able to resolve  
"self.mapper.map", but I get an AttributeError when I attempt to call  
it.  Is this a Java polymorphism issue - maybe I'm not passing the  
right types, and the baseclass doesn't have a method definition with  
the right types?  Or do the JobConf methods to state input/output  
types that a normal Hadoop run calls have something to do with it?



# import style may matter
from org.apache import hadoop
from org.apache.hadoop.examples.kcluster import KMeansMapper

import unittest


class TestFoo(unittest.TestCase):
    def setUp(self):
        self.mapper = KMeansMapper()

    def testbar(self):
        # can do this:
        #   print self.mapper.map  => resolves the method
        # this raises AttributeError: abstract method "map" not implemented
        self.mapper.map(hadoop.io.LongWritable(0),
                        hadoop.io.Text("10 1 0"),
                        hadoop.mapred.OutputCollector(),
                        hadoop.mapred.Reporter.NULL)

if __name__ == "__main__":
    unittest.main()
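
One thing that might be worth trying in Jython is handing the mapper a 
concrete implementation of the interface rather than instantiating 
OutputCollector directly (it is an interface, so OutputCollector() may have 
nothing behind it). A sketch, where ListCollector is a made-up helper and 
KMeansMapper is assumed to implement the standard Mapper interface:

from org.apache import hadoop
from org.apache.hadoop.examples.kcluster import KMeansMapper

# made-up helper: a Jython class implementing the Java OutputCollector
# interface by accumulating the (key, value) pairs the mapper emits
class ListCollector(hadoop.mapred.OutputCollector):
    def __init__(self):
        self.pairs = []

    def collect(self, key, value):
        self.pairs.append((key, value))

collector = ListCollector()
mapper = KMeansMapper()
mapper.map(hadoop.io.LongWritable(0),
           hadoop.io.Text("10 1 0"),
           collector,
           hadoop.mapred.Reporter.NULL)
print collector.pairs   # inspect what the mapper emitted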





Re: hadoop file system error

2008-06-19 Thread Mori Bellamy
Might it be a synchronization problem? I don't know if Hadoop's DFS 
magically takes care of that, but if it doesn't then you might have a 
problem with multiple processes trying to write to the same file.


Perhaps as a control experiment you could run your process on some 
small input, making sure that each reduce task outputs to a different 
filename (I just use Math.random()*Integer.MAX_VALUE and cross my 
fingers).
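
For what it's worth, a rough Jython sketch of the "one file per reduce 
task, closed no matter what" idea (all paths and names here are made up):

import random
from org.apache.hadoop.fs import Path, FileSystem
from org.apache.hadoop.conf import Configuration

fs = FileSystem.get(Configuration())

# hypothetical per-reducer output path; the random suffix keeps concurrent
# reduce tasks from colliding on the same file
out_path = Path("/tmp/my-output/part-%d" % random.randint(0, 2**31 - 1))

out = fs.create(out_path)   # returns an FSDataOutputStream
try:
    out.writeBytes("some result line\n")
finally:
    out.close()   # an unclosed stream is a classic cause of empty files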

On Jun 18, 2008, at 6:01 PM, 晋光峰 wrote:

I'm sure I close all the files in the reduce step. Are there any other 
reasons that could cause this problem?

2008/6/18 Konstantin Shvachko <[EMAIL PROTECTED]>:


Did you close those files?
If not they may be empty.



晋光峰 wrote:

> Dears,
>
> I use hadoop-0.16.4 to do some work and found an error for which I can't
> figure out the reason.
>
> The scenario is like this: in the reduce step, instead of using
> OutputCollector to write the result, I use FSDataOutputStream to write the
> result to files on HDFS (because I want to split the result by some rules).
> After the job finished, I found that *some* files (but not all) are empty
> on HDFS. But I'm sure the files are not empty in the reduce step, since I
> added some logs to read the generated files. It seems that some files'
> contents are lost after the reduce step. Has anyone happened to face such
> errors, or is it a hadoop bug?
>
> Please help me find the reason if any of you know it.
>
> Thanks & Regards
> Guangfeng





--
Guangfeng Jin

Software Engineer

iZENEsoft (Shanghai) Co., Ltd
Room 601 Marine Tower, No. 1 Pudong Ave.
Tel:86-21-68860698
Fax:86-21-68860699
Mobile: 86-13621906422
Company Website: www.izenesoft.com




Re: java.io.IOException: All datanodes are bad. Aborting...

2008-06-19 Thread Mori Bellamy
That's bizarre. I'm not sure why your DFS would have magically gotten 
full. Whenever hadoop gives me trouble, I try the following sequence 
of commands:


stop-all.sh                          # stop all Hadoop daemons
rm -Rf /path/to/my/hadoop/dfs/data   # wipe datanode storage (destroys HDFS data!)
hadoop namenode -format              # re-initialize the namenode
start-all.sh                         # bring the daemons back up

maybe you would get some luck if you ran that on all of the machines?  
(of course, don't run it if you don't want to lose all of that "data")

On Jun 19, 2008, at 4:32 AM, novice user wrote:



> [snip - message quoted in full below]





Re: Release Date of Hadoop 0.17.1

2008-06-19 Thread Nigel Daley
I strongly suggest you download the candidate release and make sure 
it solves your problem.  Then provide a +1 or -1 reply to the vote 
thread:
http://www.nabble.com/-VOTE--Release-Hadoop-0.17.1-%28candidate-0%29-tt17995523.html


Cheers,
Nige

On Jun 19, 2008, at 4:18 AM, Joman Chu wrote:

Hello, I was wondering when Hadoop 0.17.1 was going to be released.
I'm being affected by the QuickSort unbounded recursion bug (I think
Hadoop-3442), and I want to know if I should apply the patch myself
and push it out to my cluster or wait for Hadoop 0.17.1 to be
released. I'd rather not duplicate the amount of work I need to do in
order to fix the cluster or kill people's jobs unnecessarily.

Thanks,
Joman Chu




Too many fetch failures AND Shuffle error

2008-06-19 Thread Sayali Kulkarni
Hello,
I have been getting 
Too many fetch failures (in the map operation)
and 
shuffle error (in the reduce operation)

and am unable to complete any job on the cluster.

I have 5 slaves in the cluster. So I have the following values in the 
hadoop-site.xml file:
  mapred.map.tasks
  53
// 53 = nearest prime to 5*10

  mapred.reduce.tasks
  7
// 7 = nearest prime to 5

Please let me know what would be the suggested fix for this.

Hadoop version I am using is hadoop-0.16.3 and it is installed on  Ubuntu.

Thanks!
--Sayali


   

Re: from raja

2008-06-19 Thread Daniel Blaisdell
For research and development purposes, a single node with all daemons
running will suffice for testing your map reduce code. While it may be valid
to attempt to test different instances and the communications between them,
your returns will quickly diminish.
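
As a side note, for quick in-process testing you can go one step further 
than a single-node cluster: with mapred.job.tracker set to "local", jobs 
run in-process as a single map and reduce task (as the config description 
quoted earlier in this digest says):

<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>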

-Daniel

On Thu, Jun 19, 2008 at 6:38 AM, ra ja <[EMAIL PROTECTED]> wrote:

> hi sir/madam,
>
> how to integrate virtualization (xen) with hadoop tools?
>
> give me an idea?
>
> will it be done using c++?
>
> please give me a response.
>
> with regards
> raja.p


Re: dfs put fails

2008-06-19 Thread Daniel Blaisdell
I ran into some similar issues with firewalls and ended up completely
turning them off. That took care of some of the problems, but also let me
figure out that if DNS / hosts files aren't configured correctly, weird
things will happen during the communication between daemons. I have a small
cluster and configured a hosts file that I copied everywhere, including my
workstation for HDFS browsing. This made things run much more smoothly; hope
that helps.
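
For reference, the hosts file for a small cluster can be as simple as the 
following (hostnames and addresses here are made-up placeholders; the same 
file goes on every node and on the workstation):

127.0.0.1    localhost
10.0.0.1     master    # NameNode + JobTracker
10.0.0.2     slave1    # DataNode + TaskTracker
10.0.0.3     slave2    # DataNode + TaskTracker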

-Daniel

On Wed, Jun 18, 2008 at 12:53 PM, Alexander Arimond <
[EMAIL PROTECTED]> wrote:

>
> Got a similar error when doing a mapreduce job on the master machine.
> The mapping job is ok and in the end there are the right results in my
> output folder, but the reduce hangs at 17% for a very long time. Found this
> in one of the task logs a few times:
>
> ...
> 2008-06-18 17:31:02,297 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-06-18 17:31:02,297 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Got 0 known map output location(s);
> scheduling...
> 2008-06-18 17:31:02,297 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Scheduled 0 of 0 known outputs (0 slow
> hosts and 0 dup hosts)
> 2008-06-18 17:31:03,276 WARN org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 copy failed:
> task_200806181716_0001_m_01_0 from koeln
> 2008-06-18 17:31:03,276 WARN org.apache.hadoop.mapred.ReduceTask:
> java.net.ConnectException: Connection refused
>at java.net.PlainSocketImpl.socketConnect(Native Method)
>at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>at
> java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>at java.net.Socket.connect(Socket.java:519)
>at sun.net.NetworkClient.doConnect(NetworkClient.java:152)
>at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
>at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
>at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
>at sun.net.www.http.HttpClient.New(HttpClient.java:306)
>at sun.net.www.http.HttpClient.New(HttpClient.java:323)
>at
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
>at
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
>at
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
>at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
>at
> org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:139)
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:815)
>at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:764)
>
> 2008-06-18 17:31:03,276 INFO org.apache.hadoop.mapred.ReduceTask: Task
> task_200806181716_0001_r_00_0: Failed fetch #7 from
> task_200806181716_0001_m_01_0
> 2008-06-18 17:31:03,276 INFO org.apache.hadoop.mapred.ReduceTask: Failed to
> fetch map-output from task_200806181716_0001_m_01_0 even after
> MAX_FETCH_RETRIES_PER_MAP retries...  reporting to the JobTracker
> 2008-06-18 17:31:03,276 WARN org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 adding host koeln to penalty box, next
> contact in 150 seconds
> 2008-06-18 17:31:03,277 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Need 1 map output(s)
> 2008-06-18 17:31:03,317 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 1 map-outputs from previous failures
> 2008-06-18 17:31:03,317 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Got 1 known map output location(s);
> scheduling...
> 2008-06-18 17:31:03,317 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Scheduled 0 of 1 known outputs (1 slow
> hosts and 0 dup hosts)
> 2008-06-18 17:31:08,336 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Need 1 map output(s)
> 2008-06-18 17:31:08,337 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0: Got 0 new map-outputs & 0 obsolete
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-06-18 17:31:08,337 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Got 1 known map output location(s);
> scheduling...
> 2008-06-18 17:31:08,337 INFO org.apache.hadoop.mapred.ReduceTask:
> task_200806181716_0001_r_00_0 Scheduled 0 of 1 known outputs (1 slow
> hosts and 0 dup hosts)
> 2008-06-18 17:31:13,356 INFO org.apache.hadoop.ma

java.io.IOException: All datanodes are bad. Aborting...

2008-06-19 Thread novice user

Hi Everyone,
 I am running a simple map-red application similar to k-means. When I
ran it on a single machine, it went fine without any issues. But when I
ran the same on a hadoop cluster of 9 machines, it fails saying 
java.io.IOException: All datanodes are bad. Aborting...

Here is more explanation about the problem:
I tried to upgrade my hadoop cluster to hadoop-17. During this process, I
made a mistake of not installing hadoop on all machines, so the upgrade
failed, nor was I able to roll back. So I re-formatted the name node
afresh, and then the hadoop installation was successful.

Later, when I ran my map-reduce job, it ran successfully, but the same job
with zero reduce tasks is failing with the error:
java.io.IOException: All datanodes are bad. Aborting...

When I looked into the data nodes, I figured out that the file system is 100%
full, with various directories named "subdir" in the
hadoop-username/dfs/data/current directory. I am wondering where I went
wrong. 
Can someone please help me with this?

The same job went fine on a single machine with the same amount of input data.

Thanks






RE: Release Date of Hadoop 0.17.1

2008-06-19 Thread Devaraj Das
It should be out within a couple of days. As of now, voting is underway and
will end on the 23rd.

> -Original Message-
> From: Joman Chu [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, June 19, 2008 4:48 PM
> To: core-user@hadoop.apache.org
> Subject: Release Date of Hadoop 0.17.1
> 
> Hello, I was wondering when Hadoop 0.17.1 was going to be released.
> I'm being affected by the QuickSort unbounded recursion bug 
> (I think Hadoop-3442), and I want to know if I should apply 
> the patch myself and push it out to my cluster or wait for 
> Hadoop 0.17.1 to be released. I'd rather not duplicate the 
> amount of work I need to do in order to fix the cluster or 
> kill people's jobs unnecessarily.
> 
> Thanks,
> Joman Chu
> 



Re: Too many Task Manager children...

2008-06-19 Thread Amareshwari Sriramadasu

C G wrote:
> Hi All:
> I have mapred.tasktracker.tasks.maximum set to 4 in our
> conf/hadoop-site.xml, yet I frequently see 5-6 instances of
> org.apache.hadoop.mapred.TaskTracker$Child running on the slave nodes. Is
> there another setting I need to tweak in order to dial back the number of
> children running? The effect of running this many children is that our
> boxes have extremely high load factors, and eventually mapred tasks start
> timing out and failing.

If mapred.tasktracker.tasks.maximum is set to four, the tasktracker has 
4 map slots and 4 reduce slots, summing up to 8 slots. So seeing 5-6 
instances of org.apache.hadoop.mapred.TaskTracker$Child is expected. If 
you want only 4 instances of it, mapred.tasktracker.tasks.maximum 
should be 2, thus making 2 map slots and 2 reduce slots.
And as far as I know there is no other config variable for tweaking the 
number of children.
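
For example, to end up with at most 4 children per node (2 map slots + 2 
reduce slots), the entry in conf/hadoop-site.xml would look like this:

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
</property>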
   


Thanks
Amareshwari


Release Date of Hadoop 0.17.1

2008-06-19 Thread Joman Chu
Hello, I was wondering when Hadoop 0.17.1 was going to be released.
I'm being affected by the QuickSort unbounded recursion bug (I think
Hadoop-3442), and I want to know if I should apply the patch myself
and push it out to my cluster or wait for Hadoop 0.17.1 to be
released. I'd rather not duplicate the amount of work I need to do in
order to fix the cluster or kill people's jobs unnecessarily.

Thanks,
Joman Chu


Too many Task Manager children...

2008-06-19 Thread C G
Hi All:

I have mapred.tasktracker.tasks.maximum set to 4 in our conf/hadoop-site.xml, 
yet I frequently see 5-6 instances of 
org.apache.hadoop.mapred.TaskTracker$Child running on the slave nodes.  Is 
there another setting I need to tweak in order to dial back the number of 
children running?  The effect of running this many children is that our boxes 
have extremely high load factors, and eventually mapred tasks start timing out 
and failing.

Note that the number of instances is for a single job.  I see far more if I 
run multiple jobs simultaneously (something we do not typically do).

This is on Hadoop 0.15.0, upgrading is not an option at the moment.

Any help appreciated...
Thanks,
C G

from raja

2008-06-19 Thread ra ja
hi sir/madam,

how to integrate virtualization (xen) with hadoop tools?

give me an idea?

will it be done using c++?

please give me a response.

with regards
raja.p



