Re: Newbie InputFormat Question

2008-05-08 Thread Amareshwari Sriramadasu

You can have a look at TextInputFormat, KeyValueTextInputFormat, etc., at
http://svn.apache.org/viewvc/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/ 
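For illustration, a rough sketch of that approach against the old org.apache.hadoop.mapred API of this era (the class name is made up, and the counter is per split, so the "line number" is only file-global if the input is not split):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Illustrative only: wraps the stock LineRecordReader and prefixes each line with a counter.
public class NumberedLineInputFormat extends FileInputFormat<LongWritable, Text> {
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final LineRecordReader lines = new LineRecordReader(job, (FileSplit) split);
    return new RecordReader<LongWritable, Text>() {
      private long lineNo = 0;          // counts lines within this split only
      public boolean next(LongWritable key, Text value) throws IOException {
        if (!lines.next(key, value)) {
          return false;
        }
        value.set((++lineNo) + ":" + value.toString());  // value becomes "line number:line"
        return true;
      }
      public LongWritable createKey() { return lines.createKey(); }
      public Text createValue() { return lines.createValue(); }
      public long getPos() throws IOException { return lines.getPos(); }
      public float getProgress() throws IOException { return lines.getProgress(); }
      public void close() throws IOException { lines.close(); }
    };
  }
}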



coneybeare wrote:

I want to alter the default <"key", "line"> input format to be <"key", "line
number:" + "line"> so that my mapper can have a reference to the line number.
It seems like this should be easy by overriding either InputFormat or
InputSplit... but after reading some of the docs, I am still unsure of where
to begin.  Any help is much appreciated.

-Matt
  




How to handle tif image files in hadoop

2008-05-08 Thread charan
Hi,

 I want to process the information in TIFF images using hadoop. For this, a
BufferedImage object has to be created. For JPEG images, ImageIO is used
along with a ByteArrayOutputStream which contains the byte data of the
image. But for TIFF images, this doesn't work. Is there any way to handle
this problem?

  Also, can conventional JAI library methods be used to directly access
TIFF files in HDFS?

Thank you.




Can reducer output multiple files?

2008-05-08 Thread Jeremy Chow
Hi list,
I want to output my reduced results into several files according to the
types the results belong to. How can I implement this?

Thx,
Jeremy

-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com


Re: How to handle tif image files in hadoop

2008-05-08 Thread Peeyush Bishnoi
Hello ,

It's better that you write your own InputFormat for processing the TIFF
images. For more information you can look into this:

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/InputFormat.html
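For instance, a minimal sketch of such an InputFormat against the old org.apache.hadoop.mapred API (the class name is made up; it hands each map task one whole, unsplit image file as raw bytes, which the mapper can then decode however it likes):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Illustrative only: each record is <file path, whole file contents>.
public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;                       // never split an image across map tasks
  }
  public RecordReader<Text, BytesWritable> getRecordReader(
      InputSplit split, final JobConf job, Reporter reporter) throws IOException {
    final FileSplit fileSplit = (FileSplit) split;
    return new RecordReader<Text, BytesWritable>() {
      private boolean done = false;
      public boolean next(Text key, BytesWritable value) throws IOException {
        if (done) {
          return false;
        }
        Path path = fileSplit.getPath();
        byte[] contents = new byte[(int) fileSplit.getLength()];
        FSDataInputStream in = path.getFileSystem(job).open(path);
        try {
          in.readFully(0, contents);    // slurp the whole file
        } finally {
          in.close();
        }
        key.set(path.toString());
        value.set(contents, 0, contents.length);
        done = true;
        return true;
      }
      public Text createKey() { return new Text(); }
      public BytesWritable createValue() { return new BytesWritable(); }
      public long getPos() { return done ? fileSplit.getLength() : 0; }
      public float getProgress() { return done ? 1.0f : 0.0f; }
      public void close() { }
    };
  }
}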
 

---
Peeyush

On Thu, 2008-05-08 at 13:32 +0530, [EMAIL PROTECTED] wrote:

> Hi,
> 
>  I want to process the information in TIFF images using hadoop. For this, a
> BufferedImage object has to be created. For JPEG images, ImageIO is used
> along with a ByteArrayOutputStream which contains the byte data of the
> image. But for TIFF images, this doesn't work. Is there any way to handle
> this problem?
> 
>   Also, can conventional JAI library methods be used to directly access
> TIFF files in HDFS?
> 
> Thank you.
> 
> 


Re: Can reducer output multiple files?

2008-05-08 Thread Amar Kamat

Jeremy Chow wrote:

Hi list,
I want to output my reduced results into several files according to the
types the results belong to. How can I implement this?

  
There was a similar query earlier. The reply is here 
[http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/[EMAIL PROTECTED]

Hope that helps.
Amar

Thx,
Jeremy

  




Re: Can reducer output multiple files?

2008-05-08 Thread Alejandro Abdelnur
Look at the MultipleOutputFormat class under mapred/lib.

Or you can look at the HADOOP-3149 patch that introduces a
MultipleOutputs class. You could get this to work easily on released
Hadoop versions.
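For example, a minimal sketch of the MultipleTextOutputFormat route (old mapred API; how the "type" is pulled out of the key below is just an assumption for illustration):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Illustrative only: routes each reduce output record to a file named after its type.
public class TypeOutputFormat extends MultipleTextOutputFormat<Text, Text> {
  protected String generateFileNameForKeyValue(Text key, Text value, String name) {
    // Assumes keys look like "type<TAB>rest"; adapt to however your types are encoded.
    String type = key.toString().split("\t", 2)[0];
    return type + "/" + name;           // e.g. <output dir>/<type>/part-00000
  }
}
// Then in the driver: conf.setOutputFormat(TypeOutputFormat.class);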

HTH

Alejandro

On Thu, May 8, 2008 at 2:05 PM, Jeremy Chow <[EMAIL PROTECTED]> wrote:
> Hi list,
> I want to output my reduced results into several files according to the
> types the results belong to. How can I implement this?
>
> Thx,
> Jeremy
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> http://coderplay.javaeye.com
>


newbie how to get url paths of files in HDFS

2008-05-08 Thread chaitanya krishna
Hi,

  I want to get the "URL" paths of files that are stored in dfs. Is there
any way to get it?


Thank you


Re: "could only be replicated to 0 nodes, instead of 1"

2008-05-08 Thread jasongs

I get the same error when doing a put and my cluster is running ok

i.e. has capacity and all nodes are live. 
Error message is
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/test/test.txt could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)

at org.apache.hadoop.ipc.Client.call(Client.java:512)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:1487)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1601)
I would appreciate any help/suggestions

Thanks


jerrro wrote:
> 
> I am trying to install/configure hadoop on a cluster with several
> computers. I followed exactly the instructions in the hadoop website for
> configuring multiple slaves, and when I run start-all.sh I get no errors -
> both datanode and tasktracker are reported to be running (doing ps awux |
> grep hadoop on the slave nodes returns two java processes). Also, the log
> files are empty - nothing is printed there. Still, when I try to use
> bin/hadoop dfs -put,
> I get the following error:
> 
> # bin/hadoop dfs -put w.txt w.txt
> put: java.io.IOException: File /user/scohen/w4.txt could only be
> replicated to 0 nodes, instead of 1
> 
> and a file of size 0 is created on the DFS (bin/hadoop dfs -ls shows it).
> 
> I couldn't find much information about this error, but I did manage to see
> somewhere it might mean that there are no datanodes running. But as I
> said, start-all does not give any errors. Any ideas what could be problem?
> 
> Thanks.
> 
> Jerr.
> 

-- 
View this message in context: 
http://www.nabble.com/%22could-only-be-replicated-to-0-nodes%2C-instead-of-1%22-tp14175780p17124514.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.



Re: newbie how to get url paths of files in HDFS

2008-05-08 Thread Peeyush Bishnoi
Hello Chaitanya,

Using getInputPaths() you can do this. See here:

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#getInputPaths()
 

---
Peeyush

On Thu, 2008-05-08 at 15:07 +0530, chaitanya krishna wrote:

> Hi,
> 
>   I want to get the "URL" paths of files that are stored in dfs. Is there
> any way to get it?
> 
> 
> Thank you


How to track alive tasktrackers from master

2008-05-08 Thread deepak

Hi,

I'm pretty new to hadoop. I wanted to know if there's some option in
hadoop to get status details of the tasktrackers (on slaves) from the master,

like:
hadoop dfsadmin -report
This gives me details of the datanodes only.
Is there any such command to know whether all tasktrackers are alive?


Thanks,
Deepak



Re: newbie how to get url paths of files in HDFS

2008-05-08 Thread Peeyush Bishnoi
I apologize that I misunderstood your question. Do you want to get the URL
through the API or through the web interface?

Thanks ,

---
Peeyush
  


On Thu, 2008-05-08 at 16:04 +0530, Peeyush Bishnoi wrote:

> Hello Chaitanya , 
> 
> Using getInputPaths() you can do this. See here:
> 
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#getInputPaths()
>  
> 
> ---
> Peeyush
> 
> On Thu, 2008-05-08 at 15:07 +0530, chaitanya krishna wrote:
> 
> > Hi,
> > 
> >   I want to get the "URL" paths of files that are stored in dfs. Is there
> > any way to get it?
> > 
> > 
> > Thank you


RE: How to handle tif image files in hadoop

2008-05-08 Thread Ted Dunning

Since you get an InputStream from HDFS, you should be able to use any standard 
I/O package including all of the image I/O stuff.

How do you normally read TIFF files?
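A rough sketch of that (assuming a TIFF-capable ImageIO plugin such as JAI Image I/O is on the classpath, since plain ImageIO does not ship a TIFF reader; the path handling is just illustrative):

import java.awt.image.BufferedImage;
import javax.imageio.ImageIO;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TiffFromHdfs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataInputStream in = fs.open(new Path(args[0]));   // e.g. an HDFS path to a .tif
    try {
      // ImageIO.read returns null if no registered reader understands the stream.
      BufferedImage img = ImageIO.read(in);
      System.out.println(img == null ? "no TIFF reader registered"
                                     : img.getWidth() + "x" + img.getHeight());
    } finally {
      in.close();
    }
  }
}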

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thu 5/8/2008 1:02 AM
To: core-user@hadoop.apache.org
Subject: How to handle tif image files in hadoop
 
Hi,

 I want to process the information in TIFF images using hadoop. For this, a
BufferedImage object has to be created. For JPEG images, ImageIO is used
along with a ByteArrayOutputStream which contains the byte data of the
image. But for TIFF images, this doesn't work. Is there any way to handle
this problem?

  Also, can conventional JAI library methods be used to directly access
TIFF files in HDFS?

Thank you.





RE: newbie how to get url paths of files in HDFS

2008-05-08 Thread Ted Dunning


 

Take the fully qualified HDFS path that looks like this:

hdfs://namenode-host-name:port/file-path

And transform it into this:

hdfs://namenode-host-name:web-interface-port/data/file-path

The web-interface-port is 50070 by default.  This will allow you to read HDFS 
files via HTTP.
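A small sketch of that transformation (the 50070 port is just the default; adjust it to your dfs.http.address):

import java.net.URI;
import org.apache.hadoop.fs.Path;

public class HdfsHttpUrl {
  // Turns hdfs://namenode-host:port/file-path into the namenode's HTTP /data URL.
  public static String toHttpUrl(Path hdfsPath, int webPort) {
    URI u = hdfsPath.toUri();
    return "http://" + u.getHost() + ":" + webPort + "/data" + u.getPath();
  }

  public static void main(String[] args) {
    // prints http://namenode:50070/data/user/foo/part-00000
    System.out.println(toHttpUrl(new Path("hdfs://namenode:9000/user/foo/part-00000"), 50070));
  }
}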


-Original Message-
From: Peeyush Bishnoi [mailto:[EMAIL PROTECTED]
Sent: Thu 5/8/2008 5:04 AM
To: core-user@hadoop.apache.org
Subject: Re: newbie how to get url paths of files in HDFS
 
I apologize that I misunderstood your question. Do you want to get the URL
through the API or through the web interface?

Thanks ,

---
Peeyush
  


On Thu, 2008-05-08 at 16:04 +0530, Peeyush Bishnoi wrote:

> Hello Chaitanya , 
> 
> Using getInputPaths() you can do this. See here:
> 
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobConf.html#getInputPaths()
>  
> 
> ---
> Peeyush
> 
> On Thu, 2008-05-08 at 15:07 +0530, chaitanya krishna wrote:
> 
> > Hi,
> > 
> >   I want to get the "URL" paths of files that are stored in dfs. Is there
> > any way to get it?
> > 
> > 
> > Thank you






Re: How to track alive tasktrackers from master

2008-05-08 Thread Arun C Murthy


On May 8, 2008, at 4:59 AM, deepak wrote:


Hi,

I'm pretty new to hadoop. I wanted to know if there's some option in
hadoop to get status details of the tasktrackers (on slaves) from the
master,

like:
hadoop dfsadmin -report
This gives me details of the datanodes only.
Is there any such command to know whether all tasktrackers are alive?



Hmm... there isn't such a command. Please file a jira if you can  
(https://issues.apache.org/jira/secure/CreateIssue!default.jspa - you  
will need a jira a/c before you can create one).


For now, you can use the JobTracker web-ui, click on the "Nodes" link  
in the "Cluster Summary".


Arun
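If a programmatic check is acceptable, the JobClient API of this era also exposes a cluster status with the live tasktracker count; a rough sketch (run with the cluster's hadoop-site.xml on the classpath):

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TrackerCount {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());   // picks up mapred.job.tracker from the config
    ClusterStatus status = client.getClusterStatus();
    System.out.println("live tasktrackers: " + status.getTaskTrackers());
    System.out.println("running maps/reduces: " + status.getMapTasks() + "/" + status.getReduceTasks());
  }
}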


Re: newbie how to get url paths of files in HDFS

2008-05-08 Thread Doug Cutting

Ted Dunning wrote:

Take the fully qualified HDFS path that looks like this:

hdfs://namenode-host-name:port/file-path

And transform it into this:

hdfs://namenode-host-name:web-interface-port/data/file-path

The web-interface-port is 50070 by default.  This will allow you to read HDFS 
files via HTTP.


Also, starting in release 0.18.0, Java programs can use "hdfs:" URLs. 
For example, one can create a URLClassLoader for a jar stored in HDFS.


Doug
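If that 0.18.0 feature is what you need, a rough sketch (assuming release 0.18.0+ and that FsUrlStreamHandlerFactory is the hook that registers the "hdfs:" scheme; check the 0.18 release notes):

import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class HdfsUrlCat {
  static {
    // May only be called once per JVM.
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
  }

  public static void main(String[] args) throws Exception {
    InputStream in = null;
    try {
      in = new URL(args[0]).openStream();               // e.g. hdfs://namenode:9000/user/foo/file.txt
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}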


Changing DN hostnames->IPs

2008-05-08 Thread Otis Gospodnetic
Hi,

Will NN get confused if I change the names of slaves from hostnames to IPs?
That is, if I've been running Hadoop for a while, and then decide to shut down 
all its daemons, switch to IPs, and start everything back up, will the 
master/NN still see all the DN slaves as before and will it know they are the 
same old set of DN slaves?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



Re: "could only be replicated to 0 nodes, instead of 1"

2008-05-08 Thread Hairong Kuang
Could you please go to the dfs webUI and check how many datanodes are up and
how much available space each has?

Hairong


On 5/8/08 3:30 AM, "jasongs" <[EMAIL PROTECTED]> wrote:

> 
> I get the same error when doing a put and my cluster is running ok
> 
> i.e. has capacity and all nodes are live.
> Error message is
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /test/test.txt could only be replicated to 0 nodes, instead of 1
> at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
> at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
> at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
> 
> at org.apache.hadoop.ipc.Client.call(Client.java:512)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
> at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:585)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074)
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967)
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:1487)
> at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1601)
> I would appreciate any help/suggestions
> 
> Thanks
> 
> 
> jerrro wrote:
>> 
>> I am trying to install/configure hadoop on a cluster with several
>> computers. I followed exactly the instructions in the hadoop website for
>> configuring multiple slaves, and when I run start-all.sh I get no errors -
>> both datanode and tasktracker are reported to be running (doing ps awux |
>> grep hadoop on the slave nodes returns two java processes). Also, the log
>> files are empty - nothing is printed there. Still, when I try to use
>> bin/hadoop dfs -put,
>> I get the following error:
>> 
>> # bin/hadoop dfs -put w.txt w.txt
>> put: java.io.IOException: File /user/scohen/w4.txt could only be
>> replicated to 0 nodes, instead of 1
>> 
>> and a file of size 0 is created on the DFS (bin/hadoop dfs -ls shows it).
>> 
>> I couldn't find much information about this error, but I did manage to see
>> somewhere it might mean that there are no datanodes running. But as I
>> said, start-all does not give any errors. Any ideas what could be problem?
>> 
>> Thanks.
>> 
>> Jerr.
>> 



Re: newbie how to get url paths of files in HDFS

2008-05-08 Thread Ted Dunning

That will be incredibly useful!


On 5/8/08 9:35 AM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Ted Dunning wrote:
>> Take the fully qualified HDFS path that looks like this:
>> 
>> hdfs://namenode-host-name:port/file-path
>> 
>> And transform it into this:
>> 
>> hdfs://namenode-host-name:web-interface-port/data/file-path
>> 
>> The web-interface-port is 50070 by default.  This will allow you to read HDFS
>> files via HTTP.
> 
> Also, starting in release 0.18.0, Java programs can use "hdfs:" URLs.
> For example, one can create a URLClassLoader for a jar stored in HDFS.
> 
> Doug



RE: Hadoop Permission Problem

2008-05-08 Thread Natarajan, Senthil
Hi Nicholas,
Thanks it helped.

I gave permission 777 for /user
So now user "Test" can perform HDFS operations.

And also I gave permission 777 for /usr/local/hadoop/datastore on the master.

When user "Test" tries to submit the MapReduce job, getting this error

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.fs.permission.AccessControlException: Permission denied: 
user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x

Where else I need to give permission so that user "Test" can submit jobs using 
jobtracker and Datanode started by user "hadoop".

Thanks,
Senthil

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 07, 2008 5:49 PM
To: core-user@hadoop.apache.org
Subject: Re: Hadoop Permission Problem

Hi Senthil,

Since the path "myapps" is relative, copyFromLocal will copy the file to the
home directory, i.e. /user/Test/myapps in your case.  If /user/Test doesn't
exist, it will first try to create it.  You got an AccessControlException because
the permission of /user is 755.
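One way to avoid loosening /user to 777 is for the superuser ("hadoop" here) to pre-create each user's home directory and hand ownership over; a rough sketch using the FileSystem API (user and path names are just the ones from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MakeUserHome {
  public static void main(String[] args) throws Exception {
    // Run this as the HDFS superuser (the account that started the namenode).
    FileSystem fs = FileSystem.get(new Configuration());
    Path home = new Path("/user/Test");
    fs.mkdirs(home);
    fs.setOwner(home, "Test", "Test");   // "Test" can now write under its own home directory
  }
}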

Hope this helps.

Nicholas



- Original Message 
From: "Natarajan, Senthil" <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Wednesday, May 7, 2008 2:36:22 PM
Subject: Hadoop Permission Problem

Hi,
My datanode and jobtracker are started by user "hadoop".
And user "Test" needs to submit the job. So if the user "Test" copies file to 
HDFS, there is a permission error.
/usr/local/hadoop/bin/hadoop dfs -copyFromLocal /home/Test/somefile.txt myapps
copyFromLocal: org.apache.hadoop.fs.permission.AccessControlException: 
Permission denied: user=Test, access=WRITE, 
inode="user":hadoop:supergroup:rwxr-xr-x
Could you please let me know how other users (other than hadoop) can access
HDFS and then submit MapReduce jobs? What needs to be configured, or which
default configuration needs to be changed?

Thanks,
Senthil


Hadoop Job Submission

2008-05-08 Thread Natarajan, Senthil
Hi,

I have a rudimentary question.

In order to use Hadoop (both HDFS and MapReduce), does each user who wants
to run a job need to start their own jobtracker and datanode and submit the job?



Or



is it possible to start the jobtracker and datanode as the user "hadoop" so that
other users are able to submit jobs? If this is possible, what configuration
changes need to be made so that other users won't get permission errors?

Thanks,

Senthil



[Reduce task stalls] Problem Detailed Report

2008-05-08 Thread Amit Kumar Singh
Some of the details that might reveal something more about the problem
I posted:
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/[EMAIL 
PROTECTED]

Hadoop Version Used

0.15.3
0.16.3


My environment
**
Ubuntu 7.10 JDK 6.0.


Setup
**
A 2-machine cluster (one master and 2 slaves; the master is itself a slave)


Application
**
Sample wordcount example provided with hadoop distributions


Problem
**
Tried on both versions. In 0.16.3 the reduce task ends after failures
(mapred.JobClient: Task Id : XTZ , Status : FAILED
Too many fetch-failures), but in 0.15.3 the entire thing just stalls.

(Dataset size is <10 MB )

(All the logs and outputs are for version 0.15.3)


Console output for 0.15.3
---
08/05/09 11:12:22 INFO mapred.FileInputFormat: Total input paths to
process : 7
08/05/09 11:12:22 INFO mapred.JobClient: Running job: job_200805091110_0001
08/05/09 11:12:23 INFO mapred.JobClient:  map 0% reduce 0%
08/05/09 11:12:31 INFO mapred.JobClient:  map 14% reduce 0%
08/05/09 11:12:32 INFO mapred.JobClient:  map 42% reduce 0%
08/05/09 11:12:33 INFO mapred.JobClient:  map 57% reduce 0%
08/05/09 11:12:34 INFO mapred.JobClient:  map 71% reduce 0%
08/05/09 11:12:35 INFO mapred.JobClient:  map 100% reduce 0%
08/05/09 11:12:43 INFO mapred.JobClient:  map 100% reduce 9%
08/05/09 11:12:53 INFO mapred.JobClient:  map 100% reduce 14%

AND ENTIRE THING HANGS 


Steps followed for Setup
-
1) Modified conf/hadoop-env.sh (Java home)
2) Modified conf/masters, added hostname of the master server (in my case
master) --- ONLY ON MASTER
3) Modified conf/slaves, added hostnames of the slaves (in my case master and
slave) --- ONLY ON MASTER
4) Enabled password-free ssh from master to slave, master to master,
slave to slave, and slave to master
5) Modified hadoop-site.xml (Both Master and Slave)

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<property>
  <!-- added this property as one of the posts suggested it as a solution -->
  <name>mapred.reduce.copy.backoff</name>
  <value>5</value>
</property>


6) hadoop namenode -format --- ONLY ON MASTER
7) start-dfs.sh --- ONLY ON MASTER
8) start-mapred.sh --- ONLY ON MASTER
9)./hadoop dfs -copyFromLocal ../../data/ d1 (d1 - folder containing some
text files) --- ONLY ON MASTER
10)./hadoop jar hadoop-0.15.3-examples.jar wordcount d1 d1_op (map reduce
task) --- ONLY ON MASTER


MASTER LOGS

*
hadoop-hadoop-tasktracker-master.log
*
2008-05-09 11:10:15,582 INFO org.mortbay.util.Credential: Checking
Resource aliases
2008-05-09 11:10:15,637 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-05-09 11:10:15,638 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]
2008-05-09 11:10:15,638 INFO org.mortbay.util.Container: Started
HttpContext[/logs,/logs]
2008-05-09 11:10:16,000 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2008-05-09 11:10:16,033 INFO org.mortbay.util.Container: Started
WebApplicationContext[/,/]
2008-05-09 11:10:16,037 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:50060
2008-05-09 11:10:16,037 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2008-05-09 11:10:16,045 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=TaskTracker, sessionId=
2008-05-09 11:10:16,059 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 59074: starting
2008-05-09 11:10:16,059 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 59074: starting
2008-05-09 11:10:16,059 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 59074: starting
2008-05-09 11:10:16,059 INFO org.apache.hadoop.mapred.TaskTracker:
TaskTracker up at: /127.0.0.1:59074
2008-05-09 11:10:16,059 INFO org.apache.hadoop.mapred.TaskTracker:
Starting tracker tracker_cse-desktop:/127.0.0.1:59074
2008-05-09 11:10:16,100 INFO org.apache.hadoop.mapred.TaskTracker:
Starting thread: Map-events fetcher for all reduce tasks on
tracker_cse-desktop:/127.0.0.1:59074
2008-05-09 11:12:26,313 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction: task_200805091110_0001_m_00_0
2008-05-09 11:12:26,764 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction: task_200805091110_0001_m_02_0
2008-05-09 11:12:31,971 INFO org.apache.hadoop.mapred.TaskTracker:
task_200805091110_0001_m_02_0 1.0%
hdfs://master:54310/user/hadoop/suvidya/8ldvc10.txt:0+1427769
2008-05-09 11:12:31,975 INFO org.apache.hadoop.mapred.TaskTracker: Task
task_200805091110_0001_m_02_0 is done.
2008-05-09 11:12:32,032 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction: task_200805091110_0001_m_05_0
2008-05-09 11:12:32,857 INFO org.apache.hadoop.mapred.TaskTracker:
task_200805091110_0001_m_00_0 1.0%
hdfs://master:54310/user/hadoop/suvidya/19699.txt:0+1

RE: [Reduce task stalls] Problem Detailed Report

2008-05-08 Thread Natarajan, Senthil
It may be due to a firewall.
Try after stopping iptables.
If that works, add firewall rules to allow communication between master and slaves
(better: allow all nodes in the subnet).

-Original Message-
From: Amit Kumar Singh [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 08, 2008 4:50 PM
To: core-user@hadoop.apache.org
Subject: [Reduce task stalls] Problem Detailed Report

Some of the details that might reveal something more about the problem
I posted:
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/[EMAIL 
PROTECTED]

Hadoop Version Used

0.15.3
0.16.3


My environment
**
Ubuntu 7.10 JDK 6.0.


Setup
**
A 2-machine cluster (one master and 2 slaves; the master is itself a slave)


Application
**
Sample wordcount example provided with hadoop distributions


Problem
**
Tried on both versions. In 0.16.3 the reduce task ends after failures
(mapred.JobClient: Task Id : XTZ , Status : FAILED
Too many fetch-failures), but in 0.15.3 the entire thing just stalls.

(Dataset size is <10 MB )

(All the logs and outputs are for version 0.15.3)


Console output for 0.15.3
---
08/05/09 11:12:22 INFO mapred.FileInputFormat: Total input paths to
process : 7
08/05/09 11:12:22 INFO mapred.JobClient: Running job: job_200805091110_0001
08/05/09 11:12:23 INFO mapred.JobClient:  map 0% reduce 0%
08/05/09 11:12:31 INFO mapred.JobClient:  map 14% reduce 0%
08/05/09 11:12:32 INFO mapred.JobClient:  map 42% reduce 0%
08/05/09 11:12:33 INFO mapred.JobClient:  map 57% reduce 0%
08/05/09 11:12:34 INFO mapred.JobClient:  map 71% reduce 0%
08/05/09 11:12:35 INFO mapred.JobClient:  map 100% reduce 0%
08/05/09 11:12:43 INFO mapred.JobClient:  map 100% reduce 9%
08/05/09 11:12:53 INFO mapred.JobClient:  map 100% reduce 14%

AND ENTIRE THING HANGS 


Steps followed for Setup
-
1) Modified conf/hadoop-env.sh (Java home)
2) Modified conf/masters, added hostname of the master server (in my case
master) --- ONLY ON MASTER
3) Modified conf/slaves, added hostnames of the slaves (in my case master and
slave) --- ONLY ON MASTER
4) Enabled password-free ssh from master to slave, master to master,
slave to slave, and slave to master
5) Modified hadoop-site.xml (Both Master and Slave)

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

<property>
  <!-- added this property as one of the posts suggested it as a solution -->
  <name>mapred.reduce.copy.backoff</name>
  <value>5</value>
</property>


6) hadoop namenode -format --- ONLY ON MASTER
7) start-dfs.sh --- ONLY ON MASTER
8) start-mapred.sh --- ONLY ON MASTER
9)./hadoop dfs -copyFromLocal ../../data/ d1 (d1 - folder containing some
text files) --- ONLY ON MASTER
10)./hadoop jar hadoop-0.15.3-examples.jar wordcount d1 d1_op (map reduce
task) --- ONLY ON MASTER


MASTER LOGS

*
hadoop-hadoop-tasktracker-master.log
*
2008-05-09 11:10:15,582 INFO org.mortbay.util.Credential: Checking
Resource aliases
2008-05-09 11:10:15,637 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-05-09 11:10:15,638 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]
2008-05-09 11:10:15,638 INFO org.mortbay.util.Container: Started
HttpContext[/logs,/logs]
2008-05-09 11:10:16,000 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2008-05-09 11:10:16,033 INFO org.mortbay.util.Container: Started
WebApplicationContext[/,/]
2008-05-09 11:10:16,037 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:50060
2008-05-09 11:10:16,037 INFO org.mortbay.util.Container: Started
[EMAIL PROTECTED]
2008-05-09 11:10:16,045 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=TaskTracker, sessionId=
2008-05-09 11:10:16,059 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 59074: starting
2008-05-09 11:10:16,059 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 59074: starting
2008-05-09 11:10:16,059 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 59074: starting
2008-05-09 11:10:16,059 INFO org.apache.hadoop.mapred.TaskTracker:
TaskTracker up at: /127.0.0.1:59074
2008-05-09 11:10:16,059 INFO org.apache.hadoop.mapred.TaskTracker:
Starting tracker tracker_cse-desktop:/127.0.0.1:59074
2008-05-09 11:10:16,100 INFO org.apache.hadoop.mapred.TaskTracker:
Starting thread: Map-events fetcher for all reduce tasks on
tracker_cse-desktop:/127.0.0.1:59074
2008-05-09 11:12:26,313 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction: task_200805091110_0001_m_00_0
2008-05-09 11:12:26,764 INFO org.apache.hadoop.mapred.TaskTracker:
LaunchTaskAction: task_200805091110_0001_m_02_0
2008-05-09 11:12:31,971 INFO org.apache.hadoop.mapred.TaskTracker:
task_200805091110_0001_m_02_0 1.0%
hdfs://master:54310/user/hadoop/suvidya/8ldvc10.txt:0+1427769
2008-05-09 11:1

Re: How do I copy files from my linux file system to HDFS using a java prog?

2008-05-08 Thread Ajey Shah

Thanks Ted. It solves my problem. :)

Ted Dunning-3 wrote:
> 
> 
> I think that file names of the form "file://directory-path" should work to
> give you local file access using this program.
> 
> 
> On 5/6/08 3:34 PM, "Ajey Shah" <[EMAIL PROTECTED]> wrote:
> 
>> 
>> Thanks Suresh. But even this program reads and writes from the HDFS. What
>> i
>> need to do is read from my normal local linux harddrive and write to the
>> HDFS.
>> 
>> I'm sorry if I misunderstood your program.
>> 
>> Thanks for replying. :)
>> 
>> 
>> 
>> Babu, Suresh wrote:
>>> 
>>> 
>>> Try this program. Modify the HDFS configuration, if it is different from
>>> the default.
>>> 
>>> import java.io.File;
>>> import java.io.IOException;
>>> 
>>> import org.apache.hadoop.conf.Configuration;
>>> import org.apache.hadoop.fs.FileStatus;
>>> import org.apache.hadoop.fs.FileSystem;
>>> import org.apache.hadoop.fs.FSDataInputStream;
>>> import org.apache.hadoop.fs.FSDataOutputStream;
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.io.IOUtils;
>>> 
>>> public class HadoopDFSFileReadWrite {
>>> 
>>>   static void usage () {
>>> System.out.println("Usage : HadoopDFSFileReadWrite 
>>> ");
>>> System.exit(1);
>>>   }
>>> 
>>>   static void printAndExit(String str) {
>>> System.err.println(str);
>>> System.exit(1);
>>>   }
>>> 
>>>   public static void main (String[] argv) throws IOException {
>>> Configuration conf = new Configuration();
>>> conf.set("fs.default.name", "localhost:9000");
>>> FileSystem fs = FileSystem.get(conf);
>>> 
>>> FileStatus[] fileStatus = fs.listStatus(fs.getHomeDirectory());
>>> for(FileStatus status : fileStatus) {
>>> System.out.println("File: " + status.getPath());
>>> }
>>> 
>>> if (argv.length != 2)
>>>   usage();
>>> 
>>> // HadoopDFS deals with Path
>>> Path inFile = new Path(argv[0]);
>>> Path outFile = new Path(argv[1]);
>>> 
>>> // Check if input/output are valid
>>> if (!fs.exists(inFile))
>>>   printAndExit("Input file not found");
>>> if (!fs.isFile(inFile))
>>>   printAndExit("Input should be a file");
>>> if (fs.exists(outFile))
>>>   printAndExit("Output already exists");
>>> 
>>> // Read from and write to new file
>>> FSDataInputStream in = fs.open(inFile);
>>> FSDataOutputStream out = fs.create(outFile);
>>> byte buffer[] = new byte[256];
>>> try {
>>>   int bytesRead = 0;
>>>   while ((bytesRead = in.read(buffer)) > 0) {
>>> out.write(buffer, 0, bytesRead);
>>>   }
>>> 
>>> } catch (IOException e) {
>>>   System.out.println("Error while copying file");
>>> } finally {
>>>   in.close();
>>>   out.close();
>>> }
>>>   }
>>> }
>>> 
>>> Suresh
>>> 
>>> 
>>> -Original Message-
>>> From: Ajey Shah [mailto:[EMAIL PROTECTED]
>>> Sent: Thursday, May 01, 2008 3:31 AM
>>> To: core-user@hadoop.apache.org
>>> Subject: How do I copy files from my linux file system to HDFS using a
>>> java prog?
>>> 
>>> 
>>> Hello all,
>>> 
>>> I need to copy files from my linux file system to HDFS in a java program
>>> and not manually. This is the piece of code that I have.
>>> 
>>> try {
>>> 
>>> FileSystem hdfs = FileSystem.get(new
>>> Configuration());
>>> 
>>> LocalFileSystem ls = null;
>>> 
>>> ls = hdfs.getLocal(hdfs.getConf());
>>> 
>>> hdfs.copyFromLocalFile(false, new
>>> Path(fileName), new Path(outputFile));
>>> 
>>> } catch (Exception e) {
>>> e.printStackTrace();
>>> }
>>> 
>>> The problem is that it searches for the input path on the HDFS and not
>>> my linux file system.
>>> 
>>> Can someone point out where I may be wrong. I feel it's some
>>> configuration issue but have not been able to figure it out.
>>> 
>>> Thanks.
>>> --
>>> View this message in context:
>>> http://www.nabble.com/How-do-I-copy-files-from-my-linux-file-system-to-H
>>> DFS-using-a-java-prog--tp16992491p16992491.html
>>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>> 
>>> 
>>> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-do-I-copy-files-from-my-linux-file-system-to-HDFS-using-a-java-prog--tp16992491p17136713.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



RE: How do I copy files from my linux file system to HDFS using a java prog?

2008-05-08 Thread Ajey Shah

Thanks Suresh. It works! :)


Babu, Suresh wrote:
> 
> This program can read from local file system as well as HDFS.
> Try   
> 
> Thanks
> Suresh
> 
> -Original Message-
> From: Ajey Shah [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, May 07, 2008 4:04 AM
> To: core-user@hadoop.apache.org
> Subject: RE: How do I copy files from my linux file system to HDFS using
> a java prog?
> 
> 
> Thanks Suresh. But even this program reads and writes from the HDFS.
> What i need to do is read from my normal local linux harddrive and write
> to the HDFS.
> 
> I'm sorry if I misunderstood your program.
> 
> Thanks for replying. :)
> 
> 
> 
> Babu, Suresh wrote:
>> 
>> 
>> Try this program. Modify the HDFS configuration, if it is different 
>> from the default.
>> 
>> import java.io.File;
>> import java.io.IOException;
>> 
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FileStatus; import 
>> org.apache.hadoop.fs.FileSystem; import 
>> org.apache.hadoop.fs.FSDataInputStream;
>> import org.apache.hadoop.fs.FSDataOutputStream;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.io.IOUtils;
>> 
>> public class HadoopDFSFileReadWrite {
>> 
>>   static void usage () {
>> System.out.println("Usage : HadoopDFSFileReadWrite  
>> ");
>> System.exit(1);
>>   }
>> 
>>   static void printAndExit(String str) {
>> System.err.println(str);
>> System.exit(1);
>>   }
>> 
>>   public static void main (String[] argv) throws IOException {
>> Configuration conf = new Configuration();
>> conf.set("fs.default.name", "localhost:9000");
>> FileSystem fs = FileSystem.get(conf);
>> 
>> FileStatus[] fileStatus = fs.listStatus(fs.getHomeDirectory());
>> for(FileStatus status : fileStatus) {
>> System.out.println("File: " + status.getPath());
>> }
>> 
>> if (argv.length != 2)
>>   usage();
>> 
>> // HadoopDFS deals with Path
>> Path inFile = new Path(argv[0]);
>> Path outFile = new Path(argv[1]);
>> 
>> // Check if input/output are valid
>> if (!fs.exists(inFile))
>>   printAndExit("Input file not found");
>> if (!fs.isFile(inFile))
>>   printAndExit("Input should be a file");
>> if (fs.exists(outFile))
>>   printAndExit("Output already exists");
>> 
>> // Read from and write to new file
>> FSDataInputStream in = fs.open(inFile);
>> FSDataOutputStream out = fs.create(outFile);
>> byte buffer[] = new byte[256];
>> try {
>>   int bytesRead = 0;
>>   while ((bytesRead = in.read(buffer)) > 0) {
>> out.write(buffer, 0, bytesRead);
>>   }
>> 
>> } catch (IOException e) {
>>   System.out.println("Error while copying file");
>> } finally {
>>   in.close();
>>   out.close();
>> }
>>   }
>> }
>> 
>> Suresh
>> 
>> 
>> -Original Message-
>> From: Ajey Shah [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, May 01, 2008 3:31 AM
>> To: core-user@hadoop.apache.org
>> Subject: How do I copy files from my linux file system to HDFS using a
> 
>> java prog?
>> 
>> 
>> Hello all,
>> 
>> I need to copy files from my linux file system to HDFS in a java 
>> program and not manually. This is the piece of code that I have.
>> 
>> try {
>> 
>>  FileSystem hdfs = FileSystem.get(new
> Configuration());
>>  
>>  LocalFileSystem ls = null;
>>  
>>  ls = hdfs.getLocal(hdfs.getConf());
>>  
>>  hdfs.copyFromLocalFile(false, new
>> Path(fileName), new Path(outputFile));
>> 
>>  } catch (Exception e) {
>>  e.printStackTrace();
>>  }
>> 
>> The problem is that it searches for the input path on the HDFS and not
> 
>> my linux file system.
>> 
>> Can someone point out where I may be wrong. I feel it's some 
>> configuration issue but have not been able to figure it out.
>> 
>> Thanks.
>> --
>> View this message in context:
>> http://www.nabble.com/How-do-I-copy-files-from-my-linux-file-system-to
>> -H DFS-using-a-java-prog--tp16992491p16992491.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>> 
>> 
>> 
> 
> --
> View this message in context:
> http://www.nabble.com/How-do-I-copy-files-from-my-linux-file-system-to-H
> DFS-using-a-java-prog--tp16992491p17093646.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/How-do-I-copy-files-from-my-linux-file-system-to-HDFS-using-a-java-prog--tp16992491p17136728.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



RE: [Reduce task stalls] Problem Detailed Report

2008-05-08 Thread Amit Kumar Singh
Hey, I tried doing the same to no avail.
I guess if the firewall were the problem then, as mentioned in the post,
data distribution using HDFS would not have been a problem.
(I checked that the map tasks were executed successfully on both master and slave.)
For some strange reason the reduce won't execute.

On 0.16.3 I get the following output on the console. (Note: for some reason all
the reduce tasks get shifted to one reducer, and maps are re-executed.) Any
idea?
Is it bug HADOOP-1374?

08/05/09 12:33:19 INFO mapred.FileInputFormat: Total input paths to
process : 7
08/05/09 12:33:19 INFO mapred.JobClient: Running job: job_200805091211_0003
08/05/09 12:33:20 INFO mapred.JobClient:  map 0% reduce 0%
08/05/09 12:33:26 INFO mapred.JobClient:  map 14% reduce 0%
08/05/09 12:33:27 INFO mapred.JobClient:  map 28% reduce 0%
08/05/09 12:33:28 INFO mapred.JobClient:  map 42% reduce 0%
08/05/09 12:33:29 INFO mapred.JobClient:  map 71% reduce 0%
08/05/09 12:33:32 INFO mapred.JobClient:  map 100% reduce 0%
08/05/09 12:33:42 INFO mapred.JobClient:  map 100% reduce 19%
08/05/09 12:34:42 INFO mapred.JobClient: Task Id :
task_200805091211_0003_m_03_0, Status : FAILED
Too many fetch-failures
08/05/09 12:34:42 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:34:42 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:34:43 INFO mapred.JobClient:  map 85% reduce 19%
08/05/09 12:34:45 INFO mapred.JobClient:  map 100% reduce 19%
08/05/09 12:34:53 INFO mapred.JobClient: Task Id :
task_200805091211_0003_m_04_0, Status : FAILED
Too many fetch-failures
08/05/09 12:34:53 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:34:53 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:34:55 INFO mapred.JobClient:  map 85% reduce 19%
08/05/09 12:34:56 INFO mapred.JobClient:  map 100% reduce 19%
08/05/09 12:35:00 INFO mapred.JobClient:  map 100% reduce 23%
08/05/09 12:35:00 INFO mapred.JobClient: Task Id :
task_200805091211_0003_m_00_0, Status : FAILED
Too many fetch-failures
08/05/09 12:35:00 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:35:00 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:35:01 INFO mapred.JobClient:  map 85% reduce 23%
08/05/09 12:35:05 INFO mapred.JobClient:  map 100% reduce 28%
08/05/09 12:35:13 INFO mapred.JobClient:  map 100% reduce 100%
08/05/09 12:35:14 INFO mapred.JobClient: Job complete: job_200805091211_0003
08/05/09 12:35:14 INFO mapred.JobClient: Counters: 12
08/05/09 12:35:14 INFO mapred.JobClient:   Job Counters
08/05/09 12:35:14 INFO mapred.JobClient: Launched map tasks=10
08/05/09 12:35:14 INFO mapred.JobClient: Launched reduce tasks=1
08/05/09 12:35:14 INFO mapred.JobClient: Data-local map tasks=7
08/05/09 12:35:14 INFO mapred.JobClient:   Map-Reduce Framework
08/05/09 12:35:14 INFO mapred.JobClient: Map input records=136582
08/05/09 12:35:14 INFO mapred.JobClient: Map output records=1173106
08/05/09 12:35:14 INFO mapred.JobClient: Map input bytes=6929688
08/05/09 12:35:14 INFO mapred.JobClient: Map output bytes=11403672
08/05/09 12:35:14 INFO mapred.JobClient: Combine input records=1173106
08/05/09 12:35:14 INFO mapred.JobClient: Combine output records=195209
08/05/09 12:35:14 INFO mapred.JobClient: Reduce input groups=131275
08/05/09 12:35:14 INFO mapred.JobClient: Reduce input records=195209
08/05/09 12:35:14 INFO mapred.JobClient: Reduce output records=131275




-- 
Amit Singh
First Year PostGraduate Student.
Department Of Computer Science And Engineering.
Indian Institute Of Technology,Mumbai.


-- 
"A man's reach should exceed his grasps, or what are the heavens for"
--Vinton G Cerf

> May be due to firewall.
> Try after stopping the iptables.
> If it works add firewall rules to allow communication between master and
> slaves (better allow all nodes in the subnet)
>
> -Original Message-
> From: Amit Kumar Singh [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 08, 2008 4:50 PM
> To: core-user@hadoop.apache.org
> Subject: [Reduce task stalls] Problem Detailed Report
>
> Some of the details that might reveal something more about the problem
> I posted:
> http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/[EMAIL 
> PROTECTED]
>
> Hadoop Version Used
> 
> 0.15.3
> 0.16.3
>
>
> My environment
> **
> Ubuntu 7.10 JDK 6.0.
>
>
> Setup
> **
> 2 cluster machine (one master and 2 slaves. master is itself a slave)
>
>
> Application
> **
> Sample wordcount example provided with hadoop distributions
>
>
> Problem
> **
> Tried on both versions. In 0.16.3 the reduce task ends after failures
> (mapred.JobClient: Task Id : XTZ , Status : FAILED
> Too many fetch-failures). but in 0.15.3 entire thing just stalls
>
> (Dataset size is <10 MB )
>
> (All the logs and outputs are for version 0.15.3)
>
>
> Console output for 0.15.3
> 

RE: [Reduce task stalls] Problem Detailed Report

2008-05-08 Thread Mohanraj Umapathy

Hi,
I had similar symptoms when one of my slave /etc/hosts entries was wrong.

A few suggestions:
-- Check the output of "start-dfs.sh" and make sure the host names or
IP addresses are correct for all the nodes.

-- Run  bin/hadoop fsck "/"  and make sure the file system is healthy.

thanks,
Mohan

At 02:38 PM 5/8/2008, you wrote:

Hey, I tried doing the same to no avail.
I guess if the firewall were the problem then, as mentioned in the post,
data distribution using HDFS would not have been a problem.
(I checked that the map tasks were executed successfully on both master and slave.)
For some strange reason the reduce won't execute.

On 0.16.3 I get the following output on the console. (Note: for some reason all
the reduce tasks get shifted to one reducer, and maps are re-executed.) Any
idea?
Is it bug HADOOP-1374?

08/05/09 12:33:19 INFO mapred.FileInputFormat: Total input paths to
process : 7
08/05/09 12:33:19 INFO mapred.JobClient: Running job: job_200805091211_0003
08/05/09 12:33:20 INFO mapred.JobClient:  map 0% reduce 0%
08/05/09 12:33:26 INFO mapred.JobClient:  map 14% reduce 0%
08/05/09 12:33:27 INFO mapred.JobClient:  map 28% reduce 0%
08/05/09 12:33:28 INFO mapred.JobClient:  map 42% reduce 0%
08/05/09 12:33:29 INFO mapred.JobClient:  map 71% reduce 0%
08/05/09 12:33:32 INFO mapred.JobClient:  map 100% reduce 0%
08/05/09 12:33:42 INFO mapred.JobClient:  map 100% reduce 19%
08/05/09 12:34:42 INFO mapred.JobClient: Task Id :
task_200805091211_0003_m_03_0, Status : FAILED
Too many fetch-failures
08/05/09 12:34:42 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:34:42 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:34:43 INFO mapred.JobClient:  map 85% reduce 19%
08/05/09 12:34:45 INFO mapred.JobClient:  map 100% reduce 19%
08/05/09 12:34:53 INFO mapred.JobClient: Task Id :
task_200805091211_0003_m_04_0, Status : FAILED
Too many fetch-failures
08/05/09 12:34:53 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:34:53 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:34:55 INFO mapred.JobClient:  map 85% reduce 19%
08/05/09 12:34:56 INFO mapred.JobClient:  map 100% reduce 19%
08/05/09 12:35:00 INFO mapred.JobClient:  map 100% reduce 23%
08/05/09 12:35:00 INFO mapred.JobClient: Task Id :
task_200805091211_0003_m_00_0, Status : FAILED
Too many fetch-failures
08/05/09 12:35:00 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:35:00 WARN mapred.JobClient: Error reading task
outputmtech-desktop
08/05/09 12:35:01 INFO mapred.JobClient:  map 85% reduce 23%
08/05/09 12:35:05 INFO mapred.JobClient:  map 100% reduce 28%
08/05/09 12:35:13 INFO mapred.JobClient:  map 100% reduce 100%
08/05/09 12:35:14 INFO mapred.JobClient: Job complete: job_200805091211_0003
08/05/09 12:35:14 INFO mapred.JobClient: Counters: 12
08/05/09 12:35:14 INFO mapred.JobClient:   Job Counters
08/05/09 12:35:14 INFO mapred.JobClient: Launched map tasks=10
08/05/09 12:35:14 INFO mapred.JobClient: Launched reduce tasks=1
08/05/09 12:35:14 INFO mapred.JobClient: Data-local map tasks=7
08/05/09 12:35:14 INFO mapred.JobClient:   Map-Reduce Framework
08/05/09 12:35:14 INFO mapred.JobClient: Map input records=136582
08/05/09 12:35:14 INFO mapred.JobClient: Map output records=1173106
08/05/09 12:35:14 INFO mapred.JobClient: Map input bytes=6929688
08/05/09 12:35:14 INFO mapred.JobClient: Map output bytes=11403672
08/05/09 12:35:14 INFO mapred.JobClient: Combine input records=1173106
08/05/09 12:35:14 INFO mapred.JobClient: Combine output records=195209
08/05/09 12:35:14 INFO mapred.JobClient: Reduce input groups=131275
08/05/09 12:35:14 INFO mapred.JobClient: Reduce input records=195209
08/05/09 12:35:14 INFO mapred.JobClient: Reduce output records=131275




--
Amit Singh
First Year PostGraduate Student.
Department Of Computer Science And Engineering.
Indian Institute Of Technology,Mumbai.


--
"A man's reach should exceed his grasps, or what are the heavens for"
--Vinton G Cerf

> May be due to firewall.
> Try after stopping the iptables.
> If it works add firewall rules to allow communication between master and
> slaves (better allow all nodes in the subnet)
>
> -Original Message-
> From: Amit Kumar Singh [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 08, 2008 4:50 PM
> To: core-user@hadoop.apache.org
> Subject: [Reduce task stalls] Problem Detailed Report
>
> Some of the details that might reveal something more about the problem
> I posted:
> 
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/[EMAIL PROTECTED]

>
> Hadoop Version Used
> 
> 0.15.3
> 0.16.3
>
>
> My environment
> **
> Ubuntu 7.10 JDK 6.0.
>
>
> Setup
> **
> 2 cluster machine (one master and 2 slaves. master is itself a slave)
>
>
> Application
> **
> Sample wordcount example provided with hadoop distributions
>
>
> Problem
> *

Hbase on hadoop

2008-05-08 Thread Rick Hangartner

Hi,

We have an issue with hbase on hadoop and file system permissions that we
hope someone already knows the answer to.  Our apologies if we missed
that this issue has already been addressed on this list.


We are running hbase-0.1.2 on top of hadoop-0.16.3, starting the hbase
daemon from an "hbase" user account and the hadoop daemons from a "hadoop"
user account, and have observed this "feature".  That is, hbase runs in its
own "hbase" user account and hadoop in its own "hadoop" user account on a
single machine.


When we try to start up hbase, we see this error message in the log:

2008-05-06 12:09:02,845 ERROR org.apache.hadoop.hbase.HMaster: Can not  
start master

java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
	at org.apache.hadoop.hbase.HMaster.doMain(HMaster.java:3329)
	at org.apache.hadoop.hbase.HMaster.main(HMaster.java:3363)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Superuser privilege is required

... (etc)

If we run hbase in the hadoop user account we don't have any problems.

We think we've narrowed the issue down a bit from the debug logs.

The method "FSNameSystem.checkPermission()" method is throwing the  
exception because the "PermissionChecker()" constructor is returning  
that the hbase user is not a superuser or in the same supergroup as  
hadoop.


  private void checkSuperuserPrivilege() throws AccessControlException {
    if (isPermissionEnabled) {
      PermissionChecker pc = new PermissionChecker(
          fsOwner.getUserName(), supergroup);
      if (!pc.isSuper) {
        throw new AccessControlException("Superuser privilege is required");
      }
    }
  }

If we look at the "PermissionChecker()" constructor we see that it
is comparing the hdfs owner name (which should be "hadoop") and the
hdfs file system owner's group ("supergroup") to the current user and
groups; the log seems to indicate that the user is "hbase" and the
groups for user "hbase" only include "hbase":

  PermissionChecker(String fsOwner, String supergroup
      ) throws AccessControlException {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUGI();
    if (LOG.isDebugEnabled()) {
      LOG.debug("ugi=" + ugi);
    }

    if (ugi != null) {
      user = ugi.getUserName();
      groups.addAll(Arrays.asList(ugi.getGroupNames()));
      isSuper = user.equals(fsOwner) || groups.contains(supergroup);
    } else {
      throw new AccessControlException("ugi = null");
    }
  }

The current user and group are derived from thread-local information:

  private static final ThreadLocal<UserGroupInformation> currentUGI
      = new ThreadLocal<UserGroupInformation>();

  /** @return the [EMAIL PROTECTED] UserGroupInformation} for the current thread */
  public static UserGroupInformation getCurrentUGI() {
    return currentUGI.get();
  }

which we're hoping might be enough to illuminate the problem.

One question this raises is whether the "hbase:hbase" user and group are
being derived from the Linux file system user and group, or whether they
are the hdfs user and group.
Otherwise, how can we indicate that the "hbase" user is in the hdfs group
"supergroup"? Is there a parameter in a hadoop configuration file?
Apparently setting the groups of the web server to include
"supergroup" didn't have any effect, although perhaps that could be
for some other reason?


Thanks.



Jaql 0.2 Released

2008-05-08 Thread Meta Tempus
Jaql is a query language being developed for JSON data that runs on Hadoop and 
HBase.  We are pleased to announce Release 0.2 including: 
* Easier ways to extend functions 
* Easier ways to extend readers/writers 
* Enhanced work on query rewrites, including the use of Map/Reduce's combiners
(local aggregation).
To learn more about jaql, read the overview and 
get started by downloading. 

Feedback is greatly appreciated!


  


Re: Hadoop Job Submission

2008-05-08 Thread [EMAIL PROTECTED]
Hi Senthil,

The namenode and datanodes are supposed to be started by the same account.
The jobtracker and its tasktrackers can be started by another account.  Then, you
can submit jobs with a third account (if all permission settings are correct).
See http://issues.apache.org/jira/browse/HADOOP-3182 for more discussion.

Nicholas



- Original Message 
From: "Natarajan, Senthil" <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Thursday, May 8, 2008 1:15:33 PM
Subject: Hadoop Job Submission

Hi,

I have a rudimentary question.

In order to use Hadoop (both HDFS and MapReduce), does each user who wants
to run a job need to start their own jobtracker and datanode and submit the job?



Or



is it possible to start the jobtracker and datanode as the user "hadoop" so that
other users are able to submit jobs? If this is possible, what configuration
changes need to be made so that other users won't get permission errors?

Thanks,

Senthil

Re: Hadoop Permission Problem

2008-05-08 Thread s29752-hadoopuser
Hi Senthil,

In the error message, it says that the permission for "datastore" is 755.  Are 
you sure that you have changed it to 777?

Nicholas



- Original Message 
From: "Natarajan, Senthil" <[EMAIL PROTECTED]>
To: "core-user@hadoop.apache.org" 
Sent: Thursday, May 8, 2008 11:57:46 AM
Subject: RE: Hadoop Permission Problem

Hi Nicholas,
Thanks it helped.

I gave permission 777 for /user
So now user "Test" can perform HDFS operations.

And also I gave permission 777 for /usr/local/hadoop/datastore on the master.

When user "Test" tries to submit the MapReduce job, getting this error

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: 
org.apache.hadoop.fs.permission.AccessControlException: Permission denied: 
user=test, access=WRITE, inode="datastore":hadoop:supergroup:rwxr-xr-x

Where else I need to give permission so that user "Test" can submit jobs using 
jobtracker and Datanode started by user "hadoop".

Thanks,
Senthil

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 07, 2008 5:49 PM
To: core-user@hadoop.apache.org
Subject: Re: Hadoop Permission Problem

Hi Senthil,

Since the path "myapps" is relative, copyFromLocal will copy the file to the
home directory, i.e. /user/Test/myapps in your case.  If /user/Test doesn't
exist, it will first try to create it.  You got an AccessControlException because
the permission of /user is 755.

Hope this helps.

Nicholas



- Original Message 
From: "Natarajan, Senthil" <[EMAIL PROTECTED]>
To: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
Sent: Wednesday, May 7, 2008 2:36:22 PM
Subject: Hadoop Permission Problem

Hi,
My datanode and jobtracker are started by user "hadoop".
And user "Test" needs to submit the job. So if the user "Test" copies file to 
HDFS, there is a permission error.
/usr/local/hadoop/bin/hadoop dfs -copyFromLocal /home/Test/somefile.txt myapps
copyFromLocal: org.apache.hadoop.fs.permission.AccessControlException: 
Permission denied: user=Test, access=WRITE, 
inode="user":hadoop:supergroup:rwxr-xr-x
Could you please let me know how other users (other than hadoop) can access
HDFS and then submit MapReduce jobs? What needs to be configured, or which
default configuration needs to be changed?

Thanks,
Senthil


Re: Changing DN hostnames->IPs

2008-05-08 Thread Raghu Angadi

Short answer: renaming is not a problem.

If you are running fairly recent Hadoop, the NN does not store information
about the DataNodes persistently, so you should be ok in the sense that the
NameNode does not depend on the datanodes it knew before the restart.


If you are running fairly old Hadoop, it will still work functionally. The only
annoying thing would be that you might see all the previous datanodes
listed as dead.


The recent trunk has some issues related to using the DataNode ID for some
unclosed files... but you may not be affected by that.


Raghu.

Otis Gospodnetic wrote:

Hi,

Will NN get confused if I change the names of slaves from hostnames to IPs?
That is, if I've been running Hadoop for a while, and then decide to shut down 
all its daemons, switch to IPs, and start everything back up, will the 
master/NN still see all the DN slaves as before and will it know they are the 
same old set of DN slaves?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





Re: Changing DN hostnames->IPs

2008-05-08 Thread Otis Gospodnetic
Thank you, Raghu.  I'm using 0.16.3, so I should be safe. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Raghu Angadi <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 9:10:33 PM
> Subject: Re: Changing DN hostnames->IPs
> 
> Short answer : renaming is not problem.
> 
> If you are running fairly recent Hadoop, NN does not store information 
> about the DataNode persistently. So you should be ok in the sense 
> NameNode does not depend on datanodes before the restart.
> 
> If you are running fairly old Hadoop, functionally it will ok. Only 
> annoying thing would be that you might see all the previous datanodes 
> listed as dead.
> 
> The recent trunk has some issues related using Datanode ID for some 
> unclosed files.. but you may not be affected by that.
> 
> Raghu.
> 
> Otis Gospodnetic wrote:
> > Hi,
> > 
> > Will NN get confused if I change the names of slaves from hostnames to IPs?
> > That is, if I've been running Hadoop for a while, and then decide to shut 
> > down 
> all its daemons, switch to IPs, and start everything back up, will the 
> master/NN 
> still see all the DN slaves as before and will it know they are the same old 
> set 
> of DN slaves?
> > 
> > Thanks,
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 



Parameters for the record reader

2008-05-08 Thread Derek Shaw
Is it possible for the record reader to get a copy of the job configuration or 
is it otherwise possible to send configuration values to the record reader?

-Derek


Corrupt HDFS and salvaging data

2008-05-08 Thread Otis Gospodnetic
Hi,

I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm trying
not to lose the precious data in it.  I accidentally ran bin/hadoop namenode
-format on a *new DN* that I just added to the cluster.  Is it possible for
that to corrupt HDFS?  I also had to explicitly kill the DN daemons before that,
because bin/stop-all.sh didn't stop them for some reason (it always did so 
before).

Is there any way to salvage the data?  I have a 4-node cluster with replication 
factor of 3, though fsck reports lots of under-replicated blocks:

  
  CORRUPT FILES:3355
  MISSING BLOCKS:   3462
  MISSING SIZE: 17708821225 B
  
 Minimally replicated blocks:   28802 (89.269775 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   17025 (52.76779 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:3
 Average block replication: 1.7750744
 Missing replicas:  17025 (29.727087 %)
 Number of data-nodes:  4
 Number of racks:   1


The filesystem under path '/' is CORRUPT


What can one do at this point to save the data?  If I run bin/hadoop fsck -move 
or -delete will I lose some of the data?  Or will I simply end up with fewer 
block replicas and will thus have to force re-balancing in order to get back to 
a "safe" number of replicas?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



Hadoop Permissions Question -> [Fwd: Hbase on hadoop]

2008-05-08 Thread stack

Can someone familiar with permissions offer an opinion on the below?
Thanks,
St.Ack
--- Begin Message ---

Hi,

We have an issue with hbase on hadoop and file system permissions that we 
hope someone already knows the answer to.  Our apologies if we missed 
that this issue has already been addressed on this list.


We are running hbase-0.1.2 on top of hadoop-0.16.3, starting the hbase 
daemon from an "hbase" user account and the hadoop daemon from a "hadoop" 
user account, and have observed this "feature".  We are running hbase in a 
separate "hbase" user account and hadoop in its own "hadoop" user account 
on a single machine.


When we try to start up hbase, we see this error message in the log:

2008-05-06 12:09:02,845 ERROR org.apache.hadoop.hbase.HMaster: Can not start master
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:494)
	at org.apache.hadoop.hbase.HMaster.doMain(HMaster.java:3329)
	at org.apache.hadoop.hbase.HMaster.main(HMaster.java:3363)
Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Superuser privilege is required

... (etc)

If we run hbase in the hadoop user account we don't have any problems.

We think we've narrowed the issue down a bit from the debug logs.

The method "FSNameSystem.checkPermission()" method is throwing the  
exception because the "PermissionChecker()" constructor is returning  
that the hbase user is not a superuser or in the same supergroup as  
hadoop.


  private void checkSuperuserPrivilege() throws AccessControlException {
    if (isPermissionEnabled) {
      PermissionChecker pc = new PermissionChecker(
          fsOwner.getUserName(), supergroup);
      if (!pc.isSuper) {
        throw new AccessControlException("Superuser privilege is required");
      }
    }
  }

If we look at the "PermissionChecker()" constructor, we see that it compares 
the hdfs owner name (which should be "hadoop") and the hdfs file system 
owner's group ("supergroup") to the current user and groups, which the log 
seems to indicate are the user "hbase" and, for user "hbase", only the 
group "hbase":

  PermissionChecker(String fsOwner, String supergroup
  ) throws AccessControlException{
UserGroupInformation ugi = UserGroupInformation.getCurrentUGI();
if (LOG.isDebugEnabled()) {
  LOG.debug("ugi=" + ugi);
}

if (ugi != null) {
  user = ugi.getUserName();
  groups.addAll(Arrays.asList(ugi.getGroupNames()));
  isSuper = user.equals(fsOwner) || groups.contains(supergroup);
}
else {
  throw new AccessControlException("ugi = null");
}
  }

The current user and groups are derived from the thread-local information:

  private static final ThreadLocal<UserGroupInformation> currentUGI
    = new ThreadLocal<UserGroupInformation>();

  /** @return the {@link UserGroupInformation} for the current thread */
  public static UserGroupInformation getCurrentUGI() {
    return currentUGI.get();
  }

which we're hoping might be enough to illuminate the problem.

One question this raises is whether the "hbase:hbase" user and group are 
derived from the Linux file system user and group, or whether they are the 
hdfs user and group?
Otherwise, how can we indicate that the "hbase" user is in the hdfs group 
"supergroup"? Is there a parameter in a hadoop configuration file? 
Apparently setting the groups of the web server to include "supergroup" 
didn't have any effect, although perhaps that could be for some other 
reason?


Thanks.

--- End Message ---
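
For what it's worth, in 0.16.x the superuser group name is taken from the dfs.permissions.supergroup property (default "supergroup"), and a client's user and groups are, by default, derived from the Unix account it runs under, so the "hbase:hbase" identity above is coming from the Linux side. One sketch of a fix, assuming a Linux box and that the group does not exist yet (names illustrative), is to put the hbase account into that group on the machine where the HBase daemons run and then restart them:

# as root on the host running the HBase daemons
groupadd supergroup
usermod -a -G supergroup hbase
# restart the HBase daemons so the new group membership is picked up

# alternative: point dfs.permissions.supergroup in hadoop-site.xml at a
# group the hbase user already belongs to, and restart the NameNode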


Re: Corrupt HDFS and salvaging data

2008-05-08 Thread Otis Gospodnetic
Hi,

Update:
It seems fsck reports HDFS is corrupt when a significant-enough number of block 
replicas is missing (or something like that).
fsck reported corrupt HDFS after I replaced 1 old DN with 1 new DN.  After I 
restarted Hadoop with the old set of DNs, fsck stopped reporting corrupt HDFS 
and started reporting *healthy* HDFS.


I'll follow up with a re-balancing question in a separate email.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Otis Gospodnetic <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 11:35:01 PM
> Subject: Corrupt HDFS and salvaging data
> 
> Hi,
> 
> I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm trying 
> not to lose the precious data in it.  I accidentally run bin/hadoop namenode 
> -format on a *new DN* that I just added to the cluster.  Is it possible for 
> that 
> to corrupt HDFS?  I also had to explicitly kill DN daemons before that, 
> because 
> bin/stop-all.sh didn't stop them for some reason (it always did so before).
> 
> Is there any way to salvage the data?  I have a 4-node cluster with 
> replication 
> factor of 3, though fsck reports lots of under-replicated blocks:
> 
>   
>   CORRUPT FILES:3355
>   MISSING BLOCKS:   3462
>   MISSING SIZE: 17708821225 B
>   
> Minimally replicated blocks:   28802 (89.269775 %)
> Over-replicated blocks:0 (0.0 %)
> Under-replicated blocks:   17025 (52.76779 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor:3
> Average block replication: 1.7750744
> Missing replicas:  17025 (29.727087 %)
> Number of data-nodes:  4
> Number of racks:   1
> 
> 
> The filesystem under path '/' is CORRUPT
> 
> 
> What can one do at this point to save the data?  If I run bin/hadoop fsck 
> -move 
> or -delete will I lose some of the data?  Or will I simply end up with fewer 
> block replicas and will thus have to force re-balancing in order to get back 
> to 
> a "safe" number of replicas?
> 
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



How to re-balance, NN safe mode

2008-05-08 Thread Otis Gospodnetic
Hi,

(I should prefix this by saying that bin/hadoop fsck reported corrupt HDFS 
after I replaced one of the DNs with a new/empty DN)

I've removed 1 old DN and added 1 new DN.  The cluster has 4 nodes total (all 
4 act as DNs) and a replication factor of 3.  I'm trying to re-balance the data 
by following http://wiki.apache.org/hadoop/FAQ#6:
- I stopped all daemons
- I removed the old DN and added the new DN to conf/slaves
- I started all daemons

The new DN shows up in the JT and NN GUIs, and bin/hadoop dfsadmin -report shows 
it.  At this point I expected the NN to figure out that it needs to re-balance 
under-replicated blocks and start pushing data to the new DN.  However, no data 
got copied to the new DN.  I pumped the replication factor up to 6 and restarted 
all daemons, but still nothing.  I noticed the NN GUI says the NN is in safe 
mode, but it has been stuck there for 10+ minutes now - too long, it seems.

I then tried running bin/hadoop balancer, but got this:

 
$ bin/hadoop balancer
Received an IO exception: org.apache.hadoop.dfs.SafeModeException: Cannot 
create file/system/balancer.id. Name node is in safe mode.
Safe mode will be turned off automatically.
at 
org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:947)
at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:931)
...
...

So now I'm wondering what steps one needs to follow when replacing a DN.  Just 
pulling it out and listing a new one in conf/slaves seems to leave the NN stuck 
in (permanent?) safe mode.

I know I can run bin/hadoop dfsadmin -safemode leave, but is that safe? ;)
If I do that, will I then be able to run bin/hadoop balancer and get some 
replicas of the old HDFS data onto the newly added DN?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
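
For reference, the safe mode state can be inspected and cleared from the command line before retrying the balancer; a rough sketch (0.16.x-style commands) of the sequence being discussed:

# see whether the NN is still in safe mode, and what it thinks of the DNs
bin/hadoop dfsadmin -safemode get
bin/hadoop dfsadmin -report

# once you are satisfied the remaining DNs have reported in, leave safe mode;
# the NN itself then schedules re-replication of under-replicated blocks
bin/hadoop dfsadmin -safemode leave

# the balancer only evens out disk usage across DNs; it takes an optional
# -threshold <percent> argument (default 10)
bin/hadoop balancer

Note that filling in missing replicas is the NameNode's job once it is out of safe mode; the balancer is not required for that.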



Re: Corrupt HDFS and salvaging data

2008-05-08 Thread lohit
Hi Otis,

The namenode has location information about all replicas of a block. When you run 
fsck, the namenode checks for those replicas. If all replicas are missing, fsck 
reports the block as missing; otherwise the block is counted as under-replicated. 
If you specify the -move or -delete option along with fsck, files with such 
missing blocks are moved to /lost+found or deleted, depending on the option. 
At what point did you run the fsck command? Was it after the datanodes were 
stopped? When you run namenode -format, it deletes the directories specified in 
dfs.name.dir. If the directory exists, it asks for confirmation. 

Thanks,
Lohit
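
As an aside, before deciding on -move or -delete it can help to see exactly which files and blocks are affected; a sketch of the kind of report being discussed (the path is illustrative):

# per-file listing of blocks, replica counts and replica locations
bin/hadoop fsck / -files -blocks -locations

# the same, restricted to one directory
bin/hadoop fsck /user/otis -files -blocks -locations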

- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, May 8, 2008 9:00:34 PM
Subject: Re: Corrupt HDFS and salvaging data

Hi,

Update:
It seems fsck reports HDFS is corrupt when a significant-enough number of block 
replicas is missing (or something like that).
fsck reported corrupt HDFS after I replaced 1 old DN with 1 new DN.  After I 
restarted Hadoop with the old set of DNs, fsck stopped reporting corrupt HDFS 
and started reporting *healthy* HDFS.


I'll follow-up with re-balancing question in a separate email.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Otis Gospodnetic <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 11:35:01 PM
> Subject: Corrupt HDFS and salvaging data
> 
> Hi,
> 
> I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm trying 
> not to lose the precious data in it.  I accidentally run bin/hadoop namenode 
> -format on a *new DN* that I just added to the cluster.  Is it possible for 
> that 
> to corrupt HDFS?  I also had to explicitly kill DN daemons before that, 
> because 
> bin/stop-all.sh didn't stop them for some reason (it always did so before).
> 
> Is there any way to salvage the data?  I have a 4-node cluster with 
> replication 
> factor of 3, though fsck reports lots of under-replicated blocks:
> 
>   
>   CORRUPT FILES:3355
>   MISSING BLOCKS:   3462
>   MISSING SIZE: 17708821225 B
>   
> Minimally replicated blocks:   28802 (89.269775 %)
> Over-replicated blocks:0 (0.0 %)
> Under-replicated blocks:   17025 (52.76779 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor:3
> Average block replication: 1.7750744
> Missing replicas:  17025 (29.727087 %)
> Number of data-nodes:  4
> Number of racks:   1
> 
> 
> The filesystem under path '/' is CORRUPT
> 
> 
> What can one do at this point to save the data?  If I run bin/hadoop fsck 
> -move 
> or -delete will I lose some of the data?  Or will I simply end up with fewer 
> block replicas and will thus have to force re-balancing in order to get back 
> to 
> a "safe" number of replicas?
> 
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Why ComparableWritable does not take a template?

2008-05-08 Thread steph


The example in the javadoc shows the compareTo() method taking the class's own 
type instead of Object. However, the WritableComparable interface does not take 
a type parameter, and therefore it cannot supply the type parameter for 
Comparable. Is that a mistake?

public class MyWritableComparable implements WritableComparable {
  // Some data
  private int counter;
  private long timestamp;

  public void write(DataOutput out) throws IOException {
    out.writeInt(counter);
    out.writeLong(timestamp);
  }

  public void readFields(DataInput in) throws IOException {
    counter = in.readInt();
    timestamp = in.readLong();
  }

  public int compareTo(MyWritableComparable w) {
    int thisValue = this.counter;
    int thatValue = w.counter;
    return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
  }
}

If I try to do what the example shows, it does not compile:

  [javac] /Users/steph/Work/Rinera/TRUNK/vma/hadoop/parser-hadoop/apps/src/com/rinera/hadoop/weblogs/SummarySQLKey.java:13: com.rinera.hadoop.weblogs.SummarySQLKey is not abstract and does not override abstract method compareTo(java.lang.Object) in java.lang.Comparable
  [javac] public class SummarySQLKey


If I don't use the type but instead use the Object type for compareTo(), 
I get a RuntimeException:

java.lang.RuntimeException: java.lang.InstantiationException: com.rinera.hadoop.weblogs.SummarySQLKey
	at org.apache.hadoop.io.WritableComparator.newKey(WritableComparator.java:75)
	at org.apache.hadoop.io.WritableComparator.<init>(WritableComparator.java:63)
	at org.apache.hadoop.io.WritableComparator.get(WritableComparator.java:42)
	at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:645)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:313)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:174)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)
Caused by: java.lang.InstantiationException: com.rinera.hadoop.weblogs.SummarySQLKey
	at java.lang.Class.newInstance0(Class.java:335)
	at java.lang.Class.newInstance(Class.java:303)
	at org.apache.hadoop.io.WritableComparator.newKey(WritableComparator.java:73)
	... 6 more
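
For what it's worth, both errors above are consistent with a key class written against the non-generic WritableComparable of that release: the compiler wants compareTo(Object), and WritableComparator.newKey() creates keys reflectively, so the class also needs a public no-argument constructor (a missing one is the usual cause of that InstantiationException). A sketch, with an illustrative class name rather than the real SummarySQLKey:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Illustrative key class for the non-generic WritableComparable interface.
public class SummaryKey implements WritableComparable {

  private int counter;
  private long timestamp;

  // WritableComparator.newKey() instantiates the key reflectively,
  // so a public no-argument constructor is required.
  public SummaryKey() {
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(counter);
    out.writeLong(timestamp);
  }

  public void readFields(DataInput in) throws IOException {
    counter = in.readInt();
    timestamp = in.readLong();
  }

  // The non-generic interface forces compareTo(Object); cast inside.
  public int compareTo(Object other) {
    SummaryKey that = (SummaryKey) other;
    if (counter != that.counter) {
      return counter < that.counter ? -1 : 1;
    }
    return timestamp < that.timestamp ? -1
        : (timestamp == that.timestamp ? 0 : 1);
  }
}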





Re: Corrupt HDFS and salvaging data

2008-05-08 Thread Otis Gospodnetic
Lohit,


I ran fsck after I replaced 1 DN (with data on it) with 1 blank DN and started 
all the daemons.
I see the fsck report does include this:
Missing replicas:  17025 (29.727087 %)

According to your explanation, this means that after I removed 1 DN I started 
missing about 30% of the blocks, right?
Wouldn't that mean that 30% of all blocks were *only* on the 1 DN that I 
removed?  But how could that be when I have replication factor of 3?

If I run bin/hadoop balancer with my old DN back in the cluster (and new DN 
removed), I do get the happy "The cluster is balanced" response.  So wouldn't 
that mean that everything is peachy and that if my replication factor is 3 then 
when I remove 1 DN, I should have only some portion of blocks under-replicated, 
but not *completely* missing from HDFS?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: lohit <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, May 9, 2008 1:33:56 AM
> Subject: Re: Corrupt HDFS and salvaging data
> 
> Hi Otis,
> 
> Namenode has location information about all replicas of a block. When you run 
> fsck, namenode checks for those replicas. If all replicas are missing, then 
> fsck 
> reports the block as missing. Otherwise they are added to under replicated 
> blocks. If you specify -move or -delete option along with fsck, files with 
> such 
> missing blocks are moved to /lost+found or deleted depending on the option. 
> At what point did you run the fsck command, was it after the datanodes were 
> stopped? When you run namenode -format it would delete directories specified 
> in 
> dfs.name.dir. If directory exists it would ask for confirmation. 
> 
> Thanks,
> Lohit
> 
> - Original Message 
> From: Otis Gospodnetic 
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 9:00:34 PM
> Subject: Re: Corrupt HDFS and salvaging data
> 
> Hi,
> 
> Update:
> It seems fsck reports HDFS is corrupt when a significant-enough number of 
> block 
> replicas is missing (or something like that).
> fsck reported corrupt HDFS after I replaced 1 old DN with 1 new DN.  After I 
> restarted Hadoop with the old set of DNs, fsck stopped reporting corrupt HDFS 
> and started reporting *healthy* HDFS.
> 
> 
> I'll follow-up with re-balancing question in a separate email.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> - Original Message 
> > From: Otis Gospodnetic 
> > To: core-user@hadoop.apache.org
> > Sent: Thursday, May 8, 2008 11:35:01 PM
> > Subject: Corrupt HDFS and salvaging data
> > 
> > Hi,
> > 
> > I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm 
> > trying 
> > not to lose the precious data in it.  I accidentally run bin/hadoop 
> > namenode 
> > -format on a *new DN* that I just added to the cluster.  Is it possible for 
> that 
> > to corrupt HDFS?  I also had to explicitly kill DN daemons before that, 
> because 
> > bin/stop-all.sh didn't stop them for some reason (it always did so before).
> > 
> > Is there any way to salvage the data?  I have a 4-node cluster with 
> replication 
> > factor of 3, though fsck reports lots of under-replicated blocks:
> > 
> >   
> >   CORRUPT FILES:3355
> >   MISSING BLOCKS:   3462
> >   MISSING SIZE: 17708821225 B
> >   
> > Minimally replicated blocks:   28802 (89.269775 %)
> > Over-replicated blocks:0 (0.0 %)
> > Under-replicated blocks:   17025 (52.76779 %)
> > Mis-replicated blocks: 0 (0.0 %)
> > Default replication factor:3
> > Average block replication: 1.7750744
> > Missing replicas:  17025 (29.727087 %)
> > Number of data-nodes:  4
> > Number of racks:   1
> > 
> > 
> > The filesystem under path '/' is CORRUPT
> > 
> > 
> > What can one do at this point to save the data?  If I run bin/hadoop fsck 
> -move 
> > or -delete will I lose some of the data?  Or will I simply end up with 
> > fewer 
> > block replicas and will thus have to force re-balancing in order to get 
> > back 
> to 
> > a "safe" number of replicas?
> > 
> > Thanks,
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



Re: Parameters for the record reader

2008-05-08 Thread Arun C Murthy


On May 8, 2008, at 8:16 PM, Derek Shaw wrote:

Is it possible for the record reader to get a copy of the job  
configuration or is it otherwise possible to send configuration  
values to the record reader?




Yes. Typically InputFormat.getRecordReader instantiates the RecordReader, and 
that method is passed the JobConf, so you can pass it down from there.


Take a look at SequenceFileRecordReader.java or LineRecordReader.java  
in trunk/src/java/org/apache/hadoop/mapred for an example.
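
For instance, a minimal sketch of that wiring against the generic (trunk-style) mapred API; the class names and the configuration key are illustrative, and older releases use the non-generic equivalents:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// Illustrative InputFormat that reads a value from the JobConf and hands it
// (or the JobConf itself) to the RecordReader it creates.
public class ConfiguredInputFormat extends TextInputFormat {

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    reporter.setStatus(split.toString());
    String marker = job.get("my.custom.marker", "");       // hypothetical key
    LineRecordReader lines = new LineRecordReader(job, (FileSplit) split);
    return new MarkedRecordReader(lines, marker);
  }

  // RecordReader that delegates to LineRecordReader but carries a
  // configuration value along with it.
  static class MarkedRecordReader implements RecordReader<LongWritable, Text> {
    private final LineRecordReader delegate;
    private final String marker;          // value taken from the JobConf

    MarkedRecordReader(LineRecordReader delegate, String marker) {
      this.delegate = delegate;
      this.marker = marker;
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      // "marker" is available here for whatever the reader needs to do
      return delegate.next(key, value);
    }

    public LongWritable createKey() { return delegate.createKey(); }

    public Text createValue() { return delegate.createValue(); }

    public long getPos() throws IOException { return delegate.getPos(); }

    public float getProgress() throws IOException { return delegate.getProgress(); }

    public void close() throws IOException { delegate.close(); }
  }
}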


Arun


Re: Corrupt HDFS and salvaging data

2008-05-08 Thread lohit
When you say all daemons, do you mean the entire cluster, including the 
namenode?
>According to your explanation, this means that after I removed 1 DN I started 
>missing about 30% of the blocks, right?
No, you would only miss a replica. If all of your blocks have a replication 
factor of 3, then you would miss only the one replica that was on this DN.

It would be good to see the full report. Could you run 
hadoop fsck / -files -blocks -locations?

That would give you much more detailed information. 


- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, May 8, 2008 10:54:53 PM
Subject: Re: Corrupt HDFS and salvaging data

Lohit,


I run fsck after I replaced 1 DN (with data on it) with 1 blank DN and started 
all daemons.
I see the fsck report does include this:
Missing replicas:  17025 (29.727087 %)

According to your explanation, this means that after I removed 1 DN I started 
missing about 30% of the blocks, right?
Wouldn't that mean that 30% of all blocks were *only* on the 1 DN that I 
removed?  But how could that be when I have replication factor of 3?

If I run bin/hadoop balancer with my old DN back in the cluster (and new DN 
removed), I do get the happy "The cluster is balanced" response.  So wouldn't 
that mean that everything is peachy and that if my replication factor is 3 then 
when I remove 1 DN, I should have only some portion of blocks under-replicated, 
but not *completely* missing from HDFS?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: lohit <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, May 9, 2008 1:33:56 AM
> Subject: Re: Corrupt HDFS and salvaging data
> 
> Hi Otis,
> 
> Namenode has location information about all replicas of a block. When you run 
> fsck, namenode checks for those replicas. If all replicas are missing, then 
> fsck 
> reports the block as missing. Otherwise they are added to under replicated 
> blocks. If you specify -move or -delete option along with fsck, files with 
> such 
> missing blocks are moved to /lost+found or deleted depending on the option. 
> At what point did you run the fsck command, was it after the datanodes were 
> stopped? When you run namenode -format it would delete directories specified 
> in 
> dfs.name.dir. If directory exists it would ask for confirmation. 
> 
> Thanks,
> Lohit
> 
> - Original Message 
> From: Otis Gospodnetic 
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 9:00:34 PM
> Subject: Re: Corrupt HDFS and salvaging data
> 
> Hi,
> 
> Update:
> It seems fsck reports HDFS is corrupt when a significant-enough number of 
> block 
> replicas is missing (or something like that).
> fsck reported corrupt HDFS after I replaced 1 old DN with 1 new DN.  After I 
> restarted Hadoop with the old set of DNs, fsck stopped reporting corrupt HDFS 
> and started reporting *healthy* HDFS.
> 
> 
> I'll follow-up with re-balancing question in a separate email.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> - Original Message 
> > From: Otis Gospodnetic 
> > To: core-user@hadoop.apache.org
> > Sent: Thursday, May 8, 2008 11:35:01 PM
> > Subject: Corrupt HDFS and salvaging data
> > 
> > Hi,
> > 
> > I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm 
> > trying 
> > not to lose the precious data in it.  I accidentally run bin/hadoop 
> > namenode 
> > -format on a *new DN* that I just added to the cluster.  Is it possible for 
> that 
> > to corrupt HDFS?  I also had to explicitly kill DN daemons before that, 
> because 
> > bin/stop-all.sh didn't stop them for some reason (it always did so before).
> > 
> > Is there any way to salvage the data?  I have a 4-node cluster with 
> replication 
> > factor of 3, though fsck reports lots of under-replicated blocks:
> > 
> >   
> >   CORRUPT FILES:3355
> >   MISSING BLOCKS:   3462
> >   MISSING SIZE: 17708821225 B
> >   
> > Minimally replicated blocks:   28802 (89.269775 %)
> > Over-replicated blocks:0 (0.0 %)
> > Under-replicated blocks:   17025 (52.76779 %)
> > Mis-replicated blocks: 0 (0.0 %)
> > Default replication factor:3
> > Average block replication: 1.7750744
> > Missing replicas:  17025 (29.727087 %)
> > Number of data-nodes:  4
> > Number of racks:   1
> > 
> > 
> > The filesystem under path '/' is CORRUPT
> > 
> > 
> > What can one do at this point to save the data?  If I run bin/hadoop fsck 
> -move 
> > or -delete will I lose some of the data?  Or will I simply end up with 
> > fewer 
> > block replicas and will thus have to force re-balancing in order to get 
> > back 
> to 
> > a "safe" number of replicas?
> > 
> > Thanks,
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch