Thank you for the file:/// tip, I was not including it in the paths. I'm running the example with this line:

bin/hadoop jar hadoop-*-examples.jar grep file:///home/slitz/warehouse/input file:///home/slitz/warehouse/output 'dfs[a-z.]+'
But I'm getting the same error as before:

org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : /home/slitz/hadoop-0.15.3/grep-temp-1030179831
        at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:154)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:508)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
(...stack continues...)

I think the problem may be the input path; it should be pointing to some path on the NFS share, right? The grep-temp-* dir is being created in the HADOOP_HOME of Box A (192.168.2.3).

slitz

On Fri, Apr 11, 2008 at 4:06 PM, Luca <[EMAIL PROTECTED]> wrote:
> slitz wrote:
> > I've read in the archive that it should be possible to use any
> > distributed filesystem since the data is available to all nodes, so it
> > should be possible to use NFS, right? I've also read somewhere in the
> > archive that this should be possible...
>
> As far as I know, you can refer to any file on a mounted file system
> (visible from all compute nodes) using the prefix file:// before the full
> path, unless another prefix has been specified.
>
> Cheers,
> Luca
>
> > slitz
> >
> > On Fri, Apr 11, 2008 at 1:43 PM, Peeyush Bishnoi <[EMAIL PROTECTED]>
> > wrote:
> > > Hello,
> > >
> > > To execute a Hadoop Map-Reduce job, input data should be on HDFS, not
> > > on NFS.
> > >
> > > Thanks
> > >
> > > ---
> > > Peeyush
> > >
> > > On Fri, 2008-04-11 at 12:40 +0100, slitz wrote:
> > > > Hello,
> > > > I'm trying to assemble a simple setup of 3 nodes using NFS as a
> > > > Distributed Filesystem.
> > > > Box A: 192.168.2.3, this box is both the NFS server and a slave node
> > > > Box B: 192.168.2.30, this box is only the JobTracker
> > > > Box C: 192.168.2.31, this box is only a slave
> > > >
> > > > Obviously all three nodes can access the NFS share, and the path to
> > > > the share is /home/slitz/warehouse on all three.
> > > >
> > > > My hadoop-site.xml file was copied over all nodes and looks like
> > > > this:
> > > >
> > > > <configuration>
> > > >   <property>
> > > >     <name>fs.default.name</name>
> > > >     <value>local</value>
> > > >     <description>
> > > >       The name of the default file system. Either the literal string
> > > >       "local" or a host:port for NDFS.
> > > >     </description>
> > > >   </property>
> > > >   <property>
> > > >     <name>mapred.job.tracker</name>
> > > >     <value>192.168.2.30:9001</value>
> > > >     <description>
> > > >       The host and port that the MapReduce job tracker runs at. If
> > > >       "local", then jobs are run in-process as a single map and
> > > >       reduce task.
> > > >     </description>
> > > >   </property>
> > > >   <property>
> > > >     <name>mapred.system.dir</name>
> > > >     <value>/home/slitz/warehouse/hadoop_service/system</value>
> > > >     <description>omgrotfcopterlol.</description>
> > > >   </property>
> > > > </configuration>
> > > >
> > > > As one can see, I'm not using HDFS at all. (Because all the free
> > > > space I have is located on only one node, using HDFS would be
> > > > unnecessary overhead.)
> > > >
> > > > I've copied the input folder from hadoop to
> > > > /home/slitz/warehouse/input.
> > > > When I try to run the example line
> > > >
> > > > bin/hadoop jar hadoop-*-examples.jar grep /home/slitz/warehouse/input/
> > > > /home/slitz/warehouse/output 'dfs[a-z.]+'
> > > >
> > > > the job starts and finishes okay, but at the end I get this error:
> > > >
> > > > org.apache.hadoop.mapred.InvalidInputException: Input path doesn't exist :
> > > > /home/slitz/hadoop-0.15.3/grep-temp-141595661
> > > >         at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:154)
> > > >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:508)
> > > >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
> > > > (...the error stack continues...)
> > > >
> > > > I don't know why the input path being looked up is the local path
> > > > /home/slitz/hadoop(...) instead of /home/slitz/warehouse/(...)
> > > >
> > > > Maybe something is missing in my hadoop-site.xml?
> > > >
> > > > slitz
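The grep-temp-* path in both stack traces points at a detail worth noting: the grep example writes its intermediate output to a relative path, which Hadoop resolves against the default filesystem's working directory. With fs.default.name set to "local", that resolves to a plain local directory on the submitting box, which the other nodes cannot see. One possible adjustment (a sketch only, assuming Hadoop 0.15 accepts a file:/// URI for this property, and that /home/slitz/warehouse is the shared mount) is to point the default filesystem at the NFS share instead:

```xml
<!-- Sketch, not a verified fix: make the shared NFS mount the default
     filesystem so relative paths (like grep-temp-*) resolve under it
     and are visible from every node. -->
<property>
  <name>fs.default.name</name>
  <value>file:///home/slitz/warehouse/</value>
</property>
```

With this in place, relative job paths would be created under /home/slitz/warehouse rather than under HADOOP_HOME on the submitting machine; whether that fully resolves the error on this version is untested here.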