slitz wrote:
I've read in the archive that it should be possible to use any distributed
filesystem, since the data is available to all nodes, so it should be
possible to use NFS, right?
I've also read somewhere in the archive that this should be possible...


As far as I know, you can refer to any file on a mounted filesystem (visible
from all compute nodes) by putting the prefix file:// before the full path,
unless another prefix has been specified.
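
For example (untested, and assuming the share is mounted at
/home/slitz/warehouse on every node, as in your setup), the grep example
from the message below could be pointed at the share with explicit file://
URIs:

  bin/hadoop jar hadoop-*-examples.jar grep \
      file:///home/slitz/warehouse/input \
      file:///home/slitz/warehouse/output 'dfs[a-z.]+'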

Cheers,
Luca


slitz


On Fri, Apr 11, 2008 at 1:43 PM, Peeyush Bishnoi <[EMAIL PROTECTED]>
wrote:

Hello,

To execute a Hadoop Map-Reduce job, the input data should be on HDFS, not on
NFS.

Thanks

---
Peeyush



On Fri, 2008-04-11 at 12:40 +0100, slitz wrote:

Hello,
I'm trying to assemble a simple setup of 3 nodes using NFS as the
distributed filesystem.

Box A: 192.168.2.3, this box is both the NFS server and a slave node
Box B: 192.168.2.30, this box runs only the JobTracker
Box C: 192.168.2.31, this box is only a slave

All three nodes can access the NFS share, and the path to the share is
/home/slitz/warehouse on all three.
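
On boxes B and C the share is mounted with something like the following (the
exact export path on box A and the mount options may differ):

  mount -t nfs 192.168.2.3:/home/slitz/warehouse /home/slitz/warehouse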

My hadoop-site.xml file was copied to all nodes and looks like this:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>local</value>
    <description>
      The name of the default file system. Either the literal string
      "local" or a host:port for NDFS.
    </description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.2.30:9001</value>
    <description>
      The host and port that the MapReduce job tracker runs at. If "local",
      then jobs are run in-process as a single map and reduce task.
    </description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/slitz/warehouse/hadoop_service/system</value>
    <description>omgrotfcopterlol.</description>
  </property>
</configuration>


As one can see, I'm not using HDFS at all.
(Because all the free space I have is located on a single node, using HDFS
would be unnecessary overhead.)

I've copied the input folder from hadoop to /home/slitz/warehouse/input.
When I try to run the example line

bin/hadoop jar hadoop-*-examples.jar grep /home/slitz/warehouse/input/ /home/slitz/warehouse/output 'dfs[a-z.]+'

the job starts and finishes okay, but at the end I get this error:

org.apache.hadoop.mapred.InvalidInputException: Input path doesn't exist:
/home/slitz/hadoop-0.15.3/grep-temp-141595661
        at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:154)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:508)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
(...the error stack continues...)

I don't know why the input path being looked up is under the local path
/home/slitz/hadoop(...) instead of /home/slitz/warehouse/(...)

Maybe something is missing in my hadoop-site.xml?



slitz


