Do I need to change any configuration besides setting the default file system to the local file system? I am trying to pass a file, for example input.txt, as the input to a map job.
input.txt will contain file locations, as follows:

file://path/abc1.doc
file://path/abc2.doc
...

The map program will read each line from input.txt and process the file it names. Do I need to change any configuration for this? This is similar to how Nutch crawls.

Any feedback would be appreciated.

Thanks

On Tue, Oct 13, 2009 at 6:49 AM, Jeff Zhang <[email protected]> wrote:

> Maybe you could debug your MapReduce job in Eclipse, since you run it in
> local mode.
>
> On Tue, Oct 13, 2009 at 5:56 AM, Chandan Tamrakar <[email protected]> wrote:
>
> > We are trying to read files from the local file system, but when running
> > the MapReduce job it is not able to read files from the input location
> > (the input location is also on the local file system).
> >
> > For this we changed the configuration in hadoop-site.xml as shown below:
> >
> > /etc/conf/hadoop/hadoop-site.xml
> >
> > <property>
> >   <name>fs.default.name</name>
> >   <value>file:///</value>
> > </property>
> >
> > [ad...@localhost ~]$ hadoop jar Test.jar /home/admin/input/test.txt output1
> >
> > Suppose test.txt is a plain text file that contains
> > Test1
> > Test2
> > Test3
> >
> > While running a simple MapReduce job we get the following "file not found"
> > exception. We are using TextInputFormat in our job configuration.
> >
> > 09/10/13 17:26:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 09/10/13 17:26:35 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 09/10/13 17:26:35 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 09/10/13 17:26:37 INFO mapred.JobClient: Running job: job_200910131447_0033
> > 09/10/13 17:26:38 INFO mapred.JobClient: map 0% reduce 0%
> > 09/10/13 17:27:00 INFO mapred.JobClient: Task Id : attempt_200910131447_0033_m_000000_0, Status : FAILED
> > java.io.FileNotFoundException: File file:/home/admin/Desktop/input/test.txt does not exist.
> >     at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
> >     at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:259)
> >     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:117)
> >     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:275)
> >     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:364)
> >     at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:206)
> >     at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:50)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
> >
> > However, running the code below from a separate main method works well.
> >
> > public static void main (String [] args) throws IOException {
> >
> >     Configuration conf = new Configuration();
> >     FileSystem fs = FileSystem.get(conf);
> >
> >     Path filenamePath = new Path(theFilename);
> >     FSDataOutputStream out = fs.create(new Path("abc.txt"));
> >     out.writeUTF("abc");
> >     out.close();
> >
> > }
> >
> > The above code works fine when run as a jar in Hadoop. It successfully
> > creates the file /home/admin/abc.txt when run as the admin user.

--
Chandan Tamrakar
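
A file: path is resolved on whichever machine runs the task, so one thing that may be worth checking before submitting the job is whether every entry in input.txt is a well-formed file: URI that is readable on that host. The following is only a rough sketch in plain Java with no Hadoop dependency; the class name LocalFileListCheck and its methods are made up for illustration and are not part of Hadoop or Nutch. It also shows that an entry written as file://path/abc1.doc is parsed by java.net.URI with "path" as the authority (host) rather than as a directory, whereas file:///path/abc1.doc has an empty authority.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Hadoop class: checks the entries of a list
// file such as input.txt on the machine where it is executed.
public class LocalFileListCheck {

    /** Returns the local path for one list entry, or null if it is not a usable file: URI. */
    public static Path toLocalPath(String line) {
        try {
            URI uri = new URI(line.trim());
            if (!"file".equals(uri.getScheme())) {
                return null; // e.g. hdfs://... or a bare path without a scheme
            }
            if (uri.getAuthority() != null) {
                return null; // file://path/abc1.doc: "path" is parsed as the host
            }
            return Paths.get(uri); // accepts file:/abs/path and file:///abs/path
        } catch (URISyntaxException | IllegalArgumentException e) {
            return null; // malformed or non-hierarchical URI
        }
    }

    /** Returns every non-blank entry of the list file that cannot be read on this host. */
    public static List<String> findUnreadable(Path listFile) throws IOException {
        List<String> bad = new ArrayList<>();
        for (String line : Files.readAllLines(listFile, StandardCharsets.UTF_8)) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Path p = toLocalPath(line);
            if (p == null || !Files.isReadable(p)) {
                bad.add(line.trim());
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the list file, e.g. input.txt
        for (String entry : findUnreadable(Paths.get(args[0]))) {
            System.out.println("not readable on this host: " + entry);
        }
    }
}
```

Running it as "java LocalFileListCheck input.txt" under the same user account that runs the tasks prints the entries that user cannot read. A check like this only covers the host it is run on.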
