Re: Reading files from local file system

Chandan Tamrakar Tue, 13 Oct 2009 09:06:27 -0700

Do I need to change any configuration besides changing the default file
system to the local file system?
I am trying to pass a file, for example input.txt, as input to a map job.


input.txt will contain file locations, as follows:

file://path/abc1.doc
file://path/abc2.doc
..
...

The map program will read each line from input.txt and process the file it names.
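
A minimal sketch, in plain Java with no Hadoop classes, of how each line of
input.txt could be turned into a local path before the map code opens it.
The class and method names (InputListParser, localPathOf) are hypothetical
and only for illustration. One thing it shows: java.net.URI parses
file://path/abc1.doc with "path" as the authority (host) and /abc1.doc as
the path, so a local file is normally written with three slashes, as in
file:///path/abc1.doc.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop or of the original job.
class InputListParser {

    /**
     * Returns the local path named by one line of input.txt,
     * or null if the line is not a usable local file: URI.
     */
    public static String localPathOf(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        try {
            URI uri = new URI(trimmed);
            // Only accept the file: scheme.
            if (!"file".equals(uri.getScheme())) {
                return null;
            }
            // file://path/abc1.doc has authority "path"; treat that as
            // "not a local file" rather than guessing what was meant.
            if (uri.getAuthority() != null) {
                return null;
            }
            return uri.getPath();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(localPathOf("file:///path/abc1.doc"));
        System.out.println(localPathOf("file://path/abc1.doc"));
    }
}
```

In a real mapper the returned path would then be opened by whatever code
processes the document; that part is left out here.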

Do I need to change any configuration? This is similar to how Nutch crawls.

Any feedback would be appreciated.

thanks



On Tue, Oct 13, 2009 at 6:49 AM, Jeff Zhang <[email protected]> wrote:

> Maybe you could debug your mapreduce job in eclipse, since you run it in
> local mode.
>
>
>
> On Tue, Oct 13, 2009 at 5:56 AM, Chandan Tamrakar <
> [email protected]> wrote:
>
> >
> >
> > We are trying to read files from the local file system, but when we run
> > the MapReduce job it cannot read files from the input location (the
> > input location is also on the local file system).
> >
> > For this we changed the configuration in hadoop-site.xml as shown
> > below:
> >
> > /etc/conf/hadoop/hadoop-site.xml
> >
> > <property>
> >    <name>fs.default.name</name>
> >    <value>file:///</value>
> >  </property>
> >
> >
> >  [ad...@localhost ~]$ hadoop jar Test.jar /home/admin/input/test.txt output1
> >
> > Suppose Test.txt is a plain text file that contains
> > Test1
> > Test2
> > Test3
> >
> >
> > While running a simple MapReduce job we get the following
> > FileNotFoundException. We are using TextInputFormat in our job
> > configuration.
> >
> >
> > 09/10/13 17:26:35 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 09/10/13 17:26:35 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 09/10/13 17:26:35 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 09/10/13 17:26:37 INFO mapred.JobClient: Running job: job_200910131447_0033
> > 09/10/13 17:26:38 INFO mapred.JobClient:  map 0% reduce 0%
> > 09/10/13 17:27:00 INFO mapred.JobClient: Task Id : attempt_200910131447_0033_m_000000_0, Status : FAILED
> > java.io.FileNotFoundException: File file:/home/admin/Desktop/input/test.txt does not exist.
> >        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
> >        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:259)
> >        at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:117)
> >        at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:275)
> >        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:364)
> >        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:206)
> >        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:50)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> >        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)
> >
> > However, running the following code from a separate main method works well.
> >
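> > import java.io.IOException;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.FSDataOutputStream;
> > import org.apache.hadoop.fs.FileSystem;
> > import org.apache.hadoop.fs.Path;
> >
> > public static void main (String [] args) throws IOException {
> >
> >     // Picks up fs.default.name from the Hadoop configuration on the classpath.
> >     Configuration conf = new Configuration();
> >     FileSystem fs = FileSystem.get(conf);
> >
> >     // A relative path resolves against the file system's working directory.
> >     FSDataOutputStream out = fs.create(new Path("abc.txt"));
> >     out.writeUTF("abc");
> >     out.close();
> >
> > }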
> >
> > The above code works fine when run as a jar with hadoop. It
> > successfully creates the file /home/admin/abc.txt when run as the
> > admin user.
> >
> >
>



-- 
Chandan Tamrakar
