Re: nutch-0.8-dev mapred.input.subdir problem ?

Paul Baclace Wed, 21 Dec 2005 14:56:23 -0800

You can ignore mapred.input.subdir; I find that it is not needed.

Now that the mapred branch has been merged into the trunk, the
documentation needs to be clarified: a change was made so that the
input is specified as a directory, and all files in that directory
are treated as input files (no wildcard needed).  I will put that on
my ToDo list.

mapred.input.dir is an abstract path that refers to either the local
OS filesystem or NDFS, depending on which is in use: if fs.default.name
is "local", the local OS filesystem is used; otherwise fs.default.name
is something like domainOfMyMasterNode:port.
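
As a rough illustration of that rule (not Nutch's actual code), here is a small bash sketch. `which_fs` is a hypothetical helper name; it only mirrors the "local" versus host:port distinction described above and does not read any Nutch configuration file.

```shell
# Hypothetical helper (not part of Nutch): report which filesystem an
# input path such as mapred.input.dir would refer to, given the value
# of fs.default.name, following the rule described in this post.
which_fs() {
  local fs_name=$1   # value of fs.default.name
  local path=$2      # e.g. the value of mapred.input.dir
  if [ "$fs_name" = "local" ]; then
    echo "$path on the local OS filesystem"
  else
    # Any other value is treated as the NDFS master node's host:port.
    echo "$path on NDFS at $fs_name"
  fi
}
```

For example, `which_fs local /urls` prints "/urls on the local OS filesystem", while `which_fs domainOfMyMasterNode:port /urls` prints "/urls on NDFS at domainOfMyMasterNode:port".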

To use NDFS, you need to copy your input file(s) from your local fs to NDFS:

  bin/nutch ndfs -put /home/peb/urls_localfs/oneFILENAME  /urls

The destination path "/urls" is arbitrary and is created as a side effect
of the file -put.  Repeat this for each file you have.
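
If you have many input files, the per-file -put can be scripted. The following is a minimal bash sketch, assuming bin/nutch is run from the Nutch install directory. `put_all_urls` and the NUTCH override variable are hypothetical. Writing each file to /urls/<filename> instead of to /urls itself is also an assumption, chosen so that the files end up together in the /urls input directory. The sketch further assumes that -put creates /urls if it is missing, as described above.

```shell
# Hypothetical helper (not part of Nutch): copy every regular file in a
# local directory into one NDFS input directory, one "ndfs -put" per file.
# Assumes bin/nutch is reachable from the current directory; set NUTCH to
# override the command.
put_all_urls() {
  local src_dir=$1    # local directory holding the URL files
  local dest_dir=$2   # NDFS input directory, e.g. /urls
  local f
  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
    # Stop at the first copy that fails.
    "${NUTCH:-bin/nutch}" ndfs -put "$f" "$dest_dir/$(basename "$f")" || return 1
  done
}
```

For example, `put_all_urls /home/peb/urls_localfs /urls` would issue one `bin/nutch ndfs -put` per file, after which /urls can be given as the input directory.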

Paul

Lukas Vlcek wrote:
> java.io.IOException: No input directories specified in: NutchConf:
> nutch-default.xml , mapred-default.xml ,
> /home/lukas/nutch/mapred/local/localRunner/job_4zwds6.xml ,
> nutch-site.xml
>         at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
>         at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
>         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
> 051220 204249 Running job: job_4zwds6
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:101)
