MilleBii wrote:
HEEEELLLLLPPP !!!
Stuck for 3 days, unable to start any Nutch job.
HDFS works fine, i.e. I can put and look at files.
When I start a Nutch crawl, I get the following error:
Job initialization failed:
java.lang.IllegalArgumentException: Pathname
/d:/Bii/nutch/logs/history/user/_logs/history/localhost_1245788245191_job_200906232217_0001_pc-xxxx%5Cxxxx_inject+urls
It is looking for the file at the wrong location ???? Indeed, in my case the
correct location is /d:/Bii/nutch/logs/history, so why is
*"history/user/_logs"* added, and how can I fix that?
2009/6/21 MilleBii <mille...@gmail.com>
Looks like I just needed to transfer from the local filesystem to HDFS:
Is it safe to transfer a crawl directory (and its subdirectories) from the
local filesystem to HDFS and start crawling again?
1. hadoop fs -put crawl crawl
2. nutch generate crawl/crawldb crawl/segments -topN 500 (which should now
use HDFS)
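For what it's worth, the reason step 1 works with bare names is that Hadoop resolves a relative path against the current user's HDFS home directory, /user/&lt;name&gt;. A minimal sketch of that resolution rule (the function name and username here are illustrative, not Hadoop API):

```python
import posixpath

def resolve_hdfs_path(path, user):
    # Hadoop resolves relative paths against the user's HDFS home,
    # /user/<name>; absolute paths are taken as-is. (Sketch only.)
    home = posixpath.join("/user", user)
    return path if path.startswith("/") else posixpath.join(home, path)

print(resolve_hdfs_path("crawl", "millebii"))   # /user/millebii/crawl
print(resolve_hdfs_path("/crawl", "millebii"))  # /crawl
```

So `hadoop fs -put crawl crawl` actually writes to /user/&lt;you&gt;/crawl, and the later `nutch generate crawl/...` resolves to the same place; an absolute path like /crawl would bypass the home directory entirely.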
-MilleBii-
2009/6/21 MilleBii <mille...@gmail.com>
I have newly installed Hadoop in a distributed single-node configuration.
When I run Nutch commands, they look for files in my user home directory
and not in the Nutch directory.
How can I change this?
I suspect your hadoop-site.xml uses a relative path somewhere, and not an
absolute path (with a leading slash). Also, /d: looks suspiciously like a
Windows pathname, in which case you should either use a full URI
(file:///d:/) or just the disk name d:/ without the leading slash.
Please also note that if you are running this on Windows under cygwin
then in your config files you MUST NOT use the cygwin paths (like
/cygdrive/d/...) because Java can't see them.
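To illustrate the advice above, a hadoop-site.xml fragment with absolute, non-cygwin paths might look like this (the property names are standard for Hadoop of that era; the port and the d:/ directory are assumptions, not taken from the original thread):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- absolute path; on Windows use d:/... or a file:/// URI,
         never a cygwin path like /cygdrive/d/... -->
    <value>d:/Bii/hadoop-tmp</value>
  </property>
</configuration>
```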
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com