MilleBii wrote:
HEEEELLLLLPPP !!!
Stuck for 3 days, unable to start any Nutch job.
HDFS works fine, i.e. I can put and look at files.
When I start a Nutch crawl, I get the following error:
Job initialization failed:
java.lang.IllegalArgumentException: Pathname
/d:/Bii/nutch/logs/history/user/_logs/history/localhost_1245788245191_job_200906232217_0001_pc-xxxx%5Cxxxx_inject+urls
It is looking for the file at the wrong location ???? Indeed, in my case the
correct location is /d:/Bii/nutch/logs/history, so why is
*"history/user/_logs"* added, and how can I fix that?
2009/6/21 MilleBii <mille...@gmail.com>
Looks like I just needed to transfer from the local filesystem to HDFS:
Is it safe to transfer a crawl directory (and its subdirectories) from the
local filesystem to HDFS and start crawling again?
1. hadoop fs -put crawl crawl
2. nutch generate crawl/crawldb crawl/segments -topN 500 (which should now
use HDFS)
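For what it's worth, the reason step 1 works with bare names is that Hadoop resolves a relative path against the current user's HDFS home directory, /user/&lt;name&gt;. A minimal sketch of that resolution rule (the function name and username here are illustrative, not Hadoop API):

```python
import posixpath

def resolve_hdfs_path(path, user):
    # Hadoop resolves relative paths against the user's HDFS home,
    # /user/<name>; absolute paths are taken as-is. (Sketch only.)
    home = posixpath.join("/user", user)
    return path if path.startswith("/") else posixpath.join(home, path)

print(resolve_hdfs_path("crawl", "millebii"))   # /user/millebii/crawl
print(resolve_hdfs_path("/crawl", "millebii"))  # /crawl
```

So `hadoop fs -put crawl crawl` actually writes to /user/&lt;you&gt;/crawl, and the later `nutch generate crawl/...` resolves to the same place; an absolute path like /crawl would bypass the home directory entirely.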
-MilleBii-
2009/6/21 MilleBii <mille...@gmail.com>
I have newly installed Hadoop in a distributed single-node configuration.
When I run Nutch commands, they look for files in my user home directory
and not in the Nutch directory.
How can I change this?
I suspect your hadoop-site.xml uses a relative path somewhere, and not an
absolute path (with a leading slash). Also, /d: looks suspiciously like a
Windows pathname, in which case you should either use a full URI
(file:///d:/) or just the disk name d:/ without the leading slash.
Please also note that if you are running this on Windows under cygwin
then in your config files you MUST NOT use the cygwin paths (like
/cygdrive/d/...) because Java can't see them.
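To illustrate the advice above, a hadoop-site.xml fragment with absolute, non-cygwin paths might look like this (the property names are standard for Hadoop of that era; the port and the d:/ directory are assumptions, not taken from the original thread):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- absolute path; on Windows use d:/... or a file:/// URI,
         never a cygwin path like /cygdrive/d/... -->
    <value>d:/Bii/hadoop-tmp</value>
  </property>
</configuration>
```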
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com