Hi Diman,

If your local file is named urlsdir.txt then you should use that exact filename in the put command, or rename urlsdir.txt to urlsdir and then run the command as before.
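A minimal sketch of that second option, assuming Hadoop 1.x-style commands and that urlsdir.txt sits in your current local directory (the final -ls just verifies the upload actually reached HDFS):

# mv urlsdir.txt urlsdir
# $HADOOP_HOME/bin/hadoop dfs -put urlsdir urlsdir
# $HADOOP_HOME/bin/hadoop dfs -ls urlsdir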
Also, when issuing the crawl command, make sure that urlsdir is indeed stored in HDFS.

Regards,
-Stavros.

________________________________________
From: Diman Tootaghaj [[email protected]]
Sent: Saturday, October 26, 2013 12:42 AM
To: [email protected]
Subject: Web Search (problem with put urlsdir and starting crawl)

Hi everybody,

I have a problem running web search. When I run

# $HADOOP_HOME/bin/hadoop dfs -put urlsdir urlsdir

it says "file urlsdir does not exist", so I tried using

# $HADOOP_HOME/bin/hadoop dfs -put urlsdir.txt urlsdir.txt

Is that correct?

The other problem is that I can't start the Nutch crawl using the following command:

# $HADOOP_HOME/bin/nutch crawl urlsdir -dir crawl -depth 3

It hangs forever after printing the following line:

Injector: converting injected urls to crawl db entries.

Has anybody encountered the same problem? Which ports do you use for fs.default.name? Shall we use 9000 and 9001 for the master node and job tracker (like hdfs://127.0.0.1:9000 and 127.0.0.1:9001)?

Thanks a lot.
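For reference, the setup described in that last question would look roughly like this in the Hadoop 1.x configuration files (a sketch only; fs.default.name and mapred.job.tracker are the standard 1.x property names, and 9000/9001 are simply the values from the question, not mandated defaults, so any free ports work):

In conf/core-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>

In conf/mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>127.0.0.1:9001</value>
</property>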
