Hadoop doesn't find the input file

2014-01-04 Thread Manikandan Saravanan
Hi,

I’m trying to run Nutch 2.2.1 on a 2-node Hadoop cluster. The cluster is 
running fine and I’ve successfully added the input and output directories to 
HDFS. But when I run

$HADOOP_HOME/bin/hadoop jar /nutch/apache-nutch-2.2.1.job 
org.apache.nutch.crawl.Crawler urls -dir crawl -depth 3 -topN 5

I’m getting something like:

INFO input.FileInputFormat: Total input paths to process : 0

As I understand it, this means that Hadoop cannot locate the input files. 
Unsurprisingly, the job then ends with a null pointer exception. Can someone 
help me out?

-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople
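
One way to confirm the symptom described above is to search the saved job output for the zero-input line. This is only a minimal sketch: the function name `no_input_paths` and the log file name `job.log` are hypothetical and are not part of Hadoop or Nutch.

```shell
# Hypothetical helper: exits 0 when the given log file contains the
# FileInputFormat line reporting zero input paths, non-zero otherwise.
no_input_paths() {
  grep -qF 'Total input paths to process : 0' "$1"
}
```

Assuming the job output was saved to `job.log`, running `no_input_paths job.log && $HADOOP_HOME/bin/hadoop fs -ls urls` would list the `urls` directory on HDFS only when the zero-input line is present, which shows whether the seed files are where the job expects them.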

Re: Hadoop doesn't find the input file

2014-01-04 Thread Ted Yu
Can you pastebin the stack trace involving the NPE?

Thanks



Re: Hadoop doesn't find the input file

2014-01-04 Thread Manikandan Saravanan
Hmm... I just removed the “crawl” directory (the output directory) from the 
command and it works! I’m storing the output in a Cassandra cluster using Gora 
anyway, so I don’t think I need to store it on HDFS :)
-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople
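
For readers following along, the command after that change might look like the sketch below. It assumes that "removed the crawl directory" means dropping the `-dir crawl` argument, since the exact working command was not posted. The wrapper name `run_crawl` and the `DRY_RUN` variable are hypothetical and exist only so that the command can be printed without a cluster.

```shell
# Assumption: the fix was to drop "-dir crawl" from the original command.
# run_crawl is a hypothetical wrapper; with DRY_RUN=1 it prints the command
# instead of executing it.
run_crawl() {
  set -- "$HADOOP_HOME/bin/hadoop" jar /nutch/apache-nutch-2.2.1.job \
    org.apache.nutch.crawl.Crawler urls -depth 3 -topN 5
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}
```

Calling `run_crawl` with `DRY_RUN=1` set shows the full command line first, which is a cheap check before submitting the job.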
