Hmm... I just removed the “crawl” directory (the output directory) from the
command and it works! I’m storing the output in a Cassandra cluster using Gora
anyway, so I don’t think I want to store it on HDFS :)
--
Manikandan Saravanan
Architect - Technology
TheSocialPeople
On 4 January 2014 at 11:0
Can you pastebin the stack trace involving the NPE?
Thanks
On Jan 4, 2014, at 9:25 AM, Manikandan Saravanan wrote:
> Hi,
>
> I’m trying to run Nutch 2.2.1 on a 2-node Hadoop cluster. My Hadoop cluster
> is running fine and I’ve successfully added the input and output directories
> to HDFS
Hi,
I’m trying to run Nutch 2.2.1 on a 2-node Hadoop cluster. My Hadoop cluster is
running fine and I’ve successfully added the input and output directories to
HDFS. But when I run
$HADOOP_HOME/bin/hadoop jar /nutch/apache-nutch-2.2.1.job
org.apache.nutch.crawl.Crawler urls -dir crawl -depth