John Martyniak wrote:
Andrzej,
I am a little embarrassed to ask, but is there a setup guide for
setting up Hadoop for Nutch 1.0, or is it the same process as setting up
for Nutch 0.17 (which I think is the existing guide out there)?
Basically, yes - but this guide is primarily about setting up a Hadoop
cluster using the Hadoop pieces distributed with Nutch. As such, these
instructions are already slightly outdated. So it's best simply to
install a clean Hadoop 0.19.1 according to the instructions on the Hadoop
wiki, and then build the nutch*.job file separately.
Also, I already have Hadoop running for some other applications, not
associated with Nutch. Can I use the same install? I think that it is
the same version that Nutch 1.0 uses. Or is it just easier to set it up
using the Nutch config?
Yes, it's perfectly ok to use Nutch with an existing Hadoop cluster of
the same vintage (which is 0.19.1 in Nutch 1.0). In fact, I would
strongly recommend this approach, instead of the usual "dirty" way of
setting up Nutch by replicating the local build dir ;)
Just specify the nutch*.job file like this:
bin/hadoop jar nutch*.job <className> <args ..>
where className and args are the class name and arguments of one of the
Nutch command-line tools. You can also slightly modify the bin/nutch
script, so that you don't have to specify fully-qualified class names.
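
As a rough sketch of that kind of modification (this is not the actual
bin/nutch script, just an illustration): the helper names nutch_class and
nutch_run, the DRY_RUN switch, and the default values of HADOOP_HOME and
NUTCH_JOB are all assumptions made for this example. Only a handful of
tools are mapped; anything else is passed through unchanged, so a
fully-qualified class name still works.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of Nutch or Hadoop.
# HADOOP_HOME and NUTCH_JOB are assumed defaults; adjust to your install.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
NUTCH_JOB="${NUTCH_JOB:-nutch-1.0.job}"

# Print the class name for a short tool name.
# Unknown names are echoed back as-is.
nutch_class() {
  case "$1" in
    inject)   echo org.apache.nutch.crawl.Injector ;;
    generate) echo org.apache.nutch.crawl.Generator ;;
    fetch)    echo org.apache.nutch.fetcher.Fetcher ;;
    parse)    echo org.apache.nutch.parse.ParseSegment ;;
    updatedb) echo org.apache.nutch.crawl.CrawlDb ;;
    *)        echo "$1" ;;
  esac
}

# Submit a tool to the existing cluster via "hadoop jar".
# With DRY_RUN=1 the command line is printed instead of executed.
nutch_run() {
  if [ "$#" -lt 1 ]; then
    echo "usage: nutch_run <command|className> [args ...]" >&2
    return 1
  fi
  cmd="$1"
  shift
  class=$(nutch_class "$cmd")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$HADOOP_HOME/bin/hadoop jar $NUTCH_JOB $class $*"
  else
    "$HADOOP_HOME/bin/hadoop" jar "$NUTCH_JOB" "$class" "$@"
  fi
}
```

With these functions sourced into a shell, something like
"nutch_run inject crawl/crawldb urls" would expand to the same
bin/hadoop jar invocation shown above, with the class name filled in.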
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com