Regarding:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg08854.html
I too want to run multiple nutch instances.
I have a two CPU (with two cores each) development box on which to develop my
search application. I have installed a nightly build of nutch. Currently that
installation is working on a crawl that will take many days to complete. In
the meantime, I want to be able to run some other tests. At this stage I'm
more interested in the whole crawl cycle: inject, generate, fetch, updatedb,
invertlinks, index.
I'm less interested in search for now.
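
To make the idea of a separate short crawl concrete, here is a minimal Python sketch that only builds (it does not run) the command lines for one crawl cycle against a given installation and crawl directory. The helper name crawl_cycle_commands, the /data/... paths and the "<segment>" placeholder are my own assumptions for illustration. The argument order follows the 0.8/0.9-era whole-web crawl tutorial as I remember it, so check it against the usage output of bin/nutch in the build you actually install.

```python
from pathlib import PurePosixPath


def crawl_cycle_commands(nutch_home, crawl_dir, url_dir, segment="<segment>"):
    """Return the argument lists for one inject..index cycle.

    `segment` is a placeholder: the real segment name is created by the
    generate step at run time and has to be filled in afterwards.
    """
    nutch = str(PurePosixPath(nutch_home) / "bin" / "nutch")
    crawl = PurePosixPath(crawl_dir)
    crawldb = str(crawl / "crawldb")
    linkdb = str(crawl / "linkdb")
    segments = str(crawl / "segments")
    seg = str(crawl / "segments" / segment)
    return [
        [nutch, "inject", crawldb, str(url_dir)],
        [nutch, "generate", crawldb, segments],
        [nutch, "fetch", seg],
        [nutch, "updatedb", crawldb, seg],
        [nutch, "invertlinks", linkdb, "-dir", segments],
        [nutch, "index", str(crawl / "indexes"), crawldb, linkdb, seg],
    ]


if __name__ == "__main__":
    # Hypothetical paths: a second installation with its own crawl directory.
    for cmd in crawl_cycle_commands(
        "/usr/local/new_nutch", "/data/test_crawl", "/data/urls"
    ):
        print(" ".join(cmd))
```

The point of passing both the installation directory and the crawl directory is that each instance then reads and writes only its own crawldb, linkdb, segments and indexes.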
So for instance, I'd like to install an even more recent nightly build, then
run some short crawls with it. Maybe I'd like to have another version of nutch
that I hack up. I'd want to play with it even as one of the other instances is
running a crawl.
My current installation is in:
/usr/local/nutch-2007-06-27_06-52-44
I've also noticed that the log file hadoop.log gets created here:
/var/tmp/nutch-2007-06-27_06-52-44
Other than these I haven't seen any environment variables or other global
properties that might conflict. So it seems I could just install to
/usr/local/new_nutch
and I presume that this would be created:
/var/tmp/new_nutch
Some other discussions relating to this subject are here:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg04838.html
As for different setups for different Nutch instances, I think you
could have multiple installations on your server, where each instance
has its own conf directory (with its own config files) and the
source code is shared via symbolic link.
http://www.mail-archive.com/[EMAIL PROTECTED]/msg02138.html
Running multiple Nutch instances on one box is possible but difficult.
The problem is that Tomcat and Nutch (0.8 map reduce / ndfs)
use a set of TCP ports that are already taken if another
unix user is already running Nutch.
The best way to go is to first use Subversion or CVS as a
centralized repository for your customized code; then all
developers can share code and work together on the same code
base. Besides that, each developer should run a tiny test
instance of Nutch on her development machine. In the end it is a
good idea to have a script that checks out the code from the
repository once a day, runs a test suite and deploys the code
on your 'big' server.
http://cruisecontrol.sourceforge.net/ is a helpful tool.
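
Since the objection quoted above is about TCP ports already being taken, a quick check before starting a second instance might look like the sketch below. It only tests whether a local port can be bound right now. Which ports a given Nutch, Hadoop or Tomcat setup actually uses depends on its configuration; 8080 and 8081 are just the Tomcat ports mentioned elsewhere in this thread, and the helper name is_port_free is my own.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if `port` on `host` can be bound at this moment."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
        return True


if __name__ == "__main__":
    for port in (8080, 8081):
        state = "free" if is_port_free(port) else "in use"
        print(f"port {port}: {state}")
```

This is only a snapshot: a port that is free now can still be taken by another process before the second instance starts.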
http://www.mail-archive.com/[EMAIL PROTECTED]/msg05061.html
Q: Let's say I want to run 2 search engines on the same server.
For search engine one I use the database "crawl" and for the
second search engine I use "crawl2" as the database. For
accessing the content could I use different ports for each
engine? engine one will be localhost:8080 and engine two will
be localhost:8081. Just asking if this is possible.
A: Yes, this is possible. You can use different ports,
different virtual hosts or different context paths to separate the
two UIs. You still need to have two separate web applications
with two separate configurations (pointing to two separate
directories).
Q: The two different web applications are really no big deal.
Could I be pointed in the right direction for
setting this up? Someone else set up nutch/tomcat/java for me so
I am not exactly sure where I would set up the virtual host or
where a config file would exist that would point to the
database path.
A: I guess the simplest way to do it is to copy the Nutch
war file under <TOMCAT>/webapps with two different names
(search1.war and search2.war). Then, after Tomcat has extracted
the archives, edit the file
<TOMCAT>/webapps/search1/WEB-INF/classes/nutch-site.xml
and change searcher.dir to point to the correct directory.
For the other instance the configuration file is
<TOMCAT>/webapps/search2/WEB-INF/classes/nutch-site.xml
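
The searcher.dir edit described in that answer can also be scripted. Below is a minimal sketch, assuming nutch-site.xml uses the usual <configuration>/<property>/<name>/<value> layout. The helper name set_property and the /data/crawl path are made up for illustration. Note that ElementTree drops comments and the XML declaration when it rewrites the text, so for a heavily commented file a hand edit may be preferable.

```python
import xml.etree.ElementTree as ET


def set_property(config_xml, name, value):
    """Return config_xml with property `name` set to `value`.

    An existing property is updated in place; a missing one is appended.
    """
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property found: add a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical crawl directory for the first web application.
    print(set_property("<configuration></configuration>",
                       "searcher.dir", "/data/crawl"))
```

Running it once per extracted web application (search1, search2), each with its own directory, gives the two separate configurations the answer calls for.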
----- Original Message ----
From: karthik085 <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, July 20, 2007 3:13:24 PM
Subject: Multiple Nuch Instances
1. Can I run multiple instances of nutch for crawling/indexing? I got mixed
opinions - some say yes and some say no. Can someone who has tried this
let me know? One guy said it is difficult because multiple nutch instances
have to use different ports?
2. If I can run multiple instances of nutch, can I run nutch v 0.7.2, nutch
0.9 and nutch-dev at the same time for crawling/indexing websites?
Please let me know. Thanks.
--
View this message in context:
http://www.nabble.com/Multiple-Nuch-Instances-tf4119823.html#a11716837
Sent from the Nutch - User mailing list archive at Nabble.com.