Nathan,

We do this within the current framework with a separate Java
application that manages the crawler instances and only refers to
Nutch, which sits in a separate folder.  The structure is something like:

+-Nutch Package (as is)
+-Crawler App
--+--mysite1
-----+--conf
-----+--...nutch generated folders...
-----+--urls
--+--mysite2
-----+--conf
-----+--...nutch generated folders...
-----+--urls

The Crawler App has commands to create a new crawler, to start or stop a
crawler, and so on.  When creating a crawler, it copies the default conf
settings from the Nutch Package.  It therefore needs properties that define
the locations of Nutch, Java etc.
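
As a rough illustration of the "create crawler" step, here is a minimal
sketch and not the actual Crawler App code.  The class name CrawlerSetup,
the method createCrawler and its parameters are all hypothetical; the only
things taken from the layout above are the per-crawler conf and urls
folders and the copy of the default conf from the Nutch Package.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical sketch of a "create crawler" command; names are illustrative only. */
public class CrawlerSetup {

    /**
     * Creates appRoot/name/conf and appRoot/name/urls, then seeds conf
     * with a copy of everything under nutchHome/conf.
     */
    public static Path createCrawler(Path nutchHome, Path appRoot, String name)
            throws IOException {
        Path defaults = nutchHome.resolve("conf");
        if (!Files.isDirectory(defaults)) {
            throw new IOException("No conf directory under " + nutchHome);
        }
        Path crawlerDir = appRoot.resolve(name);
        if (Files.exists(crawlerDir)) {
            // Refuse to overwrite an existing crawler's settings.
            throw new IOException("Crawler already exists: " + crawlerDir);
        }
        Path conf = Files.createDirectories(crawlerDir.resolve("conf"));
        Files.createDirectories(crawlerDir.resolve("urls"));

        // Files.walk visits parents before children, so each target
        // directory exists by the time the files inside it are copied.
        try (Stream<Path> entries = Files.walk(defaults)) {
            entries.forEach(src -> {
                Path dest = conf.resolve(defaults.relativize(src).toString());
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dest);
                    } else {
                        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return crawlerDir;
    }
}
```

Calling createCrawler with the Nutch folder, the Crawler App folder and
"mysite1" would produce the mysite1 branch shown in the tree above, minus
the folders that Nutch generates itself once a crawl has run.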

This works pretty well as a skeletal starting point, but for true
enterprise use a front-end administration layer needs to sit on top.

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions
Tel: +44 (20) 7257 6125
Mobile: +44 (7796) 932 362
http://blog.idna-solutions.com

-----Original Message-----
From: Nathan Ter Bogt [mailto:[EMAIL PROTECTED] 
Sent: 25 January 2007 01:03
To: [email protected]
Subject: Multiple collections

Has there been any thought given to the possibility of allowing users to
define multiple collections? Perhaps something in the structure of

/conf/mysite1/*.xml 
/conf/mysite2/*.xml 

bin/nutch crawl mysite2? 

I believe a lot of end users would find this extremely useful, and it would
make Nutch more suitable as an enterprise search solution.

Thanks, 
-- 
Nathan ter Bogt | Software engineer 

Agileware Pty. Ltd. 


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general