Nathan,

We do this within the current framework by having a separate Java application that manages the crawler instances and only refers to Nutch, which sits in a separate folder. The structure is something like:
+-Nutch Package (as is)
+-Crawler App
--+--mysite1
-----+--conf
-----+--...nutch generated folders...
-----+--urls
--+--mysite2
-----+--conf
-----+--...nutch generated folders...
-----+--urls

The Crawler App has commands to create a new crawler, to start or stop a crawler, etc. When creating a crawler, it copies the default conf settings from the Nutch Package. Obviously, it needs properties that define the location of Nutch, Java, etc.

This works pretty well as a skeletal starting point, but for true enterprise use a front-end administration layer needs to sit on top of it.

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions
Tel: +44 (20) 7257 6125
Mobile: +44 (7796) 932 362
http://blog.idna-solutions.com

-----Original Message-----
From: Nathan Ter Bogt [mailto:[EMAIL PROTECTED]]
Sent: 25 January 2007 01:03
To: [email protected]
Subject: Multiple collections

Has there been any thought given to the possibility of allowing users to define multiple collections? Perhaps something with a structure like:

/conf/mysite1/*.xml
/conf/mysite2/*.xml
bin/nutch crawl mysite2

I believe a lot of end users would find this extremely useful, and it would make Nutch more suitable as an enterprise search solution.

Thanks,
--
Nathan ter Bogt | Software engineer
Agileware Pty. Ltd.
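
For illustration, the "create a new crawler" step in Alan's description (make a per-site folder, then copy the default conf settings from the Nutch package) could be sketched roughly as below. This is not code from the Crawler App itself: the class name `CrawlerSetup`, the method `createCrawler`, and its parameters are hypothetical, and the sketch assumes the Nutch package keeps its defaults in a top-level `conf` folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical helper illustrating the layout above; not part of Nutch. */
final class CrawlerSetup {

    /**
     * Creates appDir/name/{conf,urls} and copies the regular files from
     * nutchHome/conf into the new crawler's conf folder.
     */
    static Path createCrawler(Path nutchHome, Path appDir, String name) throws IOException {
        Path crawlerDir = appDir.resolve(name);
        if (Files.exists(crawlerDir)) {
            throw new IOException("Crawler already exists: " + crawlerDir);
        }
        Path conf = Files.createDirectories(crawlerDir.resolve("conf"));
        Files.createDirectories(crawlerDir.resolve("urls"));

        // Assumption: the default settings live directly under <nutchHome>/conf.
        Path defaults = nutchHome.resolve("conf");
        try (Stream<Path> files = Files.list(defaults)) {
            for (Path src : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(src)) {
                    Files.copy(src, conf.resolve(src.getFileName().toString()),
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return crawlerDir;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.out.println("usage: CrawlerSetup <nutchHome> <crawlerAppDir> <crawlerName>");
            return;
        }
        Path created = createCrawler(Paths.get(args[0]), Paths.get(args[1]), args[2]);
        System.out.println("Created crawler at " + created);
    }
}
```

Each crawler then carries its own conf and urls folders, so its settings can be edited independently while the Nutch package itself stays untouched. The Nutch-generated folders are left for Nutch to create when a crawl runs.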
