Our live implementation is still at the spec stage (I'm the prototypes guy), but I'm guessing we'll need to automate the crawl, merge, index and dedup steps, plus a way of monitoring progress and checking for errors. Our development team should be OK with cron jobs and logs, but an easy-to-use GUI would make our support team's jobs much easier. For example, a customer might ring up with a query such as "new content has been put up but I can't find x"; first-level support should be able to look at an admin page that quickly tells them when the content was last crawled and whether there were any errors.
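To make the first-level support check concrete, here is a minimal Python sketch of the kind of summary an admin page might show: when the crawl last did anything and whether any errors were logged. The log line layout, the `summarise_log` function and the `CrawlSummary` type are all assumptions made up for illustration; this is not the actual Nutch or Heritrix log format, and a real script would need adapting to whatever the crawler writes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional

# Hypothetical line layout (an assumption, NOT the real Nutch/Heritrix format):
#   2006-04-28 14:30:00 INFO fetched http://example.com/
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class CrawlSummary:
    last_activity: Optional[datetime]  # newest timestamp seen, None if no line parsed
    error_count: int                   # number of lines whose level is ERROR
    recent_errors: List[str]           # up to five most recent error messages


def summarise_log(lines: Iterable[str]) -> CrawlSummary:
    """Scan log lines and report the last activity time and any errors."""
    last_activity = None
    errors = []
    for line in lines:
        # Split into date, time, level, message; the message keeps its spaces.
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 4:
            continue  # skip lines that do not match the assumed layout
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if last_activity is None or stamp > last_activity:
            last_activity = stamp
        if parts[2] == "ERROR":
            errors.append(parts[3])
    return CrawlSummary(last_activity, len(errors), errors[-5:])


# Example with made-up log lines.
sample = [
    "2006-04-28 02:00:00 INFO crawl started",
    "2006-04-28 02:05:12 ERROR timeout fetching http://example.com/x",
    "2006-04-28 02:30:00 INFO crawl finished",
]
summary = summarise_log(sample)
print("Last activity:", summary.last_activity)
print("Errors:", summary.error_count, summary.recent_errors)
```

Something like this could be run from the same scheduled job that does the crawl and its output written to a page the support team can read, which would cover the "when was it last crawled, were there errors" question until a proper admin GUI is available.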
Cheers
Aled

> -----Original Message-----
> From: Dan Morrill [mailto:[EMAIL PROTECTED]
> Sent: 28 April 2006 14:30
> To: [email protected]
> Subject: RE: Heritrix
>
> Aled,
>
> I guess the other question is what are you trying to do, for
> example, if you need to automate the crawl you can make a
> shell script and cron it (well ok, I am using task manager).
> If you want to watch the logs on the screen in a terminal
> window, you can tail -f crawl.log it (I am using wintail), I
> am more than happy to help if you want to automate your nutch jobs.
>
> I automated as much as I could on those processes that I
> wanted nutch to do, and it sits quietly in the corner doing
> all the work, merging, indexing, rebuilding, stopping and
> starting tomcat, so it is possible to automate nutch so that
> it is 90% stand alone by scripting. Although, its all windows
> scripting, I am not running on linux, I have no linux scripts.
>
> r/d
>
> -----Original Message-----
> From: Aled Jones [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 28, 2006 6:14 AM
> To: [email protected]
> Subject: ATB: Heritrix
>
> Thanks for your replies guys. I hadn't realised that the
> admin gui was already in development.
> We should be able to cope till it gets released ;-)
>
> Thanks again
> Aled
>
> > -----Original Message-----
> > From: Dan Morrill [mailto:[EMAIL PROTECTED]
> > Sent: 28 April 2006 14:07
> > To: [email protected]
> > Subject: RE: Heritrix
> >
> > Aled,
> >
> > I used heritrix before going over to nutch, while it is an excellent
> > program, with lots of good things to offer, it didn't quite meet my
> > need, and when designing the architecture had too many dependencies
> > for me to be comfortable with.
> >
> > If you want to run an internet archive though, heritrix can not be
> > beat, if you want to run a search engine, nutch is a good choice.
> >
> > My personal opinion.
> >
> > r/d
> >
> > -----Original Message-----
> > From: Aled Jones [mailto:[EMAIL PROTECTED]
> > Sent: Friday, April 28, 2006 1:59 AM
> > To: [email protected]
> > Subject: Heritrix
> >
> > Hi
> >
> > Anyone used Heritrix (http://crawler.archive.org/) as a crawler? How
> > does it compare with the Nutch crawler? Can Nutch serve its crawled
> > results? Main reason I'm interested is that it has a WUI interface
> > that might make maintenance for the IT guys easier, although I know
> > that some of you guys are working on an interface.
> >
> > Cheers
> > Aled
> >
> > ###########################################
> >
> > This message has been scanned by F-Secure Anti-Virus for Microsoft
> > Exchange.
> > For more information, connect to http://www.f-secure.com/
> >
> > ************************************************************************
> > This e-mail and any attachments are strictly confidential and intended
> > solely for the addressee. They may contain information which is
> > covered by legal, professional or other privilege. If you are not the
> > intended addressee, you must not copy the e-mail or the attachments,
> > or use them for any purpose or disclose their contents to any other
> > person. To do so may be unlawful. If you have received this
> > transmission in error, please notify us as soon as possible and delete
> > the message and attachments from all places in your computer where
> > they are stored.
> >
> > Although we have scanned this e-mail and any attachments for viruses,
> > it is your responsibility to ensure that they are actually virus free.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
