>From Slashdot: "It looks like the Internet Archive, which hosts the infamous Wayback Machine has opened its newest in-development crawler code under the LGPL. >From the announcement: 'Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix , or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess. Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.'"
http://crawler.archive.org/ I don't know how you guys feel, but it may be worth considering to migrate to this codebase for our crawling functions. The one simple motivation is that we have enough other work to do..... if we run into significant problems with our crawler then we think about migrating. I don't propose we do this now, but keep it as an option. Eh? Neal Richter Knowledgebase Developer RightNow Technologies, Inc. Customer Service for Every Web Site Office: 406-522-1485 ------------------------------------------------------- This SF.net email is sponsored by: Perforce Software. Perforce is the Fast Software Configuration Management System offering advanced branching capabilities and atomic changes on 50+ platforms. Free Eval! http://www.perforce.com/perforce/loadprog.html _______________________________________________ ht://Dig Developer mailing list: [EMAIL PROTECTED] List information (subscribe/unsubscribe, etc.) https://lists.sourceforge.net/lists/listinfo/htdig-dev
