From Slashdot:

 "It looks like the Internet Archive, which hosts the infamous Wayback
Machine has opened its newest in-development crawler code under the LGPL.
From the announcement: 'Heritrix is the Internet Archive's open-source,
extensible, web-scale, archival-quality web crawler project. Heritrix
(sometimes spelled heretrix, or misspelled or missaid as heratrix /
heritix / heretix / heratix) is an archaic word for inheritess. Since our
crawler seeks to collect the digital artifacts of our culture for the
benefit of future researchers and generations, this name seemed apt.'"

http://crawler.archive.org/

I don't know how you guys feel, but it may be worth considering migrating
to this codebase for our crawling functions.  The one simple motivation
is that we have enough other work to do.  If we run into significant
problems with our crawler, then we can think about migrating.

I don't propose we do this now, but we should keep it as an option.

Eh?

Neal Richter
Knowledgebase Developer
RightNow Technologies, Inc.
Customer Service for Every Web Site
Office: 406-522-1485




_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev
