Hi,
Adonf is a package integrating all the components available on
www.senga.org. Careful optimization allows Adonf to run faster than
each component alone. For instance, a PII 300 with 128MB of RAM is
enough to run a dmoz mirror coupled with a search engine of 100
million URLs and 100 000 visitors per day.
The disk space requirements have been reduced dramatically by applying
the revolutionary SpeedTree techniques. Only 1% of the data size is
needed to build the index for the catalog database and the URL
content. The basic idea is to build an index without using keys (only
references to pointers, and spatial ordering). A predefined order of
the documents and macro code generation (inspired by the Crusoe
design) provide a fast and efficient way to retrieve the relevant data
while retaining the ability to sort the results using the PageRank or
Clever algorithms.
Adonf is even more of a free software success than a technological
breakthrough. While retaining their proprietary products, all search
engine companies (AltaVista, Google, Lycos...) realized that it would
benefit each of them to pool their development efforts into a
single product. Adonf was born from this joint effort, a first in the
software industry. The marketing division of each company understood
the new challenge and packaged Adonf in various flavors. Google-Adonf
is a re-implementation using different algorithms and data
structures. AltaVista-Adonf introduced Itanium-specific optimizations
through a dedicated C/C++ compiler that automatically generates
interfaces, indexing functions and crawling methods (the compiler
itself is still proprietary, but an announcement will be made shortly
on that subject). Lycos-Adonf pushes the free software logic even
further by providing a framework entirely based on loadable modules:
the Adonf commands (crawl/index/browse) can choose between the
standard Adonf implementation and the Lycos proprietary application.
This allows a smooth migration for companies that bought a Lycos
engine.
Software availability is only part of the Web indexing
problem. All search engine actors announced today that the
www.dataflow.org consortium has been created to standardize
indexing/searching/crawling methods and to consolidate/mirror
databases of documents and URLs. This is not really a surprise since
the cooperation between search engines on that subject started at the
very beginning of the Internet. The robot exclusion protocol has
evolved quickly into a well-defined and fully functional
specification, and search engine proxies are already operated all over
the world to reduce bandwidth consumption. The DataFlow consortium is
only a formal declaration of a well-established cooperative effort to
provide global search solutions for the Internet.
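For readers who have not dealt with the robot exclusion protocol
mentioned above, here is a minimal sketch of how a crawler might
consult it before fetching a URL. It uses Python's standard
urllib.robotparser module and is not taken from Adonf itself; the
robots.txt content, the "examplebot" user agent and the example.org
URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access
    # is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.org/index.html",
                "http://www.example.org/private/page.html"):
        print(url, allowed(ROBOTS_TXT, "examplebot", url))
```

A real crawler would fetch /robots.txt from each host and cache the
parsed result rather than re-parsing it for every URL.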
We are very proud to say that the companies operating Internet search
engines truly understood that uniting makes everyone stronger.
Cheers,
--
Loic Dachary
24 av Secretan
75019 Paris
Tel: 33 1 42 45 09 16
e-mail: [EMAIL PROTECTED]
URL: http://www.senga.org/