Geoff Hutchison writes:

      One additional word. The 3.2 index structure will make it possible
to index large amounts of data, but the way the crawler works still prevents
updating a large number of URLs. At present an update retries *all* URLs.
It should have some heuristics along the lines of: retry a URL only 15 days
after a successful load, and spread updates so that a maximum of N URLs is
checked each day to avoid saturating the bandwidth, etc. A rough sketch of
such a scheduler follows.
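Purely as an illustration (none of this is ht://Dig code; UrlEntry,
selectForUpdate, and the limits are all made up for the example), a minimal
C++ sketch of such a revisit scheduler might look like this:

    #include <algorithm>
    #include <cstdio>
    #include <ctime>
    #include <string>
    #include <vector>

    // Hypothetical per-URL record; ht://Dig's real document
    // database stores more than this.
    struct UrlEntry {
        std::string url;
        std::time_t lastSuccess;  // last successful retrieval, 0 if never
    };

    // Pick the URLs due for a retry: skip anything fetched successfully
    // within the last minAgeDays days, and cap the batch at maxPerDay
    // so a single run cannot saturate the bandwidth.
    std::vector<UrlEntry> selectForUpdate(std::vector<UrlEntry> all,
                                          int minAgeDays, size_t maxPerDay)
    {
        const std::time_t now = std::time(nullptr);
        const std::time_t minAge =
            static_cast<std::time_t>(minAgeDays) * 24 * 60 * 60;

        std::vector<UrlEntry> due;
        for (const UrlEntry &e : all)
            if (e.lastSuccess == 0 || now - e.lastSuccess >= minAge)
                due.push_back(e);

        // Oldest first, so every URL is eventually revisited even when
        // more than maxPerDay of them are due on the same day.
        std::sort(due.begin(), due.end(),
                  [](const UrlEntry &a, const UrlEntry &b) {
                      return a.lastSuccess < b.lastSuccess;
                  });
        if (due.size() > maxPerDay)
            due.resize(maxPerDay);
        return due;
    }

    int main()
    {
        const std::time_t now = std::time(nullptr);
        std::vector<UrlEntry> db = {
            {"http://example.com/a", 0},               // never fetched
            {"http://example.com/b", now - 20 * 86400}, // 20 days old
            {"http://example.com/c", now - 2 * 86400},  // 2 days old: skipped
        };
        for (const UrlEntry &e : selectForUpdate(db, 15, 1000))
            std::printf("due: %s\n", e.url.c_str());
        return 0;
    }

Run once a day (from cron, say), this would retry a URL at most once every
minAgeDays days and touch at most maxPerDay URLs per run, with the oldest
entries served first so that nothing starves.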

 > 
 > On Mon, 16 Aug 1999, William Freman wrote:
 > 
 > > memory and disk space... what about a quad PIII/550 Xeon with
 > > 2GB RAM and a 5TB RAID array on a T3? Something like that would take
 > > away a good amount of the theoretical hindrances. With those out of
 > > the way, would it be possible to index the web?
 > 
 > I still wouldn't recommend it. It's only recently that we've been
 > receiving feedback on scaling to huge (e.g. 500,000+ URL) indexes. In
 > particular, the 3.1.x series requires the htmerge phase, which sorts
 > the word database. Even for modest-sized databases, that sort can take
 > an enormous amount of RAM.
 > 
 > The 3.2 development code should help with some of these performance
 > bottlenecks.
 > 
 > -Geoff Hutchison
 > Williams Students Online
 > http://wso.williams.edu/
 > 
