At 8:53 AM -0400 6/28/00, Terry Luedtke wrote:
>There are other ways to improve the speed while still using
>BerkeleyDB (or any other db for that matter). The ability to run
>concurrent digs into the same database for one. An htsearch that
>stays in memory, similar to fast-cgi programs, for another.
Yup. This is one reason that the 3.2 code uses this new database
layout. (It's hard to say "format" since it's still based on Berkeley
DB, but it's storing the data in a different fashion.) The htdig
crawler now generates databases in htsearch-ready format. Granted, if
there are likely to be a large number of bad URLs or changed
documents, it's a good idea to run the "htpurge" program to remove
them.
The problem with concurrent digs into the same database is that it
requires careful locking of writes to make sure the threads or
processes do not change the same data. It's probably more useful to
allow htsearch to browse through "collections" of data.
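The lost-update hazard behind that locking concern can be sketched in a few lines. This is a hypothetical stand-in, not htdig code: two writers bump the same shared value, the way two concurrent digs might update the same record in one Berkeley DB file. The lock is what keeps the read-modify-write from interleaving.

```python
import threading

# Hypothetical stand-in for a shared database record: several crawler
# threads all want to update the same value (say, a word's frequency).
counter = 0
lock = threading.Lock()

def update(times):
    global counter
    for _ in range(times):
        # Without this lock, the read-modify-write below can interleave
        # between threads and silently lose updates -- the same hazard
        # concurrent digs would face writing into one database.
        with lock:
            counter += 1

threads = [threading.Thread(target=update, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock held; typically less without it
```

Drop the `with lock:` line and the final count usually comes up short, which is exactly why concurrent writers need careful coordination.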
Yes, there are also loads of ways of speeding up htsearch even
without converting it into a persistent CGI/servlet. Caching, of
course, would help significantly and I'm committed to having that
implemented.
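As a rough illustration of what query caching buys, here is a minimal sketch. The `search` function is a made-up placeholder for an htsearch query, with a counter simulating the expensive database work; the cache answers repeated queries without touching the "database" at all.

```python
from functools import lru_cache

call_count = 0

# Hypothetical search function standing in for an htsearch query;
# the real htsearch would hit the database files here.
@lru_cache(maxsize=256)
def search(query):
    global call_count
    call_count += 1  # simulate the expensive lookup work
    return tuple(f"doc-{i}-{query}" for i in range(3))

search("htdig")     # miss: does the expensive work
search("htdig")     # hit: answered from the cache
search("berkeley")  # miss: a new query

print(call_count)   # 2 -- the repeated query never reached the backend
```

A persistent htsearch process could keep such a cache across requests, which is where most of the win would come from.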
I don't want to get too deep into a database-format discussion on
this list. Personally, I think it would be great to have some SQL
support if people choose to try that out. But so far, no one has
submitted patches to the current 3.2 CVS tree to my knowledge. *I'm*
certainly the last one to do it--I still have to finish up writing
the new htsearch query parser!
-Geoff
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.