>I'm trying to get a list together of all the users with around
>500,000 or more documents in their databases. Since many of the
>improvements for 3.2 are targeted at improving scalability, we'd like
>to hear feedback from all of you on the responsiveness of the code.

Here's my feedback:

My roughly one million files are spread unevenly across about a
thousand databases, the biggest of which holds about 60k pages. du
reports that the total storage used is about 7.3 gigabytes, and that
includes the compressed db.wordlist.work files!  Fantastic.
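
In case it's useful, here's roughly how I tallied those numbers. This
is just a Python sketch against my own directory layout (one
subdirectory per database under a common root); the path and layout
are my setup, not anything htdig requires:

#!/usr/bin/env python3
# Sum on-disk size per database and report the total and the largest.
import os

DB_ROOT = "/var/lib/htdig"  # hypothetical path; adjust to your db_dir

sizes = {}
for name in os.listdir(DB_ROOT):
    path = os.path.join(DB_ROOT, name)
    if not os.path.isdir(path):
        continue
    total = 0
    for dirpath, _, filenames in os.walk(path):
        for f in filenames:
            total += os.path.getsize(os.path.join(dirpath, f))
    sizes[name] = total

grand_total = sum(sizes.values())
biggest = max(sizes, key=sizes.get)
print(f"{len(sizes)} databases, {grand_total / 2**30:.1f} GiB total")
print(f"largest: {biggest} at {sizes[biggest] / 2**20:.1f} MiB")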

I'm happy with the performance. Searches are quite quick; the only
time I've had to wait for one was when most of the CPU was busy with
something else, and even then it took less than 7 seconds. Digging is
fast enough that I can run an update dig daily. Stability is
excellent. No scalability problems thus far.
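
For the curious, the daily update pass is nothing fancy; it's
essentially a cron job shaped like the sketch below. I'm assuming the
usual convention of an update dig being htdig without -i, followed by
htmerge; the one-config-per-database layout is my own:

#!/usr/bin/env python3
# Run an update dig and merge for each per-database config file.
import glob
import subprocess

CONF_DIR = "/etc/htdig/sites"  # hypothetical; one .conf per database

for conf in sorted(glob.glob(f"{CONF_DIR}/*.conf")):
    # Update dig: no -i flag, so existing databases are updated in place.
    subprocess.run(["htdig", "-c", conf], check=True)
    # Rebuild the merged/sorted search databases from the work files.
    subprocess.run(["htmerge", "-c", conf], check=True)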

One more comment - in my case, and perhaps in many larger htdig
setups, the collection of pages only ever grows; pages rarely change
content or shrink in number. I don't know whether that presents
additional optimization opportunities.
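
To make the thought concrete: if documents are never modified or
removed, an update pass only needs to touch URLs it hasn't seen
before. A toy illustration of that append-only idea (not htdig code,
just the shape of the optimization):

#!/usr/bin/env python3
# Index only new URLs; never revisit or expire previously indexed ones.
import os

SEEN_FILE = "seen_urls.txt"  # hypothetical persisted set of indexed URLs

def load_seen():
    if not os.path.exists(SEEN_FILE):
        return set()
    with open(SEEN_FILE) as f:
        return {line.strip() for line in f}

def update_index(candidate_urls, index_document):
    """Append new documents to the index; skip everything already seen."""
    seen = load_seen()
    new = [u for u in candidate_urls if u not in seen]
    for url in new:
        index_document(url)  # append-only: no re-fetch of old pages
    with open(SEEN_FILE, "a") as f:
        f.writelines(u + "\n" for u in new)
    return len(new)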

Cheers,
Jeff

PS. Looking over the TODO list and the developer pages, there are two
things I would take advantage of if implemented. One is the
improvement or removal of htmerge. The other is better
internationalization. _Maybe_ also looping over multiple databases.
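By "looping over multiple databases" I mean something like the control
flow below; the search_one() stub is hypothetical, since htsearch has
no Python API that I know of - this is only the behavior I'd like to
see supported:

#!/usr/bin/env python3
# Run one query against every database and merge the ranked results.

def search_one(db_name, query):
    """Stand-in for a per-database search; returns (score, url) pairs."""
    # In real life this would invoke htsearch against db_name's config.
    return []

def search_all(db_names, query, limit=20):
    hits = []
    for db in db_names:
        hits.extend(search_one(db, query))
    # Merge by descending score across all databases.
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:limit]

print(search_all(["siteA", "siteB"], "scalability"))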