I was able to index about 100,000 pages in less than a day on a
machine with 512 MB of memory; on a 256 MB machine it had only
done 50,000 pages after two days. The indexing process does
seem very memory intensive if you want decent performance. I'm
not sure what can be done about it, though; it seems to be
just lack of locality of reference into the db.words.db file.

I believe there are plans to checksum pages so as to reject
aliases (duplicates); how is that going? Without it, dealing with
a large site is really something of an administrative nightmare.
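
Just to be clear about what I mean (this is only a rough sketch of
the idea on my part, not how htdig does or plans to do it): hash the
body of each fetched page and skip anything whose digest has already
been seen, something along these lines:

    // Rough sketch, not htdig code: hash the page body and skip
    // pages whose digest has already been seen.
    #include <iostream>
    #include <set>
    #include <string>

    // FNV-1a style hash, just a stand-in for a proper checksum
    unsigned long page_digest(const std::string &body)
    {
        unsigned long h = 2166136261UL;
        for (std::string::size_type i = 0; i < body.size(); ++i) {
            h ^= (unsigned char)body[i];
            h *= 16777619UL;
        }
        return h;
    }

    int main()
    {
        std::set<unsigned long> seen;
        const char *pages[] = { "hello world", "other page", "hello world" };

        for (int i = 0; i < 3; ++i) {
            unsigned long d = page_digest(pages[i]);
            if (!seen.insert(d).second)
                std::cout << "page " << i << " is an alias, skipping\n";
            else
                std::cout << "page " << i << " indexed\n";
        }
        return 0;
    }

A real version would of course want a stronger digest (MD5 or the
like) and would keep the digests in the database rather than in
memory, but that's the gist of what I'm after.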

I'm also getting close to the 2 GB file size limit on my
words.db file. Is there any structural reason that it
couldn't be split into multiple files?
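
To make the question concrete (again purely a sketch of mine, the
file names are invented), I'm picturing something like choosing the
underlying file from a hash of the word, so each lookup still only
touches one file and no single file has to grow past 2 GB:

    // Sketch only: pick the words file from a hash of the word.
    // The db.words.db.N naming is just made up for illustration.
    #include <cstdio>
    #include <string>

    const int NUM_PARTS = 8;  // e.g. db.words.db.0 .. db.words.db.7

    std::string part_for_word(const std::string &word)
    {
        unsigned long h = 0;
        for (std::string::size_type i = 0; i < word.size(); ++i)
            h = h * 31 + (unsigned char)word[i];

        char name[64];
        std::sprintf(name, "db.words.db.%lu", h % NUM_PARTS);
        return std::string(name);
    }

    int main()
    {
        std::printf("'apple' -> %s\n", part_for_word("apple").c_str());
        std::printf("'zebra' -> %s\n", part_for_word("zebra").c_str());
        return 0;
    }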
-- 
 Toivo Pedaste                        Email:  [EMAIL PROTECTED]
 University Computing Services,       Phone:  +61 8 9 380 2605
 University of Western Australia      Fax:    +61 8 9 380 1109
"The time has come", the Walrus said, "to talk of many things"...
