According to Nica Huestegge:
> I am trying to use ht://dig to index a news site. Therefore I have to index a
> large number of pages and index some more articles every day.
> The news archive has a different folder for each day, like
> ...archive/2001/10/15/ or something.
> I am thinking of writing a script (maybe in Perl?) that runs in the evening,
> indexes the folder for the current date and merges it with the "main"
> index database. Has any of you tried this before? Any ideas what
> difficulties might await me? And what do I have to do with the config files
> then?
I've never done this myself, but from the messages I've seen on the list, it's a fairly common approach to indexing mailing list archives when they're too big, and update digs of the full archive too slow.

If you're using htdig 3.1.5 or older, there are bugs in htmerge that can cause some words to be lost from the word index after merging two databases. There are patches for this on the ftp.ccsf.org site, but you may be better off just grabbing the latest 3.1.6 development snapshot from http://www.htdig.org/files/snapshots/ instead, to get all the recent bug fixes.

You'll need two different config files, with different settings of database_dir or database_base, for the main and current-day databases. Then use the -m option to htmerge to merge the current-day database into the main one. (See http://www.htdig.org/FAQ.html#q4.4 and 4.5.)

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
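For the nightly job described in the question, a minimal sketch (in Python rather than Perl) might look like this. It assumes a hypothetical archive URL prefix and hypothetical config paths (ARCHIVE_BASE, MAIN_CONF, DAILY_CONF). It builds the per-day config by copying the main one and overriding database_base, start_url and limit_urls_to, so the day's database stays separate from the main one. The command sequence (an initial htdig of the day's folder, an htmerge of the daily database, then htmerge -m into the main database) is an assumption modelled on the usual dig-then-merge steps; check it against FAQ 4.4 and 4.5 for your htdig version before relying on it.

```python
import datetime
import subprocess
from pathlib import Path

# Hypothetical values: adjust to your own site and installation.
ARCHIVE_BASE = "http://news.example.com/archive/"  # assumed archive URL prefix
MAIN_CONF = "/etc/htdig/main.conf"                 # assumed main config path
DAILY_CONF = "/etc/htdig/daily.conf"               # assumed generated config path


def day_url(base, day):
    """Return the archive folder URL for one day, e.g. .../2001/10/15/."""
    return f"{base}{day:%Y/%m/%d}/"


def daily_config(main_text, overrides):
    """Copy the main config text, replacing (or appending) the given attributes.

    Backslash-continued lines belonging to a replaced attribute are dropped too.
    """
    out, seen = [], set()
    skipping = False  # True while dropping continuation lines of a replaced value
    for line in main_text.splitlines():
        if skipping:
            skipping = line.rstrip().endswith("\\")
            continue
        name = line.split(":", 1)[0].strip()
        if ":" in line and name in overrides:
            if name not in seen:
                out.append(f"{name}:\t{overrides[name]}")
                seen.add(name)
            skipping = line.rstrip().endswith("\\")
            continue
        out.append(line)
    for name, value in overrides.items():
        if name not in seen:  # attribute absent from the main config: append it
            out.append(f"{name}:\t{value}")
    return "\n".join(out) + "\n"


def merge_commands(main_conf, daily_conf):
    """Assumed sequence: dig the day, build its database, merge it into main."""
    return [
        ["htdig", "-i", "-c", daily_conf],
        ["htmerge", "-c", daily_conf],
        ["htmerge", "-c", main_conf, "-m", daily_conf],
    ]


def main(day=None, dry_run=True):
    """Write the daily config, then print (and optionally run) the commands."""
    day = day or datetime.date.today()
    url = day_url(ARCHIVE_BASE, day)
    text = Path(MAIN_CONF).read_text()
    Path(DAILY_CONF).write_text(daily_config(text, {
        "database_base": "${database_dir}/daily",  # differs from the main database
        "start_url": url,
        "limit_urls_to": url,  # keep the dig inside the day's folder
    }))
    for cmd in merge_commands(MAIN_CONF, DAILY_CONF):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Once the printed commands look right, the script could be run from cron calling main(dry_run=False). Limiting limit_urls_to to the day's folder is a design choice to stop the daily dig from wandering back into the whole archive.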

