According to Nica Huestegge:
> I am trying to use ht://dig to index a newssite. Therefore I have to index a
> big amount of pages and daily index some more articles.
> The news-archive has different folders for each day, like
> ...archive/2001/10/15/ or something.
> I am thinking of writing a script (maybe in Perl?) that runs in the evening
> and indexes the folder with the actual date and merges it with the "main"
> index-database. Did anyone of you try this before? Any ideas what
> difficulties might await me?  And what do I have to do with the config-files
> then?

I've never done this myself, but from the messages I've seen on the list,
it's a fairly commonly used approach to indexing mailing list archives
when they're too big for update digs on the full archive to run in a
reasonable time.

If you're using htdig 3.1.5 or older, there are bugs in htmerge that
can cause the loss of some words in the word index after merging
two databases.  There are patches for this on the ftp.ccsf.org site,
but you may be better off just grabbing the latest 3.1.6 development
snapshot from http://www.htdig.org/files/snapshots/ instead, to get all
the recent bug fixes.

You'll need two config files, one for the main database and one for the
current day's database, with different settings of database_dir or
database_base.  Then you use the -m option to htmerge to merge the current
day database into the main one.  (See http://www.htdig.org/FAQ.html#q4.4
and q4.5.)
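
As a rough sketch of what such a nightly script might look like (in Python
rather than Perl), the code below builds the start URL for the current day's
folder, generates the text of a small daily config file, and lists the
commands to run.  The file names main.conf and daily.conf, the archive URL
and the database directory are all made up for illustration.  The daily
config would also need whatever other settings your main config uses.
Running htmerge on the daily database before merging it is my assumption
about the usual 3.1.x dig/merge sequence, not something I have tested.

```python
import datetime


def daily_start_url(archive_base, day):
    """Return the URL of one day's archive folder, e.g. .../2001/10/15/."""
    base = archive_base.rstrip("/")
    return f"{base}/{day.year:04d}/{day.month:02d}/{day.day:02d}/"


def daily_config_text(archive_base, day, database_dir):
    """Return the two settings that must differ from the main config.

    database_dir keeps the daily database apart from the main one, and
    start_url restricts the dig to the folder for the given day.
    """
    return (
        f"database_dir: {database_dir}\n"
        f"start_url: {daily_start_url(archive_base, day)}\n"
    )


def build_commands(main_conf, daily_conf):
    """Return the commands for one nightly run, in order."""
    return [
        # Dig the current day's folder from scratch into the daily database.
        ["htdig", "-i", "-c", daily_conf],
        # Build the daily word index (assumed to be needed before merging).
        ["htmerge", "-c", daily_conf],
        # Merge the daily database into the main one.
        ["htmerge", "-c", main_conf, "-m", daily_conf],
    ]


if __name__ == "__main__":
    # Hypothetical values; substitute your own site and paths.
    today = datetime.date.today()
    print(daily_config_text("http://news.example.com/archive", today,
                            "/var/lib/htdig/daily"))
    for cmd in build_commands("main.conf", "daily.conf"):
        print(" ".join(cmd))
```

This only prints the config text and the commands.  To actually run them you
would write the config text to daily.conf and pass each command list to
subprocess.run(cmd, check=True), so the script stops if one step fails.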

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930