Re: [htdig] avoiding to re-index present files

Gilles Detillieux Tue, 18 Jan 2000 08:05:21 -0800
According to GOMEZ Henri:
> I've got a project which indexes a huge number of PDF files.
> 
> PDF files are created each day by an internal process,
> so I get around 1000 new PDF files each day (and I keep the old ones).
> 
> I dynamically regenerate an index file (index.html) each day in the many
> subdirs
> where the PDF files are stored.
> 
> How can I tell htdig to index only the newly arrived files?

As long as you don't use htdig's -i option (the stock rundig script
passes -i, so don't run it unmodified for updates), htdig will only index
new or modified documents.  If you index via the local filesystem,
using local_urls, this will be very quick.  If you index via an HTTP
server, this will still work very well as long as the server honours the
If-Modified-Since header (i.e. returns a 304 status for older documents)
and returns a Last-Modified header.  If the HTTP server does not honour
the If-Modified-Since header, but does return a valid Last-Modified
header, it will still work, but the PDF files will be needlessly re-read
(but not re-indexed) each time.
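
To see which of these cases applies to a given server, one option is to
send a conditional GET by hand and look at what comes back.  The Python
sketch below is only an illustration and is not part of htdig; the names
classify and probe are made up for this example, and the URL and
timestamp passed to probe are up to you.  The case of a server that
sends no Last-Modified header at all is not covered above, so the sketch
just labels it "unspecified".

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def classify(status, last_modified):
    """Map a conditional-GET result onto the cases described in the reply.

    The labels are hypothetical names chosen for this sketch:
      "skipped"     - server answered 304, so the file is not transferred
      "re-read"     - server ignored If-Modified-Since but sent
                      Last-Modified, so the file is fetched again and the
                      date comparison happens on the client side
      "unspecified" - anything else (not discussed in the reply)
    """
    if status == 304:
        return "skipped"
    if status == 200 and last_modified:
        return "re-read"
    return "unspecified"


def probe(url, since_epoch):
    """Send a GET with If-Modified-Since set to since_epoch (Unix time)."""
    request = urllib.request.Request(
        url,
        headers={"If-Modified-Since": formatdate(since_epoch, usegmt=True)},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return classify(response.status,
                            response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return classify(304, err.headers.get("Last-Modified"))
        raise
```

Calling probe on one of the PDF URLs, with a timestamp later than the
file's modification time, should give "skipped" on a server that honours
If-Modified-Since.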

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
