According to GOMEZ Henri:
> I've got a project that indexes a huge number of PDF files.
>
> The PDF files are created each day by an internal process,
> so I get around 1000 new PDF files each day (and I keep the older ones).
>
> I dynamically regenerate an index file (index.html) each day in the many
> subdirectories where the PDF files are stored.
>
> How can I tell htdig to index only the newly arrived files?
As long as you don't use htdig's -i option (i.e. don't just use an
unmodified rundig script for updating), then htdig will only index
new or modified documents. If you index via the local filesystem,
using local_urls, this will be very quick. If you index via an HTTP
server, this will still work very well as long as the server honours the
If-Modified-Since header (i.e. returns a 304 status for older documents)
and returns a Last-Modified header. If the HTTP server does not honour
the If-Modified-Since header, but does return a valid Last-Modified
header, updates will still work, but the PDF files will be needlessly
re-read (though not re-indexed) each time.
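
To make the three HTTP cases above concrete, here is a rough Python
sketch of the decision a crawler faces. It is not htdig's actual code:
the function names, the "skip" / "reread-only" / "reindex" labels, and
the choice to re-index when Last-Modified is missing are all
assumptions made for illustration.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime
from urllib.request import Request


def conditional_request(url, last_indexed_epoch):
    """Build a GET request that asks only for documents changed since
    last_indexed_epoch (seconds since the Unix epoch)."""
    stamp = formatdate(last_indexed_epoch, usegmt=True)  # HTTP-date format
    return Request(url, headers={"If-Modified-Since": stamp})


def classify_response(status, headers, last_indexed_epoch):
    """Return a hypothetical action label for one server response.

    "skip"        - 304: the server honoured If-Modified-Since, no body sent
    "reread-only" - 200 with an old Last-Modified: the body was transferred
                    needlessly, but the document need not be re-indexed
    "reindex"     - the document is new, modified, or of unknown age
    """
    if status == 304:
        return "skip"
    last_modified = headers.get("Last-Modified")
    if last_modified is None:
        # Assumption: without a date, the safe choice is to re-index.
        return "reindex"
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        # Treat dates without a zone as UTC, as HTTP dates are GMT.
        modified = modified.replace(tzinfo=timezone.utc)
    if modified.timestamp() <= last_indexed_epoch:
        return "reread-only"
    return "reindex"
```

Sending conditional_request() to your server by hand and checking
whether a 304 comes back for an old PDF is a quick way to see which of
the cases applies to your setup.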
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930