On Wed, 1 Jan 2003, Michael Schuerig wrote:

> I'm using htdig (3.1.5) to index a local collection of documents (html, 

First, we highly suggest upgrading to 3.1.6, for stability and security
reasons:

http://www.htdig.org/RELEASE.html

> an existing index with htdig and htmerge. But I found that htdig seems 
> to parse every document again and for this has to convert pdf and ps 
> files to text again. This conversion is pretty time-consuming.

This is strange. If you're doing an update dig, htdig will send the
modification time to the server in an If-Modified-Since header. Apache
recognizes this and should not send a document unless it's been modified.
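
For illustration, here is a minimal Python sketch of such a conditional
request. It is not htdig's own code; the function names are made up for
this example, and it only builds the request rather than sending it.

```python
from email.utils import formatdate
from urllib.request import Request


def http_date(mtime):
    """Format an epoch timestamp as an HTTP date (always in GMT)."""
    return formatdate(mtime, usegmt=True)


def conditional_request(url, last_indexed):
    """Build a GET request that asks for the document only if it changed.

    last_indexed is the epoch time of the previous dig (an assumption of
    this sketch; htdig keeps its own record of this per document).
    """
    return Request(url, headers={"If-Modified-Since": http_date(last_indexed)})
```

A server that honors the header answers "304 Not Modified" with no body
when the document is unchanged, so nothing has to be converted again.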

> start_url:              http://localdocs/
> local_urls:             http://localdocs/=/pub/doc/

Hmm. I'm currently on vacation, so it's hard for me to check, but I wonder
if the local_urls feature isn't checking the modification dates of the
files on disk before reindexing. :-(

One workaround is available in 3.1.6: htdig gained the -m flag, which
lets you index only a given set of URLs.

http://www.htdig.org/htdig.html

So you could use 'find' to generate a list of paths to new or modified
files, turn those paths into the matching "URLs" and write them to a file
for htdig -m to index.
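
As a rough sketch of that idea, here is the same thing in Python instead
of 'find'. The function names, the cutoff timestamp and the output file
are assumptions made for this example; only the /pub/doc/ to
http://localdocs/ mapping comes from your local_urls setting.

```python
import os
from urllib.parse import quote


def modified_urls(doc_root, url_prefix, since):
    """Return URLs for files under doc_root modified after `since`.

    since is an epoch timestamp, e.g. the time of the last dig.
    """
    urls = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                # Map the local path back to the URL htdig knows it by.
                rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
                urls.append(url_prefix + quote(rel))
    return sorted(urls)


def write_url_list(urls, list_file):
    """Write one URL per line to list_file."""
    with open(list_file, "w") as out:
        for url in urls:
            out.write(url + "\n")
```

You would call modified_urls("/pub/doc", "http://localdocs/", cutoff),
write the result with write_url_list, and then pass that file to htdig
with -m (see the htdig page above for the exact usage) before running
htmerge.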

--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


