On Thursday 02 January 2003 17:01, Geoff Hutchison wrote:
> On Wed, 1 Jan 2003, Michael Schuerig wrote:
> > I'm using htdig (3.1.5) to index a local collection of documents
> > (html,
>
> First, we highly suggest upgrading to 3.1.6, for stability and
> security reasons:

I've updated in the meantime. I had kept the older version for use with
the Qt docs, but as I found out, the requisite patch is applied in the
Debian package for 3.1.6.

> > an existing index with htdig and htmerge. But I found that htdig
> > seems to parse every document again and for this has to convert pdf
> > and ps files to text again. This conversion is pretty
> > time-consuming.
>
> This is strange. If you're doing an update dig, htdig will send the
> modification time to the server in an If-Modified-Since header.
> Apache recognizes this and should not send a document unless it's
> been modified.
>
> > start_url: http://localdocs/
> > local_urls: http://localdocs/=/pub/doc/
>
> Hmm. I'm currently on vacation, so it's hard for me to check, but I
> wonder if the local_urls feature isn't checking the modification
> dates on the drive before reindexing. :-(

I've just tried without using local_urls and it made no difference.
First htdig -i -s -v, then htdig -a -s -v. I cancelled the second run
when it started to convert the first PDF file.

> One benefit/workaround in 3.1.6: added to htdig is the -m flag, which
> allows you to index only a set of URLs.
>
> http://www.htdig.org/htdig.html
>
> So you could use 'find' to generate a list of paths to new or
> modified files, write it to a file and generate a list of "URLs" for
> indexing.

Yes, I think I'll do that. It's probably a lot faster anyway.

Thanks for your help and enjoy your holidays.

Michael

-- 
Michael Schuerig                   If at first you don't succeed...
mailto:[EMAIL PROTECTED]           try, try again.
http://www.schuerig.de/michael/    --Jerome Morrow, "Gattaca"

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html
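Geoff's workaround (list new or modified files, turn the paths into URLs, write them to a file and hand that file to htdig's -m option) can be sketched in Python. This is a minimal sketch, not part of ht://Dig: it assumes the local_urls mapping from this thread (http://localdocs/ = /pub/doc/) and a caller-supplied cutoff time, e.g. the modification time of a hypothetical stamp file you touch after each successful dig. The helper names `modified_since`, `path_to_url` and `write_url_list` are hypothetical.

```python
import os
from urllib.parse import quote

# Assumed mapping, mirroring the thread's config:
#   local_urls: http://localdocs/=/pub/doc/
URL_PREFIX = "http://localdocs/"
DOC_ROOT = "/pub/doc/"


def _with_trailing_sep(root):
    # os.path.join(root, "") appends a separator only if one is missing.
    return os.path.join(root, "")


def modified_since(root, since):
    """Yield file paths under root whose mtime is newer than `since` (epoch seconds)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > since:
                    yield path
            except OSError:
                # File vanished or is unreadable between listing and stat; skip it.
                continue


def path_to_url(path, root=DOC_ROOT, prefix=URL_PREFIX):
    """Invert the local_urls mapping, e.g. /pub/doc/a/b.pdf -> http://localdocs/a/b.pdf."""
    root = _with_trailing_sep(root)
    if not path.startswith(root):
        raise ValueError(f"{path!r} is not under {root!r}")
    rel = path[len(root):].replace(os.sep, "/")
    # Percent-encode spaces and other unsafe characters; "/" is left as-is.
    return prefix + quote(rel)


def write_url_list(root, since, out_file, prefix=URL_PREFIX):
    """Write one URL per line for every file under root modified after `since`."""
    root = _with_trailing_sep(root)
    urls = sorted(path_to_url(p, root, prefix) for p in modified_since(root, since))
    with open(out_file, "w") as fh:
        for url in urls:
            fh.write(url + "\n")
    return urls
```

For example, `write_url_list("/pub/doc/", os.path.getmtime("last-dig.stamp"), "changed-urls.txt")` writes one URL per line for files changed since the stamp, which could then be passed to htdig with -m. Check http://www.htdig.org/htdig.html for how -m interacts with -a and the existing database. Note that an mtime scan like this only finds new or changed files; it does not notice deleted ones.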

