According to Michael Schuerig:
> On Thursday 02 January 2003 17:01, Geoff Hutchison wrote:
> > On Wed, 1 Jan 2003, Michael Schuerig wrote:
...
> > > an existing index with htdig and htmerge. But I found that htdig
> > > seems to parse every document again and for this has to convert pdf
> > > and ps files to text again. This conversion is pretty
> > > time-consuming.
> >
> > This is strange. If you're doing an update dig, htdig will send the
> > modification time to the server in an If-Modified-Since header.
> > Apache recognizes this and should not send a document unless it's
> > been modified.
> >
> > > start_url: http://localdocs/
> > > local_urls: http://localdocs/=/pub/doc/
> >
> > Hmm. I'm currently on vacation, so it's hard for me to check, but I
> > wonder if the local_urls feature isn't checking the modification
> > dates on the drive before reindexing. :-(
>
> I've just tried without using local_urls and it made no difference.
> First htdig -i -s -v, then htdig -a -s -v. I cancelled the second run
> when it started to convert the first PDF file.

I'm pretty certain that the local_urls support does proper modification
time checking anyway, so I wouldn't have expected this to be the problem.

The problem is that you're using -a for the second run of htdig, but not
for the initial run, so there are no *.work files already in place for
the second run to update. With no existing .work databases, htdig has no
choice but to index again from scratch.

You should either run the initial htdig run with -a as well, or after the
initial run, you should copy db.docdb and db.wordlist to their *.work
counterparts. In either case, for htdig 3.1.x, you must always run
htmerge after each run of htdig, even before you attempt another htdig
update run.

See contrib/examples/rundig.sh in the 3.1.6 source directory for an
example of an update script that uses alternate work files. It expects
that some of these .work files (db.docdb.work and db.wordlist.work) stick
around for the next run.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
FAQ: http://htdig.sourceforge.net/FAQ.html
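
For anyone who wants the second option from the reply spelled out, here is a rough sketch of the copy step as a small shell function. It is not taken from rundig.sh. The function name seed_work_files is made up for illustration, and the directory argument is assumed to be whatever your configuration uses as its database directory.

```shell
#!/bin/sh
# Sketch only: seed the *.work files from the databases left behind by an
# initial dig that was run without -a, so a later "htdig -a" has something
# to update. seed_work_files is a hypothetical helper name; the directory
# argument is assumed to be your htdig database directory.
seed_work_files() {
    dbdir=$1
    for f in db.docdb db.wordlist; do
        # Refuse to continue if the initial dig did not leave this file.
        if [ ! -f "$dbdir/$f" ]; then
            echo "missing $dbdir/$f" >&2
            return 1
        fi
        # -p keeps the original modification time and permissions.
        cp -p "$dbdir/$f" "$dbdir/$f.work" || return 1
    done
}
```

It would be called once, after the initial htdig and htmerge have finished, with the database directory as its only argument. After that, update runs with htdig -a should find existing .work files to start from, with htmerge still run after every htdig run as described above.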

