On Thursday 02 January 2003 17:01, Geoff Hutchison wrote:
> On Wed, 1 Jan 2003, Michael Schuerig wrote:
> > I'm using htdig (3.1.5) to index a local collection of documents
> > (html,
>
> First, we highly suggest upgrading to 3.1.6, for stability and
> security reasons:

I've updated in the meantime. I had kept the older version for use with 
the Qt docs, but as I found out, the requisite patch is already applied 
in the Debian package for 3.1.6.

> > an existing index with htdig and htmerge. But I found that htdig
> > seems to parse every document again and for this has to convert pdf
> > and ps files to text again. This conversion is pretty
> > time-consuming.
>
> This is strange. If you're doing an update dig, htdig will send the
> modification time to the server in an If-Modified-Since header.
> Apache recognizes this and should not send a document unless it's
> been modified.
>
> > start_url:              http://localdocs/
> > local_urls:             http://localdocs/=/pub/doc/
>
> Hmm. I'm currently on vacation, so it's hard for me to check, but I
> wonder if the local_urls feature isn't checking the modification
> dates on the drive before reindexing. :-(

I've just tried without using local_urls and it made no difference. 
First htdig -i -s -v, then htdig -a -s -v. I cancelled the second run 
when it started to convert the first PDF file.
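
To rule out the server side, the If-Modified-Since behaviour described 
above can be checked by hand. Below is a minimal sketch in Python; the 
function names are made up for illustration, and it assumes the server 
answers plain HTTP on the URL given. A 304 status would mean the server 
honours the header, a 200 that it sends the document again.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def if_modified_since_header(epoch_seconds):
    """Format a Unix timestamp as an HTTP date, e.g. for If-Modified-Since."""
    return formatdate(epoch_seconds, usegmt=True)


def conditional_get_status(url, epoch_seconds):
    """Send a conditional GET and return the HTTP status code.

    urllib raises HTTPError for 304 Not Modified, so the code is read
    from the exception in that case.
    """
    request = urllib.request.Request(
        url,
        headers={"If-Modified-Since": if_modified_since_header(epoch_seconds)},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling conditional_get_status() with one of the document URLs and the 
time of the last dig should show whether the server is the one 
resending unchanged files.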

> One benefit/workaround in 3.1.6: the -m flag was added to htdig, which
> allows you to index only a set of URLs.
>
> http://www.htdig.org/htdig.html
>
> So you could use 'find' to generate a list of paths to new or
> modified files, write it to a file and generate a list of "URLs" for
> indexing.

Yes, I think I'll do that. It's probably a lot faster anyway.
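
A rough sketch of how I'd generate that list, in Python rather than 
with 'find'. It assumes the path-to-URL mapping from my local_urls 
setting (/pub/doc/ served as http://localdocs/); the function names and 
the idea of comparing against the time of the last dig are my own 
assumptions, not anything htdig prescribes. How the resulting file is 
passed to -m should be checked against the htdig documentation.

```python
import os
import urllib.parse


def changed_urls(doc_root, url_prefix, since_epoch):
    """Return sorted URLs for files under doc_root modified after since_epoch.

    doc_root and url_prefix mirror the local_urls mapping, e.g.
    "/pub/doc/" and "http://localdocs/".
    """
    urls = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since_epoch:
                # Turn the filesystem path into a URL path relative to the root.
                rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
                urls.append(url_prefix + urllib.parse.quote(rel))
    return sorted(urls)


def write_url_list(urls, list_path):
    """Write one URL per line to list_path."""
    with open(list_path, "w") as handle:
        for url in urls:
            handle.write(url + "\n")
```

The timestamp could come from the modification time of a stamp file 
touched after each successful dig, so that only files changed since 
then end up in the list.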

Thanks for your help and enjoy your holidays.

Michael

-- 
Michael Schuerig                  If at first you don't succeed...
mailto:[EMAIL PROTECTED]           try, try again.
http://www.schuerig.de/michael/   --Jerome Morrow, "Gattaca"



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
