According to Tod Thomas:
> On one of our machines we have migrated to a different search
> technology.  However, I have constructed a really nice dead-link
> report from the htdig log and want to keep that running.  The only
> problem is that I have to keep re-indexing the site, which takes up a
> LOT of space for the db files.  I tried symlinking them to /dev/null,
> but it fails, which I kind of expected.
> 
> Is there a way to run htdig but not have it create its databases,
> just so I can get the log created?  I know this might sound like
> overkill - I could probably do something along the same lines with
> wget or some other spidering program - but I've got enough
> customization in place to make it a little painful to explore
> alternatives.
> 
> Any ideas?

htdig needs its docdb to track what it's doing, but you can
drastically reduce the size of that file by setting max_head_length
and max_meta_description_length to 0, and setting minimum_word_length
to something large.  Also, if you're using 3.1.x, you can probably
safely link db.wordlist to /dev/null to avoid keeping any words.  I'm
assuming you're not running htmerge after htdig in this case.
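
Something along these lines in your htdig.conf should do it (just a
sketch; the values are illustrative, and you'd keep your existing
start_url, database_dir and the rest of your customizations):

    # Shrink the docdb: don't store excerpts or meta descriptions,
    # and set the minimum word length high enough that almost
    # nothing gets kept as an indexable word.
    max_head_length:              0
    max_meta_description_length:  0
    minimum_word_length:          32

Then, assuming your database_dir is /var/lib/htdig (adjust for your
setup), something like

    ln -sf /dev/null /var/lib/htdig/db.wordlist

should keep the wordlist from eating disk on 3.1.x, while htdig still
writes its log as before.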

See also http://www.htdig.org/FAQ.html#q4.6

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

