--- Jim wrote:
> On Mon, 22 Mar 2004, Jonathan B. Bayer wrote:
> 
> > HTdig is indexing several directories on my system.  Some of these
> > directories have over 100,000 documents.  A few get added each day.
> > 
> > Right now htdig reindexes all the documents each night.  Is it possible
> > to have it ignore documents that it has already indexed?
> 
> How are you running htdig? Typically unless you provide a -i to
> htdig it tries to perform just an update. The rundig script uses
> the -i option by default, so you might want to look at editing
> that script if you are using it for your digs.
> 
> Jim

In addition, if your pages are dynamic you will have
some additional headaches, because htdig checks the
date (and filesize?) of each document to see whether
the page has changed. If it has, it will be reindexed.
If all 100K pages report a different timestamp than
when they were initially indexed, they'll all be queued
for an update.
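
To make the problem concrete, here is a rough Python sketch of the
kind of date comparison described above. It is only an illustration,
not htdig's actual code: the function names and the URL -> timestamp
dictionaries are made up for the example, and it models the date
check only, not the file size.

```python
from datetime import datetime


def needs_reindex(indexed_stamp, reported_stamp):
    """True when the reported date differs from the one seen at the last dig."""
    return reported_stamp != indexed_stamp


def queue_for_update(last_dig, current):
    """Return the URLs that would be queued for an update.

    last_dig / current: dicts mapping URL -> datetime (hypothetical records,
    not htdig's real database format).
    """
    return [
        url
        for url, stamp in current.items()
        # New URLs are always queued; known URLs only if the date moved.
        if url not in last_dig or needs_reindex(last_dig[url], stamp)
    ]
```

A static page keeps the same date between digs and is skipped. A
dynamic page that stamps itself with the current time on every
request looks different on every dig, so it lands in the queue each
night.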

You can use the -m (minimal) option for htdig, which
lets you feed it a list of URLs to index:
http://htdig.sourceforge.net/htdig.html
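
As a minimal sketch of how that might be scripted in Python: write
the URLs to a file, one per line, and build the htdig command line
around it. The config path and the helper names are assumptions for
illustration; check the htdig page above for the exact option
behaviour in your version.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write one URL per line to the file that will be handed to htdig -m."""
    path = Path(path)
    path.write_text("".join(f"{url}\n" for url in urls))
    return path


def build_htdig_command(url_file, config="/etc/htdig/htdig.conf"):
    """Build the argument list for a minimal dig.

    -c selects the config file, -m names the URL list.
    The default config path is an assumption; adjust it for your install.
    """
    return ["htdig", "-c", str(config), "-m", str(url_file)]
```

You would then pass the result of build_htdig_command() to
subprocess.run(..., check=True), or just run the same command by
hand from a shell.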

If you are using 3.2.0b5, read this thread. Gilles
presents a useful workaround for feeding URLs via
STDIN:
http://sourceforge.net/mailarchive/forum.php?thread_id=3457663&forum_id=2688

I use this in my content management system to index a
new article when it is published. URLs are immediately
searchable - very cool. 
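
For what it's worth, a publish hook along these lines could look
roughly like the Python sketch below. This is not my actual CMS
code: the function name, the config path and the injectable runner
are all made up for the example, and any follow-up steps your setup
needs after the dig are left out.

```python
import os
import subprocess
import tempfile


def index_on_publish(article_url, config="/etc/htdig/htdig.conf",
                     runner=subprocess.run):
    """Hypothetical CMS publish hook: dig just the newly published URL."""
    # htdig -m takes its URL list from a file, so write a one-line temp file.
    fd, url_file = tempfile.mkstemp(suffix=".urls", text=True)
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(article_url + "\n")
        cmd = ["htdig", "-c", config, "-m", url_file]
        # check=True raises if htdig exits with a non-zero status.
        runner(cmd, check=True)
        return cmd
    finally:
        # Remove the temp file whether or not the dig succeeded.
        os.remove(url_file)
```

The runner argument is there only so the hook can be tried out
without htdig installed, by passing in a stand-in function.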

Josh
