--- Jim wrote:
> On Mon, 22 Mar 2004, Jonathan B. Bayer wrote:
>
> > HTdig is indexing several directories on my system. Some of these
> > directories have over 100,000 documents. A few get added each day.
> >
> > Right now htdig reindexes all the documents each night. Is it possible
> > to have it ignore documents that it has already indexed?
>
> How are you running htdig? Typically unless you provide a -i to
> htdig it tries to perform just an update. The rundig script uses
> the -i option by default, so you might want to look at editing
> that script if you are using it for your digs.
>
> Jim

In addition, if your pages are dynamic you will have some additional
headaches, because htdig checks the date (and filesize?) of each document
to see whether the page has changed. If it has, it will be reindexed. If
all 100K pages report a different timestamp than when they were initially
indexed, they'll all be queued for an update.

You can use the -m (minimal) option for htdig, which allows you to feed a
list of URLs to htdig.

http://htdig.sourceforge.net/htdig.html

If you are using 3.2.0b5 then read this thread. Gilles presents a useful
workaround for feeding URLs via STDIN:

http://sourceforge.net/mailarchive/forum.php?thread_id=3457663&forum_id=2688

I use this in my content management system to index a new article when it
is published. URLs are immediately searchable - very cool.

Josh
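
For what it's worth, the -m approach above could be scripted roughly like
this in Python. This is only a sketch: it assumes an htdig 3.2.x binary on
the PATH, where -m takes a file of URLs and -c names the config file. The
helper names (write_url_list, build_htdig_command, index_urls) and the
example paths are made up for illustration and are not part of ht://Dig.
By default it only builds the command and does not run it.

```python
import subprocess
import tempfile
from pathlib import Path


def write_url_list(urls, path):
    """Write one URL per line, dropping blanks and duplicates (order kept)."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    Path(path).write_text("".join(u + "\n" for u in cleaned))
    return cleaned


def build_htdig_command(url_file, config_file=None, htdig_bin="htdig"):
    """Build the argument list for a minimal (-m) dig of the listed URLs."""
    cmd = [htdig_bin]
    if config_file:
        # -c selects a config file other than the compiled-in default.
        cmd += ["-c", str(config_file)]
    # -m: index only the URLs listed in the file.
    cmd += ["-m", str(url_file)]
    return cmd


def index_urls(urls, config_file=None, htdig_bin="htdig", dry_run=True):
    """Write the URLs to a temp file and (optionally) run htdig on them."""
    with tempfile.NamedTemporaryFile("w", suffix=".urls", delete=False) as fh:
        url_file = fh.name
    write_url_list(urls, url_file)
    cmd = build_htdig_command(url_file, config_file, htdig_bin)
    if not dry_run:
        # Assumption: htdig exits non-zero on failure; check=True raises then.
        subprocess.run(cmd, check=True)
    return cmd
```

A CMS publish hook might then call something like
index_urls(["http://www.example.com/articles/123.html"],
config_file="/etc/htdig/htdig.conf", dry_run=False), with both the URL and
the config path being placeholders for your own setup.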

